# RAG (Retrieval Augmented Generation) Patterns with Amazon Nova

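This notebook walks through several Retrieval Augmented Generation (RAG) patterns using Amazon Nova through LangChain's `ChatAmazonNova` integration. Retrieval is done with simple keyword matching over a small in-memory document set, so no vector store is required.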

## Setup

In [None]:
%env NOVA_API_KEY=<YOUR-API-KEY>
%env NOVA_BASE_URL=https://api.nova.amazon.com/v1/

In [2]:
from langchain_amazon_nova import ChatAmazonNova
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# Initialize the model
llm = ChatAmazonNova(model="nova-pro-v1", temperature=0)

## 1. Create Sample Documents

First, let's create a knowledge base about LangChain.

In [3]:
documents = [
    Document(
        page_content="LangChain is a framework for developing applications powered by language models. It provides tools for prompt management, chains, and agents.",
        metadata={"source": "intro", "page": 1},
    ),
    Document(
        page_content="LCEL (LangChain Expression Language) is a declarative way to compose chains. It uses the pipe operator to connect components.",
        metadata={"source": "lcel", "page": 2},
    ),
    Document(
        page_content="LangChain supports multiple model providers including OpenAI, Anthropic, and others. It provides a unified interface.",
        metadata={"source": "providers", "page": 3},
    ),
    Document(
        page_content="Retrieval Augmented Generation (RAG) combines retrieval with generation. Documents are retrieved and used as context.",
        metadata={"source": "rag", "page": 4},
    ),
]

print(f"Created {len(documents)} documents")
for doc in documents:
    print(f"  - {doc.metadata['source']}: {len(doc.page_content)} chars")

Created 4 documents
  - intro: 141 chars
  - lcel: 125 chars
  - providers: 117 chars
  - rag: 117 chars


## 2. Simple Keyword-Based Retrieval

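Basic retrieval using keyword matching: a document is returned if it shares at least one word with the query. The retriever splits on whitespace and does not strip punctuation or stopwords, so for the query below the token `lcel?` does not match the document token `lcel`. Both documents are returned because they contain the word `is`.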

In [4]:
def simple_retriever(query: str, docs: list):
    """Simple keyword-based retrieval."""
    query_words = set(query.lower().split())
    relevant = []

    for doc in docs:
        doc_words = set(doc.page_content.lower().split())
        if query_words & doc_words:
            relevant.append(doc)

    return relevant


query = "What is LCEL?"
retrieved = simple_retriever(query, documents)

print(f"Query: '{query}'")
print(f"Retrieved {len(retrieved)} documents:")
for doc in retrieved:
    print(f"  - {doc.metadata['source']}: {doc.page_content[:60]}...")

Query: 'What is LCEL?'
Retrieved 2 documents:
  - intro: LangChain is a framework for developing applications powered...
  - lcel: LCEL (LangChain Expression Language) is a declarative way to...
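
The match on `is` suggests normalizing tokens before comparing them. The sketch below is one possible variant, not part of the original notebook: it strips punctuation with a regular expression and drops a small stopword list. The `STOPWORDS` set, `tokenize`, and `keyword_retriever` are hypothetical names, and plain strings stand in for the `Document` objects.

```python
import re

# Hypothetical stopword list for illustration; a real system would use a fuller one.
STOPWORDS = {
    "what", "is", "a", "the", "for", "to", "of", "and",
    "it", "by", "with", "as", "are", "does", "do", "how",
}


def tokenize(text: str) -> set[str]:
    """Lowercase, keep alphanumeric runs only, and drop stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def keyword_retriever(query: str, docs: dict[str, str]) -> list[str]:
    """Return names of documents sharing at least one content word with the query."""
    query_words = tokenize(query)
    return [name for name, text in docs.items() if query_words & tokenize(text)]


# Plain strings stand in for the notebook's Document objects.
sample_docs = {
    "intro": "LangChain is a framework for developing applications powered by language models. It provides tools for prompt management, chains, and agents.",
    "lcel": "LCEL (LangChain Expression Language) is a declarative way to compose chains. It uses the pipe operator to connect components.",
    "providers": "LangChain supports multiple model providers including OpenAI, Anthropic, and others. It provides a unified interface.",
    "rag": "Retrieval Augmented Generation (RAG) combines retrieval with generation. Documents are retrieved and used as context.",
}

print(keyword_retriever("What is LCEL?", sample_docs))  # ['lcel']
print(keyword_retriever("What is RAG?", sample_docs))   # ['rag']
```

With this normalization only the `lcel` document matches the LCEL question, at the cost of maintaining a stopword list.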


## 3. Basic RAG Chain

Combine retrieval with generation.

In [5]:
def format_docs(docs):
    """Format documents for context."""
    return "\n\n".join(doc.page_content for doc in docs)


# Create RAG prompt
rag_prompt = ChatPromptTemplate.from_template(
    """Use the following context to answer the question. If the context doesn't contain the answer, say so.

Context:
{context}

Question: {question}

Answer:"""
)

# Retrieve and answer
query = "What is LCEL?"
retrieved_docs = simple_retriever(query, documents)
context = format_docs(retrieved_docs)

chain = rag_prompt | llm | StrOutputParser()
answer = chain.invoke({"context": context, "question": query})

print(f"Question: {query}")
print(f"Answer: {answer}")

Question: What is LCEL?
Answer: LCEL (LangChain Expression Language) is a declarative way to compose chains within the LangChain framework. It utilizes the pipe operator to connect various components, allowing for a more intuitive and streamlined approach to building applications powered by language models.


## 4. RAG Chain with LCEL

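Compose retrieval, prompt, model, and output parsing into a single runnable chain. Retrieval now ranks documents by the number of words they share with the query and keeps the top `k`.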

In [6]:
def retrieve_top_k(query: str, k: int = 2):
    """Retrieve top k documents by keyword scoring."""
    query_words = set(query.lower().split())

    scored_docs = []
    for doc in documents:
        doc_words = set(doc.page_content.lower().split())
        score = len(query_words & doc_words)
        scored_docs.append((score, doc))

    scored_docs.sort(reverse=True, key=lambda x: x[0])
    return [doc for score, doc in scored_docs[:k]]


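# RAG chain with runnables. The chain input is the question string itself:
# every value in the dict receives that same input, so RunnablePassthrough()
# forwards the string (not a wrapping dict) into the prompt's {question} slot.
rag_chain = (
    {
        "context": lambda question: format_docs(retrieve_top_k(question)),
        "question": RunnablePassthrough(),
    }
    | rag_prompt
    | llm
    | StrOutputParser()
)

result = rag_chain.invoke("How does LangChain support different models?")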
print(result)

LangChain supports different models by providing a unified interface that allows developers to interact with multiple model providers, including but not limited to, OpenAI and Anthropic. This abstraction enables developers to switch between different models or providers without needing to change the underlying code significantly, thus promoting flexibility and ease of integration.
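
To make the dict step in the chain less opaque, here is a minimal pure-Python sketch of pipe-style composition. The `Step` class and `fan_out` helper are hypothetical names invented for illustration and are not LangChain's implementation. The point is only that each key in the mapping receives the same chain input, and the resulting dict is what the prompt step sees.

```python
from typing import Any, Callable


class Step:
    """Hypothetical stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def __or__(self, other: "Step") -> "Step":
        # a | b runs a first, then feeds its output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))


def fan_out(**branches: Callable[[Any], Any]) -> Step:
    """Run every branch on the same input and collect the results in a dict."""
    return Step(lambda value: {name: fn(value) for name, fn in branches.items()})


build_inputs = fan_out(
    context=lambda q: f"<docs retrieved for: {q}>",  # placeholder for retrieval
    question=lambda q: q,                            # pass the input through unchanged
)
render = Step(
    lambda d: f"Context:\n{d['context']}\n\nQuestion: {d['question']}\n\nAnswer:"
)

toy_chain = build_inputs | render
print(toy_chain.invoke("What is LCEL?"))
```

Because every branch sees the whole input, a pass-through branch forwards whatever was given to `invoke`, which is worth keeping in mind when choosing between a string input and a dict input.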


## 5. Multi-Query RAG

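Ask the model for rephrasings of the question, retrieve for each phrasing, and merge the results. This can improve recall when a single phrasing misses relevant documents.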

In [7]:
# Generate alternative queries
multi_query_prompt = ChatPromptTemplate.from_template(
    "Generate 2 different versions of this question:\n{question}\n\nReturn only the questions, one per line."
)

original_query = "What is LangChain used for?"

queries_result = (multi_query_prompt | llm | StrOutputParser()).invoke(
    {"question": original_query}
)
alternative_queries = [q.strip() for q in queries_result.split("\n") if q.strip()]

print(f"Original: {original_query}")
print("Alternatives:")
for q in alternative_queries:
    print(f"  - {q}")

Original: What is LangChain used for?
Alternatives:
  - 1. Can you explain the primary purposes and applications of LangChain in various contexts?
  - 2. How is LangChain utilized in different industries and what specific problems does it solve?
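
The model returned its alternatives with `1.` and `2.` prefixes, which then become part of the retrieval query. A small post-processing step can remove such list markers. The helper below is a sketch under that assumption; `parse_query_lines` is a hypothetical name, and the regular expression only covers numbered and bulleted prefixes.

```python
import re


def parse_query_lines(raw: str) -> list[str]:
    """Split model output into questions, dropping markers such as '1.', '2)' or '-'."""
    queries = []
    for line in raw.splitlines():
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        if cleaned:
            queries.append(cleaned)
    return queries


raw_output = (
    "1. Can you explain the primary purposes and applications of LangChain in various contexts?\n"
    "2. How is LangChain utilized in different industries and what specific problems does it solve?"
)

for q in parse_query_lines(raw_output):
    print(q)
```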


In [8]:
# Retrieve for all queries
all_retrieved = []
for q in [original_query] + alternative_queries[:2]:
    all_retrieved.extend(retrieve_top_k(q, k=1))

# Deduplicate
unique_docs = {doc.metadata["source"]: doc for doc in all_retrieved}.values()

print(f"\nTotal unique documents retrieved: {len(unique_docs)}")
for doc in unique_docs:
    print(f"  - {doc.metadata['source']}")


Total unique documents retrieved: 1
  - intro


In [9]:
# Answer based on combined context
context = format_docs(unique_docs)
answer = (rag_prompt | llm | StrOutputParser()).invoke(
    {"context": context, "question": original_query}
)

print(f"\nAnswer: {answer}")


Answer: LangChain is used for developing applications powered by language models. It provides tools for prompt management, chains, and agents.


## 6. RAG with Source Attribution

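Number the retrieved documents in the prompt so the model can cite them with `[1]`, `[2]`, and so on, then list the retrieved sources alongside the answer.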

In [10]:
# Enhanced prompt with source tracking
rag_with_sources_prompt = ChatPromptTemplate.from_template(
    """Use the following numbered context to answer the question. 
Cite sources using [1], [2], etc.

Context:
{numbered_context}

Question: {question}

Answer:"""
)


def format_docs_with_numbers(docs):
    """Format documents with numbers for citation."""
    return "\n\n".join(f"[{i + 1}] {doc.page_content}" for i, doc in enumerate(docs))


query = "What does LCEL stand for?"
retrieved = retrieve_top_k(query, k=2)

chain = rag_with_sources_prompt | llm | StrOutputParser()
answer = chain.invoke(
    {"numbered_context": format_docs_with_numbers(retrieved), "question": query}
)

print(f"Question: {query}")
print(f"Answer: {answer}\n")
print("Sources:")
for i, doc in enumerate(retrieved, 1):
    print(f"[{i}] {doc.metadata['source']} (page {doc.metadata['page']})")

Question: What does LCEL stand for?
Answer: LCEL stands for LangChain Expression Language. It is a declarative way to compose chains, utilizing the pipe operator to connect components [1].

Sources:
[1] lcel (page 2)
[2] intro (page 1)
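
The source list above shows every retrieved document, although the answer cites only `[1]`. One way to report just the cited sources is to parse the markers out of the answer text. This is a sketch, assuming the model follows the `[n]` convention from the prompt; `cited_sources` is a hypothetical helper.

```python
import re


def cited_sources(answer: str, sources: list[dict]) -> list[dict]:
    """Return the sources whose [n] marker appears in the answer, in citation order."""
    cited = []
    for match in re.findall(r"\[(\d+)\]", answer):
        index = int(match) - 1  # markers are 1-based
        # Ignore out-of-range markers and repeated citations.
        if 0 <= index < len(sources) and sources[index] not in cited:
            cited.append(sources[index])
    return cited


answer_text = (
    "LCEL stands for LangChain Expression Language. It is a declarative way to "
    "compose chains, utilizing the pipe operator to connect components [1]."
)
retrieved_meta = [{"source": "lcel", "page": 2}, {"source": "intro", "page": 1}]

print(cited_sources(answer_text, retrieved_meta))  # [{'source': 'lcel', 'page': 2}]
```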


## 7. Contextual Compression

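Filter retrieved documents down to the sentences that share a word with the query. Note that in the run below the `rag` document is never retrieved: whitespace tokenization turns the query into `what`, `is`, and `rag?`, none of which matches the document token `(rag)`, so the ranking is driven by the word `is`. A document with no matching sentence (here `providers`) is returned unchanged.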

In [11]:
def extract_relevant_sentences(doc: Document, query: str) -> Document:
    """Extract sentences relevant to the query."""
    query_words = set(query.lower().split())
    sentences = doc.page_content.split(". ")

    relevant_sentences = []
    for sentence in sentences:
        sentence_words = set(sentence.lower().split())
        if query_words & sentence_words:
            relevant_sentences.append(sentence)

    if relevant_sentences:
        return Document(
            page_content=". ".join(relevant_sentences), metadata=doc.metadata
        )
    return doc


query = "What is RAG?"
retrieved = retrieve_top_k(query, k=3)

# Compress documents
compressed = [extract_relevant_sentences(doc, query) for doc in retrieved]

print("Before compression:")
print(f"Total chars: {sum(len(d.page_content) for d in retrieved)}\n")

print("After compression:")
print(f"Total chars: {sum(len(d.page_content) for d in compressed)}\n")

for doc in compressed:
    print(f"{doc.metadata['source']}: {doc.page_content}")

Before compression:
Total chars: 383

After compression:
Total chars: 271

intro: LangChain is a framework for developing applications powered by language models
lcel: LCEL (LangChain Expression Language) is a declarative way to compose chains
providers: LangChain supports multiple model providers including OpenAI, Anthropic, and others. It provides a unified interface.
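
A stricter variant of the compression step is sketched below. It assumes punctuation-aware tokens, a small hypothetical stopword list, and a policy of dropping documents with no matching sentence instead of returning them whole. The `Doc` dataclass is a minimal stand-in for `Document` so the block runs without LangChain.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# Hypothetical stopword list for illustration only.
COMMON_WORDS = {"what", "is", "a", "the", "it", "to", "for", "and", "of"}


def content_words(text: str) -> set[str]:
    """Lowercase alphanumeric tokens with common words removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in COMMON_WORDS}


def compress(docs: list[Doc], query: str) -> list[Doc]:
    """Keep sentences sharing a content word with the query; drop docs with none."""
    query_words = content_words(query)
    compressed = []
    for doc in docs:
        # Split after sentence-ending punctuation so each sentence keeps its period.
        sentences = re.split(r"(?<=[.!?])\s+", doc.page_content)
        kept = [s for s in sentences if query_words & content_words(s)]
        if kept:
            compressed.append(Doc(" ".join(kept), doc.metadata))
    return compressed


example_docs = [
    Doc(
        "LangChain is a framework for developing applications powered by language models. "
        "It provides tools for prompt management, chains, and agents.",
        {"source": "intro"},
    ),
    Doc(
        "Retrieval Augmented Generation (RAG) combines retrieval with generation. "
        "Documents are retrieved and used as context.",
        {"source": "rag"},
    ),
]

for doc in compress(example_docs, "What is RAG?"):
    print(f"{doc.metadata['source']}: {doc.page_content}")
```

For the query `What is RAG?` this keeps only the first sentence of the `rag` document and drops `intro` entirely. Whether dropping unmatched documents is desirable depends on how much the upstream ranking can be trusted.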


## 8. RAG with Self-Query

Let the model extract search keywords from the question, then retrieve with those keywords instead of the raw question.

In [12]:
# Extract search criteria from natural language
extract_query_prompt = ChatPromptTemplate.from_template(
    "Extract search keywords from this question. Return only keywords, comma-separated:\n{question}"
)

user_question = "Tell me about how chains work in LangChain"

# Extract keywords
keywords = (extract_query_prompt | llm | StrOutputParser()).invoke(
    {"question": user_question}
)

print(f"User question: {user_question}")
print(f"Extracted keywords: {keywords}\n")

# Use keywords for retrieval
retrieved = retrieve_top_k(keywords, k=2)
context = format_docs(retrieved)

# Answer
answer = (rag_prompt | llm | StrOutputParser()).invoke(
    {"context": context, "question": user_question}
)

print(f"Answer: {answer}")

User question: Tell me about how chains work in LangChain
Extracted keywords: chains, work, LangChain

Answer: In LangChain, "chains" are a fundamental component that allows developers to create complex workflows by linking together multiple language model calls and other operations. Here's a detailed explanation of how chains work in LangChain:

### What are Chains?
Chains in LangChain are sequences of steps where each step can be a call to a language model, a utility function, or another chain. They enable the construction of more sophisticated applications by orchestrating multiple tasks in a specific order.

### Key Components of Chains:
1. **Prompts**: These are the inputs that you provide to the language model. Chains often start with a prompt that is sent to a language model to generate an initial response.
   
2. **Language Model Calls**: These are the interactions with the language models (e.g., OpenAI, Anthropic) where the model generates text based on the given prompt.

3. *
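
The extracted keywords come back as a single comma-separated string, so a whitespace split yields tokens such as `chains,` and `work,` with the commas attached. The sketch below normalizes that string into clean terms. `parse_keywords` is a hypothetical helper, and it is most useful when the retriever also strips punctuation from document text, which `retrieve_top_k` as written does not.

```python
def parse_keywords(raw: str) -> list[str]:
    """Split a comma-separated keyword string into lowercase terms without duplicates."""
    keywords = []
    for part in raw.split(","):
        term = part.strip().strip(".").lower()
        if term and term not in keywords:
            keywords.append(term)
    return keywords


terms = parse_keywords("chains, work, LangChain")
print(terms)            # ['chains', 'work', 'langchain']
print(" ".join(terms))  # a space-separated query string for a keyword retriever
```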

## 9. Full RAG Pipeline

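A small class that wraps retrieval, context formatting, and generation, and returns the answer together with the metadata of the documents used.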

In [13]:
class RAGPipeline:
    """Complete RAG pipeline."""

    def __init__(self, llm, documents, top_k=2):
        self.llm = llm
        self.documents = documents
        self.top_k = top_k

        self.prompt = ChatPromptTemplate.from_template(
            "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
        )

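    def retrieve(self, query: str):
        """Retrieve the top_k documents from self.documents by keyword overlap."""
        # Score against the pipeline's own documents rather than the global list.
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(doc.page_content.lower().split())), doc)
            for doc in self.documents
        ]
        scored.sort(reverse=True, key=lambda x: x[0])
        return [doc for _, doc in scored[: self.top_k]]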

    def answer(self, question: str):
        """Answer question using RAG."""
        docs = self.retrieve(question)
        context = format_docs(docs)

        chain = self.prompt | self.llm | StrOutputParser()
        answer = chain.invoke({"context": context, "question": question})

        return {
            "question": question,
            "answer": answer,
            "sources": [doc.metadata for doc in docs],
        }


# Test the pipeline
rag = RAGPipeline(llm, documents)

questions = [
    "What is LangChain?",
    "What does LCEL do?",
    "What providers does LangChain support?",
]

for q in questions:
    result = rag.answer(q)
    print(f"Q: {result['question']}")
    print(f"A: {result['answer']}")
    print(f"Sources: {', '.join(s['source'] for s in result['sources'])}")
    print()

Q: What is LangChain?
A: LangChain is a comprehensive framework designed to facilitate the development of applications that leverage language models. It offers a variety of tools and abstractions to simplify the process of integrating and utilizing language models in various applications. Here are the key components and features of LangChain:

### Key Components of LangChain

1. **Prompt Management:**
   - LangChain provides tools for creating, managing, and optimizing prompts that are used to interact with language models. This includes templated prompts, few-shot examples, and dynamic prompt generation.

2. **Chains:**
   - Chains in LangChain are sequences of steps where the output of one step is the input to the next. This allows for complex workflows to be built by chaining together simpler components. Chains can include tasks like summarization, question-answering, and more.

3. **Agents:**
   - Agents are more advanced constructs that can make decisions based on the output of ch

## Summary

**RAG Pattern Comparison:**

| Pattern | Complexity | Use Case |
|---------|------------|----------|
| Simple Keyword | Low | Small document sets, prototyping |
| Basic RAG Chain | Medium | Most applications |
| Multi-Query RAG | Medium | Better recall, diverse queries |
| Source Attribution | Medium | Transparency, verification |
| Contextual Compression | High | Large documents, token efficiency |
| Self-Query | High | Natural language interfaces |
| Full Pipeline | High | Production systems |

**Key Components:**
1. **Documents**: Structured knowledge with metadata
2. **Retrieval**: Find relevant documents (keyword, semantic, hybrid)
3. **Context Formatting**: Prepare retrieved docs for LLM
4. **Prompt Engineering**: Instruct model to use context
5. **Generation**: LLM produces answer from context

**Best Practices:**
- **Chunk Wisely**: Balance between context and specificity
- **Include Metadata**: Source tracking for attribution
- **Score Retrieval**: Rank by relevance
- **Validate Context**: Ensure retrieved docs are actually relevant
- **Handle Missing Info**: Model should admit when answer isn't in context
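
The "Validate Context" and "Handle Missing Info" practices can be partly enforced before the model is called. The sketch below applies a minimum overlap score and falls back to a fixed message when nothing passes. The threshold `min_score=2`, the `FALLBACK` text, and all function names are assumptions chosen for illustration, and plain strings stand in for documents.

```python
import re

FALLBACK = "The context does not contain the answer."


def words(text: str) -> set[str]:
    """Lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(query: str, text: str) -> int:
    """Number of distinct tokens shared by the query and the text."""
    return len(words(query) & words(text))


def retrieve_validated(query: str, docs: dict[str, str], k: int = 2, min_score: int = 2) -> list[str]:
    """Return up to k document names whose overlap score reaches min_score."""
    ranked = sorted(docs, key=lambda name: overlap(query, docs[name]), reverse=True)
    return [name for name in ranked[:k] if overlap(query, docs[name]) >= min_score]


def context_or_fallback(query: str, docs: dict[str, str]) -> str:
    """Build the context string, or return the fallback when nothing is relevant enough."""
    names = retrieve_validated(query, docs)
    if not names:
        return FALLBACK
    return "\n\n".join(docs[name] for name in names)


small_kb = {
    "lcel": "LCEL is a declarative way to compose chains.",
    "rag": "RAG combines retrieval with generation.",
}

print(retrieve_validated("How do I compose chains with LCEL?", small_kb))  # ['lcel']
print(context_or_fallback("What is the weather today?", small_kb))
```

Skipping the model call when no document clears the threshold also saves tokens, though a fixed threshold on raw overlap counts is crude and would need tuning on real data.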

**Next Steps:**
For production RAG systems, consider:
- Vector embeddings for semantic search
- Hybrid search (keyword + semantic)
- Reranking retrieved documents
- Caching for performance
- Evaluation metrics
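
As a starting point for the evaluation item, retrieval quality can be measured separately from generation. The sketch below computes a hit rate at `k`: the fraction of questions whose expected document appears in the top `k` results. The tiny evaluation set, the shortened document texts, and the function names are all made up for illustration.

```python
import re


def words(text: str) -> set[str]:
    """Lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_k(query: str, docs: dict[str, str], k: int) -> list[str]:
    """Rank document names by token overlap with the query (ties keep insertion order)."""
    ranked = sorted(docs, key=lambda name: len(words(query) & words(docs[name])), reverse=True)
    return ranked[:k]


def hit_rate(eval_set: list[tuple[str, str]], docs: dict[str, str], k: int) -> float:
    """Fraction of questions whose expected document is among the top k results."""
    hits = sum(expected in top_k(question, docs, k) for question, expected in eval_set)
    return hits / len(eval_set)


kb = {
    "intro": "LangChain is a framework for developing applications powered by language models.",
    "lcel": "LCEL is a declarative way to compose chains with the pipe operator.",
    "rag": "RAG combines retrieval with generation.",
}

# Hypothetical labeled questions: (question, name of the document that should be found).
eval_set = [
    ("What is LCEL?", "lcel"),
    ("What is RAG?", "rag"),
    ("What is LangChain?", "intro"),
]

for k in (1, 2, 3):
    print(f"hit rate @ {k}: {hit_rate(eval_set, kb, k):.2f}")
```

On this toy set the RAG question misses at `k=1` because the stopword `is` ties all three documents, which is the kind of weakness such a metric is meant to surface before moving to embeddings or hybrid search.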