### Building a RAG System with LangChain and ChromaDB
#### Introduction
Retrieval-Augmented Generation (RAG) combines a large language model with external knowledge retrieval: relevant documents are fetched at query time and passed to the model as context. This notebook walks through building a complete RAG system using:

- LangChain: A framework for developing applications powered by language models
- ChromaDB: An open-source vector database for storing and retrieving embeddings
- OpenAI: For embeddings and the language model (you can substitute other providers)

In [13]:
import os
from dotenv import load_dotenv
load_dotenv()

True

In [14]:
## langchain imports
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_openai import OpenAIEmbeddings
#from langchain.schema  import Document

## Vector Store imports
from langchain_community.vectorstores import Chroma

# Utils imports
import numpy as np
from typing import List

In [15]:
# RAG Architecture Overview
print("""
RAG (Retrieval-Augmented Generation) Architecture:

1. Document Loading: Load documents from various sources
2. Document Splitting: Break documents into smaller chunks
3. Embedding Generation: Convert chunks into vector representations
4. Vector Storage: Store embeddings in ChromaDB
5. Query Processing: Convert user query to embedding
6. Similarity Search: Find relevant chunks from vector store
7. Context Augmentation: Combine retrieved chunks with query
8. Response Generation: LLM generates answer using context

Benefits of RAG:
- Reduces hallucinations
- Provides up-to-date information
- Allows citing sources
- Works with domain-specific knowledge
""")


RAG (Retrieval-Augmented Generation) Architecture:

1. Document Loading: Load documents from various sources
2. Document Splitting: Break documents into smaller chunks
3. Embedding Generation: Convert chunks into vector representations
4. Vector Storage: Store embeddings in ChromaDB
5. Query Processing: Convert user query to embedding
6. Similarity Search: Find relevant chunks from vector store
7. Context Augmentation: Combine retrieved chunks with query
8. Response Generation: LLM generates answer using context

Benefits of RAG:
- Reduces hallucinations
- Provides up-to-date information
- Allows citing sources
- Works with domain-specific knowledge



In [19]:
## create sample documents
sample_docs = [
    """
    Machine Learning Fundamentals
    
    Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experience without being explicitly programmed. There are three main 
    types of machine learning: supervised learning, unsupervised learning, and reinforcement 
    learning. Supervised learning uses labeled data to train models, while unsupervised 
    learning finds patterns in unlabeled data. Reinforcement learning learns through 
    interaction with an environment using rewards and penalties.
    """,
    
    """
    Deep Learning and Neural Networks
    
    Deep learning is a subset of machine learning based on artificial neural networks. 
    These networks are inspired by the human brain and consist of layers of interconnected 
    nodes. Deep learning has revolutionized fields like computer vision, natural language 
    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly 
    effective for image processing, while Recurrent Neural Networks (RNNs) and Transformers 
    excel at sequential data processing.
    """,
    
    """
    Natural Language Processing (NLP)
    
    NLP is a field of AI that focuses on the interaction between computers and human language. 
    Key tasks in NLP include text classification, named entity recognition, sentiment analysis, 
    machine translation, and question answering. Modern NLP heavily relies on transformer 
    architectures like BERT, GPT, and T5. These models use attention mechanisms to understand 
    context and relationships between words in text.
    """
]

sample_docs


['\n    Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement \n    learning. Supervised learning uses labeled data to train models, while unsupervised \n    learning finds patterns in unlabeled data. Reinforcement learning learns through \n    interaction with an environment using rewards and penalties.\n    ',
 '\n    Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natural language \n    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly \n    effective f

In [21]:
## save sample documents to files

import tempfile
temp_dir = tempfile.mkdtemp()

for i, doc in enumerate(sample_docs):
    with open(os.path.join(temp_dir, f"doc_{i+1}.txt"), "w") as f:
        f.write(doc.strip())

print(f"Sample documents saved to {temp_dir}")

Sample documents saved to /tmp/tmpd8zxwm2s


### Document Loading

In [22]:
from langchain_community.document_loaders import TextLoader, DirectoryLoader

# Load documents from directory
loader = DirectoryLoader(
    temp_dir,
    glob="*.txt",
    loader_cls=TextLoader,
    loader_kwargs={"encoding": "utf8"},
)

documents = loader.load()
print(f"Loaded {len(documents)} documents.")
print(f"\nFirst document preview:")
print(documents[0].page_content[:200] + "...")

Loaded 3 documents.

First document preview:
Deep Learning and Neural Networks

    Deep learning is a subset of machine learning based on artificial neural networks. 
    These networks are inspired by the human brain and consist of layers of i...


In [23]:
documents

[Document(metadata={'source': '/tmp/tmpd8zxwm2s/doc_2.txt'}, page_content='Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natural language \n    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly \n    effective for image processing, while Recurrent Neural Networks (RNNs) and Transformers \n    excel at sequential data processing.'),
 Document(metadata={'source': '/tmp/tmpd8zxwm2s/doc_3.txt'}, page_content='Natural Language Processing (NLP)\n\n    NLP is a field of AI that focuses on the interaction between computers and human language. \n    Key tasks in NLP include text classification, named entity recognition, sentiment analysis, \n    machine translation, and question answering. Modern NLP heavily relie

### Document Splitting

In [24]:
# Initialize text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=50,
    length_function=len,
    separators=["\n\n", "\n", " ", ""]
)

# Split documents into chunks
chunks = text_splitter.split_documents(documents)
print(f"Created {len(chunks)} chunks from {len(documents)} documents")

for i, chunk in enumerate(chunks):
    print(f"Content of chunk {i+1}: {chunk.page_content[:200]}...")
    print(f"Metadata of chunk {i+1}: {chunk.metadata}")
    print("-----")

Created 9 chunks from 3 documents
Content of chunk 1: Deep Learning and Neural Networks...
Metadata of chunk 1: {'source': '/tmp/tmpd8zxwm2s/doc_2.txt'}
-----
Content of chunk 2: Deep learning is a subset of machine learning based on artificial neural networks. 
    These networks are inspired by the human brain and consist of layers of interconnected 
    nodes. Deep learning...
Metadata of chunk 2: {'source': '/tmp/tmpd8zxwm2s/doc_2.txt'}
-----
Content of chunk 3: processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly 
    effective for image processing, while Recurrent Neural Networks (RNNs) and Transformers 
    excel at seq...
Metadata of chunk 3: {'source': '/tmp/tmpd8zxwm2s/doc_2.txt'}
-----
Content of chunk 4: Natural Language Processing (NLP)...
Metadata of chunk 4: {'source': '/tmp/tmpd8zxwm2s/doc_3.txt'}
-----
Content of chunk 5: NLP is a field of AI that focuses on the interaction between computers and human language. 
    Key tasks in NLP
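
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal pure-Python sketch of a fixed-width sliding window. It is a simplification and not how `RecursiveCharacterTextSplitter` works internally (that class first tries to break on the listed separators); the helper name `sliding_window_chunks` is hypothetical and used only for illustration.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    result = []
    for start in range(0, len(text), step):
        result.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return result


demo_chunks = sliding_window_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
for piece in demo_chunks:
    print(repr(piece))
```

Each window repeats the last three characters of the previous one, which is the role `chunk_overlap=50` plays above: a sentence cut at a chunk boundary still appears whole in at least one chunk.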

### Embedding Models

In [25]:
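# load_dotenv() above should already have set the key; fail early with a clear message if not
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY is not set; add it to your .env file"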

### Initialize the ChromaDB Vector Store and Store the Chunks as Vectors

In [29]:
## Create a ChromaDB vector store
persistent_directory = "./chroma_db_two"

## Initialize ChromaDB with OpenAI embeddings
embedding_function = OpenAIEmbeddings(
     model="text-embedding-3-small"
)

vector_store = Chroma.from_documents(
    documents=chunks,
    embedding=embedding_function,
    persist_directory=persistent_directory,
    collection_name="rag_example"
)

print("Vector store created and persisted to disk.")
print(f"Number of vectors in store: {vector_store._collection.count()}")

Vector store created and persisted to disk.
Number of vectors in store: 9


### Test Similarity Search

In [30]:
query = "What are the types of machine learning?"
similar_docs = vector_store.similarity_search(query, k=2)
similar_docs

[Document(metadata={'source': '/tmp/tmpd8zxwm2s/doc_1.txt'}, page_content='Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement'),
 Document(metadata={'source': '/tmp/tmpd8zxwm2s/doc_1.txt'}, page_content='Machine Learning Fundamentals')]

In [31]:
query = "What is NLP?"
similar_docs = vector_store.similarity_search(query, k=2)
similar_docs

[Document(metadata={'source': '/tmp/tmpd8zxwm2s/doc_3.txt'}, page_content='Natural Language Processing (NLP)'),
 Document(metadata={'source': '/tmp/tmpd8zxwm2s/doc_3.txt'}, page_content='NLP is a field of AI that focuses on the interaction between computers and human language. \n    Key tasks in NLP include text classification, named entity recognition, sentiment analysis, \n    machine translation, and question answering. Modern NLP heavily relies on transformer')]

In [32]:
query = "What is Deep Learning?"
similar_docs = vector_store.similarity_search(query, k=2)
similar_docs

[Document(metadata={'source': '/tmp/tmpd8zxwm2s/doc_2.txt'}, page_content='Deep Learning and Neural Networks'),
 Document(metadata={'source': '/tmp/tmpd8zxwm2s/doc_2.txt'}, page_content='Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natural language')]

In [33]:
print(f"Query: {query}\n")
print(f"Top similar documents retrieved: {len(similar_docs)}:\n")

for i, doc in enumerate(similar_docs):
    print(f"Document {i+1} metadata:\n{doc.metadata}\n")
    print(f"Document {i+1} content:\n{doc.page_content[:50]}\n")
    

Query: What is Deep Learning?

Top similar documents retrieved: 2:

Document 1 metadata:
{'source': '/tmp/tmpd8zxwm2s/doc_2.txt'}

Document 1 content:
Deep Learning and Neural Networks

Document 2 metadata:
{'source': '/tmp/tmpd8zxwm2s/doc_2.txt'}

Document 2 content:
Deep learning is a subset of machine learning base



### Advanced Similarity Search With Scores

In [34]:
results_score = vector_store.similarity_search_with_score(query, k=2)
results_score

[(Document(metadata={'source': '/tmp/tmpd8zxwm2s/doc_2.txt'}, page_content='Deep Learning and Neural Networks'),
  0.5935789346694946),
 (Document(metadata={'source': '/tmp/tmpd8zxwm2s/doc_2.txt'}, page_content='Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natural language'),
  0.6173093318939209)]

#### Understanding Similarity Scores
The score returned by `similarity_search_with_score` is a distance between the query embedding and a chunk embedding, so how to read it depends on the distance metric the collection uses:

ChromaDB default: squared L2 (Euclidean) distance

- Lower scores = MORE similar (closer in vector space)
- Score of 0 = identical vectors
- Range: 0 to 4 for unit-length embeddings; unbounded above for unnormalized vectors


Cosine (if configured):

- ChromaDB reports cosine distance (1 - cosine similarity), so lower scores still mean MORE similar
- Cosine similarity itself ranges from -1 to 1 (1 meaning the same direction), so cosine distance ranges from 0 to 2
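
The relationship between the two metrics can be checked by hand. The sketch below uses made-up three-dimensional vectors (not real embeddings) and assumes they are scaled to unit length; under that assumption, squared L2 distance equals `2 - 2 * cosine similarity`, so both metrics rank chunks in the same order.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(a, b):
    """Sum of squared component differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Toy vectors, invented for illustration only
query_vec = normalize([1.0, 2.0, 0.5])
close_vec = normalize([1.1, 1.9, 0.4])
far_vec = normalize([-2.0, 0.3, 1.0])

for name, vec in [("close", close_vec), ("far", far_vec)]:
    d2 = squared_l2(query_vec, vec)
    cos = cosine_similarity(query_vec, vec)
    # For unit vectors: squared L2 distance == 2 - 2 * cosine similarity
    print(f"{name}: squared_l2={d2:.4f}  cosine={cos:.4f}  2-2cos={2 - 2 * cos:.4f}")
```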

#### Initialize LLM, RAG Chain, Prompt Template, and Query the RAG System

In [35]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4.1-nano-2025-04-14")
test_response = llm.invoke("What are Large Language Models?")
test_response

AIMessage(content="Large Language Models (LLMs) are advanced artificial intelligence systems designed to understand, generate, and interpret human language. They are built using deep learning techniques, particularly neural networks with many layers, and are trained on vast amounts of textual data from books, articles, websites, and other sources. \n\nThese models learn statistical patterns and relationships within language, enabling them to perform a variety of tasks such as:\n\n- Text generation (e.g., writing essays, stories, or code)\n- Language translation\n- Summarization\n- Question answering\n- Sentiment analysis\n\nPopular examples of large language models include OpenAI's GPT series (like GPT-3 and GPT-4), Google's Bard, and Meta's LLaMA. Due to their size and extensive training data, they can produce coherent and contextually relevant responses, making them powerful tools for natural language understanding and automation.", additional_kwargs={'refusal': None}, response_metad

In [36]:
# generic model class

from langchain.chat_models import init_chat_model
llm = init_chat_model("openai:gpt-4.1")
llm

ChatOpenAI(profile={'max_input_tokens': 1047576, 'max_output_tokens': 32768, 'image_inputs': True, 'audio_inputs': False, 'video_inputs': False, 'image_outputs': False, 'audio_outputs': False, 'video_outputs': False, 'reasoning_output': False, 'tool_calling': True, 'structured_output': True, 'image_url_inputs': True, 'pdf_inputs': True, 'pdf_tool_message': True, 'image_tool_message': True, 'tool_choice': True}, client=<openai.resources.chat.completions.completions.Completions object at 0x751d916574d0>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x751d73702390>, root_client=<openai.OpenAI object at 0x751d73703cb0>, root_async_client=<openai.AsyncOpenAI object at 0x751d91656120>, model_name='gpt-4.1', model_kwargs={}, openai_api_key=SecretStr('**********'), stream_usage=True)

In [37]:
llm.invoke("Explain the concept of Retrieval-Augmented Generation (RAG) in AI.")

AIMessage(content='**Retrieval-Augmented Generation (RAG)** is an AI architecture that combines *retrieval* and *generation* to produce more informed and accurate responses, especially in tasks like open-domain question answering.\n\n### Core Concepts\n\n1. **Retriever:**  \nRAG uses a retriever module (often based on dense vector search with models like DPR, or embeddings from BERT-like models) to fetch relevant documents/passages from a large external collection (knowledge base, Wikipedia, etc.) in response to the input query.\n\n2. **Generator:**  \nA generator module (usually a sequence-to-sequence language model, like BART or T5) fuses the retrieved documents with the query to generate an answer or a response.\n\n### How RAG Works (Simplified Steps)\n\n1. **Query:**  \nInput question or prompt is given.\n\n2. **Retrieve:**  \nThe retriever searches a large corpus to find top-K most relevant passages based on semantic similarity to the query.\n\n3. **Augment:**  \nThe retrieved pas

### Modern RAG Chain

In [38]:
from langchain_classic.chains import create_retrieval_chain
from langchain_classic.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
from langchain_classic.chains.combine_documents import create_stuff_documents_chain

In [39]:
## Convert vector store to retriever

retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 2}
)

retriever

VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x751d91965970>, search_kwargs={'k': 2})

In [40]:
## create prompt template
system_prompt = """You are an assistant for question-answering tasks. 
Use the following pieces of retrieved context to answer the question. 
If you don't know the answer, just say that you don't know. 
Use three sentences maximum and keep the answer concise.

Context: {context}"""

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "{input}")
])

prompt

ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. \nUse the following pieces of retrieved context to answer the question. \nIf you don't know the answer, just say that you don't know. \nUse three sentences maximum and keep the answer concise.\n\nContext: {context}"), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], input_types={}, partial_variables={}, template='{input}'), additional_kwargs={})])

#### What is create_stuff_documents_chain?
create_stuff_documents_chain creates a chain that "stuffs" (inserts) all retrieved documents into a single prompt and sends it to the LLM. It's called "stuff" because it literally stuffs all the documents into the context window at once.

In [41]:
### Create a document chain
document_chain = create_stuff_documents_chain(
    llm=llm,
    prompt=prompt
)
document_chain

RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableLambda(format_docs)
}), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
| ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. \nUse the following pieces of retrieved context to answer the question. \nIf you don't know the answer, just say that you don't know. \nUse three sentences maximum and keep the answer concise.\n\nContext: {context}"), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], input_types={}, partial_variables={}, template='{input}'), additional_kwargs={})])
| ChatOpenAI(profile={'max_input_tokens': 1047576, 'max_output_tokens': 32768, 'image_inputs': True, 'audio_inputs': False, 'video_inp

This chain:

- Takes retrieved documents
- "Stuffs" them into the prompt's {context} placeholder
- Sends the complete prompt to the LLM
- Returns the LLM's response
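
The "stuffing" step itself can be sketched in plain Python. This is an illustration of the idea only, not LangChain's implementation: `FakeDocument`, `PROMPT_TEMPLATE`, and `stuff_documents` are hypothetical names, and the LLM call is left out so the snippet runs offline.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a LangChain Document (same two attributes)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


PROMPT_TEMPLATE = (
    "Use the following pieces of retrieved context to answer the question.\n\n"
    "Context: {context}\n\n"
    "Question: {input}"
)


def stuff_documents(docs, question, separator="\n\n"):
    """Join every document's text and place it in the {context} slot."""
    context = separator.join(doc.page_content for doc in docs)
    return PROMPT_TEMPLATE.format(context=context, input=question)


demo_docs = [
    FakeDocument("Machine Learning Fundamentals"),
    FakeDocument("There are three main types of machine learning."),
]
stuffed_prompt = stuff_documents(demo_docs, "What are the types of machine learning?")
print(stuffed_prompt)
```

Because every retrieved chunk lands in one prompt, the combined text has to fit in the model's context window, which is one reason `k` and `chunk_size` are kept small here.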

#### What is create_retrieval_chain?
create_retrieval_chain is a function that combines a retriever (which fetches relevant documents) with a document chain (which processes those documents with an LLM) to create a complete RAG pipeline.
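
As a rough mental model, the combination can be written as a small function. The sketch below assumes two plain callables in place of the real retriever and document chain; `make_retrieval_chain`, `toy_retrieve`, and `toy_answer` are hypothetical names, and no vector store or LLM is involved.

```python
def make_retrieval_chain(retrieve, answer_from_docs):
    """Return a function that retrieves documents, then answers from them.

    retrieve: callable taking a question string and returning a list of documents
    answer_from_docs: callable taking (documents, question) and returning a string
    """
    def run(inputs: dict) -> dict:
        docs = retrieve(inputs["input"])
        answer = answer_from_docs(docs, inputs["input"])
        # Same keys as the result shown in this notebook: input, context, answer
        return {**inputs, "context": docs, "answer": answer}
    return run


# Toy stand-ins so the sketch runs without a vector store or an LLM
def toy_retrieve(question):
    corpus = ["Supervised learning uses labeled data.", "CNNs are effective for image processing."]
    words = set(question.lower().replace("?", "").split())
    return [text for text in corpus if words & set(text.lower().rstrip(".").split())]


def toy_answer(docs, question):
    return f"Based on {len(docs)} retrieved chunk(s): " + " ".join(docs)


toy_chain = make_retrieval_chain(toy_retrieve, toy_answer)
toy_result = toy_chain({"input": "What are CNNs used for?"})
print(toy_result["answer"])
```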

In [None]:
### Create The Final RAG Chain
rag_chain = create_retrieval_chain(
    retriever,
    document_chain
)

rag_chain

RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableBinding(bound=RunnableLambda(lambda x: x['input'])
           | VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x74673469daf0>, search_kwargs={'k': 2}), kwargs={}, config={'run_name': 'retrieve_documents'}, config_factories=[])
})
| RunnableAssign(mapper={
    answer: RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
              context: RunnableLambda(format_docs)
            }), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
            | ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. \nUse the following pieces of retrieved context to answer the question. \nIf you do

In [None]:
response = rag_chain.invoke({"input": "What are the types of machine learning?"})
response

{'input': 'What are the types of machine learning?',
 'context': [Document(metadata={'source': '/tmp/tmp8jfmezso/doc_1.txt'}, page_content='Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement'),
  Document(metadata={'source': '/tmp/tmp8jfmezso/doc_1.txt'}, page_content='Machine Learning Fundamentals')],
 'answer': 'The three main types of machine learning are supervised learning, unsupervised learning, and reinforcement learning. Supervised learning uses labeled data to train models, unsupervised learning finds patterns in unlabeled data, and reinforcement learning teaches agents to make decisions through trial and error. These categories cover the fundamental approaches to machine learning.'}

In [None]:
response['answer']

'The three main types of machine learning are supervised learning, unsupervised learning, and reinforcement learning. Supervised learning uses labeled data to train models, unsupervised learning finds patterns in unlabeled data, and reinforcement learning teaches agents to make decisions through trial and error. These categories cover the fundamental approaches to machine learning.'

In [None]:
# Function to query the modern RAG system

def query_rag_system(question: str):
    print(f"Querying RAG system with question: {question}")
    print("-" * 50)
    
    # Using create_retrieval_chain approach
    result = rag_chain.invoke({"input": question})
    print(f"Answer: {result['answer']}")
    print("\nRetrieved Context:")
    for i, doc in enumerate(result.get('context', [])):
        print(f"\n--- Source {i+1} ---")
        print(doc.page_content[:200]+"...")
    
    return result


# Test queries
test_questions = [
    "What are the three types of machine learning?",
    "What is deep learning and how does it relate to neural networks?",
    "What are CNNs best used for?"
]

for question in test_questions:
    query_rag_system(question)
    print("\n" + "="*80 + "\n")


Querying RAG system with question: What are the three types of machine learning?
--------------------------------------------------
Answer: The three main types of machine learning are supervised learning, unsupervised learning, and reinforcement learning.

Retrieved Context:

--- Source 1 ---
Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experience without being explicitly programmed. There are three main 
    types of machine l...

--- Source 2 ---
Machine Learning Fundamentals...


Querying RAG system with question: What is deep learning and how does it relate to neural networks?
--------------------------------------------------
Answer: Deep learning is a subset of machine learning that uses artificial neural networks with multiple layers to process complex data. Neural networks, inspired by the human brain, are the core architecture behind deep learning. Deep learning leverages these layered networks to achieve advanced

Good. This is **core "Traditional RAG" code**, and you already have all the prerequisites.
I'll explain it **line by line**, but **concept-first**, not just syntax.

I'll also **pause after each block** and ask you to confirm understanding (study-mode rule).

---

## 0️⃣ What problem this code solves (1-line summary)

> It builds a **RAG pipeline** that:
>
> 1. retrieves relevant chunks from a vector store
> 2. injects them into a prompt
> 3. asks an LLM to answer using only that context

Keep this pipeline in mind:

```
User Question → Retriever → Context → Prompt → LLM → Answer
```

---

## 1️⃣ Imports: WHAT capabilities are we bringing in?

```python
from langchain_classic.chains import create_retrieval_chain
from langchain_classic.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
from langchain_classic.chains.combine_documents import create_stuff_documents_chain
```

### Conceptual meaning (important)

| Import                         | What it represents                                     |
| ------------------------------ | ------------------------------------------------------ |
| `create_retrieval_chain`       | Orchestrator that connects **retrieval + generation**  |
| `ChatPromptTemplate`           | Structured prompt (system + human messages)            |
| `create_stuff_documents_chain` | Combines retrieved documents → stuffs them into prompt |

📌 **Key idea**
LangChain already knows common RAG patterns; these helpers **assemble the pipeline for you**.

---

### ✅ Checkpoint 1

Can you tell me:

> What two big steps does `create_retrieval_chain` connect?

(Answer in one line before moving on.)

---

## 2️⃣ Converting vector store → retriever (VERY IMPORTANT)

```python
retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 2}
)
```

### What is happening conceptually?

Your `vector_store` (Chroma, FAISS, etc.) **cannot be passed directly** to `create_retrieval_chain`.

That helper expects a **Retriever interface**.

So this line:

```python
vector_store.as_retriever()
```

wraps your vector store into something that can:

* accept a **query**
* return **relevant documents**
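
A toy version of that wrapping, to make the interface idea concrete. This is a sketch under loud assumptions: `ToyVectorStore` and `ToyRetriever` are hypothetical classes that rank texts by shared words instead of embeddings, and only the method names `similarity_search`, `as_retriever`, and `invoke` mirror the real objects.

```python
class ToyVectorStore:
    """Hypothetical store: ranks texts by word overlap instead of real embeddings."""

    def __init__(self, texts):
        self.texts = list(texts)

    def similarity_search(self, query, k=4):
        query_words = set(query.lower().split())
        scored = [(len(query_words & set(t.lower().split())), t) for t in self.texts]
        scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort: ties keep insertion order
        return [t for _, t in scored[:k]]

    def as_retriever(self, search_kwargs=None):
        return ToyRetriever(self, search_kwargs or {})


class ToyRetriever:
    """Exposes one method, invoke(query), with the search settings fixed up front."""

    def __init__(self, store, search_kwargs):
        self.store = store
        self.search_kwargs = search_kwargs

    def invoke(self, query):
        return self.store.similarity_search(query, **self.search_kwargs)


toy_store = ToyVectorStore([
    "deep learning uses neural networks",
    "nlp studies human language",
    "reinforcement learning uses rewards",
])
toy_retriever = toy_store.as_retriever(search_kwargs={"k": 2})
print(toy_retriever.invoke("what is deep learning"))
```

The caller only ever sees `invoke(query)`; which store sits behind it and how many results it returns were decided once, when the retriever was created.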

---

### Parameters explained

#### `search_type="similarity"`

Means:

> "Use vector similarity (cosine / L2) to find nearest chunks"

Other options exist (later topics):

* `mmr`
* `similarity_score_threshold`

---

#### `search_kwargs={"k": 2}`

Means:

> "Return top **2** most similar chunks"

This directly affects:

* context length
* hallucination risk
* answer quality

📌 **RAG rule of thumb**
More `k` ≠ better answers
It often adds noise.

---

### ✅ Checkpoint 2

If `k=2`, how many document chunks can the LLM *see* at maximum?

---

## 3️⃣ Creating the SYSTEM PROMPT (the LLM's behavior contract)

```python
system_prompt = """You are an assistant for question-answering tasks. 
Use the following pieces of retrieved context to answer the question. 
If you don't know the answer, just say that you don't know. 
Use three sentences maximum and keep the answer concise.

Context: {context}"""
```

### Why this is critical in RAG

This prompt:

* **binds the LLM to retrieved data**
* explicitly discourages hallucination
* limits verbosity

The key placeholder:

```text
{context}
```

This is where retrieved chunks will be **injected automatically**.

📌 Without `{context}`, this is **NOT RAG**; it becomes plain prompting.

---

### ✅ Checkpoint 3

Why do we explicitly tell the LLM:

> "If you don't know, say you don't know"?

(Think RAG failure modes.)

---

## 4️⃣ ChatPromptTemplate: structuring messages

```python
prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "{input}")
])
```

### What this creates internally

It builds a **chat-style prompt**:

```
System: instructions + context
Human: user question
```

* `{input}` → user's query
* `{context}` → retrieved chunks (filled later)

📌 This separation matters because:

* system message controls behavior
* human message contains the question

---

### Mental model

This prompt does **NOT** execute yet.
It's just a **template** waiting for:

* context
* input
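
The same "template first, values later" idea in plain Python, as a sketch. `SYSTEM_TEMPLATE`, `HUMAN_TEMPLATE`, and `render_messages` are hypothetical names, and `str.format` stands in for what the prompt template does when the chain finally runs.

```python
SYSTEM_TEMPLATE = "Answer using only this context.\n\nContext: {context}"
HUMAN_TEMPLATE = "{input}"


def render_messages(context: str, user_input: str):
    """Fill both placeholders and return (role, text) pairs, system message first."""
    return [
        ("system", SYSTEM_TEMPLATE.format(context=context)),
        ("human", HUMAN_TEMPLATE.format(input=user_input)),
    ]


# Before rendering, the template holds a placeholder only, no retrieved text
print("{context}" in SYSTEM_TEMPLATE)  # True

messages = render_messages(
    context="Machine learning has three main types.",
    user_input="How many types of machine learning are there?",
)
for role, text in messages:
    print(f"[{role}] {text}")
```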

---

### ✅ Checkpoint 4

At this point, does the prompt contain any retrieved data yet?
(Yes / No, and why?)

---

## 5️⃣ create_stuff_documents_chain: stuffing context into the prompt

```python
document_chain = create_stuff_documents_chain(
    llm=llm,
    prompt=prompt
)
```

### What "stuff" really means

> Take **all retrieved documents**, concatenate them, and **stuff them into `{context}`**

Internally:

1. Retrieved docs → list of `Document`
2. Extract `page_content`
3. Join them (usually with `\n\n`)
4. Inject into `{context}`
5. Send final prompt to LLM

📌 This is the **generation half** of RAG.

---

### What this chain does NOT do

❌ It does NOT retrieve documents
❌ It does NOT embed queries

It only:

> "Given documents + question → produce answer"

---

### ✅ Checkpoint 5

Which component actually decides *which* documents are relevant:

* retriever
* document_chain
* LLM

(Only one is correct.)

---

## 6️⃣ create_retrieval_chain: assembling full RAG

```python
rag_chain = create_retrieval_chain(
    retriever,
    document_chain
)
```

### This is the **final wiring step**

Now the full pipeline exists:

```
User input
 → retriever (vector search)
 → document_chain (stuff context)
 → LLM
 → answer
```

You can think of `rag_chain` as:

> A callable object that runs the **entire RAG flow**

---

### Why this abstraction is powerful

* You don't manually pass context
* You don't manually call retriever
* You don't manually format prompts

Everything is orchestrated.

---

### ✅ Checkpoint 6

If you replace `llm` with GROQ or Gemini, does this chain break?
(Why / why not?)

---

## 7️⃣ Invoking the RAG chain (runtime execution)

```python
response = rag_chain.invoke({"input": "What are the types of machine learning?"})
response
```

### What happens step-by-step (very important)

1. `"input"` is embedded
2. Retriever finds top-2 chunks
3. Chunks → `{context}`
4. Prompt is formed
5. LLM generates answer

---

### What `response` contains

Usually:

```python
{
  "input": "...",
  "context": [Document, Document],
  "answer": "..."
}
```

This is **debug-friendly**:

* you can inspect retrieved chunks
* verify hallucinations
* tune `k`
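
One way to use that for debugging is to tally which files the retrieved chunks came from. The sketch below runs on a hand-written dictionary shaped like the result above; `sample_response` and `summarize_sources` are hypothetical, the contents are invented, and `SimpleNamespace` stands in for `Document`.

```python
from types import SimpleNamespace

# Hypothetical result with the same keys as the chain output; contents are made up
sample_response = {
    "input": "What are the types of machine learning?",
    "context": [
        SimpleNamespace(page_content="There are three main types of machine learning.",
                        metadata={"source": "doc_1.txt"}),
        SimpleNamespace(page_content="Machine Learning Fundamentals",
                        metadata={"source": "doc_1.txt"}),
    ],
    "answer": "Supervised, unsupervised, and reinforcement learning.",
}


def summarize_sources(result: dict) -> dict:
    """Count how many retrieved chunks came from each source file."""
    counts = {}
    for doc in result.get("context", []):
        source = doc.metadata.get("source", "unknown")
        counts[source] = counts.get(source, 0) + 1
    return counts


source_counts = summarize_sources(sample_response)
print(source_counts)
```

If every chunk comes from one file, or one of the `k` slots is spent on a bare title chunk, that is a hint to revisit the splitter settings or `k`.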

---

## 🔁 Ultra-short recap (memorize this)

* `as_retriever()` → makes vector store usable
* `ChatPromptTemplate` → defines RAG prompt
* `create_stuff_documents_chain` → stuffs context
* `create_retrieval_chain` → full RAG pipeline
* `.invoke()` → executes everything

---

## Final question (answer before we proceed)

👉 **Why is this called "Traditional RAG" and not "Conversational RAG"?**

(One sentence is enough. This unlocks the next topic.)


### Create RAG Chain Alternative - Using LCEL (LangChain Expression Language)

In [None]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_classic.prompts import ChatPromptTemplate

In [None]:
# Create a custom prompt
custom_prompt = ChatPromptTemplate.from_template("""Use the following context to answer the question. 
If you don't know the answer based on the context, say you don't know.
Provide specific details from the context to support your answer.

Context:
{context}

Question: {question}

Answer:""")
custom_prompt

ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="Use the following context to answer the question. \nIf you don't know the answer based on the context, say you don't know.\nProvide specific details from the context to support your answer.\n\nContext:\n{context}\n\nQuestion: {question}\n\nAnswer:"), additional_kwargs={})])

In [42]:
retriever

VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x751d91965970>, search_kwargs={'k': 2})

In [43]:
## Format the output documents for the prompt
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

In [45]:
rag_chain_lcel = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough()
    }
    | custom_prompt
    | llm
    | StrOutputParser()
)

rag_chain_lcel.invoke("What are the three types of machine learning?")

'The three types of machine learning are supervised learning, unsupervised learning, and reinforcement learning. This is supported by the context, which states: "There are three main types of machine learning: supervised learning, unsupervised learning, and reinforcement."'
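
To see why the `|` syntax works, here is a minimal sketch of pipe-style composition in plain Python. It is not LangChain's implementation: `Step` and `parallel` are hypothetical stand-ins for a runnable and for the dictionary step (which LCEL runs on the same input in parallel), and lambdas replace the retriever, prompt, and LLM.

```python
class Step:
    """Tiny stand-in for a runnable: wraps a function and supports `|` chaining."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # (a | b).invoke(x) is b.invoke(a.invoke(x))
        return Step(lambda value: other.invoke(self.invoke(value)))


def parallel(**steps):
    """Run every step on the same input and collect the results in a dict."""
    return Step(lambda value: {name: step.invoke(value) for name, step in steps.items()})


fake_retrieve_step = Step(lambda q: ["chunk about " + q])
fake_format_step = Step(lambda docs: "\n\n".join(docs))
passthrough_step = Step(lambda value: value)
fake_prompt_step = Step(lambda d: f"Context:\n{d['context']}\n\nQuestion: {d['question']}")

toy_lcel_chain = (
    parallel(context=fake_retrieve_step | fake_format_step, question=passthrough_step)
    | fake_prompt_step
)
print(toy_lcel_chain.invoke("machine learning"))
```

The question string flows into both branches of the dictionary step: one branch turns it into context, the other passes it through unchanged, which is the role `RunnablePassthrough()` plays in the chain above.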