### Building a RAG System with LangChain and ChromaDB

In [1]:
"""
Retrieval-Augmented Generation (RAG) is a powerful technique that combines the capabilities of LLMs with external knowledge retrieval.
We will build a complete RAG system using LangChain, ChromaDB, and Hugging Face embeddings.
"""

- Loading of the Document
- Document splitter
- Create Chunk
- Embeddings Model
- Vector Store/Database
- Semantic Search
- Context Augmentation : Combine retrieved chunks with the query
- Response Generation : LLM generates answer using context
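
The steps above can be sketched end to end without any external library. This is only a toy illustration: the bag-of-words `embed` function stands in for a real embedding model, and the names `embed`, `cosine`, `retrieve`, and `build_prompt` are made up for this sketch rather than taken from LangChain.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=1):
    """Semantic-search step: rank chunks by similarity to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query, context_chunks):
    """Context-augmentation step: combine the retrieved chunks with the query."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {query}"


toy_chunks = [
    "Supervised learning uses labeled data to train models.",
    "Convolutional neural networks are effective for image processing.",
]
toy_query = "Which networks are effective for image processing?"
top = retrieve(toy_query, toy_chunks, k=1)
toy_prompt = build_prompt(toy_query, top)
print(toy_prompt)  # this string is what would be sent to the LLM in the generation step
```

The rest of the notebook replaces each toy piece with a real component: a text splitter, a Hugging Face embedding model, a Chroma vector store, and an LLM.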

Benefits of RAG :
- Reduce Hallucinations
- Provides up-to-date information

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_core.documents import Document


# Vector stores
from langchain_community.vectorstores import Chroma


# Utility imports
import numpy as np
from typing import List 


In [None]:
sample_docs = [
    """
    Machine Learning Fundamentals
    
    Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experience without being explicitly programmed. There are three main 
    types of machine learning: supervised learning, unsupervised learning, and reinforcement 
    learning. Supervised learning uses labeled data to train models, while unsupervised 
    learning finds patterns in unlabeled data. Reinforcement learning learns through 
    interaction with an environment using rewards and penalties.
    """,
    
    """
    Deep Learning and Neural Networks
    
    Deep learning is a subset of machine learning based on artificial neural networks. 
    These networks are inspired by the human brain and consist of layers of interconnected 
    nodes. Deep learning has revolutionized fields like computer vision, natural language 
    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly 
    effective for image processing, while Recurrent Neural Networks (RNNs) and Transformers 
    excel at sequential data processing.
    """,
    
    """
    Natural Language Processing (NLP)
    
    NLP is a field of AI that focuses on the interaction between computers and human language. 
    Key tasks in NLP include text classification, named entity recognition, sentiment analysis, 
    machine translation, and question answering. Modern NLP heavily relies on transformer 
    architectures like BERT, GPT, and T5. These models use attention mechanisms to understand 
    context and relationships between words in text.
    """
]



In [21]:
# Save the sample documents to files
import tempfile
temp_dir = tempfile.mkdtemp()

for i, doc in enumerate(sample_docs):
    # Write with an explicit encoding so it matches the utf-8 encoding used by the loader
    with open(f"{temp_dir}/doc_{i}.txt", "w", encoding="utf-8") as f:
        f.write(doc)

print(f"Sample documents created in : {temp_dir}")

Sample documents created in : /var/folders/58/7rk_l2gn3lv73f_nlj98yjgc0000gn/T/tmpuuhmaghy


### Document Loading

In [28]:
from langchain_community.document_loaders import DirectoryLoader

loader = DirectoryLoader(
    temp_dir,
    glob ="*.txt",
    loader_cls = TextLoader,
    loader_kwargs = {'encoding': 'utf-8'}
)


documents = loader.load()

print(f"Loaded {len(documents)} documents")
print("\nDocument Preview (index 2)")
print(documents[2].page_content[:200] + "..")


# We created a list of texts, wrote each one to a text file,
# and then loaded the data back from those files as usual.
# The loader does not guarantee file order, so index 2 is not necessarily doc_2.txt.


Loaded 3 documents

Document Preview (index 2)

    Deep Learning and Neural Networks

    Deep learning is a subset of machine learning based on artificial neural networks. 
    These networks are inspired by the human brain and consist of layers..


### Chunking : Document Splitting

In [33]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 500,
    chunk_overlap = 50,
    length_function = len,
    separators = ["\n\n", "\n", ".", " ", ""]
)

chunks = text_splitter.split_documents(documents)


print(f"Created {len(chunks)} chunks from {len(documents)} documents")

print(f"Content {chunks[0].page_content[:150]} ...")
print(f"Metadata : {chunks[0].metadata}")


Created 7 chunks from 3 documents
Content Natural Language Processing (NLP)

    NLP is a field of AI that focuses on the interaction between computers and human language. 
    Key tasks in NL ...
Metadata : {'source': '/var/folders/58/7rk_l2gn3lv73f_nlj98yjgc0000gn/T/tmpuuhmaghy/doc_2.txt'}
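
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-size sliding window over characters. It is a simplification, not the algorithm `RecursiveCharacterTextSplitter` uses: the real splitter first tries to break on the listed separators, so its chunks are often shorter than `chunk_size` and the overlap is not always exact. The helper name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut `text` into windows of at most `chunk_size` characters.

    Each window starts `chunk_size - chunk_overlap` characters after the
    previous one, so neighbouring windows share `chunk_overlap` characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return pieces


demo_text = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = split_with_overlap(demo_text, chunk_size=10, chunk_overlap=3)
print(demo_chunks)
```

The overlap exists so that a sentence cut at a chunk boundary still appears, at least partly, in the neighbouring chunk.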


### Embeddings

In [41]:
from langchain_huggingface import HuggingFaceEmbeddings

HF_embeddings = HuggingFaceEmbeddings(
    model_name = "sentence-transformers/all-MiniLM-L6-v2"
)

In [39]:
chunks

[Document(metadata={'source': '/var/folders/58/7rk_l2gn3lv73f_nlj98yjgc0000gn/T/tmpuuhmaghy/doc_2.txt'}, page_content='Natural Language Processing (NLP)\n\n    NLP is a field of AI that focuses on the interaction between computers and human language. \n    Key tasks in NLP include text classification, named entity recognition, sentiment analysis, \n    machine translation, and question answering. Modern NLP heavily relies on transformer \n    architectures like BERT, GPT, and T5. These models use attention mechanisms to understand \n    context and relationships between words in text.'),
 Document(metadata={'source': '/var/folders/58/7rk_l2gn3lv73f_nlj98yjgc0000gn/T/tmpuuhmaghy/doc_0.txt'}, page_content='Machine Learning Fundamentals'),
 Document(metadata={'source': '/var/folders/58/7rk_l2gn3lv73f_nlj98yjgc0000gn/T/tmpuuhmaghy/doc_0.txt'}, page_content='Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being 

### Store Embeddings into VectorStores : ChromaDB

In [44]:
# Create a ChromaDB Vector Store
persist_directory = "./chroma_db"

# Initialize ChromaDB with HuggingFace Embeddings
vectorstore = Chroma.from_documents(
    documents = chunks, # a list of Document objects
    embedding = HF_embeddings, 
    persist_directory = persist_directory, # directory where the vector store is persisted on disk
    collection_name = "rag_collection" # name of the collection inside the store
)


print(f"Vector store created with {vectorstore._collection.count()} vectors")
print(f"Persisted to : {persist_directory}")


Vector store created with 14 vectors
Persisted to : ./chroma_db
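
The count of 14 is twice the 7 chunks created earlier. That is consistent with this cell having been run twice against the same `persist_directory`, which would also explain the duplicate hits in the search results later on. One possible way to make the cell safe to re-run is to give every chunk a deterministic ID, sketched below. The helper `chunk_id` is hypothetical, and the commented-out call assumes that `Chroma.from_documents` accepts an `ids` list and upserts by ID in your installed version.

```python
import hashlib


def chunk_id(source, text):
    """Stable ID derived from the chunk's source name and content (hypothetical helper)."""
    digest = hashlib.sha256(f"{source}\n{text}".encode("utf-8")).hexdigest()
    return digest[:16]


# Stand-ins for (file name, page_content) pairs of real chunks.
demo_pairs = [
    ("doc_0.txt", "Machine Learning Fundamentals"),
    ("doc_1.txt", "Deep Learning and Neural Networks"),
]
first_run_ids = [chunk_id(s, t) for s, t in demo_pairs]
second_run_ids = [chunk_id(s, t) for s, t in demo_pairs]
print(first_run_ids == second_run_ids)  # the same input always yields the same IDs

# With the real objects this could look like (not executed here, assumes `ids` is supported):
# import os
# ids = [chunk_id(os.path.basename(c.metadata["source"]), c.page_content) for c in chunks]
# vectorstore = Chroma.from_documents(
#     documents=chunks, embedding=HF_embeddings, ids=ids,
#     persist_directory=persist_directory, collection_name="rag_collection",
# )
```

Deleting the `./chroma_db` directory before re-running the cell is a blunter alternative.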


### Test Similarity Search

In [45]:
import numpy as np

def cosine_similarity(vector1, vector2):
    dot_product = np.dot(vector1, vector2)
    normal_v1 = np.linalg.norm(vector1)
    normal_v2 = np.linalg.norm(vector2)
    similarity  = dot_product/ (normal_v1* normal_v2)
    return similarity

def semantic_search(query, documents, embeddings_model, top_k=3):
    """Simple semantic search: rank `documents` (a list of strings) by cosine similarity to `query`."""

    query_embedding = embeddings_model.embed_query(query)
    doc_embeddings = embeddings_model.embed_documents(documents)

    # Compute a similarity score for every document
    similarities = []

    for i, doc_emb in enumerate(doc_embeddings):
        similarity = cosine_similarity(query_embedding, doc_emb)
        similarities.append((similarity, documents[i]))

    # Sort by score only (highest first), so ties never fall back to comparing the documents
    similarities.sort(key=lambda pair: pair[0], reverse=True)
    return similarities[:top_k]

In [46]:
query = "what is NLP?"
similar_docs = vectorstore.similarity_search(query, k=3)
similar_docs

# When we use a vector store, the conversion from text to embeddings is automatic

[Document(metadata={'source': '/var/folders/58/7rk_l2gn3lv73f_nlj98yjgc0000gn/T/tmpuuhmaghy/doc_2.txt'}, page_content='Natural Language Processing (NLP)\n\n    NLP is a field of AI that focuses on the interaction between computers and human language. \n    Key tasks in NLP include text classification, named entity recognition, sentiment analysis, \n    machine translation, and question answering. Modern NLP heavily relies on transformer \n    architectures like BERT, GPT, and T5. These models use attention mechanisms to understand \n    context and relationships between words in text.'),
 Document(metadata={'source': '/var/folders/58/7rk_l2gn3lv73f_nlj98yjgc0000gn/T/tmpuuhmaghy/doc_2.txt'}, page_content='Natural Language Processing (NLP)\n\n    NLP is a field of AI that focuses on the interaction between computers and human language. \n    Key tasks in NLP include text classification, named entity recognition, sentiment analysis, \n    machine translation, and question answering. Moder

### Understanding the Similarity Score

The similarity score represents how closely related a document chunk is to your query. How to read it depends on the distance metric used :

Chroma default : L2 (Euclidean) distance

- Lower score = more similar (closer in vector space)
- Score of 0 = identical vectors
- Typical range : 0 to 2 (but can be higher)

Cosine similarity (if configured)

- Higher score = more similar
- Range : -1 to 1 (1 means the vectors point in the same direction)
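
The two metrics can be compared on a few hand-picked unit vectors. This is a small sketch with made-up helper names (`l2_distance`, `cosine_sim`). It also checks the identity that holds for unit-length vectors: squared L2 distance equals 2 - 2 × cosine similarity. Chroma's documentation describes its `l2` space as the squared L2 norm, which would put unit-length vectors in the range 0 to 4; treat that as something to verify for your version.

```python
import math


def l2_distance(a, b):
    """Euclidean distance: lower means more similar, 0 means identical."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_sim(a, b):
    """Cosine similarity: higher means more similar, 1 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


same = (1.0, 0.0)
orthogonal = (0.0, 1.0)
opposite = (-1.0, 0.0)

for name, other in [("same", same), ("orthogonal", orthogonal), ("opposite", opposite)]:
    d = l2_distance(same, other)
    c = cosine_sim(same, other)
    # For unit vectors: squared L2 distance == 2 - 2 * cosine similarity
    print(f"{name:10s} L2={d:.3f}  L2^2={d * d:.3f}  cosine={c:+.3f}")
```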

### Initialize LLM, RAG Chain, Prompt Template, Query the RAG system

In [68]:
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv() 

True

In [None]:
# Initialize the LLM (ChatAnthropic)
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(
    model="claude-haiku-4-5-20251001",  
    temperature=0.7,
    api_key=os.getenv("claudeAPI") 
)

In [67]:
test_response = llm.invoke("What's LLM Models")
test_response

AIMessage(content='# LLM Models\n\n**LLM** stands for **Large Language Model**. Here\'s what you need to know:\n\n## What They Are\nLLMs are AI systems trained on massive amounts of text data to understand and generate human language. They predict the next word in a sequence based on patterns learned during training.\n\n## Key Characteristics\n- **Large scale**: Billions to trillions of parameters (internal variables)\n- **Neural networks**: Built on deep learning architecture (transformers)\n- **Pre-trained**: Trained on diverse internet text before being fine-tuned\n- **Generative**: Can create new text, not just analyze it\n\n## Common Examples\n- **ChatGPT** (OpenAI)\n- **Claude** (Anthropic)\n- **Gemini** (Google)\n- **Llama** (Meta)\n- **GPT-4** (OpenAI)\n\n## What They Can Do\n✓ Answer questions  \n✓ Write content  \n✓ Translate languages  \n✓ Code programming  \n✓ Summarize text  \n✓ Have conversations  \n\n## Limitations\n✗ Can hallucinate (make up false information)  \n✗ Have

### Modern RAG Chain

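- LangChain moved to LCEL (LangChain Expression Language) as the primary way to build chains
- LCEL uses the pipe (`|`) operator instead of the old chain classes
- From LangChain 1.0 onward, `langchain.chains` is no longer part of the main `langchain` package (the legacy chains moved to the separate `langchain-classic` package), so the `langchain.chains` imports in the next cell assume a pre-1.0 install or the equivalent `langchain_classic` import path
- For 1.0+, we will also build the same RAG chain with plain LCEL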
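
To show what the pipe operator does conceptually, here is a toy sketch of `|` composition in plain Python. The `Step` class is invented for this illustration and is not LangChain's `Runnable`; it only mimics the idea that `a | b` produces a new object whose `invoke` feeds the output of `a` into `b`.

```python
class Step:
    """Toy stand-in for a runnable: wraps a function and supports `|` composition."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `a | b` builds a new Step that feeds a's output into b
        return Step(lambda value: other.invoke(self.invoke(value)))


to_prompt = Step(lambda q: f"Question: {q}")
fake_llm = Step(lambda prompt: prompt.upper())  # placeholder for a model call
parse = Step(lambda text: text.strip())

toy_chain = to_prompt | fake_llm | parse
toy_answer = toy_chain.invoke("what is lcel?")
print(toy_answer)
```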

In [89]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate


In [90]:
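# Convert vector store to retriever

retriever = vectorstore.as_retriever(
    # The keyword is `search_kwargs` (plural). The recorded outputs in this notebook come from a run
    # with the misspelling `search_kwarg`, which was ignored: they show `search_kwargs={}` and
    # four sources per query instead of three.
    search_kwargs = {"k": 3}
)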

retriever

VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x177587a70>, search_kwargs={})

In [91]:
# Prompt Template


system_prompt = """  
You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the questions.
If you don't know the answer, just say that you don't know.
Use three sentences maximum and keep the answer concise.

Context : {context}
"""

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}")

    ]
)


In [92]:
prompt

ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="  \nYou are an assistant for question-answering tasks.\nUse the following pieces of retrieved context to answer the questions.\nIf you don't know the answer, just say that you don't know.\nUse three sentences maximum and keep the answer concise.\n\nContext : {context}\n"), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], input_types={}, partial_variables={}, template='{input}'), additional_kwargs={})])

### Creating a Document Chain

- `create_stuff_documents_chain` combines all the retrieved chunks and passes them to the LLM as context
- This combining step is what LangChain calls a document chain

In [93]:
document_chain = create_stuff_documents_chain(llm, prompt)
document_chain

# The repr below is a RunnableBinding wrapping a sequence whose steps run in order:
# format the documents, fill the prompt, then call the LLM

RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableLambda(format_docs)
}), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
| ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="  \nYou are an assistant for question-answering tasks.\nUse the following pieces of retrieved context to answer the questions.\nIf you don't know the answer, just say that you don't know.\nUse three sentences maximum and keep the answer concise.\n\nContext : {context}\n"), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], input_types={}, partial_variables={}, template='{input}'), additional_kwargs={})])
| ChatAnthropic(profile={'max_input_tokens': 200000, 'max_output_tokens': 64000, 'image_inputs': True, 'audio_inputs': False, 'vi

This chain :

- Takes the retrieved documents
- "Stuffs" them into the prompt's {context} placeholder
- Sends the complete prompt to the LLM
- Returns the LLM's response
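
The "stuffing" step itself is just string assembly. Below is a minimal sketch under stated assumptions: `ToyDoc`, `TEMPLATE`, and `stuff_documents` are hypothetical names for this illustration, and the real chain's formatting of each document may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class ToyDoc:
    """Minimal stand-in for a LangChain Document (only page_content is used)."""
    page_content: str


TEMPLATE = "Answer using only this context.\n\nContext : {context}\n\nQuestion : {input}"


def stuff_documents(docs, question, separator="\n\n"):
    """Join every retrieved chunk and substitute the result into the {context} slot."""
    context = separator.join(d.page_content for d in docs)
    return TEMPLATE.format(context=context, input=question)


toy_docs = [
    ToyDoc("CNNs are effective for image processing."),
    ToyDoc("RNNs and Transformers handle sequential data."),
]
stuffed_prompt = stuff_documents(toy_docs, "What are CNNs best used for?")
print(stuffed_prompt)
```

Because every chunk is pasted into one prompt, this approach only works while the combined chunks fit in the model's context window.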

In [94]:
# Final RAG chain
from langchain.chains import create_retrieval_chain
rag_chain = create_retrieval_chain(retriever, document_chain)
rag_chain

# retriever will fetch the relevant information
# document_chain will process the document with the LLM 

RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableBinding(bound=RunnableLambda(lambda x: x['input'])
           | VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x177587a70>, search_kwargs={}), kwargs={}, config={'run_name': 'retrieve_documents'}, config_factories=[])
})
| RunnableAssign(mapper={
    answer: RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
              context: RunnableLambda(format_docs)
            }), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
            | ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="  \nYou are an assistant for question-answering tasks.\nUse the following pieces of retrieved context to answer the questions.\nIf you don

In [95]:
rag_chain.invoke({"input" : "what is deep learning?"})

{'input': 'what is deep learning?',
 'context': [Document(metadata={'source': '/var/folders/58/7rk_l2gn3lv73f_nlj98yjgc0000gn/T/tmpuuhmaghy/doc_1.txt'}, page_content='Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natural language \n    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly \n    effective for image processing, while Recurrent Neural Networks (RNNs) and Transformers'),
  Document(metadata={'source': '/var/folders/58/7rk_l2gn3lv73f_nlj98yjgc0000gn/T/tmpuuhmaghy/doc_1.txt'}, page_content='Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, n

In [None]:
# Function to query the modern RAG system

def query_rag_modern(question) :
    print(f"Question : {question}")
    print("-"*50)


    # using create_retrieval_chain approach
    
    result = rag_chain.invoke({"input" : question})

    print(f"Answer : {result['answer']}")
    print("\n Retrieved Context :")

    for i, doc in enumerate(result['context']):
        print(f"\n -- Source {i+1} ---")
        print(doc.page_content[:200] + "...")
    
    return result

# Test Questions
test_questions = [
    "What are the three types of machine learning",
    "What is deep learning and how does it related to neural networks ?",
    "What are CNNs best used for?"
]

for question in test_questions :
    result = query_rag_modern(question)
    print("\n" + "="*80 + "\n")


Question : What are the three types of machine learning
--------------------------------------------------
Answer : The three types of machine learning are:

1. **Supervised learning** - uses labeled data to train models
2. **Unsupervised learning** - finds patterns in unlabeled data
3. **Reinforcement learning** - learns through interaction and feedback

 Retrieved Context :

 -- Source 1 ---
Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experience without being explicitly programmed. There are three main 
    types of machine l...

 -- Source 2 ---
Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experience without being explicitly programmed. There are three main 
    types of machine l...

 -- Source 3 ---
Machine Learning Fundamentals...

 -- Source 4 ---
Machine Learning Fundamentals...


Question : What is deep learning and how does it related to neural networ

### Building RAG using LCEL (LangChain Expression Language)


In [99]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableParallel

In [101]:
custom_prompt = ChatPromptTemplate.from_template(
    """ 
    Use the following context to answer the question.
    If you don't know the answer based on the context, say you don't know.
    Provide specific detail from the context to support your answer
    Context : {context}

    Question : {question}
    Answer :
    """
)
custom_prompt


ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template=" \n    Use the following context to answer the question.\n    If you don't know the answer based on the context, say you don't know.\n    Provide specific detail from the context to support your answer\n    Context : {context}\n\n    Question : {question}\n    Answer :\n    "), additional_kwargs={})])

### Create RAG Chain Alternative - Using LCEL (LangChain Expression Language)
- Creating your own chain

In [106]:
# Prompt Template

custom_prompt = ChatPromptTemplate.from_template(
    """
    Use the following context to answer the questions. If you don't know the answer based on the context, say you don't know.
    Provide specific details from the context to support your answer

    Context : {context}


    Question : {question}

    Answer :"""
)
custom_prompt


ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="\n    Use the following context to answer the questions. If you don't know the answer based on the context, say you don't know.\n    Provide specific details from the context to support your answer\n\n    Context : {context}\n\n\n    Question : {question}\n\n    Answer :"), additional_kwargs={})])

In [107]:
# Format the output document for the prompt
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

In [112]:
# Building the RAG Chain Using LCEL
rag_chain_lcel = ({"context" : retriever | format_docs,
                 "question" : RunnablePassthrough() }
                | custom_prompt
                | llm
                | StrOutputParser()
)

rag_chain_lcel

{
  context: VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x177587a70>, search_kwargs={})
           | RunnableLambda(format_docs),
  question: RunnablePassthrough()
}
| ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="\n    Use the following context to answer the questions. If you don't know the answer based on the context, say you don't know.\n    Provide specific details from the context to support your answer\n\n    Context : {context}\n\n\n    Question : {question}\n\n    Answer :"), additional_kwargs={})])
| ChatAnthropic(profile={'max_input_tokens': 200000, 'max_output_tokens': 64000, 'image_inputs': True, 'audio_inputs': False, 'video_inputs': False, 'image_outputs': False, 'audio_outputs': False, 
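
The dictionary at the head of this chain sends the same input to every branch and collects the results under the branch keys, which is how the prompt receives both `context` and `question` from a single string. A toy sketch of that fan-out in plain Python follows; `fake_retriever`, `join_chunks`, and `run_parallel` are invented for this illustration and return canned data.

```python
def fake_retriever(question):
    """Placeholder for `retriever`: returns canned chunks regardless of the question."""
    return [
        "Deep learning is based on artificial neural networks.",
        "These networks consist of layers of interconnected nodes.",
    ]


def join_chunks(chunks):
    """Same idea as format_docs, but for plain strings."""
    return "\n\n".join(chunks)


def run_parallel(branches, value):
    """Call every branch with the same input and collect the results under the branch's key."""
    return {key: branch(value) for key, branch in branches.items()}


branches = {
    "context": lambda q: join_chunks(fake_retriever(q)),  # mirrors retriever | format_docs
    "question": lambda q: q,                               # mirrors RunnablePassthrough()
}
prompt_inputs = run_parallel(branches, "what's deep learning")
print(prompt_inputs)
```

The resulting dictionary has exactly the keys the prompt template expects, which is why the chain can be invoked with a bare string.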

In [113]:
rag_chain_lcel.invoke("what's deep learning")

'# What is Deep Learning?\n\nBased on the context provided:\n\n**Deep learning is a subset of machine learning based on artificial neural networks.** These networks are inspired by the human brain and consist of layers of interconnected nodes.\n\n## Key Characteristics:\n\n- **Foundation**: Built on artificial neural networks inspired by the human brain\n- **Structure**: Organized in layers of interconnected nodes\n- **Impact**: Has revolutionized several fields including:\n  - Computer vision\n  - Natural language processing\n  - Speech recognition\n\n## Common Applications:\n\n- **Convolutional Neural Networks (CNNs)**: Particularly effective for image processing\n- **Recurrent Neural Networks (RNNs)** and **Transformers**: Used for sequential data and language tasks\n\nIn essence, deep learning is a powerful approach within machine learning that uses layered neural networks to learn complex patterns from data.'

In [None]:
# Inspect the raw retrieval result (retriever.invoke replaces the deprecated get_relevant_documents)
retriever.invoke("what's deep learning")


[Document(metadata={'source': '/var/folders/58/7rk_l2gn3lv73f_nlj98yjgc0000gn/T/tmpuuhmaghy/doc_1.txt'}, page_content='Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natural language \n    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly \n    effective for image processing, while Recurrent Neural Networks (RNNs) and Transformers'), Document(metadata={'source': '/var/folders/58/7rk_l2gn3lv73f_nlj98yjgc0000gn/T/tmpuuhmaghy/doc_1.txt'}, page_content='Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natural language \n    processing, and speech recog

In [None]:
# Query using the LCEL approach

def query_rag_lcel(question):
    print(f"Question : {question}")
    print("-"*50)


    # With RunnablePassthrough, the chain takes the question as a plain string (not a dict)
    answer = rag_chain_lcel.invoke(question)
    print(f"Answer : {answer}")


    # Get the source documents separately if needed
    docs = retriever.invoke(question)
    print("Source Documents : ")

    for i, doc in enumerate(docs):
        print(f"\n----Source{i+1} ---")
        print(doc.page_content[:200] + "...")


In [121]:
query_rag_lcel("what are the key concepts of reinforcement learning")

Question : what are the key concepts of reinforcement learning
--------------------------------------------------
Answer : Based on the provided context, I don't have enough information to fully answer your question about the key concepts of reinforcement learning.

The context mentions that reinforcement learning is one of the three main types of machine learning and that it "learns through," but the explanation is incomplete - it cuts off mid-sentence and doesn't provide the details about how reinforcement learning works or what its key concepts are.

To properly answer your question, I would need a more complete description of reinforcement learning in the context provided.
Source Documents : 

----Source1 ---
Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experience without being explicitly programmed. There are three main 
    types of machine l...

----Source2 ---
Machine learning is a subset of artificial intelligence 