## Build a RAG System with LangChain and ChromaDB

In [2]:
import os
from dotenv import load_dotenv
load_dotenv()

True

In [3]:
# Langchain Imports

from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_core.documents import Document

# vector stores
from langchain_community.vectorstores import Chroma

# Utility imports
import numpy as np
from typing import List


In [4]:
sample_docs = [
    """
    Machine Learning Fundamentals

    Machine learning is a subset of artificial intelligence that focuses on building systems that can learn from and make decisions based on data. It involves various algorithms and statistical models that enable computers to perform specific tasks without explicit instructions. The key types of machine learning include supervised learning, unsupervised learning, and reinforcement learning. Supervised learning uses labeled data to train models, while unsupervised learning finds patterns in unlabeled data. Reinforcement learning involves training agents to make sequences of decisions by rewarding desired behaviors. Applications of machine learning span across various industries, including healthcare, finance, marketing, and more. Common algorithms used in machine learning include decision trees, support vector machines, neural networks, and clustering techniques. As the field continues to evolve, advancements in deep learning and neural networks are driving significant progress in areas such as computer vision, natural language processing, and autonomous systems.
    """,

    """
    Natural Language Processing (NLP)

    Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on the interaction between computers and human language. It involves the development of algorithms and models that enable machines to understand, interpret, and generate human language in a way that is meaningful. NLP encompasses various tasks such as text classification, sentiment analysis, named entity recognition, machine translation, and speech recognition. Techniques used in NLP include tokenization, part-of-speech tagging, syntactic parsing, and semantic analysis. Recent advancements in NLP have been driven by deep learning models such as transformers, which have significantly improved the performance of language models like BERT and GPT. Applications of NLP are widespread, including chatbots, virtual assistants, automated customer support, and content recommendation systems. As NLP continues to advance, it plays a crucial role in enhancing human-computer interaction and enabling more natural communication with technology.
    """,
    """
    Deep Learning and Neural Networks

    Deep learning is a subset of machine learning that focuses on using neural networks with multiple layers to model complex patterns in data. Neural networks are inspired by the structure and function of the human brain, consisting of interconnected nodes (neurons) that process information. Deep learning has revolutionized various fields, including computer vision, natural language processing, and speech recognition, by enabling the development of models that can automatically learn hierarchical representations of data. Key architectures in deep learning include convolutional neural networks (CNNs) for image processing, recurrent neural networks (RNNs) for sequential data, and transformers for language modeling. Training deep learning models typically requires large datasets and significant computational resources, often utilizing GPUs for efficient processing. The success of deep learning has led to breakthroughs in applications such as autonomous vehicles, medical image analysis, and real-time language translation. As research continues, deep learning is expected to drive further innovations in artificial intelligence and its applications across various domains.
    """
]

sample_docs

['\n    Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that focuses on building systems that can learn from and make decisions based on data. It involves various algorithms and statistical models that enable computers to perform specific tasks without explicit instructions. The key types of machine learning include supervised learning, unsupervised learning, and reinforcement learning. Supervised learning uses labeled data to train models, while unsupervised learning finds patterns in unlabeled data. Reinforcement learning involves training agents to make sequences of decisions by rewarding desired behaviors. Applications of machine learning span across various industries, including healthcare, finance, marketing, and more. Common algorithms used in machine learning include decision trees, support vector machines, neural networks, and clustering techniques. As the field continues to evolve, advancements in deep learning and neural networks 

In [7]:
### save sample documents to files
import tempfile
# temp_dir= tempfile.mkdtemp()

for i, doc in enumerate(sample_docs):
    # with open(os.path.join(temp_dir, f"doc_{i}.txt"), "w") as f:
    #     f.write(doc)
    with open(f"doc_{i}.txt", "w") as f:
        f.write(doc.strip())

print(f"Sample documents saved to root")

Sample documents saved to root


## Document Loading

In [8]:
from langchain_community.document_loaders import DirectoryLoader

loader = DirectoryLoader('.', glob='doc_*.txt', loader_cls=TextLoader, loader_kwargs={'encoding': 'utf-8'})

documents = loader.load()

print(f"Loaded {len(documents)} documents.")
print("First document content:")
print(documents[0].page_content[:200])  # print first 200 characters of the first document

Loaded 3 documents.
First document content:
Machine Learning Fundamentals

    Machine learning is a subset of artificial intelligence that focuses on building systems that can learn from and make decisions based on data. It involves various al


In [9]:
documents

[Document(metadata={'source': 'doc_0.txt'}, page_content='Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that focuses on building systems that can learn from and make decisions based on data. It involves various algorithms and statistical models that enable computers to perform specific tasks without explicit instructions. The key types of machine learning include supervised learning, unsupervised learning, and reinforcement learning. Supervised learning uses labeled data to train models, while unsupervised learning finds patterns in unlabeled data. Reinforcement learning involves training agents to make sequences of decisions by rewarding desired behaviors. Applications of machine learning span across various industries, including healthcare, finance, marketing, and more. Common algorithms used in machine learning include decision trees, support vector machines, neural networks, and clustering techniques. As the field continues to evolve, 

## Document Splitting

In [11]:
# Initialize Text Splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    length_function=len,
    separators=[" "]  # Split on spaces only; omit this argument to use the default hierarchy ["\n\n", "\n", " ", ""]
)

chunks = text_splitter.split_documents(documents)

chunks

[Document(metadata={'source': 'doc_0.txt'}, page_content='Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that focuses on building systems that can learn from and make decisions based on data. It involves various algorithms and statistical models that enable computers to perform specific tasks without explicit instructions. The key types of machine learning include supervised learning, unsupervised learning, and reinforcement learning. Supervised learning uses labeled data to train models, while unsupervised'),
 Document(metadata={'source': 'doc_0.txt'}, page_content='labeled data to train models, while unsupervised learning finds patterns in unlabeled data. Reinforcement learning involves training agents to make sequences of decisions by rewarding desired behaviors. Applications of machine learning span across various industries, including healthcare, finance, marketing, and more. Common algorithms used in machine learning include decision 
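
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window sketch in plain Python. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that one splits on separators and then merges the pieces); the function name `sliding_chunks` and the demo text are hypothetical and only illustrate how consecutive chunks share some text.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size character windows that overlap (illustration only)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


demo_text = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = sliding_chunks(demo_text, chunk_size=10, chunk_overlap=3)
for chunk in demo_chunks:
    print(chunk)
```

Each chunk starts with the last three characters of the previous one, which is the same idea as the repeated phrase "labeled data to train models, while unsupervised" at the boundary of the first two chunks above.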

## Initialize ChromaDB Vector store and store the chunks

In [12]:
## Create a ChromaDB Vector Store

persist_directory = './chroma_db'

## Initialize ChromaDB with HuggingFace Embeddings
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2"),
    persist_directory=persist_directory,
    collection_name="rag_collection"
)

print(f"Vector store created with {vectorstore._collection.count()} vectors.")
print(f"Vectors persisted in directory: {persist_directory}")

Vector store created with 9 vectors.
Vectors persisted in directory: ./chroma_db


### Test Similarity search

In [14]:
query = "What are types of machine learning?"

similar_docs = vectorstore.similarity_search(query, k=3)

similar_docs

[Document(metadata={'source': 'doc_0.txt'}, page_content='Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that focuses on building systems that can learn from and make decisions based on data. It involves various algorithms and statistical models that enable computers to perform specific tasks without explicit instructions. The key types of machine learning include supervised learning, unsupervised learning, and reinforcement learning. Supervised learning uses labeled data to train models, while unsupervised'),
 Document(metadata={'source': 'doc_0.txt'}, page_content='labeled data to train models, while unsupervised learning finds patterns in unlabeled data. Reinforcement learning involves training agents to make sequences of decisions by rewarding desired behaviors. Applications of machine learning span across various industries, including healthcare, finance, marketing, and more. Common algorithms used in machine learning include decision 

In [15]:
query = "What is NLP?"

similar_docs = vectorstore.similarity_search(query, k=3)

similar_docs

[Document(metadata={'source': 'doc_1.txt'}, page_content='Natural Language Processing (NLP)\n\n    Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on the interaction between computers and human language. It involves the development of algorithms and models that enable machines to understand, interpret, and generate human language in a way that is meaningful. NLP encompasses various tasks such as text classification, sentiment analysis, named entity recognition, machine translation, and speech recognition. Techniques'),
 Document(metadata={'source': 'doc_1.txt'}, page_content='systems. As NLP continues to advance, it plays a crucial role in enhancing human-computer interaction and enabling more natural communication with technology.'),
 Document(metadata={'source': 'doc_1.txt'}, page_content='translation, and speech recognition. Techniques used in NLP include tokenization, part-of-speech tagging, syntactic parsing, and semantic analysis. Recent adva

In [16]:
query = "What is deep learning?"

similar_docs = vectorstore.similarity_search(query, k=3)

similar_docs

[Document(metadata={'source': 'doc_2.txt'}, page_content='Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning that focuses on using neural networks with multiple layers to model complex patterns in data. Neural networks are inspired by the structure and function of the human brain, consisting of interconnected nodes (neurons) that process information. Deep learning has revolutionized various fields, including computer vision, natural language processing, and speech recognition, by enabling the development of models that'),
 Document(metadata={'source': 'doc_2.txt'}, page_content='by enabling the development of models that can automatically learn hierarchical representations of data. Key architectures in deep learning include convolutional neural networks (CNNs) for image processing, recurrent neural networks (RNNs) for sequential data, and transformers for language modeling. Training deep learning models typically requires large datasets and significa

### Advanced Similarity Search with Scores

In [17]:
result_scores= vectorstore.similarity_search_with_score(query, k=3)
result_scores

[(Document(metadata={'source': 'doc_2.txt'}, page_content='Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning that focuses on using neural networks with multiple layers to model complex patterns in data. Neural networks are inspired by the structure and function of the human brain, consisting of interconnected nodes (neurons) that process information. Deep learning has revolutionized various fields, including computer vision, natural language processing, and speech recognition, by enabling the development of models that'),
  0.5512487888336182),
 (Document(metadata={'source': 'doc_2.txt'}, page_content='by enabling the development of models that can automatically learn hierarchical representations of data. Key architectures in deep learning include convolutional neural networks (CNNs) for image processing, recurrent neural networks (RNNs) for sequential data, and transformers for language modeling. Training deep learning models typically requires lar

### Understanding Similarity Scores

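The similarity score indicates how closely a document chunk matches your query. How to read it depends on the distance metric the collection uses.

ChromaDB uses L2 distance by default (its `l2` space reports squared Euclidean distance):
- Lower scores = MORE similar (closer in vector space)
- Score of 0 = identical vectors
- Typical range: 0 to 2 for related text, but it can be higher (up to 4 for unit-length embeddings)

Cosine similarity:

- Higher scores = MORE similar
- Range: -1 to 1 (1 meaning the vectors point in the same direction)

If the collection is configured for cosine, `similarity_search_with_score` still returns a distance (1 - cosine similarity), so lower scores remain more similar.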
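
The difference between the metrics can be checked by hand. The sketch below uses made-up two-dimensional unit vectors (not real embeddings) and prints plain Euclidean distance, squared Euclidean distance, and cosine similarity for each pair; the helper names are hypothetical.

```python
import math


def l2_distance(a, b):
    """Plain Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def squared_l2(a, b):
    """Squared Euclidean distance (no square root)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy unit vectors, chosen only so the arithmetic is easy to follow
query_vec = [1.0, 0.0]
candidates = {
    "identical": [1.0, 0.0],
    "orthogonal": [0.0, 1.0],
    "opposite": [-1.0, 0.0],
}

for name, vec in candidates.items():
    print(name, l2_distance(query_vec, vec), squared_l2(query_vec, vec),
          cosine_similarity(query_vec, vec))
```

For unit-length vectors the two views carry the same ranking information, because squared L2 distance equals 2 - 2 × cosine similarity; only the direction of "better" is flipped.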

## Initialize LLM, RAG Chain, Prompt Template, Query the RAG system

In [18]:
from langchain_ollama import ChatOllama

llm = ChatOllama(model="mistral:latest")

In [20]:
test_response =  llm.invoke("What is Large Language Models?")

test_response

AIMessage(content=" Large Language Models (LLMs) are a type of artificial intelligence model designed to process and generate human-like text. They are trained on vast amounts of internet text, learning patterns in the language that allow them to understand context, answer questions, write essays, translate languages, summarize information, and even carry on conversations.\n\nThe size of these models refers to the number of parameters they have – essentially the number of adjustable components in the model's architecture. Larger models generally perform better but require more computational resources and time to train. Examples of large language models include Google's BERT, Microsoft's Turing NLG, and OpenAI's GPT-3. They are increasingly being used in various applications, such as chatbots, virtual assistants, content generation, and language translation services.", additional_kwargs={}, response_metadata={'model': 'mistral:latest', 'created_at': '2025-10-26T10:25:18.583939Z', 'don

In [None]:
# Alternative way to initialize chat models

from langchain.chat_models.base import init_chat_model

chat_model = init_chat_model("ollama:mistral:latest")
#llm = init_chat_model("groq:")

chat_model

ChatOllama(model='mistral:latest')

In [23]:
chat_model.invoke("Explain the concept of reinforcement learning.")

AIMessage(content=" Reinforcement Learning (RL) is a type of machine learning that involves training an agent to make decisions by interacting with its environment. The goal is for the agent to learn a policy, which is a strategy or rule set that maps states of the environment to actions that maximize some notion of cumulative reward.\n\nHere's a simplified breakdown of the concept:\n\n1. **Agent**: In reinforcement learning, an agent interacts with its environment by performing actions and receiving rewards or punishments. The agent's goal is to learn how to take actions that maximize some notion of cumulative reward (or minimize negative reward).\n\n2. **State**: Each state represents a situation in the environment that the agent finds itself in. For example, consider an autonomous car driving on a highway. Each state could represent a specific configuration of the car's sensors data (e.g., speed, distance to other cars, traffic signs, etc.).\n\n3. **Action**: The agent chooses an ac

## Modern RAG Chain

In [27]:
from langchain_classic.chains.retrieval import create_retrieval_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_classic.chains.combine_documents.stuff import create_stuff_documents_chain

In [28]:
## Convert vectorstore to a retriever

retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 3}
)

retriever

VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x0000018060E3B090>, search_kwargs={'k': 3})

In [29]:
##Create Prompt Template
system_prompt = """You are an assistant for question-answering tasks.
Use the context provided to answer the question as accurately as possible.
If the context does not contain the answer, respond with 'I don't know'.
Use three sentences maximum to answer the question and keep it concise.

Context: {context}"""

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "{input}")
])

In [30]:
prompt

ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks.\nUse the context provided to answer the question as accurately as possible.\nIf the context does not contain the answer, respond with 'I don't know'.\nUse three sentences maximum to answer the question and keep it concise.\n\nContext: {context}"), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], input_types={}, partial_variables={}, template='{input}'), additional_kwargs={})])

In [32]:
## Create Documents Chain
documents_chain = create_stuff_documents_chain(
    llm=chat_model,
    prompt=prompt
)

documents_chain

RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableLambda(format_docs)
}), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
| ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks.\nUse the context provided to answer the question as accurately as possible.\nIf the context does not contain the answer, respond with 'I don't know'.\nUse three sentences maximum to answer the question and keep it concise.\n\nContext: {context}"), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], input_types={}, partial_variables={}, template='{input}'), additional_kwargs={})])
| ChatOllama(model='mistral:latest')
| StrOutputParser(), kwargs={}, config={'run_name': 'stuff_document

This chain:

- Takes retrieved documents
- "Stuffs" them into the prompt's {context} placeholder.
- Sends the complete prompt to the LLM.
- Returns the LLM's response.
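
The "stuffing" step can be sketched without LangChain. Everything below is a hypothetical stand-in: `ToyDocument` models only the `page_content` field, the template is shortened, and joining chunks with a blank line is an assumption about the formatting, not a statement of what `create_stuff_documents_chain` does internally.

```python
from dataclasses import dataclass


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a LangChain Document (only page_content is modeled)."""
    page_content: str


SYSTEM_TEMPLATE = "Answer using only the context below.\n\nContext: {context}"


def stuff_documents(docs, question):
    """Join all chunks and place them into the {context} placeholder."""
    context = "\n\n".join(doc.page_content for doc in docs)
    return [
        ("system", SYSTEM_TEMPLATE.format(context=context)),
        ("human", question),
    ]


toy_docs = [
    ToyDocument("Supervised learning uses labeled data."),
    ToyDocument("Unsupervised learning finds patterns in unlabeled data."),
]
messages = stuff_documents(toy_docs, "What is supervised learning?")
for role, text in messages:
    print(f"[{role}] {text}")
```

Because every retrieved chunk is pasted into one prompt, this approach only works while the combined chunks fit in the model's context window.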

In [37]:
### Create the Final RAG Chain
rag_chain = create_retrieval_chain(retriever,documents_chain)
rag_chain

RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableBinding(bound=RunnableLambda(lambda x: x['input'])
           | VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x0000018060E3B090>, search_kwargs={'k': 3}), kwargs={}, config={'run_name': 'retrieve_documents'}, config_factories=[])
})
| RunnableAssign(mapper={
    answer: RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
              context: RunnableLambda(format_docs)
            }), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
            | ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks.\nUse the context provided to answer the question as accurately as possible

`create_retrieval_chain` adds a retrieval layer in front of the stuff-documents chain: the retriever fetches the relevant documents from the vector store and passes them to the stuff chain, which sends them to the LLM.
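
The data flow can be mimicked with plain functions. This is a toy sketch, not LangChain's implementation: `toy_retriever` ranks by shared words instead of embeddings, `toy_answer_step` stands in for the stuff-documents chain plus LLM, and all names and the corpus are hypothetical. The point is the shape of the result, a dict with `input`, `context`, and `answer` keys.

```python
def toy_retriever(question, corpus, k=2):
    """Rank chunks by how many lowercase words they share with the question."""
    question_words = set(question.lower().split())
    ranked = sorted(
        corpus,
        key=lambda chunk: len(question_words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def toy_answer_step(inputs):
    """Stand-in for the stuff-documents chain plus LLM: reports what it received."""
    return f"(answer to {inputs['input']!r} based on {len(inputs['context'])} chunks)"


def toy_retrieval_chain(inputs, corpus):
    """First add 'context' to the inputs, then add 'answer' computed from both."""
    enriched = {**inputs, "context": toy_retriever(inputs["input"], corpus)}
    return {**enriched, "answer": toy_answer_step(enriched)}


corpus = [
    "supervised learning uses labeled data",
    "transformers power modern language models",
    "unsupervised learning finds patterns in unlabeled data",
]
result = toy_retrieval_chain({"input": "what is supervised learning"}, corpus)
print(result["answer"])
print(result["context"])
```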

In [44]:
response = rag_chain.invoke({"input": "Explain the different types of machine learning."})

response

{'input': 'Explain the different types of machine learning.',
 'context': [Document(metadata={'source': 'doc_0.txt'}, page_content='Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that focuses on building systems that can learn from and make decisions based on data. It involves various algorithms and statistical models that enable computers to perform specific tasks without explicit instructions. The key types of machine learning include supervised learning, unsupervised learning, and reinforcement learning. Supervised learning uses labeled data to train models, while unsupervised'),
  Document(metadata={'source': 'doc_0.txt'}, page_content='labeled data to train models, while unsupervised learning finds patterns in unlabeled data. Reinforcement learning involves training agents to make sequences of decisions by rewarding desired behaviors. Applications of machine learning span across various industries, including healthcare, finance, market

In [45]:
response["answer"]

' The three main types of Machine Learning are Supervised Learning, Unsupervised Learning, and Reinforcement Learning. In Supervised Learning, labeled data is used to train models, while in Unsupervised Learning, patterns are found in unlabeled data. Reinforcement Learning involves training agents to make sequences of decisions by rewarding desired behaviors. Deep Learning, a subset of Machine Learning, uses neural networks with multiple layers to model complex patterns in data and has been instrumental in fields like computer vision, natural language processing, and speech recognition.'

In [46]:
# Function to query the modern RAG system

def query_rag_modern(question: str) -> str:
    print(f"Question: {question}")
    print("-"*50)

    # Using create_retrieval_chain approach
    result = rag_chain.invoke({"input": question})

    print(f"Answer: {result['answer']}")
    print("\nRetrieved Context:")
    for i, doc in enumerate(result['context']):
        print(f"\n ----- Source {i+1} -----")
        print(doc.page_content[:200]+"...")

    return result['answer']

test_questions = [
    "What are three types of machine learning?",
    "What is Deep Learning and how does it relate to Neural Networks?",
    "What are CNNs used for?"
]

for question in test_questions:
    query_rag_modern(question)
    print("\n"+"="*80+"\n")

Question: What are three types of machine learning?
--------------------------------------------------
Answer:  The three main types of machine learning are:
1. Supervised Learning: It utilizes labeled data to train models for specific tasks.
2. Unsupervised Learning: It finds patterns in unlabeled data without explicit instructions.
3. Reinforcement Learning: It trains agents to make sequences of decisions by rewarding desired behaviors.

Retrieved Context:

 ----- Source 1 -----
Machine Learning Fundamentals

    Machine learning is a subset of artificial intelligence that focuses on building systems that can learn from and make decisions based on data. It involves various al...

 ----- Source 2 -----
labeled data to train models, while unsupervised learning finds patterns in unlabeled data. Reinforcement learning involves training agents to make sequences of decisions by rewarding desired behavior...

 ----- Source 3 -----
techniques. As the field continues to evolve, advancements i

## Creating a RAG Chain Alternative Using LCEL

In [52]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

In [47]:
#Create a custom Prompt
custom_prompt = ChatPromptTemplate.from_template(
    """Use the context provided to answer the question as accurately as possible.
If the context does not contain the answer, respond with 'I don't know'.
Use three sentences maximum to answer the question and keep it concise.

Context: {context}

Question: {question}

Answer:"""
)

custom_prompt


ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="Use the context provided to answer the question as accurately as possible.\nIf the context does not contain the answer, respond with 'I don't know'.\nUse three sentences maximum to answer the question and keep it concise.\n\nContext: {context}\n\nQuestion: {question}\n\nAnswer:"), additional_kwargs={})])

In [48]:
retriever

VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x0000018060E3B090>, search_kwargs={'k': 3})

In [50]:
## Format the output documents for the prompt
def format_docs(docs: List[Document]) -> str:
    return "\n\n".join([doc.page_content for doc in docs])

In [54]:
## Build the RAG Chain using LCEL

rag_chain_lcel = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough()
    }
    | custom_prompt
    | chat_model
    | StrOutputParser()
)

rag_chain_lcel

{
  context: VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x0000018060E3B090>, search_kwargs={'k': 3})
           | RunnableLambda(format_docs),
  question: RunnablePassthrough()
}
| ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="Use the context provided to answer the question as accurately as possible.\nIf the context does not contain the answer, respond with 'I don't know'.\nUse three sentences maximum to answer the question and keep it concise.\n\nContext: {context}\n\nQuestion: {question}\n\nAnswer:"), additional_kwargs={})])
| ChatOllama(model='mistral:latest')
| StrOutputParser()

In [65]:
response = rag_chain_lcel.invoke("Explain the different types of machine learning.")
response

' Machine Learning consists of three main types:\n\n1. Supervised Learning: Uses labeled data to train models, such as classification and regression tasks.\n2. Unsupervised Learning: Finds patterns in unlabeled data, like clustering and anomaly detection.\n3. Reinforcement Learning: Trains agents through trial-and-error by rewarding desired behaviors, typically used for sequential decision making problems.'
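
The `|` syntax is ordinary Python operator overloading. The toy `Step` class below is a hypothetical illustration of the idea, not LangChain's `Runnable`: in LangChain the leading dict is coerced into a parallel step automatically, whereas this sketch uses an explicit `parallel` helper, and the fake retriever and fake LLM are placeholders.

```python
class Step:
    """Toy runnable: wraps a function and supports `|` composition (illustration only)."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `self | other` runs self first, then feeds its output to other
        return Step(lambda value: other.invoke(self.invoke(value)))


def parallel(**branches):
    """Run every branch on the same input and collect the results in a dict."""
    return Step(lambda value: {name: step.invoke(value) for name, step in branches.items()})


fake_retrieve = Step(lambda question: f"<chunks matching '{question}'>")
passthrough = Step(lambda value: value)
fill_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
fake_llm = Step(lambda prompt_text: f"LLM saw {len(prompt_text.splitlines())} lines")

toy_chain = parallel(context=fake_retrieve, question=passthrough) | fill_prompt | fake_llm
print(toy_chain.invoke("What is NLP?"))
```

Note that the chain is invoked with the bare question string: the parallel step hands that same string to both branches, which is why the `question` branch is a passthrough.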

In [68]:
# Query using the LCEL RAG Chain

def query_rag_lcel(question: str) -> str:
    print(f"Question: {question}")
    print("-"*50)

    # The LCEL chain takes the question as a plain string and returns a plain string.
    # Passing {"input": question} instead fails with:
    # AttributeError: 'dict' object has no attribute 'replace'
    answer = rag_chain_lcel.invoke(question)

    # The LCEL chain does not return its context, so fetch the chunks separately
    docs = retriever.invoke(question)

    print(f"Answer: {answer}")
    print("\nRetrieved Context:")
    for i, doc in enumerate(docs):
        print(f"\n ----- Source {i+1} -----")
        print(doc.page_content[:200]+"...")

    return answer

test_questions = [
    "What are three types of machine learning?",
    "What is Deep Learning and how does it relate to Neural Networks?",
    "What are CNNs used for?"
]

for question in test_questions:
    query_rag_lcel(question)
    print("\n"+"="*80+"\n")