### Building a RAG System with LangChain and ChromaDB
#### Introduction
Retrieval-Augmented Generation (RAG) is a powerful technique that combines the capabilities of large language models with external knowledge retrieval. This notebook will walk you through building a complete RAG system using:

- LangChain: A framework for developing applications powered by language models
- ChromaDB: An open-source vector database for storing and retrieving embeddings
- OpenAI: For embeddings and language model (you can substitute with other providers)

In [16]:
import os
from dotenv import load_dotenv
load_dotenv()

True

In [20]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader  
from langchain_huggingface import HuggingFaceEmbeddings
from langchain.schema import Document

#vectorestore
from langchain.vectorstores import Chroma
from langchain_community.vectorstores import Chroma 

##utility imports
import numpy as np
from typing import List, Dict, Any


In [None]:
# RAG Architecture Overview
print("""
RAG (Retrieval-Augmented Generation) Architecture:

1. Document Loading: Load documents from various sources
2. Document Splitting: Break documents into smaller chunks
3. Embedding Generation: Convert chunks into vector representations
4. Vector Storage: Store embeddings in ChromaDB
5. Query Processing: Convert user query to embedding
6. Similarity Search: Find relevant chunks from vector store
7. Context Augmentation: Combine retrieved chunks with query
8. Response Generation: LLM generates answer using context

Benefits of RAG:
- Reduces hallucinations
- Provides up-to-date information
- Allows citing sources
- Works with domain-specific knowledge
""")

In [21]:
#1 create sample data

def create_sample_data() -> List[Document]:
    sample_text = [
        """
        LangChain is a framework for developing applications powered by language models.
        It provides modular components for building LLM applications, including document loaders,
        text splitters, embeddings, and vector stores.

        ChromaDB is a vector database that allows efficient storage and retrieval of high-dimensional vectors.
        It is commonly used in RAG architectures to store embeddings of text chunks.

        """,
        """    
        Machine Learning Fundamentals

        Machine learning is a subset of artificial intelligence that enables systems to learn 
        and improve from experience without being explicitly programmed. There are three main 
        types of machine learning: supervised learning, unsupervised learning, and reinforcement 
        learning. Supervised learning uses labeled data to train models, while unsupervised 
        learning finds patterns in unlabeled data. Reinforcement learning learns through 
        interaction with an environment using rewards and penalties.

        """,
        """
        Deep Learning and Neural Networks

        Deep learning is a subset of machine learning based on artificial neural networks. 
        These networks are inspired by the human brain and consist of layers of interconnected 
        nodes. Deep learning has revolutionized fields like computer vision, natural language 
        processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly 
        effective for image processing, while Recurrent Neural Networks (RNNs) and Transformers 
        excel at sequential data processing.
        """,
        """
        Natural Language Processing (NLP)

        NLP is a field of AI that focuses on the interaction between computers and human language. 
        Key tasks in NLP include text classification, named entity recognition, sentiment analysis, 
        machine translation, and question answering. Modern NLP heavily relies on transformer 
        architectures like BERT, GPT, and T5. These models use attention mechanisms to understand 
        context and relationships between words in text.
        """ 
    ]
    return [Document(page_content=text, metadata={"source": "sample_data"}) for text in sample_text]


#1 Load sample data
documents = create_sample_data()
print(f"Loaded {len(documents)} documents.")
print("Sample document content:", documents[0].page_content)
print("Sample document metadata:", documents[0].metadata)


Loaded 4 documents.
Sample document content: 
        LangChain is a framework for developing applications powered by language models.
        It provides modular components for building LLM applications, including document loaders,
        text splitters, embeddings, and vector stores.

        ChromaDB is a vector database that allows efficient storage and retrieval of high-dimensional vectors.
        It is commonly used in RAG architectures to store embeddings of text chunks.

        
Sample document metadata: {'source': 'sample_data'}


In [22]:

#2 Split documents into smaller chunks
def split_documents(
        documents: List[Document], 
        chunk_size: int = 100, 
        chunk_overlap: int = 5) -> List[Document]:
    
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    return text_splitter.split_documents(documents)
print("Splitting documents into chunks...")
chunked_documents = split_documents(documents)
print(f"Created {len(chunked_documents)} chunks from {len(documents)} documents.")

for i, chunk in enumerate(chunked_documents):
    print(f"Chunk {i+1}: {chunk.page_content[:50]}...")  # Print first 50 characters of each chunk




Splitting documents into chunks...
Created 26 chunks from 4 documents.
Chunk 1: LangChain is a framework for developing applicatio...
Chunk 2: It provides modular components for building LLM ap...
Chunk 3: text splitters, embeddings, and vector stores....
Chunk 4: ChromaDB is a vector database that allows efficien...
Chunk 5: of high-dimensional vectors....
Chunk 6: It is commonly used in RAG architectures to store ...
Chunk 7: Machine Learning Fundamentals...
Chunk 8: Machine learning is a subset of artificial intelli...
Chunk 9: and improve from experience without being explicit...
Chunk 10: types of machine learning: supervised learning, un...
Chunk 11: learning. Supervised learning uses labeled data to...
Chunk 12: learning finds patterns in unlabeled data. Reinfor...
Chunk 13: interaction with an environment using rewards and ...
Chunk 14: Deep Learning and Neural Networks...
Chunk 15: Deep learning is a subset of machine learning base...
Chunk 16: These networks are inspired by t

In [23]:

# 3 Initialize HuggingFace embeddings
# Generate embeddings for the chunks
def generate_embeddings(documents: List[Document]) -> List[np.ndarray]:
    embeddings_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    return [embeddings_model.embed_documents(chunk.page_content) for chunk in chunked_documents]

print("Generating embeddings for chunks...")
embeddings = generate_embeddings(chunked_documents)
print(f"Generated embeddings for {len(embeddings)} chunks.") 

for i, embedding in enumerate(embeddings):
    print(f"Embedding {i+1}: {embedding[:5]}...")  # Print first 5 values of each embedding



Generating embeddings for chunks...
Generated embeddings for 26 chunks.
Embedding 1: [[-0.029210269451141357, -0.00813598558306694, 0.03420502319931984, 0.040295638144016266, 0.07426184415817261, 0.059225041419267654, 0.08180944621562958, 0.037137411534786224, 0.032914865761995316, -0.03143712505698204, 0.060038626194000244, -0.07254357635974884, 0.024705274030566216, -0.004340950399637222, -0.01321366336196661, 0.018410898745059967, -0.08305701613426208, -0.014857887290418148, -0.12271381169557571, 0.0023262619506567717, -0.033769641071558, 0.027208659797906876, -0.020789364352822304, 0.02001935988664627, -0.01195458322763443, -0.016626890748739243, -0.021979263052344322, -0.004876251798123121, 0.0012886389158666134, -0.08734939992427826, 0.007423533126711845, 0.1130385547876358, 0.058595214039087296, -0.019217340275645256, 0.017139475792646408, -0.07475003600120544, -0.07950087636709213, -0.05808822810649872, 0.03741088882088661, 0.059456147253513336, -0.08294069021940231, -0.0891819

In [24]:
# 4 Store embeddings in ChromaDB
## Create a Chromdb vector store
persist_directory="./chroma_db"

embeddings_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

def store_embeddings_in_chroma(documents: List[Document], embeddings: List[np.ndarray]) -> Chroma:
    chroma_db = Chroma.from_documents(documents, embeddings_model, persist_directory=persist_directory,collection_name="rag_collection")
    return chroma_db
print("Storing embeddings in ChromaDB...")
chroma_db = store_embeddings_in_chroma(chunked_documents, embeddings_model)
print(f"Stored {len(chroma_db)} embeddings in ChromaDB.")
print("ChromaDB vector store created successfully.")

chroma_db




Storing embeddings in ChromaDB...
Stored 58 embeddings in ChromaDB.
ChromaDB vector store created successfully.


### Test Similarity Serach

In [48]:
### Test Similarity Serach
def search_similar_chunks(query: str, top_k: int = 5) -> List[Document]:
    query_embedding = embeddings_model.embed_query(query)
    results = chroma_db.similarity_search_by_vector(query_embedding, k=top_k)
    results1= chroma_db.similarity_search(query1, k=top_k)
    print(f"Query: {query}")
    return results

similar_docs= search_similar_chunks("What is LangChain?", top_k=3)
print(f"Found {len(similar_docs)} similar documents for the query.")
print("Similar documents:")
for i, doc in enumerate(similar_docs):
    print(f"Document {i+1}: {doc.page_content[:50]}...")  # Print first 50 characters of each document  
    

def search_similar_chunks1(query: str, top_k: int = 5) -> List[Document]:
    results1= chroma_db.similarity_search(query, k=top_k)
    print(f"Query: {query1}")
    return results1

query1= "What is NLP?"
similar_docs1= search_similar_chunks1(query1, top_k=3)
print(f"Found {len(similar_docs1)} similar documents for the query.")
print("Similar documents:")
for i, doc in enumerate(similar_docs1):
    print(f"Document {i+1}: {doc.page_content[:50]}...")  # Print first 50 characters of each document


Query: What is LangChain?
Found 3 similar documents for the query.
Similar documents:
Document 1: LangChain is a framework for developing applicatio...
Document 2: LangChain is a framework for developing applicatio...
Document 3: It is commonly used in RAG architectures to store ...
Query: What is NLP?
Found 3 similar documents for the query.
Similar documents:
Document 1: NLP is a field of AI that focuses on the interacti...
Document 2: Natural Language Processing (NLP)...
Document 3: Key tasks in NLP include text classification, name...


### Advaced Simillarity Search with score

In [52]:

def search_similar_chunks1(query: str, top_k: int = 5) -> List[Document]:
    results2= chroma_db.similarity_search_with_score(query, k=top_k)
    print(f"Found {len(results2)} similar documents for the query.")
    for i, (doc, score) in enumerate(results2):
        print(f"Document {i+1}: {doc.page_content[:50]}... (Score: {score})")

    print(f"Query: {query1}")
    return results2

query1= "What is NLP?"

similar_docs2= search_similar_chunks1(query1, top_k=3)
print(f"Found {len(similar_docs2)} similar documents for the query.")
print("Similar documents:")
for i, (doc, score) in enumerate(similar_docs2):
    print(f"Document {i+1}: {doc.page_content[:50]}... (Score: {score})")


Found 3 similar documents for the query.
Document 1: NLP is a field of AI that focuses on the interacti... (Score: 0.4940963089466095)
Document 2: Natural Language Processing (NLP)... (Score: 0.6653535962104797)
Document 3: Key tasks in NLP include text classification, name... (Score: 0.7872450351715088)
Query: What is NLP?
Found 3 similar documents for the query.
Similar documents:
Document 1: NLP is a field of AI that focuses on the interacti... (Score: 0.4940963089466095)
Document 2: Natural Language Processing (NLP)... (Score: 0.6653535962104797)
Document 3: Key tasks in NLP include text classification, name... (Score: 0.7872450351715088)


### Initialize LLM, RAG Chain, Prompt Templete, Query the RAG SYSTEM

In [36]:
from langchain.llms import Ollama
#from langchain-ollama import Ollama
from langchain.prompts import PromptTemplate
from langchain.vectorstores import Chroma
from langchain_community.vectorstores import Chroma
from langchain.embeddings import HuggingFaceEmbeddings

# Initialize LLM, RAG Chain, Prompt Template, Query the RAG SYSTEM
llm = Ollama(model="llama2")





  llm = Ollama(model="llama2")


In [37]:
test_response=llm.invoke("What is LangChain?")
print(f"LLM Response: {test_response}")

LLM Response: 
LangChain is a programming language that is designed to be a chain of languages, rather than a single monolithic language. It is based on the idea of modularity and compositionality, where different parts of the language can be combined and reused in different ways to create new languages.

The core idea of LangChain is to provide a way to define and manipulate linguistic structures in a modular and compositional manner, allowing for the creation of complex grammars and lexicons by combining smaller, reusable components. This is achieved through the use of a set of meta-grammars, which are high-level grammars that can be used to define and combine lower-level grammars and lexicons.

LangChain is designed to be extensible and flexible, allowing users to define their own meta-grammars and incorporate them into the language. This makes it possible for LangChain to accommodate a wide range of linguistic styles and structures, from natural languages to constructed ones.

Some

### Model RAG CHAIN

In [30]:
from langchain.chains import create_retrieval_chain
from langchain.prompts import PromptTemplate
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain

In [25]:
### convert the vector store to a retriever
retriever = chroma_db.as_retriever(search_type="similarity", search_kwargs={"k": 3})
# Create a prompt template for the RAG chain

retriever

VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x0000019D800BFF80>, search_kwargs={'k': 3})

In [39]:
## Create a prompt templete for the RAG chain
prompt= ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant that answers questions based on the provided {context}."),
    ("human", "{input}")
])
prompt


ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template='You are a helpful assistant that answers questions based on the provided {context}.'), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], input_types={}, partial_variables={}, template='{input}'), additional_kwargs={})])

In [38]:
### Create a Dociment Chain
doc_chain = create_stuff_documents_chain(
    llm=llm, 
    prompt=prompt
)

doc_chain

RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableLambda(format_docs)
}), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
| ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template='You are a helpful assistant that answers questions based on the provided {context}.'), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], input_types={}, partial_variables={}, template='{input}'), additional_kwargs={})])
| Ollama()
| StrOutputParser(), kwargs={}, config={'run_name': 'stuff_documents_chain'}, config_factories=[])

In [41]:
### Create a RAG chain using the retriever and doc chain template
from langchain.chains import create_retrieval_chain
rag_chain = create_retrieval_chain(retriever,doc_chain)   
# Example query to the RAG chain
rag_chain



RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableBinding(bound=RunnableLambda(lambda x: x['input'])
           | VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x0000019D800BFF80>, search_kwargs={'k': 3}), kwargs={}, config={'run_name': 'retrieve_documents'}, config_factories=[])
})
| RunnableAssign(mapper={
    answer: RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
              context: RunnableLambda(format_docs)
            }), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
            | ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template='You are a helpful assistant that answers questions based on the provided {context}.'), additional_kwargs={}), HumanMessagePro

In [42]:
rag_chain.invoke({"input": "What is LangChain?"})
rag_chain.invoke({"input": "What is NLP?"})
rag_chain.invoke({"input": "What is Machine Learning?"})
rag_chain.invoke({"input": "What is Deep Learning?"})



{'input': 'What is Deep Learning?',
 'context': [Document(metadata={'source': 'sample_data'}, page_content='Deep learning is a subset of machine learning based on artificial neural networks.'),
  Document(metadata={'source': 'sample_data'}, page_content='Deep learning is a subset of machine learning based on artificial neural networks.'),
  Document(metadata={'source': 'sample_data'}, page_content='Deep Learning and Neural Networks')],
 'answer': "Assistant: Hello! Deep learning is a subset of machine learning that focuses on training artificial neural networks to learn and improve from large amounts of data. These neural networks are designed to mimic the structure and function of the human brain, allowing them to learn and make decisions in a more complex and nuanced way than traditional machine learning algorithms. Deep learning is particularly well-suited for tasks that involve processing and analyzing large amounts of data, such as image and speech recognition, natural language pr

#### Create RAG using framework LCEL - LangChain Expression Language 

In [43]:
# another approch to create a RAG chain using LECL 

from langchain_core.output_parsers import StrOutputParser
from langchain_core.outputs import ChatGeneration   
from langchain_core.runnables import RunnablePassthrough, RunnableLambda, RunnableParallel


In [44]:
#create Custom prompt template
from langchain_core.prompts import ChatPromptTemplate
custom_prompt = ChatPromptTemplate.from_template(

"""
You are a helpful assistant that answers questions based on the provided context: {context}.
Please provide a concise answer to the question: {input}
"""

)

custom_prompt

ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, template='\nYou are a helpful assistant that answers questions based on the provided context: {context}.\nPlease provide a concise answer to the question: {input}\n'), additional_kwargs={})])

In [77]:

retriever

#convert  output documents from retriever in required format for prompt
def format_documents_for_prompt(documents: List[Document]) -> str:
    return "\n\n".join([doc.page_content for doc in documents])




In [None]:
#Build a Runnable for the RAG chain

rag_chain_icel=(
    {
        "context": retriever |format_documents_for_prompt,
        "input": RunnablePassthrough()
    }
    | custom_prompt 
    |llm
    | StrOutputParser()
)

rag_chain_icel
response=rag_chain_icel.invoke("What is Deep Learning")
response

"Of course! Here's a quick answer to your question:\n\nDeep learning is a subset of machine learning that focuses on training artificial neural networks with multiple layers to analyze and learn complex patterns in data. These neural networks are designed to mimic the structure and function of the human brain, allowing them to learn and improve on their own by analyzing large amounts of data. Deep learning has been instrumental in achieving state-of-the-art performance in a wide range of applications, including image and speech recognition, natural language processing, and autonomous driving."

: 

In [None]:
## create a prompt that includes the chat history
contextualize_q_system_prompt = """Given a chat history and the latest user question 
which might reference context in the chat history, formulate a standalone question 
which can be understood without the chat history. Do NOT answer the question, 
just reformulate it if needed and otherwise return it as is."""

contextualize_q_prompt = ChatPromptTemplate.from_messages([
    ("system", contextualize_q_system_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])


### Advanced RAG Techniques - Conversational Memory

In [59]:
from langchain.chains import create_retrieval_chain
from langchain.chains import create_history_aware_retriever
# MessagesPlaceholder -- place holder for chat history in prompts
from langchain_core.prompts import MessagesPlaceholder  
# structured message types of conversational history 
from langchain_core.messages import AIMessage, HumanMessage



In [60]:
## create a prompt that includes the chat history
contextualize_q_system_prompt = """Given a chat history and the latest user question 
which might reference context in the chat history, formulate a standalone question 
which can be understood without the chat history. Do NOT answer the question, 
just reformulate it if needed and otherwise return it as is."""

contextualize_q_prompt = ChatPromptTemplate.from_messages([
    ("system", contextualize_q_system_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])


In [67]:
## create a RAG chain with conversational memory
rag_chain_conversation = create_history_aware_retriever(
    retriever=retriever,
    llm=llm,
    prompt=contextualize_q_prompt,
)

In [70]:
# Create a new document chain with history
qa_system_prompt = """You are an assistant for question-answering tasks. 
Use the following pieces of retrieved context to answer the question. 
If you don't know the answer, just say that you don't know. 
Use three sentences maximum and keep the answer concise.

Context: {context}"""

qa_prompt = ChatPromptTemplate.from_messages([
    ("system", qa_system_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

# Create conversational RAG chain
conversational_rag_chain = create_retrieval_chain(
    rag_chain_conversation, 
    question_answer_chain
)

print("Conversational RAG chain created!")

Conversational RAG chain created!


In [71]:
chat_history=[]
# First question
result1 = conversational_rag_chain.invoke({
    "chat_history": chat_history,
    "input": "What is machine learning?"
})
print(f"Q: What is machine learning?")
print(f"A: {result1['answer']}")

Q: What is machine learning?
A: Sure, I'd be happy to help! Based on the context provided, here is my answer:

Machine learning is a subset of artificial intelligence that allows systems to learn and improve from experience without being explicitly programmed. This means that the system can automatically learn and adapt to new data or situations without human intervention. In other words, machine learning enables systems to learn and make decisions based on patterns and trends in the data they are trained on.


In [73]:
chat_history.extend([
    HumanMessage(content="What is machine learning"),
    AIMessage(content=result1['answer'])
])
chat_history

[HumanMessage(content='What is machine learning', additional_kwargs={}, response_metadata={}),
 AIMessage(content="Sure, I'd be happy to help! Based on the context provided, here is my answer:\n\nMachine learning is a subset of artificial intelligence that allows systems to learn and improve from experience without being explicitly programmed. This means that the system can automatically learn and adapt to new data or situations without human intervention. In other words, machine learning enables systems to learn and make decisions based on patterns and trends in the data they are trained on.", additional_kwargs={}, response_metadata={}),
 HumanMessage(content='What is machine learning', additional_kwargs={}, response_metadata={}),
 AIMessage(content="Sure, I'd be happy to help! Based on the context provided, here is my answer:\n\nMachine learning is a subset of artificial intelligence that allows systems to learn and improve from experience without being explicitly programmed. This me

In [74]:
## Follow up question
# Follow-up question
result2 = conversational_rag_chain.invoke({
    "chat_history": chat_history,
    "input": "What are its main types?"  # Refers to ML from previous question
})
result2

{'chat_history': [HumanMessage(content='What is machine learning', additional_kwargs={}, response_metadata={}),
  AIMessage(content="Sure, I'd be happy to help! Based on the context provided, here is my answer:\n\nMachine learning is a subset of artificial intelligence that allows systems to learn and improve from experience without being explicitly programmed. This means that the system can automatically learn and adapt to new data or situations without human intervention. In other words, machine learning enables systems to learn and make decisions based on patterns and trends in the data they are trained on.", additional_kwargs={}, response_metadata={}),
  HumanMessage(content='What is machine learning', additional_kwargs={}, response_metadata={}),
  AIMessage(content="Sure, I'd be happy to help! Based on the context provided, here is my answer:\n\nMachine learning is a subset of artificial intelligence that allows systems to learn and improve from experience without being explicitly