### Building a RAG System with LangChain and ChromaDB

#### Introduction
Retrieval-Augmented Generation (RAG) is a pwoerful technique that combines the capabilities of large
language models with external knowledge retrievla. This notebook will walk you through building a complete RAG system using:  

- LangChain: A framework fordeveloping applications powered by language models
- ChromaDB: An open-source vector database for storing and retrieving embeddings
- OpenAI: For embeddings and language model (you can substitue with other providers)

In [2]:
import os
from dotenv import load_dotenv
load_dotenv()

True

In [3]:
## Langchain imports
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_huggingface import HuggingFaceEndpointEmbeddings
from langchain.schema import Document

# vectorestore
from langchain.vectorstores import Chroma

# utility imports
import numpy as np
from typing import List

  from .autonotebook import tqdm as notebook_tqdm


In [4]:
## Sample data document
sample_docs = [
    "Machine learning involves teaching computers to learn from and make predictions on data. It is a subset of artificial intelligence (AI) that focuses on developing algorithms and statistical models to enable machines to improve their performance on specific tasks over time.",
    "Deep learning is a type of machine learning that uses neural networks with multiple layers to learn and extract features from complex data, such as images, sound, and text. It has achieved state-of-the-art results in many domains, including image recognition, natural language processing, and speech recognition.",
    "Embeddings are numerical representations of words or phrases that capture their meaning and context. They are used in various applications, such as information retrieval, sentiment analysis, and recommendation systems. Embeddings can be learned from large text corpora using deep learning techniques."
]

sample_docs

['Machine learning involves teaching computers to learn from and make predictions on data. It is a subset of artificial intelligence (AI) that focuses on developing algorithms and statistical models to enable machines to improve their performance on specific tasks over time.',
 'Deep learning is a type of machine learning that uses neural networks with multiple layers to learn and extract features from complex data, such as images, sound, and text. It has achieved state-of-the-art results in many domains, including image recognition, natural language processing, and speech recognition.',
 'Embeddings are numerical representations of words or phrases that capture their meaning and context. They are used in various applications, such as information retrieval, sentiment analysis, and recommendation systems. Embeddings can be learned from large text corpora using deep learning techniques.']

In [5]:
## Save sample documents to files

os.makedirs("data", exist_ok=True)

for i, doc in enumerate(sample_docs):
    with open(f"data/doc_{i}.txt", "w") as file:
        file.write(doc)

## Document Loading

In [6]:
from langchain_community.document_loaders import DirectoryLoader

# Load documents from directory
loader = DirectoryLoader(
    "data/",
    glob="*.txt",
    loader_cls=TextLoader,
    loader_kwargs={
        'encoding':'utf-8'
    }
)

documents = loader.load()

print(f"Loaded: {len(documents)} documents")
print(f"\nFirst Document:")
print(documents[0].page_content[:100]+"...")

Loaded: 3 documents

First Document:
Embeddings are numerical representations of words or phrases that capture their meaning and context....


## Document Splitting

In [7]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    length_function=len,
    separators=[" "]   
)

chunks = text_splitter.split_documents(documents=documents)

print(f"Created {len(chunks)} chunks from {len(documents)} documents")
print(f"\nChunk example:")
print(f"Content: {chunks[0].page_content[:150]}...")
print(f"Metadata: {chunks[0].metadata}")

Created 3 chunks from 3 documents

Chunk example:
Content: Embeddings are numerical representations of words or phrases that capture their meaning and context. They are used in various applications, such as in...
Metadata: {'source': 'data/doc_2.txt'}


## Embedding

In [8]:
# Text Embeddings Inference
embeddings = HuggingFaceEndpointEmbeddings(model="http://localhost:8080")
vector = embeddings.embed_documents([text.page_content for text in chunks])

## Vetorstore
Initialize ChromaDB and Store the chunks in vector representation

In [9]:
## Create ChromaDB vectorstore
persist_directory="./chromadb"

## Initialize ChromaDB
vectore_store = Chroma.from_documents(
    documents=chunks,
    embedding=HuggingFaceEndpointEmbeddings(model="http://localhost:8080"),
    persist_directory=persist_directory,
    collection_name="rag_collection"
)

print(f"Vectorestrore created with {vectore_store._collection.count()} vectors")
print(f"Persited to: {persist_directory}")

Vectorestrore created with 12 vectors
Persited to: ./chromadb


### Test the similarity search

In [10]:
### Advanced similarity search

query = "What is machine learning"

result = vectore_store.similarity_search_with_score(query, k=3)
result

[(Document(metadata={'source': 'data/doc_0.txt'}, page_content='Machine learning involves teaching computers to learn from and make predictions on data. It is a subset of artificial intelligence (AI) that focuses on developing algorithms and statistical models to enable machines to improve their performance on specific tasks over time.'),
  0.49546098709106445),
 (Document(metadata={'source': 'data/doc_0.txt'}, page_content='Machine learning involves teaching computers to learn from and make predictions on data. It is a subset of artificial intelligence (AI) that focuses on developing algorithms and statistical models to enable machines to improve their performance on specific tasks over time.'),
  0.49564552307128906),
 (Document(metadata={'source': 'data/doc_1.txt'}, page_content='Deep learning is a type of machine learning that uses neural networks with multiple layers to learn and extract features from complex data, such as images, sound, and text. It has achieved state-of-the-art 

### Understandig similarity score
The similarity score represents how closly related a document chunk is to your query. The scoring depends on the distance
metric used.

ChromaDB default: Uses L2 distance (Euclidean distance)

- Lower scores = MORE similar (closer in vector space)  
- Score of 0 = identical vectors  
- Typical range: 0 to 2 (but can be higher)


Cosine similarity (if configured):  

- Higher scores = More similar  
- Range: -1 to 1 (1 being identical)

In [11]:
### Initialize LLM, RAG Chain, Prompt Template, Query the RAG System
from langchain_ollama import ChatOllama

llm = ChatOllama(
    model="llama3.1:8b",    
    temperature=0.2,
    reasoning=False
)

In [12]:
test_response = llm.stream("What are Large Language Models")

for text in test_response:
    print(text.content, end='')

Large Language Models (LLMs) are a type of artificial intelligence (AI) designed to process and generate human-like language. They're a subset of Natural Language Processing (NLP) models, which have gained significant attention in recent years due to their impressive capabilities.

**Key characteristics:**

1. **Scale**: LLMs are trained on massive datasets, often containing billions or even trillions of words.
2. **Complexity**: These models consist of multiple layers, with each layer processing the input data in a hierarchical manner.
3. **Depth**: LLMs have many layers (typically 10-100), allowing them to capture complex relationships between words and concepts.

**How they work:**

1. **Input**: The model takes in text as input, which can be a sentence, paragraph, or even an entire book.
2. **Embeddings**: The input is converted into numerical representations called embeddings, which capture the semantic meaning of each word.
3. **Transformers**: LLMs use transformer architectures,

## Modern RAG Chain

In [13]:
from langchain.chains import create_retrieval_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain

In [14]:
## Convert vectorestore to retriever
retriever = vectore_store.as_retriever(
    search_kwarg={"k":3} # Retrieve top 3 chunks
)

In [15]:
## Create a prompt template
system_prompt = """You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If you don't know the answer, just say tha you don't know.
Use three sentences maximum and keep the answer concise.

Context: {context}
"""

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("user", "{input}")   
])

In [16]:
## Create document chain
document_chain = create_stuff_documents_chain(llm, prompt)
document_chain

RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableLambda(format_docs)
}), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
| ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks.\nUse the following pieces of retrieved context to answer the question.\nIf you don't know the answer, just say tha you don't know.\nUse three sentences maximum and keep the answer concise.\n\nContext: {context}\n"), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], input_types={}, partial_variables={}, template='{input}'), additional_kwargs={})])
| ChatOllama(model='llama3.1:8b', reasoning=False, temperature=0.2)
| StrOutputParser(), kwargs={}, config={'run_name': 'stuff_documents

**This chain**
- Takes retrieved documents
- "Stuffs" them into the prompt's {context} placeholder
- Sends the complete prompt to the LLM
- Returns the LLM's response


In [17]:
### Create the final RAG Chain
rag_chain = create_retrieval_chain(retriever, document_chain)
result = rag_chain.invoke({"input":"What can you tell me about embedding?"})

result
    

{'input': 'What can you tell me about embedding?',
 'context': [Document(metadata={'source': 'data/doc_2.txt'}, page_content='Embeddings are numerical representations of words or phrases that capture their meaning and context. They are used in various applications, such as information retrieval, sentiment analysis, and recommendation systems. Embeddings can be learned from large text corpora using deep learning techniques.'),
  Document(metadata={'source': 'data/doc_2.txt'}, page_content='Embeddings are numerical representations of words or phrases that capture their meaning and context. They are used in various applications, such as information retrieval, sentiment analysis, and recommendation systems. Embeddings can be learned from large text corpora using deep learning techniques.'),
  Document(metadata={'source': 'data/doc_1.txt'}, page_content='Deep learning is a type of machine learning that uses neural networks with multiple layers to learn and extract features from complex data

## Create RAG Chain Alternative - Using LCEL (LangChain Expression Language)

In [18]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableParallel

# Create custom prompt
custom_prompt = ChatPromptTemplate.from_template(
    """
    Use the following context to answer the question.
    If you do'nt know the answer based on the context, say you don't know.
    Provide specific details from the context to support your answer.
    
    Context:
    {context}
    
    Question: {question}
    
    Answer:
    """ 
)

In [None]:
## Format the output documents for the prompt
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

In [20]:
## Builkd the chain using LCEL
rag_chain_lcel = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough()
    }
    | custom_prompt
    | llm
    | StrOutputParser()
)

result = rag_chain_lcel.invoke("What is deep learning")
result

'Deep learning is a type of machine learning that uses neural networks with multiple layers to learn and extract features from complex data, such as images, sound, and text. It has achieved state-of-the-art results in many domains, including image recognition, natural language processing, and speech recognition.\n\nThis answer is supported by the following specific details from the context:\n\n* The first paragraph states that deep learning uses neural networks with multiple layers to learn and extract features from complex data.\n* The second paragraph mentions machine learning as a subset of artificial intelligence (AI), but it does not provide further information about deep learning.'

## Add new documents to vectorstore

In [21]:
new_document = """
Reinforcment Learning in Detail:

Reinforcement learning is a type of machine learning that involves training an agent to make decisions by interacting with its environment. The goal is for the agent to learn how to maximize some form of reward, such as points or money, over time.
In reinforcement learning, the agent takes actions in an environment and receives feedback in the form of rewards or penalties. Based on this feedback, the agent learns which actions are more likely to lead to positive outcomes and adjusts its behavior accordingly.
There are two main types of reinforcement learning: supervised and unsupervised. In supervised reinforcement learning, the agent is given a set of examples of correct actions to take in different situations, and it learns by comparing its own actions to these examples. In unsupervised reinforcement learning, the agent learns by trial and error, without any explicit feedback.
Reinforcement learning has many applications, including robotics, game playing, autonomous vehicles, and more. It is a powerful tool for solving complex problems that require decision-making in dynamic environments.
"""

In [22]:
new_doc = Document(
    page_content=new_document,
    metadata={
        "source": "manual_addition",
        "topic": "reinforcement learning",
        "author": "me"
    }
)

In [23]:
### Add new documents to vectorestore
new_chunks = text_splitter.split_documents([new_doc])

vectore_store.add_documents(new_chunks)

['ea82b5f5-35a3-4996-940e-0e83de1a9871',
 'ed05bc6d-b632-4ed9-a21e-fd0b81a31d80',
 '23766089-166a-47c3-9b5d-1ce8ab3a8232']

In [24]:
result = rag_chain.stream({"input":"What is reinforcment learning?"})

for chunk in result:
    if isinstance(chunk, dict):
        for k, v in chunk.items():
            if "answer" in k:
                print(v, end='', flush=True)

Reinforcement learning is a type of machine learning where an agent learns to make decisions by interacting with its environment and receiving feedback in the form of rewards or penalties. The goal is for the agent to maximize some form of reward over time. This process involves trial and error, allowing the agent to learn from its experiences.

## Advanced Rag Techniqes - Conversional Memory 

- create_histroy_aware_retriever: Makes the retriever understand conversation context
- MessagesPlaceholder: Placeholder for chat history in prompts
- HumanMessage/AIMessage: Structured message types for conversation history

In [25]:
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder
from langchain_core.messages import HumanMessage, AIMessage

## Create a prompts including the chathistory
system_prompt_with_history = """"
Given a chat history and the latest user question
which might reference context in the chat history, formulate a standalone
question which can be understood without the chat history. Do NOT answer the question,
just reformulate it if needed an otherwise return it as is.
"""

In [26]:
prompt_with_history = ChatPromptTemplate.from_messages([
    ("system", system_prompt_with_history),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}")
])

In [27]:
## Create the history aware retriever
history_aware_retriever = create_history_aware_retriever(
    llm, 
    retriever,
    prompt_with_history
)

In [28]:
history_aware_retriever

RunnableBinding(bound=RunnableBranch(branches=[(RunnableLambda(lambda x: not x.get('chat_history', False)), RunnableLambda(lambda x: x['input'])
| VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEndpointEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x7a32a19e1040>, search_kwargs={}))], default=ChatPromptTemplate(input_variables=['chat_history', 'input'], input_types={'chat_history': list[typing.Annotated[typing.Union[typing.Annotated[langchain_core.messages.ai.AIMessage, Tag(tag='ai')], typing.Annotated[langchain_core.messages.human.HumanMessage, Tag(tag='human')], typing.Annotated[langchain_core.messages.chat.ChatMessage, Tag(tag='chat')], typing.Annotated[langchain_core.messages.system.SystemMessage, Tag(tag='system')], typing.Annotated[langchain_core.messages.function.FunctionMessage, Tag(tag='function')], typing.Annotated[langchain_core.messages.tool.ToolMessage, Tag(tag='tool')], typing.Annotated[langchain_core.messages.ai.AIMessageChunk, Tag

In [33]:
## Create a prompt template
system_prompt = """You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If you don't know the answer, just say tha you don't know.
Use three sentences maximum and keep the answer concise.

Context: {context}
"""

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}")   
])

question_answer_chain = create_stuff_documents_chain(llm, prompt)

conversational_rag_chain = create_retrieval_chain(
    history_aware_retriever,
    question_answer_chain
)

In [38]:
chat_history = []
result = conversational_rag_chain.invoke({"chat_history": chat_history, "input":"What is deep learning??"})

print(result)

{'chat_history': [], 'input': 'What is deep learning??', 'context': [Document(metadata={'source': 'data/doc_1.txt'}, page_content='Deep learning is a type of machine learning that uses neural networks with multiple layers to learn and extract features from complex data, such as images, sound, and text. It has achieved state-of-the-art results in many domains, including image recognition, natural language processing, and speech recognition.'), Document(metadata={'source': 'data/doc_1.txt'}, page_content='Deep learning is a type of machine learning that uses neural networks with multiple layers to learn and extract features from complex data, such as images, sound, and text. It has achieved state-of-the-art results in many domains, including image recognition, natural language processing, and speech recognition.'), Document(metadata={'source': 'data/doc_0.txt'}, page_content='Machine learning involves teaching computers to learn from and make predictions on data. It is a subset of artifi

In [40]:
chat_history.extend([
    HumanMessage(content="Whas is deep learning??"),
    AIMessage(content=result["answer"])
])

In [41]:
chat_history

[HumanMessage(content='Whas is deep learning??', additional_kwargs={}, response_metadata={}),
 AIMessage(content='Deep learning is a type of machine learning that uses neural networks with multiple layers to learn and extract features from complex data, such as images, sound, and text. It has achieved state-of-the-art results in many domains, including image recognition, natural language processing, and speech recognition. This technique enables computers to automatically learn and improve their performance on specific tasks over time.', additional_kwargs={}, response_metadata={})]

In [43]:
## Follow up question
result = conversational_rag_chain.invoke({"chat_history": chat_history, "input":"What are it's main type?"})

print(result['answer'])

There are two main types of reinforcement learning: supervised and unsupervised. In supervised reinforcement learning, the agent is given a set of examples of correct actions to take in different situations. In unsupervised reinforcement learning, the agent learns by trial and error, without any explicit feedback.
