### Building a RAG System with LangChain and ChromaDB

#### Introduction
Retrieval-Augmented Generation (RAG) is a powerful technique that combines the capabilities of large
language models with external knowledge retrieval. This notebook walks you through building a complete RAG system using:

- LangChain: A framework for developing applications powered by language models
- ChromaDB: An open-source vector database for storing and retrieving embeddings
- OpenAI: For embeddings and the language model (you can substitute other providers; the cells below use a Hugging Face embeddings endpoint and a local Ollama model)

In [76]:
import os
from dotenv import load_dotenv
load_dotenv()

True

In [77]:
## LangChain imports
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_huggingface import HuggingFaceEndpointEmbeddings
from langchain.schema import Document

# vectorstore
from langchain.vectorstores import Chroma

# utility imports
import numpy as np
from typing import List

In [78]:
## Sample data document
sample_docs = [
    "Machine learning involves teaching computers to learn from and make predictions on data. It is a subset of artificial intelligence (AI) that focuses on developing algorithms and statistical models to enable machines to improve their performance on specific tasks over time.",
    "Deep learning is a type of machine learning that uses neural networks with multiple layers to learn and extract features from complex data, such as images, sound, and text. It has achieved state-of-the-art results in many domains, including image recognition, natural language processing, and speech recognition.",
    "Embeddings are numerical representations of words or phrases that capture their meaning and context. They are used in various applications, such as information retrieval, sentiment analysis, and recommendation systems. Embeddings can be learned from large text corpora using deep learning techniques."
]

sample_docs

['Machine learning involves teaching computers to learn from and make predictions on data. It is a subset of artificial intelligence (AI) that focuses on developing algorithms and statistical models to enable machines to improve their performance on specific tasks over time.',
 'Deep learning is a type of machine learning that uses neural networks with multiple layers to learn and extract features from complex data, such as images, sound, and text. It has achieved state-of-the-art results in many domains, including image recognition, natural language processing, and speech recognition.',
 'Embeddings are numerical representations of words or phrases that capture their meaning and context. They are used in various applications, such as information retrieval, sentiment analysis, and recommendation systems. Embeddings can be learned from large text corpora using deep learning techniques.']

In [79]:
## Save sample documents to files

os.makedirs("data", exist_ok=True)

for i, doc in enumerate(sample_docs):
    with open(f"data/doc_{i}.txt", "w", encoding="utf-8") as file:
        file.write(doc)

## Document Loading

In [80]:
from langchain_community.document_loaders import DirectoryLoader

# Load documents from directory
loader = DirectoryLoader(
    "data/",
    glob="*.txt",
    loader_cls=TextLoader,
    loader_kwargs={
        'encoding':'utf-8'
    }
)

documents = loader.load()

print(f"Loaded: {len(documents)} documents")
print(f"\nFirst Document:")
print(documents[0].page_content[:100]+"...")

Loaded: 3 documents

First Document:
Embeddings are numerical representations of words or phrases that capture their meaning and context....
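
Note that the first loaded document is `doc_2.txt`, so the load order is not the order in which the files were written. The sketch below is a standard-library approximation of what a directory loader does: it reads every matching file and keeps the path as `source` metadata. The `load_text_dir` helper and the dict layout are hypothetical stand-ins, not LangChain's implementation; sorting the paths is one way to get a deterministic order.

```python
import tempfile
from pathlib import Path

def load_text_dir(directory, pattern="*.txt", encoding="utf-8"):
    """Read every file matching `pattern`, sorted by path for a stable order."""
    records = []
    for path in sorted(Path(directory).glob(pattern)):
        records.append({
            "page_content": path.read_text(encoding=encoding),
            "metadata": {"source": str(path)},
        })
    return records

# Self-contained demo: write three small files to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for i, body in enumerate(["first", "second", "third"]):
        Path(tmp, f"doc_{i}.txt").write_text(body, encoding="utf-8")
    loaded = load_text_dir(tmp)

print([record["page_content"] for record in loaded])
```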


## Document Splitting

In [81]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    length_function=len,
    separators=[" "]   
)

chunks = text_splitter.split_documents(documents=documents)

print(f"Created {len(chunks)} chunks from {len(documents)} documents")
print(f"\nChunk example:")
print(f"Content: {chunks[0].page_content[:150]}...")
print(f"Metadata: {chunks[0].metadata}")

Created 3 chunks from 3 documents

Chunk example:
Content: Embeddings are numerical representations of words or phrases that capture their meaning and context. They are used in various applications, such as in...
Metadata: {'source': 'data/doc_2.txt'}
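
Each sample document is shorter than `chunk_size=500` characters, which is why splitting produced one chunk per document. To see how `chunk_size` and `chunk_overlap` interact on longer text, here is a simplified fixed-window sketch. It is not the recursive, separator-aware algorithm that `RecursiveCharacterTextSplitter` uses; `sliding_chunks` is a hypothetical helper that only illustrates the overlap idea.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut `text` into windows of `chunk_size` that share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    windows = []
    for start in range(0, len(text), step):
        windows.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return windows

alphabet = "abcdefghijklmnopqrstuvwxyz"
pieces = sliding_chunks(alphabet, chunk_size=10, chunk_overlap=3)
print(pieces)

# A 300-character text fits in a single 500-character window.
print(len(sliding_chunks("x" * 300, chunk_size=500, chunk_overlap=50)))
```

Each window starts `chunk_size - chunk_overlap` characters after the previous one, so the last three characters of one piece reappear at the start of the next.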


## Embedding

In [92]:
# Text Embeddings Inference
embeddings = HuggingFaceEndpointEmbeddings(model="http://localhost:8080")
vector = embeddings.embed_documents([text.page_content for text in chunks])

## Vectorstore
Initialize ChromaDB and store the chunks as vector embeddings.

In [83]:
## Create ChromaDB vectorstore
persist_directory="./chromadb"

## Initialize ChromaDB
vectore_store = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,  # reuse the embeddings client created above
    persist_directory=persist_directory,
    collection_name="rag_collection"
)

print(f"Vectorstore created with {vectore_store._collection.count()} vectors")
print(f"Persisted to: {persist_directory}")

Vectorstore created with 15 vectors
Persisted to: ./chromadb
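
The count above is 15 even though only 3 chunks were created. A likely explanation is that this cell was run several times against the same persisted collection, and each run appended the same chunks again. One way to guard against that is to derive a stable ID from each chunk, so repeated content maps to the same ID. The sketch below uses only the standard library; `chunk_id` and `chunk_records` are hypothetical names, and the records stand in for `(metadata["source"], page_content)` pairs.

```python
import hashlib

def chunk_id(source, text):
    """Derive a stable 16-character ID from a chunk's source path and content."""
    digest = hashlib.sha256(f"{source}\n{text}".encode("utf-8")).hexdigest()
    return digest[:16]

# Hypothetical stand-ins for chunks; the third entry simulates a repeated run.
chunk_records = [
    ("data/doc_0.txt", "Machine learning involves teaching computers..."),
    ("data/doc_1.txt", "Deep learning is a type of machine learning..."),
    ("data/doc_0.txt", "Machine learning involves teaching computers..."),
]

ids = [chunk_id(source, text) for source, text in chunk_records]
unique_ids = list(dict.fromkeys(ids))  # order-preserving de-duplication
print(len(ids), len(unique_ids))
```

With real chunks, a de-duplicated list like this could be passed through the `ids` argument of `Chroma.from_documents`. How an already-stored ID is handled may depend on the installed versions, so treat that as something to verify rather than a guarantee.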


### Test the similarity search

In [84]:
### Advanced similarity search

query = "What is machine learning"

result = vectore_store.similarity_search_with_score(query, k=3)
result

[(Document(metadata={'source': 'data/doc_0.txt'}, page_content='Machine learning involves teaching computers to learn from and make predictions on data. It is a subset of artificial intelligence (AI) that focuses on developing algorithms and statistical models to enable machines to improve their performance on specific tasks over time.'),
  0.32409441471099854),
 (Document(metadata={'source': 'data/doc_0.txt'}, page_content='Machine learning involves teaching computers to learn from and make predictions on data. It is a subset of artificial intelligence (AI) that focuses on developing algorithms and statistical models to enable machines to improve their performance on specific tasks over time.'),
  0.32409441471099854),
 (Document(metadata={'source': 'data/doc_0.txt'}, page_content='Machine learning involves teaching computers to learn from and make predictions on data. It is a subset of artificial intelligence (AI) that focuses on developing algorithms and statistical models to enable

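### Understanding the similarity score
The similarity score represents how closely related a document chunk is to your query. Its interpretation depends on the distance
metric used.

ChromaDB default: L2 distance (Chroma's `l2` space reports the squared Euclidean distance)

- Lower scores = MORE similar (closer in vector space)
- Score of 0 = identical vectors
- Typical range for unit-length embeddings: 0 to 2 for Euclidean distance, 0 to 4 once squared (unnormalized embeddings can score higher)


Cosine similarity (if configured):

- Higher scores = MORE similar
- Range: -1 to 1 (1 meaning the vectors point in the same direction)
- When a Chroma collection is configured for cosine, it reports cosine distance (1 - cosine similarity), so lower scores still mean more similar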
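
To make the two metrics concrete, the following sketch computes Euclidean distance, squared Euclidean distance, and cosine similarity for a few toy two-dimensional vectors. The vectors are assumed values chosen for easy arithmetic, not real embeddings, and the helper functions are plain Python rather than anything from Chroma.

```python
import math

def squared_l2(a, b):
    """Sum of squared coordinate differences (no square root)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy unit-length vectors (assumed values, not real embeddings).
query_vec = [1.0, 0.0]
candidates = {
    "same": [1.0, 0.0],
    "orthogonal": [0.0, 1.0],
    "opposite": [-1.0, 0.0],
}

scores = {}
for name, vec in candidates.items():
    sq = squared_l2(query_vec, vec)
    scores[name] = (math.sqrt(sq), sq, cosine_similarity(query_vec, vec))
    print(f"{name:<10} L2={scores[name][0]:.3f}  squared L2={sq:.3f}  cosine={scores[name][2]:.3f}")
```

For unit-length vectors the squared L2 distance equals `2 - 2 * cosine_similarity`, so both metrics rank candidates in the same order even though the numbers run in opposite directions.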

In [85]:
### Initialize LLM, RAG Chain, Prompt Template, Query the RAG System
from langchain_ollama import ChatOllama

llm = ChatOllama(
    model="llama3.1:8b",    
    temperature=0.2,
    reasoning=False
)

In [86]:
test_response = llm.stream("What are Large Language Models")

for text in test_response:
    print(text.content, end='')

Large Language Models (LLMs) are a type of artificial intelligence (AI) designed to process and generate human-like language. They're a crucial component of natural language processing (NLP), which enables computers to understand, interpret, and respond to human language.

**Key characteristics:**

1. **Scale**: LLMs are trained on massive datasets, often in the order of tens or hundreds of gigabytes.
2. **Complexity**: These models have millions or even billions of parameters, allowing them to capture subtle patterns and relationships in language.
3. **Depth**: LLMs typically consist of multiple layers, each processing different aspects of the input data.

**How they work:**

1. **Training**: LLMs are trained on a large corpus of text data using various algorithms, such as masked language modeling or next sentence prediction.
2. **Self-supervised learning**: During training, the model learns to predict missing words or entire sentences in the input data.
3. **Fine-tuning**: After init

## Modern RAG Chain

In [87]:
from langchain.chains import create_retrieval_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain

In [88]:
## Convert vectorstore to retriever
retriever = vectore_store.as_retriever(
    search_kwargs={"k": 3}  # Retrieve top 3 chunks
)
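
Conceptually, retrieving the top `k` chunks means scoring every stored vector against the query vector and keeping the `k` closest. The sketch below shows that idea with a brute-force scan over a toy index. The `top_k` helper, the labels, and the vectors are all hypothetical; a real vector store typically uses an approximate nearest-neighbor index instead of scanning everything.

```python
import heapq

def top_k(query_vec, indexed, k):
    """Return the k (distance, label) pairs closest to query_vec by squared L2."""
    def dist(vec):
        return sum((q - v) ** 2 for q, v in zip(query_vec, vec))
    return heapq.nsmallest(k, ((dist(vec), label) for label, vec in indexed))

# Toy index of (label, vector) pairs with made-up coordinates.
toy_index = [
    ("ml", [0.9, 0.1]),
    ("dl", [0.7, 0.3]),
    ("embeddings", [0.1, 0.9]),
    ("cooking", [-0.8, 0.2]),
]

hits = top_k([1.0, 0.0], toy_index, k=3)
print([label for _, label in hits])
```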

In [89]:
## Create a prompt template
system_prompt = """You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If you don't know the answer, just say that you don't know.
Use three sentences maximum and keep the answer concise.

Context: {context}
"""

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("user", "{input}")
])

In [90]:
## Create document chain
document_chain = create_stuff_documents_chain(llm, prompt)
document_chain

RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableLambda(format_docs)
}), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
| ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks.\nUse the following pieces of retrieved context to answer the question.\nIf you don't know the answer, just say that you don't know.\nUse three sentences maximum and keep the answer concise.\n\nContext: {context}\n"), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], input_types={}, partial_variables={}, template='{input}'), additional_kwargs={})])
| ChatOllama(model='llama3.1:8b', reasoning=False, temperature=0.2)
| StrOutputParser(), kwargs={}, config={'run_name': 'stuff_documents

**This chain**
- Takes retrieved documents
- "Stuffs" them into the prompt's {context} placeholder
- Sends the complete prompt to the LLM
- Returns the LLM's response
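
The "stuffing" step in the list above can be sketched without any framework: join the retrieved texts and substitute them into the template. Everything here is a hypothetical stand-in (`Doc`, `SYSTEM_TEMPLATE`, `stuff_documents`), and the blank-line separator is an assumption about how the documents are joined, not a statement of the library's internals. The call to the LLM is left out.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str

SYSTEM_TEMPLATE = "Use the context to answer the question.\n\nContext: {context}"

def stuff_documents(docs, question):
    """Join all document texts and place them into the system message."""
    context = "\n\n".join(doc.page_content for doc in docs)
    return [
        ("system", SYSTEM_TEMPLATE.format(context=context)),
        ("user", question),
    ]

messages = stuff_documents(
    [Doc("Embeddings are numerical representations."),
     Doc("Deep learning uses neural networks.")],
    "What are embeddings?",
)
for role, content in messages:
    print(f"[{role}] {content}")
```

Because every retrieved document goes into a single prompt, this approach only works while the combined text fits within the model's context window.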


In [100]:
### Create the final RAG Chain
rag_chain = create_retrieval_chain(retriever, document_chain)
result = rag_chain.invoke({"input":"What can you tell me about embedding?"})

result
    

{'input': 'What can you tell me about embedding?',
 'context': [Document(metadata={'source': 'data/doc_2.txt'}, page_content='Embeddings are numerical representations of words or phrases that capture their meaning and context. They are used in various applications, such as information retrieval, sentiment analysis, and recommendation systems. Embeddings can be learned from large text corpora using deep learning techniques.'),
  Document(metadata={'source': 'data/doc_2.txt'}, page_content='Embeddings are numerical representations of words or phrases that capture their meaning and context. They are used in various applications, such as information retrieval, sentiment analysis, and recommendation systems. Embeddings can be learned from large text corpora using deep learning techniques.'),
  Document(metadata={'source': 'data/doc_2.txt'}, page_content='Embeddings are numerical representations of words or phrases that capture their meaning and context. They are used in various applications

## Create RAG Chain Alternative - Using LCEL (LangChain Expression Language)

In [96]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableParallel

# Create custom prompt
custom_prompt = ChatPromptTemplate.from_template(
    """
    Use the following context to answer the question.
    If you don't know the answer based on the context, say you don't know.
    Provide specific details from the context to support your answer.
    
    Context:
    {context}
    
    Question: {question}
    
    Answer:
    """ 
)

In [97]:
## Format the output documents for the prompt
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

In [99]:
## Build the chain using LCEL
rag_chain_lcel = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough()
    }
    | custom_prompt
    | llm
    | StrOutputParser()
)

result = rag_chain_lcel.invoke("What is deep learning")
result

'Deep learning is a type of machine learning that uses neural networks with multiple layers to learn and extract features from complex data, such as images, sound, and text. \n\nThis answer is supported by the context, which repeatedly states that "Deep learning is a type of machine learning...". This indicates that deep learning is a specific subset or category within the broader field of machine learning.'
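
To show what the `|` composition and the leading dict are doing, here is a toy re-creation in plain Python. It is a rough mental model under stated assumptions, not how LangChain implements runnables: `Step`, `parallel`, and the fake retriever and model are all hypothetical, and the "model" simply echoes a line of the prompt instead of generating text.

```python
class Step:
    """Hypothetical mini-runnable: wraps a function and supports `|`."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other):
        # `a | b` builds a new step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))

def parallel(**branches):
    """Run every branch on the same input and collect the results in a dict."""
    return Step(lambda value: {key: step.invoke(value) for key, step in branches.items()})

fake_retriever = Step(lambda question: ["Deep learning uses neural networks with multiple layers."])
join_docs = Step(lambda docs: "\n\n".join(docs))
passthrough = Step(lambda value: value)
fill_prompt = Step(lambda d: f"Context:\n{d['context']}\n\nQuestion: {d['question']}\n\nAnswer:")
echo_llm = Step(lambda text: "ANSWER BASED ON: " + text.splitlines()[1])  # no real generation

toy_chain = (
    parallel(context=fake_retriever | join_docs, question=passthrough)
    | fill_prompt
    | echo_llm
)
toy_answer = toy_chain.invoke("What is deep learning")
print(toy_answer)
```

The dict at the head of the chain plays the role of `parallel` here: the same question is sent to the retrieval branch and passed through unchanged, and both results arrive at the prompt together.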