*#### RAG (Retrieval-Augmented Generation) is a powerful technique that combines the capabilities of LLMs with external knowledge retrieval.*

A typical RAG pipeline contains:


✅Collect documents → Load data (PDFs, Word, CSVs, APIs, etc.).

✅Preprocess → Clean, normalize, and enrich with metadata.

✅Chunking → Split into small overlapping pieces (e.g., 500–1000 tokens).

✅Embedding → Convert each chunk into dense vectors using an embedding model.

✅Vector Store (Indexing) → Store embeddings + metadata in a vector database (FAISS, Pinecone, Weaviate, Chroma, etc.).

✅User Query → Convert query into an embedding vector.

✅Retrieval (via Vector Store) → Search the vector store to get top-k most similar chunks.

✅Augmentation → Add retrieved chunks as context to the query.

✅Generation → Pass augmented input into an LLM (Groq, OpenAI, Anthropic, etc.) to generate the grounded answer.

✅Post-processing → Rerank, format, or cite sources from vector store metadata.


*Collect → Preprocess → Chunk → Embed → Vector Store → Query → Retrieve → Augment → Generate → Post-process*
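To make the flow above concrete, here is a dependency-free toy version of the Chunk → Embed → Store → Retrieve → Augment steps. It is a sketch under loud assumptions: word counts stand in for a real embedding model, a Python list stands in for the vector store, one sentence is one chunk, and generation is left out (the code only builds the augmented prompt that would be sent to an LLM). The names `embed`, `cosine`, `build_index`, `retrieve` and `augment` are illustrative, not LangChain APIs.

```python
import math
import re
from collections import Counter

# Toy stand-ins (assumptions, not the notebook's components):
# word counts replace the embedding model, and a list replaces the vector store.


def embed(text: str) -> Counter:
    """'Embed' text as lowercase word counts (a sparse bag-of-words vector)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(docs: list[str]) -> list[tuple[str, Counter]]:
    # Chunk: one chunk per sentence; Embed + Store: keep (chunk, vector) pairs
    chunks = [s.strip() for d in docs for s in d.split(". ") if s.strip()]
    return [(c, embed(c)) for c in chunks]


def retrieve(index: list[tuple[str, Counter]], query: str, k: int = 2) -> list[str]:
    # Query + Retrieve: embed the query and return the k most similar chunks
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def augment(query: str, context: list[str]) -> str:
    # Augment: put the retrieved chunks into the prompt; Generate would send this to an LLM
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"


docs = [
    "LangChain is a framework for LLM applications. It connects LLMs to external data",
    "Chroma is an open-source vector database. It stores embeddings and metadata",
]
index = build_index(docs)
top = retrieve(index, "What is a vector database?", k=1)
print(top)  # ['Chroma is an open-source vector database']
print(augment("What is a vector database?", top))
```

Swapping the word-count vectors for dense embeddings and the list for a vector database gives the structure that the rest of this notebook builds with LangChain and Chroma.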

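* LangChain: a framework for developing applications powered by language models

* Chroma DB: an open-source vector database for storing and retrieving embeddings

* Groq: an inference platform that runs LLMs on its LPU (Language Processing Unit) hardware with low latency and offers a free tier; it is used here for LLM inference, while the embeddings in this notebook come from a Hugging Face model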

In [None]:
import os
from dotenv import load_dotenv
load_dotenv()
# Access keys
groq_api_key = os.getenv("GROQ_AI_API")
hf_api_key = os.getenv("HUGGINGFACE_API_KEY")
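`os.getenv` returns `None` when a variable is missing, so a typo in the `.env` file only surfaces later as an authentication error from the API. A small fail-fast helper (hypothetical, standard library only) makes that failure explicit:

```python
import os


def require_env(name: str) -> str:
    """Return the environment variable `name`, or raise a clear error if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing environment variable {name!r}; add it to your .env file")
    return value
```

Usage, assuming the same variable name as above: `groq_api_key = require_env("GROQ_AI_API")`.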

In [2]:
# langchain imports
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader

# free llm and embedding models
from langchain_groq import ChatGroq # for llm
from langchain_huggingface import HuggingFaceEmbeddings # for embeddings

# vector store
from langchain_community.vectorstores import Chroma

# utility imports
import numpy as np
from typing import List

In [None]:
# 1. Document loading
from langchain_community.document_loaders import DirectoryLoader

# Load every .txt file in the directory as a LangChain Document

loader = DirectoryLoader(
    path="data/text_files",
    glob="*.txt",
    loader_cls=TextLoader,
    loader_kwargs={"encoding": "utf8"}
)

documents = loader.load()

print(f"Number of documents loaded: {len(documents)}")
print(f"First document content: {documents[0].page_content[0:200]}")

Number of documents loaded: 3
First document content: Artificial Intelligence (AI)

Artificial Intelligence is the science and engineering of creating machines or systems that mimic human intelligence. It encompasses reasoning, problem-solving, decision-


In [None]:
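# 2. Document/text splitting

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,      # measured in characters, because length_function=len
    chunk_overlap=50,    # characters repeated between consecutive chunks
    length_function=len,
    # Separators are tried in order; "" (split anywhere) is the catch-all and must come last,
    # otherwise any separator listed after it is never used
    separators=["\n\n", "\n", ". ", "\t", ""]
)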

chunks = text_splitter.split_documents(documents)

print(f"Created chunks {len(chunks)} from {len(documents)} documents")  
print(f"First chunk: {chunks[0].page_content[0:200]}")
print(f"Last chunk: {chunks[-1].page_content[0:200]}")

Created chunks 20 from 3 documents
First chunk: Artificial Intelligence (AI)

Artificial Intelligence is the science and engineering of creating machines or systems that mimic human intelligence. It encompasses reasoning, problem-solving, decision-
Last chunk: As the ecosystem grows, LangChain continues to integrate with cutting-edge vector stores, model providers, and monitoring tools—making it the backbone of modern LLM application development.
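`chunk_size` and `chunk_overlap` are easiest to see on a plain sliding window. The sketch below is *not* how `RecursiveCharacterTextSplitter` works internally (that splitter first breaks text on the listed separators and then merges the pieces up to `chunk_size`, so its boundaries tend to fall on paragraphs or sentences); it only illustrates what a fixed character budget with overlap means. `sliding_window_chunks` is a hypothetical helper.

```python
def sliding_window_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Fixed-size character chunks where consecutive chunks share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(sliding_window_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3))
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk repeats the last three characters of the previous one, which is what keeps a sentence that straddles a boundary retrievable from either side.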


In [14]:
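# 3. Embedding models

# HuggingFaceEmbeddings is already imported from langchain_huggingface above;
# re-importing it from langchain_community would replace it with the deprecated class.

# Simple Hugging Face embeddings
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"  # lightweight & fast, 384-dimensional vectors
)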


In [15]:
# Set up Chroma vector store and index the document chunks

persist_directory = "data/chroma_db"

vector_store = Chroma.from_documents(
    documents=chunks,
    persist_directory=persist_directory,
    embedding=embeddings,
    collection_name="RAG_collection"
)

print(f"Vector store created at: {persist_directory}")
print(f"Vector Store created with {vector_store._collection.count()} vectors")

Vector store created at: data/chroma_db
Vector Store created with 20 vectors


In [None]:
# Testing my vector store - Basic level of similarity search
query = "Where do we use Langchain "
results = vector_store.similarity_search(query, k=3)
results

[Document(metadata={'source': 'data\\text_files\\LangChain for LLM Applications.txt'}, page_content='As the ecosystem grows, LangChain continues to integrate with cutting-edge vector stores, model providers, and monitoring tools—making it the backbone of modern LLM application development.'),
 Document(metadata={'source': 'data\\text_files\\LangChain for LLM Applications.txt'}, page_content='📘 LangChain for LLM Applications\nIntroduction to LangChain\n\nLangChain is an open-source framework designed to simplify the development of applications powered by Large Language Models (LLMs). It allows developers to connect LLMs with external data sources, APIs, and computational tools, enabling the creation of production-ready AI solutions.'),
 Document(metadata={'source': 'data\\text_files\\LangChain for LLM Applications.txt'}, page_content='Instead of interacting with LLMs in isolation, LangChain provides modular components that allow chaining of prompts, memory, agents, and external integrat

In [22]:
# Advanced Similarity Search with score

query = "Where do we use Langchain "
results = vector_store.similarity_search_with_score(query, k=3)
results

[(Document(metadata={'source': 'data\\text_files\\LangChain for LLM Applications.txt'}, page_content='As the ecosystem grows, LangChain continues to integrate with cutting-edge vector stores, model providers, and monitoring tools—making it the backbone of modern LLM application development.'),
  0.7059235572814941),
 (Document(metadata={'source': 'data\\text_files\\LangChain for LLM Applications.txt'}, page_content='📘 LangChain for LLM Applications\nIntroduction to LangChain\n\nLangChain is an open-source framework designed to simplify the development of applications powered by Large Language Models (LLMs). It allows developers to connect LLMs with external data sources, APIs, and computational tools, enabling the creation of production-ready AI solutions.'),
  0.8082253932952881),
 (Document(metadata={'source': 'data\\text_files\\LangChain for LLM Applications.txt'}, page_content='Instead of interacting with LLMs in isolation, LangChain provides modular components that allow chaining 

# Similarity Metrics in ChromaDB

#### Understanding Similarity Score

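## L2 Distance (Squared Euclidean) - Default
- **Definition**: Chroma's default `l2` space returns the *squared* Euclidean distance between vectors, $\sum_i (a_i - b_i)^2$.
- **Interpretation**: 
  - Lower scores = more similar.
  - 0 = identical.
  - For unit-length embeddings the range is 0 to 4, and 2 means the vectors are orthogonal; unnormalized embeddings can score higher.
- **Use Case**: Good for clustering or nearest-neighbor searches.

## Cosine Distance (If Configured)
- **Definition**: Based on the cosine of the angle between vectors, so it ignores magnitude. Chroma's `cosine` space reports cosine *distance*, i.e. 1 − cosine similarity.
- **Interpretation**: 
  - Lower scores = more similar (cosine similarity itself is higher-is-better).
  - 0 = same direction (cosine similarity 1).
  - Range: 0 to 2 (cosine similarity −1 to 1).
- **Use Case**: Ideal for text analysis where direction matters more than length.
- **Configuration**: set when the collection is created, e.g. `Chroma.from_documents(..., collection_metadata={"hnsw:space": "cosine"})`.

## Key Differences
- **L2**: lower-is-better, sensitive to vector magnitude.
- **Cosine**: also lower-is-better when reported as a distance, direction-only.
- ChromaDB defaults to L2; cosine (and inner product, `ip`) can be configured per collection.
- For unit-normalized embeddings both metrics rank results identically, because squared L2 = 2 × cosine distance.

## Practical Notes
- Choose the metric to match how the embedding model was trained (cosine is the usual choice for sentence embeddings).
- Scores depend on both the metric and the embedding model, so calibrate any relevance cutoff on your own data instead of reusing a fixed threshold.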
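For unit-length vectors the two metrics are tied together by $\lVert a-b\rVert^2 = \lVert a\rVert^2 + \lVert b\rVert^2 - 2\,a\cdot b = 2 - 2\cos\theta$, so squared L2 is twice the cosine distance. The check below confirms this numerically on random 384-dimensional unit vectors (the output dimension of all-MiniLM-L6-v2), then converts the two scores printed earlier into cosine similarities. That conversion is only valid *if* the stored embeddings are unit-normalized, which is an assumption here, not something the notebook verifies.

```python
import numpy as np


def squared_l2(a: np.ndarray, b: np.ndarray) -> float:
    """Squared Euclidean distance, as in Chroma's default 'l2' space."""
    return float(np.sum((a - b) ** 2))


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """1 - cosine similarity, as in Chroma's 'cosine' space."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
a, b = rng.normal(size=384), rng.normal(size=384)
a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)  # unit-normalize both vectors
print(squared_l2(a, b), 2 * cosine_distance(a, b))   # equal up to floating-point rounding

# Assumption: unit-normalized embeddings, so cosine similarity = 1 - squared_l2 / 2
for score in (0.7059235572814941, 0.8082253932952881):
    print(round(1 - score / 2, 3))  # 0.647, then 0.596
```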

In [26]:
# Initializing the LLM

from langchain_groq import ChatGroq

llm = ChatGroq(
    model="llama-3.1-8b-instant",
    api_key=groq_api_key,
    temperature=0.2
)

In [None]:
# testing llm
test_response = llm.invoke("How many states are in india including ut")  # .predict() is deprecated
print(test_response.content)  # invoke() returns an AIMessage; .content holds the text



There are 28 states in India. 

However, if you are asking about the total number of union territories (UTs) in India, there are 8 union territories. 

So, the total number of states and union territories in India is 28 (states) + 8 (union territories) = 36.

Here's the list of states and union territories in India:

**States:**

1. Andhra Pradesh
2. Arunachal Pradesh
3. Assam
4. Bihar
5. Chhattisgarh
6. Goa
7. Gujarat
8. Haryana
9. Himachal Pradesh
10. Jharkhand
11. Karnataka
12. Kerala
13. Madhya Pradesh
14. Maharashtra
15. Manipur
16. Meghalaya
17. Mizoram
18. Nagaland
19. Odisha
20. Punjab
21. Rajasthan
22. Sikkim
23. Tamil Nadu
24. Telangana
25. Tripura
26. Uttar Pradesh
27. Uttarakhand
28. West Bengal

**Union Territories:**

1. Andaman and Nicobar Islands
2. Chandigarh
3. Dadra and Nagar Haveli and Daman and Diu
4. Delhi
5. Jammu and Kashmir
6. Ladakh
7. Lakshadweep
8. Puducherry


#### Modern RAG Chain

In [29]:
# convert vector store to retriever

retriever = vector_store.as_retriever(
    search_kwargs={"k": 3} #retrieve top 3 relevant chunks
)

retriever

VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x00000239C08BB4D0>, search_kwargs={'k': 3})

In [30]:
# Create a prompt template

from langchain_core.prompts import ChatPromptTemplate

system_prompt="""You are an assistant for question-answering tasks. 
Use the following pieces of retrieved context to answer the question. 
If you don't know the answer, just say that you don't know. 
Use three sentences maximum and keep the answer concise.

Context: {context}"""

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "{input}")
])

##### What is create_stuff_documents_chain?
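create_stuff_documents_chain creates a chain that "stuffs" (inserts) all retrieved documents into a single prompt and sends it to the LLM. It's called "stuff" because it literally stuffs all the documents into the context window at once, so their combined length must fit in the model's context window. With k=3 chunks of at most 500 characters each, that limit is not a concern here.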

In [None]:
### Create a document chain

from langchain.chains.combine_documents import create_stuff_documents_chain
document_chain = create_stuff_documents_chain(llm, prompt)
document_chain

RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableLambda(format_docs)
}), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
| ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. \nUse the following pieces of retrieved context to answer the question. \nIf you don't know the answer, just say that you don't know. \nUse three sentences maximum and keep the answer concise.\n\nContext: {context}"), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], input_types={}, partial_variables={}, template='{input}'), additional_kwargs={})])
| ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x00000239ED279940>, async_client=<groq.resources.chat.comple

This chain:

- Takes retrieved documents
- "Stuffs" them into the prompt's {context} placeholder
- Sends the complete prompt to the LLM
- Returns the LLM's response
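Stuffing itself is just string assembly. This sketch joins the retrieved texts and fills a prompt that has `{context}` and `{input}` placeholders, mirroring the prompt defined earlier. The helper name is hypothetical, the `"\n\n"` separator is an assumption believed to match LangChain's default `document_separator`, and no LLM is called.

```python
def stuff_documents(docs: list[str], template: str, question: str, separator: str = "\n\n") -> str:
    """Join every document into one context string and fill the prompt template."""
    context = separator.join(docs)  # all documents go in at once: the "stuff" strategy
    return template.format(context=context, input=question)


template = "Answer using the context.\n\nContext: {context}\n\nQuestion: {input}"
prompt_text = stuff_documents(["Doc A text.", "Doc B text."], template, "What is A?")
print(prompt_text)
```

Because every document lands in one prompt, cost and latency grow with k and chunk size; other LangChain strategies (such as map-reduce) exist for when the documents do not fit.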

#### What is create_retrieval_chain?
create_retrieval_chain is a function that combines a retriever (which fetches relevant documents) with a document chain (which processes those documents with an LLM) to create a complete RAG pipeline.
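Conceptually the retrieval chain is function composition: look up context for `input`, pass both to the document chain, and return everything in one dict. The plain-Python sketch below reproduces that data flow with hypothetical stand-ins (`toy_retriever`, `toy_document_chain`) in place of the Chroma retriever and the LLM; the real `create_retrieval_chain` builds the same flow out of LangChain runnables (`RunnableAssign` steps), as its printed representation in the next cell shows.

```python
from typing import Callable


def make_retrieval_chain(
    retriever: Callable[[str], list[str]],
    document_chain: Callable[[dict], str],
) -> Callable[[dict], dict]:
    """Compose a retriever and a document chain into one dict-in, dict-out callable."""
    def invoke(inputs: dict) -> dict:
        context = retriever(inputs["input"])                     # 1. fetch relevant chunks
        answer = document_chain({**inputs, "context": context})  # 2. stuff them and "generate"
        return {**inputs, "context": context, "answer": answer}  # keys match rag_chain's output
    return invoke


# Hypothetical stand-ins for the Chroma retriever and the LLM-backed document chain
corpus = ["LangChain is a framework for LLM apps.", "Chroma stores embeddings."]


def toy_retriever(question: str) -> list[str]:
    # Return documents whose first word appears in the question
    return [doc for doc in corpus if doc.split()[0].lower() in question.lower()]


def toy_document_chain(inputs: dict) -> str:
    # A real document chain would send the stuffed prompt to the LLM
    return " ".join(inputs["context"]) or "I don't know."


chain = make_retrieval_chain(toy_retriever, toy_document_chain)
result = chain({"input": "What is LangChain?"})
print(result["answer"])  # LangChain is a framework for LLM apps.
```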

In [32]:
### Create The Final RAG Chain

from langchain.chains import create_retrieval_chain
rag_chain = create_retrieval_chain(retriever, document_chain)
rag_chain

RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableBinding(bound=RunnableLambda(lambda x: x['input'])
           | VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x00000239C08BB4D0>, search_kwargs={'k': 3}), kwargs={}, config={'run_name': 'retrieve_documents'}, config_factories=[])
})
| RunnableAssign(mapper={
    answer: RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
              context: RunnableLambda(format_docs)
            }), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
            | ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. \nUse the following pieces of retrieved context to answer the question. \n

In [None]:
# testing the RAG chain for demo query
response = rag_chain.invoke({"input": "What is Langchain?"})
response

{'input': 'What is Langchain?',
 'context': [Document(metadata={'source': 'data\\text_files\\LangChain for LLM Applications.txt'}, page_content='📘 LangChain for LLM Applications\nIntroduction to LangChain\n\nLangChain is an open-source framework designed to simplify the development of applications powered by Large Language Models (LLMs). It allows developers to connect LLMs with external data sources, APIs, and computational tools, enabling the creation of production-ready AI solutions.'),
  Document(metadata={'source': 'data\\text_files\\LangChain for LLM Applications.txt'}, page_content='As the ecosystem grows, LangChain continues to integrate with cutting-edge vector stores, model providers, and monitoring tools—making it the backbone of modern LLM application development.'),
  Document(metadata={'source': 'data\\text_files\\LangChain for LLM Applications.txt'}, page_content='Instead of interacting with LLMs in isolation, LangChain provides modular components that allow chaining of 

In [37]:
print(response)

{'input': 'What is Langchain?', 'context': [Document(metadata={'source': 'data\\text_files\\LangChain for LLM Applications.txt'}, page_content='📘 LangChain for LLM Applications\nIntroduction to LangChain\n\nLangChain is an open-source framework designed to simplify the development of applications powered by Large Language Models (LLMs). It allows developers to connect LLMs with external data sources, APIs, and computational tools, enabling the creation of production-ready AI solutions.'), Document(metadata={'source': 'data\\text_files\\LangChain for LLM Applications.txt'}, page_content='As the ecosystem grows, LangChain continues to integrate with cutting-edge vector stores, model providers, and monitoring tools—making it the backbone of modern LLM application development.'), Document(metadata={'source': 'data\\text_files\\LangChain for LLM Applications.txt'}, page_content='Instead of interacting with LLMs in isolation, LangChain provides modular components that allow chaining of promp

In [38]:
response['answer']


'LangChain is an open-source framework designed to simplify the development of applications powered by Large Language Models (LLMs). It connects LLMs with external data sources, APIs, and computational tools. This enables the creation of production-ready AI solutions.'
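The pipeline's post-processing step mentions citing sources from vector store metadata. Each retrieved `Document` carries a `source` path in its `metadata`, so one hedged sketch of that step de-duplicates those paths and appends the file names to the answer. `Doc` and `cite_sources` are hypothetical; `Doc` only mimics the two attributes used, so real `Document` objects from `response['context']` should work the same way by duck typing.

```python
from dataclasses import dataclass, field
from pathlib import PureWindowsPath


@dataclass
class Doc:  # hypothetical stand-in for LangChain's Document
    page_content: str
    metadata: dict = field(default_factory=dict)


def cite_sources(answer: str, docs: list) -> str:
    """Append a de-duplicated list of source file names to the answer."""
    seen = []
    for d in docs:
        src = d.metadata.get("source", "unknown")
        if src not in seen:  # keep the retrieval order, drop repeats
            seen.append(src)
    # PureWindowsPath accepts both '\\' and '/' separators, matching the paths shown above
    names = [PureWindowsPath(s).name for s in seen]
    return answer + "\n\nSources: " + ", ".join(names)


docs = [
    Doc("chunk one", {"source": "data\\text_files\\LangChain for LLM Applications.txt"}),
    Doc("chunk two", {"source": "data\\text_files\\LangChain for LLM Applications.txt"}),
]
print(cite_sources("LangChain is an open-source framework.", docs))
```

For the response above, `cite_sources(response['answer'], response['context'])` would append `Sources: LangChain for LLM Applications.txt`, since all three retrieved chunks came from that file.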

In [39]:
# Function to query the modern RAG system
def query_rag_modern(question):
    print(f"Question: {question}")
    print("-" * 50)
    
    # Using create_retrieval_chain approach
    result = rag_chain.invoke({"input": question})
    
    print(f"Answer: {result['answer']}")
    print("\nRetrieved Context:")
    for i, doc in enumerate(result['context']):
        print(f"\n--- Source {i+1} ---")
        print(doc.page_content[:200] + "...")
    
    return result

# Test queries
test_questions = [
    "What are the three types of machine learning?",
    "What is deep learning and how does it relate to neural networks?",
    "What are CNNs best used for?"
]

for question in test_questions:
    result = query_rag_modern(question)
    print("\n" + "="*80 + "\n")

Question: What are the three types of machine learning?
--------------------------------------------------
Answer: The three types of machine learning are: 

1. Supervised Learning: Uses labeled data to make predictions.
2. Unsupervised Learning: Finds hidden structures in unlabeled data.
3. Reinforcement Learning: Agents learn by trial-and-error using rewards and penalties.

Retrieved Context:

--- Source 1 ---
Machine Learning (ML)

Machine Learning is a subset of AI where systems learn from data and improve their performance automatically. Instead of being hard-coded with rules, ML models use algorithms to...

--- Source 2 ---
General AI (Strong AI): Hypothetical AI that could perform any intellectual task a human can do.

Superintelligent AI: A future possibility where AI surpasses human intelligence.

Applications: medica...

--- Source 3 ---
Reinforcement Learning: Agents learn by trial-and-error using rewards and penalties (e.g., AlphaGo).

Applications: recommendation systems (