### A RAG Application

In [10]:
import os
from dotenv import load_dotenv
## langchain imports
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_openai import OpenAIEmbeddings
from langchain.schema import Document

## vectorstores
from langchain_community.vectorstores import Chroma

## utility imports
import numpy as np
from typing import List

# RAG Architecture Overview

RAG (Retrieval-Augmented Generation) Architecture:

1. Document Loading: Load documents from various sources
2. Document Splitting: Break documents into smaller chunks
3. Embedding Generation: Convert chunks into vector representations
4. Vector Storage: Store embeddings in ChromaDB
5. Query Processing: Convert user query to embedding
6. Similarity Search: Find relevant chunks from vector store
7. Context Augmentation: Combine retrieved chunks with query
8. Response Generation: LLM generates answer using context

Benefits of RAG:
- Reduces hallucinations
- Provides up-to-date information
- Allows citing sources
- Works with domain-specific knowledge

### 1.Document Loading

In [11]:
# Create sample documents for a RAG application
sample_docs = [
    """
    Remote Work Policy
    
    Employees may work remotely up to 3 days per week with manager approval.
    Core working hours are 10:00–16:00 local time, and VPN must be used for 
    all company data access. Internet reimbursement is capped at $40/month.
    """,

    """
    Refund & Return Policy
    
    Customers can return items within 30 days if unused and in original packaging. 
    Refunds are processed within 7 business days after inspection and issued 
    to the original payment method. Some items like gift cards and perishables 
    are non-returnable.
    """,

    """
    Aurora X1 Headphones — Specifications
    
    Drivers: 40mm neodymium  
    Frequency Response: 20Hz–40kHz  
    Connectivity: Bluetooth 5.3 + 3.5mm wired  
    Battery: 75h (ANC off), 45h (ANC on)  
    Charging: USB-C, 10 min fast charge → 5h playback  
    """,

    """
    Payments API — Create Charge
    
    Endpoint: POST /v1/charges  
    Required fields: amount (integer), currency (string), source (string).  
    Optional: capture (boolean).  
    Response: returns a charge ID and status (e.g., "succeeded").  
    Use Idempotency-Key header to prevent duplicate charges.  
    """,

    """
    SQL Joins — Cheatsheet
    
    INNER JOIN: rows with matches in both tables.  
    LEFT JOIN: all rows from left + matches from right.  
    RIGHT JOIN: all rows from right + matches from left.  
    FULL OUTER JOIN: union of matched and unmatched rows.  
    CROSS JOIN: Cartesian product.  
    """
]
sample_docs

['\n    Remote Work Policy\n\n    Employees may work remotely up to 3 days per week with manager approval.\n    Core working hours are 10:00–16:00 local time, and VPN must be used for \n    all company data access. Internet reimbursement is capped at $40/month.\n    ',
 '\n    Refund & Return Policy\n\n    Customers can return items within 30 days if unused and in original packaging. \n    Refunds are processed within 7 business days after inspection and issued \n    to the original payment method. Some items like gift cards and perishables \n    are non-returnable.\n    ',
 '\n    Aurora X1 Headphones — Specifications\n\n    Drivers: 40mm neodymium  \n    Frequency Response: 20Hz–40kHz  \n    Connectivity: Bluetooth 5.3 + 3.5mm wired  \n    Battery: 75h (ANC off), 45h (ANC on)  \n    Charging: USB-C, 10 min fast charge → 5h playback  \n    ',
 '\n    Payments API — Create Charge\n\n    Endpoint: POST /v1/charges  \n    Required fields: amount (integer), currency (string), source (stri

In [None]:
# Sample multi-domain documents for RAG
sample_docs = [
    """
    Remote Work Policy
    
    Employees may work remotely up to 3 days per week with manager approval.
    Core working hours are 10:00–16:00 local time. 
    VPN must be used for all company data access.
    Internet reimbursement is capped at $40/month.
    """,

    """
    Refund & Return Policy
    
    Items may be returned within 30 days if unused and in original packaging. 
    Refunds are processed within 7 business days after inspection. 
    Gift cards and perishable goods are non-returnable.
    """,

    """
    Aurora X1 Headphones — Specifications
    
    Drivers: 40mm neodymium  
    Frequency Response: 20Hz–40kHz  
    Connectivity: Bluetooth 5.3 + 3.5mm wired  
    Battery: 75h (ANC off), 45h (ANC on)  
    """,

    """
    Payments API — Create Charge
    
    POST /v1/charges  
    Required: amount, currency, source.  
    Optional: capture (boolean, default true).  
    Response: JSON with id, amount, currency, status.  
    """,

    """
    SQL Joins — Cheatsheet
    
    INNER JOIN: rows with matches in both tables.  
    LEFT JOIN: all rows from left + matches from right.  
    RIGHT JOIN: all rows from right + matches from left.  
    FULL OUTER JOIN: matched + unmatched from both.  
    """,

    """
    Random Forest — Interview Notes
    
    An ensemble of decision trees using bootstrap samples and feature subsampling.  
    Reduces variance compared to single trees.  
    Key parameters: n_estimators, max_depth, max_features.  
    """,

    """
    Sleep Hygiene Basics
    
    Keep consistent sleep/wake times.  
    Limit caffeine after early afternoon.  
    Make the bedroom cool, dark, and quiet.  
    Reserve bed for sleep and intimacy only.  
    """,

    """
    One-Pot Chickpea Curry Recipe
    
    Ingredients: onion, garlic, ginger, garam masala, turmeric, tomatoes, coconut milk, chickpeas, spinach.  
    Method: Sauté aromatics, add spices, simmer tomatoes, stir in coconut milk + chickpeas, finish with spinach.  
    """,

    """
    Tokyo Two-Day Itinerary
    
    Day 1: Asakusa (Senso-ji), Ueno Park, Akihabara.  
    Day 2: Meiji Shrine, Omotesando, teamLab Planets, Odaiba.  
    Use a Suica card for fast transit.  
    """,

    """
    Incident Runbook — API Latency Spikes
    
    Symptom: P95 latency > 1s for /v1/search.  
    Mitigation: scale pods +1, warm cache with top queries, check DB CPU and cache hit ratio.  
    """,

    """
    Smart Thermostat Quick Start
    
    1) Turn off HVAC power.  
    2) Label wires and mount base plate.  
    3) Connect wires to terminals.  
    4) Restore power and pair with mobile app.  
    """,

    """
    Fabrikam Press Release
    
    Fabrikam announces carbon-neutral shipping for all domestic orders starting Q4.  
    Offsets are verified under the Gold Standard.  
    Customers can track shipment-level carbon impact.  
    """,

    """
    Audit Logging — Admin Guide
    
    Prerequisite: Admin role.  
    Steps: Settings → Security → Audit Logs → Enable.  
    Choose retention (30, 90, 365 days).  
    Events: login, permission changes, exports.  
    """,

    """
    Billing FAQ
    
    Q: Can I change my billing cycle?  
    A: Yes, switch monthly ↔ annual anytime.  
    Q: Do you offer invoices?  
    A: Yes, downloadable PDFs under Billing → Invoices.  
    """,

    """
    Python Retry with Exponential Backoff
    
    import time, random  
    
    def retry(fn, attempts=5, base=0.2, jitter=0.1):  
        for i in range(attempts):  
            try:  
                return fn()  
            except Exception:  
                if i == attempts - 1:  
                    raise  
                sleep = base * (2 ** i) + random.uniform(0, jitter)  
                time.sleep(sleep)  
    """
]


In [None]:
## save sample documents to files
import tempfile
temp_dir=tempfile.mkdtemp()

for i, doc in enumerate(sample_docs):
    with open(f"{temp_dir}/doc_{i}.txt", "w", encoding="utf-8") as f:
        f.write(doc)

print(f"Sample document create in : {temp_dir}")

Sample document create in : C:\Users\moham\AppData\Local\Temp\tmpq_dh14v1


In [None]:
data_dir = "data"
os.makedirs(data_dir, exist_ok=True)

# Save each sample doc inside data/
for i, doc in enumerate(sample_docs):
    file_path = os.path.join(data_dir, f"doc_{i}.txt")
    with open(file_path, "w", encoding="utf-8") as f:
        f.write(doc)
    print(f"✅ Saved: {file_path}")

✅ Saved: data\doc_0.txt
✅ Saved: data\doc_1.txt
✅ Saved: data\doc_2.txt
✅ Saved: data\doc_3.txt
✅ Saved: data\doc_4.txt
✅ Saved: data\doc_5.txt
✅ Saved: data\doc_6.txt
✅ Saved: data\doc_7.txt
✅ Saved: data\doc_8.txt
✅ Saved: data\doc_9.txt
✅ Saved: data\doc_10.txt
✅ Saved: data\doc_11.txt
✅ Saved: data\doc_12.txt
✅ Saved: data\doc_13.txt
✅ Saved: data\doc_14.txt


In [13]:
from langchain_community.document_loaders import DirectoryLoader,TextLoader

# Load documents from directory
loader = DirectoryLoader(
    "data", 
    glob="*.txt", 
    loader_cls=TextLoader,
    loader_kwargs={'encoding': 'utf-8'}
)
documents = loader.load()

print(f"Loaded {len(documents)} documents")
print(f"\nFirst document preview:")
print(documents[0].page_content[:200] + "...")


Loaded 15 documents

First document preview:

    Remote Work Policy

    Employees may work remotely up to 3 days per week with manager approval.
    Core working hours are 10:00–16:00 local time. 
    VPN must be used for all company data acce...


### 2.Initialize text splitter

In [14]:
# Initialize text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,  # Maximum size of each chunk
    chunk_overlap=50,  # Overlap between chunks to maintain context
    length_function=len,
    separators=[" "]  # Hierarchy of separators
)
chunks=text_splitter.split_documents(documents)

print(f"Created {len(chunks)} chunks from {len(documents)} documents")
print(f"\nChunk example:")
print(f"Content: {chunks[0].page_content[:150]}...")
print(f"Metadata: {chunks[0].metadata}")

Created 15 chunks from 15 documents

Chunk example:
Content: Remote Work Policy

    Employees may work remotely up to 3 days per week with manager approval.
    Core working hours are 10:00–16:00 local time. 
 ...
Metadata: {'source': 'data\\doc_0.txt'}


In [15]:
from sentence_transformers import SentenceTransformer
from langchain.embeddings import HuggingFaceEmbeddings
# Load the embedding model
embeddings = HuggingFaceEmbeddings(model_name='all-MiniLM-L6-v2')




  embeddings = HuggingFaceEmbeddings(model_name='all-MiniLM-L6-v2')


In [16]:
from langchain_community.vectorstores import Chroma
## Create a Chromdb vector store
persist_directory="./chroma_db"

## Initialize Chromadb with Open AI embeddings
vectorstore=Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory=persist_directory,
    collection_name="rag_collection"

)

print(f"Vector store created with {vectorstore._collection.count()} vectors")
print(f"Persisted to: {persist_directory}")

Vector store created with 30 vectors
Persisted to: ./chroma_db


In [17]:
query="What are the maximum remote work days allowed per week?"

similar_docs=vectorstore.similarity_search(query,k=3)
similar_docs

[Document(metadata={'source': 'data\\doc_0.txt'}, page_content='Remote Work Policy\n\n    Employees may work remotely up to 3 days per week with manager approval.\n    Core working hours are 10:00–16:00 local time. \n    VPN must be used for all company data access.\n    Internet reimbursement is capped at $40/month.'),
 Document(metadata={'source': 'data\\doc_0.txt'}, page_content='Remote Work Policy\n\n    Employees may work remotely up to 3 days per week with manager approval.\n    Core working hours are 10:00–16:00 local time. \n    VPN must be used for all company data access.\n    Internet reimbursement is capped at $40/month.'),
 Document(metadata={'source': 'data\\doc_12.txt'}, page_content='Audit Logging — Admin Guide\n\n    Prerequisite: Admin role.  \n    Steps: Settings → Security → Audit Logs → Enable.  \n    Choose retention (30, 90, 365 days).  \n    Events: login, permission changes, exports.')]

In [18]:
results_scores=vectorstore.similarity_search_with_score(query,k=3)
results_scores

[(Document(metadata={'source': 'data\\doc_0.txt'}, page_content='Remote Work Policy\n\n    Employees may work remotely up to 3 days per week with manager approval.\n    Core working hours are 10:00–16:00 local time. \n    VPN must be used for all company data access.\n    Internet reimbursement is capped at $40/month.'),
  0.6362245082855225),
 (Document(metadata={'source': 'data\\doc_0.txt'}, page_content='Remote Work Policy\n\n    Employees may work remotely up to 3 days per week with manager approval.\n    Core working hours are 10:00–16:00 local time. \n    VPN must be used for all company data access.\n    Internet reimbursement is capped at $40/month.'),
  0.6362245082855225),
 (Document(metadata={'source': 'data\\doc_12.txt'}, page_content='Audit Logging — Admin Guide\n\n    Prerequisite: Admin role.  \n    Steps: Settings → Security → Audit Logs → Enable.  \n    Choose retention (30, 90, 365 days).  \n    Events: login, permission changes, exports.'),
  1.6702752113342285)]

In [19]:
from langchain_google_genai import ChatGoogleGenerativeAI
llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",   # or "gemini-1.5-pro", "gemini-1.5-flash"
    temperature=0.2
)


In [20]:
response = llm.invoke("Explain retrieval augmented generation (RAG) in simple terms.")
print(response.content)

Imagine you're writing a report, but you need to check facts and find specific information.  Instead of searching the internet yourself and pasting things in, you have a super-smart assistant.  That assistant has access to a huge library of documents.  When you ask it a question, it doesn't just guess the answer; it searches its library, finds the relevant documents, and uses the information from those documents to create your report.

That's RAG in a nutshell.  It's a way for AI to generate text by first retrieving relevant information from a knowledge base (the library) before generating the final output.  This makes the AI more accurate, reliable, and less prone to "hallucinations" (making things up).


In [21]:
from langchain.chains import create_retrieval_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain

In [22]:
## Convert vector store to retriever
retriever=vectorstore.as_retriever(
    search_kwarg={"k":3} ## Retrieve top 3 relevant chunks
)
retriever

VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x00000140055F70E0>, search_kwargs={})

In [23]:
## Create a prompt template
from langchain_core.prompts import ChatPromptTemplate
system_prompt="""You are an assistant for question-answering tasks. 
Use the following pieces of retrieved context to answer the question. 
If you don't know the answer, just say that you don't know. 
Use three sentences maximum and keep the answer concise.

Context: {context}"""

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "{input}")
])

In [24]:
### Create a document chain
from langchain.chains.combine_documents import create_stuff_documents_chain
document_chain=create_stuff_documents_chain(llm,prompt)
document_chain

RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableLambda(format_docs)
}), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
| ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. \nUse the following pieces of retrieved context to answer the question. \nIf you don't know the answer, just say that you don't know. \nUse three sentences maximum and keep the answer concise.\n\nContext: {context}"), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], input_types={}, partial_variables={}, template='{input}'), additional_kwargs={})])
| ChatGoogleGenerativeAI(model='models/gemini-1.5-flash', google_api_key=SecretStr('**********'), temperature=0.2, client=<google.ai.g

In [25]:
### Create The Final RAG Chain
from langchain.chains import create_retrieval_chain
rag_chain=create_retrieval_chain(retriever,document_chain)
rag_chain

RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableBinding(bound=RunnableLambda(lambda x: x['input'])
           | VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x00000140055F70E0>, search_kwargs={}), kwargs={}, config={'run_name': 'retrieve_documents'}, config_factories=[])
})
| RunnableAssign(mapper={
    answer: RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
              context: RunnableLambda(format_docs)
            }), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
            | ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. \nUse the following pieces of retrieved context to answer the question. \nIf you

In [26]:
response=rag_chain.invoke({"input":"What events are captured in audit logs?"})

In [27]:
response

{'input': 'What events are captured in audit logs?',
 'context': [Document(metadata={'source': 'data\\doc_12.txt'}, page_content='Audit Logging — Admin Guide\n\n    Prerequisite: Admin role.  \n    Steps: Settings → Security → Audit Logs → Enable.  \n    Choose retention (30, 90, 365 days).  \n    Events: login, permission changes, exports.'),
  Document(metadata={'source': 'data\\doc_12.txt'}, page_content='Audit Logging — Admin Guide\n\n    Prerequisite: Admin role.  \n    Steps: Settings → Security → Audit Logs → Enable.  \n    Choose retention (30, 90, 365 days).  \n    Events: login, permission changes, exports.'),
  Document(metadata={'source': 'data\\doc_9.txt'}, page_content='Incident Runbook — API Latency Spikes\n\n    Symptom: P95 latency > 1s for /v1/search.  \n    Mitigation: scale pods +1, warm cache with top queries, check DB CPU and cache hit ratio.'),
  Document(metadata={'source': 'data\\doc_9.txt'}, page_content='Incident Runbook — API Latency Spikes\n\n    Symptom: P

In [28]:
response["answer"]

'Audit logs capture login events, permission changes, and exports.  To enable them, an admin user must navigate to Settings → Security → Audit Logs and enable the feature.  Retention periods of 30, 90, or 365 days can be selected.'