# Building a Retrieval-Augmented Generation (RAG) System

This tutorial demonstrates how to build a RAG system using LangChain and Google's embedding models. RAG combines retrieval of relevant documents with generative AI to provide contextually accurate responses.

**What you'll learn:**
- Document loading and chunking strategies
- Vector embeddings and similarity search
- Building a retrieval system with ChromaDB
- Querying knowledge bases effectively

## Step 1: Installing Dependencies

We need the following packages for our RAG system:
- **langchain-google-genai**: Google's embedding models for vector representations
- **langchain-community**: Community integrations, including the ChromaDB vector store and document loaders
- **langchain-text-splitters**: Text chunking utilities
- **chromadb**: Vector database for storing and searching embeddings (in-memory as used here)
- **python-dotenv**: Loads API keys from a .env file into environment variables

In [1]:
# Uncomment the next line to install the dependencies
#!pip install langchain langchain-community langchain-google-genai langchain-core langchain-text-splitters chromadb python-dotenv

## Step 2: Import Required Libraries

Setting up the core components for our RAG pipeline:
- **Embeddings**: Convert text to numerical vectors for similarity search
- **Vector Store**: Database to store and retrieve document embeddings
- **Text Splitters**: Break large documents into manageable chunks
- **Document Loaders**: Read and process text files

In [None]:
# Import essential libraries for RAG system
from langchain_google_genai import GoogleGenerativeAIEmbeddings  # Google's embedding model
from langchain_community.vectorstores import Chroma             # Vector database
from langchain_core.documents import Document                   # Document structure
from langchain_text_splitters import CharacterTextSplitter     # Text chunking
from langchain_community.document_loaders import TextLoader    # File loading
from dotenv import load_dotenv                                  # Environment variables
import os

load_dotenv()  # Load variables such as GOOGLE_API_KEY from a local .env file

True
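
Since the embedding model used later reads GOOGLE_API_KEY from the environment, it can help to fail early with a clear message when the key is missing. The helper below is a minimal sketch; `require_env` is a hypothetical name introduced for illustration and is not part of LangChain or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in the shell."
        )
    return value


# Demo with a placeholder value so the sketch runs anywhere (assumption:
# in the notebook, load_dotenv() has already populated the real key).
os.environ.setdefault("GOOGLE_API_KEY", "placeholder-for-demo")
print("Key found, length:", len(require_env("GOOGLE_API_KEY")))
```

In the notebook, calling `require_env("GOOGLE_API_KEY")` right after `load_dotenv()` would surface a missing key before any API request is made.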

## Step 3: Document Loading

Loading our knowledge base from a text file. The TextLoader reads the entire file content and creates Document objects that contain both the text and metadata. This forms the foundation of our knowledge base.

In [None]:
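# Load the laptop data from a text file
# TextLoader reads the whole file into a single Document (text plus metadata)
# An explicit encoding avoids garbled characters (such as the ₹ sign) on platforms whose default is not UTF-8
loader = TextLoader("laptops_info.txt", encoding="utf-8")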
raw_docs = loader.load()
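
To make the loading step concrete, here is a rough, standard-library-only sketch of what a text loader does: read a file and wrap its content together with a source path. `SimpleDocument` and `load_text_file` are hypothetical stand-ins for illustration, not LangChain's actual classes. The demo also shows how decoding a UTF-8 file with the wrong codec garbles the ₹ sign.

```python
from dataclasses import dataclass, field
from pathlib import Path
import tempfile


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text_file(path: str, encoding: str = "utf-8") -> list:
    """Read one file and return it as a single-document list."""
    text = Path(path).read_text(encoding=encoding)
    return [SimpleDocument(page_content=text, metadata={"source": path})]


# Demo on a temporary file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = str(Path(tmp) / "sample.txt")
    Path(sample_path).write_text("- Price: ₹50,000\n", encoding="utf-8")
    good = load_text_file(sample_path)                        # decoded as UTF-8
    garbled = load_text_file(sample_path, encoding="cp1252")  # wrong codec

print(good[0].page_content.strip())
print(garbled[0].page_content.strip())
```

The second print shows the rupee sign turned into three unrelated characters, which is the typical symptom of reading a UTF-8 file with a Windows default codec.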


## Step 4: Text Chunking Strategy

Breaking documents into smaller chunks improves retrieval accuracy. We use:
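- **chunk_size=500**: Target maximum of about 500 characters per chunk, small enough to embed as one focused unit
- **chunk_overlap=50**: Up to 50 characters may be repeated between neighboring chunks to preserve context

CharacterTextSplitter splits on blank lines by default and then merges the pieces up to chunk_size, which reduces the chance of losing information at chunk boundaries.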

In [None]:
# Split documents into chunks for better retrieval
# Smaller chunks = more precise retrieval, larger chunks = more context
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = text_splitter.split_documents(raw_docs)
print("Total number of chunks:", len(docs))

print("Printing one of the chunks...")
print(docs[2])


Total number of chunks: 5
Printing one of the chunks...
page_content='3. ASUS TUF Gaming F15
- Price: ₹79,990
- CPU: Intel i5 11400H
- GPU: NVIDIA RTX 3050 (4GB)
- RAM: 16GB DDR4
- Storage: 512GB SSD
- Good for: Deep learning models, parallel processing
- Comments: Rugged build, best performance for the price

4. Acer Aspire 7
- Price: ₹62,990
- CPU: AMD Ryzen 5 5500U
- GPU: NVIDIA GTX 1650
- RAM: 8GB
- Storage: 512GB SSD
- Good for: Intro ML, data science
- Comments: Value for money, limited by RAM/GPU' metadata={'source': 'laptops_info.txt'}
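
The chunking idea can be sketched in plain Python. The function below is a simplified approximation written for illustration, not LangChain's actual implementation: it splits on a separator, greedily merges pieces up to `chunk_size`, and starts each new chunk with trailing pieces that fit within `chunk_overlap`. The name `split_text` and the demo values are assumptions.

```python
def split_text(text, chunk_size=500, chunk_overlap=50, separator="\n\n"):
    """Greedy separator-based chunking with piece-level overlap (simplified)."""
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks = []
    current = []  # pieces that make up the chunk being built

    def joined_len(parts):
        return len(separator.join(parts))

    for piece in pieces:
        if current and joined_len(current + [piece]) > chunk_size:
            chunks.append(separator.join(current))
            # Start the next chunk with trailing pieces that fit the overlap budget
            while current and (
                joined_len(current) > chunk_overlap
                or joined_len(current + [piece]) > chunk_size
            ):
                current.pop(0)
        current.append(piece)

    if current:
        chunks.append(separator.join(current))
    return chunks


# Six paragraphs of 53 characters each, separated by blank lines.
paragraphs = [f"Paragraph {i}: " + "x" * 40 for i in range(1, 7)]
sample_text = "\n\n".join(paragraphs)

demo_chunks = split_text(sample_text, chunk_size=120, chunk_overlap=60)
for n, chunk in enumerate(demo_chunks, start=1):
    print(n, len(chunk), chunk.split("\n\n")[0][:12], "...")
```

In this sketch the overlap works on whole pieces, so a 50-character overlap only carries text forward when the trailing piece is that short. A piece longer than `chunk_size` is kept whole rather than cut.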


## Step 5: Vector Embeddings Setup

Embeddings convert text into numerical vectors that capture semantic meaning. We use Google's embedding-001 model, which:
- Creates 768-dimensional vectors
- Captures semantic relationships between words
- Enables similarity search based on meaning, not just keywords

This lets us find relevant documents even when the query terms do not exactly match the text.

In [None]:
# Initialize Google's embedding model for vector representations
# Get API key from: https://ai.google.dev/gemini-api/docs/api-key
from dotenv import load_dotenv
load_dotenv() 

import os
from langchain_google_genai import GoogleGenerativeAIEmbeddings

# Create embedding model instance
embedding_model = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001",  # Google embedding model with 768-dimensional output
    google_api_key=os.getenv("GOOGLE_API_KEY")
)

# Test embedding creation - converts text to 768-dimensional vector
vector = embedding_model.embed_query("hello, world!")
print("example embeddings........")
print(vector[:5])  # Show first 5 dimensions
len(vector)        # Full vector length: 768

example embeddings........
[0.05636945366859436, 0.004828543867915869, -0.07625909894704819, -0.023642510175704956, 0.053293220698833466]


768
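
To see how vectors support "similarity by meaning", the sketch below computes cosine similarity, one common measure for comparing embeddings. The three-dimensional vectors are made-up toy values, not real model output, and the vector store itself may use a different distance such as Euclidean.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Convention for zero vectors in this sketch
    return dot / (norm_a * norm_b)


# Toy "embeddings" (assumed values for illustration only)
gaming_laptop = [0.9, 0.1, 0.2]
ml_laptop = [0.8, 0.2, 0.3]
pasta_recipe = [0.1, 0.9, 0.1]

print("laptop vs laptop:", round(cosine_similarity(gaming_laptop, ml_laptop), 3))
print("laptop vs recipe:", round(cosine_similarity(gaming_laptop, pasta_recipe), 3))
```

Vectors that point in similar directions score close to 1, while unrelated ones score lower, which is what makes ranking by similarity possible.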

## Step 6: Vector Store Creation

ChromaDB creates a vector database from our document chunks. It:
- Generates embeddings for each chunk using our embedding model
- Stores both the vectors and original text
- Builds indexes for fast similarity search

The result is a searchable knowledge base where we can find relevant information quickly. Because no persist_directory is given, the store lives only in memory for this session.

In [None]:
# Create vector store from documents and embeddings
# ChromaDB automatically generates embeddings for all document chunks
vectorstore = Chroma.from_documents(documents=docs, embedding=embedding_model)
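
As a mental model of what a vector store does, here is a toy in-memory version. It is only a sketch under strong assumptions: `ToyVectorStore` is a hypothetical class, word counts stand in for real embeddings, and search is brute force with no index. ChromaDB's internals are different.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a crude stand-in for a real embedding model."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Keeps each text next to its vector and searches by brute force."""

    def __init__(self):
        self.texts = []
        self.vectors = []

    @classmethod
    def from_texts(cls, texts):
        store = cls()
        for text in texts:
            store.add(text)
        return store

    def add(self, text):
        self.texts.append(text)
        self.vectors.append(Counter(tokenize(text)))

    def similarity_search(self, query, k=2):
        query_vec = Counter(tokenize(query))
        scored = [(cosine(query_vec, v), t) for v, t in zip(self.vectors, self.texts)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


store = ToyVectorStore.from_texts([
    "ASUS TUF Gaming F15 with RTX 3050 GPU, good for deep learning",
    "Acer Aspire 7 with GTX 1650, value for money",
    "Buying tips: prefer 16GB RAM and an SSD",
])
results = store.similarity_search("laptop with RTX GPU for deep learning", k=2)
for rank, text in enumerate(results, start=1):
    print(rank, text)
```

Note that calling `add` twice with the same text stores it twice, so the same passage can come back more than once from a search.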

## Step 7: Retriever Configuration

The retriever is our search interface to the vector store. Configuration:
- **search_type="similarity"**: Finds documents most similar to the query
- **k=2**: Returns the top 2 most relevant chunks

A small k keeps the context focused; a larger k adds coverage at the cost of more text to pass to the language model.

In [None]:
# Configure retriever for similarity search
# k=2 means we'll get the 2 most relevant document chunks
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 2})

## Step 8: Query Execution

Testing our RAG system with a specific question. The retriever:
1. Converts the query to an embedding vector
2. Searches for similar vectors in the database
3. Returns the most relevant document chunks

This is the retrieval half of RAG; no answer is generated at this stage.

In [None]:
# Test our RAG system with a specific query
query = "Which laptop is best for machine learning under ₹80,000?"
retrieved_docs = retriever.invoke(query)
print(f"Retrieved {len(retrieved_docs)} relevant documents")


Retrieved 2 relevant documents


## Step 9: Results Analysis

Examining the retrieved chunks helps us understand:
- How well our chunking strategy worked
- Whether relevant information was found
- The quality of semantic matching

These chunks would typically be fed to a language model to generate a comprehensive answer.

In [None]:
# Display the retrieved chunks to analyze retrieval quality
print("\nTop Retrieved Chunks:")
for i, doc in enumerate(retrieved_docs):
    print(f"\nChunk {i+1}:\n{doc.page_content}")
    print("-" * 50)  # Separator for clarity


Top Retrieved Chunks:

Chunk 1:
# Laptop Buying Tips for AI/ML (2024)
- Prefer 16GB RAM or more
- Look for NVIDIA GPUs like GTX 1650, RTX 3050 or better
- Avoid integrated graphics for training models
- SSD preferred for fast data access
- Ryzen 5, i5 H-series or better recommended

Chunk 2:
# Laptop Buying Tips for AI/ML (2024)
- Prefer 16GB RAM or more
- Look for NVIDIA GPUs like GTX 1650, RTX 3050 or better
- Avoid integrated graphics for training models
- SSD preferred for fast data access
- Ryzen 5, i5 H-series or better recommended
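
Both retrieved chunks above are identical. One plausible cause, stated here as an assumption, is that the vector store cell was run more than once in the same session, so each chunk was stored twice. The sketch below removes exact duplicates and then assembles a prompt for a language model. `dedupe_chunks`, `build_prompt`, and the prompt wording are hypothetical and only illustrate the idea.

```python
def dedupe_chunks(chunks):
    """Drop chunks whose text repeats an earlier one (whitespace-insensitive)."""
    seen = set()
    unique = []
    for chunk in chunks:
        key = " ".join(chunk.split())  # Normalize whitespace before comparing
        if key not in seen:
            seen.add(key)
            unique.append(chunk)
    return unique


def build_prompt(question, chunks):
    """Combine unique chunks and the question into one prompt string."""
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(dedupe_chunks(chunks), start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


tips = (
    "# Laptop Buying Tips for AI/ML (2024)\n"
    "- Prefer 16GB RAM or more\n"
    "- Look for NVIDIA GPUs like GTX 1650, RTX 3050 or better"
)
demo_prompt = build_prompt(
    "Which laptop is best for machine learning under ₹80,000?",
    [tips, tips],  # The same chunk twice, as in the output above
)
print(demo_prompt)
```

In the notebook, the chunk texts could be passed as `[doc.page_content for doc in retrieved_docs]`, and the resulting string would then be sent to whichever language model is used for generation.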
