**RAG (Retrieval-Augmented Generation)** combines a retrieval system (such as a vector database storing embeddings) with a large language model (LLM). The retrieved external information is passed to the LLM alongside the user's query so that it can generate more accurate, up-to-date, and grounded responses, with fewer hallucinations.

Here's an example of a RAG pipeline using LangChain, with Google's embedding-001 model (models/embedding-001) for embeddings and gemini-1.5-pro for generation.

**Core Components of a RAG Pipeline:**

- Document Loading: Loading data (PDFs, text files, web pages, etc.).
- Document Splitting (Chunking): Breaking down large documents into smaller, manageable chunks.
- Embedding Generation: Converting these text chunks into numerical vector representations (here, with models/embedding-001).
- Vector Store Storage: Storing these embeddings in a vector database for efficient similarity search (e.g., FAISS, Pinecone, Chroma).
- Retrieval: Given a user query, finding the most semantically similar document chunks from the vector store.
- Augmentation: Passing the retrieved chunks along with the user's query to the LLM.
- Generation: The LLM generates a response based on the provided context and the query.

In [2]:
import os
from dotenv import load_dotenv
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS # Using FAISS as a simple in-memory vector store
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

load_dotenv() # Load your API key from .env


True

In [10]:
# --- 0. Setup ---
# Initialize LLM and Embedding Model
llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro", temperature=0.2) # Lower temperature for more factual answers in RAG
embeddings_model = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

# verify if models are loaded
print(f"LLM: {llm.model}, Embeddings Model: {embeddings_model.model}")


LLM: models/gemini-1.5-pro, Embeddings Model: models/embedding-001


In [None]:
# --- 1. Document Loading ---
# Create a dummy text file for demonstration
dummy_text_content = """
The Amazon River is the largest river by discharge volume of water in the world, and by some definitions, it is the longest.
It flows through South America, primarily through Brazil, Peru, and Colombia.
Its drainage basin is the largest in the world, about 7.05 million square kilometers (2.72 million square miles).
The Amazon rainforest, which it flows through, is the largest tropical rainforest on Earth.
Deforestation in the Amazon is a significant environmental concern, impacting biodiversity and climate.
Many unique species of flora and fauna, including the pink river dolphin and various types of monkeys, inhabit the Amazon.
The river's source is generally considered to be in the Andes Mountains of Peru.
"""
# save the dummy text to a file
with open("amazon_river_info.txt", "w") as f:
    f.write(dummy_text_content)

In [7]:
loader = TextLoader("amazon_river_info.txt")
documents = loader.load()
print(f"Loaded {len(documents)} document(s).")
print(f"First 200 chars of document: {documents[0].page_content[:200]}...")

Loaded 1 document(s).
First 200 chars of document: 
The Amazon River is the largest river by discharge volume of water in the world, and by some definitions, it is the longest.
It flows through South America, primarily through Brazil, Peru, and Colomb...


In [8]:
# --- 2. Document Splitting (Chunking) ---
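# RecursiveCharacterTextSplitter tries paragraph, line, and word boundaries in turn,
# so chunks tend to end at natural breaks instead of mid-sentence
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500, # Max chunk size, in characters (measured by length_function)
    chunk_overlap=50, # Characters shared between consecutive chunks to preserve context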
    length_function=len,
    is_separator_regex=False,
)
chunks = text_splitter.split_documents(documents)
print(f"\nSplit into {len(chunks)} chunks.")
print(f"First chunk content: {chunks[0].page_content}")


Split into 2 chunks.
First chunk content: The Amazon River is the largest river by discharge volume of water in the world, and by some definitions, it is the longest.
It flows through South America, primarily through Brazil, Peru, and Colombia.
Its drainage basin is the largest in the world, about 7.05 million square kilometers (2.72 million square miles).
The Amazon rainforest, which it flows through, is the largest tropical rainforest on Earth.
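
To make the effect of chunk_size and chunk_overlap concrete, here is a minimal sketch of overlapping chunks using plain fixed-size character windows. This is a simplification for illustration only: RecursiveCharacterTextSplitter splits on separators rather than at fixed offsets, and the helper name split_with_overlap is hypothetical, not part of LangChain.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that share chunk_overlap characters.

    Hypothetical helper for illustration; not how LangChain's splitter works internally.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    windows = []
    for start in range(0, len(text), step):
        windows.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return windows


sample = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
for chunk in demo_chunks:
    print(chunk)
```

Each window starts with the last three characters of the previous one, which is the same idea as the 50-character overlap configured above: text near a chunk boundary appears in both neighbours, so a sentence cut at the boundary is still retrievable with some of its context.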


In [11]:
# --- 3. Embedding Generation & 4. Vector Store Storage ---
# Create an in-memory FAISS vector store. For production, use persistent stores like Pinecone, Chroma, etc.
print("\nCreating vector store from chunks and embeddings...")
vectorstore = FAISS.from_documents(chunks, embeddings_model)
print("Vector store created successfully!")


Creating vector store from chunks and embeddings...
Vector store created successfully!


In [12]:
# --- 5. Retrieval ---
# Create a retriever from the vector store
retriever = vectorstore.as_retriever(search_kwargs={"k": 2}) # Retrieve top 2 most relevant chunks
print("\nRetriever created. Ready to fetch relevant chunks.")


Retriever created. Ready to fetch relevant chunks.
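
Conceptually, the retriever embeds the query and returns the k stored chunks whose vectors are closest to it. The sketch below shows that idea with hand-written toy vectors and cosine similarity in plain Python. It is an assumption-laden illustration: the vectors are made up, the helper names are hypothetical, and the actual vector store uses an optimized index and may rank by a different metric (such as Euclidean distance).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int) -> list[int]:
    """Return the indices of the k vectors most similar to query_vec, best first."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(reverse=True)  # highest similarity first
    return [idx for _, idx in scored[:k]]


# Toy 3-dimensional "embeddings" (made up for illustration)
toy_doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
toy_query_vec = [0.9, 0.1, 0.0]

nearest = top_k(toy_query_vec, toy_doc_vecs, k=2)
print(nearest)
```

With k=2 and only two chunks in this notebook's vector store, every chunk is returned for every query; k starts to matter once the corpus is larger than k.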


In [19]:
# --- 6. Augmentation ---

# Define a prompt template for the LLM that includes context
rag_prompt = ChatPromptTemplate.from_messages([
    ('system',"You are a helpful assistant for question-answering tasks. "
                          "Use the following retrieved context to answer the question. "
                          "If you don't know the answer, say that you don't know."),
    ('human', "Context: {context}\nQuestion: {question}")
])


In [21]:
# --- 7. Generation (RAG Chain Construction) ---

# Helper that joins the retrieved documents into a single string for the prompt.
# It is a plain function; piping it after the retriever lets LangChain wrap it as a Runnable.
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Construct the RAG chain
# 1. User question comes in
# 2. It's passed to the retriever to get relevant documents (context)
# 3. Both the original question and the retrieved context are passed to the prompt
# 4. The prompt is sent to the LLM
# 5. The LLM's response is parsed
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)
print("\n--- RAG Pipeline Ready ---")


--- RAG Pipeline Ready ---
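
The data flow through the chain can be easier to follow when written out as ordinary function calls. The sketch below mirrors the four stages with hypothetical stand-ins (fake_retriever, join_chunks, build_prompt, fake_llm); none of these are LangChain APIs, and the fake LLM simply echoes the first context line so the example runs without an API key.

```python
def fake_retriever(question: str) -> list[str]:
    """Hypothetical stand-in for the retriever: always returns the same two chunks."""
    return [
        "The river's source is in the Andes Mountains of Peru.",
        "The Amazon flows through South America.",
    ]


def join_chunks(chunks: list[str]) -> str:
    """Plays the role of format_docs: one string, chunks separated by blank lines."""
    return "\n\n".join(chunks)


def build_prompt(inputs: dict) -> str:
    """Plays the role of the human message in rag_prompt."""
    return f"Context: {inputs['context']}\nQuestion: {inputs['question']}"


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for the LLM: echoes the first line of context."""
    first_line = prompt.splitlines()[0]
    return first_line.removeprefix("Context: ")


def run_chain(question: str) -> str:
    # The dict step sends the same input down two branches
    inputs = {
        "context": join_chunks(fake_retriever(question)),  # retriever | format_docs
        "question": question,                               # RunnablePassthrough()
    }
    prompt = build_prompt(inputs)  # rag_prompt
    return fake_llm(prompt)        # llm | StrOutputParser()


toy_answer = run_chain("Where is the source of the Amazon?")
print(toy_answer)
```

The key point is the dict step: the question is used twice, once to fetch context and once unchanged as the question slot of the prompt.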


In [22]:
# --- Test the RAG Pipeline ---

# Example 1: Question that can be answered from the provided text
question_1 = "What is considered the source of the Amazon River?"
print(f"\nQuestion 1: {question_1}")
response_1 = rag_chain.invoke(question_1)
print(f"Answer 1: {response_1}")


Question 1: What is considered the source of the Amazon River?
Answer 1: The Andes Mountains of Peru are generally considered the source of the Amazon River.


In [23]:
# Example 2: Question that requires understanding from the context
question_2 = "What are some environmental issues associated with the Amazon?"
print(f"\nQuestion 2: {question_2}")
response_2 = rag_chain.invoke(question_2)
print(f"Answer 2: {response_2}")


Question 2: What are some environmental issues associated with the Amazon?
Answer 2: Deforestation is a significant environmental issue associated with the Amazon.


In [24]:
# Example 3: Question that cannot be answered from the provided text
question_3 = "Who discovered the Amazon River?"
print(f"\nQuestion 3: {question_3}")
response_3 = rag_chain.invoke(question_3)
print(f"Answer 3: {response_3}")


Question 3: Who discovered the Amazon River?
Answer 3: The provided text doesn't state who discovered the Amazon River.
