# **Retrieval-Augmented Generation (RAG) Pipeline Components**





## **Overview**

RAG pipelines combine document retrieval with text generation, so a language model can give answers grounded in your own documents. This section walks through each component of the pipeline in turn, covering its role, how it works, and example code using LangChain.




## **1. Document Loader**

**Purpose:**  
Load various document formats (TXT, PDF, DOCX, HTML, etc.) into LangChain `Document` objects for processing.

**Function:**  
Transforms raw files into `Document` objects (text plus metadata) that the rest of the pipeline can work with. Splitting the text into chunks happens in the next step.



**Other Loaders:**

* `PyPDFLoader` — loads PDF files  
* `UnstructuredLoader` — handles HTML, DOCX, and other complex formats

These loaders parse documents and extract text for further processing.
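
To make the loader's output concrete, here is a minimal sketch of what a plain-text loader does, written in standard-library Python rather than LangChain. `SimpleDocument` and `load_text_file` are hypothetical names used only for illustration; they mimic the `page_content` and `metadata` fields of a LangChain `Document` but are not part of the library.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text_file(path: str, encoding: str = "utf-8") -> list[SimpleDocument]:
    """Read one text file and wrap it in a single document object."""
    text = Path(path).read_text(encoding=encoding)
    # Keep the file path so an answer can later be traced back to its source.
    return [SimpleDocument(page_content=text, metadata={"source": path})]


# Usage: write a small temporary file, then load it.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = str(Path(tmp) / "sample.txt")
    Path(sample_path).write_text("ACME Corporation was founded in 2010.", encoding="utf-8")
    sample_docs = load_text_file(sample_path)

print(sample_docs[0].page_content)
print(sample_docs[0].metadata)
```

The result is a list even for a single file, which matches how the notebook indexes the loaded content with `docs[0]`.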

In [1]:
!pip install -U langchain-community --quiet


In [5]:
from langchain_community.document_loaders import TextLoader

# Create a loader for the sample file
loader = TextLoader("Company_sample.txt")

# Load the file's contents into a list of Document objects
docs = loader.load()

In [6]:
print(docs[0].page_content)

ACME Corporation - Company Overview and Operations Manual
Version 2.1 - Last Updated: January 2024

1. COMPANY MISSION AND VISION
   Our mission is to deliver innovative technology solutions that empower
   businesses to achieve their full potential. We envision a future where
   technology seamlessly integrates with human creativity to solve complex
   challenges and drive sustainable growth.

2. COMPANY HISTORY
   ACME Corporation was founded in 2010 by a team of passionate engineers
   and business leaders. Starting with just five employees in a small office,
   we have grown to become a leading technology solutions provider with
   over 500 employees across 12 countries. Our journey has been marked by
   continuous innovation, strategic partnerships, and an unwavering commitment
   to customer success.

3. CORE VALUES
   Innovation: We embrace new ideas and technologies that push boundaries.
   Integrity: We conduct business with honesty, transparency, and ethical
   standards in a



## **2. Text Splitter**

**Purpose:**  
Split large documents into smaller, manageable chunks.

**Function:**  
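Breaks each document into overlapping pieces of bounded size. Smaller, focused chunks tend to embed more precisely and retrieve more accurately than whole documents.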

**Example:**


In [7]:
# Step 1: Import the splitter
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Step 2: Create a splitter
# chunk_size = maximum number of characters per chunk
# chunk_overlap = up to 50 characters are shared between neighboring chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)

# Step 3: Split the documents into chunks
chunks = splitter.split_documents(docs)

**Why overlap?**  
Overlap (e.g., 50 characters) repeats the end of one chunk at the start of the next, so text near a chunk boundary keeps some of its surrounding context when it is retrieved.
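
The effect of `chunk_size` and `chunk_overlap` can be seen with a simplified fixed-width splitter. This is only a sketch: `split_with_overlap` is a hypothetical helper, and unlike `RecursiveCharacterTextSplitter` it cuts at exact character offsets instead of preferring paragraph and sentence boundaries.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-width windows that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


pieces = split_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
for piece in pieces:
    print(piece)
# abcdefghij
# hijklmnopq
# opqrstuvwx
# vwxyz
```

Each chunk begins with the last three characters of the previous one, which is the same idea as the 50-character overlap configured above.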

## **3. Embeddings**

**Purpose:**  
Convert text chunks into numerical vectors that capture semantic meaning.

**Function:**  
Transforms words and sentences into dense vectors so that similar texts have nearby vectors.

**Example:**


In [8]:
%pip install -U langchain-google-genai --quiet

In [9]:
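# Step 1: Import the libraries
import os

from langchain_google_genai import GoogleGenerativeAIEmbeddings

# Step 2: Create the embedding model.
# Assumption: the API key is stored in the GOOGLE_API_KEY environment variable
# instead of being hard-coded in the notebook.
embeddings = GoogleGenerativeAIEmbeddings(
    model="models/gemini-embedding-001",
    google_api_key=os.environ["GOOGLE_API_KEY"],
)

# The model can now convert text into numerical vectors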

**Concept:**

* Related terms like “artificial intelligence” and “machine learning” have close vectors  
* Unrelated terms like “pizza” and “equation” have distant vectors

Embeddings allow semantic search beyond exact keyword matching.
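
"Close" and "distant" are usually measured with cosine similarity. The sketch below uses made-up three-dimensional vectors purely for illustration; they are not real model outputs, and actual embeddings have far more dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: near 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors, chosen by hand to mimic the concept above.
toy_vectors = {
    "artificial intelligence": [0.9, 0.8, 0.1],
    "machine learning": [0.8, 0.9, 0.2],
    "pizza": [0.1, 0.0, 0.9],
}

related = cosine_similarity(toy_vectors["artificial intelligence"], toy_vectors["machine learning"])
unrelated = cosine_similarity(toy_vectors["artificial intelligence"], toy_vectors["pizza"])

print(f"related:   {related:.2f}")
print(f"unrelated: {unrelated:.2f}")
```

The related pair scores close to 1, while the unrelated pair scores much lower.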



## **4. Vector Store (FAISS)**

**Purpose:**  
Store embeddings and enable fast similarity search.

**Function:**  
Index vector embeddings so queries can quickly retrieve the nearest (most relevant) documents.

**Example:**


In [10]:
%pip install faiss-cpu --quiet

In [11]:
from langchain_community.vectorstores import FAISS

# FAISS.from_documents does the following:
# Step 1: Takes the chunks created earlier
# Step 2: Embeds each chunk into a vector such as [0.1, 0.2, 0.9, ...]
# Step 3: Stores the vectors in a FAISS index
# Step 4: Remembers which chunk belongs to which vector
vectorstore = FAISS.from_documents(chunks, embeddings)

**Features of FAISS:**

* Efficient k-nearest neighbor search  
* Fast, in-memory vector similarity retrieval  
* Scalable to millions of documents
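
Conceptually, k-nearest-neighbor search means comparing the query vector with the stored vectors and keeping the closest ones. The sketch below does this exhaustively with Euclidean distance in plain Python. It illustrates the idea only and is not how FAISS is implemented; `nearest_neighbors` and the sample vectors are hypothetical.

```python
import heapq
import math


def nearest_neighbors(query: list[float], vectors: list[list[float]], k: int = 3):
    """Return (distance, index) pairs for the k stored vectors closest to query."""
    distances = ((math.dist(query, vec), idx) for idx, vec in enumerate(vectors))
    return heapq.nsmallest(k, distances)


# Hypothetical two-dimensional vectors standing in for chunk embeddings.
stored_vectors = [
    [0.0, 0.0],
    [1.0, 1.0],
    [5.0, 5.0],
    [0.9, 1.2],
]

hits = nearest_neighbors([1.0, 1.0], stored_vectors, k=2)
print([idx for _, idx in hits])  # [1, 3]
```

The returned indices are what let a vector store map the nearest vectors back to their original text chunks.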

## **5. Retriever**

**Purpose:**  
Fetch the top relevant document chunks from the vector store based on a user query.

**Function:**  
Given a query, it returns the most semantically relevant text pieces.

**Example:**


In [12]:
# Without a retriever
# You: I need books about dogs
# Library: I have 10,000 books, take them all
# You: Too much information

# With a retriever
# You: I need books about dogs
# Library: Here are the 3 best books about dogs
# You: Perfect

# In the pipeline:
# 1. Ask a question, e.g. "What is a cat?"
# 2. The question is embedded and compared against the vector store
# 3. The 3 most relevant chunks are returned

In [13]:
retriever = vectorstore.as_retriever(
    search_type="similarity",
    # The number of results must be passed through search_kwargs
    search_kwargs={"k": 3},
)

**Retrieval methods:**

* **Similarity Search:** Returns top-k closest vectors.  
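
A retriever takes text in and returns text out: it embeds the query, ranks the stored chunks by similarity, and returns the top k. The sketch below follows the library analogy from this section. The word-count `embed` function is a deliberately crude, hypothetical stand-in for a real embedding model, so it only matches shared words, not meaning.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query, best match first."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda chunk: cosine(query_vec, embed(chunk)), reverse=True)
    return ranked[:k]


library = [
    "A guide to training dogs and puppies",
    "Italian pizza recipes for beginners",
    "How dogs communicate with people",
    "Introduction to linear equations",
]

top_chunks = retrieve("books about dogs", library, k=2)
print(top_chunks)
```

Only the two dog-related titles come back; the pizza and equation titles share no words with the query and are ranked last.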




## **6. Prompt Template**

**Purpose:**  
Control how the retrieved documents and the user query are presented to the language model.

**Function:**  
Formats the prompt to optimize generation quality and context usage.

**Example:**


In [14]:
from langchain_core.prompts import PromptTemplate

# Prompt template = a fill-in-the-blank worksheet for the AI
# We are writing a recipe that tells the AI: "Here is the information, here is the user question, now answer it"

# Create a prompt template
prompt_template = PromptTemplate(
    # There are two blanks in this worksheet:
    # context = the information from our documents
    # question = the user's question
    input_variables=["context", "question"],

    template="""You are a helpful assistant that answers questions based on the provided context.

    Context:
    {context}

    Question: {question}

    Answer: Provide a clear and concise answer based on the context above. If the context doesn't contain enough information to answer the question, say so."""

)

# Test the template by filling in the worksheet
test_prompt = prompt_template.format(
    context="ACME corporation was founded in 2010.",
    question="When was the company founded"
)

# Print the filled-in prompt
print(test_prompt)


You are a helpful assistant that answers questions based on the provided context.

    Context:
    ACME corporation was founded in 2010.

    Question: When was the company founded

    Answer: Provide a clear and concise answer based on the context above. If the context doesn't contain enough information to answer the question, say so.


Custom prompts help guide the LLM’s reasoning or style.
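
The role of `input_variables` can be illustrated without LangChain. The sketch below uses Python's built-in string formatting to discover the placeholders in a template and to fail early when one is left unfilled. `template_variables`, `fill_template`, and `TOY_TEMPLATE` are hypothetical names for this illustration, not LangChain APIs.

```python
from string import Formatter


def template_variables(template: str) -> list[str]:
    """Return the sorted placeholder names found in a format-style template."""
    names = {name for _, name, _, _ in Formatter().parse(template) if name}
    return sorted(names)


def fill_template(template: str, **values: str) -> str:
    """Fill the template, failing early if a placeholder has no value."""
    missing = [name for name in template_variables(template) if name not in values]
    if missing:
        raise KeyError(f"missing template variables: {missing}")
    return template.format(**values)


TOY_TEMPLATE = "Context:\n{context}\n\nQuestion: {question}\n\nAnswer:"

print(template_variables(TOY_TEMPLATE))  # ['context', 'question']

filled = fill_template(
    TOY_TEMPLATE,
    context="ACME Corporation was founded in 2010.",
    question="When was the company founded?",
)
print(filled)
```

Checking for missing variables before formatting gives a clearer error than a half-filled prompt reaching the model.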



## **7. LLM (Generator) – Gemini**

**Purpose:**  
Generate final answers, summaries, or responses based on the retrieved context.

**Function:**  
Consumes the combined prompt (retrieved text + query) and outputs human-like text.

**Example:**


In [19]:
import os

from langchain_google_genai import ChatGoogleGenerativeAI

# Connect the pipeline to an LLM.
# It reads the context and the user question, then writes the answer.
# Assumption: the API key is stored in the GOOGLE_API_KEY environment variable.
llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    google_api_key=os.environ["GOOGLE_API_KEY"],
    temperature=0.5,
    # Folds a system message into the first human message;
    # it only matters when a system message is sent
    convert_system_message_to_human=True,
)

response = llm.invoke("What is the capital of france?")
print(response.content)

The capital of France is **Paris**.


The `temperature` parameter trades repeatability for variety: lower values make the output more focused and predictable, while higher values make it more varied.
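
One common way to picture temperature is as a divisor applied to the model's token scores before they are turned into probabilities. The sketch below uses the standard softmax formulation with made-up scores. How Gemini applies temperature internally is not described in this notebook, so treat this as an illustration of the general idea only.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert scores to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate tokens.
toy_logits = [2.0, 1.0, 0.1]

sharp = softmax_with_temperature(toy_logits, temperature=0.5)
flat = softmax_with_temperature(toy_logits, temperature=2.0)

print([round(p, 2) for p in sharp])
print([round(p, 2) for p in flat])
```

At the lower temperature most of the probability goes to the top-scoring token; at the higher temperature the alternatives become more likely to be sampled.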



## **8. RAG Chain (RetrievalQA)**

**Purpose:**  
Combine retriever and generator into an end-to-end pipeline.

**Function:**  
Automatically retrieve relevant documents and generate a grounded answer in one step.

**Example:**


In [20]:
# Step 1: Search the vector store for the most relevant chunks
# Step 2: Put the chunks and the question into the prompt
# Step 3: Let the LLM generate the answer

def rag(query):
    """Answer a query using context retrieved from the vector store."""
    # Step 1: Retrieve the most relevant chunks
    docs = retriever.invoke(query)

    # Step 2: Combine the chunk texts into one context string
    context = "\n\n".join(doc.page_content for doc in docs)

    # Step 3: Fill in the prompt template
    prompt = prompt_template.format(context=context, question=query)

    # Step 4: Ask the LLM to answer from the context
    response = llm.invoke(prompt)
    return response.content


In [21]:
rag("what is the name of company")

'The name of the company is ACME Corporation.'

In [22]:
rag("what are the products provided by the company")

'The company provides ACME Enterprise Suite, cloud-based CRM platforms, and data analytics tools. They also offer custom software development and IT consulting services.'

In [23]:
rag("what do you know about abis working in ACME company")

'I\'m sorry, but the provided context does not contain any information about "abis" working in ACME company.'

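The `rag` function wraps retrieval, prompt formatting, and generation in a single call. It is written by hand here rather than with LangChain's `RetrievalQA` chain, but it follows the same retrieve-then-generate pattern.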
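
Because the pipeline is just three steps composed in order, it can be exercised without an API key by swapping in stand-ins for each step. The sketch below is one possible way to do that; `make_rag` and the `fake_*` functions are hypothetical helpers for illustration, not part of LangChain.

```python
from typing import Callable


def make_rag(
    retrieve: Callable[[str], list[str]],
    build_prompt: Callable[[str, str], str],
    generate: Callable[[str], str],
) -> Callable[[str], str]:
    """Compose retrieve -> build_prompt -> generate into one callable."""
    def answer(query: str) -> str:
        chunks = retrieve(query)
        context = "\n\n".join(chunks)
        prompt = build_prompt(context, query)
        return generate(prompt)
    return answer


# Stand-ins that replace the vector store, the template, and the LLM.
def fake_retrieve(query: str) -> list[str]:
    return ["ACME Corporation was founded in 2010."]


def fake_build_prompt(context: str, question: str) -> str:
    return f"Context:\n{context}\n\nQuestion: {question}"


def fake_generate(prompt: str) -> str:
    # Echo the prompt so the test can inspect what the "LLM" received.
    return "ECHO: " + prompt


toy_rag = make_rag(fake_retrieve, fake_build_prompt, fake_generate)
result = toy_rag("When was the company founded?")
print(result)
```

Passing the steps in as arguments keeps the wiring testable; the real retriever, prompt template, and LLM could be plugged in the same way.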

# **LangChain RAG Pipeline Architecture**



---

## What is RAG?

**RAG (Retrieval-Augmented Generation)** is an architecture that enhances LLM responses by retrieving relevant documents from a knowledge base before generating an answer. This helps the model stay factual, up-to-date, and context-aware without retraining.





## Visual Flow

```plaintext
                +-------------------+
User Query -->  |  LangChain RAG    |
                +---------+---------+
                          |
           +--------------+--------------+
           |                             |
  +--------v---------+        +----------v----------+
  |  Retriever       |        |      Gemini LLM     |
  | (FAISS Vector DB)|        |   (Text Generator)  |
  +------------------+        +---------------------+
           |                             |
  +--------v-----------------------------v---------+
  |       Final Answer / Summary from Gemini       |
  +------------------------------------------------+
```


## Component Breakdown

| **Component**       | **Role**                                                                 |
|---------------------|--------------------------------------------------------------------------|
| **User Query**      | Natural language question from the user                                 |
| **LangChain RAG**   | Orchestrates the flow: gets query → retrieves docs → gets final answer  |
| **Retriever (FAISS)** | Searches stored documents based on semantic similarity                 |
| **Gemini LLM**      | Uses context + query to generate a relevant, concise response           |
| **Final Answer**    | The generated output shown to the user                                  |


## Example Scenario

> **User Query:** "What are the symptoms of diabetes?"

1. **LangChain RAG** receives the query.  
2. **Retriever (FAISS)** finds the top 3 chunks from a medical textbook on diabetes.  
3. **Gemini LLM** takes the query and the chunks and generates:

> *"Common symptoms of diabetes include increased thirst, frequent urination, fatigue, and blurred vision."*




## Why Use RAG with LangChain?

- Combines **search + generation**
- Keeps answers **grounded in source documents**
- Handles **custom domain knowledge** (e.g., legal, medical, academic)
- **Gemini** adds fluent natural language generation

