## What is Retrieval-Augmented Generation (RAG)?
Retrieval-Augmented Generation (RAG) is an AI architecture that combines two key techniques: **retrieval** and **generation**.

1) First, the system retrieves relevant information from external sources such as databases, document collections, or the web.
2) Then, it uses this retrieved information to enhance the generation of responses or outputs.

By integrating retrieval with generation, RAG allows AI models to produce more accurate and contextually relevant answers, even when the required information is not part of the model’s original training data.

## The High-Level Architecture of RAG (Semantic Search)

1. **Data Ingestion Pipeline:** Converts documents into vector embeddings and stores them in a vector database for efficient semantic lookup.
2. **Retrieval Pipeline:** Transforms the user query into an embedding and retrieves semantically similar documents from the vector store.
3. **Generation Pipeline:** Combines the retrieved context with the query to generate a relevant, context-aware response.

<img src="https://i.ibb.co/wFT8HRbb/rag.png"
     alt="RAG Architecture"
     style="max-width: 90%; height: auto; display: block;">
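
The three pipelines above can be sketched end to end without any external service. This is a toy illustration only: the word-set "embedding" and Jaccard similarity below are stand-ins for a real embedding model and vector database, and the names `embed`, `similarity`, `retrieve`, and `build_prompt` are hypothetical helpers, not part of any library used in this notebook.

```python
def embed(text: str) -> set[str]:
    """Toy 'embedding': the set of lowercased words (stand-in for a real model)."""
    return set(text.lower().split())


def similarity(a: set[str], b: set[str]) -> float:
    """Jaccard similarity between two word sets (stand-in for vector similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# 1. Data ingestion: embed every document and keep (text, embedding) pairs.
documents = [
    "RAG retrieves documents before generating an answer",
    "Vector databases store embeddings for semantic lookup",
    "Bananas are rich in potassium",
]
index = [(doc, embed(doc)) for doc in documents]


# 2. Retrieval: embed the query and rank stored documents by similarity.
def retrieve(query: str, k: int = 1) -> list[str]:
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: similarity(query_vec, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


# 3. Generation: combine the retrieved context with the query.
#    A real system would send this prompt to an LLM.
def build_prompt(query: str, k: int = 1) -> str:
    context = "\n".join(retrieve(query, k))
    return f"Context:\n{context}\n\nQuestion:\n{query}"


toy_prompt = build_prompt("How do vector databases store embeddings")
print(toy_prompt)
```

The rest of this notebook follows the same three steps, replacing each toy piece with a real component.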



### Installing Dependencies

In [None]:
# Install required packages
%pip install langchain-core langchain-text-splitters langchain-google-genai python-dotenv langchain_community wikipedia



In [None]:
import os
from dotenv import load_dotenv

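# Load variables from a local .env file into the environment
load_dotenv()

# Prefer the key from the environment; fall back to a placeholder you must replace
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY", "your google api key")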

### Download the source data from Wikipedia

Source: https://en.wikipedia.org/wiki/Retrieval-augmented_generation

In [None]:
from langchain_community.document_loaders import WikipediaLoader

# Download the source article from Wikipedia (one document, up to 20,000 characters)
loader = WikipediaLoader(query="Retrieval-augmented generation", load_max_docs=1, doc_content_chars_max=20000)

data = loader.load()



In [None]:
data

[Document(metadata={'title': 'Retrieval-augmented generation', 'summary': 'Retrieval-augmented generation (RAG) is a technique that enables large language models (LLMs) to retrieve and incorporate new information from external data sources. With RAG, LLMs do not respond to user queries until they refer to a specified set of documents. These documents supplement information from the LLM\'s pre-existing training data. This allows LLMs to use domain-specific and/or updated information that is not available in the training data. For example, this helps LLM-based chatbots access internal company data or generate responses based on authoritative sources.\nRAG improves large language models (LLMs) by incorporating information retrieval before generating responses. Unlike LLMs that rely on static training data, RAG pulls relevant text from databases, uploaded documents, or web sources. According to Ars Technica, "RAG is a way of improving LLM performance, in essence by blending the LLM process

In [None]:
# Concatenate the text of all loaded documents into one string
source = "".join(doc.page_content for doc in data)

In [None]:
source

'Retrieval-augmented generation (RAG) is a technique that enables large language models (LLMs) to retrieve and incorporate new information from external data sources. With RAG, LLMs do not respond to user queries until they refer to a specified set of documents. These documents supplement information from the LLM\'s pre-existing training data. This allows LLMs to use domain-specific and/or updated information that is not available in the training data. For example, this helps LLM-based chatbots access internal company data or generate responses based on authoritative sources.\nRAG improves large language models (LLMs) by incorporating information retrieval before generating responses. Unlike LLMs that rely on static training data, RAG pulls relevant text from databases, uploaded documents, or web sources. According to Ars Technica, "RAG is a way of improving LLM performance, in essence by blending the LLM process with a web search or other document look-up process to help LLMs stick to

### Text Chunking (Recursive Character Splitting)

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split into chunks of at most 150 characters, with up to 35 characters shared between neighbours
splitter = RecursiveCharacterTextSplitter(chunk_size=150, chunk_overlap=35)

chunks = splitter.split_text(source)

print("Number of chunks:", len(chunks))
chunks

Number of chunks: 105


['Retrieval-augmented generation (RAG) is a technique that enables large language models (LLMs) to retrieve and incorporate new information from',
 'incorporate new information from external data sources. With RAG, LLMs do not respond to user queries until they refer to a specified set of',
 "they refer to a specified set of documents. These documents supplement information from the LLM's pre-existing training data. This allows LLMs to use",
 'data. This allows LLMs to use domain-specific and/or updated information that is not available in the training data. For example, this helps',
 'data. For example, this helps LLM-based chatbots access internal company data or generate responses based on authoritative sources.',
 'RAG improves large language models (LLMs) by incorporating information retrieval before generating responses. Unlike LLMs that rely on static',
 'Unlike LLMs that rely on static training data, RAG pulls relevant text from databases, uploaded documents, or web sources. Ac
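
The repeated text at the chunk boundaries above comes from `chunk_overlap`. The sketch below shows the idea with a plain fixed-size sliding window. It is a simplification and not the algorithm `RecursiveCharacterTextSplitter` uses (which also prefers to break on separators such as paragraphs and spaces); `sliding_window_chunks` is a hypothetical helper written for this illustration.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows where each window repeats the
    last `chunk_overlap` characters of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3)
print(pieces)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Overlap reduces the chance that a sentence split across two chunks loses its meaning in both, at the cost of storing and embedding some text twice.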

### Initialize Embedding Model

In [None]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embedder = GoogleGenerativeAIEmbeddings(api_key=GOOGLE_API_KEY, model="models/text-embedding-004", task_type="RETRIEVAL_DOCUMENT")

### Initialize the Vector Store and Generate Embeddings Automatically

In [None]:
from langchain_core.vectorstores import InMemoryVectorStore

vectorstore = InMemoryVectorStore.from_texts(texts=chunks, embedding=embedder)

### User Query and Embeddings

In [None]:
user_query = "What is the purpose of Inverse Cloze Task"

# Embed the user query to perform vector similarity search
query_embeddings = embedder.embed_query(user_query)

### Retrieving Relevant Chunks from the Vector Database

In [None]:
retriver = vectorstore.similarity_search_by_vector(embedding=query_embeddings, k=5)
retriver

[Document(id='b1e550f5-9a58-49f5-983f-89edd598339f', metadata={}, page_content='Pre-training the retriever using the Inverse Cloze Task (ICT), a technique that helps the model learn retrieval patterns by predicting masked text'),
 Document(id='b0577605-8bbf-4076-ae8f-75ea20a23963', metadata={}, page_content='This involves retrieving the top-k vectors for a given prompt, scoring the generated response’s perplexity, and minimizing KL divergence between the'),
 Document(id='ea592674-31e4-463d-befb-22210ddec0e0', metadata={}, page_content='patterns by predicting masked text within documents.'),
 Document(id='8d3c04d4-db3a-4283-9e97-a57978cc3806', metadata={}, page_content='=== Language model ==='),
 Document(id='0028a2e1-f538-4776-b7f8-fab7e033c602', metadata={}, page_content='in the prompt, encouraging it to prioritize the supplied data over pre-existing training knowledge.')]
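
Conceptually, a similarity search by vector scores every stored embedding against the query embedding and returns the `k` best matches. The sketch below assumes cosine similarity as the metric and uses made-up three-dimensional vectors; `cosine_similarity` and `top_k` are hypothetical helpers for illustration and not the vector store's internal code.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: list[float], stored: list[tuple[str, list[float]]], k: int) -> list[str]:
    """Return the texts of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Made-up embeddings for illustration; real ones have hundreds of dimensions.
stored = [
    ("chunk about retrieval", [0.9, 0.1, 0.0]),
    ("chunk about generation", [0.1, 0.9, 0.0]),
    ("unrelated chunk", [0.0, 0.0, 1.0]),
]
result = top_k([1.0, 0.2, 0.0], stored, k=2)
print(result)
# ['chunk about retrieval', 'chunk about generation']
```

Because the search always returns `k` results, weak matches such as the `=== Language model ===` heading in the output above can appear when few chunks are truly relevant.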

### Initialize Chat Model

In [None]:
from langchain_google_genai import GoogleGenerativeAI

llm = GoogleGenerativeAI(api_key=GOOGLE_API_KEY, model="gemini-2.5-flash", temperature=0.3)

### System Prompting

In [None]:
context = "\n".join([doc.page_content for doc in retriver])

prompt = f"""
You are a knowledgeable assistant.
Answer the question using ONLY the context below.
If the answer is not present, say you don't know.

Context:
{context}

Question:
{user_query}
"""

In [None]:
response = llm.invoke(prompt)
print("Response: ", response)

Response:  The Inverse Cloze Task (ICT) is a technique that helps the model learn retrieval patterns by predicting masked text within documents.
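
The retrieval, prompting, and generation steps above can be bundled into one reusable function. This is a minimal sketch: `rag_answer`, `PROMPT_TEMPLATE`, and the `search` and `generate` callables are hypothetical names introduced here, and the stub functions stand in for the vector store and chat model so the example runs without an API key.

```python
from typing import Callable, List

PROMPT_TEMPLATE = """You are a knowledgeable assistant.
Answer the question using ONLY the context below.
If the answer is not present, say you don't know.

Context:
{context}

Question:
{question}
"""


def rag_answer(
    question: str,
    search: Callable[[str, int], List[str]],
    generate: Callable[[str], str],
    k: int = 5,
) -> str:
    """Retrieve k chunks, build the grounded prompt, and return the model's answer."""
    retrieved = search(question, k)
    filled_prompt = PROMPT_TEMPLATE.format(context="\n".join(retrieved), question=question)
    return generate(filled_prompt)


# Stubs standing in for the vector store and the LLM (assumptions for this demo).
def fake_search(question: str, k: int) -> List[str]:
    return ["ICT helps the model learn retrieval patterns."][:k]


def fake_generate(filled_prompt: str) -> str:
    return "stub answer"


print(rag_answer("What is ICT?", fake_search, fake_generate))
# stub answer
```

With the objects from this notebook, `search` could be `lambda q, k: [d.page_content for d in vectorstore.similarity_search(q, k=k)]` and `generate` could be `llm.invoke`.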
