# **RAG: Simple example**

# **Introduction**

This notebook provides a **brief demonstration of how Retrieval-Augmented Generation (RAG) works**.  
We focus on using **high-level tools and ready-to-use libraries**, so you can build a working RAG pipeline without diving into the low-level implementation details.  

The goal is to show how different components — embeddings, vector stores, retrievers, and language models — can be combined with minimal setup to create a functional system.  

In the next notebook, we will explore a **low-level implementation of RAG**, where we break down each step in detail and build more of the pipeline manually. This will give you a deeper understanding of the mechanics behind RAG.  


# **Import data**

![](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQlpag95YX-c_LdJfWx7bD4MHs6rYuPItvGJg&s)

# **Dataset**

For this demonstration, we will use a **sample of 100 English texts from Wikipedia**.  
This dataset provides general-purpose knowledge that is well-suited for testing RAG pipelines, since it contains factual information across a wide range of topics.  

The selected texts will be processed (chunked and embedded) before being stored in the vector database, so they can later be retrieved and used as context for answering questions.


In [8]:
!pip install datasets
from datasets import load_dataset
from datasets import load_dataset_builder



In [9]:
ds_builder = load_dataset_builder("cornell-movie-review-data/rotten_tomatoes")

In [10]:
ds_builder.info.description

''

In [None]:

wiki = load_dataset("wikimedia/wikipedia", "20231101.en", split="train[:100]")
texts = [x["text"] for x in wiki]

In [None]:
len(texts)

In [None]:
texts[1]

# **RAG Pipeline**

# **Imports**

In [None]:
!pip install -U langchain-community

In [None]:
!pip install langchain faiss-cpu sentence-transformers datasets

# **Architecture:**

The RAG architecture in this case will be the following:

1. Get the dataset of texts from Wikipedia.
2. Chunk the texts, then use the `all-MiniLM-L6-v2` model to vectorize the chunks. Chunking beforehand makes retrieval more efficient.
3. Build a FAISS vector store from the chunks and the embedding model.
4. Set up the LLM that will generate the answers.
5. Bind the LLM and the retriever together to form the RetrievalQA chain.

[![](https://mermaid.ink/img/pako:eNplkttuozAQhl_FmlWlrgQR5OAmXFQiIUlXSqRu0-3FQi8cGIhVY7PG9LBR3n0NZNtKHQlL9vzz_ePBR0hVhhBALtRLemDakPsokcRGGP-qUZOfDeq3R-K612Qe34VrElYVcUnT5v50uUT2BXWzLzSrDuQOjeb4zER_3sa8Ayzim6YouCxWLMVluccss5va4pgQ7pZLvtm6G-o-Dx8_Shdd6TK-V5X7RCKVNiVKU58VKLMv_muUqJnhSn5Qlh1lFc8bLjJyq1VZGeu7YbJYHBiXnwxXnXQdbzbb_yilrTYXTLpm4tal7faTft3pb-IVl0yQUNYvqL92d3FBboW9NonmhBliDkhqnmGfjOLLVfhjtyMPmLZmO7vg9zOkH8D5r0T9NFoqOFBonkGQM1GjAyXqkrV7OCZg-SUmEJAEMsxZI0wCDrFlfeaBac72AutOIm3BnqVPhVaN7RiCBL7lXSRwOlmfisnfSpUQGN1YJysrDu--TZUxgxFndvrl-6m2d0e9sEADAaVeB4HgCK8QjOlkMPH92bD9pjNKHXiDYDQcXE3G_tXYH01HlE7pyYG_nas3mI38mefRoTem3sT3Rg5gxu2Qtv3rTZXMeQGnf6v72Nw?type=png)](https://mermaid.live/edit#pako:eNplkttuozAQhl_FmlWlrgQR5OAmXFQiIUlXSqRu0-3FQi8cGIhVY7PG9LBR3n0NZNtKHQlL9vzz_ePBR0hVhhBALtRLemDakPsokcRGGP-qUZOfDeq3R-K612Qe34VrElYVcUnT5v50uUT2BXWzLzSrDuQOjeb4zER_3sa8Ayzim6YouCxWLMVluccss5va4pgQ7pZLvtm6G-o-Dx8_Shdd6TK-V5X7RCKVNiVKU58VKLMv_muUqJnhSn5Qlh1lFc8bLjJyq1VZGeu7YbJYHBiXnwxXnXQdbzbb_yilrTYXTLpm4tal7faTft3pb-IVl0yQUNYvqL92d3FBboW9NonmhBliDkhqnmGfjOLLVfhjtyMPmLZmO7vg9zOkH8D5r0T9NFoqOFBonkGQM1GjAyXqkrV7OCZg-SUmEJAEMsxZI0wCDrFlfeaBac72AutOIm3BnqVPhVaN7RiCBL7lXSRwOlmfisnfSpUQGN1YJysrDu--TZUxgxFndvrl-6m2d0e9sEADAaVeB4HgCK8QjOlkMPH92bD9pjNKHXiDYDQcXE3G_tXYH01HlE7pyYG_nas3mI38mefRoTem3sT3Rg5gxu2Qtv3rTZXMeQGnf6v72Nw)


# **Components:**

**Embedding Model:**

Library: Hugging Face

![](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTbV0nsS-UXj2CFrm4REkgNl0D3o7Oa5SGIdQ&s)
* Converts text chunks into numerical vectors (embeddings).
* These vectors capture semantic meaning, so similar texts are close in vector space.
* all-MiniLM-L6-v2 is small, fast, and well-suited for semantic search.
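
To make "close in vector space" concrete, here is a minimal sketch that compares vectors with cosine similarity. The three-dimensional vectors are made-up values for illustration only (real `all-MiniLM-L6-v2` embeddings have 384 dimensions), and `cosine_similarity` and `toy_embeddings` are names introduced just for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings"; these numbers are invented, not model output.
toy_embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "airplane": [0.0, 0.2, 0.9],
}

sim_related = cosine_similarity(toy_embeddings["cat"], toy_embeddings["kitten"])
sim_unrelated = cosine_similarity(toy_embeddings["cat"], toy_embeddings["airplane"])

print(f"cat vs kitten:   {sim_related:.3f}")
print(f"cat vs airplane: {sim_unrelated:.3f}")
```

The related pair scores much closer to 1 than the unrelated pair, which is the property semantic search relies on.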

**Text Splitter:**

Library: LangChain

* Breaks long documents into smaller chunks for better retrieval.
* chunk_size=500 → max 500 characters per chunk.
* chunk_overlap=50 → keeps some overlap to preserve context across splits.
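
The effect of `chunk_size` and `chunk_overlap` can be sketched with a plain sliding window. This is a simplified illustration, not how `RecursiveCharacterTextSplitter` works internally: the real splitter prefers to cut at separators such as paragraph breaks, newlines, and spaces. The helper name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size=500, chunk_overlap=50):
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# A 1200-character dummy string stands in for a Wikipedia article.
sample = "".join(str(i % 10) for i in range(1200))
chunks = chunk_text(sample)

print([len(c) for c in chunks])
print(chunks[0][-50:] == chunks[1][:50])
```

The last 50 characters of one chunk reappear at the start of the next, so a sentence cut at a boundary still shows up intact in at least one chunk.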

**Vector Store (FAISS)**

Library: FAISS (through the LangChain wrapper)

* Stores embeddings in a FAISS index, optimized for fast similarity search.
* When a query comes in, FAISS retrieves the most similar chunks.
* Acts as the "knowledge base" for your retrieval-augmented generation (RAG).
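
Conceptually, similarity search means "score every stored vector against the query and keep the best k". The sketch below does this with an exhaustive scan and squared Euclidean distance, chosen here purely for illustration. FAISS relies on optimized index structures to do the same job much faster, so treat this as the idea rather than the implementation. The two-dimensional vectors and the names `squared_l2` and `top_k` are assumptions made for the example.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query_vec, index, k=3):
    """Return the k (distance, text) pairs closest to query_vec."""
    scored = ((squared_l2(query_vec, vec), text) for text, vec in index)
    return heapq.nsmallest(k, scored)


# Hypothetical index: (chunk text, made-up 2-dimensional embedding).
index = [
    ("Chunk about physics", [0.9, 0.1]),
    ("Chunk about relativity", [0.8, 0.3]),
    ("Chunk about cooking", [0.1, 0.9]),
    ("Chunk about baking", [0.2, 0.8]),
]

query_vec = [0.9, 0.15]  # pretend embedding of a physics question
results = top_k(query_vec, index, k=2)

for distance, text in results:
    print(f"{distance:.4f}  {text}")
```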

**Local LLM**
* The language model that will generate answers.
* **flan-t5-small** is a lightweight instruction-tuned model good for Q&A.
* HuggingFace’s pipeline wraps the model, and LangChain makes it usable in chains.

**RetrievalQA Chain**

Library: LangChain

Orchestrates the full RAG pipeline:

1. Takes a user question.
2. Uses FAISS retriever to find top-3 relevant chunks (k=3).
3. Passes those chunks as context to the LLM.
4. LLM generates a final answer grounded in the retrieved docs.
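
The four steps above can be sketched in plain Python. This is a simplified picture of a retrieve-then-generate chain, not LangChain's actual code: `build_prompt`, `answer`, `fake_retrieve`, and `fake_generate` are hypothetical stand-ins for the chain, the FAISS retriever, and the LLM, and the prompt wording is only an approximation.

```python
def build_prompt(question, chunks):
    """Concatenate the retrieved chunks into one context block and add the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


def answer(question, retrieve, generate, k=3):
    """Retrieve the top-k chunks, build the prompt, and let the model answer."""
    chunks = retrieve(question)[:k]
    prompt = build_prompt(question, chunks)
    return generate(prompt)


# Hypothetical stand-in for the retriever: returns chunks ordered by relevance.
def fake_retrieve(question):
    return [
        "Albert Einstein was a theoretical physicist.",
        "He developed the theory of relativity.",
        "He was born in 1879.",
        "Unrelated chunk.",
    ]


# Hypothetical stand-in for the LLM: records the prompt instead of generating text.
seen_prompts = []


def fake_generate(prompt):
    seen_prompts.append(prompt)
    return f"(model output for a prompt of {len(prompt)} characters)"


print(answer("Who was Albert Einstein?", fake_retrieve, fake_generate))
print(seen_prompts[0])
```

Only the top three chunks reach the prompt, which mirrors the `k=3` setting used for the retriever.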


# **Tools**

**LangChain**  
![](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTc-QRqqtRi8EEyMCDcBawEio86I7MpmwMBTw&s)

LangChain is a framework that helps build applications with LLMs.  
In this project, LangChain provides:  
* Utilities to split texts into chunks (`RecursiveCharacterTextSplitter`).  
* A wrapper around embedding models and vector stores (`FAISS`).  
* High-level chains like `RetrievalQA` to connect retrievers with LLMs.  
* Simple, modular orchestration of the whole RAG pipeline.  

---

**FAISS** (Facebook AI Similarity Search)  

![](https://upload.wikimedia.org/wikipedia/commons/thumb/e/ee/Logo_de_Facebook.png/250px-Logo_de_Facebook.png)

FAISS is a library developed by Facebook AI for efficient similarity search and clustering of dense vectors.  
In this project, FAISS provides:  
* A **vector database** that stores embeddings of text chunks.  
* Very fast nearest-neighbor search, even with large datasets.  
* The retrieval backbone of our RAG pipeline.  

---

**Hugging Face Transformers / Pipelines**  
![](https://huggingface.co/front/assets/huggingface_logo.svg)  
Hugging Face provides both **pretrained models** and convenient APIs.  
In this project, Hugging Face provides:  
* **SentenceTransformers** model (`all-MiniLM-L6-v2`) for embeddings.  
* **Flan-T5-small** as a lightweight LLM for Q&A (`pipeline("text2text-generation")`).  
* Easy-to-use wrappers that allow us to focus on building instead of training.  

---

**Local LLM (Flan-T5-small)**  
* A small instruction-tuned model (77M parameters).  
* Runs fully **locally**, without requiring an API key.  
* Lightweight enough for Colab but powerful enough to perform Q&A tasks.  
* Integrated into LangChain through `HuggingFacePipeline`.  

![](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSpq64iKfSNWj95WbDvXsSh5smOOoLZZAjvcQ&s)

---


In [None]:
from datasets import load_dataset
from transformers import pipeline

# Third-party integrations live in the langchain-community package installed above
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import HuggingFacePipeline

from langchain.chains import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter


# 1. Create local embedding model
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# 2. Split documents into smaller chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = splitter.create_documents(texts)

# Extract text content from Document objects
doc_texts = [doc.page_content for doc in docs]

# 3. Build FAISS vector store
db = FAISS.from_texts(doc_texts, embedding_model)

# 4. Set up a local LLM (flan-t5-small text2text-generation pipeline for the demo)
qa_model = pipeline("text2text-generation", model="google/flan-t5-small", max_length=256)
llm = HuggingFacePipeline(pipeline=qa_model)

# 5. RetrievalQA chain
qa = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=db.as_retriever(search_kwargs={"k": 3})
)

# **Query Testing**

Here we are going to test several queries to see if the model is behaving correctly.  
The answers we get will be **very direct and to the point**, unlike those of larger LLMs (such as ChatGPT), because here we are using a **very lightweight local model** that can run inside the notebook.  

There are a few important reasons for this behavior:

1. **Model size**:  
   We are using `flan-t5-small` (~77M parameters). While it is instruction-tuned, it is still too small to generate long, expressive, or conversational answers. Larger models (e.g., `flan-t5-base`, `flan-t5-large`, or `Mistral-7B-Instruct`) provide more detailed and natural-sounding responses.

2. **Default prompt**:  
   LangChain’s `RetrievalQA.from_chain_type` uses a minimal built-in prompt along the lines of:  
   > “Use the following pieces of context to answer the question at the end. If you don’t know the answer, just say that you don’t know, don’t try to make up an answer.”  
   This keeps the answers short and factual by design. If we want more conversational answers, we need to **customize the prompt** to encourage elaboration.

3. **Retriever grounding**:  
   The retriever provides literal context snippets from our FAISS index. Small models like `flan-t5-small` tend to summarize these snippets rather than expanding or rephrasing them.

---

⚡ **Key takeaway**: The short, factual style we see is expected given the combination of a **small LLM**, a **minimal prompt**, and a **retrieval-based pipeline**.  
If we want richer, more human-like answers, we can:  
- Switch to a bigger model (e.g., Flan-T5-large or Mistral).  
- Customize the prompt to request a more detailed, conversational explanation.  
- Or post-process the answer with another model to rephrase it naturally.
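
As an illustration of the prompt-customization option, here is a sketch of a template that asks for a longer answer. The wording of `DETAILED_TEMPLATE` is an assumption and has not been tuned for `flan-t5-small`; the `{context}` and `{question}` placeholders are the variable names the chain fills in. The commented lines show how the template could be passed to the chain through `chain_type_kwargs`, assuming the `llm` and `db` objects built earlier in the notebook.

```python
# Hypothetical prompt wording that encourages a fuller answer.
DETAILED_TEMPLATE = (
    "Use the context below to answer the question in two or three full sentences.\n"
    "If the context does not contain the answer, say that you don't know.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Detailed answer:"
)

# Preview what the model would receive once the placeholders are filled in.
preview = DETAILED_TEMPLATE.format(
    context="<retrieved chunks go here>",
    question="Who was Albert Einstein?",
)
print(preview)

# Wiring it into the chain (needs `llm` and `db` from the cells above):
#
# from langchain.prompts import PromptTemplate
# prompt = PromptTemplate(
#     template=DETAILED_TEMPLATE,
#     input_variables=["context", "question"],
# )
# qa_detailed = RetrievalQA.from_chain_type(
#     llm=llm,
#     retriever=db.as_retriever(search_kwargs={"k": 3}),
#     chain_type_kwargs={"prompt": prompt},
# )
```

A model as small as `flan-t5-small` may still answer tersely even with this prompt, so combining it with a larger model is likely to have more effect.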



Who was Albert Einstein?

In [None]:
query = "Who was Albert Einstein?"
res = qa.invoke(query)

print("Q:", query)
print("A:", res["result"])  # invoke() returns a dict; the answer is under "result"

In [None]:
query = "What continents are there?"
res = qa.invoke(query)

print("Q:", query)
print("A:", res["result"])  # invoke() returns a dict; the answer is under "result"

In [None]:
query = "When was the US founded?"
res = qa.invoke(query)

print("Q:", query)
print("A:", res["result"])  # invoke() returns a dict; the answer is under "result"

As you may notice, some of the answers are incorrect. This is because we only loaded 100 articles from the dataset, so the knowledge base covers very few topics and the retriever often has no relevant chunk to return.