
## Author: Adil MOUKRIM


Build a RAG System
Last Updated: July 10th, 2025

Daily Challenge: Build a Retrieval Augmented Generation (RAG) System


üë©‚Äçüè´ üë©üèø‚Äçüè´ What You‚Äôll learn
* Implement a Retrieval Augmented Generation (RAG) system using LangChain and Hugging Face.
* Load and process datasets using Hugging Face datasets and LangChain's HuggingFaceDatasetLoader.
* Split documents into smaller chunks using LangChain's RecursiveCharacterTextSplitter.
* Generate text embeddings using Hugging Face sentence-transformers and LangChain's HuggingFaceEmbeddings.
* Create and use vector stores with LangChain's FAISS integration for efficient document retrieval.
* Prepare and integrate a pre-trained language model from Hugging Face transformers for question answering.
* Build a Retrieval QA chain using LangChain's RetrievalQA to answer questions based on retrieved documents.


🛠️ What You Will Create
You will create a functional RAG system that answers questions based on a dataset loaded from Hugging Face Datasets. This system will:

* Load the databricks/databricks-dolly-15k dataset.
* Index the dataset content into a vector store.
* Use a pre-trained question-answering model from Hugging Face.
* Answer user queries by retrieving relevant documents and passing them to the model to produce answers.


Mandatory: read this article before starting the exercise:
* Faiss | LangChain

Mandatory: watch these videos before starting the exercise:
* PyTorch in 100 Seconds
* LangChain Explained in 13 Minutes



Task
Our task is to implement RAG using LangChain and Hugging Face!

1. Set up your environment: This ensures all the necessary tools are available to build the RAG system. Each library serves a specific role: LangChain orchestrates the components, transformers provides pre-trained models, sentence-transformers generates embeddings, datasets loads sample data, and FAISS enables fast similarity search.

Open your terminal or notebook environment.
Install all required libraries by running these commands:


!pip install -q langchain
!pip install -q torch
!pip install -q transformers
!pip install -q sentence-transformers
!pip install -q datasets
!pip install -q faiss-cpu
!pip install -U langchain-community


2. Load the dataset: To give the system information to retrieve from, you'll load a real-world dataset. HuggingFaceDatasetLoader simplifies accessing Hugging Face datasets and formatting them into documents that LangChain can process.

Before loading the dataset, run:
pip install -Uq datasets
Import HuggingFaceDatasetLoader. Current LangChain versions provide it in langchain_community.document_loaders (the solution below imports it from there); the older langchain.document_loaders import path is deprecated.
Specify the dataset name and content column:


dataset_name = "databricks/databricks-dolly-15k"
page_content_column = "context"


Create a HuggingFaceDatasetLoader instance and load the data as documents:


from langchain_community.document_loaders import HuggingFaceDatasetLoader

loader = HuggingFaceDatasetLoader(dataset_name, page_content_column)
data = loader.load()
print(data[:2]) # Optional: print the first 2 entries to verify loading


3. Split the documents: Language models have a limit on how much text they can process at once. Splitting large documents into smaller, overlapping chunks keeps each piece of text a manageable size for embedding and retrieval, and the overlap reduces the risk of losing context at chunk boundaries.

Import RecursiveCharacterTextSplitter from langchain.text_splitter.
Create a RecursiveCharacterTextSplitter instance with a chunk_size of 1000 and chunk_overlap of 150:


from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)


Split the loaded documents:


docs = text_splitter.split_documents(data)
print(docs[0]) # Optional: Print the first document chunk
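To make `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified fixed-window splitter in plain Python. It is not LangChain's algorithm: `RecursiveCharacterTextSplitter` first tries to split on separators (paragraph breaks, newlines, spaces) so that chunks follow natural boundaries, which means its chunks are often shorter than `chunk_size`. This sketch, built around a hypothetical `split_with_overlap` helper, only shows how consecutive windows share `chunk_overlap` characters.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=150):
    """Cut text into windows of at most chunk_size characters; each window
    starts chunk_overlap characters before the previous one ended.
    (Simplified illustration, not LangChain's recursive splitter.)"""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be between 0 and chunk_size - 1")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# A 2500-character toy text made of repeating digits, so overlaps are visible.
text = "".join(str(i % 10) for i in range(2500))
chunks = split_with_overlap(text)
print([len(c) for c in chunks])  # [1000, 1000, 800]
```

The last 150 characters of each chunk reappear at the start of the next one, which is what lets a sentence cut at a boundary still appear whole in at least one chunk when it is shorter than the overlap.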


4. Embed the text: Text needs to be converted into numerical representations (embeddings) so that similar pieces of text can be found efficiently. Using a sentence-transformer model creates embeddings that capture semantic meaning, enabling effective retrieval later.

Import HuggingFaceEmbeddings from langchain.embeddings.
Define the model path, model configurations, and encoding options:


modelPath = "sentence-transformers/all-MiniLM-L6-v2"
model_kwargs = {'device':'cpu'}
encode_kwargs = {'normalize_embeddings': False}


Initialize HuggingFaceEmbeddings:


from langchain.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
  model_name=modelPath,
  model_kwargs=model_kwargs,
  encode_kwargs=encode_kwargs
)


(Optional) Test embedding creation:


text = "This is a test document."
query_result = embeddings.embed_query(text)
print(query_result[:3])
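With `normalize_embeddings` set to `False`, the vectors keep their original length, so a plain dot product between two of them is not the same as cosine similarity. A minimal pure-Python sketch of the relationship; the 3-dimensional vectors are made-up toy values, not real all-MiniLM-L6-v2 outputs (which have 384 dimensions), and the helper names are hypothetical.

```python
import math


def dot(a, b):
    """Plain dot product (inner product) of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    """Scale a vector to unit length, as normalize_embeddings=True would."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Dot product divided by both lengths: depends on direction only."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0, 0.0]  # toy vector of length 5
b = [6.0, 8.0, 0.0]  # same direction, twice as long

print(dot(a, b))                              # 50.0: grows with vector length
print(cosine_similarity(a, b))                # 1.0: same direction
print(dot(l2_normalize(a), l2_normalize(b)))  # approximately 1.0, equal to the cosine
```

After normalization, inner-product search and cosine-similarity search rank documents identically, which is why the normalization setting matters when choosing a distance measure for the vector store.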


5. Create a vector store: A vector store like FAISS indexes the embeddings, allowing fast and scalable similarity searches. This is how the system quickly finds relevant pieces of text when a query is made.

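Import FAISS from langchain_community.vectorstores (older tutorials use the deprecated langchain.vectorstores path).
Create a FAISS vector store from the document chunks and embeddings:


from langchain_community.vectorstores import FAISS

db = FAISS.from_documents(docs, embeddings)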


Note: This step can take a while, because every chunk has to be embedded; the time grows with the number of chunks.
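Conceptually, `FAISS.from_documents` embeds every chunk and stores the vectors in an index; at query time the question is embedded the same way and the `k` nearest vectors are returned. The following plain-Python sketch performs that search exhaustively with Euclidean distance. It is an illustration only, assuming toy 2-dimensional vectors and a hypothetical `BruteForceIndex` class: FAISS runs the comparison in optimized native code and also offers approximate index types that avoid scanning every vector.

```python
import math


class BruteForceIndex:
    """Exact nearest-neighbour search: compare the query with every stored vector."""

    def __init__(self):
        self.vectors = []
        self.texts = []

    def add(self, vector, text):
        # Keep each vector next to the text it was computed from.
        self.vectors.append(vector)
        self.texts.append(text)

    def search(self, query, k=4):
        # Euclidean (L2) distance from the query to each stored vector.
        distances = [math.dist(query, v) for v in self.vectors]
        # Indices of the k smallest distances, closest first.
        order = sorted(range(len(distances)), key=distances.__getitem__)[:k]
        return [(self.texts[i], distances[i]) for i in order]


index = BruteForceIndex()
index.add([0.9, 0.1], "chunk about airlines")
index.add([0.1, 0.9], "chunk about cheese")
index.add([0.2, 0.8], "chunk about milk")

for text, distance in index.search([0.12, 0.88], k=2):
    print(text, round(distance, 3))
```

The `k` parameter here plays the same role as `search_kwargs={"k": 4}` when the FAISS store is turned into a retriever.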


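6. Prepare the language model: The model produces answers from the retrieved documents. Loading a pre-trained model and wrapping it in a LangChain pipeline makes it easy to integrate with the retrieval system. Note that Intel/dynamic_tinybert, the model used here, is an extractive question-answering model: instead of generating free text, it selects the answer span from the context it is given.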

Import the necessary classes from transformers and LangChain:


from transformers import AutoTokenizer, AutoModelForQuestionAnswering, pipeline
from langchain_community.llms import HuggingFacePipeline


Load the tokenizer and question-answering model:


tokenizer = AutoTokenizer.from_pretrained("Intel/dynamic_tinybert")
model = AutoModelForQuestionAnswering.from_pretrained("Intel/dynamic_tinybert")


Create a question-answering pipeline:


question_answerer = pipeline(
  "question-answering",
  model=model,          # reuse the model loaded above
  tokenizer=tokenizer,  # reuse the tokenizer loaded above
  return_tensors='pt'
)


Create a LangChain pipeline wrapper:


llm = HuggingFacePipeline(
  pipeline=question_answerer,
  model_kwargs={"temperature": 0.7, "max_length": 512},  # sampling settings such as temperature do not apply to an extractive QA model
)
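`Intel/dynamic_tinybert` is extractive: for every token of the context it scores how likely that token is to start or end the answer, and the question-answering pipeline returns the best-scoring span as a dict with `score`, `start`, `end` and `answer` keys. A simplified sketch of the span selection, with made-up scores over whitespace tokens; the real pipeline works on subword tokens, turns the scores into probabilities and maps token positions back to character offsets. `best_span` is a hypothetical helper.

```python
def best_span(start_scores, end_scores, max_answer_len=15):
    """Return the (start, end) token indices that maximize
    start_scores[start] + end_scores[end], with start <= end and
    at most max_answer_len tokens in the span."""
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_scores):
        # Only consider end positions at or after the start, within the length limit.
        for e in range(s, min(s + max_answer_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


tokens = "Cheesemaking is the craft of making cheese from milk".split()
# Made-up scores, one per token: "the" looks like a start, "cheese" like an end.
start_scores = [0.1, 0.2, 3.0, 0.5, 0.1, 0.4, 0.2, 0.1, 0.0]
end_scores = [0.0, 0.1, 0.2, 0.9, 0.3, 0.4, 2.5, 0.2, 1.0]

s, e = best_span(start_scores, end_scores)
print(" ".join(tokens[s:e + 1]))  # the craft of making cheese
```

Because the answer is always a substring of the context, the quality of the final answer depends heavily on whether retrieval put the right passage in front of the model.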


7. Build the Retrieval QA Chain: The Retrieval QA Chain connects the retriever (which finds relevant documents) with the LLM (which produces answers). This chain implements the full RAG process: the system retrieves helpful context and then answers the user's query based on that context.

Import RetrievalQA from langchain.chains.
Create a retriever from your FAISS database:


retriever = db.as_retriever(search_kwargs={"k": 4}) # Optional: You can adjust k for number of documents retrieved


Build the RetrievalQA chain:


from langchain.chains import RetrievalQA

qa = RetrievalQA.from_chain_type(llm=llm, chain_type="refine", retriever=retriever, return_source_documents=False)
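With `chain_type="refine"`, the chain does not answer from all retrieved documents at once (that is the `"stuff"` strategy). It answers from the first document, then revisits that answer once per remaining document, asking the model to refine it given the new context. The control flow can be sketched in plain Python; `refine_answer` and `toy_answer_fn` are hypothetical, and the toy function (which ignores the question) stands in for the real model call.

```python
def refine_answer(question, documents, answer_fn):
    """Answer from the first document, then let each later document update the answer."""
    answer = None
    for doc in documents:
        # answer_fn sees the question, one document and the answer so far (None on the first call).
        answer = answer_fn(question, doc, answer)
    return answer


def toy_answer_fn(question, doc, previous):
    """Toy stand-in for the model: keep the longest sentence mentioning 'cheese'."""
    candidates = [s.strip() for s in doc.split(".") if "cheese" in s.lower()]
    if previous:
        candidates.append(previous)
    return max(candidates, key=len) if candidates else previous


docs = [
    "Cheese is made from milk. It is very old.",
    "Cheesemaking is the craft of turning milk into cheese by controlled spoiling.",
    "Airlines fly planes.",
]
print(refine_answer("What is cheesemaking?", docs, toy_answer_fn))
```

Refine makes one model call per retrieved document, so with `k=4` it costs four calls per question, whereas `"stuff"` makes a single call with all documents in one prompt.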


8. Test your RAG system: Running a test query allows you to verify that all components are working together. This step ensures that documents are retrieved correctly and that the model generates meaningful answers based on the retrieved context.

Define your question:


question = "What is cheesemaking?"


Run the QA chain and print the result:


result = qa.invoke({"query": question})
print(result["result"]) # invoke() returns a dict; the answer is stored under "result"

## ✅ Step 1: Install the libraries

In [25]:
# LangChain: composes the building blocks of an AI system.
!pip install -q langchain

# Torch: core library for PyTorch models (required by the Hugging Face models).
!pip install -q torch

# Transformers: provides pre-trained models (such as BERT, GPT, etc.).
!pip install -q transformers

# Sentence-transformers: generates sentence embeddings for semantic search.
!pip install -q sentence-transformers

# Datasets: loads datasets from Hugging Face.
!pip install -q datasets

# Faiss-cpu: library for fast vector search (even on CPU).
!pip install -q faiss-cpu

# langchain-community: provides additional loaders and connectors for LangChain.
!pip install -U langchain-community




## ✅ Step 2: Load the dataset

## 2-1: Upgrade datasets again to make sure everything works

In [26]:
!pip install -Uq datasets

## 2-2: Import the loader from LangChain

In [27]:
from langchain_community.document_loaders import HuggingFaceDatasetLoader


## 2-3: Define the dataset parameters


In [28]:
# Name of the dataset on Hugging Face
dataset_name = "databricks/databricks-dolly-15k"
# Name of the column containing the text we want to use
page_content_column = "context"


## 2-4: Create the loader and load the documents

In [29]:
# Create the loader with the dataset name and the target column
loader = HuggingFaceDatasetLoader(dataset_name, page_content_column)

# Actually load the documents
data = loader.load()

# (Optional) Print the first two documents to check
print(data[:2])


[Document(metadata={'instruction': 'When did Virgin Australia start operating?', 'response': 'Virgin Australia commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route.', 'category': 'closed_qa'}, page_content='"Virgin Australia, the trading name of Virgin Australia Airlines Pty Ltd, is an Australian-based airline. It is the largest airline by fleet size to use the Virgin brand. It commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route. It suddenly found itself as a major airline in Australia\'s domestic market after the collapse of Ansett Australia in September 2001. The airline has since grown to directly serve 32 cities in Australia, from hubs in Brisbane, Melbourne and Sydney."'), Document(metadata={'instruction': 'Which is a species of fish? Tope or Rope', 'response': 'Tope', 'category': 'classification'}, page_content='""')]
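The output shows how the loader maps each row: the `context` column becomes `page_content`, and the remaining columns (`instruction`, `response`, `category`) become metadata. It also shows that some rows have no context at all: the second document's `page_content` is `'""'`, and such documents add nothing useful to the index. A minimal sketch, assuming you want to drop them before splitting, using a hypothetical `drop_empty_documents` helper and plain dicts in place of LangChain `Document` objects:

```python
def drop_empty_documents(docs):
    """Keep only documents whose text is non-empty once surrounding quotes and whitespace are removed."""
    kept = []
    for doc in docs:
        # The loader output above shows quotes around the text, so an empty
        # context appears as '""'; strip quotes and whitespace before testing.
        text = doc["page_content"].strip().strip('"').strip()
        if text:
            kept.append(doc)
    return kept


# Two records shaped like the output above (the first text is shortened).
sample = [
    {"page_content": '"Virgin Australia, the trading name of Virgin Australia Airlines Pty Ltd"',
     "metadata": {"category": "closed_qa"}},
    {"page_content": '""', "metadata": {"category": "classification"}},
]
print(len(drop_empty_documents(sample)))  # 1
```

With real LangChain documents, the same test would read `doc.page_content` instead of `doc["page_content"]`.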


## Step 3: Split the documents into chunks

In [30]:
# Import the recursive text splitter
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Create a TextSplitter instance
# - chunk_size: maximum size of each chunk (1000 characters here)
# - chunk_overlap: number of characters shared by two consecutive chunks (up to 150 here)
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=150
)

# Split the previously loaded documents
docs = text_splitter.split_documents(data)

# Print the first chunk to check the result
print(docs[0])


page_content='"Virgin Australia, the trading name of Virgin Australia Airlines Pty Ltd, is an Australian-based airline. It is the largest airline by fleet size to use the Virgin brand. It commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route. It suddenly found itself as a major airline in Australia's domestic market after the collapse of Ansett Australia in September 2001. The airline has since grown to directly serve 32 cities in Australia, from hubs in Brisbane, Melbourne and Sydney."' metadata={'instruction': 'When did Virgin Australia start operating?', 'response': 'Virgin Australia commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route.', 'category': 'closed_qa'}


## Step 4: Create the embeddings

In [31]:
# Import the class that creates embeddings with sentence-transformers
from langchain.embeddings import HuggingFaceEmbeddings

# Embedding model to use (lightweight and fast on CPU)
modelPath = "sentence-transformers/all-MiniLM-L6-v2"

# Model options: run on CPU
model_kwargs = {'device': 'cpu'}

# Encoding options: embeddings are not L2-normalized (normalizing them would
# make inner-product search equivalent to cosine similarity)
encode_kwargs = {'normalize_embeddings': False}

# Create the embeddings object
embeddings = HuggingFaceEmbeddings(
    model_name=modelPath,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)

# Test with a simple sentence
text = "This is a test document."
query_result = embeddings.embed_query(text)

# Print the first 3 values of the vector to check
print(query_result[:3])


[-0.03833857178688049, 0.1234646737575531, -0.028642933815717697]


## Embedding result:
The output is a vector, of which the first three values are printed above. This means the sentence "This is a test document." has been converted into a semantic numerical representation (all-MiniLM-L6-v2 produces 384-dimensional vectors) that can be used for search.

## Step 5: Create the FAISS vector store

In [32]:
!pip install -U langchain-community



In [33]:
from langchain_community.vectorstores import FAISS

# Build the vector store from the split documents and the embeddings
db = FAISS.from_documents(docs, embeddings)

# Create a retriever from the FAISS index
retriever = db.as_retriever()

# Test: search for documents similar to a question
# (invoke() replaces the deprecated get_relevant_documents())
similar_docs = retriever.invoke("When did Virgin Australia start operating?")

# Print the first document found
print(similar_docs[0])



page_content='"Virgin Australia, the trading name of Virgin Australia Airlines Pty Ltd, is an Australian-based airline. It is the largest airline by fleet size to use the Virgin brand. It commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route. It suddenly found itself as a major airline in Australia's domestic market after the collapse of Ansett Australia in September 2001. The airline has since grown to directly serve 32 cities in Australia, from hubs in Brisbane, Melbourne and Sydney."' metadata={'instruction': 'When did Virgin Australia start operating?', 'response': 'Virgin Australia commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route.', 'category': 'closed_qa'}


## Result:
The system can now retrieve the most relevant content for a question using semantic embeddings: for this test question, the top result is the passage the question was written for.

## Step 6: Load the answering model (LLM)

## 6-1: Load the model and tokenizer

In [34]:
from transformers import AutoTokenizer, AutoModelForQuestionAnswering, pipeline
from langchain_community.llms import HuggingFacePipeline

# Model name (small and fast, suitable for CPU)
model_name = "Intel/dynamic_tinybert"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the model
model = AutoModelForQuestionAnswering.from_pretrained(model_name)


Invalid model-index. Not loading eval results into CardData.


## 6-2: Create the Hugging Face pipeline

In [35]:
# Create a question-answering pipeline from the model and tokenizer loaded above
qa_pipeline = pipeline(
    "question-answering",
    model=model,
    tokenizer=tokenizer,
    return_tensors='pt'
)


Device set to use cuda:0


## 6-3: Wrap the pipeline for LangChain

In [36]:
llm = HuggingFacePipeline(
    pipeline=qa_pipeline,
    model_kwargs={"temperature": 0.7, "max_length": 512}
)


## Step 7: Build the question-answering chain (RetrievalQA)

In [37]:
from langchain.chains import RetrievalQA

# Create the retriever from the FAISS index
retriever = db.as_retriever(search_kwargs={"k": 4})  # k = number of documents to retrieve

# Build the RetrievalQA chain
qa = RetrievalQA.from_chain_type(
    llm=llm,                       # The answering model
    chain_type="refine",           # Answering strategy (refine = answer improved one document at a time)
    retriever=retriever,           # The retrieval component (FAISS)
    return_source_documents=False  # Return only the final answer
)


## Step 8: Test the RAG system


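The goal is to ask a question in natural language, which the system will:

* interpret,
* use to retrieve the relevant documents via FAISS,
* pass, together with those documents, to the QA model (Intel/dynamic_tinybert) to produce an answer.

The cell below carries out these steps explicitly, calling `qa_pipeline` directly on the retrieved context rather than running the `qa` chain. LangChain's `HuggingFacePipeline` wrapper is built for generative tasks (such as `text-generation` and `text2text-generation`), so running `RetrievalQA` on top of an extractive `question-answering` pipeline is likely to fail.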

In [39]:
# 1. Retrieve the relevant documents via FAISS
retrieved_docs = retriever.invoke("What is cheesemaking?")

# 2. Concatenate their contents into a single context
context = " ".join([doc.page_content for doc in retrieved_docs])

# 3. Call the question-answering pipeline directly with the question and the context
response = qa_pipeline(
    question="What is cheesemaking?",
    context=context
)

# 4. Print only the answer
print("🧠 Generated answer:", response["answer"])



🧠 Generated answer: to control the spoiling of milk into cheese
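Concatenating the retrieved chunks works here because the question-answering pipeline handles long contexts by splitting them into overlapping windows. Another option is to query each chunk separately and keep the highest-scoring answer. A minimal sketch of that variant, assuming a callable with the same keyword interface as `qa_pipeline` (it returns a dict with `answer` and `score` keys); `answer_from_documents` and `fake_qa` are hypothetical, and `fake_qa` is a toy stand-in so the example runs without downloading a model. Comparing scores across separate calls is a heuristic, not a calibrated comparison.

```python
def answer_from_documents(question, documents, qa_fn):
    """Run an extractive QA callable on each document separately and return
    the result dict with the highest score (None if there are no documents)."""
    best = None
    for doc in documents:
        result = qa_fn(question=question, context=doc)
        if best is None or result["score"] > best["score"]:
            best = result
    return best


def fake_qa(question, context):
    """Toy stand-in for qa_pipeline: answers with the first sentence of the
    context, scored by the share of its words that also appear in the question."""
    sentence = context.split(".")[0]
    sentence_words = sentence.lower().split()
    if not sentence_words:
        return {"answer": "", "score": 0.0}
    question_words = set(question.lower().rstrip("?").split())
    shared = question_words & set(sentence_words)
    return {"answer": sentence, "score": len(shared) / len(sentence_words)}


chunks = [
    "Virgin Australia is an airline.",
    "Cheesemaking is the craft of making cheese.",
    "Milk is a liquid.",
]
print(answer_from_documents("What is cheesemaking?", chunks, fake_qa)["answer"])
```

With the notebook's objects, the equivalent call would be `answer_from_documents("What is cheesemaking?", [d.page_content for d in retrieved_docs], qa_pipeline)`.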


## Conclusion: You have just built a complete RAG system

## Steps followed:


| Step | Description                                                      |
| ---- | ---------------------------------------------------------------- |
| ✅ 1  | Install all the required tools                                   |
| ✅ 2  | Load a real-world dataset from Hugging Face                      |
| ✅ 3  | Split the documents into overlapping chunks                      |
| ✅ 4  | Generate embeddings with `sentence-transformers`                 |
| ✅ 5  | Index the vectors with FAISS                                     |
| ✅ 6  | Load a question-answering model                                  |
| ✅ 7  | Build the RetrievalQA chain (semantic search + answering model)  |
| ✅ 8  | Answer questions from the retrieved documents                    |
