<a href="https://colab.research.google.com/github/amoukrim/AI/blob/main/Week6/DailyChallenge/dailyChallengeW_6_D3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Author: Adil MOUKRIM


Build a (RAG) System
Last Updated: July 10th, 2025

Daily Challenge: Build a Retrieval Augmented Generation (RAG) System


👩‍🏫 👩🏿‍🏫 What You’ll learn
* Implement a Retrieval Augmented Generation (RAG) system using LangChain and Hugging Face.
* Load and process datasets using Hugging Face datasets and LangChain HuggingFaceDatasetLoader.
* Split documents into smaller chunks using LangChain RecursiveCharacterTextSplitter.
* Generate text embeddings using Hugging Face sentence-transformers and LangChain HuggingFaceEmbeddings.
* Create and use vector stores with LangChain FAISS for efficient document retrieval.
* Prepare and integrate a pre-trained Language Model (LLM) from Hugging Face transformers for question answering.
* Build a Retrieval QA Chain using LangChain RetrievalQA to answer questions based on retrieved documents.


🛠️ What you will create
You will create a functional RAG system that can answer questions based on a dataset loaded from Hugging Face Datasets. This system will:

* Load the databricks/databricks-dolly-15k dataset.
* Index the dataset content into a vector store.
* Use a pre-trained question-answering model from Hugging Face.
* Answer user queries by retrieving relevant documents and using the LLM to generate answers.


Mandatory : You must read this article before starting the exercise
Faiss | LangChain



Mandatory : You must watch these videos before starting the exercise


PyTorch in 100 Seconds



LangChain Explained in 13 Minutes



Task
Our task is to implement RAG using Langchain and Hugging Face!

1. Set up your environment: This ensures all the necessary tools are available to build the RAG system. Each library serves a specific role: LangChain handles the orchestration of components, transformers provides pre-trained models, sentence-transformers generates embeddings, datasets loads sample data, and FAISS enables fast similarity searches.

Open your terminal or notebook environment.
Install all required libraries by running these commands:


!pip install -q langchain
!pip install -q torch
!pip install -q transformers
!pip install -q sentence-transformers
!pip install -q datasets
!pip install -q faiss-cpu
!pip install -U langchain-community
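
Once the installs finish, a quick check can confirm that each package is visible to the notebook's interpreter. The sketch below relies only on the standard library's `importlib.metadata`; the helper name `installed_versions` and the `REQUIRED` list are illustrative and not part of any of these libraries.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as passed to pip in the commands above.
REQUIRED = [
    "langchain",
    "torch",
    "transformers",
    "sentence-transformers",
    "datasets",
    "faiss-cpu",
    "langchain-community",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any line reporting `NOT INSTALLED` points at a pip command that should be re-run before continuing.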


2. Load the dataset: To provide the system with information to retrieve from, you’ll load a real-world dataset. HuggingFaceDatasetLoader simplifies the process of accessing Hugging Face datasets and formatting them into documents that Langchain can process.

Before loading the dataset, run:
!pip install -Uq datasets
Import HuggingFaceDatasetLoader from langchain_community.document_loaders.
Specify the dataset name and content column:


dataset_name = "databricks/databricks-dolly-15k"
page_content_column = "context"


Create a HuggingFaceDatasetLoader instance and load the data as documents:


loader = HuggingFaceDatasetLoader(dataset_name, page_content_column)
data = loader.load()
print(data[:2]) # Optional: Print the first 2 entries to verify loading


3. Split the documents: Language models have a limit on how much text they can process at once. Splitting large documents into smaller chunks keeps each piece of text a manageable size for embedding and retrieval, and the overlap between consecutive chunks reduces the chance that context is cut off at a chunk boundary.

Import RecursiveCharacterTextSplitter from langchain.text_splitter.
Create a RecursiveCharacterTextSplitter instance with a chunk_size of 1000 and chunk_overlap of 150:


text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)


Split the loaded documents:


docs = text_splitter.split_documents(data)
print(docs[0]) # Optional: Print the first document chunk
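
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of overlapping chunking at fixed character offsets. It is not how `RecursiveCharacterTextSplitter` is implemented (that class prefers to cut at separators such as paragraph breaks, newlines, and spaces); the function `sliding_chunks` is a hypothetical helper that only illustrates how the two parameters interact.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters, each sharing
    chunk_overlap characters with the previous window."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# Small values so the overlap is easy to see.
print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

Each chunk starts with the last character of the previous one, which is the role the 150-character overlap plays at the larger scale used in this step.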


4. Embed the text: Text needs to be converted into numerical representations (embeddings) so that similar pieces of text can be found efficiently. Using a sentence-transformer model creates embeddings that capture semantic meaning, enabling effective retrieval later.

Import HuggingFaceEmbeddings from langchain_community.embeddings.
Define the model path, model configurations, and encoding options:


modelPath = "sentence-transformers/all-MiniLM-L6-v2"
model_kwargs = {'device':'cpu'}
encode_kwargs = {'normalize_embeddings': False}


Initialize HuggingFaceEmbeddings:


embeddings = HuggingFaceEmbeddings(
  model_name=modelPath,
  model_kwargs=model_kwargs,
  encode_kwargs=encode_kwargs
)


(Optional) Test embedding creation:


text = "This is a test document."
query_result = embeddings.embed_query(text)
print(query_result[:3])
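
The reason embeddings support retrieval is that texts with similar meaning end up with vectors pointing in similar directions. A minimal sketch of that comparison using cosine similarity follows; the three-dimensional vectors are made-up toy values, not output of the embedding model, and `cosine_similarity` is an illustrative helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (assumed values).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print("cat vs kitten :", round(cosine_similarity(cat, kitten), 3))
print("cat vs invoice:", round(cosine_similarity(cat, invoice), 3))
```

With real embeddings the same function can be applied to two `embeddings.embed_query(...)` results to compare two sentences.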


5. Create a vector store: A vector store like FAISS indexes the embeddings, allowing fast and scalable similarity searches. This is how the system quickly finds relevant pieces of text when a query is made.

Import FAISS from langchain_community.vectorstores.
Create a FAISS vector store from the document chunks and embeddings:


db = FAISS.from_documents(docs, embeddings)


Note: This step might take some time depending on your dataset size.
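
Conceptually, a similarity search compares the query vector with the stored vectors and returns the closest ones. The sketch below does this exhaustively in plain Python with Euclidean distance, which is the idea behind a flat (brute-force) index; FAISS performs the same kind of search with optimized native code. The names `l2_distance` and `top_k` and the toy data are assumptions for illustration only.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, indexed, k):
    """Return the texts of the k entries closest to query_vec.

    indexed is a list of (text, vector) pairs.
    """
    ranked = sorted(indexed, key=lambda item: l2_distance(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


# Toy index: each text is paired with a made-up 2-D vector.
indexed = [
    ("doc a", [0.0, 0.0]),
    ("doc b", [1.0, 1.0]),
    ("doc c", [5.0, 5.0]),
]

print(top_k([0.9, 0.9], indexed, k=2))
# ['doc b', 'doc a']
```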


6. Prepare the LLM model: The Language Model is responsible for generating answers based on retrieved documents. Loading a pre-trained model and wrapping it in a Langchain pipeline makes it easy to integrate with the retrieval system.

Import necessary classes from transformers and langchain:


from transformers import AutoTokenizer, AutoModelForQuestionAnswering, pipeline
from langchain_community.llms import HuggingFacePipeline


Load the tokenizer and question-answering model:


tokenizer = AutoTokenizer.from_pretrained("Intel/dynamic_tinybert")
model = AutoModelForQuestionAnswering.from_pretrained("Intel/dynamic_tinybert")


Create a question-answering pipeline:


model_name = "Intel/dynamic_tinybert"
tokenizer = AutoTokenizer.from_pretrained(model_name, padding=True, truncation=True, max_length=512)
question_answerer = pipeline(
  "question-answering",
  model=model_name,
  tokenizer=tokenizer,
  return_tensors='pt'
)


Create a LangChain pipeline wrapper:


llm = HuggingFacePipeline(
  pipeline=question_answerer,
  model_kwargs={"temperature": 0.7, "max_length": 512},
)


7. Build the Retrieval QA Chain: The Retrieval QA Chain connects the retriever (which finds relevant documents) with the LLM (which generates answers). This chain enables the full RAG process, where the system retrieves helpful context and then answers the user’s query based on that context.

Import RetrievalQA from langchain.chains.
Create a retriever from your FAISS database:


retriever = db.as_retriever(search_kwargs={"k": 4}) # Optional: You can adjust k for number of documents retrieved


Build the RetrievalQA chain:


qa = RetrievalQA.from_chain_type(llm=llm, chain_type="refine", retriever=retriever, return_source_documents=False)
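
The `"refine"` chain type processes the retrieved documents one at a time: it drafts an answer from the first document, then asks the model to improve that draft with each additional document. The sketch below shows only this control flow. The prompt strings are simplified placeholders rather than LangChain's actual templates, and `llm` is assumed to be any callable that maps a prompt string to a text answer.

```python
def refine_answer(question, documents, llm):
    """Draft an answer from the first document, then refine it with the rest."""
    if not documents:
        return llm(f"Question: {question}\nAnswer:")

    # Initial answer from the first retrieved document.
    answer = llm(f"Context: {documents[0]}\nQuestion: {question}\nAnswer:")

    # One extra model call per remaining document.
    for doc in documents[1:]:
        answer = llm(
            f"Existing answer: {answer}\n"
            f"New context: {doc}\n"
            f"Question: {question}\n"
            "Refined answer:"
        )
    return answer


# Stand-in model that just reports how many times it has been called.
calls = []


def fake_llm(prompt):
    calls.append(prompt)
    return f"draft {len(calls)}"


print(refine_answer("What is cheesemaking?", ["doc 1", "doc 2", "doc 3"], fake_llm))
# draft 3
```

With `k=4` retrieved documents this strategy makes four sequential model calls, which is the trade-off against chain types that put all documents into a single prompt.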


8. Test your RAG system: Running a test query allows you to verify that all components are working together. This step ensures that documents are retrieved correctly and that the model generates meaningful answers based on the retrieved context.

Define your question:


question = "What is cheesemaking?"


Run the QA chain and print the result:


result = qa.run({"query": question})
print(result) # Or print(result["result"]) if the output is a dictionary

## ✅ Step 1: Install the libraries

In [25]:
# LangChain: composes the different building blocks of an AI system.
!pip install -q langchain

# Torch: base library for PyTorch models (required by Hugging Face models).
!pip install -q torch

# Transformers: provides pre-trained models (such as BERT, GPT, etc.).
!pip install -q transformers

# Sentence-transformers: generates sentence embeddings for semantic search.
!pip install -q sentence-transformers

# Datasets: loads datasets from Hugging Face.
!pip install -q datasets

# Faiss-cpu: library for fast vector search (even on CPU).
!pip install -q faiss-cpu

# langchain-community: provides the additional loaders and connectors for LangChain.
!pip install -U langchain-community




## ✅ Step 2: Load the dataset

## 2-1. Install datasets once more to make sure everything works

In [26]:
!pip install -Uq datasets

## 2-2. Import the loader from LangChain

In [27]:
from langchain_community.document_loaders import HuggingFaceDatasetLoader


## 2-3. Define the dataset parameters


In [28]:
# Name of the dataset on Hugging Face
dataset_name = "databricks/databricks-dolly-15k"
# Name of the column containing the text we want to use
page_content_column = "context"


## 2-4. Create the loader and load the documents

In [29]:
# Create the loader with the dataset name and the target column
loader = HuggingFaceDatasetLoader(dataset_name, page_content_column)

# Actually load the documents
data = loader.load()

# (Optional) Print the first two documents as a check
print(data[:2])


[Document(metadata={'instruction': 'When did Virgin Australia start operating?', 'response': 'Virgin Australia commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route.', 'category': 'closed_qa'}, page_content='"Virgin Australia, the trading name of Virgin Australia Airlines Pty Ltd, is an Australian-based airline. It is the largest airline by fleet size to use the Virgin brand. It commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route. It suddenly found itself as a major airline in Australia\'s domestic market after the collapse of Ansett Australia in September 2001. The airline has since grown to directly serve 32 cities in Australia, from hubs in Brisbane, Melbourne and Sydney."'), Document(metadata={'instruction': 'Which is a species of fish? Tope or Rope', 'response': 'Tope', 'category': 'classification'}, page_content='""')]
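
The second document printed above has an empty context (`page_content='""'`), so it contributes nothing useful to the index. One possible cleanup, which is not part of the original exercise, is to drop such records before splitting. The helpers `has_content` and `drop_empty` below are hypothetical; they only assume that each document exposes a `page_content` string, and the sample objects are stand-ins for real `Document` instances.

```python
from types import SimpleNamespace


def has_content(doc):
    """True if page_content holds more than whitespace and wrapping quotes."""
    text = doc.page_content.strip().strip('"').strip()
    return bool(text)


def drop_empty(documents):
    """Keep only the documents that carry actual text."""
    return [doc for doc in documents if has_content(doc)]


# Stand-in documents mimicking the two entries shown in the output above.
sample = [
    SimpleNamespace(page_content='"Virgin Australia is an Australian-based airline."'),
    SimpleNamespace(page_content='""'),
]

print(len(drop_empty(sample)))
# 1
```

In the notebook this would be applied as `data = drop_empty(data)` right after `loader.load()`.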


## Step 3: Split the documents (cut them into chunks)

In [30]:
# Import the recursive text splitting tool
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Create a TextSplitter instance
# - chunk_size: size of each chunk (1000 characters here)
# - chunk_overlap: number of characters shared by two consecutive chunks (150 here)
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=150
)

# Split the previously loaded documents
docs = text_splitter.split_documents(data)

# Print the first chunk to check the result
print(docs[0])


page_content='"Virgin Australia, the trading name of Virgin Australia Airlines Pty Ltd, is an Australian-based airline. It is the largest airline by fleet size to use the Virgin brand. It commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route. It suddenly found itself as a major airline in Australia's domestic market after the collapse of Ansett Australia in September 2001. The airline has since grown to directly serve 32 cities in Australia, from hubs in Brisbane, Melbourne and Sydney."' metadata={'instruction': 'When did Virgin Australia start operating?', 'response': 'Virgin Australia commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route.', 'category': 'closed_qa'}


## Step 4: Create the embeddings

In [31]:
# Import the class that creates embeddings with sentence-transformers
from langchain_community.embeddings import HuggingFaceEmbeddings

# Embedding model to use (lightweight and fast on CPU)
modelPath = "sentence-transformers/all-MiniLM-L6-v2"

# Model options: stay on CPU here
model_kwargs = {'device': 'cpu'}

# Encoding options: no normalization, which can be useful depending on the search engine
encode_kwargs = {'normalize_embeddings': False}

# Create the embeddings object
embeddings = HuggingFaceEmbeddings(
    model_name=modelPath,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)

# Test with a simple sentence
text = "This is a test document."
query_result = embeddings.embed_query(text)

# Print the first 3 values of the vector as a check
print(query_result[:3])


[-0.03833857178688049, 0.1234646737575531, -0.028642933815717697]


## Embedding result:
I get an output vector:

[-0.038338541984558105, 0.12346471846103668, -0.02864297851920128]
This means that the sentence "This is a test document." has been converted into a semantic numerical representation that can be used for search.
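
The `normalize_embeddings` option controls whether each vector is rescaled to unit length. A minimal sketch of what that rescaling means, and why it matters for search: for unit-length vectors, squared Euclidean distance and the dot product (cosine similarity) carry the same ranking information. The helper names and the toy vectors are illustrative assumptions, not model output.

```python
import math


def l2_normalize(vec):
    """Rescale a vector so that its Euclidean length is 1."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a, b):
    """Dot product of two vectors."""
    return sum(x * y for x, y in zip(a, b))


u = l2_normalize([3.0, 4.0])  # toy vector
v = l2_normalize([1.0, 0.0])  # toy vector

# For unit vectors: ||u - v||^2 equals 2 - 2 * (u . v), up to rounding.
print(squared_l2(u, v), 2 - 2 * dot(u, v))
```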

## Step 5: Create the FAISS vector store

In [32]:
!pip install -U langchain-community



In [33]:
from langchain_community.vectorstores import FAISS

# Create the vector store from the split documents and the embeddings
db = FAISS.from_documents(docs, embeddings)

# Create the retriever from the FAISS store
retriever = db.as_retriever()

# Test: search for a document similar to a question
similar_docs = retriever.invoke("When did Virgin Australia start operating?")

# Print the first document found
print(similar_docs[0])



page_content='"Virgin Australia, the trading name of Virgin Australia Airlines Pty Ltd, is an Australian-based airline. It is the largest airline by fleet size to use the Virgin brand. It commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route. It suddenly found itself as a major airline in Australia's domestic market after the collapse of Ansett Australia in September 2001. The airline has since grown to directly serve 32 cities in Australia, from hubs in Brisbane, Melbourne and Sydney."' metadata={'instruction': 'When did Virgin Australia start operating?', 'response': 'Virgin Australia commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route.', 'category': 'closed_qa'}


## Result:
The system can now retrieve the most relevant content for a question, thanks to the semantic embeddings.

## Step 6: Load the answering model (LLM)

## 6-1. Load the model and the tokenizer:

In [34]:
from transformers import AutoTokenizer, AutoModelForQuestionAnswering, pipeline
from langchain_community.llms import HuggingFacePipeline

# Model name (small, fast, optimized for CPU)
model_name = "Intel/dynamic_tinybert"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the model
model = AutoModelForQuestionAnswering.from_pretrained(model_name)


Invalid model-index. Not loading eval results into CardData.


## 6-2. Create the Hugging Face pipeline

In [35]:
# Create a question-answering pipeline
qa_pipeline = pipeline(
    "question-answering",
    model=model,
    tokenizer=tokenizer,
    return_tensors='pt'
)


Device set to use cuda:0


## 6-3. Integrate the pipeline into LangChain

In [36]:
llm = HuggingFacePipeline(
    pipeline=qa_pipeline,
    model_kwargs={"temperature": 0.7, "max_length": 512}
)


## Step 7: Build the question-answering chain (RetrievalQA)

In [37]:
from langchain.chains import RetrievalQA

# Create the retriever from the FAISS store
retriever = db.as_retriever(search_kwargs={"k": 4})  # k = number of documents to retrieve

# Build the RetrievalQA chain
qa = RetrievalQA.from_chain_type(
    llm=llm,                      # The answering model
    chain_type="refine",         # Answering method (refine = progressive improvements)
    retriever=retriever,         # The search system (FAISS)
    return_source_documents=False  # Return only the final answer
)


## Step 8: Test the RAG system


The goal is to ask a question in natural language, which the system will:

* interpret,

* use to retrieve the relevant documents via FAISS,

* pass to the question-answering model (Intel/dynamic_tinybert), which extracts the answer from the retrieved context.

In [39]:
# 1. Retrieve the relevant documents via FAISS
retrieved_docs = retriever.invoke("What is cheesemaking?")

# 2. Concatenate their content into a single context
context = " ".join([doc.page_content for doc in retrieved_docs])

# 3. Call the question-answering pipeline directly with question + context
response = qa_pipeline(
    question="What is cheesemaking?",
    context=context
)

# 4. Print only the answer
print("🧠 Generated answer:", response["answer"])



🧠 Generated answer: to control the spoiling of milk into cheese
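
The four numbered steps in the cell above can be wrapped into a single reusable function. This is a sketch rather than part of the original notebook: `answer_question` is a hypothetical helper that only assumes a retriever with an `invoke` method and a pipeline callable accepting `question` and `context`. The stub classes exist solely so that the example runs without downloading a model.

```python
from types import SimpleNamespace


def answer_question(question, retriever, qa_pipeline):
    """Retrieve documents, join them into one context, and extract an answer."""
    retrieved_docs = retriever.invoke(question)
    context = " ".join(doc.page_content for doc in retrieved_docs)
    response = qa_pipeline(question=question, context=context)
    return response["answer"]


class StubRetriever:
    """Stand-in for the FAISS retriever; always returns two fixed passages."""

    def invoke(self, query):
        return [
            SimpleNamespace(page_content="First stub passage."),
            SimpleNamespace(page_content="Second stub passage."),
        ]


def stub_pipeline(question, context):
    """Stand-in for the Hugging Face pipeline; returns the first sentence."""
    return {"answer": context.split(".")[0]}


print(answer_question("What is cheesemaking?", StubRetriever(), stub_pipeline))
# First stub passage
```

In the notebook, the call would be `answer_question("What is cheesemaking?", retriever, qa_pipeline)` with the real objects built in the earlier steps.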


## Conclusion: I have just built a complete RAG system

## Steps followed:


| Step  | Description                                                        |
| ----- | ------------------------------------------------------------------ |
| ✅ 1   | Installation of all the required tools                             |
| ✅ 2   | Loading of a real dataset from Hugging Face                        |
| ✅ 3   | Smart splitting of the documents                                   |
| ✅ 4   | Generation of embeddings via `sentence-transformers`               |
| ✅ 5   | Vector indexing with FAISS                                         |
| ✅ 6   | Loading of a question-answering LLM                                |
| ✅ 7   | Semantic search over the documents                                 |
| ✅ 8   | Generation of natural answers from the retrieved documents         |
