
Build a RAG System

Last Updated: July 10th, 2025

Daily Challenge: Build a Retrieval Augmented Generation (RAG) System


👩‍🏫 👩🏿‍🏫 What You’ll learn

    Implement a Retrieval Augmented Generation (RAG) system using LangChain and Hugging Face.
    Load and process datasets using Hugging Face datasets and the LangChain HuggingFaceDatasetLoader.
    Split documents into smaller chunks using the LangChain RecursiveCharacterTextSplitter.
    Generate text embeddings using Hugging Face sentence-transformers and the LangChain HuggingFaceEmbeddings class.
    Create and use vector stores with LangChain FAISS for efficient document retrieval.
    Prepare and integrate a pre-trained language model (LLM) from Hugging Face transformers for question answering.
    Build a retrieval QA chain using LangChain RetrievalQA to answer questions based on retrieved documents.


🛠️ What you will create

You will create a functional RAG system that can answer questions based on a dataset loaded from Hugging Face Datasets. This system will:

    Load the databricks/databricks-dolly-15k dataset.
    Index the dataset content into a vector store.
    Utilize a pre-trained question-answering model from Hugging Face.
    Answer user queries by retrieving relevant documents and using the LLM to generate answers.


Mandatory: You must read this article before starting the exercise

Faiss | LangChain


Mandatory: You must watch these videos before starting the exercise


PyTorch in 100 Seconds


LangChain Explained in 13 Minutes


Task

Your task is to implement RAG using LangChain and Hugging Face!

1. Set up your environment: This ensures all the necessary tools are available to build the RAG system. Each library serves a specific role: LangChain orchestrates the components, transformers provides pre-trained models, sentence-transformers generates embeddings, datasets loads the sample data, and FAISS enables fast similarity search.

    Open your terminal or notebook environment.
    Install all required libraries by running these commands:


!pip install -q langchain
!pip install -q torch
!pip install -q transformers
!pip install -q sentence-transformers
!pip install -q datasets
!pip install -q faiss-cpu
!pip install -U langchain-community
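
Once the installs finish, it can help to confirm that every package is actually visible to the Python interpreter you are running. The sketch below uses only the standard library; the helper name installed_versions is an illustrative choice, not part of any of the libraries above.

```python
from importlib import metadata

# Distribution names as they were passed to pip in the commands above
REQUIRED = [
    "langchain",
    "langchain-community",
    "torch",
    "transformers",
    "sentence-transformers",
    "datasets",
    "faiss-cpu",
]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED should be installed again before moving on to the next step.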


2. Load the dataset: To give the system information to retrieve from, you’ll load a real-world dataset. HuggingFaceDatasetLoader simplifies access to Hugging Face datasets and converts each row into a document that LangChain can process.

    Before loading the dataset, upgrade the datasets library:

!pip install -Uq datasets

    Import HuggingFaceDatasetLoader from langchain_community.document_loaders (older LangChain releases exposed it as langchain.document_loaders).
    Specify the dataset name and content column:


dataset_name = "databricks/databricks-dolly-15k"
page_content_column = "context"


    Create a HuggingFaceDatasetLoader instance and load the data as documents:


loader = HuggingFaceDatasetLoader(dataset_name, page_content_column)
data = loader.load()
print(data[:2]) # Optional: Print the first 2 entries to verify loading
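
Some rows in this dataset have no context, and the loader then produces a document whose page_content is just an empty quoted string ('""'). Such documents add nothing to retrieval, so you may want to drop them before splitting. The sketch below shows one way to do that; Doc is a hypothetical stand-in with the same attribute names as a LangChain document so the example runs without any downloads, and has_context is an illustrative helper name.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (same attribute names)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def has_context(doc) -> bool:
    """Return True if the document holds text beyond quotes and whitespace."""
    return bool(doc.page_content.strip().strip('"').strip())


sample = [
    Doc('"Virgin Australia is an Australian-based airline."'),
    Doc('""'),  # what an empty context column looks like after loading
]

non_empty = [doc for doc in sample if has_context(doc)]
print(len(sample), "->", len(non_empty))
```

With the real data, the same filter would be applied as data = [doc for doc in data if has_context(doc)].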


3. Split the documents: Language models have a limit on how much text they can process at once. Splitting large documents into smaller, overlapping chunks ensures that no important context is lost and that each piece of text is a manageable size for embedding and retrieval.

    Import RecursiveCharacterTextSplitter from langchain.text_splitter.
    Create a RecursiveCharacterTextSplitter instance with a chunk_size of 1000 and chunk_overlap of 150:


text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)


    Split the loaded documents:


docs = text_splitter.split_documents(data)
print(docs[0]) # Optional: Print the first document chunk
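
To see what chunk_size and chunk_overlap mean in practice, here is a simplified sketch of overlapping chunking in plain Python. It is not how RecursiveCharacterTextSplitter works internally (that splitter prefers to cut at separators such as paragraph and line breaks); it only slices at fixed character positions to make the overlap visible. The function name split_with_overlap is an illustrative choice.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(text, chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk starts with the last three characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.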


4. Embed the text: Text needs to be converted into numerical representations (embeddings) so that similar pieces of text can be found efficiently. Using a sentence-transformer model creates embeddings that capture semantic meaning, enabling effective retrieval later.

    Import HuggingFaceEmbeddings from langchain_community.embeddings (older LangChain releases exposed it as langchain.embeddings).
    Define the model path, model configurations, and encoding options:


modelPath = "sentence-transformers/all-MiniLM-l6-v2"
model_kwargs = {'device':'cpu'}
encode_kwargs = {'normalize_embeddings': False}


    Initialize HuggingFaceEmbeddings:


embeddings = HuggingFaceEmbeddings(
  model_name=modelPath,
  model_kwargs=model_kwargs,
  encode_kwargs=encode_kwargs
)


    (Optional) Test embedding creation:


text = "This is a test document."
query_result = embeddings.embed_query(text)
print(query_result[:3])
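
The normalize_embeddings option controls whether each vector is rescaled to unit length. The sketch below uses made-up three-dimensional vectors (real embeddings from this model have 384 dimensions) to show why this matters: the raw dot product depends on vector length, while the dot product of normalized vectors equals the cosine similarity. All names and values here are illustrative.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Rescale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


# Toy "embeddings": the first two point in the same direction, the third is orthogonal
short_vec = [1.0, 2.0, 0.0]
long_vec = [2.0, 4.0, 0.0]
other_vec = [0.0, 0.0, 3.0]

raw_dot = dot(short_vec, long_vec)
cos = cosine_similarity(short_vec, long_vec)
unit_dot = dot(normalize(short_vec), normalize(long_vec))

print("raw dot product:", raw_dot)
print("cosine similarity:", round(cos, 6))
print("dot product after normalizing:", round(unit_dot, 6))
print("cosine with unrelated vector:", cosine_similarity(short_vec, other_vec))
```

If you plan to compare embeddings by cosine similarity, setting normalize_embeddings to True lets a plain dot product give the same ranking.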


5. Create a vector store: A vector store like FAISS indexes the embeddings, allowing fast and scalable similarity searches. This is how the system quickly finds relevant pieces of text when a query is made.

    Import FAISS from langchain_community.vectorstores (older LangChain releases exposed it as langchain.vectorstores).
    Create a FAISS vector store from the document chunks and embeddings:


db = FAISS.from_documents(docs, embeddings)


    Note: This step might take some time depending on your dataset size.
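
Conceptually, a similarity search compares the query vector with the stored vectors and returns the closest ones. The following sketch does this exhaustively with Euclidean distance in plain Python. It illustrates the idea only; FAISS performs the same kind of search with optimized routines and also offers approximate indexes. The vectors and the helper name top_k are made up for the example.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query, vectors, k):
    """Return (index, distance) pairs for the k stored vectors closest to query."""
    scored = [(i, l2_distance(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Toy two-dimensional "index"; real embeddings have hundreds of dimensions
vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
hits = top_k([1.0, 1.0], vectors, k=2)

for index, distance in hits:
    print(f"vector {index} at distance {distance:.3f}")
```

The indexes returned by the search are what map back to the original document chunks.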


6. Prepare the LLM: The language model is responsible for producing answers from the retrieved documents. Loading a pre-trained model and wrapping it in a LangChain pipeline makes it easy to integrate with the retrieval system.

    Import necessary classes from transformers and langchain:


from transformers import AutoTokenizer, AutoModelForQuestionAnswering, pipeline
from langchain_community.llms import HuggingFacePipeline


    Load the tokenizer and question-answering model:


tokenizer = AutoTokenizer.from_pretrained("Intel/dynamic_tinybert")
model = AutoModelForQuestionAnswering.from_pretrained("Intel/dynamic_tinybert")


    Create a question-answering pipeline:


model_name = "Intel/dynamic_tinybert"
tokenizer = AutoTokenizer.from_pretrained(model_name, padding=True, truncation=True, max_length=512)
# Build an extractive question-answering pipeline around the model
question_answerer = pipeline(
  "question-answering",
  model=model_name,
  tokenizer=tokenizer,
  return_tensors='pt'
)


    Create a LangChain pipeline wrapper:


llm = HuggingFacePipeline(
  pipeline=question_answerer,
  model_kwargs={"temperature": 0.7, "max_length": 512},
)


7. Build the Retrieval QA Chain: The Retrieval QA Chain connects the retriever (which finds relevant documents) with the LLM (which generates answers). This chain enables the full RAG process, where the system retrieves helpful context and then answers the user’s query based on that context.

    Import RetrievalQA from langchain.chains.
    Create a retriever from your FAISS database:


retriever = db.as_retriever(search_kwargs={"k": 4}) # Optional: You can adjust k for number of documents retrieved


    Build the RetrievalQA chain:


qa = RetrievalQA.from_chain_type(llm=llm, chain_type="refine", retriever=retriever, return_source_documents=False)
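
The "refine" chain type processes the retrieved documents one at a time: it drafts an answer from the first document, then asks the model to improve that answer with each further document. The sketch below imitates that control flow in plain Python. The prompt wording, the refine_answer function, and the fake_llm stub are all illustrative assumptions, not LangChain's actual prompts or API.

```python
from typing import Callable, Sequence


def refine_answer(question: str, chunks: Sequence[str], llm: Callable[[str], str]) -> str:
    """Draft an answer from the first chunk, then refine it with each later chunk."""
    if not chunks:
        return llm(f"Question: {question}\nAnswer:")
    answer = llm(f"Context: {chunks[0]}\nQuestion: {question}\nAnswer:")
    for chunk in chunks[1:]:
        answer = llm(
            f"Question: {question}\n"
            f"Existing answer: {answer}\n"
            f"New context: {chunk}\n"
            "Refine the existing answer if the new context helps:"
        )
    return answer


calls = []


def fake_llm(prompt: str) -> str:
    """Stub model that records each prompt and returns a numbered answer."""
    calls.append(prompt)
    return f"answer v{len(calls)}"


retrieved = ["chunk A", "chunk B", "chunk C", "chunk D"]  # stands in for k=4 results
final = refine_answer("What is cheesemaking?", retrieved, fake_llm)
print(final, "after", len(calls), "model calls")
```

One consequence of this design is that the number of model calls grows with k, so a larger k gives the model more context at the cost of a slower answer.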


8. Test your RAG system: Running a test query allows you to verify that all components are working together. This step ensures that documents are retrieved correctly and that the model generates meaningful answers based on the retrieved context.

    Define your question:


question = "What is cheesemaking?"


    Run the QA chain and print the result:


result = qa.run({"query": question})
print(result) # Or print(result["result"]) if the output is a dictionary
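
Depending on how the chain is invoked, the result may be a plain string or a dictionary with a "result" key. A small helper such as the one sketched below handles both cases; extract_answer is an illustrative name and the sample values are made up.

```python
def extract_answer(result) -> str:
    """Return the answer text whether the chain produced a string or a dictionary."""
    if isinstance(result, dict):
        return str(result.get("result", ""))
    return str(result)


# Made-up values standing in for the two possible output shapes
as_string = extract_answer("Cheesemaking is the craft of making cheese.")
as_dict = extract_answer({"query": "What is cheesemaking?", "result": "The craft of making cheese."})
print(as_string)
print(as_dict)
```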


In [1]:
from langchain_community.document_loaders import HuggingFaceDatasetLoader

dataset_name = "databricks/databricks-dolly-15k"
page_content_column = "context"

loader = HuggingFaceDatasetLoader(dataset_name, page_content_column)
data = loader.load()

print(data[:2])  # Sanity check


[Document(metadata={'instruction': 'When did Virgin Australia start operating?', 'response': 'Virgin Australia commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route.', 'category': 'closed_qa'}, page_content='"Virgin Australia, the trading name of Virgin Australia Airlines Pty Ltd, is an Australian-based airline. It is the largest airline by fleet size to use the Virgin brand. It commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route. It suddenly found itself as a major airline in Australia\'s domestic market after the collapse of Ansett Australia in September 2001. The airline has since grown to directly serve 32 cities in Australia, from hubs in Brisbane, Melbourne and Sydney."'), Document(metadata={'instruction': 'Which is a species of fish? Tope or Rope', 'response': 'Tope', 'category': 'classification'}, page_content='""')]


In [2]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
docs = text_splitter.split_documents(data)

print(docs[0])  # Sanity check

page_content='"Virgin Australia, the trading name of Virgin Australia Airlines Pty Ltd, is an Australian-based airline. It is the largest airline by fleet size to use the Virgin brand. It commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route. It suddenly found itself as a major airline in Australia's domestic market after the collapse of Ansett Australia in September 2001. The airline has since grown to directly serve 32 cities in Australia, from hubs in Brisbane, Melbourne and Sydney."' metadata={'instruction': 'When did Virgin Australia start operating?', 'response': 'Virgin Australia commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route.', 'category': 'closed_qa'}


In [3]:
from langchain_community.embeddings import HuggingFaceEmbeddings

modelPath = "sentence-transformers/all-MiniLM-l6-v2"
embeddings = HuggingFaceEmbeddings(
    model_name=modelPath,
    model_kwargs={'device': 'cpu'},
    encode_kwargs={'normalize_embeddings': False}
)

# Optional test
query_result = embeddings.embed_query("This is a test document.")
print(query_result[:3])


[-0.038338545709848404, 0.1234646886587143, -0.028642931953072548]


In [4]:
from langchain_community.vectorstores import FAISS

db = FAISS.from_documents(docs, embeddings)

KeyboardInterrupt: 

In [None]:
from transformers import AutoTokenizer, pipeline
from langchain_community.llms import HuggingFacePipeline

model_name = "Intel/dynamic_tinybert"
tokenizer = AutoTokenizer.from_pretrained(model_name)

qa_pipeline = pipeline(
    "question-answering",
    model=model_name,
    tokenizer=tokenizer,
    return_tensors="pt"
)

llm = HuggingFacePipeline(
    pipeline=qa_pipeline,
    model_kwargs={"temperature": 0.7, "max_length": 512}
)