Two limitations of LLMs (Large Language Models):
Context window: the amount of input they can process at once is limited.

Training scope: they perform best on data they were specifically trained on.

To overcome these limitations, we will use RAG (Retrieval-Augmented Generation).

A RAG system has three stages: indexing, retrieval, and generation. At query time it retrieves the relevant documents and passes them to the model together with the user's prompt, so the model can generate accurate, customized answers even about data it never saw during training.
In this notebook you will use the Gemini API to create a vector database, retrieve passages relevant to a question from that database, and generate a final answer. You will use Chroma, an open-source vector database. With Chroma, you can store embeddings alongside metadata, embed documents and queries, and search your documents.
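
Before bringing in Gemini and Chroma, the three stages can be sketched with the standard library alone. This is only an illustration under loud assumptions: the `embed` function below is a toy word-count vector standing in for a real embedding model, the three sample documents are made up from phrases that appear later in this notebook, and the "generation" stage stops at assembling the prompt rather than calling a model.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy embedding: word counts (a stand-in for a real embedding model).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# 1. Indexing: embed every document once and keep the vectors.
documents = [
    "The canteen serves breakfast, lunch and dinner.",
    "Our address is Dawhenya, Methodist Junction, Tena, Ghana.",
    "We offer English, French and Spanish courses.",
]
index = [(doc, embed(doc)) for doc in documents]

# 2. Retrieval: embed the query and rank documents by similarity.
def retrieve(query, k=1):
    query_vector = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

# 3. Generation: hand the question and the retrieved passages to the model.
#    Here we only build the prompt; a real system would send it to an LLM.
query = "what is your address?"
passages = retrieve(query)
prompt = f"QUESTION: {query}\n" + "".join(f"PASSAGE: {p}\n" for p in passages)
print(prompt)
```

The rest of the notebook follows the same shape, with Gemini embeddings replacing `embed` and a Chroma collection replacing the `index` list.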


In [32]:
!pip install -qU "google-genai==1.7.0" "chromadb==0.6.3"
!pip install python-dotenv
!pip install google-api-core


Collecting google-api-core
  Downloading google_api_core-2.24.2-py3-none-any.whl.metadata (3.0 kB)
Collecting proto-plus<2.0.0,>=1.22.3 (from google-api-core)
  Downloading proto_plus-1.26.1-py3-none-any.whl.metadata (2.2 kB)
Downloading google_api_core-2.24.2-py3-none-any.whl (160 kB)
Downloading proto_plus-1.26.1-py3-none-any.whl (50 kB)
Installing collected packages: proto-plus, google-api-core
Successfully installed google-api-core-2.24.2 proto-plus-1.26.1






Next, import all the required packages.

In [29]:
from google import genai
from google.genai import types

from IPython.display import Markdown

genai.__version__

'1.7.0'

In [None]:
from dotenv import load_dotenv
import os

# Load the .env file located in the same folder
load_dotenv()

# Check that the variable was loaded
api = os.getenv("GOOGLE_API_KEY")

if api:
    print(".env loaded.")
    print(f"Key detected (prefix): {api[:5]}******")
else:
    print("Key not found. Check the variable name in .env or its syntax.")


.env loaded.
Key detected (prefix): AIzaS******


In [15]:
client = genai.Client(api_key=api)

for m in client.models.list():
    if "embedContent" in m.supported_actions:
        print(m.name)

models/embedding-001
models/text-embedding-004
models/gemini-embedding-exp-03-07
models/gemini-embedding-exp


Next, load the data.

In [20]:
# ILM data (web scraping)

import re

def split_text_into_chunks(text, max_chunk_size=800):
    # Split at sentence endings (., ?, !) while keeping each chunk within a maximum size
    sentences = re.split(r'(?<=[.!?])\s+', text)
    chunks = []
    current_chunk = ""

    for sentence in sentences:
        if len(current_chunk) + len(sentence) <= max_chunk_size:
            current_chunk += sentence + " "
        else:
            # Avoid storing an empty chunk when the very first sentence exceeds the limit
            if current_chunk:
                chunks.append(current_chunk.strip())
            current_chunk = sentence + " "
    if current_chunk.strip():
        chunks.append(current_chunk.strip())

    return chunks

# Read the file
with open("output_impact_all.txt", "r", encoding="utf-8") as f:
    text = f.read()

chunks = split_text_into_chunks(text)

# Example: display the chunks at indices 5 to 8
for i, chunk in enumerate(chunks[5:9]):
    print(f"\n--- Chunk {i+1} ---\n{chunk}")



--- Chunk 1 ---
Impact Language oversees a school of excellence and cultural diversity

--- Contenu structuré extrait de : https://impactlanguagementors.com/fr/facilities/ ---

 Ventilated room
Some quick example text to build on the card title and make up the bulk of the card’s content. Air-conditioned room
Some quick example text to build on the card title and make up the bulk of the card’s content. Kitchen
Existence of a large kitchen. There is a gas bottle and a burner in the kitchen which is shared between neighbors. You will nevertheless need to obtain your own cooking utensils. Canteen
The full canteen is 120,000 CFA per month (breakfast, lunch and dinner)

 Class Rooms
Existence of a large kitchen. There is a gas bottle and a burner in the kitchen which is shared between neighbors.

--- Chunk 2 ---
You will nevertheless need to obtain your own cooking utensils. Friendly
The full canteen is 120,000 CFA per month (breakfast, lunch and dinner)
Impact Language oversees a school of
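
To see how the sentence-based splitter behaves at chunk boundaries, here is a small self-contained check. It repeats the splitting logic so that the snippet runs on its own; the sample text and the limit of 15 characters are made up for illustration and are far smaller than the 800 characters used above.

```python
import re

def split_text_into_chunks(text, max_chunk_size=800):
    # Split at sentence endings, then pack sentences greedily into chunks.
    sentences = re.split(r'(?<=[.!?])\s+', text)
    chunks = []
    current_chunk = ""
    for sentence in sentences:
        if len(current_chunk) + len(sentence) <= max_chunk_size:
            current_chunk += sentence + " "
        else:
            if current_chunk:
                chunks.append(current_chunk.strip())
            current_chunk = sentence + " "
    if current_chunk.strip():
        chunks.append(current_chunk.strip())
    return chunks

sample = "One. Two two. Three three three. Four!"
demo_chunks = split_text_into_chunks(sample, max_chunk_size=15)
for chunk in demo_chunks:
    print(len(chunk), repr(chunk))
# 13 'One. Two two.'
# 18 'Three three three.'
# 5 'Four!'
```

Note the second chunk: a single sentence longer than the limit is never cut, so it becomes a chunk that exceeds `max_chunk_size`. With real data this means the limit is a target rather than a hard guarantee.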

In [25]:
Document = []  # Initialize the list of passages

for chunk in chunks:
    Document.append(chunk)  # Add the text of each chunk as one passage

print(Document)  # Display the result



['--- Contenu structuré extrait de : https://impactlanguagementors.com/fr/ ---\n\n Welcome to ILM\nWe are accredited by GES (Ghana Education Service) and language departments consisting of English, French and Spanish. Within which we offer courses like General English, French, Spanish: Intensive (25 hours per week) and Non-Intensive (15 hours per week), Professional English and Exam Preparation: TOFEL, IETLS, SAT, GMAT\n\nOur Programs\nGeneral & Accelerated English Courses\nGeneral English courses are ideal for those\nwho need to improve their English skills\nbefore study, work or pleasure. Professional English Courses\nBusiness English courses will improve your presentation skills, interview techniques as well as your management and negotiation abilities.', 'Preparation For TOEFL, TOEIC,   IETLS Tests\nAfter completing your general English courses, the next step is exam preparation for IETLS, TOEIC, TOFEL. Online English Courses\nWe bring ILM to your home! Our General English course i

In [33]:
from chromadb import Documents, EmbeddingFunction, Embeddings
from google.api_core import retry

from google.genai import types


# Define a helper to retry when per-minute quota is reached.
is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})


class GeminiEmbeddingFunction(EmbeddingFunction):
    # Specify whether to generate embeddings for documents, or queries
    document_mode = True

    @retry.Retry(predicate=is_retriable)
    def __call__(self, input: Documents) -> Embeddings:
        if self.document_mode:
            embedding_task = "retrieval_document"
        else:
            embedding_task = "retrieval_query"

        response = client.models.embed_content(
            model="models/text-embedding-004",
            contents=input,
            config=types.EmbedContentConfig(
                task_type=embedding_task,
            ),
        )
        return [e.values for e in response.embeddings]
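
The `@retry.Retry(predicate=is_retriable)` decorator above re-runs the embedding call when the predicate says the error is temporary (here, HTTP 429 for quota and 503 for unavailability). The following standard-library sketch illustrates that idea only; it is not how `google.api_core.retry` is implemented, and `FakeAPIError`, `call_with_retry`, and `flaky_embed` are hypothetical names introduced for the example.

```python
import time

class FakeAPIError(Exception):
    # Hypothetical stand-in for an API error that carries an HTTP status code.
    def __init__(self, code):
        super().__init__(f"HTTP {code}")
        self.code = code

def is_retriable(exc):
    # Same rule as in the notebook: retry only on 429 and 503.
    return isinstance(exc, FakeAPIError) and exc.code in {429, 503}

def call_with_retry(func, predicate, max_attempts=5, initial_delay=0.01, multiplier=2.0):
    # Call func(); on a retriable error, wait and try again with a growing delay.
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:
            if attempt == max_attempts or not predicate(exc):
                raise
            time.sleep(delay)
            delay *= multiplier

attempts = []

def flaky_embed():
    # Fails twice with a quota error, then succeeds.
    attempts.append(1)
    if len(attempts) < 3:
        raise FakeAPIError(429)
    return [0.1, 0.2, 0.3]

result = call_with_retry(flaky_embed, is_retriable)
print(result, "after", len(attempts), "attempts")
```

Errors that the predicate rejects, such as a 400 for a malformed request, are raised immediately instead of being retried.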

In [35]:
import chromadb

DB_NAME = "ILM"

embed_fn = GeminiEmbeddingFunction()
embed_fn.document_mode = True

chroma_client = chromadb.Client()
db = chroma_client.get_or_create_collection(name=DB_NAME, embedding_function=embed_fn)

db.add(documents=Document, ids=[str(i) for i in range(len(Document))])

In [36]:
db.count()

9

Retrieval: Find relevant documents
To search the Chroma database, call the query method. Note that you also switch to the retrieval_query mode of embedding generation.

In [38]:
# Switch to query mode when generating embeddings.
embed_fn.document_mode = False

# Search the Chroma DB using the specified query.
query = "what is your address?"

result = db.query(query_texts=[query], n_results=1)
[all_passages] = result["documents"]

Markdown(all_passages[0])

Travel Plans - By Air or Road
If you are thinking of coming to Ghana by land or by air, do not hesitate to write or contact us now for more information. Impact Language oversees a school of excellence and cultural diversity

--- Contenu structuré extrait de : https://impactlanguagementors.com/fr/contact-us/ ---

 Our Address
Dawhenya, Methodist Junction, Tena, Ghana. +233559643898
+233268896068
+233260909144
Impact Language oversees a school of excellence and cultural diversity
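
A query always returns its nearest passages, even when none of them is really about the question. As far as we know, the dictionary returned by `db.query` also carries a `"distances"` entry by default, shaped like `"documents"` (one inner list per query text), which can be used to discard weak matches. The sketch below works on a hand-written dictionary of that shape; the distance values and the cutoff of 1.0 are invented for illustration, and a sensible cutoff depends on the embedding model and the collection's distance metric.

```python
def filter_by_distance(result, max_distance):
    # Keep (passage, distance) pairs of the first query that fall within the cutoff.
    passages = result["documents"][0]
    distances = result["distances"][0]
    return [(p, d) for p, d in zip(passages, distances) if d <= max_distance]

# Hand-written stand-in for a query result (values are made up).
fake_result = {
    "documents": [[
        "Our Address Dawhenya, Methodist Junction, Tena, Ghana.",
        "The full canteen is 120,000 CFA per month (breakfast, lunch and dinner)",
    ]],
    "distances": [[0.42, 1.37]],
}

kept = filter_by_distance(fake_result, max_distance=1.0)
print(kept)
```

If the filtered list comes back empty, the application can skip generation and fall back to a fixed reply.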

Augmented generation: Answer the question
Now that you have found a relevant passage from the set of documents (the retrieval step), you can assemble a generation prompt and have the Gemini API generate a final answer. Note that in this example only a single passage was retrieved. In practice, especially when the underlying data is large, you will want to retrieve more than one result and let the Gemini model determine which passages are relevant to the question. For this reason it's OK if some retrieved passages are not directly related to the question: the generation step should ignore them.

In [39]:
query_oneline = query.replace("\n", " ")

# This prompt is where you can specify any guidance on tone, or what topics the model should stick to, or avoid.
prompt = f"""You are a helpful and informative bot that answers questions using text from the reference passage included below. 
Be sure to respond in a complete sentence, being comprehensive, including all relevant background information. 
However, you are talking to a non-technical audience, so be sure to break down complicated concepts and 
strike a friendly and conversational tone. If the passage is irrelevant to the answer, you may ignore it.

QUESTION: {query_oneline}
"""

# Add the retrieved documents to the prompt.
for passage in all_passages:
    passage_oneline = passage.replace("\n", " ")
    prompt += f"PASSAGE: {passage_oneline}\n"

print(prompt)

You are a helpful and informative bot that answers questions using text from the reference passage included below. 
Be sure to respond in a complete sentence, being comprehensive, including all relevant background information. 
However, you are talking to a non-technical audience, so be sure to break down complicated concepts and 
strike a friendly and conversational tone. If the passage is irrelevant to the answer, you may ignore it.

QUESTION: what is your address?
PASSAGE: Travel Plans - By Air or Road If you are thinking of coming to Ghana by land or by air, do not hesitate to write or contact us now for more information. Impact Language oversees a school of excellence and cultural diversity  --- Contenu structuré extrait de : https://impactlanguagementors.com/fr/contact-us/ ---   Our Address Dawhenya, Methodist Junction, Tena, Ghana. +233559643898 +233268896068 +233260909144 Impact Language oversees a school of excellence and cultural diversity



In [40]:
answer = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=prompt)

Markdown(answer.text)

According to the provided passage, Impact Language is located in Dawhenya, at the Methodist Junction in Tena, Ghana.
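
When more than one passage is retrieved (for example by raising `n_results` in the query), the prompt assembly stays the same: each passage is flattened to one line and appended under the question. Here is a minimal sketch of that step as a reusable function. The name `build_prompt` and the short instruction text are assumptions made for the example, and the two passages are adapted from text scraped earlier in this notebook.

```python
def build_prompt(query, passages, instructions):
    # Flatten the question and every passage to single lines, then stack them.
    query_oneline = query.replace("\n", " ")
    lines = [instructions, "", f"QUESTION: {query_oneline}"]
    for passage in passages:
        lines.append("PASSAGE: " + passage.replace("\n", " "))
    return "\n".join(lines) + "\n"

demo_prompt = build_prompt(
    "what is your\naddress?",
    [
        "Our Address\nDawhenya, Methodist Junction, Tena, Ghana.",
        "Canteen\nThe full canteen is 120,000 CFA per month.",
    ],
    "Answer using the reference passages below.",
)
print(demo_prompt)
```

Keeping each passage on its own `PASSAGE:` line makes it easy for the model to tell the passages apart and to ignore those that do not help.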


In [None]:
def answer(query):
    # Switch to query mode when generating embeddings.
    embed_fn.document_mode = False

    # Retrieve the single closest passage for this query.
    result = db.query(query_texts=[query], n_results=1)
    [all_passages] = result["documents"]

    query_oneline = query.replace("\n", " ")

    # This prompt is where you can specify any guidance on tone, or what topics the model should stick to, or avoid.
    prompt = f"""You are a helpful and informative bot that answers questions using text from the reference passage included below. 
    Be sure to respond in a complete sentence, being comprehensive, including all relevant background information. 
    However, you are talking to a non-technical audience, so be sure to break down complicated concepts and 
    strike a friendly and conversational tone. If the passage is irrelevant to the answer, just say: 
    Contact ILM at +233559643898, +233268896068, or +233260909144 for more directions.

    QUESTION: {query_oneline}
    """

    # Add the retrieved documents to the prompt.
    for passage in all_passages:
        passage_oneline = passage.replace("\n", " ")
        prompt += f"PASSAGE: {passage_oneline}\n"

    # Use a distinct name so the response does not shadow the function itself.
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=prompt)

    return Markdown(response.text)

In [43]:
# Search the Chroma DB using the specified query.
query = "what is your address and phone numbers?"
answer(query)

The address of Impact Language is Dawhenya, Methodist Junction, Tena, Ghana, and their phone numbers are +233559643898, +233268896068, and +233260909144.


In [44]:
# Search the Chroma DB using the specified query.
query = "what differentiate you with others schools?"
answer(query)

Impact Language is unique because it's a school with cultural diversity that is officially recognized by the Ghana Education Service (GES), as well as offering language courses in English, French, and Spanish.


In [50]:
# Search the Chroma DB using the specified query.
query = "How to Get to ILM ?"
answer(query)

I am sorry, the provided passage does not contain specific directions on how to get to ILM. Contact ILM at +233559643898, +233268896068, or +233260909144 for more directions.
