# <span style="color:#2E86C1; font-size:2.5em; font-family:Georgia; font-weight:bold;">ROBERTA MODEL</span>

---
<div style="background-color: #FFB6C1 ; padding: 10px; border-radius: 5px;">
After testing several models, including <strong>GPT-2</strong>, <strong>LLaMA-2</strong>, and <strong>RoBERTa</strong>, we chose <strong>RoBERTa</strong>. It performed better at understanding and analyzing complex text. Unlike <strong>GPT-2</strong>, which is designed primarily for text generation, <strong>RoBERTa</strong> captures contextual nuance and excels at tasks such as sentiment analysis and text classification. Although <strong>LLaMA-2</strong> also performs well, <strong>RoBERTa</strong> proved more accurate and better suited to our dataset. Its efficiency and accuracy make it well suited to applications that require fine-grained natural language understanding.
</div>


## Importing the libraries

In [None]:
# Standard library
import time
import warnings

# Numerical and data handling
import numpy as np
import pandas as pd
import torch
from torch.optim import AdamW

# Hugging Face ecosystem, evaluation, and experiment tracking
import evaluate
import wandb
from datasets import Dataset, load_dataset
from peft import LoraConfig, get_peft_model
from rouge_score import rouge_scorer
from transformers import (
    AutoModelForCausalLM,
    AutoModelForQuestionAnswering,
    AutoTokenizer,
    RobertaForQuestionAnswering,
    Trainer,
    TrainingArguments,
    pipeline,
)

# Retrieval (LangChain, FAISS) and similarity
import faiss
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split

# Reinforcement learning
import gym
from gym import spaces
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv



In [None]:
!pip install bitsandbytes -U

Collecting bitsandbytes
  Downloading bitsandbytes-0.44.1-py3-none-manylinux_2_24_x86_64.whl.metadata (3.5 kB)
Downloading bitsandbytes-0.44.1-py3-none-manylinux_2_24_x86_64.whl (122.4 MB)
Installing collected packages: bitsandbytes
Successfully installed bitsandbytes-0.44.1


In [None]:
import os
from huggingface_hub import login

# Read the access token from the environment instead of hard-coding a secret in the notebook.
login(token=os.environ["HF_TOKEN"])

## 1 - Zero-shot prompt engineering

<div style="background-color: #87CEEB; padding: 10px; border-radius: 5px;">
    In zero-shot learning, the model is asked to perform a task it was never explicitly trained on. For example, asking it "What are the key metrics for measuring company success in the technology industry?" requires an answer without any task-specific training. The model must rely on the knowledge it acquired during pre-training, which depends on its ability to generalize and to use the available information for tasks it saw no direct examples of during training.
</div>

In [None]:
def load_model_and_tokenizer(model_path: str):
    model = AutoModelForQuestionAnswering.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    return model, tokenizer

def create_qa_pipeline(model, tokenizer):
    return pipeline("question-answering", model=model, tokenizer=tokenizer, device=0 if torch.cuda.is_available() else -1)

def generate_answer(qa_pipeline, question: str, context: str) -> str:
    result = qa_pipeline(question=question, context=context)
    return result[0]['answer'] if isinstance(result, list) else result['answer']

def evaluate_answer(reference_answer, generated_answer):
    if isinstance(reference_answer, list):
        reference_answer = ' '.join([str(item) for item in reference_answer])
    if isinstance(generated_answer, list):
        generated_answer = ' '.join([item['answer'] for item in generated_answer])

    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
    rouge_scores = scorer.score(reference_answer, generated_answer)
    return rouge_scores


def calculate_execution_time(func, *args, **kwargs):
    start_time = time.time()
    result = func(*args, **kwargs)
    end_time = time.time()
    execution_time = end_time - start_time
    return result, execution_time



def print_model_params(model):
    param_count = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"Model Parameter Count: {param_count}")

def test_model_with_different_configurations(model_path: str, context: str, question: str, reference_answer: str):
    model, tokenizer = load_model_and_tokenizer(model_path)
    qa_pipeline = create_qa_pipeline(model, tokenizer)

    # Time a single call to the QA pipeline.
    answer, execution_time = calculate_execution_time(generate_answer, qa_pipeline, question, context)
    print(f"Question: {question}")
    print(f"Generated Answer: {answer}")
    print(f"Execution Time: {execution_time:.4f} seconds")

    # Compare the generated answer with the reference answer.
    rouge_scores = evaluate_answer(reference_answer, answer)
    print(f"ROUGE Scores: {rouge_scores}")
    print("-" * 50)

    print_model_params(model)




In [None]:
if __name__ == "__main__":
    model_path = "deepset/roberta-base-squad2"
    question = "What is the capital of France?"
    context = "France is a country in Western Europe. The capital of France is Paris."
    reference_answer = "The capital of france is Paris"
    test_model_with_different_configurations(model_path, context, question, reference_answer)

Question: What is the capital of France?
Generated Answer: Paris
Execution Time: 0.0114 seconds
ROUGE Scores: {'rouge1': Score(precision=1.0, recall=0.16666666666666666, fmeasure=0.2857142857142857), 'rouge2': Score(precision=0.0, recall=0.0, fmeasure=0.0), 'rougeL': Score(precision=1.0, recall=0.16666666666666666, fmeasure=0.2857142857142857)}
--------------------------------------------------
Model Parameter Count: 124056578


<h3 style="color: #007acc;">Results for the question: "What is the capital of France?"</h3>

<ul>
    <li><strong style="color: #28a745;">Generated answer</strong>: The model correctly returned <strong>"Paris"</strong> as the answer.</li>
    <li><strong style="color: #28a745;">Execution time</strong>: The model took <strong>0.0114 seconds</strong> to produce the answer. This is very fast, as expected for a pretrained model of this size running on a GPU.</li>
</ul>

<hr style="border: 1px solid #ddd;" />

<h4 style="color: #007acc;">ROUGE scores:</h4>

<ul>
    <li><strong style="color: #ff5733;">ROUGE-1</strong>:
        <ul>
            <li><strong>Precision</strong>: <span style="color: #28a745;">1.0</span></li>
            <li><strong>Recall</strong>: <span style="color: #dc3545;">0.167</span></li>
            <li><strong>F-measure</strong>: <span style="color: #ffc107;">0.286</span></li>
        </ul>
        <p style="color: #333333;">
            Every word of the model's answer (<em>"Paris"</em>) appears in the reference answer, hence a precision of 1.0. Recall is low because the reference contains six words and the answer covers only one of them.
        </p>
    </li>

<li>
<strong style="color: #ff5733;">ROUGE-2</strong>:
        <ul>
            <li><strong>Precision</strong>: <span style="color: #dc3545;">0.0</span></li>
            <li><strong>Recall</strong>: <span style="color: #dc3545;">0.0</span></li>
            <li><strong>F-measure</strong>: <span style="color: #dc3545;">0.0</span></li>
        </ul>
        <p style="color: #333333;">
            ROUGE-2 is zero because a one-word answer contains no bigram (pair of consecutive words), so none can match the bigrams of the reference answer.
        </p>
    </li>

<li><strong style="color: #ff5733;">ROUGE-L</strong>:
        <ul>
            <li><strong>Precision</strong>: <span style="color: #28a745;">1.0</span></li>
            <li><strong>Recall</strong>: <span style="color: #dc3545;">0.167</span></li>
            <li><strong>F-measure</strong>: <span style="color: #ffc107;">0.286</span></li>
        </ul>
        <p style="color: #333333;">
            Similar to ROUGE-1, but based on the longest common subsequence. Here that subsequence is the single word "Paris", so the shared word sequence between the generated and reference answers is very short and the scores equal those of ROUGE-1.
        </p>
    </li>
</ul>
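
The ROUGE-1 figures above can be reproduced by hand. The sketch below is a simplified version, assuming whitespace tokenization and lowercasing only (the `rouge_score` package also applies stemming, which does not affect this example); the function name `rouge1` is illustrative.

```python
from collections import Counter

def rouge1(reference: str, candidate: str) -> dict:
    """Unigram-overlap precision, recall, and F-measure (no stemming)."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    # Clipped overlap: a token counts at most as often as it occurs in both texts.
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    precision = overlap / len(cand_tokens) if cand_tokens else 0.0
    recall = overlap / len(ref_tokens) if ref_tokens else 0.0
    total = precision + recall
    fmeasure = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "fmeasure": fmeasure}

# Reference and generated answer from the zero-shot run.
scores = rouge1("The capital of france is Paris", "Paris")
print(scores)  # precision 1.0, recall 1/6, F-measure 2/7
```

One matching word out of one generated word gives a precision of 1.0; one out of six reference words gives a recall of about 0.167, and their harmonic mean is about 0.286.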

<hr style="border: 1px solid #ddd;" />

<h4 style="color: #007acc;">Model parameters:</h4>

<ul>
    <li><strong style="color: #28a745;">Number of parameters</strong>: 124,056,578</li>
    <p style="color: #333333;">
        This parameter count, about 124 million, is typical of a base-size pretrained transformer such as RoBERTa-base.
    </p>
</ul>


## <strong style="color: #007acc;">Conclusion:</strong>

<p style="color: #333333;">
    The function works as expected, and the results give a first view of the model's performance on question answering. The ROUGE scores show that even when the answer is correct, its overlap with a multi-word reference answer can be low.
</p>


In [None]:
if __name__ == "__main__":
    model_path = "meta-llama/Llama-2-7b-hf"
    question = "What is the capital of France?"
    context = "France is a country in Western Europe. The capital of France is Paris."
    reference_answer = "The capital of france is Paris"
    test_model_with_different_configurations(model_path, context, question, reference_answer)

## 2 - One-shot prompt engineering

<div style="background-color: #87CEEB; padding: 10px; border-radius: 5px;">One-shot prompt engineering is an approach in which an AI model, such as a language model, is given a single example of the task or question before it is asked to produce an answer. Unlike few-shot prompting (a handful of examples) or zero-shot prompting (no example), one-shot prompting assumes that a single example is enough to guide the model toward the requested task.</div>

In [None]:
def generate_answer_one_shot(qa_pipeline, question: str, context: str) -> str:
    """
    Generates an answer for the one-shot experiment.
    The prompt below shows a single worked example (one question-answer pair).
    """
    # NOTE: this prompt is built for illustration only. The extractive
    # question-answering pipeline accepts just a question and a context,
    # so the example is never passed to the model.
    one_shot_prompt = f"""
    Context: {context}

    Example 1:
    Question: What is the capital of France?
    Answer: Paris

    Question: {question}
    Answer:
    """

    result = qa_pipeline(question=question, context=context)

    return result['answer']

def evaluate_answer_one_shot(reference_answer: str, generated_answer: str):
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
    rouge_scores = scorer.score(reference_answer, generated_answer)
    return rouge_scores

def test_model_with_different_configurations(model_path: str, context: str, question: str, reference_answer: str):
    model, tokenizer = load_model_and_tokenizer(model_path)
    qa_pipeline = create_qa_pipeline(model, tokenizer)
    generated_answer, execution_time = calculate_execution_time(generate_answer_one_shot, qa_pipeline, question, context)
    rouge_scores = evaluate_answer_one_shot(reference_answer, generated_answer)

    print(f"Question: {question}")
    print(f"Generated Answer: {generated_answer}")
    print(f"Execution Time: {execution_time:.4f} seconds")
    print(f"ROUGE Scores: {rouge_scores}")
    print("-" * 50)

    print_model_params(model)

def print_model_params(model):
    param_count = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"Model Parameter Count: {param_count}")



In [None]:
if __name__ == "__main__":
    model_path = "deepset/roberta-base-squad2"

    context = """
    Paris is the capital of France. It is one of the most important cultural and economic centers in Europe.
    The city is known for its art, fashion, and landmarks like the Eiffel Tower and the Louvre Museum.
    """

    question = "Where is the Eiffel Tower located?"
    reference_answer = "Paris, France"

    test_model_with_different_configurations(model_path, context, question, reference_answer)


Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


Question: Where is the Eiffel Tower located?
Generated Answer: Paris
Execution Time: 0.2810 seconds
ROUGE Scores: {'rouge1': Score(precision=1.0, recall=0.5, fmeasure=0.6666666666666666), 'rouge2': Score(precision=0.0, recall=0.0, fmeasure=0.0), 'rougeL': Score(precision=1.0, recall=0.5, fmeasure=0.6666666666666666)}
--------------------------------------------------
Model Parameter Count: 124056578


# Model analysis

The model correctly identified **Paris** as the location of the Eiffel Tower, but its answer omits the country given in the reference answer, **Paris, France**.

### ROUGE results:
- **Precision**: Every word the model produced is relevant (precision 1.0), but the answer leaves out an important detail, the country **France**.  
  <span style="color: green;">**Good precision**</span>, but part of the context is missing.
- **Recall**: The answer covers only one of the two reference words (recall 0.5).  
  <span style="color: red;">**Moderate recall**</span>, so the model may leave out essential information.
- **ROUGE-2 (bigrams)**: A one-word answer contains no bigram, so no bigram overlap with the reference is possible.  
  <span style="color: orange;">**Simplified structure**</span>, with no detail on how the words relate to each other.
- **Fast execution**: The model answered in 0.28 seconds despite the missing detail.  
  <span style="color: blue;">**Fast execution**</span>, but the answer lacks depth.

### Conclusion:
Although the model shows **good precision**, it can be improved in **recall** and in answer structure so that it reflects the full context.  
  <span style="color: purple;">**Room for improvement in recall and structure**</span>.


## 3 - Few-shot prompt engineering

<div style="background-color: #87CEEB; padding: 10px; border-radius: 5px;">Few-shot prompt engineering is a technique used in natural language processing (NLP) to steer large language models (LLMs) such as GPT toward specific tasks with a minimum of examples. Here, "few-shot" means placing a few input-output pairs in the prompt to demonstrate the desired behavior before the model answers a new query.</div>
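
A minimal sketch of how such a prompt can be assembled from a list of example pairs, assuming the resulting string is sent to a generative model. The helper name `build_few_shot_prompt` and the two example pairs are illustrative and not part of the notebook.

```python
def build_few_shot_prompt(examples, question, context):
    """Assemble a prompt: the context, numbered worked examples, then the new question."""
    lines = [f"Context: {context.strip()}", ""]
    for i, (example_question, example_answer) in enumerate(examples, start=1):
        lines += [
            f"Example {i}:",
            f"Question: {example_question}",
            f"Answer: {example_answer}",
            "",
        ]
    # The prompt ends with an open "Answer:" for the model to complete.
    lines += [f"Question: {question}", "Answer:"]
    return "\n".join(lines)

examples = [
    ("What is the capital of France?", "Paris"),
    ("What museum is in Paris?", "Louvre Museum"),
]
prompt = build_few_shot_prompt(
    examples,
    question="Who designed the Eiffel Tower?",
    context="The Eiffel Tower was designed by Gustave Eiffel and completed in 1889.",
)
print(prompt)
```

Passing a single pair in `examples` yields a one-shot prompt, and an empty list yields a zero-shot prompt, so the same helper covers all three settings.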

In [None]:
def generate_answer_few_shots(qa_pipeline, question: str, context: str) -> str:
    """
    Generates an answer for the few-shot experiment.
    The prompt below lists multiple examples (question-answer pairs).
    """
    few_shot_prompt = f"""
    Context: {context}

    Example 1:
    Question: What is the capital of France?
    Answer: Paris

    Example 2:
    Question: Where is the Eiffel Tower located?
    Answer: Paris

    Example 3:
    Question: What is Paris known for?
    Answer: Art, fashion, and landmarks like the Eiffel Tower and the Louvre Museum.

    Example 4:
    Question: What museum is in Paris?
    Answer: Louvre Museum

    Example 5:
    Question: What is the population of Paris?
    Answer: Paris has a population of around 2.1 million people within the city limits. The metropolitan area has a population of over 12 million.

    Example 6:
    Question: How old is the Eiffel Tower?
    Answer: The Eiffel Tower was completed in 1889, making it over 130 years old.

    Example 7:
    Question: What is the famous landmark in Paris known for its glass pyramid?
    Answer: The Louvre Museum, which has a famous glass pyramid entrance.

    Example 8:
    Question: What type of cuisine is Paris known for?
    Answer: Paris is famous for French cuisine, which includes dishes like croissants, escargot, and coq au vin.

    Example 9:
    Question: Who designed the Eiffel Tower?
    Answer: The Eiffel Tower was designed by Gustave Eiffel, a French civil engineer.

    Example 10:
    Question: Is Paris a coastal city?
    Answer: No, Paris is located inland along the River Seine and is not a coastal city.

    Example 11:
    Question: What is the population of Paris?
    Answer: Paris has a population of about 2.1 million people, and the metropolitan area has a population of over 12 million people.

    Example 12:
    Question: What is the Eiffel Tower known for?
    Answer: The Eiffel Tower is famous for being a world-renowned symbol of France, offering stunning views of Paris and being a prime tourist attraction.

    Example 13:
    Question: What is the Louvre Museum known for?
    Answer: The Louvre Museum is famous for housing thousands of works of art, including the Mona Lisa, one of the most famous paintings in the world.

    Question: {question}
    Answer:
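    """

    # NOTE: few_shot_prompt is built for illustration only. The extractive
    # question-answering pipeline accepts just a question and a context,
    # so the examples are never passed to the model.
    result = qa_pipeline(question=question, context=context)

    return result[0]['answer'] if isinstance(result, list) else result['answer']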

def test_few_shot(model_path: str, context: str, questions: list, reference_answers: list):
    model, tokenizer = load_model_and_tokenizer(model_path)
    qa_pipeline = create_qa_pipeline(model, tokenizer)

    for question, reference_answer in zip(questions, reference_answers):
        generated_answer, execution_time = calculate_execution_time(generate_answer_few_shots, qa_pipeline, question, context)

        rouge_scores = evaluate_answer_one_shot(reference_answer, generated_answer)

        print(f"Question: {question}")
        print(f"Generated Answer: {generated_answer}")
        print(f"Execution Time: {execution_time:.4f} seconds")
        print(f"ROUGE Scores: {rouge_scores}")
        print("-" * 50)

    print_model_params(model)




In [None]:
if __name__ == "__main__":
    model_path = "deepset/roberta-base-squad2"
    context = """
    Paris is the capital of France, located on the River Seine. It is famous for its landmarks like the Eiffel Tower, the Louvre Museum, and the Notre-Dame Cathedral. Paris has a population of over 2 million people in the city and over 12 million in the metropolitan area.
    The Eiffel Tower was designed by Gustave Eiffel and completed in 1889. It is one of the most iconic landmarks in the world.
    Paris is known for its culinary culture, offering French dishes such as croissants, escargot, and coq au vin.
    The Louvre Museum, which houses thousands of works of art, including the Mona Lisa, is located in Paris.
    """

    questions = [
        "What is the capital of France?",
        "Where is the Eiffel Tower located?",
        "What is Paris known for?",
        "What museum is in Paris?",
        "What is the population of Paris?",
        "How old is the Eiffel Tower?",
        "What is the famous landmark in Paris known for its glass pyramid?",
        "What type of cuisine is Paris known for?",
        "Who designed the Eiffel Tower?",
        "Is Paris a coastal city?"
    ]

    reference_answers = [
        "Paris",
        "Paris",
        "Art, fashion, and landmarks like the Eiffel Tower and the Louvre Museum.",
        "Louvre Museum",
        "Paris has a population of around 2.1 million people within the city limits. The metropolitan area has a population of over 12 million.",
        "The Eiffel Tower was completed in 1889, making it over 130 years old.",
        "The Louvre Museum, which has a famous glass pyramid entrance.",
        "Paris is famous for French cuisine, which includes dishes like croissants, escargot, and coq au vin.",
        "The Eiffel Tower was designed by Gustave Eiffel, a French civil engineer.",
        "No, Paris is located inland along the River Seine and is not a coastal city."
    ]

    test_few_shot(model_path, context, questions, reference_answers)


Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


Question: What is the capital of France?
Generated Answer: Paris
Execution Time: 0.5304 seconds
ROUGE Scores: {'rouge1': Score(precision=1.0, recall=1.0, fmeasure=1.0), 'rouge2': Score(precision=0.0, recall=0.0, fmeasure=0.0), 'rougeL': Score(precision=1.0, recall=1.0, fmeasure=1.0)}
--------------------------------------------------
Question: Where is the Eiffel Tower located?
Generated Answer: Paris
Execution Time: 0.5229 seconds
ROUGE Scores: {'rouge1': Score(precision=1.0, recall=1.0, fmeasure=1.0), 'rouge2': Score(precision=0.0, recall=0.0, fmeasure=0.0), 'rougeL': Score(precision=1.0, recall=1.0, fmeasure=1.0)}
--------------------------------------------------
Question: What is Paris known for?
Generated Answer: culinary culture
Execution Time: 0.4984 seconds
ROUGE Scores: {'rouge1': Score(precision=0.0, recall=0.0, fmeasure=0.0), 'rouge2': Score(precision=0.0, recall=0.0, fmeasure=0.0), 'rougeL': Score(precision=0.0, recall=0.0, fmeasure=0.0)}
----------------------------------

<h3 style="color: #007acc;">Interpretation</h3>

<p style="color: #333333;">
    The model performs well on <span style="color: #28a745; font-weight: bold;">factual questions with direct answers</span> (for example, about capitals, museums, or designers), where it reaches <span style="color: #28a745;">high ROUGE scores</span>.
</p>

<p style="color: #333333;">
    For more complex or context-dependent questions, such as the one about a <span style="color: #ff5733; font-style: italic;">“famous landmark with a glass pyramid”</span> or those asking what something is <span style="color: #ff5733; font-style: italic;">“known for”</span>, the model does not return sufficiently relevant answers, which leads to <span style="color: #dc3545;">lower ROUGE scores</span>.
</p>

<p style="color: #333333;">
    Improving the model's ability to understand <span style="color: #007acc; font-weight: bold;">nuanced</span> or more detailed questions would therefore raise performance across all types of queries.
</p>

<p style="color: #333333;">
    In summary, the model performs well on some types of queries, but there is room for improvement on <span style="color: #007acc;">complex or context-dependent questions</span>, in particular to raise recall and precision in the ROUGE-2 and ROUGE-L scores.
</p>


## PEFT Parameter-Efficient Fine-Tuning

<div style="background-color: #87CEEB; padding: 10px; border-radius: 5px;">
Parameter-Efficient Fine-Tuning (PEFT) is a machine learning technique that lets a model adapt to new tasks with minimal updates to its parameters. Instead of fine-tuning every parameter of the model, which is costly in compute and memory, PEFT selectively adjusts a small subset of parameters. This makes learning efficient while preserving the knowledge the model already has.

This approach is especially useful when a model must be adapted quickly to new tasks without retraining it entirely, which greatly reduces the resources required.
</div>
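
To make "a small subset of parameters" concrete, the sketch below works out how many weights LoRA adds for rank `r=8`, the value used in this notebook. The architecture figures are assumptions not stated in the notebook: RoBERTa-base with 12 encoder layers and hidden size 768, and LoRA applied to the query and value projections of each layer.

```python
# Assumed architecture (not stated in the notebook): RoBERTa-base.
num_layers = 12
hidden_size = 768
adapted_matrices_per_layer = 2  # assumed: query and value projections
r = 8                           # LoRA rank used in this notebook

# Each adapted (hidden x hidden) weight gets two low-rank factors:
# A with shape (r, hidden) and B with shape (hidden, r).
lora_params_per_matrix = r * hidden_size + hidden_size * r
trainable = num_layers * adapted_matrices_per_layer * lora_params_per_matrix

base_params = 124_056_578  # parameter count printed for the base QA model
total = base_params + trainable

print(f"Trainable LoRA parameters: {trainable}")
print(f"Total parameters: {total}")
print(f"Trainable share: {100 * trainable / total:.2f}%")
```

Under these assumptions the arithmetic gives 294,912 trainable parameters out of 124,351,490, which agrees with the counts printed after training: roughly a quarter of one percent of the model is updated.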

In [None]:
qa_model_id = 'deepset/roberta-base-squad2'
corpus_file_path = './finance_qa_dataset.csv'

wandb.init(project="huggingface", entity="amira-khalfi-esprit")

tokenizer = AutoTokenizer.from_pretrained(qa_model_id)
model = RobertaForQuestionAnswering.from_pretrained(qa_model_id)

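def apply_lora(model):
    # NOTE: PEFT's TaskType enum names this task "QUESTION_ANS". The string below is
    # kept from the original run; the later "PeftModel is not supported" warning
    # suggests it was not matched to a QA-specific wrapper.
    lora_config = LoraConfig(
        r=8,            # rank of the low-rank update matrices
        lora_alpha=32,  # scaling factor applied to the update
        lora_dropout=0.1,
        bias="none",
        task_type="QUESTION_ANSWERING"
    )
    return get_peft_model(model, lora_config)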

model = apply_lora(model)

def load_dataset_from_csv(file_path):
    data = pd.read_csv(file_path)
    print("Columns in dataset:", data.columns)

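    # context_data is not defined in this cell; it is assumed to be a
    # question -> context mapping created earlier in the notebook.
    data['context'] = data['question'].map(context_data)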

    dataset = Dataset.from_pandas(data)
    return dataset

def tokenize_data_with_positions(dataset):
    def tokenize_function(examples):
        answers = examples['answer']
        answer_starts = examples['answer_start']

        # Return plain Python lists; Dataset.map stores them as columns.
        encoding = tokenizer(
            examples['question'],
            examples['context'],
            truncation=True,
            padding='max_length',
            max_length=512
        )

        start_positions = []
        end_positions = []

        for i, answer in enumerate(answers):
            start_char = answer_starts[i]
            end_char = start_char + len(answer)

            if start_char < 0 or end_char > len(examples['context'][i]):
                start_position = end_position = 0
            else:
                # Map character offsets in the context (sequence 1) of example i to token indices.
                start_position = encoding.char_to_token(i, start_char, sequence_index=1)
                end_position = encoding.char_to_token(i, end_char - 1, sequence_index=1)

                # None means the answer span was truncated away; fall back to position 0.
                if start_position is None or end_position is None:
                    start_position = end_position = 0

            start_positions.append(start_position)
            end_positions.append(end_position)

        encoding['start_positions'] = start_positions
        encoding['end_positions'] = end_positions

        return encoding

    return dataset.map(tokenize_function, batched=True)


def split_dataset(dataset):
    data_pandas = dataset.to_pandas()
    train_data, eval_data = train_test_split(data_pandas, test_size=0.1, random_state=42)
    train_dataset = Dataset.from_pandas(train_data)
    eval_dataset = Dataset.from_pandas(eval_data)
    return train_dataset, eval_dataset

start_time = time.time()
dataset = load_dataset_from_csv(corpus_file_path)
dataset = tokenize_data_with_positions(dataset)
train_dataset, eval_dataset = split_dataset(dataset)
print(f"Dataset preprocessing time: {time.time() - start_time:.2f} seconds")

training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10,
    evaluation_strategy="steps",
    save_steps=500,
    load_best_model_at_end=True,
    report_to="wandb",
    run_name="finance-qa-finetuning"
)

optimizer = AdamW(model.parameters(), lr=5e-5)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    optimizers=(optimizer, None),
)

start_train_time = time.time()
trainer.train()
print(f"Training time: {time.time() - start_train_time:.2f} seconds")

model.save_pretrained('./finance_finetuned_model')
tokenizer.save_pretrained('./finance_finetuned_model')

rouge_metric = evaluate.load("rouge")

def display_model_info(model):
    """
    Display the number of parameters in the model.
    """
    total_params = sum(p.numel() for p in model.parameters())
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"Total parameters: {total_params}")
    print(f"Trainable parameters: {trainable_params}")

def evaluate_model(eval_dataset, model, tokenizer):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)

    predictions = []
    references = []

    for idx, example in enumerate(eval_dataset):
        inputs = tokenizer(example['question'], example['context'], return_tensors='pt', padding=True, truncation=True)

        inputs = {key: value.to(device) for key, value in inputs.items()}

        with torch.no_grad():
            outputs = model(**inputs)

        start_idx = torch.argmax(outputs.start_logits)
        end_idx = torch.argmax(outputs.end_logits)

        answer = tokenizer.decode(inputs['input_ids'][0][start_idx:end_idx + 1], skip_special_tokens=True)

        # The ROUGE metric expects plain strings, not SQuAD-style dictionaries.
        predictions.append(answer)
        references.append(example['answer'])

    rouge_results = rouge_metric.compute(predictions=predictions, references=references)

    return  rouge_results


print(f"Total execution time: {time.time() - start_time:.2f} seconds")
display_model_info(model)

0,1
total_flos,14158947225600.0
train/epoch,3.0
train/global_step,9.0
train_loss,2.47605
train_runtime,4.6
train_samples_per_second,11.739
train_steps_per_second,1.957


Columns in dataset: Index(['question', 'answer', 'context', 'answer_start'], dtype='object')


Map:   0%|          | 0/1000 [00:00<?, ? examples/s]

Dataset preprocessing time: 0.48 seconds


Step,Training Loss,Validation Loss
10,2.7123,No log
20,2.7136,No log
30,2.4599,No log
40,2.3698,No log
50,2.405,No log
60,2.3014,No log
70,1.641,No log
80,1.5067,No log
90,1.0977,No log
100,0.7347,No log


Training time: 296.36 seconds
Total execution time: 299.24 seconds
Total parameters: 124351490
Trainable parameters: 294912


In [None]:
def main():
    model, tokenizer = prepare_model_and_tokenizer(qa_model_id)
    if model is None or tokenizer is None:
        return
    model = apply_lora(model)
    save_model_and_tokenizer(model, tokenizer)
    qa_pipeline = pipeline("question-answering", model=model, tokenizer=tokenizer)
    evaluate_model(qa_pipeline, eval_data)
    torch.cuda.empty_cache()

if __name__ == "__main__":
    main()


The model 'PeftModel' is not supported for question-answering. Supported models are ['AlbertForQuestionAnswering', 'BartForQuestionAnswering', 'BertForQuestionAnswering', 'BigBirdForQuestionAnswering', 'BigBirdPegasusForQuestionAnswering', 'BloomForQuestionAnswering', 'CamembertForQuestionAnswering', 'CanineForQuestionAnswering', 'ConvBertForQuestionAnswering', 'Data2VecTextForQuestionAnswering', 'DebertaForQuestionAnswering', 'DebertaV2ForQuestionAnswering', 'DistilBertForQuestionAnswering', 'ElectraForQuestionAnswering', 'ErnieForQuestionAnswering', 'ErnieMForQuestionAnswering', 'FalconForQuestionAnswering', 'FlaubertForQuestionAnsweringSimple', 'FNetForQuestionAnswering', 'FunnelForQuestionAnswering', 'GPT2ForQuestionAnswering', 'GPTNeoForQuestionAnswering', 'GPTNeoXForQuestionAnswering', 'GPTJForQuestionAnswering', 'IBertForQuestionAnswering', 'LayoutLMv2ForQuestionAnswering', 'LayoutLMv3ForQuestionAnswering', 'LEDForQuestionAnswering', 'LiltForQuestionAnswering', 'LlamaForQuestion

ROUGE score: {'rouge1': 0.888888888888889, 'rouge2': 0.8750000000000001, 'rougeL': 0.888888888888889, 'rougeLsum': 0.888888888888889}


# ROUGE score evaluation

ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a widely used metric for evaluating text generation, summarization, and machine translation. The breakdown of the ROUGE scores for the generated text against the reference is as follows:

## <span style="color: #1E90FF">ROUGE-1</span>
- **Score**: <span style="color: #32CD32">0.89</span>
- **Description**: Measures the overlap of **unigrams** (single words) between the generated text and the reference. A score of 0.89 indicates **very strong overlap** at the unigram level.

## <span style="color: #1E90FF">ROUGE-2</span>
- **Score**: <span style="color: #32CD32">0.88</span>
- **Description**: Measures the overlap of **bigrams** (pairs of consecutive words). This score indicates **good similarity** at the bigram level.

## <span style="color: #1E90FF">ROUGE-L</span>
- **Score**: <span style="color: #32CD32">0.89</span>
- **Description**: Based on the **longest common subsequence (LCS)**, which takes word order into account. This score indicates a **strong sequential match** between the generated and reference texts.

## <span style="color: #1E90FF">ROUGE-Lsum</span>
- **Score** : <span style="color: #32CD32">0.89</span>
- **Description** : Ce score est utilisé pour les **tâches de résumé**, reflétant l'alignement au niveau du résumé. Un score de 0.89 indique un **bon alignement** du résumé généré avec le résumé de référence.

---

### <span style="color: #FF4500">Summary</span>
- The ROUGE scores indicate a **very high level of similarity** between the generated content and the reference, with an excellent match both in **content overlap** (unigrams and bigrams) and in **sequential structure** (LCS).
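
To make the unigram and bigram overlap concrete, here is a minimal sketch of ROUGE-N in plain Python. It is a simplification: the tokenizer below (lowercased alphanumeric runs, no stemming) and the helper names `tokenize`, `ngrams`, and `rouge_n` are assumptions made for illustration, and the toy sentences are not taken from the notebook's data. The `rouge_score` and `evaluate` packages remain the reference implementations.

```python
import re
from collections import Counter


def tokenize(text):
    # Simplified tokenizer (assumption): lowercase alphanumeric runs, no stemming.
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    # Multiset of contiguous n-grams.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, generated, n=1):
    ref = ngrams(tokenize(reference), n)
    gen = ngrams(tokenize(generated), n)
    # Clipped matches: each n-gram counts at most as often as it appears in both texts.
    overlap = sum((ref & gen).values())
    precision = overlap / max(sum(gen.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f_measure = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f_measure


# Hypothetical toy pair, made up for illustration.
reference = "the cat sat on the mat"
generated = "the cat lay on the mat"
print("ROUGE-1:", rouge_n(reference, generated, n=1))
print("ROUGE-2:", rouge_n(reference, generated, n=2))
```

In this toy pair, 5 of 6 unigrams and 3 of 5 bigrams match, which illustrates why ROUGE-2 is usually the stricter of the two scores.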


# RAG Application with LangChain

## Question-Answering System with LangChain, Hugging Face, and FAISS

In this work, we built a **question-answering** system using **LangChain**, **Hugging Face**, and **FAISS** for document retrieval and answer generation.

## Key steps:

### <span style="color: green;">Loading the QA model:</span>
We load a pre-trained **RoBERTa** question-answering model from **Hugging Face**, along with its tokenizer, to process questions and contexts.

### <span style="color: blue;">Creating the QA pipeline:</span>
A pipeline wraps the model and the tokenizer and returns an answer for a given question and context.

### <span style="color: orange;">Indexing with FAISS:</span>
Documents are indexed as embeddings produced by a **Sentence-Transformers** model, and **FAISS** is used to search for the relevant documents.

### <span style="color: purple;">Retrieving the documents:</span>
When a question is asked, **FAISS** looks up the documents whose embeddings are closest to the embedding of the question.

### <span style="color: red;">Generating the answer:</span>
The relevant documents are concatenated into a single context, from which the **RoBERTa** model extracts the answer.

### <span style="color: brown;">Evaluating with ROUGE:</span>
The quality of the answers is measured with the **ROUGE** metric by comparing each generated answer with its reference answer.

---

This system combines **document retrieval** with **answer generation**, so that answers to the questions asked are precise and relevant. **RAG** (Retrieval-Augmented Generation) improves the answers by grounding them in information drawn from the retrieved documents.
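
As an illustration of the retrieval step, the sketch below reproduces in plain Python what an exact ("flat") L2 index does conceptually: it compares the query with every stored vector and keeps the closest ones. The function names and the 2-D vectors are hypothetical; real sentence embeddings have hundreds of dimensions and FAISS performs the same search far more efficiently.

```python
def squared_l2(a, b):
    # Squared Euclidean distance between two vectors of equal length.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(vectors, query, k):
    # Exhaustive search: score every stored vector, then keep the k closest.
    scored = sorted((squared_l2(v, query), i) for i, v in enumerate(vectors))
    return scored[:k]


# Hypothetical 2-D "embeddings", made up for illustration.
documents = ["stocks", "bonds", "bitcoin"]
vectors = [[0.0, 1.0], [0.2, 0.9], [1.0, 0.0]]
query = [0.1, 1.0]

for distance, i in flat_l2_search(vectors, query, k=2):
    print(documents[i], round(distance, 3))
```

An exhaustive search is exact but its cost grows linearly with the number of documents, which is acceptable for a corpus of a few dozen sentences.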


In [None]:
pip install faiss-gpu


Collecting faiss-gpu
  Downloading faiss_gpu-1.7.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.4 kB)
Downloading faiss_gpu-1.7.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (85.5 MB)
Installing collected packages: faiss-gpu
Successfully installed faiss-gpu-1.7.2


In [None]:
pip install langchain-community

Collecting langchain-community
  Downloading langchain_community-0.3.7-py3-none-any.whl.metadata (2.9 kB)
Collecting SQLAlchemy<2.0.36,>=1.4 (from langchain-community)
  Downloading SQLAlchemy-2.0.35-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.6 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain-community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting httpx-sse<0.5.0,>=0.4.0 (from langchain-community)
  Downloading httpx_sse-0.4.0-py3-none-any.whl.metadata (9.0 kB)
Collecting langchain-core<0.4.0,>=0.3.17 (from langchain-community)
  Downloading langchain_core-0.3.18-py3-none-any.whl.metadata (6.3 kB)
Collecting pydantic-settings<3.0.0,>=2.4.0 (from langchain-community)
  Downloading pydantic_settings-2.6.1-py3-none-any.whl.metadata (3.5 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.5.7->langchain-community)
  Downloading marshmallow-3.23.1-py3-none-any.whl.metadata (7.5 kB)
Collecting ty


# <span style="color: #1E90FF"><strong>RAG (Retrieval-Augmented Generation)</strong></span>

RAG is an approach that combines document retrieval with answer generation. In our implementation, **FAISS** handles document retrieval and a pre-trained RoBERTa question-answering model produces the answer. This RoBERTa model is extractive: it selects an answer span from the retrieved context rather than writing free text. Here is where RAG fits into the steps defined above:

## <span style="color: #32CD32"><strong>Document retrieval:</strong></span>

When a question is asked, an embedding of the question is computed, then **FAISS** searches the index for the most relevant documents (this is the retrieval phase).

## <span style="color: #FF6347"><strong>Answer generation (retrieval augmentation):</strong></span>

Once the relevant documents are retrieved, they are combined into a context that is passed to the answer model (here RoBERTa) to produce the answer.  
The model relies on the retrieved documents to build its answer, which improves the quality and relevance of the answers. This is what makes the approach **Retrieval-Augmented Generation**.

RAG therefore sits in the pipeline between document retrieval and answer production, augmenting the answer with relevant information from external documents.
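
To clarify what "extractive" means here, the following sketch shows the basic span selection behind a SQuAD-style QA model: the model scores every token as a possible start and as a possible end of the answer, and the best-scoring valid span is returned. This is a simplified illustration, not the Hugging Face pipeline's actual code; the function `best_span`, its `max_answer_len` parameter, and all tokens and logits below are made up.

```python
def best_span(start_logits, end_logits, max_answer_len=15):
    # Score every span with start <= end and bounded length by start + end logit.
    best = (float("-inf"), 0, 0)
    for s, s_logit in enumerate(start_logits):
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_logit + end_logits[e]
            if score > best[0]:
                best = (score, s, e)
    return best[1], best[2]


# Hypothetical tokens and logits, made up for illustration.
tokens = ["the", "stock", "market", "is", "a", "place", "to", "trade", "shares"]
start_logits = [0.1, 0.2, 0.0, 0.1, 2.5, 1.0, 0.0, 0.3, 0.1]
end_logits = [0.0, 0.1, 0.3, 0.0, 0.2, 1.1, 0.1, 0.4, 2.8]

start, end = best_span(start_logits, end_logits)
print(" ".join(tokens[start:end + 1]))
```

Because the answer is always a span copied from the context, its quality depends directly on whether retrieval placed the right document in that context.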


In [None]:
def load_qa_model(model_name='deepset/roberta-base-squad2'):
    # Load the extractive QA model and its matching tokenizer.
    model = AutoModelForQuestionAnswering.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    return model, tokenizer

def create_qa_pipeline(model, tokenizer):
    # Wrap model and tokenizer in a Hugging Face question-answering pipeline.
    return pipeline('question-answering', model=model, tokenizer=tokenizer)

def generate_answer(qa_pipeline, question, context):
    # The pipeline returns a dict; only the answer text is kept here.
    result = qa_pipeline(question=question, context=context)
    return result['answer']

def load_faiss_index(embedding_model, corpus=None, dim=384):
    # Build an exact L2 index over the corpus embeddings.
    # `dim` is only used for an empty index; 384 matches all-MiniLM-L6-v2.
    if corpus is not None:
        embeddings = embedding_model.embed_documents(corpus)

        embeddings_array = np.array(embeddings, dtype=np.float32)

        index = faiss.IndexFlatL2(embeddings_array.shape[1])
        index.add(embeddings_array)

        # Keep the texts next to the index (copied so the caller's list is not mutated).
        index.documents = list(corpus)
    else:
        index = faiss.IndexFlatL2(dim)
        index.documents = []
    return index

def update_faiss_index(index, new_documents, embedding_model):
    # Embed the new documents and append them to the index and to the text list.
    new_embeddings = embedding_model.embed_documents(new_documents)

    new_embeddings_array = np.array(new_embeddings, dtype=np.float32)

    index.add(new_embeddings_array)

    index.documents.extend(new_documents)

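def retrieve_documents(query, index, embedding_model, k=5):
    # Embed the query and return the texts of its k nearest neighbours.
    query_embedding = embedding_model.embed_query(query)
    query_array = np.array([query_embedding], dtype=np.float32)

    distances, indices = index.search(query_array, k)

    # FAISS pads the result with -1 when fewer than k vectors are indexed.
    return [index.documents[i] for i in indices[0] if i >= 0]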

def test_rag_system(query, reference_answer):
    model, tokenizer = load_qa_model()
    qa_pipeline = create_qa_pipeline(model, tokenizer)
    embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

    corpus = [
    "The stock market is a place where buyers and sellers come together to trade shares of public companies.",
    "The Federal Reserve controls monetary policy and can influence interest rates and inflation.",
    "A mutual fund is a pool of funds collected from many investors for the purpose of investing in securities.",
    "Cryptocurrency is a form of digital or virtual currency that relies on cryptographic methods for security.",
    "Financial planning involves setting goals, assessing your financial situation, and creating a strategy to achieve your goals.",
    "A hedge fund is an investment vehicle that pools capital from accredited individuals or institutional investors to invest in a variety of assets.",
    "An index fund is a type of mutual fund designed to replicate the performance of a specific index.",
    "Bonds are fixed-income securities where investors lend money to an entity for a fixed period of time in exchange for interest payments.",
    "Stocks represent ownership in a company and entitle the shareholder to a portion of the company's profits.",
    "The capital market is a market for buying and selling financial securities, like stocks and bonds.",
    "The bond market refers to the marketplace where participants can issue new debt or buy and sell debt securities.",
    "A certificate of deposit (CD) is a time deposit offered by banks with a fixed interest rate and maturity date.",
    "A savings account is a deposit account held at a financial institution that provides a modest interest rate.",
    "The Dow Jones Industrial Average is a stock market index that tracks 30 large publicly-owned companies in the United States.",
    "The S&P 500 is a stock market index that tracks 500 large companies listed on stock exchanges in the United States.",
    "An exchange-traded fund (ETF) is a type of fund that holds assets like stocks, commodities, or bonds and is traded on a stock exchange.",
    "A credit score is a numerical expression based on a person's credit history, used by lenders to assess creditworthiness.",
    "Interest rates represent the cost of borrowing money, typically expressed as an annual percentage rate (APR).",
    "An IPO (Initial Public Offering) is the process through which a private company offers shares to the public for the first time.",
    "A 401(k) is a retirement savings plan sponsored by an employer that allows employees to save and invest for retirement on a tax-deferred basis.",
    "A mutual fund is an investment vehicle that pools money from many investors to purchase securities.",
    "The Consumer Price Index (CPI) is a measure that examines the weighted average of prices of a basket of consumer goods and services.",
    "The inflation rate refers to the rate at which the general level of prices for goods and services rises and erodes purchasing power.",
    "Financial diversification involves spreading investments across various asset classes to reduce risk.",
    "A stock dividend is a payment made by a corporation to its shareholders, usually in the form of additional shares or cash.",
    "Capital gains are the profits earned from the sale of an asset, such as a stock, bond, or real estate property.",
    "Tax planning is the process of analyzing financial situations to minimize tax liability through various strategies.",
    "The money market is a sector of the financial market in which short-term borrowing and lending takes place.",
    "A mortgage is a loan specifically used to purchase real estate, typically involving regular payments of principal and interest.",
    "A retirement plan is a financial arrangement designed to provide income during retirement years.",
    "Asset allocation is the strategy of distributing investments across various asset classes, such as stocks, bonds, and real estate.",
    "A financial advisor is a professional who helps clients manage their investments, estate planning, and financial goals.",
    "Corporate finance involves managing a company's financial activities, such as investment decisions, capital raising, and risk management.",
    "An annuity is a financial product that provides a series of payments made at equal intervals, often used for retirement income.",
    "An emergency fund is a reserve of money set aside to cover unexpected expenses or financial emergencies.",
    "A credit card allows users to borrow funds up to a limit to make purchases, with interest charged on outstanding balances.",
    "A pension plan is a retirement plan where employers make contributions to a pool of funds set aside for an employee's future benefit.",
    "Debt consolidation is the process of combining multiple debts into a single loan or payment plan to simplify management.",
    "A financial statement is a formal record of a company's financial activities and position, including the balance sheet and income statement.",
    "The balance sheet is a financial statement that reports a company's assets, liabilities, and shareholders' equity at a specific point in time.",
    "An income statement is a financial document that shows a company's revenues and expenses over a specific period of time.",
    "Financial modeling is the process of creating a mathematical representation of a company's financial performance.",
    "Venture capital refers to funding provided to startups or small businesses with high growth potential in exchange for equity.",
    "Private equity refers to investments made in privately held companies, typically through buyouts or direct investments.",
    "A dividend yield is the annual dividend payment divided by the stock's price, representing the return an investor can expect from dividends.",
    "Asset management involves managing investments on behalf of clients, often through mutual funds, ETFs, or other financial products.",
    "Financial leverage involves using borrowed capital to increase the potential return on investment, though it also increases risk.",
    "A liquidity ratio is a financial metric that measures a company's ability to meet its short-term obligations using its liquid assets.",
    "Debt-to-equity ratio is a financial leverage ratio that compares a company's total liabilities to its shareholder equity.",
    "A treasury bond is a debt security issued by the government with a fixed interest rate and a maturity of 10 years or more.",
    "A government bond is a debt instrument issued by a national government to support spending and obligations.",
    "An investment portfolio is a collection of assets held by an individual or institution for the purpose of achieving specific financial goals.",
    "Foreign exchange (Forex) is the global marketplace for trading currencies, driven by factors like interest rates and economic stability.",
    "Financial independence is the state of having sufficient income or wealth to cover all living expenses without needing employment.",
    "Personal finance refers to the management of an individual's or family's financial activities, such as budgeting, investing, and saving.",
    "A wealth manager is a financial advisor who provides specialized services in managing high-net-worth individuals' assets and investments.",
    "Crowdfunding involves raising small amounts of money from a large number of people, typically via the internet, to fund a project or venture.",
    "A commodity is a basic good used in commerce that is interchangeable with other goods of the same type, such as oil or gold.",
    "A real estate investment trust (REIT) is a company that owns, operates, or finances income-producing real estate.",
    "A cryptocurrency wallet is a digital tool used to store and manage cryptocurrency assets like Bitcoin or Ethereum.",
    "A decentralized finance (DeFi) platform is a blockchain-based financial service that operates without a centralized authority.",
    "The term 'blockchain' refers to a distributed ledger technology used to securely store data in a decentralized manner.",
    "A smart contract is a self-executing contract with terms directly written into code that automatically enforces the contract's terms.",
    "Financial risk management involves identifying, analyzing, and mitigating risks to minimize the financial impact of uncertain events.",
    "A stock buyback occurs when a company repurchases its own shares from the market, reducing the number of outstanding shares.",
    "Market capitalization refers to the total value of a company's outstanding shares of stock, calculated by multiplying share price by shares outstanding.",
    "A short sale occurs when an investor borrows shares to sell them at a high price, hoping to buy them back later at a lower price.",
    "A portfolio manager is a professional responsible for making investment decisions and managing an investment portfolio on behalf of clients.",
    "Financial leverage can increase potential returns but also amplifies risk, especially if investments do not perform as expected.",
    "A margin account allows an investor to borrow funds from a broker to purchase securities, increasing buying power.",
    "A credit default swap is a financial derivative contract that allows investors to swap the credit risk of bond issues.",
    "The term 'bear market' refers to a market condition in which asset prices are falling or expected to fall.",
    "The term 'bull market' refers to a market condition where asset prices are rising or expected to rise.",
    "The price-to-earnings (P/E) ratio is a valuation ratio calculated by dividing a company's share price by its earnings per share (EPS).",
    "A blue-chip stock refers to shares of a well-established company with a history of stable performance and reliability.",
    "The bond yield curve is a graph that shows the relationship between bond yields and maturities for bonds of similar credit quality.",
    "An economic recession is a significant decline in economic activity spread across the economy, lasting for several months or more.",
    "Interest rate hikes typically lead to reduced borrowing and spending, which can slow down economic growth and lower inflation.",
    "A bank run occurs when a large number of depositors attempt to withdraw their funds simultaneously, fearing the bank's insolvency.",
    "Behavioral finance studies the psychological influences on investor decisions and how they affect market outcomes.",
    "A financial crisis is a situation in which the value of financial assets or institutions drops rapidly, potentially leading to systemic instability.",
    "Sustainable investing focuses on investments that generate social and environmental benefits alongside financial returns.",
    "Impact investing seeks to generate a positive social or environmental impact alongside financial returns."
    ]
    index = load_faiss_index(embedding_model, corpus)

    new_documents = [
        "Machine learning is a subset of artificial intelligence that enables systems to improve from experience.",
        "Bitcoin is a decentralized digital currency that operates without a central authority or government."
    ]
    update_faiss_index(index, new_documents, embedding_model)

    documents = retrieve_documents(query, index, embedding_model)

    context = "\n".join(documents)

    generated_answer = generate_answer(qa_pipeline, query, context)
    print(f"Generated Answer: {generated_answer}")

    rouge_scores = evaluate_rouge(reference_answer, generated_answer)
    print(f"ROUGE Scores: {rouge_scores}")

def evaluate_rouge(reference, generated):
    # RougeScorer.score expects (target, prediction), i.e. (reference, generated).
    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
    return scorer.score(reference, generated)

# Example usage
query = "What is the stock market?"
reference_answer = "The stock market is a place where buyers and sellers trade shares of public companies."
test_rag_system(query, reference_answer)


Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


Generated Answer: a place where buyers and sellers come together to trade shares of public companies
ROUGE Scores: {'rouge1': Score(precision=0.7857142857142857, recall=0.7333333333333333, fmeasure=0.7586206896551724), 'rouge2': Score(precision=0.6923076923076923, recall=0.6428571428571429, fmeasure=0.6666666666666666), 'rougeL': Score(precision=0.7857142857142857, recall=0.7333333333333333, fmeasure=0.7586206896551724)}


### ROUGE Scores Analysis

#### ROUGE-1
- **Precision**: <span style="color: green;">0.7857</span>  
  Measures how many of the unigrams (individual words) generated by the model appear in the reference.
- **Recall**: <span style="color: orange;">0.7333</span>  
  Measures how many of the unigrams from the reference are found in the generated text.
- **F-Measure**: <span style="color: blue;">0.7586</span>  
  The harmonic mean of precision and recall, giving a balanced view of the model's performance.

**Interpretation**:  

The model does well on unigram matching, with a precision of **78.57%** (11 of the 14 generated unigrams appear in the reference) and a recall of **73.33%** (11 of the 15 reference unigrams are found in the answer). The F-measure of **75.86%** indicates strong overall unigram overlap.

---

#### ROUGE-2
- **Precision**: <span style="color: green;">0.6923</span>  
  Measures how many of the bigrams (pairs of consecutive words) generated by the model match the reference.
- **Recall**: <span style="color: orange;">0.6429</span>  
  Measures how many of the bigrams from the reference are found in the generated text.
- **F-Measure**: <span style="color: blue;">0.6667</span>  
  A balanced measure of precision and recall for bigrams.

**Interpretation**:  
The model does slightly less well on bigrams, with a precision of **69.23%** (9 of 13 generated bigrams) and a recall of **64.29%** (9 of 14 reference bigrams). The F-measure of **66.67%** shows there is still room to improve on consecutive word pairs.

---

#### ROUGE-L
- **Precision**: <span style="color: green;">0.7857</span>  
- **Recall**: <span style="color: orange;">0.7333</span>  
- **F-Measure**: <span style="color: blue;">0.7586</span>  

**Interpretation**:  
ROUGE-L evaluates the longest common subsequence (LCS), which takes word order into account. The scores are identical to those of ROUGE-1, which means that every matching word also appears in the same order in the generated text and in the reference.

---
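
The ROUGE-L figures above can be checked with a short LCS computation. The sketch below is a simplified version that assumes whitespace tokenization, lowercasing, and no stemming; the function names are illustrative, and `rouge_score` remains the reference implementation. It uses the reference answer and the generated answer from the run above.

```python
def lcs_length(a, b):
    # Classic dynamic programme: length of the longest common subsequence.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, generated):
    # Simplified tokenization (assumption): lowercase, strip final period, split on spaces.
    ref = reference.lower().rstrip(".").split()
    gen = generated.lower().rstrip(".").split()
    if not ref or not gen:
        return 0.0, 0.0, 0.0
    lcs = lcs_length(ref, gen)
    precision, recall = lcs / len(gen), lcs / len(ref)
    f_measure = 2 * precision * recall / (precision + recall) if lcs else 0.0
    return precision, recall, f_measure


reference = "The stock market is a place where buyers and sellers trade shares of public companies."
generated = "a place where buyers and sellers come together to trade shares of public companies"

print([round(v, 4) for v in rouge_l(reference, generated)])
# [0.7857, 0.7333, 0.7586]
```

The LCS contains 11 words, so precision is 11/14 and recall is 11/15, matching the scores reported above.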

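### **Summary Interpretation**:
- The **high** ROUGE-1 and ROUGE-L scores show that the model matches the reference well in both word choice and word order.
  
- Overall, the answer is relevant to the question: it differs from the reference only by the extra words "come together to" and by omitting the opening "The stock market is".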



## Final comparison:
The best results are obtained with RAG, followed by PEFT, then few-shot prompt engineering.