# Prompt Engineering: A Generative AI Use Case for Summarizing Dialogues

In this notebook, we will summarize dialogues using generative AI.
We will see how the input text affects the output of an LLM, and apply prompt engineering to steer the model toward the task we need.
By comparing zero shot, one shot, and few shot inference, we will see how each technique can improve the generated output.

# Table of Contents

- [ 1 - Install the Required Dependencies](#1)
- [ 2 - Summarize a Dialogue without Prompt Engineering](#2)
- [ 3 - Summarize a Dialogue with an Instruction Prompt](#3)
  - [ 3.1 - Zero Shot Inference with an Instruction Prompt](#3.1)
  - [ 3.2 - Zero Shot Inference with the FLAN-T5 Prompt Template](#3.2)
- [ 4 - Summarize a Dialogue with One Shot and Few Shot Inference](#4)
  - [ 4.1 - One Shot Inference](#4.1)
  - [ 4.2 - Few Shot Inference](#4.2)
- [ 5 - Generative Configuration Parameters for Inference](#5)

<a name='1'></a>
## 1 - Install the Required Dependencies

In [1]:
!pip install datasets
!pip install torch torchdata
!pip install transformers



In [3]:
# Load the dataset, the LLM, the tokenizer, and the generation configuration class.

from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM
from transformers import AutoTokenizer
from transformers import GenerationConfig

<a name='2'></a>
## 2 - Summarize a Dialogue without Prompt Engineering

In this use case, we will generate a summary of a dialogue with FLAN-T5, a pre-trained LLM available from Hugging Face.

Let's load a few simple dialogues from the [DialogSum](https://huggingface.co/datasets/knkarthick/dialogsum) dataset on Hugging Face. This dataset contains more than 10,000 dialogues with manually labeled summaries and topics.

In [4]:
huggingface_dataset_name = "knkarthick/dialogsum"

dataset = load_dataset(huggingface_dataset_name)

In [5]:
dataset

DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 12460
    })
    validation: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 500
    })
    test: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 1500
    })
})

In [27]:
dash_line = '-' * 99  # separator line of 99 dashes
example_indices = [202, 74]

for i, index in enumerate(example_indices):
    print(dash_line)
    print('Example ', i + 1)
    
    print(dash_line)
    print('INPUT DIALOGUE:')
    print(dataset['test'][index]['dialogue'])
    
    print(dash_line)
    print('BASELINE HUMAN SUMMARY:')
    print(dataset['test'][index]['summary'])
    
    print(dash_line)
    print("\n\n\n")

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT DIALOGUE:
#Person1#: Where to, miss?
#Person2#: Hi! Crenshaw and Hawthorne, at the Holiday Inn that is on that corner.
#Person1#: Sure thing. So, where are you flying in from?
#Person2#: From China.
#Person1#: Really? You don't look very Chinese to me, if you don't mind me saying so.
#Person2#: It's fine. I am actually from Mexico. I was in China on a business trip, visiting some local companies that manufacture bathroom products.
#Person1#: Wow sounds interesting! Excuse me if I am being a bit nosy but, how old are you?
#Person2#: Don't you know it's rude to ask a lady her age?
#Person1#: Don't get me wrong! It's just that you seem so young and already doing business overseas!
#Person2#: Well thank you! In that case, I am 26 years old, and what about yourself?
#Person1#: 

We load the [FLAN-T5](https://huggingface.co/docs/transformers/model_doc/flan-t5) model by creating an instance of the `AutoModelForSeq2SeqLM` class with the `.from_pretrained()` method.

In [28]:
model_name='google/flan-t5-base'

model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

We download the tokenizer for the FLAN-T5 model using the `AutoTokenizer.from_pretrained()` method.

The `use_fast` parameter enables the fast tokenizer; see the [documentation](https://huggingface.co/docs/transformers/v4.28.1/en/model_doc/auto#transformers.AutoTokenizer).

In [29]:
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

Let's test how the tokenizer encodes and decodes a simple sentence:

In [30]:
sentence = "Burkina Fase is a west Africa country"

sentence_encoded = tokenizer(sentence, return_tensors='pt')
print('ENCODED SENTENCE:')
print(sentence_encoded["input_ids"][0])

ENCODED SENTENCE:
tensor([4152, 2917,    9, 1699,    7,   15,   19,    3,    9, 4653, 2648,  684,
           1])


In [31]:
sentence_decoded = tokenizer.decode(
        sentence_encoded["input_ids"][0], 
        skip_special_tokens=True
    )

print('\nDECODED SENTENCE:')
print(sentence_decoded)


DECODED SENTENCE:
Burkina Fase is a west Africa country


It is now time to explore how the **base LLM** summarizes a dialogue without any prompt engineering.

**Prompt engineering** is the practice of modifying the prompt (the input) to improve the response for a given task.

In [32]:
for i, index in enumerate(example_indices):
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']
    
    inputs = tokenizer(dialogue, return_tensors='pt')
    generation = model.generate( inputs["input_ids"], 
                                 max_new_tokens=50,
                                )[0]
    
    output = tokenizer.decode(generation, skip_special_tokens=True)
    
    print(dash_line)
    print('Example ', i + 1)
    
    print(dash_line)
    print(f'INPUT PROMPT:\n{dialogue}')
    
    print(dash_line)
    print(f'BASELINE HUMAN SUMMARY:\n{summary}')
    
    print(dash_line)
    print(f'MODEL GENERATION - WITHOUT PROMPT ENGINEERING:\n{output}\n')
    print("\n\n\n")

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:
#Person1#: Where to, miss?
#Person2#: Hi! Crenshaw and Hawthorne, at the Holiday Inn that is on that corner.
#Person1#: Sure thing. So, where are you flying in from?
#Person2#: From China.
#Person1#: Really? You don't look very Chinese to me, if you don't mind me saying so.
#Person2#: It's fine. I am actually from Mexico. I was in China on a business trip, visiting some local companies that manufacture bathroom products.
#Person1#: Wow sounds interesting! Excuse me if I am being a bit nosy but, how old are you?
#Person2#: Don't you know it's rude to ask a lady her age?
#Person1#: Don't get me wrong! It's just that you seem so young and already doing business overseas!
#Person2#: Well thank you! In that case, I am 26 years old, and what about yourself?
#Person1#: I 

You can see that the model's guesses make some sense, but it does not seem sure what task it is supposed to perform. It appears to simply make up the next sentence of the dialogue.

Prompt engineering can help here.

<a name='3'></a>
## 3 - Summarize a Dialogue with an Instruction Prompt

<a name='3.1'></a>
### 3.1 - Zero Shot Inference with an Instruction Prompt

To ask the model to perform a task (here, summarizing a dialogue), you can convert the dialogue into an instruction prompt. Because the prompt contains no worked examples, this is zero shot inference.

Wrap the dialogue in a descriptive instruction and see how the generated text changes:

In [34]:
for i, index in enumerate(example_indices):
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']

    prompt = f"""
Summarize the following conversation.

{dialogue}

Summary:
    """

    # Input constructed prompt instead of the dialogue.
    inputs = tokenizer(prompt, return_tensors='pt')
    output = tokenizer.decode(
        model.generate(
            inputs["input_ids"], 
            max_new_tokens=50,
        )[0], 
        skip_special_tokens=True
    )
    
    print(dash_line)
    print('Example ', i + 1)
    
    print(dash_line)
    print(f'INPUT PROMPT:\n{prompt}')
    
    print(dash_line)
    print(f'BASELINE HUMAN SUMMARY:\n{summary}')
    
    print(dash_line)    
    print(f'MODEL GENERATION - ZERO SHOT:\n{output}\n')
    print("\n\n\n")

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:

Summarize the following conversation.

#Person1#: Where to, miss?
#Person2#: Hi! Crenshaw and Hawthorne, at the Holiday Inn that is on that corner.
#Person1#: Sure thing. So, where are you flying in from?
#Person2#: From China.
#Person1#: Really? You don't look very Chinese to me, if you don't mind me saying so.
#Person2#: It's fine. I am actually from Mexico. I was in China on a business trip, visiting some local companies that manufacture bathroom products.
#Person1#: Wow sounds interesting! Excuse me if I am being a bit nosy but, how old are you?
#Person2#: Don't you know it's rude to ask a lady her age?
#Person1#: Don't get me wrong! It's just that you seem so young and already doing business overseas!
#Person2#: Well thank you! In that case, I am 26 years old

This is much better! But the model still does not pick up the nuances of the conversations.

<a name='3.2'></a>
### 3.2 - Zero Shot Inference with the FLAN-T5 Prompt Template

Let's use a slightly different prompt. FLAN-T5 has many prompt templates published for specific tasks [here](https://github.com/google-research/FLAN/tree/main/flan/v2).

In the following code, you will use one of the [pre-built FLAN-T5 prompts](https://github.com/google-research/FLAN/blob/main/flan/v2/templates.py):

In [35]:
for i, index in enumerate(example_indices):
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']
        
    prompt = f"""
Dialogue:

{dialogue}

What was going on?
"""

    inputs = tokenizer(prompt, return_tensors='pt')
    output = tokenizer.decode(
        model.generate(
            inputs["input_ids"], 
            max_new_tokens=50,
        )[0], 
        skip_special_tokens=True
    )

    print(dash_line)
    print('Example ', i + 1)
    
    print(dash_line)
    print(f'INPUT PROMPT:\n{prompt}')
    
    print(dash_line)
    print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
    
    print(dash_line)
    print(f'MODEL GENERATION - ZERO SHOT:\n{output}\n')
    print("\n\n\n")

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:

Dialogue:

#Person1#: Where to, miss?
#Person2#: Hi! Crenshaw and Hawthorne, at the Holiday Inn that is on that corner.
#Person1#: Sure thing. So, where are you flying in from?
#Person2#: From China.
#Person1#: Really? You don't look very Chinese to me, if you don't mind me saying so.
#Person2#: It's fine. I am actually from Mexico. I was in China on a business trip, visiting some local companies that manufacture bathroom products.
#Person1#: Wow sounds interesting! Excuse me if I am being a bit nosy but, how old are you?
#Person2#: Don't you know it's rude to ask a lady her age?
#Person1#: Don't get me wrong! It's just that you seem so young and already doing business overseas!
#Person2#: Well thank you! In that case, I am 26 years old, and what about yourself?
#

Notice that this FLAN-T5 prompt helped a bit, but the model still struggles to pick up the nuances of the conversation. This is what you will try to address with few shot inference.

<a name='4'></a>
## 4 - Summarize a Dialogue with One Shot and Few Shot Inference

**One shot and few shot inference** are the practices of giving an LLM one or more complete examples of prompt-response pairs that match your task, placed before the actual prompt you want completed.

This is called "in-context learning", and it puts the model in a state where it understands your specific task. You can read more about it in [this Hugging Face blog post](https://huggingface.co/blog/few-shot-learning-gpt-neo-and-inference-api).

<a name='4.1'></a>
### 4.1 - One Shot Inference

Let's build a function that takes a list of `example_indices_full`, generates a prompt with complete examples, and then appends the prompt you want the model to complete (`example_index_to_summarize`).
You will use the same FLAN-T5 prompt template as in section [3.2](#3.2).

In [36]:
def make_prompt(example_indices_full, example_index_to_summarize):
    prompt = ''
    for index in example_indices_full:
        dialogue = dataset['test'][index]['dialogue']
        summary = dataset['test'][index]['summary']
        
        # The stop sequence '{summary}\n\n\n' is important for FLAN-T5. Other models may have their own preferred stop sequence.
        prompt += f"""
Dialogue:

{dialogue}

What was going on?
{summary}


"""
    
    dialogue = dataset['test'][example_index_to_summarize]['dialogue']
    
    prompt += f"""
Dialogue:

{dialogue}

What was going on?
"""
        
    return prompt

Construct the prompt to perform one shot inference:

In [37]:
example_indices_full = [40]
example_index_to_summarize = 200

one_shot_prompt = make_prompt(example_indices_full, example_index_to_summarize)

print(one_shot_prompt)


Dialogue:

#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

What was going on?
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.



Dialogue:

#Person1#: Have you considered upgrading your system?
#Person2#: Yes, but I'm not sure what exactly I would need.
#Person1#: You could consider adding a painting program to your software. It would allow you to make up your own flyers and banners for advertising.
#Person2#: That would be a definite bonus.
#Person1#: You might also want to upgrade your hardware because it is pretty outdated now.
#Person2#: How can we do that?
#Person1#: You'd probably need a faster processor, to begin with. And you also ne

Now pass this prompt to perform the one shot inference:

In [38]:
summary = dataset['test'][example_index_to_summarize]['summary']

inputs = tokenizer(one_shot_prompt, return_tensors='pt')
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
    )[0], 
    skip_special_tokens=True
)

print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')

print(dash_line)
print(f'MODEL GENERATION - ONE SHOT:\n{output}')

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.

---------------------------------------------------------------------------------------------------
MODEL GENERATION - ONE SHOT:
#Person1 wants to upgrade his system. #Person2 wants to add a painting program to his software. #Person1 wants to add a CD-ROM drive.


<a name='4.2'></a>
### 4.2 - Few Shot Inference

Let's explore few shot inference by adding two more complete dialogue-summary pairs to your prompt.

In [39]:
example_indices_full = [40, 80, 120]
example_index_to_summarize = 200

few_shot_prompt = make_prompt(example_indices_full, example_index_to_summarize)

print(few_shot_prompt)


Dialogue:

#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

What was going on?
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.



Dialogue:

#Person1#: May, do you mind helping me prepare for the picnic?
#Person2#: Sure. Have you checked the weather report?
#Person1#: Yes. It says it will be sunny all day. No sign of rain at all. This is your father's favorite sausage. Sandwiches for you and Daniel.
#Person2#: No, thanks Mom. I'd like some toast and chicken wings.
#Person1#: Okay. Please take some fruit salad and crackers for me.
#Person2#: Done. Oh, don't forget to take napkins disposable plates, cups and picnic blanket.
#Person1#: All set. 

Now pass this prompt to perform a few shot inference:

In [None]:
summary = dataset['test'][example_index_to_summarize]['summary']

inputs = tokenizer(few_shot_prompt, return_tensors='pt')
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
    )[0], 
    skip_special_tokens=True
)

print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - FEW SHOT:\n{output}')

In this case, few shot inference did not improve much on one shot inference. Going beyond 5 or 6 shots generally does not help much either. You also need to make sure you do not exceed the model's input context length, which in our case is 512 tokens. Anything beyond the context length will be ignored.

However, you can see that providing at least one complete example (one shot) gives the model more information and qualitatively improves the overall summary.
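
To stay within the context limit, you can count the tokens of each example before building the prompt and keep only the examples that fit. The following is a minimal sketch and is not part of the original notebook: `count_tokens` and `select_examples` are hypothetical helpers, and `count_tokens` approximates tokens by splitting on whitespace so that the snippet runs on its own. With the notebook's tokenizer you could count with `len(tokenizer(text)["input_ids"])` instead.

```python
def count_tokens(text):
    """Hypothetical stand-in: approximate the token count by whitespace-separated words."""
    return len(text.split())


def select_examples(examples, query, max_tokens=512):
    """Return the longest prefix of `examples` that fits in the budget together with `query`."""
    budget = max_tokens - count_tokens(query)
    selected = []
    for example in examples:
        cost = count_tokens(example)
        if cost > budget:
            break  # this example (and any after it) would overflow the context
        selected.append(example)
        budget -= cost
    return selected


# Toy data (assumption): three 200-"token" examples and a 100-"token" query.
examples = ["a " * 200, "b " * 200, "c " * 200]
query = "q " * 100

kept = select_examples(examples, query)
print(f"Examples kept: {len(kept)} of {len(examples)}")
```

With this toy data, two of the three examples fit next to the query in a 512-token budget, so the third one is dropped.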

<a name='5'></a>
## 5 - Generative Configuration Parameters for Inference

You can change the configuration parameters of the `generate()` method to get a different output from the LLM. So far, the only parameter you have set is `max_new_tokens=50`, which defines the maximum number of tokens to generate. A full list of the available parameters can be found in the [Hugging Face Generation documentation](https://huggingface.co/docs/transformers/v4.29.1/en/main_classes/text_generation#transformers.GenerationConfig).

A convenient way to organize the configuration parameters is to use the `GenerationConfig` class.

**Exercise:**

Change the configuration parameters to study their influence on the output.

Setting `do_sample=True` enables sampling-based decoding strategies, which draw the next token from the probability distribution over the entire vocabulary. You can then adjust the outputs by changing `temperature` and other parameters (such as `top_k` and `top_p`).

Uncomment the lines in the cell below and rerun the code. Try to analyze the results. You can read some comments below.

In [40]:
generation_config = GenerationConfig(max_new_tokens=50)
# generation_config = GenerationConfig(max_new_tokens=10)
# generation_config = GenerationConfig(max_new_tokens=50, do_sample=True, temperature=0.1)
# generation_config = GenerationConfig(max_new_tokens=50, do_sample=True, temperature=0.5)
# generation_config = GenerationConfig(max_new_tokens=50, do_sample=True, temperature=1.0)

inputs = tokenizer(few_shot_prompt, return_tensors='pt')
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        generation_config=generation_config,
    )[0], 
    skip_special_tokens=True
)

print(dash_line)
print(f'MODEL GENERATION - FEW SHOT:\n{output}')

print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')

Token indices sequence length is longer than the specified maximum sequence length for this model (819 > 512). Running this sequence through the model will result in indexing errors


---------------------------------------------------------------------------------------------------
MODEL GENERATION - FEW SHOT:
#Person1 wants to upgrade his system. #Person2 wants to add a painting program to his software. #Person1 wants to upgrade his hardware.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.



Comments on the choice of parameters in the code cell above:
- Choosing `max_new_tokens=10` will make the output text too short, so the dialogue summary will be cut off.
- Setting `do_sample=True` and changing the temperature value makes the output more varied: a low temperature keeps generations close to the most likely tokens, while a higher one yields more diverse text.
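
To make the effect of `temperature` and `top_k` concrete, here is a minimal pure-Python sketch of temperature-scaled sampling. It illustrates the general idea only and is not the Hugging Face implementation: the function names and the toy scores are assumptions, and `top_p` is left out for brevity.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; temperature must be > 0.

    Lower temperatures sharpen the distribution, higher ones flatten it.
    """
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_top_k(logits, temperature=1.0, top_k=None, rng=random):
    """Sample a token index, optionally restricted to the top_k highest scores."""
    indices = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k is not None:
        indices = indices[:top_k]
    probs = softmax_with_temperature([logits[i] for i in indices], temperature)
    return rng.choices(indices, weights=probs, k=1)[0]


# Hypothetical scores for a 4-token vocabulary.
toy_logits = [2.0, 1.0, 0.5, -1.0]

for t in (0.1, 0.5, 1.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")

print("sampled index:", sample_top_k(toy_logits, temperature=1.0, top_k=2))
```

At a temperature of 0.1 nearly all the probability mass sits on the highest-scoring token, so sampling behaves almost like greedy decoding. At 1.0 the other tokens keep a meaningful share, which is why sampled outputs vary more from run to run.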

As you can see, prompt engineering can take you a long way for this use case, but it has its limits.

Next, you will start exploring how fine-tuning can help your LLM understand a particular use case in more depth!