# Fine-Tuning Large Language Models: Prompts

In this workshop, you will learn how to tune prompts and fine-tune LLMs to improve their responses.

## Loading and Exploring the dataset

In this workshop we will use the [<code>knkarthick/dialogsum</code>](https://huggingface.co/datasets/knkarthick/dialogsum) dataset from [Hugging Face](https://huggingface.co/). Each dialogue in the dataset comes with a manually labelled summary and topic.

```python
# Import libraries
import pandas as pd
import numpy as np
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig
```

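```python
# TODO: Load and explore the dataset
# Q: How many splits does it have?
# Q: How many records are in each split?
# Q: What are the column names?

dataset_name = "knkarthick/dialogsum"

# Download the dataset from the Hugging Face Hub; returns a DatasetDict of splits
dataset = load_dataset(dataset_name)

print(dataset)        # splits, columns and row counts
print(dataset.shape)  # (rows, columns) for each split
```
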
```
DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 12460
    })
    validation: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 500
    })
    test: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 1500
    })
})
{'train': (12460, 4), 'validation': (500, 4), 'test': (1500, 4)}
```

```python
# Print every field of one training record
idx = 100

for k, v in dataset['train'][idx].items():
    print(f'{k.upper()}\n{v}\n')
```

```
ID
train_100

DIALOGUE
#Person1#: I have a problem with my cable.
#Person2#: What about it?
#Person1#: My cable has been out for the past week or so.
#Person2#: The cable is down right now. I am very sorry.
#Person1#: When will it be working again?
#Person2#: It should be back on in the next couple of days.
#Person1#: Do I still have to pay for the cable?
#Person2#: We're going to give you a credit while the cable is down.
#Person1#: So, I don't have to pay for it?
#Person2#: No, not until your cable comes back on.
#Person1#: Okay, thanks for everything.
#Person2#: You're welcome, and I apologize for the inconvenience.

SUMMARY
#Person1# has a problem with the cable. #Person2# promises it should work again and #Person1# doesn't have to pay while it's down.

TOPIC
cable
```
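
The sample record above shows that each dialogue is a single string with one turn per line, and every turn starts with a speaker tag such as `#Person1#:`. Assuming all records follow that layout, a small parser can split a dialogue into `(speaker, text)` turns, which is useful for checking dialogue length or speaker balance while exploring the data. `parse_turns` is a hypothetical helper written for this sketch, not part of the `datasets` library.

```python
import re
from collections import Counter

# Assumed turn format, as in the sample record: "#Person1#: text" on its own line
TURN_RE = re.compile(r"^#(?P<speaker>Person\d+)#:\s*(?P<text>.*)$")


def parse_turns(dialogue):
    """Split a DialogSum-style dialogue string into a list of (speaker, text) turns."""
    turns = []
    for line in dialogue.splitlines():
        line = line.strip()
        match = TURN_RE.match(line)
        if match:
            turns.append((match.group("speaker"), match.group("text")))
        elif turns and line:
            # Assumption: an untagged, non-empty line continues the previous turn
            speaker, text = turns[-1]
            turns[-1] = (speaker, f"{text} {line}")
    return turns


# First four turns of record train_100, shown above
sample_dialogue = (
    "#Person1#: I have a problem with my cable.\n"
    "#Person2#: What about it?\n"
    "#Person1#: My cable has been out for the past week or so.\n"
    "#Person2#: The cable is down right now. I am very sorry."
)
turns = parse_turns(sample_dialogue)
print(len(turns))                    # 4
print(Counter(s for s, _ in turns))  # Counter({'Person1': 2, 'Person2': 2})
```

On the loaded dataset, `parse_turns(dataset['train'][idx]['dialogue'])` would return the turns of the record printed above.
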



```python
# TODO: Write a prompt to summarize a dialogue from the training dataset. Use the google/flan-t5-base LLM.

model_name = "google/flan-t5-base"

# Load the tokenizer and the sequence-to-sequence model weights
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
```

## Tuning the prompt

### Zero-Shot prompt

An instruction prompt that asks the LLM to perform a task without showing it any examples.

Characteristics of zero-shot prompts include:
- <b>No provided examples</b> - The prompt does not include any explicit examples of the desired input or output
- <b>Direct instruction</b> - The prompt describes what is expected of the LLM
- <b>Relies on pre-training</b> - The LLM relies only on what it learned from its training data
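
A zero-shot prompt is simply an instruction wrapped around the input. The sketch below builds one as a plain string; the helper name `make_zero_shot_prompt` and the default instruction wording are assumptions for illustration, and the trailing `Summary:` cue mirrors the prompts used later in this notebook.

```python
def make_zero_shot_prompt(dialogue, instruction="Summarize the following conversation."):
    """Build a zero-shot prompt: an instruction plus the input, with no worked examples."""
    # No example input/output pairs are included, so the model has to rely on
    # what it learned during training to infer the expected output format.
    return f"{instruction}\n\n{dialogue}\n\nSummary:"


# Shortened excerpt adapted from record train_100 above
sample = (
    "#Person1#: My cable has been out for a week.\n"
    "#Person2#: It should be back on in the next couple of days."
)
print(make_zero_shot_prompt(sample))
```

The resulting string can be passed to the model loaded above the same way the later cells do it: `tokenizer(prompt, return_tensors='pt')`, then `model.generate(...)`, then `tokenizer.decode(..., skip_special_tokens=True)`.
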

```python
# TODO: Write a zero-shot prompt
```

```python
# TODO: Write a zero-shot prompt using the FLAN prompt template
```

### One-Shot and Few-Shot Prompts

A one-shot prompt gives the model a single example of the task; a few-shot prompt gives it several. The examples help the LLM understand the task's requirements and generate an appropriate response.

Characteristics of a one-shot prompt include:
- <b>Single example</b> - The prompt includes a single input/output pair example
- <b>Instruction and query</b> - The provided example will be followed by a query that the LLM needs to respond to in the same format as the example
- <b>Task guidance</b> - The example serves to guide the LLM to give the desired response
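
Every exemplar lengthens the prompt, and the model can only attend to a limited number of input tokens (the tokenizer exposes its configured limit as `tokenizer.model_max_length`). One way to choose between one-shot and few-shot is to add exemplars greedily until a token budget is reached. `select_exemplars` below is a hypothetical helper; it takes the formatting function and the token counter as parameters, and the budget is your choice rather than something fixed by the model.

```python
def select_exemplars(candidates, format_example, count_tokens, budget, max_examples=None):
    """Greedily keep (input, target) exemplars while their combined length fits a token budget.

    candidates:     (input, target) pairs in order of preference
    format_example: callable that renders one pair as prompt text
    count_tokens:   callable that returns the token count of a string
    budget:         maximum number of tokens the exemplars may use in total
    max_examples:   optional cap on the number of exemplars (1 gives one-shot)
    """
    chosen, used = [], 0
    for pair in candidates:
        if max_examples is not None and len(chosen) >= max_examples:
            break
        cost = count_tokens(format_example(pair))
        if used + cost > budget:
            break  # stop at the first exemplar that would overflow the budget
        chosen.append(pair)
        used += cost
    return chosen


# Exemplar format matching the one/few-shot prompts built later in this notebook
def format_example(pair):
    dialogue, summary = pair
    return f"Summarize this article:\n\n{dialogue}\n\nSummary:\n{summary}\n\n\n"


# Rough proxy: count whitespace-separated words instead of real model tokens
def count_words(text):
    return len(text.split())


# Placeholder pairs; each formatted exemplar costs 7, 6 and 6 "words"
pairs = [("a b", "c"), ("d", "e"), ("f", "g")]
print(select_exemplars(pairs, format_example, count_words, budget=13))  # [('a b', 'c'), ('d', 'e')]
```

With the tokenizer loaded above, a real token counter could be `lambda s: len(tokenizer(s)['input_ids'])`; passing `max_examples=1` yields a one-shot prompt. The word count used in the demo is only a rough proxy for model tokens.
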

FLAN uses the following template for one-shot and few-shot prompts:
```
<input_prefix>
{example_0_inputs}
\n\n
<target_prefix>
{example_0_targets}
\n\n\n
<input_prefix>
{example_1_inputs}
\n\n
<target_prefix>
{example_1_targets}
\n\n\n
<input_prefix>
{actual_inputs}
\n\n
<target_prefix>
```
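
The template can be rendered for any number of examples with one small function. `render_flan_prompt` below is a hypothetical helper, not part of FLAN or `transformers`; its default prefixes and delimiters are the ones this notebook's exemplars and query use (`Summarize this article:` and `Summary:`, with `\n\n` between input and target and `\n\n\n` between examples). With an empty example list it produces a zero-shot prompt; with one or more examples it produces a one- or few-shot prompt.

```python
def render_flan_prompt(examples, actual_input,
                       input_prefix="Summarize this article:\n\n",
                       target_prefix="Summary:\n",
                       xy_delim="\n\n",
                       example_sep="\n\n\n"):
    """Render a zero-, one- or few-shot prompt following the template above.

    examples: list of (inputs, targets) pairs; an empty list gives a zero-shot prompt.
    """
    parts = []
    for inputs, targets in examples:
        # <input_prefix>{inputs}\n\n<target_prefix>{targets}\n\n\n
        parts.append(f"{input_prefix}{inputs}{xy_delim}{target_prefix}{targets}{example_sep}")
    # The actual query ends with the bare target prefix (trailing newline stripped)
    # so that the model continues by writing the target.
    parts.append(f"{input_prefix}{actual_input}{xy_delim}{target_prefix.rstrip()}")
    return "".join(parts)


# Toy strings for illustration only
zero = render_flan_prompt([], "#Person1#: Hi.")
one = render_flan_prompt([("#Person1#: Hello.", "#Person1# greets.")], "#Person1#: Hi.")
print(zero)
print("----")
print(one)
```

In this notebook, `render_flan_prompt(examples, dialogue)` with `examples = [(dataset['validation'][j]['dialogue'], dataset['validation'][j]['summary']) for j in idxs]` should give the same string as `mk_examplars(dataset['validation'], idxs) + our_prompt` in the cells that follow.
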

```python
# Function to create exemplars: worked dialogue/summary pairs that are
# prepended to the query in the FLAN one/few-shot template format
def mk_examplars(dataset, idxs):
    prompt = ""
    for i in idxs:
        dialogue = dataset[i]['dialogue']
        summary = dataset[i]['summary']
        # Alternative target: dataset[i]['topic']
        prompt += f"Summarize this article:\n\n{dialogue}\n\nSummary:\n{summary}\n\n\n"
    return prompt
```

```python
# Build a few-shot prefix from four validation examples
idxs = [10, 20, 30, 40]
prompt = mk_examplars(dataset['validation'], idxs)

print(prompt)
```

```
Summarize this article:

#Person1#: I am tired of everything in my life.
#Person2#: What? How happy you life is! I do envy you.
#Person1#: You don't know that I have been over-protected by my mother these years. I am really about to leave the family and spread my wings.
#Person2#: Maybe you are right.

Summary:
#Person1# feels tired because of #Person1#'s mother's over-protection.


Summarize this article:

#Person1#: Did you know that drinking beer helps you sing better?
#Person2#: Are you sure? How do you know?
#Person1#: Well, usually people think I'm a terrible singer, but after we all have a few beers, they say I sound a lot better!
#Person2#: Well, I heard that if you drink enough beer, you can speak foreign languages better. . .
#Person1#: Then after a few beers, you'll be singing in Taiwanese?
#Person2#: Maybe. . .

Summary:
#Person1# says drinking beer helps sing better, but #Person2# heard it helps speaking foreign languages.


Summarize this article:

#Person1#: We've been c
```

```python
# Query prompt for one test dialogue, in the same template as the exemplars
i = 10
dialogue = dataset['test'][i]['dialogue']
our_prompt = f"Summarize this article:\n\n{dialogue}\n\nSummary:"

print(our_prompt)
```

```
Summarize this article:

#Person1#: Happy Birthday, this is for you, Brian.
#Person2#: I'm so happy you remember, please come in and enjoy the party. Everyone's here, I'm sure you have a good time.
#Person1#: Brian, may I have a pleasure to have a dance with you?
#Person2#: Ok.
#Person1#: This is really wonderful party.
#Person2#: Yes, you are always popular with everyone. and you look very pretty today.
#Person1#: Thanks, that's very kind of you to say. I hope my necklace goes with my dress, and they both make me look good I feel.
#Person2#: You look great, you are absolutely glowing.
#Person1#: Thanks, this is a fine party. We should have a drink together to celebrate your birthday

Summary:
```

```python
# Few-shot prompt: the exemplars followed by the query
final_prompt = prompt + our_prompt
print(final_prompt)
```

```
Summarize this article:

#Person1#: I am tired of everything in my life.
#Person2#: What? How happy you life is! I do envy you.
#Person1#: You don't know that I have been over-protected by my mother these years. I am really about to leave the family and spread my wings.
#Person2#: Maybe you are right.

Summary:
#Person1# feels tired because of #Person1#'s mother's over-protection.


Summarize this article:

#Person1#: Did you know that drinking beer helps you sing better?
#Person2#: Are you sure? How do you know?
#Person1#: Well, usually people think I'm a terrible singer, but after we all have a few beers, they say I sound a lot better!
#Person2#: Well, I heard that if you drink enough beer, you can speak foreign languages better. . .
#Person1#: Then after a few beers, you'll be singing in Taiwanese?
#Person2#: Maybe. . .

Summary:
#Person1# says drinking beer helps sing better, but #Person2# heard it helps speaking foreign languages.


Summarize this article:

#Person1#: We've been c
```

```python
# Tokenize the few-shot prompt and generate a completion
enc_final_prompt = tokenizer(final_prompt, return_tensors='pt')
compl = model.generate(enc_final_prompt['input_ids'], max_new_tokens=500)
dec_compl = tokenizer.decode(compl[0], skip_special_tokens=True)

# Human-written reference summary for the same test dialogue
original_summary = dataset['test'][i]['summary']

print(original_summary)
print(dec_compl)
```

```
#Person1# attends Brian's birthday party. Brian thinks #Person1# looks great and charming.
#Person1#: Happy Birthday, Brian. #Person2#: I'm so happy you remember. #Person1#: This is really wonderful party. #Person2#: You look very pretty today. #Person1#: Thanks, that's very kind of you to say. #Person2#: You look great, you are absolutely glowing. #Person1#: Thanks, this is a fine party. We should have a drink together to celebrate your birthday.
```

```python
# Zero-shot baseline: generate from the query alone, without exemplars
enc_our_prompt = tokenizer(our_prompt, return_tensors='pt')
compl_zero = model.generate(enc_our_prompt['input_ids'], max_new_tokens=500)
dec_compl_zero = tokenizer.decode(compl_zero[0], skip_special_tokens=True)

print(dec_compl_zero)
```

```
#Person1#: Happy Birthday, Brian. #Person2#: I'm so happy you remember. #Person1#: This is really wonderful party. #Person2#: You look very pretty today. #Person1#: Thanks, that's very kind of you to say. #Person2#: You look great, you are absolutely glowing. #Person1#: Thanks, this is a fine party. We should have a drink together to celebrate your birthday.
```
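
The few-shot and zero-shot completions above are identical and mostly repeat the dialogue turns, while the reference summary is two short sentences. To put a number on how close a completion is to the reference, a simplified unigram-overlap score can be sketched in plain Python. This is a rough stand-in for ROUGE-1 under stated assumptions (lowercasing, a simple regex tokenizer, no stemming), not an official ROUGE implementation, and `unigram_overlap` is a hypothetical helper name.

```python
import re
from collections import Counter


def unigram_overlap(candidate, reference):
    """Return (precision, recall, f1) of clipped word overlap, a simplified ROUGE-1-style score."""
    def tokenize(text):
        # Lowercase and keep runs of letters, digits and '#', so "#Person1#" stays one token
        return re.findall(r"[a-z0-9#]+", text.lower())

    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # each shared word counted at most min(counts) times
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


print(unigram_overlap("the cat sat", "the cat ran"))
```

In the notebook, `unigram_overlap(dec_compl, original_summary)` and `unigram_overlap(dec_compl_zero, original_summary)` compare the two completions against the same reference. For reported results, a maintained ROUGE implementation is preferable to this sketch.
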
