In [None]:
!pip install transformers datasets

In [None]:
!pip install torchdata torch

## Loading libraries
1. load_dataset
2. AutoModelForSeq2SeqLM
3. AutoTokenizer
4. GenerationConfig

In [None]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM  # loads pretrained seq2seq models such as T5 or BART
from transformers import AutoTokenizer  # loads the tokenizer that matches a checkpoint
from transformers import GenerationConfig  # holds generation parameters (temperature, top_p, top_k, ...)

## Loading dataset

In [None]:
from datasets import load_dataset
ds = load_dataset("knkarthick/dialogsum")

## Summarize Dialogue without Prompt Engineering

This section uses the pretrained LLM **FLAN-T5** from Hugging Face.
Other models of any category (image, video, LLM) can be found at https://huggingface.co/docs/transformers/index

### Visualising the dataset

In [None]:
ds

In [None]:
# Training split
train_dataset =ds["train"]
train_dataset


In [None]:
print("Number of training examples:", len(train_dataset))

In [None]:
import random

def print_border_line():
    dash_line = "-" * 59  # same 59-character line the original join produced
    print(dash_line)

def print_random_conversation():
    rndm_convo = random.choice(train_dataset)
    dialogue = rndm_convo["dialogue"]
    summary = rndm_convo["summary"]
    print("Dialogue\n",dialogue)
    print_border_line()
    print("Summary\n",summary)

print_random_conversation()

No fine-tuning is done here; the pretrained model is used as-is, so any of the splits (train, test, or validation) can serve as examples for prompt engineering.

In [None]:
example_indices=[40,200]

test_ds = ds["test"]

for i, index in enumerate(example_indices):
    print_border_line()
    print("Example Number ",i+1)
    print_border_line()
    print("Input Dialogue: ")
    print(test_ds[index]["dialogue"])
    print_border_line()
    print("BASELINE HUMAN SUMMARY:")
    print(test_ds[index]["summary"])
    print_border_line()
    print("\n\n")

Load the `FLAN-T5` model by creating an instance of the `AutoModelForSeq2SeqLM` class with the `.from_pretrained()` method.

### Loading the FLAN-T5 model

In [None]:
model_name = "google/flan-t5-base"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

### Loading the tokenizer

To perform encoding and decoding, you need to work with text in a tokenized form. **Tokenization is the process of splitting text into smaller units that can be processed by the LLM**.
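The encode/decode round trip can be sketched with a toy tokenizer. This is only an illustration under a loud assumption: FLAN-T5 uses a learned subword vocabulary, whereas the hypothetical `build_vocab`, `encode`, and `decode` helpers below split on whitespace and build their vocabulary from a single sentence.

```python
# Toy illustration only: FLAN-T5 uses a learned subword vocabulary,
# not the whitespace split and hand-built vocabulary assumed here.

def build_vocab(texts):
    """Assign an integer ID to every distinct whitespace-separated word."""
    vocab = {"<unk>": 0}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def encode(text, vocab):
    """Map text to a list of token IDs; unknown words map to <unk>."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

def decode(ids, vocab):
    """Map token IDs back to a space-joined string."""
    id_to_word = {idx: word for word, idx in vocab.items()}
    return " ".join(id_to_word[i] for i in ids)

toy_vocab = build_vocab(["hey my name is priyam"])
toy_ids = encode("Hey my name is Sam", toy_vocab)
print(toy_ids)                     # integer IDs, one per word
print(decode(toy_ids, toy_vocab))  # words recovered from the IDs
```

A word missing from the vocabulary collapses to `<unk>` in this sketch; subword tokenizers largely avoid that by breaking rare words into smaller known pieces.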

In [None]:
# Load the tokenizer from the same checkpoint as the model so the two always match
tokenizer = AutoTokenizer.from_pretrained(model_name)


In [None]:

inputs = tokenizer("Hey my name is Priyam!", return_tensors="pt")
print("Tokenized input: ")
print(inputs["input_ids"])
outputs = model.generate(**inputs)
print("Generated token IDs: ")
print(outputs)
decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Decoded output: ")
print(decoded_output)

Now it's time to explore how well the base LLM summarizes a dialogue without any prompt engineering. Prompt engineering is the practice of changing the prompt (input) to improve the response for a given task.

In [None]:
for i , index in enumerate(example_indices):
    dialogue = test_ds[index]['dialogue']
    summary = test_ds[index]['summary']

    # 1. Tokenize the input dialogue
    inputs = tokenizer(dialogue, return_tensors="pt")
    # 2. Run inference
    model_predicted_tokens = model.generate(
        inputs["input_ids"],
        max_new_tokens=50
    )
    model_predicted_tokens = model_predicted_tokens[0]
    # 3. Decode the tokens generated by the model
    output = tokenizer.decode(
        model_predicted_tokens,
        skip_special_tokens=True
    )

    print_border_line()
    print('Example ', i + 1)
    print_border_line()
    print(f'INPUT PROMPT:\n{dialogue}')
    print_border_line()
    print(f'BASELINE HUMAN SUMMARY:\n{summary}')
    print_border_line()
    print(f'MODEL GENERATION - WITHOUT PROMPT ENGINEERING:\n{output}\n')




The model's outputs make some sense, but the model does not seem sure what task it is supposed to perform.

Let's see whether prompt engineering helps.

## Summarize Dialogue with an Instruction Prompt

Prompt engineering is an important concept when using foundation models for text generation.

### Zero Shot Inference with an Instruction Prompt

Create a sample zero shot inference prompt:

In [None]:
rndm_val = random.choice(test_ds)
dialogue=rndm_val["dialogue"]
summary = rndm_val["summary"]

prompt =f"""
Summarize the following conversation.

{dialogue}

Summary:
    
"""
print(prompt)

In [None]:
for i , index in enumerate(example_indices):
    dialogue = test_ds[index]['dialogue']
    summary = test_ds[index]['summary']

    prompt= f"""
Summarize the following conversation.

{dialogue}

Summary:
    
    """

    
    
    # 1. Tokenize the prompt
    inputs = tokenizer(prompt, return_tensors="pt")
    # 2. Run inference
    model_predicted_tokens = model.generate(
        inputs["input_ids"],
        max_new_tokens=50
    )
    model_predicted_tokens=model_predicted_tokens[0]
    # 3. Decode the tokens generated by the model
    output = tokenizer.decode(
        model_predicted_tokens,
        skip_special_tokens=True
    )


    print_border_line()
    print('Example ', i + 1)
    print_border_line()
    print(f'INPUT PROMPT:\n{dialogue}')
    print_border_line()
    print(f'BASELINE HUMAN SUMMARY:\n{summary}')
    print_border_line()
    print(f'MODEL GENERATION - ZERO SHOT:\n{output}\n')

    



Much better! But the model still does not pick up on the nuance of the conversations.

Experiment with the prompt text and see how the inferences change. Do they change if you end the prompt with an empty string instead of `Summary:`?
Try rephrasing the beginning of the prompt from `Summarize the following conversation.` to something different and see how it influences the generated output.
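One way to organise these experiments is to build each prompt variant from the same parts. The sketch below is a hypothetical helper, not part of the notebook: `build_prompt`, `sample_dialogue`, and `prompt_variants` are names introduced for illustration, and the dialogue is a made-up stand-in for a DialogSum record.

```python
def build_prompt(dialogue, instruction="Summarize the following conversation.", suffix="Summary:"):
    """Join the non-empty parts of a zero-shot prompt with blank lines."""
    parts = [part for part in (instruction, dialogue, suffix) if part]
    return "\n\n".join(parts) + "\n"

# Hypothetical stand-in for a DialogSum dialogue.
sample_dialogue = "#Person1#: Is the meeting at ten?\n#Person2#: Yes, in room four."

# Three variants: the default, one without the trailing "Summary:", and a question-style template.
prompt_variants = {
    "instruction + Summary:": build_prompt(sample_dialogue),
    "instruction only": build_prompt(sample_dialogue, suffix=""),
    "question style": build_prompt(
        sample_dialogue, instruction="Dialogue:", suffix="What was going on?"
    ),
}

for name, text in prompt_variants.items():
    print(f"--- {name} ---")
    print(text)
```

Each variant can then be tokenized and passed to `model.generate` in the same way as the prompts in the cells above, which makes the outputs directly comparable.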

### Zero Shot Inference with a Different Prompt Template

In [None]:
for i , index in enumerate(example_indices):
    dialogue = test_ds[index]['dialogue']
    summary = test_ds[index]['summary']

    prompt = f"""
Dialogue:

{dialogue}

What was going on?
"""

    
    
    # 1. Tokenize the prompt
    inputs = tokenizer(prompt, return_tensors="pt")
    # 2. Run inference
    model_predicted_tokens = model.generate(
        inputs["input_ids"],
        max_new_tokens=50
    )
    model_predicted_tokens=model_predicted_tokens[0]
    # 3. Decode the tokens generated by the model
    output = tokenizer.decode(
        model_predicted_tokens,
        skip_special_tokens=True
    )


    print_border_line()
    print('Example ', i + 1)
    print_border_line()
    print(f'INPUT PROMPT:\n{dialogue}')
    print_border_line()
    print(f'BASELINE HUMAN SUMMARY:\n{summary}')
    print_border_line()
    print(f'MODEL GENERATION - ZERO SHOT:\n{output}\n')

    



Notice that this prompt template helped FLAN-T5 a bit, but the model still struggles to pick up on the nuance of the conversation. This is what you will try to solve with few shot inference.

### One Shot and Few Shot Inference

One shot and few shot inference are the practices of providing an LLM with one or more full examples of prompt-response pairs that match your task, placed before the actual prompt that you want completed. This is called "in-context learning" and puts your model into a state that understands your specific task.

Building the prompt:

In [None]:
def make_prompt(example_indices_full, example_index_to_summarize):
    prompt = ''
    
    # Loop through the few-shot examples
    for index in example_indices_full:
        dialogue = test_ds[index]["dialogue"]
        summary = test_ds[index]["summary"]

        prompt += f"""
Dialogue:

{dialogue}

What was going on?
{summary}
    
    """
    
    # Add the example that needs to be summarized
    dialogue = test_ds[example_index_to_summarize]["dialogue"]

    prompt += f"""
Dialogue:

{dialogue}

What was going on?

    """

    return prompt


In [None]:
example_indices_full = [40]
example_index_to_summarize = 200

one_shot_prompt = make_prompt(example_indices_full, example_index_to_summarize)

print(one_shot_prompt)

The prompt is created. Now pass it to the model to perform `one shot inference`.

In [None]:
summary = test_ds[example_index_to_summarize]["summary"]

# 1. Tokenize the one shot prompt
inputs = tokenizer(one_shot_prompt, return_tensors="pt")

# 2. Pass it to the model
model_output_tokenized = model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
    )

output = tokenizer.decode(
    model_output_tokenized[0],
    skip_special_tokens=True
)
print_border_line()
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print_border_line()
print(f'MODEL GENERATION - ONE SHOT:\n{output}')

### Few Shot Inference

In [None]:
example_indices_full = [40, 80, 120]
example_index_to_summarize = 200

few_shot_prompt = make_prompt(example_indices_full, example_index_to_summarize)

print(few_shot_prompt)

In [None]:
summary = test_ds[example_index_to_summarize]["summary"]

# 1. Tokenize the few shot prompt
inputs = tokenizer(few_shot_prompt, return_tensors="pt")

# 2. Pass it to the model
model_output_tokenized = model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
    )

output = tokenizer.decode(
    model_output_tokenized[0],
    skip_special_tokens=True
)
print_border_line()
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print_border_line()
print(f'MODEL GENERATION - FEW SHOT:\n{output}')


In this case, few shot did not provide much of an improvement over one shot inference. Anything above 5 or 6 shots will typically not help much, either. You also need to make sure that you do not exceed the model's input context length, which in our case is 512 tokens. Anything above the context length will be ignored.

However, you can see that feeding in at least one full example (one shot) provides the model with more information and qualitatively improves the summary overall.
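The context-length limit mentioned above suggests checking how many shots actually fit before building the prompt. The following is a minimal sketch under stated assumptions: `count_tokens` and `select_shots` are hypothetical helpers, and the default counter uses whitespace-separated words as a rough stand-in for real token counts (a subword tokenizer usually produces more tokens than words).

```python
def count_tokens(text):
    # Assumption: word count as a rough proxy; a real tokenizer gives the true count.
    return len(text.split())

def select_shots(shots, query, max_tokens=512, counter=count_tokens):
    """Keep as many leading shots as fit alongside the query within max_tokens."""
    used = counter(query)
    kept = []
    for shot in shots:
        cost = counter(shot)
        if used + cost > max_tokens:
            break  # the next shot would overflow the budget, so stop here
        kept.append(shot)
        used += cost
    return kept, used

# Tiny made-up example with a budget of 8 "tokens".
demo_shots = ["a b c", "d e f g", "h i"]
demo_kept, demo_used = select_shots(demo_shots, "q r", max_tokens=8)
print(demo_kept, demo_used)
```

In this notebook, a more faithful counter could be passed in, for example `counter=lambda t: len(tokenizer(t)["input_ids"])`, so the budget is measured in the model's own tokens.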

## Generative Configuration Params for Inference

So far, the `generate()` method has been called with a single parameter, `max_new_tokens=50`, which caps the number of tokens the model's decoder generates.

A convenient way of organising the configuration parameters is to use the `GenerationConfig` class.

### Trying out different parameters

Here, `do_sample=True` is enabled and the `temperature` value is varied.

Setting `do_sample=True` activates sampling-based decoding strategies, which choose the next token from the probability distribution over the entire vocabulary. You can then adjust the outputs by changing `temperature` and other parameters (such as `top_k` and `top_p`).
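To build intuition for what these parameters do, here is a conceptual sketch in plain Python. It is not the library's implementation: `softmax_with_temperature`, `top_k_filter`, and `top_p_filter` are hypothetical helpers, and `toy_logits` is a made-up three-token vocabulary.

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

def top_p_filter(probs, p):
    """Keep the smallest set of most probable tokens whose cumulative mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

toy_logits = [2.0, 1.0, 0.1]  # made-up scores for a three-token vocabulary
for t in (0.1, 0.5, 1.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(toy_logits, t)])

# Sample five token indices after restricting to the two most probable tokens.
rng = random.Random(0)
filtered = top_k_filter(softmax_with_temperature(toy_logits, 1.0), k=2)
sampled = rng.choices(range(len(filtered)), weights=filtered, k=5)
print(sampled)
```

At low temperature almost all probability mass sits on the top token, so sampling behaves much like greedy decoding; at higher temperature the other tokens are drawn more often, which is why the generated summaries vary more between runs.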

In [None]:
# generation_config = GenerationConfig(max_new_tokens=50)
# generation_config = GenerationConfig(max_new_tokens=10)
# generation_config = GenerationConfig(max_new_tokens=50, do_sample=True, temperature=0.1)
# generation_config = GenerationConfig(max_new_tokens=50, do_sample=True, temperature=0.5)
generation_config = GenerationConfig(max_new_tokens=50, do_sample=True, temperature=1.0)

In [None]:
inputs = tokenizer(few_shot_prompt, return_tensors='pt')
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        generation_config=generation_config,
    )[0], 
    skip_special_tokens=True
)

print_border_line()
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print_border_line()
print(f'MODEL GENERATION - FEW SHOT:\n{output}')