# Workshop topic: Text Generation, Summarisation and Prompting

### Introduction to Text Generation using GPT-2

Text generation is one of the most useful applications of Natural Language Processing.
Decoder-based GPT models are still the state of the art (SOTA) for text generation and many other NLP tasks.
Although we cannot use the latest GPT-4 here, we are going to use GPT-2, a smaller pre-trained model that can be run with limited resources.

## Activity 1: GPT-2 Open Text Generation

Work on this activity in groups (one at each table).

1. Review the following code to understand how it works.
2. Think of a few more prompting examples, and generate texts using them.
3. Open ChatGPT-3.5 and use the same examples to generate texts.
4. Compare the GPT-2 and GPT-3.5 generations using the human evaluation criteria discussed in Lecture 11. Score each criterion from 1 to 5, then add the scores together to compare the models:
    - fluency
    - coherence / consistency
    - factuality and correctness
    - commonsense
    - style / formality
    - grammaticality
    - typicality (what type of something, exemplars etc.)
    - redundancy
5. Discuss your findings in the class. How do the evaluations of the texts vary between the different groups in the class?
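
The scoring in step 4 can be tallied with a few lines of Python. This is only a sketch: the helper name `total_score` is hypothetical, and all score values below are made-up placeholders rather than real evaluation results. Replace them with your group's own ratings from 1 to 5.

```python
# Criteria from step 4 of the activity.
CRITERIA = [
    "fluency", "coherence", "factuality", "commonsense",
    "style", "grammaticality", "typicality", "redundancy",
]

def total_score(scores):
    """Sum the 1-5 ratings over all criteria (hypothetical helper)."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    for c in CRITERIA:
        if not 1 <= scores[c] <= 5:
            raise ValueError(f"{c} must be between 1 and 5")
    return sum(scores[c] for c in CRITERIA)

# Placeholder ratings for illustration only; fill in your own.
gpt2_scores = {c: 2 for c in CRITERIA}
gpt35_scores = {c: 4 for c in CRITERIA}

print("GPT-2 total:  ", total_score(gpt2_scores))
print("GPT-3.5 total:", total_score(gpt35_scores))
```

Collecting the totals from every table in one place makes it easier to see how much the groups disagree.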

**Explanation of code:**

- **Tokenizer initialization:** The code initializes a GPT-2 tokenizer (tokenizer) to preprocess text inputs. The tokenizer splits input text into tokens and maps each token to an integer ID that the model can consume.
- **Model initialization:** The GPT-2 model (model) is loaded. This model is a pre-trained neural network that has learned to predict the next token in a sequence given some context.
- **Maximum length:** max_length caps the length of the output sequence, counted in tokens and including the prompt. This prevents the model from generating excessively long responses.
- **Input prompt:** The prompt variable contains the initial snippet of text provided to the model for text generation.
- **Encoding the input:** The encode() method of the tokenizer converts the input prompt into token IDs (input_ids). These token IDs are the numerical representation of the input text.
- **Text generation:** The generate() method of the GPT-2 model generates text based on the input token IDs (input_ids). The do_sample=True parameter makes the model sample each next token from its predicted probability distribution, which adds randomness to the generated text.
- **Decoding the output:** The decode() method of the tokenizer converts the generated token IDs (output_ids) back into text, excluding any special tokens such as padding or separator tokens.
- **Printing the output:** The generated text (output_text) is printed to the console.

This code demonstrates the process of using GPT-2 for text generation based on an initial prompt, providing participants with a hands-on understanding of how the model operates.
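
To make the effect of do_sample=True more concrete, here is a minimal, self-contained sketch of greedy decoding versus sampling from a next-token distribution. The vocabulary, logits, and function names are invented for illustration. This is not how the transformers library implements generate() internally.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(probs):
    """Greedy decoding: always take the most probable token."""
    return max(range(len(probs)), key=lambda i: probs[i])

def sample_pick(probs, rng):
    """Sampling: draw a token index in proportion to its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Toy vocabulary and made-up logits for the next token (assumptions).
vocab = ["is", "equals", "was", "banana"]
logits = [2.0, 1.5, 0.5, -1.0]

probs = softmax(logits)
rng = random.Random(0)  # fixed seed so the run is repeatable

print("greedy :", vocab[greedy_pick(probs)])
print("sampled:", [vocab[sample_pick(probs, rng)] for _ in range(5)])
```

Greedy decoding returns the same token every time, while sampling can return a different token on each call. This is why repeated runs of the GPT-2 cell produce different texts for the same prompt.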

In [4]:
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Load the GPT-2 tokenizer and model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Set the maximum length of the generated text
max_length = 100

# Define the input prompt
prompt = "Calculate square root of 2"

# Encode the input prompt using the tokenizer
input_ids = tokenizer.encode(prompt, return_tensors='pt')

# Generate the text using the GPT-2 model.
# GPT-2 has no padding token, so pass an explicit attention mask (all ones
# for a single unpadded prompt) and reuse the end-of-sequence token for padding.
attention_mask = torch.ones_like(input_ids)
output_ids = model.generate(
    input_ids=input_ids,
    attention_mask=attention_mask,
    max_length=max_length,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode the generated text using the tokenizer
output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Print the generated text
print(output_text)



Calculate square root of 2.05 in order to calculate: [1.05 + 2.8 + 0.9] / 2 = 2.0 * 3.9 * 4.9 * 5.9 = 3.3 = 5.0

The above equation gives the final solution:

Using the current values presented here. Also note that any value greater than 3.9% is considered off by the algorithm and not in the final block chain.

The


## Activity 2: ChatGPT Prompting

Work on this activity in groups (one at each table)

### Example of Few Shot Prompting

1. Try this example using zero-shot prompting first. Your prompt would be:
"Convert the following plain English sentence into formal legal language: If you break the rules, you might get kicked out."
2. Note the response.
3. Now enter the following few-shot prompt and compare the response with the zero-shot one. Did you get a better response?


-------- prompt -----------------

Convert the following plain English sentences into formal legal language.

**Example 1:**  
**Input:** You must return the rental car by 5 PM.  
**Output:** The renter shall return the vehicle no later than 5:00 PM on the agreed-upon date.

**Example 2:**  
**Input:** We can cancel the contract if you don't pay on time.  
**Output:** The agreement may be terminated by the first party in the event of a failure by the second party to render payment in a timely manner.

**Example 3:**  
**Input:** You're not allowed to share this document with anyone.  
**Output:** Disclosure of this document to any third party is strictly prohibited.

**Now your turn:**  
**Input:** If you break the rules, you might get kicked out.  
**Output:**

----------- end of prompt ----------------------

4. Now think of a harder example that zero-shot prompting might have trouble with. Try it and note the results.

Note: For these relatively simple examples it may not make much difference, but you get the idea of how it works.
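
When you try several harder examples, it can be convenient to assemble the few-shot prompt programmatically instead of retyping it. The sketch below is one possible way to do this; the function name `build_few_shot_prompt` is hypothetical, and the layout simply mirrors the prompt used in this activity without the bold markers.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, worked examples, and a final query into one prompt."""
    parts = [instruction, ""]
    for i, (source, target) in enumerate(examples, start=1):
        parts += [f"Example {i}:", f"Input: {source}", f"Output: {target}", ""]
    parts += ["Now your turn:", f"Input: {query}", "Output:"]
    return "\n".join(parts)

# Two of the examples from the prompt in this activity.
examples = [
    ("You must return the rental car by 5 PM.",
     "The renter shall return the vehicle no later than 5:00 PM on the agreed-upon date."),
    ("You're not allowed to share this document with anyone.",
     "Disclosure of this document to any third party is strictly prohibited."),
]

prompt = build_few_shot_prompt(
    "Convert the following plain English sentences into formal legal language.",
    examples,
    "If you break the rules, you might get kicked out.",
)
print(prompt)
```

Passing an empty list of examples gives the zero-shot version of the same prompt, which makes the two conditions easy to compare.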


### Example of Chain-of-Thought Prompting

CoT prompting is used for complex reasoning tasks, for example, tasks with many constraints.

Here is an example of CoT prompting to solve a task with constraints.
1. First, enter just the last task (scroll down) as a zero-shot prompt, starting with "Here is the task for you:". Note the result. Was the task solved correctly?
2. Next, enter the following CoT prompt and again note the result. Did ChatGPT solve your task correctly? Did it follow your example closely?
3. Think of another, possibly harder, task (maybe with 4 constraints) and repeat steps 1 and 2.

------------- prompt starts here ----------------


**Question:**  
Three friends — Alice, Bob, and Carol — are sitting in a row of three chairs.  
- Alice does not want to sit next to Bob.  
- Bob must sit to the left of Carol.  
- Carol is not sitting in the middle.

Where is everyone sitting?

Let's think step by step:

1. First, let's list all the possible permutations of the three people:  
   ABC, ACB, BAC, BCA, CAB, CBA.

2. Now apply the constraint: **Bob must sit to the left of Carol.**  
   
   For Bob to be to the left of Carol, he must appear **before** Carol in the order.

   Let’s check each:
   - **ABC** → B is before C (valid)  
   - **ACB** → B is after C (invalid)
   - **BAC** → B is before C (valid)
   - **BCA** → B is before C (valid)
   - **CAB** → B is after C (invalid)
   - **CBA** → B is after C (invalid) 

   Valid ones: ABC, BAC, BCA

3. Next constraint: **Carol is not in the middle.**

   From our remaining options:
   - **ABC** → Carol is in position 3 (valid)  
   - **BAC** → Carol is in position 3 (valid) 
   - **BCA** → Carol is in position 2 (invalid) 

   BCA is eliminated. Remaining: ABC, BAC

4. Now apply: **Alice does not want to sit next to Bob.**

   - **ABC** → Alice is position 1, Bob is 2 → next to each other (invalid)   
   - **BAC** → Bob is 1, Alice is 2 → also next to each other (invalid) 

Let’s re-express the constraints more carefully:
- Alice ≠ adjacent to Bob  
- Bob < Carol in position  
- Carol ≠ middle seat

Try all permutations again but check **all constraints** one by one:

| Permutation | Bob < Carol? | Carol ≠ Middle? | Alice ≠ Next to Bob? | Valid? |
|-------------|---------------|------------------|------------------------|----------|
| ABC         | Yes           | Yes (Carol is 3) | No (Alice next to Bob) | (invalid) |
| ACB         | No            | No (Carol is 2)  | Yes                    | (invalid) |
| BAC         | Yes           | Yes (Carol is 3) | No                     | (invalid) |
| BCA         | Yes           | No (Carol is 2)  | Yes                    | (invalid) |
| CAB         | No            | Yes (Carol is 1) | No                     | (invalid) |
| CBA         | No            | Yes (Carol is 1) | No                     | (invalid) |

All are invalid!  
So the conclusion is:

There is no valid seating arrangement that satisfies all three constraints.

Now your turn.

Here is the task for you: 

I have to schedule three meetings in one day: meeting A, meeting B and meeting C.
- Meeting A cannot happen before meeting B.
- Meeting C cannot be the first.
- Meeting B cannot be the last.
- Meetings A and B cannot be scheduled next to each other.

Find all possible schedules of these meetings. 


------------ end of prompt ----------------
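
To check ChatGPT's answers independently, puzzles of this size can be solved by brute force. The sketch below enumerates all orderings and keeps those that satisfy every constraint. The helper names (`pos`, `adjacent`, `solve`) are hypothetical, and the constraints are my reading of the two tasks above.

```python
from itertools import permutations

def pos(order, name):
    """1-based position of `name` in the ordering."""
    return order.index(name) + 1

def adjacent(order, a, b):
    """True if `a` and `b` occupy neighbouring positions."""
    return abs(pos(order, a) - pos(order, b)) == 1

def solve(items, constraints):
    """Return every ordering of `items` that satisfies all constraints."""
    return [p for p in permutations(items) if all(c(p) for c in constraints)]

# Seating puzzle: A = Alice, B = Bob, C = Carol.
seating = solve("ABC", [
    lambda o: not adjacent(o, "A", "B"),    # Alice not next to Bob
    lambda o: pos(o, "B") < pos(o, "C"),    # Bob to the left of Carol
    lambda o: pos(o, "C") != 2,             # Carol not in the middle
])

# Meeting task: A, B, C are the three meetings.
meetings = solve("ABC", [
    lambda o: pos(o, "B") < pos(o, "A"),    # A cannot happen before B
    lambda o: pos(o, "C") != 1,             # C cannot be the first
    lambda o: pos(o, "B") != 3,             # B cannot be the last
    lambda o: not adjacent(o, "A", "B"),    # A and B not next to each other
])

print("Seating solutions:", ["".join(p) for p in seating])   # prints []
print("Meeting schedules:", ["".join(p) for p in meetings])  # prints ['BCA']
```

With only six permutations per puzzle the enumeration is instant, so it is a cheap ground truth when you design a harder task for step 3.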


## Activity 3: BART Text Summarisation

The following code summarises text using BART and the transformers pipeline.

1. Review the example code below.
2. In the second code cell, implement summarisation of the given short news articles, both abstractive and extractive.
3. Compare the two types of summarisation using ROUGE, as well as human evaluation as in Activity 1.
4. Answer the following questions:
    
    a. Which type of summarisation generally gives a better ROUGE score?
    
    b. Which type of summarisation generally gives a better human score?
    
Discuss the results in the class. If you find it hard to assess the quality of summarisation on these articles, you can use some articles from your assignment 2, but you need to provide a reference summary.
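
An extractive summariser selects sentences verbatim from the source, whereas an abstractive one generates new text. Note that the log output in this notebook shows that pipeline('summarization') defaults to sshleifer/distilbart-cnn-12-6, a distilled BART model that generates text. For a strictly extractive baseline, a minimal sketch based on word-frequency sentence scoring could look like the following. The function names are hypothetical, the sentence splitter is deliberately naive, and no stop-word filtering is applied, so treat it as a starting point only.

```python
import re
from collections import Counter

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(text):
    """Lowercase word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def extractive_summary(text, num_sentences=1):
    """Pick the highest-scoring sentences verbatim, keeping their original order."""
    sentences = split_sentences(text)
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        # Average document frequency of the words in the sentence.
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)

text = ("China is the most populous country in the world, with over 1.4 billion people. "
        "It is also the second-largest economy in the world, after the United States. "
        "The capital of China is Beijing, and the official language is Mandarin.")

print(extractive_summary(text, num_sentences=1))
```

Because every output sentence is copied from the input, this baseline cannot introduce facts that are absent from the article, which is a useful contrast when you score factuality.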

In [5]:
# Text summarisation example

!pip install rouge

import torch
from transformers import BartTokenizer, BartForConditionalGeneration
from transformers import pipeline
from rouge import Rouge

# Load the BART tokenizer and model for abstractive summarization
tokenizer_abstractive = BartTokenizer.from_pretrained('facebook/bart-large-cnn')
model_abstractive = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')

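# Load the summarization pipeline used for the "extractive" comparison.
# Caution: with no model specified, this pipeline defaults to
# sshleifer/distilbart-cnn-12-6 (see the log output), a distilled BART model
# that generates text, so it is not strictly extractive.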
pipeline_extractive = pipeline('summarization')

# Define the input text
input_text = "The quick brown fox jumps over the lazy dog. This is a test sentence for summarization. Here is another sentence for testing."

# Define the target summary
target_summary = "The quick brown fox jumps over the lazy dog. This is a test sentence for summarization."

# Perform abstractive summarization using BART
inputs = tokenizer_abstractive([input_text], max_length=1024, truncation=True, padding='max_length', return_tensors='pt')
outputs = model_abstractive.generate(inputs['input_ids'], attention_mask=inputs['attention_mask'], max_length=60, num_beams=4, length_penalty=2.0)
summary_abstractive = tokenizer_abstractive.decode(outputs[0], skip_special_tokens=True)

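# Run the second summarizer via the pipeline. It is labelled "extractive" here,
# although the default pipeline model generates text rather than selecting sentences.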
summary_extractive = pipeline_extractive(input_text, max_length=60)[0]['summary_text']

# Evaluate the summaries using the ROUGE metric
rouge = Rouge()
scores_abstractive = rouge.get_scores(summary_abstractive, target_summary)
scores_extractive = rouge.get_scores(summary_extractive, target_summary)

# Print the summaries and ROUGE scores
print("Input Text: ", input_text)
print("Target Summary: ", target_summary)
print("Abstractive Summary: ", summary_abstractive)
print("ROUGE Scores for Abstractive Summary: ", scores_abstractive)
print("Extractive Summary: ", summary_extractive)
print("ROUGE Scores for Extractive Summary: ", scores_extractive)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)




No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0
Your max_length is set to 60, but your input_length is only 28. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=14)


Input Text:  The quick brown fox jumps over the lazy dog. This is a test sentence for summarization. Here is another sentence for testing.
Target Summary:  The quick brown fox jumps over the lazy dog. This is a test sentence for summarization.
Abstractive Summary:  The quick brown fox jumps over the lazy dog. This is a test sentence for summarization. Here is another sentence for testing. The quick brownfox jumps over a lazy dog to get to the other side of the road. The lazy dog is the one who falls over the quick
ROUGE Scores for Abstractive Summary:  [{'rouge-1': {'r': 1.0, 'p': 0.5517241379310345, 'f': 0.7111111065283952}, 'rouge-2': {'r': 1.0, 'p': 0.3488372093023256, 'f': 0.5172413754756243}, 'rouge-l': {'r': 1.0, 'p': 0.5517241379310345, 'f': 0.7111111065283952}}]
Extractive Summary:   The quick brown fox jumps over the lazy dog. This is a test sentence for summarization. Here is another sentence for testing. Here are another sentences for summarizing. The test sentence is a shor
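
The 'r', 'p' and 'f' entries in the ROUGE output stand for recall, precision and F-score. As a rough illustration of how ROUGE-1 is derived from unigram overlap, here is a simplified sketch. The function name `rouge_1` is hypothetical, and the rouge package may tokenise and count differently, so its numbers need not match this sketch exactly.

```python
from collections import Counter

def rouge_1(candidate, reference):
    """Simplified ROUGE-1: unigram overlap between candidate and reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: a word counts at most as often as it occurs in both texts.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {"r": recall, "p": precision, "f": f_score}

reference = "the cat is on the mat"
candidate = "the cat sat on the mat today"

scores = rouge_1(candidate, reference)
print(scores)  # recall = 5/6, precision = 5/7
```

Recall measures how much of the reference is covered and precision measures how much of the candidate is relevant. A summary that is much longer than the reference can therefore reach a recall near 1 while its precision stays low, which is the pattern visible in the scores printed in this notebook.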

In [6]:
# Text summarisation example

# !pip install rouge
import pandas as pd
import torch
from transformers import BartTokenizer, BartForConditionalGeneration
from transformers import pipeline
from rouge import Rouge

# Load the CNN/DailyMail dataset
df = pd.read_csv('./daily_cnn.csv')

# Load the BART tokenizer and model for abstractive summarization
tokenizer_abstractive = BartTokenizer.from_pretrained('facebook/bart-large-cnn')
model_abstractive = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')

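# Load the summarization pipeline used for the "extractive" comparison.
# Caution: with no model specified, this pipeline defaults to
# sshleifer/distilbart-cnn-12-6 (see the log output), a distilled BART model
# that generates text, so it is not strictly extractive.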
pipeline_extractive = pipeline('summarization')

for index, row in df.iterrows():
    # Extract the article and the reference summary
    input_text = row['article']
    target_summary = row['highlights']

    # Perform abstractive summarization using BART
    inputs = tokenizer_abstractive([input_text], max_length=1024, truncation=True, padding='max_length', return_tensors='pt')
    outputs = model_abstractive.generate(inputs['input_ids'], attention_mask=inputs['attention_mask'], max_length=60, num_beams=4, length_penalty=2.0)
    summary_abstractive = tokenizer_abstractive.decode(outputs[0], skip_special_tokens=True)

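    # Run the second summarizer via the pipeline. It is labelled "extractive" here,
    # although the default pipeline model generates text rather than selecting sentences.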
    summary_extractive = pipeline_extractive(input_text, max_length=60)[0]['summary_text']

    # Evaluate the summaries using the ROUGE metric
    rouge = Rouge()
    scores_abstractive = rouge.get_scores(summary_abstractive, target_summary)
    scores_extractive = rouge.get_scores(summary_extractive, target_summary)

    # Print the summaries and ROUGE scores
    print("Input Text: ", input_text)
    print("Target Summary: ", target_summary)
    print("Abstractive Summary: ", summary_abstractive)
    print("ROUGE Scores for Abstractive Summary: ", scores_abstractive)
    print("Extractive Summary: ", summary_extractive)
    print("ROUGE Scores for Extractive Summary: ", scores_extractive)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0
Your max_length is set to 60, but your input_length is only 53. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=26)


Input Text:  The United States is the third most populous country in the world, with over 330 million people. It is a diverse country, with a wide range of ethnic and cultural backgrounds. The most populous state in the US is California, with over 39 million people.
Target Summary:  The United States is the third most populous country in the world.
Abstractive Summary:  The United States is the third most populous country in the world, with over 330 million people. It is a diverse country, with a wide range of ethnic and cultural backgrounds. The most populous state in the US is California, withOver 39 million people in the state.
ROUGE Scores for Abstractive Summary:  [{'rouge-1': {'r': 0.9090909090909091, 'p': 0.3125, 'f': 0.4651162752623039}, 'rouge-2': {'r': 0.9090909090909091, 'p': 0.23809523809523808, 'f': 0.3773584872766109}, 'rouge-l': {'r': 0.9090909090909091, 'p': 0.3125, 'f': 0.4651162752623039}}]
Extractive Summary:   The United States is the third most populous country in 

Your max_length is set to 60, but your input_length is only 56. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=28)


Input Text:  The European Union is a political and economic union of 27 member states located primarily in Europe. It was formed in the aftermath of World War II with the aim of promoting peace and economic prosperity among its members. The euro is the official currency of 19 of the member states.
Target Summary:  The European Union is a political and economic union of 27 member states.
Abstractive Summary:  The European Union is a political and economic union of 27 member states located primarily in Europe. It was formed in the aftermath of World War II with the aim of promoting peace and economic prosperity among its members. The euro is the official currency of 19 of the member states.
ROUGE Scores for Abstractive Summary:  [{'rouge-1': {'r': 1.0, 'p': 0.35135135135135137, 'f': 0.5199999961520001}, 'rouge-2': {'r': 1.0, 'p': 0.25, 'f': 0.39999999680000003}, 'rouge-l': {'r': 1.0, 'p': 0.35135135135135137, 'f': 0.5199999961520001}}]
Extractive Summary:   The European Union is a politi

Your max_length is set to 60, but your input_length is only 51. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=25)


Input Text:  China is the most populous country in the world, with over 1.4 billion people. It is also the second-largest economy in the world, after the United States. The capital of China is Beijing, and the official language is Mandarin.
Target Summary:  China is the most populous country in the world.
Abstractive Summary:  China is the most populous country in the world, with over 1.4 billion people. The capital of China is Beijing, and the official language is Mandarin. It is also the second-largest economy in the World, after the United States. The country's economy is the second largest
ROUGE Scores for Abstractive Summary:  [{'rouge-1': {'r': 0.875, 'p': 0.21212121212121213, 'f': 0.3414634114931589}, 'rouge-2': {'r': 0.875, 'p': 0.16279069767441862, 'f': 0.27450980127643215}, 'rouge-l': {'r': 0.875, 'p': 0.21212121212121213, 'f': 0.3414634114931589}}]
Extractive Summary:   China is the most populous country in the world, with over 1.4 billion people . The capital of China is Be

Your max_length is set to 60, but your input_length is only 58. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=29)


Input Text:  The human brain is the most complex organ in the body, consisting of over 100 billion neurons and trillions of connections between them. It controls everything we do, from our thoughts and emotions to our movements and senses. The brain is divided into different regions that are responsible for different functions.
Target Summary:  The human brain is the most complex organ in the body.
Abstractive Summary:  The human brain is the most complex organ in the body, consisting of over 100 billion neurons and trillions of connections between them. It controls everything we do, from our thoughts and emotions to our movements and senses. The brain is divided into different regions that are responsible for different functions.
ROUGE Scores for Abstractive Summary:  [{'rouge-1': {'r': 0.9, 'p': 0.21428571428571427, 'f': 0.34615384304733726}, 'rouge-2': {'r': 0.9, 'p': 0.1836734693877551, 'f': 0.30508474294742893}, 'rouge-l': {'r': 0.9, 'p': 0.21428571428571427, 'f': 0.34615384304733

Your max_length is set to 60, but your input_length is only 57. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=28)


Input Text:  The International Space Station (ISS) is a habitable artificial satellite that orbits the Earth. It is a joint project between five space agencies, including NASA, Roscosmos, and the European Space Agency. The ISS is used for scientific research, space exploration, and international cooperation.
Target Summary:  The International Space Station (ISS) is a habitable artificial satellite that orbits the Earth.
Abstractive Summary:  The International Space Station (ISS) is a habitable artificial satellite that orbits the Earth. It is a joint project between five space agencies, including NASA, Roscosmos, and the European Space Agency. The ISS is used for scientific research, space exploration, and international cooperation.
ROUGE Scores for Abstractive Summary:  [{'rouge-1': {'r': 1.0, 'p': 0.4, 'f': 0.5714285673469389}, 'rouge-2': {'r': 1.0, 'p': 0.3170731707317073, 'f': 0.48148147782578876}, 'rouge-l': {'r': 1.0, 'p': 0.4, 'f': 0.5714285673469389}}]
Extractive Summary:   The

Your max_length is set to 60, but your input_length is only 54. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=27)


Input Text:  The Eiffel Tower is an iconic landmark in Paris, France, and one of the most recognizable structures in the world. It was built for the 1889 World's Fair and stands over 300 meters tall. The tower is visited by millions of tourists every year.
Target Summary:  The Eiffel Tower is an iconic landmark in Paris, France.
Abstractive Summary:  The Eiffel Tower is an iconic landmark in Paris, France. It was built for the 1889 World's Fair and stands over 300 meters tall. The tower is visited by millions of tourists every year. It is one of the most recognizable structures in the world, and is a UNESCO
ROUGE Scores for Abstractive Summary:  [{'rouge-1': {'r': 1.0, 'p': 0.2564102564102564, 'f': 0.40816326205747605}, 'rouge-2': {'r': 1.0, 'p': 0.1875, 'f': 0.31578947102493077}, 'rouge-l': {'r': 1.0, 'p': 0.2564102564102564, 'f': 0.40816326205747605}}]
Extractive Summary:   The Eiffel Tower is one of the most recognizable structures in the world . It was built for the 1889 World's Fa