## **BLEU:** evaluates the quality of machine-generated translations by comparing them to one or more human-generated reference translations.

This notebook focuses on the BLEU metric (see the readme.md file for more details).
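At the core of BLEU is a clipped (modified) n-gram precision between a candidate and its reference. The sketch below works through that quantity by hand on toy sentences that are not taken from the dataset; the helper names `ngrams` and `modified_precision` are illustrative and are not the NLTK functions used later in this notebook.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) found in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(reference, hypothesis, n):
    """Clipped n-gram precision of one hypothesis against one reference."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    ref_counts = Counter(ngrams(reference, n))
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    total = sum(hyp_counts.values())
    return clipped / total if total else 0.0


# Toy sentences (illustrative only).
reference = "the cat is on the mat".split()
hypothesis = "the the the cat".split()

p1 = modified_precision(reference, hypothesis, 1)  # 3 of 4 unigrams are credited
p2 = modified_precision(reference, hypothesis, 2)  # 1 of 3 bigrams is credited
print(p1, p2)
```

Clipping is what stops a candidate from scoring well simply by repeating a frequent word such as "the".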

### **Steps to run this Notebook:**

- **Step 1:** Import the libraries & load the data
- **Step 2:** Prompt the text-generating LLM using the prompt given below
- **Step 3:** Add the summaries to the pandas DataFrame, compute the score & export the results
- **Step 4:** Combine everything into one function

### **Step 1:** Import the libraries & load the data

In [2]:
# Import libraries
import nltk
import pandas as pd
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

In [3]:
# Load the dataset
xsum_sample = pd.read_csv("./content/dataset_sample_summaries_v2.csv")

In [4]:
print(xsum_sample.shape)
xsum_sample

(10, 3)


Unnamed: 0,document,summary,id
0,"Prison Link Cymru had 1,099 referrals in 2015-...","There is a ""chronic"" need for more housing for...",38264402
1,Officers searched properties in the Waterfront...,"A man has appeared in court after firearms, am...",34227252
2,"Jordan Hill, Brittany Covington and Tesfaye Co...",Four people accused of kidnapping and torturin...,38537698
3,The 48-year-old former Arsenal goalkeeper play...,West Brom have appointed Nicky Hammond as tech...,36175342
4,Restoring the function of the organ - which he...,The pancreas can be triggered to regenerate it...,39070183
5,But there certainly should be.\nThese are two ...,Since their impending merger was announced in ...,38899892
6,Media playback is not supported on this device...,"A ""medal at any cost"" approach created a ""cult...",39339718
7,It's no joke. But Kareem Badr says people did ...,Have you heard the one about the computer prog...,34571446
8,Relieved that the giant telecoms company would...,The reaction from BT's investors told us much ...,36892983
9,"""I'm really looking forward to it - the home o...",Manager Brendan Rodgers is sure Celtic can exp...,37732028


In [5]:
document_array = xsum_sample['document']
print(document_array)

0    Prison Link Cymru had 1,099 referrals in 2015-...
1    Officers searched properties in the Waterfront...
2    Jordan Hill, Brittany Covington and Tesfaye Co...
3    The 48-year-old former Arsenal goalkeeper play...
4    Restoring the function of the organ - which he...
5    But there certainly should be.\nThese are two ...
6    Media playback is not supported on this device...
7    It's no joke. But Kareem Badr says people did ...
8    Relieved that the giant telecoms company would...
9    "I'm really looking forward to it - the home o...
Name: document, dtype: object


### **Step 2:** Prompt the text-generating LLM using the prompt given below


**Query the text-generating LLM with the following prompt:** (paste the documents in place of PASTE_DOCUMENTS_HERE)

```
Please generate a summary in one line (max 25 words) for each of the following documents: PASTE_DOCUMENTS_HERE, please just return the answer as the following: results={"generated_summary":["","","","",""]}
```
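Instead of pasting the documents by hand, the prompt can be assembled in code. The following is a minimal sketch under stated assumptions: `build_prompt` is a hypothetical helper, the documents are serialized as a JSON list, and the wording mirrors the prompt above.

```python
import json


def build_prompt(documents, max_words=25):
    """Fill the prompt template with the documents serialized as a JSON list."""
    # One empty placeholder per document in the requested answer format.
    placeholders = ",".join('""' for _ in documents)
    return (
        f"Please generate a summary in one line (max {max_words} words) "
        f"for each of the following documents: {json.dumps(list(documents))}, "
        "please just return the answer as the following: "
        f'results={{"generated_summary":[{placeholders}]}}'
    )


# Stand-in documents for illustration; in this notebook the input would be
# a slice of document_array instead.
prompt = build_prompt(["First article text.", "Second article text."])
print(prompt)
```

Generating the placeholders from the number of documents keeps the requested answer format in step with the input size.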

In [6]:
# Example output from ChatGPT; the same step has to be repeated for every generative model being tested
results={"generated_summary":["Prison Link Cymru handled 1,099 referrals in 2015-16, highlighting a critical need for housing ex-offenders to prevent homelessness and reduce incarceration costs.","Edinburgh police recovered firearms, ammunition, and money from two properties, arresting a 26-year-old man who appeared in court.","Four individuals charged with hate crimes in Chicago tortured a disabled victim, prompting significant public and legal reaction.","The former Arsenal goalkeeper and director of football helped West Brom achieve Premier League promotion in 2006 and 2012.","A study shows that a fasting-mimicking diet can regenerate pancreatic cells and potentially reverse diabetes symptoms in mice."]}
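When the reply comes back as plain text rather than as a pasted cell, it can be parsed and checked before scoring. This is a sketch only: `parse_llm_reply` is a hypothetical helper, and it assumes the model answered exactly in the `results={...}` format requested by the prompt.

```python
import ast


def parse_llm_reply(reply, expected_count):
    """Turn a reply of the form results={...} into a dict and check its length."""
    # Keep only the part after the first "=", which should be a dict literal.
    _, _, literal = reply.partition("=")
    results = ast.literal_eval(literal.strip())
    summaries = results["generated_summary"]
    if len(summaries) != expected_count:
        raise ValueError(f"expected {expected_count} summaries, got {len(summaries)}")
    return results


# Hypothetical reply text, shaped like the format requested in the prompt.
reply = 'results={"generated_summary":["Summary one.","Summary two."]}'
parsed = parse_llm_reply(reply, expected_count=2)
print(parsed["generated_summary"])
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for text returned by a model.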

### **Step 3:** Add the summaries to the pandas DataFrame, compute the score & export the results


In [7]:
# Index-aligned left join: keeps only the rows that have a generated summary (the first 5 here)
opt_result = pd.DataFrame(results).join(xsum_sample)[["generated_summary", "summary", "document"]]
display(opt_result.head())

Unnamed: 0,generated_summary,summary,document
0,"Prison Link Cymru handled 1,099 referrals in 2...","There is a ""chronic"" need for more housing for...","Prison Link Cymru had 1,099 referrals in 2015-..."
1,"Edinburgh police recovered firearms, ammunitio...","A man has appeared in court after firearms, am...",Officers searched properties in the Waterfront...
2,Four individuals charged with hate crimes in C...,Four people accused of kidnapping and torturin...,"Jordan Hill, Brittany Covington and Tesfaye Co..."
3,The former Arsenal goalkeeper and director of ...,West Brom have appointed Nicky Hammond as tech...,The 48-year-old former Arsenal goalkeeper play...
4,A study shows that a fasting-mimicking diet ca...,The pancreas can be triggered to regenerate it...,Restoring the function of the organ - which he...


In [8]:
print("Generated Summary : ",opt_result.iloc[0]["generated_summary"])
print(30*"-")
print("Summary : ",opt_result.iloc[0]["summary"])

Generated Summary :  Prison Link Cymru handled 1,099 referrals in 2015-16, highlighting a critical need for housing ex-offenders to prevent homelessness and reduce incarceration costs.
------------------------------
Summary :  There is a "chronic" need for more housing for prison leavers in Wales, according to a charity.
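Besides n-gram overlap, BLEU applies a brevity penalty to candidates that are shorter than their reference. The sketch below computes that penalty on whitespace tokens for the pair printed above; `brevity_penalty` is an illustrative helper written for this example, not the NLTK function, and whitespace tokenization is an assumption.

```python
import math


def brevity_penalty(reference_len, hypothesis_len):
    """Return 1.0 when the hypothesis is longer than the reference, else exp(1 - r/c)."""
    if hypothesis_len == 0:
        return 0.0
    if hypothesis_len > reference_len:
        return 1.0
    return math.exp(1 - reference_len / hypothesis_len)


generated = (
    "Prison Link Cymru handled 1,099 referrals in 2015-16, highlighting a critical "
    "need for housing ex-offenders to prevent homelessness and reduce incarceration costs."
)
reference = (
    'There is a "chronic" need for more housing for prison leavers in Wales, '
    "according to a charity."
)

# The generated summary is longer than the reference, so it is not penalized.
bp_pair = brevity_penalty(len(reference.split()), len(generated.split()))
# A hypothetical 10-token candidate against the same 17-token reference is penalized.
bp_short = brevity_penalty(17, 10)
print(bp_pair, bp_short)
```

Because the penalty only applies to short candidates, an overly long summary is held back by its lower n-gram precision rather than by this term.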


#### **Calculating the BLEU score:**

In [9]:

def calculate_bleu(data):
    # Earlier attempt, kept for reference: Hugging Face load_metric("bleu") on
    # whitespace-tokenized summaries.
    # references = data["summary"].apply(lambda ref: [ref.split()]).tolist()
    # hypotheses = data["generated_summary"].apply(lambda hyp: hyp.split()).tolist()
    # bleu = load_metric("bleu")
    # bleu_score = bleu.compute(predictions=hypotheses, references=references)

    # NLTK's smoothed BLEU is used instead: it is less severe and avoids giving
    # the model a score of 0 for small mistakes.
    # Note: the summaries are passed as raw strings, so NLTK iterates over them
    # character by character rather than word by word.
    references = data["summary"].tolist()
    hypotheses = data["generated_summary"].tolist()
    # Calculate the BLEU score with smoothing
    smoothie = SmoothingFunction().method7
    bleu_score = corpus_bleu(references, hypotheses, smoothing_function=smoothie)

    return bleu_score
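NLTK's `corpus_bleu` is documented as taking a list of reference lists (each reference being a list of tokens) and a list of tokenized hypotheses. The following is a minimal sketch of shaping raw summary strings that way, assuming plain whitespace tokenization and a single reference per summary; `to_bleu_inputs` is a hypothetical helper name. A score computed on word tokens would differ from one computed on raw strings.

```python
def to_bleu_inputs(references, hypotheses):
    """Shape raw strings into the nested token lists that corpus_bleu expects."""
    # One list of references per hypothesis; here each hypothesis has a single reference.
    list_of_references = [[ref.split()] for ref in references]
    tokenized_hypotheses = [hyp.split() for hyp in hypotheses]
    return list_of_references, tokenized_hypotheses


# Toy strings for illustration; in this notebook the inputs would be
# data["summary"].tolist() and data["generated_summary"].tolist().
refs, hyps = to_bleu_inputs(
    ["a man has appeared in court"],
    ["a man appeared in court"],
)
print(refs)
print(hyps)
# The shaped inputs could then be scored with, for example:
# corpus_bleu(refs, hyps, smoothing_function=SmoothingFunction().method7)
```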

In [16]:
score_ret=calculate_bleu(opt_result)

In [21]:
score_ret

0.08600134727689304

In [22]:
model_name = "chat_gpt"

In [23]:
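import os
df = pd.DataFrame(columns=["model_name", "bleu_score"])
df.loc[0] = [model_name, score_ret]
os.makedirs("./results", exist_ok=True)  # the next cell reads the file back from ./results/
df.to_csv(f"./results/{model_name}.csv", index=False)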

In [24]:
df = pd.read_csv(f"./results/{model_name}.csv")
print(df)


  model_name  bleu_score
0   chat_gpt    0.086001
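Since the same evaluation is meant to be repeated for every model under test, the per-model CSV files can be gathered into a single ranking. The sketch below uses only the standard library and made-up scores in a temporary folder; `collect_scores` is a hypothetical helper, and it assumes each file has the `model_name` and `bleu_score` columns written above.

```python
import csv
import tempfile
from pathlib import Path


def collect_scores(results_dir):
    """Read every per-model CSV in results_dir and return rows sorted by BLEU score."""
    rows = []
    for path in sorted(Path(results_dir).glob("*.csv")):
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                rows.append((row["model_name"], float(row["bleu_score"])))
    # Highest score first.
    return sorted(rows, key=lambda item: item[1], reverse=True)


# Demo with made-up scores in a temporary directory (not real results).
with tempfile.TemporaryDirectory() as tmp:
    for name, score in [("model_a", 0.10), ("model_b", 0.25)]:
        Path(tmp, f"{name}.csv").write_text(f"model_name,bleu_score\n{name},{score}\n")
    leaderboard = collect_scores(tmp)

print(leaderboard)
```

With pandas, reading each file with `pd.read_csv` and combining them with `pd.concat` would give an equivalent table.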


### **Step 4:** Combine everything into one function

In [25]:
import os

import nltk
import pandas as pd
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

def calculate_and_export_bleu(model_name, results):
    # Extract reference summaries and generated summaries
    references = results["summary"].tolist()
    hypotheses = results["generated_summary"].tolist()

    # Calculate BLEU score with smoothing
    smoothie = SmoothingFunction().method7
    bleu_score = corpus_bleu(references, hypotheses, smoothing_function=smoothie)

    # Create DataFrame with BLEU score and model name
    df = pd.DataFrame({
        "model_name": [model_name],
        "bleu_score": [bleu_score]
    })

    # Export to CSV in the ./results/ folder, creating it if needed
    os.makedirs("./results", exist_ok=True)
    df.to_csv(f"./results/{model_name}.csv", index=False)

In [27]:
xsum_sample = pd.read_csv("./content/dataset_sample_summaries_v2.csv")  # drop the leading "." when running on Colab
model_name = "chat_gpt"
# Display the documents to paste into the prompt:
xsum_sample[['document']]
# Click on the icon next to *document* ("Convert this dataframe to an interactive table"),
# then select "Copy table" (on the right), choose JSON and copy.
# Paste the result into the cell below in place of PASTE_DOCUMENTS_HERE,
# then copy the entire cell and use it to prompt the LLM.

Unnamed: 0,document
0,"Prison Link Cymru had 1,099 referrals in 2015-..."
1,Officers searched properties in the Waterfront...
2,"Jordan Hill, Brittany Covington and Tesfaye Co..."
3,The 48-year-old former Arsenal goalkeeper play...
4,Restoring the function of the organ - which he...
5,But there certainly should be.\nThese are two ...
6,Media playback is not supported on this device...
7,It's no joke. But Kareem Badr says people did ...
8,Relieved that the giant telecoms company would...
9,"""I'm really looking forward to it - the home o..."


In [None]:
# Please generate a summary in one line (max 25 words) for each of the following documents: PASTE_DOCUMENTS_HERE, please just return the answer as the following: results={"generated_summary":["","","","",""]}

In [28]:
# Example usage:
results={"generated_summary":[
    "Prison Link Cymru handled 1,099 referrals in 2015-16, highlighting a critical need for housing ex-offenders to prevent homelessness and reduce incarceration costs.","Edinburgh police recovered firearms, ammunition, and money from two properties, arresting a 26-year-old man who appeared in court.","Four individuals charged with hate crimes in Chicago tortured a disabled victim, prompting significant public and legal reaction.","The former Arsenal goalkeeper and director of football helped West Brom achieve Premier League promotion in 2006 and 2012.","A study shows that a fasting-mimicking diet can regenerate pancreatic cells and potentially reverse diabetes symptoms in mice."
    ]}

In [29]:
# Index-aligned left join: keeps only the rows that have a generated summary
opt_result = pd.DataFrame(results).join(xsum_sample)[["generated_summary", "summary", "document"]]
calculate_and_export_bleu(model_name, opt_result)

In [30]:
df = pd.read_csv(f"./results/{model_name}.csv")
df

Unnamed: 0,model_name,bleu_score
0,chat_gpt,0.086001
