# Introduction

The goal of this project is to evaluate how well different models generate summaries.  
A good summary should keep the most important information from the text while staying short and easy to read.  

To check this, we compare the summaries from several models (like **BART**, **LED**, **T5**, **Pegasus**, **GPT-2**, and **Phi-3**) with:  
1. **A human-made summary** – to see how close the models are to what a person would write.  
2. **The original text** – to see how much important information the models actually keep.  

We use two types of metrics:  
- **BERTScore** → measures semantic similarity by comparing contextual token embeddings of the two texts.  
- **ROUGE (1, 2, L)** → measures word overlap: shared unigrams (ROUGE-1), shared bigrams (ROUGE-2), and the longest common subsequence (ROUGE-L).  
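
As a rough illustration of what the overlap metrics count, here is a simplified ROUGE-N sketch. It assumes lowercase whitespace tokenization and no stemming, so its numbers will not match the rouge_score package used later in this notebook. The names `ngrams` and `rouge_n` are illustrative helpers, not part of any library.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, candidate, n=1):
    """Simplified ROUGE-N: (precision, recall, F-measure) from clipped n-gram overlap."""
    # Assumption: lowercase whitespace tokenization, no stemming.
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    # Counter intersection keeps each n-gram min(ref count, cand count) times.
    overlap = sum((ref & cand).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
p1, r1, f1 = rouge_n(reference, candidate, n=1)
p2, r2, f2 = rouge_n(reference, candidate, n=2)
print(f"ROUGE-1 F: {f1:.3f}, ROUGE-2 F: {f2:.3f}")
```

Changing a single word removes one shared unigram but two shared bigrams, which is why ROUGE-2 is usually the stricter of the two.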

By comparing both perspectives, we can figure out:  
- Which models create the most human-like summaries.  
- Which models best capture the key ideas from the original text.  

This helps us better understand the strengths and weaknesses of each summarization model.


### Import libraries and display settings
Import the necessary libraries and adjust pandas display options.

In [34]:
import pandas as pd
from datasets import load_dataset, DatasetDict, Dataset

#from evaluate import load
from rouge_score import rouge_scorer

from bert_score import BERTScorer

pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)

### Import data
Import the required data for grading.
There are two versions of df_predict: one for comparing the untrained (not fine-tuned) models, and one for comparing the trained (fine-tuned) models, which were selected as the best based on the results of the untrained models.\
I chose the three best-performing models according to BERTScore: phi-3, bart-large-cnn, and flan-t5-base (phi-3 was not fine-tuned because of a lack of training resources).

In [35]:
df_test = pd.read_json('data/lotm_clean_dataset.json')

df_predict = pd.read_json('data/model_comparison.json') #untrained models

#df_predict = pd.read_json('data/Trained_model_comparison.json') #trained models

### Prepare DataFrames for scoring summaries

Create a DataFrame (df_benchmark) to store evaluation scores (ROUGE and BERTScore) between generated summaries and the original texts.

In [36]:
### df for comparing summaries with text
df_benchmark = pd.DataFrame()

df_benchmark['num_chp'] = df_predict['num_chp']
df_benchmark['model'] = df_predict['model']
df_benchmark['predicted_summary'] = df_predict['summary']

for index, row in df_test[df_test['num_chp'].isin(df_predict['num_chp'])].iterrows():
    df_benchmark.loc[len(df_benchmark)] = [row['num_chp'],"handmade_summary",row['summary']]

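# Note: iloc selects by row position, so this assumes the row at position k of df_test holds chapter num_chp == k
df_benchmark['original_text'] = df_benchmark.apply(lambda row: df_test.iloc[row.num_chp]["text"], axis=1)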
print(df_benchmark)

    num_chp             model                                  predicted_summary                                      original_text
0         2       gpt2-medium   to find out if he could fire the revolver aga...  After confirming his plan, Zhou Mingrui immedi...
1      1379       gpt2-medium   it was still not enough.\n"You think that I w...  More than a thousand Amons each committed "The...
2      1347       gpt2-medium  \n"I'll do everything in my power to help you,...  Blue Mountain Island, within a primitive fores...
3      1238       gpt2-medium   still know the reason."\nCattleya nodded and ...  Upon hearing Cattleya's words, Queen Mystic Be...
4       784       gpt2-medium   to see if she could see anything.\nShe saw a ...  100 Böklund Street, in a corner of the garden ...
..      ...               ...                                                ...                                                ...
67     1117  handmade_summary  As the flames flicker, Derrick and company en
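
The lookup in the cell above uses `df_test.iloc[...]`, which selects by row position. That only returns the right chapter if the rows of df_test are ordered so that position equals `num_chp`. Below is a minimal sketch of a lookup keyed on the chapter number itself, using toy data. The names `toy_test`, `toy_predict`, and `text_by_chapter` are hypothetical, and the sketch assumes `num_chp` is unique in the test set.

```python
import pandas as pd

# Toy stand-ins for df_test and df_predict (hypothetical data, not the real dataset).
toy_test = pd.DataFrame({
    "num_chp": [1, 2, 3],
    "text": ["text of chapter 1", "text of chapter 2", "text of chapter 3"],
})
toy_predict = pd.DataFrame({"num_chp": [2, 3], "model": ["m", "m"]})

# Keyed lookup: build a Series indexed by chapter number, then map onto it.
text_by_chapter = toy_test.set_index("num_chp")["text"]
toy_predict["original_text"] = toy_predict["num_chp"].map(text_by_chapter)

# Positional lookup for comparison: iloc[2] is the third row, i.e. chapter 3 here.
positional = toy_test.iloc[2]["text"]

print(toy_predict)
print(positional)
```

With chapters numbered from 1, the positional lookup for chapter 2 returns the text of chapter 3, while the keyed lookup returns chapter 2.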

In [37]:
original_text = df_benchmark["original_text"].tolist()
generated_summary = df_benchmark["predicted_summary"].tolist()

Create a DataFrame (df_grading) to store evaluation scores (ROUGE and BERTScore) between generated summaries and the handmade summaries.

In [38]:
### df for comparing generated summaries with handmade summary
df_grading = pd.DataFrame()

df_grading['num_chp'] = df_predict['num_chp']
df_grading['model'] = df_predict['model']
df_grading['predicted_summary'] = df_predict['summary']
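# Note: iloc selects by row position, so this assumes the row at position k of df_test holds chapter num_chp == k
df_grading['expected_summary'] = df_grading.apply(lambda row: df_test.iloc[row.num_chp]["summary"], axis=1)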

In [39]:
predicted_summary = df_grading["predicted_summary"].tolist()
expected_summary = df_grading["expected_summary"].tolist()

### Compute scores on both DataFrames

Compute BERTScore on both DataFrames.

In [40]:
scorer = BERTScorer(lang="en", rescale_with_baseline=False)

P, R, F1 = scorer.score(predicted_summary, expected_summary)

# Add results to DataFrame
df_grading["bertscore_P"] = P.tolist()
df_grading["bertscore_R"] = R.tolist()
df_grading["bertscore_F1"] = F1.tolist()

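# Note: BERTScorer.score expects (candidates, references). The original text is passed first here,
# so P_b and R_b are swapped relative to that convention; F1_b is unaffected.
P_b, R_b, F1_b = scorer.score(original_text, generated_summary)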

# Add results to DataFrame
df_benchmark["bertscore_P"] = P_b.tolist()
df_benchmark["bertscore_R"] = R_b.tolist()
df_benchmark["bertscore_F1"] = F1_b.tolist()

Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
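
To make the P, R, and F1 columns more concrete, here is a minimal sketch of the greedy matching idea behind BERTScore. It uses hypothetical 2-d vectors in place of real contextual embeddings and ignores optional features such as IDF weighting and baseline rescaling. The names `cosine` and `greedy_match_scores` are illustrative, not part of the bert_score package.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def greedy_match_scores(cand_vecs, ref_vecs):
    """Precision, recall, F1 from greedy cosine matching between token vectors."""
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(max(cosine(c, r) for r in ref_vecs) for c in cand_vecs) / len(cand_vecs)
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(max(cosine(r, c) for c in cand_vecs) for r in ref_vecs) / len(ref_vecs)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical 2-d "embeddings"; a real run uses contextual vectors from a language model.
candidate = [[1.0, 0.0], [0.0, 1.0]]
reference = [[1.0, 0.0]]

p, r, f1 = greedy_match_scores(candidate, reference)
p_swapped, r_swapped, f1_swapped = greedy_match_scores(reference, candidate)
print(p, r, f1)
print(p_swapped, r_swapped, f1_swapped)
```

Swapping the two arguments exchanges precision and recall but leaves F1 unchanged. This matters when reading the df_benchmark columns, because the second `scorer.score` call above passes the original text as the first argument.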


Compute ROUGE scores on both DataFrames.


In [41]:
# Note: this rebinds the name scorer, which previously held the BERTScorer instance
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)

In [42]:
def compute_rouge(row):
    scores = scorer.score(
        row["expected_summary"],   # reference
        row["predicted_summary"]   # candidate
    )
    return (
        scores["rouge1"].fmeasure,
        scores["rouge2"].fmeasure,
        scores["rougeL"].fmeasure,
    )

# Apply row by row
df_grading[["rouge1", "rouge2", "rougeL"]] = df_grading.apply(
    compute_rouge, axis=1, result_type="expand"
)

In [43]:
def compute_rouge_for_benchmark(row):
    scores = scorer.score(
        row['original_text'],   # reference
        row['predicted_summary']   # candidate
    )
    return (
        scores["rouge1"].fmeasure,
        scores["rouge2"].fmeasure,
        scores["rougeL"].fmeasure,
    )

# Apply row by row
df_benchmark[["rouge1", "rouge2", "rougeL"]] = df_benchmark.apply(
    compute_rouge_for_benchmark, axis=1, result_type="expand"
)
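
ROUGE-L is based on the longest common subsequence (LCS) between reference and candidate. The sketch below is a simplified sentence-level version that assumes lowercase whitespace tokenization and no stemming, so it will not reproduce the rouge_score values exactly. The names `lcs_length` and `rouge_l` are illustrative. The example also shows what happens to the F-measure when a short summary is scored against a much longer reference, which may partly explain why ROUGE values against the original text come out lower than those against the handmade summary.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """Simplified sentence-level ROUGE-L: (precision, recall, F-measure)."""
    # Assumption: lowercase whitespace tokenization, no stemming.
    ref, cand = reference.lower().split(), candidate.lower().split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0, 0.0, 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return precision, recall, 2 * precision * recall / (precision + recall)


# A short candidate scored against a much longer reference: precision is high,
# recall is low, and the F-measure is pulled down by the recall.
long_reference = "he lit the gas lamp and then he inspected the wound on his head in the mirror"
short_candidate = "he lit the lamp"
p, r, f = rouge_l(long_reference, short_candidate)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")
```

When comparing a summary with a full chapter, reporting precision and recall separately can be more informative than the F-measure alone.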

### Display grading results

In [44]:
df_global_score = df_grading.groupby(["model"]).agg(
    bertscore_P=('bertscore_P', 'mean'),
    bertscore_R=('bertscore_R', 'mean'),
    bertscore_F1=('bertscore_F1', 'mean'),
    rouge1=('rouge1', 'mean'),
    rouge2=('rouge2', 'mean'),
    rougeL=('rougeL', 'mean')
)
print("score : summary vs handmade summary")
print(df_global_score)
print("")

score : summary vs handmade summary
                                    bertscore_P  bertscore_R  bertscore_F1    rouge1    rouge2    rougeL
model                                                                                                   
allenai/led-base-16384                 0.809921     0.817721      0.813758  0.277321  0.035704  0.139669
facebook/bart-large-cnn                0.813301     0.819481      0.816366  0.305096  0.038361  0.154470
google/flan-t5-base                    0.802704     0.812035      0.807326  0.270229  0.030472  0.138510
google/pegasus-xsum                    0.766373     0.802612      0.783905  0.227252  0.026955  0.143147
gpt2-medium                            0.797691     0.805384      0.801427  0.222925  0.023282  0.124996
microsoft/phi-3-mini-128k-instruct     0.845819     0.826555      0.836044  0.275925  0.048339  0.146655
t5-large                               0.736195     0.793587      0.763597  0.154104  0.018911  0.103453



In [45]:
df_benchmark_score = df_benchmark.groupby(["model"]).agg(
    bertscore_P=('bertscore_P', 'mean'),
    bertscore_R=('bertscore_R', 'mean'),
    bertscore_F1=('bertscore_F1', 'mean'),
    rouge1=('rouge1', 'mean'),
    rouge2=('rouge2', 'mean'),
    rougeL=('rougeL', 'mean')
)
print("score : summary vs text")
print(df_benchmark_score)
print("")

score : summary vs text
                                    bertscore_P  bertscore_R  bertscore_F1    rouge1    rouge2    rougeL
model                                                                                                   
allenai/led-base-16384                 0.806687     0.815557      0.811064  0.146817  0.040200  0.077072
facebook/bart-large-cnn                0.807059     0.816831      0.811905  0.146754  0.039599  0.076553
google/flan-t5-base                    0.796246     0.803600      0.799895  0.116140  0.027620  0.066004
google/pegasus-xsum                    0.786850     0.769717      0.778068  0.120223  0.029950  0.074981
gpt2-medium                            0.796492     0.808278      0.802274  0.129006  0.028298  0.071833
handmade_summary                       0.804837     0.822709      0.813660  0.118834  0.027364  0.065481
microsoft/phi-3-mini-128k-instruct     0.797912     0.832487      0.814800  0.072698  0.021526  0.045544
t5-large                       

Below are the reference summary and the summaries generated by each model for the lowest-numbered chapter in df_predict.

In [46]:

from IPython.display import display, Markdown
#take a look at the summaries of the first text
test_chp = min(df_predict['num_chp'])

my_df = df_predict[df_predict['num_chp']==test_chp]

print(f"Expected summary of chapter {test_chp}: ")
display(Markdown(df_test[df_test["num_chp"] == test_chp]["summary"].tolist()[0]))
print("")

for model_ in my_df['model']:
    print(f"--{model_} summary of chapter {test_chp}--")
    display(Markdown(my_df.loc[my_df['model']== model_, 'summary'].iloc[0]))
    print("")

Expected summary of chapter 2: 


Zhou Mingrui's gaze falls upon a dessicated corpse-like reflection in the dressing mirror, leaving him reeling in fear. Despite his initial terror, he tries to calm himself and inspects his body, finding his wounds to be grievous but his vitality strong. He attempts to examine his head injury but is hindered by the dim lighting. As he searches for a solution, a memory fragment from his past life as Klein Moretti surfaces, reminding him of the gas lamp's capabilities. He finds the lamp on the wall and attempts to light it, but it doesn't work at first. After recalling his brother Benson's resourcefulness in installing gas pipes, Zhou Mingrui discovers a gas meter and uses a copper penny to activate it, finally illuminating the room with a warm glow. With the darkness receded, Zhou Mingrui inspects his wound again, finding it to be rapidly healing. He attributes this to the restorative effects of transmigration. Feeling relieved, he decides to clean up the blood stains on his head and ventures out into the dark corridor, where the crimson moonlight casts eerie silhouettes.


--gpt2-medium summary of chapter 2--


 to find out if he could fire the revolver again!
If he could do it, he would then have the chance to try to assassinate his brother again!
If he failed, then he would not be able to kill his brother again!
If he succeeded, he would not be able to kill his brother again!
Zhou Mingrui finally opened his eyes.
He was facing a wall.
At the side of the wall was a small hole that was about two inches wide.
That hole was the one that had been drilled through Zhou Mingrui's skull.
The hole had been filled with a thick layer of blood.
Zhou Mingrui did not know why the hole had been filled, but he knew that it was because the bullet had pierced through his head.
Zhou Mingrui was a man with a long and illustrious career.
He had been a deputy of the People's Republic of China for many years, and he had been a member of the Supreme People's Assembly of the People's Republic of China for a long time.
He had also served as the deputy head of the Central Military Commission.
He had been a member of the People's Liberation Army for over 20 years, and he had been a member of the People's Liberation Army for over a decade.
He had been a member of the People's Republic of China for over 15 years, and he had been a member of the People's Republic of


--microsoft/phi-3-mini-128k-instruct summary of chapter 2--



Zhou Mingrui, experiencing a series of bad luck events, recalled performing a luck enhancement ritual involving placing four portions of staple food in his room and taking four counterclockwise steps while chanting blessings. After his transmigration, which occurred overnight, he contemplated the possibility that his transmigration could be a result of the ritual and decided to repeat it in the hopes of returning.




--google/flan-t5-base summary of chapter 2--


Zhou Mingrui drew back in fear at the sight that greeted him. It was as though the person in the dressing mirror was not himself, but a dessicated corpse. How could a person with such grievous wounds be still alive Zhou Mingrui was not a rash person who did not think of the consequences of taking on debt. He was a literate and had worked for several years. He insisted on creating conducive studying conditions for Klein even if it meant taking on debt. Zhou Mingrui's transmigration to China began with a single penny, which was only minted and circulated after King George III ascended to the throne. After flipping the coin-which was only minted and circulated after King Zhou Mingrui would be able to hear the sound of the water from the sink, but if the water gushed too loudly, Mr. Franky would be able to hear the sound of the water from the sink. Zhou Mingrui had a rough idea of how Klein had died. He was in no hurry to verify his guess. Instead, he wiped away the blood stains and cleaned up the'scene' beneath the desk. Zhou Mingrui's right hand was subconsciously pulling out the revolver's cylinder and s


--facebook/bart-large-cnn summary of chapter 2--


Zhou Mingrui's brother, Benson, installed gas pipes in his apartment to improve the apartment's standards. Zhou's brother wanted to create conducive studying conditions for Klein Moretti even if it meant taking on debt. Zhou took out a coin from his pocket and inserted it into the gas meter's vertical'mouth' The penny fell to the bottom of the meter, producing a short but melodious mechanical rhythm. Zhou then touched his exposed skin. Beneath the slight coldness was flowing warmth. It was as though the person in the dressing mirror was not himself, but a dessicated corpse. Zhou was shocked to see the penetrating wound and dark red blood stains in the mirror. He was still alive! He was alive! How could a person with such grievous wounds be still alive!? Zhou was stunned to see that he was alive. He had been injured! He had just been hit in the head with a coin.                A fire plume ignited and rapidly grew. Bright light first occupied the internals of a wall lamp before penetrating the transparent glass, blanketing the Zhou Mingruo tried a luck enhancement ritual before dinner today. The ritual was extremely simple, without any basic foundation requirements. The first step required him to sincerely chant ‘Blessings Stem From The Immortal Lord of Heaven and Earth' The second step was to silently chant, 'Blessing Stem from The Sky Lord of heaven and earth' The third step


--google/pegasus-xsum summary of chapter 2--


Zhou Mingrui had no choice but to use his savings to buy a gas meter for the Loen Kingdom's capital city, but he had no choice but to use his savings to buy a gas meter for the Loen Kingdom's capital city, and he had no choice but to use his savings to buy a gas meter for the Loen Kingdom's capital city, and he had no choice but to use his savings to buy a gas meter for the Loen Kingdom's capital city, and he had no choice but to use his savings to buy a gas meter for the Loen Kingdom's capital city, and he had no choice but to use his savings to buy a gas meter for the Loen Kingdom's capital city, and he had no choice but to use his savings to buy a gas meter for the Loen Kingdom's capital city, and he had no choice but to use his savings to buy a gas meter for the Loen Kingdom's capital city, and he had no choice but to use his savings to buy a gas meter for the Loen Kingdom's capital city, and he had no choice but to use his savings to buy a gas meter for the Loen Kingdom's capital city, and he had no choice but to use his savings to buy a gas meter for the Loen Kingdom's capital city, and he had no choice but to use his savings to buy a gas meter for the Loen Kingdom's


--t5-large summary of chapter 2--


a Hail Mary! He had to attempt a Hail Mary! He had to attempt a Hail Mary! He had to attempt a Hail Mary! He had to attempt a Hail Mary! He had to attempt a Hail Mary! paragraph without bullet points, there was not a single blood stain on his temple.. The coin was a copper penny.. The coin was engraved with a portrait of a crown-wearing man on the back., there.......,...........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................,,,,............................................................................................................................................................................................,?.................................................................................................................................................................................................................................................................................................................................................................................................................
......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................


--allenai/led-base-16384 summary of chapter 2--


Zhou Mingrui looked around the room. He saw that there was no one in the room with a warm glow.Zhou Mingrui reeled back in fear at the sight that greeted him. It was as though the person in the dressing mirror was not himself, but a dessicated corpse.Zhou Mingrui looked around the room. "How could a person with such grievous wounds be still alive!?”He turned his head in disbelief again and checked the other side. Even though he was a distance away and the lighting was poor, he could still see the penetrating wound and dark red blood stains.Zhou Mingrui looked around the room."This..."Zhou Mingrui drew a deep breath as he tried hard to calm himself. "This is the most important thing in the world."He reached out to press his left chest and sensed his racing heart that exuded immense vitality. "This is the most important thing in the world."He then touched his exposed skin. Beneath the slight coldness was flowing warmth. "This is the most important thing in the world.""The restorative effects that transmigration brings?" Zhou Mingrui curled up the right corner of his mouth as he muttered silently. "This is the most important thing in the world.""What's happening?" he muttered with a frown. He planned to inspect his head injury seriously once more. "This is the most important thing in the world."


