# Pegasus
**Pegasus** is a transformer-based sequence-to-sequence model whose pre-training objective, gap-sentence generation, was designed specifically for abstractive summarization. In this article, we run twelve publicly released Pegasus checkpoints, each fine-tuned on a different summarization dataset, on the same news article: Pegasus-XSUM, Pegasus-CNN/DailyMail, Pegasus-Newsroom, Pegasus-MultiNews, Pegasus-Gigaword, Pegasus-WikiHow, Pegasus-Reddit-TIFU, Pegasus-BigPatent, Pegasus-Arxiv, Pegasus-PubMed, Pegasus-AESLC, and Pegasus-Billsum. The original Pegasus paper reported state-of-the-art ROUGE scores on all twelve of these datasets at the time of publication. For each checkpoint we give a brief description of the model and the dataset it was fine-tuned on, print its summary, and compute ROUGE-1 and BLEU. Because no reference summary exists for the test article, both metrics are computed against the article itself, so they measure overlap with the source rather than summary quality. All of these checkpoints are available through the Hugging Face Transformers library.

In [None]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
!pip install transformers
!pip install sentencepiece
!pip install rouge
!pip install nltk

In [None]:
# Import only after the packages above are installed
import torch
import nltk
from rouge import Rouge
from transformers import PegasusForConditionalGeneration, AutoTokenizer

nltk.download('punkt')
rouge = Rouge()

In [None]:
torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'

In [None]:
def summarize(text, max_length=1024):
    # Tokenize the article, truncating it to at most max_length input tokens
    batch = tokenizer([text], truncation=True, padding='longest', max_length=max_length, return_tensors='pt').to(torch_device)
    # Generate with the checkpoint's default generation settings
    generated = model.generate(**batch)
    summary = tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
    return summary

In [None]:
def gigasummarize(text):
    # Same as summarize(), but with the shorter 128-token input limit used for Gigaword
    return summarize(text, max_length=128)

In [None]:
text = """
India's Health Ministry has announced that the country's COVID-19 vaccination drive will now be expanded to include people over the age of 60 and those over 45 with co-morbidities. The move is expected to cover an additional 270 million people, making it one of the largest vaccination drives in the world.The decision was taken after a meeting of the National Expert Group on Vaccine Administration for COVID-19 (NEGVAC), which recommended the expansion of the vaccination program. The NEGVAC also suggested that private hospitals may be allowed to administer the vaccine, although the details of this are yet to be finalized.India began its vaccination drive in mid-January, starting with healthcare and frontline workers. Since then, over 13 million doses have been administered across the country. However, the pace of the vaccination drive has been slower than expected, with concerns raised over vaccine hesitancy and logistical challenges.The expansion of the vaccination drive to include the elderly and those with co-morbidities is a major step towards achieving herd immunity and controlling the spread of the virus in India. The Health Ministry has also urged eligible individuals to come forward and get vaccinated at the earliest.India has reported over 11 million cases of COVID-19, making it the second-worst affected country in the world after the United States. The country's daily case count has been declining in recent weeks, but experts have warned that the pandemic is far from over and that precautions need to be maintained.In summary, India's Health Ministry has announced that the country's COVID-19 vaccination drive will be expanded to include people over 60 and those over 45 with co-morbidities, covering an additional 270 million people. The decision was taken after a meeting of the National Expert Group on Vaccine Administration for COVID-19, and is a major step towards achieving herd immunity and controlling the spread of the virus in India.
"""

In [None]:
from nltk.translate.bleu_score import sentence_bleu

def summary_to_sentences(summary):
    # Split on '.' and drop empty fragments (such as the one after the final period)
    sentences = [sentence for sentence in summary.split('.') if sentence.strip()]

    # Convert each sentence into a list of words
    sentence_lists = [sentence.split() for sentence in sentences]

    return sentence_lists

def paragraph_to_wordlist(paragraph):
    # Split the paragraph into words using whitespace as a separator
    words = paragraph.split()
    return words

# No reference summary is available, so each sentence of the source article
# is used as a separate BLEU reference.
reference_paragraph = text
reference_summary = summary_to_sentences(reference_paragraph)


**Pegasus-XSUM** is the Pegasus checkpoint fine-tuned for extreme summarization, where a whole document is condensed into a single sentence. It was fine-tuned on XSum, a dataset of roughly 227,000 BBC articles, each paired with a one-sentence summary. As a result, the model tends to produce one short, highly abstractive sentence. The checkpoint is available on the Hugging Face Hub as `google/pegasus-xsum`.

In [None]:
model_name = 'google/pegasus-xsum'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)

In [None]:
summary = summarize(text)
print(summary)

Millions more people are to be vaccinated against the deadly CO-19 virus.


In [None]:
scores = rouge.get_scores(summary, text)
print("ROUGE Score:")
print("Precision: {:.3f}".format(scores[0]['rouge-1']['p']))
print("Recall: {:.3f}".format(scores[0]['rouge-1']['r']))
print("F1-Score: {:.3f}".format(scores[0]['rouge-1']['f']))

ROUGE Score:
Precision: 0.583
Recall: 0.046
F1-Score: 0.085
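
Because the article itself is passed as the reference, precision is high whenever the summary reuses source words, while recall is capped by how short the summary is. The following is a minimal sketch of unigram ROUGE, assuming whitespace tokenization and clipped counts. The helper name `rouge1_sketch` is hypothetical, and the `rouge` package may tokenize and count differently, so its numbers will not match exactly.

```python
from collections import Counter

def rouge1_sketch(hypothesis, reference):
    """Unigram precision, recall and F1 with clipped counts (illustrative only)."""
    hyp_counts = Counter(hypothesis.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared word
    overlap = sum((hyp_counts & ref_counts).values())
    hyp_total = sum(hyp_counts.values())
    ref_total = sum(ref_counts.values())
    p = overlap / hyp_total if hyp_total else 0.0
    r = overlap / ref_total if ref_total else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return {"p": p, "r": r, "f": f}

# A short hypothesis copied from a longer reference: perfect precision, low recall
print(rouge1_sketch("the cat sat", "the cat sat on the mat"))
```

A one-sentence summary of a long article follows the same pattern: nearly every word it reuses counts toward precision, but it can only cover a small share of the article's words.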


In [None]:
predicted_paragraph = summary
predicted_summary = paragraph_to_wordlist(predicted_paragraph)

score = sentence_bleu(reference_summary, predicted_summary)
print("BLEU Score: {:.3f}".format(score))

BLEU Score: 0.462
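
`sentence_bleu` accepts a list of references, and here every sentence of the source article is one reference. BLEU's core ingredient is clipped n-gram precision, where each hypothesis word is credited at most as often as it appears in the most generous single reference. A rough sketch of the unigram case follows; `modified_unigram_precision` is a hypothetical helper, and the real metric also combines higher-order n-grams and a brevity penalty.

```python
from collections import Counter

def modified_unigram_precision(references, hypothesis):
    """Clipped unigram precision against several tokenized references."""
    hyp_counts = Counter(hypothesis)
    # For each word, the clip limit is its highest count in any single reference
    max_ref_counts = Counter()
    for ref in references:
        for word, count in Counter(ref).items():
            max_ref_counts[word] = max(max_ref_counts[word], count)
    clipped = sum(min(count, max_ref_counts[word]) for word, count in hyp_counts.items())
    return clipped / len(hypothesis) if hypothesis else 0.0

refs = ["the cat is on the mat".split(), "there is a cat on the mat".split()]
# "the" appears at most twice in any one reference, so only 2 of the 7 words count
print(modified_unigram_precision(refs, "the the the the the the the".split()))
```

With the article's own sentences as references, a summary that copies phrases from any of them scores well, which is worth keeping in mind when comparing the BLEU values across checkpoints.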


**Pegasus-CNN/DailyMail** is the Pegasus checkpoint fine-tuned for summarizing news articles. It was fine-tuned on the CNN/DailyMail dataset, which contains over 300,000 news articles paired with multi-sentence bullet-point highlights. Its summaries are therefore typically several sentences long and stay close to the wording of the source. Like Pegasus-XSUM, it is available on the Hugging Face Hub, as `google/pegasus-cnn_dailymail`.

In [None]:
model_name = 'google/pegasus-cnn_dailymail'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)

In [None]:
summary = summarize(text)
print(summary)

India's COVID-19 vaccination drive will now be expanded to include people over the age of 60 and those over 45 with co-morbidities.<n>The move is expected to cover an additional 270 million people, making it one of the largest vaccination drives in the world.<n>Private hospitals may also be allowed to administer the vaccine.<n>India has reported over 11 million cases of COVID-19, making it the second-worst affected country in the world after the United States.
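
The `<n>` markers in this output appear to be the Pegasus tokenizer's placeholder for a line break, which survives decoding as literal text. A small post-processing sketch, assuming that interpretation (`restore_newlines` is a hypothetical helper, not part of Transformers):

```python
def restore_newlines(summary):
    """Replace the literal <n> placeholder with real line breaks."""
    return "\n".join(part.strip() for part in summary.split("<n>"))

print(restore_newlines("First point.<n>Second point."))
```

Whether to apply this before scoring is a design choice: because `<n>` is glued to the neighbouring words, a whitespace tokenizer treats `co-morbidities.<n>The` as a single token.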


In [None]:
scores = rouge.get_scores(summary, text)
print("ROUGE Score:")
print("Precision: {:.3f}".format(scores[0]['rouge-1']['p']))
print("Recall: {:.3f}".format(scores[0]['rouge-1']['r']))
print("F1-Score: {:.3f}".format(scores[0]['rouge-1']['f']))

ROUGE Score:
Precision: 0.947
Recall: 0.355
F1-Score: 0.517


In [None]:
predicted_paragraph = summary
predicted_summary = paragraph_to_wordlist(predicted_paragraph)
score = sentence_bleu(reference_summary, predicted_summary)
print("BLEU Score: {:.3f}".format(score))

BLEU Score: 0.796


**Pegasus-Newsroom** is the Pegasus checkpoint fine-tuned on Newsroom, a dataset of about 1.3 million English-language news articles with summaries written by authors and editors at 38 major publications. Newsroom summaries range from highly extractive to abstractive. On the test article, the checkpoint simply copies the opening of the source until generation stops. The checkpoint is available on the Hugging Face Hub as `google/pegasus-newsroom`.

In [None]:
model_name = 'google/pegasus-newsroom'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)

In [None]:
summary = summarize(text)
print(summary)

India's Health Ministry has announced that the country's COVID-19 vaccination drive will now be expanded to include people over the age of 60 and those over 45 with co-morbidities. The move is expected to cover an additional 270 million people, making it one of the largest vaccination drives in the world.The decision was taken after a meeting of the National Expert Group on Vaccine Administration for COVID-19 (NEGVAC), which recommended the expansion of the vaccination program. The NEGVAC also suggested that private hospitals may be allowed to administer the vaccine, although the details of this are yet to be finalized.India began its
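
This output is a verbatim copy of the article's opening, which is why ROUGE-1 precision reaches 1.000 below. One common way to quantify such copying is the share of summary n-grams that never occur in the source. The sketch below assumes whitespace tokenization, and `novel_ngram_ratio` is a hypothetical helper:

```python
def novel_ngram_ratio(summary, source, n=2):
    """Fraction of summary n-grams absent from the source (0.0 means fully copied)."""
    def ngrams(text):
        tokens = text.lower().split()
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    summary_ngrams = ngrams(summary)
    if not summary_ngrams:
        return 0.0
    source_ngrams = set(ngrams(source))
    novel = sum(1 for gram in summary_ngrams if gram not in source_ngrams)
    return novel / len(summary_ngrams)

source = "the cat sat on the mat"
print(novel_ngram_ratio("the cat sat", source))  # copied phrase
print(novel_ngram_ratio("the cat ran", source))  # one of two bigrams is new
```

A ratio near zero alongside high ROUGE and BLEU suggests the scores reward extraction rather than abstraction.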


In [None]:
scores = rouge.get_scores(summary, text)
print("ROUGE Score:")
print("Precision: {:.3f}".format(scores[0]['rouge-1']['p']))
print("Recall: {:.3f}".format(scores[0]['rouge-1']['r']))
print("F1-Score: {:.3f}".format(scores[0]['rouge-1']['f']))

ROUGE Score:
Precision: 1.000
Recall: 0.526
F1-Score: 0.690


In [None]:
predicted_paragraph = summary
predicted_summary = paragraph_to_wordlist(predicted_paragraph)
score = sentence_bleu(reference_summary, predicted_summary)
print("BLEU Score: {:.3f}".format(score))

BLEU Score: 0.863


**Pegasus-MultiNews** is the Pegasus checkpoint fine-tuned on Multi-News, an English multi-document summarization dataset of roughly 56,000 groups of news articles paired with human-written summaries from newser.com. The model is therefore tuned to merge several source articles into one longer summary. Given a single article, it can introduce details that are not in the input. The checkpoint is available on the Hugging Face Hub as `google/pegasus-multi_news`.

In [None]:
model_name = 'google/pegasus-multi_news'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)

In [None]:
summary = summarize(text)
print(summary)

– Pandemic H1N1 isn't confined to the US: India is now aiming to cover an additional 270 million people with a vaccine against the virus, making it one of the biggest such drives in the world, reports the Times of India. The nation's Health Ministry announced the expansion of its drive to include people over 60 and those over 45 with co-morbidities after an expert group recommended the move. India has reported more than 11 million cases of the virus, making it the second-worst affected country in the world after the US. The country's daily case count has been declining in recent weeks, but experts have warned that the pandemic is far from over and that precautions need to be maintained.
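
Neither "H1N1" nor "Times of India" appears anywhere in the source article, so this summary contains unsupported details. A crude way to surface such cases is to list summary words that never occur in the source. The sketch below is only a vocabulary check under a simple regex tokenization, and `unsupported_terms` is a hypothetical helper; it will also flag harmless paraphrases.

```python
import re

def unsupported_terms(summary, source):
    """Words in the summary that never appear in the source, sorted alphabetically."""
    # Lowercase alphanumeric tokens, keeping inner hyphens and apostrophes (e.g. covid-19)
    pattern = r"[a-z0-9]+(?:[-'][a-z0-9]+)*"
    source_vocab = set(re.findall(pattern, source.lower()))
    summary_vocab = set(re.findall(pattern, summary.lower()))
    return sorted(summary_vocab - source_vocab)

print(unsupported_terms("Pandemic H1N1 hits India", "India reports COVID-19 pandemic"))
```

Running it as `unsupported_terms(summary, text)` after each model gives a quick list to inspect by hand, which overlap metrics alone do not provide.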


In [None]:
scores = rouge.get_scores(summary, text)
print("ROUGE Score:")
print("Precision: {:.3f}".format(scores[0]['rouge-1']['p']))
print("Recall: {:.3f}".format(scores[0]['rouge-1']['r']))
print("F1-Score: {:.3f}".format(scores[0]['rouge-1']['f']))

ROUGE Score:
Precision: 0.776
Recall: 0.434
F1-Score: 0.557


In [None]:
predicted_paragraph = summary
predicted_summary = paragraph_to_wordlist(predicted_paragraph)
score = sentence_bleu(reference_summary, predicted_summary)
print("BLEU Score: {:.3f}".format(score))

BLEU Score: 0.510


**Pegasus-Gigaword** is the Pegasus checkpoint fine-tuned on Gigaword, which contains about 4 million pairs of news text and headlines. The task is headline generation: the input is the first sentence of an article and the target is its headline, which is why this checkpoint is run with a much shorter input limit than the others. The checkpoint is available on the Hugging Face Hub as `google/pegasus-gigaword`.

In [None]:
model_name = 'google/pegasus-gigaword'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)

In [None]:
summary = gigasummarize(text)
print(summary)

u.s. to expand vaccination drive to include people over the age of ## and those over the age of ## with bc-na-gen 
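
The lowercase text and the `##` placeholders reflect how Gigaword is preprocessed: the corpus is lowercased and every digit is replaced with `#`. The sketch below mimics that normalization. Applying it to the input before calling `gigasummarize` is an untested assumption that might bring the article closer to the training distribution, and `gigaword_style` is a hypothetical helper.

```python
import re

def gigaword_style(text):
    """Lowercase the text and mask every digit with '#'."""
    return re.sub(r"\d", "#", text.lower())

print(gigaword_style("People over 60 and those over 45"))
```

Note that the summary also says "u.s." although the article is about India, so normalization alone would not necessarily fix the factual drift.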


In [None]:
scores = rouge.get_scores(summary, text)
print("ROUGE Score:")
print("Precision: {:.3f}".format(scores[0]['rouge-1']['p']))
print("Recall: {:.3f}".format(scores[0]['rouge-1']['r']))
print("F1-Score: {:.3f}".format(scores[0]['rouge-1']['f']))

ROUGE Score:
Precision: 0.706
Recall: 0.079
F1-Score: 0.142


In [None]:
predicted_paragraph = summary
predicted_summary = paragraph_to_wordlist(predicted_paragraph)
score = sentence_bleu(reference_summary, predicted_summary)
print("BLEU Score: {:.3f}".format(score))

BLEU Score: 0.437


**Pegasus-WikiHow** is the Pegasus checkpoint fine-tuned for summarizing how-to articles. It was fine-tuned on the WikiHow dataset, which contains over 230,000 how-to articles and their corresponding summaries. WikiHow summaries are assembled from the headline sentence of each step, so the model tends to produce imperative instructions even for news text. The checkpoint is available on the Hugging Face Hub as `google/pegasus-wikihow`.

In [None]:
model_name = 'google/pegasus-wikihow'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)

In [None]:
summary = summarize(text)
print(summary)

Confirm that the COVID-19 vaccination drive will now be expanded to include the elderly and those over 45 with co-morbidities.<n>Confirm that the COVID-19 vaccination drive will now be expanded to include the elderly and those over 45 with co-morbidities.


In [None]:
scores = rouge.get_scores(summary, text)
print("ROUGE Score:")
print("Precision: {:.3f}".format(scores[0]['rouge-1']['p']))
print("Recall: {:.3f}".format(scores[0]['rouge-1']['r']))
print("F1-Score: {:.3f}".format(scores[0]['rouge-1']['f']))

ROUGE Score:
Precision: 0.900
Recall: 0.118
F1-Score: 0.209


In [None]:
predicted_paragraph = summary
predicted_summary = paragraph_to_wordlist(predicted_paragraph)
score = sentence_bleu(reference_summary, predicted_summary)
print("BLEU Score: {:.3f}".format(score))

BLEU Score: 0.420


**Pegasus-Reddit-TIFU** is the Pegasus checkpoint fine-tuned for summarizing Reddit posts from the subreddit TIFU (Today I F##ked Up). The Reddit-TIFU dataset pairs each post with the TL;DR written by its author, so the summaries are short and informal. The checkpoint is available on the Hugging Face Hub as `google/pegasus-reddit_tifu`.

In [None]:
model_name = 'google/pegasus-reddit_tifu'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)

In [None]:
summary = summarize(text)
print(summary)

India's COVID-19 vaccination drive will now be expanded to include people over the age of 60 and those over 45 with co-morbidities, making it one of the largest vaccination drives in the world.


In [None]:
scores = rouge.get_scores(summary, text)
print("ROUGE Score:")
print("Precision: {:.3f}".format(scores[0]['rouge-1']['p']))
print("Recall: {:.3f}".format(scores[0]['rouge-1']['r']))
print("F1-Score: {:.3f}".format(scores[0]['rouge-1']['f']))

ROUGE Score:
Precision: 1.000
Recall: 0.184
F1-Score: 0.311


In [None]:
predicted_paragraph = summary
predicted_summary = paragraph_to_wordlist(predicted_paragraph)
score = sentence_bleu(reference_summary, predicted_summary)
print("BLEU Score: {:.3f}".format(score))

BLEU Score: 0.887


**Pegasus-BigPatent** is the Pegasus checkpoint fine-tuned for summarizing patent documents. It was fine-tuned on the BigPatent dataset, which contains about 1.3 million US patent documents from various technical fields, each paired with a human-written abstractive summary. The checkpoint is available on the Hugging Face Hub as `google/pegasus-big_patent`.

In [None]:
model_name = 'google/pegasus-big_patent'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)

In [None]:
summary = summarize(text)
print(summary)

The country&#39;s vaccine drive is expanded to include people over the age of 60 and those over 45 with co-morbidities, covering an additional 270 million people.


In [None]:
scores = rouge.get_scores(summary, text)
print("ROUGE Score:")
print("Precision: {:.3f}".format(scores[0]['rouge-1']['p']))
print("Recall: {:.3f}".format(scores[0]['rouge-1']['r']))
print("F1-Score: {:.3f}".format(scores[0]['rouge-1']['f']))

ROUGE Score:
Precision: 0.958
Recall: 0.151
F1-Score: 0.261


In [None]:
predicted_paragraph = summary
predicted_summary = paragraph_to_wordlist(predicted_paragraph)
score = sentence_bleu(reference_summary, predicted_summary)
print("BLEU Score: {:.3f}".format(score))

BLEU Score: 0.790


**Pegasus-Arxiv** is the Pegasus checkpoint fine-tuned for summarizing scientific papers from the arXiv repository. The dataset covers fields such as computer science, physics, and mathematics, and uses each paper's abstract as the reference summary. These documents are much longer and more formal than news articles, and on the out-of-domain test article the checkpoint degenerates into a repeated loop. The checkpoint is available on the Hugging Face Hub as `google/pegasus-arxiv`.

In [None]:
model_name = 'google/pegasus-arxiv'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)

In [None]:
summary = summarize(text)
print(summary)

began in in mid-, country vaccination has been administered over 13 million doses have been administered across 13 million doses have been administered across its vaccination drive to include its expansion towards achieving vaccination and controlling the spread of the virus in. began in in mid-, country vaccination has been administered over 13 million doses have been administered across 13 million doses have been administered across its vaccination drive to include its expansion towards achieving vaccination and controlling the spread of the virus in. began in in mid-, country vaccination has been administered over 13 million doses have been administered across 13 million doses have been administered across its vaccination drive to include its expansion towards achieving vaccination and controlling the spread of the virus in. began in in mid-, country vaccination has been administered over 13 million doses have been administered across 13 million doses have been administered across i
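
This output loops over the same phrase until it hits the length limit. A simple way to detect such degenerate generations is to measure how many n-grams repeat an earlier one. The sketch below assumes whitespace tokenization, and `repeated_ngram_ratio` is a hypothetical helper:

```python
def repeated_ngram_ratio(text, n=3):
    """Share of n-grams that duplicate an earlier n-gram in the same text."""
    tokens = text.split()
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not grams:
        return 0.0
    return 1 - len(set(grams)) / len(grams)

print(repeated_ngram_ratio("a b c a b c a b c"))  # heavy repetition
print(repeated_ngram_ratio("a b c d e"))          # no repetition
```

If the ratio is high, one option to try is passing `no_repeat_ngram_size` or `repetition_penalty` to `model.generate`. Whether that yields a usable summary from this checkpoint on news text is not something this notebook tests.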

In [None]:
scores = rouge.get_scores(summary, text)
print("ROUGE Score:")
print("Precision: {:.3f}".format(scores[0]['rouge-1']['p']))
print("Recall: {:.3f}".format(scores[0]['rouge-1']['r']))
print("F1-Score: {:.3f}".format(scores[0]['rouge-1']['f']))

ROUGE Score:
Precision: 0.963
Recall: 0.171
F1-Score: 0.291


In [None]:
predicted_paragraph = summary
predicted_summary = paragraph_to_wordlist(predicted_paragraph)
score = sentence_bleu(reference_summary, predicted_summary)
print("BLEU Score: {:.3f}".format(score))

BLEU Score: 0.074


**Pegasus-PubMed** is the Pegasus checkpoint fine-tuned for summarizing biomedical literature from the PubMed database. The dataset consists of biomedical papers, with each paper's abstract as the reference summary. On the test article, the checkpoint keeps the vaccination theme but drifts to H1N1 influenza and the United Kingdom, neither of which appears in the source. The checkpoint is available on the Hugging Face Hub as `google/pegasus-pubmed`.

In [None]:
model_name = 'google/pegasus-pubmed'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)

In [None]:
summary = summarize(text)
print(summary)

pandemic influenza ( h1n1 ) is a threat to global public health. <n> the world health organization ( who ) has declared a pandemic influenza a(h1n1 ) outbreak involving h1n1 virus in the middle east. <n> currently, the world health organization ( who ) and the united nations children's fund ( uscf ) are coordinating the response to the outbreak. in the united kingdom, <n> the vaccination drive against the pandemic influenza a(h1n1 ) virus is being conducted in accordance with the who's guidelines. <n> the uk department of health has announced that the country will be expanding its vaccination drive against the pandemic influenza a(h1n1 ) virus to include those over 60 years of age and those with comorbidities. <n> the uk department of health also suggested that private hospitals may be allowed to administer vaccine, although the details of this are yet to be finalized. in the united kingdom, the vaccination drive against the pandemic influenza a(h1n1 ) virus is being conducted in accor

In [None]:
scores = rouge.get_scores(summary, text)
print("ROUGE Score:")
print("Precision: {:.3f}".format(scores[0]['rouge-1']['p']))
print("Recall: {:.3f}".format(scores[0]['rouge-1']['r']))
print("F1-Score: {:.3f}".format(scores[0]['rouge-1']['f']))

ROUGE Score:
Precision: 0.519
Recall: 0.263
F1-Score: 0.349


In [None]:
predicted_paragraph = summary
predicted_summary = paragraph_to_wordlist(predicted_paragraph)
score = sentence_bleu(reference_summary, predicted_summary)
print("BLEU Score: {:.3f}".format(score))

BLEU Score: 0.150



**Pegasus-AESLC** is the Pegasus checkpoint fine-tuned on AESLC, the Annotated Enron Subject Line Corpus. The dataset contains roughly 18,000 emails from the Enron corpus, and the task is to generate an email's subject line from its body. The model therefore produces very short, subject-line-style outputs. The checkpoint is available on the Hugging Face Hub as `google/pegasus-aeslc`.

In [None]:
model_name = 'google/pegasus-aeslc'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)

In [None]:
summary = summarize(text)
print(summary)

India's Health Ministry has announced that the country's COVID-19 drive will be expanded to include people over the age of 60


In [None]:
scores = rouge.get_scores(summary, text)
print("ROUGE Score:")
print("Precision: {:.3f}".format(scores[0]['rouge-1']['p']))
print("Recall: {:.3f}".format(scores[0]['rouge-1']['r']))
print("F1-Score: {:.3f}".format(scores[0]['rouge-1']['f']))

ROUGE Score:
Precision: 1.000
Recall: 0.132
F1-Score: 0.233


In [None]:
predicted_paragraph = summary
predicted_summary = paragraph_to_wordlist(predicted_paragraph)
score = sentence_bleu(reference_summary, predicted_summary)
print("BLEU Score: {:.3f}".format(score))

BLEU Score: 0.917


**Pegasus-Billsum** is the Pegasus checkpoint fine-tuned for summarizing legislation. It was fine-tuned on the BillSum dataset, which contains roughly 23,000 US Congressional and California state bills with their corresponding summaries, covering topics such as health care, taxes, and education. The checkpoint is available on the Hugging Face Hub as `google/pegasus-billsum`.

In [None]:
model_name = 'google/pegasus-billsum'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)

In [None]:
summary = summarize(text)
print(summary)

India's Health Ministry has announced that the country's COVID-19 vaccination drive will now be expanded to include people over the age of 60 and those over 45 with co-morbidities. The move is expected to cover an additional 270 million people, making it one of the largest vaccination drives in the world.


In [None]:
scores = rouge.get_scores(summary, text)
print("ROUGE Score:")
print("Precision: {:.3f}".format(scores[0]['rouge-1']['p']))
print("Recall: {:.3f}".format(scores[0]['rouge-1']['r']))
print("F1-Score: {:.3f}".format(scores[0]['rouge-1']['f']))

ROUGE Score:
Precision: 1.000
Recall: 0.289
F1-Score: 0.449


In [None]:
predicted_paragraph = summary
predicted_summary = paragraph_to_wordlist(predicted_paragraph)
score = sentence_bleu(reference_summary, predicted_summary)
print("BLEU Score: {:.3f}".format(score))

BLEU Score: 0.924
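
To compare the twelve checkpoints side by side, the scores printed in the cells above can be collected into one table. The sketch below copies those values by hand, and `rank_by` is a hypothetical helper. Since every score is computed against the source article, the ranking favours checkpoints that copy the input rather than those that summarize best.

```python
# Scores copied from the cell outputs above: (ROUGE-1 precision, recall, F1, BLEU)
results = {
    "xsum":          (0.583, 0.046, 0.085, 0.462),
    "cnn_dailymail": (0.947, 0.355, 0.517, 0.796),
    "newsroom":      (1.000, 0.526, 0.690, 0.863),
    "multi_news":    (0.776, 0.434, 0.557, 0.510),
    "gigaword":      (0.706, 0.079, 0.142, 0.437),
    "wikihow":       (0.900, 0.118, 0.209, 0.420),
    "reddit_tifu":   (1.000, 0.184, 0.311, 0.887),
    "big_patent":    (0.958, 0.151, 0.261, 0.790),
    "arxiv":         (0.963, 0.171, 0.291, 0.074),
    "pubmed":        (0.519, 0.263, 0.349, 0.150),
    "aeslc":         (1.000, 0.132, 0.233, 0.917),
    "billsum":       (1.000, 0.289, 0.449, 0.924),
}

def rank_by(metric_index):
    """Return checkpoint names sorted from highest to lowest on the chosen column."""
    return sorted(results, key=lambda name: results[name][metric_index], reverse=True)

# Print the table ordered by ROUGE-1 F1 (column index 2)
print(f"{'checkpoint':<14}{'P':>7}{'R':>7}{'F1':>7}{'BLEU':>7}")
for name in rank_by(2):
    p, r, f1, bleu = results[name]
    print(f"{name:<14}{p:>7.3f}{r:>7.3f}{f1:>7.3f}{bleu:>7.3f}")
```

Newsroom leads on F1 and Billsum on BLEU, and both outputs are verbatim copies of the article's opening sentences, which illustrates the limitation of scoring against the source.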
