# Many thanks to:

"Evaluate 🤗's BigBirdPegasus on Pubmed

In this notebook, we evaluate BigBird on the long-range summarization task of pubmed. BigBird was introduced in Big Bird: Transformers for Longer Sequences by Manzil Zaheer et al..."

https://colab.research.google.com/github/vasudevgupta7/bigbird/blob/main/notebooks/bigbird_pegasus_evaluation.ipynb#scrollTo=74CoqZ3rmV6v


---


"Transformers Based Text Summarization Model"
https://colab.research.google.com/drive/13q6jJvnzF7vmgUMqSAUKe7XlDlnDhh-H?usp=sharing#scrollTo=yq4maqAOQVsj


---

"Metric: rouge

ROUGE, or Recall-Oriented Understudy for Gisting Evaluation, is a set of metrics and a software package used for evaluating automatic summarization and machine translation software in natural language processing..."
https://huggingface.co/spaces/evaluate-metric/rouge
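ROUGE-N measures n-gram overlap between a predicted summary and a reference, and ROUGE-L uses their longest common subsequence (LCS). Each yields a precision, a recall and an F-score. Below is a minimal pure-Python sketch that assumes whitespace tokenization and lowercasing. The helper names are hypothetical. The `rouge` package used later applies its own preprocessing, so its numbers can differ slightly from this simplified version.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def prf(overlap, pred_total, ref_total):
    """Precision, recall and F-score from an overlap count and the two totals."""
    p = overlap / pred_total if pred_total else 0.0
    r = overlap / ref_total if ref_total else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return {"p": p, "r": r, "f": f}


def rouge_n(prediction, reference, n=1):
    """ROUGE-N with clipped n-gram counts and naive whitespace tokenization."""
    pred = ngrams(prediction.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((pred & ref).values())  # & keeps the minimum count of each shared n-gram
    return prf(overlap, sum(pred.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(prediction, reference):
    """Sentence-level ROUGE-L: precision and recall of the LCS length."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    return prf(lcs_length(pred, ref), len(pred), len(ref))


reference = "the cat sat on the mat"
prediction = "the cat lay on the mat"
print(rouge_n(prediction, reference, 1))
print(rouge_n(prediction, reference, 2))
print(rouge_l(prediction, reference))
```

For this pair, 5 of the 6 unigrams match (ROUGE-1 = 5/6) and 3 of the 5 bigrams match (ROUGE-2 = 3/5). The LCS "the cat on the mat" has 5 tokens, so ROUGE-L = 5/6.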

# **0. Install Dependencies 🔋**

In [None]:
!pip install torch
!pip install datasets
!pip install rouge
!pip install transformers


In [None]:
from datasets import load_dataset
import torch
import pandas as pd
from rouge import Rouge # performance metrics
from tqdm import tqdm # the name comes from the Arabic "taqaddum" (تقدّم, "progress"); a smart progress meter

# **1. Import Dataset 📚**

In [None]:
# initializing variables for the dataset from Hugging Face
DATASET_NAME = "arxiv" # the "scientific_papers" dataset on Hugging Face has two subsets (arxiv and pubmed); we only use the arxiv subset
DEVICE = "cuda" if torch.cuda.is_available() else "cpu" # CUDA lets us run the model on the GPU, which dramatically speeds up inference; fall back to the CPU if no GPU is available
CACHE_DIR = DATASET_NAME # directory where the downloaded data is cached; the default is "~/.cache/huggingface/datasets", here we use the dataset name instead


In [None]:
# the test split of the arxiv subset of the scientific_papers dataset on Hugging Face has 6440 rows; we only use the first 10 rows to evaluate the model
test_dataset = load_dataset("scientific_papers", DATASET_NAME, split="test[0:10]", cache_dir=CACHE_DIR)
test_dataset

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.



Dataset({
    features: ['article', 'abstract', 'section_names'],
    num_rows: 10
})

In [None]:
#turning the data into a pandas DataFrame for easier use
data = pd.DataFrame(test_dataset)
data

| | article | abstract | section_names |
|---|---|---|---|
| 0 | for about 20 years the problem of properties o... | the short - term periodicities of the daily s... | `introduction\nmethods of periodicity analysis\...` |
| 1 | it is believed that the direct detection of gr... | we study the detectability of circular polari... | `introduction\nstokes parameters for plane grav...` |
| 2 | as a common quantum phenomenon , the tunneling... | starting from the wkb approximation , a new b... | `[sec:intro]introduction\n[sec:formalism]formal...` |
| 3 | for the hybrid monte carlo algorithm ( hmc)@xc... | we study a novel class of numerical integrato... | `introduction\ngeometric integrators for hmc\nt...` |
| 4 | recently it was discovered that feynman integr... | new methods for obtaining functional equation... | `introduction\nderiving functional equations fr...` |
| 5 | one of the main goals of the search for period... | in the hierarchical search for periodic sourc... | `introduction\nscheme of the hierarchical proce...` |
| 6 | this review focuses specifically on what we ha... | i summarize what we have learned about the na... | `introduction\nbackground: sn classification an...` |
| 7 | single - transverse spin asymmetries ( ssas ) ... | we present a phenomenological study of the si... | `introduction\nspin-dependent cross section and...` |
| 8 | kingman s coalescent is a random tree introduc... | kingman s coalescent is a random tree that ar... | `introduction\nmain results\nproof of theorem[t...` |
| 9 | rapid progress in the design and manufacture o... | we discuss several novel types of multi - com... | `introduction\ntemporal and spatial solitons\nb...` |


In [None]:
# we only need the columns article and abstract for our task, so we drop the column section_names
data = data.drop(columns = ['section_names'])
data

| | article | abstract |
|---|---|---|
| 0 | for about 20 years the problem of properties o... | the short - term periodicities of the daily s... |
| 1 | it is believed that the direct detection of gr... | we study the detectability of circular polari... |
| 2 | as a common quantum phenomenon , the tunneling... | starting from the wkb approximation , a new b... |
| 3 | for the hybrid monte carlo algorithm ( hmc)@xc... | we study a novel class of numerical integrato... |
| 4 | recently it was discovered that feynman integr... | new methods for obtaining functional equation... |
| 5 | one of the main goals of the search for period... | in the hierarchical search for periodic sourc... |
| 6 | this review focuses specifically on what we ha... | i summarize what we have learned about the na... |
| 7 | single - transverse spin asymmetries ( ssas ) ... | we present a phenomenological study of the si... |
| 8 | kingman s coalescent is a random tree introduc... | kingman s coalescent is a random tree that ar... |
| 9 | rapid progress in the design and manufacture o... | we discuss several novel types of multi - com... |


In [None]:
# let's check the type of data to confirm we are dealing with a DataFrame
type(data)

# **2. BigBirdPegasus Model 🦅🐎**

In [None]:
# import the classes needed for our model from the transformers library
from transformers import BigBirdPegasusForConditionalGeneration, AutoTokenizer
# load the tokenizer associated with our model
tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-arxiv")
# load the pre-trained BigBirdPegasus model and move it to DEVICE, which we set earlier
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-arxiv").to(DEVICE)


In [None]:
# let's look at the configuration of the model
config = model.config
print(config)

# max_position_embeddings is the maximum input length, in tokens
print("Maximum input token length:", config.max_position_embeddings)
# max_length is the default maximum length, in tokens, of a generated output
print("Maximum output token length:", config.max_length)

BigBirdPegasusConfig {
  "_name_or_path": "google/bigbird-pegasus-large-arxiv",
  "activation_dropout": 0.0,
  "activation_function": "gelu_new",
  "architectures": [
    "BigBirdPegasusForConditionalGeneration"
  ],
  "attention_dropout": 0.0,
  "attention_type": "block_sparse",
  "block_size": 64,
  "bos_token_id": 2,
  "classifier_dropout": 0.0,
  "d_model": 1024,
  "decoder_attention_heads": 16,
  "decoder_ffn_dim": 4096,
  "decoder_layerdrop": 0.0,
  "decoder_layers": 16,
  "decoder_start_token_id": 2,
  "dropout": 0.1,
  "encoder_attention_heads": 16,
  "encoder_ffn_dim": 4096,
  "encoder_layerdrop": 0.0,
  "encoder_layers": 16,
  "eos_token_id": 1,
  "gradient_checkpointing": false,
  "init_std": 0.02,
  "is_encoder_decoder": true,
  "length_penalty": 0.8,
  "max_length": 256,
  "max_position_embeddings": 4096,
  "model_type": "bigbird_pegasus",
  "num_beams": 5,
  "num_hidden_layers": 16,
  "num_random_blocks": 3,
  "pad_token_id": 0,
  "scale_embedding": true,
  "tokenizer_cla

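The config shows `attention_type: block_sparse`, `block_size: 64` and `num_random_blocks: 3`. The sketch below is a simplified, block-level illustration of the sparse pattern from the BigBird paper: a sliding window, global blocks, and random blocks. It also includes the padding rule reported in the evaluation log further down, where 3768 input ids are padded to 3776, a multiple of `block_size`. The 3-block window, the choice of the first and last blocks as global, and the helper names are assumptions. This is not the Hugging Face implementation.

```python
import math
import random


def padded_length(seq_len, block_size=64):
    """Round seq_len up to the next multiple of block_size."""
    return math.ceil(seq_len / block_size) * block_size


def block_sparse_mask(num_blocks, num_random_blocks=3, seed=0):
    """Block-level attention pattern in the spirit of BigBird (simplified, hypothetical).

    Assumptions: the first and last blocks are global (they attend to, and are
    attended by, every block), every block sees itself and its two neighbours,
    and each non-global block also attends to `num_random_blocks` random blocks.
    """
    rng = random.Random(seed)
    global_blocks = {0, num_blocks - 1}
    mask = [[False] * num_blocks for _ in range(num_blocks)]
    for q in range(num_blocks):
        for k in range(num_blocks):
            # global rows/columns plus a sliding window of 3 blocks
            if q in global_blocks or k in global_blocks or abs(q - k) <= 1:
                mask[q][k] = True
        if q not in global_blocks:
            # random blocks are drawn from the blocks this query cannot see yet
            candidates = [k for k in range(num_blocks) if not mask[q][k]]
            for k in rng.sample(candidates, min(num_random_blocks, len(candidates))):
                mask[q][k] = True
    return mask


seq_len = padded_length(3768)   # input length taken from the padding message in the evaluation log
num_blocks = seq_len // 64      # number of 64-token blocks
mask = block_sparse_mask(num_blocks)
attended = sum(sum(row) for row in mask)
print(f"{seq_len} tokens -> {num_blocks} blocks; "
      f"{attended} of {num_blocks ** 2} block pairs attended")
```

In this sketch, each interior block attends to only a handful of blocks regardless of sequence length. This is why block-sparse attention scales far better than full attention on 4096-token inputs.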
# **3. Summarization Function 📝**

In [None]:
# set the repetition penalty and the length constraint
repetition_penalty = 2.0  # values above 1.0 penalize tokens that already appear in the generated text (1.0 means no penalty)
length_constraint = 4096  # upper bound, in tokens, on the generated summary; overrides the config default max_length of 256
# define the function that generates the summary
def summarize(article):
    # tokenize the input; truncation=True cuts the article if it exceeds the tokenizer's maximum token length
    # padding='longest' pads to the longest sequence in a batch, so it has no effect on a single article
    # the tokens are returned as PyTorch tensors and moved to DEVICE
    input_ids = tokenizer.encode(article, truncation=True, padding='longest', return_tensors='pt').to(DEVICE)
    # generate the summary from the tokenized input;
    # repetition_penalty and max_length control the generation process
    summary_ids = model.generate(input_ids, repetition_penalty=repetition_penalty, max_length=length_constraint)
    # decode the generated token ids back into text;
    # skip_special_tokens=True drops special tokens such as <s> and </s> so they are not scored by ROUGE
    pred_summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

    return pred_summary


In [None]:
# print the article at index 3 (the fourth row, since indexing starts at 0) and its human-written abstract
document = data['article'][3]
print("The 3rd article content:", document)
print("-----------------------------------------")
human_abstract = data['abstract'][3]
print("The 3rd abstract content:", human_abstract)

The 3rd article content: for the hybrid monte carlo algorithm ( hmc)@xcite , often used to study quantum chromodynamics ( qcd ) on the lattice , one is interested in efficient numerical time integration schemes which are optimal in terms of computational costs per trajectory for a given acceptance rate . high order
numerical methods allow the use of larger step sizes , but demand a larger computational effort per step ; low order schemes do not require such large computational costs per step , but need more steps per trajectory .
so there is a need to balance these opposing effects .
omelyan integration schemes @xcite of a force - gradient type have proved to be an efficient choice , since it is easy to obtain higher order schemes that demand a small additional computational effort .
these schemes use higher - order information from force - gradient terms to both increase the convergence of the method and decrease the size of the leading error coefficient . other ideas to achieve bette

In [None]:
# send the chosen document to the model for summarization and print the predicted summary
predicted = summarize(document)
print(predicted)

<s> we present a new class of numerical time integration schemes for the hybrid monte carlo algorithm, which combine the advantages of both force - gradient integrators and multirate approaches.<n> we apply these schemes to the calculation of the two - dimensional quantum electrodynamics ( qed ) in the quenched approximation using the schwinger model as a test case.</s>
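The decoded text printed above contains `<s>`, `</s>` and the Pegasus line-break marker `<n>`. If these markers are left in, the ROUGE tokenizer may count them as words. Passing `skip_special_tokens=True` to `tokenizer.decode` drops special tokens such as `<s>` and `</s>`. Whether `<n>` is treated as a special token by this tokenizer is an assumption worth checking. Below is a minimal string-level sketch using a hypothetical helper, `clean_summary`, that removes all three markers before scoring.

```python
def clean_summary(text):
    """Remove Pegasus-style markers from a decoded summary before scoring (hypothetical helper).

    Assumes the markers seen in the output above: <s> and </s> wrap the text
    and <n> marks a line break between sentences.
    """
    for marker in ("<s>", "</s>"):
        text = text.replace(marker, " ")
    text = text.replace("<n>", "\n")
    # collapse repeated whitespace on each line and drop leading/trailing blanks
    return "\n".join(" ".join(line.split()) for line in text.splitlines()).strip()


raw = "<s> we present a new class of schemes.<n> we apply these schemes.</s>"
print(clean_summary(raw))
```

In the evaluation loop, this would be applied as `clean_summary(summarize(doc))` before calling `get_rouge_scores3`.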


# **4. Model Evaluation 🔍**

In [None]:
# define the function for the performance metrics (ROUGE-1, ROUGE-2, ROUGE-L, precision, recall, and F-1 scores)
def get_rouge_scores3(actual_summary, predicted_summary):
    rouge = Rouge()  # create an instance of the Rouge class
    scores = rouge.get_scores(predicted_summary, actual_summary)[0]  # ROUGE scores of the predicted summary (hypothesis) against the actual summary (reference)
    #extract the Rouge F1 scores for Rouge-1, Rouge-2, and Rouge-L from the scores dictionary and assign them to variables.
    rouge_1_f = scores['rouge-1']['f']
    rouge_2_f = scores['rouge-2']['f']
    rouge_l_f = scores['rouge-l']['f']

    # calculate the average precision and recall across ROUGE-1, ROUGE-2, and ROUGE-L
    # precision --> the fraction of the predicted summary's n-grams (for ROUGE-L, its longest common subsequence with the reference) that also appear in the human abstract
    precision = (scores['rouge-1']['p'] + scores['rouge-2']['p'] + scores['rouge-l']['p']) / 3
    # recall --> the fraction of the human abstract's n-grams (or longest common subsequence) that the predicted summary recovers
    recall = (scores['rouge-1']['r'] + scores['rouge-2']['r'] + scores['rouge-l']['r']) / 3
    # F-1 --> the harmonic mean of the averaged precision and recall (not the mean of the three ROUGE F-scores); returns 0 when precision + recall is 0 to avoid division by zero
    f1 = (2 * precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
    # return a list containing the performance metrics
    return [rouge_1_f, rouge_2_f, rouge_l_f, precision, recall, f1]


# initialize empty lists for the metrics and for the predicted summaries
rouge1_scores = []
rouge2_scores = []
rougel_scores = []
precision_scores = []
recall_scores = []
f1_scores = []
pred_summary_list = []
# for loop iterating over each entry in the data DataFrame.
for i in tqdm(range(len(data))):
    #extract the article text and store it in doc
    doc = data.loc[i]['article']
    # send in the doc to perform summarization
    pred_summary = summarize(doc)
    #extract the original abstract and store it in human_summary
    human_summary = data.loc[i]['abstract']
    # using get_rouge_scores3, calculate the scores
    scores = get_rouge_scores3(human_summary, pred_summary)
    # append each score to its respective list
    rouge1_scores.append(scores[0])
    rouge2_scores.append(scores[1])
    rougel_scores.append(scores[2])
    precision_scores.append(scores[3])
    recall_scores.append(scores[4])
    f1_scores.append(scores[5])
    # append the predicted summary to pred_summary_list
    pred_summary_list.append(pred_summary)

# add the predicted summaries to the data DataFrame for visualization
data["pred_summary"] = pred_summary_list

#adding the performance metrics scores to the dataframe as new columns.
data['rouge1'] = rouge1_scores
data['rouge2'] = rouge2_scores
data['rougel'] = rougel_scores
data['precision'] = precision_scores
data['recall'] = recall_scores
data['f1'] = f1_scores
#display the full DataFrame
data

 20%|██        | 2/10 [04:02<13:28, 101.03s/it]Input ids are automatically padded from 3768 to 3776 to be a multiple of `config.block_size`: 64
100%|██████████| 10/10 [04:40<00:00, 28.03s/it]


| | article | abstract | pred_summary | rouge1 | rouge2 | rougel | precision | recall | f1 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | for about 20 years the problem of properties o... | the short - term periodicities of the daily s... | `<s> the daily sunspot areas, the mean sunspot ...` | 0.4 | 0.154613 | 0.365957 | 0.25249 | 0.391076 | 0.306861 |
| 1 | it is believed that the direct detection of gr... | we study the detectability of circular polari... | `<s> we investigate the detectability of circul...` | 0.517241 | 0.240506 | 0.465517 | 0.442935 | 0.377893 | 0.407837 |
| 2 | as a common quantum phenomenon , the tunneling... | starting from the wkb approximation , a new b... | `<s> we present a new analytical formula for ba...` | 0.514286 | 0.262069 | 0.495238 | 0.447861 | 0.402409 | 0.42392 |
| 3 | for the hybrid monte carlo algorithm ( hmc)@xc... | we study a novel class of numerical integrato... | `<s> we present a new class of numerical time i...` | 0.425926 | 0.173913 | 0.37037 | 0.353333 | 0.298408 | 0.323557 |
| 4 | recently it was discovered that feynman integr... | new methods for obtaining functional equation... | `<s> a method for deriving functional equations...` | 0.231405 | 0.060241 | 0.132231 | 0.218056 | 0.104659 | 0.141435 |
| 5 | one of the main goals of the search for period... | in the hierarchical search for periodic sourc... | `<s> in this paper we propose a new frequency h...` | 0.251497 | 0.127119 | 0.203593 | 0.30499 | 0.142397 | 0.194149 |
| 6 | this review focuses specifically on what we ha... | i summarize what we have learned about the na... | `<s> i present a brief review of what we have l...` | 0.325991 | 0.108434 | 0.264317 | 0.354382 | 0.173478 | 0.232931 |
| 7 | single - transverse spin asymmetries ( ssas ) ... | we present a phenomenological study of the si... | `<s> we present predictions for the single - tr...` | 0.478632 | 0.263158 | 0.393162 | 0.411433 | 0.350163 | 0.378333 |
| 8 | kingman s coalescent is a random tree introduc... | kingman s coalescent is a random tree that ar... | `<s> it is well known that the time @xmath0 to ...` | 0.47205 | 0.139918 | 0.385093 | 0.454167 | 0.262488 | 0.332694 |
| 9 | rapid progress in the design and manufacture o... | we discuss several novel types of multi - com... | `<s> we discuss three different examples of mul...` | 0.504854 | 0.218978 | 0.446602 | 0.420923 | 0.363636 | 0.390188 |
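The `f1` column is not a ROUGE F-score. Instead, `get_rouge_scores3` takes the harmonic mean of the precision and recall averaged over ROUGE-1, ROUGE-2 and ROUGE-L. The small check below uses row 4 of the table (values rounded to six decimals) to compare this with the plain mean of the three ROUGE F-scores. The two are close here, but not identical.

```python
# Row 4 of the results table above (values rounded to 6 decimals)
row = {"rouge1": 0.231405, "rouge2": 0.060241, "rougel": 0.132231,
       "precision": 0.218056, "recall": 0.104659, "f1": 0.141435}

# what get_rouge_scores3 reports as f1: harmonic mean of the averaged precision and recall
p, r = row["precision"], row["recall"]
f1_of_averages = 2 * p * r / (p + r)

# a common alternative summary: the plain mean of the ROUGE-1/2/L F-scores
mean_of_f = (row["rouge1"] + row["rouge2"] + row["rougel"]) / 3

print(f"harmonic mean of averaged P/R: {f1_of_averages:.6f}")
print(f"mean of ROUGE-1/2/L F-scores:  {mean_of_f:.6f}")
```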


In [None]:
#print the average of each score
print("The Average of The Rouge-1 Score: ",data['rouge1'].mean())
print("The Average of The Rouge-2 Score: ",data['rouge2'].mean())
print("The Average of The Rouge-L Score: ",data['rougel'].mean())
print("The Average of The Precision Score: ",data['precision'].mean())
print("The Average of The Recall Score: ",data['recall'].mean())
print("The Average of The F-1 Score: ",data['f1'].mean())

The Average of The Rouge-1 Score:  0.4121882663082337
The Average of The Rouge-2 Score:  0.1748948793053628
The Average of The Rouge-L Score:  0.3522082008816609
The Average of The Precision Score:  0.36605692960426217
The Average of The Recall Score:  0.2866607900276158
The Average of The F-1 Score:  0.3131904232315196
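With only 10 articles, these averages carry noticeable sampling noise. Below is a minimal sketch that reports the spread alongside the mean. It uses the per-article ROUGE-1 F-scores copied from the results table (rounded to six decimals). In the notebook itself, `data['rouge1'].std()` gives the same sample standard deviation, since pandas defaults to `ddof=1`.

```python
import statistics

# Per-article ROUGE-1 F-scores copied from the results table above (rounded to 6 decimals)
rouge1 = [0.4, 0.517241, 0.514286, 0.425926, 0.231405,
          0.251497, 0.325991, 0.478632, 0.47205, 0.504854]

mean = statistics.mean(rouge1)
stdev = statistics.stdev(rouge1)   # sample standard deviation (divides by n - 1)
sem = stdev / len(rouge1) ** 0.5    # standard error of the mean
print(f"ROUGE-1 F: mean {mean:.4f}, std {stdev:.4f}, standard error {sem:.4f} (n={len(rouge1)})")
```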
