<a href="https://colab.research.google.com/github/ayoubbensakhria/finance_algo/blob/master/Bigbird_Pegasus_Evaluation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Evaluate 🤗's BigBirdPegasus on Pubmed**

In this notebook, we evaluate BigBird on long-document summarization using the **[pubmed](https://huggingface.co/datasets/scientific_papers)** dataset. BigBird was introduced in [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by *Manzil Zaheer et al.* It achieves strong results on long-document summarization thanks to an efficient block sparse attention mechanism. Please refer to this [blog post](https://huggingface.co/blog/big-bird) for a detailed explanation of BigBird's block sparse attention.

Let's check which GPU we have. This notebook needs about 12 GB of GPU memory or more.

In [1]:
!nvidia-smi

Fri Jun 10 10:45:38 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   41C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               

Let's first install `transformers`, `datasets`, `rouge_score` and `sentencepiece`.

In [7]:
%%capture
!pip3 install datasets
!pip3 install rouge_score
!pip3 install git+https://github.com/huggingface/transformers
!pip3 install sentencepiece

We will evaluate **BigBirdPegasus** on the **_pubmed_** dataset using the **ROUGE-2** metric. For this, let's import the two loading functions `load_dataset` and `load_metric`. Further, we import `BigBirdPegasusForConditionalGeneration` and the `AutoTokenizer` class.

In [3]:
from datasets import load_dataset, load_metric
import torch
from transformers import BigBirdPegasusForConditionalGeneration, AutoTokenizer
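ROUGE-2 scores a generated summary by its bigram (word-pair) overlap with the reference summary. To make the metric concrete, here is a minimal pure-Python sketch of ROUGE-2 F1. It is a simplified illustration, not the `rouge_score` implementation behind `load_metric("rouge")`: it only lowercases the text and keeps alphanumeric runs as tokens, without stemming, so its scores may differ from the official ones. The names `rouge2_f1` and `_bigrams` are our own.

```python
import re
from collections import Counter


def _bigrams(text):
    # Lowercase, keep alphanumeric runs as tokens, then pair adjacent tokens.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(zip(tokens, tokens[1:]))


def rouge2_f1(prediction, reference):
    """Simplified ROUGE-2 F1: clipped bigram overlap between two strings."""
    pred, ref = _bigrams(prediction), _bigrams(reference)
    # Counter '&' keeps the minimum count of each shared bigram (clipping).
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


print(rouge2_f1("the cat sat on the mat", "the cat sat on a mat"))
```

Clipping means a bigram that appears twice in the prediction but only once in the reference is counted once, so repeating phrases cannot inflate the score.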

Let's define some variables which will be useful later on.

In [4]:
DATASET_NAME = "pubmed"
DEVICE = "cuda"
CACHE_DIR = DATASET_NAME
MODEL_ID = f"google/bigbird-pegasus-large-{DATASET_NAME}"

To begin with, let's take a look at the PubMed dataset ([click to see on 🤗Datasets Hub](https://huggingface.co/datasets/scientific_papers)).
PubMed consists of scientific papers from the medical domain. The dataset splits each paper into an *article* and an *abstract*, where the article is the whole paper minus the abstract. The article is therefore the input to be summarized, and the abstract serves as the gold-label summary.

The following table summarizes the size of the *train*, *validation*, and *test* split of the dataset.

|               |Training | Validation | Test |
|---------------|---------|------------|------|
| Total samples | 119924  | 6633       | 6658 |

In this notebook, we are only interested in evaluating *BigBird*. To do so, let's download the *test* split of the `pubmed` dataset. This can take a couple of minutes ☕.
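A minimal sketch of this download step, assuming the `scientific_papers` dataset with its `pubmed` configuration and the `CACHE_DIR` variable defined above. The helper `load_pubmed_test` and its `num_samples` argument are hypothetical, added so that a quick trial run can use a small subset. Note that `scientific_papers` is a script-based dataset, which recent `datasets` releases may no longer load, so an older `datasets` version may be required.

```python
def load_pubmed_test(cache_dir=None, num_samples=None):
    """Load the PubMed test split; optionally keep only the first `num_samples` papers."""
    # Imported lazily so that defining the helper does not require `datasets`.
    from datasets import load_dataset

    # Each row has "article", "abstract" and "section_names" fields.
    dataset = load_dataset("scientific_papers", "pubmed", split="test", cache_dir=cache_dir)
    if num_samples is not None:
        # Dataset.select keeps only the given row indices.
        dataset = dataset.select(range(min(num_samples, len(dataset))))
    return dataset


# Hypothetical usage (downloads the data on the first call):
# test_dataset = load_pubmed_test(cache_dir=CACHE_DIR, num_samples=100)
# print(test_dataset[0]["abstract"][:200])
```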

The official checkpoint `google/bigbird-pegasus-large-pubmed` ([click to see on 🤗Model Hub](https://huggingface.co/google/bigbird-pegasus-large-pubmed)) has already been fine-tuned on pubmed, so we can simply load the weights and run the model in inference mode. We also load the `rouge` metric for scoring.

In [5]:
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = BigBirdPegasusForConditionalGeneration.from_pretrained(MODEL_ID).to(DEVICE)
rouge = load_metric("rouge")


`BigBirdPegasus` makes use of *block sparse attention*. Let's verify the `config`'s attention type and the `block_size`.

In [6]:
model.config.attention_type, model.config.block_size

('block_sparse', 64)
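To illustrate what `block_sparse` with a `block_size` of 64 means, the sketch below builds a block-level attention layout in the spirit of BigBird: each query block attends to a sliding window of three neighbouring blocks, to a few random blocks, and to global blocks (here, the first and last block) that see and are seen by every block. This is a simplified illustration under those assumptions, not the 🤗 Transformers implementation. `block_sparse_layout` is a hypothetical helper, and `num_random_blocks=3` is an assumed value; the checkpoint's actual setting is in `model.config.num_random_blocks`.

```python
import numpy as np


def block_sparse_layout(num_blocks, num_random_blocks=3, seed=0):
    """Simplified BigBird-style layout: entry [i, j] is True if query block i attends to key block j."""
    rng = np.random.default_rng(seed)
    layout = np.zeros((num_blocks, num_blocks), dtype=bool)

    # Global blocks: the first and last block attend to, and are attended by, everything.
    layout[[0, -1], :] = True
    layout[:, [0, -1]] = True

    for i in range(1, num_blocks - 1):
        # Sliding window: the block itself plus its left and right neighbours.
        layout[i, i - 1:i + 2] = True
        # Random blocks: sampled among key blocks not already attended.
        candidates = np.flatnonzero(~layout[i])
        k = min(num_random_blocks, candidates.size)
        layout[i, rng.choice(candidates, size=k, replace=False)] = True
    return layout


seq_len, block_size = 4096, 64  # block_size as reported by model.config.block_size
layout = block_sparse_layout(seq_len // block_size)
print(layout.shape, layout[2:-2].sum(axis=1).max())
```

For a 4096-token input (64 blocks), each interior query block in this sketch attends to 8 key blocks, i.e. 512 tokens instead of 4096, which is where the memory savings over full attention come from.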

Next, let's look at the length distribution of the dataset. The following table shows the median and the 90th percentile of the article and abstract (summary) lengths.

|                | Median | 90th percentile |
|----------------|--------|-----------------|
| Article length | 2715   | 6101            |
| Summary length | 212    | 318             |

`BigBirdPegasus` can handle sequences of up to **4096** tokens, which is well above the median input length of **2715**. However, since the 90th percentile of the article length (**6101**) exceeds **4096**, more than 10% of the inputs are longer than the model's limit and have to be truncated.
The summaries have a median length of **212**, with 90% of them shorter than **318**. Given this data, 256 seems a reasonable maximum generation length for the model.
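Statistics like those in the table above take only a few lines to compute. The sketch below is a minimal example; it assumes lengths are measured in tokens of the model's tokenizer (the table does not say how it was computed), and `length_stats` is a hypothetical helper. It also reports the fraction of samples longer than a given limit, i.e. the fraction that will be truncated.

```python
import numpy as np


def length_stats(lengths, limit):
    """Median, 90th percentile and share of samples longer than `limit`."""
    lengths = np.asarray(lengths)
    return {
        "median": float(np.median(lengths)),
        "p90": float(np.percentile(lengths, 90)),
        "frac_over_limit": float(np.mean(lengths > limit)),
    }


# Hypothetical usage with a loaded test split and the tokenizer from this notebook:
# article_lengths = [len(tokenizer(a).input_ids) for a in test_dataset["article"]]
# print(length_stats(article_lengths, limit=4096))
print(length_stats([100, 200, 300, 400, 5000], limit=4096))
```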

Now we can write the evaluation function for BigBirdPegasus.
First, each *article* is tokenized and padded or truncated to exactly 4096 tokens.
The predicted *abstract* is then generated with beam search (`num_beams=5`, `length_penalty=0.8`, at most 256 tokens). Finally, the generated token ids are decoded, and the resulting predicted *abstract* string is saved in the batch. The `summarize` function applies the same steps to an arbitrary input string and returns the summary.

In [8]:
def summarize(text):
  # Tokenize; shorter inputs are padded and longer ones truncated to exactly 4096 tokens.
  inputs_dict = tokenizer(text, padding="max_length", max_length=4096, return_tensors="pt", truncation=True).to(DEVICE)
  # Beam search with 5 beams; length_penalty rescales beam scores by output length.
  predicted_abstract_ids = model.generate(**inputs_dict, max_length=256, num_beams=5, length_penalty=0.8)
  # Decode the single returned sequence back into text.
  summary = tokenizer.decode(predicted_abstract_ids[0], skip_special_tokens=True)
  print(summary)
  return summary

def generate_answer(batch):
  # Summarize a single example's article and store the prediction in the example.
  batch["predicted_abstract"] = summarize(batch["article"])
  return batch
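`generate_answer` handles one article per call and decodes only the first output sequence. A batched variant can reduce per-call overhead; the sketch below is our own assumption, not the notebook's evaluation code. It takes the tokenizer and model as arguments so it can be passed to `Dataset.map(..., batched=True, fn_kwargs=...)`. The name `generate_batch` is hypothetical, and the batch size that fits depends on GPU memory: 4096-token inputs decoded with 5 beams are memory-hungry, so start small.

```python
def generate_batch(batch, tokenizer, model, device="cuda",
                   max_input_length=4096, max_output_length=256):
    """Summarize a batch of articles; intended for Dataset.map(batched=True)."""
    # Tokenize all articles together, padded/truncated to a fixed length.
    inputs = tokenizer(batch["article"], padding="max_length",
                       max_length=max_input_length, truncation=True,
                       return_tensors="pt").to(device)
    # Same decoding settings as generate_answer.
    output_ids = model.generate(**inputs, max_length=max_output_length,
                                num_beams=5, length_penalty=0.8)
    # One decoded summary per input article.
    batch["predicted_abstract"] = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
    return batch


# Hypothetical usage with a loaded pubmed test split:
# results = test_dataset.map(generate_batch, batched=True, batch_size=2,
#                            fn_kwargs={"tokenizer": tokenizer, "model": model, "device": DEVICE})
# rouge.compute(predictions=results["predicted_abstract"],
#               references=results["abstract"], rouge_types=["rouge2"])
```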

In [9]:
text = """
Erythropoietin (EPO) is a growth factor produced in the kidneys that stimulates the production of red blood cells. It works by promoting the division and differentiation of committed erythroid progenitors in the bone marrow [FDA Label]. Epoetin alfa (Epoge) was developed by Amgen Inc. in 1983 as the first rhEPO commercialized in the United States, followed by other alfa and beta formulations. Epoetin alfa is a 165-amino acid erythropoiesis-stimulating glycoprotein produced in cell culture using recombinant DNA technology and is used for the treatment of patients with anemia associated with various clinical conditions, such as chronic renal failure, antiviral drug therapy, chemotherapy, or a high risk for perioperative blood loss from surgical procedures [FDA Label]. It has a molecular weight of approximately 30,400 daltons and is produced by mammalian cells into which the human erythropoietin gene has been introduced. The product contains the identical amino acid sequence of isolated natural erythropoietin and has the same biological activity as the endogenous erythropoietin. Epoetin alfa biosimilar, such as Retacrit (epoetin alfa-epbx or epoetin zeta), has been formulated to allow more access to treatment options for patients in the market [L2784]. The biosimilar is approved by the FDA and EMA as a safe, effective and affordable biological product and displays equivalent clinical efficacy, potency, and purity to the reference product [A7504]. Epoetin alfa formulations can be administered intravenously or subcutaneously.
"""

summarize(text)



the first biosimilar of erythropoietin ( epoetin alfa ) has been approved for use in the united states.<n> biosimilar erythropoietin ( epoetin ) is a monoclonal antibody that binds the cognate receptor erythropoietin.<n> biosimilar erythropoietin ( epoetin alfa ) is characterized by low incidence of serious adverse events, comparable efficacy and comparable pharmacokinetics and pharmacodynamics.<n> the most common adverse events with biosimilar erythropoietin ( epoetin alfa ) are fatigue, myalgia, headache, nausea, and diarrhea.<n> the incidence of serious adverse events with biosimilar erythropoietin ( epoetin alfa ) is similar to that with other commonly used monoclonal antibodies.<n> biosimilar erythropoietin ( epoetin alfa ) is also used in the treatment of anemia associated with various clinical conditions, such as chronic renal failure, antiviral drug therapy, chemotherapy, or perioperative blood loss.<n> the efficacy of biosimilar erythropoietin ( epoetin alfa ) is similar to th

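The decoded summary contains literal `<n>` markers between sentences; Pegasus-style checkpoints use `<n>` to stand for a newline. For display, one can turn them back into line breaks. A minimal sketch; `clean_summary` is a hypothetical helper, and applying the same cleanup before scoring is a choice worth considering, since a tokenizer that strips punctuation may otherwise leave stray `n` tokens.

```python
import re


def clean_summary(text):
    """Replace Pegasus-style "<n>" newline markers with real newlines."""
    # Split on each marker together with its surrounding whitespace.
    lines = re.split(r"\s*<n>\s*", text)
    # Put one sentence per line, dropping empty pieces.
    return "\n".join(line.strip() for line in lines if line.strip())


print(clean_summary("first sentence.<n> second sentence.<n> third"))
```

In this notebook it could be applied as `print(clean_summary(summarize(text)))`.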