# Text Summarization Using the `transformers` Library

---



In [8]:
import transformers

# Suppress warning messages from the transformers library to keep the output clean.
transformers.logging.set_verbosity_error()

In [9]:
# Define a verbose text containing information about Earth.
verbose_text = """
Earth is the third planet from the Sun and the only astronomical object
known to harbor life.
While large volumes of water can be found
throughout the Solar System, only Earth sustains liquid surface water.
About 71% of Earth's surface is made up of the ocean, dwarfing
Earth's polar ice, lakes, and rivers.
The remaining 29% of Earth's
surface is land, consisting of continents and islands.
Earth's surface layer is formed of several slowly moving tectonic plates,
interacting to produce mountain ranges, volcanoes, and earthquakes.
Earth's liquid outer core generates the magnetic field that shapes Earth's
magnetosphere, deflecting destructive solar winds.
"""

# Collapse newlines (and any repeated whitespace) into single spaces so that the word
# at the end of one line does not merge with the word at the start of the next.
verbose_text = " ".join(verbose_text.split())
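The separator matters because every line break in the triple-quoted string sits between two words. A minimal sketch on a short hypothetical snippet (assumed to have no trailing space before the newline) shows the difference between deleting newlines and normalizing whitespace:

```python
snippet = "the only astronomical object\nknown to harbor life."

# Deleting the newline outright glues "object" and "known" together.
merged = snippet.replace("\n", "")

# Splitting on any whitespace and rejoining with single spaces keeps the words apart
# and also drops leading and trailing whitespace.
normalized = " ".join(snippet.split())

print(merged)      # the only astronomical objectknown to harbor life.
print(normalized)  # the only astronomical object known to harbor life.
```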


In [10]:

# Import the pipeline function from the transformers library.
from transformers import pipeline

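# Initialize the summarization pipeline. min_length and max_length bound the
# length of the generated summary in tokens (not words).
# Naming the checkpoint explicitly (it is the default one, as the config output
# below shows) keeps results reproducible if the library's default ever changes.
summarizer = pipeline("summarization",
                      model="sshleifer/distilbart-cnn-12-6",
                      min_length=10,
                      max_length=100)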

# Generate a summary of the verbose text.
summary = summarizer(verbose_text)



In [11]:

# Extract and print the summary text from the result.
print(summary[0].get("summary_text"))

 Earth is the third planet from the Sun and the only astronomical object known to harbor life . About 71% of Earth's surface is made up of the ocean, dwarfing Earth's polar ice, lakes, and rivers . The remaining 29% of the surface is land, consisting of continents and islands .


In [12]:
print("Checkpoint used: ", summarizer.model.config)

Checkpoint used:  BartConfig {
  "_name_or_path": "sshleifer/distilbart-cnn-12-6",
  "_num_labels": 3,
  "activation_dropout": 0.0,
  "activation_function": "gelu",
  "add_bias_logits": false,
  "add_final_layer_norm": false,
  "architectures": [
    "BartForConditionalGeneration"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 0,
  "classif_dropout": 0.0,
  "classifier_dropout": 0.0,
  "d_model": 1024,
  "decoder_attention_heads": 16,
  "decoder_ffn_dim": 4096,
  "decoder_layerdrop": 0.0,
  "decoder_layers": 6,
  "decoder_start_token_id": 2,
  "dropout": 0.1,
  "early_stopping": true,
  "encoder_attention_heads": 16,
  "encoder_ffn_dim": 4096,
  "encoder_layerdrop": 0.0,
  "encoder_layers": 12,
  "eos_token_id": 2,
  "extra_pos_embeddings": 2,
  "force_bos_token_to_be_generated": true,
  "forced_bos_token_id": 0,
  "forced_eos_token_id": 2,
  "gradient_checkpointing": false,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2"
  },
  "init_std": 0.02,
  ...

## Evaluation Using the ROUGE Metric


### Using a Single Text


In [13]:
!pip install rouge_score
!pip install evaluate



In [14]:
import evaluate

# Load the ROUGE evaluation metric using the evaluate library.
rouge_evaluator = evaluate.load("rouge")

# Define the reference and prediction texts for evaluation.
reference_text = ["This is the same string"]
predict_text = ["This is the same string"]

# Compute the ROUGE scores by comparing the prediction with the reference text.
eval_results = rouge_evaluator.compute(predictions=predict_text,
                                       references=reference_text)

# Display the ROUGE evaluation results for the exact match scenario.
print("Results for Exact Match:", eval_results)

Results for Exact Match: {'rouge1': 1.0, 'rouge2': 1.0, 'rougeL': 1.0, 'rougeLsum': 1.0}


In [15]:
# Define the reference and predicted texts for evaluation, which do not match.
reference_text = ["This is the different string"]
predict_text = ["Google can predict warm weather"]

# Compute the ROUGE scores to assess the similarity between the reference and predicted texts.
eval_results = rouge_evaluator.compute(predictions=predict_text,
                                       references=reference_text)

# Print the evaluation results for the case where there is no match between reference and predicted texts.
print("\nEvaluation Results for No Match:", eval_results)



Evaluation Results for No Match: {'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}


**Observation:**

**Computing ROUGE Scores:**

The `rouge_evaluator.compute` method compares the predicted text against the reference text using ROUGE metrics. Each score is reported as an F-measure between 0 and 1. The key ROUGE scores are:

- ROUGE-1: Measures the overlap of unigrams (single words).
- ROUGE-2: Measures the overlap of bigrams (two-word sequences).
- ROUGE-L: Measures the longest common subsequence.

Because the reference and prediction in this example share no words at all, every ROUGE score is exactly 0.0, indicating no overlap.
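The n-gram scores can be sketched in a few lines of plain Python. This is a simplified re-implementation for illustration, not the `evaluate` or `rouge_score` code: it assumes a tokenizer that lowercases the text and keeps only runs of ASCII letters and digits (approximating the `rouge_score` default without stemming), and it returns the F-measure of clipped n-gram matches. The helper names `tokenize`, `ngrams`, and `rouge_n` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase, replace every run of non-alphanumeric characters with a space, then split.
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).split()


def ngrams(tokens: list[str], n: int) -> Counter:
    # Count every contiguous n-token sequence.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(prediction: str, reference: str, n: int) -> float:
    pred_counts = ngrams(tokenize(prediction), n)
    ref_counts = ngrams(tokenize(reference), n)
    # Clipped overlap: Counter & Counter keeps the minimum count of each shared n-gram.
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    # F-measure: harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


for pred, ref in [("This is the same string", "This is the same string"),
                  ("Google can predict warm weather", "This is the different string")]:
    print(rouge_n(pred, ref, 1), rouge_n(pred, ref, 2))
```

With identical texts, precision and recall are both 1, so the F-measure is 1.0; with no shared words, the overlap is empty and the score is 0.0, matching the two cells above.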


**Explanation of Results**

- ROUGE-1 Score: A score of 0.0 indicates no overlap between unigrams (words) in the reference and prediction.
- ROUGE-2 Score: A score of 0.0 indicates no overlap between bigrams (two-word sequences) in the reference and prediction.
- ROUGE-L Score: A score of 0.0 indicates that the reference and prediction share no words, so their longest common subsequence is empty.
- ROUGE-Lsum Score: The summary-level variant of ROUGE-L, which splits each text into sentences on newline characters before computing the longest common subsequence; it is also 0.0 because there is no overlap.
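ROUGE-L replaces n-gram counting with the length of the longest common subsequence (LCS): words must appear in the same order in both texts but need not be adjacent. A minimal sketch, again assuming a simplified lowercase alphanumeric tokenizer rather than the exact `rouge_score` implementation; `lcs_length` and `rouge_l` are hypothetical helper names:

```python
import re


def tokenize(text: str) -> list[str]:
    # Lowercase and split on any run of characters that are not ASCII letters or digits.
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).split()


def lcs_length(a: list[str], b: list[str]) -> int:
    # Dynamic-programming LCS: prev[j] holds the LCS length of the tokens of `a`
    # processed so far and the first j tokens of `b`.
    prev = [0] * (len(b) + 1)
    for token_a in a:
        curr = [0] * (len(b) + 1)
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l(prediction: str, reference: str) -> float:
    pred, ref = tokenize(prediction), tokenize(reference)
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


print(rouge_l("This is the same string", "This is the same string"))              # 1.0
print(rouge_l("Google can predict warm weather", "This is the different string"))  # 0.0
```

Because an LCS can skip words, ROUGE-L can never exceed ROUGE-1 for the same pair, but it rewards predictions that keep the reference's word order.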

### Using Multiple Texts

In [18]:
import evaluate

# Load the ROUGE evaluation metric using the evaluate library.
rouge_evaluator = evaluate.load("rouge")

# Define lists of reference texts and corresponding predicted texts for evaluation.
reference_texts = [
    "Earth is the third planet from the Sun and the only astronomical object known to harbor life.",
    "The Solar System contains large volumes of water, but only Earth has liquid surface water."
]
predict_texts = [
    "Earth is the third planet from the Sun and is known to harbor life.",
    "Earth is the only planet with liquid surface water."
]

# Compute the ROUGE scores by comparing the lists of predictions with the lists of reference texts.
eval_results = rouge_evaluator.compute(predictions=predict_texts,
                                       references=reference_texts)

# Display the ROUGE evaluation results for the given texts.
print("Results for Multiple Texts:", eval_results)


Results for Multiple Texts: {'rouge1': 0.6693548387096775, 'rouge2': 0.4702194357366771, 'rougeL': 0.6276881720430108, 'rougeLsum': 0.6276881720430108}
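These numbers can be checked by hand. Assuming the default tokenization (lowercased, punctuation removed, no stemming), pair 1 has 14 prediction tokens and 17 reference tokens, and pair 2 has 9 and 15. Writing $m$ for the number of matched units (clipped n-gram matches for ROUGE-1/2, LCS length for ROUGE-L), the F-measure simplifies to $2m$ divided by the total unit count, and the reported values match the mean over the two pairs:

$$
\begin{aligned}
F &= \frac{2PR}{P+R} = \frac{2m}{\lvert\text{pred}\rvert + \lvert\text{ref}\rvert} \\[4pt]
\text{ROUGE-1} &= \tfrac{1}{2}\left(\frac{2\cdot 13}{14+17} + \frac{2\cdot 6}{9+15}\right) = \tfrac{1}{2}(0.8387 + 0.5000) \approx 0.669 \\
\text{ROUGE-2} &= \tfrac{1}{2}\left(\frac{2\cdot 11}{13+16} + \frac{2\cdot 2}{8+14}\right) = \tfrac{1}{2}(0.7586 + 0.1818) \approx 0.470 \\
\text{ROUGE-L} &= \tfrac{1}{2}\left(\frac{2\cdot 13}{14+17} + \frac{2\cdot 5}{9+15}\right) = \tfrac{1}{2}(0.8387 + 0.4167) \approx 0.628
\end{aligned}
$$

For ROUGE-2 the counts are bigrams (one fewer than tokens per text). Pair 2 shares only "liquid surface" and "surface water", which is why its bigram score is so much lower than pair 1's.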


### Observations:

1. **ROUGE-1 Score (0.669):**
   - **Meaning:** ROUGE-1 measures the overlap of unigrams (single words) between the predicted and reference texts. The reported value is the F-measure (harmonic mean of unigram precision and recall) averaged over the two pairs; 0.669 indicates a moderate level of word overlap.
   - **Interpretation:** The average hides a split between the pairs: the first prediction reuses most of its reference's words (F-measure of about 0.84), while the second rewords its reference and shares only six words with it (0.50). ROUGE counts surface word matches, so it rewards reusing the reference's vocabulary rather than capturing its meaning as such.

2. **ROUGE-2 Score (0.470):**
   - **Meaning:** ROUGE-2 measures the overlap of bigrams (pairs of consecutive words) between the prediction and the reference. At 0.470 it is lower than ROUGE-1, which is expected: a bigram only matches when two words appear next to each other, in the same order, in both texts.
   - **Interpretation:** Many matching words do not appear in the same adjacent pairs. The drop comes mostly from the second pair, whose prediction shares only two bigrams with its reference ("liquid surface" and "surface water"), while the first prediction copies long runs of its reference verbatim.

3. **ROUGE-L Score (0.628):**
   - **Meaning:** ROUGE-L is based on the longest common subsequence (LCS): the longest sequence of words that appears in both texts in the same order, though not necessarily contiguously. A score of 0.628 reflects a moderate amount of in-order overlap.
   - **Interpretation:** ROUGE-L sits between ROUGE-2 and ROUGE-1 here. It credits matching words that keep their relative order without requiring them to be adjacent, so it reflects sentence-level word order but does not directly measure coherence or factual correctness.

4. **ROUGE-Lsum Score (0.628):**
   - **Meaning:** ROUGE-Lsum is the summary-level variant of ROUGE-L. In the `rouge_score` package that `evaluate` uses, it splits each text into sentences on newline characters and computes a union LCS across sentences, instead of treating the whole text as one sequence.
   - **Interpretation:** The score equals ROUGE-L because every text here is a single line with no newline characters, so both variants compare the same two sequences. They would diverge only for multi-sentence summaries written with one sentence per line.

### Summary of Observations:

- **Overall Alignment:** ROUGE-1 (0.669) and ROUGE-L (0.628) are moderate, indicating that the predictions reuse many of the reference words, largely in the same order. ROUGE-2 (0.470) is lower, meaning fewer of those words appear in the same adjacent pairs.

- **Improvement Areas:** Most of the gap comes from the second pair, which conveys a similar idea in different words. Because ROUGE counts only exact token matches (with no stemming or synonym handling by default), a faithful paraphrase can still score low; predictions that reuse the reference's phrasing would score higher on ROUGE-2.

- **Consistency:** ROUGE-L and ROUGE-Lsum are identical here only because each text is a single line; this reflects the format of the inputs rather than an additional property of the predictions.
