# **Hands-On Text Summarization with Pretrained Models 📝🤖**

## 1. Introduction 📘
Welcome to this hands-on guide to **Text Summarization**! 

In this notebook, we will explore how to automatically summarize text using state-of-the-art **Pretrained Transformer Models** from Hugging Face. You will learn how to load these models, process text, and generate concise summaries for news articles or stories.

**By the end of this notebook, you will understand:**
*   What pretrained models and transformers are.
*   The difference between extractive and abstractive summarization.
*   How to use libraries like `transformers` to perform NLP tasks.
*   How to implement and compare different models like **BART**, **Pegasus**, and **mT5**.

---
## 2. What Are Pretrained Models? 🧠

### **The Era of Transfer Learning**
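Training a neural network for NLP from scratch requires massive datasets and days or weeks of computing time. Today, we use **Transfer Learning** to reuse that work:
1.  **Pretraining**: A model is trained on a very large text corpus (for example, a large crawl of web pages) to learn general properties of language such as grammar, context, and common facts.
2.  **Fine-Tuning**: The pretrained model is trained further on a smaller, task-specific dataset, such as pairs of articles and summaries. In this notebook we skip this step: we download checkpoints that are already fine-tuned for summarization and use them for **inference** only.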

### **Why Transformers?**
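*   **Context Awareness**: Unlike older recurrent models (RNNs), which read text one token at a time, Transformers can look at the *entire* input sequence at once using a mechanism called **Self-Attention**.
*   **Parallel Processing**: Because all input tokens are processed together, training parallelizes well on GPUs. (Text generation still produces its output one token at a time.)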
*   **State-of-the-Art**: Models like BERT, GPT, and T5 are all based on the Transformer architecture.
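
To make the self-attention idea above more concrete, here is a minimal pure-Python sketch. It is a simplification under stated assumptions: the token vectors are made-up numbers, and queries, keys, and values are all the same vectors, whereas real Transformers first apply learned projections. The function names are illustrative and not part of any library.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values.

    Real Transformers derive separate queries, keys, and values through
    learned projections; that step is left out of this sketch.
    """
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Score this token against every token in the sequence, itself included.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights = softmax(scores)
        # The new representation is a weighted average of all token vectors.
        mixed = [sum(w * v[i] for w, v in zip(weights, vectors))
                 for i in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Three made-up 2-dimensional token vectors (illustrative values only).
toy_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual, attention = self_attention(toy_tokens)
for row in attention:
    print([round(w, 2) for w in row])
```

Each printed row shows how strongly one token attends to every token in the sequence. All rows are computed independently of each other, which is what makes the computation easy to parallelize.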

---
## 3. What Is Text Summarization? ✨

Text summarization is the task of shortening a text while keeping its main ideas. There are two main types:

| Type | Description | Analogy |
| :--- | :--- | :--- |
| **Extractive** | Selects and copies the most important sentences directly from the text. | Like highlighting key sentences with a marker. |
| **Abstractive** | Generates new sentences to rephrase the core meaning. It can use words not present in the original text. | Like a human rewriting a summary in their own words. |

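**In this notebook, we will focus on Abstractive Summarization**, which is more flexible but harder to get right than extractive methods, because the model may also introduce details that are not in the source. Sequence-to-sequence Transformers are well suited to this task!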
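
For contrast with the abstractive models used later, the extractive approach from the table can be sketched in a few lines of standard-library Python. This is a toy heuristic (average word frequency per sentence), not a method used elsewhere in this notebook; the function name and sample text are made up for illustration.

```python
import re
from collections import Counter

def extractive_summary(text, num_sentences=1):
    """Return the highest-scoring sentences, copied verbatim in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    word_counts = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        words = re.findall(r"[a-z']+", sentence.lower())
        # Average word frequency, so long sentences are not favoured automatically.
        return sum(word_counts[w] for w in words) / max(len(words), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])  # restore document order
    return " ".join(sentences[i] for i in chosen)

toy_text = (
    "Forks are common at the dinner table. "
    "Some diners use forks and knives together at the table. "
    "The weather was pleasant that evening."
)
print(extractive_summary(toy_text, num_sentences=1))
```

Whatever this function returns is always a sentence copied from the input. An abstractive model has no such guarantee, which is both its strength and its risk.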


---
## 4. Hugging Face & Pretrained Models 🚀

[Hugging Face](https://huggingface.co/) is the "GitHub of Machine Learning". It provides:
*   **The Hub**: A repository of thousands of pretrained models shared by the community and by organizations such as Google, Facebook (Meta), and Microsoft.
*   **`transformers` Library**: A Python package that makes it incredibly easy to download and use these models.

We will be using three famous models today:
1.  **BART** (Facebook)
2.  **Pegasus** (Google)
3.  **mT5** (Multilingual T5)

---
## 5. Notebook Walkthrough
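### 🛠️ Step 1: Install Dependencies
We need the `transformers` library to access the models and `torch` (PyTorch) as the underlying deep learning framework. The Pegasus and mT5 tokenizers also depend on `sentencepiece`. Uncomment the line below if these packages are not already installed.

In [1]:
#!pip install transformers torch sentencepiece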


### 📝 Step 2: Define the Source Text
Here, we define the sample text that we want to summarize: an excerpt from a news article about differences in dining etiquette between the US and Denmark.

In [2]:
text = """
When Brooke Black and her Danish husband first lived together in the United States, she doesn’t recall their different dining habits ever really being a thing. It wasn’t until the 44-year-old mother of two moved to Denmark in 2020 that she became acutely aware that she didn’t use eating utensils like her husband -- or pretty much any of the Europeans around her.
Growing up in Illinois, Black says her mother only set their family dinner table with forks, unless there was something being served, such as steak, that warranted a knife to cut it.
“I have not used a knife my whole life,” says Black, who shares cultural commentary about her daily life in Denmark on her Instagram account. While she jokes that she “stands by that a fork can also be a knife,” she never learned to eat in the “zigzagging” manner of many Americans who will cut meat with the knife in their dominant hand before switching the fork back into that one to eat.
But in Denmark at family gatherings, with her fork held in her right hand from the get-go -- tines up -- and her knife largely untouched beside the plate, Black soon realized she stuck out.
"""

---
## 6. Models Used in This Notebook

### 🔹 Model 1: BART (Facebook)
**BART (Bidirectional and Auto-Regressive Transformers)** is a model designed for text generation. 
*   **Architecture**: It combines the bidirectional encoder of BERT (good for understanding) with the autoregressive decoder of GPT (good for writing).
*   **Best For**: Summarization and translation.
*   **Why use it?**: It is robust and produces coherent, fluent English summaries.

### ⚙️ Step 3: Load BART Model & Tokenizer
We load the `facebook/bart-large-cnn` model, which has been specifically fine-tuned on the CNN/Daily Mail dataset for news summarization.
*   **Tokenizer**: Splits text into tokens and maps them to the integer IDs the model expects.
*   **Model**: The brain that performs the summarization.
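
To illustrate what "text to IDs and back" means, here is a toy word-level tokenizer. It is only a sketch: the real BART tokenizer works on subword units with a vocabulary learned from data, and the helper names below are hypothetical, not part of `transformers`.

```python
def build_vocab(corpus):
    """Assign an integer ID to every distinct lowercase word; ID 0 marks unknown words."""
    vocab = {"<unk>": 0}
    for word in corpus.lower().split():
        vocab.setdefault(word, len(vocab))
    return vocab

def encode(text, vocab, max_length=None):
    """Map words to IDs, optionally truncating to the first `max_length` IDs."""
    ids = [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]
    return ids[:max_length] if max_length is not None else ids

def decode(ids, vocab):
    """Map IDs back to words."""
    id_to_word = {i: w for w, i in vocab.items()}
    return " ".join(id_to_word[i] for i in ids)

toy_vocab = build_vocab("she moved to denmark in 2020")
ids = encode("She moved to Norway", toy_vocab)
print(ids)                      # "norway" is not in the vocabulary
print(decode(ids, toy_vocab))
print(encode("she moved to denmark in 2020", toy_vocab, max_length=3))
```

The last call mimics truncation: anything beyond the length limit is simply dropped before the model sees it.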

In [3]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name_bart = "facebook/bart-large-cnn"  # this checkpoint is BART, not BERT
tokenizer = AutoTokenizer.from_pretrained(model_name_bart)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name_bart)


### 🧪 Step 4: Create the Summarization Function
We define a function `summarize` to handle the generation process.
*   **`tokenizer(...)`**: Prepares the text. `truncation=True` together with `max_length` cuts off any input beyond `max_input_len` tokens, so very long articles lose their tail.
*   **`model.generate(...)`**:
    *   `num_beams=4`: Uses "Beam Search", which keeps the 4 most probable partial summaries at each step instead of committing to the single most likely next token.
    *   `no_repeat_ngram_size=2`: Prevents any 2-token sequence from appearing twice in the summary.
    *   `max_length`: Caps the summary length, measured in tokens.
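
The difference between greedy decoding and beam search can be seen on a toy example. The sketch below uses a hand-written, hypothetical table of next-token probabilities in place of a real model, and it leaves out refinements that library implementations add (such as length penalties), so treat it as an illustration of the idea rather than of what `generate` does internally.

```python
import math

# Hypothetical next-token probabilities, keyed by the previous token.
# A real model conditions on the whole prefix; this table is a toy stand-in.
NEXT_TOKEN_PROBS = {
    "<s>": {"the": 0.5, "a": 0.4, "</s>": 0.1},
    "the": {"cat": 0.4, "dog": 0.3, "</s>": 0.3},
    "a": {"fork": 0.9, "</s>": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
    "fork": {"</s>": 1.0},
}

def beam_search(num_beams, max_length=4):
    """Return the most probable sequence found while tracking `num_beams` candidates."""
    beams = [(["<s>"], 0.0)]  # (tokens, cumulative log-probability)
    for _ in range(max_length):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == "</s>":
                candidates.append((tokens, score))  # finished; carry over unchanged
                continue
            for token, prob in NEXT_TOKEN_PROBS[tokens[-1]].items():
                candidates.append((tokens + [token], score + math.log(prob)))
        # Keep only the `num_beams` most probable partial sequences.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
    best_tokens, best_score = beams[0]
    return best_tokens, math.exp(best_score)

print("greedy (1 beam):", beam_search(num_beams=1))
print("beam search (2 beams):", beam_search(num_beams=2))
```

With one beam the search commits to "the" because it is the most likely first token, and ends up with a less probable sequence overall. With two beams, the path starting with "a" survives the first step and wins.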

In [4]:
import re

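def summarize(text, max_input_len=512, max_summary_len=100):
    """Summarize `text` with the BART tokenizer and model loaded above."""
    text = re.sub(r'\s+', ' ', text.strip())  # collapse whitespace and newlines
    # Tokenize; inputs longer than max_input_len tokens are truncated.
    inputs = tokenizer(
        text, return_tensors="pt", truncation=True, max_length=max_input_len
    )
    output_ids = model.generate(
        **inputs,  # passes input_ids together with the attention_mask
        max_length=max_summary_len,
        num_beams=4,
        no_repeat_ngram_size=2
    )
    # Convert the generated token IDs back to text, dropping special tokens.
    summary = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return summary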

### 🚀 Step 5: Run BART Summarization
Let's see what BART thinks is important in our text!

In [5]:
summary = summarize(text)
print("📄 Summary:\n", summary)

📄 Summary:
 Brooke Black moved to Denmark with her husband in 2020. She quickly realized she didn't use eating utensils like other Europeans. The 44-year-old mother of two shares cultural commentary about her daily life in Denmark on her Instagram account, @brookeblackdanes.


---
### 🔹 Model 2: Pegasus (Google)
**Pegasus (Pre-training with Extracted Gap-sentences for Abstractive SUmmarization)** was designed explicitly for summarization.
*   **Unique Training**: During pretraining, whole sentences judged important were masked out of each document, and the model had to generate them from the remaining text.
*   **Strength**: Because this objective closely resembles summarization itself, Pegasus reported strong results on summarization benchmarks compared to general-purpose models.
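
The gap-sentence idea described above can be sketched as follows. This is a simplified illustration, not the actual Pegasus pipeline: plain word overlap stands in for the ROUGE-based importance score used in the paper, only one sentence is masked, and the function name and mask placeholder are chosen for this example.

```python
import re

def gap_sentence_example(document, mask_token="<mask_1>"):
    """Mask the sentence that overlaps most with the rest of the document.

    Returns (masked_document, removed_sentence): the model input and the
    target it would be trained to generate.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", document.strip()) if s]

    def words(text):
        return set(re.findall(r"[a-z']+", text.lower()))

    def overlap_with_rest(index):
        rest = words(" ".join(s for i, s in enumerate(sentences) if i != index))
        own = words(sentences[index])
        return len(own & rest) / max(len(own), 1)

    # The "most important" sentence is the one best supported by the others.
    target_index = max(range(len(sentences)), key=overlap_with_rest)
    masked = [mask_token if i == target_index else s for i, s in enumerate(sentences)]
    return " ".join(masked), sentences[target_index]

toy_document = (
    "Black moved to Denmark in 2020. "
    "In Denmark, Black noticed that people use a knife and fork. "
    "She grew up in Illinois."
)
model_input, target = gap_sentence_example(toy_document)
print("input :", model_input)
print("target:", target)
```

Training on many such pairs teaches the model to produce a sentence that captures what the rest of the document is about, which is close to what a summary does.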

### ⚙️ Step 6: Load Pegasus Model
We are using `google/pegasus-cnn_dailymail`, also fine-tuned for news.

In [6]:
model_name_pegasus = "google/pegasus-cnn_dailymail"
tokenizer_pegasus = AutoTokenizer.from_pretrained(model_name_pegasus)
model_pegasus = AutoModelForSeq2SeqLM.from_pretrained(model_name_pegasus)


Some weights of PegasusForConditionalGeneration were not initialized from the model checkpoint at google/pegasus-cnn_dailymail and are newly initialized: ['model.decoder.embed_positions.weight', 'model.encoder.embed_positions.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



### 🧪 Step 7: Define Pegasus Function
Similar to the BART function, but using the Pegasus tokenizer and model instances.

In [7]:
def summarize_pegasus(text, max_input_len=512, max_summary_len=100):
    import re
    text = re.sub(r'\s+', ' ', text.strip())
    inputs = tokenizer_pegasus(text, return_tensors="pt", truncation=True, max_length=max_input_len)
    output_ids = model_pegasus.generate(inputs["input_ids"], max_length=max_summary_len, num_beams=4, no_repeat_ngram_size=2)
    summary = tokenizer_pegasus.decode(output_ids[0], skip_special_tokens=True)
    return summary

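### 🚀 Step 8: Run Pegasus Summarization
Compare this result with BART's. Is it more concise? Does it capture different details? The `<n>` markers in the raw output are Pegasus's line-break tokens; they separate the generated sentences.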

In [8]:
# Example
print("📄 PEGASUS Summary:\n", summarize_pegasus(text))

📄 PEGASUS Summary:
 Brooke Black and her Danish husband first lived together in the United States in 2020 .<n>It wasn't until the 44-year-old mother of two moved to Denmark that she became acutely aware she didn’t use eating utensils like her husband -- or pretty much any of the Europeans around her <n>Growing up in Illinois, Black says her mother only set their family dinner table with forks, unless there was something being served, such as steak, that warranted a knife to cut it
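
When comparing summaries, it also helps to check them against the source. For instance, the BART summary above ends with an Instagram handle that never appears in the input text. A small helper can flag such words; the sketch below is a rough word-level check with a made-up function name and shortened sample strings, so it will also flag harmless rephrasings.

```python
import re

def novel_words(source, summary):
    """Return summary words (lowercased, in order, without repeats) absent from the source."""
    def tokenize(text):
        # Normalize curly apostrophes so "didn’t" and "didn't" match.
        return re.findall(r"[a-z0-9@']+", text.lower().replace("’", "'"))

    source_words = set(tokenize(source))
    seen, novel = set(), []
    for word in tokenize(summary):
        if word not in source_words and word not in seen:
            seen.add(word)
            novel.append(word)
    return novel

# Shortened stand-ins for the notebook's `text` and a model summary (illustrative only).
toy_source = "Black shares cultural commentary about her daily life in Denmark on her Instagram account."
toy_summary = "Black shares commentary on her Instagram account, @brookeblackdanes."
print(novel_words(toy_source, toy_summary))
```

In the notebook itself, the same call could be made as `novel_words(text, summary)` for each model's output.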


---
### 🔹 Model 3: mT5 (Multilingual T5)
**mT5 (Multilingual Text-to-Text Transfer Transformer)** is the multilingual variant of T5, pretrained on text covering 101 languages!
*   **Universal**: It treats every NLP problem as a "text-to-text" task.
*   **Multilingual**: Unlike BART or Pegasus (which are mostly English-focused), mT5 can handle Arabic, French, Chinese, etc.
*   **XLSum**: We are using a version fine-tuned on the XLSum dataset, covering 45 languages.

### ⚙️ Step 9: Load mT5 Model
We load `csebuetnlp/mT5_multilingual_XLSum`.

In [9]:
model_name_mt5 = "csebuetnlp/mT5_multilingual_XLSum"
tokenizer_mt5 = AutoTokenizer.from_pretrained(model_name_mt5)
model_mt5 = AutoModelForSeq2SeqLM.from_pretrained(model_name_mt5)


You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565



### 🧪 Step 10: Define mT5 Function
Notice we allow a slightly longer summary length here (`max_summary_len=250`) as mT5 might be more verbose or handle different languages differently.

In [10]:
def summarize_mt5(text, max_input_len=512, max_summary_len=250):
    import re
    text = re.sub(r'\s+', ' ', text.strip())
    inputs = tokenizer_mt5(text, return_tensors="pt", truncation=True, max_length=max_input_len)
    output_ids = model_mt5.generate(inputs["input_ids"], max_length=max_summary_len, num_beams=4, no_repeat_ngram_size=2)
    summary = tokenizer_mt5.decode(output_ids[0], skip_special_tokens=True)
    return summary

### 🚀 Step 11: Run mT5 Summarization
mT5 handles English text too, although this checkpoint tends to produce a short, single-sentence summary. Try replacing the `text` variable with **Arabic** text to see its multilingual side!

In [11]:
# Example (can use Arabic text too)
print("📄 mT5 Summary:\n", summarize_mt5(text))

📄 mT5 Summary:
 When Brooke Black moved to Denmark in 2020, she decided not to use a knife.


In [12]:
!pip install gradio


## 7. Model Deployment 🚀

### 🚀 What is Gradio?

**Gradio** is an open-source Python library that allows us to build simple and interactive web interfaces for Machine Learning models with just a few lines of code.

Instead of running models only in the notebook or terminal, Gradio enables users to:
- Enter text through a web interface ✍️
- Run the model interactively ⚙️
- View predictions instantly 📊

### Why use Gradio?
- No frontend experience required
- Very fast to prototype ML demos
- Perfect for showcasing models to non-technical users
- Widely used in ML demos and Hugging Face Spaces

In this notebook, we use Gradio to:
- Accept user input text
- Generate summaries using multiple pretrained models
- Display results clearly in a table format


In [1]:
import re
import pandas as pd
import gradio as gr
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from datetime import datetime

### 🤖 Loading Multiple Pretrained Summarization Models

We load **multiple pretrained text summarization models** from Hugging Face to allow **model comparison**.

In [2]:
MODELS = {
    "BART (English)": "facebook/bart-large-cnn",
    "PEGASUS (English - News)": "google/pegasus-cnn_dailymail",
    "mT5 (Multilingual)": "csebuetnlp/mT5_multilingual_XLSum"
}

tokenizers = {}
models = {}

for name, path in MODELS.items():
    tokenizers[name] = AutoTokenizer.from_pretrained(path)
    models[name] = AutoModelForSeq2SeqLM.from_pretrained(path)


### 📊 Creating a Results DataFrame

We create an **empty DataFrame** to store all summarization results generated by different models.

In [3]:
results_df = pd.DataFrame(
    columns=["Timestamp", "Model", "Summary"]
)

### 🧠 Summarization and Result Storage Function

We define the core function that powers our application:
**`summarize_and_store()`**

This function is responsible for:
- Generating a summary using a selected pretrained model
- Cleaning and post-processing the output
- Saving the result in a shared DataFrame
- Returning both the summary and the updated table to the user


In [4]:
def summarize_and_store(text, model_name):
    global results_df

    if not text.strip():
        return "⚠️ Please enter some text.", results_df

    text_clean = re.sub(r'\s+', ' ', text.strip())

    tokenizer = tokenizers[model_name]
    model = models[model_name]

    inputs = tokenizer(
        text_clean,
        return_tensors="pt",
        truncation=True,
        max_length=512
    )

    output_ids = model.generate(
        inputs["input_ids"],
        max_length=100,
        num_beams=4,
        no_repeat_ngram_size=2
    )

    summary = tokenizer.decode(
        output_ids[0],
        skip_special_tokens=True
    )

    summary = re.sub(r'<n>', ' ', summary)
    summary = re.sub(r'\s+', ' ', summary).strip()

    new_row = {
        "Timestamp": datetime.now().strftime("%H:%M:%S"),
        "Model": model_name,
        "Summary": summary
    }

    results_df = pd.concat(
        [results_df, pd.DataFrame([new_row])],
        ignore_index=True
    )

    return summary, results_df

### 🖥️ Building the Gradio User Interface

In this cell, we create an **interactive web interface** using **Gradio**.

In [5]:
interface = gr.Interface(
    fn=summarize_and_store,
    inputs=[
        gr.Textbox(
            lines=10,
            placeholder="✍️ Paste your text here...",
            label="Input Text"
        ),
        gr.Dropdown(
            choices=list(MODELS.keys()),
            value="PEGASUS (English - News)",
            label="Choose Model"
        )
    ],
    outputs=[
        gr.Textbox(
            lines=4,
            label="📄 Generated Summary"
        ),
        gr.Dataframe(
            label="📊 All Generated Summaries",
            interactive=False
        )
    ],
    title="📝 Text Summarization with Pretrained Models",
    description=(
        "Generate summaries using different pretrained models.\n\n"
        "⬇️ All results appear in the table below for easy comparison."
    )
)

interface.launch()

* Running on local URL:  http://127.0.0.1:7860
It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

* Running on public URL: https://d968d8a369af17eccd.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




---
## 8. Conclusion 🎓

Congratulations! You have successfully built a text summarization pipeline using three different Transformer models.

**Key Takeaways:**
*   **Hugging Face** makes it easy to swap between powerful models with just a few lines of code.
*   **Abstractive Summarization** creates human-like summaries by rewriting text.
*   **Different Models, Different Flavors**: The BART and Pegasus checkpoints used here are both fine-tuned on CNN/Daily Mail news (Pegasus was also pretrained with summarization in mind), while mT5 is the go-to for multilingual tasks.

**Next Steps:**
*   **Try your own text**: Copy a news article or a paragraph from a book.
*   **Evaluation**: Learn about **ROUGE scores**, which measure how much a generated summary overlaps with a human-written reference summary.
*   **Fine-Tuning**: Train these models on your own specific dataset (e.g., medical reports, legal documents).
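
As a starting point for the evaluation step, ROUGE-1 (unigram overlap) can be sketched in plain Python. This is a simplified version that assumes whitespace tokenization and no stemming; established ROUGE implementations handle tokenization more carefully and also report ROUGE-2 and ROUGE-L. The function name and sample sentences are illustrative.

```python
from collections import Counter

def rouge1(reference, candidate):
    """Unigram-overlap precision, recall, and F1 between two texts."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Each word counts at most as often as it appears in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}

reference = "black moved to denmark in 2020"   # human-written reference summary
candidate = "black moved to denmark"           # model-generated summary
print(rouge1(reference, candidate))
```

Precision asks how much of the generated summary is supported by the reference, while recall asks how much of the reference the summary covers. A high score still says nothing about factual correctness, so it complements rather than replaces reading the output.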
