Assessment: Building a Production-Ready Abstractive Summarizer

Model used: T5 (Text-to-Text Transfer Transformer)
Library used: Hugging Face Transformers
Use Case: QuickNews mobile notification summaries (2 sentences)

Task 1: Text-to-Text Data Pipeline (T5 Formatting)
Load Long-Form Nigerian News Article
This is an example article on Starlink’s expansion in Nigeria (three paragraphs).

In [None]:
news_article = """
Starlink, the satellite internet service owned by Elon Musk’s SpaceX, has continued its rapid expansion across Nigeria, 
bringing high-speed broadband connectivity to underserved and rural areas. The company, which launched in Nigeria in 2023, 
has gained significant adoption due to its ability to bypass traditional fiber infrastructure limitations.

According to industry analysts, Starlink’s presence has intensified competition in Nigeria’s broadband market, 
forcing local Internet Service Providers to reconsider pricing and service quality. While the Nigerian Communications Commission (NCC) 
has welcomed innovation, it has also emphasized the need for regulatory compliance and consumer protection.

Despite concerns over affordability and foreign dominance in critical infrastructure, many businesses and remote workers 
have praised Starlink for improving productivity and digital access. As Nigeria pushes forward with its digital economy agenda, 
the expansion of satellite-based internet could play a major role in bridging the country’s connectivity gap.
"""


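Preprocessing Function:
T5 casts every task as text-to-text and selects the task through a prefix on the input, so the article must start with "summarize: ". The function also collapses runs of whitespace, including the line breaks in the raw article, into single spaces.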

In [3]:
import re

def preprocess_for_t5(text: str) -> str:
    """
    Cleans text and prepends T5 summarization prefix.
    """
    text = re.sub(r"\s+", " ", text)  # collapse runs of whitespace (spaces, newlines, tabs) into one space
    text = text.strip()
    return "summarize: " + text


preprocessed_article = preprocess_for_t5(news_article)
print(preprocessed_article)

summarize: Starlink, the satellite internet service owned by Elon Musk’s SpaceX, has continued its rapid expansion across Nigeria, bringing high-speed broadband connectivity to underserved and rural areas. The company, which launched in Nigeria in 2023, has gained significant adoption due to its ability to bypass traditional fiber infrastructure limitations. According to industry analysts, Starlink’s presence has intensified competition in Nigeria’s broadband market, forcing local Internet Service Providers to reconsider pricing and service quality. While the Nigerian Communications Commission (NCC) has welcomed innovation, it has also emphasized the need for regulatory compliance and consumer protection. Despite concerns over affordability and foreign dominance in critical infrastructure, many businesses and remote workers have praised Starlink for improving productivity and digital access. As Nigeria pushes forward with its digital economy agenda, the expansion of satellite-based int
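If this function is reused in a production pipeline, a slightly more defensive variant could reject empty input and avoid adding the prefix twice. The sketch below is a hypothetical helper (preprocess_for_t5_safe and preprocess_batch are illustrative names, not part of Transformers); it keeps the same whitespace handling as preprocess_for_t5.

```python
import re

PREFIX = "summarize: "  # T5 summarization task prefix, as in preprocess_for_t5


def preprocess_for_t5_safe(text: str) -> str:
    """Hypothetical defensive variant of preprocess_for_t5."""
    if not text or not text.strip():
        raise ValueError("cannot summarize empty text")
    # Same cleaning as above: collapse whitespace runs, trim the ends
    text = re.sub(r"\s+", " ", text).strip()
    # Add the prefix only if it is not already there, so repeated calls are harmless
    if not text.startswith(PREFIX):
        text = PREFIX + text
    return text


def preprocess_batch(texts: list[str]) -> list[str]:
    """Preprocess several articles, preserving their order."""
    return [preprocess_for_t5_safe(t) for t in texts]


print(preprocess_for_t5_safe("  Lagos   traffic\n eased today. "))
# summarize: Lagos traffic eased today.
```

Making the function idempotent means a caller that already prefixed its text cannot accidentally send "summarize: summarize: ..." to the model.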

Tokenization

In [7]:
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")

processed_text = preprocess_for_t5(news_article)

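tokenized_inputs = tokenizer(
    processed_text,
    max_length=512,        # 512 is the tokenizer's model_max_length for t5-small
    padding="max_length",  # pad every input to exactly 512 positions
    truncation=True,       # cut anything beyond 512 tokens
    return_tensors="pt"    # return PyTorch tensors
)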
print(tokenized_inputs)

{'input_ids': tensor([[21603,    10,  2042,  4907,     6,     8,  7605,  1396,   313,  4157,
            57,  1289,   106, 23763,    22,     7,  5844,     4,     6,    65,
          2925,   165,  3607,  5919,   640,  7904,     6,     3,  3770,   306,
            18,  9993, 19276, 12841,    12,   365,  3473,    15,    26,    11,
          5372,   844,     5,    37,   349,     6,    84,  3759,    16,  7904,
            16,   460,  2773,     6,    65,  6886,  1516,  9284,   788,    12,
           165,  1418,    12, 20720,  1435,  6433,  3620, 10005,     5,  2150,
            12,   681, 15639,     6,  2042,  4907,    22,     7,  3053,    65,
          9608,  3676,  2259,    16,  7904,    22,     7, 19276,   512,     6,
         19060,   415,  1284,  1387,  7740,    52,     7,    12, 27812,  5769,
            11,   313,   463,     5,   818,     8,  7904,    29, 11538,  3527,
            41,   567,  2823,    61,    65, 13001,  4337,     6,    34,    65,
            92,     3, 25472,     8,  

Before vs After

In [8]:
print("BEFORE (Raw Text):\n", news_article[:300], "...\n")
print("AFTER (With Prefix):\n", processed_text[:300], "...\n")
print("Tokenized Input IDs Shape:", tokenized_inputs["input_ids"].shape)


BEFORE (Raw Text):
 
Starlink, the satellite internet service owned by Elon Musk’s SpaceX, has continued its rapid expansion across Nigeria, 
bringing high-speed broadband connectivity to underserved and rural areas. The company, which launched in Nigeria in 2023, 
has gained significant adoption due to its ability to  ...

AFTER (With Prefix):
 summarize: Starlink, the satellite internet service owned by Elon Musk’s SpaceX, has continued its rapid expansion across Nigeria, bringing high-speed broadband connectivity to underserved and rural areas. The company, which launched in Nigeria in 2023, has gained significant adoption due to its abi ...

Tokenized Input IDs Shape: torch.Size([1, 512])
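The shape torch.Size([1, 512]) is the padded length, not the article's length: padding="max_length" fills every position after the real tokens with the pad token, and attention_mask marks which positions the model should attend to. Below is a minimal pure-Python sketch of that bookkeeping, assuming a pad id of 0 (the T5 tokenizer's pad_token_id) and using 1 as a stand-in end-of-sequence id; pad_and_mask and padding_fraction are hypothetical helpers. Unlike the real tokenizer, this sketch does not reserve room for the end-of-sequence token when truncating.

```python
PAD_ID = 0  # assumed: pad_token_id of the T5 tokenizers


def pad_and_mask(ids: list[int], max_length: int, pad_id: int = PAD_ID):
    """Truncate/pad token ids to max_length and build the attention mask."""
    ids = ids[:max_length]             # like truncation=True (naive cut)
    n_real = len(ids)
    n_pad = max_length - n_real
    padded = ids + [pad_id] * n_pad    # like padding="max_length"
    mask = [1] * n_real + [0] * n_pad  # 1 = real token, 0 = padding
    return padded, mask


def padding_fraction(mask: list[int]) -> float:
    """Share of positions that are padding."""
    return 1 - sum(mask) / len(mask)


# First four ids from the cell above, plus 1 as a stand-in end-of-sequence id
padded, mask = pad_and_mask([21603, 10, 2042, 4907, 1], max_length=8)
print(padded)                  # [21603, 10, 2042, 4907, 1, 0, 0, 0]
print(mask)                    # [1, 1, 1, 1, 1, 0, 0, 0]
print(padding_fraction(mask))  # 0.375
```

Because many of the 512 positions can be padding, each model.generate call should receive attention_mask so the encoder ignores those positions.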


Task 2: Decoding Strategies

Load the Model

In [10]:
%pip install hf_xet

Collecting hf_xet
  Downloading hf_xet-1.2.0-cp37-abi3-win_amd64.whl.metadata (5.0 kB)
Downloading hf_xet-1.2.0-cp37-abi3-win_amd64.whl (2.9 MB)


In [11]:
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-small")
model.eval()
summary_ids = model.generate(
    tokenized_inputs["input_ids"],
    attention_mask=tokenized_inputs["attention_mask"],
    max_length=150,
    num_beams=4,
    early_stopping=True
)



Strategy A: Greedy Search (Fast, Lower Quality)

In [12]:
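summary_greedy_ids = model.generate(
    tokenized_inputs["input_ids"],
    attention_mask=tokenized_inputs["attention_mask"],  # ignore the padding positions
    max_length=60,  # cap on generated tokens
    num_beams=1     # one hypothesis, most likely token at every step = greedy search
)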

summary_greedy = tokenizer.decode(
    summary_greedy_ids[0],
    skip_special_tokens=True
)

print("Greedy Search Summary:\n", summary_greedy)


Greedy Search Summary:
 satellite internet service launched in 2023 in 2023. it has gained significant adoption due to its ability to bypass traditional fiber infrastructure limitations.
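The greedy output repeats the bigram "in 2023". The beam search cell in Strategy B sets no_repeat_ngram_size=2, which at each step blocks any token that would recreate an n-gram already present in the hypothesis. Below is a simplified sketch of that rule. It works on whole words for readability, whereas Transformers applies the rule to subword token ids inside its generation code; banned_next_tokens is a hypothetical name.

```python
def banned_next_tokens(generated: list[str], n: int) -> set[str]:
    """Tokens that would complete an n-gram already present in `generated`."""
    if len(generated) < n - 1:
        return set()
    # The last n-1 tokens are the start of the n-gram about to be formed
    prefix = tuple(generated[len(generated) - (n - 1):])
    banned = set()
    # Every earlier occurrence of that prefix bans the token that followed it
    for i in range(len(generated) - n + 1):
        if tuple(generated[i:i + n - 1]) == prefix:
            banned.add(generated[i + n - 1])
    return banned


greedy_so_far = "satellite internet service launched in 2023 in".split()
print(banned_next_tokens(greedy_so_far, n=2))  # {'2023'}
```

With this rule active, the decoder could not have emitted "2023" after the second "in", so it would have been forced to choose a different continuation.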


Strategy B: Beam Search (Higher Quality)

In [13]:
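summary_beam_ids = model.generate(
    tokenized_inputs["input_ids"],
    attention_mask=tokenized_inputs["attention_mask"],  # ignore the padding positions
    max_length=60,
    num_beams=5,             # keep the 5 highest-scoring hypotheses at each step
    no_repeat_ngram_size=2,  # never generate the same token bigram twice
    early_stopping=True      # stop once num_beams finished hypotheses exist
)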

summary_beam = tokenizer.decode(
    summary_beam_ids[0],
    skip_special_tokens=True
)

print("Beam Search Summary:\n", summary_beam)


Beam Search Summary:
 satellite internet service has continued its rapid expansion across the country. the company has gained significant adoption due to its ability to bypass traditional fiber infrastructure limitations, according to industry analysts. despite concerns over affordability and foreign dominance in critical infrastructure, many businesses and remote workers have praised Starlink for
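Beam search keeps the num_beams highest-scoring partial sequences at every step instead of committing to the single most likely token, so it can recover a sequence whose first token was not the top choice. The toy example below shows this on a hypothetical, hand-written next-token table (TOY_MODEL is invented for illustration and has nothing to do with t5-small), scoring sequences by summed log-probability. It ignores the length penalty and the other options that model.generate applies.

```python
import math

EOS = "<eos>"

# Hypothetical next-token distributions, keyed by the prefix generated so far
TOY_MODEL = {
    (): {"the": 0.5, "a": 0.4, EOS: 0.1},
    ("the",): {"cat": 0.4, "dog": 0.3, EOS: 0.3},
    ("a",): {"cat": 0.9, EOS: 0.1},
}


def next_token_probs(prefix):
    # Any prefix not listed ends the sequence with certainty
    return TOY_MODEL.get(tuple(prefix), {EOS: 1.0})


def greedy_decode(max_len=5):
    seq, logp = [], 0.0
    for _ in range(max_len):
        token, p = max(next_token_probs(seq).items(), key=lambda kv: kv[1])
        logp += math.log(p)
        if token == EOS:
            break
        seq.append(token)
    return seq, logp


def beam_decode(num_beams=2, max_len=5):
    beams = [([], 0.0, False)]  # (tokens, log-probability, finished?)
    for _ in range(max_len):
        candidates = []
        for seq, logp, done in beams:
            if done:
                candidates.append((seq, logp, True))  # carry finished hypotheses over
                continue
            for token, p in next_token_probs(seq).items():
                if token == EOS:
                    candidates.append((seq, logp + math.log(p), True))
                else:
                    candidates.append((seq + [token], logp + math.log(p), False))
        # Keep only the num_beams best-scoring hypotheses
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
        if all(done for _, _, done in beams):
            break
    best_seq, best_logp, _ = beams[0]
    return best_seq, best_logp


print(greedy_decode()[0])            # ['the', 'cat']
print(beam_decode(num_beams=2)[0])   # ['a', 'cat']
```

Greedy commits to "the" (0.5) and ends with probability 0.5 × 0.4 = 0.2, while a beam of width 2 keeps "a" alive and finds "a cat" at 0.4 × 0.9 = 0.36. With num_beams=1 the beam search reduces to greedy decoding.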


Brief Analysis:

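The beam search summary is more coherent and avoids repetition, drawing on the first and third paragraphs of the article. It still has flaws: it mostly copies article phrasing, attaches "according to industry analysts" to the adoption claim (the analysts in the article comment on competition), and stops mid-sentence ("...praised Starlink for") because max_length=60 cut it off.
The greedy search output repeats "in 2023", never names Starlink, and covers only the first paragraph. Greedy decoding is cheaper, since it tracks one hypothesis per step instead of five, although no timings were recorded here.
The comparison is not fully controlled, because only the beam search cell set no_repeat_ngram_size=2. Even so, beam search gives the better balance of fluency and coverage for production summaries, provided the output is lengthened or trimmed to complete sentences for the two-sentence notification format.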
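For QuickNews notifications, a truncated output such as "...praised Starlink for" would need post-processing before it can be shown. One possible approach is to keep only complete sentences, cap them at two, and capitalise each one (t5-small produced lowercase sentence starts here). to_notification is a hypothetical helper with a deliberately naive splitter: it treats every ".", "!" or "?" as a sentence end, so abbreviations and decimals such as "3.5" would be split incorrectly.

```python
import re


def to_notification(summary: str, max_sentences: int = 2) -> str:
    """Keep at most `max_sentences` complete sentences and capitalise each one.

    Returns an empty string if the summary contains no complete sentence,
    so the caller can fall back to something else (e.g. the headline).
    """
    # A "complete sentence" here is any run of text ending in . ! or ?
    sentences = [s.strip() for s in re.findall(r"[^.!?]+[.!?]", summary)]
    kept = [s for s in sentences if s][:max_sentences]
    return " ".join(s[0].upper() + s[1:] for s in kept)


sample = "the company has grown. it now serves rural areas. analysts say it has"
print(to_notification(sample))
# The company has grown. It now serves rural areas.
```

Applied to the beam output above, this would keep its first two sentences and drop the unfinished third one.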

Task 3: Quantitative Evaluation Using ROUGE

Gold-Standard Reference Summary

In [14]:
reference_summary = (
    "Starlink’s expansion in Nigeria is boosting broadband access, intensifying competition among ISPs, "
    "and supporting the country’s digital economy despite regulatory and affordability concerns."
)


ROUGE Score Calculation

In [16]:
%pip install rouge_score

Collecting rouge_score
Successfully installed rouge_score-0.1.2
Note: you may need to restart the kernel to use updated packages.


In [17]:
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(
    ["rouge1", "rougeL"],
    use_stemmer=True
)

scores = scorer.score(reference_summary, summary_beam)  # argument order: (target, prediction)

print("ROUGE-1 Score:", scores["rouge1"])
print("ROUGE-L Score:", scores["rougeL"])


ROUGE-1 Score: Score(precision=0.2, recall=0.4, fmeasure=0.26666666666666666)
ROUGE-L Score: Score(precision=0.1, recall=0.2, fmeasure=0.13333333333333333)
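To read the Score tuples above: precision is the share of the generated summary's words that also appear in the reference, recall is the share of the reference's words that appear in the summary, and the F-measure is their harmonic mean, F = 2PR/(P + R). For ROUGE-1, 2(0.2)(0.4)/(0.2 + 0.4) ≈ 0.267, matching the printed value. ROUGE-1 counts overlapping unigrams, while ROUGE-L uses the longest common subsequence, so it also rewards matching word order. The sketch below re-implements both in plain Python as a simplified approximation of rouge_score: it lowercases and splits on non-alphanumeric characters but does no stemming, so its numbers can differ slightly from use_stemmer=True.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs (no stemming, unlike use_stemmer=True)
    return re.findall(r"[a-z0-9]+", text.lower())


def prf(overlap: int, n_pred: int, n_ref: int):
    """Precision, recall and F1 from a match count and the two lengths."""
    p = overlap / n_pred if n_pred else 0.0
    r = overlap / n_ref if n_ref else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def rouge1(reference: str, prediction: str):
    ref, pred = Counter(tokenize(reference)), Counter(tokenize(prediction))
    overlap = sum((ref & pred).values())  # clipped unigram matches
    return prf(overlap, sum(pred.values()), sum(ref.values()))


def lcs_length(a: list[str], b: list[str]) -> int:
    # Classic O(len(a) * len(b)) dynamic programme for the longest common subsequence
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]


def rouge_l(reference: str, prediction: str):
    ref, pred = tokenize(reference), tokenize(prediction)
    return prf(lcs_length(ref, pred), len(pred), len(ref))


# Same words, reversed order: ROUGE-1 is perfect, ROUGE-L is not
print(rouge1("a b c d", "d c b a"))   # (1.0, 1.0, 1.0)
print(rouge_l("a b c d", "d c b a"))  # (0.25, 0.25, 0.25)
```

Because ROUGE rewards exact word overlap, the low scores above partly reflect that the reference is a one-sentence paraphrase while the beam output reuses the article's own wording and is cut off mid-sentence.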
