# Text Summarization Inference with T5
## Introduction
In this notebook, we will use the pre-trained T5 model to summarize real-world text data.
We will:
- Load the model and tokenizer.
- Test summarization on various types of long text.
- Experiment with different decoding strategies to improve results.

## Step 1: Load Model & Tokenizer
Before performing inference, we import PyTorch and the T5 tokenizer and model classes from Hugging Face `transformers`.


In [1]:
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

We start by loading the pre-trained **T5-small** model and its tokenizer. 

- `T5Tokenizer.from_pretrained("t5-small")` → Loads the tokenizer.
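- `T5ForConditionalGeneration.from_pretrained("t5-small")` → Loads the T5 encoder-decoder model with its language-modeling head. T5 is a general text-to-text model; it performs summarization when the input starts with the `summarize:` prefix.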
- When tokenizing, we pass `return_tensors="pt"` so the tokenizer returns **PyTorch tensors**, the input format the PyTorch model expects.


In [2]:
# Load the tokenizer and model
model_name = "t5-small"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# Check if GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Print Model Info
print(f"Model '{model_name}' loaded successfully.")
print(f"Running on: {device}")


You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


Model 't5-small' loaded successfully.
Running on: cpu


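## Step 2: Prepare & Tokenize the Input
T5 selects its task from a text prefix, so the article below starts with `summarize:`. The tokenizer then converts the text into token ids plus an attention mask. Note the warning it prints: this article is 751 tokens long, above the 512-token maximum configured for `t5-small`. The `generate()` call in Step 3 still runs, because T5 uses relative position embeddings rather than a fixed-size position table, but the model was pre-trained on inputs of up to 512 tokens, so summary quality on the extra text is not guaranteed.

In [3]: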
input_text = """summarize: Green Arrow is a superhero who appears in American comic books published by DC Comics. Created by Mort Weisinger and designed by George Papp, he first appeared in More Fun Comics No. 73 on September 19, 1941 (cover dated November 1941), the same issue that debuted Aquaman. His real name is Oliver Jonas Queen, a wealthy businessman, owner of Queen Industries, and a well-known celebrity in Star City. He uses this position to hide the fact that he is Green Arrow.[1] Partly inspired by Robin Hood, Green Arrow is an archer who uses his skills to fight crime in his home cities of Star City and Seattle, as well as alongside his fellow superheroes as a member of the Justice League. The world's greatest archer, as well as a competent swordsman and martial artist, Green Arrow deploys a range of trick arrows (in contemporary times, they are referred as "specialty arrows"[2]) with various special functions, such as glue, explosive-tipped, grappling hook, flash grenade, tear gas, and even kryptonite arrows for use in a range of special situations.

Green Arrow enjoyed moderate success in his early years, becoming the cover feature of More Fun, as well as having occasional appearances in other comics. Throughout his first twenty-five years, however, the character never enjoyed greater popularity. In the late 1960s, writer Denny O'Neil, inspired by the character's dramatic visual redesign by Neal Adams, chose to have him lose his fortune, giving him the then-unique role of a streetwise crusader for the working class and the disadvantaged. In 1970, he was paired with a more law and order-oriented hero, Green Lantern, in a ground-breaking, socially conscious comic book series.[3] Since then, he has been popular among comic book fans and most writers have taken an urban, gritty approach to the character. Oliver Queen was killed off in the 1990s and replaced by a new character, Oliver's son Connor Hawke. Connor, however, proved a less popular character, and the original Oliver Queen character was resurrected in the 2001 "Quiver" storyline, by writer Kevin Smith. In the 2000s, the character has been featured in bigger storylines focusing on Green Arrow and Black Canary, such as the DC event The Green Arrow/Black Canary Wedding and the high-profile Justice League: Cry for Justice storyline, prior to the character's relaunch alongside most of DC's properties in 2011.

Green Arrow was not initially a well-known character outside of comic book fandom: He had appeared in a single episode of the animated series Super Friends in 1973. In the 2000s, the character appeared in a number of DC television properties, including the animated series Justice League Unlimited, Young Justice, The Batman and Batman: The Brave and the Bold, and several DC Universe Animated Original Movies. In live action, he appeared in the series Smallville, played by actor Justin Hartley, and became a core cast member. In 2012, the live action series Arrow debuted on The CW, in which the title character was portrayed by Stephen Amell, and launching several spin-off series, becoming the starting point for a shared television franchise called the Arrowverse."""

In [4]:
input_tokens = tokenizer(input_text, return_tensors="pt")
print(input_tokens)

Token indices sequence length is longer than the specified maximum sequence length for this model (751 > 512). Running this sequence through the model will result in indexing errors


{'input_ids': tensor([[21603,    10,  1862, 25810,    19,     3,     9, 23586,   113,  3475,
            16,   797,  7967,  1335,  1790,    57,  5795, 15175,     7,     5,
          6357,    26,    57, 19729,   101,  4890,    49,    11,   876,    57,
          3080,   276,  3096,     6,     3,    88,   166,  4283,    16,  1537,
          9259, 15175,     7,   465,     5,     3,  4552,    30,  1600, 12370,
         24822,    41,  9817,     3, 14134,  1671, 24822,   201,     8,   337,
           962,    24,    20, 29261, 11154,   348,     5,   978,   490,   564,
            19, 15865,  8178,     9,     7,  5286,     6,     3,     9, 18407,
           268,   348,     6,  2527,    13,  5286, 18080,     6,    11,     3,
             9,   168,    18,  5661, 17086,    16,  2042,   896,     5,   216,
          2284,    48,  1102,    12,  7387,     8,   685,    24,     3,    88,
            19,  1862, 25810,     5,  6306,   536,   908,  2733,   120,  3555,
            57, 14059, 19804,     6,  
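Two common ways to handle inputs longer than 512 tokens: pass `truncation=True, max_length=512` to the tokenizer, which simply drops the tail of the article, or split the token ids into overlapping windows, summarize each window, and join the partial summaries. The sketch below shows only the splitting step, in plain Python. The helper `chunk_token_ids`, the 512-token window, and the 64-token overlap are illustrative assumptions, not part of the original notebook or of `transformers`.

```python
from typing import List


def chunk_token_ids(token_ids: List[int], window: int = 512, overlap: int = 64) -> List[List[int]]:
    """Split a token-id sequence into overlapping windows of at most `window` ids.

    Hypothetical helper for illustration: consecutive windows share `overlap` ids,
    so text cut at a window boundary still appears whole in one of the chunks.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must be in [0, window)")
    step = window - overlap
    chunks = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + window])
        # Stop once this window reaches the end of the sequence.
        if start + window >= len(token_ids):
            break
        start += step
    return chunks


# Example with a dummy sequence as long as the article above (751 ids).
dummy_ids = list(range(751))
chunks = chunk_token_ids(dummy_ids, window=512, overlap=64)
print([len(c) for c in chunks])  # → [512, 303]
```

In practice each window also needs the `summarize:` prefix and the end-of-sequence token, so the window should be set slightly below 512 to leave room for them. Overlapping windows cost a little extra compute but reduce the chance that a key sentence is split across two chunks.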

## Step 3: Generating a Summary
We pass the tokenized input to `model.generate()` to create a summary. 

### **Decoding Strategies Used:**
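- `max_length=200` → Caps the generated summary at 200 tokens.
- `num_beams=20` → Uses beam search, keeping the 20 highest-scoring partial summaries at each step instead of greedily committing to one token at a time; this explores more alternatives than greedy decoding at a higher compute cost.
- `no_repeat_ngram_size=2` → Prevents any two-token sequence (bigram) from appearing more than once in the summary.
- `early_stopping=True` → Ends beam search as soon as `num_beams` complete candidate summaries have been found, instead of continuing to look for better ones.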
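To make `no_repeat_ngram_size` concrete, here is a simplified re-implementation of the rule it enforces: before choosing the next token, take the last `n - 1` generated tokens and ban every token that has already followed that exact prefix. This is a sketch for intuition, not the `transformers` source code; the function name `banned_next_tokens` and the toy token ids are assumptions.

```python
from typing import List, Set


def banned_next_tokens(generated: List[int], n: int) -> Set[int]:
    """Return the token ids that would repeat an n-gram already in `generated`.

    Hypothetical, simplified illustration of n-gram blocking (the idea behind
    `no_repeat_ngram_size`); this is not the transformers implementation.
    """
    if n <= 0 or len(generated) < n - 1:
        return set()
    # The last n - 1 tokens are the prefix that the next token would extend.
    prefix = tuple(generated[len(generated) - n + 1:])
    banned = set()
    # Check every complete n-gram generated so far.
    for i in range(len(generated) - n + 1):
        ngram = tuple(generated[i:i + n])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned


# Pretend ids 7, 8, 9 stand for the words "green", "arrow", "is".
print(banned_next_tokens([7, 8, 9, 7], n=2))  # → {8}: "arrow" may not follow "green" again
```

With `n=2`, every two-token phrase, including a name that tokenizes into two pieces, can appear at most once. Larger values such as 3 are less restrictive, since they only block repeated three-token phrases.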


In [10]:
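# Generate a summary with beam search.
# The inputs are moved to `device` so they sit on the same device as the model
# (required when the model runs on a GPU).
output_ids = model.generate(
    input_ids=input_tokens["input_ids"].to(device),
    attention_mask=input_tokens["attention_mask"].to(device),
    max_length=200,
    num_beams=20,
    early_stopping=True,
    no_repeat_ngram_size=2,
)

# Decode the best beam back to text, dropping special tokens such as </s> and <pad>.
summary = tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(summary)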

green Arrow is an archer who uses his skills to fight crime in his home cities. he was paired with a more law and order-oriented hero, Green Lantern, in 1970s, resurrected in the 2001 "Quiver" storyline. in 2012, the live action series Arrow debuted on The CW.
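For intuition about what `num_beams` changes, the sketch below runs beam search over a tiny hand-written next-token table instead of a neural network. At each step it extends every kept hypothesis with every possible next token, scores the results by summed log-probability, and keeps the `num_beams` best; hypotheses ending in `</s>` are set aside as finished. The probability table, vocabulary, and `beam_search` helper are hypothetical, and details that `model.generate()` handles, such as length normalization and its exact early-stopping rules, are omitted.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy "model": probability of the next token given only the last token.
NEXT_TOKEN_PROBS: Dict[str, Dict[str, float]] = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a": {"cat": 0.9, "dog": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}
EOS = "</s>"


def beam_search(num_beams: int, max_steps: int = 10) -> List[Tuple[List[str], float]]:
    """Return finished hypotheses sorted by total log-probability (best first)."""
    beams = [(["<s>"], 0.0)]  # (tokens so far, summed log-probability)
    finished = []
    for _ in range(max_steps):
        # Extend every live hypothesis with every possible next token.
        candidates = []
        for tokens, score in beams:
            for token, prob in NEXT_TOKEN_PROBS[tokens[-1]].items():
                candidates.append((tokens + [token], score + math.log(prob)))
        # Keep only the num_beams highest-scoring extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:num_beams]:
            if tokens[-1] == EOS:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished


greedy_tokens, greedy_score = beam_search(num_beams=1)[0]
beam_tokens, beam_score = beam_search(num_beams=2)[0]
print(" ".join(greedy_tokens), round(math.exp(greedy_score), 2))  # → <s> the cat </s> 0.3
print(" ".join(beam_tokens), round(math.exp(beam_score), 2))  # → <s> a cat </s> 0.36
```

With `num_beams=1` this reduces to greedy decoding: it commits to "the" because it is the most likely first token, and ends with probability 0.6 × 0.5 = 0.3. Two beams keep "a" alive and find the more probable sequence "a cat" (0.4 × 0.9 = 0.36). The notebook's `num_beams=20` applies the same idea at a much larger width.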
