#🧪 Practical: Text Summarization using Pretrained T5 or BART

#We will:

1. Install and import required libraries
2. Load a pretrained T5/BART model and tokenizer
3. Define sample input text
4. Run summarization
5. Print and explain the result

#🔹 Step 1: Install Required Libraries

In [13]:
!pip install -q transformers torch


#🔹 Step 2: Import Required Libraries

In [14]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch


#🔹 Step 3: Load Pretrained Summarization Model
We'll use either T5 or BART. Both options are shown; run one and leave the other commented out.

🔸 Option A: T5 (t5-small)

In [15]:
# T5 model and tokenizer
model_name = "t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)



#-------------------------------------------------------------------------------

#🔸 Option B: BART (facebook/bart-base)





In [21]:
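# BART model and tokenizer (uncomment these lines to use BART instead of T5)
# Note: facebook/bart-base is a general pretrained checkpoint, not one fine-tuned
# for summarization; a fine-tuned checkpoint such as facebook/bart-large-cnn
# typically gives more usable summaries.
#model_name = "facebook/bart-base"
#tokenizer = AutoTokenizer.from_pretrained(model_name)
#model = AutoModelForSeq2SeqLM.from_pretrained(model_name)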

#-------------------------------------------------------------------------------

#🔹 Step 4: Define Long Input Text (Synthetic Article)

In [16]:
long_text = """
Artificial Intelligence (AI) is transforming the way businesses operate, enabling automation, enhanced decision-making, and improved customer experiences.
With advancements in machine learning, natural language processing, and computer vision, AI applications are growing rapidly across industries such as healthcare, finance, retail, and manufacturing.
Organizations are investing heavily in AI research and infrastructure to stay competitive.
However, ethical concerns, data privacy, and the need for transparency in AI systems remain important challenges to address.
"""


#🔹 Step 5: Prepare Input for Model
T5 and BART expect slightly different input formats:

🔸 For T5:

In [17]:
# T5 expects a prefix "summarize: " before the text
input_text = "summarize: " + long_text


#-------------------------------------------------------------------------------

🔸 For BART:

In [22]:
# BART does not require a prefix
#input_text = long_text
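
Because the only difference between the two options at this step is the `"summarize: "` prefix, the choice can be folded into one small function. The sketch below is an illustration only: `build_model_input` is a hypothetical helper (it is not part of `transformers`), and it assumes that any checkpoint whose name contains `t5` wants the prefix. It also collapses newlines into single spaces, which the cells above do not do.

```python
# Hypothetical helper (not part of transformers): chooses the input format
# from the checkpoint name so the T5/BART switch lives in one place.
def build_model_input(text: str, model_name: str) -> str:
    """Return `text` formatted for the given checkpoint name."""
    cleaned = " ".join(text.split())  # collapse newlines and repeated spaces
    # Assumption: a checkpoint name containing "t5" means the task prefix is wanted.
    if "t5" in model_name.lower():
        return "summarize: " + cleaned
    return cleaned


print(build_model_input("First line.\nSecond line.", "t5-small"))
print(build_model_input("First line.\nSecond line.", "facebook/bart-base"))
```

With this helper, `input_text = build_model_input(long_text, model_name)` would replace both prefix cells, so switching models only requires changing `model_name`.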


#-------------------------------------------------------------------------------

In [19]:
# Tokenize input; anything beyond 512 tokens is cut off by truncation=True
inputs = tokenizer.encode(input_text, return_tensors="pt", max_length=512, truncation=True)
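
With `truncation=True` and `max_length=512`, any tokens past the limit are dropped, so a long article would be summarized from its opening portion only. One common workaround is to split the text into chunks and summarize each one separately. The sketch below shows only the splitting step, under loud assumptions: `chunk_words` is a hypothetical helper, whitespace-separated words are used as a rough stand-in for tokens (a real tokenizer usually produces more tokens than words), and the 300-word budget is an arbitrary value chosen to leave headroom.

```python
from typing import List


def chunk_words(text: str, max_words: int = 300) -> List[str]:
    """Split `text` into consecutive chunks of at most `max_words` words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]


sample = "word " * 700  # 700 words, standing in for an over-long article
chunks = chunk_words(sample, max_words=300)
print([len(chunk.split()) for chunk in chunks])  # [300, 300, 100]
```

Each chunk could then be passed through the same tokenize-and-generate steps, and the partial summaries joined. Splitting on word counts can cut a sentence in half, so a sentence-aware splitter may give cleaner results.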


#🔹 Step 6: Generate Summary
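
This step calls `generate` with `num_beams=4`, which means beam search. To give a rough feel for what the beam width does, here is a toy sketch that is independent of `transformers`. The probability table is made up for illustration, and `beam_search` is a simplified hypothetical function: it omits details such as length penalties that a real implementation may apply.

```python
import math

# Toy next-token table with made-up probabilities (not from any real model).
# Keys are the previous token; values map candidate next tokens to P(next | prev).
NEXT_TOKEN_PROBS = {
    "<s>": {"AI": 0.6, "The": 0.4},
    "AI": {"grows": 0.55, "helps": 0.45},
    "The": {"industry": 0.9, "end": 0.1},
    "grows": {"</s>": 1.0},
    "helps": {"</s>": 1.0},
    "industry": {"</s>": 1.0},
    "end": {"</s>": 1.0},
}


def beam_search(num_beams, max_steps=5):
    """Return (tokens, probability) of the best sequence found with this beam width."""
    beams = [(0.0, ["<s>"])]  # (cumulative log-probability, tokens so far)
    for _ in range(max_steps):
        candidates = []
        for score, tokens in beams:
            if tokens[-1] == "</s>":
                candidates.append((score, tokens))  # finished; carry forward unchanged
                continue
            for token, prob in NEXT_TOKEN_PROBS[tokens[-1]].items():
                candidates.append((score + math.log(prob), tokens + [token]))
        # Keep only the num_beams highest-scoring partial sequences.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:num_beams]
        if all(tokens[-1] == "</s>" for _, tokens in beams):
            break
    best_score, best_tokens = beams[0]
    return best_tokens, math.exp(best_score)


greedy_tokens, greedy_prob = beam_search(num_beams=1)
beam_tokens, beam_prob = beam_search(num_beams=2)
print(greedy_tokens, round(greedy_prob, 2))  # ['<s>', 'AI', 'grows', '</s>'] 0.33
print(beam_tokens, round(beam_prob, 2))      # ['<s>', 'The', 'industry', '</s>'] 0.36
```

With a width of 1 the search commits to the locally best first token (`AI`) and ends with a sequence of probability 0.33. With a width of 2 it also keeps `The` alive and finds a sequence of probability 0.36. A wider beam explores more alternatives at the cost of more computation.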


In [20]:
# Generate summary using beam search
summary_ids = model.generate(inputs, max_length=60, num_beams=4, early_stopping=True)

# Decode and print summary
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print("📄 Summary:\n", summary)


📄 Summary:
 AI applications are growing rapidly across industries such as healthcare, finance, retail, and manufacturing. ethical concerns, data privacy, and the need for transparency remain important challenges to address.
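
To help interpret the result, a quick check can show how much the summary shortens the input and whether it introduces any wording of its own. The sketch below copies the input text and the summary printed above as plain strings so it runs without the model. The `words` helper and the two measures are illustrative choices, not a standard evaluation metric.

```python
import re

source = (
    "Artificial Intelligence (AI) is transforming the way businesses operate, enabling automation, "
    "enhanced decision-making, and improved customer experiences. "
    "With advancements in machine learning, natural language processing, and computer vision, "
    "AI applications are growing rapidly across industries such as healthcare, finance, retail, and manufacturing. "
    "Organizations are investing heavily in AI research and infrastructure to stay competitive. "
    "However, ethical concerns, data privacy, and the need for transparency in AI systems "
    "remain important challenges to address."
)
summary = (
    "AI applications are growing rapidly across industries such as healthcare, finance, retail, and manufacturing. "
    "ethical concerns, data privacy, and the need for transparency remain important challenges to address."
)


def words(text):
    """Lowercase the text and return its alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


source_words = words(source)
summary_words = words(summary)

# Fraction of the source length (in words) that the summary occupies.
compression = len(summary_words) / len(source_words)
# Words in the summary that never appear in the source.
novel = sorted(set(summary_words) - set(source_words))

print(f"compression ratio: {compression:.2f}")
print("words not in source:", novel)
```

The list of words not found in the source comes back empty: the summary is built from phrases lifted from the second and fourth input sentences rather than from newly written wording. The second sentence of the summary also starts in lowercase because the model dropped the leading "However," without recapitalizing.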
