# Text Summarization using T5 in Python
This notebook demonstrates how to perform abstractive summarization using the T5 (Text-to-Text Transfer Transformer) model from Hugging Face. T5 generates the summary as new text rather than extracting sentences from the input.

In [None]:
# T5Tokenizer depends on sentencepiece; return_tensors='pt' requires PyTorch
!pip install transformers sentencepiece torch

In [None]:
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load the pre-trained T5 model and tokenizer from Hugging Face
model = T5ForConditionalGeneration.from_pretrained('t5-small')
tokenizer = T5Tokenizer.from_pretrained('t5-small')

# Define the text you want to summarize
text = '''
Artificial intelligence (AI) is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by animals including humans. 
Leading AI textbooks define the field as the study of "intelligent agents": any device that perceives its environment and takes actions that maximize 
its chance of successfully achieving its goals. Colloquially, the term "artificial intelligence" is often used to describe machines (or computers) that 
mimic "cognitive" functions that humans associate with the human mind, such as "learning" and "problem solving".
As machines become increasingly capable, tasks considered to require "intelligence" are often removed from the definition of AI, a phenomenon known as the AI effect. 
A quip in Tesler's Theorem says "AI is whatever hasn't been done yet." For instance, optical character recognition is frequently excluded from things 
considered to be AI, having become a routine technology.
Modern machine capabilities generally classified as AI include successfully understanding human speech, competing at the highest level in strategic game systems 
(such as chess and Go), autonomously operating cars, intelligent routing in content delivery networks, and military simulations.
'''

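# Preprocess the input text: T5 selects the task from a text prefix, here "summarize: "
# strip() drops the leading and trailing newlines left by the triple-quoted string
preprocessed_text = "summarize: " + text.strip()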

# Tokenize the input into a PyTorch tensor of token IDs; anything past 512 tokens is cut off
inputs = tokenizer.encode(preprocessed_text, return_tensors='pt', max_length=512, truncation=True)

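# Generate the summary with beam search
# max_length caps the summary length in tokens, num_beams sets the beam width,
# and length_penalty > 1.0 favors longer outputs; tune these to change the result
summary_ids = model.generate(inputs, max_length=150, num_beams=4, length_penalty=2.0, early_stopping=True)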

# Decode and print the summary
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print("Summary:", summary)
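
The tokenizer call above truncates the input at 512 tokens, so any text past that point is silently ignored. One possible workaround, sketched below, is to split a long document into sentence-aligned chunks, summarize each chunk, and join the partial summaries. `split_into_chunks` and `summarize_long` are hypothetical helpers, not part of `transformers`, and the 300-word budget is only an assumed rough stand-in for the 512-token limit, since word counts and token counts differ.

```python
import re

def split_into_chunks(text, max_words=300):
    """Group whole sentences into chunks of at most max_words words.

    Hypothetical helper. A single sentence longer than max_words
    still becomes its own (oversized) chunk.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        if not sentence:
            continue
        n = len(sentence.split())
        # Start a new chunk when this sentence would exceed the word budget
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks

def summarize_long(text, summarize_fn, max_words=300):
    """Apply summarize_fn to each chunk and join the partial summaries."""
    return " ".join(summarize_fn(chunk) for chunk in split_into_chunks(text, max_words))
```

To use this with the model loaded earlier, `summarize_fn` could be a small function that wraps the same prefix, `tokenizer.encode`, `model.generate`, and `tokenizer.decode` steps shown above for one chunk. Joining per-chunk summaries is a simple heuristic, and the combined result may read less smoothly than a summary produced in a single pass.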