# 📝 Text Summarization with HuggingFace Transformers

This notebook demonstrates how to build a text summarization tool using pre-trained transformer models.

**What you'll learn:**
- How to use the HuggingFace Transformers library
- Different pre-trained models for summarization
- Customizing summary length and quality
- Batch processing multiple texts

## 1. Installation

First, let's install the required libraries. Run this cell once:

In [1]:
%pip install transformers torch sentencepiece accelerate -q

Note: you may need to restart the kernel to use updated packages.





## 2. Import Libraries

Import all necessary libraries:

In [1]:
from transformers import pipeline
import torch
import warnings
warnings.filterwarnings('ignore')

print("✅ Libraries imported successfully!")
print(f"🔥 CUDA Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"   GPU: {torch.cuda.get_device_name(0)}")

✅ Libraries imported successfully!
🔥 CUDA Available: False


## 3. Initialize the Summarization Model

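We'll use BART (Bidirectional and Auto-Regressive Transformers), an encoder-decoder model. The `facebook/bart-large-cnn` checkpoint is fine-tuned on the CNN/DailyMail news dataset, so it works well for summarizing news-style text.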

**Available models:**
- `facebook/bart-large-cnn` - Best quality (used in this notebook)
- `t5-small` - Faster, lighter
- `t5-base` - Good balance between speed and quality
- `google/pegasus-xsum` - Extreme summarization (very short, typically one-sentence summaries)

In [2]:
# Initialize the summarization pipeline
print("Loading model... (this may take a minute on first run)")

device = 0 if torch.cuda.is_available() else -1
summarizer = pipeline(
    "summarization",
    model="facebook/bart-large-cnn",
    device=device
)

print("✅ Model loaded successfully!")

Loading model... (this may take a minute on first run)
Device set to use cpu
✅ Model loaded successfully!


## 4. Basic Summarization Example

Let's try summarizing a sample text:

In [3]:
# Sample text
sample_text = """
Artificial intelligence has made remarkable progress in recent years, 
transforming industries from healthcare to finance. Machine learning 
algorithms can now diagnose diseases, predict market trends, and even 
create art. Deep learning, a subset of machine learning, uses neural 
networks with multiple layers to process complex patterns in data. 
This technology has enabled breakthroughs in natural language processing, 
computer vision, and robotics. However, the rapid advancement of AI also 
raises important ethical questions about privacy, job displacement, and 
the need for responsible AI development. Researchers and policymakers 
are working together to ensure that AI benefits society while minimizing 
potential risks. The future of AI holds immense promise, but it requires 
careful consideration and collaborative effort from all stakeholders.
"""

print("📄 ORIGINAL TEXT:")
print("=" * 70)
print(sample_text.strip())
print(f"\n📊 Word count: {len(sample_text.split())} words")

📄 ORIGINAL TEXT:
Artificial intelligence has made remarkable progress in recent years, 
transforming industries from healthcare to finance. Machine learning 
algorithms can now diagnose diseases, predict market trends, and even 
create art. Deep learning, a subset of machine learning, uses neural 
networks with multiple layers to process complex patterns in data. 
This technology has enabled breakthroughs in natural language processing, 
computer vision, and robotics. However, the rapid advancement of AI also 
raises important ethical questions about privacy, job displacement, and 
the need for responsible AI development. Researchers and policymakers 
are working together to ensure that AI benefits society while minimizing 
potential risks. The future of AI holds immense promise, but it requires 
careful consideration and collaborative effort from all stakeholders.

📊 Word count: 117 words


In [4]:
# Generate summary
summary = summarizer(
    sample_text,
    max_length=60,
    min_length=20,
    do_sample=False
)

print("\n✨ SUMMARY:")
print("=" * 70)
print(summary[0]['summary_text'])
print(f"\n📊 Word count: {len(summary[0]['summary_text'].split())} words")

✨ SUMMARY:
Artificial intelligence has made remarkable progress in recent years. Machine learning can now diagnose diseases, predict market trends, and even create art. The rapid advancement of AI also raises important ethical questions about privacy and job displacement.

📊 Word count: 37 words


## 5. Experiment with Different Summary Lengths

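You can control how long or short the summary should be with `max_length` and `min_length`. Both limits are measured in tokens, not words, so the word count of a summary is typically somewhat below `max_length`: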

In [5]:
# Short summary
short_summary = summarizer(
    sample_text,
    max_length=30,
    min_length=10
)

print("📌 SHORT SUMMARY:")
print(short_summary[0]['summary_text'])
print()

# Medium summary
medium_summary = summarizer(
    sample_text,
    max_length=80,
    min_length=30
)

print("📌 MEDIUM SUMMARY:")
print(medium_summary[0]['summary_text'])
print()

# Long summary
long_summary = summarizer(
    sample_text,
    max_length=130,
    min_length=50
)

print("📌 LONG SUMMARY:")
print(long_summary[0]['summary_text'])

📌 SHORT SUMMARY:
Artificial intelligence has made remarkable progress in recent years. Machine learning can now diagnose diseases, predict market trends, and even create art.

📌 MEDIUM SUMMARY:
Artificial intelligence has made remarkable progress in recent years. Machine learning can now diagnose diseases, predict market trends, and even create art. The rapid advancement of AI also raises important ethical questions about privacy and job displacement.

📌 LONG SUMMARY:
Artificial intelligence has made remarkable progress in recent years, transforming industries from healthcare to finance. The rapid advancement of AI also raises important ethical questions about privacy, job displacement, and the need for responsible AI development. Researchers and policymakers are working together to ensure that AI benefits society while minimizing potential risks.


## 6. Create a Reusable Summarization Function

Let's build a clean function you can reuse:

In [6]:
def summarize_text(text, length='medium', verbose=True):
    """
    Summarize text with different length options.
    
    Args:
        text: Input text to summarize
        length: 'short', 'medium', or 'long'
        verbose: Print details
    
    Returns:
        Summary text
    """
    length_params = {
        'short': {'max_length': 40, 'min_length': 10},
        'medium': {'max_length': 80, 'min_length': 30},
        'long': {'max_length': 130, 'min_length': 50}
    }
    
    params = length_params.get(length, length_params['medium'])
    
    if verbose:
        print(f"Summarizing with {length} length...")
    
    result = summarizer(text, **params, do_sample=False)
    summary = result[0]['summary_text']
    
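    if verbose:
        # Word counts are whitespace-based, so they only approximate token counts
        original_words = len(text.split())
        summary_words = len(summary.split())
        print(f"✅ Original: {original_words} words")
        print(f"✅ Summary: {summary_words} words")
        if original_words:  # avoid division by zero on empty input
            print(f"✅ Reduction: {100 - summary_words / original_words * 100:.1f}%")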
    
    return summary

# Test the function
result = summarize_text(sample_text, length='medium')
print(f"\n{result}")

Summarizing with medium length...
✅ Original: 117 words
✅ Summary: 37 words
✅ Reduction: 68.4%

Artificial intelligence has made remarkable progress in recent years. Machine learning can now diagnose diseases, predict market trends, and even create art. The rapid advancement of AI also raises important ethical questions about privacy and job displacement.
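
The reduction figure reported by `summarize_text` is the share of words removed from the original. A minimal sketch of that arithmetic as a standalone helper follows; the name `reduction_percent` is hypothetical and not part of the notebook, and it counts whitespace-separated words just as the function above does.

```python
def reduction_percent(original: str, summary: str) -> float:
    """Percentage of whitespace-separated words removed by the summary."""
    original_words = len(original.split())
    summary_words = len(summary.split())
    if original_words == 0:
        raise ValueError("original text contains no words")
    return 100 - summary_words / original_words * 100


# Reproduce the figures printed above: 117 words in, 37 words out
print(f"{reduction_percent('w ' * 117, 'w ' * 37):.1f}%")  # → 68.4%
```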


## 7. Batch Summarization

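Pass a list of texts to summarize them in a single call. By default the pipeline still processes the inputs one at a time; passing `batch_size` groups them into batches, which mainly helps on a GPU: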

In [7]:
# Multiple texts
texts = [
    "Climate change is causing global temperatures to rise. Scientists warn that urgent action is needed to reduce carbon emissions and prevent catastrophic environmental damage.",
    "The stock market experienced significant volatility today. Major indices fell sharply in morning trading but recovered somewhat by the close.",
    "A new study shows that regular exercise improves mental health. Participants who exercised three times per week reported reduced anxiety and better mood."
]

# Batch summarize
summaries = summarizer(texts, max_length=30, min_length=10)

# Display results
for i, (original, summary) in enumerate(zip(texts, summaries), 1):
    print(f"\n{'='*70}")
    print(f"TEXT {i}:")
    print(original)
    print(f"\nSUMMARY {i}:")
    print(summary['summary_text'])

Your max_length is set to 30, but your input_length is only 28. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=14)
Your max_length is set to 30, but your input_length is only 24. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=12)
Your max_length is set to 30, but your input_length is only 27. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=13)



TEXT 1:
Climate change is causing global temperatures to rise. Scientists warn that urgent action is needed to reduce carbon emissions and prevent catastrophic environmental damage.

SUMMARY 1:
Climate change is causing global temperatures to rise. Scientists warn urgent action is needed to reduce carbon emissions.

TEXT 2:
The stock market experienced significant volatility today. Major indices fell sharply in morning trading but recovered somewhat by the close.

SUMMARY 2:
Major indices fell sharply in morning trading but recovered somewhat by the close.

TEXT 3:
A new study shows that regular exercise improves mental health. Participants who exercised three times per week reported reduced anxiety and better mood.

SUMMARY 3:
Study shows regular exercise improves mental health. Participants who exercised three times per week reported reduced anxiety and better mood.
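
The warnings above appear because `max_length=30` is larger than the tokenized inputs (24 to 28 tokens). One possible way to avoid them is to derive the limits from the input length. The sketch below is only an illustration: `length_bounds`, `ratio`, and `floor` are hypothetical names and defaults, not part of the transformers API, and the input token count is assumed to come from something like `len(summarizer.tokenizer(text)["input_ids"])`.

```python
def length_bounds(input_tokens: int, ratio: float = 0.5, floor: int = 5) -> dict:
    """Suggest max_length/min_length (in tokens) for an input of the given size.

    ratio and floor are illustrative defaults, not values recommended by the library.
    """
    if input_tokens <= 0:
        raise ValueError("input_tokens must be positive")
    # Cap the summary at a fraction of the input, but never below `floor`
    max_length = max(floor, int(input_tokens * ratio))
    # Keep min_length well under max_length so generation has room to stop early
    min_length = max(1, max_length // 3)
    return {"max_length": max_length, "min_length": min_length}


# Token counts reported in the warnings above
for n in (28, 24, 27):
    print(n, length_bounds(n))
```

With the default `ratio` of 0.5, the suggested `max_length` values for these three inputs are 14, 12, and 13, the same values the warnings propose. The result could then be unpacked into the call, for example `summarizer(text, **length_bounds(n_tokens))`.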


## 8. Try Your Own Text!

Replace the text below with your own content:

In [11]:
# 👇 Paste your text here
your_text = """Modern medicine has entered an era of innovation driven by technology and research. Artificial intelligence helps doctors detect diseases earlier through advanced image analysis and predictive modeling. Breakthroughs in genetic engineering allow scientists to develop personalized treatments tailored to an individual’s DNA, improving recovery rates and reducing side effects. Telemedicine has made healthcare accessible to remote areas, enabling patients to consult specialists without traveling. Despite these advancements, challenges such as data privacy, affordability, and equitable access to care remain. The future of medicine depends on balancing technological progress with ethical responsibility and global collaboration.
"""

# Generate summary
your_summary = summarize_text(your_text, length='medium')
print(f"\n📝 Your Summary:\n{your_summary}")

Summarizing with medium length...
✅ Original: 94 words
✅ Summary: 43 words
✅ Reduction: 54.3%

📝 Your Summary:
Modern medicine has entered an era of innovation driven by technology and research. Despite these advancements, challenges such as data privacy, affordability, and equitable access to care remain. The future of medicine depends on balancing technological progress with ethical responsibility and global collaboration.
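
If the text you paste is much longer than the examples here, it may exceed the model's input limit (1,024 tokens for `facebook/bart-large-cnn`); longer inputs are truncated or raise an error, depending on how the pipeline is called. A common workaround is to split the text into chunks, summarize each chunk, and join the results. The sketch below makes several assumptions: `split_into_chunks` and `summarize_long` are hypothetical helpers, the 400-word budget is an arbitrary stand-in for a token budget, and `summarize_fn` is any callable that maps a string to its summary, for example `lambda t: summarize_text(t, verbose=False)`.

```python
import re
from typing import Callable, List


def split_into_chunks(text: str, max_words: int = 400) -> List[str]:
    """Group sentences into chunks of at most roughly max_words words.

    A single sentence longer than max_words still becomes its own chunk.
    """
    # Naive sentence split on ., !, or ? followed by whitespace
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def summarize_long(text: str, summarize_fn: Callable[[str], str], max_words: int = 400) -> str:
    """Summarize each chunk independently and join the partial summaries."""
    return " ".join(summarize_fn(chunk) for chunk in split_into_chunks(text, max_words))
```

Because each chunk is summarized in isolation, the joined result can repeat points or lose connections between sections; running the joined text through the summarizer once more is one way to tighten it.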


## 9. Advanced: Compare Different Models

Let's compare different summarization models:

In [9]:
# Test different models (optional - takes time to download)
models = {
    'BART': 'facebook/bart-large-cnn',
    'T5-Small': 't5-small',
}

test_text = """The Amazon rainforest, often called the lungs of the Earth, 
produces about 20% of the world's oxygen. However, deforestation rates have 
accelerated in recent years, threatening biodiversity and contributing to 
climate change. Conservation efforts are crucial to preserve this vital ecosystem."""

print("Comparing models...\n")

for name, model_name in models.items():
    print(f"\n{'='*70}")
    print(f"Model: {name}")
    try:
        model_pipeline = pipeline('summarization', model=model_name, device=device)
        result = model_pipeline(test_text, max_length=50, min_length=15)
        print(f"Summary: {result[0]['summary_text']}")
    except Exception as e:
        print(f"Error: {e}")

Comparing models...


Model: BART


Device set to use cpu


Summary: The Amazon rainforest, often called the lungs of the Earth, produces about 20% of the world's oxygen. However, deforestation rates have accelerated in recent years. Conservation efforts are crucial to preserve this vital ecosystem

Model: T5-Small



Device set to use cpu


Summary: deforestation rates have accelerated in recent years, threatening biodiversity and contributing to climate change . conservation efforts are crucial to preserve this vital ecosystem .
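
When comparing models it can help to record how long each one takes and how long its summary is, not only the text it produces. The sketch below assumes a hypothetical helper `compare_summarizers` that takes a mapping from a label to any callable returning a summary string, such as a small wrapper around an already loaded pipeline. Timing uses `time.perf_counter` and measures only the call itself, so load the models beforehand.

```python
import time
from typing import Callable, Dict, List


def compare_summarizers(text: str, summarizers: Dict[str, Callable[[str], str]]) -> List[dict]:
    """Run each summarizer on the same text and collect simple metrics."""
    rows = []
    for name, summarize_fn in summarizers.items():
        start = time.perf_counter()
        summary = summarize_fn(text)
        elapsed = time.perf_counter() - start
        rows.append({
            "model": name,
            "seconds": elapsed,            # wall-clock time of this single call
            "words": len(summary.split()),  # whitespace-separated word count
            "summary": summary,
        })
    return rows
```

A pipeline held in a variable `pipe` could be wrapped as `lambda t, p=pipe: p(t, max_length=50, min_length=15)[0]['summary_text']` before being placed in the mapping.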


## 10. Save and Load Summaries

Export your summaries to a file:

In [10]:
import datetime

def save_summary(original_text, summary, filename=None):
    """Save summary to a text file."""
    if filename is None:
        timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
        filename = f"summary_{timestamp}.txt"
    
    with open(filename, 'w', encoding='utf-8') as f:
        f.write("ORIGINAL TEXT:\n")
        f.write("=" * 70 + "\n")
        f.write(original_text + "\n\n")
        f.write("SUMMARY:\n")
        f.write("=" * 70 + "\n")
        f.write(summary + "\n")
    
    print(f"✅ Summary saved to: {filename}")
    return filename

# Example usage (`summary` was reassigned in the batch loop above, so regenerate it here)
# save_summary(sample_text, summarize_text(sample_text, verbose=False))
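
Reading a saved file back is the mirror image of `save_summary`. The sketch below assumes the exact layout written above (an `ORIGINAL TEXT:` header, a 70-character `=` rule, the text, then a `SUMMARY:` header and another rule). `load_summary` is a hypothetical helper, not part of any library.

```python
def load_summary(filename):
    """Read a file written by save_summary and return (original_text, summary)."""
    rule = "=" * 70
    with open(filename, "r", encoding="utf-8") as f:
        content = f.read()
    # Layout: "ORIGINAL TEXT:\n<rule>\n<text>\n\nSUMMARY:\n<rule>\n<summary>\n"
    head = f"ORIGINAL TEXT:\n{rule}\n"
    separator = f"\n\nSUMMARY:\n{rule}\n"
    if not content.startswith(head) or separator not in content:
        raise ValueError(f"{filename} is not in the expected summary format")
    # rpartition splits at the last separator, in case the text itself contains one
    original_text, _, summary = content[len(head):].rpartition(separator)
    # Drop only the single newline that save_summary appends after the summary
    if summary.endswith("\n"):
        summary = summary[:-1]
    return original_text, summary
```

For a file produced by `save_summary`, `original_text, summary = load_summary(filename)` should return the same two strings that were saved.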

## 📚 Summary

You've learned how to:
- ✅ Load and use pre-trained transformer models
- ✅ Summarize text with different lengths
- ✅ Batch process multiple texts
- ✅ Compare different models
- ✅ Save summaries to files

**Next Steps:**
- Try different models for your use case
- Fine-tune models on your own dataset
- Integrate into a web application
- Explore other NLP tasks (translation, question-answering, etc.)

**Resources:**
- [HuggingFace Transformers Docs](https://huggingface.co/docs/transformers)
- [Model Hub](https://huggingface.co/models?pipeline_tag=summarization)
- [Summarization Guide](https://huggingface.co/docs/transformers/tasks/summarization)