# Text Summarization using


## 1. Setup & Installation

In [3]:
!pip install -q transformers datasets torch pandas accelerate sentencepiece

In [4]:
# Import libraries
import torch
from transformers import pipeline
from datasets import load_dataset
import pandas as pd
import time
import warnings
warnings.filterwarnings('ignore')

print("✅ Setup complete!")
print(f"PyTorch: {torch.__version__}")
print(f"CUDA Available: {torch.cuda.is_available()}")
print(f"Device: {'GPU' if torch.cuda.is_available() else 'CPU'}")

✅ Setup complete!
PyTorch: 2.9.0+cu126
CUDA Available: True
Device: GPU


## 2. Model Selection & Justification

### Selected Model: **facebook/bart-large-cnn** (BART-CNN)

**Why BART-CNN?**
- Produces informative 3-4 sentence summaries (vs PEGASUS-XSum's 1 sentence)
- Trained on CNN/DailyMail dataset, ideal for news articles

**Comparison:**
| Criterion | BART-CNN | PEGASUS-XSum | LED-16K |
|-----------|----------|--------------|----------|
| Output | 3-4 sentences | 1 sentence | 3-4 sentences |
| Business Value | High | Low | Medium |
| Industry Use | Standard | Niche | Specialized |

**Decision Rationale:**
BART-CNN is a widely used baseline for news summarization. Its 3-4 sentence summaries retain more detail than PEGASUS-XSum's single-sentence output, which makes it the better fit for business use.

In [5]:
# Model configurations
MODELS = {
    "BART-CNN": {
        "name": "facebook/bart-large-cnn",
        "max_len": 142,
        "min_len": 56,
        "description": "Production standard - informative 3-4 sentence summaries"
    },
    "PEGASUS-XSum": {
        "name": "google/pegasus-xsum",
        "max_len": 64,
        "min_len": 10,
        "description": "Academic - single sentence extreme summarization"
    },
    "LED-16K": {
        "name": "allenai/led-base-16384",
        "max_len": 142,
        "min_len": 56,
        "description": "Specialized - handles very long documents"
    }
}

print("📊 Available Models:")
for name, config in MODELS.items():
    print(f"\n{name}:")
    print(f"  Model: {config['name']}")
    print(f"  Description: {config['description']}")

📊 Available Models:

BART-CNN:
  Model: facebook/bart-large-cnn
  Description: Production standard - informative 3-4 sentence summaries

PEGASUS-XSum:
  Model: google/pegasus-xsum
  Description: Academic - single sentence extreme summarization

LED-16K:
  Model: allenai/led-base-16384
  Description: Specialized - handles very long documents


## 3. Data Loading & Preprocessing

### Library Choices and Justification:

**`datasets` (HuggingFace)**
- Native integration with models/tokenizers
- Efficient memory-mapped loading and on-disk caching
- Direct XSum dataset access
- *Alternative:* Manual CSV/JSON parsing - rejected (more code to write and maintain)

**`pandas`**
- Structured data operations and analysis
- Easy DataFrame filtering/transformation
- *Alternative:* NumPy - rejected (no labeled, tabular data structure)

**`torch` (PyTorch)**
- Backend used by the HuggingFace `transformers` models in this notebook
- GPU acceleration when a CUDA device is available and selected
- *Alternative:* TensorFlow - rejected (HF defaults to PyTorch)

In [6]:
# Load XSum dataset
print("📥 Loading XSum dataset...")
dataset = load_dataset("xsum", split="test")
samples = dataset.select(range(50))  # 50 samples for demo

print(f"✅ Loaded {len(samples)} samples")
print(f"\nDataset structure: {samples.column_names}")
print(f"First example keys: {samples[0].keys()}")

📥 Loading XSum dataset...


'(ReadTimeoutError("HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. (read timeout=10)"), '(Request ID: 2f00601d-91a7-40dc-b49d-a9e6440fe800)')' thrown while requesting HEAD https://huggingface.co/datasets/xsum/resolve/7d4d486c2f8ef850b1a11aead99b894ff3dd7da9/dataset_infos.json
Retrying in 1s [Retry 1/5].


✅ Loaded 50 samples

Dataset structure: ['document', 'summary', 'id']
First example keys: dict_keys(['document', 'summary', 'id'])


In [7]:
# Inspect sample data
print("📊 Sample Data Inspection:\n")
print("=" * 80)

for i in range(3):
    print(f"\nExample {i+1}:")
    print(f"Article (first 200 chars): {samples[i]['document'][:200]}...")
    print(f"XSum Reference: {samples[i]['summary']}")
    print("-" * 80)

📊 Sample Data Inspection:


Example 1:
Article (first 200 chars): Prison Link Cymru had 1,099 referrals in 2015-16 and said some ex-offenders were living rough for up to a year before finding suitable accommodation.
Workers at the charity claim investment in housing...
XSum Reference: There is a "chronic" need for more housing for prison leavers in Wales, according to a charity.
--------------------------------------------------------------------------------

Example 2:
Article (first 200 chars): Officers searched properties in the Waterfront Park and Colonsay View areas of the city on Wednesday.
Detectives said three firearms, ammunition and a five-figure sum of money were recovered.
A 26-yea...
XSum Reference: A man has appeared in court after firearms, ammunition and cash were seized by police in Edinburgh.
--------------------------------------------------------------------------------

Example 3:
Article (first 200 chars): Jordan Hill, Brittany Covington and Tesfaye Cooper, all

## 4. Pipeline Implementation

### Pipeline Abstraction

Used HuggingFace `pipeline()` to abstract model complexities:

```
Input Text → Tokenization → Model Inference → Decoding → Summary Output
```

**Complexities Abstracted:**
1. **Tokenization:** Special tokens, padding, truncation
2. **Model Loading:** Weights, config, tokenizer in single call
3. **Device Management:** Model and inputs are placed on the device passed via `device`
4. **Decoding:** Token IDs → clean text output
5. **Batch Processing:** Batching with padding and attention masks, controlled by `batch_size`

**Implementation:**
```python
summarizer = pipeline("summarization", model="facebook/bart-large-cnn", device=device)
result = summarizer(text, max_length=142, min_length=56, do_sample=False)
```

**Benefits:** Replaces roughly 50 lines of tokenization, generation, and decoding code with a single call, is well tested, and makes switching models a one-argument change.
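
For reference, the steps that `pipeline()` wraps can be written out by hand. The sketch below is a minimal illustration, not the library's internal implementation: `summarize_manually` is a hypothetical helper, the 1024-token input cap is an assumption based on BART's input limit, and the function assumes the model and the tokenized inputs live on the same device.

```python
def summarize_manually(text, tokenizer, model, max_length=142, min_length=56,
                       max_input_tokens=1024):
    """Tokenize -> generate -> decode, the steps a summarization pipeline wraps."""
    # 1. Tokenization: add special tokens and truncate over-long inputs
    #    (max_input_tokens=1024 is an assumed cap for BART)
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=max_input_tokens,
    )
    # 2. Model inference: deterministic decoding (no sampling)
    output_ids = model.generate(
        **inputs,
        max_length=max_length,
        min_length=min_length,
        do_sample=False,
    )
    # 3. Decoding: token IDs back to clean text
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With real objects, this would be called with a tokenizer from `AutoTokenizer.from_pretrained("facebook/bart-large-cnn")` and a model from `AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")`.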

In [8]:
# Initialize BART-CNN (Primary Model)
print("🚀 Loading BART-CNN model...")
print("⚠️  First run downloads ~2GB - subsequent runs are fast\n")

device = 0 if torch.cuda.is_available() else -1

bart_summarizer = pipeline(
    "summarization",
    model="facebook/bart-large-cnn",
    device=device
)

print("✅ BART-CNN ready!")
print(f"   Device: {'GPU' if device == 0 else 'CPU'}")

🚀 Loading BART-CNN model...
⚠️  First run downloads ~2GB - subsequent runs are fast



Device set to use cuda:0


✅ BART-CNN ready!
   Device: GPU


## 5. Single Text Summarization

In [9]:
# Test with first example
import torch
from transformers import pipeline

# Check if model exists, reload if necessary
if 'bart_summarizer' not in globals():
    print("🔄 Reloading BART-CNN model...")
    device = 0 if torch.cuda.is_available() else -1
    bart_summarizer = pipeline(
        "summarization",
        model="facebook/bart-large-cnn",
        device=device
    )

test_text = samples[0]['document']
xsum_ref = samples[0]['summary']

print("📄 Input Article (first 400 chars):")
print("=" * 80)
print(test_text[:400] + "...\n")

# Generate summary
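print("⏳ Generating summary...")
start = time.time()

# truncation=True asks the tokenizer to cut over-long inputs, which would otherwise
# cause index-out-of-bounds errors. It only takes effect if the tokenizer knows its
# maximum length (see the warning in the output below).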
result = bart_summarizer(
    test_text,
    max_length=142,
    min_length=56,
    do_sample=False,
    truncation=True
)

elapsed = time.time() - start
summary = result[0]['summary_text']

print("\n" + "=" * 80)
print("📝 BART-CNN Summary (3-4 sentences):")
print("=" * 80)
print(summary)

print("\n" + "=" * 80)
print("🎯 XSum Reference (1 sentence):")
print("=" * 80)
print(xsum_ref)

print("\n" + "=" * 80)
print("📊 Metrics:")
print("=" * 80)
print(f"Input length: {len(test_text.split())} words")
print(f"BART summary: {len(summary.split())} words")
print(f"XSum reference: {len(xsum_ref.split())} words")
print(f"Inference time: {elapsed:.2f} seconds")
print(f"Compression ratio: {len(summary.split())/len(test_text.split())*100:.1f}%")

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


📄 Input Article (first 400 chars):
Prison Link Cymru had 1,099 referrals in 2015-16 and said some ex-offenders were living rough for up to a year before finding suitable accommodation.
Workers at the charity claim investment in housing would be cheaper than jailing homeless repeat offenders.
The Welsh Government said more people than ever were getting help to address housing problems.
Changes to the Housing Act in Wales, introduced...

⏳ Generating summary...

📝 BART-CNN Summary (3-4 sentences):
Prison Link Cymru had 1,099 referrals in 2015-16 and said some ex-offenders were living rough for up to a year before finding suitable accommodation. Workers at the charity claim investment in housing would be cheaper than jailing homeless repeat offenders. Welsh Government said more people than ever were getting help to address housing problems.

🎯 XSum Reference (1 sentence):
There is a "chronic" need for more housing for prison leavers in Wales, according to a charity.

📊 Metri

## 6. Model Comparison (BART vs PEGASUS vs LED)

This section runs all three models on the same article to compare their output styles, lengths, and inference times.
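
The three checkpoints also accept different maximum input lengths, which matters when `truncation=True` is passed. A minimal sketch of enforcing a per-model cap at tokenization time follows. The `INPUT_LIMITS` values are assumptions based on the models' usual position limits and should be checked against each checkpoint's config; `encode_within_limit` is a hypothetical helper, not part of the notebook.

```python
# Assumed maximum input lengths in tokens; verify against each model's config
INPUT_LIMITS = {
    "facebook/bart-large-cnn": 1024,
    "google/pegasus-xsum": 512,
    "allenai/led-base-16384": 16384,
}


def encode_within_limit(text, tokenizer, model_name, default_limit=512):
    """Return input IDs truncated to the model's assumed input limit."""
    # Fall back to a conservative default for checkpoints not listed above
    limit = INPUT_LIMITS.get(model_name, default_limit)
    encoded = tokenizer(text, truncation=True, max_length=limit)
    return encoded["input_ids"]
```

Passing an explicit `max_length` to the tokenizer means truncation does not depend on the tokenizer already knowing its own limit.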

In [10]:
# Prepare for sequential comparison (Save Memory)
import gc
import torch

print("🔄 Preparing for model comparison...")
print("To save GPU memory, we will load models sequentially in the next cell rather than all at once.")

# Clear GPU memory from previous cells to avoid OOM
if 'bart_summarizer' in globals():
    print("Unloading initial BART model to free space...")
    del bart_summarizer

if 'models_loaded' in globals():
    del models_loaded

try:
    # Collect unreachable Python objects first so their GPU memory can be released
    gc.collect()
    torch.cuda.empty_cache()
    print("✅ Memory cleared. Ready for sequential comparison.")
except Exception as e:
    print(f"⚠️ Memory clear failed: {e}")
    print("❗ CRITICAL: The GPU may be in a bad state (e.g. a device-side assert). You MUST restart the runtime (Runtime > Restart session) to continue.")

🔄 Preparing for model comparison...
To save GPU memory, we will load models sequentially in the next cell rather than all at once.
Unloading initial BART model to free space...
✅ Memory cleared. Ready for sequential comparison.


In [11]:
# Compare on same text using Sequential Loading
comparison_results = []

print("⚖️  Comparing Models on Same Article (Sequential Load/Unload)\n")
print("=" * 80)

for model_name, config in MODELS.items():
    print(f"\n{model_name}:")
    print("-" * 80)

    try:
        print(f"Loading {model_name}...")
        # Load model just-in-time
        current_model = pipeline(
            "summarization",
            model=config['name'],
            device=device
        )

        start = time.time()
        # Added truncation=True to prevent CUDA asserts on long docs
        result = current_model(
            test_text,
            max_length=config['max_len'],
            min_length=config['min_len'],
            do_sample=False,
            truncation=True
        )
        elapsed = time.time() - start

        summary = result[0]['summary_text']

        print(f"Summary: {summary}")
        print(f"Length: {len(summary.split())} words")
        print(f"Time: {elapsed:.2f}s")

        comparison_results.append({
            'Model': model_name,
            'Summary': summary,
            'Words': len(summary.split()),
            'Time (s)': f"{elapsed:.2f}",
            'Description': config['description']
        })

        # IMMEDIATE CLEANUP to save memory: drop the reference, collect garbage,
        # then release cached GPU memory
        del current_model
        gc.collect()
        torch.cuda.empty_cache()

    except Exception as e:
        print(f"❌ Error during summarization with {model_name}: {e}")
        comparison_results.append({
            'Model': model_name,
            'Summary': 'Error: Could not generate summary',
            'Words': 0,
            'Time (s)': 'N/A',
            'Description': config['description']
        })

print("\n" + "=" * 80)

⚖️  Comparing Models on Same Article (Sequential Load/Unload)


BART-CNN:
--------------------------------------------------------------------------------
Loading BART-CNN...


Device set to use cuda:0
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


Summary: Prison Link Cymru had 1,099 referrals in 2015-16 and said some ex-offenders were living rough for up to a year before finding suitable accommodation. Workers at the charity claim investment in housing would be cheaper than jailing homeless repeat offenders. Welsh Government said more people than ever were getting help to address housing problems.
Length: 54 words
Time: 1.26s

PEGASUS-XSum:
--------------------------------------------------------------------------------
Loading PEGASUS-XSum...


Some weights of PegasusForConditionalGeneration were not initialized from the model checkpoint at google/pegasus-xsum and are newly initialized: ['model.decoder.embed_positions.weight', 'model.encoder.embed_positions.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Device set to use cuda:0


Summary: More than 1,000 homeless people were referred to a charity in Wales last year.
Length: 14 words
Time: 0.48s

LED-16K:
--------------------------------------------------------------------------------
Loading LED-16K...


Device set to use cuda:0
Input ids are automatically padded from 748 to 1024 to be a multiple of `config.attention_window`: 1024
Both `max_new_tokens` (=256) and `max_length`(=142) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Summary: Prison Link Cymru had 1,099 referrals in 2015-16 and said some ex-offenders were living rough for up to a year before finding suitable accommodation.Image copyright PAWorkers at the charity claim investment in housing would be cheaper than jailing homeless repeat offenders.Image copyright PAThe Welsh Government said more people than ever were getting help to address housing problems.Changes to the Housing Act in Wales, introduced in 2015, removed the right for prison leavers to be given priority for accommodation.Image copyright PAPrison Link Cymru, which helps people find accommodation after their release, said things were generally good for women because issues such as children or domestic violence were now considered.However, the same could not be said for men, the charity said, because issues which often affect them, such as post traumatic stress disorder or drug dependency, were often viewed as less of a priority.Image copyright PAAndrew Stevens, who works in Welsh prison

In [12]:
# Display comparison table
comparison_df = pd.DataFrame(comparison_results)

print("\n📊 Model Comparison Table:")
print("=" * 80)
print(comparison_df.to_string(index=False))

print("\n💡 Key Insights:")
print("=" * 80)
print("BART-CNN: Multi-sentence, informative summary (production choice)")
print("⚡ PEGASUS-XSum: Shortest, single sentence (headline style)")
print("LED-16K: Base checkpoint, not fine-tuned for summarization; mostly copies the input here")


📊 Model Comparison Table:
       Model

## 7. Batch Processing

Summarizes ten documents in a single pipeline call. The cell below uses `batch_size=1`, so documents are still processed one at a time.
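
When `batch_size` is greater than 1, the inputs in a batch are padded to the longest one, so grouping documents of similar length can reduce wasted computation. The helper below sketches that idea in plain Python. `length_sorted_batches` and `restore_order` are hypothetical names, and word count is used as a rough stand-in for token count.

```python
def length_sorted_batches(docs, batch_size):
    """Group documents of similar length; yields lists of (original_index, doc)."""
    # Sort indices by word count so each batch holds similarly sized documents
    order = sorted(range(len(docs)), key=lambda i: len(docs[i].split()))
    for start in range(0, len(order), batch_size):
        yield [(i, docs[i]) for i in order[start:start + batch_size]]


def restore_order(indexed_results, total):
    """Put per-document results back into the original document order."""
    ordered = [None] * total
    for index, result in indexed_results:
        ordered[index] = result
    return ordered


docs = ["a b c d e", "a", "a b c", "a b"]
batches = list(length_sorted_batches(docs, batch_size=2))
# Stand-in for a summarizer call: each "result" here is just the word count
results = [(i, len(doc.split())) for batch in batches for i, doc in batch]
print(restore_order(results, len(docs)))
```

In the notebook, the stand-in would be replaced by a call to `bart_summarizer` on the documents of each batch.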

In [18]:
# Batch processing with BART
import torch
import gc
import time
from transformers import pipeline, AutoTokenizer

print("📦 Batch Processing Demo\n")

# Flag to track if we need to fallback to CPU due to GPU errors
force_cpu = False

# 1. Try to clear GPU memory to check health
try:
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        gc.collect()
except Exception as e:
    print(f"⚠️ GPU Error detected: {e}")
    print("⚠️ GPU appears corrupted. Switching to CPU mode to continue (slower but stable)...")
    force_cpu = True

# 2. Determine Device (CPU fallback if GPU is broken)
device = 0 if (torch.cuda.is_available() and not force_cpu) else -1
print(f"🔧 Running on: {'GPU' if device == 0 else 'CPU'}")

# 3. Reload Model with Explicit Tokenizer
print(f"🔄 (Re)Loading BART-CNN model and tokenizer on {'GPU' if device==0 else 'CPU'}...")
try:
    # Load tokenizer explicitly to ensure max_length is set
    tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
    # Enforce BART's hard limit
    tokenizer.model_max_length = 1024

    # Clean up old model variable if it exists
    if 'bart_summarizer' in globals():
        del bart_summarizer
        gc.collect()

    bart_summarizer = pipeline(
        "summarization",
        model="facebook/bart-large-cnn",
        tokenizer=tokenizer,
        device=device
    )
except Exception as e:
    print(f"❌ Primary Load Failed: {e}")
    print("⚠️ Force-switching to CPU...")
    device = -1

    # Reload on CPU
    tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
    tokenizer.model_max_length = 1024

    bart_summarizer = pipeline(
        "summarization",
        model="facebook/bart-large-cnn",
        tokenizer=tokenizer,
        device=-1
    )

# Get 10 documents
batch_docs = [samples[i]['document'] for i in range(10)]
batch_refs = [samples[i]['summary'] for i in range(10)]

print(f"\nProcessing {len(batch_docs)} documents...")

start = time.time()

try:
    batch_results = bart_summarizer(
        batch_docs,
        max_length=142,
        min_length=56,
        do_sample=False,
        batch_size=1,
        truncation=True
    )
    total_time = time.time() - start

    batch_summaries = [r['summary_text'] for r in batch_results]

    print("✅ Batch processing complete!\n")
    print(f"Total time: {total_time:.2f}s")
    print(f"Avg per document: {total_time/len(batch_docs):.2f}s")
    print(f"Throughput: {len(batch_docs)/total_time:.2f} docs/second")

except Exception as e:
    print(f"❌ Processing failed: {e}")
    print("Tips: Check if tokenizer.model_max_length is set correctly.")

📦 Batch Processing Demo

⚠️ GPU Error detected: CUDA error: device-side assert triggered
Search for `cudaErrorAssert' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

⚠️ GPU appears corrupted. Switching to CPU mode to continue (slower but stable)...
🔧 Running on: CPU
🔄 (Re)Loading BART-CNN model and tokenizer on CPU...


Device set to use cpu



Processing 10 documents...


Your max_length is set to 142, but your input_length is only 62. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=31)
Your max_length is set to 142, but your input_length is only 66. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=33)


✅ Batch processing complete!

Total time: 55.83s
Avg per document: 5.58s
Throughput: 0.18 docs/second
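
The `max_length` warnings in the output above appear because some inputs are shorter than the fixed 142-token cap. One way to avoid them is to scale the length bounds with the input. The sketch below is an assumption about how that could be done, not part of the original notebook: `generation_bounds` is a hypothetical helper, and its default ratio of 0.5 mirrors the halving that the warning message suggests.

```python
def generation_bounds(input_tokens, max_cap=142, min_cap=56, ratio=0.5):
    """Return (min_length, max_length) scaled to the input length."""
    # Never ask for more than `ratio` of the input, and never exceed the fixed cap
    max_len = max(2, min(max_cap, int(input_tokens * ratio)))
    # Keep the minimum well below the maximum so generation has room to stop
    min_len = min(min_cap, max_len // 2)
    return min_len, max_len


for n in (62, 66, 1000):
    print(n, generation_bounds(n))
```

The returned pair could be passed as `min_length` and `max_length` for each document, with the input token count taken from the tokenizer.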


In [19]:
# Display batch results
if 'batch_summaries' in locals() and 'batch_refs' in locals() and 'batch_docs' in locals():
    batch_df = pd.DataFrame({
        'Article': [d[:100] + '...' for d in batch_docs[:5]],
        'BART Summary': [s[:100] + '...' for s in batch_summaries[:5]],
        'XSum Ref': batch_refs[:5],
        'BART Words': [len(s.split()) for s in batch_summaries[:5]],
        'Ref Words': [len(r.split()) for r in batch_refs[:5]]
    })

    print("\n📋 Sample Results (first 5):")
    print("=" * 80)
    print(batch_df.to_string(index=False))
else:
    print("⚠️ Batch results not available. Please run the Batch Processing cell successfully.")


📋 Sample Results (first 5):
                                                                                                 Article                                                                                            BART Summary                                                                                                                                             XSum Ref  BART Words  Ref Words
 Prison Link Cymru had 1,099 referrals in 2015-16 and said some ex-offenders were living rough for up... Prison Link Cymru had 1,099 referrals in 2015-16 and said some ex-offenders were living rough for up...                                                      There is a "chronic" need for more housing for prison leavers in Wales, according to a charity.          54         17
 Officers searched properties in the Waterfront Park and Colonsay View areas of the city on Wednesday... Officers searched properties in the Waterfront Park and Colonsay View areas of the city. Detectives .

## 8. Performance Analysis

In [20]:
# Calculate metrics
from statistics import mean

if 'batch_summaries' in locals() and batch_summaries:
    metrics = {
        'Average BART Length': mean([len(s.split()) for s in batch_summaries]),
        'Average XSum Length': mean([len(r.split()) for r in batch_refs]),
        'Compression Ratio': mean([
            len(batch_summaries[i].split()) / len(batch_docs[i].split()) * 100
            for i in range(len(batch_docs))
        ])
    }

    print("\n📊 Performance Metrics:")
    print("=" * 80)
    for metric, value in metrics.items():
        print(f"{metric}: {value:.2f}")

    print("\n💡 Analysis:")
    print("=" * 80)
    print(f"✅ BART produces {metrics['Average BART Length']/metrics['Average XSum Length']:.1f}x longer summaries than XSum")
    print("✅ More informative for business use")
    print("✅ Compresses original to ~{:.1f}% of original length".format(metrics['Compression Ratio']))
else:
    print("⚠️ Metrics cannot be calculated because batch summaries are missing.")


📊 Performance Metrics:
Average BART Length: 49.50
Average XSum Length: 21.10
Compression Ratio: 26.55

💡 Analysis:
✅ BART produces 2.3x longer summaries than XSum
✅ More informative for business use
✅ Compresses original to ~26.5% of original length
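
Length statistics alone say nothing about how closely a summary matches its reference. As a rough illustration, the sketch below computes a unigram-overlap F1 score, a simplified stand-in for ROUGE-1 with no stemming and naive whitespace tokenization. `unigram_f1` is a hypothetical helper and was not part of the original analysis.

```python
from collections import Counter


def unigram_f1(candidate, reference):
    """Unigram-overlap F1 between a candidate summary and a reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

Applied to the batch, this could be computed as `[unigram_f1(s, r) for s, r in zip(batch_summaries, batch_refs)]`. Scores against XSum references should be read with care, since those references are single sentences while BART-CNN produces several.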


## 9. Save Results

In [22]:
# Save comprehensive results
if 'batch_summaries' in locals() and batch_summaries:
    results_df = pd.DataFrame({
        'Document': batch_docs,
        'BART_Summary': batch_summaries,
        'XSum_Reference': batch_refs,
        'BART_Words': [len(s.split()) for s in batch_summaries],
        'XSum_Words': [len(r.split()) for r in batch_refs]
    })

    results_df.to_csv('summarization_results.csv', index=False)
    print("✅ Results saved to 'summarization_results.csv'")
    print(f"   Total entries: {len(results_df)}")
else:
    print("⚠️ No results to save.")

✅ Results saved to 'summarization_results.csv'
   Total entries: 10
