Exercise Set: Transformer-Based Encoder-Decoder Practice

Hint: Consider the order of operations (e.g., should you translate before or after summarizing?). Handle different input/output formats for each task.

In [24]:
import warnings
warnings.filterwarnings("ignore", category=UserWarning)
import spacy
import nltk

from transformers import pipeline, logging

# Suppress warnings from transformers
logging.set_verbosity_error()


In [2]:
#Step 2: Load a summarization pipeline
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")


config.json: 0.00B [00:00, ?B/s]

model.safetensors:   0%|          | 0.00/1.63G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/363 [00:00<?, ?B/s]

vocab.json: 0.00B [00:00, ?B/s]

merges.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

Getting Started Tips:

    Exercise 1 - it's the simplest modification of your existing code
    Copy your current code and modify it step by step
    Test each change with the same input text to see differences
    Print intermediate results to understand what's happening
    Don't worry if some experiments don't work perfectly - learning from failures is valuable!



In [4]:
text = (
    "Hey team, hope you're all doing great. Just a heads-up that we'll be having our project sync-up call on Thursday at 2 PM. "
    "We’ll review the current sprint progress, discuss any blockers, and finalize the scope for the next sprint. "
    "As always, feel free to bring up any ideas that could improve delivery or collaboration. Looking forward to a great discussion!"
)

summary_f = summarizer(text, max_length=45, min_length=20, do_sample=False)
summary_t = summarizer(text, max_length=45, min_length=20, do_sample=True)

print("Summary F:", summary_f[0]['summary_text'])
print("Summary T:", summary_t[0]['summary_text'])

Summary F: We'll have a project sync-up call on Thursday at 2 PM. We'll review the current sprint progress, discuss any blockers, and finalize the scope for the next sprint.
Summary T: Project sync-up call on Thursday at 2 PM. We’ll review the current sprint progress, discuss any blockers, and finalize the scope for the next sprint. As always, feel free to bring


Exercise 2: Different Text Types

Task: Test your summarizer on different types of text:


In [12]:
# Add these text samples to your code
news_article = "Scientists at MIT have developed a new battery technology that could revolutionize electric vehicles. The lithium-metal battery can store twice as much energy as current batteries and charge 50% faster. The research team, led by Dr. Sarah Johnson, spent three years developing this breakthrough. The technology could be commercially available within five years, potentially making electric cars more affordable and practical for everyday use."

recipe_text = "To make chocolate chip cookies, you'll need flour, sugar, butter, eggs, vanilla, and chocolate chips. First, preheat your oven to 375°F. Mix the dry ingredients in one bowl and wet ingredients in another. Combine them slowly, then fold in chocolate chips. Drop spoonfuls of dough on a baking sheet and bake for 10-12 minutes until golden brown."

email_text = "Hi everyone, I wanted to update you on our quarterly sales results. We exceeded our target by 15% this quarter, thanks to strong performance in the mobile app division. The marketing campaign we launched in July was particularly successful, generating 200 new leads. Our customer satisfaction scores also improved by 8%. Great work team, and let's keep the momentum going into Q4!"

In [13]:
summary_news = summarizer(news_article, max_length=45, min_length=20, do_sample=False)
print("Summary News:", summary_news[0]['summary_text'])

summary_recipe = summarizer(recipe_text, max_length=45, min_length=20, do_sample=False)
print("Summary Recipe:", summary_recipe[0]['summary_text'])

summary_text = summarizer(email_text, max_length=45, min_length=20, do_sample=False)
print("Summary Text:", summary_text[0]['summary_text'])

Summary News: Scientists at MIT have developed a new battery technology that could revolutionize electric vehicles. The lithium-metal battery can store twice as much energy as current batteries. The technology could be commercially available within five years.
Summary Recipe: To make chocolate chip cookies, you'll need flour, sugar, butter, eggs, vanilla, and chocolate chips. Drop spoonfuls of dough on a baking sheet and bake for 10-12 minutes until golden
Summary Text: We exceeded our target by 15% this quarter, thanks to strong performance in the mobile app division. The marketing campaign we launched in July was particularly successful, generating 200 new leads. Our customer satisfaction scores also improved


Exercise 3: Model Comparison

Task: Compare different pre-trained models by replacing "facebook/bart-large-cnn" with these alternatives:

    "google/t5-small" (T5 model)
    "facebook/bart-base" (smaller BART)
    "sshleifer/distilbart-cnn-12-6" (lightweight BART)

Hint: You'll need to add a prefix for T5: "summarize: " + text. Compare speed, output quality, and resource usage. Which model works best for your needs?

In [22]:
#summarizer_t5 = pipeline("summarization", model="summerize:google/t5-small")   # TODO: This doesn't work.
summarizer_bart_base = pipeline("summarization", model="facebook/bart-large-cnn")
summarizer_bart_lightweight = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

In [23]:
summary_text_bart_base = summarizer_bart_base(text, max_length=45, min_length=20, do_sample=False)
print("Summary Text BART base:", summary_text_bart_base[0]['summary_text'])

summary_text_bart_lightweight = summarizer_bart_lightweight(text, max_length=45, min_length=20, do_sample=False)
print("Summary Text BART lightweight:", summary_text_bart_lightweight[0]['summary_text'])

Summary Text BART base: We'll have a project sync-up call on Thursday at 2 PM. We'll review the current sprint progress, discuss any blockers, and finalize the scope for the next sprint.
Summary Text BART lightweight:  The project sync-up call will be held on Thursday at 2 PM . We'll review the current sprint progress, discuss any blockers, and finalize the scope for the next sprint . Feel free to bring up


Exercise 4: Custom Input Processing
Task: Create a function that automatically splits long texts and summarizes each part:

In [27]:
# Ensure punkt is downloaded for sentence splitting (if needed)
nltk.download('punkt')

def smart_summarize(text, chunk_size=500, overlap=50):
    # Initialize summarization pipeline
    summarizer = summarizer_bart_base

    # Tokenize text into words
    words = text.split()
    chunks = []

    # Split into overlapping word chunks
    start = 0
    while start < len(words):
        end = start + chunk_size
        chunk = ' '.join(words[start:end])
        chunks.append(chunk)
        start += chunk_size - overlap

    # Summarize each chunk
    summaries = []
    for chunk in chunks:
        summary = summarizer(chunk, max_length=100, min_length=20, do_sample=False)
        summaries.append(summary[0]['summary_text'])

    # Combine all summaries
    final_summary = ' '.join(summaries)
    return final_summary

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [28]:
# Test
smart_summary = smart_summarize(text)
print("Smart Summary:", smart_summary)

Smart Summary: We'll have a project sync-up call on Thursday at 2 PM. We'll review the current sprint progress, discuss any blockers, and finalize the scope for the next sprint.


Exercise 5: Interactive Summarizer

Task: Build a simple interactive program



In [29]:
def interactive_summarizer():
    print("📝 Text Summarizer")
    print("Enter 'quit' to exit")

    # Initialize summarization pipeline
    summarizer = summarizer_bart_base

    while True:
        user_input = input("\nEnter text to summarize: ")
        if user_input.lower().strip() == 'quit':
            break

        # Handle empty input
        if not user_input.strip():
            print("⚠️ Please enter some text.")
            continue

        # Handle very short input
        if len(user_input.split()) < 10:
            print("⚠️ Text is too short to summarize. Try adding more content.")
            continue

        try:
            # Generate summary
            summary = summarizer(user_input, max_length=100, min_length=20, do_sample=False)
            print("\n📚 Summary:")
            print(summary[0]['summary_text'])
        except Exception as e:
            print(f"❌ Error: {e}")

    print("👋 Thanks for using the summarizer!")