Exercise 1: Parameter Experimentation

In [None]:
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
# Placeholder: replace with a real multi-sentence passage before running
text = """Your long input text here..."""

# Experiment with different parameters
params = [
    {"max_length": 30, "min_length": 10},
    {"max_length": 60, "min_length": 25},
    {"max_length": 100, "min_length": 40},
    {"max_length": 60, "min_length": 25, "do_sample": True, "temperature": 0.7}
]

for i, p in enumerate(params):
    summary = summarizer(text, **p)[0]['summary_text']
    print(f"\n--- Summary {i+1} ---")
    print(summary)

Device set to use cpu
Your max_length is set to 30, but your input_length is only 8. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=4)
Your max_length is set to 60, but your input_length is only 8. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=4)



--- Summary 1 ---
Your long input text is needed for this article. Use the Daily Discussion to help people understand today's featured news stories.


Your max_length is set to 100, but your input_length is only 8. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=4)



--- Summary 2 ---
Your long input text is needed for this article. Use the Daily Discussion to help people understand today's featured news stories.


Your max_length is set to 60, but your input_length is only 8. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=4)



--- Summary 3 ---
Your long input text is needed for this article. Use the Daily Discussion to help people understand today's featured news stories. Please share your long input texts with us at iReport.com.

--- Summary 4 ---
Your long input text will be displayed in this article. Use the Daily Discussion to help people understand today's featured news stories.
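The warnings in the output above report an input of only 8 tokens against `max_length` values of 30 to 100. One possible way to avoid this is to derive the length limits from the input's token count. The sketch below is illustrative only: the helper name `length_bounds` and the ratios 0.5 and 0.2 are assumptions, not values taken from the pipeline documentation.

```python
def length_bounds(input_tokens, max_ratio=0.5, min_ratio=0.2):
    """Derive (max_length, min_length) from the number of input tokens.

    Hypothetical helper: the ratios are arbitrary starting points.
    """
    if input_tokens < 1:
        raise ValueError("input_tokens must be positive")
    max_length = max(2, int(input_tokens * max_ratio))
    # Keep min_length strictly below max_length so generation can stop
    min_length = max(1, min(int(input_tokens * min_ratio), max_length - 1))
    return max_length, min_length


print(length_bounds(8))    # (4, 1)
print(length_bounds(400))  # (200, 80)
```

The token count could come from `len(summarizer.tokenizer(text)["input_ids"])`, and the two values can then be passed as `max_length` and `min_length` in the call to `summarizer`.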


Exercise 2: Different Text Types

In [None]:
# Add these text samples to your code
news_article = "Scientists at MIT have developed a new battery technology that could revolutionize electric vehicles. The lithium-metal battery can store twice as much energy as current batteries and charge 50% faster. The research team, led by Dr. Sarah Johnson, spent three years developing this breakthrough. The technology could be commercially available within five years, potentially making electric cars more affordable and practical for everyday use."

recipe_text = "To make chocolate chip cookies, you'll need flour, sugar, butter, eggs, vanilla, and chocolate chips. First, preheat your oven to 375°F. Mix the dry ingredients in one bowl and wet ingredients in another. Combine them slowly, then fold in chocolate chips. Drop spoonfuls of dough on a baking sheet and bake for 10-12 minutes until golden brown."

email_text = "Hi everyone, I wanted to update you on our quarterly sales results. We exceeded our target by 15% this quarter, thanks to strong performance in the mobile app division. The marketing campaign we launched in July was particularly successful, generating 200 new leads. Our customer satisfaction scores also improved by 8%. Great work team, and let's keep the momentum going into Q4!"

# Label each sample so the loop below can print a heading per text type
samples = {
    "News": news_article,
    "Recipe": recipe_text,
    "Email": email_text
}

for name, content in samples.items():
    print(f"\n--- {name} Summary ---")
    print(summarizer(content, max_length=60, min_length=25)[0]['summary_text'])


--- News Summary ---
Scientists at MIT have developed a new battery technology that could revolutionize electric vehicles. The lithium-metal battery can store twice as much energy as current batteries. The technology could be commercially available within five years.

--- Recipe Summary ---
To make chocolate chip cookies, you'll need flour, sugar, butter, eggs, vanilla, and chocolate chips. Drop spoonfuls of dough on a baking sheet and bake for 10-12 minutes.

--- Email Summary ---
We exceeded our target by 15% thanks to strong performance in the mobile app division. The marketing campaign we launched in July was particularly successful, generating 200 new leads. Our customer satisfaction scores also improved by 8%.


Exercise 3: Model Comparison

In [None]:
from transformers import pipeline

models = {
    "BART Large": "facebook/bart-large-cnn",
    # bart-base is pretrained only, not fine-tuned for summarization
    "BART Base": "facebook/bart-base",
    "DistilBART": "sshleifer/distilbart-cnn-12-6",
    "T5 Small": "t5-small"
}

text = "Your input text..."

for name, model_name in models.items():
    print(f"\n--- {name} ---")
    if "t5" in model_name:
        text_input = "summarize: " + text
    else:
        text_input = text
    summarizer = pipeline("summarization", model=model_name)
    summary = summarizer(text_input, max_length=60, min_length=25)[0]['summary_text']
    print(summary)


--- BART Large ---


Device set to use cpu
Your max_length is set to 60, but your input_length is only 6. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=3)


CNN.com will feature iReporter photos in a weekly Travel Snapshots gallery. Please submit your best shots of the U.S. for next week. Visit CNN.com/Travel next Wednesday for a new gallery of snapshots.

--- BART Base ---


Device set to use cpu
Your max_length is set to 60, but your input_length is only 6. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=3)
Both `max_new_tokens` (=256) and `max_length`(=60) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Your input text... and your input code... and... and the input text.... and...and... and ... and

--- DistilBART ---


Device set to use cpu
Your max_length is set to 60, but your input_length is only 6. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=3)


 Your input text is required to submit a photo of a person with a caption . Please submit a picture of yourself to the gallery for a new gallery .

--- T5 Small ---


Device set to use cpu
Your max_length is set to 60, but your input_length is only 9. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=4)
Both `max_new_tokens` (=256) and `max_length`(=60) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


your input text . . the input text is a great example of a . input text that you can use to create your input ..
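To compare models on more than the printed text, the loop above could also record how long each model takes and how long its output is. The following is a minimal sketch under assumptions: `compare_summarizers` is a hypothetical helper, and the demo uses stand-in callables so that it runs without downloading any model.

```python
import time


def compare_summarizers(summarizers, text):
    """Run each callable on text and record its output and wall-clock time."""
    results = {}
    for name, fn in summarizers.items():
        start = time.perf_counter()
        summary = fn(text)
        elapsed = time.perf_counter() - start
        results[name] = {"summary": summary, "seconds": elapsed, "chars": len(summary)}
    return results


# Stand-in callables (not real models) so the sketch runs offline
demo = {
    "first sentence": lambda t: t.split(". ")[0],
    "first 20 chars": lambda t: t[:20],
}
for name, row in compare_summarizers(demo, "One idea here. Another idea there.").items():
    print(f"{name}: {row['chars']} chars in {row['seconds']:.4f}s")
```

With real pipelines, each dictionary value might wrap a pipeline call, for example `lambda t, s=summarizer: s(t, max_length=60, min_length=25)[0]['summary_text']`.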


Exercise 4: Custom Input Processing

In [None]:
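from transformers import pipeline

def smart_summarize(text, chunk_size=500, overlap=50):
    """Summarize long text by splitting it into overlapping character chunks."""
    # A step of zero or less would never advance through the text
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

    # Slide a chunk_size window; consecutive chunks share `overlap` characters
    step = chunk_size - overlap
    chunks = [text[start:start + chunk_size] for start in range(0, len(text), step)]

    # Summarize each chunk independently, then join the partial summaries
    summaries = [summarizer(c, max_length=60, min_length=25)[0]['summary_text'] for c in chunks]
    return " ".join(summaries)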
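Because `smart_summarize` slices by character position, a chunk can begin or end in the middle of a word or sentence. One alternative, sketched here as an assumption rather than as part of the exercise, is to group whole sentences into chunks. The helper name `chunk_by_sentences` is hypothetical, and the regular expression is a naive splitter that will mis-split abbreviations such as "Dr.".

```python
import re


def chunk_by_sentences(text, max_chars=500):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars becomes its own oversized chunk.
    """
    # Naive split: break after ., ! or ? when followed by whitespace
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


sample = "First point. Second point! Third point? Fourth point."
print(chunk_by_sentences(sample, max_chars=30))
# ['First point. Second point!', 'Third point? Fourth point.']
```

The resulting list could replace the `chunks` list inside `smart_summarize`; the overlap parameter is no longer needed because no sentence is cut in two.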

Exercise 5: Interactive Summarizer

In [None]:
def interactive_summarizer():
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    print("📝 Text Summarizer (Type 'quit' to exit)")

    while True:
        text = input("\nEnter text to summarize: ")
        # Ignore surrounding whitespace so ' quit ' also exits
        if text.strip().lower() == 'quit':
            break
        if len(text.strip()) < 20:
            print("Text too short to summarize.")
            continue
        try:
            summary = summarizer(text, max_length=60, min_length=25)[0]['summary_text']
            print("\n📝 Summary:")
            print(summary)
        except Exception as e:
            print(f"Error: {e}")
    print("Thanks for using the summarizer!")

Exercise 6: Evaluation Metrics

In [None]:
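def evaluate_summary(original_text, summary):
    """Return simple length-based statistics comparing a summary to its source."""
    if not original_text:
        raise ValueError("original_text must not be empty")
    # Character-level ratio: a lower value means a more compressed summary
    compression_ratio = len(summary) / len(original_text)
    # Rough counts from terminal punctuation; abbreviations such as "Dr." are over-counted
    original_sentences = original_text.count('.') + original_text.count('!') + original_text.count('?')
    summary_sentences = summary.count('.') + summary.count('!') + summary.count('?')

    return {
        'compression_ratio': round(compression_ratio, 2),
        'original_sentences': original_sentences,
        'summary_sentences': summary_sentences
    }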

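# Example usage.
# Note: `summarizer` is whichever pipeline was assigned last. If the cells are
# run in order, that is the final model from the Exercise 3 loop.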
summary = summarizer(news_article, max_length=60, min_length=25)[0]['summary_text']
print(evaluate_summary(news_article, summary))

Both `max_new_tokens` (=256) and `max_length`(=60) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


{'compression_ratio': 0.38, 'original_sentences': 5, 'summary_sentences': 2}
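The statistics above describe length only; they say nothing about whether the summary reuses the source's wording. A rough complement is word-level overlap between the two texts. The sketch below is a simplified measure in the spirit of ROUGE-1, not the official ROUGE implementation, and `unigram_overlap` is a hypothetical helper name.

```python
import re
from collections import Counter


def unigram_overlap(reference, summary):
    """Return precision, recall and F1 over lowercase word counts."""
    ref_counts = Counter(re.findall(r"\w+", reference.lower()))
    sum_counts = Counter(re.findall(r"\w+", summary.lower()))
    # Counter intersection keeps the minimum count of each shared word
    overlap = sum((ref_counts & sum_counts).values())
    precision = overlap / max(sum(sum_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": round(precision, 2), "recall": round(recall, 2), "f1": round(f1, 2)}


print(unigram_overlap("the cat sat on the mat", "the cat sat"))
# {'precision': 1.0, 'recall': 0.5, 'f1': 0.67}
```

Called as `unigram_overlap(news_article, summary)`, a precision well below 1.0 would suggest the summary contains words that do not occur in the source.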


In [None]:
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
classifier = pipeline("sentiment-analysis")
translator = pipeline("translation_en_to_fr", model="Helsinki-NLP/opus-mt-en-fr")

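def multi_task_processor(text, tasks=('summarize', 'sentiment', 'translate')):
    """Run the selected pipelines on text and collect their outputs in one dict."""
    # A tuple default avoids sharing one mutable list across calls
    results = {}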

    if 'summarize' in tasks:
        results['summary'] = summarizer(text, max_length=60, min_length=25)[0]['summary_text']
    if 'sentiment' in tasks:
        results['sentiment'] = classifier(text)[0]
    if 'translate' in tasks:
        results['translation_fr'] = translator(text)[0]['translation_text']

    return results

# Example
text = "Electric vehicles are the future of sustainable transportation."
print(multi_task_processor(text))

Device set to use cpu
No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


Device set to use cpu
Your max_length is set to 60, but your input_length is only 11. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=5)


{'summary': 'Electric vehicles are the future of sustainable transportation, says the U.S. National Transportation Safety Board. The NTSB has set up a panel of experts to advise on how to make electric vehicles more sustainable.', 'sentiment': {'label': 'POSITIVE', 'score': 0.9990712404251099}, 'translation_fr': "Les véhicules électriques sont l'avenir du transport durable."}
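In the output above, the one-sentence input produced a summary that is longer than the input and contains details not present in it. A simple guard is to skip summarization when the text is already short. The sketch below assumes a hypothetical helper `summarize_if_long` and an arbitrary threshold of 50 words; it uses a stand-in callable instead of a real pipeline so that it runs offline.

```python
def summarize_if_long(text, summarize, min_words=50):
    """Call summarize(text) only when text has at least min_words words."""
    if len(text.split()) < min_words:
        return text  # already short: return it unchanged
    return summarize(text)


short = "Electric vehicles are the future of sustainable transportation."


def fake_summarize(t):
    # Stand-in for a real pipeline call
    return "SUMMARY"


print(summarize_if_long(short, fake_summarize))
# Electric vehicles are the future of sustainable transportation.
print(summarize_if_long("word " * 60, fake_summarize))
# SUMMARY
```

Inside `multi_task_processor`, the `summarize` argument might be `lambda t: summarizer(t, max_length=60, min_length=25)[0]['summary_text']`.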
