# **Leveraging LLMs for Text Generation and Summarization**


# **Table of Contents**

1. [Architectural Overview of LLMs](#architectural-overview-of-llms)
2. [Categories of LLMs](#categories-of-llms)
3. [Advanced Text Generation Techniques](#advanced-text-generation-techniques)
4. [Parameter Tuning for Different Needs](#parameter-tuning-for-different-needs)
5. [Temperature tuning experiment](#temperature-tuning-experiment)
6. [Structured output generation](#structured-output-generation)
7. [Extractive Summarization](#extractive-summarization)
8. [Evaluation Metrics for Summarization](#evaluation-metrics-for-summarization)
9. [Abstractive Summarization & Control Parameters](#abstractive-summarization--control-parameters)
10. [Key Models for Abstractive Summarization](#key-models-for-abstractive-summarization)
11. [Controllable Summarization](#controllable-summarization)
12. [Building a Multi-Stage Summarizer](#building-a-multi-stage-summarization-pipeline)
13. [Multimodal Summarization](#multimodal-summarization)
14. [Practical Exercise: Building Your Custom Summarization System](#practical-exercise-building-your-custom-summarization-system)

# PART I

### Learning Objectives for Part 1:
- Understand the architecture of modern LLMs and their categories
- Implement extractive summarization techniques
- Establish evaluation metrics for summarization quality
- Begin building a multi-stage summarization pipeline

# **Architectural Overview of LLMs**

## The Transformer Architecture

The transformer architecture revolutionized NLP when it was introduced in the paper "Attention Is All You Need" (Vaswani et al., 2017).

<!-- Transformer Architecture -->
<div align="center">
<img src='images/transformer architecture.png' alt="Transformer architecture" width=1000 height=700>
</div>


### Key Components:
- **Self-Attention Mechanism**: Allows the model to weigh the importance of different words in context
- **Multi-Head Attention**: Parallel attention mechanisms capturing different relationships
- **Positional Encoding**: Helps the model understand word order
- **Feed-Forward Networks**: Process the representations from attention layers
- **Layer Normalization**: Stabilizes training
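
To make the self-attention component concrete, here is a minimal sketch of scaled dot-product attention in plain Python. It assumes the queries, keys, and values are all the raw token embeddings; real transformers first apply learned projection matrices, which are omitted here. The three 2-dimensional embeddings are hypothetical values chosen only for illustration.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Each output is the attention-weighted average of the value vectors
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical 2-dimensional embeddings for three tokens (illustration only)
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how strongly one token attends to every token in the sequence; the weights in a row always sum to 1.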


### Why Transformers Excel at Text Tasks
1. **Parallel Processing**: Unlike RNNs, transformers process entire sequences simultaneously
2. **Long-Range Dependencies**: The attention mechanism captures relationships between distant tokens
3. **Context Awareness**: Each token can attend to the other tokens in the sequence (decoder-only models restrict this to preceding tokens)
4. **Scalability**: Architecture scales well with data and compute

# Categories of LLMs

LLMs come in different architectural variants, each with strengths for different tasks:

## 1. Decoder-Only Models (Autoregressive)
- Examples: GPT series, LLaMA, Claude
- Trained to predict the next token
- **Strengths for summarization**: Creative text generation, coherent narrative
- **Weaknesses**: May hallucinate or add information not in source

## 2. Encoder-Only Models
- Examples: BERT, RoBERTa
- Trained on masked language modeling
- **Strengths for summarization**: Understanding document context, good for extractive summarization
- **Weaknesses**: Not designed for generation

## 3. Encoder-Decoder Models
- Examples: T5, BART
- Trained on sequence-to-sequence tasks
- **Strengths for summarization**: Balanced understanding and generation, ideal for abstractive summarization
- **Weaknesses**: Larger compute requirements

### Which architecture is best for summarization?
It depends on the task! We'll explore the tradeoffs throughout this tutorial.

# Quick Exercise: Choosing the Right Architecture

For each summarization scenario below, identify which model architecture (decoder-only, encoder-only, or encoder-decoder) would be most appropriate and why:

1. Generating creative article summaries with novel phrasing
2. Extracting key sentences from legal documents
3. Translating and summarizing simultaneously
4. Creating concise bullet points from meeting transcripts

In [None]:
# Run this cell to view the answers
print("1. Decoder-only models excel at creative generation but may add details.")

print("2. Encoder-only models like BERT are great at understanding document context.")

print("3. Encoder-decoder models like T5 are designed for tasks requiring both understanding and generation.")

print("4. This could use either encoder-only for extraction or encoder-decoder for concise reformulation.")

# Advanced Text Generation Techniques

### The Anatomy of Effective Prompts

<div align="center">
<img src="images/co-star.png" width=700 height=500>
</div>



| Element       | Description                              |
| ------------- | ---------------------------------------- |
| **C**ontext   | Provide background information           |
| **O**bjective | State the goal of the task               |
| **S**tyle     | Specify tone, format, or constraints     |
| **T**ask      | What the model should actually do        |
| **A**udience  | Who the output is intended for           |
| **R**esponse  | Clarify what the output should look like |

📌 **Prompt Example:**

*Context*: You are a career advisor writing content for a university’s job preparation website. Many students are unsure how to describe their achievements on resumes, particularly in action-result format.

*Objective*: Help students craft clear and impressive resume bullet points for internships in data science.

*Style*: Keep the tone professional and concise. Use strong action verbs and quantify results wherever possible. Avoid first-person language.

*Task*: Based on the details provided, generate 3 resume bullet points that follow best practices in resume writing.

*Audience*: Undergraduate students applying for internships in data science roles.

*Response*: Your output should be a bulleted list of exactly 3 resume-ready statements.
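
One way to keep prompts consistent is to assemble the six elements programmatically. The helper below is a minimal sketch; the function name `build_costar_prompt` and the labeled-section layout are assumptions made for illustration, not a fixed standard.

```python
def build_costar_prompt(context, objective, style, task, audience, response):
    """Assemble the six elements from the table above into one labeled prompt string."""
    sections = [
        ("Context", context),
        ("Objective", objective),
        ("Style", style),
        ("Task", task),
        ("Audience", audience),
        ("Response", response),
    ]
    # One labeled paragraph per element, separated by blank lines
    return "\n\n".join(f"{label}: {text.strip()}" for label, text in sections)


costar_prompt = build_costar_prompt(
    context="You are a career advisor writing content for a university's job preparation website.",
    objective="Help students craft clear and impressive resume bullet points for data science internships.",
    style="Professional and concise. Use strong action verbs and quantify results.",
    task="Generate 3 resume bullet points from the details provided.",
    audience="Undergraduate students applying for data science internships.",
    response="A bulleted list of exactly 3 resume-ready statements.",
)
print(costar_prompt)
```

The resulting string can be passed to any chat model as the user message.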


# Parameter Tuning for Different Needs

Understanding how generation parameters affect output quality:

- **Temperature (0.0-2.0)**: Controls randomness
  - 0.0-0.3: Near-deterministic, suited to factual content
  - 0.4-0.7: Balanced creativity and coherence  
  - 0.8-1.2: Creative, varied output
  - 1.3+: Highly creative but potentially incoherent

- **Top-p (0.0-1.0)**: Nucleus sampling
  - Samples from the smallest set of most probable tokens whose cumulative probability reaches the threshold p
  - Lower values: More focused, consistent
  - Higher values: More diverse vocabulary

- **Top-k**: Limits vocabulary to k most likely tokens and samples from it
  - Lower values: More predictable
  - Higher values: More creative word choices

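- **Beam search**: Keeps the `num_beams` highest-scoring partial sequences at each step rather than committing to a single token
  - A sequence's score is the cumulative sum of the log probabilities of its tokens

<img src='images/beam-search.jpg' width=900 height=431>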
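
The sampling parameters above can be sketched in a few lines of plain Python. This is an illustrative approximation of what an inference engine does, not any provider's actual implementation; the logits dictionary is a made-up toy distribution, and a temperature of 0 is treated as greedy decoding.

```python
import math
import random


def next_token_distribution(logits, temperature=1.0, top_k=None, top_p=None):
    """Return {token: probability} after temperature, top-k and top-p filtering."""
    if temperature <= 0:
        # Treat temperature 0 as greedy decoding: always pick the most likely token
        best = max(logits, key=logits.get)
        return {best: 1.0}

    # Temperature rescales the logits before the softmax; values below 1 sharpen the distribution
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(((tok, e / total) for tok, e in exps.items()),
                    key=lambda pair: pair[1], reverse=True)

    if top_k is not None:
        ranked = ranked[:top_k]  # keep only the k most likely tokens

    if top_p is not None:
        kept, cumulative = [], 0.0
        for tok, prob in ranked:
            kept.append((tok, prob))
            cumulative += prob
            if cumulative >= top_p:  # smallest set whose mass reaches p
                break
        ranked = kept

    # Renormalize the surviving tokens so their probabilities sum to 1
    total = sum(prob for _, prob in ranked)
    return {tok: prob / total for tok, prob in ranked}


def sample(distribution, rng=random):
    """Draw one token according to its probability."""
    tokens = list(distribution)
    return rng.choices(tokens, weights=[distribution[t] for t in tokens], k=1)[0]


# Toy logits for the next token (hypothetical values)
toy_logits = {"the": 2.0, "a": 1.0, "time": 0.5, "banana": -1.0}
for temp in (0.2, 1.0, 1.5):
    dist = next_token_distribution(toy_logits, temperature=temp)
    print(temp, {tok: round(p, 3) for tok, p in dist.items()})
print(sample(next_token_distribution(toy_logits, top_p=0.9), rng=random.Random(0)))
```

Lower temperatures concentrate probability on the top token, while top-k and top-p remove the unlikely tail before sampling.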

# Temperature tuning experiment

In [3]:
# Set up the environment
import os
import json
from typing import List

from dotenv import load_dotenv
from openai import OpenAI


def is_colab():
    """Return True when running inside Google Colab"""
    try:
        import google.colab
        return True
    except ImportError:
        return False


if is_colab():
    # Copy the Colab secret into the environment so os.getenv() can find it later
    from google.colab import userdata
    os.environ["OPENROUTER_API_KEY"] = userdata.get('OPENROUTER_API_KEY')
else:
    # Read OPENROUTER_API_KEY from a local .env file
    load_dotenv()

In [4]:
class LLMClient:

    def __init__(self, model_name="google/gemini-2.0-flash-lite-001"):
        self.model = model_name
        self.api_key = os.getenv("OPENROUTER_API_KEY")
        self.client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key=self.api_key)

    def generate(self, text, temperature=0.8, max_tokens=2000):
        """Generate summary using autoregressive model"""
        response = self.client.chat.completions.create(
            messages=[
                {"role": "user", "content": text}
            ],
            temperature=temperature,
            max_tokens=max_tokens,
            model=self.model,
            extra_body={},
        )
        
        # Extract and return the response
        result = response.choices[0].message.content
        return result

In [None]:
def compare_parameters(prompt: str, temperatures: List[float], client: LLMClient):

    results = []
    for temperature in temperatures:
        result = client.generate(prompt, temperature=temperature)
        results.append(
            {
                "temperature": temperature,
                "response": result,
                "length": len(result.split())
            }
        )   
    return results


test_prompt = "Write a creative opening sentence for a science fiction story about time travel."
temperatures = [0.1, 0.5, 0.9, 1.2]

results = compare_parameters(test_prompt, temperatures, LLMClient())
for result in results:
    print(f"Temperature: {result['temperature']}")
    print(f"Response: {result['response']}")
    print(f"Word count: {result['length']}")
    print("-" * 40)


# Structured output generation

Getting LLMs to produce consistent, parseable output formats is crucial for integration with downstream systems.

### Prompt based structured generation

In [17]:
schema = {
    "name": "string",
    "age": "integer",
    "profession": "string",
    "language": "string",
    "origin": "string"
}

character_bio = """
"Dr. Amara Liu is a 35-year-old astrophysicist from Shanghai. She researches dark matter and enjoys stargazing and sketching. Fluent in Mandarin and English."
"""

prompt = (
    f"{character_bio}\n\n"
    "Extract the following structured data as JSON:\n"
    f"{json.dumps(schema, indent=2)}\n\n"
    "Respond ONLY with valid JSON that matches the schema above."
)
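model = LLMClient()
# A low temperature keeps the extraction consistent across runs
response = model.generate(prompt, temperature=0.0)

# Try to parse the response as JSON; fall back to the first {...} block,
# since models often wrap JSON in Markdown code fences
import re

try:
    structured_data = json.loads(response)
except json.JSONDecodeError:
    match = re.search(r'\{.*\}', response, re.DOTALL)
    try:
        structured_data = json.loads(match.group()) if match else {"error": "invalid format"}
    except json.JSONDecodeError:
        structured_data = {"error": "invalid format"}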

print(structured_data)


{'name': 'Dr. Amara Liu', 'age': 35, 'profession': 'astrophysicist', 'language': 'Mandarin and English', 'origin': 'Shanghai'}
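
Prompt-based extraction gives no guarantee that the model respects the requested field names or types, so it helps to check the parsed result before passing it downstream. The sketch below is one possible check; `validate_against_schema` and `TYPE_MAP` are hypothetical helpers written for this tutorial's simple name-to-type schema, not part of any library.

```python
# Map the type names used in the prompt schema to Python types
TYPE_MAP = {"string": str, "integer": int}


def validate_against_schema(data, schema):
    """Return a list of problems; an empty list means the data matches the schema."""
    problems = []
    for field, type_name in schema.items():
        if field not in data:
            problems.append(f"missing field: {field}")
            continue
        value = data[field]
        expected = TYPE_MAP[type_name]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"{field}: expected {type_name}, got {type(value).__name__}")
    for field in data:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems


profile_schema = {
    "name": "string",
    "age": "integer",
    "profession": "string",
    "language": "string",
    "origin": "string",
}
good = {"name": "Dr. Amara Liu", "age": 35, "profession": "astrophysicist",
        "language": "Mandarin and English", "origin": "Shanghai"}
bad = {"name": "Dr. Amara Liu", "age": "35", "profession": "astrophysicist"}

print(validate_against_schema(good, profile_schema))
print(validate_against_schema(bad, profile_schema))
```

For production use, a dedicated validation library would typically replace this hand-written check.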


### Schema-Aware Structured Output

In [None]:
# Structured output via function calling
tools = [
    {
        "type": "function",
        "function": {
            "name": "extract_profile",
            "description": "Extracts a person's profile.",
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                    "profession": {"type": "string"},
                    "language": {"type": "string"},
                    "origin": {"type": "string"}
                },
                "required": ["name", "age", "profession", "language", "origin"]
            }
        }
    }
]

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key=os.getenv("OPENROUTER_API_KEY"))

response = client.chat.completions.create(
    model="google/gemini-2.0-flash-lite-001",
    messages=[{
        "role": "user",
        "content": f"What is the name, age, profession, language, and origin of the person in the following text:\n {character_bio}"
    }],
    tools=tools,
    tool_choice="auto",
)

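# With tool_choice="auto" the model may answer in plain text, so check before indexing
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print(tool_calls[0].function.arguments)  # JSON string with the extracted fields
else:
    print("No tool call returned:", response.choices[0].message.content)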


# Text Summarization

<img src="images/text summarization.jpg" width=1000>

# Extractive Summarization

Extractive summarization selects the most important sentences from the original text to create the summary.

## The Basic Process:
1. Score sentences based on importance (e.g., word frequency, TF-IDF, or a trained classifier such as an SVM)
2. Select the top-scoring sentences
3. Arrange them in a coherent order (usually the original order)
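
As an illustration of the scoring step, the sketch below scores sentences with a simple TF-IDF variant in which each sentence is treated as its own document. The regex tokenizer and the exact weighting formula are simplifying assumptions; libraries implement several TF-IDF variants that differ in smoothing and normalization.

```python
import math
import re
from collections import Counter


def tfidf_sentence_scores(sentences):
    """Score each sentence by the summed TF-IDF weight of its words."""
    tokenized = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    n_sentences = len(tokenized)
    # Document frequency: in how many sentences does each word appear?
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))

    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        term_freq = Counter(tokens)
        # tf = relative frequency in the sentence, idf = log(N / df)
        score = sum(
            (count / len(tokens)) * math.log(n_sentences / doc_freq[word])
            for word, count in term_freq.items()
        )
        scores.append(score)
    return scores


toy_sentences = ["The cat sat.", "The dog sat.", "The bird sang loudly."]
for sentence, score in zip(toy_sentences, tfidf_sentence_scores(toy_sentences)):
    print(f"{score:.3f}  {sentence}")
```

Words that occur in every sentence (here "the") get an IDF of zero, so sentences made of distinctive words score higher.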

## Advantages:
- Factually accurate (uses original text)
- Computationally efficient
- Works well for objective content

## Disadvantages:
- May be disconnected or redundant
- Cannot reformulate or simplify complex content
- Limited by quality of source material

In [None]:
# import the required libraries
import nltk
nltk.download('punkt')
nltk.download('punkt_tab')  # tokenizer data required by newer NLTK releases
nltk.download('stopwords')
from collections import Counter
from nltk.corpus import stopwords 
from nltk.tokenize import sent_tokenize, word_tokenize 

def extractive_summary(text, ratio=0.3, max_sentence_words=20):
    """Return the top-scoring sentences of `text`, in their original order"""
    # Tokenize the text into individual sentences
    sentences = sent_tokenize(text)
    stop_words = set(stopwords.words('english'))

    def content_words(s):
        """Lowercased alphanumeric tokens with stopwords removed"""
        return [w.lower() for w in word_tokenize(s) if w.isalnum() and w.lower() not in stop_words]

    # Compute the frequency of each content word across the whole text
    word_freq = Counter(content_words(text))

    # Score each sentence by the summed frequency of its content words
    sentence_scores = {}
    for index, sentence in enumerate(sentences):
        sentence_words = content_words(sentence)
        # Skip long sentences, which would otherwise win on length alone
        if len(sentence_words) < max_sentence_words:
            sentence_scores[index] = sum(word_freq[word] for word in sentence_words)

    # Compute the number of sentences to include in the summary
    num_sentences = max(1, int(len(sentences) * ratio))

    # Pick the top-scoring sentences, then restore their original order
    top_indices = sorted(sentence_scores, key=sentence_scores.get, reverse=True)[:num_sentences]
    summary = ' '.join(sentences[i] for i in sorted(top_indices))

    return summary

In [None]:
# Example usage

# Load a Ghanaian news article
with open('articles/article.txt', 'r', encoding='utf-8') as f:
    ghana_article = f.read()
    
# Print article length
print(f"Article contains {len(sent_tokenize(ghana_article))} sentences and {len(ghana_article.split())} words")
generated_summary = extractive_summary(ghana_article, ratio=0.3)
print(f"\nSUMMARY\n{generated_summary}")

# Evaluation Metrics for Summarization

How do we know if our summaries are good? Let's implement some common evaluation metrics:

## ROUGE (Recall-Oriented Understudy for Gisting Evaluation)
- Measures overlap between machine-generated summary and reference summary
- ROUGE-N: N-gram recall
- ROUGE-L: Longest Common Subsequence 

<img src='images/Rouge L.jpeg' width=700>

The LCS is the longest sequence of tokens that appears in both the reference and the generated summary in the same order, though not necessarily contiguously.
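
A minimal sketch of ROUGE-L using the standard dynamic-programming solution for the LCS follows. It assumes whitespace tokenization and reports a plain F1 score; the official ROUGE toolkit uses its own tokenization and a weighted F-measure, so scores may differ.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # table[i][j] holds the LCS length of a[:i] and b[:j]
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, tok_a in enumerate(a, start=1):
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]


def rouge_l(reference, candidate):
    """Return ROUGE-L precision, recall and F1 for two strings."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    lcs = lcs_length(ref_tokens, cand_tokens)
    precision = lcs / len(cand_tokens) if cand_tokens else 0.0
    recall = lcs / len(ref_tokens) if ref_tokens else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


rouge_l_scores = rouge_l("the cat sat on the mat", "the cat lay on the mat")
print(rouge_l_scores)
```

Here the LCS is "the cat on the mat" (5 tokens), so precision and recall are both 5/6.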

## BLEU (Bilingual Evaluation Understudy)
- Originally designed for translation, but used for summarization
- Precision-focused (how many generated n-grams appear in reference)
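
The core of BLEU is a clipped ("modified") n-gram precision, sketched below. This covers only a single n-gram order against a single reference; full BLEU also combines several n-gram orders with a geometric mean and applies a brevity penalty, both omitted here.

```python
from collections import Counter


def ngram_list(tokens, n):
    """All n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(reference_tokens, candidate_tokens, n=1):
    """Fraction of candidate n-grams found in the reference, with clipped counts."""
    cand_counts = Counter(ngram_list(candidate_tokens, n))
    ref_counts = Counter(ngram_list(reference_tokens, n))
    # Clip each candidate count at the number of times the n-gram occurs in the reference
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0


reference_tokens = "the cat is on the mat".split()
repetitive_candidate = ["the"] * 7
print(modified_precision(reference_tokens, repetitive_candidate, n=1))
```

Clipping is what stops a degenerate candidate such as "the the the the the the the" from scoring perfectly: only two of its seven unigrams are credited, because "the" occurs twice in the reference.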

## BERTScore
- Uses contextual embeddings to compute similarity
- Better semantic understanding than n-gram methods
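
The matching step behind BERTScore can be sketched without a model: each token is greedily matched to its most similar counterpart by cosine similarity. The 2-dimensional vectors below are hypothetical stand-ins for contextual embeddings; a real computation would take them from a pre-trained model, and optional refinements such as IDF weighting are left out.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_match_score(ref_vectors, cand_vectors):
    """Precision, recall and F1 from greedy cosine matching of token vectors."""
    # Recall: each reference token is matched to its most similar candidate token
    recall = sum(max(cosine(r, c) for c in cand_vectors) for r in ref_vectors) / len(ref_vectors)
    # Precision: each candidate token is matched to its most similar reference token
    precision = sum(max(cosine(c, r) for r in ref_vectors) for c in cand_vectors) / len(cand_vectors)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical token embeddings (illustration only)
ref_vectors = [[1.0, 0.0], [0.0, 1.0]]
cand_vectors = [[1.0, 0.1], [0.1, 1.0]]
print(greedy_match_score(ref_vectors, cand_vectors))
```

Because similarity is computed in embedding space, a paraphrase can score highly even when it shares no exact n-grams with the reference.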

## Human Evaluation Dimensions
- **Relevance**: How well does the summary capture the main points?
- **Coherence**: Does it flow logically?
- **Fluency**: Is it grammatically correct?
- **Factuality**: Does it contain errors or hallucinations?
- **Accuracy**: Does the summary accurately represent the original content?
- **Readability**: Is the summary well-written and easy to understand?

In [None]:
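# Implement a custom ROUGE score
from collections import Counter


# Reference summary for comparison
with open("articles/reference summary.txt", 'r', encoding='utf-8') as f:
    reference_summary = f.read()


def get_ngrams(tokens, n):
    """Generate the n-grams (as tuples) of a token list"""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def calculate_rouge_n(reference, candidate, n=1):
    """Return the ROUGE-N F1 score of a candidate summary against a reference"""
    # Tokenize into words
    ref_tokens = word_tokenize(reference.lower())
    cand_tokens = word_tokenize(candidate.lower())

    # Count n-grams; the overlap is clipped at the smaller count on either side
    ref_counts = Counter(get_ngrams(ref_tokens, n))
    cand_counts = Counter(get_ngrams(cand_tokens, n))
    overlap = sum((ref_counts & cand_counts).values())

    # Calculate precision (share of candidate n-grams found in the reference)
    precision = overlap / max(1, sum(cand_counts.values()))

    # Calculate recall (share of reference n-grams found in the candidate)
    recall = overlap / max(1, sum(ref_counts.values()))

    # Calculate F1 score
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)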




In [12]:
# Let's create a simple function to evaluate our summaries for ROUGE-1 and ROUGE-2
def evaluate_summary(reference, candidate):
    """Evaluate a summary using multiple metrics"""
    scores = {
        'ROUGE-1': calculate_rouge_n(reference, candidate, 1),
        'ROUGE-2': calculate_rouge_n(reference, candidate, 2),
    }

    # Add readability metric: average words per sentence
    cand_sentences = sent_tokenize(candidate)
    avg_sentence_length = len(word_tokenize(candidate)) / max(1, len(cand_sentences))
    scores['Avg Words/Sentence'] = avg_sentence_length
    
    return scores

In [None]:
# Let's try evaluating our summary against a reference summary

# Evaluate our extractive summary against the reference
evaluation_scores = evaluate_summary(reference_summary, generated_summary)

print("Evaluation Results:")
for metric, score in evaluation_scores.items():
    print(f"{metric}: {score:.4f}")

In [None]:
# Let's try different summary ratios and compare
ratios = [0.2, 0.3, 0.4, 0.5]
ratio_results = []

for ratio in ratios:
    test_summary = extractive_summary(ghana_article, ratio=ratio)
    scores = evaluate_summary(reference_summary, test_summary)
    scores['ratio'] = ratio
    scores['length'] = len(test_summary.split())
    ratio_results.append(scores)

# Display comparison
import pandas as pd

results_df = pd.DataFrame(ratio_results)
print("\nComparison of Different Summary Ratios:")
print(results_df[['ratio', 'length', 'ROUGE-1', 'ROUGE-2']])



# PART II

# Abstractive Summarization & Control Parameters


## What is Abstractive Summarization?

Abstractive summarization involves:
- Understanding the source content deeply
- Identifying key concepts and relationships
- Generating new text by paraphrasing existing text
- Condensing information in ways that extraction cannot

## Why Use Encoder-Decoder Models?

While autoregressive (decoder-only) models like GPT can perform abstractive summarization, encoder-decoder models like T5 and BART offer specific advantages:

1. **Bidirectional encoding**: The encoder comprehends the entire document before generation begins
2. **Source-target separation**: Clear distinction between understanding and generation
3. **Cross-attention mechanism**: Decoder can directly reference source material during generation
4. **Training objectives**: Pre-trained specifically for tasks that include summarization
5. **Control mechanisms**: Easier to implement length constraints and other controls

# Key Models for Abstractive Summarization

## BART (Bidirectional and Auto-Regressive Transformers)
- Combines bidirectional encoder (like BERT) with autoregressive decoder
- Pre-trained on denoising tasks, including text infilling and sentence shuffling
- Particularly effective for summarization tasks

## T5 (Text-to-Text Transfer Transformer)
- Treats all NLP tasks as "text-to-text" problems
- Consistent performance across summarization, translation, classification, etc.
- Uses a "prefix" to specify the task (e.g., "summarize:")

## Pegasus
- Specifically pre-trained for abstractive summarization
- Uses "gap sentences" pre-training, masking important sentences during training
- Optimized for news summarization tasks

## Comparing with Autoregressive Models (e.g., GPT)

| Aspect | Encoder-Decoder | Autoregressive |
|--------|-----------------|----------------|
| Source understanding | Bidirectional | Primarily left-to-right |
| Memory of source | Direct attention | Must retain in context |
| Training objective | Often summarization-specific | General next-token prediction |
| Length control | Easier to implement | Requires special techniques |
| Hallucination risk | Lower (with cross-attention) | Higher |
| Flexibility | Task-specific | More general-purpose |

In practice, the lines are blurring as models evolve. Modern autoregressive models can achieve excellent summarization through few-shot prompting and other techniques.

In [None]:
import os
import torch
import pandas as pd
import nltk
from nltk.tokenize import sent_tokenize
from openai import OpenAI
from dotenv import load_dotenv
from transformers import T5Tokenizer, T5ForConditionalGeneration
from transformers import BartTokenizer, BartForConditionalGeneration

load_dotenv()
nltk.download('punkt')

class T5Summarizer:
    """T5-based abstractive summarizer"""
    
    def __init__(self, model_name="t5-small", device='cuda' if torch.cuda.is_available() else 'cpu'):
        self.model_name = model_name
        self.device = device
        self.tokenizer = None
        self.model = None
        self.prefix = "summarize: "
        self.checkpoint_name = f"{model_name.replace('/', '_')}_t5"
        
    def load_model(self):
        """Load the model and tokenizer, and move the model to the target device"""
        print(f"Loading {self.model_name}...")
            
        # Load tokenizer and model
        self.tokenizer = T5Tokenizer.from_pretrained(self.model_name)
        self.model = T5ForConditionalGeneration.from_pretrained(self.model_name).to(self.device)
        
        
    def summarize(self, text, max_length=150, min_length=50, length_penalty=2.0, 
                  num_beams=4, early_stopping=True):
        """Generate summary with T5, adding the task prefix"""
        if not self.model or not self.tokenizer:
            self.load_model()
            
        # Add prefix for T5
        prefixed_text = self.prefix + text
        
        # Tokenize input text
        inputs = self.tokenizer(prefixed_text, return_tensors="pt", max_length=1024, truncation=True).to(self.device)
        
        # Generate summary
        summary_ids = self.model.generate(
            inputs.input_ids,
            max_length=max_length,
            min_length=min_length,
            length_penalty=length_penalty,
            num_beams=num_beams,
            early_stopping=early_stopping
        )
        
        # Decode and return summary
        summary = self.tokenizer.decode(summary_ids[0], skip_special_tokens=True)
        
        return summary


class BartSummarizer:
    """BART-based abstractive summarizer"""
    
    def __init__(self, model_name="facebook/bart-base", device='cuda' if torch.cuda.is_available() else 'cpu'):
        self.model_name = model_name
        self.device = device
        self.tokenizer = None
        self.model = None
        self.checkpoint_name = f"{model_name.replace('/', '_')}_bart"
        
    def load_model(self):
        """Load the model and tokenizer, and move the model to the target device"""
        print(f"Loading {self.model_name}...")
            
        # Load tokenizer and model
        self.tokenizer = BartTokenizer.from_pretrained(self.model_name)
        self.model = BartForConditionalGeneration.from_pretrained(self.model_name).to(self.device)
        
    def summarize(self, text, max_length=150, min_length=50, length_penalty=2.0, 
                  num_beams=4, early_stopping=True):
        """Generate an abstractive summary with BART"""
        if not self.model or not self.tokenizer:
            self.load_model()
            
        # Tokenize input text
        inputs = self.tokenizer(text, return_tensors="pt", max_length=1024, truncation=True).to(self.device)
        
        # Generate summary
        summary_ids = self.model.generate(
            inputs.input_ids,
            max_length=max_length,
            min_length=min_length,
            length_penalty=length_penalty,
            num_beams=num_beams,
            early_stopping=early_stopping
        )
        
        # Decode and return summary
        summary = self.tokenizer.decode(summary_ids[0], skip_special_tokens=True)
        
        return summary
    
class AutoregressiveSumarizer:

    def __init__(self, model_name="google/gemini-2.0-flash-lite-001"):
        self.model = model_name
        self.api_key = os.getenv("OPENROUTER_API_KEY")
        self.client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key=self.api_key)

    def summarize(self, text, system_prompt, temperature=0.8, max_tokens=500):
        """Generate summary using autoregressive model"""
        response = self.client.chat.completions.create(
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": text}
            ],
            temperature=temperature,
            max_tokens=max_tokens,
            model=self.model,
            extra_body={},
        )
        
        # Extract and return the summary
        summary = response.choices[0].message.content
        return summary


In [None]:
# Let's try out our abstractive summarizers

# Let's create our summarizers
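t5_summarizer = T5Summarizer("t5-small")
# Note: facebook/bart-base is not fine-tuned for summarization; a checkpoint such as
# facebook/bart-large-cnn typically produces better summaries at a higher compute cost
bart_summarizer = BartSummarizer("facebook/bart-base")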
autoregressive_summarizer = AutoregressiveSumarizer()

# Generate summaries
t5_summary = t5_summarizer.summarize(
    ghana_article,
    max_length=100,  # Target length in tokens
    min_length=30,
    num_beams=4      # Beam search for better quality
)

bart_summary = bart_summarizer.summarize(
    ghana_article,
    max_length=100,
    min_length=30,
    num_beams=4
)

autoregressive_summary = autoregressive_summarizer.summarize(
    ghana_article,
    "Summarize the following text in a concise manner.",
    max_tokens=100
)

# Let's also get our extractive summary for comparison
# (stored under a new name so the extractive_summary function is not overwritten)
extractive_text = extractive_summary(ghana_article, ratio=0.3)

# Display all summaries
print("Original Text Length:", len(ghana_article.split()))
print("\n--- T5 Summary ---")
print(t5_summary)
print("\nLength:", len(t5_summary.split()))

print("\n--- BART Summary ---")
print(bart_summary)
print("\nLength:", len(bart_summary.split()))

print("\n--- AutoRegressive Summary ---")
print(autoregressive_summary)
print("\nLength:", len(autoregressive_summary.split()))

print("\n--- Extractive Summary ---")
print(extractive_text)
print("\nLength:", len(extractive_text.split()))

In [None]:
# use the same news file so we use the same reference summary
# Evaluate all summaries
t5_scores = evaluate_summary(reference_summary, t5_summary)
bart_scores = evaluate_summary(reference_summary, bart_summary)
autoregressive_scores = evaluate_summary(reference_summary, autoregressive_summary)
extractive_scores = evaluate_summary(reference_summary, extractive_text)

# Compare scores
summary_comparison = pd.DataFrame({
    'T5': t5_scores,
    'BART': bart_scores,
    'AutoRegressive': autoregressive_scores,
    'Extractive': extractive_scores
})

print("Objective Metrics Comparison:")
print(summary_comparison)

# Controllable Summarization

One of the major advantages of modern summarization systems is the ability to control various aspects of the generated summaries:

## Common Control Parameters:

1. **Length**: Controlling how long or short the summary should be
2. **Style**: Formal vs. casual, simple vs. technical
3. **Focus**: Emphasizing particular topics or aspects
4. **Structure**: Bullet points, narrative, or question-answering

## How to Implement Controls:

1. **Model-specific parameters**: Using built-in generation controls
2. **Prompt engineering**: Adding instructional prefixes
3. **Output filtering**: Post-processing generated summaries
4. **Fine-tuning**: Training the model with examples of desired style
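
Of these approaches, output filtering is the easiest to try without touching the model. The sketch below trims a generated summary to a word budget at sentence boundaries and reformats it as bullet points. The helper names and the regex-based sentence splitter are illustrative assumptions; `sent_tokenize` from NLTK would handle abbreviations more robustly.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]


def trim_to_word_budget(summary, max_words):
    """Keep whole sentences, in order, until the word budget would be exceeded."""
    kept, used = [], 0
    for sentence in split_sentences(summary):
        n_words = len(sentence.split())
        # Always keep the first sentence, even if it alone exceeds the budget
        if kept and used + n_words > max_words:
            break
        kept.append(sentence)
        used += n_words
    return ' '.join(kept)


def to_bullets(summary):
    """Reformat a summary as one bullet point per sentence."""
    return '\n'.join(f"- {sentence}" for sentence in split_sentences(summary))


draft = "First point here. Second point follows. Third point ends."
print(trim_to_word_budget(draft, max_words=6))
print(to_bullets(draft))
```

Post-processing of this kind guarantees the constraint is met, but it can only remove or rearrange content; it cannot improve what the model generated.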

## Length Control

In [None]:
# Implementing length control for our summarizers

def generate_controlled_summaries(text, model, lengths=(50, 100, 200)):
    """Generate summaries of different controlled lengths"""
    summaries = {}
    
    for length in lengths:
        if isinstance(model, T5Summarizer):
            summary = model.summarize(
                text,
                max_length=length,
                min_length=max(10, int(length * 0.7)),  # At least 70% of max length
                num_beams=4
            )
            summaries[f"T5 (length={length})"] = summary
            
        elif isinstance(model, BartSummarizer):
            summary = model.summarize(
                text,
                max_length=length,
                min_length=max(10, int(length * 0.7)),
                num_beams=4
            )
            summaries[f"BART (length={length})"] = summary
    
    return summaries

# Let's generate summaries of different lengths
length_controlled_t5 = generate_controlled_summaries(ghana_article, t5_summarizer, [50, 100, 150])
length_controlled_bart = generate_controlled_summaries(ghana_article, bart_summarizer, [50, 100, 150])

# Display and analyze length-controlled summaries
print("Length-Controlled Summaries:\n")

for name, summary in {**length_controlled_t5, **length_controlled_bart}.items():
    print(f"\n--- {name} ---")
    print(summary)
    print(f"Actual length: {len(summary.split())} words")

# Plotting the relationship between target length and actual length
import matplotlib.pyplot as plt

# Extract target lengths and actual lengths
target_lengths = []
actual_lengths_t5 = []
actual_lengths_bart = []

for length in [50, 100, 150]:
    target_lengths.append(length)
    actual_lengths_t5.append(len(length_controlled_t5[f"T5 (length={length})"].split()))
    actual_lengths_bart.append(len(length_controlled_bart[f"BART (length={length})"].split()))

plt.figure(figsize=(10, 6))
plt.plot(target_lengths, actual_lengths_t5, 'o-', label='T5')
plt.plot(target_lengths, actual_lengths_bart, 'o-', label='BART')
plt.plot(target_lengths, target_lengths, '--', label='Target=Actual', color='gray')
plt.xlabel('Target Length (tokens)')
plt.ylabel('Actual Length (words)')
plt.title('Length Control Effectiveness')
plt.legend()
plt.grid(True, linestyle='--', alpha=0.7)
plt.savefig('length_control.png')
plt.close()

from IPython.display import Image
Image('length_control.png')

## Focus Control via Prompts

In [None]:
# Implementing focus control through prompt engineering

def focus_controlled_summary(text, model, focus_area):
    """Generate a summary focused on a specific aspect"""
    # Neither t5-small nor bart-base is instruction-tuned, so the effect of these prompts may be small

    # Create a focused prompt
    if isinstance(model, T5Summarizer):
        # Note: T5Summarizer.summarize() also prepends its standard "summarize: " prefix
        focused_prompt = f"summarize focusing on {focus_area}: {text}"
    elif isinstance(model, BartSummarizer):
        # BART has no task prefix, so prepend the instruction to the text
        focused_prompt = f"Focus on {focus_area} in your summary. {text}"
    else:
        raise TypeError(f"Unsupported model type: {type(model).__name__}")

    summary = model.summarize(
        focused_prompt,
        max_length=100,
        min_length=30,
        num_beams=4
    )

    return summary

# Generate summaries focused on different aspects
focus_areas = ["cost of living comparison", "Brain Drain vs. Nation Building", "unity and collective action"]
focused_summaries = {}

for focus in focus_areas:
    focused_summaries[f"T5 (focus: {focus})"] = focus_controlled_summary(ghana_article, t5_summarizer, focus)
    focused_summaries[f"BART (focus: {focus})"] = focus_controlled_summary(ghana_article, bart_summarizer, focus)

# Display focused summaries
print("Focus-Controlled Summaries:\n")

for name, summary in focused_summaries.items():
    print(f"\n--- {name} ---")
    print(summary)

# Let's analyze how well the focus control worked
import re

def count_focus_related_words(text, focus):
    """Count how many of the focus area's keywords appear in the text"""
    # Create a simple keyword list for each focus area
    focus_keywords = {
        "cost of living comparison": ["affordable housing", "lower living expenses", "imported goods", "entrepreneurship opportunities"],
        "Brain Drain vs. Nation Building": ["youth emigration", "greener pastures", "high unemployment", "limited opportunities", "collective responsibility", "development"],
        "unity and collective action": ["unity", "peace", "collaboration", "technology leverage", "shared vision", "mutual respect", "national development"]
    }

    # Count the keywords that occur at least once (case-insensitive, whole-word match)
    keywords = focus_keywords.get(focus, [])
    count = sum(1 for keyword in keywords if re.search(r'\b' + re.escape(keyword) + r'\b', text.lower()))

    return count, len(keywords)

# Analyze focus effectiveness
focus_effectiveness = {}

for focus in focus_areas:
    t5_count, total = count_focus_related_words(
        focused_summaries[f"T5 (focus: {focus})"], 
        focus
    )
    bart_count, _ = count_focus_related_words(
        focused_summaries[f"BART (focus: {focus})"], 
        focus
    )
    
    # Calculate percentage of focus keywords present
    focus_effectiveness[focus] = {
        "T5": f"{t5_count}/{total} keywords ({t5_count/total*100:.1f}%)",
        "BART": f"{bart_count}/{total} keywords ({bart_count/total*100:.1f}%)"
    }

# Display focus effectiveness
print("\nFocus Control Effectiveness:\n")
for focus, models in focus_effectiveness.items():
    print(f"Focus area: {focus}")
    print(f"  T5: {models['T5']}")
    print(f"  BART: {models['BART']}")

## Style Control

In [None]:
# Implementing style and structure control

def style_controlled_summary(text, model, style):
    """Generate a summary with a specific style"""
    
    style_prompts = {
        "formal": "Generate a formal and technical summary of the following text:",
        "simple": "Generate a simple summary using basic vocabulary and short sentences:",
        "bullet_points": "Generate a summary in bullet point format highlighting key points:",
        "question_answering": "Generate a summary in question and answer format about:",
        "news_headline": "Write a news headline style summary of:"
    }
    
    prompt = f"{style_prompts[style]} {text}"
    
    # T5Summarizer and BartSummarizer are called the same way here, so no
    # model-specific branch is needed (and `summary` is always defined).
    summary = model.summarize(
        prompt,
        max_length=120,
        min_length=30,
        num_beams=4
    )
    
    return summary

# Generate summaries with different styles
styles = ["formal", "simple", "bullet_points", "question_answering", "news_headline"]
styled_summaries = {}

for style in styles:
    # Let's just use T5 for this demonstration
    styled_summaries[f"T5 (style: {style})"] = style_controlled_summary(ghana_article, t5_summarizer, style)

# Display styled summaries
print("Style-Controlled Summaries:\n")

for name, summary in styled_summaries.items():
    print(f"\n--- {name} ---")
    print(summary)

# Simple readability assessment
def assess_readability(text):
    """Calculate a simple readability score (average words per sentence)"""
    sentences = sent_tokenize(text)
    if not sentences:
        return 0
    
    words = text.split()
    avg_words_per_sentence = len(words) / len(sentences)
    
    return avg_words_per_sentence

# Analyze style effectiveness
style_assessment = {}

for style, summary_key in zip(styles, styled_summaries.keys()):
    summary = styled_summaries[summary_key]
    
    # Check if bullet points are present
    has_bullets = "•" in summary or "-" in summary.split() or any(line.strip().startswith("-") for line in summary.split("\n"))
    
    # Check if questions are present
    has_questions = "?" in summary
    
    # Assess readability
    readability = assess_readability(summary)
    
    style_assessment[style] = {
        "Readability (words/sentence)": f"{readability:.1f}",
        "Has bullet points": "Yes" if has_bullets else "No",
        "Has questions": "Yes" if has_questions else "No"
    }

# Display style assessment
print("\nStyle Control Assessment:\n")
style_df = pd.DataFrame(style_assessment).T
print(style_df)

# Building a Multi-Stage Summarization Pipeline

Now let's combine the best of extractive and abstractive approaches to create a more effective summarization pipeline:

## Benefits of a Multi-Stage Approach:

1. **Handling longer documents**: Extractive methods can select relevant content from long texts
2. **Improving factual accuracy**: An extractive first step passes source sentences verbatim to the abstractive model, which helps preserve key facts
3. **Computational efficiency**: Processing only relevant portions with resource-intensive models
4. **Enhanced control**: Apply different strategies at different stages

## Our Pipeline Design:

1. **Stage 1**: Extractive selection of most relevant sentences
2. **Stage 2**: Abstractive summarization of the extracted content
3. **Stage 3**: Post-processing for quality control
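
Stage 1 only needs a callable that takes the text and a `ratio` and returns the selected sentences as one string. The sketch below is one minimal way to do that, assuming a naive regex sentence splitter and a word-frequency score; the name `frequency_extractive_summary` is hypothetical and is not the extractive summarizer built earlier in this tutorial.

```python
import re
from collections import Counter

def frequency_extractive_summary(text, ratio=0.5):
    """Keep the top `ratio` fraction of sentences, ranked by average word frequency."""
    # Naive sentence split: break after ., ! or ? followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return ""

    # Document-level word frequencies (lowercased alphabetic tokens)
    freqs = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freqs[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Keep at least one sentence, then restore the original sentence order
    keep = max(1, round(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    selected = sorted(ranked[:keep])
    return " ".join(sentences[i] for i in selected)
```

Because it accepts `(text, ratio=...)`, a function like this could be passed as the `extractive_summarizer` argument of the pipeline class defined below.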

In [69]:
# Implementing a multi-stage summarization pipeline

class MultiStageSummarizer:
    """Multi-stage pipeline combining extractive and abstractive summarization"""
    
    def __init__(self, 
                 extractive_summarizer,
                 abstractive_summarizer,
                 extractive_ratio=0.5):
        """Initialize the pipeline with component summarizers"""
        self.extractive_summarizer = extractive_summarizer
        self.abstractive_summarizer = abstractive_summarizer
        self.extractive_ratio = extractive_ratio
        
    def summarize(self, text, max_length=100, min_length=30):
        """Generate a summary using the multi-stage pipeline"""
        # Stage 1: Extractive summarization
        print("Stage 1: Extractive summarization...")
        extractive_summary = self.extractive_summarizer(
            text, 
            ratio=self.extractive_ratio
        )
        
        # Check the intermediate result
        print(f"  Extracted ({len(extractive_summary.split())} words)")
        
        # Stage 2: Abstractive summarization
        print("Stage 2: Abstractive summarization...")
        # T5Summarizer and BartSummarizer are called with the same arguments,
        # so a single call covers both.
        abstractive_summary = self.abstractive_summarizer.summarize(
            extractive_summary,
            max_length=max_length,
            min_length=min_length,
            num_beams=4
        )
            
        # Stage 3: Post-processing
        print("Stage 3: Post-processing...")
        final_summary = self.post_process(abstractive_summary)
        
        return {
            "extractive_summary": extractive_summary,
            "abstractive_summary": abstractive_summary,
            "final_summary": final_summary
        }
    
    def post_process(self, summary):
        """Apply post-processing to improve summary quality"""
        # Remove repeated phrases or sentences (a common issue)
        sentences = sent_tokenize(summary)
        unique_sentences = []
        
        for s in sentences:
            # Skip nearly identical sentences (simple approach)
            if not any(self.sentence_similarity(s, us) > 0.7 for us in unique_sentences):
                unique_sentences.append(s)
        
        # Rejoin the unique sentences
        processed_summary = ' '.join(unique_sentences)
        
        return processed_summary
    
    def sentence_similarity(self, s1, s2):
        """Calculate simple word overlap similarity between sentences"""
        words1 = set(s1.lower().split())
        words2 = set(s2.lower().split())
        
        if not words1 or not words2:
            return 0
            
        overlap = words1.intersection(words2)
        return len(overlap) / max(len(words1), len(words2))

In [None]:
multistage_summarizer = MultiStageSummarizer(
    # Must be a callable accepting (text, ratio=...) and returning a string
    extractive_summarizer=extractive_summary,
    abstractive_summarizer=t5_summarizer,
    extractive_ratio=0.3
)
result = multistage_summarizer.summarize(ghana_article)
print(result)
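
To see what the 0.7 overlap threshold in the post-processing stage does, the following standalone sketch applies the same word-overlap idea to three toy sentences. The function names `word_overlap` and `drop_near_duplicates` are hypothetical stand-ins for the class methods above, and the sentences are invented for illustration.

```python
def word_overlap(s1, s2):
    """Shared-word ratio, normalised by the longer sentence."""
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    if not w1 or not w2:
        return 0.0
    return len(w1 & w2) / max(len(w1), len(w2))

def drop_near_duplicates(sentences, threshold=0.7):
    """Keep a sentence only if it is not too similar to one already kept."""
    kept = []
    for s in sentences:
        if all(word_overlap(s, k) <= threshold for k in kept):
            kept.append(s)
    return kept

sentences = [
    "Ghana needs its young people to stay.",
    "Ghana needs its young people to stay home.",
    "Living costs are lower than abroad.",
]
# The second sentence shares 6 of 8 distinct tokens with the first (0.75 > 0.7),
# so it is dropped; the third shares none and is kept.
print(drop_near_duplicates(sentences))
```

Note that splitting on whitespace leaves punctuation attached to tokens (`stay.` and `stay` count as different words), so stripping punctuation first would make the check stricter.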

# Part III

# Multimodal Summarization


Multimodal summarization involves generating concise text that captures information from:
- Text documents
- Images
- Tables and charts
- Audio recordings
- Video content

## Approaches to Multimodal Summarization:

1. **Pipeline Approach**: Process each modality separately, then combine
2. **Unified Models**: Use multimodal models (like GPT-4 with vision, or CLIP for joint image-text embeddings) that understand multiple modalities
3. **Extraction + Description**: Extract elements from non-text modalities and describe them in text
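
The "Extraction + Description" approach can be sketched without any model at all: turn each non-text element into a sentence, then hand the combined text to an ordinary text summarizer. The helpers below (`describe_table`, `build_multimodal_input`) are hypothetical names, and the table values and caption are made-up placeholders.

```python
def describe_table(rows, caption="Table"):
    """Turn a list of row dicts into plain sentences a text summarizer can read."""
    if not rows:
        return f"{caption}: (empty)"
    columns = ", ".join(rows[0].keys())
    lines = [f"{caption} has columns {columns} and {len(rows)} row(s)."]
    for row in rows:
        cells = "; ".join(f"{col} is {val}" for col, val in row.items())
        lines.append(f"Row: {cells}.")
    return " ".join(lines)

def build_multimodal_input(text, table_descriptions=(), image_captions=()):
    """Merge text, table descriptions and image captions into one labelled string."""
    parts = [f"TEXT: {text}"]
    parts += [f"TABLE: {d}" for d in table_descriptions]
    parts += [f"IMAGE: {c}" for c in image_captions]
    return "\n".join(parts)

# Placeholder data, for illustration only
rent_table = [{"city": "A", "rent": 100}, {"city": "B", "rent": 250}]
combined = build_multimodal_input(
    "Housing costs differ by city.",
    table_descriptions=[describe_table(rent_table, caption="Rent table")],
    image_captions=["A bar chart comparing rent in two cities."],
)
print(combined)
```

The resulting string could then be passed to any of the text summarizers used earlier; the labels make it easier to check which modality a sentence in the summary came from.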

## Challenges:

- Aligning information across modalities
- Handling inconsistencies between modalities 
- Determining relative importance of different modalities
- Technical complexity of processing multiple formats

In [None]:
# Implementing a multimodal summarizer for text + image data 

import requests
import base64
import os
import json
from PIL import Image
from io import BytesIO
import matplotlib.pyplot as plt

class MultimodalSummarizer:
    """Multimodal summarizer using Llama-4 Vision capabilities"""
    
    def __init__(self, api_key=None):
        """Initialize the multimodal summarizer with an OpenRouter API key"""
        # Get API key from environment variable if not provided
        self.api_key = api_key or os.environ.get("OPENROUTER_API_KEY", "")
        if not self.api_key:
            print("Warning: No OpenRouter API key provided. Please set OPENROUTER_API_KEY.")
        
        # OpenRouter's OpenAI-compatible chat completions endpoint
        self.api_url = "https://openrouter.ai/api/v1/chat/completions"
        self.model = "meta-llama/llama-4-maverick:free"
        
    def encode_image(self, image_path):
        """Encode an image to base64 for API submission"""
        # Check if it's a URL or local path
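        if image_path.startswith(('http://', 'https://')):
            response = requests.get(image_path, timeout=30)
            response.raise_for_status()
            # JPEG has no alpha channel, so convert (e.g. RGBA PNGs) to RGB before saving
            image = Image.open(BytesIO(response.content)).convert("RGB")
            buffered = BytesIO()
            image.save(buffered, format="JPEG")
            return base64.b64encode(buffered.getvalue()).decode('utf-8')
        else:
            # Local files are sent as-is; note the payload labels every image as image/jpeg
            with open(image_path, "rb") as image_file:
                return base64.b64encode(image_file.read()).decode('utf-8')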
    
    def create_gpt4_payload(self, text, image_paths, max_tokens=500):
        """Create the API payload with text and images"""
        messages = [
            {
                "role": "system",
                "content": "You are a helpful assistant that creates concise summaries from text and images."
            },
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": f"Please create a comprehensive summary that combines information from the following text and images. Focus on integrating visual information with the text content.\n\nTEXT: {text}"}
                ]
            }
        ]
        
        # Add images to the content
        for img_path in image_paths:
            try:
                base64_image = self.encode_image(img_path)
                messages[1]["content"].append(
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{base64_image}",
                            "detail": "high"
                        }
                    }
                )
            except Exception as e:
                print(f"Error processing image {img_path}: {e}")
        
        return {
            "model": self.model,
            "messages": messages,
            "max_tokens": max_tokens
        }
    
    def summarize_multimodal(self, text, image_paths, max_tokens=500):
        """Generate a summary from text and images using the configured vision model"""
        if not self.api_key:
            return {"error": "No API key provided. Please set your OPENROUTER_API_KEY."}
        
        headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {self.api_key}"
        }
        
        payload = self.create_gpt4_payload(text, image_paths, max_tokens)
        
        try:
            print("Making API call to model...")

            # Send the payload as a JSON body; data= would form-encode the dict
            response = requests.post(self.api_url, headers=headers, json=payload, timeout=60)
            response.raise_for_status()
            result = response.json()
            
            return {
                'text_source': text,
                'image_paths': image_paths,
                'combined_summary': result
            }
            
        except Exception as e:
            return {"error": str(e)}
    
    def display_images(self, image_paths):
        """Display the images used in the multimodal summary"""
        num_images = len(image_paths)
        
        if num_images == 0:
            return
        
        fig, axes = plt.subplots(1, num_images, figsize=(5*num_images, 5))
        
        if num_images == 1:
            axes = [axes]  # Make it iterable for single image case
            
        for i, img_path in enumerate(image_paths):
            try:
                # Handle both URLs and local paths
                if img_path.startswith(('http://', 'https://')):
                    response = requests.get(img_path)
                    img = Image.open(BytesIO(response.content))
                else:
                    img = Image.open(img_path)
                
                axes[i].imshow(img)
                axes[i].set_title(f"Image {i+1}")
                axes[i].axis('off')
            except Exception as e:
                axes[i].text(0.5, 0.5, f"Error loading image: {e}", 
                             ha='center', va='center', transform=axes[i].transAxes)
        
        plt.tight_layout()
        plt.savefig('multimodal_input.png')
        plt.close()
        
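        # Import under an alias: a local `Image` import would shadow PIL's Image
        # for the whole method and break the Image.open() calls above.
        from IPython.display import Image as IPyImage
        return IPyImage('multimodal_input.png')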

In [None]:
image_paths = [
    # "images/image1.jpg",
    # "images/image2.jpg",
    # "images/image3.png"
]

multimodal_summarizer = MultimodalSummarizer()

# Generate a multimodal summary (text first, then the list of image paths)
multimodal_result = multimodal_summarizer.summarize_multimodal(
    ghana_article,
    image_paths,
    max_tokens=300
)
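
The dictionary returned above holds the raw API response under `combined_summary`. Assuming the endpoint answers in the OpenAI-compatible chat format (generated text at `choices[0].message.content`), a small helper can pull out just the summary text. The name `extract_summary_text` is hypothetical, and the response layout is an assumption to check against the provider's documentation.

```python
def extract_summary_text(result):
    """Return the generated text from a summarize_multimodal() result, or an error note."""
    # Errors raised locally (missing key, network failure) are reported at the top level
    if "error" in result:
        return f"[error] {result['error']}"

    response = result.get("combined_summary", {})
    # OpenAI-compatible APIs typically report failures under an "error" key
    if isinstance(response, dict) and "error" in response:
        return f"[API error] {response['error']}"

    try:
        # Assumed layout: {"choices": [{"message": {"content": "..."}}]}
        return response["choices"][0]["message"]["content"].strip()
    except (KeyError, IndexError, TypeError, AttributeError):
        return "[unexpected response format]"
```

Calling `print(extract_summary_text(multimodal_result))` after the cell above would then show either the summary or a short note explaining why none was produced.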

# Practical Exercise: Building Your Custom Summarization System

Now it's your turn to build a complete summarization system by combining techniques we've explored.

## Exercise Goals:
1. Create a pipeline that combines multiple approaches
2. Customize control parameters for your specific needs
3. Evaluate results using advanced metrics
4. Compare performance across different text types
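
For goals 3 and 4, it helps to have a metric you can run on any pair of texts without extra dependencies. The sketch below is a simplified ROUGE-1 F1 (clipped unigram overlap, no stemming or stopword handling) together with a compression ratio; it is an approximation for quick comparisons, not a replacement for the ROUGE packages listed under the evaluation tools. The function names are hypothetical.

```python
import re
from collections import Counter

def rouge1_f1(reference, candidate):
    """Simplified ROUGE-1 F1: clipped unigram overlap between reference and candidate."""
    ref = Counter(re.findall(r"\w+", reference.lower()))
    cand = Counter(re.findall(r"\w+", candidate.lower()))
    # Counter intersection keeps the minimum count per word (clipping)
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def compression_ratio(source, summary):
    """Summary length as a fraction of source length, in whitespace-separated words."""
    return len(summary.split()) / max(1, len(source.split()))

reference = "the cat sat on the mat"
candidate = "the cat sat"
# precision = 3/3, recall = 3/6, so F1 = 2/3
print(round(rouge1_f1(reference, candidate), 3))
print(compression_ratio(reference, candidate))
```

Running the same two functions over summaries of different text types (news, transcripts, papers) gives a rough, comparable picture of overlap and brevity before moving to heavier metrics.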

## Project Ideas:
1. **News Summarizer Bot**: Create a system that retrieves and summarizes news articles on specific topics
2. **Meeting Minutes Generator**: Transcribe and summarize meeting audio recordings
3. **Research Paper Summarizer**: Generate summaries of academic papers with focus on methodology and results
4. **Medical Conversation Summarizer**: Summarize doctor-patient conversations, creating dual summaries (technical for doctors, simplified for patients)
5. **EHR Summarizer**: Create a system that generates longitudinal patient summaries from fragmented electronic health records, retrieving and synthesizing information across multiple visits, lab results, and clinical notes


## Useful Resources:

### Code and Libraries:
- [🤗 Transformers Documentation](https://huggingface.co/docs/transformers/index)
- [BART Model Card](https://huggingface.co/facebook/bart-large-cnn)
- [T5 Model Card](https://huggingface.co/t5-base)
- [PyTorch Documentation](https://pytorch.org/docs/stable/index.html)
- [Text Generation Parameters](https://huggingface.co/blog/mlabonne/decoding-strategies)

### Papers:
- "BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension"
- "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
- "Neural Abstractive Text Summarization with Sequence-to-Sequence Models"

### Datasets:
- [CNN/Daily Mail Dataset](https://huggingface.co/datasets/cnn_dailymail)
- [XSum Dataset](https://huggingface.co/datasets/xsum)
- [Multi-News](https://huggingface.co/datasets/multi_news)

### Evaluation Tools:
- [ROUGE Implementation in Python](https://github.com/google-research/google-research/tree/master/rouge)
- [BERTScore](https://github.com/Tiiiger/bert_score)


# **Facilitator(s) Details**

**Facilitator(s):**

*   Name: Nana Sam Yeboah                       
*   Email: nanayeb34@gmail.com
*   LinkedIn: [Nana Sam Yeboah](https://www.linkedin.com/in/nana-sam-yeboah-0b664484)


*   Name: Audrey Eyram Agbeve
*   Email: audreyagbeve02@gmail.com
*   LinkedIn: [Audrey (Eyram) Agbeve](https://www.linkedin.com/in/audreyagbeve02/)

### Please rate this Tutorial

<img src="images/Day1_feedback.png" height=500 width=500  >