# Text Summarization with LLMs - Modern Approaches
## Interview Preparation Notebook for Senior Applied AI Scientist (Retail Banking)

---

**Goal**: Demonstrate mastery of abstractive summarization with LLMs, including T5/BART fine-tuning, prompt engineering, and handling the hallucination challenge.

**Interview Signal**: This notebook shows you understand how to leverage LLMs for summarization while managing the faithfulness/fluency tradeoff critical in banking.

## 1. Business Context (Banking Lens)

### Why LLMs for Summarization Now?

| Extractive Limitation | LLM Solution |
|----------------------|---------------|
| Choppy sentence fragments | Fluent, natural prose |
| Can't synthesize across sections | Integrates information |
| Fixed compression ratio | Flexible length control |
| No abstraction | Generates new phrasing |

### Banking Use Cases

1. **Executive Briefings**: Convert 50-page reports to 1-page summaries
2. **Customer Interaction Summaries**: Natural language call recaps for CRM
3. **Research Digests**: Summarize analyst reports for quick consumption
4. **Meeting Minutes**: Auto-generate action items from transcripts

### The Hallucination Problem

**Critical Banking Concern**: LLMs can generate fluent text that contains factual errors.

Example:
- Source: "Revenue grew 8% to $32.4 billion"
- Hallucinated summary: "Revenue grew 15% to $35 billion"

**This is why we need guardrails in banking summarization.**

## 2. Problem Definition

### LLM Summarization Approaches

| Approach | Fluency | Faithfulness | Cost | Use Case |
|----------|---------|--------------|------|----------|
| **Fine-tuned T5/BART** | High | High (if trained well) | Low | High volume |
| **GPT-4 Prompting** | Very High | Medium-High | High | Quality-critical |
| **Hybrid (Extract→Abstract)** | High | High | Medium | Best of both |

In [None]:
# Sample banking document
earnings_report = """In Q3 2024, the bank reported total revenue of $32.4 billion, representing an 8% increase 
year-over-year. Net interest income reached $14.2 billion, up $340 million from the previous quarter, 
driven primarily by higher interest rates. The net interest margin expanded by 15 basis points to 2.85%.

The consumer banking division showed particular strength with deposits growing 12% compared to the same 
period last year. This growth was attributed to successful marketing campaigns and competitive interest 
rates on savings products.

However, provision for credit losses increased to $1.2 billion, up from $800 million in Q2, reflecting 
the bank's conservative approach to the uncertain macroeconomic environment. Early signs of stress were 
noted in the credit card portfolio, particularly in the subprime segment.

On the commercial banking side, loan demand softened, especially in commercial real estate where loans 
declined by 3%. The bank maintained disciplined underwriting standards given ongoing concerns about 
office space lending.

Looking ahead, management expects continued pressure on net interest margin as deposit competition 
intensifies. Technology investments will increase by 12% next year to improve digital capabilities."""

print(f"Document length: {len(earnings_report.split())} words")

## 3-4. Implementation

### 4.1 T5/BART Summarization

In [None]:
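# BART summarization with the Hugging Face pipeline (a T5 checkpoint works the same way)
def summarize_with_bart(text, max_length=150, min_length=50):
    # Imported here so the rest of the notebook runs without transformers installed
    from transformers import pipeline

    # Load pre-trained summarization model
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

    # Generate summary (max_length and min_length are counted in tokens, not words)
    summary = summarizer(
        text,
        max_length=max_length,
        min_length=min_length,
        do_sample=False,  # Deterministic for banking
        truncation=True   # Inputs beyond the model's input limit are cut off
    )
    return summary[0]['summary_text']

# Uncomment to run; the first call downloads the model weights
# print(summarize_with_bart(earnings_report))

print("Key parameters: max_length, min_length, do_sample")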

### 4.2 GPT/Claude Prompting

In [None]:
def create_summarization_prompt(text, doc_type="earnings", max_words=100, style="executive"):
    """
    Create prompt for LLM summarization with banking guardrails.
    """
    
    type_instructions = {
        "earnings": """Focus on:
- Revenue and profit figures (use EXACT numbers from text)
- Key business segment performance
- Forward guidance
- Material risks""",
        "regulatory": """Focus on:
- New requirements and deadlines
- Compliance impacts
- Required actions""",
        "contract": """Focus on:
- Parties and obligations
- Key terms and dates
- Financial commitments"""
    }
    
    style_instructions = {
        "executive": "Write for a C-suite audience. Be concise and highlight implications.",
        "technical": "Include specific metrics and technical details.",
        "customer": "Use simple language appropriate for retail customers."
    }
    
    prompt = f"""You are summarizing a banking document. ACCURACY IS CRITICAL.

Document Type: {doc_type}
Target Audience: {style}
Maximum Words: {max_words}

{type_instructions.get(doc_type, '')}

{style_instructions.get(style, '')}

STRICT RULES:
1. Use EXACT numbers from the document - never round or approximate
2. Do NOT add information not in the document
3. Do NOT speculate or make predictions beyond what's stated
4. If uncertain about a fact, omit it rather than guess
5. Every claim must be directly traceable to the source

Document:
---
{text}
---

Summary:"""
    
    return prompt

# Example
prompt = create_summarization_prompt(earnings_report, "earnings", 80, "executive")
print("LLM SUMMARIZATION PROMPT")
print("=" * 50)
print(prompt[:1500] + "...")

### 4.3 Hybrid Approach (Extract → Abstract)

In [None]:
def hybrid_summarize(text, extract_sentences=5):
    """
    Hybrid approach: Extract key sentences, then rewrite fluently.
    
    Benefits:
    - Extraction ensures key facts are captured
    - Abstraction improves fluency
    - Reduces hallucination risk
    """
    
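    import re

    # Step 1: Extract key sentences (use extractive methods)
    # In production, use TextRank or BM25
    # Split after sentence-ending punctuation followed by whitespace, so line
    # breaks between sentences count and decimals such as "$32.4" stay intact
    sentences = re.split(r'(?<=[.!?])\s+', ' '.join(text.split()))
    key_sentences = sentences[:extract_sentences]  # Simplified: lead sentences only
    extracted = ' '.join(key_sentences)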
    
    # Step 2: Create prompt for fluent rewriting
    prompt = f"""Rewrite the following key points as a coherent, fluent paragraph.
Do NOT add any new information. Only rephrase for readability.

Key points:
{extracted}

Fluent summary:"""
    
    return {
        'extracted': extracted,
        'rewrite_prompt': prompt
    }

result = hybrid_summarize(earnings_report, 4)
print("HYBRID APPROACH")
print("=" * 50)
print("\nExtracted sentences:")
print(result['extracted'][:500] + "...")
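
Taking the first few sentences is only a placeholder for a real extraction step. As a rough sketch of what a scored selection could look like, the following ranks sentences by average word frequency, a much simpler stand-in for TextRank or BM25. The function name `select_key_sentences`, the stopword list, and the demo text are illustrative assumptions, not part of the notebook.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; use a proper list in practice)
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "on", "by", "for",
             "with", "was", "were", "is", "are", "as", "from", "that", "this"}

def select_key_sentences(text, k=3):
    """Return the k highest-scoring sentences, kept in document order."""
    # Normalize whitespace, then split after sentence-ending punctuation
    sentences = re.split(r'(?<=[.!?])\s+', ' '.join(text.split()))
    sentences = [s for s in sentences if s]

    def tokenize(sentence):
        words = re.findall(r"[a-z]+", sentence.lower())
        return [w for w in words if w not in STOPWORDS]

    # Document-level word frequencies
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average frequency, so long sentences are not favored automatically
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank by score, then restore original order for readability
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]

demo = ("Net interest income rose on higher rates. "
        "The weather was pleasant. "
        "Higher rates also lifted net interest margin.")
print(select_key_sentences(demo, k=2))
```

Because the selected sentences are copied verbatim, every number they contain is guaranteed to come from the source; only the later rewrite step can introduce errors.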

## 5-6. Evaluation

### Standard Metrics
- **ROUGE-1/2/L**: Unigram, bigram, and longest-common-subsequence overlap with a reference summary
- **BERTScore**: Semantic similarity computed from contextual token embeddings

### Faithfulness Metrics (Critical for Banking)
- **Factual consistency**: Do facts in summary match source?
- **Hallucination detection**: Are there claims not in source?
- **Number accuracy**: Do numerical values match exactly?
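
To see why overlap metrics alone are not enough for banking, here is a minimal ROUGE-N sketch using whitespace tokenization. It is a simplified illustration (no stemming or the other preprocessing that established ROUGE packages apply), and the function name `rouge_n` and the example strings are assumptions made for this demo.

```python
from collections import Counter

def rouge_n(reference, candidate, n=1):
    """ROUGE-N precision, recall, and F1 from clipped n-gram overlap."""
    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref, cand = ngrams(reference), ngrams(candidate)
    overlap = sum((ref & cand).values())  # Counter & keeps the per-n-gram minimum
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

reference = "revenue grew 8% to $32.4 billion"
faithful = "revenue grew 8% to $32.4 billion"
hallucinated = "revenue grew 15% to $35 billion"

print(rouge_n(reference, faithful)["f1"])
print(rouge_n(reference, hallucinated)["f1"])
```

The hallucinated candidate still shares four of six unigrams with the reference, so its ROUGE-1 stays fairly high even though both figures are wrong. That gap is what the faithfulness metrics are meant to close.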

In [None]:
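def check_numerical_faithfulness(source, summary):
    """
    Check that every number in the summary also appears in the source.
    Critical for banking summarization.

    Limitation: this is a set-membership test. It catches invented numbers,
    but not a real number attached to the wrong metric (e.g. "revenue grew
    12%" when 12% is the deposit growth figure).
    """
    import re
    
    # Extract numbers from both
    def extract_numbers(text):
        # Match dollar amounts, percentages, and basis points
        patterns = [
            # Thousands separators allowed; no trailing comma or space is captured
            r'\$\d+(?:,\d{3})*(?:\.\d+)?(?:\s*(?:billion|million))?',
            r'\d+(?:\.\d+)?%',
            r'\d+(?:\.\d+)?\s*(?:basis points|bps)',
        ]
        numbers = []
        for pattern in patterns:
            numbers.extend(re.findall(pattern, text.lower()))
        # Collapse internal whitespace so "$32.4  billion" equals "$32.4 billion"
        return {' '.join(n.split()) for n in numbers}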
    
    source_numbers = extract_numbers(source)
    summary_numbers = extract_numbers(summary)
    
    # Numbers in summary should be subset of source
    hallucinated = summary_numbers - source_numbers
    
    return {
        'source_numbers': source_numbers,
        'summary_numbers': summary_numbers,
        'hallucinated': hallucinated,
        'faithful': len(hallucinated) == 0
    }

# Test with a good summary
good_summary = "Q3 revenue was $32.4 billion, up 8%. Credit provisions increased to $1.2 billion."
result = check_numerical_faithfulness(earnings_report, good_summary)
print(f"Good summary faithful: {result['faithful']}")

# Test with hallucinated summary
bad_summary = "Q3 revenue was $35 billion, up 15%. Credit provisions increased to $1.5 billion."
result = check_numerical_faithfulness(earnings_report, bad_summary)
print(f"Bad summary faithful: {result['faithful']}")
print(f"Hallucinated numbers: {result['hallucinated']}")
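
The check above compares matched strings, so "$32.4 billion" and "$32.4B" would count as different numbers. One possible refinement, sketched here under the assumption that only dollar amounts with a small set of unit suffixes matter, is to compare parsed values instead. The names `parse_dollar_amounts` and `unsupported_amounts` and the suffix table are hypothetical.

```python
import re

# Multipliers for unit suffixes (illustrative; extend as needed)
UNIT_SCALE = {"billion": 1e9, "bn": 1e9, "b": 1e9,
              "million": 1e6, "mn": 1e6, "m": 1e6}

DOLLAR_RE = re.compile(
    r'\$(\d+(?:,\d{3})*(?:\.\d+)?)\s*(billion|million|bn|mn|b|m)?\b',
    re.IGNORECASE,
)

def parse_dollar_amounts(text):
    """Return the set of dollar amounts in text as floats (in dollars)."""
    values = set()
    for number, unit in DOLLAR_RE.findall(text):
        scale = UNIT_SCALE.get(unit.lower(), 1.0)
        # Round to cents so float noise does not create spurious mismatches
        values.add(round(float(number.replace(",", "")) * scale, 2))
    return values

def unsupported_amounts(source, summary):
    """Dollar amounts in the summary whose value never appears in the source."""
    return parse_dollar_amounts(summary) - parse_dollar_amounts(source)

source = "Revenue was $32.4 billion; provisions rose to $1.2 billion from $800 million."
same_values = "Revenue hit $32.4B with $1.2bn in provisions."
invented = "Revenue hit $35 billion."

print(unsupported_amounts(source, same_values))
print(unsupported_amounts(source, invented))
```

This still shares the main limitation of the string check: a correct value attached to the wrong metric passes, which is where claim-level verification comes in.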

## 7. Production Readiness Checklist

```
FAITHFULNESS CHECKS
[ ] Verify all numbers in summary exist in source
[ ] Cross-check named entities
[ ] Detect unsupported claims (NLI-based)
[ ] Flag summaries that fail checks for human review

OUTPUT QUALITY
[ ] Length within specified bounds
[ ] No truncated sentences
[ ] Readability scoring
[ ] Grammar checking

BANKING-SPECIFIC
[ ] Disclaimer that summary is AI-generated
[ ] Link to source document
[ ] Timestamp of generation
[ ] Human review for external distribution
```
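
Parts of this checklist are easy to automate. The sketch below covers the length and truncation checks and attaches the banking-specific metadata. The disclaimer wording, the field names, the `package_summary` helper, and the example source path are all illustrative assumptions rather than an established schema.

```python
from datetime import datetime, timezone

# Illustrative wording; real disclaimers come from compliance
AI_DISCLAIMER = "This summary was generated by AI and may contain errors."

def check_output_quality(summary, max_words):
    """Return a list of failed checks; an empty list means the summary passed."""
    text = summary.strip()
    if not text:
        return ["empty summary"]
    failures = []
    if len(text.split()) > max_words:
        failures.append("exceeds max_words")
    # A summary that stops without terminal punctuation was likely truncated
    if text[-1] not in ".!?":
        failures.append("possibly truncated final sentence")
    return failures

def package_summary(summary, source_uri, max_words=100, external=False):
    """Attach disclaimer, source link, and timestamp to a summary."""
    failures = check_output_quality(summary, max_words)
    return {
        "summary": summary.strip(),
        "disclaimer": AI_DISCLAIMER,
        "source": source_uri,
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "failed_checks": failures,
        # External distribution always needs a human; so does any failed check
        "needs_human_review": external or bool(failures),
    }

record = package_summary(
    "Q3 revenue was $32.4 billion, up 8%. Provisions rose to",
    source_uri="reports/q3-2024-earnings.pdf",  # hypothetical path
    max_words=80,
)
print(record["failed_checks"], record["needs_human_review"])
```

Readability scoring, grammar checking, and the faithfulness checks would plug into the same `failed_checks` list.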

## 8. Traditional vs LLM Comparison

| Dimension | Extractive | Fine-tuned T5 | GPT-4 | Hybrid |
|-----------|-----------|---------------|-------|--------|
| **Fluency** | Low | High | Very High | High |
| **Faithfulness (approx.)** | 100% | 90-95% | 85-95% | 95%+ |
| **Hallucination Risk** | None | Low | Medium | Low |
| **Cost/doc (approx.)** | ~$0 | $0.001 | $0.01-0.05 | $0.005-0.02 |
| **Long docs** | Excellent | Limited (tokens) | Limited | Good |
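
To make the cost row concrete, here is a back-of-the-envelope projection. It uses the per-document figures from the table, taking the upper bound wherever the table gives a range; the monthly volumes are illustrative assumptions.

```python
# Per-document costs from the table above; ranges use their upper bound
COST_PER_DOC = {
    "extractive": 0.0,
    "fine_tuned_t5": 0.001,
    "gpt4": 0.05,
    "hybrid": 0.02,
}

def monthly_cost(docs_per_month):
    """Projected monthly spend in dollars for each approach."""
    return {name: docs_per_month * cost for name, cost in COST_PER_DOC.items()}

for volume in (1_000, 1_000_000):
    costs = monthly_cost(volume)
    print(volume, {name: f"${c:,.0f}" for name, c in costs.items()})
```

At a thousand documents a month the gap is negligible; at a million, the same per-document difference becomes a five-figure monthly line item, which is the point where a fine-tuned model starts to pay off.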

## 9. Advanced Techniques

### Long Document Handling
```python
# For documents > context limit:
# 1. Chunk document into sections
# 2. Summarize each section
# 3. Summarize the summaries (hierarchical)

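def hierarchical_summarize(doc, summarize, chunk_size=3000):
    """`summarize` is any callable str -> str, e.g. a wrapper around an LLM call."""
    # chunk_size is in characters; a short document needs only one pass
    if len(doc) <= chunk_size:
        return summarize(doc)
    chunks = [doc[i:i+chunk_size] for i in range(0, len(doc), chunk_size)]
    chunk_summaries = [summarize(chunk) for chunk in chunks]
    final_summary = summarize("\n".join(chunk_summaries))
    return final_summary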
```
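
Fixed-size character slices can cut a sentence, or even a number, in half. A minimal alternative, assuming paragraphs are separated by blank lines, is to pack whole paragraphs into each chunk. The helper name `chunk_by_paragraph` is hypothetical.

```python
def chunk_by_paragraph(doc, max_chars=3000):
    """Greedily pack whole paragraphs into chunks of at most max_chars characters."""
    chunks, current = [], ""
    for para in (p.strip() for p in doc.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single oversized paragraph becomes its own chunk, unsplit
            current = para
    if current:
        chunks.append(current)
    return chunks

doc = "Section A.\n\n" + "x" * 20 + "\n\nSection C."
print(chunk_by_paragraph(doc, max_chars=32))
```

Since paragraphs are never split, section headers stay attached to nearby text, which helps the final pass produce a structured summary. A paragraph longer than `max_chars` is passed through whole, so a production version would need a fallback for that case.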

### Controllable Generation
```python
prompt = """Summarize focusing on:
- Risk factors (HIGH PRIORITY)
- Financial metrics (MEDIUM PRIORITY)
- Future outlook (LOW PRIORITY)
"""
```
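
The priorities do not have to be hard-coded. As a small sketch, the prompt can be generated from a mapping of topics to priority levels. The function name `build_focus_prompt` and the document delimiters are assumptions chosen to mirror the prompt style used earlier in the notebook.

```python
def build_focus_prompt(text, priorities):
    """priorities maps a topic to 'HIGH', 'MEDIUM', or 'LOW'."""
    order = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}
    # Highest priority first, so the most important focus area leads the list
    ranked = sorted(priorities.items(), key=lambda item: order[item[1]])
    focus_lines = "\n".join(f"- {topic} ({level} PRIORITY)" for topic, level in ranked)
    return (
        f"Summarize focusing on:\n{focus_lines}\n\n"
        f"Document:\n---\n{text}\n---\n\nSummary:"
    )

prompt = build_focus_prompt(
    "Provisions rose to $1.2 billion.",
    {"Future outlook": "LOW", "Risk factors": "HIGH", "Financial metrics": "MEDIUM"},
)
print(prompt)
```

The same document can then be summarized for a risk committee or an investor relations team by changing only the mapping.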

### Fact Verification Pipeline
```python
# Post-generation verification:
# 1. Extract claims from summary
# 2. For each claim, check if source supports it (NLI)
# 3. Flag or remove unsupported claims
```
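
The three steps can be wired together as shown below. This is a sketch, not a real NLI pipeline: `lexical_support` is a toy token-overlap stand-in used only so the example runs without a model, and it is not a substitute for entailment. In practice the `supports` argument would wrap an actual NLI classifier. All function names here are hypothetical.

```python
import re

def split_claims(summary):
    """Step 1: treat each sentence as one claim (a simplification)."""
    return [s for s in re.split(r'(?<=[.!?])\s+', summary.strip()) if s]

def tokens(text):
    # Keep numbers with their $ and % markers so "8%" and "$32.4" match exactly
    return set(re.findall(r"\$?\d+(?:\.\d+)?%?|[a-z]+", text.lower()))

def lexical_support(source, claim, threshold=0.8):
    """Toy stand-in for NLI: share of claim tokens that appear in the source."""
    claim_tokens = tokens(claim)
    if not claim_tokens:
        return True
    covered = len(claim_tokens & tokens(source)) / len(claim_tokens)
    return covered >= threshold

def verify_summary(source, summary, supports=lexical_support):
    """Steps 2 and 3: check each claim and separate supported from flagged."""
    results = [(claim, supports(source, claim)) for claim in split_claims(summary)]
    return {
        "supported": [claim for claim, ok in results if ok],
        "flagged": [claim for claim, ok in results if not ok],
    }

source = "Total revenue was $32.4 billion, an 8% increase. Provisions increased to $1.2 billion."
summary = "Revenue was $32.4 billion. The bank plans a major acquisition."
report = verify_summary(source, summary)
print(report)
```

Flagged claims can either be dropped from the summary or routed to human review, depending on how critical the document is.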

## 10. Interview Soundbites

**On Hallucination:**
> "Hallucination is the Achilles heel of LLM summarization in banking. A summary that says '12% growth' when the source says '8%' isn't a rounding error - it's material misrepresentation. I always run numerical verification and NLI-based fact checking on generated summaries."

**On Hybrid Approach:**
> "My go-to for banking is hybrid: extract key sentences first, then ask the LLM to rewrite for fluency. The extraction step ensures critical facts are captured; the abstraction step ensures readability. You get most of the LLM's fluency with much lower hallucination risk, though the rewrite step still needs verification."

**On Long Documents:**
> "For 100-page 10-Ks, I use hierarchical summarization: split into sections, summarize each, then summarize the summaries. This respects context limits while maintaining coherence. The key is preserving section headers so the final summary is structured."

**On Production:**
> "Every AI-generated summary in banking needs a disclaimer and source link. Users should know it's machine-generated and be able to verify. For external distribution, we require human review - AI-generated content to regulators or customers is too risky without oversight."

**On Cost-Quality Tradeoff:**
> "GPT-4 produces beautiful summaries but at $0.03-0.05 per document. For internal analyst reports, that's fine. For summarizing millions of customer interactions, fine-tuned T5 at $0.001 per document makes more sense. Quality is roughly 90% as good at about 2-3% of the cost."

---

**Q: How do you ensure faithfulness in LLM summaries?**
> Three layers: (1) Prompt engineering emphasizing exact numbers, (2) Post-generation numerical verification, (3) NLI-based claim verification. Any summary failing these checks goes to human review. For critical documents, I use extractive as the primary method.

In [None]:
print("""
╔══════════════════════════════════════════════════════════════════╗
║                    NOTEBOOK SUMMARY                               ║
╠══════════════════════════════════════════════════════════════════╣
║  Task: Text Summarization with LLMs                              ║
║  Approaches: T5/BART, GPT Prompting, Hybrid                      ║
║  Banking Use: Executive briefings, document digests              ║
║                                                                  ║
║  Key Takeaways:                                                  ║
║  1. Hallucination is critical risk - verify all facts            ║
║  2. Hybrid (extract→abstract) balances faithfulness + fluency    ║
║  3. Numerical verification is mandatory for banking              ║
║  4. Hierarchical approach for long documents                     ║
║  5. Always disclose AI-generated content                         ║
╚══════════════════════════════════════════════════════════════════╝
""")