### Lab 2.03 - Multi-Provider News Summarizer


In [1]:
import os
import time
import random
from openai import OpenAI
from typing import Optional, Callable, Any
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Initialize OpenAI client
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# Default model
MODEL = "gpt-4o-mini"

print("✅ Setup complete!")

✅ Setup complete!


In [1]:
"""Main application entry point."""
import sys
from summarizer import NewsSummarizer, AsyncNewsSummarizer
import asyncio

async def main():
    """Run the news summarizer."""
    print("="*80)
    print("NEWS SUMMARIZER - Multi-Provider Edition")
    print("="*80)
    
    # Get user input
    category = input("\nEnter news category (technology/business/health/general): ").strip() or "technology"
    num_articles = input("How many articles to process? (1-10): ").strip()
    
    try:
        num_articles = int(num_articles)
        num_articles = max(1, min(10, num_articles))  # Clamp between 1 and 10
    except ValueError:
        # Non-numeric or empty input falls back to the default
        num_articles = 3
    
    use_async = input("Use async processing? (y/n): ").strip().lower() == 'y'
    
    print(f"\nFetching {num_articles} articles from category: {category}")
    
    try:
        if use_async:
            # Use async version
            summarizer = AsyncNewsSummarizer()
            articles = summarizer.news_api.fetch_top_headlines(
                category=category,
                max_articles=num_articles
            )
            
            if articles:
                print(f"\nProcessing {len(articles)} articles concurrently...")
                results = await summarizer.process_articles_async(articles, max_concurrent=3)
                summarizer.generate_report(results)
        
        else:
            # Use synchronous version
            summarizer = NewsSummarizer()
            articles = summarizer.news_api.fetch_top_headlines(
                category=category,
                max_articles=num_articles
            )
            
            if articles:
                print(f"\nProcessing {len(articles)} articles...")
                results = summarizer.process_articles(articles)
                summarizer.generate_report(results)
        
        print("\n✓ Processing complete!")
    
    except KeyboardInterrupt:
        print("\n\nOperation cancelled by user.")
    
    except Exception as e:
        print(f"\n✗ Error: {e}")

await main()

✓ Configuration validated for development environment
NEWS SUMMARIZER - Multi-Provider Edition

Fetching 5 articles from category: health
✓ Fetched 4 articles from News API

Processing 4 articles concurrently...

Processing: What a cardiologist eats in a day for better heart health - ...
  Summarizing with OpenAI...

Processing: The 7-day gut reset: How to improve digestion, immunity and ...
  Summarizing with OpenAI...

Processing: Out-of-state person with measles visited N.J. hospital, heal...
  Summarizing with OpenAI...
  Summary created
  Analyzing sentiment with Anthropic...
  Summary created
  Analyzing sentiment with Anthropic...
  Summary created
  Analyzing sentiment with Anthropic...
  Sentiment analyzed

Processing: Shasta County outbreak drives Calif.'s first measles surge s...
  Summarizing with OpenAI...
  Sentiment analyzed
  Sentiment analyzed
  Summary created
  Analyzing sentiment with Anthropic...
  Sentiment analyzed

NEWS SUMMARY REPORT

1. What a cardiologist eat

## Reflections on the Multi-Provider News Summarizer

### Challenges Faced

The biggest challenge was orchestrating two different LLM providers within a single pipeline. OpenAI and Anthropic have different client libraries, response formats, and rate limits, so getting them to work together required careful abstraction. Handling a provider failure mid-article (after the summary step but before sentiment analysis) was tricky because each step depends on the previous step's output.

Another challenge was managing API costs across providers with different pricing models. Token counting differs between OpenAI (which has tiktoken) and Anthropic, so cost estimates for Anthropic calls are approximations rather than exact.
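To illustrate what "approximation" means here, the following is a minimal sketch of a character-based cost estimate. The 4-characters-per-token ratio is a common rule of thumb rather than a provider guarantee, and the function names and per-million-token price parameters are hypothetical; they are not taken from the lab's `CostTracker`.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (heuristic, not exact)."""
    if not text:
        return 0
    return max(1, round(len(text) / chars_per_token))


def estimate_cost(
    input_text: str,
    output_text: str,
    input_price_per_1m: float,   # assumed price per 1M input tokens (placeholder)
    output_price_per_1m: float,  # assumed price per 1M output tokens (placeholder)
) -> float:
    """Estimate the cost of one call from its prompt and response text."""
    in_tokens = estimate_tokens(input_text)
    out_tokens = estimate_tokens(output_text)
    return (in_tokens * input_price_per_1m + out_tokens * output_price_per_1m) / 1_000_000
```

When a provider reports token usage in its response, those reported counts are the better source; an estimate like this is mainly useful for checking a budget before the call is made.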

A third issue surfaced when some NewsAPI articles returned `None` for their content field, which caused `'NoneType' object is not subscriptable` errors during string slicing in the summarizer.
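The error can be reproduced and guarded against in a few lines. This is a sketch rather than the lab's actual code: `safe_excerpt` is a hypothetical helper, and the `content` and `description` keys are assumed to match the article dictionaries returned by the news client. The key detail is that `dict.get(key, "")` only applies the default when the key is missing, so a key that is present with a `None` value still yields `None`.

```python
from typing import Optional


def safe_excerpt(article: dict, max_chars: int = 500) -> Optional[str]:
    """Return a trimmed excerpt, or None when the article has no usable text."""
    # `or` handles both a missing key and a key whose value is None or "".
    text = article.get("content") or article.get("description") or ""
    text = text.strip()
    return text[:max_chars] if text else None
```

A caller can then skip any article for which the helper returns `None` instead of passing it on to the summarizer.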

### How I Solved Them

With the companion's help, I built an `LLMProviders` class that wraps both clients behind a uniform interface with `ask_openai()` and `ask_anthropic()` methods, plus an `ask_with_fallback()` method that automatically retries with the secondary provider when the primary one fails. The `CostTracker` class handles unified cost tracking across both providers with a configurable daily budget. For the null content issue, the `NewsAPI` class uses `.get()` with default empty strings, though edge cases can still slip through. I also added jitter to the rate limiting so that API calls are spaced at slightly random intervals instead of exact ones, which helps avoid hitting rate limits when making many requests in a row.
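The fallback and jitter ideas can be sketched independently of any real client. The code below is an illustration under stated assumptions, not the `LLMProviders` implementation: `call_with_fallback` and `jittered_delay` are hypothetical names, and each provider is represented by a plain callable that takes a prompt and returns text.

```python
import random
import time
from typing import Callable, Sequence, Tuple


def jittered_delay(base_seconds: float, jitter_fraction: float = 0.5) -> float:
    """Return the base delay plus a random extra of up to jitter_fraction * base."""
    return base_seconds + random.uniform(0, base_seconds * jitter_fraction)


def call_with_fallback(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
    base_delay: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, str]:
    """Try each (name, call) pair in order; return (name, response) on first success."""
    errors = []
    for index, (name, call) in enumerate(providers):
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers the fallback
            errors.append(f"{name}: {exc}")
            # Pause with jitter only if another provider is left to try.
            if base_delay > 0 and index < len(providers) - 1:
                sleep(jittered_delay(base_delay))
    raise RuntimeError("All providers failed: " + "; ".join(errors))
```

Passing the providers as a list keeps the order explicit and makes it easy to add a third option later. Injecting `sleep` lets tests run without real waiting.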

### What I Learned

This was one of the first projects where I worked with multiple APIs at once, and it showed me why error handling matters so much. Each provider has its own quirks (different rate limits, different ways of counting tokens, different response formats), and things can break in ways I didn't expect. It taught me to plan for failures, not just for the happy path. I also got a better understanding of why it helps to split code into separate files (config, API calls, LLM logic, etc.) instead of putting everything in one place. Debugging was much easier when something broke, because I could narrow down which part was causing the issue.

### Ideas for Improvement

- Check that article content isn't empty or `None` before trying to summarize it, so the app doesn't crash on bad data from the API
- Save summaries somewhere so I don't have to re-process the same article if I run it again
- Add more LLM providers (like Google Gemini) so there are more fallback options if one goes down
- Store past results in a small database so I could look at trends over time, like how sentiment shifts for a topic
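The caching idea from the list above could start as a small file-backed cache keyed by a hash of the article text. This is a minimal sketch under assumptions: `SummaryCache` is a hypothetical class, the JSON file location is chosen by the caller, and `summarize` stands in for whatever function performs the LLM call.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Dict


class SummaryCache:
    """Store summaries in a JSON file, keyed by a SHA-256 hash of the article text."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        self._data: Dict[str, str] = {}
        if self.path.exists():
            self._data = json.loads(self.path.read_text(encoding="utf-8"))

    @staticmethod
    def key_for(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get_or_create(self, text: str, summarize: Callable[[str], str]) -> str:
        """Return the cached summary, calling `summarize` only on a cache miss."""
        key = self.key_for(text)
        if key not in self._data:
            self._data[key] = summarize(text)
            # Rewrite the whole file after each miss; fine for small lab-sized caches.
            self.path.write_text(json.dumps(self._data), encoding="utf-8")
        return self._data[key]
```

Hashing the text rather than the URL means a republished or edited article is summarized again. For the trend-tracking idea, the same keys could later move into a small database table with a timestamp column.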