
## ✨Why is it trending on Wikipedia?✨

Track what the world is curious about through Wikipedia's featured content endpoint, then summarize the news surrounding each article. My favorite side project is https://wiki-chronicle.lovable.app/. I love this daily news digest because it's as far removed from an algorithmic news feed as you can get. Instead, you get to see what has climbed to the top across all the other algorithmic feeds.

**This notebook has 4 steps**
1. Get most read article list
2. Search the news for each article with `serper`
3. Use `gpt-5-mini` to summarize each article
4. Display the results

### ⚒️ Resources ⚒️

#### 🔍 Serper 
- 3rd party Google search API with 2500 free queries upon registration
- https://serper.dev/

#### 🌐 Wikipedia Featured Content Endpoint
- Wikimedia Foundation: parent organization of Wikipedia
- Wikimedia API: poorly documented, yet incredibly useful API with unbelievably generous usage limits
- https://api.wikimedia.org/wiki/Feed_API/Reference/Featured_content

#### 🖥️ OpenAI
- gpt-5-mini
- https://platform.openai.com/

In [None]:
# %pip install openai python-dotenv requests

In [50]:
"""
Wikipedia Trending Articles Analyzer
Analyzes trending Wikipedia articles and determines why they're popular using AI.
"""

import datetime
import json
import os
from typing import Dict, List, Optional

import requests
from dotenv import load_dotenv
from IPython.display import HTML, display
from openai import OpenAI

# Load environment variables
load_dotenv()

# API Configuration
SERPER_API_KEY = os.environ["SERPER_API_KEY"]
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]

# Constants
DEFAULT_ARTICLE_COUNT = 5
WIKIPEDIA_USER_AGENT = "WikipediaTrendingAnalyzer/1.0"
SERPER_NEWS_URL = "https://google.serper.dev/news" 
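
The `os.environ[...]` lookups above raise a bare `KeyError` when a key is missing from `.env`. Below is a minimal sketch of a friendlier check, assuming a hypothetical helper named `require_env` that is not part of the original notebook:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message.

    `require_env` is a hypothetical helper, not part of the original notebook.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Missing environment variable {name!r}; add it to your .env file."
        )
    return value
```

With this in place, the configuration lines could read `SERPER_API_KEY = require_env("SERPER_API_KEY")`, and likewise for `OPENAI_API_KEY`.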

# 🥇 Step 1: fetch most read Wikipedia articles 🥇

In [51]:
def get_top_articles(lang: str = "en", date: Optional[datetime.date] = None, n: int = DEFAULT_ARTICLE_COUNT) -> List[Dict]:
    """Fetch top trending Wikipedia articles for a given date.
    
    Args:
        lang: Language code for Wikipedia (default: "en")
        date: Date to fetch articles for (default: today)
        n: Number of articles to return (default: 5)
    
    Returns:
        List of article dictionaries from Wikipedia API
    """
    if date is None:
        date = datetime.date.today()
    
    url = f"https://api.wikimedia.org/feed/v1/wikipedia/{lang}/featured/{date:%Y/%m/%d}"
    headers = {"User-Agent": WIKIPEDIA_USER_AGENT}
    
    try:
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()
        feed = response.json()
        return feed.get("mostread", {}).get("articles", [])[:n]
    except requests.RequestException as e:
        print(f"Error fetching Wikipedia data: {e}")
        return []
    except ValueError as e:
        print(f"Error parsing Wikipedia response: {e}")
        return []

# Test the function
trending_articles = get_top_articles()
print(f"Found {len(trending_articles)} trending articles")

Found 5 trending articles
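
`get_top_articles` returns an empty list when the feed has no `mostread` entry. One case where that might happen is early in the day, before the most-read list for that date exists; this is an assumption about the feed's timing, not something the notebook verifies. The sketch below steps back one day at a time until it gets results. The name `get_top_articles_with_fallback` and the `max_days_back` parameter are hypothetical.

```python
import datetime
from typing import Callable, Dict, List, Optional


def get_top_articles_with_fallback(
    fetch: Callable[..., List[Dict]],
    date: Optional[datetime.date] = None,
    max_days_back: int = 2,
    **kwargs,
) -> List[Dict]:
    """Call `fetch` for `date`, stepping back a day at a time while it returns nothing.

    Hypothetical helper: `fetch` is expected to accept a `date` keyword,
    as `get_top_articles` does.
    """
    start = date or datetime.date.today()
    for offset in range(max_days_back + 1):
        articles = fetch(date=start - datetime.timedelta(days=offset), **kwargs)
        if articles:
            return articles
    return []
```

Usage would look like `trending_articles = get_top_articles_with_fallback(get_top_articles, n=5)`. Passing the fetcher as an argument keeps the fallback logic testable without network access.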


In [52]:
def process_articles(raw_articles: List[Dict]) -> List[Dict]:
    """Process raw Wikipedia articles into a standardized format.
    
    Args:
        raw_articles: List of raw article dictionaries from Wikipedia API
        
    Returns:
        List of processed article dictionaries with standardized fields
    """
    processed_articles = []
    
    for article in raw_articles:
        processed_article = {
            "title": article.get("titles", {}).get("normalized", "Unknown Title"),
            "view_count": article.get("views", 0),
            "link": article.get("content_urls", {}).get("desktop", {}).get("page", ""),
            "thumbnail": article.get("thumbnail", {}).get("source"),
            "raw_news_results": {},
            "trending_reason": ""
        }
        processed_articles.append(processed_article)
    
    return processed_articles

# Process the articles
article_list = process_articles(trending_articles)
print(f"Processed {len(article_list)} articles successfully")

Processed 5 articles successfully


In [53]:
# Display processed articles for verification
for i, article in enumerate(article_list, 1):
    print(f"{i}. {article['title']} - {article['view_count']:,} views")

1. Ed Gein - 1,950,675 views
2. Google Chrome - 424,009 views
3. Monster: The Ed Gein Story - 327,387 views
4. Kantara: Chapter 1 - 317,950 views
5. Ilse Koch - 248,784 views
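
To make the chained `.get` calls in `process_articles` easier to follow, here is a hand-written sample entry limited to the fields the notebook reads. It is an illustration, not a captured API response; real feed entries carry many more keys.

```python
# Hand-written sample limited to the fields process_articles reads.
sample_entry = {
    "titles": {"normalized": "Ed Gein"},
    "views": 1950675,
    "content_urls": {"desktop": {"page": "https://en.wikipedia.org/wiki/Ed_Gein"}},
    # "thumbnail" is deliberately omitted to show the fallback behavior.
}

# Each .get supplies an empty dict so the next lookup never hits a missing key.
title = sample_entry.get("titles", {}).get("normalized", "Unknown Title")
thumbnail = sample_entry.get("thumbnail", {}).get("source")

print(title)      # Ed Gein
print(thumbnail)  # None
```

The missing thumbnail comes back as `None`, which is what the display step later checks for before falling back to a placeholder.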


# 🗞️ Step 2: fetch related news 🗞️

In [60]:
def fetch_news_for_article(title: str) -> Dict:
    """Fetch recent news articles related to a Wikipedia article title.
    
    Args:
        title: The Wikipedia article title to search for
        
    Returns:
        Dictionary containing news search results from Serper API
    """
    payload = {
        "q": title,
        "autocorrect": False,
        "tbs": "qdr:w"  # Last week
    }
    
    headers = {
        "X-API-KEY": SERPER_API_KEY,
        "Content-Type": "application/json"
    }
    
    try:
        response = requests.post(
            SERPER_NEWS_URL, 
            headers=headers, 
            json=payload,
            timeout=10
        )
        response.raise_for_status()
        return response.json()
    except requests.RequestException as e:
        print(f"Error fetching news for '{title}': {e}")
        return {"news": []}
    except ValueError as e:
        print(f"Error parsing news response for '{title}': {e}")
        return {"news": []}

# Fetch news for each article
print("Fetching news data for articles...")
for i, article in enumerate(article_list, 1):
    print(f"🐶 Fetching news for article {i}/{len(article_list)}: {article['title']}")
    article["raw_news_results"] = fetch_news_for_article(article["title"])

print("News data collection complete!")
  

Fetching news data for articles...
🐶 Fetching news for article 1/5: Ed Gein
🐶 Fetching news for article 2/5: Google Chrome
🐶 Fetching news for article 3/5: Monster: The Ed Gein Story
🐶 Fetching news for article 4/5: Kantara: Chapter 1
🐶 Fetching news for article 5/5: Ilse Koch
News data collection complete!


In [59]:
# Verify news data was collected
for i, article in enumerate(article_list, 1):
    news_count = len(article["raw_news_results"].get("news", []))
    print(f"{i}. {article['title']}: {news_count} news articles found")

1. Ed Gein: 10 news articles found
2. Google Chrome: 10 news articles found
3. Monster: The Ed Gein Story: 10 news articles found
4. Kantara: Chapter 1: 10 news articles found
5. Ilse Koch: 10 news articles found
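
Each result set is later serialized in full into the model prompt. If prompt size becomes a concern, one option is to keep only a few text fields per news item. The sketch below assumes Serper news items use the keys `title`, `snippet`, `date`, and `source`; check those names against an actual response. `slim_news` is a hypothetical helper.

```python
from typing import Dict, List

# Assumed field names; verify them against an actual Serper response.
KEEP_FIELDS = ("title", "snippet", "date", "source")


def slim_news(news_items: List[Dict], limit: int = 5) -> List[Dict]:
    """Keep only a few text fields from the first `limit` news items."""
    return [
        {key: item[key] for key in KEEP_FIELDS if key in item}
        for item in news_items[:limit]
    ]
```

It could be applied just before analysis, for example `slim_news(article["raw_news_results"].get("news", []))`.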


# 🤖 Step 3: summarize with gpt-5-mini 🤖

In [None]:
# Initialize OpenAI client (already imported above)
openai_client = OpenAI(api_key=OPENAI_API_KEY)

def analyze_trending_reason(title: str, news_articles: List[Dict]) -> str:
    """Analyze why a Wikipedia article is trending based on recent news.
    
    Args:
        title: The Wikipedia article title
        news_articles: List of recent news articles related to the title
        
    Returns:
        AI-generated explanation of why the article is trending
    """
    if not news_articles:
        return "No recent news found to explain trending status"
    
    # Create a concise prompt for the AI
    prompt = f"""You are a concise news analysis assistant. If you see the opportunity for whimsy, you take it. 

Analyze the provided news articles to determine why the Wikipedia article "{title}" is receiving attention. 

Provide a brief explanation in no more than 2–3 short sentences. If no clear reason can be identified, output "Unclear why this article is trending". 

Do not start your answer with phrases like "Here is the reason why..." or "The article is trending because...". Avoid words like "spotlight" and "widespread".

Recent news articles:
{json.dumps(news_articles, indent=2)}"""
    
    try:
        response = openai_client.chat.completions.create(
            model="gpt-5-mini",
            messages=[
                {
                    "role": "user",
                    "content": prompt
                }
            ]
        )
        # message.content can be None, so fall back to an empty string before stripping
        return (response.choices[0].message.content or "").strip()
    except Exception as e:
        print(f"Error analyzing trends for '{title}': {e}")
        return "Error occurred during analysis"

# Analyze trending reasons for each article
print("Analyzing trending reasons with AI...")
for i, article in enumerate(article_list, 1):
    title = article["title"]
    news_articles = article["raw_news_results"].get("news", [])
    
    print(f"🔬 Analyzing article {i}/{len(article_list)}: {title}")
    
    trending_reason = analyze_trending_reason(title, news_articles)
    article["trending_reason"] = trending_reason
    
    print(f"📋 Analysis: {trending_reason}")
    print("-" * 50)

print("✅ AI analysis complete!")

Analyzing trending reasons with AI...
Analyzing article 1/5: Ed Gein
Analysis: Netflix's new series Monster: The Ed Gein Story — starring Charlie Hunnam and produced by Ryan Murphy — has reignited interest in Gein as critics and viewers scrutinize its portrayal. Numerous pieces are fact‑checking the show, revisiting his crimes, and debating the series' accuracy and ethics.  
True‑crime curiosity + a high‑profile streaming release = a bump in Wikipedia visits.
--------------------------------------------------
Analyzing article 2/5: Google Chrome
Analysis: Google Chrome is being dug up because Google rushed urgent security patches for an actively exploited zero‑day and other critical RCE flaws while proof‑of‑concept exploit code appeared online. A flurry of related trouble — a recent update reportedly breaking the browser, Gmail access problems, and visible Android feature changes (AI/desktop mode) — pushed people to look up Chrome.
--------------------------------------------------
Ana
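
`analyze_trending_reason` returns an error string as soon as a single call fails. For transient failures, a retry with exponential backoff may help. The following is a generic sketch, not tied to any particular client; `with_retries` and its parameters are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 1.0) -> T:
    """Call `fn`, retrying on any exception with exponential backoff.

    Hypothetical helper; re-raises the last exception once attempts run out.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            time.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

Inside `analyze_trending_reason`, the API call could be wrapped as `with_retries(lambda: openai_client.chat.completions.create(...))`, with the existing arguments in place of the ellipsis.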

# 📰 Step 4: Publish the results 📰

In [58]:
# Display the articles with thumbnails in a nice list format
from IPython.display import display, HTML

def create_article_html(articles):
    html = """
    <style>
    .article-list {
        font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
        max-width: 800px;
        margin: 20px 0;
    }
    .article-item {
        display: flex;
        align-items: flex-start;
        padding: 15px;
        margin-bottom: 15px;
        border: 1px solid #e1e5e9;
        border-radius: 8px;
        background: #f8f9fa;
    }
    .article-thumbnail {
        width: 80px;
        height: 80px;
        object-fit: cover;
        border-radius: 6px;
        margin-right: 15px;
        flex-shrink: 0;
    }
    .no-thumbnail {
        width: 80px;
        height: 80px;
        background: #dee2e6;
        border-radius: 6px;
        margin-right: 15px;
        display: flex;
        align-items: center;
        justify-content: center;
        color: #6c757d;
        font-size: 12px;
        text-align: center;
        flex-shrink: 0;
    }
    .article-content {
        flex: 1;
    }
    .article-title {
        font-size: 18px;
        font-weight: 600;
        color: #1a1a1a;
        margin-bottom: 5px;
        text-decoration: none;
    }
    .article-title:hover {
        color: #0066cc;
    }
    .article-views {
        color: #666;
        font-size: 14px;
        margin-bottom: 8px;
    }
    .article-reason {
        color: #444;
        font-size: 14px;
        line-height: 1.4;
    }
    </style>
    <div class="article-list">
        <h2>📈 Trending Wikipedia Articles</h2>
    """
    
    for article in articles:
        title = article["title"]
        # Single quotes inside the f-string keep this valid on Python versions before 3.12
        views = f"{article['view_count']:,}"
        link = article["link"]
        thumbnail = article.get("thumbnail")
        reason = article.get("trending_reason", "Analysis pending...")
        
        # Handle thumbnail display
        if thumbnail:
            img_html = f'<img src="{thumbnail}" alt="{title}" class="article-thumbnail">'
        else:
            img_html = '<div class="no-thumbnail">No Image</div>'
        
        html += f"""
        <div class="article-item">
            {img_html}
            <div class="article-content">
                <a href="{link}" target="_blank" class="article-title">{title}</a>
                <div class="article-views">👁️ {views} views</div>
                <div class="article-reason">{reason}</div>
            </div>
        </div>
        """
    
    html += "</div>"
    return html

# Display the articles
article_html = create_article_html(article_list)
display(HTML(article_html))
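
The template above interpolates the title, link, and model-written reason straight into HTML, so a stray `<` or `&` in any of them could break the markup. Below is a minimal sketch of escaping those fields first with the standard library's `html.escape`. `safe_fields` is a hypothetical helper, not part of the original notebook.

```python
# Imported by name because create_article_html uses `html` as a local variable.
from html import escape


def safe_fields(article: dict) -> dict:
    """Return HTML-escaped copies of the text fields used in the template.

    Hypothetical helper, not part of the original notebook.
    """
    return {
        "title": escape(article.get("title", "")),
        "link": escape(article.get("link", ""), quote=True),
        "reason": escape(article.get("trending_reason", "")),
    }
```

Inside the loop, `create_article_html` could call `safe_fields(article)` and format the escaped values in place of the raw ones.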