# Obsidian News Digest - Prototype

In this notebook, we'll build an automated news digest tool step by step. You'll implement each component with guidance, run the code to see results, and gradually build a complete working system.

Our tool will:
- Fetch news articles from popular sources
- Summarize articles using LangChain and a language model
- Format summaries in Markdown
- Save the digest to an Obsidian vault

In [44]:
# STEP 1: CONFIGURATIONS

# Import necessary libraries
import os
import time
import random
from datetime import datetime
from typing import List, Dict, Any
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Check if required environment variables are set
api_key = os.getenv("OPENAI_API_KEY")
vault_path = os.getenv("OBSIDIAN_VAULT_PATH")

if not api_key:
    print("⚠️ OPENAI_API_KEY not found! Make sure to set it in your .env file.")

if not vault_path:
    print("⚠️ OBSIDIAN_VAULT_PATH not found! Falling back to ./output.")
    vault_path = "./output"  # Default output folder

    
# Configuration variables
news_sources = [
    "https://www.apnews.com/",
    "https://www.c-span.org/"
]
max_articles = 10  # Maximum number of articles in the digest
output_folder = "Daily_news"  # Folder within Obsidian vault

print("✅ Configuration loaded")
print(f"📁 Output path: {os.path.join(vault_path, output_folder)}")
print(f"📰 News sources: {len(news_sources)} sources configured")
print(f"📊 Max articles: {max_articles}")

✅ Configuration loaded
📁 Output path: D:\Obsidian_VauLTs\My_Daily_newS__\Daily_news
📰 News sources: 2 sources configured
📊 Max articles: 10
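
The environment checks in Step 1 can also be written as a small pure function, which makes the fallback behaviour easy to test without touching the real environment. This is only a sketch: `resolve_settings` and its return shape are hypothetical names introduced here, and a plain dictionary stands in for `os.environ`.

```python
from typing import List, Mapping, Optional, Tuple


def resolve_settings(
    env: Mapping[str, str],
    default_vault: str = "./output",
) -> Tuple[Optional[str], str, List[str]]:
    """Return (api_key, vault_path, warnings) from an environment-like mapping.

    Hypothetical helper: mirrors the Step 1 checks but collects warnings
    instead of printing them.
    """
    warnings: List[str] = []

    # Treat a missing or empty value as "not set"
    api_key = env.get("OPENAI_API_KEY") or None
    if api_key is None:
        warnings.append("OPENAI_API_KEY not found")

    vault_path = env.get("OBSIDIAN_VAULT_PATH") or ""
    if not vault_path:
        warnings.append("OBSIDIAN_VAULT_PATH not found, using default")
        vault_path = default_vault

    return api_key, vault_path, warnings


# Usage with an empty mapping: both warnings are reported
key, vault, notes = resolve_settings({})
print(vault, notes)
```

In the notebook itself, `resolve_settings(os.environ)` would be called after `load_dotenv()`.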


In [45]:
# STEP 2: FETCHING NEWS

# Import newspaper3k library
from newspaper import Article, build

def fetch_news(source_urls: List[str], max_articles_per_source: int = 5) -> List[Dict[str, Any]]:
    """
    Fetch news articles from multiple sources using newspaper3k.
    
    Args:
        source_urls: List of news source URLs to fetch from
        max_articles_per_source: Maximum number of articles to fetch per source
        
    Returns:
        List of dictionaries containing article information
    """
    all_articles = []
    
    for url in source_urls:
        try:
            print(f"Fetching from {url}...")
            
            # Build newspaper from source URL - this analyzes the site to find articles
            paper = build(url)
            
            # Get all article URLs from the source
            article_urls = paper.article_urls()
            print(f"Found {len(article_urls)} article links")
            
            # Take the first links found; news sites usually list headlines first in their HTML.
            # Sample twice the quota so that short or failed articles can be skipped.
            sampled_urls = list(article_urls)[:max_articles_per_source * 2]
            
            source_articles = []
            for article_url in sampled_urls:
                try:
                    # Create an article object
                    article = Article(article_url)
                    
                    # Download the article content
                    article.download()
                    time.sleep(1)  # Pause briefly to be polite to the server
                    
                    # Parse the article to extract content
                    article.parse()
                    
                    # Skip articles with minimal content (likely not full articles)
                    if not article.text or len(article.text) < 100:
                        continue
                        
                    # Add article information to our collection
                    source_articles.append({
                        "title": article.title,
                        "url": article.url,
                        "text": article.text[:2000],  # Limit text length for LLM (saves tokens)
                        "published_date": article.publish_date,
                        "source": url
                    })
                    
                    print(f"✓ Downloaded: {article.title[:50]}...")
                    
                    # Stop once we have enough articles from this source
                    if len(source_articles) >= max_articles_per_source:
                        break
                        
                except Exception as e:
                    print(f"Error processing article {article_url}: {e}")
                    continue
            
            # Add articles from this source to our master list
            all_articles.extend(source_articles)
            print(f"Got {len(source_articles)} articles from {url}")
            
        except Exception as e:
            print(f"Error processing source {url}: {e}")
            continue
    
    return all_articles
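
`fetch_news` limits each article to its first 2000 characters, which can cut the text in the middle of a word. One possible refinement is to trim back to the last whitespace before the limit. The helper below, `truncate_at_word`, is a hypothetical name and not part of newspaper3k; it is a minimal sketch of that idea.

```python
def truncate_at_word(text: str, limit: int = 2000) -> str:
    """Shorten text to at most `limit` characters without splitting a word.

    Hypothetical helper: falls back to a hard cut when the truncated
    text contains no whitespace at all.
    """
    if limit <= 0:
        return ""
    if len(text) <= limit:
        return text

    cut = text[:limit]

    # The cut already falls between two words: only strip trailing whitespace
    if cut[-1].isspace() or text[limit].isspace():
        return cut.rstrip()

    # Otherwise drop the partial last word
    parts = cut.rsplit(None, 1)
    return parts[0] if len(parts) == 2 else cut


# Usage: the partial word "f" is dropped
print(truncate_at_word("hello world foo", 13))
```

Inside `fetch_news`, this would replace the slice `article.text[:2000]` with `truncate_at_word(article.text, 2000)`.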

In [27]:
# Test the news fetcher with a single source and limited articles
test_source = [news_sources[0]]  # Just use the first configured source
print("🔍 Testing news fetcher...")
test_articles = fetch_news(test_source, max_articles_per_source=1)

# Print information about the retrieved articles
print(f"\nRetrieved {len(test_articles)} articles")
if test_articles:
    article = test_articles[0]
    print(f"\nSAMPLE ARTICLE:")
    print(f"Title: {article['title']}")
    print(f"URL: {article['url']}")
    print(f"Text length: {len(article['text'])} characters")
    print(f"Text preview: {article['text'][:200]}...")
else:
    print("No articles found.")

🔍 Testing news fetcher...
Fetching from https://www.naftemporiki.gr/...
Found 2 article links
✓ Downloaded: Η Ελλάδα αναλαμβάνει την Προεδρία του Συμβουλίου Α...
Got 1 articles from https://www.naftemporiki.gr/

Retrieved 1 articles

SAMPLE ARTICLE:
Title: Η Ελλάδα αναλαμβάνει την Προεδρία του Συμβουλίου Ασφαλείας του ΟΗΕ
URL: https://www.naftemporiki.gr/politics/1950756/i-ellada-analamvanei-tin-proedria-toy-symvoylioy-asfaleias-toy-oie/
Text length: 692 characters
Text preview: Την Προεδρία του Συμβουλίου Ασφαλείας του Οργανισμού Ηνωμένων Εθνών (ΟΗΕ) αναλαμβάνει από αύριο, 1η Μαΐου 2025, η Ελλάδα και αναμένεται να διαρκέσει ένα μήνα.

Σύμφωνα με πληροφορίες της «Ν», στις 20 ...


In [46]:
# STEP 3: SUMMARIZING NEWS

# Import LangChain components
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

def summarize_and_format_articles(articles: List[Dict[str, Any]]) -> str:
    """
    Summarize news articles and format them as a complete markdown digest.
    
    Args:
        articles: List of article dictionaries with title, text, etc.
        
    Returns:
        Formatted markdown string of the complete digest
    """
    # Handle the case with no articles before creating the model client
    if not articles:
        return "No major news today."

    # Initialize the OpenAI chat model
    llm = ChatOpenAI(
        api_key=os.getenv("OPENAI_API_KEY"),
        model="gpt-4.1-nano-2025-04-14",
        temperature=0.2
    )
    
    # Process each article
    article_summaries = []
    for i, article in enumerate(articles):
        try:
            print(f"Summarizing article {i+1}/{len(articles)}...")
            
            # Extract source domain for display
            source_url = article['source']
            source_domain = source_url.split('//')[1].split('/')[0].replace('www.', '')
            
            # Create prompt template for article summarization with formatting
            prompt = ChatPromptTemplate.from_template(
                """
                You are a Veteran News Journalist.
                Summarize this news article in 5 sentences. 
                Focus on the main facts and key details.
                
                Title: {title}
                
                Article: {text}
                
                Format your response in this exact format:
                
                ## {title}
                
                [Your 5 sentence summary here]
                
                *Source: {source_domain}*
                
                [Read more ↗]({url})
                
                ---
                """
            )
            
            # Create and invoke the chain (prompt -> LLM)
            chain = prompt | llm
            response = chain.invoke({
                "title": article["title"], 
                "text": article["text"],
                "source_domain": source_domain,
                "url": article["url"]
            })
            
            # Add the formatted summary to our list
            article_summaries.append(response.content)
            print(f"✓ Summarized: {article['title'][:50]}...")
            
        except Exception as e:
            print(f"Error summarizing article '{article['title']}': {e}")
            # Add a placeholder for failed articles
            article_summaries.append(f"## {article['title']}\n\nSummary unavailable.\n\n---\n")
    
    # Combine everything - note that we're NOT adding the title header
    # The filename will serve as the title in Obsidian
    digest = "\n".join(article_summaries)
    
    return digest
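
The summarizer derives `source_domain` by splitting the URL string and removing every occurrence of `www.`. A possible alternative is the standard library's `urllib.parse.urlparse`, which copes with ports, missing schemes, and hosts that contain `www.` somewhere other than the start. `extract_domain` is a hypothetical helper name used only for this sketch.

```python
from urllib.parse import urlparse


def extract_domain(url: str) -> str:
    """Return the host of a URL without a leading 'www.' (empty string if none).

    Hypothetical helper: a stand-in for the split/replace expression
    used in summarize_and_format_articles.
    """
    # hostname is None when the string has no network location
    host = urlparse(url).hostname or ""

    # Strip only a leading "www.", not occurrences inside the host
    return host[4:] if host.startswith("www.") else host


# Usage with the two sources configured in Step 1
for source in ["https://www.apnews.com/", "https://www.c-span.org/"]:
    print(extract_domain(source))
```

Returning an empty string for malformed input avoids the `IndexError` that the split-based expression raises when the URL contains no `//`.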

In [29]:
# Test with one of the articles we just fetched
if test_articles and api_key:
    print("Testing summarization...")
    digest = summarize_and_format_articles([test_articles[0]])
    print("\nGenerated digest:")
    print("-------------------")
    print(digest)
else:
    print("Cannot test summarization: No articles or API key")

Testing summarization...
Summarizing article 1/1...
✓ Summarized: Η Ελλάδα αναλαμβάνει την Προεδρία του Συμβουλίου Α...

Generated digest:
-------------------
## Η Ελλάδα αναλαμβάνει την Προεδρία του Συμβουλίου Ασφαλείας του ΟΗΕ

Από αύριο, 1η Μαΐου 2025, η Ελλάδα θα αναλάβει την Προεδρία του Συμβουλίου Ασφαλείας του ΟΗΕ, με αναμενόμενη διάρκεια ένα μήνα. Στις 20 Μαΐου 2025, θα πραγματοποιηθεί στις Ηνωμένες Πολιτείες Αμερικής κεντρική εκδήλωση της Ελληνικής Προεδρίας, με την παρουσία του Πρωθυπουργού Κυριάκου Μητσοτάκη. Η κεντρική εκδήλωση θα είναι η Συνεδρίαση του ΟΗΕ με θέμα την θαλάσσια ασφάλεια. Ο Πρωθυπουργός θα απευθύνει ομιλία και θα προεδρεύσει της Συνεδρίασης του Συμβουλίου Ασφαλείας του ΟΗΕ στη Νέα Υόρκη.


In [47]:
# STEP 4: PUBLISHING TO OBSIDIAN

def publish_to_obsidian(content: str) -> str:
    """
    Publish the formatted digest to Obsidian vault.
    
    Args:
        content: Formatted markdown content
        
    Returns:
        Path to the created file
    """
    # Get today's date for the filename
    today = datetime.now().strftime("%d %b %Y")
    
    # Create path to the output folder in the vault
    folder_path = os.path.join(vault_path, output_folder)
    
    # Create folder if it doesn't exist
    os.makedirs(folder_path, exist_ok=True)
    
    # Create file path with a clear descriptive name
    file_name = f"Global News Digest – {today}.md"
    file_path = os.path.join(folder_path, file_name)
    
    # Write content to file
    with open(file_path, "w", encoding="utf-8") as f:
        f.write(content)
    
    return file_path
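
`publish_to_obsidian` reads the vault path, the folder name, and today's date from global state, so it always writes into the real vault. For testing, the same logic can take those values as parameters and write into a temporary directory. The names `digest_path` and `write_digest` are hypothetical; the filename pattern is the one used above.

```python
import os
import tempfile
from datetime import date


def digest_path(base_dir: str, folder: str, day: date) -> str:
    """Build the digest file path for a given day (hypothetical helper)."""
    # Note: %b is locale-dependent; "Apr" assumes an English/C locale
    file_name = f"Global News Digest – {day.strftime('%d %b %Y')}.md"
    return os.path.join(base_dir, folder, file_name)


def write_digest(content: str, base_dir: str, folder: str, day: date) -> str:
    """Write the digest and return its path (hypothetical helper)."""
    path = digest_path(base_dir, folder, day)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return path


# Usage: write into a throwaway directory instead of the real vault
with tempfile.TemporaryDirectory() as tmp:
    saved = write_digest("## Test\n", tmp, "Daily_news", date(2025, 4, 30))
    print(os.path.basename(saved))
```

Because the file is opened in `"w"` mode, running the pipeline twice on the same day overwrites the earlier digest; this matches the behaviour of `publish_to_obsidian`.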


In [48]:
# Application

def create_news_digest(sources=None, max_articles_count=None):
    """
    Execute the complete news digest workflow
    """
    # Use provided parameters or defaults
    if sources is None:
        sources = news_sources
        
    if max_articles_count is None:
        max_articles_count = max_articles
    
    try:
        # Step 1: Fetch news articles (split the budget evenly, at least 1 per source)
        print(f"📰 Fetching news from {len(sources)} sources...")
        per_source = max(1, max_articles_count // max(1, len(sources)))
        articles = fetch_news(sources, max_articles_per_source=per_source)
        print(f"Retrieved {len(articles)} articles.")
        
        # Take top articles if we have more than max_articles_count
        selected_articles = articles[:max_articles_count]
        print(f"Selected {len(selected_articles)} articles for summarization.")
        
        # Step 2: Summarize and format articles
        print("🔍 Summarizing and formatting articles...")
        digest = summarize_and_format_articles(selected_articles)
        
        # Step 3: Publish to Obsidian
        print("💾 Publishing to Obsidian...")
        file_path = publish_to_obsidian(digest)
        
        print(f"✅ News digest published successfully to: {file_path}")
        return file_path
        
    except Exception as e:
        print(f"❌ Error in news digest pipeline: {e}")
        return None
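
Integer division drops any remainder when the article budget is split across sources: 10 articles over 3 sources yields 3 each, 9 in total. One way to use the full budget is to hand the leftover articles to the first sources. `split_quota` below is a hypothetical helper, and using it would mean calling `fetch_news` once per source with that source's quota, which the notebook does not currently do.

```python
from typing import List


def split_quota(total: int, n_sources: int) -> List[int]:
    """Split `total` articles across `n_sources`, spreading the remainder.

    Hypothetical helper: earlier sources receive one extra article
    until the remainder is used up.
    """
    if n_sources <= 0:
        return []
    if total <= 0:
        return [0] * n_sources

    base, extra = divmod(total, n_sources)
    return [base + 1 if i < extra else base for i in range(n_sources)]


# Usage: 10 articles over 3 sources
print(split_quota(10, 3))
```

This does not help when a source returns no article links at all, as happens for one source in the full run; redistributing an unused quota to the other sources would need a second pass.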

In [36]:
# Run the workflow with just one source and a small number of articles
test_workflow_result = create_news_digest(
    sources=[news_sources[0]],  # Just use the first configured source for testing
    max_articles_count=2  # Limit to 2 articles for quick testing
)

if test_workflow_result:
    print(f"\n🎉 Success! Your news digest is ready at: {test_workflow_result}")
    print("Check your Obsidian vault to see the complete digest.")
else:
    print("\n❌ Workflow failed. Check the error messages above.")

📰 Fetching news from 1 sources...
Fetching from https://www.naftemporiki.gr/...
Found 0 article links
Got 0 articles from https://www.naftemporiki.gr/
Retrieved 0 articles.
Selected 0 articles for summarization.
🔍 Summarizing and formatting articles...
💾 Publishing to Obsidian...
✅ News digest published successfully to: D:\Obsidian_VauLTs\My_Daily_newS__\Daily_news\Global News Digest – 30 Apr 2025.md

🎉 Success! Your news digest is ready at: D:\Obsidian_VauLTs\My_Daily_newS__\Daily_news\Global News Digest – 30 Apr 2025.md
Check your Obsidian vault to see the complete digest.


In [49]:
# Run the complete workflow with all configured sources and max articles
full_workflow_result = create_news_digest()

if full_workflow_result:
    print(f"\n🎉 Success! Your complete news digest is ready at: {full_workflow_result}")
    print("Check your Obsidian vault to see the full digest.")
else:
    print("\n❌ Complete workflow failed. Check the error messages above.")

📰 Fetching news from 2 sources...
Fetching from https://www.apnews.com/...
Found 148 article links
✓ Downloaded: Kremlin says a deal to end the war with Ukraine ca...
✓ Downloaded: Vietnam celebrates 50 years since war’s end with f...
✓ Downloaded: Middle East latest: At least 12 killed overnight b...
✓ Downloaded: Immigrants working legally in the Texas Panhandle ...
✓ Downloaded: Takeaways from AP’s report on how Trump’s immigrat...
Got 5 articles from https://www.apnews.com/
Fetching from https://www.c-span.org/...
Found 0 article links
Got 0 articles from https://www.c-span.org/
Retrieved 5 articles.
Selected 5 articles for summarization.
🔍 Summarizing and formatting articles...
Summarizing article 1/5...
✓ Summarized: Kremlin says a deal to end the war with Ukraine ca...
Summarizing article 2/5...
✓ Summarized: Vietnam celebrates 50 years since war’s end with f...
Summarizing article 3/5...
✓ Summarized: Middle East latest: At least 12 killed overnight b...
Summarizing article 4/5