## Notebook 06: Automated Weekly Digest

### Purpose
Build an automated system that discovers and analyzes AI research papers weekly without manual intervention. This transforms Radar from an on-demand tool into a continuous intelligence platform.

### What We'll Do

| Step | Task | Output |
|------|------|--------|
| 1 | **Build Digest Generator** | Python script that fetches and processes papers |
| 2 | **Format Markdown Report** | Scannable weekly digest in Markdown |
| 3 | **Test Locally** | Generate sample digest for validation |
| 4 | **Create GitHub Action** | Automated workflow definition |
| 5 | **Configure Secrets** | Set up API keys for automation |
| 6 | **Deploy and Schedule** | Set weekly trigger (every Monday) |

### Key Questions to Answer
- How do we select the most interesting papers from 50-200 weekly results?
- What format makes the digest most useful and readable?
- How do we handle API rate limits and costs in automation?
- Where should the digest be stored (GitHub, email, website)?
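
On the rate-limit question: a scheduled job has nobody watching it, so nobody will rerun a call that fails transiently, and those failures are worth retrying automatically. Below is a minimal sketch of exponential backoff with jitter. It treats every exception as retryable, which is a simplification; a production version would retry only rate-limit and server errors. `with_backoff` is a hypothetical helper and is not part of LangChain or the `arxiv` package.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 2.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on any exception with exponential backoff and full jitter.

    Hypothetical helper: every exception is treated as retryable.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay ceiling doubles each attempt (2s, 4s, 8s, ...), capped at max_delay
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: sleep a random time between 0 and the ceiling
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable: the loop always returns or raises")


# Illustrative usage with the notebook's llm object:
# response = with_backoff(lambda: llm.invoke(prompt))
```

Both libraries also have their own retry settings (`max_retries` on `ChatAnthropic`, `num_retries` on `arxiv.Client`). These may be enough without a custom wrapper.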

### Expected Outcomes
- Python script generating weekly markdown reports
- GitHub Actions workflow running every Monday at 9am (scheduled workflow cron times are in UTC, so convert from your local time)
- Automatic commits of digest reports to repository
- Fully autonomous research monitoring with zero manual intervention
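
For the report-writing outcome, here is a minimal sketch of rendering and saving the digest. It assumes each processed paper is a dict with the metadata keys returned by `fetch_weekly_papers` plus the simplifier's `executive_summary`. `render_digest`, `write_digest`, the section layout, and the `digest-YYYY-MM-DD.md` file name are all hypothetical choices.

```python
from datetime import date
from pathlib import Path


def render_digest(papers: list, week_of: date) -> str:
    """Render processed papers as a Markdown digest (hypothetical layout)."""
    lines = [f"# AI Research Digest: Week of {week_of.isoformat()}", ""]
    lines += [f"{len(papers)} papers selected this week.", ""]
    for i, p in enumerate(papers, 1):
        # Show at most three authors to keep each entry scannable
        authors = ", ".join(p["authors"][:3])
        if len(p["authors"]) > 3:
            authors += " et al."
        lines += [
            f"## {i}. {p['title']}",
            "",
            f"**Authors:** {authors}  ",
            f"**Published:** {p['published']} | [arXiv:{p['arxiv_id']}]({p['pdf_url']})",
            "",
            p.get("executive_summary", "_Summary unavailable._"),
            "",
        ]
    return "\n".join(lines)


def write_digest(markdown: str, week_of: date, reports_dir: Path) -> Path:
    """Write the digest to reports_dir/digest-YYYY-MM-DD.md and return the path."""
    reports_dir.mkdir(parents=True, exist_ok=True)
    path = reports_dir / f"digest-{week_of.isoformat()}.md"
    path.write_text(markdown, encoding="utf-8")
    return path
```

Because the file name includes the date, each weekly commit adds a new report instead of overwriting the previous one.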

### Automation Architecture
```
GitHub Actions Trigger (Weekly)
    ↓
Fetch Papers from ArXiv (last 7 days)
    ↓
Filter and Rank by Relevance
    ↓
Process Top 5-10 Papers Through Agents
    ↓
Generate Markdown Report
    ↓
Commit to reports/ Folder
    ↓
(Optional) Send Email Notification
```
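
The notebook does not specify how "Filter and Rank by Relevance" is scored. One cheap option, sketched here as an assumption, is keyword matching over title and abstract, with title hits weighted higher. This runs before any LLM call, so it costs nothing. `score_paper`, `rank_papers`, the title weight, and the keyword list are hypothetical.

```python
import re


def score_paper(paper: dict, keywords: list, title_weight: int = 3) -> int:
    """Count keyword hits; a hit in the title counts title_weight times."""
    title = paper["title"].lower()
    abstract = paper["abstract"].lower()
    score = 0
    for kw in keywords:
        # Word-boundary match so "agent" does not match inside "reagent"
        pattern = r"\b" + re.escape(kw.lower()) + r"\b"
        score += title_weight * len(re.findall(pattern, title))
        score += len(re.findall(pattern, abstract))
    return score


def rank_papers(papers: list, keywords: list, top_k: int = 10) -> list:
    """Return up to top_k papers with a positive score, best first."""
    scored = [(score_paper(p, keywords), p) for p in papers]
    scored = [(s, p) for s, p in scored if s > 0]
    # list.sort stays stable with reverse=True, so ties keep the newest-first fetch order
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return [p for _, p in scored[:top_k]]
```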

### Design Philosophy

**Efficiency over completeness:** Process 5-10 most relevant papers deeply rather than 100 papers superficially. Focus on quality insights that save time, not overwhelming information dumps.
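
The paper count can be derived from a weekly spending cap, which makes "efficiency over completeness" concrete. The sketch below is a back-of-the-envelope estimate. It assumes two LLM calls per paper (analyzer plus simplifier, as in the agent pipeline) and per-call token counts that you estimate yourself. Prices are parameters because they change; take them from Anthropic's current pricing page. `estimate_paper_cost` and `papers_within_budget` are hypothetical helpers.

```python
import math


def estimate_paper_cost(
    input_tokens_per_call: int,
    output_tokens_per_call: int,
    usd_per_m_input: float,
    usd_per_m_output: float,
    calls_per_paper: int = 2,  # analyzer + simplifier
) -> float:
    """Estimated USD cost to run one paper through the agent pipeline."""
    per_call = (input_tokens_per_call * usd_per_m_input
                + output_tokens_per_call * usd_per_m_output) / 1_000_000
    return calls_per_paper * per_call


def papers_within_budget(budget_usd: float, cost_per_paper: float, cap: int = 10) -> int:
    """How many papers fit in the weekly budget, never exceeding cap."""
    if cost_per_paper <= 0:
        return cap
    return min(cap, math.floor(budget_usd / cost_per_paper))
```

Take some illustrative figures: 3,000 input and 1,000 output tokens per call, priced at $3 and $15 per million tokens. At those figures one paper costs about $0.048, so a $0.30 weekly budget covers 6 papers.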

---



In [2]:
# Cell 2: Imports and Setup

"""
Import libraries for automated digest generation.
Reuses agent pipeline from previous notebooks.
"""

import json
import os
from datetime import datetime, timedelta
from pathlib import Path

# ArXiv search and PDF text extraction (PyMuPDF is imported as fitz)
import arxiv
import fitz

# Agent pipeline
from langgraph.graph import StateGraph, END
from langchain_anthropic import ChatAnthropic
from typing import TypedDict

# Environment
from dotenv import load_dotenv
load_dotenv()

# Initialize Claude
api_key = os.getenv('ANTHROPIC_API_KEY')
if not api_key:
    # Locally this comes from .env; in GitHub Actions, expose the repository secret as an env var
    raise ValueError("ANTHROPIC_API_KEY not found")

llm = ChatAnthropic(
    model="claude-sonnet-4-20250514",
    temperature=0.7,
    max_tokens=4096
)

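# Create reports directory. The default path is relative to the current working
# directory; REPORTS_DIR is a hypothetical override for runs from another location (e.g. CI)
REPORTS_DIR = Path(os.getenv("REPORTS_DIR", "../reports"))
REPORTS_DIR.mkdir(parents=True, exist_ok=True)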

print("Setup complete")
print(f"Reports directory: {REPORTS_DIR}")
print("Ready to build digest generator")

Setup complete
Reports directory: ..\reports
Ready to build digest generator


In [None]:
# Load Agent Pipeline

"""
Define agent state and functions for paper processing.
Same pipeline as Notebooks 03 and 05.
"""

class AgentState(TypedDict):
    paper_title: str
    paper_text: str
    paper_sections: dict
    technical_summary: str
    key_methods: str
    main_results: str
    limitations: str
    executive_summary: str
    key_innovation: str
    accessible_explanation: str
    technical_points: str
    processing_stage: str
    errors: list

def paper_analyzer_agent(state: AgentState) -> AgentState:
    prompt = f"""You are an expert AI researcher analyzing academic papers. 

Paper Title: {state['paper_title']}

Paper Content (excerpt):
{state['paper_text'][:8000]}

Extract these technical insights:

1. TECHNICAL SUMMARY (2-3 sentences): What problem does this solve and how?
2. KEY METHODS (bullet points): What techniques/approaches were used?
3. MAIN RESULTS (bullet points): Key findings or performance metrics?
4. LIMITATIONS (bullet points): Acknowledged limitations or future work?

Respond ONLY with valid JSON, no markdown:
{{
  "technical_summary": "...",
  "key_methods": ["...", "..."],
  "main_results": ["...", "..."],
  "limitations": ["...", "..."]
}}"""

    try:
        response = llm.invoke(prompt)
        content = response.content.strip()
        
        if content.startswith('```'):
            content = content.split('```')[1]
            if content.startswith('json'):
                content = content[4:]
        content = content.strip()
        
        result = json.loads(content)
        
        state['technical_summary'] = result['technical_summary']
        state['key_methods'] = '\n'.join(f"- {m}" for m in result['key_methods'])
        state['main_results'] = '\n'.join(f"- {r}" for r in result['main_results'])
        state['limitations'] = '\n'.join(f"- {l}" for l in result['limitations'])
        state['processing_stage'] = 'analyzed'
        
    except Exception as e:
        state['errors'].append(f"Analyzer error: {str(e)}")
        state['processing_stage'] = 'analyzer_failed'
    
    return state

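def simplifier_agent(state: AgentState) -> AgentState:
    # Skip if the analyzer failed: the fields this prompt reads may be missing
    if state.get('processing_stage') == 'analyzer_failed':
        return state

    prompt = f"""You are an expert science communicator making AI research accessible.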

Paper Title: {state['paper_title']}
Technical Summary: {state['technical_summary']}
Key Methods: {state['key_methods']}
Main Results: {state['main_results']}

Create a clear explanation:

1. TWO-LINE SUMMARY: What is this and why does it matter? Maximum 2 sentences.
2. THE CHALLENGE (3-4 bullet points): What problem exists?
3. WHAT THIS PAPER DOES (1 paragraph): Explain the approach in simple terms.
4. KEY TECHNICAL POINTS (3-5 bullet points): Break down technical aspects simply.

Respond ONLY with valid JSON, no markdown:
{{
  "two_line_summary": "...",
  "challenge_bullets": ["...", "...", "..."],
  "solution_overview": "...",
  "technical_points": ["...", "...", "..."]
}}"""

    try:
        response = llm.invoke(prompt)
        content = response.content.strip()
        
        if content.startswith('```'):
            content = content.split('```')[1]
            if content.startswith('json'):
                content = content[4:]
        content = content.strip()
        
        result = json.loads(content)
        
        state['executive_summary'] = result['two_line_summary']
        state['key_innovation'] = '\n'.join(f"- {c}" for c in result['challenge_bullets'])
        state['accessible_explanation'] = result['solution_overview']
        state['technical_points'] = '\n'.join(f"- {t}" for t in result['technical_points'])
        state['processing_stage'] = 'simplified'
        
    except Exception as e:
        state['errors'].append(f"Simplifier error: {str(e)}")
        state['processing_stage'] = 'simplifier_failed'
    
    return state

# Build workflow
workflow = StateGraph(AgentState)
workflow.add_node("analyzer", paper_analyzer_agent)
workflow.add_node("simplifier", simplifier_agent)
workflow.set_entry_point("analyzer")
workflow.add_edge("analyzer", "simplifier")
workflow.add_edge("simplifier", END)
app = workflow.compile()

print("Agent pipeline loaded")
print("Ready to process papers for digest")

Agent pipeline loaded
Ready to process papers for digest
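
Both agents repeat the same fence-stripping logic, and it only works when the reply *starts* with a Markdown code fence. The sketch below shows a shared helper that also copes with prose around the JSON. It assumes each reply contains a single JSON object. The `new_state` factory shows one way to fill in every `AgentState` field before calling `app.invoke`. Both agents append to `errors`, so that field in particular has to exist. `extract_json` and `new_state` are hypothetical, and so is the `"pending"` stage value.

```python
import json
import re
from typing import Optional

# Matches a Markdown code fence (three backticks, optional "json" tag) and captures its body
_FENCE_RE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def extract_json(text: str) -> dict:
    """Parse the first JSON object in an LLM reply (hypothetical helper)."""
    text = text.strip()
    match: Optional[re.Match] = _FENCE_RE.search(text)
    if match:
        text = match.group(1).strip()
    # Fall back to the outermost {...} span if prose surrounds the object
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    return json.loads(text[start:end + 1])


def new_state(title: str, text: str) -> dict:
    """Initial AgentState with every field present, so nodes never hit a KeyError."""
    text_fields = ["technical_summary", "key_methods", "main_results", "limitations",
                   "executive_summary", "key_innovation", "accessible_explanation",
                   "technical_points"]
    state = {"paper_title": title, "paper_text": text, "paper_sections": {},
             "processing_stage": "pending", "errors": []}
    state.update({field: "" for field in text_fields})
    return state


# Illustrative usage: result = app.invoke(new_state(paper["title"], pdf_text))
```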


In [4]:
# Fetch Papers from Last Week

"""
Search ArXiv for papers published in the last 7 days.
Filters by AI/ML categories and sorts by date.
"""

def fetch_weekly_papers(days_back=7, max_papers=50):
    """
    Fetch recent papers from ArXiv.
    
    Args:
        days_back (int): Number of days to look back
        max_papers (int): Maximum papers to fetch
    
    Returns:
        list: Paper metadata dictionaries
    """
    
    # Calculate date range
    end_date = datetime.now()
    start_date = end_date - timedelta(days=days_back)
    
    print(f"Fetching papers from {start_date.strftime('%Y-%m-%d')} to {end_date.strftime('%Y-%m-%d')}")
    
    # Search ArXiv
    categories = ["cs.AI", "cs.LG", "cs.CL", "cs.CV"]
    query = " OR ".join([f"cat:{cat}" for cat in categories])
    
    search = arxiv.Search(
        query=query,
        max_results=max_papers,
        sort_by=arxiv.SortCriterion.SubmittedDate,
        sort_order=arxiv.SortOrder.Descending
    )
    
    client = arxiv.Client()
    papers = []
    
    for paper in client.results(search):
        # Filter by date
        paper_date = paper.published.replace(tzinfo=None)
        if paper_date < start_date:
            continue
        
        papers.append({
            'arxiv_id': paper.entry_id.split('/')[-1],
            'title': paper.title,
            'authors': [author.name for author in paper.authors],
            'published': paper.published.strftime('%Y-%m-%d'),
            'categories': paper.categories,
            'abstract': paper.summary,
            'pdf_url': paper.pdf_url
        })
    
    print(f"Found {len(papers)} papers in date range")
    return papers

# Test the function
test_papers = fetch_weekly_papers(days_back=7, max_papers=20)

print(f"\nSample papers:")
for i, paper in enumerate(test_papers[:3], 1):
    print(f"{i}. {paper['title'][:60]}...")
    print(f"   Published: {paper['published']}")

Fetching papers from 2026-01-21 to 2026-01-28
Found 20 papers in date range

Sample papers:
1. Evaluation of Oncotimia: An LLM based system for supporting ...
   Published: 2026-01-27
2. DuwatBench: Bridging Language and Visual Heritage through an...
   Published: 2026-01-27
3. Self-Distillation Enables Continual Learning...
   Published: 2026-01-27
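
`max_papers` caps results before the date filter runs. In the sample run, all 20 of the 20 fetched papers fell inside the 7-day window, so the cap was what ended the fetch, not the window. One alternative, sketched here as an assumption, is to move the window into the query itself using the arXiv API's `submittedDate:[YYYYMMDDHHMM TO YYYYMMDDHHMM]` range filter, whose times are in GMT. The resulting string would be passed as `query=` to `arxiv.Search`. `build_weekly_query` is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def build_weekly_query(
    categories: List[str],
    days_back: int = 7,
    now: Optional[datetime] = None,
) -> str:
    """Build an arXiv query limited to categories and a submission-date window.

    Assumes the arXiv API's submittedDate range syntax (GMT, YYYYMMDDHHMM).
    Pass an aware datetime as `now`; naive values are treated as local time.
    """
    end = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    start = end - timedelta(days=days_back)
    fmt = "%Y%m%d%H%M"
    cats = " OR ".join(f"cat:{c}" for c in categories)
    # Parentheses keep the OR group separate from the AND date clause
    return f"({cats}) AND submittedDate:[{start.strftime(fmt)} TO {end.strftime(fmt)}]"
```

With the window in the query, `max_results` acts as a safety limit. It no longer decides, without any warning, how much of the week is covered.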
