## **Notebook 03: LangGraph Agent Prototype**

**Purpose**
Build the core agentic workflow using LangGraph. This notebook establishes the multi-agent system that will power Radar's paper analysis pipeline.

**What We'll Do**

| Step | Task | Output |
|------|------|--------|
| 1 | **Set Up LangGraph** | Import libraries and configure LLM client |
| 2 | **Define Agent State** | Create state schema for data flow between agents |
| 3 | **Build Paper Analyzer** | Agent that extracts key technical insights from papers |
| 4 | **Build Simplifier** | Agent that generates accessible explanations |
| 5 | **Build Industry Matcher** | Agent that maps research to use cases |
| 6 | **Orchestrate Workflow** | Connect agents in LangGraph state machine |
| 7 | **Test Pipeline** | Run complete workflow on sample papers |

**Key Questions to Answer**
- How do I structure state in LangGraph for multi-agent workflows?
- What prompts effectively extract technical insights from papers?
- How do I chain agents while maintaining context?
- Can the pipeline handle multiple papers efficiently?

**Expected Outcomes**
- Working LangGraph state machine with 3+ agents
- Tested prompts for paper analysis and simplification
- Complete pipeline: Paper text -> Technical analysis -> Simple summary -> Industry applications
- Performance baseline for optimization

**Architecture**
```
Paper Text (from Notebook 02)
    |
    v
Paper Analyzer Agent
    |
    v
Simplifier Agent  
    |
    v
Industry Matcher Agent
    |
    v
Final Output (structured JSON)
```
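
The diagram describes a linear hand-off: each agent reads the shared state, adds its own fields, and passes the result on. The sketch below mimics that data flow in plain Python, with stub agents standing in for the LLM-backed ones. The stub bodies and the `industry_applications` field name are assumptions for illustration; the notebook itself connects the real agents with LangGraph's `StateGraph` rather than a loop.

```python
from typing import Callable

State = dict  # stand-in for the AgentState TypedDict defined later in the notebook


def stub_analyzer(state: State) -> State:
    # Hypothetical stub: the real agent calls the LLM.
    return {**state, "technical_summary": f"Analysis of {state['paper_title']}"}


def stub_simplifier(state: State) -> State:
    # Reads the analyzer's output, as the real simplifier does.
    return {**state, "executive_summary": state["technical_summary"].lower()}


def stub_industry_matcher(state: State) -> State:
    # "industry_applications" is an assumed field name, not taken from the notebook.
    return {**state, "industry_applications": ["placeholder use case"]}


PIPELINE: list[Callable[[State], State]] = [
    stub_analyzer,
    stub_simplifier,
    stub_industry_matcher,
]


def run_pipeline(state: State) -> State:
    """Pass the state through each stage in the order shown in the diagram."""
    for stage in PIPELINE:
        state = stage(state)
    return state


final_state = run_pipeline({"paper_title": "Sample Paper", "paper_text": "..."})
print(sorted(final_state))
```

Each stage only adds keys, so a later agent can rely on everything an earlier agent produced.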

---



In [5]:
# Imports and Setup

"""
Import LangGraph, LangChain, and Anthropic libraries.
Configure Claude API client for agent operations.
"""

# Core libraries
import json
import os
from typing import TypedDict, Annotated
from datetime import datetime

# LangGraph and LangChain
from langgraph.graph import StateGraph, END
from langchain_anthropic import ChatAnthropic

# Load environment variables
from dotenv import load_dotenv
load_dotenv()



True

In [6]:
# Verify API key
api_key = os.getenv('ANTHROPIC_API_KEY')
if not api_key:
    raise ValueError("ANTHROPIC_API_KEY not found in environment variables")

# Initialize Claude client
MODEL_NAME = "claude-sonnet-4-20250514"  # single source of truth for the model ID
llm = ChatAnthropic(
    model=MODEL_NAME,
    temperature=0.7,
    max_tokens=4096
)

print("Setup complete")
print(f"LLM model: {MODEL_NAME}")
print(f"API key loaded: {api_key[:8]}...")

Setup complete
LLM model: claude-sonnet-4-20250514
API key loaded: sk-ant-a...


In [7]:
# Cell 3: Load Sample Papers from Notebook 02

"""
Load the processed papers to use as input for our agents.
"""

import pandas as pd

# Load processed data
processed_path = '../data/processed/processed_papers_sample.json'

with open(processed_path, 'r', encoding='utf-8') as f:
    processed_papers = json.load(f)

print("LOADED PROCESSED PAPERS")
print("=" * 80)
print(f"Papers available: {len(processed_papers)}")

for i, paper in enumerate(processed_papers[:3], 1):
    print(f"\n{i}. {paper['title'][:60]}...")
    print(f"   Pages: {paper['page_count']} | Chars: {paper['char_count']:,}")
    print(f"   Sections: {paper['section_count']}")

print("\nReady to process with agents")

LOADED PROCESSED PAPERS
Papers available: 5

1. Manifold limit for the training of shallow graph convolution...
   Pages: 44 | Chars: 117,139
   Sections: 4

2. AdaFuse: Adaptive Ensemble Decoding with Test-Time Scaling f...
   Pages: 13 | Chars: 52,133
   Sections: 5

3. Chaining the Evidence: Robust Reinforcement Learning for Dee...
   Pages: 21 | Chars: 74,187
   Sections: 5

Ready to process with agents
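
The analyzer prompt defined later in this notebook passes only the first 8,000 characters of `paper_text` to the model. A quick calculation with the character counts printed above shows how much of each paper that covers. The `EXCERPT_CHARS` name is introduced here for illustration only.

```python
EXCERPT_CHARS = 8000  # slice length used in the analyzer prompt

# Character counts copied from the cell output above (titles as printed, truncated).
char_counts = {
    "Manifold limit for the training of shallow graph convolution...": 117_139,
    "AdaFuse: Adaptive Ensemble Decoding with Test-Time Scaling f...": 52_133,
    "Chaining the Evidence: Robust Reinforcement Learning for Dee...": 74_187,
}

# Fraction of each paper that fits inside the excerpt.
coverage = {
    title: min(EXCERPT_CHARS, chars) / chars
    for title, chars in char_counts.items()
}

for title, fraction in coverage.items():
    print(f"{fraction:.1%} of the text reaches the model: {title}")
```

Between roughly 7% and 15% of each sample paper is analyzed, so results and limitations that appear late in a paper are likely to fall outside the excerpt.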


In [11]:
# Cell 4: Define LangGraph State (UPDATED)

"""
Define the state structure that will flow between agents.
"""

class AgentState(TypedDict):
    """State schema for our agent workflow."""
    
    # Input
    paper_title: str
    paper_text: str
    paper_sections: dict
    
    # Paper Analyzer outputs
    technical_summary: str
    key_methods: str
    main_results: str
    limitations: str
    
    # Simplifier outputs (updated structure)
    executive_summary: str          # Two-line summary
    key_innovation: str              # Challenge bullets (reusing this field)
    accessible_explanation: str      # Solution overview
    technical_points: str            # NEW: Simplified technical points
    
    # Metadata
    processing_stage: str
    errors: list

print("State schema defined with new structure:")
print("  Summary -> Challenge -> Solution -> Technical Points")

State schema defined with new structure:
  Summary -> Challenge -> Solution -> Technical Points
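
The agents defined later in this notebook index into the state directly and append to `state['errors']`, so every key needs a value before the first agent runs. The sketch below builds such a state from one loaded paper record. `make_initial_state` and the `text` and `sections` record keys are assumptions; the notebook output only confirms `title`, `page_count`, `char_count`, and `section_count`.

```python
def make_initial_state(paper: dict) -> dict:
    """Build a fully populated state dict from one processed paper record."""
    return {
        # Input ("text" and "sections" are assumed key names)
        "paper_title": paper["title"],
        "paper_text": paper.get("text", ""),
        "paper_sections": paper.get("sections", {}),
        # Paper Analyzer outputs
        "technical_summary": "",
        "key_methods": "",
        "main_results": "",
        "limitations": "",
        # Simplifier outputs
        "executive_summary": "",
        "key_innovation": "",
        "accessible_explanation": "",
        "technical_points": "",
        # Metadata
        "processing_stage": "initialized",
        "errors": [],  # fresh list per paper, never shared between runs
    }


state = make_initial_state({"title": "Sample Paper", "text": "Full paper text"})
print(state["processing_stage"], state["errors"])
```

Creating the `errors` list inside the function avoids the shared-mutable-default problem, where error messages from one paper would leak into the next.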


In [12]:
# Cell 5: Paper Analyzer Agent

"""
First agent: Extracts technical insights from research papers.
Identifies methods, results, and limitations.
"""

def paper_analyzer_agent(state: AgentState) -> AgentState:
    """
    Analyze paper and extract technical insights.
    
    Args:
        state: Current agent state with paper_text
    
    Returns:
        Updated state with technical analysis
    """
    
    prompt = f"""You are an expert AI researcher analyzing academic papers. 

Paper Title: {state['paper_title']}

Paper Content (excerpt):
{state['paper_text'][:8000]}

Your task: Extract the following technical insights:

1. TECHNICAL SUMMARY (2-3 sentences): What problem does this solve and how?

2. KEY METHODS (bullet points): What techniques/approaches were used?

3. MAIN RESULTS (bullet points): What were the key findings or performance metrics?

4. LIMITATIONS (bullet points): What are the acknowledged limitations or future work needed?

Format your response as JSON:
{{
  "technical_summary": "...",
  "key_methods": ["...", "..."],
  "main_results": ["...", "..."],
  "limitations": ["...", "..."]
}}

Be precise and technical. Use the language of the field."""

    try:
        response = llm.invoke(prompt)
        # The model may wrap its JSON in a Markdown fence or extra text,
        # so parse only the outermost {...} span of the reply.
        raw = response.content
        result = json.loads(raw[raw.find('{'):raw.rfind('}') + 1])
        
        state['technical_summary'] = result['technical_summary']
        state['key_methods'] = '\n'.join(f"- {m}" for m in result['key_methods'])
        state['main_results'] = '\n'.join(f"- {r}" for r in result['main_results'])
        state['limitations'] = '\n'.join(f"- {item}" for item in result['limitations'])
        state['processing_stage'] = 'analyzed'
        
    except Exception as e:
        state['errors'].append(f"Analyzer error: {str(e)}")
        state['processing_stage'] = 'analyzer_failed'
    
    return state

print("Agent defined: paper_analyzer_agent")
print("Extracts technical insights from research papers")

Agent defined: paper_analyzer_agent
Extracts technical insights from research papers
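
Chat models often wrap a JSON reply in a Markdown code fence or add a sentence before it, so passing the raw reply to `json.loads` can raise. The sketch below is a more tolerant parser that also checks for the keys the agent expects. `parse_llm_json` is a hypothetical helper, not part of LangChain, and it assumes the reply contains exactly one top-level JSON object.

```python
import json


def parse_llm_json(text: str, required_keys: set[str]) -> dict:
    """Parse the outermost {...} object in an LLM reply and check its keys."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    result = json.loads(text[start:end + 1])
    missing = required_keys - set(result)
    if missing:
        raise ValueError(f"reply is missing keys: {sorted(missing)}")
    return result


fence = "`" * 3  # built at runtime so this example nests cleanly in Markdown
reply = (
    "Here is the analysis:\n"
    + fence + "json\n"
    + '{"technical_summary": "x", "key_methods": ["a"]}\n'
    + fence
)
parsed = parse_llm_json(reply, {"technical_summary", "key_methods"})
print(parsed["key_methods"])
```

Raising `ValueError` for a missing key keeps the failure inside the agent's existing `try`/`except`, so it is recorded in `state['errors']` with a readable message.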


In [13]:
# Cell 6: Simplifier Agent 

"""
Second agent: Translates technical insights into accessible explanations.
Uses a clear narrative structure: Summary -> Challenge -> Solution -> Technical Points
"""

def simplifier_agent(state: AgentState) -> AgentState:
    """
    Generate accessible explanations from technical analysis.
    
    Args:
        state: Current agent state with technical analysis
    
    Returns:
        Updated state with simplified explanations
    """
    
    prompt = f"""You are an expert science communicator making AI research accessible to intelligent non-experts.

Paper Title: {state['paper_title']}

Technical Summary: {state['technical_summary']}

Key Methods: {state['key_methods']}

Main Results: {state['main_results']}

Your task: Create a clear, scannable explanation using this structure:

1. TWO-LINE SUMMARY: Hook the reader. What is this and why does it matter? Maximum 2 sentences.

2. THE CHALLENGE (3-4 bullet points): What problem exists? What are researchers trying to solve? Set the context.

3. WHAT THIS PAPER DOES (1 paragraph): Explain the approach/solution in simple terms. No jargon. Use analogies if helpful.

4. KEY TECHNICAL POINTS (3-5 bullet points): Break down important technical aspects into simple language. Each bullet should be understandable without deep expertise.

Format your response as JSON:
{{
  "two_line_summary": "...",
  "challenge_bullets": ["...", "...", "..."],
  "solution_overview": "...",
  "technical_points": ["...", "...", "..."]
}}

Write clearly and concisely. Assume the reader is smart but not an AI researcher."""

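    try:
        response = llm.invoke(prompt)
        # The model may wrap its JSON in a Markdown fence or extra text,
        # so parse only the outermost {...} span of the reply.
        raw = response.content
        result = json.loads(raw[raw.find('{'):raw.rfind('}') + 1])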
        
        state['executive_summary'] = result['two_line_summary']
        state['key_innovation'] = '\n'.join(f"- {c}" for c in result['challenge_bullets'])
        state['accessible_explanation'] = result['solution_overview']
        
        # Store technical points separately
        state['technical_points'] = '\n'.join(f"- {t}" for t in result['technical_points'])
        state['processing_stage'] = 'simplified'
        
    except Exception as e:
        state['errors'].append(f"Simplifier error: {str(e)}")
        state['processing_stage'] = 'simplifier_failed'
    
    return state

print("Agent defined: simplifier_agent")
print("Structure: Summary -> Challenge -> Solution -> Technical Points")

Agent defined: simplifier_agent
Structure: Summary -> Challenge -> Solution -> Technical Points
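
Each agent records a failure in `processing_stage` instead of raising. With the linear wiring shown in the architecture diagram, the simplifier would therefore still run after an analyzer failure and build its prompt from missing or empty analysis fields. The sketch below is a routing function that stops the workflow early. `route_after_analyzer` and the labels `simplifier` and `end` are assumptions. LangGraph's `add_conditional_edges` accepts a function of this shape together with a mapping from its return values to nodes.

```python
def route_after_analyzer(state: dict) -> str:
    """Return the label of the next step based on the analyzer's outcome."""
    if state.get("processing_stage") == "analyzed" and not state.get("errors"):
        return "simplifier"
    return "end"


ok_state = {"processing_stage": "analyzed", "errors": []}
failed_state = {
    "processing_stage": "analyzer_failed",
    "errors": ["Analyzer error: bad JSON"],
}

print(route_after_analyzer(ok_state))      # simplifier
print(route_after_analyzer(failed_state))  # end
```

In the graph this could be registered with something like `workflow.add_conditional_edges("analyzer", route_after_analyzer, {"simplifier": "simplifier", "end": END})`, assuming nodes named `analyzer` and `simplifier`.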
