# Tutorial 07: Research Assistant

This capstone tutorial combines the tool-calling, memory, reflection, and plan-and-execute patterns from earlier tutorials into a single research assistant agent.

**What you'll learn:**
- **Multi-step planning**: Breaking research tasks into phases
- **Tool integration**: Search, retrieve, and analyze
- **Reflection**: Self-critique and improve findings
- **Synthesis**: Combine results into coherent reports

By the end, you'll have a full-featured research agent that plans, searches, reflects, and synthesizes.

## Architecture Overview

The research assistant combines patterns from:
- **Tutorial 02**: Tool calling for search and retrieval
- **Tutorial 03**: Memory for tracking research context
- **Tutorial 05**: Reflection for quality improvement
- **Tutorial 06**: Plan-and-execute for structured workflow

### The Research Pipeline

![Research Assistant Graph](../docs/images/07-research-assistant-graph.png)

The pipeline runs Planner → Researcher → Analyzer → Reflector → Synthesizer. The Researcher repeats once per plan step, and the Reflector routes back to the Researcher when it finds gaps.
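
The control flow can be traced without an LLM. The sketch below is a plain-Python stand-in for the graph, with stub logic in place of the real nodes. `simulate_pipeline` and its parameters are hypothetical names used only for illustration, and it assumes the reflector reports gaps for a fixed number of rounds before approving.

```python
def simulate_pipeline(plan_steps: int, gap_rounds: int, max_iterations: int = 2) -> list[str]:
    """Return the order in which nodes would run (stub logic, no LLM)."""
    trace = ["planner"]
    current_step = 0
    iteration = 0
    while True:
        # Researcher: visited once per plan step, plus once per loop back from the reflector
        trace.append("researcher")
        current_step += 1
        if current_step < plan_steps:
            continue  # more plan steps remain, stay on the researcher
        trace.append("analyzer")
        trace.append("reflector")
        iteration += 1
        has_gaps = iteration <= gap_rounds  # assumption: gaps reported for the first N rounds
        if not has_gaps or iteration >= max_iterations:
            break
    trace.append("synthesizer")
    return trace

trace = simulate_pipeline(plan_steps=3, gap_rounds=1)
print(" -> ".join(trace))
```

With a three-step plan and one round of gaps, the researcher runs four times: three plan steps plus one gap-filling pass.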


In [None]:
# Setup
from langgraph_ollama_local import LocalAgentConfig

config = LocalAgentConfig()
print(f"Ollama: {config.ollama.base_url}")
print(f"Model: {config.ollama.model}")

## Step 1: Define Research State

The state tracks the full research lifecycle:

In [None]:
from typing import Annotated, List
import operator
from typing_extensions import TypedDict
from langgraph.graph.message import add_messages

class ResearchState(TypedDict):
    """State for the research assistant."""
    messages: Annotated[list, add_messages]  # Conversation history
    
    # Task
    query: str                               # Research query
    
    # Planning
    research_plan: List[str]                 # Research steps
    current_step: int                        # Current step index
    
    # Research
    sources: Annotated[List[dict], operator.add]  # Gathered sources
    findings: Annotated[List[str], operator.add]  # Key findings
    
    # Reflection
    critique: str                            # Current critique
    gaps: List[str]                          # Identified gaps
    iteration: int                           # Reflection iteration
    
    # Output
    report: str                              # Final synthesized report

print("Research state defined with planning, research, reflection, and output tracking")
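
Fields annotated with `operator.add` (`sources` and `findings`) are appended to on each node update, while plain fields such as `gaps` are overwritten. The sketch below mimics that merge rule in plain Python to show the difference. `DemoState` and `apply_update` are hypothetical names, and this is not how LangGraph implements reducers internally.

```python
import operator
from typing import Annotated, List, TypedDict, get_type_hints

class DemoState(TypedDict):
    findings: Annotated[List[str], operator.add]  # reducer: concatenate lists
    gaps: List[str]                               # no reducer: last write wins

def apply_update(state: dict, update: dict) -> dict:
    """Merge a node's partial update into the state, honoring reducers."""
    hints = get_type_hints(DemoState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        # Annotated[...] hints carry their extra arguments in __metadata__
        metadata = getattr(hints[key], "__metadata__", ())
        reducer = metadata[0] if metadata else None
        merged[key] = reducer(state[key], value) if reducer else value
    return merged

state = {"findings": ["step 1 result"], "gaps": ["missing examples"]}
state = apply_update(state, {"findings": ["step 2 result"], "gaps": []})
print(state)
```

This is why a node can return a one-element `findings` list to add an entry, and an empty `gaps` list to clear the previous gaps.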

## Step 2: Create the LLM and Tools

In [None]:
from langchain_ollama import ChatOllama
from langchain_core.tools import tool
import json

# LLM
llm = ChatOllama(
    model=config.ollama.model,
    base_url=config.ollama.base_url,
    temperature=0,
)

# Simulated research tools (in production, connect to real APIs)
@tool
def web_search(query: str) -> str:
    """Search the web for information on a topic."""
    # Simulated search results
    results = {
        "langgraph": [
            {"title": "LangGraph Documentation", "snippet": "LangGraph is a library for building stateful, multi-actor applications with LLMs.", "url": "https://langchain-ai.github.io/langgraph/"},
            {"title": "LangGraph Tutorial", "snippet": "Build agentic workflows with cycles, persistence, and human-in-the-loop.", "url": "https://python.langchain.com/docs/langgraph"},
        ],
        "agents": [
            {"title": "AI Agents Overview", "snippet": "AI agents are autonomous systems that can perceive, reason, and act.", "url": "https://example.com/agents"},
            {"title": "Building Agents with LLMs", "snippet": "Modern agents combine LLMs with tools and memory for complex tasks.", "url": "https://example.com/llm-agents"},
        ],
        "default": [
            {"title": f"Search results for: {query}", "snippet": f"Information about {query}...", "url": f"https://example.com/{query.replace(' ', '-')}"},
        ]
    }
    
    # Match query to results
    for key in results:
        if key.lower() in query.lower():
            return json.dumps(results[key])
    return json.dumps(results["default"])

@tool
def fetch_document(url: str) -> str:
    """Fetch and extract content from a URL."""
    # Simulated document content
    content = f"""Document from {url}:
    
This is simulated content that would be fetched from the URL.
In a production system, this would use requests/httpx to fetch real content.

Key points:
- Point 1: Important information about the topic
- Point 2: Additional relevant details
- Point 3: Supporting evidence and examples
"""
    return content

@tool
def take_notes(topic: str, notes: str) -> str:
    """Record research notes on a topic."""
    return f"Notes recorded for '{topic}': {notes}"

tools = [web_search, fetch_document, take_notes]
tools_by_name = {t.name: t for t in tools}

# LLM with tools
llm_with_tools = llm.bind_tools(tools)

print(f"Configured with {len(tools)} research tools: {[t.name for t in tools]}")
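
The simulated `fetch_document` notes that a production version would fetch real content. One part of that job is reducing HTML to readable text before passing it to the model. The sketch below does this with the standard library only. `TextExtractor` and `extract_text` are hypothetical helpers, and the HTTP request itself is left out.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""
    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def get_text(self) -> str:
        return "\n".join(self._chunks)

def extract_text(html: str, max_chars: int = 2000) -> str:
    """Return the visible text of an HTML page, truncated to max_chars."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.get_text()[:max_chars]

sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>LangGraph</h1><p>Stateful agents.</p>"
    "<script>var x = 1;</script></body></html>"
)
print(extract_text(sample))
```

Truncating to `max_chars` keeps a long page from crowding out the rest of the prompt; the limit of 2000 is an arbitrary choice for the example.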

## Step 3: Create Planning Node

The planner creates a structured research plan:

In [None]:
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage
import re

PLANNER_PROMPT = """You are a research planner. Create a structured plan for researching the given topic.

Return your plan as a JSON array of 3-5 research steps. Each step should be a specific, actionable task.

Example:
["Search for overview and definitions", "Find specific examples and case studies", "Gather expert opinions and analysis", "Identify key trends and future directions"]

Return ONLY the JSON array.
"""

def planner_node(state: ResearchState) -> dict:
    """Create a research plan."""
    query = state["query"]
    
    messages = [
        SystemMessage(content=PLANNER_PROMPT),
        HumanMessage(content=f"Create a research plan for: {query}")
    ]
    
    response = llm.invoke(messages)
    content = response.content
    
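    # Parse JSON plan
    try:
        match = re.search(r'\[.*?\]', content, re.DOTALL)
        if match:
            plan = json.loads(match.group())
        else:
            plan = [line.strip() for line in content.split('\n') if line.strip()]
    except json.JSONDecodeError:
        plan = [f"Research: {query}"]
    
    # Guard against an empty or non-list result so the researcher always has a step
    if not isinstance(plan, list) or not plan:
        plan = [f"Research: {query}"]
    plan = [str(step) for step in plan]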
    
    print(f"\n=== RESEARCH PLAN ===")
    for i, step in enumerate(plan):
        print(f"  {i+1}. {step}")
    
    return {
        "research_plan": plan,
        "current_step": 0,
        "iteration": 0
    }
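
Local models do not always return a clean JSON array, so plan parsing benefits from a few fallbacks. The sketch below is one possible standalone parser, not the tutorial's own code. `parse_plan` is a hypothetical helper. It uses a greedy bracket match (so a step that contains `]` is not cut short), falls back to numbered or bulleted lines, and finally to a single generic step.

```python
import json
import re

def parse_plan(content: str, fallback_query: str, max_steps: int = 5) -> list[str]:
    """Extract a list of research steps from raw LLM output."""
    steps: list[str] = []
    match = re.search(r"\[.*\]", content, re.DOTALL)
    if match:
        try:
            parsed = json.loads(match.group())
            if isinstance(parsed, list):
                steps = [str(item).strip() for item in parsed if str(item).strip()]
        except json.JSONDecodeError:
            pass  # fall through to the line-based fallback
    if not steps:
        # Accept lines such as "1. Do X", "2) Do Y", or "- Do Z"
        for line in content.splitlines():
            item = re.match(r"^\s*(?:\d+[.)]|[-*])\s+(.*\S)", line)
            if item:
                steps.append(item.group(1))
    return steps[:max_steps] or [f"Research: {fallback_query}"]

print(parse_plan('Here is the plan:\n["Define terms", "Find examples"]', "LangGraph"))
print(parse_plan("1. Define terms\n2) Find examples\n- Summarize", "LangGraph"))
print(parse_plan("I am not sure.", "LangGraph"))
```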

## Step 4: Create Researcher Node

The researcher executes the current step using tools:

In [None]:
from langchain_core.messages import ToolMessage

RESEARCHER_PROMPT = """You are a research assistant. Execute the current research step by using the available tools.

Available tools:
- web_search: Search for information on a topic
- fetch_document: Get content from a URL
- take_notes: Record important findings

Use tools to gather information, then summarize your findings.
"""

def researcher_node(state: ResearchState) -> dict:
    """Execute current research step with tools."""
    plan = state["research_plan"]
    current_step = state["current_step"]
    query = state["query"]
    gaps = state.get("gaps", [])
    
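    if current_step < len(plan):
        step = plan[current_step]
    elif gaps:
        # Plan is exhausted but the reflector found gaps: research those instead
        step = "Fill remaining gaps"
    else:
        return {}  # Nothing left to research
    
    # Include any identified gaps
    gap_context = ""
    if gaps:
        gap_context = f"\n\nAlso address these gaps: {', '.join(map(str, gaps))}"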
    
    messages = [
        SystemMessage(content=RESEARCHER_PROMPT),
        HumanMessage(content=f"Research topic: {query}\n\nCurrent step: {step}{gap_context}")
    ]
    
    response = llm_with_tools.invoke(messages)
    
    print(f"\n=== RESEARCH STEP {current_step + 1}: {step} ===")
    
    # Execute any tool calls
    new_sources = []
    new_findings = []
    
    if hasattr(response, 'tool_calls') and response.tool_calls:
        for tc in response.tool_calls:
            tool_name = tc['name']
            tool_args = tc['args']
            print(f"  Tool: {tool_name}({tool_args})")
            
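            # Skip tool names the model invented instead of raising KeyError
            if tool_name not in tools_by_name:
                print(f"  Skipping unknown tool: {tool_name}")
                continue
            result = tools_by_name[tool_name].invoke(tool_args)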
            
            # Track sources from web_search
            if tool_name == 'web_search':
                try:
                    sources = json.loads(result)
                    new_sources.extend(sources)
                except json.JSONDecodeError:
                    pass  # Result was not JSON; keep it as a finding only
            
            new_findings.append(f"{step}: {result[:200]}..." if len(result) > 200 else f"{step}: {result}")
    else:
        # No tool calls, use response directly
        new_findings.append(f"{step}: {response.content[:200]}..." if len(response.content) > 200 else f"{step}: {response.content}")
    
    print(f"  Found {len(new_sources)} sources, {len(new_findings)} findings")
    
    return {
        "sources": new_sources,
        "findings": new_findings,
        "current_step": current_step + 1,
        "gaps": []  # Clear gaps after addressing
    }
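
Tool dispatch is the part of the researcher most likely to fail at runtime, because the model may request a tool that does not exist or pass the wrong arguments. The sketch below shows one way to isolate those failures per call. It uses plain Python callables as stand-ins for LangChain tools, and `run_tool_calls` and `fake_search` are hypothetical names.

```python
import json
from typing import Any, Callable

def run_tool_calls(
    tool_calls: list[dict[str, Any]],
    registry: dict[str, Callable[..., str]],
) -> list[dict[str, str]]:
    """Run each requested tool and record a result or an error per call."""
    outcomes = []
    for call in tool_calls:
        name, args = call.get("name", ""), call.get("args", {})
        func = registry.get(name)
        if func is None:
            outcomes.append({"tool": name, "status": "error", "output": f"unknown tool: {name}"})
            continue
        try:
            outcomes.append({"tool": name, "status": "ok", "output": func(**args)})
        except Exception as exc:  # record the failure instead of aborting the step
            outcomes.append({"tool": name, "status": "error", "output": str(exc)})
    return outcomes

def fake_search(query: str) -> str:
    """Stand-in for a search tool; returns a JSON list of results."""
    return json.dumps([{"title": f"Results for {query}"}])

calls = [
    {"name": "web_search", "args": {"query": "langgraph"}},
    {"name": "summarize", "args": {"text": "..."}},  # not registered
    {"name": "web_search", "args": {"q": "typo"}},   # wrong argument name
]
outcomes = run_tool_calls(calls, {"web_search": fake_search})
for outcome in outcomes:
    print(outcome["tool"], outcome["status"])
```

Recording errors as outcomes lets the node still return partial findings for the step rather than failing the whole run.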

## Step 5: Create Analyzer Node

The analyzer processes and organizes findings:

In [None]:
ANALYZER_PROMPT = """You are a research analyst. Review the gathered findings and identify key insights.

Organize the information into:
1. Key facts and definitions
2. Important examples or evidence
3. Expert perspectives
4. Patterns and trends

Be concise but comprehensive.
"""

def analyzer_node(state: ResearchState) -> dict:
    """Analyze and organize findings."""
    findings = state.get("findings", [])
    query = state["query"]
    
    if not findings:
        return {"findings": ["No findings to analyze"]}
    
    findings_text = "\n".join(findings)
    
    messages = [
        SystemMessage(content=ANALYZER_PROMPT),
        HumanMessage(content=f"Research topic: {query}\n\nFindings:\n{findings_text}")
    ]
    
    response = llm.invoke(messages)
    analysis = response.content
    
    print(f"\n=== ANALYSIS ===")
    print(analysis[:300] + "..." if len(analysis) > 300 else analysis)
    
    return {
        "findings": [f"Analysis: {analysis}"]
    }

## Step 6: Create Reflector Node

The reflector critiques research quality and identifies gaps:

In [None]:
REFLECTOR_PROMPT = """You are a research quality reviewer. Evaluate the research findings and identify any gaps.

Consider:
1. Is the topic fully covered?
2. Are there missing perspectives?
3. Is the evidence sufficient?
4. Are there unanswered questions?

If the research is comprehensive, respond with: "COMPLETE"
Otherwise, list the specific gaps that need to be addressed as a JSON array.

Example gaps: ["Missing historical context", "Need more expert opinions", "Lacks practical examples"]
"""

MAX_ITERATIONS = 2

def reflector_node(state: ResearchState) -> dict:
    """Critique research and identify gaps."""
    findings = state.get("findings", [])
    query = state["query"]
    iteration = state.get("iteration", 0)
    
    findings_text = "\n".join(findings[-5:])  # Last 5 findings
    
    messages = [
        SystemMessage(content=REFLECTOR_PROMPT),
        HumanMessage(content=f"Research topic: {query}\n\nCurrent findings:\n{findings_text}")
    ]
    
    response = llm.invoke(messages)
    critique = response.content
    
    print(f"\n=== REFLECTION (iteration {iteration + 1}) ===")
    print(critique[:200] + "..." if len(critique) > 200 else critique)
    
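    # Parse gaps (the \b word boundary keeps "INCOMPLETE" from counting as approval)
    gaps = []
    if not re.search(r'\bCOMPLETE\b', critique.upper()):
        try:
            match = re.search(r'\[.*?\]', critique, re.DOTALL)
            if match:
                gaps = json.loads(match.group())
        except json.JSONDecodeError:
            pass  # Unparseable gap list: treat as no gaps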
    
    return {
        "critique": critique,
        "gaps": gaps,
        "iteration": iteration + 1
    }
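
The reflector's output drives the loop, so it helps to test its parsing separately from the LLM. The sketch below is a standalone parser for that purpose. `parse_critique` is a hypothetical helper, and the word-boundary match is an assumption about how approval should be detected.

```python
import json
import re

def parse_critique(critique: str) -> tuple[bool, list[str]]:
    """Return (is_complete, gaps) from the reflector's raw response."""
    # \b keeps "INCOMPLETE" from counting as approval
    if re.search(r"\bCOMPLETE\b", critique.upper()):
        return True, []
    match = re.search(r"\[.*\]", critique, re.DOTALL)
    if not match:
        return False, []
    try:
        parsed = json.loads(match.group())
    except json.JSONDecodeError:
        return False, []
    gaps = [str(gap) for gap in parsed] if isinstance(parsed, list) else []
    return False, gaps

print(parse_critique("COMPLETE"))
print(parse_critique('The research is incomplete. ["Missing historical context"]'))
print(parse_critique("Needs more work."))
```

A keyword check remains a heuristic: a response such as "not complete" would still be read as approval, so a stricter prompt or a structured response format may be worth considering.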

## Step 7: Create Synthesizer Node

The synthesizer creates the final research report:

In [None]:
SYNTHESIZER_PROMPT = """You are a research report writer. Create a comprehensive, well-organized report from the research findings.

Structure your report with:
1. Executive Summary
2. Key Findings
3. Analysis
4. Conclusions
5. Sources (if available)

Write clearly and professionally.
"""

def synthesizer_node(state: ResearchState) -> dict:
    """Create final research report."""
    findings = state.get("findings", [])
    sources = state.get("sources", [])
    query = state["query"]
    
    findings_text = "\n".join(findings)
    sources_text = "\n".join([f"- {s.get('title', 'Unknown')}: {s.get('url', '')}" for s in sources[:5]])
    
    messages = [
        SystemMessage(content=SYNTHESIZER_PROMPT),
        HumanMessage(content=f"Research topic: {query}\n\nFindings:\n{findings_text}\n\nSources:\n{sources_text}")
    ]
    
    response = llm.invoke(messages)
    report = response.content
    
    print(f"\n=== FINAL REPORT ===")
    print(report)
    
    return {"report": report}
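
Because `sources` accumulates with `operator.add`, repeated searches can add the same result more than once, and the synthesizer only lists the first five entries. The sketch below removes duplicates before formatting. `dedupe_sources` and `format_sources` are hypothetical helpers, and keying on the URL is an assumption about what makes two sources identical.

```python
def dedupe_sources(sources: list[dict], limit: int = 5) -> list[dict]:
    """Keep the first occurrence of each URL, preserving order."""
    seen: set[str] = set()
    unique = []
    for source in sources:
        # Fall back to the title when a source has no URL
        key = source.get("url") or source.get("title", "")
        if key in seen:
            continue
        seen.add(key)
        unique.append(source)
    return unique[:limit]

def format_sources(sources: list[dict]) -> str:
    """Render sources as a bulleted list for the synthesizer prompt."""
    return "\n".join(f"- {s.get('title', 'Unknown')}: {s.get('url', '')}" for s in sources)

gathered = [
    {"title": "LangGraph Documentation", "url": "https://langchain-ai.github.io/langgraph/"},
    {"title": "AI Agents Overview", "url": "https://example.com/agents"},
    {"title": "LangGraph Documentation", "url": "https://langchain-ai.github.io/langgraph/"},
]
print(format_sources(dedupe_sources(gathered)))
```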

## Step 8: Define Routing Logic

In [None]:
def route_after_research(state: ResearchState) -> str:
    """Route after research step."""
    current_step = state["current_step"]
    plan = state["research_plan"]
    
    if current_step < len(plan):
        return "researcher"  # More steps
    return "analyzer"  # All steps done

def route_after_reflection(state: ResearchState) -> str:
    """Route after reflection."""
    gaps = state.get("gaps", [])
    iteration = state.get("iteration", 0)
    critique = state.get("critique", "")
    
    # Complete if approved or max iterations (\b avoids matching "INCOMPLETE")
    if re.search(r'\bCOMPLETE\b', critique.upper()):
        print("\n✓ Research complete!")
        return "synthesizer"
    
    if iteration >= MAX_ITERATIONS:
        print(f"\n✓ Max iterations ({MAX_ITERATIONS}) reached")
        return "synthesizer"
    
    if gaps:
        print(f"\n→ Addressing {len(gaps)} gaps...")
        return "researcher"  # Go back and fill gaps
    
    return "synthesizer"

print("Routing logic defined")
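
The order of checks in the post-reflection router matters: approval wins first, then the iteration cap, and only then do gaps trigger another research pass. The sketch below restates that decision as a pure function so it can be checked against sample inputs. `decide` is a hypothetical name, and it uses a word-boundary match for approval, which is an assumption rather than a plain substring test.

```python
import re

def decide(critique: str, gaps: list[str], iteration: int, max_iterations: int = 2) -> str:
    """Pure version of the post-reflection routing decision."""
    if re.search(r"\bCOMPLETE\b", critique.upper()):
        return "synthesizer"  # reviewer approved
    if iteration >= max_iterations:
        return "synthesizer"  # stop looping even if gaps remain
    return "researcher" if gaps else "synthesizer"

cases = [
    ("COMPLETE", [], 1),
    ("Gaps remain", ["Lacks practical examples"], 1),
    ("Gaps remain", ["Lacks practical examples"], 2),
    ("Unclear response", [], 1),
]
for critique, gaps, iteration in cases:
    print(f"{critique!r:20} gaps={len(gaps)} iteration={iteration} -> {decide(critique, gaps, iteration)}")
```

Only the second case loops back to the researcher; the third hits the iteration cap, and the fourth has no gaps to address.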

## Step 9: Build the Graph

In [None]:
from langgraph.graph import StateGraph, START, END

# Build the research assistant graph
workflow = StateGraph(ResearchState)

# Add nodes
workflow.add_node("planner", planner_node)
workflow.add_node("researcher", researcher_node)
workflow.add_node("analyzer", analyzer_node)
workflow.add_node("reflector", reflector_node)
workflow.add_node("synthesizer", synthesizer_node)

# Add edges
workflow.add_edge(START, "planner")
workflow.add_edge("planner", "researcher")
workflow.add_conditional_edges(
    "researcher",
    route_after_research,
    {"researcher": "researcher", "analyzer": "analyzer"}
)
workflow.add_edge("analyzer", "reflector")
workflow.add_conditional_edges(
    "reflector",
    route_after_reflection,
    {"researcher": "researcher", "synthesizer": "synthesizer"}
)
workflow.add_edge("synthesizer", END)

# Compile
graph = workflow.compile()

print("Research assistant graph compiled!")

In [None]:
# Visualize
from IPython.display import Image, display

try:
    display(Image(graph.get_graph().draw_mermaid_png()))
except Exception as e:
    print(f"Could not render: {e}")

## Step 10: Run the Research Assistant

In [None]:
# Run a research query
query = "What is LangGraph and how is it used for building AI agents?"

print(f"Research Query: {query}")
print("=" * 60)

result = graph.invoke({
    "query": query,
    "messages": [],
    "research_plan": [],
    "current_step": 0,
    "sources": [],
    "findings": [],
    "critique": "",
    "gaps": [],
    "iteration": 0,
    "report": ""
})

print("\n" + "=" * 60)
print("RESEARCH COMPLETE")
print(f"Sources gathered: {len(result['sources'])}")
print(f"Findings recorded: {len(result['findings'])}")
print(f"Iterations: {result['iteration']}")

In [None]:
# Helper function
def research(query: str) -> str:
    """Run research on a query and return the report."""
    result = graph.invoke({
        "query": query,
        "messages": [],
        "research_plan": [],
        "current_step": 0,
        "sources": [],
        "findings": [],
        "critique": "",
        "gaps": [],
        "iteration": 0,
        "report": ""
    })
    return result["report"]

# Try another query
report = research("What are the key differences between ReAct agents and Plan-and-Execute agents?")
print("\n" + "=" * 60)
print("FINAL REPORT:")
print("=" * 60)
print(report)
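
The same initial state dictionary appears in every `graph.invoke` call. A small factory keeps the fields in one place and guarantees each run starts with fresh lists. This is a sketch; `initial_state` is a hypothetical helper name.

```python
def initial_state(query: str) -> dict:
    """Build a fresh input state for one research run."""
    return {
        "query": query,
        "messages": [],
        "research_plan": [],
        "current_step": 0,
        "sources": [],
        "findings": [],
        "critique": "",
        "gaps": [],
        "iteration": 0,
        "report": "",
    }

first = initial_state("What is LangGraph?")
second = initial_state("How do checkpointers work?")
first["findings"].append("scratch")  # mutating one state must not leak into another
print(second["findings"])
```

With this helper, a run reduces to `graph.invoke(initial_state(query))`.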

## Adding Persistence

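For longer research sessions, compile the graph with a checkpointer so that state is saved per thread ID. `MemorySaver` keeps checkpoints in process memory, so they are lost when the kernel restarts: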

In [None]:
from langgraph.checkpoint.memory import MemorySaver

# Compile with persistence
memory = MemorySaver()
persistent_graph = workflow.compile(checkpointer=memory)

# Run with thread ID (named thread_config so it does not shadow the LocalAgentConfig in `config`)
thread_config = {"configurable": {"thread_id": "research-session-1"}}

result = persistent_graph.invoke(
    {
        "query": "How does LangGraph handle state management?",
        "messages": [],
        "research_plan": [],
        "current_step": 0,
        "sources": [],
        "findings": [],
        "critique": "",
        "gaps": [],
        "iteration": 0,
        "report": ""
    },
    config=thread_config
)

print("Research with persistence complete!")
print(f"Report length: {len(result['report'])} chars")

## Complete Code

In [None]:
# Complete Research Assistant

import json
import re
import operator
from typing import Annotated, List
from typing_extensions import TypedDict
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.tools import tool
from langchain_ollama import ChatOllama
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langgraph_ollama_local import LocalAgentConfig

# === State ===
class ResearchState(TypedDict):
    messages: Annotated[list, add_messages]
    query: str
    research_plan: List[str]
    current_step: int
    sources: Annotated[List[dict], operator.add]
    findings: Annotated[List[str], operator.add]
    critique: str
    gaps: List[str]
    iteration: int
    report: str

# === LLM & Tools ===
config = LocalAgentConfig()
llm = ChatOllama(model=config.ollama.model, base_url=config.ollama.base_url, temperature=0)

@tool
def web_search(query: str) -> str:
    """Search the web."""
    return json.dumps([{"title": f"Results for {query}", "snippet": f"Info about {query}"}])

tools = [web_search]
tools_by_name = {t.name: t for t in tools}
llm_with_tools = llm.bind_tools(tools)

# === Nodes ===
def planner(state):
    response = llm.invoke([HumanMessage(content=f"Create 3-5 research steps (JSON array): {state['query']}")])
    try:
        plan = json.loads(re.search(r'\[.*?\]', response.content, re.DOTALL).group())
    except (AttributeError, json.JSONDecodeError):  # no bracketed array found, or invalid JSON
        plan = [f"Research: {state['query']}"]
    return {"research_plan": plan, "current_step": 0, "iteration": 0}

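def researcher(state):
    plan, i, gaps = state["research_plan"], state["current_step"], state.get("gaps", [])
    if i >= len(plan) and not gaps:
        return {}
    # Once the plan is exhausted, research the reflector's gaps instead
    step = plan[i] if i < len(plan) else "Address gaps: " + "; ".join(map(str, gaps))
    response = llm_with_tools.invoke([HumanMessage(content=f"Research: {step}")])
    # Run any requested tool calls; fall back to the model's text when there are none
    results = [str(tools_by_name[tc["name"]].invoke(tc["args"])) for tc in response.tool_calls if tc["name"] in tools_by_name]
    text = " ".join(results) or response.content
    return {"findings": [f"{step}: {text[:200]}"], "current_step": i + 1, "gaps": []}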

def analyzer(state):
    response = llm.invoke([HumanMessage(content=f"Analyze: {state['findings'][-3:]}")])
    return {"findings": [f"Analysis: {response.content}"]}

def reflector(state):
    response = llm.invoke([HumanMessage(content=f"Critique (say COMPLETE if done): {state['findings'][-1]}")])
    gaps = []
    # \b keeps "INCOMPLETE" from counting as approval
    if not re.search(r'\bCOMPLETE\b', response.content.upper()):
        try:
            match = re.search(r'\[.*?\]', response.content, re.DOTALL)
            if match:
                gaps = json.loads(match.group())
        except json.JSONDecodeError:
            pass
    return {"critique": response.content, "gaps": gaps, "iteration": state["iteration"] + 1}

def synthesizer(state):
    response = llm.invoke([HumanMessage(content=f"Write report on {state['query']}:\n{state['findings']}")])
    return {"report": response.content}

def route_research(state):
    return "researcher" if state["current_step"] < len(state["research_plan"]) else "analyzer"

def route_reflection(state):
    if re.search(r'\bCOMPLETE\b', state.get("critique", "").upper()) or state["iteration"] >= 2:
        return "synthesizer"
    return "researcher" if state.get("gaps") else "synthesizer"

# === Graph ===
workflow = StateGraph(ResearchState)
workflow.add_node("planner", planner)
workflow.add_node("researcher", researcher)
workflow.add_node("analyzer", analyzer)
workflow.add_node("reflector", reflector)
workflow.add_node("synthesizer", synthesizer)
workflow.add_edge(START, "planner")
workflow.add_edge("planner", "researcher")
workflow.add_conditional_edges("researcher", route_research, {"researcher": "researcher", "analyzer": "analyzer"})
workflow.add_edge("analyzer", "reflector")
workflow.add_conditional_edges("reflector", route_reflection, {"researcher": "researcher", "synthesizer": "synthesizer"})
workflow.add_edge("synthesizer", END)
graph = workflow.compile()

# === Use ===
result = graph.invoke({
    "query": "What is LangGraph?",
    "messages": [], "research_plan": [], "current_step": 0,
    "sources": [], "findings": [], "critique": "", "gaps": [],
    "iteration": 0, "report": ""
})
print(result["report"])

## Key Concepts Recap

| Pattern | How It's Used |
|---------|---------------|
| **Tool Calling** | `web_search`, `fetch_document`, and `take_notes` for gathering and recording info |
| **Planning** | Structured research steps before execution |
| **Sequential Execution** | Process each research step in order |
| **Reflection** | Critique findings, identify gaps |
| **Conditional Routing** | Loop back if gaps found |
| **Synthesis** | Combine all findings into final report |

## Production Enhancements

For a production research assistant:

1. **Real search APIs**: Integrate Tavily, SerpAPI, or Brave Search
2. **Document loaders**: Use LangChain document loaders for PDFs and web pages
3. **Vector storage**: Store and retrieve research with embeddings
4. **Human-in-the-loop**: Add approval for research directions
5. **Streaming**: Stream progress updates to the user

## Congratulations!

You've completed the LangGraph tutorial series! You now know how to build:
- Basic chatbots
- Tool-calling ReAct agents
- Persistent memory systems
- Human-in-the-loop workflows
- Self-improving reflection loops
- Plan-and-execute agents
- Full research assistants

Happy building!