🤖 LangGraph Research Agent

A production-grade agentic research assistant built with LangGraph: query rewriting, web search, scraping, and summarization in one state machine.

Give it a topic. It searches the web, scrapes relevant pages, summarizes each source, and compiles a professional research report with citations — fully autonomously.

🎯 What It Does

python agent.py "RAG indexing strategies in LLM applications"

🤖 LangGraph Research Agent
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📌 Topic: RAG indexing strategies in LLM applications

📊 Graph Structure:
  START → [rewrite] → [search] → [scrape] → [summarize] → [compile] → END

🔍 Queries generated:  3
🌐 Pages found:        14
📄 Pages scraped:      11
📝 Sources summarized: 8
💾 Report saved to:    outputs/rag_indexing_strategies_20260407_160432.md
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

The agent produces a full research report with sections, insights, and citations — saved as a Markdown file.
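The report filename in the run above appears to combine a short slug of the topic with a timestamp. The sketch below shows one way to produce such a name. The `report_path` helper and the rule of keeping only the first three words are assumptions inferred from that single example, not the repo's confirmed logic.

```python
import re
from datetime import datetime


def report_path(topic: str, when: datetime, out_dir: str = "outputs/",
                max_words: int = 3) -> str:
    """Build a report path like outputs/<slug>_<YYYYMMDD_HHMMSS>.md.

    Hypothetical helper: the slug rule (first `max_words` lowercase
    alphanumeric words joined by underscores) is an assumption.
    """
    words = re.findall(r"[a-z0-9]+", topic.lower())
    slug = "_".join(words[:max_words]) or "report"
    stamp = when.strftime("%Y%m%d_%H%M%S")
    return f"{out_dir.rstrip('/')}/{slug}_{stamp}.md"


if __name__ == "__main__":
    print(report_path("RAG indexing strategies in LLM applications",
                      datetime(2026, 4, 7, 16, 4, 32)))
```

Including the timestamp in the name means repeated runs on the same topic do not overwrite earlier reports.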

🧠 Agent Architecture

The agent is built as a LangGraph state machine with 5 nodes:

START
  ↓
[rewrite]    — Generates 3 optimized search queries from the topic
  ↓
[search]     — Searches the web using Tavily API
  ↓
[scrape]     — Scrapes and cleans each result page
  ↓
[summarize]  — Summarizes each source using LLM, filters irrelevant ones
  ↓
[compile]    — Compiles a structured research report with citations
  ↓
END

Each node reads from and writes to a shared ResearchState TypedDict — the single source of truth flowing through the graph.
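To make the shared-state idea concrete, here is a minimal sketch of how nodes might read from and write to such a state. The field names (`queries`, `urls`, and so on), the stub node bodies, and the `run_linear` helper are illustrative assumptions, not the contents of `src/state.py`. `run_linear` is a plain-Python stand-in that mimics how LangGraph merges each node's returned keys into the state by default. `build_graph` shows roughly how the same nodes could be wired with LangGraph's `StateGraph`; it imports `langgraph` lazily so the rest of the sketch runs without it.

```python
from typing import Callable, TypedDict


class ResearchState(TypedDict, total=False):
    """Shared state; the field names here are illustrative assumptions."""
    topic: str
    queries: list[str]
    urls: list[str]
    pages: dict[str, str]       # url -> cleaned page text
    summaries: dict[str, str]   # url -> summary
    report: str


# A node takes the current state and returns only the keys it updates.
Node = Callable[[ResearchState], dict]


def rewrite(state: ResearchState) -> dict:
    # Stand-in for the LLM call: derive three query variants from the topic.
    topic = state["topic"]
    return {"queries": [topic, f"{topic} overview", f"{topic} best practices"]}


def search(state: ResearchState) -> dict:
    # Stand-in for the Tavily call: one placeholder URL per query.
    return {"urls": [f"https://example.com/{i}" for i, _ in enumerate(state["queries"])]}


def run_linear(nodes: list[Node], state: ResearchState) -> ResearchState:
    """Run nodes in order, merging each partial update into the state."""
    for node in nodes:
        state = {**state, **node(state)}
    return state


def build_graph():
    """Wire the same two nodes with LangGraph (requires `pip install langgraph`)."""
    from langgraph.graph import END, START, StateGraph

    builder = StateGraph(ResearchState)
    builder.add_node("rewrite", rewrite)
    builder.add_node("search", search)
    builder.add_edge(START, "rewrite")
    builder.add_edge("rewrite", "search")
    builder.add_edge("search", END)
    return builder.compile()


if __name__ == "__main__":
    final = run_linear([rewrite, search], {"topic": "RAG indexing"})
    print(final["queries"], final["urls"])
```

Because every node returns only the keys it changes, earlier fields such as `topic` stay available to every later node without being passed explicitly.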

📁 Project Structure

langgraph-research-agent/
│
├── src/
│   ├── state.py        # ResearchState TypedDict — shared agent memory
│   ├── rewriter.py     # Node: generates optimized search queries
│   ├── search.py       # Node: Tavily web search
│   ├── scraper.py      # Node: scrapes and cleans web pages
│   ├── summarizer.py   # Node: summarizes sources + compiles report
│   └── graph.py        # LangGraph state machine definition
│
├── outputs/            # ← generated research reports saved here
│
├── agent.py            # Main entry point
├── config.yaml         # All settings
└── requirements.txt    # Dependencies

🚀 Quickstart

1. Clone the repo

git clone https://github.com/bdeva1975/langgraph-research-agent.git
cd langgraph-research-agent

2. Create and activate a virtual environment

python -m venv venv

# Windows
venv\Scripts\activate

# Mac/Linux
source venv/bin/activate

3. Install dependencies

pip install -r requirements.txt

4. Set your API keys

Create a .env file in the root folder:

OPENAI_API_KEY=your_openai_api_key_here
TAVILY_API_KEY=your_tavily_api_key_here

Get a free Tavily API key at: https://tavily.com (1000 searches/month free)
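Before the first run it can help to confirm that both keys are actually set. The following is a small standard-library sketch of such a check. The `parse_env_file` and `missing_keys` helpers are hypothetical and only handle simple `KEY=value` lines; the project itself may load `.env` differently (for example with a dotenv library).

```python
import os

REQUIRED_KEYS = ("OPENAI_API_KEY", "TAVILY_API_KEY")


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines; blank lines and # comments are skipped."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    env = dict(os.environ)
    if os.path.exists(".env"):
        with open(".env", encoding="utf-8") as handle:
            # Real environment variables take precedence over .env entries.
            env = {**parse_env_file(handle.read()), **env}
    missing = missing_keys(env)
    print("Missing keys:", ", ".join(missing) if missing else "none")
```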

5. Run the agent

# Pass topic as argument
python agent.py "your research topic here"

# Or run interactively
python agent.py

⚙️ Configuration

All settings are in config.yaml:

llm:
  model: "gpt-4o-mini"        # change to gpt-4o for higher quality
  temperature: 0
  max_tokens: 2000

search:
  max_results: 5              # results per search query
  max_searches: 3             # number of search queries to generate

scraper:
  max_chars: 5000             # max characters scraped per page
  timeout: 10                 # request timeout in seconds

summarizer:
  max_summary_length: 500     # max words per source summary
  final_report_length: 1000   # max words for final report

rewriter:
  num_queries: 3              # search queries to generate

output:
  dir: "outputs/"
  save_report: true           # save report as markdown file
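These settings multiply, so it is worth estimating the upper bounds they imply. The sketch below mirrors three values from the config above in a plain dict and computes the maximum number of pages and scraped characters per run. It assumes `rewriter.num_queries` controls how many searches run and that each search returns at most `search.max_results` pages; the helper names are hypothetical.

```python
# Values copied from the config.yaml shown above.
CONFIG = {
    "search": {"max_results": 5},
    "scraper": {"max_chars": 5000},
    "rewriter": {"num_queries": 3},
}


def page_budget(config: dict) -> int:
    """Upper bound on pages found: queries x results per query."""
    return config["rewriter"]["num_queries"] * config["search"]["max_results"]


def char_budget(config: dict) -> int:
    """Upper bound on scraped characters passed on to the summarizer."""
    return page_budget(config) * config["scraper"]["max_chars"]


if __name__ == "__main__":
    print("max pages:", page_budget(CONFIG))
    print("max scraped chars:", char_budget(CONFIG))
```

With the defaults this gives at most 15 pages, which is consistent with the 14 pages found in the sample run. Raising `max_results` or `num_queries` increases LLM usage roughly in proportion.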

📊 Sample Output

The agent produces a structured Markdown report:

# Research Report: RAG Indexing Strategies in LLM Applications
*Generated: 2026-04-07 16:04*

## Introduction
Retrieval-Augmented Generation (RAG) is an innovative approach...

## Key RAG Indexing Strategies
1. **Hybrid Approaches** — combine keyword and vector retrieval [Source 1]
2. **Hierarchical Indexing** — tiered structure for nuanced retrieval [Source 2]
3. **Graph-Based Indexing** — relationship-aware retrieval [Source 3]

## Challenges
...

## Conclusion
...

## Sources
1. [RAG indexing: Structure and evaluate...](https://meilisearch.com/...)
2. [Designing RAG Application: A Case Study](https://pixion.co/...)

💡 Key Design Decisions

Why LangGraph over LangChain chains? LangGraph gives you an explicit state machine with conditional edges, shared state, and easy extensibility. Adding a new node (e.g., a fact-checker or citation validator) takes only a few lines: register the node and connect its edges.

Why Tavily over DuckDuckGo? Tavily is purpose-built for LLM agents: it returns clean, structured results and has a free tier of 1000 searches/month.

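Why TypedDict for state? Static type checking across all nodes. Every node knows exactly which fields are available and what it should return, and a type checker such as mypy can flag mismatches. TypedDict itself does not validate values at runtime.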
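As an illustration of the citation validator mentioned above, here is a sketch of what such a node function could look like. The node name, the `report` and `sources` state fields, and the `dangling_citations` key are all hypothetical. The function follows the same contract as the other nodes (state in, partial update out), so wiring it in would be an `add_node` call plus an edge or two in `src/graph.py`.

```python
import re


def validate_citations(state: dict) -> dict:
    """Hypothetical node: find [Source n] markers with no matching source.

    Assumes the state holds the compiled `report` text and a `sources`
    list whose 1-based positions match the citation numbers.
    """
    report = state.get("report", "")
    num_sources = len(state.get("sources", []))
    cited = {int(n) for n in re.findall(r"\[Source (\d+)\]", report)}
    dangling = sorted(n for n in cited if not 1 <= n <= num_sources)
    return {"dangling_citations": dangling}


if __name__ == "__main__":
    example = {
        "report": "Hybrid retrieval [Source 1] and graph indexing [Source 3].",
        "sources": ["https://example.com/a", "https://example.com/b"],
    }
    print(validate_citations(example))
```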

🔗 Related Projects

rag-indexing-benchmark — Compare 6 RAG indexing strategies on your own documents
langchain-query-transformer-lab — Compare 4 query transformation techniques side by side

📖 Based On

Concepts and techniques from:

AI Agents and Applications with LangChain, LangGraph and MCP, by Roberto Infante (Manning, 2026), Chapters 4 & 5: Research Summarization Engine and Agentic Workflows with LangGraph

📄 License

MIT License — free to use, modify, and distribute.

If this repo helped you, please consider giving it a ⭐ — it helps others find it.
