A production-grade agentic research assistant built with LangGraph — web search, scraping, summarization and query rewriting in one state machine.
Give it a topic. It searches the web, scrapes relevant pages, summarizes each source, and compiles a professional research report with citations — fully autonomously.
python agent.py "RAG indexing strategies in LLM applications"
🤖 LangGraph Research Agent
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📌 Topic: RAG indexing strategies in LLM applications
📊 Graph Structure:
START → [rewrite] → [search] → [scrape] → [summarize] → [compile] → END
🔍 Queries generated: 3
🌐 Pages found: 14
📄 Pages scraped: 11
📝 Sources summarized: 8
💾 Report saved to: outputs/rag_indexing_strategies_20260407_160432.md
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
The agent produces a full research report with sections, insights, and citations — saved as a Markdown file.
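The saved filename in the sample run combines a short topic slug with a timestamp. A minimal sketch of one way to build such a path is shown here; the `report_path` helper and the rule of keeping the first three words are assumptions chosen to reproduce the sample filename, not necessarily what `agent.py` does.

```python
import re
from datetime import datetime
from pathlib import Path


def report_path(topic: str, now: datetime, out_dir: str = "outputs", max_words: int = 3) -> Path:
    """Build <out_dir>/<slug>_<YYYYMMDD_HHMMSS>.md from a topic string (hypothetical helper)."""
    # Lowercase the topic and keep alphanumeric runs only.
    words = re.findall(r"[a-z0-9]+", topic.lower())
    # Assumption: the slug uses only the first few words of the topic.
    slug = "_".join(words[:max_words]) or "report"
    return Path(out_dir) / f"{slug}_{now:%Y%m%d_%H%M%S}.md"


path = report_path("RAG indexing strategies in LLM applications", datetime(2026, 4, 7, 16, 4, 32))
print(path.as_posix())
```

Passing the timestamp in as an argument, rather than calling `datetime.now()` inside the helper, keeps the function deterministic and easy to test.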
The agent is built as a LangGraph state machine with 5 nodes:
START
↓
[rewrite] — Generates 3 optimized search queries from the topic
↓
[search] — Searches the web using Tavily API
↓
[scrape] — Scrapes and cleans each result page
↓
[summarize] — Summarizes each source using LLM, filters irrelevant ones
↓
[compile] — Compiles a structured research report with citations
↓
END
Each node reads from and writes to a shared ResearchState TypedDict — the single source of truth flowing through the graph.
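To make the shared-state idea concrete, here is a minimal plain-Python sketch. The field names, the two stand-in nodes, and the `run_linear` helper are all assumptions for illustration; the real schema lives in `src/state.py`, and the real graph is wired with LangGraph rather than a loop.

```python
from typing import Callable, TypedDict


class ResearchState(TypedDict, total=False):
    # Field names are assumptions for illustration, not the repo's actual schema.
    topic: str
    queries: list[str]
    report: str


def rewrite(state: ResearchState) -> ResearchState:
    # Stand-in for the LLM call: derive three query variants from the topic.
    topic = state["topic"]
    return {"queries": [topic, f"{topic} overview", f"{topic} best practices"]}


def compile_report(state: ResearchState) -> ResearchState:
    # Stand-in for report compilation: list the queries under a title.
    lines = [f"# Research Report: {state['topic']}"]
    lines += [f"- {query}" for query in state["queries"]]
    return {"report": "\n".join(lines)}


def run_linear(nodes: list[Callable[[ResearchState], ResearchState]],
               state: ResearchState) -> ResearchState:
    # Each node returns a partial update that is merged into the shared state,
    # similar in spirit to how state flows along a linear graph.
    for node in nodes:
        state = {**state, **node(state)}
    return state


final = run_linear([rewrite, compile_report], {"topic": "RAG indexing"})
print(final["report"])
```

Because every node returns only the keys it changed, nodes stay independent of each other and can be tested one at a time with a hand-built state dictionary.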
langgraph-research-agent/
│
├── src/
│ ├── state.py # ResearchState TypedDict — shared agent memory
│ ├── rewriter.py # Node: generates optimized search queries
│ ├── search.py # Node: Tavily web search
│ ├── scraper.py # Node: scrapes and cleans web pages
│ ├── summarizer.py # Node: summarizes sources + compiles report
│ └── graph.py # LangGraph state machine definition
│
├── outputs/ # ← generated research reports saved here
│
├── agent.py # Main entry point
├── config.yaml # All settings
└── requirements.txt # Dependencies
git clone https://github.com/bdeva1975/langgraph-research-agent.git
cd langgraph-research-agent
python -m venv venv
# Windows
venv\Scripts\activate
# Mac/Linux
source venv/bin/activate
pip install -r requirements.txt
Create a .env file in the root folder:
OPENAI_API_KEY=your_openai_api_key_here
TAVILY_API_KEY=your_tavily_api_key_here
Get a free Tavily API key at: https://tavily.com (1000 searches/month free)
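A missing key usually surfaces as an opaque API error partway through a run, so it can help to check for both keys up front. The sketch below assumes the `.env` values have already been loaded into the process environment (for example with python-dotenv's `load_dotenv`, which is an assumption about how this repo reads the file); `missing_keys` is a hypothetical helper, not part of the project.

```python
import os
from collections.abc import Mapping

# The two keys named in the .env example above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "TAVILY_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required keys that are unset or blank in env (hypothetical helper)."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        raise SystemExit(f"Missing API keys: {', '.join(missing)}")
    print("All API keys are set.")
```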
# Pass topic as argument
python agent.py "your research topic here"
# Or run interactively
python agent.py
All settings are in config.yaml:
llm:
  model: "gpt-4o-mini" # change to gpt-4o for higher quality
  temperature: 0
  max_tokens: 2000
search:
  max_results: 5 # results per search query
  max_searches: 3 # number of search queries to generate
scraper:
  max_chars: 5000 # max characters scraped per page
  timeout: 10 # request timeout in seconds
summarizer:
  max_summary_length: 500 # max words per source summary
  final_report_length: 1000 # max words for final report
rewriter:
  num_queries: 3 # search queries to generate
output:
  dir: "outputs/"
  save_report: true # save report as markdown file
The agent produces a structured Markdown report:
# Research Report: RAG Indexing Strategies in LLM Applications
*Generated: 2026-04-07 16:04*
## Introduction
Retrieval-Augmented Generation (RAG) is an innovative approach...
## Key RAG Indexing Strategies
1. **Hybrid Approaches** — combine keyword and vector retrieval [Source 1]
2. **Hierarchical Indexing** — tiered structure for nuanced retrieval [Source 2]
3. **Graph-Based Indexing** — relationship-aware retrieval [Source 3]
## Challenges
...
## Conclusion
...
## Sources
1. [RAG indexing: Structure and evaluate...](https://meilisearch.com/...)
2. [Designing RAG Application: A Case Study](https://pixion.co/...)
Why LangGraph over LangChain chains? LangGraph gives you a proper state machine with conditional edges, shared state, and easy extensibility. Adding a new node (e.g., a fact-checker or citation validator) takes 5 lines.
Why Tavily over DuckDuckGo? Tavily is purpose-built for LLM agents: it returns clean, structured results, its rate limits are generous enough for an agent run like this one, and it has a free tier of 1000 searches/month.
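Why TypedDict for state? Static type safety across all nodes. Every node declares exactly which fields are available and what it should return, so a type checker such as mypy can catch mismatches before the agent runs. Note that TypedDict is not enforced at runtime.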
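TypedDict annotations are checked by static type checkers rather than enforced at runtime, so a misspelled key in a node's return value is not rejected by Python itself. A minimal sketch of a runtime guard is shown here; the fields and the `unknown_keys` helper are hypothetical and only illustrate the idea.

```python
from typing import TypedDict, get_type_hints


class ResearchState(TypedDict, total=False):
    # Hypothetical fields for illustration only.
    topic: str
    queries: list[str]


def unknown_keys(update: dict) -> set[str]:
    """Return keys in a node's update that ResearchState does not declare."""
    return set(update) - set(get_type_hints(ResearchState))


# A typo such as "querys" is accepted at runtime unless something checks for it.
print(unknown_keys({"querys": ["rag indexing"]}))
```

A check like this could run in tests or in a thin wrapper around each node, while the type checker covers the same mistakes at development time.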
- rag-indexing-benchmark — Compare 6 RAG indexing strategies on your own documents
- langchain-query-transformer-lab — Compare 4 query transformation techniques side by side
Concepts and techniques from:
AI Agents and Applications with LangChain, LangGraph and MCP — Roberto Infante (Manning, 2026) Chapters 4 & 5: Research Summarization Engine and Agentic Workflows with LangGraph
MIT License — free to use, modify, and distribute.
If this repo helped you, please consider giving it a ⭐ — it helps others find it.