bdeva1975/langgraph-research-agent

🤖 LangGraph Research Agent

A production-grade agentic research assistant built with LangGraph — web search, scraping, summarization and query rewriting in one state machine.

Give it a topic. It searches the web, scrapes relevant pages, summarizes each source, and compiles a professional research report with citations — fully autonomously.


🎯 What It Does

python agent.py "RAG indexing strategies in LLM applications"
🤖 LangGraph Research Agent
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📌 Topic: RAG indexing strategies in LLM applications

📊 Graph Structure:
  START → [rewrite] → [search] → [scrape] → [summarize] → [compile] → END

🔍 Queries generated:  3
🌐 Pages found:        14
📄 Pages scraped:      11
📝 Sources summarized: 8
💾 Report saved to:    outputs/rag_indexing_strategies_20260407_160432.md
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

The agent produces a full research report with sections, insights, and citations — saved as a Markdown file.
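
The report filename in the run above looks like a short topic slug followed by a timestamp. A minimal sketch of how such a path could be built is shown here; the function name `report_path` and the three-word slug limit are assumptions inferred from the sample filename, not taken from the project's code.

```python
import re
from datetime import datetime
from pathlib import Path


def report_path(topic: str, now: datetime, out_dir: str = "outputs", max_words: int = 3) -> Path:
    """Build '<out_dir>/<slug>_<YYYYMMDD_HHMMSS>.md' from a topic (hypothetical helper)."""
    # Keep only lowercase alphanumeric words so the slug is filesystem-safe.
    words = re.findall(r"[a-z0-9]+", topic.lower())
    # Assumption: the slug keeps the first few words, as the sample filename suggests.
    slug = "_".join(words[:max_words]) or "report"
    return Path(out_dir) / f"{slug}_{now:%Y%m%d_%H%M%S}.md"


path = report_path(
    "RAG indexing strategies in LLM applications",
    datetime(2026, 4, 7, 16, 4, 32),
)
print(path.as_posix())  # outputs/rag_indexing_strategies_20260407_160432.md
```

Passing `now` in explicitly (instead of calling `datetime.now()` inside the function) keeps the helper deterministic and easy to test.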


🧠 Agent Architecture

The agent is built as a LangGraph state machine with 5 nodes:

START
  ↓
[rewrite]    — Generates 3 optimized search queries from the topic
  ↓
[search]     — Searches the web using Tavily API
  ↓
[scrape]     — Scrapes and cleans each result page
  ↓
[summarize]  — Summarizes each source using LLM, filters irrelevant ones
  ↓
[compile]    — Compiles a structured research report with citations
  ↓
END

Each node reads from and writes to a shared ResearchState TypedDict — the single source of truth flowing through the graph.
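
To make the shared-state idea concrete, here is a plain-Python sketch with no LangGraph dependency. Each node receives the current state and returns only the fields it changed; those partial updates are merged back into the state, which is also how LangGraph treats a node's return value by default. The field names, the two stand-in nodes, and `run_pipeline` are illustrative assumptions; the real `src/state.py` and node implementations may look different.

```python
from typing import Callable, TypedDict


class ResearchState(TypedDict, total=False):
    # Field names are illustrative; the real src/state.py may differ.
    topic: str
    queries: list[str]
    report: str


Node = Callable[[ResearchState], ResearchState]


def rewrite(state: ResearchState) -> ResearchState:
    """Hypothetical stand-in for the [rewrite] node: derive queries from the topic."""
    topic = state["topic"]
    return {"queries": [topic, f"{topic} overview", f"{topic} best practices"]}


def compile_report(state: ResearchState) -> ResearchState:
    """Hypothetical stand-in for the [compile] node: build a report stub."""
    lines = [f"# Research Report: {state['topic']}"]
    lines += [f"- {query}" for query in state["queries"]]
    return {"report": "\n".join(lines)}


def run_pipeline(state: ResearchState, nodes: list[Node]) -> ResearchState:
    """Run nodes in order, merging each partial update into the shared state."""
    for node in nodes:
        state = {**state, **node(state)}
    return state


final = run_pipeline({"topic": "RAG indexing"}, [rewrite, compile_report])
print(final["report"])
```

Because every node returns only its own fields, a node can be added or removed without touching the others, as long as the fields it reads are produced earlier in the chain.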


📁 Project Structure

langgraph-research-agent/
│
├── src/
│   ├── state.py        # ResearchState TypedDict — shared agent memory
│   ├── rewriter.py     # Node: generates optimized search queries
│   ├── search.py       # Node: Tavily web search
│   ├── scraper.py      # Node: scrapes and cleans web pages
│   ├── summarizer.py   # Node: summarizes sources + compiles report
│   └── graph.py        # LangGraph state machine definition
│
├── outputs/            # ← generated research reports saved here
│
├── agent.py            # Main entry point
├── config.yaml         # All settings
└── requirements.txt    # Dependencies

🚀 Quickstart

1. Clone the repo

git clone https://github.com/bdeva1975/langgraph-research-agent.git
cd langgraph-research-agent

2. Create and activate virtual environment

python -m venv venv

# Windows
venv\Scripts\activate

# Mac/Linux
source venv/bin/activate

3. Install dependencies

pip install -r requirements.txt

4. Set your API keys

Create a .env file in the root folder:

OPENAI_API_KEY=your_openai_api_key_here
TAVILY_API_KEY=your_tavily_api_key_here

Get a free Tavily API key at: https://tavily.com (1000 searches/month free)
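
A quick way to fail early on a missing key is to check the environment before the graph starts. The sketch below assumes the `.env` file has already been loaded into the process environment (for example with python-dotenv, which this project may or may not use). The helper `missing_keys` is hypothetical, and treating values that start with `your_` as unset is an assumption based on the template above.

```python
import os
from typing import Mapping

REQUIRED_KEYS = ("OPENAI_API_KEY", "TAVILY_API_KEY")
# Assumption: values still matching the README template count as unset.
PLACEHOLDER_PREFIX = "your_"


def missing_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required API keys that are absent, blank, or still placeholders."""
    missing = []
    for key in REQUIRED_KEYS:
        value = env.get(key, "").strip()
        if not value or value.startswith(PLACEHOLDER_PREFIX):
            missing.append(key)
    return missing


# Example with an explicit mapping instead of the real environment.
print(missing_keys({"OPENAI_API_KEY": "sk-example", "TAVILY_API_KEY": "your_tavily_api_key_here"}))
```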

5. Run the agent

# Pass topic as argument
python agent.py "your research topic here"

# Or run interactively
python agent.py
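
The two invocation modes above can be handled by one small function: use the command-line argument when present, otherwise prompt. This is a sketch of that pattern only. `resolve_topic` and the prompt text are hypothetical and not taken from `agent.py`.

```python
from typing import Callable, Sequence


def resolve_topic(argv: Sequence[str], ask: Callable[[str], str] = input) -> str:
    """Return the topic from the command line, or prompt for it if none was given."""
    if len(argv) > 1:
        # Join all arguments so an unquoted multi-word topic still works.
        topic = " ".join(argv[1:])
    else:
        topic = ask("Research topic: ")
    topic = topic.strip()
    if not topic:
        raise ValueError("A non-empty research topic is required.")
    return topic


# In a real entry point this would be resolve_topic(sys.argv).
print(resolve_topic(["agent.py", "RAG indexing strategies in LLM applications"]))
```

Injecting `ask` rather than calling `input` directly makes the interactive branch testable without a terminal.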

⚙️ Configuration

All settings are in config.yaml:

llm:
  model: "gpt-4o-mini"        # change to gpt-4o for higher quality
  temperature: 0
  max_tokens: 2000

search:
  max_results: 5              # results per search query
  max_searches: 3             # number of search queries to generate

scraper:
  max_chars: 5000             # max characters scraped per page
  timeout: 10                 # request timeout in seconds

summarizer:
  max_summary_length: 500     # max words per source summary
  final_report_length: 1000   # max words for final report

rewriter:
  num_queries: 3              # search queries to generate

output:
  dir: "outputs/"
  save_report: true           # save report as markdown file
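
If you only want to override one or two settings, a common pattern is to layer the parsed file over built-in defaults. The sketch below shows a recursive merge on plain dictionaries; it assumes the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`). `DEFAULTS` and `merge_config` are hypothetical and not part of this repo.

```python
from copy import deepcopy
from typing import Any

# Subset of the settings shown above, used as fallback values (illustrative).
DEFAULTS: dict[str, Any] = {
    "llm": {"model": "gpt-4o-mini", "temperature": 0, "max_tokens": 2000},
    "search": {"max_results": 5, "max_searches": 3},
    "scraper": {"max_chars": 5000, "timeout": 10},
}


def merge_config(defaults: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of defaults with overrides applied, merging nested sections."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge section by section so unspecified keys keep their defaults.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


config = merge_config(DEFAULTS, {"llm": {"model": "gpt-4o"}})
print(config["llm"])  # {'model': 'gpt-4o', 'temperature': 0, 'max_tokens': 2000}
```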

📊 Sample Output

The agent produces a structured Markdown report:

# Research Report: RAG Indexing Strategies in LLM Applications
*Generated: 2026-04-07 16:04*

## Introduction
Retrieval-Augmented Generation (RAG) is an innovative approach...

## Key RAG Indexing Strategies
1. **Hybrid Approaches** — combine keyword and vector retrieval [Source 1]
2. **Hierarchical Indexing** — tiered structure for nuanced retrieval [Source 2]
3. **Graph-Based Indexing** — relationship-aware retrieval [Source 3]

## Challenges
...

## Conclusion
...

## Sources
1. [RAG indexing: Structure and evaluate...](https://meilisearch.com/...)
2. [Designing RAG Application: A Case Study](https://pixion.co/...)

💡 Key Design Decisions

Why LangGraph over LangChain chains? LangGraph gives you a proper state machine with conditional edges, shared state, and easy extensibility. Adding a new node (e.g., a fact-checker or citation validator) takes only a few lines: write the node function, register it on the graph, and rewire the edges around it.

Why Tavily over DuckDuckGo? Tavily is purpose-built for LLM agents: it returns clean, structured results, has generous rate limits, and offers a free tier of 1000 searches/month.

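Why TypedDict for state? Static type checking across all nodes. Every node declares exactly which fields it can read and what it should return, and a type checker such as mypy can verify that. TypedDict is not enforced at runtime, so the guarantee comes from the type checker, not from the running graph.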


📖 Based On

Concepts and techniques from:

AI Agents and Applications with LangChain, LangGraph and MCP — Roberto Infante (Manning, 2026) Chapters 4 & 5: Research Summarization Engine and Agentic Workflows with LangGraph


📄 License

MIT License — free to use, modify, and distribute.


If this repo helped you, please consider giving it a ⭐ — it helps others find it.
