An intelligent multi-agent application that automates academic research and literature review generation from arXiv papers. Built with AutoGen, OpenAI GPT-4o, and Streamlit.
- Multi-Agent Collaboration: Two specialized agents working together (Research + Summarization)
- Real-time Streaming: Live output as agents process research
- Intelligent Search: Finds the most relevant papers using the arXiv API
- Professional Summaries: Generates formatted literature reviews
- User-Friendly Interface: Streamlit-based UI with configuration options
- Error Handling: Comprehensive logging and error management
Arxiv_Research_Paper/
├── app.py             # Main Streamlit application
├── pipeline.py        # Research orchestration
├── agents.py          # Agent initialization
├── constants.py       # Configuration constants
├── prompts.py         # Agent prompts & templates
├── utils.py           # Utility functions
├── requirements.txt   # Project dependencies
├── ARCHITECTURE.md    # Detailed architecture guide
└── .env               # Environment variables (not in repo)
- Python 3.8+
- OpenAI API key
- Clone or download the project
- Install dependencies:
  pip install -r requirements.txt
- Create a `.env` file in the project root:
  OPENAI_API_KEY=your_openai_api_key_here
- Run the application:
  streamlit run app.py

The application will open in your browser at http://localhost:8501
1. Select a Research Topic:
   - Choose from preset topics or enter a custom topic
   - Adjust the maximum number of papers to retrieve
2. Start Research:
   - Click the "Start Research" button
   - Watch real-time results stream in
3. View Results:
   - The ArXiv Research Agent fetches relevant papers
   - The Summarizer Agent creates a literature review
   - Both responses are displayed in Markdown format
All configuration is centralized in constants.py:
- Model: GPT-4o (configurable)
- Max Results: 5 papers (adjustable via UI)
- Agent Names: ArxivResearchAgent, SummarizerAgent
- Max Turns: 2 (conversation rounds)
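The configuration module could take roughly the following shape. This is an illustrative sketch based on the values listed above; the exact constant names are assumptions, not the project's verbatim code:

```python
# constants.py -- centralized configuration (sketch; actual names may differ)

MODEL = "gpt-4o"                 # OpenAI model used by both agents
DEFAULT_MAX_RESULTS = 5          # papers fetched per query (adjustable via UI)
ARXIV_AGENT_NAME = "ArxivResearchAgent"
SUMMARIZER_AGENT_NAME = "SummarizerAgent"
MAX_TURNS = 2                    # one round each: research, then summarization
```

Keeping these in one module means swapping models or renaming agents requires touching a single file.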
┌─────────────────────────────────────┐
│    User Input (Topic Selection)     │
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│         ArxivResearchAgent          │
│  - Formulates search query          │
│  - Fetches top relevant papers      │
│  - Returns JSON paper list          │
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│          SummarizerAgent            │
│  - Analyzes paper list              │
│  - Generates literature review      │
│  - Formats output in Markdown       │
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│   Display Results (Streamlit UI)    │
└─────────────────────────────────────┘
Main Streamlit interface with:
- Sidebar configuration panel
- Topic selection and input
- Real-time result display
- Error handling and user feedback
Orchestration layer with:
- `ResearchTeam` class for agent management
- Async research execution
- Stream-based output handling
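The streaming orchestration can be sketched with plain `asyncio`. This is a simplified stand-in, not the real AutoGen-based implementation: each "agent" here is just an async callable, and `run_stream` yields each agent's output as it completes, mirroring how the UI displays intermediate results:

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable

# Hypothetical stand-in for an AutoGen agent: an async callable that
# maps an input message to an output message.
Agent = Callable[[str], Awaitable[str]]

class ResearchTeam:
    """Runs agents in sequence, streaming each agent's output as it arrives."""

    def __init__(self, agents: list[Agent]) -> None:
        self.agents = agents

    async def run_stream(self, task: str) -> AsyncIterator[str]:
        message = task
        for agent in self.agents:
            message = await agent(message)
            yield message  # emit each intermediate result to the caller

# Stub agents standing in for the research and summarization steps.
async def _research(msg: str) -> str:
    return f"papers for: {msg}"

async def _summarize(msg: str) -> str:
    return f"review of: {msg}"

async def main() -> list[str]:
    team = ResearchTeam([_research, _summarize])
    return [chunk async for chunk in team.run_stream("quantum computing")]
```

In the real project, the agents would be AutoGen agents and the yielded chunks would be rendered incrementally by Streamlit.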
Agent initialization with:
- OpenAI client setup
- ArXiv Research Agent
- Summarizer Agent
- Environment variable management
Centralized configuration:
- Model settings
- Agent names
- UI configuration
- Default values
Prompt engineering:
- Research agent system message
- Summarizer agent system message
- Task templates
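A minimal sketch of the prompts module follows. The system-message wording and template text here are assumptions chosen to match the behavior described above, not the project's actual prompts:

```python
# prompts.py -- illustrative sketch; actual wording is an assumption

RESEARCH_SYSTEM_MESSAGE = (
    "You are an arXiv research assistant. Given a topic, formulate a search "
    "query, fetch the most relevant papers, and return them as a JSON list."
)

SUMMARIZER_SYSTEM_MESSAGE = (
    "You are an academic writer. Given a JSON list of papers, produce a "
    "concise literature review formatted in Markdown."
)

TASK_TEMPLATE = "Conduct a literature review on the topic: {topic}"

def build_task(topic: str) -> str:
    """Fill the task template with the user's chosen topic."""
    return TASK_TEMPLATE.format(topic=topic)
```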
Utility functions:
- `arxiv_research()`: Search the arXiv API
- `format_papers_for_display()`: Format output for display
- Logging utilities
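The display formatter might look like the following sketch. The dict keys (`title`, `authors`, `url`, `summary`) are assumptions about the shape of data returned by `arxiv_research()`; the real module may use different field names:

```python
def format_papers_for_display(papers: list[dict]) -> str:
    """Render a list of paper dicts as a numbered Markdown list.

    Assumed keys per paper: title, authors, url, summary.
    """
    lines = []
    for i, paper in enumerate(papers, start=1):
        lines.append(f"{i}. **{paper['title']}**")
        lines.append(f"   - Authors: {', '.join(paper['authors'])}")
        lines.append(f"   - Link: {paper['url']}")
        lines.append(f"   - {paper['summary']}")
    return "\n".join(lines)
```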
Required in .env:
OPENAI_API_KEY: Your OpenAI API key
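Loading the key with `python-dotenv` (listed in the dependencies) might look like this sketch; the helper name `get_api_key` is hypothetical, and failing early with a clear message saves a confusing mid-run API error:

```python
import os

# python-dotenv loads .env into the process environment; degrade
# gracefully if it isn't installed so os.environ still works.
try:
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass

def get_api_key() -> str:
    """Return the OpenAI key, raising early with a clear message if missing."""
    key = os.getenv("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY not set; add it to your .env file")
    return key
```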
- `autogen-agentchat`: Multi-agent orchestration
- `autogen-ext`: AutoGen extensions
- `autogen-ext[openai]`: OpenAI integration
- `streamlit`: Web UI framework
- `arxiv`: arXiv API client
- `python-dotenv`: Environment variable management
See requirements.txt for specific versions.
- API Costs: Running this application will incur OpenAI API costs
- Rate Limiting: Be mindful of arXiv API rate limits
- API Key: Never commit the `.env` file to version control
- Async Execution: The application runs async tasks; ensure your environment is set up properly
- Ensure the `.env` file exists in the project root
- Verify the key format is correct
- Check file permissions
- Ensure all dependencies are installed
- Try running: pip install -r requirements.txt --upgrade
- Check if port 8501 is available
- Check internet connection
- Verify OpenAI API status
- Reduce max results for faster processing
- Typical Research Time: 30-60 seconds for 5 papers
- Concurrent Operations: Supports async processing
- Memory Usage: Minimal for normal usage
This project is open source. Use and modify as needed.
To improve this project:
- Add tests (currently missing)
- Implement result caching
- Add data persistence
- Improve error recovery
- Enhance UI/UX
For issues or questions:
- Check the ARCHITECTURE.md file for detailed information
- Review error logs in the terminal
- Verify environment setup
- Result caching and persistence
- User authentication and multi-user support
- Export research to PDF/Word
- Integration with reference management tools
- Advanced filtering and sorting options
- Custom prompt templates
- Batch research jobs