A comprehensive data analysis platform that enables natural language querying of CSV and Excel files using AI workflows and real-time processing.
This platform provides an intuitive interface for analyzing CSV and Excel files through natural language queries. Built with LangGraph workflows, it supports multi-file analysis, temporal data processing, and advanced statistical operations.
- Natural Language Queries: Ask questions about your data in plain English
- Multi-file Analysis: Process and correlate data across multiple CSV/Excel files
- Real-time Processing: WebSocket-powered progress updates during analysis
- Advanced Analytics: Statistical correlation, anomaly detection, and forecasting
- Multi-LLM Support: Google Gemini (recommended), Groq, OpenAI, and Anthropic APIs
- Session Persistence: Maintains conversation context across sessions
- Production Ready: Comprehensive error handling and retry mechanisms
- Python 3.8+
- Node.js 14+
- npm or yarn
- AI API key (see API Key Setup below)
You need an API key from one of the supported providers. Google Gemini is recommended for the best balance of performance, cost, and reliability.
1. Clone the repository

   ```bash
   git clone https://github.com/Ruchir-r/LangGraph-CSV-Analysis-Task.git
   cd LangGraph-CSV-Analysis-Task
   ```
2. Set up environment variables

   ```bash
   cp backend/.env.example backend/.env
   ```

   Edit `backend/.env` and add your API key.

   For Google Gemini (recommended):

   ```bash
   GOOGLE_API_KEY=AIzaSy...
   DEFAULT_LLM_PROVIDER=google
   DEFAULT_MODEL=gemini-1.5-flash
   ```

   For Groq (fast & free):

   ```bash
   GROQ_API_KEY=gsk_...
   DEFAULT_LLM_PROVIDER=groq
   DEFAULT_MODEL=llama-3.1-8b-instant
   ```

   For OpenAI:

   ```bash
   OPENAI_API_KEY=sk-...
   DEFAULT_LLM_PROVIDER=openai
   DEFAULT_MODEL=gpt-4o-mini
   ```

   For Anthropic:

   ```bash
   ANTHROPIC_API_KEY=sk-ant-...
   DEFAULT_LLM_PROVIDER=anthropic
   DEFAULT_MODEL=claude-3-haiku-20240307
   ```
3. Install dependencies

   ```bash
   # Backend dependencies
   cd backend
   pip install -r requirements.txt
   cd ..

   # Frontend dependencies
   cd frontend
   npm install
   cd ..
   ```
4. Start the application

   ```bash
   # Option 1: Use the deployment script (recommended)
   ./deploy.sh

   # Option 2: Manual startup
   # Terminal 1 - Backend
   cd backend && python main.py

   # Terminal 2 - Frontend
   cd frontend && npm start
   ```
5. Access the application

   - Frontend: http://localhost:3000
   - Backend API: http://localhost:8000
   - API Documentation: http://localhost:8000/docs
- Upload Files: Use the web interface to upload CSV or Excel files
- Ask Questions: Type natural language queries about your data
- View Results: Get comprehensive analysis with visualizations and insights
"Show average revenue by region for the last two months"
"Which product had the highest growth rate?"
"Detect any anomalies in the sales data"
"What is the correlation between discount and revenue?"
"Forecast next quarter's revenue based on current trends"
The system maintains context across questions:
User: "How did Product A perform last quarter?"
System: [Provides detailed analysis]
User: "What about compared to Product B?"
System: [Compares both products using previous context]
User: "What should we do to improve performance?"
System: [Provides actionable recommendations]
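A rough sketch of how a follow-up question might reuse the same session over the API; the `session_id` field name is an assumption based on the session endpoints listed under API Endpoints:

```python
import requests

BASE = "http://localhost:8000/api/v2"
SESSION = {"session_id": "demo-session"}  # assumed field name; see /docs

# First question establishes context within the session
requests.post(f"{BASE}/analysis/simple",
              json={**SESSION, "query": "How did Product A perform last quarter?"})

# The follow-up can then rely on the earlier answer
followup = requests.post(f"{BASE}/analysis/simple",
                         json={**SESSION, "query": "What about compared to Product B?"})
print(followup.json())
```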
- Frontend: React application with Material-UI components
- Backend: FastAPI server with REST and WebSocket endpoints
- AI Engine: LangGraph workflow with 8-node processing pipeline
- Database: SQLite for session and file management
1. Parse Files: Extract and validate data from uploaded files
2. Plan Operations: Determine analysis strategy based on query
3. Align Timeseries: Synchronize temporal data across files
4. Generate Code: Create analysis code using LLM
5. Validate Code: Check code safety and correctness
6. Execute Code: Run analysis with error handling
7. Trend Analysis: Perform statistical analysis and pattern detection
8. Explain Results: Generate human-readable insights and recommendations
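For illustration only (the project's real implementation lives in `backend/langgraph_workflow.py`), a linear eight-node pipeline like this can be wired in LangGraph roughly as follows; the state fields and node bodies are placeholder assumptions:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AnalysisState(TypedDict, total=False):
    query: str        # the user's natural language question
    parsed: dict      # validated data extracted from uploaded files
    plan: dict        # chosen analysis strategy
    code: str         # LLM-generated analysis code
    results: dict     # execution output
    explanation: str  # human-readable summary

# Placeholder nodes: each receives the shared state and returns updates to it
def parse_files(state: AnalysisState) -> dict:      return {"parsed": {}}
def plan_operations(state: AnalysisState) -> dict:  return {"plan": {}}
def align_timeseries(state: AnalysisState) -> dict: return {}
def generate_code(state: AnalysisState) -> dict:    return {"code": ""}
def validate_code(state: AnalysisState) -> dict:    return {}
def execute_code(state: AnalysisState) -> dict:     return {"results": {}}
def trend_analysis(state: AnalysisState) -> dict:   return {}
def explain_results(state: AnalysisState) -> dict:  return {"explanation": ""}

NODES = [
    ("parse_files", parse_files),
    ("plan_operations", plan_operations),
    ("align_timeseries", align_timeseries),
    ("generate_code", generate_code),
    ("validate_code", validate_code),
    ("execute_code", execute_code),
    ("trend_analysis", trend_analysis),
    ("explain_results", explain_results),
]

graph = StateGraph(AnalysisState)
for name, fn in NODES:
    graph.add_node(name, fn)
graph.set_entry_point("parse_files")
for (a, _), (b, _) in zip(NODES, NODES[1:]):
    graph.add_edge(a, b)  # connect nodes in pipeline order
graph.add_edge("explain_results", END)
workflow = graph.compile()
```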
| Variable | Description | Required |
|---|---|---|
| `GOOGLE_API_KEY` | Google Gemini API key (recommended) | Yes* |
| `GROQ_API_KEY` | Groq API key (fast & free) | Yes* |
| `OPENAI_API_KEY` | OpenAI API key | Yes* |
| `ANTHROPIC_API_KEY` | Anthropic API key | Yes* |
| `DEFAULT_LLM_PROVIDER` | Default LLM provider (`google`/`groq`/`openai`/`anthropic`) | No |
| `DEFAULT_MODEL` | Default model for the provider | No |
| `DATABASE_URL` | Database connection string | No |
| `LOG_LEVEL` | Logging level (`DEBUG`/`INFO`/`WARNING`/`ERROR`) | No |
*At least one API key is required
The system automatically detects available API keys and uses the configured default provider. If no default is specified, providers are selected in this priority order:

1. Google Gemini (if API key available)
2. Groq (if API key available)
3. OpenAI (if API key available)
4. Anthropic (if API key available)
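A minimal sketch of this fallback logic (the actual detection lives in the backend services; the function name here is illustrative):

```python
import os

# Priority order used when no default provider is configured
PRIORITY = ["google", "groq", "openai", "anthropic"]
KEY_VARS = {
    "google": "GOOGLE_API_KEY",
    "groq": "GROQ_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}

def resolve_provider() -> str:
    """Return the configured default provider, else the first one with a key."""
    default = os.getenv("DEFAULT_LLM_PROVIDER")
    if default in KEY_VARS and os.getenv(KEY_VARS[default]):
        return default
    for provider in PRIORITY:
        if os.getenv(KEY_VARS[provider]):
            return provider
    raise RuntimeError("No LLM provider configured: set at least one API key")
```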
```bash
# Run all tests
cd tests && python -m pytest

# Run with coverage
cd tests && python -m pytest --cov=../ --cov-report=html

# Run specific test categories
cd tests && python -m pytest test_integration.py
cd tests && python -m pytest test_api.py
```

```bash
# Backend development
cd backend
pip install -r requirements.txt
python main.py

# Frontend development
cd frontend
npm install
npm start

# Run in development mode with hot reload
./deploy.sh --dev
```

Project structure:

```
├── backend/
│   ├── app/                    # FastAPI application
│   ├── services/               # Core services and utilities
│   ├── langgraph_workflow.py   # AI workflow implementation
│   ├── main.py                 # Application entry point
│   └── requirements.txt        # Python dependencies
├── frontend/
│   ├── src/                    # React application source
│   ├── public/                 # Static assets
│   └── package.json            # Node.js dependencies
├── tests/                      # Test suite
├── sample_data/                # Example CSV files
├── deploy.sh                   # Deployment script
└── README.md                   # This file
```
- `GET /api/v2/health` - Health check
- `POST /api/v2/files/upload` - Upload files
- `GET /api/v2/files/` - List uploaded files
- `POST /api/v2/analysis/simple` - Simple analysis query
- `POST /api/v2/analysis/comprehensive` - Comprehensive analysis
- `GET /api/v2/sessions/` - List sessions
- `GET /api/v2/sessions/{id}` - Get session details
- `ws://localhost:8000/api/v2/ws/{session_id}` - Real-time progress updates
Full API documentation available at: http://localhost:8000/docs
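For long-running analyses, progress can be watched over the WebSocket endpoint above. A minimal sketch using the third-party `websockets` package; the message format and session id are assumptions, so see /docs for the actual schema:

```python
import asyncio
import websockets  # pip install websockets

async def watch_progress(session_id: str) -> None:
    # Connect to the progress endpoint for an existing session
    uri = f"ws://localhost:8000/api/v2/ws/{session_id}"
    async with websockets.connect(uri) as ws:
        # Progress updates arrive as messages; print each one as it comes in
        async for message in ws:
            print(message)

# "demo-session" is a placeholder; use a real id from GET /api/v2/sessions/
asyncio.run(watch_progress("demo-session"))
```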
1. API Key Setup Issues

   Problem: "No LLM provider configured" or "Invalid API key"

   Solution:

   ```bash
   # Check your .env file exists
   ls -la backend/.env

   # Verify API key format
   cat backend/.env | grep API_KEY
   ```
   Expected formats:

   - Google Gemini: `AIzaSyABC123...` (39 characters)
   - Groq: `gsk_ABC123...` (starts with `gsk_`)
   - OpenAI: `sk-proj-ABC123...` or `sk-ABC123...`
   - Anthropic: `sk-ant-api03-ABC123...`
   Common fixes:

   - Remove quotes around the API key in the .env file
   - Ensure no spaces before/after the API key
   - Copy the API key directly from the provider dashboard
   - Restart the backend after updating the .env file
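   As a quick sanity check, a small script along these lines can flag keys that don't match the expected prefixes above (prefix checks only; it cannot tell whether a key is actually valid):

   ```python
   import os

   # Expected key prefixes per provider (from the formats listed above)
   EXPECTED_PREFIXES = {
       "GOOGLE_API_KEY": "AIzaSy",
       "GROQ_API_KEY": "gsk_",
       "OPENAI_API_KEY": "sk-",
       "ANTHROPIC_API_KEY": "sk-ant-",
   }

   for var, prefix in EXPECTED_PREFIXES.items():
       # Strip stray quotes/whitespace, a common cause of "invalid key" errors
       value = os.getenv(var, "").strip().strip('"').strip("'")
       if value and not value.startswith(prefix):
           print(f"{var} does not start with the expected prefix {prefix!r}")
   ```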
2. Provider-Specific Issues

   Google Gemini "API key not valid":

   - Ensure you've enabled the Generative AI API in Google Cloud Console
   - Check API key restrictions (if any) in Google AI Studio

   Groq rate limiting:

   - Free tier has request limits; wait a few minutes and retry
   - Consider upgrading to a paid tier for higher limits

   OpenAI "Insufficient quota":

   - Add billing information to your OpenAI account
   - Check usage limits in the OpenAI dashboard

   Testing your API key:

   ```bash
   # Test a Google Gemini key
   curl "https://generativelanguage.googleapis.com/v1/models?key=YOUR_API_KEY"

   # Test an OpenAI key
   curl -H "Authorization: Bearer YOUR_API_KEY" https://api.openai.com/v1/models
   ```
3. Port Conflicts

   - Backend runs on port 8000, frontend on port 3000
   - Use `./deploy.sh --port 8080` to change the backend port
   - Check for running processes: `lsof -i :8000`
4. Installation Issues

   - Ensure Python 3.8+ and Node.js 14+ are installed
   - Use a virtual environment for Python dependencies
   - Clear the npm cache: `npm cache clean --force`
5. File Upload Issues

   - Maximum file size: 10MB per file
   - Supported formats: CSV, Excel (.xlsx, .xls)
   - Ensure files have proper headers and structure
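   A client-side pre-flight check mirroring these limits might look like this (a sketch only; the server enforces the real limits):

   ```python
   from pathlib import Path

   MAX_BYTES = 10 * 1024 * 1024          # 10MB per file
   ALLOWED_SUFFIXES = {".csv", ".xlsx", ".xls"}

   def check_upload(path: str) -> None:
       """Raise ValueError if a file would be rejected by the upload limits."""
       p = Path(path)
       if p.suffix.lower() not in ALLOWED_SUFFIXES:
           raise ValueError(f"Unsupported format: {p.suffix}")
       if p.stat().st_size > MAX_BYTES:
           raise ValueError(f"{p.name} exceeds the 10MB limit")
   ```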
```bash
# View application logs
./deploy.sh --logs

# Check system status
./deploy.sh --status

# Enable debug logging
export LOG_LEVEL=DEBUG
python backend/main.py
```

1. Fork the repository
2. Create a feature branch: `git checkout -b feature-name`
3. Make changes and add tests
4. Run the test suite: `cd tests && python -m pytest`
5. Commit changes: `git commit -am 'Add feature'`
6. Push to the branch: `git push origin feature-name`
7. Create a Pull Request
This project is licensed under the MIT License. See LICENSE file for details.
For issues and questions:
- Create an issue on GitHub
- Check the troubleshooting section above
- Review the API documentation at the `/docs` endpoint
- Handles files up to 10MB each
- Supports multiple concurrent sessions
- WebSocket connections for real-time updates
- Optimized for datasets with thousands of rows
- Horizontal scaling ready with stateless design