A comprehensive data analysis platform that enables natural language querying of CSV and Excel files using AI workflows and real-time processing.
This platform provides an intuitive interface for analyzing CSV and Excel files through natural language queries. Built with LangGraph workflows, it supports multi-file analysis, temporal data processing, and advanced statistical operations.
- Natural Language Queries: Ask questions about your data in plain English
- Multi-file Analysis: Process and correlate data across multiple CSV/Excel files
- Real-time Processing: WebSocket-powered progress updates during analysis
- Advanced Analytics: Statistical correlation, anomaly detection, and forecasting
- Multi-LLM Support: Google Gemini (recommended), Groq, OpenAI, and Anthropic APIs
- Session Persistence: Maintains conversation context across sessions
- Production Ready: Comprehensive error handling and retry mechanisms
- Python 3.8+
- Node.js 14+
- npm or yarn
- AI API key (see API Key Setup below)
You need an API key from one of the supported providers. Google Gemini is recommended for the best balance of performance, cost, and reliability.
1. Clone the repository

   ```bash
   git clone https://github.com/Ruchir-r/LangGraph-CSV-Analysis-Task.git
   cd LangGraph-CSV-Analysis-Task
   ```
2. Set up environment variables

   ```bash
   cp backend/.env.example backend/.env
   ```

   Edit `backend/.env` and add your API key.

   For Google Gemini (recommended):

   ```bash
   GOOGLE_API_KEY=AIzaSy...
   DEFAULT_LLM_PROVIDER=google
   DEFAULT_MODEL=gemini-1.5-flash
   ```

   For Groq (fast & free):

   ```bash
   GROQ_API_KEY=gsk_...
   DEFAULT_LLM_PROVIDER=groq
   DEFAULT_MODEL=llama-3.1-8b-instant
   ```

   For OpenAI:

   ```bash
   OPENAI_API_KEY=sk-...
   DEFAULT_LLM_PROVIDER=openai
   DEFAULT_MODEL=gpt-4o-mini
   ```

   For Anthropic:

   ```bash
   ANTHROPIC_API_KEY=sk-ant-...
   DEFAULT_LLM_PROVIDER=anthropic
   DEFAULT_MODEL=claude-3-haiku-20240307
   ```
3. Install dependencies

   ```bash
   # Backend dependencies
   cd backend
   pip install -r requirements.txt
   cd ..

   # Frontend dependencies
   cd frontend
   npm install
   cd ..
   ```
4. Start the application

   ```bash
   # Option 1: Use the deployment script (recommended)
   ./deploy.sh

   # Option 2: Manual startup
   # Terminal 1 - Backend
   cd backend && python main.py

   # Terminal 2 - Frontend
   cd frontend && npm start
   ```
5. Access the application

   - Frontend: http://localhost:3000
   - Backend API: http://localhost:8000
   - API Documentation: http://localhost:8000/docs
- Upload Files: Use the web interface to upload CSV or Excel files
- Ask Questions: Type natural language queries about your data
- View Results: Get comprehensive analysis with visualizations and insights
"Show average revenue by region for the last two months"
"Which product had the highest growth rate?"
"Detect any anomalies in the sales data"
"What is the correlation between discount and revenue?"
"Forecast next quarter's revenue based on current trends"
The system maintains context across questions:
User: "How did Product A perform last quarter?"
System: [Provides detailed analysis]
User: "What about compared to Product B?"
System: [Compares both products using previous context]
User: "What should we do to improve performance?"
System: [Provides actionable recommendations]
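A rough sketch of how a follow-up question might reuse the same session over the API; the `session_id` field name is an assumption based on the session endpoints listed under API Endpoints:

```python
import requests

BASE = "http://localhost:8000/api/v2"
SESSION = {"session_id": "demo-session"}  # assumed field name; see /docs

# First question establishes context within the session
requests.post(f"{BASE}/analysis/simple",
              json={**SESSION, "query": "How did Product A perform last quarter?"})

# The follow-up can then rely on the earlier answer
followup = requests.post(f"{BASE}/analysis/simple",
                         json={**SESSION, "query": "What about compared to Product B?"})
print(followup.json())
```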
- Frontend: React application with Material-UI components
- Backend: FastAPI server with REST and WebSocket endpoints
- AI Engine: LangGraph workflow with 8-node processing pipeline
- Database: SQLite for session and file management
1. Parse Files: Extract and validate data from uploaded files
2. Plan Operations: Determine analysis strategy based on query
3. Align Timeseries: Synchronize temporal data across files
4. Generate Code: Create analysis code using LLM
5. Validate Code: Check code safety and correctness
6. Execute Code: Run analysis with error handling
7. Trend Analysis: Perform statistical analysis and pattern detection
8. Explain Results: Generate human-readable insights and recommendations
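For illustration only (the project's real implementation lives in `backend/langgraph_workflow.py`), a linear eight-node pipeline like this can be wired in LangGraph roughly as follows; the state fields and node bodies are placeholder assumptions:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AnalysisState(TypedDict, total=False):
    query: str        # the user's natural language question
    parsed: dict      # validated data extracted from uploaded files
    plan: dict        # chosen analysis strategy
    code: str         # LLM-generated analysis code
    results: dict     # execution output
    explanation: str  # human-readable summary

# Placeholder nodes: each receives the shared state and returns updates to it
def parse_files(state: AnalysisState) -> dict:      return {"parsed": {}}
def plan_operations(state: AnalysisState) -> dict:  return {"plan": {}}
def align_timeseries(state: AnalysisState) -> dict: return {}
def generate_code(state: AnalysisState) -> dict:    return {"code": ""}
def validate_code(state: AnalysisState) -> dict:    return {}
def execute_code(state: AnalysisState) -> dict:     return {"results": {}}
def trend_analysis(state: AnalysisState) -> dict:   return {}
def explain_results(state: AnalysisState) -> dict:  return {"explanation": ""}

NODES = [
    ("parse_files", parse_files),
    ("plan_operations", plan_operations),
    ("align_timeseries", align_timeseries),
    ("generate_code", generate_code),
    ("validate_code", validate_code),
    ("execute_code", execute_code),
    ("trend_analysis", trend_analysis),
    ("explain_results", explain_results),
]

graph = StateGraph(AnalysisState)
for name, fn in NODES:
    graph.add_node(name, fn)
graph.set_entry_point("parse_files")
for (a, _), (b, _) in zip(NODES, NODES[1:]):
    graph.add_edge(a, b)  # connect nodes in pipeline order
graph.add_edge("explain_results", END)
workflow = graph.compile()
```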
| Variable | Description | Required |
|---|---|---|
| `GOOGLE_API_KEY` | Google Gemini API key (recommended) | Yes* |
| `GROQ_API_KEY` | Groq API key (fast & free) | Yes* |
| `OPENAI_API_KEY` | OpenAI API key | Yes* |
| `ANTHROPIC_API_KEY` | Anthropic API key | Yes* |
| `DEFAULT_LLM_PROVIDER` | Default LLM provider (`google`/`groq`/`openai`/`anthropic`) | No |
| `DEFAULT_MODEL` | Default model for the provider | No |
| `DATABASE_URL` | Database connection string | No |
| `LOG_LEVEL` | Logging level (`DEBUG`/`INFO`/`WARNING`/`ERROR`) | No |
*At least one API key is required
The system automatically detects available API keys and uses the configured default provider. If no default is specified, providers are selected in this priority order:

1. Google Gemini (if API key available)
2. Groq (if API key available)
3. OpenAI (if API key available)
4. Anthropic (if API key available)
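A minimal sketch of this fallback logic (the actual detection lives in the backend services; the function name here is illustrative):

```python
import os

# Priority order used when no default provider is configured
PRIORITY = ["google", "groq", "openai", "anthropic"]
KEY_VARS = {
    "google": "GOOGLE_API_KEY",
    "groq": "GROQ_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}

def resolve_provider() -> str:
    """Return the configured default provider, else the first one with a key."""
    default = os.getenv("DEFAULT_LLM_PROVIDER")
    if default in KEY_VARS and os.getenv(KEY_VARS[default]):
        return default
    for provider in PRIORITY:
        if os.getenv(KEY_VARS[provider]):
            return provider
    raise RuntimeError("No LLM provider configured: set at least one API key")
```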
```bash
# Run all tests
cd tests && python -m pytest

# Run with coverage
cd tests && python -m pytest --cov=../ --cov-report=html

# Run specific test categories
cd tests && python -m pytest test_integration.py
cd tests && python -m pytest test_api.py
```

```bash
# Backend development
cd backend
pip install -r requirements.txt
python main.py

# Frontend development
cd frontend
npm install
npm start

# Run in development mode with hot reload
./deploy.sh --dev
```

Project structure:

```
├── backend/
│   ├── app/                    # FastAPI application
│   ├── services/               # Core services and utilities
│   ├── langgraph_workflow.py   # AI workflow implementation
│   ├── main.py                 # Application entry point
│   └── requirements.txt        # Python dependencies
├── frontend/
│   ├── src/                    # React application source
│   ├── public/                 # Static assets
│   └── package.json            # Node.js dependencies
├── tests/                      # Test suite
├── sample_data/                # Example CSV files
├── deploy.sh                   # Deployment script
└── README.md                   # This file
```
- `GET /api/v2/health` - Health check
- `POST /api/v2/files/upload` - Upload files
- `GET /api/v2/files/` - List uploaded files
- `POST /api/v2/analysis/simple` - Simple analysis query
- `POST /api/v2/analysis/comprehensive` - Comprehensive analysis
- `GET /api/v2/sessions/` - List sessions
- `GET /api/v2/sessions/{id}` - Get session details
- `ws://localhost:8000/api/v2/ws/{session_id}` - Real-time progress updates
Full API documentation available at: http://localhost:8000/docs
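For long-running analyses, progress can be watched over the WebSocket endpoint above. A minimal sketch using the third-party `websockets` package; the message format and session id are assumptions, so see /docs for the actual schema:

```python
import asyncio
import websockets  # pip install websockets

async def watch_progress(session_id: str) -> None:
    # Connect to the progress endpoint for an existing session
    uri = f"ws://localhost:8000/api/v2/ws/{session_id}"
    async with websockets.connect(uri) as ws:
        # Progress updates arrive as messages; print each one as it comes in
        async for message in ws:
            print(message)

# "demo-session" is a placeholder; use a real id from GET /api/v2/sessions/
asyncio.run(watch_progress("demo-session"))
```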
1. API Key Setup Issues

   Problem: "No LLM provider configured" or "Invalid API key"

   Solution:

   ```bash
   # Check your .env file exists
   ls -la backend/.env

   # Verify API key format
   cat backend/.env | grep API_KEY
   ```
   Expected formats:

   - Google Gemini: `AIzaSyABC123...` (39 characters)
   - Groq: `gsk_ABC123...` (starts with `gsk_`)
   - OpenAI: `sk-proj-ABC123...` or `sk-ABC123...`
   - Anthropic: `sk-ant-api03-ABC123...`
   Common fixes:

   - Remove quotes around the API key in the .env file
   - Ensure no spaces before/after the API key
   - Copy the API key directly from the provider dashboard
   - Restart the backend after updating the .env file
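   As a quick sanity check, a small script along these lines can flag keys that don't match the expected prefixes above (prefix checks only; it cannot tell whether a key is actually valid):

   ```python
   import os

   # Expected key prefixes per provider (from the formats listed above)
   EXPECTED_PREFIXES = {
       "GOOGLE_API_KEY": "AIzaSy",
       "GROQ_API_KEY": "gsk_",
       "OPENAI_API_KEY": "sk-",
       "ANTHROPIC_API_KEY": "sk-ant-",
   }

   for var, prefix in EXPECTED_PREFIXES.items():
       # Strip stray quotes/whitespace, a common cause of "invalid key" errors
       value = os.getenv(var, "").strip().strip('"').strip("'")
       if value and not value.startswith(prefix):
           print(f"{var} does not start with the expected prefix {prefix!r}")
   ```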
2. Provider-Specific Issues

   Google Gemini "API key not valid":

   - Ensure you've enabled the Generative AI API in Google Cloud Console
   - Check API key restrictions (if any) in Google AI Studio

   Groq rate limiting:

   - Free tier has request limits; wait a few minutes and retry
   - Consider upgrading to a paid tier for higher limits

   OpenAI "Insufficient quota":

   - Add billing information to your OpenAI account
   - Check usage limits in the OpenAI dashboard

   Testing your API key:

   ```bash
   # Test a Google Gemini key
   curl "https://generativelanguage.googleapis.com/v1/models?key=YOUR_API_KEY"

   # Test an OpenAI key
   curl -H "Authorization: Bearer YOUR_API_KEY" https://api.openai.com/v1/models
   ```
3. Port Conflicts

   - Backend runs on port 8000, frontend on port 3000
   - Use `./deploy.sh --port 8080` to change the backend port
   - Check for running processes: `lsof -i :8000`
4. Installation Issues

   - Ensure Python 3.8+ and Node.js 14+ are installed
   - Use a virtual environment for Python dependencies
   - Clear the npm cache: `npm cache clean --force`
5. File Upload Issues

   - Maximum file size: 10MB per file
   - Supported formats: CSV, Excel (.xlsx, .xls)
   - Ensure files have proper headers and structure
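   A client-side pre-flight check mirroring these limits might look like this (a sketch only; the server enforces the real limits):

   ```python
   from pathlib import Path

   MAX_BYTES = 10 * 1024 * 1024          # 10MB per file
   ALLOWED_SUFFIXES = {".csv", ".xlsx", ".xls"}

   def check_upload(path: str) -> None:
       """Raise ValueError if a file would be rejected by the upload limits."""
       p = Path(path)
       if p.suffix.lower() not in ALLOWED_SUFFIXES:
           raise ValueError(f"Unsupported format: {p.suffix}")
       if p.stat().st_size > MAX_BYTES:
           raise ValueError(f"{p.name} exceeds the 10MB limit")
   ```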
```bash
# View application logs
./deploy.sh --logs

# Check system status
./deploy.sh --status

# Enable debug logging
export LOG_LEVEL=DEBUG
python backend/main.py
```

1. Fork the repository
2. Create a feature branch: `git checkout -b feature-name`
3. Make changes and add tests
4. Run the test suite: `cd tests && python -m pytest`
5. Commit changes: `git commit -am 'Add feature'`
6. Push to the branch: `git push origin feature-name`
7. Create a Pull Request
This project is licensed under the MIT License. See LICENSE file for details.
For issues and questions:
- Create an issue on GitHub
- Check the troubleshooting section above
- Review the API documentation at the `/docs` endpoint
- Handles files up to 10MB each
- Supports multiple concurrent sessions
- WebSocket connections for real-time updates
- Optimized for datasets with thousands of rows
- Horizontal scaling ready with stateless design