Drop a CSV, ask a question, get charts + insights in seconds.
AI data analyst powered by GPT-4o code generation. No SQL, no Excel, no code — just ask in plain language and get answers with visualizations.
CSV upload → Schema Detection → GPT-4o Codegen (pandas) → Sandbox Execution → Answer + Chart
↓ (optional)
GPT-4o-mini → Proactive Insights + Chart
- Your CSV is parsed and its schema extracted (columns, types, stats, sample values)
- GPT-4o generates pandas code to answer your question (JSON mode)
- The code runs in a sandboxed environment — if it fails, one automatic retry with error feedback
- You get a text answer + optional chart (dark-themed matplotlib/seaborn)
- Optionally, GPT-4o-mini discovers 3-5 proactive insights with computed evidence
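The execute-then-retry step above can be sketched as follows. This is a minimal illustration, not the code in `sandbox.py`: `run_generated_code`, `answer_with_retry`, and `fake_generate` are hypothetical names, the model call is replaced by a stub, and the convention that generated code stores its answer in a variable named `result` is an assumption.

```python
import traceback

def run_generated_code(code, env):
    """Execute generated code in a restricted namespace and return that namespace."""
    # Only a small whitelist of builtins is exposed. This is NOT a real
    # security boundary; a production sandbox also needs process isolation.
    safe_builtins = {"len": len, "sum": sum, "min": min, "max": max,
                     "range": range, "sorted": sorted, "round": round}
    namespace = {"__builtins__": safe_builtins, **env}
    exec(code, namespace)
    return namespace

def answer_with_retry(question, generate, env, max_retries=1):
    """Run generated code; on failure, regenerate once with the error as feedback."""
    error = None
    for _attempt in range(max_retries + 1):
        # `generate` stands in for the LLM call; it sees the previous error, if any.
        code = generate(question, error)
        try:
            namespace = run_generated_code(code, env)
            return namespace["result"]  # assumed convention: answer lives in `result`
        except Exception:
            error = traceback.format_exc(limit=1)
    raise RuntimeError(f"code failed after {max_retries + 1} attempts:\n{error}")

def fake_generate(question, error):
    """Stub model: returns buggy code first, corrected code once it sees an error."""
    if error is None:
        return "result = sum(valuez)"  # typo triggers a NameError
    return "result = sum(values)"

print(answer_with_retry("What is the total?", fake_generate, {"values": [1, 2, 3]}))  # prints 6
```

Passing the generator in as a callable keeps the retry logic testable without any API key; in a real pipeline it would wrap the GPT-4o request.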
| Round | Model | Accuracy | Hallucinations | Decision |
|---|---|---|---|---|
| R1 | GPT-4o-mini | 55% (11/20) | 1 | NO-GO |
| R2 | GPT-4o + anti-hallucination prompt | 87.5% (17.5/20) | 0 | CONDITIONAL GO |
Evaluated on 20 golden questions (aggregation, trend, comparison, multi-step, adversarial). Median latency: 2.1s. Full eval: docs/EVAL-REPORT.md
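The percentages in the table follow directly from points over questions. The sketch below reproduces that arithmetic; `accuracy_pct` is a hypothetical helper, and reading the half point in 17.5/20 as partial credit for one answer is an assumption, since the scoring rubric is only described in the full eval report.

```python
def accuracy_pct(points, total_questions):
    """Return accuracy as a percentage of points earned over questions asked."""
    if total_questions <= 0:
        raise ValueError("total_questions must be positive")
    return 100.0 * points / total_questions

r1 = accuracy_pct(11, 20)    # R1: GPT-4o-mini
r2 = accuracy_pct(17.5, 20)  # R2: GPT-4o + anti-hallucination prompt
print(r1, r2)  # prints 55.0 87.5
```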
Key decision: model upgrade from GPT-4o-mini to GPT-4o was driven entirely by eval results — not intuition.
| Component | Technology |
|---|---|
| LLM (main) | GPT-4o (OpenAI) — code generation |
| LLM (insights) | GPT-4o-mini (OpenAI) — proactive insights |
| Execution | pandas + matplotlib/seaborn in sandboxed exec() |
| Backend | FastAPI (Python) |
| Frontend | React + Tailwind (Lovable) |
| Hosting | Render ($7/mo) |
# Clone and setup
git clone https://github.com/Mehdibargach/datapilot.git
cd datapilot
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
# Add API keys
cp .env.example .env
# Edit .env with your OPENAI_API_KEY
# Run
uvicorn api:app --host 0.0.0.0 --port 8000

API endpoints:
- `GET /health`: health check
- `GET /datasets`: list demo datasets (superstore, saas, marketing)
- `POST /analyze`: ask a question about your CSV
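Once the server is running, the two GET endpoints can be probed with nothing but the standard library. This is a hedged sketch: `endpoint_url`, `get_json`, and `demo` are hypothetical helpers, the responses are assumed to be JSON (FastAPI's default), and `POST /analyze` is left out because its request fields are not documented here.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # port taken from the uvicorn command

def endpoint_url(base, path):
    """Join base URL and path without doubling or dropping the slash."""
    return base.rstrip("/") + "/" + path.lstrip("/")

def get_json(path, base=BASE_URL, timeout=5.0):
    """GET an endpoint and parse the body as JSON (assumed response format)."""
    with urlopen(endpoint_url(base, path), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

def demo():
    """Call the two read-only endpoints; requires the server to be running."""
    print(get_json("/health"))
    print(get_json("/datasets"))
```

Call `demo()` from a Python shell after starting uvicorn; it is not invoked automatically so that importing the snippet never touches the network.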
api.py ← FastAPI backend (3 endpoints)
agent.py ← Pipeline orchestrator (analyze + insights)
codegen.py ← GPT-4o code generation + meta-question detection
sandbox.py ← Sandboxed pandas code execution
schema.py ← CSV schema detection (multi-encoding)
insights.py ← GPT-4o-mini proactive insight discovery
data/ ← Demo datasets (superstore, saas, marketing)
docs/
  EVAL-REPORT.md ← Full eval with golden dataset (20 questions)
  ADR.md ← Architecture Decision Records
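The multi-encoding schema detection attributed to `schema.py` in the layout above could look roughly like this. It is a sketch under stated assumptions, not the project's code: it uses the standard-library `csv` module instead of pandas to stay dependency-free, the encoding order is a guess, the function names are hypothetical, and it reports only column types and sample values, without the summary statistics.

```python
import csv
import io

# Assumed fallback order. latin-1 maps every byte, so decoding always succeeds.
ENCODINGS = ("utf-8-sig", "cp1252", "latin-1")

def decode_csv(raw):
    """Try each encoding in turn and return (text, encoding_used)."""
    for enc in ENCODINGS:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("could not decode CSV")  # unreachable while latin-1 is listed

def infer_type(values):
    """Classify a column as int, float, string, or empty from its raw strings."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "empty"
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "string"

def detect_schema(raw, sample_size=3):
    """Return encoding, row count, and per-column type plus sample values."""
    text, enc = decode_csv(raw)
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = {}
    for name in reader.fieldnames or []:
        values = [row[name] or "" for row in rows]  # short rows yield None
        columns[name] = {"type": infer_type(values), "samples": values[:sample_size]}
    return {"encoding": enc, "rows": len(rows), "columns": columns}

raw = "city,sales\nZürich,10\nParis,2.5\n".encode("cp1252")
schema = detect_schema(raw)
print(schema["encoding"], schema["columns"]["sales"]["type"])  # prints cp1252 float
```

A compact dictionary like this is the kind of summary that can be placed in the code-generation prompt so the model sees column names and types rather than the whole file.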
Mehdi Bargach — Builder PM · AI Products