Your personal Exploratory Data Analyst + AI Agent
Drop in a CSV. Get a complete EDA — automatically.

An agentic, LLM-powered Exploratory Data Analysis pipeline built with LangGraph and Ollama.
Nine specialized AI agents analyze your dataset, then generate a polished HTML report — all running 100% locally.
You give it a CSV. It spins up a 9-stage LangGraph pipeline where each node is an AI agent that analyzes a different aspect of your data, writes a summary, and passes its findings to the next stage. At the end, you get:
- An interactive Streamlit dashboard with tabbed results and live progress
- A rich, color-coded terminal output (if run via CLI)
- A self-contained
report.html— dark-themed, browser-ready, heatmap embedded inline - A
correlation_heatmap.pngsaved tooutput/
schema → quality → stats → outliers → correlation → importance → synthesis → model_rec → feature_eng
Each node runs a Python analysis tool first, then passes the raw result to the LLM to reason over and summarize in plain English.
Architecture Note: All Python computations (correlation matrices, outlier IQR bounds, descriptive stats) run on the full dataset to ensure statistical accuracy. However, the data passed to the LLM is intentionally capped (e.g., only the top 20 strongest correlations or top 15 outlier features) to prevent context window overload and hallucination.
| # | Agent | What it analyzes |
|---|---|---|
| 1 | Schema | Shape, column types, null counts per column |
| 2 | Quality | Duplicates, missing value %, columns with nulls |
| 3 | Statistics | Descriptive stats, skewness, categorical value counts |
| 4 | Outliers | IQR-based detection — count, bounds, example values |
| 5 | Correlation | Pearson matrix, multicollinearity flags, heatmap |
| 6 | Feature Importance | Target-aware correlation ranking (powered by robust multi-tiered target detection), falling back to variance (numeric) and entropy (categorical) ranking |
| 7 | Synthesis | Full EDA narrative — overview, issues, patterns, recommendations |
| 8 | Model Recommendation | Infers problem type, recommends models, flags uncertainty, suggests metrics |
| 9 | Feature Engineering | Suggests concrete new features: log transforms, bins, interactions, encodings |
Note: Feature engineering code is executed in a restricted sandbox (__builtins__ stripped, only pd, np, and df exposed) with a 2-attempt reflection loop that feeds errors back to the LLM for self-correction. The pipeline uses robust multi-tiered target detection to completely hide the target variable from the LLM prompt and sandbox to prevent data leakage, and aggressively checks newly generated features to reject trivial copies (correlation > 0.999) of existing features.
Install Ollama and pull the model:
ollama pull llama3.2pip install -r requirements.txtstreamlit run app.pyThis will open the EDAgent web dashboard in your browser. Just drag and drop your CSV into the upload area!
If you prefer the terminal, you can run the pipeline directly:
# On your own dataset
python pipeline.py your_dataset.csv
# With built-in test data
python pipeline.py┌──────────────────────────────────────────────────────────────────┐
│ EDAgent Pipeline │
│ Dataset: Teen_Mental_Health_Dataset.csv Rows: 1200 Cols: 13 │
└──────────────────────────────────────────────────────────────────┘
[ schema ] Running schema agent...
[ quality ] Running quality agent...
[ stats ] Running stats agent...
[ outliers ] Running outlier agent...
[ correlation ] Running correlation agent...
[ importance ] Running feature importance agent...
[ synthesis ] Running synthesis agent...
[ model-rec ] Running model recommendation agent...
[ feature-eng ] Running feature engineering agent...
┌─────────────────────────────────────────────────────────────────┐
│ Done! │
│ HTML Report → output/report.html │
│ Heatmap → output/correlation_heatmap.png │
└─────────────────────────────────────────────────────────────────┘
| Tool | Role |
|---|---|
| Streamlit | Interactive web dashboard |
| LangGraph | Agent orchestration & state management |
| LangChain + Ollama | Local LLM inference |
| Llama 3.2 | The underlying language model |
| pandas | Data analysis |
| seaborn / matplotlib | Correlation heatmap |
| scipy | Entropy calculation for feature importance |
| rich | Terminal UI |
EDAgent/
├── app.py # Streamlit web dashboard
├── pipeline.py # Full LangGraph pipeline (all 9 agents)
├── requirements.txt
├── README.md
└── output/ # Auto-generated, git-ignored
├── report.html
└── correlation_heatmap.png
No API keys. No data sent to the cloud. Everything runs on your machine via Ollama. Swap llama3.2 for any other Ollama-compatible model by changing one line in pipeline.py:
llm = ChatOllama(model="your-model-here")Created by Varnit :)