---
title: Data Analyst Agent
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
---

Data Analyst Agent

An AI-powered data analyst agent with an interactive web interface, built using:

Flask
Bootstrap
HTML/CSS
LangChain
LangGraph
OpenAI

🤖 Autonomous Data Analyst: Multi-Agent EDA Orchestrator

An advanced, self-iterating data science agent that performs comprehensive Exploratory Data Analysis (EDA) without human intervention. Built on LangGraph, it uses a Planner-Executor-Finalizer architecture to turn raw datasets into professional, insight-driven Markdown reports.

The project now includes a modern Bootstrap + HTML frontend with a Flask backend, allowing users to:

📊 Upload datasets and automatically generate a professional summary.md EDA report.
💬 Ask natural language questions about the uploaded dataset and receive AI-generated answers instantly.

🌐 Web Application Features

📁 Dataset Analysis Mode

Upload a CSV dataset and let the agent autonomously perform:

Data Cleaning
Statistical Analysis
Correlation Discovery
Pattern Detection
Insight Generation

After analysis, the generated summary.md report is available for download.
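Every later step depends on the dataset's structure: its shape, column types, and missing values. The project works with Pandas; the dependency-free sketch below, which assumes a CSV with a header row, shows the kind of profile involved using only the standard `csv` module. `profile_csv` and its output layout are hypothetical, and the type inference is deliberately coarse (int, float, or str).

```python
import csv
import io


def _infer_type(values):
    """Return 'int', 'float', or 'str' for a list of non-empty strings."""
    def parses_as(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if not values:
        return "str"
    if parses_as(int):
        return "int"
    if parses_as(float):
        return "float"
    return "str"


def profile_csv(text):
    """Hypothetical helper: summarize shape, column types, and missing cells."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    profile = {"shape": (len(body), len(header)), "columns": {}}
    for i, name in enumerate(header):
        # Empty strings and short rows both count as missing values.
        column = [row[i].strip() if i < len(row) else "" for row in body]
        present = [v for v in column if v]
        profile["columns"][name] = {
            "type": _infer_type(present),
            "missing": len(column) - len(present),
        }
    return profile


if __name__ == "__main__":
    sample = "region,sales,units\nNorth,120.5,3\nSouth,,4\nEast,99.0,\n"
    print(profile_csv(sample))
```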

💬 Ask Questions About Your Dataset

Users can interact with the dataset using natural language queries such as:

"What are the missing values?"
"Which features are highly correlated?"
"Show the average sales by region."
"What insights can you derive from this dataset?"

The AI agent analyzes the dataframe and returns context-aware responses.
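One common way to make answers context-aware is to ground the question in the dataset's schema before it reaches the LLM. The sketch below only builds the prompt text; the function name, the prompt wording, and passing the schema as a column-to-dtype mapping are assumptions, and the actual OpenAI call is left out.

```python
def build_question_prompt(question, columns, n_rows, max_cols=50):
    """Hypothetical helper: combine schema context with a user question.

    `columns` maps column name -> dtype string (for example, from df.dtypes).
    Very wide tables are truncated to `max_cols` entries to save tokens.
    """
    if not question.strip():
        raise ValueError("question must not be empty")
    shown = list(columns.items())[:max_cols]
    schema_lines = [f"- {name}: {dtype}" for name, dtype in shown]
    if len(columns) > max_cols:
        schema_lines.append(f"- ... ({len(columns) - max_cols} more columns)")
    return "\n".join([
        "You are a data analyst. Answer using only the dataframe `df`.",
        f"The dataframe has {n_rows} rows and {len(columns)} columns:",
        *schema_lines,
        "",
        f"Question: {question.strip()}",
    ])
```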

🧠 System Architecture

This system moves beyond basic prompt-chaining by implementing a Hierarchical State Machine. It separates strategic planning from technical execution using an isolated "internal scratchpad," so that global reasoning stays focused and technical debugging remains local.
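The stack lists TypedDict and Annotated history for state management. A minimal sketch of how the global history and the executor's private scratchpad might be declared, assuming LangGraph's convention that a field typed `Annotated[list, operator.add]` is merged by appending each node's update. The fields other than `internal_history`, and the `merge_update` helper that imitates the reducer without LangGraph installed, are hypothetical.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class AgentState(TypedDict):
    # Global memory: node updates are appended, so only distilled findings go here.
    messages: Annotated[list, operator.add]
    # Executor scratchpad: a plain field, overwritten on every update,
    # so the executor passes its full list of attempts each time.
    internal_history: list
    current_task: str


def merge_update(state, update):
    """Imitate reducer semantics: append to annotated fields, overwrite the rest."""
    hints = get_type_hints(AgentState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        reducer = getattr(hints[key], "__metadata__", (None,))[0]
        merged[key] = reducer(state[key], value) if reducer else value
    return merged
```

In this arrangement the executor's retries only touch `internal_history`, and the finalizer appends one summary to `messages` while resetting the scratchpad to an empty list.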

The Core Nodes:

📍 Metadata Analyst: Automatically extracts the "DNA" of the dataframe (schema, types, shape) to ground the agent's logic in the actual data structure.
🏗️ The Lead Planner: A high-level strategist that breaks the analysis into a hierarchical roadmap: Data Quality → Univariate → Bivariate → Multivariate analysis.
💻 The Python Executor: A specialized developer node. It operates in a private internal_history loop, allowing it to iterate on code and self-correct syntax or logic errors without distracting the Planner.
🧪 AST Execution REPL: A tool that uses Abstract Syntax Trees to execute multi-line Python scripts and capture their results, mimicking a Jupyter Notebook environment.
📝 Technical Finalizer: Distills raw technical logs and code outputs into clean executive summaries for the Planner, ensuring only high-value insights are promoted to global memory.
📊 Executive Summarizer: The final output node that compiles the entire analysis journey from the message history into a professional summary.md.
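The Python Executor's private loop can be sketched as a bounded retry: run the code and, on failure, record the traceback in `internal_history` and request a corrected version. In the project the correction comes from the LLM; here `fix_code` is a hypothetical callable standing in for it, and `max_attempts=3` is an assumed limit. The code runs in-process via `exec`, with no isolation of its own.

```python
import traceback


def execute_with_retries(code, fix_code, namespace, max_attempts=3):
    """Run `code`, asking `fix_code(code, tb)` for a new version after each failure.

    Returns (succeeded, internal_history). Tracebacks stay in the returned
    history and are never added to the global messages.
    """
    internal_history = []
    for attempt in range(1, max_attempts + 1):
        try:
            exec(code, namespace)
            internal_history.append(f"attempt {attempt}: ok")
            return True, internal_history
        except Exception:
            tb = traceback.format_exc()
            internal_history.append(f"attempt {attempt}: failed\n{tb}")
            if attempt < max_attempts:
                code = fix_code(code, tb)
    return False, internal_history


if __name__ == "__main__":
    def naive_fix(code, tb):
        # Stand-in for the LLM: define the missing name and try again.
        return "total = 0\n" + code if "NameError" in tb else code

    ns = {}
    ok, history = execute_with_retries("total += 5", naive_fix, ns)
    print(ok, ns["total"], len(history))  # prints: True 5 2
```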

✨ Key Features

Interactive Web UI built using Bootstrap and HTML.
Flask Backend Integration for file uploads, analysis, and AI interactions.
Self-Healing Logic: Using Python 3.11's more precise error reporting, the executor pinpoints where code fails and fixes it within the internal loop.
Isolated Scratchpad: Technical "noise"—such as debugging attempts, tracebacks, and raw data dumps—is confined to the internal_history. This keeps the long-term memory of the agent clean, focused, and token-efficient.
Recursive Re-Planning: After every task, the agent assesses findings. If a specific correlation, anomaly, or outlier is detected, it dynamically pivots the next step to investigate further.
Autonomous Documentation: Automatically compiles insights and writes its own final report to a persistent Markdown file upon completion.
Natural Language Dataset Q&A powered by OpenAI and LangGraph.
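The Recursive Re-Planning behavior can be pictured as a task queue seeded with the Lead Planner's roadmap (Data Quality → Univariate → Bivariate → Multivariate), where a reported anomaly pushes a follow-up investigation to the front. This is a plain-Python sketch, not the project's LangGraph graph; `run_task`, the `"anomaly"` key, and `max_steps` are hypothetical.

```python
from collections import deque

# Stage order taken from the Lead Planner's roadmap.
ROADMAP = ["data quality", "univariate", "bivariate", "multivariate"]


def run_analysis(run_task, max_steps=20):
    """Imitate the Planner -> Executor -> Finalizer loop with re-planning.

    `run_task(task)` stands in for the Executor plus Finalizer and returns a
    dict with a "summary" string and, optionally, an "anomaly" string.
    """
    queue = deque(ROADMAP)
    findings = []  # global memory: distilled summaries only
    for _ in range(max_steps):  # hard cap so re-planning cannot loop forever
        if not queue:
            break
        task = queue.popleft()
        result = run_task(task)
        findings.append(f"{task}: {result['summary']}")
        anomaly = result.get("anomaly")
        if anomaly:
            # Re-plan: investigate the anomaly before resuming the roadmap.
            queue.appendleft(f"investigate {anomaly}")
    return findings


if __name__ == "__main__":
    def fake_task(task):
        if task == "univariate":
            return {"summary": "skewed column", "anomaly": "outliers in sales"}
        return {"summary": "no issues"}

    # Prints 5 lines; the investigation runs right after "univariate".
    for line in run_analysis(fake_task):
        print(line)
```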

🛠️ Technical Stack

Component	Technology
Frontend	Bootstrap + HTML
Backend	Flask
Orchestration	LangGraph (StateGraph)
LLM	OpenAI GPT-4o-mini
Frameworks	LangChain
Runtime	Python 3.11
Data Engine	Pandas & NumPy
Logic Parsing	Python AST (Abstract Syntax Trees)
State Management	TypedDict & Annotated History

📄 Final Report Contents

The generated summary.md report is structured for professional stakeholders:

Executive Summary: High-level overview of the dataset's health and primary characteristics.
Data Quality Audit: Analysis of null values, duplicates, inconsistencies, and anomalies.
Statistical Deep-Dive: Numerical and analytical insights from univariate and bivariate analysis.
Strategic Recommendations: Data-driven recommendations based on discovered trends and patterns.
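A sketch of how the four sections above could be laid out in summary.md, assuming the text for each section has already been produced (in the project, by the LLM from the message history). The section titles come from this README; `render_report`, `write_report`, and the default title are hypothetical.

```python
from pathlib import Path

SECTIONS = [
    "Executive Summary",
    "Data Quality Audit",
    "Statistical Deep-Dive",
    "Strategic Recommendations",
]


def render_report(sections, title="EDA Report"):
    """Return the report as Markdown, with sections in a fixed order."""
    missing = [name for name in SECTIONS if name not in sections]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    parts = [f"# {title}", ""]
    for name in SECTIONS:
        parts += [f"## {name}", "", sections[name].strip(), ""]
    return "\n".join(parts)


def write_report(sections, path="summary.md", title="EDA Report"):
    """Write the rendered report to disk and return its path."""
    out = Path(path)
    out.write_text(render_report(sections, title), encoding="utf-8")
    return out
```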

🛡️ Security & Execution

The system uses a custom write_python_code tool that executes agent-generated code in a controlled local environment where the agent has direct access to the df object, bridging LLM reasoning and programmatic data manipulation. It is designed for sandboxed data analysis.
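The tool's implementation is not shown here, so the following is only a minimal sketch of the behavior described in this section and under AST Execution REPL. It assumes the tool keeps a persistent namespace containing `df`, captures printed output, and, like a Jupyter cell, returns the value of a trailing expression, using `ast.parse` to split that expression off. `make_python_tool` is hypothetical, and a plain dict stands in for the dataframe so the example runs without Pandas. Code runs in-process, so isolation has to come from the environment around it.

```python
import ast
import contextlib
import io
import traceback


def make_python_tool(df):
    """Return a hypothetical write_python_code-style function bound to `df`."""
    namespace = {"df": df}  # persists across calls, like notebook cells

    def write_python_code(code):
        stdout = io.StringIO()
        try:
            tree = ast.parse(code, mode="exec")
            last = tree.body[-1] if tree.body else None
            if isinstance(last, ast.Expr):
                # Execute everything except the final expression, then evaluate it.
                body = ast.Module(body=tree.body[:-1], type_ignores=[])
                expr = ast.Expression(body=last.value)
            else:
                body, expr = tree, None
            with contextlib.redirect_stdout(stdout):
                exec(compile(body, "<agent>", "exec"), namespace)
                value = eval(compile(expr, "<agent>", "eval"), namespace) if expr else None
        except Exception:
            # Return the traceback as text so the executor can self-correct.
            return stdout.getvalue() + traceback.format_exc()
        output = stdout.getvalue()
        if value is not None:
            output += repr(value)
        return output

    return write_python_code


if __name__ == "__main__":
    tool = make_python_tool({"sales": [120.5, 99.0, 80.0]})
    print(tool("total = sum(df['sales'])\nround(total, 1)"))  # prints: 299.5
    print(tool("total * 2"))  # prints: 599.0 (namespace persisted)
```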

Developed as a robust framework for autonomous data exploration using LangGraph, Flask, Bootstrap, and Python 3.11.
