| title | Data Analyst Agent |
|---|---|
| emoji | 🤖 |
| colorFrom | blue |
| colorTo | purple |
| sdk | docker |
| pinned | false |
AI-powered Data Analyst Agent with an interactive web interface built using:
- Flask
- Bootstrap
- HTML/CSS
- LangChain
- LangGraph
- OpenAI
An advanced, self-iterating Data Science agent that performs comprehensive Exploratory Data Analysis (EDA) without human intervention. Built on LangGraph, it utilizes a specialized Planner-Executor-Finalizer architecture to transform raw datasets into professional, insight-driven Markdown reports.
The project now includes a modern Bootstrap + HTML frontend with a Flask backend, allowing users to:
- 📊 Upload datasets and automatically generate a professional summary.md EDA report.
- 💬 Ask natural language questions about the uploaded dataset and receive AI-generated answers instantly.
Upload a CSV dataset and let the agent autonomously perform:
- Data Cleaning
- Statistical Analysis
- Correlation Discovery
- Pattern Detection
- Insight Generation
After analysis, the generated summary.md report is available for download.
Users can interact with the dataset using natural language queries such as:
- "What are the missing values?"
- "Which features are highly correlated?"
- "Show the average sales by region."
- "What insights can you derive from this dataset?"
The AI agent analyzes the dataframe and returns context-aware responses.
This system moves beyond basic prompt-chaining by implementing a Hierarchical State Machine. It separates strategic planning from technical execution using an isolated "internal scratchpad," ensuring the global reasoning stays focused and technical debugging remains local.
- 📍 Metadata Analyst: Automatically extracts the "DNA" of the dataframe (schema, types, shape) to ground the agent's logic in the actual data structure.
- 🏗️ The Lead Planner: A high-level strategist that breaks the analysis into a hierarchical roadmap: Data Quality → Univariate → Bivariate → Multivariate analysis.
- 💻 The Python Executor: A specialized developer node. It operates in a private internal_history loop, allowing it to iterate on code and self-correct syntax or logic errors without distracting the Planner.
- 🧪 AST Execution REPL: A robust tool using Abstract Syntax Trees to safely execute multi-line Python scripts and capture results, mimicking a Jupyter Notebook environment.
- 📝 Technical Finalizer: Distills raw technical logs and code outputs into clean, executive summaries for the Planner, ensuring only high-value insights are promoted to global memory.
- 📊 Executive Summarizer: The final output node that compiles the entire analysis journey from the message history into a professional summary.md.
- Interactive Web UI built using Bootstrap and HTML.
- Flask Backend Integration for file uploads, analysis, and AI interactions.
- Self-Healing Logic: Using the fine-grained error locations in Python 3.11 tracebacks, the executor pinpoints the failing expression and repairs the code within its internal loop before moving on.
- Isolated Scratchpad: Technical "noise" (debugging attempts, tracebacks, and raw data dumps) is confined to internal_history. This keeps the long-term memory of the agent clean, focused, and token-efficient.
- Recursive Re-Planning: After every task, the agent assesses findings. If a specific correlation, anomaly, or outlier is detected, it dynamically pivots the next step to investigate further.
- Autonomous Documentation: Automatically compiles insights and writes its own final report to a persistent Markdown file upon completion.
- Natural Language Dataset Q&A powered by OpenAI and LangGraph.
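
The self-healing loop and isolated scratchpad described above can be pictured with a small, dependency-free sketch. This is an illustration under stated assumptions, not the project's actual executor: `run_task`, `write_code`, `execute`, `fake_writer`, and `max_attempts` are hypothetical names, and a stub stands in for the LLM. Only `internal_history` is taken from the description above.

```python
from typing import Callable


def run_task(task: str,
             write_code: Callable[[str, list[str]], str],
             execute: Callable[[str], str],
             max_attempts: int = 3) -> dict:
    """Iterate privately on one task and surface only a short result."""
    internal_history: list[str] = []  # failed attempts and errors stay local
    for attempt in range(1, max_attempts + 1):
        # The code writer sees earlier failures so it can correct itself.
        code = write_code(task, internal_history)
        try:
            output = execute(code)
        except Exception as exc:
            internal_history.append(f"attempt {attempt} failed: {exc!r}")
            continue
        # Success: the scratchpad is dropped, only the result is returned.
        return {"task": task, "summary": output, "attempts": attempt}
    return {"task": task, "summary": "unresolved after retries",
            "attempts": max_attempts}


def fake_writer(task: str, history: list[str]) -> str:
    # Stand-in for the LLM (assumption): the first draft has a typo,
    # and the retry, which sees the recorded failure, fixes it.
    return "len(rows)" if history else "len(rowz)"


result = run_task(
    "count rows",
    fake_writer,
    lambda code: str(eval(code, {"rows": [1, 2, 3]})),
)
```

In this sketch the `NameError` from the first draft is recorded only in `internal_history`; the caller receives the task, a summary, and the attempt count, which mirrors how the scratchpad keeps debugging noise out of the planner's memory.
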
| Component | Technology |
|---|---|
| Frontend | Bootstrap + HTML |
| Backend | Flask |
| Orchestration | LangGraph (StateGraph) |
| LLM | OpenAI GPT-4o-mini |
| Frameworks | LangChain |
| Runtime | Python 3.11 |
| Data Engine | Pandas & NumPy |
| Logic Parsing | Python AST (Abstract Syntax Trees) |
| State Management | TypedDict & Annotated History |
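
As a rough illustration of the "TypedDict & Annotated History" row, the sketch below shows one way such a state could be declared and merged using only the standard library. The field layout, the `AgentState` and `apply_update` names, and the choice to overwrite `internal_history` on each update are assumptions made for illustration; the project's real schema may differ. The merge helper imitates the reducer idea (an `Annotated` field carries the function used to combine old and new values) and is not LangGraph's implementation.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class AgentState(TypedDict):
    # Global history: the reducer (operator.add) appends new entries.
    messages: Annotated[list[str], operator.add]
    # Executor scratchpad: no reducer, so an update replaces the old value.
    internal_history: list[str]


def apply_update(state: dict, update: dict) -> dict:
    """Merge an update into the state, reducing annotated fields."""
    hints = get_type_hints(AgentState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        # Annotated types expose their extra arguments as __metadata__.
        metadata = getattr(hints[key], "__metadata__", ())
        if metadata and key in merged:
            merged[key] = metadata[0](merged[key], value)
        else:
            merged[key] = value
    return merged


state = {"messages": ["plan: data quality"], "internal_history": ["draft 1"]}
state = apply_update(
    state,
    {"messages": ["finding: 3 null columns"], "internal_history": []},
)
```

After the update, `messages` holds both entries while `internal_history` has been reset, which is one way to keep promoted insights and disposable debugging output in separate channels.
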
The generated summary.md output is structured for professional stakeholders:
- Executive Summary: High-level overview of the dataset's health and primary characteristics.
- Data Quality Audit: Analysis of null values, duplicates, inconsistencies, and anomalies.
- Statistical Deep-Dive: Numerical and analytical insights from univariate and bivariate analysis.
- Strategic Recommendations: Data-driven recommendations based on discovered trends and patterns.
The system uses a custom write_python_code tool that evaluates code in a controlled local environment. It is designed for sandboxed data analysis where the agent has direct access to the df object, providing a seamless bridge between LLM reasoning and programmatic data manipulation.
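
A minimal sketch of the notebook-style AST execution described above, using only the standard library. It is not the project's write_python_code tool: `run_cell` and its `namespace` parameter are hypothetical names, and a plain list stands in for the `df` object to keep the example dependency-free. The idea is to parse the script, run every statement except a trailing expression, then evaluate that expression and report its value, as a Jupyter cell would. Note that `exec` and `eval` alone are not a security boundary, so a real deployment would still need process-level or container-level isolation.

```python
import ast
import contextlib
import io
import traceback


def run_cell(source: str, namespace: dict) -> str:
    """Run multi-line code; return printed output plus the last expression's value."""
    buffer = io.StringIO()
    try:
        tree = ast.parse(source, mode="exec")
        last_expr = None
        # Split off a trailing bare expression so its value can be shown.
        if tree.body and isinstance(tree.body[-1], ast.Expr):
            last_expr = ast.Expression(body=tree.body.pop().value)
        with contextlib.redirect_stdout(buffer):
            exec(compile(tree, "<cell>", "exec"), namespace)
            if last_expr is not None:
                value = eval(compile(last_expr, "<cell>", "eval"), namespace)
                if value is not None:
                    print(repr(value))
    except Exception:
        # Return the traceback as text so the caller can attempt a fix.
        buffer.write(traceback.format_exc())
    return buffer.getvalue()


# The namespace persists between calls, like variables in a notebook kernel.
shared = {"rows": [1, 2, 3]}
first = run_cell("total = sum(rows)\ntotal * 2", shared)
second = run_cell("total + 1", shared)
```

Because errors come back as text rather than raised exceptions, the calling agent can append them to its scratchpad and try again without interrupting the surrounding graph.
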
Developed as a robust framework for autonomous data exploration using LangGraph, Flask, Bootstrap, and Python 3.11.