Skip to content

Lokesh-code-tech/Data_Analyst_Agent

Repository files navigation

title Data Analyst Agent
emoji 🤖
colorFrom blue
colorTo purple
sdk docker
pinned false

Data Analyst Agent

AI-powered Data Analyst Agent with an interactive web interface built using:

  • Flask
  • Bootstrap
  • HTML/CSS
  • LangChain
  • LangGraph
  • OpenAI

🤖 Autonomous Data Analyst: Multi-Agent EDA Orchestrator

Python 3.11 LangGraph LLM

An advanced, self-iterating Data Science agent that performs comprehensive Exploratory Data Analysis (EDA) without human intervention. Built on LangGraph, it utilizes a specialized Planner-Executor-Finalizer architecture to transform raw datasets into professional, insight-driven Markdown reports.

The project now includes a modern Bootstrap + HTML frontend with a Flask backend, allowing users to:

  • 📊 Upload datasets and automatically generate a professional summary.md EDA report.
  • 💬 Ask natural language questions about the uploaded dataset and receive AI-generated answers instantly.

🌐 Web Application Features

📁 Dataset Analysis Mode

Upload a CSV dataset and let the agent autonomously perform:

  • Data Cleaning
  • Statistical Analysis
  • Correlation Discovery
  • Pattern Detection
  • Insight Generation

After analysis, the generated summary.md report is available for download.

💬 Ask Questions About Your Dataset

Users can interact with the dataset using natural language queries such as:

  • "What are the missing values?"
  • "Which features are highly correlated?"
  • "Show the average sales by region."
  • "What insights can you derive from this dataset?"

The AI agent analyzes the dataframe and returns context-aware responses.


🧠 System Architecture

This system moves beyond basic prompt-chaining by implementing a Hierarchical State Machine. It separates strategic planning from technical execution using an isolated "internal scratchpad," ensuring the global reasoning stays focused and technical debugging remains local.

The Core Nodes:

  • 📍 Metadata Analyst: Automatically extracts the "DNA" of the dataframe (schema, types, shape) to ground the agent's logic in the actual data structure.
  • 🏗️ The Lead Planner: A high-level strategist that breaks the analysis into a hierarchical roadmap: Data Quality → Univariate → Bivariate → Multivariate analysis.
  • 💻 The Python Executor: A specialized developer node. It operates in a private internal_history loop, allowing it to iterate on code and self-correct syntax or logic errors without distracting the Planner.
  • 🧪 AST Execution REPL: A robust tool using Abstract Syntax Trees to safely execute multi-line Python scripts and capture results, mimicking a Jupyter Notebook environment.
  • 📝 Technical Finalizer: Distills raw technical logs and code outputs into clean, executive summaries for the Planner, ensuring only high-value insights are promoted to global memory.
  • 📊 Executive Summarizer: The final output node that compiles the entire analysis journey from the message history into a professional summary.md.

✨ Key Features

  • Interactive Web UI built using Bootstrap and HTML.
  • Flask Backend Integration for file uploads, analysis, and AI interactions.
  • Self-Healing Logic: Leveraging Python 3.11's enhanced error reporting, the executor identifies exactly where code fails and fixes it in real-time within the internal loop.
  • Isolated Scratchpad: Technical "noise"—such as debugging attempts, tracebacks, and raw data dumps—is confined to the internal_history. This keeps the long-term memory of the agent clean, focused, and token-efficient.
  • Recursive Re-Planning: After every task, the agent assesses findings. If a specific correlation, anomaly, or outlier is detected, it dynamically pivots the next step to investigate further.
  • Autonomous Documentation: Automatically compiles insights and writes its own final report to a persistent Markdown file upon completion.
  • Natural Language Dataset Q&A powered by OpenAI and LangGraph.

🛠️ Technical Stack

Component Technology
Frontend Bootstrap + HTML
Backend Flask
Orchestration LangGraph (StateGraph)
LLM OpenAI GPT-4o-mini
Frameworks LangChain
Runtime Python 3.11
Data Engine Pandas & NumPy
Logic Parsing Python AST (Abstract Syntax Trees)
State Management TypedDict & Annotated History

📄 Final Report Contents

The generated summary.md output is structured for professional stakeholders:

  1. Executive Summary: High-level overview of the dataset's health and primary characteristics.
  2. Data Quality Audit: Analysis of null values, duplicates, inconsistencies, and anomalies.
  3. Statistical Deep-Dive: Numerical and analytical insights from univariate and bivariate analysis.
  4. Strategic Recommendations: Data-driven recommendations based on discovered trends and patterns.

🛡️ Security & Execution

The system uses a custom write_python_code tool that evaluates code in a controlled local environment. It is designed for sandboxed data analysis where the agent has direct access to the df object, providing a seamless bridge between LLM reasoning and programmatic data manipulation.



Developed as a robust framework for autonomous data exploration using LangGraph, Flask, Bootstrap, and Python 3.11.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors