
Deen-Hayatu/Data_visual_Agent


🎨 Universal Data Visualization Agent

Python LangChain Ollama License

Transform any dataset into beautiful visualizations using natural language - completely local, private, and secure!

Data Visualization Agent

A powerful local tool that uses natural language to create data visualizations through LangChain and Ollama integration.

📚 Documentation

⚡ Quick Overview

Transform your data analysis with natural language queries:

💬 "Create a histogram of customer ages"
💬 "Show me sales trends over the last quarter"
💬 "Compare revenue by product category"

✨ Features:

  • 🤖 Local AI processing (no cloud dependencies)
  • 📊 Automatic plot saving (PNG + PDF formats)
  • 🎨 Professional visualizations
  • 💬 Natural language interface
  • 📁 Multiple data format support (CSV, Excel, JSON, etc.)

🚀 Quick Start

  1. Install Ollama: brew install ollama (macOS)
  2. Download AI model: ollama pull mistral:7b
  3. Install dependencies: pip install -r requirements.txt
  4. Run: python data_viz_agent_local.py

See QUICK_START.md for detailed setup instructions.

Demo GIF

🚀 Key Features

🎯 Universal Dataset Support

  • Multi-format compatibility: CSV, Excel (.xlsx, .xls), JSON, Parquet, TSV, and text files
  • Smart format detection: Automatically detects file types and delimiters
  • Interactive dataset selection: Browse and load any dataset at runtime
  • Dynamic column analysis: Adapts examples and queries to your specific data structure

🤖 Advanced AI Capabilities

  • Natural Language Processing: Ask questions in plain English
  • Local LLM Integration: Complete privacy with Ollama (no data leaves your machine)
  • Intelligent code generation: Automatically creates optimized pandas and visualization code
  • Error handling & recovery: Robust parsing with graceful error management

📊 Professional Visualizations

  • Comprehensive chart types: Histograms, scatter plots, bar charts, heatmaps, box plots, correlation matrices
  • Publication-ready styling: Beautiful, professional-grade visualizations
  • Auto-save functionality: High-quality PNG exports with customizable filenames
  • Interactive plot management: Runtime plot styling and customization

🔧 Enterprise-Ready Features

  • Modular architecture: Clean, maintainable codebase with separation of concerns
  • Comprehensive error handling: Detailed error messages and recovery suggestions
  • Extensive logging: Full transparency into agent decision-making process
  • Configurable backends: Support for headless environments and various display modes

📋 Table of Contents

⚡ Quick Start

# 1. Clone the repository
git clone https://github.com/yourusername/universal-data-viz-agent.git
cd universal-data-viz-agent

# 2. Install Ollama and pull a model
# Visit https://ollama.ai/ for installation instructions
ollama pull mistral:7b

# 3. Set up Python environment
python3 -m venv Data_viz_Agent_env
source Data_viz_Agent_env/bin/activate  # On Windows: Data_viz_Agent_env\Scripts\activate
pip install -r requirements.txt

# 4. Run the agent with your own dataset
python data_viz_agent_local.py

# 5. Enter the path to your dataset when prompted
# Example: /Users/yourusername/your_data.csv

🎯 What's New in v2.0

🆕 Universal Dataset Support

  • Before: Only worked with pre-defined student dataset
  • Now: Works with ANY CSV, Excel, JSON, or text file
  • Interactive selection: Choose your dataset at runtime
  • Smart detection: Automatically handles different file formats and delimiters

🆕 Enhanced User Experience

  • Runtime commands: info, columns, change dataset commands
  • Dynamic examples: Examples adapt to your specific dataset structure
  • Better error handling: Clear error messages with helpful suggestions
  • Flexible data types: Automatic numeric/categorical column detection
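The numeric/categorical detection mentioned above could be done along these lines. This is a minimal sketch, not the project's actual code: `split_column_types` is a hypothetical helper name, and treating every non-numeric column (including dates and booleans) as categorical is a simplifying assumption.

```python
from typing import List, Tuple

import pandas as pd


def split_column_types(df: pd.DataFrame) -> Tuple[List[str], List[str]]:
    """Return (numeric_columns, categorical_columns) for a DataFrame.

    Hypothetical helper: anything that is not a numeric dtype is
    treated as categorical, which is a simplification.
    """
    numeric = df.select_dtypes(include="number").columns.tolist()
    categorical = df.select_dtypes(exclude="number").columns.tolist()
    return numeric, categorical


if __name__ == "__main__":
    sample = pd.DataFrame(
        {
            "age": [25, 34],
            "satisfaction_score": [8.5, 7.2],
            "region": ["North", "South"],
        }
    )
    numeric_cols, categorical_cols = split_column_types(sample)
    print("numeric:", numeric_cols)
    print("categorical:", categorical_cols)
```

Lists like these are what would let example queries be filled in with real column names instead of placeholders.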

🆕 Professional Architecture

  • Modular design: Clean separation of dataset loading, LLM initialization, and visualization
  • Enterprise-ready: Robust error handling and logging
  • Configurable: Easy to extend and customize for specific use cases

🛠 Installation

Prerequisites

Step-by-Step Setup

  1. Install Ollama

    # Visit https://ollama.ai/ and follow installation instructions for your OS
    # Then pull a language model:
    ollama pull mistral:7b
    # Alternative models: llama3, codellama, etc.
  2. Clone and Setup Python Environment

    git clone https://github.com/yourusername/universal-data-viz-agent.git
    cd universal-data-viz-agent
    
    # Create virtual environment
    python3 -m venv Data_viz_Agent_env
    source Data_viz_Agent_env/bin/activate  # On Windows: Data_viz_Agent_env\Scripts\activate
    
    # Install dependencies
    pip install --upgrade pip
    pip install -r requirements.txt
  3. Prepare Your Dataset

    # The agent now works with ANY dataset file!
    # Supported formats:
    # - CSV files (.csv)
    # - Excel files (.xlsx, .xls)  
    # - JSON files (.json)
    # - Text files (.txt, .tsv)
    # - Parquet files (.parquet)
    
    # Example datasets you can try:
    # - Your own business data
    # - Public datasets from Kaggle
    # - Government open data
    # - Research datasets
  4. Run the Agent

    python data_viz_agent_local.py
    
    # You'll be prompted to enter your dataset path:
    # Example inputs:
    # /Users/yourname/sales_data.csv
    # ~/Documents/customer_data.xlsx
    # ./data/survey_results.json

💻 Usage

Interactive Dataset Selection

When you run the agent, you'll be guided through an interactive setup:

python data_viz_agent_local.py

Step 1: Dataset Selection

📁 DATASET SELECTION
==================================================
Supported file formats:
• CSV (.csv)
• Text files (.txt, .tsv)
• Excel (.xlsx, .xls)
• JSON (.json)
• Parquet (.parquet)

📂 Enter the full path to your dataset file: /path/to/your/data.csv

Step 2: Choose Interaction Mode

🎯 What would you like to do?
1. Run example queries
2. Interactive mode (ask your own questions)
3. Both
4. Load a different dataset

Enhanced Interactive Commands

In interactive mode, you now have special commands:

💬 Your question: info          # Show dataset information
💬 Your question: columns       # List all column names
💬 Your question: change dataset # Load a different file
💬 Your question: quit          # Exit the program
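One way to separate these special commands from ordinary questions is a small router in front of the agent. The sketch below is illustrative only: `route_input` and `SPECIAL_COMMANDS` are hypothetical names, and the real script may handle input differently.

```python
from typing import Tuple

# The four special commands listed above (hypothetical constant name).
SPECIAL_COMMANDS = {"info", "columns", "change dataset", "quit"}


def route_input(text: str) -> Tuple[str, str]:
    """Classify one line of user input.

    Returns ("command", name) for a special command and
    ("question", text) for anything that should go to the agent.
    """
    # Lower-case and collapse repeated whitespace so that
    # "  Change   Dataset " still matches "change dataset".
    normalized = " ".join(text.strip().lower().split())
    if normalized in SPECIAL_COMMANDS:
        return ("command", normalized)
    return ("question", text.strip())


if __name__ == "__main__":
    for line in ["info", "  Change   Dataset ", "Plot the ages of students"]:
        print(route_input(line))
```

Keeping the routing separate from the LLM call means the special commands respond instantly and never consume model time.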

Universal Query Examples

The agent adapts to YOUR dataset! Here are examples for any dataset:

Data Exploration (works with any dataset)

"Show me the first 10 rows"
"What are the column names and data types?"
"How many missing values are in each column?"
"Describe the dataset with summary statistics"
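For orientation, the four exploration questions above correspond roughly to the pandas calls below. This is a sketch of the kind of code an agent might generate, not a transcript of what it actually produces; `explore` is a hypothetical helper name.

```python
import pandas as pd


def explore(df: pd.DataFrame) -> dict:
    """Collect the basic exploration views for a DataFrame."""
    return {
        "first_rows": df.head(10),               # "Show me the first 10 rows"
        "dtypes": df.dtypes,                     # column names and data types
        "missing_per_column": df.isna().sum(),   # missing values in each column
        "summary": df.describe(include="all"),   # summary statistics
    }


if __name__ == "__main__":
    sample = pd.DataFrame(
        {"age": [25, None, 41], "region": ["North", "South", None]}
    )
    for name, view in explore(sample).items():
        print(f"--- {name} ---")
        print(view)
```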

Smart Visualizations (adapts to your columns)

"Create a histogram of [your_numeric_column]"
"Show correlations between all numeric columns"  
"Plot the distribution of [your_categorical_column]"
"Create a scatter plot of [column1] vs [column2]"

Advanced Analysis

"Compare [numeric_column] by [categorical_column]"
"Show me outliers in [your_column]"
"Create a heatmap of missing values"
"Plot time series data for [date_column]"

🎯 Examples

Here are some natural language queries you can try:

Basic Data Exploration

"Show me the first 5 rows of the data"
"What are the column names in this dataset?"
"How many students are in this dataset?"

Simple Visualizations

"Create a histogram of student final grades"
"Show the distribution of study time"
"Plot the ages of students"

Correlation Analysis

"Show the correlation between study time and final grade"
"Create a heatmap of all correlations"
"How does mother's education relate to student performance?"

Comparative Analysis

"Compare grades between urban and rural students"
"Create a box plot showing grades by internet access"
"Show the difference in performance between schools"

Advanced Visualizations

"Create a scatter plot of age vs final grade, colored by school"
"Plot the relationship between failures and final grade"
"Show me a pair plot of the grade variables G1, G2, G3"

🎯 Example Use Cases

📊 Business Analytics

# Load your sales data
💬 "Show monthly revenue trends"
💬 "Compare customer segments by purchase amount"
💬 "Create a funnel analysis of conversion rates"

🧪 Research & Science

# Load your experimental data
💬 "Plot the correlation between variables X and Y"
💬 "Show distribution of measurement errors"
💬 "Create box plots comparing treatment groups"

📈 Financial Analysis

# Load your financial data
💬 "Show stock price volatility over time"
💬 "Compare portfolio performance by sector"
💬 "Create a risk-return scatter plot"

🎓 Educational Data

# Load student or survey data
💬 "Analyze grade distributions by subject"
💬 "Show correlation between study time and performance"
💬 "Compare outcomes across different demographics"

Sample Output

When you ask: "Create a histogram of [your_column]"

The agent will:

  1. 🤔 Understand: Parse your natural language request
  2. 🔍 Analyze: Examine your specific dataset structure
  3. 💻 Generate: Create optimized Python code for your data
  4. 📊 Execute: Run the code and create the visualization
  5. 💾 Save: Export high-quality PNG file
  6. ✅ Report: Provide summary of what was created
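The Generate, Execute, and Save steps might end in code similar to the following. It is a sketch under assumptions: `save_histogram` is a hypothetical helper, the figure size, bin count, and 300 dpi are illustrative choices, and the non-interactive `Agg` backend is selected so the example also runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; no display needed

import matplotlib.pyplot as plt
import pandas as pd


def save_histogram(df: pd.DataFrame, column: str, out_path: str, bins: int = 20) -> str:
    """Draw a histogram of one column and save it to out_path."""
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.hist(df[column].dropna(), bins=bins, edgecolor="black")
    ax.set_title(f"Distribution of {column}")
    ax.set_xlabel(column)
    ax.set_ylabel("Count")
    fig.tight_layout()
    fig.savefig(out_path, dpi=300)  # format is inferred from the file extension
    plt.close(fig)  # free the figure so repeated queries do not leak memory
    return out_path


if __name__ == "__main__":
    ages = pd.DataFrame({"age": [22, 25, 25, 31, 34, 40, 41, 41, 58]})
    print("saved:", save_histogram(ages, "age", "age_histogram.png", bins=5))
```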

📊 Dataset Compatibility

Supported File Formats

| Format  | Extensions  | Features                      |
|---------|-------------|-------------------------------|
| CSV     | .csv        | Automatic delimiter detection |
| Excel   | .xlsx, .xls | Multiple sheets supported     |
| Text    | .txt, .tsv  | Smart delimiter inference     |
| JSON    | .json       | Nested structure flattening   |
| Parquet | .parquet    | High-performance columnar     |
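A loader covering the formats in this table can dispatch on the file extension. The sketch below shows the idea with standard pandas readers; `load_by_extension` is a hypothetical name and is not the project's `load_dataset`, whose implementation is not shown here. Excel and Parquet need optional pandas dependencies (for example openpyxl and pyarrow).

```python
from pathlib import Path

import pandas as pd


def load_by_extension(file_path: str) -> pd.DataFrame:
    """Load a tabular file by choosing a pandas reader from its extension."""
    path = Path(file_path).expanduser()
    suffix = path.suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix == ".tsv":
        return pd.read_csv(path, sep="\t")
    if suffix == ".txt":
        # sep=None with the python engine asks pandas to sniff the delimiter
        return pd.read_csv(path, sep=None, engine="python")
    if suffix in {".xlsx", ".xls"}:
        return pd.read_excel(path)  # reads the first sheet by default
    if suffix == ".json":
        return pd.read_json(path)
    if suffix == ".parquet":
        return pd.read_parquet(path)
    raise ValueError(f"Unsupported file format: {suffix or '(none)'}")


if __name__ == "__main__":
    Path("demo.csv").write_text("customer_id,purchase_amount\n1001,999.99\n1002,29.99\n")
    print(load_by_extension("demo.csv"))
```

Raising `ValueError` for unknown extensions gives the caller one place to turn the failure into a friendly "Unsupported file format" message.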

Smart Loading Features

  • 🔍 Auto-detection: Automatically identifies file format and structure
  • 🛠️ Delimiter inference: Handles comma, tab, space, and custom delimiters
  • 📊 Data type optimization: Automatically detects numeric vs categorical columns
  • 🚫 Error recovery: Graceful handling of malformed files with helpful suggestions
  • 📈 Memory efficient: Optimized loading for large datasets
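Delimiter inference of the kind described above can be approximated with the standard library's `csv.Sniffer`. This is a sketch of one possible approach, not necessarily what the agent uses; `infer_delimiter`, the candidate set, and the comma fallback are assumptions.

```python
import csv


def infer_delimiter(sample_text: str, candidates: str = ",\t;| ") -> str:
    """Guess the delimiter of a text sample, falling back to a comma."""
    try:
        return csv.Sniffer().sniff(sample_text, delimiters=candidates).delimiter
    except csv.Error:
        # Sniffer could not decide (for example, an empty or single-column sample)
        return ","


if __name__ == "__main__":
    sample = "date;stock_symbol;volume\n2024-01-01;AAPL;50000000\n2024-01-02;AAPL;45000000\n"
    print(repr(infer_delimiter(sample)))
```

In practice the sample would be the first few kilobytes of the file, and the result would be passed to `pandas.read_csv` as `sep`.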

Example Dataset Structures

The agent works with any tabular data structure:

E-commerce Data:

customer_id, product_name, purchase_amount, date, category
1001, Laptop, 999.99, 2024-01-15, Electronics
1002, Book, 29.99, 2024-01-16, Education

Financial Data:

date, stock_symbol, open_price, close_price, volume
2024-01-01, AAPL, 180.50, 182.30, 50000000
2024-01-02, AAPL, 182.30, 185.10, 45000000

Survey Data:

respondent_id, age, satisfaction_score, category, region
R001, 25, 8.5, Premium, North
R002, 34, 7.2, Standard, South

🔧 Technical Details

Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   User Query    │───▶│  LangChain      │───▶│   Ollama LLM    │
│ (Natural Lang.) │    │  Agent          │    │  (Local Model)  │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                              │
                              ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Visualization  │◀───│  Python Code    │───▶│  Dataset Loader │
│ (Matplotlib/SNS)│    │  Generation     │    │ (Multi-format)  │
└─────────────────┘    └─────────────────┘    └─────────────────┘

Enhanced Tech Stack

  • 🤖 LLM Engine: Ollama (Mistral 7B, Llama3, CodeLlama)
  • 🔗 Agent Framework: LangChain with pandas integration
  • 📊 Data Processing: Pandas with smart type inference
  • 📈 Visualization: Matplotlib + Seaborn with professional styling
  • 🐍 Language: Python 3.8+ with type hints
  • 💾 File I/O: Multi-format support (CSV, Excel, JSON, Parquet)

Core Components

# Dataset Loading System
def load_dataset(file_path) -> pd.DataFrame
def get_dataset_path() -> str

# LLM Integration  
def initialize_llm() -> OllamaLLM
def create_agent(llm, dataframe) -> Agent

# Interactive Interface
def ask_agent(question) -> Response
def interactive_mode() -> None

# Visualization Engine
def save_plot(filename) -> None
def setup_plot_style() -> None

Security & Privacy

  • 🔐 Local Processing: All data stays on your machine
  • 🚫 No External APIs: Zero data transmission to external services
  • 🛡️ Sandboxed Execution: Code execution in controlled environment
  • 📝 Audit Trail: Complete logging of all operations
  • 🔍 Transparent Operations: Full visibility into generated code

🔍 Troubleshooting

Common Issues

1. "Import langchain_ollama could not be resolved"

# Install the correct LangChain packages
pip install langchain-community langchain-ollama

2. "Ollama connection failed"

# Start the Ollama server if it is not already running
ollama serve

# Verify model is downloaded
ollama list

# Pull model if missing
ollama pull mistral:7b
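The same check can be scripted from Python. The sketch below assumes Ollama's default local address `http://localhost:11434` and its `/api/tags` endpoint, which lists locally available models; `list_ollama_models` is a hypothetical helper name, and the address should be adjusted if your server runs elsewhere.

```python
import json
import urllib.error
import urllib.request
from typing import List, Optional


def list_ollama_models(
    base_url: str = "http://localhost:11434", timeout: float = 3.0
) -> Optional[List[str]]:
    """Return the names of local Ollama models, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a response that is not valid JSON
        return None
    return [model.get("name", "") for model in payload.get("models", [])]


if __name__ == "__main__":
    models = list_ollama_models()
    if models is None:
        print("Ollama is not reachable. Start it with: ollama serve")
    elif "mistral:7b" not in models:
        print("Model missing. Run: ollama pull mistral:7b")
    else:
        print("Ollama is running with:", ", ".join(models))
```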

3. "Dataset file not found"

# The agent now prompts for dataset path interactively
# Simply run the program and enter your file path when prompted
python data_viz_agent_local.py

# Supported path formats:
# - Absolute: /Users/username/data.csv
# - Relative: ./data/myfile.xlsx  
# - Home directory: ~/Documents/data.json
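All three path styles can be normalized to one absolute path before loading. This is a minimal sketch, assuming a hypothetical helper named `resolve_dataset_path`; stripping surrounding quotes is an added convenience for paths pasted from a file manager.

```python
from pathlib import Path


def resolve_dataset_path(raw: str) -> Path:
    """Turn user input into an absolute path and confirm the file exists."""
    cleaned = raw.strip().strip("'\"")  # drop whitespace and pasted quotes
    path = Path(cleaned).expanduser().resolve()  # handles ~ and relative paths
    if not path.is_file():
        raise FileNotFoundError(f"Dataset file not found: {path}")
    return path


if __name__ == "__main__":
    try:
        print(resolve_dataset_path("~/Documents/data.json"))
    except FileNotFoundError as exc:
        print(exc)
```

Printing the resolved absolute path in the error message makes it easier to spot a typo or a wrong working directory.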

4. "Unsupported file format"

# Check supported formats:
# ✅ CSV, Excel, JSON, Parquet, TSV, TXT
# ❌ PDF, Word, Images, Databases

# Convert your data to a supported format:
# Excel → CSV: Open in Excel, Save As → CSV
# Database → CSV: Export query results
# API → JSON: Save API response as .json file

5. "Matplotlib display issues"

# For headless environments, plots are auto-saved as PNG files
# Check the project directory for generated plot files

6. "Agent taking too long to respond"

# Try a smaller/faster model
ollama pull mistral:7b  # Faster than larger models

Performance Tips

  • Model Selection: Use mistral:7b for faster responses, llama3 for better accuracy
  • Memory: Ensure you have at least 8GB RAM for smooth operation
  • First Run: Initial queries may be slower as the model loads

🤝 Contributing

We welcome contributions! Here's how you can help:

Ways to Contribute

  1. 🐛 Bug Reports: Found an issue? Open an issue
  2. 💡 Feature Requests: Have an idea? Start a discussion
  3. 📖 Documentation: Improve docs, add examples
  4. 🔧 Code: Submit pull requests for bug fixes or new features

Development Setup

# Fork the repository and clone your fork
git clone https://github.com/Deen-Hayatu/local-data-viz-agent.git
cd local-data-viz-agent

# Create a feature branch
git checkout -b feature/your-feature-name

# Make your changes and test
python data_viz_agent_local.py

# Submit a pull request

Code Style

  • Follow PEP 8 guidelines
  • Add docstrings to functions
  • Include type hints where appropriate
  • Test your changes before submitting

📝 License

This project is licensed under the MIT License. See the LICENSE file for details.

🙏 Acknowledgments

  • IBM watsonx.ai labs - Original inspiration for the data analysis workflow
  • LangChain - Powerful framework for LLM applications
  • Ollama - Making local LLMs accessible and easy to use
  • UCI Machine Learning Repository - Student performance dataset

📞 Support


⭐ Star this repository if you find it helpful!

Made with ❤️ for the open-source community
