🎨 Universal Data Visualization Agent

Transform any dataset into beautiful visualizations using natural language - completely local, private, and secure!

A powerful local tool that uses natural language to create data visualizations through LangChain and Ollama integration.

📚 Documentation

🚀 Quick Start Guide - Get up and running in 5 minutes
📖 Complete Documentation - Detailed user guide and features
🔧 API Reference - Technical documentation for developers
🎨 Examples Gallery - Sample queries and visualization examples

⚡ Quick Overview

Transform your data analysis with natural language queries:

💬 "Create a histogram of customer ages"
💬 "Show me sales trends over the last quarter" 
💬 "Compare revenue by product category"

✨ Features:

🤖 Local AI processing (no cloud dependencies)
📊 Automatic plot saving (PNG + PDF formats)
🎨 Professional visualizations
💬 Natural language interface
📁 Multiple data format support (CSV, Excel, JSON, etc.)

🚀 Quick Start

Install Ollama: brew install ollama (macOS)
Download AI model: ollama pull mistral:7b
Install dependencies: pip install -r requirements.txt
Run: python data_viz_agent_local.py

See QUICK_START.md for detailed setup instructions.

🚀 Key Features

🎯 Universal Dataset Support

Multi-format compatibility: CSV, Excel (.xlsx, .xls), JSON, Parquet, TSV, and text files
Smart format detection: Automatically detects file types and delimiters
Interactive dataset selection: Browse and load any dataset at runtime
Dynamic column analysis: Adapts examples and queries to your specific data structure
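
A minimal sketch of how extension-based format detection could work, assuming pandas does the actual reading (it is listed in the tech stack). The names `FORMATS` and `detect_format`, and the body of `load_dataset`, are illustrative assumptions, not the project's actual implementation.

```python
from pathlib import Path

# Hypothetical mapping from file extension to a format label.
FORMATS = {
    ".csv": "csv",
    ".tsv": "text",
    ".txt": "text",
    ".xlsx": "excel",
    ".xls": "excel",
    ".json": "json",
    ".parquet": "parquet",
}


def detect_format(file_path: str) -> str:
    """Return a format label based on the case-insensitive file extension."""
    suffix = Path(file_path).suffix.lower()
    if suffix not in FORMATS:
        raise ValueError(f"Unsupported file format: {suffix or '(none)'}")
    return FORMATS[suffix]


def load_dataset(file_path: str):
    """Load a file into a pandas DataFrame using the reader for its format."""
    import pandas as pd  # imported lazily so detect_format works without pandas

    fmt = detect_format(file_path)
    if fmt == "csv":
        return pd.read_csv(file_path)
    if fmt == "text":
        # sep=None with the Python engine asks pandas to sniff the delimiter.
        return pd.read_csv(file_path, sep=None, engine="python")
    if fmt == "excel":
        return pd.read_excel(file_path)
    if fmt == "json":
        return pd.read_json(file_path)
    return pd.read_parquet(file_path)
```

Rejecting unknown extensions up front keeps the "Unsupported file format" error separate from parsing errors raised later by the reader.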

🤖 Advanced AI Capabilities

Natural Language Processing: Ask questions in plain English
Local LLM Integration: Complete privacy with Ollama (no data leaves your machine)
Intelligent code generation: Automatically creates optimized pandas and visualization code
Error handling & recovery: Robust parsing with graceful error management

📊 Professional Visualizations

Comprehensive chart types: Histograms, scatter plots, bar charts, heatmaps, box plots, correlation matrices
Publication-ready styling: Beautiful, professional-grade visualizations
Auto-save functionality: High-quality PNG exports with customizable filenames
Interactive plot management: Runtime plot styling and customization

🔧 Enterprise-Ready Features

Modular architecture: Clean, maintainable codebase with separation of concerns
Comprehensive error handling: Detailed error messages and recovery suggestions
Extensive logging: Full transparency into agent decision-making process
Configurable backends: Support for headless environments and various display modes

⚡ Quick Start

# 1. Clone the repository
git clone https://github.com/yourusername/universal-data-viz-agent.git
cd universal-data-viz-agent

# 2. Install Ollama and pull a model
# Visit https://ollama.ai/ for installation instructions
ollama pull mistral:7b

# 3. Set up Python environment
python3 -m venv Data_viz_Agent_env
source Data_viz_Agent_env/bin/activate  # On Windows: Data_viz_Agent_env\Scripts\activate
pip install -r requirements.txt

# 4. Run the agent with your own dataset
python data_viz_agent_local.py

# 5. Enter the path to your dataset when prompted
# Example: /Users/yourusername/your_data.csv

🎯 What's New in v2.0

🆕 Universal Dataset Support

Before: Only worked with pre-defined student dataset
Now: Works with ANY CSV, Excel, JSON, or text file
Interactive selection: Choose your dataset at runtime
Smart detection: Automatically handles different file formats and delimiters

🆕 Enhanced User Experience

Runtime commands: info, columns, change dataset commands
Dynamic examples: Examples adapt to your specific dataset structure
Better error handling: Clear error messages with helpful suggestions
Flexible data types: Automatic numeric/categorical column detection
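
As an illustration of numeric/categorical column detection, here is a plain-Python sketch. It is a stand-in for whatever pandas-based inference the agent actually uses; `classify_columns` and its rule (a column is numeric when every non-empty value parses as a float) are assumptions made for this example.

```python
def _is_number(text: str) -> bool:
    """Return True when the string parses as a float."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def classify_columns(rows):
    """Split column names into (numeric, categorical) lists.

    `rows` is a list of dicts mapping column name to a raw string value.
    Empty values are ignored; a column with no values counts as categorical.
    """
    numeric, categorical = [], []
    columns = list(rows[0].keys()) if rows else []
    for col in columns:
        values = [str(r.get(col, "")).strip() for r in rows]
        values = [v for v in values if v]
        if values and all(_is_number(v) for v in values):
            numeric.append(col)
        else:
            categorical.append(col)
    return numeric, categorical
```

With rows shaped like the survey example later in this README, `age` and `satisfaction_score` would land in the numeric list and `region` in the categorical one.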

🆕 Professional Architecture

Modular design: Clean separation of dataset loading, LLM initialization, and visualization
Enterprise-ready: Robust error handling and logging
Configurable: Easy to extend and customize for specific use cases

🛠 Installation

Prerequisites

Python 3.8+
Ollama - Install from ollama.ai
Git (for cloning the repository)

Step-by-Step Setup

Install Ollama

# Visit https://ollama.ai/ and follow installation instructions for your OS
# Then pull a language model:
ollama pull mistral:7b
# Alternative models: llama3, codellama, etc.

Clone and Setup Python Environment

git clone https://github.com/yourusername/universal-data-viz-agent.git
cd universal-data-viz-agent

# Create virtual environment
python3 -m venv Data_viz_Agent_env
source Data_viz_Agent_env/bin/activate  # On Windows: Data_viz_Agent_env\Scripts\activate

# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt

Prepare Your Dataset

# The agent now works with ANY dataset file!
# Supported formats:
# - CSV files (.csv)
# - Excel files (.xlsx, .xls)  
# - JSON files (.json)
# - Text files (.txt, .tsv)
# - Parquet files (.parquet)

# Example datasets you can try:
# - Your own business data
# - Public datasets from Kaggle
# - Government open data
# - Research datasets

Run the Agent

python data_viz_agent_local.py

# You'll be prompted to enter your dataset path:
# Example inputs:
# /Users/yourname/sales_data.csv
# ~/Documents/customer_data.xlsx
# ./data/survey_results.json

💻 Usage

Interactive Dataset Selection

When you run the agent, you'll be guided through an interactive setup:

python data_viz_agent_local.py

Step 1: Dataset Selection

📁 DATASET SELECTION
==================================================
Supported file formats:
• CSV (.csv)
• Text files (.txt, .tsv)  
• Excel (.xlsx, .xls)
• JSON (.json)
• Parquet (.parquet)

📂 Enter the full path to your dataset file: /path/to/your/data.csv

Step 2: Choose Interaction Mode

🎯 What would you like to do?
1. Run example queries
2. Interactive mode (ask your own questions)
3. Both
4. Load a different dataset

Enhanced Interactive Commands

In interactive mode, you now have special commands:

💬 Your question: info          # Show dataset information
💬 Your question: columns       # List all column names  
💬 Your question: change dataset # Load a different file
💬 Your question: quit          # Exit the program
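
One way the special commands could be told apart from ordinary questions is sketched below. The function name `parse_input` and the case-insensitive, whitespace-tolerant matching are assumptions; the actual agent may match commands differently.

```python
# The four special commands listed above.
COMMANDS = {"info", "columns", "change dataset", "quit"}


def parse_input(text: str):
    """Return ("command", name) for a special command, else ("query", text)."""
    # Lowercase and collapse runs of whitespace before comparing.
    normalized = " ".join(text.lower().split())
    if normalized in COMMANDS:
        return ("command", normalized)
    return ("query", text.strip())
```

Anything that is not an exact command match is passed through unchanged as a question for the agent.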

Universal Query Examples

The agent adapts to YOUR dataset! Here are examples for any dataset:

Data Exploration (works with any dataset)

"Show me the first 10 rows"
"What are the column names and data types?"
"How many missing values are in each column?"
"Describe the dataset with summary statistics"

Smart Visualizations (adapts to your columns)

"Create a histogram of [your_numeric_column]"
"Show correlations between all numeric columns"  
"Plot the distribution of [your_categorical_column]"
"Create a scatter plot of [column1] vs [column2]"

Advanced Analysis

"Compare [numeric_column] by [categorical_column]"
"Show me outliers in [your_column]"
"Create a heatmap of missing values"
"Plot time series data for [date_column]"

🎯 Examples

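Here are some natural language queries you can try. They are written for the UCI student performance dataset, so substitute your own column names when working with other data: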

Basic Data Exploration

"Show me the first 5 rows of the data"
"What are the column names in this dataset?"
"How many students are in this dataset?"

Simple Visualizations

"Create a histogram of student final grades"
"Show the distribution of study time"
"Plot the ages of students"

Correlation Analysis

"Show the correlation between study time and final grade"
"Create a heatmap of all correlations"
"How does mother's education relate to student performance?"

Comparative Analysis

"Compare grades between urban and rural students"
"Create a box plot showing grades by internet access"
"Show the difference in performance between schools"

Advanced Visualizations

"Create a scatter plot of age vs final grade, colored by school"
"Plot the relationship between failures and final grade"
"Show me a pair plot of the grade variables G1, G2, G3"

🎯 Example Use Cases

📊 Business Analytics

# Load your sales data
💬 "Show monthly revenue trends"
💬 "Compare customer segments by purchase amount" 
💬 "Create a funnel analysis of conversion rates"

🧪 Research & Science

# Load your experimental data
💬 "Plot the correlation between variables X and Y"
💬 "Show distribution of measurement errors"
💬 "Create box plots comparing treatment groups"

📈 Financial Analysis

# Load your financial data
💬 "Show stock price volatility over time"
💬 "Compare portfolio performance by sector"
💬 "Create a risk-return scatter plot"

🎓 Educational Data

# Load student or survey data
💬 "Analyze grade distributions by subject"
💬 "Show correlation between study time and performance"
💬 "Compare outcomes across different demographics"

Sample Output

When you ask: "Create a histogram of [your_column]"

The agent will:

🤔 Understand: Parse your natural language request
🔍 Analyze: Examine your specific dataset structure
💻 Generate: Create optimized Python code for your data
📊 Execute: Run the code and create the visualization
💾 Save: Export high-quality PNG file
✅ Report: Provide summary of what was created
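
The Save step might look roughly like this. Both helper names are hypothetical, and deriving the filename from the query text is an assumption; the project's own `save_plot` may name files differently.

```python
import re


def build_plot_filename(query: str, ext: str = "png", max_len: int = 40) -> str:
    """Turn a natural language query into a filesystem-friendly filename."""
    # Replace every run of non-alphanumeric characters with one underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", query.lower()).strip("_")
    slug = slug[:max_len].rstrip("_") or "plot"
    return f"{slug}.{ext}"


def save_current_figure(path: str, dpi: int = 300) -> None:
    """Write the active Matplotlib figure to disk and release it."""
    import matplotlib.pyplot as plt  # lazy import: only needed when saving

    plt.savefig(path, dpi=dpi, bbox_inches="tight")
    plt.close()
```

For example, `build_plot_filename("Create a histogram of ages!")` yields `create_a_histogram_of_ages.png`.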

📊 Dataset Compatibility

Supported File Formats

| Format | Extensions | Features |
|--------|------------|----------|
| CSV | `.csv` | Automatic delimiter detection |
| Excel | `.xlsx`, `.xls` | Multiple sheets supported |
| Text | `.txt`, `.tsv` | Smart delimiter inference |
| JSON | `.json` | Nested structure flattening |
| Parquet | `.parquet` | High-performance columnar |
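
To make "nested structure flattening" concrete, here is a plain-Python sketch. The function `flatten_record` is an illustrative assumption; the agent itself may rely on pandas instead (`pandas.json_normalize` performs a similar transformation and also joins keys with a dot by default).

```python
def flatten_record(record: dict, parent_key: str = "", sep: str = ".") -> dict:
    """Flatten nested dicts into a single level with dotted keys.

    Lists and scalar values are kept as they are.
    """
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the key prefix along.
            flat.update(flatten_record(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat
```

Each flattened record then maps directly onto one table row, with keys such as `customer.address.city` serving as column names.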

Smart Loading Features

🔍 Auto-detection: Automatically identifies file format and structure
🛠️ Delimiter inference: Handles comma, tab, space, and custom delimiters
📊 Data type optimization: Automatically detects numeric vs categorical columns
🚫 Error recovery: Graceful handling of malformed files with helpful suggestions
📈 Memory efficient: Optimized loading for large datasets
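
Delimiter inference can be sketched with the standard library's `csv.Sniffer`. This is an assumption about the technique, not a description of the project's code; the helper name `sniff_delimiter`, the candidate set, and the comma fallback are chosen for illustration.

```python
import csv


def sniff_delimiter(sample: str, candidates: str = ",\t;| ", default: str = ",") -> str:
    """Guess the field delimiter from a text sample of the file.

    Falls back to `default` when the sniffer cannot decide.
    """
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        return default
```

In practice the sample would be the first few kilobytes of the file, and the result would be passed on as the `sep` argument of the reader.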

Example Dataset Structures

The agent works with any tabular data structure:

E-commerce Data:

customer_id, product_name, purchase_amount, date, category
1001, Laptop, 999.99, 2024-01-15, Electronics
1002, Book, 29.99, 2024-01-16, Education

Financial Data:

date, stock_symbol, open_price, close_price, volume
2024-01-01, AAPL, 180.50, 182.30, 50000000
2024-01-02, AAPL, 182.30, 185.10, 45000000

Survey Data:

respondent_id, age, satisfaction_score, category, region
R001, 25, 8.5, Premium, North
R002, 34, 7.2, Standard, South
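
Note that the samples above put a space after each comma. A short standard-library sketch of reading such data follows; `read_rows` is a hypothetical helper, and `skipinitialspace=True` is what keeps the leading spaces out of column names and values (pandas' `read_csv` accepts a parameter of the same name).

```python
import csv
import io

SAMPLE = """customer_id, product_name, purchase_amount, date, category
1001, Laptop, 999.99, 2024-01-15, Electronics
1002, Book, 29.99, 2024-01-16, Education
"""


def read_rows(text: str):
    """Parse comma-separated text into a list of dicts, one per row."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return list(reader)


rows = read_rows(SAMPLE)
total = sum(float(row["purchase_amount"]) for row in rows)
```

Without `skipinitialspace`, the second column would be named `" product_name"` (with a leading space), which makes queries by column name fail in confusing ways.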

🔧 Technical Details

Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   User Query    │───▶│  LangChain      │───▶│   Ollama LLM    │
│ (Natural Lang.) │    │  Agent          │    │  (Local Model)  │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                              │
                              ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Visualization  │◀───│  Python Code    │───▶│  Dataset Loader │
│ (Matplotlib/SNS)│    │  Generation     │    │ (Multi-format)  │
└─────────────────┘    └─────────────────┘    └─────────────────┘

Enhanced Tech Stack

🤖 LLM Engine: Ollama (Mistral 7B, Llama3, CodeLlama)
🔗 Agent Framework: LangChain with pandas integration
📊 Data Processing: Pandas with smart type inference
📈 Visualization: Matplotlib + Seaborn with professional styling
🐍 Language: Python 3.8+ with type hints
💾 File I/O: Multi-format support (CSV, Excel, JSON, Parquet)

Core Components

from __future__ import annotations  # signatures below reference types by name only

import pandas as pd

# Dataset Loading System
def load_dataset(file_path) -> pd.DataFrame: ...
def get_dataset_path() -> str: ...

# LLM Integration
def initialize_llm() -> OllamaLLM: ...
def create_agent(llm, dataframe) -> Agent: ...

# Interactive Interface
def ask_agent(question) -> Response: ...
def interactive_mode() -> None: ...

# Visualization Engine
def save_plot(filename) -> None: ...
def setup_plot_style() -> None: ...
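
A hedged sketch of how the LLM integration functions might be filled in. It assumes the `langchain-ollama` and `langchain-experimental` packages and LangChain's `create_pandas_dataframe_agent` helper; the project may construct its agent differently. Unlike the signature listed above, `ask_agent` here takes the agent as an explicit argument to keep the example self-contained.

```python
def initialize_llm(model: str = "mistral:7b"):
    """Create a local Ollama-backed LLM (requires the langchain-ollama package)."""
    from langchain_ollama import OllamaLLM

    return OllamaLLM(model=model, temperature=0)


def create_agent(llm, dataframe):
    """Wrap a DataFrame in a LangChain pandas agent.

    Assumption: built with langchain-experimental; not confirmed by this README.
    """
    from langchain_experimental.agents import create_pandas_dataframe_agent

    return create_pandas_dataframe_agent(
        llm,
        dataframe,
        verbose=True,
        allow_dangerous_code=True,  # the agent runs generated Python in-process
    )


def ask_agent(agent, question: str) -> str:
    """Send one question to the agent and return its final answer text."""
    result = agent.invoke({"input": question})
    return result["output"]
```

A typical call chain would be `ask_agent(create_agent(initialize_llm(), df), "Show me the first 5 rows")`, with the Ollama server running and the model already pulled.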

Security & Privacy

🔐 Local Processing: All data stays on your machine
🚫 No External APIs: Zero data transmission to external services
🛡️ Sandboxed Execution: Code execution in controlled environment
📝 Audit Trail: Complete logging of all operations
🔍 Transparent Operations: Full visibility into generated code

🔍 Troubleshooting

Common Issues

1. "Import langchain_ollama could not be resolved"

# Install the correct LangChain packages
pip install langchain-community langchain-ollama

2. "Ollama connection failed"

# Start the Ollama server if it is not already running
ollama serve

# Verify model is downloaded
ollama list

# Pull model if missing
ollama pull mistral:7b
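
The same check can be made from Python before starting the agent. This sketch assumes Ollama's default local address (`http://localhost:11434`) and its `/api/tags` endpoint, which lists downloaded models; `list_ollama_models` is a hypothetical helper, not part of the project.

```python
import json
import urllib.error
import urllib.request


def list_ollama_models(base_url: str = "http://localhost:11434", timeout: float = 2.0):
    """Return the names of locally available models, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a response that is not valid JSON.
        return None
    return [model.get("name", "") for model in payload.get("models", [])]
```

A return value of `None` suggests the server is not running, while a list without `mistral:7b` suggests the model still needs to be pulled.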

3. "Dataset file not found"

# The agent now prompts for dataset path interactively
# Simply run the program and enter your file path when prompted
python data_viz_agent_local.py

# Supported path formats:
# - Absolute: /Users/username/data.csv
# - Relative: ./data/myfile.xlsx  
# - Home directory: ~/Documents/data.json
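
All three path styles can be normalized with `pathlib`. The helper below is a sketch under assumptions: the name `resolve_dataset_path` and the stripping of surrounding quotes (common when a path is pasted or dragged into a terminal) are not taken from the project.

```python
from pathlib import Path


def resolve_dataset_path(raw: str) -> Path:
    """Expand ~, resolve relative segments, and verify that the file exists."""
    cleaned = raw.strip().strip("'\"")
    path = Path(cleaned).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Dataset file not found: {path}")
    return path
```

Including the fully resolved path in the error message makes it easier to spot a wrong working directory or a typo in the home-relative part.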

4. "Unsupported file format"

# Check supported formats:
# ✅ CSV, Excel, JSON, Parquet, TSV, TXT
# ❌ PDF, Word, Images, Databases

# Convert your data to a supported format:
# Excel → CSV: Open in Excel, Save As → CSV
# Database → CSV: Export query results  
# API → JSON: Save API response as .json file

5. "Matplotlib display issues"

# For headless environments, plots are auto-saved as PNG files
# Check the project directory for generated plot files

6. "Agent taking too long to respond"

# Try a smaller/faster model
ollama pull mistral:7b  # Faster than larger models

Performance Tips

Model Selection: Use mistral:7b for faster responses, llama3 for better accuracy
Memory: Ensure you have at least 8GB RAM for smooth operation
First Run: Initial queries may be slower as the model loads

🤝 Contributing

We welcome contributions! Here's how you can help:

Ways to Contribute

🐛 Bug Reports: Found an issue? Open an issue
💡 Feature Requests: Have an idea? Start a discussion
📖 Documentation: Improve docs, add examples
🔧 Code: Submit pull requests for bug fixes or new features

Development Setup

# Fork the repository and clone your fork
git clone https://github.com/Deen-Hayatu/local-data-viz-agent.git
cd local-data-viz-agent

# Create a feature branch
git checkout -b feature/your-feature-name

# Make your changes and test
python data_viz_agent_local.py

# Submit a pull request

Code Style

Follow PEP 8 guidelines
Add docstrings to functions
Include type hints where appropriate
Test your changes before submitting

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

IBM watsonx.ai labs - Original inspiration for the data analysis workflow
LangChain - Powerful framework for LLM applications
Ollama - Making local LLMs accessible and easy to use
UCI Machine Learning Repository - Student performance dataset

📞 Support

📧 Issues: GitHub Issues
💬 Discussions: GitHub Discussions
📖 Documentation: Wiki

⭐ Star this repository if you find it helpful!

Made with ❤️ for the open-source community
