Transform any dataset into beautiful visualizations using natural language - completely local, private, and secure!
A powerful local tool that uses natural language to create data visualizations through LangChain and Ollama integration.
- Quick Start Guide - Get up and running in 5 minutes
- Complete Documentation - Detailed user guide and features
- API Reference - Technical documentation for developers
- Examples Gallery - Sample queries and visualization examples
Transform your data analysis with natural language queries:
"Create a histogram of customer ages"
"Show me sales trends over the last quarter"
"Compare revenue by product category"
Features:
- Local AI processing (no cloud dependencies)
- Automatic plot saving (PNG + PDF formats)
- Professional visualizations
- Natural language interface
- Multiple data format support (CSV, Excel, JSON, etc.)
- Install Ollama: brew install ollama (macOS)
- Download AI model: ollama pull mistral:7b
- Install dependencies: pip install -r requirements.txt
- Run: python data_viz_agent_local.py
See QUICK_START.md for detailed setup instructions.
- Multi-format compatibility: CSV, Excel (.xlsx, .xls), JSON, Parquet, TSV, and text files
- Smart format detection: Automatically detects file types and delimiters
- Interactive dataset selection: Browse and load any dataset at runtime
- Dynamic column analysis: Adapts examples and queries to your specific data structure
- Natural Language Processing: Ask questions in plain English
- Local LLM Integration: Complete privacy with Ollama (no data leaves your machine)
- Intelligent code generation: Automatically creates optimized pandas and visualization code
- Error handling & recovery: Robust parsing with graceful error management
- Comprehensive chart types: Histograms, scatter plots, bar charts, heatmaps, box plots, correlation matrices
- Publication-ready styling: Beautiful, professional-grade visualizations
- Auto-save functionality: High-quality PNG exports with customizable filenames
- Interactive plot management: Runtime plot styling and customization
- Modular architecture: Clean, maintainable codebase with separation of concerns
- Comprehensive error handling: Detailed error messages and recovery suggestions
- Extensive logging: Full transparency into agent decision-making process
- Configurable backends: Support for headless environments and various display modes
- Quick Start
- Installation
- Usage
- Examples
- Dataset Information
- Technical Details
- Troubleshooting
- Contributing
- License
# 1. Clone the repository
git clone https://github.com/yourusername/universal-data-viz-agent.git
cd universal-data-viz-agent
# 2. Install Ollama and pull a model
# Visit https://ollama.ai/ for installation instructions
ollama pull mistral:7b
# 3. Set up Python environment
python3 -m venv Data_viz_Agent_env
source Data_viz_Agent_env/bin/activate # On Windows: Data_viz_Agent_env\Scripts\activate
pip install -r requirements.txt
# 4. Run the agent with your own dataset
python data_viz_agent_local.py
# 5. Enter the path to your dataset when prompted
# Example: /Users/yourusername/your_data.csv
- Before: Only worked with pre-defined student dataset
- Now: Works with ANY CSV, Excel, JSON, or text file
- Interactive selection: Choose your dataset at runtime
- Smart detection: Automatically handles different file formats and delimiters
- Runtime commands: info, columns, and change dataset
- Dynamic examples: Examples adapt to your specific dataset structure
- Better error handling: Clear error messages with helpful suggestions
- Flexible data types: Automatic numeric/categorical column detection
- Modular design: Clean separation of dataset loading, LLM initialization, and visualization
- Enterprise-ready: Robust error handling and logging
- Configurable: Easy to extend and customize for specific use cases
- Python 3.8+
- Ollama - Install from ollama.ai
- Git (for cloning the repository)
- Install Ollama
# Visit https://ollama.ai/ and follow installation instructions for your OS
# Then pull a language model:
ollama pull mistral:7b
# Alternative models: llama3, codellama, etc.
- Clone and Setup Python Environment
git clone https://github.com/yourusername/universal-data-viz-agent.git
cd universal-data-viz-agent
# Create virtual environment
python3 -m venv Data_viz_Agent_env
source Data_viz_Agent_env/bin/activate # On Windows: Data_viz_Agent_env\Scripts\activate
# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt
- Prepare Your Dataset
# The agent now works with ANY dataset file!
# Supported formats:
# - CSV files (.csv)
# - Excel files (.xlsx, .xls)
# - JSON files (.json)
# - Text files (.txt, .tsv)
# - Parquet files (.parquet)
# Example datasets you can try:
# - Your own business data
# - Public datasets from Kaggle
# - Government open data
# - Research datasets
- Run the Agent
python data_viz_agent_local.py
# You'll be prompted to enter your dataset path:
# Example inputs:
# /Users/yourname/sales_data.csv
# ~/Documents/customer_data.xlsx
# ./data/survey_results.json
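The example inputs above mix absolute, home-relative (`~`), and relative paths. As a minimal sketch of how such input could be normalized before loading, the snippet below uses only `pathlib`. The helper name `resolve_dataset_path` is hypothetical and is not taken from the project's code; the real prompt handling may differ.

```python
from pathlib import Path


def resolve_dataset_path(raw: str) -> Path:
    """Normalize user-entered path text (hypothetical helper, not the project's API)."""
    # Strip whitespace and the quotes that shells or drag-and-drop often add.
    cleaned = raw.strip().strip("'\"")
    if not cleaned:
        raise ValueError("No path entered")
    # Expand "~" and make relative paths absolute against the current directory.
    path = Path(cleaned).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Dataset file not found: {path}")
    return path
```

Resolving the path up front means any "file not found" message can show the full absolute location that was actually checked.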
When you run the agent, you'll be guided through an interactive setup:
python data_viz_agent_local.py
Step 1: Dataset Selection
DATASET SELECTION
==================================================
Supported file formats:
• CSV (.csv)
• Text files (.txt, .tsv)
• Excel (.xlsx, .xls)
• JSON (.json)
• Parquet (.parquet)
Enter the full path to your dataset file: /path/to/your/data.csv
Step 2: Choose Interaction Mode
What would you like to do?
1. Run example queries
2. Interactive mode (ask your own questions)
3. Both
4. Load a different dataset
In interactive mode, you now have special commands:
Your question: info # Show dataset information
Your question: columns # List all column names
Your question: change dataset # Load a different file
Your question: quit # Exit the program
The agent adapts to YOUR dataset! Here are examples for any dataset:
"Show me the first 10 rows"
"What are the column names and data types?"
"How many missing values are in each column?"
"Describe the dataset with summary statistics"
"Create a histogram of [your_numeric_column]"
"Show correlations between all numeric columns"
"Plot the distribution of [your_categorical_column]"
"Create a scatter plot of [column1] vs [column2]"
"Compare [numeric_column] by [categorical_column]"
"Show me outliers in [your_column]"
"Create a heatmap of missing values"
"Plot time series data for [date_column]"
Here are some natural language queries you can try:
"Show me the first 5 rows of the data"
"What are the column names in this dataset?"
"How many students are in this dataset?"
"Create a histogram of student final grades"
"Show the distribution of study time"
"Plot the ages of students"
"Show the correlation between study time and final grade"
"Create a heatmap of all correlations"
"How does mother's education relate to student performance?"
"Compare grades between urban and rural students"
"Create a box plot showing grades by internet access"
"Show the difference in performance between schools"
"Create a scatter plot of age vs final grade, colored by school"
"Plot the relationship between failures and final grade"
"Show me a pair plot of the grade variables G1, G2, G3"
# Load your sales data
"Show monthly revenue trends"
"Compare customer segments by purchase amount"
"Create a funnel analysis of conversion rates"
# Load your experimental data
"Plot the correlation between variables X and Y"
"Show distribution of measurement errors"
"Create box plots comparing treatment groups"
# Load your financial data
"Show stock price volatility over time"
"Compare portfolio performance by sector"
"Create a risk-return scatter plot"
# Load student or survey data
"Analyze grade distributions by subject"
"Show correlation between study time and performance"
"Compare outcomes across different demographics"
When you ask: "Create a histogram of [your_column]"
The agent will:
- Understand: Parse your natural language request
- Analyze: Examine your specific dataset structure
- Generate: Create optimized Python code for your data
- Execute: Run the code and create the visualization
- Save: Export high-quality PNG file
- Report: Provide summary of what was created
| Format | Extensions | Features |
|---|---|---|
| CSV | .csv | Automatic delimiter detection |
| Excel | .xlsx, .xls | Multiple sheets supported |
| Text | .txt, .tsv | Smart delimiter inference |
| JSON | .json | Nested structure flattening |
| Parquet | .parquet | High-performance columnar |
- Auto-detection: Automatically identifies file format and structure
- Delimiter inference: Handles comma, tab, space, and custom delimiters
- Data type optimization: Automatically detects numeric vs categorical columns
- Error recovery: Graceful handling of malformed files with helpful suggestions
- Memory efficient: Optimized loading for large datasets
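To make the auto-detection and delimiter-inference ideas concrete, here is a rough sketch using only the standard library. It maps file extensions to the formats in the table above and guesses a delimiter with `csv.Sniffer`. The names `FORMATS`, `detect_format`, and `sniff_delimiter` are illustrative assumptions, not the project's actual functions, and the real loader may use a different strategy.

```python
import csv
from pathlib import Path

# Extension-to-format map taken from the supported-formats table.
FORMATS = {
    ".csv": "csv", ".tsv": "text", ".txt": "text",
    ".xlsx": "excel", ".xls": "excel",
    ".json": "json", ".parquet": "parquet",
}


def detect_format(file_path: str) -> str:
    """Return the format name for a file, based on its extension only."""
    suffix = Path(file_path).suffix.lower()
    if suffix not in FORMATS:
        raise ValueError(f"Unsupported file format: {suffix or '(none)'}")
    return FORMATS[suffix]


def sniff_delimiter(sample: str, default: str = ",") -> str:
    """Guess the delimiter of delimited text, falling back to a default."""
    try:
        # Restrict the candidates so ordinary letters are never chosen.
        return csv.Sniffer().sniff(sample, delimiters=",\t;|").delimiter
    except csv.Error:
        return default
```

In practice the sample passed to `sniff_delimiter` would be the first few kilobytes of the file, and the detected delimiter could then be handed to the reader that loads the full dataset.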
The agent works with any tabular data structure:
E-commerce Data:
customer_id, product_name, purchase_amount, date, category
1001, Laptop, 999.99, 2024-01-15, Electronics
1002, Book, 29.99, 2024-01-16, Education
Financial Data:
date, stock_symbol, open_price, close_price, volume
2024-01-01, AAPL, 180.50, 182.30, 50000000
2024-01-02, AAPL, 182.30, 185.10, 45000000
Survey Data:
respondent_id, age, satisfaction_score, category, region
R001, 25, 8.5, Premium, North
R002, 34, 7.2, Standard, South
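As an illustration of numeric vs categorical column detection, the sketch below classifies the columns of the e-commerce sample using only the standard library. The rule used here (a column is numeric if every value parses as a float) and the names `is_number` and `classify_columns` are assumptions made for this example; the agent itself relies on pandas and may apply different rules.

```python
import csv
import io

# The e-commerce sample, as it might appear in a CSV file.
SAMPLE = """customer_id, product_name, purchase_amount, date, category
1001, Laptop, 999.99, 2024-01-15, Electronics
1002, Book, 29.99, 2024-01-16, Education
"""


def is_number(text) -> bool:
    """True if the value parses as a float (missing values count as non-numeric)."""
    try:
        float(text)
        return True
    except (TypeError, ValueError):
        return False


def classify_columns(csv_text: str) -> dict:
    """Label each column 'numeric' if every value parses as a float, else 'categorical'."""
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    rows = list(reader)
    return {
        name: "numeric" if rows and all(is_number(row[name]) for row in rows) else "categorical"
        for name in reader.fieldnames or []
    }


print(classify_columns(SAMPLE))
```

With this simple rule, `customer_id` is labeled numeric and `date` categorical, which shows why identifier and date columns usually need extra handling in a real loader.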
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   User Query    │───▶│   LangChain     │───▶│   Ollama LLM    │
│ (Natural Lang.) │    │     Agent       │    │ (Local Model)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │
                                ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Visualization  │◀───│  Python Code    │───▶│ Dataset Loader  │
│ (Matplotlib/SNS)│    │   Generation    │    │ (Multi-format)  │
└─────────────────┘    └─────────────────┘    └─────────────────┘
- LLM Engine: Ollama (Mistral 7B, Llama3, CodeLlama)
- Agent Framework: LangChain with pandas integration
- Data Processing: Pandas with smart type inference
- Visualization: Matplotlib + Seaborn with professional styling
- Language: Python 3.8+ with type hints
- File I/O: Multi-format support (CSV, Excel, JSON, Parquet)
# Dataset Loading System
def load_dataset(file_path) -> pd.DataFrame
def get_dataset_path() -> str
# LLM Integration
def initialize_llm() -> OllamaLLM
def create_agent(llm, dataframe) -> Agent
# Interactive Interface
def ask_agent(question) -> Response
def interactive_mode() -> None
# Visualization Engine
def save_plot(filename) -> None
def setup_plot_style() -> None
- Local Processing: All data stays on your machine
- No External APIs: Zero data transmission to external services
- Sandboxed Execution: Code execution in controlled environment
- Audit Trail: Complete logging of all operations
- Transparent Operations: Full visibility into generated code
1. "Import langchain_ollama could not be resolved"
# Install the correct LangChain packages
pip install langchain-community langchain-ollama
2. "Ollama connection failed"
# Check if Ollama is running
ollama serve
# Verify model is downloaded
ollama list
# Pull model if missing
ollama pull mistral:7b
3. "Dataset file not found"
# The agent now prompts for dataset path interactively
# Simply run the program and enter your file path when prompted
python data_viz_agent_local.py
# Supported path formats:
# - Absolute: /Users/username/data.csv
# - Relative: ./data/myfile.xlsx
# - Home directory: ~/Documents/data.json
4. "Unsupported file format"
# Check supported formats:
# Supported: CSV, Excel, JSON, Parquet, TSV, TXT
# Not supported: PDF, Word, Images, Databases
# Convert your data to a supported format:
# Excel → CSV: Open in Excel, Save As → CSV
# Database → CSV: Export query results
# API → JSON: Save API response as .json file
5. "Matplotlib display issues"
# For headless environments, plots are auto-saved as PNG files
# Check the project directory for generated plot files
6. "Agent taking too long to respond"
# Try a smaller/faster model
ollama pull mistral:7b # Faster than larger models
- Model Selection: Use mistral:7b for faster responses, llama3 for better accuracy
- Memory: Ensure you have at least 8 GB RAM for smooth operation
- First Run: Initial queries may be slower as the model loads
We welcome contributions! Here's how you can help:
- Bug Reports: Found an issue? Open an issue
- Feature Requests: Have an idea? Start a discussion
- Documentation: Improve docs, add examples
- Code: Submit pull requests for bug fixes or new features
# Fork the repository and clone your fork
git clone https://github.com/Deen-Hayatu/local-data-viz-agent.git
cd local-data-viz-agent
# Create a feature branch
git checkout -b feature/your-feature-name
# Make your changes and test
python data_viz_agent_local.py
# Submit a pull request
- Follow PEP 8 guidelines
- Add docstrings to functions
- Include type hints where appropriate
- Test your changes before submitting
This project is licensed under the MIT License - see the LICENSE file for details.
- IBM watsonx.ai labs - Original inspiration for the data analysis workflow
- LangChain - Powerful framework for LLM applications
- Ollama - Making local LLMs accessible and easy to use
- UCI Machine Learning Repository - Student performance dataset
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Wiki
Star this repository if you find it helpful!
Made with ❤️ for the open-source community
