
πŸ“Š Spreadsheet Agent with LangChain πŸ”—

A LangChain-powered agent that lets you interact with spreadsheets in natural language. Query, process, and summarize your data with simple conversational commands.

πŸš€ New to this project? β†’ START_HERE.md - Get running in 60 seconds!

✨ Key Features

  • πŸ”— Powered by LangChain: Multi-LLM support (GPT-4, Claude, and more)
  • πŸ€– Multiple Agent Types: OpenAI Functions, ReAct, and custom agents
  • πŸ” Natural Language Queries: Filter, sort, and select data using plain English
  • πŸ”„ Data Processing: Add columns, aggregate data, compute metrics
  • πŸ“ Smart Summarization: Get human-readable insights from your data
  • πŸ’¬ Conversational Interface: Chain operations naturally across multiple turns
  • 🎯 Dynamic Tool Selection: Agent chooses the right tools automatically
  • πŸ“š Memory-Enabled: Maintains context throughout the conversation
  • 🌊 Streaming Support: Real-time output via LangChain callbacks
  • ⏱️ Performance Timing: See execution time for every query
  • πŸ” SQL Query Display: See the SQL equivalent of every operation

πŸ”— LangChain Integration

This agent is built on LangChain, giving you:

Supported LLM Providers

  • βœ… OpenAI: GPT-4, GPT-3.5-turbo
  • βœ… Anthropic: Claude 3 Opus, Claude 3 Sonnet
  • πŸ”„ Any LangChain LLM: Custom configurations supported

Switch Models Instantly

# Use GPT-4
python main.py --model gpt-4

# Use Claude
python main.py --model claude-3-opus-20240229

# Use GPT-3.5 (faster/cheaper)
python main.py --model gpt-3.5-turbo
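
The same switch works from Python. A minimal sketch, assuming the create_agent helper from the Python API section below accepts these model names unchanged:

# Hypothetical Python equivalent of the commands above
from agent import create_agent

# OpenAI GPT-4 (assumes OPENAI_API_KEY is set)
gpt4_agent = create_agent(model_name='gpt-4')

# Anthropic Claude (assumes the matching Anthropic API key is configured)
claude_agent = create_agent(model_name='claude-3-opus-20240229')

result = gpt4_agent.run("Load the file examples/sales_data.csv")
print(result['output'])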

See LANGCHAIN_INTEGRATION.md for full details!

⚑ Quick Start

Just want to start NOW? β†’ See START_HERE.md for 3-step setup!

Installation

# 1. Install dependencies
pip install -r requirements.txt

# 2. Set your API key
export OPENAI_API_KEY='sk-your-key-here'

# 3. Run the agent
python reports_agent.py

That's it! Then type 1 to load your first report.

Basic Usage

🎯 NEW: Reports Agent (Recommended for reports/ folder)

A specialized interface for the 30+ business reports in the reports/ folder:

# Quick start - interactive mode
python reports_agent.py

# Or use the launcher
./analyze_reports.sh

# List all available reports
python reports_agent.py --list

Super fast loading - just type a number:

πŸ“Š You: 1              # Loads first report
πŸ“Š You: sales.csv      # Or type filename

See REPORTS_GUIDE.md for detailed examples!

Interactive CLI (General Purpose)

python main.py

Example session:

πŸ“Š You: Load the file examples/sales_data.csv

πŸ€– Agent: Loaded 24 rows and 7 columns from examples/sales_data.csv
...

πŸ“Š You: Show rows where Sales > 2000

πŸ€– Agent: Query returned 8 rows:
...

πŸ“Š You: Add a Profit column as Revenue - Cost

πŸ€– Agent: Successfully added column 'Profit'
...

πŸ“Š You: Summarize the top 5 results

πŸ€– Agent: Data Summary:
- Total rows: 24
...

Batch Mode

python main.py --batch "Load examples/sales_data.csv" "Show rows where Sales > 2000" "Group by Region and sum Sales"

Python API

from agent import create_agent

# Create agent
agent = create_agent(model_name='gpt-4', verbose=True)

# Run queries
result = agent.run("Load the file examples/sales_data.csv")
print(result['output'])

result = agent.run("Show rows where Sales > 2000")
print(result['output'])

result = agent.run("Add a Profit column as Revenue - Cost")
print(result['output'])

Project Structure

sqldemo/
β”œβ”€β”€ agent.py                 # Main agent implementation
β”œβ”€β”€ tools.py                 # Tool definitions (Query, Process, Summarize)
β”œβ”€β”€ main.py                  # CLI interface
β”œβ”€β”€ requirements.txt         # Dependencies
β”œβ”€β”€ examples/                # Example data and notebooks
β”‚   β”œβ”€β”€ sales_data.csv      # Sample dataset
β”‚   β”œβ”€β”€ example_1_query_only.ipynb
β”‚   β”œβ”€β”€ example_2_query_and_process.ipynb
β”‚   └── example_3_full_workflow.ipynb
└── tests/                   # Unit tests
    └── test_tools.py

Available Tools

1. Load Spreadsheet

Load CSV or Excel files into memory.

Example: "Load the file data.csv"

2. Query Spreadsheet

Filter, select, and sort data using natural language.

Examples:

  • "Show rows where Sales > 1000"
  • "Filter for Region == 'North'"
  • "Get top 10 rows sorted by Revenue descending"
  • "Select columns: Name, Sales, Profit"

3. Process Spreadsheet

Transform and aggregate data.

Examples:

  • "Add a Profit column calculated as Revenue - Cost"
  • "Group by Category and sum Sales"
  • "Calculate mean Sales by Region"
  • "Normalize the Price column to 0-1 range"

4. Summarize Results

Generate natural language summaries.

Examples:

  • "Summarize the top 5 results"
  • "Explain what this data shows"
  • "Write a brief report of key insights"

5. Get Spreadsheet Info

View current spreadsheet metadata.

Example: Type info in the CLI

Example Workflows

Workflow 1: Sales Analysis

1. Load examples/sales_data.csv
2. Filter for sales above 2000
3. Add a Profit column (Revenue - Cost)
4. Summarize the top 5 results
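
The same workflow can be scripted with the Python API. A sketch, assuming the create_agent helper shown above and the sample dataset's Sales, Revenue, and Cost columns:

from agent import create_agent

agent = create_agent(model_name='gpt-4')

steps = [
    "Load examples/sales_data.csv",
    "Show rows where Sales > 2000",
    "Add a Profit column as Revenue - Cost",
    "Summarize the top 5 results",
]

for step in steps:
    result = agent.run(step)
    print(result['output'])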

Workflow 2: Regional Performance

1. Load examples/sales_data.csv
2. Group by Region and sum Revenue
3. Summarize which region performed best

Workflow 3: Product Margins

1. Load examples/sales_data.csv
2. Add a Margin column ((Revenue - Cost) / Revenue * 100)
3. Group by Product and calculate average Margin
4. Summarize which product has the best margins

CLI Commands

In interactive mode, you can use these commands:

  • help or ? - Show help message
  • info - Display current spreadsheet info
  • history - Show operation history
  • reset - Reset spreadsheet to original state
  • clear - Clear conversation memory
  • exit or quit - Exit the program

Command-Line Options

python main.py [OPTIONS]

Options:
  --model TEXT         OpenAI model to use (default: gpt-4)
  --temperature FLOAT  LLM temperature (default: 0)
  --verbose           Enable verbose output
  --batch TEXT...     Run in batch mode with commands
  --api-key TEXT      OpenAI API key

Examples

See the examples/ directory for Jupyter notebooks demonstrating:

  • Query-only operations
  • Query + processing workflows
  • Full workflows with summarization

Requirements

  • Python 3.9+
  • OpenAI API key
  • Dependencies listed in requirements.txt

Architecture

The system consists of:

  1. LangChain Agent: OpenAI Functions Agent with conversation memory
  2. Spreadsheet State: In-memory dataframe with operation history
  3. Tool Suite: Specialized tools for different operations
  4. CLI Interface: Interactive and batch modes
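
A minimal sketch of this pattern in plain LangChain (not the repository's actual code; it assumes the classic langchain 0.0.x agent API and a stand-in load tool):

import pandas as pd
from langchain.agents import AgentType, initialize_agent
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from langchain.prompts import MessagesPlaceholder
from langchain.tools import Tool

# Shared in-memory state: the current dataframe plus an operation history
state = {"df": None, "history": []}

def load_spreadsheet(path: str) -> str:
    """Load a CSV file into memory and record the operation."""
    state["df"] = pd.read_csv(path)
    state["history"].append(f"load {path}")
    return f"Loaded {len(state['df'])} rows and {len(state['df'].columns)} columns"

tools = [
    Tool(
        name="load_spreadsheet",
        func=load_spreadsheet,
        description="Load a CSV file into memory. Input: a file path.",
    ),
    # ...query, process, and summarize tools are registered the same way
]

llm = ChatOpenAI(model_name="gpt-4", temperature=0)
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.OPENAI_FUNCTIONS,
    memory=memory,
    agent_kwargs={"extra_prompt_messages": [MessagesPlaceholder(variable_name="chat_history")]},
    verbose=True,
)

agent.run("Load the file examples/sales_data.csv")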

Supported File Formats

  • CSV (.csv)
  • Excel (.xlsx, .xls)
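
Loading presumably goes through pandas (an assumption based on the in-memory dataframe described above), so the direct equivalents are:

import pandas as pd

df_csv = pd.read_csv("examples/sales_data.csv")   # .csv
df_xls = pd.read_excel("report.xlsx")             # .xlsx / .xls ("report.xlsx" is a placeholder; requires openpyxl or xlrd)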

Future Enhancements

  • Visualization tool (charts/graphs)
  • SQL translation for large datasets
  • Multi-spreadsheet joins
  • Export tool (Excel/PDF)
  • Google Sheets integration
  • Role-based access control

Troubleshooting

Issue: "No spreadsheet loaded" error Solution: Make sure to load a spreadsheet first using load_spreadsheet tool

Issue: API key not found Solution: Set OPENAI_API_KEY environment variable or use --api-key flag

Issue: Tool parsing errors Solution: Try rephrasing your request or check the verbose output for details

Contributing

Contributions welcome! Please ensure:

  • Code follows existing style
  • Tests pass
  • Documentation is updated

License

MIT License - see LICENSE file for details

Contact

For questions or issues, please open a GitHub issue.
