A powerful LangChain-powered agent that interacts with spreadsheets using natural language. Query, process, and summarize your data with simple conversational commands.
New to this project? → START_HERE.md - Get running in 60 seconds!
- Powered by LangChain: Multi-LLM support (GPT-4, Claude, and more)
- Multiple Agent Types: OpenAI Functions, ReAct, and custom agents
- Natural Language Queries: Filter, sort, and select data using plain English
- Data Processing: Add columns, aggregate data, compute metrics
- Smart Summarization: Get human-readable insights from your data
- Conversational Interface: Chain operations naturally across multiple turns
- Dynamic Tool Selection: Agent chooses the right tools automatically
- Memory-Enabled: Maintains context throughout the conversation
- Streaming Support: Real-time output via LangChain callbacks
- Performance Timing: See execution time for every query
- SQL Query Display: See the SQL equivalent of every operation
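Streaming output, for instance, rides on standard LangChain callbacks. The snippet below is a generic sketch of how streaming is typically enabled for a LangChain chat model; the project's own callback wiring may differ:

```python
# Generic LangChain streaming sketch - illustrative only; this project's
# callback wiring in main.py/agent.py may differ.
from langchain_core.callbacks import StreamingStdOutCallbackHandler
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4",
    streaming=True,                               # emit tokens as they are generated
    callbacks=[StreamingStdOutCallbackHandler()], # print each token to stdout
)

llm.invoke("Summarize what a spreadsheet agent does in one sentence.")
```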
This agent is built on LangChain, giving you:
- OpenAI: GPT-4, GPT-3.5-turbo
- Anthropic: Claude 3 Opus, Claude 3 Sonnet
- Any LangChain LLM: Custom configurations supported
```bash
# Use GPT-4
python main.py --model gpt-4

# Use Claude
python main.py --model claude-3-opus-20240229

# Use GPT-3.5 (faster/cheaper)
python main.py --model gpt-3.5-turbo
```
See LANGCHAIN_INTEGRATION.md for full details!
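Because the agent is built on LangChain, any LangChain chat model can in principle drive it. The sketch below only shows how such models are constructed; the `llm=` hand-off to `create_agent` is an assumption made for illustration, so check LANGCHAIN_INTEGRATION.md for the options the project actually supports.

```python
# Building LangChain chat models directly. The llm= keyword on create_agent is
# hypothetical - see LANGCHAIN_INTEGRATION.md for the supported configuration.
from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI

from agent import create_agent

gpt4 = ChatOpenAI(model="gpt-4", temperature=0)
claude = ChatAnthropic(model="claude-3-opus-20240229", temperature=0)

# Documented interface: select the model by name.
agent = create_agent(model_name="gpt-4", verbose=True)

# Hypothetical hand-off of a pre-built model:
# agent = create_agent(llm=claude)
```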
Just want to start NOW? → See START_HERE.md for the 3-step setup!
```bash
# 1. Install dependencies
pip install -r requirements.txt

# 2. Set your API key
export OPENAI_API_KEY='sk-your-key-here'

# 3. Run the agent
python reports_agent.py
```
That's it! Then type `1` to load your first report.
A specialized interface for the 30+ business reports in the reports/ folder:
```bash
# Quick start - interactive mode
python reports_agent.py

# Or use the launcher
./analyze_reports.sh

# List all available reports
python reports_agent.py --list
```
Super fast loading - just type a number:
```
You: 1           # Loads first report
You: sales.csv   # Or type filename
```
See REPORTS_GUIDE.md for detailed examples!
```bash
python main.py
```
Example session:
```
You: Load the file examples/sales_data.csv
Agent: Loaded 24 rows and 7 columns from examples/sales_data.csv
...

You: Show rows where Sales > 2000
Agent: Query returned 8 rows:
...

You: Add a Profit column as Revenue - Cost
Agent: Successfully added column 'Profit'
...

You: Summarize the top 5 results
Agent: Data Summary:
- Total rows: 24
...
```
```bash
python main.py --batch \
  "Load examples/sales_data.csv" \
  "Show rows where Sales > 2000" \
  "Group by Region and sum Sales"
```
```python
from agent import create_agent

# Create agent
agent = create_agent(model_name='gpt-4', verbose=True)

# Run queries
result = agent.run("Load the file examples/sales_data.csv")
print(result['output'])

result = agent.run("Show rows where Sales > 2000")
print(result['output'])

result = agent.run("Add a Profit column as Revenue - Cost")
print(result['output'])
```
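Since `agent.run()` returns a dict with an `'output'` key, a whole workflow can be chained in a loop. The timing wrapper below is illustrative (plain `time.perf_counter`), not part of the project's API:

```python
import time

from agent import create_agent

agent = create_agent(model_name='gpt-4', verbose=False)

workflow = [
    "Load the file examples/sales_data.csv",
    "Show rows where Sales > 2000",
    "Group by Region and sum Sales",
    "Summarize which region performed best",
]

for step in workflow:
    start = time.perf_counter()
    result = agent.run(step)        # documented: returns a dict with 'output'
    elapsed = time.perf_counter() - start
    print(f"[{elapsed:.2f}s] {result['output']}")
```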
```
sqldemo/
├── agent.py              # Main agent implementation
├── tools.py              # Tool definitions (Query, Process, Summarize)
├── main.py               # CLI interface
├── requirements.txt      # Dependencies
├── examples/             # Example data and notebooks
│   ├── sales_data.csv    # Sample dataset
│   ├── example_1_query_only.ipynb
│   ├── example_2_query_and_process.ipynb
│   └── example_3_full_workflow.ipynb
└── tests/                # Unit tests
    └── test_tools.py
```
Load tool: load CSV or Excel files into memory.
Example: "Load the file data.csv"
Query tool: filter, select, and sort data using natural language. Examples:
- "Show rows where Sales > 1000"
- "Filter for Region == 'North'"
- "Get top 10 rows sorted by Revenue descending"
- "Select columns: Name, Sales, Profit"
Process tool: transform and aggregate data. Examples:
- "Add a Profit column calculated as Revenue - Cost"
- "Group by Category and sum Sales"
- "Calculate mean Sales by Region"
- "Normalize the Price column to 0-1 range"
Summarize tool: generate natural language summaries. Examples:
- "Summarize the top 5 results"
- "Explain what this data shows"
- "Write a brief report of key insights"
Info: view current spreadsheet metadata. Example: type `info` in the CLI.
Filter and summarize:
1. Load examples/sales_data.csv
2. Filter for sales above 2000
3. Add a Profit column (Revenue - Cost)
4. Summarize the top 5 results

Regional performance:
1. Load examples/sales_data.csv
2. Group by Region and sum Revenue
3. Summarize which region performed best

Product margins:
1. Load examples/sales_data.csv
2. Add a Margin column ((Revenue - Cost) / Revenue * 100)
3. Group by Product and calculate average Margin
4. Summarize which product has the best margins
In interactive mode, you can use these commands:
- `help` or `?` - Show help message
- `info` - Display current spreadsheet info
- `history` - Show operation history
- `reset` - Reset spreadsheet to original state
- `clear` - Clear conversation memory
- `exit` or `quit` - Exit the program
```
python main.py [OPTIONS]

Options:
  --model TEXT         Model to use (default: gpt-4)
  --temperature FLOAT  LLM temperature (default: 0)
  --verbose            Enable verbose output
  --batch TEXT...      Run in batch mode with the given commands
  --api-key TEXT       OpenAI API key
```
See the examples/ directory for Jupyter notebooks demonstrating:
- Query-only operations
- Query + processing workflows
- Full workflows with summarization
- Python 3.9+
- OpenAI API key
- Dependencies listed in requirements.txt
The system consists of:
- LangChain Agent: OpenAI Functions Agent with conversation memory
- Spreadsheet State: In-memory dataframe with operation history
- Tool Suite: Specialized tools for different operations
- CLI Interface: Interactive and batch modes
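The spreadsheet state can be pictured as a thin wrapper around a pandas DataFrame that also backs the `history` and `reset` CLI commands. The class below is a hypothetical sketch, not the project's actual implementation:

```python
# Hypothetical sketch of the in-memory spreadsheet state described above.
from dataclasses import dataclass, field

import pandas as pd

@dataclass
class SpreadsheetState:
    original: pd.DataFrame                             # data as loaded from disk
    current: pd.DataFrame                              # data after processing steps
    history: list[str] = field(default_factory=list)   # operation log ("history" command)

    def apply(self, description: str, df: pd.DataFrame) -> None:
        """Record an operation and update the working dataframe."""
        self.current = df
        self.history.append(description)

    def reset(self) -> None:
        """Restore the original data ("reset" command)."""
        self.current = self.original.copy()
        self.history.clear()
```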
- CSV (`.csv`)
- Excel (`.xlsx`, `.xls`)
- Visualization tool (charts/graphs)
- SQL translation for large datasets
- Multi-spreadsheet joins
- Export tool (Excel/PDF)
- Google Sheets integration
- Role-based access control
Issue: "No spreadsheet loaded" error
Solution: Make sure to load a spreadsheet first using load_spreadsheet
tool
Issue: API key not found
Solution: Set OPENAI_API_KEY
environment variable or use --api-key
flag
Issue: Tool parsing errors Solution: Try rephrasing your request or check the verbose output for details
Contributions welcome! Please ensure:
- Code follows existing style
- Tests pass
- Documentation is updated
MIT License - see LICENSE file for details
For questions or issues, please open a GitHub issue.