Zemulax/GenSQL
Folders and files
| Name | Name | Last commit date | ||
|---|---|---|---|---|
Repository files navigation
# GenSQL 🧠📊
**Stop writing queries. Start asking questions.**
GenSQL is a data layer that understands intent, not just syntax. Think of it as having a librarian who knows your data inside-out, speaks your language, and adapts as your needs change.
> **Not a replacement for Postgres. A better interface for humans.**
---
## 💡 The Problem
Traditional databases are powerful but dumb:
- You need to know exact table/column names
- You must write precise queries with rigid syntax
- Schema changes break your queries
- Non-technical users need a developer for every question
**What if your database worked like talking to a librarian?**
You don't ask a librarian:
> "Give me items from shelf 3B where Dewey decimal is between 500-599"
You ask:
> "I need books about space exploration"
And they _understand_ what you mean.
---
## ✨ What GenSQL Does Differently
**Traditional Database:**
```sql
SELECT * FROM students WHERE grade_percentage >= 60
-- Requires knowing schema, exact field names, SQL syntax
```
**GenSQL:**
```
show me students who passed
→ Figures out what "passed" means in your context
→ Finds the right collection and fields automatically
→ Returns results with explanation
```
### Key Features
- **Understands intent:** "students who passed" vs rigid SELECT queries
- **Semantic field matching:** Don't know if it's "firstName" or "first_name"? GenSQL figures it out
- **Comparison operators:** Handles `>=`, `<`, `!=` naturally from plain English
- **Smart aggregations:** "what's the average grade?" → calculates automatically
- **Domain awareness:** Learns what "failed", "active", or "recent" means in your context
- **Schema evolution:** Field names change? Queries still work
- **Natural language + SQL:** Use whichever feels right
---
## 🎯 Who Is This For?
GenSQL shines in specific scenarios:
✅ **Internal business tools** - Sales teams need data but don't know SQL
✅ **Rapid prototyping** - Schema keeps changing, don't want to rewrite queries
✅ **Domain-specific apps** - HR tools, student tracking, CRM where business logic is complex
✅ **Non-technical users** - Teachers, managers, analysts who think in questions not queries
❌ **Not for:** High-frequency operations, mission-critical systems, cost-sensitive production apps
---
## 🚀 Quick Example
```bash
GenSQL> add a student named Alice with grade 95
✓ Added student Alice (students_1)
GenSQL> add student Bob grade 72
✓ Added student Bob (students_2)
GenSQL> show me students with grades over 80
Found 1 student:
- Alice (grade: 95)
GenSQL> what's the average grade?
The average grade is 83.5
GenSQL> delete students who failed
Found 0 students with grade < 60. No deletions made.
GenSQL> list all students
Found 2 students:
- Alice (grade: 95)
- Bob (grade: 72)
```
**No SQL written. No schema defined. Just conversations.**
---
## 🏗 Architecture
```
User Input (Natural Language)
↓
GenSQL CLI
↓
LLM Brain Core (GPT-4o)
↓
Tool Interface (Strict Gateway)
↓
Memory Layer (JSON for v0)
```
### The Brain-First Design
**Key principle:** The LLM is the _only_ component allowed to access memory.
- Memory operations happen **exclusively** through tools:
- `memory_write` - Create records
- `memory_read` - Query with filters
- `memory_update` - Modify existing data
- `memory_delete` - Remove records
- `memory_aggregate` - Calculate sum, avg, count, etc.
- `memory_inspect_schema` - Understand structure dynamically
- Application code **never** touches the database directly
- The LLM infers structure, validates intent, and performs operations
- All actions are auditable through tool call logs
**Why this matters:**
- Prevents hallucinated data (LLM can't invent records)
- Natural language → structured operations pipeline
- User intent → system actions mapping
- Flexible schemas without chaos
---
## 🛠 Technical Features (v0)
### 1. Semantic Understanding
```
"show me high-performing students"
→ Inspects schema to find grade field
→ Interprets "high-performing" as grade >= 80
→ Returns filtered results
```
### 2. Comparison Operators
```
"sales over 1000" → amount >= 1000
"students under 70" → grade < 70
"not in North region" → region != "North"
```
### 3. Smart Aggregations
```
"how many students?" → COUNT
"average sale amount?" → AVG
"total revenue?" → SUM
```
### 4. Dynamic Collections
```
"add a project named GenSQL with status active"
→ Creates "projects" collection automatically
→ Infers structure from context
```
### 5. Schema Flexibility
```
Field name changes: firstName → first_name
Queries still work - LLM adapts automatically
```
---
## 💾 Memory Model (v0)
Simple JSON storage for prototyping:
```json
{
"students": [
{ "id": "students_1", "name": "Alice", "grade": 95 },
{ "id": "students_2", "name": "Bob", "grade": 72 }
],
"sales": [{ "id": "sales_1", "region": "North", "amount": 1200 }]
}
```
**Trade-offs:**
- ✅ Easy to inspect and debug
- ✅ No setup required
- ✅ Perfect for prototypes
- ❌ Not production-ready
- ❌ No concurrency control
- ❌ Limited performance
---
## 📊 Use Case: Teacher's Assistant
Instead of building a custom UI with filters and buttons:
```bash
# Natural interaction
GenSQL> remember that Alice scored 95 on the math test
GenSQL> Bob got 72 in math
GenSQL> show me students struggling in math
→ Returns students with score < 70
GenSQL> what's the class average?
→ The average score is 83.5
GenSQL> delete the student named Bob
→ Deleted student Bob (math score: 72)
```
**Result:** Teacher gets data access without learning SQL or waiting for custom UI development.
---
## 🔮 Roadmap
**Phase 1: Intelligence (Current)**
- ✅ Natural language understanding
- ✅ Semantic field matching
- ✅ Smart aggregations
- ✅ Dynamic schemas
**Phase 2: Scale**
- [ ] Swap JSON → SQLite (LLM-gated)
- [ ] Fuzzy matching for typos
- [ ] Relationship inference (joins)
- [ ] Multi-step reasoning
**Phase 3: Production**
- [ ] Transaction support
- [ ] Concurrency handling
- [ ] Performance optimization
- [ ] REST API / Web UI
- [ ] Multi-user access control
---
## 🚧 Current Limitations
GenSQL v0 is intentionally simple to prove the concept:
- **Single-file storage** (JSON) - no scalability
- **No transactions** - operations aren't atomic
- **No concurrency** - single-user only
- **LLM latency** - 500ms-5s per query (vs <10ms for SQL)
- **LLM cost** - Every query hits the API ($$$)
- **No strict schemas** - flexibility vs validation trade-off
---
## 🎯 What GenSQL Is (and Isn't)
**GenSQL IS:**
- ✅ A natural language interface to structured data
- ✅ A cognitive layer that understands context and intent
- ✅ An exploration of LLM-first data systems
- ✅ A portfolio/research project
**GenSQL IS NOT:**
- ❌ A replacement for PostgreSQL or MySQL
- ❌ Production-ready (yet)
- ❌ Suitable for high-frequency or mission-critical workloads
- ❌ Faster than traditional databases (it's smarter, not faster)
---
## 🧪 Philosophy
**GenSQL explores a simple question:**
> What if a database worked more like a human mind?
Humans don't think in SQL. We think in questions:
- "Who are my best customers?"
- "Which students need help?"
- "What are our recent wins?"
GenSQL bridges that gap. It's not about replacing databases—it's about making them _accessible_ to how humans naturally think about data.
---
## 🔧 Technical Details
**Built with:**
- Python 3.11+
- OpenAI GPT-4o Mini (tool calling)
- JSON storage (v0)
**Architecture:**
- Agent-based design
- Tool-calling interface
- Strict memory access control
- Conversational state management
**Inspired by:**
- Modern LLM agent architectures
- Tool-use paradigms (ReAct, function calling)
- Human memory and cognition patterns
---
## 📝 Example Conversations
### Business Intelligence
```
GenSQL> what were our top 3 products last quarter?
GenSQL> show me customers who haven't ordered in 90 days
GenSQL> calculate total revenue by region
```
### Data Cleanup
```
GenSQL> find duplicate people entries
GenSQL> delete all test records
GenSQL> update everyone in North region to East
```
### Exploration
```
GenSQL> what collections do we have?
GenSQL> show me everything in projects
GenSQL> what fields exist in the students collection?
```
---
## 🙌 Motivation
Built to demonstrate:
1. **LLM tool calling** in a real application
2. **Brain-first architecture** - AI as the controller, not assistant
3. **Human-centric data interaction** - conversations over commands
**By Mozay** - [zemulax.com](https://zemulax.com)
---
## 📚 Learn More
- [Technical deep-dive](#) _(coming soon)_
- [Demo video](#) _(coming soon)_
- [Architecture decisions](#) _(coming soon)_
---
**Status:** v0 - Experimental prototype
**License:** MIT
**Contributions:** Open to feedback and ideas!