Skip to content

Zemulax/GenSQL

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

# GenSQL 🧠📊

**Stop writing queries. Start asking questions.**

GenSQL is a data layer that understands intent, not just syntax. Think of it as having a librarian who knows your data inside-out, speaks your language, and adapts as your needs change.

> **Not a replacement for Postgres. A better interface for humans.**

---

## 💡 The Problem

Traditional databases are powerful but dumb:

- You need to know exact table/column names
- You must write precise queries with rigid syntax
- Schema changes break your queries
- Non-technical users need a developer for every question

**What if your database worked like talking to a librarian?**

You don't ask a librarian:

> "Give me items from shelf 3B where Dewey decimal is between 500-599"

You ask:

> "I need books about space exploration"

And they _understand_ what you mean.

---

## ✨ What GenSQL Does Differently

**Traditional Database:**

```sql
SELECT * FROM students WHERE grade_percentage >= 60
-- Requires knowing schema, exact field names, SQL syntax
```

**GenSQL:**

```
show me students who passed
→ Figures out what "passed" means in your context
→ Finds the right collection and fields automatically
→ Returns results with explanation
```

### Key Features

- **Understands intent:** "students who passed" vs rigid SELECT queries
- **Semantic field matching:** Don't know if it's "firstName" or "first_name"? GenSQL figures it out
- **Comparison operators:** Handles `>=`, `<`, `!=` naturally from plain English
- **Smart aggregations:** "what's the average grade?" → calculates automatically
- **Domain awareness:** Learns what "failed", "active", or "recent" means in your context
- **Schema evolution:** Field names change? Queries still work
- **Natural language + SQL:** Use whichever feels right

---

## 🎯 Who Is This For?

GenSQL shines in specific scenarios:

✅ **Internal business tools** - Sales teams need data but don't know SQL  
✅ **Rapid prototyping** - Schema keeps changing, don't want to rewrite queries  
✅ **Domain-specific apps** - HR tools, student tracking, CRM where business logic is complex  
✅ **Non-technical users** - Teachers, managers, analysts who think in questions not queries

❌ **Not for:** High-frequency operations, mission-critical systems, cost-sensitive production apps

---

## 🚀 Quick Example

```bash
GenSQL> add a student named Alice with grade 95
✓ Added student Alice (students_1)

GenSQL> add student Bob grade 72
✓ Added student Bob (students_2)

GenSQL> show me students with grades over 80
Found 1 student:
- Alice (grade: 95)

GenSQL> what's the average grade?
The average grade is 83.5

GenSQL> delete students who failed
Found 0 students with grade < 60. No deletions made.

GenSQL> list all students
Found 2 students:
- Alice (grade: 95)
- Bob (grade: 72)
```

**No SQL written. No schema defined. Just conversations.**

---

## 🏗 Architecture

```
   User Input (Natural Language)
         ↓
   GenSQL CLI
         ↓
   LLM Brain Core (GPT-4o)
         ↓
   Tool Interface (Strict Gateway)
         ↓
   Memory Layer (JSON for v0)
```

### The Brain-First Design

**Key principle:** The LLM is the _only_ component allowed to access memory.

- Memory operations happen **exclusively** through tools:

  - `memory_write` - Create records
  - `memory_read` - Query with filters
  - `memory_update` - Modify existing data
  - `memory_delete` - Remove records
  - `memory_aggregate` - Calculate sum, avg, count, etc.
  - `memory_inspect_schema` - Understand structure dynamically

- Application code **never** touches the database directly
- The LLM infers structure, validates intent, and performs operations
- All actions are auditable through tool call logs

**Why this matters:**

- Prevents hallucinated data (LLM can't invent records)
- Natural language → structured operations pipeline
- User intent → system actions mapping
- Flexible schemas without chaos

---

## 🛠 Technical Features (v0)

### 1. Semantic Understanding

```
"show me high-performing students"
→ Inspects schema to find grade field
→ Interprets "high-performing" as grade >= 80
→ Returns filtered results
```

### 2. Comparison Operators

```
"sales over 1000"  → amount >= 1000
"students under 70" → grade < 70
"not in North region" → region != "North"
```

### 3. Smart Aggregations

```
"how many students?" → COUNT
"average sale amount?" → AVG
"total revenue?" → SUM
```

### 4. Dynamic Collections

```
"add a project named GenSQL with status active"
→ Creates "projects" collection automatically
→ Infers structure from context
```

### 5. Schema Flexibility

```
Field name changes: firstName → first_name
Queries still work - LLM adapts automatically
```

---

## 💾 Memory Model (v0)

Simple JSON storage for prototyping:

```json
{
  "students": [
    { "id": "students_1", "name": "Alice", "grade": 95 },
    { "id": "students_2", "name": "Bob", "grade": 72 }
  ],
  "sales": [{ "id": "sales_1", "region": "North", "amount": 1200 }]
}
```

**Trade-offs:**

- ✅ Easy to inspect and debug
- ✅ No setup required
- ✅ Perfect for prototypes
- ❌ Not production-ready
- ❌ No concurrency control
- ❌ Limited performance

---

## 📊 Use Case: Teacher's Assistant

Instead of building a custom UI with filters and buttons:

```bash
# Natural interaction
GenSQL> remember that Alice scored 95 on the math test
GenSQL> Bob got 72 in math
GenSQL> show me students struggling in math
→ Returns students with score < 70

GenSQL> what's the class average?
→ The average score is 83.5

GenSQL> delete the student named Bob
→ Deleted student Bob (math score: 72)
```

**Result:** Teacher gets data access without learning SQL or waiting for custom UI development.

---

## 🔮 Roadmap

**Phase 1: Intelligence (Current)**

- ✅ Natural language understanding
- ✅ Semantic field matching
- ✅ Smart aggregations
- ✅ Dynamic schemas

**Phase 2: Scale**

- [ ] Swap JSON → SQLite (LLM-gated)
- [ ] Fuzzy matching for typos
- [ ] Relationship inference (joins)
- [ ] Multi-step reasoning

**Phase 3: Production**

- [ ] Transaction support
- [ ] Concurrency handling
- [ ] Performance optimization
- [ ] REST API / Web UI
- [ ] Multi-user access control

---

## 🚧 Current Limitations

GenSQL v0 is intentionally simple to prove the concept:

- **Single-file storage** (JSON) - no scalability
- **No transactions** - operations aren't atomic
- **No concurrency** - single-user only
- **LLM latency** - 500ms-5s per query (vs <10ms for SQL)
- **LLM cost** - Every query hits the API ($$$)
- **No strict schemas** - flexibility vs validation trade-off

---

## 🎯 What GenSQL Is (and Isn't)

**GenSQL IS:**

- ✅ A natural language interface to structured data
- ✅ A cognitive layer that understands context and intent
- ✅ An exploration of LLM-first data systems
- ✅ A portfolio/research project

**GenSQL IS NOT:**

- ❌ A replacement for PostgreSQL or MySQL
- ❌ Production-ready (yet)
- ❌ Suitable for high-frequency or mission-critical workloads
- ❌ Faster than traditional databases (it's smarter, not faster)

---

## 🧪 Philosophy

**GenSQL explores a simple question:**

> What if a database worked more like a human mind?

Humans don't think in SQL. We think in questions:

- "Who are my best customers?"
- "Which students need help?"
- "What are our recent wins?"

GenSQL bridges that gap. It's not about replacing databases—it's about making them _accessible_ to how humans naturally think about data.

---

## 🔧 Technical Details

**Built with:**

- Python 3.11+
- OpenAI GPT-4o Mini (tool calling)
- JSON storage (v0)

**Architecture:**

- Agent-based design
- Tool-calling interface
- Strict memory access control
- Conversational state management

**Inspired by:**

- Modern LLM agent architectures
- Tool-use paradigms (ReAct, function calling)
- Human memory and cognition patterns

---

## 📝 Example Conversations

### Business Intelligence

```
GenSQL> what were our top 3 products last quarter?
GenSQL> show me customers who haven't ordered in 90 days
GenSQL> calculate total revenue by region
```

### Data Cleanup

```
GenSQL> find duplicate people entries
GenSQL> delete all test records
GenSQL> update everyone in North region to East
```

### Exploration

```
GenSQL> what collections do we have?
GenSQL> show me everything in projects
GenSQL> what fields exist in the students collection?
```

---

## 🙌 Motivation

Built to demonstrate:

1. **LLM tool calling** in a real application
2. **Brain-first architecture** - AI as the controller, not assistant
3. **Human-centric data interaction** - conversations over commands

**By Mozay** - [zemulax.com](https://zemulax.com)

---

## 📚 Learn More

- [Technical deep-dive](#) _(coming soon)_
- [Demo video](#) _(coming soon)_
- [Architecture decisions](#) _(coming soon)_

---

**Status:** v0 - Experimental prototype  
**License:** MIT  
**Contributions:** Open to feedback and ideas!

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages