Addyk-24/NL2SQL

🗄️ NL2SQL - Model Pipeline for Natural Language to SQL Queries

🚀 Overview

In many organizations, valuable data lives in relational databases, but accessing it requires SQL expertise. This project bridges that gap by allowing users to:

  • Ask questions in natural language

  • Upload ER diagram images

  • Automatically generate validated SQL queries

All without writing SQL manually.

❓Problem Statement

In many organizations, data is stored in databases, but non-technical users can't query it directly because they don't know SQL. They rely on data analysts for even simple queries.

Non-technical users struggle to query databases because:

  • SQL has a steep learning curve

  • Even simple queries require analyst intervention

  • Complex queries take time and are error-prone

Example:

“Show me the total sales in 2024 by region.”

  • Writing effective, complex queries can be overwhelming for non-technical users
  • This pipeline generates complex queries in seconds, saving the organization time and cost
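
To make the example question concrete, here is a minimal sketch of the kind of SQL it might map to, run against an in-memory SQLite database. The `sales` table, its columns, and the sample rows are assumptions made up for illustration; they are not taken from this project.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (
        id INTEGER PRIMARY KEY,
        region TEXT NOT NULL,
        amount REAL NOT NULL,
        sale_date TEXT NOT NULL  -- ISO 8601 date, e.g. '2024-03-15'
    );
    INSERT INTO sales (region, amount, sale_date) VALUES
        ('North', 100.0, '2024-01-10'),
        ('North', 50.0,  '2024-06-02'),
        ('South', 75.0,  '2024-03-15'),
        ('South', 20.0,  '2023-12-31');
""")

# One plausible SQL translation of:
# "Show me the total sales in 2024 by region."
GENERATED_SQL = """
    SELECT region, SUM(amount) AS total_sales
    FROM sales
    WHERE sale_date >= '2024-01-01' AND sale_date < '2025-01-01'
    GROUP BY region
    ORDER BY region
"""

rows = conn.execute(GENERATED_SQL).fetchall()
print(rows)
```

The 2023 row is excluded by the date filter, so only 2024 sales are summed per region.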

📖 Blog / Write-up

I’ve written a detailed blog that walks through the thinking, trade-offs, and occasional “why is this not working?” moments :) behind building this NL2SQL pipeline.

  • How ER diagrams can be turned into executable schemas
  • The design decisions that didn’t make it into the final code (for good reasons)
  • And what actually breaks when theory meets production

Here's the link 👉 Medium

🧠 Solution

We built an NL2SQL pipeline that helps users quickly generate SQL queries, either from a natural-language question or from an ER diagram. It converts:

  • 📝 Natural language queries → SQL
  • 🖼️ ER diagram images → Database schema → SQL

And many more features...

User Input (Natural Language)
          ↓
   Text Preprocessing
          ↓
   Schema Understanding
          ↓
   NL2SQL Model (LLM or Encoder-Decoder)
          ↓
   SQL Query Generator
          ↓
   SQL Validator & Executor
          ↓
   Database (e.g., PostgreSQL/MySQL)
          ↓
   Results Display (Frontend/UI)
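
The first stages of the flow above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the project's actual code: the function names, the `Schema` shape, the prompt wording, and `fake_model` (a stand-in for the real LLM or encoder-decoder call) are all hypothetical. Validation and execution are left out.

```python
from typing import Callable, Dict, List

Schema = Dict[str, List[str]]  # table name -> column names (hypothetical shape)


def preprocess(question: str) -> str:
    """Text preprocessing: collapse repeated whitespace and trim."""
    return " ".join(question.split())


def describe_schema(schema: Schema) -> str:
    """Schema understanding: render tables and columns as prompt text."""
    return "\n".join(
        f"TABLE {table} ({', '.join(columns)})" for table, columns in schema.items()
    )


def build_prompt(question: str, schema: Schema) -> str:
    """Combine the schema description and the question into one prompt."""
    return (
        "Given this database schema:\n"
        f"{describe_schema(schema)}\n\n"
        f"Write one SQL query that answers: {question}\n"
    )


def run_pipeline(question: str, schema: Schema, generate: Callable[[str], str]) -> str:
    """Run preprocessing, prompting, and generation; return the SQL text.

    `generate` stands in for the NL2SQL model. Validation and execution
    would follow this step and are omitted here.
    """
    prompt = build_prompt(preprocess(question), schema)
    return generate(prompt).strip().rstrip(";")


def fake_model(prompt: str) -> str:
    # Placeholder for a real model call; always returns the same query.
    return "SELECT region, SUM(amount) FROM sales GROUP BY region;"


schema = {"sales": ["id", "region", "amount", "sale_date"]}
sql = run_pipeline("  Total   sales by region ", schema, fake_model)
print(sql)
```

Passing the model in as a callable keeps the sketch independent of any particular LLM provider and makes each stage easy to test in isolation.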

✨ Key Features

  • 🔤 Natural Language → SQL generation
  • 🖼️ ER Diagram image → Schema extraction → SQL
  • 🧠 LLM-based reasoning with schema awareness
  • 🔐 SQL security & validation checks
  • 📋 Copy-ready SQL output
  • 🧪 Syntax validation before execution

📊 Evaluation Metrics

  • Syntactic Validity
    • Checks whether the generated SQL is parseable; a query counts as valid only if it parses
  • Schema Consistency
    • Ensures correct table & column references
  • Foreign Key Validation
  • Security Rules
    • Blocks unsafe SQL patterns
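
One way these four checks could be approximated is sketched below with Python's standard `sqlite3` module. This is an assumption-laden illustration, not the project's implementation: the schema, the function names, and the list of blocked keywords are hypothetical. The syntax and schema checks rely on SQLite's `EXPLAIN`, which compiles a statement without running it and raises an error on a syntax mistake or an unknown table or column.

```python
import re
import sqlite3

# Hypothetical schema used only for this example.
SCHEMA_DDL = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    total REAL
);
"""

# Keywords treated as unsafe in this sketch (illustrative, not exhaustive).
UNSAFE = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|TRUNCATE|ATTACH|PRAGMA)\b",
    re.IGNORECASE,
)


def is_safe(sql):
    """Security rule: allow only a single read-only SELECT/WITH statement."""
    statement = sql.strip().rstrip(";")
    if ";" in statement:  # reject stacked statements
        return False
    if not re.match(r"(SELECT|WITH)\b", statement, re.IGNORECASE):
        return False
    return UNSAFE.search(statement) is None


def compiles(conn, sql):
    """Syntax and schema check: compile the query with EXPLAIN, do not run it."""
    try:
        conn.execute(f"EXPLAIN {sql.strip().rstrip(';')}")
        return True, ""
    except sqlite3.Error as exc:
        return False, str(exc)


def broken_foreign_keys(conn):
    """Foreign key validation: list FKs whose target table or column is missing."""
    problems = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        for fk in conn.execute(f"PRAGMA foreign_key_list({table})").fetchall():
            target, from_col, to_col = fk[2], fk[3], fk[4]
            target_cols = [c[1] for c in conn.execute(f"PRAGMA table_info({target})")]
            # An empty column list means the target table does not exist.
            if not target_cols or (to_col is not None and to_col not in target_cols):
                problems.append(f"{table}.{from_col} -> {target}.{to_col}")
    return problems


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA_DDL)

print(is_safe("SELECT name FROM customers"))
print(is_safe("DROP TABLE customers"))
print(compiles(conn, "SELECT name FROM customers"))
print(compiles(conn, "SELECT nam FROM customers"))
print(broken_foreign_keys(conn))
```

The keyword filter is deliberately conservative: it can reject a harmless query whose string literal happens to contain a blocked word, which is usually an acceptable trade-off for a read-only tool.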

Implementation

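cd nl2sql
pip install -r requirements.txt

# Frontend (first terminal, from the nl2sql directory)
cd .\frontend
npm run dev

# Backend (second terminal, from the nl2sql directory)
cd .\backend
uv run .\main.py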

🧩 Example

[Image: github-readme-example]

📌 Future Improvements

  • 🔄 SQL execution & result visualization
  • 📊 Query optimization hints
  • 🧪 Automated test coverage
  • 🗂️ Multi-schema support
  • 🔍 Response reasoning and streaming

👩‍💻 Author

Aditya Katkar
