A robust Retrieval-Augmented Generation (RAG) system designed to bridge the gap between natural language and relational databases. This project allows users to query SQL databases using plain English, leveraging LangChain for orchestration, Qdrant for schema context retrieval, and Ollama for private, local LLM inference.
- Local-First: Runs entirely on your machine using Ollama—no API keys required for inference.
- RAG-Enhanced: Uses a vector database (Qdrant) to store database metadata, ensuring the LLM understands your specific schema before generating queries.
- Natural Language Interface: Translate complex business questions into accurate SQL joins and aggregations.
- Secure: Since it runs locally, your sensitive database schema and data never leave your infrastructure.
- Framework: LangChain
- Vector Database: Qdrant
- LLM Engine: Ollama (e.g., Llama 3, Mistral, or SQLCoder)
- Database Support: SQLite, PostgreSQL, or MySQL (via SQLAlchemy)
- Language: Python 3.10+
- Ollama: Download and install Ollama.
- Model: Pull a coding or general-purpose model:
ollama pull llama3
- Qdrant: Run Qdrant locally via Docker:
docker run -p 6333:6333 qdrant/qdrant
- Clone the Repository:
git clone https://github.com/Pratham1603/Text_2_SQL.git
cd Text_2_SQL
- Create a Virtual Environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install Dependencies:
pip install -r requirements.txt
- Configuration:
Create a
.envfile in the root directory and add your database credentials:
DB_URL=sqlite:///./example.db
QDRANT_HOST=localhost
QDRANT_PORT=6333
OLLAMA_MODEL=llama3
- Schema Indexing: The system parses your SQL schema (tables, columns, types) and generates descriptive embeddings. These are stored in Qdrant.
- Contextual Retrieval: When a user asks a question, the system retrieves the most relevant table schemas from the vector store.
- Prompt Engineering: LangChain constructs a prompt containing the user's question and the retrieved schema context.
- SQL Generation: The Ollama LLM generates a syntactically correct SQL query.
- Execution: The system executes the query against the database and returns the result in human-readable format.
Run the main application:
python main.py
Example Query:
"Show me the top 5 customers who spent the most money in 2023."
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the Project
- Create your Feature Branch (
git checkout -b feature/AmazingFeature) - Commit your Changes (
git commit -m 'Add some AmazingFeature') - Push to the Branch (
git push origin feature/AmazingFeature) - Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.