Contributing
codeadeel edited this page Apr 7, 2026
·
3 revisions
Thank you for your interest in contributing to SQL Query Engine! This page covers development setup, project conventions, and how to submit changes.
- Python 3.10+
- Docker and Docker Compose
- A running PostgreSQL instance (for integration testing)
- A running Redis instance
- Access to an OpenAI-compatible LLM server (Ollama is the easiest for local dev)
git clone https://github.com/codeadeel/sqlqueryengine.git
cd sqlqueryengine
# Create virtual environment
python -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
The quickest way to get PostgreSQL and Redis running locally:
# Start just Redis from the compose file
docker compose up redis -d
# Or use standalone Redis
docker run -d --name redis -p 6379:6379 redis:latest
# For Ollama (local LLM)
ollama serve
ollama pull qwen2.5-coder:7b
export LLM_BASE_URL="http://localhost:11434/v1"
export LLM_MODEL="qwen2.5-coder:7b"
export LLM_API_KEY="ollama"
export LLM_TEMPERATURE="0.7"
export POSTGRES_HOST="localhost"
export POSTGRES_PORT="5432"
export POSTGRES_DB="your_dev_db"
export POSTGRES_USER="postgres"
export POSTGRES_PASSWORD="your_password"
export REDIS_HOST="localhost"
export REDIS_PORT="6379"
export REDIS_PASSWORD=""
export REDIS_DB="0"
python run.py
sqlqueryengine/
├── sqlQueryEngine/                           # Main Python package
│   ├── __init__.py                           # Public exports
│   ├── main.py                               # FastAPI app + native routes
│   ├── engine.py                             # Pipeline orchestrator
│   ├── queryGenerator.py                     # Stage 1: NL → SQL
│   ├── queryEvaluator.py                     # Stage 2: execute + repair
│   ├── openaiCompat.py                       # OpenAI-compatible API
│   ├── dbHandler.py                          # PostgreSQL handler (read-only)
│   ├── sessionManager.py                     # Redis session manager
│   ├── connConfig.py                         # Configuration loading
│   ├── promptTemplates.py                    # LLM prompt definitions (4 templates)
│   └── sqlGuidelines.py                      # PostgreSQL best-practices (2 corpora)
├── evaluation/                               # Evaluation framework
│   ├── shared/                               # Shared utilities
│   │   ├── resultComparator.py               # Order-independent result comparison
│   │   └── resourceMetrics.py                # Wall time, memory, throughput tracking
│   ├── synthetic/                            # Synthetic evaluation (controlled environment)
│   │   ├── entrypoint.py                     # Pipeline orchestrator
│   │   ├── evalRunner.py                     # 3-config ablation runner
│   │   ├── evalConfig.py                     # Environment-driven config
│   │   ├── seedData.py                       # Database seeding (Faker)
│   │   ├── schemaDefinitions.py              # DDL for 3 evaluation databases
│   │   ├── questionRunner.py                 # Gold query executor
│   │   ├── scoreReport.py                    # Summary table generation
│   │   ├── questions/                        # 120 gold questions (40 per DB)
│   │   └── results/runs/                     # Archived results per model
│   └── bird/                                 # BIRD benchmark evaluation
│       ├── birdEntrypoint.py                 # Pipeline orchestrator
│       ├── birdDataLoader.py                 # Dataset loading + SQL dialect conversion
│       ├── sqliteToPostgres.py               # SQLite → PostgreSQL migration
│       ├── birdEvalRunner.py                 # 3-config ablation runner for BIRD
│       ├── birdScoreReport.py                # BIRD-specific scoring + baselines
│       ├── birdConfig.py                     # BIRD environment-driven config
│       ├── bird_data/                        # BIRD dataset (gitignored, download separately)
│       └── bird_results/runs/                # Archived results per model
├── Dockerfile                                # Multi-stage Docker build (3 stages)
├── docker-compose.yml                        # Production stack (engine + Redis + OpenWebUI)
├── docker-compose-synthetic-evaluation.yml   # Synthetic evaluation stack
├── docker-compose-bird-evaluation.yml        # BIRD benchmark stack
├── requirements.txt                          # Python dependencies
├── run.py                                    # Uvicorn entry point
├── curlCommands.sh                           # API usage examples
└── .gitignore
See the Module Reference page for detailed documentation of each module.
- Follow PEP 8 with the exception of camelCase for variable and function names (project convention)
- Use type hints for function signatures
- Use Pydantic models for request/response validation
- Use LangChain patterns for LLM interactions
- Files: camelCase (queryGenerator.py, dbHandler.py)
- Classes: PascalCase (SQLQueryEngine, QueryGenerator)
- Methods: camelCase (getUserChatContext, queryExecutor)
- Constants: UPPER_SNAKE_CASE (SPLIT_IDENTIFIER, DEFAULT_RETRY_COUNT)
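The naming and style conventions can be seen together in a small sketch. Everything in it is hypothetical: RetryPolicy, nextDelaySeconds, and the value assigned to the constant are invented for illustration (only the name DEFAULT_RETRY_COUNT comes from this page), and a plain class stands in for a Pydantic model to keep the example dependency-free.

```python
"""Hypothetical module retryPolicy.py (file names use camelCase)."""

# Constants: UPPER_SNAKE_CASE. The value 3 is an assumption for illustration.
DEFAULT_RETRY_COUNT: int = 3


class RetryPolicy:
    """Classes: PascalCase."""

    def __init__(self, retryCount: int = DEFAULT_RETRY_COUNT, baseDelay: float = 0.5) -> None:
        # Variables and attributes: camelCase (the project's exception to PEP 8).
        self.retryCount = retryCount
        self.baseDelay = baseDelay

    def nextDelaySeconds(self, attemptNumber: int) -> float:
        """Methods: camelCase, with type hints on the signature."""
        if attemptNumber < 1:
            raise ValueError("attemptNumber starts at 1")
        # Double the delay on every attempt after the first.
        return self.baseDelay * (2 ** (attemptNumber - 1))
```

Apart from the camelCase names, the layout (spacing, docstrings, import order) follows ordinary PEP 8.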
- Separation of concerns: Each module has a single responsibility
- Dependency injection: Connection params passed down from config, not imported globally
- Multi-strategy response parsing: LLM responses are parsed via a 5-strategy cascade (JSON → embedded JSON → code blocks → regex → raw text) rather than relying on structured output or function calling, which keeps the engine compatible with any model
- Read-only safety: Database connections are always enforced as read-only via conn.set_read_only(True)
- Graceful degradation: The evaluator has a 3-tier fallback for schema context resolution
- Early-accept: Queries returning rows are accepted immediately without an LLM call, preventing regressions
- Best-result tracking: If retries exhaust, the best result seen across all attempts is returned
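As a minimal sketch of the dependency-injection principle, the POSTGRES_* environment variables from the setup section can be read once into a config object that is then handed to whatever needs it. PostgresConfig, loadPostgresConfig, and describeConnection are hypothetical names, and the fallback defaults are assumptions; none of this is the actual content of connConfig.py or dbHandler.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PostgresConfig:
    """Immutable bundle of connection parameters (hypothetical shape)."""
    host: str
    port: int
    database: str
    user: str
    password: str


def loadPostgresConfig(env: Mapping[str, str] = os.environ) -> PostgresConfig:
    """Read the POSTGRES_* variables once, at the edge of the program."""
    return PostgresConfig(
        host=env.get("POSTGRES_HOST", "localhost"),   # default is an assumption
        port=int(env.get("POSTGRES_PORT", "5432")),   # default is an assumption
        database=env["POSTGRES_DB"],                  # required: KeyError if missing
        user=env.get("POSTGRES_USER", "postgres"),    # default is an assumption
        password=env.get("POSTGRES_PASSWORD", ""),
    )


def describeConnection(config: PostgresConfig) -> str:
    """A consumer receives the config as an argument instead of importing globals."""
    return f"postgresql://{config.user}@{config.host}:{config.port}/{config.database}"
```

Because the loader accepts any mapping, a test can pass a plain dict instead of touching the real environment.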
When adding a new API endpoint:
- Define the Pydantic request model in main.py
- Add the route handler in main.py (native) or openaiCompat.py (OpenAI-compat)
- If needed, add new methods to engine.py
- Update curlCommands.sh with example calls
- Update the wiki API Reference and Usage Guide
Prompt templates live in promptTemplates.py. When changing prompts:
- Test with multiple LLM models to ensure broad compatibility
- Verify response parsing still extracts SQL correctly (check _parseResponse() and _parseEvalResponse())
- Test with various database schemas (simple and complex)
- Check that the repair loop still functions
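When checking the repair loop, it helps to keep in mind the two behaviours listed under the design principles: early-accept and best-result tracking. The sketch below is a simplified stand-in for that loop, not the code in queryEvaluator.py. runWithRepair, Attempt, and the execute and repair callables are hypothetical, and the ranking used to pick the "best" attempt is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Attempt:
    sql: str
    rows: List[tuple] = field(default_factory=list)
    error: Optional[str] = None

    def score(self) -> int:
        # Assumed ranking: rows returned > ran but empty > failed.
        if self.error is not None:
            return 0
        return 2 if self.rows else 1


def runWithRepair(
    sql: str,
    execute: Callable[[str], List[tuple]],
    repair: Callable[[str, str], str],
    retryCount: int = 3,
) -> Tuple[Attempt, int]:
    """Return the accepted (or best) attempt and the number of repair calls made."""
    best: Optional[Attempt] = None
    repairCalls = 0
    for attemptNumber in range(retryCount + 1):
        try:
            attempt = Attempt(sql=sql, rows=execute(sql))
        except Exception as exc:  # a real handler would catch the driver's error type
            attempt = Attempt(sql=sql, error=str(exc))
        # Best-result tracking: remember the highest-scoring attempt so far.
        if best is None or attempt.score() > best.score():
            best = attempt
        if attempt.rows:
            return attempt, repairCalls  # early-accept: no LLM call needed
        if attemptNumber == retryCount:
            break  # retries exhausted
        feedback = attempt.error or "query returned no rows"
        sql = repair(sql, feedback)  # stands in for the LLM repair call
        repairCalls += 1
    assert best is not None  # holds whenever retryCount >= 0
    return best, repairCalls
```

Passing a stubbed execute and repair makes it possible to exercise both behaviours without a database or an LLM server, which is a quick sanity check before running the real endpoints.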
If adding features that require new LLM capabilities:
- Define the Pydantic output schema in the relevant module
- Add a corresponding parser method (follow the multi-strategy pattern in _parseResponse())
- Add appropriate error handling for malformed LLM responses
- Test with different model sizes (small models may struggle with complex schemas)
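A new parser method that follows the multi-strategy pattern might look roughly like the sketch below, which walks the same five steps named in the design principles (JSON, embedded JSON, code blocks, regex, raw text). All function names are hypothetical, the "sql" key is an assumed output field, and the regular expressions are deliberately loose; the real _parseResponse() may differ in every detail.

```python
import json
import re
from typing import Optional

FENCE = "`" * 3  # built at runtime so this example can sit inside a fenced block


def _fromJson(text: str) -> Optional[str]:
    """Strategy 1: the whole response is a JSON object with a "sql" key."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if isinstance(data, dict) and isinstance(data.get("sql"), str):
        return data["sql"]
    return None


def _fromEmbeddedJson(text: str) -> Optional[str]:
    """Strategy 2: a JSON object surrounded by prose."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    return _fromJson(match.group(0)) if match else None


def _fromCodeBlock(text: str) -> Optional[str]:
    """Strategy 3: a fenced code block, optionally tagged sql."""
    pattern = FENCE + r"(?:sql)?\s*\n(.*?)" + FENCE
    match = re.search(pattern, text, re.DOTALL | re.IGNORECASE)
    return match.group(1) if match else None


def _fromRegex(text: str) -> Optional[str]:
    """Strategy 4: the first SELECT/WITH statement ending in a semicolon."""
    match = re.search(r"\b((?:SELECT|WITH)\b.*?;)", text, re.DOTALL | re.IGNORECASE)
    return match.group(1) if match else None


def _fromRawText(text: str) -> Optional[str]:
    """Strategy 5: fall back to the raw response."""
    return text.strip() or None


def parseSqlResponse(text: str) -> Optional[str]:
    """Try each strategy in order; return None when the response is unusable."""
    strategies = (_fromJson, _fromEmbeddedJson, _fromCodeBlock, _fromRegex, _fromRawText)
    for strategy in strategies:
        candidate = strategy(text)
        if candidate and candidate.strip():
            return candidate.strip()
    return None
```

Returning None for an empty or whitespace-only response, rather than raising, leaves the decision about retries to the caller, which is one way to handle malformed LLM output.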
- Fork the repository
- Create a feature branch: git checkout -b feature/your-feature-name
- Make your changes
- Test the endpoints using curlCommands.sh as a reference
- Commit with a clear message describing the change
- Push and open a Pull Request
- Keep PRs focused: one feature or fix per PR
- Include curl examples demonstrating the change (if API-related)
- Update documentation (wiki pages, curlCommands.sh, README) as needed
Here are some areas where contributions are welcome:
- Additional LLM providers: Direct integrations beyond the OpenAI-compatible interface
- Query result formatting: Better markdown/HTML rendering of results
- Schema change detection: Automatic invalidation of cached schema when the database schema changes
- Token counting: Implement actual token usage tracking for the OpenAI-compat endpoint
- Write mode: Optional write-capable mode for controlled INSERT/UPDATE operations
- Test coverage: Unit tests for individual modules (currently only integration testing via curl exists)
- Multi-database support: Ability to query multiple databases in a single session
- Query history: Persistent storage of past queries and results
- Rate limiting: Request throttling for the API endpoints
Paper: arXiv:2604.16511 | Dataset: Hugging Face | Source: GitHub