Divyansh200301/sql-query-optimizer

# SQL Query Optimization Tool

A terminal-based Python tool for analyzing, measuring, and optimizing SQL query performance across SQLite, MySQL, and PostgreSQL databases. It helps developers identify slow queries, provides actionable optimization suggestions, and generates performance reports.

## Features

- **Multi-database support**: SQLite, MySQL, and PostgreSQL
- **Performance measurement**: execution time tracking with millisecond accuracy
- **Slow query detection**: automatic identification of queries exceeding configurable performance thresholds
- **Pattern-based optimization suggestions**, including:
  - Index recommendations for WHERE, JOIN, and ORDER BY clauses
  - Detection of `SELECT *` anti-patterns
  - Subquery optimization recommendations
  - LIKE wildcard performance warnings
  - Missing WHERE clause detection
  - Multiple JOIN analysis
- **Comprehensive logging**: CSV export of all query executions with performance metrics
- **Summary reports**: session summaries with statistics and insights
- **Rich terminal output**: color-coded output using the Rich library (optional)
- **Interactive and batch modes**: run queries interactively or execute single queries
- **Modular design**: easy to extend and customize

## Installation

### Prerequisites

- Python 3.7 or higher
- A database connector for your target database

### Step 1: Clone or Download

```powershell
cd d:\SQl_Query_optimization
```

### Step 2: Install Dependencies

```powershell
pip install -r requirements.txt
```

**Dependencies:**

- `mysql-connector-python` - For MySQL support
- `psycopg2-binary` - For PostgreSQL support
- `rich` - For enhanced terminal output (optional but recommended)

**Note:** SQLite support is included in Python's standard library.

### Step 3: Configure Database (Optional)

Copy the example environment file and configure your database credentials:

```powershell
Copy-Item .env.example .env
```

Edit `.env` with your database credentials.

## Protecting secrets and the .env file

Do NOT commit your `.env` file; it contains sensitive credentials. This project ships a `.env.example` file so you can copy it to `.env` locally and fill in your credentials.

If you accidentally added `.env` to git, untrack it and commit the removal from the index (this keeps your local file but removes it from the repository). PowerShell commands, run from the repository root:

```powershell
# 1. Make sure .env is in .gitignore (this repo's .gitignore already includes it)
git status

# 2. Stop tracking the file in git (keeps the local file)
git rm --cached .env

# 3. Commit the change and push
git add .gitignore
git commit -m "chore: stop tracking .env and add to .gitignore"
git push origin main
```

If `.env` (or other secrets) were already committed in earlier commits, assume those secrets may be exposed and rotate them immediately. To remove a secret entirely from repository history, use a tool like git filter-repo or the BFG Repo-Cleaner. These operations rewrite history and require force-pushing; use them with caution.

Short guidance for history cleaning:

1. Rotate the compromised credentials right away (change passwords, API keys, tokens).
2. Use git filter-repo or BFG to remove the file from all commits (example with BFG, which requires Java):

```powershell
bfg --delete-files .env
git reflog expire --expire=now --all
git gc --prune=now --aggressive
git push --force
```

3. Notify collaborators to reclone or reset their local copies, because history was rewritten.

## Usage

### Interactive Mode (Recommended)

**SQLite**

```bash
python sql_optimizer.py --db-type sqlite --database test.db
```

**MySQL**

```bash
python sql_optimizer.py --db-type mysql --host localhost --user root --password yourpass --database testdb
```

**PostgreSQL**

```bash
python sql_optimizer.py --db-type postgresql --host localhost --user postgres --password yourpass --database testdb
```

### Single Query Execution

Execute a single query and get immediate analysis:

```bash
python sql_optimizer.py --db-type sqlite --database test.db --query "SELECT * FROM users WHERE age > 25"
```

### Custom Slow Query Threshold

Set a custom threshold (in milliseconds) for slow query detection:

```bash
python sql_optimizer.py --db-type sqlite --database test.db --threshold 1000
```

### Custom Output Files

Specify custom log and summary file names:

```bash
python sql_optimizer.py --db-type sqlite --database test.db --log-file my_queries.csv --summary-file my_summary.txt
```

## Example Session

```
================================================================================
SQL QUERY OPTIMIZATION TOOL - Interactive Mode
================================================================================
Enter SQL queries to analyze (type 'exit', 'quit', or press Ctrl+C to stop)
Type 'summary' to generate a summary report
================================================================================

Enter SQL query: SELECT * FROM users WHERE email LIKE '%@gmail.com'

╭────────────────────────────────────────────────────╮
│ Query                                              │
├────────────────────────────────────────────────────┤
│ SELECT * FROM users WHERE email LIKE '%@gmail.com' │
╰────────────────────────────────────────────────────╯

Execution Time: 752.34ms

Results: 1,247 row(s) returned

Optimization Suggestions:
  • SLOW QUERY DETECTED! Execution time: 752.34ms (threshold: 500ms)
  • Avoid SELECT *: Specify only the columns you need. This reduces data transfer.
  • LIKE with leading wildcard ('%text'): This prevents index usage.
  • Create indexes on WHERE clause columns: email
```

## Optimization Logic

### Detection Patterns

The tool analyzes queries for the following patterns:

1. **`SELECT *` usage**
   - Severity: High
   - Suggestion: Specify exact columns needed
   - Impact: Reduces data transfer and improves performance
2. **Missing WHERE clause**
   - Severity: Medium
   - Suggestion: Add filters to reduce the result set
   - Impact: Prevents full table scans
3. **Multiple JOINs (>3)**
   - Severity: Medium
   - Suggestion: Ensure proper indexes on JOIN columns
   - Impact: Prevents cartesian products and improves JOIN efficiency
4. **Subqueries**
   - Severity: Medium
   - Suggestion: Consider using JOINs or CTEs
   - Impact: Better query plan optimization
5. **ORDER BY without LIMIT**
   - Severity: Medium
   - Suggestion: Add a LIMIT clause
   - Impact: Reduces sorting overhead
6. **LIKE with leading wildcard**
   - Severity: High
   - Suggestion: Restructure the query or use full-text search
   - Impact: Allows index usage
7. **DISTINCT usage**
   - Severity: Low
   - Suggestion: Ensure necessity; consider GROUP BY
   - Impact: Reduces sorting/deduplication overhead

### Index Suggestions

The tool automatically suggests indexes for:

- Columns used in WHERE clauses
- Columns used in JOIN conditions
- Columns used in ORDER BY clauses

### Performance Thresholds

- **Default slow query threshold**: 500ms
- **Customizable**: use the `--threshold` flag to set your own threshold
- **Recommendations**:
  - <100ms: Excellent
  - 100-300ms: Good
  - 300-500ms: Acceptable
  - >500ms: Needs optimization

## Output Files

### query_log.csv

A CSV file containing all executed queries with:

- Timestamp
- Query type (SELECT, INSERT, etc.)
- Execution time
- Slow query flag
- Complete query text
- Optimization suggestions

Example:

```csv
Timestamp,Query Type,Execution Time (ms),Is Slow,Query,Suggestions
2025-10-15 14:32:10,SELECT,752.34,Yes,"SELECT * FROM users WHERE email LIKE '%@gmail.com'","SLOW QUERY DETECTED! | Avoid SELECT * | Create indexes on WHERE clause columns: email"
```
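Rows in this format can be produced with Python's standard `csv` module. A hypothetical sketch (the tool's own logging lives in `log_results()`):

```python
import csv
from datetime import datetime

def log_query(path, query, query_type, exec_ms, is_slow, suggestions):
    """Append one execution record using the query_log.csv columns shown above."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
            query_type,
            "{:.2f}".format(exec_ms),
            "Yes" if is_slow else "No",
            query,
            " | ".join(suggestions),  # suggestions share one pipe-separated cell
        ])

log_query("query_log.csv", "SELECT * FROM users", "SELECT", 752.34, True,
          ["SLOW QUERY DETECTED!", "Avoid SELECT *"])
```

Using `csv.writer` rather than string concatenation keeps the query text safe: commas and quotes inside the SQL are escaped automatically.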

### query_summary.txt

A comprehensive summary report with:

- Performance statistics
- Query type breakdown
- Detailed query log
- Recommendations

## Architecture

### Core Components

```
sql_optimizer.py
├── SQLOptimizer (Main Class)
│   ├── connect()                  # Database connection management
│   ├── execute_query()            # Query execution with timing
│   ├── analyze_performance()      # Pattern detection and analysis
│   ├── suggest_optimizations()    # Generate recommendations
│   ├── log_results()              # CSV logging
│   └── generate_summary_report()  # Summary generation
├── interactive_mode()             # Interactive CLI interface
└── main()                         # CLI argument parsing
```

### Helper Methods

- `_extract_table_names()`: Identify tables in a query
- `_extract_where_columns()`: Extract WHERE clause columns
- `_extract_join_columns()`: Extract JOIN condition columns
- `_extract_order_by_columns()`: Extract ORDER BY columns
- `_display_rich_results()`: Formatted output with Rich
- `_display_plain_results()`: Plain text output

## Configuration

### config.py

Customize default settings:

```python
# Performance thresholds (milliseconds)
SLOW_QUERY_THRESHOLD = 500.0
WARNING_THRESHOLD = 300.0

# Display settings
MAX_ROWS_DISPLAY = 10
USE_RICH_FORMATTING = True

# File settings
LOG_FILE = 'query_log.csv'
SUMMARY_FILE = 'query_summary.txt'
```

## Advanced Usage

### Programmatic Usage

```python
from sql_optimizer import SQLOptimizer

# Initialize
optimizer = SQLOptimizer(
    db_type='mysql',
    db_config={
        'host': 'localhost',
        'user': 'root',
        'password': 'pass',
        'database': 'testdb'
    },
    slow_threshold=500.0
)

# Connect
optimizer.connect()

# Execute query
results, exec_time, error = optimizer.execute_query("SELECT * FROM users")

# Analyze
analysis = optimizer.analyze_performance("SELECT * FROM users", exec_time)
suggestions = optimizer.suggest_optimizations("SELECT * FROM users", analysis)

# Display
optimizer.display_results("SELECT * FROM users", results, exec_time, suggestions)

# Log
optimizer.log_results("SELECT * FROM users", analysis, suggestions)

# Cleanup
optimizer.disconnect()
```

### Batch Query Processing

Create a file `queries.txt` with SQL queries (one per line) and process them:

```python
with open('queries.txt', 'r') as f:
    queries = f.readlines()

for query in queries:
    query = query.strip()
    if not query:
        continue  # skip blank lines
    results, exec_time, error = optimizer.execute_query(query)
    if not error:
        analysis = optimizer.analyze_performance(query, exec_time)
        suggestions = optimizer.suggest_optimizations(query, analysis)
        optimizer.log_results(query, analysis, suggestions)

optimizer.generate_summary_report()
```

## Testing with Sample Data

See the `examples/` directory for:

- `sample_data.sql`: Sample database schema and data
- `test_queries.sql`: Example queries to test the tool
- `test_optimizer.py`: Unit tests for the optimizer

### Run Tests

```bash
# Create sample database
sqlite3 test.db < examples/sample_data.sql

# Run optimizer with sample queries
python sql_optimizer.py --db-type sqlite --database test.db
```

## Output Examples

### With Rich Library (Recommended)

Beautiful, color-coded output with:

- Bordered panels for queries
- Color-coded execution times (green/red)
- Formatted tables for results
- Emoji icons for visual clarity

### Without Rich Library

Plain text output with:

- Clear section separators
- Structured formatting
- All essential information

## Performance Improvements

Results vary with schema, data volume, and workload, but applying the suggestions can yield:

- 35-50% reduction in query execution time
- 60-70% improvement with proper indexing
- 80%+ improvement by eliminating `SELECT *` on large tables

## Contributing

Contributions are welcome! Areas for enhancement:

- Additional database support (Oracle, SQL Server)
- Query rewriting capabilities
- EXPLAIN plan analysis
- Real-time monitoring mode
- Web dashboard interface

๐Ÿ“ License

MIT License - Feel free to use and modify for your projects.

๐Ÿ› Troubleshooting

"Module not found" errors

Install required dependencies:

pip install -r requirements.txt

### Database connection issues

1. Verify database credentials
2. Check that the database server is running
3. Ensure proper network connectivity
4. Verify database user permissions

### Rich formatting not working

Rich is optional; the tool falls back to plain text if Rich is not installed. To enable rich output:

```bash
pip install rich
```

## Support

For issues, questions, or suggestions:

- Create an issue in the repository
- Check the existing documentation
- Review the example files in the `examples/` directory

## Roadmap

- EXPLAIN plan integration
- Query rewriting engine
- Real-time monitoring dashboard
- Integration with popular ORMs (SQLAlchemy, Django ORM)
- Machine learning-based optimization suggestions
- Query caching analysis
- Connection pooling recommendations

Happy Optimizing!

Made with ❤️ for developers who care about performance.
