Skip to content

An AI-powered data analyst that translates plain English questions into SQL queries to analyze a sales dataset using Python and Google's Gemini AI.

Notifications You must be signed in to change notification settings

Squaddyy/AI-SQL-Analyst

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 

Repository files navigation

AI-Powered SQL Analyst for Complex Datasets

🚀 Overview

This project demonstrates a powerful application of Generative AI to democratize data analysis. It's a Python tool that allows a non-technical user to ask complex questions in plain English about a large sales dataset. The script uses the Google Gemini Pro AI model to translate the question into a precise SQL query, which is then executed against a database to find the answer.

This moves beyond simple data retrieval and showcases how AI can act as an intelligent "data analyst," enabling deep insights without requiring any knowledge of SQL.


⚙️ How It Works

The script operates on a sample of the real-world "Superstore" dataset and follows a simple pipeline:

  1. Database Setup: A local SQLite database is created and populated with thousands of sales records.
  2. User Input: A user defines a complex business question in English.
  3. AI Translation: The question and the database schema are sent to the Gemini Pro model. The AI is prompted to act as an expert SQL analyst and returns a syntactically correct SQL query.
  4. Database Execution & Output: The Python script executes the AI-generated SQL query and displays the final, structured answer using Pandas.

🛠️ Technologies Used

  • Python: The core programming language.
  • Google Gemini Pro API: For state-of-the-art Natural Language to SQL translation.
  • SQL (SQLite): For database creation and querying on a realistic dataset.
  • Pandas: For executing SQL queries and displaying the final data.

💡 Sample Output

Here is an example of the script translating a complex request into a precise answer:

Human Question: 'What is the most profitable product in California?'
AI-Generated SQL: 'SELECT product_name, SUM(profit) AS total_profit FROM sales WHERE state = "California" GROUP BY product_name ORDER BY total_profit DESC LIMIT 1;'

--- Final Answer from Superstore Database ---
                                product_name  total_profit
0  Mitel 5320 IP Phone VoIP phone       90.7152

▶️ How to Run

  1. Clone the repository:

    git clone [https://github.com/YOUR_USERNAME/YOUR_REPOSITORY_NAME.git](https://github.com/YOUR_USERNAME/YOUR_REPOSITORY_NAME.git)
    cd YOUR_REPOSITORY_NAME
  2. Install dependencies:

    pip install -r requirements.txt
  3. Set up your API Key: This script requires a Google Gemini API key. Set it as an environment variable for security:

    export GEMINI_API_KEY='YOUR_API_KEY_HERE'
  4. Run the script:

    python sql_analyzer_superstore.py

    The script will create a temporary superstore.db file, run the analysis, and print the final results to the console.

About

An AI-powered data analyst that translates plain English questions into SQL queries to analyze a sales dataset using Python and Google's Gemini AI.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages