This project demonstrates a powerful application of Generative AI to democratize data analysis. It's a Python tool that allows a non-technical user to ask complex questions in plain English about a large sales dataset. The script uses the Google Gemini Pro AI model to translate the question into a precise SQL query, which is then executed against a database to find the answer.
This moves beyond simple data retrieval and showcases how AI can act as an intelligent "data analyst," enabling deep insights without requiring any knowledge of SQL.
The script operates on a sample of the real-world "Superstore" dataset and follows a simple pipeline:
- Database Setup: A local SQLite database is created and populated with thousands of sales records.
- User Input: A user defines a complex business question in English.
- AI Translation: The question and the database schema are sent to the Gemini Pro model. The AI is prompted to act as an expert SQL analyst and returns a syntactically correct SQL query.
- Database Execution & Output: The Python script executes the AI-generated SQL query and displays the final, structured answer using Pandas.
- Python: The core programming language.
- Google Gemini Pro API: For state-of-the-art Natural Language to SQL translation.
- SQL (SQLite): For database creation and querying on a realistic dataset.
- Pandas: For executing SQL queries and displaying the final data.
Here is an example of the script translating a complex request into a precise answer:
Human Question: 'What is the most profitable product in California?'
AI-Generated SQL: 'SELECT product_name, SUM(profit) AS total_profit FROM sales WHERE state = "California" GROUP BY product_name ORDER BY total_profit DESC LIMIT 1;'
--- Final Answer from Superstore Database ---
product_name total_profit
0 Mitel 5320 IP Phone VoIP phone 90.7152
-
Clone the repository:
git clone [https://github.com/YOUR_USERNAME/YOUR_REPOSITORY_NAME.git](https://github.com/YOUR_USERNAME/YOUR_REPOSITORY_NAME.git) cd YOUR_REPOSITORY_NAME -
Install dependencies:
pip install -r requirements.txt
-
Set up your API Key: This script requires a Google Gemini API key. Set it as an environment variable for security:
export GEMINI_API_KEY='YOUR_API_KEY_HERE'
-
Run the script:
python sql_analyzer_superstore.py
The script will create a temporary
superstore.dbfile, run the analysis, and print the final results to the console.