# Ungraded Lab: AI-Assisted Query Optimization

## 📋 Overview 
In this lab, you'll use AI tools to optimize SQL queries for better performance. As a data analyst at BookCycle, you'll apply AI suggestions to make a query more efficient, focusing on query restructuring. This hands-on practice will help you understand how AI can support query optimization.

## 🎯 Learning Outcomes
By the end of this lab, you will be able to:
- Use AI tools to analyze SQL queries for potential optimizations
- Interpret AI suggestions for query restructuring and indexing
- Implement AI-recommended optimizations to improve query performance

## 📚 Dataset Information
We'll be working with the BookCycle database, which includes information about books, customers, and transactions. The database contains three main tables:
1. <b>books:</b> Contains information about the book inventory
2. <b>customers:</b> Stores customer data
3. <b>transactions:</b> Records all book purchase transactions

## 🖥️ Activities

### Activity 1: Connecting to the Database and Initial Query 

As a data analyst at BookCycle, you need to retrieve information about high-value customers and their purchases. Let's start by connecting to the database and writing an initial query.

<b>Step 1:</b> Import the necessary libraries and connect to the database:

In [1]:
import sqlite3
import pandas as pd

# Setting up the database. DO NOT edit the code given below
from db_setup import setup_database
setup_database() 

# Connect to the database
conn = sqlite3.connect('bookcycle.db')

✅ Database setup complete: Tables created and populated with data!


<b>Step 2:</b> Write an initial query to get high-value customers and their purchases:

In [2]:
query = """
SELECT c.customer_id, COUNT(t.transaction_id) as purchase_count, SUM(t.sale_price) as total_spent
FROM customers c
JOIN transactions t ON c.customer_id = t.customer_id
GROUP BY c.customer_id
HAVING COUNT(t.transaction_id) > 5
ORDER BY total_spent DESC
LIMIT 10;
"""

# Execute the query and display results
df = pd.read_sql_query(query, conn)
display(df)

| | customer_id | purchase_count | total_spent |
|---|---|---|---|
| 0 | C1023 | 13 | 155.87 |
| 1 | C1089 | 13 | 154.87 |
| 2 | C1012 | 14 | 153.86 |
| 3 | C1034 | 13 | 146.87 |
| 4 | C1045 | 12 | 136.88 |
| 5 | C1078 | 12 | 132.88 |
| 6 | C1067 | 12 | 119.88 |
| 7 | C1056 | 11 | 113.89 |


 <b>💡 Tip:</b> This query joins the customers and transactions tables to find customers with more than 5 purchases, ordered by total amount spent.
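Before asking for optimizations, it can help to see how SQLite plans to run the query. The sketch below uses `EXPLAIN QUERY PLAN` for that. It is a minimal illustration rather than part of the lab setup: `conn_demo` is a hypothetical in-memory stand-in whose tables hold only the columns referenced in the query, because the real `bookcycle.db` schema is not shown here.

```python
import sqlite3

# Hypothetical in-memory stand-in for bookcycle.db. Only the columns used by
# the query are modelled; everything else about the schema is an assumption.
conn_demo = sqlite3.connect(":memory:")
conn_demo.executescript("""
CREATE TABLE customers (customer_id TEXT PRIMARY KEY);
CREATE TABLE transactions (
    transaction_id INTEGER PRIMARY KEY,
    customer_id TEXT,
    sale_price REAL
);
""")

query = """
SELECT c.customer_id, COUNT(t.transaction_id) AS purchase_count, SUM(t.sale_price) AS total_spent
FROM customers c
JOIN transactions t ON c.customer_id = t.customer_id
GROUP BY c.customer_id
HAVING COUNT(t.transaction_id) > 5
ORDER BY total_spent DESC
LIMIT 10;
"""

# EXPLAIN QUERY PLAN returns one row per plan step; the last column of each
# row is a human-readable description of that step.
plan_rows = conn_demo.execute("EXPLAIN QUERY PLAN " + query).fetchall()
plan_steps = [row[-1] for row in plan_rows]
for step in plan_steps:
    print(step)

conn_demo.close()
```

In the notebook, the same statement can be run against `conn`. A step that starts with `SCAN` reads a whole table, while `SEARCH` uses an index or key lookup; the exact wording varies between SQLite versions.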

### Activity 2: AI-Assisted Query Analysis  

Now that we have our initial query, let's use an AI tool to analyze it for potential optimizations.

<b>Step 1:</b> The AI analysis is simulated in this lab. Alternatively, you can paste your query into an AI tool and ask it for potential optimizations.

<b>Step 2:</b> Review the AI suggestions and think about how they might improve the query performance.

 <b>💡 Tip:</b> AI suggestions are starting points. Always validate them against your specific database and use case.
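If you paste the query into an AI tool yourself, including the table definitions usually gives it more to work with. The sketch below shows one possible way to assemble such a prompt. `build_optimization_prompt` is a hypothetical helper, the prompt wording is only a suggestion, and the schema is an in-memory stand-in rather than the real BookCycle tables. No AI service is called; the code only builds the text.

```python
import sqlite3

def build_optimization_prompt(conn, query):
    """Combine the table definitions and a query into one prompt string."""
    # sqlite_master stores the CREATE statement for each table and index.
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE sql IS NOT NULL ORDER BY name"
    ).fetchall()
    schema = "\n".join(row[0] for row in rows)
    return (
        "You are reviewing a SQLite query for performance.\n\n"
        f"Schema:\n{schema}\n\n"
        f"Query:\n{query.strip()}\n\n"
        "Suggest optimizations that return exactly the same rows, "
        "and explain the reasoning behind each one."
    )

# Hypothetical stand-in schema (assumed, not the real bookcycle.db).
conn_demo = sqlite3.connect(":memory:")
conn_demo.executescript("""
CREATE TABLE customers (customer_id TEXT PRIMARY KEY);
CREATE TABLE transactions (
    transaction_id INTEGER PRIMARY KEY,
    customer_id TEXT,
    sale_price REAL
);
""")

query = """
SELECT c.customer_id, COUNT(t.transaction_id) AS purchase_count, SUM(t.sale_price) AS total_spent
FROM customers c
JOIN transactions t ON c.customer_id = t.customer_id
GROUP BY c.customer_id
HAVING COUNT(t.transaction_id) > 5
ORDER BY total_spent DESC
LIMIT 10;
"""

prompt = build_optimization_prompt(conn_demo, query)
print(prompt)
conn_demo.close()
```

Asking for "exactly the same rows" matters here: a suggestion that runs faster but changes the output is a different query, not an optimization.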

### Activity 3: Implementing AI Suggestions 

Based on the AI analysis, let's modify our query to potentially improve its performance.

<b>Step 1:</b> Implement the suggested optimizations. Based on the response from your AI analysis, modify the query to optimize it further, then compare the results.

In [3]:
# Edit the query below based on the AI suggestion you received

optimized_query = """
SELECT c.customer_id, COUNT(t.transaction_id) as purchase_count, SUM(t.sale_price) as total_spent
FROM customers c
JOIN transactions t ON c.customer_id = t.customer_id
GROUP BY c.customer_id
HAVING COUNT(t.transaction_id) > 5
ORDER BY total_spent DESC
LIMIT 10;

"""

# Execute the optimized query and display results
df_optimized = pd.read_sql_query(optimized_query, conn)
display(df_optimized)

| | customer_id | purchase_count | total_spent |
|---|---|---|---|
| 0 | C1023 | 13 | 155.87 |
| 1 | C1089 | 13 | 154.87 |
| 2 | C1012 | 14 | 153.86 |
| 3 | C1034 | 13 | 146.87 |
| 4 | C1045 | 12 | 136.88 |
| 5 | C1078 | 12 | 132.88 |
| 6 | C1067 | 12 | 119.88 |
| 7 | C1056 | 11 | 113.89 |


<b>Step 2:</b> Compare the results with the original query. Are they the same?

 <b>💡 Tip:</b> While the results should be the same, the optimized query might perform better, especially on larger datasets.
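As an illustration of what a restructuring and its check could look like, the sketch below drops the join to `customers`, since every column the query selects already lives in `transactions`. This is one restructuring an AI tool might propose, not the lab's official answer. It is only equivalent if every transaction has a non-NULL `customer_id` that exists in `customers`, which is assumed here. The data is a small hypothetical in-memory sample with distinct totals, so row order is unambiguous.

```python
import sqlite3

# Hypothetical in-memory sample (assumed data, not the BookCycle database).
conn_demo = sqlite3.connect(":memory:")
conn_demo.executescript("""
CREATE TABLE customers (customer_id TEXT PRIMARY KEY);
CREATE TABLE transactions (
    transaction_id INTEGER PRIMARY KEY,
    customer_id TEXT REFERENCES customers(customer_id),
    sale_price REAL
);
""")

# customer_id -> (number of purchases, price of each purchase)
purchases = {"C1": (6, 10.0), "C2": (7, 5.0), "C3": (2, 50.0)}
for customer_id, (count, price) in purchases.items():
    conn_demo.execute("INSERT INTO customers VALUES (?)", (customer_id,))
    conn_demo.executemany(
        "INSERT INTO transactions (customer_id, sale_price) VALUES (?, ?)",
        [(customer_id, price)] * count,
    )

original = """
SELECT c.customer_id, COUNT(t.transaction_id) AS purchase_count, SUM(t.sale_price) AS total_spent
FROM customers c
JOIN transactions t ON c.customer_id = t.customer_id
GROUP BY c.customer_id
HAVING COUNT(t.transaction_id) > 5
ORDER BY total_spent DESC
LIMIT 10;
"""

# Same aggregation without the join. Valid only under the assumption that
# every transaction points at an existing customer.
restructured = """
SELECT customer_id, COUNT(*) AS purchase_count, SUM(sale_price) AS total_spent
FROM transactions
GROUP BY customer_id
HAVING COUNT(*) > 5
ORDER BY total_spent DESC
LIMIT 10;
"""

rows_original = conn_demo.execute(original).fetchall()
rows_restructured = conn_demo.execute(restructured).fetchall()
print("Same results:", rows_original == rows_restructured)
conn_demo.close()
```

In the notebook, `df.equals(df_optimized)` performs the same check on the two DataFrames. If two customers tie on `total_spent`, their relative order is not guaranteed, so a mismatch in row order alone does not necessarily mean the rewrite is wrong.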

### Close the Connection
It's good practice to close the database connection when you're done.

In [4]:
# Close the database connection 
conn.close()
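An alternative to calling `close()` by hand is to wrap the connection in `contextlib.closing`, which closes it even if a query raises an error. A minimal sketch, using a throwaway in-memory database (`conn_demo`) rather than `bookcycle.db`:

```python
import sqlite3
from contextlib import closing

# closing() calls conn_demo.close() when the block exits, even on an exception.
with closing(sqlite3.connect(":memory:")) as conn_demo:
    value = conn_demo.execute("SELECT 1").fetchone()[0]

print(value)
```

Using a `sqlite3` connection directly in a `with` statement manages transactions (commit or rollback) but does not close the connection, which is why `closing` is used here.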

#### ⚙️ Test Your Work:
- Run both the original and optimized queries
- Compare the output to ensure they produce the same results
- If available, compare execution times (Note: On small datasets, differences may not be noticeable)
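To compare execution times, one simple approach is to run each query several times and keep the fastest run, which reduces noise from other activity on the machine. The sketch below assumes a hypothetical helper, `time_query`, and generated in-memory rows rather than the BookCycle data.

```python
import sqlite3
import time

def time_query(conn, sql, repeats=5):
    """Return the fastest wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql).fetchall()  # fetchall() forces the full result set
        best = min(best, time.perf_counter() - start)
    return best

# Hypothetical in-memory stand-in with generated rows (assumed data).
conn_demo = sqlite3.connect(":memory:")
conn_demo.executescript("""
CREATE TABLE customers (customer_id TEXT PRIMARY KEY);
CREATE TABLE transactions (
    transaction_id INTEGER PRIMARY KEY,
    customer_id TEXT,
    sale_price REAL
);
""")
conn_demo.executemany(
    "INSERT INTO customers VALUES (?)",
    [(f"C{i}",) for i in range(100)],
)
conn_demo.executemany(
    "INSERT INTO transactions (customer_id, sale_price) VALUES (?, ?)",
    [(f"C{i % 100}", 9.99) for i in range(10_000)],
)

query = """
SELECT c.customer_id, COUNT(t.transaction_id) AS purchase_count, SUM(t.sale_price) AS total_spent
FROM customers c
JOIN transactions t ON c.customer_id = t.customer_id
GROUP BY c.customer_id
HAVING COUNT(t.transaction_id) > 5
ORDER BY total_spent DESC
LIMIT 10;
"""

elapsed = time_query(conn_demo, query)
print(f"Fastest of 5 runs: {elapsed * 1000:.3f} ms")
conn_demo.close()
```

In the notebook, calling `time_query(conn, query)` and `time_query(conn, optimized_query)` gives two numbers to compare. On a dataset this small, expect the difference to be within measurement noise.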

## ✅ Success Checklist
- Connected to the database successfully
- Wrote and executed the initial query
- Analyzed the query using the simulated AI tool
- Implemented suggested optimizations
- Compared results between original and optimized queries

## 🔍 Common Issues & Solutions 

- Problem: Query returns no results
  - Solution: Check table names and join conditions
  
- Problem: Error in SQL syntax
  - Solution: Carefully review the query for typos or missing clauses
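When a query returns nothing or fails on a name, listing the tables and columns that actually exist is a quick first check. The sketch below does this with `sqlite_master` and `PRAGMA table_info`. `describe_tables` is a hypothetical helper, and the schema is an in-memory stand-in rather than the real BookCycle tables.

```python
import sqlite3

def describe_tables(conn):
    """Return {table_name: [column names]} for every user-defined table."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    return {
        table: [col[1] for col in conn.execute(f'PRAGMA table_info("{table}")')]
        for table in tables
    }

# Hypothetical stand-in schema (assumed, not the real bookcycle.db).
conn_demo = sqlite3.connect(":memory:")
conn_demo.executescript("""
CREATE TABLE customers (customer_id TEXT PRIMARY KEY);
CREATE TABLE transactions (
    transaction_id INTEGER PRIMARY KEY,
    customer_id TEXT,
    sale_price REAL
);
""")

tables = describe_tables(conn_demo)
for name, columns in tables.items():
    print(name, columns)
conn_demo.close()
```

Comparing this listing with the names used in the `FROM`, `JOIN`, and `ON` clauses helps catch misspelled tables or join columns that do not match.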

## ➡️ Summary

Congratulations on completing the AI-Assisted Query Optimization lab. You've gained hands-on experience using AI tools to improve SQL query efficiency, a skill you can apply to writing faster queries and streamlining data analysis at BookCycle and in your future data science work.

### 🔑 Key Points
- AI tools can provide valuable insights for query optimization
- Common optimizations include query restructuring
- Always validate AI suggestions against your specific use case