# Ungraded Lab: Optimized SQL Join Generation

## 📋 Overview 
In this lab, you'll write SQL joins and explore how AI can assist with query optimization. As a data analyst at BookCycle, you'll use join queries to combine customer and transaction data and gain insight into customer behavior. You'll also learn how to use AI to refine and optimize your queries.

## 🎯 Learning Outcomes
By the end of this lab, you will be able to:
- Write SQL join queries to combine data from multiple tables
- Generate an optimized SQL join query using AI assistance
- Compare and refine AI-suggested queries for improved performance

## 📚 Dataset Information
We'll be working with two tables from the BookCycle database:
1. customers: Contains customer information including IDs, join dates, and preferences
2. transactions: Contains transaction details including customer IDs, book IDs, and sale prices

## 🖥️ Activities

### Activity 1: Exploring the Data 

Before we start joining tables, let's examine our data to understand what we're working with.

<b>Step 1:</b> Connect to the database and view the structure of both tables:

In [1]:
import sqlite3
import pandas as pd

# Setting up the database. DO NOT edit the code given below
from db_setup import setup_database
setup_database() 

# Connect to the SQLite database
conn = sqlite3.connect('bookcycle.db')

✅ Database setup complete: Tables created and populated with data!


In [2]:
# View the structure of the customers table
query = """
PRAGMA table_info(customers);
"""
df = pd.read_sql_query(query, conn)
print("Customers Table Structure:")
display(df)

# View the structure of the transactions table
query = """
PRAGMA table_info(transactions);
"""
df = pd.read_sql_query(query, conn)
print("\nTransactions Table Structure:")
display(df)

Customers Table Structure:


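| | cid | name | type | notnull | dflt_value | pk |
|---|---|---|---|---|---|---|
| 0 | 0 | customer_id | TEXT | 0 | | 0 |
| 1 | 1 | join_date | TEXT | 0 | | 0 |
| 2 | 2 | is_member | INTEGER | 0 | | 0 |
| 3 | 3 | zip_code | INTEGER | 0 | | 0 |
| 4 | 4 | birth_year | INTEGER | 0 | | 0 |
| 5 | 5 | preferred_store | TEXT | 0 | | 0 |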



Transactions Table Structure:


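| | cid | name | type | notnull | dflt_value | pk |
|---|---|---|---|---|---|---|
| 0 | 0 | transaction_id | TEXT | 0 | | 0 |
| 1 | 1 | date_time | TEXT | 0 | | 0 |
| 2 | 2 | store_location | TEXT | 0 | | 0 |
| 3 | 3 | customer_id | TEXT | 0 | | 0 |
| 4 | 4 | book_id | TEXT | 0 | | 0 |
| 5 | 5 | sale_price | REAL | 0 | | 0 |
| 6 | 6 | payment_method | TEXT | 0 | | 0 |
| 7 | 7 | is_online | INTEGER | 0 | | 0 |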


<b>Step 2:</b> Let's view a sample of data from each table:

In [3]:
# Sample data from customers table
query = """
SELECT *
FROM customers
LIMIT 5;
"""
df = pd.read_sql_query(query, conn)
print("Sample Customers Data:")
display(df)

# Sample data from transactions table
query = """
SELECT *
FROM transactions
LIMIT 5;
"""
df = pd.read_sql_query(query, conn)
print("\nSample Transactions Data:")
display(df)

Sample Customers Data:


| | customer_id | join_date | is_member | zip_code | birth_year | preferred_store |
|---|---|---|---|---|---|---|
| 0 | C1001 | 2022-01-15 | 1 | 98105 | 1995 | University |
| 1 | C1002 | 2022-01-15 | 0 | 98115 | 1988 | Suburban |
| 2 | C1003 | 2022-01-16 | 1 | 98101 | 1992 | Downtown |
| 3 | C1004 | 2022-01-16 | 1 | 98105 | 1999 | University |
| 4 | C1005 | 2022-01-16 | 0 | 98115 | 1975 | Suburban |



Sample Transactions Data:


| | transaction_id | date_time | store_location | customer_id | book_id | sale_price | payment_method | is_online |
|---|---|---|---|---|---|---|---|---|
| 0 | T1001 | 2023-01-15 09:23:45 | University | C1045 | B1055 | 11.99 | credit | 0 |
| 1 | T1002 | 2023-01-15 10:15:22 | Downtown | C1023 | B1032 | 10.99 | debit | 0 |
| 2 | T1003 | 2023-01-15 10:45:33 | Suburban | C1078 | B1036 | 11.99 | cash | 0 |
| 3 | T1004 | 2023-01-15 11:30:15 | University | C1012 | B1071 | 13.99 | credit | 1 |
| 4 | T1005 | 2023-01-15 13:45:22 | University | C1034 | B1075 | 12.99 | debit | 0 |


 <b>💡 Tip:</b> Note the column that both tables share: customer_id. We'll use it as the join key.
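The shared columns can also be found programmatically. The sketch below is a minimal illustration, assuming an in-memory database that mirrors the two table structures printed above; `conn_demo` and `column_names` are hypothetical names introduced here so the lab's own `conn` is left untouched.

```python
import sqlite3

# Toy in-memory copy of the two schemas shown above (no data needed).
# conn_demo is a hypothetical stand-in for the lab's bookcycle.db connection.
conn_demo = sqlite3.connect(":memory:")
conn_demo.executescript("""
CREATE TABLE customers (
    customer_id TEXT, join_date TEXT, is_member INTEGER,
    zip_code INTEGER, birth_year INTEGER, preferred_store TEXT
);
CREATE TABLE transactions (
    transaction_id TEXT, date_time TEXT, store_location TEXT,
    customer_id TEXT, book_id TEXT, sale_price REAL,
    payment_method TEXT, is_online INTEGER
);
""")

def column_names(connection, table):
    """Return the set of column names reported by PRAGMA table_info."""
    rows = connection.execute(f"PRAGMA table_info({table});").fetchall()
    return {row[1] for row in rows}  # index 1 is the 'name' field

# Set intersection keeps only the names present in both tables
common_columns = column_names(conn_demo, "customers") & column_names(conn_demo, "transactions")
print(common_columns)
conn_demo.close()
```

A shared name is only a candidate join key; it is still worth confirming that the values in both columns use the same format.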

### Activity 2: Basic Join Query

Now that we understand our data, let's write a basic join query to combine customer and transaction information.

<b>Step 1:</b> Write a query to join the customers and transactions tables:

In [4]:
query = """
SELECT c.customer_id, c.join_date, t.transaction_id, t.sale_price
FROM customers c
JOIN transactions t ON c.customer_id = t.customer_id
LIMIT 10;
"""
df = pd.read_sql_query(query, conn)
print("Joined Customer and Transaction Data:")
display(df)

Joined Customer and Transaction Data:


| | customer_id | join_date | transaction_id | sale_price |
|---|---|---|---|---|
| 0 | C1012 | 2022-01-19 | T1004 | 13.99 |
| 1 | C1012 | 2022-01-19 | T1011 | 10.99 |
| 2 | C1012 | 2022-01-19 | T1016 | 12.99 |
| 3 | C1012 | 2022-01-19 | T1024 | 10.99 |
| 4 | C1012 | 2022-01-19 | T1030 | 8.99 |
| 5 | C1012 | 2022-01-19 | T1037 | 12.99 |
| 6 | C1012 | 2022-01-19 | T1045 | 10.99 |
| 7 | C1012 | 2022-01-19 | T1052 | 7.99 |
| 8 | C1012 | 2022-01-19 | T1060 | 11.99 |
| 9 | C1012 | 2022-01-19 | T1068 | 8.99 |


<b>Step 2:</b> Modify the query to include more relevant columns and add a WHERE clause:

In [5]:
query = """
SELECT c.customer_id, c.join_date, c.preferred_store, t.transaction_id, t.sale_price, t.store_location
FROM customers c
JOIN transactions t ON c.customer_id = t.customer_id
WHERE c.is_member = 1
LIMIT 10;
"""
df = pd.read_sql_query(query, conn)
print("Joined Data for Members:")
display(df)

Joined Data for Members:


| | customer_id | join_date | preferred_store | transaction_id | sale_price | store_location |
|---|---|---|---|---|---|---|
| 0 | C1012 | 2022-01-19 | University | T1004 | 13.99 | University |
| 1 | C1012 | 2022-01-19 | University | T1011 | 10.99 | University |
| 2 | C1012 | 2022-01-19 | University | T1016 | 12.99 | University |
| 3 | C1012 | 2022-01-19 | University | T1024 | 10.99 | University |
| 4 | C1012 | 2022-01-19 | University | T1030 | 8.99 | University |
| 5 | C1012 | 2022-01-19 | University | T1037 | 12.99 | University |
| 6 | C1012 | 2022-01-19 | University | T1045 | 10.99 | Downtown |
| 7 | C1012 | 2022-01-19 | University | T1052 | 7.99 | Downtown |
| 8 | C1012 | 2022-01-19 | University | T1060 | 11.99 | University |
| 9 | C1012 | 2022-01-19 | University | T1068 | 8.99 | University |


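 <b>💡 Tip:</b> Experiment with different JOIN types (INNER, LEFT, RIGHT) to see how they affect the results. Note that SQLite supports RIGHT JOIN only from version 3.39.0 onward; on older versions, swap the table order and use LEFT JOIN instead.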
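To see how the join type changes the result, here is a small sketch on toy data. The tables and rows below are invented for illustration and are not the BookCycle data: customer C3 has no transactions, so it appears only in the LEFT JOIN result.

```python
import sqlite3

# Hypothetical toy data: C3 has no matching transaction
conn_demo = sqlite3.connect(":memory:")
conn_demo.executescript("""
CREATE TABLE customers (customer_id TEXT, preferred_store TEXT);
CREATE TABLE transactions (transaction_id TEXT, customer_id TEXT, sale_price REAL);
INSERT INTO customers VALUES ('C1', 'University'), ('C2', 'Downtown'), ('C3', 'Suburban');
INSERT INTO transactions VALUES ('T1', 'C1', 11.99), ('T2', 'C1', 8.99), ('T3', 'C2', 10.99);
""")

# INNER JOIN keeps only customers that have at least one matching transaction
inner_rows = conn_demo.execute("""
    SELECT c.customer_id, t.transaction_id
    FROM customers c
    INNER JOIN transactions t ON c.customer_id = t.customer_id
    ORDER BY c.customer_id, t.transaction_id;
""").fetchall()

# LEFT JOIN keeps every customer; unmatched transaction columns come back as NULL (None)
left_rows = conn_demo.execute("""
    SELECT c.customer_id, t.transaction_id
    FROM customers c
    LEFT JOIN transactions t ON c.customer_id = t.customer_id
    ORDER BY c.customer_id, t.transaction_id;
""").fetchall()

print("INNER JOIN:", inner_rows)
print("LEFT JOIN: ", left_rows)
conn_demo.close()
```

In the lab's queries, a plain `JOIN` behaves as an INNER JOIN, so customers without transactions are silently dropped.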

### Activity 3: AI-Assisted Query Optimization 

Let's use AI to help us generate an optimized SQL join query for a more complex analysis.

<b>Step 1:</b> Prepare the prompt for the AI. We’ve provided a sample prompt you can use below. The prompt contains the column and data type information for two tables called “customers” and “transactions” in a SQLite database.


Given two tables in a SQLite database:

1. customers:
   - customer_id (string)
   - join_date (date)
   - is_member (integer, 0 or 1)
   - zip_code (integer)
   - birth_year (integer)
   - preferred_store (string)

2. transactions:
   - transaction_id (string)
   - date_time (datetime)
   - store_location (string)
   - customer_id (string)
   - book_id (string)
   - sale_price (float)
   - payment_method (string)
   - is_online (integer, 0 or 1)

Generate an optimized SQL join query to list customer IDs along with their total order amounts, 
focusing on in-store purchases by members who joined in 2022. Include their preferred store and 
actual purchase location. Order the results by total order amount descending.
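The schema part of a prompt like this does not have to be typed by hand. As a rough sketch, assuming an in-memory database with the same two table structures, the description can be built from `PRAGMA table_info`. The names `conn_demo`, `describe_table`, and `prompt` are hypothetical, and the types come out as SQLite's declared types (TEXT, INTEGER, REAL) rather than the friendlier labels used above.

```python
import sqlite3

# Hypothetical in-memory copy of the lab schemas (no data needed)
conn_demo = sqlite3.connect(":memory:")
conn_demo.executescript("""
CREATE TABLE customers (
    customer_id TEXT, join_date TEXT, is_member INTEGER,
    zip_code INTEGER, birth_year INTEGER, preferred_store TEXT
);
CREATE TABLE transactions (
    transaction_id TEXT, date_time TEXT, store_location TEXT,
    customer_id TEXT, book_id TEXT, sale_price REAL,
    payment_method TEXT, is_online INTEGER
);
""")

def describe_table(connection, table):
    """Format one table as a heading line plus one bullet per column."""
    rows = connection.execute(f"PRAGMA table_info({table});").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk)
    lines = [f"   - {name} ({col_type})" for _, name, col_type, *_ in rows]
    return "\n".join([f"{table}:"] + lines)

schema_text = "\n\n".join(
    describe_table(conn_demo, table) for table in ("customers", "transactions")
)

# The task text is copied from the sample prompt above
task_text = (
    "Generate an optimized SQL join query to list customer IDs along with their "
    "total order amounts, focusing on in-store purchases by members who joined in 2022. "
    "Include their preferred store and actual purchase location. "
    "Order the results by total order amount descending."
)

prompt = f"Given two tables in a SQLite database:\n\n{schema_text}\n\n{task_text}"
print(prompt)
conn_demo.close()
```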


<b>Step 2:</b> Use an AI assistant (like ChatGPT) to generate the query. For example, here is a sample AI response:

In [6]:
ai_response = """
Here's an optimized SQL join query based on your requirements:

SELECT 
    c.customer_id,
    c.preferred_store,
    SUM(t.sale_price) AS total_order_amount,
    COUNT(DISTINCT t.transaction_id) AS transaction_count,
    GROUP_CONCAT(DISTINCT t.store_location) AS purchase_locations
FROM 
    customers c
JOIN 
    transactions t ON c.customer_id = t.customer_id
WHERE 
    c.is_member = 1
    AND strftime('%Y', c.join_date) = '2022'
    AND t.is_online = 0
GROUP BY 
    c.customer_id, c.preferred_store
ORDER BY 
    total_order_amount DESC
LIMIT 20;

This query:
1. Joins the customers and transactions tables.
2. Filters for members who joined in 2022 and made in-store purchases.
3. Calculates the total order amount and transaction count for each customer.
4. Includes the preferred store and a list of actual purchase locations.
5. Orders the results by total order amount descending.
6. Limits the output to 20 rows for performance.

The use of GROUP_CONCAT for purchase locations provides a compact way to see all locations where a customer made purchases.
"""

print("AI-Generated Query:")
print(ai_response)

AI-Generated Query:

Here's an optimized SQL join query based on your requirements:

SELECT 
    c.customer_id,
    c.preferred_store,
    SUM(t.sale_price) AS total_order_amount,
    COUNT(DISTINCT t.transaction_id) AS transaction_count,
    GROUP_CONCAT(DISTINCT t.store_location) AS purchase_locations
FROM 
    customers c
JOIN 
    transactions t ON c.customer_id = t.customer_id
WHERE 
    c.is_member = 1
    AND strftime('%Y', c.join_date) = '2022'
    AND t.is_online = 0
GROUP BY 
    c.customer_id, c.preferred_store
ORDER BY 
    total_order_amount DESC
LIMIT 20;

This query:
1. Joins the customers and transactions tables.
2. Filters for members who joined in 2022 and made in-store purchases.
3. Calculates the total order amount and transaction count for each customer.
4. Includes the preferred store and a list of actual purchase locations.
5. Orders the results by total order amount descending.
6. Limits the output to 20 rows for performance.

The use of GROUP_CONCAT for purchase locations provides a compact way to see all locations where a customer made purchases.

##### Try it yourself: Using any AI assistant of your choice, generate the query and add it below.

In [None]:
Learner_AI_Response = """<YOUR RESPONSE HERE>"""

print("Your AI-Generated Query:")
print(Learner_AI_Response)

<b>Step 3:</b> Execute and analyze the AI-generated query:

In [None]:
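# Extract the query from the AI response. To run the sample response instead,
# replace Learner_AI_Response with ai_response below.
start = Learner_AI_Response.find("SELECT")
if start != -1:
    # Keep only the SQL: from the first SELECT through the first semicolon,
    # so explanatory text after the query is not sent to SQLite
    end = Learner_AI_Response.find(";", start)
    ai_query = Learner_AI_Response[start:end + 1] if end != -1 else Learner_AI_Response[start:]
else:
    ai_query = ""  # Default empty string if "SELECT" is not found

print(ai_query)

# Execute the AI-generated query only if one was found
if ai_query:
    df = pd.read_sql_query(ai_query, conn)
    print("Results of AI-Generated Query:")
    display(df)
else:
    print("No SELECT statement found in the AI response.")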

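 <b>💡 Tip:</b> Always review and test AI-generated queries to ensure they meet your specific needs and perform well. For example, the sample response above adds a transaction count and a LIMIT 20 that the prompt did not ask for.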
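One way to check how a query performs is SQLite's `EXPLAIN QUERY PLAN`. The sketch below uses toy data and a hypothetical index, `idx_customers_join_date`, that is not part of the lab database. It compares the `strftime` filter from the sample response with an equivalent date-range filter, which lets SQLite use an index on `join_date` when one exists.

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
conn_demo.executescript("""
CREATE TABLE customers (customer_id TEXT, join_date TEXT, is_member INTEGER);
INSERT INTO customers VALUES
    ('C1', '2021-12-31', 1), ('C2', '2022-01-15', 1),
    ('C3', '2022-06-30', 0), ('C4', '2023-01-01', 1);
-- Hypothetical index, not part of the lab database
CREATE INDEX idx_customers_join_date ON customers (join_date);
""")

# Filter style used in the sample AI response: a function wraps the column
function_filter = """
    SELECT customer_id FROM customers
    WHERE strftime('%Y', join_date) = '2022'
    ORDER BY customer_id;
"""

# Equivalent range filter on the bare column (ISO dates sort correctly as text)
range_filter = """
    SELECT customer_id FROM customers
    WHERE join_date >= '2022-01-01' AND join_date < '2023-01-01'
    ORDER BY customer_id;
"""

def query_plan(connection, sql):
    """Return the human-readable detail strings from EXPLAIN QUERY PLAN."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the last field is the detail text

function_result = conn_demo.execute(function_filter).fetchall()
range_result = conn_demo.execute(range_filter).fetchall()
function_plan = query_plan(conn_demo, function_filter)
range_plan = query_plan(conn_demo, range_filter)

print("Same rows returned:", function_result == range_result)
print("strftime filter plan:", function_plan)
print("range filter plan:   ", range_plan)
conn_demo.close()
```

The exact plan wording varies between SQLite versions, so read the output for whether an index is mentioned rather than matching it character by character. On a table as small as the lab's, the difference in run time is unlikely to be noticeable.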

#### Close the Connection
It's good practice to close the database connection when you're done.

In [8]:
# Close the database connection 
conn.close()
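In a script, the close can be made automatic with `contextlib.closing`, so the connection is released even if a query raises an error. This is a minimal sketch using an in-memory database as a stand-in for `bookcycle.db`. Note that using a `sqlite3` connection directly in a `with` statement manages transactions but does not close the connection.

```python
import sqlite3
from contextlib import closing

# ":memory:" is a stand-in for 'bookcycle.db' so the sketch runs on its own
with closing(sqlite3.connect(":memory:")) as conn_demo:
    conn_demo.execute("CREATE TABLE customers (customer_id TEXT);")
    conn_demo.execute("INSERT INTO customers VALUES ('C1001');")
    row_count = conn_demo.execute("SELECT COUNT(*) FROM customers;").fetchone()[0]

print("Rows read before closing:", row_count)

# After the with block the connection is closed; further use raises an error
try:
    conn_demo.execute("SELECT 1;")
    connection_closed = False
except sqlite3.ProgrammingError:
    connection_closed = True

print("Connection closed:", connection_closed)
```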

## ✅ Success Checklist
- Successfully connected to the database and viewed table structures
- Wrote and executed a basic join query
- Generated an optimized join query using AI assistance
- Executed and analyzed the results of the AI-generated query
- Program runs without errors

## 🔍 Common Issues & Solutions 

- Problem: Join query returns no results
    - Solution: Check that the ON condition uses the correct columns and that there are matching values in both tables

- Problem: Error when executing AI-generated query 
    - Solution: Carefully review the query for syntax errors or unsupported functions in SQLite
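Both problems can be diagnosed with a few lines of code. The sketch below uses invented toy rows: an anti-join lists transactions whose `customer_id` has no match in `customers`, and a `try`/`except` surfaces the error SQLite raises for a function it does not provide (here `YEAR`, which exists in some other SQL dialects).

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
conn_demo.executescript("""
CREATE TABLE customers (customer_id TEXT);
CREATE TABLE transactions (transaction_id TEXT, customer_id TEXT);
INSERT INTO customers VALUES ('C1001'), ('C1002');
-- 'c1002 ' differs in case and has a trailing space; 'C9999' has no customer row
INSERT INTO transactions VALUES ('T1', 'C1001'), ('T2', 'c1002 '), ('T3', 'C9999');
""")

# Anti-join: keep transactions for which the LEFT JOIN found no customer
unmatched = conn_demo.execute("""
    SELECT t.transaction_id, t.customer_id
    FROM transactions t
    LEFT JOIN customers c ON t.customer_id = c.customer_id
    WHERE c.customer_id IS NULL
    ORDER BY t.transaction_id;
""").fetchall()
print("Transactions without a matching customer:", unmatched)

# Unsupported function: SQLite has strftime() but no YEAR()
try:
    conn_demo.execute("SELECT YEAR('2022-01-15');")
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)
print("SQLite error:", error_message)

conn_demo.close()
```

Printing the unmatched keys with `repr()` can help reveal stray whitespace or case differences that are easy to miss in a table display.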

## ➡️ Summary
Congratulations on completing the Optimized SQL Join Generation lab. You combined data from multiple tables using SQL joins and used AI to generate and refine a more complex query, skills you can apply to data analysis tasks in future projects.

### 🔑 Key Points
- SQL JOIN operations allow us to combine data from multiple tables
- AI can assist in generating complex, optimized queries
- Always review and test AI-generated code for accuracy and performance