# Ungraded Lab: CTE Refactoring Challenge

## 📋 Overview 
In this lab, you'll use Common Table Expressions (CTEs) to refactor a complex query. As a data analyst at BookCycle, you'll work on improving the readability and performance of a multi-step query. This lab will help you understand how CTEs simplify complex SQL queries and will prepare you for real-world data analysis challenges.

## 🎯 Learning Outcomes
By the end of this lab, you will be able to:
- Identify complex, multi-step queries that can benefit from refactoring
- Apply CTEs to simplify and improve query readability
- Compare your refactored query with AI-generated suggestions
- Evaluate query performance and readability improvements

## 📚 Dataset Information
We'll be working with two datasets from BookCycle:

1. <b>books.csv:</b> Contains information about the books in BookCycle's inventory, including book_id, title, author, genre, condition, price, and more.
2. <b>transactions.csv:</b> Contains transaction data, including transaction_id, date_time, store_location, customer_id, book_id, sale_price, and payment method.


## 🖥️ Activities

### Activity 1: Analyze the Complex Query  

As a data analyst at BookCycle, you've been asked to find the top-selling book in each genre for the University store location. A colleague has written a query, but it's complex and hard to read.

<b>Step 1:</b> Import the necessary libraries and connect to the database:

In [1]:
import sqlite3
import pandas as pd

# Setting up the database. DO NOT edit the code given below
from db_setup import setup_database
setup_database() 

# Connect to the database
conn = sqlite3.connect('bookcycle.db')

✅ Database setup complete: Tables created and populated with data!


<b>Step 2:</b> Examine the complex query in the code cell below before you run it.

<b>Step 3:</b> Run the query and observe the results:

In [2]:
query = """
SELECT genre, title, total_sales
FROM (
    SELECT b.genre, b.title, SUM(t.sale_price) as total_sales,
           ROW_NUMBER() OVER (PARTITION BY b.genre ORDER BY SUM(t.sale_price) DESC) as rank
    FROM books b
    JOIN transactions t ON b.book_id = t.book_id
    WHERE t.store_location = 'University'
    GROUP BY b.genre, b.title
) ranked
WHERE rank = 1
ORDER BY total_sales DESC;
"""

df = pd.read_sql_query(query, conn)
display(df)

|   | genre | title | total_sales |
|---|---|---|---|
| 0 | Classic Fiction | The Catcher in the Rye | 69.95 |
| 1 | Classic Poetry | The Odyssey | 51.96 |
| 2 | Classic Drama | Hamlet | 43.96 |
| 3 | Classic Non-Fiction | Walden | 21.98 |
| 4 | Classic Philosophy | The Prince | 17.98 |


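<b>💡 Tip:</b> Take a moment to understand what the query is doing. The inner query totals sales per book at the University store and numbers the books within each genre from highest to lowest total; the outer query keeps only the book ranked first in each genre.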
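
If the ranking step is unfamiliar, the following sketch shows `ROW_NUMBER() OVER (PARTITION BY ...)` on its own. The `sales` table, its columns, and its values are made up for illustration and are not part of the BookCycle database; the sketch assumes your SQLite build supports window functions, as the lab's query already requires.

```python
import sqlite3

# Hypothetical in-memory table, unrelated to bookcycle.db
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE sales (category TEXT, item TEXT, amount REAL)")
demo.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("fruit", "apple", 3.0),
        ("fruit", "pear", 5.0),
        ("veg", "kale", 4.0),
        ("veg", "leek", 2.0),
    ],
)

# Number the rows inside each category, largest amount first
ranked = demo.execute(
    """
    SELECT category, item, amount,
           ROW_NUMBER() OVER (PARTITION BY category ORDER BY amount DESC) AS rn
    FROM sales
    ORDER BY category, rn
    """
).fetchall()
demo.close()

# Keep only the first-ranked row of each category
top_items = [(category, item) for category, item, amount, rn in ranked if rn == 1]
print(top_items)
```

Numbering restarts at 1 for every category, which is why filtering on the first position returns one row per group.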

### Activity 2: Refactor the Query Using CTEs  

Now that you've analyzed the complex query, it's time to refactor it using Common Table Expressions (CTEs) to improve readability. A CTE rewrite does not guarantee a faster query, so you'll measure execution time later in the lab rather than assume a speedup.

<b>Step 1:</b> Create a new code cell in your notebook, then write the CTE structure and the main query:

In [None]:
WITH sales_by_book AS (
    -- Calculate total sales for each book in the University store
    <YOUR CODE HERE>
),
ranked_books AS (
    -- Rank books within each genre based on total sales
    <YOUR CODE HERE>
)
-- Select the top-selling book in each genre
<YOUR CODE HERE>

<b>Step 2:</b>  Run your refactored query and compare the results with the original query:

In [None]:
refactored_query = """
-- Your refactored query here
"""

df_refactored = pd.read_sql_query(refactored_query, conn)
display(df_refactored)

 <b>💡 Tip:</b> Make sure your refactored query produces the same results as the original query.
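
One possible way to check this programmatically is `pandas.testing.assert_frame_equal`, which raises an `AssertionError` when two DataFrames differ. The sketch below is only an illustration: the helper name `same_results` and the toy frames are hypothetical, and in the lab you would pass `df` and `df_refactored` instead.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def same_results(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Return True if both frames have the same columns, row order, and values."""
    try:
        # Reset the index so only the data is compared, not the row labels
        assert_frame_equal(left.reset_index(drop=True), right.reset_index(drop=True))
        return True
    except AssertionError:
        return False


# Toy frames standing in for df and df_refactored (hypothetical values)
expected = pd.DataFrame({"genre": ["A", "B"], "total_sales": [10.0, 5.5]})
matching = expected.copy()
different = pd.DataFrame({"genre": ["A", "B"], "total_sales": [10.0, 6.0]})

print(same_results(expected, matching))
print(same_results(expected, different))
```

Because the comparison is row by row, the refactored query needs the same final `ORDER BY` as the original for the check to pass.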

### Activity 3: Compare with AI-Generated Suggestions

Let's see how an AI model might approach this refactoring task.

<b>Steps:</b>
1. Use the following prompt with an AI assistant (e.g., ChatGPT): "Refactor this SQL query using Common Table Expressions (CTEs) for better readability and performance:" (Paste the original complex query here)
2. Compare the AI-generated solution with your refactored query. Consider:
    - How did the AI structure the CTEs?
    - Are there any differences in approach?
    - Which version do you find more readable?
3. Discuss your findings in a markdown cell in your notebook.

In [None]:
# Add your findings in this cell. This is a code cell; convert it to a markdown cell before adding
# your findings.


#### ⚙️ Test Your Work:
- Run both the original and refactored queries
- Compare execution times (you can use Python's time module)
- Verify that both queries return the same results
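
A minimal timing sketch, assuming wall-clock time measured with `time.perf_counter` is accurate enough for this comparison. The helper `time_query` and the in-memory table `t` are hypothetical; in the lab you would pass the existing `conn` together with `query` and `refactored_query`.

```python
import sqlite3
import time


def time_query(conn: sqlite3.Connection, sql: str, repeats: int = 5) -> float:
    """Return the best wall-clock time in seconds over several runs of sql."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql).fetchall()  # fetchall forces the query to run to completion
        best = min(best, time.perf_counter() - start)
    return best


# Stand-in database so the sketch runs on its own
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE t (x INTEGER)")
demo_conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])

elapsed = time_query(demo_conn, "SELECT SUM(x) FROM t")
print(f"best of 5 runs: {elapsed:.6f} s")
demo_conn.close()
```

Taking the best of several runs reduces the effect of one-off delays. On a dataset as small as this lab's, any difference between the two queries may be within measurement noise.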

### Close the Connection
It's good practice to close the database connection when you're done.

In [None]:
# Close the database connection 
conn.close()
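
As an alternative sketch, `contextlib.closing` from the standard library closes the connection automatically when the block ends, even if a query raises. The in-memory database below is a stand-in for `bookcycle.db`. Note that using a `sqlite3` connection directly in a `with` statement manages transactions but does not close the connection.

```python
import sqlite3
from contextlib import closing

# closing() calls demo_conn.close() when the with-block exits
with closing(sqlite3.connect(":memory:")) as demo_conn:
    value = demo_conn.execute("SELECT 1 + 1").fetchone()[0]

print(value)

# Using the connection after the block raises ProgrammingError
try:
    demo_conn.execute("SELECT 1")
except sqlite3.ProgrammingError:
    closed = True
else:
    closed = False
print(closed)
```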

## ✅ Success Checklist
- The refactored query uses CTEs
- The refactored query produces the same results as the original
- The refactored query is more readable
- You've compared your solution with an AI-generated one


## 🔍 Common Issues & Solutions 

- Problem: Results don't match the original query
  - Solution:  Double-check your JOIN conditions and WHERE clauses in the CTEs
  
- Problem: Syntax errors in the CTE 
  - Solution: Ensure each CTE is properly named and separated by commas
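
To illustrate the syntax rules behind the second issue, here is a small sketch on an unrelated toy table. The `orders` table, the CTE names `totals` and `big_spenders`, and the values are all hypothetical; only the `WITH` structure carries over to the lab.

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
conn_demo.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn_demo.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 10.0), ("ann", 5.0), ("bob", 7.0)],
)

cte_query = """
WITH totals AS (              -- first CTE: name, AS, parenthesized SELECT
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
),                            -- comma between CTEs, and no second WITH
big_spenders AS (             -- a later CTE can read from an earlier one
    SELECT customer, total
    FROM totals
    WHERE total > 8
)                             -- no comma after the last CTE
SELECT customer, total
FROM big_spenders
ORDER BY customer;
"""

rows = conn_demo.execute(cte_query).fetchall()
print(rows)
conn_demo.close()
```

A trailing comma after the last CTE, or a repeated `WITH` before the second one, are typical causes of syntax errors in this structure.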

## ➡️ Summary

Congratulations on completing this CTE refactoring challenge! You've taken a significant step toward writing more readable and maintainable SQL queries.

### 🔑 Key Points
- CTEs can greatly improve the readability of complex queries
- Breaking down a complex query into CTEs can make it easier to understand and maintain
- Comparing your solution with AI-generated code can provide new perspectives on problem-solving