# Ungraded Lab: CTEs in Practice

## 📋 Overview 
Welcome to the "CTEs in Practice" lab! In this hands-on session, you'll work with the BookCycle dataset to explore Common Table Expressions (CTEs) and window functions in SQL. By the end of this lab, you'll be able to restructure complex queries for better clarity and performance, and apply window functions such as ROW_NUMBER() for more advanced analysis.

## 🎯 Learning Outcomes
By the end of this lab, you will be able to:
- Simplify complex queries using Common Table Expressions (CTEs)
- Apply window functions like ROW_NUMBER for advanced data analysis
- Evaluate how CTEs and window functions improve query clarity and performance

## 📚 Dataset Information
We'll be working with the BookCycle dataset, which contains information about book sales and inventory across multiple store locations. The dataset includes two main tables:
1. <b>books:</b> Contains information about the books in inventory, including <b>book_id, title, author, genre, and pricing details.</b>
2. <b>transactions:</b> Records book sales, including <b>transaction_id, book_id (used to join to books), date_time, store_location, and sale_price.</b>
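
If you want to confirm which columns a table actually has before writing a query, one option is SQLite's `pragma_table_info` table-valued function. The sketch below is a minimal illustration: `list_columns` is a hypothetical helper, and the in-memory `transactions` table is a stand-in whose column types are assumed, not the real BookCycle schema.

```python
import sqlite3


def list_columns(conn, table_name):
    """Return the column names of a table, in declaration order."""
    rows = conn.execute(
        "SELECT name FROM pragma_table_info(?)", (table_name,)
    ).fetchall()
    return [name for (name,) in rows]


# Hypothetical stand-in for the lab's transactions table (types are assumed).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE transactions (
        transaction_id TEXT,
        book_id TEXT,
        date_time TEXT,
        store_location TEXT,
        sale_price REAL
    )
    """
)

transaction_columns = list_columns(conn, "transactions")
conn.close()

print(transaction_columns)
```

Against the lab database, the same helper could be called with the connection opened on `bookcycle.db` and the table names `books` and `transactions`.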

## 🖥️ Activities

### Activity 1: Setting Up and Creating a Basic CTE

As a data analyst at BookCycle, you need to analyze sales patterns across different store locations. Let's start by setting up our environment and creating a basic CTE to simplify our query structure.

<b>Step 1:</b> Import the necessary libraries and connect to the database:

In [1]:
import sqlite3
import pandas as pd

# Setting up the database. DO NOT edit the code given below
from db_setup import setup_database
setup_database() 

# Connect to the database
conn = sqlite3.connect('bookcycle.db')

✅ Database setup complete: Tables created and populated with data!


<b>Step 2:</b> Create a basic CTE to calculate total sales for each store location:

In [2]:
query = """
WITH store_sales AS (
    SELECT store_location, SUM(sale_price) AS total_sales
    FROM transactions
    GROUP BY store_location
)
SELECT *
FROM store_sales
ORDER BY total_sales DESC;
"""
df = pd.read_sql_query(query, conn)
display(df)

|   | store_location | total_sales |
|---|---|---|
| 0 | University | 586.5 |
| 1 | Suburban | 286.74 |
| 2 | Downtown | 241.76 |


<b>💡 Tip:</b> CTEs are defined using the WITH clause and can make complex queries more readable by breaking them into logical parts.
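
To see what the WITH clause changes, it can help to compare a CTE with the equivalent inline subquery. The sketch below uses an in-memory database with a few made-up rows (not the real BookCycle data) and checks that both forms return the same result; the CTE simply gives the intermediate result a name that the main query can read from.

```python
import sqlite3

# Hypothetical sample rows for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (store_location TEXT, sale_price REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("Downtown", 10.0), ("Downtown", 5.0), ("University", 20.0), ("Suburban", 7.5)],
)

# Version 1: the aggregation is named up front in a CTE.
cte_query = """
WITH store_sales AS (
    SELECT store_location, SUM(sale_price) AS total_sales
    FROM transactions
    GROUP BY store_location
)
SELECT store_location, total_sales
FROM store_sales
ORDER BY total_sales DESC;
"""

# Version 2: the same aggregation nested as a subquery in FROM.
subquery_query = """
SELECT store_location, total_sales
FROM (
    SELECT store_location, SUM(sale_price) AS total_sales
    FROM transactions
    GROUP BY store_location
) AS store_sales
ORDER BY total_sales DESC;
"""

cte_rows = conn.execute(cte_query).fetchall()
subquery_rows = conn.execute(subquery_query).fetchall()
conn.close()

print(cte_rows)
print(cte_rows == subquery_rows)
```

With a single step the two forms are about equally readable; the CTE form tends to pay off once several intermediate results build on each other.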

### Activity 2: Using Window Functions

Now that we've seen a basic CTE, let's combine it with window functions to perform more advanced analysis.

<b>Step 1:</b> Use ROW_NUMBER() to rank books by sales within each genre:

In [3]:
query = """
WITH book_sales AS (
    SELECT b.book_id, b.title, b.genre, COUNT(*) AS sales_count
    FROM books b
    JOIN transactions t ON b.book_id = t.book_id
    GROUP BY b.book_id, b.title, b.genre
)
SELECT 
    book_id,
    title,
    genre,
    sales_count,
    ROW_NUMBER() OVER (PARTITION BY genre ORDER BY sales_count DESC) AS rank_in_genre
FROM book_sales
ORDER BY genre, rank_in_genre;
"""

df = pd.read_sql_query(query, conn)
display(df)

|   | book_id | title | genre | sales_count | rank_in_genre |
|---|---|---|---|---|---|
| 0 | B1027 | Hamlet | Classic Drama | 4 | 1 |
| 1 | B1038 | Macbeth | Classic Drama | 2 | 2 |
| 2 | B1052 | Othello | Classic Drama | 2 | 3 |
| 3 | B1071 | The Catcher in the Rye | Classic Fiction | 5 | 1 |
| 4 | B1087 | The Scarlet Letter | Classic Fiction | 5 | 2 |
| 5 | B1095 | The Wind in the Willows | Classic Fiction | 5 | 3 |
| 6 | B1001 | A Christmas Carol | Classic Fiction | 4 | 4 |
| 7 | B1003 | A Tale of Two Cities | Classic Fiction | 4 | 5 |
| 8 | B1032 | Jane Eyre | Classic Fiction | 4 | 6 |
| 9 | B1047 | Nineteen Eighty-Four | Classic Fiction | 4 | 7 |


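<b>💡 Tip:</b> Window functions like ROW_NUMBER() allow you to perform calculations across a set of rows that are related to the current row. ROW_NUMBER() assigns a distinct number to every row in a partition, so when books tie on sales_count, the order among the tied rows is arbitrary unless you add a tie-breaker column to the ORDER BY inside OVER.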
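
The output above contains ties (for example, Macbeth and Othello both have 2 sales). As a minimal sketch of how ties are handled, the example below compares ROW_NUMBER() with two related window functions, RANK() and DENSE_RANK(), which this lab does not otherwise use. The three sample rows reuse the Classic Drama counts shown above; the in-memory `book_sales` table is a stand-in for the CTE, and the `title` tie-breaker is an assumption added to make ROW_NUMBER() deterministic.

```python
import sqlite3

# Stand-in for the book_sales CTE, using the Classic Drama counts shown above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book_sales (title TEXT, genre TEXT, sales_count INTEGER)")
conn.executemany(
    "INSERT INTO book_sales VALUES (?, ?, ?)",
    [
        ("Hamlet", "Classic Drama", 4),
        ("Macbeth", "Classic Drama", 2),
        ("Othello", "Classic Drama", 2),
    ],
)

ranking_query = """
SELECT
    title,
    sales_count,
    -- title is an added tie-breaker so tied rows get a stable order
    ROW_NUMBER() OVER (PARTITION BY genre ORDER BY sales_count DESC, title) AS row_num,
    RANK()       OVER (PARTITION BY genre ORDER BY sales_count DESC) AS rank_val,
    DENSE_RANK() OVER (PARTITION BY genre ORDER BY sales_count DESC) AS dense_rank_val
FROM book_sales
ORDER BY genre, row_num;
"""

ranking_rows = conn.execute(ranking_query).fetchall()
conn.close()

for row in ranking_rows:
    print(row)
# ('Hamlet', 4, 1, 1, 1)
# ('Macbeth', 2, 2, 2, 2)
# ('Othello', 2, 3, 2, 2)
```

ROW_NUMBER() is a reasonable choice when you need exactly N rows per group; RANK() or DENSE_RANK() may be a better fit when tied books should share a position.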

### Activity 3: Combining CTEs and Window Functions

Let's combine what we've learned to create a more complex analysis that identifies top-selling books in each genre, along with their percentage of total genre sales.

<b>Step 1:</b> Write a query that uses multiple CTEs and window functions:

In [4]:
query = """
WITH book_sales AS (
    SELECT 
        b.book_id, 
        b.title, 
        b.genre, 
        COUNT(*) AS sales_count
    FROM books b
    JOIN transactions t ON b.book_id = t.book_id
    GROUP BY b.book_id, b.title, b.genre
),
genre_totals AS (
    SELECT genre, SUM(sales_count) AS total_genre_sales
    FROM book_sales
    GROUP BY genre
),
ranked_books AS (
    SELECT 
        bs.genre,
        bs.title,
        bs.sales_count,
        ROW_NUMBER() OVER (PARTITION BY bs.genre ORDER BY bs.sales_count DESC) AS rank_in_genre,
        ROUND(bs.sales_count * 100.0 / gt.total_genre_sales, 2) AS percent_of_genre_sales
    FROM book_sales bs
    JOIN genre_totals gt ON bs.genre = gt.genre
)
SELECT * 
FROM ranked_books
WHERE rank_in_genre <= 3
ORDER BY genre, rank_in_genre;
"""

df = pd.read_sql_query(query, conn)
display(df)

|   | genre | title | sales_count | rank_in_genre | percent_of_genre_sales |
|---|---|---|---|---|---|
| 0 | Classic Drama | Hamlet | 4 | 1 | 50.0 |
| 1 | Classic Drama | Macbeth | 2 | 2 | 25.0 |
| 2 | Classic Drama | Othello | 2 | 3 | 25.0 |
| 3 | Classic Fiction | The Catcher in the Rye | 5 | 1 | 6.85 |
| 4 | Classic Fiction | The Scarlet Letter | 5 | 2 | 6.85 |
| 5 | Classic Fiction | The Wind in the Willows | 5 | 3 | 6.85 |
| 6 | Classic Mystery | The Adventures of Sherlock Holmes | 4 | 1 | 57.14 |
| 7 | Classic Mystery | The Hound of the Baskervilles | 3 | 2 | 42.86 |
| 8 | Classic Non-Fiction | Walden | 2 | 1 | 100.0 |
| 9 | Classic Philosophy | The Prince | 2 | 1 | 66.67 |


<b>💡 Tip:</b> By combining CTEs and window functions, you can create powerful, readable queries that provide deep insights into your data.
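
The genre share can also be computed without a separate totals CTE and join, by using SUM() as a window function over each genre. This is offered as an alternative sketch, not a replacement for the lab's query. The in-memory `book_sales` table stands in for the CTE and reuses the Classic Drama and Classic Mystery counts from the output above.

```python
import sqlite3

# Stand-in for the book_sales CTE, using counts shown in the lab output.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book_sales (title TEXT, genre TEXT, sales_count INTEGER)")
conn.executemany(
    "INSERT INTO book_sales VALUES (?, ?, ?)",
    [
        ("Hamlet", "Classic Drama", 4),
        ("Macbeth", "Classic Drama", 2),
        ("Othello", "Classic Drama", 2),
        ("The Adventures of Sherlock Holmes", "Classic Mystery", 4),
        ("The Hound of the Baskervilles", "Classic Mystery", 3),
    ],
)

share_query = """
SELECT
    genre,
    title,
    sales_count,
    -- SUM(...) OVER (PARTITION BY genre) repeats the genre total on every row
    ROUND(
        sales_count * 100.0 / SUM(sales_count) OVER (PARTITION BY genre), 2
    ) AS percent_of_genre_sales
FROM book_sales
ORDER BY genre, sales_count DESC, title;
"""

share_rows = conn.execute(share_query).fetchall()
conn.close()

for row in share_rows:
    print(row)
```

Both approaches should give the same percentages; the windowed SUM() keeps the calculation in one SELECT, while the separate `genre_totals` CTE makes the intermediate totals easier to inspect on their own.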

### Close the Connection
It's good practice to close the database connection when you're done.

In [5]:
# Close the database connection 
conn.close()
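
If you would rather not rely on remembering to call `close()`, one option is `contextlib.closing`, which closes the connection when the block exits, even if a query raises an error. Note that using a `sqlite3` connection directly in a `with` statement manages transactions (commit or rollback) but does not close the connection. The sketch below uses an in-memory database with one made-up row.

```python
import sqlite3
from contextlib import closing

# closing() calls conn.close() when the with-block exits.
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE transactions (store_location TEXT, sale_price REAL)")
    conn.execute("INSERT INTO transactions VALUES ('Downtown', 12.5)")
    total = conn.execute("SELECT SUM(sale_price) FROM transactions").fetchone()[0]

# The connection is closed now, so further use raises ProgrammingError.
try:
    conn.execute("SELECT 1")
    connection_closed = False
except sqlite3.ProgrammingError:
    connection_closed = True

print(total, connection_closed)
```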

## ✅ Success Checklist
- You've successfully created and used a basic CTE
- You've applied window functions like ROW_NUMBER() in your queries
- You've combined CTEs and window functions for advanced analysis
- Your queries run without errors and produce meaningful results

## 🔍 Common Issues & Solutions 

- Problem: Syntax error in CTE definition
  - Solution: Wrap each CTE body in parentheses and separate consecutive CTEs with a comma; do not put a comma after the last CTE before the main query

- Problem: Incorrect column names in window function  
  - Solution: Double-check that column names in the OVER clause match your CTE or table definition
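
Both problems above surface in `sqlite3` as `sqlite3.OperationalError`. The sketch below reproduces each one on a tiny in-memory table so you can recognize the messages; `error_for` is a hypothetical helper, `top_store` is an illustrative second CTE, and the misspelled column `total_sale` is deliberate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (store_location TEXT, sale_price REAL)")
conn.execute("INSERT INTO transactions VALUES ('Downtown', 12.5)")


def error_for(sql):
    """Run a query and return its error message, or None if it succeeds."""
    try:
        conn.execute(sql).fetchall()
        return None
    except sqlite3.OperationalError as exc:
        return str(exc)


store_sales_cte = """store_sales AS (
    SELECT store_location, SUM(sale_price) AS total_sales
    FROM transactions
    GROUP BY store_location
)"""
top_store_cte = """top_store AS (
    SELECT store_location, total_sales
    FROM store_sales
    ORDER BY total_sales DESC
    LIMIT 1
)"""

# Problem 1: no comma between two CTEs, then the corrected version.
missing_comma = f"WITH {store_sales_cte} {top_store_cte} SELECT * FROM top_store;"
with_comma = f"WITH {store_sales_cte}, {top_store_cte} SELECT * FROM top_store;"

# Problem 2: the OVER clause refers to total_sale, but the CTE defines total_sales.
wrong_column = f"""
WITH {store_sales_cte}
SELECT store_location, ROW_NUMBER() OVER (ORDER BY total_sale DESC) AS rn
FROM store_sales;
"""

missing_comma_error = error_for(missing_comma)
with_comma_error = error_for(with_comma)
wrong_column_error = error_for(wrong_column)
conn.close()

print("missing comma:", missing_comma_error)
print("with comma:", with_comma_error)
print("wrong column:", wrong_column_error)
```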

## ➡️ Summary

Congratulations on completing the "CTEs in Practice" lab! You've practiced using Common Table Expressions and window functions to write clearer, better-structured SQL queries, skills you can apply to data analysis at BookCycle and beyond.


### 🔑 Key Points
- CTEs simplify complex queries by breaking them into logical parts
- Window functions allow for advanced calculations across related rows
- Combining CTEs and window functions enables powerful, readable data analysis
