# Ungraded Lab: Subqueries

## 📋 Overview 
Welcome to the Subqueries Lab! This hands-on exercise focuses on implementing subqueries with the Bookcycle dataset. As a data analyst at Bookcycle, you'll use subqueries to extract valuable insights from their multi-table database, answering complex business questions that can't be solved with simple queries alone.

## 🎯 Learning Outcomes
By the end of this lab, you will be able to:
- Write subqueries to filter and aggregate data
- Use nested queries to solve complex problems
- Apply subqueries in WHERE, FROM, and SELECT clauses


## 📚 Dataset Information
We'll be working with two main tables from the Bookcycle dataset:
1. books: Contains information about the books in inventory
2. transactions: Records of book sales

## 🖥️ Activities

### Activity 1: Setting Up and Basic Subquery

Let's start by setting up our environment and writing a basic subquery to find books that are priced above the average.

<b>Step 1:</b> Import the necessary libraries and connect to the database:

In [1]:
import sqlite3
import pandas as pd

# Setting up the database. DO NOT edit the code given below
from db_setup import setup_database
setup_database() 

# Connect to the database
conn = sqlite3.connect('bookcycle.db')

✅ Database setup complete: Tables created and populated with data!


<b>Step 2:</b> Write a query to find the average book price. You'll reuse it as a subquery in the next step:

In [2]:
query = """
SELECT AVG(list_price) as avg_price
FROM books;
"""
df = pd.read_sql_query(query, conn)
print("Average book price:", df['avg_price'][0])

Average book price: 11.010000000000007


<b>Step 3:</b> Use the query from Step 2 as a subquery in the WHERE clause to find books priced above average:

In [3]:
query = """
SELECT title, author, list_price
FROM books
WHERE list_price > (SELECT AVG(list_price) FROM books)
LIMIT 5;
"""
df = pd.read_sql_query(query, conn)
display(df)

| | title | author | list_price |
|---|---|---|---|
| 0 | A Farewell to Arms | Ernest Hemingway | 11.99 |
| 1 | Anna Karenina | Leo Tolstoy | 11.99 |
| 2 | Anne of Green Gables | L.M. Montgomery | 12.99 |
| 3 | Brave New World | Aldous Huxley | 12.99 |
| 4 | Crime and Punishment | Fyodor Dostoevsky | 13.99 |


 <b>💡 Tip:</b> Subqueries in the WHERE clause are often used for comparisons against aggregated values.
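To see why the subquery in Step 3 is equivalent to running Step 2 first and plugging the number in by hand, here is a minimal, self-contained sketch. It assumes an in-memory SQLite database with a handful of invented rows (a hypothetical stand-in for the `books` table, not the Bookcycle data), so the values are illustrative only.

```python
import sqlite3

# Hypothetical stand-in for the books table; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, list_price REAL)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [("Book A", 8.0), ("Book B", 10.0), ("Book C", 12.0), ("Book D", 14.0)],
)

# Two-step approach: compute the average first, then pass it in as a parameter.
avg_price = conn.execute("SELECT AVG(list_price) FROM books").fetchone()[0]
two_step = conn.execute(
    "SELECT title FROM books WHERE list_price > ? ORDER BY title",
    (avg_price,),
).fetchall()

# One-statement approach: the scalar subquery supplies the same value inline.
one_step = conn.execute(
    """
    SELECT title
    FROM books
    WHERE list_price > (SELECT AVG(list_price) FROM books)
    ORDER BY title
    """
).fetchall()

print(avg_price)
print(one_step)
conn.close()
```

Both approaches return the same rows; the subquery version simply avoids the round trip through Python.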

### Activity 2: Subquery in the FROM Clause 

Let's use a subquery in the FROM clause to analyze high-value transactions.

<b>Step 1:</b> Write a query to calculate the average sale price. You'll reuse it as a subquery in the next step:

In [4]:
query = """
SELECT AVG(sale_price) as avg_sale_price
FROM transactions;
"""
df = pd.read_sql_query(query, conn)
print("Average sale price:", df['avg_sale_price'][0])

Average sale price: 11.150000000000007


<b>Step 2:</b> Use the query from Step 1 as a derived table in the FROM clause, joined to `transactions`, to find high-value transactions:

In [5]:
query = """
SELECT t.transaction_id, t.book_id, t.sale_price, t.store_location
FROM transactions t
JOIN (SELECT AVG(sale_price) as avg_sale_price FROM transactions) avg
ON t.sale_price > avg.avg_sale_price
LIMIT 5;
"""
df = pd.read_sql_query(query, conn)
display(df)

| | transaction_id | book_id | sale_price | store_location |
|---|---|---|---|---|
| 0 | T1001 | B1055 | 11.99 | University |
| 1 | T1003 | B1036 | 11.99 | Suburban |
| 2 | T1004 | B1071 | 13.99 | University |
| 3 | T1005 | B1075 | 12.99 | University |
| 4 | T1007 | B1047 | 12.99 | University |


 <b>💡 Tip:</b> Subqueries in the FROM clause are useful for creating derived tables that you can then join or query further.
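As a further illustration of querying a derived table, the sketch below aggregates first and filters second: the inner query computes an average sale price per store, and the outer query keeps only the stores above a threshold. The table contents and the threshold of 10 are assumptions made up for this example, not values from the Bookcycle data.

```python
import sqlite3

# Hypothetical stand-in for the transactions table; rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions "
    "(transaction_id TEXT, sale_price REAL, store_location TEXT)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        ("T1", 10.0, "University"),
        ("T2", 14.0, "University"),
        ("T3", 8.0, "Suburban"),
        ("T4", 10.0, "Suburban"),
        ("T5", 15.0, "Downtown"),
    ],
)

# The subquery in FROM behaves like a temporary table named store_sales,
# so the outer query can filter on the aggregated column directly.
store_rows = conn.execute(
    """
    SELECT store_sales.store_location, store_sales.avg_sale_price
    FROM (
        SELECT store_location, AVG(sale_price) AS avg_sale_price
        FROM transactions
        GROUP BY store_location
    ) AS store_sales
    WHERE store_sales.avg_sale_price > 10
    ORDER BY store_sales.avg_sale_price DESC
    """
).fetchall()

print(store_rows)
conn.close()
```

The same filter could be written with HAVING; the derived-table form is shown here because it generalizes to cases where the outer query needs to join or aggregate again.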

### Activity 3: Correlated Subquery

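Let's find books that have sold more than the average number of copies for their genre. Comparing each row against an aggregate of its own group is the classic use case for a correlated subquery.

<b>Step 1:</b> First, let's see the average number of copies sold per book in each genre. Because the inner query uses an inner join, books with no transactions are left out of the average: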

In [None]:
query = """
SELECT sales.genre, AVG(sales.count_per_book) AS avg_quantity
FROM (
    SELECT b.book_id, b.genre, COUNT(t.transaction_id) AS count_per_book
    FROM books b
    JOIN transactions t ON b.book_id = t.book_id
    GROUP BY b.book_id, b.genre
) sales
GROUP BY sales.genre
LIMIT 5;

"""
df = pd.read_sql_query(query, conn)
display(df)

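<b>Step 2:</b> Now, find books selling above their genre average. Note that the query below joins two derived tables on `genre`, which gives the same result as a correlated subquery; a strictly correlated subquery would instead reference the outer row's genre from inside the inner query: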

In [None]:
query = """
SELECT book_sales.title, book_sales.genre, book_sales.total_sales
FROM (
    SELECT b.book_id, b.title, b.genre, COUNT(t.transaction_id) AS total_sales
    FROM books b
    JOIN transactions t ON b.book_id = t.book_id
    GROUP BY b.book_id, b.title, b.genre
) book_sales
JOIN (
    SELECT sales.genre, AVG(sales.count_per_book) AS avg_genre_sales
    FROM (
        SELECT b.book_id, b.genre, COUNT(t.transaction_id) AS count_per_book
        FROM books b
        JOIN transactions t ON b.book_id = t.book_id
        GROUP BY b.book_id, b.genre
    ) sales
    GROUP BY sales.genre
) genre_avg ON book_sales.genre = genre_avg.genre
WHERE book_sales.total_sales > genre_avg.avg_genre_sales
LIMIT 5;

"""
df = pd.read_sql_query(query, conn)
display(df)

 <b>💡 Tip:</b> Correlated subqueries are powerful for row-by-row comparisons against aggregated subsets of data.
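The Step 2 query above reaches its result by joining derived tables. For comparison, here is a minimal sketch of a subquery that is correlated in the strict sense: the inner query refers to the current outer row through the alias `b`. To keep it short, it compares list prices rather than sales counts, and it runs on an in-memory table with invented rows (a hypothetical stand-in, not the Bookcycle data).

```python
import sqlite3

# Hypothetical stand-in for the books table; rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, genre TEXT, list_price REAL)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [
        ("Book A", "Classic", 10.0),
        ("Book B", "Classic", 14.0),
        ("Book C", "Mystery", 6.0),
        ("Book D", "Mystery", 8.0),
        ("Book E", "Mystery", 13.0),
    ],
)

# The inner query mentions b.genre, a column of the OUTER query's current row.
# Conceptually it is re-evaluated for each book, averaging only that book's genre.
above_genre_avg = conn.execute(
    """
    SELECT b.title, b.genre, b.list_price
    FROM books AS b
    WHERE b.list_price > (
        SELECT AVG(b2.list_price)
        FROM books AS b2
        WHERE b2.genre = b.genre
    )
    ORDER BY b.title
    """
).fetchall()

print(above_genre_avg)
conn.close()
```

With these rows the Classic average is 12 and the Mystery average is 9, so one book from each genre qualifies.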

### Close the Connection
It's good practice to close the database connection when you're done.

In [None]:
# Close the database connection 
conn.close()
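If you'd rather not rely on remembering to call `conn.close()`, one option is `contextlib.closing` from the standard library, sketched below on a throwaway in-memory database (the table and row are invented for the example). Note that using a `sqlite3` connection directly in a `with` statement only commits or rolls back the transaction; it does not close the connection.

```python
import sqlite3
from contextlib import closing

# closing() calls conn.close() when the block exits, even if an error occurs.
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE books (title TEXT, list_price REAL)")
    conn.execute("INSERT INTO books VALUES ('Book A', 9.99)")
    book_count = conn.execute("SELECT COUNT(*) FROM books").fetchone()[0]

# Using the connection after the block raises sqlite3.ProgrammingError.
try:
    conn.execute("SELECT 1")
    still_open = True
except sqlite3.ProgrammingError:
    still_open = False

print(book_count, still_open)
```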

<b>⚙️ Test Your Work:</b> For each query:

1. Run the code and ensure it executes without errors
2. Check that the output makes sense in the context of Bookcycle's business
3. Try modifying the LIMIT clause to see more results
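When experimenting with LIMIT, keep in mind that SQL does not guarantee row order unless the query has an ORDER BY, so a bare LIMIT returns an arbitrary subset. The sketch below, on an invented in-memory table (hypothetical data, not the Bookcycle rows), pairs LIMIT with ORDER BY and a tie-breaker column to get a repeatable "top N".

```python
import sqlite3

# Hypothetical stand-in for the books table; rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, list_price REAL)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [("Book A", 8.0), ("Book B", 15.0), ("Book C", 12.0), ("Book D", 15.0)],
)

# ORDER BY makes the result deterministic; title breaks ties on equal prices.
top_two = conn.execute(
    """
    SELECT title, list_price
    FROM books
    ORDER BY list_price DESC, title ASC
    LIMIT 2
    """
).fetchall()

print(top_two)
conn.close()
```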

## ✅ Success Checklist
- You can write a subquery in the WHERE clause
- You can use a subquery in the FROM clause
- You understand and can write a correlated subquery
- Your queries run without errors and produce meaningful results

## 🔍 Common Issues & Solutions 

- Problem: Syntax errors in subqueries 
  - Solution: Ensure each subquery is properly enclosed in parentheses

- Problem: Unexpected results from correlated subqueries  
  - Solution: Double-check that your correlated subquery references the outer query correctly
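One way the second problem can show up is through unqualified column names. In the sketch below (invented in-memory data, hypothetical table contents), the first query writes `genre = genre` inside the subquery; both names resolve to the inner table, the condition is always true, and every book is compared against the overall average. Adding table aliases restores the intended per-genre comparison.

```python
import sqlite3

# Hypothetical stand-in for the books table; rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, genre TEXT, list_price REAL)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [
        ("Book A", "Classic", 10.0),
        ("Book B", "Classic", 14.0),
        ("Book C", "Mystery", 4.0),
        ("Book D", "Mystery", 6.0),
        ("Book E", "Mystery", 11.0),
    ],
)

# Pitfall: "genre = genre" compares the inner column with itself, so the
# subquery is not correlated and returns the average over all books.
unqualified = conn.execute(
    """
    SELECT title
    FROM books
    WHERE list_price > (SELECT AVG(list_price) FROM books WHERE genre = genre)
    ORDER BY title
    """
).fetchall()

# Fix: alias both tables so the inner query explicitly references the outer row.
qualified = conn.execute(
    """
    SELECT b.title
    FROM books AS b
    WHERE b.list_price > (
        SELECT AVG(b2.list_price) FROM books AS b2 WHERE b2.genre = b.genre
    )
    ORDER BY b.title
    """
).fetchall()

print(unqualified)
print(qualified)
conn.close()
```

Neither query raises an error, which is why this kind of mistake tends to surface only as unexpected results.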

## ➡️ Summary

Congratulations on completing the Subqueries Lab! You've expanded your SQL toolkit with nested query techniques that let you tackle complex analysis tasks and extract deeper insights from the Bookcycle database. These skills will serve you well in your role as a data analyst.

### 🔑 Key Points
- Subqueries can be used in WHERE, FROM, and SELECT clauses
- Subqueries in WHERE are often used for filtering against aggregated values
- Subqueries in FROM create derived tables for further querying
- Correlated subqueries reference the outer query and are conceptually evaluated once per outer row, which makes them useful for comparing each row against its own group
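The first key point mentions the SELECT clause, which the activities don't exercise directly. As a minimal sketch, a scalar subquery in the SELECT list can attach an aggregate to every row, for example to show how far each price sits from the average. The in-memory table and its rows are invented for illustration and are not the Bookcycle data.

```python
import sqlite3

# Hypothetical stand-in for the books table; rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, list_price REAL)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [("Book A", 8.0), ("Book B", 10.0), ("Book C", 15.0)],
)

# Each scalar subquery returns a single value that is repeated on every row.
price_rows = conn.execute(
    """
    SELECT
        title,
        list_price,
        (SELECT AVG(list_price) FROM books) AS avg_price,
        list_price - (SELECT AVG(list_price) FROM books) AS diff_from_avg
    FROM books
    ORDER BY title
    """
).fetchall()

for row in price_rows:
    print(row)
conn.close()
```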