# Ungraded Lab: Join Practice Lab

## 📋 Overview 
Welcome to BookCycle's data analysis team! In this lab, you'll explore advanced SQL techniques by learning how to use JOIN operations to combine data from multiple tables. You'll help BookCycle's management understand the relationships between customers, transactions, and books, providing valuable insights for business decisions.

## 🎯 Learning Outcomes
By the end of this lab, you will be able to:

- Implement INNER JOIN to combine data from two related tables
- Use LEFT JOIN to include all records from one table and matching records from another
- Write multi-table joins to answer complex business questions
- Apply joins to real-world scenarios in a book retail context

## 📚 Dataset Information
You'll be working with three main tables in the BookCycle database:
1. <b>customers:</b> Contains customer information including IDs, join dates, and preferences
2. <b>transactions:</b> Records of book purchases, including transaction details and customer IDs
3. <b>books:</b> Inventory information about the books, including titles, authors, and prices


## 🖥️ Activities

### Activity 1: Understanding INNER JOIN 

BookCycle wants to analyze customer purchases by combining customer and transaction data.

<b>Step 1:</b> Import the necessary libraries and connect to the database:

In [1]:
import sqlite3
import pandas as pd

# Setting up the database. DO NOT edit the code given below
from db_setup import setup_database
setup_database() 

✅ Database setup complete: Tables created and populated with data!


In [2]:
# Connect to the SQLite database
conn = sqlite3.connect('bookcycle.db')

<b>Step 2:</b> Let's start with a simple INNER JOIN to get each customer's ID and join date along with their transaction details:

In [3]:
query = """
SELECT c.customer_id, c.join_date, t.transaction_id, t.date_time, t.sale_price
FROM customers c
INNER JOIN transactions t ON c.customer_id = t.customer_id
LIMIT 5;
"""

df = pd.read_sql_query(query, conn)
display(df)

| | customer_id | join_date | transaction_id | date_time | sale_price |
|---|---|---|---|---|---|
| 0 | C1012 | 2022-01-19 | T1004 | 2023-01-15 11:30:15 | 13.99 |
| 1 | C1012 | 2022-01-19 | T1011 | 2023-01-16 11:30:22 | 10.99 |
| 2 | C1012 | 2022-01-19 | T1016 | 2023-01-17 09:30:22 | 12.99 |
| 3 | C1012 | 2022-01-19 | T1024 | 2023-01-18 10:35:12 | 10.99 |
| 4 | C1012 | 2022-01-19 | T1030 | 2023-01-19 09:15:48 | 8.99 |
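To see which rows an INNER JOIN leaves out, the sketch below uses a tiny in-memory database instead of `bookcycle.db`. The rows are invented for illustration, and the connection is named `demo` (an assumption of this sketch) so it does not interfere with the lab's `conn`.

```python
import sqlite3

# Toy in-memory database; the rows below are made up for illustration only.
demo = sqlite3.connect(":memory:")
demo.executescript("""
CREATE TABLE customers (customer_id TEXT PRIMARY KEY, join_date TEXT);
CREATE TABLE transactions (
    transaction_id TEXT PRIMARY KEY,
    customer_id TEXT,
    sale_price REAL
);
INSERT INTO customers VALUES
    ('C1', '2022-01-15'), ('C2', '2022-01-16'), ('C3', '2022-01-17');
INSERT INTO transactions VALUES
    ('T1', 'C2', 10.99), ('T2', 'C2', 8.99), ('T3', 'C3', 12.99);
""")

# INNER JOIN keeps only customer/transaction pairs that satisfy the ON condition.
inner_rows = demo.execute("""
SELECT c.customer_id, t.transaction_id, t.sale_price
FROM customers c
INNER JOIN transactions t ON c.customer_id = t.customer_id
ORDER BY t.transaction_id;
""").fetchall()
demo.close()

for row in inner_rows:
    print(row)
```

Customer `C1` has no transactions in this toy data, so it does not appear in the result, while `C2` appears once per matching transaction.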


<b>Step 3: Try it yourself:</b> Write a query to get the customer's preferred store along with their transaction details:

In [None]:
query = """
<YOUR CODE HERE>
"""

df = pd.read_sql_query(query, conn)
display(df)

<b>💡 Tip:</b> Remember to include the new column in your SELECT statement and keep the join condition the same.

### Activity 2: Exploring LEFT JOIN

BookCycle wants to identify customers who haven't made any purchases yet.

<b>Step 1:</b> Here's an example of a LEFT JOIN to get all customers and their transactions (if any):

In [4]:
query = """
SELECT c.customer_id, c.join_date, t.transaction_id
FROM customers c
LEFT JOIN transactions t ON c.customer_id = t.customer_id
LIMIT 10;
"""

df = pd.read_sql_query(query, conn)
display(df)

| | customer_id | join_date | transaction_id |
|---|---|---|---|
| 0 | C1001 | 2022-01-15 | |
| 1 | C1002 | 2022-01-15 | |
| 2 | C1003 | 2022-01-16 | |
| 3 | C1004 | 2022-01-16 | |
| 4 | C1005 | 2022-01-16 | |
| 5 | C1006 | 2022-01-17 | |
| 6 | C1007 | 2022-01-17 | |
| 7 | C1008 | 2022-01-17 | |
| 8 | C1009 | 2022-01-18 | |
| 9 | C1010 | 2022-01-18 | |


<b>Step 2: Try it yourself:</b> Write a query to find customers who haven't made any purchases:

In [None]:
query = """
<YOUR CODE HERE>
"""

df = pd.read_sql_query(query, conn)
display(df)

<b>💡 Tip:</b> Use a WHERE clause with IS NULL to keep only the rows whose transaction_id is NULL.
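One detail that often trips people up is that NULL must be tested with `IS NULL`, not `=`. The sketch below shows the difference on a toy in-memory database. The `members` and `orders` tables and their rows are hypothetical stand-ins, not part of the BookCycle schema.

```python
import sqlite3

# Hypothetical toy tables, used only to illustrate the NULL test.
demo = sqlite3.connect(":memory:")
demo.executescript("""
CREATE TABLE members (member_id TEXT PRIMARY KEY);
CREATE TABLE orders (order_id TEXT PRIMARY KEY, member_id TEXT);
INSERT INTO members VALUES ('M1'), ('M2'), ('M3');
INSERT INTO orders VALUES ('O1', 'M2');
""")

# The same LEFT JOIN, differing only in how the NULL check is written.
base = """
SELECT m.member_id
FROM members m
LEFT JOIN orders o ON m.member_id = o.member_id
WHERE o.order_id {predicate}
ORDER BY m.member_id;
"""

with_is_null = demo.execute(base.format(predicate="IS NULL")).fetchall()
with_equals_null = demo.execute(base.format(predicate="= NULL")).fetchall()
demo.close()

print(with_is_null)      # members that have no matching order
print(with_equals_null)  # empty list: "= NULL" is never true in SQL
```

The `= NULL` version runs without an error, which makes the mistake easy to miss: it simply returns no rows.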

### Activity 3: Multi-table Joins 

BookCycle wants to analyze which books are popular in different store locations.

<b>Step 1:</b> Here's an example of joining three tables (customers, transactions, and books) to count how many times each title was purchased at each store location:

In [5]:
query = """
SELECT t.store_location, b.title, COUNT(*) as purchase_count
FROM customers c
JOIN transactions t ON c.customer_id = t.customer_id
JOIN books b ON t.book_id = b.book_id
GROUP BY t.store_location, b.title
LIMIT 5;
"""

df = pd.read_sql_query(query, conn)
display(df)

| | store_location | title | purchase_count |
|---|---|---|---|
| 0 | Downtown | A Christmas Carol | 4 |
| 1 | Downtown | A Tale of Two Cities | 4 |
| 2 | Downtown | Jane Eyre | 4 |
| 3 | Downtown | Persuasion | 2 |
| 4 | Downtown | Sense and Sensibility | 3 |


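<b>Step 2:</b> The query below finds the most popular book (by purchase count) for each store location. Titles that tie for the top count are all returned, so a store can appear more than once in the result: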

In [6]:
query = """
SELECT rb.store_location, rb.title, rb.purchase_count
FROM (
    SELECT 
        t.store_location, 
        b.title, 
        COUNT(*) AS purchase_count
    FROM customers c
    JOIN transactions t ON c.customer_id = t.customer_id
    JOIN books b ON t.book_id = b.book_id
    GROUP BY t.store_location, b.title
) AS rb
WHERE rb.purchase_count = (
    -- Get the max purchase count for each store_location
    SELECT MAX(sub.purchase_count)
    FROM (
        SELECT 
            t2.store_location, 
            b2.title, 
            COUNT(*) AS purchase_count
        FROM customers c2
        JOIN transactions t2 ON c2.customer_id = t2.customer_id
        JOIN books b2 ON t2.book_id = b2.book_id
        GROUP BY t2.store_location, b2.title
    ) AS sub
    WHERE sub.store_location = rb.store_location
)
ORDER BY rb.store_location, rb.title;

"""

df = pd.read_sql_query(query, conn)
display(df)

| | store_location | title | purchase_count |
|---|---|---|---|
| 0 | Downtown | A Christmas Carol | 4 |
| 1 | Downtown | A Tale of Two Cities | 4 |
| 2 | Downtown | Jane Eyre | 4 |
| 3 | Suburban | The Wind in the Willows | 5 |
| 4 | University | The Catcher in the Rye | 5 |
| 5 | University | The Scarlet Letter | 5 |


<b>💡 Tip:</b> Here you are using GROUP BY, ORDER BY, and a subquery, which you will learn more about in upcoming modules. 
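The Step 2 query writes the same grouped join twice. One possible alternative, sketched here on invented toy data, is a common table expression (`WITH`), which names the grouped result once and reuses it. The names `counts`, `best`, `top_count`, and `cte_query` are assumptions of this sketch, and the `customers` join is left out to keep the example small.

```python
import sqlite3

# Toy in-memory data; the rows are invented and far smaller than bookcycle.db.
demo = sqlite3.connect(":memory:")
demo.executescript("""
CREATE TABLE books (book_id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE transactions (
    transaction_id TEXT PRIMARY KEY,
    store_location TEXT,
    book_id TEXT
);
INSERT INTO books VALUES ('B1', 'Jane Eyre'), ('B2', 'Persuasion');
INSERT INTO transactions VALUES
    ('T1', 'Downtown', 'B1'), ('T2', 'Downtown', 'B1'),
    ('T3', 'Downtown', 'B2'), ('T4', 'Suburban', 'B2');
""")

cte_query = """
WITH counts AS (
    -- Purchases per store and title, computed once
    SELECT t.store_location, b.title, COUNT(*) AS purchase_count
    FROM transactions t
    JOIN books b ON t.book_id = b.book_id
    GROUP BY t.store_location, b.title
),
best AS (
    -- Highest purchase count within each store
    SELECT store_location, MAX(purchase_count) AS top_count
    FROM counts
    GROUP BY store_location
)
SELECT c.store_location, c.title, c.purchase_count
FROM counts c
JOIN best m ON c.store_location = m.store_location
           AND c.purchase_count = m.top_count
ORDER BY c.store_location, c.title;
"""
top_books = demo.execute(cte_query).fetchall()
demo.close()

for row in top_books:
    print(row)
```

As with the subquery version, titles that tie for the top count in a store would all be returned.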

#### Close the Connection
It's good practice to close the database connection when you're done.

In [7]:
# Close the database connection 
conn.close()
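If you prefer not to call `close()` by hand, one option is `contextlib.closing`, sketched below on a throwaway in-memory database. Note that `with sqlite3.connect(...) as conn:` on its own only commits or rolls back the transaction and leaves the connection open. The names `demo`, `answer`, and `still_open` are illustrative.

```python
import sqlite3
from contextlib import closing

# closing() calls demo.close() when the with-block exits, even on an error.
with closing(sqlite3.connect(":memory:")) as demo:
    answer = demo.execute("SELECT 1 + 1;").fetchone()[0]

# Any further use of the closed connection raises ProgrammingError.
try:
    demo.execute("SELECT 1;")
    still_open = True
except sqlite3.ProgrammingError:
    still_open = False

print(answer, still_open)
```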

## ✅ Success Checklist
- Successfully implemented INNER JOIN to combine customer and transaction data
- Used LEFT JOIN to identify customers without purchases
- Created a multi-table join to analyze book popularity by store location
- All queries run without errors and produce meaningful results

## 🔍 Common Issues & Solutions 

- Problem: Join returning unexpected number of rows
    - Solution: Double-check your join conditions and ensure you're not creating unintended Cartesian products

- Problem: Column ambiguity errors 
    - Solution: Always qualify column names with table aliases when joining tables
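Both issues can be reproduced on a toy in-memory database, as in the sketch below. The rows are invented for illustration, and the variable names are assumptions of this sketch rather than part of the lab.

```python
import sqlite3

# Toy in-memory tables with made-up rows: 3 customers, 2 transactions.
demo = sqlite3.connect(":memory:")
demo.executescript("""
CREATE TABLE customers (customer_id TEXT PRIMARY KEY);
CREATE TABLE transactions (transaction_id TEXT PRIMARY KEY, customer_id TEXT);
INSERT INTO customers VALUES ('C1'), ('C2'), ('C3');
INSERT INTO transactions VALUES ('T1', 'C1'), ('T2', 'C2');
""")

# Missing ON clause: every customer pairs with every transaction (3 x 2 rows).
cartesian_count = demo.execute(
    "SELECT COUNT(*) FROM customers c JOIN transactions t;"
).fetchone()[0]

# With the join condition, only matching pairs remain.
joined_count = demo.execute(
    "SELECT COUNT(*) FROM customers c "
    "JOIN transactions t ON c.customer_id = t.customer_id;"
).fetchone()[0]

# customer_id exists in both tables, so the unqualified name is rejected.
try:
    demo.execute(
        "SELECT customer_id FROM customers c "
        "JOIN transactions t ON c.customer_id = t.customer_id;"
    )
    ambiguity_error = None
except sqlite3.OperationalError as exc:
    ambiguity_error = str(exc)
demo.close()

print(cartesian_count, joined_count)
print(ambiguity_error)
```

Comparing a `COUNT(*)` with and without the join condition is a quick way to check whether a query is multiplying rows unintentionally.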

## ➡️ Summary
Great job completing this lab on SQL joins! You've gained valuable skills in combining data from multiple tables, which is crucial for comprehensive data analysis in real-world scenarios.

### 🔑 Key Points
- INNER JOIN combines rows from two tables based on a matching condition
- LEFT JOIN returns all rows from the left table and matching rows from the right table
- Multi-table joins allow for complex analyses across various data entities