# Module 03: JOINs & Relationships - Combining Data from Multiple Tables

**Estimated Time:** 75 minutes

## Learning Objectives

By the end of this module, you will be able to:
- Understand table relationships (one-to-many, many-to-many)
- Use INNER JOIN to combine related data
- Use LEFT JOIN to include all rows from the left table
- Perform self-joins for hierarchical data
- Join multiple tables in a single query
- Understand when to use each type of JOIN

In [None]:
# Setup
import sqlite3
import pandas as pd
from pathlib import Path

%load_ext sql

# Connect to database
DB_PATH = Path.cwd().parent / "data" / "databases" / "ecommerce.db"
conn = sqlite3.connect(DB_PATH)
%sql sqlite:///$DB_PATH

print("✓ Connected to ecommerce.db")

## 1. Understanding Table Relationships

Our e-commerce database has the following relationships:

```
categories (1) ----< (many) products
customers (1) ----< (many) orders
orders (1) ----< (many) order_items (many) >---- (1) products
```

### Types of Relationships:
- **One-to-Many (1:N)**: One category has many products
- **Many-to-Many (M:N)**: One order can contain many products, and one product can appear in many orders; the `order_items` table links the two
- **One-to-One (1:1)**: Less common, each row in one table matches exactly one row in another
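
As a rough illustration of how these relationships look in a schema, here is a minimal sketch using an in-memory SQLite database. The table and column definitions are assumptions modeled on the diagram above, not the actual `ecommerce.db` schema, and the sample rows are made up.

```python
import sqlite3

# Hypothetical miniature schema: names follow the diagram above,
# but the real ecommerce.db definitions may differ.
demo = sqlite3.connect(":memory:")
demo.executescript("""
CREATE TABLE categories (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id INTEGER REFERENCES categories(category_id)  -- 1:N link
);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY);
CREATE TABLE order_items (                                   -- M:N junction table
    order_id INTEGER REFERENCES orders(order_id),
    product_id INTEGER REFERENCES products(product_id),
    quantity INTEGER
);
INSERT INTO categories VALUES (1, 'Electronics');
INSERT INTO products VALUES (1, 'Laptop', 1), (2, 'Mouse', 1);
INSERT INTO orders VALUES (100), (101);
INSERT INTO order_items VALUES (100, 1, 1), (100, 2, 2), (101, 2, 1);
""")

# One category row is referenced by many product rows (1:N).
products_in_category = demo.execute(
    "SELECT COUNT(*) FROM products WHERE category_id = 1"
).fetchone()[0]

# The same product appears in several orders via order_items (M:N).
orders_with_mouse = demo.execute(
    "SELECT COUNT(DISTINCT order_id) FROM order_items WHERE product_id = 2"
).fetchone()[0]

print(products_in_category, orders_with_mouse)
demo.close()
```

The foreign key lives on the "many" side of a 1:N relationship, while an M:N relationship needs a separate table holding one foreign key per side.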

In [None]:
%%sql
-- Review our tables
SELECT name FROM sqlite_master WHERE type='table' ORDER BY name

## 2. INNER JOIN: Matching Rows Only

INNER JOIN returns only rows that have matching values in both tables.

### Syntax
```sql
SELECT columns
FROM table1
INNER JOIN table2 ON table1.column = table2.column;
```
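
To see what "matching rows only" means in practice, the sketch below runs an INNER JOIN on a tiny in-memory database. The tables and rows are invented for illustration and only borrow their names from this module's schema.

```python
import sqlite3

# Made-up sample data: one category without products,
# and one product without a category.
demo = sqlite3.connect(":memory:")
demo.executescript("""
CREATE TABLE categories (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id INTEGER
);
INSERT INTO categories VALUES (1, 'Electronics'), (2, 'Books');
INSERT INTO products VALUES (1, 'Laptop', 1), (2, 'Mystery Item', NULL);
""")

inner_rows = demo.execute("""
    SELECT p.product_name, c.category_name
    FROM products p
    INNER JOIN categories c ON p.category_id = c.category_id
    ORDER BY p.product_name
""").fetchall()

# Only the pair that satisfies the ON condition survives.
print(inner_rows)
demo.close()
```

Both the empty `Books` category and the uncategorized product are absent from the result, because neither has a partner row on the other side of the join.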

In [None]:
%%sql
-- Products with their category names
SELECT 
    p.product_name,
    c.category_name,
    p.price
FROM products p
INNER JOIN categories c ON p.category_id = c.category_id
ORDER BY c.category_name, p.product_name
LIMIT 15

In [None]:
%%sql
-- Orders with customer information
SELECT 
    o.order_id,
    o.order_date,
    c.first_name,
    c.last_name,
    c.email,
    o.total_amount
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
ORDER BY o.order_date DESC
LIMIT 10

In [None]:
%%sql
-- INNER JOIN with WHERE clause
SELECT 
    p.product_name,
    c.category_name,
    p.price,
    p.stock_quantity
FROM products p
INNER JOIN categories c ON p.category_id = c.category_id
WHERE p.price > 100
ORDER BY p.price DESC
LIMIT 10

In [None]:
%%sql
-- Using column aliases (AS) for readable output headers, alongside table aliases p and c
SELECT 
    p.product_name AS "Product",
    c.category_name AS "Category",
    p.price AS "Price ($)",
    p.stock_quantity AS "In Stock"
FROM products p
INNER JOIN categories c ON p.category_id = c.category_id
WHERE c.category_name = 'Electronics'
ORDER BY p.price DESC

## 3. LEFT JOIN: Include All Left Table Rows

LEFT JOIN returns all rows from the left table, and matching rows from the right table. If no match exists, NULL values are returned for right table columns.

**Use Case:** Find all customers, even those who haven't placed orders.
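
The following sketch shows the NULL-filling behavior on a small in-memory database, and why counting a right-table column differs from `COUNT(*)` after a LEFT JOIN. The customers and orders here are hypothetical sample rows, not data from `ecommerce.db`.

```python
import sqlite3

# Hypothetical data: Ada has two orders, Grace has none.
demo = sqlite3.connect(":memory:")
demo.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES (10, 1), (11, 1);
""")

# Every customer appears; unmatched right-table columns come back as NULL (None).
left_rows = demo.execute("""
    SELECT c.first_name, o.order_id
    FROM customers c
    LEFT JOIN orders o ON c.customer_id = o.customer_id
    ORDER BY c.customer_id, o.order_id
""").fetchall()

# COUNT(column) skips NULLs, while COUNT(*) counts the NULL-padded row too.
counts = demo.execute("""
    SELECT c.first_name, COUNT(o.order_id), COUNT(*)
    FROM customers c
    LEFT JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.customer_id, c.first_name
    ORDER BY c.customer_id
""").fetchall()

print(left_rows)
print(counts)
demo.close()
```

This is why order counts over a LEFT JOIN should count a column from the right table such as `o.order_id`: `COUNT(*)` would report 1 for a customer with no orders.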

In [None]:
%%sql
-- All customers with their order count (including those with 0 orders)
SELECT 
    c.customer_id,
    c.first_name,
    c.last_name,
    COUNT(o.order_id) AS order_count
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name
ORDER BY order_count DESC
LIMIT 15

In [None]:
%%sql
-- Find customers who have never placed an order
SELECT 
    c.customer_id,
    c.first_name,
    c.last_name,
    c.email
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL
LIMIT 10

In [None]:
%%sql
-- Categories with product counts (including empty categories)
SELECT 
    c.category_name,
    COUNT(p.product_id) AS product_count
FROM categories c
LEFT JOIN products p ON c.category_id = p.category_id
GROUP BY c.category_id, c.category_name
ORDER BY product_count DESC

## 4. Multi-Table JOINs

You can join multiple tables in a single query by chaining JOIN clauses.
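
As a small sketch of chaining, the example below joins four tables in an in-memory database. Each additional JOIN attaches more columns to the rows produced so far, and the result ends up with one row per order item. All table contents are invented for illustration.

```python
import sqlite3

# Hypothetical data: one customer, one order, two line items.
demo = sqlite3.connect(":memory:")
demo.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE order_items (
    order_id INTEGER,
    product_id INTEGER,
    quantity INTEGER,
    unit_price REAL
);
INSERT INTO customers VALUES (1, 'Ada');
INSERT INTO orders VALUES (10, 1);
INSERT INTO products VALUES (1, 'Laptop'), (2, 'Mouse');
INSERT INTO order_items VALUES (10, 1, 1, 900.0), (10, 2, 2, 25.0);
""")

chained_rows = demo.execute("""
    SELECT o.order_id,
           c.first_name,
           p.product_name,
           oi.quantity * oi.unit_price AS item_total
    FROM orders o
    INNER JOIN customers c ON o.customer_id = c.customer_id   -- who ordered
    INNER JOIN order_items oi ON o.order_id = oi.order_id     -- what lines
    INNER JOIN products p ON oi.product_id = p.product_id     -- which product
    ORDER BY p.product_name
""").fetchall()

print(chained_rows)
demo.close()
```

The single order appears twice in the output because it has two line items; order-level values such as the customer name repeat on every line.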

In [None]:
%%sql
-- Order details with customer and product information
SELECT 
    o.order_id,
    c.first_name || ' ' || c.last_name AS customer_name,
    o.order_date,
    p.product_name,
    oi.quantity,
    oi.unit_price,
    oi.quantity * oi.unit_price AS item_total
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
INNER JOIN order_items oi ON o.order_id = oi.order_id
INNER JOIN products p ON oi.product_id = p.product_id
ORDER BY o.order_date DESC
LIMIT 20

In [None]:
%%sql
-- Products sold with category and order information
SELECT 
    cat.category_name,
    p.product_name,
    COUNT(DISTINCT o.order_id) AS times_ordered,
    SUM(oi.quantity) AS total_quantity_sold
FROM products p
INNER JOIN categories cat ON p.category_id = cat.category_id
INNER JOIN order_items oi ON p.product_id = oi.product_id
INNER JOIN orders o ON oi.order_id = o.order_id
GROUP BY cat.category_name, p.product_name
ORDER BY total_quantity_sold DESC
LIMIT 15

In [None]:
%%sql
-- Customer order summary with product categories
SELECT 
    c.first_name || ' ' || c.last_name AS customer,
    cat.category_name,
    COUNT(DISTINCT o.order_id) AS orders,
    SUM(oi.quantity) AS items_purchased,
    SUM(oi.quantity * oi.unit_price) AS total_spent
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
INNER JOIN order_items oi ON o.order_id = oi.order_id
INNER JOIN products p ON oi.product_id = p.product_id
INNER JOIN categories cat ON p.category_id = cat.category_id
GROUP BY c.customer_id, customer, cat.category_name
ORDER BY total_spent DESC
LIMIT 20

## 5. Self-Joins: Joining a Table to Itself

Self-joins are useful for hierarchical data like employee-manager relationships.

Let's use the employees database for this example.
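
Before querying the real data, here is a minimal self-join sketch on an in-memory table. The column names `employee_id` and `manager_id` are assumed to mirror `employees.db`, and the four rows are made up.

```python
import sqlite3

# Hypothetical hierarchy: Dana manages Eli and Fay; Eli manages Gus.
demo = sqlite3.connect(":memory:")
demo.executescript("""
CREATE TABLE employees (
    employee_id INTEGER PRIMARY KEY,
    first_name TEXT,
    manager_id INTEGER  -- refers back to employees.employee_id
);
INSERT INTO employees VALUES
    (1, 'Dana', NULL),
    (2, 'Eli', 1),
    (3, 'Fay', 1),
    (4, 'Gus', 2);
""")

# The same table plays two roles: e is the employee, m is that employee's manager.
pairs = demo.execute("""
    SELECT e.first_name AS employee, m.first_name AS manager
    FROM employees e
    LEFT JOIN employees m ON e.manager_id = m.employee_id
    ORDER BY e.employee_id
""").fetchall()

print(pairs)
demo.close()
```

Using LEFT JOIN keeps the top of the hierarchy in the result: Dana has no manager, so her `manager` value is NULL. An INNER JOIN would drop her row.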

In [None]:
# Connect to employees database
EMP_DB_PATH = Path.cwd().parent / "data" / "databases" / "employees.db"
emp_conn = sqlite3.connect(EMP_DB_PATH)
%sql sqlite:///$EMP_DB_PATH

print("✓ Connected to employees.db")

In [None]:
%%sql
-- View employees table structure
SELECT * FROM employees LIMIT 5

In [None]:
%%sql
-- Self-join: Employees with their managers
SELECT 
    e.first_name || ' ' || e.last_name AS employee,
    e.job_title,
    m.first_name || ' ' || m.last_name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.employee_id
ORDER BY e.employee_id
LIMIT 15

In [None]:
%%sql
-- Find all employees who report to a specific manager
SELECT 
    m.first_name || ' ' || m.last_name AS manager,
    e.first_name || ' ' || e.last_name AS employee,
    e.job_title
FROM employees m
INNER JOIN employees e ON m.employee_id = e.manager_id
WHERE m.employee_id = 1
ORDER BY e.last_name

## 6. Combining Different JOIN Types

You can mix INNER JOIN and LEFT JOIN in the same query.
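
The sketch below mixes the two join types on an in-memory database: the INNER JOIN requires every product to have a category, while the LEFT JOIN keeps products that were never ordered. The schema is a simplified assumption and the rows are invented.

```python
import sqlite3

# Hypothetical data: the Laptop was ordered twice, the Mouse never.
demo = sqlite3.connect(":memory:")
demo.executescript("""
CREATE TABLE categories (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id INTEGER
);
CREATE TABLE order_items (
    order_item_id INTEGER PRIMARY KEY,
    product_id INTEGER,
    quantity INTEGER
);
INSERT INTO categories VALUES (1, 'Electronics');
INSERT INTO products VALUES (1, 'Laptop', 1), (2, 'Mouse', 1);
INSERT INTO order_items VALUES (1, 1, 1), (2, 1, 3);
""")

mixed_rows = demo.execute("""
    SELECT p.product_name,
           c.category_name,
           COUNT(oi.order_item_id) AS times_ordered
    FROM products p
    INNER JOIN categories c ON p.category_id = c.category_id   -- category is required
    LEFT JOIN order_items oi ON p.product_id = oi.product_id   -- orders are optional
    GROUP BY p.product_id, p.product_name, c.category_name
    ORDER BY p.product_name
""").fetchall()

print(mixed_rows)
demo.close()
```

A rule of thumb that follows from this: use INNER JOIN for relationships every row must have, and LEFT JOIN for relationships that may be missing.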

In [None]:
# Switch back to ecommerce database
%sql sqlite:///$DB_PATH

In [None]:
%%sql
-- All products with category info and optional order information
SELECT 
    p.product_name,
    c.category_name,
    p.price,
    COUNT(oi.order_item_id) AS times_ordered
FROM products p
INNER JOIN categories c ON p.category_id = c.category_id
LEFT JOIN order_items oi ON p.product_id = oi.product_id
GROUP BY p.product_id, p.product_name, c.category_name, p.price
ORDER BY times_ordered DESC
LIMIT 15

## 7. Real-World Examples

Practical business queries using JOINs.
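
One detail worth knowing when reading reports built on LEFT JOIN: `SUM` over a group with no matching rows yields NULL rather than 0. The sketch below demonstrates this on made-up data and shows `COALESCE` as one possible way to display 0 instead; whether that is desirable depends on the report.

```python
import sqlite3

# Hypothetical data: Ada has two orders, Grace has none.
demo = sqlite3.connect(":memory:")
demo.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    total_amount REAL
);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES (10, 1, 120.0), (11, 1, 80.0);
""")

summary = demo.execute("""
    SELECT c.first_name,
           COUNT(o.order_id)                AS total_orders,
           SUM(o.total_amount)              AS total_spent,       -- NULL when no orders
           COALESCE(SUM(o.total_amount), 0) AS total_spent_or_zero
    FROM customers c
    LEFT JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.customer_id, c.first_name
    ORDER BY c.customer_id
""").fetchall()

print(summary)
demo.close()
```

In SQLite, NULL sorts below every other value, so with `ORDER BY total_spent DESC` customers without orders appear at the bottom of the list.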

In [None]:
%%sql
-- Example 1: Customer Purchase History
SELECT 
    c.customer_id,
    c.first_name || ' ' || c.last_name AS customer_name,
    c.email,
    COUNT(DISTINCT o.order_id) AS total_orders,
    SUM(o.total_amount) AS total_spent,
    MAX(o.order_date) AS last_order_date
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, customer_name, c.email
ORDER BY total_spent DESC
LIMIT 10

In [None]:
%%sql
-- Example 2: Product Sales Performance
SELECT 
    p.product_name,
    c.category_name,
    COUNT(oi.order_item_id) AS times_sold,
    SUM(oi.quantity) AS total_quantity,
    SUM(oi.quantity * oi.unit_price) AS total_revenue,
    p.stock_quantity AS current_stock
FROM products p
INNER JOIN categories c ON p.category_id = c.category_id
LEFT JOIN order_items oi ON p.product_id = oi.product_id
GROUP BY p.product_id, p.product_name, c.category_name, p.stock_quantity
ORDER BY total_revenue DESC
LIMIT 15

In [None]:
%%sql
-- Example 3: Recent Orders with Full Details
SELECT 
    o.order_id,
    o.order_date,
    c.first_name || ' ' || c.last_name AS customer,
    c.city,
    o.status,
    COUNT(oi.order_item_id) AS items_count,
    o.total_amount
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
LEFT JOIN order_items oi ON o.order_id = oi.order_id
GROUP BY o.order_id, o.order_date, customer, c.city, o.status, o.total_amount
ORDER BY o.order_date DESC
LIMIT 10

In [None]:
%%sql
-- Example 4: Category Performance Report
SELECT 
    c.category_name,
    COUNT(DISTINCT p.product_id) AS product_count,
    COUNT(DISTINCT o.order_id) AS orders_count,
    SUM(oi.quantity) AS total_items_sold,
    SUM(oi.quantity * oi.unit_price) AS total_revenue
FROM categories c
LEFT JOIN products p ON c.category_id = p.category_id
LEFT JOIN order_items oi ON p.product_id = oi.product_id
LEFT JOIN orders o ON oi.order_id = o.order_id
GROUP BY c.category_id, c.category_name
ORDER BY total_revenue DESC

## 8. Exercises

Practice what you've learned with these exercises.

### Exercise 1: High-Value Customer Orders
Find all orders over $200 with customer name, email, order date, and total amount. Sort by total amount descending.

In [None]:
%%sql
-- Your code here

### Exercise 2: Products Never Ordered
Find all products that have never been ordered. Include product name, category name, and price.

In [None]:
%%sql
-- Your code here

### Exercise 3: Order Details Report
Create a detailed order report showing order_id, customer name, product name, quantity, unit price, and line total (quantity * unit_price) for order_id = 1.

In [None]:
%%sql
-- Your code here

### Exercise 4: Customer Category Preferences
For each customer who has placed orders, show which category they've spent the most money in. Include customer name, category name, and total spent in that category.

In [None]:
%%sql
-- Your code here

### Exercise 5: Top Products by Category
For each category, find the top 3 best-selling products by quantity sold. Include category name, product name, and total quantity sold.

In [None]:
%%sql
-- Your code here

## Summary

In this module, you learned:
- ✓ Understanding table relationships (one-to-many, many-to-many)
- ✓ Using INNER JOIN to combine matching rows
- ✓ Using LEFT JOIN to include all left table rows
- ✓ Performing multi-table JOINs
- ✓ Using self-joins for hierarchical data
- ✓ Combining different JOIN types in one query

**Key Takeaways:**
- INNER JOIN: Only matching rows from both tables
- LEFT JOIN: All rows from left table + matches from right
- Use table aliases for clarity (e.g., customers c)
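- JOIN order affects readability, and in a LEFT JOIN it decides which table keeps all of its rows; for INNER JOINs the query planner usually chooses the execution order itself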
- Self-joins use the same table twice with different aliases

**Next:** Module 04 - Aggregation & Grouping

In [None]:
# Cleanup
conn.close()
emp_conn.close()
print("✓ Database connections closed")