# Lab Module 5: Joining Tables

(Run the cell below first to ensure connectivity.)

In [None]:
%load_ext sql

%sql postgresql://admin:password@postgres:5432/postgres

## Challenge 1: The First Link
- **Context**: The Logistics team needs to verify which state every order is being shipped to. The `orders` table only has IDs, so we need to link it to customer data.
- **Task**: Write a query to INNER JOIN the `orders` table with the `customers` table using the `customer_id` column. Select the `order_id` and the `customer_state`. Limit the results to 5 rows.
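
As a warm-up that does not give away the solution, here is a minimal sketch of an `INNER JOIN` on two hypothetical toy tables (`authors` and `books`, which are not part of the lab schema). It runs through Python's built-in `sqlite3` module instead of the lab's PostgreSQL connection, which is an assumption made only to keep the example self-contained; the `INNER JOIN ... ON` syntax is the same in both.

```python
import sqlite3

# In-memory database with two hypothetical toy tables (not the lab schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (author_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE books (book_id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'BR'), (2, 'PT'), (3, 'US');
    INSERT INTO books VALUES (10, 1, 'Alpha'), (11, 1, 'Beta'), (12, 2, 'Gamma');
""")

# Each book row is paired with the author row that shares its author_id.
# Author 3 has no books, so it does not appear in an INNER JOIN result.
rows = conn.execute("""
    SELECT books.book_id, authors.country
    FROM books
    INNER JOIN authors ON books.author_id = authors.author_id
    ORDER BY books.book_id
    LIMIT 5
""").fetchall()
print(rows)
```

The shared key column (`author_id` here) plays the same role that `customer_id` plays in the task.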

In [None]:
%%sql
-- Write your solution here


## Challenge 2: The Art of Aliasing
- **Context**: Writing out full table names is tedious. We need to standardize our code using aliases to make it more readable. 
- **Task**: Re-write the previous query to list `order_id` and `customer_city`. This time, give `orders` the alias `o` and `customers` the alias `c`. Use these aliases in your SELECT and ON clauses.

In [None]:
%%sql
-- Write your solution here


## Challenge 3: Joining Order Details
- **Context**: The Finance team wants to know the price of items sold in specific orders. This data lives in the `order_items` table.
- **Task**: Perform an INNER JOIN between `orders` and `order_items`. Select the `order_purchase_timestamp` and the `price` of the item. Limit to 10 rows.

In [None]:
%%sql
-- Write your solution here


## Challenge 4: Product Category Lookup
- **Context**: We have product IDs in the order items, but we need the human-readable category names for a report. 
- **Task**: Join `order_items` with `products`. Return the `order_id`, `product_id`, and `product_category_name`. Filter the results to show only items where the `product_category_name` is 'pet_shop'.

In [None]:
%%sql
-- Write your solution here

## Challenge 5: Seller Locations
- **Context**: We are analyzing our supply chain. We need to know the city and state of the seller for every item sold. 
- **Task**: Join `order_items` and `sellers`. Select the `order_id`, `seller_id`, and the seller's `seller_city` and `seller_state`.

In [None]:
%%sql
-- Write your solution here

## Challenge 6: The Unsold Products (Left Join)
- **Context**: The inventory manager wants to identify products that have been registered but never sold. 
- **Task**: Use a LEFT JOIN to combine `products` (left) with `order_items` (right). Select the `product_category_name` and the `order_id`. Filter the results to only show rows where the `order_id` is `NULL`.
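
The "keep everything on the left, then look for missing matches" pattern can be sketched on hypothetical toy tables (`authors` and `books`, not the lab schema), using Python's built-in `sqlite3` as a stand-in for the lab database. This is an illustration of the technique under those assumptions, not the solution to the task.

```python
import sqlite3

# Hypothetical toy tables: Carla is an author with no books.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (author_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (book_id INTEGER PRIMARY KEY, author_id INTEGER);
    INSERT INTO authors VALUES (1, 'Ana'), (2, 'Bruno'), (3, 'Carla');
    INSERT INTO books VALUES (10, 1), (11, 1), (12, 2);
""")

# LEFT JOIN keeps every author; authors without a matching book get NULL
# in the columns that come from books. Filtering on IS NULL isolates them.
unmatched = conn.execute("""
    SELECT a.name, b.book_id
    FROM authors AS a
    LEFT JOIN books AS b ON b.author_id = a.author_id
    WHERE b.book_id IS NULL
""").fetchall()
print(unmatched)
```

Note that the `NULL` test uses `IS NULL`; a comparison such as `= NULL` never evaluates to true in SQL.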

In [None]:
%%sql
-- Write your solution here

## Challenge 7: The "Right" Way (Right Join)
- **Context**: Let's practice the inverse. We want to list all order items and their associated product categories, ensuring we don't lose any order items even if the product lookup fails (though in this schema, referential integrity usually prevents that). 
- **Task**: Use a RIGHT JOIN between `products` (left) and `order_items` (right). Select the `product_category_name` and `price`. Limit to 10 rows.

In [None]:
%%sql
-- Write your solution here

## Challenge 8: Multi-Table Chain (3 Tables)
- **Context**: Management needs a full view of an order: Who bought it, and what did they buy? 
- **Task**: Join `customers` to `orders`, and then join `orders` to `order_items`. Select `customer_city`, `order_status`, and `price`.

In [None]:
%%sql
-- Write your solution here

## Challenge 9: Revenue by State (Aggregation + Join)
- **Context**: Which state generates the most revenue? We need to connect customer locations to item prices. 
- **Task**: Join `customers`, `orders`, and `order_items`. Group by `customer_state` and calculate the SUM of `price` as `total_revenue`. Order by `total_revenue` descending.
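
Chaining two joins and then aggregating can be sketched as follows on three hypothetical toy tables (`authors`, `books`, `sales`), which are assumptions for illustration and not the lab schema. Python's built-in `sqlite3` is used here only so the example runs on its own.

```python
import sqlite3

# Hypothetical toy tables: authors -> books -> sales.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (author_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE books (book_id INTEGER PRIMARY KEY, author_id INTEGER);
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, book_id INTEGER, price REAL);
    INSERT INTO authors VALUES (1, 'BR'), (2, 'PT');
    INSERT INTO books VALUES (10, 1), (11, 2), (12, 1);
    INSERT INTO sales VALUES (100, 10, 10.0), (101, 11, 30.0),
                             (102, 12, 5.0), (103, 10, 10.0);
""")

# The middle table (books) links the other two; the aggregate is computed
# once per group, and the alias can be reused in ORDER BY.
revenue = conn.execute("""
    SELECT a.country, SUM(s.price) AS total_revenue
    FROM authors AS a
    JOIN books AS b ON b.author_id = a.author_id
    JOIN sales AS s ON s.book_id = b.book_id
    GROUP BY a.country
    ORDER BY total_revenue DESC
""").fetchall()
print(revenue)
```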

In [None]:
%%sql
-- Write your solution here

## Challenge 10: Late Shipping Analysis
- **Context**: We need to find orders where the actual delivery to the carrier happened after the shipping limit date. 
- **Task**: Join `orders` and `order_items`. Filter for rows where `order_delivered_carrier_date` is greater than `shipping_limit_date`. Select `order_id` and both dates.

In [None]:
%%sql
-- Write your solution here


## Challenge 11: Seller Freight Auditing
- **Context**: The logistics team wants to compare the average freight value charged by sellers in different states. 
- **Task**: Join `sellers` and `order_items`. Group the results by `seller_state`. Calculate the average `freight_value`. Sort alphabetically by `seller_state`.

In [None]:
%%sql
-- Write your solution here

## Challenge 12: High-Value Tech Products
- **Context**: We want to find specific products in the 'telephony' category that have sold for more than 500.00. 
- **Task**: Join `products` and `order_items`. Filter for `product_category_name` = 'telephony' AND `price` > 500. Select the `product_id` and `price`.

In [None]:
%%sql
-- Write your solution here

## Challenge 13: Local Transactions (Matching Cities)
- **Context**: We want to identify orders where the customer and the seller are located in the same city. 
- **Task**: Join `customers`, `orders`, `order_items`, and `sellers`. Filter for rows where `customer_city` equals `seller_city`. Select `order_id` and the shared city name.

In [None]:
%%sql
-- Write your solution here

## Challenge 14: Cross Join (Planning)
- **Context**: The marketing team wants a list of all possible combinations of product categories and customer states to plan a massive ad campaign matrix. 
- **Task**: Perform a CROSS JOIN between `products` and `customers`. Select distinct `product_category_name` and `customer_state`. Limit to 20 rows. (Note: this query will take some time to run. Why do you think that is?)
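
As a hint for the question in the task, the sketch below counts how many rows a `CROSS JOIN` pairs up before `DISTINCT` is applied. The tables `items` and `regions` are hypothetical toy tables, and Python's built-in `sqlite3` stands in for the lab database.

```python
import sqlite3

# Hypothetical toy tables with repeated values in the selected columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (item_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE regions (region_id INTEGER PRIMARY KEY, state TEXT);
    INSERT INTO items VALUES (1, 'toys'), (2, 'toys'), (3, 'books'), (4, 'books');
    INSERT INTO regions VALUES (1, 'SP'), (2, 'SP'), (3, 'RJ');
""")

# A CROSS JOIN has no ON clause: every left row is paired with every right row.
paired_rows = conn.execute(
    "SELECT COUNT(*) FROM items CROSS JOIN regions"
).fetchone()[0]

# DISTINCT then collapses the pairs down to the unique combinations.
distinct_pairs = conn.execute("""
    SELECT DISTINCT i.category, r.state
    FROM items AS i
    CROSS JOIN regions AS r
    ORDER BY i.category, r.state
""").fetchall()

print(paired_rows)     # 4 items x 3 regions
print(distinct_pairs)
```

Compare the number of paired rows with the number of distinct combinations, then think about what those numbers look like for tables with many thousands of rows.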

In [None]:
%%sql
-- Write your solution here

## Challenge 15: Self Join (Seller Neighbors)
- **Context**: We want to find pairs of sellers who are located in the same city to organize local meetup events. 
- **Task**: Perform a self-join on the `sellers` table. Aliases are required (e.g., `s1` and `s2`). Match them where `seller_city` is the same, but `seller_id` is different (to avoid matching a seller with themselves). Select the `seller_city` and both seller IDs. Limit to 10 rows.
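
A self-join treats one table as if it were two, which is why the aliases are mandatory. The sketch below uses a hypothetical toy table `shops` (not the lab schema) and Python's built-in `sqlite3`, purely as an illustration of the pattern.

```python
import sqlite3

# Hypothetical toy table: two shops share a city, one is alone.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shops (shop_id INTEGER PRIMARY KEY, city TEXT);
    INSERT INTO shops VALUES (1, 'Lisbon'), (2, 'Lisbon'), (3, 'Porto');
""")

# The same table appears twice under different aliases. The <> condition
# removes pairs where a shop would be matched with itself.
pairs = conn.execute("""
    SELECT s1.city, s1.shop_id, s2.shop_id
    FROM shops AS s1
    JOIN shops AS s2
      ON s1.city = s2.city
     AND s1.shop_id <> s2.shop_id
    ORDER BY s1.shop_id, s2.shop_id
""").fetchall()
print(pairs)
```

With `<>`, each pair shows up twice, once in each order. Replacing `<>` with `<` would keep only one row per pair, if that is what a report needs.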

In [None]:
%%sql
-- Write your solution here

## Challenge 16: Category Counts per Order
- **Context**: We want to know how many items are in each order, but we also want the category name attached. 
- **Task**: Join `order_items` and `products`. Group by `order_id` and `product_category_name`. Count the number of items. Order by the count descending.

In [None]:
%%sql
-- Write your solution here

## Challenge 17: Full Outer Join (Conceptual)
- **Context**: We want a list of all product categories and all seller cities, aligning them where possible but keeping records even if there is no match (conceptually, just to see the coverage).
- **Task**: Perform a FULL OUTER JOIN between `products` and `sellers` on `product_category_name` = `seller_city` (Note: This is a nonsense join logically, but tests the syntax). Select `product_category_name` and `seller_city`. Limit to 10.

In [None]:
%%sql
-- Write your solution here

## Challenge 18: Heavy Items Analysis
- **Context**: Logistics needs to know which sellers are shipping the heaviest items. 
- **Task**: Join `sellers`, `order_items`, and `products`. Filter for items with `product_weight_g` > 10000 (10 kg). Select `seller_id` and the average weight of items they sell, aliased as `avg_heavy_weight`. Group by `seller_id`.

In [None]:
%%sql
-- Write your solution here

## Challenge 19: The "Deuce" Categories
- **Context**: We want to find product categories that have been ordered exactly twice.
- **Task**: Perform a LEFT JOIN starting with `products` and joining to `order_items`. Group by `product_category_name`. Filter using HAVING to find categories where the count of `order_id` is 2.
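
The difference between `WHERE` (filters rows before grouping) and `HAVING` (filters groups after aggregation) can be sketched on hypothetical toy tables (`authors` and `books`, not the lab schema), again using Python's built-in `sqlite3` as a stand-in for the lab database.

```python
import sqlite3

# Hypothetical toy tables: BR has two books, PT has one, US has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (author_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE books (book_id INTEGER PRIMARY KEY, author_id INTEGER);
    INSERT INTO authors VALUES (1, 'BR'), (2, 'PT'), (3, 'US');
    INSERT INTO books VALUES (10, 1), (11, 1), (12, 2);
""")

# COUNT(column) skips NULLs, so a country with no books counts as 0 even
# though the LEFT JOIN still produces one row for it.
twice = conn.execute("""
    SELECT a.country
    FROM authors AS a
    LEFT JOIN books AS b ON b.author_id = a.author_id
    GROUP BY a.country
    HAVING COUNT(b.book_id) = 2
""").fetchall()
print(twice)
```

Counting a column from the right-hand table, rather than `COUNT(*)`, is what keeps unmatched groups at zero instead of one.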

In [None]:
%%sql
-- Write your solution here

## Challenge 20: The Master View
- **Context**: Generate a comprehensive report for the order with `order_id` = '00010242fe8c5a6d1ba2dd792cb16214' (a specific real ID from the dataset).
- **Task**: Join all 5 tables (`orders`, `customers`, `order_items`, `products`, `sellers`). Select the `order_id`, `customer_city`, `product_category_name`, `price`, and `seller_city`. Filter for orders where `order_id` = '00010242fe8c5a6d1ba2dd792cb16214'.

In [None]:
%%sql
-- Write your solution here