# Lab Module: Chapter 8 - Subqueries

We will use the Olist e-commerce dataset for the first set of questions. The dataset is created in the `00_Start_Here.ipynb` notebook, which should be run before attempting these challenges. The Python code in that notebook describes each table and the relationships between them. For the schema, review the `CREATE TABLE` statements in that notebook.

(Run the cell below to ensure connectivity)

In [1]:
%load_ext sql

%config SqlMagic.autopandas = True
%config SqlMagic.feedback = True
%config SqlMagic.displaycon = False

%sql postgresql://admin:password@postgres:5432/postgres

## Challenge 1: Single Row Retrieval
- **Context**: We need to identify the specific product category associated with the most expensive single item ever sold on the platform to feature it in a newsletter. 
- **Task**: Write a query to find the `product_category_name` from the `products` table for the product that had the highest individual `price` in the `order_items` table.

In [None]:
%%sql
-- WRITE YOUR SOLUTION HERE
SELECT product_category_name
FROM products
WHERE product_id = (
    SELECT product_id
    FROM order_items
    ORDER BY price DESC
    LIMIT 1
);

The answer is: utilidades_domesticas
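
The query above uses `LIMIT 1` so that the subquery returns a single `product_id`, which is what `=` expects; PostgreSQL raises an error when a subquery used as an expression returns more than one row. If several products tie for the highest price, `LIMIT 1` without a tiebreaker keeps an arbitrary one of them. The sketch below illustrates the difference on hypothetical toy tables (`toy_products`, `toy_items`) in an in-memory SQLite database. The table names and rows are assumptions made for illustration and are not part of the Olist data.

```python
import sqlite3

# Hypothetical toy data: p2 and p3 tie for the highest price.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE toy_products (product_id TEXT, category TEXT);
    CREATE TABLE toy_items (product_id TEXT, price REAL);
    INSERT INTO toy_products VALUES
        ('p1', 'books'), ('p2', 'toys'), ('p3', 'garden');
    INSERT INTO toy_items VALUES
        ('p1', 15.0), ('p2', 99.0), ('p3', 99.0), ('p1', 20.0);
""")

# Single-row subquery: LIMIT 1 guarantees one value for "=".
# The extra ORDER BY key makes the choice among ties deterministic.
single = conn.execute("""
    SELECT category
    FROM toy_products
    WHERE product_id = (
        SELECT product_id
        FROM toy_items
        ORDER BY price DESC, product_id
        LIMIT 1
    )
""").fetchall()

# Alternative: IN plus MAX keeps every product tied for the top price.
tied = conn.execute("""
    SELECT category
    FROM toy_products
    WHERE product_id IN (
        SELECT product_id
        FROM toy_items
        WHERE price = (SELECT MAX(price) FROM toy_items)
    )
    ORDER BY category
""").fetchall()

print(single)
print(tied)
```

Which form is appropriate depends on whether the report should show one winner or all tied winners.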

## Challenge 2: Multi-Column Subquery
- **Context**: We are looking for "Super Users" who match the location profile of our top sellers.
- **Task**: Find all customers who live in the same `customer_city` and `customer_state` as any seller (`seller_city`, `seller_state` from the `sellers` table). Select `customer_id`, `customer_city`, and `customer_state` from the `customers` table. Order the results by `customer_id` descending and limit the results to 5.

In [None]:
%%sql
-- WRITE YOUR SOLUTION HERE

The answer is:

|	|customer_id|	customer_city|	customer_state|
|:---|:---|:---|:---|
|0|	ffffe8b65bbe3087b653a978c870db99|	osasco|	SP|
|1|	ffffa3172527f765de70084a7e53aae8|	alfenas|	MG|
|2|	ffff42319e9b2d713724ae527742af25|	taboao da serra|	SP|
|3|	fffeda5b6d849fbd39689bb92087f431|	rio de janeiro|	RJ|
|4|	fffcb937e9dd47a13f05ecb8290f4d3e|	sao paulo|	SP|
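
A multi-column subquery compares the column pair as a unit, which is not the same as checking each column separately. The sketch below shows the difference on hypothetical toy tables (`toy_customers`, `toy_sellers`) in an in-memory SQLite database; the names and rows are illustrative assumptions, not Olist data. It also assumes a SQLite build with row-value support (3.15 or newer), which PostgreSQL supports with the same `(a, b) IN (SELECT ...)` syntax.

```python
import sqlite3

# Hypothetical toy data: c2 shares a city with one seller and a state
# with a different seller, but matches no seller on both at once.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE toy_customers (id TEXT, city TEXT, state TEXT);
    CREATE TABLE toy_sellers (id TEXT, city TEXT, state TEXT);
    INSERT INTO toy_customers VALUES
        ('c1', 'springfield', 'IL'),
        ('c2', 'springfield', 'ME'),
        ('c3', 'portland', 'OR');
    INSERT INTO toy_sellers VALUES
        ('s1', 'springfield', 'IL'),
        ('s2', 'portland', 'ME');
""")

# Multi-column subquery: (city, state) must match the same seller row.
pair_rows = conn.execute("""
    SELECT id
    FROM toy_customers
    WHERE (city, state) IN (SELECT city, state FROM toy_sellers)
    ORDER BY id
""").fetchall()

# Two independent single-column checks match more loosely.
loose_rows = conn.execute("""
    SELECT id
    FROM toy_customers
    WHERE city IN (SELECT city FROM toy_sellers)
      AND state IN (SELECT state FROM toy_sellers)
    ORDER BY id
""").fetchall()

print(pair_rows)
print(loose_rows)
```

The second query also returns `c2`, because its city and state each appear somewhere in `toy_sellers`, just never in the same row.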


## Challenge 3: Basic Correlated Subquery
- **Context**: We need to spot pricing anomalies. The product team wants to know which specific transaction items were sold for a price higher than the average price *of that specific product*.
- **Task**: Select `order_id`, `product_id`, and `price` from `order_items`. Filter for rows where the `price` is greater than the average price of that specific `product_id`. Order the results by `order_id` and then `product_id`, both descending, and limit the results to 5.
- **Lesson**: This query takes about 15 minutes to run. The point of the exercise is to see how long it takes and to connect that to the performance implications of correlated subqueries discussed in the chapter.

In [None]:
%%sql
-- WRITE YOUR SOLUTION HERE


The answer is:

| |order_id|	product_id|	price|
|:---|:---|:---|:---|
|0|	fffe18544ffabc95dfada21779c9644f|	9c422a519119dcad7575db5af1ba540e|	55.99|
|1|	fffce4705a9662cd70adb13d4a31832d|	72a30483855e2eafc67aee5dc2560482|	99.90|
|2|	fffc94f6ce00a00581880bf54a75a037|	4aa6014eceb682077f9dc4bffebc05b0|	299.99|
|3|	fffbee3b5462987e66fb49b1c5411df2|	6f0169f259bb0ff432bfff7d829b9946|	119.85|
|4|	fff8287bbae429a99bb7e8c21d151c41|	bee2e070c39f3dd2f6883a17a5f0da45|	180.00|
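
A correlated subquery is conceptually re-evaluated for every outer row, which is where the long runtime comes from. One common rewrite, not necessarily the one covered in the chapter, computes each product's average once in a derived table and joins to it. The sketch below checks that both forms return the same rows on a hypothetical toy table (`toy_items`) in an in-memory SQLite database. The table and its rows are assumptions for illustration, and whether the rewrite is actually faster on a real database depends on the planner and the available indexes.

```python
import sqlite3

# Hypothetical toy data: p1 averages 20.0 and p2 averages 10.0.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE toy_items (order_id TEXT, product_id TEXT, price REAL);
    INSERT INTO toy_items VALUES
        ('o1', 'p1', 10.0),
        ('o2', 'p1', 30.0),
        ('o3', 'p2', 5.0),
        ('o4', 'p2', 5.0),
        ('o5', 'p2', 20.0);
""")

# Correlated form: the inner query refers to the outer row's product_id.
correlated = conn.execute("""
    SELECT order_id, product_id, price
    FROM toy_items AS oi
    WHERE price > (
        SELECT AVG(price)
        FROM toy_items AS inner_oi
        WHERE inner_oi.product_id = oi.product_id
    )
    ORDER BY order_id DESC
""").fetchall()

# Rewritten form: aggregate once per product, then join to the result.
joined = conn.execute("""
    SELECT oi.order_id, oi.product_id, oi.price
    FROM toy_items AS oi
    JOIN (
        SELECT product_id, AVG(price) AS avg_price
        FROM toy_items
        GROUP BY product_id
    ) AS avgs ON avgs.product_id = oi.product_id
    WHERE oi.price > avgs.avg_price
    ORDER BY oi.order_id DESC
""").fetchall()

print(correlated)
print(joined)
```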


## Challenge 4: Correlated Logic with Existence
- **Context**: The logistics team wants to verify which sellers are currently active and generating sales.
- **Task**: Write a query using `EXISTS` that returns all columns from `sellers` for every seller that appears in the `order_items` table. Order by `seller_id` descending and limit the results to 5.
- **Lesson**: This query is much faster, and it's still a correlated subquery. Hopefully, from the lesson, you understand why.

In [None]:
%%sql
-- WRITE YOUR SOLUTION HERE

The answer is:

|	|seller_id|	seller_zip_code_prefix|	seller_city|	seller_state|
|:---|:---|:---|:---|:---|
|0|	ffff564a4f9085cd26170f4732393726|	13070|	campinas|	SP|
|1|	fffd5413c0700ac820c7069d66d98c89|	13908|	amparo|	SP|
|2|	ffeee66ac5d5a62fe688b9d26f83f534|	15130|	mirassol|	SP|
|3|	ffdd9f82b9a447f6f8d4b91554cc7dd3|	80030|	curitiba|	PR|
|4|	ffcfefa19b08742c5d315f2791395ee5|	80045|	curitiba|	PR|
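
`EXISTS` only asks whether at least one matching row exists, so the database can stop at the first match for each outer row, and each outer row appears at most once in the result. The sketch below contrasts it with a plain join on hypothetical toy tables (`toy_sellers`, `toy_items`) in an in-memory SQLite database; the names and rows are illustrative assumptions, not Olist data.

```python
import sqlite3

# Hypothetical toy data: s1 has two items, s2 has none, s3 has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE toy_sellers (seller_id TEXT, city TEXT);
    CREATE TABLE toy_items (order_id TEXT, seller_id TEXT);
    INSERT INTO toy_sellers VALUES ('s1', 'a'), ('s2', 'b'), ('s3', 'c');
    INSERT INTO toy_items VALUES ('o1', 's1'), ('o2', 's1'), ('o3', 's3');
""")

# EXISTS: one row per seller that has at least one item.
active = conn.execute("""
    SELECT *
    FROM toy_sellers AS s
    WHERE EXISTS (
        SELECT 1
        FROM toy_items AS oi
        WHERE oi.seller_id = s.seller_id
    )
    ORDER BY s.seller_id DESC
""").fetchall()

# Plain join: one row per matching item, so s1 is repeated.
joined = conn.execute("""
    SELECT s.seller_id, s.city
    FROM toy_sellers AS s
    JOIN toy_items AS oi ON oi.seller_id = s.seller_id
    ORDER BY s.seller_id DESC
""").fetchall()

print(active)
print(joined)
```

The join would need `DISTINCT` to produce the same result, whereas `EXISTS` never creates the duplicates in the first place.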


## Challenge 5: Subqueries in SELECT (Scalar Projection)
- **Context**: We want a report comparing specific orders to the global average without filtering rows.
- **Task**: Select `order_id`, `price`, and a third column named `diff_from_avg` which calculates the difference between the item `price` and the global average price of all items. Order the results by `diff_from_avg` descending and limit the results to 5.

In [None]:
%%sql
-- WRITE YOUR SOLUTION HERE

The answer is:

|	|order_id|	price|	diff_from_avg|
|:---|:---|:---|:---|
|0|	0812eb902a67711a1cb742b3cdaa65ae|	6735.00|	6614.3462609853528628|
|1|	fefacc66af859508bf1a7934eab1e97f|	6729.00|	6608.3462609853528628|
|2|	f5136e38d1a14a4dbd87dff67da82701|	6499.00|	6378.3462609853528628|
|3|	a96610ab360d42a2e5335a3998b4718a|	4799.00|	4678.3462609853528628|
|4|	199af31afc78c699f0dbf71fb178d4d4|	4690.00|	4569.3462609853528628|
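
A scalar subquery in the `SELECT` list returns a single value that is combined with every row, so no rows are filtered out. The sketch below shows the pattern on a hypothetical toy table (`toy_items`) in an in-memory SQLite database; the table and its rows are assumptions for illustration and are not Olist data.

```python
import sqlite3

# Hypothetical toy data: the global average price is 30.0.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE toy_items (order_id TEXT, price REAL);
    INSERT INTO toy_items VALUES ('o1', 10.0), ('o2', 20.0), ('o3', 60.0);
""")

# The uncorrelated subquery yields one value, reused for every row.
rows = conn.execute("""
    SELECT order_id,
           price,
           price - (SELECT AVG(price) FROM toy_items) AS diff_from_avg
    FROM toy_items
    ORDER BY diff_from_avg DESC
""").fetchall()

for row in rows:
    print(row)
```

Every row of `toy_items` is still present in the output, including those priced below the average, which show a negative `diff_from_avg`.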
