# Lab Module 8: Window Functions

(Run the cell below first to ensure connectivity.)

In [None]:
%load_ext sql

%sql postgresql://admin:password@postgres:5432/postgres

## Challenge 1: The Window Aggregate (Total Order Value)
- **Context**: The finance team wants to see individual item prices alongside the total value of the order they belong to, without grouping the data. 
- **Task**: Write a query using `order_items`. Select `order_id`, `order_item_id`, `price`, and a new column `order_total` that calculates the sum of price for that specific `order_id`.
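
As a warm-up, here is a minimal sketch of a window aggregate on toy data. The `toy_sales` table and its columns are hypothetical and built inline with `VALUES`, so the query runs without touching the lab tables. It illustrates the pattern only and is not the solution to the challenge.

```sql
-- Toy data: table and column names are hypothetical.
WITH toy_sales (region, rep, amount) AS (
    VALUES ('north', 'ana', 100),
           ('north', 'bo',  300),
           ('south', 'cy',   50)
)
SELECT region,
       rep,
       amount,
       -- Each row keeps its own amount and also sees its region's total.
       SUM(amount) OVER (PARTITION BY region) AS region_total
FROM toy_sales
ORDER BY region, rep;
```

Unlike `GROUP BY`, the query still returns one row per input row: both `north` rows carry a `region_total` of 400, and the `south` row carries 50.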

In [None]:
%%sql
-- Write your solution here


## Challenge 2: Contextual Analysis (Category Average)
- **Context**: Product managers want to know how a product's weight compares to the average weight of all products in the same category. 
- **Task**: Select `product_id`, `product_category_name`, and `product_weight_g` from `products`. Create a window function column `avg_category_weight` that shows the average weight for that row's category. Filter out products with NULL categories.

In [None]:
%%sql
-- Write your solution here


## Challenge 3: Calculating Contribution Ratios
- **Context**: We need to calculate what percentage of an order's total freight cost comes from each individual item. 
- **Task**: Using `order_items`, select `order_id`, `order_item_id`, and `freight_value`. Calculate a column `freight_ratio` (Individual Freight / Total Order Freight). (Hint: Use `NULLIF` to handle the event that shipping was free.)
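
The `NULLIF` hint can be sketched on toy data as follows. The `toy_costs` table is hypothetical. Team `b` has a total cost of zero, which would raise a division-by-zero error in PostgreSQL without the guard.

```sql
-- Toy data: table and column names are hypothetical.
WITH toy_costs (team, member, cost) AS (
    VALUES ('a', 'x', 30.0),
           ('a', 'y', 10.0),
           ('b', 'z',  0.0)
)
SELECT team,
       member,
       cost,
       -- NULLIF(total, 0) turns a zero total into NULL, so the division
       -- yields NULL instead of raising an error.
       cost / NULLIF(SUM(cost) OVER (PARTITION BY team), 0) AS cost_share
FROM toy_costs
ORDER BY team, member;
```

Team `a` yields shares of 0.75 and 0.25. Team `b` yields `NULL`, which you can leave as is or wrap in `COALESCE` if a default value is preferred.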

In [None]:
%%sql
-- Write your solution here


## Challenge 4: First Steps in Ranking (ROW_NUMBER)
- **Context**: We want to assign a unique sequential number to each item within an order. 
- **Task**: Select `order_id`, `order_item_id`, and `price` from `order_items`. Use `ROW_NUMBER()` to assign a rank named `item_rank` to each row, partitioned by `order_id` and ordered by `price` descending (most expensive first).

In [None]:
%%sql
-- Write your solution here

## Challenge 5: Handling Ties (RANK)
- **Context**: We are gamifying the seller platform. We want to rank sellers within their city, ordered by zip code prefix.
- **Task**: Select `seller_id`, `seller_city`, and `seller_zip_code_prefix` from `sellers`. Rank them (`city_rank`) within their `seller_city` using `seller_zip_code_prefix`. Use the `RANK()` function.

In [None]:
%%sql
-- Write your solution here

## Challenge 6: Handling Ties Without Gaps (DENSE_RANK)
- **Context**: Similar to the previous task, but for display purposes, we don't want gaps in the ranking numbers (e.g., we want 1, 1, 2, not 1, 1, 3). 
- **Task**: Select `product_id`, `product_category_name`, and `product_length_cm` from `products`. Rank products by length (`length_rank`) within their category using `DENSE_RANK()`.
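
To see how the three ranking functions from Challenges 4 to 6 differ on ties, the sketch below applies all of them to the same toy data. The `toy_scores` table is hypothetical, and players `a` and `b` are deliberately tied.

```sql
-- Toy data: table and column names are hypothetical.
WITH toy_scores (player, score) AS (
    VALUES ('a', 90),
           ('b', 90),
           ('c', 80)
)
SELECT player,
       score,
       ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num,   -- always unique
       RANK()       OVER (ORDER BY score DESC) AS rnk,       -- ties share a rank, gaps follow
       DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk  -- ties share a rank, no gaps
FROM toy_scores;
```

The tied players both get `rnk = 1` and `dense_rnk = 1`, while `ROW_NUMBER()` gives them 1 and 2 in an arbitrary order. Player `c` gets `row_num = 3`, `rnk = 3`, and `dense_rnk = 2`.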

In [None]:
%%sql
-- Write your solution here

## Challenge 7: The "Most Recent" Order
- **Context**: The Customer Support team often needs to see just the very last order a customer placed. 
- **Task**: Write a query that selects `customer_unique_id`, `order_id`, and `order_purchase_timestamp` from `orders` and `customers`. Use `ROW_NUMBER()` to assign a rank (`latest_order_rank`) per unique customer, ordered by timestamp with the latest first. Hint: You will need to join `orders` and `customers`.

In [None]:
%%sql
-- Write your solution here

## Challenge 8: Filtering on Window Functions (Top N)
- **Context**: Window functions are evaluated after the `WHERE` clause, so neither a window function nor its alias can be used in `WHERE` at the same query level. You need to list only the top 3 heaviest products per category.
- **Task**: Create a CTE (or subquery) that calculates the rank of `products` by weight (DESC) per category. Then, select from that CTE where the rank is less than or equal to 3.
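
The compute-then-filter pattern can be sketched on toy data like this. The `toy_scores` table and the `ranked` CTE are hypothetical names. The sketch keeps the top 2 rows per team and uses `ROW_NUMBER()`, so tied rows are cut arbitrarily. With `RANK()` or `DENSE_RANK()`, all tied rows would be kept.

```sql
-- Toy data: table and column names are hypothetical.
WITH toy_scores (team, player, score) AS (
    VALUES ('red',  'a', 90),
           ('red',  'b', 70),
           ('red',  'c', 50),
           ('blue', 'd', 60)
),
ranked AS (
    -- Step 1: compute the rank in an inner query.
    SELECT team,
           player,
           score,
           ROW_NUMBER() OVER (PARTITION BY team ORDER BY score DESC) AS score_rank
    FROM toy_scores
)
-- Step 2: the rank is now an ordinary column, so WHERE can filter on it.
SELECT team, player, score, score_rank
FROM ranked
WHERE score_rank <= 2
ORDER BY team, score_rank;
```

This returns player `d` for team `blue` and players `a` and `b` for team `red`.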

In [None]:
%%sql
-- Write your solution here

## Challenge 9: Cumulative Sum (Running Total)
- **Context**: We want to visualize the accumulation of payment value over time for a specific order to analyze payment structure. 
- **Task**: Using `order_items`, select `order_id`, `order_item_id`, and `price`. Calculate a `running_total` of `price`, ordered by `order_item_id`, partitioned by `order_id`.
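
Adding `ORDER BY` inside `OVER` turns a window aggregate into a running one. The sketch below shows this on a hypothetical `toy_deposits` table.

```sql
-- Toy data: table and column names are hypothetical.
WITH toy_deposits (account, seq, amount) AS (
    VALUES ('acc1', 1, 10),
           ('acc1', 2,  5),
           ('acc1', 3, 20),
           ('acc2', 1,  7)
)
SELECT account,
       seq,
       amount,
       -- With ORDER BY, the default frame runs from the start of the
       -- partition to the current row, giving a cumulative sum.
       SUM(amount) OVER (PARTITION BY account ORDER BY seq) AS running_total
FROM toy_deposits
ORDER BY account, seq;
```

For `acc1` the running totals are 10, 15, and 35. For `acc2` the total is 7. If the ordering column can contain duplicates within a partition, the default frame adds tied rows together. Writing `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` makes the row-by-row behavior explicit.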

In [None]:
%%sql
-- Write your solution here

## Challenge 10: Previous Order Value (LAG)
- **Context**: To understand seller behavior, we want to see the value of a seller's previous sale alongside their current sale. 
- **Task**: Select `seller_id`, `shipping_limit_date`, and `price` from `order_items`. Create a column `previous_price` that retrieves the price from the previous row (ordered by date) for that seller.

In [None]:
%%sql
-- Write your solution here

## Challenge 11: Next Estimated Delivery (LEAD)
- **Context**: Logistics wants to optimize routes. For each order, they want to see when the next order (chronologically) is estimated to be delivered for the same customer. 
- **Task**: Join `orders` and `customers`. Select `customer_unique_id`, `order_id`, and `order_estimated_delivery_date`. Use `LEAD` to create `next_estimated_delivery`.
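
`LAG` and `LEAD` are mirror images: one looks at the previous row in the window's order, the other at the next. The sketch below shows both side by side on a hypothetical `toy_visits` table.

```sql
-- Toy data: table and column names are hypothetical.
WITH toy_visits (visitor, visit_day, pages) AS (
    VALUES ('v1', DATE '2024-01-01', 3),
           ('v1', DATE '2024-01-05', 8),
           ('v1', DATE '2024-01-09', 2),
           ('v2', DATE '2024-01-02', 4)
)
SELECT visitor,
       visit_day,
       pages,
       -- Value from the previous row within the same visitor (NULL on the first row).
       LAG(pages)  OVER (PARTITION BY visitor ORDER BY visit_day) AS prev_pages,
       -- Value from the next row within the same visitor (NULL on the last row).
       LEAD(pages) OVER (PARTITION BY visitor ORDER BY visit_day) AS next_pages
FROM toy_visits
ORDER BY visitor, visit_day;
```

The middle visit of `v1` gets `prev_pages = 3` and `next_pages = 8`. Visitor `v2` has a single row, so both columns are `NULL`.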

In [None]:
%%sql
-- Write your solution here

## Challenge 12: Calculating Time Deltas
- **Context**: How many days pass between a customer's orders? The gap is a simple indicator of retention.
- **Task**: Using the logic from Challenge 7, create a CTE that finds the `order_purchase_timestamp` and the `LAG` (previous) timestamp per `customer_unique_id`. In the main query, calculate the difference in days between the two.
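
One way to turn two timestamps into a day count in PostgreSQL is sketched below on a hypothetical `toy_logins` table. It assumes the columns are real `timestamp` values. If the lab column is stored as text, it would need a cast first.

```sql
-- Toy data: table and column names are hypothetical.
WITH toy_logins (user_name, login_at) AS (
    VALUES ('u1', TIMESTAMP '2024-01-01 09:00'),
           ('u1', TIMESTAMP '2024-01-04 21:00'),
           ('u2', TIMESTAMP '2024-02-10 12:00')
),
with_prev AS (
    SELECT user_name,
           login_at,
           LAG(login_at) OVER (PARTITION BY user_name ORDER BY login_at) AS prev_login_at
    FROM toy_logins
)
SELECT user_name,
       login_at,
       prev_login_at,
       -- timestamp - timestamp yields an interval (days plus a time part).
       login_at - prev_login_at             AS gap_interval,
       -- date - date yields a whole number of calendar days.
       login_at::date - prev_login_at::date AS gap_days
FROM with_prev
ORDER BY user_name, login_at;
```

For the second `u1` row, `gap_interval` is 3 days and 12 hours, and `gap_days` is 3. Rows with no previous login get `NULL` in both columns.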

In [None]:
%%sql
-- Write your solution here

## Challenge 13: Month-Over-Month Growth (Aggregates + Windows)
- **Context**: We want to compare the total sales of the current month to the previous month. 
- **Task**:
    - Create a CTE `MonthlySales`:
        - Create a column `sales_year` that extracts the year from `order_purchase_timestamp` in the `orders` table.
        - Create a column `sales_month` that extracts the month from `order_purchase_timestamp` in the `orders` table.
        - SUM the `price` from `order_items` to create a new column `total_revenue`. (This requires joining `orders` to `order_items`.)
        - Group the result set by `sales_year`, then `sales_month`. (Hint: `GROUP BY 1, 2`)
    - In the main query:
        - Select `sales_year`, `sales_month`, and `total_revenue` from `MonthlySales`. Create a column called `prev_month_revenue` that is the `LAG` of `total_revenue`, ordered by `sales_year` and `sales_month`.
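
The aggregate-then-window pattern described in the steps above can be sketched on toy data as follows. The `toy_orders` table and the `monthly` CTE are hypothetical names, and the toy data lives in a single table so no join is needed.

```sql
-- Toy data: table and column names are hypothetical.
WITH toy_orders (ordered_at, amount) AS (
    VALUES (TIMESTAMP '2023-12-15 10:00', 40),
           (TIMESTAMP '2024-01-03 10:00', 10),
           (TIMESTAMP '2024-01-20 10:00', 15),
           (TIMESTAMP '2024-02-07 10:00', 30)
),
monthly AS (
    -- Step 1: collapse the rows to one per (year, month).
    SELECT EXTRACT(YEAR  FROM ordered_at) AS yr,
           EXTRACT(MONTH FROM ordered_at) AS mon,
           SUM(amount)                    AS total
    FROM toy_orders
    GROUP BY 1, 2
)
-- Step 2: the window function runs over the already-aggregated rows.
SELECT yr,
       mon,
       total,
       LAG(total) OVER (ORDER BY yr, mon) AS prev_total
FROM monthly
ORDER BY yr, mon;
```

January 2024 has a `total` of 25 and a `prev_total` of 40, taken from December 2023. Ordering the window by year and then month is what lets `LAG` cross the year boundary correctly.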

In [None]:
%%sql
-- Write your solution here

## Challenge 14: Identifying "High Value" Anomalies
- **Context**: We want to flag order items that are significantly more expensive than the average item in that specific order. 
- **Task**: Select `order_id` and `price` from `order_items`. Use a window function to calculate `avg_order_price`, the average item price within each order. Then wrap this in a CTE or subquery and keep only the rows where `price > avg_order_price * 1.5`.

In [None]:
%%sql
-- Write your solution here