# Lab Module: Chapter 9 - Advanced Functions
We will use the Olist E-Commerce dataset for the first set of questions. This dataset is created in the `00_Start_Here.ipynb` notebook, which should be run before attempting these challenges. The Python code in that notebook describes each of the tables and their relationships. For the schema, review the `CREATE TABLE` statements in that notebook.

(Run the cell below to ensure connectivity)

In [None]:
%load_ext sql

%config SqlMagic.autopandas = True
%config SqlMagic.feedback = True
%config SqlMagic.displaycon = False

%sql postgresql://admin:password@postgres:5432/postgres

## Challenge 1: Standardizing Category Labels
- **Context**: The catalog team noticed that product categories are stored in snake_case and wants them cleaned up for a report.
- **Task**: Write a query against the `products` table that returns `product_category_name` in all uppercase, plus a new column `name_length` holding its character length. Keep only rows where `product_category_name` `IS NOT NULL`, order by `name_length` descending, and limit the results to 5.

In [None]:
%%sql
# WRITE YOUR SOLUTION HERE


The answer is:

|	|upper|	name_length|
|:---|:---|:---|
|0|	MOVEIS_COZINHA_AREA_DE_SERVICO_JANTAR_E_JARDIM|	46|
|1|	MOVEIS_COZINHA_AREA_DE_SERVICO_JANTAR_E_JARDIM|	46|
|2|	MOVEIS_COZINHA_AREA_DE_SERVICO_JANTAR_E_JARDIM|	46|
|3|	MOVEIS_COZINHA_AREA_DE_SERVICO_JANTAR_E_JARDIM|	46|
|4|	MOVEIS_COZINHA_AREA_DE_SERVICO_JANTAR_E_JARDIM|	46|
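
As a warm-up on a smaller scale, the sketch below shows `UPPER` and `LENGTH` side by side. It is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` module rather than the lab's PostgreSQL connection, and the `toy_labels` table and its rows are hypothetical. These two functions behave the same way for this kind of input, but other syntax may differ between engines.

```python
import sqlite3

# Hypothetical toy table; the rows are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_labels (label TEXT)")
conn.executemany(
    "INSERT INTO toy_labels VALUES (?)",
    [("cama_mesa_banho",), ("pet_shop",), (None,)],
)

# UPPER changes the case and LENGTH counts characters.
# The NULL label is filtered out before ordering.
rows = conn.execute(
    """
    SELECT UPPER(label)  AS upper_label,
           LENGTH(label) AS name_length
    FROM toy_labels
    WHERE label IS NOT NULL
    ORDER BY name_length DESC
    """
).fetchall()
print(rows)
```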


## Challenge 2: Product ID Masking
- **Context**: For data privacy in external reports, the security team wants to see only the first 8 characters of the `product_id`.
- **Task**: Use a string function to retrieve the first 8 characters of every `product_id` from the `products` table. Order the results by `product_id` descending and limit the results to 5.

In [None]:
%%sql
# WRITE YOUR SOLUTION HERE

The answer is:

|	|substring|
|:---|:---|
|0|	fffe9eef|
|1|	fffdb2d0|
|2|	fff9553a|
|3|	fff81cc3|
|4|	fff61776|


## Challenge 3: Missing Category Data
- **Context**: Some products are missing a category name; the UI should display "Uncategorized" instead of a blank space.
- **Task**: Use `COALESCE` to replace `NULL` `product_category_name` values with the string 'uncategorized'. Filter for rows where `product_category_name` `IS NULL`. How many rows are returned?

In [None]:
%%sql
# WRITE YOUR SOLUTION HERE

The answer is 610.
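
To see what `COALESCE` does in isolation, here is a minimal sketch on made-up data. It assumes Python's built-in `sqlite3` module as a stand-in engine, and the `toy_products` table and its rows are hypothetical, not part of the Olist dataset.

```python
import sqlite3

# Hypothetical toy table; the rows are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE toy_products (product_id TEXT, product_category_name TEXT)"
)
conn.executemany(
    "INSERT INTO toy_products VALUES (?, ?)",
    [("a1", "perfumaria"), ("b2", None), ("c3", None)],
)

# COALESCE returns its first non-NULL argument, so a NULL category
# falls through to the literal fallback string.
rows = conn.execute(
    """
    SELECT product_id,
           COALESCE(product_category_name, 'uncategorized') AS category
    FROM toy_products
    ORDER BY product_id
    """
).fetchall()
print(rows)
```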

## Challenge 4: Generating Shipping Labels
- **Context**: Warehouse staff needs a combined string for shipping manifests.
- **Task**: Concatenate the `customer_id` and the `customer_city` from the `customers` table, separated by a hyphen with a space on each side (e.g., "ID - CITY"). Order the results by `customer_id` descending, and limit the results to 1.

In [None]:
%%sql
# WRITE YOUR SOLUTION HERE

The answer is:

|	|concat|
|:---|:---|
|0|	ffffe8b65bbe3087b653a978c870db99 - osasco|


## Challenge 5: Estimated Arrival Projections
- **Context**: To improve customer experience, we want to show customers an "Estimated Arrival" date, which is 7 days after their purchase.
- **Task**: Add 7 days to the `order_purchase_timestamp` from the `orders` table, the operation that other databases expose as `DATEADD`. Name the result `estimated_arrival_date`, order by `order_id` descending, and limit the results to 5. (Remember that this is PostgreSQL, which has no `DATEADD` function.)

In [None]:
%%sql
# WRITE YOUR SOLUTION HERE

The answer is:

|	|estimated_arrival_date|
|:--|:---|
|0|	2018-06-16 17:00:18|
|1|	2017-08-21 23:02:59|
|2|	2017-10-30 17:07:56|
|3|	2018-07-21 10:26:46|
|4|	2018-04-30 13:57:06|


## Switching Engines
The remainder of this lab works with NYC Taxi data that has already been loaded into DuckDB for you. Run the cell below to connect to DuckDB and get started.

The NYC Taxi Yellow Dataset schema can be found [here](https://www.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf). The table name we will be working with is `taxi_trips`.

In [None]:
%sql duckdb:////home/jovyan/data/analytics.duckdb

## Challenge 6: Extracting Pickup Hours
- **Context**: Operations wants to understand peak traffic periods by isolating the hour of the day from taxi pickups.
- **Task**: Extract the hour from `tpep_pickup_datetime` and alias it as `hour_time`. Count the number of trips per hour. Group the results by `hour_time`, your new column.

In [None]:
%%sql
# WRITE YOUR SOLUTION HERE

Hint:

DuckDB differs slightly from PostgreSQL: to use `DATEPART`, you must pass the part as a string argument. For example: `DATEPART('hour', ...)`, where `...` is the name of the column you're extracting the part from.

The answer is:

|	|hour_time|	count_star()|
|:---|:---|:---|
|0|	15|	196424|
|1|	22|	147415|
|2|	11|	154157|
|3|	16|	195977|
|4|	0|	84969|


## Challenge 7: Fare Rounding for Cash Payments
- **Context**: To simplify cash handling, management wants to see what the `total_amount` looks like rounded to the nearest whole dollar.
- **Task**: Select the original `total_amount` and a new rounded version using the `ROUND` function aliased as `rounded_amount`. Order the results by `rounded_amount` descending and limit the results to 5.

In [None]:
%%sql
# WRITE YOUR SOLUTION HERE

The answer is:

|	|total_amount|	rounded_amount|
|:---|:---|:---|
|0|	1169.4|	1169.0|
|1|	1000.0|	1000.0|
|2|	901.0|	901.0|
|3|	751.0|	751.0|
|4|	705.6|	706.0|
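
The rounding behavior can be checked on a few hand-picked values. This is a sketch only: it assumes Python's built-in `sqlite3` module instead of the lab's DuckDB connection, and the `toy_fares` table is hypothetical. Exact `.5` values are avoided on purpose, since tie-breaking rules can differ between engines and numeric types.

```python
import sqlite3

# Hypothetical toy table; the amounts are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_fares (total_amount REAL)")
conn.executemany(
    "INSERT INTO toy_fares VALUES (?)",
    [(12.4,), (705.6,), (7.0,)],
)

# ROUND with a single argument rounds to the nearest whole number.
rows = conn.execute(
    """
    SELECT total_amount,
           ROUND(total_amount) AS rounded_amount
    FROM toy_fares
    ORDER BY rounded_amount DESC
    """
).fetchall()
print(rows)
```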


## Challenge 8: Categorizing Trip Distances
- **Context**: The analytics team wants to segment trips into 'Short', 'Medium', and 'Long' for better reporting.
- **Task**: Use a `CASE` statement to label trips:
    - `< 2` miles is 'Short'
    - `2 - 5` miles is 'Medium'
    - `> 5` miles is 'Long'
- Select `trip_distance`, and use the above `CASE` statement to make a new column `trip_segment`. Order the results by `trip_distance` descending and limit the results to 5.

In [None]:
%%sql
# WRITE YOUR SOLUTION HERE

The answer is:

|	|trip_distance|	trip_segment|
|:---|:---|:---|
|0|	258928.15|	Long|
|1|	225987.37|	Long|
|2|	187872.33|	Long|
|3|	116439.71|	Long|
|4|	85543.66|	Long|
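
Because every row in the top 5 is 'Long', it can help to confirm the boundary behavior on values that sit right at the cut-offs. The sketch below is one possible reading of the thresholds, assuming that exactly 2 and exactly 5 miles both count as 'Medium'. It runs on Python's built-in `sqlite3` module, and the `toy_trips` table is hypothetical.

```python
import sqlite3

# Hypothetical toy table; the distances are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_trips (trip_distance REAL)")
conn.executemany(
    "INSERT INTO toy_trips VALUES (?)",
    [(0.8,), (2.0,), (5.0,), (9.3,)],
)

# CASE evaluates its WHEN branches top to bottom and stops at the first match,
# so the second branch only sees distances of 2 or more.
rows = conn.execute(
    """
    SELECT trip_distance,
           CASE
               WHEN trip_distance < 2  THEN 'Short'
               WHEN trip_distance <= 5 THEN 'Medium'  -- assumption: 2 and 5 are Medium
               ELSE 'Long'
           END AS trip_segment
    FROM toy_trips
    ORDER BY trip_distance DESC
    """
).fetchall()
print(rows)
```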


## Challenge 9: Identifying Unusual Payment Amounts
- **Context**: Finance is looking for "even" payments that might indicate flat-rate manual entries.
- **Task**: Count the number of trips where the fare was a whole number (no cents) using the `MOD` function. How many trips did you find?

In [None]:
%%sql
# WRITE YOUR SOLUTION HERE

The answer is 242020.
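
The idea behind the whole-number check is that dividing by 1 leaves a remainder of 0 only when there are no cents. The sketch below illustrates this on made-up fares. It assumes Python's built-in `sqlite3` module rather than DuckDB, and because SQLite builds do not reliably include a `MOD` function, the example registers its own using `math.fmod`. The `toy_fares` table and that registered function are illustrative stand-ins, not the lab's environment.

```python
import math
import sqlite3

# Hypothetical toy table; the fares are invented for illustration only.
conn = sqlite3.connect(":memory:")

# Assumption: register a two-argument MOD backed by math.fmod, since the
# bundled SQLite may not provide one that works on fractional values.
conn.create_function("MOD", 2, math.fmod)

conn.execute("CREATE TABLE toy_fares (fare REAL)")
conn.executemany(
    "INSERT INTO toy_fares VALUES (?)",
    [(10.0,), (12.5,), (52.0,), (7.25,)],
)

# A remainder of 0 after dividing by 1 means the fare has no cents.
whole_fares = conn.execute(
    "SELECT COUNT(*) FROM toy_fares WHERE MOD(fare, 1) = 0"
).fetchone()[0]
print(whole_fares)
```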