# Lab Module 12: Performance Basics

(Run the cell below first to ensure connectivity.)

In [None]:
%load_ext sql

%sql postgresql://admin:password@postgres:5432/postgres

## Challenge 1: The Execution Plan
- **Context**: Before optimizing, we must understand how the database retrieves data. The `EXPLAIN` command shows the planner's estimated execution plan without running the query.
- **Task**: Write a command to generate the execution plan for a simple query that selects all columns from the `orders` table where the `order_status` is 'delivered'. Do not execute the query, just show the plan.
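
As a hedged illustration of the syntax on a different query (not the one asked for in the task), the sketch below prefixes a `SELECT` on the `products` table with `EXPLAIN`. The `product_weight_g > 1000` filter is an arbitrary value chosen only for this example.

```sql
-- EXPLAIN only plans the statement; the query itself is not executed.
-- The threshold 1000 is an arbitrary value picked for illustration.
EXPLAIN
SELECT *
FROM products
WHERE product_weight_g > 1000;
```

Each node in the resulting plan reports `cost=startup..total` in arbitrary planner units, the estimated number of `rows`, and the average row `width` in bytes.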

In [None]:
%%sql
-- Write your solution here


## Challenge 2: Measuring Actual Cost
- **Context**: `EXPLAIN` provides estimates only. To see actual run times and row counts, we use `EXPLAIN ANALYZE`. This really executes the query, so be careful with slow queries and with statements that modify data (`INSERT`, `UPDATE`, `DELETE`).
- **Task**: Write a query to retrieve the execution plan and actual runtime statistics for counting all rows in the `customers` table where the `customer_zip_code_prefix` = '01001'. Take note of the values you see; we will come back to them later.
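
To show what the extra output looks like without giving away the task, here is a minimal sketch that applies `EXPLAIN ANALYZE` to a count over the `orders` filter from Challenge 1 rather than the `customers` query requested above.

```sql
-- EXPLAIN ANALYZE executes the statement and reports measured values
-- next to the planner's estimates.
EXPLAIN ANALYZE
SELECT count(*)
FROM orders
WHERE order_status = 'delivered';
```

Compared with plain `EXPLAIN`, each node gains an `actual time=... rows=... loops=...` section, and the output ends with `Planning Time` and `Execution Time` lines.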

In [None]:
%%sql
-- Write your solution here


## Challenge 3: Creating an Index
- **Context**: We frequently search for customers by their zip code prefix. Currently, this likely triggers a "Sequential Scan" (reading the whole table). 
- **Task**: Create an index named `idx_customer_zip` on the `customer_zip_code_prefix` column in the `customers` table.
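
One way to check which indexes already exist on a table, before and after creating a new one, is to query the `pg_indexes` system view. This is only a sketch for inspecting the result; it is not part of the task itself.

```sql
-- List the indexes currently defined on the customers table.
-- pg_indexes is a built-in PostgreSQL view.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'customers';
```

After completing the task, `idx_customer_zip` should appear in the list.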

In [None]:
%%sql
-- Write your solution here

## Challenge 4: SARGable Dates (The Year Trap)
- **Context**: You need to find all orders placed in 2017. A common mistake is applying a function to the column (e.g., `EXTRACT(YEAR FROM ...)`). The predicate is then no longer SARGable (Search ARGument-able): the database cannot use a plain index on that column, because the index stores the raw column values, not the function's result.
- **Task**: Write an optimized, SARGable query to select every `order_id` from the `orders` table for the year 2017. Use a date range comparison instead of a date function.
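
For contrast, the non-SARGable form described in the context might look like the sketch below. The column name `order_purchase_timestamp` is an assumption (the lab text does not name the date column), and the sketch also assumes that column has a date or timestamp type.

```sql
-- Anti-pattern: the function wraps the column, so a plain index on the
-- column cannot be used to narrow the search.
-- ASSUMPTION: order_purchase_timestamp is the order date column.
EXPLAIN
SELECT order_id
FROM orders
WHERE EXTRACT(YEAR FROM order_purchase_timestamp) = 2017;
```

Comparing this plan with the plan of your range-based solution is most informative once an index exists on the date column.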

In [None]:
%%sql
-- Write your solution here

## Challenge 5: Data Type Mismatches
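- **Context**: The `customer_zip_code_prefix` column is defined as VARCHAR (text), even though its values look like numbers. PostgreSQL will not compare a VARCHAR column with an integer literal such as 22020; the query fails with an "operator does not exist" error. Casting the column to a number to make the comparison work forces a conversion of every row's value and prevents the use of an index on the column. (Some other databases apply that per-row cast implicitly, with the same effect.)
- **Task**: Write a query to select `customer_id` for the zip code prefix '22020'. Treat the literal as a string, not a number, so that the predicate is SARGable.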
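
The anti-pattern from the context can be sketched as follows. `EXPLAIN` is used so that the statement is only planned, which avoids a cast failure if any stored prefix happens to be non-numeric.

```sql
-- Anti-pattern: casting the column means every row's value must be
-- converted before it can be compared, so the index on
-- customer_zip_code_prefix (Challenge 3) cannot be used for the lookup.
EXPLAIN
SELECT customer_id
FROM customers
WHERE customer_zip_code_prefix::integer = 22020;
```

The plan for this form is expected to show a sequential scan, whereas the string-literal version can use `idx_customer_zip`.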

In [None]:
%%sql
-- Write your solution here

## Challenge 6: Leading Wildcards
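- **Context**: The marketing team wants to find customers in cities ending with "...ville". Using a wildcard at the start of a pattern (e.g., `%ville`) prevents the use of B-Tree indexes, because the entries are sorted by their leading characters and a pattern with an unknown prefix gives the database no starting point in the tree. In PostgreSQL, even a prefix pattern can use a B-Tree index only if the database uses the C locale or the index was created with a pattern operator class such as `text_pattern_ops`.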
- **Task**: To demonstrate a good wildcard search, write a query to find all `customers` in cities that start with the word "Sao". Use the LIKE operator appropriately.
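
For comparison, the leading-wildcard search the marketing team asked for might look like the sketch below. The column name `customer_city` is an assumption, since the lab text does not name the city column.

```sql
-- Anti-pattern: the pattern starts with %, so a B-Tree index on the
-- city column could not be used to locate matching rows.
-- ASSUMPTION: the city column is named customer_city.
EXPLAIN
SELECT customer_id
FROM customers
WHERE customer_city LIKE '%ville';
```

No index on the city column has been created in this lab, so both this query and the prefix search will likely show a sequential scan here; the difference only becomes visible once a suitable index exists.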

In [None]:
%%sql
-- Write your solution here

## Challenge 7: Indexing Foreign Keys
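- **Context**: We often join `order_items` to `products`. PostgreSQL automatically indexes primary keys and unique constraints, but it does not create an index on the referencing column of a foreign key. Without one, joins on large tables can be slow.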
- **Task**: Create an index named `idx_order_items_product` on the `product_id` column within the `order_items` table to optimize future joins.
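
A join of the kind this index is meant to help could be sketched as below. The sketch assumes `products` also has a `product_id` column to join on, and the category value `'perfumaria'` is a hypothetical example.

```sql
-- Inspect the join plan before and after creating the index.
-- ASSUMPTIONS: products.product_id exists; 'perfumaria' is a
-- hypothetical category value used only for illustration.
EXPLAIN
SELECT oi.product_id, p.product_category_name
FROM order_items AS oi
JOIN products AS p ON p.product_id = oi.product_id
WHERE p.product_category_name = 'perfumaria';
```

Whether the planner actually chooses the new index depends on table sizes and statistics, so treat any change in the plan as an observation rather than a guarantee.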

In [None]:
%%sql
-- Write your solution here

## Challenge 8: Composite Indexes (Column Order)
- **Context**: We often filter products by `product_category_name` AND `product_weight_g`. For queries that filter on both columns, a single index on both (a Composite Index) is usually more efficient than two separate indexes. The column order matters: put the column you filter with equality (=) first, followed by the column you filter with a range.
- **Task**: Create a composite index (a multi-column index) named `idx_prod_cat_weight` on products that covers `product_category_name` first, then `product_weight_g`.
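
A query shaped like the one this index targets, with equality on the first column and a range on the second, might look like the following sketch. Both literal values are hypothetical.

```sql
-- Equality on the leading index column, range on the second column.
-- 'perfumaria' and 500 are hypothetical values for illustration.
EXPLAIN
SELECT *
FROM products
WHERE product_category_name = 'perfumaria'
  AND product_weight_g < 500;
```

With the category first, all entries for one category sit next to each other in the index, ordered by weight, so the range condition covers one contiguous slice.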

In [None]:
%%sql
-- Write your solution here

## Challenge 9: The Index Only Scan
- **Context**: If an index contains all the data required by a query, the database may not need to read the main table (the Heap) at all. This is called an "Index Only Scan" and is usually much faster than scanning the table.
- **Task**: We previously created an index on `customers(customer_zip_code_prefix)` (Challenge 3). Write a query that counts how many customers are in the zip code prefix '01001'. Do not select any other columns (like name or city), as that would force the database to visit the heap.
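
As a contrast to the task, the sketch below selects `customer_id`, a column that is not stored in `idx_customer_zip`, so the index alone cannot answer the query.

```sql
-- customer_id is not part of idx_customer_zip, so each matching row
-- must be fetched from the heap.
EXPLAIN
SELECT customer_id
FROM customers
WHERE customer_zip_code_prefix = '01001';
```

The plan is expected to show an `Index Scan` or a `Bitmap Heap Scan` rather than an `Index Only Scan`.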

In [None]:
%%sql
-- Write your solution here

## Results Discussion
It's easy to dismiss the results, because both count queries (Challenge 2 before the index, Challenge 9 after it) ran very quickly on this small dataset. The relative change in estimated cost and execution time, however, shows how powerful this optimization was.

In my case, my initial count stats were:
- Aggregate (cost=2692.03)
- Seq Scan
- Execution Time 29.256 ms

After creating the index, my stats were:
- Aggregate (cost=4.45)
- Index Only Scan
- Execution Time 0.111 ms


The estimated cost dropped by 99.83% and the execution time by 99.62%. That is a remarkable improvement for such a simple change to the database.
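
The percentages come from the usual relative-reduction formula, worked here with the numbers listed above:

$$
\text{reduction} = \frac{\text{before} - \text{after}}{\text{before}} \times 100\%
$$

$$
\text{cost: } \frac{2692.03 - 4.45}{2692.03} \times 100\% \approx 99.83\%
\qquad
\text{time: } \frac{29.256 - 0.111}{29.256} \times 100\% \approx 99.62\%
$$

Your own figures will differ from run to run, so substitute the values from your Challenge 2 and Challenge 9 output.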