# Module 6 — Aggregations & Grouped Analytics (All-in-One)

This notebook consolidates all Module 6 topics into a single file, following the order of the module outline.
It combines teaching markdown cells with executable Python and SQL cells that run against the `farmers_market` database.

## Setup — Connect to `farmers_market` and Helper

In [3]:
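import os
from typing import Optional

import mysql.connector
import pandas as pd

# Establish connection to local MySQL server
conn = mysql.connector.connect(
    host="localhost",
    user="root",       # instructional only; never use root in production
    password=os.environ["MYSQL_PASSWORD"],   # assumed env var name; never hard-code credentials
    database="farmers_market"
)
print(f"Connected to {conn.database}!")

def run_query(sql: str, params: Optional[tuple] = None, preview: Optional[int] = 10):
    """Run a SQL statement and return the result as a DataFrame.

    Only the first `preview` rows are returned (10 by default), even when the
    query's own LIMIT is larger; pass preview=None to get every row.
    """
    cur = conn.cursor()
    try:
        cur.execute(sql, params or ())
        rows = cur.fetchall()
        cols = [d[0] for d in cur.description] if cur.description else []
    finally:
        cur.close()  # release the cursor even if the query fails
    df = pd.DataFrame(rows, columns=cols)
    return df.head(preview) if preview is not None else df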

Connected to farmers_market!


---
## 1) GROUP BY Basics

Apply `GROUP BY` to summarize rows into groups and compute aggregates (e.g., totals per vendor per market date).

### Before Grouping — Raw rows from `vendor_inventory`

In [6]:
run_query('''
SELECT market_date,
vendor_id,
product_id, quantity, original_price
FROM vendor_inventory
ORDER BY market_date, vendor_id, product_id
LIMIT 20;
''')

Unnamed: 0,market_date,vendor_id,product_id,quantity,original_price
0,2019-04-03,7,4,40.0,4.0
1,2019-04-03,8,5,16.0,6.5
2,2019-04-03,8,7,8.0,18.0
3,2019-04-03,8,8,10.0,18.0
4,2019-04-06,7,4,40.0,4.0
5,2019-04-06,8,5,23.0,6.5
6,2019-04-06,8,7,8.0,18.0
7,2019-04-06,8,8,8.0,18.0
8,2019-04-10,7,4,30.0,4.0
9,2019-04-10,8,5,23.0,6.5


### After Grouping — Totals per vendor per market date

In [5]:
run_query('''
SELECT 
    market_date,
    vendor_id,
    SUM(quantity)                          AS total_quantity,
    COUNT(*)                               AS rows_count,
    COUNT(DISTINCT product_id)             AS distinct_products
FROM vendor_inventory
GROUP BY market_date, vendor_id
ORDER BY market_date, vendor_id;
''')

Unnamed: 0,market_date,vendor_id,total_quantity,rows_count,distinct_products
0,2019-04-03,7,40.0,1,1
1,2019-04-03,8,34.0,3,3
2,2019-04-06,7,40.0,1,1
3,2019-04-06,8,39.0,3,3
4,2019-04-10,7,30.0,1,1
5,2019-04-10,8,37.0,3,3
6,2019-04-13,7,30.0,1,1
7,2019-04-13,8,38.0,3,3
8,2019-04-17,7,40.0,1,1
9,2019-04-17,8,39.0,3,3


**Interpretation:** Each row represents one **(market_date, vendor_id)** group with totals and counts.
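
To see the grouping happen on a dataset small enough to check by hand, the sketch below loads the eight raw rows for 2019-04-03 and 2019-04-06 from the "Before Grouping" output into an in-memory SQLite table and runs the same `GROUP BY`. SQLite is used here only as a stand-in so the cell runs without a MySQL server; the `demo` connection is an assumption of this example and is not part of the notebook's setup.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL table (assumption: same column names).
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE vendor_inventory ("
    "market_date TEXT, vendor_id INTEGER, product_id INTEGER, "
    "quantity REAL, original_price REAL)"
)

# The eight raw rows for the first two market dates, copied from the output above.
rows = [
    ("2019-04-03", 7, 4, 40.0, 4.0),
    ("2019-04-03", 8, 5, 16.0, 6.5),
    ("2019-04-03", 8, 7, 8.0, 18.0),
    ("2019-04-03", 8, 8, 10.0, 18.0),
    ("2019-04-06", 7, 4, 40.0, 4.0),
    ("2019-04-06", 8, 5, 23.0, 6.5),
    ("2019-04-06", 8, 7, 8.0, 18.0),
    ("2019-04-06", 8, 8, 8.0, 18.0),
]
demo.executemany("INSERT INTO vendor_inventory VALUES (?, ?, ?, ?, ?)", rows)

# Eight input rows collapse into four (market_date, vendor_id) groups.
grouped = demo.execute(
    """
    SELECT market_date, vendor_id,
           SUM(quantity), COUNT(*), COUNT(DISTINCT product_id)
    FROM vendor_inventory
    GROUP BY market_date, vendor_id
    ORDER BY market_date, vendor_id
    """
).fetchall()

for row in grouped:
    print(row)
demo.close()
```

The four printed tuples should match the first four rows of the grouped output above: vendor 8's three product rows on each date become a single row with the quantities added together.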

---
## 2) Aggregate Functions — COUNT() and SUM()

Use core aggregate functions to summarize grouped data.

### COUNT() — How many products is each vendor offering each market date?

In [8]:
run_query('''
SELECT
    market_date,
    vendor_id,
    COUNT(*)                   AS rows_per_vendor_date,
    COUNT(DISTINCT product_id) AS unique_products
FROM vendor_inventory
GROUP BY market_date, vendor_id
ORDER BY market_date, vendor_id;
''')

Unnamed: 0,market_date,vendor_id,rows_per_vendor_date,unique_products
0,2019-04-03,7,1,1
1,2019-04-03,8,3,3
2,2019-04-06,7,1,1
3,2019-04-06,8,3,3
4,2019-04-10,7,1,1
5,2019-04-10,8,3,3
6,2019-04-13,7,1,1
7,2019-04-13,8,3,3
8,2019-04-17,7,1,1
9,2019-04-17,8,3,3



**Interpretation:**

- `COUNT(*)`:
  Counts every inventory row in the group, one per product listing for that vendor and date.

- `COUNT(DISTINCT product_id)`:
  Counts the unique `product_id` values within each group. It measures **assortment breadth**: the variety of products a vendor offers on a specific market date. A higher count means a wider range of products. In the rows shown above the two counts are equal, which indicates that each product is listed once per vendor per date.


### SUM() — Total quantity per vendor per market date

In [10]:
run_query('''
SELECT
    market_date,
    vendor_id,
    SUM(quantity) AS total_quantity
FROM vendor_inventory
GROUP BY market_date, vendor_id
ORDER BY market_date, vendor_id;
''')

Unnamed: 0,market_date,vendor_id,total_quantity
0,2019-04-03,7,40.0
1,2019-04-03,8,34.0
2,2019-04-06,7,40.0
3,2019-04-06,8,39.0
4,2019-04-10,7,30.0
5,2019-04-10,8,37.0
6,2019-04-13,7,30.0
7,2019-04-13,8,38.0
8,2019-04-17,7,40.0
9,2019-04-17,8,39.0



**Interpretation:**

- `SUM(quantity)`:
  Adds up the quantity of all products within each group. It measures **volume**: the total number of items the vendor has in inventory for that market date (`vendor_inventory` records stock, not sales). A higher sum means the vendor is handling a larger volume of goods on that date.

Together, these metrics provide insight into both the diversity of products (breadth) and the scale of operations (volume) for each vendor on a given market date.

---
## 3) Calculations Inside Aggregates (Inventory Value)

Compute monetary value with arithmetic inside aggregates (e.g., `SUM(quantity * original_price)`).

### Inventory Value per vendor per market date

In [15]:
run_query('''
SELECT
    market_date,
    vendor_id,
    ROUND(SUM(quantity * IFNULL(original_price,0)), 2) AS inventory_value
FROM vendor_inventory
GROUP BY market_date, vendor_id
ORDER BY market_date, vendor_id;
''')

Unnamed: 0,market_date,vendor_id,inventory_value
0,2019-04-03,7,160.0
1,2019-04-03,8,428.0
2,2019-04-06,7,160.0
3,2019-04-06,8,437.5
4,2019-04-10,7,120.0
5,2019-04-10,8,401.5
6,2019-04-13,7,120.0
7,2019-04-13,8,396.5
8,2019-04-17,7,160.0
9,2019-04-17,8,449.0


### Add vendor names for readability (JOIN)

In [17]:
run_query('''
SELECT
    vi.market_date,
    v.vendor_name,
    ROUND(SUM(vi.quantity * IFNULL(vi.original_price,0)), 2) AS inventory_value
FROM vendor_inventory AS vi
JOIN vendor AS v
  ON vi.vendor_id = v.vendor_id
GROUP BY vi.market_date, v.vendor_name
ORDER BY vi.market_date, inventory_value DESC;
''')

Unnamed: 0,market_date,vendor_name,inventory_value
0,2019-04-03,Annie's Pies,428.0
1,2019-04-03,Marco's Peppers,160.0
2,2019-04-06,Annie's Pies,437.5
3,2019-04-06,Marco's Peppers,160.0
4,2019-04-10,Annie's Pies,401.5
5,2019-04-10,Marco's Peppers,120.0
6,2019-04-13,Annie's Pies,396.5
7,2019-04-13,Marco's Peppers,120.0
8,2019-04-17,Annie's Pies,449.0
9,2019-04-17,Marco's Peppers,160.0


**Interpretation:** Arithmetic inside `SUM` is evaluated per row first (quantity × price) and then totaled per group, which is what makes business metrics like inventory value possible.
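
As a hand check of the 428.0 reported for Annie's Pies (vendor 8) on 2019-04-03, the sketch below recomputes it from that vendor's three raw rows. It uses an in-memory SQLite table as a stand-in for MySQL (an assumption of this example) and also shows a common mistake, multiplying two separate sums, for contrast.

```python
import sqlite3

# In-memory stand-in table holding vendor 8's three rows for 2019-04-03.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE vendor_inventory "
    "(product_id INTEGER, quantity REAL, original_price REAL)"
)
demo.executemany(
    "INSERT INTO vendor_inventory VALUES (?, ?, ?)",
    [(5, 16.0, 6.5), (7, 8.0, 18.0), (8, 10.0, 18.0)],
)

# Step 1: the per-row products that SUM() will add up.
line_values = demo.execute(
    "SELECT product_id, quantity * original_price "
    "FROM vendor_inventory ORDER BY product_id"
).fetchall()

# Step 2: the aggregate of those products (the correct inventory value).
total = demo.execute(
    "SELECT ROUND(SUM(quantity * original_price), 2) FROM vendor_inventory"
).fetchone()[0]

# Contrast: multiplying the two sums mixes every quantity with every price.
wrong = demo.execute(
    "SELECT SUM(quantity) * SUM(original_price) FROM vendor_inventory"
).fetchone()[0]

print(line_values)  # [(5, 104.0), (7, 144.0), (8, 180.0)]
print(total)        # 428.0
print(wrong)        # 1445.0
demo.close()
```

The multiplication has to sit inside the aggregate: `SUM(quantity) * SUM(original_price)` gives 34 × 42.5, which is not an inventory value.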

---
## 4) Aggregation with Multiple Tables

First JOIN to verify row-level correctness, then add `GROUP BY` to summarize.

### Join without aggregation — sanity check

In [18]:
run_query('''
SELECT
    vi.market_date,
    v.vendor_name,
    vi.product_id,
    vi.quantity,
    vi.original_price
FROM vendor_inventory AS vi
JOIN vendor AS v ON vi.vendor_id = v.vendor_id
ORDER BY vi.market_date, v.vendor_name, vi.product_id
LIMIT 30;
''')

Unnamed: 0,market_date,vendor_name,product_id,quantity,original_price
0,2019-04-03,Annie's Pies,5,16.0,6.5
1,2019-04-03,Annie's Pies,7,8.0,18.0
2,2019-04-03,Annie's Pies,8,10.0,18.0
3,2019-04-03,Marco's Peppers,4,40.0,4.0
4,2019-04-06,Annie's Pies,5,23.0,6.5
5,2019-04-06,Annie's Pies,7,8.0,18.0
6,2019-04-06,Annie's Pies,8,8.0,18.0
7,2019-04-06,Marco's Peppers,4,40.0,4.0
8,2019-04-10,Annie's Pies,5,23.0,6.5
9,2019-04-10,Annie's Pies,7,6.0,18.0


### Join with aggregation — totals per vendor per day

In [22]:
run_query('''
SELECT
    vi.market_date,
    v.vendor_name,
    COUNT(DISTINCT vi.product_id)                        AS distinct_products,
    ROUND(SUM(vi.quantity), 2)                           AS total_qty,
    ROUND(SUM(vi.quantity * IFNULL(vi.original_price,0)), 2) AS inventory_value
FROM vendor_inventory AS vi
JOIN vendor AS v ON vi.vendor_id = v.vendor_id
GROUP BY vi.market_date, v.vendor_name
ORDER BY vi.market_date, inventory_value DESC;
''')

Unnamed: 0,market_date,vendor_name,distinct_products,total_qty,inventory_value
0,2019-04-03,Annie's Pies,3,34.0,428.0
1,2019-04-03,Marco's Peppers,1,40.0,160.0
2,2019-04-06,Annie's Pies,3,39.0,437.5
3,2019-04-06,Marco's Peppers,1,40.0,160.0
4,2019-04-10,Annie's Pies,3,37.0,401.5
5,2019-04-10,Marco's Peppers,1,30.0,120.0
6,2019-04-13,Annie's Pies,3,38.0,396.5
7,2019-04-13,Marco's Peppers,1,30.0,120.0
8,2019-04-17,Annie's Pies,3,39.0,449.0
9,2019-04-17,Marco's Peppers,1,40.0,160.0


**Interpretation:** Joining expands context (e.g., names), then aggregation produces per-group summaries.
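
One way to make the "sanity check first" step concrete is to compare row counts before and after the join: if the join key is unique in `vendor`, the join should neither drop nor duplicate inventory rows. The sketch below is a minimal illustration on an in-memory SQLite stand-in (an assumption of this example), using the 2019-04-03 rows and the vendor ID to name pairing inferred from the outputs above (7 is Marco's Peppers, 8 is Annie's Pies).

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.executescript(
    """
    CREATE TABLE vendor (vendor_id INTEGER PRIMARY KEY, vendor_name TEXT);
    CREATE TABLE vendor_inventory (
        market_date TEXT, vendor_id INTEGER, product_id INTEGER,
        quantity REAL, original_price REAL
    );
    INSERT INTO vendor VALUES (7, 'Marco''s Peppers'), (8, 'Annie''s Pies');
    INSERT INTO vendor_inventory VALUES
        ('2019-04-03', 7, 4, 40.0, 4.0),
        ('2019-04-03', 8, 5, 16.0, 6.5),
        ('2019-04-03', 8, 7, 8.0, 18.0),
        ('2019-04-03', 8, 8, 10.0, 18.0);
    """
)

# Row count of the fact table on its own.
base_rows = demo.execute("SELECT COUNT(*) FROM vendor_inventory").fetchone()[0]

# Row count after the join; a different number would signal dropped or duplicated rows.
joined_rows = demo.execute(
    "SELECT COUNT(*) FROM vendor_inventory AS vi "
    "JOIN vendor AS v ON vi.vendor_id = v.vendor_id"
).fetchone()[0]

# Only after the counts agree, aggregate the joined rows.
summary = demo.execute(
    """
    SELECT vi.market_date, v.vendor_name,
           COUNT(DISTINCT vi.product_id),
           SUM(vi.quantity),
           ROUND(SUM(vi.quantity * IFNULL(vi.original_price, 0)), 2) AS inventory_value
    FROM vendor_inventory AS vi
    JOIN vendor AS v ON vi.vendor_id = v.vendor_id
    GROUP BY vi.market_date, v.vendor_name
    ORDER BY vi.market_date, inventory_value DESC
    """
).fetchall()

print(base_rows, joined_rows)
for row in summary:
    print(row)
demo.close()
```

If `joined_rows` came out larger than `base_rows`, every `SUM` computed after the join would be inflated, which is why the row-level check comes before the `GROUP BY`.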

---
## 5) Summary Statistics — MIN / MAX / AVG

Compute per-group summary stats from `customer_purchases` on `cost_to_customer_per_qty`.

### Show a few rows from `customer_purchases`

In [24]:
run_query('''
SELECT market_date, transaction_time, customer_id, product_id, quantity, cost_to_customer_per_qty
FROM customer_purchases
ORDER BY market_date, customer_id, transaction_time
LIMIT 30;
''')

Unnamed: 0,market_date,transaction_time,customer_id,product_id,quantity,cost_to_customer_per_qty
0,2019-04-03,0 days 18:44:00,3,4,1.0,4.0
1,2019-04-03,0 days 18:09:00,4,4,1.0,4.0
2,2019-04-03,0 days 18:41:00,5,8,1.0,18.0
3,2019-04-03,0 days 18:54:00,5,4,3.0,4.0
4,2019-04-03,0 days 17:22:00,6,5,1.0,6.5
5,2019-04-03,0 days 18:49:00,6,4,4.0,4.0
6,2019-04-03,0 days 17:59:00,7,4,5.0,4.0
7,2019-04-03,0 days 16:17:00,9,8,2.0,18.0
8,2019-04-03,0 days 16:20:00,9,7,1.0,18.0
9,2019-04-03,0 days 16:40:00,9,5,1.0,6.5


### Per customer per market_date stats (rounded)

In [21]:
run_query('''
SELECT
    market_date,
    customer_id,
    ROUND(MIN(cost_to_customer_per_qty), 2) AS min_price,
    ROUND(MAX(cost_to_customer_per_qty), 2) AS max_price,
    ROUND(AVG(cost_to_customer_per_qty), 2) AS avg_price
FROM customer_purchases
GROUP BY market_date, customer_id
ORDER BY market_date, customer_id;
''')

Unnamed: 0,market_date,customer_id,min_price,max_price,avg_price
0,2019-04-03,3,4.0,4.0,4.0
1,2019-04-03,4,4.0,4.0,4.0
2,2019-04-03,5,4.0,18.0,11.0
3,2019-04-03,6,4.0,6.5,5.25
4,2019-04-03,7,4.0,4.0,4.0
5,2019-04-03,9,6.5,18.0,13.4
6,2019-04-03,10,18.0,18.0,18.0
7,2019-04-03,11,18.0,18.0,18.0
8,2019-04-03,12,4.0,6.5,5.25
9,2019-04-03,16,4.0,6.5,5.25


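**Interpretation:** Each row shows the range (min to max) and the average of the unit prices one customer paid on one market date. `AVG` here is unweighted: every purchase row counts equally, regardless of the quantity bought.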
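
Because `AVG(cost_to_customer_per_qty)` averages over purchase rows, it can differ from the average price per item actually bought. The sketch below contrasts the two for customers 5 and 6 on 2019-04-03, using their rows from the sample output above in an in-memory SQLite stand-in (an assumption of this example). The quantity-weighted column is an illustration and does not appear in the notebook's queries.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE customer_purchases "
    "(customer_id INTEGER, product_id INTEGER, quantity REAL, "
    "cost_to_customer_per_qty REAL)"
)
# Rows for customers 5 and 6 on 2019-04-03, copied from the sample output.
demo.executemany(
    "INSERT INTO customer_purchases VALUES (?, ?, ?, ?)",
    [
        (5, 8, 1.0, 18.0),
        (5, 4, 3.0, 4.0),
        (6, 5, 1.0, 6.5),
        (6, 4, 4.0, 4.0),
    ],
)

stats = demo.execute(
    """
    SELECT customer_id,
           ROUND(AVG(cost_to_customer_per_qty), 2),        -- one vote per row
           ROUND(SUM(quantity * cost_to_customer_per_qty)
                 / SUM(quantity), 2)                        -- one vote per item
    FROM customer_purchases
    GROUP BY customer_id
    ORDER BY customer_id
    """
).fetchall()

for row in stats:
    print(row)
demo.close()
```

Customer 5 bought one item at 18.00 and three at 4.00, so the row average is 11.0 while the per-item average is 7.5. Which figure is appropriate depends on whether the question is about the prices offered or the money spent.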

---
## 6) COUNT vs COUNT DISTINCT

Compare unique-entity counts vs total event counts.

### Unique customers per market date

In [25]:
run_query('''
SELECT
    market_date,
    COUNT(DISTINCT customer_id) AS unique_customers
FROM customer_purchases
GROUP BY market_date
ORDER BY market_date;
''')

Unnamed: 0,market_date,unique_customers
0,2019-04-03,12
1,2019-04-06,13
2,2019-04-10,10
3,2019-04-13,13
4,2019-04-17,13
5,2019-04-20,14
6,2019-04-24,14
7,2019-04-27,15
8,2019-05-01,12
9,2019-05-04,15


### Total purchases per market date

In [26]:
run_query('''
SELECT
    market_date,
    COUNT(*) AS purchase_count
FROM customer_purchases
GROUP BY market_date
ORDER BY market_date;
''')

Unnamed: 0,market_date,purchase_count
0,2019-04-03,23
1,2019-04-06,25
2,2019-04-10,23
3,2019-04-13,21
4,2019-04-17,26
5,2019-04-20,26
6,2019-04-24,26
7,2019-04-27,20
8,2019-05-01,26
9,2019-05-04,29


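**Interpretation:** `COUNT(DISTINCT customer_id)` measures breadth (how many different customers bought something), while `COUNT(*)` measures volume (how many purchase rows were recorded). On 2019-04-03, 12 unique customers account for 23 purchases.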
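
The two counts can be computed side by side in one query. The sketch below does that for the ten sample `customer_purchases` rows shown in Section 5 (a preview only, so the numbers are smaller than the full-table results above), using an in-memory SQLite stand-in (an assumption of this example).

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE customer_purchases "
    "(market_date TEXT, customer_id INTEGER, product_id INTEGER)"
)
# (customer_id, product_id) pairs from the ten preview rows for 2019-04-03.
pairs = [(3, 4), (4, 4), (5, 8), (5, 4), (6, 5),
         (6, 4), (7, 4), (9, 8), (9, 7), (9, 5)]
demo.executemany(
    "INSERT INTO customer_purchases VALUES ('2019-04-03', ?, ?)", pairs
)

purchases, unique_customers = demo.execute(
    """
    SELECT COUNT(*), COUNT(DISTINCT customer_id)
    FROM customer_purchases
    GROUP BY market_date
    """
).fetchone()

# A derived metric that needs both counts.
purchases_per_customer = round(purchases / unique_customers, 2)

print(purchases, unique_customers, purchases_per_customer)
demo.close()
```

Customers 5, 6, and 9 appear more than once, which is exactly the difference between the two counts: ten rows, six distinct customers.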

---
## 7) Filtering with HAVING

Filter groups **after** aggregation (e.g., inventory value > $900).

### Vendors with inventory value > $900 on each market date

In [27]:
run_query('''
SELECT
    vi.market_date,
    v.vendor_name,
    ROUND(SUM(vi.quantity * IFNULL(vi.original_price,0)), 2) AS inventory_value
FROM vendor_inventory AS vi
JOIN vendor AS v ON vi.vendor_id = v.vendor_id
GROUP BY vi.market_date, v.vendor_name
HAVING SUM(vi.quantity * IFNULL(vi.original_price,0)) > 900
ORDER BY vi.market_date, inventory_value DESC;
''')

Unnamed: 0,market_date,vendor_name,inventory_value
0,2019-11-20,Annie's Pies,1008.5
1,2019-11-23,Annie's Pies,971.0
2,2019-11-30,Annie's Pies,907.5
3,2019-12-18,Annie's Pies,1113.5
4,2019-12-28,Annie's Pies,992.5


**Interpretation:** `HAVING` filters groups by aggregate conditions; unlike `WHERE`, it runs **after** grouping.
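
The difference between the two filters is easiest to see when both are applied to the same rows. The sketch below uses the four 2019-04-03 inventory rows in an in-memory SQLite stand-in (an assumption of this example). The thresholds (400 for `HAVING`, a price above 10 for `WHERE`) are picked for this small sample and are not the notebook's $900 rule.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE vendor_inventory "
    "(vendor_id INTEGER, product_id INTEGER, quantity REAL, original_price REAL)"
)
demo.executemany(
    "INSERT INTO vendor_inventory VALUES (?, ?, ?, ?)",
    [(7, 4, 40.0, 4.0), (8, 5, 16.0, 6.5), (8, 7, 8.0, 18.0), (8, 8, 10.0, 18.0)],
)

# HAVING: build every group from all rows, then keep groups whose total exceeds 400.
having_rows = demo.execute(
    """
    SELECT vendor_id, SUM(quantity * IFNULL(original_price, 0))
    FROM vendor_inventory
    GROUP BY vendor_id
    HAVING SUM(quantity * IFNULL(original_price, 0)) > 400
    ORDER BY vendor_id
    """
).fetchall()

# WHERE: discard individual rows first, then group whatever is left.
where_rows = demo.execute(
    """
    SELECT vendor_id, SUM(quantity * IFNULL(original_price, 0))
    FROM vendor_inventory
    WHERE original_price > 10
    GROUP BY vendor_id
    ORDER BY vendor_id
    """
).fetchall()

print(having_rows)  # [(8, 428.0)]
print(where_rows)   # [(8, 324.0)]
demo.close()
```

Both queries return only vendor 8, but for different reasons and with different totals: `HAVING` kept the full 428.0 group, while `WHERE` removed the 6.50 product before summing and removed vendor 7's only row entirely.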

---
## 8) Categorizing with Aggregation Using CASE

Apply `CASE` within aggregated results to label groups as A/B/C based on total inventory value.

### Category rules
- **A**: inventory_value > 900
- **B**: 400 ≤ inventory_value ≤ 900
- **C**: inventory_value < 400

In [16]:
run_query('''
SELECT
    vi.market_date,
    v.vendor_name,
    ROUND(SUM(vi.quantity * IFNULL(vi.original_price,0)), 2) AS inventory_value,
    CASE
        WHEN SUM(vi.quantity * IFNULL(vi.original_price,0)) > 900 THEN 'A'
        WHEN SUM(vi.quantity * IFNULL(vi.original_price,0)) BETWEEN 400 AND 900 THEN 'B'
        ELSE 'C'
    END AS inventory_category
FROM vendor_inventory AS vi
JOIN vendor AS v ON vi.vendor_id = v.vendor_id
GROUP BY vi.market_date, v.vendor_name
ORDER BY vi.market_date, v.vendor_name;
''')

Unnamed: 0,market_date,vendor_name,inventory_value,inventory_category
0,2019-04-03,Annie's Pies,428.0,B
1,2019-04-03,Marco's Peppers,160.0,C
2,2019-04-06,Annie's Pies,437.5,B
3,2019-04-06,Marco's Peppers,160.0,C
4,2019-04-10,Annie's Pies,401.5,B
5,2019-04-10,Marco's Peppers,120.0,C
6,2019-04-13,Annie's Pies,396.5,C
7,2019-04-13,Marco's Peppers,120.0,C
8,2019-04-17,Annie's Pies,449.0,B
9,2019-04-17,Marco's Peppers,160.0,C


**Interpretation:** `CASE` assigns labels from aggregate values for downstream analytics or reporting.
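
The category rules can be checked at their boundaries by feeding single values through the same `CASE` expression. The sketch below does this in SQLite (a stand-in for MySQL, an assumption of this example), with a bound parameter `:v` taking the place of the `SUM(...)`. The values 900.0 and 400.0 are added here as boundary cases; the others come from outputs in this notebook.

```python
import sqlite3

demo = sqlite3.connect(":memory:")

# Same branch order as the notebook's CASE expression; :v stands in for the SUM(...).
case_sql = """
SELECT CASE
           WHEN :v > 900 THEN 'A'
           WHEN :v BETWEEN 400 AND 900 THEN 'B'
           ELSE 'C'
       END
"""

samples = [1008.5, 900.0, 428.0, 400.0, 396.5, 160.0]
labels = {v: demo.execute(case_sql, {"v": v}).fetchone()[0] for v in samples}

for value, label in labels.items():
    print(value, label)
demo.close()
```

`BETWEEN` includes both endpoints, so exactly 900 and exactly 400 both land in **B**, consistent with the rules listed above. Branches are tested top to bottom and the first match wins.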

### Explanation of the Query

The query above calculates the **inventory value** for each vendor on each market date by multiplying the quantity of each product by its original price and summing the results, then labels each group with a category. The key components are:

- **`ROUND(SUM(vi.quantity * IFNULL(vi.original_price, 0)), 2)`**:
  - Multiplies the `quantity` of each product by its `original_price`.
  - Uses `IFNULL(vi.original_price, 0)` so that a `NULL` price contributes `0` to the total.
  - Sums the results for all products within each group.
  - Rounds the final value to 2 decimal places for readability.

- **`CASE ... END AS inventory_category`**:
  - Evaluates the `WHEN` branches in order against the unrounded sum and returns the first label that matches: `'A'` above 900, `'B'` from 400 through 900 (`BETWEEN` is inclusive), otherwise `'C'`.

- **`GROUP BY vi.market_date, v.vendor_name`**:
  - Groups the data by market date and vendor name, so the calculations are performed separately for each vendor on each market date.

- **`ORDER BY vi.market_date, v.vendor_name`**:
  - Sorts the results by market date and then by vendor name.

The output gives the total monetary value of each vendor's inventory on each market date along with its category, which is useful for comparing vendors and for inventory management.
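
The effect of the `IFNULL` step is narrower than it might first appear, because `SUM` already skips `NULL` inputs. The sketch below compares the two forms in an in-memory SQLite stand-in (an assumption of this example). The rows with a missing price, and vendor 9 itself, are hypothetical and invented for illustration; they do not come from the `farmers_market` data.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE vendor_inventory "
    "(vendor_id INTEGER, quantity REAL, original_price REAL)"
)
demo.executemany(
    "INSERT INTO vendor_inventory VALUES (?, ?, ?)",
    [
        (8, 16.0, 6.5),   # priced row
        (8, 8.0, None),   # hypothetical: price missing
        (9, 5.0, None),   # hypothetical vendor whose only row has no price
    ],
)

result = demo.execute(
    """
    SELECT vendor_id,
           SUM(quantity * original_price),             -- without IFNULL
           SUM(quantity * IFNULL(original_price, 0))   -- with IFNULL
    FROM vendor_inventory
    GROUP BY vendor_id
    ORDER BY vendor_id
    """
).fetchall()

for row in result:
    print(row)
demo.close()
```

For vendor 8 both forms give 104.0, since the `NULL` product is ignored either way. The forms only differ when every row in a group has a `NULL` price: the plain `SUM` returns `NULL`, while the `IFNULL` version returns 0. Note that in both cases unpriced stock is silently valued at zero, so a separate count of `NULL` prices may be worth reporting.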