# Week 9 - SQL Basics II

# Chapter 7: SQL Basics (Continued)

## 7-8 Aggregate Processing

* **Concept:** Often, database questions require processing collections of rows as a single unit, rather than row-by-row. This is done using **aggregate functions**. 
* **Defining Characteristic:** Aggregate functions take a collection (set) of rows and reduce it to a single summary row. 

### 7-8a Aggregate Functions

* **Purpose:** Perform mathematical summaries on data sets (e.g., count, min, max, sum, average). 
* **Common Functions:** 
    * `COUNT`: Number of non-null values.
    * `MIN`: Minimum value in a column.
    * `MAX`: Maximum value in a column.
    * `SUM`: Sum of values in a column.
    * `AVG`: Arithmetic mean (average) of values in a column.
* **Usage:** Typically used in the `SELECT` column list. 

#### `COUNT`

* **Function:** Tallies the number of rows containing non-null values for a specified attribute. 
    ```sql
    -- Count all products based on the primary key (P_CODE)
    SELECT COUNT(P_CODE) FROM PRODUCT; -- Result: 16 

    -- Count how many products have a V_CODE assigned (ignores NULLs)
    SELECT COUNT(V_CODE) FROM PRODUCT; -- Result: 14 
    ```
* **`COUNT(*)`:** Counts all rows in the collection, regardless of NULLs in specific columns. 
    ```sql
    -- Count total number of rows in PRODUCT table
    SELECT COUNT(*) FROM PRODUCT;
    ```
* **`COUNT(DISTINCT column_name)`:** Counts the number of unique non-null values in a column. 
    ```sql
    -- Count how many distinct vendors are represented in the PRODUCT table
    SELECT COUNT(DISTINCT V_CODE) AS "COUNT DISTINCT"
    FROM PRODUCT; -- Result: 6 (ignores NULLs and duplicate V_CODEs) 
    ```
    * *(Note: MS Access does not support `COUNT(DISTINCT ...)`; count the rows of a `SELECT DISTINCT` subquery instead.)*
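
The three `COUNT` forms differ only in how they treat NULLs and duplicates. The sketch below is one way to see that side by side. It is not the course database: it assumes Python's built-in `sqlite3` module as a scratch DBMS and a hypothetical five-row `PRODUCT` table, so the counts differ from the 16/14/6 results quoted above.

```python
import sqlite3

# Hypothetical miniature PRODUCT table (illustrative data, not the course data set).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCT (P_CODE TEXT PRIMARY KEY, V_CODE INTEGER)")
conn.executemany(
    "INSERT INTO PRODUCT VALUES (?, ?)",
    [("A1", 100), ("A2", 100), ("B1", 200), ("C1", None), ("C2", None)],
)

# COUNT(*) counts rows; COUNT(col) skips NULLs; COUNT(DISTINCT col) also skips duplicates.
count_all, count_vcode, count_distinct = conn.execute(
    "SELECT COUNT(*), COUNT(V_CODE), COUNT(DISTINCT V_CODE) FROM PRODUCT"
).fetchone()
print(count_all, count_vcode, count_distinct)  # 5 3 2

# Subquery form for a DBMS without COUNT(DISTINCT): count the rows of a DISTINCT list.
# The NULL filter matters, because SELECT DISTINCT keeps NULL as one of its rows.
workaround = conn.execute(
    "SELECT COUNT(*) FROM "
    "(SELECT DISTINCT V_CODE FROM PRODUCT WHERE V_CODE IS NOT NULL) AS T"
).fetchone()[0]
print(workaround)  # 2
```

The derived-table alias `T` is an illustrative name, and the exact subquery syntax a given DBMS accepts may differ.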

#### `MIN` and `MAX`

* **Function:** Find the minimum (`MIN`) or maximum (`MAX`) value in a column. 
* **Usage:** Can be used on numeric, date, and character columns (older dates are "smaller"; strings compare in the DBMS's sort order).
    ```sql
    -- Find the highest and lowest product prices
    SELECT MAX(P_PRICE) AS MAXPRICE, MIN(P_PRICE) AS MINPRICE
    FROM PRODUCT;

    -- Find the oldest inventory date
    SELECT MIN(P_INDATE) FROM PRODUCT;
    ```
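
To try `MIN` and `MAX` on both a numeric and a date column, a small sketch like the following may help. It assumes Python's built-in `sqlite3` module and three hypothetical `PRODUCT` rows. SQLite has no dedicated date type, so the dates are stored as ISO-8601 text (`YYYY-MM-DD`), which happens to sort chronologically; a DBMS with a real `DATE` type would not need that convention.

```python
import sqlite3

# Hypothetical sample rows; dates are ISO-8601 strings so text order equals date order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCT (P_CODE TEXT, P_INDATE TEXT, P_PRICE REAL)")
conn.executemany(
    "INSERT INTO PRODUCT VALUES (?, ?, ?)",
    [
        ("A1", "2021-11-03", 14.99),
        ("B1", "2022-01-20", 4.99),
        ("C1", "2021-12-13", 109.99),
    ],
)

max_price, min_price, oldest, newest = conn.execute(
    "SELECT MAX(P_PRICE), MIN(P_PRICE), MIN(P_INDATE), MAX(P_INDATE) FROM PRODUCT"
).fetchone()

print(max_price, min_price)  # 109.99 4.99
print(oldest, newest)        # 2021-11-03 2022-01-20
```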

#### `SUM` and `AVG`

* **`SUM` Function:** Computes the total sum for a numeric attribute or expression. 
    ```sql
    -- Calculate the total balance owed by all customers
    SELECT SUM(CUS_BALANCE) AS TOTBALANCE
    FROM CUSTOMER;

    -- Calculate the total value of all inventory (Quantity * Price)
    SELECT SUM(P_QOH * P_PRICE) AS TOTVALUE
    FROM PRODUCT;
    ```
* **`AVG` Function:** Computes the arithmetic mean (average) for a numeric attribute or expression. 
    ```sql
    -- Calculate the average product price
    SELECT AVG(P_PRICE) AS AVGPRICE
    FROM PRODUCT;
    ```
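
A point worth checking by hand is how `SUM` and `AVG` behave when some values are NULL: like `COUNT(column)`, they skip NULL inputs. The sketch below illustrates this under stated assumptions: Python's built-in `sqlite3` module as the DBMS and a hypothetical three-row `PRODUCT` table in which one price is missing.

```python
import sqlite3

# Hypothetical rows; product C1 has no price yet (NULL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCT (P_CODE TEXT, P_QOH INTEGER, P_PRICE REAL)")
conn.executemany(
    "INSERT INTO PRODUCT VALUES (?, ?, ?)",
    [("A1", 8, 10.0), ("B1", 2, 25.0), ("C1", 5, None)],
)

total_value, avg_price, row_count, priced_count = conn.execute(
    "SELECT SUM(P_QOH * P_PRICE), AVG(P_PRICE), COUNT(*), COUNT(P_PRICE) FROM PRODUCT"
).fetchone()

# 8*10 + 2*25 = 130; the C1 row yields NULL (5 * NULL) and is skipped by SUM.
print(total_value)  # 130.0
# AVG divides by the 2 priced rows, not by all 3 rows: (10 + 25) / 2.
print(avg_price, row_count, priced_count)  # 17.5 3 2
```

If missing prices should count as zero instead, the column would have to be converted explicitly before aggregating, which changes the result.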

### 7-8b Grouping Data (`GROUP BY` Clause)

* **Purpose:** Divides rows from a `SELECT` statement into smaller groups based on the values in one or more specified columns. Aggregate functions then operate on each group independently. 
* **Syntax:** Follows `FROM` and `WHERE`, precedes `ORDER BY`. 
    ```sql
    SELECT columnlist...
    FROM tablelist...
    [WHERE conditionlist...]
    GROUP BY columnlist...
    [ORDER BY columnlist...]
    ```
* **Interaction with Aggregates:** `GROUP BY` forms the collections; aggregate functions reduce each collection to a single summary row. 
    ```sql
    -- Calculate the average product price for each vendor
    SELECT V_CODE, AVG(P_PRICE) AS AVGPRICE
    FROM PRODUCT
    GROUP BY V_CODE; -- Groups rows by V_CODE before calculating AVG 
    ```
* **NULLs in `GROUP BY`:** Rows with `NULL` in the grouping column(s) are grouped together into one collection. 
* **`SELECT` List Restriction:** When using `GROUP BY`, every expression in the `SELECT` list must be either:
    1.  Wrapped in an aggregate function (e.g., `COUNT()`, `AVG()`, `SUM()`).
    2.  Listed in the `GROUP BY` clause.
    * **Reason:** The DBMS must be able to determine a single value for each column in the resulting summary row for each group. If a column isn't aggregated or grouped, the DBMS doesn't know which value from the group to display. 
    * **Error Example:** `SELECT V_CODE, V_NAME, P_QOH, COUNT(P_CODE)... GROUP BY V_CODE, V_NAME;` will fail because `P_QOH` is not aggregated and not in the `GROUP BY`. 
    * **Fixes:** Either apply an aggregate to `P_QOH` (e.g., `SUM(P_QOH)`) or add `P_QOH` to the `GROUP BY` clause (which changes the grouping logic and results). 
* **Adding to `GROUP BY`:** Including additional columns in `GROUP BY` can change the number of groups and the aggregate results if those columns have varying values within the original groups. If the added column has the same value within each original group, the groups, and therefore the results, do not change.
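
Two of the points above, the shared NULL group and the effect of adding a grouping column, can be checked with a short sketch. It assumes Python's built-in `sqlite3` module and a hypothetical five-row `PRODUCT` table; none of the values come from the course data set.

```python
import sqlite3

# Hypothetical rows: vendor 100 has two discount levels, two products have no vendor.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PRODUCT (P_CODE TEXT, V_CODE INTEGER, P_DISCOUNT REAL, P_PRICE REAL)"
)
conn.executemany(
    "INSERT INTO PRODUCT VALUES (?, ?, ?, ?)",
    [
        ("A1", 100, 0.0, 10.0),
        ("A2", 100, 0.05, 20.0),
        ("B1", 200, 0.0, 5.0),
        ("C1", None, 0.0, 7.0),
        ("C2", None, 0.0, 9.0),
    ],
)

# One summary row per V_CODE; both NULL rows fall into a single group.
rows = conn.execute(
    "SELECT V_CODE, COUNT(*), AVG(P_PRICE) FROM PRODUCT GROUP BY V_CODE"
).fetchall()
by_vendor = {v_code: (n, avg) for v_code, n, avg in rows}
print(by_vendor[None])  # (2, 8.0)
print(by_vendor[100])   # (2, 15.0)

# Adding P_DISCOUNT splits vendor 100 into two groups, so 3 groups become 4.
finer = conn.execute(
    "SELECT V_CODE, P_DISCOUNT, COUNT(*) FROM PRODUCT GROUP BY V_CODE, P_DISCOUNT"
).fetchall()
print(len(rows), len(finer))  # 3 4
```

SQLite is used here only for convenience. It is more permissive than most DBMSs about the `SELECT` list restriction (it accepts a non-aggregated, non-grouped column), so the error example above will not necessarily fail when run in it.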

### 7-8c `HAVING` Clause

* **Purpose:** Filters the results of a `GROUP BY` operation. It restricts which *groups* are returned, similar to how `WHERE` restricts individual rows. 
* **Syntax:** Follows `GROUP BY`, precedes `ORDER BY`. 
    ```sql
    SELECT columnlist...
    FROM tablelist...
    [WHERE conditionlist...]
    GROUP BY columnlist...
    HAVING group_conditionlist...
    [ORDER BY columnlist...]
    ```
* **`HAVING` vs. `WHERE`:** 
    * `WHERE` filters *individual rows* **before** grouping occurs. It cannot contain aggregate functions because groups don't exist yet. 
    * `HAVING` filters *entire groups* **after** grouping occurs. It commonly uses aggregate functions because it operates on the summary results of the groups. 
* **Example:** List the number of products supplied by each vendor, keeping only vendors whose average product price is less than $10.
    ```sql
    SELECT V_CODE, COUNT(P_CODE) AS NUMPRODS
    FROM PRODUCT
    GROUP BY V_CODE
    HAVING AVG(P_PRICE) < 10 -- Filter groups based on the aggregate AVG() 
    ORDER BY V_CODE;
    ```
* **Using Both:** A query can have both `WHERE` (to filter rows before grouping) and `HAVING` (to filter groups after grouping). 
    ```sql
    -- Example combining WHERE and HAVING
    -- (V_CODE is qualified because it exists in both joined tables)
    SELECT
        PRODUCT.V_CODE, V_NAME, SUM(P_QOH * P_PRICE) AS TOTCOST
    FROM
        PRODUCT JOIN VENDOR ON PRODUCT.V_CODE = VENDOR.V_CODE
    WHERE
        P_DISCOUNT > 0 -- Filter rows BEFORE grouping
    GROUP BY
        PRODUCT.V_CODE, V_NAME
    HAVING
        SUM(P_QOH * P_PRICE) > 500 -- Filter groups AFTER grouping 
    ORDER BY
        SUM(P_QOH * P_PRICE) DESC; -- Order by the aggregate
    ```
* **Syntax Note:** When referring to computed/aggregated columns in `HAVING` or `ORDER BY`, repeat the expression itself (e.g., `SUM(P_QOH * P_PRICE)`) rather than the alias (e.g., `TOTCOST`). Alias support varies across RDBMSs: most accept an alias in `ORDER BY`, but several (e.g., Oracle, SQL Server, PostgreSQL) reject one in `HAVING`.
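
The row-level versus group-level distinction between `WHERE` and `HAVING` is easiest to see when the same threshold is applied both ways. The following is a minimal sketch, assuming Python's built-in `sqlite3` module and a hypothetical four-row `PRODUCT` table.

```python
import sqlite3

# Hypothetical rows: vendor 100 sells one cheap and one expensive product.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCT (P_CODE TEXT, V_CODE INTEGER, P_PRICE REAL)")
conn.executemany(
    "INSERT INTO PRODUCT VALUES (?, ?, ?)",
    [("A1", 100, 4.0), ("A2", 100, 30.0), ("B1", 200, 6.0), ("B2", 200, 8.0)],
)

# HAVING: group first, then drop whole groups whose average is 10 or more.
# Vendor 100 averages 17.0 and disappears; vendor 200 averages 7.0 and stays.
having_rows = conn.execute(
    "SELECT V_CODE, COUNT(P_CODE) FROM PRODUCT "
    "GROUP BY V_CODE HAVING AVG(P_PRICE) < 10 ORDER BY V_CODE"
).fetchall()
print(having_rows)  # [(200, 2)]

# WHERE: drop individual rows priced 10 or more, then group what is left.
# Vendor 100 survives with its one cheap product.
where_rows = conn.execute(
    "SELECT V_CODE, COUNT(P_CODE) FROM PRODUCT "
    "WHERE P_PRICE < 10 GROUP BY V_CODE ORDER BY V_CODE"
).fetchall()
print(where_rows)  # [(100, 1), (200, 2)]
```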

## 7-9 Subqueries

* **Concept:** A query ( `SELECT` statement) nested inside another SQL query. 
* **Purpose:** Allows processing data based on intermediate results obtained from another query. 
* **Terminology:** 
    * **Outer Query:** The main, first query.
    * **Inner Query (Subquery):** The query inside parentheses.
    * **Nested Query:** The entire SQL statement containing outer and inner queries.
* **Execution Flow:** For the self-contained (non-correlated) subqueries shown here, the inner query executes first; its output is then used as input for the outer query.
* **Example Use Cases:**
    * Finding values based on an unknown aggregate (e.g., products priced above the average price). 
    * Checking for existence or non-existence in another table (e.g., vendors who do *not* supply products). 
* **Syntax Examples:**
    ```sql
    -- Find vendors who do NOT supply products
    SELECT V_CODE, V_NAME
    FROM VENDOR
    WHERE V_CODE NOT IN (SELECT V_CODE FROM PRODUCT WHERE V_CODE IS NOT NULL);
    -- Inner query finds vendors *in* PRODUCT, outer query selects vendors *NOT IN* that list. 

    -- Find products with price >= average product price
    SELECT P_CODE, P_PRICE
    FROM PRODUCT
    WHERE P_PRICE >= (SELECT AVG(P_PRICE) FROM PRODUCT);
    -- Inner query calculates AVG(P_PRICE), outer query uses that value for comparison. 
    ```
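
Both subquery patterns above can be run against a toy database to see what the inner query contributes, including why the `NOT IN` example filters out NULLs. This is a sketch under stated assumptions: Python's built-in `sqlite3` module, hypothetical vendor names, and three hypothetical products, one of which has no vendor.

```python
import sqlite3

# Hypothetical data: vendor 300 supplies nothing; product C1 has a NULL V_CODE.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE VENDOR (V_CODE INTEGER PRIMARY KEY, V_NAME TEXT)")
conn.execute("CREATE TABLE PRODUCT (P_CODE TEXT, V_CODE INTEGER, P_PRICE REAL)")
conn.executemany(
    "INSERT INTO VENDOR VALUES (?, ?)", [(100, "Acme"), (200, "Bolt"), (300, "Cord")]
)
conn.executemany(
    "INSERT INTO PRODUCT VALUES (?, ?, ?)",
    [("A1", 100, 10.0), ("B1", 200, 20.0), ("C1", None, 30.0)],
)

# With the NULL filter, the inner list is (100, 200) and vendor 300 is returned.
guarded = conn.execute(
    "SELECT V_CODE, V_NAME FROM VENDOR WHERE V_CODE NOT IN "
    "(SELECT V_CODE FROM PRODUCT WHERE V_CODE IS NOT NULL)"
).fetchall()
print(guarded)  # [(300, 'Cord')]

# Without it, the list contains NULL, so "300 NOT IN (...)" is unknown, never true.
unguarded = conn.execute(
    "SELECT V_CODE, V_NAME FROM VENDOR WHERE V_CODE NOT IN "
    "(SELECT V_CODE FROM PRODUCT)"
).fetchall()
print(unguarded)  # []

# Scalar subquery: the inner query yields one value (the average, 20.0).
at_or_above_avg = conn.execute(
    "SELECT P_CODE, P_PRICE FROM PRODUCT "
    "WHERE P_PRICE >= (SELECT AVG(P_PRICE) FROM PRODUCT) ORDER BY P_CODE"
).fetchall()
print(at_or_above_avg)  # [('B1', 20.0), ('C1', 30.0)]
```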