# Lesson 9 - SQL Aggregations - Part 1 of 2

## NULL values - `IS NULL` and `IS NOT NULL`

Nulls are different from zeros or spaces: they are cells where no data exists. To identify null values in a `WHERE` clause, write `IS NULL` or `IS NOT NULL`. We don't use `=` because `NULL` isn't considered a value in SQL; it is a property of the data, so a comparison such as `column = NULL` never evaluates to true.

Nulls frequently occur because of:
- `LEFT` or `RIGHT JOINS`
- missing data in the database
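
To make the difference between `= NULL` and `IS NULL` concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module so it can run without the lesson database. The small `accounts` table and its rows are made up for illustration only.

```python
import sqlite3

# Hypothetical in-memory table; the rows are illustrative, not lesson data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER, primary_poc TEXT)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [(1, "Ada"), (2, None), (3, "")],  # None is stored as NULL; '' is an empty string
)

# A comparison with = NULL is never true, so no rows come back.
equals_null = conn.execute(
    "SELECT id FROM accounts WHERE primary_poc = NULL"
).fetchall()

# IS NULL matches only the row with no data; the empty string is not NULL.
is_null = conn.execute(
    "SELECT id FROM accounts WHERE primary_poc IS NULL"
).fetchall()

# IS NOT NULL matches every other row, including the empty string.
is_not_null = conn.execute(
    "SELECT id FROM accounts WHERE primary_poc IS NOT NULL ORDER BY id"
).fetchall()

print(equals_null, is_null, is_not_null)  # [] [(2,)] [(1,), (3,)]
```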


## `COUNT`

Returns the number of rows that match the expression specified. You can use it on non-numeric data too. `COUNT(column)` **will not count the null entries** in that column, whereas `COUNT(*)` counts every row. Comparing the two is useful for identifying rows with missing data. It operates along a column, not a row.

*Example:*

`SELECT COUNT(accounts.id)
FROM accounts;`
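
As a rough illustration of how `COUNT` treats nulls, the sketch below compares `COUNT(*)` with `COUNT(column)` on a tiny made-up `accounts` table. It runs through Python's built-in `sqlite3` module; the table contents are assumptions for the example, not the lesson data.

```python
import sqlite3

# Hypothetical in-memory table with two missing primary contacts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER, primary_poc TEXT)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [(1, "Ada"), (2, None), (3, "Grace"), (4, None)],
)

# COUNT(*) counts every row; COUNT(primary_poc) skips rows where it is NULL.
total_rows, non_null_pocs = conn.execute(
    "SELECT COUNT(*), COUNT(primary_poc) FROM accounts"
).fetchone()

# The difference is the number of rows with a missing primary_poc.
missing_pocs = total_rows - non_null_pocs

print(total_rows, non_null_pocs, missing_pocs)  # 4 2 2
```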



## `SUM`
Works like the sum function in Excel: it adds up the numeric values in a column. However, **it will ignore null values.** It operates along a column, not a row.

*Examples:*

Find the total amount of poster_qty paper ordered in the orders table.

`SELECT SUM(poster_qty) sum_poster_qty
FROM orders;`


Find the total dollar amount of sales using the total_amt_usd in the orders table.

`SELECT SUM(total_amt_usd) AS total_dollar_sales
FROM orders;`


Find the total amount spent on standard_amt_usd and gloss_amt_usd paper for each order in the orders table. This should give a dollar amount for each order in the table.

`SELECT standard_amt_usd + gloss_amt_usd AS total_standard_gloss
FROM orders;`

Find the standard_amt_usd per unit of standard_qty paper. Your solution should use both an aggregation and a mathematical operator.

`SELECT SUM(standard_amt_usd)/SUM(standard_qty) AS standard_price_per_unit
FROM orders;`

## `MIN` & `MAX`

Find the minimum and maximum values, like in Excel. They also **ignore null values.** They can also be used on non-numeric data such as dates (earliest and latest dates), text (earliest or latest in alphabetical order), etc.

*Examples:* see the `MIN` and `MAX` queries in the examples under `AVG`.



## `AVG`

Returns the mean of the data. This aggregate function again **ignores null values** in both the numerator and denominator. Note that the median is difficult to calculate in SQL alone.
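
The effect of ignoring nulls in the denominator can be seen in a small sketch. It uses Python's built-in `sqlite3` module and a made-up three-row `orders` table in which one `standard_qty` is null; the values are assumptions chosen to keep the arithmetic easy to follow.

```python
import sqlite3

# Hypothetical orders table: one of the three quantities is missing.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, standard_qty INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10), (2, None), (3, 20)],
)

total, mean, mean_over_all_rows, smallest, largest = conn.execute(
    """
    SELECT SUM(standard_qty),                     -- NULL is skipped: 10 + 20
           AVG(standard_qty),                     -- divides by the 2 non-null rows
           SUM(standard_qty) * 1.0 / COUNT(*),    -- divides by all 3 rows instead
           MIN(standard_qty),
           MAX(standard_qty)
    FROM orders
    """
).fetchone()

print(total, mean, mean_over_all_rows, smallest, largest)  # 30 15.0 10.0 10 20
```

If missing quantities should count as zero, dividing by `COUNT(*)` as in the third column is one way to express that; `AVG` alone treats them as absent.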


*Examples:*

When was the earliest order ever placed? You only need to return the date.

`SELECT MIN(occurred_at) 
FROM orders`

Try performing the same query as the previous one without using an aggregation function.

`SELECT occurred_at
FROM orders 
ORDER BY occurred_at ASC
LIMIT 1`

When did the most recent (latest) web_event occur?

`SELECT MAX(occurred_at)
FROM web_events`


Try to reproduce the result of the previous query without using an aggregation function.

`SELECT occurred_at
FROM web_events
ORDER BY occurred_at DESC
LIMIT 1`


Find the mean (AVERAGE) amount spent per order on each paper type, as well as the mean amount of each paper type purchased per order. Your final answer should have 6 values - one for each paper type for the average number of sales, as well as the average amount.

`SELECT AVG(standard_qty) mean_standard, AVG(gloss_qty) mean_gloss, 
           AVG(poster_qty) mean_poster, AVG(standard_amt_usd) mean_standard_usd, 
           AVG(gloss_amt_usd) mean_gloss_usd, AVG(poster_amt_usd) mean_poster_usd
FROM orders;`


The video may have left you curious about how to calculate the MEDIAN. Though this is more advanced than what we have covered so far, try finding it: what is the MEDIAN total_amt_usd spent on all orders?

`SELECT *
FROM (SELECT total_amt_usd
      FROM orders
      ORDER BY total_amt_usd
      LIMIT 3457) AS Table1
ORDER BY total_amt_usd DESC
LIMIT 2;`

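Then average the two values returned. The inner query keeps the 3457 smallest totals and the outer query picks the two largest of those; these are the two middle values only if the table holds 6912 rows (3457 is half of that plus one), so the hard-coded `LIMIT` must change whenever the row count does. The example above demonstrates the use of a subquery.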
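
One way to avoid the hard-coded row count is to derive the `LIMIT` and `OFFSET` from the table size. The sketch below is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` module, a made-up `orders` table, and a hypothetical helper named `median_total`. It is not the method from the lesson, and other databases offer their own median or percentile functions.

```python
import sqlite3

# Hypothetical orders table with six made-up totals.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (total_amt_usd REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?)",
    [(v,) for v in (50.0, 10.0, 40.0, 20.0, 30.0, 60.0)],
)


def median_total(conn):
    """Return the median of orders.total_amt_usd, or None for an empty table."""
    # COUNT(column) skips NULLs, matching the IS NOT NULL filter below.
    n = conn.execute("SELECT COUNT(total_amt_usd) FROM orders").fetchone()[0]
    if n == 0:
        return None
    limit = 2 - n % 2      # one middle row when n is odd, two when n is even
    offset = (n - 1) // 2  # number of sorted rows to skip before the middle
    return conn.execute(
        """
        SELECT AVG(total_amt_usd)
        FROM (SELECT total_amt_usd
              FROM orders
              WHERE total_amt_usd IS NOT NULL
              ORDER BY total_amt_usd
              LIMIT ? OFFSET ?) AS middle_rows
        """,
        (limit, offset),
    ).fetchone()[0]


median_even = median_total(conn)  # middle values 30 and 40
conn.execute("INSERT INTO orders VALUES (70.0)")
median_odd = median_total(conn)   # single middle value 40

print(median_even, median_odd)  # 35.0 40.0
```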


## `GROUP BY`

You can aggregate data into groups based on each distinct value of a column (e.g. sales id). Key takeaways:

1. GROUP BY can be used to aggregate data within subsets of the data. For example, grouping for different accounts, different regions, or different sales representatives.

2. Any column in the SELECT statement that is not within an aggregator must be in the GROUP BY clause.

3. The GROUP BY always goes between WHERE and ORDER BY.

4. ORDER BY works like SORT in spreadsheet software.

5. GROUP BY is always placed before the LIMIT clause.
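
The clause order described in the takeaways above (`WHERE`, then `GROUP BY`, then `ORDER BY`, then `LIMIT`) can be sketched on a small example. This uses Python's built-in `sqlite3` module with a made-up `web_events` table; the rows and the filter on `account_id` are assumptions for illustration.

```python
import sqlite3

# Hypothetical web_events table with a handful of made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE web_events (account_id INTEGER, channel TEXT)")
conn.executemany(
    "INSERT INTO web_events VALUES (?, ?)",
    [
        (1, "direct"), (1, "direct"), (2, "direct"),
        (1, "facebook"), (2, "facebook"),
        (2, "twitter"),
        (3, "banner"),
    ],
)

top_channels = conn.execute(
    """
    SELECT channel, COUNT(*) AS num_events  -- channel is not aggregated, so it is grouped
    FROM web_events
    WHERE account_id <> 3                   -- rows are filtered before grouping
    GROUP BY channel                        -- one output row per channel
    ORDER BY num_events DESC                -- sort the grouped rows
    LIMIT 2                                 -- keep only the first two sorted rows
    """
).fetchall()

print(top_channels)  # [('direct', 3), ('facebook', 2)]
```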


*Examples:*

<img src="../SQL/ERD DAND.jpg" width="600" height="400">


Which account (by name) placed the earliest order? Your solution should have the account name and the date of the order.

`SELECT a.name, o.occurred_at
FROM accounts a
JOIN orders o
ON a.id = o.account_id
ORDER BY occurred_at
LIMIT 1;`


Find the total sales in usd for each account. You should include two columns - the total sales for each company's orders in usd and the company name.

`SELECT a.name, SUM(o.total_amt_usd) total_sales
FROM orders o
JOIN accounts a
ON a.id = o.account_id
GROUP BY a.name;`


Via what channel did the most recent (latest) web_event occur, and which account was associated with this web_event? Your query should return only three values - the date, channel, and account name.

`SELECT w.occurred_at, w.channel, a.name
FROM web_events w
JOIN accounts a
ON w.account_id = a.id 
ORDER BY w.occurred_at DESC
LIMIT 1;`


Find the total number of times each type of channel from the web_events was used. Your final table should have two columns - the channel and the number of times the channel was used.

`SELECT w.channel, COUNT(*)
FROM web_events w
GROUP BY w.channel`


Who was the primary contact associated with the earliest web_event? 

`SELECT a.primary_poc
FROM web_events w
JOIN accounts a
ON a.id = w.account_id
ORDER BY w.occurred_at
LIMIT 1;`


What was the smallest order placed by each account in terms of total usd? Provide only two columns - the account name and the total usd. Order from smallest dollar amounts to largest.

`SELECT a.name, MIN(total_amt_usd) smallest_order
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY a.name
ORDER BY smallest_order;`


Find the number of sales reps in each region. Your final table should have two columns - the region and the number of sales_reps. Order from fewest reps to most reps.

`SELECT r.name, COUNT(*) num_reps
FROM region r
JOIN sales_reps s
ON r.id = s.region_id
GROUP BY r.name
ORDER BY num_reps;`



## `GROUP BY` and `ORDER BY` with multiple columns

Key takeaways:

- You can `GROUP BY` multiple columns at once. This is often useful to aggregate across a number of different segments.

- The **order of columns listed in the `ORDER BY` clause does make a difference.** Rows are sorted by the leftmost column first, and each later column only breaks ties in the ones before it.

- The **order of column names in your `GROUP BY` clause doesn’t matter**: the results will be the same regardless. If you run the same query with the columns in the `GROUP BY` clause reversed, you get the same results.

- As with `ORDER BY`, you can substitute numbers for column names in the `GROUP BY` clause. It’s generally recommended to do this only when you’re grouping many columns, or if something else is causing the text in the `GROUP BY` clause to be excessively long.

- **A reminder here that any column that is not within an aggregation must show up in your `GROUP BY` statement.** If you forget, you will likely get an error. However, on the off chance that your query does work, you might not like the results!
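
The points above about column order can be checked with a short sketch. It uses Python's built-in `sqlite3` module and a made-up `web_events` table with a `rep` column (a simplification of the lesson's schema, assumed here to avoid the joins). The helper `grouped` is hypothetical.

```python
import sqlite3

# Hypothetical, simplified table: one row per event with the rep name inline.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE web_events (rep TEXT, channel TEXT)")
conn.executemany(
    "INSERT INTO web_events VALUES (?, ?)",
    [
        ("Ann", "direct"), ("Ann", "direct"), ("Ann", "twitter"),
        ("Bob", "direct"), ("Bob", "twitter"), ("Bob", "twitter"), ("Bob", "twitter"),
    ],
)


def grouped(group_cols):
    """Run the same aggregation with a given GROUP BY column order."""
    query = (
        "SELECT rep, channel, COUNT(*) AS num_events "
        f"FROM web_events GROUP BY {group_cols}"
    )
    # Sort in Python so only the set of groups is compared, not the row order.
    return sorted(conn.execute(query).fetchall())


# Reversing the GROUP BY columns yields the same groups and counts.
same_groups = grouped("rep, channel") == grouped("channel, rep")

base = "SELECT rep, channel, COUNT(*) AS num_events FROM web_events GROUP BY rep, channel "
# Reversing the ORDER BY columns changes the order of the rows.
by_rep_first = conn.execute(base + "ORDER BY rep, channel").fetchall()
by_channel_first = conn.execute(base + "ORDER BY channel, rep").fetchall()

print(same_groups)       # True
print(by_rep_first)      # [('Ann', 'direct', 2), ('Ann', 'twitter', 1), ('Bob', 'direct', 1), ('Bob', 'twitter', 3)]
print(by_channel_first)  # [('Ann', 'direct', 2), ('Bob', 'direct', 1), ('Ann', 'twitter', 1), ('Bob', 'twitter', 3)]
```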


*Examples:*

<img src="../SQL/ERD DAND.jpg" width="600" height="400">

For each account, determine the average amount of each type of paper they purchased across their orders. Your result should have four columns - one for the account name and one for the average quantity purchased for each of the paper types for each account. 

`SELECT a.name, AVG(o.standard_qty) mean_standard_qty, AVG(o.poster_qty) mean_poster_qty, AVG(o.gloss_qty) mean_glossy_qty
FROM orders o
JOIN accounts a
ON a.id = o.account_id
GROUP BY a.name`


For each account, determine the average amount spent per order on each paper type. Your result should have four columns - one for the account name and one for the average amount spent on each paper type.

`SELECT a.name, AVG(o.standard_amt_usd) mean_standard_amt, AVG(o.poster_amt_usd) mean_poster_amt, AVG(o.gloss_amt_usd) mean_glossy_amt
FROM orders o
JOIN accounts a
ON a.id = o.account_id
GROUP BY a.name`


Determine the number of times a particular channel was used in the web_events table for each sales rep. Your final table should have three columns - the name of the sales rep, the channel, and the number of occurrences. Order your table with the highest number of occurrences first.

`SELECT s.name, w.channel, COUNT(*) num_events
FROM web_events w
JOIN accounts a
ON w.account_id = a.id
JOIN sales_reps s
ON s.id = a.sales_rep_id
GROUP BY s.name, w.channel
ORDER BY num_events DESC`


Determine the number of times a particular channel was used in the web_events table for each region. Your final table should have three columns - the region name, the channel, and the number of occurrences. Order your table with the highest number of occurrences first.

`SELECT r.name, w.channel, COUNT(*) num_events
FROM web_events w
JOIN accounts a
ON w.account_id = a.id
JOIN sales_reps s
ON s.id = a.sales_rep_id
JOIN region r
ON r.id = s.region_id
GROUP BY r.name, w.channel
ORDER BY num_events DESC`

