# SQL Aggregation
Aggregation combines the values in a column into a single summary value, such as a count, sum, or average.


NULL - marks an empty cell in a column, usually because the entry is missing or was deleted.
    NULLs also appear when a LEFT or RIGHT JOIN finds no matching row in the other table.
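
As a small illustration of how a join can produce NULL, the sketch below uses two made-up inline tables. The names sample_accounts and sample_orders are hypothetical and not part of the dataset used in these notes, and PostgreSQL-style syntax is assumed.

```sql
-- Hypothetical sample data, defined inline so the query runs on its own.
WITH sample_accounts (id, name) AS (
    VALUES (1, 'Acme'), (2, 'Globex')
),
sample_orders (account_id, total_amt_usd) AS (
    VALUES (1, 100)
)
SELECT a.name, o.total_amt_usd
FROM sample_accounts a
LEFT JOIN sample_orders o ON o.account_id = a.id
ORDER BY a.id;
-- Acme   | 100
-- Globex | NULL   (no matching order, so the LEFT JOIN fills in NULL)
```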

## Count function

Returns the number of rows. COUNT(*) counts every row in the table, while COUNT(column) counts only the rows where that column is not NULL.

```sql
SELECT COUNT(*)
FROM accounts;
```

```sql
SELECT COUNT(accounts.id)
FROM accounts;
```

COUNT also works on non-numeric columns, because it only checks whether each cell is NULL.
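
A minimal sketch of the difference between counting rows and counting a column. The sample_accounts table is hypothetical inline data made up for illustration, and PostgreSQL-style syntax is assumed.

```sql
-- Hypothetical sample data: one of the three rows has no primary_poc.
WITH sample_accounts (id, primary_poc) AS (
    VALUES (1, 'Ann'), (2, NULL), (3, 'Raj')
)
SELECT COUNT(*)           AS all_rows,       -- 3: every row is counted
       COUNT(primary_poc) AS rows_with_poc   -- 2: the NULL cell is skipped
FROM sample_accounts;
```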

## SUM function

Adds up the values in a numeric column. NULL values are skipped.
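
SUM and COUNT are easy to confuse, so the sketch below runs both on the same column. The sample_orders table is hypothetical inline data, and PostgreSQL-style syntax is assumed.

```sql
-- Hypothetical sample data: three orders, one with a NULL quantity.
WITH sample_orders (id, poster_qty) AS (
    VALUES (1, 10), (2, NULL), (3, 5)
)
SELECT SUM(poster_qty)   AS poster_qty_total,  -- 15: adds the values, skipping NULL
       COUNT(poster_qty) AS poster_qty_rows    -- 2: only counts the non-NULL cells
FROM sample_orders;
```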

Questions:
1. Find the total amount of poster_qty paper ordered in the orders table.
```sql
SELECT SUM(o.poster_qty) poster_qty_total
FROM orders o;
```
2. Find the total amount of standard_qty paper ordered in the orders table.
```sql
SELECT SUM(o.standard_qty) standard_qty_total
FROM orders o;
```
3. Find the total dollar amount of sales using the total_amt_usd in the orders table.
```sql
SELECT SUM(o.total_amt_usd) total
FROM orders o;
```
4. Find the total amount spent on standard_amt_usd and gloss_amt_usd paper for each order in the orders table. This should give a dollar amount for each order in the table.
```sql
SELECT o.standard_amt_usd + o.gloss_amt_usd total_standard_gloss
FROM orders o;
```
5. Find the standard_amt_usd per unit of standard_qty paper. Your solution should use both an aggregation and a mathematical operator.
```sql
SELECT SUM(o.standard_amt_usd)/SUM(o.standard_qty) standard_per_unit
FROM orders o;
```


## MIN and MAX functions

These functions return the smallest and largest values in a column. They work on numbers, dates, and text.

```sql
SELECT MIN(standard_qty) standard_min,
MAX(standard_qty) standard_max
FROM orders;
```


## AVG function

Returns the average of the values in a column. The column must be numeric. NULL values are ignored in both the numerator and the denominator.
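
A short sketch of how AVG treats NULL, compared with dividing the sum by the full row count. The sample_orders table is hypothetical inline data, and PostgreSQL-style syntax is assumed.

```sql
-- Hypothetical sample data: three orders, one with a NULL quantity.
WITH sample_orders (id, standard_qty) AS (
    VALUES (1, 10), (2, NULL), (3, 20)
)
SELECT AVG(standard_qty)            AS avg_ignoring_null,  -- 15: (10 + 20) / 2
       SUM(standard_qty) / COUNT(*) AS avg_null_as_zero    -- 10: (10 + 20) / 3
FROM sample_orders;
```

If NULL should count as zero, the second form (or wrapping the column in COALESCE) is the one to use.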

Questions:

1. When was the earliest order ever placed? You only need to return the date.
```sql
SELECT MIN(occurred_at) earliest_order
FROM orders;
```
2. Try performing the same query as in question 1 without using an aggregation function.
```sql
SELECT occurred_at earliest_order
FROM orders
ORDER BY occurred_at
LIMIT 1;
```
3. When did the most recent (latest) web_event occur?
```sql
SELECT MAX(occurred_at) latest_event
FROM web_events;
```
4. Try to perform the result of the previous query without using an aggregation function.
```sql
SELECT occurred_at latest_event
FROM web_events
ORDER BY occurred_at DESC
LIMIT 1;
```
5. Find the mean (AVERAGE) amount spent per order on each paper type, as well as the mean amount of each paper type purchased per order.
```sql
SELECT AVG(standard_amt_usd) mean_standard_usd,
AVG(gloss_amt_usd) mean_gloss_usd,
AVG(poster_amt_usd) mean_poster_usd,
AVG(standard_qty) mean_standard,
AVG(gloss_qty) mean_gloss,
AVG(poster_qty) mean_poster
FROM orders;
```
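6. What is the MEDIAN total_amt_usd spent on all orders?
```sql
-- 3457 is hard-coded: it assumes the two middle rows of this orders table
-- are rows 3456 and 3457 when sorted by total_amt_usd.
-- The median is the average of the two values returned.
SELECT *
FROM (SELECT total_amt_usd
      FROM orders
      ORDER BY total_amt_usd
      LIMIT 3457) AS Table1
ORDER BY total_amt_usd DESC
LIMIT 2;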
```
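
One alternative that avoids hard-coding LIMIT 3457, assuming PostgreSQL 9.4 or later (or another database with ordered-set aggregates), is PERCENTILE_CONT. The sketch below uses hypothetical inline data rather than the orders table.

```sql
-- Hypothetical sample data with an even number of rows.
WITH sample_orders (total_amt_usd) AS (
    VALUES (10), (20), (30), (40)
)
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY total_amt_usd) AS median_usd
FROM sample_orders;
-- 25: the average of the two middle values, 20 and 30
```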

## GROUP BY clause

Goes between the WHERE and the ORDER BY clauses.

Splits the aggregated data into groups based on another, non-aggregated column.

In the example below, account_id is the grouping column. Each account id appears once in the output, and the sums of the standard, gloss, and poster quantities for that account fill the columns beside it.

Example:
```sql
SELECT account_id,
SUM(standard_qty) standard_sum,
SUM(gloss_qty) gloss_sum,
SUM(poster_qty) poster_sum
FROM orders
GROUP BY account_id
ORDER BY account_id;
```
- GROUP BY can be used to aggregate data within subsets of the data. For example, grouping for different accounts, different regions, or different sales representatives.
- Any column in the SELECT statement that is not within an aggregator must be in the GROUP BY clause.
- The GROUP BY always goes between WHERE and ORDER BY.
- ORDER BY works like SORT in spreadsheet software.
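
The clause order described above can be sketched on a tiny hypothetical table (sample_orders is made up for illustration; PostgreSQL-style syntax is assumed). WHERE filters rows first, GROUP BY then forms one group per account, and ORDER BY sorts the grouped result.

```sql
-- Hypothetical sample data: five orders across three accounts.
WITH sample_orders (account_id, standard_qty) AS (
    VALUES (1001, 10), (1001, 30), (1002, 5), (1002, 0), (1003, 7)
)
SELECT account_id,                        -- not aggregated, so it must be in GROUP BY
       SUM(standard_qty) AS standard_sum
FROM sample_orders
WHERE standard_qty > 0                    -- row filter, applied before grouping
GROUP BY account_id
ORDER BY account_id;
-- 1001 | 40
-- 1002 | 5
-- 1003 | 7
```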

Test Questions:
1. Which account (by name) placed the earliest order? Your solution should have the account name and the date of the order.
```sql
SELECT act.name account, MIN(o.occurred_at) earliest_order
FROM orders o
JOIN accounts act
ON act.id = o.account_id
GROUP BY act.name
ORDER BY earliest_order
LIMIT 1;
```
2. Find the total sales in usd for each account. You should include two columns - the total sales for each company's orders in usd and the company name.
```sql
SELECT act.name company_name, SUM(total_amt_usd) total_usd_sales
FROM orders o
JOIN accounts act
ON act.id = o.account_id
GROUP BY act.name
ORDER BY company_name;
```
3. Via what channel did the most recent (latest) web_event occur, which account was associated with this web_event? Your query should return only three values - the date, channel, and account name.
```sql
SELECT MAX(we.occurred_at) latest_event, we.channel, act.name
FROM web_events we
JOIN accounts act ON we.account_id = act.id
GROUP BY we.channel, act.name
ORDER BY latest_event DESC
LIMIT 1;
```
4. Find the total number of times each type of channel from the web_events was used. Your final table should have two columns - the channel and the number of times the channel was used.
```sql
SELECT COUNT(we.channel) channel_use_count, we.channel
FROM web_events we
WHERE we.channel IS NOT NULL
GROUP BY we.channel
ORDER BY channel_use_count DESC;
```
5. Who was the primary contact associated with the earliest web_event?
```sql
SELECT MIN(we.occurred_at) earliest_event, act.primary_poc
FROM web_events we
JOIN accounts act
ON we.account_id = act.id
GROUP BY act.primary_poc
ORDER BY earliest_event
LIMIT 1;
```
6. What was the smallest order placed by each account in terms of total usd. Provide only two columns - the account name and the total usd. Order from smallest dollar amounts to largest.
```sql
SELECT act.name, MIN(o.total_amt_usd) smallest_order_usd
FROM orders o
JOIN accounts act ON o.account_id = act.id
GROUP BY act.name
ORDER BY smallest_order_usd;
```
7. Find the number of sales reps in each region. Your final table should have two columns - the region and the number of sales_reps. Order from fewest reps to most reps.
```sql
SELECT COUNT(sr.name) number_of_sales_rep, r.name region_name
FROM sales_reps sr
JOIN region r ON sr.region_id = r.id
GROUP BY r.name
ORDER BY number_of_sales_rep;
```

Second Set of Questions:
1. For each account, determine the average amount of each type of paper they purchased across their orders. Your result should have four columns - one for the account name and one for the average quantity purchased for each of the paper types for each account.
```sql
SELECT act.name account,
ROUND(AVG(o.standard_qty), 2) std_qty_avg,
ROUND(AVG(o.gloss_qty), 2) gloss_qty_avg,
ROUND(AVG(o.poster_qty), 2) poster_qty_avg
FROM orders o
JOIN accounts act ON act.id = o.account_id
GROUP BY act.name
ORDER BY act.name;
```
2. For each account, determine the average amount spent per order on each paper type. Your result should have four columns - one for the account name and one for the average amount spent on each paper type.
```sql
SELECT act.name account,
ROUND(AVG(o.standard_amt_usd), 2) std_usd_avg,
ROUND(AVG(o.gloss_amt_usd), 2) gloss_usd_avg,
ROUND(AVG(o.poster_amt_usd), 2) poster_usd_avg
FROM orders o
JOIN accounts act ON act.id = o.account_id
GROUP BY act.name
ORDER BY act.name;
```
3. Determine the number of times a particular channel was used in the web_events table for each sales rep. Your final table should have three columns - the name of the sales rep, the channel, and the number of occurrences. Order your table with the highest number of occurrences first.
```sql
SELECT sr.name rep, we.channel,
COUNT(act.id) event_count
FROM accounts act
JOIN web_events we ON act.id = we.account_id
JOIN sales_reps sr ON act.sales_rep_id = sr.id
GROUP BY sr.name, we.channel
ORDER BY event_count DESC;
```
4. Determine the number of times a particular channel was used in the web_events table for each region. Your final table should have three columns - the region name, the channel, and the number of occurrences. Order your table with the highest number of occurrences first.
```sql
SELECT r.name region, we.channel,
COUNT(act.id) event_count
FROM accounts act
JOIN web_events we ON act.id = we.account_id
JOIN sales_reps sr ON act.sales_rep_id = sr.id
JOIN region r ON sr.region_id = r.id
GROUP BY r.name, we.channel
ORDER BY event_count DESC;
```


## DISTINCT statement
Used in the SELECT clause. Returns the unique rows across all of the columns listed in the SELECT, so DISTINCT is written only once, right after SELECT.

Correct use:
```sql
SELECT DISTINCT column1, column2, column3
FROM table1;
```
Incorrect use:
```sql
SELECT DISTINCT column1, DISTINCT column2, DISTINCT column3
FROM table1;
```
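
To see that DISTINCT applies to the combination of columns rather than to each column separately, here is a minimal sketch on hypothetical inline data (sample_events is made up for illustration; PostgreSQL-style syntax is assumed).

```sql
-- Hypothetical sample data: four events, one of them an exact duplicate.
WITH sample_events (account_id, channel) AS (
    VALUES (1, 'direct'), (1, 'direct'), (1, 'facebook'), (2, 'direct')
)
SELECT DISTINCT account_id, channel
FROM sample_events
ORDER BY account_id, channel;
-- 1 | direct
-- 1 | facebook
-- 2 | direct
-- Only the duplicated (1, 'direct') pair collapses; 'direct' still appears for both accounts.
```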

TEST Questions:
1. Use DISTINCT to test if there are any accounts associated with more than one region.
```sql
SELECT DISTINCT a.name account_name, r.name region
FROM accounts a
JOIN sales_reps s ON a.sales_rep_id = s.id
JOIN region r ON s.region_id = r.id
ORDER BY account_name;
```
2. Have any sales reps worked on more than one account? YES
```sql
SELECT DISTINCT s.name rep, a.name account_name
FROM accounts a
JOIN sales_reps s ON a.sales_rep_id = s.id
ORDER BY rep;
```



## HAVING CLAUSE
Like a WHERE clause, but it filters on aggregated data, so it goes after GROUP BY.  
Example: 
```sql
SELECT account_id,
SUM(total_amt_usd) sum_total_amt_usd
FROM orders
GROUP BY 1
HAVING SUM(total_amt_usd) >= 250000;
```
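
The difference between WHERE and HAVING can be sketched on a small hypothetical table (sample_orders is made up for illustration; PostgreSQL-style syntax is assumed). WHERE removes individual rows before grouping, and HAVING removes whole groups after the aggregate is computed.

```sql
-- Hypothetical sample data: five orders across three accounts.
WITH sample_orders (account_id, total_amt_usd) AS (
    VALUES (1, 100), (1, 900), (2, 50), (2, 60), (3, 2000)
)
SELECT account_id,
       SUM(total_amt_usd) AS sum_total_amt_usd
FROM sample_orders
WHERE total_amt_usd >= 60            -- drops the single 50 usd order
GROUP BY account_id
HAVING SUM(total_amt_usd) >= 1000    -- drops account 2, whose remaining sum is 60
ORDER BY account_id;
-- 1 | 1000
-- 3 | 2000
```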
Test Questions:
1. How many of the sales reps have more than 5 accounts that they manage? 34
```sql
SELECT COUNT(id) account_count, sales_rep_id rep_id
FROM accounts
GROUP BY rep_id
HAVING COUNT(id) > 5
ORDER BY rep_id;
```
2. How many accounts have more than 20 orders? 120
```sql
SELECT COUNT(id) order_count, account_id account
FROM orders
GROUP BY account
HAVING COUNT(id) > 20
ORDER BY account;
```
3. Which account has the most orders? 3411
```sql
SELECT COUNT(id) order_count, account_id account
FROM orders
GROUP BY account
ORDER BY order_count DESC
LIMIT 1;
```
4. Which accounts spent more than 30,000 usd total across all orders?
```sql
SELECT account_id, SUM(total_amt_usd) sum_total_amt_usd
FROM orders
GROUP BY account_id
HAVING SUM(total_amt_usd) > 30000
ORDER BY sum_total_amt_usd DESC;
```
5. Which accounts spent less than 1,000 usd total across all orders?
```sql
SELECT account_id, SUM(total_amt_usd) sum_total_amt_usd
FROM orders
GROUP BY account_id
HAVING SUM(total_amt_usd) < 1000
ORDER BY sum_total_amt_usd DESC;
```
6. Which account has spent the most with us? 4211
```sql
SELECT account_id, SUM(total_amt_usd) sum_total_amt_usd
FROM orders
GROUP BY account_id
ORDER BY sum_total_amt_usd DESC
LIMIT 1;
```
7. Which account has spent the least with us? 1901
```sql
SELECT account_id, SUM(total_amt_usd) sum_total_amt_usd
FROM orders
GROUP BY account_id
ORDER BY sum_total_amt_usd
LIMIT 1;
```
8. Which accounts used facebook as a channel to contact customers more than 6 times?
```sql
SELECT a.id, a.name, w.channel, COUNT(*) channel_count
FROM accounts a
JOIN web_events w ON a.id = w.account_id
GROUP BY a.id, a.name, w.channel
HAVING COUNT(*) > 6 AND w.channel = 'facebook'
ORDER BY channel_count;
```
9. Which account used facebook most as a channel?
```sql
SELECT a.id, a.name, w.channel, COUNT(*) channel_count
FROM accounts a
JOIN web_events w ON a.id = w.account_id
WHERE w.channel = 'facebook'
GROUP BY a.id, a.name, w.channel
ORDER BY channel_count DESC
LIMIT 1;
```
10. Which channel was most frequently used by most accounts? direct
```sql
SELECT channel, COUNT(account_id) AS num_accounts
FROM web_events
GROUP BY channel
ORDER BY num_accounts DESC;
```

## DATE functions
These functions make it easy to order and group by dates.

- **DATE_TRUNC** - truncates a date-time value to a chosen level of precision. Common truncations are *day*, *month*, and *year*.
- **DATE_PART** - pulls out a single part of the date, such as just the day or just the month, but loses the rest of the date. You can get all the months, for example, but no longer know which year each one belongs to.

dow - used with DATE_PART to pull the day of the week as a number, from 0 (Sunday) to 6 (Saturday).
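
A quick sketch of the two functions on a single literal timestamp, assuming PostgreSQL. The timestamp is an arbitrary example value, not taken from the dataset.

```sql
SELECT DATE_TRUNC('month', TIMESTAMP '2016-03-17 14:25:00') AS month_start,   -- 2016-03-01 00:00:00
       DATE_PART('month',  TIMESTAMP '2016-03-17 14:25:00') AS month_number,  -- 3
       DATE_PART('dow',    TIMESTAMP '2016-03-17 14:25:00') AS day_of_week;   -- 4 (Thursday; 0 is Sunday)
```

DATE_TRUNC keeps the year, so March 2016 and March 2017 stay separate when grouping. DATE_PART with 'month' returns 3 for both.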

Test questions:
1. Find the sales in terms of total dollars for all orders in each year, ordered from greatest to least. Do you notice any trends in the yearly sales totals?
```sql
SELECT DATE_PART('year', occurred_at) purchase_year, SUM(total_amt_usd) year_total
FROM orders
GROUP BY purchase_year
ORDER BY year_total DESC;
```
2. Which month did Parch & Posey have the greatest sales in terms of total dollars? Are all months evenly represented by the dataset?
```sql
SELECT DATE_PART('month', occurred_at) purchase_month, SUM(total_amt_usd) month_total
FROM orders
GROUP BY purchase_month
ORDER BY month_total DESC;
```
3. Which year did Parch & Posey have the greatest sales in terms of total number of orders? Are all years evenly represented by the dataset?
```sql
SELECT DATE_PART('year', occurred_at) purchase_year, COUNT(*) year_order_total
FROM orders
GROUP BY purchase_year
ORDER BY year_order_total DESC;
```
4. Which month did Parch & Posey have the greatest sales in terms of total number of orders? Are all months evenly represented by the dataset?
```sql
SELECT DATE_PART('month', occurred_at) purchase_month, COUNT(*) month_order_total
FROM orders
GROUP BY purchase_month
ORDER BY month_order_total DESC;
```
5. In which month of which year did Walmart spend the most on gloss paper in terms of dollars?
```sql
SELECT DATE_TRUNC('month', o.occurred_at) ord_date, SUM(o.gloss_amt_usd) tot_spent
FROM orders o 
JOIN accounts a
ON a.id = o.account_id
WHERE a.name = 'Walmart'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 1;
```


## CASE statement
Works like an if statement and creates a new column. It goes in the SELECT clause.  
Each condition starts with WHEN, followed by THEN and the value to return. ELSE is optional.  
Conditions are checked in order, and the first one that matches wins.  
Must include WHEN, THEN, and END.

Example: 
```sql
SELECT account_id, CASE WHEN standard_qty = 0 OR standard_qty IS NULL THEN 0
                        ELSE standard_amt_usd/standard_qty END AS unit_price
FROM orders
LIMIT 10;
```
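
Because the WHEN branches are checked from top to bottom, a later branch does not need to repeat the upper bound that an earlier branch already handled. A minimal sketch on hypothetical inline data (sample_orders is made up for illustration; PostgreSQL-style syntax is assumed):

```sql
-- Hypothetical sample data: one order for each size category.
WITH sample_orders (id, total) AS (
    VALUES (1, 2500), (2, 1500), (3, 400)
)
SELECT id, total,
       CASE WHEN total >= 2000 THEN 'At Least 2000'
            WHEN total >= 1000 THEN 'Between 1000 and 2000'  -- reached only when total < 2000
            ELSE 'Less than 1000' END AS total_lvl
FROM sample_orders
ORDER BY id;
-- 1 | 2500 | At Least 2000
-- 2 | 1500 | Between 1000 and 2000
-- 3 | 400  | Less than 1000
```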
Questions:
1. Write a query to display for each order, the account ID, total amount of the order, and the level of the order - 'Large' or 'Small' - depending on if the order is $3000 or more, or smaller than $3000.
```sql
SELECT account_id, total_amt_usd,
CASE WHEN total_amt_usd >= 3000 THEN 'Large'
    ELSE 'Small' END AS level
FROM orders;
```
2. Write a query to display the number of orders in each of three categories, based on the total number of items in each order. The three categories are: 'At Least 2000', 'Between 1000 and 2000' and 'Less than 1000'.
```sql
SELECT CASE WHEN total >= 2000 THEN 'At Least 2000'
WHEN total >= 1000 THEN 'Between 1000 and 2000'
ELSE 'Less than 1000' END AS total_lvl, COUNT(*) AS total_count
FROM orders
GROUP BY 1;
```
3. We would like to understand 3 different levels of customers based on the amount associated with their purchases. The top level includes anyone with a Lifetime Value (total sales of all orders) greater than 200,000 usd. The second level is between 200,000 and 100,000 usd. The lowest level is anyone under 100,000 usd. Provide a table that includes the level associated with each account. You should provide the account name, the total sales of all orders for the customer, and the level. Order with the top spending customers listed first.
```sql
SELECT a.name account, SUM(o.total_amt_usd) total_sales,
CASE WHEN SUM(o.total_amt_usd) > 200000 THEN 'High'
WHEN SUM(o.total_amt_usd) > 100000 THEN 'Med'
ELSE 'Low' END AS lifetime_level
FROM orders o
JOIN accounts a ON o.account_id = a.id
GROUP BY account
ORDER BY total_sales DESC;
```
4. We would now like to perform a similar calculation to the first, but we want to obtain the total amount spent by customers only in 2016 and 2017. Keep the same levels as in the previous question. Order with the top spending customers listed first.
```sql
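SELECT a.name account, SUM(o.total_amt_usd) total_sales,
CASE WHEN SUM(o.total_amt_usd) > 200000 THEN 'High'
WHEN SUM(o.total_amt_usd) > 100000 THEN 'Med'
ELSE 'Low' END AS lifetime_level
FROM orders o
JOIN accounts a ON o.account_id = a.id
WHERE DATE_PART('year', o.occurred_at) IN (2016, 2017)
GROUP BY a.name
ORDER BY total_sales DESC;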
```
5. We would like to identify top performing sales reps, which are sales reps associated with more than 200 orders. Create a table with the sales rep name, the total number of orders, and a column with top or not depending on if they have more than 200 orders. Place the top sales people first in your final table.
```sql
SELECT s.name rep_name, COUNT(o.*) total_orders,
CASE WHEN COUNT(o.*) > 200 THEN 'yes'
ELSE 'no' END AS top
FROM orders o
JOIN accounts a ON o.account_id = a.id
JOIN sales_reps s ON a.sales_rep_id = s.id
GROUP BY rep_name
ORDER BY total_orders DESC;
```
6. The previous didn't account for the middle, nor the dollar amount associated with the sales. Management decides they want to see these characteristics represented as well. We would like to identify top performing sales reps, which are sales reps associated with more than 200 orders or more than 750000 in total sales. The middle group has any rep with more than 150 orders or 500000 in sales. Create a table with the sales rep name, the total number of orders, total sales across all orders, and a column with top, middle, or low depending on this criteria. Place the top sales people based on dollar amount of sales first in your final table. You might see a few upset sales people by this criteria!
```sql
SELECT s.name rep_name, COUNT(o.*) total_orders, SUM(o.total_amt_usd) total_sales_usd,
CASE WHEN COUNT(o.*) > 200 OR SUM(o.total_amt_usd) > 750000 THEN 'TOP'
WHEN COUNT(o.*) <= 200 AND COUNT(o.*) > 150 OR SUM(o.total_amt_usd) <= 750000 AND SUM(o.total_amt_usd) > 500000 THEN 'MID'
ELSE 'LOW' END AS top
FROM orders o
JOIN accounts a ON o.account_id = a.id
JOIN sales_reps s ON a.sales_rep_id = s.id
GROUP BY rep_name
ORDER BY total_sales_usd DESC;
```




