
Subqueries and common table expressions (CTEs) both let you write a query that produces a table, and then write a second query that works on that newly created table.


# First Subquery


## Subquery that returns a table

1. Write a subquery to find the number of events that occur for each day for each channel

2. Then find the average number of events for each channel. Because the subquery groups by day, the result is the average number of events per day for each channel.

```SQL
SELECT channel, AVG(num_events)
FROM (SELECT DATE_TRUNC('day', occurred_at) AS day,
             channel,
             COUNT(*) num_events
      FROM web_events
      GROUP BY 1, 2
      ORDER BY 3 DESC) AS sub
GROUP BY channel
```

## Subquery  that returns a single value or set of values (or a column)

Subqueries can appear almost anywhere in a query. In addition to returning a table, a subquery can return a single value or a set of values (an entire column).

```SQL
/* Return the orders that occurred in the same month as the first order */
SELECT * 
FROM orders
WHERE DATE_TRUNC('month', occurred_at) = 
/*The month of the earliest order */
(SELECT DATE_TRUNC('month', MIN(occurred_at)) min_month FROM orders)
ORDER BY occurred_at
```
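The query above compares against a subquery that returns a single value, so it uses `=`. When the subquery returns a set of values (one column, many rows), it is used with `IN` instead. A minimal sketch, assuming the same `accounts` and `orders` tables; the `10000` threshold is an arbitrary value chosen only for illustration:

```SQL
/* Accounts that placed at least one order above an arbitrary threshold.
   The 10000 cutoff is a made-up value for illustration. */
SELECT id, name
FROM accounts
WHERE id IN (SELECT account_id      /* one column, possibly many rows */
             FROM orders
             WHERE total_amt_usd > 10000)
ORDER BY name;
```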
>- Note that you should not give the subquery a table alias when you use it in a conditional statement. This is because the subquery is treated as an individual value (or a set of values in the `IN` case) rather than as a table. A column alias inside the subquery, such as `min_month` above, is allowed but has no effect on the comparison.

>- If the subquery returns an entire table, then we must give that table an alias, and the outer query performs additional logic on the entire table.


# More on Subqueries

1. Use `DATE_TRUNC` to pull month level information about the first order ever placed in the `orders` table

2. Use the result of the previous query to find only the orders that took place in the same month and year as the first order, and then pull the average for each type of paper `qty` in this month

```SQL
SELECT AVG(standard_qty) standard,
       AVG(gloss_qty) gloss,
       AVG(poster_qty) poster
FROM orders
WHERE DATE_TRUNC('month', occurred_at) = (
    SELECT DATE_TRUNC('month', MIN(occurred_at)) min_month 
    FROM orders)
```
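Step 1 above can also be run on its own, which is a quick way to check which month the outer query is matching against before nesting it. A small sketch using the same `orders` table:

```SQL
/* Step 1 only: the month of the first order ever placed */
SELECT DATE_TRUNC('month', MIN(occurred_at)) AS min_month
FROM orders;
```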

## Subquery Mania - Quiz

1. Provide the name of the sales_rep in each region with the largest amount of total_amt_usd sales.

```SQL
SELECT t3.rep, t2.region, t2.max_total_sum
FROM
 /*Pull the max for each region, and then we can use this to pull  those rows in our final result.*/
(SELECT region, MAX(total_sum) max_total_sum
 FROM
 /*find the total_amt_usd totals associated with each sales rep of each region*/
 (SELECT s.name rep,
         r.name region,
         SUM(o.total_amt_usd) total_sum
  FROM orders o
  JOIN accounts a ON a.id = o.account_id
  JOIN sales_reps s ON s.id = a.sales_rep_id
  JOIN region r ON r.id = s.region_id
  GROUP BY 2, 1) t1
 GROUP BY 1) t2
/* Essentially, this is a JOIN of the two tables, where the region and amount match.*/
JOIN
(SELECT s.name rep, r.name region, SUM(total_amt_usd) total_amt
 FROM orders o
 JOIN accounts a ON o.account_id = a.id
 JOIN sales_reps s ON a.sales_rep_id = s.id
 JOIN region r ON s.region_id = r.id
 GROUP BY s.name, r.name) t3
ON t2.region = t3.region AND t2.max_total_sum = t3.total_amt
ORDER BY 3 DESC
```


2. For the region with the largest (sum) of sales total_amt_usd, how many total (count) orders were placed?

```SQL
SELECT r.name region, COUNT(*) total_orders
FROM orders o
JOIN accounts a ON o.account_id = a.id
JOIN sales_reps s ON a.sales_rep_id = s.id
JOIN region r ON s.region_id = r.id
GROUP BY r.name
HAVING r.name = (/* the region with the largest sum of sales */
    SELECT region FROM
       (SELECT r.name region, sum(total_amt_usd)
        FROM orders o
        JOIN accounts a ON o.account_id = a.id
        JOIN sales_reps s ON a.sales_rep_id = s.id
        JOIN region r ON s.region_id = r.id
        GROUP BY region
        ORDER BY 2 DESC
        LIMIT 1) t1)
```

3. (a) How many accounts had more total purchases than the account name which has bought the most standard_qty paper throughout their lifetime as a customer?

```SQL
SELECT COUNT(*) 
FROM 
(SELECT a.name
 FROM orders o 
 JOIN accounts a ON o.account_id = a.id
 GROUP BY a.name
 HAVING SUM(o.total) > /*return the total purchases made by the account that has bought the most standard_qty paper: */
      (SELECT total_sum 
       FROM (SELECT a.name, 
                    SUM(o.standard_qty) total_std,
                    SUM(o.total) total_sum
             FROM orders o
             JOIN accounts a ON o.account_id = a.id
             GROUP BY a.name
             ORDER BY total_std DESC
             LIMIT 1) sub1 
      ) /* no alias needed here, because this subquery returns a single value, not a table */
) sub2 
```



3. (b) Classify accounts that have bought mostly standard_qty paper throughout their lifetime as a customer

```SQL
SELECT a.name, 
       CASE WHEN SUM(o.standard_qty) > SUM(o.gloss_qty) AND SUM(o.standard_qty) > SUM(o.poster_qty) THEN 'Yes' 
       ELSE 'No' END AS most_standard_ord
FROM orders o
JOIN accounts a ON o.account_id = a.id
GROUP BY 1
```


4. For the customer that spent the most (in total over their lifetime as a customer) total_amt_usd, how many web_events did they have for each channel?

```SQL
/* My solution */
SELECT a.name, w.channel, COUNT(w.*)
FROM web_events w
JOIN accounts a 
ON w.account_id = a.id
GROUP BY 1, 2
HAVING a.name = ( 
    /*Return the account that spent the most*/
    SELECT name
    FROM ( SELECT a.name, SUM(o.total_amt_usd) total_sum
           FROM orders o
           JOIN accounts a 
           ON o.account_id = a.id
           GROUP BY 1
           ORDER BY 2 DESC
           LIMIT 1) sub
     )
ORDER BY 3 DESC;


/* Solution from Udacity */
SELECT a.name, w.channel, COUNT(*)
FROM accounts a
JOIN web_events w
ON a.id = w.account_id AND a.id =  (SELECT id
                     FROM (SELECT a.id, a.name, SUM(o.total_amt_usd) tot_spent
                           FROM orders o
                           JOIN accounts a
                           ON a.id = o.account_id
                           GROUP BY a.id, a.name
                           ORDER BY 3 DESC
                           LIMIT 1) inner_table)
GROUP BY 1, 2
ORDER BY 3 DESC;
```


5. What is the lifetime average amount spent in terms of total_amt_usd for the top 10 total spending accounts?

```SQL
SELECT AVG(total_sum)
FROM (
  SELECT SUM(total_amt_usd) total_sum
  FROM orders
  GROUP BY account_id
  ORDER BY 1 DESC
  LIMIT 10) sub
```

6. What is the lifetime average amount spent in terms of total_amt_usd, including only the companies that spent more per order, on average, than the average of all orders?

```SQL
SELECT AVG(avg_total)
FROM
( SELECT account_id, AVG(total_amt_usd) avg_total
  FROM orders
  GROUP BY account_id
  HAVING AVG(total_amt_usd) > (SELECT AVG(total_amt_usd) FROM orders)
) sub
```

# `WITH` for CTE

The `WITH` statement is often called a **Common Table Expression** or **CTE**.
- Essentially, a `WITH` statement performs the same task as a subquery. You write each intermediate query as a named temporary result using `WITH`, then refer to it by name in the main query. This can make your query cleaner to read.

```SQL
WITH table1 AS (
          SELECT *
          FROM web_events),

     table2 AS (
          SELECT *
          FROM accounts)


SELECT *
FROM table1
JOIN table2
ON table1.account_id = table2.id;
```
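For comparison, the same join can be written with inline subqueries instead of `WITH`. This is only a sketch to show the equivalence; `table1` and `table2` are reused here as subquery aliases:

```SQL
/* Same result as the WITH version above, written as subqueries in FROM */
SELECT *
FROM (SELECT *
      FROM web_events) table1
JOIN (SELECT *
      FROM accounts) table2
ON table1.account_id = table2.id;
```

Each subquery has to be defined at the point where it is used, whereas a CTE is defined once at the top and can be referenced by name more than once.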

## Subquery Mania using `WITH` - Quiz

1. Provide the name of the sales_rep in each region with the largest amount of total_amt_usd sales.

```SQL

WITH 
t1 AS (SELECT s.name rep, 
              r.name region, 
              SUM(o.total_amt_usd) total_sum
       FROM orders o
       JOIN accounts a ON a.id = o.account_id
       JOIN sales_reps s ON s.id = a.sales_rep_id
       JOIN region r ON r.id = s.region_id
       GROUP BY 2, 1), 
  
t2 AS (SELECT region, MAX(total_sum) max_total_sum
       FROM t1
       GROUP BY 1)
         
SELECT t1.rep, t1.region, t1.total_sum
FROM t1
JOIN t2
ON t1.region = t2.region AND t1.total_sum = t2.max_total_sum
ORDER BY 3 DESC
```


2. For the region with the largest (sum) of sales total_amt_usd, how many total (count) orders were placed?

```SQL
WITH 
t1 AS (SELECT r.name region, sum(total_amt_usd) total_amt
       FROM orders o
       JOIN accounts a ON o.account_id = a.id
       JOIN sales_reps s ON a.sales_rep_id = s.id
       JOIN region r ON s.region_id = r.id
       GROUP BY region
       ORDER BY 2 DESC
       LIMIT 1),
       
t2 AS (SELECT MAX(total_amt) max_total
       FROM t1)
       
SELECT r.name region, COUNT(*) total_orders
FROM orders o
JOIN accounts a ON o.account_id = a.id
JOIN sales_reps s ON a.sales_rep_id = s.id
JOIN region r ON s.region_id = r.id
GROUP BY r.name
HAVING SUM(total_amt_usd) = (SELECT * FROM t2); /* t2 is not in this query's FROM clause, so t2.max_total cannot be referenced directly; read its value through a scalar subquery */
```
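An alternative to reading `t2` through a scalar subquery is to bring it into the `FROM` clause with a `JOIN`. The following is one possible sketch, not the course solution; it assumes the per-region order count is computed inside `t1` under the illustrative column name `total_orders`:

```SQL
WITH
/* One row per region: total sales and number of orders */
t1 AS (SELECT r.name region,
              SUM(o.total_amt_usd) total_amt,
              COUNT(*) total_orders
       FROM orders o
       JOIN accounts a ON o.account_id = a.id
       JOIN sales_reps s ON a.sales_rep_id = s.id
       JOIN region r ON s.region_id = r.id
       GROUP BY r.name),

/* The largest regional total */
t2 AS (SELECT MAX(total_amt) max_total
       FROM t1)

/* Keep only the region whose total equals the maximum */
SELECT t1.region, t1.total_orders
FROM t1
JOIN t2
ON t1.total_amt = t2.max_total;
```

Because `t2` now appears in `FROM`, `t2.max_total` can be referenced directly in the join condition.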

3. (a) How many accounts had more total purchases than the account name which has bought the most standard_qty paper throughout their lifetime as a customer?

```SQL
WITH 
t1 AS (SELECT a.name, 
              SUM(o.standard_qty) total_std,
              SUM(o.total) total_sum
       FROM orders o
       JOIN accounts a ON o.account_id = a.id
       GROUP BY a.name
       ORDER BY total_std DESC
       LIMIT 1), 
       
t2 AS (SELECT a.name
       FROM orders o 
       JOIN accounts a ON o.account_id = a.id
       GROUP BY a.name
       HAVING SUM(o.total) > (SELECT total_sum FROM t1))
      
    
SELECT COUNT(*) 
FROM t2
```



3. (b) Classify accounts that have bought mostly standard_qty paper throughout their lifetime as a customer


```SQL
SELECT a.name, 
       CASE WHEN SUM(o.standard_qty) > SUM(o.gloss_qty) AND SUM(o.standard_qty) > SUM(o.poster_qty) THEN 'Yes' 
       ELSE 'No' END AS most_standard_ord
FROM orders o
JOIN accounts a ON o.account_id = a.id
GROUP BY 1
```


4. For the customer that spent the most (in total over their lifetime as a customer) total_amt_usd, how many web_events did they have for each channel?

```SQL

/* My solution */
WITH 
t1 AS (SELECT a.name, SUM(o.total_amt_usd) total_sum
       FROM orders o
       JOIN accounts a 
       ON o.account_id = a.id
       GROUP BY 1
       ORDER BY 2 DESC
       LIMIT 1)
    
SELECT a.name, w.channel, COUNT(w.*)
FROM web_events w
JOIN accounts a 
ON w.account_id = a.id
GROUP BY 1, 2
HAVING a.name = ( SELECT name FROM t1)
ORDER BY 3 DESC;


/* Solution from Udacity */
WITH
t1 AS (SELECT a.id, a.name, SUM(o.total_amt_usd) tot_spent
       FROM orders o
       JOIN accounts a
       ON a.id = o.account_id
       GROUP BY a.id, a.name
       ORDER BY 3 DESC
       LIMIT 1)
       
SELECT a.name, w.channel, COUNT(*)
FROM accounts a
JOIN web_events w
ON a.id = w.account_id AND a.id =  (SELECT id FROM t1)
GROUP BY 1, 2
ORDER BY 3 DESC;
```


5. What is the lifetime average amount spent in terms of total_amt_usd for the top 10 total spending accounts?

```SQL
WITH 
temp AS (SELECT SUM(total_amt_usd) total_sum
         FROM orders
         GROUP BY account_id
         ORDER BY 1 DESC
         LIMIT 10)
         
SELECT AVG(total_sum)
FROM temp
```

6. What is the lifetime average amount spent in terms of total_amt_usd, including only the companies that spent more per order, on average, than the average of all orders?

```SQL

WITH
temp AS (SELECT account_id, AVG(total_amt_usd) avg_total
         FROM orders
         GROUP BY account_id
         HAVING AVG(total_amt_usd) > (SELECT AVG(total_amt_usd) FROM orders))
       
SELECT AVG(avg_total)
FROM temp
```

## Subqueries and CTE are widely used

Arguably, subqueries and CTEs are the advanced SQL features used most often in an analytics role within a company. Being able to break a problem down into the necessary intermediate tables and then find the solution from the resulting table is very useful in practice.