# Lesson 10 - SQL Subqueries and Temporary Tables

## Subqueries or Inner queries - `SUB`

A subquery lets you build a derived table from existing columns and then query that derived table. The steps are:
1. The original query goes in the `FROM` clause, wrapped in parentheses.
2. An `*` is used in the outer `SELECT` to pull all of the data from the original query.
3. You MUST use an alias for the table you nest within the outer query.


<img src="../SQL/ERD DAND.jpg" width="600" height="400">

For example, suppose you want to find the average number of events per day for each channel. The first (inner) query provides the number of events for each day and channel, and a second (outer) query then averages these values.

    SELECT channel, AVG(events) AS average_events
    FROM (SELECT DATE_TRUNC('day',occurred_at) AS day,
                 channel, COUNT(*) as events
          FROM web_events
          GROUP BY 1,2) sub
    GROUP BY channel
    ORDER BY 2 DESC;

Also pay attention to the formatting of the query above for readability. Indent each line so it is clear whether it belongs to the outer query or to the inner query (subquery).
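To try this pattern without access to the course database, here is a minimal sketch using Python's built-in `sqlite3` module. The rows in `web_events` are invented for illustration, and SQLite's `date()` stands in for `DATE_TRUNC('day', ...)`, which SQLite does not provide.

```python
import sqlite3

# In-memory database with a toy web_events table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE web_events (occurred_at TEXT, channel TEXT)")
conn.executemany(
    "INSERT INTO web_events VALUES (?, ?)",
    [
        ("2016-01-01 09:00:00", "direct"),
        ("2016-01-01 10:00:00", "direct"),
        ("2016-01-01 11:00:00", "direct"),
        ("2016-01-02 09:00:00", "direct"),
        ("2016-01-01 12:00:00", "facebook"),
        ("2016-01-02 12:00:00", "facebook"),
    ],
)

# Inner query: events per day and channel.
# Outer query: average of those daily counts per channel.
query = """
SELECT channel, AVG(events) AS average_events
FROM (SELECT date(occurred_at) AS day,   -- assumption: date() replaces DATE_TRUNC
             channel, COUNT(*) AS events
      FROM web_events
      GROUP BY 1, 2) sub
GROUP BY channel
ORDER BY 2 DESC;
"""
daily_average_rows = conn.execute(query).fetchall()
print(daily_average_rows)
```

With this toy data, `direct` has daily counts of 3 and 1, and `facebook` has 1 and 1, so the outer query averages them per channel.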



## Subqueries that return a single value

When a subquery returns a single value, you can use it in a logical statement such as `WHERE` or `HAVING`, or even in `SELECT`, where the value could be nested within a `CASE` statement. **Note that you should not include an alias when you write a subquery in a conditional statement.** This is because the subquery is treated as an individual value (or a set of values in the `IN` case) rather than as a table.

If your subquery returns an entire column, you need to use `IN` to perform the logical comparison. If it returns an entire table, then you must use an **ALIAS** for the table and perform any additional logic on the table as a whole.
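The difference between a single-value subquery and a column subquery can be sketched with Python's built-in `sqlite3` module. The `orders` rows and the 400 threshold below are invented for illustration; only the `account_id` and `total_amt_usd` columns are modelled.

```python
import sqlite3

# Toy orders table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (account_id INTEGER, total_amt_usd REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 100.0), (1, 300.0), (2, 50.0), (3, 400.0)],
)

# Single-value subquery: it returns one number, so `>` works and no alias is used.
above_average = conn.execute("""
    SELECT account_id, total_amt_usd
    FROM orders
    WHERE total_amt_usd > (SELECT AVG(total_amt_usd) FROM orders)
    ORDER BY total_amt_usd
""").fetchall()

# Column subquery: it returns several account ids, so it is compared with IN.
big_spender_orders = conn.execute("""
    SELECT account_id, total_amt_usd
    FROM orders
    WHERE account_id IN (SELECT account_id
                         FROM orders
                         GROUP BY account_id
                         HAVING SUM(total_amt_usd) >= 400)
    ORDER BY account_id, total_amt_usd
""").fetchall()

print(above_average)
print(big_spender_orders)
```

Neither subquery carries an alias, because each sits inside a condition rather than in `FROM`.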

_**Example:**_

The average amount of standard paper sold in the first month that any order was placed in the orders table (in terms of quantity). Answer: 268

The average amount of gloss paper sold in the first month that any order was placed in the orders table (in terms of quantity). Answer: 209

The average amount of poster paper sold in the first month that any order was placed in the orders table (in terms of quantity). Answer: 112

The total amount spent on all orders in the first month that any order was placed in the orders table (in terms of usd). Answer: 377331

    SELECT AVG(standard_qty) mean_standard, AVG(gloss_qty) mean_gloss,
           AVG(poster_qty) mean_poster, SUM(total_amt_usd) sum_total_spent
    FROM orders
    WHERE DATE_TRUNC('month', occurred_at) =
          (SELECT DATE_TRUNC('month', MIN(occurred_at)) AS min_month
           FROM orders);




## Complex subquery examples (Video 10 - Subquery Mania):



<img src="../SQL/ERD DAND.jpg" width="600" height="400">


**Provide the name of the sales_rep in each region with the largest amount of total_amt_usd sales.**

    SELECT t3.rep_name, t3.region_name, t3.total_amt
    FROM (SELECT region_name, MAX(total_amt) total_amt
          FROM (SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
                FROM sales_reps s
                JOIN accounts a
                ON a.sales_rep_id = s.id
                JOIN orders o
                ON o.account_id = a.id
                JOIN region r
                ON r.id = s.region_id
                GROUP BY 1, 2) t1
          GROUP BY 1) t2
    JOIN (SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
          FROM sales_reps s
          JOIN accounts a
          ON a.sales_rep_id = s.id
          JOIN orders o
          ON o.account_id = a.id
          JOIN region r
          ON r.id = s.region_id
          GROUP BY 1, 2
          ORDER BY 3 DESC) t3
    ON t3.region_name = t2.region_name AND t3.total_amt = t2.total_amt;
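The query above follows a three-step pattern: total the sales per rep, take the maximum total per region, then join the maxima back to the per-rep totals to recover the rep's name. Here is a minimal sketch of that pattern with Python's built-in `sqlite3` module. All table contents (names such as `Ana` and `Northeast`) are invented, only the columns the query touches are created, and a final `ORDER BY` is added so the output order is predictable.

```python
import sqlite3

# Toy versions of the four tables the query touches (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE region (id INTEGER, name TEXT);
CREATE TABLE sales_reps (id INTEGER, name TEXT, region_id INTEGER);
CREATE TABLE accounts (id INTEGER, name TEXT, sales_rep_id INTEGER);
CREATE TABLE orders (account_id INTEGER, total_amt_usd REAL);

INSERT INTO region VALUES (1, 'Northeast'), (2, 'West');
INSERT INTO sales_reps VALUES (1, 'Ana', 1), (2, 'Ben', 1), (3, 'Cy', 2);
INSERT INTO accounts VALUES (1, 'A', 1), (2, 'B', 2), (3, 'C', 3);
INSERT INTO orders VALUES (1, 100), (1, 200), (2, 250), (3, 50);
""")

query = """
SELECT t3.rep_name, t3.region_name, t3.total_amt
FROM (SELECT region_name, MAX(total_amt) total_amt      -- step 2: best total per region
      FROM (SELECT s.name rep_name, r.name region_name,
                   SUM(o.total_amt_usd) total_amt       -- step 1: total per rep
            FROM sales_reps s
            JOIN accounts a ON a.sales_rep_id = s.id
            JOIN orders o ON o.account_id = a.id
            JOIN region r ON r.id = s.region_id
            GROUP BY 1, 2) t1
      GROUP BY 1) t2
JOIN (SELECT s.name rep_name, r.name region_name,
             SUM(o.total_amt_usd) total_amt
      FROM sales_reps s
      JOIN accounts a ON a.sales_rep_id = s.id
      JOIN orders o ON o.account_id = a.id
      JOIN region r ON r.id = s.region_id
      GROUP BY 1, 2) t3
ON t3.region_name = t2.region_name
   AND t3.total_amt = t2.total_amt                      -- step 3: join back for the name
ORDER BY t3.total_amt DESC;
"""
top_reps = conn.execute(query).fetchall()
print(top_reps)
```

In the toy data, Ana (300) beats Ben (250) in the Northeast, and Cy is the only rep in the West.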



**For the region with the largest (sum) of sales total_amt_usd, how many total (count) orders were placed?**

Answer: 2357

    SELECT r.name, COUNT(o.total) total_orders
    FROM sales_reps s
    JOIN accounts a
    ON a.sales_rep_id = s.id
    JOIN orders o
    ON o.account_id = a.id
    JOIN region r
    ON r.id = s.region_id
    GROUP BY r.name
    HAVING SUM(o.total_amt_usd) = (
          SELECT MAX(total_amt)
          FROM (SELECT r.name region_name, SUM(o.total_amt_usd) total_amt
                FROM sales_reps s
                JOIN accounts a
                ON a.sales_rep_id = s.id
                JOIN orders o
                ON o.account_id = a.id
                JOIN region r
                ON r.id = s.region_id
                GROUP BY r.name) sub);

**For the name of the account that purchased the most (in total over their lifetime as a customer) standard_qty paper, how many accounts still had more in total purchases?**

Answer: 3

    SELECT COUNT(*)
    FROM (SELECT a.name
          FROM orders o
          JOIN accounts a
          ON a.id = o.account_id
          GROUP BY 1
          HAVING SUM(o.total) > (SELECT total
                                 FROM (SELECT a.name act_name, SUM(o.standard_qty) tot_std, SUM(o.total) total
                                       FROM accounts a
                                       JOIN orders o
                                       ON o.account_id = a.id
                                       GROUP BY 1
                                       ORDER BY 2 DESC
                                       LIMIT 1) inner_tab)
         ) counter_tab;

**For the customer that spent the most (in total over their lifetime as a customer) total_amt_usd, how many web_events did they have for each channel?**

    SELECT a.name, w.channel, COUNT(*)
    FROM accounts a
    JOIN web_events w
    ON a.id = w.account_id AND a.id = (SELECT id
                                       FROM (SELECT a.id, a.name, SUM(o.total_amt_usd) tot_spent
                                             FROM orders o
                                             JOIN accounts a
                                             ON a.id = o.account_id
                                             GROUP BY a.id, a.name
                                             ORDER BY 3 DESC
                                             LIMIT 1) inner_table)
    GROUP BY 1, 2
    ORDER BY 3 DESC;

**What is the lifetime average amount spent in terms of total_amt_usd for the top 10 total spending accounts?**

    SELECT AVG(tot_spent)
    FROM (SELECT a.id, a.name, SUM(o.total_amt_usd) tot_spent
          FROM orders o
          JOIN accounts a
          ON a.id = o.account_id
          GROUP BY a.id, a.name
          ORDER BY 3 DESC
          LIMIT 10) temp;


**What is the lifetime average amount spent in terms of total_amt_usd for only the companies that spent more than the average of all orders.**

    SELECT AVG(avg_amt)
    FROM (SELECT o.account_id, AVG(o.total_amt_usd) avg_amt
          FROM orders o
          GROUP BY 1
          HAVING AVG(o.total_amt_usd) > (SELECT AVG(o.total_amt_usd) avg_all
                                         FROM orders o
                                         JOIN accounts a
                                         ON a.id = o.account_id)) temp_table;

## `WITH`

The `WITH` statement is often called a **Common Table Expression** or **CTE**. Though these expressions serve the exact same purpose as subqueries, they are more common in practice, as they tend to make the logic easier for a future reader to follow.

A CTE works much like a named function in programming: you define it once and then refer to it by name wherever it is needed. Instead of a function name, you use an alias.

**Comparison of subquery vs `WITH`:**

*Subquery:*

    SELECT channel, AVG(events) AS average_events
    FROM (SELECT DATE_TRUNC('day',occurred_at) AS day,
                 channel, COUNT(*) as events
          FROM web_events 
          GROUP BY 1,2) sub
    GROUP BY channel
    ORDER BY 2 DESC;


*`WITH`:*

    WITH events AS (SELECT DATE_TRUNC('day',occurred_at) AS day, 
                            channel, COUNT(*) as events
              FROM web_events 
              GROUP BY 1,2)

    SELECT channel, AVG(events) AS average_events
    FROM events
    GROUP BY channel
    ORDER BY 2 DESC;
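That the two forms return the same result can be checked on toy data. This is a minimal sketch using Python's built-in `sqlite3` module. The rows are invented, SQLite's `date()` stands in for `DATE_TRUNC('day', ...)`, and the CTE is given the hypothetical name `daily_events` so it does not share a name with its `events` column.

```python
import sqlite3

# Toy web_events table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE web_events (occurred_at TEXT, channel TEXT)")
conn.executemany(
    "INSERT INTO web_events VALUES (?, ?)",
    [
        ("2016-03-01 08:00:00", "organic"),
        ("2016-03-01 09:30:00", "organic"),
        ("2016-03-02 10:00:00", "organic"),
        ("2016-03-02 11:00:00", "organic"),
        ("2016-03-01 12:00:00", "twitter"),
    ],
)

# Version 1: the derived table is written inline as a subquery.
subquery_rows = conn.execute("""
    SELECT channel, AVG(events) AS average_events
    FROM (SELECT date(occurred_at) AS day,
                 channel, COUNT(*) AS events
          FROM web_events
          GROUP BY 1, 2) sub
    GROUP BY channel
    ORDER BY 2 DESC
""").fetchall()

# Version 2: the same derived table is named up front with WITH.
cte_rows = conn.execute("""
    WITH daily_events AS (SELECT date(occurred_at) AS day,
                                 channel, COUNT(*) AS events
                          FROM web_events
                          GROUP BY 1, 2)
    SELECT channel, AVG(events) AS average_events
    FROM daily_events
    GROUP BY channel
    ORDER BY 2 DESC
""").fetchall()

print(subquery_rows)
print(cte_rows)
```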
    


## Creating multiple `WITH` tables

When you define multiple `WITH` tables, write `WITH` only once, before the first table, and separate the tables with commas. The final query follows directly after the last table.

    WITH table1 AS (
              SELECT *
              FROM web_events),

         table2 AS (
              SELECT *
              FROM accounts)

    SELECT *
    FROM table1
    JOIN table2
    ON table1.account_id = table2.id;
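As a runnable illustration of the single-`WITH`, comma-separated layout, here is a minimal sketch using Python's built-in `sqlite3` module. The account names (`Acme`, `Globex`) and event rows are invented, and the final `SELECT` lists two columns and adds an `ORDER BY` only to keep the output small and predictable.

```python
import sqlite3

# Toy accounts and web_events tables (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER, name TEXT);
CREATE TABLE web_events (account_id INTEGER, channel TEXT);

INSERT INTO accounts VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO web_events VALUES (1, 'direct'), (1, 'facebook'), (2, 'direct');
""")

# WITH appears once; the two tables are separated by a comma, and the
# final SELECT belongs to the same statement as the WITH clause.
query = """
WITH table1 AS (
         SELECT *
         FROM web_events),
     table2 AS (
         SELECT *
         FROM accounts)
SELECT table2.name, table1.channel
FROM table1
JOIN table2
ON table1.account_id = table2.id
ORDER BY table2.name, table1.channel;
"""
joined_rows = conn.execute(query).fetchall()
print(joined_rows)
```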


***Subquery Mania Examples re-written with `WITH`:***


**Provide the name of the sales_rep in each region with the largest amount of total_amt_usd sales.**

    WITH t1 AS (
       SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
       FROM sales_reps s
       JOIN accounts a
       ON a.sales_rep_id = s.id
       JOIN orders o
       ON o.account_id = a.id
       JOIN region r
       ON r.id = s.region_id
       GROUP BY 1,2
       ORDER BY 3 DESC),
    t2 AS (
       SELECT region_name, MAX(total_amt) total_amt
       FROM t1
       GROUP BY 1)
    SELECT t1.rep_name, t1.region_name, t1.total_amt
    FROM t1
    JOIN t2
    ON t1.region_name = t2.region_name AND t1.total_amt = t2.total_amt;



**For the region with the largest sales total_amt_usd, how many total orders were placed?**

    WITH t1 AS (
        SELECT r.name region_name, SUM(o.total_amt_usd) total_amt
        FROM sales_reps s
        JOIN accounts a
        ON a.sales_rep_id = s.id
        JOIN orders o
        ON o.account_id = a.id
        JOIN region r
        ON r.id = s.region_id
        GROUP BY 1
        ORDER BY 2 DESC
        LIMIT 1)
    -- COUNT (not SUM) so the result is the number of orders, as in the subquery version
    SELECT r.name, COUNT(o.total) total_orders
    FROM sales_reps s
    JOIN region r
    ON r.id = s.region_id
    JOIN accounts a
    ON a.sales_rep_id = s.id
    JOIN orders o
    ON o.account_id = a.id
    GROUP BY 1
    HAVING r.name = (SELECT region_name FROM t1);



**For the name of the account that purchased the most (in total over their lifetime as a customer) standard_qty paper, how many accounts still had more in total purchases?**

    WITH t1 AS (
      SELECT a.name account_name, SUM(o.standard_qty) total_std, SUM(o.total) total
      FROM accounts a
      JOIN orders o
      ON o.account_id = a.id
      GROUP BY 1
      ORDER BY 2 DESC
      LIMIT 1),
    t2 AS (
      SELECT a.name
      FROM orders o
      JOIN accounts a
      ON a.id = o.account_id
      GROUP BY 1
      HAVING SUM(o.total) > (SELECT total FROM t1))
    SELECT COUNT(*)
    FROM t2;


**For the customer that spent the most (in total over their lifetime as a customer) total_amt_usd, how many web_events did they have for each channel?**


    WITH t1 AS (
        SELECT a.name, SUM(o.total_amt_usd)
        FROM accounts a
        JOIN orders o
        ON o.account_id = a.id
        GROUP BY 1
        ORDER BY 2 DESC
        LIMIT 1)
    SELECT a.name, w.channel, COUNT(*)
    FROM web_events w
    JOIN accounts a
    ON a.id = w.account_id
    WHERE a.name = (SELECT t1.name FROM t1)
    GROUP BY 1,2
    ORDER BY 3 DESC;


**What is the lifetime average amount spent in terms of total_amt_usd for the top 10 total spending accounts?**

    WITH t1 AS (
        SELECT a.name, SUM(o.total_amt_usd) total_spend
        FROM accounts a
        JOIN orders o
        ON o.account_id = a.id
        GROUP BY 1
        ORDER BY 2 DESC
        LIMIT 10)
    SELECT AVG(total_spend)
    FROM t1;


**What is the lifetime average amount spent in terms of total_amt_usd for only the companies that spent more than the average of all orders.**


    WITH t1 AS (
       SELECT AVG(o.total_amt_usd) avg_all
       FROM orders o
       JOIN accounts a
       ON a.id = o.account_id),
    t2 AS (
       SELECT o.account_id, AVG(o.total_amt_usd) avg_amt
       FROM orders o
       GROUP BY 1
       HAVING AVG(o.total_amt_usd) > (SELECT * FROM t1))
    SELECT AVG(avg_amt)
    FROM t2;