## Subqueries

```sql
-- Start with the subquery, then aggregate
SELECT
    month_num,
    AVG(differential) AS avg_differential,
    MIN(differential) AS most_differential
FROM (
    SELECT
        month_num,
        windchill - temperature AS differential
    FROM weather
    WHERE season = 'Winter'
      AND temperature < 32
)
GROUP BY month_num;

-- Filter by all days with home games that were won
SELECT
    todays_date,
    temperature,
    status
FROM weather
WHERE todays_date IN (
    SELECT game_date
    FROM game_schedule
    WHERE stadium = 'Home' AND did_win = TRUE
);
```
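
To try the subquery-then-aggregate pattern without a real `weather` table, here is a minimal sketch that supplies a few hypothetical winter readings (all values invented) through a `WITH` clause and then runs the same query over them. Because windchill sits below the air temperature, every differential is negative, so `MIN` picks the day with the largest gap; that is why it is aliased `most_differential`.

```sql
-- Hypothetical sample rows standing in for the weather table
WITH weather AS (
    SELECT 1 AS month_num, 'Winter' AS season, 20 AS temperature, 8 AS windchill
    UNION ALL SELECT 1, 'Winter', 25, 18
    UNION ALL SELECT 2, 'Winter', 30, 21
    UNION ALL SELECT 2, 'Winter', 40, 35  -- filtered out: not below 32
)
SELECT
    month_num,
    AVG(differential) AS avg_differential,
    MIN(differential) AS most_differential
FROM (
    -- Derived table: one differential per qualifying winter day
    SELECT
        month_num,
        windchill - temperature AS differential
    FROM weather
    WHERE season = 'Winter' AND temperature < 32
) AS winter_days
GROUP BY month_num
ORDER BY month_num;
```

With these rows, month 1 comes out with an average of -9.5 and a minimum of -12, and month 2 (whose 40-degree reading is filtered out) with -9 for both.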

### Are jazz songs long?
Your manager came to you with a question: how does track length differ by genre? To answer it, you'll use a subquery that joins the track and genre tables, then average the track lengths over its result. Good luck!

```sql
SELECT
    -- Find the genre name and average milliseconds
    genre_name,
    AVG(milliseconds) AS average_milliseconds
-- Retrieve records from the result of the subquery
FROM (
    SELECT
        genre.name AS genre_name,
        track.genre_id,
        track.milliseconds
    FROM store.track
    JOIN store.genre ON track.genre_id = genre.genre_id
)
-- Group the results by the genre name
GROUP BY genre_name;
```
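
One average per genre only answers "are jazz songs long?" once the genres are compared. A minimal sketch on hypothetical inline rows (the genres and track lengths are invented; the `WITH` clause only supplies sample data) adds an `ORDER BY` so the longest-running genres come first.

```sql
-- Hypothetical stand-ins for store.genre and store.track
WITH genre AS (
    SELECT 1 AS genre_id, 'Jazz' AS name
    UNION ALL SELECT 2, 'Rock'
), track AS (
    SELECT 1 AS genre_id, 300000 AS milliseconds
    UNION ALL SELECT 1, 420000
    UNION ALL SELECT 2, 240000
    UNION ALL SELECT 2, 200000
)
SELECT
    genre_name,
    AVG(milliseconds) AS average_milliseconds
FROM (
    -- Attach a genre name to every track
    SELECT
        genre.name AS genre_name,
        track.milliseconds
    FROM track
    JOIN genre ON track.genre_id = genre.genre_id
) AS named_tracks
GROUP BY genre_name
ORDER BY average_milliseconds DESC;
```

In this invented sample, Jazz comes first at 360000 ms against 220000 ms for Rock.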

### Identifying large transactions
When customers buy a lot, you want to know why! In this exercise, you'll practice using subqueries to retrieve details about transactions with more than 10 line items.

```sql
SELECT
    invoice_id,
    COUNT(invoice_id) AS total_invoice_lines
FROM store.invoiceline
GROUP BY invoice_id
-- Only pull records with more than 10 total invoice lines
HAVING COUNT(invoice_id) > 10;
```

```sql
SELECT
    billing_country,
    SUM(total) AS total_invoice_amount
FROM store.invoice
WHERE invoice_id IN (
    SELECT invoice_id
    FROM store.invoiceline
    GROUP BY invoice_id
    HAVING COUNT(invoice_id) > 10
)
GROUP BY billing_country;
```
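
An `IN` subquery has to return a single column, which is why the inner query selects only `invoice_id`. The same filter can also be written as a join to the aggregated subquery: since it groups by `invoice_id`, each qualifying invoice appears once and no invoice rows are duplicated. The sketch below does that on hypothetical inline rows (IDs, countries, and totals are invented), with the threshold lowered from 10 to 2 lines so the tiny sample produces output.

```sql
-- Hypothetical stand-ins for store.invoice and store.invoiceline
WITH invoice AS (
    SELECT 1 AS invoice_id, 'Germany' AS billing_country, 5.94 AS total
    UNION ALL SELECT 2, 'Germany', 0.99
    UNION ALL SELECT 3, 'France', 3.96
), invoiceline AS (
    SELECT 1 AS invoice_id UNION ALL SELECT 1 UNION ALL SELECT 1
    UNION ALL SELECT 2
    UNION ALL SELECT 3 UNION ALL SELECT 3
)
SELECT
    invoice.billing_country,
    SUM(invoice.total) AS total_invoice_amount
FROM invoice
-- Join to the qualifying invoice IDs instead of filtering with IN
JOIN (
    SELECT invoice_id
    FROM invoiceline
    GROUP BY invoice_id
    HAVING COUNT(*) > 2  -- threshold lowered from 10 for this tiny sample
) AS large_invoices
    ON invoice.invoice_id = large_invoices.invoice_id
GROUP BY invoice.billing_country;
```

Only invoice 1 has more than two lines, so the result is a single row: Germany with 5.94.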

## Common Table Expressions

```sql
WITH at_risk AS (
    SELECT
        student_id,
        course_name,
        teacher_name,
        grade
    FROM student_courses
    WHERE grade < 70 AND is_required
)
SELECT
    students.student_name,
    at_risk.*
FROM at_risk
JOIN students ON at_risk.student_id = students.id;
```
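
To run the at-risk query without the `students` and `student_courses` tables, the sketch below defines hypothetical sample rows as extra CTEs in the same `WITH` clause: CTEs are separated by commas, and a later one can read an earlier one. With these invented rows only Ana's required Algebra grade of 65 qualifies; her Art grade is for a non-required course, and Ben's 88 passes.

```sql
-- Hypothetical stand-ins for the students and student_courses tables
WITH students AS (
    SELECT 1 AS id, 'Ana' AS student_name
    UNION ALL SELECT 2, 'Ben'
), student_courses AS (
    SELECT 1 AS student_id, 'Algebra' AS course_name, 'Lee' AS teacher_name,
           65 AS grade, TRUE AS is_required
    UNION ALL SELECT 1, 'Art', 'Kim', 60, FALSE
    UNION ALL SELECT 2, 'Algebra', 'Lee', 88, TRUE
), at_risk AS (
    -- Required courses with a failing grade
    SELECT student_id, course_name, teacher_name, grade
    FROM student_courses
    WHERE grade < 70 AND is_required
)
SELECT
    students.student_name,
    at_risk.*
FROM at_risk
JOIN students ON at_risk.student_id = students.id;
```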

```sql
-- Start with the subquery, then aggregate
SELECT
    month_num,
    AVG(differential) AS avg_differential,
    MIN(differential) AS most_differential
FROM (
    SELECT
        month_num,
        windchill - temperature AS differential
    FROM weather
    WHERE season = 'Winter'
      AND temperature < 32
)
GROUP BY month_num;

-- CTE
WITH daily_temperature_differential AS (
    SELECT
        month_num,
        windchill - temperature AS differential
    FROM weather
    WHERE season = 'Winter'
      AND temperature < 32
)
SELECT
    month_num,
    AVG(differential) AS avg_differential,
    MIN(differential) AS most_differential
FROM daily_temperature_differential
GROUP BY month_num;
```
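
One practical difference from the subquery version: a CTE has a name, so the main query can reference the same intermediate result more than once. The sketch below, on hypothetical inline readings, reads `daily_temperature_differential` twice (once grouped by month, once for the winter-wide average), which with subqueries would mean writing the derived table out twice.

```sql
WITH weather AS (
    -- Hypothetical sample readings
    SELECT 1 AS month_num, 'Winter' AS season, 20 AS temperature, 8 AS windchill
    UNION ALL SELECT 1, 'Winter', 25, 18
    UNION ALL SELECT 2, 'Winter', 30, 21
), daily_temperature_differential AS (
    SELECT
        month_num,
        windchill - temperature AS differential
    FROM weather
    WHERE season = 'Winter' AND temperature < 32
)
SELECT
    d.month_num,
    AVG(d.differential) AS avg_differential,
    o.overall_avg_differential
FROM daily_temperature_differential AS d
-- Second reference to the same CTE: one row holding the all-winter average
CROSS JOIN (
    SELECT AVG(differential) AS overall_avg_differential
    FROM daily_temperature_differential
) AS o
GROUP BY d.month_num, o.overall_avg_differential
ORDER BY d.month_num;
```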

### Analyzing track length
Previously, you used a subquery to find the average length of songs, in milliseconds, for each genre. Now, you're going to do something similar with a common table expression, but this time, with a bit more attention to detail. Let's get to it!

```sql
-- Create a CTE named track_lengths
WITH track_lengths AS (
    SELECT
        genre.name,
        track.genre_id,
        track.milliseconds / 1000 AS num_seconds
    FROM store.track
    JOIN store.genre ON track.genre_id = genre.genre_id
)

SELECT
    track_lengths.name,
    -- Find the average track length, in seconds, for each genre
    AVG(track_lengths.num_seconds) AS avg_track_length
FROM track_lengths
GROUP BY track_lengths.name
-- Sort the results by average track_length
ORDER BY avg_track_length DESC;
```
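
The `/ 1000` conversion is dialect-sensitive: Snowflake returns a decimal when dividing two integers, while PostgreSQL and SQLite truncate integer division. If you port these queries, dividing by `1000.0` keeps the fractional seconds everywhere. A quick check (the millisecond value is arbitrary):

```sql
SELECT
    343719 / 1000   AS integer_division,  -- truncated to 343 in PostgreSQL and SQLite; Snowflake keeps the fraction
    343719 / 1000.0 AS decimal_division;  -- 343.719 in all three (display precision varies)
```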

### Finding the most efficient composer
Here's a fun one! You're chatting with your coworker, and you decide that you want to find the composer whose songs, on average, cost the most per second. To do this, you'll use the track table and a common table expression.

```sql
-- Create a CTE called track_metrics, convert milliseconds to seconds
WITH track_metrics AS (
    SELECT
        composer,
        milliseconds / 1000 AS num_seconds,
        unit_price
    FROM store.track
    -- Retrieve records where composer is not NULL
    WHERE composer IS NOT NULL
)

SELECT
    composer,
    -- Find the average price-per-second
    AVG(unit_price / num_seconds) AS cost_per_second
FROM track_metrics
GROUP BY composer
ORDER BY cost_per_second DESC;
```
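
`unit_price / num_seconds` raises a division-by-zero error in Snowflake and PostgreSQL (SQLite returns NULL instead) if `num_seconds` is ever 0: a zero-length track, or, in databases that truncate integer division, any track shorter than one second. Wrapping the divisor in `NULLIF(..., 0)` turns those rows into NULLs, which `AVG` skips. A minimal sketch with hypothetical inline tracks (composer names and prices are invented):

```sql
WITH track AS (
    -- Hypothetical sample rows standing in for store.track
    SELECT 'Composer A' AS composer, 200000 AS milliseconds, 0.99 AS unit_price
    UNION ALL SELECT 'Composer A', 0, 0.99       -- zero-length track
    UNION ALL SELECT 'Composer B', 100000, 1.99
    UNION ALL SELECT NULL, 300000, 0.99           -- dropped by the IS NOT NULL filter
), track_metrics AS (
    SELECT
        composer,
        milliseconds / 1000.0 AS num_seconds,
        unit_price
    FROM track
    WHERE composer IS NOT NULL
)
SELECT
    composer,
    -- NULLIF returns NULL when num_seconds = 0, and AVG ignores NULLs
    AVG(unit_price / NULLIF(num_seconds, 0)) AS cost_per_second
FROM track_metrics
GROUP BY composer
ORDER BY cost_per_second DESC;
```

Composer B comes first here (1.99 over 100 seconds), and Composer A's average uses only the 200-second track.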

## Advanced Common Table Expressions

### Building a detailed invoice
In the store schema, you have the invoice, invoiceline, and track tables. However, these don't quite stand on their own. To get a better idea of customer behavior, we're going to "recreate" each of these invoices with a bit more detail. We'll do this using two common table expressions, which you'll build from the ground up.

```sql
-- Create the cleaned_invoices CTE
WITH cleaned_invoices AS (
    SELECT
        invoice_id,
        invoice_date
    FROM store.invoice
    WHERE billing_country = 'Germany'
)

SELECT * FROM cleaned_invoices;
```

```sql
-- Create the cleaned_invoices CTE
WITH cleaned_invoices AS (
    SELECT
        invoice_id,
        invoice_date
    FROM store.invoice
    WHERE billing_country = 'Germany'
)

-- Create the detailed_invoice_lines CTE
, detailed_invoice_lines AS (
    SELECT
        invoiceline.invoice_id,
        invoiceline.invoice_line_id,
        track.name,
        invoiceline.unit_price,
        invoiceline.quantity
    FROM store.invoiceline
    LEFT JOIN store.track ON invoiceline.track_id = track.track_id
)

SELECT * FROM detailed_invoice_lines;
```

```sql
WITH cleaned_invoices AS (
    SELECT
        invoice_id,
        invoice_date
    FROM store.invoice
    WHERE billing_country = 'Germany'
),

detailed_invoice_lines AS (
    SELECT
        invoiceline.invoice_id,
        invoiceline.invoice_line_id,
        track.name,
        invoiceline.unit_price,
        invoiceline.quantity
    FROM store.invoiceline
    LEFT JOIN store.track ON invoiceline.track_id = track.track_id
)

SELECT
    ci.invoice_id,
    ci.invoice_date,
    dil.name,
    -- Find the total amount for the line
    dil.unit_price * dil.quantity AS line_amount
FROM detailed_invoice_lines AS dil
-- JOIN the cleaned_invoices and detailed_invoice_lines CTEs; an inner join
-- keeps only lines that belong to the German invoices in cleaned_invoices
JOIN cleaned_invoices AS ci ON dil.invoice_id = ci.invoice_id
ORDER BY ci.invoice_id, line_amount;
```
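
Because `cleaned_invoices` only holds German invoices, the join type decides which lines survive: a LEFT JOIN from `detailed_invoice_lines` keeps every line and fills `invoice_id` and `invoice_date` with NULL for non-German invoices, while an inner JOIN keeps only the German ones. The sketch below counts both on hypothetical inline rows (IDs and dates are invented).

```sql
WITH cleaned_invoices AS (
    -- Hypothetical: invoice 1 is German; invoice 2 is not, so it is absent here
    SELECT 1 AS invoice_id, '2021-01-05' AS invoice_date
), detailed_invoice_lines AS (
    SELECT 1 AS invoice_id, 10 AS invoice_line_id
    UNION ALL SELECT 1, 11
    UNION ALL SELECT 2, 12
)
SELECT
    (SELECT COUNT(*)
     FROM detailed_invoice_lines AS dil
     LEFT JOIN cleaned_invoices AS ci ON dil.invoice_id = ci.invoice_id) AS left_join_rows,
    (SELECT COUNT(*)
     FROM detailed_invoice_lines AS dil
     JOIN cleaned_invoices AS ci ON dil.invoice_id = ci.invoice_id) AS inner_join_rows;
```

The LEFT JOIN returns 3 rows (line 12 with NULL invoice details) and the inner JOIN returns 2.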

### Finding the most popular artists
Something that we haven't been able to do is tie track sales to an artist. Let's change that! You're about to get hands-on using common table expressions to take data from two tables and find the artist with the most minutes listened. Before you get started, make sure to take a peek at the album and artist tables in the output window.

```sql
-- Create an artist_info CTE, JOIN the artist and album tables
WITH artist_info AS (
    SELECT
        album.album_id,
        artist.name AS artist_name
    FROM store.album
    JOIN store.artist ON album.artist_id = artist.artist_id
),

-- Define a track_sales CTE to assign an album_id, name,
-- and number of seconds for each track sold
track_sales AS (
    SELECT
        track.album_id,
        track.name,
        track.milliseconds / 1000 AS num_seconds
    FROM store.invoiceline
    JOIN store.track ON invoiceline.track_id = track.track_id
)

SELECT
    ai.artist_name,
    -- Calculate total minutes listened
    SUM(ts.num_seconds) / 60 AS minutes_listened
FROM track_sales AS ts
JOIN artist_info AS ai ON ts.album_id = ai.album_id
-- Group the results by the non-aggregated column
GROUP BY ai.artist_name
ORDER BY minutes_listened DESC;
```
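
`track_sales` produces one row per invoice line, so a line that sold several copies still counts its seconds once. If `quantity` should count too (an assumption about what "minutes listened" means; the exercise does not say), the sum can be weighted by it. A sketch on hypothetical inline rows (artists, lengths, and quantities are invented):

```sql
WITH artist_info AS (
    -- Hypothetical album-to-artist mapping
    SELECT 1 AS album_id, 'Artist X' AS artist_name
    UNION ALL SELECT 2, 'Artist Y'
), track_sales AS (
    -- Hypothetical sold lines: album, track length in seconds, copies sold
    SELECT 1 AS album_id, 240.0 AS num_seconds, 1 AS quantity
    UNION ALL SELECT 1, 180.0, 2
    UNION ALL SELECT 2, 300.0, 1
)
SELECT
    ai.artist_name,
    -- Weight each line's length by the number of copies sold
    SUM(ts.num_seconds * ts.quantity) / 60 AS minutes_listened
FROM track_sales AS ts
JOIN artist_info AS ai ON ts.album_id = ai.album_id
GROUP BY ai.artist_name
ORDER BY minutes_listened DESC;
```

Artist X comes out at 10 minutes (240 + 2 × 180 seconds) and Artist Y at 5.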

### Albums driving sales
Your director came up to your desk today and gave you the inside scoop on a holiday promotion on "Greatest Hits" albums that will be dropping soon. However, to make sure the optimal albums are discounted, she wants to know which "Greatest Hits" albums drive the most sales. To do this, you'll put all the skills you learned to the test!

```sql
-- Define an album_map CTE to combine albums and artists
WITH album_map AS (
    SELECT
        album.album_id,
        album.title AS album_name,
        artist.name AS artist_name,
        -- Determine if an album is a "Greatest Hits" album
        CASE
            WHEN album.title ILIKE '%greatest%' THEN TRUE
            ELSE FALSE
        END AS is_greatest_hits
    FROM store.album
    JOIN store.artist ON album.artist_id = artist.artist_id
),

trimmed_invoicelines AS (
    SELECT
        invoiceline.invoice_id,
        track.album_id,
        invoice.total
    FROM store.invoiceline
    LEFT JOIN store.invoice ON invoiceline.invoice_id = invoice.invoice_id
    LEFT JOIN store.track ON invoiceline.track_id = track.track_id
)

SELECT
    album_map.album_name,
    album_map.artist_name,
    SUM(ti.total) AS total_sales_driven
FROM trimmed_invoicelines AS ti
JOIN album_map ON ti.album_id = album_map.album_id
-- Use a subquery to keep only "Greatest Hits" records
WHERE ti.album_id IN (SELECT album_id FROM album_map WHERE is_greatest_hits)
GROUP BY album_map.album_name, album_map.artist_name, album_map.is_greatest_hits
ORDER BY total_sales_driven DESC;
```
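
`invoice.total` is the total of the whole invoice and repeats on every line, so `SUM(ti.total)` credits an album with the full invoice value once per line it appears on. If "sales driven" should instead mean revenue from the album's own tracks (a different, assumed definition), summing `unit_price * quantity` per line avoids that repetition. A sketch on hypothetical inline rows; the `album_id` on each line stands in for what joining `store.track` would supply, and `LOWER(...) LIKE` replaces `ILIKE` so the filter behaves the same in databases without `ILIKE`:

```sql
WITH album_map AS (
    -- Hypothetical albums
    SELECT 1 AS album_id, 'Greatest Hits' AS album_name, 'Artist X' AS artist_name
    UNION ALL SELECT 2, 'Live Sessions', 'Artist Y'
), invoiceline AS (
    -- Hypothetical invoice lines: invoice 1 has two lines from album 1
    SELECT 1 AS invoice_id, 1 AS album_id, 0.99 AS unit_price, 1 AS quantity
    UNION ALL SELECT 1, 1, 0.99, 1
    UNION ALL SELECT 2, 2, 0.99, 1
)
SELECT
    am.album_name,
    am.artist_name,
    -- Revenue from this album's own lines only
    SUM(il.unit_price * il.quantity) AS line_revenue
FROM invoiceline AS il
JOIN album_map AS am ON il.album_id = am.album_id
WHERE LOWER(am.album_name) LIKE '%greatest%'
GROUP BY am.album_name, am.artist_name
ORDER BY line_revenue DESC;
```

This returns one row, Greatest Hits by Artist X with 1.98.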