### Introduction

In this lesson, we'll see how to use common table expressions (CTEs) in SQL.  CTEs allow us to create a temporary, named result set, which we can then query just like a table.

### Loading our Data

In [6]:
import sqlite3
import pandas as pd  # used below for pd.read_sql
conn = sqlite3.connect('movie_films_actors.db')
cursor = conn.cursor()

Now, using `PRAGMA table_info`, we can see that the `movies` table has the following columns:

In [14]:
cursor.execute('PRAGMA table_info(movies)')
cursor.fetchall()

[(0, 'index', 'INTEGER', 0, None, 0),
 (1, 'title', 'TEXT', 0, None, 0),
 (2, 'studio', 'TEXT', 0, None, 0),
 (3, 'runtime', 'REAL', 0, None, 0),
 (4, 'description', 'TEXT', 0, None, 0),
 (5, 'release_date', 'TEXT', 0, None, 0),
 (6, 'year', 'INTEGER', 0, None, 0)]

In [13]:
pd.read_sql('SELECT * FROM movies LIMIT 2;', conn)

Unnamed: 0,index,title,studio,runtime,description,release_date,year
0,0,!Women Art Revolution,Zeitgeist Films,83.0,"Through intimate interviews, art, and rarely s...",2011-06-01 00:00:00,2011
1,1,#Horror,Lowland Pictures,90.0,You've got followers... Cyberbullying goes off...,2015-11-20 00:00:00,2015


### A two step problem

Now let's say that we want to find the years where movies have an average runtime greater than 120.  One way to do this would be to simply use the `HAVING` clause.

In [34]:
query = '''SELECT AVG(runtime) as avg_runtime, year FROM movies
GROUP BY year HAVING avg_runtime > 120;'''

In [33]:
pd.read_sql(query, conn)

Unnamed: 0,avg_runtime,year
0,181.0,1914
1,133.5,1915
2,175.0,1916


And we can see that we get back a few years from the 1910s.  Now another way to do this is with a common table expression.  With a common table expression, we can create a temporary table.  Below, we'll use this to first group our movies by year and create a temporary table that has a column of average runtimes per year.  Then, in a separate step, we'll select those years that meet our threshold.

In [35]:
query = """WITH movie_years AS (
SELECT
        AVG(runtime) as avg_runtime,
        year
FROM movies GROUP BY year
)
SELECT     avg_runtime,
        year
FROM movie_years WHERE avg_runtime > 120;"""

In [36]:
pd.read_sql(query, conn)

Unnamed: 0,avg_runtime,year
0,181.0,1914
1,133.5,1915
2,175.0,1916


Ok, let's break this down below.  

1. The `WITH movie_years AS (...)` clause creates a new temporary table called `movie_years`.  It's populated with the results of the SELECT statement inside the parentheses.

2. Then in the second SELECT statement we select from our newly created table `movie_years`.  Notice that we do not need to use the HAVING clause in the second SELECT statement, because `avg_runtime` is an ordinary column in our temporary table, so we can filter on it with WHERE.

```sql
WITH movie_years AS (
  SELECT AVG(runtime) as avg_runtime, year
  FROM movies GROUP BY year
)

SELECT avg_runtime, year FROM movie_years WHERE avg_runtime > 120;
```

So we can see that in CTEs, we create a temporary table with the following syntax:

```SQL
WITH table_name AS (
   SELECT ...
)

SELECT ... FROM table_name;
```
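To check that the two approaches really are interchangeable here, we can run both against a tiny table. The following is a minimal sketch that assumes a hypothetical in-memory `movies` table with made-up rows (not the lesson's `movie_films_actors.db`); the names `demo_conn`, `having_query`, and `cte_query` are ours, chosen for illustration.

```python
import sqlite3

# Hypothetical toy data, NOT the lesson's database.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE movies (title TEXT, runtime REAL, year INTEGER)")
demo_conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [("A", 100.0, 2000), ("B", 150.0, 2000),   # 2000 averages 125.0
     ("C", 90.0, 2001), ("D", 110.0, 2001)],   # 2001 averages 100.0
)

# Filter the grouped rows directly with HAVING.
having_query = """
SELECT AVG(runtime) AS avg_runtime, year
FROM movies
GROUP BY year
HAVING AVG(runtime) > 120
ORDER BY year;
"""

# Build the grouped rows in a CTE, then filter them with WHERE.
cte_query = """
WITH movie_years AS (
  SELECT AVG(runtime) AS avg_runtime, year
  FROM movies
  GROUP BY year
)
SELECT avg_runtime, year
FROM movie_years
WHERE avg_runtime > 120
ORDER BY year;
"""

having_rows = demo_conn.execute(having_query).fetchall()
cte_rows = demo_conn.execute(cte_query).fetchall()
print(having_rows)  # [(125.0, 2000)]
print(cte_rows)     # [(125.0, 2000)]
```

Both queries return the same rows. The difference is that the CTE version names the intermediate result, which is what lets us build further steps on top of it.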

### CTEs and MultiLevel Aggregations

So above we saw how we can use CTEs to create a temporary table, and we can then query from that temporary table.  But so far, we have not used CTEs for anything that we cannot accomplish with the HAVING clause.

One good use case for a CTE is performing multilevel aggregations.  For example, let's say that we want to find, per year, the studio with the highest average runtime.

To do this, we can start with just a SELECT statement that groups our movies by `year` and `studio`, and returns the average runtime per studio per year.

In [55]:
sql = """
  SELECT AVG(runtime) as avg_runtime_per_studio_year, studio, year
  FROM movies GROUP BY year, studio LIMIT 3;
"""

cursor.execute(sql)
cursor.fetchall()

[(181.0, 'Itala Film', 1914),
 (75.0, 'Box Office Attractions', 1915),
 (192.0, 'Gravitas', 1915)]

But if we now wish to find the studios that had the highest average runtime per year, we can use a CTE to build off of our query above. 

In [58]:
sql = """WITH movie_studio_years AS (
  SELECT AVG(runtime) as avg_runtime_per_studio_year, studio, year
  FROM movies GROUP BY year, studio
)

SELECT max(avg_runtime_per_studio_year), studio, year FROM movie_studio_years 
GROUP BY year LIMIT 5;
"""

cursor.execute(sql)
cursor.fetchall()

[(181.0, 'Itala Film', 1914),
 (192.0, 'Gravitas', 1915),
 (175.0, 'Cohen Media Group', 1916),
 (102.0, 'Kino on Video', 1919),
 (107.0, 'Kino Lorber', 1920)]

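So above, we create a new temporary table `movie_studio_years` that calculates the average runtime per studio per year, just like before.  Then we query that table, grouping by year, to find the maximum average runtime per year.

We can see that for each year, we get back the studio that had the highest average runtime.  Note that this relies on SQLite-specific behavior: when a query uses `max()`, SQLite takes the non-aggregated `studio` column from the row that holds the maximum.  Many other databases would reject this query, because `studio` is neither aggregated nor listed in the GROUP BY.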
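The query above selects `studio` without aggregating it or grouping by it, which SQLite permits but many other databases do not. One possible alternative, shown here only as a sketch, is to add a second CTE that holds the maximum per year and then join it back to the first. The example assumes a hypothetical in-memory `movies` table with made-up rows; `demo_conn`, `year_max`, and `max_avg_runtime` are names introduced for illustration and do not come from the lesson's database.

```python
import sqlite3

# Hypothetical toy data, NOT the lesson's database.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE movies (title TEXT, studio TEXT, runtime REAL, year INTEGER)"
)
demo_conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?, ?)",
    [("A", "Alpha", 100.0, 2000), ("B", "Alpha", 120.0, 2000),  # Alpha 2000: 110.0
     ("C", "Beta", 150.0, 2000),                                # Beta 2000: 150.0
     ("D", "Alpha", 90.0, 2001),                                # Alpha 2001: 90.0
     ("E", "Gamma", 80.0, 2001), ("F", "Gamma", 60.0, 2001)],   # Gamma 2001: 70.0
)

sql = """
WITH movie_studio_years AS (
  -- Step 1: average runtime per studio per year.
  SELECT AVG(runtime) AS avg_runtime_per_studio_year, studio, year
  FROM movies GROUP BY year, studio
),
year_max AS (
  -- Step 2: the highest of those averages in each year.
  SELECT year, MAX(avg_runtime_per_studio_year) AS max_avg_runtime
  FROM movie_studio_years GROUP BY year
)
-- Step 3: keep the studio rows whose average equals the yearly maximum.
SELECT m.avg_runtime_per_studio_year, m.studio, m.year
FROM movie_studio_years AS m
JOIN year_max AS y
  ON m.year = y.year
 AND m.avg_runtime_per_studio_year = y.max_avg_runtime
ORDER BY m.year;
"""

rows = demo_conn.execute(sql).fetchall()
print(rows)  # [(150.0, 'Beta', 2000), (90.0, 'Alpha', 2001)]
```

Unlike the single-query version, this variant returns every studio that ties for the highest average in a year, so a year can appear more than once.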

### Summary

In this lesson we learned about CTEs.  The CTE allows us to create a temporary table and then query from that table.  We write our CTE with the following syntax:

```SQL
WITH table_name AS (
   SELECT ...
)

SELECT ... FROM table_name;
```

In general, we use CTEs to break our code up into multiple steps.  For example, we saw how we can use CTEs to perform a multilevel aggregation, where we first calculated the average movie runtime per studio and year, and then from there found the studio with the highest average runtime per year.

```sql
WITH movie_studio_years AS (
  SELECT AVG(runtime) as avg_runtime_per_studio_year, studio, year
  FROM movies GROUP BY year, studio
)

SELECT max(avg_runtime_per_studio_year), studio, year FROM movie_studio_years 
GROUP BY year;
```