# Advanced Aggregates

Remember to run `EXPLAIN` on a query before you execute it. This helps avoid unnecessary load on the DBMS and keeps you from waiting indefinitely for results.

Therefore, each question provides one cell for the `EXPLAIN` and one for the final SQL.


## Our practice schema:

We will use the DVD Rental database.

A PDF of the _Entity-Relationship Diagrams_ (ERD) is available [here](https://web.dsa.missouri.edu/static/PDF/DVD_Rental_ERD2.pdf).   
Printing it out is recommended.


**NOTE**: These queries are more complex than the others.
If you get stuck on one, skip it and come back to it later.

**NOTE**: For this notebook, construct your solutions using advanced aggregates and derived tables.

In [1]:
%load_ext sql
%sql postgres://dsa_ro_user:readonly@pgsql.dsa.lan/dvdrental

'Connected: dsa_ro_user@dvdrental'

### 1
### What is the average, variance, and standard deviation of the film length?


In [2]:
%%sql
EXPLAIN
SELECT  avg(length)
        ,variance(length)
        ,stddev(length)
FROM    film;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
2 rows affected.


QUERY PLAN
Aggregate (cost=71.51..71.52 rows=1 width=96)
  -> Seq Scan on film (cost=0.00..64.00 rows=1000 width=2)


In [4]:
%%sql
SELECT  avg(length)
        ,variance(length)
        ,stddev(length)
FROM    film;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
1 rows affected.


avg,variance,stddev
115.272,1634.2883043043043,40.426331818559845
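
In PostgreSQL, `variance` and `stddev` are aliases for the sample statistics `var_samp` and `stddev_samp`. As a minimal sketch, assuming the same `film` table, the query below places the sample and population versions side by side. The column aliases are illustrative names, not part of the schema.

```sql
-- Sample vs. population statistics on film length.
-- variance() and stddev() return the same values as the *_samp forms.
SELECT  var_samp(length)     AS sample_variance
        ,var_pop(length)     AS population_variance
        ,stddev_samp(length) AS sample_stddev
        ,stddev_pop(length)  AS population_stddev
FROM    film;
```

With 1000 films the two versions differ only slightly, but the distinction matters on small groups.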


### 2
### What is the average, variance, and standard deviation of the film length, broken down by film category?

In [10]:
%%sql
EXPLAIN
SELECT  avg(f.length) OVER (PARTITION BY fc.category_id)
        ,variance(f.length) OVER (PARTITION BY fc.category_id)
        ,stddev(f.length) OVER (PARTITION BY fc.category_id)
FROM    film as f JOIN film_category as fc
        USING(film_id);

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
8 rows affected.


QUERY PLAN
WindowAgg (cost=144.97..167.47 rows=1000 width=98)
  -> Sort (cost=144.97..147.47 rows=1000 width=4)
       Sort Key: fc.category_id
       -> Hash Join (cost=76.50..95.14 rows=1000 width=4)
            Hash Cond: (fc.film_id = f.film_id)
            -> Seq Scan on film_category fc (cost=0.00..16.00 rows=1000 width=4)
            -> Hash (cost=64.00..64.00 rows=1000 width=6)
                 -> Seq Scan on film f (cost=0.00..64.00 rows=1000 width=6)


In [8]:
%%sql
SELECT  avg(f.length) OVER (PARTITION BY fc.category_id)
        ,variance(f.length) OVER (PARTITION BY fc.category_id)
        ,stddev(f.length) OVER (PARTITION BY fc.category_id)
FROM    film as f JOIN film_category as fc
        USING(film_id);

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
1000 rows affected.


avg,variance,stddev
111.609375,1848.3687996031745,42.99265983401323
111.609375,1848.3687996031745,42.99265983401323
111.609375,1848.3687996031745,42.99265983401323
111.609375,1848.3687996031745,42.99265983401323
111.609375,1848.3687996031745,42.99265983401323
111.609375,1848.3687996031745,42.99265983401323
111.609375,1848.3687996031745,42.99265983401323
111.609375,1848.3687996031745,42.99265983401323
111.609375,1848.3687996031745,42.99265983401323
111.609375,1848.3687996031745,42.99265983401323
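
The window-function form above returns one row per film, so each category's statistics repeat for every film in that category, and the category itself is not shown. As an alternative sketch, assuming the same `film` and `film_category` tables, a `GROUP BY` query returns one row per category. Joining the `category` table for its `name` column and the column aliases are additions for readability, not part of the original query.

```sql
-- One row per category instead of one row per film.
SELECT  c.name              AS category
        ,avg(f.length)      AS avg_length
        ,variance(f.length) AS var_length
        ,stddev(f.length)   AS stddev_length
FROM    film AS f
        JOIN film_category AS fc USING (film_id)
        JOIN category AS c USING (category_id)
GROUP BY c.name
ORDER BY c.name;
```

As with the other questions, prefix the query with `EXPLAIN` first to check the plan before running it.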


[Helpful Hints Video](https://youtu.be/jy9H2KLI4Iw) 

### 3
### A movie's "cumulative rented duration" is the sum of the durations of all its rentals in the rental table.  What is the average _cumulative rented duration_ per store (inventory.store_id)?

In [None]:
%%sql
EXPLAIN






In [None]:
%%sql







[Helpful Hints Video](https://youtu.be/Scyn7exzUcY)  

### 4
### Which three categories of film have the highest average number of actors per film?

In [None]:
%%sql
EXPLAIN






In [None]:
%%sql







### 5
### For each staff member, list their average daily payment amount processed.

In [None]:
%%sql
EXPLAIN






In [None]:
%%sql







### 6
### What is the statistical correlation between film length and rental rate?

In [None]:
%%sql
EXPLAIN






In [None]:
%%sql







[Helpful Hints Video](https://youtu.be/3d2vgLn9KVs)  

# Save your Notebook, then `File > Close and Halt`