# Basic Statistics Practice

Remember to run `EXPLAIN` on a query before you execute it. Checking the plan first helps avoid unnecessary load on the DBMS and keeps you from waiting indefinitely for results.

For that reason, each question has one cell for the `EXPLAIN` and one for the final SQL.
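
One way to act on an `EXPLAIN` result is to read the planner's estimated total cost from the first plan line before deciding whether to run the query. The sketch below parses plan text of the form shown in this notebook. The helper name `top_level_estimate` and the `COST_LIMIT` threshold are hypothetical, and planner costs are in arbitrary units, so any cutoff is a judgment call rather than a rule.

```python
import re

# Matches the estimate PostgreSQL prints on each plan node, e.g.
# "Aggregate (cost=66.50..66.52 rows=1 width=32)".
PLAN_NODE = re.compile(
    r"\(cost=(?P<startup>\d+(?:\.\d+)?)\.\.(?P<total>\d+(?:\.\d+)?)"
    r" rows=(?P<rows>\d+) width=(?P<width>\d+)\)"
)


def top_level_estimate(plan_lines):
    """Return (total_cost, estimated_rows) from the first plan node found."""
    for line in plan_lines:
        match = PLAN_NODE.search(line)
        if match:
            return float(match.group("total")), int(match.group("rows"))
    raise ValueError("no cost estimate found in plan")


# Plan text copied from the EXPLAIN output for question 1.
plan = [
    "Aggregate (cost=66.50..66.52 rows=1 width=32)",
    "-> Seq Scan on film (cost=0.00..64.00 rows=1000 width=2)",
]

COST_LIMIT = 10_000.0  # hypothetical threshold, not a course rule

total_cost, est_rows = top_level_estimate(plan)
safe_to_run = total_cost <= COST_LIMIT
print(total_cost, est_rows, safe_to_run)
```

The first node of a plan carries the cumulative estimate for the whole query, which is why only that line is inspected here.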


## Our practice schema:

A PDF of the _Entity-Relationship Diagram_ (ERD) is available [here](https://web.dsa.missouri.edu/static/PDF/DVD_Rental_ERD2.pdf).   
Printing it out is recommended.


In [1]:
%load_ext sql
%sql postgres://dsa_ro_user:readonly@pgsql.dsa.lan/dvdrental

'Connected: dsa_ro_user@dvdrental'

# 1

### List the average length of the movies (film.length) in the database.

In [2]:
%%sql
EXPLAIN
SELECT  ROUND(AVG(length), 2) AS avg_length
FROM    film;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
2 rows affected.


QUERY PLAN
Aggregate (cost=66.50..66.52 rows=1 width=32)
-> Seq Scan on film (cost=0.00..64.00 rows=1000 width=2)


In [3]:
%%sql
SELECT  ROUND(AVG(length), 2) AS avg_length
FROM    film;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
1 rows affected.


avg_length
115.27


[Helpful Hints](https://youtu.be/yMrP0cr_rqo)  
 

--- 

# 2

### List the number of rows in the payment table.

In [4]:
%%sql
EXPLAIN
SELECT  COUNT(*) AS payment_records
FROM    payment;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
2 rows affected.


QUERY PLAN
Aggregate (cost=290.45..290.46 rows=1 width=8)
-> Seq Scan on payment (cost=0.00..253.96 rows=14596 width=0)


In [5]:
%%sql
SELECT  COUNT(*) AS payment_records
FROM    payment;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
1 rows affected.


payment_records
14596


# 3

### List each category (`category.name`) and the number of films in that category.

In [6]:
%%sql
EXPLAIN
SELECT  c.name AS category
        ,COUNT(f.film_id) AS films
FROM    category AS c JOIN film_category AS fc
        ON c.category_id = fc.category_id
        JOIN film AS f
        ON fc.film_id = f.film_id
GROUP BY c.category_id;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
11 rows affected.


QUERY PLAN
HashAggregate (cost=104.81..104.97 rows=16 width=80)
Group Key: c.category_id
-> Hash Join (cost=77.86..99.81 rows=1000 width=76)
Hash Cond: (fc.film_id = f.film_id)
-> Hash Join (cost=1.36..20.67 rows=1000 width=74)
Hash Cond: (fc.category_id = c.category_id)
-> Seq Scan on film_category fc (cost=0.00..16.00 rows=1000 width=4)
-> Hash (cost=1.16..1.16 rows=16 width=72)
-> Seq Scan on category c (cost=0.00..1.16 rows=16 width=72)
-> Hash (cost=64.00..64.00 rows=1000 width=4)


In [7]:
%%sql
SELECT  c.name AS category
        ,COUNT(f.film_id) AS films
FROM    category AS c JOIN film_category AS fc
        ON c.category_id = fc.category_id
        JOIN film AS f
        ON fc.film_id = f.film_id
GROUP BY c.category_id;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
16 rows affected.


category,films
Classics,57
Sci-Fi,61
Children,60
Games,61
Drama,62
New,63
Foreign,73
Action,64
Comedy,58
Animation,66


[Helpful Hints](https://youtu.be/YRMI8myh9WY)  
 

--- 

# 4

### List each film title and the number of actors in that film.

In [8]:
%%sql
EXPLAIN
SELECT  f.title AS film
        ,COUNT(a.actor_id) AS actors
FROM    film AS f JOIN film_actor as fa
        ON f.film_id = fa.film_id
        JOIN actor as a
        ON fa.actor_id = a.actor_id
GROUP BY f.film_id;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
11 rows affected.


QUERY PLAN
HashAggregate (cost=223.96..233.96 rows=1000 width=27)
Group Key: f.film_id
-> Hash Join (cost=83.00..196.65 rows=5462 width=23)
Hash Cond: (fa.actor_id = a.actor_id)
-> Hash Join (cost=76.50..175.51 rows=5462 width=21)
Hash Cond: (fa.film_id = f.film_id)
-> Seq Scan on film_actor fa (cost=0.00..84.62 rows=5462 width=4)
-> Hash (cost=64.00..64.00 rows=1000 width=19)
-> Seq Scan on film f (cost=0.00..64.00 rows=1000 width=19)
-> Hash (cost=4.00..4.00 rows=200 width=4)


In [9]:
%%sql
SELECT  f.title AS film
        ,COUNT(a.actor_id) AS actors
FROM    film AS f JOIN film_actor as fa
        ON f.film_id = fa.film_id
        JOIN actor as a
        ON fa.actor_id = a.actor_id
GROUP BY f.film_id;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
997 rows affected.


film,actors
Pajama Jawbreaker,5
Effect Gladiator,7
Balloon Homeward,6
Voyage Legally,5
Stallion Sundance,7
Bikini Borrowers,2
Garden Island,3
Saints Bride,10
Luck Opus,6
Tadpole Park,8


# 5

### List each film title and the number of actors in that film, for films with more than 10 actors.

In [10]:
%%sql
EXPLAIN
SELECT  f.title AS film
        ,COUNT(a.actor_id) AS actors
FROM    film AS f JOIN film_actor as fa
        ON f.film_id = fa.film_id
        JOIN actor as a
        ON fa.actor_id = a.actor_id
GROUP BY f.film_id
HAVING  COUNT(a.actor_id) > 10;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
12 rows affected.


QUERY PLAN
HashAggregate (cost=237.62..247.62 rows=1000 width=27)
Group Key: f.film_id
Filter: (count(a.actor_id) > 10)
-> Hash Join (cost=83.00..196.65 rows=5462 width=23)
Hash Cond: (fa.actor_id = a.actor_id)
-> Hash Join (cost=76.50..175.51 rows=5462 width=21)
Hash Cond: (fa.film_id = f.film_id)
-> Seq Scan on film_actor fa (cost=0.00..84.62 rows=5462 width=4)
-> Hash (cost=64.00..64.00 rows=1000 width=19)
-> Seq Scan on film f (cost=0.00..64.00 rows=1000 width=19)


$\color{blue}{\text{I tried using the actors alias in the HAVING clause but learned that PostgreSQL evaluates HAVING before SELECT, so the alias is not yet defined there. Oh well.}}$

In [11]:
%%sql
SELECT  f.title AS film
        ,COUNT(a.actor_id) AS actors
FROM    film AS f JOIN film_actor as fa
        ON f.film_id = fa.film_id
        JOIN actor as a
        ON fa.actor_id = a.actor_id
GROUP BY f.film_id
HAVING  COUNT(a.actor_id) > 10;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
27 rows affected.


film,actors
Lonely Elephant,12
Sky Miracle,12
Lambs Cincinatti,15
Random Go,13
Image Princess,11
Dracula Crystal,13
Arabia Dogma,12
Fiddler Lost,11
Instinct Airport,11
Rings Heartbreakers,11
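
The note about `HAVING` and aliases can be tried out on a toy table. The sketch below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the course server, and the rows are made up. Because SQLite is only a stand-in, the sketch shows the portable form (repeating the aggregate in `HAVING`) and does not try to reproduce the PostgreSQL error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the course database
conn.executescript("""
    CREATE TABLE film_actor (film_id INTEGER, actor_id INTEGER);
    INSERT INTO film_actor VALUES
        (1, 10), (1, 11), (1, 12),   -- film 1 has three actors
        (2, 10), (2, 13),            -- film 2 has two
        (3, 14);                     -- film 3 has one
""")

rows = conn.execute("""
    SELECT   film_id
            ,COUNT(actor_id) AS actors
    FROM     film_actor
    GROUP BY film_id
    HAVING   COUNT(actor_id) > 1     -- aggregate repeated, alias not used
    ORDER BY actors DESC;            -- ORDER BY runs after SELECT, so the alias works
""").fetchall()

print(rows)  # [(1, 3), (2, 2)]
conn.close()
```

The logical order is `FROM`, `WHERE`, `GROUP BY`, `HAVING`, `SELECT`, `ORDER BY`, which is why the alias is usable in `ORDER BY` but the aggregate has to be spelled out in `HAVING`.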


[Helpful Hints](https://youtu.be/4dZJoRfP7Kw)  
 

--- 

# 6

### List the average length of the movies in the database, per `language_id`.

In [12]:
%%sql
EXPLAIN
SELECT  l.language_id AS language
        ,ROUND(AVG(f.length), 2) AS avg_length
FROM    film AS f JOIN language as l
        ON f.language_id = l.language_id
GROUP BY l.language_id;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
7 rows affected.


QUERY PLAN
HashAggregate (cost=74.62..74.71 rows=6 width=36)
Group Key: l.language_id
-> Hash Join (cost=1.14..69.62 rows=1000 width=6)
Hash Cond: (f.language_id = l.language_id)
-> Seq Scan on film f (cost=0.00..64.00 rows=1000 width=4)
-> Hash (cost=1.06..1.06 rows=6 width=4)
-> Seq Scan on language l (cost=0.00..1.06 rows=6 width=4)


In [13]:
%%sql
SELECT  l.language_id AS language
        ,ROUND(AVG(f.length), 2) AS avg_length
FROM    film AS f JOIN language as l
        ON f.language_id = l.language_id
GROUP BY l.language_id;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
1 rows affected.


language,avg_length
1,115.27


In [14]:
%sql SELECT DISTINCT name, language_id FROM language

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
6 rows affected.


name,language_id
English,1
German,6
Japanese,3
Italian,2
French,5
Mandarin,4


In [15]:
%sql SELECT DISTINCT language_id FROM film

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
1 rows affected.


language_id
1
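
The two checks above show that six languages exist but every film uses `language_id` 1, which is why the inner join returns a single row. If languages without films should still be listed, a `LEFT JOIN` starting from `language` is one option. The sketch below illustrates the difference using Python's built-in `sqlite3` module as a stand-in for the course server; the two-language table and film lengths are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the course database
conn.executescript("""
    CREATE TABLE language (language_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE film (film_id INTEGER PRIMARY KEY, language_id INTEGER, length INTEGER);
    INSERT INTO language VALUES (1, 'English'), (2, 'Italian');
    INSERT INTO film VALUES (1, 1, 100), (2, 1, 120);   -- no Italian films
""")

# Same query twice; only the join type changes.
query = """
    SELECT   l.name
            ,ROUND(AVG(f.length), 2) AS avg_length
            ,COUNT(f.film_id) AS films
    FROM     language AS l {join} film AS f
             ON f.language_id = l.language_id
    GROUP BY l.language_id, l.name
    ORDER BY l.language_id;
"""

inner_rows = conn.execute(query.format(join="JOIN")).fetchall()
left_rows = conn.execute(query.format(join="LEFT JOIN")).fetchall()

print(inner_rows)  # [('English', 110.0, 2)]
print(left_rows)   # [('English', 110.0, 2), ('Italian', None, 0)]
conn.close()
```

With the left join, a language that has no films gets a `NULL` average and a film count of 0 instead of disappearing from the result.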


# 7

### List the average length of the movies in the database, per language name.

In [16]:
%%sql
EXPLAIN
SELECT  l.name AS language
        ,ROUND(AVG(f.length), 2) AS avg_length
FROM    film AS f JOIN language as l
        ON f.language_id = l.language_id
GROUP BY l.name;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
7 rows affected.


QUERY PLAN
HashAggregate (cost=74.62..74.71 rows=6 width=116)
Group Key: l.name
-> Hash Join (cost=1.14..69.62 rows=1000 width=86)
Hash Cond: (f.language_id = l.language_id)
-> Seq Scan on film f (cost=0.00..64.00 rows=1000 width=4)
-> Hash (cost=1.06..1.06 rows=6 width=88)
-> Seq Scan on language l (cost=0.00..1.06 rows=6 width=88)


In [17]:
%%sql
SELECT  l.name AS language
        ,ROUND(AVG(f.length), 2) AS avg_length
FROM    film AS f JOIN language as l
        ON f.language_id = l.language_id
GROUP BY l.name;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
1 rows affected.


language,avg_length
English,115.27


# 8

### List each film title and its average rental duration in days.

**HINT** `return_date::date` casts the return _timestamp_ to a date.  PostgreSQL can do date math natively: subtracting one date from another gives a whole number of days.

In [24]:
%%sql
EXPLAIN
SELECT  f.title AS film
        ,AVG(r.return_date - r.rental_date) AS avg_rental_duration
FROM    rental AS r JOIN inventory AS i
        ON r.inventory_id = i.inventory_id
        JOIN film AS f
        ON f.film_id = i.film_id
GROUP BY f.title;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
11 rows affected.


QUERY PLAN
HashAggregate (cost=719.80..732.30 rows=1000 width=31)
Group Key: f.title
-> Hash Join (cost=204.57..599.47 rows=16044 width=31)
Hash Cond: (i.film_id = f.film_id)
-> Hash Join (cost=128.07..480.67 rows=16044 width=18)
Hash Cond: (r.inventory_id = i.inventory_id)
-> Seq Scan on rental r (cost=0.00..310.44 rows=16044 width=20)
-> Hash (cost=70.81..70.81 rows=4581 width=6)
-> Seq Scan on inventory i (cost=0.00..70.81 rows=4581 width=6)
-> Hash (cost=64.00..64.00 rows=1000 width=19)


In [25]:
%%sql
SELECT  f.title AS film
        ,AVG(r.return_date - r.rental_date) AS avg_rental_duration
FROM    rental AS r JOIN inventory AS i
        ON r.inventory_id = i.inventory_id
        JOIN film AS f
        ON f.film_id = i.film_id
GROUP BY f.title;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
958 rows affected.


film,avg_rental_duration
Graceland Dynamite,"5 days, 7:15:00"
Opus Ice,"4 days, 22:39:10.909091"
Braveheart Human,"4 days, 0:28:36"
Wonderful Drop,"6 days, 0:31:20"
Rush Goodfellas,"4 days, 11:33:52.258064"
Purple Movie,"3 days, 23:57:46.153846"
Minority Kiss,"4 days, 23:39:49.090909"
Luke Mummy,"5 days, 0:10:42.857142"
Fantasy Troopers,"5 days, 5:32:46.153846"
Grinch Massage,"5 days, 5:10:04.285715"
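
The result above is an interval (days plus hours, minutes, and seconds) because the query subtracts timestamps directly. Following the hint, casting both sides to dates first, for example `AVG(r.return_date::date - r.rental_date::date)`, would average whole-day differences instead. The sketch below is a Python analogy of that distinction, not PostgreSQL itself, and the rental timestamps are made up.

```python
from datetime import datetime, timedelta

# Made-up (rental, return) timestamp pairs, for illustration only.
rentals = [
    (datetime(2005, 5, 24, 22, 53), datetime(2005, 5, 26, 22, 4)),
    (datetime(2005, 5, 24, 23, 0), datetime(2005, 5, 28, 19, 40)),
]

# Timestamp minus timestamp keeps hours and minutes, like an interval.
exact = [returned - rented for rented, returned in rentals]
avg_exact = sum(exact, timedelta()) / len(exact)

# Date minus date is a whole number of days, like casting with ::date first.
whole_days = [(returned.date() - rented.date()).days for rented, returned in rentals]
avg_whole_days = sum(whole_days) / len(whole_days)

# An interval can also be expressed as fractional days.
avg_fractional_days = avg_exact / timedelta(days=1)

print(avg_exact)                      # 2 days, 21:55:30
print(avg_whole_days)                 # 3.0
print(round(avg_fractional_days, 4))  # 2.9135
```

Which form counts as "duration in days" depends on whether partial days should be kept or the calendar-day difference is wanted.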


# 9

### List the film title and the number of times it has been rented in each city (name and country).

In [31]:
%%sql
EXPLAIN
SELECT  f.title AS film
        ,ci.city
        ,co.country
        ,COUNT(r.rental_id) AS times_rented
FROM    film AS f JOIN inventory AS i
        ON f.film_id = i.film_id
        JOIN rental AS r
        ON r.inventory_id = i.inventory_id
        JOIN customer AS cu
        ON r.customer_id = cu.customer_id
        JOIN address AS a
        ON cu.address_id = a.address_id
        JOIN city AS ci
        ON a.city_id = ci.city_id
        JOIN country as co
        ON ci.country_id = co.country_id
GROUP BY ci.city, co.country, f.title;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
27 rows affected.


QUERY PLAN
HashAggregate (cost=996.90..1157.34 rows=16044 width=41)
"Group Key: ci.city, co.country, f.title"
-> Hash Join (cost=270.57..836.46 rows=16044 width=37)
Hash Cond: (ci.country_id = co.country_id)
-> Hash Join (cost=267.12..789.26 rows=16044 width=30)
Hash Cond: (a.city_id = ci.city_id)
-> Hash Join (cost=248.62..728.34 rows=16044 width=21)
Hash Cond: (cu.address_id = a.address_id)
-> Hash Join (cost=227.05..664.36 rows=16044 width=21)
Hash Cond: (r.customer_id = cu.customer_id)


In [32]:
%%sql 
SELECT  f.title AS film
        ,ci.city
        ,co.country
        ,COUNT(r.rental_id) AS times_rented
FROM    film AS f JOIN inventory AS i
        ON f.film_id = i.film_id
        JOIN rental AS r
        ON r.inventory_id = i.inventory_id
        JOIN customer AS cu
        ON r.customer_id = cu.customer_id
        JOIN address AS a
        ON cu.address_id = a.address_id
        JOIN city AS ci
        ON a.city_id = ci.city_id
        JOIN country as co
        ON ci.country_id = co.country_id
GROUP BY ci.city, co.country, f.title;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
15827 rows affected.


film,city,country,times_rented
Cowboy Doom,Enshi,China,1
Vacation Boondock,Tama,Japan,1
Philadelphia Wife,Valencia,Venezuela,1
Pond Seattle,Apeldoorn,Netherlands,1
Rocky War,Ciomas,Indonesia,1
Bringing Hysterical,Nuuk,Greenland,1
Natural Stock,Santa Brbara dOeste,Brazil,1
Show Lord,Tychy,Poland,1
Bill Others,Lincoln,United States,1
Hunting Musketeers,Mwanza,Tanzania,1


# 10

### List the film title, number of times it has been rented, and the most recent rental date, in order of least recently rented, then most rentals.

In [34]:
%%sql
EXPLAIN
SELECT  f.title AS film
        ,COUNT(r.rental_id) AS times_rented
        ,MAX(r.rental_date) AS latest_rental_date
FROM    film AS f JOIN inventory AS i
        ON f.film_id = i.film_id
        JOIN rental AS r
        ON r.inventory_id = i.inventory_id
GROUP BY f.title
ORDER BY COUNT(r.rental_id);

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
13 rows affected.


QUERY PLAN
Sort (cost=779.62..782.12 rows=1000 width=31)
Sort Key: (count(r.rental_id))
-> HashAggregate (cost=719.80..729.80 rows=1000 width=31)
Group Key: f.title
-> Hash Join (cost=204.57..599.47 rows=16044 width=27)
Hash Cond: (i.film_id = f.film_id)
-> Hash Join (cost=128.07..480.67 rows=16044 width=14)
Hash Cond: (r.inventory_id = i.inventory_id)
-> Seq Scan on rental r (cost=0.00..310.44 rows=16044 width=16)
-> Hash (cost=70.81..70.81 rows=4581 width=6)


In [37]:
%%sql
SELECT  f.title AS film
        ,COUNT(r.rental_id) AS times_rented
        ,MAX(r.rental_date) AS latest_rental_date
FROM    film AS f JOIN inventory AS i
        ON f.film_id = i.film_id
        JOIN rental AS r
        ON r.inventory_id = i.inventory_id
GROUP BY f.title
ORDER BY COUNT(r.rental_id);

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
958 rows affected.


film,times_rented,latest_rental_date
Mixed Doors,4,2005-08-21 18:34:21
Train Bunch,4,2005-08-23 21:56:04
Hardly Robbers,4,2005-08-22 20:44:06
Mannequin Worst,5,2005-08-22 17:49:35
Braveheart Human,5,2005-08-20 16:05:11
Bunch Minds,5,2005-08-21 16:05:11
Private Drop,5,2005-08-19 23:48:23
Traffic Hobbit,5,2005-08-21 16:24:45
Mussolini Spoilers,5,2005-08-19 20:53:43
Seven Swarm,5,2005-08-23 04:28:25
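
The question asks for two sort keys (least recently rented, then most rentals), while the query above sorts by rental count only. A minimal sketch of a two-key `ORDER BY` follows, under one reading of the question: latest rental date ascending, then rental count descending. It uses Python's built-in `sqlite3` module as a stand-in for the course server, and the single-table layout and rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the course database
conn.executescript("""
    CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, title TEXT, rental_date TEXT);
    INSERT INTO rental (title, rental_date) VALUES
        ('Film A', '2005-08-20 10:00:00'),
        ('Film A', '2005-08-23 09:00:00'),
        ('Film B', '2005-08-21 12:00:00'),
        ('Film C', '2005-08-19 08:00:00'),
        ('Film C', '2005-08-21 12:00:00'),
        ('Film C', '2005-08-18 08:00:00');
""")

rows = conn.execute("""
    SELECT   title AS film
            ,COUNT(rental_id) AS times_rented
            ,MAX(rental_date) AS latest_rental_date
    FROM     rental
    GROUP BY title
    ORDER BY MAX(rental_date) ASC      -- least recently rented first
            ,COUNT(rental_id) DESC;    -- ties broken by most rentals
""").fetchall()

for row in rows:
    print(row)
# ('Film C', 3, '2005-08-21 12:00:00')
# ('Film B', 1, '2005-08-21 12:00:00')
# ('Film A', 2, '2005-08-23 09:00:00')
conn.close()
```

Films B and C share the same latest rental date, so the second key decides their order and the film with more rentals comes first.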


# Save your Notebook, then `File > Close and Halt`

---