# Subquery Practices

A PDF of the _Entity-Relationship Diagram_ (ERD) for the database is available [here](https://indigo.sgn.missouri.edu/static/PDF/DVD_Rental_ERD2.pdf).

In [None]:
%load_ext sql
%sql postgres://dsa_ro_user:readonly@pgsql.dsa.lan/dvdrental

### Some Tips 

#### How to comment out SQL code?

In [None]:
%%sql 

-- select * from category
/*
select c.category_id, count(c.category_id)
from category c, film_category
group by c.category_id
order by count(*)
*/

select count(*) from film_category;

#### How to save the results to a pandas data frame?

In [None]:
import pandas as pd

In [None]:
df = %sql select * from category

In [None]:
df

In [None]:
%%sql

Select * from film 
where title like 'God%';

In [None]:
# This will not work: %%sql is a cell magic, so its result cannot be assigned to a variable
df = %%sql

Select * from film 
where title like 'God%';

In [None]:
res = _  # in IPython, "_" holds the output of the most recently executed cell

In [None]:
df = pd.DataFrame(res)

In [None]:
df.head()

In [None]:
%%sql

select * 
From film natural join inventory
limit 10

### 1. Find names of customers who’ve rented film id 364 (Godfather Diary)

#### A. Using Join

In [None]:
%%sql 

select c.customer_id, c.first_name
from customer c join rental r using(customer_id) 
    join inventory i using(inventory_id)
    join film f using(film_id)
where f.title like 'Godfather Diary'  
order by c.customer_id;


#### B. Using Type-I Subquery.

Q. Why is the following inner query uncorrelated?

In [None]:
%%sql
select c.customer_id, c.first_name
from customer c 
where c.customer_id in (
    select customer_id 
    from rental r 
    join inventory i using(inventory_id)
    join film f using(film_id)
    where f.title like 'Godfather Diary'  
)
order by c.customer_id;
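
One way to check that an inner query is uncorrelated is to run it by itself: it references no column of the outer query, so it returns a result without any outer context. The sketch below illustrates this with Python's built-in `sqlite3` module and a toy, hypothetical pair of tables (`customer`, and a simplified `rental` that stores the film title directly); it is not the dvdrental schema or data.

```python
import sqlite3

# Toy stand-in for the dvdrental tables (hypothetical rows, simplified schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE rental (rental_id INTEGER PRIMARY KEY,
                         customer_id INTEGER, film_title TEXT);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO rental VALUES
        (10, 1, 'Godfather Diary'),
        (11, 1, 'Other Film'),
        (12, 2, 'Other Film'),
        (13, 3, 'Godfather Diary');
""")

inner_sql = "SELECT customer_id FROM rental WHERE film_title = 'Godfather Diary'"

# The inner query mentions no outer column, so it can be executed on its own.
inner_ids = [row[0] for row in conn.execute(inner_sql + " ORDER BY customer_id")]

# The outer query only tests membership in that fixed set of ids.
outer_rows = conn.execute(
    f"SELECT customer_id, first_name FROM customer "
    f"WHERE customer_id IN ({inner_sql}) ORDER BY customer_id"
).fetchall()

print(inner_ids)
print(outer_rows)
```

Because the inner result does not depend on the outer row, the database only needs to evaluate it once.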

#### C. Using Type-II Subquery

Q. Why is the following inner query correlated?

In [None]:
%%sql
select c.customer_id, c.first_name
from customer c 
where exists (
    select * 
    from rental r 
    join inventory i using(inventory_id)
    join film f using(film_id)
    where f.title like 'Godfather Diary' and r.customer_id = c.customer_id
)
order by c.customer_id;
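
A correlated inner query refers to a column of the outer query (here `c.customer_id`), so it cannot run by itself and is conceptually re-evaluated for each outer row. The sketch below shows both facts using Python's built-in `sqlite3` module on toy, hypothetical tables with a simplified schema (the film title is stored directly in `rental`).

```python
import sqlite3

# Toy stand-in for the dvdrental tables (hypothetical rows, simplified schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE rental (rental_id INTEGER PRIMARY KEY,
                         customer_id INTEGER, film_title TEXT);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO rental VALUES
        (10, 1, 'Godfather Diary'),
        (11, 1, 'Other Film'),
        (12, 2, 'Other Film'),
        (13, 3, 'Godfather Diary');
""")

# The inner query depends on the outer alias "c".
inner_sql = (
    "SELECT 1 FROM rental r "
    "WHERE r.film_title = 'Godfather Diary' AND r.customer_id = c.customer_id"
)

# Run alone, the reference to c.customer_id cannot be resolved.
try:
    conn.execute(inner_sql).fetchall()
    runs_alone = True
except sqlite3.OperationalError:
    runs_alone = False

# Inside EXISTS, the outer row supplies the value of c.customer_id.
exists_rows = conn.execute(
    f"SELECT c.customer_id, c.first_name FROM customer c "
    f"WHERE EXISTS ({inner_sql}) ORDER BY c.customer_id"
).fetchall()

print(runs_alone)
print(exists_rows)
```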

### 1.1 Count the customers who didn't rent film id 364 (Godfather Diary)

#### A. Using Join

In [None]:
%%sql 

select count(distinct c.customer_id)
from customer c join rental r using(customer_id) 
    join inventory i using(inventory_id)
    join film f using(film_id)
where  f.title <> 'Godfather Diary'  


**Q. Is this correct?**

Ans: No. Joining customer with all of their rentals yields one row per rental, so the filter `f.title <> 'Godfather Diary'` discards only the rows whose title is 'Godfather Diary'. A customer who rented 'Godfather Diary' and at least one other film still has rows left after the filter and is therefore counted.

In [None]:
%%sql
select count(*) from customer;


Comparing the count above with the join result shows that the 8 customers from Q1 were not discarded by the join operation.
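
The pitfall can be reproduced on a few rows. The sketch below uses Python's built-in `sqlite3` module with toy, hypothetical data (simplified schema, film title stored in `rental`) and compares the customer ids returned by the `<>` join with those returned by `NOT EXISTS`.

```python
import sqlite3

# Toy data: Ann rented the film and another one, Bob rented only another film,
# Cy rented only the film, Dee never rented anything. All rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE rental (rental_id INTEGER PRIMARY KEY,
                         customer_id INTEGER, film_title TEXT);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy'), (4, 'Dee');
    INSERT INTO rental VALUES
        (10, 1, 'Godfather Diary'),
        (11, 1, 'Other Film'),
        (12, 2, 'Other Film'),
        (13, 3, 'Godfather Diary');
""")

# Join with "<>": filters rental rows, not customers.
join_ids = [row[0] for row in conn.execute("""
    SELECT DISTINCT c.customer_id
    FROM customer c JOIN rental r ON r.customer_id = c.customer_id
    WHERE r.film_title <> 'Godfather Diary'
    ORDER BY c.customer_id
""")]

# NOT EXISTS: keeps a customer only if no rental of the film exists for them.
not_exists_ids = [row[0] for row in conn.execute("""
    SELECT c.customer_id
    FROM customer c
    WHERE NOT EXISTS (
        SELECT 1 FROM rental r
        WHERE r.customer_id = c.customer_id
          AND r.film_title = 'Godfather Diary')
    ORDER BY c.customer_id
""")]

print(join_ids)        # [1, 2]
print(not_exists_ids)  # [2, 4]
```

The join wrongly keeps Ann (id 1), who did rent the film, and also drops Dee (id 4), who has no rentals at all and therefore no rows in the join.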

#### B. Type-I subquery

In [None]:
%%sql

select count(*)
from customer c 
where c.customer_id not in (
    select customer_id 
    from rental r 
    join inventory i using(inventory_id)
    join film f using(film_id)
    where f.title = 'Godfather Diary'  
)

#### C. Using Type-II Subquery

In [None]:
%%sql
select count(*)
from customer c
where NOT exists (
    -- this is a comment  
    select * 
    from rental r 
    join inventory i using(inventory_id)
    join film f using(film_id)
    where f.title like 'Godfather Diary' and r.customer_id = c.customer_id
);

### 2. Find active customers who have rented movies priced $9.99

#### A. Using Join

In [None]:
%%sql

-- using (customer_id) already supplies the join condition
SELECT distinct customer_id, first_name, last_name, active, email 
FROM customer c join payment p using (customer_id) 
WHERE p.amount=9.99    
    AND c.active=1;

#### B. Using Type I

In [None]:
%%sql

SELECT customer_id, first_name, last_name, active, email 
FROM customer 
WHERE customer_id IN (
    SELECT customer_id 
    FROM payment 
    WHERE amount=9.99
) AND active=1;

#### C. Using Type II

In [None]:
%%sql

SELECT c.customer_id, c.first_name, c.last_name, c.active, c.email 
FROM customer c
WHERE EXISTS(
    SELECT customer_id 
    FROM payment p
    WHERE amount=9.99 and p.customer_id = c.customer_id
) AND active=1;
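
The join version needs `DISTINCT` while the subquery versions do not: a join returns one row per matching payment, whereas `IN` and `EXISTS` only test whether a match exists. The sketch below illustrates the difference with Python's built-in `sqlite3` module on toy, hypothetical rows (the column sets are trimmed for brevity).

```python
import sqlite3

# Hypothetical rows: Ann has two 9.99 payments, Cy is inactive.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY,
                           first_name TEXT, active INTEGER);
    CREATE TABLE payment (payment_id INTEGER PRIMARY KEY,
                          customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Ann', 1), (2, 'Bob', 1), (3, 'Cy', 0);
    INSERT INTO payment VALUES
        (100, 1, 9.99), (101, 1, 9.99), (102, 2, 4.99), (103, 3, 9.99);
""")

# Join without DISTINCT: one output row per matching payment.
join_ids = [row[0] for row in conn.execute("""
    SELECT c.customer_id
    FROM customer c JOIN payment p ON p.customer_id = c.customer_id
    WHERE p.amount = 9.99 AND c.active = 1
    ORDER BY c.customer_id
""")]

# IN subquery: each customer appears at most once.
in_ids = [row[0] for row in conn.execute("""
    SELECT customer_id
    FROM customer
    WHERE active = 1
      AND customer_id IN (SELECT customer_id FROM payment WHERE amount = 9.99)
    ORDER BY customer_id
""")]

print(join_ids)  # [1, 1]
print(in_ids)    # [1]
```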

### 3. Find films whose rental rate is greater than that of the film Godfather Diary

In [None]:
%%sql

SELECT film_id, title
FROM Film F
WHERE F.rental_rate > (
    SELECT F2.rental_rate
    FROM Film F2
    WHERE F2.title='Godfather Diary'
);

### 4. Find all films whose rental_rates are greater than the lowest rental_rate of every movie category

**Step 1: Get the lowest rental rate of each category**

In [None]:
%%sql

SELECT 
    category_id, MIN(rental_rate)
FROM
    film join film_category using (film_id)
GROUP BY category_id
ORDER BY category_id, MIN(rental_rate) DESC;

**Step 2: Use Step 1 as an inner subquery**

In [None]:
%%sql

SELECT 
    film_id, title, rental_rate
FROM
    film
WHERE
    rental_rate > ALL (
        -- Inner query: lowest rental rate of each category
        SELECT MIN(rental_rate)
        FROM film join film_category using (film_id)
        GROUP BY category_id
    )
ORDER BY film_id , title;
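
`x > ALL (subquery)` is true only when `x` is greater than every value the subquery returns. For a non-empty result without NULLs, that is the same as comparing against the largest of those values. The sketch below checks this equivalence with Python's built-in `sqlite3` module; SQLite has no `ALL` operator, so Python's `all()` plays that role here. The table is a toy, hypothetical `film` with `category_id` stored directly on it.

```python
import sqlite3

# Hypothetical films; per-category minimums are 0.99, 2.99 and 0.99.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT,
                       rental_rate REAL, category_id INTEGER);
    INSERT INTO film VALUES
        (1, 'A', 0.99, 1), (2, 'B', 2.99, 1),
        (3, 'C', 2.99, 2), (4, 'D', 4.99, 2),
        (5, 'E', 0.99, 3);
""")

min_per_category = (
    "SELECT MIN(rental_rate) AS min_rate FROM film GROUP BY category_id"
)

# Meaning of "> ALL": the rate must exceed every per-category minimum.
mins = [row[0] for row in conn.execute(min_per_category)]
films = conn.execute(
    "SELECT film_id, rental_rate FROM film ORDER BY film_id"
).fetchall()
via_all = [film_id for film_id, rate in films if all(rate > m for m in mins)]

# Equivalent rewrite in plain SQL: compare against the largest minimum.
via_max = [row[0] for row in conn.execute(
    f"SELECT film_id FROM film "
    f"WHERE rental_rate > (SELECT MAX(min_rate) FROM ({min_per_category})) "
    f"ORDER BY film_id"
)]

print(via_all)  # [4]
print(via_max)  # [4]
```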


### 5. Find all films whose rental_rates are greater than or equal to the highest rental_rate of some category

In [None]:
%%sql

SELECT 
    film_id, title, rental_rate
FROM
    film
WHERE
    rental_rate >= ANY (
        -- Inner query: highest rental rate of each category
        SELECT MAX(rental_rate)
        FROM film join film_category using (film_id)
        GROUP BY category_id
    )
ORDER BY film_id , title;
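
`x >= ANY (subquery)` is true when at least one value returned by the subquery satisfies the comparison, which can also be written as a correlated `EXISTS`. The sketch below checks this with Python's built-in `sqlite3` module; SQLite has no `ANY` operator, so Python's `any()` stands in for it. The `film` table is a toy, hypothetical one with `category_id` stored directly on it.

```python
import sqlite3

# Hypothetical films; per-category maximums are 2.99, 4.99 and 3.99.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT,
                       rental_rate REAL, category_id INTEGER);
    INSERT INTO film VALUES
        (1, 'A', 0.99, 1), (2, 'B', 2.99, 1),
        (3, 'C', 1.99, 2), (4, 'D', 4.99, 2),
        (5, 'E', 3.99, 3);
""")

max_per_category = (
    "SELECT MAX(rental_rate) AS max_rate FROM film GROUP BY category_id"
)

# Meaning of ">= ANY": the rate must reach at least one per-category maximum.
maxes = [row[0] for row in conn.execute(max_per_category)]
films = conn.execute(
    "SELECT film_id, rental_rate FROM film ORDER BY film_id"
).fetchall()
via_any = [film_id for film_id, rate in films if any(rate >= m for m in maxes)]

# Equivalent rewrite: a correlated EXISTS over the same inner result.
via_exists = [row[0] for row in conn.execute(
    f"SELECT f.film_id FROM film f "
    f"WHERE EXISTS (SELECT 1 FROM ({max_per_category}) m "
    f"              WHERE f.rental_rate >= m.max_rate) "
    f"ORDER BY f.film_id"
)]

print(via_any)     # [2, 4, 5]
print(via_exists)  # [2, 4, 5]
```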


### 6. Find customers in store 1 who spent less than $2.99 on individual rentals, but have spent a total higher than $5.

In [None]:
%%sql

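SELECT customer_id, SUM(amount) 
FROM (
    -- staff_id = 1 is used as a stand-in for store 1
    SELECT payment_id, customer_id, amount 
    FROM payment a 
    WHERE a.staff_id=1
) sub 
WHERE sub.amount < 2.99       -- row filter: individual payments under 2.99
GROUP BY sub.customer_id 
HAVING SUM(sub.amount) > 5;   -- group filter: total higher than 5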
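
This query filters at three levels: the nested table keeps one staff member's payments, `WHERE` drops individual rows before grouping, and `HAVING` drops whole groups after aggregation. The sketch below traces that order with Python's built-in `sqlite3` module on a toy, hypothetical `payment` table; it keeps only groups whose total exceeds 5.

```python
import sqlite3

# Hypothetical payments: (payment_id, customer_id, staff_id, amount).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payment (payment_id INTEGER PRIMARY KEY, customer_id INTEGER,
                          staff_id INTEGER, amount REAL);
    INSERT INTO payment VALUES
        (1, 1, 1, 0.99), (2, 1, 1, 1.99), (3, 1, 1, 2.99),
        (4, 1, 1, 1.99), (5, 1, 1, 0.99),
        (6, 2, 1, 0.99), (7, 2, 1, 0.99),
        (8, 3, 2, 1.99), (9, 3, 2, 1.99), (10, 3, 2, 1.99);
""")

rows = conn.execute("""
    SELECT customer_id, SUM(amount) AS total
    FROM (
        -- nested table: only payments taken by staff member 1
        SELECT customer_id, amount FROM payment WHERE staff_id = 1
    ) AS sub
    WHERE amount < 2.99          -- row filter, applied before grouping
    GROUP BY customer_id
    HAVING SUM(amount) > 5       -- group filter, applied after aggregation
    ORDER BY customer_id
""").fetchall()

print(rows)
```

Customer 1's 2.99 payment is removed by `WHERE` before the sum is taken, customer 2's group is removed by `HAVING`, and customer 3 never enters the nested table because those payments belong to staff member 2.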

### 7. Find the names of all films that are either Sci-Fi or Travel movies.

#### A. Using Join

In [None]:
%%sql

select f.film_id, f.title
from film f join film_category using (film_id) 
    join category c using (category_id)
where c.name = 'Sci-Fi' or c.name = 'Travel'

#### B. Using Type I

In [None]:
%%sql
select f.film_id, f.title
from film f 
where f.film_id in (
    select film_id
    from category c join film_category using (category_id) 
    where c.name = 'Sci-Fi' or c.name = 'Travel')

#### C. Using Type II

In [None]:
%%sql
select f.film_id, f.title
from film f 
where exists (
    select *
    from category c join film_category fc using (category_id) 
    where (c.name = 'Sci-Fi' or c.name = 'Travel') and fc.film_id = f.film_id)

### 8. Find the maximum average rental_rate for the film categories

This practice shows the use of a nested table expression, which is a subquery that appears in the FROM clause. 

**Step 1: Find the average rental_rate per category**

In [None]:
%%sql 

SELECT category_id, AVG(rental_rate)
FROM
    film join film_category using (film_id)
GROUP BY category_id
ORDER BY category_id, MIN(rental_rate) DESC

**Step 2: Use the result as a nested table**

In [None]:
%%sql 

select c.category_id, c.name, CatAvgRate.rate_avg
from category c join (
    
    SELECT category_id, AVG(rental_rate) as rate_avg
    FROM film join film_category using (film_id)
    GROUP BY category_id
    ORDER BY category_id, MIN(rental_rate) DESC
    
    ) as CatAvgRate USING (category_id)  -- Nested table expression
    

**Step 3: Get the max value from the resultant table of the Step 2 join expression**

In [None]:
%%sql 

select max(rate_avg)
from category c join (
    SELECT category_id, AVG(rental_rate) as rate_avg
    FROM film join film_category using (film_id)
    GROUP BY category_id
    ORDER BY category_id, MIN(rental_rate) DESC)
    as CatAvgRate USING (category_id) -- Nested table expression
    

### 8.1 How to select the max avg rate along with the other columns of the record

In [None]:
%%sql 

select c.category_id, c.name, a.rate_avg
from category c join (
    -- Nested table expression: average rental rate per category
    SELECT category_id, AVG(rental_rate) as rate_avg
    FROM film join film_category using (film_id)
    GROUP BY category_id
    ) as a USING (category_id)
where a.rate_avg = (
    -- Scalar subquery: the maximum of the per-category averages
    select max(rate_avg)
    from category c join (
        SELECT category_id, AVG(rental_rate) as rate_avg
        FROM film join film_category using (film_id)
        GROUP BY category_id
        ) as CatAvgRate USING (category_id)
)
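
The query above writes the same nested table expression twice. One possible alternative, not used in this notebook, is a common table expression (`WITH`), which names the expression once and lets both the join and the scalar subquery refer to it. The sketch below uses Python's built-in `sqlite3` module with toy, hypothetical tables; `category_id` is stored directly on `film` to keep the example short, and the name `cat_avg` is an assumption.

```python
import sqlite3

# Hypothetical data: Action averages 2.0, Comedy averages 4.5.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE film (film_id INTEGER PRIMARY KEY,
                       rental_rate REAL, category_id INTEGER);
    INSERT INTO category VALUES (1, 'Action'), (2, 'Comedy');
    INSERT INTO film VALUES (1, 1.0, 1), (2, 3.0, 1), (3, 4.0, 2), (4, 5.0, 2);
""")

top_rows = conn.execute("""
    WITH cat_avg AS (
        -- defined once, referenced twice below
        SELECT category_id, AVG(rental_rate) AS rate_avg
        FROM film
        GROUP BY category_id
    )
    SELECT c.category_id, c.name, a.rate_avg
    FROM category c JOIN cat_avg a ON a.category_id = c.category_id
    WHERE a.rate_avg = (SELECT MAX(rate_avg) FROM cat_avg)
""").fetchall()

print(top_rows)  # [(2, 'Comedy', 4.5)]
```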

### 8.2 Add a column "total rent per category" to the above CatAvgRate table in Step 2 of Q8

**Step 1: Get total rental count per category**

In [15]:
%%sql 

SELECT category_id, count(*)
FROM
    film join film_category using (film_id)
    join inventory using (film_id)
    join rental using (inventory_id)
GROUP BY category_id
ORDER BY category_id

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
16 rows affected.


| category_id | count |
|---|---|
| 1 | 1112 |
| 2 | 1166 |
| 3 | 945 |
| 4 | 939 |
| 5 | 941 |
| 6 | 1050 |
| 7 | 1060 |
| 8 | 1096 |
| 9 | 1033 |
| 10 | 969 |


**Step 2: Combine the two nested table expressions (NTEs)**

In [16]:
%%sql

select *
from category c join (
    -- Nested table expression 1
    SELECT category_id, AVG(rental_rate) as rate_avg
    FROM film join film_category using (film_id)
    GROUP BY category_id
    ORDER BY category_id, MIN(rental_rate) DESC)
    as CatAvgRate USING (category_id)
    
    join (
    -- Nested table expression 2    
    SELECT category_id, count(*) as total_rental
    FROM film join film_category using (film_id)
        join inventory using (film_id)
        join rental using (inventory_id)
    GROUP BY category_id
    ORDER BY category_id
        
    ) as CatTotal using (category_id)
    

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
16 rows affected.


| category_id | name | last_update | rate_avg | total_rental |
|---|---|---|---|---|
| 1 | Action | 2006-02-15 09:46:27 | 2.64625 | 1112 |
| 2 | Animation | 2006-02-15 09:46:27 | 2.8081818181818186 | 1166 |
| 3 | Children | 2006-02-15 09:46:27 | 2.89 | 945 |
| 4 | Classics | 2006-02-15 09:46:27 | 2.7443859649122806 | 939 |
| 5 | Comedy | 2006-02-15 09:46:27 | 3.1624137931034486 | 941 |
| 6 | Documentary | 2006-02-15 09:46:27 | 2.666470588235294 | 1050 |
| 7 | Drama | 2006-02-15 09:46:27 | 3.0222580645161288 | 1060 |
| 8 | Family | 2006-02-15 09:46:27 | 2.758115942028985 | 1096 |
| 9 | Foreign | 2006-02-15 09:46:27 | 3.0995890410958906 | 1033 |
| 10 | Games | 2006-02-15 09:46:27 | 3.2522950819672127 | 969 |
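
A likely reason to keep the two aggregates in separate nested table expressions is that joining `film` to its rentals repeats each film once per rental, so an average taken over the joined rows is weighted by rental count. The sketch below shows the effect with Python's built-in `sqlite3` module on toy, hypothetical tables; `rental` references `film_id` directly here, skipping `inventory` and `film_category` for brevity.

```python
import sqlite3

# Hypothetical data: one category, film 1 rented three times, film 2 once.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE film (film_id INTEGER PRIMARY KEY,
                       rental_rate REAL, category_id INTEGER);
    CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, film_id INTEGER);
    INSERT INTO film VALUES (1, 1.0, 1), (2, 3.0, 1);
    INSERT INTO rental VALUES (10, 1), (11, 1), (12, 1), (13, 2);
""")

# Single join: the average is taken over rental rows, not over films.
single_join = conn.execute("""
    SELECT category_id, AVG(rental_rate), COUNT(*)
    FROM film JOIN rental USING (film_id)
    GROUP BY category_id
""").fetchall()

# Two nested table expressions: each aggregate is computed at its own grain.
two_ntes = conn.execute("""
    SELECT category_id, a.rate_avg, t.total_rental
    FROM (SELECT category_id, AVG(rental_rate) AS rate_avg
          FROM film GROUP BY category_id) AS a
    JOIN (SELECT category_id, COUNT(*) AS total_rental
          FROM film JOIN rental USING (film_id)
          GROUP BY category_id) AS t
    USING (category_id)
""").fetchall()

print(single_join)  # [(1, 1.5, 4)]
print(two_ntes)     # [(1, 2.0, 4)]
```

Both versions agree on the rental count, but only the second reports the per-film average rate of 2.0.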
