# SQL-3

## Set up the environment

In [None]:
pip install ipython-sql psycopg2

In [1]:
%load_ext sql

In [2]:
%sql postgresql://postgres@localhost:5432/testdb

---

## NULL values in SQL

* A tuple may have a null value, denoted by **null**, for some of its attributes
* A null value can mean
    * Missing (unknown) value
    * Not applicable (N/A)

**Q: Find return_date of rental with rental_id = 11496. (Note: Here return_date is null!)**

In [None]:
%sql select rental_id, return_date from rental where rental_id = 11496;

**Q: What does the where condition evaluate to? (True or False?)**

In [None]:
%%sql select customer.first_name 
from customer join rental on rental.customer_id = customer.customer_id 
where rental.return_date IN (select return_date from rental where rental.rental_id = 11496)

In [None]:
%%sql select rental_id, return_date 
from rental 
where rental_id = 11496 and return_date < '2005-5-30';

**Q: In both of the following queries the where clause looks like it is always true, so why are the results different?**

In [None]:
%sql select count(*) from rental where true;

In [None]:
%%sql select count(*) 
from rental 
where return_date > '2005-05-30' or return_date <= '2005-05-30';

### Testing for NULL
* value IS NULL
* value IS NOT NULL

In [None]:
%%sql select count(*) 
from rental 
where return_date > '2005-05-30' or return_date <= '2005-05-30' 
or return_date is NULL;
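The effect of the `is NULL` test can be reproduced on a tiny table. The sketch below is only an illustration: it assumes an in-memory SQLite database (via Python's `sqlite3`) instead of the PostgreSQL `testdb`, and the table `rental_demo`, its three rows, and the helper `count_where` are made up for this example.

```python
import sqlite3

# Toy stand-in for the rental table (hypothetical data, SQLite instead of PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute("create table rental_demo (rental_id integer, return_date text)")
conn.executemany(
    "insert into rental_demo values (?, ?)",
    [(1, "2005-05-28"), (2, "2005-06-02"), (3, None)],  # rental 3 has no return_date
)

def count_where(condition):
    """Count the rows of rental_demo that satisfy the given where condition."""
    query = f"select count(*) from rental_demo where {condition}"
    return conn.execute(query).fetchone()[0]

either_side = "return_date > '2005-05-30' or return_date <= '2005-05-30'"

total_rows = count_where("1 = 1")                  # every row
compared_rows = count_where(either_side)           # the null row drops out
with_null_rows = count_where(either_side + " or return_date is null")
null_rows = count_where("return_date is null")     # only the null row

print(total_rows, compared_rows, with_null_rows, null_rows)
```

The row with a null `return_date` is missing from `compared_rows` and only comes back once `is null` is added to the condition.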

### When comparing with NULL

* SQL treats the result of any comparison involving a null value as **unknown** (the predicates **is null** and **is not null** are the exceptions)

* SQL follows a 3-valued logic
    1. TRUE = 1
    2. FALSE = 0
    3. UNKNOWN = 0.5

In the above example, `return_date < '2005-5-30';` is unknown if return_date is NULL!

* A tuple is included in the result **only** if the where clause evaluates to **TRUE** for it (tuples that evaluate to FALSE or UNKNOWN are dropped)
    - truth_value(A AND B) = min(truth_value(A), truth_value(B))
    - truth_value(A OR B) = max(truth_value(A), truth_value(B))
    - truth_value(NOT A) = 1 - truth_value(A)

* Examples:
    - **WHERE** rental_id = 11496 and return_date < '2005-5-30'
        - min(1, 0.5) = 0.5, therefore the expression is unknown!
    - **WHERE** return_date > '2005-05-30' or return_date <= '2005-05-30'
        - For a null return_date, max(0.5, 0.5) = 0.5, so rows with null are not counted! (see query above)
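The min/max/1-x rules above can be written out as a minimal Python sketch. It is not how a database implements 3-valued logic; the names `and3`, `or3`, `not3`, `less_than`, and `row_is_kept` are hypothetical helpers introduced here, and `None` stands in for SQL's null.

```python
# Numeric encoding of the three truth values, as in the list above.
TRUE, UNKNOWN, FALSE = 1.0, 0.5, 0.0

def and3(a, b):
    """A AND B is the minimum of the two truth values."""
    return min(a, b)

def or3(a, b):
    """A OR B is the maximum of the two truth values."""
    return max(a, b)

def not3(a):
    """NOT A is 1 minus the truth value."""
    return 1 - a

def less_than(x, y):
    """Comparison x < y; unknown as soon as either side is null (None)."""
    if x is None or y is None:
        return UNKNOWN
    return TRUE if x < y else FALSE

def row_is_kept(truth_value):
    """A where clause keeps a row only when it evaluates to TRUE."""
    return truth_value == TRUE

# rental_id = 11496 is TRUE, but its return_date is null.
return_date = None
where_value = and3(TRUE, less_than(return_date, "2005-05-30"))
print(where_value, row_is_kept(where_value))
```

Note that `not3(UNKNOWN)` is again `UNKNOWN`, which is why negating a condition does not bring back the rows with null.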
    

In [None]:
%sql select 1=1 and 2=2 result;

In [None]:
%sql select 1=1 and 1=0 result;

In [None]:
%sql select 1=2 and 1=3 result;

In [None]:
%sql select 1=1 and 1 = null result;

In [None]:
%sql select 1 = 2 and 1 = null result;

In [None]:
%sql select null and null = 1 result;

**EXERCISE**

**Q: Try out the truth table for `OR` and `NOT` (like above)**

---

## More on Outer joins

**Q: Find number of films in each language. Make a note on behaviour of the specific join operator!**

In [None]:
%%sql select language.name, count(film.film_id) 
from language inner join film on film.language_id = language.language_id 
group by language.name;

In [None]:
%%sql select language.name, count(film.film_id) 
from language 
right join film on language.language_id = film.language_id 
group by language.name; 

In [None]:
%%sql select language.name, count(film.film_id) 
from language 
left outer join film on language.language_id = film.language_id 
group by language.name; 

In [None]:
%%sql select language.name, count(film.film_id) 
from language 
full outer join film on language.language_id = film.language_id 
group by language.name; 

---

## WITH CLAUSE & COMMON TABLE EXPRESSIONS (CTE)
* Provides a temporary named result (a CTE) for building complex queries in a "neat" way
* The CTE can be referred to like a table within other queries of the same statement
* Can be used to write **recursive queries**
* Syntax
    
    WITH [RECURSIVE]  CTE_NAME [(CTE_COLUMNS)] AS ( CTE_DEFINITION ) PRIMARY_STATEMENT;

**Q: Compare the number of films an actor has acted in to the average number of films that an actor acts in.**

In [None]:
%%sql with avg_film_actor as (
    select avg(a.num_of_films) average 
    from (
        select actor_id, count(film_id) num_of_films
        from film_actor 
        group by actor_id
    ) as a
)
select actor_id, count(film_id) num_of_films, average avg_per_actor
from film_actor 
cross join avg_film_actor
group by actor_id, average
limit 10;

**Q: Compare a film's rent to the average rent of films that have the same rating as the film.**

In [None]:
%%sql with film_rating_avg as (
  select rating, avg(rental_rate) rating_average
  from film
  group by rating
)
select f.film_id, f.title, f.rating, f.rental_rate, a.rating_average
from film f, film_rating_avg a
where f.rating = a.rating
LIMIT 10;


### With `RECURSIVE`, a `WITH` query can refer to its own output

### Basic form of recursive queries

    with recursive T as (
        base_query
        union all
        recursive_query involving T
    )
    query involving T;
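One way to picture how such a query is evaluated: the base query runs once, and the recursive query is then applied repeatedly to the rows produced in the previous step until no new rows appear. The Python sketch below mimics that loop under a simplifying assumption, namely that the recursive part can be expressed as a function applied to one row at a time. The names `evaluate_recursive_cte` and `next_number` are hypothetical and exist only for this illustration.

```python
def evaluate_recursive_cte(base_rows, recursive_step):
    """Iterate a recursive step over a working table until it yields no rows.

    base_rows      -- rows produced by the base query
    recursive_step -- function mapping one row to the list of rows it generates
    """
    result = list(base_rows)    # everything produced so far (union all keeps duplicates)
    working = list(base_rows)   # rows produced by the previous iteration only
    while working:
        new_rows = [out for row in working for out in recursive_step(row)]
        result.extend(new_rows)
        working = new_rows
    return result

def next_number(row):
    """Mirrors: select n+1 from t where n < 10."""
    (n,) = row
    return [(n + 1,)] if n < 10 else []

# Mirrors: with recursive t(n) as (values (1) union all select n+1 from t where n < 10)
numbers = evaluate_recursive_cte([(1,)], next_number)
total = sum(n for (n,) in numbers)
print(numbers, total)
```

The loop stops because the condition `n < 10` eventually makes the recursive step return no rows; a recursive query without such a stopping condition would keep producing rows indefinitely.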


**Q: Sum of numbers from 1 to 10**

In [None]:
%%sql with recursive t(n) as (
    values (1)
    union all  
    select n+1 from t where n <10
)
select sum(n) from t;

**Q: List numbers from 1 to 10**

In [None]:
%%sql with recursive list as (
    select 1 as n
    union all
    select n+1 from list where n < 10
)
select * from list;

**Q: Writing Fibonacci numbers in SQL**

In [None]:
%%sql with recursive fibonacci as (
    select 0 as n_1, 1 as n_2
    union all 
    select n_2, n_1 + n_2 
    from fibonacci
    where n_2 < 5
)
select n_2 as fibonacci_numbers from fibonacci;


**Q: Write a recursive SQL query to compute the factorial (n!) of numbers up to N**

**Let us create a new table as follows.**

In [None]:
%sql drop table if exists new_category;


In [None]:
%%sql create table new_category (
    category_id integer NOT NULL,
    name character varying(25) NOT NULL,
    parent_category_id integer,
    last_update timestamp without time zone DEFAULT now() NOT NULL,
    primary key(category_id),
    constraint foreign_key_new_category
        foreign key(parent_category_id) references new_category(category_id)

)

In [None]:
%%sql insert into new_category (category_id, name, parent_category_id)
values 
(1, 'ALL', NULL),
(2, 'Action/Adventure', 1),
(3, 'Comedy/Musical', 1),
(4, 'Drama', 1),
(5, 'Action', 2),
(6, 'Adventure', 2),
(12, 'Disaster', 5),
(13, 'Sports', 5),
(14, 'Historical', 6),
(15, 'Wild', 6),
(7, 'Comedy', 3),
(8, 'Musical', 3),
(9, 'Social', 4),
(10, 'Political', 4),
(11, 'Court Room', 4),
(16, 'Slapstick', 7),
(17, 'Black', 7),
(18, 'Romantic', 7),
(19, 'Theater', 8),
(20, 'Dance', 8)

**Q: Let's find all subcategories of films under the 'Comedy/Musical' genre.**

In [None]:
%%sql with recursive sub_categories as (
    select category_id, name, parent_category_id, null::varchar as parent_name
    from new_category 
    where name = 'Comedy/Musical'

    union all

    select c.category_id, c.name, c.parent_category_id, sc.name
    from new_category c join sub_categories sc on c.parent_category_id = sc.category_id
)
select name as category, parent_name as parent_category from sub_categories;

**Q: Write a (recursive) query that counts the total number of subcategories under each category**

**Q: Extend the above query to also output the category names**

**Q: Using the film_actor and the film relations, write a recursive query to find chains of actors who worked together via films. The output relation should have attributes actor_1, actor_2, film_name.**