# Lecture 02: SQL Review, Extra

This supplementary notebook contains additional SQL practice. Many of the queries below, or variations of them, also appear in the slides or the course notes.

## First, load the data into the database

In [None]:
!unzip -u data/imdb_lecture.zip -d data/

In [None]:
!psql -h localhost -c 'DROP DATABASE IF EXISTS imdb_lecture'
!psql -h localhost -c 'CREATE DATABASE imdb_lecture' 
!psql -h localhost -d imdb_lecture -f data/imdb_lecture.sql

## `jupysql` setup

In [None]:
%reload_ext sql

In [None]:
%sql postgresql://jovyan@127.0.0.1:5432/imdb_lecture

## CAST

What's wrong with the following query?

In [None]:
%%sql
SELECT primary_title, type,
       premiered AS release_year,
       runtime_minutes,
       runtime_minutes/60 AS runtime_hours
FROM titles
WHERE premiered >= 2020 AND
      premiered <= 2023;
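To check your answer, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for the lecture's PostgreSQL server. The two-row `titles` table is hypothetical toy data. It shows that dividing one integer by another truncates the fraction, and that casting one operand to `FLOAT` (accepted by both SQLite and PostgreSQL) keeps it.

```python
import sqlite3

# In-memory SQLite database standing in for the lecture's PostgreSQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titles (primary_title TEXT, runtime_minutes INTEGER)")
# Hypothetical toy rows, not taken from the IMDB data.
conn.executemany(
    "INSERT INTO titles VALUES (?, ?)",
    [("Short Film", 45), ("Feature", 125)],
)

# INTEGER / INTEGER is integer division, so the fractional hours are dropped.
truncated = conn.execute(
    "SELECT primary_title, runtime_minutes / 60 FROM titles ORDER BY runtime_minutes"
).fetchall()

# Casting one operand to a floating-point type keeps the fraction.
exact = conn.execute(
    "SELECT primary_title, CAST(runtime_minutes AS FLOAT) / 60 "
    "FROM titles ORDER BY runtime_minutes"
).fetchall()

print(truncated)  # [('Short Film', 0), ('Feature', 2)]
print(exact)
```

Dividing by `60.0` instead of `60` would also force floating-point division; the explicit `CAST` makes the intent clearer.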

## CASE statements

What's wrong with the following query?

In [None]:
%%sql
SELECT
    person_id, name,
    died, born,
    died - born AS age
FROM people;
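Here is a small SQLite sketch (again standing in for PostgreSQL, with hypothetical rows) that checks the behavior: arithmetic involving `NULL` yields `NULL`, so living people get no age. The `CASE` version substitutes a reference year when `died` is `NULL`; the year 2023 is an assumption chosen to match the other queries in this notebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (person_id TEXT, name TEXT, born INTEGER, died INTEGER)")
# Hypothetical rows: one living person (died IS NULL) and one deceased person.
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [("p1", "Living Person", 1960, None), ("p2", "Deceased Person", 1900, 1980)],
)

# Any arithmetic involving NULL yields NULL, so the living person gets no age.
naive = conn.execute(
    "SELECT person_id, died - born AS age FROM people ORDER BY person_id"
).fetchall()

# CASE substitutes an assumed reference year (2023) when died is NULL.
fixed = conn.execute(
    """
    SELECT person_id,
           CASE WHEN died IS NULL THEN 2023 - born
                ELSE died - born
           END AS age
    FROM people
    ORDER BY person_id
    """
).fetchall()

print(naive)  # [('p1', None), ('p2', 80)]
print(fixed)  # [('p1', 63), ('p2', 80)]
```

`COALESCE(died, 2023) - born` is a shorter equivalent of this particular `CASE` expression.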

## Null values and boolean expressions

These queries, like the rest of the notebook, use the `jupysql` connection to the PostgreSQL database server on your JupyterHub account that was opened in the setup section above. If that connection cell completed without error messages, they should run as-is.

Compare and contrast the following three queries:

In [None]:
%%sql
SELECT born
FROM people;

In [None]:
%%sql
SELECT born
FROM people
WHERE born < 2023 OR
    born IS NULL;

In [None]:
%%sql
SELECT born
FROM people
WHERE born < 2023;
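To see the difference in row counts, here is a sketch on a hypothetical four-row `people` table in SQLite, used as a stand-in for PostgreSQL. The key point is three-valued logic: `NULL < 2023` evaluates to `NULL`, not `TRUE`, so `WHERE` drops that row unless it is explicitly kept with `IS NULL`. SQLite reports comparison results as `1`/`0`, whereas PostgreSQL reports `true`/`false`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (born INTEGER)")
# Hypothetical values: two past birth years, one future year, one unknown (NULL).
conn.executemany("INSERT INTO people VALUES (?)", [(1950,), (1990,), (2030,), (None,)])

def count(where: str = "") -> int:
    """Return the number of rows in people that satisfy the optional WHERE clause."""
    return conn.execute(f"SELECT COUNT(*) FROM people {where}").fetchone()[0]

all_rows = count()                                      # every row, NULL included
with_null = count("WHERE born < 2023 OR born IS NULL")  # NULL row kept explicitly
without_null = count("WHERE born < 2023")               # NULL < 2023 is NULL, not TRUE

# The comparison itself evaluates to NULL (None in Python) for the NULL row.
truth_values = [r[0] for r in conn.execute("SELECT born < 2023 FROM people ORDER BY rowid")]

print(all_rows, with_null, without_null)  # 4 3 2
print(truth_values)                       # [1, 1, 0, None]
```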

## String matching and COUNT(*)

In [None]:
%%sql
SELECT *
FROM people
WHERE name LIKE 'Chris%';
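Here is a sketch of the `LIKE` pattern together with `COUNT(*)`, using SQLite and hypothetical names. In a `LIKE` pattern, `%` matches any sequence of characters (including none) and `_` matches exactly one character. One dialect difference to keep in mind: SQLite's `LIKE` is case-insensitive for ASCII letters by default, while PostgreSQL's `LIKE` is case-sensitive (PostgreSQL offers `ILIKE` for case-insensitive matching). The names below avoid that difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (person_id TEXT, name TEXT)")
# Hypothetical names, chosen so that letter case does not affect the result.
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("p1", "Chris Evans"), ("p2", "Christopher Nolan"),
     ("p3", "Kristen Stewart"), ("p4", "Morgan Freeman")],
)

# 'Chris%' is a prefix match: 'Chris' followed by anything.
matches = conn.execute(
    "SELECT name FROM people WHERE name LIKE 'Chris%' ORDER BY name"
).fetchall()

# COUNT(*) collapses the matching rows into a single row holding their count.
(n_matches,) = conn.execute(
    "SELECT COUNT(*) FROM people WHERE name LIKE 'Chris%'"
).fetchone()

print(matches)    # [('Chris Evans',), ('Christopher Nolan',)]
print(n_matches)  # 2
```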

## Multiple relations, Aliasing

In [None]:
%%sql
SELECT *
FROM
    akas, titles
WHERE
    titles.title_id = akas.title_id;

In [None]:
%%sql
SELECT *
FROM
    akas AS A,
    titles T
WHERE
    A.title_id = T.title_id;

In [None]:
%%sql
SELECT *
FROM akas A
  INNER JOIN titles T
    ON A.title_id = T.title_id;
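The three queries above express the same join: a cross product filtered in `WHERE`, the same thing with table aliases (`AS` is optional), and explicit `INNER JOIN ... ON` syntax. Here is a sketch that checks they return identical rows on hypothetical `akas`/`titles` tables in SQLite, used as a stand-in for PostgreSQL. Results are sorted before comparison because SQL does not guarantee row order without `ORDER BY`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE titles (title_id TEXT, primary_title TEXT);
    CREATE TABLE akas (title_id TEXT, title TEXT, region TEXT);
    -- Hypothetical rows: tt1 has two alternate titles, tt2 has none.
    INSERT INTO titles VALUES ('tt1', 'Movie One'), ('tt2', 'Movie Two');
    INSERT INTO akas VALUES ('tt1', 'Film Un', 'FR'), ('tt1', 'Pelicula Uno', 'ES');
    """
)

queries = {
    # Cross product of the two relations, keeping rows whose IDs match.
    "comma_where": "SELECT * FROM akas, titles WHERE titles.title_id = akas.title_id",
    # Same query with table aliases; the AS keyword is optional.
    "aliases": "SELECT * FROM akas AS A, titles T WHERE A.title_id = T.title_id",
    # Explicit join syntax with the condition in the ON clause.
    "inner_join": "SELECT * FROM akas A INNER JOIN titles T ON A.title_id = T.title_id",
}

# Sort each result so that row order is not part of the comparison.
results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in queries.items()}

print(results["inner_join"])
```

`tt2` never appears in the output because it has no matching `akas` row, which is the defining behavior of an inner join.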

# IMDB exercise 1

What does each record in the result below represent? Why?

In [None]:
%%sql
SELECT *
FROM titles
  INNER JOIN crew
    ON crew.title_id = titles.title_id
  INNER JOIN people
    ON people.person_id = crew.person_id;

How do we modify the above query so that it gets the titles and IDs of Michelle Yeoh movies?


Let's store your query in a Python string so that later cells can reuse it through `jupysql`'s `{{variable}}` substitution:

In [None]:
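# Write your query below. Don't end it with a semicolon (even though that
# is usually good style): later cells append their own semicolon and embed
# the query inside CREATE VIEW ... ( ... ), where a trailing ';' is a syntax error.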
ex1_query = """
-- fill in your query here --
"""

In [None]:
%%sql
{{ex1_query}};
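One way to check your answer to the first question: here is a sketch on heavily simplified, hypothetical versions of `titles`, `crew`, and `people` in SQLite, used as a stand-in for PostgreSQL. The `category` column and all rows are assumptions for illustration. Assuming every crew row matches exactly one title and one person, the three-way join returns one row per `crew` row, i.e. one row per credit. A person with two credits on the same title therefore produces that title twice, which is why `DISTINCT` is useful once you filter by name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical, heavily simplified versions of the lecture tables.
    CREATE TABLE titles (title_id TEXT, primary_title TEXT, type TEXT);
    CREATE TABLE people (person_id TEXT, name TEXT);
    CREATE TABLE crew (title_id TEXT, person_id TEXT, category TEXT);
    INSERT INTO titles VALUES ('tt1', 'Movie One', 'movie'), ('tt2', 'Movie Two', 'movie');
    INSERT INTO people VALUES ('nm1', 'Actor A'), ('nm2', 'Director B');
    -- nm2 has two credits on tt1 (director and writer).
    INSERT INTO crew VALUES
        ('tt1', 'nm1', 'actor'),
        ('tt1', 'nm2', 'director'),
        ('tt1', 'nm2', 'writer'),
        ('tt2', 'nm1', 'actor');
    """
)

base = """
    FROM titles
        INNER JOIN crew ON crew.title_id = titles.title_id
        INNER JOIN people ON people.person_id = crew.person_id
"""

# One output row per crew row: each record is a single credit.
rows = conn.execute("SELECT titles.primary_title, people.name, crew.category" + base).fetchall()
n_crew = conn.execute("SELECT COUNT(*) FROM crew").fetchone()[0]

# Filtering on one person repeats a title once per credit; DISTINCT collapses them.
credits = conn.execute("SELECT titles.title_id" + base + "WHERE people.name = 'Director B'").fetchall()
unique_titles = conn.execute(
    "SELECT DISTINCT titles.title_id" + base + "WHERE people.name = 'Director B'"
).fetchall()

print(len(rows), n_crew)  # 4 4
print(credits)            # [('tt1',), ('tt1',)]
print(unique_titles)      # [('tt1',)]
```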

# IMDB exercise 2

How do we write a query that gets the names of Michelle Yeoh movies that have a rating of at least 7.0?

First, let's save the exercise 1 query as a view called `yeoh_movies`; we'll discuss views in more detail shortly.

In [None]:
%%sql
CREATE VIEW yeoh_movies AS (
  {{ex1_query}}
);

In [None]:
%sql SELECT * FROM yeoh_movies;
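A view stores a query, not a snapshot of its result: every time you read from the view, the database re-runs the query against the current base tables. Here is a sketch of that behavior in SQLite (a stand-in for PostgreSQL) with a hypothetical `recent_titles` view over toy data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE titles (title_id TEXT, primary_title TEXT, premiered INTEGER);
    INSERT INTO titles VALUES ('tt1', 'Old Movie', 1999), ('tt2', 'New Movie', 2022);
    -- Hypothetical view: the stored query, not its current output.
    CREATE VIEW recent_titles AS
        SELECT title_id, primary_title FROM titles WHERE premiered >= 2020;
    """
)

before = conn.execute("SELECT primary_title FROM recent_titles ORDER BY title_id").fetchall()

# Add a row to the base table; the view picks it up on the next read.
conn.execute("INSERT INTO titles VALUES ('tt3', 'Newer Movie', 2023)")
after = conn.execute("SELECT primary_title FROM recent_titles ORDER BY title_id").fetchall()

print(before)  # [('New Movie',)]
print(after)   # [('New Movie',), ('Newer Movie',)]
```

Because the view persists in the database, running the `CREATE VIEW yeoh_movies` cell a second time fails in PostgreSQL with a "relation already exists" error. Running `DROP VIEW IF EXISTS yeoh_movies;` first, or using `CREATE OR REPLACE VIEW`, avoids this.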

In [None]:
%%sql
SELECT primary_title
FROM ratings
INNER JOIN yeoh_movies
  ON ratings.title_id = yeoh_movies.title_id
WHERE rating >= 7.0;

## Quick Peek: The Natural Join

In [None]:
%%sql
SELECT *
FROM ratings
INNER JOIN yeoh_movies
  ON ratings.title_id = yeoh_movies.title_id
WHERE rating >= 7.0;

In [None]:
%%sql
SELECT *
FROM ratings
NATURAL JOIN yeoh_movies
WHERE rating >= 7.0;
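`NATURAL JOIN` joins on every column name the two inputs share, and `SELECT *` then lists each shared column only once. Here is a sketch comparing the two queries above on hypothetical, simplified `ratings` and `yeoh_movies` tables in SQLite (a stand-in for PostgreSQL), where `title_id` is the only shared column name. The same convenience is also a hazard: if both inputs later gained another column with the same name, `NATURAL JOIN` would silently start matching on it too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical simplified tables; the only shared column name is title_id.
    CREATE TABLE ratings (title_id TEXT, rating REAL, votes INTEGER);
    CREATE TABLE yeoh_movies (title_id TEXT, primary_title TEXT);
    INSERT INTO ratings VALUES ('tt1', 7.8, 100), ('tt2', 6.5, 50), ('tt9', 9.0, 10);
    INSERT INTO yeoh_movies VALUES ('tt1', 'Movie One'), ('tt2', 'Movie Two');
    """
)

def run(sql):
    """Return (column names, rows) for a query."""
    cur = conn.execute(sql)
    return [d[0] for d in cur.description], cur.fetchall()

inner_cols, inner_rows = run(
    "SELECT * FROM ratings INNER JOIN yeoh_movies "
    "ON ratings.title_id = yeoh_movies.title_id WHERE rating >= 7.0"
)
natural_cols, natural_rows = run(
    "SELECT * FROM ratings NATURAL JOIN yeoh_movies WHERE rating >= 7.0"
)

# The explicit join keeps both copies of title_id; the natural join keeps one.
print(len(inner_cols), len(natural_cols))  # 5 4
print(natural_rows)                        # [('tt1', 7.8, 100, 'Movie One')]
```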

In [None]:
%reload_ext sql

In [None]:
%%sql
SELECT DISTINCT titles.primary_title, titles.title_id
FROM titles
    INNER JOIN crew
        ON crew.title_id = titles.title_id
    INNER JOIN people
        ON people.person_id = crew.person_id
WHERE people.name = 'Morgan Freeman' AND titles.type = 'movie';

In [None]:
%sql postgresql://jovyan@127.0.0.1:5432/imdb_lecture

# Tricky Queries

What do these queries do?

In [None]:
%%sql
SELECT REGEXP_REPLACE(name, '(.*) (.*)', '\1') AS firstname,
       COUNT(*) AS countname
FROM people
GROUP BY firstname
ORDER BY countname DESC;
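To experiment with the first query, here is a sketch in SQLite with hypothetical names. SQLite has no built-in `REGEXP_REPLACE`, so the sketch registers a Python stand-in (an assumption, not PostgreSQL's implementation) that, like PostgreSQL's version without the `'g'` flag, replaces only the first match. Because the first `(.*)` is greedy, `\1` captures everything before the *last* space, so "firstname" is not always a first name. We expect PostgreSQL to behave the same way for this pattern, but it is worth verifying on the real table.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for PostgreSQL's REGEXP_REPLACE (first match only).
conn.create_function(
    "regexp_replace", 3,
    lambda s, pattern, repl: None if s is None else re.sub(pattern, repl, s, count=1),
)
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?)",
    [("Chris Evans",), ("Chris Pratt",), ("Samuel L. Jackson",), ("Cher",)],
)

rows = conn.execute(
    r"""
    SELECT REGEXP_REPLACE(name, '(.*) (.*)', '\1') AS firstname,
           COUNT(*) AS countname
    FROM people
    GROUP BY firstname
    ORDER BY countname DESC
    """
).fetchall()

# Greedy (.*) keeps "Samuel L."; a name with no space ("Cher") does not match and is unchanged.
counts = dict(rows)
print(counts)
```

Making the first group non-greedy or excluding spaces from it (for example `'([^ ]*) .*'`) is one way to capture only the text before the *first* space.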

In [None]:
%%sql
SELECT type,
       AVG(CASE WHEN premiered < 2000 THEN runtime_minutes
                ELSE NULL
           END) AS pre2k_avg,
       AVG(CASE WHEN premiered >= 2000 THEN runtime_minutes
                ELSE NULL
           END) AS post2k_avg
FROM titles
GROUP BY type;
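The second query is a conditional aggregate: each `CASE` passes a runtime only when the row is in the right era and `NULL` otherwise, and `AVG` ignores `NULL` inputs. Here is a sketch on hypothetical toy rows in SQLite (a stand-in for PostgreSQL). It adds an `ELSE 0` variant to show why `NULL` is the right filler: zeros are counted and drag the average down.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titles (type TEXT, premiered INTEGER, runtime_minutes INTEGER)")
# Hypothetical toy rows, not taken from the IMDB data.
conn.executemany(
    "INSERT INTO titles VALUES (?, ?, ?)",
    [("movie", 1990, 100), ("movie", 2010, 120), ("movie", 2015, 140),
     ("short", 1995, 10), ("short", 2005, 20)],
)

rows = conn.execute(
    """
    SELECT type,
           AVG(CASE WHEN premiered < 2000 THEN runtime_minutes ELSE NULL END) AS pre2k_avg,
           AVG(CASE WHEN premiered >= 2000 THEN runtime_minutes ELSE NULL END) AS post2k_avg,
           -- Contrast: ELSE 0 makes every non-matching row count as a zero.
           AVG(CASE WHEN premiered < 2000 THEN runtime_minutes ELSE 0 END) AS pre2k_wrong
    FROM titles
    GROUP BY type
    ORDER BY type
    """
).fetchall()

print(rows)
```

For movies, `pre2k_avg` is 100.0 but `pre2k_wrong` is about 33.3, because the two post-2000 movies contribute zeros. PostgreSQL also supports the aggregate `FILTER` clause, e.g. `AVG(runtime_minutes) FILTER (WHERE premiered < 2000)`, which expresses the same idea more directly.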