# Lecture 02: SQL Review

In [1]:
# Run this cell to set up imports
import numpy as np
import pandas as pd

## First load in the data into the database

In [5]:
!unzip -u data/imdb_lecture.zip -d data/
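# `!export` only changes a throwaway subshell; update the notebook's own PATH instead.
import os
os.environ["PATH"] += os.pathsep + "/Library/PostgreSQL/16/bin"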

Archive:  data/imdb_lecture.zip


In [6]:
!psql -h localhost -c 'DROP DATABASE IF EXISTS imdb_lecture'
!psql -h localhost -c 'CREATE DATABASE imdb_lecture' 
!psql -h localhost -d imdb_lecture -f data/imdb_lecture.sql

/bin/bash: psql: command not found
/bin/bash: psql: command not found
/bin/bash: psql: command not found


# Using `psql` in Terminal

`psql` is a command-line PostgreSQL interactive client.

I find it useful to keep the Terminal up while I'm working on notebooks for the following:
* **meta-commands**: `psql` commands to query information (generally metadata) about the database
* **writing interactive SQL queries**: `psql` shows me a few rows at a time, and I can quit whenever. Avoids Jupyter notebooks running out of space if the query result relation is huge.

In [1]:
%load_ext sql

In [6]:
%%sql
SELECT *
FROM crew;

UsageError: No active connection.

To fix it:

Pass a valid connection string:
    Example: %sql postgresql://username:password@hostname/dbname

OR

Set the environment variable $DATABASE_URL

Documentation: https://jupysql.ploomber.io/en/latest/connecting.html
If you need help solving this issue, send us a message: https://ploomber.io/community


To launch `psql` and connect to a specific database, say, the `imdb_lecture` database we just created on `localhost`, open up a Terminal and type in:

```
psql postgresql://127.0.0.1:5432/imdb_lecture
```

Note that the PostgreSQL server is on localhost (i.e., IP address `127.0.0.1`) and listens on network port `5432`.
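
To make the pieces of that connection URL concrete, here is a small sketch that splits it with Python's standard `urllib.parse`. The variable names are ours and are not part of the lecture code.

```python
from urllib.parse import urlsplit

# The same connection URL passed to psql above.
url = "postgresql://127.0.0.1:5432/imdb_lecture"
parts = urlsplit(url)

scheme = parts.scheme            # kind of server we are talking to
host = parts.hostname            # IP address of localhost
port = parts.port                # network port the server listens on
dbname = parts.path.lstrip("/")  # database to connect to

print(scheme, host, port, dbname)
```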

Troubleshooting:
* You do not have to be in a particular directory to launch the `psql` client!
* If you cannot connect or you do not see any relations with `\d`, make sure you have created/loaded in the database with the `!psql` commands in the previous section.
* If your interactive query is not executing, check to see if you have ended with a semicolon (necessary and also generally good style!).

Quick reference:
* `\l` list all databases available on this server
* `\d` list all relations in this database
* `\d tablename` list schema of tablename relation
* `\q` quit psql
* `\?` help
* `<ctrl>-c` cancel
* `<ctrl>-a`, `<ctrl>-e` jump to the front and back of a line, respectively
* `<ctrl>-<left>`, `<ctrl>-<right>` jump one word previous and forward, respectively
* (when in query result buffer) `<space>` to advance a page, `q` to quit and exit out

# Using `jupysql` in Jupyter Notebook

We are going to be using the `jupysql` library to connect our notebook to a PostgreSQL database server on your jupyterhub account. The next cell should do the trick; you should not see any error messages after it completes.

In [4]:
%reload_ext sql

Note we did not do `import jupysql` (this will throw an error). You should always load `jupysql` as the `sql` cell magic, as shown above.

<br/>

`jupysql` helps us create a client connection directly from our Notebook. However, just like before, we first need to connect to our database before we start issuing any queries:

In [2]:
%sql postgresql://jovyan@127.0.0.1:5432/imdb_lecture

In [4]:
%%sql
SELECT P2.name, P2.born, P1.name, P1.born
FROM People AS P1,
     People AS P2
WHERE P1.born > P2.born -- P2 was born earlier than P1
   AND P1.name = 'Michelle Yeoh';

name,born,name_1,born_1
Dave Hardman,1960,Michelle Yeoh,1962
William Sadler,1950,Michelle Yeoh,1962
Ralph Abernathy,1926,Michelle Yeoh,1962
Nehemiah Persoff,1919,Michelle Yeoh,1962
Fernando Arribas,1940,Michelle Yeoh,1962
Roberto Gómez Bolaños,1929,Michelle Yeoh,1962
Ang Lee,1954,Michelle Yeoh,1962
Richard Roundtree,1942,Michelle Yeoh,1962
Svetlana Orlova,1956,Michelle Yeoh,1962
Ernest Truex,1889,Michelle Yeoh,1962
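
The self-join above pairs the Michelle Yeoh row with every other row of `People` and keeps the pairs where the other person was born earlier. The sketch below replays that idea on a tiny in-memory SQLite table through Python's `sqlite3` module. SQLite is only a stand-in for the PostgreSQL server (an assumption made so the example runs anywhere), and the three rows are copied from outputs in this notebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the imdb_lecture database
conn.execute("CREATE TABLE people (name TEXT, born INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Michelle Yeoh", 1962), ("Ang Lee", 1954), ("Tereza Taliánová", 2005)],
)

rows = conn.execute(
    """
    SELECT P2.name, P2.born, P1.name, P1.born
    FROM people AS P1,
         people AS P2
    WHERE P1.born > P2.born        -- P2 was born earlier than P1
      AND P1.name = 'Michelle Yeoh'
    """
).fetchall()
print(rows)
```

Only Ang Lee survives the filter: Tereza Taliánová was born later, and Michelle Yeoh is not born strictly earlier than herself.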


<br/>

---

See the slides for most of the queries, which we executed in the interactive `psql` client.

We included just one query here; note we've truncated the result by randomly selecting 10 rows. This is generally good for debugging in Jupyter Notebooks (that being said, `jupysql` is smart enough to truncate most results).

Be sure to remove those debugging lines before submitting any final queries for projects!
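
The debugging lines themselves are not shown in the cell above. One common way to sample a few rows, and only a guess at what was used in lecture, is `ORDER BY RANDOM() LIMIT 10`. The sketch below tries it on a throwaway in-memory SQLite table; SQLite stands in for PostgreSQL here, and the `numbers` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (n INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO numbers VALUES (?)", [(i,) for i in range(1000)])

sample = conn.execute(
    """
    SELECT n
    FROM numbers
    ORDER BY RANDOM()  -- debugging only: shuffle the rows
    LIMIT 10           -- debugging only: keep 10 of them
    """
).fetchall()
print(len(sample))
```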

## CAST

What's wrong with the following query?

In [6]:
%%sql
SELECT primary_title, type,
     premiered AS release_year,
     runtime_minutes,
     runtime_minutes/60 AS 
         runtime_hours
FROM titles
WHERE premiered >= 2020 AND
      premiered <= 2023;

primary_title,type,release_year,runtime_minutes,runtime_hours
Blood of Zeus,tvSeries,2020,30.0,0.0
Gods & Heroes,tvSeries,2020,30.0,0.0
Shaq Life,tvSeries,2020,,
What's After,tvSeries,2020,,
Utmark,tvSeries,2020,,
La Femme Anjola,movie,2021,140.0,2.0
Mr. Corman,tvSeries,2021,285.0,4.0
Player Vs Player with Trevor Noah,tvSeries,2021,,
Run for Young,tvSeries,2020,,
Poker Nights,tvSeries,2021,6.0,0.0
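
The `0.0` and `2.0` values in `runtime_hours` suggest integer division: when both operands are integers, PostgreSQL truncates the quotient. The sketch below reproduces this and shows a `CAST` that avoids it. It assumes `runtime_minutes` is stored as an integer, and it uses an in-memory SQLite table as a stand-in for PostgreSQL, with a few rows copied from the output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titles (primary_title TEXT, runtime_minutes INTEGER)")
conn.executemany(
    "INSERT INTO titles VALUES (?, ?)",
    [("Blood of Zeus", 30), ("La Femme Anjola", 140), ("Shaq Life", None)],
)

rows = conn.execute(
    """
    SELECT primary_title,
           runtime_minutes / 60                AS truncated_hours,
           CAST(runtime_minutes AS FLOAT) / 60 AS runtime_hours
    FROM titles
    """
).fetchall()
for row in rows:
    print(row)
```

Casting one operand to a floating-point type is enough to make the division fractional. A NULL runtime stays NULL in both columns.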


## CASE statements

What's wrong with the following query?

In [7]:
%%sql
SELECT
    person_id, name,
    died, born,
    died - born AS age
FROM people;

person_id,name,died,born,age
nm0384214,Dwayne Hill,,,
nm0362443,Dave Hardman,,1960.0,
nm1560888,Rich Pryce-Jones,,,
nm0006669,William Sadler,,1950.0,
nm1373094,Giada De Laurentiis,,1970.0,
nm7316782,Janine Hartmann,,,
nm8671663,Tereza Taliánová,,2005.0,
nm10480297,Chris Heywood,,,
nm10803545,Chengao Zhou,,,
nm9849414,Mark Langley,,,
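
Every `age` above is empty because arithmetic with NULL yields NULL, and `died` is NULL for anyone still living. One possible fix, not necessarily the one from the slides, is a `CASE` expression that falls back to a reference year when `died` is NULL. The reference year 2023 is our assumption, and the in-memory SQLite table is a stand-in for PostgreSQL with rows copied from outputs in this notebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, born INTEGER, died INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("Robert Warwick", 1878, 1964), ("Dave Hardman", 1960, None), ("Dwayne Hill", None, None)],
)

rows = conn.execute(
    """
    SELECT name,
           died - born AS naive_age,
           CASE
               WHEN died IS NULL THEN 2023 - born  -- assumed "current" year
               ELSE died - born
           END AS age
    FROM people
    """
).fetchall()
for row in rows:
    print(row)
```

A person with no known birth year still gets a NULL age, since `2023 - NULL` is NULL as well.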


## Null values and boolean expressions

Compare/contrast the following three queries:

In [8]:
%%sql
SELECT born
FROM people;

born
""
1960.0
""
1950.0
1970.0
""
2005.0
""
""
""


In [9]:
%%sql
SELECT born
FROM people
WHERE born < 2023 OR
    born IS NULL;

born
""
1960.0
""
1950.0
1970.0
""
2005.0
""
""
""


In [10]:
%%sql
SELECT born
FROM people
WHERE born < 2023;

born
1960
1950
1970
2005
1980
1926
1975
1919
1940
1929
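
The difference between the three queries comes from SQL's three-valued logic: `born < 2023` evaluates to NULL (unknown) when `born` is NULL, and `WHERE` keeps only rows whose condition is true. The sketch below counts the rows each query keeps on a small in-memory SQLite table, a stand-in for PostgreSQL, using `born` values taken from the outputs above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (born INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?)",
    [(None,), (1960,), (None,), (1950,), (1970,), (2005,)],
)

def count(where=""):
    """Count the rows of people that survive an optional WHERE clause."""
    return conn.execute(f"SELECT COUNT(*) FROM people {where}").fetchone()[0]

n_all = count()                                         # no filter
n_or_null = count("WHERE born < 2023 OR born IS NULL")  # NULL rows kept explicitly
n_lt = count("WHERE born < 2023")                       # NULL rows dropped
null_cmp = conn.execute("SELECT NULL < 2023").fetchone()[0]

print(n_all, n_or_null, n_lt, null_cmp)
```

The comparison against NULL comes back as `None` in Python, which is why the third query silently drops those rows.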


## String matching and COUNT(*)

In [11]:
%%sql
SELECT *
FROM people
WHERE name LIKE 'Chris%';

person_id,name,born,died
nm10480297,Chris Heywood,,
nm9115948,Chris Bratt,,
nm6699360,Chris Evans,,
nm12363226,Chris Daniels,,
nm2653742,Chris Longo,,
nm5237685,Chris Evans,,
nm3074588,Christian Bland,,
nm5646425,Chris Evans,,
nm1470079,Chris Boiling,,
nm11632011,Chris Angold,,
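
Note that `'Chris%'` also matches "Christian Bland", because `%` matches any run of characters. The sketch below combines `LIKE` with `COUNT(*)` to count matches on an in-memory SQLite table, a stand-in for PostgreSQL with names copied from outputs in this notebook. One caveat: SQLite's `LIKE` ignores ASCII case by default while PostgreSQL's is case-sensitive; the sample names are chosen so that this difference does not matter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?)",
    [("Chris Heywood",), ("Chris Evans",), ("Christian Bland",), ("Dave Hardman",)],
)

# % matches any run of characters, so 'Chris%' also matches "Christian ...".
n_prefix = conn.execute(
    "SELECT COUNT(*) FROM people WHERE name LIKE 'Chris%'"
).fetchone()[0]

# A trailing space in the pattern restricts the match to the first name "Chris".
n_first_name = conn.execute(
    "SELECT COUNT(*) FROM people WHERE name LIKE 'Chris %'"
).fetchone()[0]

print(n_prefix, n_first_name)
```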


## Multiple relations, Aliasing

In [12]:
%%sql
SELECT *
FROM
    akas, titles
WHERE
    titles.title_id = 
        akas.title_id;

title_id,title,region,language,types,attributes,is_original_title,title_id_1,type,primary_title,original_title,is_adult,premiered,ended,runtime_minutes,genres
tt0909144,46,US,,,,0,tt0909144,tvEpisode,46,46,0,1971,,,"Comedy,Family"
tt0909144,46,US,,,,0,tt0909144,tvEpisode,46,46,0,1971,,,"Comedy,Family"
tt3719148,Gustavo Lopez/Abel Pintos,AR,,,,0,tt3719148,tvEpisode,Gustavo Lopez/Abel Pintos,Gustavo Lopez/Abel Pintos,0,2012,,,"Comedy,Talk-Show"
tt3719148,Gustavo Lopez/Abel Pintos,AR,,,,0,tt3719148,tvEpisode,Gustavo Lopez/Abel Pintos,Gustavo Lopez/Abel Pintos,0,2012,,,"Comedy,Talk-Show"
tt9047618,The Women in the Sand,GB,,imdbDisplay,,0,tt9047618,movie,The Women in the Sand,The Women in the Sand,0,2017,,73.0,Documentary
tt9047618,The Women in the Sand,GB,,imdbDisplay,,0,tt9047618,movie,The Women in the Sand,The Women in the Sand,0,2017,,73.0,Documentary
tt1259521,La cabaña del terror,AR,,imdbDisplay,,0,tt1259521,movie,The Cabin in the Woods,The Cabin in the Woods,0,2011,,95.0,Horror
tt1259521,La cabaña del terror,AR,,imdbDisplay,,0,tt1259521,movie,The Cabin in the Woods,The Cabin in the Woods,0,2011,,95.0,Horror
tt5557622,एपिसोड #1.15,IN,hi,,,0,tt5557622,tvEpisode,Episode #1.15,Episode #1.15,0,2014,,59.0,Drama
tt5557622,एपिसोड #1.15,IN,hi,,,0,tt5557622,tvEpisode,Episode #1.15,Episode #1.15,0,2014,,59.0,Drama


In [13]:
%%sql
SELECT *
FROM
    akas AS A,
    titles T
WHERE
    A.title_id = T.title_id;

title_id,title,region,language,types,attributes,is_original_title,title_id_1,type,primary_title,original_title,is_adult,premiered,ended,runtime_minutes,genres
tt0909144,46,US,,,,0,tt0909144,tvEpisode,46,46,0,1971,,,"Comedy,Family"
tt0909144,46,US,,,,0,tt0909144,tvEpisode,46,46,0,1971,,,"Comedy,Family"
tt3719148,Gustavo Lopez/Abel Pintos,AR,,,,0,tt3719148,tvEpisode,Gustavo Lopez/Abel Pintos,Gustavo Lopez/Abel Pintos,0,2012,,,"Comedy,Talk-Show"
tt3719148,Gustavo Lopez/Abel Pintos,AR,,,,0,tt3719148,tvEpisode,Gustavo Lopez/Abel Pintos,Gustavo Lopez/Abel Pintos,0,2012,,,"Comedy,Talk-Show"
tt9047618,The Women in the Sand,GB,,imdbDisplay,,0,tt9047618,movie,The Women in the Sand,The Women in the Sand,0,2017,,73.0,Documentary
tt9047618,The Women in the Sand,GB,,imdbDisplay,,0,tt9047618,movie,The Women in the Sand,The Women in the Sand,0,2017,,73.0,Documentary
tt1259521,La cabaña del terror,AR,,imdbDisplay,,0,tt1259521,movie,The Cabin in the Woods,The Cabin in the Woods,0,2011,,95.0,Horror
tt1259521,La cabaña del terror,AR,,imdbDisplay,,0,tt1259521,movie,The Cabin in the Woods,The Cabin in the Woods,0,2011,,95.0,Horror
tt5557622,एपिसोड #1.15,IN,hi,,,0,tt5557622,tvEpisode,Episode #1.15,Episode #1.15,0,2014,,59.0,Drama
tt5557622,एपिसोड #1.15,IN,hi,,,0,tt5557622,tvEpisode,Episode #1.15,Episode #1.15,0,2014,,59.0,Drama


In [14]:
%%sql
SELECT *
FROM akas A
  INNER JOIN titles T
    ON A.title_id = T.title_id;

title_id,title,region,language,types,attributes,is_original_title,title_id_1,type,primary_title,original_title,is_adult,premiered,ended,runtime_minutes,genres
tt0909144,46,US,,,,0,tt0909144,tvEpisode,46,46,0,1971,,,"Comedy,Family"
tt0909144,46,US,,,,0,tt0909144,tvEpisode,46,46,0,1971,,,"Comedy,Family"
tt3719148,Gustavo Lopez/Abel Pintos,AR,,,,0,tt3719148,tvEpisode,Gustavo Lopez/Abel Pintos,Gustavo Lopez/Abel Pintos,0,2012,,,"Comedy,Talk-Show"
tt3719148,Gustavo Lopez/Abel Pintos,AR,,,,0,tt3719148,tvEpisode,Gustavo Lopez/Abel Pintos,Gustavo Lopez/Abel Pintos,0,2012,,,"Comedy,Talk-Show"
tt9047618,The Women in the Sand,GB,,imdbDisplay,,0,tt9047618,movie,The Women in the Sand,The Women in the Sand,0,2017,,73.0,Documentary
tt9047618,The Women in the Sand,GB,,imdbDisplay,,0,tt9047618,movie,The Women in the Sand,The Women in the Sand,0,2017,,73.0,Documentary
tt1259521,La cabaña del terror,AR,,imdbDisplay,,0,tt1259521,movie,The Cabin in the Woods,The Cabin in the Woods,0,2011,,95.0,Horror
tt1259521,La cabaña del terror,AR,,imdbDisplay,,0,tt1259521,movie,The Cabin in the Woods,The Cabin in the Woods,0,2011,,95.0,Horror
tt5557622,एपिसोड #1.15,IN,hi,,,0,tt5557622,tvEpisode,Episode #1.15,Episode #1.15,0,2014,,59.0,Drama
tt5557622,एपिसोड #1.15,IN,hi,,,0,tt5557622,tvEpisode,Episode #1.15,Episode #1.15,0,2014,,59.0,Drama
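
The three queries in this section return the same rows: a comma-separated `FROM` list with the join condition in `WHERE`, the same query with aliases, and an explicit `INNER JOIN ... ON`. The sketch below checks that on a cut-down, in-memory SQLite version of `akas` and `titles`. SQLite is a stand-in for PostgreSQL, and only a few columns and rows from the outputs above are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE akas (title_id TEXT, title TEXT)")
conn.execute("CREATE TABLE titles (title_id TEXT, primary_title TEXT)")
conn.executemany(
    "INSERT INTO akas VALUES (?, ?)",
    [("tt1259521", "La cabaña del terror"), ("tt9047618", "The Women in the Sand")],
)
conn.executemany(
    "INSERT INTO titles VALUES (?, ?)",
    [
        ("tt1259521", "The Cabin in the Woods"),
        ("tt9047618", "The Women in the Sand"),
        ("tt0909144", "46"),  # no akas row inserted here, so every join drops it
    ],
)

def run(query):
    """Run a query and sort the rows so results can be compared."""
    return sorted(conn.execute(query).fetchall())

comma_join = run("SELECT * FROM akas, titles WHERE titles.title_id = akas.title_id")
aliased = run("SELECT * FROM akas AS A, titles T WHERE A.title_id = T.title_id")
inner_join = run("SELECT * FROM akas A INNER JOIN titles T ON A.title_id = T.title_id")

print(comma_join == aliased == inner_join, len(inner_join))
```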


# IMDB exercise 1

What does each record represent in the below result? Why?

In [15]:
%%sql
SELECT *
FROM titles

  INNER JOIN crew 
    ON crew.title_id = 
       titles.title_id


  INNER JOIN people
    ON people.person_id = 
       crew.person_id;

title_id,type,primary_title,original_title,is_adult,premiered,ended,runtime_minutes,genres,title_id_1,person_id,category,job,person_id_1,name,born,died
tt0008572,movie,The Silent Master,The Silent Master,0,1917,,,"Crime,Drama",tt0008572,nm0913094,actor,,nm0913094,Robert Warwick,1878,1964
tt0008572,movie,The Silent Master,The Silent Master,0,1917,,,"Crime,Drama",tt0008572,nm0913094,actor,,nm0913094,Robert Warwick,1878,1964
tt0008572,movie,The Silent Master,The Silent Master,0,1917,,70.0,"Crime,Drama",tt0008572,nm0913094,actor,,nm0913094,Robert Warwick,1878,1964
tt0008572,movie,The Silent Master,The Silent Master,0,1917,,70.0,"Crime,Drama",tt0008572,nm0913094,actor,,nm0913094,Robert Warwick,1878,1964
tt0008572,movie,The Silent Master,The Silent Master,0,1917,,,"Crime,Drama",tt0008572,nm0913094,actor,,nm0913094,Robert Warwick,1878,1964
tt0008572,movie,The Silent Master,The Silent Master,0,1917,,,"Crime,Drama",tt0008572,nm0913094,actor,,nm0913094,Robert Warwick,1878,1964
tt0008572,movie,The Silent Master,The Silent Master,0,1917,,70.0,"Crime,Drama",tt0008572,nm0913094,actor,,nm0913094,Robert Warwick,1878,1964
tt0008572,movie,The Silent Master,The Silent Master,0,1917,,70.0,"Crime,Drama",tt0008572,nm0913094,actor,,nm0913094,Robert Warwick,1878,1964
tt0009202,movie,The House of Glass,The House of Glass,0,1918,,50.0,Drama,tt0009202,nm0154352,director,,nm0154352,Emile Chautard,1864,1934
tt0009202,movie,The House of Glass,The House of Glass,0,1918,,50.0,Drama,tt0009202,nm0154352,director,,nm0154352,Emile Chautard,1864,1934


How do we modify the above query so that it gets the
titles and IDs of Michelle Yeoh movies?


Let's cache your query string using some fancy `jupysql` formatting:

In [16]:
# write your query below
# while it's bad style, for this to work,
# don't end with a semicolon.
ex1_query = """
-- fill in your query here --
"""

In [None]:
%%sql
{{ex1_query}};

# IMDB exercise 2

How do we write a query that gets the names of Michelle Yeoh movies
that have a rating of at least 7.0?

First, let's create a view called `yeoh_movies`. More in a bit.

In [None]:
%%sql
CREATE VIEW yeoh_movies AS (
  {{ex1_query}}
);

In [None]:
%sql SELECT * FROM yeoh_movies;
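
As a preview of what a view is: it stores a query rather than rows, so it is re-evaluated each time you select from it. The sketch below illustrates this on an in-memory SQLite table, a stand-in for PostgreSQL. The view name `long_titles` and the 90-minute cutoff are hypothetical; the two rows come from outputs in this notebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titles (primary_title TEXT, runtime_minutes INTEGER)")
conn.execute("INSERT INTO titles VALUES ('La Femme Anjola', 140)")

# long_titles is a hypothetical view name; it stores the query, not the rows.
conn.execute(
    "CREATE VIEW long_titles AS "
    "SELECT primary_title FROM titles WHERE runtime_minutes >= 90"
)
before = conn.execute("SELECT * FROM long_titles").fetchall()

# A row added to the base table afterwards still shows up through the view.
conn.execute("INSERT INTO titles VALUES ('The Cabin in the Woods', 95)")
after = conn.execute("SELECT * FROM long_titles ORDER BY primary_title").fetchall()

print(before, after)
```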

In [None]:
%%sql
SELECT primary_title
FROM ratings
INNER JOIN yeoh_movies
  ON ratings.title_id = yeoh_movies.title_id
WHERE rating >= 7.0;

## Quick Peek: The Natural Join

In [19]:
%%sql
SELECT *
FROM ratings
INNER JOIN yeoh_movies
  ON ratings.title_id = 
   yeoh_movies.title_id
WHERE rating >= 7.0;

title_id,rating,votes,primary_title,title_id_1
tt0190332,7.9,268227,"Crouching Tiger, Hidden Dragon",tt0190332
tt0397535,7.4,132457,Memoirs of a Geisha,tt0397535
tt0190332,7.8,241690,"Crouching Tiger, Hidden Dragon",tt0190332
tt0190332,7.9,268227,"Crouching Tiger, Hidden Dragon",tt0190332
tt0397535,7.4,132457,Memoirs of a Geisha,tt0397535
tt0190332,7.8,241690,"Crouching Tiger, Hidden Dragon",tt0190332


In [None]:
%%sql
SELECT *
FROM ratings
NATURAL JOIN yeoh_movies
WHERE rating >= 7.0;
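
`NATURAL JOIN` joins on every column name the two relations share (here, `title_id`) and returns that column only once, whereas `INNER JOIN ... ON` with `SELECT *` keeps both copies. The sketch below compares the two on in-memory SQLite tables, a stand-in for PostgreSQL. For simplicity `yeoh_movies` is created as a plain table rather than a view, and the rows are copied from the output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (title_id TEXT, rating REAL, votes INTEGER)")
conn.execute("CREATE TABLE yeoh_movies (primary_title TEXT, title_id TEXT)")
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [("tt0190332", 7.9, 268227), ("tt0397535", 7.4, 132457)],
)
conn.executemany(
    "INSERT INTO yeoh_movies VALUES (?, ?)",
    [("Crouching Tiger, Hidden Dragon", "tt0190332"), ("Memoirs of a Geisha", "tt0397535")],
)

inner = conn.execute(
    "SELECT * FROM ratings INNER JOIN yeoh_movies "
    "ON ratings.title_id = yeoh_movies.title_id WHERE rating >= 7.0"
)
inner_cols = [d[0] for d in inner.description]  # column names of the result
inner_rows = inner.fetchall()

natural = conn.execute(
    "SELECT * FROM ratings NATURAL JOIN yeoh_movies WHERE rating >= 7.0"
)
natural_cols = [d[0] for d in natural.description]
natural_rows = natural.fetchall()

print(len(inner_cols), len(natural_cols))
```

Because the join columns are picked implicitly by name, `NATURAL JOIN` can change meaning if the two relations later gain another column with the same name.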

In [None]:
%reload_ext sql

In [None]:
%%sql
SELECT DISTINCT titles.primary_title, titles.title_id
FROM titles
    INNER JOIN crew
        ON crew.title_id = titles.title_id
    INNER JOIN people
        ON people.person_id = crew.person_id
WHERE people.name = 'Morgan Freeman' AND titles.type = 'movie';

In [None]:
%sql postgresql://jovyan@127.0.0.1:5432/imdb_lecture

In [None]:
!pg_dump --encoding utf8 imdb_lecture -f imdb_lecture_final.sql 