# MySQL SELECT from Database

In [None]:
from sqlalchemy import create_engine

conn_string = 'mysql://{user}:{password}@{host}/{database}?charset=utf8'.format(
    host = 'mysql-techub-2300010003-spring.db', 
    user = 'dbreader',
    password = 'ub232023',
    database = 'imdb')

engine = create_engine(conn_string)
con = engine.connect()

In [None]:
# Load the sql_magic extension, which makes it easy to query the database from a notebook cell.
%reload_ext sql_magic
%config SQL.conn_name = 'engine'

#### PLEASE RUN THE CELL BELOW. It limits the maximum number of records a query can return.

In [None]:
%%read_sql -n
SET sql_safe_updates=1, sql_select_limit=1000, max_join_size=1000000000;

If the query above runs successfully, you will see output like the following. The `EmptyResult` object is expected, because `SET` returns no rows.

```
Query started at 06:53:04 PM EDT; Query executed in 0.00 m
<sql_magic.exceptions.EmptyResult at 0x7fcc5c10a8e0>
```

#### Now we are all set! Let us start querying data from the IMDb database.

In [None]:
%%read_sql
SHOW DATABASES;

Declare that we are using the `imdb` database.

In [None]:
%%read_sql
USE imdb;

The following query lists the tables in the database: NameBasics, TitleAkas, TitleBasics, and so on.

In [None]:
%%read_sql
SHOW TABLES;

In [None]:
%%read_sql
SELECT COUNT(*) FROM TitleBasics;

In [None]:
%%read_sql
DESCRIBE TitleBasics;

#### Exercise

Pick one of the other tables and **count** the number of records in it. The list of tables is as follows.

1. NameBasics
2. TitleAkas
3. TitleCrew
4. TitleEpisode
5. TitleRatings

In [None]:
%%read_sql
# REMOVE THIS COMMENT AND ADD YOUR COUNT QUERY

#### Now we have a large data set! Let us select all the data. The following query would normally return every row, but because of the limit we set earlier, we obtain only the first 1000 rows.

In [None]:
%%read_sql
SELECT * FROM TitleBasics;

#### The LIMIT clause restricts the number of records returned.

The following query retrieves 10 records from the `TitleBasics` table.

In [None]:
%%read_sql
SELECT * FROM TitleBasics LIMIT 10;
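
To see the effect of `LIMIT` without connecting to the course server, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for MySQL (an assumption made only for illustration). The table contents are made-up placeholder rows, not real IMDb records.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TitleBasics (tconst TEXT, primaryTitle TEXT)")

# Twenty placeholder rows, invented for this sketch.
conn.executemany(
    "INSERT INTO TitleBasics VALUES (?, ?)",
    [(f"id{i:02d}", f"Sample title {i}") for i in range(20)],
)

# Without LIMIT every row comes back; with LIMIT 10 at most ten do.
all_rows = conn.execute("SELECT * FROM TitleBasics").fetchall()
limited_rows = conn.execute("SELECT * FROM TitleBasics LIMIT 10").fetchall()

print(len(all_rows), len(limited_rows))
```

Without an `ORDER BY`, which rows `LIMIT` keeps is up to the database, so the result should be read as "some 10 rows" rather than a meaningful "first 10".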

#### Exercise 

Query all the data in `TitlePrincipals` and `TitleRatings` tables. 

In [None]:
%%read_sql
# Your SQL query here;

In [None]:
%%read_sql
# Your SQL query here;

#### Obtain the schema of the `TitleBasics` table.

In [None]:
%%read_sql
# Show the schema (structure) of the TitleBasics table.
# Your SQL query here.

While `SELECT * ...` returns all the columns of each record, `SELECT COLUMN1, COLUMN2 ...` returns only the specified columns.

In [None]:
%%read_sql
# Obtain the originalTitle column from the TitleBasics table.
# Your SQL query here.

The first 1000 records happen not to be very interesting, but that does not matter for now.

#### Exercise

Obtain `startYear` and `endYear` from the `TitleBasics` table.

In [None]:
%%read_sql
# Your SQL query here;

#### DISTINCT clause
The `DISTINCT` keyword eliminates duplicate rows from the results.


In [None]:
%%read_sql
SELECT DISTINCT titleType 
FROM TitleBasics
;
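
As a rough illustration of what `DISTINCT` does, the sketch below compares a plain `SELECT` with `SELECT DISTINCT` on a tiny table. It uses Python's built-in `sqlite3` module in place of MySQL, and the rows are placeholders chosen for the example (both are assumptions, not part of the course database).

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TitleBasics (tconst TEXT, titleType TEXT)")

# Placeholder rows with repeated titleType values.
conn.executemany(
    "INSERT INTO TitleBasics VALUES (?, ?)",
    [
        ("id01", "movie"),
        ("id02", "short"),
        ("id03", "movie"),
        ("id04", "tvSeries"),
        ("id05", "short"),
    ],
)

# Plain SELECT returns one value per row, duplicates included.
with_duplicates = [
    row[0] for row in conn.execute("SELECT titleType FROM TitleBasics")
]

# SELECT DISTINCT collapses repeated values into one row each.
distinct_types = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT titleType FROM TitleBasics ORDER BY titleType"
    )
]

print(len(with_duplicates), distinct_types)
```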

#### An unrestricted SELECT query returns all the data. In most cases this is not what we want (we do not need every movie!). In the following, we use the WHERE clause to restrict the query.

In [None]:
%%read_sql
# This query returns every record whose original title is Frozen.
SELECT *  
FROM TitleBasics
WHERE originalTitle = 'Frozen'
;

We find that there are many movies and TV titles named `Frozen`. Let us add more conditions to the query.

In [None]:
%%read_sql
# This query returns the records titled Frozen with startYear 2013.
SELECT *  
FROM TitleBasics
WHERE originalTitle = 'Frozen'
AND startYear = 2013
;

In [None]:
%%read_sql
# Let us restrict our interest to movies.
# This query returns the Disney film Frozen.
SELECT *  
FROM TitleBasics
WHERE originalTitle = 'Frozen'
AND startYear = 2013
AND titleType = 'movie'
;
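
The narrowing effect of each added `AND` condition can be sketched on a toy table. The example below assumes Python's built-in `sqlite3` module as a stand-in for MySQL, and the rows are placeholders invented for the illustration. The helper `count` is a hypothetical name introduced here.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TitleBasics "
    "(originalTitle TEXT, startYear INTEGER, titleType TEXT)"
)

# Placeholder rows, invented for this sketch.
conn.executemany(
    "INSERT INTO TitleBasics VALUES (?, ?, ?)",
    [
        ("Frozen", 2010, "movie"),
        ("Frozen", 2013, "movie"),
        ("Frozen", 2013, "short"),
        ("Another Title", 2013, "movie"),
    ],
)


def count(where_clause):
    """Return how many sample rows satisfy the given WHERE clause."""
    sql = f"SELECT COUNT(*) FROM TitleBasics WHERE {where_clause}"
    return conn.execute(sql).fetchone()[0]


# Each extra AND condition can only keep or reduce the matching rows.
by_title = count("originalTitle = 'Frozen'")
by_title_year = count("originalTitle = 'Frozen' AND startYear = 2013")
by_title_year_type = count(
    "originalTitle = 'Frozen' AND startYear = 2013 AND titleType = 'movie'"
)

print(by_title, by_title_year, by_title_year_type)
```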

#### Exercise
Find all the movies whose `runtimeMinutes` exceeds one day (24 x 60 minutes). **Super-long** movies! Use the comparison operator **`>`**.

In [None]:
%%read_sql
# Your SQL query here.

Find all the information about movies released (`startYear`) between 1895 and 1898.

In [None]:
%%read_sql
# Your SQL query here.

Find all the information about *TVSeries* released (or starting) in 2021 in the *Action* genre.


In [None]:
%%read_sql
# Your SQL query here.

#### Using LIKE: Now we are going to find `Frozen 2`, which was released in 2019 as a sequel to Disney's Frozen. I am not sure whether it is stored as `Frozen 2` or `Frozen II`, or whether there are extra spaces. A `LIKE` query solves this problem.

In [None]:
%%read_sql
# This query returns titles that begin with Frozen, followed by anything.
# The wildcard matches any sequence of characters; it is written doubled (%%) in this notebook.
SELECT *  
FROM TitleBasics
WHERE originalTitle LIKE 'Frozen%%'
AND startYear = 2019
AND titleType = 'movie'
;
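
To make the wildcard behaviour concrete, here is a small sketch using Python's built-in `sqlite3` module as a stand-in for MySQL (an assumption for illustration). In this standalone setting the wildcard is written as a single `%`; the doubled form in the notebook cells is presumably an escape needed by the notebook's database driver. The rows are placeholders invented for the example.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TitleBasics (originalTitle TEXT)")

# Placeholder titles, invented for this sketch.
conn.executemany(
    "INSERT INTO TitleBasics VALUES (?)",
    [("Frozen",), ("Frozen II",), ("Frozen 2",), ("The Frozen Ground",)],
)

# 'Frozen%' matches titles that START with Frozen (including Frozen itself).
starts_with = sorted(
    row[0]
    for row in conn.execute(
        "SELECT originalTitle FROM TitleBasics WHERE originalTitle LIKE 'Frozen%'"
    )
)

# '%Frozen%' matches titles that CONTAIN Frozen anywhere.
contains = sorted(
    row[0]
    for row in conn.execute(
        "SELECT originalTitle FROM TitleBasics WHERE originalTitle LIKE '%Frozen%'"
    )
)

print(starts_with)
print(contains)
```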

Confirm the information that we obtained above with the following URL 

> https://www.imdb.com/title/"tconst"

where "tconst" is replaced by the tconst of the movie.




#### The NULL value 

When a column has no value for a record, it is assigned `NULL`, the special marker SQL uses for a missing value. Check whether each column contains `NULL` values.


In [None]:
%%read_sql
# The following lists all the records with no endYear. NULL is tested with IS rather than =.
# You can also confirm that endYear = NULL does not work.
SELECT * 
FROM TitleBasics
WHERE endYear is NULL
;
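
The difference between `= NULL` and `IS NULL` can be checked on a toy table. The sketch below assumes Python's built-in `sqlite3` module in place of MySQL (both treat a comparison with `NULL` as unknown, so it never matches), and the rows are placeholders invented for the illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TitleBasics "
    "(primaryTitle TEXT, startYear INTEGER, endYear INTEGER)"
)

# Placeholder rows; Python's None is stored as SQL NULL.
conn.executemany(
    "INSERT INTO TitleBasics VALUES (?, ?, ?)",
    [
        ("Sample film", 2013, None),
        ("Sample series", 2005, 2010),
        ("Another film", 2020, None),
    ],
)

# Comparing with = NULL yields "unknown" for every row, so nothing matches.
equals_null = conn.execute(
    "SELECT * FROM TitleBasics WHERE endYear = NULL"
).fetchall()

# IS NULL and IS NOT NULL are the correct tests for missing values.
is_null = conn.execute(
    "SELECT * FROM TitleBasics WHERE endYear IS NULL"
).fetchall()
is_not_null = conn.execute(
    "SELECT * FROM TitleBasics WHERE endYear IS NOT NULL"
).fetchall()

print(len(equals_null), len(is_null), len(is_not_null))
```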

#### ORDER BY

We can sort the records using the **ORDER BY** clause.

In [None]:
%%read_sql
# The following lists all the movies from 2020.
SELECT * 
FROM TitleBasics
WHERE startYear = 2020
AND titleType = 'movie'
;

In [None]:
%%read_sql
# The following lists all the movies from 2020, sorted by primaryTitle.
SELECT * 
FROM TitleBasics
WHERE startYear = 2020
AND titleType = 'movie'
ORDER BY primaryTitle
;

In [None]:
%%read_sql
# We can restrict our interest to the movies whose title starts with Bad.
SELECT * 
FROM TitleBasics
WHERE startYear = 2020
AND titleType = 'movie'
AND primaryTitle LIKE 'Bad%%'
ORDER BY primaryTitle
;
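
As a small illustration of sorting, the sketch below orders a toy table in ascending (the default) and descending order. It assumes Python's built-in `sqlite3` module as a stand-in for MySQL, and the titles are placeholders invented for the example.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TitleBasics (primaryTitle TEXT, startYear INTEGER)")

# Placeholder rows, deliberately inserted out of alphabetical order.
conn.executemany(
    "INSERT INTO TitleBasics VALUES (?, ?)",
    [("Cherry", 2020), ("Apple", 2020), ("Banana", 2020)],
)

# ORDER BY sorts ascending unless told otherwise.
ascending = [
    row[0]
    for row in conn.execute(
        "SELECT primaryTitle FROM TitleBasics ORDER BY primaryTitle"
    )
]

# Adding DESC reverses the sort order.
descending = [
    row[0]
    for row in conn.execute(
        "SELECT primaryTitle FROM TitleBasics ORDER BY primaryTitle DESC"
    )
]

print(ascending)
print(descending)
```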

#### IN clause

Fetch all the information for the movies whose title is **in** ("X-Men", "Spider-Man", "Captain Marvel"). Use the `IN` clause.

In [None]:
%%read_sql
# This query returns the movies whose title is one of the three listed.
SELECT * 
FROM TitleBasics
WHERE 
titleType = 'movie'
AND primaryTitle in ("X-Men", "Spider-Man", "Captain Marvel")
ORDER BY primaryTitle
;
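
`IN` is shorthand for a chain of `OR` comparisons on the same column. The sketch below checks that equivalence on a toy table, assuming Python's built-in `sqlite3` module as a stand-in for MySQL. The rows are sample data invented for the illustration, and string literals use single quotes here because SQLite reserves double quotes for identifiers.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TitleBasics (primaryTitle TEXT, titleType TEXT)")

# Sample rows, invented for this sketch.
conn.executemany(
    "INSERT INTO TitleBasics VALUES (?, ?)",
    [
        ("X-Men", "movie"),
        ("Spider-Man", "movie"),
        ("Captain Marvel", "movie"),
        ("Iron Man", "movie"),
        ("Spider-Man", "tvSeries"),
    ],
)

# Membership test with IN.
in_rows = conn.execute(
    "SELECT primaryTitle FROM TitleBasics "
    "WHERE titleType = 'movie' "
    "AND primaryTitle IN ('X-Men', 'Spider-Man', 'Captain Marvel') "
    "ORDER BY primaryTitle"
).fetchall()

# The same condition spelled out with OR; the parentheses are required.
or_rows = conn.execute(
    "SELECT primaryTitle FROM TitleBasics "
    "WHERE titleType = 'movie' "
    "AND (primaryTitle = 'X-Men' OR primaryTitle = 'Spider-Man' "
    "OR primaryTitle = 'Captain Marvel') "
    "ORDER BY primaryTitle"
).fetchall()

print([row[0] for row in in_rows])
```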

#### AS clause

Sometimes we want to rename a column to have a more descriptive name in the results.

In [None]:
%%read_sql
SELECT primaryTitle as Title
FROM TitleBasics
WHERE 
titleType = 'movie'
AND primaryTitle in ("X-Men", "Spider-Man", "Captain Marvel")
ORDER BY primaryTitle
;
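
One way to see what `AS` changes is to inspect the column names of the result. The sketch below does this with Python's built-in `sqlite3` module, used here as a stand-in for MySQL (an assumption for illustration), on a single sample row.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TitleBasics (primaryTitle TEXT)")
conn.execute("INSERT INTO TitleBasics VALUES ('X-Men')")

# Without AS, the result column keeps the table's column name.
plain_cursor = conn.execute("SELECT primaryTitle FROM TitleBasics")
plain_name = plain_cursor.description[0][0]

# With AS, the result column carries the alias instead.
alias_cursor = conn.execute("SELECT primaryTitle AS Title FROM TitleBasics")
alias_name = alias_cursor.description[0][0]

# The data itself is unchanged; only the column label differs.
plain_rows = plain_cursor.fetchall()
alias_rows = alias_cursor.fetchall()

print(plain_name, alias_name)
```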

#### Exercise

Find your **favorite** movie (or TV series) by using **SELECT** query with **WHERE** clause and (possibly) **LIKE** and **IN**.

In [None]:
%%read_sql
# Your query here.

#### Exercise

Find your **favorite** actor, actress, or writer (or other people). They are in the `NameBasics` table.

In [None]:
%%read_sql
# Show the schema of the NameBasics table.
DESCRIBE NameBasics;

In [None]:
%%read_sql
# Your SQL query here.

#### Exercise

Find the Oscar-winning movie CODA (2021).

In [None]:
%%read_sql
# Your SQL query here.