# Gather Section (20 Questions, 40 Marks)

You should have access to a file called TMDB.db. The first step to answering this set of questions will be to connect to this db file to access the data. 

![SQL Architectures](https://raw.githubusercontent.com/Explore-AI/Public-Data/master/image/TMDB_ERD.JPG)

Before we start, we need to load the SQL magic commands (we only need to do this once per notebook):

In [2]:
%load_ext sql

*Note: we have to prepend a Jupyter notebook cell with `%%sql` in order to run a SQL query. Place your code in the '# Your code here' lines. If you experience trouble connecting to the .db file, please ensure that you have the `pymysql` and `ipython-sql` packages installed. Also ensure that `sqlalchemy` is pinned at a version <2.*


Based on that data, answer the following questions:


### Question 11
What is the code you can use to connect to the TMDB database that is saved in the same location as the Jupyter notebook?


In [3]:
%%sql 
sqlite:///Dataset/TMDB.db

### Question 12
What is the primary key for the table “movies”?



### Question 13
How many foreign keys does the “languagemap” table have?
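
Questions 12 and 13 can be checked against the schema rather than read off the diagram. The sketch below shows one way to do that with SQLite's `PRAGMA table_info` and `PRAGMA foreign_key_list`. It runs against a small in-memory stand-in schema that is an assumption made for illustration; the table definitions in the real TMDB.db may differ, so the counts printed here are not the answer for the real file. The helper names `primary_key_columns` and `foreign_keys` are hypothetical.

```python
import sqlite3

# Hypothetical stand-in schema; the real TMDB.db definitions may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (movie_id INTEGER NOT NULL PRIMARY KEY, title TEXT);
    CREATE TABLE languages (iso_639_1 TEXT PRIMARY KEY, language_name TEXT);
    CREATE TABLE languagemap (
        movie_id INTEGER REFERENCES movies (movie_id),
        iso_639_1 TEXT REFERENCES languages (iso_639_1)
    );
""")

def primary_key_columns(conn, table):
    # table_info rows are (cid, name, type, notnull, dflt_value, pk);
    # pk is 0 for non-key columns and 1, 2, ... for key columns.
    # The table name is interpolated, so only pass trusted names.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    key_rows = sorted((row for row in rows if row[5] > 0), key=lambda r: r[5])
    return [row[1] for row in key_rows]

def foreign_keys(conn, table):
    # foreign_key_list rows are (id, seq, table, from, to, on_update, on_delete, match).
    rows = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    return sorted((row[3], row[2], row[4]) for row in rows)

pk = primary_key_columns(conn, "movies")
fks = foreign_keys(conn, "languagemap")
print(pk)
print(fks)
```

In the notebook itself, the same pragmas can be run directly in a `%%sql` cell against the connected database.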



### Question 14
What code would you use to set up a view of all movies that did not get released?


In [8]:
%%sql 
CREATE VIEW Not_Released AS SELECT * FROM movies WHERE release_status <> 'Released';

 * sqlite:///Dataset/TMDB.db
Done.


[]

### Question 15
How would you select only the title, release date, and release status columns from the view you created in the previous question?


In [9]:
%%sql 
SELECT title, release_date, release_status FROM Not_Released;

 * sqlite:///Dataset/TMDB.db
Done.


title,release_date,release_status
Little Big Top,2006-01-01 00:00:00.000000,Rumored
The Helix... Loaded,2005-01-01 00:00:00.000000,Rumored
Higher Ground,2011-08-26 00:00:00.000000,Post Production
Crying with Laughter,2009-06-01 00:00:00.000000,Rumored
The Harvest (La Cosecha),2011-07-29 00:00:00.000000,Rumored
The Naked Ape,2006-09-16 00:00:00.000000,Rumored
Brotherly Love,2015-04-24 00:00:00.000000,Post Production
Dancin' It's On,2015-10-16 00:00:00.000000,Post Production
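
A view is a stored query, not a copy of the data, so it reflects rows added to `movies` after the view was created. The following is a minimal sketch of that behaviour using Python's `sqlite3` module and an in-memory table; the three-column-plus-id table and the "Toy Film" rows are made up for illustration and are not taken from TMDB.db.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, title TEXT, "
    "release_date TEXT, release_status TEXT)"
)
# Made-up rows, for illustration only.
conn.executemany("INSERT INTO movies VALUES (?, ?, ?, ?)", [
    (1, "Toy Film A", "2001-01-01", "Released"),
    (2, "Toy Film B", "2002-02-02", "Rumored"),
    (3, "Toy Film C", "2003-03-03", "Post Production"),
])

# IF NOT EXISTS keeps the statement safe to re-run in a notebook.
conn.execute(
    "CREATE VIEW IF NOT EXISTS Not_Released AS "
    "SELECT * FROM movies WHERE release_status <> 'Released'"
)

view_rows = conn.execute(
    "SELECT title, release_date, release_status FROM Not_Released ORDER BY movie_id"
).fetchall()

# The view re-evaluates its query, so a new unreleased movie shows up at once.
conn.execute("INSERT INTO movies VALUES (4, 'Toy Film D', '2004-04-04', 'Rumored')")
count_after = conn.execute("SELECT COUNT(*) FROM Not_Released").fetchone()[0]
print(view_rows, count_after)
```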


### Question 16
How many movies are no longer using their original titles?



In [14]:
%%sql 
SELECT movie_id, title, release_date, budget, homepage,	original_language, original_title, 
    popularity, revenue, runtime, release_status, vote_average, vote_count -- removed tagline and overview
FROM movies LIMIT 3;

 * sqlite:///Dataset/TMDB.db
Done.


movie_id,title,release_date,budget,homepage,original_language,original_title,popularity,revenue,runtime,release_status,vote_average,vote_count
5,Four Rooms,1995-12-09 00:00:00.000000,4000000,,en,Four Rooms,22.87623,4300000.0,98.0,Released,6.5,530
11,Star Wars,1977-05-25 00:00:00.000000,11000000,http://www.starwars.com/films/star-wars-episode-iv-a-new-hope,en,Star Wars,126.393695,775398007.0,121.0,Released,8.1,6624
12,Finding Nemo,2003-05-30 00:00:00.000000,94000000,http://movies.disney.com/finding-nemo,en,Finding Nemo,85.688789,940335536.0,100.0,Released,7.6,6122


In [16]:
%%sql

WITH different_titles AS (
    SELECT movie_id, title, original_title
    FROM movies
    WHERE movies.title <> movies.original_title
)

SELECT COUNT(*) as 'number of movies with different titles' FROM different_titles;

 * sqlite:///Dataset/TMDB.db
Done.


number of movies with different titles
261
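
One detail worth keeping in mind with `title <> original_title`: if either column is NULL the comparison evaluates to NULL, so that row is not counted. Whether such rows should count is a judgement call. The sketch below, on made-up rows, contrasts `<>` with SQLite's NULL-safe `IS NOT` operator; it is an illustration of the operator behaviour, not a claim about how many NULLs the real table holds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, title TEXT, original_title TEXT)"
)
# Made-up rows: same title, changed title, and a missing original title.
conn.executemany("INSERT INTO movies VALUES (?, ?, ?)", [
    (1, "Toy A", "Toy A"),
    (2, "Toy B", "Juguete B"),
    (3, "Toy C", None),
])

# '<>' yields NULL when original_title is NULL, so row 3 is filtered out.
strict_count = conn.execute(
    "SELECT COUNT(*) FROM movies WHERE title <> original_title"
).fetchone()[0]

# 'IS NOT' treats NULL as an ordinary value, so row 3 is counted as different.
null_safe_count = conn.execute(
    "SELECT COUNT(*) FROM movies WHERE title IS NOT original_title"
).fetchone()[0]

print(strict_count, null_safe_count)
```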


### Question 17
What is the most popular movie that was made after 01/01/2000 with a budget of more than $100 000 000? (Hint: Use the popularity field in the Movies table. Larger numbers are more popular.)


In [20]:
%%sql 

SELECT movie_id, title, release_date, budget, popularity FROM movies WHERE release_date > '2000-01-01' AND budget > 100000000 ORDER BY popularity DESC LIMIT 10;

 * sqlite:///Dataset/TMDB.db
Done.


movie_id,title,release_date,budget,popularity
157336,Interstellar,2014-11-05 00:00:00.000000,165000000,724.247784
118340,Guardians of the Galaxy,2014-07-30 00:00:00.000000,170000000,481.098624
76341,Mad Max: Fury Road,2015-05-13 00:00:00.000000,150000000,434.278564
135397,Jurassic World,2015-06-09 00:00:00.000000,150000000,418.708552
22,Pirates of the Caribbean: The Curse of the Black Pearl,2003-07-09 00:00:00.000000,140000000,271.972889
119450,Dawn of the Planet of the Apes,2014-06-26 00:00:00.000000,170000000,243.791743
131631,The Hunger Games: Mockingjay - Part 1,2014-11-18 00:00:00.000000,125000000,206.227151
177572,Big Hero 6,2014-10-24 00:00:00.000000,165000000,203.73459
87101,Terminator Genisys,2015-06-23 00:00:00.000000,155000000,202.042635
271110,Captain America: Civil War,2016-04-27 00:00:00.000000,250000000,198.372395
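
The `release_date` values are stored as text such as `2014-11-05 00:00:00.000000`, so SQLite compares them character by character. A date literal therefore only filters correctly when it is written in the same year-month-day order. The sketch below, on two made-up rows, compares an ISO-style literal with a day-first literal; the helper name `titles_after` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, release_date TEXT, budget INTEGER)")
# Made-up rows using the same timestamp layout as the notebook output.
conn.executemany("INSERT INTO movies VALUES (?, ?, ?)", [
    ("Old Film", "1995-12-09 00:00:00.000000", 150000000),
    ("New Film", "2014-11-05 00:00:00.000000", 165000000),
])

def titles_after(literal):
    # Text comparison: the literal is compared to release_date as a string.
    rows = conn.execute(
        "SELECT title FROM movies "
        "WHERE release_date > ? AND budget > 100000000 ORDER BY title",
        (literal,),
    ).fetchall()
    return [title for (title,) in rows]

iso_filtered = titles_after("2000-01-01")   # year-first: filters as intended
dmy_filtered = titles_after("01-01-2000")   # day-first: '1995...' > '01-...' as text
print(iso_filtered, dmy_filtered)
```

With the day-first literal every stored date sorts after the string `01-01-2000`, so the 1995 film slips through.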


### Question 18
How many movies are there that do not have English as their original language? 



In [22]:
%%sql 
SELECT COUNT(*) as 'movies without English as original language' from movies WHERE original_language <> 'en'

 * sqlite:///Dataset/TMDB.db
Done.


movies without English as original language
298


### Question 19
How many movies in the database were produced by Pixar Animation Studios?



In [33]:
%%sql 

PRAGMA table_info(movies);

 * sqlite:///Dataset/TMDB.db
Done.


cid,name,type,notnull,dflt_value,pk
0,movie_id,INTEGER,1,,1
1,title,varchar(500),0,,0
2,release_date,datetime(6),0,,0
3,budget,INTEGER,0,,0
4,homepage,varchar(500),0,,0
5,original_language,varchar(50),0,,0
6,original_title,varchar(500),0,,0
7,overview,varchar(5000),0,,0
8,popularity,double,0,,0
9,revenue,double,0,,0


In [38]:
%%sql
SELECT sql FROM sqlite_master WHERE type='table' AND name='movies';

 * sqlite:///Dataset/TMDB.db
Done.


sql
"CREATE TABLE `movies` (  `movie_id` integer NOT NULL , `title` varchar(500) DEFAULT NULL , `release_date` datetime(6) DEFAULT NULL , `budget` integer DEFAULT NULL , `homepage` varchar(500) DEFAULT NULL , `original_language` varchar(50) DEFAULT NULL , `original_title` varchar(500) DEFAULT NULL , `overview` varchar(5000) DEFAULT NULL , `popularity` double DEFAULT NULL , `revenue` double DEFAULT NULL , `runtime` double DEFAULT NULL , `release_status` varchar(50) DEFAULT NULL , `tagline` varchar(500) DEFAULT NULL , `vote_average` double DEFAULT NULL , `vote_count` integer DEFAULT NULL , PRIMARY KEY (`movie_id`) )"


In [37]:
%%sql
SELECT name FROM sqlite_master WHERE type='table';

 * sqlite:///Dataset/TMDB.db
Done.


name
actors
casts
genremap
genres
keywordmap
keywords
languagemap
languages
movies
oscars


In [41]:
%%sql
SELECT m.movie_id, m.title, pc.production_company_name as company_name from movies as m
LEFT JOIN productioncompanymap as pcm ON pcm.movie_id = m.movie_id
LEFT JOIN productioncompanies as pc ON pc.production_company_id = pcm.production_company_id
WHERE company_name = 'Pixar Animation Studios'
LIMIT 3;

 * sqlite:///Dataset/TMDB.db
Done.


movie_id,title,company_name
12,Finding Nemo,Pixar Animation Studios
585,"Monsters, Inc.",Pixar Animation Studios
862,Toy Story,Pixar Animation Studios


In [42]:
%%sql
SELECT COUNT(*) as 'number of movies by Pixar Animations' from movies as m
LEFT JOIN productioncompanymap as pcm ON pcm.movie_id = m.movie_id
LEFT JOIN productioncompanies as pc ON pc.production_company_id = pcm.production_company_id
WHERE pc.production_company_name = 'Pixar Animation Studios';

 * sqlite:///Dataset/TMDB.db
Done.


number of movies by Pixar Animations
16


### Question 20
How many movies are in the database that are both a Romance and a Comedy?


In [78]:
%%sql 
--SELECT COUNT(*) as 'Movies that are both Romance and Comedy' from movies as m
SELECT m.movie_id, m.title, g.genre_name from movies as m
LEFT JOIN genremap as gm ON gm.movie_id = m.movie_id
LEFT JOIN genres as g ON g.genre_id = gm.genre_id
LEFT JOIN genremap as gm1 ON gm1.movie_id = m.movie_id
LEFT JOIN genres as g1 ON g1.genre_id = gm.genre_id
WHERE g.genre_name = "Romance" AND
WHERE g1.genre_name = "Comedy"
LIMIT 3;

 * sqlite:///Dataset/TMDB.db
(sqlite3.OperationalError) near "WHERE": syntax error
[SQL: --SELECT COUNT(*) as 'Movies that are both Romance and Comedy' from movies as m
SELECT m.movie_id, m.title, g.genre_name from movies as m
LEFT JOIN genremap as gm ON gm.movie_id = m.movie_id
LEFT JOIN genres as g ON g.genre_id = gm.genre_id
LEFT JOIN genremap as gm1 ON gm1.movie_id = m.movie_id
LEFT JOIN genres as g1 ON g1.genre_id = gm.genre_id
WHERE g.genre_name = "Romance" AND
WHERE g1.genre_name = "Comedy"
LIMIT 3;]
(Background on this error at: https://sqlalche.me/e/20/e3q8)


In [79]:
%%sql
-- Count the movies that are tagged as both 'Romance' and 'Comedy'
SELECT COUNT(*)  AS rom_com_movies
FROM (
    -- Select movie_id from genremap table as gm1
    SELECT gm1.movie_id
    FROM genremap gm1
    -- Join genres table as g1 to filter movies with genre "Romance"
    JOIN genres g1 ON gm1.genre_id = g1.genre_id
    -- Join genremap table again as gm2 to find the same movies with another genre
    JOIN genremap gm2 ON gm1.movie_id = gm2.movie_id
    -- Join genres table as g2 to filter movies with genre "Comedy"
    JOIN genres g2 ON gm2.genre_id = g2.genre_id
    -- Filter to keep only "Romance" and "Comedy" genres
    WHERE g1.genre_name = 'Romance' AND g2.genre_name = 'Comedy'
);


 * sqlite:///Dataset/TMDB.db
Done.


rom_com_movies
484
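
The same question can also be phrased without a self-join, by grouping the genre map per movie and keeping the movies that carry both genre names. This is an alternative formulation, not the one used in the cell above. The sketch below runs both versions on a made-up in-memory dataset (the movie ids and the Romance id are invented) to show that they agree there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE genres (genre_id INTEGER PRIMARY KEY, genre_name TEXT);
    CREATE TABLE genremap (movie_id INTEGER, genre_id INTEGER);
    -- Made-up ids and mappings, for illustration only.
    INSERT INTO genres VALUES (1, 'Romance'), (35, 'Comedy'), (28, 'Action');
    INSERT INTO genremap VALUES
        (101, 1), (101, 35),            -- romance + comedy
        (102, 1),                       -- romance only
        (103, 35), (103, 28),           -- comedy + action
        (104, 1), (104, 35), (104, 28); -- romance + comedy + action
""")

# Self-join version, as in the notebook cell.
self_join_count = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT gm1.movie_id
        FROM genremap gm1
        JOIN genres g1 ON gm1.genre_id = g1.genre_id
        JOIN genremap gm2 ON gm1.movie_id = gm2.movie_id
        JOIN genres g2 ON gm2.genre_id = g2.genre_id
        WHERE g1.genre_name = 'Romance' AND g2.genre_name = 'Comedy'
    )
""").fetchone()[0]

# GROUP BY / HAVING version: a movie qualifies when it has both names.
group_by_count = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT gm.movie_id
        FROM genremap gm
        JOIN genres g ON g.genre_id = gm.genre_id
        WHERE g.genre_name IN ('Romance', 'Comedy')
        GROUP BY gm.movie_id
        HAVING COUNT(DISTINCT g.genre_name) = 2
    )
""").fetchone()[0]

print(self_join_count, group_by_count)
```

The grouped version is not inflated by duplicate rows in `genremap`, whereas the self-join would count a movie once per duplicate pair.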


### Question 21
What is the most popular action movie that has some German in it? (Hint: The German word for German is Deutsch)

In [86]:
%%sql 
SELECT m.title, m.popularity, l.language_name from movies as m
INNER JOIN languagemap as lm ON lm.movie_id = m.movie_id
INNER JOIN languages as l ON l.iso_639_1 = lm.iso_639_1
WHERE l.language_name = 'Deutsch'
ORDER BY m.popularity DESC
LIMIT 3

 * sqlite:///Dataset/TMDB.db
Done.


title,popularity,language_name
Captain America: Civil War,198.372395,Deutsch
Mission: Impossible - Rogue Nation,114.522237,Deutsch
The Fifth Element,109.528572,Deutsch
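
The question asks for an action movie, so a stricter version of the query would also join through `genremap` and `genres` and filter on the genre name. The sketch below shows that extra join on a made-up in-memory dataset; the titles and popularity values are invented, and it makes no claim about what the stricter query returns on the real TMDB.db.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, title TEXT, popularity REAL);
    CREATE TABLE languages (iso_639_1 TEXT PRIMARY KEY, language_name TEXT);
    CREATE TABLE languagemap (movie_id INTEGER, iso_639_1 TEXT);
    CREATE TABLE genres (genre_id INTEGER PRIMARY KEY, genre_name TEXT);
    CREATE TABLE genremap (movie_id INTEGER, genre_id INTEGER);
    -- Made-up rows, for illustration only.
    INSERT INTO movies VALUES
        (1, 'Toy Drama', 300.0),
        (2, 'Toy Action', 200.0),
        (3, 'Toy Action No German', 400.0);
    INSERT INTO languages VALUES ('de', 'Deutsch'), ('en', 'English');
    INSERT INTO languagemap VALUES (1, 'de'), (2, 'de'), (3, 'en');
    INSERT INTO genres VALUES (18, 'Drama'), (28, 'Action');
    INSERT INTO genremap VALUES (1, 18), (2, 28), (3, 28);
""")

# Language filter only: the most popular German-language movie of any genre.
language_only = conn.execute("""
    SELECT m.title
    FROM movies AS m
    INNER JOIN languagemap AS lm ON lm.movie_id = m.movie_id
    INNER JOIN languages AS l ON l.iso_639_1 = lm.iso_639_1
    WHERE l.language_name = 'Deutsch'
    ORDER BY m.popularity DESC
    LIMIT 1
""").fetchone()[0]

# Language and genre filters: restricts the result to action movies.
language_and_genre = conn.execute("""
    SELECT m.title
    FROM movies AS m
    INNER JOIN languagemap AS lm ON lm.movie_id = m.movie_id
    INNER JOIN languages AS l ON l.iso_639_1 = lm.iso_639_1
    INNER JOIN genremap AS gm ON gm.movie_id = m.movie_id
    INNER JOIN genres AS g ON g.genre_id = gm.genre_id
    WHERE l.language_name = 'Deutsch' AND g.genre_name = 'Action'
    ORDER BY m.popularity DESC
    LIMIT 1
""").fetchone()[0]

print(language_only, language_and_genre)
```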


### Question 22
In how many movies did Tom Cruise portray the character Ethan Hunt?



In [92]:
%%sql 
SELECT c.characters, a.actor_name, m.title FROM movies as m
INNER JOIN casts as c ON c.movie_id = m.movie_id
INNER JOIN actors as a ON a.actor_id = c.actor_id
WHERE a.actor_name = 'Tom Cruise' AND c.characters = 'Ethan Hunt'
LIMIT 10;

 * sqlite:///Dataset/TMDB.db
Done.


characters,actor_name,title
Ethan Hunt,Tom Cruise,Mission: Impossible
Ethan Hunt,Tom Cruise,Mission: Impossible II
Ethan Hunt,Tom Cruise,Mission: Impossible III
Ethan Hunt,Tom Cruise,Mission: Impossible - Ghost Protocol
Ethan Hunt,Tom Cruise,Mission: Impossible - Rogue Nation


### Question 23 
How many times was the actress Cate Blanchett nominated for an Oscar?

In [93]:
%%sql 
SELECT * from oscars as o
WHERE o.name = 'Cate Blanchett'
LIMIT 10;

 * sqlite:///Dataset/TMDB.db
Done.


year,award,winner,name,film
1998,Actress in a Leading Role,,Cate Blanchett,Elizabeth
2004,Actress in a Supporting Role,1.0,Cate Blanchett,The Aviator
2006,Actress in a Supporting Role,,Cate Blanchett,Notes on a Scandal
2007,Actress in a Leading Role,,Cate Blanchett,Elizabeth: The Golden Age
2007,Actress in a Supporting Role,,Cate Blanchett,I'm Not There
2013,Actress in a Leading Role,1.0,Cate Blanchett,Blue Jasmine
2015,Actress in a Leading Role,,Cate Blanchett,Carol
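
Since the question asks "how many", the count can be produced by the query instead of by counting rows by eye. In the output above, `winner` is `1.0` for a win and empty otherwise, which suggests NULL for non-winning nominations; under that assumption `COUNT(*)` gives nominations and `COUNT(winner)` gives wins. The sketch below uses a made-up `oscars` table with the same column names and invented people and films.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE oscars (year INTEGER, award TEXT, winner REAL, name TEXT, film TEXT)"
)
# Made-up nominations; winner is NULL unless the nomination won (an assumption).
conn.executemany("INSERT INTO oscars VALUES (?, ?, ?, ?, ?)", [
    (2001, "Actress in a Leading Role", None, "Toy Person", "Toy Film A"),
    (2002, "Actress in a Supporting Role", 1.0, "Toy Person", "Toy Film B"),
    (2003, "Actress in a Leading Role", None, "Other Person", "Toy Film C"),
])

# COUNT(*) counts every matching row; COUNT(winner) skips NULLs.
nominations, wins = conn.execute(
    "SELECT COUNT(*), COUNT(winner) FROM oscars WHERE name = ?",
    ("Toy Person",),
).fetchone()

print(nominations, wins)
```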


### Question 24
How many movies contain at least one of the official South African languages Afrikaans or Zulu?

In [107]:
%%sql 

SELECT m.title, m.popularity, l.language_name from movies as m
INNER JOIN languagemap as lm ON lm.movie_id = m.movie_id
INNER JOIN languages as l ON l.iso_639_1 = lm.iso_639_1
WHERE l.language_name IN ('Afrikaans', 'isiZulu')
LIMIT 15

 * sqlite:///Dataset/TMDB.db
Done.


title,popularity,language_name
Tsotsi,2.504169,Afrikaans
Catch a Fire,4.052219,Afrikaans
Blood Diamond,52.792678,Afrikaans
District 9,63.13678,Afrikaans
Gangster's Paradise: Jerusalema,1.717376,Afrikaans
Safe House,34.773106,Afrikaans
Mandela: Long Walk to Freedom,15.50957,Afrikaans
Tsotsi,2.504169,isiZulu
Catch a Fire,4.052219,isiZulu
District 9,63.13678,isiZulu


In [105]:
%%sql

SELECT COUNT(DISTINCT m.movie_id) as movie_count
FROM movies as m
INNER JOIN languagemap as lm ON lm.movie_id = m.movie_id
INNER JOIN languages as l ON l.iso_639_1 = lm.iso_639_1
WHERE l.language_name IN ('Afrikaans', 'isiZulu');

 * sqlite:///Dataset/TMDB.db
Done.


movie_count
8
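
`COUNT(DISTINCT m.movie_id)` matters here because a movie that contains both languages produces one joined row per language. A minimal sketch on made-up data, comparing the plain row count with the distinct movie count:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE languages (iso_639_1 TEXT PRIMARY KEY, language_name TEXT);
    CREATE TABLE languagemap (movie_id INTEGER, iso_639_1 TEXT);
    -- Made-up rows: movie 1 has both languages, movie 2 has one, movie 3 has neither.
    INSERT INTO movies VALUES (1, 'Toy Film A'), (2, 'Toy Film B'), (3, 'Toy Film C');
    INSERT INTO languages VALUES ('af', 'Afrikaans'), ('zu', 'isiZulu'), ('en', 'English');
    INSERT INTO languagemap VALUES (1, 'af'), (1, 'zu'), (2, 'af'), (3, 'en');
""")

row_count, movie_count = conn.execute("""
    SELECT COUNT(*), COUNT(DISTINCT m.movie_id)
    FROM movies AS m
    INNER JOIN languagemap AS lm ON lm.movie_id = m.movie_id
    INNER JOIN languages AS l ON l.iso_639_1 = lm.iso_639_1
    WHERE l.language_name IN ('Afrikaans', 'isiZulu')
""").fetchone()

# row_count includes movie 1 twice; movie_count counts it once.
print(row_count, movie_count)
```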


### Question 25
Which of the movies mentioned above is the most popular?


In [114]:
%%sql
SELECT DISTINCT m.title, l.language_name, m.popularity
FROM movies AS m
INNER JOIN LanguageMap AS lm ON m.movie_id = lm.movie_id
INNER JOIN Languages AS l ON lm.iso_639_1 = l.iso_639_1
WHERE l.language_name = 'Afrikaans'
   OR l.language_name = 'isiZulu'
ORDER BY m.popularity DESC

 * sqlite:///Dataset/TMDB.db
Done.


title,language_name,popularity
District 9,Afrikaans,63.13678
District 9,isiZulu,63.13678
Blood Diamond,Afrikaans,52.792678
Safe House,Afrikaans,34.773106
Mandela: Long Walk to Freedom,Afrikaans,15.50957
Catch a Fire,Afrikaans,4.052219
Catch a Fire,isiZulu,4.052219
Tsotsi,Afrikaans,2.504169
Tsotsi,isiZulu,2.504169
Gangster's Paradise: Jerusalema,Afrikaans,1.717376


### Question 26
What would be the code to change the name of the language with the ‘zh’ iso code in the “languages” table to ‘Chinese’?


In [116]:
%%sql 
SELECT * FROM languages where iso_639_1 = 'zh'

 * sqlite:///Dataset/TMDB.db
Done.


iso_639_1,language_name
zh,???


In [118]:
%%sql
UPDATE languages SET language_name = 'Chinese' WHERE iso_639_1 = 'zh'

 * sqlite:///Dataset/TMDB.db
1 rows affected.


[]

In [119]:
%%sql 
SELECT * FROM languages where iso_639_1 = 'zh'

 * sqlite:///Dataset/TMDB.db
Done.


iso_639_1,language_name
zh,Chinese


### Question 27
What would be the code to insert a new genre called ‘Sport’ with an id of 10? 


In [120]:
%%sql 
SELECT * from genres

 * sqlite:///Dataset/TMDB.db
Done.


genre_id,genre_name
12,Adventure
14,Fantasy
16,Animation
18,Drama
27,Horror
28,Action
35,Comedy
36,History
37,Western
53,Thriller


In [121]:
%%sql
INSERT INTO genres (genre_id, genre_name) VALUES (10, 'Sport');

 * sqlite:///Dataset/TMDB.db
1 rows affected.


[]

In [122]:
%%sql 
SELECT * from genres

 * sqlite:///Dataset/TMDB.db
Done.


genre_id,genre_name
10,Sport
12,Adventure
14,Fantasy
16,Animation
18,Drama
27,Horror
28,Action
35,Comedy
36,History
37,Western


### Question 28 
You have just watched The Flintstones movie and did not find it very funny. What code would delete the entry that links The Flintstones to the Comedy genre?


In [127]:
%%sql
SELECT m.title, m.movie_id, g.genre_name, g.genre_id from movies as m
INNER JOIN genremap as gm ON gm.movie_id = m.movie_id
INNER JOIN genres as g ON g.genre_id = gm.genre_id
WHERE g.genre_name = 'Comedy' AND m.title LIKE '%Flintstones%'

 * sqlite:///Dataset/TMDB.db
Done.


title,movie_id,genre_name,genre_id
The Flintstones,888,Comedy,35
The Flintstones in Viva Rock Vegas,889,Comedy,35


In [128]:
%%sql
DELETE FROM genremap WHERE genre_id = 35 and movie_id = 888

 * sqlite:///Dataset/TMDB.db
1 rows affected.


[]
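
The delete above relies on ids looked up in the previous cell. One possible alternative is to let subqueries resolve the ids by name, which avoids hard-coding them. The sketch below tries this on a made-up in-memory copy of the three tables; the ids 888, 889, 35 and 14 mirror values shown in this notebook, but the Fantasy mapping is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE genres (genre_id INTEGER PRIMARY KEY, genre_name TEXT);
    CREATE TABLE genremap (movie_id INTEGER, genre_id INTEGER);
    INSERT INTO movies VALUES
        (888, 'The Flintstones'),
        (889, 'The Flintstones in Viva Rock Vegas');
    INSERT INTO genres VALUES (14, 'Fantasy'), (35, 'Comedy');
    -- The Fantasy link is made up, to show that other links survive.
    INSERT INTO genremap VALUES (888, 35), (888, 14), (889, 35);
""")

# Exact title match, so the sequel's Comedy link is left alone.
cursor = conn.execute("""
    DELETE FROM genremap
    WHERE movie_id = (SELECT movie_id FROM movies WHERE title = 'The Flintstones')
      AND genre_id = (SELECT genre_id FROM genres WHERE genre_name = 'Comedy')
""")
deleted = cursor.rowcount
conn.commit()

remaining = conn.execute(
    "SELECT movie_id, genre_id FROM genremap ORDER BY movie_id, genre_id"
).fetchall()
print(deleted, remaining)
```

Checking `rowcount` (or the "rows affected" line in the notebook) confirms that exactly one link was removed.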

### Question 29
What code will give the 10 most recently released movies in the database? 


In [135]:
%%sql 
SELECT title, release_date FROM movies ORDER BY release_date DESC LIMIT 10 

 * sqlite:///Dataset/TMDB.db
Done.


title,release_date
Growing Up Smith,2017-02-03 00:00:00.000000
Two Lovers and a Bear,2016-10-02 00:00:00.000000
Mr. Church,2016-09-16 00:00:00.000000
The Birth of a Nation,2016-09-09 00:00:00.000000
Kicks,2016-09-09 00:00:00.000000
Antibirth,2016-09-02 00:00:00.000000
Hands of Stone,2016-08-26 00:00:00.000000
Ben-Hur,2016-08-17 00:00:00.000000
Pete's Dragon,2016-08-10 00:00:00.000000
Suicide Squad,2016-08-02 00:00:00.000000


### Question 30
What code would you use to add a column to the languages table that could be used for the English names of the different languages?

In [137]:
%%sql 
ALTER TABLE languages ADD language_english_name varchar(50) 

 * sqlite:///Dataset/TMDB.db
Done.


[]
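
After an `ALTER TABLE ... ADD` statement, the new column exists on every row but holds NULL until it is populated. The sketch below checks this on a made-up two-row `languages` table and then fills in one English name; the row contents and the follow-up `UPDATE` are illustrative assumptions, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE languages (iso_639_1 TEXT PRIMARY KEY, language_name TEXT)")
conn.executemany("INSERT INTO languages VALUES (?, ?)", [
    ("de", "Deutsch"),
    ("zh", "Chinese"),
])

# SQLite accepts both ADD and ADD COLUMN; the keyword COLUMN is optional.
conn.execute("ALTER TABLE languages ADD COLUMN language_english_name varchar(50)")

# Column names come from the second field of each table_info row.
columns = [row[1] for row in conn.execute("PRAGMA table_info(languages)")]

# Existing rows start with NULL in the new column until they are updated.
conn.execute(
    "UPDATE languages SET language_english_name = ? WHERE iso_639_1 = ?",
    ("German", "de"),
)
rows = conn.execute(
    "SELECT iso_639_1, language_name, language_english_name "
    "FROM languages ORDER BY iso_639_1"
).fetchall()
print(columns, rows)
```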