# SQL Joins: TMDb Database
© Explore Data Science Academy

## Instructions to Students

This challenge tests what you have learned so far, with a focus on SQL join statements.

Select your answer to each multiple-choice question on Athena. The questions in this notebook are numbered to match the Athena questions, and the options for each question are included here as well.

Do not add or remove cells in this notebook. Do not edit or remove the `%%sql` magic command at the top of a cell, as it is required to run that cell.

**_Good Luck!_**

## Honour Code

I YOUR NAME, YOUR SURNAME, confirm - by submitting this document - that the solutions in this notebook are a result of my own work and that I abide by the EDSA honour code (https://drive.google.com/file/d/1QDCjGZJ8-FmJE3bZdIQNwnJyQKPhHZBn/view?usp=sharing).

Non-compliance with the honour code constitutes a material breach of contract.

## The TMDb Database

In this challenge you will explore [The Movie Database](https://www.themoviedb.org/) (TMDb), an online movie and TV show database that puts some of the most popular movies and TV shows at your fingertips. TMDb supports 39 official languages, is used daily in over 180 countries, and dates all the way back to 2008.


<img src="https://github.com/Explore-AI/Pictures/blob/master/sql_tmdb.jpg?raw=true" width=80%/>


Below is an Entity Relationship Diagram (ERD) of the TMDb database:

<img src="https://github.com/Explore-AI/Pictures/blob/master/TMDB_ER_diagram.png?raw=true" width=70%/>

As the ER diagram shows, the TMDb database consists of 12 tables containing information about movies, casts, genres and much more.

Let's get started!

## Loading the database

To start running SQL queries, first prepare your SQL environment by loading the SQL magic extension with `%load_ext sql`. Next, load the database: make sure you have downloaded the `TMDB.db` SQLite file from Athena and saved it in a known location. The connection string in the cell below is a relative path, so it expects `TMDB.db` to be in the notebook's working directory.

In [1]:
%load_ext sql

In [2]:
%%sql 

sqlite:///TMDB.db
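
SQLite creates a new, empty database when the file named in a connection does not exist, so a misplaced `TMDB.db` tends to surface only later as "no such table" errors. The following is a minimal sketch, in plain Python rather than a `%%sql` cell, of checking that a file exists and listing its tables before querying it. The helper name `list_tables` and the throwaway `demo.db` file are assumptions made for illustration and are not part of the challenge.

```python
import sqlite3
import tempfile
from pathlib import Path


def list_tables(db_path):
    """Return the table names in an existing SQLite file, or raise if it is missing."""
    path = Path(db_path)
    if not path.is_file():
        # Without this check, sqlite3.connect would create a new empty database.
        raise FileNotFoundError(f"No SQLite file at {path}")
    conn = sqlite3.connect(str(path))
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Demonstration on a throwaway file (a hypothetical stand-in for TMDB.db).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.db"
    demo_conn = sqlite3.connect(str(demo_path))
    demo_conn.execute("CREATE TABLE movies (movie_id integer PRIMARY KEY)")
    demo_conn.commit()
    demo_conn.close()

    demo_tables = list_tables(demo_path)

    try:
        list_tables(Path(tmp) / "missing.db")
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True

print(demo_tables, missing_raised)
```

Calling `list_tables("TMDB.db")` from the notebook's folder would then either list the real tables or fail immediately if the file is not where the connection string expects it.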

## Questions on SQL Join Statements 

Use the cell below each question to run your SQL queries and find the correct answer among the options provided for the multiple-choice questions on Athena.

**Question 1**

What is the primary key for the table “movies”?

**Options:** 
 - title
 - movie_key
 - film_id
 - movie_id

**Solution**

In [13]:
%%sql 

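-- Alternative: inspect the full CREATE TABLE statement
-- select * from sqlite_master where type = 'table' and name = 'movies';

-- pk is 0 for non-key columns, otherwise the column's 1-based position in the primary key
SELECT l.name FROM pragma_table_info('movies') AS l WHERE l.pk > 0;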

 * sqlite:///TMDB.db
Done.


name
movie_id


**Question 2**

How many foreign keys does the “LanguageMap” table have?

**Options:**

 - 0
 - 2
 - 3
 - 1

**Solution**
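
No query is recorded for this question. One possible approach in SQLite is the `pragma_foreign_key_list` table-valued function, which returns one row per foreign-key column. The sketch below is an illustration under stated assumptions, not the notebook's own solution: it runs in plain Python against an in-memory database whose `languagemap` table is a simplified copy (constraint names dropped) of the DDL shown in the `sqlite_master` output under Question 9, so it never touches `TMDB.db`.

```python
import sqlite3

# In-memory stand-in for TMDB.db. The DDL is a simplified copy of the
# languagemap definition from the sqlite_master output (assumption:
# constraint names omitted, parent tables reduced to their key columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (movie_id integer NOT NULL, PRIMARY KEY (movie_id));
CREATE TABLE languages (iso_639_1 char(2) NOT NULL, PRIMARY KEY (iso_639_1));
CREATE TABLE languagemap (
    movie_id integer NOT NULL,
    iso_639_1 char(2) NOT NULL,
    PRIMARY KEY (movie_id, iso_639_1),
    FOREIGN KEY (iso_639_1) REFERENCES languages (iso_639_1),
    FOREIGN KEY (movie_id) REFERENCES movies (movie_id)
);
""")

# One row per foreign-key column: referenced table, child column, parent column.
fk_rows = conn.execute("""
    SELECT "table", "from", "to"
    FROM pragma_foreign_key_list('languagemap')
    ORDER BY "table"
""").fetchall()

# Distinct ids correspond to distinct foreign-key constraints.
fk_count = conn.execute(
    "SELECT COUNT(DISTINCT id) FROM pragma_foreign_key_list('languagemap')"
).fetchone()[0]

print(fk_rows)
print(fk_count)
```

The `SELECT COUNT(DISTINCT id) ...` statement can be pasted into a `%%sql` cell unchanged to check the real `languagemap` table.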

**Question 3**

How many movies in the database were produced by Pixar Animation Studios?

**Options:**
 - 16
 - 14
 - 18
 - 20

**Solution**

In [44]:
%%sql

-- production_company_id 3 is Pixar Animation Studios (confirmed by the lookup in the next cell)
select count(m.movie_id) as "no of movies", pm.production_company_id
from movies as m
left join productioncompanymap as pm
on m.movie_id = pm.movie_id
where pm.production_company_id = 3

 * sqlite:///TMDB.db
Done.


no of movies,production_company_id
16,3


In [38]:
%%sql
select pc.production_company_id, pc.production_company_name, count(pm.movie_id) as "no of movies"
from productioncompanies as pc
left join productioncompanymap as pm
on pc.production_company_id = pm.production_company_id
where pc.production_company_name = 'Pixar Animation Studios'

 * sqlite:///TMDB.db
Done.


production_company_id,production_company_name,no of movies
3,Pixar Animation Studios,16
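
The first query above filters on `pm.production_company_id`, a column from the right-hand table of a `LEFT JOIN`. Rows with no match have `NULL` in that column and fail the `WHERE` test, so the join effectively behaves like an `INNER JOIN`. The sketch below illustrates this on a few made-up rows in an in-memory database (the titles and ids are assumptions, not TMDb data), and contrasts it with placing the condition in the `ON` clause, which keeps unmatched movies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (movie_id integer PRIMARY KEY, title text);
CREATE TABLE productioncompanymap (movie_id integer, production_company_id integer);
-- Toy rows: movie 3 has no production company at all.
INSERT INTO movies VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO productioncompanymap VALUES (1, 3), (2, 3), (2, 5);
""")


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# Filter in WHERE: NULL rows from the LEFT JOIN are discarded.
left_where = scalar("""
    SELECT COUNT(m.movie_id) FROM movies AS m
    LEFT JOIN productioncompanymap AS pm ON m.movie_id = pm.movie_id
    WHERE pm.production_company_id = 3
""")

# The same filter with an INNER JOIN gives the same count.
inner = scalar("""
    SELECT COUNT(m.movie_id) FROM movies AS m
    INNER JOIN productioncompanymap AS pm ON m.movie_id = pm.movie_id
    WHERE pm.production_company_id = 3
""")

# Filter in ON: every movie is kept, matched or not.
left_on = scalar("""
    SELECT COUNT(m.movie_id) FROM movies AS m
    LEFT JOIN productioncompanymap AS pm
      ON m.movie_id = pm.movie_id AND pm.production_company_id = 3
""")

print(left_where, inner, left_on)
```

On these toy rows the first two counts agree, while the third also counts the movie without a company.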


**Question 4**

What is the most popular action movie that has some German in it? (Hint: The German word for German is Deutsch)

**Options:**
 - The Bourne Identity
 - Mission: Impossible - Rogue Nation
 - Captain America: Civil War
 - Quantum of Solace

**Solution**

In [73]:
%%sql
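-- Keep only Action movies that list German ("Deutsch") among their languages
select m.title, max(m.popularity) as "most popular", l.language_name
from movies as m
inner join languagemap as lm
on m.movie_id = lm.movie_id
inner join languages as l
on lm.iso_639_1 = l.iso_639_1
inner join genremap as gm
on m.movie_id = gm.movie_id
inner join genres as g
on gm.genre_id = g.genre_id
where l.language_name = 'Deutsch'
and g.genre_name = 'Action'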

 * sqlite:///TMDB.db
Done.


title,most popular,language_name
Captain America: Civil War,198.372395,Deutsch
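
The query above selects `m.title` next to `max(m.popularity)` without a `GROUP BY`. SQLite documents that, with a single `min()` or `max()` aggregate, such bare columns take their values from the row that holds the extreme value, but many other database engines reject or mishandle this form. The sketch below compares it with the more portable `ORDER BY ... LIMIT 1` on made-up rows in an in-memory table (titles and popularity values are assumptions for illustration).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (movie_id integer PRIMARY KEY, title text, popularity double);
-- Toy rows, not TMDb data.
INSERT INTO movies VALUES (1, 'Low', 1.5), (2, 'High', 9.0), (3, 'Mid', 4.2);
""")

# SQLite-specific: the bare column `title` comes from the row with the maximum.
bare = conn.execute(
    "SELECT title, MAX(popularity) FROM movies"
).fetchone()

# Portable alternative: sort by popularity and keep the first row.
portable = conn.execute(
    "SELECT title, popularity FROM movies ORDER BY popularity DESC LIMIT 1"
).fetchone()

print(bare, portable)
```

Both forms return the same row here. The sorted form also makes it easy to look at the runners-up by raising the `LIMIT`.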


**Question 5**

In how many movies did Tom Cruise portray the character Ethan Hunt? (Hint: Characters are listed in the Casts table.)

**Options:**
 - 4
 - 3
 - 6
 - 5

**Solution**

In [87]:
%%sql 

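-- Join actors as well so that only Tom Cruise's portrayals are counted
SELECT COUNT(DISTINCT(m.movie_id)) AS "No. of movies", c.characters
FROM movies as m
INNER JOIN casts as c
ON m.movie_id = c.movie_id
INNER JOIN actors as a
ON c.actor_id = a.actor_id
WHERE c.characters = 'Ethan Hunt'
AND a.actor_name = 'Tom Cruise'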

 * sqlite:///TMDB.db
Done.


No. of movies,characters
5,Ethan Hunt


**Question 6**

How many times was the actress Cate Blanchett nominated for an Oscar?
 
 **Options:**
 - 7
 - 4
 - 5
 - 2

**Solution**

In [90]:
%%sql 

SELECT COUNT(*) FROM oscars WHERE name = 'Cate Blanchett'

 * sqlite:///TMDB.db
Done.


COUNT(*)
7


**Question 7**

How many movies were nominated for the Best Picture award at the Oscars?
 
**Options:**

 - 12
 - 16
 - 8
 - 18

**Solution**

In [100]:
%%sql 

-- In the Best Picture row shown in the output, oscars.name holds the film title, so it is matched to movies.title
SELECT o.award, o.name, m.title, count(m.movie_id) as "no of movies"
FROM oscars AS o
INNER JOIN movies AS m
ON o.name = m.title
WHERE o.award = 'Best Picture'


 * sqlite:///TMDB.db
Done.


award,name,title,no of movies
Best Picture,The Big Short,The Big Short,8


**Question 8** 

How many movies contain at least one of the languages, Afrikaans or Zulu?

**Options:**
 - 10
 - 8
 - 12
 - 15

**Solution**

In [105]:
%%sql 

SELECT m.movie_id, l.language_name
FROM movies AS m
INNER JOIN languagemap AS lm
ON m.movie_id = lm.movie_id
INNER JOIN languages AS l
ON lm.iso_639_1 = l.iso_639_1
WHERE l.language_name = "Afrikaans" OR l.language_name = "Zulu"


 * sqlite:///TMDB.db
Done.


movie_id,language_name
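
The query above returns no rows, which suggests that at least one of the language names is stored under a different spelling. A quick way to find the stored spelling is a pattern search on the `languages` table. The sketch below shows the idea in plain Python on an in-memory table; the three rows are assumptions for illustration, using language names that appear in this notebook's queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE languages (iso_639_1 char(2) PRIMARY KEY, language_name varchar(50));
-- Toy rows (assumed for illustration).
INSERT INTO languages VALUES ('af', 'Afrikaans'), ('zu', 'isiZulu'), ('de', 'Deutsch');
""")

# An exact match on the English name finds nothing.
exact = conn.execute(
    "SELECT language_name FROM languages WHERE language_name = 'Zulu'"
).fetchall()

# LIKE is case-insensitive for ASCII letters in SQLite by default,
# so a wildcard search reveals the spelling that is actually stored.
fuzzy = conn.execute(
    "SELECT iso_639_1, language_name FROM languages WHERE language_name LIKE '%zulu%'"
).fetchall()

print(exact, fuzzy)
```

Running the `LIKE` query in a `%%sql` cell against `TMDB.db` shows which name to use in the filter.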


In [112]:
%%sql
 
SELECT COUNT(1) FROM (
    SELECT m.movie_id
    FROM movies AS m
    INNER JOIN languagemap AS lm
    ON lm.movie_id = m.movie_id
    INNER JOIN languages AS l
    ON lm.iso_639_1 = l.iso_639_1
    WHERE l.language_name IN ('Afrikaans', 'isiZulu')
    -- one row per movie, even if it lists both languages
    GROUP BY m.movie_id
) AS g;

 * sqlite:///TMDB.db
Done.


COUNT(1)
8


**Question 9**

In which country was the movie “Star Wars” produced?  

**Options:**
 - Canada
 - United Kingdom
 - France
 - United States of America

**Solution**

In [113]:
%%sql 

SELECT * FROM sqlite_master WHERE type='table';


 * sqlite:///TMDB.db
Done.


type,name,tbl_name,rootpage,sql
table,actors,actors,2,"CREATE TABLE `actors` (  `actor_id` integer NOT NULL , `actor_name` varchar(100) DEFAULT NULL , `gender` integer DEFAULT NULL , PRIMARY KEY (`actor_id`) )"
table,casts,casts,320,"CREATE TABLE `casts` (  `movie_id` integer NOT NULL , `actor_id` integer NOT NULL , `characters` varchar(500) NOT NULL , PRIMARY KEY (`movie_id`,`actor_id`,`characters`) , CONSTRAINT `FK__Casts__actor_id__656C112C` FOREIGN KEY (`actor_id`) REFERENCES `actors` (`actor_id`) , CONSTRAINT `FK__Casts__movie_id__6477ECF3` FOREIGN KEY (`movie_id`) REFERENCES `movies` (`movie_id`) )"
table,genremap,genremap,1903,"CREATE TABLE `genremap` (  `movie_id` integer NOT NULL , `genre_id` integer NOT NULL , PRIMARY KEY (`movie_id`,`genre_id`) , CONSTRAINT `FK__GenreMap__genre___4E88ABD4` FOREIGN KEY (`genre_id`) REFERENCES `genres` (`genre_id`) , CONSTRAINT `FK__GenreMap__movie___4D94879B` FOREIGN KEY (`movie_id`) REFERENCES `movies` (`movie_id`) )"
table,genres,genres,1982,"CREATE TABLE `genres` (  `genre_id` integer NOT NULL , `genre_name` varchar(50) DEFAULT NULL , PRIMARY KEY (`genre_id`) )"
table,keywordmap,keywordmap,1983,"CREATE TABLE `keywordmap` (  `movie_id` integer NOT NULL , `keyword_id` integer NOT NULL , PRIMARY KEY (`movie_id`,`keyword_id`) , CONSTRAINT `FK__KeywordMa__keywo__5441852A` FOREIGN KEY (`keyword_id`) REFERENCES `keywords` (`keyword_id`) , CONSTRAINT `FK__KeywordMa__movie__534D60F1` FOREIGN KEY (`movie_id`) REFERENCES `movies` (`movie_id`) )"
table,keywords,keywords,2234,"CREATE TABLE `keywords` (  `keyword_id` integer NOT NULL , `keyword_name` varchar(500) DEFAULT NULL , PRIMARY KEY (`keyword_id`) )"
table,languagemap,languagemap,2283,"CREATE TABLE `languagemap` (  `movie_id` integer NOT NULL , `iso_639_1` char(2) NOT NULL , PRIMARY KEY (`movie_id`,`iso_639_1`) , CONSTRAINT `FK__LanguageM__iso_6__59FA5E80` FOREIGN KEY (`iso_639_1`) REFERENCES `languages` (`iso_639_1`) , CONSTRAINT `FK__LanguageM__movie__59063A47` FOREIGN KEY (`movie_id`) REFERENCES `movies` (`movie_id`) )"
table,languages,languages,2331,"CREATE TABLE `languages` (  `iso_639_1` char(2) NOT NULL , `language_name` varchar(50) DEFAULT NULL , PRIMARY KEY (`iso_639_1`) )"
table,movies,movies,2333,"CREATE TABLE `movies` (  `movie_id` integer NOT NULL , `title` varchar(500) DEFAULT NULL , `release_date` datetime(6) DEFAULT NULL , `budget` integer DEFAULT NULL , `homepage` varchar(500) DEFAULT NULL , `original_language` varchar(50) DEFAULT NULL , `original_title` varchar(500) DEFAULT NULL , `overview` varchar(5000) DEFAULT NULL , `popularity` double DEFAULT NULL , `revenue` double DEFAULT NULL , `runtime` double DEFAULT NULL , `release_status` varchar(50) DEFAULT NULL , `tagline` varchar(500) DEFAULT NULL , `vote_average` double DEFAULT NULL , `vote_count` integer DEFAULT NULL , PRIMARY KEY (`movie_id`) )"
table,oscars,oscars,2922,"CREATE TABLE `oscars` (  `year` varchar(10) DEFAULT NULL , `award` varchar(500) DEFAULT NULL , `winner` varchar(10) DEFAULT NULL , `name` varchar(500) DEFAULT NULL , `film` varchar(500) DEFAULT NULL )"


In [115]:
%%sql 

SELECT pc.production_country_name, m.title
FROM productioncountries AS pc
INNER JOIN productioncountrymap AS pm
ON pc.iso_3166_1 = pm.iso_3166_1
INNER JOIN movies AS m
ON pm.movie_id = m.movie_id
WHERE m.title = 'Star Wars'

 * sqlite:///TMDB.db
Done.


production_country_name,title
United States of America,Star Wars


**Question 10**

How many movies are in the database that are both a Romance and a Comedy?

**Options:**

 - 373
 - 484
 - 262
 - 595

**Solution**

In [138]:
%%sql 
SELECT COUNT(1) FROM (
    SELECT m.movie_id
    FROM movies AS m
    INNER JOIN genremap AS gm
    ON gm.movie_id = m.movie_id
    INNER JOIN genres AS g
    ON g.genre_id = gm.genre_id
    WHERE g.genre_name IN ('Romance', 'Comedy')
    GROUP BY m.movie_id
    HAVING COUNT(DISTINCT g.genre_name) = 2
) AS total



 * sqlite:///TMDB.db
Done.


COUNT(1)
484
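
The `GROUP BY ... HAVING COUNT(DISTINCT ...) = 2` pattern used above keeps only movies that match both genre names. As a cross-check, the same set can be obtained with `INTERSECT`. The sketch below compares the two on made-up rows in an in-memory database (the ids and the extra Drama genre are assumptions for illustration, not TMDb data).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genres (genre_id integer PRIMARY KEY, genre_name varchar(50));
CREATE TABLE genremap (movie_id integer, genre_id integer, PRIMARY KEY (movie_id, genre_id));
INSERT INTO genres VALUES (1, 'Romance'), (2, 'Comedy'), (3, 'Drama');
-- Toy rows: movies 1 and 3 are both Romance and Comedy.
INSERT INTO genremap VALUES (1, 1), (1, 2), (2, 2), (3, 1), (3, 2), (3, 3), (4, 3);
""")

# Keep movies whose matching rows cover both genre names.
having_ids = [row[0] for row in conn.execute("""
    SELECT gm.movie_id
    FROM genremap AS gm
    INNER JOIN genres AS g ON g.genre_id = gm.genre_id
    WHERE g.genre_name IN ('Romance', 'Comedy')
    GROUP BY gm.movie_id
    HAVING COUNT(DISTINCT g.genre_name) = 2
    ORDER BY gm.movie_id
""")]

# Same set, expressed as the intersection of two single-genre queries.
intersect_ids = [row[0] for row in conn.execute("""
    SELECT gm.movie_id
    FROM genremap AS gm
    INNER JOIN genres AS g ON g.genre_id = gm.genre_id
    WHERE g.genre_name = 'Romance'
    INTERSECT
    SELECT gm.movie_id
    FROM genremap AS gm
    INNER JOIN genres AS g ON g.genre_id = gm.genre_id
    WHERE g.genre_name = 'Comedy'
    ORDER BY 1
""")]

print(having_ids, intersect_ids)
```

The `HAVING` form extends more easily to three or more genres, since only the `IN` list and the target count change.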
