# SQL for Data Science Examination: TMDb Database
#### © Explore Data Science Academy

## The TMDb Database

In this exam, I explored [The Movie Database](https://www.themoviedb.org/) (TMDb), an online database of movies and TV shows that puts some of the most popular titles at your fingertips. TMDb supports 39 official languages, is used daily in over 180 countries, and dates back to 2008.


<img src="images/sql_tmdb.jpg" width=80%/>


Below is an Entity Relationship Diagram (ERD) of the TMDb database:

<img src="images/TMDB_ER_diagram.PNG" width=70%/>

As the ERD shows, the TMDb database consists of `12 tables` containing information about movies, casts, genres and more.



#### Getting started!

```python
import sqlite3
import csv
from sqlalchemy import create_engine
%load_ext sql_magic

# create engine instance using sqlalchemy
engine = create_engine("sqlite:///data/TMDB.db")
%config SQL.conn_name = 'engine'

# create connection object using sqlite3
conn = sqlite3.connect('data/TMDB.db')
cursor = conn.cursor()
```

<br>
<br>

#### The query below lists all the tables and views in the database

```sql
%%read_sql

SELECT name
FROM sqlite_master
WHERE type IN ('table', 'view')
  AND name NOT LIKE 'sqlite_%'
ORDER BY 1
```

Query started at 01:59:58 AM W. Central Africa Standard Time; Query executed in 0.00 m

|   | name |
|---|---|
| 0 | Alan_Rickman_Movies |
| 1 | actors |
| 2 | casts |
| 3 | genremap |
| 4 | genres |
| 5 | keywordmap |
| 6 | keywords |
| 7 | languagemap |
| 8 | languages |
| 9 | movies |
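The setup cell also opens a plain `sqlite3` cursor, but the notebook never uses it. The minimal sketch below runs the same listing through `sqlite3` directly. So that it runs without `data/TMDB.db`, it uses an in-memory database with two made-up tables and one view; those names are illustrative only. The `ORDER BY` compares names with SQLite's default BINARY collation, which is why `Alan_Rickman_Movies` sorts ahead of the lowercase table names in the output above.

```python
import sqlite3

# In-memory stand-in for TMDB.db; table and view names are illustrative only
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE genres (genre_id INTEGER PRIMARY KEY, genre_name TEXT);
    CREATE VIEW movie_titles AS SELECT title FROM movies;
""")

def list_tables_and_views(connection):
    """Return user table and view names, skipping SQLite's internal sqlite_* objects."""
    cursor = connection.cursor()
    cursor.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type IN ('table', 'view') AND name NOT LIKE 'sqlite_%' "
        "ORDER BY 1"
    )
    return [row[0] for row in cursor.fetchall()]

print(list_tables_and_views(conn))  # ['genres', 'movie_titles', 'movies']
```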


<br>
<br>

### Question

Who won the Oscar for “Actor in a Leading Role” in 2015? (Hint: the winner is indicated as '1.0'.)

```sql
%%read_sql

SELECT *
FROM oscars
WHERE award = 'Actor in a Leading Role'
AND year = '2015'
AND winner = '1.0'
```


|   | year | award | winner | name | film |
|---|---|---|---|---|---|
| 0 | 2015 | Actor in a Leading Role | 1.0 | Leonardo DiCaprio | The Revenant |
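The same filter can also be run from Python, outside the `%%read_sql` magic. Using `?` placeholders lets `sqlite3` bind the values itself rather than pasting them into the SQL text. This sketch assumes a simplified `oscars` table with the columns shown above, all stored as text. Apart from the 2015 winner, the rows are invented placeholders.

```python
import sqlite3

# Simplified oscars table; columns stored as text (an assumption), rows mostly invented
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE oscars (year TEXT, award TEXT, winner TEXT, name TEXT, film TEXT)")
conn.executemany(
    "INSERT INTO oscars VALUES (?, ?, ?, ?, ?)",
    [
        ("2015", "Actor in a Leading Role", "1.0", "Leonardo DiCaprio", "The Revenant"),
        ("2015", "Actor in a Leading Role", "0.0", "Nominee A", "Film A"),
        ("2014", "Actor in a Leading Role", "1.0", "Winner B", "Film B"),
    ],
)

def award_winners(connection, award, year):
    """Return (name, film) pairs for the winners of one award in one year."""
    # '?' placeholders are bound by sqlite3 instead of being spliced into the SQL string
    sql = "SELECT name, film FROM oscars WHERE award = ? AND year = ? AND winner = '1.0'"
    return connection.execute(sql, (award, year)).fetchall()

print(award_winners(conn, "Actor in a Leading Role", "2015"))
# [('Leonardo DiCaprio', 'The Revenant')]
```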


<br>
<br>

### Question

What query will produce the ten oldest movies in the database?

```sql
%%read_sql

SELECT *
FROM movies
WHERE release_date IS NOT NULL
ORDER BY release_date ASC
LIMIT 10;
```


|   | movie_id | title | release_date | budget | homepage | original_language | original_title | overview | popularity | revenue | runtime | release_status | tagline | vote_average | vote_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3059 | Intolerance | 1916-09-04 00:00:00.000000 | 385907 |  | en | Intolerance | The story of a poor young woman, separated by ... | 3.232447 | 8394751.0 | 197.0 | Released | The Cruel Hand of Intolerance | 7.4 | 60 |
| 1 | 3060 | The Big Parade | 1925-11-05 00:00:00.000000 | 245000 |  | en | The Big Parade | The story of an idle rich boy who joins the US... | 0.785744 | 22000000.0 | 151.0 | Released |  | 7.0 | 21 |
| 2 | 19 | Metropolis | 1927-01-10 00:00:00.000000 | 92620000 |  | de | Metropolis | In a futuristic city sharply divided between t... | 32.351527 | 650422.0 | 153.0 | Released | There can be no understanding between the hand... | 8.0 | 657 |
| 3 | 905 | Pandora's Box | 1929-01-30 00:00:00.000000 | 0 |  | de | Die Büchse der Pandora | The rise and inevitable fall of an amoral but ... | 1.824184 | 0.0 | 109.0 | Released |  | 7.6 | 45 |
| 4 | 65203 | The Broadway Melody | 1929-02-08 00:00:00.000000 | 379000 |  | en | The Broadway Melody | Harriet and Queenie Mahoney, a vaudeville act,... | 0.968865 | 4358000.0 | 100.0 | Released | The pulsating drama of Broadway's bared heart ... | 5.0 | 19 |
| 5 | 22301 | Hell's Angels | 1930-11-15 00:00:00.000000 | 3950000 |  | en | Hell's Angels | Two brothers attending Oxford enlist with the ... | 8.484123 | 8000000.0 | 127.0 | Released | Howard Hughes' Thrilling Multi-Million Dollar ... | 6.1 | 19 |
| 6 | 22649 | A Farewell to Arms | 1932-12-08 00:00:00.000000 | 4 |  | en | A Farewell to Arms | British nurse Catherine Barkley (Helen Hayes) ... | 1.199451 | 25.0 | 89.0 | Released | Every woman who has loved will understand | 6.2 | 28 |
| 7 | 3062 | 42nd Street | 1933-02-02 00:00:00.000000 | 439000 |  | en | 42nd Street | A producer puts on what may be his last Broadw... | 1.933366 | 2281000.0 | 89.0 | Released |  | 6.1 | 37 |
| 8 | 43595 | She Done Him Wrong | 1933-02-09 00:00:00.000000 | 200000 |  | en | She Done Him Wrong | "New York singer and nightclub owner Lady Lou ... | 0.622752 | 2200000.0 | 66.0 | Released | Mae West gives a 'Hot Time' to the nation! | 5.1 | 27 |
| 9 | 3078 | It Happened One Night | 1934-02-22 00:00:00.000000 | 325000 |  | en | It Happened One Night | Ellie Andrews has just tied the knot with soci... | 11.871424 | 4500000.0 | 105.0 | Released | TOGETHER... for the first time | 7.7 | 275 |
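This ordering works because `release_date` holds ISO-8601 text (year, month, day, then time), so plain text sorting is also chronological. The sketch below assumes the column is stored as text in the format shown above. It uses a few titles from the output plus one invented undated row, to show why the `IS NOT NULL` filter matters: in ascending order, SQLite sorts `NULL` ahead of every other value.

```python
import sqlite3

# release_date stored as ISO-8601 text, as in the output above (assumed column type)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, release_date TEXT)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [
        ("Metropolis", "1927-01-10 00:00:00.000000"),
        ("Intolerance", "1916-09-04 00:00:00.000000"),
        ("Undated film", None),  # invented row with no release date
        ("The Big Parade", "1925-11-05 00:00:00.000000"),
    ],
)

# Text order equals date order; without the filter, the NULL row would come first in ASC order
oldest = conn.execute(
    "SELECT title FROM movies WHERE release_date IS NOT NULL "
    "ORDER BY release_date ASC LIMIT 2"
).fetchall()
print(oldest)  # [('Intolerance',), ('The Big Parade',)]
```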


<br>
<br>

### Question
How many unique awards are there in the Oscars table?

```sql
%%read_sql

SELECT COUNT(DISTINCT award) AS "Number_of_Awards_in_Oscars"
FROM oscars
```


|   | Number_of_Awards_in_Oscars |
|---|---|
| 0 | 114 |
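`COUNT(DISTINCT award)` counts each award name once and skips `NULL`s, as every column-based `COUNT` does. `COUNT(*)`, by contrast, counts rows. Here is a toy illustration on an in-memory table with invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE oscars (year TEXT, award TEXT)")
conn.executemany(
    "INSERT INTO oscars VALUES (?, ?)",
    [("2014", "Directing"), ("2015", "Directing"), ("2015", "Film Editing"), ("2015", None)],
)

# COUNT(*) counts all 4 rows; COUNT(DISTINCT award) counts 2 names and ignores the NULL
total_rows, distinct_awards = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT award) FROM oscars"
).fetchone()
print(total_rows, distinct_awards)  # 4 2
```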


<br>
<br>

### Question

How many movies are there that contain the word “Spider” within their title?

```sql
%%read_sql

SELECT COUNT(DISTINCT movie_id) AS "Number_of_movies_with_Spider_in_Title"
FROM movies
WHERE LOWER(title) LIKE '%spider%'
```


|   | Number_of_movies_with_Spider_in_Title |
|---|---|
| 0 | 9 |
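By default, SQLite's `LIKE` ignores case for ASCII letters, so the `LOWER(title)` call is harmless here but not strictly needed. Note also that `'%spider%'` matches the substring anywhere, including inside longer words. Here is a sketch on invented titles:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (movie_id INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [(1, "Spider-Man"), (2, "The Spiderwick Chronicles"), (3, "SPIDERS"), (4, "Ant-Man")],
)

with_lower = conn.execute(
    "SELECT COUNT(DISTINCT movie_id) FROM movies WHERE LOWER(title) LIKE '%spider%'"
).fetchone()[0]
# Same pattern without LOWER(): LIKE already matches 'SPIDERS' case-insensitively
without_lower = conn.execute(
    "SELECT COUNT(DISTINCT movie_id) FROM movies WHERE title LIKE '%spider%'"
).fetchone()[0]
print(with_lower, without_lower)  # 3 3
```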


<br>
<br>

### Question
How many movies are both in the "Thriller" genre and contain the word “love” anywhere in their keywords?

```sql
%%read_sql

SELECT COUNT(m.movie_id) AS "Number_of_Thriller_movies_and_with_love_keyword"
FROM movies m
INNER JOIN genremap gm
ON m.movie_id = gm.movie_id
INNER JOIN genres g
ON gm.genre_id = g.genre_id
INNER JOIN keywordmap km
ON km.movie_id = m.movie_id
INNER JOIN keywords k
ON km.keyword_id = k.keyword_id
WHERE g.genre_name = 'Thriller'
AND LOWER(k.keyword_name) LIKE '%love%'
```


|   | Number_of_Thriller_movies_and_with_love_keyword |
|---|---|
| 0 | 55 |
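A movie can carry several keywords, so the joins above return one row per matching movie-keyword pair. `COUNT(m.movie_id)` therefore counts those rows, not movies. Whether this changes the figure of 55 depends on the data, but `COUNT(DISTINCT m.movie_id)` counts each movie exactly once. The reduced sketch below omits the genre join and uses invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE keywordmap (movie_id INTEGER, keyword_id INTEGER);
    CREATE TABLE keywords (keyword_id INTEGER PRIMARY KEY, keyword_name TEXT);
    INSERT INTO movies VALUES (1, 'Movie A'), (2, 'Movie B');
    INSERT INTO keywords VALUES (10, 'love'), (11, 'love triangle'), (12, 'heist');
    -- Movie A has two keywords that both contain 'love'
    INSERT INTO keywordmap VALUES (1, 10), (1, 11), (2, 10), (2, 12);
""")

query = """
    SELECT {count_expr}
    FROM movies m
    INNER JOIN keywordmap km ON km.movie_id = m.movie_id
    INNER JOIN keywords k ON km.keyword_id = k.keyword_id
    WHERE k.keyword_name LIKE '%love%'
"""
# Three matching join rows, but only two distinct movies
join_rows = conn.execute(query.format(count_expr="COUNT(m.movie_id)")).fetchone()[0]
distinct_movies = conn.execute(query.format(count_expr="COUNT(DISTINCT m.movie_id)")).fetchone()[0]
print(join_rows, distinct_movies)  # 3 2
```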


<br>
<br>

### Question
How many movies were released between 1 August 2006 ('2006-08-01') and 1 October 2009 ('2009-10-01') and have a popularity score of more than 40 and a budget of less than 50 000 000?

```sql
%%read_sql

SELECT COUNT(movie_id) AS "Number_of_movies"
FROM movies
WHERE budget < 50000000
AND popularity > 40
AND release_date BETWEEN '2006-08-01' AND '2009-10-01'
```


|   | Number_of_movies |
|---|---|
| 0 | 29 |
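`release_date` appears to be stored as text such as `2009-10-01 00:00:00.000000`, so `BETWEEN '2006-08-01' AND '2009-10-01'` compares strings. Because that longer string sorts after `'2009-10-01'`, a movie released on 1 October 2009 would be excluded. Whether this affects the count of 29 depends on whether any such movie exists, which the output does not show. Wrapping the column in SQLite's `DATE()` compares calendar days instead. The sketch below uses invented boundary rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, release_date TEXT)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [
        ("Start day", "2006-08-01 00:00:00.000000"),
        ("Middle", "2008-05-15 00:00:00.000000"),
        ("End day", "2009-10-01 00:00:00.000000"),
    ],
)

# Plain text comparison: '2009-10-01 00:00:00.000000' > '2009-10-01', so the end day drops out
raw = conn.execute(
    "SELECT COUNT(*) FROM movies WHERE release_date BETWEEN '2006-08-01' AND '2009-10-01'"
).fetchone()[0]

# DATE() strips the time part first, so both boundary days are included
by_day = conn.execute(
    "SELECT COUNT(*) FROM movies WHERE DATE(release_date) BETWEEN '2006-08-01' AND '2009-10-01'"
).fetchone()[0]
print(raw, by_day)  # 2 3
```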


<br>
<br>

### Question
How many unique characters has "Vin Diesel" played so far in the database?

Correct answer: 16

```sql
%%read_sql

SELECT COUNT(DISTINCT c.movie_id) AS "Number_of_movies_with_Vin_Diesel", a.actor_name
FROM casts c
INNER JOIN actors a
ON c.actor_id = a.actor_id
WHERE LOWER(a.actor_name) = 'vin diesel'
```

|   | Number_of_movies_with_Vin_Diesel | actor_name |
|---|---|---|
| 0 | 19 | Vin Diesel |
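The query above counts distinct `movie_id` values, that is, movies. This may explain why it returns 19 rather than the expected 16: an actor who plays the same character in several films adds one character but several movies. The source does not show which column in `casts` holds the character name. This sketch therefore assumes a hypothetical `characters` column and uses invented data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actors (actor_id INTEGER PRIMARY KEY, actor_name TEXT);
    -- 'characters' is a hypothetical column name for the role played in each movie
    CREATE TABLE casts (movie_id INTEGER, actor_id INTEGER, characters TEXT);
    INSERT INTO actors VALUES (1, 'Example Actor');
    -- The same character appears in two sequels: 3 movies, but only 2 characters
    INSERT INTO casts VALUES (100, 1, 'Hero'), (101, 1, 'Hero'), (102, 1, 'Villain');
""")

movie_count, character_count = conn.execute("""
    SELECT COUNT(DISTINCT c.movie_id), COUNT(DISTINCT c.characters)
    FROM casts c
    INNER JOIN actors a ON c.actor_id = a.actor_id
    WHERE a.actor_name = 'Example Actor'
""").fetchone()
print(movie_count, character_count)  # 3 2
```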


<br>
<br>

### Question

What are the genres of the movie “The Royal Tenenbaums”?

```sql
%%read_sql

SELECT m.title, g.genre_name
FROM movies m
INNER JOIN genremap gm
ON m.movie_id = gm.movie_id
INNER JOIN genres g
ON g.genre_id = gm.genre_id
WHERE title = 'The Royal Tenenbaums'
```


|   | title | genre_name |
|---|---|---|
| 0 | The Royal Tenenbaums | Drama |
| 1 | The Royal Tenenbaums | Comedy |
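Because `genremap` can link a movie to several genres, the query returns one row per genre. If a single row per movie is preferred, SQLite's `GROUP_CONCAT` can collapse those rows. The sketch below uses an in-memory copy of just this movie, with made-up IDs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE genres (genre_id INTEGER PRIMARY KEY, genre_name TEXT);
    CREATE TABLE genremap (movie_id INTEGER, genre_id INTEGER);
    INSERT INTO movies VALUES (1, 'The Royal Tenenbaums');
    INSERT INTO genres VALUES (1, 'Drama'), (2, 'Comedy');
    INSERT INTO genremap VALUES (1, 1), (1, 2);
""")

title, genre_list = conn.execute("""
    SELECT m.title, GROUP_CONCAT(g.genre_name, ', ')
    FROM movies m
    INNER JOIN genremap gm ON m.movie_id = gm.movie_id
    INNER JOIN genres g ON g.genre_id = gm.genre_id
    WHERE m.title = 'The Royal Tenenbaums'
    GROUP BY m.title
""").fetchone()
# GROUP_CONCAT does not guarantee an order, so sort the pieces before displaying them
print(title, sorted(genre_list.split(", ")))  # The Royal Tenenbaums ['Comedy', 'Drama']
```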


<br>
<br>

### Question

Which three production companies have the highest average movie popularity score in the database?

```sql
%%read_sql

-- Every selected column is either grouped or aggregated; a bare m.movie_id
-- would only return an arbitrary movie from each group
SELECT pc.production_company_name, AVG(m.popularity) AS Avg_Popularity
FROM productioncompanies pc
INNER JOIN productioncompanymap pcm
ON pc.production_company_id = pcm.production_company_id
INNER JOIN movies m
ON pcm.movie_id = m.movie_id
GROUP BY pc.production_company_name
ORDER BY Avg_Popularity DESC
LIMIT 3;
```

|   | production_company_name | Avg_Popularity |
|---|---|---|
| 0 | The Donners' Company | 514.569956 |
| 1 | Bulletproof Cupid | 481.098624 |
| 2 | Kinberg Genre | 326.920999 |


<br>
<br>

### Question

How many female actors (i.e. gender = 1) have a name that starts with the letter "N"?

```sql
%%read_sql

SELECT COUNT(DISTINCT actor_id) AS "Number_of_female_actors_with_N_starting_their_names"
FROM actors
WHERE gender = 1
AND LOWER(LTRIM(actor_name)) LIKE 'n%'
```


|   | Number_of_female_actors_with_N_starting_their_names |
|---|---|
| 0 | 355 |
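The `LTRIM` call guards against names stored with leading spaces, which would otherwise fail the `'n%'` prefix test. The sketch below uses invented names; the output above does not show whether any such names actually exist in `actors`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actors (actor_id INTEGER, actor_name TEXT, gender INTEGER)")
conn.executemany(
    "INSERT INTO actors VALUES (?, ?, ?)",
    [(1, "Natalie", 1), (2, "  Nora", 1), (3, "Anna", 1), (4, "Nathan", 2)],
)

def count_female_n(trim):
    """Count gender = 1 actors whose name starts with 'n', optionally trimming leading spaces."""
    name_expr = "LOWER(LTRIM(actor_name))" if trim else "LOWER(actor_name)"
    sql = f"SELECT COUNT(DISTINCT actor_id) FROM actors WHERE gender = 1 AND {name_expr} LIKE 'n%'"
    return conn.execute(sql).fetchone()[0]

# '  Nora' is only counted once the leading spaces are trimmed
print(count_female_n(trim=False), count_female_n(trim=True))  # 1 2
```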


<br>
<br>

### Question

Which genre has, on average, the lowest movie popularity score?

```sql
%%read_sql

SELECT g.genre_name AS Genre_Name, avg(m.popularity) AS Avg_Popularity
FROM genres g
LEFT JOIN genremap gm
ON g.genre_id = gm.genre_id
LEFT JOIN movies m
ON gm.movie_id = m.movie_id
GROUP BY g.genre_name
ORDER BY [Avg_Popularity] ASC
LIMIT 5;
```


|   | Genre_Name | Avg_Popularity |
|---|---|---|
| 0 | Foreign | 0.686787 |
| 1 | Documentary | 3.945724 |
| 2 | TV Movie | 6.389415 |
| 3 | Music | 13.101512 |
| 4 | Romance | 15.962426 |
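With `LEFT JOIN`s, a genre that has no movies still produces a group, and its `AVG(m.popularity)` is `NULL`. In ascending order, SQLite sorts `NULL` before every number. No `NULL` appears at the top of the output above, so this does not seem to affect the answer (Foreign). The sketch below shows the effect on invented data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE genres (genre_id INTEGER PRIMARY KEY, genre_name TEXT);
    CREATE TABLE genremap (movie_id INTEGER, genre_id INTEGER);
    CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, popularity REAL);
    INSERT INTO genres VALUES (1, 'Foreign'), (2, 'Drama'), (3, 'Empty Genre');
    INSERT INTO movies VALUES (10, 0.5), (11, 20.0), (12, 30.0);
    INSERT INTO genremap VALUES (10, 1), (11, 2), (12, 2);
""")

rows = conn.execute("""
    SELECT g.genre_name, AVG(m.popularity) AS avg_popularity
    FROM genres g
    LEFT JOIN genremap gm ON g.genre_id = gm.genre_id
    LEFT JOIN movies m ON gm.movie_id = m.movie_id
    GROUP BY g.genre_name
    ORDER BY avg_popularity ASC
""").fetchall()
# 'Empty Genre' has no movies, so its average is NULL (None) and it sorts first
print(rows)  # [('Empty Genre', None), ('Foreign', 0.5), ('Drama', 25.0)]
```

Adding `HAVING COUNT(m.movie_id) > 0`, or using inner joins, would keep genres without movies out of the ranking.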


<br>
<br>

### Question

Which award category has the highest number of actor nominations (actors can be male or female)? (Hint: `Oscars.name` contains both actors' names and film names.)

Correct answer: Actor in a Supporting Role

```sql
%%read_sql

SELECT award AS Award, COUNT(name) AS Number_of_Nominations
FROM oscars
GROUP BY Award
ORDER BY [Number_of_Nominations] DESC
LIMIT 5;
```


|   | Award | Number_of_Nominations |
|---|---|---|
| 0 | Directing | 429 |
| 1 | Film Editing | 410 |
| 2 | Actress in a Supporting Role | 400 |
| 3 | Actor in a Supporting Role | 400 |
| 4 | Documentary (Short Subject) | 348 |
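The query above counts nominations in every category, which is why Directing comes first. Restricting it to acting categories is closer to what the question asks. This sketch assumes that acting categories are exactly those whose names start with `Actor` or `Actress`, and it uses invented rows. In the full table, the output above shows the two supporting-role categories tied at 400, so `LIMIT 1` on its own would pick between them arbitrarily.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE oscars (award TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO oscars VALUES (?, ?)",
    [
        ("Directing", "Director A"),
        ("Directing", "Director B"),
        ("Directing", "Director C"),
        ("Actor in a Supporting Role", "Actor A"),
        ("Actor in a Supporting Role", "Actor B"),
        ("Actress in a Leading Role", "Actress A"),
    ],
)

top_acting = conn.execute("""
    SELECT award, COUNT(name) AS nominations
    FROM oscars
    -- assumption: acting categories are the ones named 'Actor...' or 'Actress...'
    WHERE award LIKE 'Actor%' OR award LIKE 'Actress%'
    GROUP BY award
    ORDER BY nominations DESC
    LIMIT 1
""").fetchone()
print(top_acting)  # ('Actor in a Supporting Role', 2)
```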


<br>
<br>

### Question

For all entries in the Oscars table before 1934, the year is stored differently from the subsequent years: e.g. the year is saved as “1932/1933” instead of just “1933” (the second indicated year). Which of the following options would be the appropriate code to update this column so that the year format is consistent throughout the entire table (showing only the second indicated year)?
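This notebook does not reproduce the answer options for this question. One possible approach in SQLite keeps only the text after the `/`, using `SUBSTR` and `INSTR`. The sketch below applies it to an invented three-row table; it is not necessarily one of the exam's options.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE oscars (year TEXT, award TEXT)")
conn.executemany(
    "INSERT INTO oscars VALUES (?, ?)",
    [("1932/1933", "Directing"), ("1933/1934", "Directing"), ("1935", "Directing")],
)

# Keep the text after '/', i.e. the second indicated year; single-year rows are not touched
conn.execute("""
    UPDATE oscars
    SET year = SUBSTR(year, INSTR(year, '/') + 1)
    WHERE year LIKE '%/%'
""")
years = [row[0] for row in conn.execute("SELECT year FROM oscars ORDER BY year")]
print(years)  # ['1933', '1934', '1935']
```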

```sql
%%read_sql

-- Drop the view if it exists, so this cell can be re-run safely
DROP VIEW IF EXISTS Alan_Rickman_Movies;
```


```sql
%%read_sql

CREATE VIEW Alan_Rickman_Movies AS

SELECT title, release_date, tagline, overview
FROM Movies
LEFT JOIN Casts
ON Casts.movie_id = Movies.movie_id
LEFT JOIN Actors
ON Casts.actor_id = Actors.actor_id
WHERE Actors.actor_name = 'Alan Rickman'
```


<br>
<br>

### Previous question (continued)

Querying the view created above:

```sql
%%read_sql

SELECT *
FROM Alan_Rickman_Movies
```


|   | title | release_date | tagline | overview |
|---|---|---|---|---|
| 0 | Love Actually | 2003-09-07 00:00:00.000000 | The ultimate romantic comedy. | Follows seemingly unrelated people as their li... |
| 1 | Die Hard | 1988-07-15 00:00:00.000000 | 40 Stories. Twelve Terrorists. One Cop. | NYPD cop, John McClane's plan to reconcile wit... |
| 2 | Harry Potter and the Philosopher's Stone | 2001-11-16 00:00:00.000000 | Let the Magic Begin. | Harry Potter has lived under the stairs at his... |
| 3 | Harry Potter and the Chamber of Secrets | 2002-11-13 00:00:00.000000 | Hogwarts is back in session. | Ignoring threats to his life, Harry returns to... |
| 4 | Harry Potter and the Prisoner of Azkaban | 2004-05-31 00:00:00.000000 | Something wicked this way comes. | Harry, Ron and Hermione return to Hogwarts for... |
| 5 | Harry Potter and the Goblet of Fire | 2005-11-05 00:00:00.000000 | Dark And Difficult Times Lie Ahead. | Harry starts his fourth year at Hogwarts, comp... |
| 6 | Harry Potter and the Order of the Phoenix | 2007-06-28 00:00:00.000000 | Evil Must Be Confronted. | Returning for his fifth year of study at Hogwa... |
| 7 | Harry Potter and the Half-Blood Prince | 2009-07-07 00:00:00.000000 | Dark Secrets Revealed | As Harry begins his sixth year at Hogwarts, he... |
| 8 | Galaxy Quest | 1999-12-23 00:00:00.000000 | A comedy of Galactic Proportions. | The stars of a 1970s sci-fi show - now scrapin... |
| 9 | Perfume: The Story of a Murderer | 2006-09-13 00:00:00.000000 | Based on the best-selling novel | Jean-Baptiste Grenouille, born in the stench o... |
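A plain `DROP VIEW` raises an error when the view does not exist yet, for example on a fresh copy of the database. The sketch below rebuilds the view with `DROP VIEW IF EXISTS`, so it can be run any number of times. It uses invented rows and inner joins. Because the `WHERE` clause filters on `Actors.actor_name`, the `LEFT JOIN`s in the view above return the same rows that inner joins would.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE actors (actor_id INTEGER PRIMARY KEY, actor_name TEXT);
    CREATE TABLE casts (movie_id INTEGER, actor_id INTEGER);
    INSERT INTO movies VALUES (1, 'Die Hard'), (2, 'Galaxy Quest'), (3, 'Other Film');
    INSERT INTO actors VALUES (7, 'Alan Rickman'), (8, 'Someone Else');
    INSERT INTO casts VALUES (1, 7), (2, 7), (3, 8);
""")

def rebuild_view(connection):
    """Drop (if present) and recreate the view; IF EXISTS makes the first drop a no-op."""
    connection.executescript("""
        DROP VIEW IF EXISTS Alan_Rickman_Movies;
        CREATE VIEW Alan_Rickman_Movies AS
        SELECT m.title
        FROM movies m
        INNER JOIN casts c ON c.movie_id = m.movie_id
        INNER JOIN actors a ON c.actor_id = a.actor_id
        WHERE a.actor_name = 'Alan Rickman';
    """)

rebuild_view(conn)
rebuild_view(conn)  # a second run succeeds because of IF EXISTS
titles = sorted(row[0] for row in conn.execute("SELECT title FROM Alan_Rickman_Movies"))
print(titles)  # ['Die Hard', 'Galaxy Quest']
```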
