<div align="right" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/ExploreAI_logos/Logo blue_dark.png"  style="width:25px" align="right";/>
</div>

# SQL Exam
© ExploreAI Academy

## Instructions to students

This challenge is designed to determine how much you have learned so far and will test your knowledge on SQL.

The answers for this challenge should be selected on Athena for each corresponding multiple-choice question. The questions are included in this notebook and are numbered according to the Athena questions. The options for each question have also been included.

Do not add or remove cells in this notebook. Do not edit or remove the `%%sql` comment as it is required to run each cell.

**_Good luck!_**

## Honour code

I, **Derick Malavi**, confirm – by submitting this document – that the solutions in this notebook are a result of my own work and that I abide by the EDSA honour code (https://drive.google.com/file/d/1QDCjGZJ8-FmJE3bZdIQNwnJyQKPhHZBn/view?usp=sharing).

Non-compliance with the honour code constitutes a material breach of contract.

## The TMDb database

In this supplementary exam, you will be exploring [The Movie Database](https://www.themoviedb.org/) – an online movie and TV show database that houses some of the most popular movies and TV shows at your fingertips. The TMDb database supports 39 official languages used in over 180 countries daily and dates back all the way to 2008. 


<img src="https://github.com/Explore-AI/Pictures/blob/master/sql_tmdb.jpg?raw=true" width=80%/>


Below is an Entity Relationship Diagram (ERD) of the TMDb database:

<img src="https://github.com/Explore-AI/Pictures/blob/master/TMDB_ER_diagram.png?raw=true" width=70%/>

As can be seen from the ERD, the TMDb database consists of `12 tables` containing information about movies, cast, genre, and so much more.  

Let's get started!

## Loading the database

Before you begin, you need to prepare your SQL environment.  You can do this by loading the magic command `%load_ext sql`.

In [1]:
# Load and activate the SQL extension to allow us to execute SQL in a Jupyter notebook. 
# If you get an error here, make sure that mysql and pymysql are installed correctly. 

%load_ext sql

Next, go ahead and load your database. To do this, you will need to ensure you have downloaded the `TMDB.db` sqlite file from Athena and have stored it in a known location.

In [2]:
# Establish a connection to the local database using the '%sql' magic command.
# Replace 'password' with our connection password and `db_name` with our database name. 
# If you get an error here, please make sure the database name or password is correct.

%sql sqlite:///TMDB_db.db

'Connected: @TMDB_db.db'

If the above line didn't throw out any errors, then you should be good to go. Good luck with the exam! 

## Questions on SQL

Use the given cell below each question to execute your SQL queries to find the correct input from the options provided. Your solution should match one of the multiple-choice questions on Athena.

### Question 1

Who won the Oscar for “Actor in a Leading Role” in  2015?

(Hint: The winner is indicated as '1.0'.)

**Options:** 

  - Micheal Fassbender
  - Natalie Portman
  - <mark>Leonardo DiCaprio</mark>
  - Eddie Redmayne


In [10]:
%%sql
SELECT *
FROM oscars
WHERE
    award = 'Actor in a Leading Role'
    AND year = 2015 AND winner = 1.0;

 * sqlite:///TMDB_db.db
Done.


year,award,winner,name,film
2015,Actor in a Leading Role,1.0,Leonardo DiCaprio,The Revenant


### Question 2

What query will produce the ten oldest movies in the database?

**Options:**

 - SELECT TOP(10) * FROM movies WHERE release_date ORDER BY release_date ASC

 - <mark>SELECT  * FROM movies WHERE release_date IS NOT NULL ORDER BY release_date ASC LIMIT 10</mark>

 - SELECT * FROM movies WHERE release_date IS NOT NULL ORDER BY release_date DESC LIMIT 10

 -  SELECT * FROM movies WHERE release_date IS NULL ORDER BY release_date DESC LIMIT 10

In [15]:
%%sql
SELECT 
    release_date 
FROM 
    movies 
WHERE 
    release_date IS NOT NULL 
ORDER BY 
    release_date ASC
LIMIT 10;

 * sqlite:///TMDB_db.db
Done.


release_date
1916-09-04 00:00:00.000000
1925-11-05 00:00:00.000000
1927-01-10 00:00:00.000000
1929-01-30 00:00:00.000000
1929-02-08 00:00:00.000000
1930-11-15 00:00:00.000000
1932-12-08 00:00:00.000000
1933-02-02 00:00:00.000000
1933-02-09 00:00:00.000000
1934-02-22 00:00:00.000000


### Question 3

How many unique awards are there in the Oscars table?

**Options:**
 - 141
 - 53 
 - 80
 - <mark>114</mark>

In [19]:
%%sql
SELECT 
    COUNT(DISTINCT award) as unique_awards
FROM 
    oscars;

 * sqlite:///TMDB_db.db
Done.


unique_awards
114


### Question 4

How many movies are there that contain the word “Spider” within their title?

**Options:**
 - 0
 - 5
 - 1
 - <mark>9</mark>

In [20]:
%%sql
SELECT
    COUNT(title) as spider_movies
FROM
    movies
WHERE
    title LIKE '%Spider%';

 * sqlite:///TMDB_db.db
Done.


spider_movies
9


### Question 5

How many movies are there that are both in the "Thriller" genre and contain the word “love” anywhere in the keywords?
- How about a WILDCARD------> Gives  value of 55


**Options:**
 - <mark>48</mark>
 - 38
 - 14
 - 1

In [4]:
%%sql
SELECT 
    COUNT(*) as thriller_love_movies
FROM
    (SELECT
        m.title,
        g.genre_id,
        gn.genre_name,
        kw.keyword_name
    FROM
        movies as m
    JOIN
       genremap as g 
        ON m.movie_id = g.movie_id
    JOIN
        genres as gn
        ON g.genre_id = gn.genre_id
    JOIN
        keywordmap as kwm
        ON m.movie_id = kwm.movie_id
    JOIN
        keywords as kw
        ON kwm.keyword_id = kw.keyword_id
    WHERE
         (gn.genre_name = 'Thriller' AND
        kw.keyword_name LIKE'%love%')
);

 * sqlite:///TMDB_db.db
Done.


thriller_love_movies
55


### Question 6

How many movies are there that were released between 1 August 2006 ('2006-08-01') and 1 October 2009 ('2009-10-01') that have a popularity score of more than 40 and a budget of less than 50 000 000?

 
**Options:**

 - <mark>29</mark>
 - 23
 - 28
 - 35

In [54]:
%%sql
SELECT
    COUNT(*) as selected_movies
FROM(
    SELECT
        release_date,
        budget,
        popularity
    FROM
        movies
    WHERE
        release_date BETWEEN '2006-08-01' AND '2009-10-01'
        AND popularity > 40
        AND budget < 50000000
);

 * sqlite:///TMDB_db.db
Done.


selected_movies
29


### Question 7

How many unique characters has "Vin Diesel" played so far in the database?

**Options:**
 - 24
 - 19
 - 18
 - <mark>16</mark>

In [61]:
%%sql
SELECT
    COUNT(DISTINCT characters) as unique_characters_Vin_Diesel
FROM
   (SELECT
        a.actor_name,
        c.characters
    FROM
        actors as a
    JOIN
        casts as c
        ON a.actor_id = c.actor_id
    WHERE
        actor_name =  "Vin Diesel"
);

 * sqlite:///TMDB_db.db
Done.


unique_characters_Vin_Diesel
16


### Question 8

What are the genres of the movie “The Royal Tenenbaums”?


**Options:**
 - Action, Romance
 - <mark>Drama, Comedy</mark>
 - Crime, Thriller
 - Drama, Romance

In [65]:
%%sql
SELECT
    m.title,
    gn.genre_name
 FROM
    movies as m
JOIN
     genremap as g 
     ON m.movie_id = g.movie_id
 JOIN
    genres as gn
    ON g.genre_id = gn.genre_id
WHERE
     title = "The Royal Tenenbaums";

 * sqlite:///TMDB_db.db
Done.


title,genre_name
The Royal Tenenbaums,Drama
The Royal Tenenbaums,Comedy


### Question 9

What are the three production companies that have the highest movie popularity score on average, as recorded within the database?


**Options:**

 - MCL Films S.A., Turner Pictures, and George Stevens Productions
 - <mark>The Donners' Company, Bulletproof Cupid, and Kinberg Genre</mark>
 - Bulletproof Cupid, The Donners' Company, and MCL Films S.A
 - B.Sting Entertainment, Illumination Pictures, and Aztec Musique

In [69]:
%%sql
SELECT
    pc.production_company_name,
    ROUND(AVG(m.popularity),2) as avg_popularity
FROM
    productioncompanies as pc
JOIN
    productioncompanymap as pcm
    ON pc.production_company_id = pcm.production_company_id
JOIN
    movies as m
    ON pcm.movie_id = m.movie_id
GROUP BY
    production_company_name
ORDER BY
    avg_popularity DESC
LIMIT 3;

 * sqlite:///TMDB_db.db
Done.


production_company_name,avg_popularity
The Donners' Company,514.57
Bulletproof Cupid,481.1
Kinberg Genre,326.92


### Question 10

How many female actors (i.e. gender = 1) have a name that starts with the letter "N"?


**Options:**

 - 0
 - <mark>355</mark>
 - 7335
 - 1949

In [72]:
%%sql
SELECT
    COUNT(actor_name) as number_of_female_actors
FROM(
SELECT
    gender,
    actor_name
FROM
    actors
WHERE
    gender = 1
    AND actor_name LIKE 'N%'
);

 * sqlite:///TMDB_db.db
Done.


number_of_female_actors
355


### Question 11

Which genre has, on average, the lowest movie popularity score? 


**Options:**

 - Science Fiction
 - Animation
 - Documentary
 - <mark>Foreign</mark>

In [76]:
%%sql
SELECT
    gn.genre_name,
    ROUND(AVG(m.popularity),2) as avg_popularity_asc
 FROM
    movies as m
JOIN
     genremap as g 
     ON m.movie_id = g.movie_id
 JOIN
    genres as gn
    ON g.genre_id = gn.genre_id
GROUP BY 
    genre_name
ORDER BY
    avg_popularity_asc ASC;

 * sqlite:///TMDB_db.db
Done.


genre_name,avg_popularity_asc
Foreign,0.69
Documentary,3.95
TV Movie,6.39
Music,13.1
Romance,15.96
History,17.44
Drama,17.76
Comedy,18.22
Western,18.24
Horror,18.3


### Question 12

Which award category has the highest number of actor nominations (actors can be male or female)? (Hint: `Oscars.name` contains both actors' names and film names.)

**Options:**

- Special Achievement Award
- <mark>Actor in a Supporting Role</mark>
- Actress in a Supporting Role
- Best Picture



In [3]:
%%sql
SELECT 
    award,
    name,
    film,
    COUNT(award) as award_nomination
FROM 
    oscars
WHERE
    award LIKE'%actor%'
    OR award LIKE '%actress%'
    AND winner = 1.0
GROUP BY
    award
ORDER BY
    award_nomination DESC;

 * sqlite:///TMDB_db.db
Done.


award,name,film,award_nomination
Actress in a Supporting Role,Gale Sondergaard,Anthony Adverse,1
Actress in a Leading Role,Faye Dunaway,Network,1
Actress,Janet Gaynor,7th Heaven,1
Actor in a Supporting Role,Mischa Auer,My Man Godfrey,1
Actor in a Leading Role,Robert De Niro,Taxi Driver,1
Actor,Richard Barthelmess,The Noose,1


In [98]:
%%sql
SELECT*
FROM oscars
LIMIT 2;

 * sqlite:///TMDB_db.db
Done.


year,award,winner,name,film
1928,Actor,,Richard Barthelmess,The Noose
1928,Actor,1.0,Emil Jannings,The Last Command


### Question 13

For all of the entries in the Oscars table before 1934, the year is stored differently than in all the subsequent years. For example, the year would be saved as “1932/1933” instead of just “1933” (the second indicated year).  Which of the following options would be the appropriate code to update this column to have the format of the year be consistent throughout the entire table (second indicated year only shown)?


**Options:**

- `UPDATE Oscars SET year = RIGHT(year, -4)`
- `UPDATE Oscars SET year = SELECT substr(year, -4)`
- <mark> UPDATE Oscars SET year = substr(year, -4) </mark>
- `UPDATE Oscars year =  substr(year, 4)`

In [None]:
%%sql
UPDATE Oscars SET year = substr(year, -4)

### Question 14

DStv will be having a special week dedicated to the actor Alan Rickman. Which of the following queries would create a new _view_ that shows the titles, release dates, taglines, and overviews of all movies that Alan Rickman has played in?



**Options:**

- SELECT title, release_date, tagline, overview 
FROM Movies LEFT JOIN Casts ON Casts.movie_id = Movies.movie_id Left JOIN Actors ON Casts.actor_id = Actors.actor_id 
WHERE Actors.actor_name = 'Alan Rickman'
AS VIEW Alan_Rickman_Movies

- <mark>CREATE VIEW Alan_Rickman_Movies AS  
SELECT title, release_date, tagline, overview FROM Movies  
LEFT JOIN Casts ON Casts.movie_id = Movies.movie_id Left JOIN Actors
ON Casts.actor_id = Actors.actor_id
WHERE Actors.actor_name = 'Alan Rickman'</mark> 


- CREATE NEW VIEW  Name  = Alan_Rickman_Movies AS SELECT title, release_date, tagline, overview FROM Movies LEFT JOIN Casts ON Casts.movie_id = Movies.movie_id Left JOIN Actors ON Casts.actor_id = Actors.actor_id WHERE Actors.actor_name = 'Alan Rickman'

- VIEW Alan_Rickman_Movies AS SELECT title, release_date, tagline, overview FROM Movies LEFT JOIN Casts ON Casts.movie_id = Movies.movie_id Left JOIN Actors ON Casts.actor_id = Actors.actor_id WHERE Actors.actor_name = 'Alan Rickman'

In [None]:
%%sql
CREATE VIEW Alan_Rickman_Movies AS
SELECT title, release_date, tagline, overview FROM Movies
LEFT JOIN Casts ON Casts.movie_id = Movies.movie_id Left JOIN Actors ON Casts.actor_id = Actors.actor_id WHERE Actors.actor_name = 'Alan Rickman'

### Question 15

Which of the statements about database normalisation are true?

**Statements:**
 
i) Database normalisation **improves** data redundancy, saves on storage space, and fulfils the requirement of records to be uniquely identified.

ii) Database normalisation supports up to the Third Normal Form and removes all data anomalies.

iii) Database normalisation removes inconsistencies that may cause the analysis of our data to be more complicated.

iv) Database normalisation increases data redundancy, saves on storage space, and fulfils the requirement of records to be uniquely identified.

**Options:**

 - (i) and (ii)
 - <mark>(i) and (iii)</mark>
 - (ii) and (iv)
 - (iii) and (iv)

#  

<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/ExploreAI_logos/EAI_Blue_Dark.png"  style="width:200px";/>
</div>