# Project 3 Part 3 - MySQL DB
Cameron Peace

### Task

***For Part 3 of the project you will practice applying an ETL process to your previously saved movie data. Specifically, you will prepare the data for a relational database and then create a new MySQL database from it. You will export your database to a .sql file in your repository using MySQL Workbench.***

### Assignment

Specifications - Database

Your stakeholder wants you to take the data you have been cleaning and collecting in Parts 1 & 2 of the project and create a MySQL database for them.

Specifically, they want the data from the following files included in your database:

* Title Basics:
    * Movie ID (tconst)
    * Primary Title
    * Start Year
    * Runtime (in Minutes)
    * Genres
* Title Ratings
    * Movie ID (tconst)
    * Average Movie Rating
    * Number of Votes
* The TMDB API Results (multiple files)
    * Movie ID
    * Revenue
    * Budget
    * Certification (MPAA Rating)
    
You should normalize the tables as best you can before adding them to your new database.

>**Note: an important exception to their request is that they would like you to keep all of the data from the TMDB API in 1 table together (even though it will not be perfectly normalized).**

>**You only need to keep the imdb_id, revenue, budget, and certification columns**
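A minimal sketch of how the TMDB results could be combined into one table and trimmed to the four requested columns. The two small frames stand in for the separate API result files; their IDs, values, and the extra `title` column are invented for illustration. De-duplicating on `imdb_id` is an assumption on my part, made because that column later serves as the table's primary key.

```python
import pandas as pd

# Hypothetical stand-ins for two TMDB API result files (all values invented).
tmdb_a = pd.DataFrame({
    "imdb_id": ["tt9999991", "tt9999992"],
    "title": ["Example A", "Example B"],
    "revenue": [1000.0, 0.0],
    "budget": [500.0, 0.0],
    "certification": ["PG", None],
})
tmdb_b = pd.DataFrame({
    "imdb_id": ["tt9999992", "tt9999993"],
    "title": ["Example B", "Example C"],
    "revenue": [0.0, 2500.0],
    "budget": [0.0, 900.0],
    "certification": [None, "R"],
})

# The only columns the stakeholder asked to keep.
keep_cols = ["imdb_id", "revenue", "budget", "certification"]

# Stack the files, keep the requested columns, and keep one row per movie
# (assumption: duplicates across files are exact repeats of the same movie).
tmdb_data = (
    pd.concat([tmdb_a, tmdb_b], ignore_index=True)[keep_cols]
    .drop_duplicates(subset="imdb_id")
    .reset_index(drop=True)
)

print(tmdb_data)
```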

Required Transformation Steps for Title Basics:

* Normalize Genre:

    * Convert the single string of genres from title basics into 2 new tables.
    * [ ] title_genres, with the columns:
        * tconst
        * genre_id
    * [ ] genres, with the columns:
        * genre_id
        * genre_name
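The genre normalization described above can be sketched with pandas as follows. This is one possible approach, not necessarily the one used later in the notebook. The three sample rows are taken from the title basics data; the sketch assumes `genres` has no missing values and assigns `genre_id` by alphabetical order, which is an arbitrary choice.

```python
import pandas as pd

# Three sample rows in the shape of the title basics data.
basics = pd.DataFrame({
    "tconst": ["tt0035423", "tt0062336", "tt0088751"],
    "genres": ["Comedy,Fantasy,Romance", "Drama", "Comedy,Horror,Sci-Fi"],
})

# Split the comma-separated string and produce one row per (movie, genre) pair.
exploded = (
    basics.assign(genre_name=basics["genres"].str.split(","))
    .explode("genre_name")[["tconst", "genre_name"]]
)

# genres: one row per distinct genre, with an integer surrogate key.
genre_names = sorted(exploded["genre_name"].unique())
genres = pd.DataFrame({
    "genre_id": range(len(genre_names)),
    "genre_name": genre_names,
})

# title_genres: the joiner table, holding only the two key columns.
title_genres = (
    exploded.merge(genres, on="genre_name")[["tconst", "genre_id"]]
    .reset_index(drop=True)
)

print(genres)
print(title_genres)
```

Keeping the genre names in their own table means each name is stored once, and `title_genres` only carries the many-to-many links between movies and genres.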

Discard unnecessary information:

For the title basics table, drop the following columns:

* [ ] "original_title" (we will use the primary title column instead)
* [ ] "isAdult" ("Adult" will show up in the genres so this is redundant information).
* [ ] "titleType" (every row will be a movie).
* [ ] "genres" and other variants of genre (genre is now represented in the 2 new tables described above).
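A short sketch of the column drop, using one row in the shape of the title basics file. Note that the file uses camelCase names such as `originalTitle` rather than `original_title` as written in the task, so the list below follows the file. The sketch only drops the four columns named above.

```python
import pandas as pd

# One sample row in the shape of the title basics file.
basics = pd.DataFrame({
    "tconst": ["tt0035423"],
    "titleType": ["movie"],
    "primaryTitle": ["Kate & Leopold"],
    "originalTitle": ["Kate & Leopold"],
    "isAdult": [0],
    "startYear": [2001.0],
    "endYear": [None],
    "runtimeMinutes": [118],
    "genres": ["Comedy,Fantasy,Romance"],
})

# Columns the task asks to discard, spelled as they appear in the file.
drop_cols = ["originalTitle", "isAdult", "titleType", "genres"]

# drop() returns a new frame, so the original `basics` stays available
# for building the genre tables.
title_basics = basics.drop(columns=drop_cols)

print(title_basics.columns.tolist())
```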

Do not include the title_akas table in your SQL database.
You have already filtered out the desired movies using this table, and the remaining data is mostly nulls and not of interest to the stakeholder.

MySQL Database Requirements

Use sqlalchemy with pandas to execute your SQL queries inside your notebook.

* [ ] Create a new database on your MySQL server and call it "movies".

* Make sure to have the following tables in your "movies" database:
    * [ ] title_basics
    * [ ] title_ratings
    * [ ] title_genres
    * [ ] genres
    * [ ] tmdb_data

Make sure to set a Primary Key for each table that isn't a joiner table (e.g. title_genres is a joiner table).

* [ ] After creating each table, show the first 5 rows of that table using a SQL query.

* [ ] Make sure to run the "SHOW TABLES" SQL query at the end of your notebook to show that all required tables have been created.
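The requirements above could be covered by a few small helpers like the ones below. This is a sketch under several assumptions: a MySQL server running on `localhost`, the `pymysql` driver installed, and placeholder credentials. The helper names (`mysql_url`, `key_dtype`, `create_movies_database`, `load_table`, `show_tables`) are hypothetical and not part of sqlalchemy or pandas.

```python
from urllib.parse import quote_plus

import pandas as pd
from sqlalchemy import create_engine, text
from sqlalchemy.types import String


def mysql_url(user, password, host="localhost", database=""):
    """Build a connection URL; assumes the pymysql driver is installed."""
    return f"mysql+pymysql://{user}:{quote_plus(password)}@{host}/{database}"


def key_dtype(df, key):
    """Give string keys a bounded VARCHAR.

    pandas would otherwise write strings as TEXT, which MySQL will not
    accept as a primary key without a length.
    """
    if pd.api.types.is_string_dtype(df[key]):
        return {key: String(int(df[key].str.len().max()))}
    return {}


def create_movies_database(user, password):
    """Create the "movies" database if needed and return an engine for it."""
    server = create_engine(mysql_url(user, password))
    with server.begin() as conn:
        conn.execute(text("CREATE DATABASE IF NOT EXISTS movies"))
    return create_engine(mysql_url(user, password, database="movies"))


def load_table(engine, df, name, key=None):
    """Write df as table `name`, set its primary key, and return the first 5 rows.

    Pass key=None for joiner tables such as title_genres.
    """
    dtype = key_dtype(df, key) if key else {}
    df.to_sql(name, engine, dtype=dtype or None, if_exists="replace", index=False)
    with engine.begin() as conn:
        if key:
            conn.execute(text(f"ALTER TABLE `{name}` ADD PRIMARY KEY (`{key}`)"))
    with engine.connect() as conn:
        return pd.read_sql(text(f"SELECT * FROM `{name}` LIMIT 5"), conn)


def show_tables(engine):
    """Run SHOW TABLES to confirm that every required table exists."""
    with engine.connect() as conn:
        return pd.read_sql(text("SHOW TABLES"), conn)


# Example usage (needs a running MySQL server; credentials are placeholders):
# engine = create_movies_database("root", "your_password")
# load_table(engine, title_basics, "title_basics", key="tconst")
# load_table(engine, title_genres, "title_genres")  # joiner table, no key
# show_tables(engine)
```

Setting the key with `ALTER TABLE` after `to_sql` is one way to do it, since `to_sql` on its own does not create a primary key.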

Deliverables

Submit a link to your GitHub repository containing the Jupyter Notebook file.

### Imports

In [3]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns

### Loading, Viewing Data

In [5]:
# loading data
basics = pd.read_csv('Data/title_basics.csv.gz')
ratings = pd.read_csv('Data/title_ratings.csv.gz')

In [6]:
# initial view
display(basics.head(), ratings.head())

| Unnamed: 0 | tconst | titleType | primaryTitle | originalTitle | isAdult | startYear | endYear | runtimeMinutes | genres |
|---|---|---|---|---|---|---|---|---|---|
| 0 | tt0035423 | movie | Kate & Leopold | Kate & Leopold | 0 | 2001.0 | | 118 | Comedy,Fantasy,Romance |
| 1 | tt0062336 | movie | The Tango of the Widower and Its Distorting Mi... | El tango del viudo y su espejo deformante | 0 | 2020.0 | | 70 | Drama |
| 2 | tt0069049 | movie | The Other Side of the Wind | The Other Side of the Wind | 0 | 2018.0 | | 122 | Drama |
| 3 | tt0088751 | movie | The Naked Monster | The Naked Monster | 0 | 2005.0 | | 100 | Comedy,Horror,Sci-Fi |
| 4 | tt0096056 | movie | Crime and Punishment | Crime and Punishment | 0 | 2002.0 | | 126 | Drama |


| Unnamed: 0 | tconst | averageRating | numVotes |
|---|---|---|---|
| 0 | tt0000001 | 5.7 | 1960 |
| 1 | tt0000002 | 5.8 | 263 |
| 2 | tt0000005 | 6.2 | 2597 |
| 3 | tt0000006 | 5.1 | 178 |
| 4 | tt0000007 | 5.4 | 816 |
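The `Unnamed: 0` column in both previews looks like a DataFrame index that was written out when the files were saved, although that is an inference from the output rather than something confirmed here. The sketch below reproduces the effect with an in-memory CSV and shows one way to avoid it on load. The two sample rows come from the ratings preview.

```python
import io

import pandas as pd

# Two sample rows from the ratings preview.
ratings = pd.DataFrame({
    "tconst": ["tt0000001", "tt0000002"],
    "averageRating": [5.7, 5.8],
    "numVotes": [1960, 263],
})

# to_csv writes the index by default, as an unnamed first column.
buffer = io.StringIO()
ratings.to_csv(buffer)

# Reading the file back without options turns that column into "Unnamed: 0".
buffer.seek(0)
reloaded = pd.read_csv(buffer)
print(reloaded.columns.tolist())

# Telling read_csv that column 0 is the index keeps it out of the data.
buffer.seek(0)
clean = pd.read_csv(buffer, index_col=0)
print(clean.columns.tolist())
```

Dropping this column before loading matters for the database, since it would otherwise be written to MySQL as an extra field.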
