# Part 4

For part 4 of the project, you will be using your MySQL database from part 3/3.5 to answer meaningful questions for your stakeholder. They want you to use your hypothesis testing and statistics knowledge to answer 3 questions about what makes a successful movie.

## Questions to Answer

* The stakeholder's first question is: does the MPAA rating of a movie (G/PG/PG-13/R) affect how much revenue the movie generates?

    * They want you to perform a statistical test to get a mathematically-supported answer.
    * They want you to report whether you found a significant difference between ratings. If so, what was the p-value of your analysis, and which rating earns the most revenue?
    * They want you to prepare a visualization that supports your finding.
* It is then up to you to think of 2 additional hypotheses to test that your stakeholder may want answered.

* Some example hypotheses you could test:

    * Do movies that are over 2.5 hours long earn more revenue than movies that are 1.5 hours long (or less)?
    * Do movies released in 2020 earn less revenue than movies released in 2018?
    * Do average movie ratings differ between release years?
    * Do some movie genres earn more revenue than others?
    * Are some genres higher rated than others? etc.
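
The first question compares revenue across more than two groups, so one common choice is a one-way ANOVA. The sketch below runs `scipy.stats.f_oneway` on a small synthetic DataFrame. The revenue values are made up for illustration, only the column names `certification` and `revenue` mirror the `early_2k_movies` table, and the 0.05 significance level is an assumed convention rather than a stakeholder requirement.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic data for illustration only; these are NOT real revenue figures.
rng = np.random.default_rng(42)
toy = pd.DataFrame({
    "certification": np.repeat(["G", "PG", "PG-13", "R"], 30),
    "revenue": np.concatenate([
        rng.normal(loc=mean, scale=20e6, size=30)
        for mean in (60e6, 80e6, 120e6, 50e6)  # made-up group means
    ]),
})

# One array of revenues per rating group.
groups = {
    rating: grp["revenue"].to_numpy()
    for rating, grp in toy.groupby("certification")
}

# Null hypothesis: all ratings have the same mean revenue.
result = stats.f_oneway(*groups.values())

alpha = 0.05  # assumed significance level
print(f"F = {result.statistic:.2f}, p = {result.pvalue:.4g}")
print("Reject the null" if result.pvalue < alpha else "Fail to reject the null")
```

ANOVA assumes roughly normal groups with similar variances. If those checks fail on the real data, `scipy.stats.kruskal` is a nonparametric alternative that takes the same per-group arrays.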

In [5]:
# Imports
import json, os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

import pymysql
pymysql.install_as_MySQLdb()

from sqlalchemy import create_engine

In [2]:
# Load the basics and ratings data
basics = pd.read_csv('Data/title_basics.csv.gz')
ratings = pd.read_csv('Data/title.ratings.csv.gz')
# Load the early-2000s movies file
early_2k_movies = pd.read_csv('Data/early_2k_movies.csv.gz')

In [6]:
# Load MySQL credentials
with open('C:/Users/Sean/.edit/my_sql.json') as f:
    login = json.load(f)
login.keys()

dict_keys(['user', 'password'])

In [9]:
# create connection to database
from urllib.parse import quote_plus as urlquote
connection = f"mysql+pymysql://{login['user']}:{urlquote(login['password'])}@localhost/makin_better_movies.sql"
engine = create_engine(connection)
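
The `quote_plus` call matters because characters such as `@` or `/` in a password would otherwise be read as URL delimiters. A minimal sketch of the escaping, using a hypothetical user, password, and database name (none of them taken from this project):

```python
from urllib.parse import quote_plus

# Hypothetical credentials, used only to show the escaping.
user = "root"
password = "p@ss/word!"
database = "example_db"

# Without escaping, the '@' inside the password would be taken as the
# separator between the credentials and the host.
escaped = quote_plus(password)
url = f"mysql+pymysql://{user}:{escaped}@localhost/{database}"

print(escaped)
print(url)
```

The resulting string has exactly one unescaped `@`, the one that separates the credentials from the host.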

In [10]:
# check if tables loaded
q = """SHOW TABLES;"""
pd.read_sql(q,engine)

Unnamed: 0,Tables_in_makin_better_movies.sql
0,basics
1,early_2k_movies
2,genres_lookup
3,ratings
4,title_genres


In [11]:
q = """SELECT * FROM early_2k_movies;"""
pd.read_sql(q,engine)

Unnamed: 0,tconst,revenue,budget,certification
0,tt0035423,76019000.0,48000000.0,Missing
1,tt0079644,0.0,0.0,Missing
2,tt0089067,0.0,0.0,Missing
3,tt0114447,0.0,0.0,Missing
4,tt0114722,0.0,0.0,Missing
...,...,...,...,...
81220,tt9895024,0.0,0.0,Missing
81221,tt9896876,0.0,0.0,PG-13
81222,tt9898844,0.0,0.0,Missing
81223,tt9900940,0.0,0.0,Missing
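
The preview shows many rows with a `revenue` of 0.0 and a `certification` of `Missing`. If those are treated as missing data (an assumption; the notebook does not say how they arose), they could be filtered out in the query itself. The sketch below uses an in-memory SQLite table as a stand-in for MySQL so that it runs anywhere. Three rows are copied from the output above, and the two rated rows with revenue are invented.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database (assumption made so the
# sketch is self-contained).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE early_2k_movies "
    "(tconst TEXT, revenue REAL, budget REAL, certification TEXT)"
)
rows = [
    # Rows copied from the preview above.
    ("tt0035423", 76019000.0, 48000000.0, "Missing"),
    ("tt0079644", 0.0, 0.0, "Missing"),
    ("tt9896876", 0.0, 0.0, "PG-13"),
    # Hypothetical rows, invented for illustration.
    ("tt_example_1", 1000000.0, 500000.0, "PG"),
    ("tt_example_2", 2500000.0, 900000.0, "R"),
]
conn.executemany("INSERT INTO early_2k_movies VALUES (?, ?, ?, ?)", rows)

# Keep only movies with a known rating and non-zero revenue.
q = """
SELECT certification, revenue
FROM early_2k_movies
WHERE revenue > 0
  AND certification IN ('G', 'PG', 'PG-13', 'R');
"""
kept = conn.execute(q).fetchall()
print(kept)
conn.close()
```

The same query string could be passed to `pd.read_sql(q, engine)` against the MySQL database to get a DataFrame ready for testing.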


In [12]:
q = """SELECT * FROM basics;"""
pd.read_sql(q,engine)

Unnamed: 0,tconst,primaryTitle,startYear,runtimeMinutes
0,tt0035423,Kate & Leopold,2001.0,118
1,tt0062336,The Tango of the Widower and Its Distorting Mi...,2020.0,70
2,tt0069049,The Other Side of the Wind,2018.0,122
3,tt0079644,November 1828,2001.0,140
4,tt0088751,The Naked Monster,2005.0,100
...,...,...,...,...
143286,tt9916170,The Rehearsal,2019.0,51
143287,tt9916190,Safeguard,2020.0,95
143288,tt9916270,Il talento del calabrone,2020.0,84
143289,tt9916362,Coven,2020.0,92
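
One of the example hypotheses compares long and short movies, which needs `runtimeMinutes` from `basics` alongside `revenue` from `early_2k_movies`. A hedged sketch with synthetic data (every value, including the `tconst` ids, is invented) that joins the two tables and runs Welch's t-test through `stats.ttest_ind(..., equal_var=False)`, which does not assume equal variances. The cutoffs of 150 and 90 minutes correspond to 2.5 and 1.5 hours.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-ins for the two tables; every value here is invented.
rng = np.random.default_rng(0)
n = 200
basics_toy = pd.DataFrame({
    "tconst": [f"tt{i:07d}" for i in range(n)],
    "runtimeMinutes": rng.integers(60, 200, size=n),
})
movies_toy = pd.DataFrame({
    "tconst": basics_toy["tconst"],
    "revenue": rng.lognormal(mean=16, sigma=1, size=n),
})

# Join on the shared key, then split into the two runtime groups.
df = basics_toy.merge(movies_toy, on="tconst", how="inner")
long_rev = df.loc[df["runtimeMinutes"] > 150, "revenue"]
short_rev = df.loc[df["runtimeMinutes"] <= 90, "revenue"]

# Welch's t-test: compares group means without assuming equal variances.
result = stats.ttest_ind(long_rev, short_rev, equal_var=False)
print(f"n_long = {len(long_rev)}, n_short = {len(short_rev)}")
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4g}")
```

Revenue is typically right-skewed, so with real data it may be worth checking for outliers and normality before trusting the t-test.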


In [13]:
q = """SELECT * FROM ratings;"""
pd.read_sql(q,engine)

Unnamed: 0,tconst,averageRating,numVotes
0,tt0000001,5.7,1913
1,tt0000002,5.8,258
2,tt0000003,6.5,1717
3,tt0000004,5.6,170
4,tt0000005,6.2,2533
...,...,...,...
1261560,tt9916690,6.5,6
1261561,tt9916720,5.3,260
1261562,tt9916730,8.4,6
1261563,tt9916766,6.8,21


In [15]:
q = """SELECT * FROM genres_lookup;"""
pd.read_sql(q,engine)

Unnamed: 0,genre_name,genre_id
0,Action,0
1,Adult,1
2,Adventure,2
3,Animation,3
4,Biography,4
5,Comedy,5
6,Crime,6
7,Drama,7
8,Family,8
9,Fantasy,9


In [16]:
q = """SELECT * FROM title_genres;"""
pd.read_sql(q,engine)

Unnamed: 0,tconst,genre_id
0,tt0035423,5
1,tt0035423,9
2,tt0035423,18
3,tt0062336,7
4,tt0069049,7
...,...,...
248492,tt9916190,23
248493,tt9916270,23
248494,tt9916362,7
248495,tt9916362,11
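
Genre questions need the genre names, which live in `genres_lookup`, attached to each title through `title_genres`. A minimal sketch of that join in pandas, using only rows visible in the previews above (genre ids 18 and 23 are left out because their names are not shown):

```python
import pandas as pd

# Rows copied from the genres_lookup and title_genres previews.
genres_lookup = pd.DataFrame({
    "genre_name": ["Comedy", "Drama", "Fantasy"],
    "genre_id": [5, 7, 9],
})
title_genres = pd.DataFrame({
    "tconst": ["tt0035423", "tt0035423", "tt0062336", "tt0069049"],
    "genre_id": [5, 9, 7, 7],
})

# Attach the readable genre name to every (title, genre) pair.
named = title_genres.merge(genres_lookup, on="genre_id", how="left")
print(named)

# Count titles per genre, e.g. to check group sizes before a test.
counts = named["genre_name"].value_counts()
print(counts)
```

Because a title can carry several genres, the same movie appears once per genre after the join. That is worth keeping in mind when comparing genres, since the groups are not independent samples of distinct movies.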


# Hypothesis Testing