# Analysis of Fandango's Movie Ratings System


<p style="text-align:center;">
  <img src="https://fivethirtyeight.com/wp-content/uploads/2015/10/fandango_lede_revise.png?w=575" width="500" height="100">
  <br>
  Source: <a href="https://fivethirtyeight.com/features/fandango-movies-ratings/">FiveThirtyEight</a>
</p>


In this project, we analyze movie ratings data from Fandango, an online movie ratings aggregator. In 2015, a data journalist named Walt Hickey found strong evidence that Fandango's rating system was biased and dishonest. Our goal is to determine whether the rating system has changed since [Hickey's analysis](https://fivethirtyeight.com/features/fandango-movies-ratings/). To do so, we compare the data from his analysis with more recent movie ratings data, which should give us insight into the accuracy and fairness of Fandango's movie ratings.

## Project and Data Overview

One effective way to determine whether there have been any changes in Fandango's rating system since Walt Hickey's analysis is to compare the system's characteristics before and after the analysis. Luckily, we have access to the necessary data for both periods:

- Walt Hickey made the data from his analysis publicly accessible in the [FiveThirtyEight GitHub repository](https://github.com/fivethirtyeight/data/tree/master/fandango). We will use this data to analyze Fandango's rating system prior to Hickey's analysis.
- A team member from Dataquest collected movie ratings data for films released in 2016 and 2017. This data is available in the [Movie_ratings_2016_17 GitHub repository](https://github.com/mircealex/Movie_ratings_2016_17), and we will use it to analyze Fandango's rating system after Hickey's analysis.

Let's start by importing the necessary libraries, then proceed to read in both datasets.

In [1]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

pd.options.display.max_columns = 100  # Show up to 100 columns so wide DataFrames are not truncated
%matplotlib inline

In [2]:
# Read both datasets
previous = pd.read_csv('fandango_score_comparison.csv')
after = pd.read_csv('movie_ratings_16_17.csv')

In [3]:
# View first few rows of both datasets
display(previous.head())
display(after.head())

Unnamed: 0,FILM,RottenTomatoes,RottenTomatoes_User,Metacritic,Metacritic_User,IMDB,Fandango_Stars,Fandango_Ratingvalue,RT_norm,RT_user_norm,Metacritic_norm,Metacritic_user_nom,IMDB_norm,RT_norm_round,RT_user_norm_round,Metacritic_norm_round,Metacritic_user_norm_round,IMDB_norm_round,Metacritic_user_vote_count,IMDB_user_vote_count,Fandango_votes,Fandango_Difference
0,Avengers: Age of Ultron (2015),74,86,66,7.1,7.8,5.0,4.5,3.7,4.3,3.3,3.55,3.9,3.5,4.5,3.5,3.5,4.0,1330,271107,14846,0.5
1,Cinderella (2015),85,80,67,7.5,7.1,5.0,4.5,4.25,4.0,3.35,3.75,3.55,4.5,4.0,3.5,4.0,3.5,249,65709,12640,0.5
2,Ant-Man (2015),80,90,64,8.1,7.8,5.0,4.5,4.0,4.5,3.2,4.05,3.9,4.0,4.5,3.0,4.0,4.0,627,103660,12055,0.5
3,Do You Believe? (2015),18,84,22,4.7,5.4,5.0,4.5,0.9,4.2,1.1,2.35,2.7,1.0,4.0,1.0,2.5,2.5,31,3136,1793,0.5
4,Hot Tub Time Machine 2 (2015),14,28,29,3.4,5.1,3.5,3.0,0.7,1.4,1.45,1.7,2.55,0.5,1.5,1.5,1.5,2.5,88,19560,1021,0.5


Unnamed: 0,movie,year,metascore,imdb,tmeter,audience,fandango,n_metascore,n_imdb,n_tmeter,n_audience,nr_metascore,nr_imdb,nr_tmeter,nr_audience
0,10 Cloverfield Lane,2016,76,7.2,90,79,3.5,3.8,3.6,4.5,3.95,4.0,3.5,4.5,4.0
1,13 Hours,2016,48,7.3,50,83,4.5,2.4,3.65,2.5,4.15,2.5,3.5,2.5,4.0
2,A Cure for Wellness,2016,47,6.6,40,47,3.0,2.35,3.3,2.0,2.35,2.5,3.5,2.0,2.5
3,A Dog's Purpose,2017,43,5.2,33,76,4.5,2.15,2.6,1.65,3.8,2.0,2.5,1.5,4.0
4,A Hologram for the King,2016,58,6.1,70,57,3.0,2.9,3.05,3.5,2.85,3.0,3.0,3.5,3.0


According to the documentation in each repository, `fandango_score_comparison.csv` contains every film that has a Rotten Tomatoes rating, a Rotten Tomatoes user rating, a Metacritic score, a Metacritic user score, an IMDb score, and at least 30 fan reviews on Fandango. The data was collected on Aug. 24, 2015. The other file, `movie_ratings_16_17.csv`, contains ratings data for 214 of the most popular movies released in 2016 and 2017. The ratings were up to date as of March 22, 2017, so significant changes since then should be expected mostly for movies released in 2017.
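The sampling criteria described above can be spot-checked directly. The following is a minimal sketch, not part of the original notebook: `check_sampling_criteria` is a hypothetical helper, and the two small frames are stand-ins built from the preview rows shown earlier so that the block runs without the CSV files. On the real data, one would pass `previous` and `after` instead.

```python
import pandas as pd

# Stand-in frames built from the five preview rows shown above.
# In the notebook, use the full `previous` and `after` DataFrames instead.
previous_sample = pd.DataFrame({
    "FILM": [
        "Avengers: Age of Ultron (2015)",
        "Cinderella (2015)",
        "Ant-Man (2015)",
        "Do You Believe? (2015)",
        "Hot Tub Time Machine 2 (2015)",
    ],
    "Fandango_votes": [14846, 12640, 12055, 1793, 1021],
})
after_sample = pd.DataFrame({
    "movie": [
        "10 Cloverfield Lane",
        "13 Hours",
        "A Cure for Wellness",
        "A Dog's Purpose",
        "A Hologram for the King",
    ],
    "year": [2016, 2016, 2016, 2017, 2016],
})


def check_sampling_criteria(prev_df, after_df, min_votes=30, years=(2016, 2017)):
    """Hypothetical helper: compare both frames with the documented criteria."""
    return {
        # Every film in the 2015 data should have at least `min_votes` fan reviews
        "min_votes_ok": bool((prev_df["Fandango_votes"] >= min_votes).all()),
        # Every movie in the newer data should be a 2016 or 2017 release
        "years_ok": bool(after_df["year"].isin(list(years)).all()),
        # Row count of the newer data (214 is the documented size of the full file)
        "n_after": len(after_df),
    }


checks = check_sampling_criteria(previous_sample, after_sample)
print(checks)
```

If either flag came back `False` on the full datasets, that would suggest the files differ from what their documentation describes.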

Next, we isolate the relevant columns into separate variables so that the data we are interested in is easier to access later.

- For the dataset of ratings prior to Hickey's analysis, we will select the following columns: `'FILM', 'Fandango_Stars', 'Fandango_Ratingvalue', 'Fandango_votes', 'Fandango_Difference'`.
- For the other dataset, we will select the following columns: `'movie', 'year', 'fandango'`.

In [4]:
# Create new DataFrames containing only the relevant columns
fandango_previous = previous[['FILM', 'Fandango_Stars', 'Fandango_Ratingvalue', 'Fandango_votes', 'Fandango_Difference']].copy()
fandango_after = after[['movie', 'year', 'fandango']].copy()

# View results
display(fandango_previous.head())
display(fandango_after.head())

Unnamed: 0,FILM,Fandango_Stars,Fandango_Ratingvalue,Fandango_votes,Fandango_Difference
0,Avengers: Age of Ultron (2015),5.0,4.5,14846,0.5
1,Cinderella (2015),5.0,4.5,12640,0.5
2,Ant-Man (2015),5.0,4.5,12055,0.5
3,Do You Believe? (2015),5.0,4.5,1793,0.5
4,Hot Tub Time Machine 2 (2015),3.5,3.0,1021,0.5


Unnamed: 0,movie,year,fandango
0,10 Cloverfield Lane,2016,3.5
1,13 Hours,2016,4.5
2,A Cure for Wellness,2016,3.0
3,A Dog's Purpose,2017,4.5
4,A Hologram for the King,2016,3.0


Let's define the columns used in `fandango_previous` and `fandango_after`:

**fandango_previous**
- `FILM`: The film in question
- `Fandango_Stars`: The number of stars the film had on its Fandango movie page
- `Fandango_Ratingvalue`: The Fandango ratingValue for the film, as pulled from the HTML of each page. This is the actual average score the movie obtained
- `Fandango_votes`: The number of user votes the film had on Fandango
- `Fandango_Difference`: The difference between the presented *Fandango_Stars* and the actual *Fandango_Ratingvalue*

**fandango_after**
- `movie`: The name of the movie
- `year`: The release year of the movie
- `fandango`: The Fandango rating of the movie (user score)
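
Because `Fandango_Difference` is defined as the gap between the displayed stars and the actual rating, it can be recomputed from the other two columns as a sanity check. Below is a minimal sketch, assuming the five preview rows shown earlier as a stand-in for `fandango_previous`; the names `sample`, `recomputed`, and `matches` are illustrative only.

```python
import numpy as np
import pandas as pd

# Stand-in for `fandango_previous`, built from the five preview rows shown above
sample = pd.DataFrame({
    "FILM": [
        "Avengers: Age of Ultron (2015)",
        "Cinderella (2015)",
        "Ant-Man (2015)",
        "Do You Believe? (2015)",
        "Hot Tub Time Machine 2 (2015)",
    ],
    "Fandango_Stars": [5.0, 5.0, 5.0, 5.0, 3.5],
    "Fandango_Ratingvalue": [4.5, 4.5, 4.5, 4.5, 3.0],
    "Fandango_Difference": [0.5, 0.5, 0.5, 0.5, 0.5],
})

# Recompute the difference; round to one decimal to avoid float noise
recomputed = (sample["Fandango_Stars"] - sample["Fandango_Ratingvalue"]).round(1)

# Compare against the stored column with a tolerance rather than exact equality
matches = np.isclose(recomputed, sample["Fandango_Difference"])
print(matches.all())
```

In these five rows the displayed stars are always half a star above the actual rating, which is the kind of gap Hickey's analysis examined.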
