# Movie Ratings Analysis: Investigating Online Review Bias

## Project Overview
This data analysis project investigates potential bias in online movie ratings, focusing on Fandango's rating system. It compares movie ratings across Fandango, Rotten Tomatoes, Metacritic, and IMDb to identify systematic differences in rating patterns.

## Key Questions Addressed
- Do online movie review platforms show rating bias?
- How do Fandango's displayed ratings compare to actual user ratings?
- Is there a significant difference between critic and user ratings across platforms?
- How are poorly-rated movies scored across different platforms?

## Analysis Objectives
1. Evaluate potential bias in Fandango's rating system
2. Compare rating distributions across multiple platforms
3. Analyze the relationship between movie popularity and ratings
4. Investigate how poorly-rated movies are scored across platforms
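
Comparing rating distributions across platforms (objective 2) first requires a common scale. The sketch below is one way to do that, assuming the commonly published scales: 0-5 for Fandango, 0-100 for Rotten Tomatoes and Metacritic critic scores, and 0-10 for IMDb. The helper `to_five_star`, the `SCALE_MAX` mapping, and the sample scores are hypothetical and are not taken from the datasets.

```python
# Assumed maximum score on each platform's native scale
# (an assumption for illustration, not read from the datasets).
SCALE_MAX = {
    "fandango": 5,
    "rotten_tomatoes": 100,
    "metacritic": 100,
    "imdb": 10,
}


def to_five_star(score, platform):
    """Rescale a score from a platform's native scale to a 0-5 scale."""
    return score / SCALE_MAX[platform] * 5


# Hypothetical scores for a single film, for illustration only.
sample_scores = {
    "fandango": 4.5,
    "rotten_tomatoes": 74,
    "metacritic": 59,
    "imdb": 7.8,
}

# Put every platform on the same 0-5 scale so the values are comparable.
normalized = {
    platform: round(to_five_star(score, platform), 2)
    for platform, score in sample_scores.items()
}
print(normalized)
```

Linear rescaling keeps the ordering of films within each platform and only changes the units, which is usually enough for side-by-side distribution plots.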

## Data Sources
The analysis uses two primary datasets from FiveThirtyEight's GitHub repository:
1. `fandango_scrape.csv`: Contains movie ratings data from Fandango
2. `all_sites_scores.csv`: Includes aggregate ratings from multiple platforms

## References
- Original FiveThirtyEight Article: [Be Suspicious Of Online Movie Ratings, Especially Fandango's](http://fivethirtyeight.com/features/fandango-movies-ratings/)
- Data Source: [FiveThirtyEight GitHub Repository](https://github.com/fivethirtyeight/data)

## Environment Setup
Importing required libraries for data manipulation, analysis, and visualization.

In [1]:
# Core data manipulation and analysis libraries
import numpy as np
import pandas as pd

# Visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns

## Data Loading and Initial Exploration
Loading the two primary datasets:
1. `fandango_scrape.csv`: Fandango's movie ratings and displayed stars
2. `all_sites_scores.csv`: Aggregate ratings from multiple platforms

In [3]:
# Load Fandango ratings data
fandango = pd.read_csv("fandango_scrape.csv")

# Display basic information about the dataset
print("Fandango Dataset Overview:")
print("-" * 50)
print("\nFirst few rows:")
display(fandango.head())
print("\nDataset Info:")
# info() prints its summary and returns None, so call it directly instead of wrapping it in display()
fandango.info()
print("\nDescriptive Statistics:")
display(fandango.describe())

Fandango Dataset Overview:
--------------------------------------------------

First few rows:

|   | FILM | STARS | RATING | VOTES |
|---|------|-------|--------|-------|
| 0 | Fifty Shades of Grey (2015) | 4.0 | 3.9 | 34846 |
| 1 | Jurassic World (2015) | 4.5 | 4.5 | 34390 |
| 2 | American Sniper (2015) | 5.0 | 4.8 | 34085 |
| 3 | Furious 7 (2015) | 5.0 | 4.8 | 33538 |
| 4 | Inside Out (2015) | 4.5 | 4.5 | 15749 |

Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 504 entries, 0 to 503
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   FILM    504 non-null    object 
 1   STARS   504 non-null    float64
 2   RATING  504 non-null    float64
 3   VOTES   504 non-null    int64  
dtypes: float64(2), int64(1), object(1)
memory usage: 15.9+ KB

Descriptive Statistics:


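|       | STARS | RATING | VOTES |
|-------|-------|--------|-------|
| count | 504.0 | 504.0 | 504.0 |
| mean | 3.558532 | 3.375794 | 1147.863095 |
| std | 1.563133 | 1.491223 | 3830.583136 |
| min | 0.0 | 0.0 | 0.0 |
| 25% | 3.5 | 3.1 | 3.0 |
| 50% | 4.0 | 3.8 | 18.5 |
| 75% | 4.5 | 4.3 | 189.75 |
| max | 5.0 | 5.0 | 34846.0 |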

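
The `STARS` and `RATING` columns sit side by side in the output above, which makes the displayed-versus-actual question easy to probe. The sketch below uses only the five rows shown in the `head()` output, re-entered by hand, not the full file. The column name `STARS_DIFF` is an illustrative choice.

```python
import pandas as pd

# The five rows from the head() output, re-entered by hand for illustration.
sample = pd.DataFrame(
    {
        "FILM": [
            "Fifty Shades of Grey (2015)",
            "Jurassic World (2015)",
            "American Sniper (2015)",
            "Furious 7 (2015)",
            "Inside Out (2015)",
        ],
        "STARS": [4.0, 4.5, 5.0, 5.0, 4.5],
        "RATING": [3.9, 4.5, 4.8, 4.8, 4.5],
        "VOTES": [34846, 34390, 34085, 33538, 15749],
    }
)

# Gap between the displayed stars and the underlying rating.
# Rounding to one decimal removes floating-point noise such as 0.10000000000000009.
sample["STARS_DIFF"] = (sample["STARS"] - sample["RATING"]).round(1)

print(sample[["FILM", "STARS", "RATING", "STARS_DIFF"]])
```

In this small sample the displayed stars are never lower than the rating. Whether that pattern holds across all 504 films is what the full analysis would need to check.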

In [4]:
# Work on a copy so the originally loaded DataFrame stays unchanged
df = fandango.copy()
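
The descriptive statistics show a minimum of 0 votes, so some films have no reviews at all. The sketch below shows one possible next step on the working copy. It uses a small hypothetical frame in place of the loaded data. The rows and the name `reviewed` are illustrative, and it is an assumption that the analysis filters this way.

```python
import pandas as pd

# Hypothetical stand-in for the loaded `fandango` frame; values are illustrative only.
fandango = pd.DataFrame(
    {
        "FILM": ["Film A (2015)", "Film B (2015)", "Film C (2015)"],
        "STARS": [4.5, 0.0, 3.5],
        "RATING": [4.3, 0.0, 3.1],
        "VOTES": [1200, 0, 18],
    }
)

# Work on a copy so the loaded data stays untouched.
df = fandango.copy()

# Keep only films that received at least one vote.
reviewed = df[df["VOTES"] > 0].reset_index(drop=True)

# Edits to the copy do not propagate back to the original frame.
df.loc[0, "STARS"] = 5.0

print(len(reviewed), "reviewed films;", "original STARS for first film:", fandango.loc[0, "STARS"])
```

Because `DataFrame.copy()` makes a deep copy by default, later edits to `df` leave `fandango` available as an unmodified reference.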