# Project Title - Movie Ratings: Critics vs. Audience 

## Data set selection

> In this section, you will need to provide the following information about the selected data set:
>
> - Source with a link: https://www.kaggle.com/datasets/subhajournal/movie-rating
> - Fields:
>   - movie_title: Name of the movie
>   - movie_info: Short description or plot summary
>   - critics_consensus: Summary statement reflecting critics' opinions
>   - rating: MPAA rating (G, PG, PG-13, R, etc.)
>   - genre: Genre(s) of the movie
>   - directors: Director(s) of the movie
>   - actors: Cast information
>   - runtime: Duration of the movie in minutes
>   - release_date_theaters: Theater release date
>   - release_date_streaming: Streaming release date
>   - studio_name: Producing or distributing studio
>   - tomatometer_status: Fresh / Rotten / Certified Fresh
>   - tomatometer_rating: Critic rating (0–100)
>   - tomatometer_count: Number of critic reviews
>   - audience_rating: Audience score (0–100)
> - License: CC BY 4.0

### Data set selection rationale

> Why did you select this data set?

I selected this dataset because movie ratings are relatable, widely discussed, and open to many kinds of statistical exploration. Rotten Tomatoes is one of the most influential platforms for movie reviews, and comparing critic and audience perspectives can reveal how different groups evaluate films. The dataset is large (15,000+ movies), has a diverse set of features, and is well structured for visual and statistical analysis. It also supports questions that film-industry stakeholders, such as production studios, streaming services, and marketing teams, would care about. Insights into genre performance, audience engagement patterns, and the factors that influence ratings can inform decisions about content creation, budgeting, and promotion.

### Questions to be answered

> - What is the relationship between critic ratings and audience ratings?
> - How do Tomatometer ratings vary across different genres?
> - Does runtime influence how movies are rated?
> - How do MPAA ratings (G, PG, PG-13, R) compare in terms of average audience rating and popularity?
> - Do movies from certain studios tend to receive higher critic or audience ratings?

### Visualization ideas

> Distributions of critic and audience scores (overall and by genre)

- Charts: Histograms or KDE plots of tomatometer_rating and audience_rating; histograms or boxplots by genre (top 5–10 genres).
- Purpose: Show the overall shape of the ratings and which genres have tighter or wider spreads in scores.
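
Before drawing per-genre boxplots, the spread they would show can be tabulated directly. The sketch below is one possible approach, not a fixed part of the analysis. It assumes the `genre` column stores several genre names in one comma-separated string, which should be checked against the actual file; the helper name `genre_score_spread` and its parameters are hypothetical.

```python
import pandas as pd


def genre_score_spread(df, score_col="tomatometer_rating", top_n=10, sep=","):
    """Summarize the spread of one score column for the most common genres.

    Assumption: `genre` holds one or more genre names separated by `sep`.
    """
    # One row per (movie, genre) pair, so a multi-genre movie counts in each genre.
    long = df[["genre", score_col]].dropna().copy()
    long["genre"] = long["genre"].str.split(sep)
    long = long.explode("genre")
    long["genre"] = long["genre"].str.strip()

    # Keep only the top_n most frequent genres.
    top_genres = long["genre"].value_counts().head(top_n).index
    long = long[long["genre"].isin(top_genres)]

    # Spread statistics that a boxplot would show visually.
    grouped = long.groupby("genre")[score_col]
    summary = grouped.agg(count="count", median="median", std="std")
    summary["iqr"] = grouped.quantile(0.75) - grouped.quantile(0.25)
    return summary.sort_values("iqr")
```

Calling `genre_score_spread(df, top_n=10)` after the dataset is loaded returns one row per genre, sorted from the tightest to the widest interquartile range. Because each movie is counted once per listed genre, the per-genre counts add up to more than the number of movies.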

> Critic vs audience agreement

- Charts: Scatter plot of tomatometer_rating vs audience_rating, with points colored by genre; an interactive Plotly scatter where hovering shows movie_title, genre, and studio.
- Purpose: Identify clusters (genres where critics and audiences mostly agree) and outliers (movies audiences love but critics dislike, and vice versa).
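
The outliers described here can also be located numerically, which helps decide which points to annotate on the scatter plot. The following is a minimal sketch under the assumption that both scores are on the same 0 to 100 scale, as the field list states; the helper name `critic_audience_gap` and the `gap` column are hypothetical.

```python
import pandas as pd


def critic_audience_gap(df, n=5):
    """Return the overall correlation and the movies with the largest score gaps."""
    cols = ["movie_title", "tomatometer_rating", "audience_rating"]
    scores = df[cols].dropna(subset=["tomatometer_rating", "audience_rating"]).copy()

    # Pearson correlation between the two scores.
    corr = scores["tomatometer_rating"].corr(scores["audience_rating"])

    # Positive gap: audiences rated the movie higher than critics did.
    scores["gap"] = scores["audience_rating"] - scores["tomatometer_rating"]
    audience_favorites = scores.nlargest(n, "gap")
    critic_favorites = scores.nsmallest(n, "gap")
    return corr, audience_favorites, critic_favorites
```

A correlation close to 1 would suggest that critics and audiences largely agree, while the two returned tables list the movies where they disagree most in each direction.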

> Runtime and ratings

- Charts: Scatter plot of runtime_in_minutes vs audience_rating and/or tomatometer_rating. Possibly bin runtimes (e.g., <90, 90–120, >120 minutes) and use boxplots to compare scores across bins.
- Purpose: Check whether there is a trend (for example, mid-length movies doing best) or whether runtime matters little once genre is considered.
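
The runtime binning can be sketched with `pd.cut`. This is an illustration only: the column name `runtime_in_minutes` is taken from the chart description and may differ from the name in the loaded file, and the helper name `scores_by_runtime_bin` is hypothetical. Since `pd.cut` closes every bin on the same side, the sketch nudges the upper edge of the middle bin so that a runtime of exactly 120 minutes stays in the 90-120 group.

```python
import numpy as np
import pandas as pd


def scores_by_runtime_bin(df, runtime_col="runtime_in_minutes"):
    """Compare critic and audience scores across <90, 90-120, and >120 minute bins."""
    # Left-closed bins: [0, 90), [90, just above 120), [just above 120, inf).
    # np.nextafter keeps a runtime of exactly 120 in the middle bin.
    edges = [0, 90, np.nextafter(120.0, np.inf), np.inf]
    labels = ["<90", "90-120", ">120"]

    binned = df.copy()
    binned["runtime_bin"] = pd.cut(
        binned[runtime_col], bins=edges, labels=labels, right=False
    )

    # Rows with a missing runtime get no bin and drop out of the groupby.
    score_cols = ["tomatometer_rating", "audience_rating"]
    return binned.groupby("runtime_bin", observed=True)[score_cols].agg(
        ["median", "count"]
    )
```

The result has one row per non-empty runtime bin, with the median score and the number of rated movies for each score column. The same `runtime_bin` column could serve as the x-axis of a boxplot.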

> MPAA rating vs scores

- Charts: Grouped bar chart or boxplot of tomatometer_rating and audience_rating by rating (G, PG, PG-13, R, etc.).
- Purpose: Reveal which content ratings tend to be more positively received, and whether critics and audiences differ (for example, R-rated dramas may be critic favorites but not audience favorites).
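
The table behind a grouped bar chart of this kind could be built with a named aggregation. This is a sketch that assumes the column names from the field list; the helper name `scores_by_mpaa_rating` and the output column names are hypothetical.

```python
import pandas as pd


def scores_by_mpaa_rating(df):
    """Mean critic and audience score per MPAA rating, plus the number of movies."""
    summary = df.groupby("rating").agg(
        critic_mean=("tomatometer_rating", "mean"),
        audience_mean=("audience_rating", "mean"),
        movies=("movie_title", "count"),
    )
    # Positive values: audiences score this rating category higher than critics.
    summary["audience_minus_critic"] = (
        summary["audience_mean"] - summary["critic_mean"]
    )
    return summary.sort_values("movies", ascending=False)
```

Selecting the two mean columns and calling `.plot.bar()` on them would give the grouped bar chart, with one pair of bars per MPAA rating. The `movies` column shows how many titles each mean is based on, which matters for rare categories.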

> Studio comparison

- Charts: Bar chart of mean tomatometer_rating by studio_name (for studios with at least a minimum number of movies, e.g., > 20).
- Purpose: Show which studios are most consistent in producing highly rated films, which is useful from a branding and partnership perspective.
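
The minimum-count filter matters because a studio with one or two highly rated films would otherwise top the ranking. One way to apply it is sketched below; the helper name `top_studios` is hypothetical, and the threshold of 20 is the example value mentioned above rather than a tuned choice.

```python
import pandas as pd


def top_studios(df, score_col="tomatometer_rating", min_movies=20):
    """Rank studios by mean score, keeping studios with more than min_movies rated titles."""
    per_studio = df.groupby("studio_name")[score_col].agg(
        mean_score="mean", movies="count"
    )
    # "count" ignores missing scores, so only rated movies count toward the threshold.
    eligible = per_studio[per_studio["movies"] > min_movies]
    return eligible.sort_values("mean_score", ascending=False)
```

The mean alone does not measure consistency. If consistency is the goal, adding a standard deviation column to the aggregation would show how much each studio's scores vary.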


In [None]:
# 🚀 Importing some libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# Import the dataset
df = pd.read_csv("data/Rotten Tomatoes Movies.csv")
df.head()
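
Since the field list and the chart descriptions refer to the runtime column under two different names, it may be worth confirming which columns the loaded file actually contains before plotting. The check below is a small sketch; `EXPECTED_COLUMNS` is an assumed list based on the columns named in this notebook, and `summarize_columns` is a hypothetical helper.

```python
import pandas as pd

# Assumption: columns the planned charts rely on. Adjust to match df.columns.
EXPECTED_COLUMNS = [
    "movie_title",
    "genre",
    "rating",
    "studio_name",
    "runtime_in_minutes",
    "tomatometer_rating",
    "audience_rating",
]


def summarize_columns(df, expected=EXPECTED_COLUMNS):
    """Return the expected columns that are absent and the share of missing values in the rest."""
    missing = [col for col in expected if col not in df.columns]
    present = [col for col in expected if col in df.columns]
    na_share = df[present].isna().mean()
    return missing, na_share
```

Running `summarize_columns(df)` right after loading returns the list of absent column names and, for each column that is present, the fraction of rows with no value.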