# 02 – Exploratory Analysis of Curated Movie Ratings

This notebook demonstrates exploratory insights from the curated dataset:

- IMDb vs Rotten Tomatoes critic score relationship
- Rating gap (RT − IMDb) across genres
- Budget vs critic score

These visualizations support the CS598 project's goal of showing how the curated dataset enables meaningful analysis.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use('seaborn-v0_8')

df = pd.read_csv('../data/curated/movies_with_scores.csv')
df.head()

## IMDb vs Rotten Tomatoes (normalized ratings)

This chart shows how closely IMDb audience ratings track Rotten Tomatoes critic scores once both are normalized to a 0–100 scale.

In [2]:
plt.figure(figsize=(8,6))
plt.scatter(df['imdb_rating_norm'], df['rt_score_norm'], alpha=0.7)
plt.xlabel('IMDb Rating (0–100)')
plt.ylabel('Rotten Tomatoes Critic Score (0–100)')
plt.title('IMDb vs Rotten Tomatoes Ratings')
plt.grid(True)
plt.show()
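
The scatter plot gives a visual impression of agreement; a correlation coefficient puts a number on it. The following is a minimal sketch, not part of the original analysis. The helper name `score_correlation` and the toy frame are hypothetical, and Spearman is computed as Pearson on ranks so that only pandas is needed.

```python
import pandas as pd

def score_correlation(frame, x_col="imdb_rating_norm", y_col="rt_score_norm"):
    """Return the pair count plus Pearson and Spearman correlations."""
    # Keep only rows where both scores are present.
    pair = frame[[x_col, y_col]].dropna()
    ranks = pair.rank()  # average ranks, so ties are handled
    return {
        "n": len(pair),
        "pearson": pair[x_col].corr(pair[y_col]),
        # Spearman's rho is Pearson's r applied to the ranks.
        "spearman": ranks[x_col].corr(ranks[y_col]),
    }

# Hypothetical toy data for illustration only; in the notebook, pass `df`.
toy = pd.DataFrame({
    "imdb_rating_norm": [60.0, 70.0, 80.0, 90.0, None],
    "rt_score_norm": [50.0, 65.0, 85.0, 95.0, 40.0],
})
result = score_correlation(toy)
print(result)
```

Calling `score_correlation(df)` on the curated data would report how many films have both scores and how strongly the two ratings move together.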

## Rating Gap by Genre (RT − IMDb)

Positive values: critics rate the movie higher than audiences do.

Negative values: audiences rate the movie higher than critics do.

In [3]:
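# Use the first listed genre as the primary genre; label missing genres
# explicitly so they do not appear as a blank bar in the chart.
df['primary_genre'] = df['genres'].fillna('Unknown').str.split('|').str[0]
df['rating_gap'] = df['rt_score_norm'] - df['imdb_rating_norm']

# Mean gap per genre, sorted from most audience-leaning to most critic-leaning.
genre_gap = df.groupby('primary_genre')['rating_gap'].mean().sort_values()

plt.figure(figsize=(10,6))
genre_gap.plot(kind='barh', color='slateblue')
plt.axvline(0, color='black', linewidth=0.8)  # zero line: critics and audiences agree
plt.xlabel('Average (RT - IMDb) Rating Difference')
plt.ylabel('Primary Genre')
plt.title('Critic vs Audience Rating Gap by Genre')
plt.tight_layout()
plt.show()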
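
A mean gap is hard to interpret without knowing how many films sit behind it, since a genre with one or two titles can dominate the chart. The sketch below, under stated assumptions, reports the mean gap alongside the film count per genre. The helper `genre_gap_summary`, its `min_count` parameter, and the `'Unknown'` label for missing genres are illustrative choices, not part of the original notebook.

```python
import pandas as pd

def genre_gap_summary(frame, min_count=1):
    """Mean (RT - IMDb) gap and film count per primary genre."""
    data = frame.copy()
    # First '|'-separated genre is treated as primary, as in the cell above.
    data["primary_genre"] = data["genres"].fillna("Unknown").str.split("|").str[0]
    data["rating_gap"] = data["rt_score_norm"] - data["imdb_rating_norm"]
    summary = (
        data.groupby("primary_genre")["rating_gap"]
        .agg(mean_gap="mean", n_films="count")
        .sort_values("mean_gap")
    )
    # Optionally hide genres backed by too few films.
    return summary[summary["n_films"] >= min_count]

# Hypothetical toy data for illustration only; in the notebook, pass `df`.
toy = pd.DataFrame({
    "genres": ["Drama|Romance", "Drama", "Action|Thriller", None],
    "rt_score_norm": [80.0, 90.0, 50.0, 70.0],
    "imdb_rating_norm": [70.0, 70.0, 60.0, 65.0],
})
summary = genre_gap_summary(toy)
filtered = genre_gap_summary(toy, min_count=2)
print(summary)
```

Raising `min_count` is one way to keep sparsely populated genres from being over-read; the right threshold depends on the size of the dataset.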

## Budget vs Critic Score

This chart tests whether big-budget films tend to receive higher critic ratings.

Movies with a budget of 0 are excluded because a zero budget indicates missing metadata in TMDb, not a film made for free.

In [4]:
df_clean = df[df['budget'] > 0].copy()

plt.figure(figsize=(8,6))
plt.scatter(df_clean['budget'], df_clean['rt_score_norm'], alpha=0.7)
plt.xscale('log')
plt.xlabel('Budget (USD, log scale)')
plt.ylabel('Rotten Tomatoes Critic Score (0–100)')
plt.title('Budget vs Critic Score')
plt.grid(True)
plt.show()
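
To complement the scatter plot, the strength of the budget–score relationship can be quantified. The following is a rough sketch with a hypothetical helper, `budget_score_correlation`. It applies the same `budget > 0` filter as the cell above, then reports Pearson's r on log10 budget (matching the log x-axis) and Spearman's rho computed from ranks.

```python
import numpy as np
import pandas as pd

def budget_score_correlation(frame, budget_col="budget", score_col="rt_score_norm"):
    """Correlate budget with critic score, ignoring missing or zero budgets."""
    data = frame[[budget_col, score_col]].dropna()
    data = data[data[budget_col] > 0]  # zero budget = missing metadata
    log_budget = np.log10(data[budget_col])
    return {
        "n": len(data),
        "pearson_log_budget": log_budget.corr(data[score_col]),
        # Rank correlation is unaffected by the log transform.
        "spearman": data[budget_col].rank().corr(data[score_col].rank()),
    }

# Hypothetical toy data for illustration only; in the notebook, pass `df`.
toy = pd.DataFrame({
    "budget": [0, 1_000_000, 10_000_000, 100_000_000, 500_000_000],
    "rt_score_norm": [90.0, 40.0, 60.0, 55.0, 80.0],
})
stats = budget_score_correlation(toy)
print(stats)
```

Values near zero on the curated data would be consistent with budget being a weak predictor of critic score, though a correlation on a small pilot sample should be read cautiously.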

## Summary

- In the scatter plot, IMDb and Rotten Tomatoes ratings appear only modestly correlated.
- The average critic–audience gap varies by genre, with some genres showing larger disagreement than others.
- Budget does not strongly predict critic score in this pilot dataset.

These analyses illustrate how the curated dataset can support downstream research.