In [2]:
# 📘 IMDb Flop Score: Merge and Prepare

# 📦 Imports
import pandas as pd
from google.colab import files

# 📁 Upload both CSV files
print("⬆️ Please upload 'worst_200_movies.csv' and 'tmdb_movie_data_by_title.csv'")
uploaded = files.upload()

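# 📂 Load CSVs
# Note: if a file with the same name already exists in the working directory,
# Colab saves the upload under a new name (e.g. 'worst_200_movies (1).csv'),
# so the reads below pick up the older copy. Remove stale copies before re-uploading.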
worst_df = pd.read_csv('worst_200_movies.csv')
tmdb_df = pd.read_csv('tmdb_movie_data_by_title.csv')

# 🧾 Check columns for consistency
print("Worst Movies columns:", worst_df.columns.tolist())
print("TMDb Data columns:", tmdb_df.columns.tolist())

# 🛠️ Rename columns to ensure they match for merge
tmdb_df = tmdb_df.rename(columns={
    'title': 'primaryTitle',
    'year': 'startYear'
})

# 🔗 Merge datasets on title + year
merged_df = pd.merge(
    worst_df,
    tmdb_df,
    on=['primaryTitle', 'startYear'],
    how='left'
)

# 📊 Preview merged data
print("✅ Merged shape:", merged_df.shape)
print(merged_df.head())  # a bare expression in the middle of a cell is not displayed

# 💾 Save merged data
import os
os.makedirs('data', exist_ok=True)
merged_df.to_csv('data/enriched_worst_200_movies.csv', index=False)
print("✅ Saved merged dataset to 'data/enriched_worst_200_movies.csv'")


⬆️ Please upload 'worst_200_movies.csv' and 'tmdb_movie_data_by_title.csv'


Saving worst_200_movies.csv to worst_200_movies (1).csv
Saving tmdb_movie_data_by_title.csv to tmdb_movie_data_by_title.csv
Worst Movies columns: ['primaryTitle', 'startYear', 'averageRating', 'numVotes', 'genres']
TMDb Data columns: ['title', 'year', 'tmdb_id', 'budget', 'revenue', 'release_date', 'popularity', 'vote_average', 'vote_count']
✅ Merged shape: (200, 12)
✅ Saved merged dataset to 'data/enriched_worst_200_movies.csv'


# IMDb Flop Score: Step 2 - Merging and Data Preparation

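This notebook merges the IMDb worst-200 movies dataset (`worst_200_movies.csv`) with TMDb metadata (`tmdb_movie_data_by_title.csv`): budget, revenue, release date, popularity, and TMDb vote statistics. The merge is a left join, so every IMDb row is kept even when no TMDb match exists.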

## 📌 Goals
- Left-join the two datasets on `primaryTitle` and `startYear`
- Produce a clean, enriched dataset ready for Flop Score analysis
- Save the result to `data/enriched_worst_200_movies.csv` for further exploration
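
A title + year join only matches when both keys agree exactly, so it can help to normalize the keys and check how many rows actually found a TMDb match. The following is a minimal sketch of that idea, not the code this notebook ran: the toy rows and their values are hypothetical, and the whitespace stripping, the integer cast, and the `validate` / `indicator` options are assumptions about what a check might look like.

```python
import pandas as pd

# Hypothetical toy rows standing in for the two CSVs; all values are made up.
worst = pd.DataFrame({
    "primaryTitle": ["Movie A", "Movie B", "Movie C"],
    "startYear": [2001, 2005, 2010],
    "averageRating": [1.9, 2.1, 2.3],
})
tmdb = pd.DataFrame({
    "title": ["Movie A ", "Movie B"],  # note the stray trailing space
    "year": ["2001", "2005"],          # years stored as strings
    "budget": [1_000_000, 5_000_000],
})

# Align key names and dtypes before joining.
tmdb = tmdb.rename(columns={"title": "primaryTitle", "year": "startYear"})
tmdb["primaryTitle"] = tmdb["primaryTitle"].str.strip()
tmdb["startYear"] = tmdb["startYear"].astype("int64")

merged = worst.merge(
    tmdb,
    on=["primaryTitle", "startYear"],
    how="left",
    validate="many_to_one",  # raises if TMDb has duplicate (title, year) keys
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)

# Count matched vs. unmatched rows and list the titles without TMDb data.
match_counts = merged["_merge"].value_counts()
unmatched = merged.loc[merged["_merge"] == "left_only", "primaryTitle"].tolist()
print(match_counts)
print("Unmatched titles:", unmatched)
```

Rows flagged `left_only` keep their IMDb fields but have missing values in the TMDb columns, which is worth knowing before computing anything from budget or revenue.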

➡️ This merged data will feed into notebook 03: Flop Score calculation & EDA

