# Movie Data Analysis

This Jupyter notebook analyzes movie data fetched from the TMDB API. It uses functions defined in `tmdb_functions.py` to fetch and clean the data, calculate KPIs, run advanced searches, compare franchise and standalone movies, analyze franchises and directors, and generate visualizations.

**Prerequisites**:
- Ensure `tmdb_functions.py` (the module imported below) is in the same directory as this notebook.
- Set the TMDB API key as an environment variable (`export api_key='your_api_key_here'`), or uncomment the `os.environ` line in the next cell and replace `'YOUR_TMDB_API_KEY'` with your key.
- Install required libraries: `pip install requests pandas matplotlib`
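
The environment-variable prerequisite can be checked before any request is made. The following is a minimal sketch, not the implementation of `get_api_key` in `tmdb_functions.py`; the helper name `read_api_key` is hypothetical, and only the variable name `api_key` comes from the instructions above.

```python
import os


def read_api_key(env_var: str = "api_key") -> str:
    """Return the TMDB API key stored in the given environment variable.

    Hypothetical helper: the real get_api_key() may behave differently.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {env_var!r} is not set; "
            "export it before starting Jupyter."
        )
    return key
```

Failing early with a clear message avoids sending requests that the API would reject for a missing key.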

In [9]:
# Import necessary libraries
import os
import pandas as pd
import matplotlib.pyplot as plt

# Optionally set the TMDB API key here instead of exporting it in the shell
# os.environ['api_key'] = 'YOUR_TMDB_API_KEY'  # Replace with your actual API key

In [10]:


# Import functions from tmdb_functions.py
from tmdb_functions import (
    get_api_key,
    fetch_movie_data,
    save_df,
    load_df,
    clean_df,
    kpi_ranking,
    advanced_search,
    franchise_vs_standalone,
    analyze_franchise,
    analyze_directors,
    plot_revenue_vs_budget,
    plot_roi_by_genre,
    plot_popularity_vs_rating,
    plot_yearly_box_office,
    plot_franchise_vs_standalone
)

In [11]:
# Get the API key
get_api_key()

# Define the movie IDs to fetch
movie_ids = [0, 299534, 19995, 140607, 299536, 597, 135397,
             420818, 24428, 168259, 99861, 284054, 12445,
             181808, 330457, 351286, 109445, 321612, 260513]

# Fetch movie data
raw_data = fetch_movie_data(movie_ids)

# Save raw data to CSV
save_df(raw_data, 'raw_movie_data_new.csv')

# Display first few rows
raw_data.head()

INFO:root:Saved to raw_movie_data_new.csv


Unnamed: 0,adult,backdrop_path,belongs_to_collection,budget,genres,homepage,id,imdb_id,origin_country,original_language,...,revenue,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count,credits
0,False,/7RyHsO4yDXtBv1zUU3mTpHeQ0d5.jpg,"{'id': 86311, 'name': 'The Avengers Collection...",356000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 878, ...",https://www.marvel.com/movies/avengers-endgame,299534,tt4154796,[US],en,...,2799439100,181,"[{'english_name': 'English', 'iso_639_1': 'en'...",Released,Avenge the fallen.,Avengers: Endgame,False,8.237,26240,"{'cast': [{'adult': False, 'gender': 2, 'id': ..."
1,False,/vL5LR6WdxWPjLPFRLe133jXWsh5.jpg,"{'id': 87096, 'name': 'Avatar Collection', 'po...",237000000,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...",https://www.avatar.com/movies/avatar,19995,tt0499549,[US],en,...,2923706026,162,"[{'english_name': 'English', 'iso_639_1': 'en'...",Released,Enter the world of Pandora.,Avatar,False,7.588,32153,"{'cast': [{'adult': False, 'gender': 2, 'id': ..."
2,False,/k6EOrckWFuz7I4z4wiRwz8zsj4H.jpg,"{'id': 10, 'name': 'Star Wars Collection', 'po...",245000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 28, '...",http://www.starwars.com/films/star-wars-episod...,140607,tt2488496,[US],en,...,2068223624,136,"[{'english_name': 'English', 'iso_639_1': 'en'...",Released,Every generation has a story.,Star Wars: The Force Awakens,False,7.261,19686,"{'cast': [{'adult': False, 'gender': 2, 'id': ..."
3,False,/mDfJG3LC3Dqb67AZ52x3Z0jU0uB.jpg,"{'id': 86311, 'name': 'The Avengers Collection...",300000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 28, '...",https://www.marvel.com/movies/avengers-infinit...,299536,tt4154756,[US],en,...,2052415039,149,"[{'english_name': 'English', 'iso_639_1': 'en'...",Released,Destiny arrives all the same.,Avengers: Infinity War,False,8.235,30420,"{'cast': [{'adult': False, 'gender': 2, 'id': ..."
4,False,/sCzcYW9h55WcesOqA12cgEr9Exw.jpg,,200000000,"[{'id': 18, 'name': 'Drama'}, {'id': 10749, 'n...",https://www.paramountmovies.com/movies/titanic,597,tt0120338,[US],en,...,2264162353,194,"[{'english_name': 'English', 'iso_639_1': 'en'...",Released,Nothing on Earth could come between them.,Titanic,False,7.905,25902,"{'cast': [{'adult': False, 'gender': 2, 'id': ..."
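
The list of IDs starts with `0`, yet the first row of the output is `299534`, which suggests that failed lookups are skipped rather than raised. The sketch below shows one way such a fetch loop could look. It is an assumption, not the code of `fetch_movie_data`: the helper names are hypothetical, it uses the standard library instead of `requests`, and it assumes the TMDB v3 movie details endpoint with `append_to_response=credits` (consistent with the `credits` column above).

```python
import json
from typing import Callable, Optional
from urllib.error import HTTPError, URLError
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.themoviedb.org/3/movie"


def build_movie_url(movie_id: int, api_key: str) -> str:
    """Build the details URL for one movie, including its credits."""
    query = urlencode({"api_key": api_key, "append_to_response": "credits"})
    return f"{BASE_URL}/{movie_id}?{query}"


def http_get_json(url: str) -> Optional[dict]:
    """Fetch a URL and decode its JSON body; return None on HTTP/URL errors."""
    try:
        with urlopen(url, timeout=10) as resp:
            return json.load(resp)
    except (HTTPError, URLError):
        return None


def fetch_movies(
    movie_ids,
    api_key: str,
    get_json: Callable[[str], Optional[dict]] = http_get_json,
) -> list:
    """Return one record per movie ID, skipping IDs that cannot be fetched."""
    records = []
    for movie_id in movie_ids:
        payload = get_json(build_movie_url(movie_id, api_key))
        if payload is not None:
            records.append(payload)
    return records
```

Passing the HTTP function as a parameter keeps the loop testable without network access; the resulting list of dicts could then be handed to `pd.DataFrame`.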


In [12]:
# Clean the data
cleaned_data = clean_df(raw_data)

# Additional cleaning steps
cleaned_data['overview'] = cleaned_data['overview'].replace('No Data', pd.NA)
cleaned_data['tagline'] = cleaned_data['tagline'].replace('No Data', pd.NA)
cleaned_data = cleaned_data.drop_duplicates().dropna(subset=['id', 'title'])
cleaned_data = cleaned_data.dropna(thresh=10)
if 'status' in cleaned_data:
    cleaned_data = cleaned_data[cleaned_data['status'] == 'Released']
    cleaned_data = cleaned_data.drop(columns=['status'])

# Save cleaned data to CSV
save_df(cleaned_data, 'cleaned_movie_data.csv')

# Display first few rows
cleaned_data.head()

INFO:root:Saved to cleaned_movie_data.csv


Unnamed: 0,id,title,tagline,release_date,genres,belongs_to_collection,original_language,budget_millions,revenue_millions,production_companies,...,runtime,overview,spoken_languages,poster_path,cast,cast_size,director,crew_size,profit,roi
0,299534,Avengers: Endgame,Avenge the fallen.,2019-04-24,Action|Adventure|Science Fiction,The Avengers Collection,en,356.0,2799.4391,Marvel Studios,...,181,After the devastating events of Avengers: Infi...,English|日本語|,/ulzhLuWrPK07P1YkdWQLZnQh1JL.jpg,Robert Downey Jr.|Chris Evans|Mark Ruffalo|Chr...,105,Joe Russo|Anthony Russo,593,2443.4391,7.863593
1,19995,Avatar,Enter the world of Pandora.,2009-12-15,Action|Adventure|Fantasy|Science Fiction,Avatar Collection,en,237.0,2923.706026,Dune Entertainment|Lightstorm Entertainment|20...,...,162,"In the 22nd century, a paraplegic Marine is di...",English|Español,/kyeqWdyUXW608qlYkRqosgbbJyK.jpg,Sam Worthington|Zoe Saldaña|Sigourney Weaver|S...,65,James Cameron,986,2686.706026,12.336312
2,140607,Star Wars: The Force Awakens,Every generation has a story.,2015-12-15,Action|Adventure|Science Fiction,Star Wars Collection,en,245.0,2068.223624,Lucasfilm Ltd.|Bad Robot,...,136,Thirty years after defeating the Galactic Empi...,English,/wqnLdwVXoBjKibFRR5U3y0aDUhs.jpg,Harrison Ford|Mark Hamill|Carrie Fisher|Adam D...,182,J.J. Abrams,257,1823.223624,8.441729
3,299536,Avengers: Infinity War,Destiny arrives all the same.,2018-04-25,Action|Adventure|Science Fiction,The Avengers Collection,en,300.0,2052.415039,Marvel Studios,...,149,As the Avengers and their allies have continue...,English|,/7WsyChQLEftFiDOVTGkv3hFpyyt.jpg,Robert Downey Jr.|Chris Evans|Chris Hemsworth|...,69,Joe Russo|Anthony Russo,724,1752.415039,6.841383
4,597,Titanic,Nothing on Earth could come between them.,1997-11-18,Drama|Romance,,en,200.0,2264.162353,Paramount Pictures|20th Century Fox|Lightstorm...,...,194,101-year-old Rose DeWitt Bukater tells the sto...,English|Français|Deutsch|svenska|Italiano|Pусский,/9xjZS2rlVxm8SFx8kPC3aIGCOYQ.jpg,Leonardo DiCaprio|Kate Winslet|Billy Zane|Kath...,116,James Cameron,258,2064.162353,11.320812
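
Two transformations are visible when comparing the raw and cleaned outputs: nested lists such as `genres` become pipe-separated strings, and `profit` and `roi` are derived from budget and revenue. The values shown are consistent with `profit = revenue - budget` and `roi = revenue / budget` (both in millions). The sketch below illustrates that arithmetic in plain Python; the function names are hypothetical and `clean_df` may implement these steps differently.

```python
def join_names(items, sep="|"):
    """Collapse a list of {'id': ..., 'name': ...} dicts into 'A|B|C'."""
    if not items:
        return None
    return sep.join(item["name"] for item in items)


def profit_and_roi(budget_millions, revenue_millions):
    """Return (profit, roi), where roi is revenue divided by budget."""
    profit = revenue_millions - budget_millions
    # Avoid dividing by zero when the budget is missing or reported as 0.
    roi = revenue_millions / budget_millions if budget_millions else None
    return profit, roi


# Values taken from the Titanic row above.
genres = [{"id": 18, "name": "Drama"}, {"id": 10749, "name": "Romance"}]
print(join_names(genres))
profit, roi = profit_and_roi(200.0, 2264.162353)
print(round(profit, 6), round(roi, 6))
```

Rounded to six decimals, the result agrees with the `profit` and `roi` columns of the Titanic row.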


In [13]:
# Calculate KPIs
print("Top 5 Movies by Revenue:")
print(kpi_ranking(cleaned_data, 'revenue_millions', n=5)[['title', 'revenue_millions']])

print("\nTop 5 Movies by Budget:")
print(kpi_ranking(cleaned_data, 'budget_millions', n=5)[['title', 'budget_millions']])

print("\nTop 5 Movies by Profit:")
print(kpi_ranking(cleaned_data, 'profit', n=5)[['title', 'profit']])

print("\nBottom 5 Movies by Profit:")
print(kpi_ranking(cleaned_data, 'profit', n=5, top=False)[['title', 'profit']])

print("\nTop 5 Movies by ROI (Budget >= 10M):")
print(kpi_ranking(cleaned_data, 'roi', n=5, filter_col='budget_millions', filter_val=10)[['title', 'roi']])

print("\nBottom 5 Movies by ROI (Budget >= 10M):")
print(kpi_ranking(cleaned_data, 'roi', n=5, top=False, filter_col='budget_millions', filter_val=10)[['title', 'roi']])

print("\nMost Voted Movies:")
print(kpi_ranking(cleaned_data, 'vote_count', n=5)[['title', 'vote_count']])

print("\nHighest Rated Movies (>= 10 votes):")
print(kpi_ranking(cleaned_data, 'vote_average', n=5, filter_col='vote_count', filter_val=10)[['title', 'vote_average']])

print("\nLowest Rated Movies (>= 10 votes):")
print(kpi_ranking(cleaned_data, 'vote_average', n=5, top=False, filter_col='vote_count', filter_val=10)[['title', 'vote_average']])

print("\nMost Popular Movies:")
print(kpi_ranking(cleaned_data, 'popularity', n=5)[['title', 'popularity']])

Top 5 Movies by Revenue:
                          title  revenue_millions
1                        Avatar       2923.706026
0             Avengers: Endgame       2799.439100
4                       Titanic       2264.162353
2  Star Wars: The Force Awakens       2068.223624
3        Avengers: Infinity War       2052.415039

Top 5 Movies by Budget:
                          title  budget_millions
9       Avengers: Age of Ultron            365.0
0             Avengers: Endgame            356.0
3        Avengers: Infinity War            300.0
6                 The Lion King            260.0
2  Star Wars: The Force Awakens            245.0

Top 5 Movies by Profit:
                          title       profit
1                        Avatar  2686.706026
0             Avengers: Endgame  2443.439100
4                       Titanic  2064.162353
2  Star Wars: The Force Awakens  1823.223624
3        Avengers: Infinity War  1752.415039

Bottom 5 Movies by Profit:
                       title     
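
The calls above suggest that `kpi_ranking` sorts by one column, optionally after keeping only rows that meet a minimum in another column. A minimal pandas sketch of that pattern follows. The function name `rank_movies` is hypothetical, and treating `filter_val` as a lower bound (`>=`) is an assumption based on the printed labels such as "Budget >= 10M".

```python
import pandas as pd


def rank_movies(df, column, n=5, top=True, filter_col=None, filter_val=None):
    """Return the n highest (or lowest) rows by `column`.

    If filter_col and filter_val are given, only rows with
    filter_col >= filter_val are ranked.
    """
    subset = df
    if filter_col is not None and filter_val is not None:
        subset = subset[subset[filter_col] >= filter_val]
    # Rows without a value in the ranking column cannot be ranked.
    subset = subset.dropna(subset=[column])
    return subset.sort_values(column, ascending=not top).head(n)


# Three rows taken from the cleaned data shown earlier.
sample = pd.DataFrame({
    "title": ["Avengers: Endgame", "Avatar", "Titanic"],
    "budget_millions": [356.0, 237.0, 200.0],
    "roi": [7.863593, 12.336312, 11.320812],
})
print(rank_movies(sample, "roi", n=2, filter_col="budget_millions", filter_val=10))
```

Applying the filter before sorting keeps low-budget or low-vote outliers from dominating ratio-based rankings such as ROI.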

In [14]:
# Advanced filtering
# Search 1: Best-rated Science Fiction Action movies starring Bruce Willis
sci_fi_action_willis = advanced_search(
    cleaned_data,
    genre_keywords='Science Fiction|Action',
    cast_keywords='Bruce Willis',
    sort_by='vote_average',
    ascending=False
)
print("Science Fiction Action Movies Starring Bruce Willis (Sorted by Rating):")
print(sci_fi_action_willis[['title', 'genres', 'cast', 'vote_average']])

# Search 2: Movies starring Uma Thurman, directed by Quentin Tarantino
thurman_tarantino = advanced_search(
    cleaned_data,
    cast_keywords='Uma Thurman',
    director_keywords='Quentin Tarantino',
    sort_by='runtime',
    ascending=True
)
print("\nMovies Starring Uma Thurman Directed by Quentin Tarantino (Sorted by Runtime):")
print(thurman_tarantino[['title', 'cast', 'director', 'runtime']])

Science Fiction Action Movies Starring Bruce Willis (Sorted by Rating):
Empty DataFrame
Columns: [title, genres, cast, vote_average]
Index: []

Movies Starring Uma Thurman Directed by Quentin Tarantino (Sorted by Runtime):
Empty DataFrame
Columns: [title, cast, director, runtime]
Index: []
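
Both searches return empty DataFrames for this dataset. One detail worth checking in `advanced_search`: if `genre_keywords='Science Fiction|Action'` is passed to a regex matcher such as `Series.str.contains`, the `|` means OR, so a movie with only one of the two genres would match. Whether that is the case is not visible in this notebook. The sketch below shows a hypothetical alternative that requires every term to be present; the names `contains_all` and `search_movies` are illustrative and not part of `tmdb_functions.py`.

```python
import pandas as pd


def contains_all(series, terms):
    """True where the string contains every term (literal, case-insensitive)."""
    mask = pd.Series(True, index=series.index)
    for term in terms:
        mask &= series.str.contains(term, case=False, regex=False, na=False)
    return mask


def search_movies(df, genres=(), cast=(), directors=(), sort_by=None, ascending=False):
    """Keep rows matching all given genre, cast, and director terms."""
    mask = (
        contains_all(df["genres"], genres)
        & contains_all(df["cast"], cast)
        & contains_all(df["director"], directors)
    )
    result = df[mask]
    return result.sort_values(sort_by, ascending=ascending) if sort_by else result


# Made-up rows for illustration only.
movies = pd.DataFrame({
    "title": ["A", "B", "C"],
    "genres": ["Action|Science Fiction", "Action|Drama", "Science Fiction"],
    "cast": ["Bruce Willis|Someone Else", "Bruce Willis", None],
    "director": ["D1", "D2", "D3"],
    "vote_average": [7.0, 8.0, 9.0],
})
hits = search_movies(
    movies,
    genres=["Science Fiction", "Action"],
    cast=["Bruce Willis"],
    sort_by="vote_average",
)
print(hits[["title", "genres", "vote_average"]])
```

Passing the terms as a list and matching them literally (`regex=False`) keeps characters such as `|` or `.` in names from being interpreted as regex syntax.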
