# Movie Data Analysis Report

## 1. Introduction

This report presents an analysis of movie data extracted from The Movie Database (TMDb) API. The analysis aims to understand trends in movie performance, identify factors influencing success, and compare the performance of franchise films against standalone films.

## 2. Methodology

### 2.1 Data Extraction

- Movie data was extracted using the TMDb API for a selected list of 19 movie IDs, including blockbusters like 'Avatar' (id: 19995) and 'Interstellar' (id: 157336).
- Data retrieved included movie details (title, release date, genres, overview), financial information (budget, revenue), and cast and crew details.
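
The extraction step could be sketched roughly as follows. This is a minimal illustration rather than the script used for this report: it assumes the TMDb v3 movie-details endpoint with `append_to_response=credits` to pull cast and crew in the same request, and the helper names (`build_movie_url`, `fetch_movie`, `fetch_movies`) and the `YOUR_API_KEY` placeholder are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Iterable, List, Optional

BASE_URL = "https://api.themoviedb.org/3/movie"


def build_movie_url(movie_id: int, api_key: str) -> str:
    """Build the details URL for one movie, asking TMDb to append the credits."""
    query = urllib.parse.urlencode(
        {"api_key": api_key, "append_to_response": "credits"}
    )
    return f"{BASE_URL}/{movie_id}?{query}"


def fetch_movie(movie_id: int, api_key: str, timeout: float = 10.0) -> Optional[dict]:
    """Return the parsed JSON for one movie, or None if the request fails."""
    url = build_movie_url(movie_id, api_key)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.load(response)
    except OSError:  # URLError and HTTPError both derive from OSError
        return None


def fetch_movies(movie_ids: Iterable[int], api_key: str) -> List[dict]:
    """Fetch every ID and skip the ones that could not be retrieved."""
    fetched = (fetch_movie(movie_id, api_key) for movie_id in movie_ids)
    return [movie for movie in fetched if movie is not None]


# Example (needs network access and a real key):
# movies = fetch_movies([19995, 157336], api_key="YOUR_API_KEY")
```

Returning `None` for a failed request keeps one bad ID from aborting the whole batch; a production script would probably also log the failures.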

### 2.2 Data Processing

- The raw JSON data was transformed into a structured Pandas DataFrame.
- Irrelevant columns like 'adult', 'imdb_id', and 'homepage' were dropped.
- Nested JSON columns (genres, production_companies, etc.) were parsed to extract relevant information.
- Data types were converted for analysis (e.g., budget and revenue to numeric, release_date to datetime).
- Missing values were handled by replacing zeros in budget, revenue, and runtime with NaN, and replacing placeholders in overview and tagline with NaN.
- The dataset was filtered to include only released movies and rows with at least 10 non-NaN values, ensuring data quality.
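
The cleaning steps above might look something like the sketch below. It runs on made-up toy records, not the report's data, and several details are assumptions: the placeholder strings (`"No Data"` and the empty string), the `|` separator used to flatten genres, and the function name `clean_movies`.

```python
import pandas as pd

# Toy records shaped like TMDb responses; all values are invented for illustration.
base = {
    "adult": False, "imdb_id": "tt0000000", "homepage": "", "status": "Released",
    "runtime": 120, "overview": "Plot summary.", "tagline": "A tagline.",
    "vote_average": 7.0, "vote_count": 100, "popularity": 10.0,
}
raw = [
    {**base, "id": 1, "title": "Movie A", "budget": 100_000_000,
     "revenue": 500_000_000, "release_date": "2010-07-16", "overview": "No Data",
     "genres": [{"id": 28, "name": "Action"}, {"id": 878, "name": "Science Fiction"}]},
    {**base, "id": 2, "title": "Movie B", "budget": 0, "revenue": 0, "runtime": 0,
     "release_date": "1999-03-31", "overview": "No Data", "tagline": "",
     "genres": [{"id": 18, "name": "Drama"}]},
    {**base, "id": 3, "title": "Movie C", "budget": 50_000_000, "revenue": 0,
     "release_date": "2030-01-01", "status": "Post Production",
     "genres": [{"id": 35, "name": "Comedy"}]},
]


def clean_movies(records, placeholders=("No Data", ""), min_non_nan=10):
    df = pd.DataFrame(records)
    # Drop columns that are irrelevant to the analysis.
    df = df.drop(columns=["adult", "imdb_id", "homepage"], errors="ignore")
    # Flatten the nested genre objects into a "|"-separated string of names.
    df["genres"] = df["genres"].apply(lambda gs: "|".join(g["name"] for g in gs))
    # Convert to numeric and treat zeros as missing values.
    for col in ["budget", "revenue", "runtime"]:
        values = pd.to_numeric(df[col], errors="coerce")
        df[col] = values.mask(values == 0)
    df["release_date"] = pd.to_datetime(df["release_date"], errors="coerce")
    # Replace placeholder text with NaN.
    for col in ["overview", "tagline"]:
        df[col] = df[col].where(~df[col].isin(list(placeholders)))
    # Keep released movies that have at least `min_non_nan` non-NaN values.
    df = df[df["status"] == "Released"].dropna(thresh=min_non_nan)
    return df.reset_index(drop=True)


cleaned = clean_movies(raw)
print(cleaned[["title", "genres", "budget", "release_date"]])
```

In this toy input, Movie B is dropped because too many of its fields are missing, and Movie C is dropped because it has not been released.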

### 2.3 Data Analysis

- KPIs were defined, including:
    - Revenue (in million USD)
    - Budget (in million USD)
    - Profit (Revenue - Budget)
    - ROI (Return on Investment = Revenue / Budget)
    - Vote Count
    - Vote Average
    - Popularity
- Movies were ranked based on these KPIs to identify top performers, such as 'Avatar' having the highest revenue and 'Star Wars: The Force Awakens' having the highest profit.
- Advanced searches were conducted, such as identifying the best-rated Science Fiction Action movies starring Bruce Willis (e.g., 'Looper') and movies starring Uma Thurman and directed by Quentin Tarantino (e.g., 'Pulp Fiction', 'Kill Bill: Vol. 1').
- Franchise and standalone movie performance was compared based on mean and median values of key metrics, revealing that franchise movies tend to have higher popularity and ROI.
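
As a rough illustration of the KPI definitions and the ranking step, the sketch below derives profit and ROI and then ranks by an arbitrary column. The data is a made-up toy table, and the column names (`budget_musd`, `revenue_musd`, `profit_musd`, `roi`) and the helpers `add_kpis` and `rank_movies` are hypothetical, not taken from the original analysis code.

```python
import numpy as np
import pandas as pd

# Toy data in raw USD; the numbers are invented for illustration.
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "budget": [200e6, 5e6, 50e6, np.nan],
    "revenue": [800e6, 100e6, 150e6, 300e6],
})


def add_kpis(df):
    """Add budget and revenue in million USD, plus profit and ROI."""
    out = df.copy()
    out["budget_musd"] = out["budget"] / 1e6
    out["revenue_musd"] = out["revenue"] / 1e6
    out["profit_musd"] = out["revenue_musd"] - out["budget_musd"]
    out["roi"] = out["revenue_musd"] / out["budget_musd"]  # ROI = Revenue / Budget
    return out


def rank_movies(df, by, top=5, ascending=False, min_budget_musd=None):
    """Return the top rows by one KPI, optionally requiring a minimum budget."""
    ranked = df
    if min_budget_musd is not None:
        ranked = ranked[ranked["budget_musd"] >= min_budget_musd]
    # sort_values places NaN last, so movies with a missing KPI never rank first.
    return ranked.sort_values(by, ascending=ascending).head(top)[["title", by]]


kpis = add_kpis(movies)
print(rank_movies(kpis, "revenue_musd"))
print(rank_movies(kpis, "roi", min_budget_musd=10))
```

The budget threshold matters for ROI because a very small budget can inflate the ratio; without the filter, the low-budget Movie B tops the toy ROI ranking.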

### 2.4 Data Visualization

- Matplotlib was used to create visualizations, including:
    - Scatter plot of revenue vs. budget to visualize the relationship between these two variables.
    - Bar chart showing the average ROI per genre, highlighting the profitability of different genres.
    - Scatter plot of popularity vs. rating to explore the correlation between these metrics.
    - Line graph illustrating yearly trends in box office performance.
    - Bar chart comparing key performance metrics (revenue, ROI, budget, popularity) between franchise and standalone movies.
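
Two of these charts could be produced along the lines of the sketch below. It uses invented toy data and assumes one genre per row (for example after exploding a multi-genre column); the column names and the `plot_overview` helper are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy data; the values are invented for illustration.
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "genre": ["Action", "Action", "Drama", "Drama"],
    "budget_musd": [200.0, 100.0, 20.0, 10.0],
    "revenue_musd": [800.0, 300.0, 60.0, 50.0],
})
movies["roi"] = movies["revenue_musd"] / movies["budget_musd"]


def plot_overview(df):
    """Draw revenue vs. budget and mean ROI per genre side by side."""
    roi_by_genre = df.groupby("genre")["roi"].mean().sort_values(ascending=False)
    fig, (ax_scatter, ax_bar) = plt.subplots(1, 2, figsize=(11, 4))

    ax_scatter.scatter(df["budget_musd"], df["revenue_musd"])
    ax_scatter.set_xlabel("Budget (million USD)")
    ax_scatter.set_ylabel("Revenue (million USD)")
    ax_scatter.set_title("Revenue vs. Budget")

    ax_bar.bar(roi_by_genre.index, roi_by_genre.values)
    ax_bar.set_ylabel("Mean ROI (Revenue / Budget)")
    ax_bar.set_title("Average ROI per Genre")

    fig.tight_layout()
    return fig, roi_by_genre


fig, roi_by_genre = plot_overview(movies)
# fig.savefig("movie_overview.png")  # uncomment to write the figure to disk
```

Returning the aggregated series together with the figure makes it easy to check the plotted numbers against the underlying table.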


## 3. Key Insights

### 3.1 Performance Ranking

- 'Avatar' emerged as the highest-grossing movie, with over 2.8 billion USD in revenue.
- 'Star Wars: The Force Awakens' achieved the highest profit, exceeding 2 billion USD.
- 'E.T. the Extra-Terrestrial' had the highest ROI among movies with a budget over 10 million USD.
- 'Inception' was the most popular movie, while 'The Shawshank Redemption' received the highest average rating.

### 3.2 Advanced Movie Searches

- The searches revealed specific movie recommendations based on user-defined criteria, demonstrating the dataset's potential for personalized movie discovery.
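
A search of this kind can be expressed as a combination of boolean filters. The sketch below is one possible approach on invented toy data; it assumes `genres` and `cast` are stored as `|`-separated strings and that a `director` column exists, and the helpers `contains_all` and `search_movies` are hypothetical.

```python
import pandas as pd

# Toy data with placeholder names; not taken from the report's dataset.
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "genres": ["Action|Science Fiction", "Drama", "Science Fiction|Action|Thriller"],
    "cast": ["Actor X|Actor Y", "Actor X", "Actor X|Actor Z"],
    "director": ["Director P", "Director Q", "Director Q"],
    "vote_average": [7.1, 8.3, 7.9],
})


def contains_all(series, names, sep="|"):
    """True for each row whose separated values include every requested name."""
    wanted = set(names)
    return series.fillna("").apply(lambda cell: wanted.issubset(cell.split(sep)))


def search_movies(df, genres=(), cast=(), director=None,
                  sort_by="vote_average", ascending=False):
    """Filter by genres, cast, and director, then sort by the chosen column."""
    mask = contains_all(df["genres"], genres) & contains_all(df["cast"], cast)
    if director is not None:
        mask &= df["director"] == director
    return df[mask].sort_values(sort_by, ascending=ascending)


# Best-rated Science Fiction Action movies starring "Actor X".
scifi_action = search_movies(movies, genres=("Science Fiction", "Action"), cast=("Actor X",))
# Movies starring "Actor X" and directed by "Director Q".
by_director = search_movies(movies, cast=("Actor X",), director="Director Q")
print(scifi_action[["title", "vote_average"]])
print(by_director[["title", "vote_average"]])
```

Splitting on the separator and comparing whole names avoids the false matches a plain substring search would produce, such as "Action" matching inside a longer genre name.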

### 3.3 Franchise vs. Standalone

- Franchise films demonstrated higher average revenue, ROI, and popularity compared to standalone films.
- Standalone films had a slightly higher average rating.
- This suggests that franchise films are more commercially successful, while standalone films may earn greater critical acclaim.
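
A comparison like this can be computed with a single group-by. The sketch below is an illustration on invented toy numbers that do not reflect the report's findings; it assumes a movie counts as a franchise film when TMDb's `belongs_to_collection` field is non-null, and the `compare_franchise` helper and the `*_musd` column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data; the values are invented and only demonstrate the mechanics.
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "belongs_to_collection": ["Collection X", "Collection X", None, None],
    "revenue_musd": [900.0, 700.0, 300.0, 100.0],
    "budget_musd": [200.0, 200.0, 100.0, 50.0],
    "vote_average": [7.0, 7.5, 8.0, 8.5],
    "popularity": [60.0, 40.0, 30.0, 10.0],
})


def compare_franchise(df, metrics=("revenue_musd", "roi", "budget_musd",
                                   "popularity", "vote_average")):
    """Return mean and median of each metric for franchise vs. standalone films."""
    out = df.copy()
    out["roi"] = out["revenue_musd"] / out["budget_musd"]
    # A non-null collection marks the movie as part of a franchise.
    out["group"] = np.where(out["belongs_to_collection"].notna(),
                            "Franchise", "Standalone")
    return out.groupby("group")[list(metrics)].agg(["mean", "median"])


summary = compare_franchise(movies)
print(summary)
```

Reporting the median next to the mean is useful here because a single very large release can pull a group's mean far from its typical value.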

### 3.4 Visualizations

- Visualizations effectively illustrated data trends, highlighting the positive correlation between budget and revenue, the varying ROI across genres, and the fluctuation in box office performance over time.

### 3.5 Most Successful Franchises and Directors

- The 'Harry Potter Collection' emerged as the most successful franchise based on the number of movies and total revenue.
- James Cameron was identified as the most successful director, with a high average rating and significant box office success.


## 4. Conclusion

This analysis provides an overview of the selected movie data and of the factors associated with movie performance. Franchise films, particularly those belonging to established collections, tend to be more commercially successful, while standalone films can achieve high critical acclaim. Data analysis and visualization techniques like these can support strategic decision-making in movie production and marketing. Further work could analyze actor performance, explore international market trends, and develop predictive models for movie success.


