<h1 style="text-align: center;">Movie Analysis </h1>



# 1. Project Overview
This project analyzes movie industry data to generate actionable insights for stakeholders in the movie business. The analysis explores several aspects of the industry, including financial performance, audience reception, and market trends. Our goal is to give stakeholders a clear understanding of key metrics and trends so that they can make informed decisions that drive the company's growth and success in the competitive movie market.

This project leverages data from [Box Office Mojo](https://www.boxofficemojo.com/), [IMDb](https://www.imdb.com/), [Rotten Tomatoes](https://www.rottentomatoes.com/), [TheMovieDB](https://www.themoviedb.org/), and [TheNumbers](https://www.the-numbers.com/). These sites provide various types of information related to movies, TV shows, and the entertainment industry.

1. **Box Office Mojo** - a comprehensive resource for box office revenue data, widely used to analyze the financial success of films.
2. **IMDb** - covers movies, TV shows, and the wider entertainment industry. It provides extensive information on cast and crew, plot summaries, user and critic reviews, ratings, and trivia, and it also features user-generated content such as reviews and lists.
3. **Rotten Tomatoes** - primarily a review aggregation site. It compiles reviews from professional critics and audiences for movies and TV shows, and viewers widely use it to gauge the critical and public reception of a title before deciding whether to watch it.
4. **TheMovieDB** - a user-driven database that offers information on movies, TV shows, and celebrities. It includes details such as cast and crew, plot summaries, user ratings, and reviews.
5. **TheNumbers** - provides detailed box office data, including financial analyses of movies. It also offers additional industry statistics and insights, such as home video sales and movie budgets.

# 2. Business Understanding

## Defining the question

Your company sees all the big companies creating original video content and wants to get in on the fun. It has decided to create a new movie studio, but it doesn’t know anything about creating movies. You are charged with exploring what types of films are currently doing best at the box office. You must then translate those findings into actionable insights that the head of the company's new movie studio can use to help decide what type of films to create.

##### **Problem Statement/Objective:** *Determine which types of films are currently doing the best at the box office and translate the findings into actionable insights.*



**Objectives:**
1. Identify types of films that are performing the best.
2. Explore and analyze current box office trends.
3. Identify factors that contribute to the success of a movie.
4. Translate findings into actionable insights.

To address our objectives, we need to answer the following research questions:
1. What are the **top performing genres?**
2. What are the **preferences of the target audience?**
3. What can be done to **break market barriers?**

The following are the data questions answered in this analysis:

1. **What types of films are currently performing best at the box office (based on box office gross)?**
   * What are the characteristics of top-performing movies based on box office gross?
   * Which movies have been the most successful financially?
2. **Which movie genres have been the most popular and successful over time?**
   * What are the trends in genre preferences?
   * What are the preferences of the target audience based on user ratings and reviews?
   * What type of content resonates well with the audience?
3. **How does the movie budget impact box office revenue, and can smaller budget films be profitable?**
   * How does the movie budget affect box office revenue?
   * Can smaller budget films be profitable, and is there an optimal budget range?
4. **Are there seasonal trends in movie performance, and when is the best time to release a movie?**
   * Are there seasonal patterns in movie performance?
   * When is the best time to release a movie for maximum revenue?





# 3. Data Understanding
For the analysis, we used data from the `zippedData` folder of [this repository](https://github.com/learn-co-curriculum/dsc-phase-2-project-v3).

### Dataset Descriptions

1. **bom.movie_gross.csv**
   - **Description:** Contains financial information about movies.
   - **Columns:**
     - **Title:** The title of the film.
     - **Year of Production:** The year the movie was produced.
     - **Domestic Gross:** Total revenue generated within the domestic market.
     - **Foreign Gross:** Total revenue generated in markets outside the domestic market.

2. **rt.movie_info.tsv**
   - **Description:** Details about various movies.
   - **Columns:**
     - **Synopsis:** A brief summary of the movie’s plot.
     - **Rating:** The content rating of the movie (e.g., an MPAA rating such as PG or R).
     - **Release Date:** The date when the movie was first shown in theaters.
     - **Box Office:** Financial data related to the movie’s box office performance.

3. **rt.reviews.tsv**
   - **Description:** Contains movie reviews and ratings.
   - **Columns:**
     - **Rating:** Rating of the movie on a scale of 1-5.
     - **Publisher:** The entity that published the review.
     - **Date:** The date the review was published.

4. **tmdb.movies.csv**
   - **Description:** Information about movies from The Movie Database (TMDb).
   - **Columns:**
     - **Genre IDs:** Identifiers for the movie's genres.
     - **Language:** The language in which the movie is made.
     - **Title:** The title of the movie.
     - **Popularity:** Popularity score of the movie.
     - **Vote Average:** Average of the user votes the movie has received.
     - **Vote Count:** Number of votes the movie has received.

5. **tn.movie_budgets.csv**
   - **Description:** Contains financial details about movies.
   - **Columns:**
     - **Budget:** The production budget of the movie.
     - **Domestic Gross:** Revenue generated in the domestic market.
     - **Worldwide Gross:** Total revenue generated worldwide.

6. **im.db**
- **Description:** A SQLite database that contains the tables listed below.

- **Tables:**
  - **movie_basics**
    - **Columns:**
      - **movie_id:** Unique identifier for each movie.
      - **primary_title:** The primary title of the movie.
      - **original_title:** The original title of the movie.
      - **start_year:** The year the movie was released.
      - **runtime_minutes:** The runtime of the movie in minutes.
      - **genres:** Genres associated with the movie.

  - **directors**
    - **Columns:**
      - **movie_id:** Unique identifier for the movie.
      - **person_id:** Unique identifier for the director.

  - **known_for**
    - **Columns:**
      - **person_id:** Unique identifier for the person.
      - **movie_id:** Unique identifier for the movie the person is known for.

  - **movie_akas**
    - **Columns:**
      - **movie_id:** Unique identifier for the movie.
      - **ordering:** Ordering of the alternate title.
      - **title:** The alternate title of the movie.
      - **region:** The region where the alternate title is used.
      - **language:** The language of the alternate title.
      - **types:** Type of the alternate title (e.g., IMDb display).
      - **attributes:** Any additional attributes of the alternate title.
      - **is_original_title:** Indicator if the alternate title is the original title (0.0 for No, 1.0 for Yes).

  - **movie_ratings**
    - **Columns:**
      - **movie_id:** Unique identifier for the movie.
      - **averagerating:** Average rating of the movie.
      - **numvotes:** Number of votes the movie received.

  - **persons**
    - **Columns:**
      - **person_id:** Unique identifier for the person.
      - **primary_name:** The primary name of the person.
      - **birth_year:** The birth year of the person.
      - **death_year:** The death year of the person (if applicable).
      - **primary_profession:** The primary profession(s) of the person (e.g., actor, director).

  - **principals**
    - **Columns:**
      - **movie_id:** Unique identifier for the movie.
      - **ordering:** Ordering of the credit role.
      - **person_id:** Unique identifier for the person.
      - **category:** The category of the role (e.g., actor, director).
      - **job:** The job description (if applicable).
      - **characters:** List of characters played by the actor (if applicable).

  - **writers**
    - **Columns:**
      - **movie_id:** Unique identifier for the movie.
      - **person_id:** Unique identifier for the writer.




**Properties of variables of interest:**

1. ***Movie Name:*** Categorical variable which is a textual label or name of the movie.
2. ***Genre:*** Categorical variable representing the type or category of the movie (e.g., Action, Drama, Comedy).
3. ***Budget:*** Continuous variable representing the production cost or budget of the movie.
4. ***Worldwide Gross:*** Continuous variable representing the total revenue generated by the movie at the box office.
5. ***User Rating:*** Continuous variable representing the average ratings or scores given by users for the movie.
6. ***Release Date:*** Temporal variable indicating the date when the movie was released in theaters.


**Target variable:** The target variable for this project is the `worldwide gross` of each movie. It represents the total revenue the movie generated in theaters worldwide and serves as an indicator of the movie's financial success.

These variables will be used to answer the data questions and derive actionable insights to guide our new movie studio in making informed decisions for successful movie production. The analysis will focus on understanding the relationships between these variables and their impact on movie success and profitability.
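
To make the link between these variables and profitability concrete, here is a minimal sketch that derives a profit figure and a release month from budget, worldwide gross, and release date. The records, the field names, the currency-string format, and the ISO date format are all assumptions made for illustration; the real files may store these values differently.

```python
from datetime import date


def parse_money(value):
    """Convert a currency string such as "$1,500,000" to an integer."""
    return int(value.replace("$", "").replace(",", ""))


# Invented records; field names and formats are hypothetical.
movies = [
    {
        "movie": "Example A",
        "budget": "$10,000,000",
        "worldwide_gross": "$45,000,000",
        "release_date": "2015-07-03",
    },
    {
        "movie": "Example B",
        "budget": "$150,000,000",
        "worldwide_gross": "$120,000,000",
        "release_date": "2016-11-18",
    },
]

for m in movies:
    budget = parse_money(m["budget"])
    gross = parse_money(m["worldwide_gross"])
    # Profit here is simply worldwide gross minus production budget.
    m["profit"] = gross - budget
    # The release month supports grouping by season later on.
    m["release_month"] = date.fromisoformat(m["release_date"]).month

for m in movies:
    print(m["movie"], m["profit"], m["release_month"])
```

This simple profit figure ignores costs beyond the production budget, so it is best read as a rough indicator rather than an accounting result.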

### Entity-Relationship Diagram (ERD)
The data was collected from various sources, so the files come in different formats. Some are compressed CSV (comma-separated values) or TSV (tab-separated values) files, which can be opened with spreadsheet software or with `pd.read_csv`. The IMDb data is stored in a SQLite database. The Entity-Relationship Diagram (ERD) below illustrates the relationships between the entities in the IMDb data.

![image.png](attachment:49fe1ac2-0095-4b3b-acc8-4b73ffe57d48.png)
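
The file formats described in this section could be loaded along the following lines. This is a sketch under assumptions: the helper names `read_delimited` and `list_tables` are hypothetical, the compressed files are assumed to be gzip archives with a `.gz` extension, and the demo runs on small temporary stand-in files rather than the project data.

```python
import gzip
import sqlite3
import tempfile
from pathlib import Path

import pandas as pd


def read_delimited(path):
    """Read a CSV or TSV file (optionally gzip-compressed) into a DataFrame."""
    path = Path(path)
    # A ".tsv" part in the file name means tab-separated; otherwise use commas.
    sep = "\t" if ".tsv" in path.suffixes else ","
    # pandas infers gzip compression from a ".gz" extension by default.
    return pd.read_csv(path, sep=sep)


def list_tables(db_path):
    """Return the names of the tables in a SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        query = "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        return [row[0] for row in conn.execute(query)]
    finally:
        conn.close()


# Demo on temporary stand-in files (contents invented for illustration).
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "example.csv.gz"
    with gzip.open(csv_path, "wt", encoding="utf-8") as f:
        f.write("title,year\nExample Movie,2015\n")
    demo_df = read_delimited(csv_path)

    db_path = Path(tmp) / "example.db"
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE movie_basics (movie_id TEXT, primary_title TEXT)")
    conn.commit()
    conn.close()
    demo_tables = list_tables(db_path)

print(demo_df)
print(demo_tables)
```

With the project data, the same helpers would be pointed at the files in `zippedData`, adjusting the paths and the compression handling to match how those files are actually stored.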


 