# A DATA ANALYSIS OF THE SUCCESS OF FILMS FOR A NEW MOVIE STUDIO

## INTRODUCTION

## BUSINESS UNDERSTANDING

To capitalize on the recent surge in original video content, our company is launching a new movie studio. This project analyzes box office data to study how genre, director, and other key performance factors relate to a film's success, so that the company can make informed decisions about the types of films to produce. The findings will be translated into strategic recommendations that guide stakeholders toward data-driven decisions and position the studio for success.

## DATA UNDERSTANDING

This analysis uses data from [IMDb](https://www.imdb.com/), supplied as a SQLite database, and [The Movie Database (TMDb)](https://www.themoviedb.org/), supplied as a CSV file. Together they provide comprehensive information on movies, including genres, directors, release dates, audience ratings, and box office performance. Key attributes such as genre, runtime, release timing, and financial performance will be assessed for relevance and accuracy in analyzing the factors that contribute to box office success. Quality checks will address missing values and inconsistencies to prepare the data for analysis.
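The quality checks mentioned above could start with a simple report of missing values and duplicated rows. The following is a minimal sketch on a toy DataFrame; the column names, the values, and the `quality_report` helper are illustrative assumptions, not the project's actual data or code.

```python
import numpy as np
import pandas as pd

# Toy stand-in for a movie table; column names and values are illustrative only.
movies = pd.DataFrame({
    "movie_id": ["tt1", "tt2", "tt3", "tt3"],
    "genres": ["Drama", None, "Comedy", "Comedy"],
    "runtime_minutes": [95.0, np.nan, 120.0, 120.0],
})


def quality_report(df):
    """Summarize row count, missing values per column, and fully duplicated rows."""
    return {
        "rows": len(df),
        "missing_per_column": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
    }


report = quality_report(movies)
print(report)
```

A report like this only flags problems; whether to drop, impute, or keep the affected rows is a separate decision that depends on the question being asked.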

## DATA PREPARATION

In this section, we'll preprocess the film data to ensure it's suitable for analysis. This involves importing the necessary libraries, loading the data, creating variables, and cleaning the data. The prepared data will support a reliable analysis of the performance factors that matter for the new movie studio's decisions.

In [1]:
# Importing libraries 
import pandas as pd
import sqlite3
import numpy as np

In [2]:
# Load the data to view
# Begin with the database
# Create a connection
conn = sqlite3.connect('im.db')

In [3]:
# View the database schema: sqlite_master lists every table and its CREATE statement
im = pd.read_sql("SELECT * FROM sqlite_master", conn)
im

| | type | name | tbl_name | rootpage | sql |
|---|---|---|---|---|---|
| 0 | table | movie_basics | movie_basics | 2 | `CREATE TABLE "movie_basics" (\n"movie_id" TEXT...` |
| 1 | table | directors | directors | 3 | `CREATE TABLE "directors" (\n"movie_id" TEXT,\n...` |
| 2 | table | known_for | known_for | 4 | `CREATE TABLE "known_for" (\n"person_id" TEXT,\...` |
| 3 | table | movie_akas | movie_akas | 5 | `CREATE TABLE "movie_akas" (\n"movie_id" TEXT,\...` |
| 4 | table | movie_ratings | movie_ratings | 6 | `CREATE TABLE "movie_ratings" (\n"movie_id" TEX...` |
| 5 | table | persons | persons | 7 | `CREATE TABLE "persons" (\n"person_id" TEXT,\n ...` |
| 6 | table | principals | principals | 8 | `CREATE TABLE "principals" (\n"movie_id" TEXT,\...` |
| 7 | table | writers | writers | 9 | `CREATE TABLE "writers" (\n"movie_id" TEXT,\n ...` |
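Because the `sql` column is truncated in the display, the full column list of each table is not visible. One way to inspect it is SQLite's `PRAGMA table_info`. The sketch below uses an in-memory stand-in database so it runs without `im.db`; only `movie_id` is confirmed by the schema output, and the other columns (`primary_title`, `runtime_minutes`) and the name `demo_conn` are assumptions for illustration.

```python
import sqlite3

import pandas as pd

# In-memory stand-in for im.db so the sketch runs anywhere.
# Only movie_id is visible in the schema output; the other columns are assumed.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE movie_basics "
    "(movie_id TEXT, primary_title TEXT, runtime_minutes REAL)"
)
demo_conn.executemany(
    "INSERT INTO movie_basics VALUES (?, ?, ?)",
    [("tt001", "Film A", 95.0), ("tt002", "Film B", None)],
)
demo_conn.commit()

# List only the table names instead of the full sqlite_master dump.
tables = pd.read_sql(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name",
    demo_conn,
)

# PRAGMA table_info returns one row per column, with its name and declared type.
columns = pd.read_sql("PRAGMA table_info(movie_basics)", demo_conn)

# Load one table into a DataFrame for further work in pandas.
movie_basics = pd.read_sql("SELECT * FROM movie_basics", demo_conn)
demo_conn.close()

print(tables)
print(columns[["name", "type"]])
print(movie_basics)
```

Against the real database, the same queries would be run on `conn` instead of `demo_conn`.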


## DATA ANALYSIS

In this segment, we'll analyze the prepared data to answer the key questions of this project:

1. What types of films are doing best at the box office?
2. What types of films should the company prioritize producing?
3. Who are some of the most popular directors/writers that the company might consider collaborating with or hiring?
4. What are the preferred film runtimes for audiences?
5. Who are some of the most popular actors the company should consider casting to attract a dedicated audience base?
6. What is the average production budget the company should expect?
7. What are the average returns of films?
8. What has been the trend of movie production over the years?
9. Linear Regression

#### 1. What types of films are doing best at the box office?
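One possible approach to this question, sketched on toy data: split multi-genre entries into one row per genre, then compare genres by median gross. The comma-separated `genres` format, the `worldwide_gross` column, and all values are assumptions for illustration and are not results from the project's data.

```python
import pandas as pd

# Toy data; the comma-separated genre format and the gross column are assumed.
films = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "genres": ["Action,Adventure", "Drama", "Action,Drama", "Comedy"],
    "worldwide_gross": [400.0, 50.0, 200.0, 90.0],
})

# One row per (film, genre) pair, so a film counts toward each of its genres.
exploded = films.assign(genre=films["genres"].str.split(",")).explode("genre")

# Film count and median gross per genre, best-performing genre first.
genre_summary = (
    exploded.groupby("genre")["worldwide_gross"]
    .agg(films="count", median_gross="median")
    .sort_values("median_gross", ascending=False)
)
print(genre_summary)
```

The median is used here because box office figures tend to be skewed by a few very large releases; the film count is kept alongside it so that genres with only a handful of titles are easy to spot.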

#### 2. What types of films should the company prioritize producing?

#### 3. Who are some of the most popular directors/writers that the company might consider collaborating with or hiring?

#### 4. What are the preferred film runtimes for audiences?
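A minimal sketch of how runtime preferences might be examined: group films into runtime bands with `pd.cut` and compare the average audience rating per band. The column names (`runtime_minutes`, `averagerating`), the band edges, and the values are assumptions chosen for illustration.

```python
import pandas as pd

# Toy data; column names and values are illustrative assumptions.
ratings = pd.DataFrame({
    "runtime_minutes": [85, 92, 105, 118, 130, 155],
    "averagerating": [5.5, 6.0, 6.8, 7.2, 7.0, 6.4],
})

# Band edges are a judgment call; intervals are right-inclusive, e.g. (90, 120].
bins = [0, 90, 120, 150, 400]
labels = ["<=90", "91-120", "121-150", ">150"]
ratings["runtime_band"] = pd.cut(
    ratings["runtime_minutes"], bins=bins, labels=labels
)

# Number of films and mean rating within each runtime band.
band_summary = (
    ratings.groupby("runtime_band", observed=True)["averagerating"]
    .agg(["count", "mean"])
)
print(band_summary)
```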

#### 5. Who are some of the most popular actors the company should consider casting to attract a dedicated audience base?

#### 6. What is the average production budget the company should expect?

#### 7. What are the average returns of films?
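For the budget and returns questions, one common measure is return on investment, computed as (gross - budget) / budget. The sketch below applies it to toy figures; the column names `production_budget` and `worldwide_gross` and all values are assumptions, not data from the project.

```python
import pandas as pd

# Toy figures; column names and values are illustrative assumptions.
budgets = pd.DataFrame({
    "title": ["A", "B", "C"],
    "production_budget": [100.0, 50.0, 20.0],
    "worldwide_gross": [300.0, 40.0, 60.0],
})

# Profit and return on investment per film.
budgets["profit"] = budgets["worldwide_gross"] - budgets["production_budget"]
budgets["roi"] = budgets["profit"] / budgets["production_budget"]

# Report mean and median side by side, since a few hits can skew the mean.
summary = {
    "mean_budget": budgets["production_budget"].mean(),
    "median_budget": budgets["production_budget"].median(),
    "mean_roi": budgets["roi"].mean(),
    "median_roi": budgets["roi"].median(),
}
print(summary)
```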

#### 8. According to the data, is this a viable business? (What has been the trend of movie production over the years?)

#### 9. Linear Regression
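As an illustration of the technique, a simple linear regression of gross on budget can be fitted by ordinary least squares with `np.polyfit`. This is a sketch under assumptions: the choice of budget as predictor and gross as response is hypothetical, and the numbers are toy values, not project data.

```python
import numpy as np

# Toy values; budget as predictor and gross as response is an assumption.
budget = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
gross = np.array([32.0, 48.0, 71.0, 89.0, 110.0])

# Degree-1 least-squares fit: gross is modeled as slope * budget + intercept.
slope, intercept = np.polyfit(budget, gross, deg=1)

# R-squared: the share of variance in gross explained by the fitted line.
predicted = slope * budget + intercept
ss_res = np.sum((gross - predicted) ** 2)
ss_tot = np.sum((gross - gross.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r_squared:.4f}")
```

A fit like this describes association only; a high R-squared on budget would not by itself show that spending more causes higher gross.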

## VISUALIZATIONS

In this segment, we'll present visualizations that summarize our analysis, effectively conveying the insights gained. Through various graphical representations, we'll enhance understanding of the data, address the objectives discussed earlier, and provide actionable recommendations for the company.

#### 1. Types of films the company should consider producing

#### 2. Directors and writers the company should consider collaborating with

#### 3. Recommended movie runtimes  

#### 4. Actors the company should consider casting

#### 5. Recommended production budget

#### 6. 

#### 7. Linear Regression

## CONCLUSION

This project has provided a comprehensive analysis of box office data sourced from [IMDb](https://www.imdb.com/) and [TMDb](https://www.themoviedb.org/), uncovering success factors in the film industry. By examining genres, runtimes, directors, and other factors, we've derived actionable insights to guide the new movie studio's production strategy. The visualizations communicate these findings so that stakeholders can make informed decisions. As the company moves forward, these insights will be crucial for developing a profitable film portfolio and maintaining a competitive edge in the evolving landscape of original video content.