

# Movie Industry Data Analysis Project Group 1

This notebook analyzes movie data to answer key business questions for a new movie studio.

## Business Problem

Our company is planning to enter the film industry and launch a new movie studio. Because we have no prior experience in movie production, we need thorough data analysis to guide our decisions.

We aim to answer the following business questions:

1️⃣ What genres tend to perform best at the box office?  
2️⃣ Do critic ratings predict box office success?  
3️⃣ How do production budgets correlate with box office revenue?  
  

By answering these questions, we aim to make data-driven decisions on genre selection, budget allocation, and how much weight to give critic ratings.


## Table of Contents

- [1. Data Loading](#Data-Loading)
- [2. Data Cleaning](#Data-Cleaning)
- [3. Merging Datasets](#Merging-Datasets)
- [4. Exploratory Data Analysis](#Exploratory-Data-Analysis)
- [5. Business Recommendations](#Business-Recommendations)


## 1. Data Loading

We load all required datasets into pandas DataFrames from two kinds of sources: a SQLite database and CSV/TSV files. Together they contain IMDb movie details and ratings, production budgets, box office revenues, and Rotten Tomatoes movie information.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import sqlite3
import pandas as pd
import re

# Load the IMDb tables (movie_basics, movie_ratings) from the SQLite database
conn = sqlite3.connect("im.db")
df_basics = pd.read_sql_query("SELECT * FROM movie_basics", conn)
df_ratings = pd.read_sql_query("SELECT * FROM movie_ratings", conn)

# Load external CSV & TSV files
df_bom = pd.read_csv("bom.movie_gross.csv")
df_budgets = pd.read_csv("tn.movie_budgets.csv")
df_rt_info = pd.read_csv("rt.movie_info.tsv", sep="\t", encoding="latin1")

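Before querying `movie_basics` and `movie_ratings`, it can help to confirm that both tables actually exist in the database file, because `sqlite3.connect` silently creates an empty database when the path is wrong. The sketch below is one possible check, not part of the original analysis. The helper names `list_tables` and `missing_tables` are hypothetical, and the demo runs against a throwaway in-memory database with placeholder columns rather than the real `im.db`.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in a SQLite database, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def missing_tables(conn, required):
    """Return the required table names that are absent from the database."""
    present = set(list_tables(conn))
    return [name for name in required if name not in present]


# Demonstration on an in-memory database (assumption: stands in for im.db).
# The columns here are placeholders for illustration only.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE movie_basics (movie_id TEXT, primary_title TEXT)")

absent = missing_tables(demo_conn, ["movie_basics", "movie_ratings"])
print(absent)  # → ['movie_ratings']
demo_conn.close()
```

In the notebook itself, calling `missing_tables(conn, ["movie_basics", "movie_ratings"])` on the existing `conn` should return an empty list if the database file was found and contains both tables.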