# Box office data analysis for Microsoft recommendations
---
## Introduction
Microsoft has noticed that major corporations are moving into original video content production and wants to join the industry. To seize this opportunity, it has decided to establish its own movie studio. Because the company lacks filmmaking expertise, this project investigates the current box office landscape and analyzes which types of films are earning the most box office revenue.

The goal is to provide actionable insights that can guide the decision-making process for the head of Microsoft's new movie studio. By understanding the preferences and trends in the market, the project aims to assist Microsoft in identifying the most promising genres, themes, and styles that resonate with audiences.



---

## Details on the data sets
The datasets come from popular platforms and databases related to the film industry, each serving a different purpose. Here is a brief overview of each:
1. **Box Office Mojo:**

**Purpose:** Box Office Mojo is a website that focuses on providing box office revenue information for films.

**Features:** It includes details such as box office gross, release dates, production budgets, and more. It's a valuable resource for understanding the financial performance of movies.

2. **IMDB (Internet Movie Database):**

**Purpose:** IMDB is a comprehensive database of films, television shows, and industry professionals.

**Features:** It includes information on cast and crew, production details, user ratings, reviews, and trivia. IMDB is widely used for its extensive coverage of the entertainment industry.

3. **Rotten Tomatoes:**

**Purpose:** Rotten Tomatoes is a review-aggregation website that provides information on film and television reviews.

**Features:** It aggregates reviews from critics and audiences, providing a "Tomatometer" score that reflects the overall critical reception. It's commonly used to gauge the general consensus on a movie's quality.

4. **TheMovieDB (TMDb):**

**Purpose:** TheMovieDB is an online database that focuses on providing metadata for movies and TV shows.

**Features:** It includes information on cast, crew, plot summaries, posters, and fan-contributed content. Developers often use TMDb for building applications related to movies and TV shows.

5. **The Numbers:**

**Purpose:** The Numbers is a website that focuses on providing detailed financial information for movies.

**Features:** It offers box office data, home video sales, and other financial metrics. It's particularly useful for understanding the financial aspects of the film industry. 

These platforms collectively offer a wealth of information for industry professionals, movie enthusiasts, and those involved in decision-making related to film production, distribution, and consumption. Filmmakers, studios, and investors often use data from these sources to make informed decisions about their projects.

---

## Goal of the project / problem statement
This project analyzes box office trends to provide strategic recommendations to Microsoft, which wants to enter the movie business. The ultimate objective is to translate the findings into practical recommendations for the new movie studio, helping it make informed choices about the types of films to produce. This approach helps ensure that Microsoft's entry into filmmaking aligns with current market demand and maximizes its potential for success in the highly competitive entertainment industry.

---

## Importing necessary libraries for the project


In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sqlite3

## IMDB Data analysis

In [5]:
# Opening up a connection to the database
conn = sqlite3.connect('im.db')

# Displaying the table names
table_name_query = """ 
SELECT name AS table_names
FROM sqlite_master
WHERE type = 'table';
"""
pd.read_sql(table_name_query, conn)

Unnamed: 0,table_names
0,movie_basics
1,directors
2,known_for
3,movie_akas
4,movie_ratings
5,persons
6,principals
7,writers
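
Before querying individual tables, it can help to see which columns each one holds. The following is a minimal sketch, not part of the original analysis: `describe_tables` is a hypothetical helper name, and it relies only on SQLite's `sqlite_master` table and `PRAGMA table_info`.

```python
import sqlite3


def describe_tables(conn):
    """Return {table_name: [(column_name, declared_type), ...]} for every table.

    Hypothetical helper for illustration; `conn` is any open sqlite3 connection.
    """
    cur = conn.cursor()
    cur.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name;")
    tables = [row[0] for row in cur.fetchall()]

    schema = {}
    for table in tables:
        # Table names cannot be bound as SQL parameters, so quote the identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        columns = cur.execute(f"PRAGMA table_info({quoted});").fetchall()
        schema[table] = [(col[1], col[2]) for col in columns]
    return schema


# Usage with the connection opened above (assumes 'im.db' is available):
# for table, columns in describe_tables(conn).items():
#     print(table, columns)
```

This keeps the column listing in one place, which makes it easier to decide which tables are worth joining later.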


In [7]:
query = """ 
SELECT *
FROM movie_basics;
"""
pd.read_sql(query, conn)

Unnamed: 0,movie_id,primary_title,original_title,start_year,runtime_minutes,genres
0,tt0063540,Sunghursh,Sunghursh,2013,175.0,"Action,Crime,Drama"
1,tt0066787,One Day Before the Rainy Season,Ashad Ka Ek Din,2019,114.0,"Biography,Drama"
2,tt0069049,The Other Side of the Wind,The Other Side of the Wind,2018,122.0,Drama
3,tt0069204,Sabse Bada Sukh,Sabse Bada Sukh,2018,,"Comedy,Drama"
4,tt0100275,The Wandering Soap Opera,La Telenovela Errante,2017,80.0,"Comedy,Drama,Fantasy"
...,...,...,...,...,...,...
146139,tt9916538,Kuambil Lagi Hatiku,Kuambil Lagi Hatiku,2019,123.0,Drama
146140,tt9916622,Rodolpho Teóphilo - O Legado de um Pioneiro,Rodolpho Teóphilo - O Legado de um Pioneiro,2015,,Documentary
146141,tt9916706,Dankyavar Danka,Dankyavar Danka,2013,,Comedy
146142,tt9916730,6 Gunn,6 Gunn,2017,116.0,
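
The output above shows blank `runtime_minutes` and `genres` values for some rows. A rough way to quantify this is sketched below, under the assumption that those blanks are stored as SQL `NULL` in `movie_basics`; `missing_summary` is a hypothetical helper name, not part of the original notebook.

```python
import sqlite3

# COUNT(column) skips NULLs, so COUNT(*) - COUNT(column) is the number of missing values.
MISSING_QUERY = """
SELECT COUNT(*)                          AS total_rows,
       COUNT(*) - COUNT(runtime_minutes) AS missing_runtime,
       COUNT(*) - COUNT(genres)          AS missing_genres
FROM movie_basics;
"""


def missing_summary(conn):
    """Return row count and NULL counts for runtime_minutes and genres (illustrative helper)."""
    row = conn.execute(MISSING_QUERY).fetchone()
    return dict(zip(("total_rows", "missing_runtime", "missing_genres"), row))


# Usage with the connection opened above (assumes 'im.db' is available):
# print(missing_summary(conn))
```

Knowing how many rows lack a genre matters here, because any genre-level comparison silently drops those films.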


### Analyzing the movie_basics table

In [8]:
query = """ 
SELECT *
FROM movie_ratings;
"""
pd.read_sql(query, conn)

Unnamed: 0,movie_id,averagerating,numvotes
0,tt10356526,8.3,31
1,tt10384606,8.9,559
2,tt1042974,6.4,20
3,tt1043726,4.2,50352
4,tt1060240,6.5,21
...,...,...,...
73851,tt9805820,8.1,25
73852,tt9844256,7.5,24
73853,tt9851050,4.7,14
73854,tt9886934,7.0,5


### Analyzing the movie_ratings table

In [10]:
query = """ 
SELECT *
FROM movie_ratings
ORDER BY averagerating DESC;
"""
pd.read_sql(query, conn)

Unnamed: 0,movie_id,averagerating,numvotes
0,tt5390098,10.0,5
1,tt6295832,10.0,5
2,tt1770682,10.0,5
3,tt2632430,10.0,5
4,tt8730716,10.0,5
...,...,...,...
73851,tt7926296,1.0,17
73852,tt3235258,1.0,510
73853,tt7831076,1.0,96
73854,tt3262718,1.0,223
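
In the output above, every title rated 10.0 has only 5 votes, so sorting by `averagerating` alone mostly surfaces films with very few ratings. One possible refinement is sketched below: it keeps only titles with at least a minimum number of votes. The `top_rated` name and the default threshold of 1000 votes are illustrative assumptions, not values taken from the original analysis.

```python
import sqlite3

TOP_RATED_QUERY = """
SELECT movie_id, averagerating, numvotes
FROM movie_ratings
WHERE numvotes >= ?
ORDER BY averagerating DESC, numvotes DESC
LIMIT ?;
"""


def top_rated(conn, min_votes=1000, limit=10):
    """Return the highest-rated rows that have at least `min_votes` votes.

    `min_votes=1000` is an arbitrary illustrative cutoff; tune it to the data.
    """
    return conn.execute(TOP_RATED_QUERY, (min_votes, limit)).fetchall()


# Usage with the connection opened above (assumes 'im.db' is available):
# for movie_id, rating, votes in top_rated(conn, min_votes=1000, limit=5):
#     print(movie_id, rating, votes)
```

The same query string can also be passed to `pd.read_sql` with its `params` argument if a DataFrame is preferred.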


In [11]:
query = """ 
SELECT *
FROM movie_ratings
ORDER BY numvotes DESC;
"""
pd.read_sql(query, conn)

Unnamed: 0,movie_id,averagerating,numvotes
0,tt1375666,8.8,1841066
1,tt1345836,8.4,1387769
2,tt0816692,8.6,1299334
3,tt1853728,8.4,1211405
4,tt0848228,8.1,1183655
...,...,...,...
73851,tt8420530,6.8,5
73852,tt8747790,4.6,5
73853,tt9367004,8.2,5
73854,tt9647642,2.0,5


## Joining the two tables

In [16]:
query = """ 
SELECT movie_basics.genres, movie_basics.original_title
FROM movie_basics
JOIN movie_ratings ON movie_basics.movie_id = movie_ratings.movie_id
WHERE movie_ratings.averagerating >= 8
ORDER BY movie_ratings.numvotes DESC;
"""
pd.read_sql(query, conn)

Unnamed: 0,genres,original_title
0,"Action,Adventure,Sci-Fi",Inception
1,"Action,Thriller",The Dark Knight Rises
2,"Adventure,Drama,Sci-Fi",Interstellar
3,"Drama,Western",Django Unchained
4,"Action,Adventure,Sci-Fi",The Avengers
...,...,...
9443,"Biography,Crime,Documentary",Pull of Gravity
9444,"Biography,Documentary,Music",Faith Hope and BBQ
9445,"Adventure,Documentary,History",Who Owns Water
9446,Documentary,Psychic: A Gift of Grace
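
The join above returns combined genre strings such as "Action,Adventure,Sci-Fi", which makes it hard to compare individual genres. A minimal sketch of one way to tally them follows. It assumes genres are comma-separated and that missing genres are stored as `NULL`; `genre_counts` and its default thresholds are hypothetical choices for illustration.

```python
import sqlite3
from collections import Counter

GENRE_QUERY = """
SELECT b.genres
FROM movie_basics AS b
JOIN movie_ratings AS r ON b.movie_id = r.movie_id
WHERE r.averagerating >= ?
  AND r.numvotes >= ?
  AND b.genres IS NOT NULL;
"""


def genre_counts(conn, min_rating=8.0, min_votes=0):
    """Count how often each individual genre appears among highly rated films."""
    counts = Counter()
    for (genres,) in conn.execute(GENRE_QUERY, (min_rating, min_votes)):
        # Split "Action,Adventure,Sci-Fi" into separate genre labels.
        counts.update(g.strip() for g in genres.split(",") if g.strip())
    return counts


# Usage with the connection opened above (assumes 'im.db' is available):
# for genre, n in genre_counts(conn, min_rating=8.0, min_votes=1000).most_common(10):
#     print(genre, n)
```

A film with several genres is counted once per genre, so the totals can exceed the number of films.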
