# Investigate a TMDb Movie Database

## Introduction

In this project we investigate a TMDb movies database file that collects important details on more than 10,000 movies, including budget, revenue, release date, and more.

Let's take a glimpse at the TMDb movie database CSV file.

In [1]:
import pandas as pd

# Read the TMDb CSV file into a DataFrame
glimpse_tmdb = pd.read_csv('data/data.csv')

# Show the first 5 rows of the TMDb data (the header is not counted)
glimpse_tmdb.head()

Unnamed: 0,id,imdb_id,popularity,budget,revenue,original_title,cast,homepage,director,tagline,...,overview,runtime,genres,production_companies,release_date,vote_count,vote_average,release_year,budget_adj,revenue_adj
0,135397,tt0369610,32.985763,150000000,1513528810,Jurassic World,Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vi...,http://www.jurassicworld.com/,Colin Trevorrow,The park is open.,...,Twenty-two years after the events of Jurassic ...,124,Action|Adventure|Science Fiction|Thriller,Universal Studios|Amblin Entertainment|Legenda...,6/9/2015,5562,6.5,2015,137999939.3,1392446000.0
1,76341,tt1392190,28.419936,150000000,378436354,Mad Max: Fury Road,Tom Hardy|Charlize Theron|Hugh Keays-Byrne|Nic...,http://www.madmaxmovie.com/,George Miller,What a Lovely Day.,...,An apocalyptic story set in the furthest reach...,120,Action|Adventure|Science Fiction|Thriller,Village Roadshow Pictures|Kennedy Miller Produ...,5/13/2015,6185,7.1,2015,137999939.3,348161300.0
2,262500,tt2908446,13.112507,110000000,295238201,Insurgent,Shailene Woodley|Theo James|Kate Winslet|Ansel...,http://www.thedivergentseries.movie/#insurgent,Robert Schwentke,One Choice Can Destroy You,...,Beatrice Prior must confront her inner demons ...,119,Adventure|Science Fiction|Thriller,Summit Entertainment|Mandeville Films|Red Wago...,3/18/2015,2480,6.3,2015,101199955.5,271619000.0
3,140607,tt2488496,11.173104,200000000,2068178225,Star Wars: The Force Awakens,Harrison Ford|Mark Hamill|Carrie Fisher|Adam D...,http://www.starwars.com/films/star-wars-episod...,J.J. Abrams,Every generation has a story.,...,Thirty years after defeating the Galactic Empi...,136,Action|Adventure|Science Fiction|Fantasy,Lucasfilm|Truenorth Productions|Bad Robot,12/15/2015,5292,7.5,2015,183999919.0,1902723000.0
4,168259,tt2820852,9.335014,190000000,1506249360,Furious 7,Vin Diesel|Paul Walker|Jason Statham|Michelle ...,http://www.furious7.com/,James Wan,Vengeance Hits Home,...,Deckard Shaw seeks revenge against Dominic Tor...,137,Action|Crime|Thriller,Universal Pictures|Original Film|Media Rights ...,4/1/2015,2947,7.3,2015,174799923.1,1385749000.0


### What can we say about the dataset provided?
<ul>
    <li>The columns *'budget', 'revenue', 'budget_adj', 'revenue_adj'* do not state a currency; for this dataset we will assume the values are in US dollars.</li>
    <li>Vote counts differ widely between movies. For example, *'Mad Max: Fury Road'* has *6k+* votes (as seen above), while *Sinister 2* has only *331 votes*. Because the number of voters varies so much, the *vote_average* column is affected as well, so we cannot assume that the movie with the most votes or the highest rating was the most successful.</li>
</ul>  

### What Questions can be brainstormed?
Looking at this database...
<ul>
<li>The first question that comes to mind is which movie made the most profit or, put another way, which movie has been the people's favourite?</li>

<li>This glimpse only shows movies from 2015, but the dataset also contains movies released in other years, so the second question is: in which year did movies make the most profit?</li>

<li>Finally, what characteristics do the movies with the highest profits have in common?</li>
</ul>


### Questions to be Answered
<ol>
    <li>General questions about the dataset.</li>
        <ol type = 'a'>
            <li>Which movie earns the most and least profit?</li>
            <li>Which movie had the greatest and least runtime?</li>
            <li>Which movie had the greatest and least budget?</li>
            <li>Which movie had the greatest and least revenue?</li>
            <li>What is the average runtime of all movies?</li>
            <li>In which year did the most movies make a profit?</li>
        </ol>
    <li>What characteristics do the most profitable movies have in common?</li>
        <ol type = 'a'>
            <li>Average duration of movies.</li>
            <li>Average budget.</li>
            <li>Average revenue.</li>
            <li>Average profit.</li>
            <li>Which director directed the most films?</li>
            <li>Which cast member has appeared the most?</li>
            <li>Which genres were the most successful?</li>
        </ol>
</ol>


-----


## Data Cleaning

**Before answering the above questions we need a clean dataset which has columns and rows we need for calculations.**

First, let's clean up the columns.
We will keep only the columns we need and remove the rest.

Columns to delete -  `id, imdb_id, popularity, budget_adj, revenue_adj, homepage, keywords, overview, production_companies, vote_count and vote_average.`

**We have already cleaned the dataset for you**
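
For reference, a cleanup along these lines could be sketched as follows. The exact steps used to produce the cleaned file are not shown in this notebook, so the helper name `clean_tmdb` and the plain `profit` column name are assumptions; profit is taken to be revenue minus budget, which matches the figures in the cleaned file.

```python
import pandas as pd

# Columns listed above as not needed for the analysis.
COLUMNS_TO_DROP = [
    'id', 'imdb_id', 'popularity', 'budget_adj', 'revenue_adj', 'homepage',
    'keywords', 'overview', 'production_companies', 'vote_count', 'vote_average',
]

def clean_tmdb(raw):
    """Hypothetical helper: drop unused columns and add a profit column."""
    # errors='ignore' skips any listed column that is absent from the frame.
    cleaned = raw.drop(columns=COLUMNS_TO_DROP, errors='ignore')
    # Assumption: profit is revenue minus budget.
    cleaned['profit'] = cleaned['revenue'] - cleaned['budget']
    return cleaned

# Tiny stand-in for the raw file, using two rows from the preview above.
raw_sample = pd.DataFrame({
    'id': [135397, 76341],
    'popularity': [32.985763, 28.419936],
    'budget': [150000000, 150000000],
    'revenue': [1513528810, 378436354],
    'original_title': ['Jurassic World', 'Mad Max: Fury Road'],
})
cleaned_sample = clean_tmdb(raw_sample)
print(cleaned_sample)
```

The cleaned file used below has more descriptive column names (for example `profit_(in_US_Dollars)`), so a rename step would be needed to reproduce it exactly.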

In [74]:
# Import the libraries we need for our analysis
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns

# Load the cleaned TMDb movie data into a DataFrame
movie_data = pd.read_csv('data/movie_data_clean.csv')
movie_data.head(3)

Unnamed: 0,budget_(in_US-Dollars),revenue_(in_US-Dollars),profit_(in_US_Dollars),original_title,cast,director,tagline,runtime,genres,release_date,release_year
0,150000000,1513528810,1363528810,Jurassic World,Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vi...,Colin Trevorrow,The park is open.,124,Action|Adventure|Science Fiction|Thriller,2015-06-09,2015
1,150000000,378436354,228436354,Mad Max: Fury Road,Tom Hardy|Charlize Theron|Hugh Keays-Byrne|Nic...,George Miller,What a Lovely Day.,120,Action|Adventure|Science Fiction|Thriller,2015-05-13,2015
2,110000000,295238201,185238201,Insurgent,Shailene Woodley|Theo James|Kate Winslet|Ansel...,Robert Schwentke,One Choice Can Destroy You,119,Adventure|Science Fiction|Thriller,2015-03-18,2015


**Now let's dig deep and answer the questions!**

### Q1. 1A Which movie earns the most and least profit?
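One possible way to approach this question is to look up the rows holding the largest and smallest values of a column. The sketch below uses three rows copied from the cleaned table above as a stand-in for `movie_data`; the helper name `highest_lowest` is hypothetical.

```python
import pandas as pd

# Three rows copied from the cleaned table above, standing in for movie_data.
sample = pd.DataFrame({
    'profit_(in_US_Dollars)': [1363528810, 228436354, 185238201],
    'original_title': ['Jurassic World', 'Mad Max: Fury Road', 'Insurgent'],
    'runtime': [124, 120, 119],
})

def highest_lowest(df, column):
    """Hypothetical helper: rows with the max and min of `column`, side by side."""
    # idxmax / idxmin return the index label of the first extreme value found.
    high = df.loc[df[column].idxmax()]
    low = df.loc[df[column].idxmin()]
    return pd.concat([high, low], axis=1, keys=['highest', 'lowest'])

extremes = highest_lowest(sample, 'profit_(in_US_Dollars)')
print(extremes)
```

Passing `'runtime'` or one of the budget or revenue columns instead would cover questions 1B to 1D in the same way. If several movies tie for an extreme, only the first one is returned.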

### 1B Which movie had the greatest and least runtime?

### 1C Which movie had the greatest and least budget?

### 1D Which movie had the greatest and least revenue?

### 1E What is the average runtime of all movies?

### 1F In which year did the most movies make a profit?
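A minimal sketch of one reading of this question, namely the year with the highest total profit, is shown below. The numbers are made-up toy values for illustration only, since the preview rows all come from 2015; only the column names follow the cleaned table.

```python
import pandas as pd

# Made-up toy values for illustration only; not taken from the dataset.
toy = pd.DataFrame({
    'release_year': [2013, 2014, 2014, 2015, 2015],
    'profit_(in_US_Dollars)': [50, 30, 40, 100, -20],
})

# Sum the profit of all movies released in each year.
profit_per_year = toy.groupby('release_year')['profit_(in_US_Dollars)'].sum()

# idxmax returns the index label (here, the year) of the largest total.
best_year = profit_per_year.idxmax()
print(profit_per_year)
print('Year with the highest total profit:', best_year)
```

If the question is instead read as "which year had the largest number of profitable movies", filtering to rows with a positive profit and calling `.groupby('release_year').size()` would give per-year counts to compare.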

### Q2. 2A Average runtime of movies
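Questions 2A to 2D all take an average over the most profitable movies, so one sketch can cover them. The source does not define what counts as "most profitable"; the cut-off `PROFIT_THRESHOLD` below is a hypothetical value chosen only to make the example work on three rows copied from the cleaned table above.

```python
import pandas as pd

# Three rows copied from the cleaned table above, standing in for movie_data.
sample = pd.DataFrame({
    'budget_(in_US-Dollars)': [150000000, 150000000, 110000000],
    'revenue_(in_US-Dollars)': [1513528810, 378436354, 295238201],
    'profit_(in_US_Dollars)': [1363528810, 228436354, 185238201],
    'runtime': [124, 120, 119],
})

# Hypothetical cut-off for "most profitable"; the source does not define one.
PROFIT_THRESHOLD = 200_000_000

# Keep only the movies at or above the cut-off, then average every column.
top_movies = sample[sample['profit_(in_US_Dollars)'] >= PROFIT_THRESHOLD]
averages = top_movies.mean()
print(averages)
```

On the full dataset, non-numeric columns such as `original_title` would need to be excluded first, for example with `top_movies.mean(numeric_only=True)`.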

### 2B Average Budget of Movies

### 2C Average Revenue of Movies

### 2D Average Profit of Movies

### 2E Which director directed the most films?

### 2F Which cast has appeared the most?

### 2G Which genres were the most successful?
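
The `cast` and `genres` columns pack several values into one pipe-separated string, so counting appearances needs a split step first. The sketch below shows one way to do that with the genre strings from the cleaned table above; the helper name `count_values` is hypothetical.

```python
import pandas as pd

# Genre strings copied from the cleaned table above.
sample = pd.DataFrame({
    'genres': [
        'Action|Adventure|Science Fiction|Thriller',
        'Action|Adventure|Science Fiction|Thriller',
        'Adventure|Science Fiction|Thriller',
    ],
})

def count_values(df, column):
    """Hypothetical helper: count each pipe-separated value in `column`."""
    # Split 'A|B|C' into ['A', 'B', 'C'], then give every value its own row.
    values = df[column].dropna().str.split('|').explode()
    return values.value_counts()

genre_counts = count_values(sample, 'genres')
print(genre_counts)
```

The same helper could be applied to `cast`, and a plain `value_counts()` on `director` may be enough if each row names a single director. To focus on successful movies, the DataFrame would be filtered by profit before counting.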