# **IMDB Movie Analysis**

## 1. Download IMDB Dataset
**Note:** If `gdown` fails because the file's download quota has been exceeded, download the dataset manually, upload it to your Google Drive, and then copy it from Drive into Colab.
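
The manual fallback described in the note can be sketched as a small copy helper. This is only a sketch: `copy_dataset` is a hypothetical helper name, and the Drive path in the comments is an assumed upload location that has to be adjusted to wherever the file was actually placed.

```python
import shutil
from pathlib import Path


def copy_dataset(src, dst_dir="."):
    """Copy a manually uploaded dataset file into the working directory.

    Hypothetical helper, not part of the original notebook.
    """
    src = Path(src)
    if not src.is_file():
        raise FileNotFoundError(f"Dataset not found at {src}")
    dst = Path(dst_dir) / src.name
    shutil.copy(src, dst)  # copies the file contents to dst
    return dst


# In Colab, mount Drive first (google.colab is only importable inside Colab):
#   from google.colab import drive
#   drive.mount('/content/drive')
# Assumed upload location; change it to match your own Drive layout:
#   copy_dataset('/content/drive/MyDrive/IMDB-Movie-Data.csv', '/content')
```

After the copy, `pd.read_csv('IMDB-Movie-Data.csv')` should work the same way as after a successful `gdown` download.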

In [19]:
# Dataset: https://drive.google.com/file/d/1xaj40SRwgcabIxsV1SeNKv2M4UtI3h80/view?usp=share_link
!gdown 1xaj40SRwgcabIxsV1SeNKv2M4UtI3h80

Downloading...
From: https://drive.google.com/uc?id=1xaj40SRwgcabIxsV1SeNKv2M4UtI3h80
To: /content/IMDB-Movie-Data.csv
100% 310k/310k [00:00<00:00, 78.8MB/s]


## 2. Import libraries

In [2]:
import pandas as pd

## 3. Load and view dataset

In [3]:
dataset_path = 'IMDB-Movie-Data.csv'
data = pd.read_csv(dataset_path)
data.head(2)

,Rank,Title,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
0,1,Guardians of the Galaxy,"Action,Adventure,Sci-Fi",A group of intergalactic criminals are forced ...,James Gunn,"Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...",2014,121,8.1,757074,333.13,76.0
1,2,Prometheus,"Adventure,Mystery,Sci-Fi","Following clues to the origin of mankind, a te...",Ridley Scott,"Noomi Rapace, Logan Marshall-Green, Michael Fa...",2012,124,7.0,485820,126.46,65.0


In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 12 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Rank                1000 non-null   int64  
 1   Title               1000 non-null   object 
 2   Genre               1000 non-null   object 
 3   Description         1000 non-null   object 
 4   Director            1000 non-null   object 
 5   Actors              1000 non-null   object 
 6   Year                1000 non-null   int64  
 7   Runtime (Minutes)   1000 non-null   int64  
 8   Rating              1000 non-null   float64
 9   Votes               1000 non-null   int64  
 10  Revenue (Millions)  872 non-null    float64
 11  Metascore           936 non-null    float64
dtypes: float64(3), int64(4), object(5)
memory usage: 93.9+ KB


In [5]:
data.describe()

,Rank,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
count,1000.0,1000.0,1000.0,1000.0,1000.0,872.0,936.0
mean,500.5,2012.783,113.172,6.7232,169808.3,82.956376,58.985043
std,288.819436,3.205962,18.810908,0.945429,188762.6,103.25354,17.194757
min,1.0,2006.0,66.0,1.9,61.0,0.0,11.0
25%,250.75,2010.0,100.0,6.2,36309.0,13.27,47.0
50%,500.5,2014.0,111.0,6.8,110799.0,47.985,59.5
75%,750.25,2016.0,123.0,7.4,239909.8,113.715,72.0
max,1000.0,2016.0,191.0,9.0,1791916.0,936.63,100.0


## 4. Data Selection

In [6]:
data['Genre'].head(2)

,Genre
0,"Action,Adventure,Sci-Fi"
1,"Adventure,Mystery,Sci-Fi"
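
Each `Genre` value is a single comma-separated string, so counting individual genres needs a split first. A minimal sketch on the two rows shown above, assuming the goal is one row per (movie, genre) pair:

```python
import pandas as pd

# The two rows displayed by data['Genre'].head(2)
toy = pd.DataFrame({
    'Title': ['Guardians of the Galaxy', 'Prometheus'],
    'Genre': ['Action,Adventure,Sci-Fi', 'Adventure,Mystery,Sci-Fi'],
})

# Split each string into a list, then give every list element its own row
genres = toy['Genre'].str.split(',').explode()

# Count how often each individual genre appears
genre_counts = genres.value_counts()
print(genre_counts)
```

The same two calls should apply unchanged to `data['Genre']` on the full dataset.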


In [7]:
data[['Title','Genre','Actors','Director','Rating']].tail(2)

,Title,Genre,Actors,Director,Rating
998,Search Party,"Adventure,Comedy","Adam Pally, T.J. Miller, Thomas Middleditch,Sh...",Scot Armstrong,5.6
999,Nine Lives,"Comedy,Family,Fantasy","Kevin Spacey, Jennifer Garner, Robbie Amell,Ch...",Barry Sonnenfeld,5.3


In [8]:
# Select rows by label and columns by name in a single .loc call
data.set_index('Title').loc[['Suicide Squad'], ['Genre', 'Rating']]

Title,Genre,Rating
Suicide Squad,"Action,Adventure,Fantasy",6.2
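
Label-based lookup through `set_index` is one option; a boolean mask on the `Title` column is another that keeps the original integer index. A small sketch on two rows taken from the outputs above:

```python
import pandas as pd

toy = pd.DataFrame({
    'Title': ['Guardians of the Galaxy', 'Suicide Squad'],
    'Genre': ['Action,Adventure,Sci-Fi', 'Action,Adventure,Fantasy'],
    'Rating': [8.1, 6.2],
})

# Option 1: make Title the index, then select by label
by_label = toy.set_index('Title').loc[['Suicide Squad'], ['Genre', 'Rating']]

# Option 2: filter with a boolean mask; the integer index is preserved
by_mask = toy.loc[toy['Title'] == 'Suicide Squad', ['Title', 'Genre', 'Rating']]

print(by_label)
print(by_mask)
```

The mask version also behaves predictably when several movies share a title, since it simply returns every matching row.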


In [9]:
data.iloc[:6:2][['Title','Rating','Revenue (Millions)']]

,Title,Rating,Revenue (Millions)
0,Guardians of the Galaxy,8.1,333.13
2,Split,7.3,138.12
4,Suicide Squad,6.2,325.02


In [10]:
data['Revenue (Millions)'].quantile(0.75)

113.715
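
The value above matches the `75%` row of `data.describe()`. By default `Series.quantile` interpolates linearly between the two nearest sorted values and skips missing entries. A worked sketch on a made-up four-value series (the numbers are illustrative, not from the dataset):

```python
import math

import pandas as pd

s = pd.Series([10.0, 20.0, 30.0, 40.0])  # illustrative values only
q = 0.75

# Fractional position of the quantile in the sorted values
sorted_vals = s.sort_values().to_numpy()
pos = (len(sorted_vals) - 1) * q          # 3 * 0.75 = 2.25
lower = math.floor(pos)                   # index 2 -> 30.0
upper = min(lower + 1, len(sorted_vals) - 1)
frac = pos - lower                        # 0.25

# Linear interpolation between the two neighbouring values
manual = sorted_vals[lower] + frac * (sorted_vals[upper] - sorted_vals[lower])

print(manual, s.quantile(q))
```

Both numbers should agree, which shows that the result need not be a value that actually occurs in the column.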

In [11]:
# Movies from 2015-2016 rated above 7.0 with revenue in the top 5%
revenue_cutoff = data['Revenue (Millions)'].quantile(0.95)
mask = (data['Year'].between(2015, 2016)
        & (data['Rating'] > 7.0)
        & (data['Revenue (Millions)'] > revenue_cutoff))
data[mask].sort_values('Revenue (Millions)', ascending=False).head()

,Rank,Title,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
50,51,Star Wars: Episode VII - The Force Awakens,"Action,Adventure,Fantasy",Three decades after the defeat of the Galactic...,J.J. Abrams,"Daisy Ridley, John Boyega, Oscar Isaac, Domhna...",2015,136,8.1,661608,936.63,81.0
12,13,Rogue One,"Action,Adventure,Sci-Fi",The Rebel Alliance makes a risky move to steal...,Gareth Edwards,"Felicity Jones, Diego Luna, Alan Tudyk, Donnie...",2016,133,7.9,323118,532.17,65.0
119,120,Finding Dory,"Animation,Adventure,Comedy","The friendly but forgetful blue tang fish, Dor...",Andrew Stanton,"Ellen DeGeneres, Albert Brooks,Ed O'Neill, Kai...",2016,97,7.4,157026,486.29,77.0
94,95,Avengers: Age of Ultron,"Action,Adventure,Sci-Fi",When Tony Stark and Bruce Banner try to jump-s...,Joss Whedon,"Robert Downey Jr., Chris Evans, Mark Ruffalo, ...",2015,141,7.4,516895,458.99,66.0
35,36,Captain America: Civil War,"Action,Adventure,Sci-Fi",Political interference in the Avengers' activi...,Anthony Russo,"Chris Evans, Robert Downey Jr.,Scarlett Johans...",2016,147,7.9,411656,408.08,75.0
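
A combined filter like this is easier to check when each condition is a named boolean mask. The sketch below uses five rows copied from the outputs in this notebook; the `0.25` cutoff is an assumption chosen only because the toy frame is so small, whereas the cell above uses `0.95`.

```python
import pandas as pd

toy = pd.DataFrame({
    'Title': ['Guardians of the Galaxy', 'Prometheus',
              'Star Wars: Episode VII - The Force Awakens',
              'Rogue One', 'Finding Dory'],
    'Year': [2014, 2012, 2015, 2016, 2016],
    'Rating': [8.1, 7.0, 8.1, 7.9, 7.4],
    'Revenue (Millions)': [333.13, 126.46, 936.63, 532.17, 486.29],
})

# One named mask per condition
is_recent = toy['Year'].between(2015, 2016)   # inclusive on both ends
is_well_rated = toy['Rating'] > 7.0
cutoff = toy['Revenue (Millions)'].quantile(0.25)  # toy cutoff, see lead-in
is_high_revenue = toy['Revenue (Millions)'] > cutoff

# Combine with & and sort by revenue, highest first
top = (toy[is_recent & is_well_rated & is_high_revenue]
       .sort_values('Revenue (Millions)', ascending=False))
print(top[['Title', 'Year', 'Rating', 'Revenue (Millions)']])
```

Each mask can be inspected on its own (for example with `is_recent.sum()`) before the masks are combined.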


In [12]:
data.groupby('Director')[['Rating']].mean().head(2)

Director,Rating
Aamir Khan,8.5
Abdellatif Kechiche,7.8
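
A mean alone hides how many movies it is based on. A possible extension, sketched on made-up data (the director names and ratings below are placeholders, not rows from the dataset), uses named aggregation to report both the mean and the number of movies:

```python
import pandas as pd

# Placeholder data for illustration only
toy = pd.DataFrame({
    'Director': ['Director A', 'Director A', 'Director B'],
    'Rating': [8.0, 7.0, 6.0],
})

summary = (
    toy.groupby('Director')
       .agg(mean_rating=('Rating', 'mean'),   # average rating per director
            n_movies=('Rating', 'size'))      # number of movies per director
       .sort_values('mean_rating', ascending=False)
)
print(summary)
```

Filtering on `n_movies` afterwards (for example `summary[summary['n_movies'] >= 2]`) would restrict the ranking to directors with more than one movie.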


## 5. Dealing with missing values

In [13]:
data.isnull().sum()

,0
Rank,0
Title,0
Genre,0
Description,0
Director,0
Actors,0
Year,0
Runtime (Minutes),0
Rating,0
Votes,0
Revenue (Millions),128
Metascore,64
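
The `data.info()` output earlier reports 872 non-null values for `Revenue (Millions)` and 936 for `Metascore` out of 1000 rows. The arithmetic behind the missing counts and shares can be sketched as:

```python
import pandas as pd

total_rows = 1000  # from RangeIndex: 1000 entries
non_null = pd.Series({'Revenue (Millions)': 872, 'Metascore': 936})

missing = total_rows - non_null            # absolute number of missing values
missing_pct = missing / total_rows * 100   # share of rows with a missing value

print(pd.DataFrame({'missing': missing, 'missing_pct': missing_pct}))
```

On the real frame, `data.isnull().mean() * 100` should give the same percentages directly.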


In [14]:
data[data.isnull().any(axis=1)].head(2)

,Rank,Title,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
7,8,Mindhorn,Comedy,A has-been actor best known for playing the ti...,Sean Foley,"Essie Davis, Andrea Riseborough, Julian Barrat...",2016,89,6.4,2490,,71.0
22,23,Hounds of Love,"Crime,Drama,Horror",A cold-blooded predatory couple while cruising...,Ben Young,"Emma Booth, Ashleigh Cummings, Stephen Curry,S...",2016,108,6.7,1115,,72.0


In [15]:
# Drop every row that contains at least one missing value
data.dropna().shape

(838, 12)
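
Calling `dropna()` without arguments removes a row if any column is missing. When only one column matters for an analysis, the `subset` parameter keeps more rows. A sketch on a made-up three-row frame (titles and values are placeholders):

```python
import numpy as np
import pandas as pd

# Placeholder data for illustration only
toy = pd.DataFrame({
    'Title': ['A', 'B', 'C'],
    'Revenue (Millions)': [100.0, np.nan, 300.0],
    'Metascore': [70.0, 80.0, np.nan],
})

# Keep only rows with no missing value in any column
all_complete = toy.dropna()

# Keep rows where revenue is present, regardless of Metascore
has_revenue = toy.dropna(subset=['Revenue (Millions)'])

print(all_complete.shape, has_revenue.shape)
```

Neither call modifies `toy`; both return new frames, just like `data.dropna()` above.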

In [16]:
# Replace every NaN, in any column, with the constant 300; no rows are removed
data.fillna(300).shape

(1000, 12)
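
A single constant is applied to every column alike. One common alternative, shown here only as a sketch on made-up data, is to fill each numeric column with its own median by passing a Series keyed by column name:

```python
import numpy as np
import pandas as pd

# Placeholder data for illustration only
toy = pd.DataFrame({
    'Title': ['A', 'B', 'C'],
    'Revenue (Millions)': [100.0, np.nan, 300.0],
    'Metascore': [70.0, 80.0, np.nan],
})

# Per-column medians, computed while ignoring missing values
medians = toy[['Revenue (Millions)', 'Metascore']].median()

# fillna matches the Series index against column names
filled = toy.fillna(medians)
print(filled)
```

Whether a median, a mean, or dropping rows is appropriate depends on the analysis; this sketch only shows the mechanics.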

In [17]:
# Alternatively, drop the two columns that contain missing values
data.drop(['Revenue (Millions)', 'Metascore'], axis=1).head(2)

,Rank,Title,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes
0,1,Guardians of the Galaxy,"Action,Adventure,Sci-Fi",A group of intergalactic criminals are forced ...,James Gunn,"Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...",2014,121,8.1,757074
1,2,Prometheus,"Adventure,Mystery,Sci-Fi","Following clues to the origin of mankind, a te...",Ridley Scott,"Noomi Rapace, Logan Marshall-Green, Michael Fa...",2012,124,7.0,485820


## 6. Applying functions

In [18]:
def rating_group(rating):
    if rating >= 7.5:
        return 'Good'
    elif rating >= 6.0:
        return 'Average'
    else:
        return 'Bad'

data['Rating_category'] = data['Rating'].apply(rating_group)
data[['Title','Director','Rating','Rating_category']].head(5)

,Title,Director,Rating,Rating_category
0,Guardians of the Galaxy,James Gunn,8.1,Good
1,Prometheus,Ridley Scott,7.0,Average
2,Split,M. Night Shyamalan,7.3,Average
3,Sing,Christophe Lourdelet,7.2,Average
4,Suicide Squad,David Ayer,6.2,Average
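
The same binning can also be expressed without a Python-level function by using `pd.cut`. The sketch below takes the five ratings shown above plus three boundary values (7.5, 6.0, 5.9) added here as an assumption to exercise the thresholds, and checks the result against `rating_group`:

```python
import numpy as np
import pandas as pd


def rating_group(rating):
    # Same thresholds as the notebook cell above
    if rating >= 7.5:
        return 'Good'
    elif rating >= 6.0:
        return 'Average'
    else:
        return 'Bad'


# First five values come from the output above; the last three are added
# boundary cases for illustration
ratings = pd.Series([8.1, 7.0, 7.3, 7.2, 6.2, 7.5, 6.0, 5.9])

# right=False makes each bin closed on the left: [6.0, 7.5) -> 'Average'
categories = pd.cut(
    ratings,
    bins=[-np.inf, 6.0, 7.5, np.inf],
    labels=['Bad', 'Average', 'Good'],
    right=False,
)

print(pd.DataFrame({'Rating': ratings, 'Rating_category': categories}))
```

One difference to keep in mind: `pd.cut` leaves a missing rating as missing, while `rating_group` would label it `'Bad'` because both comparisons are false for NaN. The `Rating` column here has no missing values, so the two approaches agree.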
