# Exploratory Data Analysis
In this notebook, we explore the datasets to get a feel for the best way to build the predictive model. Woof woof!

## Project Setup

In [1]:
# Importing the necessary libraries
import pandas as pd

In [2]:
# Loading the raw dataset
df_raw = pd.read_csv('../data/raw.csv')

## Raw Dataset Exploration

In [6]:
# Viewing the first few rows of the dataset
df_raw.head()

Unnamed: 0,Name,Category,Rating,Flickable,Episode Number
0,Zoolander 2,Movie,7.0,Yes,10
1,Dope,Movie,8.5,Yes,11
2,The Big Short,Movie,8.0,Yes,12
3,Deadpool,Movie,10.0,Yes,13
4,Vinyl,TV Show,7.5,Yes,15


In [4]:
# Dropping the "Notes" column as it is just used for fun and will not be applicable to the model
df_raw = df_raw.drop(columns=['Notes'])

In [5]:
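# Keeping only reviews from the main show: drop rows whose episode number starts with "M"
# (na=True also drops rows with a missing episode number; .copy() avoids SettingWithCopyWarning later)
df_raw = df_raw[~df_raw['Episode Number'].str.startswith('M', na=True)].copy()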

In [8]:
# Changing datatype of "Episode Number" from object to int
df_raw['Episode Number'] = df_raw['Episode Number'].astype(int)
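
The filter-then-cast steps above can be sketched on a toy frame. This is only an illustration under stated assumptions: the first two rows come from the `head()` output earlier in this notebook, while the third row and its `'M1'` episode label are hypothetical stand-ins, since the real non-main-show labels are not shown here. Taking an explicit `.copy()` after filtering is one way to keep pandas from warning about assigning to a slice.

```python
import pandas as pd

# Toy frame: two rows from the head() output plus one hypothetical
# non-main-show row whose episode label starts with "M" (assumed format).
toy = pd.DataFrame({
    'Name': ['Zoolander 2', 'Dope', 'Hypothetical bonus review'],
    'Episode Number': ['10', '11', 'M1'],
})

# Keep only rows whose episode label does not start with "M";
# .copy() returns an independent frame that is safe to modify.
main_show = toy[~toy['Episode Number'].str.startswith('M')].copy()

# With only digit strings left, the cast to int succeeds.
main_show['Episode Number'] = main_show['Episode Number'].astype(int)

print(main_show)
```

Running the cast before the filter would fail on the toy frame, because `'M1'` cannot be parsed as an integer, which is why the order of the two cells matters.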

## The Two Banger

In [17]:
# Getting all the reviews after episode 69 (between the first and second banger)
two_banger = df_raw[df_raw['Episode Number'] > 69]

In [18]:
# Viewing general info about the two banger
two_banger.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 89 entries, 107 to 195
Data columns (total 5 columns):
Name              89 non-null object
Category          89 non-null object
Rating            88 non-null float64
Flickable         89 non-null object
Episode Number    89 non-null int32
dtypes: float64(1), int32(1), object(3)
memory usage: 2.8+ KB
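
The `info()` output shows 88 non-null ratings across 89 rows, so one review has no rating. A minimal sketch of how such rows could be located with `isna()` follows. It uses a toy frame: the first two rows are taken from tables in this notebook, and the third is a hypothetical unrated entry, since the actual row is not shown here.

```python
import numpy as np
import pandas as pd

# Toy frame: two real rows plus one hypothetical row with a missing rating.
toy = pd.DataFrame({
    'Name': ['Pizza', 'Twix', 'Hypothetical unrated item'],
    'Category': ['Food', 'Food', 'Other'],
    'Rating': [10.0, 2.0, np.nan],
})

# Count missing values per column.
missing_counts = toy.isna().sum()

# Select the rows where the rating is missing.
unrated = toy[toy['Rating'].isna()]

print(missing_counts)
print(unrated)
```

On the real data, the same selection would be `two_banger[two_banger['Rating'].isna()]`. Whether to drop or fill that row is a modeling decision this notebook does not settle.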


In [19]:
# Viewing the distribution of category types
two_banger['Category'].value_counts()

Movie      50
Food       14
Other      13
TV Show     8
Person      4
Name: Category, dtype: int64
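
Raw counts can be easier to compare as shares of the total. The sketch below rebuilds the counts shown above as a Series and divides by their sum. On the real data, `two_banger['Category'].value_counts(normalize=True)` should give the same proportions directly.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts = pd.Series(
    {'Movie': 50, 'Food': 14, 'Other': 13, 'TV Show': 8, 'Person': 4},
    name='Category',
)

# Convert counts to shares of all 89 reviews, rounded for readability.
shares = (counts / counts.sum()).round(3)

print(shares)
```

Movies account for 50 of the 89 reviews, a little over half, so any model trained on this slice will see far more movies than people or TV shows.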

In [20]:
# Viewing all people rated in this timeframe
two_banger[two_banger['Category'] == 'Person']

Unnamed: 0,Name,Category,Rating,Flickable,Episode Number
108,Neil DeGrasse Tyson,Person,9.5,Yes,72
109,Bill Nye,Person,7.0,Yes,72
152,Michael Biehn,Person,10.0,Yes,100
163,Caelan's little bro,Person,10.0,Yes,114


In [16]:
# Viewing perfect 10s
two_banger[two_banger['Rating'] == 10]

Unnamed: 0,Name,Category,Rating,Flickable,Episode Number
115,Pizza,Food,10.0,Yes,75
118,Summer vacation,Other,10.0,Yes,75
119,Sending caelan free Boston baked beans,Other,10.0,Yes,81
132,Rick and Marty (Season 1),TV Show,10.0,Yes,86
150,Bartenders on the Carolla cruise,Other,10.0,Yes,100
152,Michael Biehn,Person,10.0,Yes,100
163,Caelan's little bro,Person,10.0,Yes,114


In [21]:
# Viewing perfect 0s
two_banger[two_banger['Rating'] == 0]

Unnamed: 0,Name,Category,Rating,Flickable,Episode Number
111,Sonic's,Food,0.0,No,73
144,Whatchamacallit,Food,0.0,No,96


In [25]:
# Viewing low scores (ratings of 3 or below)
two_banger[two_banger['Rating'] <= 3]

Unnamed: 0,Name,Category,Rating,Flickable,Episode Number
111,Sonic's,Food,0.0,No,73
130,Life,Movie,3.0,No,86
134,Death Note,Movie,0.5,No,87
141,Reese's Pieces,Food,3.0,No,96
142,Reese's Peanut Butter Cup,Food,3.0,Yes,96
143,Twix,Food,2.0,No,96
144,Whatchamacallit,Food,0.0,No,96
153,Mixed berry e-cigarette,Other,3.0,No,100
160,The Cloverfield Paradox,Movie,2.0,No,109
173,Kidney bean,Food,2.0,No,126
