# Table of Contents

* [Data Sources](#Data-Sources)
* [Gather the Data](#Gather-the-Data)
    * [Merge Dataframes / Tables](#Merge-Dataframes-/-Tables)
* [Explore the Data](#Explore-the-Data)
* [Model the Data](#Model-the-Data)
* [Visualize the Results](#Visualize-the-Results)


<hr>

## Data Sources

Description of the IMDB data: https://www.imdb.com/interfaces/

IMDB Data Sources: https://datasets.imdbws.com/

<hr>

## Gather the Data

In [1]:
%matplotlib inline

In [2]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.api as sm

In [3]:
sns.set(rc={'figure.figsize': (12, 10), "lines.markeredgewidth": 0.5 })

In [4]:
#--------------------------------------------------------
#--  Input File 1:  name.basics.tsv
#--------------------------------------------------------
print('Reading name.basics.tsv')
nameBasics = pd.read_csv("../Data/name.basics.tsv/data.tsv", sep='\t')
print('Complete - 1 of 3')
print(nameBasics.head(5))

#--------------------------------------------------------
#--  Input File 6:  title.principals.tsv
#--------------------------------------------------------
print('Reading title.principals.tsv')
titlePrincipals = pd.read_csv("../Data/title.principals.tsv/data.tsv", sep='\t')
print('Complete - 2 of 3')
print(titlePrincipals.head(5))

#--------------------------------------------------------
#--  Input File 7:  title.ratings.tsv
#--------------------------------------------------------
print('Reading title.ratings.tsv')
titleRatings = pd.read_csv("../Data/title.ratings.tsv/data.tsv", sep='\t',dtype={"tconst": object, "averageRating": float, "numVotes": int})
print('Complete - 3 of 3')
print(titleRatings.head(5))

print('\n-----all data loaded -----')

Reading name.basics.tsv
Complete - 1 of 3
      nconst      primaryName birthYear deathYear  \
0  nm0000001     Fred Astaire      1899      1987   
1  nm0000002    Lauren Bacall      1924      2014   
2  nm0000003  Brigitte Bardot      1934        \N   
3  nm0000004     John Belushi      1949      1982   
4  nm0000005   Ingmar Bergman      1918      2007   

                primaryProfession                           knownForTitles  
0  soundtrack,actor,miscellaneous  tt0072308,tt0045537,tt0050419,tt0043044  
1              actress,soundtrack  tt0117057,tt0038355,tt0071877,tt0037382  
2     actress,soundtrack,producer  tt0059956,tt0054452,tt0057345,tt0049189  
3         actor,writer,soundtrack  tt0072562,tt0077975,tt0078723,tt0080455  
4           writer,director,actor  tt0050976,tt0083922,tt0060827,tt0050986  
Reading title.principals.tsv
Complete - 2 of 3
      tconst  ordering     nconst         category                      job  \
0  tt0000001         1  nm1588970             self 

### Dataset Descriptions

**name.basics.tsv.gz** - Contains the following information for names:

- nconst (string) – alphanumeric unique identifier of the name/person
- primaryName (string) – name by which the person is most often credited
- birthYear – in YYYY format
- deathYear – in YYYY format if applicable, else '\N'
- primaryProfession (array of strings) – the top-3 professions of the person
- knownForTitles (array of tconsts) – titles the person is known for


**title.principals.tsv** - Contains the principal cast/crew for titles:

- tconst (string) – alphanumeric unique identifier of the title
- ordering (integer) – a number to uniquely identify rows for a given titleId
- nconst (string) – alphanumeric unique identifier of the name/person
- category (string) – the category of job that person was in
- job (string) – the specific job title if applicable, else '\N'
- characters (string) – the name of the character played if applicable, else '\N' 
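
Because `primaryProfession` is stored as one comma-separated string, counting or filtering by a single profession needs a split first. The sketch below is only an illustration on two rows copied from the sample output; the frame name `names` and the new column `profession` are assumptions, not part of the notebook.

```python
import pandas as pd

# Toy rows shaped like name.basics (values copied from the sample output above).
names = pd.DataFrame({
    "nconst": ["nm0000001", "nm0000002"],
    "primaryProfession": ["soundtrack,actor,miscellaneous", "actress,soundtrack"],
})

# Split the comma-separated string into a list, then emit one row per profession.
professions = (
    names.assign(profession=names["primaryProfession"].str.split(","))
         .explode("profession")
         .drop(columns="primaryProfession")
         .reset_index(drop=True)
)
print(professions)
```

The same pattern should apply to `knownForTitles`, which uses the same comma-separated layout.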

### Merge Dataframes / Tables

In [5]:
print(len(titlePrincipals))
titlePrincipals.head(5)

29345162


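|   | tconst | ordering | nconst | category | job | characters |
| - | ------ | -------- | ------ | -------- | --- | ---------- |
| 0 | tt0000001 | 1 | nm1588970 | self | \N | ["Herself"] |
| 1 | tt0000001 | 2 | nm0005690 | director | \N | \N |
| 2 | tt0000001 | 3 | nm0374658 | cinematographer | director of photography | \N |
| 3 | tt0000002 | 1 | nm0721526 | director | \N | \N |
| 4 | tt0000002 | 2 | nm1335271 | composer | \N | \N |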


In [6]:
# replace the '\N' placeholder strings with pandas missing values (NaN)
# source: https://stackoverflow.com/a/49406417
titlePrincipals = titlePrincipals.replace({'\\N': np.nan})
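
An alternative to replacing `'\N'` after loading is to declare it as a missing-value marker when the file is read, which avoids a second pass over a large frame. This is a minimal sketch on an in-memory sample (rows taken from the output above); `sample_tsv` and `principals` are illustrative names only.

```python
import io
import pandas as pd

# A tiny in-memory stand-in for title.principals.tsv (rows from the output above).
sample_tsv = (
    "tconst\tordering\tnconst\tcategory\tjob\tcharacters\n"
    "tt0000001\t2\tnm0005690\tdirector\t\\N\t\\N\n"
    "tt0000001\t3\tnm0374658\tcinematographer\tdirector of photography\t\\N\n"
)

# na_values tells the parser to treat the literal string \N as missing.
principals = pd.read_csv(io.StringIO(sample_tsv), sep="\t", na_values="\\N")
print(principals.isna().sum())
```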

In [7]:
print(len(titlePrincipals))
titlePrincipals.head(5)

29345162


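|   | tconst | ordering | nconst | category | job | characters |
| - | ------ | -------- | ------ | -------- | --- | ---------- |
| 0 | tt0000001 | 1 | nm1588970 | self | NaN | ["Herself"] |
| 1 | tt0000001 | 2 | nm0005690 | director | NaN | NaN |
| 2 | tt0000001 | 3 | nm0374658 | cinematographer | director of photography | NaN |
| 3 | tt0000002 | 1 | nm0721526 | director | NaN | NaN |
| 4 | tt0000002 | 2 | nm1335271 | composer | NaN | NaN |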


In [8]:
titlePrincipals.job.unique()

array([nan, 'director of photography', 'producer', ..., 'co-creater',
       'faculty advisor', 'play "Racajda"'], dtype=object)

In [9]:
print(len(nameBasics))
nameBasics.head()

8739727


|   | nconst | primaryName | birthYear | deathYear | primaryProfession | knownForTitles |
| - | ------ | ----------- | --------- | --------- | ----------------- | -------------- |
| 0 | nm0000001 | Fred Astaire | 1899 | 1987 | soundtrack,actor,miscellaneous | tt0072308,tt0045537,tt0050419,tt0043044 |
| 1 | nm0000002 | Lauren Bacall | 1924 | 2014 | actress,soundtrack | tt0117057,tt0038355,tt0071877,tt0037382 |
| 2 | nm0000003 | Brigitte Bardot | 1934 | \N | actress,soundtrack,producer | tt0059956,tt0054452,tt0057345,tt0049189 |
| 3 | nm0000004 | John Belushi | 1949 | 1982 | actor,writer,soundtrack | tt0072562,tt0077975,tt0078723,tt0080455 |
| 4 | nm0000005 | Ingmar Bergman | 1918 | 2007 | writer,director,actor | tt0050976,tt0083922,tt0060827,tt0050986 |


In [10]:
# drop unused columns
nameBasics = nameBasics.drop(['knownForTitles'], axis=1)

In [11]:
# replace the '\N' placeholder strings with pandas missing values (NaN)
nameBasics = nameBasics.replace({'\\N': np.nan})

In [12]:
nameBasics[nameBasics.deathYear.notnull()].head()

|   | nconst | primaryName | birthYear | deathYear | primaryProfession |
| - | ------ | ----------- | --------- | --------- | ----------------- |
| 0 | nm0000001 | Fred Astaire | 1899 | 1987 | soundtrack,actor,miscellaneous |
| 1 | nm0000002 | Lauren Bacall | 1924 | 2014 | actress,soundtrack |
| 3 | nm0000004 | John Belushi | 1949 | 1982 | actor,writer,soundtrack |
| 4 | nm0000005 | Ingmar Bergman | 1918 | 2007 | writer,director,actor |
| 5 | nm0000006 | Ingrid Bergman | 1915 | 1982 | actress,soundtrack,producer |


In [13]:
nameBasics[nameBasics.deathYear.notnull()].index

Int64Index([      0,       1,       3,       4,       5,       6,       7,
                  8,       9,      10,
            ...
            8698619, 8698621, 8698622, 8703060, 8704749, 8705076, 8712952,
            8721585, 8721590, 8727373],
           dtype='int64', length=147760)

In [14]:
# drop every person with a recorded death year, keeping only those with no deathYear
# source: https://stackoverflow.com/a/27360130
nameBasics = nameBasics.drop(nameBasics[nameBasics.deathYear.notnull()].index)

In [15]:
# drop unused columns
nameBasics = nameBasics.drop(['deathYear'], axis=1)
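
The two drops above can also be written as a single boolean-mask selection. The sketch below uses a two-row toy frame (`people` and `living` are illustrative names) and is meant only to show the pattern.

```python
import numpy as np
import pandas as pd

# Two sample people: one with a recorded death year, one without.
people = pd.DataFrame({
    "nconst": ["nm0000001", "nm0000003"],
    "primaryName": ["Fred Astaire", "Brigitte Bardot"],
    "deathYear": ["1987", np.nan],
})

# Keep only rows with no recorded death year, then drop the now-empty column.
living = people[people["deathYear"].isna()].drop(columns="deathYear")
print(living)
```

Note that a missing `deathYear` only means no death is recorded in the data, which is not necessarily the same as the person being alive.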

In [16]:
print(len(nameBasics))
nameBasics.head()

8591967


|   | nconst | primaryName | birthYear | primaryProfession |
| - | ------ | ----------- | --------- | ----------------- |
| 2 | nm0000003 | Brigitte Bardot | 1934 | actress,soundtrack,producer |
| 12 | nm0000013 | Doris Day | 1922 | soundtrack,actress,producer |
| 13 | nm0000014 | Olivia de Havilland | 1916 | actress,soundtrack |
| 17 | nm0000018 | Kirk Douglas | 1916 | actor,producer,soundtrack |
| 46 | nm0000047 | Sophia Loren | 1934 | actress,soundtrack |


In [17]:
# count the missing values in each column
nameBasics.isnull().sum()

nconst                     0
primaryName                0
birthYear            8309010
primaryProfession    1515654
dtype: int64

In [18]:
# returns a right join: every row of nameBasics is kept, even people with no credit in titlePrincipals
principal_data = pd.merge(titlePrincipals, nameBasics, how='right', on=['nconst'])

# Check the length of the resulting join
print(len(principal_data))

principal_data.head()

30956887


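|   | tconst | ordering | nconst | category | job | characters | primaryName | birthYear | primaryProfession |
| - | ------ | -------- | ------ | -------- | --- | ---------- | ----------- | --------- | ----------------- |
| 0 | tt0000001 | 1.0 | nm1588970 | self | NaN | ["Herself"] | Carmencita | NaN | soundtrack |
| 1 | tt7513040 | 2.0 | nm1588970 | archive_footage | NaN | ["Herself"] | Carmencita | NaN | soundtrack |
| 2 | tt0000003 | 2.0 | nm5442194 | producer | producer | NaN | Julien Pappé | NaN | producer |
| 3 | tt0000003 | 4.0 | nm5442200 | editor | NaN | NaN | Tamara Pappé | NaN | editor |
| 4 | tt0000005 | 1.0 | nm0443482 | actor | NaN | ["Blacksmith"] | Charles Kayser | NaN | actor |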
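
Since the merge uses `how='right'`, people in `nameBasics` without any credit still appear in the result, with missing `tconst`, `ordering`, and `category`. One way to see where each row came from is the `indicator` option of `pd.merge`. The sketch below uses toy frames; the id `nm9999999` and the name attached to it are made up for illustration.

```python
import pandas as pd

principals = pd.DataFrame({
    "tconst": ["tt0000001", "tt0000001", "tt0000003"],
    "nconst": ["nm1588970", "nm0005690", "nm5442194"],
    "category": ["self", "director", "producer"],
})
names = pd.DataFrame({
    "nconst": ["nm1588970", "nm5442194", "nm9999999"],  # last id is made up
    "primaryName": ["Carmencita", "Julien Pappé", "Hypothetical Person"],
})

# how='right' keeps every row of `names`; indicator=True adds a `_merge`
# column saying whether a row matched ('both') or came from one side only.
merged = pd.merge(principals, names, how="right", on="nconst", indicator=True)
print(merged["_merge"].value_counts())

# how='inner' keeps only people who have at least one credit.
inner = pd.merge(principals, names, how="inner", on="nconst")
print(len(inner))
```

If people without credits are not needed for the analysis, an inner join may be the simpler choice, but that is a design decision for the notebook rather than something this sketch settles.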


<hr> 

## Explore the Data

**Decision Needed:** Do we limit the analysis of cast and crew to just movies? Or do we take into account their experience on other projects such as television?
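
If the decision is to limit the analysis to movies, one possible approach is to filter on the `titleType` column of `title.basics.tsv`, a file from the same IMDB source that this notebook does not load. The sketch below is hypothetical: the frames are toy data, the `titleType` values attached to each `tconst` are illustrative, and the `nconst` values are placeholders.

```python
import pandas as pd

# Hypothetical stand-in for title.basics.tsv, which this notebook does not load.
title_basics = pd.DataFrame({
    "tconst": ["tt0000001", "tt0000002", "tt0000003"],
    "titleType": ["short", "movie", "tvSeries"],  # illustrative values only
})
# Toy credits table; the nconst values are placeholders.
credits = pd.DataFrame({
    "tconst": ["tt0000001", "tt0000002", "tt0000002", "tt0000003"],
    "nconst": ["nm1", "nm2", "nm3", "nm2"],
})

# Collect the ids of movie titles, then keep only credits on those titles.
movie_ids = title_basics.loc[title_basics["titleType"] == "movie", "tconst"]
movie_credits = credits[credits["tconst"].isin(movie_ids)]
print(movie_credits)
```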

**Decision Needed:** Do we remove cast/crew that are no longer living? The goal is to decide who to hire.

In [19]:
# count the missing values in each column
principal_data.isnull().sum()

tconst                5255624
ordering              5255624
nconst                      0
category              5255624
job                  26804599
characters           17653159
primaryName                 0
birthYear            21319728
primaryProfession     2489186
dtype: int64

In [20]:
principal_data.head()

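|   | tconst | ordering | nconst | category | job | characters | primaryName | birthYear | primaryProfession |
| - | ------ | -------- | ------ | -------- | --- | ---------- | ----------- | --------- | ----------------- |
| 0 | tt0000001 | 1.0 | nm1588970 | self | NaN | ["Herself"] | Carmencita | NaN | soundtrack |
| 1 | tt7513040 | 2.0 | nm1588970 | archive_footage | NaN | ["Herself"] | Carmencita | NaN | soundtrack |
| 2 | tt0000003 | 2.0 | nm5442194 | producer | producer | NaN | Julien Pappé | NaN | producer |
| 3 | tt0000003 | 4.0 | nm5442200 | editor | NaN | NaN | Tamara Pappé | NaN | editor |
| 4 | tt0000005 | 1.0 | nm0443482 | actor | NaN | ["Blacksmith"] | Charles Kayser | NaN | actor |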


In [21]:
# returns a left join of both dataframes
principal_data = pd.merge(principal_data, titleRatings, how='left', on=['tconst'])

# Check the length of the resulting join
print(len(principal_data))

principal_data.head()

30956887


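|   | tconst | ordering | nconst | category | job | characters | primaryName | birthYear | primaryProfession | averageRating | numVotes |
| - | ------ | -------- | ------ | -------- | --- | ---------- | ----------- | --------- | ----------------- | ------------- | -------- |
| 0 | tt0000001 | 1.0 | nm1588970 | self | NaN | ["Herself"] | Carmencita | NaN | soundtrack | 5.8 | 1391.0 |
| 1 | tt7513040 | 2.0 | nm1588970 | archive_footage | NaN | ["Herself"] | Carmencita | NaN | soundtrack | NaN | NaN |
| 2 | tt0000003 | 2.0 | nm5442194 | producer | producer | NaN | Julien Pappé | NaN | producer | 6.6 | 979.0 |
| 3 | tt0000003 | 4.0 | nm5442200 | editor | NaN | NaN | Tamara Pappé | NaN | editor | 6.6 | 979.0 |
| 4 | tt0000005 | 1.0 | nm0443482 | actor | NaN | ["Blacksmith"] | Charles Kayser | NaN | actor | 6.2 | 1673.0 |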


In [22]:
#principal_data.to_csv("../Data/principals_data.csv", sep='\t', index=False)

In [23]:
principal_data.describe()

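|   | ordering | averageRating | numVotes |
| - | -------- | ------------- | -------- |
| count | 25701260.0 | 6018745.0 | 6018745.0 |
| mean | 4.630871 | 6.933896 | 1211.979 |
| std | 2.797558 | 1.393281 | 17410.34 |
| min | 1.0 | 1.0 | 5.0 |
| 25% | 2.0 | 6.2 | 9.0 |
| 50% | 4.0 | 7.2 | 23.0 |
| 75% | 7.0 | 7.9 | 97.0 |
| max | 10.0 | 10.0 | 1974184.0 |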


In [24]:
principal_data.category.value_counts()

actor                  5889876
self                   4577486
actress                4546642
writer                 3002015
director               2826287
producer               1586137
composer                979903
cinematographer         966509
editor                  964908
production_designer     227007
archive_footage         133163
archive_sound             1330
Name: category, dtype: int64

In [25]:
principal_data.job.unique()

array([nan, 'producer', 'original idea', ...,
       'author-Kizudarake no machi-A Wounded Town', 'editor: tip-outs',
       'co-creater'], dtype=object)

In [26]:
principal_data.primaryProfession.unique()

array(['soundtrack', 'producer', 'editor', ...,
       'casting_department,art_department,costume_designer',
       'art_department,casting_department,editorial_department',
       'casting_department,assistant,producer'], dtype=object)

In [27]:
principal_data.nconst.describe()

count      30956887
unique      8591967
top       nm0251041
freq          12715
Name: nconst, dtype: object

In [28]:
# Sort the data by person (nconst), then by category
principal_data = principal_data.sort_values(by=['nconst', 'category'])

In [29]:
principal_data.head()

|   | tconst | ordering | nconst | category | job | characters | primaryName | birthYear | primaryProfession | averageRating | numVotes |
| - | ------ | -------- | ------ | -------- | --- | ---------- | ----------- | --------- | ----------------- | ------------- | -------- |
| 251056 | tt0044881 | 1.0 | nm0000003 | actress | NaN | ["Manina"] | Brigitte Bardot | 1934 | actress,soundtrack,producer | 5.5 | 214.0 |
| 251057 | tt0046200 | 3.0 | nm0000003 | actress | NaN | ["Domino"] | Brigitte Bardot | 1934 | actress,soundtrack,producer | 5.4 | 27.0 |
| 251058 | tt0047607 | 1.0 | nm0000003 | actress | NaN | ["Anna"] | Brigitte Bardot | 1934 | actress,soundtrack,producer | 6.2 | 41.0 |
| 251059 | tt0048001 | 3.0 | nm0000003 | actress | NaN | ["Hélène Colbert"] | Brigitte Bardot | 1934 | actress,soundtrack,producer | 5.8 | 678.0 |
| 251060 | tt0048103 | 2.0 | nm0000003 | actress | NaN | ["Sophie Dimater"] | Brigitte Bardot | 1934 | actress,soundtrack,producer | 5.2 | 96.0 |


The data needs to be rolled up so that there is one row per person (`nconst`) per `category`; `job` has too many variations to group on. Each row will summarize the titles the person worked on in that role: the mean, `max`, `min`, and standard deviation (`stdDev`) of `averageRating`, the number of titles (`numMovies`), and `numVotes`. This shows how many titles each person participated in and how those titles were rated on average for that role.

This rollup can be built on a dataframe with a [MultiIndex](https://pandas.pydata.org/pandas-docs/stable/advanced.html) whose outer level is `nconst` and whose inner level is `category`, similar to the layout below:

| nconst | category | avgs |
| ------ | -------- | ---- |
| nm0251041 | director | 123 |
| | producer | 567 |
| nm5442200 | editor | 891 |
| | writer | 543 |
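
The rollup described above can be sketched with pandas named aggregation, which gives each summary column a readable name. This is a minimal sketch on toy data, assuming pandas 0.25 or later; the output column names (`avgRating`, `numTitles`, and so on) and most of the sample values are illustrative, not the notebook's.

```python
import pandas as pd

# Toy credits; the rating and vote values are illustrative.
credits = pd.DataFrame({
    "nconst": ["nm0000003"] * 3 + ["nm0000013"] * 2,
    "category": ["actress", "actress", "self", "actress", "actress"],
    "tconst": ["tt0044881", "tt0046200", "tt0047607", "tt0048001", "tt0048103"],
    "averageRating": [5.5, 5.4, 6.2, 6.8, 7.0],
    "numVotes": [214, 27, 41, 100, 300],
})

# One row per (nconst, category); each keyword names an output column and
# maps it to a (source column, aggregation) pair.
rollup = credits.groupby(["nconst", "category"]).agg(
    avgRating=("averageRating", "mean"),
    minRating=("averageRating", "min"),
    maxRating=("averageRating", "max"),
    stdRating=("averageRating", "std"),
    numTitles=("tconst", "nunique"),
    totalVotes=("numVotes", "sum"),
)
print(rollup)
```

The result is indexed by `nconst` and `category`, so a single person and role can be looked up with `rollup.loc[("nm0000003", "actress")]`.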

In [30]:
columns = ['tconst', 'nconst', 'category', 'averageRating', 'numVotes']

principal_subset = pd.DataFrame(principal_data[columns]).copy(deep=True)

In [31]:
# source: https://www.datacamp.com/community/tutorials/pandas-multi-index
# uniquely identify the rows by creating a MultiIndex
ncategory_data = principal_subset.set_index(['nconst', 'category'])

ncategory_data.head()

| nconst | category | tconst | averageRating | numVotes |
| ------ | -------- | ------ | ------------- | -------- |
| nm0000003 | actress | tt0044881 | 5.5 | 214.0 |
| nm0000003 | actress | tt0046200 | 5.4 | 27.0 |
| nm0000003 | actress | tt0047607 | 6.2 | 41.0 |
| nm0000003 | actress | tt0048001 | 5.8 | 678.0 |
| nm0000003 | actress | tt0048103 | 5.2 | 96.0 |


In [32]:
# calculate the mean, median, std, and count of the numeric columns by group
# source: https://stackoverflow.com/a/19385591
ncategory_rollup = ncategory_data.groupby(['nconst', 'category'])[['averageRating', 'numVotes']].agg(['mean', 'median', 'std', 'count'])

ncategory_rollup.head(10)

| nconst | category | averageRating mean | averageRating median | averageRating std | averageRating count | numVotes mean | numVotes median | numVotes std | numVotes count |
| ------ | -------- | ------------------ | -------------------- | ----------------- | ------------------- | ------------- | --------------- | ------------ | -------------- |
| nm0000003 | actress | 5.940541 | 5.8 | 0.733886 | 37 | 1508.918919 | 378.0 | 3927.183606 | 37 |
| nm0000003 | archive_footage | 6.25 | 6.8 | 1.839423 | 24 | 310.583333 | 36.0 | 789.400318 | 24 |
| nm0000003 | self | 6.6 | 6.9 | 1.287925 | 17 | 43.235294 | 24.0 | 46.200013 | 17 |
| nm0000013 | actress | 6.75 | 6.8 | 0.761364 | 170 | 868.764706 | 11.0 | 4035.806056 | 170 |
| nm0000013 | archive_footage | 6.736364 | 7.2 | 1.78565 | 11 | 28.363636 | 13.0 | 26.84501 | 11 |
| nm0000013 | self | 6.925 | 6.95 | 1.300513 | 16 | 35.5 | 23.5 | 35.447614 | 16 |
| nm0000014 | actress | 6.830769 | 6.9 | 0.684145 | 52 | 2778.673077 | 837.0 | 6264.044655 | 52 |
| nm0000014 | archive_footage | 7.92 | 7.7 | 1.327403 | 5 | 60.4 | 49.0 | 35.767304 | 5 |
| nm0000014 | self | 7.784211 | 8.4 | 2.107464 | 19 | 95.421053 | 58.0 | 121.315619 | 19 |
| nm0000018 | actor | 6.693182 | 6.7 | 0.913097 | 88 | 6451.079545 | 1357.0 | 19562.332684 | 88 |


In [33]:
#ncategory_data.to_csv("../Data/principals_ncategory_data.csv", sep='\t', index=False)
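
The commented-out export above passes `index=False`, which would leave `nconst` and `category` out of the file when they are the index. A rollup with two column levels is also awkward to read back. One possible approach, sketched here on toy data with illustrative names (`flat`, `buffer`), is to flatten the column names and move the index into regular columns before writing.

```python
import io
import pandas as pd

# Toy credits with illustrative values.
credits = pd.DataFrame({
    "nconst": ["nm0000003", "nm0000003", "nm0000013"],
    "category": ["actress", "actress", "actress"],
    "averageRating": [5.5, 5.4, 6.8],
    "numVotes": [214, 27, 100],
})
rollup = (
    credits.groupby(["nconst", "category"])[["averageRating", "numVotes"]]
           .agg(["mean", "count"])
)

# Join the two column levels into single names such as 'averageRating_mean'.
flat = rollup.copy()
flat.columns = ["_".join(col) for col in flat.columns]
# Move nconst and category out of the index so index=False does not drop them.
flat = flat.reset_index()

# Write to an in-memory buffer here; a file path would work the same way.
buffer = io.StringIO()
flat.to_csv(buffer, sep="\t", index=False)
print(buffer.getvalue())
```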

Rollup by title (`tconst`) and category

In [34]:
# uniquely identify the rows by creating a MultiIndex
tcat_data = principal_subset.set_index(['tconst', 'category'])

tcat_data.head()

| tconst | category | nconst | averageRating | numVotes |
| ------ | -------- | ------ | ------------- | -------- |
| tt0044881 | actress | nm0000003 | 5.5 | 214.0 |
| tt0046200 | actress | nm0000003 | 5.4 | 27.0 |
| tt0047607 | actress | nm0000003 | 6.2 | 41.0 |
| tt0048001 | actress | nm0000003 | 5.8 | 678.0 |
| tt0048103 | actress | nm0000003 | 5.2 | 96.0 |


In [35]:
# calculate the mean, median, std, and count of the numeric columns by group
# source: https://stackoverflow.com/a/19385591
tcat_data = principal_subset.groupby(['tconst', 'category'])[['averageRating', 'numVotes']].agg(['mean', 'median', 'std', 'count'])

tcat_data.head(10)

| tconst | category | averageRating mean | averageRating median | averageRating std | averageRating count | numVotes mean | numVotes median | numVotes std | numVotes count |
| ------ | -------- | ------------------ | -------------------- | ----------------- | ------------------- | ------------- | --------------- | ------------ | -------------- |
| tt0000001 | self | 5.8 | 5.8 | NaN | 1 | 1391.0 | 1391.0 | NaN | 1 |
| tt0000003 | editor | 6.6 | 6.6 | NaN | 1 | 979.0 | 979.0 | NaN | 1 |
| tt0000003 | producer | 6.6 | 6.6 | NaN | 1 | 979.0 | 979.0 | NaN | 1 |
| tt0000005 | actor | 6.2 | 6.2 | 0.0 | 2 | 1673.0 | 1673.0 | 0.0 | 2 |
| tt0000011 | actor | 5.4 | 5.4 | NaN | 1 | 206.0 | 206.0 | NaN | 1 |
| tt0000012 | self | 7.4 | 7.4 | 0.0 | 4 | 8337.0 | 8337.0 | 0.0 | 4 |
| tt0000014 | actor | 7.2 | 7.2 | 0.0 | 2 | 3637.0 | 3637.0 | 0.0 | 2 |
| tt0000016 | self | 5.9 | 5.9 | NaN | 1 | 946.0 | 946.0 | NaN | 1 |
| tt0000017 | actor | 4.8 | 4.8 | NaN | 1 | 192.0 | 192.0 | NaN | 1 |
| tt0000017 | actress | 4.8 | 4.8 | NaN | 1 | 192.0 | 192.0 | NaN | 1 |


<hr>

_POC for calculating rollups using Pandas methods only_

In [36]:
columns = ['tconst', 'nconst', 'category', 'averageRating', 'numVotes']

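# note: the slice [1:300] starts at the second row, so the first credit (tt0044881) is not in the sample
test = pd.DataFrame(principal_data[columns])[1:300].copy(deep=True)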

# source: https://www.datacamp.com/community/tutorials/pandas-multi-index
# uniquely identify the rows by creating a MultiIndex
test.set_index(['nconst', 'category'], inplace=True)

test.head()

| nconst | category | tconst | averageRating | numVotes |
| ------ | -------- | ------ | ------------- | -------- |
| nm0000003 | actress | tt0046200 | 5.4 | 27.0 |
| nm0000003 | actress | tt0047607 | 6.2 | 41.0 |
| nm0000003 | actress | tt0048001 | 5.8 | 678.0 |
| nm0000003 | actress | tt0048103 | 5.2 | 96.0 |
| nm0000003 | actress | tt0048321 | 6.1 | 101.0 |


In [37]:
test.index

MultiIndex(levels=[['nm0000003', 'nm0000013'], ['actress', 'archive_footage', 'self']],
           labels=[[0, 0, 0, ..., 1, 1, 1, ... (output truncated)

In [38]:
test.sort_index(inplace=True)

See the [`.agg()`](https://pandas.pydata.org/pandas-docs/stable/groupby.html#applying-multiple-functions-at-once) documentation for applying multiple functions at once.

In [39]:
# calculate the mean, std, count, median, and default quantile (0.5) of the numeric columns by group
# source: https://stackoverflow.com/a/19385591
df = test.groupby(['nconst', 'category'])[['averageRating', 'numVotes']].agg(['mean', 'std', 'count', 'median', 'quantile'])

df

| nconst | category | averageRating mean | averageRating std | averageRating count | averageRating median | averageRating quantile | numVotes mean | numVotes std | numVotes count | numVotes median | numVotes quantile |
| ------ | -------- | ------------------ | ----------------- | ------------------- | -------------------- | ---------------------- | ------------- | ------------ | -------------- | --------------- | ----------------- |
| nm0000003 | actress | 5.952778 | 0.740458 | 36 | 5.85 | 5.85 | 1544.888889 | 3976.704901 | 36 | 392.5 | 392.5 |
| nm0000003 | archive_footage | 6.25 | 1.839423 | 24 | 6.8 | 6.8 | 310.583333 | 789.400318 | 24 | 36.0 | 36.0 |
| nm0000003 | self | 6.6 | 1.287925 | 17 | 6.9 | 6.9 | 43.235294 | 46.200013 | 17 | 24.0 | 24.0 |
| nm0000013 | actress | 6.688 | 0.745809 | 150 | 6.8 | 6.8 | 983.026667 | 4285.123176 | 150 | 12.0 | 12.0 |
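
In the output above the `quantile` column matches `median`, because `quantile` defaults to `q=0.5` when it is passed to `.agg()` by name. To get other quantiles, one option is to pass small named functions. The sketch below uses toy ratings; the helper names `q25` and `q75` are hypothetical.

```python
import pandas as pd

# Toy ratings for two placeholder people.
ratings = pd.DataFrame({
    "nconst": ["nm1"] * 4 + ["nm2"] * 2,
    "averageRating": [5.0, 6.0, 7.0, 8.0, 4.0, 6.0],
})
grouped = ratings.groupby("nconst")["averageRating"]

def q25(series):
    """Lower quartile of a group's ratings."""
    return series.quantile(0.25)

def q75(series):
    """Upper quartile of a group's ratings."""
    return series.quantile(0.75)

# 'quantile' by name uses q=0.5, so it duplicates 'median'; the function
# names q25 and q75 become the output column labels.
summary = grouped.agg(["median", "quantile", q25, q75])
print(summary)
```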


In [40]:
# get the counts from each grouping
count = test.groupby(['nconst', 'category']).size()

count

nconst     category       
nm0000003  actress             36
           archive_footage     41
           self                72
nm0000013  actress            150
dtype: int64
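
The group sizes here (41 for `archive_footage`, 72 for `self`) are larger than the `count` values in the rollup above (24 and 17). That is consistent with how pandas defines the two: `.size()` counts every row in a group, while `count` skips missing values, and many credits have no rating. This reading is inferred from pandas behavior rather than stated in the notebook. A toy illustration with made-up rows:

```python
import numpy as np
import pandas as pd

# Three credits for one person in one category, only one of them rated.
credits = pd.DataFrame({
    "nconst": ["nm1", "nm1", "nm1", "nm2"],
    "category": ["self", "self", "self", "actress"],
    "averageRating": [6.9, np.nan, np.nan, 6.8],
})
grouped = credits.groupby(["nconst", "category"])

rows_per_group = grouped.size()                    # all rows, rated or not
rated_per_group = grouped["averageRating"].count()  # non-missing ratings only

print(rows_per_group)
print(rated_per_group)
```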

In [41]:
df.unstack()

| nconst | averageRating mean (actress) | averageRating mean (archive_footage) | averageRating mean (self) | averageRating std (actress) | averageRating std (archive_footage) | averageRating std (self) | averageRating count (actress) | averageRating count (archive_footage) | averageRating count (self) | averageRating median (actress) | ... | numVotes std (self) | numVotes count (actress) | numVotes count (archive_footage) | numVotes count (self) | numVotes median (actress) | numVotes median (archive_footage) | numVotes median (self) | numVotes quantile (actress) | numVotes quantile (archive_footage) | numVotes quantile (self) |
| ------ | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| nm0000003 | 5.952778 | 6.25 | 6.6 | 0.740458 | 1.839423 | 1.287925 | 36.0 | 24.0 | 17.0 | 5.85 | ... | 46.200013 | 36.0 | 24.0 | 17.0 | 392.5 | 36.0 | 24.0 | 392.5 | 36.0 | 24.0 |
| nm0000013 | 6.688 | NaN | NaN | 0.745809 | NaN | NaN | 150.0 | NaN | NaN | 6.8 | ... | NaN | 150.0 | NaN | NaN | 12.0 | NaN | NaN | 12.0 | NaN | NaN |


_End POC_

<hr>

## Model the Data

<hr> 

## Visualize the Results