# 02 Combine the Disney Films List with IMDB Data

## 02.01 Python Imports

In [1]:
import gzip
import pandas as pd

## 02.02 IMDB Title Basic Info

### 02.02.01 Import IMDB Title data

In [9]:
bs=gzip.open('../Other Source Data/IMDB/title.basics.tsv.gz','rb')
df_basics = pd.read_csv(bs,sep='\t', low_memory=False)

In [10]:
df_basics.shape

(8784772, 9)

In [11]:
df_basics.head(10)

Unnamed: 0,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres
0,tt0000001,short,Carmencita,Carmencita,0,1894,\N,1,"Documentary,Short"
1,tt0000002,short,Le clown et ses chiens,Le clown et ses chiens,0,1892,\N,5,"Animation,Short"
2,tt0000003,short,Pauvre Pierrot,Pauvre Pierrot,0,1892,\N,4,"Animation,Comedy,Romance"
3,tt0000004,short,Un bon bock,Un bon bock,0,1892,\N,12,"Animation,Short"
4,tt0000005,short,Blacksmith Scene,Blacksmith Scene,0,1893,\N,1,"Comedy,Short"
5,tt0000006,short,Chinese Opium Den,Chinese Opium Den,0,1894,\N,1,Short
6,tt0000007,short,Corbett and Courtney Before the Kinetograph,Corbett and Courtney Before the Kinetograph,0,1894,\N,1,"Short,Sport"
7,tt0000008,short,Edison Kinetoscopic Record of a Sneeze,Edison Kinetoscopic Record of a Sneeze,0,1894,\N,1,"Documentary,Short"
8,tt0000009,short,Miss Jerry,Miss Jerry,0,1894,\N,40,"Romance,Short"
9,tt0000010,short,Leaving the Factory,La sortie de l'usine Lumière à Lyon,0,1895,\N,1,"Documentary,Short"


We dont want to include any films "startYear" prior to 1936 when Snow White and The Seven Dwarfs was released.  We also don't need to include any films "titleType" other than "movie" as we are only worried about theatrical releases.  However by excluding these things now, we may end up having look through more data later, but we will also avoid any false positives and some duplicates when we do the matching later. Since startYear is an object type, and there are too many values to see them all, we'll just leave it be for now.

## 02.03 Disney Film List

### 02.03.01 Import Disney Film List

In [12]:
FL_Disney = pd.read_csv('../Other Source Data/DD_Films_List_Disney_Com/Film_list_Disney_com.csv')
FL_Disney.rename(     columns=({ 'Movie Title': 'title'}),     inplace=True )

print(FL_Disney.shape)
FL_Disney.head()

(657, 1)


Unnamed: 0,title
0,101 Dalmatians
1,101 Dalmatians (1996)
2,101 Dalmatians II: Patch's London Adventure
3,102 Dalmatians
4,"20,000 Leagues Under the Sea"


We ran in to a lot of title issues originally.  It will be faster to create lowercase, stripped down title columns, and join on those.

In [13]:
df_basics['lower_title'] = df_basics['primaryTitle'].str.lower().replace(r'[^A-Za-z0-9 ]+', '', regex=True)

In [14]:
FL_Disney['lower_title'] = FL_Disney['title'].str.lower().replace(r'[^A-Za-z0-9 ]+', '', regex=True)

In [2]:
#Double checking an issue we ran in to later down the line

In [3]:
df_basics[df_basics['tconst']=='tt0074968']

NameError: name 'df_basics' is not defined

In [4]:
FL_Disney[FL_Disney['lower_title'].str.contains("deposit")]

NameError: name 'FL_Disney' is not defined

## 02.04 Merge The Disney and IMDB Data

This is just a first pass to make sure the merge is going to work as we expect, find all of the correct titles, explore any duplicates or missing values, etc.

In [18]:
FL_Disney_IMDB = pd.merge(FL_Disney ,                 # left df
                          df_basics[['tconst','lower_title','titleType','startYear']],                  # right df
                          how="left",                 # left join
                          left_on='lower_title',            # left column
                          right_on='lower_title',    # right column
                          indicator = True,           # indicates source of each row
                          #validate = "one_to_many"    # alerts us of the relationship from left to right, incase there are dups
        )

## 02.05 Explore Combined Disney and IMDB Data

In [5]:
FL_Disney_IMDB.head()

NameError: name 'FL_Disney_IMDB' is not defined

In [6]:
print(FL_Disney_IMDB.shape)
FL_Disney_IMDB['_merge'].value_counts()

# Original results for this cell
# (6143, 11)
# both          6018
# left_only      125
# right_only       0

# 1st update  results for this cell
# (6293, 12)
# both          6209
# left_only       84
# right_only       0
# Name: _merge, dtype: int64

# 2nd update  results for this cell
# (6584, 6)
# both          6502
# left_only       82
# right_only       0
# Name: _merge, dtype: int64

# So there is more duplication, but fewer 'left only' cells, meaning it foudn more matches

NameError: name 'FL_Disney_IMDB' is not defined

In [21]:
FL_Disney_IMDB['titleType'].value_counts()


#  Original results for this cell
# tvEpisode       3675
# movie            904
# short            668
# tvMovie          244
# video            218
# tvSeries         160
# videoGame         83
# tvMiniSeries      56
# tvShort            6
# tvSpecial          4


tvEpisode       3973
movie            974
short            736
tvMovie          262
video            232
tvSeries         170
videoGame         86
tvMiniSeries      59
tvShort            6
tvSpecial          4
Name: titleType, dtype: int64

We started with 657 rows in the Disney Film list, and ended with 6502 after the joined the tables, so there are a lot of duplicates that we'll need to clean up.  82 title from the original Disney list were not found in the IMDM list. To clean this up, we'll look at the 82 from the Disney list, and explore those seperately.  

We'll pull out any single matches from the list, and explore those under the hypothesis that they are correct but will look to disprove that hypothesis.  

Finally, we'll look at the duplicated values from the list and attempt to ID which titles we really need to keep.  

The goal is to get back down to the original 657 row with complete IMDB data.  Since we only want to consider feature releases, not TV or Video, then we will likely end up with fewer than the original 657 AFTER we've done some filtering.

## 02.06 Films missing from IMDB

### 02.06.01 Identify and explore titles not found on IMDB

First, lets look at the 82 that were not found in IMDB at all. We're going to create a new column for 'updated_title' and correct that as we go.  We are operating under the assumption that as we pull in future datasets, we will run in to more name issues and will not always have the 'tconst' in each data set to rely on.  This will allow us to have multiple name options to join on later if we choose, and allow us to not alter the titles as we originally pulled them.

In [22]:
FL_Disney_IMDB[FL_Disney_IMDB['tconst'].isnull()].head(35)

Unnamed: 0,title,lower_title,tconst,titleType,startYear,_merge
144,A Very Playhouse Disney Holiday,a very playhouse disney holiday,,,,left_only
318,Aladdin: The Return of Jafar,aladdin the return of jafar,,,,left_only
498,America's Heart and Soul,americas heart and soul,,,,left_only
570,Annie (1999),annie 1999,,,,left_only
642,"Atta Girl, Kelly!",atta girl kelly,,,,left_only
1587,"Chronicles of Narnia: The Lion, the Witch and ...",chronicles of narnia the lion the witch and th...,,,,left_only
1869,Disney Animation Collection Volume 1: Mickey A...,disney animation collection volume 1 mickey an...,,,,left_only
1870,Disney Animation Collection Volume 2: Three Li...,disney animation collection volume 2 three lit...,,,,left_only
1871,Disney Animation Collection Volume 3: The Prin...,disney animation collection volume 3 the princ...,,,,left_only
1872,Disney Animation Collection Volume 4: The Tort...,disney animation collection volume 4 the torto...,,,,left_only


Spot checking the first 5:
 - A Very Playhouse Disney Holiday, NOT on IMDB.com, looks like a DVD release, 2005, NR, 56m <mark>We want to consider feature films. Not video releases.</mark>
 - Aladdin: The Return of Jafar: Appears on IMDB.com as "Aladdin 2: The Return of Jafar" but video, not a feature film, 1994, G, 1h 9m #tt0107952 <mark>We want to consider feature films. Not video releases.</mark>
 - America's Heart and Soul is in IMDB but as America's Heart & Soul, with an ampersand. <mark>This will be changed in the new updated_title.</mark>
 - Annie (1999) is also in IMDB as Annie, however, it is an episode of The Wonderful World of Disney and not considered a feature release. <mark>We want to consider feature films. Not TV movies. </mark>
 - Atta Girl, Kelly! is also in IMDB.cim as Atta Girl, Kelly!: Love Is Blind, however, it is an episode of The Wonderful World of Disney and not considered a feature release. <mark>We want to consider feature films. Not TV movies. </mark>
 
Disney Animated Collection:
 - These all appear to be made for TV <mark>The titleType will all need to be  updated. </mark>
 
Disney Sing Along:
 - These all appear to be made for video <mark>The titleType will all need to be  updated. </mark> 

For the sake of time, we will not be updating any rows that we intend to use in our analysis.  That means we will not be updating any of the following:
tvEpisodes, 
shorts, 
tvMovies,
videos ,
tvSeries,
videoGames ,
tvMiniSeries,
tvShorts,
tvSpecials

### 02.06.02 Correct titles not found on IMDB

In [23]:
FL_Disney_IMDB.at[498,'updated_title'] = "America's Heart And Soul"    # America'S Heart And Soul tt0381006
FL_Disney_IMDB.at[498,'tconst'] = "tt0381006"    # America'S Heart And Soul tt0381006

Since we kept the index values from the Disney list, we could use the index to update all of the "titleType"s for the TV movies or video movies <br>
FL_Disney_IMDB.at[141,'titleType'] = 'tvMovie' <BR>
However, for the sake of take, we won't bother updating any titles that we don't intend to use in our analysis.  We will scan the new list of 84 titles for feature releases and update them in the temp df. 

In [24]:
FL_Disney_IMDB.at[570,'updated_title'] = "Annie"               # Annie (1999) tt0207972
FL_Disney_IMDB.at[570,'tconst'] = "tt0207972"               # Annie (1999) tt0207972

FL_Disney_IMDB.at[1587,'updated_title'] = "The Chronicles of Narnia: The Lion, the Witch and the Wardrobe" # Chronicles Of Narnia: The Lion, The Witch And  tt0363771
FL_Disney_IMDB.at[1587,'tconst'] = "tt0363771" # Chronicles Of Narnia: The Lion, The Witch And  tt0363771

In [25]:
FL_Disney_IMDB.at[1896,'updated_title'] = "American Legends"   # Disney'S American Legends tt0372866
FL_Disney_IMDB.at[1896,'tconst'] = "tt0372866"   # Disney'S American Legends tt0372866

In [26]:
FL_Disney_IMDB.at[1903,'updated_title'] = "DuckTales the Movie: Treasure of the Lost Lamp" # 1791	Ducktales: The Movie - Treasure Of The Lost Lamp	tt0099472
FL_Disney_IMDB.at[1903,'tconst'] = "tt0099472" # 1791	Ducktales: The Movie - Treasure Of The Lost Lamp	tt0099472

In [27]:
FL_Disney_IMDB.at[2256,'updated_title'] = "Freaky Friday" # 2141	Freaky Friday (1976)	tt0076054
FL_Disney_IMDB.at[2256,'tconst'] = "tt0076054" # 2141	Freaky Friday (1976)	tt0076054

In [28]:
FL_Disney_IMDB.at[2424,'updated_title'] = "Hannah Montana and Miley Cyrus: Best of Both Worlds Concert" # 2304	Hannah Montana & Miley Cyrus: Best Of Both Wor...	tt1127884
FL_Disney_IMDB.at[2424,'tconst'] = "tt1127884" # 2304	Hannah Montana & Miley Cyrus: Best Of Both Wor... tt1127884

In [29]:
FL_Disney_IMDB.at[1902,'updated_title'] = "Horton Hears a Who!" # Dr. Seuss' Horton Hears A Who tt0451079
FL_Disney_IMDB.at[1902,'tconst'] = "tt0451079" # Dr. Seuss' Horton Hears A Who tt0451079

In [30]:
FL_Disney_IMDB.at[2551,'updated_title'] = "Disney High School Musical: China" # 2429	High School Musical: China	tt1556143
FL_Disney_IMDB.at[2551,'tconst'] = "tt1556143" # 2429	High School Musical: China  tt1556143

In [31]:
FL_Disney_IMDB.at[3065,'updated_title'] = "Jonas Brothers: The 3D Concert Experience" # 2937	Jonas Brothers: The Concert Experience	tt1229827
FL_Disney_IMDB.at[3065,'tconst'] = "tt1229827" # 2937	Jonas Brothers: The Concert Experience  tt1229827

FL_Disney_IMDB.at[3448,'updated_title'] = "Doctor Strange" # 3283	Marvel Studios' Doctor Strange		tt1211837
FL_Disney_IMDB.at[3448,'tconst'] = "tt1211837" # 3283	Marvel Studios' Doctor Strange	  tt1211837

In [32]:
FL_Disney_IMDB.at[3450,'updated_title'] = "Avengers: Age of Ultron" # 3285	Marvel'S Avengers: Age Of Ultron		tt2395427
FL_Disney_IMDB.at[3450,'tconst'] = "tt2395427" # 3285	Marvel'S Avengers: Age Of Ultron	  tt2395427

FL_Disney_IMDB.at[3451,'updated_title'] = "Captain America: The Winter Soldier" # 33286	Marvel'S Captain America: Winter Soldier		tt1843866
FL_Disney_IMDB.at[3451,'tconst'] = "tt1843866" # 3286	Marvel'S Captain America: Winter Soldier	  tt1843866

FL_Disney_IMDB.at[3455,'updated_title'] = "Iron Man 3" # 3290	Marvel'S Iron Man 3 	tt1300854
FL_Disney_IMDB.at[3455,'tconst'] = "tt1300854" # 3290	Marvel'S Iron Man 3	  tt1300854

FL_Disney_IMDB.at[4240,'updated_title'] = "Ralph Breaks the Internet" # 4045	Ralph Breaks The Internet: Wreck-It Ralph 2 	tt5848272
FL_Disney_IMDB.at[4240,'tconst'] = "tt5848272" # 4045	Ralph Breaks The Internet: Wreck-It Ralph 2	  tt5848272

In [33]:
FL_Disney_IMDB.at[4286,'updated_title'] = "Peter Pan 2: Return to Never Land" # 4091	Return To Never Land	tt0280030
FL_Disney_IMDB.at[4286,'tconst'] = "tt0280030" # 4091	Return To Never Land		  tt0280030

FL_Disney_IMDB.at[4664,'updated_title'] = "Sleeping Beauty" # 4465	Sleeping Beauty (1959)	tt0053285tt0053285
FL_Disney_IMDB.at[4664,'tconst'] = "tt0053285" # 4465	Sleeping Beauty (1959)	tt0053285

In [34]:
FL_Disney_IMDB.at[4995,'updated_title'] = "Tall Tale" # 4761	Tall Tale: The Unbelievable Adventure	tt0111359
FL_Disney_IMDB.at[4995,'tconst'] = "tt0111359" # 4761	Tall Tale: The Unbelievable Adventure	tt0111359

FL_Disney_IMDB.at[5262,'updated_title'] = "The Boys" # 5008	The Boys: The Sherman Brothers' Story tt1015971
FL_Disney_IMDB.at[5262,'tconst'] = "tt1015971" # 5008	The Boys: The Sherman Brothers' Story tt1015971

In [35]:

# FL_Disney_IMDB.at[1587,'updated_title'] = "Charlie, the Lonesome Cougar"  #   Charlie The Lonesome Cougar tt0062793
# FL_Disney_IMDB.at[1587,'tconst'] = "tt0062793"  #   Charlie The Lonesome Cougar tt0062793

# FL_Disney_IMDB.at[2363,'updated_title'] = "Herbie Fully Loaded" # 2363	Herbie: Fully Loaded	tt0400497
# FL_Disney_IMDB.at[2363,'tconst'] = "tt0400497" # 2363	Herbie: Fully Loaded  tt0400497

FL_Disney_IMDB.at[6165,'updated_title'] = "Toby Tyler or Ten Weeks with a Circus" # 5891	Toby Tyler	   tt0054390
FL_Disney_IMDB.at[6165,'tconst'] = "tt0054390" # 5891	Toby Tyler	   tt0054390

FL_Disney_IMDB.at[6543,'updated_title'] = "Disneynature: Wings of Life" # 6255	Wings Of Life	  tt1222816
FL_Disney_IMDB.at[6543,'tconst'] = "tt1222816" # 6255	Wings Of Life	tt1222816

FL_Disney_IMDB.at[6544,'updated_title'] = "Winnie the Pooh" # 6256	Winnie The Pooh (2011)	  tt1449283
FL_Disney_IMDB.at[6544,'tconst'] = "tt1449283" # 6256	Winnie The Pooh (2011)	 tt1449283
   

We can see the 22 titles that we've updated

In [36]:
FL_Disney_IMDB.isnull().sum()

title               0
lower_title         0
tconst             60
titleType          82
startYear          82
_merge              0
updated_title    6562
dtype: int64

### 02.06.03 Marking for removal films not found in IMDB that we won't be using

Filling in the tconst with 'remove' so we know to pull these out of the final list later.

In [37]:
FL_Disney_IMDB.loc[(FL_Disney_IMDB['title'].str.contains("Disney Animation Collection Volume")), 'tconst'] = 'remove'

In [38]:
FL_Disney_IMDB.loc[(FL_Disney_IMDB['title'].str.contains("Disney Sing")), 'tconst'] = 'remove'

In [39]:
FL_Disney_IMDB.loc[(FL_Disney_IMDB['title'].str.contains("Disney Princess Sing")), 'tconst'] = 'remove'

In [40]:
FL_Disney_IMDB.loc[(FL_Disney_IMDB['title'].str.contains("Mickey Mouse Clubhouse")), 'tconst'] = 'remove'

In [41]:
FL_Disney_IMDB.loc[(FL_Disney_IMDB['title'].str.contains("Pixar Short Films Collection")), 'tconst'] = 'remove'

In [42]:
FL_Disney_IMDB.loc[(FL_Disney_IMDB['title'].str.contains("Disney Learning Adventures")), 'tconst'] = 'remove'

In [43]:
FL_Disney_IMDB[FL_Disney_IMDB['title'].str.contains("Sing Along Songs")]

Unnamed: 0,title,lower_title,tconst,titleType,startYear,_merge,updated_title
1883,Disney Princess Sing Along Songs Volume One: O...,disney princess sing along songs volume one on...,remove,,,left_only,
1884,Disney Princess Sing Along Songs Volume Three:...,disney princess sing along songs volume three ...,remove,,,left_only,
1885,Disney Princess Sing Along Songs Volume Two: E...,disney princess sing along songs volume two en...,remove,,,left_only,
1889,Disney Sing Along Songs: Happy Haunting,disney sing along songs happy haunting,remove,,,left_only,
1890,Disney Sing Along Songs: Home on the Range - L...,disney sing along songs home on the range lit...,remove,video,2004.0,both,
4616,Sing Along Songs: 101 Dalmatians -- Pongo & Pe...,sing along songs 101 dalmatians pongo perdita,,,,left_only,
4617,Sing Along Songs: Brother Bear - On My Way,sing along songs brother bear on my way,tt1664858,video,2003.0,both,
4618,Sing Along Songs: Campout At Walt Disney World,sing along songs campout at walt disney world,,,,left_only,
4619,Sing Along Songs: Mary Poppins -- Supercalifra...,sing along songs mary poppins supercalifragil...,,,,left_only,
4620,Sing Along Songs: Peter Pan -- You Can Fly!,sing along songs peter pan you can fly,,,,left_only,


In [44]:
FL_Disney_IMDB.loc[(FL_Disney_IMDB['title'].str.contains("Sing Along Songs")), 'tconst'] = 'remove'

In [45]:
FL_Disney_IMDB.isnull().sum()

title               0
lower_title         0
tconst             31
titleType          82
startYear          82
_merge              0
updated_title    6562
dtype: int64

In [46]:
FL_Disney_IMDB[FL_Disney_IMDB['tconst'].isnull()].head(35)

Unnamed: 0,title,lower_title,tconst,titleType,startYear,_merge,updated_title
144,A Very Playhouse Disney Holiday,a very playhouse disney holiday,,,,left_only,
318,Aladdin: The Return of Jafar,aladdin the return of jafar,,,,left_only,
642,"Atta Girl, Kelly!",atta girl kelly,,,,left_only,
1877,Disney Junior Holiday,disney junior holiday,,,,left_only,
1881,Disney Princess Party: Volume One,disney princess party volume one,,,,left_only,
1898,Doc McStuffins: Pet Vet,doc mcstuffins pet vet,,,,left_only,
1986,"Elfego Baca And The Swamp Fox, Legendary Heroes",elfego baca and the swamp fox legendary heroes,,,,left_only,
2326,Frozen (Sing-Along Edition),frozen singalong edition,,,,left_only,
2425,Hannah Montana: Pop Star Profile,hannah montana pop star profile,,,,left_only,
2434,"Hans Brinker, Or The Silver Skates",hans brinker or the silver skates,,,,left_only,


The rest of these are either direct to video or TV movies.  We label them all for removal.

In [47]:
FL_Disney_IMDB.loc[(FL_Disney_IMDB['tconst'].isnull()), 'tconst'] = 'remove'

In [48]:
FL_Disney_IMDB.isnull().sum()

title               0
lower_title         0
tconst              0
titleType          82
startYear          82
_merge              0
updated_title    6562
dtype: int64

In [49]:
FL_Disney_IMDB.drop(FL_Disney_IMDB[FL_Disney_IMDB['tconst'] == "remove"].index, inplace = True)

In [50]:
FL_Disney_IMDB.isnull().sum()

title               0
lower_title         0
tconst              0
titleType          22
startYear          22
_merge              0
updated_title    6497
dtype: int64

We have now removed any films that were not Disney feature releases but there are a lot of duplicates. Let's get the title and tconst columns, then pull in the other IDDB data and explore.

## 02.07 Duplicate Films

### 02.07.01 Identify and Remove Duplicate Films

In [51]:
FL_Disney_IMDB = FL_Disney_IMDB[['title','tconst']]

In [52]:
FL_Disney_IMDB['dup'] = FL_Disney_IMDB['tconst'].duplicated()

There are likely some duplicate tconst values. So we'll remove those.

In [53]:
FL_Disney_IMDB.drop(FL_Disney_IMDB[FL_Disney_IMDB['dup']==True].index, inplace = True)

In [54]:
FL_Disney_IMDB = FL_Disney_IMDB[['title','tconst']]

The index is the original index from our Disney List, 'tconst' is the unique identifier from IMDB.  Now we need to go back to the IMDB data, pull in the meta data, and deal with the duplicates.  

## 02.08 Additional IMDB features

### 02.08.01 Merging to pull in more IMDB features

In [55]:
FL_Disney_IMDB_combined = pd.merge(FL_Disney_IMDB ,                 # left df
                          df_basics,                  # right df
                          how="left",                 # left join
                          left_on='tconst',            # left column
                          right_on='tconst',    # right column
                          indicator = True,           # indicates source of each row
                          #validate = "one_to_many"    # alerts us of the relationship from left to right, incase there are dups
        )

### 02.08.02 Exploring more IMDB features

In [56]:
FL_Disney_IMDB_combined.titleType.value_counts()

tvEpisode       3738
movie            947
short            707
tvMovie          235
video            209
tvSeries         156
videoGame         78
tvMiniSeries      56
tvShort            6
tvSpecial          4
Name: titleType, dtype: int64

In [57]:
FL_Disney_IMDB_combined[FL_Disney_IMDB_combined['title'].str.contains("Deposit")]

Unnamed: 0,title,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres,lower_title,_merge
3405,No Deposit No Return,tt0069011,movie,"No Deposit, No Return","No Deposit, No Return",0,1972,\N,80,\N,no deposit no return,both
3406,No Deposit No Return,tt0074968,movie,"No Deposit, No Return",No Deposit No Return,0,1976,\N,112,"Comedy,Family",no deposit no return,both
3407,No Deposit No Return,tt0267784,movie,"No Deposit, No Return","No Deposit, No Return",0,2000,\N,90,Action,no deposit no return,both
3408,No Deposit No Return,tt0399414,short,"No Deposit, No Return","No Deposit, No Return",0,2004,\N,12,"Romance,Short",no deposit no return,both
3409,No Deposit No Return,tt0566167,tvEpisode,"No Deposit, No Return","No Deposit, No Return",0,1992,\N,\N,Comedy,no deposit no return,both
3410,No Deposit No Return,tt1037519,tvEpisode,"No Deposit, No Return","No Deposit, No Return",0,1965,\N,\N,Comedy,no deposit no return,both
3411,No Deposit No Return,tt1281796,tvEpisode,No Deposit No Return,No Deposit No Return,0,1990,\N,\N,Comedy,no deposit no return,both
3412,No Deposit No Return,tt1324782,tvEpisode,"No Deposit, No Return","No Deposit, No Return",0,1985,\N,\N,Comedy,no deposit no return,both
3413,No Deposit No Return,tt2145698,short,No Deposit No Return,No Deposit No Return,0,1997,\N,22,"Drama,Short",no deposit no return,both
3414,No Deposit No Return,tt8728662,movie,No Deposit No Return,No Deposit No Return,1,1975,\N,54,Adult,no deposit no return,both


For this project, we want to focus on movies.  Anything tv, video, or short can come out. Keeping only 'movie' types

In [58]:
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined[FL_Disney_IMDB_combined.titleType == 'movie']

In [59]:
FL_Disney_IMDB_combined.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 947 entries, 0 to 6124
Data columns (total 12 columns):
 #   Column          Non-Null Count  Dtype   
---  ------          --------------  -----   
 0   title           947 non-null    object  
 1   tconst          947 non-null    object  
 2   titleType       947 non-null    object  
 3   primaryTitle    947 non-null    object  
 4   originalTitle   947 non-null    object  
 5   isAdult         947 non-null    object  
 6   startYear       947 non-null    object  
 7   endYear         947 non-null    object  
 8   runtimeMinutes  947 non-null    object  
 9   genres          947 non-null    object  
 10  lower_title     947 non-null    object  
 11  _merge          947 non-null    category
dtypes: category(1), object(11)
memory usage: 89.8+ KB


In [60]:
FL_Disney_IMDB_combined._merge.value_counts()

both          947
left_only       0
right_only      0
Name: _merge, dtype: int64

In [61]:
FL_Disney_IMDB_combined.head(20)

Unnamed: 0,title,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres,lower_title,_merge
0,101 Dalmatians,tt0115433,movie,101 Dalmatians,101 Dalmatians,0,1996,\N,103,"Adventure,Comedy,Crime",101 dalmatians,both
5,102 Dalmatians,tt0211181,movie,102 Dalmatians,102 Dalmatians,0,2000,\N,100,"Adventure,Comedy,Family",102 dalmatians,both
9,"20,000 Leagues Under the Sea",tt0006333,movie,"20,000 Leagues Under the Sea","20,000 Leagues Under the Sea",0,1916,\N,105,"Action,Adventure,Sci-Fi",20000 leagues under the sea,both
10,"20,000 Leagues Under the Sea",tt0046672,movie,"20,000 Leagues Under the Sea","20,000 Leagues Under the Sea",0,1954,\N,127,"Adventure,Drama,Family",20000 leagues under the sea,both
13,"20,000 Leagues Under the Sea",tt0397230,movie,"20,000 Leagues Under the Sea","20,000 Leagues Under the Sea",0,1973,\N,60,"Animation,Family",20000 leagues under the sea,both
14,"20,000 Leagues Under the Sea",tt0498755,movie,"20,000 Leagues Under the Sea","20,000 Leagues Under the Sea",0,1985,\N,50,"Action,Adventure,Animation",20000 leagues under the sea,both
17,"20,000 Leagues Under the Sea",tt10915850,movie,"20,000 Leagues Under the Sea","20,000 Leagues Under the Sea",0,\N,\N,\N,Adventure,20000 leagues under the sea,both
23,"20,000 Leagues Under the Sea",tt9328210,movie,"20,000 Leagues Under the Sea","20,000 Leagues Under the Sea",0,\N,\N,\N,Animation,20000 leagues under the sea,both
24,A Bug's Life,tt0120623,movie,A Bug's Life,A Bug's Life,0,1998,\N,95,"Adventure,Animation,Comedy",a bugs life,both
37,A Christmas Carol,tt0029992,movie,A Christmas Carol,A Christmas Carol,0,1938,\N,69,"Drama,Family,Fantasy",a christmas carol,both


### 02.08.03 Exploring more duplicates

Creating a list of the dupcate value.  Will have to seach IMDB to make sure we have the correct tconst going forward.  We could bump this up against one of the other source lists that include the year of release, but there's no guarantee that everything on this list will be on those, as we already know the counts are different, and there are some discrepancies in the month/year of release. Ultimately, this may prove faster than creating code and trying to account each variant.

In [62]:
dups_df = FL_Disney_IMDB_combined[FL_Disney_IMDB_combined['title'].isin(FL_Disney_IMDB_combined['title'].value_counts()[FL_Disney_IMDB_combined['title'].value_counts()>2].index)]
print(dups_df.shape)
dups_df.head(30)

(541, 12)


Unnamed: 0,title,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres,lower_title,_merge
9,"20,000 Leagues Under the Sea",tt0006333,movie,"20,000 Leagues Under the Sea","20,000 Leagues Under the Sea",0,1916,\N,105,"Action,Adventure,Sci-Fi",20000 leagues under the sea,both
10,"20,000 Leagues Under the Sea",tt0046672,movie,"20,000 Leagues Under the Sea","20,000 Leagues Under the Sea",0,1954,\N,127,"Adventure,Drama,Family",20000 leagues under the sea,both
13,"20,000 Leagues Under the Sea",tt0397230,movie,"20,000 Leagues Under the Sea","20,000 Leagues Under the Sea",0,1973,\N,60,"Animation,Family",20000 leagues under the sea,both
14,"20,000 Leagues Under the Sea",tt0498755,movie,"20,000 Leagues Under the Sea","20,000 Leagues Under the Sea",0,1985,\N,50,"Action,Adventure,Animation",20000 leagues under the sea,both
17,"20,000 Leagues Under the Sea",tt10915850,movie,"20,000 Leagues Under the Sea","20,000 Leagues Under the Sea",0,\N,\N,\N,Adventure,20000 leagues under the sea,both
23,"20,000 Leagues Under the Sea",tt9328210,movie,"20,000 Leagues Under the Sea","20,000 Leagues Under the Sea",0,\N,\N,\N,Animation,20000 leagues under the sea,both
37,A Christmas Carol,tt0029992,movie,A Christmas Carol,A Christmas Carol,0,1938,\N,69,"Drama,Family,Fantasy",a christmas carol,both
38,A Christmas Carol,tt0039562,movie,A Christmas Carol,Leyenda de Navidad,0,1947,\N,80,"Drama,Fantasy",a christmas carol,both
39,A Christmas Carol,tt0044008,movie,A Christmas Carol,Scrooge,0,1951,\N,86,"Drama,Family,Fantasy",a christmas carol,both
59,A Christmas Carol,tt0443734,movie,A Christmas Carol,Natale a casa Deejay - A Christmas Carol,0,2004,\N,75,"Comedy,Fantasy",a christmas carol,both


### 02.08.03 Removing the incorrect duplicates

In [63]:
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "20,000 Leagues Under the Sea") & (FL_Disney_IMDB_combined['tconst'] != "tt0046672")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "A Christmas Carol") & (FL_Disney_IMDB_combined['tconst'] != "tt1067106")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Aladdin") & ( (FL_Disney_IMDB_combined['tconst'] != "tt6139732") & (FL_Disney_IMDB_combined['tconst'] != "tt0103639"))].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Alice in Wonderland") & ( (FL_Disney_IMDB_combined['tconst'] != "tt0043274") & (FL_Disney_IMDB_combined['tconst'] != "tt1014759"))].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Amy") & (FL_Disney_IMDB_combined['tconst'] != "tt0082017")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Around the World in 80 Days") & (FL_Disney_IMDB_combined['tconst'] != "tt0327437")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Bambi") & (FL_Disney_IMDB_combined['tconst'] != "tt0034492")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Beauty and the Beast") & ( (FL_Disney_IMDB_combined['tconst'] != "tt2771200") & (FL_Disney_IMDB_combined['tconst'] != "tt0101414"))].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Big Red") & (FL_Disney_IMDB_combined['tconst'] != "tt0055793")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Black Widow") & (FL_Disney_IMDB_combined['tconst'] != "tt3480822")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Bon Voyage!") & (FL_Disney_IMDB_combined['tconst'] != "tt0055807")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Brave") & (FL_Disney_IMDB_combined['tconst'] != "tt1217209")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Cheetah") & (FL_Disney_IMDB_combined['tconst'] != "tt0097053")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Cinderella") & ( (FL_Disney_IMDB_combined['tconst'] != "tt0042332") & (FL_Disney_IMDB_combined['tconst'] != "tt1661199"))].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Coco") & (FL_Disney_IMDB_combined['tconst'] != "tt2380307")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Dinosaur") & (FL_Disney_IMDB_combined['tconst'] != "tt0130623")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Dumbo") & ( (FL_Disney_IMDB_combined['tconst'] != "tt0033563") & (FL_Disney_IMDB_combined['tconst'] != "tt3861390"))].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Earth") & (FL_Disney_IMDB_combined['tconst'] != "tt0393597")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Epic") & (FL_Disney_IMDB_combined['tconst'] != "tt0848537")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Emil and the Detectives") & (FL_Disney_IMDB_combined['tconst'] != "tt0058056")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Enchanted") & (FL_Disney_IMDB_combined['tconst'] != "tt0461770")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Endurance") & (FL_Disney_IMDB_combined['tconst'] != "tt0120659")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Fantasia") & (FL_Disney_IMDB_combined['tconst'] != "tt0032455")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Frozen") & (FL_Disney_IMDB_combined['tconst'] != "Frozen")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Gus") & (FL_Disney_IMDB_combined['tconst'] != "tt0074599")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Heidi") & (FL_Disney_IMDB_combined['tconst'] != "tt0107099")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Hercules") & (FL_Disney_IMDB_combined['tconst'] != "tt0119282")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Home on the Range") & (FL_Disney_IMDB_combined['tconst'] != "tt0299172")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Ice Age") & (FL_Disney_IMDB_combined['tconst'] != "tt0268380")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "I'll Be Home for Christmas") & (FL_Disney_IMDB_combined['tconst'] != "tt0155753")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "In Search Of The Castaways") & (FL_Disney_IMDB_combined['tconst'] != "tt0056095")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Inside Out") & (FL_Disney_IMDB_combined['tconst'] != "tt2096673")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Into the Woods") & (FL_Disney_IMDB_combined['tconst'] != "tt2180411")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Invincible") & (FL_Disney_IMDB_combined['tconst'] != "tt0445990")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Iron Will") & (FL_Disney_IMDB_combined['tconst'] != "tt0110157")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Kidnapped") & (FL_Disney_IMDB_combined['tconst'] != "tt0053994")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Luca") & (FL_Disney_IMDB_combined['tconst'] != "tt12801262")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Man of the House") & (FL_Disney_IMDB_combined['tconst'] != "tt0113755")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Midnight Madness") & (FL_Disney_IMDB_combined['tconst'] != "tt0081159")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Miracle") & (FL_Disney_IMDB_combined['tconst'] != "tt0349825	")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Moana") & (FL_Disney_IMDB_combined['tconst'] != "tt3521164")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Mulan") & ( (FL_Disney_IMDB_combined['tconst'] != "tt0120762") & (FL_Disney_IMDB_combined['tconst'] != "tt4566758"))].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Never a Dull Moment") & (FL_Disney_IMDB_combined['tconst'] != "tt0063341")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "No Deposit No Return") & (FL_Disney_IMDB_combined['tconst'] != "tt0074968")].index)
# *************************************************************************  This is a TV movie, and would have been removed at this point
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Oliver Twist") & (FL_Disney_IMDB_combined['tconst'] != "tt0119825")].index)
# *************************************************************************
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Peter Pan") & (FL_Disney_IMDB_combined['tconst'] != "tt0046183")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Pinocchio") & (FL_Disney_IMDB_combined['tconst'] != "tt0032910")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Rascal") & (FL_Disney_IMDB_combined['tconst'] != "tt0064875")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Rio") & (FL_Disney_IMDB_combined['tconst'] != "tt1436562")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Robin Hood") & (FL_Disney_IMDB_combined['tconst'] != "tt0070608")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "RocketMan") & (FL_Disney_IMDB_combined['tconst'] != "tt0120029")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Sky High") & (FL_Disney_IMDB_combined['tconst'] != "tt0405325")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Smith!") & (FL_Disney_IMDB_combined['tconst'] != "tt0065003")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Snow White and the Seven Dwarfs") & (FL_Disney_IMDB_combined['tconst'] != "tt0029583")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Something Wicked This Way Comes") & (FL_Disney_IMDB_combined['tconst'] != "tt0086336")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Soul") & (FL_Disney_IMDB_combined['tconst'] != "tt2948372")].index)
# ************************************************************************* This is a TV movie, and would have been removed at this point
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Starstruck") & (FL_Disney_IMDB_combined['tconst'] != "tt1579247")].index)
# *************************************************************************
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Swiss Family Robinson") & (FL_Disney_IMDB_combined['tconst'] != "tt0054357")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Tangled") & (FL_Disney_IMDB_combined['tconst'] != "tt0398286")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Teacher's Pet") & (FL_Disney_IMDB_combined['tconst'] != "tt0350194")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Tex") & (FL_Disney_IMDB_combined['tconst'] != "tt0084783")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "The Black Hole") & (FL_Disney_IMDB_combined['tconst'] != "tt0078869")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "The Duke") & (FL_Disney_IMDB_combined['tconst'] != "tt0196516")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "The Hunchback of Notre Dame") & (FL_Disney_IMDB_combined['tconst'] != "tt0116583")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "The Kid") & (FL_Disney_IMDB_combined['tconst'] != "tt0219854")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "The Lion King") & ( (FL_Disney_IMDB_combined['tconst'] != "tt0110357") & (FL_Disney_IMDB_combined['tconst'] != "tt6105098"))].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "The Little Mermaid") & (FL_Disney_IMDB_combined['tconst'] != "tt0097757")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "The Lone Ranger") & (FL_Disney_IMDB_combined['tconst'] != "tt1210819")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "The Rookie") & (FL_Disney_IMDB_combined['tconst'] != "tt0265662")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "The Sorcerer's Apprentice") & (FL_Disney_IMDB_combined['tconst'] != "tt0963966")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "The Three Musketeers") & (FL_Disney_IMDB_combined['tconst'] != "tt0108333")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "The Wild") & (FL_Disney_IMDB_combined['tconst'] != "tt0405469")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Tonka") & (FL_Disney_IMDB_combined['tconst'] != "tt0052300")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Treasure Island") & (FL_Disney_IMDB_combined['tconst'] != "tt0043067")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Underdog") & (FL_Disney_IMDB_combined['tconst'] != "tt0467110")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Valiant") & (FL_Disney_IMDB_combined['tconst'] != "tt0361089")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "White Fang") & (FL_Disney_IMDB_combined['tconst'] != "tt0103247")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Heavyweights") & (FL_Disney_IMDB_combined['tconst'] != "tt0110006")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Old Dogs") & (FL_Disney_IMDB_combined['tconst'] != "tt0976238")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Pocahontas") & (FL_Disney_IMDB_combined['tconst'] != "tt0114148")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Hocus Pocus") & (FL_Disney_IMDB_combined['tconst'] != "tt0107120")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Prom") & (FL_Disney_IMDB_combined['tconst'] != "tt1604171")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "The BFG") & (FL_Disney_IMDB_combined['tconst'] != "tt3691740")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Tomorrowland") & (FL_Disney_IMDB_combined['tconst'] != "tt1964418")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Black Panther") & (FL_Disney_IMDB_combined['tconst'] != "tt1825683")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Bolt") & (FL_Disney_IMDB_combined['tconst'] != "tt0397892")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "The Music Man")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "The Miracle Worker")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "The Perfect Game")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Tower Of Terror")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['title'] == "Bedtime Stories")].index)

### 02.08.05 Exploring films prior to 1937

There shouldn't be anything in here prios to 1937 (when Snow White and the Seven Dwarfs was released) and I wouldn't expect to see any startYear values missing unless it's a future release.  So lets explore all of those.

In [64]:
FL_Disney_IMDB_combined.sort_values(by='startYear')

Unnamed: 0,title,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres,lower_title,_merge
3787,Pollyanna,tt0011588,movie,Pollyanna,Pollyanna,0,1920,\N,58,"Comedy,Drama,Family",pollyanna,both
4183,Shipwrecked,tt0017379,movie,Shipwrecked,Shipwrecked,0,1926,\N,74,"Adventure,Drama,Romance",shipwrecked,both
4301,Snow White and the Seven Dwarfs,tt0029583,movie,Snow White and the Seven Dwarfs,Snow White and the Seven Dwarfs,0,1937,\N,83,"Adventure,Animation,Family",snow white and the seven dwarfs,both
4822,The Biscuit Eater,tt0032254,movie,The Biscuit Eater,The Biscuit Eater,0,1940,\N,81,"Drama,Family",the biscuit eater,both
1785,Fantasia,tt0032455,movie,Fantasia,Fantasia,0,1940,\N,125,"Animation,Family,Fantasy",fantasia,both
...,...,...,...,...,...,...,...,...,...,...,...,...
4056,Robots,tt12579470,movie,Robots,Robots,0,\N,\N,\N,"Comedy,Romance,Sci-Fi",robots,both
4275,Smart House,tt6824720,movie,Smart House,Smart House,0,\N,\N,\N,"Horror,Thriller",smart house,both
4802,The Aristocats,tt17220728,movie,The Aristocats,The Aristocats,0,\N,\N,\N,Family,the aristocats,both
4883,The Christmas Star,tt3511874,movie,The Christmas Star,The Christmas Star,0,\N,\N,\N,"Drama,Family",the christmas star,both


In [66]:
FL_Disney_IMDB_combined[FL_Disney_IMDB_combined['title']== 'Pollyanna']

Unnamed: 0,title,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres,lower_title,_merge
3787,Pollyanna,tt0011588,movie,Pollyanna,Pollyanna,0,1920,\N,58,"Comedy,Drama,Family",pollyanna,both
3788,Pollyanna,tt0054195,movie,Pollyanna,Pollyanna,0,1960,\N,134,"Comedy,Drama,Family",pollyanna,both


In [67]:
FL_Disney_IMDB_combined[FL_Disney_IMDB_combined['title']== 'Shipwrecked']

Unnamed: 0,title,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres,lower_title,_merge
4183,Shipwrecked,tt0017379,movie,Shipwrecked,Shipwrecked,0,1926,\N,74,"Adventure,Drama,Romance",shipwrecked,both
4184,Shipwrecked,tt0099816,movie,Shipwrecked,Haakon Haakonsen,0,1990,\N,92,"Adventure,Family",shipwrecked,both


In [73]:
FL_Disney_IMDB_combined[FL_Disney_IMDB_combined['title'].str.contains('Alexander')]

Unnamed: 0,title,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres,lower_title,_merge
256,"Alexander and the Terrible, Horrible, No Good,...",tt13782222,movie,"Alexander and the Terrible, Horrible, No Good,...","Alexander and the Terrible, Horrible, No Good,...",0,\N,\N,\N,"Comedy,Family",alexander and the terrible horrible no good ve...,both
258,"Alexander and the Terrible, Horrible, No Good,...",tt1698641,movie,"Alexander and the Terrible, Horrible, No Good,...","Alexander and the Terrible, Horrible, No Good,...",0,2014,\N,81,"Comedy,Drama,Family",alexander and the terrible horrible no good ve...,both


In [74]:
# Alexander and the Terrible, Horrible, No Good,... is a remake under development
FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[FL_Disney_IMDB_combined['tconst'] == "tt13782222"].index, inplace = True)

In [75]:
FL_Disney_IMDB_combined[FL_Disney_IMDB_combined['title'].str.contains('Encanto')]

Unnamed: 0,title,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres,lower_title,_merge
1693,Encanto,tt2953050,movie,Encanto,Encanto,0,2021,\N,102,"Animation,Comedy,Family",encanto,both
1695,Encanto,tt5964462,movie,Encanto,Encanto,0,\N,\N,\N,Drama,encanto,both


In [76]:
#tt5964462 is an error
FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[FL_Disney_IMDB_combined['tconst'] == "tt5964462"].index, inplace = True)

In [77]:
FL_Disney_IMDB_combined[FL_Disney_IMDB_combined['title'].str.contains('Flight of the Navigator')]

Unnamed: 0,title,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres,lower_title,_merge
1870,Flight of the Navigator,tt0091059,movie,Flight of the Navigator,Flight of the Navigator,0,1986,\N,90,"Adventure,Comedy,Family",flight of the navigator,both
1872,Flight of the Navigator,tt1444308,movie,Flight of the Navigator,Flight of the Navigator,0,\N,\N,\N,"Adventure,Comedy,Family",flight of the navigator,both


In [78]:
#tt1444308 is an error
FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[FL_Disney_IMDB_combined['tconst'] == "tt1444308"].index, inplace = True)

In [79]:
FL_Disney_IMDB_combined[FL_Disney_IMDB_combined['title'].str.contains('James and the')]

Unnamed: 0,title,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres,lower_title,_merge
2697,James and the Giant Peach,tt0116683,movie,James and the Giant Peach,James and the Giant Peach,0,1996,\N,79,"Adventure,Animation,Family",james and the giant peach,both
2699,James and the Giant Peach,tt6009922,movie,James and the Giant Peach,James and the Giant Peach,0,\N,\N,\N,"Adventure,Family,Fantasy",james and the giant peach,both


In [80]:
#tt6009922 is an error
FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[FL_Disney_IMDB_combined['tconst'] == "tt6009922"].index, inplace = True)

In [81]:
FL_Disney_IMDB_combined[FL_Disney_IMDB_combined['title'].str.contains("Mr. Toad's Wild Ride")]

Unnamed: 0,title,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres,lower_title,_merge
3313,Mr. Toad's Wild Ride,tt0118172,movie,Mr. Toad's Wild Ride,The Wind in the Willows,0,1996,\N,88,"Adventure,Comedy,Family",mr toads wild ride,both
3314,Mr. Toad's Wild Ride,tt2369119,movie,Mr. Toad's Wild Ride,Mr. Toad's Wild Ride,0,\N,\N,\N,"Animation,Family",mr toads wild ride,both


In [82]:
#tt2369119 is an error
FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[FL_Disney_IMDB_combined['tconst'] == "tt2369119"].index, inplace = True)

In [83]:
FL_Disney_IMDB_combined[FL_Disney_IMDB_combined['title'].str.contains("Robots")]

Unnamed: 0,title,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres,lower_title,_merge
4042,Robots,tt0358082,movie,Robots,Robots,0,2005,\N,91,"Adventure,Animation,Comedy",robots,both
4056,Robots,tt12579470,movie,Robots,Robots,0,\N,\N,\N,"Comedy,Romance,Sci-Fi",robots,both


In [84]:
#tt12579470 is an error
FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[FL_Disney_IMDB_combined['tconst'] == "tt12579470"].index, inplace = True)

In [85]:
FL_Disney_IMDB_combined[FL_Disney_IMDB_combined['title'].str.contains("Smart House")]

Unnamed: 0,title,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres,lower_title,_merge
4275,Smart House,tt6824720,movie,Smart House,Smart House,0,\N,\N,\N,"Horror,Thriller",smart house,both


In [86]:
#tt6824720 is an error
FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[FL_Disney_IMDB_combined['tconst'] == "tt6824720"].index, inplace = True)

In [87]:
FL_Disney_IMDB_combined[FL_Disney_IMDB_combined['title'].str.contains("Aristocats")]

Unnamed: 0,title,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres,lower_title,_merge
4801,The Aristocats,tt0065421,movie,The Aristocats,The AristoCats,0,1970,\N,78,"Adventure,Animation,Comedy",the aristocats,both
4802,The Aristocats,tt17220728,movie,The Aristocats,The Aristocats,0,\N,\N,\N,Family,the aristocats,both


In [88]:
#tt6824720 is an error
FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[FL_Disney_IMDB_combined['tconst'] == "tt17220728"].index, inplace = True)

In [89]:
FL_Disney_IMDB_combined[FL_Disney_IMDB_combined['title'].str.contains("The Christmas Star")]

Unnamed: 0,title,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres,lower_title,_merge
4879,The Christmas Star,tt0090840,movie,The Christmas Star,The Christmas Star,0,1986,\N,94,"Drama,Family",the christmas star,both
4883,The Christmas Star,tt3511874,movie,The Christmas Star,The Christmas Star,0,\N,\N,\N,"Drama,Family",the christmas star,both


In [90]:
#tt3511874 is an error
FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[FL_Disney_IMDB_combined['tconst'] == "tt3511874"].index, inplace = True)

In [91]:
FL_Disney_IMDB_combined[FL_Disney_IMDB_combined['title'].str.contains("Greatest Game Ever")]

Unnamed: 0,title,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres,lower_title,_merge
4986,The Greatest Game Ever Played,tt0388980,movie,The Greatest Game Ever Played,The Greatest Game Ever Played,0,2005,\N,120,"Biography,Drama,Sport",the greatest game ever played,both
4989,The Greatest Game Ever Played,tt16358988,movie,The Greatest Game Ever Played,The Greatest Game Ever Played,0,\N,\N,\N,Documentary,the greatest game ever played,both


In [92]:
#tt3511874 is an error
FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[FL_Disney_IMDB_combined['tconst'] == "tt16358988"].index, inplace = True)

In [93]:
FL_Disney_IMDB_combined[FL_Disney_IMDB_combined['title'].str.contains("Tower Of Terror")]

Unnamed: 0,title,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres,lower_title,_merge


In [94]:
#tt4922444 is an error
FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[FL_Disney_IMDB_combined['tconst'] == "tt4922444"].index, inplace = True)

In [95]:
FL_Disney_IMDB_combined[FL_Disney_IMDB_combined['title'].str.contains("Tron")]

Unnamed: 0,title,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres,lower_title,_merge
5939,Tron,tt0084827,movie,TRON,TRON,0,1982,\N,96,"Action,Adventure,Sci-Fi",tron,both
5942,Tron,tt12916062,movie,Tron,Tron,0,\N,\N,\N,\N,tron,both


In [96]:
#tt12916062 is an error
FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[FL_Disney_IMDB_combined['tconst'] == "tt12916062"].index, inplace = True)

In [97]:
FL_Disney_IMDB_combined[FL_Disney_IMDB_combined['startYear'] == "\\N"]

Unnamed: 0,title,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres,lower_title,_merge


In [72]:
FL_Disney_IMDB_combined[FL_Disney_IMDB_combined['title'].isnull()]

Unnamed: 0,title,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres,lower_title,_merge


In [99]:
FL_Disney_IMDB_combined.sort_values(by='titleType')
FL_Disney_IMDB_combined['titleType'].value_counts()

movie    453
Name: titleType, dtype: int64

### 02.08.06 Removing films prior to 1937

In [100]:
FL_Disney_IMDB_combined.sort_values(by='startYear')
FL_Disney_IMDB_combined['startYear'].value_counts()

2009    13
2016    12
2007    12
2010    12
2011    11
        ..
1937     1
1942     1
1946     1
1947     1
1944     1
Name: startYear, Length: 83, dtype: int64

In [69]:
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['tconst'] == "tt0011588")].index)
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['tconst'] == "tt0017379")].index)

In [71]:
FL_Disney_IMDB_combined = FL_Disney_IMDB_combined.drop(FL_Disney_IMDB_combined[(FL_Disney_IMDB_combined['tconst'] == "tt0246786")].index)

In [70]:
FL_Disney_IMDB_combined[FL_Disney_IMDB_combined['startYear'] == "\\N"]

Unnamed: 0,title,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres,lower_title,_merge
256,"Alexander and the Terrible, Horrible, No Good,...",tt13782222,movie,"Alexander and the Terrible, Horrible, No Good,...","Alexander and the Terrible, Horrible, No Good,...",0,\N,\N,\N,"Comedy,Family",alexander and the terrible horrible no good ve...,both
1695,Encanto,tt5964462,movie,Encanto,Encanto,0,\N,\N,\N,Drama,encanto,both
1872,Flight of the Navigator,tt1444308,movie,Flight of the Navigator,Flight of the Navigator,0,\N,\N,\N,"Adventure,Comedy,Family",flight of the navigator,both
2699,James and the Giant Peach,tt6009922,movie,James and the Giant Peach,James and the Giant Peach,0,\N,\N,\N,"Adventure,Family,Fantasy",james and the giant peach,both
3314,Mr. Toad's Wild Ride,tt2369119,movie,Mr. Toad's Wild Ride,Mr. Toad's Wild Ride,0,\N,\N,\N,"Animation,Family",mr toads wild ride,both
4056,Robots,tt12579470,movie,Robots,Robots,0,\N,\N,\N,"Comedy,Romance,Sci-Fi",robots,both
4275,Smart House,tt6824720,movie,Smart House,Smart House,0,\N,\N,\N,"Horror,Thriller",smart house,both
4802,The Aristocats,tt17220728,movie,The Aristocats,The Aristocats,0,\N,\N,\N,Family,the aristocats,both
4883,The Christmas Star,tt3511874,movie,The Christmas Star,The Christmas Star,0,\N,\N,\N,"Drama,Family",the christmas star,both
4989,The Greatest Game Ever Played,tt16358988,movie,The Greatest Game Ever Played,The Greatest Game Ever Played,0,\N,\N,\N,Documentary,the greatest game ever played,both


In [101]:
FL_Disney_IMDB_combined.sort_values(by='genres')
FL_Disney_IMDB_combined['genres'].value_counts()

Adventure,Animation,Comedy    75
Adventure,Comedy,Family       21
Adventure,Family,Fantasy      20
Drama,Family                  19
Action,Adventure,Comedy       19
                              ..
Adventure,Family,History       1
Comedy,Western                 1
Musical,Romance                1
Adventure,Documentary          1
Documentary,Family,News        1
Name: genres, Length: 75, dtype: int64

In [98]:
FL_Disney_IMDB_combined.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 453 entries, 0 to 6124
Data columns (total 12 columns):
 #   Column          Non-Null Count  Dtype   
---  ------          --------------  -----   
 0   title           453 non-null    object  
 1   tconst          453 non-null    object  
 2   titleType       453 non-null    object  
 3   primaryTitle    453 non-null    object  
 4   originalTitle   453 non-null    object  
 5   isAdult         453 non-null    object  
 6   startYear       453 non-null    object  
 7   endYear         453 non-null    object  
 8   runtimeMinutes  453 non-null    object  
 9   genres          453 non-null    object  
 10  lower_title     453 non-null    object  
 11  _merge          453 non-null    category
dtypes: category(1), object(11)
memory usage: 43.0+ KB


We've removed errors, corrected erros, removed duplcates, and removed anything that was not a feature release.  That leaves us with 453 Disney Movies. This is the data we will start with to gather Ratings and Votes

## 02.09 Export Features Films

In [102]:
FL_Disney_IMDB_combined.to_csv('../Bens_Data/Combined_IMDB_Disney_453.csv')