# Walt Disney Movies

The code in this notebook will access a few Wikipedia pages using python to create a list of movies released by Walt Disney Studios. It then pulls the movie details from The Open Movie Database API.

In [1]:
#Import used libraries
import requests as req
import pandas as pd
from bs4 import BeautifulSoup as bs


### Wikipedia Decade Links

The list of movies are stored in a couple of Wikipedia articles found here: https://en.wikipedia.org/wiki/Lists_of_Walt_Disney_Studios_films

Because of this, we'll create a list with all the decade links found in this Wikipedia page. 

In [2]:

base_link = 'https://en.wikipedia.org/'
main_url = req.get('https://en.wikipedia.org/wiki/Lists_of_Walt_Disney_Studios_films')

soup = bs(main_url.content)

link_table = soup.select('.mw-parser-output ul')
link_table = link_table[0]

link_list = []

for link in link_table.find_all('a', href=True):
    link_dec = base_link + link['href'].replace('%E2%80%93', '-')
    link_list.append(link_dec)

link_list

['https://en.wikipedia.org//wiki/List_of_Walt_Disney_Studios_films_(1937-1959)',
 'https://en.wikipedia.org//wiki/List_of_Walt_Disney_Studios_films_(1960-1979)',
 'https://en.wikipedia.org//wiki/List_of_Walt_Disney_Studios_films_(1980-1989)',
 'https://en.wikipedia.org//wiki/List_of_Walt_Disney_Studios_films_(1990-1999)',
 'https://en.wikipedia.org//wiki/List_of_Walt_Disney_Studios_films_(2000-2009)',
 'https://en.wikipedia.org//wiki/List_of_Walt_Disney_Studios_films_(2010-2019)',
 'https://en.wikipedia.org//wiki/List_of_Walt_Disney_Studios_films_(2020-2029)']

### Movie List

To get the full list of movies, we'll need parse various tables in each of the links identified above.

In [3]:
movie_list = []

for link, l in enumerate(link_list):
    
    link_content = req.get(link_list[link]).content
    link_soup = bs(link_content)
        
    tables = link_soup.find_all('table', class_='wikitable sortable')
    
    for table, t in enumerate(tables):
        
        if link == 6 and table > 0:
            break
            
        else:
            rows = tables[table].find_all('tr')
        
            for row, r in enumerate(rows):
                if row >= 1 :

                    movie_cell = rows[row].find_all('i')
                    release_cell = rows[row].find_all('td')

                    title = movie_cell[0].get_text(strip=True)
                    release = release_cell[0].get_text(strip=True)

                    movie_info = {'Title': title, 'Released': release, 'Decade Link': link_list[link]}
                    movie_list.append(movie_info)      

print(movie_list[:5])
print(len(movie_list))

[{'Title': 'Academy Award Review of Walt Disney Cartoons', 'Released': 'May 19, 1937', 'Decade Link': 'https://en.wikipedia.org//wiki/List_of_Walt_Disney_Studios_films_(1937-1959)'}, {'Title': 'Snow White and the Seven Dwarfs', 'Released': 'December 21, 1937', 'Decade Link': 'https://en.wikipedia.org//wiki/List_of_Walt_Disney_Studios_films_(1937-1959)'}, {'Title': 'Pinocchio', 'Released': 'February 7, 1940', 'Decade Link': 'https://en.wikipedia.org//wiki/List_of_Walt_Disney_Studios_films_(1937-1959)'}, {'Title': 'Fantasia', 'Released': 'November 13, 1940', 'Decade Link': 'https://en.wikipedia.org//wiki/List_of_Walt_Disney_Studios_films_(1937-1959)'}, {'Title': 'The Reluctant Dragon', 'Released': 'June 20, 1941', 'Decade Link': 'https://en.wikipedia.org//wiki/List_of_Walt_Disney_Studios_films_(1937-1959)'}]
818


### Saving the Movie List

Now that we have a list of movies, we'll move it to a pandas DataFrame to do a bit of clean up before getting the movie details.

In [4]:
movie_df = pd.DataFrame.from_dict(movie_list)
movie_df.head()

Unnamed: 0,Title,Released,Decade Link
0,Academy Award Review of Walt Disney Cartoons,"May 19, 1937",https://en.wikipedia.org//wiki/List_of_Walt_Di...
1,Snow White and the Seven Dwarfs,"December 21, 1937",https://en.wikipedia.org//wiki/List_of_Walt_Di...
2,Pinocchio,"February 7, 1940",https://en.wikipedia.org//wiki/List_of_Walt_Di...
3,Fantasia,"November 13, 1940",https://en.wikipedia.org//wiki/List_of_Walt_Di...
4,The Reluctant Dragon,"June 20, 1941",https://en.wikipedia.org//wiki/List_of_Walt_Di...


In [5]:
movie_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 818 entries, 0 to 817
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Title        818 non-null    object
 1   Released     818 non-null    object
 2   Decade Link  818 non-null    object
dtypes: object(3)
memory usage: 19.3+ KB


In [6]:
movie_df['Released Date'] = pd.to_datetime(movie_df['Released'], errors='coerce')
movie_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 818 entries, 0 to 817
Data columns (total 4 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   Title          818 non-null    object        
 1   Released       818 non-null    object        
 2   Decade Link    818 non-null    object        
 3   Released Date  773 non-null    datetime64[ns]
dtypes: datetime64[ns](1), object(3)
memory usage: 25.7+ KB


In [7]:
movie_df[movie_df['Released Date'].isnull()]

Unnamed: 0,Title,Released,Decade Link,Released Date
90,The Jungle Book,The Jungle Book,https://en.wikipedia.org//wiki/List_of_Walt_Di...,NaT
134,Freaky Friday,Freaky Friday,https://en.wikipedia.org//wiki/List_of_Walt_Di...,NaT
136,The Many Adventures of Winnie the Pooh,The Many Adventures of Winnie the Pooh,https://en.wikipedia.org//wiki/List_of_Walt_Di...,NaT
138,The Rescuers,The Rescuers,https://en.wikipedia.org//wiki/List_of_Walt_Di...,NaT
151,The London Connection,The London Connection,https://en.wikipedia.org//wiki/List_of_Walt_Di...,NaT
155,The Last Flight of Noah's Ark,The Last Flight of Noah's Ark,https://en.wikipedia.org//wiki/List_of_Walt_Di...,NaT
173,The Black Cauldron,"July 26, 1985[1]",https://en.wikipedia.org//wiki/List_of_Walt_Di...,NaT
261,A Stranger Among Us,A Stranger Among Us,https://en.wikipedia.org//wiki/List_of_Walt_Di...,NaT
266,Sarafina!,Sarafina!,https://en.wikipedia.org//wiki/List_of_Walt_Di...,NaT
275,The Cemetery Club,The Cemetery Club,https://en.wikipedia.org//wiki/List_of_Walt_Di...,NaT


In [8]:
movie_df = movie_df.fillna(method='ffill')

movie_df = movie_df[:815]

movie_df['Released Year'] = movie_df['Released Date'].dt.year.apply(str)

movie_df.drop('Released', axis = 1, inplace=True)


In [9]:
movie_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 815 entries, 0 to 814
Data columns (total 4 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   Title          815 non-null    object        
 1   Decade Link    815 non-null    object        
 2   Released Date  815 non-null    datetime64[ns]
 3   Released Year  815 non-null    object        
dtypes: datetime64[ns](1), object(3)
memory usage: 25.6+ KB


### Movie Details Using OMDb API

For each movie in the list, we will generate a link and use the OMDb API to get the movie details. Then we'll clean it up a bit and then save it to a CSV to analyze from it.

More info on the OMDb API can be found on their website: http://www.omdbapi.com/

In [10]:
base_api = 'http://www.omdbapi.com/?'
api_key = '&apikey=XXXXXX'

movie_detail = []

for movie, m in movie_df.iterrows():
    print(movie)
    title_link = 't=' + movie_df['Title'][movie].replace(' ', '+')
    year_link = '&y=' + movie_df['Released Year'][movie]
    api_link = base_api + title_link + year_link + api_key
    details = req.get(api_link).json()
    movie_detail.append(details)
    

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
27

In [11]:
movie_detail[:5]

[{'Title': 'Academy Award Review of Walt Disney Cartoons',
  'Year': '1937',
  'Rated': 'Approved',
  'Released': '19 May 1937',
  'Runtime': '41 min',
  'Genre': 'Animation, Short, Comedy, Family',
  'Director': 'N/A',
  'Writer': 'N/A',
  'Actors': 'Billy Bletcher, Dorothy Compton, Eddie Holden, Mary Moder',
  'Plot': 'A compilation of five Oscar-winning Disney shorts, released to help promote the upcoming release of Snow White and the Seven Dwarfs (1937).',
  'Language': 'English',
  'Country': 'USA',
  'Awards': 'N/A',
  'Poster': 'https://m.media-amazon.com/images/M/MV5BYmYzNzM3NTUtZGM1Zi00YTg0LWI5M2QtODJhOTFmMmQ1MTE3XkEyXkFqcGdeQXVyNjk0NDQ0OTY@._V1_SX300.jpg',
  'Ratings': [{'Source': 'Internet Movie Database', 'Value': '7.1/10'}],
  'Metascore': 'N/A',
  'imdbRating': '7.1',
  'imdbVotes': '76',
  'imdbID': 'tt0263027',
  'Type': 'movie',
  'DVD': 'N/A',
  'BoxOffice': 'N/A',
  'Production': 'N/A',
  'Website': 'N/A',
  'Response': 'True'},
 {'Title': 'Snow White and the Seven D

In [12]:
detail_df = pd.DataFrame.from_dict(movie_detail)
detail_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 815 entries, 0 to 814
Data columns (total 27 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Title         772 non-null    object
 1   Year          772 non-null    object
 2   Rated         772 non-null    object
 3   Released      772 non-null    object
 4   Runtime       772 non-null    object
 5   Genre         772 non-null    object
 6   Director      772 non-null    object
 7   Writer        772 non-null    object
 8   Actors        772 non-null    object
 9   Plot          772 non-null    object
 10  Language      772 non-null    object
 11  Country       772 non-null    object
 12  Awards        772 non-null    object
 13  Poster        772 non-null    object
 14  Ratings       772 non-null    object
 15  Metascore     772 non-null    object
 16  imdbRating    772 non-null    object
 17  imdbVotes     772 non-null    object
 18  imdbID        772 non-null    object
 19  Type    

In [13]:
detail_df.head()

Unnamed: 0,Title,Year,Rated,Released,Runtime,Genre,Director,Writer,Actors,Plot,...,imdbVotes,imdbID,Type,DVD,BoxOffice,Production,Website,Response,Error,totalSeasons
0,Academy Award Review of Walt Disney Cartoons,1937,Approved,19 May 1937,41 min,"Animation, Short, Comedy, Family",,,"Billy Bletcher, Dorothy Compton, Eddie Holden,...",A compilation of five Oscar-winning Disney sho...,...,76,tt0263027,movie,,,,,True,,
1,Snow White and the Seven Dwarfs,1937,Approved,04 Feb 1938,83 min,"Animation, Family, Fantasy, Musical, Romance","William Cottrell, David Hand, Wilfred Jackson,...","Jacob Grimm (fairy tales), Wilhelm Grimm (fair...","Roy Atwell, Stuart Buchanan, Adriana Caselotti...",Exiled into the dangerous forest by her wicked...,...,181331,tt0029583,movie,27 Mar 2007,"$184,925,486",,,True,,
2,Pinocchio,1940,G,23 Feb 1940,88 min,"Animation, Comedy, Family, Fantasy, Musical","Norman Ferguson, T. Hee, Wilfred Jackson, Jack...","Carlo Collodi (from the story by), Ted Sears (...","Mel Blanc, Billy Bletcher, Don Brodie, Stuart ...","A living puppet, with the help of a cricket as...",...,131352,tt0032910,movie,,"$84,254,167",Walt Disney Productions,,True,,
3,Fantasia,1940,G,19 Sep 1941,125 min,"Animation, Family, Fantasy, Music, Musical","James Algar, Samuel Armstrong, Ford Beebe Jr.,...","Joe Grant (story direction), Dick Huemer (stor...","Deems Taylor, Leopold Stokowski, The Philadelp...",A collection of animated interpretations of gr...,...,88600,tt0032455,movie,,"$76,408,097",Walt Disney Productions,,True,,
4,The Reluctant Dragon,1941,Approved,20 Jun 1941,74 min,"Animation, Comedy, Family","Alfred L. Werker, Hamilton Luske, Jack Cutting...","Kenneth Grahame (based on the story by), Ted S...","Robert Benchley, Frances Gifford, Buddy Pepper...",Humorist Robert Benchley learns about the anim...,...,2810,tt0034091,movie,,"$872,000",Walt Disney Productions,,True,,


In [14]:
detail_df.dropna(subset=['Title'], inplace=True)

detail_df['Internet Movie Database Rating'] = detail_df['Ratings'].apply(lambda cells: next((cell['Value'] for cell in cells if cell['Source'] == 'Internet Movie Database'), None))
detail_df['Rotten Tomatoes Rating'] = detail_df['Ratings'].apply(lambda cells: next((cell['Value'] for cell in cells if cell['Source'] == 'Rotten Tomatoes'), None))
detail_df['Metacritic Rating'] = detail_df['Ratings'].apply(lambda cells: next((cell['Value'] for cell in cells if cell['Source'] == 'Metacritic'), None))

detail_df.replace('N/A', None, inplace=True)

detail_df.drop(['Response', 'Website', 'Error', 'totalSeasons', 'Ratings'], axis = 1, inplace=True)
detail_df.info()


<class 'pandas.core.frame.DataFrame'>
Int64Index: 772 entries, 0 to 814
Data columns (total 25 columns):
 #   Column                          Non-Null Count  Dtype 
---  ------                          --------------  ----- 
 0   Title                           772 non-null    object
 1   Year                            772 non-null    object
 2   Rated                           772 non-null    object
 3   Released                        772 non-null    object
 4   Runtime                         772 non-null    object
 5   Genre                           772 non-null    object
 6   Director                        772 non-null    object
 7   Writer                          772 non-null    object
 8   Actors                          772 non-null    object
 9   Plot                            772 non-null    object
 10  Language                        772 non-null    object
 11  Country                         772 non-null    object
 12  Awards                          772 non-null    ob

In [15]:
detail_df.to_csv('Walt_Disney_Movie_Details.csv', index=False)