# Setup and Analysis

In [10]:
import pandas as pd
awards = pd.read_csv("academy_awards.csv",encoding="ISO-8859-1")
print(awards.head(2))


          Year               Category        Nominee  \
0  2010 (83rd)  Actor -- Leading Role  Javier Bardem   
1  2010 (83rd)  Actor -- Leading Role   Jeff Bridges   

                 Additional Info Won? Unnamed: 5 Unnamed: 6 Unnamed: 7  \
0             Biutiful {'Uxbal'}   NO        NaN        NaN        NaN   
1  True Grit {'Rooster Cogburn'}   NO        NaN        NaN        NaN   

  Unnamed: 8 Unnamed: 9 Unnamed: 10  
0        NaN        NaN         NaN  
1        NaN        NaN         NaN  


In [11]:
awards["Unnamed: 10"].value_counts()

*    1
Name: Unnamed: 10, dtype: int64

In [12]:
awards[["Category","Additional Info"]]

| | Category | Additional Info |
|---|---|---|
| 0 | Actor -- Leading Role | Biutiful {'Uxbal'} |
| 1 | Actor -- Leading Role | True Grit {'Rooster Cogburn'} |
| 2 | Actor -- Leading Role | The Social Network {'Mark Zuckerberg'} |
| 3 | Actor -- Leading Role | The King's Speech {'King George VI'} |
| 4 | Actor -- Leading Role | 127 Hours {'Aron Ralston'} |
| 5 | Actor -- Supporting Role | The Fighter {'Dicky Eklund'} |
| 6 | Actor -- Supporting Role | Winter's Bone {'Teardrop'} |
| 7 | Actor -- Supporting Role | The Town {'James Coughlin'} |
| 8 | Actor -- Supporting Role | The Kids Are All Right {'Paul'} |
| 9 | Actor -- Supporting Role | The King's Speech {'Lionel Logue'} |


In [13]:
awards["Category"].value_counts()

Writing                                                        888
Music (Scoring)                                                748
Cinematography                                                 572
Art Direction                                                  552
Best Picture                                                   485
Sound                                                          460
Short Film (Live Action)                                       434
Scientific and Technical (Technical Achievement Award)         428
Music (Song)                                                   413
Actress -- Leading Role                                        411
Directing                                                      410
Actor -- Leading Role                                          408
Film Editing                                                   385
Costume Design                                                 384
Actress -- Supporting Role                                    

## Cleaning and Filtering

In [14]:
awards["Year"] = awards["Year"].str[0:4].astype("int64")
awards["Year"].head(2)

0    2010
1    2010
Name: Year, dtype: int64

In [15]:
later_than_2000 = awards[awards["Year"] > 2000]
award_categories = ["Actor -- Leading Role","Actor -- Supporting Role","Actress -- Leading Role","Actress -- Supporting Role"]
# Take an explicit copy so that adding "Won" does not write to a slice of awards
# (assigning to the filtered slice directly raises SettingWithCopyWarning).
nominations = later_than_2000[later_than_2000["Category"].isin(award_categories)].copy()
won_mapping = {"NO": 0, "YES": 1}
nominations["Won"] = nominations["Won?"].map(won_mapping)
final_nominations = nominations.drop(["Won?","Unnamed: 5","Unnamed: 6","Unnamed: 7","Unnamed: 8","Unnamed: 9","Unnamed: 10"],axis=1)
final_nominations.head(2)


| | Year | Category | Nominee | Additional Info | Won |
|---|---|---|---|---|---|
| 0 | 2010 | Actor -- Leading Role | Javier Bardem | Biutiful {'Uxbal'} | 0 |
| 1 | 2010 | Actor -- Leading Role | Jeff Bridges | True Grit {'Rooster Cogburn'} | 0 |


In [16]:
additional_info_one = final_nominations["Additional Info"].str.rstrip("'}")
# Split on the literal " {'" that separates the movie title from the character name
additional_info_two = additional_info_one.str.split(" {'")
movie_names = additional_info_two.str[0]
characters = additional_info_two.str[1]
final_nominations["Movie"] = movie_names
final_nominations["Character"] = characters
final_nominations = final_nominations.drop("Additional Info",axis=1)
final_nominations.head()

| | Year | Category | Nominee | Won | Movie | Character |
|---|---|---|---|---|---|---|
| 0 | 2010 | Actor -- Leading Role | Javier Bardem | 0 | Biutiful | Uxbal |
| 1 | 2010 | Actor -- Leading Role | Jeff Bridges | 0 | True Grit | Rooster Cogburn |
| 2 | 2010 | Actor -- Leading Role | Jesse Eisenberg | 0 | The Social Network | Mark Zuckerberg |
| 3 | 2010 | Actor -- Leading Role | Colin Firth | 1 | The King's Speech | King George VI |
| 4 | 2010 | Actor -- Leading Role | James Franco | 0 | 127 Hours | Aron Ralston |


## Exporting to SQLite

In [17]:
import sqlite3
conn = sqlite3.connect("nominations.db")
final_nominations.to_sql("nominations",conn,index=False)
conn.close()

## Querying the Database

In [18]:
import sqlite3
conn = sqlite3.connect("nominations.db")
print(conn.execute("PRAGMA table_info(nominations)").fetchall())

[(0, 'Year', 'INTEGER', 0, None, 0), (1, 'Category', 'TEXT', 0, None, 0), (2, 'Nominee', 'TEXT', 0, None, 0), (3, 'Won', 'INTEGER', 0, None, 0), (4, 'Movie', 'TEXT', 0, None, 0), (5, 'Character', 'TEXT', 0, None, 0)]


In [19]:
print(conn.execute("SELECT * FROM nominations LIMIT 10").fetchall())

[(2010, 'Actor -- Leading Role', 'Javier Bardem', 0, 'Biutiful', 'Uxbal'), (2010, 'Actor -- Leading Role', 'Jeff Bridges', 0, 'True Grit', 'Rooster Cogburn'), (2010, 'Actor -- Leading Role', 'Jesse Eisenberg', 0, 'The Social Network', 'Mark Zuckerberg'), (2010, 'Actor -- Leading Role', 'Colin Firth', 1, "The King's Speech", 'King George VI'), (2010, 'Actor -- Leading Role', 'James Franco', 0, '127 Hours', 'Aron Ralston'), (2010, 'Actor -- Supporting Role', 'Christian Bale', 1, 'The Fighter', 'Dicky Eklund'), (2010, 'Actor -- Supporting Role', 'John Hawkes', 0, "Winter's Bone", 'Teardrop'), (2010, 'Actor -- Supporting Role', 'Jeremy Renner', 0, 'The Town', 'James Coughlin'), (2010, 'Actor -- Supporting Role', 'Mark Ruffalo', 0, 'The Kids Are All Right', 'Paul'), (2010, 'Actor -- Supporting Role', 'Geoffrey Rush', 0, "The King's Speech", 'Lionel Logue')]


In [22]:
conn.close()

# Further Analysis

The award categories in older ceremonies were different from the ones we have today. What relevant information should we keep from older ceremonies?

In [26]:
awards[awards["Year"] < 1960]["Category"].value_counts()


Music (Scoring)                                                379
Writing                                                        378
Cinematography                                                 282
Art Direction                                                  262
Best Picture                                                   220
Short Film (Live Action)                                       217
Sound                                                          205
Music (Song)                                                   165
Actress -- Leading Role                                        156
Directing                                                      155
Actor -- Leading Role                                          153
Film Editing                                                   130
Short Film (Animated)                                          126
Actress -- Supporting Role                                     120
Actor -- Supporting Role                                      

In [29]:
awards[awards["Category"].str.contains("archaic")]

| | Year | Category | Nominee | Additional Info | Won? | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Unnamed: 9 | Unnamed: 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 6075 | 1962 | Special Effects (archaic category) | The Longest Day | Visual Effects by Robert MacDonald; Audible Ef... | YES | | | | | | |
| 6076 | 1962 | Special Effects (archaic category) | Mutiny on the Bounty | Visual Effects by A. Arnold Gillespie; Audible... | NO | | | | | | |
| 6204 | 1961 | Special Effects (archaic category) | The Absent Minded Professor | Visual Effects by Robert A. Mattey, Eustace Ly... | NO | | | | | | |
| 6205 | 1961 | Special Effects (archaic category) | The Guns of Navarone | Visual Effects by Bill Warrington; Audible Eff... | YES | | | | | | |
| 6331 | 1960 | Special Effects (archaic category) | The Last Voyage | Visual Effects by A.J. Lohman | NO | | | | | | |
| 6332 | 1960 | Special Effects (archaic category) | The Time Machine | Visual Effects by Gene Warren, Tim Baar | YES | | | | | | |
| 6457 | 1959 | Special Effects (archaic category) | Ben-Hur | Visual Effects by A. Arnold Gillespie, Robert ... | YES | | | | | | |
| 6458 | 1959 | Special Effects (archaic category) | Journey to the Center of the Earth | Visual Effects by L. B. Abbott, James B. Gordo... | NO | | | | | | |
| 6572 | 1958 | Special Effects (archaic category) | tom thumb | Visual Effects by Tom Howard | YES | | | | | | |
| 6573 | 1958 | Special Effects (archaic category) | Torpedo Run | Visual Effects by A. Arnold Gillespie; Audible... | NO | | | | | | |


I believe we can keep all of the information that we have. Some older categories seem to be missing information on which movie was associated with the nomination, so that field would be empty for those rows.

## Formatting in "Additional Info" column

The nominations for the Art Direction category have lengthy values for Additional Info. What information is useful and how do we extract it?

In [33]:
awards[awards["Category"] == "Art Direction"]["Additional Info"]

23       Production Design: Robert Stromberg; Set Decor...
24       Production Design: Stuart Craig; Set Decoratio...
25       Production Design: Guy Hendrix Dyas; Set Decor...
26       Production Design: Eve Stewart; Set Decoration...
27       Production Design: Jess Gonchor; Set Decoratio...
160      Production Design: Rick Carter and Robert Stro...
161      Production Design: Dave Warren and Anastasia M...
162      Production Design: John Myhre; Set Decoration:...
163      Production Design: Sarah Greenwood; Set Decora...
164      Production Design: Patrice Vermette; Set Decor...
298      Art Direction: James J. Murakami; Set Decorati...
299      Art Direction: Donald Graham Burt; Set Decorat...
300      Art Direction: Nathan Crowley; Set Decoration:...
301      Art Direction: Michael Carlin; Set Decoration:...
302      Art Direction: Kristi Zea; Set Decoration: Deb...
416      Art Direction: Arthur Max; Set Decoration: Bet...
417      Art Direction: Sarah Greenwood; Set Decoration.

The values list the people nominated for the award and their roles in the art direction. We could keep this information in a separate table with one row per film and one column per role, leaving nulls where a nomination does not list a given role. Older films list just one person, for whom we could use the role "Art Director". This table could then be linked to a table of the nominations.
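
As a rough sketch of that idea, the snippet below splits a `Role: Name; Role: Name` string into one column per role. The helper `parse_roles` and the sample names are hypothetical (the real values are truncated in the output above), and it assumes that older entries consist of a single unlabeled name.

```python
import pandas as pd

def parse_roles(info):
    """Split 'Role: Name; Role: Name' into a {role: name} dict."""
    roles = {}
    for part in info.split(";"):
        part = part.strip()
        if not part:
            continue
        if ":" in part:
            role, names = part.split(":", 1)
            roles[role.strip()] = names.strip()
        else:
            # Assumption: an entry without a role label is a lone art director.
            roles["Art Director"] = part
    return roles

# Made-up sample values in the same shape as the Additional Info column.
samples = pd.Series([
    "Production Design: Jane Doe; Set Decoration: John Roe",
    "Art Direction: Alex Poe; Set Decoration: Sam Low",
    "Chris Moe",
])

# One column per role; roles missing from a nomination become NaN.
art_roles = pd.DataFrame([parse_roles(s) for s in samples])
print(art_roles)
```

Each row of `art_roles` lines up with the nomination it was parsed from, so a nomination identifier could be added as a key column to link it back to the nominations table.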

### Character names
Many values in Additional Info don't contain the character name the actor or actress played. Should we toss out character name altogether as we expand our data? What tradeoffs do we make by doing so?

We will need to drop character name from the nominations table as we expand to other categories. To keep this information we could store it in a separate table, as noted above. We would need a separate Additional Info column for each category in our original data. This greatly increases the complexity of the database and the length of our queries because of the joins that will be necessary.
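
To make the join cost concrete, here is a minimal sketch using an in-memory SQLite database. The table layout (`nominations` with an `id` key and a `characters` table pointing at it) is an assumption for illustration, not the schema created earlier in this notebook; the two sample rows are taken from the outputs above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nominations (
    id INTEGER PRIMARY KEY,
    year INTEGER,
    category TEXT,
    nominee TEXT,
    won INTEGER
);
-- Hypothetical side table: only acting nominations get a row here.
CREATE TABLE characters (
    nomination_id INTEGER REFERENCES nominations(id),
    character TEXT
);
""")

conn.executemany(
    "INSERT INTO nominations VALUES (?, ?, ?, ?, ?)",
    [
        (1, 2010, "Actor -- Leading Role", "Colin Firth", 1),
        (2, 1962, "Special Effects (archaic category)", "The Longest Day", 1),
    ],
)
conn.execute("INSERT INTO characters VALUES (?, ?)", (1, "King George VI"))

# A LEFT JOIN keeps nominations that have no character row.
rows = conn.execute("""
    SELECT n.nominee, n.category, c.character
    FROM nominations AS n
    LEFT JOIN characters AS c ON c.nomination_id = n.id
    ORDER BY n.id
""").fetchall()
print(rows)
conn.close()
```

Every category-specific detail stored this way adds another table and another join to any query that needs it, which is the tradeoff described above.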

### Ceremonies spanning two years

What's the best way to handle awards ceremonies that included movies from 2 years?
E.g. see 1927/28 (1st) in the Year column.

We could create a ceremonies table to keep track of this information and link it to a new column in our nominations table that lists the ceremony (and possibly get rid of "Year" in nominations).
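
One possible way to build the rows for such a table is to parse the raw Year labels, as sketched below. The function `parse_ceremony` and its field names are hypothetical, and the sketch assumes every label looks like `2010 (83rd)` or `1927/28 (1st)`, with a two-digit second year in the same century as the first. It would have to run on the original Year strings, before they are truncated to four characters.

```python
import re

# Four-digit year, optional "/YY" second year, then the ordinal in parentheses.
YEAR_PATTERN = re.compile(r"^(\d{4})(?:/(\d{2}))?\s*\((\d+)(?:st|nd|rd|th)\)$")

def parse_ceremony(label):
    """Turn a raw Year label into ceremony number, first year, and last year."""
    match = YEAR_PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"Unrecognized Year label: {label!r}")
    first, suffix, number = match.groups()
    first_year = int(first)
    # Assumption: a two-digit suffix falls in the same century as the first year.
    last_year = int(first[:2] + suffix) if suffix else first_year
    return {"ceremony": int(number), "first_year": first_year, "last_year": last_year}

print(parse_ceremony("1927/28 (1st)"))
print(parse_ceremony("2010 (83rd)"))
```

The ceremony number could then serve as the key that links the ceremonies table to the nominations table.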


