# DataFrames III: Data Extraction

In [1]:
import pandas as pd

## This Module's Dataset

- This module's dataset is a collection of all James Bond movies.

In [2]:
bond = pd.read_csv("jamesbond.csv")
bond.head()

,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
0,Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
1,From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
2,Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
3,Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
4,Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


## 1. The `set_index` and `reset_index` Methods

- The index serves as the collection of **primary identifiers**/labels/entry points for the rows.
- Extracting a row by position or label is **fastest** when the index is sorted.
- Pandas uses index labels/values when **merging** different objects together.
- The `set_index` method sets an existing column as the index of the DataFrame.
- The `reset_index` method sets the standard ascending numeric index as the index of the DataFrame.


In [3]:
bond = bond.set_index("Film") # makes column into index label
bond.head()

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [4]:
bond.reset_index() # moves the current index back into the DataFrame as a regular column and restores the default numeric index
bond.reset_index(drop = True) # discards the current index entirely and restores the default numeric index

,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
0,1962,Sean Connery,Terence Young,448.8,7.0,0.6
1,1963,Sean Connery,Terence Young,543.8,12.6,1.6
2,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
3,1965,Sean Connery,Terence Young,848.1,41.9,4.7
4,1967,David Niven,Ken Hughes,315.0,85.0,
5,1967,Sean Connery,Lewis Gilbert,514.2,59.9,4.4
6,1969,George Lazenby,Peter R. Hunt,291.5,37.3,0.6
7,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
8,1973,Roger Moore,Guy Hamilton,460.3,30.8,
9,1974,Roger Moore,Guy Hamilton,334.0,27.7,


In [5]:

bond.reset_index().sort_values(by = "Box Office", ascending = False).set_index("Box Office") # methods can be chained: move the current index back into the DataFrame, sort, then set another column as the index

Box Office,Film,Year,Actor,Director,Budget,Bond Actor Salary
943.5,Skyfall,2012,Daniel Craig,Sam Mendes,170.2,14.5
848.1,Thunderball,1965,Sean Connery,Terence Young,41.9,4.7
820.4,Goldfinger,1964,Sean Connery,Guy Hamilton,18.6,3.2
726.7,Spectre,2015,Daniel Craig,Sam Mendes,206.3,
581.5,Casino Royale,2006,Daniel Craig,Martin Campbell,145.3,3.3
543.8,From Russia with Love,1963,Sean Connery,Terence Young,12.6,1.6
535.0,Moonraker,1979,Roger Moore,Lewis Gilbert,91.5,
533.0,The Spy Who Loved Me,1977,Roger Moore,Lewis Gilbert,45.1,
518.5,GoldenEye,1995,Pierce Brosnan,Martin Campbell,76.9,5.1
514.2,You Only Live Twice,1967,Sean Connery,Lewis Gilbert,59.9,4.4
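
The bullet about merging on index labels can be illustrated with a small sketch. The three films and their figures are copied from the dataset above; the names `films` and `budget` are introduced here only for illustration. When a Series is assigned as a new column, pandas matches values by index label rather than by position:

```python
import pandas as pd

films = pd.DataFrame(
    {
        "Film": ["Dr. No", "Goldfinger", "Thunderball"],
        "Box Office": [448.8, 820.4, 848.1],
    }
).set_index("Film")

# Budget figures from the same dataset, deliberately listed in a different order
budget = pd.Series(
    [41.9, 7.0, 18.6],
    index=["Thunderball", "Dr. No", "Goldfinger"],
)

# Assignment aligns on index labels, not on position
films["Budget"] = budget

# reset_index turns "Film" back into a regular column
restored = films.reset_index()
```

Each budget lands next to the right film even though the two objects list the films in a different order.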


## 2. Retrieve Rows by Index Position with `iloc` Accessor

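- The `iloc` accessor retrieves one or more rows by index position; the `loc` accessor retrieves them by index label.
- Provide a pair of square brackets after the accessor.
- Both accessors accept single values, lists, and slices.
- A second argument inside the brackets selects columns in the same way.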


In [6]:
# combine row and column indexing by giving 2nd argument

bond.loc["Diamonds Are Forever", "Director"] # first argument selects the row, second argument selects the column
bond.loc[["Diamonds Are Forever", "GoldenEye"], "Director"] # select multiple rows by passing a list
bond.loc[["Diamonds Are Forever", "GoldenEye"], "Director":"Budget"] # two rows, and a slice of columns from "Director" through "Budget" (both included)

Film,Director,Box Office,Budget
Diamonds Are Forever,Guy Hamilton,442.5,34.7
GoldenEye,Martin Campbell,518.5,76.9


In [7]:
# iloc works the same way, using integer positions for both rows and columns

bond.iloc[0,2] # first row, third column
bond.iloc[[2,7],:3] # rows at positions 2 and 7, first three columns

Film,Year,Actor,Director
Goldfinger,1964,Sean Connery,Guy Hamilton
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton
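
One difference between the two accessors is easy to miss when slicing: a `loc` slice includes both endpoints, while an `iloc` slice excludes the stop position. A minimal sketch, using four rows copied from the dataset (the name `bond_small` is introduced here for illustration):

```python
import pandas as pd

bond_small = pd.DataFrame(
    {
        "Year": [1962, 1963, 1964, 1965],
        "Actor": ["Sean Connery"] * 4,
        "Director": ["Terence Young", "Terence Young", "Guy Hamilton", "Terence Young"],
    },
    index=pd.Index(
        ["Dr. No", "From Russia with Love", "Goldfinger", "Thunderball"],
        name="Film",
    ),
)

# loc slices by label and includes both endpoints: 3 rows, 2 columns
by_label = bond_small.loc["Dr. No":"Goldfinger", "Year":"Actor"]

# iloc slices by position and excludes the stop position: 2 rows, 2 columns
by_position = bond_small.iloc[0:2, 0:2]

# a single value: first row, third column
single_value = bond_small.iloc[0, 2]
```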


## 3. Overwrite Value in a DataFrame

- Use the `iloc` or `loc` accessor on the DataFrame to target a value, then provide the equal sign and a new value.

In [8]:
# target a value with an accessor, then assign the new value with an equal sign

bond.loc["Diamonds Are Forever", "Director"] = "Joske van op't hoekse"
bond

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
You Only Live Twice,1967,Sean Connery,Lewis Gilbert,514.2,59.9,4.4
On Her Majesty's Secret Service,1969,George Lazenby,Peter R. Hunt,291.5,37.3,0.6
Diamonds Are Forever,1971,Sean Connery,Joske van op't hoekse,442.5,34.7,5.8
Live and Let Die,1973,Roger Moore,Guy Hamilton,460.3,30.8,
The Man with the Golden Gun,1974,Roger Moore,Guy Hamilton,334.0,27.7,
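
The cell above uses `loc`; the same overwrite with `iloc` might look like the sketch below. It assumes the column position is not known in advance and looks it up with `columns.get_loc`; the two-row frame `bond_small` is a cut-down stand-in for the real dataset.

```python
import pandas as pd

bond_small = pd.DataFrame(
    {
        "Year": [1962, 1971],
        "Director": ["Terence Young", "Guy Hamilton"],
    },
    index=pd.Index(["Dr. No", "Diamonds Are Forever"], name="Film"),
)

# Look up the integer position of the "Director" column
director_pos = bond_small.columns.get_loc("Director")

# Overwrite the value in the second row (position 1) of that column
bond_small.iloc[1, director_pos] = "Joske van op't hoekse"
```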


## 4. Overwrite Multiple Values in a DataFrame

- The `replace` method replaces all occurrences of a Series value with another value (think of it like "Find and Replace").
- To overwrite multiple values in a DataFrame, remember to use an accessor on the DataFrame itself.
- Accessors like `loc` and `iloc` can accept Boolean Series. Use them to target the values to overwrite.


In [9]:
bond = pd.read_csv("jamesbond.csv")
bond.head()

,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
0,Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
1,From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
2,Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
3,Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
4,Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [10]:
# to replace every "Sean Connery" value in the Actor column with a new value, there are several ways to go:

# 1) use replace() method & assign

bond["Actor"] = bond["Actor"].replace("Sean Connery", "Sir Connery")
bond

,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
0,Dr. No,1962,Sir Connery,Terence Young,448.8,7.0,0.6
1,From Russia with Love,1963,Sir Connery,Terence Young,543.8,12.6,1.6
2,Goldfinger,1964,Sir Connery,Guy Hamilton,820.4,18.6,3.2
3,Thunderball,1965,Sir Connery,Terence Young,848.1,41.9,4.7
4,Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
5,You Only Live Twice,1967,Sir Connery,Lewis Gilbert,514.2,59.9,4.4
6,On Her Majesty's Secret Service,1969,George Lazenby,Peter R. Hunt,291.5,37.3,0.6
7,Diamonds Are Forever,1971,Sir Connery,Guy Hamilton,442.5,34.7,5.8
8,Live and Let Die,1973,Roger Moore,Guy Hamilton,460.3,30.8,
9,The Man with the Golden Gun,1974,Roger Moore,Guy Hamilton,334.0,27.7,


In [11]:
# 2) use .loc and boolean series

bond = pd.read_csv("jamesbond.csv")
is_sean_connery = bond["Actor"] == "Sean Connery"
bond.loc[is_sean_connery, "Actor"] = "Sir Sean Connery"
bond

,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
0,Dr. No,1962,Sir Sean Connery,Terence Young,448.8,7.0,0.6
1,From Russia with Love,1963,Sir Sean Connery,Terence Young,543.8,12.6,1.6
2,Goldfinger,1964,Sir Sean Connery,Guy Hamilton,820.4,18.6,3.2
3,Thunderball,1965,Sir Sean Connery,Terence Young,848.1,41.9,4.7
4,Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
5,You Only Live Twice,1967,Sir Sean Connery,Lewis Gilbert,514.2,59.9,4.4
6,On Her Majesty's Secret Service,1969,George Lazenby,Peter R. Hunt,291.5,37.3,0.6
7,Diamonds Are Forever,1971,Sir Sean Connery,Guy Hamilton,442.5,34.7,5.8
8,Live and Let Die,1973,Roger Moore,Guy Hamilton,460.3,30.8,
9,The Man with the Golden Gun,1974,Roger Moore,Guy Hamilton,334.0,27.7,
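
One reason to prefer the Boolean Series approach is that conditions can be combined, which a plain `replace` cannot express. The sketch below is an illustration only: the cutoff year 1970 is an arbitrary assumption, and `bond_small` holds four rows copied from the dataset.

```python
import pandas as pd

bond_small = pd.DataFrame(
    {
        "Film": ["Dr. No", "Goldfinger", "Casino Royale", "Diamonds Are Forever"],
        "Year": [1962, 1964, 1967, 1971],
        "Actor": ["Sean Connery", "Sean Connery", "David Niven", "Sean Connery"],
    }
)

is_sean_connery = bond_small["Actor"] == "Sean Connery"
before_1970 = bond_small["Year"] < 1970  # arbitrary cutoff for illustration

# Only rows where both conditions hold are overwritten
bond_small.loc[is_sean_connery & before_1970, "Actor"] = "Sir Sean Connery"
```

The 1971 film keeps its original value because it fails the second condition.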


## 5. Rename Index Labels or Columns in a DataFrame

- The `rename` method accepts a dictionary for either its `columns` or `index` parameters.
- The dictionary keys represent the existing names, and the values represent the new names.
- We can replace all columns by overwriting the DataFrame's `columns` attribute.

In [12]:
# load dataset

bond = pd.read_csv("jamesbond.csv", index_col = "Film").sort_index()
bond.head()

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9


In [13]:
bond = bond.rename(columns={"Year": "Year of Release", "Box Office":"Revenue"}).head() # pass a dict to columns: key = current name, value = new name; note that assigning the .head() result keeps only the first five rows in bond
bond

Film,Year of Release,Actor,Director,Revenue,Budget,Bond Actor Salary
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9


In [14]:
# renaming row labels works the same way, using the index parameter (every row with a matching label is renamed)

bond = bond.rename(index = {"A View to a Kill": "View", "Casino Royale": "Casino"})
bond

Film,Year of Release,Actor,Director,Revenue,Budget,Bond Actor Salary
View,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9
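
The third bullet in this section, overwriting the `columns` attribute, is not shown in the cells above. A minimal sketch follows; the replacement name `"Star"` is a hypothetical choice, and `bond_small` is a two-row stand-in for the dataset.

```python
import pandas as pd

bond_small = pd.DataFrame(
    {
        "Year": [1985, 2006],
        "Actor": ["Roger Moore", "Daniel Craig"],
        "Box Office": [275.2, 581.5],
    },
    index=pd.Index(["A View to a Kill", "Casino Royale"], name="Film"),
)

# The new list must supply exactly one name per existing column, in order
bond_small.columns = ["Year of Release", "Star", "Revenue"]
```

Unlike `rename`, this approach replaces every column name at once, so a list of the wrong length raises an error.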


## 6. Delete Rows or Columns from a DataFrame

- The `drop` method deletes one or more rows/columns from a DataFrame.
- Pass the `index` parameter a list of row labels to remove, or the `columns` parameter a list of column names to remove.
- The `pop` method removes and returns a single Series (it mutates the DataFrame in the process).
- Python's `del` keyword also removes a single Series.


In [15]:
# load dataset

bond = pd.read_csv("jamesbond.csv", index_col = "Film").sort_index()
bond


Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
For Your Eyes Only,1981,Roger Moore,John Glen,449.4,60.2,
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
GoldenEye,1995,Pierce Brosnan,Martin Campbell,518.5,76.9,5.1
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2


In [16]:
# 3 ways to remove

# a) columns and rows: .drop() method

bond.drop(columns = ["Actor", "Bond Actor Salary"]) # remove selected columns 
bond.drop(index = ["From Russia with Love", "Casino Royale"]) # remove selected rows (if name occurs multiple times, all instances will be removed)
bond.drop(index = ["From Russia with Love", "Casino Royale"], columns = ["Actor", "Bond Actor Salary"]) # remove both selected columns and rows

Film,Year,Director,Box Office,Budget
A View to a Kill,1985,John Glen,275.2,54.5
Diamonds Are Forever,1971,Guy Hamilton,442.5,34.7
Die Another Day,2002,Lee Tamahori,465.4,154.2
Dr. No,1962,Terence Young,448.8,7.0
For Your Eyes Only,1981,John Glen,449.4,60.2
GoldenEye,1995,Martin Campbell,518.5,76.9
Goldfinger,1964,Guy Hamilton,820.4,18.6
Licence to Kill,1989,John Glen,250.9,56.7
Live and Let Die,1973,Guy Hamilton,460.3,30.8
Moonraker,1979,Lewis Gilbert,535.0,91.5


In [17]:
# b) columns: .pop() method

bond.pop("Actor") # targets single column; gives back a Series, deletes Series from Df & changes Df in-place

Film
A View to a Kill                      Roger Moore
Casino Royale                        Daniel Craig
Casino Royale                         David Niven
Diamonds Are Forever                 Sean Connery
Die Another Day                    Pierce Brosnan
Dr. No                               Sean Connery
For Your Eyes Only                    Roger Moore
From Russia with Love                Sean Connery
GoldenEye                          Pierce Brosnan
Goldfinger                           Sean Connery
Licence to Kill                    Timothy Dalton
Live and Let Die                      Roger Moore
Moonraker                             Roger Moore
Never Say Never Again                Sean Connery
Octopussy                             Roger Moore
On Her Majesty's Secret Service    George Lazenby
Quantum of Solace                    Daniel Craig
Skyfall                              Daniel Craig
Spectre                              Daniel Craig
The Living Daylights               Timothy Da

In [18]:
bond # actor column is deleted

Film,Year,Director,Box Office,Budget,Bond Actor Salary
A View to a Kill,1985,John Glen,275.2,54.5,9.1
Casino Royale,2006,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Lee Tamahori,465.4,154.2,17.9
Dr. No,1962,Terence Young,448.8,7.0,0.6
For Your Eyes Only,1981,John Glen,449.4,60.2,
From Russia with Love,1963,Terence Young,543.8,12.6,1.6
GoldenEye,1995,Martin Campbell,518.5,76.9,5.1
Goldfinger,1964,Guy Hamilton,820.4,18.6,3.2


In [19]:
# c) columns: del keyword

del bond["Year"] # just deletes column, does not give it back to you as a Series

In [20]:
bond # Year column is deleted

Film,Director,Box Office,Budget,Bond Actor Salary
A View to a Kill,John Glen,275.2,54.5,9.1
Casino Royale,Martin Campbell,581.5,145.3,3.3
Casino Royale,Ken Hughes,315.0,85.0,
Diamonds Are Forever,Guy Hamilton,442.5,34.7,5.8
Die Another Day,Lee Tamahori,465.4,154.2,17.9
Dr. No,Terence Young,448.8,7.0,0.6
For Your Eyes Only,John Glen,449.4,60.2,
From Russia with Love,Terence Young,543.8,12.6,1.6
GoldenEye,Martin Campbell,518.5,76.9,5.1
Goldfinger,Guy Hamilton,820.4,18.6,3.2
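
The three approaches differ in whether they change the original DataFrame. The sketch below records the column list after each step to make that visible; `bond_small` is a two-row stand-in built from values in the dataset.

```python
import pandas as pd

bond_small = pd.DataFrame(
    {
        "Year": [1962, 1964],
        "Actor": ["Sean Connery", "Sean Connery"],
        "Budget": [7.0, 18.6],
    },
    index=pd.Index(["Dr. No", "Goldfinger"], name="Film"),
)

# drop returns a new DataFrame; bond_small still has all three columns
trimmed = bond_small.drop(columns=["Budget"])
columns_after_drop = list(bond_small.columns)

# pop removes the column from bond_small and hands it back as a Series
actors = bond_small.pop("Actor")
columns_after_pop = list(bond_small.columns)

# del removes the column in place and returns nothing
del bond_small["Year"]
columns_after_del = list(bond_small.columns)
```

To keep the result of `drop`, assign it back to a variable as done with `trimmed` here.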


## 7. Create Random Sample with the `sample` Method

- The `sample` method returns one or more random rows from the DataFrame.
- Customize the `axis` parameter to extract random columns.


In [21]:
# load dataset

bond = pd.read_csv("jamesbond.csv", index_col = "Film").sort_index()
bond.head()

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9


In [22]:
bond.sample() # pull out a random row 

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Tomorrow Never Dies,1997,Pierce Brosnan,Roger Spottiswoode,463.2,133.9,10.0


In [23]:
bond.sample(n=5) # pull out 5 rows at random

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Octopussy,1983,Roger Moore,John Glen,373.8,53.9,7.8
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Moonraker,1979,Roger Moore,Lewis Gilbert,535.0,91.5,
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
The Spy Who Loved Me,1977,Roger Moore,Lewis Gilbert,533.0,45.1,


In [24]:
# sample can also extract random columns

bond.sample(n=3, axis = 0) # pull out 3 rows at random: axis = 0 is the default
bond.sample(n=3, axis = 1) # pull out 3 columns at random by setting axis = 1

Film,Box Office,Budget,Year
A View to a Kill,275.2,54.5,1985
Casino Royale,581.5,145.3,2006
Casino Royale,315.0,85.0,1967
Diamonds Are Forever,442.5,34.7,1971
Die Another Day,465.4,154.2,2002
Dr. No,448.8,7.0,1962
For Your Eyes Only,449.4,60.2,1981
From Russia with Love,543.8,12.6,1963
GoldenEye,518.5,76.9,1995
Goldfinger,820.4,18.6,1964
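
Because `sample` is random, the outputs above will differ from run to run. The cells above do not use it, but `sample` also takes a `random_state` parameter that fixes the seed. A short sketch, with `bond_small` as a four-row stand-in for the dataset:

```python
import pandas as pd

bond_small = pd.DataFrame(
    {
        "Year": [1962, 1963, 1964, 1965],
        "Box Office": [448.8, 543.8, 820.4, 848.1],
        "Budget": [7.0, 12.6, 18.6, 41.9],
    },
    index=pd.Index(
        ["Dr. No", "From Russia with Love", "Goldfinger", "Thunderball"],
        name="Film",
    ),
)

# The same seed yields the same random rows on every call
rows_a = bond_small.sample(n=2, random_state=0)
rows_b = bond_small.sample(n=2, random_state=0)

# axis=1 samples columns instead of rows: all 4 rows, 2 random columns
cols = bond_small.sample(n=2, axis=1, random_state=0)
```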


## 8. The `nsmallest` and `nlargest` Methods

- The `nlargest` method returns a specified number of rows with the largest values from a given column.
- The `nsmallest` method returns rows with the smallest values from a given column.
- The `nlargest` and `nsmallest` methods are more efficient than sorting the entire DataFrame.


In [25]:
# load dataset

bond = pd.read_csv("jamesbond.csv", index_col = "Film").sort_index()
bond.head()

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9


In [26]:
# Retrieve 4 films with largest Box Office gross

# Way 1: use sort_values 
bond.sort_values("Box Office", ascending = False).head(4)
                 
# Way 2: n-largest (computationally more efficient)
bond.nlargest(n=4, columns = "Box Office") # columns also accepts a list: ties in the first column are broken by the second column, and so on

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Skyfall,2012,Daniel Craig,Sam Mendes,943.5,170.2,14.5
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Spectre,2015,Daniel Craig,Sam Mendes,726.7,206.3,


In [27]:
# works in same way as nlargest
bond.nsmallest(n=3, columns = "Bond Actor Salary")

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
On Her Majesty's Secret Service,1969,George Lazenby,Peter R. Hunt,291.5,37.3,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
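
The output above contains a tie: two films share a salary of 0.6. Passing a list of columns lets a second column break such ties. The sketch below uses four rows copied from the dataset; `bond_small` is a stand-in name.

```python
import pandas as pd

bond_small = pd.DataFrame(
    {
        "Box Office": [448.8, 291.5, 543.8, 820.4],
        "Bond Actor Salary": [0.6, 0.6, 1.6, 3.2],
    },
    index=pd.Index(
        [
            "Dr. No",
            "On Her Majesty's Secret Service",
            "From Russia with Love",
            "Goldfinger",
        ],
        name="Film",
    ),
)

# Two films tie on salary (0.6); the lower Box Office value decides between them
lowest_paid = bond_small.nsmallest(n=1, columns=["Bond Actor Salary", "Box Office"])

# nlargest accepts the same arguments
top_two = bond_small.nlargest(n=2, columns="Box Office")
```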


## 9. Filtering with the `where` Method

Another filtering method, next to `[ ]` and `loc`: 

- Similar to square brackets or `loc`, the `where` method filters the original DataFrame with a Boolean Series.
- Unlike square brackets and `loc`, `where` keeps every row and fills the rows that do not match the criteria with `NaN` values.
- Leaving in the `NaN` values can be advantageous for certain merge and visualization operations.


In [28]:
# load dataset

bond = pd.read_csv("jamesbond.csv", index_col = "Film").sort_index()
bond.head()

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9


In [29]:
# 3 different ways of filtering:

# create Boolean Series
actor_is_sean_connery = bond["Actor"] == "Sean Connery"

# way 1: subset with square brackets
bond[actor_is_sean_connery]

# way 2: use .loc[]
bond.loc[actor_is_sean_connery]

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Never Say Never Again,1983,Sean Connery,Irvin Kershner,380.0,86.0,
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
You Only Live Twice,1967,Sean Connery,Lewis Gilbert,514.2,59.9,4.4


In [30]:
# way 3: use .where-method
bond.where(actor_is_sean_connery) # pass in the Boolean Series as the first parameter, cond
# difference in output: all rows are retained, but rows that do not match are filled with NaN

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
A View to a Kill,,,,,,
Casino Royale,,,,,,
Casino Royale,,,,,,
Diamonds Are Forever,1971.0,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,,,,,,
Dr. No,1962.0,Sean Connery,Terence Young,448.8,7.0,0.6
For Your Eyes Only,,,,,,
From Russia with Love,1963.0,Sean Connery,Terence Young,543.8,12.6,1.6
GoldenEye,,,,,,
Goldfinger,1964.0,Sean Connery,Guy Hamilton,820.4,18.6,3.2
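
The relationship between `where` and an ordinary Boolean filter can be checked directly: dropping the all-`NaN` rows from the `where` result leaves the same rows the filter selects. A sketch on a three-row stand-in (`bond_small`) built from values in the dataset:

```python
import pandas as pd

bond_small = pd.DataFrame(
    {
        "Year": [1985, 1971, 1962],
        "Actor": ["Roger Moore", "Sean Connery", "Sean Connery"],
    },
    index=pd.Index(
        ["A View to a Kill", "Diamonds Are Forever", "Dr. No"],
        name="Film",
    ),
)

actor_is_sean_connery = bond_small["Actor"] == "Sean Connery"

# where keeps every row; rows that fail the condition are filled with NaN
masked = bond_small.where(actor_is_sean_connery)

# Dropping the all-NaN rows leaves the same rows a Boolean filter selects
filtered = bond_small[actor_is_sean_connery]
same_rows = list(masked.dropna(how="all").index) == list(filtered.index)
```

The `NaN` values also explain why `Year` appears as `1971.0` in the output above: an integer column that gains missing values is converted to floats.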


## 10. The `apply` Method with DataFrames

- The `apply` method invokes a function on every column or every row in the DataFrame.
- Pass the uninvoked function as the first argument to the `apply` method.
- Pass the `axis` parameter an argument of `"columns"` to invoke the function on every row.
- Pandas will pass in the row's values as a Series object. We can use accessors like `loc` and `iloc` to extract the column's values for that row.


In [31]:
# load dataset

bond = pd.read_csv("jamesbond.csv", index_col = "Film").sort_index()
bond.head()

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9


In [32]:
# apply method takes a function as an argument
# when invoked on a Series apply method simply calls function on every value
bond["Actor"].apply(len)

Film
A View to a Kill                   11
Casino Royale                      12
Casino Royale                      11
Diamonds Are Forever               12
Die Another Day                    14
Dr. No                             12
For Your Eyes Only                 11
From Russia with Love              12
GoldenEye                          14
Goldfinger                         12
Licence to Kill                    14
Live and Let Die                   11
Moonraker                          11
Never Say Never Again              12
Octopussy                          11
On Her Majesty's Secret Service    14
Quantum of Solace                  12
Skyfall                            12
Spectre                            12
The Living Daylights               14
The Man with the Golden Gun        11
The Spy Who Loved Me               11
The World Is Not Enough            14
Thunderball                        12
Tomorrow Never Dies                14
You Only Live Twice                12
Name: A

In [33]:
# when invoked on a DataFrame, apply can call the function on every row or every column

bond.apply(print,axis=1) # print returns None, so the resulting Series holds None for every row

Year                        1985
Actor                Roger Moore
Director               John Glen
Box Office                 275.2
Budget                      54.5
Bond Actor Salary            9.1
Name: A View to a Kill, dtype: object
Year                            2006
Actor                   Daniel Craig
Director             Martin Campbell
Box Office                     581.5
Budget                         145.3
Bond Actor Salary                3.3
Name: Casino Royale, dtype: object
Year                        1967
Actor                David Niven
Director              Ken Hughes
Box Office                 315.0
Budget                      85.0
Bond Actor Salary            NaN
Name: Casino Royale, dtype: object
Year                         1971
Actor                Sean Connery
Director             Guy Hamilton
Box Office                  442.5
Budget                       34.7
Bond Actor Salary             5.8
Name: Diamonds Are Forever, dtype: object
Year                        

Film
A View to a Kill                   None
Casino Royale                      None
Casino Royale                      None
Diamonds Are Forever               None
Die Another Day                    None
Dr. No                             None
For Your Eyes Only                 None
From Russia with Love              None
GoldenEye                          None
Goldfinger                         None
Licence to Kill                    None
Live and Let Die                   None
Moonraker                          None
Never Say Never Again              None
Octopussy                          None
On Her Majesty's Secret Service    None
Quantum of Solace                  None
Skyfall                            None
Spectre                            None
The Living Daylights               None
The Man with the Golden Gun        None
The Spy Who Loved Me               None
The World Is Not Enough            None
Thunderball                        None
Tomorrow Never Dies                
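
The last bullet in this section, using `loc` inside the function to read a row's column values, can be sketched with a custom function in place of `print`. The helper `describe_return` and its threshold of five times the budget are hypothetical, chosen only for illustration; `bond_small` is a three-row stand-in built from values in the dataset.

```python
import pandas as pd

bond_small = pd.DataFrame(
    {
        "Actor": ["Roger Moore", "Daniel Craig", "Sean Connery"],
        "Box Office": [275.2, 581.5, 442.5],
        "Budget": [54.5, 145.3, 34.7],
    },
    index=pd.Index(
        ["A View to a Kill", "Casino Royale", "Diamonds Are Forever"],
        name="Film",
    ),
)


def describe_return(row):
    """Hypothetical helper: label one film by box office relative to budget."""
    # row is a Series whose index holds the column names
    if row.loc["Box Office"] >= 5 * row.loc["Budget"]:
        return "high return"
    return "modest return"


# axis="columns" hands the function one row at a time
labels = bond_small.apply(describe_return, axis="columns")
```

The result is a Series with one label per film, indexed by `Film` like the original DataFrame.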