# DataFrames III: Data Extraction

In [1]:
import pandas as pd

## This Module's Dataset
- This module's dataset is a collection of all James Bond movies.

In [2]:
# NOTE: The budgets are in millions
jb = pd.read_csv("data_files/jamesbond.csv")

## The set_index and reset_index Methods
- The index serves as the collection of primary identifiers/labels/entry points for the rows.
- Extracting rows by index position or label is fast, especially when the index is sorted.
- Pandas uses index labels to line up rows when joining or combining different objects.
- The `set_index` method sets an existing column as the index of the **DataFrame**.
- The `reset_index` method replaces the current index with the standard ascending numeric index (0, 1, 2, ...) and, by default, moves the old index back into a regular column.

It's important to first take a look at what an index
is and how it's used. An index should be thought
of as the primary identifier used to retrieve rows.

So what is the core identifier? The film's name.
Everything else revolves around that.

In [3]:
jb.head(3)

|  | Film | Year | Actor | Director | Box Office | Budget | Bond Actor Salary |
|---|---|---|---|---|---|---|---|
| 0 | Dr. No | 1962 | Sean Connery | Terence Young | 448.8 | 7.0 | 0.6 |
| 1 | From Russia with Love | 1963 | Sean Connery | Terence Young | 543.8 | 12.6 | 1.6 |
| 2 | Goldfinger | 1964 | Sean Connery | Guy Hamilton | 820.4 | 18.6 | 3.2 |


In [4]:
jb.set_index("Film", inplace=True)

jb.head(3)

| Film | Year | Actor | Director | Box Office | Budget | Bond Actor Salary |
|---|---|---|---|---|---|---|
| Dr. No | 1962 | Sean Connery | Terence Young | 448.8 | 7.0 | 0.6 |
| From Russia with Love | 1963 | Sean Connery | Terence Young | 543.8 | 12.6 | 1.6 |
| Goldfinger | 1964 | Sean Connery | Guy Hamilton | 820.4 | 18.6 | 3.2 |


In [5]:
jb.reset_index(inplace=True)

jb.head(3)

|  | Film | Year | Actor | Director | Box Office | Budget | Bond Actor Salary |
|---|---|---|---|---|---|---|---|
| 0 | Dr. No | 1962 | Sean Connery | Terence Young | 448.8 | 7.0 | 0.6 |
| 1 | From Russia with Love | 1963 | Sean Connery | Terence Young | 543.8 | 12.6 | 1.6 |
| 2 | Goldfinger | 1964 | Sean Connery | Guy Hamilton | 820.4 | 18.6 | 3.2 |
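The cells above modify `jb` in place with `inplace=True`. The same round trip can also be written by reassigning the returned copies, which leaves the original untouched. This is a minimal sketch on a hand-built three-row subset of the data shown above (the CSV is not loaded, and the names `films`, `by_film`, `restored`, and `numbered` are illustrative). It also shows `drop=True`, which throws the old index away instead of turning it back into a column.

```python
import pandas as pd

# Hand-built subset of the James Bond data (illustrative, not read from the CSV)
films = pd.DataFrame({
    "Film": ["Dr. No", "From Russia with Love", "Goldfinger"],
    "Year": [1962, 1963, 1964],
    "Actor": ["Sean Connery", "Sean Connery", "Sean Connery"],
})

# set_index returns a NEW DataFrame; films itself keeps its numeric index
by_film = films.set_index("Film")
print(by_film.index.tolist())    # ['Dr. No', 'From Russia with Love', 'Goldfinger']
print(by_film.columns.tolist())  # ['Year', 'Actor']

# reset_index moves the "Film" labels back into a regular column
restored = by_film.reset_index()
print(restored.columns.tolist())  # ['Film', 'Year', 'Actor']

# drop=True discards the old index instead of turning it into a column
numbered = by_film.reset_index(drop=True)
print(numbered.columns.tolist())  # ['Year', 'Actor']
print(numbered.index.tolist())    # [0, 1, 2]
```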


---

## Retrieve Rows by Index Position with iloc Accessor
- The `iloc` accessor retrieves one or more rows by index position.
- Provide a pair of square brackets after the accessor.
- `iloc` accepts single values, lists, and slices.

In [6]:
# Series
jb.iloc[0]

Film                        Dr. No
Year                          1962
Actor                 Sean Connery
Director             Terence Young
Box Office                   448.8
Budget                         7.0
Bond Actor Salary              0.6
Name: 0, dtype: object

In [7]:
# str, int, float, etc.
jb.iloc[0]["Film"]

'Dr. No'

In [8]:
# DataFrame
jb.iloc[ [0, 1, 2] ]

|  | Film | Year | Actor | Director | Box Office | Budget | Bond Actor Salary |
|---|---|---|---|---|---|---|---|
| 0 | Dr. No | 1962 | Sean Connery | Terence Young | 448.8 | 7.0 | 0.6 |
| 1 | From Russia with Love | 1963 | Sean Connery | Terence Young | 543.8 | 12.6 | 1.6 |
| 2 | Goldfinger | 1964 | Sean Connery | Guy Hamilton | 820.4 | 18.6 | 3.2 |


In [9]:
# Series
jb.iloc[ [0, 1, 2] ]["Film"]

0                   Dr. No
1    From Russia with Love
2               Goldfinger
Name: Film, dtype: object

In [10]:
jb.iloc[24:]

|  | Film | Year | Actor | Director | Box Office | Budget | Bond Actor Salary |
|---|---|---|---|---|---|---|---|
| 24 | Skyfall | 2012 | Daniel Craig | Sam Mendes | 943.5 | 170.2 | 14.5 |
| 25 | Spectre | 2015 | Daniel Craig | Sam Mendes | 726.7 | 206.3 | 30.0 |
| 26 | No Time to Die | 2021 | Daniel Craig | Cary Joji Fukunaga | 774.2 | 301.0 | 25.0 |


In [11]:
jb.iloc[:3]

|  | Film | Year | Actor | Director | Box Office | Budget | Bond Actor Salary |
|---|---|---|---|---|---|---|---|
| 0 | Dr. No | 1962 | Sean Connery | Terence Young | 448.8 | 7.0 | 0.6 |
| 1 | From Russia with Love | 1963 | Sean Connery | Terence Young | 543.8 | 12.6 | 1.6 |
| 2 | Goldfinger | 1964 | Sean Connery | Guy Hamilton | 820.4 | 18.6 | 3.2 |
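`iloc` follows Python's list-slicing rules, so negative positions count from the end and a slice stops *before* its end position. A small sketch, assuming a hand-built four-row subset of the films above (the `films` and `middle` names are illustrative):

```python
import pandas as pd

# Hand-built subset (illustrative, not read from the CSV)
films = pd.DataFrame({
    "Film": ["Dr. No", "From Russia with Love", "Goldfinger", "Thunderball"],
    "Year": [1962, 1963, 1964, 1965],
})

# Negative positions count from the end, like Python lists
last_row = films.iloc[-1]          # Series for the last row
print(last_row["Film"])            # Thunderball

# Positional slices exclude the stop position: rows 1 and 2 only
middle = films.iloc[1:3]
print(middle["Film"].tolist())     # ['From Russia with Love', 'Goldfinger']

# A step also works: every other row
print(films.iloc[::2]["Film"].tolist())  # ['Dr. No', 'Goldfinger']
```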


---

## Retrieve Rows by Index Label with loc Accessor
- The `loc` accessor retrieves one or more rows by index label.
- Provide a pair of square brackets after the accessor.

In [12]:
jb2 = jb.set_index("Film")

jb2.loc[ "GoldenEye" ]

Year                            1995
Actor                 Pierce Brosnan
Director             Martin Campbell
Box Office                     518.5
Budget                          76.9
Bond Actor Salary                5.1
Name: GoldenEye, dtype: object

In [13]:
# jb has no Film index, so filter with a Boolean Series instead (returns a DataFrame)
jb.loc[ jb["Film"] == "GoldenEye" ]

|  | Film | Year | Actor | Director | Box Office | Budget | Bond Actor Salary |
|---|---|---|---|---|---|---|---|
| 18 | GoldenEye | 1995 | Pierce Brosnan | Martin Campbell | 518.5 | 76.9 | 5.1 |


In [14]:
# "Casino Royale" is a duplicated label, so both matching rows come back as a DataFrame
jb2.loc[ "Casino Royale" ]

| Film | Year | Actor | Director | Box Office | Budget | Bond Actor Salary |
|---|---|---|---|---|---|---|
| Casino Royale | 1967 | David Niven | Ken Hughes | 315.0 | 85.0 | NaN |
| Casino Royale | 2006 | Daniel Craig | Martin Campbell | 581.5 | 145.3 | 3.3 |


In [15]:
jb2.loc[ ["Dr. No", "Skyfall", "GoldenEye"] ]

| Film | Year | Actor | Director | Box Office | Budget | Bond Actor Salary |
|---|---|---|---|---|---|---|
| Dr. No | 1962 | Sean Connery | Terence Young | 448.8 | 7.0 | 0.6 |
| Skyfall | 2012 | Daniel Craig | Sam Mendes | 943.5 | 170.2 | 14.5 |
| GoldenEye | 1995 | Pierce Brosnan | Martin Campbell | 518.5 | 76.9 | 5.1 |


In [16]:
# Label slices include BOTH endpoints (unlike position slices with iloc)
jb2.loc["Diamonds Are Forever":"Moonraker"]

| Film | Year | Actor | Director | Box Office | Budget | Bond Actor Salary |
|---|---|---|---|---|---|---|
| Diamonds Are Forever | 1971 | Sean Connery | Guy Hamilton | 442.5 | 34.7 | 5.8 |
| Live and Let Die | 1973 | Roger Moore | Guy Hamilton | 460.3 | 30.8 | NaN |
| The Man with the Golden Gun | 1974 | Roger Moore | Guy Hamilton | 334.0 | 27.7 | NaN |
| The Spy Who Loved Me | 1977 | Roger Moore | Lewis Gilbert | 533.0 | 45.1 | NaN |
| Moonraker | 1979 | Roger Moore | Lewis Gilbert | 535.0 | 91.5 | NaN |
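Two `loc` behaviors from the cells above are easy to miss: a duplicated label (like `"Casino Royale"`) returns a **DataFrame** instead of a **Series**, and a label slice includes its end label, while an `iloc` slice excludes its end position. The following sketch checks both on a hand-built subset (the `films` name and the rows chosen are illustrative):

```python
import pandas as pd

# Hand-built subset; note that "Casino Royale" appears twice in the index
films = pd.DataFrame(
    {
        "Year": [1962, 1964, 1967, 2006],
        "Actor": ["Sean Connery", "Sean Connery", "David Niven", "Daniel Craig"],
    },
    index=["Dr. No", "Goldfinger", "Casino Royale", "Casino Royale"],
)

# A unique label gives back a Series ...
print(type(films.loc["Goldfinger"]).__name__)       # Series
# ... but a duplicated label gives back a DataFrame with every match
print(type(films.loc["Casino Royale"]).__name__)    # DataFrame
print(films.loc["Casino Royale"]["Year"].tolist())  # [1967, 2006]

# Label slices INCLUDE the end label; position slices EXCLUDE the end position
print(films.loc["Dr. No":"Goldfinger"].index.tolist())  # ['Dr. No', 'Goldfinger']
print(films.iloc[0:1].index.tolist())                   # ['Dr. No']
```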


---

## Second Arguments to loc and iloc Accessors
- The second value inside the square brackets targets the columns.
- `iloc` requires numeric positions for both rows and columns.
- `loc` requires labels for both rows and columns.

In [17]:
jb2.loc[ "Moonraker", "Actor" ]  # 'Roger Moore'
jb2.loc[ "Moonraker", "Box Office" ]  # 535.0
jb2.loc[ "Moonraker", "Year" ]  # 1979

jb2.loc[ "Moonraker", ["Actor", "Box Office", "Year"] ]

Actor         Roger Moore
Box Office          535.0
Year                 1979
Name: Moonraker, dtype: object

In [18]:
jb2.iloc[ 11, 1 ]  # 'Roger Moore'
jb2.iloc[ 11, 3 ]  # 535.0
jb2.iloc[ 11, 0 ]  # 1979

jb2.iloc[ 11, [1, 3, 0]]

Actor         Roger Moore
Box Office          535.0
Year                 1979
Name: Moonraker, dtype: object
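The second slot accepts slices too, and for a single cell pandas also offers the `at` and `iat` accessors, the scalar-only counterparts of `loc` and `iloc`. A minimal sketch on a two-row hand-built subset (column positions here refer to this small frame, not to `jb2`):

```python
import pandas as pd

# Hand-built subset of jb2 (illustrative, not read from the CSV)
films = pd.DataFrame(
    {
        "Year": [1977, 1979],
        "Actor": ["Roger Moore", "Roger Moore"],
        "Director": ["Lewis Gilbert", "Lewis Gilbert"],
        "Box Office": [533.0, 535.0],
    },
    index=["The Spy Who Loved Me", "Moonraker"],
)

# loc: a label slice of columns, end label included (Year, Actor, Director)
print(films.loc["Moonraker", "Year":"Director"])

# iloc: a position slice of columns, end position excluded (Year, Actor)
print(films.iloc[1, 0:2])

# at / iat: access to exactly one cell
print(films.at["Moonraker", "Box Office"])  # 535.0
print(films.iat[1, 3])                      # 535.0
```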

---

## Overwrite Value in a DataFrame
- Use the `iloc` or `loc` accessor on the **DataFrame** to target a value, then assign a new value with the equals sign.

In [19]:
jb2.loc[ "GoldenEye", ["Year", "Actor", "Box Office"] ]

Year                    1995
Actor         Pierce Brosnan
Box Office             518.5
Name: GoldenEye, dtype: object

In [20]:
jb2.loc[ "GoldenEye", "Year" ] = 2000
jb2.loc[ "GoldenEye", "Actor" ] = "Marilyn Monroe"
jb2.loc[ "GoldenEye", "Box Office" ] = 666

In [21]:
jb2.loc[ "GoldenEye", ["Year", "Actor", "Box Office"] ]

Year                    2000
Actor         Marilyn Monroe
Box Office             666.0
Name: GoldenEye, dtype: object
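A single `loc` assignment can also overwrite several cells in one row by passing a list of columns and a list of values of the same length. Assigning to a column label that does not exist yet adds that column (pandas calls this "setting with enlargement"). A sketch, assuming a one-row hand-built frame (the `films` name is illustrative):

```python
import pandas as pd

# One-row hand-built frame (illustrative, not read from the CSV)
films = pd.DataFrame(
    {"Year": [1995], "Actor": ["Pierce Brosnan"], "Box Office": [518.5]},
    index=["GoldenEye"],
)

# Overwrite two cells of the same row at once: columns and values pair up in order
films.loc["GoldenEye", ["Year", "Box Office"]] = [2000, 666]

print(films.loc["GoldenEye", "Year"])        # 2000
print(films.loc["GoldenEye", "Box Office"])  # 666.0

# A column label that doesn't exist yet creates a new column
films.loc["GoldenEye", "Director"] = "Martin Campbell"
print(films.columns.tolist())  # ['Year', 'Actor', 'Box Office', 'Director']
```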

---

## Overwrite Multiple Values in a DataFrame
- The `replace` method replaces all occurrences of a **Series** value with another value (think of it like "Find and Replace").
- To overwrite multiple values in a **DataFrame**, remember to use an accessor on the **DataFrame** itself.
- The `loc` accessor can accept a Boolean Series (`iloc` needs a plain Boolean array instead). Use it to target the values to overwrite.

In [22]:
jb2.head(3)

| Film | Year | Actor | Director | Box Office | Budget | Bond Actor Salary |
|---|---|---|---|---|---|---|
| Dr. No | 1962 | Sean Connery | Terence Young | 448.8 | 7.0 | 0.6 |
| From Russia with Love | 1963 | Sean Connery | Terence Young | 543.8 | 12.6 | 1.6 |
| Goldfinger | 1964 | Sean Connery | Guy Hamilton | 820.4 | 18.6 | 3.2 |


In [23]:
#                        The value being replaced,   what you're replacing it with.
jb2['Actor'] = jb2['Actor'].replace('Sean Connery', 'Mr. Sean Connery')

jb2.head(3)

| Film | Year | Actor | Director | Box Office | Budget | Bond Actor Salary |
|---|---|---|---|---|---|---|
| Dr. No | 1962 | Mr. Sean Connery | Terence Young | 448.8 | 7.0 | 0.6 |
| From Russia with Love | 1963 | Mr. Sean Connery | Terence Young | 543.8 | 12.6 | 1.6 |
| Goldfinger | 1964 | Mr. Sean Connery | Guy Hamilton | 820.4 | 18.6 | 3.2 |


In [24]:
jb_Connery = jb2.loc[ jb2['Actor'] == "Mr. Sean Connery" ].copy()  # Find all the rows with Sean Connery

# NOTE: .copy() makes jb_Connery independent of jb2, which avoids a SettingWithCopyWarning below
# NOTE: the : selects every row of jb_Connery
jb_Connery.loc[:, 'Actor' ] = "Sir Sean Connery"

jb_Connery.head(3)

| Film | Year | Actor | Director | Box Office | Budget | Bond Actor Salary |
|---|---|---|---|---|---|---|
| Dr. No | 1962 | Sir Sean Connery | Terence Young | 448.8 | 7.0 | 0.6 |
| From Russia with Love | 1963 | Sir Sean Connery | Terence Young | 543.8 | 12.6 | 1.6 |
| Goldfinger | 1964 | Sir Sean Connery | Guy Hamilton | 820.4 | 18.6 | 3.2 |
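The cell above changes a separate copy of the Connery rows, so `jb2` itself still says "Mr. Sean Connery". To overwrite the matching values in the original **DataFrame**, pass the Boolean Series and the column label to a single `loc` call on that **DataFrame**. A minimal sketch on a hand-built subset (the `films` and `is_connery` names are illustrative):

```python
import pandas as pd

# Hand-built subset (illustrative, not read from the CSV)
films = pd.DataFrame(
    {"Actor": ["Sean Connery", "Sean Connery", "Roger Moore"]},
    index=["Dr. No", "Goldfinger", "Moonraker"],
)

# Boolean Series: True for every row we want to change
is_connery = films["Actor"] == "Sean Connery"

# Rows from the mask and the column by label, in ONE loc call on the original
# DataFrame, so the change lands in films itself (no intermediate copy)
films.loc[is_connery, "Actor"] = "Sir Sean Connery"

print(films["Actor"].tolist())
# ['Sir Sean Connery', 'Sir Sean Connery', 'Roger Moore']
```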


---

## Rename Index Labels or Columns in a DataFrame
- The `rename` method accepts a dictionary for either its `columns` or `index` parameters.
- The dictionary keys represent the existing names and the values represent the new names.
- We can replace all column names at once by overwriting the **DataFrame's** `columns` attribute.

In [25]:
jb.columns[[0,-1]]

Index(['Film', 'Bond Actor Salary'], dtype='object')

In [26]:
jb.rename(columns={'Film': 'Movie', 'Bond Actor Salary': 'Actor Salary'}, inplace=True)

jb.columns[[0,-1]]

Index(['Movie', 'Actor Salary'], dtype='object')

In [27]:
jb2.iloc[[0, 18, 20]]

| Film | Year | Actor | Director | Box Office | Budget | Bond Actor Salary |
|---|---|---|---|---|---|---|
| Dr. No | 1962 | Mr. Sean Connery | Terence Young | 448.8 | 7.0 | 0.6 |
| GoldenEye | 2000 | Marilyn Monroe | Martin Campbell | 666.0 | 76.9 | 5.1 |
| The World Is Not Enough | 1999 | Pierce Brosnan | Michael Apted | 439.5 | 158.3 | 13.5 |


In [28]:
swaps = {
    'Dr. No': 'Dr No',
    'GoldenEye': 'Golden Eye',
    'The World Is Not Enough': 'Best Bond Movie'
}
jb2.rename(index=swaps, inplace=True)

jb2.iloc[[0, 18, 20]]

| Film | Year | Actor | Director | Box Office | Budget | Bond Actor Salary |
|---|---|---|---|---|---|---|
| Dr No | 1962 | Mr. Sean Connery | Terence Young | 448.8 | 7.0 | 0.6 |
| Golden Eye | 2000 | Marilyn Monroe | Martin Campbell | 666.0 | 76.9 | 5.1 |
| Best Bond Movie | 1999 | Pierce Brosnan | Michael Apted | 439.5 | 158.3 | 13.5 |


In [29]:
jb.columns

Index(['Movie', 'Year', 'Actor', 'Director', 'Box Office', 'Budget',
       'Actor Salary'],
      dtype='object')

In [30]:
# NOTE: This is only really useful if you want to rename all or most columns,
# because the new list MUST have the same length as the number of columns.

jb.columns = ['Movie Name', 'Year', 'Bond Guy', 'Camera Dude', 'Revenue', 'Cost', 'Bond Salary']

jb.columns

Index(['Movie Name', 'Year', 'Bond Guy', 'Camera Dude', 'Revenue', 'Cost',
       'Bond Salary'],
      dtype='object')
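Besides a dictionary, `rename` also accepts a function that is applied to every label, which is handy for bulk clean-ups; dictionary keys that don't match any label are ignored by default. A sketch, assuming a one-row hand-built frame (the snake_case convention and the `snake`/`partial` names are just examples):

```python
import pandas as pd

# One-row hand-built frame (illustrative, not read from the CSV)
films = pd.DataFrame({"Box Office": [448.8], "Bond Actor Salary": [0.6]}, index=["Dr. No"])

# A function is called on every column label
snake = films.rename(columns=lambda name: name.lower().replace(" ", "_"))
print(snake.columns.tolist())  # ['box_office', 'bond_actor_salary']

# Keys that match no label ("Budget" here) are silently skipped by default
partial = films.rename(columns={"Budget": "Cost", "Box Office": "Revenue"})
print(partial.columns.tolist())  # ['Revenue', 'Bond Actor Salary']
```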

---

## Delete Rows or Columns from a DataFrame
- The `drop` method deletes one or more rows/columns from a **DataFrame** and returns a new **DataFrame** (unless `inplace=True`).
- Pass the `index` or `columns` parameter a list of the row labels or column names to remove.
- The `pop` method removes and returns a single column as a **Series** (it mutates the **DataFrame** in the process).
- Python's `del` keyword also removes a single column in place, without returning it.

In [31]:
jb2.drop(columns=["Box Office", "Budget"]).head(3)

| Film | Year | Actor | Director | Bond Actor Salary |
|---|---|---|---|---|
| Dr No | 1962 | Mr. Sean Connery | Terence Young | 0.6 |
| From Russia with Love | 1963 | Mr. Sean Connery | Terence Young | 1.6 |
| Goldfinger | 1964 | Mr. Sean Connery | Guy Hamilton | 3.2 |


In [32]:
jb2.drop(index=['Goldfinger']).head(3)

| Film | Year | Actor | Director | Box Office | Budget | Bond Actor Salary |
|---|---|---|---|---|---|---|
| Dr No | 1962 | Mr. Sean Connery | Terence Young | 448.8 | 7.0 | 0.6 |
| From Russia with Love | 1963 | Mr. Sean Connery | Terence Young | 543.8 | 12.6 | 1.6 |
| Thunderball | 1965 | Mr. Sean Connery | Terence Young | 848.1 | 41.9 | 4.7 |


In [33]:
jb2.drop(index=['Goldfinger'], columns=['Box Office', 'Budget']).head(3)

| Film | Year | Actor | Director | Bond Actor Salary |
|---|---|---|---|---|
| Dr No | 1962 | Mr. Sean Connery | Terence Young | 0.6 |
| From Russia with Love | 1963 | Mr. Sean Connery | Terence Young | 1.6 |
| Thunderball | 1965 | Mr. Sean Connery | Terence Young | 4.7 |


In [34]:
# pop removes a single column
# NOTE: It returns the deleted column as a Series and removes it from
#       the df directly, so there is no need to reassign or pass inplace=True.
jb2.pop('Actor')

Film
Dr No                              Mr. Sean Connery
From Russia with Love              Mr. Sean Connery
Goldfinger                         Mr. Sean Connery
Thunderball                        Mr. Sean Connery
Casino Royale                           David Niven
You Only Live Twice                Mr. Sean Connery
On Her Majesty's Secret Service      George Lazenby
Diamonds Are Forever               Mr. Sean Connery
Live and Let Die                        Roger Moore
The Man with the Golden Gun             Roger Moore
The Spy Who Loved Me                    Roger Moore
Moonraker                               Roger Moore
For Your Eyes Only                      Roger Moore
Never Say Never Again              Mr. Sean Connery
Octopussy                               Roger Moore
A View to a Kill                        Roger Moore
The Living Daylights                 Timothy Dalton
Licence to Kill                      Timothy Dalton
Golden Eye                           Marilyn Monroe
Tomorro

In [35]:
# del is a Python keyword; it also removes the column in place, but returns nothing
del jb2['Year']

jb2.head(3)

| Film | Director | Box Office | Budget | Bond Actor Salary |
|---|---|---|---|---|
| Dr No | Terence Young | 448.8 | 7.0 | 0.6 |
| From Russia with Love | Terence Young | 543.8 | 12.6 | 1.6 |
| Goldfinger | Guy Hamilton | 820.4 | 18.6 | 3.2 |
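To recap the differences: `drop` returns a new **DataFrame** and raises a `KeyError` for unknown labels unless you pass `errors="ignore"`, while `pop` changes the **DataFrame** itself and hands back the removed column. A sketch on a hand-built two-row frame (the `slim`, `safe`, and `budget` names are illustrative):

```python
import pandas as pd

# Hand-built frame (illustrative, not read from the CSV)
films = pd.DataFrame(
    {"Year": [1962, 1963], "Budget": [7.0, 12.6], "Box Office": [448.8, 543.8]},
    index=["Dr. No", "From Russia with Love"],
)

# drop returns a new DataFrame; films itself is untouched
slim = films.drop(columns=["Budget"])
print(slim.columns.tolist())   # ['Year', 'Box Office']
print(films.columns.tolist())  # ['Year', 'Budget', 'Box Office']

# A label that doesn't exist raises KeyError unless errors="ignore"
safe = films.drop(columns=["Salary"], errors="ignore")
print(safe.shape)  # (2, 3)

# pop mutates films and hands back the removed column as a Series
budget = films.pop("Budget")
print(budget.tolist())         # [7.0, 12.6]
print(films.columns.tolist())  # ['Year', 'Box Office']
```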


---

## Create Random Sample with the sample Method
- The `sample` method returns one or more randomly selected rows from the **DataFrame** (one row by default; set `n` for more).
- Customize the `axis` parameter to extract random columns.

In [36]:
# This will give you a random row
jb.sample()

|  | Movie Name | Year | Bond Guy | Camera Dude | Revenue | Cost | Bond Salary |
|---|---|---|---|---|---|---|---|
| 11 | Moonraker | 1979 | Roger Moore | Lewis Gilbert | 535.0 | 91.5 | NaN |


In [37]:
# NOTE: The rows will not be in order
# It will literally be 3 random rows
jb.sample(n=3)

|  | Movie Name | Year | Bond Guy | Camera Dude | Revenue | Cost | Bond Salary |
|---|---|---|---|---|---|---|---|
| 0 | Dr. No | 1962 | Sean Connery | Terence Young | 448.8 | 7.0 | 0.6 |
| 15 | A View to a Kill | 1985 | Roger Moore | John Glen | 275.2 | 54.5 | 9.1 |
| 14 | Octopussy | 1983 | Roger Moore | John Glen | 373.8 | 53.9 | 7.8 |


In [38]:
jb.sample(axis="columns").head()

|  | Cost |
|---|---|
| 0 | 7.0 |
| 1 | 12.6 |
| 2 | 18.6 |
| 3 | 41.9 |
| 4 | 85.0 |


In [39]:
jb.sample(n=3, axis="columns").head()

|  | Bond Guy | Camera Dude | Movie Name |
|---|---|---|---|
| 0 | Sean Connery | Terence Young | Dr. No |
| 1 | Sean Connery | Terence Young | From Russia with Love |
| 2 | Sean Connery | Guy Hamilton | Goldfinger |
| 3 | Sean Connery | Terence Young | Thunderball |
| 4 | David Niven | Ken Hughes | Casino Royale |
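Because `sample` is random, results change on every run. Passing `random_state` makes the selection repeatable, `frac` asks for a fraction of the rows instead of a count, and `replace=True` allows the same row to be drawn more than once. A minimal sketch on a four-row hand-built frame (the seed values are arbitrary):

```python
import pandas as pd

# Hand-built frame (illustrative, not read from the CSV)
films = pd.DataFrame({"Film": ["Dr. No", "Goldfinger", "Thunderball", "Moonraker"]})

# The same random_state always produces the same "random" rows,
# which keeps notebook results reproducible between runs
first = films.sample(n=2, random_state=42)
second = films.sample(n=2, random_state=42)
print(first.equals(second))  # True

# frac asks for a fraction of the rows instead of a fixed count
half = films.sample(frac=0.5, random_state=0)
print(len(half))  # 2

# More rows than exist would raise ValueError, unless replace=True
with_repeats = films.sample(n=6, replace=True, random_state=1)
print(len(with_repeats))  # 6
```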


---

## The nsmallest and nlargest Methods
- The `nlargest` method returns a specified number of rows with the largest values from a given column.
- The `nsmallest` method returns a specified number of rows with the smallest values from a given column.
- The `nlargest` and `nsmallest` methods are typically faster than sorting the entire **DataFrame** when you only need the top or bottom few rows.

In [40]:
# Return the 3 films with the highest box office

# Soy boy way
jb.sort_values(by='Revenue', ascending=False)[:3]

|  | Movie Name | Year | Bond Guy | Camera Dude | Revenue | Cost | Bond Salary |
|---|---|---|---|---|---|---|---|
| 24 | Skyfall | 2012 | Daniel Craig | Sam Mendes | 943.5 | 170.2 | 14.5 |
| 3 | Thunderball | 1965 | Sean Connery | Terence Young | 848.1 | 41.9 | 4.7 |
| 2 | Goldfinger | 1964 | Sean Connery | Guy Hamilton | 820.4 | 18.6 | 3.2 |


In [41]:
# The giga chad way

jb.nlargest(n=3, columns='Revenue')

|  | Movie Name | Year | Bond Guy | Camera Dude | Revenue | Cost | Bond Salary |
|---|---|---|---|---|---|---|---|
| 24 | Skyfall | 2012 | Daniel Craig | Sam Mendes | 943.5 | 170.2 | 14.5 |
| 3 | Thunderball | 1965 | Sean Connery | Terence Young | 848.1 | 41.9 | 4.7 |
| 2 | Goldfinger | 1964 | Sean Connery | Guy Hamilton | 820.4 | 18.6 | 3.2 |


In [42]:
# Vice versa for the lowest

jb.nsmallest(n=3, columns='Revenue')

|  | Movie Name | Year | Bond Guy | Camera Dude | Revenue | Cost | Bond Salary |
|---|---|---|---|---|---|---|---|
| 17 | Licence to Kill | 1989 | Timothy Dalton | John Glen | 250.9 | 56.7 | 7.9 |
| 15 | A View to a Kill | 1985 | Roger Moore | John Glen | 275.2 | 54.5 | 9.1 |
| 6 | On Her Majesty's Secret Service | 1969 | George Lazenby | Peter R. Hunt | 291.5 | 37.3 | 0.6 |
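As a quick sanity check, `nlargest` should return the same rows in the same order as a full descending sort followed by `head`, at least when the column has no ties. A sketch using a handful of revenue figures copied from the outputs above (`nlargest` and `nsmallest` exist on a **Series** too):

```python
import pandas as pd

# Revenue figures copied from the outputs above (a small subset)
films = pd.DataFrame({
    "Film": ["Dr. No", "Goldfinger", "Thunderball", "Skyfall", "Licence to Kill"],
    "Revenue": [448.8, 820.4, 848.1, 943.5, 250.9],
})

top3 = films.nlargest(n=3, columns="Revenue")
# Same rows, same order as sorting everything and keeping the first 3
by_sort = films.sort_values(by="Revenue", ascending=False).head(3)
print(top3.equals(by_sort))   # True
print(top3["Film"].tolist())  # ['Skyfall', 'Thunderball', 'Goldfinger']

# The Series versions work the same way on a single column
print(films["Revenue"].nsmallest(2).tolist())  # [250.9, 448.8]
```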


---

## Filtering with the where Method
- Similar to square brackets or `loc`, the `where` method filters the original **DataFrame** with a Boolean Series, but it keeps the **DataFrame's** original shape.
- Pandas will populate rows that do **not** match the criteria with `NaN` values.
- Leaving in the `NaN` values can be advantageous for certain merge and visualization operations.

In [44]:
# Single condition
jb.where( jb['Bond Guy'] == 'Sean Connery' )

|  | Movie Name | Year | Bond Guy | Camera Dude | Revenue | Cost | Bond Salary |
|---|---|---|---|---|---|---|---|
| 0 | Dr. No | 1962.0 | Sean Connery | Terence Young | 448.8 | 7.0 | 0.6 |
| 1 | From Russia with Love | 1963.0 | Sean Connery | Terence Young | 543.8 | 12.6 | 1.6 |
| 2 | Goldfinger | 1964.0 | Sean Connery | Guy Hamilton | 820.4 | 18.6 | 3.2 |
| 3 | Thunderball | 1965.0 | Sean Connery | Terence Young | 848.1 | 41.9 | 4.7 |
| 4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 5 | You Only Live Twice | 1967.0 | Sean Connery | Lewis Gilbert | 514.2 | 59.9 | 4.4 |
| 6 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 7 | Diamonds Are Forever | 1971.0 | Sean Connery | Guy Hamilton | 442.5 | 34.7 | 5.8 |
| 8 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 9 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |


In [47]:
# Multiple conditions joined with OR (each condition needs its own parentheses)
jb.where( (jb['Camera Dude'] == 'Guy Hamilton') | (jb['Bond Guy'] == 'Sean Connery') )

|  | Movie Name | Year | Bond Guy | Camera Dude | Revenue | Cost | Bond Salary |
|---|---|---|---|---|---|---|---|
| 0 | Dr. No | 1962.0 | Sean Connery | Terence Young | 448.8 | 7.0 | 0.6 |
| 1 | From Russia with Love | 1963.0 | Sean Connery | Terence Young | 543.8 | 12.6 | 1.6 |
| 2 | Goldfinger | 1964.0 | Sean Connery | Guy Hamilton | 820.4 | 18.6 | 3.2 |
| 3 | Thunderball | 1965.0 | Sean Connery | Terence Young | 848.1 | 41.9 | 4.7 |
| 4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 5 | You Only Live Twice | 1967.0 | Sean Connery | Lewis Gilbert | 514.2 | 59.9 | 4.4 |
| 6 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 7 | Diamonds Are Forever | 1971.0 | Sean Connery | Guy Hamilton | 442.5 | 34.7 | 5.8 |
| 8 | Live and Let Die | 1973.0 | Roger Moore | Guy Hamilton | 460.3 | 30.8 | NaN |
| 9 | The Man with the Golden Gun | 1974.0 | Roger Moore | Guy Hamilton | 334.0 | 27.7 | NaN |


In [50]:
# Multiple conditions joined with AND
jb.where( (jb['Camera Dude'] == 'Guy Hamilton') & (jb['Bond Guy'] == 'Sean Connery') )

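|  | Movie Name | Year | Bond Guy | Camera Dude | Revenue | Cost | Bond Salary |
|---|---|---|---|---|---|---|---|
| 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | Goldfinger | 1964.0 | Sean Connery | Guy Hamilton | 820.4 | 18.6 | 3.2 |
| 3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 5 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 6 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 7 | Diamonds Are Forever | 1971.0 | Sean Connery | Guy Hamilton | 442.5 | 34.7 | 5.8 |
| 8 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 9 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |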
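`where` has a couple of relatives worth knowing: its `other` parameter fills non-matching cells with a value of your choice instead of `NaN`, and `mask` does the inverse, blanking out the rows where the condition is True. Dropping the all-`NaN` rows afterward gives the same rows as ordinary Boolean filtering. A sketch on a hand-built three-row frame (the `kept` and `zeroed` names are illustrative):

```python
import pandas as pd

# Hand-built frame (illustrative, not read from the CSV)
films = pd.DataFrame(
    {"Actor": ["Sean Connery", "David Niven", "Sean Connery"],
     "Revenue": [448.8, 315.0, 820.4]},
    index=["Dr. No", "Casino Royale", "Goldfinger"],
)
is_connery = films["Actor"] == "Sean Connery"

# where keeps the shape: the non-matching row becomes all NaN
kept = films.where(is_connery)
print(kept.shape)                              # (3, 2)
print(kept.loc["Casino Royale"].isna().all())  # True

# other= fills non-matching cells with a chosen value instead of NaN
zeroed = films["Revenue"].where(is_connery, other=0.0)
print(zeroed.tolist())  # [448.8, 0.0, 820.4]

# mask is the inverse: it blanks out the rows where the condition IS True
print(films["Revenue"].mask(is_connery).isna().tolist())  # [True, False, True]

# Dropping the all-NaN rows leaves the same rows as Boolean filtering
print(kept.dropna(how="all").index.tolist())  # ['Dr. No', 'Goldfinger']
```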


---

## The apply Method with DataFrames
- The `apply` method invokes a function on every column or every row in the **DataFrame**.
- Pass the uninvoked function as the first argument to the `apply` method.
- Pass the `axis` parameter an argument of `"columns"` to invoke the function on every row.
- Pandas will pass in the row's values as a **Series** object. We can use square brackets or accessors like `loc` and `iloc` to extract each column's value for that row.

In [64]:
jb.columns

Index(['Movie Name', 'Year', 'Bond Guy', 'Camera Dude', 'Revenue', 'Cost',
       'Bond Salary'],
      dtype='object')

In [60]:
jb2['Director'].apply(len).head()

Film
Dr No                    13
From Russia with Love    13
Goldfinger               12
Thunderball              13
Casino Royale            10
Name: Director, dtype: int64

In [82]:
# Custom functions

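"""
Conditions (checked in order, first match wins):
80s movie (1980-1989)  -> 'Great 80s flick'
Director = John Glen   -> 'Best Bond Ever'
Budget > 100 million   -> 'Expensive af'
Others                 -> 'No comment'
"""

def movie_comment(row):
    # row is a Series holding one movie's values; add a 'Comment' entry and return it
    if 1980 <= row['Year'] <= 1989: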
        row['Comment'] = 'Great 80s flick'

    elif row['Camera Dude'] == 'John Glen':
        row['Comment'] = 'Best Bond Ever'

    elif row['Cost'] > 100:
        row['Comment'] = 'Expensive af'

    else:
        row['Comment'] = 'No comment'

    return row

In [83]:
jb.apply(movie_comment, axis='columns')

|  | Movie Name | Year | Bond Guy | Camera Dude | Revenue | Cost | Bond Salary | Comment |
|---|---|---|---|---|---|---|---|---|
| 0 | Dr. No | 1962 | Sean Connery | Terence Young | 448.8 | 7.0 | 0.6 | No comment |
| 1 | From Russia with Love | 1963 | Sean Connery | Terence Young | 543.8 | 12.6 | 1.6 | No comment |
| 2 | Goldfinger | 1964 | Sean Connery | Guy Hamilton | 820.4 | 18.6 | 3.2 | No comment |
| 3 | Thunderball | 1965 | Sean Connery | Terence Young | 848.1 | 41.9 | 4.7 | No comment |
| 4 | Casino Royale | 1967 | David Niven | Ken Hughes | 315.0 | 85.0 | NaN | No comment |
| 5 | You Only Live Twice | 1967 | Sean Connery | Lewis Gilbert | 514.2 | 59.9 | 4.4 | No comment |
| 6 | On Her Majesty's Secret Service | 1969 | George Lazenby | Peter R. Hunt | 291.5 | 37.3 | 0.6 | No comment |
| 7 | Diamonds Are Forever | 1971 | Sean Connery | Guy Hamilton | 442.5 | 34.7 | 5.8 | No comment |
| 8 | Live and Let Die | 1973 | Roger Moore | Guy Hamilton | 460.3 | 30.8 | NaN | No comment |
| 9 | The Man with the Golden Gun | 1974 | Roger Moore | Guy Hamilton | 334.0 | 27.7 | NaN | No comment |
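Modifying and returning the whole row works, but a common alternative is to have the function return only the comment and assign the result to a new column. The sketch below does that on a hand-built subset using the renamed columns from above; `comment_for` is an illustrative name, and it uses an inclusive 1980-1989 range so that 1989's *Licence to Kill* counts as an 80s film.

```python
import pandas as pd

# Hand-built subset using the renamed columns (illustrative, not read from the CSV)
films = pd.DataFrame({
    "Movie Name": ["Dr. No", "Octopussy", "Licence to Kill", "Skyfall"],
    "Year": [1962, 1983, 1989, 2012],
    "Camera Dude": ["Terence Young", "John Glen", "John Glen", "Sam Mendes"],
    "Cost": [7.0, 53.9, 56.7, 170.2],
})

def comment_for(row):
    """Return just the comment string for one row (a Series); first match wins."""
    if 1980 <= row["Year"] <= 1989:
        return "Great 80s flick"
    if row["Camera Dude"] == "John Glen":
        return "Best Bond Ever"
    if row["Cost"] > 100:
        return "Expensive af"
    return "No comment"

# axis="columns" calls comment_for once per row; the results form a Series
films["Comment"] = films.apply(comment_for, axis="columns")
print(films["Comment"].tolist())
# ['No comment', 'Great 80s flick', 'Great 80s flick', 'Expensive af']
```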
