# DataFrames III: Data Extraction

In [1]:
import pandas as pd

## This Module's Dataset
- This module's dataset is a collection of all James Bond movies.

In [3]:
bond= pd.read_csv('jamesbond.csv')
bond.head()

Unnamed: 0,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
0,Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
1,From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
2,Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
3,Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
4,Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


## The set_index and reset_index Methods
- The index serves as the collection of primary identifiers/labels/entrypoints for the rows.
- The fastest way to extract a row is from a sorted index by position/label.
- Pandas uses index labels/values when merging different objects together.
- The `set_index` method sets an existing column as the index of the **DataFrame**.
- The `reset_index` method sets the standard ascending numeric index as the index of the **DataFrame**.
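
The point about pandas using index labels when combining objects can be illustrated with a minimal sketch. The two Series below are hypothetical helpers (their names are not part of the notebook; the figures are copied from the first rows of the dataset). Arithmetic between them pairs values by label, not by position.

```python
import pandas as pd

# Two small Series keyed by film title, deliberately in different orders.
box_office = pd.Series({"Dr. No": 448.8, "Goldfinger": 820.4, "Thunderball": 848.1})
budget = pd.Series({"Thunderball": 41.9, "Dr. No": 7.0, "Goldfinger": 18.6})

# Subtraction aligns on the index labels, so each film is matched with itself.
profit = box_office - budget
print(profit)
```

If the subtraction worked by position, "Dr. No" would be paired with the "Thunderball" budget; label alignment avoids that.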

In [None]:
# In general, it is easier to find a value in a collection sorted by its index than in an unsorted one.

# That is why, once people pick the right index column for a DataFrame, they usually sort the whole dataset by it.

In [5]:
bond= pd.read_csv('jamesbond.csv', index_col='Film') # one way to do it
bond.head()

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [13]:
# Another way to do it: the set_index method
bond= pd.read_csv('jamesbond.csv')

bond.set_index('Film') # this returns a new DataFrame; bond itself is unchanged

# To make the change permanent, there are two options:
#bond.set_index('Film', inplace= True)
bond= bond.set_index('Film')

In [17]:
bond.reset_index()
bond.reset_index(drop= True).head()

Unnamed: 0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
0,1962,Sean Connery,Terence Young,448.8,7.0,0.6
1,1963,Sean Connery,Terence Young,543.8,12.6,1.6
2,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
3,1965,Sean Connery,Terence Young,848.1,41.9,4.7
4,1967,David Niven,Ken Hughes,315.0,85.0,


In [20]:
#bond.set_index('Year') 
# Running the line above directly would discard the current index (Film), because set_index replaces the old index by default.

bond= bond.reset_index().set_index('Year')
bond.head()

Unnamed: 0_level_0,Film,Actor,Director,Box Office,Budget,Bond Actor Salary
Year,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
1962,Dr. No,Sean Connery,Terence Young,448.8,7.0,0.6
1963,From Russia with Love,Sean Connery,Terence Young,543.8,12.6,1.6
1964,Goldfinger,Sean Connery,Guy Hamilton,820.4,18.6,3.2
1965,Thunderball,Sean Connery,Terence Young,848.1,41.9,4.7
1967,Casino Royale,David Niven,Ken Hughes,315.0,85.0,
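
The difference between calling `set_index` directly and resetting the index first can be sketched on a tiny frame. This is an illustrative example under stated assumptions: `films`, `by_film`, `lost`, and `kept` are hypothetical names, and the two rows are copied from the dataset.

```python
import pandas as pd

films = pd.DataFrame({
    "Film": ["Dr. No", "Goldfinger"],
    "Year": [1962, 1964],
    "Actor": ["Sean Connery", "Sean Connery"],
})

by_film = films.set_index("Film")

# Calling set_index again discards the current index (Film) by default.
lost = by_film.set_index("Year")

# Resetting first moves Film back into a regular column before re-indexing.
kept = by_film.reset_index().set_index("Year")

print(list(lost.columns))  # Film is no longer available
print(list(kept.columns))  # Film is still a column
```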


## Retrieve Rows by Index Position with iloc Accessor
- The `iloc` accessor retrieves one or more rows by index position.
- Provide a pair of square brackets after the accessor.
- `iloc` accepts single values, lists, and slices.

- Pandas always keeps track of each row's integer position, even when a custom index means those positions are not displayed.
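
A minimal sketch of positional access on a frame whose index holds film titles. The frame `films` is a hypothetical stand-in built from three rows of the dataset.

```python
import pandas as pd

films = pd.DataFrame(
    {"Year": [1962, 1963, 1964]},
    index=["Dr. No", "From Russia with Love", "Goldfinger"],
)

# Position 1 is the second row, whatever its label happens to be.
second = films.iloc[1]

# Slices follow Python rules: the stop position is excluded.
first_two = films.iloc[0:2]
```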

In [26]:
bond= pd.read_csv('jamesbond.csv')
bond.head()

Unnamed: 0,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
0,Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
1,From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
2,Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
3,Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
4,Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [33]:
bond.iloc[5]
bond.iloc[[15, 20]]
bond.iloc[4:8]
bond.iloc[:6]
bond.iloc[20:]

Unnamed: 0,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
20,The World Is Not Enough,1999,Pierce Brosnan,Michael Apted,439.5,158.3,13.5
21,Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9
22,Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
23,Quantum of Solace,2008,Daniel Craig,Marc Forster,514.2,181.4,8.1
24,Skyfall,2012,Daniel Craig,Sam Mendes,943.5,170.2,14.5
25,Spectre,2015,Daniel Craig,Sam Mendes,726.7,206.3,30.0
26,No Time to Die,2021,Daniel Craig,Cary Joji Fukunaga,774.2,301.0,25.0


In [34]:
bond.dtypes

Film                  object
Year                   int64
Actor                 object
Director              object
Box Office           float64
Budget               float64
Bond Actor Salary    float64
dtype: object

## Retrieve Rows by Index Label with loc Accessor
- The `loc` accessor retrieves one or more rows by index label.
- Provide a pair of square brackets after the accessor.

- Use `loc` whenever the **DataFrame** has a custom index and rows need to be looked up by their labels.

In [35]:
bond= pd.read_csv('jamesbond.csv', index_col= 'Film')
bond.head()

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [48]:
bond.loc['Goldfinger']
bond.loc[['GoldenEye']]
bond.loc['Casino Royale'] # index labels should ideally be unique, but that is not always the case: this label matches two rows

#bond.loc['Sacred Bond'] # a label that is not in the index raises a KeyError

bond.loc[['Octopussy', 'Moonraker']]
bond.loc[[ 'Moonraker', 'Octopussy']]
bond.loc[[ 'Moonraker', 'Octopussy', 'Casino Royale']]
bond.loc['Diamonds Are Forever': 'Moonraker'] # with loc, the end label of a slice is included in the result

bond.loc['GoldenEye':]
bond.loc[:"On Her Majesty's Secret Service"]

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
You Only Live Twice,1967,Sean Connery,Lewis Gilbert,514.2,59.9,4.4
On Her Majesty's Secret Service,1969,George Lazenby,Peter R. Hunt,291.5,37.3,0.6
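
The behaviours seen in the cell above (a duplicated label, a missing label, and an inclusive slice) can be reproduced on a small frame. This is a sketch under stated assumptions: `films` is a hypothetical four-row frame, with titles and years taken from the dataset.

```python
import pandas as pd

films = pd.DataFrame(
    {"Year": [1964, 1967, 1995, 2006]},
    index=["Goldfinger", "Casino Royale", "GoldenEye", "Casino Royale"],
)

one_row = films.loc["Goldfinger"]        # unique label: a Series comes back
two_rows = films.loc["Casino Royale"]    # duplicated label: a DataFrame comes back

# Label slices include the stop label.
span = films.loc["Goldfinger":"GoldenEye"]

# A label that is not in the index raises KeyError.
try:
    films.loc["Sacred Bond"]
    missing = False
except KeyError:
    missing = True
```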


## ChatGPT problem set

In [51]:
bond= pd.read_csv('jamesbond.csv')
bond.head()

Unnamed: 0,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
0,Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
1,From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
2,Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
3,Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
4,Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [61]:
#1) Use .loc to filter rows where the Year is greater than or equal to 2000. Return only the Film and Year columns
#bond[bond['Year'] >= 2000][['Film', 'Year']]
bond.loc[bond['Year'] >= 2000][['Film', 'Year']]

Unnamed: 0,Film,Year
21,Die Another Day,2002
22,Casino Royale,2006
23,Quantum of Solace,2008
24,Skyfall,2012
25,Spectre,2015
26,No Time to Die,2021
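
Since the exercise asks for `.loc`, the row filter and the column selection can also be written as a single `.loc` call. A minimal sketch, assuming a hypothetical three-row frame `films` built from rows of the dataset:

```python
import pandas as pd

films = pd.DataFrame({
    "Film": ["The World Is Not Enough", "Die Another Day", "Casino Royale"],
    "Year": [1999, 2002, 2006],
    "Actor": ["Pierce Brosnan", "Pierce Brosnan", "Daniel Craig"],
})

# Row mask first, column labels second, all inside one .loc call.
recent = films.loc[films["Year"] >= 2000, ["Film", "Year"]]
```

Doing both selections in one step avoids the intermediate DataFrame that `films.loc[mask][columns]` creates.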


In [63]:
#2) Use .loc to filter rows where the Actor is "Daniel Craig" and return the Film, Year, and Box Office columns.
bond.loc[bond['Actor'] == 'Daniel Craig'][['Film', 'Year', 'Box Office']]

Unnamed: 0,Film,Year,Box Office
22,Casino Royale,2006,581.5
23,Quantum of Solace,2008,514.2
24,Skyfall,2012,943.5
25,Spectre,2015,726.7
26,No Time to Die,2021,774.2


In [64]:
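#3) Use .loc to filter rows where the Box Office is greater than 500 million and the Year is after 2000. Return all columns for the filtered rows.
# Box Office is stored in millions, so the threshold is 500, not 500000.
bond.loc[ (bond['Box Office'] > 500) & (bond['Year'] > 2000) ]

Unnamed: 0,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
22,Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
23,Quantum of Solace,2008,Daniel Craig,Marc Forster,514.2,181.4,8.1
24,Skyfall,2012,Daniel Craig,Sam Mendes,943.5,170.2,14.5
25,Spectre,2015,Daniel Craig,Sam Mendes,726.7,206.3,30.0
26,No Time to Die,2021,Daniel Craig,Cary Joji Fukunaga,774.2,301.0,25.0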


In [65]:
#4) Use .loc to filter rows where the Actor is either "Pierce Brosnan" or "Sean Connery". Return only the Actor, Film, and Box Office columns
actors_of_interest= ['Pierce Brosnan', 'Sean Connery']
columns_to_show= ['Actor', 'Film', 'Box Office']

bond[ bond['Actor'].isin(actors_of_interest) ][columns_to_show]

Unnamed: 0,Actor,Film,Box Office
0,Sean Connery,Dr. No,448.8
1,Sean Connery,From Russia with Love,543.8
2,Sean Connery,Goldfinger,820.4
3,Sean Connery,Thunderball,848.1
5,Sean Connery,You Only Live Twice,514.2
7,Sean Connery,Diamonds Are Forever,442.5
13,Sean Connery,Never Say Never Again,380.0
18,Pierce Brosnan,GoldenEye,518.5
19,Pierce Brosnan,Tomorrow Never Dies,463.2
20,Pierce Brosnan,The World Is Not Enough,439.5


In [70]:
#5) Use .iloc to return the first 5 rows and the first 3 columns from the DataFrame.
bond.iloc[:5, :3] # slices are the most direct way to take the first 5 rows and the first 3 columns
#bond.iloc[ [0,1,2,3,4], [0,1,2] ] # equivalent, with explicit position lists

Unnamed: 0,Film,Year,Actor
0,Dr. No,1962,Sean Connery
1,From Russia with Love,1963,Sean Connery
2,Goldfinger,1964,Sean Connery
3,Thunderball,1965,Sean Connery
4,Casino Royale,1967,David Niven


In [72]:
#6) Use .loc to filter rows where the Box Office is between 200 and 500 million. Return only the Film and Box Office columns.
# Box Office is stored in millions, so the bounds are 200 and 500; the raw-dollar bounds 200000000 and 500000000 match no rows.
bond[ bond['Box Office'].between(200, 500) ][['Film', 'Box Office']]


In [83]:
#7) Use .loc to filter rows where the Director is "Martin Campbell", the Budget is less than 100 million, and the Box Office is greater than 300 million. Return the Film, Director, Box Office, and Budget columns.

# Budget and Box Office are stored in millions, so the thresholds are 100 and 300.
have_Martin_as_director= bond['Director'] == 'Martin Campbell'
budget_is_less_than_100M= bond['Budget'] < 100
box_office_is_greater_than_300M= bond['Box Office'] > 300

bond[
    have_Martin_as_director
    & budget_is_less_than_100M
    & box_office_is_greater_than_300M
][['Film', 'Director', 'Box Office', 'Budget']]

Unnamed: 0,Film,Director,Box Office,Budget
18,GoldenEye,Martin Campbell,518.5,76.9


In [84]:
#8)  Use .loc to filter rows where the Year is between 1980 and 2000, and the Actor is not "Roger Moore". Return the Film, Year, and Actor columns.
bond[ bond['Year'].between(1980, 2000) & ( bond['Actor'] != 'Roger Moore' ) ][['Film', 'Year', 'Actor']]

Unnamed: 0,Film,Year,Actor
13,Never Say Never Again,1983,Sean Connery
16,The Living Daylights,1987,Timothy Dalton
17,Licence to Kill,1989,Timothy Dalton
18,GoldenEye,1995,Pierce Brosnan
19,Tomorrow Never Dies,1997,Pierce Brosnan
20,The World Is Not Enough,1999,Pierce Brosnan


In [88]:
#9) Use .loc to filter the rows where the Year is after 1990, and either the Box Office is less than 200 million or the Budget is greater than 100 million. Return all columns for the filtered rows.
# Box Office and Budget are stored in millions, so the thresholds are 200 and 100.
bond.loc[ ( bond['Year'] > 1990 ) & ( (bond['Box Office'] < 200) | (bond['Budget'] > 100) ) ]

film_was_produced_after_90s= bond['Year'] > 1990
box_office_is_less_than_200M= bond['Box Office'] < 200
budget_is_greater_than_100M= bond['Budget'] > 100
bond[ film_was_produced_after_90s & (box_office_is_less_than_200M | budget_is_greater_than_100M)]

Unnamed: 0,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
19,Tomorrow Never Dies,1997,Pierce Brosnan,Roger Spottiswoode,463.2,133.9,10.0
20,The World Is Not Enough,1999,Pierce Brosnan,Michael Apted,439.5,158.3,13.5
21,Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9
22,Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
23,Quantum of Solace,2008,Daniel Craig,Marc Forster,514.2,181.4,8.1
24,Skyfall,2012,Daniel Craig,Sam Mendes,943.5,170.2,14.5
25,Spectre,2015,Daniel Craig,Sam Mendes,726.7,206.3,30.0
26,No Time to Die,2021,Daniel Craig,Cary Joji Fukunaga,774.2,301.0,25.0
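
The parentheses around the `|` part of this filter matter, because `&` binds more tightly than `|`. The sketch below shows the difference on a hypothetical three-row frame `films` (rows copied from the dataset; the budget threshold of 80 is chosen only to make the two results differ).

```python
import pandas as pd

films = pd.DataFrame({
    "Film": ["GoldenEye", "Tomorrow Never Dies", "Moonraker"],
    "Year": [1995, 1997, 1979],
    "Box Office": [518.5, 463.2, 535.0],
    "Budget": [76.9, 133.9, 91.5],
})

after_1990 = films["Year"] > 1990
low_gross = films["Box Office"] < 200
big_budget = films["Budget"] > 80

# Intended logic: after 1990 AND (low gross OR big budget).
grouped = films[after_1990 & (low_gross | big_budget)]

# Without parentheses, & is evaluated before |, so this reads as
# (after 1990 AND low gross) OR big budget, which is a different question.
ungrouped = films[after_1990 & low_gross | big_budget]
```

Here `ungrouped` also picks up "Moonraker", a 1979 film, because its budget alone satisfies the right-hand side of the `|`.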


In [94]:
round(2.387473846,2)

2.39

In [102]:
#10) Use .iloc to filter the rows where the Box Office is greater than the average Box Office value. Return only the Film, Year, and Box Office columns for these rows.

box_office_mean= bond['Box Office'].mean()
bond[ bond['Box Office'] > box_office_mean ][['Film', 'Year', 'Box Office']]

Unnamed: 0,Film,Year,Box Office
1,From Russia with Love,1963,543.8
2,Goldfinger,1964,820.4
3,Thunderball,1965,848.1
5,You Only Live Twice,1967,514.2
10,The Spy Who Loved Me,1977,533.0
11,Moonraker,1979,535.0
18,GoldenEye,1995,518.5
22,Casino Royale,2006,581.5
23,Quantum of Solace,2008,514.2
24,Skyfall,2012,943.5
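
The exercise asks for `.iloc`, while the solution above uses plain Boolean indexing. `iloc` is positional and generally refuses a label-indexed Boolean Series as a mask, so one possible approach is to hand it the mask as a NumPy array and the columns as positions. A sketch under stated assumptions (`films` is a hypothetical three-row frame with figures from the dataset):

```python
import pandas as pd

films = pd.DataFrame({
    "Film": ["Dr. No", "Goldfinger", "Thunderball"],
    "Year": [1962, 1964, 1965],
    "Actor": ["Sean Connery"] * 3,
    "Box Office": [448.8, 820.4, 848.1],
})

above_average = films["Box Office"] > films["Box Office"].mean()

# iloc works with positions, so the mask is passed as a plain NumPy array
# and the wanted columns are translated from labels to positions.
column_positions = films.columns.get_indexer(["Film", "Year", "Box Office"])
result = films.iloc[above_average.to_numpy(), column_positions]
```

In practice `films.loc[above_average, ["Film", "Year", "Box Office"]]` is the simpler choice; the `iloc` form is shown only because the exercise names it.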


## Extra Exercises

In [106]:
#1) Filter the bond DataFrame to show films where the Actor is either "Daniel Craig" or "Pierce Brosnan", the Box Office is greater than 500 million, and the Budget is less than 150 million. Return the columns Film, Actor, Box Office, and Budget

columns_to_show=['Film', 'Actor', 'Box Office', 'Budget']
bond[ 
    bond['Actor'].isin(["Daniel Craig","Pierce Brosnan"]) 
    & (bond['Box Office'] > 500)
    & (bond['Budget'] < 150)
][columns_to_show]

Unnamed: 0,Film,Actor,Box Office,Budget
18,GoldenEye,Pierce Brosnan,518.5,76.9
22,Casino Royale,Daniel Craig,581.5,145.3


In [107]:
#2) Filter the rows where the Year is between 1990 and 2010, the Actor is not "Roger Moore", and either the Box Office is less than 200 million or the Budget is greater than 120 million.

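bond[ 
    bond['Year'].between(1990, 2010) # between checks a range; isin would only match the listed years exactly
    & (bond['Actor'] != 'Roger Moore')
    & ( (bond['Box Office'] < 200) | (bond['Budget'] > 120) )
]

Unnamed: 0,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
19,Tomorrow Never Dies,1997,Pierce Brosnan,Roger Spottiswoode,463.2,133.9,10.0
20,The World Is Not Enough,1999,Pierce Brosnan,Michael Apted,439.5,158.3,13.5
21,Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9
22,Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
23,Quantum of Solace,2008,Daniel Craig,Marc Forster,514.2,181.4,8.1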


In [171]:
#3) Group the bond DataFrame by Actor and calculate the average Box Office, the total Budget, and the maximum Bond Actor Salary for each actor. Filter the results to show only actors with an average Box Office greater than 500 million.

bond_box_office_summary= bond.groupby('Actor')[['Box Office', 'Budget', 'Bond Actor Salary']].agg(['mean', 'max', 'sum'])['Box Office']
bond_box_office_summary.rename(columns={
    'mean': 'AverageBO'
    ,'max': 'MaxBO'
    ,'sum': 'TotalBO'
}, inplace= True)

bond_budget_summary= bond.groupby('Actor')[['Box Office', 'Budget', 'Bond Actor Salary']].agg(['mean', 'max', 'sum'])['Budget']
bond_budget_summary.rename(columns={
    'mean': 'AverageBudget'
    ,'max': 'MaxBudget'
    ,'sum': 'TotalBudget'
}, inplace= True)

bond_salary_summary= bond.groupby('Actor')[['Box Office', 'Budget', 'Bond Actor Salary']].agg(['mean', 'max', 'sum'])['Bond Actor Salary']
bond_salary_summary.rename(columns={
    'mean': 'AverageSalary'
    ,'max': 'MaxSalary'
    ,'sum': 'TotalSalary'
}, inplace= True)

df= pd.concat([bond_box_office_summary['AverageBO'], bond_budget_summary['TotalBudget'], bond_salary_summary['MaxSalary']], axis=1)
df

# just to improve data visualization
def round_2_houses(x):
    return round(x,2)

df['AverageBO']= df['AverageBO'].apply(round_2_houses)
df

# final code script
df[ df['AverageBO']> 500 ]

Unnamed: 0_level_0,AverageBO,TotalBudget,MaxSalary
Actor,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Daniel Craig,708.02,1004.2,30.0
Sean Connery,571.11,260.7,5.8


In [None]:
# another way to solve
bond.groupby('Actor').agg(
    avg_box_office=('Box Office', 'mean'),
    total_budget=('Budget', 'sum'),
    max_salary=('Bond Actor Salary', 'max')
)

# combining gpt answer with mine:
bond.groupby('Actor').agg(
    AverageBO= ('Box Office', 'mean')
    ,TotalBudget= ('Budget', 'sum')
    ,MaxSalary= ('Bond Actor Salary', 'max')
)

Unnamed: 0_level_0,AverageBO,TotalBudget,MaxSalary
Actor,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Daniel Craig,708.02,1004.2,30.0
David Niven,315.0,85.0,
George Lazenby,291.5,37.3,0.6
Pierce Brosnan,471.65,523.3,17.9
Roger Moore,422.957143,363.7,9.1
Sean Connery,571.114286,260.7,5.8
Timothy Dalton,282.2,125.5,7.9
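
The named-aggregation version above stops before the filtering step that the exercise asks for. A minimal sketch of the full chain, assuming a hypothetical three-row frame `films` (figures copied from the dataset) and the same output column names:

```python
import pandas as pd

films = pd.DataFrame({
    "Actor": ["Sean Connery", "Sean Connery", "George Lazenby"],
    "Box Office": [448.8, 820.4, 291.5],
    "Budget": [7.0, 18.6, 37.3],
    "Bond Actor Salary": [0.6, 3.2, 0.6],
})

summary = (
    films.groupby("Actor")
    .agg(
        AverageBO=("Box Office", "mean"),
        TotalBudget=("Budget", "sum"),
        MaxSalary=("Bond Actor Salary", "max"),
    )
    .round(2)  # rounds every numeric column, replacing the helper function
)

# Keep only actors whose average box office exceeds 500 (million).
top_actors = summary.loc[summary["AverageBO"] > 500]
```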


In [147]:
#4) Filter the rows where Bond Actor Salary is missing (NaN) and the Box Office is greater than 400 million. Return the columns Film, Actor, Box Office, and Bond Actor Salary.

bond[ bond['Bond Actor Salary'].isna() & (bond['Box Office'] > 400) ][['Film', 'Actor', 'Box Office', 'Bond Actor Salary']]

Unnamed: 0,Film,Actor,Box Office,Bond Actor Salary
8,Live and Let Die,Roger Moore,460.3,
10,The Spy Who Loved Me,Roger Moore,533.0,
11,Moonraker,Roger Moore,535.0,
12,For Your Eyes Only,Roger Moore,449.4,


In [152]:
#5) Sort the bond DataFrame by Box Office in descending order and select the top 10 films where the Budget is greater than 100 million. Return the columns Film, Box Office, and Budget.

bond.loc[ bond['Budget'] > 100, ['Film', 'Box Office', 'Budget'] ].sort_values(by='Box Office', ascending= False).head(10)

Unnamed: 0,Film,Box Office,Budget
24,Skyfall,943.5,170.2
26,No Time to Die,774.2,301.0
25,Spectre,726.7,206.3
22,Casino Royale,581.5,145.3
23,Quantum of Solace,514.2,181.4
21,Die Another Day,465.4,154.2
19,Tomorrow Never Dies,463.2,133.9
20,The World Is Not Enough,439.5,158.3


In [153]:
#6) Filter the rows where Box Office is between 300 and 600 million, and the Budget is between 80 and 150 million. Return only the columns Film, Year, Box Office, and Budget.

bond[ bond['Box Office'].between(300, 600) & bond['Budget'].between(80, 150) ][['Film', 'Year', 'Box Office', 'Budget']]

Unnamed: 0,Film,Year,Box Office,Budget
4,Casino Royale,1967,315.0,85.0
11,Moonraker,1979,535.0,91.5
13,Never Say Never Again,1983,380.0,86.0
19,Tomorrow Never Dies,1997,463.2,133.9
22,Casino Royale,2006,581.5,145.3


In [161]:
#7) Create a custom function that checks whether the Box Office is more than 5 times the Budget (i.e., Box Office > 5 * Budget). Use .apply() to filter the bond DataFrame based on this condition. Return the columns Film, Box Office, and Budget.

df= bond.loc[ bond['Box Office'] > 5*bond['Budget'], ['Film', 'Box Office', 'Budget'] ].copy() # copy, so that adding a column below does not act on a slice of bond
df['Validation']= df['Box Office']/df['Budget']
df['Validation']= df['Validation'].apply(round_2_houses)
df.sort_values(by= 'Validation', ascending= False)

Unnamed: 0,Film,Box Office,Budget,Validation
0,Dr. No,448.8,7.0,64.11
2,Goldfinger,820.4,18.6,44.11
1,From Russia with Love,543.8,12.6,43.16
3,Thunderball,848.1,41.9,20.24
8,Live and Let Die,460.3,30.8,14.94
7,Diamonds Are Forever,442.5,34.7,12.75
9,The Man with the Golden Gun,334.0,27.7,12.06
10,The Spy Who Loved Me,533.0,45.1,11.82
5,You Only Live Twice,514.2,59.9,8.58
6,On Her Majesty's Secret Service,291.5,37.3,7.82
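
The solution above uses a vectorized comparison, which is usually the better choice. For completeness, here is a sketch of the custom-function route that the exercise describes, assuming a hypothetical frame `films` and a hypothetical helper `earned_five_times_budget`:

```python
import pandas as pd

films = pd.DataFrame({
    "Film": ["Dr. No", "Casino Royale", "Thunderball"],
    "Box Office": [448.8, 315.0, 848.1],
    "Budget": [7.0, 85.0, 41.9],
})

def earned_five_times_budget(row):
    """Return True when a film's box office is more than 5x its budget."""
    return row["Box Office"] > 5 * row["Budget"]

# axis=1 hands each row to the function as a Series.
mask = films.apply(earned_five_times_budget, axis=1)
hits = films.loc[mask, ["Film", "Box Office", "Budget"]]
```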


In [162]:
#8) Filter the rows where the Actor is either "Sean Connery", "Pierce Brosnan", or "Roger Moore", and the Box Office is greater than 400 million. Sort the results by Box Office in descending order and return the columns Film, Actor, and Box Office

bond[ bond['Actor'].isin(['Sean Connery', 'Pierce Brosnan', 'Roger Moore']) & (bond['Box Office'] > 400) ][['Film', 'Actor', 'Box Office']]\
.sort_values(by=['Box Office'], ascending= [False])

Unnamed: 0,Film,Actor,Box Office
3,Thunderball,Sean Connery,848.1
2,Goldfinger,Sean Connery,820.4
1,From Russia with Love,Sean Connery,543.8
11,Moonraker,Roger Moore,535.0
10,The Spy Who Loved Me,Roger Moore,533.0
18,GoldenEye,Pierce Brosnan,518.5
5,You Only Live Twice,Sean Connery,514.2
21,Die Another Day,Pierce Brosnan,465.4
19,Tomorrow Never Dies,Pierce Brosnan,463.2
8,Live and Let Die,Roger Moore,460.3


In [169]:
#9) Apply a condition to the bond DataFrame where you need to find films where the difference between the Box Office and Budget is less than 50 million. Return the columns Film, Box Office, and Budget for the selected films.

bond[ (abs(bond['Box Office'] - bond['Budget']) < 50) ][['Film', 'Box Office', 'Budget']]

Unnamed: 0,Film,Box Office,Budget


In [170]:
#10) Use .isin() to filter the bond DataFrame where the Actor is either "Daniel Craig", "Pierce Brosnan", or "Sean Connery", the Year is after 1990, and the Box Office is greater than 300 million. Return the columns Film, Actor, Year, and Box Office.

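bond[ bond['Actor'].isin(['Daniel Craig', 'Pierce Brosnan', 'Sean Connery']) & (bond['Year'] > 1990) & (bond['Box Office'] > 300) ][['Film', 'Actor', 'Year', 'Box Office']] # "after 1990" means > 1990, not == 1990

Unnamed: 0,Film,Actor,Year,Box Office
18,GoldenEye,Pierce Brosnan,1995,518.5
19,Tomorrow Never Dies,Pierce Brosnan,1997,463.2
20,The World Is Not Enough,Pierce Brosnan,1999,439.5
21,Die Another Day,Pierce Brosnan,2002,465.4
22,Casino Royale,Daniel Craig,2006,581.5
23,Quantum of Solace,Daniel Craig,2008,514.2
24,Skyfall,Daniel Craig,2012,943.5
25,Spectre,Daniel Craig,2015,726.7
26,No Time to Die,Daniel Craig,2021,774.2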


In [183]:
#11) Create a new column called Box Office to Budget Ratio that stores the ratio of Box Office to Budget (i.e., Box Office / Budget). Return the first 10 rows to inspect the new column.

bond['Box Office to Budget Ratio']= bond['Box Office']/bond['Budget']
bond['Box Office to Budget Ratio']=bond['Box Office to Budget Ratio'].apply(round_2_houses)
bond.head(10)

Unnamed: 0,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary,Box Office to Budget Ratio
0,Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6,64.11
1,From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6,43.16
2,Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2,44.11
3,Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7,20.24
4,Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,,3.71
5,You Only Live Twice,1967,Sean Connery,Lewis Gilbert,514.2,59.9,4.4,8.58
6,On Her Majesty's Secret Service,1969,George Lazenby,Peter R. Hunt,291.5,37.3,0.6,7.82
7,Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8,12.75
8,Live and Let Die,1973,Roger Moore,Guy Hamilton,460.3,30.8,,14.94
9,The Man with the Golden Gun,1974,Roger Moore,Guy Hamilton,334.0,27.7,,12.06


In [None]:
#12) For rows where Bond Actor Salary is missing (NaN), fill the missing values with the mean of the Bond Actor Salary column. Return the first 10 rows to check if the missing values were replaced.

missing_values_index= bond[bond['Bond Actor Salary'].isna()].index # auxiliary variable used to validate the final result
display(bond.loc[missing_values_index]) # rows with a missing salary, before the fill
# The filter on the first line already shows these rows; the index is stored separately so the same rows can be looked up again after the salary column is filled.

bond['Bond Actor Salary']= bond['Bond Actor Salary'].fillna(round(bond['Bond Actor Salary'].mean(),1))
display(bond.loc[missing_values_index]) # validation: the same rows, after the fill

Unnamed: 0,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
4,Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
8,Live and Let Die,1973,Roger Moore,Guy Hamilton,460.3,30.8,
9,The Man with the Golden Gun,1974,Roger Moore,Guy Hamilton,334.0,27.7,
10,The Spy Who Loved Me,1977,Roger Moore,Lewis Gilbert,533.0,45.1,
11,Moonraker,1979,Roger Moore,Lewis Gilbert,535.0,91.5,
12,For Your Eyes Only,1981,Roger Moore,John Glen,449.4,60.2,
13,Never Say Never Again,1983,Sean Connery,Irvin Kershner,380.0,86.0,


Unnamed: 0,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
4,Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,8.9
8,Live and Let Die,1973,Roger Moore,Guy Hamilton,460.3,30.8,8.9
9,The Man with the Golden Gun,1974,Roger Moore,Guy Hamilton,334.0,27.7,8.9
10,The Spy Who Loved Me,1977,Roger Moore,Lewis Gilbert,533.0,45.1,8.9
11,Moonraker,1979,Roger Moore,Lewis Gilbert,535.0,91.5,8.9
12,For Your Eyes Only,1981,Roger Moore,John Glen,449.4,60.2,8.9
13,Never Say Never Again,1983,Sean Connery,Irvin Kershner,380.0,86.0,8.9
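
The fill step can be checked in isolation on a short Series. This is a sketch with illustrative values; `salary` and `filled` are hypothetical names.

```python
import pandas as pd

salary = pd.Series([0.6, None, 4.4, None], name="Bond Actor Salary")

# mean() skips NaN by default, so it is computed from the known values only.
filled = salary.fillna(round(salary.mean(), 1))

# fillna returns a new Series; the original still has its missing values.
remaining_in_original = salary.isna().sum()
```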


In [214]:
(bond['Box Office'] > 400) & (bond['Budget'] < 150)

0      True
1      True
2      True
3      True
4     False
5      True
6     False
7      True
8      True
9     False
10     True
11     True
12     True
13    False
14    False
15    False
16    False
17    False
18     True
19     True
20    False
21    False
22     True
23    False
24    False
25    False
26    False
dtype: bool

In [209]:
#13) Create a new column called Successful Film that is True if the Box Office is greater than 400 million and the Budget is less than 150 million, otherwise False. Return the first 10 rows.

bond['Succesful Film']= ''

aux=[]
for index in bond.index:
    if bond.iloc[index]['Box Office'] > 400 and bond.iloc[index]['Budget'] < 150:
        aux.append(True)
    else:
        aux.append(False)
bond['Succesful Film']= pd.Series(aux)
bond.head(10)

Unnamed: 0,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary,Succesful Film
0,Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6,True
1,From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6,True
2,Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2,True
3,Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7,True
4,Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,8.9,False
5,You Only Live Twice,1967,Sean Connery,Lewis Gilbert,514.2,59.9,4.4,True
6,On Her Majesty's Secret Service,1969,George Lazenby,Peter R. Hunt,291.5,37.3,0.6,False
7,Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8,True
8,Live and Let Die,1973,Roger Moore,Guy Hamilton,460.3,30.8,8.9,True
9,The Man with the Golden Gun,1974,Roger Moore,Guy Hamilton,334.0,27.7,8.9,False


In [215]:
bond['Succesful Film 2']= (bond['Box Office'] > 400) & (bond['Budget'] < 150)
bond.head(10)

Unnamed: 0,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary,Succesful Film,Succesful Film 2
0,Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6,True,True
1,From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6,True,True
2,Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2,True,True
3,Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7,True,True
4,Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,8.9,False,False
5,You Only Live Twice,1967,Sean Connery,Lewis Gilbert,514.2,59.9,4.4,True,True
6,On Her Majesty's Secret Service,1969,George Lazenby,Peter R. Hunt,291.5,37.3,0.6,False,False
7,Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8,True,True
8,Live and Let Die,1973,Roger Moore,Guy Hamilton,460.3,30.8,8.9,True,True
9,The Man with the Golden Gun,1974,Roger Moore,Guy Hamilton,334.0,27.7,8.9,False,False


In [221]:
#14) Drop all rows where either the Box Office or Bond Actor Salary column contains missing values (NaN). Return the shape of the resulting DataFrame to see how many rows were removed.

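print(bond.shape)
cleaned= bond.dropna(subset=['Box Office', 'Bond Actor Salary']) # dropna returns a new DataFrame, so the result has to be kept
print(cleaned.shape) # no rows are removed here if the missing salaries were already filled in exercise 12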


(27, 9)
(27, 9)
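
To see `dropna` actually remove rows, the data must still contain missing values. A minimal sketch on a hypothetical three-row frame `films` in which two salaries are missing:

```python
import pandas as pd

films = pd.DataFrame({
    "Film": ["Dr. No", "Casino Royale", "Live and Let Die"],
    "Box Office": [448.8, 315.0, 460.3],
    "Bond Actor Salary": [0.6, None, None],
})

# dropna returns a new DataFrame; the original is left untouched.
complete = films.dropna(subset=["Box Office", "Bond Actor Salary"])

rows_removed = films.shape[0] - complete.shape[0]
```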


In [229]:
#15) Format the Box Office column to display the values as currency (i.e., with a dollar sign and two decimal places). Create a new column called Formatted Box Office with the formatted values and return the first 10 rows.

bond.head()
def format_box_office(x):
    return f'US$ {x:.2f}' # dollar sign plus two decimal places, as the exercise asks

bond['Formatted Box Office']= bond['Box Office'].apply(format_box_office)
bond.head()

Unnamed: 0,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary,Succesful Film,Succesful Film 2,Formatted Box Office
0,Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6,True,True,US$ 448.80
1,From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6,True,True,US$ 543.80
2,Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2,True,True,US$ 820.40
3,Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7,True,True,US$ 848.10
4,Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,8.9,False,False,US$ 315.00


## Second Arguments to loc and iloc Accessors
- The second value inside the square brackets targets the columns.
- The `iloc` requires numeric positions for rows and columns.
- The `loc` requires labels for rows and columns.

In [5]:
bond= pd.read_csv('jamesbond.csv', index_col= 'Film').sort_index()
bond.head()

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9


In [21]:
bond.loc['Diamonds Are Forever', 'Director'] # returns the value at the intersection of the row label (first argument) and the column label (second argument); this works best when the DataFrame has a meaningful index

bond.loc[['Octopussy', 'GoldenEye'], 'Director']
bond.loc[['Octopussy', 'GoldenEye'], 'Director':'Budget']
bond.loc['GoldenEye':'Octopussy', 'Director':'Budget']
bond.loc['GoldenEye':'Octopussy', ['Actor', 'Bond Actor Salary', 'Year']]

Unnamed: 0_level_0,Actor,Bond Actor Salary,Year
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
GoldenEye,Pierce Brosnan,5.1,1995
Goldfinger,Sean Connery,3.2,1964
Licence to Kill,Timothy Dalton,7.9,1989
Live and Let Die,Roger Moore,,1973
Moonraker,Roger Moore,,1979
Never Say Never Again,Sean Connery,,1983
No Time to Die,Daniel Craig,25.0,2021
Octopussy,Roger Moore,7.8,1983


In [26]:
# the same syntax applies to iloc; the difference is that iloc takes integer positions instead of labels

bond.iloc[0] # pulls out the very first row as a Series
bond.iloc[0, 2] # pulls out the 'Director' value of the first row (the 'Director' column is at position 2)
bond.iloc[3, 5]

bond.iloc[[0,2], 3]
bond.iloc[[0,2], [3,5]]
bond.iloc[:7,:3]

Unnamed: 0_level_0,Year,Actor,Director
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
A View to a Kill,1985,Roger Moore,John Glen
Casino Royale,2006,Daniel Craig,Martin Campbell
Casino Royale,1967,David Niven,Ken Hughes
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton
Die Another Day,2002,Pierce Brosnan,Lee Tamahori
Dr. No,1962,Sean Connery,Terence Young
For Your Eyes Only,1981,Roger Moore,John Glen
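
The two accessors can address the same cell, one by labels and one by positions. A sketch on a hypothetical two-row frame `films` (rows copied from the sorted dataset above):

```python
import pandas as pd

films = pd.DataFrame(
    {
        "Year": [1985, 1971],
        "Actor": ["Roger Moore", "Sean Connery"],
        "Director": ["John Glen", "Guy Hamilton"],
    },
    index=["A View to a Kill", "Diamonds Are Forever"],
)

# Same cell, addressed by labels and by positions.
by_label = films.loc["Diamonds Are Forever", "Director"]
by_position = films.iloc[1, 2]

# get_loc translates a column label into its integer position.
director_position = films.columns.get_loc("Director")
```

`columns.get_loc` is handy when the position of a column is needed for `iloc` but only its name is known.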


## Overwrite Value in a DataFrame
- Use the `iloc` or `loc` accessor on the **DataFrame** to target a value, then provide the equal sign and a new value.

In [40]:
bond= pd.read_csv('jamesbond.csv', index_col= 'Film').sort_index()
bond.head()

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9


In [32]:
bond['Actor'].loc['Diamonds Are Forever'] = 'Sir Sean Connery'
# bond['Actor'] returns a Series first, and the assignment is then made on that intermediate object. Whether the change reaches the original DataFrame depends on whether pandas handed back a view or a copy, which is why pandas emits the warning below.

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  bond['Actor'].loc['Diamonds Are Forever'] = 'Sir Sean Connery'


In [41]:
bond.loc['Diamonds Are Forever', 'Actor']= 'Sir Sean Connery' # a single loc call on the DataFrame itself writes directly into bond, with no intermediate object and no warning; this is the recommended form
bond

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sir Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
For Your Eyes Only,1981,Roger Moore,John Glen,449.4,60.2,
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
GoldenEye,1995,Pierce Brosnan,Martin Campbell,518.5,76.9,5.1
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2


In [39]:
bond

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sir Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
For Your Eyes Only,1981,Roger Moore,John Glen,449.4,60.2,
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
GoldenEye,1995,Pierce Brosnan,Martin Campbell,518.5,76.9,5.1
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2


- It's highly recommended to use the accessors on the top-level object rather than on slices or views of it. The tip is: don't use `loc` (`iloc`) on a smaller component of a larger object.

- This avoids the risk of modifying a copy instead of a view (pandas does not always make clear which one an indexing operation returns). If we accidentally receive a copy, our changes are applied to that copy and never reach the original object.
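
As a minimal sketch of the recommendation above, the snippet below overwrites single values with one `loc` or `iloc` call on the **DataFrame** itself. The small `films` frame is built by hand from rows shown in this notebook (it is an assumption standing in for `jamesbond.csv`), and the variable name is illustrative only.

```python
import pandas as pd

# Hand-built stand-in for a few rows of the dataset (not read from the CSV)
films = pd.DataFrame(
    {
        "Year": [1971, 1962, 1964],
        "Actor": ["Sean Connery", "Sean Connery", "Sean Connery"],
    },
    index=pd.Index(["Diamonds Are Forever", "Dr. No", "Goldfinger"], name="Film"),
)

# One indexing call on the top-level object: row label first, then column label
films.loc["Diamonds Are Forever", "Actor"] = "Sir Sean Connery"

# Positional equivalent: third row (position 2), second column (position 1)
films.iloc[2, 1] = "Sir Sean Connery"

print(films)
```

Because each assignment is a single indexing operation on `films`, there is no intermediate object that could turn out to be a copy.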

##  Overwrite Multiple Values in a DataFrame
- The `replace` method replaces all occurrences of a **Series** value with another value (think of it like "Find and Replace").
- To overwrite multiple values in a **DataFrame**, remember to use an accessor on the **DataFrame** itself.
- Accessors like `loc` and `iloc` can accept Boolean Series. Use them to target the values to overwrite.
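
The two approaches in the list above can be sketched on a small hand-built frame (an assumption standing in for `jamesbond.csv`). The replacement strings and the `titles` dictionary are purely illustrative. `replace` also accepts a dictionary of `{old: new}` pairs, which swaps several values in one call.

```python
import pandas as pd

# Hand-built stand-in for a few rows of the dataset (not read from the CSV)
films = pd.DataFrame(
    {
        "Actor": ["Roger Moore", "Sean Connery", "Pierce Brosnan", "Sean Connery"],
        "Director": ["John Glen", "Guy Hamilton", "Lee Tamahori", "Terence Young"],
    },
    index=["A View to a Kill", "Diamonds Are Forever", "Die Another Day", "Dr. No"],
)

# replace returns a new Series, so assign it back to the column
titles = {"Sean Connery": "Sir Sean Connery", "Roger Moore": "Sir Roger Moore"}
films["Actor"] = films["Actor"].replace(titles)

# A Boolean Series passed to loc targets every matching row at once
directed_by_glen = films["Director"] == "John Glen"
films.loc[directed_by_glen, "Director"] = "J. Glen"  # illustrative value only

print(films)
```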

In [53]:
bond= pd.read_csv('jamesbond.csv', index_col='Film').sort_index()
bond.head()

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9


In [57]:
bond['Actor'].replace('Sean Connery', 'Sir Sean Connery')
# the method above returns a new Series (not a view of the column), so the changes are not applied to the DataFrame yet

# but we can assign the result back to the column itself
bond['Actor']= bond['Actor'].replace('Sean Connery', 'Sir Sean Connery')
bond.head()

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sir Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9


In [62]:
# Square-bracket extraction such as df[ ... ] may hand back a copy of the data, in which case changes applied to the result do not reach the original DataFrame. The most reliable way to overwrite data is a single loc/iloc assignment, because it acts directly on the original DataFrame.

bond= pd.read_csv('jamesbond.csv', index_col='Film').sort_index()

is_sean_connery= bond['Actor'] == 'Sean Connery'
bond.loc[is_sean_connery, 'Actor'] = 'Sir Sean Connery'
bond.head()

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sir Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9


## Rename Index Labels or Columns in a DataFrame
- The `rename` method accepts a dictionary for either its `columns` or `index` parameters.
- The dictionary keys represent the existing names and the values represent the new names.
- We can replace all columns by overwriting the **DataFrame's** `columns` attribute.
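
A short sketch of the points above, using a hand-built two-row frame (an assumption standing in for `jamesbond.csv`). It passes both `index` and `columns` dictionaries in one `rename` call. The key `"Not In The Data"` is a made-up label included to show that, with the default settings, keys that match nothing are silently skipped.

```python
import pandas as pd

# Hand-built stand-in for two rows of the dataset (not read from the CSV)
films = pd.DataFrame(
    {"Year": [1962, 1995], "Box Office": [448.8, 518.5]},
    index=["Dr. No", "GoldenEye"],
)

# Keys are existing labels, values are the new labels
renamed = films.rename(
    index={"Dr. No": "Dr No", "Not In The Data": "ignored"},
    columns={"Box Office": "Revenue"},
)

print(renamed)
print(films.columns.tolist())  # the original is untouched without inplace=True
```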

In [2]:
import pandas as pd

In [3]:
# In the previous sections we saw how to overwrite values within the DataFrame. Now we'll look at how to replace the index/column labels themselves
bond= pd.read_csv('jamesbond.csv', index_col='Film')
bond.head()

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [8]:
bond.rename(columns={'Year': 'Year of Release', 'Box Office': 'Revenue'}, inplace= True)

In [12]:
swaps= {
    "Dr. No": "Dr No",
    "GoldenEye": "Golden Eye",
    "The World Is Not Enough": "Best Bond Movie Ever"
}
bond.rename(index= swaps, inplace= True)#.sort_index()

In [15]:
type(bond.columns)

pandas.core.indexes.base.Index

In [16]:
bond.columns= ('Year', 'Bond Guy', 'Camera Dude', 'Revenues', 'Cost', 'Salary')
bond.head()

Unnamed: 0_level_0,Year,Bond Guy,Camera Dude,Revenues,Cost,Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Dr No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


- To use the syntax above we need to supply a name for every column in the **DataFrame** (unlike the `rename` method, it does not accept a subset of the columns).
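
To illustrate the constraint, here is a small sketch on a hand-built one-row frame (an assumption, not the real dataset). Assigning fewer names than there are columns raises a `ValueError`, while a full-length list succeeds.

```python
import pandas as pd

# Hand-built stand-in with three columns (not read from the CSV)
films = pd.DataFrame(
    {"Year": [1962], "Actor": ["Sean Connery"], "Budget": [7.0]},
    index=["Dr. No"],
)

message = ""
try:
    films.columns = ["Year", "Bond Guy"]  # only 2 names for 3 columns
except ValueError as error:
    message = str(error)
    print(message)

# Supplying one name per column works
films.columns = ["Year", "Bond Guy", "Cost"]
print(films.columns.tolist())
```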

## Delete Rows or Columns from a DataFrame
- The `drop` method deletes one or more rows/columns from a **DataFrame**.
- Pass the `index` parameter a list of row labels to remove, or the `columns` parameter a list of column names to remove.
- The `pop` method removes and returns a single **Series** (it mutates the **DataFrame** in the process).
- Python's `del` keyword also removes a single **Series**.
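
The three removal options above behave differently with respect to mutation. The sketch below contrasts them on a hand-built frame (an assumption standing in for `jamesbond.csv`). `"Not A Column"` is a made-up label used to show the `errors="ignore"` option of `drop`.

```python
import pandas as pd

# Hand-built stand-in for two rows of the dataset (not read from the CSV)
films = pd.DataFrame(
    {
        "Year": [1985, 1971],
        "Actor": ["Roger Moore", "Sean Connery"],
        "Budget": [54.5, 34.7],
    },
    index=["A View to a Kill", "Diamonds Are Forever"],
)

# drop returns a new DataFrame; films keeps all three columns
without_budget = films.drop(columns=["Budget"])

# pop removes the column from films and returns it as a Series
actors = films.pop("Actor")

# del removes the column from films and returns nothing
del films["Year"]

# errors="ignore" skips labels that do not exist instead of raising KeyError
unchanged = films.drop(columns=["Not A Column"], errors="ignore")

print(without_budget.columns.tolist(), films.columns.tolist())
```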

In [31]:
bond.dtypes

Year                   int64
Actor                 object
Director              object
Box Office           float64
Budget               float64
Bond Actor Salary    float64
dtype: object

In [30]:
bond= pd.read_csv('jamesbond.csv', index_col= 'Film').sort_index()
bond.head()

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9


In [25]:
bond.drop(columns=['Box Office', 'Budget']) # returns a new DataFrame without the Box Office/Budget columns (the original one is not changed)

bond.drop(index='Casino Royale')

bond.drop(index= ['No Time to Die', 'Casino Royale'], columns= ['Box Office', 'Budget']).head()

Unnamed: 0_level_0,Year,Actor,Director,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
A View to a Kill,1985,Roger Moore,John Glen,9.1
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,17.9
Dr. No,1962,Sean Connery,Terence Young,0.6
For Your Eyes Only,1981,Roger Moore,John Glen,


In [26]:
bond.pop('Actor') # mutating method: it removes the column from the original DataFrame and returns it as a Series
bond

Unnamed: 0_level_0,Year,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
A View to a Kill,1985,John Glen,275.2,54.5,9.1
Casino Royale,2006,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Lee Tamahori,465.4,154.2,17.9
Dr. No,1962,Terence Young,448.8,7.0,0.6
For Your Eyes Only,1981,John Glen,449.4,60.2,
From Russia with Love,1963,Terence Young,543.8,12.6,1.6
GoldenEye,1995,Martin Campbell,518.5,76.9,5.1
Goldfinger,1964,Guy Hamilton,820.4,18.6,3.2


In [27]:
del bond['Year']

In [29]:
bond.head()

Unnamed: 0_level_0,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
A View to a Kill,John Glen,275.2,54.5,9.1
Casino Royale,Martin Campbell,581.5,145.3,3.3
Casino Royale,Ken Hughes,315.0,85.0,
Diamonds Are Forever,Guy Hamilton,442.5,34.7,5.8
Die Another Day,Lee Tamahori,465.4,154.2,17.9


## Some Chat GPT Exercises

In [49]:
bond= pd.read_csv('jamesbond.csv', index_col= 'Film')
bond.head()

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [51]:
#1) Rename the index of the DataFrame from "Film" to "Movie Title".
bond.reset_index(inplace= True)

from_to= {'Film': 'Movie Title'}
bond.rename(columns= from_to, inplace= True)

bond.set_index('Movie Title', inplace= True)
bond.head()

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Movie Title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [52]:
bond.index.name

'Movie Title'

In [55]:
#1) (Answer) Another way to solve this problem (smarter one)
bond= pd.read_csv('jamesbond.csv', index_col= 'Film')
display(bond.head())

print()

bond.index.name= 'Movie Title'
display(bond.head())

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,





Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Movie Title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
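
A further option for exercise 1, sketched on a hand-built frame (an assumption standing in for `jamesbond.csv`): the `rename_axis` method returns a new **DataFrame** with the index name changed, which is handy when chaining methods.

```python
import pandas as pd

# Hand-built stand-in for two rows of the dataset (not read from the CSV)
films = pd.DataFrame(
    {"Year": [1962, 1963]},
    index=pd.Index(["Dr. No", "From Russia with Love"], name="Film"),
)

# Returns a new DataFrame; the original keeps its index name
renamed = films.rename_axis("Movie Title")

print(renamed.index.name)
```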


In [57]:
#2) Rename the column "Box Office" to "Revenue" and "Bond Actor Salary" to "Actor Salary".
columns_from_to= {'Box Office': 'Revenue', 'Bond Actor Salary': 'Actor Salary'}

bond.rename(columns= columns_from_to, inplace= True)
bond.head()

Unnamed: 0_level_0,Year,Actor,Director,Revenue,Budget,Actor Salary
Movie Title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [58]:
#3) Rename all columns to the following list: ["Year Released", "Main Actor", "Director", "Revenue", "Production Budget", "Actor Salary"].
bond= pd.read_csv('jamesbond.csv', index_col='Film')

bond.columns= ["Year Released", "Main Actor", "Director", "Revenue", "Production Budget", "Actor Salary"]
bond.head()

Unnamed: 0_level_0,Year Released,Main Actor,Director,Revenue,Production Budget,Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [63]:
#4) Drop the column "Budget" from the DataFrame.
bond= pd.read_csv('jamesbond.csv', index_col='Film')
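bond.drop(columns=['Budget']) # returns a new DataFrame; assign the result (or pass inplace= True) to make the change stick, which is why Budget still appears below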
bond.head()

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [66]:
#5) Drop the columns "Director" and "Bond Actor Salary".

bond.drop(columns= ['Director', 'Bond Actor Salary']).head() # inplace= True

Unnamed: 0_level_0,Year,Actor,Box Office,Budget
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Dr. No,1962,Sean Connery,448.8,7.0
From Russia with Love,1963,Sean Connery,543.8,12.6
Goldfinger,1964,Sean Connery,820.4,18.6
Thunderball,1965,Sean Connery,848.1,41.9
Casino Royale,1967,David Niven,315.0,85.0


In [75]:
#6) Drop the row corresponding to the film that is at index "Skyfall"
bond.drop(index= 'Skyfall').sort_index().head()

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9


In [88]:
bond[bond['Revenue'] < 300]

Unnamed: 0_level_0,Year,Actor,Director,Revenue,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
On Her Majesty's Secret Service,1969,George Lazenby,Peter R. Hunt,291.5,37.3,0.6
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Licence to Kill,1989,Timothy Dalton,John Glen,250.9,56.7,7.9


In [90]:
#7) Drop all rows where the "Box Office" (now renamed "Revenue") is less than 300 million.
#bond.rename(columns={'Box Office': 'Revenue'}, inplace= True) # run this once first: the filters above and below need bond to have a 'Revenue' column

revenue_smaller_than_300M= bond[bond['Revenue'] < 300].index
bond.drop(index= revenue_smaller_than_300M).sort_index().head()

# another way to solve this problem would be by defining a new dataframe containing the inverse filter (all the rows in which Revenue is greater than or equal to 300M)

Unnamed: 0_level_0,Year,Actor,Director,Revenue,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
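
The inverse-filter alternative mentioned in the comment of the cell above can be sketched as follows, on a hand-built frame with unique index labels (an assumption standing in for `jamesbond.csv`).

```python
import pandas as pd

# Hand-built stand-in with unique index labels (not read from the CSV)
films = pd.DataFrame(
    {"Revenue": [275.2, 442.5, 250.9, 465.4]},
    index=["A View to a Kill", "Diamonds Are Forever", "Licence to Kill", "Die Another Day"],
)

low_revenue = films["Revenue"] < 300

# Approach from the exercise: collect the labels, then drop them
dropped = films.drop(index=films[low_revenue].index)

# Inverse filter: keep the rows where the condition is False
kept = films[~low_revenue]

print(kept)
```

One difference worth keeping in mind: `drop` works by label, so with duplicated index labels (such as the two "Casino Royale" rows in the real dataset) it removes every row that shares a dropped label, whereas the Boolean filter decides row by row.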


In [92]:
#8) Rename the columns using the following dictionary: {"Year": "Release Year", "Actor": "Lead Actor", "Director": "Film Director"}

column_remap= {"Year": "Release Year", "Actor": "Lead Actor", "Director": "Film Director"}
bond.rename(columns= column_remap).sort_index().head()

Unnamed: 0_level_0,Release Year,Lead Actor,Film Director,Revenue,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9


In [None]:
#9) Drop any rows in the DataFrame that contain missing or NaN values.
print(bond.shape)
aux= bond.dropna(how='any') # or pass inplace= True to modify bond directly
print(aux.shape)

(27, 6)
(20, 6)
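
As a related sketch, `dropna` also takes a `subset` parameter that limits the missing-value check to specific columns. The frame below is hand-built from rows shown in this notebook (an assumption standing in for `jamesbond.csv`).

```python
import pandas as pd

nan = float("nan")

# Hand-built stand-in: two films have no recorded salary (not read from the CSV)
films = pd.DataFrame(
    {"Budget": [85.0, 60.2, 7.0], "Bond Actor Salary": [nan, nan, 0.6]},
    index=["Casino Royale", "For Your Eyes Only", "Dr. No"],
)

any_missing = films.dropna(how="any")           # drops rows with a NaN in any column
budget_only = films.dropna(subset=["Budget"])   # only checks the Budget column

print(any_missing.shape, budget_only.shape)
```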


In [100]:
#10) Reset the index and convert the old index (Film titles) into a new column called "Film Title".
bond.reset_index(inplace=True)
bond.rename(columns={'Film': 'Film Title'}, inplace=True)
bond.head()

Unnamed: 0,Film Title,Year,Actor,Director,Revenue,Budget,Bond Actor Salary
0,Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
1,From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
2,Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
3,Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
4,Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


## Create Random Sample with the sample Method
- The `sample` method returns one or more randomly selected rows from the **DataFrame**.
- Customize the `axis` parameter to extract random columns.
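
Two further parameters of `sample` are often useful: `random_state` makes the draw reproducible and `frac` requests a fraction of the rows instead of a count. The sketch below uses a hand-built frame (an assumption standing in for `jamesbond.csv`), and the seed value `0` is arbitrary.

```python
import pandas as pd

# Hand-built stand-in for five rows of the dataset (not read from the CSV)
films = pd.DataFrame(
    {"Year": [1962, 1963, 1964, 1965, 1967]},
    index=["Dr. No", "From Russia with Love", "Goldfinger", "Thunderball", "Casino Royale"],
)

# The same seed yields the same rows on every call
first_draw = films.sample(n=2, random_state=0)
second_draw = films.sample(n=2, random_state=0)

# frac=0.4 asks for 40% of the 5 rows, i.e. 2 rows
two_fifths = films.sample(frac=0.4, random_state=0)

print(first_draw.equals(second_draw), len(two_fifths))
```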

In [101]:
bond= pd.read_csv('jamesbond.csv', index_col= 'Film').sort_index()
bond.head()

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9


In [107]:
len(bond)

27

In [113]:
bond.sample()
bond.sample(n= 5) # extracting 5 entire random rows from the data frame
bond.sample(n= 3, axis= 'rows') # extracting 3 random rows from the data frame ('rows' value is the method default)
bond.sample(n= 3, axis= 'columns').head() # extracting 3 random columns from the data frame

Unnamed: 0_level_0,Bond Actor Salary,Box Office,Director
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
A View to a Kill,9.1,275.2,John Glen
Casino Royale,3.3,581.5,Martin Campbell
Casino Royale,,315.0,Ken Hughes
Diamonds Are Forever,5.8,442.5,Guy Hamilton
Die Another Day,17.9,465.4,Lee Tamahori


## The nsmallest and nlargest Methods
- The `nlargest` method returns a specified number of rows with the largest values from a given column.
- The `nsmallest` method returns rows with the smallest values from a given column.
- The `nlargest` and `nsmallest` methods are more efficient than sorting the entire **DataFrame**.
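
A small sketch comparing `nlargest` with the sort-then-`head` approach, on a hand-built frame without tied values (an assumption standing in for `jamesbond.csv`).

```python
import pandas as pd

# Hand-built stand-in, deliberately not in sorted order (not read from the CSV)
films = pd.DataFrame(
    {"Box Office": [820.4, 275.2, 943.5, 774.2, 848.1]},
    index=["Goldfinger", "A View to a Kill", "Skyfall", "No Time to Die", "Thunderball"],
)

# Both expressions select the three highest-grossing rows
top_three = films.nlargest(3, columns="Box Office")
sorted_three = films.sort_values(by="Box Office", ascending=False).head(3)

# The Series version returns a Series instead of a DataFrame
lowest = films["Box Office"].nsmallest(1)

print(top_three)
```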

In [114]:
bond= pd.read_csv('jamesbond.csv', index_col= 'Film').sort_index()
bond.head()

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9


In [115]:
# Retrieve the 4 films with the highest box office gross
bond.sort_values(by='Box Office', ascending= False).head(4)

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Skyfall,2012,Daniel Craig,Sam Mendes,943.5,170.2,14.5
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
No Time to Die,2021,Daniel Craig,Cary Joji Fukunaga,774.2,301.0,25.0


In [117]:
# Getting the very same result
bond.nlargest(4, columns='Box Office')
bond['Box Office'].nlargest(4) # (in this case we're applying the method on the series)

Film
Skyfall           943.5
Thunderball       848.1
Goldfinger        820.4
No Time to Die    774.2
Name: Box Office, dtype: float64

In [119]:
# 3 rows with the smallest value in Bond Actor Salary
bond.nsmallest(3, columns='Bond Actor Salary')
bond['Bond Actor Salary'].nsmallest(3)

Film
Dr. No                             0.6
On Her Majesty's Secret Service    0.6
From Russia with Love              1.6
Name: Bond Actor Salary, dtype: float64

## Filtering with the where Method
- Similar to square brackets or `loc`, the `where` method filters the original **DataFrame** with a Boolean Series.
- Pandas will populate rows that do **not** match the criteria with `NaN` values.
- Leaving in the `NaN` values can be advantageous for certain merge and visualization operations.
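
The behaviour described above can be sketched on a hand-built frame (an assumption standing in for `jamesbond.csv`). `where` keeps the original shape, and dropping the all-`NaN` rows afterwards recovers the ordinary filter.

```python
import pandas as pd

# Hand-built stand-in for three rows of the dataset (not read from the CSV)
films = pd.DataFrame(
    {
        "Actor": ["Sean Connery", "Roger Moore", "Sean Connery"],
        "Box Office": [448.8, 275.2, 820.4],
    },
    index=["Dr. No", "A View to a Kill", "Goldfinger"],
)

is_connery = films["Actor"] == "Sean Connery"

# Same shape as films; rows failing the condition are filled with NaN
masked = films.where(is_connery)

# Dropping rows that are entirely NaN leaves only the matching rows
filtered = masked.dropna(how="all")

print(masked)
```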

In [120]:
bond= pd.read_csv('jamesbond.csv', index_col= 'Film').sort_index()
bond.head()

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9


In [122]:
actor_is_sean_connery= bond['Actor'] == 'Sean Connery'
bond[actor_is_sean_connery]
bond.loc[actor_is_sean_connery] # equivalent to the previous one, but more useful when we want to overwrite values
bond.where(actor_is_sean_connery)

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,,,,,,
Casino Royale,,,,,,
Casino Royale,,,,,,
Diamonds Are Forever,1971.0,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,,,,,,
Dr. No,1962.0,Sean Connery,Terence Young,448.8,7.0,0.6
For Your Eyes Only,,,,,,
From Russia with Love,1963.0,Sean Connery,Terence Young,543.8,12.6,1.6
GoldenEye,,,,,,
Goldfinger,1964.0,Sean Connery,Guy Hamilton,820.4,18.6,3.2


## The apply Method with DataFrames
- The `apply` method invokes a function on every column or every row in the **DataFrame**.
- Pass the uninvoked function as the first argument to the `apply` method.
- Pass the `axis` parameter an argument of `"columns"` to invoke the function on every row.
- Pandas will pass in the row's values as a **Series** object. We can use accessors like `loc` and `iloc` to extract the column's values for that row.
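
Both directions of `apply` can be sketched on a hand-built numeric frame (an assumption standing in for `jamesbond.csv`). The lambda functions are illustrative only.

```python
import pandas as pd

# Hand-built stand-in for two rows of the dataset (not read from the CSV)
films = pd.DataFrame(
    {"Box Office": [448.8, 543.8], "Budget": [7.0, 12.6]},
    index=["Dr. No", "From Russia with Love"],
)

# Default axis: the function receives each column as a Series
column_max = films.apply(lambda column: column.max())

# axis="columns": the function receives each row as a Series
# whose labels are the column names
margin = films.apply(
    lambda row: row.loc["Box Office"] - row.loc["Budget"], axis="columns"
)

print(column_max)
print(margin)
```

The first result is indexed by column name, the second by film title.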

In [124]:
bond= pd.read_csv('jamesbond.csv', index_col= 'Film').sort_index()
bond.head()

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9


In [133]:
# MOVIE RANKING SYSTEM
#
# CONDITION             --> DESIGNATION
# 80s Movie             --> "Great 80's flick"
# Pierce Brosnan        --> "The best Bond ever"
# Budget > 100          --> "Expensive movie, fun"
# Others                --> "No comment"
#

# We already saw that calling the 'apply' method on a Series applies a function (either user-defined or Python built-in) to each of its values. Now we're going to look at the apply method on the DataFrame itself.

# In general, it applies the function along one axis. We can choose to pass the function each row (a Series labelled by the column names) or each column (a Series labelled by the index labels); either way the function receives a whole Series, not a single value.

def rank_movie(row):
    print(row)

bond.apply(rank_movie, axis= 'columns').head(0) # axis= 'columns' passes one row at a time to the function

# (for each row we get a Series with the column values and their corresponding labels)

Year                        1985
Actor                Roger Moore
Director               John Glen
Box Office                 275.2
Budget                      54.5
Bond Actor Salary            9.1
Name: A View to a Kill, dtype: object
Year                            2006
Actor                   Daniel Craig
Director             Martin Campbell
Box Office                     581.5
Budget                         145.3
Bond Actor Salary                3.3
Name: Casino Royale, dtype: object
Year                        1967
Actor                David Niven
Director              Ken Hughes
Box Office                 315.0
Budget                      85.0
Bond Actor Salary            NaN
Name: Casino Royale, dtype: object
Year                         1971
Actor                Sean Connery
Director             Guy Hamilton
Box Office                  442.5
Budget                       34.7
Bond Actor Salary             5.8
Name: Diamonds Are Forever, dtype: object
Year                        

Series([], dtype: object)

In [144]:
def rank_movie(row):
    # for a given row, we have a series with column names as labels
    
    if row.loc['Year'] >= 1980 and row.loc['Year'] < 1990:
        return "Great 80's flick"

    if row.loc['Actor'] == 'Pierce Brosnan':
        return 'The best Bond ever'

    if row.loc['Budget'] > 100:
        return 'Expensive movie, fun'
    
    return 'No comment'

bond.apply(rank_movie, axis= 'columns') # or 1


Film
A View to a Kill                       Great 80's flick
Casino Royale                      Expensive movie, fun
Casino Royale                                No comment
Diamonds Are Forever                         No comment
Die Another Day                      The best Bond ever
Dr. No                                       No comment
For Your Eyes Only                     Great 80's flick
From Russia with Love                        No comment
GoldenEye                            The best Bond ever
Goldfinger                                   No comment
Licence to Kill                        Great 80's flick
Live and Let Die                             No comment
Moonraker                                    No comment
Never Say Never Again                  Great 80's flick
No Time to Die                     Expensive movie, fun
Octopussy                              Great 80's flick
On Her Majesty's Secret Service              No comment
Quantum of Solace                  Expensiv

In [145]:
# and we can use it to define new columns
bond['Movie Rank']= bond.apply(rank_movie, axis= 'columns') 
bond.head()

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary,Movie Rank
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1,Great 80's flick
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3,"Expensive movie, fun"
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,,No comment
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8,No comment
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9,The best Bond ever


## Some Chat GPT exercises

In [148]:
#1) Apply a function to the "Box Office" column to convert all the values into billions (i.e., divide each value by 1,000).

def box_office_converter(x):
    return x/1000

bond['Box Office'].apply(box_office_converter).head()

Film
A View to a Kill        0.2752
Casino Royale           0.5815
Casino Royale           0.3150
Diamonds Are Forever    0.4425
Die Another Day         0.4654
Name: Box Office, dtype: float64
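
For simple arithmetic like this, a plain vectorized expression gives the same values without `apply`. A minimal sketch on a hand-built Series (an assumption standing in for the real column):

```python
import pandas as pd

# Hand-built stand-in for part of the Box Office column (not read from the CSV)
gross = pd.Series(
    [275.2, 581.5, 442.5],
    index=["A View to a Kill", "Casino Royale", "Diamonds Are Forever"],
    name="Box Office",
)

via_apply = gross.apply(lambda value: value / 1000)  # one function call per value
vectorized = gross / 1000                            # one operation on the whole Series

print(vectorized)
```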

In [158]:
#2) Create a new column called "Revenue to Budget Ratio" by applying a function across each row to divide "Revenue" by "Budget".
import numpy as np
def values_ratio(row):
    # guard against a zero or missing budget, where the ratio is undefined
    if row['Budget'] == 0 or np.isnan(row['Budget']):
        return np.nan
    return row['Box Office']/row['Budget']

bond.apply(values_ratio, axis= 1) # axis 1 ('columns'): the function receives each row // axis 0 ('index'): the function receives each column

Film
A View to a Kill                    5.049541
Casino Royale                       4.002065
Casino Royale                       3.705882
Diamonds Are Forever               12.752161
Die Another Day                     3.018158
Dr. No                             64.114286
For Your Eyes Only                  7.465116
From Russia with Love              43.158730
GoldenEye                           6.742523
Goldfinger                         44.107527
Licence to Kill                     4.425044
Live and Let Die                   14.944805
Moonraker                           5.846995
Never Say Never Again               4.418605
No Time to Die                      2.572093
Octopussy                           6.935065
On Her Majesty's Secret Service     7.815013
Quantum of Solace                   2.834620
Skyfall                             5.543478
Spectre                             3.522540
The Living Daylights                4.556686
The Man with the Golden Gun        12.057762
The S

In [161]:
#3) Create a new column called "High Revenue" where the value is "Yes" if the "Revenue" is greater than 300 million, and "No" otherwise.

def high_revenue(x):
    if x['Box Office'] > 300:
        return 'Yes'
    else:
        return 'No'
    
bond['Revenue Rank']= bond.apply(high_revenue, axis= 'columns')
bond[['Box Office', 'Revenue Rank']].head()

Unnamed: 0_level_0,Box Office,Revenue Rank
Film,Unnamed: 1_level_1,Unnamed: 2_level_1
A View to a Kill,275.2,No
Casino Royale,581.5,Yes
Casino Royale,315.0,Yes
Diamonds Are Forever,442.5,Yes
Die Another Day,465.4,Yes
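
A two-way label like this can also be produced without a row-wise function, using NumPy's `np.where`. The sketch below is an alternative under the assumption of a hand-built frame (not the real dataset), and it compares the result with the `apply` approach.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for three rows of the dataset (not read from the CSV)
films = pd.DataFrame(
    {"Box Office": [275.2, 442.5, 465.4]},
    index=["A View to a Kill", "Diamonds Are Forever", "Die Another Day"],
)

# Row-wise version, as in the exercise
row_wise = films.apply(
    lambda row: "Yes" if row["Box Office"] > 300 else "No", axis="columns"
)

# Vectorized version: pick "Yes" where the condition holds, "No" elsewhere
vectorized = pd.Series(
    np.where(films["Box Office"] > 300, "Yes", "No"), index=films.index
)

print(vectorized)
```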


In [163]:
#4) Apply a function that sums "Revenue" and "Budget" for each row and creates a new column "Total" with the sum of these two columns.

bond['Total']= bond.apply(lambda row: row['Box Office'] + row['Budget'], axis= 'columns')
bond[['Box Office', 'Budget', 'Total']].head()

Unnamed: 0_level_0,Box Office,Budget,Total
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
A View to a Kill,275.2,54.5,329.7
Casino Royale,581.5,145.3,726.8
Casino Royale,315.0,85.0,400.0
Diamonds Are Forever,442.5,34.7,477.2
Die Another Day,465.4,154.2,619.6


In [166]:
#5) Create a new column "Actor Salary Adjustment" by applying a lambda function where the "Actor Salary" is increased by 20% if the "Year" is after 2012, otherwise, no adjustment.

bond['Actor Salary Adjustment']= bond.apply(lambda x: x['Bond Actor Salary']*(1+0.2) if x['Year'] > 2012 else x['Bond Actor Salary'], axis= 'columns' )
bond.sort_values(by='Year', ascending= False)

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary,Movie Rank,Revenue Rank,Total,Actor Salary Adjustment
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
No Time to Die,2021,Daniel Craig,Cary Joji Fukunaga,774.2,301.0,25.0,"Expensive movie, fun",Yes,1075.2,30.0
Spectre,2015,Daniel Craig,Sam Mendes,726.7,206.3,30.0,"Expensive movie, fun",Yes,933.0,36.0
Skyfall,2012,Daniel Craig,Sam Mendes,943.5,170.2,14.5,"Expensive movie, fun",Yes,1113.7,14.5
Quantum of Solace,2008,Daniel Craig,Marc Forster,514.2,181.4,8.1,"Expensive movie, fun",Yes,695.6,8.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3,"Expensive movie, fun",Yes,726.8,3.3
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9,The best Bond ever,Yes,619.6,17.9
The World Is Not Enough,1999,Pierce Brosnan,Michael Apted,439.5,158.3,13.5,The best Bond ever,Yes,597.8,13.5
Tomorrow Never Dies,1997,Pierce Brosnan,Roger Spottiswoode,463.2,133.9,10.0,The best Bond ever,Yes,597.1,10.0
GoldenEye,1995,Pierce Brosnan,Martin Campbell,518.5,76.9,5.1,The best Bond ever,Yes,595.4,5.1
Licence to Kill,1989,Timothy Dalton,John Glen,250.9,56.7,7.9,Great 80's flick,No,307.6,7.9
