![DSB logo](img/Dolan.jpg)
# Apply Functions to Your DataFrame

## PD4E Chapter 9: Apply
### How do you read/manipulate/store data in Python?

# What You Learned in Python/Pandas that could Apply Here

You will need the following knowledge from the first half of this course:
1. functions
2. subsetting/slicing data
3. Loops

# What You will Learn in this Chapter

You will learn the following techniques in this chapter:
1. how to apply functions to columns, rows, or the whole DataFrame
2. the different use cases of `.apply()`, `.map()`, and `.applymap()`
3. `lambda` - nameless functions that need no separate definition

# Review of Functions

- Functions are __reusable__ code blocks 
    - where we group some statements together
- In `pandas`, we use functions a lot, particularly in the data preprocessing step
    - e.g., write a function to calculate some values, then use it on all applicable columns for consistency
- Functions can be categorized as _fruitful_ and _void_
    - here we mostly care about _fruitful_ functions

In [1]:
# example of a fruitful function
def avg_2(x, y = 10):
    return (x + y) / 2

avg_2(4)

7.0
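
To make the _fruitful_ versus _void_ distinction concrete, here is a minimal sketch. The name `print_avg_2` is made up for this illustration and is not part of the lecture code.

```python
# a fruitful function returns a value that you can store and reuse
def avg_2(x, y=10):
    return (x + y) / 2

# a void function only has a side effect (here: printing)
# and implicitly returns None
def print_avg_2(x, y=10):  # hypothetical helper for this illustration
    print((x + y) / 2)

fruitful_result = avg_2(4)    # the value 7.0 is stored
void_result = print_avg_2(4)  # 7.0 is printed, but nothing is stored

print(fruitful_result, void_result)
```

Only a fruitful function gives `.apply()` something to collect, which is why we mostly care about them in this chapter.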

# Why `.apply()`?

- when you want to use a function on a DataFrame, calling the function directly on it, or on its columns, will often work
    - but sometimes it does not work as we expect
- think of `.apply()` as the `pandas` way of calling functions
    - note that you still have to define your function 

In [2]:
# an example
import pandas as pd

df1=pd.DataFrame({'a':[10,20,30],
                 'b':[20,30,40]})
df1

Unnamed: 0,a,b
0,10,20
1,20,30
2,30,40


In [3]:
# function def. - calculate square
def my_sq(x):
    return x ** 2

In [4]:
# let's try calling the function the normal way
my_sq(df1['a'])

0    100
1    400
2    900
Name: a, dtype: int64

In [5]:
# how about the whole DF?
my_sq(df1)

Unnamed: 0,a,b
0,100,400
1,400,900
2,900,1600


# What happened above?

- Looks like we can call the function (`my_sq()`) the normal way, and it does work on either a column or the whole DF
- Now why do we need `.apply()`?
    - we know functions can take arguments, maybe it does not work with arguments?
    - Look at the example below

In [6]:
# function def. - raise x to the power e
def my_exp(x, e):
    return x ** e

In [7]:
my_exp(2, 3)

8

In [8]:
# it appears that taking parameters is not a problem
# let's come back to the 'why' part later
my_exp(df1['a'], 2)

0    100
1    400
2    900
Name: a, dtype: int64

# How Does `.apply()` Work?

- `.apply()` is, at its core, a Series method 
    - which means we can natively _apply_ a function to a Series (column)
    - `.apply()` calls the function on every element in the Series
        - and returns the results as a Series of the same length

In [9]:
sq = df1['a'].apply(my_sq)
sq

0    100
1    400
2    900
Name: a, dtype: int64

In [10]:
cb = df1['a'].apply(my_exp, e=2)
cb

0    100
1    400
2    900
Name: a, dtype: int64

In [11]:
cb1 = []
for v in df1['a'].values:
    #print(v)
    cb1.append(my_exp(v, 2))
pd.Series(cb1)

0    100
1    400
2    900
dtype: int64

# What happened above?

- as you saw in these examples, `.apply()` works as if it had a `for` loop embedded
    - the function (e.g., `my_exp()`) is called on each value in the Series (`df1['a']`)
    - and the return values are automatically collected into a `pandas.Series`
- this is how we avoid writing explicit `for` loops in `pandas`
    - as we said before, `for` loops are expensive, so avoid them whenever you can
    - this is the first benefit of using `.apply()`
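
If you want to check the cost of loops yourself, the rough sketch below times three ways of squaring a Series with the standard-library `timeit` module. The Series size and the helper names (`with_loop`, `with_apply`, `vectorized`) are arbitrary choices for this illustration, and the timings depend on your machine and pandas version, so no numbers are shown here.

```python
import timeit

import pandas as pd

s = pd.Series(range(10_000))

def my_sq(x):
    return x ** 2

def with_loop():
    # explicit Python loop over the values
    return pd.Series([my_sq(v) for v in s.values])

def with_apply():
    # let pandas call the function on each element
    return s.apply(my_sq)

def vectorized():
    # operate on the whole Series at once
    return s ** 2

for fn in (with_loop, with_apply, vectorized):
    seconds = timeit.timeit(fn, number=10)
    print(f"{fn.__name__}: {seconds:.4f} s for 10 runs")
```

All three produce the same values. When a whole-Series operation such as `s ** 2` exists, it is typically the fastest option, and `.apply()` is the fallback for logic that cannot be expressed that way.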

In [12]:
# we can do the same to a DF
# note one difference between `.apply()` and the regular function call:
# in `.apply()`, extra arguments are passed by keyword (`e=2`) or through `args=(2,)`
df1.apply(my_exp, e=2)

Unnamed: 0,a,b
0,100,400
1,400,900
2,900,1600
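
As a small sketch of the two ways to hand extra arguments to the applied function, the block below rebuilds `df1` and `my_exp` from above and passes the exponent once by keyword and once through the `args` tuple. The variable names on the left are only for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [10, 20, 30],
                    'b': [20, 30, 40]})

def my_exp(x, e):
    return x ** e

# extra argument passed by keyword
by_keyword = df1['a'].apply(my_exp, e=2)

# extra argument passed by position, through the `args` tuple
by_position = df1['a'].apply(my_exp, args=(2,))

# the same `args` tuple works when applying to the whole DataFrame
df_by_position = df1.apply(my_exp, args=(2,))

print(by_keyword.equals(by_position))
print(df_by_position)
```

Passing by keyword is usually easier to read, since the call site shows which parameter receives the value.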


In [13]:
# but if you apply a function that requires more than one input,
# it will raise an error - see this example

# this function takes three inputs
def avg_3(x, y, z):
    return (x + y + z) / 3

In [14]:
# `.apply()` passes each column to the function as a single argument (`x`),
# so `y` and `z` are never supplied and this raises an error
df1.apply(avg_3)

TypeError: ("avg_3() missing 2 required positional arguments: 'y' and 'z'", 'occurred at index a')

In [30]:
# consider the logic above - maybe we want to take the average of each column?
# we can rewrite the function like below
def avg_3_apply(col):
    x = col[0]
    y = col[1]
    z = col[2]
    return (x + y + z) / 3

In [31]:
# now it works
df1.apply(avg_3_apply)

a    20.0
b    30.0
dtype: float64

In [32]:
# Your Turn Here

# Explain why the above code works.
# It works because `.apply()` passes each column (`a`, then `b`) to `avg_3_apply` as a Series, and each column has exactly three rows for the function to reference.

# `.apply()` Works on Columns Natively

- The above example shows an important choice
    - do you want to apply the function to each column or to each row?
    - natively, `.apply()` works on columns (`axis=0`, the default)
    - but you can change that by adding the argument `axis=1` so it applies on _rows_

- note that in `.apply()`, `axis=0` passes each column to the function, and `axis=1` passes each row
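
One way to see what `.apply()` hands to your function under each `axis` value is to return the labels of the Series it receives. This is a minimal sketch, and `describe` is a made-up helper for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [10, 20, 30],
                    'b': [20, 30, 40]})

def describe(values):
    # `values` is the Series that .apply() passes in;
    # its index labels tell us whether it is a column or a row
    return ", ".join(str(label) for label in values.index)

per_column = df1.apply(describe, axis=0)  # one call per column
per_row = df1.apply(describe, axis=1)     # one call per row

print(per_column)
print(per_row)
```

With `axis=0` each call sees the row labels `0, 1, 2` (one whole column), while with `axis=1` each call sees the column labels `a, b` (one whole row).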

In [33]:
df1.apply(avg_3_apply, axis=1)

IndexError: ('index out of bounds', 'occurred at index 0')

In [34]:
df1.apply(avg_3_apply, axis=0)

a    20.0
b    30.0
dtype: float64

In [35]:
# another example
def avg_2_apply(row):
    # use .iloc for position-based access, since the row is labeled by column name
    x = row.iloc[0]
    y = row.iloc[1]
    return (x + y) / 2

In [36]:
df1.apply(avg_2_apply, axis=1)

0    15.0
1    25.0
2    35.0
dtype: float64

In [37]:
# another way of doing this - note that this is much more expensive than `.apply()`
for index, row in df1.iterrows(): # `.iterrows()` iterate through rows in a DF
    # print(index, row)
    # break
    # index is the index value of the row
    print(index, avg_2_apply(row))

0 15.0
1 25.0
2 35.0


# A More Complex Example of `.apply()`

- So far we have been playing with a very simple DF
- We actually use `.apply()` for more complicated use cases
    - e.g., testing the _missingness_ in a dataset

In [38]:
# load a dataset
# the `titanic` dataset is one of the most popular datasets in analytics
import seaborn as sns
titanic = sns.load_dataset('titanic')
titanic.head()

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.25,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.925,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1,S,First,woman,False,C,Southampton,yes,False
4,0,3,male,35.0,0,0,8.05,S,Third,man,True,,Southampton,no,True


In [39]:
# in lecture 9, we had a way of calculating missingness
# count of missing values by column
titanic.isna().sum()

survived         0
pclass           0
sex              0
age            177
sibsp            0
parch            0
fare             0
embarked         2
class            0
who              0
adult_male       0
deck           688
embark_town      2
alive            0
alone            0
dtype: int64

In [40]:
# we can also calculate the ratio of missing values
(titanic.isna().sum()/titanic.shape[0]).round(4) * 100

survived        0.00
pclass          0.00
sex             0.00
age            19.87
sibsp           0.00
parch           0.00
fare            0.00
embarked        0.22
class           0.00
who             0.00
adult_male      0.00
deck           77.22
embark_town     0.22
alive           0.00
alone           0.00
dtype: float64

In [41]:
# here we count with np.sum(); calling the `.sum()` method on `null_col`
# (as we did above) would give the same result
import numpy as np
def count_missing(col):
    """Counts the number of missing values in a column
    """
    null_col = pd.isna(col)
    null_count = np.sum(null_col)
    return null_count

In [42]:
cmis_col = titanic.apply(count_missing)
cmis_col

survived         0
pclass           0
sex              0
age            177
sibsp            0
parch            0
fare             0
embarked         2
class            0
who              0
adult_male       0
deck           688
embark_town      2
alive            0
alone            0
dtype: int64

In [43]:
cmis_row = titanic.apply(count_missing, axis=1)
cmis_row

0      1
1      0
2      1
3      0
4      1
      ..
886    1
887    0
888    2
889    0
890    1
Length: 891, dtype: int64
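
To see the column-wise and row-wise counts side by side without loading the full dataset, here is a sketch on a tiny made-up DataFrame (`demo` and its values are invented for this illustration). The helper here calls the `.sum()` method on the boolean result, which behaves the same as `np.sum()` in this case.

```python
import numpy as np
import pandas as pd

# hypothetical toy data with a few missing values
demo = pd.DataFrame({'age': [22.0, np.nan, 26.0],
                     'deck': [np.nan, np.nan, 'C']})

def count_missing(values):
    """Counts the missing values in the Series that .apply() passes in."""
    return pd.isna(values).sum()

by_col = demo.apply(count_missing)          # one count per column
by_row = demo.apply(count_missing, axis=1)  # one count per row

print(by_col)
print(by_row)
```

The results match `demo.isna().sum()` and `demo.isna().sum(axis=1)`; the function is the same, and only the `axis` argument decides whether it receives columns or rows.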

# Your Turn Here

Please explain the results of the above code block.
#Similar to finding the missing value count with `titanic.isna().sum()` earlier, we now instead apply a defined function across the titanic DataFrame. The defined function `count_missing` finds the missing values and sums them: per column by default, and per row with `axis=1`.

# Lambda Functions

- Regular Python functions require a definition and a name
- Sometimes the function is so simple that it does not deserve a definition and a name
- we call these anonymous functions, written with the keyword __lambda__
    - a _lambda_ has no name; it takes _arguments_ and specifies an _expression_ (usually a one-liner)
    - you do not have to write `return` - the value of the expression is returned automatically
- a __lambda__ has the following structure:

```python
lambda arguments: expression
```

In [44]:
# same as the my_exp function earlier
exp_lambda = lambda x, y: x ** y
exp_lambda(2, 3)

8

In [45]:
exp_lambda(3, 2)

9

In [46]:
exp_lambda(df1['a'], 2)

0    100
1    400
2    900
Name: a, dtype: int64

# When is the Best Time to use `lambda`?

- `lambda` is particularly useful when you deal with _lists_, _Series_, and _DataFrames_
    - In particular, when we need to transform a column in a DF
- the expression in a `lambda` has to be simple enough
    - if the operation is complex, you can define it in a function, and call that function from a `lambda` (or pass it to `.apply()` directly)
    - if the operation contains `if` statements or `for` loops, you should consider using a function rather than a `lambda`
- Being able to use `lambda` is one of the biggest benefits of using `.apply()`
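
As a sketch of the guideline above applied to a plain Python list: short one-line expressions fit well in a `lambda`, while branching logic reads better as a named function. The titles and the `describe_length` helper are chosen only for this illustration.

```python
titles = ['The Lost City of Z', 'Prometheus', 'Guardians of the Galaxy']

# simple one-liners: a lambda is enough
by_word_count = sorted(titles, key=lambda t: len(t.split()))
word_counts = list(map(lambda t: len(t.split()), titles))

# several branches: a named function is easier to read and test
def describe_length(t):  # hypothetical helper for this illustration
    n = len(t.split())
    if n >= 4:
        return 'long'
    elif n >= 2:
        return 'medium'
    else:
        return 'short'

labels = [describe_length(t) for t in titles]

print(by_word_count)
print(word_counts)
print(labels)
```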

In [47]:
# this is how you apply lambda to a column
df1['a'].apply(exp_lambda, y=2)

0    100
1    400
2    900
Name: a, dtype: int64

In [48]:
# an even easier way 
# you do not need any definition or function name
df1['a'].apply(lambda x: x**2)

0    100
1    400
2    900
Name: a, dtype: int64

In [49]:
# a more complex function (it contains an `if` statement)
def my_gender(x):
    if x == 'female':
        return 'f'
    else:
        return 'm'

In [50]:
# using lambda
titanic['sex'].apply(lambda x: my_gender(x)).head()

0    m
1    f
2    f
3    f
4    m
Name: sex, dtype: object

In [51]:
# equivalent of above
titanic['sex'].apply(my_gender).head()

0    m
1    f
2    f
3    f
4    m
Name: sex, dtype: object

# Other Ways to Use Functions in `pandas`

- `.map()` is another method
- the difference between `.map()` and `.apply()` is that 
    - `.map()` works on a single Series (column), one element at a time
    - `.apply()` can work on a Series or on the whole DataFrame
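
One thing `Series.map()` can do that the examples in this chapter do not show is take a dictionary instead of a function. A minimal sketch on a made-up Series (`sex` and its values are invented here to mirror the `titanic['sex']` column):

```python
import pandas as pd

# hypothetical stand-in for the titanic['sex'] column
sex = pd.Series(['male', 'female', 'female', 'male'], name='sex')

# map each value through a lookup dictionary
with_dict = sex.map({'female': 'f', 'male': 'm'})

# the same result with a function, as .apply() would do it
with_func = sex.map(lambda x: 'f' if x == 'female' else 'm')

print(with_dict.tolist())
print(with_func.tolist())
```

Keep in mind that values missing from the dictionary become `NaN`, whereas the `lambda` version sends everything that is not `'female'` to `'m'`.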

In [52]:
df1.apply(lambda x: x**2)

Unnamed: 0,a,b
0,100,400
1,400,900
2,900,1600


In [53]:
# this will cause an error in the pandas version used here
# (pandas 2.1 and later add an element-wise `DataFrame.map()`)
df1.map(lambda x: x**2)

AttributeError: 'DataFrame' object has no attribute 'map'

# Other Ways to Use Functions in `pandas`

- since `.map()` is limited to one column, it cannot transform a whole DF
- but we have a hybrid method `.applymap()`
    - which is the combination of `.map()` and `.apply()`
    - the reason for using `.applymap()` is that it calls the function on every single element, like `.map()`, and also works on the whole DF, like `.apply()`
    - note that pandas 2.1 and later deprecate `.applymap()` in favor of `DataFrame.map()`

In [54]:
df1.applymap(lambda x: x**2)

Unnamed: 0,a,b
0,100,400
1,400,900
2,900,1600
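
Which element-wise DataFrame method is available depends on your pandas version: releases from 2.1 onward provide `DataFrame.map()` and deprecate `.applymap()`. The sketch below is one possible way to write code that runs on either; the `hasattr` check is a design choice for this illustration, not something the lecture prescribes.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [10, 20, 30],
                    'b': [20, 30, 40]})

def square(x):
    return x ** 2

# prefer DataFrame.map() where it exists, fall back to .applymap() otherwise
if hasattr(pd.DataFrame, 'map'):
    elementwise = df1.map(square)
else:
    elementwise = df1.applymap(square)

print(elementwise)
```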


# Popular Use Cases of `.apply()` and `lambda`

- We use the combination of `.apply()` and `lambda` in `pandas` when we are dealing with these scenarios
    - creating a new column based on an existing column
    - filtering a DataFrame (selecting a subset of rows)
    - extracting data from a column

In [55]:
# let's read a dataset as an example
# please change your PATH to `/srv/data/my_shared_data_folder/ba505-data/IMDB-Movie-Data.csv`
imdb_data = pd.read_csv('/srv/data/my_shared_data_folder/ba505-data/IMDB-Movie-Data.csv')
#imdb_data = pd.read_csv('./data/IMDB-Movie-Data.csv')
imdb_data.head(2)


Unnamed: 0,Rank,Title,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
0,1,Guardians of the Galaxy,"Action,Adventure,Sci-Fi",A group of intergalactic criminals are forced ...,James Gunn,"Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...",2014,121,8.1,757074,333.13,76.0
1,2,Prometheus,"Adventure,Mystery,Sci-Fi","Following clues to the origin of mankind, a te...",Ridley Scott,"Noomi Rapace, Logan Marshall-Green, Michael Fa...",2012,124,7.0,485820,126.46,65.0


In [56]:
# we can calculate the average rating of a movie
# by averaging the `Rating` and a tenth of the `Metascore`
imdb_data['AvgRating'] = (imdb_data['Rating'] + imdb_data['Metascore']/10)/2
imdb_data['AvgRating'].head()

0    7.85
1    6.75
2    6.75
3    6.55
4    5.10
Name: AvgRating, dtype: float64

In [57]:
# we can filter the DF by the values of a certain column
# say we want to filter the `imdb_data` by the `Title` column
# if the title contains 4 or more words then we select the row

long_title_movie_data = imdb_data[imdb_data['Title'].apply(lambda x: len(x.split())>=4)]
long_title_movie_data.head(3)

Unnamed: 0,Rank,Title,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore,AvgRating
0,1,Guardians of the Galaxy,"Action,Adventure,Sci-Fi",A group of intergalactic criminals are forced ...,James Gunn,"Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...",2014,121,8.1,757074,333.13,76.0,7.85
8,9,The Lost City of Z,"Action,Adventure,Biography","A true-life drama, centering on British explor...",James Gray,"Charlie Hunnam, Robert Pattinson, Sienna Mille...",2016,141,7.1,7188,8.01,78.0,7.45
10,11,Fantastic Beasts and Where to Find Them,"Adventure,Family,Fantasy",The adventures of writer Newt Scamander in New...,David Yates,"Eddie Redmayne, Katherine Waterston, Alison Su...",2016,133,7.5,232072,234.02,66.0,7.05
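
The same row filter can also be written with the `.str` accessor instead of `.apply()` and `lambda`. The sketch below uses a small made-up DataFrame (`movies`) so that it runs without the IMDB file, and compares the two boolean masks.

```python
import pandas as pd

# hypothetical stand-in for imdb_data with only a Title column
movies = pd.DataFrame({'Title': ['Guardians of the Galaxy',
                                 'Prometheus',
                                 'The Lost City of Z',
                                 'Split']})

# mask built with .apply() and lambda, as in the cell above
mask_apply = movies['Title'].apply(lambda x: len(x.split()) >= 4)

# mask built with string methods only
mask_str = movies['Title'].str.split().str.len() >= 4

long_titles = movies[mask_str]
print(long_titles)
```

Both masks select the same rows; the `.apply()` version is more flexible, since the `lambda` can hold any expression.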


In [58]:
name_df = pd.DataFrame(data = ['Braund, Mr. Owen Harris',
 'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
 'Heikkinen, Miss. Laina',
 'Futrelle, Mrs. Jacques Heath (Lily May Peel)',
 'Allen, Mr. William Henry',
 'Moran, Mr. James',
 'McCarthy, Mr. Timothy J',
 'Palsson, Master. Gosta Leonard',
 'Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)',
 'Nasser, Mrs. Nicholas (Adele Achem)'], columns = ['Name'] )

#Take a look at the Data 
name_df.head(3)

Unnamed: 0,Name
0,"Braund, Mr. Owen Harris"
1,"Cumings, Mrs. John Bradley (Florence Briggs Th..."
2,"Heikkinen, Miss. Laina"


In [59]:
# we observe that the title always comes after the comma (`,`)
# and is separated from the first name by a period (`.`)
# the following code does the trick
name_df['Title'] = name_df['Name'].apply(lambda x: x.split(" ")[1].replace(".", ""))
name_df.head(3)

Unnamed: 0,Name,Title
0,"Braund, Mr. Owen Harris",Mr
1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",Mrs
2,"Heikkinen, Miss. Laina",Miss
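
Splitting on spaces assumes a one-word surname. As a hedged alternative, the sketch below splits on the comma first and then on the period. The last name in the list (`'van der Berg, Mrs. Anna'`) is invented for this illustration to show where the two approaches differ.

```python
import pandas as pd

names = pd.Series(['Braund, Mr. Owen Harris',
                   'Heikkinen, Miss. Laina',
                   'Palsson, Master. Gosta Leonard',
                   'van der Berg, Mrs. Anna'],  # hypothetical multi-word surname
                  name='Name')

# approach from the cell above: second space-separated token
by_space = names.apply(lambda x: x.split(" ")[1].replace(".", ""))

# alternative: text between the comma and the first period
by_comma = names.apply(lambda x: x.split(",")[1].split(".")[0].strip())

print(by_space.tolist())
print(by_comma.tolist())
```

For the ten names in `name_df` both versions agree; the comma-based one is only needed if surnames can contain spaces.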


# Your Turn Here
Finish the exercises below by following the instructions for each of them.

## Q1. Coding Problem

Complete the exercises regarding data types of the given DataFrame (`itinery_df`).

In [60]:
import random
import pandas as pd
# generating the DF
duration_mins = pd.Series(random.sample(range(1, 1800), 20), name='duration_mins')
work_types = ['lecture', 'consulting', 'research']
work_type_series = pd.Series(random.choices(work_types, k=20), name='work_types')
locations = ['Beijing, China', 'London, England', 'Paris, France', 'Munich, Germany', 
             'Sydney, Australia', 'Mumbai, India', 'Madrid, Spain']
loc_series = pd.Series(random.choices(locations, k=20), name='locations')
hour_rates = pd.Series([round(random.uniform(10.0, 20.0), 2) for i in range(20)], name='hour_rates')
hour_rates.loc[random.sample(range(1, 20), 5)] = 'missing'
duration_mins.loc[random.sample(range(1, 20), 5)] = 'missing'
itinery_df = pd.concat([duration_mins, work_type_series, loc_series, hour_rates], axis=1)
#itinery_df['duration_mins'] = itinery_df['duration_mins'].astype(str)
itinery_df.head()

Unnamed: 0,duration_mins,work_types,locations,hour_rates
0,169,consulting,"Beijing, China",17.86
1,964,research,"Beijing, China",17.81
2,missing,consulting,"Madrid, Spain",missing
3,missing,lecture,"Beijing, China",15.8
4,1739,lecture,"Munich, Germany",19.18


## Part 1:

Use `.apply()` and `lambda` to create a new column `duration_hrs` by converting `duration_mins` to hours (divide by `60`).
- make sure you handle all `'missing'` values in `duration_mins` - use the average of the column to replace missing values.

In [61]:
#We have 20 rows and 4 columns.  All object type. 
itinery_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 4 columns):
duration_mins    20 non-null object
work_types       20 non-null object
locations        20 non-null object
hour_rates       20 non-null object
dtypes: object(4)
memory usage: 768.0+ bytes


In [62]:
#Check columns contain "missing"
#https://stackoverflow.com/questions/54508137/check-if-pandas-dataframe-cell-contains-certain-string
#checking column duration_mins
#contains 5 missing values 
itinery_df['duration_mins'].str.contains('missing').sum()

5

In [63]:
#checking column work_types for 'missing'
#does not contain a missing value
itinery_df['work_types'].str.contains('missing').sum()

0

In [64]:
#check column locations 
#does not contain missing value
itinery_df['locations'].str.contains('missing').sum()

0

In [65]:
#check if hour rates contains missing value 
#contains 5 missing values
itinery_df['hour_rates'].str.contains('missing').sum()

5

In [66]:
#convert object columns to numeric; 'missing' entries become NaN
itinery_df['duration_mins']=pd.to_numeric(itinery_df['duration_mins'], errors = 'coerce')
itinery_df['hour_rates']=pd.to_numeric(itinery_df['hour_rates'], errors = 'coerce')
itinery_df

Unnamed: 0,duration_mins,work_types,locations,hour_rates
0,169.0,consulting,"Beijing, China",17.86
1,964.0,research,"Beijing, China",17.81
2,,consulting,"Madrid, Spain",
3,,lecture,"Beijing, China",15.8
4,1739.0,lecture,"Munich, Germany",19.18
5,,research,"Mumbai, India",
6,1371.0,lecture,"Paris, France",16.03
7,,research,"Sydney, Australia",15.77
8,1244.0,research,"Mumbai, India",10.74
9,681.0,consulting,"Mumbai, India",18.39


In [68]:
#Fill missing values with mean for duration_mins and hour_rates
itinery_df['duration_mins'] = itinery_df['duration_mins'].fillna(itinery_df['duration_mins'].mean())
itinery_df['hour_rates'] = itinery_df['hour_rates'].fillna(itinery_df['hour_rates'].mean())
itinery_df

Unnamed: 0,duration_mins,work_types,locations,hour_rates
0,169.0,consulting,"Beijing, China",17.86
1,964.0,research,"Beijing, China",17.81
2,940.133333,consulting,"Madrid, Spain",15.034667
3,940.133333,lecture,"Beijing, China",15.8
4,1739.0,lecture,"Munich, Germany",19.18
5,940.133333,research,"Mumbai, India",15.034667
6,1371.0,lecture,"Paris, France",16.03
7,940.133333,research,"Sydney, Australia",15.77
8,1244.0,research,"Mumbai, India",10.74
9,681.0,consulting,"Mumbai, India",18.39


In [69]:
#convert minutes to hours for duration_mins: assign to new column duration_hrs
#use a lambda for the conversion; axis=1 passes each row to the lambda
itinery_df['duration_hrs'] = itinery_df.apply(lambda x: x['duration_mins'] / 60, axis=1)
itinery_df.head()

Unnamed: 0,duration_mins,work_types,locations,hour_rates,duration_hrs
0,169.0,consulting,"Beijing, China",17.86,2.816667
1,964.0,research,"Beijing, China",17.81,16.066667
2,940.133333,consulting,"Madrid, Spain",15.034667,15.668889
3,940.133333,lecture,"Beijing, China",15.8,15.668889
4,1739.0,lecture,"Munich, Germany",19.18,28.983333
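
Another possible route for Part 1 handles the `'missing'` strings inside the `lambda` itself, applied to the single column instead of the whole DataFrame. This is only a sketch on a made-up three-row frame (`demo`), not the required solution.

```python
import pandas as pd

# hypothetical toy version of the duration_mins column
demo = pd.DataFrame({'duration_mins': [120, 'missing', 240]})

# average of the valid entries; 'missing' is coerced to NaN and skipped by .mean()
mean_mins = pd.to_numeric(demo['duration_mins'], errors='coerce').mean()

# replace 'missing' with the average, then convert minutes to hours
demo['duration_hrs'] = demo['duration_mins'].apply(
    lambda m: (mean_mins if m == 'missing' else m) / 60
)

print(demo)
```

Applying to the Series avoids `axis=1`, because the `lambda` receives one value at a time instead of a whole row.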


## Part 2:

Use `.apply()` and `lambda` to create two new columns `cities` and `countries`.

- `cities` refer to the first part in `locations` - before the `,`
- `countries` refer to the second part in `locations`
- note that there is a space after `,` that you need to remove

In [71]:
#create two new columns cities and countries
#cities refer to the first part in locations - before the ,
#countries refer to the second part in locations
#split on the comma, then strip the leftover space from each piece
itinery_df[['cities', 'countries']] = itinery_df['locations'].str.split(',', expand=True).apply(lambda x: x.str.strip())
itinery_df.head()

Unnamed: 0,duration_mins,work_types,locations,hour_rates,duration_hrs,cities,countries
0,169.0,consulting,"Beijing, China",17.86,2.816667,Beijing,China
1,964.0,research,"Beijing, China",17.81,16.066667,Beijing,China
2,940.133333,consulting,"Madrid, Spain",15.034667,15.668889,Madrid,Spain
3,940.133333,lecture,"Beijing, China",15.8,15.668889,Beijing,China
4,1739.0,lecture,"Munich, Germany",19.18,28.983333,Munich,Germany
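
Since Part 2 asks for `.apply()` and `lambda`, here is a sketch of a version that splits each location string inside the `lambda`, one new column at a time. The two-row `demo` frame is made up for this illustration.

```python
import pandas as pd

# hypothetical toy version of the locations column
demo = pd.DataFrame({'locations': ['Beijing, China', 'Paris, France']})

# first piece before the comma
demo['cities'] = demo['locations'].apply(lambda loc: loc.split(',')[0].strip())

# second piece after the comma; .strip() removes the leading space
demo['countries'] = demo['locations'].apply(lambda loc: loc.split(',')[1].strip())

print(demo)
```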


## Part 3:

Use `.apply()` and `lambda` to create a column `work_load` using the following logic:

```python
if duration_hrs >= 20:
    # 'full_time'
else:
    # 'part_time'
```

In [72]:
#create new column work_load to define part time and full time 
itinery_df['work_load'] = itinery_df.apply(lambda x: 'full_time' if x['duration_hrs'] >= 20 else 'part_time', axis=1)
itinery_df.head()

Unnamed: 0,duration_mins,work_types,locations,hour_rates,duration_hrs,cities,countries,work_load
0,169.0,consulting,"Beijing, China",17.86,2.816667,Beijing,China,part_time
1,964.0,research,"Beijing, China",17.81,16.066667,Beijing,China,part_time
2,940.133333,consulting,"Madrid, Spain",15.034667,15.668889,Madrid,Spain,part_time
3,940.133333,lecture,"Beijing, China",15.8,15.668889,Beijing,China,part_time
4,1739.0,lecture,"Munich, Germany",19.18,28.983333,Munich,Germany,full_time


##  Part 4:

Use `.apply()` and `lambda` to calculate the total payment for each row, $payment_{total} = duration\_hrs \times hour\_rates$.

In order to do that, you need to:
1. verify the `duration_hrs` and `hour_rates` are in the numerical (float) type.
2. handle all `'missing'` values in the `hour_rates` column - use the average of the column to replace missing values.
3. create a new column namely `payments`, then put the calculation results in it.

In [73]:
#verify the duration_hrs and hour_rates are in the numerical (float) type
itinery_df.dtypes

duration_mins    float64
work_types        object
locations         object
hour_rates       float64
duration_hrs     float64
cities            object
countries         object
work_load         object
dtype: object

In [75]:
itinery_df['payments'] = itinery_df.apply(lambda x: (x['duration_hrs'] * x['hour_rates']), axis=1)
itinery_df.head()

Unnamed: 0,duration_mins,work_types,locations,hour_rates,duration_hrs,cities,countries,work_load,payments
0,169.0,consulting,"Beijing, China",17.86,2.816667,Beijing,China,part_time,50.305667
1,964.0,research,"Beijing, China",17.81,16.066667,Beijing,China,part_time,286.147333
2,940.133333,consulting,"Madrid, Spain",15.034667,15.668889,Madrid,Spain,part_time,235.576521
3,940.133333,lecture,"Beijing, China",15.8,15.668889,Beijing,China,part_time,247.568444
4,1739.0,lecture,"Munich, Germany",19.18,28.983333,Munich,Germany,full_time,555.900333


## Part 5:

Create a new column `final_pay` using the following logic (note that `work_load`, `payments` and `final_pay` are column names):

```python

if work_load == 'full_time':
    final_pay = payments * 1.05
elif work_load == 'part_time':
    final_pay = payments * 0.95
```

In [95]:
def final_pay(x,y):
    if x == 'full_time':
        return y*1.05
    elif x == 'part_time':
        return y*0.95

In [96]:
final_pay('part_time', 15) #test defined function

14.25

In [99]:
#Create a new column final_pay using the two columns of work_load and payments
#pass each row's work_load and payments values to final_pay()
itinery_df['final_pay'] = itinery_df.apply(lambda x: final_pay(x['work_load'], x['payments']), axis=1)
itinery_df.head()

# Classwork (start here in class)
You can start working on them right now:
- Read Chapter 9 in PD4E 
- If time permits, start in on your homework. 
- Ask questions when you need help. Use this time to get help from the professor!

# Homework (do at home)
The following is due before class next week:
  - Any remaining classwork from tonight
  - DataCamp “Speed efficient methods for iterating through a DataFrame” assignment

Note: All work on DataCamp is logged. Don't try to fake it!

Please email [me](mailto:jtao@fairfield.edu) if you have any problems or questions.
