# How do I apply a function to a pandas Series or DataFrame?

🐼 Tutorial on pandas by Data School - Exercise performed by Dorian.H Mekni 🥇 | Fri 8 Jan 2021

⭐️ Get to know when to use map, apply, and applymap. 

In [1]:
import pandas as pd

In [2]:
# importing a dataset : 
train = pd.read_csv('http://bit.ly/kaggletrain')

In [3]:
train.head()

,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S



⭐️ map is a Series method that translates each value of a Series into another value. Here we use it to create a dummy variable for sex, encoded as 0 and 1.

In [4]:
# Translate female into 0 and male into 1 : 
train['Sex_num'] = train.Sex.map({'female':0, 'male':1})

In [5]:
# Checking the first 5 rows of the new column (loc slicing includes both endpoints) :
train.loc[0:4, ['Sex', 'Sex_num']]


,Sex,Sex_num
0,male,1
1,female,0
2,female,0
3,female,0
4,male,1



✅ Each value of Sex has been translated to 0 or 1 as intended. 
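
One detail worth knowing about map: values that are missing from the dictionary come back as NaN. The sketch below illustrates this on a small hand-made Series (the `sex` Series and its 'unknown' entry are hypothetical, not taken from the Titanic data).

```python
import pandas as pd

# Hypothetical toy Series standing in for train.Sex
sex = pd.Series(['male', 'female', 'unknown'])

# Values found in the dict are translated; anything else becomes NaN
sex_num = sex.map({'female': 0, 'male': 1})
print(sex_num)
```

Because of the NaN, the resulting Series is stored as floats rather than integers.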



⭐️ apply is both a Series and a DataFrame method. On a Series, apply calls a function on each element. 


In [6]:
# Calculate the length of each string in the Name Series and store it in a new column called Name_length : 
train['Name_length'] = train.Name.apply(len)

In [7]:
# Comparing both Series next to each other : 
train.loc[0:4, ['Name', 'Name_length']]

,Name,Name_length
0,"Braund, Mr. Owen Harris",23
1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",51
2,"Heikkinen, Miss. Laina",22
3,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",44
4,"Allen, Mr. William Henry",24
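
For string lengths specifically, pandas also offers the `.str.len()` accessor, which should give the same result as `apply(len)`. A minimal sketch on three names copied from the table above (the `names` variable is introduced here for illustration):

```python
import pandas as pd

# Three names copied from the output above
names = pd.Series(['Braund, Mr. Owen Harris',
                   'Heikkinen, Miss. Laina',
                   'Allen, Mr. William Henry'])

lengths_apply = names.apply(len)   # Python's len called on each element
lengths_str = names.str.len()      # built-in string accessor

print(lengths_apply.tolist())
print(lengths_str.tolist())
```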



🧐 apply also works with NumPy functions. 


In [8]:
import numpy as np 

In [9]:
# Using the ceil numpy function with apply : 
train['Fare_ceil'] = train.Fare.apply(np.ceil)

In [11]:
# Checking if values are rounded up : 
train.loc[0:4, ['Fare', 'Fare_ceil']]

,Fare,Fare_ceil
0,7.25,8.0
1,71.2833,72.0
2,7.925,8.0
3,53.1,54.0
4,8.05,9.0
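
NumPy functions such as np.ceil can also be called on a Series directly, without going through apply. The sketch below compares both routes on the five Fare values shown above (the `fare` variable is introduced here for illustration):

```python
import numpy as np
import pandas as pd

# Fare values copied from the output above
fare = pd.Series([7.25, 71.2833, 7.925, 53.1, 8.05])

via_apply = fare.apply(np.ceil)  # np.ceil passed to apply
direct = np.ceil(fare)           # np.ceil called on the whole Series

print(via_apply.equals(direct))
```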



⭐️ We will now use apply to extract the last name of each person. To do so, we must select the bit before the comma. 


In [12]:
# A string method can be applied to this task : 
train.Name.str.split(',').head()

0                           [Braund,  Mr. Owen Harris]
1    [Cumings,  Mrs. John Bradley (Florence Briggs ...
2                            [Heikkinen,  Miss. Laina]
3      [Futrelle,  Mrs. Jacques Heath (Lily May Peel)]
4                          [Allen,  Mr. William Henry]
Name: Name, dtype: object

In [13]:
# Now extract the first list element from each Series element, using a small helper function : 
def get_element(my_list, position): 
    return my_list[position]

In [17]:
# Now applying : 
train.Name.str.split(',').apply(get_element, position=0).head()

0       Braund
1      Cumings
2    Heikkinen
3     Futrelle
4        Allen
Name: Name, dtype: object


✅ Extraction successful. To get the part after the comma (title and given names), pass position=1 to get_element instead.
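
Here is a small self-contained sketch of how apply forwards the keyword argument `position` to get_element. It uses two names from the table above; the `names`, `parts`, `last`, and `rest` variables are introduced for illustration only.

```python
import pandas as pd

def get_element(my_list, position):
    # Return the element of my_list found at the given position
    return my_list[position]

names = pd.Series(['Braund, Mr. Owen Harris', 'Allen, Mr. William Henry'])
parts = names.str.split(',')

# Extra keyword arguments given to apply are passed on to the function
last = parts.apply(get_element, position=0)
rest = parts.apply(get_element, position=1).str.strip()  # drop the leading space

print(last.tolist())
print(rest.tolist())
```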



☝🏻 You can obtain the same result without writing your own function : 

In [18]:
# Using a lambda function : 
train.Name.str.split(',').apply(lambda x: x[0]).head()

0       Braund
1      Cumings
2    Heikkinen
3     Futrelle
4        Allen
Name: Name, dtype: object


🧐 lambda functions are used extensively with the apply method. 
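
For this particular task, the `.str` accessor can also index into each list, which avoids apply altogether. A minimal sketch, assuming the same two sample names as inputs:

```python
import pandas as pd

names = pd.Series(['Braund, Mr. Owen Harris', 'Allen, Mr. William Henry'])

# .str[0] picks the first element of each list produced by split
last_names = names.str.split(',').str[0]

print(last_names.tolist())
```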



⭐️ Let's now look at how the apply method works on a DataFrame : 


In [20]:
# Reading a new dataset : 
drinks = pd.read_csv('http://bit.ly/drinksbycountry')
drinks.head()

,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
0,Afghanistan,0,0,0,0.0,Asia
1,Albania,89,132,54,4.9,Europe
2,Algeria,25,0,14,0.7,Africa
3,Andorra,245,138,312,12.4,Europe
4,Angola,217,57,45,5.9,Africa



🧐 On a DataFrame, the apply method applies a function along either axis: to each column (axis=0) or to each row (axis=1). 


In [25]:
# Using a subset of drinks : 
drinks.loc[:, 'beer_servings':'wine_servings'].head()

,beer_servings,spirit_servings,wine_servings
0,0,0,0
1,89,132,54
2,25,0,14
3,245,138,312
4,217,57,45


In [35]:
# Using apply with axis=0 to travel down each column and pass it to Python's max function : 
drinks.loc[:, 'beer_servings':'wine_servings'].apply(max, axis=0)

beer_servings      376.0
spirit_servings    438.0
wine_servings      370.0
dtype: float64

In [26]:
# Let's change it to axis=1 for experimentation purposes : 
drinks.loc[:, 'beer_servings':'wine_servings'].apply(max, axis=1)

0        0
1      132
2       25
3      312
4      217
      ... 
188    333
189    111
190      6
191     32
192     64
Length: 193, dtype: int64


☝🏻 That gives us the max value in each row, because axis=1 moves from left to right across the columns. 
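
The two axes are easier to compare on a tiny DataFrame. The sketch below rebuilds rows 0 to 2 of the drinks subset by hand (the `df` variable is introduced here for illustration) and applies max along each axis:

```python
import pandas as pd

# Rows 0-2 of the drinks subset shown above, typed in by hand
df = pd.DataFrame({'beer_servings': [0, 89, 25],
                   'spirit_servings': [0, 132, 0],
                   'wine_servings': [0, 54, 14]})

col_max = df.apply(max, axis=0)  # one result per column
row_max = df.apply(max, axis=1)  # one result per row

print(col_max.tolist())
print(row_max.tolist())
```

The result of axis=0 is indexed by column name, while the result of axis=1 is indexed by row label.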


In [31]:
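# Which column holds the maximum in each row? idxmax returns the column label
# (in recent pandas versions, np.argmax returns the integer position instead) : 
drinks.loc[:, 'beer_servings':'wine_servings'].apply(lambda row: row.idxmax(), axis=1)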

0        beer_servings
1      spirit_servings
2        beer_servings
3        wine_servings
4        beer_servings
            ...       
188      beer_servings
189      beer_servings
190      beer_servings
191      beer_servings
192      beer_servings
Length: 193, dtype: object
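
To make the difference between a position and a label concrete, here is a sketch on a hand-built copy of rows 0 to 2 (the `df` variable is introduced for illustration). It assumes the goal is the column name, which is what `idxmax` returns; NumPy's argmax on the raw values returns integer positions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'beer_servings': [0, 89, 25],
                   'spirit_servings': [0, 132, 0],
                   'wine_servings': [0, 54, 14]})

# Label of the column holding the row maximum (first one wins on ties)
row_label = df.idxmax(axis=1)

# Integer position of the row maximum, computed on the underlying array
positions = np.argmax(df.to_numpy(), axis=1)

print(row_label.tolist())
print(positions.tolist())
```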


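⭐️ Now let's focus on applymap, which is a DataFrame method. applymap applies a function to every element of a DataFrame. (Since pandas 2.1, applymap is deprecated in favor of DataFrame.map, which does the same thing.) 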



In [33]:
# Applying float to every element of the selected columns : 
drinks.loc[:, 'beer_servings':'wine_servings'].applymap(float)

,beer_servings,spirit_servings,wine_servings
0,0.0,0.0,0.0
1,89.0,132.0,54.0
2,25.0,0.0,14.0
3,245.0,138.0,312.0
4,217.0,57.0,45.0
...,...,...,...
188,333.0,100.0,3.0
189,111.0,2.0,1.0
190,6.0,0.0,0.0
191,32.0,19.0,4.0



🧐 applymap returns a new DataFrame, so the result can be assigned back to overwrite the existing elements. 


In [34]:
# Changing the DataFrame elements from integers to floating points : 
drinks.loc[:, 'beer_servings':'wine_servings'] = drinks.loc[:, 'beer_servings':'wine_servings'].applymap(float)
drinks.head()

,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
0,Afghanistan,0.0,0.0,0.0,0.0,Asia
1,Albania,89.0,132.0,54.0,4.9,Europe
2,Algeria,25.0,0.0,14.0,0.7,Africa
3,Andorra,245.0,138.0,312.0,12.4,Europe
4,Angola,217.0,57.0,45.0,5.9,Africa
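
When the goal is only a type conversion, `astype` is a common alternative to an element-wise function. A minimal sketch on a hand-built two-row DataFrame (the `df` and `cols` variables are introduced here for illustration):

```python
import pandas as pd

df = pd.DataFrame({'country': ['Albania', 'Algeria'],
                   'beer_servings': [89, 25],
                   'wine_servings': [54, 14]})

cols = ['beer_servings', 'wine_servings']

# Convert the selected columns to float in one step and assign them back
df[cols] = df[cols].astype(float)

print(df.dtypes)
```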



🙏🏻 Thank you !

👋🏻 See you in the next one !