# Using apply and applymap on DataFrames


<hr style="width:100%;height:10px;border-width:0;color:gray;background-color:blue">
<hr style="width:100%;height:10px;border-width:0;color:gray;background-color:white">
<hr style="width:100%;height:10px;border-width:0;color:gray;background-color:black">

In [1]:
import pandas as pd
import numpy as np


In [2]:
data_source = 'vgsales.csv'
my_data = pd.read_csv(data_source)

my_data.head()

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,1,Wii Sports,Wii,2006.0,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
1,2,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
2,3,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
3,4,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.75,11.01,3.28,2.96,33.0
4,5,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.0,31.37


<hr style="width:100%;height:10px;border-width:0;color:gray;background-color:blue">
<hr style="width:100%;height:10px;border-width:0;color:gray;background-color:white">
<hr style="width:100%;height:10px;border-width:0;color:gray;background-color:black">

In [3]:
# apply runs a function over a column (Series), or over whole rows or columns of a DataFrame;
# applymap runs a function on every individual value in a DataFrame.

# The most common way to change values is with a lambda.
# A lambda is a way of writing a function in one line: x is the input and the result comes after the colon.
my_data['Name'] = my_data['Name'].apply(lambda x: x.upper())

my_data.head()

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,1,WII SPORTS,Wii,2006.0,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
1,2,SUPER MARIO BROS.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
2,3,MARIO KART WII,Wii,2008.0,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
3,4,WII SPORTS RESORT,Wii,2009.0,Sports,Nintendo,15.75,11.01,3.28,2.96,33.0
4,5,POKEMON RED/POKEMON BLUE,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.0,31.37
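
The cell above applies a function to a single column. As a rough sketch of what "whole rows or columns" means, the example below calls `apply` on a small DataFrame. The frame `sales` is a hypothetical stand-in built from the first two rows of the output above, not the full `vgsales.csv` file.

```python
import pandas as pd

# Hypothetical stand-in: regional sales for the first two games shown above
sales = pd.DataFrame({
    "NA_Sales": [41.49, 29.08],
    "EU_Sales": [29.02, 3.58],
    "JP_Sales": [3.77, 6.81],
    "Other_Sales": [8.46, 0.77],
})

# axis=0 (the default): the function receives each column as a Series
column_max = sales.apply(lambda col: col.max())

# axis=1: the function receives each row as a Series
row_total = sales.apply(lambda row: round(row.sum(), 2), axis=1)

print(column_max)
print(row_total)
```

The row totals should line up with the `Global_Sales` values for those two games.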


In [4]:
# For demonstration purposes, let's use a lambda to reverse our previous work and make everything lower case
my_data['Name'] = my_data['Name'].apply(lambda x: x.lower())
my_data.head()

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,1,wii sports,Wii,2006.0,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
1,2,super mario bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
2,3,mario kart wii,Wii,2008.0,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
3,4,wii sports resort,Wii,2009.0,Sports,Nintendo,15.75,11.01,3.28,2.96,33.0
4,5,pokemon red/pokemon blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.0,31.37
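
To make the lambda syntax concrete, here is a minimal sketch showing that a lambda, a named function, and pandas' `.str` accessor all give the same upper-cased result. The Series `names` and the helper `to_upper` are illustrative names, not part of the notebook's data.

```python
import pandas as pd

# Hypothetical stand-in for the 'Name' column
names = pd.Series(["Wii Sports", "Super Mario Bros.", "Mario Kart Wii"])

# A lambda is an anonymous one-line function...
upper_lambda = names.apply(lambda x: x.upper())

# ...that does the same job as this named function
def to_upper(x):
    return x.upper()

upper_named = names.apply(to_upper)

# pandas also offers vectorised string methods through the .str accessor
upper_str = names.str.upper()

print(upper_lambda.tolist())
# ['WII SPORTS', 'SUPER MARIO BROS.', 'MARIO KART WII']
print(upper_lambda.tolist() == upper_named.tolist() == upper_str.tolist())
# True
```

For simple string operations the `.str` accessor is usually the shorter option; `apply` earns its keep once the logic no longer fits a built-in method.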


In [None]:
# Activity: see if you can write a lambda function to capitalise the first letter of every word

<hr style="width:100%;height:10px;border-width:0;color:gray;background-color:blue">
<hr style="width:100%;height:10px;border-width:0;color:gray;background-color:white">
<hr style="width:100%;height:10px;border-width:0;color:gray;background-color:black">

In [5]:
# You can also use normal (named) functions with apply. First we write a function.

def capitaliser(word):
    # Only strings have .upper(), so leave any other type untouched
    if isinstance(word, str):
        word = word.upper()

    return word

# To use apply with a normal function, pass the function name (without brackets) to apply.
# Each value in the column is passed to the function as its input.

my_data['Name'] = my_data['Name'].apply(capitaliser)

my_data.head()

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,1,WII SPORTS,Wii,2006.0,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
1,2,SUPER MARIO BROS.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
2,3,MARIO KART WII,Wii,2008.0,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
3,4,WII SPORTS RESORT,Wii,2009.0,Sports,Nintendo,15.75,11.01,3.28,2.96,33.0
4,5,POKEMON RED/POKEMON BLUE,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.0,31.37


In [6]:
# To apply a function to a whole data set we have to take into account the data types it contains.
# capitalise2 works by only executing the code on values with the correct type. Without this check we would get errors.

def capitalise2(word):
    # Numeric values have no .upper() method, so only strings are changed
    if isinstance(word, str):
        word = word.upper()

    return word

# Note: we could accomplish the same thing using a try/except block. This is a bit more general and
# lets our function act on other data types, as long as they have an upper() method.


def capitalise3(word):
    try:
        word = word.upper()
    except AttributeError:
        # The value has no .upper() method (e.g. a number), so leave it unchanged
        pass

    return word
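
The practical difference between the two approaches shows up with values that are not `str` but still have an `upper()` method. The sketch below uses two illustrative functions, `capitalise_typed` and `capitalise_duck` (hypothetical names that mirror `capitalise2` and `capitalise3`), and a `bytes` value as the example of such a type.

```python
def capitalise_typed(value):
    # Mirrors capitalise2: only act on str values
    if isinstance(value, str):
        return value.upper()
    return value


def capitalise_duck(value):
    # Mirrors capitalise3: try the method, fall back to the original value
    try:
        return value.upper()
    except AttributeError:
        return value


samples = ["wii", 2006.0, None, b"nes"]

typed = [capitalise_typed(v) for v in samples]
duck = [capitalise_duck(v) for v in samples]

print(typed)  # ['WII', 2006.0, None, b'nes']
print(duck)   # ['WII', 2006.0, None, b'NES']
```

Catching `AttributeError` specifically, rather than using a bare `except`, keeps unrelated errors visible instead of silently swallowing them.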

In [9]:
# applymap allows you to apply a function to every value in the data set.
# Note: from pandas 2.1 onwards applymap is deprecated in favour of DataFrame.map.

# To save us from having to undo our work, we save our edited DataFrame to a new variable.

md2 = my_data.applymap(capitalise2)
md2.head()

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,1,WII SPORTS,WII,2006.0,SPORTS,NINTENDO,41.49,29.02,3.77,8.46,82.74
1,2,SUPER MARIO BROS.,NES,1985.0,PLATFORM,NINTENDO,29.08,3.58,6.81,0.77,40.24
2,3,MARIO KART WII,WII,2008.0,RACING,NINTENDO,15.85,12.88,3.79,3.31,35.82
3,4,WII SPORTS RESORT,WII,2009.0,SPORTS,NINTENDO,15.75,11.01,3.28,2.96,33.0
4,5,POKEMON RED/POKEMON BLUE,GB,1996.0,ROLE-PLAYING,NINTENDO,11.27,8.89,10.22,1.0,31.37


In [10]:
# The try/except version, capitalise3, can be passed to applymap in exactly the same way
md3 = my_data.applymap(capitalise3)

md3.head()

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,1,WII SPORTS,WII,2006.0,SPORTS,NINTENDO,41.49,29.02,3.77,8.46,82.74
1,2,SUPER MARIO BROS.,NES,1985.0,PLATFORM,NINTENDO,29.08,3.58,6.81,0.77,40.24
2,3,MARIO KART WII,WII,2008.0,RACING,NINTENDO,15.85,12.88,3.79,3.31,35.82
3,4,WII SPORTS RESORT,WII,2009.0,SPORTS,NINTENDO,15.75,11.01,3.28,2.96,33.0
4,5,POKEMON RED/POKEMON BLUE,GB,1996.0,ROLE-PLAYING,NINTENDO,11.27,8.89,10.22,1.0,31.37
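
Because `applymap` was renamed to `DataFrame.map` in pandas 2.1, code that has to run on both older and newer versions may want to pick whichever method is available. The following is one possible sketch; the frame `df` and the helper `upper_if_str` are illustrative, not taken from the notebook.

```python
import pandas as pd

# Hypothetical miniature of the video game data
df = pd.DataFrame({
    "Name": ["Wii Sports", "Mario Kart Wii"],
    "Platform": ["Wii", "Wii"],
    "Year": [2006.0, 2008.0],
})


def upper_if_str(value):
    # Leave numbers and other non-string values untouched
    return value.upper() if isinstance(value, str) else value


# Use DataFrame.map where it exists (pandas 2.1+), otherwise fall back to applymap
elementwise = df.map if hasattr(df, "map") else df.applymap
result = elementwise(upper_if_str)

print(result)
```

The original `df` is left unchanged, since the element-wise call returns a new DataFrame.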


<hr style="width:100%;height:10px;border-width:0;color:gray;background-color:blue">
<hr style="width:100%;height:10px;border-width:0;color:gray;background-color:white">
<hr style="width:100%;height:10px;border-width:0;color:gray;background-color:black">

## Fin

Now you can alter your datasets using lambda and purpose-built functions, which opens up a whole world of efficient data cleansing and processing. Coders often write a master function called "cleaner" or "preprocessor" that bundles a stack of smaller functions so they can clean a whole data set with a single command. Try building one for yourself that you can reuse on other projects.
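
One possible shape for such a cleaner is sketched here. Everything in it is an assumption made for illustration: the step functions `strip_whitespace` and `to_upper`, the `steps` parameter, and the tiny `raw` frame are hypothetical, and a real cleaner would use whatever steps your data needs.

```python
import pandas as pd


def strip_whitespace(value):
    # Remove leading and trailing spaces from strings only
    return value.strip() if isinstance(value, str) else value


def to_upper(value):
    # Upper-case strings only
    return value.upper() if isinstance(value, str) else value


def cleaner(df, steps=(strip_whitespace, to_upper)):
    """Apply each element-wise step in order and return a new DataFrame."""
    # DataFrame.map replaces applymap from pandas 2.1 onwards
    method_name = "map" if hasattr(df, "map") else "applymap"
    for step in steps:
        df = getattr(df, method_name)(step)
    return df


# Hypothetical messy input
raw = pd.DataFrame({
    "Name": ["  wii sports ", "Mario Kart Wii"],
    "Global_Sales": [82.74, 35.82],
})

clean = cleaner(raw)
print(clean)
```

Keeping each step as its own small function makes it easy to reorder, drop, or add steps for a different project.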

### * There is a cool built-in function, very similar to apply and applymap, that works on lists. It is called "map". See if you can find its documentation and apply it to a list yourself. Can you make the equivalent using a for loop or a list comprehension?

In [11]:
my_list = ['Yo', 'Ya', 'arrrrr', 5, 4, 9]

my_new_list = list(map(capitalise3, my_list))
my_new_list

['YO', 'YA', 'ARRRRR', 5, 4, 9]
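
For comparison, the same result can be reached with a list comprehension or a plain for loop. This is a minimal sketch; `capitalise3` is redefined here only so the block runs on its own.

```python
def capitalise3(word):
    # Upper-case anything that has an .upper() method; leave the rest alone
    try:
        return word.upper()
    except AttributeError:
        return word


my_list = ['Yo', 'Ya', 'arrrrr', 5, 4, 9]

# 1. map: returns a lazy iterator, so wrap it in list() to see the values
with_map = list(map(capitalise3, my_list))

# 2. List comprehension
with_comprehension = [capitalise3(item) for item in my_list]

# 3. Explicit for loop
with_loop = []
for item in my_list:
    with_loop.append(capitalise3(item))

print(with_map == with_comprehension == with_loop)  # True
```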