# Chapter 09 Apply Functions

Based on *Pandas for Everyone*. See the author's [GitHub page](https://github.com/chendaniely/pandas_for_everyone).

In [1]:
import pandas as pd
import numpy as np

## Apply Function Over a Series

Just as we can map a function over an iterable, we can apply a function to every element of a series.

In [2]:
s = pd.Series(range(5))
s

0    0
1    1
2    2
3    3
4    4
dtype: int64

Apply the function y = x + 1 to it:

In [3]:
s.apply(lambda x: x + 1) # returns a new series; s itself is unchanged

0    1
1    2
2    3
3    4
4    5
dtype: int64

### Apply With Multiple Arguments

What if the function takes several arguments, and all of them except the one supplied by the series are known in advance? We can:

1. Use a partial function;
2. Pass the known arguments to apply() as keyword arguments.

Let's demonstrate (2):

In [4]:
power = lambda x, e: x ** e

In [5]:
s.apply(power, e=2)

0     0
1     1
2     4
3     9
4    16
dtype: int64
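Option (1) can be sketched in the same way. The snippet below is a minimal illustration, assuming the same `s` and `power` as above; the name `square` is introduced here only for the example. `functools.partial` fixes `e=2` ahead of time, so apply() receives a function of one argument.

```python
from functools import partial

import pandas as pd

s = pd.Series(range(5))
power = lambda x, e: x ** e

# Fix the known argument e=2; the resulting callable takes only x.
square = partial(power, e=2)  # hypothetical name, for illustration only

squared = s.apply(square)
print(squared.tolist())  # [0, 1, 4, 9, 16]
```

Both options give the same result; a partial function is handy when the same pre-configured function is reused in several places.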

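#### NOTE:
apply() passes each element of the series as the first positional argument, so the unknown argument must come first in the function's signature. The code below does not work, because e would receive both the series element and the keyword value.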

In [6]:
# power2 = lambda e, x: x ** e
# s.apply(power2, e=2)
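If the function's signature cannot be changed, one possible workaround is to wrap it so that the series element lands in the right position. The sketch below assumes the `power2` from the commented-out cell above; `failed`, `by_partial` and `by_lambda` are names made up for this example.

```python
from functools import partial

import pandas as pd

s = pd.Series(range(5))
power2 = lambda e, x: x ** e

# Passing e=2 directly fails: apply() calls power2(element, e=2),
# so e is supplied twice and Python raises a TypeError.
try:
    s.apply(power2, e=2)
    failed = False
except TypeError:
    failed = True

# Workaround A: bind e positionally, leaving x as the only open argument.
by_partial = s.apply(partial(power2, 2))

# Workaround B: a small wrapper that puts the element in the second slot.
by_lambda = s.apply(lambda x: power2(2, x))

print(failed, by_partial.tolist(), by_lambda.tolist())
```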

## Apply Function Over a DataFrame

In [16]:
df = pd.DataFrame( [[33, 45], [29, 60], [35, 43]]
                 , columns=['Temperature', 'Humidity']
                 , index=['2020-06-20', '2020-06-21', '2020-06-22']
                 )
df

            Temperature  Humidity
2020-06-20           33        45
2020-06-21           29        60
2020-06-22           35        43


In [17]:
def show(x):
    print(x)
    
df.apply(show)

2020-06-20    33
2020-06-21    29
2020-06-22    35
Name: Temperature, dtype: int64
2020-06-20    45
2020-06-21    60
2020-06-22    43
Name: Humidity, dtype: int64


Temperature    None
Humidity       None
dtype: object

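The output above shows that when we call apply() on a dataframe, it actually passes one column (a series) to the function at a time, until every column has been processed. The final series of None values is the return value of apply(): show() returns nothing, so each column maps to None.

Therefore, the function needs to take a series as input. For example, to calculate the average temperature and humidity, we can write: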

In [18]:
df.apply(lambda v: v.mean())

Temperature    32.333333
Humidity       49.333333
dtype: float64
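Any function that reduces a series to a single value can be used the same way. As a small illustration, the hypothetical helper `value_range` below (not part of the original notebook) returns the gap between the largest and smallest value in each column.

```python
import pandas as pd

df = pd.DataFrame(
    [[33, 45], [29, 60], [35, 43]],
    columns=['Temperature', 'Humidity'],
    index=['2020-06-20', '2020-06-21', '2020-06-22'],
)

def value_range(col):
    """Receive one column as a Series and reduce it to a scalar."""
    return col.max() - col.min()

# apply() calls value_range once per column and collects the results
# into a Series indexed by column name.
spread = df.apply(value_range)
print(spread)
```

Here the temperature column spans 35 - 29 = 6 and the humidity column spans 60 - 43 = 17.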

### Apply Over Rows

By default the function is applied column-wise, but we can apply it to each row instead by specifying *axis=1*.

In [19]:
weatherCondition = lambda row: \
    'Good' if row['Temperature'] < 35 and row['Humidity'] < 50 else 'Bad'

df.apply(weatherCondition, axis=1)

2020-06-20    Good
2020-06-21     Bad
2020-06-22     Bad
dtype: object
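With *axis=1*, each row arrives as a series whose index holds the column names, which is why `row['Temperature']` works in the lambda above. A minimal sketch to confirm this, using a made-up helper `describe_row`:

```python
import pandas as pd

df = pd.DataFrame(
    [[33, 45], [29, 60], [35, 43]],
    columns=['Temperature', 'Humidity'],
    index=['2020-06-20', '2020-06-21', '2020-06-22'],
)

def describe_row(row):
    # row.name is the row's index label; row.index holds the column names.
    return f"{row.name}: {list(row.index)}"

labels = df.apply(describe_row, axis=1)
print(labels.iloc[0])  # 2020-06-20: ['Temperature', 'Humidity']
```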

## Vectorized Functions

Suppose we have a function f(x, y) that takes two scalar parameters x and y and produces a result. What if we want to pass in two vectors of x values and y values and compute a vector of results? We can turn f into a vectorized function with the np.vectorize decorator.

We can rewrite the weatherCondition function using this vectorized approach:

In [20]:
@np.vectorize
def weatherCon2(temperature, humidity):
    return 'Good' if temperature < 35 and humidity < 50 else 'Bad'


result = weatherCon2(df['Temperature'], df['Humidity'])
result

array(['Good', 'Bad', 'Bad'], dtype='<U4')

In [21]:
type(result)

numpy.ndarray
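The NumPy documentation describes `np.vectorize` as a convenience whose implementation is essentially a for loop, not as a performance tool. For a rule as simple as this one, a possible alternative (a swapped-in technique, not the approach used in this notebook) is to build a boolean mask from whole-column comparisons and pick the labels with `np.where`. The names `is_good` and `weather` are assumptions for this sketch.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[33, 45], [29, 60], [35, 43]],
    columns=['Temperature', 'Humidity'],
    index=['2020-06-20', '2020-06-21', '2020-06-22'],
)

# Compare whole columns at once; & combines the two boolean Series
# element by element (the parentheses are required).
is_good = (df['Temperature'] < 35) & (df['Humidity'] < 50)

# Choose 'Good' where the mask is True and 'Bad' elsewhere.
weather = np.where(is_good, 'Good', 'Bad')
print(weather.tolist())  # ['Good', 'Bad', 'Bad']
```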

In [24]:
pd.concat([df, pd.DataFrame({'Weather': result}, index=df.index)], axis=1)

            Temperature  Humidity  Weather
2020-06-20           33        45     Good
2020-06-21           29        60      Bad
2020-06-22           35        43      Bad


Putting it all together, we can write a function that adds a weather column to the dataframe.

In [25]:
def addWeather(df):
    return pd.concat( [ df
                      , pd.DataFrame( {'Weather': weatherCon2(df['Temperature'], df['Humidity'])}
                                    , index=df.index)
                      ]
                    , axis=1)

addWeather(df)

            Temperature  Humidity  Weather
2020-06-20           33        45     Good
2020-06-21           29        60      Bad
2020-06-22           35        43      Bad
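As an alternative sketch, `DataFrame.assign` can attach the new column without building a second dataframe and concatenating. It returns a new dataframe, so the original `df` is left untouched. `add_weather_assign` and `with_weather` are hypothetical names used only here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[33, 45], [29, 60], [35, 43]],
    columns=['Temperature', 'Humidity'],
    index=['2020-06-20', '2020-06-21', '2020-06-22'],
)

@np.vectorize
def weatherCon2(temperature, humidity):
    return 'Good' if temperature < 35 and humidity < 50 else 'Bad'

def add_weather_assign(df):
    # assign() returns a new DataFrame with the extra column appended.
    return df.assign(Weather=weatherCon2(df['Temperature'], df['Humidity']))

with_weather = add_weather_assign(df)
print(with_weather)
```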
