# Apply Functions 

The `.apply()` method takes a function and applies it to each element of a `Series`, or to each row or column of a `DataFrame`, so we don't have to write the loop ourselves. This is similar to Python's built-in `map()` function.

---

## Apply (Basics)


In [23]:
import pandas as pd

df = pd.DataFrame({
    "a": [10, 20, 30], 
    "b": [20, 30, 40]
})

df

    a   b
0  10  20
1  20  30
2  30  40


In [24]:
# operators such as ** already work element-wise on a whole column
df['a'] ** 2

0    100
1    400
2    900
Name: a, dtype: int64

### Apply Over a Series 

We can `.apply()` our functions over a `Series`.

In [25]:
def my_square(x):
    return x ** 2

In [26]:
# apply my_square over a Series 
sqr = df['a'].apply(my_square)
sqr

0    100
1    400
2    900
Name: a, dtype: int64

In [27]:
def my_exp(x, e):
    return x ** e

In [28]:
# extra keyword arguments given to .apply() are forwarded to the applied function
ex = df['a'].apply(my_exp, e=10)
ex

0        10000000000
1     10240000000000
2    590490000000000
Name: a, dtype: int64
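
Extra arguments can also be passed by position. The sketch below assumes the same `df` and `my_exp` as above and uses the `args` parameter of `Series.apply()`, which takes a tuple; the names `by_keyword` and `by_position` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30], "b": [20, 30, 40]})

def my_exp(x, e):
    return x ** e

# keyword form: e=2 is forwarded to my_exp for every element
by_keyword = df["a"].apply(my_exp, e=2)

# positional form: extra positional arguments go in the `args` tuple
by_position = df["a"].apply(my_exp, args=(2,))

print(by_position.tolist())  # [100, 400, 900]
```

Both calls give the same result; the keyword form is usually easier to read when the applied function has several parameters.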

### Apply Over a DataFrame 

The syntax changes slightly when applying a function to a `DataFrame`, because there is more than one dimension to work along.

We first need to specify which axis to apply the function over (column-by-column or row-by-row). The function then receives an entire column or row as a `Series`, not a single value.
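
As a quick illustration of the two axes, the sketch below applies a function that returns the length of whatever it receives. It assumes the same `df` as above; `per_column` and `per_row` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30], "b": [20, 30, 40]})

# axis=0: the function is called once per column (each column has 3 values)
per_column = df.apply(lambda s: len(s), axis=0)

# axis=1: the function is called once per row (each row has 2 values)
per_row = df.apply(lambda s: len(s), axis=1)

print(per_column.to_dict())  # {'a': 3, 'b': 3}
print(per_row.tolist())      # [2, 2, 2]
```

The result is indexed by column name for `axis=0` and by row label for `axis=1`.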

In [29]:
def print_me(x):
    print(x)

#### Column-Wise Operations

The `axis=0` parameter (the default) specifies that the function is applied column by column: each column is passed to the function in turn.

In [30]:
df.apply(print_me, axis=0)

0    10
1    20
2    30
Name: a, dtype: int64
0    20
1    30
2    40
Name: b, dtype: int64


a    None
b    None
dtype: object

In [31]:
def avg_3(x, y, z):
    return (x + y + z) / 3

In [32]:
# won't work: .apply() passes the ENTIRE column as a single argument,
# so avg_3 never receives y and z; the function has to unpack the column itself
# df.apply(avg_3, axis=0)

In [33]:
def avg_3_apply(col):
    # col is a whole column (a Series); pull out its values by position
    x = col.iloc[0]
    y = col.iloc[1]
    z = col.iloc[2]
    return (x + y + z) / 3

In [34]:
df.apply(avg_3_apply, axis=0)

a    20.0
b    30.0
dtype: float64
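
`avg_3_apply` only works for columns with exactly three values. Since the function receives the whole column as a `Series`, one alternative (a sketch, not the only option) is to call a `Series` method such as `.mean()` on it, which works for any number of rows. The name `col_means` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30], "b": [20, 30, 40]})

# each column arrives as a Series, so we can use its own methods
col_means = df.apply(lambda col: col.mean(), axis=0)

print(col_means.to_dict())  # {'a': 20.0, 'b': 30.0}

# for a plain mean, the built-in DataFrame method gives the same values
print(df.mean().to_dict())  # {'a': 20.0, 'b': 30.0}
```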

#### Row-Wise Operations 

Row-wise operations work the same way as column-wise ones, except that we pass `axis=1`, so the function receives each row in turn.

In [35]:
def avg_2(row):
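    # the row's labels are the column names ('a', 'b'), so integer keys such as
    # row[0] are ambiguous (and deprecated in recent pandas); use .iloc for positions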
    x = row.iloc[0]
    y = row.iloc[1]
    return (x + y) / 2

In [36]:
df.apply(avg_2, axis=1)

0    15.0
1    25.0
2    35.0
dtype: float64
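
Because each row is a `Series` labelled by column name, the values can also be looked up by label instead of by position. The sketch below assumes the same `df`; `avg_by_label`, `row_avg`, and `direct` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30], "b": [20, 30, 40]})

def avg_by_label(row):
    # row["a"] and row["b"] pick values by column name
    return (row["a"] + row["b"]) / 2

row_avg = df.apply(avg_by_label, axis=1)
print(row_avg.tolist())  # [15.0, 25.0, 35.0]

# the same calculation written directly on the columns, without .apply()
direct = (df["a"] + df["b"]) / 2
print(direct.tolist())   # [15.0, 25.0, 35.0]
```

Looking values up by label keeps the function correct even if the column order changes.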

In [37]:
df

    a   b
0  10  20
1  20  30
2  30  40


--- 

## Vectorized Functions 

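Some numerical functions are inherently vectorized: given a whole vector, they operate on every element at once. At times, though, we may want to use a function that can't be vectorized automatically, such as one that contains an `if` statement.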

In [38]:
import numpy as np

In [39]:
def avg_2_mod(x, y):
    # np.nan replaces the np.NaN alias, which was removed in NumPy 2.0
    if x == 20:
        return np.nan
    else:
        return (x + y) / 2

### Vectorize with NumPy

If we run the function above with two numbers as arguments, it works; passing it two vectors of numbers, however, results in an error.
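
The failure can be reproduced with the short sketch below, which assumes the same `df` and `avg_2_mod` as above. The comparison `x == 20` yields a whole `Series` of booleans, and `if` cannot reduce that to a single `True` or `False`, so pandas raises a `ValueError`. The `raised` flag is only there for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30], "b": [20, 30, 40]})

def avg_2_mod(x, y):
    if x == 20:
        return np.nan
    else:
        return (x + y) / 2

# scalars work as expected
print(avg_2_mod(10, 20))  # 15.0

# whole columns do not: `if` needs a single boolean, not a Series of them
raised = False
try:
    avg_2_mod(df["a"], df["b"])
except ValueError as err:
    raised = True
    print(type(err).__name__, "-", err)
```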

What we need to do is vectorize the function explicitly. NumPy's `vectorize()` function takes our function and returns a vectorized version of it, which we can then call on whole columns.

In [40]:
avg_2_mod_vec = np.vectorize(avg_2_mod)

print(avg_2_mod_vec(df['a'], df['b']))

[15. nan 35.]


We can also use `np.vectorize` as a decorator, which is convenient when we are the ones writing the function. The previous approach is better suited to functions whose source code you don't have or can't change.

In [43]:
# vectorize using a decorator
@np.vectorize
def v_avg_2_mod(x, y):
    if x == 20:
        return np.nan
    else:
        return (x + y) / 2

In [42]:
v_avg_2_mod(df['a'], df['b'])

array([15., nan, 35.])
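
Note that NumPy documents `np.vectorize` as a convenience rather than a performance tool; internally it is essentially a loop. For a simple condition like this one, a different technique, `np.where`, expresses the same logic with genuinely element-wise array operations. The sketch below assumes the same `df`; `a`, `b`, and `result` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30], "b": [20, 30, 40]})

a = df["a"].to_numpy()
b = df["b"].to_numpy()

# where a == 20 take NaN, everywhere else take the average of a and b
result = np.where(a == 20, np.nan, (a + b) / 2)

print(result)  # [15. nan 35.]
```

Unlike an `if` statement, `np.where` computes both branches for every element and then selects between them, so it only suits cases where both branches are safe to evaluate.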

### Vectorize with Numba

Numba is a library designed to optimize Python code, particularly mathematical calculations on arrays. Like NumPy, it provides a `vectorize` decorator.

In [44]:
import numba 

# vectorize using Numba's decorator (this replaces the NumPy version defined earlier)
@numba.vectorize
def v_avg_2_mod(x, y):
    if x == 20:
        return np.nan
    else:
        return (x + y) / 2

Numba doesn't understand pandas objects, so we have to pass in the NumPy array representation of our data, which is available through the `.values` attribute of the `Series` object (or its `.to_numpy()` method).
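
A call might look like the sketch below. It assumes the same `df` as above. The `try`/`except` fallback to `np.vectorize` is an addition for this sketch only, so that the example still runs where Numba isn't installed.

```python
import numpy as np
import pandas as pd

try:
    import numba
    vectorize = numba.vectorize
except ImportError:
    # assumption for this sketch: fall back to NumPy if Numba is unavailable
    vectorize = np.vectorize

@vectorize
def v_avg_2_mod(x, y):
    if x == 20:
        return np.nan
    else:
        return (x + y) / 2

df = pd.DataFrame({"a": [10, 20, 30], "b": [20, 30, 40]})

# pass plain NumPy arrays, not Series objects
result = v_avg_2_mod(df["a"].values, df["b"].values)

print(result)  # [15. nan 35.]
```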

--- 

## Lambda Functions 

Instead of defining a named function, we can always pass a lambda function directly to `.apply()`. This is convenient when the function is short and only used once.

In [45]:
# add a column using a lambda function to compute
df['a_squared'] = df['a'].apply(lambda x: x ** 2)
df

    a   b  a_squared
0  10  20        100
1  20  30        400
2  30  40        900
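
Lambda functions work the same way with `DataFrame.apply()`. The sketch below builds a fresh `df` and adds a column computed row by row; the column name `a_plus_b` is hypothetical and chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30], "b": [20, 30, 40]})

# with axis=1 the lambda receives one row at a time
df["a_plus_b"] = df.apply(lambda row: row["a"] + row["b"], axis=1)

print(df["a_plus_b"].tolist())  # [30, 50, 70]
```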
