In [None]:
import numpy as np
import pandas as pd

### Apply a function along an axis of the DataFrame

> [**Reference**] https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html

> `df.apply(func, axis=0, raw=False, result_type=None, args=(), **kwargs)`

- **Recall**: Defining Python functions

```
def my_func(x, y):
    pass
my_func(1, 2)
    
def my_square(x):
    return x ** 2
my_square(2)
my_square(4)

assert my_square(4) == 16
# assert my_square(4) == 15  # AssertionError (commented out so the rest of the cell still runs)

avg_2 = lambda x, y: (x + y) / 2
avg_2(10, 20)
```

```
df = pd.DataFrame({
    'a': [10, 20, 30],
    'b': [20, 30, 40]
})
df

df['a'] ** 2
df ** 2
```

```
my_square
df['a'].apply(my_square)
```

```
def my_exp(x, e):
    return x ** e
my_exp(2, 10)

df['a'].apply(my_exp, e=4)
```
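Extra arguments can also be passed positionally through `args` instead of by keyword. A minimal sketch, assuming the same `my_exp` and a Series equivalent to `df['a']`; the names `s`, `by_keyword`, and `by_position` are introduced here only for illustration:

```python
import pandas as pd

def my_exp(x, e):
    return x ** e

# Stand-in for df['a'] from the cells above
s = pd.Series([10, 20, 30], name='a')

# Keyword form, as in the notebook
by_keyword = s.apply(my_exp, e=4)

# Positional form: the tuple is appended after each element
by_position = s.apply(my_exp, args=(4,))
```

Both calls should produce the same Series; the keyword form tends to read more clearly when the function has several parameters.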

- `axis=0`: apply function to each column (**default**)
- `axis=1`: apply function to each row

```
def print_me(x):
    print(x)
    return x.sum()
df

df.apply(print_me)
df.apply(print_me, axis=1)
```

```
avg_3 = lambda x, y, z: (x + y + z) / 3
# df.apply(avg_3)  # TypeError: apply passes one whole column, not three scalars

avg_3_apply = lambda col: np.mean(col)
df.apply(avg_3_apply)

def avg_3_apply(col):
    x = col[0]
    y = col[1]
    z = col[2]
    return (x + y + z) / 3
df.apply(avg_3_apply)
```

```
df.apply(avg_3_apply, axis=1) # IndexError
```
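With `axis=1` each row arrives as a Series indexed by the column labels `a` and `b`, so there is no third element for `avg_3_apply` to read. A minimal sketch of a row-wise version that accesses values by label; `avg_row` and `row_means` are hypothetical names, not part of the notebook:

```python
import pandas as pd

df = pd.DataFrame({
    'a': [10, 20, 30],
    'b': [20, 30, 40]
})

def avg_row(row):
    # row is a Series with index ['a', 'b'], one value per column
    return (row['a'] + row['b']) / 2

row_means = df.apply(avg_row, axis=1)
```

Accessing by label rather than by position keeps the function independent of how many columns the frame has.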

```
df['a'].mean()
df['a'] + df['b']
```
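Built-in methods such as `mean` already work on whole columns or rows, so simple reductions do not need `apply` at all. A sketch under the assumption of the same `df`, comparing both routes (the result names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'a': [10, 20, 30],
    'b': [20, 30, 40]
})

# Column-wise mean: apply with a function vs. the built-in method
col_means_apply = df.apply(lambda col: col.mean())
col_means_builtin = df.mean()

# Row-wise mean: same comparison with axis=1
row_means_apply = df.apply(lambda row: row.mean(), axis=1)
row_means_builtin = df.mean(axis=1)
```

The built-in forms are usually the better default; `apply` is mainly useful when no vectorized method expresses the computation.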

```
def avg_2_mod(x, y):
    if x == 20:
        return np.nan  # the np.NaN alias was removed in NumPy 2.0
    else:
        return (x + y) / 2
df
avg_2_mod(df['a'], df['b']) # ValueError
```
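The `ValueError` comes from the `if` statement: comparing a Series with a scalar yields one boolean per row, and Python cannot reduce that to a single truth value. A small sketch that reproduces the situation in isolation; `mask` and `ambiguous` are names made up for this example:

```python
import pandas as pd

df = pd.DataFrame({
    'a': [10, 20, 30],
    'b': [20, 30, 40]
})

# Element-wise comparison: a boolean Series, not a single bool
mask = df['a'] == 20

try:
    if mask:          # pandas refuses to pick one truth value
        pass
    ambiguous = False
except ValueError:
    ambiguous = True
```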

> [**Reference**] https://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html

> - The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.
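Since `np.vectorize` still loops in Python, a conditional like the one in `avg_2_mod` can often be written with array operations instead. The following is a sketch using `np.where`, a swapped-in technique rather than anything the referenced page prescribes; `avg_where` is an illustrative name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [10, 20, 30],
    'b': [20, 30, 40]
})

# Where a == 20 take NaN, otherwise take the element-wise average
avg_where = np.where(df['a'] == 20, np.nan, (df['a'] + df['b']) / 2)
```

Note that both branches are computed for every row before `np.where` selects between them, which is fine here but matters when one branch is expensive or invalid for some inputs.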

```
avg_2_mod_vec = np.vectorize(avg_2_mod)
df
avg_2_mod_vec(df['a'], df['b'])
```

```
@np.vectorize
def avg_2_mod(x, y):
    if x == 20:
        return np.nan  # the np.NaN alias was removed in NumPy 2.0
    else:
        return (x + y) / 2
avg_2_mod(df['a'], df['b'])
```

### Numba

- Numba is an open source JIT compiler that translates a subset of Python and NumPy code into fast machine code. 
- https://numba.pydata.org

```
import numba

@numba.vectorize
def avg_2_mod_numba(x, y):
    if x == 20:
        return np.nan  # the np.NaN alias was removed in NumPy 2.0
    else:
        return (x + y) / 2
avg_2_mod_numba(df['a'].values, df['b'].values)
```
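The Numba call above is given plain NumPy arrays rather than pandas Series. Both `.values` and `.to_numpy()` expose the underlying array; the pandas documentation recommends `.to_numpy()` for new code. A short sketch, assuming the same `df`, with illustrative variable names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [10, 20, 30],
    'b': [20, 30, 40]
})

# Two ways to get the NumPy array behind a column
arr_values = df['a'].values
arr_to_numpy = df['a'].to_numpy()
```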

In [None]:
%%timeit
avg_2(df['a'], df['b'])

In [None]:
%%timeit
avg_2_mod(df['a'], df['b'])

In [None]:
%%timeit
avg_2_mod_numba(df['a'].to_numpy(), df['b'].to_numpy())

In [None]:
%%timeit
avg_2_mod_numba(df['a'].values, df['b'].values)
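
The `%%timeit` cells only work inside IPython or Jupyter. Outside a notebook, a comparable measurement can be sketched with the standard-library `timeit` module; the repetition count of 1000 and the names `elapsed` and `per_call` are arbitrary choices for this example:

```python
import timeit

import pandas as pd

df = pd.DataFrame({
    'a': [10, 20, 30],
    'b': [20, 30, 40]
})

avg_2 = lambda x, y: (x + y) / 2

# Total seconds for 1000 calls, then the average per call
elapsed = timeit.timeit(lambda: avg_2(df['a'], df['b']), number=1000)
per_call = elapsed / 1000
```

With a three-row frame the timings mostly reflect call overhead, so differences between the approaches become clearer on larger inputs.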