# Pandas Apply Functions

Sometimes the built-in Pandas functions are not enough. Fortunately, Pandas lets you apply your own functions to Pandas objects. Which method to use depends on whether your function operates on an entire `DataFrame` or `Series`, or on one row or column at a time:
- table-wise function application: `pipe()`
- row or column-wise function application: `apply()`

In [4]:
import pandas as pd
import numpy as np

## Table-wise Function Application

A `DataFrame` or `Series` can be passed into a function like any other object.

In [5]:
# as the function name implies, this extracts the city name
def extract_city_name(df):
    """
    Chicago, IL -> Chicago for city_name column
    """
    df['city_name'] = df['city_and_code'].str.split(",").str.get(0)
    return df

In [13]:
# adds a country name to the specified dataframe
# so long as it has a city_name column
def add_country_name(df, country_name=None):
    """
    Chicago -> Chicago, US for city_and_country column
    """
    col = 'city_name'
    df['city_and_country'] = df[col] + f', {country_name}'
    return df

In [14]:
# create an example dataframe to work with
df_p = pd.DataFrame({'city_and_code': ['Chicago, IL']})

In [15]:
add_country_name(
    extract_city_name(df_p),
    country_name='US')

| | city_and_code | city_name | city_and_country |
|---|---|---|---|
| 0 | Chicago, IL | Chicago | Chicago, US |


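Nested calls like the one above have to be read from the inside out. Pandas encourages the use of `pipe()` instead, which lets you write the same steps in the order they run. This style is known as **method chaining**. `pipe()` makes it easy to use your own or another library's functions in method chains, alongside Pandas' methods.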

In [16]:
# df_p is `piped` into the function extract_city_name,
# then the result of that function is `piped` again into the function
# add_country_name with an additional parameter country_name
(df_p.pipe(extract_city_name)
 .pipe(add_country_name, country_name="US"))

| | city_and_code | city_name | city_and_country |
|---|---|---|---|
| 0 | Chicago, IL | Chicago | Chicago, US |
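The equivalence between nested calls and `pipe()` can be sketched as follows. This is a minimal illustration, not the only way to write it: the two functions are variants of the ones above that work on a copy (an assumption made here so the input frame is not modified), and the `cities` frame with its second row is hypothetical sample data.

```python
import pandas as pd

def extract_city_name(df):
    # work on a copy so the caller's DataFrame is left untouched
    out = df.copy()
    out['city_name'] = out['city_and_code'].str.split(",").str.get(0)
    return out

def add_country_name(df, country_name=None):
    out = df.copy()
    out['city_and_country'] = out['city_name'] + f', {country_name}'
    return out

# hypothetical sample data
cities = pd.DataFrame({'city_and_code': ['Chicago, IL', 'Austin, TX']})

# nested form: read from the inside out
nested = add_country_name(extract_city_name(cities), country_name='US')

# piped form: df.pipe(f, **kwargs) calls f(df, **kwargs)
piped = (cities
         .pipe(extract_city_name)
         .pipe(add_country_name, country_name='US')
         .sort_values('city_name')      # built-in methods fit in the same chain
         .reset_index(drop=True))
```

Because each function returns a new `DataFrame`, the chain can mix custom steps and built-in methods without changing `cities`.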


## Row or Column-wise Function Application

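Arbitrary functions can be applied along the axes of a `DataFrame` using the `apply()` method, which, like the descriptive statistics methods, takes an optional `axis` argument. By default (`axis=0`) the function receives each column as a `Series`; with `axis=1` it receives each row.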

In [19]:
df = pd.DataFrame({
    'one': pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
    'two': pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd']),
    'three': pd.Series(np.random.randn(3), index=['b', 'c', 'd'])
})

df

| | one | two | three |
|---|---|---|---|
| a | 1.734365 | -1.989212 | NaN |
| b | -0.587418 | 1.22772 | -0.665805 |
| c | -0.992983 | -1.863234 | -0.256519 |
| d | NaN | -1.082675 | 0.572919 |


In [18]:
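# pre-built numpy function, mean (applied to each column by default)
# note: this cell (18) ran before the df shown above was created (19),
# so the values below come from an earlier random draw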
df.apply(np.mean)

one      0.350208
two     -0.024786
three   -0.393825
dtype: float64

In [20]:
# the same numpy function, mean, applied to each row (axis=1)
df.apply(np.mean, axis=1)

a   -0.127424
b   -0.008501
c   -1.037579
d   -0.254878
dtype: float64

In [21]:
# lambda function
df.apply(lambda x: x.max() - x.min())

one      2.727348
two      3.216932
three    1.238724
dtype: float64

In [22]:
# pre-built numpy function, cumulative sum
df.apply(np.cumsum)

| | one | two | three |
|---|---|---|---|
| a | 1.734365 | -1.989212 | NaN |
| b | 1.146947 | -0.761493 | -0.665805 |
| c | 0.153963 | -2.624727 | -0.922324 |
| d | NaN | -3.707402 | -0.349405 |


In [23]:
# pre-built numpy function, exponential
df.apply(np.exp)

| | one | two | three |
|---|---|---|---|
| a | 5.665327 | 0.136803 | NaN |
| b | 0.55576 | 3.413437 | 0.51386 |
| c | 0.37047 | 0.15517 | 0.77374 |
| d | NaN | 0.338688 | 1.773436 |


In [25]:
def my_function(x):
    return x*x

df.apply(my_function)

| | one | two | three |
|---|---|---|---|
| a | 3.008021 | 3.956966 | NaN |
| b | 0.34506 | 1.507295 | 0.443296 |
| c | 0.986016 | 3.47164 | 0.065802 |
| d | NaN | 1.172186 | 0.328236 |
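When a custom function only does element-wise arithmetic, as `my_function` does, the same result can usually be obtained by applying the arithmetic to the whole `DataFrame` directly. The sketch below compares the two on hypothetical data; it is meant as an illustration of that equivalence, not as a performance claim.

```python
import numpy as np
import pandas as pd

# hypothetical data, including a missing value
frame = pd.DataFrame({'one': [1.5, -0.5, np.nan],
                      'two': [2.0, 3.0, -1.0]},
                     index=['a', 'b', 'c'])

def my_function(x):
    return x * x

# column by column through apply()
via_apply = frame.apply(my_function)

# the same arithmetic on the whole DataFrame at once
vectorized = frame * frame

# raises an AssertionError if the two results differ (NaN positions must match)
pd.testing.assert_frame_equal(via_apply, vectorized)
```

`apply()` remains the more general tool for functions that cannot be written as a single expression over the whole frame.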


In [26]:
def subtract_and_divide(x, sub, divide=1):
    return (x-sub) / divide

df.apply(subtract_and_divide, args=(5,3))

| | one | two | three |
|---|---|---|---|
| a | -1.088545 | -2.329737 | NaN |
| b | -1.862473 | -1.257427 | -1.888602 |
| c | -1.997661 | -2.287745 | -1.752173 |
| d | NaN | -2.027558 | -1.475694 |


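`args` must be a tuple of the extra positional arguments. Therefore, even if you pass only one argument, you have to write it as a one-element tuple with a trailing comma, `(5,)`; without the comma, `(5)` is just the integer `5`.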

In [28]:
def subtract(x, sub):
    return (x - sub)

df.apply(subtract, args=(5,))

| | one | two | three |
|---|---|---|---|
| a | -3.265635 | -6.989212 | NaN |
| b | -5.587418 | -3.77228 | -5.665805 |
| c | -5.992983 | -6.863234 | -5.256519 |
| d | NaN | -6.082675 | -4.427081 |
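Extra arguments can also be passed by keyword, since `apply()` forwards additional keyword arguments to the function. The sketch below shows three ways to call `subtract_and_divide` that should give the same result; the `frame` values are hypothetical and chosen so the arithmetic is easy to check.

```python
import pandas as pd

# hypothetical data chosen so (x - 5) / 3 gives whole numbers
frame = pd.DataFrame({'one': [8.0, 11.0], 'two': [5.0, 14.0]})

def subtract_and_divide(x, sub, divide=1):
    return (x - sub) / divide

# both extra arguments passed positionally through args
positional = frame.apply(subtract_and_divide, args=(5, 3))

# one positional argument (note the trailing comma) plus one keyword
by_keyword = frame.apply(subtract_and_divide, args=(5,), divide=3)

# both extra arguments passed by keyword
all_keywords = frame.apply(subtract_and_divide, sub=5, divide=3)
```

Passing arguments by keyword avoids the one-element tuple altogether and makes it clearer which parameter each value is bound to.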
