We can map across all of the rows in a DataFrame using the apply function.

Consider the census df example. We have several columns of population estimates, with each column corresponding to one year of estimates. It's quite reasonable to want to create new columns for the minimum or maximum values, and the apply function is an easy way to do this.

First, we need to write a function which takes in a row of data, finds the minimum and maximum values, and returns a new row of data. We'll call this function min_max. We can create a small slice of the row by projecting the population columns, then use the NumPy min and max functions and create a new Series whose labels name the new values we want to apply.

In [None]:
import pandas as pd
import numpy as np

In [None]:
def min_max(row):
    data = row[['POPESTIMATE2010',
                'POPESTIMATE2011',
                'POPESTIMATE2012',
                'POPESTIMATE2013',]]
    return pd.Series({'min': np.min(data), 'max': np.max(data)})

Then we just need to call apply on the df.

apply takes the function and the axis on which to operate as parameters. To call the function once per row, so that it works across the columns of that row, pass axis='columns' (equivalently, axis=1).

In [None]:
df.apply(min_max, axis='columns').head()
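To see what this call returns without loading the census file, here is a minimal sketch on a small made-up DataFrame. The `toy` frame, its county names, and its population values are invented for illustration, and only two of the estimate columns are used.

```python
import numpy as np
import pandas as pd

# Hypothetical data: names and numbers are made up for illustration only.
toy = pd.DataFrame({
    'CTYNAME': ['A County', 'B County', 'C County'],
    'POPESTIMATE2010': [100, 250, 80],
    'POPESTIMATE2011': [110, 240, 95],
})

def min_max(row):
    # Select just the population entries from this row (a Series).
    data = row[['POPESTIMATE2010', 'POPESTIMATE2011']]
    # The labels of the returned Series become the result's column names.
    return pd.Series({'min': np.min(data), 'max': np.max(data)})

# axis='columns' (the same as axis=1) calls min_max once per row.
result = toy.apply(min_max, axis='columns')
print(result)
```

The result has one row per input row and only the two columns named in the returned Series.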

Here's a revised version of the min_max function. Instead of returning a separate Series holding only min and max, it adds two new entries to the row it receives, so the result of apply has all of the original columns plus min and max.

In [None]:
def min_max(row):
    data = row[['POPESTIMATE2010',
                'POPESTIMATE2011',
                'POPESTIMATE2012',
                'POPESTIMATE2013',]]
    # Create a new entry for max
    row['max'] = np.max(data)
    # Create a new entry for min
    row['min'] = np.min(data)
    return row

# Now just apply the function
df.apply(min_max, axis='columns')
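Note that apply hands back a new DataFrame, so the result has to be assigned to a name if the added columns are to be kept. A minimal sketch of that, assuming a small made-up `toy` frame and a hypothetical function name `add_min_max` (neither comes from the census data):

```python
import numpy as np
import pandas as pd

# Hypothetical data: names and numbers are made up for illustration only.
toy = pd.DataFrame({
    'CTYNAME': ['A County', 'B County', 'C County'],
    'POPESTIMATE2010': [100, 250, 80],
    'POPESTIMATE2011': [110, 240, 95],
})

def add_min_max(row):
    # Work on a copy so the sketch does not rely on mutating what apply passes in.
    row = row.copy()
    data = row[['POPESTIMATE2010', 'POPESTIMATE2011']]
    row['max'] = np.max(data)
    row['min'] = np.min(data)
    return row

# Keep the returned DataFrame; toy itself gains no new columns.
with_stats = toy.apply(add_min_max, axis='columns')
print(with_stats.columns.tolist())
print(toy.columns.tolist())
```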

However, apply is rarely used with a large function definition like the one above. Instead, lambdas are typically used to keep the code succinct.

In [None]:
cols = ['POPESTIMATE2010', 'POPESTIMATE2011', 'POPESTIMATE2012', 'POPESTIMATE2013']
# Now we'll just apply this across the df with a lambda.
# The default axis is 0, which passes each column to the function; use axis=1 to pass each row
df.apply(lambda x: np.max(x[cols]), axis=1).head()
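For a simple reduction like a row-wise maximum, pandas also offers a built-in method that gives the same values without a lambda. The sketch below compares the two on a made-up `toy` frame; the data and the name `pop_cols` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: names and numbers are made up for illustration only.
toy = pd.DataFrame({
    'CTYNAME': ['A County', 'B County', 'C County'],
    'POPESTIMATE2010': [100, 250, 80],
    'POPESTIMATE2011': [110, 240, 95],
})
pop_cols = ['POPESTIMATE2010', 'POPESTIMATE2011']

# Row by row with a lambda, as in the cell above.
via_apply = toy.apply(lambda row: np.max(row[pop_cols]), axis=1)

# Built-in row-wise maximum over the same columns.
via_builtin = toy[pop_cols].max(axis=1)

print(via_apply.tolist())
print(via_builtin.tolist())
```

The lambda form stays useful when the per-row logic is not a single built-in reduction.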

In [2]:
# Another example: use apply to run a function over one column of a df, then store
# the value returned for each row in a new column

In [3]:
# The original df has a STNAME column holding the state name for each row, so we can write
# this custom function and pass it to apply later.
def get_state_region(x):
    northeast = ['List of Northeastern states...']
    midwest = ['List of Midwestern states...']
    south = ['List of Southern states...']
    west = ['List of Western states...']
    
    if x in northeast:
        return "Northeast"
    elif x in midwest:
        return "Midwest"
    elif x in south:
        return "South"
    else:
        return "West"

With the custom function defined above, let's say we want to create a new column called state_region, which shows each state's region. The function works on a single state name, so we call apply on the state name column STNAME and pass the custom function to it.

In [None]:
df['state_region'] = df['STNAME'].apply(get_state_region)

In [None]:
# Let's see the results
df[['STNAME', 'state_region']].head()
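A self-contained sketch of the same pattern is shown here on a four-row made-up frame. The region lists are deliberately partial and only illustrative (two states each), not the full lists the function above is meant to hold, and the `toy` frame is an assumption.

```python
import pandas as pd

def get_state_region(x):
    # Partial, illustrative lists only; the real lists would name every state.
    northeast = ['Maine', 'Vermont']
    midwest = ['Ohio', 'Iowa']
    south = ['Texas', 'Florida']

    if x in northeast:
        return "Northeast"
    elif x in midwest:
        return "Midwest"
    elif x in south:
        return "South"
    else:
        # Everything not matched above falls through to West.
        return "West"

# Hypothetical data for illustration only.
toy = pd.DataFrame({'STNAME': ['Maine', 'Ohio', 'Texas', 'Oregon']})

# Series.apply calls the function once per value in the column.
toy['state_region'] = toy['STNAME'].apply(get_state_region)
print(toy)
```

Because of the final else branch, any name missing from the first three lists, including a misspelled one, is labeled West.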