# Using the `apply()` method in pandas

Sometimes, creating a calculated column in pandas is as simple as this:

```python
df['difference'] = df['first_column'] - df['second_column']
```

or this:

```python
df['date_fixed'] = pd.to_datetime(df['date'])
```

Other times, though, your needs are more complex -- you need to take each row of data in your data frame and do _several things_ to it. That's where [`apply()`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html) comes in.

Given a function, `apply()` will, uh, _apply_ that function to every row (or every column) of the data frame. A common reason to do this is to create a new column.

An example might make this idea a little clearer. Let's load up a CSV of Texas death row media witnesses.

```python
import pandas as pd
```

```python
df = pd.read_csv('../data/tx-death-row-media-list.csv', parse_dates=['execution_date'])
```

Now let's say we want to create a new column with the _month_ of the execution. [Given what we know about date objects](Date%20and%20time%20data%20types.ipynb), this should be simple, right?

So this might be my first guess:

```python
df['month'] = df['execution_date'].month
```

```
AttributeError: 'Series' object has no attribute 'month'
```

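Womp womp. The `month` attribute lives on each individual date, not on the column as a whole. One way to get at it is to write a _function_ that pulls the month out of a single row. Then we can _apply_ that function to each row.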

👉 For a refresher on writing your own functions, [check out this notebook](Functions.ipynb).

```python
def get_month(row):
    '''Given a row of data, return the month of the execution date'''
    return row['execution_date'].month
```
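Before applying a function to every row, it can help to try it on just one. Here's a minimal sketch of that check. The `toy` data frame and its two rows are made up for illustration (the real notebook reads its dates from the CSV), and `first_row` is just a name picked for this example.

```python
import pandas as pd

# Made-up stand-in data; the real notebook loads these values from the CSV
toy = pd.DataFrame({
    'execution_no': [572, 571],
    'execution_date': pd.to_datetime(['2021-06-30', '2021-05-19']),
})


def get_month(row):
    '''Given a row of data, return the month of the execution date'''
    return row['execution_date'].month


# .iloc[0] pulls out the first row as a Series keyed by column name
first_row = toy.iloc[0]

# A single value from that row is a Timestamp, which does have .month
print(type(first_row['execution_date']).__name__)
print(get_month(first_row))
```

The column as a whole is a `Series`, which is why `.month` failed earlier, but each value inside a row is an individual date.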

... and now we can apply it. We also need to specify _how_ it's going to be applied. `axis=0` is the default and attempts to apply the function to each _column_. We want `axis=1`, which applies the function to each _row_ of data.

```python
df['month'] = df.apply(get_month, axis=1)
```

```python
df.head()
```

| | execution_no | execution_date | journo_last | journo_rest | journo_affiliation | inmate_no | inmate_last | inmate_rest | url | month |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 572 | 2021-06-30 | Graczyk | Michael | Associated Press | 999567 | Hummel | John | https://www.tdcj.state.tx.us/death_row/dr_info... | 6 |
| 1 | 572 | 2021-06-30 | Brown | Joseph | Huntsville Item | 999567 | Hummel | John | https://www.tdcj.state.tx.us/death_row/dr_info... | 6 |
| 2 | 571 | 2021-05-19 | No media witnesses present. | | | 999379 | Jones | Quintin | https://www.tdcj.state.tx.us/death_row/dr_info... | 5 |
| 3 | 570 | 2020-07-08 | Graczyk | Michael | Associated Press | 999137 | Wardlow | Billy | https://www.tdcj.state.tx.us/death_row/dr_info... | 7 |
| 4 | 570 | 2020-07-08 | Brown | Joseph | Huntsville Item | 999137 | Wardlow | Billy | https://www.tdcj.state.tx.us/death_row/dr_info... | 7 |
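To see what the `axis` argument changes, here's a small sketch on made-up numbers. The `toy` data frame, its columns `a` and `b`, and the two lambdas are all invented for this illustration and have nothing to do with the death row data.

```python
import pandas as pd

# Made-up numbers, used only to show the two directions
toy = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# axis=0 (the default): the function receives each column in turn,
# so the result has one value per column
per_column = toy.apply(lambda col: col.max() - col.min())

# axis=1: the function receives each row in turn,
# so the result has one value per row
per_row = toy.apply(lambda row: row['b'] - row['a'], axis=1)

print(per_column)
print(per_row)
```

With `axis=1`, the result has one value per row, which is why it can be assigned straight to a new column.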


We could also have dropped in a _lambda expression_ for the function -- in this case, it's simple enough to be readable:

```python
df['month'] = df.apply(lambda x: x['execution_date'].month, axis=1)
```
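A side note, not part of the original walkthrough: when all you need is one date component, pandas also exposes it on the whole column through the `.dt` accessor, so no row-by-row function is needed. The sketch below compares the two approaches on a made-up `toy` data frame. `apply()` still earns its keep when each row needs several steps.

```python
import pandas as pd

# Made-up dates standing in for the execution_date column
toy = pd.DataFrame({
    'execution_date': pd.to_datetime(['2021-06-30', '2021-05-19', '2020-07-08']),
})

# Row by row, with a lambda expression
with_apply = toy.apply(lambda x: x['execution_date'].month, axis=1)

# Whole column at once, via the .dt accessor
with_dt = toy['execution_date'].dt.month

# Both approaches should produce the same months
print(with_apply.tolist() == with_dt.tolist())
```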