# Pandas - Adding Calculated Columns to a DataFrame

In [27]:
import pandas as pd

### Create DataFrame

In [28]:
inventory = [
    {'client': 'Arya', 'product': 'hot pies', 'n_boxes': 6, 'n_per_box': 2},
    {'client': 'Brienne', 'product': 'sapphires', 'n_boxes': 2, 'n_per_box': 100},
    {'client': 'Cersei', 'product': 'bottles of wine', 'n_boxes': 8, 'n_per_box': 12},
    {'client': 'Davos', 'product': 'onions', 'n_boxes': 10, 'n_per_box': 20},
]
columns = ['client', 'product', 'n_boxes', 'n_per_box']
df = pd.DataFrame(inventory, columns=columns)
df

| | client | product | n_boxes | n_per_box |
|---|---|---|---|---|
| 0 | Arya | hot pies | 6 | 2 |
| 1 | Brienne | sapphires | 2 | 100 |
| 2 | Cersei | bottles of wine | 8 | 12 |
| 3 | Davos | onions | 10 | 20 |


### Add calculated column based on one other column in the DataFrame

If you call `.apply()` on a single column of the DataFrame, you are calling it on a Series. The function receives one value at a time: the value in that column for each row. You don't need to specify an axis, and you don't need to index the passed value.

In [29]:
def designate_house(x):
    houses = {
        'Eddard': 'Stark',
        'Cersei': 'Lannister',
        'Arya': 'Stark',
        'Roose': 'Bolton',
        # 'Davos': 'Seaworth',
        'Brienne': 'Tarth',
        'Jaime': 'Lannister',
    }
    return houses.get(x, 'UNSPECIFIED') # value returned if not found in the houses dict

# Pass the function directly; wrapping it in a lambda is unnecessary.
df['house'] = df['client'].apply(designate_house)
df

| | client | product | n_boxes | n_per_box | house |
|---|---|---|---|---|---|
| 0 | Arya | hot pies | 6 | 2 | Stark |
| 1 | Brienne | sapphires | 2 | 100 | Tarth |
| 2 | Cersei | bottles of wine | 8 | 12 | Lannister |
| 3 | Davos | onions | 10 | 20 | UNSPECIFIED |
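
When the function is nothing more than a dictionary lookup with a default, `Series.map()` with the dict itself is a possible alternative to `.apply()`. The sketch below is an illustration rather than part of the original notebook: `clients` is a hypothetical stand-in for the `client` column, and the `houses` dict is a trimmed copy of the one inside `designate_house()`.

```python
import pandas as pd

# Hypothetical stand-in for the `client` column of the DataFrame above.
clients = pd.Series(['Arya', 'Brienne', 'Cersei', 'Davos'], name='client')

# Trimmed copy of the lookup table used in designate_house().
houses = {'Arya': 'Stark', 'Brienne': 'Tarth', 'Cersei': 'Lannister'}

# .map() looks each value up in the dict; keys that are missing become NaN,
# so .fillna() supplies the default that houses.get() provided before.
house = clients.map(houses).fillna('UNSPECIFIED')
print(house.tolist())  # ['Stark', 'Tarth', 'Lannister', 'UNSPECIFIED']
```

This keeps the lookup table outside a function, which may or may not suit your code. `.apply()` remains the more general tool when the per-value logic is more than a lookup.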


### Add calculated column based on multiple other columns in the DataFrame

When you call `.apply()` on the whole DataFrame, the `axis` argument controls what is passed to the function. With `axis=1`, the function receives one Series per row, indexed by column name. You can pick out the values you're interested in by indexing that Series with the relevant column names.

In [30]:
def total_inventory(x):
    # x is one row of the DataFrame, passed as a Series indexed by column name
    return x['n_boxes'] * x['n_per_box']

df['n_total'] = df.apply(total_inventory, axis=1)
df

| | client | product | n_boxes | n_per_box | house | n_total |
|---|---|---|---|---|---|---|
| 0 | Arya | hot pies | 6 | 2 | Stark | 12 |
| 1 | Brienne | sapphires | 2 | 100 | Tarth | 200 |
| 2 | Cersei | bottles of wine | 8 | 12 | Lannister | 96 |
| 3 | Davos | onions | 10 | 20 | UNSPECIFIED | 200 |
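
For plain arithmetic like this, operating on the columns directly gives the same result without calling a Python function once per row. The sketch below compares the two approaches on a small frame; `df_demo`, `row_wise`, and `vectorized` are names introduced here for illustration, with values copied from the inventory above.

```python
import pandas as pd

# Hypothetical frame holding only the two numeric columns from the example.
df_demo = pd.DataFrame({
    'n_boxes': [6, 2, 8, 10],
    'n_per_box': [2, 100, 12, 20],
})

# Row-wise: with axis=1 each row arrives as a Series indexed by column name.
row_wise = df_demo.apply(lambda row: row['n_boxes'] * row['n_per_box'], axis=1)

# Column-wise: multiply the two columns element by element.
vectorized = df_demo['n_boxes'] * df_demo['n_per_box']

print(row_wise.tolist())    # [12, 200, 96, 200]
print(vectorized.tolist())  # [12, 200, 96, 200]
```

Column arithmetic is usually the simpler choice when the calculation can be written with ordinary operators. `.apply(..., axis=1)` is still useful when the per-row logic involves branching or lookups that don't map cleanly onto column operations.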
