# Creating Calculated Columns in `pandas`

In this notebook, you'll see a few ways to create calculated columns in pandas.

In [None]:
import pandas as pd
import numpy as np

In [None]:
weather = pd.read_csv('data/weather.csv')

In [None]:
weather.head()

## Method 1: Vectorized Operations

A vectorized operation performs a calculation on one or more entire columns at once. This is the preferred method because it is almost always the fastest, so use it whenever possible.

Example: Let's say we want to convert our high temperature, which is currently in degrees Fahrenheit, to degrees Celsius. Recall that to convert from Fahrenheit to Celsius, you subtract 32 and then multiply by 5/9.

In [None]:
weather['High_Temp_Celsius'] = (weather['High_Temp'] - 32) * 5/9

In [None]:
weather.head()
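To check the conversion by hand, here is a minimal self-contained sketch. The `sample` DataFrame and its values are made up for illustration and stand in for the weather data; only the `High_Temp` column name comes from the notebook.

```python
import pandas as pd

# Hypothetical sample data standing in for data/weather.csv
sample = pd.DataFrame({'High_Temp': [32.0, 50.0, 212.0]})

# One expression converts every row at once; no explicit loop is needed
sample['High_Temp_Celsius'] = (sample['High_Temp'] - 32) * 5/9

print(sample)
```

The three inputs are the freezing point of water, a mild day, and the boiling point of water, so the new column should hold 0, 10, and 100.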

You can also create new columns by combining two or more other columns. Let's say we want to calculate the range of temperature values.

In [None]:
weather['Temp_Range'] = weather['High_Temp'] - weather['Low_Temp']

Many NumPy functions are vectorized too, so you can call them directly on a column.

In [None]:
weather['Sqrt_Temp'] = np.sqrt(weather['High_Temp'])

In [None]:
weather.head()

## Method 2: `.apply`

The `.apply` method lets you run a function over the values of a column. Generally, `.apply` will be slower than a vectorized solution.

In [None]:
weather['Sqrt_Temp'] = weather['High_Temp'].apply(np.sqrt)

You can also write your own functions and use them with `.apply`.

In [None]:
def convert_fahrenheit_to_celsius(temp):
    return (temp - 32) * 5/9

In [None]:
weather['Low_Temp_Celsius'] = weather['Low_Temp'].apply(convert_fahrenheit_to_celsius)

In [None]:
weather.head()
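If you want to see the speed difference for yourself, the sketch below times both approaches with the standard library's `timeit` module. The `temps` Series is made-up data, and the exact timings will vary with your machine and pandas version, so treat the printed numbers as a rough comparison only.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 100,000 made-up Fahrenheit temperatures
temps = pd.Series(np.linspace(-20.0, 110.0, 100_000))

def convert_fahrenheit_to_celsius(temp):
    return (temp - 32) * 5/9

# Both approaches produce the same values
vectorized = (temps - 32) * 5/9
applied = temps.apply(convert_fahrenheit_to_celsius)
print(np.allclose(vectorized, applied))

# Time five runs of each approach
vectorized_time = timeit.timeit(lambda: (temps - 32) * 5/9, number=5)
apply_time = timeit.timeit(
    lambda: temps.apply(convert_fahrenheit_to_celsius), number=5
)
print(f'vectorized: {vectorized_time:.4f} seconds')
print(f'.apply:     {apply_time:.4f} seconds')
```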

## Method 2b: `.apply` with a lambda function

Recall that a **lambda function** is an anonymous function. Lambda functions are useful if you only need to use a function a single time.

Using `.apply` with a lambda function is generally no faster than using it with a named function, and both are usually slower than a vectorized solution, so avoid this approach when a vectorized alternative exists.

In [None]:
weather['Low_Temp_Celsius'] = weather['Low_Temp'].apply(lambda x: (x - 32) * 5/9)

If you have a function that takes values from two or more columns, you can call `.apply` on the whole DataFrame with a lambda function that passes each row's values to it. In this case, you need to specify that the function should be applied to each row (`axis=1`).

Note: this is an example where you would definitely just use vectorized operations, but for more complicated or nontrivial operations on the columns, you may need to use the `.apply` approach.

In [None]:
def difference(a, b):
    return a - b

In [None]:
# Note the axis = 1 argument
weather['Temp_Range'] = weather.apply(lambda row: difference(row['High_Temp'], row['Low_Temp']), axis = 1)

In [None]:
weather.head()
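To illustrate the kind of function where row-wise `.apply` is more tempting, here is a sketch with branching logic over two columns. The `sample` data, the `describe_day` function, and its thresholds are all hypothetical. The sketch also shows one possible vectorized alternative using `np.select`, since even branching rules can often be expressed without `.apply`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data using the notebook's column names
sample = pd.DataFrame({
    'High_Temp': [85, 60, 40],
    'Low_Temp': [70, 30, 35],
})

def describe_day(high, low):
    # Made-up rules, checked in order
    if high - low >= 20:
        return 'large swing'
    if high >= 80:
        return 'hot'
    return 'mild'

# Row-wise .apply: the lambda receives one row at a time
sample['Description'] = sample.apply(
    lambda row: describe_day(row['High_Temp'], row['Low_Temp']), axis=1
)

# Vectorized alternative: the first condition that is True wins
swing = sample['High_Temp'] - sample['Low_Temp']
sample['Description_Vectorized'] = np.select(
    [swing >= 20, sample['High_Temp'] >= 80],
    ['large swing', 'hot'],
    default='mild',
)

print(sample)
```

Both columns should agree row by row, because `np.select` evaluates its conditions in the same order as the `if` statements in `describe_day`.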

## Method 3: Iteration

Two ways to iterate through a dataframe are the `iterrows` and the `itertuples` methods.

The first method, `iterrows`, yields one tuple per row. Each tuple contains the index value of the row and the content of that row as a `pandas` Series.

In [None]:
for idx, row in weather.iterrows():
    print(idx)
    print(row)
    print('-----------')

Since the second component of this tuple is a Series, you can access its elements by indexing with the column name.

In [None]:
for idx, row in weather.iterrows():
    print('Date: {}'.format(row['Date']))
    print('High Temperature: {}'.format(row['High_Temp']))
    print('----------')
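One consequence of each row being a Series is that all of its values share a single dtype. The sketch below uses a made-up two-column DataFrame (the `Station_ID` column is hypothetical) to show an integer column being read back as a float inside the loop.

```python
import pandas as pd

# Hypothetical data: one integer column and one float column
sample = pd.DataFrame({
    'Station_ID': [101, 102],
    'High_Temp': [71.5, 68.0],
})

# In the DataFrame, each column keeps its own dtype
print(sample.dtypes)

for idx, row in sample.iterrows():
    # Each row is a Series, so its values are converted to a common dtype
    print(idx, row['Station_ID'], row.dtype)

# Grab the first row's Series to inspect it outside the loop
first_row = next(sample.iterrows())[1]
print(first_row.dtype)
```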

The `itertuples` method is similar, but it instead returns a `namedtuple`. This makes it faster than `iterrows` in general.

In [None]:
for item in weather.itertuples():
    print(item)
    print('------')

Using either of these iteration methods, you can create a new calculated column. However, you should only use this as a last resort, when you are doing some operation for which neither vectorized operations nor `.apply` will work.

Note that to access an element of a namedtuple, you use a `.` followed by the field name, for example `row.Avg_Temp`. The index value is available as `row.Index`.

In [None]:
# Write the converted value into the new column one row at a time
for row in weather.itertuples():
    weather.loc[row.Index, "Avg_Temp_Celsius"] = (row.Avg_Temp - 32) * 5/9

In [None]:
weather.head()
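As an alternative to writing into the DataFrame with `.loc` on every pass through the loop, you can collect the results in a list and assign the whole column once at the end. This is a minimal sketch on a made-up `sample` DataFrame; the list name `avg_temp_celsius` is a hypothetical choice.

```python
import pandas as pd

# Hypothetical sample data using the notebook's Avg_Temp column name
sample = pd.DataFrame({'Avg_Temp': [32.0, 50.0, 212.0]})

# Collect one converted value per row, in row order
avg_temp_celsius = []
for row in sample.itertuples():
    avg_temp_celsius.append((row.Avg_Temp - 32) * 5/9)

# Assign the full list as a new column in a single step
sample['Avg_Temp_Celsius'] = avg_temp_celsius

print(sample)
```

The list has exactly one entry per row and is built in the same order as the rows, so the values line up with the DataFrame when assigned.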