In [1]:
import pandas as pd

# (5) How To Create New Columns Derived From Existing Columns?


In [2]:
air_quality = pd.read_csv(r"data/air_quality_no2.csv")

In [3]:
air_quality.head()

| | datetime | station_antwerp | station_paris | station_london |
|---|---|---|---|---|
| 0 | 2019-05-07 02:00:00 | NaN | NaN | 23.0 |
| 1 | 2019-05-07 03:00:00 | 50.5 | 25.0 | 19.0 |
| 2 | 2019-05-07 04:00:00 | 45.0 | 27.7 | 19.0 |
| 3 | 2019-05-07 05:00:00 | NaN | 50.4 | 16.0 |
| 4 | 2019-05-07 06:00:00 | NaN | 61.9 | NaN |


To create a new column, use the `[]` brackets with the new column name on the left side of the assignment.

In [4]:
air_quality["london_mg_per_cubic"] = air_quality["station_london"] * 1.882  # 1.882 is an assumed conversion factor, applied element-wise to every row

In [5]:
air_quality.head()

| | datetime | station_antwerp | station_paris | station_london | london_mg_per_cubic |
|---|---|---|---|---|---|
| 0 | 2019-05-07 02:00:00 | NaN | NaN | 23.0 | 43.286 |
| 1 | 2019-05-07 03:00:00 | 50.5 | 25.0 | 19.0 | 35.758 |
| 2 | 2019-05-07 04:00:00 | 45.0 | 27.7 | 19.0 | 35.758 |
| 3 | 2019-05-07 05:00:00 | NaN | 50.4 | 16.0 | 30.112 |
| 4 | 2019-05-07 06:00:00 | NaN | 61.9 | NaN | NaN |

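A minimal, self-contained sketch of the same pattern on a small hand-made DataFrame (the values are illustrative, not read from `air_quality_no2.csv`). It shows that the multiplication is applied to every row at once and that a missing value (`NaN`) simply propagates into the new column.

```python
import numpy as np
import pandas as pd

# Toy data (illustrative only); np.nan marks a missing measurement
df = pd.DataFrame({"station_london": [23.0, 19.0, np.nan]})

# Assigning to a new label with [] creates the column;
# the product is computed element-wise for every row
df["london_mg_per_cubic"] = df["station_london"] * 1.882

print(df)
```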

In [6]:
# Element-wise division, row by row; the result is NaN wherever either station has no value
air_quality["ratio_paris_antwerp"] = air_quality["station_paris"] / air_quality["station_antwerp"]

In [7]:
air_quality.head()

| | datetime | station_antwerp | station_paris | station_london | london_mg_per_cubic | ratio_paris_antwerp |
|---|---|---|---|---|---|---|
| 0 | 2019-05-07 02:00:00 | NaN | NaN | 23.0 | 43.286 | NaN |
| 1 | 2019-05-07 03:00:00 | 50.5 | 25.0 | 19.0 | 35.758 | 0.49505 |
| 2 | 2019-05-07 04:00:00 | 45.0 | 27.7 | 19.0 | 35.758 | 0.615556 |
| 3 | 2019-05-07 05:00:00 | NaN | 50.4 | 16.0 | 30.112 | NaN |
| 4 | 2019-05-07 06:00:00 | NaN | 61.9 | NaN | NaN | NaN |

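Arithmetic between two columns is matched row by row on the index. The sketch below uses made-up numbers (including a zero that does not appear in the real data) to show the two edge cases: a row where either operand is missing yields `NaN`, and dividing a float by zero yields `inf` instead of raising an error.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration; 0.0 is added only to show division by zero
df = pd.DataFrame(
    {
        "station_paris": [25.0, 27.7, 50.4, 30.0],
        "station_antwerp": [50.5, 45.0, np.nan, 0.0],
    }
)

# Each Paris value is divided by the Antwerp value in the same row
df["ratio_paris_antwerp"] = df["station_paris"] / df["station_antwerp"]

print(df)
```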

**Simple mathematical operators (`+`, `-`, `*`, `/`) and logical operators (`<`, `>`, `==`) all work element-wise in this way. If more advanced logic is required, use `.apply()`, e.g.: {**

In [8]:
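import numpy as np

# Square root of every Paris value. np.sqrt is a NumPy ufunc and already works
# element-wise, so np.sqrt(air_quality["station_paris"]) gives the same result;
# .apply() pays off mainly for custom Python functions
air_quality["station_paris_sqrt"] = air_quality["station_paris"].apply(np.sqrt)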

In [9]:
air_quality.head()

| | datetime | station_antwerp | station_paris | station_london | london_mg_per_cubic | ratio_paris_antwerp | station_paris_sqrt |
|---|---|---|---|---|---|---|---|
| 0 | 2019-05-07 02:00:00 | NaN | NaN | 23.0 | 43.286 | NaN | NaN |
| 1 | 2019-05-07 03:00:00 | 50.5 | 25.0 | 19.0 | 35.758 | 0.49505 | 5.0 |
| 2 | 2019-05-07 04:00:00 | 45.0 | 27.7 | 19.0 | 35.758 | 0.615556 | 5.263079 |
| 3 | 2019-05-07 05:00:00 | NaN | 50.4 | 16.0 | 30.112 | NaN | 7.099296 |
| 4 | 2019-05-07 06:00:00 | NaN | 61.9 | NaN | NaN | NaN | 7.867655 |


**}**
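Comparison operators also work element-wise and return a boolean column, while `.apply()` is the escape hatch for branching logic that has no vectorized equivalent. The sketch below uses made-up values, a hypothetical threshold of 40 and a hypothetical helper `classify`; none of these come from the original data.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration
df = pd.DataFrame({"station_paris": [25.0, 27.7, 50.4, np.nan]})

# Element-wise comparison against a hypothetical threshold: yields a boolean column
# (a NaN compared with > evaluates to False)
df["paris_above_40"] = df["station_paris"] > 40


# Hypothetical helper with branching logic, called once per value by .apply()
def classify(value):
    if pd.isna(value):
        return "missing"
    return "high" if value > 40 else "low"


df["paris_level"] = df["station_paris"].apply(classify)

print(df)
```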

**To rename columns, pass a dictionary that maps old names to new names: {**

In [10]:
# Keys are the current column names, values are the new names
air_quality_renamed = air_quality.rename(
    columns={
        "station_antwerp": "BETR801",
        "station_paris": "FR04014",
        "station_london": "London Westminster"
    }
)

In [11]:
air_quality_renamed.head()

| | datetime | BETR801 | FR04014 | London Westminster | london_mg_per_cubic | ratio_paris_antwerp | station_paris_sqrt |
|---|---|---|---|---|---|---|---|
| 0 | 2019-05-07 02:00:00 | NaN | NaN | 23.0 | 43.286 | NaN | NaN |
| 1 | 2019-05-07 03:00:00 | 50.5 | 25.0 | 19.0 | 35.758 | 0.49505 | 5.0 |
| 2 | 2019-05-07 04:00:00 | 45.0 | 27.7 | 19.0 | 35.758 | 0.615556 | 5.263079 |
| 3 | 2019-05-07 05:00:00 | NaN | 50.4 | 16.0 | 30.112 | NaN | 7.099296 |
| 4 | 2019-05-07 06:00:00 | NaN | 61.9 | NaN | NaN | NaN | 7.867655 |


**}**
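`rename` returns a new DataFrame and leaves the original untouched, which is why the result above is stored in `air_quality_renamed`. By default, dictionary keys that do not match an existing label are silently ignored; passing `errors="raise"` turns that into a `KeyError`. A small sketch on a toy frame (the misspelled key is deliberate):

```python
import pandas as pd

df = pd.DataFrame({"station_antwerp": [50.5], "station_paris": [25.0]})

renamed = df.rename(columns={"station_antwerp": "BETR801"})

# The original DataFrame keeps its column names; only the returned copy changes
print(df.columns.tolist())       # ['station_antwerp', 'station_paris']
print(renamed.columns.tolist())  # ['BETR801', 'station_paris']

# A misspelled key is ignored by default...
same = df.rename(columns={"station_antwrp": "BETR801"})
print(same.columns.tolist())     # ['station_antwerp', 'station_paris']

# ...but errors="raise" reports it as a KeyError
raised = False
try:
    df.rename(columns={"station_antwrp": "BETR801"}, errors="raise")
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```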

**The `columns` argument also accepts a mapping function, which is applied to every column name: {**

In [12]:
# str.lower is called on each column name in turn
air_quality_renamed = air_quality_renamed.rename(columns=str.lower)

In [13]:
air_quality_renamed.head()

| | datetime | betr801 | fr04014 | london westminster | london_mg_per_cubic | ratio_paris_antwerp | station_paris_sqrt |
|---|---|---|---|---|---|---|---|
| 0 | 2019-05-07 02:00:00 | NaN | NaN | 23.0 | 43.286 | NaN | NaN |
| 1 | 2019-05-07 03:00:00 | 50.5 | 25.0 | 19.0 | 35.758 | 0.49505 | 5.0 |
| 2 | 2019-05-07 04:00:00 | 45.0 | 27.7 | 19.0 | 35.758 | 0.615556 | 5.263079 |
| 3 | 2019-05-07 05:00:00 | NaN | 50.4 | 16.0 | 30.112 | NaN | 7.099296 |
| 4 | 2019-05-07 06:00:00 | NaN | 61.9 | NaN | NaN | NaN | 7.867655 |


**}**
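Any callable that takes a label and returns a new label works, including a `lambda`, and the `index=` keyword renames row labels in the same way. A small sketch on a toy frame (the underscore rule and the row names are just examples):

```python
import pandas as pd

df = pd.DataFrame({"London Westminster": [23.0, 19.0], "FR04014": [25.0, 27.7]})

# Callable applied to every column name: lowercase, then replace spaces with underscores
tidy = df.rename(columns=lambda name: name.lower().replace(" ", "_"))
print(tidy.columns.tolist())      # ['london_westminster', 'fr04014']

# A dictionary passed via index= renames row labels instead of columns
relabelled = tidy.rename(index={0: "first", 1: "second"})
print(relabelled.index.tolist())  # ['first', 'second']
```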

## Summary

Create a new column by assigning the output to the DataFrame with a new column name between the `[]`.

Operations are element-wise; there is no need to loop over rows.

Use `rename` with a dictionary or function to rename row labels or column names.
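To illustrate the point that operations are element-wise, the sketch below computes the same column twice on made-up data: once with an explicit Python loop and once with a single vectorized expression. Both give identical results, but the vectorized form is shorter and typically much faster on large frames.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration
df = pd.DataFrame({"station_london": [23.0, 19.0, 19.0, 16.0, np.nan]})

# Explicit loop over the values (works, but unnecessary in pandas)
looped = []
for value in df["station_london"]:
    looped.append(value * 1.882)
df["via_loop"] = looped

# Vectorized: one expression applied to the whole column at once
df["via_vectorized"] = df["station_london"] * 1.882

# Series.equals treats NaN in the same position as equal
print(df["via_loop"].equals(df["via_vectorized"]))  # True
```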