# Pandas
In computer programming, pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. It is free software released under the three-clause BSD license.

# Download Dataset

Air quality data: https://drive.google.com/file/d/10QJIxE1SxdToLObBmEccrtOr_dEVi4v_/view?usp=sharing

In [2]:
# Import the pandas module
import pandas as pd

To load the pandas package and start working with it, import the package. The community-agreed alias for pandas is pd, so loading pandas as pd is assumed standard practice throughout the pandas documentation.

In [17]:
from IPython.display import HTML

# Read the custom table stylesheet and apply it to the notebook output
with open('style-table.css') as f:
    css = f.read()
HTML('<style>{}</style>'.format(css))

In [10]:
air_quality = pd.read_csv("Dataset/air_quality_no2.csv",index_col="datetime")
air_quality.head()

datetime,station_antwerp,station_paris,station_london
2019-05-07 02:00:00,,,23.0
2019-05-07 03:00:00,50.5,25.0,19.0
2019-05-07 04:00:00,45.0,27.7,19.0
2019-05-07 05:00:00,,50.4,16.0
2019-05-07 06:00:00,,61.9,
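
To see what `index_col="datetime"` does without downloading the file, here is a minimal sketch that reads a few rows from an in-memory string. The rows are copied from the output above; the `io.StringIO` stand-in for the file and the names `sample_csv` and `sample` are assumptions for illustration only. `parse_dates=True` is an optional extra that the cell above does not use.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for Dataset/air_quality_no2.csv,
# using rows shown in the output above.
sample_csv = io.StringIO(
    "datetime,station_antwerp,station_paris,station_london\n"
    "2019-05-07 02:00:00,,,23.0\n"
    "2019-05-07 03:00:00,50.5,25.0,19.0\n"
    "2019-05-07 04:00:00,45.0,27.7,19.0\n"
)

# index_col makes the "datetime" column the row index;
# parse_dates=True additionally converts that index to timestamps.
sample = pd.read_csv(sample_csv, index_col="datetime", parse_dates=True)

print(sample.index.name)
print(list(sample.columns))
print(sample.isna().sum().sum())  # number of missing measurements
```

With the index set this way, the station columns are the only data columns, and empty fields in the file are read as missing values (NaN).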


# How to create new columns derived from existing columns?
<img src="derived-col.png">
<br><br>
I want to express the NO2 concentration of the station in London in mg/m3.

(Assuming a temperature of 25 degrees Celsius and a pressure of 1013 hPa, the conversion factor is 1.882.)

In [11]:
air_quality["london_mg_per_cubic"] = air_quality["station_london"] * 1.882
air_quality.head()

datetime,station_antwerp,station_paris,station_london,london_mg_per_cubic
2019-05-07 02:00:00,,,23.0,43.286
2019-05-07 03:00:00,50.5,25.0,19.0,35.758
2019-05-07 04:00:00,45.0,27.7,19.0,35.758
2019-05-07 05:00:00,,50.4,16.0,30.112
2019-05-07 06:00:00,,61.9,,


To create a new column, use the [] brackets with the new column name on the left side of the assignment.

##### Note

The calculation of the values is done element-wise. This means all values in the given column are multiplied by the value 1.882 at once. You do not need to use a loop to iterate over the rows!
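
As a minimal sketch of what "element-wise" means, the example below compares the single-expression multiplication with an explicit loop. The values are taken from the `station_london` column above; the names `london`, `vectorized`, and `looped` are hypothetical and used for illustration only.

```python
import pandas as pd

# Hypothetical sample values taken from the station_london column above;
# None becomes NaN (a missing measurement).
london = pd.Series([23.0, 19.0, 19.0, 16.0, None], name="station_london")

# Element-wise: a single expression multiplies every value.
vectorized = london * 1.882

# Equivalent explicit loop, shown only for comparison.
looped = pd.Series([value * 1.882 for value in london], name="station_london")

# Both approaches give the same values; the missing value stays missing.
print(vectorized.equals(looped))
print(vectorized.isna().sum())
```

The single expression is shorter and is the form pandas is designed around; the loop is included only to show that both compute the same thing.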
<br>
<img src="img2.png">
<br>
I want to check the ratio of the values in Paris versus Antwerp and save the result in a new column.

In [16]:
air_quality["ratio_paris_antwerp"] = air_quality["station_paris"] / air_quality["station_antwerp"]
air_quality.head()

datetime,station_antwerp,station_paris,station_london,london_mg_per_cubic,ratio_paris_antwerp
2019-05-07 02:00:00,,,23.0,43.286,
2019-05-07 03:00:00,50.5,25.0,19.0,35.758,0.49505
2019-05-07 04:00:00,45.0,27.7,19.0,35.758,0.615556
2019-05-07 05:00:00,,50.4,16.0,30.112,
2019-05-07 06:00:00,,61.9,,,


The calculation is again element-wise, so the / operator is applied to the values in each row.

Other mathematical operators (+, -, *, /) and comparison operators (<, >, ==, …) also work element-wise. The latter were already used in the subset data tutorial to filter rows of a table using a conditional expression.
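
The following sketch illustrates a comparison and a subtraction applied row by row. The small `sample` table reuses values from the output above; the threshold of 26 and the variable names are assumptions chosen only for this example.

```python
import pandas as pd

# Hypothetical sample built from the rows shown above.
sample = pd.DataFrame(
    {
        "station_antwerp": [None, 50.5, 45.0],
        "station_paris": [None, 25.0, 27.7],
    },
    index=["2019-05-07 02:00:00", "2019-05-07 03:00:00", "2019-05-07 04:00:00"],
)

# A comparison is evaluated per row and returns a boolean Series.
# Comparisons against a missing value (NaN) evaluate to False.
above_26 = sample["station_paris"] > 26

# The boolean Series can then be used to filter rows.
filtered = sample[above_26]

# Arithmetic between two columns is applied row by row as well.
difference = sample["station_antwerp"] - sample["station_paris"]

print(above_26.tolist())
print(filtered)
print(difference)
```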

I want to rename the data columns to the corresponding station identifiers used by OpenAQ.

In [18]:
air_quality_renamed = air_quality.rename(columns={"station_antwerp": "BETR801","station_paris": "FR04014","station_london": "London Westminster"})

In [19]:
air_quality_renamed.head()

datetime,BETR801,FR04014,London Westminster,london_mg_per_cubic,ratio_paris_antwerp
2019-05-07 02:00:00,,,23.0,43.286,
2019-05-07 03:00:00,50.5,25.0,19.0,35.758,0.49505
2019-05-07 04:00:00,45.0,27.7,19.0,35.758,0.615556
2019-05-07 05:00:00,,50.4,16.0,30.112,
2019-05-07 06:00:00,,61.9,,,


The rename() function can be used for both row labels and column labels. Provide a dictionary whose keys are the current names and whose values are the new names.
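
Since the cells in this section only rename columns, here is a small sketch of renaming row labels as well, using the `index` argument of `rename()`. The two-row table and the labels `row_a`, `row_b`, `first`, and `second` are made up for illustration.

```python
import pandas as pd

# Hypothetical two-row table; the row labels are made up for illustration.
df = pd.DataFrame(
    {"station_paris": [25.0, 27.7], "station_london": [19.0, 19.0]},
    index=["row_a", "row_b"],
)

# columns= renames column labels, index= renames row labels.
renamed = df.rename(
    columns={"station_paris": "FR04014"},
    index={"row_a": "first", "row_b": "second"},
)

# Labels missing from the mapping are left unchanged,
# and the original DataFrame is not modified.
print(list(renamed.columns))
print(list(renamed.index))
print(list(df.columns))
```

Because `rename()` returns a new DataFrame by default, the result has to be assigned to a variable, as the cells in this section do.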

The mapping is not restricted to fixed names; it can also be a mapping function. For example, the column names can be converted to lowercase letters by passing a function:

In [21]:
air_quality_renamed = air_quality_renamed.rename(columns=str.lower)
air_quality_renamed.head()

datetime,betr801,fr04014,london westminster,london_mg_per_cubic,ratio_paris_antwerp
2019-05-07 02:00:00,,,23.0,43.286,
2019-05-07 03:00:00,50.5,25.0,19.0,35.758,0.49505
2019-05-07 04:00:00,45.0,27.7,19.0,35.758,0.615556
2019-05-07 05:00:00,,50.4,16.0,30.112,
2019-05-07 06:00:00,,61.9,,,


# REMEMBER
Create a new column by assigning the output to the DataFrame with a new column name in between the [].

Operations are element-wise, so there is no need to loop over rows.

Use rename with a dictionary or function to rename row labels or column names.