# Ambient Air Quality in Kochi and Trivandrum

Air pollution is one of the greatest environmental risks to health, contributing to respiratory and cardiovascular diseases, cancer, and premature death. Tracking air pollution levels helps governments and policy makers make informed decisions to improve public and environmental health.

A few common air pollutants are routinely monitored as indicators of air quality in cities.

<b>Particulate Matter (PM)</b>
- RSPM/PM10/PM2.5: Particulate matter (PM) is a common proxy indicator for air pollution, and there is strong evidence for the negative health impacts of exposure to it. The major components of PM are sulfates, nitrates, ammonia, sodium chloride, black carbon, mineral dust and water. RSPM stands for respirable suspended particulate matter. PM10 refers to particles with a diameter of 10 micrometers or less, while PM2.5 refers to fine particles with a diameter of 2.5 micrometers or less. Fine particles are particularly harmful because they can penetrate deep into the lungs and even enter the bloodstream.

<b>Gaseous Pollutants</b>
- SO<sub>2</sub>: SO<sub>2</sub> is a colourless gas with a sharp odour. It is produced from the burning of fossil fuels (coal and oil) and the smelting of mineral ores that contain sulfur.
- NO<sub>2</sub>: NO<sub>2</sub> is a gas that is commonly released from the combustion of fuels in the transportation and industrial sectors.

## Data

### Air Pollutant Data

The data was downloaded from www.kerala.data.gov and covers the years 1987 to 2002 and 2005 to 2015; data for 2003 and 2004 are not available. Each year is stored in a separate Excel file. <br>
Each file generally included:
- The station code where the data was recorded (Stn Code)
- Date of record (Sampling Date)
- Place of record (City/Town/Village/Area)
- Agency 
- Type of location
- SO<sub>2</sub> values
- NO<sub>2</sub> values
- RSPM/PM10 values
- SPM values
- PM2.5 values

The completeness of data varies by year and city.
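
One way to make "completeness" concrete is the share of non-missing readings per city and year. The following is a minimal sketch, assuming the column names of the combined table shown later in this notebook (`Year`, `City/Town/Village/Area`, `SO2`, `NO2`, `RSPM/PM10`). The helper `completeness_by_city_year` and the sample rows are hypothetical and are not taken from the dataset.

```python
import pandas as pd

POLLUTANTS = ["SO2", "NO2", "RSPM/PM10"]


def completeness_by_city_year(df, city_col="City/Town/Village/Area"):
    """Return the fraction of non-missing values per (city, year) for each pollutant."""
    present = df[POLLUTANTS].notna()
    keys = [df[city_col], df["Year"]]
    # The mean of a boolean column is the share of True values
    return present.groupby(keys).mean()


# Made-up rows for illustration only
sample = pd.DataFrame({
    "City/Town/Village/Area": ["Kochi", "Kochi", "Trivandrum", "Trivandrum"],
    "Year": [2005, 2005, 2005, 2005],
    "SO2": [2.0, None, 4.0, 5.0],
    "NO2": [10.0, 12.0, None, None],
    "RSPM/PM10": [40.0, 45.0, 50.0, None],
})

completeness = completeness_by_city_year(sample)
print(completeness)
```

A value of 1.0 means every row for that city and year has a reading, while 0.0 means the pollutant was not recorded at all.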


## Methodology

Data for 27 years were concatenated and pre-processed. Columns were standardized whenever possible. However, as the recording format varies from year to year, some human errors may have occurred in the process of standardizing the data. Due to lack of data for some columns, only SO<sub>2</sub>, NO<sub>2</sub> and RSPM/PM10 values were used for analysis and visualization. It is also important to note that data quality varies across cities and years.
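
The concatenation step could look roughly like the sketch below. It assumes each yearly file has already been read into a DataFrame (for example with `pd.read_excel`). The `COLUMN_ALIASES` mapping and the header variants in it are hypothetical, since the actual header differences between the yearly files are not listed here, and the frames are made up.

```python
import pandas as pd

# Hypothetical header variants mapped to one standard name
COLUMN_ALIASES = {
    "City/Town/Village Area": "City/Town/Village/Area",
    "RSPM": "RSPM/PM10",
}


def standard_name(column):
    """Strip whitespace and map known header variants to the standard name."""
    name = str(column).strip()
    return COLUMN_ALIASES.get(name, name)


def combine_yearly_frames(frames_by_year):
    """Standardize column names, tag each frame with its year, and stack them."""
    cleaned = []
    for year, frame in sorted(frames_by_year.items()):
        frame = frame.rename(columns=standard_name).assign(Year=year)
        cleaned.append(frame)
    # Columns missing from a given year are filled with NaN
    return pd.concat(cleaned, ignore_index=True, sort=False)


# Made-up frames standing in for two yearly files
frames = {
    2005: pd.DataFrame({"City/Town/Village Area": ["Kochi"], "RSPM": [40.0]}),
    2006: pd.DataFrame({"City/Town/Village/Area": ["Kochi"], "RSPM/PM10": [42.0], "SO2": [3.0]}),
}
combined = combine_yearly_frames(frames)
print(combined)
```

Keeping the alias mapping in one place makes the manual standardization decisions easy to review later.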

## A look at the data



In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import base64
from io import BytesIO
import dash
from dash import dcc, html
import dash_bootstrap_components as dbc
from dash.dependencies import Input, Output
import ace as tools

In [2]:
data = pd.read_csv('../data/combined_airquality_edit.csv')
data

Unnamed: 0,Stn Code,Sampling Date,Month,Year,State,City/Town/Village/Area,Agency,Type of Location,SO2,NO2,RSPM/PM10,SPM,Location of Monitoring Station,PM 2.5
0,29.0,10/3/1987,3.0,1987,Kerala,Cochin,Kerala Pollution Control Board,Industrial,,2.9,,,,
1,30.0,4/5/1988,5.0,1988,Kerala,Cochin,Kerala Pollution Control Board,Industrial Area,,,,79.0,,
2,31.0,4/5/1988,5.0,1988,Kerala,Cochin,Kerala Pollution Control Board,Industrial Area,,,,88.0,,
3,32.0,4/5/1988,5.0,1988,Kerala,Cochin,Kerala Pollution Control Board,Industrial Area,,,,98.0,,
4,33.0,4/5/1988,5.0,1988,Kerala,Cochin,Kerala Pollution Control Board,,,,,88.0,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
24750,624.0,16-12-15,12.0,2015,Kerala,Thissur,Kerala State Pollution Control Board,"Residential, Rural and other Areas",2.0,5.0,64.0,,"KSPCB, District Office, Poonkunnam, Thrissur",
24751,624.0,18-12-15,12.0,2015,Kerala,Thissur,Kerala State Pollution Control Board,"Residential, Rural and other Areas",2.0,15.0,43.0,,"KSPCB, District Office, Poonkunnam, Thrissur",
24752,624.0,21-12-15,12.0,2015,Kerala,Thissur,Kerala State Pollution Control Board,"Residential, Rural and other Areas",2.0,5.0,60.0,,"KSPCB, District Office, Poonkunnam, Thrissur",
24753,624.0,28-12-15,12.0,2015,Kerala,Thissur,Kerala State Pollution Control Board,"Residential, Rural and other Areas",2.0,5.0,61.0,,"KSPCB, District Office, Poonkunnam, Thrissur",


In [3]:
# All the cities/regions available
data['City/Town/Village/Area'].unique()

array(['Cochin', 'Kotttayam', 'Kottayam', 'Kozhikode', 'Trivendrum',
       'Palakkad', 'Kochi', 'Trivandrum', 'Alappuzha', 'Kollam',
       'Malappuram', 'Thiruvananthapuram', 'Pathanamthitta', 'Thissur',
       'Wayanad'], dtype=object)

In [4]:
# Noticed above that some cities are duplicated (e.g. Cochin and Kochi are used interchangeably)
# Data preprocessing
data['City/Town/Village/Area'] = data['City/Town/Village/Area'].replace({
    'Kotttayam': 'Kottayam',
    'Trivendrum': 'Trivandrum',
    'Cochin': 'Kochi',
    'Thiruvananthapuram': 'Trivandrum'})
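
Spelling variants such as 'Kotttayam' can also be surfaced automatically before writing the replacement map. The sketch below uses `difflib.SequenceMatcher` from the standard library. The helper `similar_name_pairs` and the cutoff of 0.8 are assumptions chosen for illustration. True aliases such as Cochin/Kochi or Thiruvananthapuram/Trivandrum are not similar as strings and still need to be mapped by hand.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_name_pairs(names, cutoff=0.8):
    """Return pairs of distinct names whose similarity ratio is at least `cutoff`."""
    pairs = []
    for a, b in combinations(sorted(set(names)), 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= cutoff:
            pairs.append((a, b, round(ratio, 2)))
    return pairs


# City names as returned by unique() above
cities = ['Cochin', 'Kotttayam', 'Kottayam', 'Kozhikode', 'Trivendrum',
          'Palakkad', 'Kochi', 'Trivandrum', 'Alappuzha', 'Kollam',
          'Malappuram', 'Thiruvananthapuram', 'Pathanamthitta', 'Thissur',
          'Wayanad']

candidates = similar_name_pairs(cities)
for first, second, ratio in candidates:
    print(first, second, ratio)
```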

In [5]:
# More data preprocessing
# Filter the data for Kochi
kochi_data = data[data['City/Town/Village/Area'] == 'Kochi'].copy()

kochi_data = kochi_data[['Stn Code', 'Month', 'Year', 'SO2', 'NO2',
       'RSPM/PM10', 'SPM', 'PM 2.5']]

# Handle missing values in 'Month' and 'Year' columns
kochi_data = kochi_data.dropna(subset=['Month', 'Year'])

# Convert 'Month' column to numeric, coercing errors to NaN
kochi_data['Month'] = pd.to_numeric(kochi_data['Month'], errors='coerce')

# Drop rows where 'Month' conversion resulted in NaN
kochi_data = kochi_data.dropna(subset=['Month'])

# Convert 'Year' column to integers
kochi_data['Year'] = kochi_data['Year'].astype(int)
kochi_data['Month'] = kochi_data['Month'].astype(int)

# Create a datetime column from 'Month' and 'Year'
kochi_data['Date'] = pd.to_datetime(kochi_data[['Year', 'Month']].assign(DAY=1))

# Filter the data for years from 2005 onwards
kochi_data = kochi_data[kochi_data['Date'].dt.year >= 2005]

# Ensure NO2, PM10, and SO2 are numeric
kochi_data['NO2'] = pd.to_numeric(kochi_data['NO2'], errors='coerce')
kochi_data['RSPM/PM10'] = pd.to_numeric(kochi_data['RSPM/PM10'], errors='coerce')
kochi_data['SO2'] = pd.to_numeric(kochi_data['SO2'], errors='coerce')

# Interpolate the missing values (time-based interpolation needs a DatetimeIndex)
# Results are assigned back with df[col] = df[col].method(...) rather than calling df[col].method(..., inplace=True)
kochi_data.set_index('Date', inplace=True)
kochi_data['NO2'] = kochi_data['NO2'].interpolate(method='time')
kochi_data['RSPM/PM10'] = kochi_data['RSPM/PM10'].interpolate(method='time')
kochi_data['SO2'] = kochi_data['SO2'].interpolate(method='time')

# Resample the data by month
kochi_data = kochi_data.resample('ME').mean()
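
Since the same steps are repeated for each city, they could be wrapped in one function. This is a sketch under stated assumptions, not the notebook's exact pipeline: the helper `monthly_city_series` is hypothetical, it averages within each month before interpolating (the cell above interpolates first and then resamples), and it labels months by their first day with the `'MS'` frequency instead of `'ME'`. The sample rows are made up.

```python
import pandas as pd

POLLUTANTS = ["SO2", "NO2", "RSPM/PM10"]
CITY_COLUMN = "City/Town/Village/Area"


def monthly_city_series(df, city, start_year=2005):
    """Monthly mean pollutant levels for one city, with gaps interpolated over time."""
    columns = ["Month", "Year"] + POLLUTANTS
    subset = df.loc[df[CITY_COLUMN] == city, columns].copy()

    # Coerce everything to numeric; unparseable entries become NaN
    subset = subset.apply(pd.to_numeric, errors="coerce")
    subset = subset.dropna(subset=["Month", "Year"])
    subset = subset[subset["Year"] >= start_year]

    # The first day of each month serves as the timestamp
    parts = pd.DataFrame({
        "year": subset["Year"].astype(int),
        "month": subset["Month"].astype(int),
        "day": 1,
    })
    subset["Date"] = pd.to_datetime(parts)

    # Average within each month, expose missing months as NaN rows, then interpolate
    monthly = subset.groupby("Date")[POLLUTANTS].mean()
    monthly = monthly.asfreq("MS")
    return monthly.interpolate(method="time")


# Made-up rows: two January readings, none for February, one for March,
# plus one row before the start year and one row for another city
sample = pd.DataFrame({
    CITY_COLUMN: ["Kochi", "Kochi", "Kochi", "Kochi", "Trivandrum"],
    "Month": [1, 1, 3, 6, 1],
    "Year": [2005, 2005, 2005, 2004, 2005],
    "SO2": [2.0, 4.0, 5.0, 9.0, 7.0],
    "NO2": [10.0, 20.0, 35.0, 99.0, 50.0],
    "RSPM/PM10": [40.0, 60.0, 70.0, 10.0, 80.0],
})
kochi_monthly = monthly_city_series(sample, "Kochi")
print(kochi_monthly)
```

With the real table, `monthly_city_series(data, 'Kochi')` and `monthly_city_series(data, 'Trivandrum')` would replace the two near-identical preprocessing cells, although the values can differ slightly from the original because of the changed order of averaging and interpolation.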

## Interactive Air Quality Dashboard of Kochi

In [6]:
# Initialize the Dash app
app = dash.Dash(__name__, external_stylesheets=[dbc.themes.BOOTSTRAP])

# App layout
app.layout = dbc.Container([
    dbc.Row([
        dbc.Col(html.H1("Air Quality Dashboard for Kochi"), className="mb-2")
    ]),
    dbc.Row([
        dbc.Col(dcc.Graph(id='no2-plot'), width=12)
    ]),
    dbc.Row([
        dbc.Col(dcc.Graph(id='pm10-plot'), width=12)
    ]),
    dbc.Row([
        dbc.Col(dcc.Graph(id='so2-plot'), width=12)
    ]),
])

# Callback to build the plots; the Input on the graph's own 'id' only serves to trigger it once on page load
@app.callback(
    [Output('no2-plot', 'figure'),
     Output('pm10-plot', 'figure'),
     Output('so2-plot', 'figure')],
    [Input('no2-plot', 'id')]
)
def update_plots(n):
    no2_fig = {
        'data': [{'x': kochi_data.index, 'y': kochi_data['NO2'], 'type': 'line', 'name': 'NO2'}],
        'layout': {'title': 'NO2 Levels in Kochi (2005 Onwards)', 'yaxis': {'title': 'NO2'}, 'xaxis': {'title': 'Date'}}
    }

    pm10_fig = {
        'data': [{'x': kochi_data.index, 'y': kochi_data['RSPM/PM10'], 'type': 'line', 'name': 'PM10'}],
        'layout': {'title': 'PM10 Levels in Kochi (2005 Onwards)', 'yaxis': {'title': 'PM10'}, 'xaxis': {'title': 'Date'}}
    }

    so2_fig = {
        'data': [{'x': kochi_data.index, 'y': kochi_data['SO2'], 'type': 'line', 'name': 'SO2'}],
        'layout': {'title': 'SO2 Levels in Kochi (2005 Onwards)', 'yaxis': {'title': 'SO2'}, 'xaxis': {'title': 'Date'}}
    }

    return no2_fig, pm10_fig, so2_fig

# Run the app
if __name__ == '__main__':
    app.run(debug=True, port=8051)
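
The three figure dictionaries in the callback differ only in the pollutant and the data column, so they could be produced by one small helper. The sketch below mirrors the dictionary structure used above. The function `line_figure` is a hypothetical name and the values are made up.

```python
def line_figure(x, y, pollutant, city, start_year=2005):
    """Build a figure dictionary for one pollutant time series."""
    return {
        "data": [{"x": list(x), "y": list(y), "type": "line", "name": pollutant}],
        "layout": {
            "title": f"{pollutant} Levels in {city} ({start_year} Onwards)",
            "yaxis": {"title": pollutant},
            "xaxis": {"title": "Date"},
        },
    }


# Made-up values for illustration only
dates = ["2005-01-31", "2005-02-28", "2005-03-31"]
no2_values = [10.0, 12.5, 11.0]
figure = line_figure(dates, no2_values, "NO2", "Kochi")
print(figure["layout"]["title"])
```

Because the data does not change after loading, a figure built this way could also be passed directly through the `figure` argument of `dcc.Graph`, which would make the callback unnecessary.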


### Observations

- Kochi has kept PM10 and SO<sub>2</sub> levels under control, but NO<sub>2</sub> levels are rising, especially since mid-2015.
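
A visual impression such as "on the rise" can be backed by a simple linear trend. The sketch below fits a least-squares line with `numpy.polyfit`. It is not part of the original analysis, the helper `trend_per_year` is hypothetical, and the example series is made up.

```python
import numpy as np
import pandas as pd


def trend_per_year(series):
    """Least-squares slope of a datetime-indexed series, in units per year."""
    clean = series.dropna()
    # Elapsed time in years since the first observation
    elapsed_days = np.asarray((clean.index - clean.index[0]).days, dtype=float)
    years = elapsed_days / 365.25
    slope, _intercept = np.polyfit(years, clean.to_numpy(dtype=float), 1)
    return slope


# Made-up monthly series that rises by 1.0 each month
index = pd.date_range("2014-01-01", periods=24, freq="MS")
example = pd.Series(np.arange(24, dtype=float), index=index)
slope = trend_per_year(example)
print(round(slope, 2))
```

Applied to `kochi_data['NO2']`, a clearly positive slope would support the observation, while a slope near zero would suggest that the rise is limited to the last few months of the record.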

## Interactive Air Quality Dashboard of Trivandrum

In [7]:
# Data preprocessing (repeats the city-name cleanup from In [4]; harmless if already applied)
data['City/Town/Village/Area'] = data['City/Town/Village/Area'].replace({
    'Kotttayam': 'Kottayam',
    'Trivendrum': 'Trivandrum',
    'Cochin': 'Kochi',
    'Thiruvananthapuram': 'Trivandrum'
})

# Filter the data for Trivandrum
trivandrum_data = data[data['City/Town/Village/Area'] == 'Trivandrum'].copy()

trivandrum_data = trivandrum_data[['Stn Code', 'Month', 'Year', 'SO2', 'NO2',
       'RSPM/PM10', 'SPM', 'PM 2.5']]

# Handle missing values in 'Month' and 'Year' columns
trivandrum_data = trivandrum_data.dropna(subset=['Month', 'Year'])

# Convert 'Month' column to numeric, coercing errors to NaN
trivandrum_data['Month'] = pd.to_numeric(trivandrum_data['Month'], errors='coerce')

# Drop rows where 'Month' conversion resulted in NaN
trivandrum_data = trivandrum_data.dropna(subset=['Month'])

# Convert 'Year' column to integers
trivandrum_data['Year'] = trivandrum_data['Year'].astype(int)
trivandrum_data['Month'] = trivandrum_data['Month'].astype(int)

# Create a datetime column from 'Month' and 'Year'
trivandrum_data['Date'] = pd.to_datetime(trivandrum_data[['Year', 'Month']].assign(DAY=1))

# Filter the data for years from 2005 onwards
trivandrum_data = trivandrum_data[trivandrum_data['Date'].dt.year >= 2005]

# Ensure NO2, PM10, and SO2 are numeric
trivandrum_data['NO2'] = pd.to_numeric(trivandrum_data['NO2'], errors='coerce')
trivandrum_data['RSPM/PM10'] = pd.to_numeric(trivandrum_data['RSPM/PM10'], errors='coerce')
trivandrum_data['SO2'] = pd.to_numeric(trivandrum_data['SO2'], errors='coerce')

# Interpolate the missing values (time-based interpolation needs a DatetimeIndex)
# Results are assigned back with df[col] = df[col].method(...) rather than calling df[col].method(..., inplace=True)
trivandrum_data.set_index('Date', inplace=True)
trivandrum_data['NO2'] = trivandrum_data['NO2'].interpolate(method='time')
trivandrum_data['RSPM/PM10'] = trivandrum_data['RSPM/PM10'].interpolate(method='time')
trivandrum_data['SO2'] = trivandrum_data['SO2'].interpolate(method='time')

# Resample the data by month
trivandrum_data = trivandrum_data.resample('ME').mean()

# Initialize the Dash app
app = dash.Dash(__name__, external_stylesheets=[dbc.themes.BOOTSTRAP])

# App layout
app.layout = dbc.Container([
    dbc.Row([
        dbc.Col(html.H1("Air Quality Dashboard for Trivandrum"), className="mb-2")
    ]),
    dbc.Row([
        dbc.Col(dcc.Graph(id='no2-plot'), width=12)
    ]),
    dbc.Row([
        dbc.Col(dcc.Graph(id='pm10-plot'), width=12)
    ]),
    dbc.Row([
        dbc.Col(dcc.Graph(id='so2-plot'), width=12)
    ]),
])

# Callback to build the plots; the Input on the graph's own 'id' only serves to trigger it once on page load
@app.callback(
    [Output('no2-plot', 'figure'),
     Output('pm10-plot', 'figure'),
     Output('so2-plot', 'figure')],
    [Input('no2-plot', 'id')]
)
def update_plots(n):
    no2_fig = {
        'data': [{'x': trivandrum_data.index, 'y': trivandrum_data['NO2'], 'type': 'line', 'name': 'NO2'}],
        'layout': {'title': 'NO2 Levels in Trivandrum (2005 Onwards)', 'yaxis': {'title': 'NO2'}, 'xaxis': {'title': 'Date'}}
    }

    pm10_fig = {
        'data': [{'x': trivandrum_data.index, 'y': trivandrum_data['RSPM/PM10'], 'type': 'line', 'name': 'PM10'}],
        'layout': {'title': 'PM10 Levels in Trivandrum (2005 Onwards)', 'yaxis': {'title': 'PM10'}, 'xaxis': {'title': 'Date'}}
    }

    so2_fig = {
        'data': [{'x': trivandrum_data.index, 'y': trivandrum_data['SO2'], 'type': 'line', 'name': 'SO2'}],
        'layout': {'title': 'SO2 Levels in Trivandrum (2005 Onwards)', 'yaxis': {'title': 'SO2'}, 'xaxis': {'title': 'Date'}}
    }

    return no2_fig, pm10_fig, so2_fig

# Run the app
if __name__ == '__main__':
    app.run(debug=True, port=8050)


### Observations

- Air quality is generally worse in Trivandrum than in Kochi.
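
A side-by-side table of mean levels is one way to put numbers behind such a comparison. The sketch below assumes monthly tables shaped like `kochi_data` and `trivandrum_data`. The helper `compare_city_means`, the placeholder city names and the values are made up for illustration.

```python
import pandas as pd

POLLUTANTS = ["SO2", "NO2", "RSPM/PM10"]


def compare_city_means(frames_by_city):
    """Return a table of mean pollutant levels with one row per city."""
    means = {city: frame[POLLUTANTS].mean() for city, frame in frames_by_city.items()}
    # Transpose so that cities are rows and pollutants are columns
    return pd.DataFrame(means).T


# Made-up monthly values for illustration only
city_a = pd.DataFrame({"SO2": [2.0, 4.0], "NO2": [10.0, 14.0], "RSPM/PM10": [40.0, 50.0]})
city_b = pd.DataFrame({"SO2": [6.0, 8.0], "NO2": [20.0, 24.0], "RSPM/PM10": [60.0, 70.0]})
summary = compare_city_means({"City A": city_a, "City B": city_b})
print(summary)
```

With the notebook's tables this would be called as `compare_city_means({'Kochi': kochi_data, 'Trivandrum': trivandrum_data})`. Note that the means would include interpolated values, so they should be read together with the completeness of each city's record.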