# Air Quality Index Detection

Air is what keeps humans alive. Monitoring it and understanding its quality is of immense importance to our well-being.

Content
The dataset contains air quality measurements and the AQI (Air Quality Index), at hourly and daily resolution, from monitoring stations across multiple cities in India.

AQI
A tutorial on how AQI is calculated is available here: https://www.kaggle.com/rohanrao/calculating-aqi-air-quality-index
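As a rough illustration of how the numeric AQI relates to the `AQI_Bucket` labels in this dataset, the sketch below maps a value to a category. The range boundaries are an assumption based on the commonly published CPCB categories (they agree with the sample rows shown in this notebook), and `aqi_bucket` is a hypothetical helper, not part of the dataset or the linked tutorial.

```python
def aqi_bucket(aqi):
    """Map a numeric AQI to a category label (assumed CPCB ranges)."""
    if aqi is None or aqi != aqi:  # None or NaN: no AQI was computed
        return None
    # (upper bound, label) pairs, checked in ascending order
    bounds = [
        (50, "Good"),
        (100, "Satisfactory"),
        (200, "Moderate"),
        (300, "Poor"),
        (400, "Very Poor"),
    ]
    for upper, label in bounds:
        if aqi <= upper:
            return label
    return "Severe"


# Spot checks against AQI values that appear in this notebook's output
for value in (41.0, 70.0, 133.0, 317.0, 455.0):
    print(value, aqi_bucket(value))
```

A lookup like this can serve as a sanity check that `AQI` and `AQI_Bucket` agree row by row.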

Cities
Ahmedabad, Aizawl, Amaravati, Amritsar, Bengaluru, Bhopal, Brajrajnagar, Chandigarh, Chennai, Coimbatore, Delhi, Ernakulam, Gurugram, Guwahati, Hyderabad, Jaipur, Jorapokhar, Kochi, Kolkata, Lucknow, Mumbai, Patna, Shillong, Talcher, Thiruvananthapuram, Visakhapatnam

Acknowledgements
The data has been made publicly available by the Central Pollution Control Board, whose official Government of India portal is https://cpcb.nic.in/. The board also has a real-time monitoring app: https://app.cpcbccr.com/AQI_India/

Noise
A similar dataset on noise levels (in decibels) in India is available here: https://www.kaggle.com/rohanrao/noise-monitoring-data-in-india

### Importing libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.impute import SimpleImputer
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
import plotly.express as px

import warnings
warnings.filterwarnings('ignore')

In [2]:
data = pd.read_csv('city_day.csv')

In [3]:
type(data)

pandas.core.frame.DataFrame

In [4]:
data

Unnamed: 0,City,Date,PM2.5,PM10,NO,NO2,NOx,NH3,CO,SO2,O3,Benzene,Toluene,Xylene,AQI,AQI_Bucket
0,Ahmedabad,2015-01-01,,,0.92,18.22,17.15,,0.92,27.64,133.36,0.00,0.02,0.00,,
1,Ahmedabad,2015-01-02,,,0.97,15.69,16.46,,0.97,24.55,34.06,3.68,5.50,3.77,,
2,Ahmedabad,2015-01-03,,,17.40,19.30,29.70,,17.40,29.07,30.70,6.80,16.40,2.25,,
3,Ahmedabad,2015-01-04,,,1.70,18.48,17.97,,1.70,18.59,36.08,4.43,10.14,1.00,,
4,Ahmedabad,2015-01-05,,,22.10,21.42,37.76,,22.10,39.33,39.31,7.01,18.89,2.78,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
29526,Visakhapatnam,2020-06-27,15.02,50.94,7.68,25.06,19.54,12.47,0.47,8.55,23.30,2.24,12.07,0.73,41.0,Good
29527,Visakhapatnam,2020-06-28,24.38,74.09,3.42,26.06,16.53,11.99,0.52,12.72,30.14,0.74,2.21,0.38,70.0,Satisfactory
29528,Visakhapatnam,2020-06-29,22.91,65.73,3.45,29.53,18.33,10.71,0.48,8.42,30.96,0.01,0.01,0.00,68.0,Satisfactory
29529,Visakhapatnam,2020-06-30,16.64,49.97,4.05,29.26,18.80,10.03,0.52,9.84,28.30,0.00,0.00,0.00,54.0,Satisfactory


In [5]:
data.head()

Unnamed: 0,City,Date,PM2.5,PM10,NO,NO2,NOx,NH3,CO,SO2,O3,Benzene,Toluene,Xylene,AQI,AQI_Bucket
0,Ahmedabad,2015-01-01,,,0.92,18.22,17.15,,0.92,27.64,133.36,0.0,0.02,0.0,,
1,Ahmedabad,2015-01-02,,,0.97,15.69,16.46,,0.97,24.55,34.06,3.68,5.5,3.77,,
2,Ahmedabad,2015-01-03,,,17.4,19.3,29.7,,17.4,29.07,30.7,6.8,16.4,2.25,,
3,Ahmedabad,2015-01-04,,,1.7,18.48,17.97,,1.7,18.59,36.08,4.43,10.14,1.0,,
4,Ahmedabad,2015-01-05,,,22.1,21.42,37.76,,22.1,39.33,39.31,7.01,18.89,2.78,,


In [6]:
data.tail()

Unnamed: 0,City,Date,PM2.5,PM10,NO,NO2,NOx,NH3,CO,SO2,O3,Benzene,Toluene,Xylene,AQI,AQI_Bucket
29526,Visakhapatnam,2020-06-27,15.02,50.94,7.68,25.06,19.54,12.47,0.47,8.55,23.3,2.24,12.07,0.73,41.0,Good
29527,Visakhapatnam,2020-06-28,24.38,74.09,3.42,26.06,16.53,11.99,0.52,12.72,30.14,0.74,2.21,0.38,70.0,Satisfactory
29528,Visakhapatnam,2020-06-29,22.91,65.73,3.45,29.53,18.33,10.71,0.48,8.42,30.96,0.01,0.01,0.0,68.0,Satisfactory
29529,Visakhapatnam,2020-06-30,16.64,49.97,4.05,29.26,18.8,10.03,0.52,9.84,28.3,0.0,0.0,0.0,54.0,Satisfactory
29530,Visakhapatnam,2020-07-01,15.0,66.0,0.4,26.85,14.05,5.2,0.59,2.1,17.05,,,,50.0,Good


In [7]:
data.sample(10)

Unnamed: 0,City,Date,PM2.5,PM10,NO,NO2,NOx,NH3,CO,SO2,O3,Benzene,Toluene,Xylene,AQI,AQI_Bucket
4456,Bengaluru,2015-06-12,14.76,,3.2,13.35,7.08,5.29,1.89,10.35,20.52,4.35,1.56,,81.0,Satisfactory
15062,Hyderabad,2016-04-29,40.1,85.11,8.83,32.79,28.28,33.59,0.48,6.71,31.34,0.44,3.01,0.37,99.0,Satisfactory
17816,Jorapokhar,2017-08-13,,54.73,6.75,7.84,,,,13.9,,,,,,
29438,Visakhapatnam,2020-03-31,18.77,52.17,6.65,25.22,18.82,5.64,0.11,8.52,46.99,3.88,9.99,1.17,84.0,Satisfactory
3439,Amritsar,2018-02-28,57.97,149.22,15.54,9.32,34.92,10.55,0.47,3.58,17.94,11.71,7.72,5.89,133.0,Moderate
28606,Visakhapatnam,2017-12-20,72.83,137.31,10.25,54.01,37.01,12.09,1.12,6.59,69.33,3.83,6.36,1.87,124.0,Moderate
21292,Lucknow,2018-12-17,157.84,,35.11,56.53,57.35,43.52,2.33,7.67,24.04,0.08,1.71,,317.0,Very Poor
535,Ahmedabad,2016-06-19,,,,,,,,,,,,,,
10552,Delhi,2015-11-20,271.42,470.08,139.98,101.44,164.44,64.78,3.28,26.81,124.74,9.41,35.26,3.45,455.0,Severe
8080,Chennai,2015-09-04,26.96,,9.51,17.1,19.35,23.47,1.92,14.6,28.95,,,,106.0,Moderate


# Data Analysis and Preprocessing

In [8]:
data.describe()

Unnamed: 0,PM2.5,PM10,NO,NO2,NOx,NH3,CO,SO2,O3,Benzene,Toluene,Xylene,AQI
count,24933.0,18391.0,25949.0,25946.0,25346.0,19203.0,27472.0,25677.0,25509.0,23908.0,21490.0,11422.0,24850.0
mean,67.450578,118.127103,17.57473,28.560659,32.309123,23.483476,2.248598,14.531977,34.49143,3.28084,8.700972,3.070128,166.463581
std,64.661449,90.60511,22.785846,24.474746,31.646011,25.684275,6.962884,18.133775,21.694928,15.811136,19.969164,6.323247,140.696585
min,0.04,0.01,0.02,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,13.0
25%,28.82,56.255,5.63,11.75,12.82,8.58,0.51,5.67,18.86,0.12,0.6,0.14,81.0
50%,48.57,95.68,9.89,21.69,23.52,15.85,0.89,9.16,30.84,1.07,2.97,0.98,118.0
75%,80.59,149.745,19.95,37.62,40.1275,30.02,1.45,15.22,45.57,3.08,9.15,3.35,208.0
max,949.99,1000.0,390.68,362.21,467.63,352.89,175.81,193.86,257.73,455.03,454.85,170.37,2049.0


In [9]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 29531 entries, 0 to 29530
Data columns (total 16 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   City        29531 non-null  object 
 1   Date        29531 non-null  object 
 2   PM2.5       24933 non-null  float64
 3   PM10        18391 non-null  float64
 4   NO          25949 non-null  float64
 5   NO2         25946 non-null  float64
 6   NOx         25346 non-null  float64
 7   NH3         19203 non-null  float64
 8   CO          27472 non-null  float64
 9   SO2         25677 non-null  float64
 10  O3          25509 non-null  float64
 11  Benzene     23908 non-null  float64
 12  Toluene     21490 non-null  float64
 13  Xylene      11422 non-null  float64
 14  AQI         24850 non-null  float64
 15  AQI_Bucket  24850 non-null  object 
dtypes: float64(13), object(3)
memory usage: 3.6+ MB


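The non-null counts above show missing values in every pollutant column and in AQI and AQI_Bucket; Xylene (11422 of 29531) and PM10 (18391 of 29531) are the sparsest.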
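One way to quantify the gaps is the share of missing values per column. The sketch below uses a small toy frame (a hypothetical stand-in for `data`, with a few values borrowed from the output in this notebook) so it runs without `city_day.csv`.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real `data` (hypothetical subset of columns)
toy = pd.DataFrame({
    "PM2.5": [np.nan, 15.02, 24.38, 22.91],
    "Xylene": [0.0, np.nan, np.nan, np.nan],
    "AQI": [np.nan, 41.0, 70.0, 68.0],
})

# Fraction of missing values per column, largest first
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)
```

On the real frame the same expression would be `data.isna().mean()`.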
# Missing data handling

In [10]:
# numeric_only=True skips the string columns, which otherwise raise
# TypeError: could not convert string to float: 'Ahmedabad'
data = data.fillna(data.median(numeric_only=True))
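The median fill in this cell uses a single global value per column. Since pollution levels differ a lot between cities, a per-city median may be a gentler alternative. The following is only a sketch on toy data (the frame and the `PM2.5_filled` column are hypothetical), not the method used in this notebook.

```python
import numpy as np
import pandas as pd

# Toy data: two cities, each with one missing PM2.5 reading
toy = pd.DataFrame({
    "City": ["Delhi", "Delhi", "Delhi", "Chennai", "Chennai", "Chennai"],
    "PM2.5": [271.0, np.nan, 157.0, 27.0, np.nan, 15.0],
})

# Fill each gap with the median of the same city rather than the global median
city_median = toy.groupby("City")["PM2.5"].transform("median")
toy["PM2.5_filled"] = toy["PM2.5"].fillna(city_median)
print(toy)
```

A city with no readings at all for a column would still be left as NaN by this approach and would need a fallback.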

In [None]:
data

In [None]:
data = data.dropna()
data = data.drop(['City','Date'],axis=1)

In [None]:
data.info()

# Exploratory Data Analysis

In [None]:
data["PM2.5"].describe()

In [None]:
# AQI_Bucket is still a string column here, so restrict to numeric columns
data.corr(numeric_only=True)

In [None]:
sns.set(rc={'figure.figsize':(20,15)})
dataplot = sns.heatmap(data.corr(numeric_only=True), cmap="YlGnBu", annot=True)

In [None]:
# Recent seaborn releases require x and y as keyword arguments
sns.barplot(data=data, x="AQI_Bucket", y="PM2.5")

In [None]:
sns.barplot(data=data, x="AQI_Bucket", y="PM10")

In [None]:
sns.barplot(data=data, x="AQI_Bucket", y="CO")

In [None]:
sns.barplot(data=data, x="AQI_Bucket", y="NO")

In [None]:
# Number of rows per AQI category, saved to disk
sns.countplot(x="AQI_Bucket", data=data)
plt.savefig("result.png")

In [None]:
data

In [None]:
sns.countplot(x="AQI_Bucket", data=data)

In [None]:
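# Preprocessing steps
# Outlier detection using the IQR method (numeric columns only; AQI_Bucket is a string)
num = data.select_dtypes(include="number")
Q1 = num.quantile(0.25)
Q3 = num.quantile(0.75)
IQR = Q3 - Q1
# Keep rows where no numeric value falls outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
data = data[~((num < (Q1 - 1.5 * IQR)) | (num > (Q3 + 1.5 * IQR))).any(axis=1)]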
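To see what the IQR rule does on a small scale, the sketch below wraps the same logic in a hypothetical helper, `iqr_mask`, and applies it to toy data. The CO values are loosely taken from this notebook's output; the labels on the toy rows are made up for illustration.

```python
import pandas as pd


def iqr_mask(frame, k=1.5):
    """Return a boolean Series: True for rows with no numeric value outside the IQR fences."""
    num = frame.select_dtypes(include="number")
    q1 = num.quantile(0.25)
    q3 = num.quantile(0.75)
    iqr = q3 - q1
    outside = (num < (q1 - k * iqr)) | (num > (q3 + k * iqr))
    return ~outside.any(axis=1)


# Toy data: five ordinary CO readings and one extreme value
toy = pd.DataFrame({
    "CO": [0.47, 0.52, 0.48, 0.52, 0.59, 175.81],
    "AQI_Bucket": ["Good", "Satisfactory", "Satisfactory", "Satisfactory", "Good", "Severe"],
})

kept = toy[iqr_mask(toy)]
print(kept)
```

Note that a row is dropped if any single column is flagged, so with many columns this rule can remove a large share of the data, including genuinely severe pollution days.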

In [None]:
data

In [None]:
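# Interpolation (numeric columns only; a no-op if dropna() already removed every NaN)
num_cols = data.select_dtypes(include="number").columns
data = data.copy()  # work on a copy of the filtered frame, not a view
data[num_cols] = data[num_cols].interpolate()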

In [None]:
# Data standardization (zero mean, unit variance) of the pollutant columns
features = ['PM2.5', 'PM10', 'NO', 'NO2', 'NOx', 'NH3', 'CO', 'SO2', 'O3', 'Benzene', 'Toluene', 'Xylene']
scaler = StandardScaler()
data[features] = scaler.fit_transform(data[features])

In [None]:
# Separating the features (X) from the target (y)
X = data.drop(['AQI', 'AQI_Bucket'], axis=1)
y = data['AQI']

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [None]:
data = data.drop('AQI_Bucket',axis=1)

In [None]:
data

In [None]:
X

In [None]:
y

In [None]:
data

# Splitting the data into training and testing sets

In [None]:
# train_test_split is already imported above; this 67/33 split replaces the earlier 80/20 one
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=100)
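In this notebook the `StandardScaler` is fitted on the full dataset before the split, so the test rows influence the scaling statistics. A common alternative is to split first and fit the scaler on the training rows only. The sketch below shows that ordering on synthetic data; all `toy_*` names are hypothetical and the random values are not real measurements.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix and target
rng = np.random.default_rng(0)
toy_X = pd.DataFrame(rng.normal(50, 10, size=(100, 3)), columns=["PM2.5", "PM10", "CO"])
toy_y = pd.Series(rng.normal(150, 30, size=100), name="AQI")

# Split first ...
X_tr, X_te, y_tr, y_te = train_test_split(toy_X, toy_y, test_size=0.33, random_state=100)

# ... then learn the scaling statistics from the training rows only
toy_scaler = StandardScaler()
X_tr_scaled = toy_scaler.fit_transform(X_tr)
X_te_scaled = toy_scaler.transform(X_te)  # test rows reuse the training statistics

print(X_tr_scaled.mean(axis=0).round(6), X_te_scaled.mean(axis=0).round(3))
```

The training columns end up with zero mean by construction, while the test columns are only approximately centered, which is the expected behavior when no test information leaks into the scaler.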