# Air Quality Data in India (2015 - 2020)
![](https://carbontracker.org/wp-content/uploads/2019/08/air-pollution-chimney-clouds-459728.jpg)

### Air quality index
- The air quality index **(AQI)** is an index for reporting **air quality** on a daily basis. It measures how air
  pollution **affects** one's health over a short period of time.
- Let's get started with the analysis.
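
To make the index more concrete, here is a minimal sketch that maps an AQI value to the category labels found in this dataset's `AQI_Bucket` column. The breakpoints are an assumption taken from India's national AQI scale (CPCB); this notebook does not state them. `aqi_bucket` is a hypothetical helper, not part of the dataset or of any library.

```python
import math

def aqi_bucket(aqi):
    """Map an AQI value to a category label.

    Breakpoints are assumed from India's national AQI scale (CPCB);
    they are not defined anywhere in this notebook.
    """
    # Missing AQI values (None or NaN) have no category.
    if aqi is None or math.isnan(aqi):
        return None
    if aqi <= 50:
        return 'Good'
    if aqi <= 100:
        return 'Satisfactory'
    if aqi <= 200:
        return 'Moderate'
    if aqi <= 300:
        return 'Poor'
    if aqi <= 400:
        return 'Very Poor'
    return 'Severe'

# Example AQI values, including one missing value.
labels = [aqi_bucket(v) for v in (41.0, 70.0, 209.0, 328.0, 514.0, float('nan'))]
```

A higher AQI means a greater expected health impact, so the labels run from `Good` up to `Severe`.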

In [8]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib notebook
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

In [9]:
city_df = pd.read_csv(r"C:/Users/vijay kumar yadav/Desktop/kaggle/Air quality kaggle question/city_day.csv")
city_df.head()

Unnamed: 0,City,Date,PM2.5,PM10,NO,NO2,NOx,NH3,CO,SO2,O3,Benzene,Toluene,Xylene,AQI,AQI_Bucket
0,Ahmedabad,2015-01-01,,,0.92,18.22,17.15,,0.92,27.64,133.36,0.0,0.02,0.0,,
1,Ahmedabad,2015-01-02,,,0.97,15.69,16.46,,0.97,24.55,34.06,3.68,5.5,3.77,,
2,Ahmedabad,2015-01-03,,,17.4,19.3,29.7,,17.4,29.07,30.7,6.8,16.4,2.25,,
3,Ahmedabad,2015-01-04,,,1.7,18.48,17.97,,1.7,18.59,36.08,4.43,10.14,1.0,,
4,Ahmedabad,2015-01-05,,,22.1,21.42,37.76,,22.1,39.33,39.31,7.01,18.89,2.78,,


In [10]:
city_df.tail()

Unnamed: 0,City,Date,PM2.5,PM10,NO,NO2,NOx,NH3,CO,SO2,O3,Benzene,Toluene,Xylene,AQI,AQI_Bucket
29526,Visakhapatnam,2020-06-27,15.02,50.94,7.68,25.06,19.54,12.47,0.47,8.55,23.3,2.24,12.07,0.73,41.0,Good
29527,Visakhapatnam,2020-06-28,24.38,74.09,3.42,26.06,16.53,11.99,0.52,12.72,30.14,0.74,2.21,0.38,70.0,Satisfactory
29528,Visakhapatnam,2020-06-29,22.91,65.73,3.45,29.53,18.33,10.71,0.48,8.42,30.96,0.01,0.01,0.0,68.0,Satisfactory
29529,Visakhapatnam,2020-06-30,16.64,49.97,4.05,29.26,18.8,10.03,0.52,9.84,28.3,0.0,0.0,0.0,54.0,Satisfactory
29530,Visakhapatnam,2020-07-01,15.0,66.0,0.4,26.85,14.05,5.2,0.59,2.1,17.05,,,,50.0,Good


In [11]:
# Convert the 'Date' column from object to datetime, then inspect the column data types.
city_df['Date'] = pd.to_datetime(city_df['Date'])  # Vectorized conversion; avoids a slow element-wise apply.
city_df.dtypes

City                  object
Date          datetime64[ns]
PM2.5                float64
PM10                 float64
NO                   float64
NO2                  float64
NOx                  float64
NH3                  float64
CO                   float64
SO2                  float64
O3                   float64
Benzene              float64
Toluene              float64
Xylene               float64
AQI                  float64
AQI_Bucket            object
dtype: object

In [15]:
# Checking for missing values

missing_data = city_df.isnull().sum()
percent_missing = (missing_data/len(city_df))*100
data_frame = pd.concat([missing_data,percent_missing.round(2)],axis=1)
data_frame.columns = ['No of missing values','% of missing values']
data_frame.sort_values('% of missing values',ascending=False).style.background_gradient('Blues')

Unnamed: 0,No of missing values,% of missing values
Xylene,18109,61.32
PM10,11140,37.72
NH3,10328,34.97
Toluene,8041,27.23
Benzene,5623,19.04
AQI,4681,15.85
AQI_Bucket,4681,15.85
PM2.5,4598,15.57
NOx,4185,14.17
O3,4022,13.62


## Calculate AQI
- Our data set has missing **AQI** values, although it contains readings for all the essential pollutants, so we need to
  calculate the missing AQI values ourselves.
- Note that calculating the **final** AQI requires data for either PM2.5 or PM10, as well as data for any three pollutants
  out of all the pollutants.
- For more information, please refer to this [Kaggle tutorial](https://www.kaggle.com/rohanrao/calculating-aqi-air-quality-index-tutorial).
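
The availability rule above can be sketched as a row-wise check. This is only an illustration under stated assumptions: `can_compute_aqi` and `AQI_POLLUTANTS` are hypothetical names, the list of pollutants that count toward the AQI is assumed rather than taken from this notebook, and "any three pollutants" is read here as at least three available readings in total, including the particulate one. The `sample` frame is made-up data with the same column names as `city_df`.

```python
import numpy as np
import pandas as pd

# Assumption: these are the pollutants that count toward the AQI.
# The notebook itself does not list them.
AQI_POLLUTANTS = ['PM2.5', 'PM10', 'NOx', 'NH3', 'CO', 'SO2', 'O3']

def can_compute_aqi(df, pollutants=AQI_POLLUTANTS, min_available=3):
    """Return a boolean Series that is True where a row has enough data for a final AQI."""
    # At least one of PM2.5 / PM10 must be present.
    has_particulate = df[['PM2.5', 'PM10']].notna().any(axis=1)
    # Count how many of the assumed pollutants have a reading in each row.
    n_available = df[pollutants].notna().sum(axis=1)
    return has_particulate & (n_available >= min_available)

# Illustrative rows only, not taken from the real file.
sample = pd.DataFrame({
    'PM2.5': [73.24, np.nan, 15.0],
    'PM10':  [np.nan, np.nan, 66.0],
    'NOx':   [25.84, 17.15, np.nan],
    'NH3':   [np.nan, np.nan, np.nan],
    'CO':    [5.72, 0.92, np.nan],
    'SO2':   [36.52, 27.64, np.nan],
    'O3':    [62.42, 133.36, np.nan],
})

eligible = can_compute_aqi(sample)
```

In this sample only the first row qualifies: the second has no particulate reading, and the third has particulate data but fewer than three readings in total.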

In [6]:
# Delete rows with PM2.5 and PM10 both null.
city_df.dropna(axis=0, how="all", subset=['PM2.5','PM10'], inplace = True)
city_df.head()

Unnamed: 0,City,Date,PM2.5,PM10,NO,NO2,NOx,NH3,CO,SO2,O3,Benzene,Toluene,Xylene,AQI,AQI_Bucket
27,Ahmedabad,2015-01-28,73.24,,5.72,21.11,25.84,,5.72,36.52,62.42,0.03,0.01,1.41,,
28,Ahmedabad,2015-01-29,83.13,,6.93,28.71,33.72,,6.93,49.52,59.76,0.02,0.0,3.14,209.0,Poor
29,Ahmedabad,2015-01-30,79.84,,13.85,28.68,41.08,,13.85,48.49,97.07,0.04,0.0,4.81,328.0,Very Poor
30,Ahmedabad,2015-01-31,94.52,,24.39,32.66,52.61,,24.39,67.39,111.33,0.24,0.01,7.67,514.0,Severe
31,Ahmedabad,2015-02-01,135.99,,43.48,42.08,84.57,,43.48,75.23,102.7,0.4,0.04,25.87,782.0,Severe
