**Project Title:**
Air_Quality_Monitoring_and_Health_Risk_Assessment

**Problem Statement:**
Air pollution in India is rising at an alarming rate, with pollutants like PM2.5, PM10, NO2, and SO2 posing major risks to human health. There is a need for effective prediction of air quality and its related health impacts to support timely preventive action.

**Project Description:**
This project analyzes Indian air quality data and applies machine learning to predict AQI levels while classifying them into health risk categories such as Good, Moderate, Poor, and Severe, providing insights for public health and policy decisions.


In [1]:
# Data handling and plotting
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Preprocessing, data splitting, models, and evaluation metrics
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score, classification_report

# Load the daily city-level air quality data and preview the first rows
df = pd.read_csv('city_day.csv')
df.head()




| | City | Datetime | PM2.5 | PM10 | NO | NO2 | NOx | NH3 | CO | SO2 | O3 | Benzene | Toluene | Xylene | AQI | AQI_Bucket |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Delhi | 2015-01-01 | 153.3 | 241.7 | 182.9 | 33.0 | 81.3 | 38.5 | 1.87 | 64.5 | 83.6 | 18.93 | 20.81 | 8.32 | 204.5 | Severe |
| 1 | Mumbai | 2015-01-01 | 70.5 | 312.7 | 195.0 | 42.0 | 122.5 | 31.5 | 7.22 | 83.8 | 108.0 | 2.01 | 19.41 | 2.86 | 60.9 | Satisfactory |
| 2 | Chennai | 2015-01-01 | 174.1 | 275.4 | 56.2 | 68.8 | 230.9 | 28.5 | 8.56 | 60.8 | 43.9 | 19.07 | 10.19 | 9.63 | 486.5 | Severe |
| 3 | Kolkata | 2015-01-01 | 477.2 | 543.9 | 14.1 | 76.4 | 225.9 | 45.6 | 2.41 | 42.1 | 171.1 | 9.31 | 11.65 | 9.39 | 174.4 | Very Poor |
| 4 | Bangalore | 2015-01-01 | 171.6 | 117.7 | 123.3 | 12.4 | 61.9 | 49.7 | 1.26 | 79.7 | 164.3 | 6.04 | 12.74 | 9.59 | 489.7 | Good |
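
The `AQI_Bucket` labels in the preview can be cross-checked against the numeric `AQI` column. The sketch below assumes the breakpoints of India's National Air Quality Index (0-50 Good, 51-100 Satisfactory, 101-200 Moderate, 201-300 Poor, 301-400 Very Poor, above 400 Severe). This notebook does not state which thresholds produced the labels, so the bin edges, the helper name `aqi_to_bucket`, and the `Derived_Bucket` column are assumptions, and the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Assumed breakpoints (India's National AQI); not confirmed by this notebook.
AQI_BINS = [0, 50, 100, 200, 300, 400, np.inf]
AQI_LABELS = ["Good", "Satisfactory", "Moderate", "Poor", "Very Poor", "Severe"]

def aqi_to_bucket(aqi: pd.Series) -> pd.Series:
    """Map numeric AQI values to category labels using right-closed bins."""
    return pd.cut(aqi, bins=AQI_BINS, labels=AQI_LABELS, include_lowest=True)

# Small illustrative frame; the last row carries a deliberately wrong label.
sample = pd.DataFrame({
    "AQI": [42.0, 75.0, 150.0, 250.0, 350.0, 450.0],
    "AQI_Bucket": ["Good", "Satisfactory", "Moderate", "Poor", "Very Poor", "Good"],
})
sample["Derived_Bucket"] = aqi_to_bucket(sample["AQI"]).astype(str)

# Rows where the stored label disagrees with the derived one.
mismatches = sample[sample["AQI_Bucket"] != sample["Derived_Bucket"]]
print(mismatches)
```

Applied to the real data, `aqi_to_bucket(df["AQI"])` could be compared with `df["AQI_Bucket"]` in the same way, and any disagreeing rows inspected before the labels are used as a classification target.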


In [43]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18265 entries, 0 to 18264
Data columns (total 16 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   City        18265 non-null  object 
 1   Datetime    18265 non-null  object 
 2   PM2.5       18265 non-null  float64
 3   PM10        18265 non-null  float64
 4   NO          18265 non-null  float64
 5   NO2         18265 non-null  float64
 6   NOx         18265 non-null  float64
 7   NH3         18265 non-null  float64
 8   CO          18265 non-null  float64
 9   SO2         18265 non-null  float64
 10  O3          18265 non-null  float64
 11  Benzene     18265 non-null  float64
 12  Toluene     18265 non-null  float64
 13  Xylene      18265 non-null  float64
 14  AQI         18265 non-null  float64
 15  AQI_Bucket  18265 non-null  object 
dtypes: float64(13), object(3)
memory usage: 2.2+ MB
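
`df.info()` reports `Datetime` as `object`, which means the dates are stored as strings. If time-based features are needed later, one option is to convert the column with `pd.to_datetime`. The sketch below uses a small stand-in frame rather than the real file, and the derived column names `Year` and `Month` are hypothetical.

```python
import pandas as pd

# Stand-in for the loaded data; the date strings follow the YYYY-MM-DD
# pattern visible in the df.head() output.
demo = pd.DataFrame({
    "City": ["Delhi", "Mumbai"],
    "Datetime": ["2015-01-01", "2015-02-15"],
})

# errors="coerce" turns unparseable strings into NaT instead of raising.
demo["Datetime"] = pd.to_datetime(demo["Datetime"], format="%Y-%m-%d", errors="coerce")

# Hypothetical derived columns, e.g. for seasonal analysis.
demo["Year"] = demo["Datetime"].dt.year
demo["Month"] = demo["Datetime"].dt.month

print(demo.dtypes)
```

After such a conversion, `df.info()` would list the column with a datetime dtype instead of `object`.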


In [44]:
df.isnull().sum()

City          0
Datetime      0
PM2.5         0
PM10          0
NO            0
NO2           0
NOx           0
NH3           0
CO            0
SO2           0
O3            0
Benzene       0
Toluene       0
Xylene        0
AQI           0
AQI_Bucket    0
dtype: int64
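
`isnull()` only counts values that pandas treats as missing (`NaN`, `None`, `NaT`). Placeholder readings such as 0 or negative numbers would not appear in that count. The following sketch, run on a small made-up frame, tallies missing, zero, and negative values per numeric column; whether a zero in this dataset is a genuine measurement or a placeholder is not something the notebook establishes.

```python
import numpy as np
import pandas as pd

# Made-up stand-in data with one NaN, one zero, and one negative reading.
demo = pd.DataFrame({
    "PM2.5": [153.3, 0.0, 174.1, np.nan],
    "PM10": [241.7, 312.7, -1.0, 543.9],
    "City": ["Delhi", "Mumbai", "Chennai", "Kolkata"],
})

# Restrict the check to numeric columns.
numeric = demo.select_dtypes(include="number")

summary = pd.DataFrame({
    "missing": numeric.isnull().sum(),
    "zeros": (numeric == 0).sum(),
    "negatives": (numeric < 0).sum(),
})
print(summary)
```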

In [45]:
df.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| PM2.5 | 18265.0 | 250.597695 | 144.460292 | 0.0 | 125.7 | 251.0 | 376.2 | 499.9 |
| PM10 | 18265.0 | 299.442491 | 173.479906 | 0.0 | 150.1 | 300.3 | 450.0 | 600.0 |
| NO | 18265.0 | 100.481035 | 57.774795 | 0.0 | 50.6 | 100.2 | 151.0 | 200.0 |
| NO2 | 18265.0 | 75.415916 | 43.460066 | 0.0 | 37.7 | 76.0 | 113.2 | 150.0 |
| NOx | 18265.0 | 125.964079 | 72.403893 | 0.0 | 63.1 | 126.2 | 188.9 | 250.0 |
| NH3 | 18265.0 | 25.065042 | 14.452019 | 0.0 | 12.6 | 25.3 | 37.6 | 50.0 |
| CO | 18265.0 | 5.002451 | 2.889439 | 0.0 | 2.49 | 5.0 | 7.51 | 10.0 |
| SO2 | 18265.0 | 49.835839 | 28.988739 | 0.0 | 24.4 | 49.9 | 75.1 | 100.0 |
| O3 | 18265.0 | 100.40674 | 57.591436 | 0.0 | 50.6 | 100.7 | 150.4 | 200.0 |
| Benzene | 18265.0 | 10.070033 | 5.785282 | 0.0 | 5.08 | 10.08 | 15.11 | 20.0 |
