# Analyzing Air Quality

### Import Libraries

In [69]:
import pandas as pd

### Load and Process Data

In [70]:
aqi_data = pd.read_csv("air_quality_data.csv")
aqi_data.sample(5)

| | _id | date | site | parameter | index_value | description | health_advisory | health_effects |
|---|---|---|---|---|---|---|---|---|
| 49524 | 51022 | 2022-02-05 | Lawrenceville 2 | CO | 3 | Good | | |
| 70099 | 81801 | 2024-09-09 | North Braddock | SO2 | 14 | Good | | |
| 22784 | 22785 | 2018-10-24 | North Braddock | SO2 | 0 | Good | | |
| 64523 | 75520 | 2023-12-29 | Lawrenceville 2 | CO | 2 | Good | | |
| 24986 | 24987 | 2019-01-29 | Lawrenceville | PM25B | 53 | Moderate | Unusually sensitive people should consider red... | Respiratory symptoms possible in unusually sen... |


In [71]:
aqi_data_processed = aqi_data[["site", "parameter", "index_value", "description"]]
aqi_data_processed.sample(5)

| | site | parameter | index_value | description |
|---|---|---|---|---|
| 36297 | South Fayette | SO2 | 1 | Good |
| 70728 | Glassport High Street | PM10 | 20 | Good |
| 10115 | Lawrenceville | PM25B | 40 | Good |
| 3316 | Parkway East | NO2 | 17 | Good |
| 8470 | Lawrenceville 2 | SO2 | 1 | Good |


In [72]:
# unique neighborhoods specified in the dataset
aqi_data_processed["site"].unique()

array(['Lawrenceville', 'Flag Plaza', 'Harrison Township', 'Avalon',
       'Lincoln', 'South Fayette', 'North Braddock', 'Parkway East',
       'Liberty 2', 'Lawrenceville 2', 'Glassport High Street', 'Liberty',
       'Clairton', 'West Mifflin', 'Pittsburgh'], dtype=object)

In [73]:
# unique parameters used to determine air quality
aqi_data_processed["parameter"].unique()

array(['PM25B', 'CO', 'OZONE', 'SO2', 'PM25', 'PM10', 'PM10B', 'PM25(2)',
       'NO2', 'PM25T', 'PM25_640', 'PM10_640', 'NO2_500', 'NO2_200'],
      dtype=object)

In [74]:
# unique air quality descriptions
aqi_data_processed["description"].unique()

array(['Good', 'Moderate', 'Unhealthy for Sensitive Groups', 'Unhealthy',
       'Very Unhealthy'], dtype=object)

#### Concerns

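* Only 15 monitoring sites are included, which covers few neighborhoods in the greater Pittsburgh area.
* Air quality is reported through 14 different parameters (including variants such as `PM25`, `PM25B`, and `PM25T`), so index values from different parameters may not be directly comparable.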
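To see how unevenly the parameters are spread across sites, a site-by-parameter count table helps. The sketch below uses a small hypothetical `toy` DataFrame (made-up values, not rows from `air_quality_data.csv`) with the same columns as `aqi_data_processed`; on the real data the equivalent call would be `pd.crosstab(aqi_data_processed["site"], aqi_data_processed["parameter"])`.

```python
import pandas as pd

# Hypothetical toy rows (made-up values, NOT from air_quality_data.csv),
# shaped like aqi_data_processed.
toy = pd.DataFrame({
    "site": ["Site A", "Site A", "Site A", "Site B", "Site B", "Site C"],
    "parameter": ["CO", "SO2", "SO2", "SO2", "PM10", "OZONE"],
    "index_value": [3, 1, 60, 5, 55, 35],
    "description": ["Good", "Good", "Moderate", "Good", "Moderate", "Good"],
})

# Number of rows for every (site, parameter) pair; 0 means the site never reports it.
coverage = pd.crosstab(toy["site"], toy["parameter"])

# Number of distinct parameters each site reports.
params_per_site = (coverage > 0).sum(axis=1)

print(coverage)
print(params_per_site)
```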

### Identifying Patterns

In [75]:
# rows for a single neighborhood (Lawrenceville), sorted by parameter
aqi_data_processed.loc[aqi_data_processed["site"] == "Lawrenceville"].sort_values("parameter")

| | site | parameter | index_value | description |
|---|---|---|---|---|
| 27482 | Lawrenceville | OZONE | 26 | Good |
| 40780 | Lawrenceville | OZONE | 26 | Good |
| 40795 | Lawrenceville | OZONE | 17 | Good |
| 40808 | Lawrenceville | OZONE | 17 | Good |
| 40837 | Lawrenceville | OZONE | 31 | Good |
| ... | ... | ... | ... | ... |
| 44280 | Lawrenceville | PM25T | 26 | Good |
| 44259 | Lawrenceville | PM25T | 22 | Good |
| 44251 | Lawrenceville | PM25T | 60 | Moderate |
| 44373 | Lawrenceville | PM25T | 58 | Moderate |


#### Points to Note

* Each neighborhood measures a different set of parameters, so we can first compute the average AQI for each parameter.
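Rather than filtering one site at a time, a single `groupby` on both `site` and `parameter` computes every per-parameter average at once. This is a sketch on hypothetical toy rows (not the real dataset); selecting `index_value` before `.mean()` keeps pandas from trying to average the text columns.

```python
import pandas as pd

# Hypothetical toy rows (made-up values, NOT from air_quality_data.csv).
toy = pd.DataFrame({
    "site": ["Site A", "Site A", "Site A", "Site B", "Site B", "Site C"],
    "parameter": ["CO", "SO2", "SO2", "SO2", "PM10", "OZONE"],
    "index_value": [3, 1, 60, 5, 55, 35],
    "description": ["Good", "Good", "Moderate", "Good", "Moderate", "Good"],
})

# Mean index value for each (site, parameter) pair.
per_param = toy.groupby(["site", "parameter"])["index_value"].mean()

# Pivot parameters into columns; NaN marks a parameter the site does not measure.
per_param_table = per_param.unstack("parameter")
print(per_param_table)
```

Replacing `toy` with `aqi_data_processed` would give the same kind of table for all 15 sites.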

### Grouping Dataset Based on Neighborhood

In [76]:
# split the data into one DataFrame per site, in the order returned by unique()
unique_neighborhoods = aqi_data_processed["site"].unique()
neighborhoods_data = []
for neighborhood in unique_neighborhoods:
    neighborhoods_data.append(aqi_data_processed.loc[aqi_data_processed["site"] == neighborhood])
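Indexing `neighborhoods_data` by position (`[0]`, `[2]`) makes it easy to lose track of which site is which. A minimal alternative, shown on hypothetical toy rows, keys each sub-DataFrame by its site name; `groupby` already performs the per-site filtering that the loop above does by hand.

```python
import pandas as pd

# Hypothetical toy rows (made-up values, NOT from air_quality_data.csv).
toy = pd.DataFrame({
    "site": ["Site A", "Site A", "Site A", "Site B", "Site B", "Site C"],
    "parameter": ["CO", "SO2", "SO2", "SO2", "PM10", "OZONE"],
    "index_value": [3, 1, 60, 5, 55, 35],
    "description": ["Good", "Good", "Moderate", "Good", "Moderate", "Good"],
})

# One sub-DataFrame per site, keyed by site name instead of list position.
neighborhoods_by_site = {site: group for site, group in toy.groupby("site")}

# Per-parameter averages for a single site, looked up by name.
site_a_means = neighborhoods_by_site["Site A"].groupby("parameter")["index_value"].mean()
print(site_a_means)
```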

In [77]:
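# neighborhoods_data[0] is "Lawrenceville" (first site returned by unique());
# numeric_only=True skips the text columns, which newer pandas versions refuse to average
neighborhoods_data[0].groupby("parameter").mean(numeric_only=True)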

| parameter | index_value |
|---|---|
| OZONE | 35.41617 |
| PM25B | 42.261209 |
| PM25T | 40.666667 |


In [78]:
# neighborhoods_data[2] is "Harrison Township"
neighborhoods_data[2].groupby("parameter").mean(numeric_only=True)

| parameter | index_value |
|---|---|
| NO2 | 12.895958 |
| NO2_200 | 10.419355 |
| NO2_500 | 9.527473 |
| OZONE | 36.312277 |


#### Concerns

* Not all neighborhoods measure the same parameters.

### Identifying Parameters Common to All Neighborhoods

In [79]:
for param in aqi_data_processed["parameter"].unique():
    print(param)
    print(aqi_data_processed.loc[aqi_data_processed["parameter"] == param, "site"].unique())
    print()

PM25B
['Lawrenceville']

CO
['Flag Plaza' 'Parkway East' 'Lawrenceville 2' 'North Braddock']

OZONE
['Harrison Township' 'Lawrenceville' 'South Fayette']

SO2
['Avalon' 'South Fayette' 'Lawrenceville 2' 'North Braddock' 'Liberty'
 'Clairton' 'West Mifflin']

PM25
['Lincoln']

PM10
['Lincoln' 'Flag Plaza' 'Liberty 2' 'Glassport High Street']

PM10B
['North Braddock']

PM25(2)
['Liberty 2']

NO2
['Harrison Township' 'Parkway East']

PM25T
['Parkway East' 'Avalon' 'Liberty 2' 'Lawrenceville']

PM25_640
['Pittsburgh' 'North Braddock' 'Clairton' 'Avalon' 'Parkway East'
 'Liberty 2']

PM10_640
['Pittsburgh' 'North Braddock' 'Liberty 2']

NO2_500
['Pittsburgh' 'Harrison Township']

NO2_200
['Harrison Township']



#### Points to Note

* No parameter is measured in all neighborhoods.
* Although it is not the most accurate approach, we can average all AQI values recorded for each neighborhood and treat the one with the lowest average as the best neighborhood for air quality.
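The "no common parameter" finding can also be checked programmatically instead of by reading the printed lists: intersect each site's set of parameters, and count how many sites report each parameter to find the most widely shared one. A sketch on hypothetical toy rows:

```python
import pandas as pd

# Hypothetical toy rows (made-up values, NOT from air_quality_data.csv).
toy = pd.DataFrame({
    "site": ["Site A", "Site A", "Site A", "Site B", "Site B", "Site C"],
    "parameter": ["CO", "SO2", "SO2", "SO2", "PM10", "OZONE"],
    "index_value": [3, 1, 60, 5, 55, 35],
    "description": ["Good", "Good", "Moderate", "Good", "Moderate", "Good"],
})

# Set of parameters reported by each site.
params_by_site = {site: set(group) for site, group in toy.groupby("site")["parameter"]}

# Parameters reported by every site (an empty set if there are none).
common_params = set.intersection(*params_by_site.values())

# How many distinct sites report each parameter, most widely shared first.
sites_per_param = toy.groupby("parameter")["site"].nunique().sort_values(ascending=False)

print(common_params)
print(sites_per_param)
```

In the printed lists above, SO2 is reported at seven sites, more than any other parameter, so it would head `sites_per_param` on the real data.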

### Identifying the Lowest Average AQI

In [80]:
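# average index_value per site across all parameters; numeric_only=True skips the text columns
neighborhoods_avg_aqi = aqi_data_processed.groupby("site").mean(numeric_only=True).sort_values(by="index_value")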
neighborhoods_avg_aqi

| site | index_value |
|---|---|
| Lawrenceville 2 | 3.054412 |
| West Mifflin | 6.335548 |
| Flag Plaza | 9.949275 |
| Glassport High Street | 17.395508 |
| Liberty | 17.944548 |
| North Braddock | 18.807318 |
| Parkway East | 21.650542 |
| South Fayette | 23.854926 |
| Avalon | 24.238474 |
| Harrison Township | 24.304659 |


### Conclusion

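* According to the dataset, <b>Lawrenceville 2</b> has the lowest average AQI and is therefore the best available neighborhood, while Lawrenceville is the worst.
* The dataset provides no metadata that could be used to refine these findings further.
* The best option was chosen by averaging all available AQI values. This is not ideal, because each average mixes multiple parameters: Lawrenceville 2, for example, reports only CO and SO2, whose index values are low in the samples above, so its top rank may reflect which parameters it measures rather than cleaner air. It would have been better to compare neighborhoods on common parameters and pick the best option among those.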
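A sketch of comparing neighborhoods on a common parameter: build a site-by-parameter table of mean AQI and rank sites on one shared parameter only. It runs on hypothetical toy rows; choosing `"SO2"` mirrors the real data, where SO2 is reported by seven sites in the printed lists, but which parameter to use is an assumption.

```python
import pandas as pd

# Hypothetical toy rows (made-up values, NOT from air_quality_data.csv).
toy = pd.DataFrame({
    "site": ["Site A", "Site A", "Site A", "Site B", "Site B", "Site C"],
    "parameter": ["CO", "SO2", "SO2", "SO2", "PM10", "OZONE"],
    "index_value": [3, 1, 60, 5, 55, 35],
    "description": ["Good", "Good", "Moderate", "Good", "Moderate", "Good"],
})

# Mean AQI per site (rows) and parameter (columns); NaN where a site lacks the parameter.
table = toy.pivot_table(index="site", columns="parameter", values="index_value", aggfunc="mean")

# Assumption: judge every site on the same single parameter, lowest mean first.
shared_param = "SO2"
ranking = table[shared_param].dropna().sort_values()
print(ranking)
```

Sites that never report the chosen parameter simply drop out of the ranking, which is the price of a like-for-like comparison.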

### (Optional) Based on AQI Description

* Find the neighborhood with the most "Good" descriptions.

In [81]:
aqi_data_processed_good = aqi_data_processed.loc[aqi_data_processed["description"] == "Good"]
aqi_data_processed_good.sample(5)

| | site | parameter | index_value | description |
|---|---|---|---|---|
| 30572 | Lawrenceville | OZONE | 38 | Good |
| 26781 | Lawrenceville | PM25B | 42 | Good |
| 58712 | Harrison Township | OZONE | 44 | Good |
| 63712 | North Braddock | PM25_640 | 23 | Good |
| 40258 | Liberty 2 | PM10 | 9 | Good |


In [82]:
aqi_data_processed_good.groupby("site")["description"].count()

site
Avalon                   4068
Clairton                 1099
Flag Plaza               3999
Glassport High Street    3188
Harrison Township        6126
Lawrenceville            4042
Lawrenceville 2          6414
Liberty                  3058
Liberty 2                5043
Lincoln                  2754
North Braddock           8176
Parkway East             8669
Pittsburgh               3290
South Fayette            5103
West Mifflin              301
Name: description, dtype: int64

* Accordingly, <b>Parkway East</b> has the most "Good" descriptions (8669), while West Mifflin has the fewest (301).

### Final Remarks

* The dataset is not well suited to identifying the best neighborhood in the greater Pittsburgh area.
* No parameter is shared by all neighborhoods, so the conclusion was reached by simply averaging all available measurements.
* (Optional) The conclusion from the "Optional" section is not reliable, because raw counts depend on how many records each site has. Parkway East reports four parameters (CO, NO2, PM25T, PM25_640) while West Mifflin reports only SO2, so Parkway East naturally accumulates more "Good" rows. The average AQI points the other way: West Mifflin (6.3) ranks well ahead of Parkway East (21.7).
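Raw "Good" counts grow with the number of records a site has, so comparing the share of each site's rows labelled "Good" is a fairer test. A minimal sketch on hypothetical toy rows, where the raw count and the share pick different winners:

```python
import pandas as pd

# Hypothetical toy rows (made-up values, NOT from air_quality_data.csv).
toy = pd.DataFrame({
    "site": ["Site A", "Site A", "Site A", "Site B", "Site B", "Site C"],
    "parameter": ["CO", "SO2", "SO2", "SO2", "PM10", "OZONE"],
    "index_value": [3, 1, 60, 5, 55, 35],
    "description": ["Good", "Good", "Moderate", "Good", "Moderate", "Good"],
})

# True where a row's description is "Good".
is_good = toy["description"].eq("Good")

# Raw count of "Good" rows per site (what the Optional section compares).
good_counts = is_good.groupby(toy["site"]).sum()

# Share of each site's rows that are "Good"; the mean of a boolean Series is its fraction of True.
good_share = is_good.groupby(toy["site"]).mean().sort_values(ascending=False)

print(good_counts.idxmax())  # Site A: most "Good" rows overall
print(good_share.idxmax())   # Site C: highest fraction of "Good" rows
```

On the real data, `toy` would be replaced by `aqi_data_processed`.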