# Air Quality and Asthma Outcomes in NYC

- **Dataset(s) to be used:** 

[NYC Asthma emergency department visits](https://a816-dohbesp.nyc.gov/IndicatorPublic/data-explorer/asthma/?id=2380#display=summary)

[NYC Air quality (PM2.5)](https://a816-dohbesp.nyc.gov/IndicatorPublic/data-explorer/air-quality/?id=2023#display=summary)

[NYC Homes with cockroaches](https://a816-dohbesp.nyc.gov/IndicatorPublic/data-explorer/asthma-triggers/?id=107#display=summary)
- **Analysis question:** Is there an association between air quality (measured by PM2.5 concentrations) and asthma emergency department visit rates across New York City?
- **Columns that will (likely) be used:**
  - Asthma dataset:
     - [TimePeriod]
     - [GeoID]
     - [Geography]
     - [Estimated annual rate per 10,000]

  - Air quality dataset:
     - [TimePeriod]
     - [GeoID]
     - [Geography]
     - [Annual mean mcg/m3]
     - [Summer mean mcg/m3]
     - [Winter mean mcg/m3]

  - Cockroach dataset:
     - [TimePeriod]
     - [Borough]
     - [Percent]

- **Columns to be used to merge/join them:**
  - Asthma dataset: [TimePeriod] [GeoID]
  - Air quality dataset: [TimePeriod] [GeoID]
  - Cockroach dataset: [TimePeriod] [Borough]

- **Hypothesis**: Higher PM2.5 concentrations will be associated with higher asthma emergency department visit rates, meaning that poorer air quality is expected to correspond to increased asthma-related health impacts.

In [59]:
# ensure the visualizations render properly across VSCode, Jupyter Book, etc.
# https://plotly.com/python/renderers/

import plotly.io as pio

pio.renderers.default = "notebook_connected+plotly_mimetype"

## Examining the Air Quality Dataset in Depth

In [None]:
# import the packages needed to load and process the datasets and visualization 
import pandas as pd
import plotly.express as px

In [None]:
# Load the dataset that contains the PM2.5 concentration for each region of NYC
air_quality_all=pd.read_csv("NYC EH Data Portal - Fine particles (PM 2.5) (filtered).csv")
air_quality_all.head()

Unnamed: 0,TimePeriod,GeoTypeDesc,GeoID,GeoRank,BoroID,Borough,Geography,Area,Annual mean mcg/m3,Summer mean mcg/m3,Winter mean mcg/m3
0,2024,UHF 34,306308,3,3,Manhattan,Chelsea-Village,Chelsea-Village Manhattan,9.3,11.4,8.2
1,2024,UHF 34,309310,3,3,Manhattan,Union Square-Lower Manhattan,Union Square-Lower Manhattan Manhattan,8.3,10.3,7.6
2,2024,UHF 34,305307,3,3,Manhattan,Upper East Side-Gramercy,Upper East Side-Gramercy Manhattan,7.9,9.8,7.4
3,2024,UHF 34,201,3,2,Brooklyn,Greenpoint,Greenpoint Brooklyn,7.4,9.1,7.1
4,2024,UHF 34,202,3,2,Brooklyn,Downtown - Heights - Slope,Downtown - Heights - Slope Brooklyn,7.2,9.0,7.0


In [None]:
"""NYC EH uses two geographic classification systems: UHF34 and UHF42. 
   The dataset contains both types, but we only use the data classified under UHF42. 
   Therefore, we filter the dataset on the value of the GeoTypeDesc column
   to keep only the UHF42 records."""
air_quality=air_quality_all[air_quality_all["GeoTypeDesc"]=="UHF 42"]

# Drop the columns we do not need
air_quality = air_quality.drop(columns=["GeoTypeDesc", "GeoRank", "BoroID","Area"])
air_quality.head()

Unnamed: 0,TimePeriod,GeoID,Borough,Geography,Annual mean mcg/m3,Summer mean mcg/m3,Winter mean mcg/m3
34,2024,306,Manhattan,Chelsea - Clinton,9.5,11.6,8.4
35,2024,308,Manhattan,Greenwich Village - SoHo,8.9,11.0,8.0
36,2024,307,Manhattan,Gramercy Park - Murray Hill,8.9,10.9,7.9
37,2024,309,Manhattan,Union Square - Lower East Side,8.4,10.5,7.7
38,2024,310,Manhattan,Lower Manhattan,8.0,9.9,7.5
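
Re-running a notebook cell that calls `drop(columns=...)` raises a `KeyError` once the columns are already gone. A minimal sketch of one way around this, assuming it is acceptable to skip missing columns silently: pass `errors="ignore"`, which `DataFrame.drop` supports. The toy frame and the helper name `drop_helper_columns` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Toy frame with the same helper columns as the portal export (values are illustrative).
toy = pd.DataFrame({
    "TimePeriod": [2024, 2024],
    "GeoTypeDesc": ["UHF 34", "UHF 42"],
    "GeoID": [306308, 306],
    "GeoRank": [3, 4],
    "BoroID": [3, 3],
    "Area": ["Chelsea-Village Manhattan", "Chelsea - Clinton Manhattan"],
    "Annual mean mcg/m3": [9.3, 9.5],
})

HELPER_COLUMNS = ["GeoTypeDesc", "GeoRank", "BoroID", "Area"]


def drop_helper_columns(df):
    """Drop the helper columns if present; ignore any that were already removed."""
    return df.drop(columns=HELPER_COLUMNS, errors="ignore")


uhf42 = toy[toy["GeoTypeDesc"] == "UHF 42"]
once = drop_helper_columns(uhf42)
twice = drop_helper_columns(once)  # repeating the call does not raise KeyError
print(list(twice.columns))
```

The trade-off is that a misspelled column name would also be ignored, so this suits interactive notebooks better than production pipelines.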


#### Draw a map based on the average annual PM 2.5 concentration

By analyzing this dataset, we can create a map to show the PM2.5 levels across different areas over recent years, compare them, and identify which regions have higher concentrations.

In [None]:
"""Before we can draw any kind of map in Python,
   we need the shapes of those neighborhoods. 
   Those shapes usually come in a special file format called GeoJSON."""
# importing the requests library which enables us to ask websites for data
import requests

# We tell Python to fetch this URL, which points to UHF42.geojson,
# a file that contains NYC's UHF42 neighborhood boundaries.
response = requests.get("https://raw.githubusercontent.com/nychealth/EHDP-data/refs/heads/production/geography/UHF42.geojson")
# Stop here with a clear error if the download failed
response.raise_for_status()

# Read the text inside the file and interpret it as JSON
shapes = response.json() 

print("loaded")

loaded


In [None]:
# Calculate the average value for each of the 42 regions over the time period.
# groupby collects all records that share the same GeoID into one group
air_quality_average=air_quality.groupby("GeoID")["Annual mean mcg/m3"].mean().reset_index(name="Average Value mcg/m3")
air_quality_average.head()

Unnamed: 0,GeoID,Average Value mcg/m3
0,101,6.94
1,102,6.93
2,103,7.0
3,104,6.96
4,105,7.37


In [None]:
# Plot the choropleth map by matching GeoID in the DataFrame to GEOCODE in the GeoJSON file
fig = px.choropleth_map(
        air_quality_average,
        locations="GeoID",  # column name to match on
        color="Average Value mcg/m3",  # column name for values
        geojson=shapes,
        featureidkey="properties.GEOCODE",  # GeoJSON property to match on
        center={"lat": 40.71, "lon": -73.98},
        zoom=9,
        height=600,
        title="Average PM2.5 Concentration across UHF42 Neighborhoods",
    )

fig.show()

The spatial distribution of PM2.5 concentrations across New York City reveals a clear central–peripheral pattern. The highest pollution levels are concentrated in Manhattan, especially in Midtown. In contrast, the outer boroughs, including Staten Island, Eastern Queens, and parts of Southern Brooklyn, consistently show much lower concentrations. 

This contrast underscores significant environmental disparities between the urban core and surrounding residential areas. Within individual boroughs, additional variation exists; for example, western Queens and the southern Bronx exhibit higher concentrations than their more suburban counterparts.
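
A choropleth like the one above silently leaves a region blank when its ID has no matching GeoJSON feature, so it can help to compare the two sets of keys first. The sketch below is one way to do that in plain Python. The helper name `unmatched_ids` and the toy GeoJSON are assumptions for illustration, and IDs are compared as strings so that an integer/string difference does not hide a real match.

```python
def unmatched_ids(ids, geojson, prop="GEOCODE"):
    """Return (ids with no GeoJSON feature, GeoJSON codes with no data row)."""
    feature_ids = {str(f["properties"][prop]) for f in geojson["features"]}
    frame_ids = {str(i) for i in ids}
    return sorted(frame_ids - feature_ids), sorted(feature_ids - frame_ids)


# Toy FeatureCollection with three features (geometry omitted for brevity).
toy_shapes = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"GEOCODE": 101}, "geometry": None},
        {"type": "Feature", "properties": {"GEOCODE": 102}, "geometry": None},
        {"type": "Feature", "properties": {"GEOCODE": 103}, "geometry": None},
    ],
}

# 999 is a made-up ID that has no shape; 103 is a shape that has no data.
no_shape, no_data = unmatched_ids([101, 102, 999], toy_shapes)
print(no_shape, no_data)
```

In the notebook, the same call would look like `unmatched_ids(air_quality_average["GeoID"], shapes)`; two empty lists would mean every region can be drawn.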

#### Compare the average air pollution across boroughs

In [None]:
# Group the dataset by Borough and TimePeriod, calculate the mean PM2.5 concentration 
# within each group, and reset the index to create a clean DataFrame with a new column
# named "Average Value" that stores these averaged PM2.5 levels.
air_quality_borough=air_quality.groupby(["Borough","TimePeriod"])["Annual mean mcg/m3"].mean().reset_index(name="Average Value")
air_quality_borough.head()

Unnamed: 0,Borough,TimePeriod,Average Value
0,Bronx,2015,9.314286
1,Bronx,2016,7.842857
2,Bronx,2017,7.8
3,Bronx,2018,7.314286
4,Bronx,2019,6.942857
5,Bronx,2020,6.114286
6,Bronx,2021,6.671429
7,Bronx,2022,6.1
8,Bronx,2023,6.914286
9,Bronx,2024,6.457143


In [None]:
"""Plot the average PM2.5 concentration for each borough over the selected time period, 
   compare the pollution levels across boroughs, and 
   examine how these levels have changed over time."""
fig=px.line(
    air_quality_borough,
    x="TimePeriod",
    y="Average Value",
    color="Borough",
    title="Average PM2.5 Concentration by Borough over Time",
)
fig.show()

From this chart, we can see that Manhattan has the poorest air quality among the five boroughs, maintaining the highest PM2.5 concentrations from 2015 to 2024. In contrast, Staten Island consistently exhibits the best air quality.

In terms of trends, all boroughs show improvement beginning in 2015, with PM2.5 levels steadily declining through 2020. After 2020, the pattern becomes more variable, with noticeable increases in both 2021 and 2023.

#### Seasonal Differences in PM2.5 Concentration
PM2.5 concentrations often exhibit pronounced seasonal variation driven by weather conditions, atmospheric stability, and changes in emission activities throughout the year. Many studies suggest that PM2.5 levels tend to be higher in winter due to weaker atmospheric dispersion and increased heating-related emissions under colder temperatures. In contrast, concentrations are typically lower in summer, partly because of higher precipitation and more favorable dispersion conditions.

Based on these findings, the following section investigates whether PM2.5 concentrations in this dataset are, on average, higher in winter than in summer.

In [None]:
""" 
In the current air quality dataset, 
the PM2.5 concentrations for winter and summer are stored in two separate columns. 
To compare winter and summer values in our visualizations using a single “season” variable, 
we need to reshape the data into a long format. 
We first remove the unnecessary columns from the original DataFrame,
and then create a new variable called “Season” to label each observation accordingly.
"""
# Reshape to long format so that winter and summer can be plotted together later
air_quality_melted = (
    air_quality.drop(columns=["Geography","Annual mean mcg/m3"])
    .melt(
        id_vars=["TimePeriod", "GeoID","Borough"],
        var_name="Season",
        value_name="Season mean mcg/m3",
    )
)
# Clean the value in column "Season" by using "replace" to delete the useless words
air_quality_melted["Season"]=air_quality_melted["Season"].str.replace(" mean mcg/m3","")
air_quality_melted.head()


In [None]:
"""
We create box plots for summer and winter PM2.5 concentrations. 
Because box plots visualize the distribution of a dataset
including its median, spread, and potential outliers,
they allow us to compare the overall patterns of PM2.5 levels across the two seasons. 
By examining these distributions side by side, 
we can assess whether winter and summer PM2.5 concentrations differ meaningfully.
"""
fig = px.box(air_quality_melted,
             x="Season", 
             y="Season mean mcg/m3",
             title="Winter vs Summer PM2.5 Levels",
             labels={"Season mean mcg/m3": "PM2.5 Concentration (mcg/m3)"})
fig.show()

The box plot shows that PM2.5 levels are higher in summer than in winter. Both seasons show some variation across areas, but the general pattern suggests that PM2.5 pollution is worse in summer.

This result contradicts our initial expectation, but it is plausible. In New York City, warm temperatures, strong sunlight, and high humidity in summer promote the formation of secondary particulate matter, which can become the dominant source of PM2.5. Winter heating, by contrast, does not appear to raise PM2.5 levels in NYC as much as it does in some other regions.
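
A box plot compares the two seasonal distributions as a whole. A complementary check is to compare summer and winter for the same neighborhood and year. The sketch below does this for the five UHF42 rows visible in the `air_quality.head()` output earlier (all Manhattan, 2024), so it illustrates the technique rather than summarizing the full dataset.

```python
import pandas as pd

# The five UHF42 rows shown in the air_quality.head() output (Manhattan, 2024).
wide = pd.DataFrame({
    "GeoID": [306, 308, 307, 309, 310],
    "Summer mean mcg/m3": [11.6, 11.0, 10.9, 10.5, 9.9],
    "Winter mean mcg/m3": [8.4, 8.0, 7.9, 7.7, 7.5],
})

# Paired view: summer minus winter for the same neighborhood and year.
seasonal_gap = wide["Summer mean mcg/m3"] - wide["Winter mean mcg/m3"]

# Long view: the same wide-to-long reshape as in the notebook, then one mean per season.
long = wide.melt(id_vars=["GeoID"], var_name="Season", value_name="Season mean mcg/m3")
long["Season"] = long["Season"].str.replace(" mean mcg/m3", "", regex=False)
season_means = long.groupby("Season")["Season mean mcg/m3"].mean()

print(seasonal_gap.round(1).tolist())
print(season_means.round(2).to_dict())
```

On the full `air_quality` frame, the same subtraction would show how many neighborhood-years have a higher summer mean than winter mean.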

## Examining the Asthma Dataset in Depth

This dataset reports the annual rate of asthma emergency department visits per 10,000 residents in each area of New York City. Because it is a standardized rate, it removes the effect of population size differences across regions. This measure can therefore be used to compare the asthma burden across areas.

In [None]:
# Load the dataset that contains the asthma emergency department visit rates
Asthma=pd.read_csv("NYC EH Data Portal - Asthma emergency department visits (adults) (filtered).csv")
Asthma.head()

Unnamed: 0,TimePeriod,GeoTypeDesc,GeoID,GeoRank,BoroID,Borough,Geography,Area,Number,"Estimated annual rate per 10,000","Age-adjusted rate per 10,000"
0,2023,UHF 42,107,4,1,Bronx,Hunts Point - Mott Haven,Hunts Point - Mott Haven Bronx,1961,192.3,193.5
1,2023,UHF 42,106,4,1,Bronx,High Bridge - Morrisania,High Bridge - Morrisania Bronx,2717,178.0,179.0
2,2023,UHF 42,303,4,3,Manhattan,East Harlem,East Harlem Manhattan,1539,173.2,176.2
3,2023,UHF 42,105,4,1,Bronx,Crotona - Tremont,Crotona - Tremont Bronx,2588,171.9,171.0
4,2023,UHF 42,302,4,3,Manhattan,Central Harlem - Morningside Heights,Central Harlem - Morningside Heights Manhattan,2333,157.7,157.9


In [None]:
"""
Check the data types in this DataFrame, especially that of "Estimated annual rate per 10,000",
to make sure it is numeric, so we can do further calculations without converting the type.
"""
Asthma.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 756 entries, 0 to 755
Data columns (total 11 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   TimePeriod                        756 non-null    int64  
 1   GeoTypeDesc                       756 non-null    object 
 2   GeoID                             756 non-null    int64  
 3   GeoRank                           756 non-null    int64  
 4   BoroID                            756 non-null    int64  
 5   Borough                           756 non-null    object 
 6   Geography                         756 non-null    object 
 7   Area                              756 non-null    object 
 8   Number                            756 non-null    object 
 9   Estimated annual rate per 10,000  756 non-null    float64
 10  Age-adjusted rate per 10,000      756 non-null    float64
dtypes: float64(2), int64(4), object(5)
memory usage: 65.1+ KB


In [None]:
# Delete some useless columns
Asthma = Asthma.drop(columns=["GeoTypeDesc", "GeoRank", "BoroID","Area"])

In [None]:
"""
To calculate the average asthma emergency visit rate for each area over this time period, 
we use groupby to group the data by region and then compute the period mean within each group.
This gives us one value per region to draw on the map.
"""
Asthma_average=Asthma.groupby("GeoID")["Estimated annual rate per 10,000"].mean().reset_index(name="Average Value")
Asthma_average.head()

Unnamed: 0,GeoID,Average Value
0,101,58.616667
1,102,115.822222
2,103,159.494444
3,104,125.055556
4,105,250.927778


In [None]:
"""
Draw a choropleth map of NYC, 
using the DataFrame we generated above (with GeoID and Average Value columns) 
and the GeoJSON we loaded before.
"""

fig = px.choropleth_map(
    Asthma_average,
    locations="GeoID",  # column name to match on
    color="Average Value",  # column name for values
    geojson=shapes,
    featureidkey="properties.GEOCODE",  # GeoJSON property to match on
    center={"lat": 40.71, "lon": -73.98},
    zoom=9,
    height=600,
    title="Annual Asthma Emergency Department Visits per 10,000 across UHF42 Neighborhoods",
    )

fig.show()

The Bronx stands out as the area with the highest asthma emergency visit rates, with many districts shaded in bright yellow and orange. This suggests a disproportionately high asthma burden compared to other boroughs.

Central and East Brooklyn also show moderately high rates, though not as severe as in the Bronx.

In contrast, most districts in Manhattan, Queens, and Staten Island have relatively lower asthma emergency visit rates, reflected by darker blue and purple shades.

The spatial pattern highlights significant health disparities across NYC, with asthma outcomes clustering in specific neighborhoods—particularly low-income and historically underserved communities.

#### Compare the average asthma emergency visit rates across boroughs and their trends

In [None]:
# Compute the average value for each borough to enable comparisons at the borough level
"""
We tell pandas to group all rows that belong to the same borough and time period, 
and then compute the mean rate within each group. 
Finally, reset_index() turns the result back into a clean DataFrame, 
and we name the new column “Average Rate” for clarity.
"""
Asthma_by_borough=Asthma.groupby(["Borough","TimePeriod"])["Estimated annual rate per 10,000"].mean().reset_index(name="Average Rate")
Asthma_by_borough

Unnamed: 0,Borough,TimePeriod,Average Rate
0,Bronx,2005,184.157143
1,Bronx,2006,201.414286
2,Bronx,2007,211.357143
3,Bronx,2008,217.728571
4,Bronx,2009,208.928571
...,...,...,...
85,Staten Island,2019,62.850000
86,Staten Island,2020,39.075000
87,Staten Island,2021,38.125000
88,Staten Island,2022,42.875000


In [None]:
# Check the data type of the TimePeriod column to ensure it is formatted correctly before using it for filtering
Asthma_by_borough.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 90 entries, 0 to 89
Data columns (total 3 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Borough       90 non-null     object 
 1   TimePeriod    90 non-null     int64  
 2   Average Rate  90 non-null     float64
dtypes: float64(1), int64(1), object(1)
memory usage: 2.2+ KB


In [None]:
"""
Plot the average Asthma Emergency Visit Rate for each borough from 2015 onward. 
Since the PM2.5 dataset covers the years 2015 to 2024, 
we restrict the asthma data to a similar time range so that the two line charts can be compared directly.
(The asthma dataset has no data for 2015, so the graph starts in 2016.)
"""
fig=px.line(
    Asthma_by_borough[Asthma_by_borough["TimePeriod"]>=2015],
    x="TimePeriod",
    y="Average Rate",
    color="Borough",
    title="Average Asthma Emergency Visit Rate from 2016 to 2023",
    labels={"TimePeriod":"Year","Average Rate":"Average Asthma Emergency Visit Rate(per 10,000)"})
fig.show()

From 2016 to 2020, the average asthma emergency visit rate shows a clear downward trend across all boroughs. This pattern mirrors what we observed in the PM2.5 concentration plot, where PM2.5 levels also declined over the same period. After 2020, the asthma visit rate becomes relatively stable and even shows signs of increasing in several boroughs—again matching the post-2020 fluctuations seen in PM2.5 levels.

Taken together, these parallel trends suggest a potential relationship between air quality and asthma emergency visits at the borough level.

Next, we will merge the two datasets to compare these variables directly and explore their relationship in more detail.
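
Before merging, it helps to confirm which years each dataset covers, because a merge on `TimePeriod` keeps only years present in both. The sketch below is a small plain-Python check. The helper names `missing_years` and `shared_years` are illustrative, and the year ranges mirror what this notebook reports (asthma data from 2005 to 2023 with no 2015, PM2.5 data from 2015 to 2024).

```python
def missing_years(years):
    """Return the years absent between the first and last year present."""
    present = {int(y) for y in years}
    return [y for y in range(min(present), max(present) + 1) if y not in present]


def shared_years(years_a, years_b):
    """Return the sorted years that appear in both collections."""
    return sorted({int(y) for y in years_a} & {int(y) for y in years_b})


# Year coverage as described in this notebook.
asthma_years = list(range(2005, 2015)) + list(range(2016, 2024))
pm25_years = list(range(2015, 2025))

print(missing_years(asthma_years))
print(shared_years(asthma_years, pm25_years))
```

In the notebook, the inputs would be `Asthma["TimePeriod"]` and `air_quality["TimePeriod"]`.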

## Combining Two Datasets 

In [None]:
"""
We merge the asthma dataset with the air quality dataset
using TimePeriod, GeoID, Borough, and Geography as the matching keys. 
This step brings the two datasets together 
so we can compare PM2.5 levels and asthma emergency visit rates for the same places and years.
"""
Combine=pd.merge(Asthma, air_quality, on=["TimePeriod","GeoID","Borough","Geography"])
Combine

Unnamed: 0,TimePeriod,GeoID,Borough,Geography,Number,"Estimated annual rate per 10,000","Age-adjusted rate per 10,000",Annual mean mcg/m3,Summer mean mcg/m3,Winter mean mcg/m3
0,2023,107,Bronx,Hunts Point - Mott Haven,1961,192.3,193.5,7.2,9.5,6.7
1,2023,106,Bronx,High Bridge - Morrisania,2717,178.0,179.0,6.9,9.3,6.5
2,2023,303,Manhattan,East Harlem,1539,173.2,176.2,6.9,9.2,6.5
3,2023,105,Bronx,Crotona - Tremont,2588,171.9,171.0,7.0,9.4,6.7
4,2023,302,Manhattan,Central Harlem - Morningside Heights,2333,157.7,157.9,6.9,9.2,6.6
...,...,...,...,...,...,...,...,...,...,...
331,2016,403,Queens,Flushing - Clearview,511,23.3,24.6,7.2,7.8,7.7
332,2016,209,Brooklyn,Bensonhurst - Bay Ridge,402,23.5,23.7,7.1,7.9,7.7
333,2016,305,Manhattan,Upper East Side,340,18.2,20.3,9.1,9.7,9.3
334,2016,308,Manhattan,Greenwich Village - SoHo,128,17.6,17.9,9.5,9.8,10.2
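
An inner merge silently drops any row whose keys have no exact match on the other side, and text keys such as `Geography` are sensitive to small spelling differences. The sketch below shows one way to surface such rows with `indicator=True`, and to merge on the numeric keys only with a `validate` check. The toy frames reuse three rows from the output above. The differently spelled `"High Bridge-Morrisania"` is a hypothetical mismatch introduced for illustration, not something observed in the real files.

```python
import pandas as pd

asthma_toy = pd.DataFrame({
    "TimePeriod": [2023, 2023, 2023],
    "GeoID": [107, 106, 303],
    "Geography": ["Hunts Point - Mott Haven", "High Bridge - Morrisania", "East Harlem"],
    "Estimated annual rate per 10,000": [192.3, 178.0, 173.2],
})
air_toy = pd.DataFrame({
    "TimePeriod": [2023, 2023, 2023],
    "GeoID": [107, 106, 303],
    # Hypothetical spelling difference in the second name.
    "Geography": ["Hunts Point - Mott Haven", "High Bridge-Morrisania", "East Harlem"],
    "Annual mean mcg/m3": [7.2, 6.9, 6.9],
})

# An outer merge with indicator=True labels each row as left_only, right_only, or both.
check = pd.merge(asthma_toy, air_toy, on=["TimePeriod", "GeoID", "Geography"],
                 how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]

# Merging on the numeric keys alone avoids the spelling issue;
# validate raises an error if a key pair is duplicated on either side.
by_id = pd.merge(asthma_toy, air_toy.drop(columns="Geography"),
                 on=["TimePeriod", "GeoID"], validate="one_to_one")

print(len(unmatched), len(by_id))
```

Whether to keep `Borough` and `Geography` as merge keys is a design choice: they guard against mismatched IDs, but they can also drop rows when labels differ between files.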


In [None]:
"""
Create a scatter plot that visualizes the relationship between PM2.5 concentration
and asthma emergency visit rates. 
Each point represents a UHF42 neighborhood in a given year, 
and the colors distinguish the five boroughs, 
allowing us to see whether higher pollution levels are associated with higher asthma visit rates.
"""
fig=px.scatter(
    Combine,
    x="Annual mean mcg/m3",
    y="Estimated annual rate per 10,000",
    color="Borough",
    title="The Relationship between the PM 2.5 Concentration and Asthma Emergency Visit Rate",
    labels={"Annual mean mcg/m3": "PM 2.5 Concentration(mcg/m3)",
            "Estimated annual rate per 10,000":"Asthma Emergency Visit Rate (per 10,000)"})
fig.show()

From the scatter plot, we observe a clear overall positive relationship between PM2.5 concentration and asthma emergency visit rates. In other words, districts with higher levels of PM2.5 pollution tend to experience more asthma-related emergency visits. This upward trend is visible across most boroughs: as the PM2.5 values move from left to right, the corresponding asthma visit rates generally rise, forming a loosely ascending cloud of points. This pattern is consistent with medical research showing that fine particulate matter can trigger or worsen asthma symptoms, leading to more emergency cases.

However, there are also several points that do not follow this pattern. Many of these outliers are represented by red dots—corresponding to Manhattan. These points appear in the lower-right portion of the graph, showing unusually high PM2.5 concentrations but relatively low asthma emergency visit rates. This suggests that Manhattan may have unique factors—such as better healthcare access, socioeconomic differences, or population characteristics—that weaken the expected link between pollution and asthma outcomes.

This deviation suggests that PM2.5 exposure alone cannot fully explain asthma outcomes across New York City. Several factors may contribute to Manhattan’s distinct pattern. For example, the indoor environment is an important trigger for asthma. Manhattan’s housing stock may be newer or better maintained than that of other boroughs, so residents may experience fewer indoor environmental triggers (such as mold, pests, and poor ventilation) that are known to exacerbate asthma. Such social and environmental advantages could help mitigate the health impacts of outdoor air pollution, leading to lower emergency visit rates despite higher PM2.5 levels.
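
A scatter plot gives a visual impression; a correlation coefficient puts a number on it, both overall and within each borough. The sketch below is a minimal example using the Pearson correlation from `Series.corr`. The helper name `pm_asthma_correlation` and the `min_rows` threshold are assumptions. The sample consists only of the nine rows visible in the merged output above, which are the top and bottom of a sorted table, so the printed values are not representative of the full `Combine` frame.

```python
import pandas as pd

X = "Annual mean mcg/m3"
Y = "Estimated annual rate per 10,000"

# The nine rows visible in the merged output (not a representative sample).
sample = pd.DataFrame({
    "Borough": ["Bronx", "Bronx", "Manhattan", "Bronx", "Manhattan",
                "Queens", "Brooklyn", "Manhattan", "Manhattan"],
    X: [7.2, 6.9, 6.9, 7.0, 6.9, 7.2, 7.1, 9.1, 9.5],
    Y: [192.3, 178.0, 173.2, 171.9, 157.7, 23.3, 23.5, 18.2, 17.6],
})


def pm_asthma_correlation(df, min_rows=3):
    """Return the overall Pearson r and a per-borough r for boroughs with enough rows."""
    overall = df[X].corr(df[Y])
    by_borough = {}
    for borough, group in df.groupby("Borough"):
        if len(group) >= min_rows:  # skip groups too small for a meaningful r
            by_borough[borough] = group[X].corr(group[Y])
    return overall, by_borough


overall, by_borough = pm_asthma_correlation(sample)
print(round(overall, 2), {k: round(v, 2) for k, v in by_borough.items()})
```

Calling `pm_asthma_correlation(Combine)` on the full merged frame would show whether the relationship holds within boroughs or mainly reflects differences between them.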

## Adding the Cockroach Dataset

It is therefore important to examine another contributing factor: indoor environmental conditions. One well-documented risk factor is pest infestation, including cockroaches, whose allergens are strongly associated with asthma attacks among both children and adults. To better understand the differences in asthma emergency visit rates across boroughs, we next turn to housing data and explore whether the prevalence of cockroach infestation helps explain the geographic disparities observed in the asthma data.

In [None]:
# Load the cockroach dataset
cockroach=pd.read_csv("NYC EH Data Portal - Homes with cockroaches.csv")
cockroach.head()

Unnamed: 0,TimePeriod,GeoTypeDesc,GeoID,GeoRank,BoroID,Borough,Geography,Area,Number,Percent
0,2023,Borough,1,1,1,Bronx,Bronx,Bronx,-,"49.2 (47.9, 50.5)"
1,2023,Borough,2,1,2,Brooklyn,Brooklyn,Brooklyn,-,"31.1 (30.2, 31.9)"
2,2023,Borough,4,1,4,Queens,Queens,Queens,-,"28.0 (27.0, 29.0)"
3,2023,Borough,3,1,3,Manhattan,Manhattan,Manhattan,-,"25.1 (24.1, 26.1)"
4,2023,Borough,5,1,5,Staten Island,Staten Island,Staten Island,-,"10.0 (8.3, 11.7)"


In [None]:
# delete useless columns
cockroach=cockroach.drop(columns=["GeoTypeDesc", "GeoID","GeoRank", "BoroID","Geography","Area","Number"])

In [None]:
# merge the cockroach dataset with asthma dataset by same Borough and time period
asthma_cockroach=pd.merge(Asthma_by_borough,cockroach,on=["Borough","TimePeriod"])
asthma_cockroach.head()

Unnamed: 0,Borough,TimePeriod,Average Rate,Percent
0,Bronx,2008,217.728571,42.3
1,Bronx,2011,189.442857,"37.7 (35.4, 40)"
2,Bronx,2014,218.514286,"35.0 (32.8, 37.2)"
3,Bronx,2017,157.885714,"39.4 (36.8, 41.9)"
4,Bronx,2021,97.214286,"45.2 (43.3, 47.1)"


In [None]:
"""
In the original Percent column, most values include not only the estimate 
but also a confidence interval in parentheses, for example: 37.7 (35.4, 40)
Since I only need the main estimate (the percentage of households reporting cockroach sightings), 
I first remove everything inside the parentheses using a regular expression. 
This leaves clean numeric strings such as 37.7. 
After stripping out the extra text, I convert the column to float so that it can be used for plotting.
"""
asthma_cockroach["Percent"]=asthma_cockroach["Percent"].str.replace(r"\s*\(.*?\)", "", regex=True)
asthma_cockroach["Percent"]=asthma_cockroach["Percent"].astype(float)
asthma_cockroach.head()

Unnamed: 0,Borough,TimePeriod,Average Rate,Percent
0,Bronx,2008,217.728571,42.3
1,Bronx,2011,189.442857,37.7
2,Bronx,2014,218.514286,35.0
3,Bronx,2017,157.885714,39.4
4,Bronx,2021,97.214286,45.2
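
Stripping the parentheses discards the confidence interval. If the interval is needed later (for example, to draw error bars), a stricter pattern can capture all three numbers instead. The sketch below uses only the standard `re` module. The helper name `parse_percent` is illustrative, and the pattern assumes the two formats seen in this notebook: a bare estimate, or an estimate followed by `(low, high)`.

```python
import re

_PERCENT = re.compile(
    r"^\s*(\d+(?:\.\d+)?)"                                       # point estimate
    r"(?:\s*\(\s*(\d+(?:\.\d+)?)\s*,\s*(\d+(?:\.\d+)?)\s*\))?"   # optional (low, high)
    r"\s*$"
)


def parse_percent(text):
    """Return (estimate, low, high); low and high are None when no interval is given."""
    match = _PERCENT.match(text)
    if match is None:
        raise ValueError(f"unrecognized Percent value: {text!r}")
    estimate, low, high = match.groups()
    return (
        float(estimate),
        float(low) if low is not None else None,
        float(high) if high is not None else None,
    )


print(parse_percent("37.7 (35.4, 40)"))
print(parse_percent("42.3"))
```

Unlike a blanket replace, this raises an error on any value that does not fit the expected format, which makes unexpected entries visible instead of silently converting them.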


In [None]:
# Draw a scatter plot of the asthma emergency visit rate against the percentage of homes with cockroaches
fig=px.scatter(
    asthma_cockroach,
    x="Average Rate",
    y="Percent",
    title="Percentage of Homes with Cockroaches vs Asthma Emergency Visit Rate",
    labels={"Average Rate":"Asthma Emergency Visit Rate (per 10,000)","Percent":"Percentage of Homes with Cockroaches (%)"}
)
fig.show()

This scatter plot shows an overall upward trend. Each point is a borough in a given survey year, and boroughs with higher cockroach infestation rates tend to have higher asthma emergency visit rates.

While the relationship is not perfectly linear, the general pattern suggests a positive correlation, meaning that cockroach exposure may be an important environmental risk factor contributing to asthma severity. Points with very high cockroach percentages (around 40–50%) also appear among those with the highest asthma emergency visit rates, reinforcing this association.

This visual pattern supports public health findings that indoor allergens—especially cockroach allergens—can significantly worsen asthma symptoms and increase the likelihood of severe asthma episodes requiring emergency care.

## Conclusion
This project began with the hypothesis that higher PM2.5 concentrations would be associated with higher asthma emergency department visit rates. The results partially support this idea.

Across years, both PM2.5 levels and asthma ED visit rates declined through 2020 and then fluctuated, suggesting a parallel trend between air quality and asthma outcomes. The scatter plot also revealed a general positive correlation between PM2.5 and asthma ED visits, in line with the hypothesis.

However, there are some exceptions in the scatter plot. Several Manhattan districts showed high PM2.5 but low asthma visit rates, indicating that factors beyond outdoor pollution—such as housing quality, indoor allergens, and socioeconomic conditions—shape asthma risks.

Our analysis of the cockroach dataset supported this idea: boroughs with higher cockroach infestation levels showed higher asthma visit rates, highlighting the importance of indoor environmental conditions.

In summary, PM2.5 contributes to asthma disparities, but it is not the only driver. A full explanation requires considering both outdoor air pollution and indoor housing conditions, suggesting that effective public health interventions must address air quality, housing quality, and environmental equity together.