# Rethinking PM2.5 Exposure: Chronic Disease Trends in the U.S. (2015–2019)
This project analyzes temporal and geographic trends in chronic diseases, including cardiovascular disease, respiratory disorders, and cancers, in the United States from 2015 to 2019. By examining variation across states and over time, the study seeks to improve understanding of population health patterns and to support public health planning. The analysis also considers PM2.5 air pollution, which is linked to respiratory diseases, so that chronic disease trends can be interpreted alongside both U.S. and global PM2.5 concentrations.

In [7]:
import os
import pandas as pd
from config import DATA_DIR, aqs_epa_url, chronic_url, who_url
from load import retrieve_file_pm25, retrieve_file_chronic, retrieve_file_pm25_who
from process import process_pm25, process_chronic, process_pm25_who

# U.S. EPA AQS API data

In [8]:
# Create a data directory
os.makedirs(DATA_DIR, exist_ok=True)

# --- EPA AQS API data ---
pm25_data = retrieve_file_pm25(aqs_epa_url)
pm25_5states_5years = process_pm25(pm25_data)
if pm25_5states_5years is not None:
    df_pm25 = pd.DataFrame(pm25_5states_5years)
    print(f"\nU.S. PM2.5 Data Head:\n{df_pm25.head()}\n")

Loading data from https://aqs.epa.gov/data/api/annualData/byState...
U.S. PM2.5 concentration data loaded successfully

Processing U.S. PM2.5 data...
    Data length: Year 20150101 - PM 2.5 concentration = 2910
    Data length: Year 20160101 - PM 2.5 concentration = 2588
    Data length: Year 20170101 - PM 2.5 concentration = 2826
    Data length: Year 20180101 - PM 2.5 concentration = 3197
    Data length: Year 20190101 - PM 2.5 concentration = 2902
U.S. PM2.5 concentration data processed successfully

U.S. PM2.5 Data Head:
                                                     20150101  \
California  [9.592542, 9.592542, 9.592542, 9.592542, 9.592...   
Colorado    [4.175, 4.175, 4.14359, 4.14359, 4.14359, 4.14...   
Illinois    [10.367273, 10.367273, 10.367273, 10.367273, 1...   
New York    [7.791525, 7.791525, 7.791525, 7.791525, 7.791...   
Texas       [9.575472, 9.575472, 9.575472, 9.575472, 9.575...   

                                                     20160101  \
California  [
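Each cell in the frame above holds a list of readings, not a single number, so the lists usually need to be reduced to one value per state and year before any trend comparison. The sketch below is one way to do that with the standard library only. It assumes that `process_pm25` returns a mapping of year key to state to list of readings, which is inferred from the printed layout and not confirmed by the source. The sample values and the name `summarize_pm25` are hypothetical.

```python
from statistics import fmean

# Hypothetical sample shaped like the printed frame: year key -> state -> readings.
# These numbers are illustrative only; they are not the EPA values.
pm25_sample = {
    "20150101": {"California": [9.5, 9.7, 9.6], "Colorado": [4.0, 4.2]},
    "20160101": {"California": [8.9, 9.1], "Colorado": []},
}


def summarize_pm25(nested):
    """Return {year: {state: mean reading}}; empty lists become None."""
    summary = {}
    for year_key, states in nested.items():
        year = year_key[:4]  # "20150101" -> "2015"
        summary[year] = {
            state: (fmean(values) if values else None)
            for state, values in states.items()
        }
    return summary


pm25_means = summarize_pm25(pm25_sample)
for year, states in pm25_means.items():
    print(year, states)
```

A mean is only one possible summary. If the readings come from monitors with very different sample counts, a median or a count-weighted mean may be a better choice.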

# U.S. Chronic disease data from web

In [9]:
chronic_data = retrieve_file_chronic(chronic_url)
chronic_5state_5years = process_chronic(chronic_data)
if chronic_5state_5years is not None:
    df_chronic = pd.DataFrame(chronic_5state_5years)
    print(f"\nU.S. Chronic Disease Data Head:\n{df_chronic.head()}")

Loading data from https://data.cdc.gov/api/views/hksd-2xuw/rows.json?accessType=DOWNLOAD...
U.S. Chronic disease data loaded successfully

Processing U.S. chronic disease data...
    Data length: U.S. chronic disease (5 years) = 95256
    Data length: U.S. chronic disease (5 states, 5 years) = 8782
U.S. Chronic disease data processed successfully

U.S. Chronic Disease Data Head:
  year_start year_end       state  disease         unit value  \
0       2015     2019  California   Cancer       Number   486   
1       2015     2019    Colorado   Cancer       Number  2880   
2       2015     2019    New York   Cancer       Number  2547   
3       2015     2019       Texas   Cancer  per 100,000   2.9   
4       2019     2019    Illinois  Alcohol       Number   4.6   

                                      geolocation  
0   POINT (-120.99999953799971 37.63864012300047)  
1  POINT (-106.13361092099967 38.843840757000464)  
2    POINT (-75.54397042699964 42.82700103200045)  
3   POINT (-99.4267
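The `geolocation` column stores coordinates as WKT-style text such as `POINT (-120.99... 37.63...)`, which has to be split into numeric longitude and latitude before it can be used for mapping or spatial joins. The following is a minimal sketch of such a parser. The helper name `parse_point` is hypothetical and is not part of the project's `process` module.

```python
import re

# Matches text such as "POINT (-120.99 37.63)"; WKT lists longitude before latitude.
_POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$"
)


def parse_point(wkt):
    """Return (longitude, latitude) as floats, or None if the text does not match."""
    match = _POINT_RE.match(wkt or "")
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


# The first geolocation value shown in the output above.
california = parse_point("POINT (-120.99999953799971 37.63864012300047)")
print(california)

# Missing or malformed values yield None instead of raising.
print(parse_point(None))
```

Returning `None` for unparseable input keeps the helper safe to apply across a whole column, for example with `df_chronic["geolocation"].map(parse_point)`.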

# Global PM2.5 data from Google Drive

In [10]:
pm25_who_data = retrieve_file_pm25_who(who_url, extract_dir=DATA_DIR)
pm25_who_5years = process_pm25_who(pm25_who_data)
if pm25_who_5years is not None:
    df_pm25_who = pd.DataFrame(pm25_who_5years)
    print(f"\nPM 2.5 Worldwide Data Head:\n{df_pm25_who.head()}\n")

Loading data from https://drive.google.com/file/d/1Biiamr8qiEv3IZi0o8E7O1ylMBfcuBJh/view?usp=share_link...
Worldwide PM2.5 concentration data saved to ../data/who_pm25.csv
Loading ../data/who_pm25.csv into DataFrame...
Worldwide PM2.5 concentration data loaded successfully

Processing PM 2.5 worldwide data...
    Data length: PM 2.5 worldwide (5 years) = 4725
PM 2.5 worldwide data processed successfully

PM 2.5 Worldwide Data Head:
                                           Indicator  \
0  Concentrations of fine particulate matter (PM2.5)   
1  Concentrations of fine particulate matter (PM2.5)   
2  Concentrations of fine particulate matter (PM2.5)   
3  Concentrations of fine particulate matter (PM2.5)   
4  Concentrations of fine particulate matter (PM2.5)   

                                            Location  Period  FactValueNumeric  
0                                              Kenya    2019             10.01  
1                                Trinidad and Tobago    2019     
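The load step above starts from a Google Drive share link ending in `/view?usp=share_link`, which points to a preview page and not to the file itself. How `retrieve_file_pm25_who` resolves this is not shown in the notebook. One common approach, sketched here as an assumption, is to extract the file ID and build a `uc?export=download` URL. The helper names are hypothetical, and Google Drive may still return a confirmation page for large files.

```python
import re
from urllib.parse import urlencode


def drive_file_id(share_url):
    """Extract the file ID from a '/file/d/<id>/...' Google Drive share link."""
    match = re.search(r"/file/d/([A-Za-z0-9_-]+)", share_url)
    return match.group(1) if match else None


def drive_download_url(share_url):
    """Build a direct-download URL from a share link (assumed URL pattern)."""
    file_id = drive_file_id(share_url)
    if file_id is None:
        raise ValueError(f"No Google Drive file ID found in: {share_url}")
    query = urlencode({"export": "download", "id": file_id})
    return f"https://drive.google.com/uc?{query}"


# The share link printed in the load log above.
share_url = (
    "https://drive.google.com/file/d/"
    "1Biiamr8qiEv3IZi0o8E7O1ylMBfcuBJh/view?usp=share_link"
)
direct_url = drive_download_url(share_url)
print(direct_url)
```

The sketch only constructs the URL and makes no network request, so it can be checked offline before being wired into a download routine.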