# Research question: 
Do air pollution and socioeconomic status critically influence cardiovascular disease morbidity and mortality rates across U.S. states, and what are the public health and social justice implications?

# Problem statement: 
Cardiovascular disease (CVD) is a leading cause of death in the United States. Growing evidence suggests that air pollution exposure, measured as fine particulate matter (PM2.5), influences cardiovascular morbidity and mortality and disproportionately affects low-income populations. Individuals from lower socioeconomic backgrounds are more likely to live in areas with higher pollution levels, overcrowding, limited healthcare access, and economic stressors that contribute to CVD risk factors such as hypertension and obesity. These inequalities raise concerns about how socioeconomic and environmental conditions intersect in shaping public health outcomes. To what extent does socioeconomic status influence cardiovascular mortality rates, and how does exposure to air pollution further affect health risks in disadvantaged populations?

# Data Definition 

# American Community Survey: 5-Year Estimates.
Last Updated: July 19, 2023.
https://catalog.data.gov/dataset/american-community-survey-5-year-estimates-data-profiles-5-year
This dataset contains 2,400 variables from the American Community Survey, which publishes data annually. It covers a broad range of social, housing, economic, and demographic variables at the national, state, district, and county levels across the U.S. The data are presented as counts and percentages.

# PM2.5 and cardiovascular mortality rate.
Last Updated: November 12, 2020
https://catalog.data.gov/dataset/annual-pm2-5-and-cardiovascular-mortality-rate-data-trends-modified-by-county-socioeconomi
This dataset, provided by the U.S. Environmental Protection Agency, contains socioeconomic status information for 2,132 counties across the United States. It also includes average annual cardiovascular mortality rates and total PM2.5 concentrations for each county over a 21-year span (1990–2010). The cardiovascular mortality data were collected from the U.S. National Center for Health Statistics, PM2.5 levels were estimated using the EPA’s Community Multiscale Air Quality (CMAQ) modeling system, and the socioeconomic data were extracted from the U.S. Census Bureau.

# Heart Disease Mortality by State.
Last Updated: February 25, 2022
https://www.cdc.gov/nchs/pressroom/sosmap/heart_disease_mortality/heart_disease.htm
This dataset reports, for each U.S. state, the number of deaths attributed to heart disease and the corresponding death rate per 100,000 population. The rates are adjusted for differences in age distribution and population size.



# Hypertension Mortality by State
Last Updated: March 3, 2022
https://www.cdc.gov/nchs/pressroom/sosmap/hypertension_mortality/hypertension.htm
This dataset reports, for each U.S. state, the number of deaths attributed to hypertension and the corresponding death rate per 100,000 population. The rates are adjusted for differences in age distribution and population size.






In [33]:
# Import libraries
import numpy as np                  # Scientific Computing
import pandas as pd                 # Data Analysis
import matplotlib.pyplot as plt     # Plotting
import seaborn as sns               # Statistical Data Visualization

# Make pandas display every row and column of a DataFrame instead of truncating
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

# Force pandas to display full numbers instead of scientific notation
# pd.options.display.float_format = '{:.0f}'.format

# Library to suppress warnings
import warnings
warnings.filterwarnings('ignore')

In [41]:
# Read the county-level annual PM2.5 and cardiovascular mortality rate (CMR) data
# (pd.read_csv already returns a DataFrame, so no extra conversion is needed)
df_annualcounty_pm25_cmr = pd.read_csv('/Users/bayowaonabajo/Downloads/SES_PM25_CMR_data/County_annual_PM25_CMR.csv')

In [43]:
# Read the county-level SES index and quintile data
# (pd.read_csv already returns a DataFrame, so no extra conversion is needed)
df_county_ses_index_quintile = pd.read_csv('/Users/bayowaonabajo/Downloads/SES_PM25_CMR_data/County_SES_index_quintile.csv')

In [37]:
# Read the state-level heart disease mortality data
# (pd.read_csv already returns a DataFrame, so no extra conversion is needed)
df_heart_dx_mort = pd.read_csv('/Users/bayowaonabajo/Downloads/data-table-heart-dx-mort.csv')

In [39]:
# Read the state-level hypertension mortality data
# (pd.read_csv already returns a DataFrame, so no extra conversion is needed)
df_htn_dx_mort = pd.read_csv('/Users/bayowaonabajo/Downloads/data-table-htn-dx-mort.csv')

In [58]:
# Read the state-level ACS 2022 data
# (pd.read_csv already returns a DataFrame, so no extra conversion is needed)
df_acs_2022_states = pd.read_csv('/Users/bayowaonabajo/Downloads/acs_2022_states.csv')

In [48]:
# Display first ten rows of the dataframe
df_annualcounty_pm25_cmr.head(10)

Unnamed: 0.1,Unnamed: 0,FIPS,Year,PM2.5,CMR
0,1,1001,1990,9.749792,471.758888
1,2,1001,1991,9.069443,456.869651
2,3,1001,1992,9.105352,520.014377
3,4,1001,1993,8.752873,454.436425
4,5,1001,1994,9.024049,415.035332
5,6,1001,1995,8.404545,352.065432
6,7,1001,1996,8.349826,452.984639
7,8,1001,1997,8.5091,420.085364
8,9,1001,1998,8.566814,486.99475
9,10,1001,1999,9.059593,417.782427
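
The preview suggests two things worth tidying before any merge: `FIPS` is stored as an integer (1001), which drops the leading zero of the standard five-digit county code, and there is an `Unnamed: 0` column that looks like a row index saved with the file. The sketch below shows one possible cleanup on a few rows copied from the preview. The helper name `tidy_county_frame` is hypothetical, and treating every `Unnamed` column as disposable is an assumption that should be checked against the real file.

```python
import io
import pandas as pd

# Three rows copied from the preview above; the real file has many more.
sample_csv = io.StringIO(
    "Unnamed: 0,FIPS,Year,PM2.5,CMR\n"
    "1,1001,1990,9.749792,471.758888\n"
    "2,1001,1991,9.069443,456.869651\n"
    "3,1001,1992,9.105352,520.014377\n"
)

def tidy_county_frame(df):
    """Hypothetical helper: drop leftover index columns and zero-pad FIPS."""
    # Assumption: columns named 'Unnamed...' are saved row indexes, not data.
    out = df.loc[:, ~df.columns.str.startswith("Unnamed")].copy()
    # County FIPS codes are five characters; restore the leading zero.
    out["FIPS"] = out["FIPS"].astype(str).str.zfill(5)
    return out

tidy = tidy_county_frame(pd.read_csv(sample_csv))
print(tidy.head())
```

Storing `FIPS` as a zero-padded string keeps it compatible with other county-level sources that publish the code as text.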


In [50]:
df_county_ses_index_quintile.head(10)

Unnamed: 0.1,Unnamed: 0,FIPS,SES_index_1990,SES_index_2000,SES_index_2010,SES_quintile_1990,SES_quintile_2000,SES_quintile_2010
0,1,1001,-0.079387,-0.322846,-0.40515,Q3,Q3,Q2
1,2,1003,-0.18724,-0.467794,-0.403987,Q3,Q2,Q2
2,3,1005,1.279538,2.013751,1.740142,Q5,Q5,Q5
3,4,1009,0.124421,-0.375181,-0.405849,Q4,Q3,Q2
4,5,1011,2.877256,3.519681,2.617074,Q5,Q5,Q5
5,6,1013,1.922153,1.858747,1.680438,Q5,Q5,Q5
6,7,1015,0.103711,0.44846,0.913785,Q4,Q4,Q5
7,8,1017,0.660426,0.829457,1.443492,Q4,Q5,Q5
8,9,1021,0.492201,0.316738,0.340982,Q4,Q4,Q4
9,10,1023,1.802146,1.774375,0.742904,Q5,Q5,Q5
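
The SES table is in wide form, with one index column and one quintile column per census year, while the PM2.5/CMR table has one row per county and year. A minimal sketch of reshaping the SES table to long form with `pd.wide_to_long` follows, using two counties copied from the preview. The column name `census_year` is an illustrative choice, and how census years should be matched to the annual rows is left open here.

```python
import pandas as pd

# Two counties copied from the preview above.
ses_wide = pd.DataFrame({
    "FIPS": [1001, 1005],
    "SES_index_1990": [-0.079387, 1.279538],
    "SES_index_2000": [-0.322846, 2.013751],
    "SES_index_2010": [-0.40515, 1.740142],
    "SES_quintile_1990": ["Q3", "Q5"],
    "SES_quintile_2000": ["Q3", "Q5"],
    "SES_quintile_2010": ["Q2", "Q5"],
})

# Split each column name into a stub (SES_index / SES_quintile) and a year suffix.
ses_long = (
    pd.wide_to_long(
        ses_wide,
        stubnames=["SES_index", "SES_quintile"],
        i="FIPS",
        j="census_year",  # illustrative name for the new year column
        sep="_",
    )
    .reset_index()
    .sort_values(["FIPS", "census_year"], ignore_index=True)
)
ses_long["census_year"] = ses_long["census_year"].astype(int)
print(ses_long)
```

The result has one row per county and census year, which is easier to join to a yearly table than three parallel column pairs.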


In [52]:
df_heart_dx_mort.head(10)

Unnamed: 0,YEAR,STATE,RATE,DEATHS,URL
0,2022,AL,234.2,14958,/nchs/pressroom/states/alabama/al.htm
1,2022,AK,145.7,1013,/nchs/pressroom/states/alaska/ak.htm
2,2022,AZ,148.5,14593,/nchs/pressroom/states/arizona/az.htm
3,2022,AR,224.1,8664,/nchs/pressroom/states/arkansas/ar.htm
4,2022,CA,142.4,66340,/nchs/pressroom/states/california/ca.htm
5,2022,CO,131.4,8389,/nchs/pressroom/states/colorado/co.htm
6,2022,CT,137.8,6899,/nchs/pressroom/states/connecticut/ct.htm
7,2022,DE,156.8,2220,/nchs/pressroom/states/delaware/de.htm
8,2022,District of Columbia,182.6,1239,/nchs/pressroom/states/DC/DC1.htm
9,2022,FL,140.9,49877,/nchs/pressroom/states/florida/fl.htm


In [54]:
df_htn_dx_mort.head(10)

Unnamed: 0,YEAR,STATE,RATE,DEATHS,URL
0,2022,AL,13.2,849,/nchs/pressroom/states/alabama/al.htm
1,2022,AK,8.6,56,/nchs/pressroom/states/alaska/ak.htm
2,2022,AZ,11.3,1109,/nchs/pressroom/states/arizona/az.htm
3,2022,AR,12.1,454,/nchs/pressroom/states/arkansas/ar.htm
4,2022,CA,14.4,6727,/nchs/pressroom/states/california/ca.htm
5,2022,CO,6.4,400,/nchs/pressroom/states/colorado/co.htm
6,2022,CT,7.7,386,/nchs/pressroom/states/connecticut/ct.htm
7,2022,DE,8.1,113,/nchs/pressroom/states/delaware/de.htm
8,2022,District of Columbia,11.9,82,/nchs/pressroom/states/DC/DC1.htm
9,2022,FL,9.3,3289,/nchs/pressroom/states/florida/fl.htm
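
Both CDC tables identify states by postal abbreviation (with the District of Columbia spelled out), whereas the ACS table loaded earlier uses full state names in its `state` column, so the tables cannot be merged on the raw values. One way to harmonize the keys is sketched below on three rows copied from the previews. The `ABBR_TO_NAME` dictionary is a partial, illustrative mapping and would need to cover every state in a real analysis.

```python
import pandas as pd

# Partial, illustrative mapping; a full analysis needs every state.
ABBR_TO_NAME = {"AL": "Alabama", "AK": "Alaska", "AZ": "Arizona"}

# Rows copied from the heart disease and ACS previews.
heart = pd.DataFrame({
    "STATE": ["AL", "AK", "District of Columbia"],
    "RATE": [234.2, 145.7, 182.6],
})
acs = pd.DataFrame({
    "state": ["Alabama", "Alaska", "District of Columbia"],
    "median_income": [59609, 86370, 101722],
})

# Map abbreviations to full names; values already spelled out stay unchanged.
heart["state"] = heart["STATE"].replace(ABBR_TO_NAME)

# validate= raises an error if either side has duplicate state keys.
merged = heart.merge(acs, on="state", how="left", validate="one_to_one")
print(merged)
```

A left merge keeps every CDC row, so any state missing from the mapping shows up as a missing `median_income` instead of silently disappearing.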


In [60]:
# Display first ten rows of the dataframe
df_acs_2022_states.head(10)

Unnamed: 0,state,median_income,total_population_poverty,poverty_count,total_population_uninsured,uninsured_count,total_population_education,education_count,state.1,poverty_rate,uninsured_rate,education_percent_highschool
0,Alabama,59609,4890427,768897,4944981,39485,3428520,572252,1,15.722492,0.798486,16.690934
1,Alaska,86370,717293,75227,706392,16409,485871,93744,2,10.487625,2.322931,19.29401
2,Arizona,72581,7017776,916876,7060320,147689,4878959,958447,4,13.065051,2.091817,19.644498
3,Arkansas,56335,2931377,475729,2964272,39858,2031847,317437,5,16.228858,1.344613,15.623076
4,California,91905,38643585,4685272,38874540,312643,26842698,5935292,6,12.12432,0.804236,22.111384
5,Colorado,87598,5653289,540105,5675719,66669,3982760,1083618,8,9.553819,1.174635,27.207715
6,Connecticut,90213,3507563,355692,3567016,22291,2520790,573917,9,10.140716,0.62492,22.767347
7,Delaware,79325,969075,107790,979853,8413,700364,139213,10,11.122978,0.858598,19.877235
8,District of Columbia,101722,649184,98039,661596,3359,484596,124860,11,15.101882,0.507712,25.765793
9,Florida,67917,21171700,2725633,21300363,336566,15579847,3154240,12,12.873945,1.580095,20.245642
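
The rate columns appear to be derived from the count columns, for example `poverty_rate` as `poverty_count` divided by `total_population_poverty`, times 100. A small consistency check on two rows copied from the preview is sketched below. It only confirms that the columns agree with each other and says nothing about which ACS variables the counts were drawn from.

```python
import numpy as np
import pandas as pd

# Two rows copied from the preview above.
acs = pd.DataFrame({
    "state": ["Alabama", "Alaska"],
    "total_population_poverty": [4890427, 717293],
    "poverty_count": [768897, 75227],
    "poverty_rate": [15.722492, 10.487625],
})

# Recompute the percentage from the raw counts.
recomputed = 100 * acs["poverty_count"] / acs["total_population_poverty"]

# Compare with the stored column, allowing for display rounding.
matches = np.isclose(recomputed, acs["poverty_rate"], atol=1e-4)
print(acs.assign(recomputed=recomputed, matches=matches))
```

The same pattern applies to `uninsured_rate` and `education_percent_highschool` with their respective count and population columns.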


In [63]:
# Display last ten rows of the dataframe
df_annualcounty_pm25_cmr.tail(10)

Unnamed: 0.1,Unnamed: 0,FIPS,Year,PM2.5,CMR
44762,44763,56037,2001,3.429981,358.140819
44763,44764,56037,2002,3.46839,270.540457
44764,44765,56037,2003,3.320721,302.581187
44765,44766,56037,2004,3.164634,323.950863
44766,44767,56037,2005,3.340059,272.988165
44767,44768,56037,2006,3.77691,247.510138
44768,44769,56037,2007,3.609803,292.450269
44769,44770,56037,2008,3.2971,182.189745
44770,44771,56037,2009,3.119896,242.828987
44771,44772,56037,2010,3.230996,254.860863


In [None]:
df_county_ses_index_quintile.tail(10)

In [None]:
df_heart_dx_mort.tail(10)

In [None]:
df_htn_dx_mort.tail(10)

In [None]:
# Display last ten rows of the dataframe
df_acs_2022_states.tail(10)

In [68]:
# This is the number of rows and columns in the data
df_annualcounty_pm25_cmr.shape

(44772, 5)
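
The row count is consistent with the dataset description: 2,132 counties over 21 years (1990 to 2010) gives 2,132 × 21 = 44,772 rows. The sketch below encodes that arithmetic and adds a hypothetical helper, `is_balanced_panel`, which checks that every county appears exactly once per year. The toy frame is made up purely to exercise the helper and is not taken from the data.

```python
import pandas as pd

# Figures from the dataset description and the shape output above.
N_COUNTIES = 2132
N_YEARS = 2010 - 1990 + 1  # 21 years, inclusive
assert N_COUNTIES * N_YEARS == 44772

def is_balanced_panel(df, unit="FIPS", time="Year"):
    """Hypothetical helper: True if each unit has exactly one row per time point."""
    no_duplicates = not df.duplicated([unit, time]).any()
    full_grid = len(df) == df[unit].nunique() * df[time].nunique()
    return no_duplicates and full_grid

# Toy frame (not real data): two counties, two years each.
toy = pd.DataFrame({"FIPS": [1001, 1001, 1003, 1003],
                    "Year": [1990, 1991, 1990, 1991]})
print(is_balanced_panel(toy))           # complete grid
print(is_balanced_panel(toy.iloc[:3]))  # one county-year missing
```

On the real data the call would be `is_balanced_panel(df_annualcounty_pm25_cmr)`.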

In [None]:
df_county_ses_index_quintile.shape

In [None]:
df_heart_dx_mort.shape

In [None]:
df_htn_dx_mort.shape

In [66]:
df_acs_2022_states.shape

(52, 12)