# Health Science Data Analysis

In this project, we analyze health science data on deaths caused by pneumonia, influenza, or COVID-19 in U.S. states between 2020 and 2023. To connect the analysis to this year's theme, travel, we examine how these deaths relate to travel patterns, using quarterly travel data for the U.S.

In this Jupyter Notebook, we visualize, for each state and each week, the ratio of deaths caused by pneumonia, influenza, or COVID-19 to the total number of deaths in that week. We use this ratio rather than raw counts because raw counts largely track how many people die in a state overall, whereas the ratio better represents the impact of these airborne illnesses on the death rate in each state.
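As a small illustration of why the ratio is more informative than the raw count, the sketch below recomputes it for two rows taken from the week ending 2020-01-04 in this dataset. The variable name `sample` is only for this example. Alabama has more of these deaths than Nevada (72 versus 43), yet its ratio is lower because its total death count is much larger.

```python
import pandas as pd

# Two rows copied from the week ending 2020-01-04 in this dataset.
sample = pd.DataFrame({
    "Jurisdiction": ["Alabama", "Nevada"],
    "Pneumonia, Influenza, or COVID-19 Deaths": [72.0, 43.0],
    "Total Deaths": [1098.0, 506.0],
})

# Share of all deaths in that week attributed to the three illnesses.
sample["Death Ratio"] = (
    sample["Pneumonia, Influenza, or COVID-19 Deaths"] / sample["Total Deaths"]
)

print(sample[["Jurisdiction", "Death Ratio"]])
```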


In [4]:
import pandas as pd
import plotly.express as px

## Data Loading and Preprocessing

In this section, we load the health science dataset, which was pre-filtered in Excel to include only the U.S. states and the `All Ages` age group. We then preprocess it by dropping rows with missing values in the column being plotted and selecting the relevant columns. We also add the `State_Abbrev` and `Death Ratio` columns for downstream analysis.
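The Excel filtering step could also be done in pandas, which keeps the whole pipeline reproducible inside the notebook. The following is a minimal sketch, not the procedure we actually used: the frame `raw`, its values, the `0-17 years` label, and the shortened `states` list are all made up for illustration, and it assumes the unfiltered export carries the same `Jurisdiction` and `Age Group` columns shown in the output further down.

```python
import pandas as pd

# Hypothetical stand-in for the unfiltered export; values are invented.
raw = pd.DataFrame({
    "Jurisdiction": ["United States", "Alabama", "Alabama", "Alaska"],
    "Age Group": ["All Ages", "All Ages", "0-17 years", "All Ages"],
    "Total Deaths": [60000.0, 1098.0, 12.0, 91.0],
})

# Assumed list of jurisdictions to keep (shortened here for illustration).
states = ["Alabama", "Alaska"]

# Keep state-level rows for the combined age group only.
is_state = raw["Jurisdiction"].isin(states)
is_all_ages = raw["Age Group"] == "All Ages"
filtered = raw[is_state & is_all_ages].reset_index(drop=True)

print(filtered)
```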

In [3]:
data = pd.read_csv('HealthScienceDataStates.csv')
data['Week Ending Date'] = pd.to_datetime(data['Week Ending Date'])
data.head()


Unnamed: 0,Data As Of,Start Week,End Week,MMWRyear,MMWRweek,Week Ending Date,Group,Indicator,Jurisdiction,Age Group,COVID-19 Deaths,Total Deaths,Pneumonia Deaths,Influenza Deaths,Pneumonia or Influenza,"Pneumonia, Influenza, or COVID-19 Deaths"
0,11/2/23,12/29/19,1/4/20,2020,1,2020-01-04,By Week,Week-ending,Alabama,All Ages,0.0,1098.0,67.0,,72.0,72.0
1,11/2/23,12/29/19,1/4/20,2020,1,2020-01-04,By Week,Week-ending,Alaska,All Ages,0.0,91.0,,,,
2,11/2/23,12/29/19,1/4/20,2020,1,2020-01-04,By Week,Week-ending,Arizona,All Ages,0.0,1278.0,83.0,,87.0,87.0
3,11/2/23,12/29/19,1/4/20,2020,1,2020-01-04,By Week,Week-ending,Arkansas,All Ages,0.0,697.0,57.0,,63.0,63.0
4,11/2/23,12/29/19,1/4/20,2020,1,2020-01-04,By Week,Week-ending,California,All Ages,0.0,5865.0,465.0,43.0,508.0,508.0


In [5]:

state_abbrev = {
    "Alabama": "AL",
    "Alaska": "AK",
    "Arizona": "AZ",
    "Arkansas": "AR",
    "California": "CA",
    "Colorado": "CO",
    "Connecticut": "CT",
    "Delaware": "DE",
    "Florida": "FL",
    "Georgia": "GA",
    "Hawaii": "HI",
    "Idaho": "ID",
    "Illinois": "IL",
    "Indiana": "IN",
    "Iowa": "IA",
    "Kansas": "KS",
    "Kentucky": "KY",
    "Louisiana": "LA",
    "Maine": "ME",
    "Maryland": "MD",
    "Massachusetts": "MA",
    "Michigan": "MI",
    "Minnesota": "MN",
    "Mississippi": "MS",
    "Missouri": "MO",
    "Montana": "MT",
    "Nebraska": "NE",
    "Nevada": "NV",
    "New Hampshire": "NH",
    "New Jersey": "NJ",
    "New Mexico": "NM",
    "New York": "NY",
    "North Carolina": "NC",
    "North Dakota": "ND",
    "Ohio": "OH",
    "Oklahoma": "OK",
    "Oregon": "OR",
    "Pennsylvania": "PA",
    "Rhode Island": "RI",
    "South Carolina": "SC",
    "South Dakota": "SD",
    "Tennessee": "TN",
    "Texas": "TX",
    "Utah": "UT",
    "Vermont": "VT",
    "Virginia": "VA",
    "Washington": "WA",
    "West Virginia": "WV",
    "Wisconsin": "WI",
    "Wyoming": "WY",
    "District of Columbia": "DC",
    "American Samoa": "AS",
    "Guam": "GU",
    "Northern Mariana Islands": "MP",
    "Puerto Rico": "PR",
    "United States Minor Outlying Islands": "UM",
    "U.S. Virgin Islands": "VI",
}
data['State_Abbrev'] = data['Jurisdiction'].map(state_abbrev)

# Keep only rows where the plotted measure is present (one filtered frame per measure).
data_filtered = data[data['Pneumonia, Influenza, or COVID-19 Deaths'].notna()]
data_filtered0 = data[data['COVID-19 Deaths'].notna()]
data_filtered1 = data[data['Pneumonia Deaths'].notna()]
data_filtered2 = data[data['Influenza Deaths'].notna()]

df = data_filtered[['Jurisdiction', 'Week Ending Date', 'Pneumonia, Influenza, or COVID-19 Deaths', 'State_Abbrev']].copy()
df = df.sort_values('Week Ending Date')

dfr = data_filtered[['Jurisdiction', 'Week Ending Date', 'Pneumonia, Influenza, or COVID-19 Deaths', 'Total Deaths', 'State_Abbrev']].copy()

dfr['Death Ratio'] = dfr['Pneumonia, Influenza, or COVID-19 Deaths'] / dfr['Total Deaths']

dfr = dfr.sort_values('Week Ending Date')


df0 = data_filtered0[['Jurisdiction', 'Week Ending Date', 'COVID-19 Deaths', 'State_Abbrev']].copy()
df0 = df0.sort_values('Week Ending Date')

df1 = data_filtered1[['Jurisdiction', 'Week Ending Date', 'Pneumonia Deaths', 'State_Abbrev']].copy()
df1 = df1.sort_values('Week Ending Date')

df2 = data_filtered2[['Jurisdiction', 'Week Ending Date', 'Influenza Deaths', 'State_Abbrev']].copy()
df2 = df2.sort_values('Week Ending Date')
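
Two sanity checks may be worth running after this preprocessing; the sketch below shows them on a toy frame rather than on our data. The frame `toy`, the shortened `abbrev` dictionary, and the `New York City` row are hypothetical. The first check lists jurisdiction names that `map` could not match (they become `NaN` and would silently vanish from the map). The second guards the ratio against a zero `Total Deaths` value, which would otherwise produce `inf` and distort a color range based on the column maximum.

```python
import numpy as np
import pandas as pd

# Toy frame with invented values; "New York City" is a hypothetical unmatched name.
toy = pd.DataFrame({
    "Jurisdiction": ["Alabama", "New York City", "Alaska"],
    "Pneumonia, Influenza, or COVID-19 Deaths": [72.0, 10.0, 0.0],
    "Total Deaths": [1098.0, 200.0, 0.0],
})
abbrev = {"Alabama": "AL", "Alaska": "AK"}  # shortened for illustration
toy["State_Abbrev"] = toy["Jurisdiction"].map(abbrev)

# Check 1: names with no dictionary entry end up as NaN after map().
unmapped = sorted(toy.loc[toy["State_Abbrev"].isna(), "Jurisdiction"].unique())
print("Unmapped jurisdictions:", unmapped)

# Check 2: treat a zero total as missing so the ratio is NaN instead of inf.
toy["Death Ratio"] = (
    toy["Pneumonia, Influenza, or COVID-19 Deaths"]
    / toy["Total Deaths"].replace(0, np.nan)
)
print(toy[["Jurisdiction", "Death Ratio"]])
```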

In [6]:

dfr.head()

Unnamed: 0,Jurisdiction,Week Ending Date,"Pneumonia, Influenza, or COVID-19 Deaths",Total Deaths,State_Abbrev,Death Ratio
0,Alabama,2020-01-04,72.0,1098.0,AL,0.065574
28,Nevada,2020-01-04,43.0,506.0,NV,0.08498
29,New Hampshire,2020-01-04,19.0,266.0,NH,0.071429
30,New Jersey,2020-01-04,82.0,1641.0,NJ,0.04997
31,New Mexico,2020-01-04,42.0,433.0,NM,0.096998
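
Because the travel data mentioned in the introduction is quarterly while these deaths are weekly, the two need a common time unit before they can be compared. One possible way to do that, sketched here under assumptions, is to sum the weekly counts per state and calendar quarter and recompute the ratio from the sums. The frame `weekly` is a toy example: only its first row comes from the dataset, the other values are invented, and the `Quarter` column name is our own choice.

```python
import pandas as pd

# Toy weekly data; only the first row is taken from the real dataset.
weekly = pd.DataFrame({
    "Jurisdiction": ["Alabama"] * 3,
    "Week Ending Date": pd.to_datetime(["2020-01-04", "2020-03-28", "2020-04-04"]),
    "Pneumonia, Influenza, or COVID-19 Deaths": [72.0, 80.0, 90.0],
    "Total Deaths": [1098.0, 1000.0, 1000.0],
})

# Label each week with the calendar quarter its end date falls in.
weekly["Quarter"] = weekly["Week Ending Date"].dt.to_period("Q")

# Sum counts per state and quarter, then recompute the ratio from the sums.
count_cols = ["Pneumonia, Influenza, or COVID-19 Deaths", "Total Deaths"]
quarterly = weekly.groupby(["Jurisdiction", "Quarter"], as_index=False)[count_cols].sum()
quarterly["Death Ratio"] = (
    quarterly["Pneumonia, Influenza, or COVID-19 Deaths"] / quarterly["Total Deaths"]
)

print(quarterly)
```

Dividing the summed counts, rather than averaging the weekly ratios, weights each week by how many deaths it contains; either choice could be defended depending on the question being asked.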


## Data Visualization: Health-Related Deaths by State

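We visualize deaths from pneumonia, influenza, and COVID-19 across U.S. states over time as animated choropleth maps, built with Plotly Express, with one frame per week. The maps show how these health outcomes vary from state to state. For the ratio map, the color range is fixed from 0 to the largest ratio in the data so that colors are comparable across weeks.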


In [8]:
fig = px.choropleth(
    dfr,
    locations='State_Abbrev',
    locationmode='USA-states',
    color='Death Ratio',
    hover_name='Jurisdiction',
    animation_frame=dfr['Week Ending Date'].dt.strftime('%Y-%m-%d'),
    color_continuous_scale='Blues',
    scope="usa",
    title="Ratio of Pneumonia, Influenza, or COVID-19 Deaths to Total Deaths Over Time by State",
    range_color=(0, dfr['Death Ratio'].max())
)
fig.layout.updatemenus[0].buttons[0].args[1]['frame']['duration'] = 240
fig.layout.updatemenus[0].buttons[0].args[1]['transition']['duration'] = 40

fig.update_layout(
    geo=dict(
        scope='usa',
        projection_type='albers usa'
    ),
    title_x=0.5,
    height=600,
    width=1000
)


fig.show()

### Additional Visualizations

These additional visualizations were not used in our presentation. They show raw death counts for each disease separately (COVID-19, pneumonia, and influenza), followed by the combined count.

In [9]:
import plotly.express as px

fig = px.choropleth(
    df0,
    locations='State_Abbrev',
    locationmode='USA-states',
    color='COVID-19 Deaths',
    hover_name='Jurisdiction',
    animation_frame=df0['Week Ending Date'].dt.strftime('%Y-%m-%d'),
    color_continuous_scale='Blues',
    scope="usa",
    title="Deaths from COVID-19 Over Time by State",
)
fig.layout.updatemenus[0].buttons[0].args[1]['frame']['duration'] = 300
fig.layout.updatemenus[0].buttons[0].args[1]['transition']['duration'] = 50

fig.update_layout(
    geo=dict(
        scope='usa',
        projection_type='albers usa'
    ),
    title_x=0.5,
    height=600,
    width=1000
)

fig.show()

In [10]:
import plotly.express as px

fig = px.choropleth(
    df1,
    locations='State_Abbrev',
    locationmode='USA-states',
    color='Pneumonia Deaths',
    hover_name='Jurisdiction',
    animation_frame=df1['Week Ending Date'].dt.strftime('%Y-%m-%d'),
    color_continuous_scale='Blues',
    scope="usa",
    title="Deaths from Pneumonia Over Time by State",
)
fig.layout.updatemenus[0].buttons[0].args[1]['frame']['duration'] = 300
fig.layout.updatemenus[0].buttons[0].args[1]['transition']['duration'] = 50

fig.update_layout(
    geo=dict(
        scope='usa',
        projection_type='albers usa'
    ),
    title_x=0.5,
    height=600,
    width=1000
)

fig.show()

In [11]:
import plotly.express as px

fig = px.choropleth(
    df2,
    locations='State_Abbrev',
    locationmode='USA-states',
    color='Influenza Deaths',
    hover_name='Jurisdiction',
    animation_frame=df2['Week Ending Date'].dt.strftime('%Y-%m-%d'),
    color_continuous_scale='Blues',
    scope="usa",
    title="Deaths from Influenza Over Time by State",
)
fig.layout.updatemenus[0].buttons[0].args[1]['frame']['duration'] = 300
fig.layout.updatemenus[0].buttons[0].args[1]['transition']['duration'] = 50

fig.update_layout(
    geo=dict(
        scope='usa',
        projection_type='albers usa'
    ),
    title_x=0.5,
    height=600,
    width=1000
)

fig.show()

In [12]:
import plotly.express as px

fig = px.choropleth(
    df,
    locations='State_Abbrev',
    locationmode='USA-states',
    color='Pneumonia, Influenza, or COVID-19 Deaths',
    hover_name='Jurisdiction',
    animation_frame=df['Week Ending Date'].dt.strftime('%Y-%m-%d'),
    color_continuous_scale='Blues',
    scope="usa",
    title="Deaths from Pneumonia, Influenza, or COVID-19 Over Time by State",
)

fig.layout.updatemenus[0].buttons[0].args[1]['frame']['duration'] = 300
fig.layout.updatemenus[0].buttons[0].args[1]['transition']['duration'] = 50

fig.update_layout(
    geo=dict(
        scope='usa',
        projection_type='albers usa'
    ),
    title_x=0.5,
    height=600,
    width=1000
)

fig.show()