# Data Visualization Final Project
## Cleo Falvey

This project uses data on deaths from pneumonia, collected by the CDC [here](https://data.cdc.gov/NCHS/Provisional-Death-Counts-for-Influenza-Pneumonia-a/ynw2-4viq). Pneumonia is a condition involving inflammation of the lungs. Diseases that cause pneumonia, such as influenza and Covid-19, have come into the spotlight recently with the emergence of the pandemic.

The Covid-19 pandemic has been a huge public health issue in the United States ever since the virus was identified in Wuhan, China in 2019. As of the time of this project's submission, there have been over 50 million cases and 763,000 deaths in the US alone. This is due in part to the novelty of the virus, poor management strategies, and the pandemic of disinformation surrounding Covid-19 vaccines. In an NSF-REU program at the University of Chattanooga, Tennessee back in 2020, I used Apple and Google mobility scores as a proxy for social distancing. I correlated these with R_0, the basic reproduction number of the virus, and with temperature. You can see that visualization [here](https://www.youtube.com/watch?v=SJlsJsTr03s). I think the data is very interesting and am excited to work with it in a new way, especially now that the pandemic has progressed for over a year and a half.

To start working with this data, let's first import our packages. We will be using a variety of packages for each step of the visualizations. I have listed them below, with comments describing their purpose.

In [None]:
import pandas as pd # for reading in and cleaning data
from bokeh.io import output_notebook # for visualizing data
from bokeh.plotting import figure, show # for printing data to our screen
from math import pi # to potentially make a pie chart
from bokeh.palettes import Category20c # to potentially make a pie chart
from bokeh.transform import cumsum # to potentially make a pie chart
import hvplot.pandas # to create interactives
import numpy as np # to use math
import panel as pn  # to display dashboards

# Data Importation

In [None]:
pneumonia = pd.read_csv("pneumonia.csv")

In [None]:
# ideas for graphs: want to have a variety of different visualizations with compelling colors

# by age group - bar graphs?
# by state (map)
# line graph of x - time, y - counts of Pneumonia, Covid, Flu

# line graph of correlation between covid and flu - color by case count?

print(pneumonia)



In [None]:
pneumonia.columns = pneumonia.columns.str.replace(' ','_')
pneumonia.columns = pneumonia.columns.str.replace('-','_')
pneumonia['End_Week'] = pd.to_datetime(pneumonia['End_Week'])
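
The column clean-up above can be tried on a small stand-in table first. This is a minimal sketch: the column names, values, and date format below are made up for illustration and are not taken from the CDC file. It replaces spaces and hyphens in one regex pass and parses the dates with an explicit format, which assumes the dates are written as month/day/year.

```python
import pandas as pd

# Toy frame with hypothetical column names and values (not the CDC data).
toy = pd.DataFrame({
    "End Week": ["01/04/2020", "01/11/2020"],
    "Age-Group": ["All Ages", "All Ages"],
    "Total Deaths": [100, 120],
})

# Replace spaces and hyphens with underscores in a single regex pass.
toy.columns = toy.columns.str.replace(r"[ -]", "_", regex=True)

# Parse the date strings with an explicit format (an assumption about the layout).
toy["End_Week"] = pd.to_datetime(toy["End_Week"], format="%m/%d/%Y")

cleaned_columns = list(toy.columns)
```

Passing an explicit `format` makes the parse fail loudly if a date does not match, instead of silently guessing between day-first and month-first.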

It's now time to start visualizing our data. Let's first look at deaths over time, summarized by age group. The dataset contains rows for individual states as well as national totals, so we filter it to keep only the rows whose jurisdiction is the United States. Here, we can see total deaths from pneumonia (from Covid-19, influenza, and other sources) broken down by age group.

In [None]:
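# keep only the national rows; .copy() gives us an independent DataFrame
pneumonia_us = pneumonia[pneumonia.Jurisdiction == 'United States'].copy()
# note: a bare groupby('Age_Group') call has no effect unless its result is used,
# so the grouping is left to the plot's color argument below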

output_notebook()
# pneumonia_us.plot(kind='scatter',x='End_Week',y='Total_Deaths')

color_mapper = {
    'All Ages':'teal',
    '0-17 years':'indigo',
    '18-64 years':'orange',
    '65 years and over':'skyblue'
}
def visualize_data(selections):
    # build a boolean mask keeping only rows whose age group is among the selections
    mask = pneumonia_us.Age_Group.isin(selections)
    return pneumonia_us[mask].hvplot.scatter(  # apply the mask and scatter via hvPlot
        x='End_Week',       # week-ending date on the x-axis
        y='Total_Deaths',   # death counts on the y-axis
        color='Age_Group',  # color each point by its age group
        cmap=color_mapper   # use the fixed colors defined above
    )
age_groups = ['All Ages', '0-17 years', '18-64 years', '65 years and over']
label_selector = pn.widgets.MultiSelect(name='Age_Group', options=age_groups, value=list(age_groups))
interaction = pn.interact(visualize_data, selections=label_selector)

interaction
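
The filter-then-summarize step described above can also be written as a plain pandas aggregation. This is a minimal sketch on made-up numbers, not the CDC data: it keeps the national rows and sums `Total_Deaths` within each age group.

```python
import pandas as pd

# Toy rows with hypothetical counts (not the CDC data).
toy = pd.DataFrame({
    "Jurisdiction": ["United States", "United States", "United States", "Alabama"],
    "Age_Group": ["0-17 years", "65 years and over", "65 years and over", "65 years and over"],
    "Total_Deaths": [5, 300, 350, 20],
})

# Keep only the national rows so state rows are not double counted.
us_only = toy[toy["Jurisdiction"] == "United States"]

# Sum deaths within each age group; the result is a Series indexed by Age_Group.
deaths_by_age = us_only.groupby("Age_Group")["Total_Deaths"].sum()
```

A `groupby` only does work once an aggregation such as `.sum()` is applied and the result is assigned to a name.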

In the visualization above, we can see that mortality was highest among people 65 years or older and lowest in the 0-17 year old age class. However, one thing I would have liked to see in the CDC data is a finer stratification of age groups. The middle age class, 18 to 64 years, spans 46 years of life, and a 64-year-old will clearly have different health outcomes and prognoses than a 19-year-old, even though both fall in the same age category in this dataset.

However, let's do some more digging. Let's look at deaths by state (across all age classes). We will use only the influenza data for this, because the data table becomes more inclusive in an unintuitive way, going from influenza only to influenza or Covid-19 without further breakdowns.

In [None]:
import matplotlib.pyplot as plt

flu = pneumonia[['Influenza_Deaths','End_Week', 'Jurisdiction']]
# drop the national and HHS-region aggregate rows so only the remaining jurisdictions are kept
flu = flu[~flu.Jurisdiction.isin([
    'United States',
    'HHS Region 1', 'HHS Region 2', 'HHS Region 3', 'HHS Region 4', 'HHS Region 5',
    'HHS Region 6', 'HHS Region 7', 'HHS Region 8', 'HHS Region 9', 'HHS Region 10',
])]

# flu.plot(kind='scatter',x='End_Week', y='Influenza_Deaths')

flu.Jurisdiction.unique()
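
As an alternative to listing every aggregate label by hand, the same exclusion can be expressed with a prefix match. This is a minimal sketch on a made-up table; it assumes every regional aggregate starts with the text "HHS Region", as the labels in the cell above do.

```python
import pandas as pd

# Toy rows with hypothetical counts (not the CDC data).
toy = pd.DataFrame({
    "Jurisdiction": ["United States", "HHS Region 1", "HHS Region 10", "Alabama", "Ohio"],
    "Influenza_Deaths": [50, 4, 6, 1, 2],
})

# A row is an aggregate if it is the national total or an HHS region.
is_aggregate = (
    toy["Jurisdiction"].eq("United States")
    | toy["Jurisdiction"].str.startswith("HHS Region")
)

# Keep everything that is not an aggregate.
states_only = toy[~is_aggregate]
remaining = states_only["Jurisdiction"].tolist()
```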

In [None]:
import geopandas as gpd

usa = gpd.read_file("./states21basic/")
print(usa)
usa.plot()

In [None]:
mergedata = usa.merge(pneumonia, how='inner', left_on='state_name', right_on='Jurisdiction')
mergedata


mergedata.plot()
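
An inner merge silently drops any jurisdiction whose name has no exact match in the shapefile, so it can be worth checking what falls out. This is a minimal sketch using plain pandas frames with made-up names and counts; the `indicator=True` option of `merge` adds a `_merge` column recording which side each row came from.

```python
import pandas as pd

# Hypothetical stand-ins for the shapefile attributes and the death counts.
shapes = pd.DataFrame({"state_name": ["Alabama", "Ohio", "Texas"]})
deaths = pd.DataFrame({
    "Jurisdiction": ["Alabama", "Ohio", "New York City"],
    "Total_Deaths": [10, 20, 30],
})

# An outer merge keeps unmatched rows from both sides; _merge labels their origin.
check = shapes.merge(
    deaths, how="outer",
    left_on="state_name", right_on="Jurisdiction",
    indicator=True,
)

# Names present on only one side would be lost by an inner merge.
unmatched_shapes = check.loc[check["_merge"] == "left_only", "state_name"].tolist()
unmatched_deaths = check.loc[check["_merge"] == "right_only", "Jurisdiction"].tolist()
```

If either list is non-empty on the real data, the names may differ only in case or spelling and could be harmonized before merging.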



In [None]:
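from bokeh.models import Slider  # Slider is not among the imports at the top of the notebook

# month selector: 1 = January through 12 = December
slider = Slider(title='Month',
                start=1, end=12,
                step=1, value=1)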

In [None]:

mergedata.hvplot(color='Total_Deaths')
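
One way the Month slider could feed the map is by selecting the rows for a single calendar month before plotting. This is a minimal sketch on made-up numbers; the helper name `deaths_in_month` is hypothetical, and it pools the same month across different years, which may or may not be what is wanted.

```python
import pandas as pd

# Toy rows with hypothetical counts (not the CDC data).
toy = pd.DataFrame({
    "End_Week": pd.to_datetime(["2020-01-04", "2020-01-11", "2020-02-01"]),
    "Jurisdiction": ["Ohio", "Ohio", "Ohio"],
    "Total_Deaths": [10, 12, 9],
})

def deaths_in_month(frame, month):
    """Return total deaths per jurisdiction for one calendar month (1-12)."""
    # .dt.month extracts the month number from each week-ending date.
    in_month = frame[frame["End_Week"].dt.month == month]
    return in_month.groupby("Jurisdiction")["Total_Deaths"].sum()

january = deaths_in_month(toy, 1)
```

The slider's current value would be passed as `month`, and the resulting per-jurisdiction totals could then be merged with the state shapes for plotting.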