# Deaths above or below normal
> Making a layered chart comprising a bar chart of weekly excess deaths, a tick chart of weekly COVID-19 deaths, and a rectangle chart highlighting the time period when the pandemic tore through the states and cities.

- toc: false
- hide: false
- branch: master
- badges: true
- comments: false
- author: Shantam Raj
- categories: []
- image: images/above_below_normal.png

We will make charts from the NYT article on [What Is the Real Coronavirus Death Toll in Each State?](https://www.nytimes.com/interactive/2020/05/05/us/coronavirus-death-toll-us.html)

The charts look like the following -  
![deaths above or below normal](images/above_below_normal.png)

**What's the purpose of this visualization?**

> Comparing recent totals of deaths from all causes can provide a more complete picture of the pandemic’s impact than tracking only deaths of people with confirmed diagnoses. Epidemiologists refer to fatalities in the gap between the observed and normal numbers of deaths as “excess deaths.” 

> Indeed, in nearly every state with an unusual number of deaths in recent weeks, that number is higher than the state’s reported number of deaths from Covid-19. On our charts, we have marked the number of official coronavirus deaths with red lines, so you can see how they match up with the total number of excess deaths. 

> Measuring excess deaths is crude because it does not capture all the details of how people died. But many epidemiologists believe it is the best way to measure the impact of the virus in real time. It shows how the virus is altering normal patterns of mortality where it strikes and undermines arguments that it is merely killing vulnerable people who would have died anyway. 

> Public health researchers use such methods to measure the impact of catastrophic events when official measures of mortality are flawed. 

> Measuring excess deaths does not tell us precisely how each person died. It is likely that most of the excess deaths in this period are because of the coronavirus itself, given the dangerousness of the virus and the well-documented problems with testing. But it is also possible that deaths from other causes have risen too, as hospitals have become stressed and people have been scared to seek care for ailments that are typically survivable. Some causes of death may be declining, as people stay inside more, drive less and limit their contact with others. 

We will use 2 datasets to generate our chart -

1. The excess deaths dataset from NYT
2. The COVID-19 deaths dataset also from NYT

Luckily, both are in the same GitHub repository: [NYT Covid-19 Data](https://github.com/nytimes/covid-19-data).
However, we need to do some significant preprocessing to arrive at the results. It took me a good amount of time to figure out the whole graph, and once I had done it, it just made so much sense :relieved:

The way these graphs are made is as follows -

1. First we chart the excess deaths. Excess deaths are calculated as the difference between all-cause mortality and expected deaths. These data are available from the CDC.
2. NYT publishes the excess deaths data for NYC, so for starters we will use that before moving on to the other states.
3. Then we need the COVID-19 deaths for NYC, which we can get from the NYT dataset or the JHU CSSE dataset.
4. The challenge is actually combining the above.
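
To make the definition in step 1 concrete, here is a minimal sketch in plain Python. The weekly figures and the helper name `excess_deaths` are invented for illustration; they are not taken from the NYT or CDC data.

```python
# Hypothetical weekly figures, invented for illustration only.
observed_deaths = [1050, 1100, 2400, 5200]   # all-cause deaths per week
expected_deaths = [1080, 1070, 1060, 1050]   # "normal" deaths per week

def excess_deaths(observed, expected):
    """Return observed minus expected deaths for each week."""
    return [obs - exp for obs, exp in zip(observed, expected)]

excess = excess_deaths(observed_deaths, expected_deaths)
print(excess)  # [-30, 30, 1340, 4150]
```

A negative value means fewer deaths than normal that week, which is why the bars in the chart can point below zero.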

#### How do we combine the above data?
The excess deaths data is weekly, with the start and end dates of each week given (7 days' duration), so we have to transform the COVID-19 deaths data into the same weekly format.
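
Before touching the real data, the idea can be sketched on toy numbers. This assumes the daily series is cumulative (a running total) with one value per day; the function name `weekly_from_cumulative` and the counts are made up for illustration.

```python
def weekly_from_cumulative(cumulative, days_per_week=7):
    """Turn a cumulative daily series into weekly totals of new deaths."""
    # Day-over-day differences; the first day has no predecessor, so it
    # only serves as the baseline and produces no value of its own.
    daily = [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]
    # Sum consecutive blocks of seven days.
    return [sum(daily[i:i + days_per_week])
            for i in range(0, len(daily), days_per_week)]

# Made-up cumulative counts: one baseline day followed by 14 days.
cumulative = [0, 1, 3, 6, 10, 15, 21, 28, 30, 33, 37, 42, 48, 55, 63]
weekly = weekly_from_cumulative(cumulative)
print(weekly)  # [28, 35]
```

Note that if the number of days is not a multiple of seven, the last block in this sketch is a partial week, so the date range should be chosen to cover whole weeks.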

In [46]:
import pandas as pd
import numpy as np
import altair as alt

excess_uri = 'https://raw.githubusercontent.com/nytimes/covid-19-data/master/excess-deaths/deaths.csv'
county_uri = 'https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv'

ex_df = pd.read_csv(excess_uri)
co_df = pd.read_csv(county_uri)


Extracting data for NYC for year 2020 -

In [47]:
# collapse_show
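# .copy() gives an independent frame, so the assignment below does not trigger SettingWithCopyWarning
ex_nyc = ex_df[ex_df['placename'] == 'New York City'].copy()
ex_nyc['year'] = ex_nyc['year'].astype(int)
ex_nyc = ex_nyc[ex_nyc['year'] == 2020]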


In [48]:
# collapse_show
excess_deaths_chart = alt.Chart(ex_nyc).mark_bar(width=5).encode(
    x='week:N',
    y='excess_deaths:Q',
    color = alt.condition(alt.datum.excess_deaths>0, alt.value('orange'), alt.value('steelblue'))
).properties(height=500, width=alt.Step(10))

In [49]:
excess_deaths_chart

This was the easy part and we can see that we are following the trends properly by comparing it with the NYT chart -  
![](images/nyc_above_below_normal.png)  

There are some differences, which I think arise because the dataset NYT published is slightly different from the data they used for the plot. The article also uses more data than is available in the GitHub repo. To make the charts match completely, we would have to dig into the CDC dataset, which we will ignore for now.

Now we will transform the COVID-19 deaths data to weekly deaths data

In [50]:
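ny_co_df = co_df[co_df['county'] == 'New York City']
# .copy() gives an independent frame, so adding a column below does not trigger SettingWithCopyWarning
ny_co_df = ny_co_df[(ny_co_df['date'] > '2020-03-06') & (ny_co_df['date'] < '2020-05-17')].copy()
# 'deaths' is cumulative, so the day-over-day difference gives new deaths per day
ny_co_df['death_per_day'] = ny_co_df['deaths'].diff()
# the first row has no previous day to diff against, so drop it
ny_co_df = ny_co_df[1:]
# sum consecutive 7-day blocks; only the numeric daily column is aggregated
weekly_cov_death = ny_co_df.groupby(np.arange(len(ny_co_df))//7)[['death_per_day']].sum()
# label the 10 resulting weeks to match the 'week' column of the excess deaths data
weekly_cov_death['week'] = range(11,21)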
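
The date window and the hard-coded `range(11,21)` in the cell above have to agree, so here is a quick sanity check using only the standard library. It assumes the filtered data has exactly one row per calendar day with no gaps, which is my assumption and not something the dataset guarantees.

```python
from datetime import date, timedelta

# Dates kept by the filter: strictly after 2020-03-06, strictly before 2020-05-17.
first_day = date(2020, 3, 7)
last_day = date(2020, 5, 16)

n_rows = (last_day - first_day).days + 1   # rows, assuming one per calendar day
n_daily = n_rows - 1                       # the first row is dropped after diff()
n_weeks, leftover = divmod(n_daily, 7)     # full 7-day blocks and remaining days
first_week_start = first_day + timedelta(days=1)

print(n_rows, n_weeks, leftover)           # 71 10 0
print(first_week_start.isoformat())        # 2020-03-08
print(first_week_start.weekday() == 6)     # True, i.e. a Sunday
```

Ten full weeks with no leftover days is exactly what the ten labels 11 to 20 require. If the window changes, the label range has to change with it.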

In [51]:
# collapse
covid_tick_deaths = alt.Chart(weekly_cov_death).mark_tick(thickness=2, color='red').encode(
    x='week:N',
    y='death_per_day:Q'
)

In [52]:
excess_deaths_chart + covid_tick_deaths

Let's beautify a little -

In [53]:
# collapse
excess_deaths_chart = alt.Chart(ex_nyc).mark_bar(width=9).encode(
    x='week:N',
    y='excess_deaths:Q',
    color = alt.condition(alt.datum.excess_deaths>0, alt.value('#ffab00'), alt.value('#8FB8BB'))
).properties(height=500, width=alt.Step(10))

Let's add a rectangle from March to May to complete our chart -

In [55]:
# collapse
source = pd.DataFrame([{'start': 11, 'end': 20}])
rect = alt.Chart(source).mark_rect(opacity=1, fill='#eee', xOffset=5, x2Offset=5).encode(
    x='start:N',
    x2='end:N'
)

In [57]:
(excess_deaths_chart + covid_tick_deaths).configure_view(strokeWidth=0).configure_axis(grid=False)

We can see that it captures the trend extremely well and the datapoints are almost exactly the same.

In [56]:
(rect + excess_deaths_chart + covid_tick_deaths).configure_view(strokeWidth=0).configure_axis(grid=False)

#### TODO
Plot the rest of the states using the CDC data.