# Covid19 Cases by State in USA Bar Chart Race 

### Apr. 28, 2020
#### By: Jeff Hale

In this notebook I will show how to use Python, pandas, and the bar_chart_race package to make a bar chart race of Covid19 cases over time by state.

The county-level dataset comes from the [New York Times](https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv) via [Kaggle](https://www.kaggle.com/fireballbyedimyrnmom/us-counties-covid-19-dataset).

#### Install bar_chart_race package

In [None]:
!pip install bar_chart_race

#### Import packages

In [None]:
import pandas as pd
from IPython.display import HTML
import bar_chart_race as bcr

#### Read data

In [None]:
data_path = "/kaggle/input/us-counties-covid-19-dataset/us-counties.csv"
# Local alternative: 'data/us-counties-2020-04-28.csv'

df = pd.read_csv(data_path, index_col='date')
df.head()

#### Set index to datetime 

In [None]:
df.index = pd.to_datetime(df.index)

#### Explore data

In [None]:
df.info()

In [None]:
# Keep only the columns needed for the bar chart race
df_cases = df.loc[:, ['state', 'cases']]

#### Group by date and state

In [None]:
# Sum the county-level counts into one row per (date, state) pair
df_states = df_cases.groupby(['date','state']).sum().reset_index()
df_states

In [None]:
df_states = df_states.set_index('date')
df_states.info()

In [None]:
df_states.head()

#### Pivot the data to get it in the correct format for the bar chart race.

The data needs to be in wide format: one column per state, dates in the index, and either cases or deaths as the values.
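
To make the target shape concrete, here is a minimal sketch of a long-to-wide reshape on a toy DataFrame. The numbers are made up for illustration and are not taken from the NYT dataset.

```python
import pandas as pd

# Toy long-format data: one row per (date, state) pair. Values are made up.
long_df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2020-03-01", "2020-03-01", "2020-03-02", "2020-03-02"]
        ),
        "state": ["New York", "Washington", "New York", "Washington"],
        "cases": [1, 10, 3, 14],
    }
)

# Wide format: dates in the index, one column per state, cases as the values.
wide_df = long_df.pivot(index="date", columns="state", values="cases")
print(wide_df)
```

Each row of `wide_df` is one frame of data for the race, and each column is one bar.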

In [None]:
df_pivoted = df_states.pivot(values='cases', columns='state')
df_pivoted.tail(2)
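
As an aside, the groupby and pivot steps could probably be collapsed into a single `pivot_table` call, which aggregates duplicate (date, state) rows while reshaping. The sketch below compares the two routes on toy county-level rows. The numbers are made up, and `date` is kept as an ordinary column here rather than as the index, which is a simplification relative to this notebook.

```python
import pandas as pd

# Toy county-level rows (made-up numbers): two counties per state and date.
county_df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2020-03-01"] * 4 + ["2020-03-02"] * 4),
        "county": ["Kings", "Queens", "King", "Pierce"] * 2,
        "state": ["New York", "New York", "Washington", "Washington"] * 2,
        "cases": [1, 2, 10, 4, 3, 5, 14, 6],
    }
)

# Two-step route: sum per (date, state), then pivot to wide format.
two_step = (
    county_df.groupby(["date", "state"])["cases"]
    .sum()
    .reset_index()
    .pivot(index="date", columns="state", values="cases")
)

# One-step alternative: pivot_table sums duplicates while reshaping.
one_step = county_df.pivot_table(
    index="date", columns="state", values="cases", aggfunc="sum"
)

# Check that both routes produce the same numbers.
print((one_step.values == two_step.values).all())
```

The two-step version makes the intermediate long table easy to inspect, while the one-step version is shorter.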

In [None]:
df_pivoted.info()

#### Make a DataFrame that starts once there are a few data points, so the x-axis labels look nicer at the start.

In [None]:
df_pivoted_later = df_pivoted[df_pivoted.index >= "2020-02-20"]
df_pivoted_later.head(2)
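
The start date above is hard-coded. One possible alternative is to pick the first date on which at least a given number of states have data. This is only a sketch on a toy wide table: the values are made up, and `min_states` is a hypothetical threshold, not something used elsewhere in this notebook.

```python
import numpy as np
import pandas as pd

# Toy wide table (made-up values); NaN means a state has no data yet.
wide = pd.DataFrame(
    {
        "Washington": [1, 1, 2, 5],
        "Illinois": [np.nan, 1, 1, 2],
        "California": [np.nan, np.nan, 2, 3],
    },
    index=pd.to_datetime(["2020-01-21", "2020-01-24", "2020-01-25", "2020-01-26"]),
)

min_states = 3  # hypothetical threshold: states that must have data

# Count the non-missing states on each date, then take the first date
# that meets the threshold.
states_reporting = wide.notna().sum(axis=1)
start_date = states_reporting[states_reporting >= min_states].index[0]

# Keep only the rows from that date onward.
later = wide[wide.index >= start_date]
print(start_date.date(), later.shape)
```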

#### Make the bar chart race and output to a file

In [None]:
bcr.bar_chart_race(
    df=df_pivoted_later,
    filename='covid19_county_state_h_later.mp4',
    orientation='h',
    sort='desc',
    label_bars=True,
    use_index=True,
    steps_per_period=10,
    period_length=300,
    figsize=(8, 6),
    cmap='dark24',
    title='COVID-19 Cases by State',
    bar_label_size=7,
    tick_label_size=7,
    period_label_size=16,
)

#### Make a bar chart race for inline notebook viewing.

In [None]:
bcr_html = bcr.bar_chart_race(df=df_pivoted_later, filename=None, period_length=300, figsize=(8, 6))

In [None]:
HTML(bcr_html)

#### Save cleaned state level data to output files

In [None]:
df_pivoted.to_csv('pivoted_covid19_through_apr_27_wf.csv')

In [None]:
df_pivoted_later.to_csv('pivoted_covid19_through_feb_20_to_apr_27_wf.csv')

## Future directions:

- Pull the data directly from the NYT repo to get the most up-to-date numbers.
- Aggregate the data by week.
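
Both ideas could be sketched roughly as follows. The URL is the NYT link from the top of this notebook. The helper names `load_state_cases` and `to_weekly` are hypothetical, and the weekly step assumes the counts are cumulative, so the last value of each week is kept rather than a sum. The demo reads a tiny made-up CSV from memory so it runs without network access.

```python
import io

import pandas as pd

# URL taken from the dataset link at the top of this notebook.
NYT_URL = (
    "https://raw.githubusercontent.com/nytimes/covid-19-data/"
    "master/us-counties.csv"
)


def load_state_cases(source=NYT_URL):
    """Read county-level rows and return a wide date-by-state table of cases."""
    df = pd.read_csv(source, parse_dates=["date"])
    return df.pivot_table(
        index="date", columns="state", values="cases", aggfunc="sum"
    )


def to_weekly(wide):
    """Keep the last observation of each week (assumes cumulative counts)."""
    return wide.resample("W").last()


# Offline demo on made-up rows with the same columns as the NYT file.
sample_csv = """date,county,state,fips,cases,deaths
2020-03-02,A,X,1,1,0
2020-03-02,B,X,2,2,0
2020-03-08,A,X,1,5,0
2020-03-09,A,X,1,7,0
"""
wide = load_state_cases(io.StringIO(sample_csv))
weekly = to_weekly(wide)
print(weekly)
```

Calling `load_state_cases()` with no argument would fetch the live file instead, which needs network access. The weekly table could then be passed to `bcr.bar_chart_race` in place of the daily one.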

### I hope you found this example of how to make a bar chart race after some data munging to be useful! 🎉

### Please upvote if you did! 😀
