# COVID-19 Bar Chart Race
This tutorial visualizes COVID-19 data as a bar chart race using a preprocessed dataset. For a step-by-step walkthrough of processing and serializing a dataset, see the [COVID-19 Time Series](covid_19_time_series.ipynb) tutorial (the preprocessed dataset used here differs slightly).

Click [here](covid_19_bar_chart_race.ipynb#final-animation) to see the full animation.

### load data

Here, we import ahlive and open a partially preprocessed dataset.

In [None]:
import ahlive as ah
import pandas as pd
df = ah.tutorial.open_dataset('covid19_global_cases')
display(df)

### preprocess data

For a bit more variety than the previous tutorial, we preprocess the dataset into a 7-day rolling average of new daily confirmed cases. We also convert the values to integers, because 0.75 of a case doesn't really make sense (note that `astype(int)` truncates rather than rounds). Finally, we subset the timeframe so that it starts in March 2020.

In [None]:
df_diff = df.pivot_table(
    'cases', columns='country_region', index='date'
).diff()
df_roll = df_diff.rolling('7D').mean().dropna().astype(int)
df_melt = df_roll.reset_index().melt(
    'date', value_name='new_cases'
).sort_values('date')
df_new = df_melt.loc[df_melt['date'] >= '2020-03-01']
display(df_new)
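
To make the chain of `pivot_table`, `diff`, `rolling` and `melt` easier to follow, here is a minimal sketch on a made-up dataset. The regions `A` and `B` and their counts are hypothetical; only the pandas calls mirror the cell above.

```python
import pandas as pd

# Hypothetical cumulative counts for two made-up regions over 10 days.
dates = pd.date_range('2020-03-01', periods=10, freq='D')
toy = pd.DataFrame({
    'date': list(dates) * 2,
    'country_region': ['A'] * 10 + ['B'] * 10,
    'cases': [0, 1, 3, 6, 10, 15, 21, 28, 36, 45]
             + [0, 0, 1, 1, 2, 2, 3, 3, 4, 4],
})

# Wide table (one column per region), then the day-over-day difference.
toy_diff = toy.pivot_table(
    'cases', columns='country_region', index='date'
).diff()

# Time-based 7-day window; the first row is NaN after diff() and is dropped.
toy_mean = toy_diff.rolling('7D').mean().dropna()

# astype(int) truncates toward zero, so averages below 1 become 0.
toy_roll = toy_mean.astype(int)

# Back to long form: one row per (date, region) pair.
toy_melt = toy_roll.reset_index().melt(
    'date', value_name='new_cases'
).sort_values('date')
print(toy_melt.tail(4))
```

In this toy data, region `B` never averages a full case per day, so every value truncates to 0. If rounding to the nearest integer is preferred, `.round().astype(int)` is one option.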

### serialize data

Then we serialize `country_region` as the `x` and `new_cases` as the `y`, also setting `country_region` as the `label` so that each country is grouped as a separate item. For this tutorial, we plot the new confirmed cases with `chart='bar'` and `preset='race'`, i.e. a bar chart race. We can go for a quick test run by limiting the rendered frames with `animate` (here `animate='tail'`) and calling `render`.

In [None]:
ah_df = ah.DataFrame(
    df_new, 'country_region', 'new_cases', label='country_region',
    chart='bar', preset='race', animate='tail')
print(ah_df)
ah_df.render()

### add labels

Not too shabby for a test run, but we can make some improvements.

1. Scale `new_cases` down by 1000 (i.e. show thousands) to be more intuitive, and update `ylabel`.

2. Use `barh` instead of `bar`.

3. Set `ylims='explore'` to lessen crowding of `bar_label`.

4. Add `state_labels` to show the `date`.

5. Add `inline_labels` to show the `new_cases`.

6. Add `title` to highlight the data shown is a 7-day rolling mean.

7. Add a `note` to cite the data.

8. Increase `figsize` to prevent the left side from being cut off.

9. Add a "k" suffix to the inline labels through `config`.

In [None]:
df_scale = df_new.copy()
df_scale['new_cases'] /= 1000

ah_df = ah.DataFrame(
    df_scale, 'country_region', 'new_cases', label='country_region',
    chart='barh', preset='race', ylabel='New Cases [thousand]',
    ylims='explore', state_labels='date', inline_labels='new_cases',
    title='New Confirmed COVID-19 Cases per Day, 7-Day Rolling Average',
    note='Source: JHU CSSE COVID-19', figsize=(15, 10),
    animate='tail'
).config('inline', suffix='k')
ah_df.render()

<div class="alert alert-warning">

The bar labels jump around instantaneously here. `frames` can be set to a higher number to produce a smoother animation, but to keep this tutorial's file size down, that is left until the end.

</div>

### tweak further

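We can also normalize by population, expressing new cases per 100,000 people. Since `new_cases` was scaled to thousands above, it is first multiplied back by 1000.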

In [None]:
df_pop = ah.tutorial.open_dataset('covid19_population')[['combined_key', 'population']]
df_norm = df_scale.merge(df_pop, left_on='country_region', right_on='combined_key')
df_norm['new_cases'] = df_norm['new_cases'] * 1000 / df_norm['population']
df_norm['new_cases'] *= 1e5
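
One thing to watch for: `merge` defaults to an inner join, so any `country_region` without a matching `combined_key` silently disappears from the chart. The sketch below uses hypothetical regions and populations to show one way to check for unmatched keys with `indicator=True`, followed by the same per-100k arithmetic.

```python
import pandas as pd

# Hypothetical values; new_cases is in thousands, as in df_scale.
cases = pd.DataFrame({
    'country_region': ['A', 'B', 'C'],
    'new_cases': [2.0, 0.5, 1.0],
})
pop = pd.DataFrame({
    'combined_key': ['A', 'B'],  # 'C' has no population entry
    'population': [4_000_000, 500_000],
})

# A left join with indicator=True keeps every row and flags unmatched keys.
check = cases.merge(
    pop, left_on='country_region', right_on='combined_key',
    how='left', indicator=True
)
missing = check.loc[check['_merge'] == 'left_only', 'country_region'].tolist()
print('no population match:', missing)

# The default inner join drops those rows.
norm = cases.merge(pop, left_on='country_region', right_on='combined_key')

# thousands -> cases -> cases per person -> cases per 100k people
norm['per_100k'] = norm['new_cases'] * 1000 / norm['population'] * 1e5
print(norm[['country_region', 'per_100k']])
```

If names differ between the two datasets (for example, alternate spellings), renaming them before the merge keeps those countries in the race.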

We can also increase the number of bars shown with `limit`, and fix the lower x-limit at 0 with `xlim0s=0`.

In [None]:
ah_df = ah.DataFrame(
    df_norm, 'country_region', 'new_cases', label='country_region',
    chart='barh', preset='race', ylabel='New Cases / 100k People',
    ylims='explore', state_labels='date', inline_labels='new_cases',
    title='New Confirmed COVID-19 Cases per Day, 7-Day Rolling Average',
    note='Source: JHU CSSE COVID-19', figsize=(15, 10), xlim0s=0,
    animate='tail'
).config('inline', suffix='/ 100k').config('preset', limit=7)
ah_df.render()
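
To get a feel for what limiting the race to the top bars means, the following pandas sketch ranks regions within each date and keeps the largest few. It is only an illustration of the idea on hypothetical data, not a description of how ahlive implements `limit` internally.

```python
import pandas as pd

# Hypothetical per-100k values for four regions on two dates.
toy = pd.DataFrame({
    'date': pd.to_datetime(['2021-01-01'] * 4 + ['2021-01-02'] * 4),
    'country_region': ['A', 'B', 'C', 'D'] * 2,
    'new_cases': [5.0, 20.0, 12.0, 1.0, 30.0, 18.0, 2.0, 9.0],
})

limit = 2  # bars to keep per date (hypothetical, for illustration only)

# Rank within each date: 1 is the largest value; ties keep order of appearance.
toy['rank'] = toy.groupby('date')['new_cases'].rank(
    ascending=False, method='first'
)
top = toy.loc[toy['rank'] <= limit].sort_values(['date', 'rank'])
print(top)
```

Inspecting the top entries this way before rendering can help decide how many bars are worth showing.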

We can cap the `country_region` labels at 20 characters and manually edit the finalized `xr.Dataset`, hiding `bar_label` wherever the value is 10 or less.

In [None]:
df_short = df_norm.copy()
df_short['country_region'] = df_short['country_region'].str[:20]

ah_df = ah.DataFrame(
    df_short, 'country_region', 'new_cases', label='country_region',
    chart='barh', preset='race', ylabel='New Cases / 100k People',
    ylims='explore', state_labels='date', inline_labels='new_cases',
    title='New Confirmed COVID-19 Cases per Day, 7-Day Rolling Average',
    note='Source: JHU CSSE COVID-19', figsize=(15, 10), xlim0s=0,
    animate='test'
).config('inline', suffix='/ 100k').config('preset', limit=7)
ah_df = ah_df.finalize()
ds = ah_df.data[1, 1]
ds['bar_label'] = ds['bar_label'].where(ds['y'] > 10, '')
ah_df.data[1, 1] = ds
ah_df.render()
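
The two operations above can be tried in isolation. The sketch below uses pandas `Series` as stand-ins for the dataset variables, with made-up labels and values; xarray's `where` follows the same convention as pandas, keeping entries where the condition is `True` and substituting the second argument elsewhere.

```python
import pandas as pd

# Slicing with .str[:20] keeps at most the first 20 characters of each label.
labels = pd.Series(['Saint Vincent and the Grenadines', 'Peru'])
short = labels.str[:20]
print(short.tolist())

# Hypothetical bar values and their text labels.
y = pd.Series([3.2, 10.0, 42.7])
bar_label = pd.Series(['3.2', '10.0', '42.7'])

# Keep the label where y > 10, otherwise replace it with an empty string.
# A value of exactly 10 does not satisfy y > 10, so its label is hidden too.
masked = bar_label.where(y > 10, '')
print(masked.tolist())
```

Truncated labels can collide if two names share their first 20 characters, so it is worth checking that the shortened names are still unique.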

### final animation

That looks good, so we can render the full animation by removing the `animate` keyword. As noted earlier, `frames` and `fps` are now set so the bars transition smoothly. To keep this tutorial's file size down, the animation begins in January 2021.

In [None]:
df_cut = df_short.loc[df_short['date'] > '2021-01-01']

ah_df = ah.DataFrame(
    df_cut, 'country_region', 'new_cases', label='country_region',
    chart='barh', preset='race', ylabel='New Cases / 100k People',
    ylims='explore', state_labels='date', inline_labels='new_cases',
    title='New Confirmed COVID-19 Cases per Day, 7-Day Rolling Average',
    note='Source: JHU CSSE COVID-19', figsize=(15, 10), xlim0s=0,
    frames=10, fps=25
).config('inline', suffix='/ 100k').config('preset', limit=7)

ah_df = ah_df.finalize()
ds = ah_df.data[1, 1]
ds['bar_label'] = ds['bar_label'].where(ds['y'] > 10, '')
ah_df.data[1, 1] = ds
ah_df.render()

That's about it for this tutorial. See the next tutorial to learn how to create a geographic map!

### full code

```python
import ahlive as ah
import pandas as pd

# load dataset
df = ah.tutorial.open_dataset('covid19_global_cases')

# compute new cases per day
df_diff = df.pivot_table(
    'cases', columns='country_region', index='date'
).diff()
df_roll = df_diff.rolling('7D').mean().dropna().astype(int)
df_melt = df_roll.reset_index().melt(
    'date', value_name='new_cases'
).sort_values('date')
df_new = df_melt.loc[df_melt['date'] >= '2020-03-01']

# scale down by 1000 (values in thousands)
df_scale = df_new.copy()
df_scale['new_cases'] /= 1000

# normalize by population
df_pop = ah.tutorial.open_dataset('covid19_population')[['combined_key', 'population']]
df_norm = df_scale.merge(df_pop, left_on='country_region', right_on='combined_key')
df_norm['new_cases'] = df_norm['new_cases'] * 1000 / df_norm['population']
df_norm['new_cases'] *= 1e5

# shorten labels
df_short = df_norm.copy()
df_short['country_region'] = df_short['country_region'].str[:20]

# cut animation
df_cut = df_short.loc[df_short['date'] > '2021-01-01']

# serialize
ah_df = ah.DataFrame(
    df_cut, 'country_region', 'new_cases', label='country_region',
    chart='barh', preset='race', ylabel='New Cases / 100k People',
    ylims='explore', state_labels='date', inline_labels='new_cases',
    title='New Confirmed COVID-19 Cases per Day, 7-Day Rolling Average',
    note='Source: JHU CSSE COVID-19', figsize=(15, 10), xlim0s=0,
    frames=10, fps=25
).config('inline', suffix='/ 100k').config('preset', limit=7)

# postprocess; only show bar_label if y > 10
ah_df = ah_df.finalize()
ds = ah_df.data[1, 1]
ds['bar_label'] = ds['bar_label'].where(ds['y'] > 10, '')
ah_df.data[1, 1] = ds

ah_df.render()
```