<div style='text-align:center; margin-bottom:40px'>
    <img src='https://media.nature.com/lw1024/magazine-assets/d41586-020-00190-6/d41586-020-00190-6_17590322.jpg'/>
    The new coronavirus takes its name from its crown-like halo (see <a href='https://www.nature.com/articles/d41586-020-00190-6'><b>nature.com</b></a>).
</div>


# Dashboarding the Coronavirus Pandemic - the Good, the Bad, and the Ugly
---

Recently, the spread of the _New Coronavirus Disease (COVID-19)_ around the world [has reached pandemic levels](https://www.nytimes.com/2020/03/11/health/coronavirus-pandemic-who.html), according to the World Health Organization (WHO).

But, at least from the data science perspective, the tracking of COVID-19 seems like good news. Complete datasets are available online and are updated on a daily, or even hourly, basis. Here, we demonstrate a basic analysis of the data from the official [Coronavirus disease (COVID-2019) situation reports](https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports) published daily by the World Health Organization. In short, the epidemiological data accompanying these reports tracks the worldwide spread of the disease and describes the up-to-date number of COVID-19 cases at the country level.

<img src='https://i.imgur.com/XpGB70y.png' style='width:200px; float:right; margin-left:20px'/>

This notebook was written with a few major objectives in mind. The first, obviously, is to become better acquainted with the disease and how it spreads. The second is to demonstrate an analysis and visualization of potentially insightful points. The third is to leverage the frequent online data updates for dashboarding, i.e., to pose questions whose answers are worth monitoring continuously.

This notebook is rather preliminary, and will monitor the following specific questions:

* What are the global trends of infection, recovery, and death?
* What are the same trends at the country level?
* Furthermore, we split the countries according to how effective their current response appears to be. Which are getting "healthier", and which are still on the bad path? The Good, the Bad, and the Ugly.
* Then, some distinctive country-level rates: the recovery rate and the death rate.

Finally, we list some open issues and potential future directions for deeper analysis.

Let's get started.

In [None]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt; plt.style.use('ggplot')
import seaborn as sns
import plotly.express as px

%matplotlib inline

px.defaults.template = 'plotly_dark'
# px.defaults.template = 'ggplot2'

In [None]:
plt.rcParams['figure.dpi'] = 100

<img src='https://www.who.int/about/Logo-WHO.jpg' style='width:300px; float:right'/>

## The Data

The data originates from the daily [Coronavirus disease (COVID-2019) situation reports](https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports) of the World Health Organization. Johns Hopkins University has transformed the reports into a structured dataset, available in its [GitHub repository](https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series).

Load (and clean) the dataset.

In [None]:
# filename_pattern = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-{}.csv'
# filename_pattern = 'COVID-19-master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-{}.csv'
filename_pattern = "../input/novel-corona-virus-2019-dataset/time_series_covid_19_{}.csv"

confirmed = pd.read_csv(filename_pattern.format('confirmed')).set_index(['Province/State','Country/Region', 'Lat', 'Long'])
recovered = pd.read_csv(filename_pattern.format('recovered')).set_index(['Province/State','Country/Region', 'Lat', 'Long'])
deaths = pd.read_csv(filename_pattern.format('deaths')).set_index(['Province/State','Country/Region', 'Lat', 'Long'])

print(f'Dates: From {confirmed.columns[0]} to {confirmed.columns[-1]}.')
confirmed

Let's transform the three tables into a single one.

In [None]:
covid = pd.concat([pd.read_csv(filename_pattern.format(name.lower())) \
                       .melt(id_vars=['Province/State','Country/Region', 'Lat', 'Long'], var_name='Date', value_name=name) \
                       .set_index(['Province/State','Country/Region', 'Lat', 'Long', 'Date'])
                   for name in ['Confirmed', 'Recovered', 'Deaths']], axis=1).reset_index() \
    .assign(Infected=lambda df: df['Confirmed']-df['Recovered']-df['Deaths'])
covid['Date'] = pd.to_datetime(covid['Date'])
covid['Country/Region'] = covid['Country/Region'].replace({
    'Mainland China': 'China',
    'Republic of Korea': 'Korea, South',
    'Iran (Islamic Republic of)': 'Iran',
})

covid

## World-wide trends

So, what are the updated global numbers?

In [None]:
covid_world = covid.groupby('Date')[['Confirmed', 'Recovered', 'Deaths', 'Infected']].sum()
covid_world.iloc[[-1]].style.format('{:,}')

In [None]:
df = covid_world.reset_index().melt(id_vars='Date', var_name='Status', value_name='Subjects')
px.line(df, 'Date', 'Subjects', color='Status', title='World wide trends')

And, up to now, what are the computed recovery and death rates?

In [None]:
covid_world.iloc[[-1]].assign(
        RecoveryRate=lambda df: df['Recovered']/df['Confirmed'], 
        DeathRate=lambda df: df['Deaths']/df['Confirmed']) \
    .style.format('{:,}').format('{:.2%}', subset=['RecoveryRate', 'DeathRate'])
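
To make the quantities in the cells above concrete, here is a minimal sketch of the same definitions on a single row of made-up numbers. The function name `outcome_rates` and the counts are assumptions for illustration only; they are not part of the dataset or of the notebook's pipeline.

```python
def outcome_rates(confirmed, recovered, deaths):
    """Return (infected, recovery_rate, death_rate) from cumulative counts.

    Mirrors the notebook's definitions:
      Infected     = Confirmed - Recovered - Deaths
      RecoveryRate = Recovered / Confirmed
      DeathRate    = Deaths / Confirmed
    """
    if confirmed <= 0:
        raise ValueError("confirmed must be positive to compute rates")
    infected = confirmed - recovered - deaths
    return infected, recovered / confirmed, deaths / confirmed


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
infected, recovery_rate, death_rate = outcome_rates(
    confirmed=1000, recovered=600, deaths=30
)
print(f"Infected: {infected:,}")
print(f"Recovery rate: {recovery_rate:.2%}, death rate: {death_rate:.2%}")
```

Note that both rates divide by all confirmed cases, including those still infected, so they should be expected to shift as open cases resolve.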

## Country-level trends

Let's see the currently most-affected countries

In [None]:
country = confirmed.groupby(['Country/Region']).sum()
# country

In [None]:
latest_confirmed = country[country.columns[-1]].sort_values(ascending=False).astype(int).to_frame('Confirmed').reset_index()
display(latest_confirmed.nlargest(20, 'Confirmed').style.format('{:,}', subset='Confirmed'))
px.bar(latest_confirmed.nlargest(20, 'Confirmed'), x='Country/Region', y='Confirmed')

Now, let's examine the daily dynamics

In [None]:
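# Select several columns with a list (double brackets); tuple-style selection is not supported by recent pandas versions.
covid_countries = covid.groupby(['Country/Region', 'Date'])[['Confirmed', 'Recovered', 'Deaths', 'Infected']].sum()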
covid_countries

As a start, we can focus on a few countries of interest

In [None]:
# countries = ['Mainland China', 'Republic of Korea', 'Italy', 'Iran (Islamic Republic of)', 'US', 'Israel']

import re
countries = [c for c in covid['Country/Region'].unique() 
             if re.search(r'China|Korea, South|Italy|Iran|US|Israel', c)]
print(f'Countries of interest: {countries}')

In [None]:
covid_countries_long = covid_countries.reset_index().melt(id_vars=['Date', 'Country/Region'], var_name='Status', value_name='Subjects')

fig = px.line(covid_countries_long[covid_countries_long['Country/Region'].isin(countries)], 'Date', 'Subjects', 
              color='Status', facet_col='Country/Region', 
              facet_col_wrap=2, height=600)
fig.update_yaxes(matches=None, showticklabels=True)
# fig.update_layout(legend_orientation="h")

Except for China, which seems to have reached a stable number of confirmed COVID-19 cases (and even a **reduction** in the net number of infected subjects), all the other countries are still in a steeply increasing stage (Korea seems to be heading in a good direction, though).

So, what is the growth rate? Let's switch to a log scale.

In [None]:
fig = px.scatter(covid_countries_long[covid_countries_long['Country/Region'].isin(countries)], 'Date', 'Subjects',
                 color='Status', facet_col='Country/Region', 
                 facet_col_wrap=2, log_y=True, height=600)
fig.update_yaxes(matches=None, showticklabels=True)
fig.update_traces(mode='lines+markers', line=dict(width=.5))
# fig.update_layout(legend_orientation="h")
fig.update_xaxes(range=['2020-02-20', '2020-03-12'])


Roughly a tenfold (x10) increase per week!
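
Taking the tenfold-per-week figure at face value, a short back-of-the-envelope calculation shows what it implies per day. This is only a sketch: it assumes constant exponential growth, and the factor of 10 is an eyeballed reading of the log-scale plots rather than a fitted value.

```python
import math

# Assumption: the tenfold weekly growth read off the log-scale plots.
weekly_factor = 10.0

# Under constant exponential growth, N(t) = N0 * daily_factor ** t (t in days),
# so seven daily steps must multiply up to the weekly factor.
daily_factor = weekly_factor ** (1 / 7)

# Doubling time: the t that solves daily_factor ** t == 2.
doubling_time_days = math.log(2) / math.log(daily_factor)

print(f"Daily growth factor: {daily_factor:.2f}")
print(f"Doubling time: {doubling_time_days:.1f} days")
```

In other words, a tenfold weekly increase corresponds to about 39% more cases each day, or a doubling roughly every two days.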

Let's see it on the top ten most-affected countries.

In [None]:
top_countries = covid_countries_long[(covid_countries_long['Country/Region'].isin(latest_confirmed.nlargest(10, 'Confirmed')['Country/Region'][1:])) &
                 (covid_countries_long['Status']=='Confirmed')] \
    .rename({'Subjects': 'Confirmed'}, axis=1)

fig = px.scatter(top_countries, 'Date', 'Confirmed', color='Country/Region', 
                 log_y=True, height=600)
fig.update_traces(mode='lines+markers', line=dict(width=.5))
fig.update_layout(title='Exponential growth in the top ten most-affected countries (except China)')

This is intriguing in many respects. To name a few:
1. Each country shows a distinct point in time at which the outbreak becomes highly prominent.
2. The exponential, and very high (x10 per week), growth rate seems universal across countries.
3. _(2020-03-13)_ Again, Korea is starting to successfully reduce its COVID-19 growth rate.

(@TODO: quantify the exponent.)
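
One possible way to approach this TODO is a least-squares line through the logarithm of the confirmed counts, whose slope is the daily growth rate. The sketch below is an assumption about how it could be done, not the notebook's method: `fit_exponential_growth` is a hypothetical helper, and the series is synthetic rather than taken from the dataset. On the notebook's data, `np.polyfit(days, np.log(counts), 1)` would be an equivalent shortcut.

```python
import math


def fit_exponential_growth(counts):
    """Fit log(count) = intercept + rate * day by least squares; return rate.

    Days are the positions in `counts`; zero counts are skipped because
    their logarithm is undefined.
    """
    points = [(day, math.log(c)) for day, c in enumerate(counts) if c > 0]
    if len(points) < 2:
        raise ValueError("need at least two positive counts to fit a slope")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Synthetic series that grows exactly tenfold every 7 days (illustration only).
synthetic_counts = [100 * 10 ** (day / 7) for day in range(14)]

daily_rate = fit_exponential_growth(synthetic_counts)
fitted_weekly_factor = math.exp(7 * daily_rate)

print(f"Daily rate (slope of log counts): {daily_rate:.3f}")
print(f"Implied weekly growth factor: {fitted_weekly_factor:.1f}")
```

Applied per country over a recent window, the fitted slope would give a number to compare across countries instead of judging the lines by eye.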

### The Good, the Bad, and the Ugly

Let's first try to naïvely divide all countries into two groups: those that are still experiencing a daily increase in active COVID-19 cases, and those that have already moved to the healthier dynamics of declining numbers.

In [None]:
# Day-over-day change in active cases, computed within each country on date-sorted rows.
diffs = covid_countries.reset_index().sort_values(['Country/Region', 'Date']).reset_index(drop=True)
diffs['Diff'] = diffs.groupby('Country/Region')['Infected'].diff()
display(diffs.head())

fig = px.line(diffs[diffs['Country/Region'].isin(countries)], x='Date', y='Diff', color='Country/Region')
fig

In [None]:
recent_diffs = diffs.pivot_table(index='Date', columns='Country/Region', values='Diff').tail(7).mean()

xaxes_range = [diffs['Date'].max() - pd.DateOffset(weeks=2), diffs['Date'].max()]

df = recent_diffs.nlargest(10)
display(df.to_frame('Daily Diff').T.style.format('{:+,.2f}'))
fig = px.line(diffs[diffs['Country/Region'].isin(df.index)],
              x='Date', y='Diff', color='Country/Region')
fig.update_xaxes(range=xaxes_range)
display(fig)

df = recent_diffs[recent_diffs<0].nsmallest(10)
display(df.to_frame('Daily Diff').T.style.format('{:+,.2f}'))
fig = px.line(diffs[diffs['Country/Region'].isin(df.index)],
              x='Date', y='Diff', color='Country/Region')
fig.update_xaxes(range=xaxes_range)
display(fig)

In [None]:
diffs[diffs['Country/Region']=='Italy'].tail(10) \
    .style.format({'Diff':'{:+,}'}).background_gradient('Reds', subset=['Confirmed', 'Diff'])
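
The two-way split used in this section can be illustrated on toy data. The snippet below is a minimal sketch, not the notebook's pandas pipeline: the helper names `mean_recent_change` and `trend_label` and the two made-up series are assumptions for the example, while the 7-day window mirrors the `.tail(7)` used above.

```python
def mean_recent_change(infected, window=7):
    """Average day-over-day change in active cases over the last `window` days."""
    if len(infected) < 2:
        raise ValueError("need at least two daily values to compute a change")
    daily_changes = [today - yesterday
                     for yesterday, today in zip(infected, infected[1:])]
    recent = daily_changes[-window:]
    return sum(recent) / len(recent)


def trend_label(infected, window=7):
    """Label a series by the sign of its mean recent daily change."""
    return "increasing" if mean_recent_change(infected, window) > 0 else "not increasing"


# Hypothetical active-case series (not real data).
toy_series = {
    "Country A": [10, 20, 45, 90, 160, 300, 520, 900],
    "Country B": [900, 880, 850, 800, 760, 700, 650, 590],
}

labels = {name: trend_label(series) for name, series in toy_series.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```

A third "ugly" group would need an extra threshold on the size of the increase, which is left open here.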

### Recovery and Death Rates

In [None]:
rates = covid_countries.assign(
    RecoveryRate=lambda df: df['Recovered']/df['Confirmed'], 
    DeathRate=lambda df: df['Deaths']/df['Confirmed'])

rates.reset_index().groupby('Country/Region').tail(1) \
    .set_index(['Country/Region', 'Date']).loc[countries] \
    .sort_values('Infected', ascending=False) \
    .style.format('{:,}').format('{:.2%}', subset=['RecoveryRate', 'DeathRate']) \
    .background_gradient(subset='RecoveryRate', cmap='Greens') \
    .background_gradient(subset='DeathRate', cmap='Reds')

A recovery rate of 71% in China? Can that be right?

Well, at least according to a statement by China's health officials from [2020-03-01](http://www.xinhuanet.com/english/2020-03/01/c_138832732.htm), it is not far-fetched:
> _The proportion of Chinese patients who recovered from the novel coronavirus disease (COVID-19) and were discharged from hospitals continued to rise in the past week to reach 52.1 percent, a health official said on Sunday._

And, as a second WOW, Korea's statistics might hint that good treatment can substantially reduce the death rate.

In [None]:
df = rates.reset_index().melt(id_vars=['Date', 'Country/Region'], var_name='Status', value_name='Subjects')
df = df[df['Status'].isin(['RecoveryRate', 'DeathRate'])]
        
fig = px.line(df[df['Country/Region'].isin(countries)], 'Date', 'Subjects', color='Status', 
        facet_col='Country/Region', facet_col_wrap=3,
        height=600)
fig

## Next steps

..will be written later as a first step..

## References

Data
* [Coronavirus disease (COVID-2019) situation reports](https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports) by World Health Organization.
* [GitHub data repository](https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series) by Johns Hopkins.

Dashboards
* [Coronavirus COVID-19 (2019-nCoV)](https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6) @gisanddata
* [Coronavirus Update (Live)](https://www.worldometers.info/coronavirus/) @worldometers
* https://www.technologyreview.com/s/615330/best-worst-coronavirus-dashboards/

Blogs
* TBD

## Appendix A - Map of China's Latest Numbers

In [None]:
china_latest = covid[covid['Country/Region']=='China'].groupby(['Province/State']).tail(1)

china_latest.sort_values('Confirmed', ascending=False) \
    .style.background_gradient(cmap='Pastel1_r', subset=['Confirmed', 'Recovered', 'Deaths', 'Infected'])

In [None]:
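# log1p keeps the marker size finite (zero) for provinces with zero confirmed cases, where np.log would give -inf.
fig = px.scatter_geo(china_latest.assign(LogConfirmed=lambda df: np.log1p(df['Confirmed'])),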
                     lat='Lat', lon='Long', size='LogConfirmed', color='Confirmed', hover_name='Province/State', 
                     hover_data=['Confirmed', 'Recovered'],
                     scope='asia')
fig.update_geos(fitbounds="locations")
fig.update_geos(
    resolution=50,
    showcoastlines=True, coastlinecolor="RebeccaPurple",
    showland=True, landcolor="LightGreen",
    showocean=True, oceancolor="LightBlue",
    showlakes=True, lakecolor="Blue",
    showrivers=True, rivercolor="Blue"
)
# fig.update_layout(height=300, margin={"r":0,"t":0,"l":0,"b":0})
