# COVID-19 by Country: Predicting the growth curve


Data science in the field often requires finding out what useful questions can be answered with a dataset _as it is_, rather than with the dataset we as professionals would like to have. For instance, I wanted to calculate a patient's probability of death given their country's case history and the number of tests currently available, similar to the experiments we can do with the Titanic dataset on Kaggle. (If you haven't done so yet, familiarize yourself with Kaggle datasets. There's even a great dataset on COVID-19 from South Korea: https://www.kaggle.com/kimjihoo/coronavirusdataset)

In my daily work, I build time-series prediction models, for tasks such as anomaly detection. I've used Johns Hopkins University data on COVID-19 to train a simple model that takes in a country's daily confirmed cases, recovered cases, and deaths, and predicts the growth pattern ("high infection rate" or "manageable infection rate") of that country's COVID-19 cases. A fully developed version of a model like this could help the international community send more resources to the communities that are predicted to have high infection rates.
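
To make the two labels concrete, here is a minimal sketch of one way to turn a cumulative case series into a growth label. It is not the model described above: the function `label_growth`, the 5-day window, and the 10% threshold are all assumptions chosen for illustration. The counts are the confirmed cases for 3/17/20 to 3/21/20, copied from the tables later in this notebook.

```python
import pandas as pd

# Hypothetical threshold: an average day-over-day growth above 10% is
# labeled "high". This value is illustrative only.
HIGH_GROWTH_THRESHOLD = 0.10


def label_growth(cumulative_cases: pd.Series, window: int = 5) -> str:
    """Label a cumulative case series by its recent average daily growth.

    Assumes every value in the window is positive, so that the
    percentage change is well defined.
    """
    recent = cumulative_cases.tail(window + 1)
    # pct_change gives (today - yesterday) / yesterday for each day.
    daily_growth = recent.pct_change().dropna()
    if daily_growth.mean() > HIGH_GROWTH_THRESHOLD:
        return "high infection rate"
    return "manageable infection rate"


# Cumulative confirmed cases, 3/17/20 through 3/21/20.
malaysia = pd.Series([673, 790, 900, 1030, 1183])
nepal = pd.Series([1, 1, 1, 1, 1])

print("Malaysia:", label_growth(malaysia))
print("Nepal:", label_growth(nepal))
```

A real classifier would need labeled training data and would likely use recoveries and deaths as well. This sketch only shows the shape of the input and output.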

When our team decides to develop an ML product, we think about the "use case" and about the data we have available to train the models that will make up the product. Stakeholders who are close to our clients' needs will frame a use case like this: "_As a user, I want to_ [see a dashboard/receive an alert/run an automation] _that_ [tells me something about my system/takes some action on my system]". Then it is our job as the AI/ML team to figure out what type of model(s) would solve that problem and whether the data provided by the client would work with such a model.




Suggested Timeline:

5 minutes: Introduction to self and topic.

35 minutes: Presentation. (There will be no questions or student interaction during the presentation.)

15 minutes: Q&A. Students will submit questions to the Lambda School Host, who will ask them on their behalf.

Suggested Presentation Format:

General Overview: what the topic is for, why it matters. You should specifically say something like, "By the end of this lecture, students should be able to XYZ"

Specific Examples: how the topic is used in industry or research. We use the "I do, You do, We do" model, so incorporating that into your lecture is important.

Tutorial: Show how you would design something based on the principles taught

Additional Pointers:

Include live coding, if possible.

The audience will be varied in background, but as an example of an idealized target: somebody who doesn’t have specific prior knowledge, but is motivated and capable of learning from first-ish principles.


# Read in the data. 

In [50]:
from datetime import datetime
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as pyplot

In [15]:
covid_confirmed_df = pd.read_csv('../COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv') 

In [21]:
print("Number of Province/States: {}".format(len(covid_confirmed_df)))
print("Number of Countries: {}".format(len(covid_confirmed_df["Country/Region"].unique())))

covid_confirmed_df.head()

Number of Province/States: 482
Number of Countries: 166


Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,3/12/20,3/13/20,3/14/20,3/15/20,3/16/20,3/17/20,3/18/20,3/19/20,3/20/20,3/21/20
0,,Thailand,15.0,101.0,2,3,5,7,8,8,...,70,75,82,114,147,177,212,272,322,411
1,,Japan,36.0,138.0,2,1,2,2,4,4,...,639,701,773,839,825,878,889,924,963,1007
2,,Singapore,1.2833,103.8333,0,1,3,3,4,5,...,178,200,212,226,243,266,313,345,385,432
3,,Nepal,28.1667,84.25,0,0,0,1,1,1,...,1,1,1,1,1,1,1,1,1,1
4,,Malaysia,2.5,112.5,0,0,0,3,4,4,...,149,197,238,428,566,673,790,900,1030,1183


In [12]:
covid_deaths_df = pd.read_csv('../COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Deaths.csv') 

In [22]:
print("Number of Province/States: {}".format(len(covid_deaths_df)))
print("Number of Countries: {}".format(len(covid_deaths_df["Country/Region"].unique())))

covid_deaths_df.head()

Number of Province/States: 482
Number of Countries: 166


Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,3/12/20,3/13/20,3/14/20,3/15/20,3/16/20,3/17/20,3/18/20,3/19/20,3/20/20,3/21/20
0,,Thailand,15.0,101.0,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,1
1,,Japan,36.0,138.0,0,0,0,0,0,0,...,16,19,22,22,27,29,29,29,33,35
2,,Singapore,1.2833,103.8333,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,2
3,,Nepal,28.1667,84.25,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,,Malaysia,2.5,112.5,0,0,0,0,0,0,...,0,0,0,0,0,2,2,2,3,4


In [13]:
covid_recovered_df = pd.read_csv('../COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Recovered.csv') 

In [23]:
print("Number of Province/States: {}".format(len(covid_recovered_df)))
print("Number of Countries: {}".format(len(covid_recovered_df["Country/Region"].unique())))

covid_recovered_df.head()

Number of Province/States: 482
Number of Countries: 166


Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,3/12/20,3/13/20,3/14/20,3/15/20,3/16/20,3/17/20,3/18/20,3/19/20,3/20/20,3/21/20
0,,Thailand,15.0,101.0,0,0,0,0,2,2,...,34,35,35,35,35,41,42,42,42,42
1,,Japan,36.0,138.0,0,0,0,0,1,1,...,118,118,118,118,144,144,144,150,191,232
2,,Singapore,1.2833,103.8333,0,0,0,0,0,0,...,96,97,105,105,109,114,114,114,124,140
3,,Nepal,28.1667,84.25,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,1
4,,Malaysia,2.5,112.5,0,0,0,0,0,0,...,26,26,35,42,42,49,60,75,87,114


In [42]:
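# Drop the province column first so that only numeric columns are summed.
confirmed_by_country_df = covid_confirmed_df.drop(['Province/State'], axis=1)

# Aggregate the province-level rows into one row per country.
confirmed_by_country_df = confirmed_by_country_df.groupby('Country/Region').sum()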

confirmed_by_country_df = confirmed_by_country_df.drop(['Long', 'Lat'], axis=1)

confirmed_by_country_datetime_df = confirmed_by_country_df.T

confirmed_by_country_datetime_df.tail()

Country/Region,Afghanistan,Albania,Algeria,Andorra,Angola,Antigua and Barbuda,Argentina,Armenia,Australia,Austria,...,Uganda,Ukraine,United Arab Emirates,United Kingdom,Uruguay,Uzbekistan,Venezuela,Vietnam,Zambia,Zimbabwe
3/17/20,22,55,60,39,0,1,68,78,452,1332,...,0,14,98,1960,29,10,33,66,0,0
3/18/20,22,59,74,39,0,1,79,84,568,1646,...,0,14,113,2642,50,15,36,75,2,0
3/19/20,22,64,87,53,0,1,97,115,681,2013,...,0,16,140,2716,79,23,42,85,2,0
3/20/20,24,70,90,75,1,1,128,136,791,2388,...,0,29,140,4014,94,33,42,91,2,1
3/21/20,24,76,139,88,2,1,158,160,1071,2814,...,1,47,153,5067,110,43,70,94,2,3
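
To see what the drop, group, sum, and transpose steps do, it can help to run them on a tiny frame. The sketch below uses made-up countries (`Xland`, `Yland`) and made-up counts shaped like the JHU file. It is only an illustration of the reshaping, not real data.

```python
import pandas as pd

# Toy data with made-up counts, shaped like the JHU time-series file:
# one row per province, one column per date.
toy_df = pd.DataFrame({
    "Province/State": ["A", "B", None],
    "Country/Region": ["Xland", "Xland", "Yland"],
    "Lat": [1.0, 2.0, 3.0],
    "Long": [4.0, 5.0, 6.0],
    "3/20/20": [10, 5, 7],
    "3/21/20": [12, 9, 7],
})

# Keep only the country name and the date columns, then add up the
# province rows that belong to the same country.
by_country = (
    toy_df.drop(columns=["Province/State", "Lat", "Long"])
    .groupby("Country/Region")
    .sum()
)

# Transpose so that each row is a date and each column is a country,
# then parse the date strings into timestamps.
by_date = by_country.T
by_date.index = pd.to_datetime(by_date.index, format="%m/%d/%y")

print(by_date)
```

After the transpose, each country's history is a single column, which is a convenient shape for time-series slicing and plotting.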


In [51]:
confirmed_by_country_datetime_df.index = pd.to_datetime(confirmed_by_country_datetime_df.index)

print(type(confirmed_by_country_datetime_df.index[0]))

confirmed_by_country_datetime_df[datetime(2020, 3, 20):]

<class 'pandas.tslib.Timestamp'>


Country/Region,Afghanistan,Albania,Algeria,Andorra,Angola,Antigua and Barbuda,Argentina,Armenia,Australia,Austria,...,Uganda,Ukraine,United Arab Emirates,United Kingdom,Uruguay,Uzbekistan,Venezuela,Vietnam,Zambia,Zimbabwe
2020-03-20,24,70,90,75,1,1,128,136,791,2388,...,0,29,140,4014,94,33,42,91,2,1
2020-03-21,24,76,139,88,2,1,158,160,1071,2814,...,1,47,153,5067,110,43,70,94,2,3


In [67]:
confirmed_by_country_datetime_df.columns
confirmed_by_country_datetime_df['Afghanistan'][datetime(2020, 3, 20):]

2020-03-20    24
2020-03-21    24
Name: Afghanistan, dtype: int64
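
Countries reported their first cases on different dates, so one possible next step is to trim each country's series so that it starts at its first confirmed case. The sketch below shows that idea on made-up data. The frame `toy`, its country names, and the dictionary name `country_from_first_confirmed` are assumptions for illustration.

```python
import pandas as pd

# Made-up cumulative counts: one row per date, one column per country.
toy = pd.DataFrame(
    {"Xland": [0, 0, 2, 5], "Yland": [1, 3, 3, 8], "Zland": [0, 0, 0, 0]},
    index=pd.to_datetime(
        ["2020-03-18", "2020-03-19", "2020-03-20", "2020-03-21"]
    ),
)

country_from_first_confirmed = {}
for country in toy.columns:
    series = toy[country]
    positive = series[series > 0]
    if positive.empty:
        # This country has no confirmed cases in the data yet.
        continue
    first_date = positive.index[0]
    # Label-based slice: keep everything from the first confirmed case on.
    country_from_first_confirmed[country] = series.loc[first_date:]

for country, series in country_from_first_confirmed.items():
    print(country, series.index[0].date(), list(series))
```

Trimming this way puts every curve on a "days since first case" footing, which makes countries at different stages easier to compare.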

In [75]:
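# country_from_first_confirmed_dict = {}

# iterrows() yields (index, row) pairs; here the index is the date.
for date, row in confirmed_by_country_datetime_df[datetime(2020, 3, 20):].iterrows():
    # Print each date's row once if any country has a confirmed case.
    if (row > 0).any():
        print(date, row)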

# for i, j in confirmed_by_country_datetime_df[datetime(2020, 3, 20):].iterrows(): 
#     print(i, j) 
        

(Timestamp('2020-03-20 00:00:00'), Country/Region
Afghanistan                  24
Albania                      70
Algeria                      90
Andorra                      75
Angola                        1
Antigua and Barbuda           1
Argentina                   128
Armenia                     136
Australia                   791
Austria                    2388
Azerbaijan                   44
Bahamas, The                  3
Bahrain                     285
Bangladesh                   20
Barbados                      5
Belarus                      69
Belgium                    2257
Benin                         2
Bhutan                        2
Bolivia                      15
Bosnia and Herzegovina       89
Brazil                      793
Brunei                       78
Bulgaria                    127
Burkina Faso                 40
Cabo Verde                    1
Cambodia                     51
Cameroon                     20
Canada                      943
Cape Verde            