# Example Predictor: Linear Rollout Predictor

This example contains basic functionality for training and evaluating a linear predictor that rolls out predictions day-by-day.

First, a training data set is created from historical case data and non-pharmaceutical intervention (NPI) data.

Second, a linear model is trained to predict future cases from prior case data along with prior and future NPI data.
The model is an off-the-shelf sklearn Lasso model. The NPI inputs are negated so that a positive weight constraint can enforce the assumption that stricter NPIs are negatively correlated with future cases.
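
To illustrate why negated NPI inputs pair with a positive weight constraint, here is a minimal sketch on synthetic data. The variable names and the data-generating formula are assumptions made up for this illustration; they are not taken from the training data used below.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Synthetic data (illustrative only): yesterday's cases and one NPI level in 0..3.
prev_cases = rng.uniform(0, 100, size=200)
npi_level = rng.integers(0, 4, size=200).astype(float)

# Toy ground truth: a stricter NPI lowers the next day's cases.
next_cases = 0.9 * prev_cases - 5.0 * npi_level + rng.normal(0, 1, size=200)

# Negate the NPI column so its negative effect is expressed by a positive weight.
X = np.column_stack([prev_cases, -npi_level])

# positive=True forces every coefficient to be >= 0.
model = Lasso(alpha=0.1, positive=True, max_iter=10000)
model.fit(X, next_cases)
print(model.coef_)
```

With this setup, a weight of zero on the negated NPI column means "no effect", and any non-zero weight can only reduce predicted cases as the NPI gets stricter.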

Third, a sample evaluation set is created, and the predictor is applied to this evaluation set to produce prediction results in the correct format.

## Training

In [1]:
import pickle
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

### Copy the data locally

In [2]:
# Main source for the training data
DATA_URL = 'https://raw.githubusercontent.com/OxCGRT/covid-policy-tracker/master/data/OxCGRT_latest.csv'
# Local file
DATA_FILE = 'data/OxCGRT_latest.csv'

In [3]:
import os
import urllib.request
if not os.path.exists('data'):
    os.mkdir('data')
urllib.request.urlretrieve(DATA_URL, DATA_FILE)

('data/OxCGRT_latest.csv', <http.client.HTTPMessage at 0x7f2fee23da10>)

In [4]:
# Load historical data from local file
df = pd.read_csv(DATA_FILE,
                 parse_dates=['Date'],
                 encoding="ISO-8859-1",
                 dtype={"RegionName": str,
                        "RegionCode": str},
                 # Skip malformed rows. pandas >= 1.3 uses on_bad_lines;
                 # older versions used error_bad_lines=False.
                 on_bad_lines='skip')

In [5]:
df.columns

Index(['CountryName', 'CountryCode', 'RegionName', 'RegionCode',
       'Jurisdiction', 'Date', 'C1_School closing', 'C1_Flag',
       'C2_Workplace closing', 'C2_Flag', 'C3_Cancel public events', 'C3_Flag',
       'C4_Restrictions on gatherings', 'C4_Flag', 'C5_Close public transport',
       'C5_Flag', 'C6_Stay at home requirements', 'C6_Flag',
       'C7_Restrictions on internal movement', 'C7_Flag',
       'C8_International travel controls', 'E1_Income support', 'E1_Flag',
       'E2_Debt/contract relief', 'E3_Fiscal measures',
       'E4_International support', 'H1_Public information campaigns',
       'H1_Flag', 'H2_Testing policy', 'H3_Contact tracing',
       'H4_Emergency investment in healthcare', 'H5_Investment in vaccines',
       'H6_Facial Coverings', 'H6_Flag', 'H7_Vaccination policy', 'H7_Flag',
       'M1_Wildcard', 'ConfirmedCases', 'ConfirmedDeaths', 'StringencyIndex',
       'StringencyIndexForDisplay', 'StringencyLegacyIndex',
       'StringencyLegacyIndexForDispla

In [6]:
# # For testing, restrict training data to that before a hypothetical predictor submission date
# HYPOTHETICAL_SUBMISSION_DATE = np.datetime64("2020-07-31")
# df = df[df.Date <= HYPOTHETICAL_SUBMISSION_DATE]

In [7]:
# Add GeoID column that combines CountryName and RegionName for easier manipulation of data
df['GeoID'] = df['CountryName'] + '__' + df['RegionName'].astype(str)

In [8]:
# Add new cases column
df['NewCases'] = df.groupby('GeoID').ConfirmedCases.diff().fillna(0)
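
As a small illustration of the step above, the following sketch applies the same `groupby(...).diff().fillna(0)` pattern to a made-up two-region frame. The toy values are assumptions chosen only to make the behaviour visible.

```python
import pandas as pd

# Toy cumulative counts for two regions (illustrative values only).
toy = pd.DataFrame({
    "GeoID": ["A", "A", "A", "B", "B", "B"],
    "ConfirmedCases": [0, 3, 10, 5, 5, 9],
})

# diff() is computed within each GeoID, so region B does not subtract region A's last value.
toy["NewCases"] = toy.groupby("GeoID").ConfirmedCases.diff().fillna(0)
print(toy["NewCases"].tolist())  # [0.0, 3.0, 7.0, 0.0, 0.0, 4.0]
```

Note that the first row of each region becomes 0 even when the cumulative count is already non-zero (region B starts at 5), because there is no previous day to difference against.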

In [9]:
# Keep only columns of interest
id_cols = ['CountryName',
           'RegionName',
           'GeoID',
           'Date']
cases_col = ['NewCases']
npi_cols = ['C1_School closing',
            'C2_Workplace closing',
            'C3_Cancel public events',
            'C4_Restrictions on gatherings',
            'C5_Close public transport',
            'C6_Stay at home requirements',
            'C7_Restrictions on internal movement',
            'C8_International travel controls',
            'H1_Public information campaigns',
            'H2_Testing policy',
            'H3_Contact tracing',
            'H6_Facial Coverings']
df = df[id_cols + cases_col + npi_cols]

In [10]:
# Fill any missing case values by interpolation and setting NaNs to 0
df.update(df.groupby('GeoID').NewCases.apply(
    lambda group: group.interpolate()).fillna(0))

In [11]:
# Fill any missing NPIs by assuming they are the same as previous day
for npi_col in npi_cols:
    df.update(df.groupby('GeoID')[npi_col].ffill().fillna(0))
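
The per-region forward fill can be checked on a toy frame. This is a sketch with made-up values; only the column name is borrowed from the data set.

```python
import numpy as np
import pandas as pd

# Toy NPI column with gaps for two regions (illustrative values only).
toy = pd.DataFrame({
    "GeoID": ["A", "A", "A", "B", "B", "B"],
    "C1_School closing": [np.nan, 2.0, np.nan, np.nan, 1.0, np.nan],
})

# Forward-fill within each region, then treat leading gaps as "no measure" (0).
filled = toy.groupby("GeoID")["C1_School closing"].ffill().fillna(0)
print(filled.tolist())  # [0.0, 2.0, 2.0, 0.0, 1.0, 1.0]
```

Grouping matters here: an ungrouped `ffill()` would carry region A's last value (2.0) into region B's first row.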

In [12]:
df

Unnamed: 0,CountryName,RegionName,GeoID,Date,NewCases,C1_School closing,C2_Workplace closing,C3_Cancel public events,C4_Restrictions on gatherings,C5_Close public transport,C6_Stay at home requirements,C7_Restrictions on internal movement,C8_International travel controls,H1_Public information campaigns,H2_Testing policy,H3_Contact tracing,H6_Facial Coverings
0,Aruba,,Aruba__nan,2020-01-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Aruba,,Aruba__nan,2020-01-02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Aruba,,Aruba__nan,2020-01-03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Aruba,,Aruba__nan,2020-01-04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Aruba,,Aruba__nan,2020-01-05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
93361,Zimbabwe,,Zimbabwe__nan,2020-12-12,57.0,2.0,1.0,2.0,3.0,1.0,2.0,2.0,2.0,2.0,1.0,1.0,3.0
93362,Zimbabwe,,Zimbabwe__nan,2020-12-13,27.0,2.0,1.0,2.0,3.0,1.0,2.0,2.0,2.0,2.0,1.0,1.0,3.0
93363,Zimbabwe,,Zimbabwe__nan,2020-12-14,112.0,2.0,1.0,2.0,3.0,1.0,2.0,2.0,2.0,2.0,1.0,1.0,3.0
93364,Zimbabwe,,Zimbabwe__nan,2020-12-15,0.0,2.0,1.0,2.0,3.0,1.0,2.0,2.0,2.0,2.0,1.0,1.0,3.0


In [13]:
# Set number of past days to use to make predictions
nb_lookback_days = 30
date_ls = []
geoid_ls = []
country_ls = []
newcase_ls = []
# Create training data across all countries for predicting one day ahead
X_cols = cases_col + npi_cols
y_col = cases_col
X_samples = []
y_samples = []
geo_ids = df.GeoID.unique()
for g in geo_ids:
    gdf = df[df.GeoID == g]
    all_case_data = np.array(gdf[cases_col])
    all_npi_data = np.array(gdf[npi_cols])

    # Create one sample for each day where we have enough data
    # Each sample consists of cases and npis for previous nb_lookback_days
    nb_total_days = len(gdf)
    for d in range(nb_lookback_days, nb_total_days - 1):
        X_cases = all_case_data[d-nb_lookback_days:d]

        # Take negative of npis to support positive
        # weight constraint in Lasso.
        X_npis = -all_npi_data[d - nb_lookback_days:d]

        # Flatten all input data so it fits Lasso input format.
        X_sample = np.concatenate([X_cases.flatten(),
                                   X_npis.flatten()])
        y_sample = all_case_data[d]
        X_samples.append(X_sample)
        y_samples.append(y_sample)
        date_ls.append(gdf['Date'].iloc[d])
        geoid_ls.append(gdf['GeoID'].iloc[d])
        country_ls.append(gdf['CountryName'].iloc[d])
        newcase_ls.append(gdf['NewCases'].iloc[d])

X_samples = np.array(X_samples)
y_samples = np.array(y_samples).flatten()
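
The windowing logic above can be isolated into a small function for a single region. This is a sketch: `make_windows` is a hypothetical helper name, and the inputs are dummy arrays, but the slicing mirrors the loop in the cell above, including the fact that the last day is not used as a target.

```python
import numpy as np

def make_windows(cases, npis, lookback):
    """Build (X, y) for one region.

    cases: array of shape (days,); npis: array of shape (days, n_npis).
    Each sample holds the previous `lookback` days of cases and negated NPIs.
    """
    X, y = [], []
    for d in range(lookback, len(cases) - 1):
        X_cases = cases[d - lookback:d]
        X_npis = -npis[d - lookback:d]          # negated, as in the cell above
        X.append(np.concatenate([X_cases.flatten(), X_npis.flatten()]))
        y.append(cases[d])
    return np.array(X), np.array(y)

# Dummy region: 40 days, 12 NPI columns.
cases = np.arange(40, dtype=float)
npis = np.ones((40, 12))
X, y = make_windows(cases, npis, lookback=30)
print(X.shape, y.shape)  # (9, 390) (9,)
```

The feature width is `lookback * (1 + n_npis)`, that is 30 * 13 = 390, which matches the second dimension of `X_train` printed further down.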

In [14]:
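import datetime
# date_ls holds pandas Timestamps, so compare against pd.Timestamp rather than
# datetime.date (recent pandas versions reject the mixed comparison).
train_idx = [i for i in range(len(date_ls)) if date_ls[i] <= pd.Timestamp(2020, 7, 31)]
test_idx = [i for i in range(len(date_ls)) if date_ls[i] >= pd.Timestamp(2020, 8, 1)]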
train_idx = np.array(train_idx)
test_idx = np.array(test_idx)
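
The same chronological split can be expressed with a boolean mask. The sketch below uses four made-up dates; `cutoff` is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Four illustrative sample dates straddling the split point.
dates = pd.to_datetime(["2020-07-30", "2020-07-31", "2020-08-01", "2020-08-02"])
cutoff = pd.Timestamp("2020-07-31")

# Samples on or before the cutoff are used for training, later ones for testing.
train_idx = np.flatnonzero(dates <= cutoff)
test_idx = np.flatnonzero(dates > cutoff)
print(train_idx, test_idx)  # [0 1] [2 3]
```

Splitting by date rather than at random keeps every test sample strictly later than every training sample, which is closer to how the predictor is used in practice.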

In [15]:
# Helper function to compute the mean absolute error (MAE)
def mae(pred, true):
    return np.mean(np.abs(pred - true))

In [16]:
# Split data into train and test sets
X_train, X_test, y_train, y_test = X_samples[train_idx,:], X_samples[test_idx,:],y_samples[train_idx], y_samples[test_idx]
print(X_train.shape,X_test.shape,y_train.shape,y_test.shape,)

(48678, 390) (36442, 390) (48678,) (36442,)


In [17]:
!pip install lightgbm xgboost



In [18]:
import random

def seed_everything(seed=0):
    random.seed(seed)
    np.random.seed(seed)
    
seed_everything(42) 

In [19]:
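# Create and train Lasso model.
# The NPI inputs were negated above so that positive=True can be passed to Lasso,
# enforcing the assumption that cases are positively correlated with future cases
# and NPIs are negatively correlated. It is not passed here, so the weights
# below are unconstrained.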

lasso_model = Lasso(random_state=42)
lasso_model.fit(X_train, y_train)

print('Lasso result:')
# Evaluate model
train_preds = lasso_model.predict(X_train)
train_preds = np.maximum(train_preds, 0) # Don't predict negative cases
print('Train MAE:', mae(train_preds, y_train))
test_preds = lasso_model.predict(X_test)
test_preds = np.maximum(test_preds, 0) # Don't predict negative cases
print('Test MAE:', mae(test_preds, y_test))
os.makedirs('models', exist_ok=True)  # The directory is otherwise only created in a later cell
with open('models/model_lasso.pkl', 'wb') as model_file:
    pickle.dump(lasso_model, model_file)

from lightgbm import LGBMRegressor
lgbm_model = LGBMRegressor(random_state=42)
lgbm_model.fit(X_train, y_train)
# Evaluate model
train_preds = lgbm_model.predict(X_train)
train_preds = np.maximum(train_preds, 0) # Don't predict negative cases
print('lgbm result:')
print('Train MAE:', mae(train_preds, y_train))
test_preds = lgbm_model.predict(X_test)
test_preds = np.maximum(test_preds, 0) # Don't predict negative cases
print('Test MAE:', mae(test_preds, y_test))
with open('models/model_lgbm.pkl', 'wb') as model_file:
    pickle.dump(lgbm_model, model_file)



import xgboost as xgb
xgb_model = xgb.XGBRegressor(random_state=42)
xgb_model.fit(X_train, y_train)
# Evaluate model
train_preds = xgb_model.predict(X_train)
train_preds = np.maximum(train_preds, 0) # Don't predict negative cases
print('xgb result:')
print('Train MAE:', mae(train_preds, y_train))
test_preds = xgb_model.predict(X_test)
test_preds = np.maximum(test_preds, 0) # Don't predict negative cases
print('Test MAE:', mae(test_preds, y_test))
with open('models/model_xgb.pkl', 'wb') as model_file:
    pickle.dump(xgb_model, model_file)


Lasso result:
Train MAE: 119.27752311151873
Test MAE: 469.36176021168546
lgbm result:
Train MAE: 85.58646245869078
Test MAE: 665.2549123382636
xgb result:
Train MAE: 51.797418546131624
Test MAE: 650.3725903829637


In [20]:
# Lasso result:
# Train MAE: 119.2762230373797
# Test MAE: 469.3599710917388
# lgbm result:
# Train MAE: 85.58646245869078
# Test MAE: 665.2549123382636
# xgb result:
# Train MAE: 51.797418546131624
# Test MAE: 650.3725903829637

In [21]:
# Save the best model to file
if not os.path.exists('models'):
    os.mkdir('models')
with open('models/model.pkl', 'wb') as model_file:
    pickle.dump(lasso_model, model_file)

## Evaluation

Now that the predictor has been trained and saved, this section evaluates it on sample evaluation data.
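
The `predict` module itself is not shown in this notebook. As a rough sketch of what a day-by-day rollout can look like, the function below feeds each prediction back in as input for the next day. The name `rollout`, its signature, and the dummy predictor are assumptions for illustration, not the contents of `predict.py`.

```python
import numpy as np

def rollout(predict_one_day, past_cases, past_npis, future_npis, lookback):
    """Predict len(future_npis) days ahead, one day at a time.

    predict_one_day: callable mapping one flat feature vector to a number.
    past_cases: (days,) array; past_npis and future_npis: (days, n_npis) arrays.
    """
    cases = list(past_cases[-lookback:])
    npis = [row for row in past_npis[-lookback:]]
    preds = []
    for day_npis in future_npis:
        # Same layout as the training samples: cases first, then negated NPIs.
        X = np.concatenate([np.array(cases[-lookback:]),
                            -np.array(npis[-lookback:]).flatten()])
        pred = max(float(predict_one_day(X)), 0.0)  # Don't predict negative cases
        preds.append(pred)
        # The prediction and that day's NPIs become inputs for the following day.
        cases.append(pred)
        npis.append(day_npis)
    return preds

# Dummy predictor: the mean of the case window (the first `lookback` features).
past_cases = np.full(5, 10.0)
past_npis = np.zeros((5, 2))
future_npis = np.zeros((3, 2))
preds = rollout(lambda x: x[:5].mean(), past_cases, past_npis, future_npis, lookback=5)
print(preds)  # [10.0, 10.0, 10.0]
```

With a fitted scikit-learn model, `predict_one_day` could be something like `lambda x: model.predict(x.reshape(1, -1))[0]`. Because predictions are fed back in, errors can compound over longer horizons.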

In [22]:
# Reload the module to get the latest changes
import predict
from importlib import reload
reload(predict)
from predict import predict_df

In [23]:
%%time
preds_df = predict_df("2020-08-01", "2020-08-31", path_to_ips_file="data/2020-09-30_historical_ip.csv", verbose=True)


Predicting for Aruba__nan
2020-08-01: 0
2020-08-02: 0
2020-08-03: 0
2020-08-04: 0
2020-08-05: 0
2020-08-06: 6.358966931753322
2020-08-07: 16.497805474965627
2020-08-08: 13.310805764231398
2020-08-09: 12.986578802166878
2020-08-10: 14.727074827273483
2020-08-11: 8.863625136926228
2020-08-12: 8.516213970739916
2020-08-13: 15.616078265724848
2020-08-14: 15.648537609771457
2020-08-15: 13.758918481295186
2020-08-16: 15.654203818118813
2020-08-17: 14.456138493497093
2020-08-18: 0
2020-08-19: 0
2020-08-20: 0
2020-08-21: 11.583300705789245
2020-08-22: 15.976481129706318
2020-08-23: 26.1971095122912
2020-08-24: 32.99492312133856
2020-08-25: 24.883515126773887
2020-08-26: 28.421380714600843
2020-08-27: 37.9872947219277
2020-08-28: 42.00935999533128
2020-08-29: 42.26664872171004
2020-08-30: 47.7848218795783
2020-08-31: 61.99591924915734

Predicting for Afghanistan__nan
2020-08-01: 54.160949643765036
2020-08-02: 0
2020-08-03: 36.790071966068616
2020-08-04: 116.13054161005934
2020-08-05: 68.094964

2020-08-18: 112.35435004972423
2020-08-19: 116.14254926160967
2020-08-20: 124.7860039913806
2020-08-21: 127.43151481875628
2020-08-22: 126.97314872607635
2020-08-23: 137.53958655632306
2020-08-24: 146.8663257529826
2020-08-25: 149.39518828112142
2020-08-26: 166.6200789938102
2020-08-27: 176.21299124777664
2020-08-28: 154.24539950160857
2020-08-29: 151.22335050894884
2020-08-30: 149.93211725779696
2020-08-31: 141.9371217012138

Predicting for Burkina Faso__nan
2020-08-01: 54.62540928326092
2020-08-02: 63.751663394465744
2020-08-03: 126.98682664331386
2020-08-04: 101.28081989802266
2020-08-05: 104.55482823022774
2020-08-06: 52.38624991778754
2020-08-07: 47.9671977158046
2020-08-08: 55.5822648805282
2020-08-09: 85.43539436438752
2020-08-10: 129.90656427711136
2020-08-11: 129.7005727635452
2020-08-12: 96.13550335187784
2020-08-13: 50.071241447329854
2020-08-14: 19.711198822030443
2020-08-15: 1.0837470080197065
2020-08-16: 61.036027626985295
2020-08-17: 93.81494916818522
2020-08-18: 111.529


Predicting for Brazil__Amapa
2020-08-01: 187.39621242523165
2020-08-02: 0
2020-08-03: 0
2020-08-04: 0
2020-08-05: 5.284098653174887
2020-08-06: 0
2020-08-07: 131.55137351847895
2020-08-08: 158.21068347835575
2020-08-09: 0
2020-08-10: 0
2020-08-11: 0
2020-08-12: 0
2020-08-13: 48.081848798862296
2020-08-14: 91.0094880500511
2020-08-15: 103.75479609367895
2020-08-16: 22.08431828510619
2020-08-17: 0
2020-08-18: 0
2020-08-19: 42.99104167323209
2020-08-20: 59.86630955478645
2020-08-21: 94.62451283595097
2020-08-22: 124.72398504995833
2020-08-23: 59.488358958648355
2020-08-24: 0
2020-08-25: 17.52941091471807
2020-08-26: 56.40293137417251
2020-08-27: 49.77086712346041
2020-08-28: 83.94551763642748
2020-08-29: 109.07089682578558
2020-08-30: 42.29211732488383
2020-08-31: 0

Predicting for Brazil__Bahia
2020-08-01: 2295.4050189610757
2020-08-02: 449.24944971436025
2020-08-03: 8.23498703610261
2020-08-04: 0
2020-08-05: 1597.8340312364844
2020-08-06: 1329.7321874378495
2020-08-07: 1262.90416264865

2020-08-31: 0

Predicting for Brazil__Parana
2020-08-01: 2881.055089717776
2020-08-02: 651.7642830749749
2020-08-03: 0
2020-08-04: 0
2020-08-05: 2483.5439000765396
2020-08-06: 3358.9865145176477
2020-08-07: 2741.258832968745
2020-08-08: 3061.3961140788288
2020-08-09: 0
2020-08-10: 0
2020-08-11: 306.956586424247
2020-08-12: 1810.3992668193778
2020-08-13: 3779.2654418685006
2020-08-14: 4005.3695577068224
2020-08-15: 1596.9304236113612
2020-08-16: 644.8522585965027
2020-08-17: 0
2020-08-18: 0
2020-08-19: 2858.6046279400125
2020-08-20: 4598.321700894284
2020-08-21: 3993.3577153937194
2020-08-22: 2082.511848044395
2020-08-23: 951.6546182759097
2020-08-24: 0
2020-08-25: 767.1703521729685
2020-08-26: 3203.6057196891807
2020-08-27: 4733.289750712209
2020-08-28: 4691.704389515409
2020-08-29: 2271.5222790129746
2020-08-30: 995.0502478570828
2020-08-31: 55.26872945484354

Predicting for Brazil__Rio de Janeiro
2020-08-01: 1688.0903570121231
2020-08-02: 0
2020-08-03: 0
2020-08-04: 0
2020-08-05: 56.

2020-08-20: 0
2020-08-21: 0
2020-08-22: 0
2020-08-23: 0
2020-08-24: 0
2020-08-25: 0
2020-08-26: 0
2020-08-27: 0
2020-08-28: 0
2020-08-29: 0
2020-08-30: 0
2020-08-31: 0

Predicting for Botswana__nan
2020-08-01: 189.31350950533863
2020-08-02: 112.97439130574415
2020-08-03: 0
2020-08-04: 53.38266192738791
2020-08-05: 241.79360838907274
2020-08-06: 74.79262778173535
2020-08-07: 19.95597004567819
2020-08-08: 181.25040974643912
2020-08-09: 149.88825085110415
2020-08-10: 0
2020-08-11: 70.04435872699608
2020-08-12: 230.85080653246303
2020-08-13: 128.46781794378904
2020-08-14: 25.003210061123333
2020-08-15: 192.64601526082
2020-08-16: 210.27250921577968
2020-08-17: 0
2020-08-18: 120.69860845781679
2020-08-19: 258.98946953594833
2020-08-20: 211.42924965046524
2020-08-21: 86.16463623158376
2020-08-22: 222.96942180501426
2020-08-23: 241.42211715072602
2020-08-24: 88.09879626304439
2020-08-25: 164.10441724137436
2020-08-26: 273.80752667305364
2020-08-27: 255.7572555949941
2020-08-28: 141.4444808491

2020-08-20: 8.75542901212971
2020-08-21: 17.389141441160607
2020-08-22: 28.19766087654371
2020-08-23: 27.884236034299423
2020-08-24: 35.00395719746627
2020-08-25: 30.051342011205307
2020-08-26: 26.706747513667853
2020-08-27: 0
2020-08-28: 15.550749886574199
2020-08-29: 28.148583228447784
2020-08-30: 30.364716804350422
2020-08-31: 21.393540819994772

Predicting for Costa Rica__nan
2020-08-01: 509.108486363477
2020-08-02: 324.04065505887223
2020-08-03: 0
2020-08-04: 41.72857012491659
2020-08-05: 1108.4129637101232
2020-08-06: 966.9236232145533
2020-08-07: 298.9310892143344
2020-08-08: 466.7769312730044
2020-08-09: 336.50170043815336
2020-08-10: 0
2020-08-11: 0
2020-08-12: 993.1832927333468
2020-08-13: 1099.6757426408467
2020-08-14: 32.03953466071556
2020-08-15: 417.8514620715958
2020-08-16: 319.52599862669217
2020-08-17: 0
2020-08-18: 195.49363220359623
2020-08-19: 1065.3469001515464
2020-08-20: 1107.6759564747972
2020-08-21: 255.89537897694743
2020-08-22: 394.89111780038905
2020-08-23: 

2020-08-30: 900.3670303726315
2020-08-31: 706.1189879397116

Predicting for Algeria__nan
2020-08-01: 187.87991348988194
2020-08-02: 179.80256040909322
2020-08-03: 125.56945844095011
2020-08-04: 114.62335173568012
2020-08-05: 42.57469970420806
2020-08-06: 0
2020-08-07: 5.181494535676123
2020-08-08: 4.592295546111984
2020-08-09: 15.97810535250543
2020-08-10: 46.645306852817725
2020-08-11: 0
2020-08-12: 0
2020-08-13: 0
2020-08-14: 0
2020-08-15: 0
2020-08-16: 43.53173186103567
2020-08-17: 18.81738777527776
2020-08-18: 57.3186979406752
2020-08-19: 17.043675567909858
2020-08-20: 0
2020-08-21: 0
2020-08-22: 39.037121493309044
2020-08-23: 92.11311183629341
2020-08-24: 102.20246045601573
2020-08-25: 106.3592993103894
2020-08-26: 70.73636447966916
2020-08-27: 0
2020-08-28: 0
2020-08-29: 52.814747261244094
2020-08-30: 108.98435265178085
2020-08-31: 99.16424055041749

Predicting for Ecuador__nan
2020-08-01: 478.05781783886357
2020-08-02: 219.020848865638
2020-08-03: 362.62938244990613
2020-08-04: 


Predicting for United Kingdom__Northern Ireland
2020-08-01: 163.4781650412774
2020-08-02: 135.60925796655982
2020-08-03: 48.91110690185971
2020-08-04: 10.767006840766896
2020-08-05: 0
2020-08-06: 43.39669794782513
2020-08-07: 49.94595857736592
2020-08-08: 113.96817109407755
2020-08-09: 100.89515013896153
2020-08-10: 0
2020-08-11: 0
2020-08-12: 0
2020-08-13: 0
2020-08-14: 11.211865402636384
2020-08-15: 92.40999082453695
2020-08-16: 100.51438896808969
2020-08-17: 54.22297048526462
2020-08-18: 6.032671692353738
2020-08-19: 0
2020-08-20: 0
2020-08-21: 65.11609325035249
2020-08-22: 153.76937123042921
2020-08-23: 197.81265205348217
2020-08-24: 147.02489257955568
2020-08-25: 87.41266150055117
2020-08-26: 95.22670460141859
2020-08-27: 55.916947983877236
2020-08-28: 127.08336738554189
2020-08-29: 207.12529260354955
2020-08-30: 232.41360941909028
2020-08-31: 162.77353015070435

Predicting for United Kingdom__Scotland
2020-08-01: 264.2382688252946
2020-08-02: 137.3686619972966
2020-08-03: 0
2020


Predicting for Croatia__nan
2020-08-01: 2737.685604183559
2020-08-02: 2726.137826397564
2020-08-03: 2063.242427317435
2020-08-04: 1042.760886711943
2020-08-05: 0
2020-08-06: 0
2020-08-07: 1321.6143644048366
2020-08-08: 2592.0761551406767
2020-08-09: 2479.7981783054315
2020-08-10: 1987.3428616770102
2020-08-11: 646.7704219842788
2020-08-12: 0
2020-08-13: 0
2020-08-14: 1017.7517360567425
2020-08-15: 2302.0027471327476
2020-08-16: 2710.3969582568097
2020-08-17: 1923.7329882601152
2020-08-18: 894.3936580283809
2020-08-19: 0
2020-08-20: 0
2020-08-21: 1081.0250789485312
2020-08-22: 2758.21242239206
2020-08-23: 3181.1777570074014
2020-08-24: 2398.3296023488365
2020-08-25: 1177.6663330644583
2020-08-26: 157.70304906626478
2020-08-27: 79.97175324700154
2020-08-28: 1337.872696287685
2020-08-29: 2899.029937348706
2020-08-30: 3451.2201016676545
2020-08-31: 2446.15347723261

Predicting for Haiti__nan
2020-08-01: 1.131461460909125
2020-08-02: 18.264256906497383
2020-08-03: 0
2020-08-04: 11.22428810

2020-08-21: 331.5873320195321
2020-08-22: 1370.5305012891795
2020-08-23: 1800.8981386562482
2020-08-24: 1479.3657965594848
2020-08-25: 839.1594308234988
2020-08-26: 92.11363684527271
2020-08-27: 0
2020-08-28: 429.65006000438274
2020-08-29: 1421.2674218860348
2020-08-30: 1903.1912413111404
2020-08-31: 1544.7650558442383

Predicting for Kazakhstan__nan
2020-08-01: 473.2366171905512
2020-08-02: 461.44610833858684
2020-08-03: 387.9980574568034
2020-08-04: 247.68241446866878
2020-08-05: 110.31348087965216
2020-08-06: 0
2020-08-07: 260.3493809957247
2020-08-08: 261.27440065525695
2020-08-09: 328.7660390414136
2020-08-10: 297.6405197239259
2020-08-11: 65.6836185375752
2020-08-12: 0
2020-08-13: 0
2020-08-14: 0
2020-08-15: 112.44659440482178
2020-08-16: 280.9695722372119
2020-08-17: 151.75337759606353
2020-08-18: 57.938398334150136
2020-08-19: 0
2020-08-20: 0
2020-08-21: 0
2020-08-22: 208.23100535317232
2020-08-23: 322.34109603997706
2020-08-24: 256.73844990952995
2020-08-25: 144.5270268456644


2020-08-10: 0
2020-08-11: 42.30165681561762
2020-08-12: 607.0966260162279
2020-08-13: 462.559897457871
2020-08-14: 89.16648030402831
2020-08-15: 588.4681067896164
2020-08-16: 549.6007121973188
2020-08-17: 0
2020-08-18: 128.72758738134385
2020-08-19: 568.1151444216124
2020-08-20: 522.5486750548614
2020-08-21: 223.8296933900894
2020-08-22: 534.5693652827666
2020-08-23: 623.5738747384871
2020-08-24: 147.15040747569606
2020-08-25: 155.62903670538816
2020-08-26: 577.8992249259376
2020-08-27: 705.0429011672643
2020-08-28: 356.65488483005106
2020-08-29: 539.4270772629728
2020-08-30: 584.8755331511015
2020-08-31: 172.09008432626777

Predicting for Latvia__nan
2020-08-01: 514.8381195624696
2020-08-02: 435.69294723774016
2020-08-03: 323.8934408042564
2020-08-04: 103.45786203991484
2020-08-05: 0
2020-08-06: 0
2020-08-07: 314.1427119824561
2020-08-08: 477.92388701083223
2020-08-09: 462.7534710629943
2020-08-10: 317.8219100663807
2020-08-11: 85.17110859035083
2020-08-12: 0
2020-08-13: 0
2020-08-14:

2020-08-30: 449.28915449384664
2020-08-31: 378.7732883886912

Predicting for Mongolia__nan
2020-08-01: 0
2020-08-02: 5.916006766848659
2020-08-03: 1.5909374407344297
2020-08-04: 8.826326072160843
2020-08-05: 8.25540724203488
2020-08-06: 1.4644553434305339
2020-08-07: 4.388151507673531
2020-08-08: 4.668787408827855
2020-08-09: 4.247962546384058
2020-08-10: 8.236005370845454
2020-08-11: 11.950397978041845
2020-08-12: 6.117435862085758
2020-08-13: 0
2020-08-14: 0
2020-08-15: 0
2020-08-16: 0
2020-08-17: 0
2020-08-18: 0
2020-08-19: 0
2020-08-20: 0
2020-08-21: 0
2020-08-22: 0
2020-08-23: 0
2020-08-24: 0
2020-08-25: 0
2020-08-26: 0
2020-08-27: 0
2020-08-28: 0
2020-08-29: 0
2020-08-30: 0
2020-08-31: 0

Predicting for Mozambique__nan
2020-08-01: 50.88486484003218
2020-08-02: 80.18084825933083
2020-08-03: 62.18591795781475
2020-08-04: 42.74321826420871
2020-08-05: 16.098479068590976
2020-08-06: 13.842972351589351
2020-08-07: 20.397368300644516
2020-08-08: 50.26634933410412
2020-08-09: 83.6403471

2020-08-27: 0
2020-08-28: 0
2020-08-29: 0
2020-08-30: 0
2020-08-31: 0

Predicting for Oman__nan
2020-08-01: 18.03928381876176
2020-08-02: 0
2020-08-03: 72.43936286767149
2020-08-04: 380.6934415977238
2020-08-05: 323.408714122276
2020-08-06: 15.291671636354817
2020-08-07: 0
2020-08-08: 0
2020-08-09: 0
2020-08-10: 36.06618671519061
2020-08-11: 290.51577397267977
2020-08-12: 301.91017544070036
2020-08-13: 0
2020-08-14: 0
2020-08-15: 0
2020-08-16: 0
2020-08-17: 65.22445868750103
2020-08-18: 323.5856380022526
2020-08-19: 328.6441198998109
2020-08-20: 29.5964008086394
2020-08-21: 0
2020-08-22: 0
2020-08-23: 9.12580790160726
2020-08-24: 119.11297686991236
2020-08-25: 351.02574791056855
2020-08-26: 365.37336011151604
2020-08-27: 12.16612681891391
2020-08-28: 0
2020-08-29: 0
2020-08-30: 36.46421651620254
2020-08-31: 136.93939277768206

Predicting for Pakistan__nan
2020-08-01: 1441.3339469205916
2020-08-02: 1470.5282870542737
2020-08-03: 1294.084817700516
2020-08-04: 1112.0274472651563
2020-08-0

2020-08-06: 81.66848361929209
2020-08-07: 115.30605158729746
2020-08-08: 0
2020-08-09: 333.4401105343028
2020-08-10: 617.3536011209403
2020-08-11: 423.0132213771884
2020-08-12: 0.9191827304026177
2020-08-13: 0
2020-08-14: 0
2020-08-15: 19.76913599176831
2020-08-16: 297.7219634266711
2020-08-17: 450.1625009633756
2020-08-18: 484.1912400393175
2020-08-19: 2.8802669748554592
2020-08-20: 0
2020-08-21: 0
2020-08-22: 153.6598168108639
2020-08-23: 365.5890336886186
2020-08-24: 568.529042267539
2020-08-25: 463.59906718622557
2020-08-26: 118.77502751264635
2020-08-27: 0
2020-08-28: 0
2020-08-29: 157.96832828901023
2020-08-30: 428.58457347497574
2020-08-31: 613.0283738274369

Predicting for Romania__nan
2020-08-01: 4576.554622980523
2020-08-02: 4592.296615040554
2020-08-03: 3421.894004051946
2020-08-04: 1785.075818866228
2020-08-05: 0
2020-08-06: 0
2020-08-07: 2306.735125158463
2020-08-08: 3643.780149436453
2020-08-09: 3851.1342757112807
2020-08-10: 3310.1181670268393
2020-08-11: 636.12740633249

2020-08-04: 1003.0229022068603
2020-08-05: 640.2346617321424
2020-08-06: 326.28379200685674
2020-08-07: 1406.0494185040525
2020-08-08: 1671.5910980895233
2020-08-09: 1370.8812421172893
2020-08-10: 752.6952577633458
2020-08-11: 0
2020-08-12: 0
2020-08-13: 0
2020-08-14: 0
2020-08-15: 831.7656365797496
2020-08-16: 1143.128658862639
2020-08-17: 143.41091896598834
2020-08-18: 0
2020-08-19: 0
2020-08-20: 0
2020-08-21: 0
2020-08-22: 1335.689782641742
2020-08-23: 1678.617257491157
2020-08-24: 951.7458301990317
2020-08-25: 494.2896612966556
2020-08-26: 218.96790415974067
2020-08-27: 130.90280474500503
2020-08-28: 325.01040790001093
2020-08-29: 1495.7038855383275
2020-08-30: 1886.5653292846491
2020-08-31: 1076.7904538616795

Predicting for South Sudan__nan
2020-08-01: 0
2020-08-02: 0
2020-08-03: 0
2020-08-04: 0
2020-08-05: 0
2020-08-06: 0
2020-08-07: 0
2020-08-08: 0
2020-08-09: 0
2020-08-10: 0
2020-08-11: 0
2020-08-12: 0
2020-08-13: 0
2020-08-14: 0
2020-08-15: 0
2020-08-16: 0
2020-08-17: 0
2020-

2020-08-14: 38.9614278318965
2020-08-15: 34.29210233295571
2020-08-16: 39.63787957772182
2020-08-17: 45.42372142008037
2020-08-18: 38.47831590847508
2020-08-19: 37.853788374577874
2020-08-20: 37.29269942093272
2020-08-21: 30.39639784739655
2020-08-22: 26.49848821328463
2020-08-23: 35.020593568988645
2020-08-24: 36.510013623614036
2020-08-25: 34.066113218353465
2020-08-26: 33.54892735879556
2020-08-27: 0
2020-08-28: 0
2020-08-29: 0
2020-08-30: 0
2020-08-31: 6.907007208129473

Predicting for Timor-Leste__nan
2020-08-01: 0
2020-08-02: 4.139812969494217
2020-08-03: 5.687239164312439
2020-08-04: 15.564710276370974
2020-08-05: 35.59117808953723
2020-08-06: 45.31456662440873
2020-08-07: 54.99237600525629
2020-08-08: 63.0753750195787
2020-08-09: 79.75944205622966
2020-08-10: 91.82992210916876
2020-08-11: 107.6715628076528
2020-08-12: 122.17553761349333
2020-08-13: 134.74147513198233
2020-08-14: 138.06789863598414
2020-08-15: 137.7512054356375
2020-08-16: 143.36550820974486
2020-08-17: 152.1604

2020-08-31: 356.43889111058974

Predicting for United States__nan
2020-08-01: 121942.11974551363
2020-08-02: 130829.0898490751
2020-08-03: 108375.55348484653
2020-08-04: 86834.35167853803
2020-08-05: 62655.12394396552
2020-08-06: 36524.13370530018
2020-08-07: 66454.18286370006
2020-08-08: 89722.69637330214
2020-08-09: 100450.21452968931
2020-08-10: 100287.89486010693
2020-08-11: 44107.63876975114
2020-08-12: 14648.832947828934
2020-08-13: 2490.7124172822514
2020-08-14: 1650.630182073468
2020-08-15: 49149.83194549011
2020-08-16: 85483.38261228063
2020-08-17: 62335.02138824585
2020-08-18: 43953.19097612747
2020-08-19: 16744.10115140902
2020-08-20: 0
2020-08-21: 0
2020-08-22: 61626.63974472867
2020-08-23: 95746.24551313226
2020-08-24: 82520.30590229586
2020-08-25: 53542.248117391515
2020-08-26: 15440.855373091172
2020-08-27: 0
2020-08-28: 2893.9742172072165
2020-08-29: 60635.192071600766
2020-08-30: 102627.70790312166
2020-08-31: 88152.41490292136

Predicting for United States__Alaska
202

2020-08-01: 346.5776905797287
2020-08-02: 649.8135748643491
2020-08-03: 531.9382509850453
2020-08-04: 400.16270498101227
2020-08-05: 475.38696954836666
2020-08-06: 105.02263552533606
2020-08-07: 201.646745297192
2020-08-08: 334.0363185055197
2020-08-09: 454.45960407144344
2020-08-10: 492.7449865513953
2020-08-11: 339.6165101929339
2020-08-12: 172.12385749512282
2020-08-13: 53.758763624939974
2020-08-14: 0
2020-08-15: 93.45083090872623
2020-08-16: 380.15645200562665
2020-08-17: 359.3962506969603
2020-08-18: 288.7743727525218
2020-08-19: 153.9961716928207
2020-08-20: 0
2020-08-21: 0
2020-08-22: 171.1704422065029
2020-08-23: 401.57431131108405
2020-08-24: 416.38584612834296
2020-08-25: 349.33430046703035
2020-08-26: 132.3901141039309
2020-08-27: 0
2020-08-28: 0
2020-08-29: 156.92265378603713
2020-08-30: 397.40053755132254
2020-08-31: 451.27112878199284

Predicting for United States__Florida
2020-08-01: 6487.527716172015
2020-08-02: 7245.290977854369
2020-08-03: 6460.635614292611
2020-08-0

2020-08-30: 26.38572797119637
2020-08-31: 0

Predicting for United States__Michigan
2020-08-01: 2006.830133306095
2020-08-02: 4744.929838894662
2020-08-03: 2187.9193049204077
2020-08-04: 0
2020-08-05: 1096.1599071158726
2020-08-06: 2798.0166582487946
2020-08-07: 563.1992159012531
2020-08-08: 1116.402951118709
2020-08-09: 3192.67520867231
2020-08-10: 1729.1303207140295
2020-08-11: 0
2020-08-12: 363.62896480087466
2020-08-13: 2656.229571031996
2020-08-14: 0
2020-08-15: 1523.9489567561486
2020-08-16: 3169.9801858606825
2020-08-17: 1054.3332214007464
2020-08-18: 0
2020-08-19: 698.7409307529741
2020-08-20: 2136.0678209331836
2020-08-21: 150.16603963597504
2020-08-22: 1553.0329025254541
2020-08-23: 3021.143627149219
2020-08-24: 1895.7361598848097
2020-08-25: 244.86727337553697
2020-08-26: 658.0030520976562
2020-08-27: 2027.3565114218377
2020-08-28: 693.5863452872521
2020-08-29: 1480.2410295430764
2020-08-30: 2846.5165175014617
2020-08-31: 2190.6589500167556

Predicting for United States__Min


Predicting for United States__Oklahoma
2020-08-01: 1387.0249008900191
2020-08-02: 2656.3265297632242
2020-08-03: 3528.637730959518
2020-08-04: 2700.3378252608495
2020-08-05: 589.3489388450332
2020-08-06: 338.88669319992266
2020-08-07: 507.2292758874122
2020-08-08: 1253.5173911552629
2020-08-09: 2486.4332962251583
2020-08-10: 3730.538408510714
2020-08-11: 2318.31849397376
2020-08-12: 297.6802889139775
2020-08-13: 0
2020-08-14: 0
2020-08-15: 1197.8632662183109
2020-08-16: 2496.2290604454693
2020-08-17: 3482.4143503139653
2020-08-18: 2413.8026424658747
2020-08-19: 656.9879585416062
2020-08-20: 0
2020-08-21: 14.469016550835136
2020-08-22: 1383.4060920796192
2020-08-23: 2918.1232791598295
2020-08-24: 3797.604329944066
2020-08-25: 2713.9095618797633
2020-08-26: 833.0120563998453
2020-08-27: 0
2020-08-28: 176.0535808487791
2020-08-29: 1564.36708165945
2020-08-30: 3274.4553577971565
2020-08-31: 4058.628960474959

Predicting for United States__Oregon
2020-08-01: 650.6239162322082
2020-08-02: 8

2020-08-05: 1599.044506390132
2020-08-06: 789.1011958966179
2020-08-07: 999.2139929612053
2020-08-08: 1194.779890598681
2020-08-09: 1331.3382860514503
2020-08-10: 1796.274490704805
2020-08-11: 1607.7668451169943
2020-08-12: 604.3922682102708
2020-08-13: 32.90582925185291
2020-08-14: 0
2020-08-15: 366.19609236176825
2020-08-16: 931.510423154287
2020-08-17: 1232.7036948210366
2020-08-18: 1392.7674874853597
2020-08-19: 628.9076390646445
2020-08-20: 0
2020-08-21: 0
2020-08-22: 613.4265987032885
2020-08-23: 1260.3633946112218
2020-08-24: 1570.3934958423201
2020-08-25: 1589.3674861581007
2020-08-26: 729.213817926544
2020-08-27: 0
2020-08-28: 0
2020-08-29: 578.1374355926689
2020-08-30: 1301.0919053679013
2020-08-31: 1685.6006975554315

Predicting for United States__Virgin Islands
2020-08-01: 0
2020-08-02: 0
2020-08-03: 0
2020-08-04: 0
2020-08-05: 0
2020-08-06: 0.02181213962486872
2020-08-07: 3.9954168106563097
2020-08-08: 0
2020-08-09: 0.6105950632657144
2020-08-10: 0
2020-08-11: 0
2020-08-12

2020-08-12: 35.705393458759886
2020-08-13: 51.52448365023076
2020-08-14: 60.47091264152137
2020-08-15: 73.41676350609913
2020-08-16: 87.10191406124412
2020-08-17: 97.71902724989516
2020-08-18: 81.91286948507876
2020-08-19: 78.24983074562716
2020-08-20: 84.07656041792458
2020-08-21: 98.83171826629041
2020-08-22: 129.3292897582296
2020-08-23: 157.6448800907325
2020-08-24: 168.4885888889666
2020-08-25: 151.96089770260988
2020-08-26: 155.1911181383489
2020-08-27: 119.66435840577918
2020-08-28: 134.0286179548586
2020-08-29: 161.13011342210612
2020-08-30: 181.68008828358273
2020-08-31: 171.9253135108159

Predicting for Zimbabwe__nan
2020-08-01: 16.80171679543004
2020-08-02: 54.603056710363376
2020-08-03: 0
2020-08-04: 0
2020-08-05: 21.44678864900689
2020-08-06: 0
2020-08-07: 0
2020-08-08: 2.2233929017899747
2020-08-09: 0
2020-08-10: 0
2020-08-11: 0
2020-08-12: 0
2020-08-13: 0
2020-08-14: 0
2020-08-15: 0
2020-08-16: 5.09024874530693
2020-08-17: 0
2020-08-18: 0
2020-08-19: 0
2020-08-20: 0
2020

In [24]:
# Check the predictions
preds_df.head()

Unnamed: 0,CountryName,RegionName,Date,PredictedDailyNewCases
213,Aruba,,2020-08-01,0.0
214,Aruba,,2020-08-02,0.0
215,Aruba,,2020-08-03,0.0
216,Aruba,,2020-08-04,0.0
217,Aruba,,2020-08-05,0.0


# Validation
This is how the predictor is going to be called during the competition.  
!!! PLEASE DO NOT CHANGE THE API !!!

In [25]:
!python predict.py -s 2020-08-01 -e 2020-08-04 -ip data/2020-09-30_historical_ip.csv -o predictions/2020-08-01_2020-08-04.csv

Generating predictions from 2020-08-01 to 2020-08-04...
Saved predictions to predictions/2020-08-01_2020-08-04.csv
Done!
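
The flags in the command above could be parsed as in the following minimal sketch. Only the short flags `-s`, `-e`, `-ip` and `-o` come from the command itself; the long option names, the `dest` names and the help texts are assumptions, and the actual `predict.py` is not necessarily written this way.

```python
import argparse


def build_parser():
    # Short flags match the command above; long names and dest names are assumptions.
    parser = argparse.ArgumentParser(description="Sketch of the predict.py command line")
    parser.add_argument("-s", "--start_date", dest="start_date", required=True,
                        help="First day to predict, as YYYY-MM-DD")
    parser.add_argument("-e", "--end_date", dest="end_date", required=True,
                        help="Last day to predict, as YYYY-MM-DD")
    parser.add_argument("-ip", "--interventions_plan", dest="ip_file", required=True,
                        help="CSV file with the intervention plans")
    parser.add_argument("-o", "--output_file", dest="output_file", required=True,
                        help="CSV file the predictions are written to")
    return parser


# Parse the same arguments as the command above
args = build_parser().parse_args([
    "-s", "2020-08-01",
    "-e", "2020-08-04",
    "-ip", "data/2020-09-30_historical_ip.csv",
    "-o", "predictions/2020-08-01_2020-08-04.csv",
])
print(args.start_date, args.end_date, args.ip_file, args.output_file)
```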


In [26]:
!head predictions/2020-08-01_2020-08-04.csv

CountryName,RegionName,Date,PredictedDailyNewCases
Aruba,,2020-08-01,0.0
Aruba,,2020-08-02,0.0
Aruba,,2020-08-03,0.0
Aruba,,2020-08-04,0.0
Afghanistan,,2020-08-01,54.160949643765036
Afghanistan,,2020-08-02,0.0
Afghanistan,,2020-08-03,36.790071966068616
Afghanistan,,2020-08-04,116.13054161005934
Angola,,2020-08-01,30.8527429345367
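
A few properties of this file can be checked by hand before running the official validation. The sketch below is only an illustration and not the implementation of `validate_submission`: the function name `check_predictions` and the particular checks (column names, dates inside the requested range, no negative predictions) are assumptions based on the output shown above.

```python
import csv
import io
from datetime import date

EXPECTED_COLUMNS = ["CountryName", "RegionName", "Date", "PredictedDailyNewCases"]


def check_predictions(csv_text, start, end):
    """Return a list of human-readable problems found in a predictions CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        return [f"Unexpected columns: {reader.fieldnames}"]
    problems = []
    # Data rows start on line 2 because line 1 is the header
    for line_no, row in enumerate(reader, start=2):
        day = date.fromisoformat(row["Date"])
        if not start <= day <= end:
            problems.append(f"line {line_no}: date {day} is outside the requested range")
        if float(row["PredictedDailyNewCases"]) < 0:
            problems.append(f"line {line_no}: negative prediction")
    return problems


# A few rows copied from the output above
sample = """CountryName,RegionName,Date,PredictedDailyNewCases
Aruba,,2020-08-01,0.0
Aruba,,2020-08-02,0.0
Afghanistan,,2020-08-01,54.160949643765036
"""
problems = check_predictions(sample, date(2020, 8, 1), date(2020, 8, 4))
print(problems or "All good!")
```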


# Test cases
We can generate a prediction file. Let's validate a few cases...

In [29]:
import sys
# Make the covid_xprize package importable (path to the local checkout of the repository)
sys.path.append('/home/thinng/code/2020/covid-xprize/')

In [30]:
import os
from covid_xprize.validation.predictor_validation import validate_submission

def validate(start_date, end_date, ip_file, output_file):
    # First, delete any potential old file
    try:
        os.remove(output_file)
    except OSError:
        pass
    
    # Then generate the prediction, calling the official API
    !python predict.py -s {start_date} -e {end_date} -ip {ip_file} -o {output_file}
    
    # And validate it
    errors = validate_submission(start_date, end_date, ip_file, output_file)
    if errors:
        for error in errors:
            print(error)
    else:
        print("All good!")
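
The `!python` line only works inside IPython or Jupyter. Outside a notebook, the same call could be made with `subprocess`, as in this sketch. The helper names `build_predict_command` and `run_predict` are hypothetical; the flags are the ones used in the command above.

```python
import subprocess
import sys


def build_predict_command(start_date, end_date, ip_file, output_file):
    """Build the argument list for the predict.py call used in this notebook."""
    return [sys.executable, "predict.py",
            "-s", start_date,
            "-e", end_date,
            "-ip", ip_file,
            "-o", output_file]


def run_predict(start_date, end_date, ip_file, output_file):
    # check=True raises CalledProcessError if predict.py exits with a non-zero status
    subprocess.run(build_predict_command(start_date, end_date, ip_file, output_file),
                   check=True)


cmd = build_predict_command("2020-08-01", "2020-08-04",
                            "data/2020-09-30_historical_ip.csv",
                            "predictions/val_4_days.csv")
print(" ".join(cmd[1:]))
```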

## 4 days, no gap
- All countries and regions
- Official number of cases is known up to start_date
- Intervention Plans are the official ones

In [31]:
validate(start_date="2020-08-01",
         end_date="2020-08-04",
         ip_file="data/2020-09-30_historical_ip.csv",
         output_file="predictions/val_4_days.csv")

Generating predictions from 2020-08-01 to 2020-08-04...
Saved predictions to predictions/val_4_days.csv
Done!
All good!


## 1 month in the future
- 2 countries only
- There is a gap between the date of the last known number of cases and start_date
- For future dates, the intervention plans contain scenarios for which predictions are requested, to answer the question: what will happen if we apply these plans?

In [32]:
%%time
validate(start_date="2021-01-01",
         end_date="2021-01-31",
         ip_file="data/future_ip.csv",
         output_file="predictions/val_1_month_future.csv")

Generating predictions from 2021-01-01 to 2021-01-31...
Saved predictions to predictions/val_1_month_future.csv
Done!
All good!
CPU times: user 209 ms, sys: 86.3 ms, total: 295 ms
Wall time: 6.26 s


## 180 days, from a future date, all countries and regions
- Prediction start date is 1 week from now (i.e. assuming the submission date is 1 week from now).  
- Prediction end date is 6 months after start date.  
- Prediction is requested for all available countries and regions.  
- Intervention plan scenario: freeze last known intervention plans for each country and region.  

The number of cases between today and the start date is not known yet, but the model relies on prior case counts as input, so it first has to predict the cases for that gap and then use those predictions as inputs.  
This is the most demanding test. It should take less than 1 hour to generate the prediction file.
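
The gap filling described above can be sketched as a day-by-day rollout in which each prediction is appended to the history and used as input for the next day. This is a minimal sketch, not the code in `predict.py`: `roll_out`, `mean_model` and the 3-day `lookback` are hypothetical, the toy model ignores intervention plans, and the last known date is made up for illustration.

```python
from datetime import date, timedelta


def roll_out(known_cases, last_known, start, end, predict_next, lookback=3):
    """Predict day by day from the day after `last_known` through `end`.

    Days before `start` form the gap: they are predicted only so that later
    predictions have inputs, and they are not returned.
    """
    history = list(known_cases)
    results = {}
    day = last_known + timedelta(days=1)
    while day <= end:
        # Clip at zero, since a negative number of new cases is meaningless
        prediction = max(0.0, predict_next(history[-lookback:]))
        history.append(prediction)  # feed the prediction back in as if it were observed
        if day >= start:
            results[day] = prediction
        day += timedelta(days=1)
    return results


def mean_model(window):
    # Toy stand-in for the trained model: the mean of the lookback window
    return sum(window) / len(window)


preds = roll_out(known_cases=[10.0, 20.0, 30.0],
                 last_known=date(2020, 12, 16),   # hypothetical last day with known cases
                 start=date(2020, 12, 23),
                 end=date(2020, 12, 25),
                 predict_next=mean_model)
for day, value in preds.items():
    print(f"{day}: {value}")
```

Only the three requested days are returned, although the loop also predicted the six gap days before them.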

### Generate the scenario

In [33]:
from datetime import datetime, timedelta

start_date = datetime.now() + timedelta(days=7)
start_date_str = start_date.strftime('%Y-%m-%d')
end_date = start_date + timedelta(days=180)
end_date_str = end_date.strftime('%Y-%m-%d')
print(f"Start date: {start_date_str}")
print(f"End date: {end_date_str}")

Start date: 2020-12-23
End date: 2021-06-21


In [34]:
from covid_xprize.validation.scenario_generator import get_raw_data, generate_scenario, NPI_COLUMNS
DATA_FILE = 'data/OxCGRT_latest.csv'
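# Load the raw OxCGRT data
latest_df = get_raw_data(DATA_FILE, latest=True)
# "Freeze" scenario: keep the last known intervention plans for each country and region
scenario_df = generate_scenario(start_date_str, end_date_str, latest_df, countries=None, scenario="Freeze")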
scenario_file = "predictions/180_days_future_scenario.csv"
scenario_df.to_csv(scenario_file, index=False)
print(f"Saved scenario to {scenario_file}")

Saved scenario to predictions/180_days_future_scenario.csv
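
One way to picture the "Freeze" scenario is as each region's last known intervention values repeated for every requested day. The sketch below is an assumption about the idea, not the implementation of `generate_scenario`; the function `freeze_scenario` and the NPI values are hypothetical.

```python
from datetime import date, timedelta


def freeze_scenario(last_known_npis, start, end):
    """Repeat each region's last known NPI values for every day in [start, end].

    last_known_npis maps (country, region) to a dict of NPI name -> value.
    """
    n_days = (end - start).days + 1  # both start and end are included
    rows = []
    for (country, region), npis in last_known_npis.items():
        for offset in range(n_days):
            day = start + timedelta(days=offset)
            rows.append({"CountryName": country,
                         "RegionName": region,
                         "Date": day.isoformat(),
                         **npis})
    return rows


# Hypothetical last known values for a single region
last_known = {("United States", "Oregon"): {"C1_School closing": 3,
                                            "C2_Workplace closing": 2}}
rows = freeze_scenario(last_known, date(2020, 12, 23), date(2021, 6, 21))
print(len(rows), rows[0]["Date"], rows[-1]["Date"])
```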


### Check it

In [35]:
%%time
validate(start_date=start_date_str,
         end_date=end_date_str,
         ip_file=scenario_file,
         output_file="predictions/val_6_month_future.csv")

Generating predictions from 2020-12-23 to 2021-06-21...
Saved predictions to predictions/val_6_month_future.csv
Done!
All good!
CPU times: user 4.45 s, sys: 630 ms, total: 5.08 s
Wall time: 2min 14s
