# Example Predictor: Linear Rollout Predictor

This example contains basic functionality for training and evaluating a linear predictor that rolls out predictions day-by-day.

First, a training data set is created from historical case and NPI (non-pharmaceutical intervention) data.

Second, a model is trained to predict future cases from prior case data along with prior and future NPI data.
The features are built for an off-the-shelf sklearn Lasso model that uses a positive weight constraint to enforce the assumption that increased NPIs are negatively correlated with future cases. The model actually fitted in the code below, however, is an XGBoost regressor (`XGBRegressor`) tuned with a grid search.

Third, a sample evaluation set is created, and the predictor is applied to this evaluation set to produce prediction results in the correct format.
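
To make the day-by-day rollout concrete, here is a minimal sketch, assuming a fitted model that exposes a scikit-learn style `predict` method. `MeanModel` and `rollout` are hypothetical names used only for illustration, and the 12 NPI columns and 21-day window mirror the training setup later in this notebook. The real rollout logic lives in `predict.py`, which is not shown here and may differ.

```python
import numpy as np

NB_LOOKBACK_DAYS = 21  # assumed to match nb_lookback_days used for training


class MeanModel:
    """Hypothetical stand-in for a trained model: predicts the mean of past cases."""

    def predict(self, X):
        # The first NB_LOOKBACK_DAYS features are the past case counts
        return X[:, :NB_LOOKBACK_DAYS].mean(axis=1)


def rollout(model, past_cases, past_npis, future_npis):
    """Predict one day at a time, feeding each prediction back in as an input."""
    cases = list(past_cases[-NB_LOOKBACK_DAYS:])
    npis = np.vstack([past_npis[-NB_LOOKBACK_DAYS:], future_npis])
    preds = []
    for day in range(len(future_npis)):
        X_cases = np.array(cases[-NB_LOOKBACK_DAYS:])
        # NPIs are negated, as in the training features
        X_npis = -npis[day:day + NB_LOOKBACK_DAYS]
        X = np.concatenate([X_cases, X_npis.flatten()]).reshape(1, -1)
        pred = max(float(model.predict(X)[0]), 0.0)  # never predict negative cases
        preds.append(pred)
        cases.append(pred)  # the prediction becomes an input for the next day
    return preds


past_cases = np.full(NB_LOOKBACK_DAYS, 10.0)
past_npis = np.zeros((NB_LOOKBACK_DAYS, 12))
future_npis = np.ones((5, 12))
preds = rollout(MeanModel(), past_cases, past_npis, future_npis)
print(preds)
```

Because each prediction is appended to the case history, errors made early in the horizon propagate into later days.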

## Training

In [1]:
import pickle
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split

### Copy the data locally

In [2]:
# Main source for the training data
DATA_URL = 'https://raw.githubusercontent.com/OxCGRT/covid-policy-tracker/master/data/OxCGRT_latest.csv'
# Local file
DATA_FILE = 'data/OxCGRT_latest.csv'

In [3]:
import os
import urllib.request
if not os.path.exists('data'):
    os.mkdir('data')
urllib.request.urlretrieve(DATA_URL, DATA_FILE)

('data/OxCGRT_latest.csv', <http.client.HTTPMessage at 0x7f0509d150d0>)

In [4]:
# Load historical data from local file
df = pd.read_csv(DATA_FILE, 
                 parse_dates=['Date'],
                 encoding="ISO-8859-1",
                 dtype={"RegionName": str,
                        "RegionCode": str},
                 on_bad_lines='skip')  # replaces error_bad_lines=False, which was removed in pandas 2.0

In [5]:
df.columns

Index(['CountryName', 'CountryCode', 'RegionName', 'RegionCode',
       'Jurisdiction', 'Date', 'C1_School closing', 'C1_Flag',
       'C2_Workplace closing', 'C2_Flag', 'C3_Cancel public events', 'C3_Flag',
       'C4_Restrictions on gatherings', 'C4_Flag', 'C5_Close public transport',
       'C5_Flag', 'C6_Stay at home requirements', 'C6_Flag',
       'C7_Restrictions on internal movement', 'C7_Flag',
       'C8_International travel controls', 'E1_Income support', 'E1_Flag',
       'E2_Debt/contract relief', 'E3_Fiscal measures',
       'E4_International support', 'H1_Public information campaigns',
       'H1_Flag', 'H2_Testing policy', 'H3_Contact tracing',
       'H4_Emergency investment in healthcare', 'H5_Investment in vaccines',
       'H6_Facial Coverings', 'H6_Flag', 'H7_Vaccination policy', 'H7_Flag',
       'M1_Wildcard', 'ConfirmedCases', 'ConfirmedDeaths', 'StringencyIndex',
       'StringencyIndexForDisplay', 'StringencyLegacyIndex',
       'StringencyLegacyIndexForDispla

In [6]:
# For testing, restrict training data to that before a hypothetical predictor submission date
# HYPOTHETICAL_SUBMISSION_DATE = np.datetime64("2020-07-31")
HYPOTHETICAL_SUBMISSION_DATE = np.datetime64("2020-05-06")
df = df[df.Date <= HYPOTHETICAL_SUBMISSION_DATE]

In [7]:
# Add GeoID column that combines CountryName and RegionName for easier manipulation of data
df['GeoID'] = df['CountryName'] + '__' + df['RegionName'].astype(str)

In [8]:
# Add new cases column
df['NewCases'] = df.groupby('GeoID').ConfirmedCases.diff().fillna(0)
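
As a small illustration of why the difference is taken per `GeoID`, the sketch below applies the same expression to a toy frame (the values are made up for the example). Without the `groupby`, the first row of region `B` would be differenced against the last row of region `A`.

```python
import pandas as pd

toy = pd.DataFrame({
    'GeoID': ['A', 'A', 'A', 'B', 'B'],
    'ConfirmedCases': [1.0, 4.0, 9.0, 100.0, 103.0],
})
# diff() runs within each GeoID, so the first row of each region is NaN, then filled with 0
toy['NewCases'] = toy.groupby('GeoID').ConfirmedCases.diff().fillna(0)
print(toy.NewCases.tolist())  # [0.0, 3.0, 5.0, 0.0, 3.0]
```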

In [9]:
# Keep only columns of interest
id_cols = ['CountryName',
           'RegionName',
           'GeoID',
           'Date']
cases_col = ['NewCases']
npi_cols = ['C1_School closing',
            'C2_Workplace closing',
            'C3_Cancel public events',
            'C4_Restrictions on gatherings',
            'C5_Close public transport',
            'C6_Stay at home requirements',
            'C7_Restrictions on internal movement',
            'C8_International travel controls',
            'H1_Public information campaigns',
            'H2_Testing policy',
            'H3_Contact tracing',
            'H6_Facial Coverings']
df = df[id_cols + cases_col + npi_cols]

In [10]:
# Fill any missing case values by interpolation and setting NaNs to 0
df.update(df.groupby('GeoID').NewCases.apply(
    lambda group: group.interpolate()).fillna(0))

In [11]:
# Fill any missing NPIs by assuming they are the same as previous day
for npi_col in npi_cols:
    df.update(df.groupby('GeoID')[npi_col].ffill().fillna(0))
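
The effect of the forward fill can be seen on a toy frame (made-up values, for illustration only). Gaps take the previous day's value within the same region, and any gap at the start of a region, where there is no previous value, becomes 0.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'GeoID': ['A', 'A', 'A', 'B', 'B'],
    'C1_School closing': [np.nan, 2.0, np.nan, np.nan, 1.0],
})
# Forward-fill within each region, then set leading gaps to 0
filled = toy.groupby('GeoID')['C1_School closing'].ffill().fillna(0)
print(filled.tolist())  # [0.0, 2.0, 2.0, 0.0, 1.0]
```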

In [12]:
df

Unnamed: 0,CountryName,RegionName,GeoID,Date,NewCases,C1_School closing,C2_Workplace closing,C3_Cancel public events,C4_Restrictions on gatherings,C5_Close public transport,C6_Stay at home requirements,C7_Restrictions on internal movement,C8_International travel controls,H1_Public information campaigns,H2_Testing policy,H3_Contact tracing,H6_Facial Coverings
0,Aruba,,Aruba__nan,2020-01-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Aruba,,Aruba__nan,2020-01-02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Aruba,,Aruba__nan,2020-01-03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Aruba,,Aruba__nan,2020-01-04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Aruba,,Aruba__nan,2020-01-05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
91202,Zimbabwe,,Zimbabwe__nan,2020-05-02,-6.0,3.0,3.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,0.0
91203,Zimbabwe,,Zimbabwe__nan,2020-05-03,0.0,3.0,3.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,0.0
91204,Zimbabwe,,Zimbabwe__nan,2020-05-04,0.0,3.0,3.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,4.0
91205,Zimbabwe,,Zimbabwe__nan,2020-05-05,0.0,3.0,3.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,4.0


In [13]:
# Set number of past days to use to make predictions
nb_lookback_days = 21

# Create training data across all countries for predicting one day ahead
X_cols = cases_col + npi_cols
y_col = cases_col
X_samples = []
y_samples = []
geo_ids = df.GeoID.unique()
for g in geo_ids:
    gdf = df[df.GeoID == g]
    all_case_data = np.array(gdf[cases_col])
    all_npi_data = np.array(gdf[npi_cols])

    # Create one sample for each day where we have enough data
    # Each sample consists of cases and npis for previous nb_lookback_days
    nb_total_days = len(gdf)
    for d in range(nb_lookback_days, nb_total_days - 1):
        X_cases = all_case_data[d-nb_lookback_days:d]

        # Take negative of npis to support positive
        # weight constraint in Lasso.
        X_npis = -all_npi_data[d - nb_lookback_days:d]

        # Flatten all input data so it fits Lasso input format.
        X_sample = np.concatenate([X_cases.flatten(),
                                   X_npis.flatten()])
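        # Note: the input window covers days d - nb_lookback_days .. d - 1,
        # while the target below is day d + 1 (day d itself is skipped).
        y_sample = all_case_data[d + 1]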
        X_samples.append(X_sample)
        y_samples.append(y_sample)

X_samples = np.array(X_samples)
y_samples = np.array(y_samples).flatten()
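
To check the layout of one flattened sample, the sketch below rebuilds a single feature vector for a toy region with 30 days of data (the toy arrays are assumptions for the example). With 21 lookback days and 12 NPI columns, each sample has 21 + 21 × 12 = 273 features: the case counts first, then the negated NPIs.

```python
import numpy as np

nb_lookback_days = 21
nb_npis = 12  # length of npi_cols above

# Toy region with 30 days of data
cases = np.arange(30, dtype=float).reshape(-1, 1)
npis = np.ones((30, nb_npis))

d = nb_lookback_days
X_sample = np.concatenate([cases[d - nb_lookback_days:d].flatten(),
                           -npis[d - nb_lookback_days:d].flatten()])
n_features = X_sample.shape[0]
# Same loop bounds as the cell above: one sample per eligible day
n_samples = len(range(nb_lookback_days, len(cases) - 1))
print(n_features, n_samples)  # 273 8
```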

In [14]:
# Helpful function to compute mae
def mae(pred, true):
    return np.mean(np.abs(pred - true))

In [23]:
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X_samples,
                                                    y_samples,
                                                    test_size=0.2)

In [37]:
from sklearn.model_selection import GridSearchCV

xgb1 = xgb.XGBRegressor()
parameters = {'nthread': [4],  # with hyperthreading, xgboost may become slower
              'objective': ['reg:linear'],  # deprecated name; newer xgboost uses 'reg:squarederror'
              'learning_rate': [.03, 0.05, .07],  # the so-called `eta` value
              'max_depth': [5, 6, 7],
              'min_child_weight': [4],
              'silent': [1],  # superseded by `verbosity` in newer xgboost releases
              'subsample': [0.7],
              'colsample_bytree': [0.7],
              'n_estimators': [500]}

xgb_grid = GridSearchCV(xgb1,
                        parameters,
                        cv = 2,
                        n_jobs = 5,
                        verbose=True)

xgb_grid.fit(X_train, y_train)

print(xgb_grid.best_score_)
print(xgb_grid.best_params_)

Fitting 2 folds for each of 9 candidates, totalling 18 fits


[Parallel(n_jobs=5)]: Using backend LokyBackend with 5 concurrent workers.
[Parallel(n_jobs=5)]: Done  18 out of  18 | elapsed:  4.6min finished


0.9218636602684334
{'colsample_bytree': 0.7, 'learning_rate': 0.05, 'max_depth': 5, 'min_child_weight': 4, 'n_estimators': 500, 'nthread': 4, 'objective': 'reg:linear', 'silent': 1, 'subsample': 0.7}
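
The "9 candidates, totalling 18 fits" line in the log follows from the grid itself. Only `learning_rate` and `max_depth` have more than one value, giving 3 × 3 combinations, each fitted once per fold. The sketch below enumerates the grid with the standard library rather than scikit-learn, purely to show the arithmetic. Note also that, with no `scoring` argument, `best_score_` is the estimator's default score (R² for a regressor), not the MAE computed later.

```python
from itertools import product

parameters = {'nthread': [4],
              'objective': ['reg:linear'],
              'learning_rate': [.03, 0.05, .07],
              'max_depth': [5, 6, 7],
              'min_child_weight': [4],
              'silent': [1],
              'subsample': [0.7],
              'colsample_bytree': [0.7],
              'n_estimators': [500]}
cv_folds = 2

# One candidate per combination of values, as in an exhaustive grid search
candidates = [dict(zip(parameters, values))
              for values in product(*parameters.values())]
n_fits = len(candidates) * cv_folds
print(len(candidates), n_fits)  # 9 18
```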


In [44]:
model = xgb.XGBRegressor(colsample_bytree=0.7, learning_rate=0.05, max_depth=5,
                         min_child_weight=4, n_estimators=500, nthread=4,
                         objective='reg:linear', silent=1, subsample=0.7)
model.fit(X_train, y_train)

XGBRegressor(base_score=0.5, booster=None, colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=0.7, gamma=0, gpu_id=-1,
             importance_type='gain', interaction_constraints=None,
             learning_rate=0.05, max_delta_step=0, max_depth=5,
             min_child_weight=4, missing=nan, monotone_constraints=None,
             n_estimators=500, n_jobs=4, nthread=4, num_parallel_tree=1,
             objective='reg:linear', random_state=0, reg_alpha=0, reg_lambda=1,
             scale_pos_weight=1, silent=1, subsample=0.7, tree_method=None,
             validate_parameters=False, verbosity=None)

In [45]:
# Evaluate model
train_preds = model.predict(X_train)
train_preds = np.maximum(train_preds, 0) # Don't predict negative cases
print('Train MAE:', mae(train_preds, y_train))

test_preds = model.predict(X_test)
test_preds = np.maximum(test_preds, 0) # Don't predict negative cases
print('Test MAE:', mae(test_preds, y_test))

Train MAE: 22.01049804495378
Test MAE: 50.55399752046113
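
The clipping step matters for the reported error. As a toy illustration with made-up numbers, the sketch below computes the MAE before and after clipping negative predictions to zero, using the same `mae` helper as above.

```python
import numpy as np


def mae(pred, true):
    return np.mean(np.abs(pred - true))


pred = np.array([10.0, -3.0, 5.0])
true = np.array([8.0, 0.0, 5.0])

raw_mae = mae(pred, true)                      # (2 + 3 + 0) / 3
clipped_mae = mae(np.maximum(pred, 0), true)   # (2 + 0 + 0) / 3
print(raw_mae, clipped_mae)
```

Since true daily case counts are almost never negative, clipping can only help on such days, which is why it is applied before both the train and test MAE are reported.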


In [46]:
# Save model to file
if not os.path.exists('models'):
    os.mkdir('models')
with open('models/model.pkl', 'wb') as model_file:
    pickle.dump(model, model_file)
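
For completeness, here is a sketch of the reverse step, loading a pickled object back. It uses a plain dictionary as a hypothetical stand-in for the fitted regressor and a temporary directory instead of `models/`, so it can be run without touching the saved model.

```python
import os
import pickle
import tempfile

toy_model = {'weights': [0.1, 0.2, 0.3]}  # stand-in for the fitted regressor

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, 'model.pkl')
    # Save, as in the cell above
    with open(model_path, 'wb') as model_file:
        pickle.dump(toy_model, model_file)
    # Load it back, as a predictor script would need to do
    with open(model_path, 'rb') as model_file:
        restored = pickle.load(model_file)

print(restored == toy_model)  # True
```

Pickled models are generally only safe to load with library versions compatible with those used to save them, so it is worth recording the xgboost version alongside the file.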

## Evaluation

Now that the predictor has been trained and saved, this section contains the functionality for evaluating it on sample evaluation data.

In [37]:
# Reload the module to get the latest changes
import predict
from importlib import reload
reload(predict)
from predict import predict_df

In [38]:
%%time
preds_df = predict_df("2020-08-01", "2020-08-31", path_to_ips_file="../../../validation/data/2020-09-30_historical_ip.csv", verbose=True)


Predicting for Aruba__nan
2020-08-01: 59.36062255525157
2020-08-02: 86.35045739959925
2020-08-03: 106.00011692462029
2020-08-04: 125.98135230643517
2020-08-05: 144.993791329175
2020-08-06: 171.40154448944634
2020-08-07: 196.68858568685908
2020-08-08: 220.2749052519795
2020-08-09: 243.89316468108422
2020-08-10: 267.6505308278968
2020-08-11: 292.3299848131692
2020-08-12: 317.28504370003645
2020-08-13: 341.60679652164356
2020-08-14: 366.35073584081016
2020-08-15: 391.3936733613328
2020-08-16: 416.7074069902301
2020-08-17: 442.25948550633194
2020-08-18: 467.93923746786567
2020-08-19: 493.8326829330999
2020-08-20: 519.9617208396685
2020-08-21: 546.3193144238694
2020-08-22: 572.902303109842
2020-08-23: 599.7001692493895
2020-08-24: 626.7204566068767
2020-08-25: 653.9698696613566
2020-08-26: 681.4498239302334
2020-08-27: 698.2984955094389
2020-08-28: 717.8646612427472
2020-08-29: 739.2639962504306
2020-08-30: 760.5868111230948
2020-08-31: 782.8017137674522

Predicting for Afghanistan__nan
...
[Verbose output continues in the same format, one block per region; the captured log is truncated.]

In [None]:
# Check the predictions
preds_df.head()

# Validation
This is exactly how the predictor will be called during the competition.  
!!! PLEASE DO NOT CHANGE THE API !!!

In [None]:
!python predict.py -s 2020-08-01 -e 2020-08-04 -ip ../../../validation/data/2020-09-30_historical_ip.csv -o predictions/2020-08-01_2020-08-04.csv

In [None]:
!head predictions/2020-08-01_2020-08-04.csv
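Beyond eyeballing the first rows, a quick programmatic check can catch missing days or negative values. The following is only a minimal sketch using the standard library; the column names (`CountryName`, `RegionName`, `Date`, `PredictedDailyNewCases`) are assumptions about the prediction file, `check_prediction_rows` is a hypothetical helper, and the sample rows are made up. It does not replace the official validation used later in this notebook.

```python
import csv
import io
from datetime import date, timedelta

# Assumed column names; adjust them to match the real prediction file.
GEO_COLUMNS = ("CountryName", "RegionName")
DATE_COLUMN = "Date"
PRED_COLUMN = "PredictedDailyNewCases"


def check_prediction_rows(csv_text, start, end):
    """Return a list of human-readable problems found in a prediction CSV."""
    expected_days = {start + timedelta(days=i)
                     for i in range((end - start).days + 1)}
    seen = {}  # (country, region) -> set of dates that have a prediction
    problems = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        geo = tuple(row[c] for c in GEO_COLUMNS)
        day = date.fromisoformat(row[DATE_COLUMN])
        seen.setdefault(geo, set()).add(day)
        if float(row[PRED_COLUMN]) < 0:
            problems.append(f"{geo} {day}: negative prediction")
    for geo, days in seen.items():
        missing = sorted(expected_days - days)
        if missing:
            problems.append(
                f"{geo}: missing {len(missing)} day(s), first {missing[0]}")
    return problems


# Made-up sample: three of the four requested days are present.
sample = """CountryName,RegionName,Date,PredictedDailyNewCases
CountryA,,2020-08-01,10.0
CountryA,,2020-08-02,12.5
CountryA,,2020-08-03,11.0
"""
problems = check_prediction_rows(sample, date(2020, 8, 1), date(2020, 8, 4))
print(problems)
```

To run it on a real file, read the file contents into a string first and pass the same start and end dates that were given to `predict.py`.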

# Test cases
Now that we can generate a prediction file, let's validate a few cases.

In [None]:
import os
from covid_xprize.validation.predictor_validation import validate_submission

def validate(start_date, end_date, ip_file, output_file):
    """Generate predictions with predict.py, then print any validation errors."""
    # First, delete any potential old file
    try:
        os.remove(output_file)
    except OSError:
        pass
    
    # Then generate the prediction, calling the official API
    !python predict.py -s {start_date} -e {end_date} -ip {ip_file} -o {output_file}
    
    # And validate it
    errors = validate_submission(start_date, end_date, ip_file, output_file)
    if errors:
        for error in errors:
            print(error)
    else:
        print("All good!")

## 4 days, no gap
- All countries and regions
- The official number of cases is known up to start_date
- The intervention plans are the official ones

In [None]:
validate(start_date="2020-08-01",
         end_date="2020-08-04",
         ip_file="../../../validation/data/2020-09-30_historical_ip.csv",
         output_file="predictions/val_4_days.csv")

## 1 month in the future
- Two countries only
- There is a gap between the date of the last known number of cases and start_date
- For future dates, the intervention plans contain scenarios for which predictions are requested, to answer the question: what will happen if we apply these plans?

In [None]:
%%time
validate(start_date="2021-01-01",
         end_date="2021-01-31",
         ip_file="../../../validation/data/future_ip.csv",
         output_file="predictions/val_1_month_future.csv")

## 180 days, from a future date, all countries and regions
- The prediction start date is 1 week from now (i.e. assuming the submission date is 1 week from now).  
- The prediction end date is 6 months after the start date.  
- Predictions are requested for all available countries and regions.  
- Intervention plan scenario: freeze the last known intervention plans for each country and region.  

The number of cases between today and the start date is not known yet, but the model relies on it as input. The model therefore has to predict the cases for those days first and then use its own predictions as inputs.  
This is the most demanding test. It should take less than 1 hour to generate the prediction file.
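Bridging the gap between the last known case data and the start date can be sketched as a day-by-day roll-out in which each prediction is fed back as input for the next day. This is an illustration only: `roll_out` and `mean_of_window` are hypothetical names, the mean-of-window step is a stand-in rather than the model trained in this notebook, and the case numbers are made up.

```python
from datetime import date, timedelta


def mean_of_window(window):
    """Hypothetical one-step model: the next day equals the window mean."""
    return sum(window) / len(window)


def roll_out(known_cases, last_known, start, end, step_fn, lookback=7):
    """Predict day by day from the day after `last_known` through `end`.

    Predictions for gap days (before `start`) are fed back as inputs
    but are left out of the returned dictionary.
    """
    history = list(known_cases)
    results = {}
    day = last_known + timedelta(days=1)
    while day <= end:
        pred = step_fn(history[-lookback:])
        history.append(pred)  # feed the prediction back as if it were data
        if day >= start:
            results[day] = pred
        day += timedelta(days=1)
    return results


known = [100.0] * 7  # made-up daily new cases, ending on last_known
preds = roll_out(known, last_known=date(2020, 12, 1),
                 start=date(2020, 12, 8), end=date(2020, 12, 10),
                 step_fn=mean_of_window)
print(sorted(preds))
```

Because errors compound when predictions are reused as inputs, longer gaps and longer horizons generally make the final numbers less reliable, which is part of why this test is demanding.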

### Generate the scenario

In [None]:
from datetime import datetime, timedelta

start_date = datetime.now() + timedelta(days=7)
start_date_str = start_date.strftime('%Y-%m-%d')
end_date = start_date + timedelta(days=180)
end_date_str = end_date.strftime('%Y-%m-%d')
print(f"Start date: {start_date_str}")
print(f"End date: {end_date_str}")

In [None]:
from covid_xprize.validation.scenario_generator import get_raw_data, generate_scenario, NPI_COLUMNS
DATA_FILE = 'data/OxCGRT_latest.csv'
latest_df = get_raw_data(DATA_FILE, latest=True)
scenario_df = generate_scenario(start_date_str, end_date_str, latest_df, countries=None, scenario="Freeze")
scenario_file = "predictions/180_days_future_scenario.csv"
scenario_df.to_csv(scenario_file, index=False)
print(f"Saved scenario to {scenario_file}")
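Conceptually, freezing the last known intervention plans means repeating the most recent NPI values for every future day. The sketch below shows that idea for a single country or region using plain Python; it is not the implementation of `generate_scenario`, `freeze_last_known` is a hypothetical helper, and the NPI levels are made up.

```python
from datetime import date, timedelta


def freeze_last_known(rows, end):
    """Extend `rows` (sorted by date) through `end` by repeating the last NPIs.

    Each row is a (day, npis) pair where `npis` maps a column name to a level.
    """
    extended = list(rows)
    last_day, last_npis = rows[-1]
    day = last_day + timedelta(days=1)
    while day <= end:
        extended.append((day, dict(last_npis)))  # copy so rows stay independent
        day += timedelta(days=1)
    return extended


# Made-up history for one country or region.
history = [
    (date(2020, 11, 29), {"C1_School closing": 2, "C2_Workplace closing": 1}),
    (date(2020, 11, 30), {"C1_School closing": 3, "C2_Workplace closing": 1}),
]
scenario = freeze_last_known(history, end=date(2020, 12, 3))
for day, npis in scenario:
    print(day, npis)
```

For a full scenario, the same step would be applied separately to each country and region.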

### Check it

In [None]:
%%time
validate(start_date=start_date_str,
         end_date=end_date_str,
         ip_file=scenario_file,
         output_file="predictions/val_6_month_future.csv")