# Example Predictor: Linear Rollout Predictor

This example contains basic functionality for training and evaluating a linear predictor that rolls out predictions day-by-day.

First, a training data set is created from historical case and NPI (non-pharmaceutical intervention) data.

Second, a linear model is trained to predict future cases from prior case data along with prior and future NPI data.
The model is an off-the-shelf sklearn Lasso model that uses a positive weight constraint, applied to negated NPI values, to enforce the assumption that stricter NPIs are negatively correlated with future cases.

Third, a sample evaluation set is created, and the predictor is applied to this evaluation set to produce prediction results in the correct format.

## Training

In [1]:
import pickle
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

### Copy the data locally

In [3]:
# Main source for the training data
DATA_URL = 'https://raw.githubusercontent.com/OxCGRT/covid-policy-tracker/master/data/OxCGRT_latest.csv'
# Local file
DATA_FILE = 'data/OxCGRT_latest.csv'

In [4]:
import os
import urllib.request
if not os.path.exists('data'):
    os.mkdir('data')
urllib.request.urlretrieve(DATA_URL, DATA_FILE)

('data/OxCGRT_latest.csv', <http.client.HTTPMessage at 0x23e3458bdc0>)

In [5]:
# Load historical data from local file
df = pd.read_csv(DATA_FILE, 
                 parse_dates=['Date'],
                 encoding="ISO-8859-1",
                 dtype={"RegionName": str,
                        "RegionCode": str},
                 on_bad_lines='skip')  # replaces error_bad_lines=False (removed in pandas 2.0)

In [6]:
df.columns

Index(['CountryName', 'CountryCode', 'RegionName', 'RegionCode',
       'Jurisdiction', 'Date', 'C1_School closing', 'C1_Flag',
       'C2_Workplace closing', 'C2_Flag', 'C3_Cancel public events', 'C3_Flag',
       'C4_Restrictions on gatherings', 'C4_Flag', 'C5_Close public transport',
       'C5_Flag', 'C6_Stay at home requirements', 'C6_Flag',
       'C7_Restrictions on internal movement', 'C7_Flag',
       'C8_International travel controls', 'E1_Income support', 'E1_Flag',
       'E2_Debt/contract relief', 'E3_Fiscal measures',
       'E4_International support', 'H1_Public information campaigns',
       'H1_Flag', 'H2_Testing policy', 'H3_Contact tracing',
       'H4_Emergency investment in healthcare', 'H5_Investment in vaccines',
       'H6_Facial Coverings', 'H6_Flag', 'M1_Wildcard', 'ConfirmedCases',
       'ConfirmedDeaths', 'StringencyIndex', 'StringencyIndexForDisplay',
       'StringencyLegacyIndex', 'StringencyLegacyIndexForDisplay',
       'GovernmentResponseIndex', 'Gove

In [7]:
# For testing, restrict training data to that before a hypothetical predictor submission date
HYPOTHETICAL_SUBMISSION_DATE = np.datetime64("2020-07-31")
df = df[df.Date <= HYPOTHETICAL_SUBMISSION_DATE]

In [8]:
# Add GeoID column that combines CountryName and RegionName for easier manipulation of data
df['GeoID'] = df['CountryName'] + '__' + df['RegionName'].astype(str)

In [9]:
# Add new cases column
df['NewCases'] = df.groupby('GeoID').ConfirmedCases.diff().fillna(0)
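
To see what the grouped difference does, here is a small sketch on a made-up two-region frame (the region names and counts are illustrative, not taken from the OxCGRT data). `diff()` runs within each `GeoID`, so the first row of every region has no predecessor and is filled with 0.

```python
import pandas as pd

# Toy cumulative counts for two hypothetical regions (illustrative values only)
toy = pd.DataFrame({
    'GeoID': ['A__nan', 'A__nan', 'A__nan', 'B__nan', 'B__nan', 'B__nan'],
    'ConfirmedCases': [10, 15, 25, 100, 100, 130],
})

# Daily new cases = difference of cumulative cases within each region
toy['NewCases'] = toy.groupby('GeoID').ConfirmedCases.diff().fillna(0)
print(toy)
```

Without the `groupby`, the first row of region B would be differenced against the last row of region A.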

In [10]:
# Keep only columns of interest
id_cols = ['CountryName',
           'RegionName',
           'GeoID',
           'Date']
cases_col = ['NewCases']
npi_cols = ['C1_School closing',
            'C2_Workplace closing',
            'C3_Cancel public events',
            'C4_Restrictions on gatherings',
            'C5_Close public transport',
            'C6_Stay at home requirements',
            'C7_Restrictions on internal movement',
            'C8_International travel controls',
            'H1_Public information campaigns',
            'H2_Testing policy',
            'H3_Contact tracing',
            'H6_Facial Coverings']
df = df[id_cols + cases_col + npi_cols]

In [11]:
# Fill any missing case values by interpolation and setting NaNs to 0
# (transform keeps the original row index, so df.update aligns correctly)
df.update(df.groupby('GeoID').NewCases.transform(
    lambda group: group.interpolate()).fillna(0))

In [12]:
# Fill any missing NPIs by assuming they are the same as previous day
for npi_col in npi_cols:
    df.update(df.groupby('GeoID')[npi_col].ffill().fillna(0))
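
The two gap-filling steps can be checked on a toy frame. This is a minimal sketch with made-up values; it assigns the filled columns directly instead of using `df.update`, which is a stylistic variant and not the notebook's exact code.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps (values are illustrative only)
toy = pd.DataFrame({
    'GeoID': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
    'NewCases': [2.0, np.nan, 6.0, 8.0, np.nan, 4.0, np.nan],
    'C1_School closing': [np.nan, 1.0, np.nan, 2.0, 3.0, np.nan, np.nan],
})

# Linear interpolation inside each region. Interior gaps are interpolated,
# trailing gaps take the last known value, and leading gaps stay NaN,
# so they are set to 0 afterwards.
toy['NewCases'] = (toy.groupby('GeoID')['NewCases']
                      .transform(lambda s: s.interpolate())
                      .fillna(0))

# Carry the last known NPI value forward inside each region;
# leading gaps have nothing to carry forward and become 0.
toy['C1_School closing'] = (toy.groupby('GeoID')['C1_School closing']
                               .ffill()
                               .fillna(0))
print(toy)
```

Grouping by `GeoID` matters in both steps: without it, values from the end of one region would leak into the start of the next.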

In [13]:
df

Unnamed: 0,CountryName,RegionName,GeoID,Date,NewCases,C1_School closing,C2_Workplace closing,C3_Cancel public events,C4_Restrictions on gatherings,C5_Close public transport,C6_Stay at home requirements,C7_Restrictions on internal movement,C8_International travel controls,H1_Public information campaigns,H2_Testing policy,H3_Contact tracing,H6_Facial Coverings
0,Aruba,,Aruba__nan,2020-01-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Aruba,,Aruba__nan,2020-01-02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Aruba,,Aruba__nan,2020-01-03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Aruba,,Aruba__nan,2020-01-04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Aruba,,Aruba__nan,2020-01-05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
87064,Zimbabwe,,Zimbabwe__nan,2020-07-27,78.0,3.0,1.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,4.0
87065,Zimbabwe,,Zimbabwe__nan,2020-07-28,192.0,3.0,1.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,4.0
87066,Zimbabwe,,Zimbabwe__nan,2020-07-29,113.0,3.0,1.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,4.0
87067,Zimbabwe,,Zimbabwe__nan,2020-07-30,62.0,3.0,1.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,4.0


In [14]:
# Set number of past days to use to make predictions
nb_lookback_days = 2

# Create training data across all countries for predicting one day ahead
X_cols = cases_col + npi_cols
y_col = cases_col
X_samples = []
y_samples = []
geo_ids = df.GeoID.unique()
for g in geo_ids:
    gdf = df[df.GeoID == g]
    all_case_data = np.array(gdf[cases_col])
    all_npi_data = np.array(gdf[npi_cols])

    # Create one sample for each day where we have enough data
    # Each sample consists of cases and npis for previous nb_lookback_days
    nb_total_days = len(gdf)
    for d in range(nb_lookback_days, nb_total_days - 1):
        X_cases = all_case_data[d-nb_lookback_days:d]

        # Take negative of npis to support positive
        # weight constraint in Lasso.
        X_npis = -all_npi_data[d - nb_lookback_days:d]

        # Flatten all input data so it fits Lasso input format.
        X_sample = np.concatenate([X_cases.flatten(),
                                   X_npis.flatten()])
        # Target: new cases on day d + 1 (the lookback window ends at day d - 1)
        y_sample = all_case_data[d + 1]
        X_samples.append(X_sample)
        y_samples.append(y_sample)

X_samples = np.array(X_samples)
y_samples = np.array(y_samples).flatten()
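
The shape of the resulting samples is easier to see on a tiny example. The sketch below wraps the same windowing logic in a hypothetical helper, `make_samples` (not part of the notebook), and runs it on a made-up region with 6 days, one case column and two NPI columns.

```python
import numpy as np

def make_samples(cases, npis, nb_lookback_days):
    """Build (X, y) from one region's daily series, mirroring the loop above."""
    X, y = [], []
    for d in range(nb_lookback_days, len(cases) - 1):
        X_cases = cases[d - nb_lookback_days:d]
        # Negated so that a positive weight means "stricter NPI, fewer cases"
        X_npis = -npis[d - nb_lookback_days:d]
        X.append(np.concatenate([X_cases.flatten(), X_npis.flatten()]))
        y.append(cases[d + 1])
    return np.array(X), np.array(y).flatten()

# Hypothetical region: daily new cases 0..5 and two NPIs fixed at level 1
cases = np.arange(6, dtype=float).reshape(6, 1)
npis = np.ones((6, 2))
X, y = make_samples(cases, npis, nb_lookback_days=2)
print(X.shape)   # one row per usable day, 2 case features + 2 * 2 NPI features
print(X[0], y)
```

Each row holds the case values for the lookback window first, followed by the negated NPI values day by day. Note that the window covers days `d - nb_lookback_days` through `d - 1`, while the target is taken from day `d + 1`.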

In [15]:
# Helper function to compute the mean absolute error (MAE)
def mae(pred, true):
    return np.mean(np.abs(pred - true))
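
As a quick worked example with made-up numbers, the sketch below shows how clipping negative predictions at zero (as the evaluation cell does) changes the MAE.

```python
import numpy as np

def mae(pred, true):
    return np.mean(np.abs(pred - true))

pred = np.array([-2.0, 3.0, 10.0])   # made-up predictions, one of them negative
true = np.array([0.0, 5.0, 10.0])    # made-up targets

raw_error = mae(pred, true)                      # (2 + 2 + 0) / 3
clipped_error = mae(np.maximum(pred, 0), true)   # (0 + 2 + 0) / 3
print(raw_error, clipped_error)
```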

In [16]:
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X_samples,
                                                    y_samples,
                                                    test_size=0.5,
                                                    random_state=301)

In [None]:
# Create and fit a random forest regressor
from sklearn.ensemble import RandomForestRegressor
import time

# criterion='absolute_error' is the MAE criterion (named 'mae' before scikit-learn 1.0)
rf = RandomForestRegressor(n_estimators=10, criterion='absolute_error', max_depth=None,
                           min_samples_split=2, max_features=None)

start_time = time.time()
rf.fit(X_train, y_train)
interval = time.time() - start_time
print('Total time in seconds:', interval)
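
The evaluation and coefficient-inspection cells that follow use a linear `model` with `coef_` and `intercept_`, which matches the Lasso described in the overview. A minimal sketch of fitting such a model is shown here on synthetic data; `alpha`, `max_iter`, and all `*_demo` names are assumptions for illustration, not values taken from this notebook.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Synthetic stand-in for X_train / y_train: 2 case features, 3 negated NPI features
X_demo = np.hstack([rng.uniform(0, 100, size=(200, 2)),
                    -rng.integers(0, 4, size=(200, 3)).astype(float)])
true_w = np.array([0.3, 0.6, 5.0, 0.0, 2.0])
y_demo = X_demo @ true_w + 50.0 + rng.normal(0, 1, 200)

# positive=True forces every coefficient to be >= 0. Because the NPI inputs
# are negated, a positive weight means a stricter NPI lowers the prediction.
demo_model = Lasso(alpha=0.1, positive=True, max_iter=10000)  # assumed hyperparameters
demo_model.fit(X_demo, y_demo)
print(demo_model.coef_, demo_model.intercept_)
```

With the real data, the same call pattern would be applied to `X_train` and `y_train`.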

In [17]:
# Evaluate model
train_preds = model.predict(X_train)
train_preds = np.maximum(train_preds, 0) # Don't predict negative cases
print('Train MAE:', mae(train_preds, y_train))

test_preds = model.predict(X_test)
test_preds = np.maximum(test_preds, 0) # Don't predict negative cases
print('Test MAE:', mae(test_preds, y_test))

Train MAE: 140.7106897417153
Test MAE: 152.49541464435478


In [18]:
# Inspect the learned feature coefficients for the model
# to see what features it's paying attention to.

# Give names to the features
x_col_names = []
for d in range(-nb_lookback_days, 0):
    x_col_names.append('Day ' + str(d) + ' ' + cases_col[0])
for d in range(-nb_lookback_days, 0):
    for col_name in npi_cols:
        x_col_names.append('Day ' + str(d) + ' ' + col_name)

# View non-zero coefficients
for (col, coeff) in zip(x_col_names, list(model.coef_)):
    if coeff != 0.:
        print(col, coeff)
print('Intercept', model.intercept_)

Day -7 NewCases 0.0010999660607081468
Day -6 NewCases 0.4395737128463477
Day -5 NewCases 0.21709357272996888
Day -4 NewCases 0.058996470378446894
Day -3 NewCases 0.06947070311451588
Day -2 NewCases 0.051910773903622094
Day -1 NewCases 0.2384190930581341
Day -26 C6_Stay at home requirements 4.314077585030425
Day -22 C2_Workplace closing 9.71354013447354
Day -17 C2_Workplace closing 5.767515924930496
Intercept 26.55512761911598
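
The feature names only line up with the coefficients if they follow the order of the flattened training samples: all case values first, then the NPI values one day at a time. A minimal sketch with a hypothetical helper, `feature_names`, and made-up coefficients, assuming the lookback window covers days `-nb_lookback_days` through `-1`:

```python
def feature_names(cases_col, npi_cols, nb_lookback_days):
    """Names in the same order as a flattened sample: cases, then NPIs day by day."""
    names = [f'Day {d} {cases_col[0]}' for d in range(-nb_lookback_days, 0)]
    names += [f'Day {d} {col}'
              for d in range(-nb_lookback_days, 0)
              for col in npi_cols]
    return names

names = feature_names(['NewCases'],
                      ['C1_School closing', 'C2_Workplace closing'],
                      nb_lookback_days=2)

# Made-up coefficients, one per feature, to show the non-zero filter
demo_coefs = [0.4, 0.5, 0.0, 3.2, 0.0, 0.0]
non_zero = [(name, coef) for name, coef in zip(names, demo_coefs) if coef != 0.]
for name, coef in non_zero:
    print(name, coef)
```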


In [19]:
# Save model to file
if not os.path.exists('models'):
    os.mkdir('models')
with open('models/model.pkl', 'wb') as model_file:
    pickle.dump(model, model_file)
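
A saved model can be checked by loading it back and comparing predictions. The sketch below does this round trip with a tiny stand-in model in a temporary directory; the model, data and file location are illustrative assumptions.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import Lasso

# Tiny stand-in model fitted on made-up data
demo_model = Lasso(alpha=0.1, positive=True).fit(
    np.array([[0.0], [1.0], [2.0], [3.0]]), np.array([0.0, 2.0, 4.0, 6.0]))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'model.pkl')   # hypothetical location
    with open(path, 'wb') as model_file:
        pickle.dump(demo_model, model_file)
    with open(path, 'rb') as model_file:
        restored = pickle.load(model_file)

# The restored model should reproduce the original predictions
same = np.allclose(restored.predict([[4.0]]), demo_model.predict([[4.0]]))
print(same)
```

Pickled scikit-learn models are generally only safe to load with the same library version that wrote them, and only from trusted sources.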

## Evaluation

Now that the predictor has been trained and saved, this section contains the functionality for evaluating it on sample evaluation data.
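
The `predict` module used below is a local file that is not shown here. As a rough sketch of what a day-by-day rollout could look like, the hypothetical `rollout` function below feeds each prediction back in as an input for the next day. It assumes the feature window is simply the most recent `nb_lookback_days` of cases and negated NPIs; how the actual module aligns days may differ.

```python
import numpy as np

def rollout(predict_one_day, past_cases, past_npis, future_npis, nb_lookback_days):
    """Roll a one-day-ahead predictor forward, feeding each prediction back in."""
    cases = list(past_cases[-nb_lookback_days:])
    npis = [np.asarray(row, dtype=float) for row in past_npis[-nb_lookback_days:]]
    preds = []
    for day_npis in future_npis:
        X_cases = np.array(cases[-nb_lookback_days:])
        X_npis = -np.array(npis[-nb_lookback_days:])    # same negation as in training
        X = np.concatenate([X_cases.flatten(), X_npis.flatten()]).reshape(1, -1)
        pred = max(float(predict_one_day(X)[0]), 0.0)   # don't predict negative cases
        preds.append(pred)
        cases.append(pred)                              # prediction becomes an input
        npis.append(np.asarray(day_npis, dtype=float))  # planned NPIs for that day
    return preds

# Toy one-day predictor: the sum of the two most recent case values
toy_predictor = lambda X: X[:, :2].sum(axis=1)

preds = rollout(toy_predictor,
                past_cases=[1.0, 1.0],
                past_npis=[[0.0], [0.0]],
                future_npis=[[1.0], [1.0], [1.0]],
                nb_lookback_days=2)
print(preds)
```

With a fitted scikit-learn model, `model.predict` could be passed as `predict_one_day`. Because each prediction is reused as an input, errors can compound over the forecast horizon.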

In [20]:
# Reload the module to get the latest changes
import predict
from importlib import reload
reload(predict)
from predict import predict_df

In [21]:
%%time
preds_df = predict_df("2020-08-01", "2020-08-31", path_to_ips_file="../../../validation/data/2020-09-30_historical_ip.csv", verbose=True)


Predicting for Aruba__nan
2020-08-01: 58.83064662471841
2020-08-02: 71.32284867347803
2020-08-03: 78.98708861551941
2020-08-04: 90.23987091036582
2020-08-05: 87.81079198105863
2020-08-06: 98.75906970570075
2020-08-07: 131.04183736458052
2020-08-08: 147.02193631910268
2020-08-09: 158.95064405735275
2020-08-10: 169.94035162547763
2020-08-11: 177.51587340468132
2020-08-12: 193.48225636298656
2020-08-13: 216.82128801942588
2020-08-14: 234.0388040158979
2020-08-15: 248.5583794521748
2020-08-16: 261.9657420386088
2020-08-17: 275.29735675692655
2020-08-18: 293.2898014228077
2020-08-19: 314.0742201473187
2020-08-20: 332.42691393545425
2020-08-21: 349.2299624107331
2020-08-22: 365.4979344992149
2020-08-23: 382.5309899042977
2020-08-24: 407.8898919660621
2020-08-25: 430.0819325457263
2020-08-26: 450.57044796445103
2020-08-27: 465.9978584772198
2020-08-28: 484.6446272952148
2020-08-29: 515.3477978671945
2020-08-30: 541.9000831891367
2020-08-31: 566.2609038448626

Predicting for Afghanistan__nan


2020-08-02: 2777.1793731666517
2020-08-03: 3002.2532845060327
2020-08-04: 2959.1075502670433
2020-08-05: 2639.3173861240666
2020-08-06: 1674.873575540922
2020-08-07: 2514.9605970299313
2020-08-08: 2970.0581444511813
2020-08-09: 3126.5711702281747
2020-08-10: 3100.8314389444704
2020-08-11: 2834.0040623697723
2020-08-12: 2564.854958718102
2020-08-13: 2961.2970849714084
2020-08-14: 3256.7404379252716
2020-08-15: 3377.0312143574165
2020-08-16: 3363.640932693805
2020-08-17: 3234.857010850697
2020-08-18: 3208.2398625175565
2020-08-19: 3439.483387568118
2020-08-20: 3639.91741226108
2020-08-21: 3740.556528944445
2020-08-22: 3755.7379719229025
2020-08-23: 3749.1725295270558
2020-08-24: 3805.5712797194724
2020-08-25: 3970.801255014097
2020-08-26: 4123.770478132545
2020-08-27: 4218.650336524838
2020-08-28: 4269.3769743450175
2020-08-29: 4316.145834391408
2020-08-30: 4406.200249245206
2020-08-31: 4545.121766420641

Predicting for Burundi__nan
2020-08-01: 56.71148451382387
2020-08-02: 66.5469936994

2020-08-19: 1620.9646491202138
2020-08-20: 1695.6945613816024
2020-08-21: 1723.7039689120152
2020-08-22: 1719.910282776772
2020-08-23: 1720.919942001573
2020-08-24: 1778.2787837839169
2020-08-25: 1867.4463489032764
2020-08-26: 1930.6039992369529
2020-08-27: 1957.2777312436401
2020-08-28: 1975.0767186240846
2020-08-29: 2003.2450517324155
2020-08-30: 2061.036255660064
2020-08-31: 2132.0571166671166

Predicting for Belarus__nan
2020-08-01: 1204.931488291726
2020-08-02: 1465.5176483110426
2020-08-03: 1541.6147793577647
2020-08-04: 1617.1783836778868
2020-08-05: 1378.394237964058
2020-08-06: 920.1094841658811
2020-08-07: 1392.696539460197
2020-08-08: 1601.8422914573061
2020-08-09: 1680.4555130833924
2020-08-10: 1697.3100711066822
2020-08-11: 1543.4490727419052
2020-08-12: 1426.3238101842044
2020-08-13: 1648.8576157957573
2020-08-14: 1795.6606181758484
2020-08-15: 1863.4445039117431
2020-08-16: 1869.868795482892
2020-08-17: 1805.204547507959
2020-08-18: 1794.379413864639
2020-08-19: 1922.448

2020-08-03: 131.84435339908464
2020-08-04: 108.80358523131275
2020-08-05: 113.84522997669325
2020-08-06: 144.68838809659547
2020-08-07: 241.5276659877805
2020-08-08: 292.5087160248432
2020-08-09: 231.620208142748
2020-08-10: 219.07386156960308
2020-08-11: 231.06335024071993
2020-08-12: 266.6348892366243
2020-08-13: 324.94398202192326
2020-08-14: 350.0832028573628
2020-08-15: 332.8496193024895
2020-08-16: 333.21599076339504
2020-08-17: 350.5740180508514
2020-08-18: 383.32544870480797
2020-08-19: 422.17165036966526
2020-08-20: 441.73434607335804
2020-08-21: 444.2460887609155
2020-08-22: 454.4017496652266
2020-08-23: 475.34492323667627
2020-08-24: 505.04300835143795
2020-08-25: 535.4232101353365
2020-08-26: 555.449425495507
2020-08-27: 568.4301806276973
2020-08-28: 585.4407857854446
2020-08-29: 609.0183426694277
2020-08-30: 637.2788551477993
2020-08-31: 664.9227551848761

Predicting for Central African Republic__nan
2020-08-01: 52.419979739204436
2020-08-02: 67.10765315697206
2020-08-03: 

Predicting for Congo__nan
2020-08-01: 57.56692588285915
2020-08-02: 89.78934111712333
2020-08-03: 126.4903002008672
2020-08-04: 89.61098653513523
2020-08-05: 88.22949245828299
2020-08-06: 102.93374398939838
2020-08-07: 138.27118566425233
2020-08-08: 167.3828835310932
2020-08-09: 185.26003194635186
2020-08-10: 177.88519157248015
2020-08-11: 183.7064639044048
2020-08-12: 201.80457250185364
2020-08-13: 228.83359431766775
2020-08-14: 252.90325546378034
2020-08-15: 267.9350584733194
2020-08-16: 273.755490829491
2020-08-17: 285.6700080023059
2020-08-18: 310.8742933359081
2020-08-19: 335.9196643546652
2020-08-20: 358.24385694948853
2020-08-21: 380.9855248265685
2020-08-22: 395.9549308130669
2020-08-23: 424.16186575092377
2020-08-24: 451.0903751940145
2020-08-25: 477.239952424775
2020-08-26: 512.2063757440012
2020-08-27: 538.7062926234125
2020-08-28: 562.973546105963
2020-08-29: 592.3683279182429
2020-08-30: 622.0851971578744
2020-08-31: 653.0607097295165

Predicting for Colombia__nan
2020-08-

2020-08-03: 70.1525604413709
2020-08-04: 76.29632794566658
2020-08-05: 83.09882510111169
2020-08-06: 94.40781724640128
2020-08-07: 123.73402055974265
2020-08-08: 138.7887719594426
2020-08-09: 149.10961049036993
2020-08-10: 159.24041332867657
2020-08-11: 170.41962589463182
2020-08-12: 186.5611829695476
2020-08-13: 208.47440433438456
2020-08-14: 224.80168329205017
2020-08-15: 238.3655112264836
2020-08-16: 251.81309264309388
2020-08-17: 266.57989168614586
2020-08-18: 290.3366297427418
2020-08-19: 311.6964551223661
2020-08-20: 329.98726202424007
2020-08-21: 346.87817615438996
2020-08-22: 363.87212850891717
2020-08-23: 392.7083250034523
2020-08-24: 417.81427210076265
2020-08-25: 440.86017062033324
2020-08-26: 462.394437918786
2020-08-27: 474.6763542004929
2020-08-28: 495.5534621462462
2020-08-29: 522.1688209076079
2020-08-30: 547.7926062216592
2020-08-31: 572.2913047178698

Predicting for Dominica__nan
2020-08-01: 51.79226700901885
2020-08-02: 65.11825254325383
2020-08-03: 69.17773244027025

2020-08-01: 319.404093273413
2020-08-02: 428.03525965382494
2020-08-03: 413.93557371877756
2020-08-04: 401.3586362148692
2020-08-05: 370.1633570730034
2020-08-06: 283.4866169200371
2020-08-07: 423.0971787087767
2020-08-08: 494.0161802791702
2020-08-09: 501.50127818079585
2020-08-10: 499.2362559546446
2020-08-11: 479.7046898384445
2020-08-12: 471.8078037066581
2020-08-13: 545.8652291854834
2020-08-14: 594.5741623411791
2020-08-15: 611.2073235871302
2020-08-16: 617.152849887167
2020-08-17: 616.8843699841971
2020-08-18: 627.9752962023216
2020-08-19: 675.1193929675509
2020-08-20: 712.3708775545466
2020-08-21: 733.1100985136533
2020-08-22: 746.4914449315456
2020-08-23: 748.7103763856618
2020-08-24: 768.6821871257176
2020-08-25: 804.5347501321559
2020-08-26: 835.992022644578
2020-08-27: 850.3057770365801
2020-08-28: 865.4070061632331
2020-08-29: 879.3768538067208
2020-08-30: 902.906588073844
2020-08-31: 933.7462498003582

Predicting for Ethiopia__nan
2020-08-01: 475.66312136701106
2020-08-02

2020-08-05: 269.6120703938446
2020-08-06: 377.972208835257
2020-08-07: 696.9565834115118
2020-08-08: 743.4336158124139
2020-08-09: 661.1587105424164
2020-08-10: 546.6157444359656
2020-08-11: 509.068683317108
2020-08-12: 607.9504047841239
2020-08-13: 767.1914920928142
2020-08-14: 803.8441834994403
2020-08-15: 764.522147490079
2020-08-16: 715.3541536165554
2020-08-17: 718.3671698757958
2020-08-18: 799.7260692389904
2020-08-19: 891.6084220566441
2020-08-20: 922.797069131908
2020-08-21: 912.9138672893247
2020-08-22: 902.3574655657242
2020-08-23: 935.5613988223637
2020-08-24: 999.7969025303748
2020-08-25: 1062.768494087053
2020-08-26: 1094.4657459571
2020-08-27: 1100.797386698504
2020-08-28: 1114.673920963101
2020-08-29: 1152.7572435935754
2020-08-30: 1206.8107558364336
2020-08-31: 1257.6452198327274

Predicting for United Kingdom__Wales
2020-08-01: 630.6743169655472
2020-08-02: 596.6489775302208
2020-08-03: 345.83219607730376
2020-08-04: 209.51894777082686
2020-08-05: 197.23055583282667
20

2020-08-01: 364.24955612830377
2020-08-02: 433.9739312909629
2020-08-03: 410.9065096334241
2020-08-04: 415.7571726504677
2020-08-05: 362.9167032019064
2020-08-06: 292.34780770339
2020-08-07: 446.65738278292724
2020-08-08: 502.4416288104401
2020-08-09: 506.7220256664329
2020-08-10: 507.8305605847656
2020-08-11: 482.75414683082386
2020-08-12: 482.8426945411956
2020-08-13: 561.7548196954511
2020-08-14: 604.5171672248349
2020-08-15: 619.5192149191446
2020-08-16: 625.8511837477628
2020-08-17: 624.7633913138594
2020-08-18: 645.5404238245816
2020-08-19: 695.7337751597759
2020-08-20: 731.2182270711106
2020-08-21: 751.6793654408145
2020-08-22: 765.6761718911833
2020-08-23: 779.5411519179776
2020-08-24: 807.1168147779591
2020-08-25: 857.9156342089383
2020-08-26: 893.3427136430618
2020-08-27: 910.6035691967553
2020-08-28: 930.898963101184
2020-08-29: 954.1884679742112
2020-08-30: 1006.6758416068784
2020-08-31: 1052.8782238895565

Predicting for Iran__nan
2020-08-01: 9103.190404572182
2020-08-02: 

2020-08-10: 1061.3822591036014
2020-08-11: 998.6702003294114
2020-08-12: 962.3107759493533
2020-08-13: 1125.187103166401
2020-08-14: 1210.354415627382
2020-08-15: 1231.5158047297273
2020-08-16: 1226.6051160003021
2020-08-17: 1206.5737332751132
2020-08-18: 1233.1129103330331
2020-08-19: 1329.3532444433345
2020-08-20: 1394.2056110407348
2020-08-21: 1423.6550706737357
2020-08-22: 1435.8105048116759
2020-08-23: 1457.0848701922719
2020-08-24: 1501.1970672037396
2020-08-25: 1571.8134318574598
2020-08-26: 1628.1412376700569
2020-08-27: 1660.897562749065
2020-08-28: 1689.133679195957
2020-08-29: 1724.586859990016
2020-08-30: 1774.848402717445
2020-08-31: 1835.8842307536252

Predicting for Kenya__nan
2020-08-01: 985.3003248287125
2020-08-02: 1294.1758937112045
2020-08-03: 1192.6552855221298
2020-08-04: 1214.270417999688
2020-08-05: 976.9572131594747
2020-08-06: 720.7968546054323
2020-08-07: 1142.0187387026335
2020-08-08: 1328.7554031543223
2020-08-09: 1323.7531299488828
2020-08-10: 1304.2747640

2020-08-05: 723.1940256612057
2020-08-06: 442.0848621926028
2020-08-07: 628.2155103347765
2020-08-08: 730.6102801182201
2020-08-09: 722.6378071892613
2020-08-10: 705.6930037977952
2020-08-11: 758.6610373918406
2020-08-12: 692.9881923330153
2020-08-13: 782.1708045631573
2020-08-14: 846.1885707151138
2020-08-15: 857.5732885215988
2020-08-16: 869.9737071828412
2020-08-17: 892.2375757480165
2020-08-18: 899.0758435669089
2020-08-19: 956.4230154466173
2020-08-20: 1003.4390975311012
2020-08-21: 1027.1810054265395
2020-08-22: 1049.9663164428646
2020-08-23: 1084.2790869841574
2020-08-24: 1113.545974377916
2020-08-25: 1160.7112823058164
2020-08-26: 1203.0879735584604
2020-08-27: 1235.1318763582717
2020-08-28: 1267.4658326015833
2020-08-29: 1304.0265223070505
2020-08-30: 1342.2899395935751
2020-08-31: 1387.4118317851853

Predicting for Sri Lanka__nan
2020-08-01: 370.92398305495124
2020-08-02: 484.49187395353135
2020-08-03: 509.07868045240514
2020-08-04: 526.1081213692783
2020-08-05: 434.435993666

2020-08-01: 57.62281697941087
2020-08-02: 71.09903862795726
2020-08-03: 89.15204786141113
2020-08-04: 120.1908696177512
2020-08-05: 105.99841428050226
2020-08-06: 105.10727563703429
2020-08-07: 135.60040018646993
2020-08-08: 153.57587286775623
2020-08-09: 173.2340580484361
2020-08-10: 191.50248718086397
2020-08-11: 193.52830742928347
2020-08-12: 203.5983163486933
2020-08-13: 225.8386001336736
2020-08-14: 245.085110697512
2020-08-15: 264.2743958841595
2020-08-16: 280.4794693616043
2020-08-17: 291.0850919865554
2020-08-18: 311.94794027700107
2020-08-19: 333.696080193859
2020-08-20: 354.3076749317743
2020-08-21: 374.4001796128043
2020-08-22: 392.4491175575284
2020-08-23: 419.4328202640753
2020-08-24: 443.3188431968038
2020-08-25: 466.91123505532687
2020-08-26: 490.161645162421
2020-08-27: 512.9541255582646
2020-08-28: 536.4573341045643
2020-08-29: 563.3178949129452
2020-08-30: 589.548242555416
2020-08-31: 615.6182547639793

Predicting for Mexico__nan
2020-08-01: 4068.4784961285645
2020-08

2020-08-01: 807.89648584542
2020-08-02: 1156.5892124133384
2020-08-03: 1081.5409355990332
2020-08-04: 1121.2734841109855
2020-08-05: 985.0709338216931
2020-08-06: 663.6950195701662
2020-08-07: 1007.9585952836879
2020-08-08: 1204.1090421763597
2020-08-09: 1214.4043434118225
2020-08-10: 1219.811584406532
2020-08-11: 1125.9762545748008
2020-08-12: 1049.4914078324632
2020-08-13: 1220.9266436008415
2020-08-14: 1340.4661934312744
2020-08-15: 1372.9318108013172
2020-08-16: 1376.2921732470186
2020-08-17: 1339.3512683344338
2020-08-18: 1349.2871710925178
2020-08-19: 1453.112812948455
2020-08-20: 1535.7976411607
2020-08-21: 1574.5438579846955
2020-08-22: 1589.3661578623528
2020-08-23: 1602.4169491333118
2020-08-24: 1640.7345914093503
2020-08-25: 1717.463628033514
2020-08-26: 1784.3993410985327
2020-08-27: 1819.4855007032697
2020-08-28: 1848.3077773108075
2020-08-29: 1880.2493241059565
2020-08-30: 1929.2625808016496
2020-08-31: 1994.9801795323283

Predicting for Namibia__nan
2020-08-01: 93.937152

2020-08-01: 318.9054181135641
2020-08-02: 278.77947007056525
2020-08-03: 176.47586162524937
2020-08-04: 285.89026132541585
2020-08-05: 483.1009810353888
2020-08-06: 279.4179203684163
2020-08-07: 373.33717206529167
2020-08-08: 365.8144176293351
2020-08-09: 345.1200500822061
2020-08-10: 425.1000036720563
2020-08-11: 490.70350140310757
2020-08-12: 439.68745267415625
2020-08-13: 474.6923979967511
2020-08-14: 481.96990752394527
2020-08-15: 494.1065517019153
2020-08-16: 546.1764116644623
2020-08-17: 579.6419169905654
2020-08-18: 588.3774358861932
2020-08-19: 613.4417580157648
2020-08-20: 631.1401422288162
2020-08-21: 655.8890793211327
2020-08-22: 695.1320738236456
2020-08-23: 744.5726519757869
2020-08-24: 770.4788508889404
2020-08-25: 798.2776517401313
2020-08-26: 825.180246889235
2020-08-27: 861.4858672041772
2020-08-28: 903.0085748808573
2020-08-29: 945.7020211092259
2020-08-30: 979.6227728510693
2020-08-31: 1013.0413912415636

Predicting for Pakistan__nan
2020-08-01: 2263.1209915748823
20

2020-08-26: 8299.317129163417
2020-08-27: 8426.384026038104
2020-08-28: 8464.094076908606
2020-08-29: 8527.434024744047
2020-08-30: 8725.040078439144
2020-08-31: 9009.958487502292

Predicting for Paraguay__nan
2020-08-01: 701.9317284832008
2020-08-02: 807.5177631054584
2020-08-03: 738.0279618655512
2020-08-04: 803.5264578842115
2020-08-05: 651.0373131904468
2020-08-06: 499.5735098478699
2020-08-07: 786.7882978307205
2020-08-08: 872.7729929479951
2020-08-09: 872.4538633006416
2020-08-10: 883.4690596794427
2020-08-11: 809.1568109152669
2020-08-12: 792.6670278711878
2020-08-13: 930.376538681821
2020-08-14: 995.8837086949253
2020-08-15: 1015.4663864116734
2020-08-16: 1020.8386642651972
2020-08-17: 999.577853397652
2020-08-18: 1028.3460129764896
2020-08-19: 1110.366279469094
2020-08-20: 1163.4527755956587
2020-08-21: 1190.9579498479761
2020-08-22: 1205.4341621157077
2020-08-23: 1225.4593832620972
2020-08-24: 1266.456382635533
2020-08-25: 1327.5090897920563
2020-08-26: 1375.835462708354
2020

Predicting for Senegal__nan
2020-08-01: 63.5781538942601
2020-08-02: 76.12232701041901
2020-08-03: 82.53493478046839
2020-08-04: 89.88827451835158
2020-08-05: 92.91048286883185
2020-08-06: 101.52795836570115
2020-08-07: 135.28042223758632
2020-08-08: 151.3949284076912
2020-08-09: 162.1949873482468
2020-08-10: 172.35514263675822
2020-08-11: 181.65628968607024
2020-08-12: 197.2210519247036
2020-08-13: 221.10234716154227
2020-08-14: 238.31491654590468
2020-08-15: 252.25929989660403
2020-08-16: 265.551977456669
2020-08-17: 279.5284761520379
2020-08-18: 291.8040314825438
2020-08-19: 311.45385867919117
2020-08-20: 329.15086213211265
2020-08-21: 345.10183140088446
2020-08-22: 360.80547193342824
2020-08-23: 366.87596749496396
2020-08-24: 380.96789976250494
2020-08-25: 399.16775932904125
2020-08-26: 416.8502619730451
2020-08-27: 425.1600884120723
2020-08-28: 438.3932720658295
2020-08-29: 450.026800266643
2020-08-30: 465.26008790654976
2020-08-31: 482.7599071482765

Predicting for Singapore__nan

2020-08-01: 53.623838088091276
2020-08-02: 65.84175037081039
2020-08-03: 72.70755059529631
2020-08-04: 78.00242475269792
2020-08-05: 81.21977481159698
2020-08-06: 94.6558162299232
2020-08-07: 125.02209664898402
2020-08-08: 140.41554201908542
2020-08-09: 150.96614169541476
2020-08-10: 160.21644329355343
2020-08-11: 170.1674433695418
2020-08-12: 187.16325251385064
2020-08-13: 209.70183099891813
2020-08-14: 226.28518535300643
2020-08-15: 239.83963296500514
2020-08-16: 252.73868335403878
2020-08-17: 267.073492975741
2020-08-18: 291.2231344963395
2020-08-19: 312.9469729384257
2020-08-20: 331.39380690071107
2020-08-21: 348.21970226023495
2020-08-22: 364.91980827955933
2020-08-23: 393.6096880754758
2020-08-24: 418.9214445906797
2020-08-25: 442.1788822283113
2020-08-26: 463.8016382361112
2020-08-27: 480.34317514751854
2020-08-28: 502.1062862405895
2020-08-29: 529.1526724127464
2020-08-30: 555.3625866113588
2020-08-31: 580.4486819839906

Predicting for Slovak Republic__nan
2020-08-01: 1277.4855

2020-08-19: 335.39675645672565
2020-08-20: 354.94441188829944
2020-08-21: 372.263295575907
2020-08-22: 388.7378077374816
2020-08-23: 417.0010477498678
2020-08-24: 442.78803917916235
2020-08-25: 467.1603222734929
2020-08-26: 489.6234464616574
2020-08-27: 502.285923359603
2020-08-28: 523.0820593111333
2020-08-29: 549.7360707211493
2020-08-30: 576.032793856407
2020-08-31: 601.4962083968741

Predicting for Thailand__nan
2020-08-01: 56.40564521243708
2020-08-02: 68.69951310482266
2020-08-03: 76.7598414448148
2020-08-04: 81.296445855666
2020-08-05: 82.14406359509556
2020-08-06: 96.10010539835423
2020-08-07: 127.72554333587497
2020-08-08: 143.53259967198744
2020-08-09: 154.50404610978597
2020-08-10: 163.14785405880514
2020-08-11: 172.1495032802506
2020-08-12: 189.44045048673328
2020-08-13: 212.62666572715835
2020-08-14: 229.5525775038915
2020-08-15: 243.2405940286008
2020-08-16: 255.77944238548588
2020-08-17: 269.7434083926661
2020-08-18: 288.3171966860853
2020-08-19: 309.0346268353876
2020-0

2020-08-12: 184.2883762006387
2020-08-13: 206.22291412963398
2020-08-14: 222.40819858066436
2020-08-15: 235.70051728097292
2020-08-16: 248.81867112531415
2020-08-17: 263.55177687529044
2020-08-18: 276.10654477850915
2020-08-19: 294.73445166694853
2020-08-20: 311.6493018245153
2020-08-21: 327.0991279444173
2020-08-22: 342.6749844749313
2020-08-23: 348.9673805882944
2020-08-24: 362.92634018566736
2020-08-25: 380.4488700036808
2020-08-26: 397.5171253090433
2020-08-27: 405.400305605452
2020-08-28: 418.43643458096227
2020-08-29: 429.98673497305407
2020-08-30: 444.918821278731
2020-08-31: 461.8724160409075

Predicting for Uganda__nan
2020-08-01: 302.8051064271074
2020-08-02: 326.3227880400922
2020-08-03: 337.8434426703549
2020-08-04: 338.5842701840295
2020-08-05: 322.032593613171
2020-08-06: 253.80947563260787
2020-08-07: 375.29532778619705
2020-08-08: 412.7840992610344
2020-08-09: 427.56343876182814
2020-08-10: 434.1930290584874
2020-08-11: 424.2267972427295
2020-08-12: 421.79974288252197
2

2020-08-17: 13490.975554017616
2020-08-18: 13496.676440518964
2020-08-19: 14553.884718937743
2020-08-20: 15281.932513982965
2020-08-21: 15487.193290834297
2020-08-22: 15415.584389507361
2020-08-23: 15341.380002273074
2020-08-24: 15608.723123231648
2020-08-25: 16298.527497037121
2020-08-26: 16833.24267616378
2020-08-27: 17081.90014058835
2020-08-28: 17185.27489560339
2020-08-29: 17326.013883026953
2020-08-30: 17680.943207386194
2020-08-31: 18214.31960112521

Predicting for United States__Colorado
2020-08-01: 4816.339187437715
2020-08-02: 5554.52710131141
2020-08-03: 5387.306836182313
2020-08-04: 4489.18622944456
2020-08-05: 3696.5556243794886
2020-08-06: 2916.6424535002166
2020-08-07: 4890.62589477672
2020-08-08: 5506.208155997108
2020-08-09: 5386.832341754152
2020-08-10: 4914.398957076387
2020-08-11: 4436.066298345746
2020-08-12: 4410.360258734925
2020-08-13: 5340.030103247873
2020-08-14: 5746.094088605615
2020-08-15: 5706.802019991284
2020-08-16: 5469.937836363649
2020-08-17: 5278.119

2020-08-01: 1312.7151654609793
2020-08-02: 1530.8542292532698
2020-08-03: 1218.6792520751446
2020-08-04: 1185.0351126840164
2020-08-05: 1212.8252973826893
2020-08-06: 862.8813132694837
2020-08-07: 1382.9471088103214
2020-08-08: 1518.2815384780856
2020-08-09: 1410.5856419366178
2020-08-10: 1398.3186848616429
2020-08-11: 1366.0954925223364
2020-08-12: 1317.3859584936695
2020-08-13: 1554.496633826578
2020-08-14: 1642.2189468887136
2020-08-15: 1620.3027803180935
2020-08-16: 1620.7236940229996
2020-08-17: 1615.0166981189504
2020-08-18: 1653.126511053882
2020-08-19: 1783.8706196436597
2020-08-20: 1850.7125623702304
2020-08-21: 1866.3009069952645
2020-08-22: 1883.740475991237
2020-08-23: 1916.5433561289083
2020-08-24: 1975.4253626527857
2020-08-25: 2065.3225254730764
2020-08-26: 2126.029952639734
2020-08-27: 2157.593913348381
2020-08-28: 2192.794200611563
2020-08-29: 2239.5675056710174
2020-08-30: 2303.755883805517
2020-08-31: 2378.5555982467486

Predicting for United States__Illinois
2020-08

2020-08-06: 4671.700899402452
2020-08-07: 7219.902188527033
2020-08-08: 8131.554594654367
2020-08-09: 7201.777588450921
2020-08-10: 6954.401301054314
2020-08-11: 7558.423436328475
2020-08-12: 6823.412929933199
2020-08-13: 7921.992640165173
2020-08-14: 8374.820324540688
2020-08-15: 8062.97814458015
2020-08-16: 8066.458216809095
2020-08-17: 8253.04463682857
2020-08-18: 8224.597611393718
2020-08-19: 8789.749573289451
2020-08-20: 9068.744100068881
2020-08-21: 9037.806874700549
2020-08-22: 9124.190346353833
2020-08-23: 9281.463944054381
2020-08-24: 9448.147199665156
2020-08-25: 9809.190371921342
2020-08-26: 10036.488876933117
2020-08-27: 10131.428313339939
2020-08-28: 10272.85953643554
2020-08-29: 10454.012901407112
2020-08-30: 10676.37317088428
2020-08-31: 10962.451812374347

Predicting for United States__Minnesota
2020-08-01: 6186.2217908168805
2020-08-02: 6743.681821524891
2020-08-03: 6673.6802285716585
2020-08-04: 6972.658166745846
2020-08-05: 5689.109965029104
2020-08-06: 3980.45761627

2020-08-15: 4651.257341480736
2020-08-16: 4556.902218543754
2020-08-17: 4356.862592269107
2020-08-18: 4384.352085223666
2020-08-19: 4737.153017306558
2020-08-20: 4991.012941440548
2020-08-21: 5092.746026091134
2020-08-22: 5071.539255188205
2020-08-23: 5037.859420121641
2020-08-24: 5139.227389407225
2020-08-25: 5376.39935705968
2020-08-26: 5568.680703024176
2020-08-27: 5667.971636031294
2020-08-28: 5707.561107646672
2020-08-29: 5756.682729738115
2020-08-30: 5884.701271853238
2020-08-31: 6072.490509088854

Predicting for United States__New Mexico
2020-08-01: 2737.948975465311
2020-08-02: 2828.0603397357145
2020-08-03: 2565.860570583326
2020-08-04: 2570.492415352447
2020-08-05: 2147.332801033023
2020-08-06: 1638.0316929579453
2020-08-07: 2700.106829385356
2020-08-08: 2883.464025314793
2020-08-09: 2807.8151313610556
2020-08-10: 2752.915737214704
2020-08-11: 2514.7255734027226
2020-08-12: 2466.8775116872107
2020-08-13: 2940.933833083181
2020-08-14: 3097.0323340742157
2020-08-15: 3096.511526

2020-08-12: 1532.542018578823
2020-08-13: 1839.9287971049407
2020-08-14: 1990.6150301231532
2020-08-15: 1985.8913681312247
2020-08-16: 1926.6881813564983
2020-08-17: 1875.0770791166092
2020-08-18: 1927.9398232595
2020-08-19: 2101.2874749454763
2020-08-20: 2203.833117475684
2020-08-21: 2223.1448272260573
2020-08-22: 2210.99983071059
2020-08-23: 2215.181691598962
2020-08-24: 2283.752663209939
2020-08-25: 2399.133194965614
2020-08-26: 2479.2351872180957
2020-08-27: 2510.983951316491
2020-08-28: 2530.3631073652637
2020-08-29: 2565.7145511820286
2020-08-30: 2637.275309473208
2020-08-31: 2727.574532144269

Predicting for United States__South Dakota
2020-08-01: 978.5313692485526
2020-08-02: 1221.8534587785316
2020-08-03: 1128.0478021315357
2020-08-04: 996.3389518955786
2020-08-05: 834.5036066477986
2020-08-06: 665.0921177150578
2020-08-07: 1083.7151325091686
2020-08-08: 1243.3851384902691
2020-08-09: 1212.3077121768824
2020-08-10: 1139.1418646040288
2020-08-11: 1047.8125599664577
2020-08-12: 

2020-08-13: 6065.871243607649
2020-08-14: 6491.471958123582
2020-08-15: 6348.725982735381
2020-08-16: 6005.1631402236835
2020-08-17: 5807.505698872053
2020-08-18: 6061.429614424781
2020-08-19: 6670.112424649934
2020-08-20: 6951.755218381215
2020-08-21: 6919.616098144852
2020-08-22: 6789.751681068897
2020-08-23: 6780.459545411834
2020-08-24: 7029.427951714276
2020-08-25: 7406.369655223323
2020-08-26: 7618.351926059033
2020-08-27: 7658.883166640768
2020-08-28: 7661.287815526995
2020-08-29: 7750.751941046445
2020-08-30: 7978.790030273762
2020-08-31: 8252.348578931982

Predicting for United States__West Virginia
2020-08-01: 944.5232004803574
2020-08-02: 1091.0930242391287
2020-08-03: 1081.029251873499
2020-08-04: 956.7846684009315
2020-08-05: 746.9565228976969
2020-08-06: 623.6404622204207
2020-08-07: 1020.4294902183832
2020-08-08: 1150.0056423616884
2020-08-09: 1149.3154746754412
2020-08-10: 1075.9892598683205
2020-08-11: 971.7392356399649
2020-08-12: 982.3772532653348
2020-08-13: 1176.77

2020-08-27: 444.77041853320804
2020-08-28: 461.7595235454223
2020-08-29: 480.9293160491151
2020-08-30: 501.35741751775276
2020-08-31: 522.2219490701939

Predicting for South Africa__nan
2020-08-01: 2352.9530736819106
2020-08-02: 2707.796992781089
2020-08-03: 2894.391731333827
2020-08-04: 2704.097581188429
2020-08-05: 2173.294272101071
2020-08-06: 1583.324792231194
2020-08-07: 2521.7326858524516
2020-08-08: 2865.8160561665004
2020-08-09: 2965.36528633069
2020-08-10: 2838.670051756688
2020-08-11: 2531.2825529187944
2020-08-12: 2422.4374978771875
2020-08-13: 2864.1502655359477
2020-08-14: 3108.8775141146157
2020-08-15: 3181.0914780275943
2020-08-16: 3112.9630249348625
2020-08-17: 2984.6410654556307
2020-08-18: 3023.4418362421425
2020-08-19: 3272.7331342053626
2020-08-20: 3444.987884170018
2020-08-21: 3511.344886400539
2020-08-22: 3497.9891735337455
2020-08-23: 3486.5785641303814
2020-08-24: 3568.971229050586
2020-08-25: 3738.0298581302136
2020-08-26: 3871.4316378265207
2020-08-27: 3943.52

In [22]:
# Check the predictions
preds_df.head()

Unnamed: 0,CountryName,RegionName,Date,PredictedDailyNewCases
213,Aruba,,2020-08-01,58.830647
214,Aruba,,2020-08-02,71.322849
215,Aruba,,2020-08-03,78.987089
216,Aruba,,2020-08-04,90.239871
217,Aruba,,2020-08-05,87.810792


# Validation
During the competition, the predictor will be called exactly as shown below.  
!!! PLEASE DO NOT CHANGE THE API !!!

In [23]:
!python predict.py -s 2020-08-01 -e 2020-08-04 -ip ../../../validation/data/2020-09-30_historical_ip.csv -o predictions/2020-08-01_2020-08-04.csv

Generating predictions from 2020-08-01 to 2020-08-04...
Saved predictions to predictions/2020-08-01_2020-08-04.csv
Done!
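
For reference, the command-line interface exercised above can be sketched with `argparse`. This is a minimal sketch and not the actual `predict.py`: only the short flags `-s`, `-e`, `-ip` and `-o` are taken from the call above, while the `dest` names and help strings are assumptions chosen for illustration.

```python
import argparse


def build_parser():
    """Build a parser that accepts the four flags used in the call above."""
    parser = argparse.ArgumentParser(description="Sketch of a predict.py-style CLI")
    # The dest names below are illustrative assumptions, not the real script's names
    parser.add_argument("-s", dest="start_date", required=True,
                        help="First day to predict, as YYYY-MM-DD")
    parser.add_argument("-e", dest="end_date", required=True,
                        help="Last day to predict, as YYYY-MM-DD")
    parser.add_argument("-ip", dest="ip_file", required=True,
                        help="Path to the intervention plan CSV")
    parser.add_argument("-o", dest="output_file", required=True,
                        help="Path of the CSV file to write predictions to")
    return parser


# Parse the same kind of argument list as the notebook cell passes on the command line
args = build_parser().parse_args(
    ["-s", "2020-08-01", "-e", "2020-08-04", "-ip", "ip.csv", "-o", "out.csv"]
)
print(args.start_date, args.end_date, args.ip_file, args.output_file)
```

Keeping all four flags required mirrors the way the predictor is invoked here: every call in this notebook supplies a start date, an end date, an intervention plan file and an output path.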


In [24]:
!head predictions/2020-08-01_2020-08-04.csv

CountryName,RegionName,Date,PredictedDailyNewCases
Aruba,,2020-08-01,58.83064662471841
Aruba,,2020-08-02,71.32284867347803
Aruba,,2020-08-03,78.98708861551941
Aruba,,2020-08-04,90.23987091036582
Afghanistan,,2020-08-01,149.7731977908312
Afghanistan,,2020-08-02,287.29069834944903
Afghanistan,,2020-08-03,277.525581745693
Afghanistan,,2020-08-04,264.8071924260645
Angola,,2020-08-01,198.69836835573278
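
The file layout shown above (one row per country/region and day, with four columns) can be checked by hand before running the official validation. The following is a minimal sketch of such a check, assuming the column names printed above; `check_predictions` is a hypothetical helper and is not the logic used by `validate_submission`.

```python
import csv
import io
import math
from datetime import date, timedelta

EXPECTED_COLUMNS = ["CountryName", "RegionName", "Date", "PredictedDailyNewCases"]


def check_predictions(csv_text, start_date, end_date):
    """Return a list of problems found in a predictions CSV (empty list means OK)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        return [f"Unexpected columns: {reader.fieldnames}"]

    n_days = (end_date - start_date).days + 1
    expected_dates = {(start_date + timedelta(days=i)).isoformat() for i in range(n_days)}

    problems = []
    seen = {}  # (country, region) -> set of dates that have a prediction
    for row in reader:
        key = (row["CountryName"], row["RegionName"])
        seen.setdefault(key, set()).add(row["Date"])
        try:
            value = float(row["PredictedDailyNewCases"])
        except ValueError:
            problems.append(f"{key} {row['Date']}: prediction is not a number")
            continue
        if math.isnan(value) or value < 0:
            problems.append(f"{key} {row['Date']}: prediction is NaN or negative")

    # Every country/region must cover every requested day
    for key, dates in seen.items():
        missing = sorted(expected_dates - dates)
        if missing:
            problems.append(f"{key}: missing dates {missing}")
    return problems


# The four Aruba rows from the output above
sample = """CountryName,RegionName,Date,PredictedDailyNewCases
Aruba,,2020-08-01,58.83064662471841
Aruba,,2020-08-02,71.32284867347803
Aruba,,2020-08-03,78.98708861551941
Aruba,,2020-08-04,90.23987091036582
"""
print(check_predictions(sample, date(2020, 8, 1), date(2020, 8, 4)))
```

Requesting one extra day, for example `check_predictions(sample, date(2020, 8, 1), date(2020, 8, 5))`, reports the missing date for Aruba.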


# Test cases
Now that we can generate a prediction file, let's validate a few cases.

In [27]:
import inspect
import os
import sys
current_dir = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
parent_dir = os.path.dirname(os.path.dirname(os.path.dirname(os.path.dirname(current_dir))))
sys.path.insert(0, parent_dir)

In [28]:
from covid_xprize.validation.predictor_validation import validate_submission

def validate(start_date, end_date, ip_file, output_file):
    # First, delete any potential old file
    try:
        os.remove(output_file)
    except OSError:
        pass
    
    # Then generate the prediction, calling the official API
    !python predict.py -s {start_date} -e {end_date} -ip {ip_file} -o {output_file}
    
    # And validate it
    errors = validate_submission(start_date, end_date, ip_file, output_file)
    if errors:
        for error in errors:
            print(error)
    else:
        print("All good!")

## 4 days, no gap
- All countries and regions
- Official number of cases is known up to start_date
- Intervention Plans are the official ones

In [29]:
validate(start_date="2020-08-01",
         end_date="2020-08-04",
         ip_file="../../../validation/data/2020-09-30_historical_ip.csv",
         output_file="predictions/val_4_days.csv")

Generating predictions from 2020-08-01 to 2020-08-04...
Saved predictions to predictions/val_4_days.csv
Done!
All good!


## 1 month in the future
- 2 countries only
- There's a gap between the date of the last known number of cases and start_date
- For future dates, the Intervention Plans contain scenarios for which predictions are requested, answering the question: what will happen if we apply these plans?

In [30]:
%%time
validate(start_date="2021-01-01",
         end_date="2021-01-31",
         ip_file="../../../validation/data/future_ip.csv",
         output_file="predictions/val_1_month_future.csv")

Generating predictions from 2021-01-01 to 2021-01-31...
Saved predictions to predictions/val_1_month_future.csv
Done!
All good!
Wall time: 3.83 s


## 180 days, from a future date, all countries and regions
- Prediction start date is 1 week from now (i.e. assuming the submission date is 1 week from now).  
- Prediction end date is 6 months after start date.  
- Prediction is requested for all available countries and regions.  
- Intervention plan scenario: freeze last known intervention plans for each country and region.  

The number of cases between today and the start date is not known yet, but the model relies on it, so the model must first predict those cases and then use its own predictions as inputs.  
This is the most demanding test: generating the prediction file should take less than 1 hour.
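
The idea of predicting through the gap can be sketched as a day-by-day rollout that feeds each prediction back into the history. This is a simplified illustration under stated assumptions, not the predictor's actual code: `rollout` and `mean_of_last_week` are hypothetical names, and the toy model ignores the intervention plans that the real linear model also takes as input.

```python
from datetime import date, timedelta


def rollout(known_cases, last_known_date, start_date, end_date, predict_next):
    """Predict day by day from the last known date, keeping only the requested window."""
    history = list(known_cases)
    current = last_known_date + timedelta(days=1)
    results = {}
    while current <= end_date:
        prediction = predict_next(history)
        # Feed the prediction back in, so later days can rely on it
        history.append(prediction)
        # Days in the gap are predicted but not reported
        if current >= start_date:
            results[current] = prediction
        current += timedelta(days=1)
    return results


def mean_of_last_week(history):
    """Toy stand-in for the trained model: average of the last 7 days."""
    window = history[-7:]
    return sum(window) / len(window)


# Cases are known up to 2020-11-23, predictions are requested from 2020-12-01
preds = rollout(
    known_cases=[100.0] * 7,
    last_known_date=date(2020, 11, 23),
    start_date=date(2020, 12, 1),
    end_date=date(2020, 12, 3),
    predict_next=mean_of_last_week,
)
for day, value in preds.items():
    print(f"{day}: {value}")
```

In this example the seven days from 2020-11-24 to 2020-11-30 are predicted internally and discarded, and only the three requested days are returned.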

### Generate the scenario

In [31]:
from datetime import datetime, timedelta

start_date = datetime.now() + timedelta(days=7)
start_date_str = start_date.strftime('%Y-%m-%d')
end_date = start_date + timedelta(days=180)
end_date_str = end_date.strftime('%Y-%m-%d')
print(f"Start date: {start_date_str}")
print(f"End date: {end_date_str}")

Start date: 2020-12-01
End date: 2021-05-30


In [33]:
from covid_xprize.validation.scenario_generator import get_raw_data, generate_scenario, NPI_COLUMNS
DATA_FILE = 'data/OxCGRT_latest.csv'
latest_df = get_raw_data(DATA_FILE, latest=True)
scenario_df = generate_scenario(start_date_str, end_date_str, latest_df, countries=None, scenario="Freeze")
scenario_file = "predictions/180_days_future_scenario.csv"
scenario_df.to_csv(scenario_file, index=False)
print(f"Saved scenario to {scenario_file}")

Saved scenario to predictions/180_days_future_scenario.csv
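
Conceptually, a "Freeze" scenario repeats each region's last known intervention values for every future day. The sketch below illustrates that idea with plain Python; `freeze_scenario` is a hypothetical helper written for this example, and the real `generate_scenario` may differ in its inputs, output columns and edge-case handling. The NPI column names are taken from the data file's columns.

```python
from datetime import date, timedelta


def freeze_scenario(last_known, last_known_date, end_date):
    """Repeat each region's last known NPI values for every day up to end_date."""
    n_days = (end_date - last_known_date).days
    rows = []
    for region, npis in last_known.items():
        for offset in range(1, n_days + 1):
            row = {"Region": region, "Date": last_known_date + timedelta(days=offset)}
            row.update(npis)  # same NPI values on every future day
            rows.append(row)
    return rows


# Illustrative values only, not taken from the data set
last_known = {"Aruba": {"C1_School closing": 2, "C2_Workplace closing": 1}}
rows = freeze_scenario(last_known, date(2020, 11, 24), date(2020, 11, 27))
for row in rows:
    print(row)
```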


### Check it

In [34]:
%%time
validate(start_date=start_date_str,
         end_date=end_date_str,
         ip_file=scenario_file,
         output_file="predictions/val_6_month_future.csv")

Generating predictions from 2020-12-01 to 2021-05-30...
Saved predictions to predictions/val_6_month_future.csv
Done!
All good!
Wall time: 5min 17s
