# Example Predictor: Linear Rollout Predictor

This example contains basic functionality for training and evaluating a linear predictor that rolls out predictions day-by-day.

First, a training data set is created from historical case and NPI (non-pharmaceutical intervention) data.

Second, a linear model is trained to predict future cases from prior case data along with prior and future NPI data.
The model is an off-the-shelf sklearn Lasso model that uses a positive weight constraint to enforce the assumption that stronger NPIs are negatively correlated with future cases.

Third, a sample evaluation set is created, and the predictor is applied to this evaluation set to produce prediction results in the correct format.

## Training

In [32]:
import pickle
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

### Copy the data locally

In [33]:
# Main source for the training data
DATA_URL = 'https://raw.githubusercontent.com/OxCGRT/covid-policy-tracker/master/data/OxCGRT_latest.csv'
# Local file
DATA_FILE = 'data/OxCGRT_latest.csv'

In [34]:
import os
import urllib.request
if not os.path.exists('data'):
    os.mkdir('data')
urllib.request.urlretrieve(DATA_URL, DATA_FILE)

('data/OxCGRT_latest.csv', <http.client.HTTPMessage at 0x7feb46d18a00>)

In [35]:
# Load historical data from local file
df = pd.read_csv(DATA_FILE, 
                 parse_dates=['Date'],
                 encoding="ISO-8859-1",
                 dtype={"RegionName": str,
                        "RegionCode": str},
                 on_bad_lines='skip')  # pandas >= 1.3; replaces error_bad_lines=False, which was removed in pandas 2.0

In [36]:
df.columns

Index(['CountryName', 'CountryCode', 'RegionName', 'RegionCode',
       'Jurisdiction', 'Date', 'C1_School closing', 'C1_Flag',
       'C2_Workplace closing', 'C2_Flag', 'C3_Cancel public events', 'C3_Flag',
       'C4_Restrictions on gatherings', 'C4_Flag', 'C5_Close public transport',
       'C5_Flag', 'C6_Stay at home requirements', 'C6_Flag',
       'C7_Restrictions on internal movement', 'C7_Flag',
       'C8_International travel controls', 'E1_Income support', 'E1_Flag',
       'E2_Debt/contract relief', 'E3_Fiscal measures',
       'E4_International support', 'H1_Public information campaigns',
       'H1_Flag', 'H2_Testing policy', 'H3_Contact tracing',
       'H4_Emergency investment in healthcare', 'H5_Investment in vaccines',
       'H6_Facial Coverings', 'H6_Flag', 'H7_Vaccination policy', 'H7_Flag',
       'M1_Wildcard', 'ConfirmedCases', 'ConfirmedDeaths', 'StringencyIndex',
       'StringencyIndexForDisplay', 'StringencyLegacyIndex',
       'StringencyLegacyIndexForDispla

In [37]:
# For testing, restrict training data to dates on or before a hypothetical predictor submission date
HYPOTHETICAL_SUBMISSION_DATE = np.datetime64("2020-07-31")
df = df[df.Date <= HYPOTHETICAL_SUBMISSION_DATE]

In [38]:
# Add GeoID column that combines CountryName and RegionName for easier manipulation of data
df['GeoID'] = df['CountryName'] + '__' + df['RegionName'].astype(str)

In [39]:
# Add new cases column
df['NewCases'] = df.groupby('GeoID').ConfirmedCases.diff().fillna(0)
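
As a minimal sketch of what the per-region difference does, the toy frame below (made-up values, hypothetical `toy` name) applies the same `groupby(...).diff().fillna(0)` pattern. The first row of each region has no previous day, so it becomes 0 rather than being differenced against the last row of the preceding region.

```python
import pandas as pd

# Toy cumulative case counts for two made-up regions (illustrative values only)
toy = pd.DataFrame({
    'GeoID': ['A', 'A', 'A', 'B', 'B', 'B'],
    'ConfirmedCases': [1, 4, 9, 10, 10, 15],
})

# Same pattern as the cell above: difference within each region,
# then set the undefined first-day difference to 0
toy['NewCases'] = toy.groupby('GeoID').ConfirmedCases.diff().fillna(0)
print(toy)
```

Without the `groupby`, the first row of region `B` would be differenced against the last row of region `A`.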

In [40]:
# Keep only columns of interest
id_cols = ['CountryName',
           'RegionName',
           'GeoID',
           'Date']
cases_col = ['NewCases']
npi_cols = ['C1_School closing',
            'C2_Workplace closing',
            'C3_Cancel public events',
            'C4_Restrictions on gatherings',
            'C5_Close public transport',
            'C6_Stay at home requirements',
            'C7_Restrictions on internal movement',
            'C8_International travel controls',
            'H1_Public information campaigns',
            'H2_Testing policy',
            'H3_Contact tracing',
            'H6_Facial Coverings']
df = df[id_cols + cases_col + npi_cols]

In [41]:
# Fill any missing case values by interpolation and setting NaNs to 0
df.update(df.groupby('GeoID').NewCases.apply(
    lambda group: group.interpolate()).fillna(0))

In [42]:
# Fill any missing NPIs by assuming they are the same as previous day
for npi_col in npi_cols:
    df.update(df.groupby('GeoID')[npi_col].ffill().fillna(0))
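
The forward fill can be checked on a small example. The sketch below uses a made-up frame (`toy_npis` is a hypothetical name) to show that gaps are filled from the previous day within the same region, and that leading gaps with no earlier value fall back to 0.

```python
import numpy as np
import pandas as pd

# Toy NPI column with gaps for two made-up regions (illustrative values only)
toy_npis = pd.DataFrame({
    'GeoID': ['A', 'A', 'A', 'B', 'B', 'B'],
    'C1_School closing': [np.nan, 2.0, np.nan, np.nan, np.nan, 1.0],
})

# Forward-fill within each region, then zero-fill anything still missing
filled = toy_npis.groupby('GeoID')['C1_School closing'].ffill().fillna(0)
print(filled.tolist())
```

Region `B` starts with two missing values; because the fill is grouped, they do not inherit the value 2.0 from region `A`.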

In [43]:
# Only look at US regions
df = df[(df['CountryName'] == 'United States') & (df['RegionName'].notnull())]

In [44]:
df

Unnamed: 0,CountryName,RegionName,GeoID,Date,NewCases,C1_School closing,C2_Workplace closing,C3_Cancel public events,C4_Restrictions on gatherings,C5_Close public transport,C6_Stay at home requirements,C7_Restrictions on internal movement,C8_International travel controls,H1_Public information campaigns,H2_Testing policy,H3_Contact tracing,H6_Facial Coverings
72306,United States,Alaska,United States__Alaska,2020-01-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
72307,United States,Alaska,United States__Alaska,2020-01-02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
72308,United States,Alaska,United States__Alaska,2020-01-03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
72309,United States,Alaska,United States__Alaska,2020-01-04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
72310,United States,Alaska,United States__Alaska,2020-01-05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
90415,United States,Wyoming,United States__Wyoming,2020-07-27,45.0,3.0,1.0,1.0,3.0,2.0,1.0,1.0,3.0,2.0,2.0,2.0,2.0
90416,United States,Wyoming,United States__Wyoming,2020-07-28,69.0,3.0,1.0,1.0,3.0,2.0,1.0,1.0,3.0,2.0,2.0,2.0,2.0
90417,United States,Wyoming,United States__Wyoming,2020-07-29,39.0,3.0,1.0,1.0,3.0,2.0,1.0,1.0,3.0,2.0,2.0,2.0,2.0
90418,United States,Wyoming,United States__Wyoming,2020-07-30,58.0,3.0,1.0,1.0,3.0,2.0,1.0,1.0,3.0,2.0,2.0,2.0,2.0


In [45]:
# Set number of past days to use to make predictions
nb_lookback_days = 30

# Create training data across all remaining regions for predicting one day ahead
X_cols = cases_col + npi_cols
y_col = cases_col
X_samples = []
y_samples = []
geo_ids = df.GeoID.unique()
for g in geo_ids:
    gdf = df[df.GeoID == g]
    all_case_data = np.array(gdf[cases_col])
    all_npi_data = np.array(gdf[npi_cols])

    # Create one sample for each day where we have enough data
    # Each sample consists of cases and npis for previous nb_lookback_days
    nb_total_days = len(gdf)
    for d in range(nb_lookback_days, nb_total_days - 1):
        X_cases = all_case_data[d-nb_lookback_days:d]

        # Take negative of npis to support positive
        # weight constraint in Lasso.
        X_npis = -all_npi_data[d - nb_lookback_days:d]

        # Flatten all input data so it fits Lasso input format.
        X_sample = np.concatenate([X_cases.flatten(),
                                   X_npis.flatten()])
        y_sample = all_case_data[d]
        X_samples.append(X_sample)
        y_samples.append(y_sample)

X_samples = np.array(X_samples)
y_samples = np.array(y_samples).flatten()
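
To make the resulting shapes concrete, the sketch below wraps the same windowing logic in a hypothetical helper, `make_windows` (not part of the original notebook), and runs it on a tiny made-up region with 10 days, one case column and two NPI columns.

```python
import numpy as np

def make_windows(case_data, npi_data, lookback):
    """Hypothetical helper mirroring the loop above for a single region."""
    X, y = [], []
    for d in range(lookback, len(case_data) - 1):
        X_cases = case_data[d - lookback:d]
        X_npis = -npi_data[d - lookback:d]  # negated, as in the cell above
        X.append(np.concatenate([X_cases.flatten(), X_npis.flatten()]))
        y.append(case_data[d])
    return np.array(X), np.array(y).flatten()

# Made-up data: 10 days, 1 case column, 2 NPI columns
toy_case_data = np.arange(10, dtype=float).reshape(-1, 1)
toy_npi_data = np.ones((10, 2))

X_toy, y_toy = make_windows(toy_case_data, toy_npi_data, lookback=3)
print(X_toy.shape, y_toy.shape)
```

Each row holds `lookback` case values followed by `lookback * n_npis` negated NPI values. Because `flatten()` is row-major, the NPI block is ordered day by day, with all NPIs for the oldest day first.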

In [46]:
# Helper function to compute mean absolute error (MAE)
def mae(pred, true):
    return np.mean(np.abs(pred - true))
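
As a quick sanity check, the function can be applied to a few made-up values. The definition is repeated so the snippet runs on its own, and `toy_error` is a hypothetical name. Both arguments should be NumPy arrays, since plain Python lists do not support element-wise subtraction.

```python
import numpy as np

def mae(pred, true):
    return np.mean(np.abs(pred - true))

# Absolute errors are 0, 2 and 4, so the mean is 2
toy_error = mae(np.array([1.0, 2.0, 5.0]), np.array([1.0, 4.0, 1.0]))
print(toy_error)
```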

In [47]:
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X_samples,
                                                    y_samples,
                                                    test_size=0.2,
                                                    random_state=301)

In [48]:
# Create and train Lasso model.
# Set positive=True to enforce assumption that cases are positively correlated
# with future cases and npis are negatively correlated.
model = Lasso(alpha=0.1,
              precompute=True,
              max_iter=10000,
              positive=True,
              selection='random')
# Fit model
model.fit(X_train, y_train)

Lasso(alpha=0.1, max_iter=10000, positive=True, precompute=True,
      selection='random')
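
The effect of negating the NPI features under `positive=True` can be illustrated on synthetic data. The sketch below is only an illustration on made-up numbers (all `toy_*` names, `raw_model` and `neg_model` are hypothetical): the target is built so that a higher NPI level lowers the outcome, and the same constrained Lasso is fitted once with the raw NPI column and once with the negated one.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 500

# Synthetic data: the target rises with cases and falls with the NPI level
toy_cases = rng.uniform(0, 100, size=n)
toy_npi = rng.integers(0, 4, size=n).astype(float)
toy_y = 0.5 * toy_cases - 3.0 * toy_npi + rng.normal(0, 0.1, size=n)

X_raw = np.column_stack([toy_cases, toy_npi])
X_neg = np.column_stack([toy_cases, -toy_npi])

# With the raw NPI column, the non-negativity constraint cannot express a
# negative effect, so the NPI weight is clamped to zero
raw_model = Lasso(alpha=0.1, positive=True, max_iter=10000).fit(X_raw, toy_y)

# With the negated column, a positive weight encodes "more NPI, fewer cases"
neg_model = Lasso(alpha=0.1, positive=True, max_iter=10000).fit(X_neg, toy_y)

print('raw NPI weight:', raw_model.coef_[1])
print('negated NPI weight:', neg_model.coef_[1])
```

A positive weight on a negated NPI feature therefore corresponds to a negative effect of the original NPI on predicted cases.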

In [49]:
# Evaluate model
train_preds = model.predict(X_train)
train_preds = np.maximum(train_preds, 0) # Don't predict negative cases
print('Train MAE:', mae(train_preds, y_train))

test_preds = model.predict(X_test)
test_preds = np.maximum(test_preds, 0) # Don't predict negative cases
print('Test MAE:', mae(test_preds, y_test))

Train MAE: 121.03904646267992
Test MAE: 131.26026845169318


In [50]:
# Inspect the learned feature coefficients for the model
# to see what features it's paying attention to.

# Give names to the features
x_col_names = []
for d in range(-nb_lookback_days, 0):
    x_col_names.append('Day ' + str(d) + ' ' + cases_col[0])
for d in range(-nb_lookback_days, 0):  # NPI features cover the same lookback days as the cases
    for col_name in npi_cols:
        x_col_names.append('Day ' + str(d) + ' ' + col_name)

# View non-zero coefficients
for (col, coeff) in zip(x_col_names, list(model.coef_)):
    if coeff != 0.:
        print(col, coeff)
print('Intercept', model.intercept_)

Day -7 NewCases 0.19242687998242594
Day -6 NewCases 0.2138179052370295
Day -5 NewCases 0.05736538230446278
Day -3 NewCases 0.023319698648850497
Day -2 NewCases 0.14122245394695032
Day -1 NewCases 0.402786408062919
Day -25 C2_Workplace closing 16.83112580530535
Day -22 C5_Close public transport 1.8046454711037945
Day -18 C6_Stay at home requirements 0.38183273726443134
Day -5 H3_Contact tracing 7.467119677015083
Intercept 43.16729605914975


In [51]:
# Save model to file
if not os.path.exists('models'):
    os.mkdir('models')
with open('models/model.pkl', 'wb') as model_file:
    pickle.dump(model, model_file)
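
A saved model can be read back with `pickle.load`. The following sketch checks the round trip on a tiny stand-in model written to a temporary directory, so it does not touch `models/model.pkl`. The names `toy_model`, `toy_X` and `restored` are hypothetical.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import Lasso

# Tiny stand-in model fitted on made-up data
toy_X = np.array([[0.0], [1.0], [2.0], [3.0]])
toy_model = Lasso(alpha=0.1, positive=True).fit(toy_X, np.array([0.0, 2.0, 4.0, 6.0]))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'model.pkl')
    with open(path, 'wb') as model_file:
        pickle.dump(toy_model, model_file)
    with open(path, 'rb') as model_file:
        restored = pickle.load(model_file)

# The restored model should give the same predictions as the original
same = np.allclose(restored.predict(toy_X), toy_model.predict(toy_X))
print(same)
```

Pickled scikit-learn models are generally only safe to load with the same library version that wrote them, and only from trusted sources.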

## Evaluation

Now that the predictor has been trained and saved, this section contains the functionality for evaluating it on sample evaluation data.
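
The `predict` module used below is not shown in this notebook. As a rough sketch of what a day-by-day rollout can look like under the training setup above, the hypothetical `rollout` function below feeds each prediction back in as the newest case value and slides the NPI window forward by one day. `LastWindowMean` is a made-up stand-in for a fitted model, used only so the snippet runs on its own.

```python
import numpy as np

class LastWindowMean:
    """Stand-in for a fitted model: predicts the mean of the case features."""
    def __init__(self, lookback):
        self.lookback = lookback

    def predict(self, X):
        return X[:, :self.lookback].mean(axis=1)

def rollout(model, past_cases, past_npis, future_npis, lookback):
    """Hypothetical day-by-day rollout; not the actual predict.py code."""
    cases = list(past_cases[-lookback:])
    npis = np.asarray(past_npis[-lookback:], dtype=float)
    preds = []
    for day_npis in future_npis:
        # Same feature layout as in training: cases first, then negated NPIs
        X = np.concatenate([np.array(cases[-lookback:]),
                            -npis[-lookback:].flatten()])
        pred = max(float(model.predict(X.reshape(1, -1))[0]), 0.0)
        preds.append(pred)
        # Feed the prediction and that day's NPIs into the next window
        cases.append(pred)
        npis = np.vstack([npis, day_npis])
    return preds

toy_preds = rollout(LastWindowMean(3),
                    past_cases=[3.0, 6.0, 9.0],
                    past_npis=np.zeros((3, 2)),
                    future_npis=np.ones((3, 2)),
                    lookback=3)
print(toy_preds)
```

Because each step consumes earlier predictions, errors can compound over the horizon.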

In [32]:
# Reload the module to get the latest changes
import predict
from importlib import reload
reload(predict)
from predict import predict_df

In [33]:
%%time
preds_df = predict_df("2020-08-01", "2020-08-31", path_to_ips_file="../../../validation/data/2020-09-30_historical_ip.csv", verbose=True)


Predicting for Aruba__nan
2020-08-01: 54.44881708233056
2020-08-02: 74.42053319285834
2020-08-03: 83.76505684808843
2020-08-04: 89.70637523148649
2020-08-05: 95.33469756114599
2020-08-06: 96.9450701122124
2020-08-07: 108.07686916546967
2020-08-08: 133.94341467368184
2020-08-09: 152.96280205953656
2020-08-10: 167.37680374208267
2020-08-11: 179.26167539326275
2020-08-12: 189.15207127751415
2020-08-13: 197.87956592451826
2020-08-14: 211.28988474415002
2020-08-15: 229.71761848618405
2020-08-16: 247.33771868586038
2020-08-17: 263.46509033671435
2020-08-18: 278.21115372025986
2020-08-19: 291.590246481266
2020-08-20: 304.63914555571694
2020-08-21: 319.6781540710706
2020-08-22: 336.78453324137763
2020-08-23: 354.27690773104433
2020-08-24: 371.49804776791416
2020-08-25: 388.15278728249575
2020-08-26: 404.1653747094015
2020-08-27: 420.1156324430059
2020-08-28: 429.7844343752716
2020-08-29: 445.0021961262279
2020-08-30: 461.5993438254635
2020-08-31: 478.8197330618992

Predicting for Afghanistan_

2020-08-25: 2858.2077660941745
2020-08-26: 2882.7245029225674
2020-08-27: 2874.3974911801065
2020-08-28: 2873.494947183805
2020-08-29: 2918.2076923533236
2020-08-30: 2991.1903869820185
2020-08-31: 3075.7903960442127

Predicting for Azerbaijan__nan
2020-08-01: 2483.0186354558
2020-08-02: 3220.3107457551355
2020-08-03: 3640.1973338496123
2020-08-04: 3824.1448058583123
2020-08-05: 3097.421939524536
2020-08-06: 1886.9642826398108
2020-08-07: 1815.1763209915007
2020-08-08: 2562.8238502794734
2020-08-09: 3053.6055759078245
2020-08-10: 3434.0513277599143
2020-08-11: 3595.4522710078136
2020-08-12: 3303.919856297018
2020-08-13: 2845.5043791656594
2020-08-14: 2772.873805843756
2020-08-15: 3015.4468531483276
2020-08-16: 3287.6868516797745
2020-08-17: 3544.341483971542
2020-08-18: 3679.372218627404
2020-08-19: 3605.540971312554
2020-08-20: 3454.218144047411
2020-08-21: 3424.8384394436725
2020-08-22: 3524.502452669178
2020-08-23: 3680.0231984361476
2020-08-24: 3847.1765716031005
2020-08-25: 3960.14

2020-08-01: 671.7792392809029
2020-08-02: 739.2278288298077
2020-08-03: 775.3725492119611
2020-08-04: 764.0819012503576
2020-08-05: 581.2874758991
2020-08-06: 405.73206915740326
2020-08-07: 437.2665383086944
2020-08-08: 650.6381190493136
2020-08-09: 744.6203552276346
2020-08-10: 807.0973900009044
2020-08-11: 816.679172842615
2020-08-12: 749.9515032121715
2020-08-13: 681.4150981263421
2020-08-14: 698.6355295925526
2020-08-15: 780.0812966956229
2020-08-16: 847.109252399113
2020-08-17: 900.7106899495619
2020-08-18: 923.853242895752
2020-08-19: 910.9674719398951
2020-08-20: 894.4998047680737
2020-08-21: 909.7123240656414
2020-08-22: 951.6551021596707
2020-08-23: 998.3006234958127
2020-08-24: 1041.4086105621618
2020-08-25: 1070.1334132748366
2020-08-26: 1082.1080885278734
2020-08-27: 1090.6559296973041
2020-08-28: 1095.8746809994072
2020-08-29: 1121.9844450329642
2020-08-30: 1155.4655185272018
2020-08-31: 1190.0298498096731

Predicting for Belarus__nan
2020-08-01: 1175.0236222659491
2020-08

2020-08-02: 277.4908487812172
2020-08-03: 221.72910215197408
2020-08-04: 354.4628358786764
2020-08-05: 691.751420356524
2020-08-06: 597.2514573903062
2020-08-07: 438.3033238140434
2020-08-08: 505.99503761352275
2020-08-09: 460.47550990068277
2020-08-10: 444.71849432222376
2020-08-11: 532.1424166534575
2020-08-12: 656.3024620085095
2020-08-13: 650.6928631957702
2020-08-14: 625.6819070932112
2020-08-15: 639.4641782052519
2020-08-16: 628.5520309886377
2020-08-17: 634.4809785906174
2020-08-18: 684.8477814850908
2020-08-19: 743.3020009612529
2020-08-20: 763.8787685557198
2020-08-21: 774.6305283094378
2020-08-22: 788.108996687591
2020-08-23: 794.8105525377107
2020-08-24: 810.5407194960923
2020-08-25: 844.4338412656837
2020-08-26: 881.8702845043772
2020-08-27: 908.7550359976009
2020-08-28: 933.1432790799289
2020-08-29: 953.560892268582
2020-08-30: 971.218758558911
2020-08-31: 992.9664431886719

Predicting for Brazil__Espirito Santo
2020-08-01: 985.2218967261937
2020-08-02: 606.5868817495923
2

2020-08-01: 661.1569190985516
2020-08-02: 334.7120316112498
2020-08-03: 267.36121416128134
2020-08-04: 273.17129637459936
2020-08-05: 458.2625478806988
2020-08-06: 506.74753577857996
2020-08-07: 416.20779579243754
2020-08-08: 557.4177909548038
2020-08-09: 490.1516308922385
2020-08-10: 457.73142705899926
2020-08-11: 475.43845294083013
2020-08-12: 548.9227210758652
2020-08-13: 573.8766384518038
2020-08-14: 585.2807375188253
2020-08-15: 630.8849317854821
2020-08-16: 625.2004142207721
2020-08-17: 622.083876856164
2020-08-18: 641.2507142096891
2020-08-19: 677.5045771186712
2020-08-20: 702.5122830124064
2020-08-21: 727.8976061969992
2020-08-22: 756.2202375803098
2020-08-23: 769.4751792069216
2020-08-24: 781.8146330956342
2020-08-25: 802.4962871155819
2020-08-26: 829.1449513457843
2020-08-27: 854.6766537872794
2020-08-28: 883.9284948245603
2020-08-29: 910.9123506214133
2020-08-30: 932.5832009288533
2020-08-31: 953.3352923375508

Predicting for Brazil__Piaui
2020-08-01: 293.5561588364309
2020-

2020-08-01: 189.50979246135455
2020-08-02: 129.3661758052552
2020-08-03: 120.28105226512784
2020-08-04: 140.67824600067465
2020-08-05: 207.76954340209954
2020-08-06: 206.53341324248527
2020-08-07: 187.1517955208683
2020-08-08: 235.4278506492455
2020-08-09: 232.1385770169732
2020-08-10: 235.79860676088194
2020-08-11: 256.4613620192137
2020-08-12: 287.6650831268567
2020-08-13: 297.62626855578935
2020-08-14: 307.733623092167
2020-08-15: 329.93861575293585
2020-08-16: 340.31612963543785
2020-08-17: 352.1072293050919
2020-08-18: 371.03659810679414
2020-08-19: 392.4081928094196
2020-08-20: 408.1842547140483
2020-08-21: 424.7020144604463
2020-08-22: 443.1795219484168
2020-08-23: 458.66112236374363
2020-08-24: 474.77547640823184
2020-08-25: 493.6547755880927
2020-08-26: 513.3788497090752
2020-08-27: 531.9365363025227
2020-08-28: 563.0063914890424
2020-08-29: 586.6571821900701
2020-08-30: 608.0938344091426
2020-08-31: 629.1442808388734

Predicting for Brazil__nan
2020-08-01: 28739.337539716857


2020-08-02: 83.6773106099496
2020-08-03: 97.27905143288024
2020-08-04: 111.0501723330394
2020-08-05: 111.36346895646437
2020-08-06: 106.11382583008
2020-08-07: 116.86695342356778
2020-08-08: 145.42654871871264
2020-08-09: 164.46431190743732
2020-08-10: 181.63161996983922
2020-08-11: 196.27872769827925
2020-08-12: 204.6745893379146
2020-08-13: 210.99534739766054
2020-08-14: 223.937367671688
2020-08-15: 242.84176611903857
2020-08-16: 260.88862446082317
2020-08-17: 278.4802717257284
2020-08-18: 294.4560856026627
2020-08-19: 307.5838006487441
2020-08-20: 319.898533081483
2020-08-21: 334.6851655003443
2020-08-22: 351.88086538091545
2020-08-23: 369.7223995242688
2020-08-24: 387.72794923056756
2020-08-25: 405.0590091481578
2020-08-26: 421.21664404302885
2020-08-27: 437.0623529063123
2020-08-28: 429.91984773381523
2020-08-29: 439.13395037301893
2020-08-30: 451.94688922867067
2020-08-31: 466.6915850903964

Predicting for Cameroon__nan
2020-08-01: 70.19842238063069
2020-08-02: 123.75521006981883

2020-08-22: 388.33035143671907
2020-08-23: 407.9508415701122
2020-08-24: 427.5763578373343
2020-08-25: 445.92500885944514
2020-08-26: 462.1770147102683
2020-08-27: 477.56819031978864
2020-08-28: 470.35157204120696
2020-08-29: 480.17854509035794
2020-08-30: 494.0440736380302
2020-08-31: 509.9200174213089

Predicting for Cyprus__nan
2020-08-01: 203.8929379491388
2020-08-02: 273.9179490287635
2020-08-03: 543.8120202087313
2020-08-04: 546.5020272783139
2020-08-05: 438.6045228987452
2020-08-06: 301.50074913071273
2020-08-07: 284.7512515074684
2020-08-08: 336.8932757129604
2020-08-09: 412.30910818101154
2020-08-10: 527.9649670196519
2020-08-11: 562.6519608856837
2020-08-12: 534.9829213379908
2020-08-13: 489.55777377917957
2020-08-14: 479.4056360079101
2020-08-15: 500.54385791507445
2020-08-16: 548.2213024116586
2020-08-17: 609.3125342119116
2020-08-18: 643.9865840294009
2020-08-19: 649.6182546754764
2020-08-20: 643.649842322961
2020-08-21: 646.3351003839329
2020-08-22: 663.24116305437
2020-0

2020-08-17: 264.88180885100854
2020-08-18: 279.18979564899706
2020-08-19: 292.29560507859816
2020-08-20: 305.1626086152006
2020-08-21: 319.796655142756
2020-08-22: 336.7365827339057
2020-08-23: 354.6836876479816
2020-08-24: 372.29451864387556
2020-08-25: 388.9285385671411
2020-08-26: 404.87198287217973
2020-08-27: 420.70203056536855
2020-08-28: 456.5881552693155
2020-08-29: 481.3452185999056
2020-08-30: 504.50124836918604
2020-08-31: 526.532497960861

Predicting for Spain__nan
2020-08-01: 5455.465630398522
2020-08-02: 6249.438025297397
2020-08-03: 2918.149060708151
2020-08-04: 5693.348680006647
2020-08-05: 9882.804969363451
2020-08-06: 4637.926560907664
2020-08-07: 4132.072590032519
2020-08-08: 5588.5298489750485
2020-08-09: 5656.768693791163
2020-08-10: 5010.712399127009
2020-08-11: 6453.760896022803
2020-08-12: 7434.634446646003
2020-08-13: 6126.255624509553
2020-08-14: 5887.379221640987
2020-08-15: 6243.544109049337
2020-08-16: 6232.451085194183
2020-08-17: 6243.292735562489
2020-08

2020-08-31: 11803.153744083267

Predicting for United Kingdom__Northern Ireland
2020-08-01: 326.2448895635625
2020-08-02: 406.2999620824357
2020-08-03: 409.0447796584024
2020-08-04: 367.77051661195753
2020-08-05: 255.9739254966994
2020-08-06: 213.89143676993666
2020-08-07: 240.10502996384253
2020-08-08: 358.0780633890831
2020-08-09: 424.2355905673779
2020-08-10: 454.8095619926419
2020-08-11: 447.30561346626615
2020-08-12: 411.8906617026493
2020-08-13: 392.8776632911565
2020-08-14: 411.52547742655713
2020-08-15: 463.9701346392712
2020-08-16: 509.63770083423515
2020-08-17: 540.8807368071349
2020-08-18: 551.9836022397028
2020-08-19: 548.8138098650031
2020-08-20: 549.067932898906
2020-08-21: 565.5349602251173
2020-08-22: 596.695597282909
2020-08-23: 629.9453709649093
2020-08-24: 658.338115718228
2020-08-25: 677.5036424708507
2020-08-26: 689.3603507060425
2020-08-27: 701.2723777971015
2020-08-28: 714.8953995315238
2020-08-29: 738.2700709372562
2020-08-30: 764.9612787301679
2020-08-31: 790.8

2020-08-01: 57.25254269862994
2020-08-02: 77.39873513503798
2020-08-03: 80.0670098141818
2020-08-04: 95.90041442472632
2020-08-05: 114.045045094691
2020-08-06: 104.36329060369327
2020-08-07: 113.47056192936569
2020-08-08: 139.27053273617688
2020-08-09: 156.88408977531444
2020-08-10: 169.77567904964843
2020-08-11: 186.233603843416
2020-08-12: 199.70017151731997
2020-08-13: 206.08076308415008
2020-08-14: 218.52277231426
2020-08-15: 236.4202291682974
2020-08-16: 253.04348420552037
2020-08-17: 268.9008810038498
2020-08-18: 285.4782019418795
2020-08-19: 300.19056925848895
2020-08-20: 312.94568097310446
2020-08-21: 327.6962901678932
2020-08-22: 344.4546090171565
2020-08-23: 361.4533026476423
2020-08-24: 378.6565139973054
2020-08-25: 396.0472917822673
2020-08-26: 412.6731991300315
2020-08-27: 428.76325593346394
2020-08-28: 438.43161008400614
2020-08-29: 453.52370127216614
2020-08-30: 469.94070212414607
2020-08-31: 487.2067426954367

Predicting for Guyana__nan
2020-08-01: 78.77455101156313
202

2020-08-10: 1084.1074336125748
2020-08-11: 1166.3215382229082
2020-08-12: 1132.4788667926173
2020-08-13: 994.163931531342
2020-08-14: 984.5934428224174
2020-08-15: 1070.2658838997918
2020-08-16: 1146.5465404856295
2020-08-17: 1216.8075227384984
2020-08-18: 1275.5117194317786
2020-08-19: 1276.7943099190747
2020-08-20: 1243.2238778295837
2020-08-21: 1248.0934956451006
2020-08-22: 1289.3531648722533
2020-08-23: 1340.1778891713611
2020-08-24: 1395.0968521656218
2020-08-25: 1442.164484429854
2020-08-26: 1463.980359122464
2020-08-27: 1471.5080884750173
2020-08-28: 1499.2070495241655
2020-08-29: 1535.0241008014382
2020-08-30: 1578.2027360128486
2020-08-31: 1625.381140712837

Predicting for Iceland__nan
2020-08-01: 51.314446844146765
2020-08-02: 71.65161996315142
2020-08-03: 82.41344543980532
2020-08-04: 88.75997150527702
2020-08-05: 94.59248125047871
2020-08-06: 96.4032536869629
2020-08-07: 107.16749284350422
2020-08-08: 131.98838662292098
2020-08-09: 150.96534491933306
2020-08-10: 165.769701

2020-08-23: 345.5128373552448
2020-08-24: 362.45527479049895
2020-08-25: 379.03569816642175
2020-08-26: 395.1452555449358
2020-08-27: 411.2123417410517
2020-08-28: 423.30034360910287
2020-08-29: 439.1760993533935
2020-08-30: 463.23735695901786
2020-08-31: 483.26114270072344

Predicting for South Korea__nan
2020-08-01: 500.8884931280671
2020-08-02: 745.8608375377138
2020-08-03: 830.1101099745134
2020-08-04: 837.7115228895208
2020-08-05: 782.6921561539665
2020-08-06: 489.9389521135112
2020-08-07: 466.4364762592623
2020-08-08: 628.3572947267378
2020-08-09: 762.7630197684592
2020-08-10: 846.179345785753
2020-08-11: 891.2736339786981
2020-08-12: 861.4155921545894
2020-08-13: 765.1178287481705
2020-08-14: 752.0656694624945
2020-08-15: 812.6753471611979
2020-08-16: 885.5824656987355
2020-08-17: 949.4298953243197
2020-08-18: 992.6843250226347
2020-08-19: 995.0202777701498
2020-08-20: 972.7967193958508
2020-08-21: 975.5157759451716
2020-08-22: 1007.975194806885
2020-08-23: 1053.7528449876695
20

2020-08-07: 347.69077380571366
2020-08-08: 505.85977143052224
2020-08-09: 590.7377549953522
2020-08-10: 651.3295286258611
2020-08-11: 656.1512601588735
2020-08-12: 603.8146797532878
2020-08-13: 555.7681073666348
2020-08-14: 569.4773091503394
2020-08-15: 632.8722613656697
2020-08-16: 691.1801598692546
2020-08-17: 738.7921113832564
2020-08-18: 758.7066707501644
2020-08-19: 751.0854045495686
2020-08-20: 741.6422661265462
2020-08-21: 755.3851073803477
2020-08-22: 790.3284665529985
2020-08-23: 830.9362236501869
2020-08-24: 868.7669869816986
2020-08-25: 894.3651739406927
2020-08-26: 906.9869579781146
2020-08-27: 916.9864311255712
2020-08-28: 937.5681212028229
2020-08-29: 966.0272946614081
2020-08-30: 999.278084186264
2020-08-31: 1032.5548228603352

Predicting for Macao__nan
2020-08-01: 45.62544718711239
2020-08-02: 62.21591366720995
2020-08-03: 73.26724228424465
2020-08-04: 81.29930156092614
2020-08-05: 87.92846421795497
2020-08-06: 92.36970477095713
2020-08-07: 103.29856815834697
2020-08-08


Predicting for Mauritius__nan
2020-08-01: 46.009205522413055
2020-08-02: 62.65302235706191
2020-08-03: 73.46839727754043
2020-08-04: 82.80455615716977
2020-08-05: 91.1398722145319
2020-08-06: 93.7289279502277
2020-08-07: 104.29125498794534
2020-08-08: 126.99648987118586
2020-08-09: 144.1180734431219
2020-08-10: 158.56210662838703
2020-08-11: 171.64457148825335
2020-08-12: 182.81422081593593
2020-08-13: 192.2911441015131
2020-08-14: 205.4480527705445
2020-08-15: 222.61795502415623
2020-08-16: 239.2230891617649
2020-08-17: 255.07297538874474
2020-08-18: 270.145323759293
2020-08-19: 284.06182239523923
2020-08-20: 297.4918953356694
2020-08-21: 312.43666552847543
2020-08-22: 329.0055159302409
2020-08-23: 345.952259897324
2020-08-24: 362.8908098035663
2020-08-25: 379.5515372378755
2020-08-26: 395.7292244411983
2020-08-27: 411.814437774265
2020-08-28: 404.6339064515573
2020-08-29: 413.47956147015793
2020-08-30: 425.69288119199905
2020-08-31: 439.7240028142958

Predicting for Malawi__nan
2020

2020-08-31: 782.1701873777932

Predicting for Nepal__nan
2020-08-01: 690.6378793582666
2020-08-02: 820.9301708899383
2020-08-03: 854.7129461780115
2020-08-04: 890.4395702767339
2020-08-05: 796.0060699558153
2020-08-06: 504.55256004114597
2020-08-07: 509.0499912535465
2020-08-08: 722.7017776187737
2020-08-09: 831.6125496704643
2020-08-10: 901.1972600102887
2020-08-11: 944.4168230756633
2020-08-12: 900.4361223164258
2020-08-13: 802.388163336912
2020-08-14: 804.4957481528945
2020-08-15: 882.1819472939143
2020-08-16: 951.6884627572782
2020-08-17: 1012.0129575460954
2020-08-18: 1051.9737166188388
2020-08-19: 1048.2961051293955
2020-08-20: 1025.2456300098531
2020-08-21: 1034.5681015143787
2020-08-22: 1073.910403467738
2020-08-23: 1120.9444221479862
2020-08-24: 1168.4417659694768
2020-08-25: 1205.5133915274332
2020-08-26: 1222.8045436993243
2020-08-27: 1231.2783419869238
2020-08-28: 1268.8232810161417
2020-08-29: 1306.4639317829797
2020-08-30: 1348.7637375516852
2020-08-31: 1392.2060882932856

2020-08-29: 9364.456380771795
2020-08-30: 9631.424822007015
2020-08-31: 9893.815542670449

Predicting for Puerto Rico__nan
2020-08-01: 523.7150587286287
2020-08-02: 882.8352577662689
2020-08-03: 780.0374361925097
2020-08-04: 776.7582625708473
2020-08-05: 656.2172095559965
2020-08-06: 439.93795690181054
2020-08-07: 433.45327611771324
2020-08-08: 633.349340834491
2020-08-09: 789.93110788949
2020-08-10: 826.4449850316795
2020-08-11: 844.6483892143888
2020-08-12: 794.0343042282134
2020-08-13: 713.6292118281359
2020-08-14: 713.0658751610771
2020-08-15: 794.1282712848152
2020-08-16: 875.0559348503797
2020-08-17: 925.3868921247735
2020-08-18: 953.7491482678734
2020-08-19: 946.2785287537343
2020-08-20: 925.8112210352623
2020-08-21: 934.4337203598366
2020-08-22: 976.2419810760158
2020-08-23: 1026.0940224817004
2020-08-24: 1069.7136607704379
2020-08-25: 1101.3507971234167
2020-08-26: 1115.6712632292329
2020-08-27: 1123.1080845273013
2020-08-28: 1150.10550778292
2020-08-29: 1184.7366752180355
202

2020-08-28: 684.6602291073283
2020-08-29: 707.8959611654963
2020-08-30: 733.2603347561907
2020-08-31: 758.7801577749239

Predicting for Senegal__nan
2020-08-01: 104.14078238239989
2020-08-02: 156.19363547672197
2020-08-03: 171.00662270768794
2020-08-04: 146.66541004203793
2020-08-05: 152.267125712236
2020-08-06: 131.13889027997345
2020-08-07: 141.23278270459826
2020-08-08: 184.76089482241122
2020-08-09: 220.38320811246734
2020-08-10: 238.49408696273133
2020-08-11: 244.48640910890708
2020-08-12: 250.20394650397708
2020-08-13: 249.97457951023034
2020-08-14: 262.20050587185773
2020-08-15: 287.38194285790854
2020-08-16: 312.7836052852299
2020-08-17: 332.2921518675863
2020-08-18: 346.77122777540427
2020-08-19: 358.1610143311262
2020-08-20: 367.78816892828706
2020-08-21: 382.24404853138475
2020-08-22: 402.1855878916741
2020-08-23: 423.51931021425617
2020-08-24: 443.35000053257096
2020-08-25: 461.06390679920537
2020-08-26: 476.8061084337611
2020-08-27: 491.82814187568636
2020-08-28: 484.78763

2020-08-12: 184.5625272853569
2020-08-13: 193.93350772152587
2020-08-14: 206.9090348018857
2020-08-15: 224.37337916847696
2020-08-16: 242.06387477520235
2020-08-17: 258.51412934793495
2020-08-18: 273.09849361575414
2020-08-19: 286.5657632723075
2020-08-20: 299.7824974070607
2020-08-21: 314.5903627868644
2020-08-22: 331.32747861279734
2020-08-23: 348.74954059318543
2020-08-24: 365.97725512026886
2020-08-25: 382.5627046220512
2020-08-26: 398.59109305358237
2020-08-27: 414.5524048794807
2020-08-28: 433.7171608055032
2020-08-29: 452.25494526972045
2020-08-30: 471.08318404948784
2020-08-31: 489.9262856383094

Predicting for Suriname__nan
2020-08-01: 48.04156175280521
2020-08-02: 64.14001880828155
2020-08-03: 76.7776264614526
2020-08-04: 88.16339758741624
2020-08-05: 92.94772748011442
2020-08-06: 95.18981398075279
2020-08-07: 105.74840506244485
2020-08-08: 128.88913463998063
2020-08-09: 146.15329944111272
2020-08-10: 161.62762101384055
2020-08-11: 175.18283414980914
2020-08-12: 185.444481791

2020-08-05: 104.08827128785653
2020-08-06: 102.11963945310634
2020-08-07: 112.9789288736649
2020-08-08: 139.9651302463026
2020-08-09: 158.88361715952124
2020-08-10: 175.5797178448316
2020-08-11: 189.17636057284437
2020-08-12: 197.9031242993968
2020-08-13: 205.28881415938866
2020-08-14: 218.35255796064703
2020-08-15: 236.8621271654245
2020-08-16: 254.7228974377535
2020-08-17: 271.91604496422974
2020-08-18: 287.4262794810189
2020-08-19: 300.60685956365
2020-08-20: 313.2313375509555
2020-08-21: 328.0715230143149
2020-08-22: 345.14556982582315
2020-08-23: 362.836924277265
2020-08-24: 380.5792523906784
2020-08-25: 397.65155717148923
2020-08-26: 413.7445198797085
2020-08-27: 429.6301056264541
2020-08-28: 441.73271646995227
2020-08-29: 457.87117217271054
2020-08-30: 475.2245070229611
2020-08-31: 493.17658222808853

Predicting for Thailand__nan
2020-08-01: 52.731289196523996
2020-08-02: 71.94650792780425
2020-08-03: 87.90602799476572
2020-08-04: 98.69325444352567
2020-08-05: 99.52713833570208


2020-08-27: 413.60251913383036
2020-08-28: 406.412523916103
2020-08-29: 415.31347374778954
2020-08-30: 427.628308022229
2020-08-31: 441.7405983581165

Predicting for Tanzania__nan
2020-08-01: 45.62544718711239
2020-08-02: 62.21591366720995
2020-08-03: 73.26724228424465
2020-08-04: 81.29930156092614
2020-08-05: 87.92846421795497
2020-08-06: 92.36970477095713
2020-08-07: 103.29856815834697
2020-08-08: 126.07631818273649
2020-08-09: 143.31286956265848
2020-08-10: 157.7474292635796
2020-08-11: 170.14003915117937
2020-08-12: 180.82625668177508
2020-08-13: 190.72141164853258
2020-08-14: 204.07558804871033
2020-08-15: 221.35762920502773
2020-08-16: 238.06316296218785
2020-08-17: 253.86606418981788
2020-08-18: 268.63778498091904
2020-08-19: 282.3626608919487
2020-08-20: 295.8614172581714
2020-08-21: 310.87733524872186
2020-08-22: 327.5129148951869
2020-08-23: 344.5126098067958
2020-08-24: 361.4170942278159
2020-08-25: 377.94644046431756
2020-08-26: 394.02691613003014
2020-08-27: 410.0971897242


Predicting for United States__Connecticut
2020-08-01: 1830.2013407292661
2020-08-02: 2212.4053957224983
2020-08-03: 1051.4233871457154
2020-08-04: 1993.4494592967671
2020-08-05: 3416.4726972073345
2020-08-06: 1639.9352799294056
2020-08-07: 1464.8985902690088
2020-08-08: 1970.7293398921893
2020-08-09: 2029.471282175678
2020-08-10: 1814.5106614041042
2020-08-11: 2310.4218925945806
2020-08-12: 2649.148008825439
2020-08-13: 2209.9981964675094
2020-08-14: 2132.678341380065
2020-08-15: 2265.798018134174
2020-08-16: 2279.677641818094
2020-08-17: 2293.935096785252
2020-08-18: 2523.5634425661997
2020-08-19: 2649.652579189837
2020-08-20: 2565.67021504814
2020-08-21: 2558.0774457615876
2020-08-22: 2606.4129707342067
2020-08-23: 2630.771332358129
2020-08-24: 2685.3010978161465
2020-08-25: 2806.1900458837604
2020-08-26: 2885.8609649518608
2020-08-27: 2902.9897050928544
2020-08-28: 2934.0653089793886
2020-08-29: 2975.4750904218204
2020-08-30: 3014.9811555173924
2020-08-31: 3074.433078176876

Predic

2020-08-03: 1138.3376292816665
2020-08-04: 1571.6708871673054
2020-08-05: 2392.071518209231
2020-08-06: 1235.5159041553916
2020-08-07: 1094.9517848019732
2020-08-08: 1596.252844740135
2020-08-09: 1900.1859003172103
2020-08-10: 1649.8405526210145
2020-08-11: 1892.3001982787619
2020-08-12: 2056.0424343498034
2020-08-13: 1740.7056536219538
2020-08-14: 1686.1444380663925
2020-08-15: 1863.7162918368804
2020-08-16: 1969.5457891282063
2020-08-17: 1973.0304882614223
2020-08-18: 2099.3806406580484
2020-08-19: 2156.9137397127597
2020-08-20: 2083.7737145569054
2020-08-21: 2082.23129703053
2020-08-22: 2157.009606430904
2020-08-23: 2217.7405238569704
2020-08-24: 2267.958319716586
2020-08-25: 2347.4191767074103
2020-08-26: 2393.792660929381
2020-08-27: 2399.9684287510504
2020-08-28: 2418.1046389037488
2020-08-29: 2465.291270441008
2020-08-30: 2514.8261597132946
2020-08-31: 2569.055246002161

Predicting for United States__Kentucky
2020-08-01: 2254.3439557973416
2020-08-02: 2731.89648601619
2020-08-03

2020-08-26: 3488.5376678851726
2020-08-27: 3470.3151701265633
2020-08-28: 3465.5535168383362
2020-08-29: 3513.8032023915607
2020-08-30: 3608.438449672117
2020-08-31: 3718.8190454964865

Predicting for United States__Mississippi
2020-08-01: 1321.5833899848335
2020-08-02: 1800.6266678731781
2020-08-03: 1970.1536409720584
2020-08-04: 1793.5510111515518
2020-08-05: 1587.0243986288096
2020-08-06: 981.4723542086571
2020-08-07: 961.9941547420539
2020-08-08: 1384.221149219115
2020-08-09: 1683.9086010653668
2020-08-10: 1840.9588073870564
2020-08-11: 1857.7619232961583
2020-08-12: 1740.5646376030563
2020-08-13: 1517.8419282559944
2020-08-14: 1496.3396361203472
2020-08-15: 1647.759699343169
2020-08-16: 1809.8692211077569
2020-08-17: 1930.1969924128566
2020-08-18: 1983.7154638669754
2020-08-19: 1953.6136865666822
2020-08-20: 1884.1149698354566
2020-08-21: 1881.4170321521165
2020-08-22: 1950.3487811760829
2020-08-23: 2044.0252011751186
2020-08-24: 2131.298863292589
2020-08-25: 2188.8438107786706
20

2020-08-01: 6361.02250762834
2020-08-02: 7947.644454048668
2020-08-03: 8883.309834772639
2020-08-04: 8818.640494711724
2020-08-05: 7413.236007884818
2020-08-06: 4433.019678889381
2020-08-07: 4312.297704060462
2020-08-08: 6214.259297090315
2020-08-09: 7374.042383400983
2020-08-10: 8182.168761701602
2020-08-11: 8429.363119326075
2020-08-12: 7767.856196789364
2020-08-13: 6644.740542055995
2020-08-14: 6490.8368749768415
2020-08-15: 7106.504034465466
2020-08-16: 7750.814340361554
2020-08-17: 8309.178089098188
2020-08-18: 8574.517044927432
2020-08-19: 8378.96419123296
2020-08-20: 7995.569103904232
2020-08-21: 7923.093076017542
2020-08-22: 8165.648151291445
2020-08-23: 8525.463666744443
2020-08-24: 8892.473360568776
2020-08-25: 9127.520439110971
2020-08-26: 9150.476299286753
2020-08-27: 9078.370895533599
2020-08-28: 9099.87189727279
2020-08-29: 9250.057513537904
2020-08-30: 9484.31911496367
2020-08-31: 9744.079760495943

Predicting for United States__Oklahoma
2020-08-01: 1774.7441463963473
20

2020-08-02: 89.73566659217789
2020-08-03: 85.70196963676723
2020-08-04: 92.63799662587175
2020-08-05: 100.77376372398166
2020-08-06: 99.9078789125734
2020-08-07: 112.22683835569559
2020-08-08: 143.0796969169696
2020-08-09: 162.4067530017812
2020-08-10: 173.25663349821588
2020-08-11: 184.9148404625875
2020-08-12: 194.9253161964882
2020-08-13: 202.90781171668945
2020-08-14: 217.1549941391524
2020-08-15: 237.47965652475733
2020-08-16: 255.34215017410602
2020-08-17: 270.57873718715393
2020-08-18: 285.0963184173035
2020-08-19: 298.2884845932538
2020-08-20: 311.1106724490219
2020-08-21: 326.59908018368793
2020-08-22: 344.485934689598
2020-08-23: 362.2206338139862
2020-08-24: 379.29961048857433
2020-08-25: 395.8771092531217
2020-08-26: 411.78701946427555
2020-08-27: 427.6964817927211
2020-08-28: 437.60982226860824
2020-08-29: 453.19989109111714
2020-08-30: 470.00823121863846
2020-08-31: 487.29707328123175

Predicting for United States__Vermont
2020-08-01: 118.89519046859036
2020-08-02: 148.94

2020-08-29: 430.31300610551136
2020-08-30: 443.05597980043297
2020-08-31: 457.3901305947075

Predicting for Vietnam__nan
2020-08-01: 48.26725000199204
2020-08-02: 66.00034339735062
2020-08-03: 76.57869033390268
2020-08-04: 84.55160876337096
2020-08-05: 91.38431255009073
2020-08-06: 94.28368817719638
2020-08-07: 105.10481544591408
2020-08-08: 128.7371998498786
2020-08-09: 146.4978439747198
2020-08-10: 160.99025384393997
2020-08-11: 173.51401194873281
2020-08-12: 184.114331504608
2020-08-13: 193.49415979203428
2020-08-14: 206.77072984235815
2020-08-15: 224.33656018730508
2020-08-16: 241.28947851138463
2020-08-17: 257.23189674895264
2020-08-18: 272.1203572439672
2020-08-19: 285.8094952292378
2020-08-20: 299.14283170416843
2020-08-21: 314.12861409674804
2020-08-22: 330.8723660940963
2020-08-23: 348.0032385990679
2020-08-24: 365.02510987096923
2020-08-25: 381.6475754120592
2020-08-26: 397.74525735393047
2020-08-27: 413.7834965243021
2020-08-28: 449.8209498873679
2020-08-29: 474.459654979469

In [35]:
# Check the predictions
preds_df.head()

Unnamed: 0,CountryName,RegionName,Date,PredictedDailyNewCases
213,Aruba,,2020-08-01,54.448817
214,Aruba,,2020-08-02,74.420533
215,Aruba,,2020-08-03,83.765057
216,Aruba,,2020-08-04,89.706375
217,Aruba,,2020-08-05,95.334698


# Validation
This is how the predictor will be called during the competition.  
!!! PLEASE DO NOT CHANGE THE API !!!

In [36]:
!python predict.py -s 2020-08-01 -e 2020-08-04 -ip ../../../validation/data/2020-09-30_historical_ip.csv -o predictions/2020-08-01_2020-08-04.csv

Generating predictions from 2020-08-01 to 2020-08-04...
Saved predictions to predictions/2020-08-01_2020-08-04.csv
Done!


In [None]:
!head predictions/2020-08-01_2020-08-04.csv
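
`head` only shows the first few lines of the file. As a rough complement, the sketch below loads predictions with pandas and checks for the columns seen in the `preds_df` output above, plus missing or negative predictions. The helper name `check_predictions` and the specific checks are assumptions made for illustration and are not part of the official validation; the sample rows reuse the Aruba values printed earlier in place of a real file.

```python
import io
import pandas as pd

# Columns taken from the preds_df output shown earlier in this notebook
EXPECTED_COLUMNS = ["CountryName", "RegionName", "Date", "PredictedDailyNewCases"]

def check_predictions(pred_df):
    """Hypothetical helper: return a list of problems found in a predictions frame."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in pred_df.columns]
    if missing:
        # Without the expected columns the remaining checks cannot run
        problems.append(f"Missing columns: {missing}")
        return problems
    if pred_df["PredictedDailyNewCases"].isna().any():
        problems.append("Some predictions are missing")
    if (pred_df["PredictedDailyNewCases"] < 0).any():
        problems.append("Some predictions are negative")
    return problems

# Stand-in for a predictions file; in the notebook, pass the real path to pd.read_csv
sample_csv = io.StringIO(
    "CountryName,RegionName,Date,PredictedDailyNewCases\n"
    "Aruba,,2020-08-01,54.448817\n"
    "Aruba,,2020-08-02,74.420533\n"
)
sample_df = pd.read_csv(sample_csv, parse_dates=["Date"], dtype={"RegionName": str})
print(check_predictions(sample_df) or "No problems found")
```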

# Test cases
Now that we can generate a prediction file, let's validate a few cases...

In [None]:
import os
from covid_xprize.validation.predictor_validation import validate_submission

def validate(start_date, end_date, ip_file, output_file):
    # First, delete any potential old file
    try:
        os.remove(output_file)
    except OSError:
        pass
    
    # Then generate the prediction, calling the official API
    !python predict.py -s {start_date} -e {end_date} -ip {ip_file} -o {output_file}
    
    # And validate it
    errors = validate_submission(start_date, end_date, ip_file, output_file)
    if errors:
        for error in errors:
            print(error)
    else:
        print("All good!")
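
The `!python` shell escape in `validate` only works inside IPython or Jupyter. A minimal sketch of the same call from a plain Python script, assuming `predict.py` is in the current working directory, could use `subprocess`. The helper names `build_predict_command` and `run_predict` are hypothetical; the `-s`, `-e`, `-ip` and `-o` flags are the ones used in the cells above.

```python
import subprocess
import sys

def build_predict_command(start_date, end_date, ip_file, output_file):
    """Hypothetical helper: same flags as the `!python predict.py` call above."""
    return [sys.executable, "predict.py",
            "-s", start_date,
            "-e", end_date,
            "-ip", ip_file,
            "-o", output_file]

def run_predict(start_date, end_date, ip_file, output_file):
    """Run predict.py as a subprocess (assumes predict.py is in the working directory)."""
    cmd = build_predict_command(start_date, end_date, ip_file, output_file)
    # check=True raises CalledProcessError if predict.py exits with a non-zero status
    subprocess.run(cmd, check=True)

# Only build and display the command here; call run_predict(...) to execute it
cmd = build_predict_command("2020-08-01", "2020-08-04",
                            "../../../validation/data/2020-09-30_historical_ip.csv",
                            "predictions/val_4_days.csv")
print(" ".join(cmd[1:]))
```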

## 4 days, no gap
- All countries and regions
- Official number of cases is known up to start_date
- Intervention Plans are the official ones

In [None]:
validate(start_date="2020-08-01",
         end_date="2020-08-04",
         ip_file="../../../validation/data/2020-09-30_historical_ip.csv",
         output_file="predictions/val_4_days.csv")

## 1 month in the future
- 2 countries only
- There's a gap between the date of the last known number of cases and start_date
- For future dates, the Intervention Plans contain scenarios for which predictions are requested, to answer the question: what will happen if we apply these plans?

In [None]:
%%time
validate(start_date="2021-01-01",
         end_date="2021-01-31",
         ip_file="../../../validation/data/future_ip.csv",
         output_file="predictions/val_1_month_future.csv")

## 180 days, from a future date, all countries and regions
- Prediction start date is 1 week from now (i.e. assuming the submission date is 1 week from now).  
- Prediction end date is 6 months after start date.  
- Prediction is requested for all available countries and regions.  
- Intervention plan scenario: freeze last known intervention plans for each country and region.  

The number of cases between today and the start date is not known yet, but the model relies on prior cases as input. The model therefore has to predict the cases for this gap first, and then use those predictions as input for the requested period.  
This is the most demanding test. It should take less than 1 hour to generate the prediction file.
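
To illustrate the idea of predicting through the gap, here is a simplified sketch of a day-by-day rollout. It is not the code in `predict.py`: it ignores the npi inputs, and the function names, the 3-day lookback and the averaging weights are all assumptions chosen to keep the example small. Each prediction is appended to the history and becomes an input for the next day; the gap days are predicted and then discarded.

```python
import numpy as np

def rollout(known_cases, n_days, weights):
    """Roll a linear model forward, feeding each prediction back in as input.

    Assumes len(known_cases) >= len(weights).
    """
    lookback = len(weights)
    history = [float(c) for c in known_cases[-lookback:]]
    preds = []
    for _ in range(n_days):
        window = np.array(history[-lookback:])
        # Clip at zero: daily case counts cannot be negative
        pred = max(float(window @ weights), 0.0)
        preds.append(pred)
        history.append(pred)
    return preds

def predict_with_gap(known_cases, gap_days, horizon_days, weights):
    """Predict through the gap, then keep only the requested period."""
    all_preds = rollout(known_cases, gap_days + horizon_days, weights)
    return all_preds[gap_days:]

# Toy weights (assumption): the prediction is the mean of the last 3 days
weights = np.array([1 / 3, 1 / 3, 1 / 3])
known = [10.0, 20.0, 30.0]
print(rollout(known, 2, weights))
print(predict_with_gap(known, gap_days=1, horizon_days=1, weights=weights))
```

Because errors compound as predictions are fed back in, a longer gap generally means less reliable inputs for the requested period.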

### Generate the scenario

In [None]:
from datetime import datetime, timedelta

start_date = datetime.now() + timedelta(days=7)
start_date_str = start_date.strftime('%Y-%m-%d')
end_date = start_date + timedelta(days=180)
end_date_str = end_date.strftime('%Y-%m-%d')
print(f"Start date: {start_date_str}")
print(f"End date: {end_date_str}")

In [None]:
from covid_xprize.validation.scenario_generator import get_raw_data, generate_scenario, NPI_COLUMNS
DATA_FILE = 'data/OxCGRT_latest.csv'
latest_df = get_raw_data(DATA_FILE, latest=True)
scenario_df = generate_scenario(start_date_str, end_date_str, latest_df, countries=None, scenario="Freeze")
scenario_file = "predictions/180_days_future_scenario.csv"
scenario_df.to_csv(scenario_file, index=False)
print(f"Saved scenario to {scenario_file}")
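
As a rough illustration of what "freezing" the last known intervention plans means, the sketch below repeats each region's most recent npi values for every future date. It is not the implementation of `generate_scenario`, which may differ; the function name `freeze_last_known` and the toy data are assumptions. The `GeoID` key mirrors the `Country__Region` labels seen in the prediction log (e.g. `Vietnam__nan`).

```python
import pandas as pd

def freeze_last_known(ips_df, start_date, end_date, npi_cols):
    """Hypothetical helper: repeat each region's last known npi values over future dates."""
    df = ips_df.copy()
    # Countries without regions have a missing RegionName, hence keys like "Vietnam__nan"
    df["GeoID"] = df["CountryName"] + "__" + df["RegionName"].astype(str)
    # Most recent row for each country/region
    last_rows = df.sort_values("Date").groupby("GeoID").tail(1)
    future_dates = pd.date_range(start_date, end_date, freq="D")
    frames = []
    for _, last in last_rows.iterrows():
        frozen = pd.DataFrame({"Date": future_dates})
        frozen["CountryName"] = last["CountryName"]
        frozen["RegionName"] = last["RegionName"]
        for col in npi_cols:
            frozen[col] = last[col]
        frames.append(frozen)
    out = pd.concat(frames, ignore_index=True)
    return out[["CountryName", "RegionName", "Date"] + list(npi_cols)]

# Toy intervention plan data (made-up values, for illustration only)
ips = pd.DataFrame({
    "CountryName": ["Vietnam", "Vietnam", "United States", "United States"],
    "RegionName": [float("nan"), float("nan"), "Connecticut", "Connecticut"],
    "Date": pd.to_datetime(["2020-08-30", "2020-08-31", "2020-08-30", "2020-08-31"]),
    "C1_School closing": [1, 2, 3, 0],
})
frozen_df = freeze_last_known(ips, "2020-09-01", "2020-09-03", ["C1_School closing"])
print(frozen_df)
```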

### Check it

In [None]:
%%time
validate(start_date=start_date_str,
         end_date=end_date_str,
         ip_file=scenario_file,
         output_file="predictions/val_6_month_future.csv")