# Example Predictor: SVR Predictor

This example contains basic functionality for training and evaluating an SVR (support vector regression) predictor that rolls out predictions day by day.

First, a training data set is created from historical case and NPI (non-pharmaceutical intervention) data.

Second, an SVR model is trained to predict future cases from prior case data and prior NPI data.
The model is an off-the-shelf sklearn `SVR` with a polynomial kernel. NPI values are negated when the features are built, so stricter NPIs appear as more negative inputs.

Third, a sample evaluation set is created, and the predictor is applied to this evaluation set to produce prediction results in the correct format.

## Training

In [1]:
import pickle
import numpy as np
import pandas as pd
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
import time



### Copy the data locally

In [2]:
# Main source for the training data
DATA_URL = 'https://raw.githubusercontent.com/OxCGRT/covid-policy-tracker/master/data/OxCGRT_latest.csv'
# Local file
DATA_FILE = 'data/OxCGRT_latest.csv'

In [3]:
import os
import urllib.request
if not os.path.exists('data'):
    os.mkdir('data')
urllib.request.urlretrieve(DATA_URL, DATA_FILE)

('data/OxCGRT_latest.csv', <http.client.HTTPMessage at 0x7f65b6ac8280>)

In [4]:
# Load historical data from local file
df = pd.read_csv(DATA_FILE, 
                 parse_dates=['Date'],
                 encoding="ISO-8859-1",
                 dtype={"RegionName": str,
                        "RegionCode": str},
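                 # Skip malformed rows. pandas >= 1.3 uses on_bad_lines;
                 # older versions used error_bad_lines=False instead.
                 on_bad_lines='skip')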

In [5]:
df.columns

Index(['CountryName', 'CountryCode', 'RegionName', 'RegionCode',
       'Jurisdiction', 'Date', 'C1_School closing', 'C1_Flag',
       'C2_Workplace closing', 'C2_Flag', 'C3_Cancel public events', 'C3_Flag',
       'C4_Restrictions on gatherings', 'C4_Flag', 'C5_Close public transport',
       'C5_Flag', 'C6_Stay at home requirements', 'C6_Flag',
       'C7_Restrictions on internal movement', 'C7_Flag',
       'C8_International travel controls', 'E1_Income support', 'E1_Flag',
       'E2_Debt/contract relief', 'E3_Fiscal measures',
       'E4_International support', 'H1_Public information campaigns',
       'H1_Flag', 'H2_Testing policy', 'H3_Contact tracing',
       'H4_Emergency investment in healthcare', 'H5_Investment in vaccines',
       'H6_Facial Coverings', 'H6_Flag', 'M1_Wildcard', 'ConfirmedCases',
       'ConfirmedDeaths', 'StringencyIndex', 'StringencyIndexForDisplay',
       'StringencyLegacyIndex', 'StringencyLegacyIndexForDisplay',
       'GovernmentResponseIndex', 'Gove

In [6]:
# For testing, restrict training data to that before a hypothetical predictor submission date
HYPOTHETICAL_SUBMISSION_DATE = np.datetime64("2020-07-31")
df = df[df.Date <= HYPOTHETICAL_SUBMISSION_DATE]

In [7]:
# Add GeoID column that combines CountryName and RegionName for easier manipulation of data
df['GeoID'] = df['CountryName'] + '__' + df['RegionName'].astype(str)

In [8]:
# Add new cases column
df['NewCases'] = df.groupby('GeoID').ConfirmedCases.diff().fillna(0)
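
As a small illustration of the step above, the following sketch applies the same `groupby(...).diff().fillna(0)` pattern to a toy frame. The `toy` data is made up for illustration; only the column names mirror the notebook.

```python
import pandas as pd

# Hypothetical cumulative case counts for two regions (not real data)
toy = pd.DataFrame({
    'GeoID': ['A', 'A', 'A', 'B', 'B', 'B'],
    'ConfirmedCases': [0, 5, 12, 100, 100, 130],
})

# diff() is computed within each GeoID, so the first row of every region
# has no predecessor and becomes NaN, which fillna(0) turns into 0
toy['NewCases'] = toy.groupby('GeoID').ConfirmedCases.diff().fillna(0)
new_cases = toy['NewCases'].tolist()
print(new_cases)
```

Grouping before differencing keeps the first day of region B from being compared with the last day of region A.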

In [9]:
# Keep only columns of interest
id_cols = ['CountryName',
           'RegionName',
           'GeoID',
           'Date']
cases_col = ['NewCases']
npi_cols = ['C1_School closing',
            'C2_Workplace closing',
            'C3_Cancel public events',
            'C4_Restrictions on gatherings',
            'C5_Close public transport',
            'C6_Stay at home requirements',
            'C7_Restrictions on internal movement',
            'C8_International travel controls',
            'H1_Public information campaigns',
            'H2_Testing policy',
            'H3_Contact tracing',
            'H6_Facial Coverings']
df = df[id_cols + cases_col + npi_cols]

In [10]:
# Fill any missing case values by interpolation and setting NaNs to 0
df.update(df.groupby('GeoID').NewCases.apply(
    lambda group: group.interpolate()).fillna(0))

In [11]:
# Fill any missing NPIs by assuming they are the same as previous day
for npi_col in npi_cols:
    df.update(df.groupby('GeoID')[npi_col].ffill().fillna(0))
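
The forward fill above can be checked on a toy frame. This is a minimal sketch with made-up values; it only shows that the fill stays inside each `GeoID` and that leading gaps fall back to 0.

```python
import numpy as np
import pandas as pd

# Hypothetical NPI values with gaps (not real data)
toy = pd.DataFrame({
    'GeoID': ['A', 'A', 'A', 'B', 'B', 'B'],
    'C1_School closing': [np.nan, 2.0, np.nan, np.nan, 1.0, np.nan],
})

# Carry the previous day's value forward within each region, then
# set any value that still has no predecessor to 0
filled = toy.groupby('GeoID')['C1_School closing'].ffill().fillna(0)
filled_values = filled.tolist()
print(filled_values)
```

Without the `groupby`, the first row of region B would inherit the value 2.0 from the end of region A.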

In [12]:
df

Unnamed: 0,CountryName,RegionName,GeoID,Date,NewCases,C1_School closing,C2_Workplace closing,C3_Cancel public events,C4_Restrictions on gatherings,C5_Close public transport,C6_Stay at home requirements,C7_Restrictions on internal movement,C8_International travel controls,H1_Public information campaigns,H2_Testing policy,H3_Contact tracing,H6_Facial Coverings
0,Aruba,,Aruba__nan,2020-01-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Aruba,,Aruba__nan,2020-01-02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Aruba,,Aruba__nan,2020-01-03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Aruba,,Aruba__nan,2020-01-04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Aruba,,Aruba__nan,2020-01-05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
87064,Zimbabwe,,Zimbabwe__nan,2020-07-27,78.0,3.0,1.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,4.0
87065,Zimbabwe,,Zimbabwe__nan,2020-07-28,192.0,3.0,1.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,4.0
87066,Zimbabwe,,Zimbabwe__nan,2020-07-29,113.0,3.0,1.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,4.0
87067,Zimbabwe,,Zimbabwe__nan,2020-07-30,62.0,3.0,1.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,4.0


In [13]:
# Set number of past days to use to make predictions
nb_lookback_days = 2

# Create training data across all countries for predicting one day ahead
X_cols = cases_col + npi_cols
y_col = cases_col
X_samples = []
y_samples = []
geo_ids = df.GeoID.unique()
for g in geo_ids:
    gdf = df[df.GeoID == g]
    all_case_data = np.array(gdf[cases_col])
    all_npi_data = np.array(gdf[npi_cols])

    # Create one sample for each day where we have enough data
    # Each sample consists of cases and npis for previous nb_lookback_days
    nb_total_days = len(gdf)
    for d in range(nb_lookback_days, nb_total_days - 1):
        X_cases = all_case_data[d-nb_lookback_days:d]

        # Negate npis so that stricter measures give more negative inputs.
        # (SVR itself has no positive weight constraint.)
        X_npis = -all_npi_data[d - nb_lookback_days:d]

        # Flatten all input data into a single 1-D feature vector.
        X_sample = np.concatenate([X_cases.flatten(),
                                   X_npis.flatten()])
        y_sample = all_case_data[d + 1]
        X_samples.append(X_sample)
        y_samples.append(y_sample)

X_samples = np.array(X_samples)
y_samples = np.array(y_samples).flatten()
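
To make the shape of the samples concrete, here is a sketch that wraps the same windowing loop in a helper and runs it on a tiny synthetic series. The function name `make_samples` and the toy arrays are assumptions introduced for illustration.

```python
import numpy as np

def make_samples(cases, npis, nb_lookback_days):
    """Hypothetical helper mirroring the loop above for a single region."""
    X, y = [], []
    for d in range(nb_lookback_days, len(cases) - 1):
        X_cases = cases[d - nb_lookback_days:d]
        X_npis = -npis[d - nb_lookback_days:d]
        X.append(np.concatenate([X_cases.flatten(), X_npis.flatten()]))
        y.append(cases[d + 1])
    return np.array(X), np.array(y).flatten()

# Six days of toy data: one case column and two NPI columns
cases = np.arange(6, dtype=float).reshape(-1, 1)
npis = np.array([[d, 10.0 * d] for d in range(6)])

X, y = make_samples(cases, npis, nb_lookback_days=2)
print(X.shape, y)
```

Each row holds `nb_lookback_days` case values followed by `nb_lookback_days * len(npi_cols)` negated NPI values, which matches the 26 features reported for 2 lookback days and 12 NPIs. Note that, as written, the input window ends at day `d - 1` while the target is day `d + 1`.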

In [14]:
# Helpful function to compute mae
def mae(pred, true):
    return np.mean(np.abs(pred - true))

In [15]:
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X_samples,
                                                    y_samples,
                                                    test_size=0.5,
                                                    random_state=301)
print(X_train.shape)

(27825, 26)


In [16]:
# Create and train an SVR model with a polynomial kernel,
# timing how long the fit takes.
start_time = time.time()

model = SVR(kernel='poly')
# Fit model
model.fit(X_train, y_train)

interval = time.time() - start_time
print('Total time in seconds:', interval)

Total time in seconds: 151.36163139343262


In [17]:
# Evaluate model
train_preds = model.predict(X_train)
train_preds = np.maximum(train_preds, 0) # Don't predict negative cases
print('Train MAE:', mae(train_preds, y_train))

test_preds = model.predict(X_test)
test_preds = np.maximum(test_preds, 0) # Don't predict negative cases
print('Test MAE:', mae(test_preds, y_test))

Train MAE: 382.24827551287336
Test MAE: 378.11705666028627
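
The effect of clipping predictions at zero before computing the MAE can be seen on a few made-up numbers. This sketch redefines the `mae` helper so it runs on its own; the arrays are illustrative only.

```python
import numpy as np

def mae(pred, true):
    return np.mean(np.abs(pred - true))

# Hypothetical raw model outputs, one of them negative
raw_preds = np.array([-5.0, 10.0, 20.0])
true_cases = np.array([0.0, 12.0, 17.0])

# Case counts cannot be negative, so clip before scoring
clipped_preds = np.maximum(raw_preds, 0)

mae_raw = mae(raw_preds, true_cases)
mae_clipped = mae(clipped_preds, true_cases)
print(mae_raw, mae_clipped)
```

Since true new-case counts are normally non-negative, clipping at zero should never make the error worse.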


In [195]:
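# Inspect the learned feature coefficients for the model
# to see what features it's paying attention to.
# Note: sklearn's SVR only exposes coef_ when kernel='linear'; with
# kernel='poly' (as trained above) accessing model.coef_ raises AttributeError.
# The output shown below lists lags up to Day -26, so it does not
# correspond to nb_lookback_days = 2.

# Give names to the features, in the same order as they were flattened
x_col_names = []
for d in range(-nb_lookback_days, 0):
    x_col_names.append('Day ' + str(d) + ' ' + cases_col[0])
for d in range(-nb_lookback_days, 0):
    for col_name in npi_cols:
        x_col_names.append('Day ' + str(d) + ' ' + col_name)

# View non-zero coefficients (coef_ has shape (1, n_features) for SVR)
for (col, coeff) in zip(x_col_names, np.ravel(model.coef_)):
    if coeff != 0.:
        print(col, coeff)
print('Intercept', model.intercept_)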

Day -7 NewCases 0.0013246240389721805
Day -6 NewCases 0.43932558850808323
Day -5 NewCases 0.21733329048235964
Day -4 NewCases 0.05883044887710127
Day -3 NewCases 0.06951319628060729
Day -2 NewCases 0.052031127397790256
Day -1 NewCases 0.23822431735053784
Day -26 C6_Stay at home requirements 4.314491844181091
Day -22 C2_Workplace closing 9.715799339190195
Day -17 C2_Workplace closing 5.77128206222885
Intercept 26.55971959613379
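
Whether coefficients can be inspected depends on the kernel. The sketch below, on synthetic data, checks which fitted `SVR` models expose `coef_`; the data and variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVR

# Small synthetic regression problem (not the notebook's data)
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -2.0, 0.5])

linear_model = SVR(kernel='linear').fit(X, y)
poly_model = SVR(kernel='poly').fit(X, y)

# coef_ is only defined for the linear kernel; for other kernels the
# attribute access raises AttributeError, so hasattr returns False
has_linear_coef = hasattr(linear_model, 'coef_')
has_poly_coef = hasattr(poly_model, 'coef_')
print(has_linear_coef, has_poly_coef)
```

For a non-linear kernel, the fitted model is described by its support vectors and dual coefficients rather than by one weight per input feature.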


In [196]:
# Save model to file
if not os.path.exists('models'):
    os.mkdir('models')
with open('models/model.pkl', 'wb') as model_file:
    pickle.dump(model, model_file)

## Evaluation

Now that the predictor has been trained and saved, this section contains the functionality for evaluating it on sample evaluation data.
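
The `predict` module itself is not shown here, so the following is only a sketch of what a day-by-day rollout could look like under the feature layout used in training. The `rollout` function and the `MeanOfCasesModel` stand-in are hypothetical and are not taken from `predict.py`.

```python
import numpy as np

def rollout(model, past_cases, past_npis, future_npis, nb_lookback_days):
    """Hypothetical rollout: each prediction is appended to the case
    history and becomes an input for the following day."""
    cases = list(past_cases[-nb_lookback_days:])
    npis = np.vstack([past_npis[-nb_lookback_days:], future_npis])
    preds = []
    for day in range(len(future_npis)):
        X_cases = np.array(cases[-nb_lookback_days:])
        # NPIs for the lookback days preceding the predicted day, negated
        X_npis = -npis[day:day + nb_lookback_days]
        X = np.concatenate([X_cases.flatten(), X_npis.flatten()]).reshape(1, -1)
        pred = max(float(model.predict(X)[0]), 0.0)  # no negative cases
        preds.append(pred)
        cases.append(pred)
    return np.array(preds)

class MeanOfCasesModel:
    """Stand-in for a trained regressor: averages the case features."""
    def __init__(self, nb_lookback_days):
        self.nb_lookback_days = nb_lookback_days

    def predict(self, X):
        return X[:, :self.nb_lookback_days].mean(axis=1)

preds = rollout(MeanOfCasesModel(2),
                past_cases=np.array([10.0, 20.0]),
                past_npis=np.zeros((2, 3)),
                future_npis=np.zeros((4, 3)),
                nb_lookback_days=2)
print(preds, preds.shape)
```

Because each step feeds on earlier predictions, errors can compound over the prediction window.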

In [197]:
# Reload the module to get the latest changes
import predict
from importlib import reload
reload(predict)
from predict import predict_df

In [198]:
%%time
preds_df = predict_df("2020-08-01", "2020-08-31", path_to_ips_file="../../../validation/data/2020-09-30_historical_ip.csv", verbose=True)


Predicting for Aruba__nan
2020-08-01: 58.844691538124245
2020-08-02: 71.32339797888373
2020-08-03: 78.99821175219239
2020-08-04: 90.24702060691985
2020-08-05: 87.81468190769263
2020-08-06: 98.77662008875717
2020-08-07: 131.04966110524353
2020-08-08: 147.02909378415967
2020-08-09: 158.9669020318663
2020-08-10: 169.95393176817132
2020-08-11: 177.53057188557548
2020-08-12: 193.50373596073152
2020-08-13: 216.8349592365735
2020-08-14: 234.05458053144855
2020-08-15: 248.58034443852705
2020-08-16: 261.9875797943422
2020-08-17: 275.32154428012313
2020-08-18: 293.3164901988505
2020-08-19: 314.09720089504225
2020-08-20: 332.4524664281599
2020-08-21: 349.2598980364087
2020-08-22: 365.52936360035164
2020-08-23: 382.5647394069261
2020-08-24: 407.9286106098403
2020-08-25: 430.1195473706485
2020-08-26: 450.6112812074667
2020-08-27: 466.04250578239487
2020-08-28: 484.6917479475174
2020-08-29: 515.4011439840982
2020-08-30: 541.9538140452253
2020-08-31: 566.3175244293836
(31,)

Predicting for Afghanist

2020-08-01: 2144.332695101401
2020-08-02: 2777.197014257926
2020-08-03: 3001.935197889325
2020-08-04: 2959.720273647507
2020-08-05: 2638.85852292054
2020-08-06: 1675.639874058618
2020-08-07: 2515.1391339682514
2020-08-08: 2969.940284086295
2020-08-09: 3126.5290564056013
2020-08-10: 3101.2034748194906
2020-08-11: 2833.9624209230765
2020-08-12: 2565.4820421577147
2020-08-13: 2961.4513938292084
2020-08-14: 3256.694647335901
2020-08-15: 3377.1384643214483
2020-08-16: 3363.9240914269913
2020-08-17: 3235.0721866850313
2020-08-18: 3208.726290666877
2020-08-19: 3439.682958890746
2020-08-20: 3640.0062092948474
2020-08-21: 3740.766425811254
2020-08-22: 3756.0479216754384
2020-08-23: 3749.5237225945566
2020-08-24: 3806.0033506906666
2020-08-25: 3971.07593144318
2020-08-26: 4123.996042396558
2020-08-27: 4218.95278380375
2020-08-28: 4269.756934972487
2020-08-29: 4316.5705832789645
2020-08-30: 4406.643306209051
2020-08-31: 4545.484546546078
(31,)

Predicting for Burundi__nan
2020-08-01: 56.720283064

2020-08-17: 1805.3405240528039
2020-08-18: 1794.6481259781447
2020-08-19: 1922.5576722084572
2020-08-20: 2027.8611327454012
2020-08-21: 2086.480804507241
2020-08-22: 2103.064322605483
2020-08-23: 2084.465891235773
2020-08-24: 2114.18393196926
2020-08-25: 2203.9629103789403
2020-08-26: 2285.751431772877
2020-08-27: 2331.7461660028644
2020-08-28: 2358.2690831540385
2020-08-29: 2376.2576837027927
2020-08-30: 2422.4751400840555
2020-08-31: 2496.227749088482
(31,)

Predicting for Belize__nan
2020-08-01: 90.24670752412706
2020-08-02: 115.09458860912723
2020-08-03: 115.59422268420562
2020-08-04: 130.1142254691766
2020-08-05: 133.13879965417544
2020-08-06: 123.67937584586683
2020-08-07: 167.5732706686711
2020-08-08: 189.74658547057885
2020-08-09: 200.24113547699562
2020-08-10: 213.42660908892782
2020-08-11: 220.5295470030658
2020-08-12: 230.32951445957664
2020-08-13: 258.6578228489191
2020-08-14: 279.26593594832696
2020-08-15: 294.2538775634252
2020-08-16: 308.79268298359665
2020-08-17: 321.40

2020-08-01: 52.43243951479994
2020-08-02: 67.11107907783762
2020-08-03: 71.15796934836747
2020-08-04: 74.76925341890174
2020-08-05: 79.93979797389343
2020-08-06: 93.89962444224932
2020-08-07: 124.2084054878473
2020-08-08: 140.12459342605703
2020-08-09: 149.34999247323088
2020-08-10: 158.01958873847354
2020-08-11: 168.76464073775372
2020-08-12: 186.0799708599353
2020-08-13: 208.70239852942325
2020-08-14: 225.28702891751902
2020-08-15: 238.20897047437683
2020-08-16: 250.897687269757
2020-08-17: 265.5717875156541
2020-08-18: 284.1395573739485
2020-08-19: 304.5345384813852
2020-08-20: 322.2501342423749
2020-08-21: 338.1421712966197
2020-08-22: 354.14063913468317
2020-08-23: 371.7089674232608
2020-08-24: 391.4827539364624
2020-08-25: 411.9893120773748
2020-08-26: 431.32955191499707
2020-08-27: 445.5796912783702
2020-08-28: 463.4372624886415
2020-08-29: 483.02054313247726
2020-08-30: 503.91043366777944
2020-08-31: 525.2240592171854
(31,)

Predicting for Canada__nan
2020-08-01: 3992.723776106

2020-08-16: 8403.039349827659
2020-08-17: 8067.457024749086
2020-08-18: 8025.8694388755885
2020-08-19: 8630.641021762716
2020-08-20: 9086.552259317094
2020-08-21: 9274.101001248462
2020-08-22: 9272.666120133108
2020-08-23: 9212.2059980136
2020-08-24: 9350.308830521859
2020-08-25: 9755.717541779833
2020-08-26: 10097.049363235892
2020-08-27: 10288.186873849114
2020-08-28: 10374.264317567733
2020-08-29: 10455.743363970292
2020-08-30: 10661.700474625331
2020-08-31: 10984.704260040297
(31,)

Predicting for Comoros__nan
2020-08-01: 51.18706624998747
2020-08-02: 63.54443598451152
2020-08-03: 69.34759312089086
2020-08-04: 75.8178352984691
2020-08-05: 79.77950362676793
2020-08-06: 93.30966400646543
2020-08-07: 122.7036326801595
2020-08-08: 137.82394736496258
2020-08-09: 148.10101007372515
2020-08-10: 157.88646983143383
2020-08-11: 168.2222404193158
2020-08-12: 185.13522851724812
2020-08-13: 207.20451086000236
2020-08-14: 223.5513328183471
2020-08-15: 237.03927836889264
2020-08-16: 250.191006745

2020-08-08: 138.83932383255802
2020-08-09: 148.36879896050596
2020-08-10: 158.32292489608108
2020-08-11: 169.57691075745203
2020-08-12: 186.05975687516133
2020-08-13: 208.13556498032807
2020-08-14: 224.44648488777648
2020-08-15: 237.65877965809588
2020-08-16: 250.9907893146705
2020-08-17: 265.82642330301167
2020-08-18: 278.2150884791778
2020-08-19: 296.9075272379077
2020-08-20: 313.88723549760164
2020-08-21: 329.3623608294914
2020-08-22: 345.0594587216441
2020-08-23: 351.396079286382
2020-08-24: 365.32180185764884
2020-08-25: 382.88924598873797
2020-08-26: 400.01039617042676
2020-08-27: 407.94756064795445
2020-08-28: 421.06343877568355
2020-08-29: 432.6557414229309
2020-08-30: 447.6030366984314
2020-08-31: 464.6029500314374
(31,)

Predicting for Denmark__nan
2020-08-01: 1093.0719999155933
2020-08-02: 1289.7077807881585
2020-08-03: 1295.4107712912044
2020-08-04: 1248.021833190158
2020-08-05: 944.4572399637909
2020-08-06: 745.2135447227547
2020-08-07: 1200.8194668445312
2020-08-08: 1364.

2020-08-11: 593.0403819077885
2020-08-12: 585.7343119785783
2020-08-13: 681.6970420089704
2020-08-14: 732.1322733137142
2020-08-15: 751.8576360993136
2020-08-16: 757.2969614721978
2020-08-17: 749.8748590557634
2020-08-18: 776.09329365228
2020-08-19: 836.6019986441504
2020-08-20: 878.7564624001468
2020-08-21: 903.2477092787212
2020-08-22: 917.826753454252
2020-08-23: 941.2246128263041
2020-08-24: 976.3987005561805
2020-08-25: 1024.229027918421
2020-08-26: 1063.8601233020204
2020-08-27: 1089.2809396989167
2020-08-28: 1114.3155158326988
2020-08-29: 1145.113888436592
2020-08-30: 1183.7309768661798
2020-08-31: 1227.441567412267
(31,)

Predicting for Finland__nan
2020-08-01: 335.5192462936506
2020-08-02: 442.37561084417325
2020-08-03: 503.326489691257
2020-08-04: 515.5070456523931
2020-08-05: 436.6164065795811
2020-08-06: 316.0036494692063
2020-08-07: 457.6637854721006
2020-08-08: 541.0035311911606
2020-08-09: 584.7685373747113
2020-08-10: 590.5686379587848
2020-08-11: 547.4989379063229
2020

(31,)

Predicting for Georgia__nan
2020-08-01: 3301.201214145864
2020-08-02: 4629.122800918369
2020-08-03: 5778.083704359073
2020-08-04: 4546.143048620199
2020-08-05: 3694.526846911816
2020-08-06: 2564.0330921691293
2020-08-07: 3966.0300633724187
2020-08-08: 4946.995021104764
2020-08-09: 5363.732219382251
2020-08-10: 4820.214038542471
2020-08-11: 4241.625643197036
2020-08-12: 3969.109007034139
2020-08-13: 4688.4514687289775
2020-08-14: 5296.833033370514
2020-08-15: 5492.4685281023785
2020-08-16: 5240.724012334731
2020-08-17: 4961.405687517957
2020-08-18: 4972.775730962942
2020-08-19: 5402.846998040047
2020-08-20: 5782.414028873704
2020-08-21: 5911.612255932247
2020-08-22: 5821.660414387518
2020-08-23: 5747.781964985561
2020-08-24: 5854.90706947834
2020-08-25: 6149.379770340968
2020-08-26: 6410.079001821343
2020-08-27: 6524.00480016094
2020-08-28: 6536.077874111397
2020-08-29: 6571.033665480212
2020-08-30: 6714.20921929755
2020-08-31: 6943.847365813679
(31,)

Predicting for Ghana__nan
2

2020-08-20: 382.023108991903
2020-08-21: 402.16217699776195
2020-08-22: 420.1137163724367
2020-08-23: 447.3640524811037
2020-08-24: 471.4819331518947
2020-08-25: 496.2043172643343
2020-08-26: 520.4282036056238
2020-08-27: 539.2288104397155
2020-08-28: 561.9410184772505
2020-08-29: 588.7053115766437
2020-08-30: 614.999646856829
2020-08-31: 641.4987995884433
(31,)

Predicting for Honduras__nan
2020-08-01: 116.83461294935418
2020-08-02: 134.2210656853041
2020-08-03: 294.5441691504922
2020-08-04: 551.9899213789836
2020-08-05: 214.95665905322036
2020-08-06: 184.36738903139883
2020-08-07: 241.9789077763383
2020-08-08: 288.46579127694844
2020-08-09: 406.9921183428444
2020-08-10: 479.9182435185431
2020-08-11: 355.7044575013817
2020-08-12: 339.5179392154956
2020-08-13: 376.61406115061806
2020-08-14: 426.523743795174
2020-08-15: 499.89336828977656
2020-08-16: 526.7945887682693
2020-08-17: 484.6808167780237
2020-08-18: 490.6421752340345
2020-08-19: 523.180316614703
2020-08-20: 567.8184183396797
2

2020-08-01: 59.45237438208073
2020-08-02: 73.01170425460379
2020-08-03: 84.33809986619833
2020-08-04: 88.37179620633623
2020-08-05: 88.2276366699414
2020-08-06: 99.38133675811807
2020-08-07: 132.03289677700084
2020-08-08: 149.11623058722375
2020-08-09: 161.52223490501245
2020-08-10: 170.0481819838097
2020-08-11: 178.1992718654942
2020-08-12: 194.44826141281797
2020-08-13: 218.1380103166305
2020-08-14: 235.93976588855404
2020-08-15: 250.34810905258496
2020-08-16: 262.8430608878453
2020-08-17: 276.32420248793665
2020-08-18: 294.5326850361376
2020-08-19: 315.5859914883388
2020-08-20: 334.20456917259736
2020-08-21: 350.8633293047878
2020-08-22: 366.7736310476189
2020-08-23: 383.85990921151375
2020-08-24: 403.60434416792026
2020-08-25: 424.60279768010014
2020-08-26: 444.5842056141152
2020-08-27: 454.9866650417788
2020-08-28: 471.8509689374912
2020-08-29: 490.85266528560953
2020-08-30: 511.41595497168413
2020-08-31: 532.6815203624344
(31,)

Predicting for Israel__nan
2020-08-01: 609.18413195

2020-08-22: 361.25727707066903
2020-08-23: 390.05792082220205
2020-08-24: 415.28434717303065
2020-08-25: 438.3488423497947
2020-08-26: 459.8221285879077
2020-08-27: 471.98020644759396
2020-08-28: 492.7369353883235
2020-08-29: 523.6359909470846
2020-08-30: 550.3187705507987
2020-08-31: 575.27420035766
(31,)

Predicting for South Korea__nan
2020-08-01: 340.4193968774451
2020-08-02: 413.84427224140666
2020-08-03: 424.6995121328037
2020-08-04: 401.432031254995
2020-08-05: 336.69327518508277
2020-08-06: 279.9835771134824
2020-08-07: 427.2803877331331
2020-08-08: 488.6179254653402
2020-08-09: 502.9528020140346
2020-08-10: 492.18448705259306
2020-08-11: 462.4972600881378
2020-08-12: 466.4824556454319
2020-08-13: 543.9487813656931
2020-08-14: 590.1710479550004
2020-08-15: 607.7821008443746
2020-08-16: 608.8380305796361
2020-08-17: 605.5858310989179
2020-08-18: 633.1282640817196
2020-08-19: 684.7136482831972
2020-08-20: 722.5083614007854
2020-08-21: 743.9469630088374
2020-08-22: 756.00727316972

2020-08-12: 705.6408871229978
2020-08-13: 832.1529501109783
2020-08-14: 912.8402336005577
2020-08-15: 921.5717611430784
2020-08-16: 912.8087884947397
2020-08-17: 900.1759095455989
2020-08-18: 916.4243276429313
2020-08-19: 992.6363318429471
2020-08-20: 1047.756847456852
2020-08-21: 1067.277950809489
2020-08-22: 1074.4661993620284
2020-08-23: 1073.7634778276883
2020-08-24: 1102.2548417857993
2020-08-25: 1156.1366571999997
2020-08-26: 1199.3884839907796
2020-08-27: 1216.017233057352
2020-08-28: 1230.6818185687914
2020-08-29: 1247.1098761695105
2020-08-30: 1279.7132624140297
2020-08-31: 1323.4420736343918
(31,)

Predicting for Latvia__nan
2020-08-01: 411.3166710928832
2020-08-02: 456.532827455001
2020-08-03: 509.91103326776266
2020-08-04: 588.7625488717513
2020-08-05: 439.43441725245333
2020-08-06: 338.1881690486254
2020-08-07: 504.9514803325872
2020-08-08: 565.6795097809891
2020-08-09: 613.6476603334613
2020-08-10: 636.1286969966195
2020-08-11: 570.5088224887995
2020-08-12: 554.5187110225

2020-08-16: 288.27138552889187
2020-08-17: 300.66324485855824
2020-08-18: 324.6324526094816
2020-08-19: 348.8230153857429
2020-08-20: 369.65926349339344
2020-08-21: 387.74191579343193
2020-08-22: 403.7529514867653
2020-08-23: 431.76324518584585
2020-08-24: 457.5559547695096
2020-08-25: 482.5222951604585
2020-08-26: 505.8168324639025
2020-08-27: 518.9284958254716
2020-08-28: 539.6625204113088
2020-08-29: 566.3046044574949
2020-08-30: 592.806590465553
2020-08-31: 618.7840752924276
(31,)

Predicting for Mozambique__nan
2020-08-01: 104.06899641620504
2020-08-02: 187.922100119506
2020-08-03: 194.619499864302
2020-08-04: 172.11536706482974
2020-08-05: 145.72960796293677
2020-08-06: 141.62347902408686
2020-08-07: 201.97216039504858
2020-08-08: 251.4091849653353
2020-08-09: 262.6510958158305
2020-08-10: 256.24264710550443
2020-08-11: 249.773646514167
2020-08-12: 262.86589673050344
2020-08-13: 303.11568671888693
2020-08-14: 336.8008078831642
2020-08-15: 351.0607501048903
2020-08-16: 355.5721710

2020-08-06: 98.99470786364424
2020-08-07: 130.04393994320148
2020-08-08: 147.10151256469652
2020-08-09: 164.40452442104586
2020-08-10: 178.24987390756166
2020-08-11: 180.0217393236549
2020-08-12: 194.78897820212316
2020-08-13: 217.74151751753348
2020-08-14: 236.21089996006904
2020-08-15: 253.56906619328782
2020-08-16: 267.62007558583235
2020-08-17: 278.510884979071
2020-08-18: 289.83665594652103
2020-08-19: 309.2165209212618
2020-08-20: 327.9232339985329
2020-08-21: 345.5200896098982
2020-08-22: 361.26177290075026
2020-08-23: 365.91678974956665
2020-08-24: 379.37054054235165
2020-08-25: 397.54190703120435
2020-08-26: 415.88884892270113
2020-08-27: 424.9618197135189
2020-08-28: 438.0831976438845
2020-08-29: 449.0153578210876
2020-08-30: 463.87842033311154
2020-08-31: 481.42222042650144
(31,)

Predicting for Netherlands__nan
2020-08-01: 4354.863072376625
2020-08-02: 5588.274903430284
2020-08-03: 5850.4634655131895
2020-08-04: 5878.603081500332
2020-08-05: 4772.645788514458
2020-08-06: 31

2020-08-03: 1707.7589568404799
2020-08-04: 1839.492379217558
2020-08-05: 1625.487586948136
2020-08-06: 1021.5287304589253
2020-08-07: 1499.3691766181482
2020-08-08: 1730.837156399053
2020-08-09: 1859.7377417549524
2020-08-10: 1911.7616614136093
2020-08-11: 1749.9598643428012
2020-08-12: 1574.9318422579818
2020-08-13: 1795.4500239876536
2020-08-14: 1961.026296186134
2020-08-15: 2058.501233816576
2020-08-16: 2083.23046792106
2020-08-17: 2009.622049706077
2020-08-18: 1986.478754095436
2020-08-19: 2117.221701453168
2020-08-20: 2237.7203004389175
2020-08-21: 2315.706565608
2020-08-22: 2343.277041272444
2020-08-23: 2342.3513073963686
2020-08-24: 2374.2252087449083
2020-08-25: 2471.871082171875
2020-08-26: 2568.409086257742
2020-08-27: 2639.0615660589783
2020-08-28: 2681.5930438462738
2020-08-29: 2714.413501872344
2020-08-30: 2770.2592408226665
2020-08-31: 2856.305229187582
(31,)

Predicting for Papua New Guinea__nan
2020-08-01: 51.55443586293118
2020-08-02: 62.95995422383182
2020-08-03: 68.3

2020-08-16: 1035.9895058510738
2020-08-17: 1017.4796072504423
2020-08-18: 1048.821747602134
2020-08-19: 1134.2149263466868
2020-08-20: 1189.63979879886
2020-08-21: 1214.8390012190075
2020-08-22: 1225.941018302872
2020-08-23: 1247.1643145916785
2020-08-24: 1290.1138836763012
2020-08-25: 1353.3065995310378
2020-08-26: 1402.6630955817789
2020-08-27: 1435.500048738201
2020-08-28: 1462.3334621863396
2020-08-29: 1496.255975422272
2020-08-30: 1543.550420595987
2020-08-31: 1598.925101957014
(31,)

Predicting for Romania__nan
2020-08-01: 8292.276115668168
2020-08-02: 9471.038907936114
2020-08-03: 9273.446442971535
2020-08-04: 8864.772337634937
2020-08-05: 6368.360953203149
2020-08-06: 5040.750181652721
2020-08-07: 8446.0188668183
2020-08-08: 9476.484959043148
2020-08-09: 9485.931042250606
2020-08-10: 8978.048786233514
2020-08-11: 7743.702092254244
2020-08-12: 7638.02247580707
2020-08-13: 9232.091440148717
2020-08-14: 9949.930247796354
2020-08-15: 10019.049979426265
2020-08-16: 9686.079805184236

2020-08-29: 433.20259122725344
2020-08-30: 448.1980544809372
2020-08-31: 465.25505963351407
(31,)

Predicting for El Salvador__nan
2020-08-01: 236.5409577535493
2020-08-02: 223.01267753045863
2020-08-03: 196.60107414970173
2020-08-04: 193.55201904623635
2020-08-05: 273.5016761938698
2020-08-06: 204.5090769158563
2020-08-07: 291.03254391447166
2020-08-08: 302.0628414806378
2020-08-09: 296.8161439824775
2020-08-10: 314.0970377185264
2020-08-11: 343.9232435934372
2020-08-12: 340.81203689872405
2020-08-13: 382.83318218311496
2020-08-14: 399.5919398822171
2020-08-15: 408.7744262460461
2020-08-16: 428.6391016565137
2020-08-17: 449.936340514038
2020-08-18: 471.24422300822545
2020-08-19: 501.4485441510691
2020-08-20: 521.815581285397
2020-08-21: 539.346730374885
2020-08-22: 561.3037700553245
2020-08-23: 594.3687946928928
2020-08-24: 621.7586986407734
2020-08-25: 650.2859109924141
2020-08-26: 674.8949845732166
2020-08-27: 698.5918140995557
2020-08-28: 725.9674332003387
2020-08-29: 757.619002380

2020-08-11: 236.8582916763558
2020-08-12: 249.5011739324083
2020-08-13: 283.22065147831137
2020-08-14: 305.90581298765727
2020-08-15: 321.01872294719067
2020-08-16: 332.23942848599825
2020-08-17: 343.04621125406806
2020-08-18: 367.24669644074936
2020-08-19: 395.0040331568957
2020-08-20: 417.582393441992
2020-08-21: 435.83154461732295
2020-08-22: 452.00517598732887
2020-08-23: 479.74805258990233
2020-08-24: 506.47426524687995
2020-08-25: 533.616072893735
2020-08-26: 558.2746720165301
2020-08-27: 576.2989227599719
2020-08-28: 598.4938952350153
2020-08-29: 626.0480375663215
2020-08-30: 654.1476098395374
2020-08-31: 682.1971298795802
(31,)

Predicting for Chad__nan
2020-08-01: 56.37420088991095
2020-08-02: 68.21482120699957
2020-08-03: 74.55148623006788
2020-08-04: 80.94303302293426
2020-08-05: 85.85893381391243
2020-08-06: 96.8003606864285
2020-08-07: 127.80780852989234
2020-08-08: 143.13529931505425
2020-08-09: 153.63845037352945
2020-08-10: 163.62193150222254
2020-08-11: 173.98245892963

(31,)

Predicting for Taiwan__nan
2020-08-01: 50.67578468591501
2020-08-02: 62.74798889891996
2020-08-03: 68.26059971545519
2020-08-04: 73.72460382696828
2020-08-05: 79.1335464933935
2020-08-06: 92.81333388578004
2020-08-07: 121.94461583813501
2020-08-08: 136.86244149412033
2020-08-09: 146.8264321568545
2020-08-10: 156.38940713730403
2020-08-11: 167.29336800636722
2020-08-12: 184.30702305359688
2020-08-13: 206.23675208971656
2020-08-14: 222.4246271641249
2020-08-15: 235.7217090251424
2020-08-16: 248.84124903456492
2020-08-17: 263.57460140491327
2020-08-18: 276.1274870651756
2020-08-19: 294.7537290378956
2020-08-20: 311.67077939733593
2020-08-21: 327.1238414403827
2020-08-22: 342.70202253168594
2020-08-23: 348.9913905111261
2020-08-24: 362.9520731209849
2020-08-25: 380.4733406388048
2020-08-26: 397.5426188396114
2020-08-27: 405.4290532767384
2020-08-28: 418.4664521694336
2020-08-29: 430.01777842554407
2020-08-30: 444.94991123830044
2020-08-31: 461.9041268754439
(31,)

Predicting for Tan

2020-08-27: 2607.191155541033
2020-08-28: 2626.705449341907
2020-08-29: 2673.3977999651574
2020-08-30: 2759.0367901610857
2020-08-31: 2857.223870727708
(31,)

Predicting for United States__Arizona
2020-08-01: 3491.3966663772458
2020-08-02: 4082.386277037585
2020-08-03: 3906.750303152855
2020-08-04: 3921.8953364192666
2020-08-05: 2851.323084875096
2020-08-06: 2208.0491350289058
2020-08-07: 3648.602029313714
2020-08-08: 4110.868682705032
2020-08-09: 4115.160138726531
2020-08-10: 3976.278022807088
2020-08-11: 3450.1836536815217
2020-08-12: 3374.177944513477
2020-08-13: 4051.782166472139
2020-08-14: 4370.434745381807
2020-08-15: 4417.682251117119
2020-08-16: 4312.801991050266
2020-08-17: 4104.460367699207
2020-08-18: 4190.351195027603
2020-08-19: 4562.303267147701
2020-08-20: 4785.885905748116
2020-08-21: 4850.600688669574
2020-08-22: 4817.265586386703
2020-08-23: 4786.829590983412
2020-08-24: 4913.792163875824
2020-08-25: 5156.057873775436
2020-08-26: 5329.083588240066
2020-08-27: 5407.11

2020-08-12: 288.62144714207017
2020-08-13: 329.48427530227974
2020-08-14: 358.9022785662412
2020-08-15: 376.006514224098
2020-08-16: 384.72847916828005
2020-08-17: 392.1903409918396
2020-08-18: 415.92313446153133
2020-08-19: 447.93511664234774
2020-08-20: 474.5233375211154
2020-08-21: 494.06051435537745
2020-08-22: 509.1957762181051
2020-08-23: 535.7128820757027
2020-08-24: 563.1332718738895
2020-08-25: 593.1202972110123
2020-08-26: 620.3937212339431
2020-08-27: 643.8250623378372
2020-08-28: 666.9619295044301
2020-08-29: 694.9819212325904
2020-08-30: 724.6928800776545
2020-08-31: 755.3532708761943
(31,)

Predicting for United States__Iowa
2020-08-01: 3308.879441048914
2020-08-02: 3829.547243823956
2020-08-03: 3303.390037600211
2020-08-04: 3032.241588448829
2020-08-05: 2783.245847663123
2020-08-06: 2049.6798393054723
2020-08-07: 3374.8590192060037
2020-08-08: 3737.894616672648
2020-08-09: 3538.293821817332
2020-08-10: 3384.660156887578
2020-08-11: 3171.694393798552
2020-08-12: 3085.8511

2020-08-11: 2170.6367135288497
2020-08-12: 2155.570032561928
2020-08-13: 2586.958296836104
2020-08-14: 2789.5935218660125
2020-08-15: 2817.820621666483
2020-08-16: 2721.975554083676
2020-08-17: 2617.1650375613526
2020-08-18: 2687.8688627026418
2020-08-19: 2927.796265412865
2020-08-20: 3071.4362397489226
2020-08-21: 3108.726245784462
2020-08-22: 3081.0724253690328
2020-08-23: 3069.7181588963585
2020-08-24: 3159.6843657077043
2020-08-25: 3317.514862620432
2020-08-26: 3428.9056997280386
2020-08-27: 3475.487801729189
2020-08-28: 3494.077340873483
2020-08-29: 3532.485713576316
2020-08-30: 3626.205238222311
2020-08-31: 3748.229687689558
(31,)

Predicting for United States__Maine
2020-08-01: 229.75798158881733
2020-08-02: 266.7073691764357
2020-08-03: 261.1262599357003
2020-08-04: 286.8367670697391
2020-08-05: 246.23826344996743
2020-08-06: 208.2815160805459
2020-08-07: 307.30962731708826
2020-08-08: 342.94022163834734
2020-08-09: 354.7386859580012
2020-08-10: 366.5184995324431
2020-08-11: 35

2020-08-30: 2899.2467968074798
2020-08-31: 2987.6755023912033
(31,)

Predicting for United States__New Hampshire
2020-08-01: 472.7462842270983
2020-08-02: 552.5428458773952
2020-08-03: 520.3489103820864
2020-08-04: 475.0762745142402
2020-08-05: 453.072157550527
2020-08-06: 355.33776709215755
2020-08-07: 550.3116614793696
2020-08-08: 616.1670364240074
2020-08-09: 610.0345876206968
2020-08-10: 595.0894197224493
2020-08-11: 576.2903808799631
2020-08-12: 573.8903461390325
2020-08-13: 670.7809281159223
2020-08-14: 719.4091653230694
2020-08-15: 728.907139587699
2020-08-16: 729.634374109035
2020-08-17: 730.5819270137977
2020-08-18: 754.3448022260689
2020-08-19: 813.7961892898986
2020-08-20: 852.8602140636394
2020-08-21: 871.3622853010609
2020-08-22: 883.8711191641103
2020-08-23: 899.6084396176636
2020-08-24: 930.9542528528277
2020-08-25: 975.8383380463497
2020-08-26: 1011.253320792458
2020-08-27: 1031.7144543910101
2020-08-28: 1052.335802665084
2020-08-29: 1077.1581504608457
2020-08-30: 1111.

2020-08-14: 1147.6394667262512
2020-08-15: 1153.5166431511766
2020-08-16: 1300.4150011145573
2020-08-17: 1377.0190924562787
2020-08-18: 1313.807808148768
2020-08-19: 1344.4825837736046
2020-08-20: 1367.940889482997
2020-08-21: 1409.7581853405115
2020-08-22: 1500.5468095094936
2020-08-23: 1547.896747394844
2020-08-24: 1547.1851919077053
2020-08-25: 1576.7412439119137
2020-08-26: 1611.812529243038
2020-08-27: 1658.2607326936193
2020-08-28: 1723.3951730263734
2020-08-29: 1766.2729013752285
2020-08-30: 1791.3420902142482
2020-08-31: 1827.4113741918932
(31,)

Predicting for United States__South Carolina
2020-08-01: 1503.5932477978881
2020-08-02: 1856.3203346654025
2020-08-03: 1741.0927577721152
2020-08-04: 1503.2189802361167
2020-08-05: 1270.7897543042816
2020-08-06: 990.3059863383264
2020-08-07: 1623.641395341552
2020-08-08: 1861.6832269702911
2020-08-09: 1816.319413208519
2020-08-10: 1690.2511381345223
2020-08-11: 1548.2790739469183
2020-08-12: 1532.8624161569971
2020-08-13: 1839.97397482

2020-08-31: 4035.1098690478548
(31,)

Predicting for United States__Wisconsin
2020-08-01: 5744.773321377702
2020-08-02: 6542.4368432057445
2020-08-03: 6082.265989289753
2020-08-04: 4765.214209486694
2020-08-05: 3818.279705263088
2020-08-06: 3269.0216393823684
2020-08-07: 5662.888587086608
2020-08-08: 6319.313413807025
2020-08-09: 6019.008676515706
2020-08-10: 5330.691060396875
2020-08-11: 4800.425107464653
2020-08-12: 4933.741071779466
2020-08-13: 6066.018033689139
2020-08-14: 6491.624984634177
2020-08-15: 6349.343619851815
2020-08-16: 6006.107000023757
2020-08-17: 5808.237125499828
2020-08-18: 6062.337126276415
2020-08-19: 6670.457977046128
2020-08-20: 6952.158317555319
2020-08-21: 6920.38911290193
2020-08-22: 6790.7145330456915
2020-08-23: 6781.368251660932
2020-08-24: 7030.2942387931735
2020-08-25: 7406.947573049391
2020-08-26: 7619.008146187526
2020-08-27: 7659.796651689267
2020-08-28: 7662.331319530042
2020-08-29: 7751.779177176103
2020-08-30: 7979.737155711312
2020-08-31: 8253.16

2020-08-07: 2521.6574016022614
2020-08-08: 2865.719893133229
2020-08-09: 2965.390964640279
2020-08-10: 2838.9535859935927
2020-08-11: 2531.316068042638
2020-08-12: 2422.9349242807853
2020-08-13: 2864.1281770361966
2020-08-14: 3108.8280871521965
2020-08-15: 3181.2097195975625
2020-08-16: 3113.2024154740207
2020-08-17: 2984.8576435840996
2020-08-18: 3023.806734990132
2020-08-19: 3272.803629162292
2020-08-20: 3445.0430394452064
2020-08-21: 3511.5301272700904
2020-08-22: 3498.255679540122
2020-08-23: 3486.879799399192
2020-08-24: 3569.290393708765
2020-08-25: 3738.195075101608
2020-08-26: 3871.5973432775745
2020-08-27: 3943.7773362709063
2020-08-28: 3976.2867011351263
2020-08-29: 4019.8920518158193
2020-08-30: 4117.73560722296
2020-08-31: 4253.1254337314385
(31,)

Predicting for Zambia__nan
2020-08-01: 86.5921520086615
2020-08-02: 110.42544426703182
2020-08-03: 98.01342284222659
2020-08-04: 101.56612547728474
2020-08-05: 105.94891999794973
2020-08-06: 113.3686946414708
2020-08-07: 158.0632

In [199]:
# Check the predictions
preds_df.head()

Unnamed: 0,CountryName,RegionName,Date,PredictedDailyNewCases
213,Aruba,,2020-08-01,58.844692
214,Aruba,,2020-08-02,71.323398
215,Aruba,,2020-08-03,78.998212
216,Aruba,,2020-08-04,90.247021
217,Aruba,,2020-08-05,87.814682


# Validation
During the competition, the predictor will be called exactly as shown below, through `predict.py` and its command-line flags.  
!!! PLEASE DO NOT CHANGE THE API !!!

In [200]:
!python predict.py -s 2020-08-01 -e 2020-08-04 -ip ../../../validation/data/2020-09-30_historical_ip.csv -o predictions/2020-08-01_2020-08-04.csv

Generating predictions from 2020-08-01 to 2020-08-04...
Saved predictions to predictions/2020-08-01_2020-08-04.csv
Done!
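
The call above exercises four flags: `-s`, `-e`, `-ip` and `-o`. As a minimal sketch of what a compatible command-line front end could look like, `argparse` can parse them as follows. This is not the actual `predict.py`: the long option names and the `dest` names are assumptions made for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the four flags used in the notebook call to predict.py."""
    parser = argparse.ArgumentParser(description="Sketch of a predictor CLI")
    # The short flags come from the notebook; long names and dests are assumed.
    parser.add_argument("-s", "--start_date", dest="start_date", required=True,
                        help="First date to predict, formatted YYYY-MM-DD")
    parser.add_argument("-e", "--end_date", dest="end_date", required=True,
                        help="Last date to predict, formatted YYYY-MM-DD")
    parser.add_argument("-ip", "--interventions_plan", dest="ip_file", required=True,
                        help="Path to the intervention plan CSV file")
    parser.add_argument("-o", "--output_file", dest="output_file", required=True,
                        help="Path of the CSV file to write predictions to")
    return parser.parse_args(argv)


args = parse_args(["-s", "2020-08-01", "-e", "2020-08-04",
                   "-ip", "ip.csv", "-o", "predictions/out.csv"])
print(args.start_date, args.end_date, args.ip_file, args.output_file)
```

Keeping the four short flags unchanged is what matters for compatibility; everything behind them is free to vary.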


In [201]:
!head predictions/2020-08-01_2020-08-04.csv

CountryName,RegionName,Date,PredictedDailyNewCases
Aruba,,2020-08-01,58.844691538124245
Aruba,,2020-08-02,71.32339797888373
Aruba,,2020-08-03,78.99821175219239
Aruba,,2020-08-04,90.24702060691985
Afghanistan,,2020-08-01,149.93664407981095
Afghanistan,,2020-08-02,287.2826933086321
Afghanistan,,2020-08-03,277.50656117987995
Afghanistan,,2020-08-04,264.8862561764254
Angola,,2020-08-01,198.70750682167088
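
Before running the official validation, a quick local sanity check of the file layout can catch obvious mistakes. The sketch below uses only the standard library and checks the four column names shown above, ISO-formatted dates, and numeric predictions. The non-negativity rule is an assumption about what a sensible prediction looks like, and the helper `check_predictions` is hypothetical: it is not a substitute for `validate_submission`.

```python
import csv
import io
from datetime import date

EXPECTED_COLUMNS = ["CountryName", "RegionName", "Date", "PredictedDailyNewCases"]


def check_predictions(csv_text):
    """Return a list of problems found in a predictions CSV (empty list = OK)."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        return [f"unexpected columns: {reader.fieldnames}"]
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        try:
            date.fromisoformat(row["Date"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: bad date {row['Date']!r}")
        try:
            value = float(row["PredictedDailyNewCases"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: prediction is not a number")
            continue
        # value != value is True only for NaN (assumed to be invalid here).
        if value != value or value < 0:
            problems.append(f"line {line_no}: prediction must be a non-negative number")
    return problems


sample = """CountryName,RegionName,Date,PredictedDailyNewCases
Aruba,,2020-08-01,58.844691538124245
Aruba,,2020-08-02,71.32339797888373
"""
print(check_predictions(sample))
```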


# Test cases
Now that `predict.py` can generate a prediction file, the cases below validate its output under a few different scenarios.

In [202]:
import os
print(os.getcwd())

/home/cdrutinus/Cours/SDD/Mini-hackaton/pangolin/covid-xprize/covid_xprize/examples/predictors/linear


In [203]:
from covid_xprize.validation.predictor_validation import validate_submission

def validate(start_date, end_date, ip_file, output_file):
    # First, delete any potential old file
    try:
        os.remove(output_file)
    except OSError:
        pass
    
    # Then generate the prediction, calling the official API
    !python predict.py -s {start_date} -e {end_date} -ip {ip_file} -o {output_file}
    
    # And validate it
    errors = validate_submission(start_date, end_date, ip_file, output_file)
    if errors:
        for error in errors:
            print(error)
    else:
        print("All good!")
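
The `!python ...` line in `validate` is an IPython shell escape, so the function only runs inside a notebook. A possible plain-Python equivalent, sketched here with the standard `subprocess` module, builds the same command line as a list of arguments. The helper names `build_predict_command` and `run_predict` are hypothetical and not part of the `covid_xprize` package.

```python
import subprocess
import sys


def build_predict_command(start_date, end_date, ip_file, output_file,
                          script="predict.py"):
    """Build the argument list for the same call the notebook makes with !python."""
    return [sys.executable, script,
            "-s", start_date,
            "-e", end_date,
            "-ip", ip_file,
            "-o", output_file]


def run_predict(start_date, end_date, ip_file, output_file):
    """Run the predictor as a child process; return (exit code, captured stdout)."""
    cmd = build_predict_command(start_date, end_date, ip_file, output_file)
    completed = subprocess.run(cmd, capture_output=True, text=True)
    return completed.returncode, completed.stdout


cmd = build_predict_command("2020-08-01", "2020-08-04",
                            "ip.csv", "predictions/val_4_days.csv")
print(" ".join(cmd[1:]))
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.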

## 4 days, no gap
- All countries and regions
- Official number of cases is known up to start_date
- Intervention Plans are the official ones

In [204]:
validate(start_date="2020-08-01",
         end_date="2020-08-04",
         ip_file="../../../validation/data/2020-09-30_historical_ip.csv",
         output_file="predictions/val_4_days.csv")

Generating predictions from 2020-08-01 to 2020-08-04...
Saved predictions to predictions/val_4_days.csv
Done!
All good!


## 1 month in the future
- 2 countries only
- There is a gap between the date of the last known number of cases and `start_date`
- For future dates, the Intervention Plans file contains scenarios for which predictions are requested, in order to answer the question: what will happen if we apply these plans?
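
To make the gap concrete, the sketch below lists every date a day-by-day predictor would have to step through when the last known case count is older than `start_date`, and separates out the dates that were actually requested. The helper `dates_to_roll_out` and the `last_known` date in the example are hypothetical, chosen only for illustration.

```python
from datetime import date, timedelta


def dates_to_roll_out(last_known, start, end):
    """Return (all dates to step through, dates actually requested)."""
    first = last_known + timedelta(days=1)
    n_days = (end - first).days + 1
    all_dates = [first + timedelta(days=i) for i in range(n_days)]
    # Only dates from start onwards belong in the output file.
    requested = [d for d in all_dates if d >= start]
    return all_dates, requested


# Hypothetical example: cases known up to 2020-11-30, January 2021 requested.
all_dates, requested = dates_to_roll_out(date(2020, 11, 30),
                                         date(2021, 1, 1),
                                         date(2021, 1, 31))
print(len(all_dates), len(requested))
```

In this example the predictor steps through 62 days (all of December and January) but reports only the 31 January dates.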

In [205]:
%%time
validate(start_date="2021-01-01",
         end_date="2021-01-31",
         ip_file="../../../validation/data/future_ip.csv",
         output_file="predictions/val_1_month_future.csv")

Generating predictions from 2021-01-01 to 2021-01-31...
Saved predictions to predictions/val_1_month_future.csv
Done!
All good!
CPU times: user 62.5 ms, sys: 109 ms, total: 172 ms
Wall time: 4.08 s


## 180 days, from a future date, all countries and regions
- Prediction start date is 1 week from now (i.e. assuming the submission date is 1 week from now).  
- Prediction end date is 6 months after start date.  
- Prediction is requested for all available countries and regions.  
- Intervention plan scenario: freeze last known intervention plans for each country and region.  

The number of cases between today and the start date is not known yet, but the model relies on it, so the model first has to predict those cases and then use its own predictions as inputs.  
This is the most demanding test: generating the prediction file should take less than 1 hour.
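
Feeding predictions back in as inputs can be sketched as a simple loop over a sliding window of recent case counts. This is an illustration only, not the predictor used in this notebook: it ignores the NPI inputs entirely, the 7-day `lookback` is an assumed value, and `roll_out` and `mean_predictor` are hypothetical names.

```python
from collections import deque


def roll_out(known_cases, n_days, predict_one_day, lookback=7):
    """Predict n_days ahead, reusing each prediction as input for the next day."""
    window = deque(known_cases[-lookback:], maxlen=lookback)
    predictions = []
    for _ in range(n_days):
        # Clip at zero: a negative number of new cases is not meaningful.
        next_cases = max(0.0, predict_one_day(list(window)))
        predictions.append(next_cases)
        # The oldest value drops out of the window automatically (maxlen).
        window.append(next_cases)
    return predictions


def mean_predictor(window):
    """Toy stand-in for a trained model: the average of the window."""
    return sum(window) / len(window)


preds = roll_out([10.0] * 7, 3, mean_predictor)
print(preds)
```

Because every step depends on the previous one, errors can compound over long horizons, which is one reason the 180-day test is the most demanding.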

### Generate the scenario

In [56]:
from datetime import datetime, timedelta

start_date = datetime.now() + timedelta(days=7)
start_date_str = start_date.strftime('%Y-%m-%d')
end_date = start_date + timedelta(days=180)
end_date_str = end_date.strftime('%Y-%m-%d')
print(f"Start date: {start_date_str}")
print(f"End date: {end_date_str}")

Start date: 2020-12-01
End date: 2021-05-30


In [57]:
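from covid_xprize.validation.scenario_generator import get_raw_data, generate_scenario, NPI_COLUMNS
# Load the raw OxCGRT data used to build the scenario
DATA_FILE = 'data/OxCGRT_latest.csv'
latest_df = get_raw_data(DATA_FILE, latest=True)
# countries=None: all available countries and regions; "Freeze": keep the last known intervention plans
scenario_df = generate_scenario(start_date_str, end_date_str, latest_df, countries=None, scenario="Freeze")
# Save the scenario so it can be passed to predict.py as the intervention plan file
scenario_file = "predictions/180_days_future_scenario.csv"
scenario_df.to_csv(scenario_file, index=False)
print(f"Saved scenario to {scenario_file}")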

Saved scenario to predictions/180_days_future_scenario.csv
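
The idea behind the "Freeze" scenario, repeating each region's last known intervention values for every future day, can be illustrated with plain Python data structures. This is a sketch of the concept and not the implementation of `generate_scenario`. The helper `freeze_last_known`, the `history` layout, and the example values are all hypothetical.

```python
from datetime import date, timedelta


def freeze_last_known(history, end_date):
    """Extend each region's NPI history to end_date by repeating its last row.

    history maps a region id to a non-empty list of (date, npi_dict) tuples
    sorted by date (an assumed layout, chosen for this illustration).
    """
    scenario = {}
    for geo_id, rows in history.items():
        last_date, last_npis = rows[-1]
        extended = list(rows)
        day = last_date + timedelta(days=1)
        while day <= end_date:
            # Copy the dict so later edits to one day do not affect the others.
            extended.append((day, dict(last_npis)))
            day += timedelta(days=1)
        scenario[geo_id] = extended
    return scenario


history = {
    "Aruba__nan": [
        (date(2020, 11, 23), {"C1_School closing": 2}),
        (date(2020, 11, 24), {"C1_School closing": 3}),
    ]
}
scenario = freeze_last_known(history, date(2020, 11, 27))
for day, npis in scenario["Aruba__nan"]:
    print(day, npis)
```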


### Check it

In [58]:
%%time
validate(start_date=start_date_str,
         end_date=end_date_str,
         ip_file=scenario_file,
         output_file="predictions/val_6_month_future.csv")

Generating predictions from 2020-12-01 to 2021-05-30...
Saved predictions to predictions/val_6_month_future.csv
Done!
All good!
CPU times: user 7.59 s, sys: 277 ms, total: 7.87 s
Wall time: 2min 49s
