# Example Predictor: Linear Rollout Predictor

This example contains basic functionality for training and evaluating a linear predictor that rolls out predictions day-by-day.

First, a training data set is created from historical case and non-pharmaceutical intervention (NPI) data.

Second, a linear model is trained to predict future cases from prior case data along with prior and future NPI data.
The model is an off-the-shelf sklearn Lasso model that uses a positive weight constraint, applied to negated NPI values, to enforce the assumption that stricter NPIs are negatively correlated with future cases.

Third, a sample evaluation set is created, and the predictor is applied to this evaluation set to produce prediction results in the correct format.
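
To make the sign convention in the second step concrete, here is a minimal sketch on synthetic data. The variable names and the data-generating rule are assumptions made up for illustration; only `Lasso(positive=True)` and the negation of the NPI input mirror what the notebook does in the training section.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 500

# Synthetic inputs (illustrative only, not OxCGRT data)
prior_cases = rng.uniform(0, 100, size=n)
npi_level = rng.integers(0, 4, size=n).astype(float)

# Assumed rule: cases grow with prior cases and shrink as the NPI gets stricter
future_cases = 20 + 1.2 * prior_cases - 5.0 * npi_level + rng.normal(0, 1, size=n)

# Negate the NPI column so that a non-negative weight on it
# corresponds to a non-positive effect of the NPI itself
X = np.column_stack([prior_cases, -npi_level])
model = Lasso(alpha=0.1, positive=True, max_iter=10000)
model.fit(X, future_cases)

cases_coef, neg_npi_coef = model.coef_
effect_of_npi = -neg_npi_coef  # effect of one extra NPI level on predicted cases
print('cases weight:', cases_coef)
print('effect of NPI:', effect_of_npi)
```

With the constraint in place, every learned weight is non-negative, so the model can only ever predict that a stricter NPI leaves cases unchanged or lowers them.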

## Training

In [15]:
import pickle
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

### Copy the data locally

In [16]:
# Main source for the training data
DATA_URL = 'https://raw.githubusercontent.com/OxCGRT/covid-policy-tracker/master/data/OxCGRT_latest.csv'
# Local file
DATA_FILE = 'data/OxCGRT_latest.csv'

In [17]:
import os
import urllib.request
if not os.path.exists('data'):
    os.mkdir('data')
urllib.request.urlretrieve(DATA_URL, DATA_FILE)

('data/OxCGRT_latest.csv', <http.client.HTTPMessage at 0x7fac08dffaf0>)
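
The cell above downloads the file on every run. If repeated downloads are not wanted, one possible variation is sketched below. `ensure_local_copy` and its `fetch` parameter are hypothetical names introduced here, not part of the notebook or of any library.

```python
import os
import urllib.request


def ensure_local_copy(url, local_path, fetch=urllib.request.urlretrieve):
    """Hypothetical helper: download url to local_path unless the file exists."""
    folder = os.path.dirname(local_path)
    if folder:
        # exist_ok avoids the separate os.path.exists check
        os.makedirs(folder, exist_ok=True)
    if not os.path.exists(local_path):
        fetch(url, local_path)
    return local_path


# Usage with the notebook's constants would look like:
# ensure_local_copy(DATA_URL, DATA_FILE)
```

Skipping the download trades freshness for speed, so a stale local file has to be deleted by hand to pick up newer data.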

In [18]:
# Load historical data from local file
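# on_bad_lines='skip' requires pandas >= 1.3; it replaces error_bad_lines=False,
# which was removed in pandas 2.0
df = pd.read_csv(DATA_FILE,
                 parse_dates=['Date'],
                 encoding="ISO-8859-1",
                 dtype={"RegionName": str,
                        "RegionCode": str},
                 on_bad_lines='skip')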

In [19]:
df.columns

Index(['CountryName', 'CountryCode', 'RegionName', 'RegionCode',
       'Jurisdiction', 'Date', 'C1_School closing', 'C1_Flag',
       'C2_Workplace closing', 'C2_Flag', 'C3_Cancel public events', 'C3_Flag',
       'C4_Restrictions on gatherings', 'C4_Flag', 'C5_Close public transport',
       'C5_Flag', 'C6_Stay at home requirements', 'C6_Flag',
       'C7_Restrictions on internal movement', 'C7_Flag',
       'C8_International travel controls', 'E1_Income support', 'E1_Flag',
       'E2_Debt/contract relief', 'E3_Fiscal measures',
       'E4_International support', 'H1_Public information campaigns',
       'H1_Flag', 'H2_Testing policy', 'H3_Contact tracing',
       'H4_Emergency investment in healthcare', 'H5_Investment in vaccines',
       'H6_Facial Coverings', 'H6_Flag', 'M1_Wildcard', 'ConfirmedCases',
       'ConfirmedDeaths', 'StringencyIndex', 'StringencyIndexForDisplay',
       'StringencyLegacyIndex', 'StringencyLegacyIndexForDisplay',
       'GovernmentResponseIndex', 'Gove

In [20]:
# For testing, restrict training data to that before a hypothetical predictor submission date
HYPOTHETICAL_SUBMISSION_DATE = np.datetime64("2020-07-31")
df = df[df.Date <= HYPOTHETICAL_SUBMISSION_DATE]

In [21]:
# Add GeoID column that combines CountryName and RegionName for easier manipulation of data
df['GeoID'] = df['CountryName'] + '__' + df['RegionName'].astype(str)

In [22]:
# Add new cases column
df['NewCases'] = df.groupby('GeoID').ConfirmedCases.diff().fillna(0)
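
A small illustration of why the difference is taken per `GeoID`: without the grouping, the first day of one region would be differenced against the last day of the previous one. The toy frame below is made-up data, not taken from the OxCGRT file.

```python
import pandas as pd

# Two toy regions with cumulative case counts (illustrative values)
toy = pd.DataFrame({
    'GeoID': ['A__nan'] * 3 + ['B__nan'] * 3,
    'ConfirmedCases': [10, 15, 18, 100, 100, 130],
})

# Same expression as in the notebook cell, applied to the toy frame
toy['NewCases'] = toy.groupby('GeoID').ConfirmedCases.diff().fillna(0)
print(toy.NewCases.tolist())  # [0.0, 5.0, 3.0, 0.0, 0.0, 30.0]
```

The first row of each region has no previous day, so `diff()` yields NaN there and `fillna(0)` turns it into zero new cases.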

In [23]:
# Keep only columns of interest
id_cols = ['CountryName',
           'RegionName',
           'GeoID',
           'Date']
cases_col = ['NewCases']
npi_cols = ['C1_School closing',
            'C2_Workplace closing',
            'C3_Cancel public events',
            'C4_Restrictions on gatherings',
            'C5_Close public transport',
            'C6_Stay at home requirements',
            'C7_Restrictions on internal movement',
            'C8_International travel controls',
            'H1_Public information campaigns',
            'H2_Testing policy',
            'H3_Contact tracing',
            'H6_Facial Coverings']
df = df[id_cols + cases_col + npi_cols]

In [24]:
# Fill any missing case values by interpolation and setting NaNs to 0
df.update(df.groupby('GeoID').NewCases.apply(
    lambda group: group.interpolate()).fillna(0))

In [25]:
# Fill any missing NPIs by assuming they are the same as previous day
for npi_col in npi_cols:
    df.update(df.groupby('GeoID')[npi_col].ffill().fillna(0))
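
The two fill strategies above behave differently, which a toy frame makes visible. The data is invented, and the sketch uses `transform` rather than the `apply` call in the notebook cell, purely to keep the result aligned with the original index in this small example.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'GeoID': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
    'NewCases': [np.nan, 10.0, np.nan, 30.0, 5.0, np.nan, 9.0],
    'C1_School closing': [np.nan, 2.0, np.nan, 3.0, 1.0, np.nan, 0.0],
})

# Cases: linear interpolation inside each region, remaining NaNs set to 0
cases_filled = (toy.groupby('GeoID').NewCases
                   .transform(lambda s: s.interpolate())
                   .fillna(0))

# NPIs: carry the previous day's value forward, remaining NaNs set to 0
npi_filled = toy.groupby('GeoID')['C1_School closing'].ffill().fillna(0)

print(cases_filled.tolist())  # [0.0, 10.0, 20.0, 30.0, 5.0, 7.0, 9.0]
print(npi_filled.tolist())    # [0.0, 2.0, 2.0, 3.0, 1.0, 1.0, 0.0]
```

Interpolation suits case counts, which vary smoothly between reported days, while forward fill suits NPIs, which are ordinal policy levels that stay in force until changed.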

In [26]:
# Restrict the historical IP file to countries present in the training data.
# Note: this overwrites the validation file in place.
list_countries = sorted(set(df.CountryName))
hist_ips_df = pd.read_csv("../../../validation/data/2020-09-30_historical_ip.csv",
                          parse_dates=['Date'],
                          encoding="ISO-8859-1",
                          dtype={"RegionName": str},
                          on_bad_lines='error')  # replaces error_bad_lines=True (pandas >= 1.3)
hist_ips_df = hist_ips_df[hist_ips_df.CountryName.isin(list_countries)]
hist_ips_df.to_csv("../../../validation/data/2020-09-30_historical_ip.csv", index=False)

In [27]:
hist_ips_df

Unnamed: 0,CountryName,RegionName,Date,C1_School closing,C2_Workplace closing,C3_Cancel public events,C4_Restrictions on gatherings,C5_Close public transport,C6_Stay at home requirements,C7_Restrictions on internal movement,C8_International travel controls,H1_Public information campaigns,H2_Testing policy,H3_Contact tracing,H6_Facial Coverings
0,Aruba,,2020-01-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Aruba,,2020-01-02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Aruba,,2020-01-03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Aruba,,2020-01-04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Aruba,,2020-01-05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
64385,Zimbabwe,,2020-09-26,2.0,1.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,3.0
64386,Zimbabwe,,2020-09-27,2.0,1.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,3.0
64387,Zimbabwe,,2020-09-28,2.0,1.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,3.0
64388,Zimbabwe,,2020-09-29,2.0,1.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,1.0


In [28]:
# Set number of past days to use to make predictions
nb_lookback_days = 30

# Create training data across all countries for predicting one day ahead
X_cols = cases_col + npi_cols
y_col = cases_col
X_samples = []
y_samples = []
geo_ids = df.GeoID.unique()
for g in geo_ids:
    gdf = df[df.GeoID == g]
    all_case_data = np.array(gdf[cases_col])
    all_npi_data = np.array(gdf[npi_cols])

    # Create one sample for each day where we have enough data
    # Each sample consists of cases and npis for previous nb_lookback_days
    nb_total_days = len(gdf)
    for d in range(nb_lookback_days, nb_total_days - 1):
        X_cases = all_case_data[d-nb_lookback_days:d]

        # Take negative of npis to support positive
        # weight constraint in Lasso.
        X_npis = -all_npi_data[d - nb_lookback_days:d]

        # Flatten all input data so it fits Lasso input format.
        X_sample = np.concatenate([X_cases.flatten(),
                                   X_npis.flatten()])
        # Target: new cases on day d + 1 (the input window ends at day d - 1)
        y_sample = all_case_data[d + 1]
        X_samples.append(X_sample)
        y_samples.append(y_sample)

X_samples = np.array(X_samples)
y_samples = np.array(y_samples).flatten()
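
The layout of a single flattened sample can be checked on a tiny example. The sketch below uses a lookback of 3 days and 2 NPIs instead of the notebook's 30 days and 12 NPIs; the arrays are invented for illustration.

```python
import numpy as np

lookback = 3
cases = np.arange(10, dtype=float).reshape(-1, 1)  # shape (10, 1): cases 0..9
npis = np.tile([1.0, 2.0], (10, 1))                # shape (10, 2): constant NPIs

d = 5
X_cases = cases[d - lookback:d]   # days 2, 3, 4
X_npis = -npis[d - lookback:d]    # negated NPIs for the same days
X_sample = np.concatenate([X_cases.flatten(), X_npis.flatten()])
y_sample = cases[d + 1]           # same target indexing as the loop above

print(X_sample)  # [ 2.  3.  4. -1. -2. -1. -2. -1. -2.]
print(y_sample)  # [6.]
```

The case history comes first, followed by the NPIs in day-major order (all NPIs for the oldest day, then the next day, and so on). With the notebook's settings each sample therefore has 30 + 30 * 12 = 390 features.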

In [29]:
# Helper function to compute the mean absolute error (MAE)
def mae(pred, true):
    return np.mean(np.abs(pred - true))
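
As a quick worked example of the metric, with made-up numbers:

```python
import numpy as np


def mae(pred, true):
    return np.mean(np.abs(pred - true))


pred = np.array([10.0, 20.0, 30.0])
true = np.array([12.0, 20.0, 26.0])

# Absolute errors are 2, 0 and 4, so the mean is (2 + 0 + 4) / 3
print(mae(pred, true))  # 2.0
```

The function relies on NumPy broadcasting, so both arguments need to be arrays (or pandas objects) rather than plain Python lists.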

In [30]:
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X_samples,
                                                    y_samples,
                                                    test_size=0.2,
                                                    random_state=301)

In [31]:
# Create and train Lasso model.
# positive=True constrains every weight to be non-negative.
# Because the NPI inputs were negated above, this enforces the assumption that
# past cases are positively correlated with future cases and NPIs negatively.
model = Lasso(alpha=0.1,
              precompute=True,
              max_iter=10000,
              positive=True,
              selection='random')
# Fit model
model.fit(X_train, y_train)

Lasso(alpha=0.1, max_iter=10000, positive=True, precompute=True,
      selection='random')

In [32]:
# Evaluate model
train_preds = model.predict(X_train)
train_preds = np.maximum(train_preds, 0) # Don't predict negative cases
print('Train MAE:', mae(train_preds, y_train))

test_preds = model.predict(X_test)
test_preds = np.maximum(test_preds, 0) # Don't predict negative cases
print('Test MAE:', mae(test_preds, y_test))

Train MAE: 140.7093913629205
Test MAE: 152.4940063194295


In [33]:
# Inspect the learned feature coefficients for the model
# to see what features it's paying attention to.

# Give names to the features
x_col_names = []
for d in range(-nb_lookback_days, 0):
    x_col_names.append('Day ' + str(d) + ' ' + cases_col[0])
# NPI features cover the same lookback window as cases (there is no Day 0 feature)
for d in range(-nb_lookback_days, 0):
    for col_name in npi_cols:
        x_col_names.append('Day ' + str(d) + ' ' + col_name)

# View non-zero coefficients
for (col, coeff) in zip(x_col_names, list(model.coef_)):
    if coeff != 0.:
        print(col, coeff)
print('Intercept', model.intercept_)

Day -7 NewCases 0.0010599800384059794
Day -6 NewCases 0.439769568825458
Day -5 NewCases 0.2168537668667522
Day -4 NewCases 0.0593830010565273
Day -3 NewCases 0.0689558246877214
Day -2 NewCases 0.05235638861100778
Day -1 NewCases 0.23820118873827473
Day -26 C6_Stay at home requirements 4.316095256933905
Day -22 C2_Workplace closing 9.716767615360357
Day -17 C2_Workplace closing 5.764854338654872
Intercept 26.551692081205488
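
When reading these coefficients, recall that the NPI inputs were negated, so a positive weight on an NPI feature means a stricter NPI lowers the prediction. The sketch below shows one compact way to build the feature names and check that their count matches the 390 model inputs. The coefficient vector in it is a placeholder with a single made-up value, not the trained model's weights.

```python
import numpy as np
import pandas as pd

nb_lookback_days = 30
cases_col = ['NewCases']
npi_cols = ['C1_School closing',
            'C2_Workplace closing',
            'C3_Cancel public events',
            'C4_Restrictions on gatherings',
            'C5_Close public transport',
            'C6_Stay at home requirements',
            'C7_Restrictions on internal movement',
            'C8_International travel controls',
            'H1_Public information campaigns',
            'H2_Testing policy',
            'H3_Contact tracing',
            'H6_Facial Coverings']

days = range(-nb_lookback_days, 0)
x_col_names = [f'Day {d} {cases_col[0]}' for d in days]
x_col_names += [f'Day {d} {col}' for d in days for col in npi_cols]

# Placeholder coefficients: all zero except one made-up value
coef = np.zeros(len(x_col_names))
coef[29] = 0.25

coef_series = pd.Series(coef, index=x_col_names)
nonzero = coef_series[coef_series != 0]
print(len(x_col_names))  # 390
print(nonzero)
```

In the notebook itself, `model.coef_` would take the place of the placeholder `coef` array.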


In [34]:
# Save model to file
if not os.path.exists('models'):
    os.mkdir('models')
with open('models/model.pkl', 'wb') as model_file:
    pickle.dump(model, model_file)
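
A saved model can be sanity-checked by loading it back and comparing predictions. The sketch below does this round trip with a small stand-in model trained on random data and a temporary directory, so it does not touch `models/model.pkl`.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import Lasso

# Stand-in model on random data (illustrative only)
rng = np.random.default_rng(0)
X_toy = rng.uniform(0, 10, size=(50, 4))
y_toy = X_toy @ np.array([1.0, 0.5, 0.0, 2.0])
toy_model = Lasso(alpha=0.1, positive=True).fit(X_toy, y_toy)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, 'model.pkl')
    with open(model_path, 'wb') as model_file:
        pickle.dump(toy_model, model_file)
    with open(model_path, 'rb') as model_file:
        loaded_model = pickle.load(model_file)

# The reloaded model should reproduce the original predictions
same = np.allclose(toy_model.predict(X_toy), loaded_model.predict(X_toy))
print('Round trip OK:', same)
```

Pickled scikit-learn models are generally only safe to load with the same scikit-learn version that wrote them, and pickle files from untrusted sources should not be loaded at all.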

## Evaluation

Now that the predictor has been trained and saved, this section contains the functionality for evaluating it on sample evaluation data.
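
The `predict` module used below is not shown in this notebook. As a rough sketch of what a day-by-day rollout could look like, the hypothetical `rollout` function below feeds each prediction back into the case history before predicting the next day. Its name, signature and the toy model are assumptions for illustration and may differ from what `predict_df` actually does.

```python
import numpy as np
from sklearn.linear_model import Lasso


def rollout(model, past_cases, past_npis, future_npis, lookback):
    """Hypothetical helper: roll a next-day model forward one day at a time."""
    cases = [float(c) for c in past_cases[-lookback:]]
    npis = [np.asarray(row, dtype=float) for row in past_npis[-lookback:]]
    preds = []
    for day_npis in future_npis:
        # Same feature layout as in training: cases, then negated NPIs
        X_cases = np.array(cases[-lookback:])
        X_npis = -np.array(npis[-lookback:])
        X = np.concatenate([X_cases.flatten(), X_npis.flatten()])
        pred = max(float(model.predict(X.reshape(1, -1))[0]), 0.0)  # no negative cases
        preds.append(pred)
        # Feed the prediction and that day's NPIs back into the history
        cases.append(pred)
        npis.append(np.asarray(day_npis, dtype=float))
    return np.array(preds)


# Toy model with a 3-day lookback and 2 NPIs (synthetic data)
rng = np.random.default_rng(0)
lookback, nb_npis = 3, 2
cases_part = rng.uniform(0, 100, size=(200, lookback))
neg_npi_part = -rng.integers(0, 4, size=(200, lookback * nb_npis)).astype(float)
y_toy = cases_part.mean(axis=1) + 2.0 * neg_npi_part.sum(axis=1)
toy_model = Lasso(alpha=0.1, positive=True, max_iter=10000)
toy_model.fit(np.hstack([cases_part, neg_npi_part]), y_toy)

past_cases = np.array([40.0, 50.0, 60.0])
past_npis = np.array([[1, 0], [1, 1], [2, 1]])
preds_strict = rollout(toy_model, past_cases, past_npis,
                       np.array([[3, 3]] * 5), lookback)
preds_lenient = rollout(toy_model, past_cases, past_npis,
                        np.array([[0, 0]] * 5), lookback)
print('strict :', preds_strict)
print('lenient:', preds_lenient)
```

Because each prediction becomes an input for later days, errors can compound over the horizon, which is one reason rolled-out forecasts tend to drift further from reality toward the end of the prediction window.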

In [35]:
# Reload the module to get the latest changes
import predict
from importlib import reload
reload(predict)
from predict import predict_df

In [36]:
%%time
preds_df = predict_df("2020-08-01", "2020-08-31",
                      path_to_ips_file="../../../validation/data/2020-09-30_historical_ip.csv",
                      verbose=True)


Predicting for Aruba__nan
2020-08-01: 58.82300181119936
2020-08-02: 71.31582751123675
2020-08-03: 78.99398601057783
2020-08-04: 90.2292787574965
2020-08-05: 87.81074037842762
2020-08-06: 98.75165916655027
2020-08-07: 131.03212665889947
2020-08-08: 147.01596584944787
2020-08-09: 158.95052220908332
2020-08-10: 169.93044405023159
2020-08-11: 177.51070366204908
2020-08-12: 193.47238559050163
2020-08-13: 216.81091172018114
2020-08-14: 234.03207239542056
2020-08-15: 248.55331845472057
2020-08-16: 261.9562079357195
2020-08-17: 275.2890400534381
2020-08-18: 293.2791763110071
2020-08-19: 314.063995759744
2020-08-20: 332.4192109437261
2020-08-21: 349.22265464569745
2020-08-22: 365.4886603628373
2020-08-23: 382.5217832856015
2020-08-24: 407.87717354169223
2020-08-25: 430.06812302503084
2020-08-26: 450.55999413363935
2020-08-27: 465.98324867588303
2020-08-28: 484.63064350072614
2020-08-29: 515.333850339269
2020-08-30: 541.884867792765
2020-08-31: 566.2481063041456

Predicting for Afghanistan__nan

2020-08-27: 4219.0799640306805
2020-08-28: 4269.772841948303
2020-08-29: 4316.552979599862
2020-08-30: 4406.590881825113
2020-08-31: 4545.533550777108

Predicting for Burundi__nan
2020-08-01: 56.71653989808276
2020-08-02: 66.53450387794538
2020-08-03: 74.43623702468685
2020-08-04: 80.17767626000114
2020-08-05: 84.26311210390307
2020-08-06: 96.32427742076659
2020-08-07: 127.34120519668448
2020-08-08: 142.0748925963046
2020-08-09: 153.01310316586907
2020-08-10: 162.66425470459734
2020-08-11: 172.81195279960673
2020-08-12: 189.3982314703651
2020-08-13: 212.04358763602806
2020-08-14: 228.46329012265863
2020-08-15: 242.22523125849693
2020-08-16: 255.3659341268916
2020-08-17: 269.76324989620053
2020-08-18: 282.2554980083099
2020-08-19: 301.24530516250917
2020-08-20: 318.42507983480346
2020-08-21: 334.14702778799665
2020-08-22: 349.7583512092749
2020-08-23: 355.95010236676757
2020-08-24: 369.97362647966247
2020-08-25: 387.73821214585786
2020-08-26: 405.0426177424877
2020-08-27: 413.1147539789

2020-08-21: 2086.527542413211
2020-08-22: 2103.015368161686
2020-08-23: 2084.4130534583205
2020-08-24: 2114.0616401923467
2020-08-25: 2203.930050514393
2020-08-26: 2285.7782580519006
2020-08-27: 2331.7598239931694
2020-08-28: 2358.2345491490482
2020-08-29: 2376.1950080085894
2020-08-30: 2422.3975520751515
2020-08-31: 2496.201418727858

Predicting for Belize__nan
2020-08-01: 90.22937837977064
2020-08-02: 115.06560350944511
2020-08-03: 115.61825739210916
2020-08-04: 130.070332489382
2020-08-05: 133.15471908336298
2020-08-06: 123.64258206413754
2020-08-07: 167.556225321515
2020-08-08: 189.72708264968372
2020-08-09: 200.23796305841495
2020-08-10: 213.3907250193884
2020-08-11: 220.51956949812467
2020-08-12: 230.29061826461952
2020-08-13: 258.63209663001896
2020-08-14: 279.24321308382474
2020-08-15: 294.23185787188856
2020-08-16: 308.7570812720045
2020-08-17: 321.3738958954195
2020-08-18: 343.612605706893
2020-08-19: 368.40580821667356
2020-08-20: 389.55299855378115
2020-08-21: 407.934326503

2020-08-05: 79.93587549435514
2020-08-06: 93.87888123272586
2020-08-07: 124.19630358468919
2020-08-08: 140.10959520464306
2020-08-09: 149.3367724208934
2020-08-10: 157.99380983862216
2020-08-11: 168.74565951227913
2020-08-12: 186.05291690110965
2020-08-13: 208.6816611280192
2020-08-14: 225.26474931315343
2020-08-15: 238.18354395645696
2020-08-16: 250.86564874924767
2020-08-17: 265.5408424241491
2020-08-18: 284.10579018176736
2020-08-19: 304.5038683185075
2020-08-20: 322.2179627625344
2020-08-21: 338.1061018353781
2020-08-22: 354.10039852310916
2020-08-23: 371.6679189029829
2020-08-24: 391.4406771389861
2020-08-25: 411.94822948504844
2020-08-26: 431.286699276606
2020-08-27: 445.53173096541593
2020-08-28: 463.3859168316008
2020-08-29: 482.96633528189386
2020-08-30: 503.8570677664827
2020-08-31: 525.1686641194982

Predicting for Canada__nan
2020-08-01: 3992.9377742216093
2020-08-02: 4752.323501744425
2020-08-03: 4948.523404265031
2020-08-04: 4993.319394387919
2020-08-05: 4176.647142920733

2020-08-26: 512.2041991144488
2020-08-27: 538.70342921454
2020-08-28: 562.9715627608418
2020-08-29: 592.3640497967374
2020-08-30: 622.0815525285453
2020-08-31: 653.059860479856

Predicting for Colombia__nan
2020-08-01: 6077.205006743495
2020-08-02: 7487.956676852941
2020-08-03: 7657.502517198883
2020-08-04: 7512.508780152522
2020-08-05: 6610.499965846539
2020-08-06: 4317.573103638014
2020-08-07: 6694.336610893436
2020-08-08: 7733.240232726625
2020-08-09: 7938.086442812927
2020-08-10: 7809.8169776213335
2020-08-11: 7108.706307160727
2020-08-12: 6516.8982908974185
2020-08-13: 7610.628721040513
2020-08-14: 8288.02818089535
2020-08-15: 8487.57727730212
2020-08-16: 8402.621681997454
2020-08-17: 8067.693564021746
2020-08-18: 8025.158358299946
2020-08-19: 8630.741115633507
2020-08-20: 9086.95646979446
2020-08-21: 9274.562257816478
2020-08-22: 9272.588766722352
2020-08-23: 9212.189438221667
2020-08-24: 9349.989066072361
2020-08-25: 9755.859121185646
2020-08-26: 10097.425021365898
2020-08-27: 1

2020-08-12: 186.55358790393234
2020-08-13: 208.468499126805
2020-08-14: 224.79507418233862
2020-08-15: 238.36203835858143
2020-08-16: 251.8044702908922
2020-08-17: 266.573148850588
2020-08-18: 290.3258371992956
2020-08-19: 311.68466755266655
2020-08-20: 329.9778172014828
2020-08-21: 346.8671189140878
2020-08-22: 363.85985237297166
2020-08-23: 392.6982668610593
2020-08-24: 417.80131431892073
2020-08-25: 440.85053007082905
2020-08-26: 462.3827608965628
2020-08-27: 474.6615895616085
2020-08-28: 495.5383511662641
2020-08-29: 522.1515595244489
2020-08-30: 547.7776682372631
2020-08-31: 572.2762350490208

Predicting for Dominica__nan
2020-08-01: 51.79470155961057
2020-08-02: 65.10692142967238
2020-08-03: 69.18698907700835
2020-08-04: 75.21050606692334
2020-08-05: 81.9671915084042
2020-08-06: 94.00220379784241
2020-08-07: 123.53959561963791
2020-08-08: 138.82314315724196
2020-08-09: 148.35591259629558
2020-08-10: 158.29676827182308
2020-08-11: 169.55930908678175
2020-08-12: 186.0325386947976
2

2020-08-28: 6873.49477953382
2020-08-29: 7160.348554879928
2020-08-30: 7575.305950218077
2020-08-31: 7844.721114317447

Predicting for Estonia__nan
2020-08-01: 319.49348677840464
2020-08-02: 427.907090242326
2020-08-03: 414.0774170704649
2020-08-04: 401.2895628034139
2020-08-05: 370.2084355121745
2020-08-06: 283.476422366007
2020-08-07: 423.1298918008782
2020-08-08: 493.96857696502514
2020-08-09: 501.5877716681189
2020-08-10: 499.21261569356665
2020-08-11: 479.73329373763613
2020-08-12: 471.8102785766648
2020-08-13: 545.876857358303
2020-08-14: 594.573506402104
2020-08-15: 611.2586020745797
2020-08-16: 617.1588125166955
2020-08-17: 616.9069680131488
2020-08-18: 627.9899933730309
2020-08-19: 675.1383434068879
2020-08-20: 712.3944489675783
2020-08-21: 733.1553964732385
2020-08-22: 746.5183760963951
2020-08-23: 748.7393867054845
2020-08-24: 768.7100466793894
2020-08-25: 804.5629525479718
2020-08-26: 836.0316018900174
2020-08-27: 850.3487294623274
2020-08-28: 865.4451280111995
2020-08-29: 


Predicting for United Kingdom__Wales
2020-08-01: 630.7327612816392
2020-08-02: 596.5765766770727
2020-08-03: 345.9918134791493
2020-08-04: 209.41396192684795
2020-08-05: 197.25845506503157
2020-08-06: 304.67829529242795
2020-08-07: 575.3012740216709
2020-08-08: 567.7469721925364
2020-08-09: 446.94858383271725
2020-08-10: 379.8543261425866
2020-08-11: 390.9013235396924
2020-08-12: 487.15427338228903
2020-08-13: 616.3477453973311
2020-08-14: 619.7078546459461
2020-08-15: 566.8844405950234
2020-08-16: 541.863721532483
2020-08-17: 566.7015379580356
2020-08-18: 643.9867654049518
2020-08-19: 716.4805156903434
2020-08-20: 728.181555296254
2020-08-21: 712.9160729247283
2020-08-22: 713.8075675870319
2020-08-23: 755.7051831011428
2020-08-24: 815.1086808481845
2020-08-25: 865.1071166790231
2020-08-26: 884.9812061157876
2020-08-27: 888.0935356905412
2020-08-28: 906.3122000537504
2020-08-29: 946.4624628212972
2020-08-30: 995.385573313008
2020-08-31: 1036.9428698920065

Predicting for Georgia__nan


2020-08-07: 163.3727761750897
2020-08-08: 186.97158042779654
2020-08-09: 200.12291758837208
2020-08-10: 213.06905740729746
2020-08-11: 213.30159474911886
2020-08-12: 225.6438410922032
2020-08-13: 254.67663285976437
2020-08-14: 276.29973701062613
2020-08-15: 292.4513081122595
2020-08-16: 305.92332399901113
2020-08-17: 315.9855535218762
2020-08-18: 327.4445311330471
2020-08-19: 350.0588703890544
2020-08-20: 370.5819280388747
2020-08-21: 388.0895854498367
2020-08-22: 403.6980376675014
2020-08-23: 408.29821620315363
2020-08-24: 422.5911421564532
2020-08-25: 442.76028064787283
2020-08-26: 462.40299365753845
2020-08-27: 480.5024880430035
2020-08-28: 495.9620131646798
2020-08-29: 508.283336046177
2020-08-30: 525.1064271550556
2020-08-31: 545.0441538844706

Predicting for Hong Kong__nan
2020-08-01: 72.65495862667453
2020-08-02: 97.81526186093322
2020-08-03: 115.7878336140422
2020-08-04: 134.15777541227158
2020-08-05: 131.91893764432348
2020-08-06: 118.73845829436918
2020-08-07: 155.14578253575

2020-08-01: 9114.68146778478
2020-08-02: 9646.919731484573
2020-08-03: 18051.46492059375
2020-08-04: 14029.4581275904
2020-08-05: 11298.095849043028
2020-08-06: 7284.422752549107
2020-08-07: 10517.0445973794
2020-08-08: 12716.016218297378
2020-08-09: 15794.539702971115
2020-08-10: 14275.388278972057
2020-08-11: 12342.486275421672
2020-08-12: 11078.412845304769
2020-08-13: 12648.376487972075
2020-08-14: 14370.721482136534
2020-08-15: 15687.975088948047
2020-08-16: 15041.14541794541
2020-08-17: 14042.270499474473
2020-08-18: 13751.84948349116
2020-08-19: 14726.572509205735
2020-08-20: 15881.008818378834
2020-08-21: 16568.533737402464
2020-08-22: 16343.042333321342
2020-08-23: 15969.590878449491
2020-08-24: 16067.386370061104
2020-08-25: 16775.096696696986
2020-08-26: 17567.463860585693
2020-08-27: 18028.18817947325
2020-08-28: 18054.607032475586
2020-08-29: 18038.421909528985
2020-08-30: 18310.853835503745
2020-08-31: 18887.24190674356

Predicting for Iraq__nan
2020-08-01: 1968.261122021

2020-08-07: 1141.9749452963206
2020-08-08: 1328.6721241921618
2020-08-09: 1323.962457782714
2020-08-10: 1304.215776316205
2020-08-11: 1177.3438966566714
2020-08-12: 1135.235856752145
2020-08-13: 1342.3730252268415
2020-08-14: 1461.1164547006342
2020-08-15: 1483.6531923206617
2020-08-16: 1470.8196328878048
2020-08-17: 1424.4844737972062
2020-08-18: 1453.4118588905103
2020-08-19: 1575.1277548088106
2020-08-20: 1659.0044694299818
2020-08-21: 1691.8536295464805
2020-08-22: 1698.5127611281728
2020-08-23: 1710.4298645711096
2020-08-24: 1759.9299926609992
2020-08-25: 1846.5013771694735
2020-08-26: 1915.0705318793775
2020-08-27: 1956.0363730043505
2020-08-28: 1983.8411573906787
2020-08-29: 2018.4603520367
2020-08-30: 2075.6137317086955
2020-08-31: 2148.3836561949915

Predicting for Kyrgyz Republic__nan
2020-08-01: 392.65376799543037
2020-08-02: 457.9441249419222
2020-08-03: 489.0256819201343
2020-08-04: 474.58187819972125
2020-08-05: 378.35782494879874
2020-08-06: 312.0241297045301
2020-08-07:

2020-08-13: 622.4956301443037
2020-08-14: 681.5288258959516
2020-08-15: 710.0510615816784
2020-08-16: 716.3048700832717
2020-08-17: 704.395305595169
2020-08-18: 712.2364756204432
2020-08-19: 765.1122232731838
2020-08-20: 809.9026312932882
2020-08-21: 837.1355588143675
2020-08-22: 850.2770952720806
2020-08-23: 847.8145072299243
2020-08-24: 867.3556061581639
2020-08-25: 907.3791703123933
2020-08-26: 944.2055749001868
2020-08-27: 966.8317804105556
2020-08-28: 983.3438266690944
2020-08-29: 996.5462759720946
2020-08-30: 1021.5729282902254
2020-08-31: 1056.3155562224133

Predicting for Lesotho__nan
2020-08-01: 59.92681189664024
2020-08-02: 67.82126919582416
2020-08-03: 77.5965410662236
2020-08-04: 85.19637398237941
2020-08-05: 83.27750187048338
2020-08-06: 97.33639783281025
2020-08-07: 129.75524710699574
2020-08-08: 144.18757987341402
2020-08-09: 156.13349440436988
2020-08-10: 165.74140038214
2020-08-11: 173.78870057692836
2020-08-12: 191.10017816059053
2020-08-13: 214.4184568528152
2020-08-

2020-08-23: 8038.145203381763
2020-08-24: 8014.657163260085
2020-08-25: 8292.722172262584
2020-08-26: 8629.037757740587
2020-08-27: 8885.555862120835
2020-08-28: 9011.815578265485
2020-08-29: 9047.021309369566
2020-08-30: 9149.544105074649
2020-08-31: 9394.93849336612

Predicting for Mali__nan
2020-08-01: 65.2865096224901
2020-08-02: 97.43158352506906
2020-08-03: 151.49042636748362
2020-08-04: 146.16749417593388
2020-08-05: 104.12942885726498
2020-08-06: 113.51161305999119
2020-08-07: 152.07042889714944
2020-08-08: 184.46003218757104
2020-08-09: 214.99789356046716
2020-08-10: 215.78417318070495
2020-08-11: 205.6353659905929
2020-08-12: 219.73129412816184
2020-08-13: 248.4160007114823
2020-08-14: 276.24069980308525
2020-08-15: 298.3741957286399
2020-08-16: 306.09560194618814
2020-08-17: 311.31019102655904
2020-08-18: 328.54371426943635
2020-08-19: 353.4320596482171
2020-08-20: 378.147467811216
2020-08-21: 398.2733286638637
2020-08-22: 411.65082069907265
2020-08-23: 425.1118451479644
202

2020-08-01: 55.24914535551454
2020-08-02: 68.49896571064811
2020-08-03: 75.47267555114274
2020-08-04: 84.5685736261158
2020-08-05: 90.25332122570545
2020-08-06: 97.86261774202575
2020-08-07: 128.16627566865137
2020-08-08: 144.1128825029786
2020-08-09: 155.41360566263853
2020-08-10: 166.72208478216336
2020-08-11: 177.0663016644081
2020-08-12: 191.77241074349593
2020-08-13: 214.06063964919997
2020-08-14: 231.020064534421
2020-08-15: 245.29399692884687
2020-08-16: 259.2204808527165
2020-08-17: 273.52824418964866
2020-08-18: 285.20357912322066
2020-08-19: 304.0366898952324
2020-08-20: 321.52485772637624
2020-08-21: 337.64654227762816
2020-08-22: 353.63659941359094
2020-08-23: 359.73575433766496
2020-08-24: 373.4095998444721
2020-08-25: 391.13299710717723
2020-08-26: 408.64752918158734
2020-08-27: 416.9961990145083
2020-08-28: 430.3076391133926
2020-08-29: 441.84014047075146
2020-08-30: 456.7630814073219
2020-08-31: 473.94201123078767

Predicting for Nigeria__nan
2020-08-01: 219.84303197924

2020-08-01: 1036.3913233586118
2020-08-02: 1270.934976192928
2020-08-03: 1287.6438020834712
2020-08-04: 1324.6113963256944
2020-08-05: 1114.515643815375
2020-08-06: 775.7824964533041
2020-08-07: 1192.9940584892083
2020-08-08: 1370.2145120597188
2020-08-09: 1414.0517924362275
2020-08-10: 1413.1415628194359
2020-08-11: 1286.405471336739
2020-08-12: 1211.0025168977454
2020-08-13: 1410.4952443858983
2020-08-14: 1533.1582567211526
2020-08-15: 1579.3647737891442
2020-08-16: 1578.2349217642854
2020-08-17: 1528.6027462201232
2020-08-18: 1537.1579412450315
2020-08-19: 1653.514050578031
2020-08-20: 1742.3635872882378
2020-08-21: 1787.4674889893377
2020-08-22: 1800.1837767182801
2020-08-23: 1798.6378652253297
2020-08-24: 1836.263700961621
2020-08-25: 1919.146849655585
2020-08-26: 1990.4858073911341
2020-08-27: 2041.3241862286477
2020-08-28: 2076.1882728250057
2020-08-29: 2104.4886787761648
2020-08-30: 2155.3157324502085
2020-08-31: 2226.287012362866

Predicting for Peru__nan
2020-08-01: 1716.6506

2020-08-07: 304.67046697319745
2020-08-08: 345.21571202508846
2020-08-09: 356.57951852364
2020-08-10: 352.0665804822601
2020-08-11: 342.2168235605899
2020-08-12: 351.99083627178516
2020-08-13: 406.94216770197727
2020-08-14: 439.99711628788225
2020-08-15: 454.89875039968314
2020-08-16: 460.4400353768192
2020-08-17: 465.8658024992443
2020-08-18: 492.40793522707656
2020-08-19: 531.6257576983527
2020-08-20: 560.8866195157278
2020-08-21: 579.8522994382859
2020-08-22: 593.811670758727
2020-08-23: 620.3408529580369
2020-08-24: 650.6190958533019
2020-08-25: 684.9294675452219
2020-08-26: 714.3681751447401
2020-08-27: 733.9227129155162
2020-08-28: 756.1978074188389
2020-08-29: 784.8424642840954
2020-08-30: 816.7123607578193
2020-08-31: 850.0053897507761

Predicting for Kosovo__nan
2020-08-01: 717.2070415759687
2020-08-02: 839.0067765735116
2020-08-03: 793.2683818956614
2020-08-04: 784.5473545663469
2020-08-05: 663.9233731880569
2020-08-06: 510.62485893891073
2020-08-07: 805.610961442272
2020-08-

2020-08-02: 62.73408796070588
2020-08-03: 68.26150172108501
2020-08-04: 73.70368734194052
2020-08-05: 79.13028668114353
2020-08-06: 92.79266386720126
2020-08-07: 121.93294517594441
2020-08-08: 136.84661714200402
2020-08-09: 146.81283894105988
2020-08-10: 156.36428099423188
2020-08-11: 167.27502294910767
2020-08-12: 184.28013088657602
2020-08-13: 206.21604331843199
2020-08-14: 222.4017622234672
2020-08-15: 235.69613415202804
2020-08-16: 248.80975145428994
2020-08-17: 263.5441542289473
2020-08-18: 287.6300028049993
2020-08-19: 309.00714662412054
2020-08-20: 327.17917747059596
2020-08-21: 343.8701935356219
2020-08-22: 360.67807620520625
2020-08-23: 389.5116382228964
2020-08-24: 414.73343509037466
2020-08-25: 437.76830347915114
2020-08-26: 459.1957784942875
2020-08-27: 475.6454846235316
2020-08-28: 497.43377771715194
2020-08-29: 524.4978318779819
2020-08-30: 550.6095486279992
2020-08-31: 575.5290267821161

Predicting for Sierra Leone__nan
2020-08-01: 53.39791332349361
2020-08-02: 64.180256

2020-08-15: 2016.0321214870642
2020-08-16: 1965.275217597437
2020-08-17: 1896.4933644323442
2020-08-18: 1968.87866656476
2020-08-19: 2150.2358875887817
2020-08-20: 2245.819618020974
2020-08-21: 2267.256909551847
2020-08-22: 2256.932786214699
2020-08-23: 2268.0683637698257
2020-08-24: 2348.4225548933196
2020-08-25: 2469.266945495507
2020-08-26: 2549.2898721258266
2020-08-27: 2579.538327531516
2020-08-28: 2601.9351344120246
2020-08-29: 2643.8591835995803
2020-08-30: 2723.410659382547
2020-08-31: 2818.4776741036544

Predicting for Sweden__nan
2020-08-01: 3461.0603099234418
2020-08-02: 4414.89682356241
2020-08-03: 4034.2467138388024
2020-08-04: 1488.0536445653875
2020-08-05: 1126.3012397141474
2020-08-06: 1687.75951105974
2020-08-07: 3333.2925415896443
2020-08-08: 3919.102388902747
2020-08-09: 3443.492187210772
2020-08-10: 2309.0932426975983
2020-08-11: 2112.0562520355693
2020-08-12: 2611.08867549919
2020-08-13: 3464.461628821658
2020-08-14: 3769.1404666612093
2020-08-15: 3454.568767643484

2020-08-30: 505.94496248939714
2020-08-31: 524.9825297776462

Predicting for Turkmenistan__nan
2020-08-01: 50.665504549088524
2020-08-02: 62.73408796070588
2020-08-03: 68.26150172108501
2020-08-04: 73.70368734194052
2020-08-05: 79.13028668114353
2020-08-06: 92.79266386720126
2020-08-07: 121.93294517594441
2020-08-08: 136.84661714200402
2020-08-09: 146.81283894105988
2020-08-10: 156.36428099423188
2020-08-11: 167.27502294910767
2020-08-12: 184.28013088657602
2020-08-13: 206.21604331843199
2020-08-14: 222.4017622234672
2020-08-15: 235.69613415202804
2020-08-16: 248.80975145428994
2020-08-17: 263.5441542289473
2020-08-18: 287.6300028049993
2020-08-19: 309.00714662412054
2020-08-20: 327.17917747059596
2020-08-21: 343.8701935356219
2020-08-22: 360.67807620520625
2020-08-23: 389.5116382228964
2020-08-24: 414.73343509037466
2020-08-25: 437.76830347915114
2020-08-26: 459.1957784942875
2020-08-27: 471.3293893665977
2020-08-28: 492.08958343930874
2020-08-29: 518.6827680306633
2020-08-30: 544.330

2020-08-29: 595.2597716664676
2020-08-30: 615.7754171932974
2020-08-31: 643.7544069237201

Predicting for United States__nan
2020-08-01: 147742.0705655284
2020-08-02: 181917.29183032503
2020-08-03: 184972.04342935357
2020-08-04: 175356.41521473837
2020-08-05: 142259.83857451865
2020-08-06: 98881.19195839163
2020-08-07: 158550.4523024374
2020-08-08: 183487.6455277328
2020-08-09: 186889.62383044552
2020-08-10: 179141.6493102429
2020-08-11: 158765.2098086586
2020-08-12: 149049.10676585254
2020-08-13: 176938.1030746614
2020-08-14: 192975.02165931155
2020-08-15: 196217.54783014738
2020-08-16: 191353.2688968477
2020-08-17: 182050.20582035143
2020-08-18: 182514.59417102873
2020-08-19: 197727.3670226023
2020-08-20: 208230.28999415872
2020-08-21: 211396.28899552813
2020-08-22: 209623.7858018689
2020-08-23: 207009.01258387906
2020-08-24: 210628.6876880077
2020-08-25: 220387.9858299409
2020-08-26: 227938.16491386152
2020-08-27: 231356.66556891188
2020-08-28: 232111.00372983338
2020-08-29: 233202.

2020-08-24: 606.7037152554138
2020-08-25: 637.8973988529749
2020-08-26: 665.6639993031029
2020-08-27: 685.8285791700367
2020-08-28: 708.8490928907612
2020-08-29: 736.88430635657
2020-08-30: 767.0449099414005
2020-08-31: 798.2511783280048

Predicting for United States__Delaware
2020-08-01: 454.35254308877666
2020-08-02: 617.101787737048
2020-08-03: 573.7736751851318
2020-08-04: 553.0736893485685
2020-08-05: 459.72595907585503
2020-08-06: 364.2953409842948
2020-08-07: 567.3520030930312
2020-08-08: 665.715747211486
2020-08-09: 664.2829118371657
2020-08-10: 648.0346045405787
2020-08-11: 601.1621657452459
2020-08-12: 596.8558979541534
2020-08-13: 702.6981013602832
2020-08-14: 766.649280078061
2020-08-15: 780.2943136414922
2020-08-16: 776.6240521537443
2020-08-17: 765.5951046573185
2020-08-18: 794.2880375335443
2020-08-19: 861.5121591248325
2020-08-20: 909.2437716726441
2020-08-21: 930.7292710750202
2020-08-22: 940.694289977505
2020-08-23: 962.5610675754878
2020-08-24: 999.791881370356
2020-

2020-08-11: 2716.8041801144586
2020-08-12: 2737.0592404979684
2020-08-13: 3330.480262658197
2020-08-14: 3584.096546117677
2020-08-15: 3532.368307950752
2020-08-16: 3376.236006115408
2020-08-17: 3276.2201504581194
2020-08-18: 3398.7915071096354
2020-08-19: 3724.901294722051
2020-08-20: 3893.77442168047
2020-08-21: 3897.2493447773772
2020-08-22: 3846.278605992495
2020-08-23: 3857.4768146202664
2020-08-24: 3992.2586803943304
2020-08-25: 4201.805658421723
2020-08-26: 4331.8865579594285
2020-08-27: 4370.139959430704
2020-08-28: 4382.767590535283
2020-08-29: 4443.2902400624325
2020-08-30: 4573.456217722328
2020-08-31: 4731.276789514471

Predicting for United States__Louisiana
2020-08-01: 2274.82689309633
2020-08-02: 2954.1143496139857
2020-08-03: 1691.252675475434
2020-08-04: 2506.68508017522
2020-08-05: 1504.4870448483534
2020-08-06: 1326.6538252082296
2020-08-07: 2359.738193887125
2020-08-08: 2603.1097566316034
2020-08-09: 2265.575041355621
2020-08-10: 2398.5261711450244
2020-08-11: 2012.2

2020-08-23: 1680.9441692175833
2020-08-24: 1738.795579640072
2020-08-25: 1830.1067242601298
2020-08-26: 1894.0316409454463
2020-08-27: 1924.847616918897
2020-08-28: 1942.1150835847475
2020-08-29: 1971.4107717086604
2020-08-30: 2030.4594501861452
2020-08-31: 2103.1589125815676

Predicting for United States__North Carolina
2020-08-01: 3383.950862515594
2020-08-02: 3658.565087244215
2020-08-03: 3727.5591709182713
2020-08-04: 3876.7758356079385
2020-08-05: 2691.0955208766186
2020-08-06: 2105.3417131225506
2020-08-07: 3463.2686034853136
2020-08-08: 3822.4773449425606
2020-08-09: 3931.3278297225474
2020-08-10: 3843.491004752325
2020-08-11: 3285.3836221221113
2020-08-12: 3212.2977935780186
2020-08-13: 3840.524987480312
2020-08-14: 4125.655259116956
2020-08-15: 4217.485689482688
2020-08-16: 4133.727795335238
2020-08-17: 3914.17119870196
2020-08-18: 3990.017211606039
2020-08-19: 4334.29550954343
2020-08-20: 4546.131831031637
2020-08-21: 4629.332004057025
2020-08-22: 4604.136817173224
2020-08-23

2020-08-03: 3512.51700112543
2020-08-04: 3504.9548054199013
2020-08-05: 3007.107937628843
2020-08-06: 1951.8504659617147
2020-08-07: 2963.8608408716477
2020-08-08: 3446.2157106729533
2020-08-09: 3648.050803916119
2020-08-10: 3617.547503516923
2020-08-11: 3266.423852768978
2020-08-12: 2978.6105550161296
2020-08-13: 3450.084997216842
2020-08-14: 3778.264910684809
2020-08-15: 3923.082738752321
2020-08-16: 3900.8375558538282
2020-08-17: 3736.889122050851
2020-08-18: 3701.443368555543
2020-08-19: 3969.684268583883
2020-08-20: 4195.3251570817765
2020-08-21: 4310.147654830093
2020-08-22: 4320.522063664702
2020-08-23: 4280.683498036534
2020-08-24: 4335.461287872225
2020-08-25: 4520.815189311342
2020-08-26: 4690.117656008223
2020-08-27: 4790.22992935572
2020-08-28: 4835.0198741306285
2020-08-29: 4867.981649349241
2020-08-30: 4959.377304735962
2020-08-31: 5110.192059143707

Predicting for United States__Oregon
2020-08-01: 1116.7775371506145
2020-08-02: 1383.6910016623397
2020-08-03: 1491.4656330

2020-08-02: 2487.059006854932
2020-08-03: 2430.432300074013
2020-08-04: 2528.6577126514926
2020-08-05: 2493.673338829085
2020-08-06: 1511.9124368304106
2020-08-07: 2243.3010478921274
2020-08-08: 2609.108705733744
2020-08-09: 2661.7617643585145
2020-08-10: 2721.1397804800113
2020-08-11: 2578.5175421151257
2020-08-12: 2299.8257895476186
2020-08-13: 2633.1236298875187
2020-08-14: 2864.6800360790317
2020-08-15: 2946.019870551379
2020-08-16: 2979.192133922683
2020-08-17: 2904.0185183319604
2020-08-18: 2856.7737869069274
2020-08-19: 3045.194559709539
2020-08-20: 3204.2128576176688
2020-08-21: 3287.4441614722814
2020-08-22: 3324.155445483789
2020-08-23: 3316.142931735247
2020-08-24: 3351.3419552756814
2020-08-25: 3484.0762401540137
2020-08-26: 3607.3442674039698
2020-08-27: 3686.0237522113885
2020-08-28: 3736.9573787630347
2020-08-29: 3773.739705849103
2020-08-30: 3842.168330753045
2020-08-31: 3953.7192400474864

Predicting for United States__Virgin Islands
2020-08-01: 57.724942568090476
2020

2020-08-14: 583.1256116522733
2020-08-15: 618.4128265924805
2020-08-16: 679.6894994464486
2020-08-17: 692.5174123329551
2020-08-18: 688.0422653671362
2020-08-19: 717.1579729642252
2020-08-20: 748.2963115713338
2020-08-21: 786.5268416254665
2020-08-22: 828.7724412329679
2020-08-23: 868.8825142027883
2020-08-24: 889.4933287646949
2020-08-25: 921.2381162592787
2020-08-26: 957.1683541031672
2020-08-27: 997.1988612644753
2020-08-28: 1039.3451463113513
2020-08-29: 1077.9964959944655
2020-08-30: 1110.2944043558705
2020-08-31: 1147.0686590794758

Predicting for United States Virgin Islands__nan
2020-08-01: 59.77380571350194
2020-08-02: 71.80606497862429
2020-08-03: 79.34636237680664
2020-08-04: 84.23796552884426
2020-08-05: 89.11291566951827
2020-08-06: 99.01411080811803
2020-08-07: 131.29505383258646
2020-08-08: 147.11936034139086
2020-08-09: 157.94058666578508
2020-08-10: 167.37696298531756
2020-08-11: 177.4955479372481
2020-08-12: 193.4453986419376
2020-08-13: 216.7059665502021
2020-08-14: 

In [37]:
# Check the predictions
preds_df.head()

Unnamed: 0,CountryName,RegionName,Date,PredictedDailyNewCases
213,Aruba,,2020-08-01,58.823002
214,Aruba,,2020-08-02,71.315828
215,Aruba,,2020-08-03,78.993986
216,Aruba,,2020-08-04,90.229279
217,Aruba,,2020-08-05,87.81074


# Validation
This is how the predictor will be called during the competition.  
!!! PLEASE DO NOT CHANGE THE API !!!

In [38]:
!python predict.py -s 2020-08-01 -e 2020-08-04 -ip ../../../validation/data/2020-09-30_historical_ip.csv -o predictions/2020-08-01_2020-08-04.csv

Generating predictions from 2020-08-01 to 2020-08-04...
Saved predictions to predictions/2020-08-01_2020-08-04.csv
Done!
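
To make the calling convention concrete, here is a minimal sketch of how a script could parse the four flags used in the command above (`-s`, `-e`, `-ip`, `-o`). It is not the actual `predict.py`; the `dest` names and the `iso_date` helper are assumptions made for illustration.

```python
import argparse
from datetime import datetime


def iso_date(text):
    # Hypothetical helper: reject anything that is not YYYY-MM-DD
    return datetime.strptime(text, "%Y-%m-%d").date()


def parse_args(argv=None):
    # Only the short flags come from the notebook; the dest names are assumptions
    parser = argparse.ArgumentParser(description="Sketch of a predict.py-style command line")
    parser.add_argument("-s", dest="start_date", type=iso_date, required=True,
                        help="first day to predict, YYYY-MM-DD")
    parser.add_argument("-e", dest="end_date", type=iso_date, required=True,
                        help="last day to predict, YYYY-MM-DD")
    parser.add_argument("-ip", dest="ip_file", required=True,
                        help="CSV file holding the intervention plans")
    parser.add_argument("-o", dest="output_file", required=True,
                        help="CSV file the predictions are written to")
    return parser.parse_args(argv)


args = parse_args(["-s", "2020-08-01", "-e", "2020-08-04",
                   "-ip", "ip.csv", "-o", "out.csv"])
print(args.start_date, args.end_date, args.ip_file, args.output_file)
```

Parsing the dates up front means a malformed date fails before any prediction work starts.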


In [39]:
!head predictions/2020-08-01_2020-08-04.csv

CountryName,RegionName,Date,PredictedDailyNewCases
Aruba,,2020-08-01,58.82300181119936
Aruba,,2020-08-02,71.31582751123675
Aruba,,2020-08-03,78.99398601057783
Aruba,,2020-08-04,90.2292787574965
Afghanistan,,2020-08-01,149.81410233721954
Afghanistan,,2020-08-02,287.19834412144843
Afghanistan,,2020-08-03,277.60566758591983
Afghanistan,,2020-08-04,264.77630587676333
Angola,,2020-08-01,198.681737901836


# Test cases
The predictor can now generate a prediction file. The cases below validate its output under a few scenarios.

In [40]:
import os
from covid_xprize.validation.predictor_validation import validate_submission

def validate(start_date, end_date, ip_file, output_file):
    # First, delete any potential old file
    try:
        os.remove(output_file)
    except OSError:
        pass
    
    # Then generate the prediction, calling the official API
    !python predict.py -s {start_date} -e {end_date} -ip {ip_file} -o {output_file}
    
    # And validate it
    errors = validate_submission(start_date, end_date, ip_file, output_file)
    if errors:
        for error in errors:
            print(error)
    else:
        print("All good!")
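
For intuition about what such a validation might look for, the sketch below runs a few plausible checks on a prediction frame: required columns, no missing or negative values, and every requested date present for every country and region. It is an assumption-based illustration, not the implementation of `validate_submission`; the function name `check_predictions` and the error messages are hypothetical.

```python
import pandas as pd

REQUIRED = ["CountryName", "RegionName", "Date", "PredictedDailyNewCases"]


def check_predictions(pred_df, start_date, end_date):
    """Return a list of problems found; an empty list means none were found."""
    missing = [c for c in REQUIRED if c not in pred_df.columns]
    if missing:
        return [f"Missing columns: {missing}"]
    errors = []
    values = pred_df["PredictedDailyNewCases"]
    if values.isna().any():
        errors.append("NaN predictions found")
    if (values < 0).any():
        errors.append("Negative predictions found")
    # Every geo (country plus optional region) must cover every requested day
    expected = set(pd.date_range(start_date, end_date))
    seen = {}
    for country, region, day in zip(pred_df["CountryName"], pred_df["RegionName"],
                                    pd.to_datetime(pred_df["Date"])):
        geo_id = f"{country}__{region}" if pd.notna(region) else f"{country}__"
        seen.setdefault(geo_id, set()).add(day)
    for geo_id, days in seen.items():
        absent = expected - days
        if absent:
            errors.append(f"{geo_id}: {len(absent)} missing date(s)")
    return errors


good = pd.DataFrame({
    "CountryName": ["Aruba"] * 4,
    "RegionName": [None] * 4,
    "Date": ["2020-08-01", "2020-08-02", "2020-08-03", "2020-08-04"],
    "PredictedDailyNewCases": [58.8, 71.3, 79.0, 90.2],
})
print(check_predictions(good, "2020-08-01", "2020-08-04"))
```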

## 4 days, no gap
- All countries and regions
- Official number of cases is known up to start_date
- Intervention Plans are the official ones

In [41]:
validate(start_date="2020-08-01",
         end_date="2020-08-04",
         ip_file="../../../validation/data/2020-09-30_historical_ip.csv",
         output_file="predictions/val_4_days.csv")

Generating predictions from 2020-08-01 to 2020-08-04...
Saved predictions to predictions/val_4_days.csv
Done!
All good!


## 1 month in the future
- 2 countries only
- There is a gap between the date of the last known number of cases and start_date
- For future dates, the intervention plans contain scenarios for which predictions are requested, answering the question: what will happen if we apply these plans?

In [42]:
%%time
validate(start_date="2021-01-01",
         end_date="2021-01-31",
         ip_file="../../../validation/data/future_ip.csv",
         output_file="predictions/val_1_month_future.csv")

Generating predictions from 2021-01-01 to 2021-01-31...
Saved predictions to predictions/val_1_month_future.csv
Done!
All good!
CPU times: user 94.9 ms, sys: 44.4 ms, total: 139 ms
Wall time: 2.79 s


## 180 days, from a future date, all countries and regions
- Prediction start date is 1 week from now (i.e. assuming the submission date is 1 week from now).  
- Prediction end date is 6 months after start date.  
- Prediction is requested for all available countries and regions.  
- Intervention plan scenario: freeze last known intervention plans for each country and region.  

The number of cases between today and the start date is not known yet, but the model relies on it as input, so the model has to predict those cases first and then use its own predictions.  
This is the most demanding test. It should take less than 1 hour to generate the prediction file.
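
The idea of predicting through the gap can be sketched as a day-by-day rollout in which each prediction is fed back in as an input for the next day. This is a simplified illustration, not the predictor used here: the function `roll_out`, its weights, and the use of only the current day's NPIs are assumptions chosen to keep the example small.

```python
import numpy as np


def roll_out(past_cases, future_npis, case_weights, npi_weights, intercept=0.0):
    """Predict one day at a time, feeding each prediction back in as input."""
    lookback = len(case_weights)
    window = list(past_cases[-lookback:])  # most recent known cases
    preds = []
    for npis in future_npis:
        pred = float(np.dot(case_weights, window) + np.dot(npi_weights, npis) + intercept)
        pred = max(pred, 0.0)              # daily new cases cannot be negative
        preds.append(pred)
        window = window[1:] + [pred]       # slide the window over the new prediction
    return preds


# Toy numbers: 2-day lookback, one NPI, and a 1-day gap before start_date
all_preds = roll_out(past_cases=[10.0, 20.0],
                     future_npis=[[0], [0], [5]],
                     case_weights=[0.5, 0.5],
                     npi_weights=[-1.0])
gap_days = 1
requested = all_preds[gap_days:]           # keep only start_date onwards
print(requested)                           # [17.5, 11.25]
```

The rollout starts on the day after the last known case count, and the gap days are dropped only at the end, so the requested period is always computed from a full input window.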

### Generate the scenario

In [43]:
from datetime import datetime, timedelta

start_date = datetime.now() + timedelta(days=7)
start_date_str = start_date.strftime('%Y-%m-%d')
end_date = start_date + timedelta(days=180)
end_date_str = end_date.strftime('%Y-%m-%d')
print(f"Start date: {start_date_str}")
print(f"End date: {end_date_str}")

Start date: 2020-12-01
End date: 2021-05-30


In [44]:
from covid_xprize.validation.scenario_generator import get_raw_data, generate_scenario, NPI_COLUMNS
DATA_FILE = 'data/OxCGRT_latest.csv'
latest_df = get_raw_data(DATA_FILE, latest=True)
scenario_df = generate_scenario(start_date_str, end_date_str, latest_df, countries=None, scenario="Freeze")
scenario_file = "predictions/180_days_future_scenario.csv"
scenario_df.to_csv(scenario_file, index=False)
print(f"Saved scenario to {scenario_file}")

Saved scenario to predictions/180_days_future_scenario.csv
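
As a rough picture of what freezing the last known intervention plans means, the sketch below extends each geo's NPI rows up to an end date by repeating its last known values. It is not the implementation of `generate_scenario`; the function `freeze_last_known` and the `GeoID` column are assumptions made for this example.

```python
import pandas as pd


def freeze_last_known(ip_df, end_date, npi_cols):
    """Extend each geo's rows to end_date by repeating its last known NPI values."""
    frames = []
    for geo_id, geo_df in ip_df.groupby("GeoID"):
        geo_df = geo_df.sort_values("Date")
        last_row = geo_df.iloc[-1]
        # Days after the last known one, up to and including end_date
        future_dates = pd.date_range(last_row["Date"] + pd.Timedelta(days=1), end_date)
        future = pd.DataFrame({"GeoID": geo_id, "Date": future_dates})
        for col in npi_cols:
            future[col] = last_row[col]
        frames.append(pd.concat([geo_df, future], ignore_index=True))
    return pd.concat(frames, ignore_index=True)


ip = pd.DataFrame({
    "GeoID": ["A", "A", "B"],
    "Date": pd.to_datetime(["2020-11-29", "2020-11-30", "2020-11-30"]),
    "C1_School closing": [1, 2, 3],
})
frozen = freeze_last_known(ip, "2020-12-02", ["C1_School closing"])
print(frozen)
```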


### Check it

In [45]:
%%time
validate(start_date=start_date_str,
         end_date=end_date_str,
         ip_file=scenario_file,
         output_file="predictions/val_6_month_future.csv")

Generating predictions from 2020-12-01 to 2021-05-30...
Saved predictions to predictions/val_6_month_future.csv
Done!
All good!
CPU times: user 16.3 s, sys: 2.58 s, total: 18.8 s
Wall time: 5min 6s
