# Example Predictor: Linear Rollout Predictor

This example contains basic functionality for training and evaluating a linear predictor that rolls out predictions day-by-day.

First, a training data set is created from historical case and NPI (non-pharmaceutical intervention) data.

Second, a linear model is trained to predict future cases from prior case data along with prior and future NPI data.
The model is an off-the-shelf scikit-learn Lasso model with a positive weight constraint. Because the NPI inputs are negated before training, this constraint enforces the assumption that stricter NPIs are negatively correlated with future cases.
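
To illustrate why negating the NPI inputs works with a positive weight constraint, here is a minimal sketch on synthetic data. The data, the variable names, and the two-feature setup are assumptions made for illustration only; they are not the training data used in this notebook.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic, noise-free toy data (an assumption for illustration only).
rng = np.random.default_rng(0)
n_samples = 200
prior_cases = rng.uniform(0, 100, size=n_samples)
npi_level = rng.integers(0, 4, size=n_samples).astype(float)

# More prior cases -> more future cases; stricter NPI -> fewer future cases.
future_cases = 0.8 * prior_cases - 5.0 * npi_level + 20.0

# Negate the NPI column: a non-negative weight on -npi_level is
# equivalent to a non-positive effect of npi_level itself.
X_toy = np.column_stack([prior_cases, -npi_level])

toy_model = Lasso(alpha=0.1, positive=True, max_iter=10000)
toy_model.fit(X_toy, future_cases)

# Both weights are non-negative; the second applies to the negated NPI.
print(toy_model.coef_, toy_model.intercept_)
```

With `positive=True`, the model cannot assign a weight that would make a stricter NPI increase predicted cases; at worst the weight is driven to zero.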

Third, a sample evaluation set is created, and the predictor is applied to this evaluation set to produce prediction results in the correct format.

## Training

In [1]:
import pickle
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

### Copy the data locally

In [2]:
# Main source for the training data
DATA_URL = 'https://raw.githubusercontent.com/OxCGRT/covid-policy-tracker/master/data/OxCGRT_latest.csv'
# Local file
DATA_FILE = 'data/OxCGRT_latest.csv'

In [3]:
import os
import urllib.request
if not os.path.exists('data'):
    os.mkdir('data')
urllib.request.urlretrieve(DATA_URL, DATA_FILE)

('data/OxCGRT_latest.csv', <http.client.HTTPMessage at 0x159e684f130>)

In [4]:
# Load historical data from local file
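# on_bad_lines='skip' replaces error_bad_lines=False, which was
# removed in pandas 2.0 (on_bad_lines requires pandas >= 1.3)
df = pd.read_csv(DATA_FILE,
                 parse_dates=['Date'],
                 encoding="ISO-8859-1",
                 dtype={"RegionName": str,
                        "RegionCode": str},
                 on_bad_lines='skip')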





In [5]:
df.columns

Index(['CountryName', 'CountryCode', 'RegionName', 'RegionCode',
       'Jurisdiction', 'Date', 'C1_School closing', 'C1_Flag',
       'C2_Workplace closing', 'C2_Flag', 'C3_Cancel public events', 'C3_Flag',
       'C4_Restrictions on gatherings', 'C4_Flag', 'C5_Close public transport',
       'C5_Flag', 'C6_Stay at home requirements', 'C6_Flag',
       'C7_Restrictions on internal movement', 'C7_Flag',
       'C8_International travel controls', 'E1_Income support', 'E1_Flag',
       'E2_Debt/contract relief', 'E3_Fiscal measures',
       'E4_International support', 'H1_Public information campaigns',
       'H1_Flag', 'H2_Testing policy', 'H3_Contact tracing',
       'H4_Emergency investment in healthcare', 'H5_Investment in vaccines',
       'H6_Facial Coverings', 'H6_Flag', 'H7_Vaccination policy', 'H7_Flag',
       'H8_Protection of elderly people', 'H8_Flag', 'M1_Wildcard',
       'V1_Vaccine Prioritisation (summary)',
       'V2A_Vaccine Availability (summary)',
       'V2B_Vaccin

In [6]:
# For testing, restrict training data to that before a hypothetical predictor submission date
HYPOTHETICAL_SUBMISSION_DATE = np.datetime64("2020-07-31")
df = df[df.Date <= HYPOTHETICAL_SUBMISSION_DATE]

In [7]:
# Add GeoID column that combines CountryName and RegionName for easier manipulation of data
df['GeoID'] = df['CountryName'] + '__' + df['RegionName'].astype(str)

In [8]:
# Add new cases column
df['NewCases'] = df.groupby('GeoID').ConfirmedCases.diff().fillna(0)
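
As a quick check of what the grouped `diff` does, the following sketch applies the same expression to a toy frame. The frame `toy` and its values are hypothetical. The point is that the difference is taken within each `GeoID`, so the first row of each region becomes 0 rather than a difference against the previous region.

```python
import pandas as pd

# Hypothetical cumulative counts for two regions.
toy = pd.DataFrame({
    'GeoID': ['A', 'A', 'A', 'B', 'B', 'B'],
    'ConfirmedCases': [1, 4, 9, 100, 100, 130],
})

# Same expression as in the cell above: per-region daily difference,
# with the undefined first day of each region set to 0.
toy['NewCases'] = toy.groupby('GeoID').ConfirmedCases.diff().fillna(0)
print(toy)
```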

In [9]:
# Keep only columns of interest
id_cols = ['CountryName',
           'RegionName',
           'GeoID',
           'Date']
cases_col = ['NewCases']
npi_cols = ['C1_School closing',
            'C2_Workplace closing',
            'C3_Cancel public events',
            'C4_Restrictions on gatherings',
            'C5_Close public transport',
            'C6_Stay at home requirements',
            'C7_Restrictions on internal movement',
            'C8_International travel controls',
            'H1_Public information campaigns',
            'H2_Testing policy',
            'H3_Contact tracing',
            'H6_Facial Coverings']
df = df[id_cols + cases_col + npi_cols]

In [10]:
# Fill any missing case values by interpolation and setting NaNs to 0
df.update(df.groupby('GeoID').NewCases.apply(
    lambda group: group.interpolate()).fillna(0))

In [11]:
# Fill any missing NPIs by assuming they are the same as previous day
for npi_col in npi_cols:
    df.update(df.groupby('GeoID')[npi_col].ffill().fillna(0))
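
The forward fill can be sanity-checked the same way. In this sketch (toy values, hypothetical frame name `toy_npi`), missing values take the previous day's value within the same region, and a region whose first value is missing falls back to 0 instead of inheriting from the region listed before it.

```python
import numpy as np
import pandas as pd

# Hypothetical NPI column with gaps for two regions.
toy_npi = pd.DataFrame({
    'GeoID': ['A', 'A', 'A', 'B', 'B', 'B'],
    'C1_School closing': [2.0, np.nan, np.nan, np.nan, 1.0, np.nan],
})

# Forward fill within each region, then set leading gaps to 0.
filled = toy_npi.groupby('GeoID')['C1_School closing'].ffill().fillna(0)
print(filled.tolist())
```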

In [12]:
df

Unnamed: 0,CountryName,RegionName,GeoID,Date,NewCases,C1_School closing,C2_Workplace closing,C3_Cancel public events,C4_Restrictions on gatherings,C5_Close public transport,C6_Stay at home requirements,C7_Restrictions on internal movement,C8_International travel controls,H1_Public information campaigns,H2_Testing policy,H3_Contact tracing,H6_Facial Coverings
0,Aruba,,Aruba__nan,2020-01-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Aruba,,Aruba__nan,2020-01-02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Aruba,,Aruba__nan,2020-01-03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Aruba,,Aruba__nan,2020-01-04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Aruba,,Aruba__nan,2020-01-05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
278160,Zimbabwe,,Zimbabwe__nan,2020-07-27,192.0,3.0,1.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,4.0
278161,Zimbabwe,,Zimbabwe__nan,2020-07-28,113.0,3.0,1.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,4.0
278162,Zimbabwe,,Zimbabwe__nan,2020-07-29,62.0,3.0,1.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,4.0
278163,Zimbabwe,,Zimbabwe__nan,2020-07-30,213.0,3.0,1.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,4.0


In [13]:
# Set number of past days to use to make predictions
nb_lookback_days = 30

# Create training data across all countries for predicting one day ahead
X_cols = cases_col + npi_cols
y_col = cases_col
X_samples = []
y_samples = []
geo_ids = df.GeoID.unique()
for g in geo_ids:
    gdf = df[df.GeoID == g]
    all_case_data = np.array(gdf[cases_col])
    all_npi_data = np.array(gdf[npi_cols])

    # Create one sample for each day where we have enough data
    # Each sample consists of cases and npis for previous nb_lookback_days
    nb_total_days = len(gdf)
    for d in range(nb_lookback_days, nb_total_days - 1):
        X_cases = all_case_data[d-nb_lookback_days:d]

        # Take negative of npis to support positive
        # weight constraint in Lasso.
        X_npis = -all_npi_data[d - nb_lookback_days:d]

        # Flatten all input data so it fits Lasso input format.
        X_sample = np.concatenate([X_cases.flatten(),
                                   X_npis.flatten()])
        y_sample = all_case_data[d]
        X_samples.append(X_sample)
        y_samples.append(y_sample)

X_samples = np.array(X_samples)
y_samples = np.array(y_samples).flatten()
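
The layout of each flattened sample can be confirmed on a small input. The helper `make_windows` below is a hypothetical restatement of the inner loop above for a single region, not a function used elsewhere in this notebook. It mirrors the same loop bounds, so the last available day is not used as a target.

```python
import numpy as np

def make_windows(cases, npis, lookback):
    """Build one-day-ahead samples for a single region.

    cases: array of shape (n_days,)
    npis:  array of shape (n_days, n_npis)
    Each sample is [lookback case values, lookback * n_npis negated NPIs].
    """
    X, y = [], []
    for d in range(lookback, len(cases) - 1):
        X_cases = cases[d - lookback:d]
        X_npis = -npis[d - lookback:d]  # negated for the positive constraint
        X.append(np.concatenate([X_cases.flatten(), X_npis.flatten()]))
        y.append(cases[d])
    return np.array(X), np.array(y)

toy_cases = np.arange(6, dtype=float)  # 6 days of cases: 0, 1, ..., 5
toy_npis = np.ones((6, 2))             # 2 NPIs per day, all set to 1
X_win, y_win = make_windows(toy_cases, toy_npis, lookback=3)
print(X_win.shape, y_win)
```

Each row holds the case values first, followed by the negated NPIs in day-major order, which is the order the feature names are generated in when inspecting the coefficients.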

In [14]:
# Helper function to compute the mean absolute error (MAE)
def mae(pred, true):
    return np.mean(np.abs(pred - true))

In [15]:
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X_samples,
                                                    y_samples,
                                                    test_size=0.2,
                                                    random_state=301)

In [16]:
# Create and train Lasso model.
# Set positive=True to enforce assumption that cases are positively correlated
# with future cases and npis are negatively correlated.
model = Lasso(alpha=0.1,
              precompute=True,
              max_iter=10000,
              positive=True,
              selection='random')
# Fit model
model.fit(X_train, y_train)

Lasso(alpha=0.1, max_iter=10000, positive=True, precompute=True,
      selection='random')

In [17]:
# Evaluate model
train_preds = model.predict(X_train)
train_preds = np.maximum(train_preds, 0) # Don't predict negative cases
print('Train MAE:', mae(train_preds, y_train))

test_preds = model.predict(X_test)
test_preds = np.maximum(test_preds, 0) # Don't predict negative cases
print('Test MAE:', mae(test_preds, y_test))

Train MAE: 108.28933283934148
Test MAE: 103.9558566178491


In [18]:
# Inspect the learned feature coefficients for the model
# to see what features it's paying attention to.

# Give names to the features
x_col_names = []
for d in range(-nb_lookback_days, 0):
    x_col_names.append('Day ' + str(d) + ' ' + cases_col[0])
# NPI features cover the same lookback window as the case features
for d in range(-nb_lookback_days, 0):
    for col_name in npi_cols:
        x_col_names.append('Day ' + str(d) + ' ' + col_name)

# View non-zero coefficients
for (col, coeff) in zip(x_col_names, list(model.coef_)):
    if coeff != 0.:
        print(col, coeff)
print('Intercept', model.intercept_)

Day -14 NewCases 0.014488997654703597
Day -7 NewCases 0.27836558996612504
Day -6 NewCases 0.20392892827062528
Day -4 NewCases 0.05995666752775503
Day -3 NewCases 0.08850082587035356
Day -2 NewCases 0.07236673346189793
Day -1 NewCases 0.3334575998689063
Day -29 C6_Stay at home requirements 0.40839306656377594
Day -27 C2_Workplace closing 1.0541986338394314
Day -27 C6_Stay at home requirements 11.449402519976598
Day -23 C2_Workplace closing 2.351786060098264
Intercept 20.50703031349377


In [19]:
# Save model to file
if not os.path.exists('models'):
    os.mkdir('models')
with open('models/model.pkl', 'wb') as model_file:
    pickle.dump(model, model_file)
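
A saved model can be loaded back with `pickle.load`. The sketch below shows the round trip on a small stand-in model written to a temporary directory; the stand-in model, its data, and the temporary path are assumptions for illustration and do not touch `models/model.pkl`.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import Lasso

# Small stand-in model (not the model trained above).
X_small = np.array([[0.0], [1.0], [2.0], [3.0]])
y_small = np.array([0.0, 1.0, 2.0, 3.0])
small_model = Lasso(alpha=0.1, positive=True).fit(X_small, y_small)

# Save to a temporary location, then load it back.
tmp_path = os.path.join(tempfile.mkdtemp(), 'model.pkl')
with open(tmp_path, 'wb') as model_file:
    pickle.dump(small_model, model_file)
with open(tmp_path, 'rb') as model_file:
    loaded_model = pickle.load(model_file)

# The loaded model reproduces the original predictions.
print(np.allclose(loaded_model.predict(X_small), small_model.predict(X_small)))
```

Pickled scikit-learn models are generally only safe to load with the same scikit-learn version that produced them, and only from trusted sources.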

## Evaluation

Now that the predictor has been trained and saved, this section contains the functionality for evaluating it on sample evaluation data.
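
The `predict` module itself is not shown in this notebook. As a rough illustration of what a day-by-day rollout involves, here is a minimal sketch under stated assumptions: the function `rollout`, its arguments, and the toy predictor are all hypothetical and are not the implementation of `predict_df`. Each predicted value is appended to the case history and used as input for the next day, while the NPIs for future days come from a given plan.

```python
import numpy as np

def rollout(model_predict, past_cases, past_npis, future_npis, lookback):
    """Roll a one-day-ahead predictor forward, one day per row of future_npis.

    model_predict: callable mapping a (1, n_features) array to a (1,) array.
    past_cases:    array of shape (n_past_days,), n_past_days >= lookback
    past_npis:     array of shape (n_past_days, n_npis)
    future_npis:   array of shape (n_future_days, n_npis)
    """
    cases = list(past_cases[-lookback:])
    npis = [row for row in past_npis[-lookback:]]
    preds = []
    for day_npis in future_npis:
        # Same feature layout as in training: cases, then negated NPIs.
        X = np.concatenate([np.array(cases[-lookback:]),
                            -np.array(npis[-lookback:]).flatten()])
        pred = max(float(model_predict(X.reshape(1, -1))[0]), 0.0)
        preds.append(pred)
        cases.append(pred)     # feed the prediction back in
        npis.append(day_npis)  # the day's planned NPIs join the history
    return np.array(preds)

# Toy predictor (assumption): the mean of the cases in the window.
toy_lookback = 3
toy_predict = lambda X: X[:, :toy_lookback].mean(axis=1)

toy_preds = rollout(toy_predict,
                    past_cases=np.array([10.0, 10.0, 10.0]),
                    past_npis=np.zeros((3, 2)),
                    future_npis=np.zeros((4, 2)),
                    lookback=toy_lookback)
print(toy_preds)
```

Because predictions are fed back as inputs, errors can compound over the horizon, which is worth keeping in mind when reading the rollout output below.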

In [20]:
# Reload the module to get the latest changes
import predict
from importlib import reload
reload(predict)
from predict import predict_df

In [21]:
%%time
preds_df = predict_df("2020-08-01", "2020-08-31", path_to_ips_file="../../../validation/data/2020-09-30_historical_ip.csv", verbose=True)






Predicting for Aruba__nan
2020-08-01: 30.8065571422404
2020-08-02: 48.11366982137272
2020-08-03: 63.687398028030124
2020-08-04: 47.95231207096397
2020-08-05: 47.21110018191554
2020-08-06: 48.24120528188183
2020-08-07: 54.35459149655343
2020-08-08: 68.09967005746564
2020-08-09: 80.62974673309924
2020-08-10: 88.76339948741736
2020-08-11: 88.20255176870627
2020-08-12: 90.54091029916556
2020-08-13: 94.28460574517251
2020-08-14: 100.6449955396939
2020-08-15: 110.03791311076475
2020-08-16: 119.49924580676924
2020-08-17: 126.49669679908612
2020-08-18: 130.82011711161985
2020-08-19: 135.57230787112775
2020-08-20: 141.01049262798344
2020-08-21: 147.74454540732955
2020-08-22: 155.80665772142314
2020-08-23: 163.99080150061423
2020-08-24: 171.17268836324246
2020-08-25: 177.44153232534245
2020-08-26: 183.72506659519487
2020-08-27: 190.34162558678568
2020-08-28: 197.59886835937252
2020-08-29: 205.47891912349732
2020-08-30: 213.47393933139364
2020-08-31: 221.12814677597686

Predicting for Afghanista

2020-08-02: 63.36909900841405
2020-08-03: 67.3755359213788
2020-08-04: 63.902354113818994
2020-08-05: 56.222430669629574
2020-08-06: 53.873246266882816
2020-08-07: 63.56234903946944
2020-08-08: 82.74169264623188
2020-08-09: 93.02739382561387
2020-08-10: 98.83840078208632
2020-08-11: 101.16438602908028
2020-08-12: 101.22429667131303
2020-08-13: 103.86552980748003
2020-08-14: 111.91320063084503
2020-08-15: 123.12445442130573
2020-08-16: 131.89365403525278
2020-08-17: 138.64968647069202
2020-08-18: 143.6212207513444
2020-08-19: 147.66022597416293
2020-08-20: 152.8328670726568
2020-08-21: 160.3619462469796
2020-08-22: 169.08944420246513
2020-08-23: 177.21232127007175
2020-08-24: 184.50768320555278
2020-08-25: 190.99328771434386
2020-08-26: 197.10609563485545
2020-08-27: 203.76001726000544
2020-08-28: 211.42479743618182
2020-08-29: 219.64040755463236
2020-08-30: 227.7659431621322
2020-08-31: 235.59855621743208

Predicting for Burundi__nan
2020-08-01: 27.101230590624574
2020-08-02: 36.872377

2020-08-01: 75.28145211823649
2020-08-02: 100.70373090220824
2020-08-03: 92.92075048682784
2020-08-04: 65.44223148685109
2020-08-05: 62.47959928745529
2020-08-06: 63.873906754653625
2020-08-07: 75.47684199446431
2020-08-08: 102.83685661861438
2020-08-09: 118.20623053982914
2020-08-10: 118.73844246616721
2020-08-11: 113.240214889617
2020-08-12: 113.90556590996655
2020-08-13: 117.4524425010095
2020-08-14: 127.03799308135223
2020-08-15: 142.06137170363803
2020-08-16: 152.8736817867824
2020-08-17: 157.54144388071896
2020-08-18: 159.99172603191164
2020-08-19: 163.86982802173495
2020-08-20: 169.3640019689499
2020-08-21: 177.87354089901794
2020-08-22: 188.38218637804226
2020-08-23: 197.4052922355619
2020-08-24: 204.06384436788542
2020-08-25: 209.77065688988603
2020-08-26: 215.7937016697425
2020-08-27: 222.6615186402996
2020-08-28: 230.94245935215082
2020-08-29: 240.05900064838494
2020-08-30: 248.69342754560333
2020-08-31: 256.46194578791886

Predicting for Belarus__nan
2020-08-01: 686.8190626

2020-08-01: 11593.834798745109
2020-08-02: 3886.5593555796418
2020-08-03: 2155.5177374551886
2020-08-04: 2046.6023611826204
2020-08-05: 1898.0413303013243
2020-08-06: 1225.3195113943073
2020-08-07: 3241.13614928759
2020-08-08: 6102.947678846652
2020-08-09: 4033.8305829865462
2020-08-10: 3184.9615761123955
2020-08-11: 3065.6846561229377
2020-08-12: 2774.4053755365303
2020-08-13: 2693.2821512595606
2020-08-14: 3728.44074859745
2020-08-15: 4577.481791854647
2020-08-16: 4050.1177755718395
2020-08-17: 3736.7540815964467
2020-08-18: 3637.121522147791
2020-08-19: 3485.6767002613246
2020-08-20: 3547.38406016678
2020-08-21: 4019.8992008301525
2020-08-22: 4332.8157103545345
2020-08-23: 4227.052150760234
2020-08-24: 4140.1000898453285
2020-08-25: 4099.128844269617
2020-08-26: 4054.782046961432
2020-08-27: 4135.354073045914
2020-08-28: 4360.515944555669
2020-08-29: 4512.886081722027
2020-08-30: 4529.647143511161
2020-08-31: 4533.920608051009

Predicting for Central African Republic__nan
2020-08-01

2020-08-07: 45.936038050835826
2020-08-08: 55.7452358869887
2020-08-09: 62.504179238835434
2020-08-10: 67.70423115261141
2020-08-11: 72.77116374024503
2020-08-12: 77.3804785081319
2020-08-13: 81.90250017257645
2020-08-14: 88.0725206190078
2020-08-15: 95.5808523032965
2020-08-16: 102.25273346756829
2020-08-17: 108.36714154892874
2020-08-18: 114.3241057500179
2020-08-19: 120.04646602062044
2020-08-20: 125.8753398310487
2020-08-21: 132.4572271612467
2020-08-22: 139.53019360113967
2020-08-23: 146.42604863646656
2020-08-24: 153.16153537447167
2020-08-25: 159.82573994683915
2020-08-26: 166.4181236778339
2020-08-27: 173.1385341644492
2020-08-28: 180.2141511665213
2020-08-29: 187.52682148947264
2020-08-30: 194.8571437291255
2020-08-31: 202.18236429910692

Predicting for Colombia__nan
2020-08-01: 206.62298422422379
2020-08-02: 268.28695383289227
2020-08-03: 299.7336434690138
2020-08-04: 260.07024338365227
2020-08-05: 170.15234388364684
2020-08-06: 142.96706648824207
2020-08-07: 167.978908237807

2020-08-01: 21.032800828181593
2020-08-02: 28.10628677142161
2020-08-03: 31.605289261280042
2020-08-04: 35.21980021074072
2020-08-05: 38.28699814143959
2020-08-06: 40.37756119551754
2020-08-07: 46.04306570528577
2020-08-08: 55.869018460445275
2020-08-09: 62.63600757157387
2020-08-10: 67.91245588623647
2020-08-11: 73.01705713582629
2020-08-12: 77.55471970005756
2020-08-13: 82.04723831630841
2020-08-14: 88.22267581234755
2020-08-15: 95.73460921422298
2020-08-16: 102.41827655690714
2020-08-17: 108.56695262212878
2020-08-18: 114.53469816371347
2020-08-19: 120.23527269076295
2020-08-20: 126.05313528031931
2020-08-21: 132.63549958009466
2020-08-22: 139.71019547893832
2020-08-23: 146.61476670421956
2020-08-24: 153.3655116554247
2020-08-25: 160.03472080521925
2020-08-26: 166.6214045438046
2020-08-27: 173.3387541010296
2020-08-28: 180.41485961020092
2020-08-29: 187.72957801425974
2020-08-30: 195.0657145547928
2020-08-31: 202.39864631905442

Predicting for Dominica__nan
2020-08-01: 30.9880174481

2020-08-02: 815.3940262357398
2020-08-03: 839.1344103420242
2020-08-04: 703.9119899078271
2020-08-05: 441.3524254881586
2020-08-06: 350.0737546901696
2020-08-07: 451.9562033422005
2020-08-08: 668.0600743394411
2020-08-09: 748.3937838319534
2020-08-10: 771.1657540620606
2020-08-11: 717.5931536463063
2020-08-12: 616.1362852146385
2020-08-13: 580.6276644806146
2020-08-14: 630.499453681357
2020-08-15: 719.5057038396594
2020-08-16: 771.7906925964307
2020-08-17: 793.7092313122414
2020-08-18: 778.107105102976
2020-08-19: 745.1668525068831
2020-08-20: 737.0916284101118
2020-08-21: 765.4581803555097
2020-08-22: 809.0519815787852
2020-08-23: 843.1399296125448
2020-08-24: 862.9374407974144
2020-08-25: 865.7279773874745
2020-08-26: 861.4355218637909
2020-08-27: 867.0244195165646
2020-08-28: 887.5203078848781
2020-08-29: 914.9228375654875
2020-08-30: 940.0646546459802
2020-08-31: 958.9780055865151

Predicting for Ethiopia__nan
2020-08-01: 40.21755980685768
2020-08-02: 54.55604422322851
2020-08-03: 

2020-08-27: 11686.767017086544
2020-08-28: 12142.574008247053
2020-08-29: 12549.840486914962
2020-08-30: 12686.516616135827
2020-08-31: 12720.08145489787

Predicting for United Kingdom__Northern Ireland
2020-08-01: 627.1348019133313
2020-08-02: 520.6034866928296
2020-08-03: 260.23816116574653
2020-08-04: 217.4718278389484
2020-08-05: 214.91827990816608
2020-08-06: 187.22179819453754
2020-08-07: 282.4137875416933
2020-08-08: 460.7753794116108
2020-08-09: 435.786758102643
2020-08-10: 352.1774403805037
2020-08-11: 331.5560578089733
2020-08-12: 320.75268719143753
2020-08-13: 318.4628438252992
2020-08-14: 372.950790606669
2020-08-15: 442.4023871367569
2020-08-16: 443.10441456903806
2020-08-17: 421.79774694639445
2020-08-18: 415.5940750717281
2020-08-19: 412.6984856020677
2020-08-20: 419.51338409512783
2020-08-21: 450.459808176797
2020-08-22: 482.70446589728715
2020-08-23: 491.61401225371765
2020-08-24: 496.36179709270476
2020-08-25: 500.6826936919345
2020-08-26: 505.61608807774144
2020-08-2

2020-08-09: 62.146286096270885
2020-08-10: 67.52209071500958
2020-08-11: 72.60137133391648
2020-08-12: 77.19147970344167
2020-08-13: 81.7067042703208
2020-08-14: 87.85754539233804
2020-08-15: 95.29846534303627
2020-08-16: 101.9678265629541
2020-08-17: 108.13572251179329
2020-08-18: 114.10145925494011
2020-08-19: 119.81920293133182
2020-08-20: 125.64488065429587
2020-08-21: 132.2106634763691
2020-08-22: 139.2563942650979
2020-08-23: 146.15119656588558
2020-08-24: 152.9019690548435
2020-08-25: 159.5694980711475
2020-08-26: 166.1601552592221
2020-08-27: 172.87724794027056
2020-08-28: 179.94252938206756
2020-08-29: 187.24278677766998
2020-08-30: 194.57061238560422
2020-08-31: 201.89869648453214

Predicting for Guatemala__nan
2020-08-01: 789.5331842408004
2020-08-02: 864.5520955099464
2020-08-03: 841.4382535941761
2020-08-04: 688.7298740645479
2020-08-05: 439.8824581853604
2020-08-06: 345.39139084796847
2020-08-07: 460.4558331723186
2020-08-08: 697.5949472217685
2020-08-09: 774.160455667319

2020-08-02: 2250.350976692754
2020-08-03: 1721.2113301357338
2020-08-04: 922.0355139932435
2020-08-05: 894.3481952116192
2020-08-06: 757.8339619855335
2020-08-07: 958.7409440539107
2020-08-08: 1562.1535649522443
2020-08-09: 1757.1934272943893
2020-08-10: 1559.3995326844438
2020-08-11: 1302.4442255483534
2020-08-12: 1220.34022819101
2020-08-13: 1171.5259076900004
2020-08-14: 1293.685863448668
2020-08-15: 1542.9280493211638
2020-08-16: 1645.2306687086448
2020-08-17: 1590.1392727899652
2020-08-18: 1508.7042252877718
2020-08-19: 1468.349161390854
2020-08-20: 1459.60312769021
2020-08-21: 1530.9999274554882
2020-08-22: 1644.7065361975765
2020-08-23: 1704.6647676441628
2020-08-24: 1703.8728179885036
2020-08-25: 1687.6702558625998
2020-08-26: 1680.1272648373997
2020-08-27: 1691.3822856612476
2020-08-28: 1737.940527937957
2020-08-29: 1800.1316987399907
2020-08-30: 1842.79398066881
2020-08-31: 1861.993029351998

Predicting for India__nan
2020-08-01: 727.5421823285167
2020-08-02: 916.093973510087

2020-08-13: 103.2517933805689
2020-08-14: 102.76136616555561
2020-08-15: 107.16877585412183
2020-08-16: 113.3463689352967
2020-08-17: 118.35559325951519
2020-08-18: 124.13132921011515
2020-08-19: 133.52532186554578
2020-08-20: 142.2295164485331
2020-08-21: 147.0239081694224
2020-08-22: 152.9568706206785
2020-08-23: 159.44727426771234
2020-08-24: 165.62778021508365
2020-08-25: 172.5297480001251
2020-08-26: 180.78103859570385
2020-08-27: 188.5636246528598
2020-08-28: 195.2746890121462
2020-08-29: 202.25878237819722
2020-08-30: 209.4134205014218
2020-08-31: 216.5656873611944

Predicting for Japan__nan
2020-08-01: 28969.151603218772
2020-08-02: 34770.27335162231
2020-08-03: 38007.1047721925
2020-08-04: 32013.51073994356
2020-08-05: 18886.411682968337
2020-08-06: 14514.020629919552
2020-08-07: 17889.519338645234
2020-08-08: 26558.54215806573
2020-08-09: 30764.67704437903
2020-08-10: 32474.981796998636
2020-08-11: 29967.41181615155
2020-08-12: 24895.693411378976
2020-08-13: 22897.81354952872

2020-08-01: 223.1906297563192
2020-08-02: 278.44852699973643
2020-08-03: 293.74173403482723
2020-08-04: 245.51516445205374
2020-08-05: 165.7726238760353
2020-08-06: 140.61921116523203
2020-08-07: 169.01597471033185
2020-08-08: 240.62042592465139
2020-08-09: 278.27770904307204
2020-08-10: 291.0382367630147
2020-08-11: 275.5892604545466
2020-08-12: 247.3421753832082
2020-08-13: 238.98117767414132
2020-08-14: 256.05372255786847
2020-08-15: 288.5609080178369
2020-08-16: 312.08796819176524
2020-08-17: 323.91850268362947
2020-08-18: 322.70697062714993
2020-08-19: 316.46684467212145
2020-08-20: 317.54569281186383
2020-08-21: 329.84896838818037
2020-08-22: 348.28897287393534
2020-08-23: 364.5569506618695
2020-08-24: 375.7006431038596
2020-08-25: 381.12984026561185
2020-08-26: 384.3657106813703
2020-08-27: 390.487394772756
2020-08-28: 401.34413630350633
2020-08-29: 414.94086986807
2020-08-30: 428.1381070445213
2020-08-31: 439.2312471076983

Predicting for Liberia__nan
2020-08-01: 20.50703031349

2020-08-01: 69.61559624609295
2020-08-02: 78.8869160691963
2020-08-03: 99.6710221664334
2020-08-04: 98.25632580549798
2020-08-05: 72.37863044527806
2020-08-06: 65.85407033108774
2020-08-07: 77.97815063031709
2020-08-08: 100.55902717071457
2020-08-09: 113.0044948159864
2020-08-10: 125.32974248235837
2020-08-11: 127.80035196742664
2020-08-12: 121.80017843900542
2020-08-13: 122.47138595311388
2020-08-14: 131.19844471527387
2020-08-15: 143.6066211877336
2020-08-16: 154.18761930024067
2020-08-17: 163.6623624553126
2020-08-18: 168.65250524094418
2020-08-19: 170.77423739011942
2020-08-20: 175.18779589291353
2020-08-21: 182.95814388789222
2020-08-22: 192.2945429081381
2020-08-23: 201.54584873147775
2020-08-24: 210.0923735638579
2020-08-25: 216.76149045846338
2020-08-26: 222.38610281027078
2020-08-27: 228.87825340295416
2020-08-28: 236.74619436526623
2020-08-29: 245.40259881158943
2020-08-30: 254.24175045995855
2020-08-31: 262.7775992738027

Predicting for Monaco__nan
2020-08-01: 43.65447852028

2020-08-01: 20.710959241764396
2020-08-02: 27.89555159147202
2020-08-03: 31.5861640513604
2020-08-04: 34.9058127090367
2020-08-05: 38.142965807304975
2020-08-06: 40.24901570026974
2020-08-07: 45.895210436574956
2020-08-08: 55.646275332575485
2020-08-09: 62.453969694773235
2020-08-10: 67.75997636987664
2020-08-11: 72.7931956624547
2020-08-12: 77.37326371708133
2020-08-13: 81.88018675947498
2020-08-14: 88.03830414900486
2020-08-15: 95.52776914079763
2020-08-16: 102.22547668380773
2020-08-17: 108.37298681360456
2020-08-18: 114.32283784645827
2020-08-19: 120.03446056553008
2020-08-20: 125.85615292350799
2020-08-21: 132.42925716611543
2020-08-22: 139.49657097612078
2020-08-23: 146.4032721371038
2020-08-24: 153.15005771807085
2020-08-25: 159.81312977660096
2020-08-26: 166.40169727450586
2020-08-27: 173.118394548564
2020-08-28: 180.18930428844985
2020-08-29: 187.50009556611553
2020-08-30: 194.83459069109688
2020-08-31: 202.1638214048494

Predicting for Mauritius__nan
2020-08-01: 1730.50684947

2020-08-10: 1004.1446552068378
2020-08-11: 897.9455461936047
2020-08-12: 785.397134726215
2020-08-13: 748.553545693806
2020-08-14: 818.6012484579389
2020-08-15: 947.5322290055815
2020-08-16: 1016.0253600054818
2020-08-17: 1023.8619021355626
2020-08-18: 990.3638740084571
2020-08-19: 950.8356723727311
2020-08-20: 942.7838072130951
2020-08-21: 982.5041636970814
2020-08-22: 1044.0088427468163
2020-08-23: 1086.6387068567335
2020-08-24: 1103.405895679964
2020-08-25: 1100.9823805338865
2020-08-26: 1094.572008377156
2020-08-27: 1101.623834694196
2020-08-28: 1128.916528502661
2020-08-29: 1165.4975026320126
2020-08-30: 1196.1889363995554
2020-08-31: 1216.19543099605

Predicting for Nepal__nan
2020-08-01: 33.452656936679375
2020-08-02: 42.06935794130432
2020-08-03: 43.6232436427767
2020-08-04: 43.958110097514776
2020-08-05: 44.26828556315847
2020-08-06: 45.05009796615441
2020-08-07: 52.49534684675622
2020-08-08: 66.0789453372015
2020-08-09: 73.93606226613372
2020-08-10: 78.62978909763656
2020-08-

2020-08-12: 2058.0821252242226
2020-08-13: 1934.6414240103093
2020-08-14: 2132.49841936882
2020-08-15: 2492.2216936082555
2020-08-16: 2666.929836745266
2020-08-17: 2670.3955116148095
2020-08-18: 2566.2767081796887
2020-08-19: 2444.0366551581146
2020-08-20: 2407.94401009673
2020-08-21: 2512.4041999692813
2020-08-22: 2676.1679677791435
2020-08-23: 2781.440054912845
2020-08-24: 2813.935850589228
2020-08-25: 2794.8392023937517
2020-08-26: 2763.9212973151252
2020-08-27: 2770.884165228878
2020-08-28: 2836.5678707189277
2020-08-29: 2927.3595014124435
2020-08-30: 2999.612766188709
2020-08-31: 3041.7085592111694

Predicting for Puerto Rico__nan
2020-08-01: 263.7279325270547
2020-08-02: 372.36566005928904
2020-08-03: 423.7048804362363
2020-08-04: 341.2318485274565
2020-08-05: 217.4316214802414
2020-08-06: 179.97776770547586
2020-08-07: 206.1923705604729
2020-08-08: 297.206804579122
2020-08-09: 360.73054552973315
2020-08-10: 387.5821118653969
2020-08-11: 360.1068267541799
2020-08-12: 315.60807547

2020-08-03: 33.343422282848415
2020-08-04: 35.848886271476445
2020-08-05: 38.92151894094709
2020-08-06: 40.8840567866698
2020-08-07: 46.789079190738846
2020-08-08: 57.19714032159598
2020-08-09: 64.1773654981082
2020-08-10: 69.28901709978746
2020-08-11: 74.03991739788582
2020-08-12: 78.49137653658386
2020-08-13: 82.94096087648357
2020-08-14: 89.24004214055921
2020-08-15: 96.99110928105995
2020-08-16: 103.78602046587291
2020-08-17: 109.87455029196144
2020-08-18: 115.72676024309992
2020-08-19: 121.3759659321158
2020-08-20: 127.18109323616818
2020-08-21: 133.82831713163594
2020-08-22: 141.00993263962755
2020-08-23: 147.97243418710968
2020-08-24: 154.71251899821291
2020-08-25: 161.34795541324715
2020-08-26: 167.91599795079696
2020-08-27: 174.63627692118195
2020-08-28: 181.7500323564034
2020-08-29: 189.11888396096145
2020-08-30: 196.4905015310986
2020-08-31: 203.83196373211433

Predicting for Saudi Arabia__nan
2020-08-01: 82.04454173891327
2020-08-02: 98.89393625474241
2020-08-03: 105.524513

2020-08-06: 40.07058712274599
2020-08-07: 45.732591943264964
2020-08-08: 55.3810104095047
2020-08-09: 62.0627406290825
2020-08-10: 67.39089972136173
2020-08-11: 72.50901450158432
2020-08-12: 77.11684610152324
2020-08-13: 81.63516572397452
2020-08-14: 87.79140245970912
2020-08-15: 95.23139652134235
2020-08-16: 101.87957964802585
2020-08-17: 108.03205885346776
2020-08-18: 114.00834032299669
2020-08-19: 119.73269069517667
2020-08-20: 125.56071399553606
2020-08-21: 132.12910445156712
2020-08-22: 139.1726579422653
2020-08-23: 146.0578206748937
2020-08-24: 152.80276117136384
2020-08-25: 159.4724567872032
2020-08-26: 166.06500494853475
2020-08-27: 172.78302043046983
2020-08-28: 179.84894854835255
2020-08-29: 187.14720033102012
2020-08-30: 194.47041950961165
2020-08-31: 201.7955300125767

Predicting for Serbia__nan
2020-08-01: 1061.221122699794
2020-08-02: 1276.834052499144
2020-08-03: 1319.775777718259
2020-08-04: 1089.2721724273865
2020-08-05: 674.3136206578706
2020-08-06: 543.3789446703925


2020-08-05: 40.70102438776455
2020-08-06: 42.1001455630107
2020-08-07: 48.201272859335965
2020-08-08: 59.12831289594767
2020-08-09: 66.27252575522911
2020-08-10: 71.57180939653135
2020-08-11: 76.38503561708238
2020-08-12: 80.4831311005108
2020-08-13: 84.72898735953459
2020-08-14: 91.11175152678061
2020-08-15: 99.05548257239306
2020-08-16: 105.96734155407852
2020-08-17: 112.172963324217
2020-08-18: 118.04352366176823
2020-08-19: 123.57650811424467
2020-08-20: 129.31377418967088
2020-08-21: 136.00402870630265
2020-08-22: 143.27089628374358
2020-08-23: 150.31077444864567
2020-08-24: 157.1216251499569
2020-08-25: 163.77869361721596
2020-08-26: 170.3197192019513
2020-08-27: 177.03038983073293
2020-08-28: 184.17571959303038
2020-08-29: 191.59561261588453
2020-08-30: 199.0217297294082
2020-08-31: 206.4130839228549

Predicting for Chad__nan
2020-08-01: 20.50703031349377
2020-08-02: 27.34525542227031
2020-08-03: 31.10954035119789
2020-08-04: 34.67451890151264
2020-08-05: 37.970418864712926
2020

2020-08-01: 140.2475617883709
2020-08-02: 183.74029047441468
2020-08-03: 241.69958884133953
2020-08-04: 240.76436987412524
2020-08-05: 145.89395935947354
2020-08-06: 120.82698225323662
2020-08-07: 137.14640129242193
2020-08-08: 181.20232172155755
2020-08-09: 214.19514491927958
2020-08-10: 244.2115853488433
2020-08-11: 244.32730838087596
2020-08-12: 214.72533250202122
2020-08-13: 205.84754043468337
2020-08-14: 216.08194357034898
2020-08-15: 237.26325526939593
2020-08-16: 258.44188046400853
2020-08-17: 276.62926219132396
2020-08-18: 280.69675876808935
2020-08-19: 275.08835724691676
2020-08-20: 275.64456633399044
2020-08-21: 284.2794482740032
2020-08-22: 297.8000004824294
2020-08-23: 312.72874016738064
2020-08-24: 325.80995229310474
2020-08-25: 332.95683728340754
2020-08-26: 336.54185620067767
2020-08-27: 342.0943816658975
2020-08-28: 350.93133266954777
2020-08-29: 362.1406341660087
2020-08-30: 374.35445314367047
2020-08-31: 385.7157456827271

Predicting for Tanzania__nan
2020-08-01: 25.5

2020-08-01: 2088.9581497624463
2020-08-02: 3606.7049434776536
2020-08-03: 1374.3612818377899
2020-08-04: 924.6782214908158
2020-08-05: 872.7493826991383
2020-08-06: 716.326004801748
2020-08-07: 912.7653079428703
2020-08-08: 1826.3989395327687
2020-08-09: 2242.526544335665
2020-08-10: 1595.3376061215592
2020-08-11: 1366.5104256799564
2020-08-12: 1288.622924840234
2020-08-13: 1210.2809097856566
2020-08-14: 1360.464288850112
2020-08-15: 1753.7134299968554
2020-08-16: 1889.9561066803449
2020-08-17: 1713.275611007389
2020-08-18: 1621.9303786748212
2020-08-19: 1575.9105237057004
2020-08-20: 1553.0393806682962
2020-08-21: 1648.2521848041115
2020-08-22: 1819.2854449045965
2020-08-23: 1886.348993577748
2020-08-24: 1855.660843550719
2020-08-25: 1832.998418189655
2020-08-26: 1820.8074792199307
2020-08-27: 1828.3223011522966
2020-08-28: 1903.2167588785555
2020-08-29: 1993.2807942839602
2020-08-30: 2043.459734132933
2020-08-31: 2058.064735675064

Predicting for United States__California
2020-08-01:

2020-08-01: 58.16072239976771
2020-08-02: 41.045796026283625
2020-08-03: 157.70138747529822
2020-08-04: 247.38771821412942
2020-08-05: 121.53238928314818
2020-08-06: 95.35324239415942
2020-08-07: 106.7567571039392
2020-08-08: 113.15491397451953
2020-08-09: 125.27616404792334
2020-08-10: 179.98324662581584
2020-08-11: 208.12898290852485
2020-08-12: 174.08117636128404
2020-08-13: 165.3709961193934
2020-08-14: 170.25268242082564
2020-08-15: 175.02004689955606
2020-08-16: 188.43329886431133
2020-08-17: 215.81936989746427
2020-08-18: 228.82767880190875
2020-08-19: 223.54264489002492
2020-08-20: 224.1411757780979
2020-08-21: 229.24784596212606
2020-08-22: 235.46134585978166
2020-08-23: 247.13315979505376
2020-08-24: 265.38324315793994
2020-08-25: 276.1206997886046
2020-08-26: 280.58495873442445
2020-08-27: 286.24739138745696
2020-08-28: 305.7660899153633
2020-08-29: 317.9023121588053
2020-08-30: 331.70401316500494
2020-08-31: 346.9181455128319

Predicting for United States__Idaho
2020-08-01:

2020-08-01: 158.9989882853174
2020-08-02: 186.7557739032966
2020-08-03: 212.23130511342222
2020-08-04: 180.60105565389546
2020-08-05: 122.14940085941723
2020-08-06: 104.2880844298322
2020-08-07: 131.8327603673838
2020-08-08: 180.21426479953794
2020-08-09: 204.94562265955645
2020-08-10: 219.42615317134087
2020-08-11: 210.61554151321275
2020-08-12: 190.82998816834152
2020-08-13: 187.00427892045283
2020-08-14: 201.9190874916564
2020-08-15: 225.15131824978621
2020-08-16: 242.6922899539179
2020-08-17: 253.91655476560163
2020-08-18: 254.93335941034826
2020-08-19: 251.8953057388523
2020-08-20: 254.718703941793
2020-08-21: 265.4918792261531
2020-08-22: 279.82585514962517
2020-08-23: 292.98305717749776
2020-08-24: 307.7756311295945
2020-08-25: 315.1108029276823
2020-08-26: 320.0945127718912
2020-08-27: 327.3126764204448
2020-08-28: 351.3122590042911
2020-08-29: 367.72814353931267
2020-08-30: 383.2182266921299
2020-08-31: 397.9044621452669

Predicting for United States__Michigan
2020-08-01: 88.91532767584883
2020-08-02: 98.51396257820791
2020-08-03: 61.095781435917026
2020-08-04: 55.87809655724823
2020-08-05: 57.61094219505091
2020-08-06: 60.92870321187516
2020-08-07: 72.77725760715612
2020-08-08: 104.35763575542481
2020-08-09: 111.59028629695594
2020-08-10: 103.76573355494452
2020-08-11: 104.08622307975754
2020-08-12: 107.319357250645
2020-08-13: 111.50164293767237
2020-08-14: 122.42784122859396
2020-08-15: 138.2334052103587
2020-08-16: 145.41531058637298
2020-08-17: 147.5168209424927
2020-08-18: 151.4641642961494
2020-08-19: 156.29375841354818
2020-08-20: 162.24691149737868
2020-08-21: 171.49324332480657
2020-08-22: 181.99329724085692
2020-08-23: 189.51272012077717
2020-08-24: 197.58360445647958
2020-08-25: 204.39104301945918
2020-08-26: 211.14537987348098
2020-08-27: 218.65876102625202
2020-08-28: 240.11634780211472
2020-08-29: 253.5064660906729
2020-08-30: 264.84571328196955
2020-08-31: 275.61070091123946

Predicting for United States__New Jersey
2020-08-02: 204.85114896902073
2020-08-03: 169.76223922777305
2020-08-04: 104.42358829764285
2020-08-05: 94.19979788508098
2020-08-06: 92.896110420817
2020-08-07: 108.87706849748828
2020-08-08: 162.07138040849938
2020-08-09: 192.33243881730283
2020-08-10: 183.6910215611398
2020-08-11: 164.82802261897564
2020-08-12: 160.66833242331836
2020-08-13: 161.86185607512255
2020-08-14: 175.0677286909396
2020-08-15: 201.08094255010582
2020-08-16: 218.15285796236412
2020-08-17: 220.20781204650285
2020-08-18: 218.17667932679
2020-08-19: 219.6559841358766
2020-08-20: 224.21412965712537
2020-08-21: 234.99703706396946
2020-08-22: 250.42505063451125
2020-08-23: 262.4518074203457
2020-08-24: 271.19068561763146
2020-08-25: 276.44993684552986
2020-08-26: 282.1065239401841
2020-08-27: 289.35290350340586
2020-08-28: 312.0107643284651
2020-08-29: 328.03067395864167
2020-08-30: 341.77846395893334
2020-08-31: 353.49667638829476

Predicting for United States__South Carolina
2020-08-01: 20.50703031349377

ValueError: Length of values (0) does not match length of index (31)

In [22]:
# Check the predictions
preds_df.head()

NameError: name 'preds_df' is not defined

# Validation
During the competition, the predictor is called exactly as shown in the next cell.  
!!! PLEASE DO NOT CHANGE THE API !!!

In [None]:
!python predict.py -s 2020-08-01 -e 2020-08-04 -ip ../../../validation/data/2020-09-30_historical_ip.csv -o predictions/2020-08-01_2020-08-04.csv

In [None]:
!head predictions/2020-08-01_2020-08-04.csv
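
The `predict.py` call in this section passes four flags: `-s`, `-e`, `-ip` and `-o`. As a minimal sketch of how a script could accept them, assuming `argparse` is used: the long option names, the help texts and the `parse_predict_args` helper are hypothetical and are not taken from the actual `predict.py`.

```python
import argparse


def parse_predict_args(argv):
    """Parse the four flags used in the predict.py call (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Sketch of a predict.py-style CLI")
    # The short flags come from the command in this notebook;
    # the long names and dest values are assumptions.
    parser.add_argument("-s", "--start_date", required=True,
                        help="first day to predict, YYYY-MM-DD")
    parser.add_argument("-e", "--end_date", required=True,
                        help="last day to predict, YYYY-MM-DD")
    parser.add_argument("-ip", "--interventions_plan", dest="ip_file", required=True,
                        help="path to the intervention plan CSV")
    parser.add_argument("-o", "--output_file", required=True,
                        help="path of the CSV to write predictions to")
    return parser.parse_args(argv)


args = parse_predict_args([
    "-s", "2020-08-01",
    "-e", "2020-08-04",
    "-ip", "../../../validation/data/2020-09-30_historical_ip.csv",
    "-o", "predictions/2020-08-01_2020-08-04.csv",
])
print(args.start_date, args.end_date, args.ip_file, args.output_file)
```

Whatever the script does internally, keeping these four flags unchanged is what lets the competition call the predictor the same way.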

# Test cases
The predictor can now generate a prediction file. The cases below validate its output for a few scenarios.

In [None]:
import os
from covid_xprize.validation.predictor_validation import validate_submission

def validate(start_date, end_date, ip_file, output_file):
    # First, delete any potential old file
    try:
        os.remove(output_file)
    except OSError:
        pass
    
    # Then generate the prediction, calling the official API
    !python predict.py -s {start_date} -e {end_date} -ip {ip_file} -o {output_file}
    
    # And validate it
    errors = validate_submission(start_date, end_date, ip_file, output_file)
    if errors:
        for error in errors:
            print(error)
    else:
        print("All good!")

## 4 days, no gap
- All countries and regions
- Official case counts are known up to `start_date`
- Intervention plans are the official ones

In [None]:
validate(start_date="2020-08-01",
         end_date="2020-08-04",
         ip_file="../../../validation/data/2020-09-30_historical_ip.csv",
         output_file="predictions/val_4_days.csv")

## 1 month in the future
- 2 countries only
- There is a gap between the date of the last known case count and `start_date`
- For future dates, the intervention plans contain scenarios for which predictions are requested, to answer the question: what will happen if we apply these plans?
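
To make the gap concrete: the rollout has to cover every day after the last known case count, not only the requested window. The sketch below works through that arithmetic. The `rollout_dates` helper is illustrative, and `last_known_date` is a hypothetical value, since the notebook does not state it.

```python
from datetime import date, timedelta


def rollout_dates(last_known_date, start_date, end_date):
    """Return (gap_days, requested_days, all_days_to_predict).

    gap_days: days after the last known case count and before start_date.
    requested_days: days in the inclusive window [start_date, end_date].
    """
    if start_date > end_date:
        raise ValueError("start_date must not be after end_date")
    first_unknown = last_known_date + timedelta(days=1)
    # If start_date falls inside the known history there is no gap to fill.
    gap_days = max((start_date - first_unknown).days, 0)
    requested_days = (end_date - start_date).days + 1
    roll_from = min(first_unknown, start_date)
    all_days = [roll_from + timedelta(days=i)
                for i in range((end_date - roll_from).days + 1)]
    return gap_days, requested_days, all_days


gap, requested, days = rollout_dates(
    last_known_date=date(2020, 12, 15),  # hypothetical, for illustration only
    start_date=date(2021, 1, 1),
    end_date=date(2021, 1, 31),
)
print(f"gap: {gap} days, requested: {requested} days, rollout: {len(days)} days")
```

With these illustrative dates the rollout covers 47 days: 16 gap days plus the 31 requested days.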

In [None]:
%%time
validate(start_date="2021-01-01",
         end_date="2021-01-31",
         ip_file="../../../validation/data/future_ip.csv",
         output_file="predictions/val_1_month_future.csv")

## 180 days, from a future date, all countries and regions
- Prediction start date is 1 week from now (i.e. assuming the submission date is 1 week from now).  
- Prediction end date is 6 months after start date.  
- Prediction is requested for all available countries and regions.  
- Intervention plan scenario: freeze last known intervention plans for each country and region.  
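
The "freeze" scenario can be pictured as repeating each region's last known NPI values for every future day. The sketch below shows that idea on plain dictionaries. It is not the `generate_scenario` implementation: the `freeze_last_known` helper, the row layout with a `GeoID` key, and the sample values are assumptions made for illustration.

```python
from datetime import date, timedelta


def freeze_last_known(rows, end_date):
    """Extend each region's NPI history to end_date by repeating its last row.

    rows: list of dicts with "GeoID", "Date" and NPI columns (hypothetical layout).
    """
    # Find the most recent row for each region.
    latest = {}
    for row in rows:
        geo = row["GeoID"]
        if geo not in latest or row["Date"] > latest[geo]["Date"]:
            latest[geo] = row
    extended = list(rows)
    for last in latest.values():
        day = last["Date"] + timedelta(days=1)
        while day <= end_date:
            frozen = dict(last)   # copy the last known NPI values
            frozen["Date"] = day  # only the date changes
            extended.append(frozen)
            day += timedelta(days=1)
    return extended


# Made-up sample history for one region.
history = [
    {"GeoID": "United States__Idaho", "Date": date(2020, 8, 1), "C1_School closing": 3},
    {"GeoID": "United States__Idaho", "Date": date(2020, 8, 2), "C1_School closing": 2},
]
scenario = freeze_last_known(history, end_date=date(2020, 8, 5))
for row in scenario:
    print(row["Date"], row["C1_School closing"])
```

Every day after 2020-08-02 in this toy example carries the value from 2020-08-02, which is the behaviour the "freeze" description implies.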

The case counts between today and the start date are not known yet, but the model relies on them, so it has to predict those cases before it can use them as inputs.  
This is the most demanding test. Generating the prediction file should take less than 1 hour.
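
The rollout behind this requirement can be sketched as follows: each predicted value is appended to the case history and becomes an input for the next day. This is an illustration only. The weights, the two-day lookback, the single aggregate NPI feature and the `rollout` helper are hypothetical and do not reproduce the trained Lasso model.

```python
def rollout(known_cases, future_npis, case_weights, npi_weight, intercept=0.0):
    """Predict one value per entry of future_npis, feeding predictions back in.

    known_cases: past daily case values, oldest first.
    future_npis: one aggregate NPI level per day to predict (hypothetical feature).
    case_weights: weights applied to the last len(case_weights) case values.
    npi_weight: non-negative weight; it is subtracted, so stricter NPIs
                lower the prediction.
    """
    lookback = len(case_weights)
    if len(known_cases) < lookback:
        raise ValueError("need at least `lookback` known case values")
    history = list(known_cases)
    predictions = []
    for npi in future_npis:
        window = history[-lookback:]
        pred = intercept + sum(w * c for w, c in zip(case_weights, window))
        pred -= npi_weight * npi
        pred = max(pred, 0.0)  # case counts cannot be negative
        predictions.append(pred)
        history.append(pred)   # the prediction becomes an input for the next day
    return predictions


# Made-up inputs: three known days, then four days to predict.
preds = rollout(known_cases=[100.0, 110.0, 120.0],
                future_npis=[0.0, 0.0, 2.0, 2.0],
                case_weights=[0.5, 0.5],
                npi_weight=5.0)
print(preds)
```

Because every step depends on the previous prediction, errors can compound over a long horizon, and the days in the gap before the start date have to be rolled out even though they are not part of the requested output.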

### Generate the scenario

In [None]:
from datetime import datetime, timedelta

start_date = datetime.now() + timedelta(days=7)
start_date_str = start_date.strftime('%Y-%m-%d')
end_date = start_date + timedelta(days=180)
end_date_str = end_date.strftime('%Y-%m-%d')
print(f"Start date: {start_date_str}")
print(f"End date: {end_date_str}")

In [None]:
from covid_xprize.validation.scenario_generator import get_raw_data, generate_scenario, NPI_COLUMNS
# Local copy of the Oxford data downloaded earlier in this notebook
DATA_FILE = 'data/OxCGRT_latest.csv'
latest_df = get_raw_data(DATA_FILE, latest=True)
# "Freeze" scenario for all countries and regions (countries=None)
scenario_df = generate_scenario(start_date_str, end_date_str, latest_df, countries=None, scenario="Freeze")
# Save the scenario so it can be passed to predict.py as the intervention plan file
scenario_file = "predictions/180_days_future_scenario.csv"
scenario_df.to_csv(scenario_file, index=False)
print(f"Saved scenario to {scenario_file}")

### Check it

In [None]:
%%time
validate(start_date=start_date_str,
         end_date=end_date_str,
         ip_file=scenario_file,
         output_file="predictions/val_6_month_future.csv")