# Example Predictor: Linear Rollout Predictor

This example contains basic functionality for training and evaluating a linear predictor that rolls out predictions day-by-day.

First, a training data set is created from historical case data and NPI (non-pharmaceutical intervention) data.

Second, a linear model is trained to predict future cases from prior case data along with prior and future NPI data.
The model is an off-the-shelf sklearn Lasso model with a positive weight constraint. NPI values are negated before fitting, so the constraint enforces the assumption that stricter NPIs are negatively correlated with future cases.

Third, a sample evaluation set is created, and the predictor is applied to this evaluation set to produce prediction results in the correct format.
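
The sign convention in the second step can be checked on synthetic data. The sketch below is not part of the original notebook: the data, the true coefficients (2.0 and 3.0), and all variable names are made up for illustration. It fits a `Lasso` with `positive=True` on a negated NPI column and shows that the implied effect of the NPI on cases comes out non-positive.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Hypothetical synthetic data: cases rise with past cases and fall with NPI level.
past_cases = rng.uniform(0, 10, size=200)
npi_level = rng.integers(0, 5, size=200).astype(float)
future_cases = 2.0 * past_cases - 3.0 * npi_level + rng.normal(0, 0.1, size=200)

# Negate the NPI column so that a non-negative weight means "more NPI, fewer cases".
X = np.column_stack([past_cases, -npi_level])
toy_model = Lasso(alpha=0.1, positive=True, max_iter=10000)
toy_model.fit(X, future_cases)

cases_weight, negated_npi_weight = toy_model.coef_
# Effect of the original (un-negated) NPI on predicted cases.
npi_effect = -negated_npi_weight
print(cases_weight, npi_effect)
```

Because the constraint only allows non-negative weights, negating the input is what lets the model express a reducing effect at all; without the negation, an NPI could only be assigned a zero or case-increasing weight.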

## Training

In [1]:
import pickle
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

### Copy the data locally

In [2]:
# Main source for the training data
DATA_URL = 'https://raw.githubusercontent.com/OxCGRT/covid-policy-tracker/master/data/OxCGRT_latest.csv'
# Local file
DATA_FILE = 'data/OxCGRT_latest.csv'

In [3]:
import os
import urllib.request
# Create the data directory if needed, then download the latest snapshot
os.makedirs('data', exist_ok=True)
urllib.request.urlretrieve(DATA_URL, DATA_FILE)

('data/OxCGRT_latest.csv', <http.client.HTTPMessage at 0x7fcd4a7abb38>)

In [4]:
# Load historical data from local file
# Skip malformed rows instead of raising (on_bad_lines requires pandas >= 1.3)
df = pd.read_csv(DATA_FILE,
                 parse_dates=['Date'],
                 encoding="ISO-8859-1",
                 dtype={"RegionName": str,
                        "RegionCode": str},
                 on_bad_lines='skip')

In [5]:
df.columns

Index(['CountryName', 'CountryCode', 'RegionName', 'RegionCode',
       'Jurisdiction', 'Date', 'C1_School closing', 'C1_Flag',
       'C2_Workplace closing', 'C2_Flag', 'C3_Cancel public events', 'C3_Flag',
       'C4_Restrictions on gatherings', 'C4_Flag', 'C5_Close public transport',
       'C5_Flag', 'C6_Stay at home requirements', 'C6_Flag',
       'C7_Restrictions on internal movement', 'C7_Flag',
       'C8_International travel controls', 'E1_Income support', 'E1_Flag',
       'E2_Debt/contract relief', 'E3_Fiscal measures',
       'E4_International support', 'H1_Public information campaigns',
       'H1_Flag', 'H2_Testing policy', 'H3_Contact tracing',
       'H4_Emergency investment in healthcare', 'H5_Investment in vaccines',
       'H6_Facial Coverings', 'H6_Flag', 'M1_Wildcard', 'ConfirmedCases',
       'ConfirmedDeaths', 'StringencyIndex', 'StringencyIndexForDisplay',
       'StringencyLegacyIndex', 'StringencyLegacyIndexForDisplay',
       'GovernmentResponseIndex', 'Gove

In [6]:
# For testing, restrict training data to dates on or before a hypothetical predictor submission date
HYPOTHETICAL_SUBMISSION_DATE = np.datetime64("2020-07-31")
df = df[df.Date <= HYPOTHETICAL_SUBMISSION_DATE]

In [7]:
# Add GeoID column that combines CountryName and RegionName for easier manipulation of data
df['GeoID'] = df['CountryName'] + '__' + df['RegionName'].astype(str)

In [8]:
# Add new cases column
df['NewCases'] = df.groupby('GeoID').ConfirmedCases.diff().fillna(0)
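
As a small illustration of the per-region difference used here, the sketch below applies the same expression to a made-up frame. The `toy` data and its values are hypothetical and only meant to show that the first row of each `GeoID` gets 0 rather than a difference against the previous region.

```python
import pandas as pd

# Hypothetical cumulative counts for two regions.
toy = pd.DataFrame({
    'GeoID': ['A', 'A', 'A', 'B', 'B'],
    'ConfirmedCases': [1, 4, 9, 10, 15],
})

# diff() is computed within each GeoID, so region B does not
# subtract region A's last cumulative value.
toy['NewCases'] = toy.groupby('GeoID').ConfirmedCases.diff().fillna(0)
print(toy)
```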

In [9]:
# Keep only columns of interest
id_cols = ['CountryName',
           'RegionName',
           'GeoID',
           'Date']
cases_col = ['NewCases']
npi_cols = ['C1_School closing',
            'C2_Workplace closing',
            'C3_Cancel public events',
            'C4_Restrictions on gatherings',
            'C5_Close public transport',
            'C6_Stay at home requirements',
            'C7_Restrictions on internal movement',
            'C8_International travel controls',
            'H1_Public information campaigns',
            'H2_Testing policy',
            'H3_Contact tracing',
            'H6_Facial Coverings']
df = df[id_cols + cases_col + npi_cols]

In [10]:
# Fill any missing case values by interpolation and setting NaNs to 0
df.update(df.groupby('GeoID').NewCases.apply(
    lambda group: group.interpolate()).fillna(0))

In [11]:
# Fill any missing NPIs by assuming they are the same as previous day
for npi_col in npi_cols:
    df.update(df.groupby('GeoID')[npi_col].ffill().fillna(0))
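
The grouped forward fill can be illustrated on a made-up column. The `toy` frame below is hypothetical; it shows that a missing value is carried forward only within the same `GeoID`, and that a gap at the start of a region falls back to 0.

```python
import numpy as np
import pandas as pd

# Hypothetical NPI column with gaps in two regions.
toy = pd.DataFrame({
    'GeoID': ['A', 'A', 'A', 'B', 'B'],
    'npi': [np.nan, 2.0, np.nan, np.nan, 1.0],
})

# Forward-fill within each region, then set any leading gaps to 0.
filled = toy.groupby('GeoID')['npi'].ffill().fillna(0)
print(filled.tolist())
```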

In [12]:
df

Unnamed: 0,CountryName,RegionName,GeoID,Date,NewCases,C1_School closing,C2_Workplace closing,C3_Cancel public events,C4_Restrictions on gatherings,C5_Close public transport,C6_Stay at home requirements,C7_Restrictions on internal movement,C8_International travel controls,H1_Public information campaigns,H2_Testing policy,H3_Contact tracing,H6_Facial Coverings
0,Aruba,,Aruba__nan,2020-01-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Aruba,,Aruba__nan,2020-01-02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Aruba,,Aruba__nan,2020-01-03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Aruba,,Aruba__nan,2020-01-04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Aruba,,Aruba__nan,2020-01-05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
87064,Zimbabwe,,Zimbabwe__nan,2020-07-27,78.0,3.0,1.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,4.0
87065,Zimbabwe,,Zimbabwe__nan,2020-07-28,192.0,3.0,1.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,4.0
87066,Zimbabwe,,Zimbabwe__nan,2020-07-29,113.0,3.0,1.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,4.0
87067,Zimbabwe,,Zimbabwe__nan,2020-07-30,62.0,3.0,1.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,4.0


In [13]:
df['CountryName'].unique()

array(['Aruba', 'Afghanistan', 'Angola', 'Albania', 'Andorra',
       'United Arab Emirates', 'Argentina', 'Australia', 'Austria',
       'Azerbaijan', 'Burundi', 'Belgium', 'Benin', 'Burkina Faso',
       'Bangladesh', 'Bulgaria', 'Bahrain', 'Bahamas',
       'Bosnia and Herzegovina', 'Belarus', 'Belize', 'Bermuda',
       'Bolivia', 'Brazil', 'Barbados', 'Brunei', 'Bhutan', 'Botswana',
       'Central African Republic', 'Canada', 'Switzerland', 'Chile',
       'China', "Cote d'Ivoire", 'Cameroon',
       'Democratic Republic of Congo', 'Congo', 'Colombia', 'Comoros',
       'Cape Verde', 'Costa Rica', 'Cuba', 'Cyprus', 'Czech Republic',
       'Germany', 'Djibouti', 'Dominica', 'Denmark', 'Dominican Republic',
       'Algeria', 'Ecuador', 'Egypt', 'Eritrea', 'Spain', 'Estonia',
       'Ethiopia', 'Finland', 'Fiji', 'France', 'Faeroe Islands', 'Gabon',
       'United Kingdom', 'Georgia', 'Ghana', 'Guinea', 'Gambia', 'Greece',
       'Greenland', 'Guatemala', 'Guam', 'Guyana', 'Hong Ko

In [14]:
# Set number of past days to use to make predictions
nb_lookback_days = 30

# Create training data across all countries for predicting one day ahead
X_cols = cases_col + npi_cols
y_col = cases_col
X_samples = []
y_samples = []
geo_ids = df.GeoID.unique()
for g in geo_ids:
    gdf = df[df.GeoID == g]
    all_case_data = np.array(gdf[cases_col])
    all_npi_data = np.array(gdf[npi_cols])

    # Create one sample for each day where we have enough data
    # Each sample consists of cases and npis for previous nb_lookback_days
    nb_total_days = len(gdf)
    for d in range(nb_lookback_days, nb_total_days - 1):
        X_cases = all_case_data[d-nb_lookback_days:d]

        # Take negative of npis to support positive
        # weight constraint in Lasso.
        X_npis = -all_npi_data[d - nb_lookback_days:d]

        # Flatten all input data so it fits Lasso input format.
        X_sample = np.concatenate([X_cases.flatten(),
                                   X_npis.flatten()])
        # Target is day d + 1; the inputs above cover days
        # d - nb_lookback_days through d - 1.
        y_sample = all_case_data[d + 1]
        X_samples.append(X_sample)
        y_samples.append(y_sample)

X_samples = np.array(X_samples)
y_samples = np.array(y_samples).flatten()
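
To make the shape of the training samples concrete, the sketch below repeats the windowing loop above on a tiny made-up series. The helper name `make_windows` and the toy arrays are hypothetical. With the notebook's settings (30 lookback days, 1 case column, 12 NPI columns), each sample has 30 * (1 + 12) = 390 features.

```python
import numpy as np

def make_windows(cases, npis, lookback):
    """Build flattened lookback windows the same way as the training loop.

    cases: array of shape (n_days, 1); npis: array of shape (n_days, n_npis).
    """
    X, y = [], []
    for d in range(lookback, len(cases) - 1):
        X_cases = cases[d - lookback:d]
        X_npis = -npis[d - lookback:d]  # negated, as in the training cell
        X.append(np.concatenate([X_cases.flatten(), X_npis.flatten()]))
        y.append(cases[d + 1])
    return np.array(X), np.array(y).flatten()

# Hypothetical toy series: 6 days, 2 NPIs, lookback of 3 days.
toy_cases = np.arange(6, dtype=float).reshape(-1, 1)  # 0, 1, ..., 5
toy_npis = np.ones((6, 2))
X_toy, y_toy = make_windows(toy_cases, toy_npis, lookback=3)
print(X_toy.shape, y_toy)
```

Each row holds all case values first, followed by the negated NPI values ordered day by day, which is the ordering that any later inspection of the coefficients has to follow.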

In [15]:
# Helper function to compute the mean absolute error (MAE)
def mae(pred, true):
    return np.mean(np.abs(pred - true))

In [16]:
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X_samples,
                                                    y_samples,
                                                    test_size=0.2,
                                                    random_state=301)

In [17]:
# Create and train Lasso model.
# Set positive=True to enforce assumption that cases are positively correlated
# with future cases and npis are negatively correlated.
model = Lasso(alpha=0.1,
              precompute=True,
              max_iter=10000,
              positive=True,
              selection='random')
# Fit model
model.fit(X_train, y_train)

Lasso(alpha=0.1, max_iter=10000, positive=True, precompute=True,
      selection='random')

In [18]:
# Evaluate model
train_preds = model.predict(X_train)
train_preds = np.maximum(train_preds, 0) # Don't predict negative cases
print('Train MAE:', mae(train_preds, y_train))

test_preds = model.predict(X_test)
test_preds = np.maximum(test_preds, 0) # Don't predict negative cases
print('Test MAE:', mae(test_preds, y_test))

Train MAE: 140.71095231544
Test MAE: 152.49522615642294


In [19]:
# Inspect the learned feature coefficients for the model
# to see what features it's paying attention to.

# Give names to the features
x_col_names = []
for d in range(-nb_lookback_days, 0):
    x_col_names.append('Day ' + str(d) + ' ' + cases_col[0])
for d in range(-nb_lookback_days, 0):
    for col_name in npi_cols:
        x_col_names.append('Day ' + str(d) + ' ' + col_name)

# View non-zero coefficients
for (col, coeff) in zip(x_col_names, list(model.coef_)):
    if coeff != 0.:
        print(col, coeff)
print('Intercept', model.intercept_)

Day -7 NewCases 0.0011341938503356519
Day -6 NewCases 0.4396262751216602
Day -5 NewCases 0.2169845306198639
Day -4 NewCases 0.059135106380260934
Day -3 NewCases 0.0692819249543897
Day -2 NewCases 0.05212276342957982
Day -1 NewCases 0.2382854984072622
Day -26 C6_Stay at home requirements 4.3169695377140425
Day -22 C2_Workplace closing 9.718898082370853
Day -17 C2_Workplace closing 5.761062600416451
Intercept 26.55463592733912
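
The naming above relies on the order in which features were flattened: all case days first, then NPIs grouped by day. A minimal sketch of that index mapping follows; the helper `describe_feature` is hypothetical and is checked here against one of the printed coefficients.

```python
def describe_feature(index, lookback, case_name, npi_names):
    """Map a flat feature index to a (day offset, column name) pair."""
    if index < lookback:
        return index - lookback, case_name
    j = index - lookback
    day = j // len(npi_names) - lookback
    return day, npi_names[j % len(npi_names)]

# NPI columns in the order used for training.
npi_names = ['C1_School closing',
             'C2_Workplace closing',
             'C3_Cancel public events',
             'C4_Restrictions on gatherings',
             'C5_Close public transport',
             'C6_Stay at home requirements',
             'C7_Restrictions on internal movement',
             'C8_International travel controls',
             'H1_Public information campaigns',
             'H2_Testing policy',
             'H3_Contact tracing',
             'H6_Facial Coverings']

first = describe_feature(0, 30, 'NewCases', npi_names)
stay_home = describe_feature(83, 30, 'NewCases', npi_names)
last = describe_feature(389, 30, 'NewCases', npi_names)
print(first, stay_home, last)
```

Since the NPI inputs were negated before fitting, a positive NPI coefficient reduces the prediction. For example, the weight of about 4.32 on `Day -26 C6_Stay at home requirements` means each additional level of that NPI lowers the predicted case count by about 4.32.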


In [20]:
# Save model to file
os.makedirs('models', exist_ok=True)
with open('models/model.pkl', 'wb') as model_file:
    pickle.dump(model, model_file)
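
A saved model can be read back with `pickle.load`. The sketch below is a self-contained round trip on a tiny stand-in model written to a temporary directory; the demo data and names are hypothetical, and the real predictor would presumably be loaded from `models/model.pkl` in the same way.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import Lasso

# Hypothetical tiny model standing in for the trained predictor.
X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = np.array([0.0, 2.0, 4.0, 6.0])
demo_model = Lasso(alpha=0.1, positive=True).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'model.pkl')
    # Save the model, then load it back from the same file.
    with open(path, 'wb') as model_file:
        pickle.dump(demo_model, model_file)
    with open(path, 'rb') as model_file:
        restored = pickle.load(model_file)

# The restored model should give the same predictions as the original.
same = np.allclose(demo_model.predict(X_demo), restored.predict(X_demo))
print(same)
```

Pickled scikit-learn models are generally only safe to load with the same library version that wrote them, and only from trusted files.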

## Evaluation

Now that the predictor has been trained and saved, this section contains the functionality for evaluating it on sample evaluation data.
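
The rollout itself lives in the `predict` module, which is not reproduced in this notebook. As a rough sketch of what a day-by-day rollout can look like, the code below feeds each prediction back in as input for the next day. Everything in it is an assumption for illustration: the function `rollout`, its arguments, and the stand-in `MeanOfWindow` model are hypothetical and may differ from what `predict_df` actually does.

```python
import numpy as np

def rollout(model, past_cases, past_npis, future_npis, lookback):
    """Predict one day at a time, feeding each prediction back in as input.

    past_cases: (n_past, 1); past_npis: (n_past, n_npis);
    future_npis: (n_future, n_npis).
    """
    cases = past_cases.copy()
    npis = past_npis.copy()
    preds = []
    for day_npis in future_npis:
        # Same feature layout as training: cases first, then negated NPIs.
        X = np.concatenate([cases[-lookback:].flatten(),
                            -npis[-lookback:].flatten()])
        pred = max(float(model.predict(X.reshape(1, -1))[0]), 0.0)  # no negative cases
        preds.append(pred)
        # The prediction and that day's NPIs become history for the next step.
        cases = np.vstack([cases, [[pred]]])
        npis = np.vstack([npis, day_npis.reshape(1, -1)])
    return np.array(preds)


class MeanOfWindow:
    """Hypothetical stand-in model: predicts the mean of the case features."""
    def __init__(self, lookback):
        self.lookback = lookback

    def predict(self, X):
        return X[:, :self.lookback].mean(axis=1)


toy_preds = rollout(MeanOfWindow(lookback=2),
                    past_cases=np.array([[2.0], [4.0]]),
                    past_npis=np.zeros((2, 1)),
                    future_npis=np.zeros((3, 1)),
                    lookback=2)
print(toy_preds)
```

Because later predictions are built on earlier ones, errors can compound over the horizon, which is worth keeping in mind when reading long rollouts.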

In [21]:
# Reload the module to get the latest changes
import predict
from importlib import reload
reload(predict)
from predict import predict_df

In [22]:
list_countries = sorted(set(df.CountryName))
hist_ips_df = pd.read_csv("data/2020-09-30_historical_ip.csv",
                          parse_dates=['Date'],
                          encoding="ISO-8859-1",
                          dtype={"RegionName": str})
# Keep only the countries present in the training data
hist_ips_df = hist_ips_df[hist_ips_df.CountryName.isin(list_countries)]
hist_ips_df.to_csv("data/2020-09-30_historical_ip_new.csv", index=False)

In [23]:
%%time
preds_df = predict_df("2020-08-01", "2020-08-31", path_to_ips_file="data/2020-09-30_historical_ip_new.csv", verbose=True)


Predicting for Aruba__nan
2020-08-01: 58.83198688213354
2020-08-02: 71.32197184134634
2020-08-03: 78.9927614946973
2020-08-04: 90.2397026313416
2020-08-05: 87.81590149273288
2020-08-06: 98.76130759495216
2020-08-07: 131.0422080417786
2020-08-08: 147.02404189275614
2020-08-09: 158.95669693079518
2020-08-10: 169.94405725377925
2020-08-11: 177.52250888643076
2020-08-12: 193.48643543499207
2020-08-13: 216.82403871908235
2020-08-14: 234.04377199174866
2020-08-15: 248.5656916034551
2020-08-16: 261.97277012020345
2020-08-17: 275.3055663763646
2020-08-18: 293.296445555375
2020-08-19: 314.0803223521228
2020-08-20: 332.4348524108193
2020-08-21: 349.2395394944447
2020-08-22: 365.5079663044676
2020-08-23: 382.54149341918657
2020-08-24: 407.8931769631052
2020-08-25: 430.0829974725612
2020-08-26: 450.573124179063
2020-08-27: 465.9975388732862
2020-08-28: 484.6445827649952
2020-08-29: 515.3503177117129
2020-08-30: 541.8987368026883
2020-08-31: 566.2593577211017

Predicting for Afghanistan__nan
2020-

2020-08-09: 153.01795709370958
2020-08-10: 162.67827315786826
2020-08-11: 172.8220597431478
2020-08-12: 189.41066560168275
2020-08-13: 212.05385088007313
2020-08-14: 228.47491162321015
2020-08-15: 242.2367474053045
2020-08-16: 255.3820111962556
2020-08-17: 269.77823434854713
2020-08-18: 282.274771032547
2020-08-19: 301.2638922939456
2020-08-20: 318.4457401752333
2020-08-21: 334.16780530412547
2020-08-22: 349.78266739094914
2020-08-23: 355.9727795705515
2020-08-24: 369.9974681835229
2020-08-25: 387.7640328903412
2020-08-26: 405.0673766120246
2020-08-27: 413.13983023148165
2020-08-28: 426.2512243238305
2020-08-29: 437.8278613618101
2020-08-30: 452.8800304838348
2020-08-31: 470.0370117388652

Predicting for Belgium__nan
2020-08-01: 2433.1124185607378
2020-08-02: 2510.012027152879
2020-08-03: 2060.7967211251243
2020-08-04: 1079.545251177831
2020-08-05: 735.0282988353044
2020-08-06: 1101.2409526446131
2020-08-07: 2162.3423285820923
2020-08-08: 2291.4741439288696
2020-08-09: 1972.23476759376

2020-08-31: 2496.092432181765

Predicting for Belize__nan
2020-08-01: 90.22869087017813
2020-08-02: 115.08568856356143
2020-08-03: 115.60263255812761
2020-08-04: 130.0875570687213
2020-08-05: 133.15504855026578
2020-08-06: 123.65377939448149
2020-08-07: 167.56145496443642
2020-08-08: 189.74016916648264
2020-08-09: 200.23521410177878
2020-08-10: 213.40633463201615
2020-08-11: 220.5282716134019
2020-08-12: 230.3042565916966
2020-08-13: 258.64272273387394
2020-08-14: 279.25462445851883
2020-08-15: 294.2386583388187
2020-08-16: 308.77255722013433
2020-08-17: 321.3877809930056
2020-08-18: 343.6242914558824
2020-08-19: 368.4153746834938
2020-08-20: 389.56020777469314
2020-08-21: 407.94186960187153
2020-08-22: 425.60039960063534
2020-08-23: 453.80359728034324
2020-08-24: 484.898747081099
2020-08-25: 511.60085120014804
2020-08-26: 535.895029768342
2020-08-27: 554.5176007677144
2020-08-28: 577.813817013023
2020-08-29: 616.4827661037602
2020-08-30: 649.1343017828367
2020-08-31: 678.6911515770278


Predicting for Canada__nan
2020-08-01: 3992.4769427906185
2020-08-02: 4753.207440203256
2020-08-03: 4947.2783400808585
2020-08-04: 4993.928926767341
2020-08-05: 4176.273113532416
2020-08-06: 2801.6904187451914
2020-08-07: 4361.066721297862
2020-08-08: 4988.182091718657
2020-08-09: 5171.611632161874
2020-08-10: 5118.067990643904
2020-08-11: 4592.858565928219
2020-08-12: 4247.841304547654
2020-08-13: 4965.448933159265
2020-08-14: 5396.173744741371
2020-08-15: 5550.984712680012
2020-08-16: 5502.345626689613
2020-08-17: 5265.282060397853
2020-08-18: 5251.651080484551
2020-08-19: 5650.380034387745
2020-08-20: 5949.14381571373
2020-08-21: 6084.148152822817
2020-08-22: 6086.062201277233
2020-08-23: 6040.31954691102
2020-08-24: 6136.796891497732
2020-08-25: 6405.621298354796
2020-08-26: 6632.741467219263
2020-08-27: 6760.640579493345
2020-08-28: 6818.354223150807
2020-08-29: 6871.231874537928
2020-08-30: 7009.824751031588
2020-08-31: 7224.740652830176

Predicting for Switzerland__nan
2020-08-

2020-08-24: 9349.548353078793
2020-08-25: 9755.225692925214
2020-08-26: 10096.65991656533
2020-08-27: 10287.70432144363
2020-08-28: 10373.664615355166
2020-08-29: 10455.092861333746
2020-08-30: 10660.93665791952
2020-08-31: 10984.072084179235

Predicting for Comoros__nan
2020-08-01: 51.176258242195956
2020-08-02: 63.53926258805763
2020-08-03: 69.34554085767859
2020-08-04: 75.80850918698383
2020-08-05: 79.78005797468789
2020-08-06: 93.29656553697077
2020-08-07: 122.69649162672526
2020-08-08: 137.8169481712178
2020-08-09: 148.09262539096113
2020-08-10: 157.87511461123816
2020-08-11: 168.2140792239981
2020-08-12: 185.1197089767316
2020-08-13: 207.19358924592393
2020-08-14: 223.53979306148537
2020-08-15: 237.02531442911095
2020-08-16: 250.17545434487712
2020-08-17: 264.67431822994973
2020-08-18: 277.18805962556405
2020-08-19: 295.90446371455295
2020-08-20: 312.93611257530307
2020-08-21: 328.4949323938483
2020-08-22: 344.0692914640022
2020-08-23: 350.26357257890015
2020-08-24: 364.215456125

2020-08-07: 123.54575456312378
2020-08-08: 138.83262509537263
2020-08-09: 148.36176082430762
2020-08-10: 158.31062786188448
2020-08-11: 169.57012768299037
2020-08-12: 186.0449288460255
2020-08-13: 208.12537963704446
2020-08-14: 224.43563370296042
2020-08-15: 237.6456094572676
2020-08-16: 250.97523441026726
2020-08-17: 265.81179968887415
2020-08-18: 278.2072156615832
2020-08-19: 296.90376648911626
2020-08-20: 313.88294148906084
2020-08-21: 329.3577625464051
2020-08-22: 345.05286798319105
2020-08-23: 351.3897246340885
2020-08-24: 365.3170090575027
2020-08-25: 382.8862378745369
2020-08-26: 400.00931398403094
2020-08-27: 407.9389428590118
2020-08-28: 421.05431396498255
2020-08-29: 432.64376401572537
2020-08-30: 447.59316442839173
2020-08-31: 464.5921750180028

Predicting for Denmark__nan
2020-08-01: 1092.9244843269234
2020-08-02: 1289.635733280033
2020-08-03: 1295.503698651091
2020-08-04: 1247.814351473752
2020-08-05: 944.6346475159082
2020-08-06: 744.9303130423514
2020-08-07: 1200.7816484

2020-08-13: 206.22710592989824
2020-08-14: 222.41375305767554
2020-08-15: 235.70865236223483
2020-08-16: 248.82617483838965
2020-08-17: 263.5598535165909
2020-08-18: 287.6421221559115
2020-08-19: 309.018047027592
2020-08-20: 327.18907883963544
2020-08-21: 343.88233338372123
2020-08-22: 360.6911578862052
2020-08-23: 389.5261151502965
2020-08-24: 414.7464303353874
2020-08-25: 437.7782934645083
2020-08-26: 459.2076234167528
2020-08-27: 479.9735342697934
2020-08-28: 502.7910773763712
2020-08-29: 530.3252881458034
2020-08-30: 556.8993083311427
2020-08-31: 582.2793588105169

Predicting for France__nan
2020-08-01: 20447.173500274293
2020-08-02: 21187.198160549447
2020-08-03: 20906.49141031119
2020-08-04: 18295.112637271377
2020-08-05: 13981.311749877837
2020-08-06: 11488.796321323854
2020-08-07: 19607.263883048894
2020-08-08: 21246.183910272615
2020-08-09: 20942.89287976307
2020-08-10: 19286.73880205696
2020-08-11: 17029.667597003478
2020-08-12: 17142.335969771924
2020-08-13: 20840.7466818060


Predicting for Ghana__nan
2020-08-01: 105.35122147920472
2020-08-02: 166.63700672511294
2020-08-03: 148.79150535090554
2020-08-04: 208.9370952519472
2020-08-05: 126.26113738681485
2020-08-06: 134.66735689476482
2020-08-07: 195.08607675418094
2020-08-08: 230.94003572225103
2020-08-09: 243.6008736098869
2020-08-10: 261.65154048186287
2020-08-11: 238.2152821669092
2020-08-12: 253.2807591872667
2020-08-13: 291.9993117823657
2020-08-14: 320.0324296320132
2020-08-15: 337.91171048651603
2020-08-16: 350.07124362237477
2020-08-17: 351.1186691254156
2020-08-18: 369.8964215495792
2020-08-19: 399.4467445394683
2020-08-20: 424.5060661577431
2020-08-21: 443.9109092704129
2020-08-22: 458.59189468366
2020-08-23: 471.13396302618656
2020-08-24: 492.38240439643124
2020-08-25: 518.7138426662729
2020-08-26: 543.0936731021129
2020-08-27: 555.6002186212947
2020-08-28: 572.1295033573149
2020-08-29: 590.1073188565047
2020-08-30: 612.6300180374672
2020-08-31: 637.7087986662921

Predicting for Guinea__nan
2020-

2020-08-15: 289.77509132708707
2020-08-16: 305.12756530728836
2020-08-17: 316.2340586911032
2020-08-18: 336.4379896817336
2020-08-19: 359.86403445981256
2020-08-20: 381.99270122369387
2020-08-21: 402.1277723727663
2020-08-22: 420.07735437699694
2020-08-23: 447.3253508639289
2020-08-24: 471.43517782649195
2020-08-25: 496.15791035525825
2020-08-26: 520.3788058750467
2020-08-27: 539.1754161825995
2020-08-28: 561.8817502057311
2020-08-29: 588.642876274004
2020-08-30: 614.9324204364991
2020-08-31: 641.4280471687082

Predicting for Honduras__nan
2020-08-01: 116.56093288609515
2020-08-02: 134.44426141524025
2020-08-03: 294.2644819518454
2020-08-04: 552.1967492082563
2020-08-05: 214.86837324020632
2020-08-06: 184.28007838785845
2020-08-07: 241.85352828763178
2020-08-08: 288.53289220803896
2020-08-09: 406.8383930149243
2020-08-10: 480.0219227216415
2020-08-11: 355.6224954471924
2020-08-12: 339.41379980799866
2020-08-13: 376.53857309843426
2020-08-14: 426.5063481271167
2020-08-15: 499.8114910154

2020-08-01: 1967.9971733389168
2020-08-02: 2345.7095623228734
2020-08-03: 2324.8237223480637
2020-08-04: 2034.503275542834
2020-08-05: 1713.936067749558
2020-08-06: 1293.9285551821838
2020-08-07: 2100.9269870613894
2020-08-08: 2395.7058914043087
2020-08-09: 2388.206102156125
2020-08-10: 2235.6426077648116
2020-08-11: 2034.6486376460227
2020-08-12: 1985.8110927278415
2020-08-13: 2370.9478663302457
2020-08-14: 2566.1082250678032
2020-08-15: 2581.350456219601
2020-08-16: 2508.2577504702754
2020-08-17: 2428.7992165223222
2020-08-18: 2486.2839965992284
2020-08-19: 2703.2851680938124
2020-08-20: 2837.704050570463
2020-08-21: 2871.3909375997528
2020-08-22: 2855.500486325971
2020-08-23: 2862.7923076583743
2020-08-24: 2946.2519517017636
2020-08-25: 3092.0418347791647
2020-08-26: 3197.3468322787626
2020-08-27: 3247.7660207283566
2020-08-28: 3274.9395283032104
2020-08-29: 3321.256706797624
2020-08-30: 3411.76360984816
2020-08-31: 3527.6457267905075

Predicting for Iceland__nan
2020-08-01: 59.4474

2020-08-07: 427.27521610130634
2020-08-08: 488.6033790448657
2020-08-09: 502.95996443414873
2020-08-10: 492.1342138456594
2020-08-11: 462.50573948405014
2020-08-12: 466.41221413646036
2020-08-13: 543.9317207738525
2020-08-14: 590.1579956579917
2020-08-15: 607.7642123877229
2020-08-16: 608.7987482836552
2020-08-17: 605.5596276816798
2020-08-18: 633.0604841823315
2020-08-19: 684.6719569107629
2020-08-20: 722.4708917915459
2020-08-21: 743.8987922693987
2020-08-22: 755.9488823720471
2020-08-23: 779.8712877465568
2020-08-24: 813.2048607580684
2020-08-25: 855.4193992597652
2020-08-26: 890.9113884885496
2020-08-27: 913.0538743229304
2020-08-28: 935.5885685498388
2020-08-29: 964.8311563039854
2020-08-30: 1000.4480570451526
2020-08-31: 1039.6275373977733

Predicting for Kuwait__nan
2020-08-01: 430.28501272766704
2020-08-02: 519.8854304894571
2020-08-03: 522.6627065551525
2020-08-04: 489.8213487533082
2020-08-05: 398.1349250619433
2020-08-06: 331.7541176099412
2020-08-07: 517.2877976364778
2020-


Predicting for Luxembourg__nan
2020-08-01: 536.585849961027
2020-08-02: 786.7885670523367
2020-08-03: 708.059601224785
2020-08-04: 650.398279255413
2020-08-05: 580.4115743897105
2020-08-06: 435.53376386128554
2020-08-07: 678.25278190025
2020-08-08: 813.8004634143169
2020-08-09: 797.7354528945962
2020-08-10: 768.5964312382275
2020-08-11: 722.2902542479007
2020-08-12: 705.5347706858167
2020-08-13: 832.1128507376973
2020-08-14: 912.8496632758056
2020-08-15: 921.5549210314643
2020-08-16: 912.7399947679837
2020-08-17: 900.1459356364808
2020-08-18: 916.3487443548281
2020-08-19: 992.6039580759004
2020-08-20: 1047.7516870966977
2020-08-21: 1067.250963008317
2020-08-22: 1074.4149567744287
2020-08-23: 1073.7186733661308
2020-08-24: 1102.1933785676047
2020-08-25: 1156.099999496153
2020-08-26: 1199.3666696495332
2020-08-27: 1215.9733383578323
2020-08-28: 1230.6247674123035
2020-08-29: 1247.0475393283011
2020-08-30: 1279.6479835095113
2020-08-31: 1323.3885740973362

Predicting for Latvia__nan
2020

2020-08-15: 1871.2661886942012
2020-08-16: 1855.157408674477
2020-08-17: 1800.228263469763
2020-08-18: 1828.012546419676
2020-08-19: 1973.2024872091959
2020-08-20: 2071.304434586924
2020-08-21: 2113.2253549255333
2020-08-22: 2121.0696540025697
2020-08-23: 2132.0874769770594
2020-08-24: 2187.4837337866134
2020-08-25: 2289.42803467648
2020-08-26: 2370.2235571259002
2020-08-27: 2419.5219728176503
2020-08-28: 2451.705920798327
2020-08-29: 2490.443367155062
2020-08-30: 2556.031278787336
2020-08-31: 2641.235664556258

Predicting for Mongolia__nan
2020-08-01: 80.73691645663186
2020-08-02: 89.79462728100367
2020-08-03: 116.76810180465105
2020-08-04: 108.98172350871792
2020-08-05: 107.82117352706697
2020-08-06: 112.99607617808292
2020-08-07: 152.64747214712713
2020-08-08: 171.75110727291138
2020-08-09: 188.83886248454934
2020-08-10: 193.31517973372974
2020-08-11: 199.54464267164326
2020-08-12: 214.45336587794952
2020-08-13: 241.23406749566539
2020-08-14: 261.24011172659266
2020-08-15: 277.30961

2020-08-12: 4854.8608029401785
2020-08-13: 5685.559638667002
2020-08-14: 6244.432154790977
2020-08-15: 6439.535838139589
2020-08-16: 6358.759102298045
2020-08-17: 6048.869215893498
2020-08-18: 6018.616047206127
2020-08-19: 6487.184949020319
2020-08-20: 6859.986433047747
2020-08-21: 7021.700630566972
2020-08-22: 7007.808951781306
2020-08-23: 6933.292108873218
2020-08-24: 7036.081919606531
2020-08-25: 7352.14480415089
2020-08-26: 7626.345793882395
2020-08-27: 7775.058289366819
2020-08-28: 7830.66963129996
2020-08-29: 7878.888379440448
2020-08-30: 8033.480159059929
2020-08-31: 8284.040928037437

Predicting for Norway__nan
2020-08-01: 498.12897245083485
2020-08-02: 603.3842107688477
2020-08-03: 515.7026029716387
2020-08-04: 240.2742839062932
2020-08-05: 206.062787341444
2020-08-06: 291.79040913390924
2020-08-07: 527.9969383119519
2020-08-08: 597.9045587727574
2020-08-09: 532.5990988479875
2020-08-10: 413.5075487689799
2020-08-11: 403.7861969781137
2020-08-12: 483.7743411670156
2020-08-13: 


Predicting for Papua New Guinea__nan
2020-08-01: 51.54778823579783
2020-08-02: 62.95389448481427
2020-08-03: 68.35634898107924
2020-08-04: 73.80952331713175
2020-08-05: 79.22904747750482
2020-08-06: 93.03854239728444
2020-08-07: 122.44561723268052
2020-08-08: 137.1157731053379
2020-08-09: 146.99272119837963
2020-08-10: 156.54434752224694
2020-08-11: 167.4757394768707
2020-08-12: 184.58824795494544
2020-08-13: 206.60849666027786
2020-08-14: 222.6956223960935
2020-08-15: 235.94008379303034
2020-08-16: 249.05439878288772
2020-08-17: 263.8161147249175
2020-08-18: 282.1996475827654
2020-08-19: 302.23214216502396
2020-08-20: 319.733173798013
2020-08-21: 341.5416722376764
2020-08-22: 359.1090084882122
2020-08-23: 377.32235264339545
2020-08-24: 403.3302774200036
2020-08-25: 425.60877991233536
2020-08-26: 456.74136969341924
2020-08-27: 472.9852982504504
2020-08-28: 493.25999800800713
2020-08-29: 520.0403529990626
2020-08-30: 559.6843598380226
2020-08-31: 589.4710780195564

Predicting for Polan

2020-08-06: 242.39408489792086
2020-08-07: 369.4815898241797
2020-08-08: 417.57579940388536
2020-08-09: 419.98044148961986
2020-08-10: 413.1181851261385
2020-08-11: 399.4730651602151
2020-08-12: 408.23689477602005
2020-08-13: 475.54054465439566
2020-08-14: 512.4930532038284
2020-08-15: 524.2293102303565
2020-08-16: 528.1582525384704
2020-08-17: 532.1414171450949
2020-08-18: 554.73482027823
2020-08-19: 598.9088591475318
2020-08-20: 629.9890376533729
2020-08-21: 647.5523237509674
2020-08-22: 660.358773232836
2020-08-23: 675.7493214947137
2020-08-24: 702.661175858582
2020-08-25: 737.9915520673023
2020-08-26: 767.1613125716208
2020-08-27: 784.9465520012823
2020-08-28: 803.733599757853
2020-08-29: 825.867569825165
2020-08-30: 854.5929185281811
2020-08-31: 886.8368428471651

Predicting for Sudan__nan
2020-08-01: 95.81422157795203
2020-08-02: 199.71131832906136
2020-08-03: 333.86318747092577
2020-08-04: 214.86702681155344
2020-08-05: 274.616991966388
2020-08-06: 183.38654984436266
2020-08-07:


Predicting for Serbia__nan
2020-08-01: 4879.731472908307
2020-08-02: 5950.011500484702
2020-08-03: 6027.41513178083
2020-08-04: 5764.455477354759
2020-08-05: 4641.693984695596
2020-08-06: 3291.1132767855865
2020-08-07: 5268.954051080902
2020-08-08: 6069.364776302269
2020-08-09: 6181.400518059223
2020-08-10: 5947.840453616783
2020-08-11: 5283.491089044169
2020-08-12: 5002.243555842664
2020-08-13: 5932.698074673025
2020-08-14: 6458.348356100218
2020-08-15: 6572.812071125261
2020-08-16: 6428.6119311890125
2020-08-17: 6138.3011815215505
2020-08-18: 6183.880875148669
2020-08-19: 6699.179661973694
2020-08-20: 7052.684898133458
2020-08-21: 7169.397404441277
2020-08-22: 7127.772935780966
2020-08-23: 7070.718780788816
2020-08-24: 7215.4652851987175
2020-08-25: 7554.296191258438
2020-08-26: 7817.48442864722
2020-08-27: 7942.875577599416
2020-08-28: 7987.960321055498
2020-08-29: 8049.788120662359
2020-08-30: 8228.212455921968
2020-08-31: 8490.720515971478

Predicting for South Sudan__nan
2020-08

2020-08-15: 321.00542183313195
2020-08-16: 332.2191520130828
2020-08-17: 343.03023226810876
2020-08-18: 367.2100852696827
2020-08-19: 394.9734334944364
2020-08-20: 417.55100997629523
2020-08-21: 435.7949839577944
2020-08-22: 451.96526762525616
2020-08-23: 479.70645020115086
2020-08-24: 506.4252942629584
2020-08-25: 533.5690289964678
2020-08-26: 558.2236863182975
2020-08-27: 576.2428203694622
2020-08-28: 598.4311607170287
2020-08-29: 625.982466113365
2020-08-30: 654.0779566524523
2020-08-31: 682.1246517206998

Predicting for Chad__nan
2020-08-01: 56.36804396208744
2020-08-02: 68.20719203956283
2020-08-03: 74.55278075397833
2020-08-04: 80.93086599617712
2020-08-05: 85.86251604966907
2020-08-06: 96.78646952364426
2020-08-07: 127.80325098442242
2020-08-08: 143.1279572620756
2020-08-09: 153.6318203245981
2020-08-10: 163.609658843031
2020-08-11: 173.97627510380295
2020-08-12: 190.13873012047134
2020-08-13: 212.81618255483548
2020-08-14: 229.4220804793673
2020-08-15: 243.09402556987004
2020-0

2020-08-30: 2248.828408015204
2020-08-31: 2317.225708128838

Predicting for Turkey__nan
2020-08-01: 3892.2252339751426
2020-08-02: 4830.969344857722
2020-08-03: 5209.430757793483
2020-08-04: 5556.860935018924
2020-08-05: 4862.6895068191625
2020-08-06: 2996.992263147622
2020-08-07: 4470.68550723244
2020-08-08: 5196.283114759212
2020-08-09: 5518.519741936925
2020-08-10: 5627.440531223801
2020-08-11: 5097.999058223493
2020-08-12: 4541.523333157141
2020-08-13: 5211.130689704973
2020-08-14: 5702.027268432146
2020-08-15: 5950.161085741976
2020-08-16: 5981.728407906892
2020-08-17: 5722.411956065525
2020-08-18: 5608.5389905636575
2020-08-19: 5985.011689005571
2020-08-20: 6323.096792748718
2020-08-21: 6516.548861680047
2020-08-22: 6557.508075060547
2020-08-23: 6484.362063608685
2020-08-24: 6539.5569190774895
2020-08-25: 6801.911395924122
2020-08-26: 7055.692196784034
2020-08-27: 7223.655212614802
2020-08-28: 7300.701102624043
2020-08-29: 7350.496374784141
2020-08-30: 7474.130876415177
2020-08-3

2020-08-01: 10837.90874357248
2020-08-02: 13066.51886721358
2020-08-03: 13420.961677746718
2020-08-04: 11665.779522133143
2020-08-05: 11107.157225713898
2020-08-06: 7375.107066767487
2020-08-07: 11588.724899557219
2020-08-08: 13324.705309569183
2020-08-09: 13443.894536559197
2020-08-10: 12741.577485300058
2020-08-11: 11892.509883688543
2020-08-12: 11037.446718582843
2020-08-13: 12972.697345467192
2020-08-14: 14082.738498414928
2020-08-15: 14240.641411906043
2020-08-16: 13926.78305571738
2020-08-17: 13491.96806640223
2020-08-18: 13497.390284562196
2020-08-19: 14554.290431884097
2020-08-20: 15282.192860738909
2020-08-21: 15487.967704640276
2020-08-22: 15416.336458220125
2020-08-23: 15342.307073776605
2020-08-24: 15609.455958818186
2020-08-25: 16299.05160624397
2020-08-26: 16833.78568830543
2020-08-27: 17082.70931851505
2020-08-28: 17186.149598772983
2020-08-29: 17326.939892687642
2020-08-30: 17681.740529831728
2020-08-31: 18215.005811001804

Predicting for United States__Colorado
2020-08

2020-08-25: 4622.8318615158205
2020-08-26: 4754.205279645711
2020-08-27: 4805.914764438558
2020-08-28: 4843.353461441033
2020-08-29: 4906.326956243473
2020-08-30: 5031.394312872202
2020-08-31: 5189.662365349226

Predicting for United States__Idaho
2020-08-01: 1312.9489152421236
2020-08-02: 1530.6110368637972
2020-08-03: 1218.938738219569
2020-08-04: 1184.9415428872749
2020-08-05: 1212.914292181588
2020-08-06: 862.9335593828034
2020-08-07: 1383.0163951208594
2020-08-08: 1518.1812669825076
2020-08-09: 1410.7547569488509
2020-08-10: 1398.3080329567463
2020-08-11: 1366.1810200198627
2020-08-12: 1317.4480653415137
2020-08-13: 1554.5199692682165
2020-08-14: 1642.2108231811737
2020-08-15: 1620.412017687068
2020-08-16: 1620.766523270693
2020-08-17: 1615.0971529243818
2020-08-18: 1653.181319560498
2020-08-19: 1783.8915313045884
2020-08-20: 1850.7411954847628
2020-08-21: 1866.3798170265857
2020-08-22: 1883.7995895473464
2020-08-23: 1916.6172036033925
2020-08-24: 1975.4802317110823
2020-08-25: 20

2020-08-17: 7297.548681078331
2020-08-18: 7324.0481308713715
2020-08-19: 7890.571959786739
2020-08-20: 8250.823197573276
2020-08-21: 8402.766151616652
2020-08-22: 8403.418660497344
2020-08-23: 8335.337541492587
2020-08-24: 8485.194006872438
2020-08-25: 8853.642185100563
2020-08-26: 9136.558499367646
2020-08-27: 9292.565576174744
2020-08-28: 9364.561400562743
2020-08-29: 9433.824716604777
2020-08-30: 9627.371716142201
2020-08-31: 9914.852789321158

Predicting for United States__Missouri
2020-08-01: 3781.7327673084064
2020-08-02: 4617.4732018421
2020-08-03: 4650.722123966218
2020-08-04: 4320.157546523404
2020-08-05: 3424.114976063029
2020-08-06: 2511.614395629151
2020-08-07: 4066.4232824318974
2020-08-08: 4686.6392895048375
2020-08-09: 4743.098927966108
2020-08-10: 4502.924607274419
2020-08-11: 3991.2536376735984
2020-08-12: 3832.593447665225
2020-08-13: 4571.893493835906
2020-08-14: 4976.813458328217
2020-08-15: 5043.99484818622
2020-08-16: 4906.398898791018
2020-08-17: 4689.24246702266

2020-08-24: 2962.9268839634583
2020-08-25: 3086.490345956181
2020-08-26: 3192.8047850735843
2020-08-27: 3262.1954345577283
2020-08-28: 3307.7800145114484
2020-08-29: 3349.5639007235613
2020-08-30: 3422.8150346556445
2020-08-31: 3527.158915988391

Predicting for United States__New York
2020-08-01: 4612.109780908175
2020-08-02: 5583.591463917848
2020-08-03: 5772.658373718101
2020-08-04: 5695.087800975167
2020-08-05: 4970.739064813391
2020-08-06: 3269.5494696536675
2020-08-07: 5063.934446041294
2020-08-08: 5821.412737686051
2020-08-09: 6002.130019673061
2020-08-10: 5917.332512722405
2020-08-11: 5377.474138638172
2020-08-12: 4942.3724220066615
2020-08-13: 5766.657259559324
2020-08-14: 6272.232136961216
2020-08-15: 6435.505480837312
2020-08-16: 6377.926205055021
2020-08-17: 6124.645084117834
2020-08-18: 6095.223460868105
2020-08-19: 6552.262696803313
2020-08-20: 6897.3089112342805
2020-08-21: 7046.193235444121
2020-08-22: 7049.493416219819
2020-08-23: 7000.893112775637
2020-08-24: 7106.1523

2020-08-17: 10589.388771387192
2020-08-18: 11002.723671864122
2020-08-19: 12089.314487946702
2020-08-20: 12603.471625552378
2020-08-21: 12587.374320551371
2020-08-22: 12367.631955826764
2020-08-23: 12312.682067763115
2020-08-24: 12734.393465980895
2020-08-25: 13405.567766933656
2020-08-26: 13794.454636748218
2020-08-27: 13889.58039984381
2020-08-28: 13895.409493552183
2020-08-29: 14035.487699067242
2020-08-30: 14429.725022959214
2020-08-31: 14916.923041564936

Predicting for United States__Utah
2020-08-01: 3333.335212542765
2020-08-02: 3947.645865770876
2020-08-03: 3489.211549593377
2020-08-04: 3215.0482168341464
2020-08-05: 2459.400266718631
2020-08-06: 2005.2962100895813
2020-08-07: 3407.771734753435
2020-08-08: 3824.1013310453227
2020-08-09: 3659.9287857823833
2020-08-10: 3427.8085365117818
2020-08-11: 3044.672972120006
2020-08-12: 3058.3464466004407
2020-08-13: 3722.231585141895
2020-08-14: 3989.8648665683245
2020-08-15: 3944.463217299744
2020-08-16: 3809.03141694199
2020-08-17: [...]

2020-08-01: 219.36374253784555
2020-08-02: 265.5160873273324
2020-08-03: 237.42301898751788
2020-08-04: 209.64120598669763
2020-08-05: 292.51975739555985
2020-08-06: 211.43003874612788
2020-08-07: 298.91151635452087
2020-08-08: 334.07239192622154
2020-08-09: 327.96714626119086
2020-08-10: 335.3485584071662
2020-08-11: 363.207323573862
2020-08-12: 355.3136186006516
2020-08-13: 401.0316538585378
2020-08-14: 428.11288827593125
2020-08-15: 437.00694910719693
2020-08-16: 452.52152128276157
2020-08-17: 471.8047880474
2020-08-18: 491.66850873534247
2020-08-19: 524.9737786233234
2020-08-20: 550.0860801311585
2020-08-21: 567.6296553177121
2020-08-22: 587.6159256229962
2020-08-23: 619.5261683539734
2020-08-24: 646.853340219496
2020-08-25: 677.5637694928216
2020-08-26: 704.5831670468356
2020-08-27: 728.4803150789646
2020-08-28: 755.0570738705242
2020-08-29: 786.3043588697803
2020-08-30: 817.1024464851166
2020-08-31: 848.7192256205387

Predicting for Venezuela__nan
2020-08-01: 318.2829377241056
[...]

In [24]:
# Check the predictions
preds_df.head()

Unnamed: 0,CountryName,RegionName,Date,PredictedDailyNewCases
213,Aruba,,2020-08-01,58.831987
214,Aruba,,2020-08-02,71.321972
215,Aruba,,2020-08-03,78.992761
216,Aruba,,2020-08-04,90.239703
217,Aruba,,2020-08-05,87.815901


# Validation
The command below shows exactly how the predictor will be called during the competition.  
!!! PLEASE DO NOT CHANGE THE API !!!

In [25]:
!python3 predict.py -s 2020-08-01 -e 2020-08-04 -ip data/2020-09-30_historical_ip_new.csv -o predictions/2020-08-01_2020-08-04.csv

Generating predictions from 2020-08-01 to 2020-08-04...
Saved predictions to predictions/2020-08-01_2020-08-04.csv
Done!
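The call above passes four options to `predict.py`: `-s`, `-e`, `-ip` and `-o`. As a rough illustration of how a script could accept them, here is a minimal `argparse` sketch. Only the short flags come from the command above; the long option names, the `dest` names and the help texts are assumptions, and this is not necessarily how `predict.py` itself is written.

```python
import argparse


def parse_args(argv=None):
    """Parse the four options seen in the predict.py call (sketch only)."""
    parser = argparse.ArgumentParser(
        description="Generate daily new case predictions")
    # Short flags match the command in this notebook.
    # Long names and dest names are assumptions made for this sketch.
    parser.add_argument("-s", "--start_date", dest="start_date", required=True,
                        help="First day to predict, formatted YYYY-MM-DD")
    parser.add_argument("-e", "--end_date", dest="end_date", required=True,
                        help="Last day to predict, formatted YYYY-MM-DD")
    parser.add_argument("-ip", "--interventions_plan", dest="ip_file",
                        required=True,
                        help="Path to the intervention plan CSV file")
    parser.add_argument("-o", "--output_file", dest="output_file",
                        required=True,
                        help="Path where the predictions CSV is written")
    return parser.parse_args(argv)


# Same arguments as the notebook cell above
args = parse_args(["-s", "2020-08-01",
                   "-e", "2020-08-04",
                   "-ip", "data/2020-09-30_historical_ip_new.csv",
                   "-o", "predictions/2020-08-01_2020-08-04.csv"])
print(args.start_date, args.end_date, args.ip_file, args.output_file)
```

Keeping the flags stable matters because the competition harness calls the script with exactly these options.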


In [26]:
!head predictions/2020-08-01_2020-08-04.csv

CountryName,RegionName,Date,PredictedDailyNewCases
Aruba,,2020-08-01,58.83198688213354
Aruba,,2020-08-02,71.32197184134634
Aruba,,2020-08-03,78.9927614946973
Aruba,,2020-08-04,90.2397026313416
Afghanistan,,2020-08-01,149.81626707248535
Afghanistan,,2020-08-02,287.24845171362017
Afghanistan,,2020-08-03,277.5553170326754
Afghanistan,,2020-08-04,264.8033662649353
Angola,,2020-08-01,198.70118920042884


# Test cases
The predictor can now generate a prediction file. The cases below check that its output is valid for several date ranges and scenarios.

In [27]:
import os
from predictor_validation import validate_submission

def validate(start_date, end_date, ip_file, output_file):
    # First, delete any potential old file
    try:
        os.remove(output_file)
    except OSError:
        pass
    
    # Then generate the prediction, calling the official API
    !python3 predict.py -s {start_date} -e {end_date} -ip {ip_file} -o {output_file}
    
    # And validate it
    errors = validate_submission(start_date, end_date, ip_file, output_file)
    if errors:
        for error in errors:
            print(error)
    else:
        print("All good!")

## 4 days, no gap
- All countries and regions
- Official number of cases is known up to start_date
- Intervention Plans are the official ones

In [28]:
validate(start_date="2020-08-01",
         end_date="2020-08-04",
         ip_file="data/2020-09-30_historical_ip_new.csv",
         output_file="predictions/val_4_days.csv")

Generating predictions from 2020-08-01 to 2020-08-04...
Saved predictions to predictions/val_4_days.csv
Done!
All good!
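To give a feel for what validating a prediction file can involve, here is a small standard-library sketch. It is not the implementation of `validate_submission`: the two checks (every requested day is present for each country and region, and no prediction is negative) and the helper name `check_predictions` are assumptions chosen for illustration. The sample rows are the Aruba rows from the prediction file shown earlier.

```python
import csv
import io
from datetime import date, timedelta


def check_predictions(csv_text, start_date, end_date):
    """Return a list of problems found in a predictions CSV (empty if none)."""
    n_days = (end_date - start_date).days + 1
    expected_days = {start_date + timedelta(days=i) for i in range(n_days)}
    seen = {}  # (country, region) -> set of predicted dates
    problems = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["CountryName"], row["RegionName"])
        seen.setdefault(key, set()).add(date.fromisoformat(row["Date"]))
        # Assumed check: daily new cases should never be negative
        if float(row["PredictedDailyNewCases"]) < 0:
            problems.append(f"{key} {row['Date']}: negative prediction")
    # Assumed check: each country/region covers every requested day
    for key, days in seen.items():
        missing = expected_days - days
        if missing:
            problems.append(f"{key}: missing {len(missing)} day(s)")
    return problems


sample = """CountryName,RegionName,Date,PredictedDailyNewCases
Aruba,,2020-08-01,58.83198688213354
Aruba,,2020-08-02,71.32197184134634
Aruba,,2020-08-03,78.9927614946973
Aruba,,2020-08-04,90.2397026313416
"""

problems = check_predictions(sample, date(2020, 8, 1), date(2020, 8, 4))
print(problems or "All good!")
```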


## 1 month in the future
- 2 countries only
- There is a gap between the date of the last known number of cases and start_date
- For future dates, the Intervention Plans contain scenarios for which predictions are requested, to answer the question: what will happen if we apply these plans?

In [29]:
%%time
validate(start_date="2021-01-01",
         end_date="2021-01-31",
         ip_file="validation/data/future_ip.csv",
         output_file="predictions/val_1_month_future.csv")

Generating predictions from 2021-01-01 to 2021-01-31...
Saved predictions to predictions/val_1_month_future.csv
Done!
All good!
CPU times: user 60.7 ms, sys: 16.5 ms, total: 77.2 ms
Wall time: 1.86 s


## 180 days, from a future date, all countries and regions
- Prediction start date is 1 week from now (i.e. assuming the submission date is 1 week from now).  
- Prediction end date is 6 months after start date.  
- Prediction is requested for all available countries and regions.  
- Intervention plan scenario: freeze last known intervention plans for each country and region.  

The number of cases between today and the start date is not known yet, but the model relies on prior case counts as input. The model therefore has to predict the cases for those days first and then feed its own predictions back in as inputs.  
This is the most demanding test. Generating the prediction file should take less than 1 hour.
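
The gap filling described above can be sketched as a day-by-day rollout. This is a simplified illustration rather than the predictor's actual code: `rollout`, `predict_next`, `mean_of_window` and the `lookback` value are hypothetical names and settings, the toy case counts are made up, and the NPI inputs that the real model also uses are left out for brevity.

```python
from datetime import date, timedelta


def rollout(known_cases, last_known_date, start_date, end_date,
            predict_next, lookback=3):
    """Predict one day at a time, feeding each prediction back in as input.

    Days between last_known_date and start_date are predicted too, because
    later predictions depend on them, but only start_date..end_date is kept.
    """
    history = list(known_cases)
    results = {}
    current = last_known_date + timedelta(days=1)
    while current <= end_date:
        pred = predict_next(history[-lookback:])
        history.append(pred)  # the prediction becomes an input for the next day
        if current >= start_date:
            results[current] = pred
        current += timedelta(days=1)
    return results


def mean_of_window(window):
    # Hypothetical stand-in for the trained model
    return sum(window) / len(window)


# Toy data: cases known up to 2020-11-24, predictions requested from 2020-12-01
preds = rollout(known_cases=[10.0, 20.0, 30.0],
                last_known_date=date(2020, 11, 24),
                start_date=date(2020, 12, 1),
                end_date=date(2020, 12, 3),
                predict_next=mean_of_window)
for day, value in preds.items():
    print(day, value)
```

Because every day in the gap must be predicted before the requested range can start, a long gap and a 180-day horizon make this the slowest scenario.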

### Generate the scenario

In [30]:
from datetime import datetime, timedelta

start_date = datetime.now() + timedelta(days=7)
start_date_str = start_date.strftime('%Y-%m-%d')
end_date = start_date + timedelta(days=180)
end_date_str = end_date.strftime('%Y-%m-%d')
print(f"Start date: {start_date_str}")
print(f"End date: {end_date_str}")

Start date: 2020-12-01
End date: 2021-05-30


In [32]:
from validation.scenario_generator import get_raw_data, generate_scenario, NPI_COLUMNS
DATA_FILE = 'data/OxCGRT_latest.csv'
latest_df = get_raw_data(DATA_FILE, latest=True)
scenario_df = generate_scenario(start_date_str, end_date_str, latest_df, countries=None, scenario="Freeze")
scenario_file = "predictions/180_days_future_scenario.csv"
scenario_df.to_csv(scenario_file, index=False)
print(f"Saved scenario to {scenario_file}")

Saved scenario to predictions/180_days_future_scenario.csv
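
For intuition, the "Freeze" scenario can be thought of as repeating each region's last known intervention values for every future day. The sketch below shows that idea with plain dictionaries. It is not the implementation of `generate_scenario`: the function name `freeze_scenario`, the row layout and the NPI values are all hypothetical, and only the two NPI column names are taken from the data set.

```python
from datetime import date, timedelta


def freeze_scenario(last_known, last_known_date, end_date):
    """Repeat each region's last known NPI values for every future day.

    last_known maps a region name to a dict of NPI name -> value observed
    on last_known_date. Returns one row per region and future day.
    """
    rows = []
    n_days = (end_date - last_known_date).days
    for region, npis in last_known.items():
        for offset in range(1, n_days + 1):
            day = last_known_date + timedelta(days=offset)
            # NPI values are copied unchanged: that is the "freeze"
            rows.append({"Region": region, "Date": day.isoformat(), **npis})
    return rows


# Hypothetical last known values for a single region
last_known = {"Aruba": {"C1_School closing": 2, "C2_Workplace closing": 1}}
rows = freeze_scenario(last_known, date(2020, 11, 24), date(2020, 11, 27))
for row in rows:
    print(row)
```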


### Check it

In [None]:
%%time
validate(start_date=start_date_str,
         end_date=end_date_str,
         ip_file=scenario_file,
         output_file="predictions/val_6_month_future.csv")