# Example Predictor: Linear Rollout Predictor

This example contains basic functionality for training and evaluating a linear predictor that rolls out predictions day-by-day.

First, a training data set is created from historical case and npi data.

Second, a linear model is trained to predict future cases from prior case data along with prior and future npi data.
The model is an off-the-shelf sklearn Lasso model, that uses a positive weight constraint to enforce the assumption that increased npis has a negative correlation with future cases.

Third, a sample evaluation set is created, and the predictor is applied to this evaluation set to produce prediction results in the correct format.

## Training

In [1]:
import pickle
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

### Copy the data locally

In [2]:
# Main source for the training data
DATA_URL = 'https://raw.githubusercontent.com/OxCGRT/covid-policy-tracker/master/data/OxCGRT_latest.csv'
# Local file
DATA_FILE = 'data/OxCGRT_latest.csv'

In [3]:
import os
import urllib.request
if not os.path.exists('data'):
    os.mkdir('data')
urllib.request.urlretrieve(DATA_URL, DATA_FILE)

('data/OxCGRT_latest.csv', <http.client.HTTPMessage at 0x7f178857cd90>)

In [4]:
# Load historical data from local file
df = pd.read_csv(DATA_FILE, 
                 parse_dates=['Date'],
                 encoding="ISO-8859-1",
                 dtype={"RegionName": str,
                        "RegionCode": str},
                 error_bad_lines=False)

In [5]:
df.columns

Index(['CountryName', 'CountryCode', 'RegionName', 'RegionCode',
       'Jurisdiction', 'Date', 'C1_School closing', 'C1_Flag',
       'C2_Workplace closing', 'C2_Flag', 'C3_Cancel public events', 'C3_Flag',
       'C4_Restrictions on gatherings', 'C4_Flag', 'C5_Close public transport',
       'C5_Flag', 'C6_Stay at home requirements', 'C6_Flag',
       'C7_Restrictions on internal movement', 'C7_Flag',
       'C8_International travel controls', 'E1_Income support', 'E1_Flag',
       'E2_Debt/contract relief', 'E3_Fiscal measures',
       'E4_International support', 'H1_Public information campaigns',
       'H1_Flag', 'H2_Testing policy', 'H3_Contact tracing',
       'H4_Emergency investment in healthcare', 'H5_Investment in vaccines',
       'H6_Facial Coverings', 'H6_Flag', 'H7_Vaccination policy', 'H7_Flag',
       'M1_Wildcard', 'ConfirmedCases', 'ConfirmedDeaths', 'StringencyIndex',
       'StringencyIndexForDisplay', 'StringencyLegacyIndex',
       'StringencyLegacyIndexForDispla

In [6]:
# # For testing, restrict training data to that before a hypothetical predictor submission date
# HYPOTHETICAL_SUBMISSION_DATE = np.datetime64("2020-07-31")
# df = df[df.Date <= HYPOTHETICAL_SUBMISSION_DATE]

In [7]:
# Add RegionID column that combines CountryName and RegionName for easier manipulation of data
df['GeoID'] = df['CountryName'] + '__' + df['RegionName'].astype(str)

In [8]:
# Add new cases column
df['NewCases'] = df.groupby('GeoID').ConfirmedCases.diff().fillna(0)

In [9]:
# Keep only columns of interest
id_cols = ['CountryName',
           'RegionName',
           'GeoID',
           'Date']
cases_col = ['NewCases']
npi_cols = ['C1_School closing',
            'C2_Workplace closing',
            'C3_Cancel public events',
            'C4_Restrictions on gatherings',
            'C5_Close public transport',
            'C6_Stay at home requirements',
            'C7_Restrictions on internal movement',
            'C8_International travel controls',
            'H1_Public information campaigns',
            'H2_Testing policy',
            'H3_Contact tracing',
            'H6_Facial Coverings']
df = df[id_cols + cases_col + npi_cols]

In [10]:
# Fill any missing case values by interpolation and setting NaNs to 0
df.update(df.groupby('GeoID').NewCases.apply(
    lambda group: group.interpolate()).fillna(0))

In [11]:
# Fill any missing NPIs by assuming they are the same as previous day
for npi_col in npi_cols:
    df.update(df.groupby('GeoID')[npi_col].ffill().fillna(0))

In [13]:
# Set number of past days to use to make predictions
nb_lookback_days = 30
date_ls = []
geoid_ls = []
country_ls = []
newcase_ls = []
# Create training data across all countries for predicting one day ahead
X_cols = cases_col + npi_cols
y_col = cases_col
X_samples = []
y_samples = []
geo_ids = df.GeoID.unique()
train_geo_ids = [e for e in geo_ids]
geoid_arr = np.zeros(len(train_geo_ids)+1)
for g in geo_ids:
    gdf = df[df.GeoID == g]
    all_case_data = np.array(gdf[cases_col])
    all_npi_data = np.array(gdf[npi_cols])

    # Create one sample for each day where we have enough data
    # Each sample consists of cases and npis for previous nb_lookback_days
    nb_total_days = len(gdf)
    for d in range(nb_lookback_days, nb_total_days - 1):
        X_cases = all_case_data[d-nb_lookback_days:d]

        # Take negative of npis to support positive
        # weight constraint in Lasso.
        X_npis = -all_npi_data[d - nb_lookback_days:d]

        # Flatten all input data so it fits Lasso input format.
        geoid_arr = np.zeros(len(train_geo_ids)+1)
        geoid_arr[ train_geo_ids.index(g) ] = 1
        X_sample = np.concatenate([geoid_arr,  X_cases.flatten(),
                                   X_npis.flatten()])
        y_sample = all_case_data[d]
        X_samples.append(X_sample)
        y_samples.append(y_sample)
        date_ls += [ list(gdf['Date'])[d] ]
        geoid_ls += [ list(gdf['GeoID'])[d] ]
        country_ls += [ list(gdf['CountryName'])[d]  ] 
        newcase_ls += [ list(gdf['NewCases'])[d]  ] 

X_samples = np.array(X_samples)
y_samples = np.array(y_samples).flatten()
with open('train_geo_ids.txt', 'w') as f:
    f.write('\n'.join(train_geo_ids))
    
print(X_samples.shape)

(85440, 658)


In [14]:
import datetime
train_idx = [i for i in range(len(date_ls)) if date_ls[i] <= datetime.date(2020,7,31) ]
test_idx = [i for i in range(len(date_ls)) if date_ls[i] >= datetime.date(2020,8,1) ]
train_idx = np.array(train_idx)
test_idx = np.array(test_idx)

In [15]:
# Helpful function to compute mae
def mae(pred, true):
    return np.mean(np.abs(pred - true))

In [16]:
# Split data into train and test sets
X_train, X_test, y_train, y_test = X_samples[train_idx,:], X_samples[test_idx,:],y_samples[train_idx], y_samples[test_idx]
print(X_train.shape,X_test.shape,y_train.shape,y_test.shape,)

(48861, 658) (36579, 658) (48861,) (36579,)


In [17]:
!pip install lightgbm



In [18]:
import random
def seed_everything(seed=0):
    random.seed(seed)
    np.random.seed(seed)
seed_everything(42) 

In [19]:
# Create and train Lasso model.
# Set positive=True to enforce assumption that cases are positively correlated
# with future cases and npis are negatively correlated.

lasso_model = Lasso(random_state=42)
lasso_model.fit(X_train, y_train)

print('Lasso result:')
# Evaluate model
train_preds = lasso_model.predict(X_train)
train_preds = np.maximum(train_preds, 0) # Don't predict negative cases
print('Train MAE:', mae(train_preds, y_train))
test_preds = lasso_model.predict(X_test)
test_preds = np.maximum(test_preds, 0) # Don't predict negative cases
print('Test MAE:', mae(test_preds, y_test))
with open('models/model_lasso.pkl', 'wb') as model_file:
    pickle.dump(lasso_model, model_file)

from lightgbm import LGBMRegressor
lgbm_model = LGBMRegressor(random_state=42)
lgbm_model.fit(X_train, y_train)
# Evaluate model
train_preds = lgbm_model.predict(X_train)
train_preds = np.maximum(train_preds, 0) # Don't predict negative cases
print('lgbm result:')
print('Train MAE:', mae(train_preds, y_train))
test_preds = lgbm_model.predict(X_test)
test_preds = np.maximum(test_preds, 0) # Don't predict negative cases
print('Test MAE:', mae(test_preds, y_test))
with open('models/model_lgbm.pkl', 'wb') as model_file:
    pickle.dump(lgbm_model, model_file)

import xgboost as xgb
xgb_model = xgb.XGBRegressor(random_state=42)
xgb_model.fit(X_train, y_train)
# Evaluate model
train_preds = xgb_model.predict(X_train)
train_preds = np.maximum(train_preds, 0) # Don't predict negative cases
print('xgb result:')
print('Train MAE:', mae(train_preds, y_train))
test_preds = xgb_model.predict(X_test)
test_preds = np.maximum(test_preds, 0) # Don't predict negative cases
print('Test MAE:', mae(test_preds, y_test))
with open('models/model_xgb.pkl', 'wb') as model_file:
    pickle.dump(xgb_model, model_file)


Lasso result:
Train MAE: 118.69420734991954
Test MAE: 447.06846062728135
lgbm result:
Train MAE: 85.51717187475573
Test MAE: 650.7371662424473
xgb result:
Train MAE: 54.2057830354577
Test MAE: 633.3338386238983


In [35]:
# None geoid result:
# Lasso result:
# Train MAE: 119.2762230373797
# Test MAE: 469.3599710917388
# lgbm result:
# Train MAE: 85.58646245869078
# Test MAE: 665.2549123382636
# xgb result:
# Train MAE: 51.797418546131624
# Test MAE: 650.3725903829637

In [22]:
# Save the best model to file
if not os.path.exists('models'):
    os.mkdir('models')
with open('models/model.pkl', 'wb') as model_file:
    pickle.dump(lasso_model, model_file)

## Evaluation

Now that the predictor has been trained and saved, this section contains the functionality for evaluating it on sample evaluation data.

In [23]:
# Reload the module to get the latest changes
import predict
from importlib import reload
reload(predict)
from predict import predict_df

In [24]:
%%time
preds_df = predict_df("2020-08-01", "2020-08-31", path_to_ips_file="data/2020-09-30_historical_ip.csv", verbose=True)


Predicting for Aruba__nan
2020-08-01: 0
2020-08-02: 1.8942086594635512
2020-08-03: 0
2020-08-04: 0
2020-08-05: 0
2020-08-06: 10.775050492342857
2020-08-07: 14.969903603377997
2020-08-08: 12.88987000950307
2020-08-09: 9.76640477848154
2020-08-10: 10.321820828685443
2020-08-11: 5.884680787295444
2020-08-12: 5.074664494838039
2020-08-13: 14.030323766097279
2020-08-14: 12.035050437666495
2020-08-15: 4.793380734089688
2020-08-16: 6.291032616504673
2020-08-17: 4.789699457653006
2020-08-18: 0
2020-08-19: 0
2020-08-20: 0
2020-08-21: 6.736178403116348
2020-08-22: 5.369250056188319
2020-08-23: 14.075806343937188
2020-08-24: 21.430916317005696
2020-08-25: 13.680896068981934
2020-08-26: 17.877373336870363
2020-08-27: 27.489196360750043
2020-08-28: 28.630174566712167
2020-08-29: 22.71824225476272
2020-08-30: 28.41664107760637
2020-08-31: 45.387361478622346

Predicting for Afghanistan__nan
2020-08-01: 96.30950955227038
2020-08-02: 5.69729118841844
2020-08-03: 77.93904851248102
2020-08-04: 140.47768

2020-08-28: 152.82924962840076
2020-08-29: 143.59536304179554
2020-08-30: 144.14769270556553
2020-08-31: 149.88067718496845

Predicting for Belgium__nan
2020-08-01: 2278.9962231139702
2020-08-02: 1692.618511400265
2020-08-03: 2680.2786569139084
2020-08-04: 1747.8177678809245
2020-08-05: 731.8545911806032
2020-08-06: 1069.0060850894251
2020-08-07: 1273.5030545237278
2020-08-08: 2180.563921290668
2020-08-09: 1720.6814733826016
2020-08-10: 2155.3812594710043
2020-08-11: 2163.7306472733653
2020-08-12: 937.2222625874833
2020-08-13: 553.5130569103769
2020-08-14: 1725.2162860807691
2020-08-15: 1678.914955967336
2020-08-16: 1395.8390714734371
2020-08-17: 2077.332696130232
2020-08-18: 2110.9712571874306
2020-08-19: 1251.787998731551
2020-08-20: 583.0032503784189
2020-08-21: 1381.2458205916178
2020-08-22: 1616.431895398894
2020-08-23: 1962.1509256855904
2020-08-24: 2088.0832245488637
2020-08-25: 2208.757261820647
2020-08-26: 1428.2729275425704
2020-08-27: 549.1525054162239
2020-08-28: 1411.66337


Predicting for Bolivia__nan
2020-08-01: 241.2411406039507
2020-08-02: 232.61605607435632
2020-08-03: 166.21313977135605
2020-08-04: 80.58224932682043
2020-08-05: 132.1684102230932
2020-08-06: 153.80976586987214
2020-08-07: 156.41049965272333
2020-08-08: 232.82600698725247
2020-08-09: 179.25033472084561
2020-08-10: 137.09966696726633
2020-08-11: 69.61965593822623
2020-08-12: 25.74796596115695
2020-08-13: 90.00364146624966
2020-08-14: 107.02362221883831
2020-08-15: 89.0685073166355
2020-08-16: 137.19573007231236
2020-08-17: 60.600585474407055
2020-08-18: 0
2020-08-19: 5.74340327598507
2020-08-20: 50.11010270012294
2020-08-21: 50.76824794683608
2020-08-22: 67.88402549891521
2020-08-23: 111.207966940291
2020-08-24: 28.45792049681529
2020-08-25: 0
2020-08-26: 0
2020-08-27: 0
2020-08-28: 45.04074461910064
2020-08-29: 47.59162844915315
2020-08-30: 76.8374017428768
2020-08-31: 2.790951182020443

Predicting for Brazil__Acre
2020-08-01: 80.55018039278437
2020-08-02: 0
2020-08-03: 0
2020-08-04: 

2020-08-01: 2871.0693922594364
2020-08-02: 633.039006524644
2020-08-03: 0
2020-08-04: 0
2020-08-05: 326.22893042426654
2020-08-06: 916.6284540309741
2020-08-07: 1538.2153900617986
2020-08-08: 2255.3587149361783
2020-08-09: 0
2020-08-10: 0
2020-08-11: 0
2020-08-12: 0
2020-08-13: 853.5628324228335
2020-08-14: 2347.619505766636
2020-08-15: 1592.4022138255652
2020-08-16: 624.2846555401916
2020-08-17: 0
2020-08-18: 0
2020-08-19: 793.5864005464687
2020-08-20: 1587.1873292508415
2020-08-21: 2494.8304208024783
2020-08-22: 2069.2291591106505
2020-08-23: 1139.0429636547542
2020-08-24: 0
2020-08-25: 153.38211686143543
2020-08-26: 986.6008316724435
2020-08-27: 1852.1905016944293
2020-08-28: 2839.3109472828846
2020-08-29: 2196.614225400823
2020-08-30: 1312.6976967899398
2020-08-31: 156.40424148369937

Predicting for Brazil__Mato Grosso do Sul
2020-08-01: 580.4366676545667
2020-08-02: 0
2020-08-03: 0
2020-08-04: 0
2020-08-05: 162.68365173846166
2020-08-06: 348.2935016270979
2020-08-07: 547.768624881

2020-08-01: 135.39523168804675
2020-08-02: 0
2020-08-03: 0
2020-08-04: 0
2020-08-05: 0
2020-08-06: 0
2020-08-07: 29.141097605965754
2020-08-08: 55.548862637992215
2020-08-09: 0
2020-08-10: 0
2020-08-11: 0
2020-08-12: 0
2020-08-13: 0
2020-08-14: 49.640870900133066
2020-08-15: 59.95469189066814
2020-08-16: 11.716577885370192
2020-08-17: 0
2020-08-18: 0.4021532119977351
2020-08-19: 7.507714551993033
2020-08-20: 2.322972836583947
2020-08-21: 59.31388415424823
2020-08-22: 86.18564558136683
2020-08-23: 71.51553370870319
2020-08-24: 0.5293025784543417
2020-08-25: 24.825662114544237
2020-08-26: 47.166092692702335
2020-08-27: 0
2020-08-28: 59.730425819029385
2020-08-29: 84.44438027948996
2020-08-30: 62.08973296759545
2020-08-31: 0

Predicting for Brazil__Rio Grande do Sul
2020-08-01: 2716.220833441867
2020-08-02: 120.19772883944626
2020-08-03: 0
2020-08-04: 0
2020-08-05: 522.0179397291615
2020-08-06: 1133.335321946006
2020-08-07: 2070.467495138577
2020-08-08: 2211.6786484099944
2020-08-09: 0
20

2020-08-03: 0
2020-08-04: 0
2020-08-05: 0
2020-08-06: 0
2020-08-07: 0
2020-08-08: 0
2020-08-09: 0
2020-08-10: 0
2020-08-11: 0
2020-08-12: 0
2020-08-13: 0
2020-08-14: 0
2020-08-15: 0
2020-08-16: 0
2020-08-17: 0
2020-08-18: 0
2020-08-19: 0
2020-08-20: 0
2020-08-21: 0
2020-08-22: 0
2020-08-23: 0
2020-08-24: 0
2020-08-25: 0
2020-08-26: 0
2020-08-27: 0
2020-08-28: 0
2020-08-29: 0
2020-08-30: 0
2020-08-31: 4.73149061737606

Predicting for Canada__nan
2020-08-01: 4389.290017766595
2020-08-02: 4901.616065252775
2020-08-03: 4299.345623223367
2020-08-04: 4688.923323891165
2020-08-05: 5648.5739648949575
2020-08-06: 4217.912066460676
2020-08-07: 2736.397860035065
2020-08-08: 4003.8834777451652
2020-08-09: 3940.229094834974
2020-08-10: 4244.769386605614
2020-08-11: 4542.116111728305
2020-08-12: 4514.786058354363
2020-08-13: 3924.8677632028325
2020-08-14: 2115.0921986396193
2020-08-15: 2566.8739044166673
2020-08-16: 3911.3041061822128
2020-08-17: 3751.498320189805
2020-08-18: 3975.457355911946
2020-

2020-08-11: 13.148690607926905
2020-08-12: 0
2020-08-13: 7.29469225089905
2020-08-14: 18.27314481018724
2020-08-15: 6.219017076684867
2020-08-16: 6.275504043284066
2020-08-17: 15.204821167212227
2020-08-18: 12.854338186402856
2020-08-19: 5.206878996267243
2020-08-20: 11.074369158253525
2020-08-21: 15.034360190504406
2020-08-22: 17.827515223430773
2020-08-23: 17.595319407629464
2020-08-24: 24.236515563960285
2020-08-25: 20.473865989749438
2020-08-26: 22.51231331075963
2020-08-27: 0
2020-08-28: 11.796686526097329
2020-08-29: 15.381082856295112
2020-08-30: 17.440905311765086
2020-08-31: 8.278069324053622

Predicting for Costa Rica__nan
2020-08-01: 676.82796120949
2020-08-02: 512.1023203748
2020-08-03: 0
2020-08-04: 116.09998440379294
2020-08-05: 1361.1987391940866
2020-08-06: 1301.1723606263781
2020-08-07: 412.12682903435456
2020-08-08: 687.4055350650949
2020-08-09: 432.0212499165254
2020-08-10: 0
2020-08-11: 86.04452503719577
2020-08-12: 1184.5913780721348
2020-08-13: 1403.1238477510317


2020-08-26: 17.185880865340096
2020-08-27: 0
2020-08-28: 225.6488548256945
2020-08-29: 399.8766514799034
2020-08-30: 439.06405504888653
2020-08-31: 258.6273245594076

Predicting for Egypt__nan
2020-08-01: 338.1550880912465
2020-08-02: 362.5209071052134
2020-08-03: 347.23527693786974
2020-08-04: 322.4699345087368
2020-08-05: 357.2275890693291
2020-08-06: 281.6173024459507
2020-08-07: 211.71421842537328
2020-08-08: 301.3511231404514
2020-08-09: 280.7860212830105
2020-08-10: 314.2439746608546
2020-08-11: 303.51315410233616
2020-08-12: 255.5456004567081
2020-08-13: 214.69355653486605
2020-08-14: 145.53536125933397
2020-08-15: 131.8391390305789
2020-08-16: 228.42502459581743
2020-08-17: 233.7929335713731
2020-08-18: 195.50240808987553
2020-08-19: 228.56517144754383
2020-08-20: 136.38748114745704
2020-08-21: 44.58913011250711
2020-08-22: 67.89126066759462
2020-08-23: 193.67265985065654
2020-08-24: 193.44889882158
2020-08-25: 186.84915541530336
2020-08-26: 180.85336378791516
2020-08-27: 56.91

2020-08-30: 19735.264363209826
2020-08-31: 19153.80992301693

Predicting for United Kingdom__England
2020-08-01: 5888.108896409872
2020-08-02: 4649.200694534018
2020-08-03: 2150.6349352388997
2020-08-04: 168.99045197759784
2020-08-05: 587.4812918109949
2020-08-06: 2651.4817161658752
2020-08-07: 2146.030476601181
2020-08-08: 4093.0532808730113
2020-08-09: 3093.952152402721
2020-08-10: 0
2020-08-11: 0
2020-08-12: 0
2020-08-13: 0
2020-08-14: 39.833064062560936
2020-08-15: 2516.882882368662
2020-08-16: 2516.620794534301
2020-08-17: 951.0936670666581
2020-08-18: 0
2020-08-19: 0
2020-08-20: 0
2020-08-21: 1480.8337841203079
2020-08-22: 4144.383318545059
2020-08-23: 5007.454352275527
2020-08-24: 3122.1309490147037
2020-08-25: 980.0337214790814
2020-08-26: 807.9350933010749
2020-08-27: 748.7456044049932
2020-08-28: 2618.5976091027546
2020-08-29: 4602.779389441248
2020-08-30: 5106.490894624964
2020-08-31: 2722.9771624772084

Predicting for United Kingdom__Northern Ireland
2020-08-01: 164.3725931


Predicting for Guam__nan
2020-08-01: 0
2020-08-02: 0
2020-08-03: 0
2020-08-04: 0
2020-08-05: 33.96946710615074
2020-08-06: 21.56928487513425
2020-08-07: 15.842060849192183
2020-08-08: 12.850212225852669
2020-08-09: 8.5164532205219
2020-08-10: 0
2020-08-11: 8.344657537749306
2020-08-12: 27.485051211930596
2020-08-13: 22.27443226970542
2020-08-14: 0
2020-08-15: 0
2020-08-16: 0
2020-08-17: 0
2020-08-18: 0
2020-08-19: 0
2020-08-20: 0
2020-08-21: 0
2020-08-22: 0
2020-08-23: 0
2020-08-24: 0
2020-08-25: 0
2020-08-26: 1.5391315638098613
2020-08-27: 1.0657579842949048
2020-08-28: 0
2020-08-29: 0
2020-08-30: 0
2020-08-31: 0

Predicting for Guyana__nan
2020-08-01: 28.43640844662169
2020-08-02: 20.474035199473132
2020-08-03: 8.582095955245235
2020-08-04: 0
2020-08-05: 0
2020-08-06: 0
2020-08-07: 0
2020-08-08: 18.63546187944496
2020-08-09: 7.8672060809228235
2020-08-10: 0
2020-08-11: 0
2020-08-12: 0
2020-08-13: 0
2020-08-14: 0
2020-08-15: 0
2020-08-16: 0
2020-08-17: 0
2020-08-18: 0
2020-08-19: 0
2

2020-08-01: 861.265042067555
2020-08-02: 797.2792067776763
2020-08-03: 453.78141782751715
2020-08-04: 352.96934633241443
2020-08-05: 711.9081749274912
2020-08-06: 696.1738705446578
2020-08-07: 457.2978065526389
2020-08-08: 471.5459473777064
2020-08-09: 302.44006947451595
2020-08-10: 240.38291367320863
2020-08-11: 126.07032667997893
2020-08-12: 325.4351753992178
2020-08-13: 490.83370194158806
2020-08-14: 219.25799328121724
2020-08-15: 3.1047346796527506
2020-08-16: 127.9746107761446
2020-08-17: 0
2020-08-18: 0
2020-08-19: 364.93499217236143
2020-08-20: 375.98521803788935
2020-08-21: 99.74153850475811
2020-08-22: 0
2020-08-23: 95.15916861787457
2020-08-24: 0
2020-08-25: 9.552066381619518
2020-08-26: 270.18977167275426
2020-08-27: 299.4040685750516
2020-08-28: 117.78355457489195
2020-08-29: 0
2020-08-30: 54.56564490271194
2020-08-31: 0

Predicting for Iceland__nan
2020-08-01: 0
2020-08-02: 15.2270981779735
2020-08-03: 8.728642142683803
2020-08-04: 6.604872614617683
2020-08-05: 24.75936667

2020-08-11: 726.5100895108662
2020-08-12: 611.8114525557023
2020-08-13: 633.7652415944829
2020-08-14: 534.3229802814001
2020-08-15: 514.5023802775304
2020-08-16: 785.094185766905
2020-08-17: 782.1651963444497
2020-08-18: 628.508231039589
2020-08-19: 701.6128377265436
2020-08-20: 627.6520088964451
2020-08-21: 479.8688279412889
2020-08-22: 563.0695432744658
2020-08-23: 840.9732827803261
2020-08-24: 815.4184576405562
2020-08-25: 739.4846555430363
2020-08-26: 707.1832015423865
2020-08-27: 596.9485659332818
2020-08-28: 550.1015437101464
2020-08-29: 569.0231107492141
2020-08-30: 822.2433448831487
2020-08-31: 829.2746441458328

Predicting for Kuwait__nan
2020-08-01: 155.99521980643783
2020-08-02: 184.66403464714097
2020-08-03: 131.83446112474923
2020-08-04: 55.0560808175084
2020-08-05: 103.40663120145231
2020-08-06: 114.06973950014056
2020-08-07: 104.47336719230051
2020-08-08: 135.6408160265362
2020-08-09: 105.73660450593198
2020-08-10: 80.86427673846373
2020-08-11: 18.871060553724625
2020-08

2020-08-17: 144.95810850124317
2020-08-18: 133.88275450820558
2020-08-19: 128.1602259670866
2020-08-20: 127.8577956665732
2020-08-21: 125.06028076642659
2020-08-22: 120.71068777385145
2020-08-23: 129.18604638824826
2020-08-24: 128.37767297638572
2020-08-25: 102.0929718709111
2020-08-26: 99.49498223928568
2020-08-27: 131.9611221375511
2020-08-28: 76.68721657949686
2020-08-29: 66.0480613371883
2020-08-30: 59.640737638446794
2020-08-31: 58.08715749616496

Predicting for Morocco__nan
2020-08-01: 2240.134449775793
2020-08-02: 2033.6332395412794
2020-08-03: 1797.3328183252675
2020-08-04: 783.1902797026794
2020-08-05: 405.23801752298255
2020-08-06: 510.6195084141677
2020-08-07: 1284.4109985792422
2020-08-08: 2032.8933256743762
2020-08-09: 1348.9522271202245
2020-08-10: 1379.588705603693
2020-08-11: 533.9614693500935
2020-08-12: 0
2020-08-13: 281.7641578714178
2020-08-14: 1251.949329906855
2020-08-15: 1276.706088550684
2020-08-16: 1485.5765339082723
2020-08-17: 1090.829369847613
2020-08-18: 11

2020-08-29: 107.11431345129418
2020-08-30: 106.33453939162689
2020-08-31: 100.62262991977295

Predicting for Malawi__nan
2020-08-01: 0
2020-08-02: 0
2020-08-03: 0
2020-08-04: 0
2020-08-05: 9.895173141343745
2020-08-06: 14.712287773823457
2020-08-07: 16.659380711610183
2020-08-08: 17.905067343680788
2020-08-09: 17.492953648484416
2020-08-10: 21.774921995356422
2020-08-11: 25.11571546884496
2020-08-12: 28.599656512395917
2020-08-13: 33.01406118902631
2020-08-14: 30.620220668057105
2020-08-15: 21.609145229502673
2020-08-16: 20.39353837879984
2020-08-17: 21.783338933557
2020-08-18: 20.238981895755682
2020-08-19: 12.668886604346238
2020-08-20: 11.251111611404976
2020-08-21: 8.371046809603907
2020-08-22: 1.2935023569860666
2020-08-23: 4.928470904469708
2020-08-24: 6.170714710009657
2020-08-25: 16.66232657459616
2020-08-26: 22.877766396054817
2020-08-27: 26.709245701785683
2020-08-28: 16.930731138176537
2020-08-29: 19.576233584789
2020-08-30: 25.187947855464223
2020-08-31: 34.84053438178283



2020-08-01: 1934.142374153987
2020-08-02: 2025.5064771231907
2020-08-03: 1963.7693066004358
2020-08-04: 1566.1971169420951
2020-08-05: 1727.452954075624
2020-08-06: 1227.2229657011967
2020-08-07: 1137.4680997023538
2020-08-08: 1749.0907370814045
2020-08-09: 1338.1772511035488
2020-08-10: 1640.0726217127435
2020-08-11: 1540.409652429672
2020-08-12: 988.9623777754273
2020-08-13: 835.7513157071105
2020-08-14: 872.6819626960337
2020-08-15: 844.3362460481105
2020-08-16: 1361.8037923522677
2020-08-17: 1444.3932437623348
2020-08-18: 1173.906955049479
2020-08-19: 1222.2177422205993
2020-08-20: 849.1658196658134
2020-08-21: 641.8368807609831
2020-08-22: 926.6446852374368
2020-08-23: 1553.7558521342867
2020-08-24: 1457.324721232026
2020-08-25: 1416.8867667444697
2020-08-26: 1161.176903683727
2020-08-27: 795.5469009592966
2020-08-28: 791.2672241088517
2020-08-29: 937.3796376055302
2020-08-30: 1523.7470813457314
2020-08-31: 1580.0855363262244

Predicting for Panama__nan
2020-08-01: 1922.9379553133

2020-08-04: 617.771125509543
2020-08-05: 217.26210729352542
2020-08-06: 207.40104726476218
2020-08-07: 173.79012712116392
2020-08-08: 0
2020-08-09: 362.43795221304833
2020-08-10: 659.4388547413794
2020-08-11: 490.361231235478
2020-08-12: 80.10909578992077
2020-08-13: 44.09986927784481
2020-08-14: 0
2020-08-15: 0
2020-08-16: 302.60419715456106
2020-08-17: 463.6057556590385
2020-08-18: 480.7410340932912
2020-08-19: 50.43060631794006
2020-08-20: 0
2020-08-21: 0
2020-08-22: 89.44740823298586
2020-08-23: 321.5569262680088
2020-08-24: 540.6124768719969
2020-08-25: 440.80259960945165
2020-08-26: 118.74780213718701
2020-08-27: 0
2020-08-28: 0
2020-08-29: 69.9848590658463
2020-08-30: 371.12383278736405
2020-08-31: 578.8648513037595

Predicting for Romania__nan
2020-08-01: 5688.014822130367
2020-08-02: 5843.750990221659
2020-08-03: 4932.070399067011
2020-08-04: 2806.7389251844984
2020-08-05: 2071.623486989788
2020-08-06: 2391.8786143394896
2020-08-07: 3483.0430712060793
2020-08-08: 5449.86890541

2020-08-27: 18.608273565684918
2020-08-28: 0
2020-08-29: 0
2020-08-30: 0
2020-08-31: 0

Predicting for Somalia__nan
2020-08-01: 0
2020-08-02: 0
2020-08-03: 1.099933912023701
2020-08-04: 13.859620045389851
2020-08-05: 27.287116089407057
2020-08-06: 37.573990597429614
2020-08-07: 40.22266386553463
2020-08-08: 44.58214929768844
2020-08-09: 47.74437116363313
2020-08-10: 52.561132050546576
2020-08-11: 78.03049670691489
2020-08-12: 93.84517938201378
2020-08-13: 103.44061016071535
2020-08-14: 99.47321683435413
2020-08-15: 94.8000845277821
2020-08-16: 89.56426160565354
2020-08-17: 97.59376018199004
2020-08-18: 90.18943653119305
2020-08-19: 85.8282145190656
2020-08-20: 83.34606952620175
2020-08-21: 65.88721033323894
2020-08-22: 56.389291404880936
2020-08-23: 62.077932809593904
2020-08-24: 61.41066217111521
2020-08-25: 54.78518316472751
2020-08-26: 60.205911163000394
2020-08-27: 99.10019219038216
2020-08-28: 49.66035499393909
2020-08-29: 47.09610972357019
2020-08-30: 37.709020109724875
2020-08-3


Predicting for Togo__nan
2020-08-01: 0
2020-08-02: 10.898369573045983
2020-08-03: 12.581425798479097
2020-08-04: 11.93427063416361
2020-08-05: 28.0399484653106
2020-08-06: 29.259915843460824
2020-08-07: 30.353419651783813
2020-08-08: 37.646346527348356
2020-08-09: 37.70515402602763
2020-08-10: 44.68769459765826
2020-08-11: 57.29414166051901
2020-08-12: 64.48683541398472
2020-08-13: 60.80293753092447
2020-08-14: 68.23254088809172
2020-08-15: 59.43256625064812
2020-08-16: 65.25209271229744
2020-08-17: 75.51142201493579
2020-08-18: 83.00701379387316
2020-08-19: 84.78149620968347
2020-08-20: 81.80934682177872
2020-08-21: 78.6836222970718
2020-08-22: 79.77219359871472
2020-08-23: 96.28666150857516
2020-08-24: 109.02608664118011
2020-08-25: 117.64078174129142
2020-08-26: 129.72715269632735
2020-08-27: 91.99419538046482
2020-08-28: 92.36527537307836
2020-08-29: 91.75586704305051
2020-08-30: 89.87089800410914
2020-08-31: 88.66336307074994

Predicting for Thailand__nan
2020-08-01: 0
2020-08-02

2020-08-01: 288.05133141509884
2020-08-02: 315.70461729475977
2020-08-03: 351.22606659144645
2020-08-04: 344.20649269868596
2020-08-05: 324.95950813827204
2020-08-06: 282.1433220909209
2020-08-07: 254.50178859016108
2020-08-08: 368.2342115162539
2020-08-09: 389.71367757628656
2020-08-10: 441.11415322072787
2020-08-11: 443.90142357723244
2020-08-12: 395.9128883394587
2020-08-13: 324.7809900394193
2020-08-14: 329.3588078645927
2020-08-15: 366.23894261835454
2020-08-16: 441.2419752297167
2020-08-17: 486.531003202479
2020-08-18: 453.65729986997775
2020-08-19: 446.9911601303166
2020-08-20: 334.5294720045654
2020-08-21: 318.08665183939075
2020-08-22: 365.0780395028143
2020-08-23: 486.248630612019
2020-08-24: 505.69364603783583
2020-08-25: 465.1757774694615
2020-08-26: 436.51920368213854
2020-08-27: 366.5067433561654
2020-08-28: 309.32953783941844
2020-08-29: 298.84356144142174
2020-08-30: 434.7142683458569
2020-08-31: 460.41341053327966

Predicting for United States__nan
2020-08-01: 157715.5

2020-08-25: 495.44405805721806
2020-08-26: 390.0803247145587
2020-08-27: 230.80516317396092
2020-08-28: 96.7434423289334
2020-08-29: 191.50180360862288
2020-08-30: 455.8426809698912
2020-08-31: 537.8631312913787

Predicting for United States__Florida
2020-08-01: 8179.557045003823
2020-08-02: 9151.173095448677
2020-08-03: 8761.60680048655
2020-08-04: 5550.567290001505
2020-08-05: 5470.423739489595
2020-08-06: 5321.661717783793
2020-08-07: 5247.598864279497
2020-08-08: 7801.618425975906
2020-08-09: 7796.834977866131
2020-08-10: 8753.563240618761
2020-08-11: 5300.314525891795
2020-08-12: 4131.230068860794
2020-08-13: 4140.096898325098
2020-08-14: 4695.895098882551
2020-08-15: 5950.101498007066
2020-08-16: 8401.833821214293
2020-08-17: 7419.388698848697
2020-08-18: 4853.445677814784
2020-08-19: 4782.94549228278
2020-08-20: 3882.841224338655
2020-08-21: 4010.583896083345
2020-08-22: 6070.152694041514
2020-08-23: 8769.772991940357
2020-08-24: 7690.071531971508
2020-08-25: 5585.705403768902
2

2020-08-30: 4130.9988896168625
2020-08-31: 3450.941423316473

Predicting for United States__Maryland
2020-08-01: 2112.0599041952623
2020-08-02: 2382.3719536798276
2020-08-03: 2458.120332233683
2020-08-04: 1596.8448419296537
2020-08-05: 1723.7896376676513
2020-08-06: 1467.8046518327064
2020-08-07: 1163.7164083949774
2020-08-08: 2024.8784029536168
2020-08-09: 2104.1208132665624
2020-08-10: 2259.908462858621
2020-08-11: 1685.8446572515295
2020-08-12: 1148.2587331256143
2020-08-13: 1012.7706418044827
2020-08-14: 1105.122354485139
2020-08-15: 1225.9072911891217
2020-08-16: 2012.067541627173
2020-08-17: 2121.9391780320598
2020-08-18: 1461.9212481864167
2020-08-19: 1217.7064636404239
2020-08-20: 895.452918286479
2020-08-21: 821.8256802489225
2020-08-22: 1256.1056892690863
2020-08-23: 2168.0806185259457
2020-08-24: 1988.968969115176
2020-08-25: 1590.4899168578095
2020-08-26: 1113.8301991194114
2020-08-27: 796.7213948671138
2020-08-28: 857.6966176945521
2020-08-29: 1250.3855900414783
2020-08-30

2020-08-01: 3337.358411252707
2020-08-02: 3574.6138353264455
2020-08-03: 3814.953394819141
2020-08-04: 3037.576965972677
2020-08-05: 3317.7649936229504
2020-08-06: 2117.7139777484813
2020-08-07: 2076.3999246944477
2020-08-08: 2992.562076538173
2020-08-09: 2857.3484022043544
2020-08-10: 3561.165144407654
2020-08-11: 3086.521461896259
2020-08-12: 2221.2442126082656
2020-08-13: 1661.3271430592745
2020-08-14: 1756.6126796377475
2020-08-15: 1577.5107700723452
2020-08-16: 2925.581934060286
2020-08-17: 3174.0145560679566
2020-08-18: 2776.939263608721
2020-08-19: 2545.356010212146
2020-08-20: 1590.7353253716608
2020-08-21: 1282.9664951548998
2020-08-22: 1741.0548382585253
2020-08-23: 3146.688346956646
2020-08-24: 3105.524984863118
2020-08-25: 3154.0002408186806
2020-08-26: 2325.3629237419023
2020-08-27: 1560.485703383311
2020-08-28: 1320.7442772307852
2020-08-29: 1774.6258038331061
2020-08-30: 3067.2561957245634
2020-08-31: 3386.647243007999

Predicting for United States__New Mexico
2020-08-01

2020-08-14: 4392.199314083929
2020-08-15: 4437.234205097248
2020-08-16: 5858.160568344564
2020-08-17: 7028.636322185909
2020-08-18: 7959.18131074175
2020-08-19: 8814.376433821155
2020-08-20: 6721.251717521741
2020-08-21: 3766.9243407615236
2020-08-22: 4253.858981224164
2020-08-23: 6229.981388842281
2020-08-24: 7588.358338366113
2020-08-25: 8610.061009110881
2020-08-26: 9204.663639144079
2020-08-27: 6708.099741619208
2020-08-28: 4349.434807432485
2020-08-29: 4002.5348472449077
2020-08-30: 6448.0243947087465
2020-08-31: 7993.687044814625

Predicting for United States__Texas
2020-08-01: 11489.078964827888
2020-08-02: 7436.9519643443755
2020-08-03: 5835.237310887854
2020-08-04: 4865.069625369607
2020-08-05: 7481.801784077002
2020-08-06: 9431.99918224505
2020-08-07: 10015.251171407614
2020-08-08: 10207.930108101786
2020-08-09: 5076.0879984839785
2020-08-10: 5138.11546866074
2020-08-11: 3604.4288639943766
2020-08-12: 4278.165742489888
2020-08-13: 7731.011156806805
2020-08-14: 8250.6198131187

2020-08-17: 83.06060215452598
2020-08-18: 92.11053210591821
2020-08-19: 104.16008347342097
2020-08-20: 24.178938879939892
2020-08-21: 0
2020-08-22: 34.8217296578371
2020-08-23: 87.4586069120997
2020-08-24: 81.11045391717445
2020-08-25: 92.84930159245067
2020-08-26: 91.20512679236109
2020-08-27: 0
2020-08-28: 0
2020-08-29: 23.30073566404122
2020-08-30: 94.56940611025937
2020-08-31: 67.12735046935718

Predicting for United States Virgin Islands__nan
2020-08-01: 3.834842401668684
2020-08-02: 15.38225818762584
2020-08-03: 5.6552863754917295
2020-08-04: 11.210089431220403
2020-08-05: 24.07320817624968
2020-08-06: 46.369211905664855
2020-08-07: 54.98730071483173
2020-08-08: 59.16148479806675
2020-08-09: 77.51353598323965
2020-08-10: 84.46273196097368
2020-08-11: 86.43960332294454
2020-08-12: 106.58884056921265
2020-08-13: 129.74222776203246
2020-08-14: 131.15641068517834
2020-08-15: 141.26350860585217
2020-08-16: 149.68475934999123
2020-08-17: 149.2119451397673
2020-08-18: 126.5704451679295


In [25]:
# Check the predictions
preds_df.head()

Unnamed: 0,CountryName,RegionName,Date,PredictedDailyNewCases
213,Aruba,,2020-08-01,0.0
214,Aruba,,2020-08-02,1.894209
215,Aruba,,2020-08-03,0.0
216,Aruba,,2020-08-04,0.0
217,Aruba,,2020-08-05,0.0


# Validation
This is how the predictor is going to be called during the competition.  
!!! PLEASE DO NOT CHANGE THE API !!!

In [26]:
!python predict.py -s 2020-08-01 -e 2020-08-04 -ip data/2020-09-30_historical_ip.csv -o predictions/2020-08-01_2020-08-04.csv

Generating predictions from 2020-08-01 to 2020-08-04...
Saved predictions to predictions/2020-08-01_2020-08-04.csv
Done!


In [27]:
!head predictions/2020-08-01_2020-08-04.csv

CountryName,RegionName,Date,PredictedDailyNewCases
Aruba,,2020-08-01,0.0
Aruba,,2020-08-02,1.8942086594635512
Aruba,,2020-08-03,0.0
Aruba,,2020-08-04,0.0
Afghanistan,,2020-08-01,96.30950955227038
Afghanistan,,2020-08-02,5.69729118841844
Afghanistan,,2020-08-03,77.93904851248102
Afghanistan,,2020-08-04,140.47768153999263
Angola,,2020-08-01,46.18447115696239


# Test cases
We can generate a prediction file. Let's validate a few cases...

In [28]:
import sys,os,os.path
sys.path.append(os.path.expanduser('/home/thinng/code/2020/covid-xprize/'))

In [29]:
import os
from covid_xprize.validation.predictor_validation import validate_submission

def validate(start_date, end_date, ip_file, output_file):
    # First, delete any potential old file
    try:
        os.remove(output_file)
    except OSError:
        pass
    
    # Then generate the prediction, calling the official API
    !python predict.py -s {start_date} -e {end_date} -ip {ip_file} -o {output_file}
    
    # And validate it
    errors = validate_submission(start_date, end_date, ip_file, output_file)
    if errors:
        for error in errors:
            print(error)
    else:
        print("All good!")

## 4 days, no gap
- All countries and regions
- Official number of cases is known up to start_date
- Intervention Plans are the official ones

In [30]:
validate(start_date="2020-08-01",
         end_date="2020-08-04",
         ip_file="data/2020-09-30_historical_ip.csv",
         output_file="predictions/val_4_days.csv")

Generating predictions from 2020-08-01 to 2020-08-04...
Saved predictions to predictions/val_4_days.csv
Done!
All good!


## 1 month in the future
- 2 countries only
- there's a gap between date of last known number of cases and start_date
- For future dates, Intervention Plans contains scenarios for which predictions are requested to answer the question: what will happen if we apply these plans?

In [31]:
%%time
validate(start_date="2021-01-01",
         end_date="2021-01-31",
         ip_file="data/future_ip.csv",
         output_file="predictions/val_1_month_future.csv")

Generating predictions from 2021-01-01 to 2021-01-31...
Saved predictions to predictions/val_1_month_future.csv
Done!
All good!
CPU times: user 270 ms, sys: 111 ms, total: 381 ms
Wall time: 8.13 s


## 180 days, from a future date, all countries and regions
- Prediction start date is 1 week from now. (i.e. assuming submission date is 1 week from now)  
- Prediction end date is 6 months after start date.  
- Prediction is requested for all available countries and regions.  
- Intervention plan scenario: freeze last known intervention plans for each country and region.  

As the number of cases is not known yet between today and start date, but the model relies on them, the model has to predict them in order to use them.  
This test is the most demanding test. It should take less than 1 hour to generate the prediction file.

### Generate the scenario

In [32]:
from datetime import datetime, timedelta

start_date = datetime.now() + timedelta(days=7)
start_date_str = start_date.strftime('%Y-%m-%d')
end_date = start_date + timedelta(days=180)
end_date_str = end_date.strftime('%Y-%m-%d')
print(f"Start date: {start_date_str}")
print(f"End date: {end_date_str}")

Start date: 2020-12-23
End date: 2021-06-21


In [33]:
from covid_xprize.validation.scenario_generator import get_raw_data, generate_scenario, NPI_COLUMNS
DATA_FILE = 'data/OxCGRT_latest.csv'
latest_df = get_raw_data(DATA_FILE, latest=True)
scenario_df = generate_scenario(start_date_str, end_date_str, latest_df, countries=None, scenario="Freeze")
scenario_file = "predictions/180_days_future_scenario.csv"
scenario_df.to_csv(scenario_file, index=False)
print(f"Saved scenario to {scenario_file}")

Saved scenario to predictions/180_days_future_scenario.csv


### Check it

In [34]:
%%time
validate(start_date=start_date_str,
         end_date=end_date_str,
         ip_file=scenario_file,
         output_file="predictions/val_6_month_future.csv")

Generating predictions from 2020-12-23 to 2021-06-21...
Saved predictions to predictions/val_6_month_future.csv
Done!
All good!
CPU times: user 5.37 s, sys: 853 ms, total: 6.23 s
Wall time: 2min 21s
