# Example Predictor: Linear Rollout Predictor

This example contains basic functionality for training and evaluating a linear predictor that rolls out predictions day-by-day.

First, a training data set is created from historical case and NPI (non-pharmaceutical intervention) data.

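Second, a linear model is trained to predict future cases from prior case data along with prior and future NPI data.
The model is an off-the-shelf sklearn Lasso model. NPI inputs are negated so that a positive weight constraint (`positive=True`) can enforce the assumption that increased NPIs are negatively correlated with future cases; note that the training cell in this notebook fits `Lasso` with its default settings, without that constraint.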
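
To make the sign convention concrete, here is a minimal sketch on synthetic data. The data, seed, `alpha`, and coefficient values are assumptions chosen for illustration and are not part of the notebook. Because `positive=True` forces every Lasso weight to be non-negative, feeding the model the negated NPI level lets a non-negative weight express "more NPIs, fewer cases".

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 500
prev_cases = rng.uniform(0, 100, size=n)   # synthetic prior-day cases
npi_level = rng.integers(0, 4, size=n)     # synthetic NPI stringency (0 to 3)

# Synthetic ground truth: cases persist, NPIs reduce them.
next_cases = 0.9 * prev_cases - 5.0 * npi_level + rng.normal(0, 1, size=n)

# Negate the NPI column so a non-negative weight means a reduction in cases.
X = np.column_stack([prev_cases, -npi_level])
model = Lasso(alpha=0.1, positive=True, random_state=0).fit(X, next_cases)

print(model.coef_)  # both weights are constrained to be >= 0
```

Without the negation, the constraint would push the NPI weight to zero, since the true effect in this toy data is negative.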

Third, a sample evaluation set is created, and the predictor is applied to this evaluation set to produce prediction results in the correct format.

## Training

In [25]:
import pickle
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

### Copy the data locally

In [26]:
# Main source for the training data
DATA_URL = 'https://raw.githubusercontent.com/OxCGRT/covid-policy-tracker/master/data/OxCGRT_latest.csv'
# Local file
DATA_FILE = 'data/OxCGRT_latest.csv'

In [27]:
import os
import urllib.request
if not os.path.exists('data'):
    os.mkdir('data')
urllib.request.urlretrieve(DATA_URL, DATA_FILE)

('data/OxCGRT_latest.csv', <http.client.HTTPMessage at 0x7f74e1f5ea90>)
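
The cell above downloads the file on every run, which always fetches the latest data. If a cached copy is acceptable, a small helper can skip the download when the file already exists. This is a sketch only; `fetch_if_missing` is a hypothetical name, not something the notebook defines.

```python
import os
import urllib.request

def fetch_if_missing(url, dest):
    """Download url to dest unless dest already exists; return dest."""
    dest_dir = os.path.dirname(dest)
    if dest_dir:
        os.makedirs(dest_dir, exist_ok=True)  # no error if the folder exists
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest

# Usage in this notebook would look like:
# fetch_if_missing(DATA_URL, DATA_FILE)
```

The trade-off is freshness: a cached file is never refreshed until it is deleted by hand.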

In [28]:
# Load historical data from local file
df = pd.read_csv(DATA_FILE, 
                 parse_dates=['Date'],
                 encoding="ISO-8859-1",
                 dtype={"RegionName": str,
                        "RegionCode": str},
                 on_bad_lines='skip')  # replaces error_bad_lines=False, which was removed in pandas 2.0

In [29]:
df.columns

Index(['CountryName', 'CountryCode', 'RegionName', 'RegionCode',
       'Jurisdiction', 'Date', 'C1_School closing', 'C1_Flag',
       'C2_Workplace closing', 'C2_Flag', 'C3_Cancel public events', 'C3_Flag',
       'C4_Restrictions on gatherings', 'C4_Flag', 'C5_Close public transport',
       'C5_Flag', 'C6_Stay at home requirements', 'C6_Flag',
       'C7_Restrictions on internal movement', 'C7_Flag',
       'C8_International travel controls', 'E1_Income support', 'E1_Flag',
       'E2_Debt/contract relief', 'E3_Fiscal measures',
       'E4_International support', 'H1_Public information campaigns',
       'H1_Flag', 'H2_Testing policy', 'H3_Contact tracing',
       'H4_Emergency investment in healthcare', 'H5_Investment in vaccines',
       'H6_Facial Coverings', 'H6_Flag', 'H7_Vaccination policy', 'H7_Flag',
       'M1_Wildcard', 'ConfirmedCases', 'ConfirmedDeaths', 'StringencyIndex',
       'StringencyIndexForDisplay', 'StringencyLegacyIndex',
       'StringencyLegacyIndexForDispla

In [30]:
# # For testing, restrict training data to that before a hypothetical predictor submission date
# HYPOTHETICAL_SUBMISSION_DATE = np.datetime64("2020-07-31")
# df = df[df.Date <= HYPOTHETICAL_SUBMISSION_DATE]

In [31]:
# Add GeoID column that combines CountryName and RegionName for easier manipulation of data
df['GeoID'] = df['CountryName'] + '__' + df['RegionName'].astype(str)

In [32]:
# Add new cases column
df['NewCases'] = df.groupby('GeoID').ConfirmedCases.diff().fillna(0)
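
As a small illustration of what the line above does, the sketch below applies the same `groupby(...).diff().fillna(0)` pattern to a toy frame with two made-up regions. The cumulative counts are differenced within each region, so the first row of every region becomes 0 rather than inheriting a value from the previous region.

```python
import pandas as pd

# Toy cumulative counts for two hypothetical regions.
toy = pd.DataFrame({
    'GeoID': ['A', 'A', 'A', 'B', 'B', 'B'],
    'ConfirmedCases': [10, 15, 21, 100, 100, 130],
})

# Day-over-day difference inside each region; the first day has no
# predecessor, so its NaN is replaced by 0.
toy['NewCases'] = toy.groupby('GeoID').ConfirmedCases.diff().fillna(0)
print(toy['NewCases'].tolist())  # [0.0, 5.0, 6.0, 0.0, 0.0, 30.0]
```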

In [33]:
# Keep only columns of interest
id_cols = ['CountryName',
           'RegionName',
           'GeoID',
           'Date']
cases_col = ['NewCases']
npi_cols = ['C1_School closing',
            'C2_Workplace closing',
            'C3_Cancel public events',
            'C4_Restrictions on gatherings',
            'C5_Close public transport',
            'C6_Stay at home requirements',
            'C7_Restrictions on internal movement',
            'C8_International travel controls',
            'H1_Public information campaigns',
            'H2_Testing policy',
            'H3_Contact tracing',
            'H6_Facial Coverings']
df = df[id_cols + cases_col + npi_cols]

In [34]:
# Fill any missing case values by interpolation and setting NaNs to 0
df.update(df.groupby('GeoID').NewCases.apply(
    lambda group: group.interpolate()).fillna(0))

In [35]:
# Fill any missing NPIs by assuming they are the same as previous day
for npi_col in npi_cols:
    df.update(df.groupby('GeoID')[npi_col].ffill().fillna(0))
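
The two gap-filling rules above behave differently, which a toy example makes visible. This sketch uses made-up values and, as a swapped-in variant, `transform` instead of the notebook's `apply` for the interpolation step; the intent is the same.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'GeoID': ['A'] * 4 + ['B'] * 3,
    'NewCases': [np.nan, 2.0, np.nan, 6.0, 1.0, np.nan, 3.0],
    'C1_School closing': [np.nan, 1.0, np.nan, 3.0, 2.0, np.nan, np.nan],
})

# Cases: linear interpolation inside each region, then 0 for leading gaps.
cases = (toy.groupby('GeoID').NewCases
            .transform(lambda s: s.interpolate())
            .fillna(0))

# NPIs: carry the last known value forward inside each region, then 0.
npis = toy.groupby('GeoID')['C1_School closing'].ffill().fillna(0)

print(cases.tolist())  # [0.0, 2.0, 4.0, 6.0, 1.0, 2.0, 3.0]
print(npis.tolist())   # [0.0, 1.0, 1.0, 3.0, 2.0, 2.0, 2.0]
```

Interpolation suits case counts, which vary smoothly, while forward fill suits NPI levels, which stay in force until changed.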

In [36]:
# Load per-region daily temperature and holiday data
temp = pd.read_csv('temperature_data.csv')
# 'MM-DD' part of each date string, so lookups ignore the year
temp['date_st'] = temp['Date'].apply(lambda e: e[5:])
temp['id'] = temp['GeoID'] + '_' + temp['date_st']
# Lookups keyed by '<GeoID>_<MM-DD>'
id_temp = dict(zip(temp['id'], temp['temp']))
id_holiday = dict(zip(temp['id'], temp['Holiday']))
# Fallbacks keyed by 'MM-DD': mean temperature across all regions
tf = temp[['date_st','temp']]
tf = tf.groupby(['date_st']).mean().reset_index()
date_temp_avg = dict(zip(tf['date_st'], tf['temp']))
# ... and the most common holiday flag across all regions
tf = temp[['date_st','Holiday']]
tf = tf.groupby(['date_st'])['Holiday'].agg(pd.Series.mode).reset_index()
date_holiday_avg = dict(zip(tf['date_st'], tf['Holiday']))
id_temp

{'Afghanistan__nan_01-01': 2.7268292682926836,
 'Afghanistan__nan_01-02': 2.9152032520325206,
 'Afghanistan__nan_01-03': 2.9459349593495934,
 'Afghanistan__nan_01-04': 2.7965853658536584,
 'Afghanistan__nan_01-05': 2.4123577235772347,
 'Afghanistan__nan_01-06': 2.3713008130081303,
 'Afghanistan__nan_01-07': 2.4534959349593493,
 'Afghanistan__nan_01-08': 2.2549593495934954,
 'Afghanistan__nan_01-09': 2.219268292682927,
 'Afghanistan__nan_01-10': 2.2303252032520327,
 'Afghanistan__nan_01-11': 2.563983739837398,
 'Afghanistan__nan_01-12': 2.7221951219512204,
 'Afghanistan__nan_01-13': 2.389186991869919,
 'Afghanistan__nan_01-14': 2.290813008130081,
 'Afghanistan__nan_01-15': 2.217967479674797,
 'Afghanistan__nan_01-16': 2.2412195121951224,
 'Afghanistan__nan_01-17': 2.432926829268292,
 'Afghanistan__nan_01-18': 2.406747967479675,
 'Afghanistan__nan_01-19': 2.1417886178861787,
 'Afghanistan__nan_01-20': 2.324878048780488,
 'Afghanistan__nan_01-21': 2.0624390243902444,
 'Afghanistan__nan_01
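
The lookup tables built in the cell above are keyed by `<GeoID>_<MM-DD>`, with per-date averages as a fallback for regions that have no entry. A minimal sketch of that lookup, using made-up values and hypothetical `toy_*` names so it does not clash with the notebook's variables:

```python
# Toy lookup tables (values are made up for illustration).
toy_id_temp = {'Afghanistan__nan_01-01': 2.7}
toy_id_holiday = {'Afghanistan__nan_01-01': 1}
toy_date_temp_avg = {'01-01': 10.0}
toy_date_holiday_avg = {'01-01': 0}

def lookup_weather(geo_id, date_st):
    """Return (temperature, holiday), falling back to per-date averages."""
    key = geo_id + '_' + date_st
    if key in toy_id_temp:
        return toy_id_temp[key], toy_id_holiday[key]
    return toy_date_temp_avg[date_st], toy_date_holiday_avg[date_st]

print(lookup_weather('Afghanistan__nan', '01-01'))  # (2.7, 1)
print(lookup_weather('Atlantis__nan', '01-01'))     # (10.0, 0)
```

`Atlantis__nan` is a deliberately fictional region, used here only to trigger the fallback path.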

In [37]:
# Set number of past days to use to make predictions
nb_lookback_days = 30
# Per-sample bookkeeping, kept aligned with the rows of X_samples
date_ls = []
geoid_ls = []
country_ls = []
newcase_ls = []
# Create training data across all countries for predicting one day ahead
X_cols = cases_col + npi_cols
y_col = cases_col
X_samples = []
y_samples = []
geo_ids = df.GeoID.unique()
# The position of each GeoID in this list defines its one-hot slot
train_geo_ids = list(geo_ids)
for g in geo_ids:
    gdf = df[df.GeoID == g]
    all_case_data = np.array(gdf[cases_col])
    all_npi_data = np.array(gdf[npi_cols])

    # Create one sample for each day where we have enough data
    # Each sample consists of cases and npis for previous nb_lookback_days
    nb_total_days = len(gdf)
    for d in range(nb_lookback_days, nb_total_days - 1):
        X_cases = all_case_data[d-nb_lookback_days:d]

        # Take negative of npis to support positive
        # weight constraint in Lasso.
        X_npis = -all_npi_data[d - nb_lookback_days:d]
        
        date_ls.append(gdf['Date'].iloc[d])
        geoid_ls.append(gdf['GeoID'].iloc[d])
        country_ls.append(gdf['CountryName'].iloc[d])
        newcase_ls.append(gdf['NewCases'].iloc[d])
        
        date_st = str(date_ls[-1])[5:10] 
        id_ = geoid_ls[-1] + '_' + date_st

        temperature = date_temp_avg[date_st]
        holiday = date_holiday_avg[date_st]
        if id_ in id_temp:
            temperature = id_temp[id_]
            holiday = id_holiday[id_]            
        
        # One-hot encode the region, then flatten all input data
        # so it fits Lasso input format.
        geoid_arr = np.zeros(len(train_geo_ids)+1)
        geoid_arr[ train_geo_ids.index(g) ] = 1
        X_sample = np.concatenate([geoid_arr, [temperature,holiday], X_cases.flatten(),
                                   X_npis.flatten()])
        y_sample = all_case_data[d]
        X_samples.append(X_sample)
        y_samples.append(y_sample)


X_samples = np.array(X_samples)
y_samples = np.array(y_samples).flatten()
with open('train_geo_ids.txt', 'w') as f:
    f.write('\n'.join(train_geo_ids))
    
print(X_samples.shape)

(85440, 660)
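
The 660 columns reported above break down as follows: with 30 lookback days, 1 case column and 12 NPI columns, the flattened windows contribute 30 + 30 × 12 = 390 columns, temperature and holiday add 2, and the remaining 268 are the region indicator vector. The windowing itself can be sketched on a toy series. This is a simplified illustration, not the notebook's loop: it omits the region, temperature and holiday features, and it uses every available day, whereas the notebook's loop stops one day earlier.

```python
import numpy as np

def make_samples(cases, npis, lookback):
    """Build (X, y) one-day-ahead samples from one region's daily series.

    cases: shape (n_days,); npis: shape (n_days, n_npis).
    """
    X, y = [], []
    for d in range(lookback, len(cases)):
        past_cases = cases[d - lookback:d]
        past_npis = -npis[d - lookback:d]   # negated, as in the notebook
        X.append(np.concatenate([past_cases, past_npis.flatten()]))
        y.append(cases[d])
    return np.array(X), np.array(y)

cases = np.arange(10, dtype=float)          # toy series: 0, 1, ..., 9
npis = np.ones((10, 2))                     # two toy NPI columns
X, y = make_samples(cases, npis, lookback=3)
print(X.shape, y.shape)                     # (7, 9) (7,)
```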


In [38]:
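# Split chronologically: train on samples up to 2020-07-31, test on later ones.
# date_ls holds pandas Timestamps, so compare against pd.Timestamp
# (ordering comparisons with datetime.date raise a TypeError in pandas 2.x).
train_end = pd.Timestamp('2020-07-31')
test_start = pd.Timestamp('2020-08-01')
train_idx = np.array([i for i, d in enumerate(date_ls) if d <= train_end])
test_idx = np.array([i for i, d in enumerate(date_ls) if d >= test_start])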

In [39]:
# Helpful function to compute mae
def mae(pred, true):
    return np.mean(np.abs(pred - true))
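
A quick worked example of the metric, with made-up numbers: the absolute errors below are 1, 0 and 2, so the mean absolute error is 3 / 3 = 1. The function is repeated here so the snippet runs on its own; note that it expects NumPy arrays, since plain Python lists do not support element-wise subtraction.

```python
import numpy as np

def mae(pred, true):
    return np.mean(np.abs(pred - true))

pred = np.array([1.0, 2.0, 3.0])
true = np.array([2.0, 2.0, 5.0])
print(mae(pred, true))  # 1.0
```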

In [40]:
# Split data into train and test sets
X_train, X_test = X_samples[train_idx, :], X_samples[test_idx, :]
y_train, y_test = y_samples[train_idx], y_samples[test_idx]
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

(48861, 660) (36579, 660) (48861,) (36579,)
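
The split above is chronological rather than random, so the test set contains only days that come after every training day and no future information leaks into training. An equivalent way to express such a split is a boolean mask over a date array, sketched here on toy data (the dates and features are made up):

```python
import numpy as np

dates = np.array(['2020-07-30', '2020-07-31', '2020-08-01', '2020-08-02'],
                 dtype='datetime64[D]')
X = np.arange(8).reshape(4, 2)              # one toy feature row per date

cutoff = np.datetime64('2020-07-31')
train_mask = dates <= cutoff                # True for days up to the cutoff
X_train, X_test = X[train_mask], X[~train_mask]
print(X_train.shape, X_test.shape)          # (2, 2) (2, 2)
```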


In [41]:
!pip install lightgbm xgboost



In [42]:
import random
def seed_everything(seed=0):
    random.seed(seed)
    np.random.seed(seed)
seed_everything(42) 

In [43]:
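# Create and train Lasso model.
# Note: positive=True (which would enforce the assumption that cases are
# positively correlated with future cases and npis are negatively
# correlated) is not set here, so the weights are unconstrained.
os.makedirs('models', exist_ok=True)  # the model files below are saved here

lasso_model = Lasso(random_state=42)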
lasso_model.fit(X_train, y_train)

print('Lasso result:')
# Evaluate model
train_preds = lasso_model.predict(X_train)
train_preds = np.maximum(train_preds, 0) # Don't predict negative cases
print('Train MAE:', mae(train_preds, y_train))
test_preds = lasso_model.predict(X_test)
test_preds = np.maximum(test_preds, 0) # Don't predict negative cases
print('Test MAE:', mae(test_preds, y_test))
with open('models/model_lasso.pkl', 'wb') as model_file:
    pickle.dump(lasso_model, model_file)

from lightgbm import LGBMRegressor
lgbm_model = LGBMRegressor(random_state=42)
lgbm_model.fit(X_train, y_train)
# Evaluate model
train_preds = lgbm_model.predict(X_train)
train_preds = np.maximum(train_preds, 0) # Don't predict negative cases
print('lgbm result:')
print('Train MAE:', mae(train_preds, y_train))
test_preds = lgbm_model.predict(X_test)
test_preds = np.maximum(test_preds, 0) # Don't predict negative cases
print('Test MAE:', mae(test_preds, y_test))
with open('models/model_lgbm.pkl', 'wb') as model_file:
    pickle.dump(lgbm_model, model_file)

import xgboost as xgb
xgb_model = xgb.XGBRegressor(random_state=42)
xgb_model.fit(X_train, y_train)
# Evaluate model
train_preds = xgb_model.predict(X_train)
train_preds = np.maximum(train_preds, 0) # Don't predict negative cases
print('xgb result:')
print('Train MAE:', mae(train_preds, y_train))
test_preds = xgb_model.predict(X_test)
test_preds = np.maximum(test_preds, 0) # Don't predict negative cases
print('Test MAE:', mae(test_preds, y_test))
with open('models/model_xgb.pkl', 'wb') as model_file:
    pickle.dump(xgb_model, model_file)


Lasso result:
Train MAE: 118.72727991825965
Test MAE: 447.0125097404181
lgbm result:
Train MAE: 84.90123948303615
Test MAE: 636.8706391310907
xgb result:
Train MAE: 52.951500424283076
Test MAE: 624.6861186879314
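
The numbers above can be compared programmatically. The sketch below copies the reported MAE values, picks the model with the lowest test MAE, and computes the ratio of test to train error as a rough indicator of overfitting (the ratio is an illustrative heuristic, not something the notebook computes).

```python
# MAE values copied from the output above.
results = {
    'lasso': {'train': 118.72727991825965, 'test': 447.0125097404181},
    'lgbm':  {'train': 84.90123948303615,  'test': 636.8706391310907},
    'xgb':   {'train': 52.951500424283076, 'test': 624.6861186879314},
}

best = min(results, key=lambda name: results[name]['test'])
print(best)  # lasso

# Ratio of test to train error: larger values suggest more overfitting.
gap = {name: r['test'] / r['train'] for name, r in results.items()}
```

The tree-based models fit the training period more closely but generalize worse to the later test period, which is consistent with saving the Lasso model as the final predictor.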


In [44]:
# With geoid:
# Lasso result:
# Train MAE: 118.69420734991954
# Test MAE: 447.06846062728135
# lgbm result:
# Train MAE: 85.51717187475573
# Test MAE: 650.7371662424473
# xgb result:
# Train MAE: 54.2057830354577
# Test MAE: 633.3338386238983

In [45]:
# Without geoid:
# Lasso result:
# Train MAE: 119.2762230373797
# Test MAE: 469.3599710917388
# lgbm result:
# Train MAE: 85.58646245869078
# Test MAE: 665.2549123382636
# xgb result:
# Train MAE: 51.797418546131624
# Test MAE: 650.3725903829637

In [46]:
# Save the best model (Lasso has the lowest test MAE above) to file
if not os.path.exists('models'):
    os.mkdir('models')
with open('models/model.pkl', 'wb') as model_file:
    pickle.dump(lasso_model, model_file)
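
For reference, loading a pickled model back and clipping its predictions follows the same pattern as the training cells. The sketch below uses a plain dictionary of weights as a hypothetical stand-in for the fitted model and a temporary directory instead of `models/`, so it runs on its own.

```python
import os
import pickle
import tempfile
import numpy as np

# Hypothetical stand-in for a fitted linear model.
weights = {'coef': np.array([1.0, -2.0]), 'intercept': 0.5}

path = os.path.join(tempfile.mkdtemp(), 'model.pkl')
with open(path, 'wb') as model_file:
    pickle.dump(weights, model_file)

with open(path, 'rb') as model_file:
    loaded = pickle.load(model_file)

X_new = np.array([[1.0, 0.0], [0.0, 1.0]])
raw = X_new @ loaded['coef'] + loaded['intercept']   # [1.5, -1.5]
preds = np.maximum(raw, 0)                           # don't predict negative cases
print(preds.tolist())                                # [1.5, 0.0]
```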

## Evaluation

Now that the predictor has been trained and saved, this section contains the functionality for evaluating it on sample evaluation data.
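
The `predict` module is not shown in this notebook, so the following is only a sketch of the general day-by-day rollout idea under simplifying assumptions: a one-day-ahead linear model, a single NPI level per future day, and hypothetical names (`rollout`, `case_w`, `npi_w`). Each prediction is clipped at zero and then fed back in as the newest case input for the next day.

```python
import numpy as np

def rollout(past_cases, future_npis, case_w, npi_w, intercept, lookback):
    """Roll a one-day-ahead linear model forward over future_npis.

    past_cases:  shape (>= lookback,), known daily new cases
    future_npis: shape (n_future_days,), one NPI level per future day
    case_w:      shape (lookback,), weights on the case window
    npi_w:       scalar weight on the negated NPI level
    """
    window = list(past_cases[-lookback:])
    preds = []
    for npi in future_npis:
        pred = float(np.dot(case_w, window) + npi_w * (-npi) + intercept)
        pred = max(pred, 0.0)            # don't predict negative cases
        preds.append(pred)
        window = window[1:] + [pred]     # the prediction becomes the newest input
    return preds

preds = rollout(past_cases=np.array([10.0, 10.0, 10.0]),
                future_npis=np.array([0.0, 1.0, 1.0]),
                case_w=np.array([0.0, 0.0, 1.0]),
                npi_w=2.0, intercept=0.0, lookback=3)
print(preds)  # [10.0, 8.0, 6.0]
```

Because each step consumes earlier predictions, errors can compound over the rollout horizon.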

In [47]:
# Reload the module to get the latest changes
import predict
from importlib import reload
reload(predict)
from predict import predict_df

In [48]:
%%time
preds_df = predict_df("2020-08-01", "2020-08-31", path_to_ips_file="data/2020-09-30_historical_ip.csv", verbose=True)


Predicting for Aruba__nan
2020-08-01: 0
2020-08-02: 0
2020-08-03: 0
2020-08-04: 0
2020-08-05: 0
2020-08-06: 7.3965300629590915
2020-08-07: 10.650240165017394
2020-08-08: 7.500342299602266
2020-08-09: 3.3086164764963346
2020-08-10: 3.5415513216760584
2020-08-11: 0
2020-08-12: 0
2020-08-13: 5.139528777691787
2020-08-14: 1.802766263031657
2020-08-15: 0
2020-08-16: 0
2020-08-17: 0
2020-08-18: 0
2020-08-19: 0
2020-08-20: 0
2020-08-21: 0.3595418268590942
2020-08-22: 0
2020-08-23: 6.329626078578531
2020-08-24: 14.454134376543239
2020-08-25: 6.853029041836848
2020-08-26: 10.243713649480313
2020-08-27: 17.353831351393836
2020-08-28: 15.128847381115335
2020-08-29: 9.250552786050742
2020-08-30: 14.473732459851668
2020-08-31: 31.46883191540568

Predicting for Afghanistan__nan
2020-08-01: 92.87378979947317
2020-08-02: 1.2652484666498758
2020-08-03: 72.53020814220258
2020-08-04: 134.0990297639328
2020-08-05: 127.21088262915066
2020-08-06: 95.64459781506174
2020-08-07: 14.327124661152077
2020-08-08:

2020-08-17: 97.2497800880091
2020-08-18: 105.54466940877651
2020-08-19: 59.143306633523686
2020-08-20: 10.424398871410167
2020-08-21: 0
2020-08-22: 0
2020-08-23: 35.462821769489636
2020-08-24: 59.41867208762467
2020-08-25: 82.37056712171913
2020-08-26: 14.477463916504071
2020-08-27: 4.684671631350909
2020-08-28: 0
2020-08-29: 0
2020-08-30: 10.140008035433283
2020-08-31: 37.27947479522764

Predicting for Bangladesh__nan
2020-08-01: 1101.3534838926716
2020-08-02: 1113.7101495246338
2020-08-03: 688.599134700157
2020-08-04: 665.9745256113409
2020-08-05: 1152.9379657691738
2020-08-06: 1018.8180135788922
2020-08-07: 620.2749286533898
2020-08-08: 841.2700974101906
2020-08-09: 615.4720144149461
2020-08-10: 467.4535372581885
2020-08-11: 432.9180618108881
2020-08-12: 636.3081872356507
2020-08-13: 775.1193300850983
2020-08-14: 396.32852836417004
2020-08-15: 281.8837732796503
2020-08-16: 449.16190542214827
2020-08-17: 245.47465967834916
2020-08-18: 168.8876308871254
2020-08-19: 704.7597340377791
2

2020-08-01: 2294.9466771660714
2020-08-02: 447.8062468451571
2020-08-03: 5.947527008132425
2020-08-04: 0
2020-08-05: 1595.9082219080483
2020-08-06: 1327.3808124247757
2020-08-07: 1260.7030296010616
2020-08-08: 1780.0788827946767
2020-08-09: 0
2020-08-10: 0
2020-08-11: 112.58471814012739
2020-08-12: 941.0976010225348
2020-08-13: 1258.0922821625677
2020-08-14: 1898.4558916691335
2020-08-15: 1018.6096097413744
2020-08-16: 233.79742030678995
2020-08-17: 0
2020-08-18: 0
2020-08-19: 1681.8663463374564
2020-08-20: 1995.2693329089793
2020-08-21: 2065.208355327036
2020-08-22: 1380.9761211077166
2020-08-23: 841.3429733398646
2020-08-24: 0
2020-08-25: 492.62682775815347
2020-08-26: 1812.262595467269
2020-08-27: 2358.973562797615
2020-08-28: 2368.30486260148
2020-08-29: 1519.8687667646586
2020-08-30: 932.6653408188553
2020-08-31: 173.74991897770644

Predicting for Brazil__Ceara
2020-08-01: 564.411543627497
2020-08-02: 0
2020-08-03: 0
2020-08-04: 0
2020-08-05: 418.16008994772267
2020-08-06: 846.585

2020-08-22: 2051.5198891588975
2020-08-23: 919.5568050812745
2020-08-24: 0
2020-08-25: 740.8818468096574
2020-08-26: 3165.0304897087667
2020-08-27: 4690.0482646073115
2020-08-28: 4643.796137150107
2020-08-29: 2221.209999693318
2020-08-30: 952.0588365109246
2020-08-31: 15.39060916399535

Predicting for Brazil__Rio de Janeiro
2020-08-01: 1691.2896829481738
2020-08-02: 0
2020-08-03: 0
2020-08-04: 0
2020-08-05: 53.48087021697734
2020-08-06: 1032.6587064701312
2020-08-07: 1989.5146038850899
2020-08-08: 1764.6324699455517
2020-08-09: 0
2020-08-10: 0
2020-08-11: 0
2020-08-12: 0
2020-08-13: 1234.4514102579656
2020-08-14: 2331.8052015718536
2020-08-15: 1580.6045136272019
2020-08-16: 625.3113191854849
2020-08-17: 0
2020-08-18: 0
2020-08-19: 713.4742981997001
2020-08-20: 1617.1130464888433
2020-08-21: 2478.3298544459963
2020-08-22: 1959.5158789044153
2020-08-23: 989.3896280775207
2020-08-24: 0
2020-08-25: 319.9363211925628
2020-08-26: 953.6420490952812
2020-08-27: 1856.1193853378677
2020-08-28: 2

2020-08-01: 188.50458139010877
2020-08-02: 111.82493254100457
2020-08-03: 0
2020-08-04: 52.37436895518755
2020-08-05: 240.46681512948163
2020-08-06: 73.29332368177393
2020-08-07: 18.226884859487928
2020-08-08: 179.28820291794716
2020-08-09: 148.07302562520982
2020-08-10: 0
2020-08-11: 67.13835054589798
2020-08-12: 227.23570384108214
2020-08-13: 124.19273025128047
2020-08-14: 20.721185548446677
2020-08-15: 187.26176959255707
2020-08-16: 203.96625576830684
2020-08-17: 0
2020-08-18: 114.5291272926219
2020-08-19: 251.1828364824043
2020-08-20: 202.62144893651967
2020-08-21: 76.82661541656384
2020-08-22: 212.05916738561393
2020-08-23: 230.58156516422332
2020-08-24: 77.32430134919089
2020-08-25: 151.62590324288834
2020-08-26: 259.89532190536715
2020-08-27: 240.8640081972266
2020-08-28: 125.54100238483596
2020-08-29: 223.62242493170865
2020-08-30: 250.54215718510997
2020-08-31: 121.71098766268402

Predicting for Central African Republic__nan
2020-08-01: 0
2020-08-02: 0
2020-08-03: 0
2020-08-04

2020-08-22: 392.8301449685231
2020-08-23: 366.8313680317299
2020-08-24: 20.4190451199676
2020-08-25: 246.15201141524196
2020-08-26: 1229.027349610607
2020-08-27: 1459.6011952977228
2020-08-28: 605.606216504921
2020-08-29: 284.61945340093496
2020-08-30: 313.1677151557178
2020-08-31: 47.08161452416179

Predicting for Cuba__nan
2020-08-01: 19.10996718068786
2020-08-02: 37.51165073544959
2020-08-03: 44.57936140420916
2020-08-04: 34.26773575047889
2020-08-05: 50.65523334836986
2020-08-06: 48.355446175058376
2020-08-07: 29.37310351025511
2020-08-08: 29.106551478463828
2020-08-09: 20.21962359107407
2020-08-10: 28.254029365672878
2020-08-11: 37.37606951702719
2020-08-12: 12.501859859917527
2020-08-13: 14.684260143451624
2020-08-14: 0
2020-08-15: 0
2020-08-16: 0
2020-08-17: 0
2020-08-18: 0
2020-08-19: 0
2020-08-20: 0
2020-08-21: 0
2020-08-22: 0
2020-08-23: 0
2020-08-24: 0
2020-08-25: 0
2020-08-26: 0
2020-08-27: 23.86744222044201
2020-08-28: 0
2020-08-29: 0
2020-08-30: 0
2020-08-31: 0

Predictin

2020-08-01: 366.27894184932256
2020-08-02: 413.13023809829605
2020-08-03: 406.2015663584109
2020-08-04: 258.0851714521765
2020-08-05: 230.53560921960414
2020-08-06: 129.09629887274508
2020-08-07: 241.27778542536035
2020-08-08: 454.51985064922576
2020-08-09: 401.58821313615726
2020-08-10: 450.2509707789668
2020-08-11: 390.3620191985599
2020-08-12: 163.35708147168782
2020-08-13: 163.9719401709682
2020-08-14: 304.2397597093524
2020-08-15: 377.0424634824001
2020-08-16: 476.3874872697385
2020-08-17: 485.34273226531
2020-08-18: 356.8032416843239
2020-08-19: 212.62504749355486
2020-08-20: 166.35393346543572
2020-08-21: 245.526312518086
2020-08-22: 399.77623315735286
2020-08-23: 529.7493215617777
2020-08-24: 474.36509790514157
2020-08-25: 379.71394810317025
2020-08-26: 203.76958597681556
2020-08-27: 199.07943983399878
2020-08-28: 263.83082054212576
2020-08-29: 409.8546278704631
2020-08-30: 520.8735359811635
2020-08-31: 494.61121242821025

Predicting for Ethiopia__nan
2020-08-01: 343.8857229928

2020-08-23: 140.46347818124005
2020-08-24: 150.06375252408867
2020-08-25: 149.47754214225387
2020-08-26: 146.11545041571478
2020-08-27: 149.28683300132082
2020-08-28: 93.77289527917809
2020-08-29: 48.03273468354117
2020-08-30: 96.40549480490104
2020-08-31: 119.25021276073235

Predicting for Guinea__nan
2020-08-01: 6.25626541205008
2020-08-02: 0
2020-08-03: 0
2020-08-04: 0
2020-08-05: 0
2020-08-06: 0
2020-08-07: 0
2020-08-08: 0
2020-08-09: 0
2020-08-10: 0
2020-08-11: 0
2020-08-12: 0
2020-08-13: 0
2020-08-14: 0
2020-08-15: 0
2020-08-16: 0
2020-08-17: 0
2020-08-18: 0
2020-08-19: 0
2020-08-20: 0
2020-08-21: 0
2020-08-22: 0
2020-08-23: 0
2020-08-24: 0
2020-08-25: 0
2020-08-26: 0
2020-08-27: 0
2020-08-28: 0
2020-08-29: 0
2020-08-30: 0
2020-08-31: 0

Predicting for Gambia__nan
2020-08-01: 0
2020-08-02: 0
2020-08-03: 1.606151303174375
2020-08-04: 13.705582545153492
2020-08-05: 27.95539976061461
2020-08-06: 41.997683537706415
2020-08-07: 13.311137975566684
2020-08-08: 6.502610025517328
2020-08-

2020-08-26: 0
2020-08-27: 25.28867613941287
2020-08-28: 0
2020-08-29: 0
2020-08-30: 0
2020-08-31: 3.841091313876893

Predicting for Hungary__nan
2020-08-01: 3534.368578961256
2020-08-02: 4520.6606046277775
2020-08-03: 3647.773176777139
2020-08-04: 2962.010106684035
2020-08-05: 1982.4623269027447
2020-08-06: 0
2020-08-07: 616.1605890576209
2020-08-08: 3039.6652140746983
2020-08-09: 3804.8607698551486
2020-08-10: 3539.382708789574
2020-08-11: 2693.5248689972395
2020-08-12: 1140.4305915644031
2020-08-13: 0
2020-08-14: 208.61342982048228
2020-08-15: 2433.467216428646
2020-08-16: 4163.298734218894
2020-08-17: 3533.864560312269
2020-08-18: 2253.0267627833855
2020-08-19: 1241.9619435427712
2020-08-20: 0
2020-08-21: 61.81940852940863
2020-08-22: 2597.305854888769
2020-08-23: 4425.405851058583
2020-08-24: 3853.962439218183
2020-08-25: 2553.308824481514
2020-08-26: 1105.8342164523815
2020-08-27: 0
2020-08-28: 387.70442554989773
2020-08-29: 2680.9728056035915
2020-08-30: 4622.657905049915
2020-08

2020-08-13: 272.2123027922438
2020-08-14: 305.13061408209586
2020-08-15: 248.0679175571854
2020-08-16: 446.9416165966736
2020-08-17: 370.27507898690993
2020-08-18: 180.27021307944815
2020-08-19: 200.6845098547127
2020-08-20: 172.25589016620444
2020-08-21: 158.0832609725942
2020-08-22: 216.03869315474768
2020-08-23: 395.71152377408936
2020-08-24: 305.41627970591344
2020-08-25: 197.14261663817217
2020-08-26: 120.4020012035547
2020-08-27: 63.15413022422911
2020-08-28: 120.82264505572175
2020-08-29: 141.89141280558445
2020-08-30: 296.24466474115866
2020-08-31: 235.78592348606924

Predicting for Kenya__nan
2020-08-01: 622.2058182195998
2020-08-02: 631.0519815267294
2020-08-03: 400.7716406610652
2020-08-04: 76.89023262129965
2020-08-05: 0
2020-08-06: 0
2020-08-07: 281.29594208068926
2020-08-08: 552.7306456568724
2020-08-09: 505.76295560630194
2020-08-10: 351.0133036764925
2020-08-11: 0
2020-08-12: 0
2020-08-13: 3.1095145682263237
2020-08-14: 291.7958036656961
2020-08-15: 451.1281236992421
20


Predicting for Latvia__nan
2020-08-01: 619.1481226280434
2020-08-02: 551.6260117827235
2020-08-03: 463.13170943679495
2020-08-04: 195.73148709584652
2020-08-05: 46.01852364345527
2020-08-06: 138.99701293411735
2020-08-07: 363.9333184739442
2020-08-08: 586.6264651385635
2020-08-09: 507.23006713348735
2020-08-10: 389.6297604788447
2020-08-11: 188.69346392550932
2020-08-12: 0
2020-08-13: 60.220128082383255
2020-08-14: 365.82555324654993
2020-08-15: 436.11994769212276
2020-08-16: 492.2747366130893
2020-08-17: 339.5210197035172
2020-08-18: 101.04342799056117
2020-08-19: 0
2020-08-20: 25.55746752353413
2020-08-21: 283.9005708892628
2020-08-22: 434.8489135780891
2020-08-23: 527.7498867113244
2020-08-24: 325.3637841066796
2020-08-25: 100.61691440312455
2020-08-26: 0
2020-08-27: 0
2020-08-28: 288.4364088238622
2020-08-29: 446.70302263526287
2020-08-30: 517.156119954909
2020-08-31: 320.50474210584565

Predicting for Macao__nan
2020-08-01: 0
2020-08-02: 1.8102307627029361
2020-08-03: 0.293968861

2020-08-03: 0
2020-08-04: 0
2020-08-05: 8.204059351115031
2020-08-06: 12.458425159502713
2020-08-07: 13.901972143316772
2020-08-08: 14.610660570035556
2020-08-09: 13.674775173418826
2020-08-10: 17.633170629448216
2020-08-11: 20.28851155654506
2020-08-12: 22.84435992145731
2020-08-13: 26.593415580813286
2020-08-14: 23.54424786673299
2020-08-15: 13.911132002838073
2020-08-16: 12.135550741181534
2020-08-17: 12.772580794832258
2020-08-18: 10.24830604249947
2020-08-19: 1.5236103829928078
2020-08-20: 0
2020-08-21: 0
2020-08-22: 0
2020-08-23: 0
2020-08-24: 0
2020-08-25: 7.659391997283922
2020-08-26: 12.747626062687772
2020-08-27: 15.640324321866046
2020-08-28: 6.926579899794151
2020-08-29: 10.091089219812442
2020-08-30: 14.228700106112758
2020-08-31: 22.646660710822694

Predicting for Malaysia__nan
2020-08-01: 875.2803485686396
2020-08-02: 1147.2326317077404
2020-08-03: 1250.8392284547695
2020-08-04: 923.2984237813042
2020-08-05: 1308.3343413763691
2020-08-06: 1161.7377170349178
2020-08-07: 6

2020-08-06: 53.25350672935657
2020-08-07: 61.42763246020096
2020-08-08: 74.08912818322821
2020-08-09: 93.53115360322553
2020-08-10: 104.83286344869089
2020-08-11: 122.2435594574772
2020-08-12: 140.95373030406182
2020-08-13: 126.70562045915126
2020-08-14: 123.07800666806641
2020-08-15: 116.59604966593369
2020-08-16: 101.07609706359081
2020-08-17: 97.26369097775542
2020-08-18: 73.89961513517609
2020-08-19: 52.092783204177465
2020-08-20: 25.349086994889948
2020-08-21: 2.503241036528715
2020-08-22: 0
2020-08-23: 0
2020-08-24: 0
2020-08-25: 0
2020-08-26: 0
2020-08-27: 0
2020-08-28: 0
2020-08-29: 0
2020-08-30: 0
2020-08-31: 0

Predicting for Oman__nan
2020-08-01: 52.52567969472456
2020-08-02: 0
2020-08-03: 104.31173718405674
2020-08-04: 396.30186743077263
2020-08-05: 376.3098837561935
2020-08-06: 84.63584594116978
2020-08-07: 0
2020-08-08: 0
2020-08-09: 0
2020-08-10: 44.159374404573924
2020-08-11: 313.92483985758304
2020-08-12: 339.52724152898145
2020-08-13: 0
2020-08-14: 0
2020-08-15: 0
202

2020-08-25: 428.46619136664384
2020-08-26: 108.29467186815225
2020-08-27: 0
2020-08-28: 0
2020-08-29: 67.6519122021337
2020-08-30: 364.1222928236497
2020-08-31: 567.4054917965435

Predicting for Romania__nan
2020-08-01: 5685.273408504384
2020-08-02: 5840.045443088952
2020-08-03: 4927.484432498016
2020-08-04: 2801.3688038892947
2020-08-05: 2065.296786998311
2020-08-06: 2384.7476356636494
2020-08-07: 3474.6760289890685
2020-08-08: 5439.937386507519
2020-08-09: 5053.918720516231
2020-08-10: 4728.719553533942
2020-08-11: 2619.514889691185
2020-08-12: 992.5087246017426
2020-08-13: 2139.07651497915
2020-08-14: 3479.714970547032
2020-08-15: 3870.513401934474
2020-08-16: 5189.449899539198
2020-08-17: 4195.303164316058
2020-08-18: 1810.0682067484572
2020-08-19: 1532.6927247306007
2020-08-20: 1985.1489072783188
2020-08-21: 3214.3113220444766
2020-08-22: 4305.7467485888
2020-08-23: 5520.6302780557835
2020-08-24: 4150.10560833243
2020-08-25: 2306.7120849685703
2020-08-26: 1301.6791270738781
2020-0

2020-08-01: 0
2020-08-02: 0
2020-08-03: 0
2020-08-04: 0
2020-08-05: 0
2020-08-06: 0
2020-08-07: 0
2020-08-08: 0
2020-08-09: 0
2020-08-10: 0
2020-08-11: 0
2020-08-12: 0
2020-08-13: 0
2020-08-14: 0
2020-08-15: 0
2020-08-16: 0
2020-08-17: 0
2020-08-18: 0
2020-08-19: 0
2020-08-20: 0
2020-08-21: 0
2020-08-22: 0
2020-08-23: 0
2020-08-24: 0
2020-08-25: 0
2020-08-26: 0
2020-08-27: 0
2020-08-28: 0
2020-08-29: 0
2020-08-30: 0
2020-08-31: 0

Predicting for Suriname__nan
2020-08-01: 0
2020-08-02: 0
2020-08-03: 0
2020-08-04: 0
2020-08-05: 0
2020-08-06: 0
2020-08-07: 0
2020-08-08: 0
2020-08-09: 0
2020-08-10: 0
2020-08-11: 0
2020-08-12: 0
2020-08-13: 0
2020-08-14: 0
2020-08-15: 0
2020-08-16: 0
2020-08-17: 0
2020-08-18: 0
2020-08-19: 0
2020-08-20: 0
2020-08-21: 0
2020-08-22: 0
2020-08-23: 0
2020-08-24: 0.5054027321407659
2020-08-25: 4.6368743087710875
2020-08-26: 9.48421872837389
2020-08-27: 0
2020-08-28: 0
2020-08-29: 0
2020-08-30: 0
2020-08-31: 0

Predicting for Slovak Republic__nan
2020-08-01: 2167

2020-08-09: 68.1314655700804
2020-08-10: 77.93073636857662
2020-08-11: 92.07493634078843
2020-08-12: 106.16418595758842
2020-08-13: 117.59273808225834
2020-08-14: 120.25480228501112
2020-08-15: 117.79464330424366
2020-08-16: 121.30949429243555
2020-08-17: 127.04896783760833
2020-08-18: 115.15436906886023
2020-08-19: 108.35216996299036
2020-08-20: 106.50248953258071
2020-08-21: 99.78415298403317
2020-08-22: 91.87749557438889
2020-08-23: 94.07630096315773
2020-08-24: 89.99189331784822
2020-08-25: 63.747081021138406
2020-08-26: 57.88321913055822
2020-08-27: 86.95584049925613
2020-08-28: 29.75047384049067
2020-08-29: 11.734275618822178
2020-08-30: 0
2020-08-31: 0

Predicting for Trinidad and Tobago__nan
2020-08-01: 0
2020-08-02: 4.286984429477979
2020-08-03: 1.316775989355273
2020-08-04: 9.617350561991987
2020-08-05: 19.92623649054187
2020-08-06: 12.283320337940944
2020-08-07: 25.21865331298713
2020-08-08: 35.15365687049399
2020-08-09: 36.979031508235934
2020-08-10: 33.33196456459913
2020-

2020-08-22: 103745.38311527003
2020-08-23: 151219.71503703113
2020-08-24: 135460.11109239748
2020-08-25: 112173.36195278745
2020-08-26: 102904.22224869014
2020-08-27: 92317.32817701304
2020-08-28: 92241.73967421711
2020-08-29: 103533.57116548107
2020-08-30: 152109.32199776766
2020-08-31: 143316.51133920567

Predicting for United States__Alaska
2020-08-01: 388.7244655232331
2020-08-02: 384.33495931554967
2020-08-03: 391.0499181811665
2020-08-04: 352.40900634443733
2020-08-05: 148.56290842032962
2020-08-06: 67.27332855586133
2020-08-07: 130.06733188132506
2020-08-08: 235.32399992096072
2020-08-09: 263.8347800168097
2020-08-10: 321.25497647823806
2020-08-11: 205.33734845288694
2020-08-12: 20.837234876325645
2020-08-13: 0
2020-08-14: 0
2020-08-15: 108.49221183654224
2020-08-16: 214.39107223139268
2020-08-17: 213.67647327412024
2020-08-18: 140.78421565028668
2020-08-19: 30.846743517880526
2020-08-20: 0
2020-08-21: 0
2020-08-22: 128.6463118503752
2020-08-23: 252.01230097528003
2020-08-24: 25

2020-08-23: 513.859441091874
2020-08-24: 520.8547635607384
2020-08-25: 464.73026510579814
2020-08-26: 358.2216158832226
2020-08-27: 197.79431427580272
2020-08-28: 62.23270179560269
2020-08-29: 155.26911244289332
2020-08-30: 417.8054216618459
2020-08-31: 498.5533459673647

Predicting for United States__Florida
2020-08-01: 8176.913043685265
2020-08-02: 9147.739119841257
2020-08-03: 8757.412559423195
2020-08-04: 5545.652295258868
2020-08-05: 5464.623152941039
2020-08-06: 5315.151277964112
2020-08-07: 5239.9846899782115
2020-08-08: 7792.518325174635
2020-08-09: 7786.707503408061
2020-08-10: 8742.51663868314
2020-08-11: 5288.458964231414
2020-08-12: 4118.338879214284
2020-08-13: 4126.048944458361
2020-08-14: 4680.600204928854
2020-08-15: 5933.168548153663
2020-08-16: 8383.307714047214
2020-08-17: 7399.688342461076
2020-08-18: 4832.733886831855
2020-08-19: 4761.023519405207
2020-08-20: 3859.931490401353
2020-08-21: 3986.227941582209
2020-08-22: 6043.956694694521
2020-08-23: 8741.908631855616

2020-08-19: 445.85596788568967
2020-08-20: 632.2237588159695
2020-08-21: 1157.9110694277401
2020-08-22: 1604.3587834956406
2020-08-23: 2300.023900359515
2020-08-24: 1693.341225727795
2020-08-25: 790.4429112509824
2020-08-26: 291.6589168910104
2020-08-27: 583.9418002141714
2020-08-28: 1307.7341663588413
2020-08-29: 1608.360811796023
2020-08-30: 2219.456487041377
2020-08-31: 1687.436616306039

Predicting for United States__Louisiana
2020-08-01: 1511.7852245944032
2020-08-02: 1358.664840900095
2020-08-03: 999.7686859558249
2020-08-04: 228.1563129534747
2020-08-05: 1113.473870353379
2020-08-06: 1503.085491695872
2020-08-07: 1595.1625331429614
2020-08-08: 1485.769620951749
2020-08-09: 450.3660008917587
2020-08-10: 602.2648383333213
2020-08-11: 58.16249110701122
2020-08-12: 775.7200558758842
2020-08-13: 617.5280367161862
2020-08-14: 1440.5744760991465
2020-08-15: 953.9580565678153
2020-08-16: 53.4806084138643
2020-08-17: 142.1845700284079
2020-08-18: 0
2020-08-19: 1011.6983464430044
2020-08-

2020-08-23: 223.4935974481665
2020-08-24: 167.57269476931208
2020-08-25: 0
2020-08-26: 0
2020-08-27: 0
2020-08-28: 123.48728945319404
2020-08-29: 126.974005690753
2020-08-30: 214.9998455450737
2020-08-31: 127.48563970765585

Predicting for United States__North Carolina
2020-08-01: 4976.305073353532
2020-08-02: 5991.557831344028
2020-08-03: 5732.1074105242715
2020-08-04: 4878.805406085229
2020-08-05: 4027.2448262015223
2020-08-06: 2928.873721489135
2020-08-07: 3250.3272567874824
2020-08-08: 5124.708837982523
2020-08-09: 5119.929614666632
2020-08-10: 6231.833329980691
2020-08-11: 4667.132683974891
2020-08-12: 3178.897549992485
2020-08-13: 2507.504371765125
2020-08-14: 2827.732869347356
2020-08-15: 3927.4853408134104
2020-08-16: 5481.5605365774145
2020-08-17: 5625.816977510288
2020-08-18: 4270.663245428693
2020-08-19: 3759.364939331095
2020-08-20: 2313.1843118782203
2020-08-21: 2433.971404442064
2020-08-22: 4123.171194370706
2020-08-23: 5835.688022728155
2020-08-24: 5855.7366058417365
202

2020-08-24: 4671.018531147586
2020-08-25: 4392.40825965829
2020-08-26: 4361.8128541832975
2020-08-27: 5188.150045028289
2020-08-28: 4113.26794005799
2020-08-29: 1910.1147471843892
2020-08-30: 4050.376282490039
2020-08-31: 4516.374598968173

Predicting for United States__Oklahoma
2020-08-01: 1782.872048054097
2020-08-02: 3100.7098853881125
2020-08-03: 4064.9767932018262
2020-08-04: 3059.408331754821
2020-08-05: 1350.4989620609308
2020-08-06: 1298.64774970109
2020-08-07: 958.9856255655584
2020-08-08: 1925.065931671446
2020-08-09: 2944.5867991335126
2020-08-10: 4268.512098513364
2020-08-11: 3056.069505763032
2020-08-12: 1145.783530873442
2020-08-13: 766.6698161416958
2020-08-14: 759.9351934299306
2020-08-15: 1611.6968741958462
2020-08-16: 2991.789507868009
2020-08-17: 4086.5078453855435
2020-08-18: 2836.2696891527917
2020-08-19: 1479.860465741613
2020-08-20: 472.858305528375
2020-08-21: 602.4062176285034
2020-08-22: 1541.2529796044555
2020-08-23: 3236.891797247703
2020-08-24: 4077.8736902

2020-08-29: 1523.1554440926734
2020-08-30: 2277.977760810981
2020-08-31: 1973.1572291235607

Predicting for United States__Virginia
2020-08-01: 1993.4509849186375
2020-08-02: 2433.788010349564
2020-08-03: 2875.115106415309
2020-08-04: 2584.5668962580753
2020-08-05: 2688.6545840638823
2020-08-06: 2162.636063697157
2020-08-07: 1652.8097130906558
2020-08-08: 2162.7941270993047
2020-08-09: 1997.7853676173977
2020-08-10: 2577.76417044014
2020-08-11: 2675.301554107247
2020-08-12: 1828.9063271590335
2020-08-13: 1565.9339211417214
2020-08-14: 1283.7275495796844
2020-08-15: 1117.6141854692087
2020-08-16: 1801.8784686491972
2020-08-17: 2278.0760304333353
2020-08-18: 2156.524743552883
2020-08-19: 2028.3197575083004
2020-08-20: 1488.3433078375706
2020-08-21: 941.0087943586467
2020-08-22: 1131.6875000188538
2020-08-23: 1993.7426876569289
2020-08-24: 2272.4348452085287
2020-08-25: 2363.9565486277647
2020-08-26: 1977.186301003288
2020-08-27: 1328.463163057373
2020-08-28: 960.3666202032495
2020-08-29:

2020-08-01: 6496.047274703903
2020-08-02: 6708.944699403232
2020-08-03: 6447.54702924026
2020-08-04: 4998.727767787101
2020-08-05: 4751.356388896319
2020-08-06: 4546.182931438604
2020-08-07: 4576.318522889795
2020-08-08: 7427.635150242105
2020-08-09: 6965.084564011252
2020-08-10: 6884.2988171838215
2020-08-11: 5784.768784785189
2020-08-12: 4065.8003415304597
2020-08-13: 3841.838214827196
2020-08-14: 4672.385526639867
2020-08-15: 5839.4893868372765
2020-08-16: 7080.144400287321
2020-08-17: 6697.322546440278
2020-08-18: 4882.587438137831
2020-08-19: 4569.3548115277345
2020-08-20: 3726.050938612474
2020-08-21: 4202.430454314473
2020-08-22: 5857.496072841093
2020-08-23: 7666.063547532426
2020-08-24: 6762.798854283226
2020-08-25: 5410.900878662588
2020-08-26: 4538.445171748349
2020-08-27: 3653.505336087444
2020-08-28: 4735.737544308275
2020-08-29: 5936.982725941978
2020-08-30: 7766.912536754831
2020-08-31: 6968.451227554587

Predicting for Zambia__nan
2020-08-01: 35.41710638908934
2020-08-0

In [49]:
# Check the predictions
preds_df.head()

Unnamed: 0,CountryName,RegionName,Date,PredictedDailyNewCases
213,Aruba,,2020-08-01,0.0
214,Aruba,,2020-08-02,0.0
215,Aruba,,2020-08-03,0.0
216,Aruba,,2020-08-04,0.0
217,Aruba,,2020-08-05,0.0


# Validation
During the competition, the predictor is called exactly as shown below, through predict.py and its command-line arguments.  
!!! PLEASE DO NOT CHANGE THE API !!!

In [50]:
!python predict.py -s 2020-08-01 -e 2020-08-04 -ip data/2020-09-30_historical_ip.csv -o predictions/2020-08-01_2020-08-04.csv

Generating predictions from 2020-08-01 to 2020-08-04...
Saved predictions to predictions/2020-08-01_2020-08-04.csv
Done!


In [51]:
!head predictions/2020-08-01_2020-08-04.csv

CountryName,RegionName,Date,PredictedDailyNewCases
Aruba,,2020-08-01,0.0
Aruba,,2020-08-02,0.0
Aruba,,2020-08-03,0.0
Aruba,,2020-08-04,0.0
Afghanistan,,2020-08-01,92.87378979947317
Afghanistan,,2020-08-02,1.2652484666498758
Afghanistan,,2020-08-03,72.53020814220258
Afghanistan,,2020-08-04,134.0990297639328
Angola,,2020-08-01,44.22028773532953
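
The file layout above (one row per country, region, and day, with a PredictedDailyNewCases column) can be sanity-checked with a few lines of pandas. The sketch below is only an illustration: `check_prediction_format` is a hypothetical helper, not the official validator, and the sample rows are the Aruba and Afghanistan rows from the output above, inlined so the snippet is self-contained.

```python
import io
import pandas as pd

# Rows copied from the prediction file shown above (Aruba and Afghanistan only).
SAMPLE_CSV = """CountryName,RegionName,Date,PredictedDailyNewCases
Aruba,,2020-08-01,0.0
Aruba,,2020-08-02,0.0
Aruba,,2020-08-03,0.0
Aruba,,2020-08-04,0.0
Afghanistan,,2020-08-01,92.87378979947317
Afghanistan,,2020-08-02,1.2652484666498758
Afghanistan,,2020-08-03,72.53020814220258
Afghanistan,,2020-08-04,134.0990297639328
"""

EXPECTED_COLUMNS = ["CountryName", "RegionName", "Date", "PredictedDailyNewCases"]


def check_prediction_format(preds, start_date, end_date):
    """Return a list of problems found (hypothetical helper, not the official validator)."""
    problems = []
    if list(preds.columns) != EXPECTED_COLUMNS:
        return [f"Unexpected columns: {list(preds.columns)}"]
    if preds["PredictedDailyNewCases"].isna().any():
        problems.append("Missing prediction values")
    if (preds["PredictedDailyNewCases"] < 0).any():
        problems.append("Negative predictions found")
    # Every country/region should have one row per requested day.
    expected_days = pd.date_range(start_date, end_date)
    keys = preds["CountryName"] + "__" + preds["RegionName"].fillna("")
    for key, group in preds.groupby(keys):
        missing = expected_days.difference(pd.DatetimeIndex(group["Date"]))
        if len(missing) > 0:
            problems.append(f"{key}: {len(missing)} day(s) missing")
    return problems


preds = pd.read_csv(io.StringIO(SAMPLE_CSV),
                    parse_dates=["Date"],
                    dtype={"RegionName": str})
problems = check_prediction_format(preds, "2020-08-01", "2020-08-04")
print(problems or "Format looks consistent")
```

The official check remains validate_submission, used in the next section; this sketch only covers the most visible properties of the file.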


# Test cases
Now that predict.py generates a prediction file, let's validate its output for a few scenarios.

In [52]:
import sys
# Make the covid_xprize package importable; adjust this path to your local checkout
sys.path.append('/home/thinng/code/2020/covid-xprize/')

In [53]:
import os
from covid_xprize.validation.predictor_validation import validate_submission

def validate(start_date, end_date, ip_file, output_file):
    # First, delete any potential old file
    try:
        os.remove(output_file)
    except OSError:
        pass
    
    # Then generate the prediction, calling the official API
    !python predict.py -s {start_date} -e {end_date} -ip {ip_file} -o {output_file}
    
    # And validate it
    errors = validate_submission(start_date, end_date, ip_file, output_file)
    if errors:
        for error in errors:
            print(error)
    else:
        print("All good!")
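
The `!python ...` line inside validate is IPython shell syntax, so the function only runs inside a notebook. A minimal sketch of the same call from a plain Python script, assuming predict.py is in the working directory and accepts the `-s`, `-e`, `-ip` and `-o` arguments used above; `build_predict_command` and `run_predictor` are hypothetical helper names:

```python
import subprocess
import sys


def build_predict_command(start_date, end_date, ip_file, output_file,
                          script="predict.py"):
    """Build the argument list for the predictor command line used above."""
    return [sys.executable, script,
            "-s", start_date,
            "-e", end_date,
            "-ip", ip_file,
            "-o", output_file]


def run_predictor(start_date, end_date, ip_file, output_file):
    """Run the predictor in a subprocess and return the CompletedProcess."""
    cmd = build_predict_command(start_date, end_date, ip_file, output_file)
    # capture_output keeps stdout/stderr so the caller can inspect failures
    return subprocess.run(cmd, capture_output=True, text=True)


cmd = build_predict_command("2020-08-01", "2020-08-04",
                            "data/2020-09-30_historical_ip.csv",
                            "predictions/val_4_days.csv")
print(" ".join(cmd[1:]))
```

Passing the arguments as a list avoids shell quoting issues with file names, and the return code of the subprocess can be checked before calling validate_submission.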

## 4 days, no gap
- All countries and regions
- Official number of cases is known up to start_date
- Intervention Plans are the official ones

In [54]:
validate(start_date="2020-08-01",
         end_date="2020-08-04",
         ip_file="data/2020-09-30_historical_ip.csv",
         output_file="predictions/val_4_days.csv")

Generating predictions from 2020-08-01 to 2020-08-04...
Saved predictions to predictions/val_4_days.csv
Done!
All good!


## 1 month in the future
- 2 countries only
- There is a gap between the date of the last known number of cases and start_date
- For future dates, the Intervention Plans file contains the scenarios for which predictions are requested, to answer the question: what will happen if we apply these plans?

In [55]:
%%time
validate(start_date="2021-01-01",
         end_date="2021-01-31",
         ip_file="data/future_ip.csv",
         output_file="predictions/val_1_month_future.csv")

Generating predictions from 2021-01-01 to 2021-01-31...
Saved predictions to predictions/val_1_month_future.csv
Done!
All good!
CPU times: user 268 ms, sys: 44.9 ms, total: 313 ms
Wall time: 7.02 s


## 180 days, from a future date, all countries and regions
- Prediction start date is 1 week from now (i.e. assuming the submission date is 1 week from now).  
- Prediction end date is 6 months after start date.  
- Prediction is requested for all available countries and regions.  
- Intervention plan scenario: freeze last known intervention plans for each country and region.  

The number of cases between today and the start date is not known yet, but the model relies on it, so the model first has to predict those cases and then use its own predictions as inputs.  
This is the most demanding test. Generating the prediction file should take less than 1 hour.
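
The idea of predicting the unknown days first and feeding them back as inputs can be sketched as follows. This is not the code in predict.py: `roll_forward` is a hypothetical helper, the 7-day window is an assumption, and the stand-in predictor is a plain moving average that ignores the npi data the real model uses.

```python
from datetime import date, timedelta


def roll_forward(known_cases, last_known_date, start_date, end_date,
                 predict_next, window=7):
    """Predict day by day from the day after last_known_date up to end_date.

    known_cases  : daily new cases, ending at last_known_date
    predict_next : callable taking the last `window` values and returning the
                   next day's value (hypothetical stand-in for the model)
    Only predictions inside [start_date, end_date] are returned; earlier ones
    merely fill the gap so that later predictions have inputs.
    """
    history = list(known_cases)
    results = {}
    current = last_known_date + timedelta(days=1)
    while current <= end_date:
        pred = max(0.0, float(predict_next(history[-window:])))
        history.append(pred)  # the prediction becomes an input for the next day
        if current >= start_date:
            results[current] = pred
        current += timedelta(days=1)
    return results


def moving_average(values):
    """Stand-in predictor: mean of the most recent values."""
    return sum(values) / len(values)


# Illustrative numbers only: cases known up to 2020-12-17, predictions
# requested from 2020-12-24.
rollout = roll_forward(known_cases=[10.0] * 7,
                       last_known_date=date(2020, 12, 17),
                       start_date=date(2020, 12, 24),
                       end_date=date(2020, 12, 26),
                       predict_next=moving_average)
print(sorted(rollout))
```

The six days between the last known date and the start date are predicted but discarded from the result, which is why a long gap adds to the run time of this test.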

### Generate the scenario

In [56]:
from datetime import datetime, timedelta

start_date = datetime.now() + timedelta(days=7)
start_date_str = start_date.strftime('%Y-%m-%d')
end_date = start_date + timedelta(days=180)
end_date_str = end_date.strftime('%Y-%m-%d')
print(f"Start date: {start_date_str}")
print(f"End date: {end_date_str}")

Start date: 2020-12-24
End date: 2021-06-22


In [57]:
from covid_xprize.validation.scenario_generator import get_raw_data, generate_scenario, NPI_COLUMNS
DATA_FILE = 'data/OxCGRT_latest.csv'
latest_df = get_raw_data(DATA_FILE, latest=True)
scenario_df = generate_scenario(start_date_str, end_date_str, latest_df, countries=None, scenario="Freeze")
scenario_file = "predictions/180_days_future_scenario.csv"
scenario_df.to_csv(scenario_file, index=False)
print(f"Saved scenario to {scenario_file}")

Saved scenario to predictions/180_days_future_scenario.csv
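
To make the "Freeze" scenario concrete, the sketch below repeats each region's last known intervention values for every future day up to the end date. It illustrates the idea only and is not the implementation of generate_scenario; `freeze_last_known` is a hypothetical helper and the two-country table is made-up sample data.

```python
import pandas as pd


def freeze_last_known(ip_df, end_date, npi_columns):
    """Extend each region's rows to end_date, repeating its last known npi values."""
    ip_df = ip_df.assign(RegionName=ip_df["RegionName"].fillna(""))
    frames = []
    for (country, region), group in ip_df.groupby(["CountryName", "RegionName"]):
        group = group.sort_values("Date")
        last_row = group.iloc[-1]
        future_dates = pd.date_range(last_row["Date"] + pd.Timedelta(days=1),
                                     end_date)
        if len(future_dates) == 0:
            # Nothing to extend: the data already reaches end_date
            frames.append(group)
            continue
        future = pd.DataFrame({"CountryName": country,
                               "RegionName": region,
                               "Date": future_dates})
        for col in npi_columns:
            future[col] = last_row[col]  # frozen at the last known value
        frames.append(pd.concat([group, future], ignore_index=True))
    return pd.concat(frames, ignore_index=True)


# Made-up sample data for two countries without regions
ip = pd.DataFrame({
    "CountryName": ["Aruba", "Aruba", "Zambia", "Zambia"],
    "RegionName": [None, None, None, None],
    "Date": pd.to_datetime(["2020-12-16", "2020-12-17",
                            "2020-12-16", "2020-12-17"]),
    "C1_School closing": [2.0, 3.0, 1.0, 1.0],
})
frozen = freeze_last_known(ip, "2020-12-20", ["C1_School closing"])
print(frozen)
```

Each country ends up with one row per day through 2020-12-20, and every added row carries the value from 2020-12-17.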


### Check it

In [58]:
%%time
validate(start_date=start_date_str,
         end_date=end_date_str,
         ip_file=scenario_file,
         output_file="predictions/val_6_month_future.csv")

Generating predictions from 2020-12-24 to 2021-06-22...
Saved predictions to predictions/val_6_month_future.csv
Done!
All good!
CPU times: user 5.68 s, sys: 758 ms, total: 6.44 s
Wall time: 2min 27s
