# Example Predictor: Linear Rollout Predictor

This example contains basic functionality for training and evaluating a linear predictor that rolls out predictions day-by-day.

First, a training dataset is created from historical case and NPI (non-pharmaceutical intervention) data.

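Second, a linear model is trained to predict future cases from prior case data along with prior and future NPI data.
The model is an off-the-shelf sklearn Lasso model, which uses a positive weight constraint to enforce the assumption that higher NPI levels (stricter interventions) are negatively correlated with future cases. Later in this notebook, the Lasso model is swapped for a LightGBM gradient-boosted tree model; the Lasso code is kept, commented out, for comparison.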

Third, a sample evaluation set is created, and the predictor is applied to this evaluation set to produce prediction results in the correct format.

## Training

In [1]:
import pickle
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

### Copy the data locally

In [2]:
# Main source for the training data
DATA_URL = 'https://raw.githubusercontent.com/OxCGRT/covid-policy-tracker/master/data/OxCGRT_latest.csv'
# Local file
DATA_FILE = 'data/OxCGRT_latest.csv'

In [3]:
import os
import urllib.request
if not os.path.exists('data'):
    os.mkdir('data')
urllib.request.urlretrieve(DATA_URL, DATA_FILE)

('data/OxCGRT_latest.csv', <http.client.HTTPMessage at 0x7f16cbba3490>)

In [4]:
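# Load historical data from local file
# (on_bad_lines='skip' replaces error_bad_lines=False, which was
# deprecated in pandas 1.3 and removed in pandas 2.0)
df = pd.read_csv(DATA_FILE,
                 parse_dates=['Date'],
                 encoding="ISO-8859-1",
                 dtype={"RegionName": str,
                        "RegionCode": str},
                 on_bad_lines='skip')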

In [5]:
df.columns

Index(['CountryName', 'CountryCode', 'RegionName', 'RegionCode',
       'Jurisdiction', 'Date', 'C1_School closing', 'C1_Flag',
       'C2_Workplace closing', 'C2_Flag', 'C3_Cancel public events', 'C3_Flag',
       'C4_Restrictions on gatherings', 'C4_Flag', 'C5_Close public transport',
       'C5_Flag', 'C6_Stay at home requirements', 'C6_Flag',
       'C7_Restrictions on internal movement', 'C7_Flag',
       'C8_International travel controls', 'E1_Income support', 'E1_Flag',
       'E2_Debt/contract relief', 'E3_Fiscal measures',
       'E4_International support', 'H1_Public information campaigns',
       'H1_Flag', 'H2_Testing policy', 'H3_Contact tracing',
       'H4_Emergency investment in healthcare', 'H5_Investment in vaccines',
       'H6_Facial Coverings', 'H6_Flag', 'H7_Vaccination policy', 'H7_Flag',
       'M1_Wildcard', 'ConfirmedCases', 'ConfirmedDeaths', 'StringencyIndex',
       'StringencyIndexForDisplay', 'StringencyLegacyIndex',
       'StringencyLegacyIndexForDispla

In [6]:
# For testing, restrict training data to that before a hypothetical predictor submission date
HYPOTHETICAL_SUBMISSION_DATE = np.datetime64("2020-07-31")
df = df[df.Date <= HYPOTHETICAL_SUBMISSION_DATE]

In [7]:
# Add GeoID column that combines CountryName and RegionName for easier manipulation of data
# (countries without a region get the suffix '__nan')
df['GeoID'] = df['CountryName'] + '__' + df['RegionName'].astype(str)

In [8]:
# Add new cases column
df['NewCases'] = df.groupby('GeoID').ConfirmedCases.diff().fillna(0)
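
In OxCGRT, `ConfirmedCases` is a cumulative count, so daily new cases are its day-over-day difference. Grouping by `GeoID` before calling `diff()` matters: without it, each region's first row would be differenced against the previous region's last row. A small sketch on made-up data (the two regions and their counts are illustrative only):

```python
import pandas as pd

# Toy cumulative case counts for two hypothetical regions (illustrative data)
toy = pd.DataFrame({
    'GeoID': ['A__nan'] * 3 + ['B__nan'] * 3,
    'ConfirmedCases': [10, 15, 22, 100, 130, 131],
})

# Per-region first difference; each region's first day has no predecessor,
# so diff() yields NaN there and fillna(0) replaces it with 0
toy['NewCases'] = toy.groupby('GeoID').ConfirmedCases.diff().fillna(0)

# Without grouping, region B's first row would wrongly become 100 - 22 = 78
ungrouped = toy.ConfirmedCases.diff().fillna(0)
print(toy)
```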

In [9]:
# Keep only columns of interest
id_cols = ['CountryName',
           'RegionName',
           'GeoID',
           'Date']
cases_col = ['NewCases']
npi_cols = ['C1_School closing',
            'C2_Workplace closing',
            'C3_Cancel public events',
            'C4_Restrictions on gatherings',
            'C5_Close public transport',
            'C6_Stay at home requirements',
            'C7_Restrictions on internal movement',
            'C8_International travel controls',
            'H1_Public information campaigns',
            'H2_Testing policy',
            'H3_Contact tracing',
            'H6_Facial Coverings']
df = df[id_cols + cases_col + npi_cols]

In [10]:
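# Fill any missing case values by interpolating within each region,
# then set any remaining NaNs to 0. transform() keeps the result aligned
# with df's index (groupby().apply() adds group keys to the index in pandas 2.x)
df['NewCases'] = (df.groupby('GeoID')['NewCases']
                  .transform(lambda group: group.interpolate())
                  .fillna(0))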

In [11]:
# Fill any missing NPIs by assuming they are the same as previous day
for npi_col in npi_cols:
    df.update(df.groupby('GeoID')[npi_col].ffill().fillna(0))
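
The two fill steps treat gaps differently: missing `NewCases` values are linearly interpolated within each region, while missing NPI values are carried forward from the previous day in the same region. Anything still missing afterwards (for example, NaNs before a region's first reported value) becomes 0. A sketch on toy data (region names and values are illustrative; it uses `transform` so the interpolated values stay aligned with the original rows):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'GeoID': ['A__nan'] * 4 + ['B__nan'] * 3,
    'NewCases': [1.0, np.nan, 3.0, 4.0, np.nan, 2.0, 4.0],
    'C1_School closing': [2.0, np.nan, np.nan, 3.0, np.nan, 1.0, np.nan],
})

# Linear interpolation inside each region; a leading NaN has no earlier
# value to interpolate from, so it stays NaN and is then set to 0
toy['NewCases'] = (toy.groupby('GeoID')['NewCases']
                   .transform(lambda s: s.interpolate())
                   .fillna(0))

# Forward fill inside each region, so region A's last value (3.0)
# does not leak into region B's first row; leftover NaNs become 0
toy['C1_School closing'] = (toy.groupby('GeoID')['C1_School closing']
                            .ffill()
                            .fillna(0))
print(toy)
```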

In [12]:
df

| | CountryName | RegionName | GeoID | Date | NewCases | C1_School closing | C2_Workplace closing | C3_Cancel public events | C4_Restrictions on gatherings | C5_Close public transport | C6_Stay at home requirements | C7_Restrictions on internal movement | C8_International travel controls | H1_Public information campaigns | H2_Testing policy | H3_Contact tracing | H6_Facial Coverings |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aruba | | Aruba__nan | 2020-01-01 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Aruba | | Aruba__nan | 2020-01-02 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Aruba | | Aruba__nan | 2020-01-03 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Aruba | | Aruba__nan | 2020-01-04 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Aruba | | Aruba__nan | 2020-01-05 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 92693 | Zimbabwe | | Zimbabwe__nan | 2020-07-27 | 192.0 | 3.0 | 1.0 | 2.0 | 3.0 | 1.0 | 2.0 | 2.0 | 4.0 | 2.0 | 1.0 | 1.0 | 4.0 |
| 92694 | Zimbabwe | | Zimbabwe__nan | 2020-07-28 | 113.0 | 3.0 | 1.0 | 2.0 | 3.0 | 1.0 | 2.0 | 2.0 | 4.0 | 2.0 | 1.0 | 1.0 | 4.0 |
| 92695 | Zimbabwe | | Zimbabwe__nan | 2020-07-29 | 62.0 | 3.0 | 1.0 | 2.0 | 3.0 | 1.0 | 2.0 | 2.0 | 4.0 | 2.0 | 1.0 | 1.0 | 4.0 |
| 92696 | Zimbabwe | | Zimbabwe__nan | 2020-07-30 | 213.0 | 3.0 | 1.0 | 2.0 | 3.0 | 1.0 | 2.0 | 2.0 | 4.0 | 2.0 | 1.0 | 1.0 | 4.0 |


In [13]:
# Set number of past days to use to make predictions
nb_lookback_days = 30

# Create training data across all countries for predicting one day ahead
X_cols = cases_col + npi_cols
y_col = cases_col
X_samples = []
y_samples = []
geo_ids = df.GeoID.unique()
for g in geo_ids:
    gdf = df[df.GeoID == g]
    all_case_data = np.array(gdf[cases_col])
    all_npi_data = np.array(gdf[npi_cols])

    # Create one sample for each day where we have enough data
    # Each sample consists of cases and npis for previous nb_lookback_days
    nb_total_days = len(gdf)
    for d in range(nb_lookback_days, nb_total_days - 1):
        X_cases = all_case_data[d-nb_lookback_days:d]

        # Take negative of npis to support positive
        # weight constraint in Lasso.
        X_npis = -all_npi_data[d - nb_lookback_days:d]

        # Flatten all input data so it fits Lasso input format.
        X_sample = np.concatenate([X_cases.flatten(),
                                   X_npis.flatten()])
        y_sample = all_case_data[d]
        X_samples.append(X_sample)
        y_samples.append(y_sample)

X_samples = np.array(X_samples)
y_samples = np.array(y_samples).flatten()
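
The per-day loop above can also be written with strided windows. The sketch below is an alternative, not the notebook's method: `make_samples` is a hypothetical helper, and it assumes NumPy 1.20 or later for `sliding_window_view`. It builds the same feature layout (30 days of cases, then 30 days of negated NPIs, day-major) for one region and checks itself against the loop on random toy data:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_samples(case_data, npi_data, nb_lookback_days):
    """Vectorized equivalent of the per-region loop (hypothetical helper).

    case_data: (T, 1) array of daily new cases
    npi_data:  (T, n_npis) array of NPI levels
    Covers the same days d = nb_lookback_days .. T - 2 as the loop.
    """
    T = len(case_data)
    n_samples = T - nb_lookback_days - 1
    n_features = nb_lookback_days * (case_data.shape[1] + npi_data.shape[1])
    if n_samples <= 0:
        return np.empty((0, n_features)), np.empty(0)
    # Window i covers rows i .. i + L - 1, i.e. days d - L .. d - 1 for d = i + L;
    # the view has shape (T - L + 1, n_cols, L) and is trimmed to n_samples
    case_win = sliding_window_view(case_data, nb_lookback_days, axis=0)[:n_samples]
    npi_win = sliding_window_view(npi_data, nb_lookback_days, axis=0)[:n_samples]
    # Put the day axis before the column axis so each row flattens
    # day-major, column-minor, like X_cases.flatten() in the loop
    X_cases = case_win.transpose(0, 2, 1).reshape(n_samples, -1)
    X_npis = -npi_win.transpose(0, 2, 1).reshape(n_samples, -1)
    X = np.concatenate([X_cases, X_npis], axis=1)
    y = case_data[nb_lookback_days:nb_lookback_days + n_samples, 0]
    return X, y


# Usage: compare against the loop on random toy data
rng = np.random.default_rng(0)
T, L, n_npis = 40, 5, 3
cases = rng.integers(0, 50, size=(T, 1)).astype(float)
npis = rng.integers(0, 4, size=(T, n_npis)).astype(float)
X_vec, y_vec = make_samples(cases, npis, L)

X_loop, y_loop = [], []
for d in range(L, T - 1):
    X_loop.append(np.concatenate([cases[d - L:d].flatten(), -npis[d - L:d].flatten()]))
    y_loop.append(cases[d])
X_loop = np.array(X_loop)
y_loop = np.array(y_loop).flatten()
print(np.array_equal(X_vec, X_loop), np.array_equal(y_vec, y_loop))
```

The strided view avoids a Python-level loop per day, which can matter when looping over every region.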

In [14]:
# Helper function to compute the mean absolute error (MAE)
def mae(pred, true):
    return np.mean(np.abs(pred - true))

In [15]:
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X_samples,
                                                    y_samples,
                                                    test_size=0.2,
                                                    random_state=42)

In [16]:
!pip install lightgbm



In [17]:
# Create and train Lasso model.
# Set positive=True to enforce the assumption that past cases are positively
# correlated with future cases and NPIs negatively correlated (the NPI
# features were negated above, so a positive weight means a negative effect).
# model = Lasso(alpha=0.1,
#               precompute=True,
#               max_iter=10000,
#               positive=True,
#               selection='random')

# model = Lasso()
# # Fit model
# model.fit(X_train, y_train)
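
The negate-then-constrain trick can be checked on synthetic data. The sketch below is a stand-in for `Lasso(positive=True)`, not the sklearn implementation: it uses plain NumPy projected gradient descent for least squares with non-negative weights, and it drops the L1 penalty and the intercept to stay dependency-free. With the NPI column negated, a non-negative weight on it can only lower the prediction as the NPI level rises, and the fit recovers the negative effect. Without the negation, the constraint pins the NPI weight at zero.

```python
import numpy as np


def nonneg_least_squares(X, y, n_iter=5000):
    """Minimize 0.5 * ||X w - y||^2 subject to w >= 0 (projected gradient descent)."""
    # Step size 1 / L, where L = sigma_max(X)^2 is the Lipschitz constant of the gradient
    step = 1.0 / np.linalg.norm(X, ord=2) ** 2
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = np.maximum(w - step * grad, 0.0)  # gradient step, then project onto w >= 0
    return w


# Synthetic data (illustrative): more cases yesterday means more cases today,
# a stricter NPI means fewer
rng = np.random.default_rng(0)
prev_cases = rng.uniform(0, 10, size=200)
npi = rng.integers(0, 4, size=200).astype(float)
y = 2.0 * prev_cases - 3.0 * npi + rng.normal(0, 0.1, size=200)

# As in the notebook: negate the NPI feature, then constrain weights to be >= 0
w = nonneg_least_squares(np.column_stack([prev_cases, -npi]), y)
# Without the negation, the best the constraint allows is an NPI weight of 0
w_raw = nonneg_least_squares(np.column_stack([prev_cases, npi]), y)
print(w, w_raw)
```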

In [35]:
# Switch from Lasso to LightGBM

In [36]:
import random

def seed_everything(seed=0):
    random.seed(seed)
    np.random.seed(seed)
    
seed_everything(42) 

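import lightgbm as lgb

NUM_ESTIMATORS = 10000
lgb_params = {
    'boosting_type': 'gbdt',
    'objective': 'tweedie',          # log-link model for non-negative, skewed counts
    'tweedie_variance_power': 1.1,   # 1 would be Poisson, 2 would be gamma
    'metric': 'mae',                 # reported as "l1" in the training log
    'subsample': 0.5,                # bag 50% of the rows...
    'subsample_freq': 1,             # ...re-sampled every iteration
    'learning_rate': 0.05,
    'num_leaves': 2**11 - 1,
    'min_data_in_leaf': 2**12 - 1,
    'feature_fraction': 0.5,
    'max_bin': 100,
    'n_estimators': NUM_ESTIMATORS,  # alias of num_iterations
    'boost_from_average': False,
    'verbose': -1,
}

# The Tweedie objective needs non-negative labels, so set negative daily
# counts (e.g. from downward revisions of the cumulative totals) to zero
y_train = np.maximum(y_train, 0)
train_data = lgb.Dataset(X_train, label=y_train)

# Log the training MAE every NUM_ESTIMATORS // 20 rounds; log_evaluation
# replaces the verbose_eval argument, which was removed in LightGBM 4.0
model = lgb.train(lgb_params,
                  train_data,
                  valid_sets=[train_data],
                  callbacks=[lgb.log_evaluation(NUM_ESTIMATORS // 20)])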

[500]	training's l1: 116.695
[1000]	training's l1: 106.821
[1500]	training's l1: 101.451
[2000]	training's l1: 97.5733
[2500]	training's l1: 94.5429
[3000]	training's l1: 91.9153
[3500]	training's l1: 89.5364
[4000]	training's l1: 87.0959
[4500]	training's l1: 85.2548
[5000]	training's l1: 83.5195
[5500]	training's l1: 82.0762
[6000]	training's l1: 80.2533
[6500]	training's l1: 78.7294
[7000]	training's l1: 77.341
[7500]	training's l1: 75.9138
[8000]	training's l1: 74.5701
[8500]	training's l1: 73.4322
[9000]	training's l1: 72.4657
[9500]	training's l1: 71.2562
[10000]	training's l1: 70.2531
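
For reference, the Tweedie objective uses a log link, so the model's prediction is μ = exp(raw score). For a variance power 1 < ρ < 2, the per-sample Tweedie negative log-likelihood, up to terms that do not depend on μ, is ℓ(y, μ) = -y·μ^(1-ρ)/(1-ρ) + μ^(2-ρ)/(2-ρ). Its derivative with respect to μ is μ^(-ρ)·(μ - y), so the loss is minimized at μ = y. A small NumPy check (`tweedie_nll` is our own helper, not a LightGBM function):

```python
import numpy as np


def tweedie_nll(y, mu, rho=1.1):
    """Tweedie negative log-likelihood up to terms constant in mu (1 < rho < 2)."""
    return -y * mu ** (1 - rho) / (1 - rho) + mu ** (2 - rho) / (2 - rho)


# For an observed count y, evaluate the loss over a grid of candidate predictions
y = 5.0
mus = np.linspace(0.5, 20.0, 391)  # step 0.05, so the grid contains 5.0
losses = tweedie_nll(y, mus)
best_mu = mus[np.argmin(losses)]  # numerically equal to y
print(best_mu)
```

With ρ = 1.1 the loss is close to the Poisson case, which suits count data whose variance grows with its mean.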


In [37]:
# Evaluate model
train_preds = model.predict(X_train)
train_preds = np.maximum(train_preds, 0) # Don't predict negative cases
print('Train MAE:', mae(train_preds, y_train))

test_preds = model.predict(X_test)
test_preds = np.maximum(test_preds, 0) # Don't predict negative cases
print('Test MAE:', mae(test_preds, y_test))

Train MAE: 70.25309058587398
Test MAE: 116.10916097364483


In [38]:
# For comparison, the Lasso model scored:
# Train MAE: 125.79058821618477
# Test MAE: 126.2785299934449

In [39]:
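# Inspect the learned feature coefficients for the model
# to see which features it's paying attention to.
# (This only works for the Lasso model; a LightGBM Booster has no coef_.)

# Give names to the features, in the same order as the columns of X_sample:
# nb_lookback_days days of cases, then nb_lookback_days days of NPIs
# (day-major, one entry per NPI per day)
x_col_names = []
for d in range(-nb_lookback_days, 0):
    x_col_names.append('Day ' + str(d) + ' ' + cases_col[0])
for d in range(-nb_lookback_days, 0):
    for col_name in npi_cols:
        x_col_names.append('Day ' + str(d) + ' ' + col_name)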

# View non-zero coefficients
# for (col, coeff) in zip(x_col_names, list(model.coef_)):
#     if coeff != 0.:
#         print(col, coeff)
# print('Intercept', model.intercept_)
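
For the LightGBM model, a `Booster` exposes per-feature importances through `model.feature_importance(importance_type='gain')`, one value per column of `X_train`. The helper below is a hypothetical sketch that pairs such an array with `x_col_names` and lists the top entries; the toy names and numbers in the usage lines stand in for real model output.

```python
import numpy as np


def top_features(names, importances, k=10):
    """Return the k (name, importance) pairs with the largest importance."""
    importances = np.asarray(importances, dtype=float)
    if len(names) != len(importances):
        # zip() would silently truncate a mismatched list, so fail loudly instead
        raise ValueError(f'{len(names)} names vs {len(importances)} importances')
    order = np.argsort(importances)[::-1][:k]
    return [(names[i], importances[i]) for i in order]


# Usage with toy values; with the trained model this would be
# top_features(x_col_names, model.feature_importance(importance_type='gain'))
toy_names = ['Day -2 NewCases', 'Day -1 NewCases', 'Day -1 C1_School closing']
toy_importances = [5.0, 40.0, 1.5]
ranking = top_features(toy_names, toy_importances, k=2)
print(ranking)
```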

In [40]:
# Save model to file
if not os.path.exists('models'):
    os.mkdir('models')
with open('models/model.pkl', 'wb') as model_file:
    pickle.dump(model, model_file)

## Evaluation

Now that the predictor has been trained and saved, this section contains the functionality for evaluating it on sample evaluation data.
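
The `predict` module itself is not shown in this notebook. As a rough illustration of what a day-by-day rollout involves, here is a hypothetical sketch (`rollout` and its arguments are ours, not the module's API): the model predicts one day ahead, the clipped prediction is appended to the case history, and the lookback window slides forward one day, pulling in that day's NPIs from the intervention plan. The feature layout follows the training loop above (cases first, then negated NPIs, both day-major).

```python
import numpy as np


def rollout(predict_one, past_cases, npis, nb_lookback_days, nb_future_days):
    """Roll out one-day-ahead predictions over nb_future_days (hypothetical sketch).

    predict_one: callable mapping a 1-D feature vector to a predicted count
    past_cases:  (n_past,) daily new cases, with n_past >= nb_lookback_days
    npis:        (n_past + nb_future_days, n_npis) NPI levels, where row i
                 belongs to the same day as case i
    """
    cases = [float(c) for c in past_cases]
    n_past = len(cases)
    npis = np.asarray(npis, dtype=float)
    preds = []
    for t in range(n_past, n_past + nb_future_days):
        # Same layout as training: the lookback window ends the day before t
        X_cases = np.array(cases[t - nb_lookback_days:t])
        X_npis = -npis[t - nb_lookback_days:t]
        x = np.concatenate([X_cases, X_npis.flatten()])
        pred = max(float(predict_one(x)), 0.0)  # don't predict negative cases
        preds.append(pred)
        cases.append(pred)  # feed the prediction back in as history
    return np.array(preds)


# Usage with a toy "model" (hypothetical): today = yesterday + 1.
# x[L - 1] is the most recent day's cases in the feature layout above.
L = 3


def toy_model(x):
    return x[L - 1] + 1.0


past = np.array([1.0, 2.0, 3.0])
plan = np.zeros((len(past) + 4, 2))  # two NPIs, all at level 0
preds = rollout(toy_model, past, plan, nb_lookback_days=L, nb_future_days=4)
print(preds)
```

Because each prediction becomes input for the next day, errors can compound over a month-long horizon, which is one reason the one-day-ahead test MAE may understate rollout error.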

In [41]:
# Reload the module to get the latest changes
import predict
from importlib import reload
reload(predict)
from predict import predict_df

In [42]:
%%time
preds_df = predict_df("2020-08-01", "2020-08-31", path_to_ips_file="data/2020-09-30_historical_ip.csv", verbose=True)


Predicting for Aruba__nan
2020-08-01: 11.290425620760875
2020-08-02: 10.039194205035018
2020-08-03: 11.425396219616268
2020-08-04: 11.526868044645925
2020-08-05: 8.034062856753383
2020-08-06: 6.431140855041713
2020-08-07: 6.736066034225573
2020-08-08: 10.325793490547667
2020-08-09: 11.96626145102764
2020-08-10: 11.849329190262857
2020-08-11: 10.374685300155791
2020-08-12: 7.993197856354013
2020-08-13: 6.880708960766592
2020-08-14: 6.1352551059722735
2020-08-15: 9.268891994530339
2020-08-16: 9.974877258940714
2020-08-17: 8.344708257281528
2020-08-18: 9.419041541459475
2020-08-19: 9.061489094901795
2020-08-20: 6.363561336165582
2020-08-21: 6.1854112816203966
2020-08-22: 8.949982940247983
2020-08-23: 9.738967238066758
2020-08-24: 8.074845987496918
2020-08-25: 8.330441489704102
2020-08-26: 7.301027393929616
2020-08-27: 7.503636365583012
2020-08-28: 5.726029743555302
2020-08-29: 8.240017506885856
2020-08-30: 8.473863594261495
2020-08-31: 7.444958094791111

Predicting for Afghanistan__nan
...
2020-08-18: 672.0360733685918
2020-08-19: 640.3213064606407
2020-08-20: 579.7937234669079
2020-08-21: 425.73079908484647
2020-08-22: 514.1875903510564
2020-08-23: 479.48220226355846
2020-08-24: 341.4840642498888
2020-08-25: 290.69663738233794
2020-08-26: 399.3287361836964
2020-08-27: 309.9580592671162
2020-08-28: 216.8862075431163
2020-08-29: 176.28613450563694
2020-08-30: 275.78789253672
2020-08-31: 165.3249548816385

Predicting for Burundi__nan
2020-08-01: 4.107614375447628
2020-08-02: 1.5429606003266076
2020-08-03: 1.2302855951153444
2020-08-04: 1.6811826800581267
2020-08-05: 2.667128661751726
2020-08-06: 2.4773955852353313
2020-08-07: 2.5552690466539514
2020-08-08: 2.934190336524981
2020-08-09: 3.043987846134058
2020-08-10: 3.9809475075479153
2020-08-11: 4.294979820222463
2020-08-12: 5.679298279375914
2020-08-13: 5.282051976063204
2020-08-14: 9.36458350904861
2020-08-15: 17.656365654938828
2020-08-16: 14.931591188925465
2020-08-17: 22.00130684683516
2020-08-18: 23.266488509340356
...
2020-08-06: 383.52294599951205
2020-08-07: 305.8422083570269
2020-08-08: 561.5144103879297
2020-08-09: 752.3108085884828
2020-08-10: 467.66651271917766
2020-08-11: 569.4850359736604
2020-08-12: 568.0581294119455
2020-08-13: 329.63183940000823
2020-08-14: 209.85548132098313
2020-08-15: 270.67550649548417
2020-08-16: 382.63844136247445
2020-08-17: 379.094223655243
2020-08-18: 362.9786730948902
2020-08-19: 287.7883823179316
2020-08-20: 170.0163793337941
2020-08-21: 142.32062603828516
2020-08-22: 218.87365965994476
2020-08-23: 182.5125308280645
2020-08-24: 225.94865386122544
2020-08-25: 191.19335639507162
2020-08-26: 160.92530352654433
2020-08-27: 101.36886019608329
2020-08-28: 75.78149003028251
2020-08-29: 110.15321348948801
2020-08-30: 141.23204371281042
2020-08-31: 121.77683113372153

Predicting for Belarus__nan
2020-08-01: 1071.0203494089676
2020-08-02: 1494.7427057151172
2020-08-03: 1371.378327197836
2020-08-04: 1405.1563644211933
2020-08-05: 957.0663645684873
2020-08-06: 711.06755434
...
2020-08-24: 208.02846679798282
2020-08-25: 77.76299377798222
2020-08-26: 74.92532832143486
2020-08-27: 129.50723858580008
2020-08-28: 169.05614645984832
2020-08-29: 197.6281821749524
2020-08-30: 151.71871745052363
2020-08-31: 115.07910245075367

Predicting for Brazil__Bahia
2020-08-01: 875.288647644146
2020-08-02: 2261.381795520622
2020-08-03: 1051.904021826379
2020-08-04: 424.6185309417788
2020-08-05: 485.13556771823505
2020-08-06: 760.6497194101062
2020-08-07: 1038.3565788755811
2020-08-08: 701.9253154455544
2020-08-09: 913.4690178993737
2020-08-10: 734.8984174592496
2020-08-11: 218.14989166384032
2020-08-12: 194.1550516029117
2020-08-13: 468.0938534666999
2020-08-14: 836.9346763441481
2020-08-15: 660.0519886036742
2020-08-16: 762.1357617482705
2020-08-17: 425.04390958328156
2020-08-18: 243.51114136047843
2020-08-19: 263.8455081444237
2020-08-20: 405.22240452464087
2020-08-21: 413.90208815097014
2020-08-22: 495.2916531104633
2020-08-23: 462.77295865278705
2020-08-24: 351.037747471193
...
2020-08-16: 585.4186229293991
2020-08-17: 422.3547869234839
2020-08-18: 287.0685248054351
2020-08-19: 264.7398392058578
2020-08-20: 355.35276640172464
2020-08-21: 387.55565630633583
2020-08-22: 477.34860059928326
2020-08-23: 549.2219438021303
2020-08-24: 364.5531182082545
2020-08-25: 143.1519959624485
2020-08-26: 150.64519809735057
2020-08-27: 261.4829112069053
2020-08-28: 194.66693751301892
2020-08-29: 275.61684976995633
2020-08-30: 318.90819723045496
2020-08-31: 207.339762361905

Predicting for Brazil__Paraiba
2020-08-01: 406.703376747169
2020-08-02: 680.4536171339752
2020-08-03: 752.8591447151464
2020-08-04: 407.3368632596182
2020-08-05: 192.0424209506318
2020-08-06: 407.9428003538984
2020-08-07: 524.7376982107495
2020-08-08: 604.9609686457278
2020-08-09: 640.6405185888532
2020-08-10: 481.4767857284608
2020-08-11: 167.04160513765706
2020-08-12: 171.419145089082
2020-08-13: 359.27718215252725
2020-08-14: 607.4394024377768
2020-08-15: 669.1157890499388
2020-08-16: 750.4375887420165
...
2020-08-01: 1086.583231734293
2020-08-02: 3884.931881705366
2020-08-03: 1217.1537432353268
2020-08-04: 452.336555857796
2020-08-05: 425.13018334494615
2020-08-06: 957.9925219586736
2020-08-07: 1130.1526916030584
2020-08-08: 1099.9518332573707
2020-08-09: 1206.3554833581168
2020-08-10: 1116.2950422193617
2020-08-11: 327.4132122892653
2020-08-12: 225.62875632784002
2020-08-13: 644.0254311469027
2020-08-14: 980.5816261380697
2020-08-15: 1015.4910842492795
2020-08-16: 947.9965558512083
2020-08-17: 626.0910920148118
2020-08-18: 325.9662864479192
2020-08-19: 342.11463845450925
2020-08-20: 531.7788566525004
2020-08-21: 622.0793059521742
2020-08-22: 633.6502083954516
2020-08-23: 704.104965659661
2020-08-24: 445.3312306591874
2020-08-25: 204.10422766343936
2020-08-26: 173.99684402354623
2020-08-27: 293.76592903728016
2020-08-28: 321.1070253794806
2020-08-29: 349.0268295069157
2020-08-30: 493.3128221394096
2020-08-31: 323.7929262742223

Predicting for Brazil__Sergipe
2020-08-01: 369.176986310153
...
2020-08-19: 1.4674734494222146
2020-08-20: 1.2445642107441581
2020-08-21: 1.099267492518959
2020-08-22: 1.4488832992041993
2020-08-23: 1.2682654650221379
2020-08-24: 1.1494936575428059
2020-08-25: 1.5656045642536909
2020-08-26: 1.3858612605733807
2020-08-27: 1.2512770475005335
2020-08-28: 1.669270780164097
2020-08-29: 1.347454200107349
2020-08-30: 1.4612079367206776
2020-08-31: 1.7009491410595907

Predicting for Canada__nan
2020-08-01: 3299.0766834088063
2020-08-02: 4575.181577845323
2020-08-03: 2764.193693958097
2020-08-04: 3066.0908144237283
2020-08-05: 2491.2932718513007
2020-08-06: 1670.938897803532
2020-08-07: 1583.876495879001
2020-08-08: 2072.8534737779223
2020-08-09: 1495.1849623034068
2020-08-10: 1365.8803961834515
2020-08-11: 1409.3318785221886
2020-08-12: 1096.7029976894753
2020-08-13: 885.728605798482
2020-08-14: 945.3039616591748
2020-08-15: 1164.3840871501875
2020-08-16: 777.425440901893
2020-08-17: 812.8554275261241
2020-08-18: 867.7938251236967
2020-08-19: 760.284552096
...
2020-08-11: 1.4487593910441918
2020-08-12: 1.535720312350388
2020-08-13: 1.4335625095350675
2020-08-14: 1.0622157966045418
2020-08-15: 2.2694432922383223
2020-08-16: 1.209649253694219
2020-08-17: 1.6326963837038722
2020-08-18: 2.3303924853984386
2020-08-19: 2.318949382435852
2020-08-20: 3.096198210272248
2020-08-21: 2.4696769749709175
2020-08-22: 3.276254192977028
2020-08-23: 2.919829333355959
2020-08-24: 3.5123181884262538
2020-08-25: 3.6351868375523795
2020-08-26: 6.287885743006304
2020-08-27: 7.421292026123801
2020-08-28: 8.05204695611964
2020-08-29: 13.912110234467606
2020-08-30: 18.526781622877976
2020-08-31: 23.561419434610112

Predicting for Cape Verde__nan
2020-08-01: 52.98239447957184
2020-08-02: 46.61511019159784
2020-08-03: 38.29943275641045
2020-08-04: 38.05778786031933
2020-08-05: 30.28736272726899
2020-08-06: 21.365317606295335
2020-08-07: 12.98863779791564
2020-08-08: 29.943837813216724
2020-08-09: 26.792211163721483
2020-08-10: 22.010326210964944
2020-08-11: 20.47158926
...
2020-08-01: 344.239887729766
2020-08-02: 436.3097978377741
2020-08-03: 740.5118144457141
2020-08-04: 1045.992461729504
2020-08-05: 1103.3792626740867
2020-08-06: 477.7802361801232
2020-08-07: 335.5662040369739
2020-08-08: 328.98961837635335
2020-08-09: 384.2351757640519
2020-08-10: 474.87411440489467
2020-08-11: 535.8978538548823
2020-08-12: 429.56930571671774
2020-08-13: 415.677089778887
2020-08-14: 191.40521332746582
2020-08-15: 185.59999650165815
2020-08-16: 264.21048096561003
2020-08-17: 420.0887324207425
2020-08-18: 434.27158614692956
2020-08-19: 359.8420605304868
2020-08-20: 294.6721800725766
2020-08-21: 232.83416891828335
2020-08-22: 211.26536535164593
2020-08-23: 198.56254079119822
2020-08-24: 231.70982756895367
2020-08-25: 235.34942007730714
2020-08-26: 280.30228544228004
2020-08-27: 181.3253886021488
2020-08-28: 80.98728120496739
2020-08-29: 92.87283555002564
2020-08-30: 146.11813063683562
2020-08-31: 144.6502004523631

Predicting for Algeria__nan
2020-08-01: 322.757241365525
...
2020-08-17: 203.06885547133584
2020-08-18: 182.82477073261302
2020-08-19: 115.0679072039857
2020-08-20: 91.62679609132448
2020-08-21: 69.34898947084525
2020-08-22: 104.14469934431544
2020-08-23: 120.19385960112842
2020-08-24: 118.64107264142244
2020-08-25: 79.73743838621594
2020-08-26: 61.822653837976034
2020-08-27: 47.18643864652444
2020-08-28: 33.27547039030402
2020-08-29: 63.18926394133692
2020-08-30: 60.44411227353187
2020-08-31: 51.61637581769051

Predicting for Fiji__nan
2020-08-01: 0.6256661566787398
2020-08-02: 0.8338056512488718
2020-08-03: 0.5865949433386344
2020-08-04: 0.6124914439042554
2020-08-05: 0.6732395573431958
2020-08-06: 0.7727375062582904
2020-08-07: 1.062975253541104
2020-08-08: 0.9950280625245095
2020-08-09: 1.0097513036402779
2020-08-10: 0.8979096194560365
2020-08-11: 1.0415575574530949
2020-08-12: 0.9495343801447678
2020-08-13: 0.9379415321805105
2020-08-14: 1.201872492304178
2020-08-15: 1.2617567534520602
2020-08-16: 1.288799923330678
2020-08-17: 1.06170881929
...

Predicting for United Kingdom__Wales
2020-08-01: 1284.9401584547459
2020-08-02: 1194.48169771539
2020-08-03: 447.0109511843425
2020-08-04: 237.93913738398746
2020-08-05: 278.9526661961503
2020-08-06: 292.57992628447585
2020-08-07: 403.74599684640714
2020-08-08: 591.7403037314473
2020-08-09: 475.72780694323404
2020-08-10: 232.39091951193117
2020-08-11: 136.39668762105347
2020-08-12: 99.87367574443138
2020-08-13: 204.69935999514306
2020-08-14: 378.2955445011031
2020-08-15: 498.4841521675327
2020-08-16: 262.20340956572807
2020-08-17: 169.49043601904992
2020-08-18: 103.9522964602739
2020-08-19: 122.34786873214271
2020-08-20: 212.25179735557163
2020-08-21: 225.59833538973214
2020-08-22: 318.57622277594226
2020-08-23: 248.79102664599688
2020-08-24: 171.65220919169522
2020-08-25: 65.21027714421497
2020-08-26: 104.47407505125886
2020-08-27: 99.23397682268252
2020-08-28: 110.43490641131808
2020-08-29: 185.25481946163097
2020-08-30: 158.38615026248374
2020-08-31: 73.08351158486562

...
2020-08-06: 47.77935491686007
2020-08-07: 43.26435852590566
2020-08-08: 36.58938739463694
2020-08-09: 58.75911643542468
2020-08-10: 83.94209883287598
2020-08-11: 67.82536886448631
2020-08-12: 75.23390680908545
2020-08-13: 41.910872498484096
2020-08-14: 41.96474559742124
2020-08-15: 33.88799082623157
2020-08-16: 58.2824741819418
2020-08-17: 78.94509124853747
2020-08-18: 86.43205893651334
2020-08-19: 52.4417558275313
2020-08-20: 46.40780325152985
2020-08-21: 43.379048369141934
2020-08-22: 44.15861939293567
2020-08-23: 52.85686684135534
2020-08-24: 90.48625680279592
2020-08-25: 77.63746833805989
2020-08-26: 57.410343395482464
2020-08-27: 60.87060620985023
2020-08-28: 39.385168794163604
2020-08-29: 41.33381297569992
2020-08-30: 51.14047653184053
2020-08-31: 79.15048148950267

Predicting for Hong Kong__nan
2020-08-01: 118.83034111493404
2020-08-02: 61.22724945988006
2020-08-03: 95.35752205503991
2020-08-04: 67.54077516469141
2020-08-05: 75.71373500062005
2020-08-06: 59.279899712396336
...

Predicting for Iraq__nan
2020-08-01: 947.8773054919617
2020-08-02: 1251.4271616129722
2020-08-03: 856.8358887275529
2020-08-04: 839.0199155375703
2020-08-05: 647.0149565476214
2020-08-06: 511.4318763125939
2020-08-07: 527.1933781217858
2020-08-08: 749.7231808100936
2020-08-09: 599.7757823426491
2020-08-10: 406.05707460672403
2020-08-11: 416.9747323866861
2020-08-12: 328.09116004598224
2020-08-13: 330.81394063263747
2020-08-14: 359.12724852749267
2020-08-15: 410.9143336395334
2020-08-16: 386.40583455650767
2020-08-17: 277.9324015490336
2020-08-18: 247.5579256403799
2020-08-19: 185.32930600985384
2020-08-20: 196.99024361890275
2020-08-21: 169.34467533037483
2020-08-22: 189.34514464500188
2020-08-23: 176.18065123743963
2020-08-24: 166.78008707915237
2020-08-25: 113.75383447858901
2020-08-26: 137.1431943095161
2020-08-27: 122.23154221404386
2020-08-28: 74.98515916008023
2020-08-29: 79.69168913803284
2020-08-30: 115.94509734530652
2020-08-31: 64.9871698950725

Predicting for Iceland__nan
...
2020-08-20: 81.75408276388792
2020-08-21: 58.49773874171898
2020-08-22: 109.83634800596316
2020-08-23: 151.4022536698798
2020-08-24: 117.72944853259963
2020-08-25: 120.11075266657836
2020-08-26: 82.26718424063903
2020-08-27: 49.651999261306244
2020-08-28: 28.260519309528895
2020-08-29: 76.96266762623654
2020-08-30: 82.8484702933208
2020-08-31: 51.471860293249996

Predicting for Kyrgyz Republic__nan
2020-08-01: 188.02879518682195
2020-08-02: 444.07768662999956
2020-08-03: 242.3707001498203
2020-08-04: 446.86579644649373
2020-08-05: 225.60776430769263
2020-08-06: 339.98163453114114
2020-08-07: 154.7778466269156
2020-08-08: 204.58378721859887
2020-08-09: 414.7970675897528
2020-08-10: 248.08373333787313
2020-08-11: 207.19295681511682
2020-08-12: 151.0776357429697
2020-08-13: 239.6816153734639
2020-08-14: 136.49576751677833
2020-08-15: 126.98358250353347
2020-08-16: 238.30604907532438
2020-08-17: 163.7072365720817
2020-08-18: 186.39460390377056
2020-08-19: 157.21800436598267
2020-08-20: 121
...
2020-08-01: 12.435533446366527
2020-08-02: 6.715499195320796
2020-08-03: 10.12482911829735
2020-08-04: 16.074148178401767
2020-08-05: 20.472406808365065
2020-08-06: 23.65451841807692
2020-08-07: 20.507542115028954
2020-08-08: 33.79000113845243
2020-08-09: 30.60861268244646
2020-08-10: 41.62230813091313
2020-08-11: 39.70231993468643
2020-08-12: 42.50678776424016
2020-08-13: 42.673373493373106
2020-08-14: 53.46760611604324
2020-08-15: 80.3766305093537
2020-08-16: 80.31638696296892
2020-08-17: 69.88566979366679
2020-08-18: 120.59942334525377
2020-08-19: 103.08581953165461
2020-08-20: 113.79438851225663
2020-08-21: 124.30008084388227
2020-08-22: 126.54292674196634
2020-08-23: 189.46338124338618
2020-08-24: 178.58917882696878
2020-08-25: 208.38055873655668
2020-08-26: 199.64005059031328
2020-08-27: 282.57740639055265
2020-08-28: 279.1844609149515
2020-08-29: 328.7260555143122
2020-08-30: 278.74009022630185
2020-08-31: 259.83796015845246

Predicting for Lithuania__nan
2020-08-01: 1186.356250
...
2020-08-29: 632.6999576218855
2020-08-30: 1249.4285728952425
2020-08-31: 1445.6901945656482

Predicting for Mali__nan
2020-08-01: 141.79926527742404
2020-08-02: 59.96506099170408
2020-08-03: 79.06861294637285
2020-08-04: 77.02778659069367
2020-08-05: 84.69467879508792
2020-08-06: 54.53342756866018
2020-08-07: 72.28337340306192
2020-08-08: 86.07364228828156
2020-08-09: 53.380226698381115
2020-08-10: 53.03784995033361
2020-08-11: 56.35477879021942
2020-08-12: 68.03603258976837
2020-08-13: 41.37671453628059
2020-08-14: 42.28240656818029
2020-08-15: 51.67833704229241
2020-08-16: 50.803154834230334
2020-08-17: 38.325131936501336
2020-08-18: 36.331305420579525
2020-08-19: 40.10680623455685
2020-08-20: 28.025180228121588
2020-08-21: 26.813966220841305
2020-08-22: 27.335541576207092
2020-08-23: 25.058610105660613
2020-08-24: 31.13690944996024
2020-08-25: 22.756705517987417
2020-08-26: 22.22323256466599
2020-08-27: 15.539842984182895
2020-08-28: 14.93072733267929
2020-08-29: 19.61817725995388
...
2020-08-13: 104.30382893451987
2020-08-14: 105.22597209622305
2020-08-15: 116.59285624791542
2020-08-16: 136.71129770651805
2020-08-17: 95.22703017946682
2020-08-18: 95.82675763131653
2020-08-19: 87.86675197122793
2020-08-20: 84.63330243855269
2020-08-21: 83.51233684284982
2020-08-22: 101.60015255302734
2020-08-23: 108.34272970553627
2020-08-24: 84.54669474040978
2020-08-25: 66.57739104569497
2020-08-26: 60.36281571811843
2020-08-27: 65.92413318396716
2020-08-28: 63.76745153133508
2020-08-29: 59.331503397562585
2020-08-30: 70.94818726748096
2020-08-31: 46.25049668653129

Predicting for Nigeria__nan
2020-08-01: 562.1832040976881
2020-08-02: 430.5187052110414
2020-08-03: 391.8725508231455
2020-08-04: 575.0831920762165
2020-08-05: 568.1627702579885
2020-08-06: 401.99554223166916
2020-08-07: 262.72852769519193
2020-08-08: 350.16259247618
2020-08-09: 367.3294159545141
2020-08-10: 289.42617447171637
2020-08-11: 381.12513146730333
2020-08-12: 232.39826644344376
2020-08-13: 142.75026900321572
...
2020-08-01: 360.87020766896154
2020-08-02: 583.0934789180527
2020-08-03: 570.6069523398664
2020-08-04: 812.1324737167558
2020-08-05: 944.7834264636048
2020-08-06: 473.9359717317259
2020-08-07: 196.03090999124245
2020-08-08: 702.2625625080947
2020-08-09: 429.9797661366583
2020-08-10: 733.2390873624493
2020-08-11: 334.11474863973325
2020-08-12: 766.6058041331843
2020-08-13: 380.78304754758597
2020-08-14: 272.32709008942817
2020-08-15: 290.0379998839913
2020-08-16: 591.9910302074927
2020-08-17: 500.11424070124616
2020-08-18: 324.428939357626
2020-08-19: 292.4999050502782
2020-08-20: 420.92730427829497
2020-08-21: 277.38948553267824
2020-08-22: 246.85449911407494
2020-08-23: 325.8051516939207
2020-08-24: 393.3312869788254
2020-08-25: 245.66762949827884
2020-08-26: 260.02990776934644
2020-08-27: 254.28254965378576
2020-08-28: 124.84506112980263
2020-08-29: 167.11264394388274
2020-08-30: 237.0462763251682
2020-08-31: 236.4533092689678

Predicting for Philippines__nan
2020-08-01: 661.04519547
...
2020-08-15: 17.32014060869617
2020-08-16: 16.948200705283682
2020-08-17: 16.812660119315545
2020-08-18: 18.017633945975682
2020-08-19: 14.45390561369425
2020-08-20: 12.19023160600147
2020-08-21: 12.030881506422043
2020-08-22: 9.400716746537888
2020-08-23: 12.684200160071851
2020-08-24: 13.5765933308686
2020-08-25: 13.478754610351768
2020-08-26: 13.162050711153617
2020-08-27: 9.207805047591258
2020-08-28: 6.876223457285647
2020-08-29: 8.212511438889015
2020-08-30: 10.134262892863937
2020-08-31: 7.190383825946136

Predicting for Kosovo__nan
2020-08-01: 241.9516742333813
2020-08-02: 196.7116402994954
2020-08-03: 422.08384334503927
2020-08-04: 529.2739001398335
2020-08-05: 596.3758455426363
2020-08-06: 392.0328303543523
2020-08-07: 187.77507083390412
2020-08-08: 179.2493239998921
2020-08-09: 182.75500703270805
2020-08-10: 272.59330776804285
2020-08-11: 367.31721727855245
2020-08-12: 351.3257255687912
2020-08-13: 243.48070523806052
2020-08-14: 155.5049768639843
2020-08-15: 127.0506970053105
...
2020-08-29: 1.2818102932067366
2020-08-30: 1.2510124703959047
2020-08-31: 1.2919018973527103

Predicting for Sierra Leone__nan
2020-08-01: 1.4477874963164425
2020-08-02: 2.135280482019851
2020-08-03: 1.245946005842612
2020-08-04: 1.3714243775956743
2020-08-05: 1.2527723764756153
2020-08-06: 0.85750091063057
2020-08-07: 1.2389406731712163
2020-08-08: 1.390197936559989
2020-08-09: 1.6581579631437466
2020-08-10: 1.1219983738738089
2020-08-11: 1.1768612544012738
2020-08-12: 1.4286410808224201
2020-08-13: 1.3593226175123336
2020-08-14: 1.5671335269822713
2020-08-15: 1.0578796717356473
2020-08-16: 1.3798937749914533
2020-08-17: 2.057415013585134
2020-08-18: 1.3748720867487496
2020-08-19: 1.606164621413237
2020-08-20: 1.6392940834580092
2020-08-21: 1.6360191335263334
2020-08-22: 2.102181280195843
2020-08-23: 2.0891371312819573
2020-08-24: 2.9341131872249977
2020-08-25: 2.7988844187285524
2020-08-26: 4.137515519038725
2020-08-27: 3.585885732565462
2020-08-28: 4.4909544673785335
2020-08-29: 5.9
...
2020-08-10: 6419.775988678878
2020-08-11: 4878.770657919047
2020-08-12: 1135.327882853021
2020-08-13: 791.4648409590367
2020-08-14: 2136.540195536637
2020-08-15: 2310.2560401795054
2020-08-16: 4386.714818845449
2020-08-17: 4961.1162215359345
2020-08-18: 5304.33843500069
2020-08-19: 1447.2127900121134
2020-08-20: 842.0813234858908
2020-08-21: 1547.1108183417634
2020-08-22: 2382.864613049795
2020-08-23: 3818.7150157848064
2020-08-24: 3114.4388960433025
2020-08-25: 3417.7779278050093
2020-08-26: 1183.8458890110412
2020-08-27: 773.4124503185726
2020-08-28: 1160.2098268126765
2020-08-29: 1498.1105085703866
2020-08-30: 2388.0779838636954
2020-08-31: 2288.79642115371

Predicting for Eswatini__nan
2020-08-01: 56.004944787446306
2020-08-02: 41.981540994129276
2020-08-03: 49.66819497482327
2020-08-04: 52.92956281469427
2020-08-05: 49.77733102677678
2020-08-06: 45.389955097835866
2020-08-07: 44.653810829698756
2020-08-08: 52.52197600461196
2020-08-09: 66.41409852723916
2020-08-10: 84.048729792463
...
2020-08-22: 1.3783376291339595
2020-08-23: 1.3036524460980152
2020-08-24: 1.354099638314547
2020-08-25: 1.2608500432161835
2020-08-26: 1.2691790670165246
2020-08-27: 1.2297227951692935
2020-08-28: 1.1486430014626103
2020-08-29: 1.2818102932067366
2020-08-30: 1.2510124703959047
2020-08-31: 1.2919018973527103

Predicting for Trinidad and Tobago__nan
2020-08-01: 10.625834236727545
2020-08-02: 12.79290149611729
2020-08-03: 13.365204138255844
2020-08-04: 11.907218274494541
2020-08-05: 9.133936022755377
2020-08-06: 8.929568927000966
2020-08-07: 9.391797595838163
2020-08-08: 7.654437828682294
2020-08-09: 7.528759295205331
2020-08-10: 8.649691646876136
2020-08-11: 7.5460308743357976
2020-08-12: 5.172418533100545
2020-08-13: 7.232875531134264
2020-08-14: 4.948958697814027
2020-08-15: 6.094952775950185
2020-08-16: 7.60507010876717
2020-08-17: 6.791794867539758
2020-08-18: 9.022883479779772
2020-08-19: 8.548467114064769
2020-08-20: 8.5177608436199
2020-08-21: 5.138336558065217
2020-08-22: 7.47804

2020-08-22: 59.61346109936518
2020-08-23: 82.57585258023562
2020-08-24: 69.27680398562217
2020-08-25: 61.77387073337119
2020-08-26: 47.408899481183774
2020-08-27: 46.33211689811034
2020-08-28: 20.93425277638852
2020-08-29: 29.568343368152636
2020-08-30: 44.17194033606914
2020-08-31: 29.10591925123384

Predicting for United States__Alabama
2020-08-01: 2638.686550734927
2020-08-02: 3357.6418878220175
2020-08-03: 2641.607819339311
2020-08-04: 1529.9740597699913
2020-08-05: 1721.2943111280813
2020-08-06: 1001.3102408037262
2020-08-07: 1217.9274703960466
2020-08-08: 1503.4583964153132
2020-08-09: 1808.4104362761636
2020-08-10: 1123.2249061608834
2020-08-11: 1097.3851823439084
2020-08-12: 701.9706785859421
2020-08-13: 808.5589706379354
2020-08-14: 855.1748501828627
2020-08-15: 1070.2667318644405
2020-08-16: 892.3796820442811
2020-08-17: 611.9162794286045
2020-08-18: 652.6031671578147
2020-08-19: 582.6095453253081
2020-08-20: 585.4688113924408
2020-08-21: 459.41634144046213
2020-08-22: 663.85

2020-08-26: 1273.2542127660702
2020-08-27: 1187.1943657908416
2020-08-28: 700.0192881337002
2020-08-29: 517.8369321682833
2020-08-30: 1257.9832595137552
2020-08-31: 925.1653143663342

Predicting for United States__Georgia
2020-08-01: 3175.273485537994
2020-08-02: 7194.485355972672
2020-08-03: 4382.604483474782
2020-08-04: 3820.290461991942
2020-08-05: 2409.629718096433
2020-08-06: 2105.234742465708
2020-08-07: 1856.2193651247067
2020-08-08: 2734.927880626878
2020-08-09: 2794.788214981316
2020-08-10: 2397.3085653306575
2020-08-11: 2467.786351835565
2020-08-12: 1700.0274752676848
2020-08-13: 930.527388112575
2020-08-14: 1182.1747149656203
2020-08-15: 1559.7418434354786
2020-08-16: 1408.0830031737808
2020-08-17: 1426.650517217068
2020-08-18: 1351.6197849998239
2020-08-19: 742.9872102152867
2020-08-20: 739.5497665198106
2020-08-21: 682.1329802881907
2020-08-22: 903.4727947903382
2020-08-23: 983.4230695166458
2020-08-24: 669.1155280576878
2020-08-25: 541.9908371883878
2020-08-26: 667.372477

2020-08-29: 292.90670544097475
2020-08-30: 471.2146835101074
2020-08-31: 417.3065752250619

Predicting for United States__Massachusetts
2020-08-01: 3264.407071023588
2020-08-02: 4931.214535821417
2020-08-03: 4088.2784732874065
2020-08-04: 4683.415756995806
2020-08-05: 4405.196414641686
2020-08-06: 1513.6590885724818
2020-08-07: 2367.8764371702637
2020-08-08: 2766.9348637339654
2020-08-09: 2939.3826147065924
2020-08-10: 2029.944581370221
2020-08-11: 1849.507670508465
2020-08-12: 2500.1273261378274
2020-08-13: 1090.5409384268391
2020-08-14: 940.2634855403561
2020-08-15: 1129.7331286159965
2020-08-16: 1329.3307929361658
2020-08-17: 1178.3237202808702
2020-08-18: 1322.1313207030018
2020-08-19: 925.1979025535159
2020-08-20: 710.1966728892794
2020-08-21: 512.7372308118405
2020-08-22: 884.0425893444105
2020-08-23: 872.8606909491649
2020-08-24: 622.7152993511812
2020-08-25: 448.74155326955406
2020-08-26: 611.3574480990819
2020-08-27: 412.9409845476331
2020-08-28: 332.2888692350374
2020-08-29: 

2020-08-28: 412.9767846457668
2020-08-29: 356.7234999364315
2020-08-30: 482.21519182090645
2020-08-31: 369.3898814065573

Predicting for United States__North Dakota
2020-08-01: 197.37684108476012
2020-08-02: 497.7442489895719
2020-08-03: 338.95922441749303
2020-08-04: 301.75611102074026
2020-08-05: 180.67059397479258
2020-08-06: 183.89623746692126
2020-08-07: 158.73921599216447
2020-08-08: 159.57359373482024
2020-08-09: 314.36938609795027
2020-08-10: 218.2111044902669
2020-08-11: 245.4721148500097
2020-08-12: 162.89962117766677
2020-08-13: 97.61114002080238
2020-08-14: 96.93561698860252
2020-08-15: 88.7875513797289
2020-08-16: 215.08233192123555
2020-08-17: 156.6984713319669
2020-08-18: 160.72069051193546
2020-08-19: 83.44556667142888
2020-08-20: 58.879360086246294
2020-08-21: 63.760878226919424
2020-08-22: 52.39069444900398
2020-08-23: 118.32268126630915
2020-08-24: 101.28114006101869
2020-08-25: 86.933493852068
2020-08-26: 45.63442467244206
2020-08-27: 30.171061873252164
2020-08-28: 

2020-08-07: 850.6877512259443
2020-08-08: 1219.4611285257254
2020-08-09: 1104.5344491975159
2020-08-10: 807.8250739132234
2020-08-11: 687.6627584866634
2020-08-12: 1280.7192293838045
2020-08-13: 639.1583943238551
2020-08-14: 508.15003901624743
2020-08-15: 606.440682613244
2020-08-16: 712.1896950979655
2020-08-17: 459.974257106046
2020-08-18: 589.2063467953279
2020-08-19: 595.9334865482845
2020-08-20: 428.19278630021506
2020-08-21: 316.4644502068632
2020-08-22: 443.9884886420236
2020-08-23: 444.5252726746574
2020-08-24: 308.75171567333615
2020-08-25: 278.0504806108555
2020-08-26: 383.21258753847275
2020-08-27: 222.7553261457476
2020-08-28: 168.66694414820427
2020-08-29: 159.9366539826204
2020-08-30: 205.33414969979816
2020-08-31: 142.06351555341342

Predicting for United States__Oregon
2020-08-01: 817.605619589928
2020-08-02: 1280.227853753804
2020-08-03: 903.6020266891704
2020-08-04: 1098.6636239165437
2020-08-05: 753.0126083525666
2020-08-06: 549.3569267861764
2020-08-07: 632.71553738

2020-08-08: 1692.3965324718038
2020-08-09: 1178.0911433122528
2020-08-10: 1129.764272681906
2020-08-11: 1000.7042218475707
2020-08-12: 713.6211350165873
2020-08-13: 795.8220795120586
2020-08-14: 768.71890111319
2020-08-15: 835.2450871881908
2020-08-16: 739.0273166638731
2020-08-17: 623.8699723898335
2020-08-18: 770.7332867172382
2020-08-19: 663.1262656443012
2020-08-20: 574.8195752437673
2020-08-21: 426.29929967859056
2020-08-22: 544.6015649336185
2020-08-23: 501.78116759468776
2020-08-24: 379.9269209451597
2020-08-25: 289.2586610123682
2020-08-26: 410.84479724443247
2020-08-27: 320.86849713944144
2020-08-28: 236.07602075211588
2020-08-29: 175.8417612874261
2020-08-30: 286.64146636432724
2020-08-31: 198.36927476679057

Predicting for United States__Virgin Islands
2020-08-01: 14.225002053698843
2020-08-02: 10.635826083231514
2020-08-03: 11.156770813314447
2020-08-04: 9.968829372838185
2020-08-05: 11.217295800901432
2020-08-06: 14.113008100664956
2020-08-07: 11.681041095553514
2020-08-08: 28.05107970852323
2020-08-09: 22.942656806282038
2020-08-10: 30.520995571047884
2020-08-11: 36.52274184160293
2020-08-12: 34.921356540236715
2020-08-13: 27.07261454704525
2020-08-14: 25.783807047184585
2020-08-15: 37.918467025113635
2020-08-16: 48.123298648394254
2020-08-17: 36.78244651708351
2020-08-18: 45.12070017073462
2020-08-19: 40.53896334275246
2020-08-20: 29.920424987323585
2020-08-21: 30.297649624345837
2020-08-22: 52.33087010751111
2020-08-23: 55.58487605071404
2020-08-24: 58.46457083663405
2020-08-25: 72.87469494330053
2020-08-26: 53.60824593720985
2020-08-27: 55.288149455615425
2020-08-28: 61.93025122071568
2020-08-29: 81.50750982661062
2020-08-30: 94.03535320352199
2020-08-31: 88.25478009965451

Predicting for Vietnam__nan
2020-08-01: 5.550147991387

In [43]:
# Check the predictions
preds_df.head()

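|     | CountryName | RegionName | Date       | PredictedDailyNewCases |
|-----|-------------|------------|------------|------------------------|
| 213 | Aruba       |            | 2020-08-01 | 11.290426              |
| 214 | Aruba       |            | 2020-08-02 | 10.039194              |
| 215 | Aruba       |            | 2020-08-03 | 11.425396              |
| 216 | Aruba       |            | 2020-08-04 | 11.526868              |
| 217 | Aruba       |            | 2020-08-05 | 8.034063               |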
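Each prediction row carries a geography, a date and a predicted number of daily new cases. As a quick sanity check on that format, here is a minimal stdlib sketch. `check_prediction_rows` is a hypothetical helper, not part of covid_xprize, and it only checks the column order and that each prediction is a finite, non-negative number.

```python
import csv
import io
import math

# Column order of a prediction file, as in the preds_df.head() output above.
EXPECTED_COLUMNS = ["CountryName", "RegionName", "Date", "PredictedDailyNewCases"]


def check_prediction_rows(csv_text):
    """Return a list of problems found in prediction CSV text (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        return [f"unexpected columns: {reader.fieldnames}"]
    problems = []
    # The header is line 1, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        try:
            value = float(row["PredictedDailyNewCases"])
        except ValueError:
            problems.append(f"line {line_no}: not a number")
            continue
        # Daily new cases must be a finite, non-negative number.
        if not math.isfinite(value) or value < 0:
            problems.append(f"line {line_no}: bad prediction {value}")
    return problems


sample = (
    "CountryName,RegionName,Date,PredictedDailyNewCases\n"
    "Aruba,,2020-08-01,11.290426\n"
    "Aruba,,2020-08-02,10.039194\n"
)
print(check_prediction_rows(sample))  # []
```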


# Validation
This is how the predictor will be called during the competition, through the `predict.py` command line.  
!!! PLEASE DO NOT CHANGE THE API !!!

In [44]:
!python predict.py -s 2020-08-01 -e 2020-08-04 -ip data/2020-09-30_historical_ip.csv -o predictions/2020-08-01_2020-08-04.csv

Generating predictions from 2020-08-01 to 2020-08-04...
Saved predictions to predictions/2020-08-01_2020-08-04.csv
Done!
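The command above passes four flags to `predict.py`: a start date, an end date (both inclusive, since Aruba gets rows for 2020-08-01 through 2020-08-04), an intervention-plan file and an output file. A minimal `argparse` sketch of a parser that accepts that command line is shown here. Only the short flags come from the notebook; the long option names and help strings are assumptions, and this is not the competition's own `predict.py`.

```python
import argparse


def build_parser():
    """Hypothetical parser for the predict.py command line shown above."""
    # Only the short flags (-s, -e, -ip, -o) come from the notebook;
    # the long option names are assumptions made for this sketch.
    parser = argparse.ArgumentParser(description="Predict daily new cases.")
    parser.add_argument("-s", "--start_date", required=True,
                        help="first date to predict, YYYY-MM-DD (inclusive)")
    parser.add_argument("-e", "--end_date", required=True,
                        help="last date to predict, YYYY-MM-DD (inclusive)")
    parser.add_argument("-ip", "--interventions_plan", required=True,
                        help="CSV file with the intervention plans to condition on")
    parser.add_argument("-o", "--output_file", required=True,
                        help="CSV file to write the predictions to")
    return parser


args = build_parser().parse_args([
    "-s", "2020-08-01", "-e", "2020-08-04",
    "-ip", "data/2020-09-30_historical_ip.csv",
    "-o", "predictions/2020-08-01_2020-08-04.csv",
])
print(args.start_date, args.end_date)  # 2020-08-01 2020-08-04
```

Keeping the short flags fixed is what matters for the competition harness; the long names only determine the attribute names inside the script.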


In [45]:
!head predictions/2020-08-01_2020-08-04.csv

CountryName,RegionName,Date,PredictedDailyNewCases
Aruba,,2020-08-01,11.290425620760875
Aruba,,2020-08-02,10.039194205035018
Aruba,,2020-08-03,11.425396219616268
Aruba,,2020-08-04,11.526868044645925
Afghanistan,,2020-08-01,146.62497627378957
Afghanistan,,2020-08-02,89.12843873424981
Afghanistan,,2020-08-03,64.45813666067774
Afghanistan,,2020-08-04,64.9466436745464
Angola,,2020-08-01,87.5574555562355
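The file holds one row per geography (`CountryName`, `RegionName`) per day. The official check is covid_xprize's `validate_submission`; as an illustration only, a hypothetical stdlib helper that reports which requested days each geography is missing could look like this. It cannot detect a geography that is absent from the file altogether.

```python
import csv
import io
from collections import defaultdict
from datetime import date, timedelta


def missing_dates(csv_text, start_date, end_date):
    """Map each (CountryName, RegionName) in the file to the dates it lacks.

    Hypothetical helper: a geography absent from the file entirely is not reported.
    """
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    # Every day from start to end, inclusive.
    expected = {start + timedelta(days=i) for i in range((end - start).days + 1)}
    seen = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        seen[(row["CountryName"], row["RegionName"])].add(date.fromisoformat(row["Date"]))
    return {geo: sorted(expected - days) for geo, days in seen.items() if expected - days}


# A few rows copied from the file above; `head` cut Angola off after its first day.
excerpt = """CountryName,RegionName,Date,PredictedDailyNewCases
Aruba,,2020-08-01,11.290425620760875
Aruba,,2020-08-02,10.039194205035018
Aruba,,2020-08-03,11.425396219616268
Aruba,,2020-08-04,11.526868044645925
Angola,,2020-08-01,87.5574555562355
"""
print(missing_dates(excerpt, "2020-08-01", "2020-08-04"))
```

On this excerpt only Angola is reported, missing 2020-08-02 through 2020-08-04, because `head` truncated it; the full file covers every day.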


# Test cases
Now that we can generate a prediction file, let's validate a few cases.

In [46]:
import os
from covid_xprize.validation.predictor_validation import validate_submission

def validate(start_date, end_date, ip_file, output_file):
    # First, delete any old output file so a stale result cannot pass validation
    try:
        os.remove(output_file)
    except OSError:
        pass
    
    # Then generate the prediction, calling the official API
    !python predict.py -s {start_date} -e {end_date} -ip {ip_file} -o {output_file}
    
    # And validate it
    errors = validate_submission(start_date, end_date, ip_file, output_file)
    if errors:
        for error in errors:
            print(error)
    else:
        print("All good!")

## 4 days, no gap
- All countries and regions
- The official number of cases is known up to start_date
- Intervention plans are the official (historical) ones

In [47]:
validate(start_date="2020-08-01",
         end_date="2020-08-04",
         ip_file="data/2020-09-30_historical_ip.csv",
         output_file="predictions/val_4_days.csv")

Generating predictions from 2020-08-01 to 2020-08-04...
Saved predictions to predictions/val_4_days.csv
Done!
All good!


## 1 month in the future
- 2 countries only
- There is a gap between the date of the last known number of cases and start_date
- For future dates, the intervention plans contain scenarios for which predictions are requested, to answer the question: what will happen if we apply these plans?
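With a gap, the predictor has more days to roll out than it is asked to report: the days between the last known case count and start_date must be predicted first, because later days depend on them. A small sketch of that bookkeeping follows; `rollout_lengths` is a hypothetical helper, and the 2020-12-15 "last known date" in the second example is made up for illustration.

```python
from datetime import date


def rollout_lengths(last_known_date, start_date, end_date):
    """Return (gap_days, horizon_days) for a day-by-day rollout (hypothetical helper).

    gap_days: days strictly between the last known case count and start_date; they
    must be predicted as model inputs but are not part of the requested output.
    horizon_days: days from start_date to end_date, inclusive.
    """
    gap_days = (start_date - last_known_date).days - 1
    if gap_days < 0:
        # This sketch assumes case data stops before start_date.
        raise ValueError("last_known_date must be before start_date")
    horizon_days = (end_date - start_date).days + 1
    return gap_days, horizon_days


# "4 days, no gap": cases assumed known through the day before start_date.
print(rollout_lengths(date(2020, 7, 31), date(2020, 8, 1), date(2020, 8, 4)))  # (0, 4)
# "1 month in the future", with a made-up last known date of 2020-12-15.
print(rollout_lengths(date(2020, 12, 15), date(2021, 1, 1), date(2021, 1, 31)))  # (16, 31)
```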

In [48]:
%%time
validate(start_date="2021-01-01",
         end_date="2021-01-31",
         ip_file="data/future_ip.csv",
         output_file="predictions/val_1_month_future.csv")

Generating predictions from 2021-01-01 to 2021-01-31...
Saved predictions to predictions/val_1_month_future.csv
Done!
All good!
CPU times: user 250 ms, sys: 160 ms, total: 410 ms
Wall time: 7.5 s


## 180 days, from a future date, all countries and regions
- The prediction start date is 1 week from now (i.e., assuming the submission date is 1 week from now).
- The prediction end date is 180 days (about 6 months) after the start date.
- Predictions are requested for all available countries and regions.
- Intervention plan scenario: freeze the last known intervention plan for each country and region.

Because the number of cases between today and the start date is not known yet, and the model relies on those numbers, the model has to predict them first and then feed them back in as inputs.  
This is the most demanding test: generating the prediction file should take less than 1 hour.
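Feeding predictions back in is what "rolling out predictions day-by-day" means. Here is a minimal pure-Python sketch of that loop. It assumes a case-only linear model with toy weights; the notebook's Lasso model also uses prior and future NPI features, which are left out here, and the clipping at zero is this sketch's own choice.

```python
def rollout(weights, intercept, history, n_days):
    """Predict n_days of cases one day at a time, feeding predictions back in.

    Hypothetical sketch: a case-only linear model whose weights apply to the last
    len(weights) days, oldest first. NPI features are left out.
    """
    lookback = len(weights)
    if lookback == 0 or len(history) < lookback:
        raise ValueError("need at least len(weights) > 0 days of history")
    window = list(history[-lookback:])
    predictions = []
    for _ in range(n_days):
        pred = intercept + sum(w * x for w, x in zip(weights, window))
        pred = max(pred, 0.0)  # daily new cases cannot be negative
        predictions.append(pred)
        # Slide the window: drop the oldest day and append today's prediction.
        window = window[1:] + [pred]
    return predictions


# Toy numbers, not fitted values: average of the last two days.
all_preds = rollout([0.5, 0.5], 0.0, [10.0, 20.0], n_days=3)
print(all_preds)  # [15.0, 17.5, 16.25]

# With a 1-day gap, the first prediction only feeds later days and is dropped.
gap_days = 1
requested = all_preds[gap_days:]
print(requested)  # [17.5, 16.25]
```

Because each day depends on the previous predictions, errors compound over a 180-day horizon, and the loop cannot be vectorized across days, which is one reason this test is the slowest.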

### Generate the scenario

In [49]:
from datetime import datetime, timedelta

start_date = datetime.now() + timedelta(days=7)
start_date_str = start_date.strftime('%Y-%m-%d')
end_date = start_date + timedelta(days=180)
end_date_str = end_date.strftime('%Y-%m-%d')
print(f"Start date: {start_date_str}")
print(f"End date: {end_date_str}")

Start date: 2020-12-22
End date: 2021-06-20


In [50]:
from covid_xprize.validation.scenario_generator import get_raw_data, generate_scenario, NPI_COLUMNS
DATA_FILE = 'data/OxCGRT_latest.csv'
latest_df = get_raw_data(DATA_FILE, latest=True)
scenario_df = generate_scenario(start_date_str, end_date_str, latest_df, countries=None, scenario="Freeze")
scenario_file = "predictions/180_days_future_scenario.csv"
scenario_df.to_csv(scenario_file, index=False)
print(f"Saved scenario to {scenario_file}")

Saved scenario to predictions/180_days_future_scenario.csv
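The "Freeze" scenario keeps each geography's last known intervention plan unchanged into the future. To make the idea concrete, here is a hypothetical stdlib sketch. It is not how covid_xprize's `generate_scenario` is implemented, and the date range it covers may differ: this version emits rows from the day after each geography's last known date, so any gap before the start date is covered too. The geography name and NPI values in the example are made up; the column names follow the OxCGRT `C1_School closing` / `C2_Workplace closing` convention.

```python
from datetime import date, timedelta


def freeze_scenario(last_known, end_date):
    """Repeat each geography's last known NPI values up to end_date (hypothetical).

    last_known maps (CountryName, RegionName) -> (last_date, {npi_column: value}).
    Rows start the day after last_date, so any gap before the prediction
    start date is covered too.
    """
    rows = []
    for (country, region), (last_date, npis) in last_known.items():
        day = last_date + timedelta(days=1)
        while day <= end_date:
            row = {"CountryName": country, "RegionName": region, "Date": day.isoformat()}
            row.update(npis)  # same NPI values every day: the plan is "frozen"
            rows.append(row)
            day += timedelta(days=1)
    return rows


# Made-up geography and NPI values, for illustration only.
plans = {("CountryA", ""): (date(2020, 12, 13), {"C1_School closing": 2.0,
                                                 "C2_Workplace closing": 1.0})}
for row in freeze_scenario(plans, date(2020, 12, 16)):
    print(row)
```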


### Check it

In [51]:
%%time
validate(start_date=start_date_str,
         end_date=end_date_str,
         ip_file=scenario_file,
         output_file="predictions/val_6_month_future.csv")

Generating predictions from 2020-12-22 to 2021-06-20...
Saved predictions to predictions/val_6_month_future.csv
Done!
All good!
CPU times: user 7.3 s, sys: 1.03 s, total: 8.33 s
Wall time: 4min 13s
