# Stage 1 Model Selection: Summary of Results

Model selection used rolling forecast origin, a form of time series cross-validation. To avoid leaking future observations into training, the method moves the forecast origin forward in time step by step and makes a prediction from each origin. For each new fold we moved the origin forward by a stride of seven days. Model selection was a two-stage procedure. In the first stage, we used an aggregate regional-level time series to screen and identify the most promising candidate models for up to a 365 day forecast (27 folds).
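
To make the fold construction concrete, here is a minimal sketch of rolling forecast origin splitting. It is illustrative only and is not the code used to produce the results in this notebook; the function name `rolling_forecast_origin`, its parameters, and the toy series are assumptions made for the example.

```python
import numpy as np


def rolling_forecast_origin(series, min_train_size, horizon, stride=7):
    """Yield (train, test) splits for time series cross-validation.

    Hypothetical helper: the forecast origin starts at `min_train_size`
    and moves forward by `stride` observations for each new fold. The
    test set is always the `horizon` observations after the origin, so
    no future data is visible during training.
    """
    origin = min_train_size
    while origin + horizon <= len(series):
        yield series[:origin], series[origin:origin + horizon]
        origin += stride


# Toy daily series of 100 observations (made-up data, not the study data).
y = np.arange(100)
folds = list(rolling_forecast_origin(y, min_train_size=60, horizon=14, stride=7))

for train, test in folds:
    print(f"train size={len(train)}, test covers t={test[0]}..{test[-1]}")
```

Each fold trains on everything before the origin and is scored on the block immediately after it; error measures are then summarised across folds, as in the tables below.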

This notebook generates the summary result tables and charts for stage 1 of model selection (screening).


In [1]:
import pandas as pd
import os
from os import listdir
from os.path import isfile, join

In [2]:
cwd = os.getcwd()
cwd

'/home/tom/Documents/code/swast-benchmarking/results'

In [3]:
if os.path.basename(cwd) != "results":
    mypath = './results/model_selection/stage1/'
    TABLE_PATH = './paper/tables/'
    FIGURE_PATH = './paper/figures/'
    APPENDIX_PATH = './paper/appendix/'
else:
    mypath = './model_selection/stage1/'
    TABLE_PATH = '../paper/tables/'
    FIGURE_PATH = '../paper/figures/'
    APPENDIX_PATH = '../paper/appendix/'

In [4]:
result_files = [f for f in listdir(mypath) if isfile(join(mypath, f))]

In [5]:
results_mean = pd.DataFrame()
results_med = pd.DataFrame()
results_mean_std = pd.DataFrame()

## Point Estimate Results

In [6]:
error_measures = ['smape', 'rmse', 'mase', 'coverage_80', 'coverage_95']

In [7]:
for metric in error_measures:
    to_read = [filename for filename in result_files if metric in filename]
    model_names = [name[:name.index('_')] for name in to_read]
    
    for filename, model_name in zip(to_read, model_names):
        df = pd.read_csv(mypath + filename, index_col=0)

        prefix = model_name + '_' + metric
        results_mean[prefix + '_mean'] = df.mean()
        results_mean[prefix  + '_std'] = df.std()
        results_med[prefix + '_med'] = df.median()
        results_med[prefix + '_iqr'] = df.quantile(0.75) - df.quantile(0.25)
        
        results_mean_std[prefix] = results_mean[prefix + '_mean'].map('{:,.2f}'.format) \
            + ' (' + results_mean[prefix  + '_std'].map('{:,.2f}'.format) + ')'

In [8]:
results_mean.filter(like="fbp")

Unnamed: 0,Trust-fbp-arima_smape_mean,Trust-fbp-arima_smape_std,Trust-fbp_smape_mean,Trust-fbp_smape_std,Trust-fbp-arima_rmse_mean,Trust-fbp-arima_rmse_std,Trust-fbp_rmse_mean,Trust-fbp_rmse_std,Trust-fbp-arima_mase_mean,Trust-fbp-arima_mase_std,Trust-fbp_mase_mean,Trust-fbp_mase_std,Trust-fbp_coverage_80_mean,Trust-fbp_coverage_80_std,Trust-fbp-arima_coverage_80_mean,Trust-fbp-arima_coverage_80_std,Trust-fbp_coverage_95_mean,Trust-fbp_coverage_95_std,Trust-fbp-arima_coverage_95_mean,Trust-fbp-arima_coverage_95_std
7,2.888748,1.154105,2.992764,1.453381,76.322438,31.861061,81.477287,45.993262,0.779735,0.338296,0.807967,0.423243,0.777778,0.203321,0.78836,0.214726,0.915344,0.138727,0.936508,0.127365
14,3.069071,1.078228,3.054805,1.252905,82.961792,31.404482,86.065006,41.350955,0.829899,0.319189,0.824645,0.360452,0.769841,0.170802,0.775132,0.181248,0.912698,0.114377,0.928571,0.112066
21,3.14256,0.854233,3.077355,1.008357,87.044831,27.738723,88.93257,36.300717,0.850839,0.258226,0.830861,0.288207,0.763668,0.140693,0.776014,0.141517,0.91358,0.092485,0.924162,0.096384
28,3.181019,0.628019,3.085396,0.817484,89.581903,24.119081,90.760518,32.138012,0.862184,0.197235,0.833448,0.231103,0.763228,0.117264,0.781746,0.109109,0.914021,0.078158,0.921958,0.082896
35,3.243412,0.546131,3.098144,0.727439,92.174299,22.037625,92.251442,28.970328,0.880112,0.174982,0.837451,0.201529,0.765079,0.107002,0.774603,0.101583,0.913228,0.07153,0.920635,0.073201
42,3.307821,0.517146,3.115493,0.678602,94.456973,20.510388,93.563846,26.26138,0.898571,0.165175,0.842769,0.182511,0.763668,0.102073,0.768078,0.097409,0.910935,0.067857,0.919753,0.064414
49,3.363752,0.48871,3.138307,0.630983,96.489349,19.224001,94.93357,23.533562,0.914623,0.155135,0.849555,0.164352,0.764928,0.098424,0.764928,0.090275,0.908541,0.064131,0.919123,0.056171
56,3.419441,0.440075,3.163085,0.580087,98.388413,17.644049,96.35959,20.534614,0.930381,0.13977,0.856983,0.145062,0.761905,0.095138,0.761905,0.079552,0.906746,0.059294,0.917328,0.04831
63,3.480343,0.420162,3.191124,0.543042,100.425831,15.985038,97.673905,17.567544,0.947574,0.130419,0.865372,0.130012,0.760141,0.09259,0.753674,0.077808,0.904762,0.056206,0.91652,0.041902
70,3.522194,0.391239,3.216202,0.526889,101.982125,13.918642,98.77572,15.222167,0.959693,0.117955,0.873086,0.122297,0.757672,0.091429,0.74709,0.0703,0.902116,0.054428,0.916402,0.035814


## Formatted results

In [9]:
point_forecasts = pd.concat([results_mean_std.filter(like='smape'), 
                             results_mean_std.filter(like='rmse'),
                             results_mean_std.filter(like='mase')], axis=1)

In [10]:
pf_smape = point_forecasts.filter(like='smape')
pf_smape.reindex(sorted(pf_smape.columns), axis=1)

Unnamed: 0,Trust-arima_smape,Trust-fbp-arima_smape,Trust-fbp_smape,Trust-reg-arima_smape,Trust-snaive_smape
7,3.04 (1.26),2.89 (1.15),2.99 (1.45),2.82 (0.96),3.48 (1.19)
14,3.39 (1.28),3.07 (1.08),3.05 (1.25),3.16 (0.96),3.94 (1.32)
21,3.58 (1.19),3.14 (0.85),3.08 (1.01),3.38 (0.95),4.12 (1.38)
28,3.73 (1.14),3.18 (0.63),3.09 (0.82),3.53 (0.88),4.19 (1.33)
35,3.87 (1.18),3.24 (0.55),3.10 (0.73),3.70 (0.94),4.35 (1.34)
42,3.98 (1.19),3.31 (0.52),3.12 (0.68),3.85 (0.99),4.51 (1.33)
49,4.07 (1.17),3.36 (0.49),3.14 (0.63),3.98 (1.02),4.61 (1.32)
56,4.14 (1.12),3.42 (0.44),3.16 (0.58),4.09 (0.98),4.67 (1.33)
63,4.22 (1.09),3.48 (0.42),3.19 (0.54),4.20 (1.05),4.77 (1.36)
70,4.27 (1.02),3.52 (0.39),3.22 (0.53),4.29 (1.06),4.82 (1.26)


In [11]:
pf_rmse = point_forecasts.filter(like='rmse')
pf_rmse.reindex(sorted(pf_rmse.columns), axis=1)

Unnamed: 0,Trust-arima_rmse,Trust-fbp-arima_rmse,Trust-fbp_rmse,Trust-reg-arima_rmse,Trust-snaive_rmse
7,81.33 (38.91),76.32 (31.86),81.48 (45.99),74.38 (25.07),96.91 (37.11)
14,92.78 (41.97),82.96 (31.40),86.07 (41.35),84.99 (26.96),109.71 (41.88)
21,99.88 (41.13),87.04 (27.74),88.93 (36.30),91.97 (26.99),115.38 (42.94)
28,104.92 (40.30),89.58 (24.12),90.76 (32.14),96.63 (25.82),118.34 (42.92)
35,109.55 (40.69),92.17 (22.04),92.25 (28.97),101.42 (26.70),123.26 (42.33)
42,113.36 (40.41),94.46 (20.51),93.56 (26.26),105.73 (27.14),128.11 (40.34)
49,116.52 (39.75),96.49 (19.22),94.93 (23.53),109.54 (27.64),131.40 (39.79)
56,119.28 (38.52),98.39 (17.64),96.36 (20.53),112.77 (26.88),133.95 (40.20)
63,122.31 (37.26),100.43 (15.99),97.67 (17.57),116.08 (28.47),136.59 (40.30)
70,124.65 (35.42),101.98 (13.92),98.78 (15.22),118.89 (28.51),138.70 (37.25)


In [12]:
pf_mase = point_forecasts.filter(like='mase')
pf_mase.reindex(sorted(pf_mase.columns), axis=1)

Unnamed: 0,Trust-arima_mase,Trust-fbp-arima_mase,Trust-fbp_mase,Trust-reg-arima_mase,Trust-snaive_mase
7,0.82 (0.39),0.78 (0.34),0.81 (0.42),0.76 (0.28),0.94 (0.35)
14,0.92 (0.40),0.83 (0.32),0.82 (0.36),0.85 (0.29),1.06 (0.39)
21,0.97 (0.37),0.85 (0.26),0.83 (0.29),0.91 (0.29),1.12 (0.41)
28,1.01 (0.36),0.86 (0.20),0.83 (0.23),0.96 (0.28),1.14 (0.40)
35,1.05 (0.37),0.88 (0.17),0.84 (0.20),1.00 (0.29),1.18 (0.40)
42,1.08 (0.37),0.90 (0.17),0.84 (0.18),1.05 (0.31),1.23 (0.40)
49,1.11 (0.36),0.91 (0.16),0.85 (0.16),1.08 (0.31),1.25 (0.39)
56,1.13 (0.34),0.93 (0.14),0.86 (0.15),1.11 (0.30),1.27 (0.39)
63,1.15 (0.33),0.95 (0.13),0.87 (0.13),1.14 (0.31),1.30 (0.40)
70,1.17 (0.31),0.96 (0.12),0.87 (0.12),1.17 (0.31),1.31 (0.37)


In [13]:
columns = pf_rmse.columns
columns = [s.replace('_rmse', '') for s in columns]
pf_rmse.columns = columns
pf_rmse.sort_index(axis=1)

Unnamed: 0,Trust-arima,Trust-fbp,Trust-fbp-arima,Trust-reg-arima,Trust-snaive
7,81.33 (38.91),81.48 (45.99),76.32 (31.86),74.38 (25.07),96.91 (37.11)
14,92.78 (41.97),86.07 (41.35),82.96 (31.40),84.99 (26.96),109.71 (41.88)
21,99.88 (41.13),88.93 (36.30),87.04 (27.74),91.97 (26.99),115.38 (42.94)
28,104.92 (40.30),90.76 (32.14),89.58 (24.12),96.63 (25.82),118.34 (42.92)
35,109.55 (40.69),92.25 (28.97),92.17 (22.04),101.42 (26.70),123.26 (42.33)
42,113.36 (40.41),93.56 (26.26),94.46 (20.51),105.73 (27.14),128.11 (40.34)
49,116.52 (39.75),94.93 (23.53),96.49 (19.22),109.54 (27.64),131.40 (39.79)
56,119.28 (38.52),96.36 (20.53),98.39 (17.64),112.77 (26.88),133.95 (40.20)
63,122.31 (37.26),97.67 (17.57),100.43 (15.99),116.08 (28.47),136.59 (40.30)
70,124.65 (35.42),98.78 (15.22),101.98 (13.92),118.89 (28.51),138.70 (37.25)


In [14]:
columns = pf_smape.columns
columns = [s.replace('_smape', '') for s in columns]
pf_smape.columns = columns
pf_smape.sort_index(axis=1)

Unnamed: 0,Trust-arima,Trust-fbp,Trust-fbp-arima,Trust-reg-arima,Trust-snaive
7,3.04 (1.26),2.99 (1.45),2.89 (1.15),2.82 (0.96),3.48 (1.19)
14,3.39 (1.28),3.05 (1.25),3.07 (1.08),3.16 (0.96),3.94 (1.32)
21,3.58 (1.19),3.08 (1.01),3.14 (0.85),3.38 (0.95),4.12 (1.38)
28,3.73 (1.14),3.09 (0.82),3.18 (0.63),3.53 (0.88),4.19 (1.33)
35,3.87 (1.18),3.10 (0.73),3.24 (0.55),3.70 (0.94),4.35 (1.34)
42,3.98 (1.19),3.12 (0.68),3.31 (0.52),3.85 (0.99),4.51 (1.33)
49,4.07 (1.17),3.14 (0.63),3.36 (0.49),3.98 (1.02),4.61 (1.32)
56,4.14 (1.12),3.16 (0.58),3.42 (0.44),4.09 (0.98),4.67 (1.33)
63,4.22 (1.09),3.19 (0.54),3.48 (0.42),4.20 (1.05),4.77 (1.36)
70,4.27 (1.02),3.22 (0.53),3.52 (0.39),4.29 (1.06),4.82 (1.26)


In [15]:
columns = pf_mase.columns
columns = [s.replace('_mase', '') for s in columns]
pf_mase.columns = columns
pf_mase.sort_index(axis=1)

Unnamed: 0,Trust-arima,Trust-fbp,Trust-fbp-arima,Trust-reg-arima,Trust-snaive
7,0.82 (0.39),0.81 (0.42),0.78 (0.34),0.76 (0.28),0.94 (0.35)
14,0.92 (0.40),0.82 (0.36),0.83 (0.32),0.85 (0.29),1.06 (0.39)
21,0.97 (0.37),0.83 (0.29),0.85 (0.26),0.91 (0.29),1.12 (0.41)
28,1.01 (0.36),0.83 (0.23),0.86 (0.20),0.96 (0.28),1.14 (0.40)
35,1.05 (0.37),0.84 (0.20),0.88 (0.17),1.00 (0.29),1.18 (0.40)
42,1.08 (0.37),0.84 (0.18),0.90 (0.17),1.05 (0.31),1.23 (0.40)
49,1.11 (0.36),0.85 (0.16),0.91 (0.16),1.08 (0.31),1.25 (0.39)
56,1.13 (0.34),0.86 (0.15),0.93 (0.14),1.11 (0.30),1.27 (0.39)
63,1.15 (0.33),0.87 (0.13),0.95 (0.13),1.14 (0.31),1.30 (0.40)
70,1.17 (0.31),0.87 (0.12),0.96 (0.12),1.17 (0.31),1.31 (0.37)


In [16]:
coverage = results_mean_std.filter(like='80')
coverage

Unnamed: 0,Trust-arima_coverage_80,Trust-fbp_coverage_80,Trust-fbp-arima_coverage_80
7,0.81 (0.20),0.78 (0.20),0.79 (0.21)
14,0.78 (0.17),0.77 (0.17),0.78 (0.18)
21,0.77 (0.15),0.76 (0.14),0.78 (0.14)
28,0.76 (0.14),0.76 (0.12),0.78 (0.11)
35,0.75 (0.15),0.77 (0.11),0.77 (0.10)
42,0.74 (0.15),0.76 (0.10),0.77 (0.10)
49,0.73 (0.15),0.76 (0.10),0.76 (0.09)
56,0.73 (0.14),0.76 (0.10),0.76 (0.08)
63,0.72 (0.14),0.76 (0.09),0.75 (0.08)
70,0.72 (0.13),0.76 (0.09),0.75 (0.07)


In [17]:
coverage_80 = results_mean_std.filter(like='80')
columns = coverage_80.columns
columns = [s.replace('_coverage_80', '') for s in columns]
coverage_80.columns = columns
coverage_80.sort_index(axis=1)

Unnamed: 0,Trust-arima,Trust-fbp,Trust-fbp-arima
7,0.81 (0.20),0.78 (0.20),0.79 (0.21)
14,0.78 (0.17),0.77 (0.17),0.78 (0.18)
21,0.77 (0.15),0.76 (0.14),0.78 (0.14)
28,0.76 (0.14),0.76 (0.12),0.78 (0.11)
35,0.75 (0.15),0.77 (0.11),0.77 (0.10)
42,0.74 (0.15),0.76 (0.10),0.77 (0.10)
49,0.73 (0.15),0.76 (0.10),0.76 (0.09)
56,0.73 (0.14),0.76 (0.10),0.76 (0.08)
63,0.72 (0.14),0.76 (0.09),0.75 (0.08)
70,0.72 (0.13),0.76 (0.09),0.75 (0.07)


In [18]:
coverage_95 = results_mean_std.filter(like='95')
columns = coverage_95.columns
columns = [s.replace('_coverage_95', '') for s in columns]
coverage_95.columns = columns
coverage_95.sort_index(axis=1)

Unnamed: 0,Trust-arima,Trust-fbp,Trust-fbp-arima,Trust-reg-arima
7,0.96 (0.09),0.92 (0.14),0.94 (0.13),0.96 (0.08)
14,0.95 (0.11),0.91 (0.11),0.93 (0.11),0.94 (0.11)
21,0.95 (0.10),0.91 (0.09),0.92 (0.10),0.93 (0.11)
28,0.94 (0.09),0.91 (0.08),0.92 (0.08),0.93 (0.10)
35,0.94 (0.09),0.91 (0.07),0.92 (0.07),0.92 (0.10)
42,0.94 (0.08),0.91 (0.07),0.92 (0.06),0.91 (0.10)
49,0.93 (0.08),0.91 (0.06),0.92 (0.06),0.91 (0.09)
56,0.93 (0.08),0.91 (0.06),0.92 (0.05),0.90 (0.08)
63,0.93 (0.07),0.90 (0.06),0.92 (0.04),0.89 (0.09)
70,0.92 (0.06),0.90 (0.05),0.92 (0.04),0.89 (0.09)


In [19]:
#Write these rather large tables to LaTeX files for use in the appendix

#Table S1: MASE
pf_mase.sort_index(axis=1).to_latex(buf=f'{APPENDIX_PATH}tableS1.tex')

#Table S2: 80% Coverage
coverage_80.sort_index(axis=1).to_latex(buf=f'{APPENDIX_PATH}tableS2.tex')

#Table S3: 95% coverage
coverage_95.sort_index(axis=1).to_latex(buf=f'{APPENDIX_PATH}tableS3.tex')

#extra tables for sMAPE and RMSE - not used in paper.
pf_smape.sort_index(axis=1).to_latex(buf=f'{APPENDIX_PATH}symmetricmape.tex')
pf_rmse.sort_index(axis=1).to_latex(buf=f'{APPENDIX_PATH}rootmeansqua.tex')


print(f'Appendix summary tables written to {APPENDIX_PATH}')

Appendix summary tables written to ../paper/appendix/


## Naive Benchmark (Seasonal Naive)

We chose a naive baseline forecasting method to ensure that the sophisticated methods tested in the study were only considered for the final benchmark if they provided more accurate point forecasts than the simplest of models. As emergency care demand data are seasonal, we opted for the well-known Seasonal Naive method. This method carries forward the most recent observation for the same day of the week. For example, if we are forecasting next Tuesday, the observation from the most recent Tuesday is used as the predicted value.
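
The idea can be sketched in a few lines. This is a minimal illustration under the assumption of daily data with a weekly season (`period=7`); the function name `seasonal_naive_forecast` and the toy numbers are hypothetical and are not taken from the study code.

```python
import numpy as np


def seasonal_naive_forecast(train, horizon, period=7):
    """Forecast by repeating the last full season of the training data.

    Hypothetical helper: with daily data and period=7, the forecast for
    each future day is the most recent observation from the same day of
    the week.
    """
    train = np.asarray(train)
    if len(train) < period:
        raise ValueError("need at least one full season of training data")
    last_season = train[-period:]
    # Repeat the last season enough times to cover the horizon, then trim.
    reps = int(np.ceil(horizon / period))
    return np.tile(last_season, reps)[:horizon]


# Two weeks of made-up daily counts (Monday to Sunday, twice).
history = [10, 20, 30, 40, 50, 60, 70,
           11, 21, 31, 41, 51, 61, 71]
forecast = seasonal_naive_forecast(history, horizon=10)
print(forecast)
```

Only the final week of `history` influences the forecast; anything older is ignored, which is what makes the method a simple baseline.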

In [20]:
# Select the seasonal naive columns. Their order follows the concat used
# to build point_forecasts above: sMAPE, RMSE, MASE.
pf_snaive = point_forecasts.filter(like='snaive')
pf_snaive.columns = ['sMAPE', 'RMSE', 'MASE']

In [21]:
pf_snaive[['MASE', 'sMAPE', 'RMSE']].to_latex(buf=f'{TABLE_PATH}table2.tex')

In [22]:
pf_snaive[:-1]

Unnamed: 0,sMAPE,RMSE,MASE
7,3.48 (1.19),96.91 (37.11),0.94 (0.35)
14,3.94 (1.32),109.71 (41.88),1.06 (0.39)
21,4.12 (1.38),115.38 (42.94),1.12 (0.41)
28,4.19 (1.33),118.34 (42.92),1.14 (0.40)
35,4.35 (1.34),123.26 (42.33),1.18 (0.40)
42,4.51 (1.33),128.11 (40.34),1.23 (0.40)
49,4.61 (1.32),131.40 (39.79),1.25 (0.39)
56,4.67 (1.33),133.95 (40.20),1.27 (0.39)
63,4.77 (1.36),136.59 (40.30),1.30 (0.40)
70,4.82 (1.26),138.70 (37.25),1.31 (0.37)


In [23]:
print('\nTable 2: Cross-Validation of Seasonal Naive Point Forecasts')
print(pf_snaive[:-1])


Table 2: Cross-Validation of Seasonal Naive Point Forecasts
          sMAPE            RMSE         MASE
7   3.48 (1.19)   96.91 (37.11)  0.94 (0.35)
14  3.94 (1.32)  109.71 (41.88)  1.06 (0.39)
21  4.12 (1.38)  115.38 (42.94)  1.12 (0.41)
28  4.19 (1.33)  118.34 (42.92)  1.14 (0.40)
35  4.35 (1.34)  123.26 (42.33)  1.18 (0.40)
42  4.51 (1.33)  128.11 (40.34)  1.23 (0.40)
49  4.61 (1.32)  131.40 (39.79)  1.25 (0.39)
56  4.67 (1.33)  133.95 (40.20)  1.27 (0.39)
63  4.77 (1.36)  136.59 (40.30)  1.30 (0.40)
70  4.82 (1.26)  138.70 (37.25)  1.31 (0.37)
77  4.88 (1.18)  140.73 (34.49)  1.33 (0.34)
84  4.94 (1.21)  143.04 (34.07)  1.34 (0.35)
