# Data collected from CovidHub

https://covid19forecasthub.org/doc/ensemble/
## About the Data
In Covid-Hub, models were built for cases and deaths, with forecasts by state
that are then aggregated to the national level. In this notebook we use the national-level (US) data.

The observed (ground truth) Covid data are in the dataframe `covid_data_death`,
while `forecasts_death` holds the forecasting models (the ensembles and the component models used in the ensemble).

Forecasts can be quantiles or point predictions, as indicated in the `type` column. Here we use only the quantiles, which
are the values used to build the ensemble.

### Modelos

The `forecasts_death` table contains many models.
#### COVIDhub-4_week_ensemble

"This ensemble produces forecasts of incident cases (discontinued as of February 2023), incident deaths, and cumulative deaths (discontinued as of March 2023) at horizons of 1 through 4 weeks ahead, and forecasts of incident hospitalizations at horizons of 1 through 28 days ahead. For all of these targets, the ensemble forecasts are computed as the equally-weighted median of all component forecasts at each location, forecast horizon, and quantile level."

In [35]:
import pandas as pd
import polars as pl
import polars.selectors as cs

import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
import scoringrules as sr
from mosqlient.scoring import compute_wis


from lets_plot import *
LetsPlot.setup_html()

#### Ground Truth

The ground truth data are simple. The table has `value` for the observed values and `target_end_date` for the date
of the measurement.

In [36]:
covid_data = pl.read_parquet('./data/covid_truth_data_death.parquet')
covid_data = covid_data.with_columns(
    pl.lit('ground_truth').alias('type'),
    pl.col('target_end_date').cast(pl.Datetime),
    pl.col('target_end_date').dt.weekday().alias('target_day_of_week')
)


In [37]:
(
    ggplot(data=covid_data) +
    geom_line(aes(x='target_end_date', y='value')) +
    geom_point(aes(x='target_end_date', y='value')) +
    ggsize(1400, 400)+
    theme_bw()
)

#### Forecasts

The forecast table holds, for each model, the predicted value for a given `horizon` and a given `quantile`.
Each forecast has a `forecast_date`, the date on which it was submitted, and a `target_end_date`,
the date being predicted (for example, with a `forecast_date` of 2020-01-01 I predict the number
of deaths for the `target_end_date` 2020-01-08).
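
As a small illustration of how the two dates relate, the sketch below takes the `forecast_date`, `horizon` and `target_end_date` values of the four `AIpert-pwllnod` rows displayed later in this notebook and computes the lead time in days. It uses only the standard library; `isoweekday()` follows the same Monday=1 to Sunday=7 convention that the day-of-week columns in this notebook show.

```python
from datetime import date

# (forecast_date, horizon, target_end_date) copied from the AIpert-pwllnod rows shown in this notebook
rows = [
    (date(2021, 1, 11), 1, date(2021, 1, 23)),
    (date(2021, 1, 11), 2, date(2021, 1, 30)),
    (date(2021, 1, 11), 3, date(2021, 2, 6)),
    (date(2021, 1, 11), 4, date(2021, 2, 13)),
]

# Days between submission and the date being predicted, per horizon
lead_days = {horizon: (target - submitted).days for submitted, horizon, target in rows}

# ISO weekday of every target date (Monday=1, ..., Sunday=7)
target_weekdays = {target.isoweekday() for _, _, target in rows}

print(lead_days)
print(target_weekdays)
```

For these rows each additional horizon adds seven days, and every `target_end_date` falls on a Saturday.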

Some models were submitted more than once, so they have
two `forecast_date` values for the same quantile, horizon and `target_end_date`. The reason may have been
a correction or a submission error. Either way, the ensemble uses the latest submission.
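
The "keep the latest submission" rule can be sketched with plain Python as follows. The model names, dates and values are hypothetical and only serve to show the idea; the notebook itself does this with a Polars `group_by` and join.

```python
from datetime import date

# Hypothetical submissions: (model, quantile, horizon, target_end_date, forecast_date, value)
submissions = [
    ("model-A", 0.5, "1", date(2021, 1, 23), date(2021, 1, 10), 100.0),
    ("model-A", 0.5, "1", date(2021, 1, 23), date(2021, 1, 11), 105.0),  # resubmission
    ("model-B", 0.5, "1", date(2021, 1, 23), date(2021, 1, 11), 90.0),
]

latest = {}
for model, quantile, horizon, target, submitted, value in submissions:
    key = (model, quantile, horizon, target)
    # Keep the row with the most recent forecast_date for each key
    if key not in latest or submitted > latest[key][0]:
        latest[key] = (submitted, value)

# Map each model to the value of its surviving submission
deduplicated = {key[0]: value for key, (_, value) in latest.items()}
print(deduplicated)
```

Here `model-A` keeps only its resubmitted value, while `model-B`, which was submitted once, is unchanged.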

Thus, our `forecasts_death` table is meant to be restricted to the `forecast_date` values that fall on a Monday (note that this filter is commented out in the loading cell). In addition, we drop the models that do not appear in the paper, using as the criterion the model names listed on page $6$ of the supplementary material (*Supplemental Materials for Comparing trained and untrained
probabilistic ensemble forecasts of COVID-19 cases and deaths in the United States*: https://ars.els-cdn.com/content/image/1-s2.0-S0169207022000966-mmc1.pdf).

Note that not all of the models listed there were present in the data, so we have $49$ models, $9$ fewer than the paper (which reports using $58$).
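
The bookkeeping behind these counts is a comparison of two sets of names. A minimal sketch with set operations is shown below; the two sets are small illustrative subsets of the names that appear in this notebook, not the full lists.

```python
# Illustrative subsets only; the full lists are defined in the cells of this notebook
paper_models = {"AIpert-pwllnod", "BPagano-RtDriven", "CMU-TimeSeries", "UCSB-ACTS"}
dataset_models = {"AIpert-pwllnod", "BPagano-RtDriven", "COVIDhub-4_week_ensemble"}

matched = sorted(paper_models & dataset_models)            # in the paper and in the data
missing_from_data = sorted(paper_models - dataset_models)  # in the paper only
extra_in_data = sorted(dataset_models - paper_models)      # in the data only

print(matched)
print(missing_from_data)
print(extra_in_data)
```

Applied to the full lists, the size of the intersection gives the number of matched models and the first difference gives the names that are in the paper but not in the data.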

In [38]:
# This list contains the models named in the paper (supplementary material, page 6)
matched_models = ['AIpert-pwllnod',
'BPagano-RtDriven',
'CEID-Walk',
'CMU-TimeSeries',
'Columbia_UNC-SurvCon',
'Covid19Sim-Simulator',
'CovidActNow-SEIR_CAN',
'CovidAnalytics-DELPHI',
'COVIDhub-baseline',
'CU-select',
#'COVIDhub-ensemble',
#'COVIDhub_CDC-ensemble',
#'COVIDhub-trained_ensemble',
#'COVIDhub-4_week_ensemble',
'DDS-NBDS',
'epiforecasts-ensemble1',
'Google_Harvard-CPF',
'GT_CHHS-COVID19',
'GT-DeepCOVID',
'IEM_MED-CovidProject',
'IHME-CurveFit',
'IowaStateLW-STEM',
'IUPUI-HkPrMobiDyR',
'JCB-PRM',
'JHU_CSSE-DECOM',
'JHU_IDD-CovidSP',
'JHUAPL-Bucky',
'Karlen-pypm',
'LANL-GrowthRate',
'LNQ-ens1',
'Microsoft-DeepSTIA',
'MIT_CritData-GBCF',
'MIT_ISOLAT-Mixtures',
'MIT-Cassandra',
'MITCovAlliance-SIR',
'MOBS-GLEAM_COVID',
'MSRA-DeepST',
'MUNI-ARIMA',
'NotreDame-FRED',
'NotreDame-mobility',
'OliverWyman-Navigator',
'PSI-DRAFT',
'Quantori-Multiagents',
'RobertWalraven-ESG',
'RPI_UW-Mob_Collision',
'SDSC_ISG-TrendModel',
'SigSci-TS',
'SteveMcConnell-CovidComplete',
'SWC-TerminusCM',
'UA-EpiCovDA',
'UChicago-CovidIL',
'UCLA-SuEIR',
'UCM_MESALab-FoGSEIR',
'UCSB-ACTS',
'UCSD_NEU-DeepGLEAM',
'UMass-MechBayes',
'UMich-RidgeTfReg',
'USACE-ERDC_SEIR',
'USC-SI_kJalpha',
'UT-Mobility',
'Wadhwani_AI-BayesOpt',
'YYG-ParamSearch']
matched_models


In [39]:
models = pl.read_parquet('./data/models_death.parquet')

models = models.with_columns(
    pl.col('model').is_in(matched_models).alias('include_ensemble')
)

# Also flag COVIDhub-4_week_ensemble, which we will use later
models = models.with_columns(
    pl.when(pl.col('model').is_in(['COVIDhub-4_week_ensemble']))#, 'COVIDhub-trained_ensemble']))
    .then(pl.lit(True))
    .otherwise(pl.col('include_ensemble'))
    .alias('include_ensemble')
)
models

model,designation,include_ensemble
str,str,bool
"""KITmetricslab-select_ensemble""","""primary""",false
"""CovidAnalytics-DELPHI""","""primary""",true
"""UMass-MechBayes""","""primary""",true
"""UT-Osiris""","""primary""",false
"""USACE-ERDC_SEIR""","""primary""",true
…,…,…
"""MIT-Cassandra""","""primary""",true
"""PandemicCentral-COVIDForest""","""primary""",false
"""JHU_IDD-CovidSP""","""primary""",true
"""OHT_JHU-nbxd""","""primary""",false


In [6]:
forecast = pl.read_parquet('./data/fullforecasts_death.parquet')
forecast = forecast.with_columns(
    pl.col('forecast_date').cast(pl.Datetime),
    pl.col('target_end_date').cast(pl.Datetime),
    pl.col('target_end_date').dt.weekday().alias('target_day_of_week'),
    pl.col('forecast_date').dt.weekday().alias('forecast_day_of_week')
)
# ).filter(
#     pl.col('forecast_day_of_week') == 1,
# )

forecast = forecast.join(models, on='model', how='left').filter(
    pl.col('include_ensemble')
)

latest_dates = (
    forecast
    .group_by(['model', 'quantile', 'target_end_date', 'horizon'])
    .agg(pl.col('forecast_date').max().alias('forecast_date'))
)

forecast = forecast.join(latest_dates, on=['model', 'quantile', 'target_end_date', 'horizon','forecast_date'], how='inner')

In [7]:
forecast.filter(
    pl.col('model') == 'BPagano-RtDriven'
).sort('forecast_date').head(10)

model,forecast_date,location,horizon,temporal_resolution,target_variable,target_end_date,type,quantile,value,location_name,population,geo_type,geo_value,abbreviation,full_location_name,target_day_of_week,forecast_day_of_week,designation,include_ensemble
str,datetime[μs],str,str,str,str,datetime[μs],str,f64,f64,str,f64,str,str,str,str,i8,i8,str,bool
"""BPagano-RtDriven""",2020-10-04 00:00:00,"""US""","""1""","""wk""","""inc death""",2020-10-17 00:00:00,"""quantile""",0.01,3103.283,"""United States""",332875137.0,"""state""","""us""","""US""","""United States""",6,7,"""primary""",True
"""BPagano-RtDriven""",2020-10-04 00:00:00,"""US""","""1""","""wk""","""inc death""",2020-10-17 00:00:00,"""quantile""",0.025,3328.707,"""United States""",332875137.0,"""state""","""us""","""US""","""United States""",6,7,"""primary""",True
"""BPagano-RtDriven""",2020-10-04 00:00:00,"""US""","""1""","""wk""","""inc death""",2020-10-17 00:00:00,"""quantile""",0.05,3530.225,"""United States""",332875137.0,"""state""","""us""","""US""","""United States""",6,7,"""primary""",True
"""BPagano-RtDriven""",2020-10-04 00:00:00,"""US""","""1""","""wk""","""inc death""",2020-10-17 00:00:00,"""quantile""",0.1,3766.196,"""United States""",332875137.0,"""state""","""us""","""US""","""United States""",6,7,"""primary""",True
"""BPagano-RtDriven""",2020-10-04 00:00:00,"""US""","""1""","""wk""","""inc death""",2020-10-17 00:00:00,"""quantile""",0.15,3926.44,"""United States""",332875137.0,"""state""","""us""","""US""","""United States""",6,7,"""primary""",True
"""BPagano-RtDriven""",2020-10-04 00:00:00,"""US""","""1""","""wk""","""inc death""",2020-10-17 00:00:00,"""quantile""",0.2,4054.113,"""United States""",332875137.0,"""state""","""us""","""US""","""United States""",6,7,"""primary""",True
"""BPagano-RtDriven""",2020-10-04 00:00:00,"""US""","""1""","""wk""","""inc death""",2020-10-17 00:00:00,"""quantile""",0.25,4163.785,"""United States""",332875137.0,"""state""","""us""","""US""","""United States""",6,7,"""primary""",True
"""BPagano-RtDriven""",2020-10-04 00:00:00,"""US""","""1""","""wk""","""inc death""",2020-10-17 00:00:00,"""quantile""",0.3,4262.349,"""United States""",332875137.0,"""state""","""us""","""US""","""United States""",6,7,"""primary""",True
"""BPagano-RtDriven""",2020-10-04 00:00:00,"""US""","""1""","""wk""","""inc death""",2020-10-17 00:00:00,"""quantile""",0.35,4353.726,"""United States""",332875137.0,"""state""","""us""","""US""","""United States""",6,7,"""primary""",True
"""BPagano-RtDriven""",2020-10-04 00:00:00,"""US""","""1""","""wk""","""inc death""",2020-10-17 00:00:00,"""quantile""",0.4,4440.46,"""United States""",332875137.0,"""state""","""us""","""US""","""United States""",6,7,"""primary""",True


In [8]:
forecast['model'].unique()

model
str
"""Covid19Sim-Simulator"""
"""PSI-DRAFT"""
"""YYG-ParamSearch"""
"""AIpert-pwllnod"""
"""epiforecasts-ensemble1"""
…
"""SigSci-TS"""
"""JHU_IDD-CovidSP"""
"""RobertWalraven-ESG"""
"""JCB-PRM"""


# Visualizing the Data

In [9]:
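def quantile_plot(forecast, ground_truth):
    """Plot the 95% and 50% forecast intervals and the median per horizon, with the ground truth overlaid."""
    quantile_df = forecast.pivot(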
        index=['target_end_date', 'horizon'],
        on='quantile',
        values='value'
    )
    return (
        ggplot(quantile_df, aes(x='target_end_date')) +
        geom_ribbon(aes(ymin='0.025', ymax='0.975'), fill='#084594', alpha=0.1,size=0.2,manual_key='2.5% to 97.5%') +
        geom_ribbon(aes(ymin='0.25', ymax='0.75'), fill='#2171b5', alpha=0.3,size=0.2,manual_key='25% to 75%') +
        geom_line(aes(y='0.5'), color='blue', size=0.4, manual_key='Median (0.5)') +
        facet_grid(y='horizon', scales='fixed') +
        geom_line(aes(x='target_end_date', y='value'), data=ground_truth, color='black',size=0.5, linetype='dashed', manual_key='Ground Truth') +
        ggsize(1200, 800) +
        theme_bw() +
        labs(title='Forecast Quantiles as Ribbons', y='Value', x='Date')
        )
    

In [10]:
import ipywidgets as widgets
from IPython.display import display, clear_output

model_options = forecast.select('model').unique().to_series().sort().to_list()
model_dropdown = widgets.Dropdown(
    options=model_options,
    value='COVIDhub-4_week_ensemble',
    description='Model:',
    style={'description_width': 'initial'},
    layout=widgets.Layout(width='50%')
)

output = widgets.Output()

def update_plot(change):
    with output:
        clear_output(wait=True)
        selected_model = change['new']
        display(quantile_plot(forecast.filter(pl.col('model') == selected_model), covid_data))

model_dropdown.observe(update_plot, names='value')

display(model_dropdown)
with output:
    display(quantile_plot(forecast.filter(pl.col('model') == model_dropdown.value), covid_data))
display(output)


In [11]:
fs = forecast.pivot(
    values='value',
    index=['model', 'forecast_date', 'target_end_date', 'horizon'],
    on='quantile'
)

fs = fs.join(covid_data[['target_end_date','value']],on=['target_end_date'],how='left')

In [12]:
fs.head()

model,forecast_date,target_end_date,horizon,0.025,0.975,0.25,0.75,0.01,0.05,0.1,0.15,0.2,0.3,0.35,0.4,0.45,0.5,0.55,0.6,0.65,0.7,0.8,0.85,0.9,0.95,0.99,value
str,datetime[μs],datetime[μs],str,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64
"""AIpert-pwllnod""",2021-01-11 00:00:00,2021-01-23 00:00:00,"""1""",15585.12472,30107.23154,,,,,,,,,,,,,,,,,,,,,,21731.0
"""AIpert-pwllnod""",2021-01-11 00:00:00,2021-01-30 00:00:00,"""2""",15873.52349,36138.769513,,,,,,,,,,,,,,,,,,,,,,21862.0
"""AIpert-pwllnod""",2021-01-11 00:00:00,2021-02-06 00:00:00,"""3""",16301.06891,44059.836613,,,,,,,,,,,,,,,,,,,,,,21394.0
"""AIpert-pwllnod""",2021-01-11 00:00:00,2021-02-13 00:00:00,"""4""",16888.65601,54505.710936,,,,,,,,,,,,,,,,,,,,,,17536.0
"""AIpert-pwllnod""",2021-01-18 00:00:00,2021-01-30 00:00:00,"""1""",17927.563802,33673.501984,,,,,,,,,,,,,,,,,,,,,,21862.0


In [13]:
#0.01, 0.025, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 0.975, 0.99
fs = fs.rename({'0.01' : 'lower_98',
                '0.025': 'lower_95',
                '0.05' : 'lower_90',
                '0.1' : 'lower_80',
                '0.15' : 'lower_70',
                '0.2' : 'lower_60',
                '0.25' : 'lower_50',
                '0.3' : 'lower_40',
                '0.35' : 'lower_30',
                '0.4' : 'lower_20',
                '0.45' : 'lower_10',
                '0.5' : 'pred',
                '0.55' : 'upper_10',
                '0.6' : 'upper_20',
                '0.65' : 'upper_30',
                '0.7' : 'upper_40',
                '0.75' : 'upper_50',
                '0.8' : 'upper_60',
                '0.85' : 'upper_70',
                '0.9' : 'upper_80',
                '0.95' : 'upper_90',
                '0.975' : 'upper_95',
                '0.99' : 'upper_98'})
fs.head()

model,forecast_date,target_end_date,horizon,lower_95,upper_95,lower_50,upper_50,lower_98,lower_90,lower_80,lower_70,lower_60,lower_40,lower_30,lower_20,lower_10,pred,upper_10,upper_20,upper_30,upper_40,upper_60,upper_70,upper_80,upper_90,upper_98,value
str,datetime[μs],datetime[μs],str,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64
"""AIpert-pwllnod""",2021-01-11 00:00:00,2021-01-23 00:00:00,"""1""",15585.12472,30107.23154,,,,,,,,,,,,,,,,,,,,,,21731.0
"""AIpert-pwllnod""",2021-01-11 00:00:00,2021-01-30 00:00:00,"""2""",15873.52349,36138.769513,,,,,,,,,,,,,,,,,,,,,,21862.0
"""AIpert-pwllnod""",2021-01-11 00:00:00,2021-02-06 00:00:00,"""3""",16301.06891,44059.836613,,,,,,,,,,,,,,,,,,,,,,21394.0
"""AIpert-pwllnod""",2021-01-11 00:00:00,2021-02-13 00:00:00,"""4""",16888.65601,54505.710936,,,,,,,,,,,,,,,,,,,,,,17536.0
"""AIpert-pwllnod""",2021-01-18 00:00:00,2021-01-30 00:00:00,"""1""",17927.563802,33673.501984,,,,,,,,,,,,,,,,,,,,,,21862.0


In [14]:
aux_matched_models = np.array(fs['model'].unique())
aux_matched_models

array(['BPagano-RtDriven', 'COVIDhub-4_week_ensemble',
       'UCM_MESALab-FoGSEIR', 'MIT-Cassandra', 'AIpert-pwllnod',
       'USACE-ERDC_SEIR', 'MOBS-GLEAM_COVID', 'IEM_MED-CovidProject',
       'RPI_UW-Mob_Collision', 'IHME-CurveFit', 'Covid19Sim-Simulator',
       'JCB-PRM', 'MITCovAlliance-SIR', 'MIT_CritData-GBCF',
       'RobertWalraven-ESG', 'COVIDhub-baseline', 'UMass-MechBayes',
       'Microsoft-DeepSTIA', 'DDS-NBDS', 'SigSci-TS',
       'SDSC_ISG-TrendModel', 'JHUAPL-Bucky', 'OliverWyman-Navigator',
       'PSI-DRAFT', 'SWC-TerminusCM', 'NotreDame-mobility',
       'epiforecasts-ensemble1', 'JHU_IDD-CovidSP', 'GT-DeepCOVID',
       'LANL-GrowthRate', 'CEID-Walk', 'IowaStateLW-STEM', 'MUNI-ARIMA',
       'YYG-ParamSearch', 'Columbia_UNC-SurvCon', 'UCSD_NEU-DeepGLEAM',
       'UMich-RidgeTfReg', 'CovidAnalytics-DELPHI', 'MIT_ISOLAT-Mixtures',
       'UT-Mobility', 'MSRA-DeepST', 'UA-EpiCovDA', 'LNQ-ens1',
       'USC-SI_kJalpha', 'CU-select', 'JHU_CSSE-DECOM',
       'SteveMc

In [15]:
'''
Models marked with ### appear on page 6 of the paper's supplementary material but not in the full_data dataset.
aux_matched_models = ['AIpert-pwllnod', 
                    'BPagano-RtDriven',
                    'CEID-Walk',
                    ### CMU-TimeSeries
                    'Columbia_UNC-SurvCon',
                    'Covid19Sim-Simulator',
                    ### CovidActNow-SEIR_CAN
                    'CovidAnalytics-DELPHI',
                    'COVIDhub-baseline', 
                    *** 'COVIDhub-4_week_ensemble', 
                    'CU-select',
                    'DDS-NBDS', 
                    'epiforecasts-ensemble1',
                    ### Google_Harvard-CPF
                    ### GT_CHHS-COVID19
                    'GT-DeepCOVID',
                    'IEM_MED-CovidProject',
                    'IHME-CurveFit',
                    'IowaStateLW-STEM',
                    'IUPUI-HkPrMobiDyR',
                    'JCB-PRM',
                    'JHU_CSSE-DECOM',
                    'JHU_IDD-CovidSP',
                    'JHUAPL-Bucky',
                    'Karlen-pypm',
                    'LANL-GrowthRate',
                    'LNQ-ens1',
                    'Microsoft-DeepSTIA',
                    'MIT_CritData-GBCF',
                    'MIT_ISOLAT-Mixtures',
                    'MIT-Cassandra',
                    'MITCovAlliance-SIR',
                    'MOBS-GLEAM_COVID',
                    'MSRA-DeepST',
                    'MUNI-ARIMA',
                    ### NotreDame-FRED
                    'NotreDame-mobility',
                    'OliverWyman-Navigator',
                    'PSI-DRAFT',
                    ### Quantori-Multiagents
                    'RobertWalraven-ESG',
                    'RPI_UW-Mob_Collision', 
                    'SDSC_ISG-TrendModel', 
                    'SigSci-TS', 
                    'SteveMcConnell-CovidComplete',
                    'SWC-TerminusCM', 
                    'UA-EpiCovDA',
                    ### UChicago-CovidIL
                    'UCLA-SuEIR',
                    'UCM_MESALab-FoGSEIR',
                    ### UCSB-ACTS
                    'UCSD_NEU-DeepGLEAM',
                    'UMass-MechBayes',
                    'UMich-RidgeTfReg',
                    'USACE-ERDC_SEIR',
                    'USC-SI_kJalpha',
                    'UT-Mobility', 
                    ### Wadhwani_AI-BayesOpt
                    'YYG-ParamSearch'
                    ]
'''
aux_list_matched = []
for i in aux_matched_models:
    if i in matched_models:
        aux_list_matched.append(i)
    else:
        print("Not match:", i)
print("# matched:", len(aux_list_matched))
print("# total here:", len(aux_matched_models))

print("Not in here, but in page 6 of supplementary material:")
for i in matched_models:
    if i not in aux_matched_models:
        print("   ", i)

Not match: COVIDhub-4_week_ensemble
# matched: 49
# total here: 50
Not in here, but in page 6 of supplementary material:
    CMU-TimeSeries
    CovidActNow-SEIR_CAN
    Google_Harvard-CPF
    GT_CHHS-COVID19
    NotreDame-FRED
    Quantori-Multiagents
    UChicago-CovidIL
    UCSB-ACTS
    Wadhwani_AI-BayesOpt


In [16]:
aux_fs = fs.filter(pl.col('model')!="COVIDhub-4_week_ensemble")
print(len(aux_fs['model'].unique()))
aux_fs

49


model,forecast_date,target_end_date,horizon,lower_95,upper_95,lower_50,upper_50,lower_98,lower_90,lower_80,lower_70,lower_60,lower_40,lower_30,lower_20,lower_10,pred,upper_10,upper_20,upper_30,upper_40,upper_60,upper_70,upper_80,upper_90,upper_98,value
str,datetime[μs],datetime[μs],str,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64
"""AIpert-pwllnod""",2021-01-11 00:00:00,2021-01-23 00:00:00,"""1""",15585.12472,30107.23154,,,,,,,,,,,,,,,,,,,,,,21731.0
"""AIpert-pwllnod""",2021-01-11 00:00:00,2021-01-30 00:00:00,"""2""",15873.52349,36138.769513,,,,,,,,,,,,,,,,,,,,,,21862.0
"""AIpert-pwllnod""",2021-01-11 00:00:00,2021-02-06 00:00:00,"""3""",16301.06891,44059.836613,,,,,,,,,,,,,,,,,,,,,,21394.0
"""AIpert-pwllnod""",2021-01-11 00:00:00,2021-02-13 00:00:00,"""4""",16888.65601,54505.710936,,,,,,,,,,,,,,,,,,,,,,17536.0
"""AIpert-pwllnod""",2021-01-18 00:00:00,2021-01-30 00:00:00,"""1""",17927.563802,33673.501984,,,,,,,,,,,,,,,,,,,,,,21862.0
…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…
"""epiforecasts-ensemble1""",2022-02-28 00:00:00,2022-04-02 00:00:00,"""4""",3038.0,18729.0,5133.0,11104.0,2896.0,3291.0,3829.0,4302.0,4687.0,5559.0,5950.0,6370.0,6855.0,7380.0,8041.0,8569.0,9332.0,10139.0,12284.0,13382.0,14780.0,16977.0,19713.0,4512.0
"""epiforecasts-ensemble1""",2022-03-07 00:00:00,2022-03-19 00:00:00,"""1""",5558.0,9888.0,6587.0,8533.0,5477.0,5708.0,5990.0,6235.0,6428.0,6769.0,6946.0,7125.0,7333.0,7524.0,7694.0,7876.0,8088.0,8294.0,8787.0,9080.0,9388.0,9661.0,10084.0,7256.0
"""epiforecasts-ensemble1""",2022-03-07 00:00:00,2022-03-26 00:00:00,"""2""",3484.0,9098.0,4574.0,6944.0,3434.0,3605.0,3891.0,4135.0,4344.0,4793.0,4994.0,5175.0,5408.0,5628.0,5887.0,6111.0,6358.0,6661.0,7248.0,7556.0,8045.0,8660.0,9447.0,5412.0
"""epiforecasts-ensemble1""",2022-03-07 00:00:00,2022-04-02 00:00:00,"""3""",1963.0,8653.0,3017.0,5659.0,1886.0,2092.0,2330.0,2584.0,2814.0,3193.0,3404.0,3619.0,3833.0,4109.0,4399.0,4744.0,4993.0,5329.0,6099.0,6583.0,7280.0,8072.0,9068.0,4512.0


In [17]:
aux_fs.write_csv("results/forecast_quantiles_death_v3.csv")

## Computing WIS
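
The score computed in this section can be written as the average, over the $K$ quantile levels, of twice the quantile (pinball) loss. With predictive quantiles $q_1, \dots, q_K$ at levels $\tau_1, \dots, \tau_K$ and observed value $y$ (the same symbols used in the docstring of `compute_wis2`), a sketch of the formula is:

$$
\mathrm{WIS}(q_{1:K}, y) = \frac{1}{K} \sum_{k=1}^{K} 2\,\bigl(\mathbf{1}\{y \le q_k\} - \tau_k\bigr)\,(q_k - y)
$$

Each term of the sum is non-negative, so the score is zero only when every quantile equals the observation, and lower values indicate better forecasts.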

In [18]:
fs = fs.rename({'lower_98' : '0.01',
                'lower_95' : '0.025',
                'lower_90' : '0.05',
                'lower_80' : '0.1',
                'lower_70' : '0.15',
                'lower_60' : '0.2',
                'lower_50' : '0.25',
                'lower_40' : '0.3',
                'lower_30' : '0.35',
                'lower_20' : '0.4',
                'lower_10' : '0.45',
                'pred' : '0.5',
                'upper_10' : '0.55',
                'upper_20' : '0.6',
                'upper_30' : '0.65',
                'upper_40' : '0.7',
                'upper_50' : '0.75',
                'upper_60' : '0.8',
                'upper_70' : '0.85',
                'upper_80' : '0.9',
                'upper_90' : '0.95',
                'upper_95' : '0.975',
                'upper_98' : '0.99'})
fs.head()

model,forecast_date,target_end_date,horizon,0.025,0.975,0.25,0.75,0.01,0.05,0.1,0.15,0.2,0.3,0.35,0.4,0.45,0.5,0.55,0.6,0.65,0.7,0.8,0.85,0.9,0.95,0.99,value
str,datetime[μs],datetime[μs],str,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64
"""AIpert-pwllnod""",2021-01-11 00:00:00,2021-01-23 00:00:00,"""1""",15585.12472,30107.23154,,,,,,,,,,,,,,,,,,,,,,21731.0
"""AIpert-pwllnod""",2021-01-11 00:00:00,2021-01-30 00:00:00,"""2""",15873.52349,36138.769513,,,,,,,,,,,,,,,,,,,,,,21862.0
"""AIpert-pwllnod""",2021-01-11 00:00:00,2021-02-06 00:00:00,"""3""",16301.06891,44059.836613,,,,,,,,,,,,,,,,,,,,,,21394.0
"""AIpert-pwllnod""",2021-01-11 00:00:00,2021-02-13 00:00:00,"""4""",16888.65601,54505.710936,,,,,,,,,,,,,,,,,,,,,,17536.0
"""AIpert-pwllnod""",2021-01-18 00:00:00,2021-01-30 00:00:00,"""1""",17927.563802,33673.501984,,,,,,,,,,,,,,,,,,,,,,21862.0


In [19]:
def compute_wis2(quantiles, y, taus):
    """
    Compute the WIS as the mean, over the K quantile levels, of twice the quantile (pinball) loss.

    - quantiles (list): List of predictive quantiles q_1 to q_K
    - y (float): Observed quantity
    - taus (list): List of corresponding tau values (e.g., [0.025, 0.25, 0.5, 0.75, 0.975])
    
    Returns:
        float: WIS value
    """
    K = len(quantiles)
    wis = 0
    for k in range(K):
        indicator = 1 if y <= quantiles[k] else 0
        wis += 2 * (indicator - taus[k]) * (quantiles[k] - y)
    return wis / K
    
def maybe_compute_wis2(quantiles, y, taus):
    try:
        return compute_wis2(quantiles, y, taus)
    except Exception:
        return np.nan

        
taus = [0.01, 0.025, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 0.975, 0.99]
def row_wis2(row,taus):
    quantiles = [row[str(t)] for t in taus]
    y = row['value']
    return maybe_compute_wis2(quantiles, y, taus)
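
To make the scoring concrete, here is a small worked example on hypothetical numbers. The helper `pinball_wis` is an illustrative re-statement of the formula used by `compute_wis2`, written so that the snippet is self-contained; the three quantile levels and values are made up.

```python
def pinball_wis(quantiles, y, taus):
    """Mean over quantile levels of twice the pinball loss (same formula as compute_wis2)."""
    total = 0.0
    for q, tau in zip(quantiles, taus):
        indicator = 1 if y <= q else 0
        total += 2 * (indicator - tau) * (q - y)
    return total / len(quantiles)

# Hypothetical forecast with three quantile levels
toy_taus = [0.25, 0.5, 0.75]
toy_quantiles = [8.0, 10.0, 12.0]

score_inside = pinball_wis(toy_quantiles, 11.0, toy_taus)   # observation inside the 50% interval
score_outside = pinball_wis(toy_quantiles, 20.0, toy_taus)  # observation above every quantile

print(score_inside, score_outside)
```

With the observation at 11 the three terms are 1.5, 1.0 and 0.5, giving a score of 1.0. With the observation at 20 the terms are 6, 10 and 12, giving 28/3, so the forecast that misses the observation is penalized more.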

In [20]:
fs = fs.with_columns(
    pl.struct(pl.all()).map_elements(lambda r: row_wis2(r,taus),return_dtype=float).alias('wis')
)

In [21]:
fs.head()

model,forecast_date,target_end_date,horizon,0.025,0.975,0.25,0.75,0.01,0.05,0.1,0.15,0.2,0.3,0.35,0.4,0.45,0.5,0.55,0.6,0.65,0.7,0.8,0.85,0.9,0.95,0.99,value,wis
str,datetime[μs],datetime[μs],str,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64
"""AIpert-pwllnod""",2021-01-11 00:00:00,2021-01-23 00:00:00,"""1""",15585.12472,30107.23154,,,,,,,,,,,,,,,,,,,,,,21731.0,
"""AIpert-pwllnod""",2021-01-11 00:00:00,2021-01-30 00:00:00,"""2""",15873.52349,36138.769513,,,,,,,,,,,,,,,,,,,,,,21862.0,
"""AIpert-pwllnod""",2021-01-11 00:00:00,2021-02-06 00:00:00,"""3""",16301.06891,44059.836613,,,,,,,,,,,,,,,,,,,,,,21394.0,
"""AIpert-pwllnod""",2021-01-11 00:00:00,2021-02-13 00:00:00,"""4""",16888.65601,54505.710936,,,,,,,,,,,,,,,,,,,,,,17536.0,
"""AIpert-pwllnod""",2021-01-18 00:00:00,2021-01-30 00:00:00,"""1""",17927.563802,33673.501984,,,,,,,,,,,,,,,,,,,,,,21862.0,


## Model Ensemble - Untrained

*Non-Robust untrained*: For each quantile level, use the mean of the predictions across the component forecasts

*Robust untrained*: For each quantile level, use the median of the predictions across the component forecasts
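
The difference between the two variants shows up when one component is far from the others. The sketch below uses hypothetical predictions of a single quantile level from five component models and only the standard library.

```python
from statistics import mean, median

# Hypothetical predictions of one quantile level from five component models;
# the last value is an outlier.
component_values = [100.0, 110.0, 120.0, 130.0, 1000.0]

nonrobust_untrained = mean(component_values)   # pulled up by the outlier
robust_untrained = median(component_values)    # unaffected by the outlier

print(nonrobust_untrained, robust_untrained)
```

The mean moves to 292 because of the single extreme component, while the median stays at 120, which is why the median-based ensemble is called robust.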


Let us create a dataframe with the mean and median of each quantile prediction in each component forecast.

In [22]:
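components = fs.join(models.filter('include_ensemble'), on='model', how='inner')  # keep only the models flagged for the ensemble
# Note: 'include_ensemble' was also set to True for COVIDhub-4_week_ensemble above,
# so that model is part of this join as well.

# Compute the median and mean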
components = components.group_by(['target_end_date','horizon']).agg(
    [pl.col(str(q)).mean().alias('mean_' + str(q)) for q in taus] +
    [pl.col(str(q)).median().alias('median_' + str(q)) for q in taus]+
    [pl.col('value').first()]
).sort('target_end_date')

# Adjust the dataframe format
components = components.unpivot(
    index=['target_end_date','horizon'],
    on   = ['mean_'+str(q) for q in taus] + ['median_'+str(q) for q in taus]
).with_columns(
    pl.col('variable').str.split('_').map_elements(lambda x: float(x[1]) if len(x) > 1 else None,return_dtype=float).alias('quantile'),
    pl.col('variable').str.split('_').map_elements(lambda x: 'nonrobust_untrained' if x[0] == 'mean' else 'robust_untrained',return_dtype=str).alias('model')
).sort(['model','target_end_date','horizon','quantile'])

In [40]:
components

target_end_date,horizon,variable,value,quantile,model
datetime[μs],str,str,f64,f64,str
2020-05-30 00:00:00,"""1""","""mean_0.01""",4814.87207,0.01,"""nonrobust_untrained"""
2020-05-30 00:00:00,"""1""","""mean_0.025""",5267.539624,0.025,"""nonrobust_untrained"""
2020-05-30 00:00:00,"""1""","""mean_0.05""",5710.081472,0.05,"""nonrobust_untrained"""
2020-05-30 00:00:00,"""1""","""mean_0.1""",6463.135493,0.1,"""nonrobust_untrained"""
2020-05-30 00:00:00,"""1""","""mean_0.15""",6775.090371,0.15,"""nonrobust_untrained"""
…,…,…,…,…,…
2022-04-16 00:00:00,"""4""","""median_0.85""",8567.0,0.85,"""robust_untrained"""
2022-04-16 00:00:00,"""4""","""median_0.9""",9682.608271,0.9,"""robust_untrained"""
2022-04-16 00:00:00,"""4""","""median_0.95""",11446.59615,0.95,"""robust_untrained"""
2022-04-16 00:00:00,"""4""","""median_0.975""",12006.9889,0.975,"""robust_untrained"""


Let us now compare our ensemble with the one that comes directly from the CovidHub dataset.
Note that the CovidHub dataset only has the `COVIDhub-4_week_ensemble`, which is the robust untrained ensemble. So we compare it with a robust untrained ensemble (built by us) using only the $49$ models that we have access to.
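
The comparison relies on two quantities: the difference between the CovidHub value and our recomputed value, and that difference relative to the CovidHub value. As a check of the arithmetic, the sketch below reproduces them for one row, using the values of the first row of the sorted comparison table shown later in this notebook.

```python
# Values copied from the first row of the sorted comparison table in this notebook
value_covidhub = 1892.473494
value_calculated = 2373.95344

diff = value_covidhub - value_calculated
relative_diff = diff / value_covidhub

print(round(diff, 6), round(relative_diff, 6))
```

A negative `relative_diff` means that our recomputed median is larger than the value published by CovidHub for that date, horizon and quantile.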

In [24]:
covidhub_robust_untrained = fs.sort('target_end_date').filter(
    pl.col('model') == 'COVIDhub-4_week_ensemble'
)

covidhub_robust_untrained = covidhub_robust_untrained.rename({'value':'y'})\
    .unpivot(
        on=[str(t) for t in taus],
        index=["target_end_date",'model','horizon','forecast_date','wis','y'])\
    .rename({'variable':'quantile'})\
    .with_columns(
        pl.col('quantile').cast(pl.Float64)
    ).sort(['model','target_end_date','horizon','quantile'])


In [25]:
comparison = covidhub_robust_untrained.join(components.filter(pl.col('model') =='robust_untrained')[['target_end_date','horizon','quantile','value']],
                                    on=['target_end_date','horizon','quantile'])

comparison = comparison.with_columns(
    (pl.col('value')- pl.col('value_right')).alias('diff'),
)

comparison = comparison.with_columns(
    (pl.col('diff')/pl.col('value')).alias('relative_diff')
)

In [26]:
comparison = comparison.rename({"value_right": "value_calculated", "value": "value_COVIDhub"})

In [27]:
comparison.sort('relative_diff')

target_end_date,model,horizon,forecast_date,wis,y,quantile,value_COVIDhub,value_calculated,diff,relative_diff
datetime[μs],str,str,datetime[μs],f64,f64,f64,f64,f64,f64,f64
2020-07-25 00:00:00,"""COVIDhub-4_week_ensemble""","""4""",2020-06-22 00:00:00,1738.453054,6461.0,0.01,1892.473494,2373.95344,-481.479946,-0.254418
2020-07-25 00:00:00,"""COVIDhub-4_week_ensemble""","""4""",2020-06-22 00:00:00,1738.453054,6461.0,0.025,2020.397113,2457.500189,-437.103075,-0.216345
2020-08-08 00:00:00,"""COVIDhub-4_week_ensemble""","""4""",2020-07-06 00:00:00,1801.97514,7264.0,0.25,3449.45079,4177.818084,-728.367294,-0.211155
2020-08-08 00:00:00,"""COVIDhub-4_week_ensemble""","""4""",2020-07-06 00:00:00,1801.97514,7264.0,0.35,3802.982724,4576.103232,-773.120508,-0.203293
2020-08-08 00:00:00,"""COVIDhub-4_week_ensemble""","""4""",2020-07-06 00:00:00,1801.97514,7264.0,0.3,3635.410395,4373.130688,-737.720293,-0.202926
…,…,…,…,…,…,…,…,…,…,…
2020-07-11 00:00:00,"""COVIDhub-4_week_ensemble""","""4""",2020-06-08 00:00:00,321.610967,5295.0,0.65,5770.706205,3481.888198,2288.818007,0.396627
2020-07-11 00:00:00,"""COVIDhub-4_week_ensemble""","""4""",2020-06-08 00:00:00,321.610967,5295.0,0.55,5386.148982,3247.470652,2138.67833,0.39707
2020-07-11 00:00:00,"""COVIDhub-4_week_ensemble""","""4""",2020-06-08 00:00:00,321.610967,5295.0,0.6,5565.924099,3354.581226,2211.342874,0.3973
2020-07-11 00:00:00,"""COVIDhub-4_week_ensemble""","""4""",2020-06-08 00:00:00,321.610967,5295.0,0.7,6027.753629,3602.763074,2424.990556,0.402304


In [28]:
(
    # ggplot(data=comparison.filter(pl.col('horizon')=='1',pl.col('quantile')==0.5))
    # ggplot(data=comparison.filter(pl.col('horizon')=='2'))
    ggplot(data=comparison)
    + geom_line(aes(x='target_end_date',y='value_COVIDhub'),color='blue')
    + geom_line(aes(x='target_end_date', y='value_calculated'), color='red', linetype='dashed')
    + geom_point(aes(x='target_end_date', y='diff'), color='green')
    + facet_grid(x='horizon',y='quantile',scales='free_y')
)

In [29]:
comparison_selected = comparison.select(["target_end_date","model","horizon","forecast_date","wis","y", "quantile","value_COVIDhub","value_calculated"])
comparison_selected = comparison_selected.rename({"wis": "wis_COVIDhub"})
comparison_selected

target_end_date,model,horizon,forecast_date,wis_COVIDhub,y,quantile,value_COVIDhub,value_calculated
datetime[μs],str,str,datetime[μs],f64,f64,f64,f64,f64
2020-06-20 00:00:00,"""COVIDhub-4_week_ensemble""","""1""",2020-06-08 00:00:00,1096.222806,4150.0,0.01,4192.586346,3530.078525
2020-06-20 00:00:00,"""COVIDhub-4_week_ensemble""","""1""",2020-06-08 00:00:00,1096.222806,4150.0,0.025,4386.786431,3651.724528
2020-06-20 00:00:00,"""COVIDhub-4_week_ensemble""","""1""",2020-06-08 00:00:00,1096.222806,4150.0,0.05,4592.840585,3768.819348
2020-06-20 00:00:00,"""COVIDhub-4_week_ensemble""","""1""",2020-06-08 00:00:00,1096.222806,4150.0,0.1,4819.626987,3990.9
2020-06-20 00:00:00,"""COVIDhub-4_week_ensemble""","""1""",2020-06-08 00:00:00,1096.222806,4150.0,0.15,4974.223759,4225.913469
…,…,…,…,…,…,…,…,…
2022-04-16 00:00:00,"""COVIDhub-4_week_ensemble""","""4""",2022-03-14 00:00:00,788.510435,3127.0,0.85,8567.0,8567.0
2022-04-16 00:00:00,"""COVIDhub-4_week_ensemble""","""4""",2022-03-14 00:00:00,788.510435,3127.0,0.9,9683.0,9682.608271
2022-04-16 00:00:00,"""COVIDhub-4_week_ensemble""","""4""",2022-03-14 00:00:00,788.510435,3127.0,0.95,11447.0,11446.59615
2022-04-16 00:00:00,"""COVIDhub-4_week_ensemble""","""4""",2022-03-14 00:00:00,788.510435,3127.0,0.975,11692.0,12006.9889


In [30]:
id_cols = ["target_end_date", "model", "horizon", "forecast_date", "wis_COVIDhub", "y"]

df_long = comparison_selected.unpivot(
    index=["quantile"] + id_cols,
    on=["value_COVIDhub", "value_calculated"],
    variable_name="value_type",
    value_name="value"
)

df_long = df_long.with_columns(
    (pl.col("quantile").cast(str) + "_" + pl.col("value_type").str.replace("value_", "")).alias("cat_val_type")
)

df_wide = df_long.pivot(
    values="value",
    index=id_cols,
    on="cat_val_type",
    aggregate_function="first"
)

df_wide

target_end_date,model,horizon,forecast_date,wis_COVIDhub,y,0.01_COVIDhub,0.025_COVIDhub,0.05_COVIDhub,0.1_COVIDhub,0.15_COVIDhub,0.2_COVIDhub,0.25_COVIDhub,0.3_COVIDhub,0.35_COVIDhub,0.4_COVIDhub,0.45_COVIDhub,0.5_COVIDhub,0.55_COVIDhub,0.6_COVIDhub,0.65_COVIDhub,0.7_COVIDhub,0.75_COVIDhub,0.8_COVIDhub,0.85_COVIDhub,0.9_COVIDhub,0.95_COVIDhub,0.975_COVIDhub,0.99_COVIDhub,0.01_calculated,0.025_calculated,0.05_calculated,0.1_calculated,0.15_calculated,0.2_calculated,0.25_calculated,0.3_calculated,0.35_calculated,0.4_calculated,0.45_calculated,0.5_calculated,0.55_calculated,0.6_calculated,0.65_calculated,0.7_calculated,0.75_calculated,0.8_calculated,0.85_calculated,0.9_calculated,0.95_calculated,0.975_calculated,0.99_calculated
datetime[μs],str,str,datetime[μs],f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64
2020-06-20 00:00:00,"""COVIDhub-4_week_ensemble""","""1""",2020-06-08 00:00:00,1096.222806,4150.0,4192.586346,4386.786431,4592.840585,4819.626987,4974.223759,5124.842101,5248.872354,5358.659994,5472.227019,5565.827844,5654.499757,5773.725831,5880.153643,6010.892246,6137.567112,6256.263617,6379.734312,6510.537952,6682.963581,6901.63735,7219.748612,7564.237784,7876.355869,3530.078525,3651.724528,3768.819348,3990.9,4225.913469,4333.616709,4435.91808,4592.712092,4742.2373,4778.871429,4815.275,4950.471578,5048.458128,5175.406428,5269.058938,5443.5331,5920.880744,6059.759956,6220.848925,6407.199732,6679.167733,6944.270268,7746.782128
2020-06-27 00:00:00,"""COVIDhub-4_week_ensemble""","""1""",2020-06-15 00:00:00,627.193827,3865.0,3552.442503,3702.651719,3904.707491,4109.553631,4256.332343,4365.428314,4467.726067,4561.046199,4651.148473,4735.769383,4815.083255,4890.467505,4984.828231,5058.33847,5139.078523,5257.635567,5382.494237,5483.795395,5643.786849,5814.74096,6054.048164,6302.481563,6509.417741,3247.529281,3477.438251,3762.915873,4032.325242,4280.976172,4350.544295,4418.301268,4530.0,4574.5,4636.503623,4714.276677,4839.501048,4927.645106,5004.481031,5127.327358,5218.644103,5337.747119,5455.897697,5623.893424,5807.87048,5985.024082,6155.740782,6343.097031
2020-06-27 00:00:00,"""COVIDhub-4_week_ensemble""","""2""",2020-06-08 00:00:00,1033.442342,3865.0,3717.985802,3893.134537,4125.763798,4380.737749,4570.166569,4730.929302,4863.553319,4994.230479,5116.020477,5255.831687,5368.395279,5504.412064,5618.759366,5756.05309,5884.392323,6033.130212,6215.611691,6390.14996,6610.400476,6882.832778,7283.320097,7693.319991,8095.932636,2750.917664,2954.214373,3186.831811,3270.768953,3434.126137,3575.506514,3674.408136,3770.808123,3875.882055,4262.6954,4394.742857,4437.107143,4480.096429,4613.761744,4822.025809,5021.614892,5247.969635,5523.179032,5841.616016,6263.940378,6988.268599,7494.523865,7838.523052
2020-07-04 00:00:00,"""COVIDhub-4_week_ensemble""","""1""",2020-06-22 00:00:00,167.775616,3682.0,2679.983604,2809.01954,2957.335159,3143.215735,3282.840098,3403.545895,3510.515774,3595.094581,3680.313056,3756.568709,3832.657715,3912.870661,3988.67385,4078.217776,4185.582627,4278.004753,4372.66253,4502.695883,4670.232954,4836.242835,5086.35473,5258.268595,5451.284614,2621.892254,2799.749741,3026.671176,3220.407029,3359.515275,3489.176876,3656.850934,3829.711722,3972.705368,4066.297011,4123.783667,4232.410965,4294.770482,4346.70191,4476.299668,4591.250168,4682.923826,4790.914982,4897.076545,5002.829046,5233.166321,5432.962373,5651.554272
2020-07-04 00:00:00,"""COVIDhub-4_week_ensemble""","""2""",2020-06-15 00:00:00,642.443695,3682.0,3143.270625,3340.46516,3552.218407,3801.157639,3976.372273,4102.68465,4226.584454,4337.782613,4449.410196,4564.762033,4675.989301,4784.258716,4905.479031,5033.091747,5138.514865,5282.606983,5410.21708,5567.25468,5761.062059,6048.368835,6355.891015,6665.631794,6986.692647,2720.210271,2990.818192,3374.813586,3653.084701,3881.126887,4010.064098,4141.990919,4245.076071,4447.214779,4560.344998,4677.55615,4870.629358,4955.739515,5045.045874,5129.757433,5262.298217,5357.945566,5466.430745,5600.414814,5793.346337,6050.076593,6270.388635,6538.705481
…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…
2022-04-02 00:00:00,"""COVIDhub-4_week_ensemble""","""3""",2022-03-07 00:00:00,786.703478,4512.0,1599.0,1998.0,2484.0,3072.0,3617.0,3879.0,4108.0,4383.0,4780.0,5167.0,5439.0,5695.0,6089.0,6703.0,7051.0,7239.0,7597.0,8331.0,9420.0,10041.0,10518.0,10816.0,11088.0,1455.733736,1980.5,2471.323439,3072.337983,3519.067624,3834.326225,4105.030135,4372.65695,4741.35535,5096.4593,5367.69055,5716.5,6159.388763,6716.238805,7148.244043,7246.551465,7763.528398,8482.30637,9535.065205,10122.5,10669.849805,11093.026465,11359.550749
2022-04-02 00:00:00,"""COVIDhub-4_week_ensemble""","""4""",2022-02-28 00:00:00,1228.288696,4512.0,2270.0,2538.0,3034.0,3742.0,4327.0,4608.0,4987.0,5426.0,5882.0,6209.0,6466.0,6902.0,7213.0,7452.0,7716.0,7977.0,8928.0,9284.0,9598.0,10384.0,12006.0,12482.0,13397.0,1773.378341,2516.294981,2885.829514,3637.309495,4148.761985,4521.5,4970.0,5360.358192,5848.919667,6129.1,6461.304635,6771.069887,6976.569887,7255.0434,7567.7673,7870.9478,8652.677141,9346.575579,9935.51545,10498.592447,12720.394089,14314.677725,15464.06526
2022-04-09 00:00:00,"""COVIDhub-4_week_ensemble""","""3""",2022-03-14 00:00:00,661.452609,3902.0,1257.0,1964.0,2153.0,2655.0,3095.0,3295.0,3562.0,3784.0,4040.0,4278.0,4438.0,4791.0,5112.0,5398.0,6047.0,6485.0,6957.0,7366.0,8293.0,9195.0,10491.0,11003.0,12120.0,1257.47893,1964.154632,2153.4,2655.211044,3095.3769,3295.146427,3562.474509,3784.103369,4040.456229,4278.12202,4480.379788,4812.619017,5157.0,5417.5,6046.818142,6484.66651,6956.838309,7365.874529,8292.548506,9194.745307,10490.80725,11002.586688,12137.607321
2022-04-09 00:00:00,"""COVIDhub-4_week_ensemble""","""4""",2022-03-07 00:00:00,789.432609,3902.0,841.0,1148.0,1635.0,2166.0,2428.0,2677.0,2968.0,3368.0,3739.0,4164.0,4370.0,4752.0,5244.0,5717.0,6597.0,7046.0,7464.0,8193.0,9089.0,10140.0,10468.0,10583.0,11241.0,766.240839,1095.0,1580.579455,2072.493357,2357.071017,2643.264342,2965.273809,3317.376117,3654.036396,4048.228298,4325.072112,4800.0,5329.927888,5854.929865,6600.627576,7064.642754,7517.845774,8198.995448,9310.288865,10145.846131,10489.5,10614.063433,11450.000024


In [31]:
# Map each quantile level to its interval-based name
# (e.g. 0.025 -> lower bound of the 95% interval, 0.5 -> point prediction).
quantile_labels = {
    '0.01': 'lower_98', '0.025': 'lower_95', '0.05': 'lower_90',
    '0.1': 'lower_80', '0.15': 'lower_70', '0.2': 'lower_60',
    '0.25': 'lower_50', '0.3': 'lower_40', '0.35': 'lower_30',
    '0.4': 'lower_20', '0.45': 'lower_10',
    '0.5': 'pred',
    '0.55': 'upper_10', '0.6': 'upper_20', '0.65': 'upper_30',
    '0.7': 'upper_40', '0.75': 'upper_50', '0.8': 'upper_60',
    '0.85': 'upper_70', '0.9': 'upper_80', '0.95': 'upper_90',
    '0.975': 'upper_95', '0.99': 'upper_98',
}

# Apply the same mapping to both the COVIDhub and the calculated columns.
df_wide = df_wide.rename({
    f'{level}_{source}': f'{label}_{source}'
    for source in ('COVIDhub', 'calculated')
    for level, label in quantile_labels.items()
})

df_wide.head()

target_end_date,model,horizon,forecast_date,wis_COVIDhub,y,lower_98_COVIDhub,lower_95_COVIDhub,lower_90_COVIDhub,lower_80_COVIDhub,lower_70_COVIDhub,lower_60_COVIDhub,lower_50_COVIDhub,lower_40_COVIDhub,lower_30_COVIDhub,lower_20_COVIDhub,lower_10_COVIDhub,pred_COVIDhub,upper_10_COVIDhub,upper_20_COVIDhub,upper_30_COVIDhub,upper_40_COVIDhub,upper_50_COVIDhub,upper_60_COVIDhub,upper_70_COVIDhub,upper_80_COVIDhub,upper_90_COVIDhub,upper_95_COVIDhub,upper_98_COVIDhub,lower_98_calculated,lower_95_calculated,lower_90_calculated,lower_80_calculated,lower_70_calculated,lower_60_calculated,lower_50_calculated,lower_40_calculated,lower_30_calculated,lower_20_calculated,lower_10_calculated,pred_calculated,upper_10_calculated,upper_20_calculated,upper_30_calculated,upper_40_calculated,upper_50_calculated,upper_60_calculated,upper_70_calculated,upper_80_calculated,upper_90_calculated,upper_95_calculated,upper_98_calculated
datetime[μs],str,str,datetime[μs],f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64
2020-06-20 00:00:00,"""COVIDhub-4_week_ensemble""","""1""",2020-06-08 00:00:00,1096.222806,4150.0,4192.586346,4386.786431,4592.840585,4819.626987,4974.223759,5124.842101,5248.872354,5358.659994,5472.227019,5565.827844,5654.499757,5773.725831,5880.153643,6010.892246,6137.567112,6256.263617,6379.734312,6510.537952,6682.963581,6901.63735,7219.748612,7564.237784,7876.355869,3530.078525,3651.724528,3768.819348,3990.9,4225.913469,4333.616709,4435.91808,4592.712092,4742.2373,4778.871429,4815.275,4950.471578,5048.458128,5175.406428,5269.058938,5443.5331,5920.880744,6059.759956,6220.848925,6407.199732,6679.167733,6944.270268,7746.782128
2020-06-27 00:00:00,"""COVIDhub-4_week_ensemble""","""1""",2020-06-15 00:00:00,627.193827,3865.0,3552.442503,3702.651719,3904.707491,4109.553631,4256.332343,4365.428314,4467.726067,4561.046199,4651.148473,4735.769383,4815.083255,4890.467505,4984.828231,5058.33847,5139.078523,5257.635567,5382.494237,5483.795395,5643.786849,5814.74096,6054.048164,6302.481563,6509.417741,3247.529281,3477.438251,3762.915873,4032.325242,4280.976172,4350.544295,4418.301268,4530.0,4574.5,4636.503623,4714.276677,4839.501048,4927.645106,5004.481031,5127.327358,5218.644103,5337.747119,5455.897697,5623.893424,5807.87048,5985.024082,6155.740782,6343.097031
2020-06-27 00:00:00,"""COVIDhub-4_week_ensemble""","""2""",2020-06-08 00:00:00,1033.442342,3865.0,3717.985802,3893.134537,4125.763798,4380.737749,4570.166569,4730.929302,4863.553319,4994.230479,5116.020477,5255.831687,5368.395279,5504.412064,5618.759366,5756.05309,5884.392323,6033.130212,6215.611691,6390.14996,6610.400476,6882.832778,7283.320097,7693.319991,8095.932636,2750.917664,2954.214373,3186.831811,3270.768953,3434.126137,3575.506514,3674.408136,3770.808123,3875.882055,4262.6954,4394.742857,4437.107143,4480.096429,4613.761744,4822.025809,5021.614892,5247.969635,5523.179032,5841.616016,6263.940378,6988.268599,7494.523865,7838.523052
2020-07-04 00:00:00,"""COVIDhub-4_week_ensemble""","""1""",2020-06-22 00:00:00,167.775616,3682.0,2679.983604,2809.01954,2957.335159,3143.215735,3282.840098,3403.545895,3510.515774,3595.094581,3680.313056,3756.568709,3832.657715,3912.870661,3988.67385,4078.217776,4185.582627,4278.004753,4372.66253,4502.695883,4670.232954,4836.242835,5086.35473,5258.268595,5451.284614,2621.892254,2799.749741,3026.671176,3220.407029,3359.515275,3489.176876,3656.850934,3829.711722,3972.705368,4066.297011,4123.783667,4232.410965,4294.770482,4346.70191,4476.299668,4591.250168,4682.923826,4790.914982,4897.076545,5002.829046,5233.166321,5432.962373,5651.554272
2020-07-04 00:00:00,"""COVIDhub-4_week_ensemble""","""2""",2020-06-15 00:00:00,642.443695,3682.0,3143.270625,3340.46516,3552.218407,3801.157639,3976.372273,4102.68465,4226.584454,4337.782613,4449.410196,4564.762033,4675.989301,4784.258716,4905.479031,5033.091747,5138.514865,5282.606983,5410.21708,5567.25468,5761.062059,6048.368835,6355.891015,6665.631794,6986.692647,2720.210271,2990.818192,3374.813586,3653.084701,3881.126887,4010.064098,4141.990919,4245.076071,4447.214779,4560.344998,4677.55615,4870.629358,4955.739515,5045.045874,5129.757433,5262.298217,5357.945566,5466.430745,5600.414814,5793.346337,6050.076593,6270.388635,6538.705481


In [32]:
df_wide.write_csv("results/comparison_4weeks_WIS_death_v3.csv")

In [33]:
df_aux = pd.read_csv("results/comparison_4weeks_WIS_death_v3.csv")
df_aux

quantile_probs = [0.01, 0.025,
                  0.05, 0.1, 0.15,
                  0.2, 0.25, 0.3,
                  0.35, 0.4, 0.45,
                  0.5,
                  0.55, 0.6, 0.65,
                  0.7, 0.75, 0.8,
                  0.85, 0.9, 0.95,
                  0.975, 0.99]

quantile_names = ['lower_98', 'lower_95',
                  'lower_90', 'lower_80', 'lower_70',
                  'lower_60', 'lower_50', 'lower_40',
                  'lower_30', 'lower_20', 'lower_10',
                  'pred',
                  'upper_10', 'upper_20', 'upper_30',
                  'upper_40', 'upper_50', 'upper_60',
                  'upper_70', 'upper_80', 'upper_90',
                  'upper_95', 'upper_98']

# compute_wis is called with the bare quantile names, so score a renamed view
# of the "_calculated" columns instead of renaming df_aux back and forth.
calculated_cols = [f"{name}_calculated" for name in quantile_names]
preds_calculated = df_aux[calculated_cols].rename(
    columns=dict(zip(calculated_cols, quantile_names))
)

df_aux['wis_calculated'] = compute_wis(preds_calculated, df_aux.y.to_numpy())
df_aux

Unnamed: 0,target_end_date,model,horizon,forecast_date,wis_COVIDhub,y,lower_98_COVIDhub,lower_95_COVIDhub,lower_90_COVIDhub,lower_80_COVIDhub,...,upper_30_calculated,upper_40_calculated,upper_50_calculated,upper_60_calculated,upper_70_calculated,upper_80_calculated,upper_90_calculated,upper_95_calculated,upper_98_calculated,wis_calculated
0,2020-06-20T00:00:00.000000,COVIDhub-4_week_ensemble,1,2020-06-08T00:00:00.000000,1096.222806,4150.0,4192.586346,4386.786431,4592.840585,4819.626987,...,5269.058938,5443.533100,5920.880744,6059.759956,6220.848925,6407.199732,6679.167733,6944.270268,7746.782128,478.721553
1,2020-06-27T00:00:00.000000,COVIDhub-4_week_ensemble,1,2020-06-15T00:00:00.000000,627.193827,3865.0,3552.442503,3702.651719,3904.707491,4109.553631,...,5127.327358,5218.644103,5337.747119,5455.897697,5623.893424,5807.870480,5985.024082,6155.740782,6343.097031,590.166110
2,2020-06-27T00:00:00.000000,COVIDhub-4_week_ensemble,2,2020-06-08T00:00:00.000000,1033.442342,3865.0,3717.985802,3893.134537,4125.763798,4380.737749,...,4822.025809,5021.614892,5247.969635,5523.179032,5841.616016,6263.940378,6988.268599,7494.523865,7838.523052,339.775914
3,2020-07-04T00:00:00.000000,COVIDhub-4_week_ensemble,1,2020-06-22T00:00:00.000000,167.775616,3682.0,2679.983604,2809.019540,2957.335159,3143.215735,...,4476.299668,4591.250168,4682.923826,4790.914982,4897.076545,5002.829046,5233.166321,5432.962373,5651.554272,283.975891
4,2020-07-04T00:00:00.000000,COVIDhub-4_week_ensemble,2,2020-06-15T00:00:00.000000,642.443695,3682.0,3143.270625,3340.465160,3552.218407,3801.157639,...,5129.757433,5262.298217,5357.945566,5466.430745,5600.414814,5793.346337,6050.076593,6270.388635,6538.705481,605.906672
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
367,2022-04-02T00:00:00.000000,COVIDhub-4_week_ensemble,3,2022-03-07T00:00:00.000000,786.703478,4512.0,1599.000000,1998.000000,2484.000000,3072.000000,...,7148.244043,7246.551465,7763.528398,8482.306370,9535.065205,10122.500000,10669.849805,11093.026465,11359.550749,797.297798
368,2022-04-02T00:00:00.000000,COVIDhub-4_week_ensemble,4,2022-02-28T00:00:00.000000,1228.288696,4512.0,2270.000000,2538.000000,3034.000000,3742.000000,...,7567.767300,7870.947800,8652.677141,9346.575579,9935.515450,10498.592447,12720.394089,14314.677725,15464.065260,1195.578965
369,2022-04-09T00:00:00.000000,COVIDhub-4_week_ensemble,3,2022-03-14T00:00:00.000000,661.452609,3902.0,1257.000000,1964.000000,2153.000000,2655.000000,...,6046.818142,6484.666510,6956.838309,7365.874529,8292.548506,9194.745307,10490.807250,11002.586688,12137.607321,666.851446
370,2022-04-09T00:00:00.000000,COVIDhub-4_week_ensemble,4,2022-03-07T00:00:00.000000,789.432609,3902.0,841.000000,1148.000000,1635.000000,2166.000000,...,6600.627576,7064.642754,7517.845774,8198.995448,9310.288865,10145.846131,10489.500000,10614.063433,11450.000024,803.349601


In [34]:
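# index=False keeps the pandas row index out of the file; otherwise it comes
# back as an "Unnamed: 0" column the next time the CSV is read.
df_aux.to_csv("results/comparison_4weeks_WIS_death_v3.csv", index=False)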

**Observation:** `value_calculated` and `wis_calculated` contain, respectively, our quantile-wise median ensemble and its WIS, both computed from $49$ component models following COVIDhub's robust, untrained ensemble methodology. In total, page $6$ of the supplementary material of the paper cited above lists $58$ models, but not all of those names appear in the data, and the data also include some model names that are not in the paper, so we kept the $49$ models that are common to both the paper and the dataset.
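
As a rough illustration of the two quantities in the observation above, the sketch below builds an equally weighted, quantile-wise median ensemble and scores it with the WIS, written as the mean of twice the pinball loss over all quantile levels. The model names, quantile levels and values are made up for the example, and the helpers `median_ensemble` and `wis` are hypothetical. They are not the code used in this notebook, and `compute_wis` from `mosqlient` may use a different normalization.

```python
from statistics import median


def median_ensemble(component_quantiles):
    """Equally weighted median across models, taken separately per quantile level.

    component_quantiles: dict of model name -> dict of quantile level -> value.
    All models are assumed to report the same quantile levels.
    """
    levels = sorted(next(iter(component_quantiles.values())))
    return {
        level: median(model[level] for model in component_quantiles.values())
        for level in levels
    }


def wis(quantiles, y):
    """Weighted interval score as the mean of 2 * pinball loss over all levels."""
    total = 0.0
    for level, q in quantiles.items():
        # Pinball (quantile) loss of prediction q at this level for observation y.
        pinball = (y - q) * level if y >= q else (q - y) * (1 - level)
        total += 2 * pinball
    return total / len(quantiles)


# Toy component forecasts (hypothetical values, three quantile levels only).
components = {
    "model_a": {0.25: 90.0, 0.5: 100.0, 0.75: 110.0},
    "model_b": {0.25: 80.0, 0.5: 120.0, 0.75: 150.0},
    "model_c": {0.25: 95.0, 0.5: 105.0, 0.75: 130.0},
}

ensemble = median_ensemble(components)
score = wis(ensemble, y=100.0)
print(ensemble)
print(score)
```

With the full set of $23$ quantile levels, the same function corresponds to the usual interval-based WIS with $11$ central prediction intervals plus the median.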