# Ensemble Evaluation: Timepoint 3

Location: New York State

Timepoint 3: January 4, 2022. Setting: New York State, coinciding with the arrival of the first Omicron wave. At-home testing was widely available.

For each timepoint, consider the following:
 - What is the most relevant data to use for model calibration?
 - What was our understanding of COVID-19 viral mechanisms at the time? For example, early in the pandemic, we didn't know if reinfection was a common occurrence, or even possible.
 - What are the parameters related to contagiousness/transmissibility and severity of the dominant strain at the time?
 - What policies were in place for a stated location, and how can this information be incorporated into models?

For each timepoint:

1. (a) Take a single model, calibrate it using any historical data prior to the given date, and create a 4-week forecast for cases, hospitalizations, and deaths beginning on the given date. (b) Evaluate the forecast using the COVID-19 Forecasting Hub Error Metrics (WIS, MAE). The single model evaluation should be done in the same way as the ensemble.

2. Repeat (1), but with an ensemble of different models.

   a. It is fine to calibrate each model independently and weight naively.

   b. It would also be fine to calibrate the ensemble as a whole, assigning weights to the different component models, so that you minimize the error of the ensemble vs. historical data.

   c. Use the calibration scores and error metrics computed by the CDC Forecasting Hub. As stated on their website:

   “Periodically, we evaluate the accuracy and precision of the ensemble forecast and component models over recent and historical forecasting periods. Models forecasting incident hospitalizations at a national and state level are evaluated using adjusted relative weighted interval scores (WIS, a measure of distributional accuracy), and adjusted relative mean absolute error (MAE), and calibration scores. Scores are evaluated across weeks, locations, and targets. You can read a paper explaining these procedures in more detail, and look at the most recent monthly evaluation reports. The final report that includes case and death forecast evaluations is 2023-03-13.”

3. Produce the forecast outputs in the format specified by the CDC forecasting challenge, including the specified quantiles.
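The WIS and MAE named in step 1(b) can be computed directly from a set of forecast quantiles. The block below is a minimal, dependency-free sketch of the raw weighted interval score for a single observation. It is not the Forecast Hub's scoring code and does not perform the "adjusted relative" normalization the Hub reports. The function names and the example numbers are hypothetical.

```python
def interval_score(lower, upper, y, alpha):
    """Interval score of a central (1 - alpha) prediction interval [lower, upper]."""
    width = upper - lower
    below = (2.0 / alpha) * max(lower - y, 0.0)  # penalty if y is under the interval
    above = (2.0 / alpha) * max(y - upper, 0.0)  # penalty if y is over the interval
    return width + below + above


def weighted_interval_score(quantiles, y):
    """WIS of one observation y given {quantile level: forecast value}.

    Levels must be symmetric around an included median (0.5).
    """
    levels = sorted(quantiles)
    n_intervals = len(levels) // 2
    if len(levels) % 2 == 0 or abs(levels[n_intervals] - 0.5) > 1e-9:
        raise ValueError("need an odd number of levels with the median in the middle")
    total = 0.5 * abs(y - quantiles[levels[n_intervals]])
    for i in range(n_intervals):
        lo_level, hi_level = levels[i], levels[-1 - i]
        alpha = 2.0 * lo_level  # nominal miscoverage of this central interval
        total += (alpha / 2.0) * interval_score(
            quantiles[lo_level], quantiles[hi_level], y, alpha
        )
    return total / (n_intervals + 0.5)


def mean_absolute_error(point_forecasts, observations):
    """MAE between point forecasts (e.g. medians) and observations."""
    pairs = list(zip(point_forecasts, observations))
    return sum(abs(f - o) for f, o in pairs) / len(pairs)


# Hypothetical forecast for a single target, scored against an observed value of 110
example = {0.025: 80.0, 0.25: 95.0, 0.5: 100.0, 0.75: 105.0, 0.975: 120.0}
wis = weighted_interval_score(example, 110.0)
mae = mean_absolute_error([100.0], [110.0])
```

When every quantile equals the same point forecast, this score reduces to the absolute error, which is why WIS and MAE are reported on the same scale.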

## Load dependencies

In [1]:
import os
import pandas as pd
import numpy as np
from pyciemss.Ensemble.interfaces import (
    load_and_sample_petri_ensemble, load_and_calibrate_and_sample_ensemble_model
)
from pyciemss.PetriNetODE.interfaces import (
    load_and_sample_petri_model,
    load_and_calibrate_and_sample_petri_model,
    load_and_optimize_and_sample_petri_model,
    load_and_calibrate_and_optimize_and_sample_petri_model
)
from pyciemss.visuals import plots
from pyciemss.utils import get_tspan

## Get data

In [2]:
# url = 'https://raw.githubusercontent.com/DARPA-ASKEM/experiments/main/thin-thread-examples/milestone_12month/evaluation/ensemble_eval_SA/datasets/aabb3684-a7ea-4f60-98f1-a8e673ad6df5/dataset.csv'
url = 'https://raw.githubusercontent.com/ciemss/pyciemss/283-july-evaluation-scenario-3/notebook/july_evaluation/Scenario3/data/processed_dataset.csv'
ww_data = pd.read_csv(url, index_col="time")
ww_data_train = ww_data[0:80]
ww_data_train.to_csv("ww_data3.csv")
# # Grab test data for four-week forecast (01/04/2022 - 02/01/2022)
# test_data = ny_data[0:742].drop(columns="timestep")

# # Select historical data up to Timepoint 3: 01/03/2022 (the first 714 rows)
# ny_data = ny_data[0:713]
# ny_data[["I", "H", "D"]].to_csv("NY_data3.csv")
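The commented-out lines above describe the intended split: historical data up to 01/03/2022 for calibration, and 01/04/2022 through 02/01/2022 for the four-week forecast. A minimal sketch of that split by date rather than by row position follows. It uses only the standard library and a synthetic daily series; `split_at_forecast_date` and the series itself are hypothetical and are not part of the notebook's data pipeline.

```python
from datetime import date, timedelta


def split_at_forecast_date(records, forecast_date, horizon_days=28):
    """Split (date, value) records into calibration and evaluation sets.

    Calibration: dates strictly before forecast_date.
    Evaluation: forecast_date through forecast_date + horizon_days, inclusive.
    """
    end = forecast_date + timedelta(days=horizon_days)
    train = [(d, v) for d, v in records if d < forecast_date]
    test = [(d, v) for d, v in records if forecast_date <= d <= end]
    return train, test


# Synthetic daily series, for illustration only
start = date(2021, 12, 1)
records = [(start + timedelta(days=i), float(i)) for i in range(90)]

train, test = split_at_forecast_date(records, date(2022, 1, 4))
```

Splitting on the date keeps the calibration window correct even if rows are added to or removed from the start of the dataset, which a positional slice would not.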

## SEIV model

In [3]:
import sympy
import itertools

from mira.metamodel import *
from mira.modeling import Model
from mira.modeling.askenet.petrinet import AskeNetPetriNetModel

person_units = lambda: Unit(expression=sympy.Symbol('person'))
virus_units = lambda: Unit(expression=sympy.Symbol('virus'))
virus_per_gram_units = lambda: Unit(expression=sympy.Symbol('virus')/sympy.Symbol('gram'))
day_units = lambda: Unit(expression=sympy.Symbol('day'))
per_day_units = lambda: Unit(expression=1/sympy.Symbol('day'))
dimensionless_units = lambda: Unit(expression=sympy.Integer('1'))
gram_units = lambda: Unit(expression=sympy.Symbol('gram'))
per_day_per_person_units = lambda: Unit(expression=1/(sympy.Symbol('day')*sympy.Symbol('person')))

# See Table 1 of the paper
c = {
    'S': Concept(name='S', units=person_units(), identifiers={'ido': '0000514'}),
    'E': Concept(name='E', units=person_units(), identifiers={'apollosv': '0000154'}),
    'I': Concept(name='I', units=person_units(), identifiers={'ido': '0000511'}),
    'V': Concept(name='V', units=person_units(), identifiers={'vido': '0001331'}),
}


parameters = {
    'gamma': Parameter(name='gamma', value=0.08, units=per_day_units()),
    'delta': Parameter(name='delta', value=1/8, units=per_day_units()),
    'alpha': Parameter(name='alpha', value=500, units=gram_units(),
                       distribution=Distribution(type='Uniform1',
                                                 parameters={
                                                     'minimum': 51,
                                                     'maximum': 796
                                                 })),
    'lambda': Parameter(name='lambda', value=0.2, 
                            distribution=Distribution(type='Uniform1',
                                                      parameters={
                                                          'minimum': 0.1,
                                                          'maximum': 0.3
                                                      }),
                        units=per_day_per_person_units()),
    'beta': Parameter(name='beta', value=4.49e7, units=virus_per_gram_units()),
    'k': Parameter(name='k', value=1/3, units=per_day_units()),
}

initials = {
    'S': Initial(concept=Concept(name='S'), value=2_300_000),
    'E': Initial(concept=Concept(name='E'), value=1000),
    'I': Initial(concept=Concept(name='I'), value=0),
    'V': Initial(concept=Concept(name='V'), value=0),
}

S, E, I, V, gamma, delta, alpha, lmbd, beta, k = \
    sympy.symbols('S E I V gamma delta alpha lambda beta k')

t1 = ControlledConversion(subject=c['S'],
                          outcome=c['E'],
                          controller=c['I'],
                          rate_law=S*I*lmbd/(S+I+E))
t2 = NaturalConversion(subject=c['E'],
                       outcome=c['I'],
                       rate_law=k*E)
t3 = NaturalDegradation(subject=c['I'],
                        rate_law=delta*I)
t4 = ControlledProduction(outcome=c['V'],
                          controller=c['I'],
                          rate_law=alpha*beta*(1-gamma)*I)
templates = [t1, t2, t3, t4]
observables = {}
SEIV = TemplateModel(
    templates=templates,
    parameters=parameters,
    initials=initials,
    time=Time(name='t', units=day_units()),
    observables=observables,
    annotations=Annotations(name='Scenario 3 base model'))
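As a sanity check on the rate laws in the templates above, the same system can be integrated by hand. The sketch below writes the four rate laws as ordinary differential equations and steps them with a fixed-step fourth-order Runge-Kutta scheme in plain Python. It uses the point values of the parameters (not their priors) and is independent of pyciemss; `seiv_rhs`, `rk4_step`, and `simulate_seiv` are hypothetical helper names.

```python
def seiv_rhs(state, p):
    """Right-hand side of the SEIV system implied by templates t1..t4."""
    S, E, I, V = state
    new_exposed = p["lambda"] * S * I / (S + I + E)  # t1: S -> E, controlled by I
    dS = -new_exposed
    dE = new_exposed - p["k"] * E                    # t2: E -> I
    dI = p["k"] * E - p["delta"] * I                 # t3: degradation of I
    dV = p["alpha"] * p["beta"] * (1.0 - p["gamma"]) * I  # t4: V produced by I
    return (dS, dE, dI, dV)


def rk4_step(f, state, dt, p):
    """One classical Runge-Kutta step of size dt."""
    k1 = f(state, p)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)), p)
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)), p)
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)), p)
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate_seiv(state0, p, t_end, dt=0.1):
    """Return the list of states at t = 0, dt, 2*dt, ..., t_end."""
    trajectory = [state0]
    state = state0
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(seiv_rhs, state, dt, p)
        trajectory.append(state)
    return trajectory


# Point values taken from the `parameters` and `initials` dictionaries above
params = {"lambda": 0.2, "k": 1 / 3, "delta": 1 / 8,
          "alpha": 500, "beta": 4.49e7, "gamma": 0.08}
traj = simulate_seiv((2_300_000.0, 1000.0, 0.0, 0.0), params, t_end=80)
```

Because `V` only accumulates and nothing removes it, it grows monotonically in this formulation, which is worth keeping in mind when comparing it with measured viral load.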

### Plot prior viral load against measured viral load

In [4]:
num_samples = 10
start_time = 0
end_time = 80 #226 # between 10/02/2020 and 01/25/2021
num_timepoints = (end_time-start_time)*10 + 1

timepoints = list(get_tspan(start_time, end_time, num_timepoints).detach().numpy())

prior_samples = load_and_sample_petri_model(SEIV, num_samples, timepoints=timepoints, method="dopri5",
                                            visual_options={"title": "3_base", "subset":["I_sol", "E_sol", "V_sol"]}, 
                                            time_unit="days")
#display(prior_samples)

schema = plots.trajectories(pd.DataFrame(prior_samples["data"]), subset="V_sol",
                            points=ww_data_train.reset_index(drop=True).rename(columns={"V":"V_data"}))
schema = plots.pad(schema, 5)
plots.ipy_display(schema)




### Plot all other priors

In [5]:
schema = plots.trajectories(prior_samples["data"].drop(columns='V_sol'), subset=".*_sol",
                            #points=ww_data_train.reset_index(drop=True).rename(columns={"V":"V_data"})
                                        )
schema = plots.pad(schema, 5)
plots.ipy_display(schema)




### Calibrate

In [6]:
post_samples = load_and_calibrate_and_sample_petri_model(
    SEIV,
    'ww_data3.csv',
    num_samples,
    num_iterations=100,
    timepoints=timepoints,
    verbose=True,
    noise_scale=1.,
    method="dopri5", time_unit="days")
post_samples['data']



iteration 0: loss = 4.4308144315128873e+18
iteration 25: loss = 4.4308144315128873e+18
iteration 50: loss = 4.4308144315128873e+18
iteration 75: loss = 4.4308144315128873e+18


| | timepoint_id | sample_id | lambda_param | alpha_param | k_param | delta_param | beta_param | gamma_param | E_sol | I_sol | S_sol | V_sol | timepoint_days |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0.221726 | 749.107117 | 0.333333 | 0.125 | 44900000.0 | 0.08 | 1000.000000 | 3.333334e-08 | 2300000.000 | 5.157364e-08 | 0.000000 |
| 1 | 1 | 0 | 0.221726 | 749.107117 | 0.333333 | 0.125 | 44900000.0 | 0.08 | 967.575867 | 3.258273e+01 | 2299999.500 | 5.079591e+10 | 0.100000 |
| 2 | 2 | 0 | 0.221726 | 749.107117 | 0.333333 | 0.125 | 44900000.0 | 0.08 | 936.908752 | 6.371604e+01 | 2299999.250 | 2.001540e+11 | 0.200000 |
| 3 | 3 | 0 | 0.221726 | 749.107117 | 0.333333 | 0.125 | 44900000.0 | 0.08 | 907.910706 | 9.347472e+01 | 2299997.000 | 4.437054e+11 | 0.300000 |
| 4 | 4 | 0 | 0.221726 | 749.107117 | 0.333333 | 0.125 | 44900000.0 | 0.08 | 880.497559 | 1.219296e+02 | 2299994.500 | 7.773078e+11 | 0.400000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8005 | 796 | 9 | 0.214840 | 755.039490 | 0.333333 | 0.125 | 44900000.0 | 0.08 | 29174.382812 | 5.444507e+04 | 2096851.875 | 3.007370e+16 | 79.599998 |
| 8006 | 797 | 9 | 0.214840 | 755.039490 | 0.333333 | 0.125 | 44900000.0 | 0.08 | 29327.089844 | 5.473770e+04 | 2095724.250 | 3.024396e+16 | 79.699997 |
| 8007 | 798 | 9 | 0.214840 | 755.039490 | 0.333333 | 0.125 | 44900000.0 | 0.08 | 29480.501953 | 5.503177e+04 | 2094590.625 | 3.041514e+16 | 79.799995 |
| 8008 | 799 | 9 | 0.214840 | 755.039490 | 0.333333 | 0.125 | 44900000.0 | 0.08 | 29634.617188 | 5.532727e+04 | 2093451.375 | 3.058724e+16 | 79.899994 |


In [7]:
schema = plots.trajectories(pd.DataFrame(post_samples["data"]), subset="V_sol",
                            points=ww_data_train.reset_index(drop=True).rename(columns={"V":"V_data"}))
schema = plots.pad(schema, 5)
plots.ipy_display(schema)




In [8]:
schema = plots.trajectories(post_samples["data"].drop(columns=['V_sol']), subset=".*_sol",
                            #points=ww_data_train.reset_index(drop=True).rename(columns={"V":"V_data"})
                           )
schema = plots.pad(schema, 5)
plots.ipy_display(schema)




## SEIVCDU model

In [9]:
# Add uncertainty
from mira.sources.askenet import model_from_url
SEIVCDU = model_from_url('https://raw.githubusercontent.com/ciemss/pyciemss/283-july-evaluation-scenario-3/notebook/july_evaluation/Scenario3/ES3_detection_log10V.json')
SEIVCDU.parameters['lambda'].value = 0.208 #9.06e-8
SEIVCDU.parameters['lambda'].distribution = Distribution(type="Uniform1", parameters={"minimum": 0.2, "maximum":0.28})
# Keep gamma's point value inside its prior range below; 0.08 matches the SEIV model above
SEIVCDU.parameters['gamma'].value = 0.08
SEIVCDU.parameters['gamma'].distribution = Distribution(type='Uniform1', parameters={'minimum': 0.06, 'maximum': 0.09})
SEIVCDU.parameters['alpha'].distribution = Distribution(type='Uniform1', parameters={'minimum': 51, 'maximum': 796})

SEIVCDU.parameters['beta'].value = 44852600
SEIVCDU.parameters['k'].value = 0.5
SEIVCDU.parameters['k'].distribution = Distribution(type="Uniform1", parameters={"minimum": 0.25, "maximum":0.5})



In [10]:
from mira.modeling.askenet.petrinet import AskeNetPetriNetModel

### Plot prior viral load against measured viral load

In [11]:
num_samples = 10
start_time = 0
end_time = 80 #226 # between 10/02/2020 and 01/25/2021
num_timepoints = (end_time-start_time)*10 + 1
timepoints = list(get_tspan(start_time, end_time, num_timepoints).detach().numpy())

prior_samples = load_and_sample_petri_model( SEIVCDU,
    num_samples, timepoints=timepoints, method="dopri5",
                                            visual_options={"title": "3_base", "subset":["I_sol", "E_sol", "V_sol"]}, 
                                            time_unit="days")

# Plot results
schema = plots.trajectories(pd.DataFrame(prior_samples["data"]), subset="V_sol",
                            points=ww_data_train.reset_index(drop=True).rename(columns={"V":"V_data"}))
schema = plots.pad(schema, 5)
plots.ipy_display(schema)




### Plot all other priors

In [12]:

schema = plots.trajectories(prior_samples["data"], subset=['S_sol','I_sol','C_sol', 'U_sol', 'D_sol'],
                           # points=ww_data_train.reset_index(drop=True).rename(columns={"V":"V_data"})
                           )
schema = plots.pad(schema, 5)
plots.ipy_display(schema)




### Plot posterior viral load against measured viral load

In [13]:
post_samples = load_and_calibrate_and_sample_petri_model(
    SEIVCDU,
    'ww_data3.csv',
    num_samples,
    num_iterations=100,
    timepoints=timepoints,
    verbose=True,
    noise_scale=1.,
    method="euler", time_unit="days")

# Plot results
schema = plots.trajectories(pd.DataFrame(post_samples["data"]), subset="V_sol",
                            points=ww_data_train.reset_index(drop=True).rename(columns={"V":"V_data"}))
schema = plots.pad(schema, 5)
plots.ipy_display(schema)



iteration 0: loss = 2950.4457161426544
iteration 25: loss = 2937.73815369606
iteration 50: loss = 2931.5336575508118
iteration 75: loss = 2930.9372696876526



### Plot posteriors of other variables

In [14]:

schema = plots.trajectories(post_samples["data"], subset=['S_sol','I_sol','C_sol', 'U_sol', 'D_sol'],
                           # points=ww_data_train.reset_index(drop=True).rename(columns={"V":"V_data"})
                           )
schema = plots.pad(schema, 5)
plots.ipy_display(schema)




## SEIRHDS_ww model

In [15]:
from mira.sources.askenet import model_from_json_file
c = {
    'V': Concept(name='V', units=person_units(), identifiers={'vido': '0001331'}),
    'I': Concept(name='I', units=person_units(), identifiers={'ido': '0000511'}),

}
SEIRHDS_ww = model_from_json_file("backburner_models/SEIRHDS_ww.json")
a_ww, b_ww, g_ww, I = sympy.symbols('a_ww b_ww g_ww I')
t4 = ControlledProduction(outcome=c['V'],
                          controller=c['I'],
                          rate_law=a_ww*b_ww*(1-g_ww)*I)
SEIRHDS_ww.templates.append(t4)
SEIRHDS_ww.observables = {}
SEIRHDS_ww.initials['V'] = Initial(concept=Concept(name='V'), value=5)
SEIRHDS_ww.parameters['g_ww'].distribution=Distribution(type='Uniform1', parameters={'minimum': 0.2, 'maximum': 0.6})
SEIRHDS_ww.parameters['beta'].distribution=Distribution(type='Uniform1', parameters={'minimum': 0.3, 'maximum': 0.55})

### Plot prior viral load against measured viral load

In [16]:
num_samples = 10
start_time = 0
end_time = 80 #226 # between 10/02/2020 and 01/25/2021
num_timepoints = (end_time-start_time)*10 + 1
timepoints = list(get_tspan(start_time, end_time, num_timepoints).detach().numpy())

prior_samples = load_and_sample_petri_model( SEIRHDS_ww,
    num_samples, timepoints=timepoints, method="dopri5",
                                            visual_options={"title": "3_base", "subset":["I_sol", "E_sol", "V_sol"]}, 
                                            time_unit="days")

# Plot results
schema = plots.trajectories(prior_samples["data"], subset="V_sol",
                            points=ww_data_train.reset_index(drop=True).rename(columns={"V":"V_data"}))
schema = plots.pad(schema, 5)
plots.ipy_display(schema)




In [17]:
schema = plots.trajectories(prior_samples["data"].drop(columns=['V_sol']), subset=".*_sol",
                            #points=ww_data_train.reset_index(drop=True).rename(columns={"V":"V_data"})
                           )
schema = plots.pad(schema, 5)
plots.ipy_display(schema)




### Plot posterior viral load against measured viral load

In [18]:
post_samples = load_and_calibrate_and_sample_petri_model(
    SEIRHDS_ww,
    'ww_data3.csv',
    num_samples,
    num_iterations=100,
    timepoints=timepoints,
    verbose=True,
    noise_scale=1.,
    method="euler", time_unit="days")

# Plot results
schema = plots.trajectories(post_samples["data"], subset="V_sol",
                            points=ww_data_train.reset_index(drop=True).rename(columns={"V":"V_data"}))
schema = plots.pad(schema, 5)
plots.ipy_display(schema)



iteration 0: loss = 2.380285263583647e+16
iteration 25: loss = 1.2782888898527268e+16
iteration 50: loss = inf
iteration 75: loss = inf



In [19]:
schema = plots.trajectories(post_samples["data"].drop(columns=['V_sol']), subset=".*_sol",
                            #points=ww_data_train.reset_index(drop=True).rename(columns={"V":"V_data"})
                           )
schema = plots.pad(schema, 5)
plots.ipy_display(schema)




## Calibrate ensemble

In [38]:
num_samples = 2
model_paths = [SEIV, SEIVCDU, SEIRHDS_ww]
data_path = "../../notebook/ensemble_eval_sa/ww_data3.csv"
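weights = [0.7, 0.2, 0.1]  # one weight per model, in the order of model_paths
# Map each model's state variables onto the shared ensemble outputs
seiv = dict(S='S', I='I', E='E', V='V')
solution_mappings = [seiv, seiv, seiv]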
prior_ensemble = load_and_sample_petri_ensemble(model_paths,
                                                weights,
                                                solution_mappings,
                                                num_samples,
                                                timepoints,
                                               time_unit="days")
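To make the role of `weights` concrete, the sketch below combines component trajectories as a pointwise convex combination, once with the weights used above and once with the naive equal weighting allowed in step 2(a). This illustrates the general idea only; how pyciemss treats `weights` internally is not shown in this notebook, and `normalize_weights`, `combine_trajectories`, and the toy trajectories are hypothetical.

```python
def normalize_weights(weights):
    """Rescale non-negative weights so that they sum to one."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return [w / total for w in weights]


def combine_trajectories(trajectories, weights):
    """Pointwise weighted average of equally long trajectories."""
    if len(trajectories) != len(weights):
        raise ValueError("need exactly one weight per trajectory")
    length = len(trajectories[0])
    if any(len(t) != length for t in trajectories):
        raise ValueError("all trajectories must share the same time grid")
    w = normalize_weights(weights)
    return [
        sum(wi * t[j] for wi, t in zip(w, trajectories))
        for j in range(length)
    ]


# Toy viral-load trajectories from three component models (illustrative values)
toy_v = [
    [1.0, 2.0, 3.0],
    [10.0, 20.0, 30.0],
    [100.0, 200.0, 300.0],
]

weighted = combine_trajectories(toy_v, [0.7, 0.2, 0.1])
naive = combine_trajectories(toy_v, [1.0, 1.0, 1.0])  # equal weights
```

With these toy values the heavily weighted first model pulls the combined trajectory toward its own scale, while equal weighting lets the largest-magnitude model dominate.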

### Plot ensemble priors

In [None]:
schema = plots.trajectories(prior_ensemble["data"], subset="V_sol",
                            points=ww_data_train.reset_index(drop=True).rename(columns={"V":"V_data"}),
                           )
schema = plots.pad(schema, 5)
plots.ipy_display(schema)                                    

In [29]:
schema = plots.trajectories(prior_ensemble["data"].drop(columns=['V_sol']), subset=".*_sol",
                            #points=ww_data_train.reset_index(drop=True).rename(columns={"V":"V_data"})
                           )
schema = plots.pad(schema, 5)
plots.ipy_display(schema)          




### Run the calibration and sampling

In [None]:
# Run the calibration and sampling
post_ensemble = load_and_calibrate_and_sample_ensemble_model(
    model_paths,
    data_path,
    weights,
    solution_mappings,
    num_samples,
    timepoints,
    verbose=True,
    num_iterations=26,
    time_unit="days",
    visual_options={"title": "Calibrated Ensemble", "subset":".*_sol"}
)

### Save results and compare predicted and observed viral loads

In [35]:
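# Save results (create the output directory first so that to_csv does not fail)
os.makedirs("ensemble_results", exist_ok=True)
post_ensemble["data"].to_csv(
    "ensemble_results/calibrated_ensemble_trajectories.csv", index=False
)
post_ensemble["quantiles"].to_csv(
    "ensemble_results/calibrated_ensemble_quantiles.csv", index=False
)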

# Plot results
schema = plots.trajectories(post_ensemble["data"], subset="V_sol",
                            points=ww_data_train.reset_index(drop=True).rename(columns={"V":"V_data"}),
                           )
schema = plots.pad(schema, 5)
plots.ipy_display(schema)
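Step 3 of the task asks for forecasts at specified quantiles. As a rough sketch of how sampled trajectories can be reduced to quantiles per timepoint, the block below groups long-format rows (with the `timepoint_id`, `sample_id`, and `V_sol` columns seen in the sample table earlier) and interpolates linearly between order statistics. The 23 levels are the ones the COVID-19 Forecast Hub used for death and hospitalization targets. The helper names and toy rows are hypothetical, the layout of `post_ensemble["quantiles"]` is not assumed here, and a real Hub submission carries further columns (forecast date, target, location) that are omitted.

```python
# 0.01, 0.025, 0.05, 0.10, ..., 0.95, 0.975, 0.99
HUB_QUANTILES = [0.01, 0.025] + [round(0.05 * i, 2) for i in range(1, 20)] + [0.975, 0.99]


def quantile(sorted_values, q):
    """Quantile of an ascending list, interpolating linearly between neighbors."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1.0 - fraction) + sorted_values[upper] * fraction


def quantiles_by_timepoint(rows, value_key="V_sol", levels=HUB_QUANTILES):
    """Summarize sample rows into long-format quantile rows per timepoint."""
    by_time = {}
    for row in rows:
        by_time.setdefault(row["timepoint_id"], []).append(row[value_key])
    out = []
    for t in sorted(by_time):
        values = sorted(by_time[t])
        for q in levels:
            out.append({"timepoint_id": t, "quantile": q, "value": quantile(values, q)})
    return out


# Toy samples: five trajectories observed at two timepoints (illustrative values)
toy_rows = [
    {"timepoint_id": t, "sample_id": s, "V_sol": float(s + 10 * t)}
    for t in range(2)
    for s in range(5)
]
summary = quantiles_by_timepoint(toy_rows)
```

With only a handful of samples, as in the `num_samples = 2` ensemble run above, the outer quantiles are determined almost entirely by the extreme samples, so more samples are needed before the tails are meaningful.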




### Plot posteriors of other variables

In [37]:
schema = plots.trajectories(post_ensemble["data"].drop(columns=['V_sol']), subset=".*_sol",
                            # points=test_data.reset_index(drop=True).rename(columns={"V":"V_data"}),
                           )
schema = plots.pad(schema, 5)
plots.ipy_display(schema)


