# Ensemble Evaluation: Timepoint 1

Location: New York State

Timepoint 1: April 3, 2020. Setting: New York State at the beginning of the pandemic, when masking was the main preventive measure and no vaccines were available.

For each timepoint, consider the following:
 - What is the most relevant data to use for model calibration?
 - What was our understanding of COVID-19 viral mechanisms at the time? For example, early in the pandemic, we didn't know whether reinfection was common, or even possible.
 - What are the parameters related to contagiousness/transmissibility and severity of the dominant strain at the time?
 - What policies were in place for a stated location, and how can this information be incorporated into models?

For each timepoint:

1. (a) Take a single model, calibrate it using any historical data prior to the given date, and create a 4-week forecast for cases, hospitalizations, and deaths beginning on the given date. (b) Evaluate the forecast using the COVID-19 Forecasting Hub error metrics: weighted interval score (WIS) and mean absolute error (MAE). Evaluate the single model in the same way as the ensemble.

2. Repeat (1), but with an ensemble of different models.

a. It is fine to calibrate each model independently and weight naively.

b. It would also be fine to calibrate the ensemble as a whole, assigning weights to the different component models, so that you minimize the error of the ensemble vs. historical data.

c. Use the calibration scores and error metrics computed by the CDC Forecasting Hub. As stated on their website:

“Periodically, we evaluate the accuracy and precision of the ensemble forecast and component models over recent and historical forecasting periods. Models forecasting incident hospitalizations at a national and state level are evaluated using adjusted relative weighted interval scores (WIS, a measure of distributional accuracy), and adjusted relative mean absolute error (MAE), and calibration scores. Scores are evaluated across weeks, locations, and targets. You can read a paper explaining these procedures in more detail, and look at the most recent monthly evaluation reports. The final report that includes case and death forecast evaluations is 2023-03-13.”

3. Produce the forecast outputs in the format specified by the CDC forecasting challenge, including the specified quantiles.
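
To make the two error metrics named in item 1(b) concrete, here is a minimal sketch of an unadjusted WIS and MAE. It uses the averaged pinball-loss form of WIS, which matches the interval-based definition when the quantile levels are symmetric around the median and include it. The function names and toy numbers are illustrative assumptions, and the adjusted relative scores reported by the Hub are not reproduced here.

```python
import numpy as np


def weighted_interval_score(observed, levels, quantiles):
    """Unadjusted WIS for one observation, via the pinball-loss identity.

    Assumes `levels` are symmetric around 0.5 and include the median.
    """
    levels = np.asarray(levels, dtype=float)
    quantiles = np.asarray(quantiles, dtype=float)
    diff = observed - quantiles
    # Pinball (quantile) loss at each level.
    pinball = np.maximum(levels * diff, (levels - 1.0) * diff)
    # Twice the mean pinball loss equals the WIS for symmetric levels.
    return 2.0 * pinball.mean()


def mean_absolute_error(observed, point_forecasts):
    """MAE between observations and point forecasts (e.g. medians)."""
    observed = np.asarray(observed, dtype=float)
    point_forecasts = np.asarray(point_forecasts, dtype=float)
    return np.mean(np.abs(observed - point_forecasts))


# Toy example: a 50% interval [8, 12] with median 10, observation 10.
wis_inside = weighted_interval_score(10.0, [0.25, 0.5, 0.75], [8.0, 10.0, 12.0])
# Same forecast, but the observation falls outside the interval.
wis_outside = weighted_interval_score(15.0, [0.25, 0.5, 0.75], [8.0, 10.0, 12.0])
mae = mean_absolute_error([10.0, 15.0], [10.0, 10.0])
print(wis_inside, wis_outside, mae)
```

A forecast that misses the observation is penalized both for the interval width and for the distance of the miss, so `wis_outside` is larger than `wis_inside`.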

## Load dependencies

In [1]:
import os
import pandas as pd
import numpy as np
from pyciemss.Ensemble.interfaces import (
    load_and_sample_petri_ensemble, load_and_calibrate_and_sample_ensemble_model
)
from pyciemss.PetriNetODE.interfaces import (
    load_and_sample_petri_model,
    load_and_calibrate_and_sample_petri_model,
    load_and_optimize_and_sample_petri_model,
    load_and_calibrate_and_optimize_and_sample_petri_model
)
from pyciemss.visuals import plots

## Get data

In [20]:
url = 'https://raw.githubusercontent.com/DARPA-ASKEM/experiments/main/thin-thread-examples/milestone_12month/evaluation/ensemble_eval_SA/datasets/aabb3684-a7ea-4f60-98f1-a8e673ad6df5/dataset.csv'
ny_data = pd.read_csv(url)

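# Observed data through the end of the four-week forecast window
# (forecast period 04/03/2020 - 05/01/2020). Rows 0-100 are kept, so the
# historical period is included and can be plotted alongside the forecast.
test_data = ny_data[0:101].drop(columns="timestep")

# Select historical data up to Timepoint 1, 04/02/2020 (rows 0-71, i.e. the first 72 rows)
# No hospitalization data at this point
ny_data = ny_data[0:72]
ny_data[["I", "H", "D"]].to_csv("NY_data1.csv")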

## Select relevant models

In [43]:
model1_location = "../../notebook/ensemble_eval_sa/SEIRHDS_basic_deterministic.json"
model2_location = "../../notebook/ensemble_eval_sa/SEIRHDS_basic_config.json"
model3_location = "../../notebook/ensemble_eval_sa/SIRHD_age_stratified.json"
model4_location = "../../notebook/ensemble_eval_sa/SIRHD_mask_V1.json"
model5_location = "../../notebook/ensemble_eval_sa/SIRHD_mask_V2.json"
model6_location = "../../notebook/ensemble_eval_sa/SIRHD_mask_V3.json"
model7_location = "../../notebook/ensemble_eval_sa/SIRHD_V1C.json"
model8_location = "../../notebook/ensemble_eval_sa/SEIRD_ymo_age_strat.json"

### Load and sample selected models

In [23]:
num_samples = 3
start_timepoint = 0
stop_timepoint = 73 + 28 # simulate for four weeks after end of data
timepoints = [float(i) for i in range(stop_timepoint + 1)]
result = load_and_sample_petri_model(
    model1_location, num_samples, timepoints=timepoints, time_unit="days",
    visual_options={"title": "SEIRHDS basic (deterministic)", "subset": ".*_sol"},
)
plots.ipy_display(result["visual"])




In [24]:
result = load_and_sample_petri_model(
    model4_location, num_samples, timepoints=timepoints, time_unit="days",
    visual_options={"title": "SIRHD mask V1", "subset": ".*_sol"},
)
plots.ipy_display(result["visual"])




In [25]:
result = load_and_sample_petri_model(
    model6_location, num_samples, timepoints=timepoints, time_unit="days",
    visual_options={"title": "SIRHD mask V3", "subset": ".*_sol"},
)
plots.ipy_display(result["visual"])




### Load and sample an ensemble of one

In [26]:
weights = [1]
num_samples = 100
solution_mappings = [{"I": "I", "H": "H", "D": "D"}]

# Run sampling
result = load_and_sample_petri_ensemble(
    [model1_location], weights, solution_mappings, num_samples, timepoints, 
    time_unit="days",
    visual_options={"subset":".*_sol"}
)
plots.ipy_display(result["visual"])




### Load and calibrate and sample an ensemble of one

In [47]:
num_samples = 100
model_paths = [model8_location]
data_path = "../../notebook/ensemble_eval_sa/NY_data1.csv"
weights = [1]
solution_mappings = [{"I": "infected", "D": "dead"}]

# Run the calibration and sampling
result = load_and_calibrate_and_sample_ensemble_model(
    model_paths,
    data_path,
    weights,
    solution_mappings,
    num_samples,
    timepoints,
    verbose=True,
    total_population=19340000,
    num_iterations=500,
    time_unit="days",
    visual_options={"title": "Calibrated Ensemble", "subset":".*_sol"}
)

# # Save results
# result["data"].to_csv(
#     os.path.join(DEMO_PATH, "results_petri_ensemble/calibrated_sample_results.csv"), index=False
# )
# result["quantiles"].to_csv(
#     os.path.join(DEMO_PATH, "results_petri_ensemble/calibrated_quantile_results.csv"), index=False
# )

# Plot results
schema = plots.trajectories(pd.DataFrame(result["data"]), subset=".*_sol",
                            points=test_data.reset_index(drop=True).rename(columns={"I":"I_data", "H":"H_data", "D":"D_data"}),
                           )
schema = plots.pad(schema, 5)
plots.ipy_display(schema)

iteration 0: loss = 1622.6701895147562
iteration 25: loss = 1487.9706531912088
iteration 50: loss = 1478.1395074278116
iteration 75: loss = 1475.2962504774332
iteration 100: loss = 1473.7366079241037
iteration 125: loss = 1472.2048124223948
iteration 150: loss = 1473.3301531225443
iteration 175: loss = 1473.3863328844309
iteration 200: loss = 1479.7508193403482
iteration 225: loss = 1470.5316552072763
iteration 250: loss = 1472.099831238389
iteration 275: loss = 1478.1394332796335
iteration 300: loss = 1470.9737903028727
iteration 325: loss = 1483.0190180689096
iteration 350: loss = 1468.555995836854
iteration 375: loss = 1471.7175911813974
iteration 400: loss = 1470.2056640535593
iteration 425: loss = 1481.889570608735
iteration 450: loss = 1480.421251192689
iteration 475: loss = 1476.1133641153574
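
One quick way to read a loss log like the one above is to parse it and look at where the minimum falls and how much the loss still fluctuates late in the run. The sketch below does this for the printed values; the choice of iteration 100 as the start of the "late" phase is an arbitrary assumption.

```python
import re

log = """
iteration 0: loss = 1622.6701895147562
iteration 25: loss = 1487.9706531912088
iteration 50: loss = 1478.1395074278116
iteration 75: loss = 1475.2962504774332
iteration 100: loss = 1473.7366079241037
iteration 125: loss = 1472.2048124223948
iteration 150: loss = 1473.3301531225443
iteration 175: loss = 1473.3863328844309
iteration 200: loss = 1479.7508193403482
iteration 225: loss = 1470.5316552072763
iteration 250: loss = 1472.099831238389
iteration 275: loss = 1478.1394332796335
iteration 300: loss = 1470.9737903028727
iteration 325: loss = 1483.0190180689096
iteration 350: loss = 1468.555995836854
iteration 375: loss = 1471.7175911813974
iteration 400: loss = 1470.2056640535593
iteration 425: loss = 1481.889570608735
iteration 450: loss = 1480.421251192689
iteration 475: loss = 1476.1133641153574
"""

# Map iteration number -> loss value.
pattern = re.compile(r"iteration (\d+): loss = ([\d.]+)")
losses = {int(i): float(v) for i, v in pattern.findall(log)}

best_iteration = min(losses, key=losses.get)
# ASSUMPTION: treat iteration 100 onward as the "late" phase of the run.
late = [v for i, v in losses.items() if i >= 100]
late_spread = max(late) - min(late)

print(best_iteration, round(late_spread, 2))
```

Most of the improvement happens in the first 25 iterations; after that the reported loss moves within a band of roughly 15 units rather than decreasing steadily.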



### Load and calibrate and sample an ensemble of models

In [48]:
num_samples = 2
model_paths = [model1_location, model4_location, model6_location]
data_path = "../../notebook/ensemble_eval_sa/NY_data1.csv"
weights = [1/3, 1/3, 1/3]
solution_mappings = [{"I": "I", "H": "H", "D": "D"},
                     {"I": "I", "H": "H", "D": "D"}, 
                     {"I": "Cases", "H": "Hosp", "D": "Deaths"},
                     ]

# Run the calibration and sampling
result = load_and_calibrate_and_sample_ensemble_model(
    model_paths,
    data_path,
    weights,
    solution_mappings,
    num_samples,
    timepoints,
    verbose=True,
    total_population=19340000,
    num_iterations=200,
    time_unit="days",
    visual_options={"title": "Calibrated Ensemble", "subset":".*_sol"}
)

# # Save results
# result["data"].to_csv(
#     os.path.join(DEMO_PATH, "results_petri_ensemble/calibrated_sample_results.csv"), index=False
# )
# result["quantiles"].to_csv(
#     os.path.join(DEMO_PATH, "results_petri_ensemble/calibrated_quantile_results.csv"), index=False
# )

# Plot results
schema = plots.trajectories(pd.DataFrame(result["data"]), subset=".*_sol",
                            points=test_data.reset_index(drop=True).rename(columns={"I":"I_data", "H":"H_data", "D":"D_data"}),
                           )
schema = plots.pad(schema, 5)
plots.ipy_display(schema)

iteration 0: loss = 4025.446513772011
iteration 25: loss = 3432.0786576271057
iteration 50: loss = 3128.979478120804
iteration 75: loss = 2693.819436073303
iteration 100: loss = 1558.9415887594223
iteration 125: loss = 1518.3038387298584
iteration 150: loss = 1519.7286143302917
iteration 175: loss = 1515.2364356517792
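
The run above uses equal weights, which is the naive option from item 2(a). As a rough illustration of item 2(b), the sketch below picks weights on the simplex that minimize mean absolute error against historical observations, using a plain grid search. The `fit_simplex_weights` helper, the grid step, and the toy data are assumptions for illustration and are unrelated to how pyciemss calibrates an ensemble.

```python
import itertools

import numpy as np


def fit_simplex_weights(predictions, observed, step=0.05):
    """Grid-search non-negative weights summing to 1 that minimize MAE.

    predictions: array of shape (n_models, n_times)
    observed:    array of shape (n_times,)
    """
    predictions = np.asarray(predictions, dtype=float)
    observed = np.asarray(observed, dtype=float)
    n_models = predictions.shape[0]
    n_steps = int(round(1.0 / step))
    best_weights, best_error = None, np.inf
    # Enumerate integer allocations of n_steps units across the models.
    for combo in itertools.product(range(n_steps + 1), repeat=n_models - 1):
        if sum(combo) > n_steps:
            continue
        units = list(combo) + [n_steps - sum(combo)]
        weights = np.array(units, dtype=float) / n_steps
        error = np.mean(np.abs(weights @ predictions - observed))
        if error < best_error:
            best_weights, best_error = weights, error
    return best_weights, best_error


# Toy history: the observations are an exact 25/75 blend of models 0 and 1.
t = np.arange(10, dtype=float)
model_predictions = np.vstack([t, 2.0 * t, t**2])
observations = 0.25 * model_predictions[0] + 0.75 * model_predictions[1]

fitted_weights, fitted_error = fit_simplex_weights(model_predictions, observations)
print(fitted_weights, fitted_error)
```

A grid search scales poorly beyond a handful of models, but it avoids any optimizer dependency and makes the constraint (non-negative weights that sum to one) explicit.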



In [29]:
# Two-model ensemble, short calibration run (10 iterations) with total_population=19340000
num_samples = 2
model_paths = [model1_location, model4_location]
data_path = "../../notebook/ensemble_eval_sa/NY_data1.csv"
weights = [1/2, 1/2]
solution_mappings = [{"I": "I", "H": "H", "D": "D"},
                     {"I": "I", "H": "H", "D": "D"}, 
                     ]

# Run the calibration and sampling
result = load_and_calibrate_and_sample_ensemble_model(
    model_paths,
    data_path,
    weights,
    solution_mappings,
    num_samples,
    timepoints,
    verbose=True,
    total_population=19340000,
    num_iterations=10,
    time_unit="days",
    visual_options={"title": "Calibrated Ensemble", "subset":".*_sol"}
)

# # Save results
# result["data"].to_csv(
#     os.path.join(DEMO_PATH, "results_petri_ensemble/calibrated_sample_results.csv"), index=False
# )
# result["quantiles"].to_csv(
#     os.path.join(DEMO_PATH, "results_petri_ensemble/calibrated_quantile_results.csv"), index=False
# )

# Plot results
schema = plots.trajectories(pd.DataFrame(result["data"]), subset=".*_sol",
                            points=test_data.reset_index(drop=True).rename(columns={"I":"I_data", "H":"H_data", "D":"D_data"}),
                           )
schema = plots.pad(schema, 5)
plots.ipy_display(schema)

iteration 0: loss = 2315.568779349327



In [30]:
# Same two-model ensemble and settings, but with total_population=1.0
num_samples = 2
model_paths = [model1_location, model4_location]
data_path = "../../notebook/ensemble_eval_sa/NY_data1.csv"
weights = [1/2, 1/2]
solution_mappings = [{"I": "I", "H": "H", "D": "D"},
                     {"I": "I", "H": "H", "D": "D"}, 
                     ]

# Run the calibration and sampling
result = load_and_calibrate_and_sample_ensemble_model(
    model_paths,
    data_path,
    weights,
    solution_mappings,
    num_samples,
    timepoints,
    verbose=True,
    total_population=1.0,
    num_iterations=10,
    time_unit="days",
    visual_options={"title": "Calibrated Ensemble", "subset":".*_sol"}
)

# # Save results
# result["data"].to_csv(
#     os.path.join(DEMO_PATH, "results_petri_ensemble/calibrated_sample_results.csv"), index=False
# )
# result["quantiles"].to_csv(
#     os.path.join(DEMO_PATH, "results_petri_ensemble/calibrated_quantile_results.csv"), index=False
# )

# Plot results
schema = plots.trajectories(pd.DataFrame(result["data"]), subset=".*_sol",
                            points=test_data.reset_index(drop=True).rename(columns={"I":"I_data", "H":"H_data", "D":"D_data"}),
                           )
schema = plots.pad(schema, 5)
plots.ipy_display(schema)

iteration 0: loss = 7158.102586686611



In [31]:
# Same two-model ensemble and settings, but with total_population=100.0
num_samples = 2
model_paths = [model1_location, model4_location]
data_path = "../../notebook/ensemble_eval_sa/NY_data1.csv"
weights = [1/2, 1/2]
solution_mappings = [{"I": "I", "H": "H", "D": "D"},
                     {"I": "I", "H": "H", "D": "D"}, 
                     ]

# Run the calibration and sampling
result = load_and_calibrate_and_sample_ensemble_model(
    model_paths,
    data_path,
    weights,
    solution_mappings,
    num_samples,
    timepoints,
    verbose=True,
    total_population=100.0,
    num_iterations=10,
    time_unit="days",
    visual_options={"title": "Calibrated Ensemble", "subset":".*_sol"}
)

# # Save results
# result["data"].to_csv(
#     os.path.join(DEMO_PATH, "results_petri_ensemble/calibrated_sample_results.csv"), index=False
# )
# result["quantiles"].to_csv(
#     os.path.join(DEMO_PATH, "results_petri_ensemble/calibrated_quantile_results.csv"), index=False
# )

# Plot results
schema = plots.trajectories(pd.DataFrame(result["data"]), subset=".*_sol",
                            points=test_data.reset_index(drop=True).rename(columns={"I":"I_data", "H":"H_data", "D":"D_data"}),
                           )
schema = plots.pad(schema, 5)
plots.ipy_display(schema)

iteration 0: loss = 6728.977160692215
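
Item 3 of the task asks for forecasts in the CDC forecasting challenge format with the specified quantiles. The sketch below turns sampled daily trajectories into weekly quantile rows. The samples are synthetic, the `to_hub_rows` helper is hypothetical, and the column names and 23 quantile levels follow the COVID-19 Forecast Hub submission format as best I recall it, so they should be checked against the Hub's documentation. It also assumes daily incident counts and uses simple 7-day blocks from the forecast date rather than the Hub's epidemiological-week alignment.

```python
import numpy as np
import pandas as pd

# 23 quantile levels: 0.01, 0.025, 0.05, 0.10, ..., 0.95, 0.975, 0.99
QUANTILE_LEVELS = np.round(
    np.concatenate([[0.01, 0.025], np.linspace(0.05, 0.95, 19), [0.975, 0.99]]), 3
)


def to_hub_rows(daily_samples, forecast_date, target_name, location):
    """Summarize (n_samples, n_days) daily samples as weekly quantile rows."""
    daily_samples = np.asarray(daily_samples, dtype=float)
    forecast_date = pd.Timestamp(forecast_date)
    n_weeks = daily_samples.shape[1] // 7
    rows = []
    for week in range(1, n_weeks + 1):
        # Sum each sample's daily values over the 7-day block.
        weekly_total = daily_samples[:, (week - 1) * 7 : week * 7].sum(axis=1)
        values = np.quantile(weekly_total, QUANTILE_LEVELS)
        # ASSUMPTION: simple 7-day blocks, not epidemiological weeks.
        end_date = forecast_date + pd.Timedelta(days=7 * week - 1)
        for level, value in zip(QUANTILE_LEVELS, values):
            rows.append(
                {
                    "forecast_date": forecast_date.strftime("%Y-%m-%d"),
                    "target": f"{week} wk ahead {target_name}",
                    "target_end_date": end_date.strftime("%Y-%m-%d"),
                    "location": location,
                    "type": "quantile",
                    "quantile": float(level),
                    "value": float(value),
                }
            )
    return pd.DataFrame(rows)


# Synthetic stand-in for sampled daily incident deaths (100 samples, 28 days).
rng = np.random.default_rng(0)
samples = rng.poisson(500, size=(100, 28))

# "36" is the FIPS code for New York State.
table = to_hub_rows(samples, "2020-04-03", "inc death", "36")
print(table.head())
```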



In [37]:
# NYS population by age groups young (0 - 34), middle age (35 - 64), and old (65+)
total_pop = 19_340_000
data_total_pop = 19_745_289
young = 233692+232647+229937+232293+231488+1144642+1151632+724228+513152+1391899+1503695+1387078
middle = 1272526+1190035+1313743+1392386+1376334+1191373
old = 1009515+703525+509769+371300+438400

x = [young, middle, old]
y = [i/data_total_pop for i in x]
z = [round(i*total_pop) for i in y]
z

[8792135, 7577601, 2970264]

In [38]:
sum(z)

19340000
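
The rounded group sizes above happen to sum exactly to 19,340,000, but rounding each share independently does not guarantee that in general. A minimal sketch of one alternative, largest-remainder apportionment, is shown below; the `apportion` helper is an illustrative assumption and not part of the notebook.

```python
def apportion(counts, total):
    """Rescale integer counts to sum exactly to `total` (largest remainder)."""
    source_total = sum(counts)
    # Integer arithmetic avoids floating-point rounding surprises.
    quotas = [c * total // source_total for c in counts]
    remainders = [c * total % source_total for c in counts]
    shortfall = total - sum(quotas)
    # Give the leftover units to the groups with the largest remainders.
    order = sorted(range(len(counts)), key=lambda i: remainders[i], reverse=True)
    for i in order[:shortfall]:
        quotas[i] += 1
    return quotas


# Age-group totals from the cell above (young, middle, old).
groups = [8976383, 7736397, 3032509]
scaled = apportion(groups, 19_340_000)
print(scaled, sum(scaled))

# A case where independent rounding falls short: 33 + 33 + 33 = 99.
thirds = apportion([1, 1, 1], 100)
print(thirds)  # [34, 33, 33]
```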