# Ensemble Evaluation: Timepoint 1

Location: New York State

Timepoint 1: April 3, 2020. Setting: New York State at the beginning of the pandemic when masking was the main preventative measure. No vaccines available. Reinfection not considered. No hospitalization data available through T1.

## Set up for ensemble modeling

### Load dependencies
Import functionality from the pyciemss library to allow for model sampling and calibration.

In [2]:
import os
import pandas as pd
import numpy as np
from pyciemss.Ensemble.interfaces import (
    load_and_sample_petri_ensemble, load_and_calibrate_and_sample_ensemble_model
)
from pyciemss.PetriNetODE.interfaces import (
    load_and_sample_petri_model,
    load_and_calibrate_and_sample_petri_model,
    load_and_optimize_and_sample_petri_model,
    load_and_calibrate_and_optimize_and_sample_petri_model
)
from pyciemss.visuals import plots

### Collect relevant models
<!-- We have chosen x number of models to capture the relevant COVID-19 dynamics for this setting. 
 - `model1` contains compartments SEIRHD, and is stratified by age into four groups.
 - `model2` is the same as `model1`, but allows for reinfection
 - `model3` is the same as `model1`, but with a variation in transmission rate to account for masking efficacy and compliance. -->

In [2]:
model1_location = "../../notebook/ensemble_eval_sa/operative_models/SEIRHD_base_model_ee.json"
model2_location = "../../notebook/ensemble_eval_sa/operative_models/SEIRHD_npi1_ee.json"
model3_location = "../../notebook/ensemble_eval_sa/operative_models/SEIRHD_npi1_beta_c_varying_ee.json"
model4_location = "../../notebook/ensemble_eval_sa/operative_models/SEIRHD_npi1_k_varying_ee.json"
model5_location = "../../notebook/ensemble_eval_sa/operative_models/SEIRHD_npi1_age_stratified_v1.json"
model6_location = "../../notebook/ensemble_eval_sa/operative_models/SEIRHD_npi1_age_stratified_v2.json"
model7_location = "../../notebook/ensemble_eval_sa/operative_models/SEIRHD_npi1_age_stratified_v3.json"
model8_location = "../../notebook/ensemble_eval_sa/operative_models/SEIRD_ymo_age_strat.json"

### Gather data, and set training and forecast intervals
For this timepoint, only case and death data are available for calibration.
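
Because hospitalization counts are missing at this timepoint, it can help to detect which data columns actually carry observations and restrict the variable mapping to those, as the later cells do when they drop `"H"` from `solution_mappings`. The following is a minimal sketch: the small frame reuses the first few `I` and `D` values displayed in this notebook, and the helper names (`observations`, `observed_columns`, `observed_mapping`) are illustrative assumptions, not part of pyciemss.

```python
import numpy as np
import pandas as pd

# Illustrative frame: "H" has no observations, as in the data used at Timepoint 1.
observations = pd.DataFrame({
    "I": [1.0, 10.0, 21.0, 24.0],
    "H": [np.nan] * 4,
    "D": [0.0, 0.0, 0.0, 0.0],
})

# Keep only the columns that contain at least one non-missing value.
observed_columns = [c for c in observations.columns if observations[c].notna().any()]

# Restrict a full mapping (shared name -> model-specific name) to observed columns.
full_mapping = {"I": "infected", "H": "hospitalized", "D": "dead"}
observed_mapping = {k: v for k, v in full_mapping.items() if k in observed_columns}

print(observed_columns, observed_mapping)
```

Filtering the mapping this way keeps the calibration targets aligned with the data that exist, rather than relying on how missing values are handled downstream.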

We take the total population of New York State to be 19,340,000. Population age-structure estimates for New York State were taken from the [New York State Department of Health vital statistics tables](https://www.health.ny.gov/statistics/vital_statistics/2016/table01.htm) and scaled appropriately.

First recorded case in New York State: March 1, 2020.

Also relevant: a statewide stay-at-home order for non-essential workers was implemented on March 22, 2020, and a masking policy was implemented on April 15, 2020. A list of COVID-19 policy interventions for New York City and State has been compiled [here](https://www.investopedia.com/historical-timeline-of-covid-19-in-new-york-city-5071986).
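
To relate these calendar dates to simulation timepoints, the sketch below converts each date to a day index. It assumes that timepoint 0 corresponds to the first recorded case on March 1, 2020; the names `DAY_ZERO`, `day_index`, and `events` are hypothetical and introduced only for illustration.

```python
from datetime import date

# Assumption: timepoint 0 is the date of the first recorded case in New York State.
DAY_ZERO = date(2020, 3, 1)


def day_index(d: date, day_zero: date = DAY_ZERO) -> int:
    """Return the number of days elapsed between day_zero and d."""
    return (d - day_zero).days


events = {
    "stay_at_home_order": date(2020, 3, 22),
    "timepoint_1": date(2020, 4, 3),
    "masking_policy": date(2020, 4, 15),
}
event_days = {name: day_index(d) for name, d in events.items()}
print(event_days)
```

Under this assumption the masking policy takes effect 12 days after Timepoint 1, so it falls inside the four-week forecast window rather than the calibration window.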

In [3]:
url = 'https://raw.githubusercontent.com/DARPA-ASKEM/experiments/main/thin-thread-examples/milestone_12month/evaluation/ensemble_eval_SA/datasets/aabb3684-a7ea-4f60-98f1-a8e673ad6df5/dataset.csv'
ny_data = pd.read_csv(url)

In [5]:
ny_data = ny_data[0:159]
ww_data3 = ny_data[["I"]].copy()
ww_data3 = ww_data3.cumsum()
ww_data3

Unnamed: 0,I
0,0.0
1,0.0
2,0.0
3,0.0
4,0.0
...,...
154,1954075.0
155,1957171.0
156,1960430.0
157,1963781.0


In [3]:
url = 'https://raw.githubusercontent.com/DARPA-ASKEM/experiments/main/thin-thread-examples/milestone_12month/evaluation/ensemble_eval_SA/datasets/aabb3684-a7ea-4f60-98f1-a8e673ad6df5/dataset.csv'
ny_data = pd.read_csv(url)

# Grab evaluation data: rows 41-100 (60 daily rows) cover the calibration window
# plus the four-week forecast window (nominally 04/03/2020 - 05/01/2020)
test_data = ny_data[41:101].reset_index(drop=True)
test_data = test_data.drop(columns="timestep")

# Select historical data starting 03/01/2020 for calibration: rows 41-71 (31 daily rows)
# No hospitalization data at this point, so column "H" is empty
ny_data = ny_data[41:72].reset_index(drop=True)
ny_data1 = ny_data.assign(timepoints=[float(i) for i in range(len(ny_data))])
ny_data = ny_data1[["timepoints", "I", "H", "D"]]
ny_data[["I", "H", "D"]].to_csv("NY_data1.csv")

# Set timepoints
start_timepoint = 0
stop_timepoint = len(ny_data) + 28 # simulate for four weeks after end of data
timepoints = [float(i) for i in range(stop_timepoint + 1)]
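
The window arithmetic in the cell above can be checked without loading the dataset. This sketch reuses the slice bounds and the 28-day horizon from that cell; the names `calibration_rows`, `evaluation_rows`, and `forecast_days` are illustrative assumptions.

```python
# Slice bounds copied from the cell above; names are illustrative.
calibration_rows = range(41, 72)   # rows kept in ny_data
evaluation_rows = range(41, 101)   # rows kept in test_data
forecast_days = 28                 # four weeks after the end of the data

n_calibration = len(calibration_rows)
stop_timepoint = n_calibration + forecast_days
timepoints = [float(i) for i in range(stop_timepoint + 1)]

# Every simulated timepoint should have a matching row in the evaluation data.
assert len(timepoints) == len(evaluation_rows)

# Split the simulated timepoints into the calibration and forecast windows.
calibration_timepoints = timepoints[:n_calibration]
forecast_timepoints = timepoints[n_calibration:]
print(len(calibration_timepoints), len(forecast_timepoints))
```

The check confirms that the 60 evaluation rows line up one-to-one with the 60 simulated timepoints, of which the first 31 overlap the calibration data.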

## Calibrate and sample an ensemble of one model

In [4]:
test_data

Unnamed: 0,I,H,D
0,1.0,,0.0
1,10.0,,0.0
2,21.0,,0.0
3,24.0,,0.0
4,76.0,,0.0
5,104.0,,0.0
6,128.0,,0.0
7,131.0,,0.0
8,192.0,,1.0
9,220.0,,1.0


In [5]:
num_samples = 100
model_paths = [model2_location]
data_path = "../../notebook/ensemble_eval_sa/NY_data1.csv"
weights = [1]
solution_mappings = [{"I": "I", "H": "H", "D": "D"}]
# solution_mappings = [{"I": "infected", "H": "hospitalized", "D": "dead"}]
# timepoints = timepoints[0:30]
# Run the calibration and sampling
result1 = load_and_calibrate_and_sample_ensemble_model(
    model_paths,
    data_path,
    weights,
    solution_mappings,
    num_samples,
    timepoints,
    verbose=True,
    total_population=19340000,
    num_iterations=250,
    time_unit="days",
    visual_options={"title": "Calibrated Ensemble", "subset":".*_sol"}
)

# Save results
result1["data"].to_csv(
    "partI_ensemble_of_one_results.csv", index=False
)
result1["quantiles"].to_csv(
    "partI_ensemble_of_one_quantiles.csv", index=False
)

# Plot results
schema = plots.trajectories(pd.DataFrame(result1["data"]), subset=".*_sol",
                            points=test_data.reset_index(drop=True).rename(columns={"I":"I_data", "H":"H_data", "D":"D_data"}),
                           )
schema = plots.pad(schema, 5)
plots.ipy_display(schema)

iteration 0: loss = 699.9535211622715
iteration 25: loss = 682.1455722153187
iteration 50: loss = 642.9254550039768
iteration 75: loss = 639.9548486769199
iteration 100: loss = 633.3055581152439
iteration 125: loss = 634.6444889605045
iteration 150: loss = 634.5578090250492
iteration 175: loss = 636.1768108904362
iteration 200: loss = 633.7074040472507
iteration 225: loss = 634.5738643705845
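
The logged loss drops quickly and then fluctuates around 634. One way to quantify that plateau is to parse the log text and measure the relative spread of the loss over the second half of the run. This is a rough heuristic sketch, not a convergence criterion used by pyciemss; the log lines are copied from the output above, and the variable names are illustrative.

```python
import re

# Log lines copied from the calibration output above.
log_text = """\
iteration 0: loss = 699.9535211622715
iteration 25: loss = 682.1455722153187
iteration 50: loss = 642.9254550039768
iteration 75: loss = 639.9548486769199
iteration 100: loss = 633.3055581152439
iteration 125: loss = 634.6444889605045
iteration 150: loss = 634.5578090250492
iteration 175: loss = 636.1768108904362
iteration 200: loss = 633.7074040472507
iteration 225: loss = 634.5738643705845
"""

# Extract (iteration, loss) pairs from the log.
pattern = re.compile(r"iteration (\d+): loss = ([0-9.]+)")
history = [(int(i), float(loss)) for i, loss in pattern.findall(log_text)]

# Iteration with the lowest logged loss.
best_iteration, best_loss = min(history, key=lambda pair: pair[1])

# Relative spread of the loss over the second half of the log;
# a small value suggests the optimizer has plateaued.
tail = [loss for _, loss in history[len(history) // 2:]]
tail_spread = (max(tail) - min(tail)) / min(tail)

print(best_iteration, best_loss, tail_spread)
```

A spread well below one percent over the last half of the run suggests that additional iterations are unlikely to change the fit much for this model.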



In [6]:
# Plot results
new_data = test_data[["I", "D"]]
display(new_data)
schema = plots.trajectories(pd.DataFrame(result1["data"]), subset=".*_sol",
                            points=new_data.rename(columns={"I":"I_data"}),
                           )
schema = plots.pad(schema, 5)
plots.ipy_display(schema)

Unnamed: 0,I,D
0,1.0,0.0
1,10.0,0.0
2,21.0,0.0
3,24.0,0.0
4,76.0,0.0
5,104.0,0.0
6,128.0,0.0
7,131.0,0.0
8,192.0,1.0
9,220.0,1.0





In [None]:
num_samples = 100
model_paths = [model6_location]
data_path = "../../notebook/ensemble_eval_sa/NY_data1.csv"
weights = [1]
# solution_mappings = [{"I": "I", "H": "H", "D": "D"}]
solution_mappings = [{"I": "infected", "H": "hospitalized", "D": "dead"}]

# Run the calibration and sampling
result2 = load_and_calibrate_and_sample_ensemble_model(
    model_paths,
    data_path,
    weights,
    solution_mappings,
    num_samples,
    timepoints,
    verbose=True,
    total_population=19340000,
    num_iterations=100,
    time_unit="days",
    visual_options={"title": "Calibrated Ensemble", "subset":".*_sol"}
)

# # Save results
# result2["data"].to_csv(
#     "partI_ensemble_of_one_results.csv", index=False
# )
# result2["quantiles"].to_csv(
#     "partI_ensemble_of_one_quantiles.csv", index=False
# )

# Plot results
schema = plots.trajectories(pd.DataFrame(result2["data"]), subset=".*_sol",
                            points=test_data.reset_index(drop=True).rename(columns={"I":"I_data", "H":"H_data", "D":"D_data"}),
                           )
schema = plots.pad(schema, 5)
plots.ipy_display(schema)

iteration 0: loss = 716.8295200169086
iteration 25: loss = 689.5009007751942
iteration 50: loss = 652.5210690796375
iteration 75: loss = 639.1791975796223


In [167]:
num_samples = 100
model_paths = [model7_location]
data_path = "../../notebook/ensemble_eval_sa/NY_data1.csv"
weights = [1]
# solution_mappings = [{"I": "I", "H": "H", "D": "D"}]
solution_mappings = [{"I": "infected", "H": "hospitalized", "D": "dead"}]

# Run the calibration and sampling
result3 = load_and_calibrate_and_sample_ensemble_model(
    model_paths,
    data_path,
    weights,
    solution_mappings,
    num_samples,
    timepoints,
    verbose=True,
    total_population=19340000,
    num_iterations=50,
    time_unit="days",
    visual_options={"title": "Calibrated Ensemble", "subset":".*_sol"}
)

# # Save results
# result3["data"].to_csv(
#     "partI_ensemble_of_one_results.csv", index=False
# )
# result3["quantiles"].to_csv(
#     "partI_ensemble_of_one_quantiles.csv", index=False
# )

# Plot results
schema = plots.trajectories(pd.DataFrame(result3["data"]), subset=".*_sol",
                            points=test_data.reset_index(drop=True).rename(columns={"I":"I_data", "H":"H_data", "D":"D_data"}),
                           )
schema = plots.pad(schema, 5)
plots.ipy_display(schema)

iteration 0: loss = 711.2050774395466
iteration 25: loss = 678.063215047121



## Calibrate and sample an ensemble of multiple models

In [168]:
num_samples = 100
model_paths = [model2_location, model6_location, model7_location]
data_path = "../../notebook/ensemble_eval_sa/NY_data1.csv"
weights = [1/len(model_paths) for i in model_paths]
solution_mappings = [{"I": "I", "H": "H", "D": "D"},
                     {"I": "infected", "H": "hospitalized", "D": "dead"}, 
                     {"I": "infected", "H": "hospitalized", "D": "dead"}
                     ]

# Run the calibration and sampling
result4 = load_and_calibrate_and_sample_ensemble_model(
    model_paths,
    data_path,
    weights,
    solution_mappings,
    num_samples,
    timepoints,
    verbose=True,
    total_population=19340000,
    num_iterations=500,
    time_unit="days",
    visual_options={"title": "Calibrated Ensemble", "subset":".*_sol"}
)

# Save results
result4["data"].to_csv(
    "partI_ensemble_of_many_results.csv", index=False
)
result4["quantiles"].to_csv(
    "partI_ensemble_of_many_quantiles.csv", index=False
)

# Plot results
schema = plots.trajectories(pd.DataFrame(result4["data"]), subset=".*_sol",
                            points=test_data.reset_index(drop=True).rename(columns={"I":"I_data", "H":"H_data", "D":"D_data"}),
                           )
schema = plots.pad(schema, 5)
plots.ipy_display(schema)

iteration 0: loss = 768.8682887256145
iteration 25: loss = 731.4769603908062
iteration 50: loss = 678.9883939921856
iteration 75: loss = 657.4421293437481
iteration 100: loss = 654.4777183234692
iteration 125: loss = 644.6237823665142
iteration 150: loss = 647.8274113833904
iteration 175: loss = 640.107507199049
iteration 200: loss = 642.5151107013226
iteration 225: loss = 644.8859026134014
iteration 250: loss = 647.4060680568218
iteration 275: loss = 642.2508830726147
iteration 300: loss = 635.978728145361
iteration 325: loss = 637.8785563409328
iteration 350: loss = 639.0462138354778
iteration 375: loss = 636.831598252058
iteration 400: loss = 637.7531786859035
iteration 425: loss = 639.2653596103191
iteration 450: loss = 635.7033906877041
iteration 475: loss = 635.7857587039471
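
The role of `solution_mappings` and `weights` can be illustrated with a small conceptual sketch: each mapping translates a model's own state names into the shared names used by the data, and the weights combine the translated trajectories. The trajectories below are made-up toy values, `combine` is a hypothetical helper, and the fixed weighted sum is only an illustration; it does not reproduce how pyciemss treats `weights` during calibration.

```python
import numpy as np

# Toy per-model outputs (made-up values): each model names its states differently.
model_outputs = [
    {"I": np.array([1.0, 2.0, 4.0]), "D": np.array([0.0, 0.0, 1.0])},
    {"infected": np.array([1.0, 3.0, 5.0]), "dead": np.array([0.0, 1.0, 1.0])},
]

# Same shape as in the cells above: shared name -> model-specific name.
solution_mappings = [
    {"I": "I", "D": "D"},
    {"I": "infected", "D": "dead"},
]
weights = [1 / len(model_outputs) for _ in model_outputs]


def combine(model_outputs, solution_mappings, weights):
    """Weighted sum of model trajectories, expressed in the shared variable names."""
    shared_names = solution_mappings[0].keys()
    return {
        name: sum(
            w * outputs[mapping[name]]
            for w, outputs, mapping in zip(weights, model_outputs, solution_mappings)
        )
        for name in shared_names
    }


ensemble = combine(model_outputs, solution_mappings, weights)
print(ensemble["I"], ensemble["D"])
```

With equal weights the combined trajectory is simply the average of the mapped model trajectories, which is why every mapping must expose the same set of shared names.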



In [169]:
num_samples = 100
model_paths = [model2_location, model7_location]
data_path = "../../notebook/ensemble_eval_sa/NY_data1.csv"
weights = [1/len(model_paths) for i in model_paths]
solution_mappings = [{"I": "I", "H": "H", "D": "D"},
                     {"I": "infected", "H": "hospitalized", "D": "dead"}
                     ]

# Run the calibration and sampling
result4 = load_and_calibrate_and_sample_ensemble_model(
    model_paths,
    data_path,
    weights,
    solution_mappings,
    num_samples,
    timepoints,
    verbose=True,
    total_population=19340000,
    num_iterations=500,
    time_unit="days",
    visual_options={"title": "Calibrated Ensemble", "subset":".*_sol"}
)

# Save results
result4["data"].to_csv(
    "partI_ensemble_of_two_results.csv", index=False
)
result4["quantiles"].to_csv(
    "partI_ensemble_of_two_quantiles.csv", index=False
)

# Plot results
schema = plots.trajectories(pd.DataFrame(result4["data"]), subset=".*_sol",
                            points=test_data.reset_index(drop=True).rename(columns={"I":"I_data", "H":"H_data", "D":"D_data"}),
                           )
schema = plots.pad(schema, 5)
plots.ipy_display(schema)

iteration 0: loss = 732.6240921020508
iteration 25: loss = 707.3158521056175
iteration 50: loss = 660.7171763777733
iteration 75: loss = 651.2766153216362
iteration 100: loss = 637.714672267437
iteration 125: loss = 644.9947382807732
iteration 150: loss = 645.9765561819077
iteration 175: loss = 643.8260254263878
iteration 200: loss = 640.565835416317
iteration 225: loss = 635.7575448155403
iteration 250: loss = 637.0664694905281
iteration 275: loss = 638.3980574607849
iteration 300: loss = 637.4650293588638
iteration 325: loss = 636.3265872597694
iteration 350: loss = 637.4065206646919
iteration 375: loss = 635.2618921399117
iteration 400: loss = 637.2882766127586
iteration 425: loss = 637.6753106713295
iteration 450: loss = 635.5941507816315
iteration 475: loss = 638.0811821818352



In [184]:
num_samples = 100
model_paths = [model2_location, model7_location]
data_path = "../../notebook/ensemble_eval_sa/NY_data1.csv"
weights = [1/len(model_paths) for i in model_paths]
solution_mappings = [{"I": "I", "D": "D"},
                     {"I": "infected", "D": "dead"}
                     ]

# Run the calibration and sampling
result4 = load_and_calibrate_and_sample_ensemble_model(
    model_paths,
    data_path,
    weights,
    solution_mappings,
    num_samples,
    timepoints,
    verbose=True,
    total_population=19340000,
    num_iterations=500,
    time_unit="days",
    visual_options={"title": "Calibrated Ensemble", "subset":".*_sol"}
)

# Save results
result4["data"].to_csv(
    "partI_ensemble_of_two_results.csv", index=False
)
result4["quantiles"].to_csv(
    "partI_ensemble_of_two_quantiles.csv", index=False
)

# Plot results
schema = plots.trajectories(pd.DataFrame(result4["data"]), subset=".*_sol",
                            points=test_data.reset_index(drop=True).rename(columns={"I":"I_data", "H":"H_data", "D":"D_data"}),
                           )
schema = plots.pad(schema, 5)
plots.ipy_display(schema)

iteration 0: loss = 731.6282063126564
iteration 25: loss = 712.2264596223831
iteration 50: loss = 643.2775917649269
iteration 75: loss = 643.4467373490334
iteration 100: loss = 643.8508707880974
iteration 125: loss = 638.0158843994141
iteration 150: loss = 639.4444034695625
iteration 175: loss = 641.6682159304619
iteration 200: loss = 636.3009222745895
iteration 225: loss = 638.3280690312386
iteration 250: loss = 638.1422500014305
iteration 275: loss = 639.6344323754311
iteration 300: loss = 636.7406968474388
iteration 325: loss = 638.3488594889641
iteration 350: loss = 638.5027226805687
iteration 375: loss = 637.2335525155067
iteration 400: loss = 639.5806286931038
iteration 425: loss = 635.3791239857674
iteration 450: loss = 633.2959784865379
iteration 475: loss = 641.0948089957237



In [185]:
num_samples = 100
model_paths = [model2_location, model6_location, model7_location]
data_path = "../../notebook/ensemble_eval_sa/NY_data1.csv"
weights = [1/len(model_paths) for i in model_paths]
solution_mappings = [{"I": "I", "D": "D"},
                     {"I": "infected", "D": "dead"}, 
                     {"I": "infected", "D": "dead"}
                     ]

# Run the calibration and sampling
result4 = load_and_calibrate_and_sample_ensemble_model(
    model_paths,
    data_path,
    weights,
    solution_mappings,
    num_samples,
    timepoints,
    verbose=True,
    total_population=19340000,
    num_iterations=500,
    time_unit="days",
    visual_options={"title": "Calibrated Ensemble", "subset":".*_sol"}
)

# Save results
result4["data"].to_csv(
    "partI_ensemble_of_many_results.csv", index=False
)
result4["quantiles"].to_csv(
    "partI_ensemble_of_many_quantiles.csv", index=False
)

# Plot results
schema = plots.trajectories(pd.DataFrame(result4["data"]), subset=".*_sol",
                            points=test_data.reset_index(drop=True).rename(columns={"I":"I_data", "H":"H_data", "D":"D_data"}),
                           )
schema = plots.pad(schema, 5)
plots.ipy_display(schema)

iteration 0: loss = 765.0988219678402
iteration 25: loss = 727.8856188952923
iteration 50: loss = 668.121911495924
iteration 75: loss = 654.3243019282818
iteration 100: loss = 641.7556810081005
iteration 125: loss = 646.8317455947399
iteration 150: loss = 643.6816661059856
iteration 175: loss = 641.1758306920528
iteration 200: loss = 649.2690155208111
iteration 225: loss = 645.6392317712307
iteration 250: loss = 636.9966904819012
iteration 275: loss = 637.8782330453396
iteration 300: loss = 639.6409922540188
iteration 325: loss = 643.2566699683666
iteration 350: loss = 636.4147016704082
iteration 375: loss = 635.8384027183056
iteration 400: loss = 636.7175833880901
iteration 425: loss = 632.5201697051525
iteration 450: loss = 642.9727003276348
iteration 475: loss = 636.7774510085583
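
The plots above compare sampled trajectories with the held-out observations visually. A numerical summary could be computed along the following lines. This is a sketch on synthetic toy data: the column names `sample_id`, `timepoint_id`, and `I_sol` are assumptions about the layout of `result["data"]`, and the error metric (mean absolute error of the per-timepoint median over the forecast window) is one possible choice, not a method prescribed by this notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in for result["data"]; column names are an assumption about the output format.
rng = np.random.default_rng(0)
n_samples, n_timepoints, n_calibration = 5, 6, 4
samples = pd.DataFrame({
    "sample_id": np.repeat(np.arange(n_samples), n_timepoints),
    "timepoint_id": np.tile(np.arange(n_timepoints), n_samples),
})
samples["I_sol"] = 10.0 * samples["timepoint_id"] + rng.normal(0.0, 1.0, len(samples))

# Toy observations, one value per timepoint (synthetic, not the NY data).
observed = pd.Series(10.0 * np.arange(n_timepoints), name="I")

# Median across samples at each timepoint.
median_forecast = samples.groupby("timepoint_id")["I_sol"].median()

# Score only the timepoints after the calibration window.
forecast_window = median_forecast.index >= n_calibration
mae = (median_forecast[forecast_window] - observed[forecast_window]).abs().mean()
print(f"forecast MAE for I: {mae:.3f}")
```

Restricting the score to the forecast window keeps the metric from being flattered by timepoints the model was calibrated on.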

