# Ensemble Evaluation: Timepoint 1

Location: New York State

Timepoint 1: April 3, 2020. Setting: New York State at the beginning of the pandemic, when masking was the main preventive measure and no vaccines were available.

Calibration uses case and death data only; hospitalization data are not available for this timepoint. The models include no vaccination and no variants, and reinfection is not considered.

## Set up for ensemble modeling

### Load dependencies
Import functionality from the pyciemss library to allow for model sampling and calibration.

In [1]:
import os
import pandas as pd
import numpy as np
from pyciemss.Ensemble.interfaces import (
    load_and_sample_petri_ensemble, load_and_calibrate_and_sample_ensemble_model
)
from pyciemss.PetriNetODE.interfaces import (
    load_and_sample_petri_model,
    load_and_calibrate_and_sample_petri_model,
    load_and_optimize_and_sample_petri_model,
    load_and_calibrate_and_optimize_and_sample_petri_model
)
from pyciemss.visuals import plots

### Collect relevant models
<!-- We have chosen x number of models to capture the relevant COVID-19 dynamics for this setting. 
 - `model1` contains compartments SEIRHD, and is stratified by age into four groups.
 - `model2` is the same as `model1`, but allows for reinfection
 - `model3` is the same as `model1`, but with a variation in transmission rate to account for masking efficacy and compliance. -->

In [2]:
# model1_location = "../../notebook/ensemble_eval_sa/operative_models/SEIRHD_base_model_ee.json"
model2_location = "../../notebook/ensemble_eval_sa/operative_models/SEIRHD_npi1_ee.json"
# model3_location = "../../notebook/ensemble_eval_sa/operative_models/SEIRHD_npi1_beta_c_varying_ee.json"
# model4_location = "../../notebook/ensemble_eval_sa/operative_models/SEIRHD_npi1_k_varying_ee.json"
# model5_location = "../../notebook/ensemble_eval_sa/operative_models/SEIRHD_npi1_age_stratified_v1.json"
model6_location = "../../notebook/ensemble_eval_sa/operative_models/SEIRHD_npi1_age_stratified_v2.json"
model7_location = "../../notebook/ensemble_eval_sa/operative_models/SEIRHD_npi1_age_stratified_v3.json"
# model8_location = "../../notebook/ensemble_eval_sa/operative_models/SEIRD_ymo_age_strat.json"

### Gather data, and set training and forecast intervals
For this timepoint, only case and death data are available for calibration.

We take the total population of New York State to be 19,340,000. Population age-structure estimates for New York State were taken (and scaled appropriately) from [here](https://www.health.ny.gov/statistics/vital_statistics/2016/table01.htm).
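
The "scaled appropriately" step can be sketched as a proportional rescaling. This is a minimal illustration, not the procedure actually used here: the age-group labels and raw counts below are placeholders rather than values from the linked table, and `scale_to_total` is a hypothetical helper. Only the total of 19,340,000 comes from this notebook.

```python
# Placeholder age-group counts (NOT the values from the linked table).
raw_counts = {
    "0-17": 4_000_000,
    "18-44": 7_000_000,
    "45-64": 5_000_000,
    "65+": 3_000_000,
}

total_population = 19_340_000  # total used throughout this notebook


def scale_to_total(counts, total):
    """Rescale group counts proportionally so that they sum exactly to `total`."""
    raw_total = sum(counts.values())
    scaled = {group: round(n * total / raw_total) for group, n in counts.items()}
    # Put any rounding remainder into the largest group so the sum is exact.
    largest = max(scaled, key=scaled.get)
    scaled[largest] += total - sum(scaled.values())
    return scaled


scaled_counts = scale_to_total(raw_counts, total_population)
print(scaled_counts)
```

Keeping the group proportions fixed while forcing the sum to match the assumed total avoids a mismatch between the stratified initial conditions and the `total_population` argument passed to calibration.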

First recorded case in New York State: March 1, 2020.

Also relevant is that a statewide stay-at-home order for non-essential workers was implemented on March 22, 2020, and masking policy was implemented on April 15, 2020. A nice list of COVID-19 policy interventions for New York City and State has been compiled [here](https://www.investopedia.com/historical-timeline-of-covid-19-in-new-york-city-5071986).
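
To relate these dates to model time, the intervention dates can be converted to day offsets. The sketch below assumes that timepoint 0 corresponds to the first recorded case on March 1, 2020; that alignment is an assumption and should be checked against the dataset. The names `day0`, `events`, and `offsets` are introduced here for illustration.

```python
from datetime import date

# Assumption: timepoint 0 is the date of the first recorded case.
day0 = date(2020, 3, 1)

events = {
    "stay_at_home_order": date(2020, 3, 22),
    "masking_policy": date(2020, 4, 15),
    "timepoint_1": date(2020, 4, 3),
}

# Number of days between day 0 and each event
offsets = {name: (when - day0).days for name, when in events.items()}
print(offsets)
```

Under this assumption the stay-at-home order falls inside the calibration window, while the masking policy falls after Timepoint 1, that is, inside the forecast window.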

In [3]:
url = 'https://raw.githubusercontent.com/DARPA-ASKEM/experiments/main/thin-thread-examples/milestone_12month/evaluation/ensemble_eval_SA/datasets/aabb3684-a7ea-4f60-98f1-a8e673ad6df5/dataset.csv'
ny_data = pd.read_csv(url)

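# Observed data for plotting against the simulations: rows 41-100 (60 rows).
# This slice starts at the same row as the calibration data below, so it covers
# the calibration window as well as the four-week forecast (04/03/2020 - 05/01/2020).
test_data = ny_data[41:101].reset_index(drop=True)
test_data = test_data.drop(columns="timestep")

# Select historical data for calibration, intended to run from 03/01/2020 up to
# 04/02/2020, the day before Timepoint 1.
# The slice keeps rows 41-71 (31 daily rows), not the first 73 rows.
# No hospitalization data at this point
ny_data = ny_data[41:72].reset_index(drop=True)
ny_data1 = ny_data.assign(timepoints=[float(i) for i in range(len(ny_data))])
ny_data = ny_data1[["timepoints", "I", "D"]]
# The row index (0, 1, 2, ...) is written as the first column of the CSV
ny_data[["I", "D"]].to_csv("NY_data1.csv")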

# Set timepoints
start_timepoint = 0
stop_timepoint = len(ny_data) + 28 # simulate for four weeks after end of data
timepoints = [float(i) for i in range(stop_timepoint + 1)]
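
The window arithmetic in the cell above can be checked with a small sketch. It assumes only that the dataset has at least 101 rows. The slice bounds 41, 72, and 101 and the 28-day horizon are taken from the cell; the names `n_calibration_rows`, `n_test_rows`, `calibration_timepoints`, and `forecast_timepoints` are introduced here for illustration.

```python
# Rows kept by ny_data[41:72] (calibration) and ny_data[41:101] (plotting)
n_calibration_rows = 72 - 41
n_test_rows = 101 - 41

forecast_days = 28
stop_timepoint = n_calibration_rows + forecast_days
timepoints = [float(i) for i in range(stop_timepoint + 1)]

# Split the simulation grid into the calibrated part and the forecast part
calibration_timepoints = timepoints[:n_calibration_rows]
forecast_timepoints = timepoints[n_calibration_rows:]

print(n_calibration_rows, stop_timepoint, len(timepoints), n_test_rows)
```

With these bounds the simulation grid has the same length as `test_data`, so observed points line up one-to-one with simulated timepoints when plotted.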

# Calibrate and sample an ensemble of one model

In [12]:
num_samples = 100
model_paths = [model2_location]
data_path = "../../notebook/ensemble_eval_sa/datasets/NY_data1.csv"
weights = [1]
solution_mappings = [{"I": "I", "D": "D"}]

# Run the calibration and sampling
result1 = load_and_calibrate_and_sample_ensemble_model(
    model_paths,
    data_path,
    weights,
    solution_mappings,
    num_samples,
    timepoints,
    verbose=True,
    total_population=19340000,
    num_iterations=250,
    time_unit="days",
    visual_options={"title": "Calibrated Ensemble", "subset":".*_sol"}
)

# Save results
result1["data"].to_csv("../../notebook/ensemble_eval_sa/ensemble_results/partI_ensemble_of_one_results.csv", index=False)
result1["quantiles"].to_csv("../../notebook/ensemble_eval_sa/ensemble_results/partI_ensemble_of_one_quantiles.csv", index=False)

# Plot results
schema = plots.trajectories(pd.DataFrame(result1["data"]), subset=".*_sol",
                            points=test_data.reset_index(drop=True).rename(columns={"I":"I_data", "H":"H_data", "D":"D_data"}),
                           )
schema = plots.pad(schema, 5)
plots.ipy_display(schema)

iteration 0: loss = 703.9979826509953
iteration 25: loss = 696.8435081541538
iteration 50: loss = 660.5078044235706
iteration 75: loss = 640.0858406126499
iteration 100: loss = 637.7395190298557
iteration 125: loss = 636.8753542006016
iteration 150: loss = 635.3679219782352
iteration 175: loss = 634.8731047213078
iteration 200: loss = 636.0222593843937
iteration 225: loss = 633.6642631590366
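
One way to judge whether 250 iterations were enough is to parse the logged losses and compare the overall drop with the spread of the last few values. The sketch below uses the log lines printed above; the regular expression and the choice of a four-value tail are illustrative assumptions, not part of pyciemss.

```python
import re

# Log lines copied from the calibration output above
log_text = """\
iteration 0: loss = 703.9979826509953
iteration 25: loss = 696.8435081541538
iteration 50: loss = 660.5078044235706
iteration 75: loss = 640.0858406126499
iteration 100: loss = 637.7395190298557
iteration 125: loss = 636.8753542006016
iteration 150: loss = 635.3679219782352
iteration 175: loss = 634.8731047213078
iteration 200: loss = 636.0222593843937
iteration 225: loss = 633.6642631590366
"""

pattern = re.compile(r"iteration (\d+): loss = ([0-9.]+)")
losses = [(int(m.group(1)), float(m.group(2))) for m in pattern.finditer(log_text)]

first_loss = losses[0][1]
last_loss = losses[-1][1]

# Fractional decrease from the first to the last logged loss
relative_drop = (first_loss - last_loss) / first_loss

# Spread of the last four logged losses, relative to the final loss
tail = [loss for _, loss in losses[-4:]]
tail_spread = (max(tail) - min(tail)) / last_loss

print(f"relative drop: {relative_drop:.3f}, tail spread: {tail_spread:.4f}")
```

Here the loss falls by roughly 10% overall while the last four logged values differ by well under 1%, which suggests the optimization has largely plateaued, although the loss is noisy and not strictly decreasing.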



In [10]:
num_samples = 100
model_paths = [model6_location]
data_path = "../../notebook/ensemble_eval_sa/datasets/NY_data1.csv"
weights = [1]
solution_mappings = [{"I": "infected", "D": "dead"}]

# Run the calibration and sampling
result2 = load_and_calibrate_and_sample_ensemble_model(
    model_paths,
    data_path,
    weights,
    solution_mappings,
    num_samples,
    timepoints,
    verbose=True,
    total_population=19340000,
    num_iterations=100,
    time_unit="days",
    visual_options={"title": "Calibrated Ensemble", "subset":".*_sol"}
)

# # Save results
# result2["data"].to_csv("../../notebook/ensemble_eval_sa/ensemble_results/partI_ensemble_of_one_results.csv", index=False)
# result2["quantiles"].to_csv("../../notebook/ensemble_eval_sa/ensemble_results/partI_ensemble_of_one_quantiles.csv", index=False)

# Plot results
schema = plots.trajectories(pd.DataFrame(result2["data"]), subset=".*_sol",
                            points=test_data.reset_index(drop=True).rename(columns={"I":"I_data", "H":"H_data", "D":"D_data"}),
                           )
schema = plots.pad(schema, 5)
plots.ipy_display(schema)

iteration 0: loss = 715.3255967199802
iteration 25: loss = 675.6244948208332
iteration 50: loss = 638.4774799644947
iteration 75: loss = 642.9715199768543



In [None]:
num_samples = 100
model_paths = [model7_location]
data_path = "../../notebook/ensemble_eval_sa/datasets/NY_data1.csv"
weights = [1]
solution_mappings = [{"I": "infected", "D": "dead"}]

# Run the calibration and sampling
result3 = load_and_calibrate_and_sample_ensemble_model(
    model_paths,
    data_path,
    weights,
    solution_mappings,
    num_samples,
    timepoints,
    verbose=True,
    total_population=19340000,
    num_iterations=50,
    time_unit="days",
    visual_options={"title": "Calibrated Ensemble", "subset":".*_sol"}
)

# # Save results
# result3["data"].to_csv("../../notebook/ensemble_eval_sa/ensemble_results/partI_ensemble_of_one_results.csv", index=False)
# result3["quantiles"].to_csv("../../notebook/ensemble_eval_sa/ensemble_results/partI_ensemble_of_one_quantiles.csv", index=False)

# Plot results
schema = plots.trajectories(pd.DataFrame(result3["data"]), subset=".*_sol",
                            points=test_data.reset_index(drop=True).rename(columns={"I":"I_data", "H":"H_data", "D":"D_data"}),
                           )
schema = plots.pad(schema, 5)
plots.ipy_display(schema)

# Calibrate and sample an ensemble of multiple models
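
As a rough illustration of what the equal `weights` and the `solution_mappings` express (this is not pyciemss's internal implementation), an ensemble trajectory can be thought of as a weighted sum of each model's outputs after they are mapped onto shared names. The sketch assumes each mapping reads as shared name to model variable name; the trajectories are made-up numbers and `ensemble_mean` is a hypothetical helper.

```python
import numpy as np

# Made-up per-model trajectories, keyed by each model's own variable names
raw_outputs = [
    {"I": np.array([10.0, 20.0, 40.0]), "D": np.array([0.0, 1.0, 2.0])},
    {"infected": np.array([12.0, 18.0, 30.0]), "dead": np.array([0.0, 0.0, 3.0])},
    {"infected": np.array([8.0, 22.0, 50.0]), "dead": np.array([0.0, 2.0, 4.0])},
]

# Assumption: each mapping is {shared name: model variable name}
solution_mappings = [
    {"I": "I", "D": "D"},
    {"I": "infected", "D": "dead"},
    {"I": "infected", "D": "dead"},
]

weights = [1 / len(raw_outputs) for _ in raw_outputs]


def ensemble_mean(raw_outputs, solution_mappings, weights):
    """Weighted sum of model outputs, expressed in the shared variable names."""
    if not np.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    combined = {}
    for output, mapping, weight in zip(raw_outputs, solution_mappings, weights):
        for shared_name, model_name in mapping.items():
            combined[shared_name] = combined.get(shared_name, 0.0) + weight * output[model_name]
    return combined


ensemble = ensemble_mean(raw_outputs, solution_mappings, weights)
print(ensemble)
```

The mappings are what allow models with different variable names (`I` versus `infected`) to be compared against the same data columns and combined in one ensemble.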

In [9]:
num_samples = 100
model_paths = [model2_location, model6_location, model7_location]
data_path = "../../notebook/ensemble_eval_sa/datasets/NY_data1.csv"
weights = [1/len(model_paths) for i in model_paths]
solution_mappings = [{"I": "I", "D": "D"},
                     {"I": "infected", "D": "dead"}, 
                     {"I": "infected", "D": "dead"}
                     ]

# Run the calibration and sampling
result4 = load_and_calibrate_and_sample_ensemble_model(
    model_paths,
    data_path,
    weights,
    solution_mappings,
    num_samples,
    timepoints,
    verbose=True,
    total_population=19340000,
    num_iterations=300,
    time_unit="days",
    visual_options={"title": "Calibrated Ensemble", "subset":".*_sol"}
)

# Save results
result4["data"].to_csv("../../notebook/ensemble_eval_sa/ensemble_results/partI_ensemble_of_many_results_day0_Mar012020.csv", index=False)
result4["quantiles"].to_csv("../../notebook/ensemble_eval_sa/ensemble_results/partI_ensemble_of_many_quantiles_day0_Mar012020.csv", index=False)

# Plot results
schema = plots.trajectories(pd.DataFrame(result4["data"]), subset=".*_sol",
                            points=test_data.reset_index(drop=True).rename(columns={"I":"I_data", "H":"H_data", "D":"D_data"}),
                           )
schema = plots.pad(schema, 5)
plots.ipy_display(schema)

iteration 0: loss = 756.1536805331707
iteration 25: loss = 724.2042917907238
iteration 50: loss = 668.4412952363491
iteration 75: loss = 657.3987860381603
iteration 100: loss = 649.1626441180706
iteration 125: loss = 650.6445612609386
iteration 150: loss = 644.6379594504833
iteration 175: loss = 639.5945779979229
iteration 200: loss = 644.4095016419888
iteration 225: loss = 638.1881293952465
iteration 250: loss = 638.655171841383
iteration 275: loss = 637.6174668967724



In [None]:
num_samples = 100
model_paths = [model2_location, model7_location]
data_path = "../../notebook/ensemble_eval_sa/NY_data1.csv"
weights = [1/len(model_paths) for i in model_paths]
solution_mappings = [{"I": "I", "D": "D"},
                     {"I": "infected", "D": "dead"}
                     ]

# Run the calibration and sampling
result5 = load_and_calibrate_and_sample_ensemble_model(
    model_paths,
    data_path,
    weights,
    solution_mappings,
    num_samples,
    timepoints,
    verbose=True,
    total_population=19340000,
    num_iterations=500,
    time_unit="days",
    visual_options={"title": "Calibrated Ensemble", "subset":".*_sol"}
)

# Save results
# result5["data"].to_csv("../../notebook/ensemble_eval_sa/ensemble_results/partI_ensemble_of_two_results.csv", index=False)
# result5["quantiles"].to_csv("../../notebook/ensemble_eval_sa/ensemble_results/partI_ensemble_of_two_quantiles.csv", index=False)

# Plot results
schema = plots.trajectories(pd.DataFrame(result5["data"]), subset=".*_sol",
                            points=test_data.reset_index(drop=True).rename(columns={"I":"I_data", "H":"H_data", "D":"D_data"}),
                           )
schema = plots.pad(schema, 5)
plots.ipy_display(schema)

In [None]:
num_samples = 100
model_paths = [model2_location, model7_location]
data_path = "../../notebook/ensemble_eval_sa/NY_data1.csv"
weights = [1/len(model_paths) for i in model_paths]
solution_mappings = [{"I": "I", "D": "D"},
                     {"I": "infected", "D": "dead"}
                     ]

# Run the calibration and sampling
result4 = load_and_calibrate_and_sample_ensemble_model(
    model_paths,
    data_path,
    weights,
    solution_mappings,
    num_samples,
    timepoints,
    verbose=True,
    total_population=19340000,
    num_iterations=500,
    time_unit="days",
    visual_options={"title": "Calibrated Ensemble", "subset":".*_sol"}
)

# Save results
# result4["data"].to_csv("../../notebook/ensemble_eval_sa/ensemble_results/partI_ensemble_of_many_results.csv", index=False)
# result4["quantiles"].to_csv("../../notebook/ensemble_eval_sa/ensemble_results/partI_ensemble_of_many_quantiles.csv", index=False)

# Plot results
schema = plots.trajectories(pd.DataFrame(result4["data"]), subset=".*_sol",
                            points=test_data.reset_index(drop=True).rename(columns={"I":"I_data", "H":"H_data", "D":"D_data"}),
                           )
schema = plots.pad(schema, 5)
plots.ipy_display(schema)

In [None]:
num_samples = 100
model_paths = [model2_location, model6_location, model7_location]
data_path = "../../notebook/ensemble_eval_sa/NY_data1.csv"
weights = [1/len(model_paths) for i in model_paths]
solution_mappings = [{"I": "I", "D": "D"},
                     {"I": "infected", "D": "dead"}, 
                     {"I": "infected", "D": "dead"}
                     ]

# Run the calibration and sampling
result4 = load_and_calibrate_and_sample_ensemble_model(
    model_paths,
    data_path,
    weights,
    solution_mappings,
    num_samples,
    timepoints,
    verbose=True,
    total_population=19340000,
    num_iterations=500,
    time_unit="days",
    visual_options={"title": "Calibrated Ensemble", "subset":".*_sol"}
)

# Save results
# result4["data"].to_csv("../../notebook/ensemble_eval_sa/ensemble_results/partI_ensemble_of_many_results.csv", index=False)
# result4["quantiles"].to_csv("../../notebook/ensemble_eval_sa/ensemble_results/partI_ensemble_of_many_quantiles.csv", index=False)

# Plot results
schema = plots.trajectories(pd.DataFrame(result4["data"]), subset=".*_sol",
                            points=test_data.reset_index(drop=True).rename(columns={"I":"I_data", "H":"H_data", "D":"D_data"}),
                           )
schema = plots.pad(schema, 5)
plots.ipy_display(schema)
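
To compare ensembles numerically rather than only by eye, one option is to score the median forecast against the held-out observations. The sketch below is a minimal example on synthetic data: the column names `sample_id`, `timepoint_id`, and `I_sol` are assumptions about the layout of `result["data"]`, and `forecast_mae` is a hypothetical helper, not a pyciemss function.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for result["data"]: 5 samples over 6 timepoints.
# Column names are assumptions about the real output layout.
n_samples, n_timepoints = 5, 6
samples = pd.DataFrame({
    "sample_id": np.repeat(np.arange(n_samples), n_timepoints),
    "timepoint_id": np.tile(np.arange(n_timepoints), n_samples),
})
# Each sample is the "truth" plus a bias of 10 and a per-sample offset of -2..2
samples["I_sol"] = (
    100.0 * (samples["timepoint_id"] + 1) + 10.0 + (samples["sample_id"] - 2)
)

# Synthetic observations, indexed by timepoint
observed = pd.Series(100.0 * (np.arange(n_timepoints) + 1), name="I")


def forecast_mae(samples, observed, column, first_forecast_index):
    """Mean absolute error of the per-timepoint median over the forecast window."""
    median = samples.groupby("timepoint_id")[column].median()
    in_window = median.index >= first_forecast_index
    forecast = median[in_window]
    errors = (forecast - observed.loc[forecast.index]).abs()
    return float(errors.mean())


mae = forecast_mae(samples, observed, "I_sol", first_forecast_index=3)
print(mae)
```

Restricting the score to timepoints after the calibration window keeps the comparison out of sample; applying the same function to each `result` would give one number per ensemble.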