# This is a notebook for synthesizing data to test calibration

To check that `calibrate` returns sensible results, we will:

1. `sample` a model
2. use that output to generate synthetic data
3. then calibrate the model to that synthetic dataset
4. sanity-check that the calibrated parameters and results are reasonable compared to the parameters used to create the synthetic data

See [this issue](https://github.com/ciemss/pyciemss/issues/448).
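The four steps above can be illustrated without `pyciemss` at all. The following is a minimal sketch on a toy exponential-growth model, using only the standard library. The functions `simulate` and `fit_rate` are hypothetical stand-ins for `sample` and `calibrate`, and a log-linear least-squares fit replaces the variational inference that `calibrate` performs.

```python
import math
import random

def simulate(rate, y0, times):
    """Toy stand-in for `sample`: exponential growth y(t) = y0 * exp(rate * t)."""
    return [y0 * math.exp(rate * t) for t in times]

def fit_rate(times, values):
    """Toy stand-in for `calibrate`: least-squares slope of log(y) against t."""
    logs = [math.log(v) for v in values]
    t_mean = sum(times) / len(times)
    l_mean = sum(logs) / len(logs)
    num = sum((t - t_mean) * (l - l_mean) for t, l in zip(times, logs))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den

rng = random.Random(0)                                # fixed seed for repeatability
times = [float(t) for t in range(10)]
true_rate = 0.25                                      # illustrative value only

clean = simulate(true_rate, 35.0, times)              # step 1: sample the model
noisy = [v + rng.gauss(0.0, 0.5) for v in clean]      # step 2: synthetic data
estimated_rate = fit_rate(times, noisy)               # step 3: calibrate
recovered = abs(estimated_rate - true_rate) < 0.05    # step 4: sanity check
```

If `recovered` is false on a toy problem like this, the fitting procedure itself is suspect. The same logic motivates the round trip through `sample` and `calibrate` in this notebook.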

### Load dependencies

In [1]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pyciemss
from pyciemss.interfaces import calibrate

### Collect model and data paths

In [38]:
MODEL_PATH = "https://raw.githubusercontent.com/DARPA-ASKEM/simulation-integration/main/data/models/"
DATA_PATH = "../../docs/source/"

# Models
petri1 = os.path.join(MODEL_PATH, "SEIRHD_with_reinfection01_petrinet.json")
regnet1 = os.path.join(MODEL_PATH, "LV_rabbits_wolves_model02_regnet.json")
stock1 = os.path.join(MODEL_PATH, "SIR_stockflow.json")
stock2 = os.path.join(MODEL_PATH, "SEIRHDS_stockflow.json")
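`os.path.join` is designed for filesystem paths. On Windows it would insert a backslash if `MODEL_PATH` lacked its trailing `/`. A minimal sketch of an alternative, assuming only the standard library, uses `urllib.parse.urljoin`. The helper name `model_url` is hypothetical.

```python
from urllib.parse import urljoin

MODEL_PATH = "https://raw.githubusercontent.com/DARPA-ASKEM/simulation-integration/main/data/models/"

def model_url(filename: str, base: str = MODEL_PATH) -> str:
    """Join a model file name onto the base URL (hypothetical helper)."""
    # urljoin replaces the last path segment unless the base ends with "/",
    # so make sure the trailing slash is present before joining.
    if not base.endswith("/"):
        base += "/"
    return urljoin(base, filename)

petri1 = model_url("SEIRHD_with_reinfection01_petrinet.json")
```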

### Set parameters for sampling

In [20]:
start_time = 0.0
end_time = 150.0
logging_step_size = 10.0

### Define functions for generating synthetic data

In [64]:
# Function to add Gaussian noise to `sample` results
def add_gaussian_noise(data: pd.DataFrame, std_dev: float, col_state_map: dict) -> pd.DataFrame:
    noise = np.random.normal(0, std_dev, size=data.shape)
    noisy_data = data + noise
    noisy_data.insert(0, 'Timestamp', noisy_data.index.astype(float))
    col_state_map = {'Timestamp': 'Timestamp', **col_state_map}
    noisy_data = noisy_data.rename(columns=col_state_map)
    return noisy_data

# Function to sample from a model and generate synthetic data
def synthetic_data(model, col_state_map, end_time, logging_step_size, noise_level):
    num_samples = 1  # a single trajectory serves as the "ground truth"
    result = pyciemss.sample(model, end_time, logging_step_size, num_samples)
    # Keep only the state columns that will be observed
    data_df = result["data"][list(col_state_map.keys())]
    noisy_data = add_gaussian_noise(data_df, noise_level, col_state_map)
    # Save where the calibration step reads the dataset (DATA_PATH + "noisy_data.csv")
    noisy_data.to_csv(os.path.join(DATA_PATH, 'noisy_data.csv'), index=False)
    return noisy_data
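`np.random.normal` draws from NumPy's global random state, so `add_gaussian_noise` produces different noise on every run. A minimal sketch of a reproducible variant follows. The function name `add_seeded_noise`, the `seed` argument, and the clipping at zero are assumptions introduced for illustration, and the toy frame stands in for real `sample` output.

```python
import numpy as np
import pandas as pd

def add_seeded_noise(data: pd.DataFrame, std_dev: float, seed: int = 0) -> pd.DataFrame:
    """Hypothetical variant of `add_gaussian_noise` with a fixed seed and clipping."""
    rng = np.random.default_rng(seed)  # local generator; global state is untouched
    noise = rng.normal(0.0, std_dev, size=data.shape)
    # Clip at zero (assumption: the columns are counts and cannot be negative)
    return (data + noise).clip(lower=0.0)

# Toy frame standing in for the `sample` output (values are illustrative only)
toy = pd.DataFrame({"I_state": [35.0, 45.0, 57.0], "D_state": [0.05, 0.18, 0.35]})
first = add_seeded_noise(toy, std_dev=1.0, seed=42)
second = add_seeded_noise(toy, std_dev=1.0, seed=42)
```

With the same seed, `first` and `second` are identical, which makes a failed calibration easier to reproduce. Clipping biases the noise slightly near zero, so it is worth leaving out if negative values are acceptable downstream.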

## (1) Create synthetic data from a given model

In [68]:
col_state_map = {'I_state_state': 'Cases', 'H_state_state': 'Hosp', 'D_state_state': 'Deaths'}
noise_level = 0.0
synthetic_data(petri1, col_state_map, end_time, logging_step_size, noise_level)

Unnamed: 0,Timestamp,Cases,Hosp,Deaths
0,0.0,34.715595,1.218024,0.051347
1,1.0,44.880062,1.869008,0.178859
2,2.0,57.4282,2.43583,0.353242
3,3.0,73.474136,3.122504,0.577923
4,4.0,94.002594,3.995762,0.865594
5,5.0,120.266144,5.11226,1.23367
6,6.0,153.865646,6.540554,1.704584
7,7.0,196.849976,8.367775,2.30706
8,8.0,251.838593,10.705339,3.077846
9,9.0,322.181671,13.695654,4.063943


## (2) Calibrate the model to the synthetic data

In [70]:
# data_mapping: {"column_name": "observable/state_variable"}
data_mapping = {"Cases": "I", "Hosp": "H", "Deaths": "D"}
num_iterations = 100
dataset = DATA_PATH + "noisy_data.csv"

calibrated_results = calibrate(petri1, dataset, data_mapping=data_mapping, num_iterations=num_iterations)
parameter_estimates = calibrated_results["inferred_parameters"]
calibrated_results

ERROR:root:
                ###############################

                There was an exception in pyciemss

                Error occured in function: calibrate

                Function docs : 
    Infer parameters for a DynamicalSystem model conditional on data.
    This uses variational inference with a mean-field variational family to infer the parameters of the model.

    Args:
        - model_path_or_json: Union[str, Dict]
            - A path to a AMR model file or JSON containing a model in AMR form.
        - data_path: str
            - A path to the data file.
        - data_mapping: Dict[str, str]
            - A mapping from column names in the data file to state variable names in the model.
                - keys: str name of column in dataset
                - values: str name of state/observable in model
            - If not provided, we will assume that the column names in the data file match the state variable names.
        - noise_model: str
            - The 

ValueError: Expected parameter scale (Tensor of shape (14,)) of distribution Normal(loc: torch.Size([14]), scale: torch.Size([14])) to satisfy the constraint GreaterThan(lower_bound=0.0), but found invalid values:
tensor([0.0000, 0.0334, 0.0719, 0.1091, 0.1422, 0.1702, 0.1930, 0.2111, 0.2250,
        0.2352, 0.2425, 0.2473, 0.2500, 0.2511], grad_fn=<MulBackward0>)
             Trace Shapes:  
              Param Sites:  
             Sample Sites:  
      persistent_beta dist |
                     value |
     persistent_gamma dist |
                     value |
      persistent_hosp dist |
                     value |
persistent_death_hosp dist |
                     value |
        persistent_I0 dist |
                     value |
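The `ValueError` above states that every entry of the Normal distribution's `scale` must be strictly greater than zero, and the first entry of the printed tensor is `0.0000`. A minimal sketch, in plain Python, of locating such entries is shown below. The helper name `nonpositive_indices` is hypothetical, and the values are copied from the error message.

```python
from typing import List, Sequence

def nonpositive_indices(scale: Sequence[float]) -> List[int]:
    """Return positions whose value violates the `scale > 0` constraint."""
    # `not s > 0.0` also catches NaN, which fails every comparison
    return [i for i, s in enumerate(scale) if not s > 0.0]

# Scale values copied from the error message above
scale = [0.0000, 0.0334, 0.0719, 0.1091, 0.1422, 0.1702, 0.1930,
         0.2111, 0.2250, 0.2352, 0.2425, 0.2473, 0.2500, 0.2511]

bad = nonpositive_indices(scale)
print(bad)
```

Only the first entry is invalid here, so the failure concerns a single time point rather than the whole trajectory.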

In [None]:
parameter_estimates()

In [None]:
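num_samples = 10  # assumed value; `num_samples` was previously defined only inside `synthetic_data`
calibrated_sample_results = pyciemss.sample(petri1, end_time, logging_step_size, num_samples,
                start_time=start_time, inferred_parameters=parameter_estimates)
calibrated_sample_results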

In [None]:
# Sanity check: compare calibrated parameters to original
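One way to fill in this check, sketched in plain Python, is to compare each calibrated estimate with the value used to generate the data and flag large relative errors. The helper `relative_errors`, the tolerance, and all numbers below are hypothetical placeholders, not results from this notebook. The parameter names follow the `persistent_*` sample sites listed in the trace.

```python
from typing import Dict

def relative_errors(true_params: Dict[str, float],
                    estimates: Dict[str, float]) -> Dict[str, float]:
    """Relative error |estimate - true| / |true| for each shared parameter."""
    return {
        name: abs(estimates[name] - true) / abs(true)
        for name, true in true_params.items()
        if name in estimates and true != 0.0  # skip missing or zero-valued truths
    }

# Placeholder values for illustration only (not outputs of `sample` or `calibrate`)
true_params = {"persistent_beta": 0.50, "persistent_gamma": 0.20}
estimates = {"persistent_beta": 0.55, "persistent_gamma": 0.19}

errors = relative_errors(true_params, estimates)
tolerance = 0.25  # assumed acceptance threshold
flagged = [name for name, err in errors.items() if err > tolerance]
```

An empty `flagged` list would indicate that every estimate lies within the chosen tolerance of its generating value. The threshold itself is a judgment call that depends on the noise level used for the synthetic data.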