# Patient calibration workflow using AutoEmulate

## Introduction


## The Naghavi model

<!-- <b>In this workflow we demonstrate the integration of a Cardiovascular simulator, Naghavi Model from ModularCirc in an end-to-end AutoEmulate workflow.</b>  -->

The Naghavi lumped parameter model is a mathematical model of the human cardiovascular system, designed to simulate the dynamics of blood flow and pressure throughout the heart and circulatory system.
A **lumped parameter model** simplifies the cardiovascular system by dividing it into compartments (or "lumps") such as:

- Heart chambers (left and right atria and ventricles)
- Major blood vessels (aorta, vena cava, pulmonary arteries and veins)
- Systemic and pulmonary circulations

Each compartment is modeled using analogies to electrical circuits:

- Pressure ↔ Voltage
- Flow ↔ Current
- Resistance ↔ Vascular resistance (R)
- Compliance ↔ Vessel elasticity or capacitance (C)
- Inertance ↔ Blood inertia (L)

This approach allows simulation of the time-dependent relationships between pressure, volume, and flow rate across the entire cardiovascular system using ordinary differential equations (ODEs).
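
As a minimal illustration of the circuit analogy (not the Naghavi model itself), the sketch below integrates a two-element Windkessel, a single compliance `C` draining through a resistance `R`, with forward Euler. The function names, the half-sine inflow, and all parameter values are assumptions chosen for illustration only.

```python
import math


def inflow(t, period=0.8, systole=0.3, peak=400.0):
    """Illustrative inflow Q(t) in mL/s: half-sine during systole, zero in diastole."""
    phase = t % period
    if phase < systole:
        return peak * math.sin(math.pi * phase / systole)
    return 0.0


def simulate_windkessel(R=1.0, C=1.5, p0=80.0, dt=0.001, n_cycles=10, period=0.8):
    """Integrate C * dP/dt = Q(t) - P / R with forward Euler and return the pressure trace."""
    n_steps = int(round(n_cycles * period / dt))
    pressures = [p0]
    for step in range(n_steps):
        t = step * dt
        p = pressures[-1]
        # Current balance at the node: inflow minus the flow leaving through R
        dp_dt = (inflow(t, period) - p / R) / C
        pressures.append(p + dt * dp_dt)
    return pressures


trace = simulate_windkessel()
last_cycle = trace[-800:]  # one cardiac cycle at dt = 0.001 s
print(f"max: {max(last_cycle):.1f} mmHg, min: {min(last_cycle):.1f} mmHg")
```

Summary statistics such as the maximum and minimum pressure over the last cycle are the kind of scalar features that a full model reports for each simulated trace.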

## Patient calibration workflow

In this tutorial, we present a three-stage workflow for calibrating the Naghavi model to patient-specific clinical data using AutoEmulate. The process has the following stages:

- First we perform a global sensitivity analysis, which identifies the most influential parameters affecting model outputs and reduces the dimensionality of the calibration problem. 
- Next, we apply history matching, a sequential uncertainty quantification technique that uses emulators to efficiently rule out implausible regions of the parameter space based on observed patient data. This results in a restricted, plausible region—known as the NROY (Not Ruled Out Yet) space—where parameters are consistent with the clinical measurements within acceptable uncertainty bounds. 
- Finally, we perform Bayesian inference within this NROY region to estimate the full posterior distribution of the remaining parameters, capturing the most likely values and their associated uncertainty. 

### Global sensitivity analysis

The Naghavi model has 16 parameters, which makes individual patient calibration challenging. To address this, we use an emulator-based global sensitivity analysis to quantify the influence of each parameter on features derived from the left ventricle pressure. This reduces the number of parameters used in model personalization from 16 to 5.

In [None]:
import pandas as pd
import torch

from autoemulate.data.utils import set_random_seed
seed = 42
set_random_seed(seed)


#### Set up simulator and generate data

For this tutorial we use `ModularCirc`, a package that provides a framework for building 0D models and simulating cardiovascular flow and mechanics. The `NaghaviSimulator` simulates pressure traces, and we choose to output summary statistics for each simulated trace.

In [None]:
from cardiac_simulator import NaghaviSimulator

simulator = NaghaviSimulator(
    output_variables=['lv.P'],  # We simulate the left ventricle pressure
    n_cycles=300, 
    dt=0.001,
)

The simulator comes with predefined input parameter ranges.

In [None]:
simulator.parameters_range

We can sample from those using Latin Hypercube Sampling to generate data to train the emulator with.

In [None]:
N_samples = 1024
x = simulator.sample_inputs(N_samples, random_seed=seed)
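
For intuition, Latin Hypercube Sampling can be sketched in a few lines of plain Python. This is an illustrative sketch, not the implementation behind `sample_inputs`; the function `latin_hypercube` and the parameter names and ranges are hypothetical.

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Split each parameter range into n_samples equal strata and use every
    stratum exactly once per parameter."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds.values():
        # One uniformly placed point inside each stratum of the unit interval
        unit = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(unit)  # decouple the stratum order across parameters
        columns.append([low + u * (high - low) for u in unit])
    # Transpose the per-parameter columns into a list of samples
    return [list(row) for row in zip(*columns)]


# Hypothetical parameter names and ranges, for illustration only
bounds = {"resistance": (0.5, 2.0), "compliance": (1.0, 3.0)}
samples = latin_hypercube(8, bounds, seed=42)
```

Compared with plain uniform sampling, this guarantees that every parameter is covered evenly along its own axis, which is useful when the training budget is limited.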

We can now use the simulator to generate predictions for the sampled parameters. Alternatively, for convenience, we can load already simulated data.

In [None]:
import os
save = True

if not os.path.exists(f'simulator_results_{N_samples}.csv'):
    # Run batch simulations with the inputs sampled above
    y, x = simulator.forward_batch_skip_failures(x)
    
    # Convert results to DataFrame for analysis
    results_df = pd.DataFrame(y)
    inputs_df = pd.DataFrame(x)
    
    if save:
        # Save the results to a CSV file
        results_df.to_csv(f'simulator_results_{N_samples}.csv', index=False)
        inputs_df.to_csv(f'simulator_inputs_{N_samples}.csv', index=False)

else:
    # Read the results from the CSV file
    results_df = pd.read_csv(f'simulator_results_{N_samples}.csv')
    inputs_df = pd.read_csv(f'simulator_inputs_{N_samples}.csv')

    y = torch.tensor(results_df.to_numpy())
    x = torch.tensor(inputs_df.to_numpy())

These are the output summary variables we've simulated.

In [None]:
simulator.output_names

#### Train emulator with AutoEmulate
 
To perform sensitivity analysis efficiently, we first need to construct an emulator: a fast surrogate model that approximates the output of the full simulator. The simulated inputs and outputs from the cell above are used to train the emulator. In this case we choose a neural network (MLP).

In [None]:
from autoemulate.core.compare import AutoEmulate

from autoemulate.emulators.nn.mlp import MLP

ae = AutoEmulate(
    x, 
    y, 
    models=[MLP],  
    model_tuning=True
)

In [None]:
ae.summarise()

Extract the best performing emulator.

In [None]:
model = ae.best_result().model

#### Run Sensitivity Analysis 

The emulator trained above can predict model outputs rapidly across the entire parameter space, allowing us to estimate global sensitivity measures like Sobol’ indices or Morris elementary effects without repeatedly calling the full simulator. This approach enables scalable and accurate sensitivity analysis, especially in high-dimensional or computationally intensive settings.

Here we use AutoEmulate to perform sensitivity analysis. 

In [None]:
from autoemulate.core.sensitivity_analysis import SensitivityAnalysis

# Define the problem dictionary for Sobol sensitivity analysis
problem = {
    'num_vars': simulator.in_dim,
    'names': simulator.param_names,
    'bounds': simulator.param_bounds
}

si = SensitivityAnalysis(model, problem=problem)

In [None]:
si_df = si.run(method='sobol')


In [None]:
si.plot_sobol(si_df)

In [None]:
si.plot_sa_heatmap(si_df, index='ST', cmap='coolwarm', normalize=True, figsize=(10, 6))
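
For intuition about what the Sobol indices above measure, here is a minimal, library-free sketch of the pick-freeze Monte Carlo estimators (Saltelli's first-order and Jansen's total-order forms) on a toy function. `sobol_indices` and `toy_model` are hypothetical names, inputs are assumed uniform on [0, 1], and this is not necessarily how AutoEmulate computes the indices internally.

```python
import random


def sobol_indices(model, n_inputs, n_base=20000, seed=0):
    """Estimate first-order (S1) and total-order (ST) Sobol indices
    for inputs that are independent and uniform on [0, 1]."""
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(n_inputs)] for _ in range(n_base)]
    B = [[rng.random() for _ in range(n_inputs)] for _ in range(n_base)]
    f_A = [model(row) for row in A]
    f_B = [model(row) for row in B]

    outputs = f_A + f_B
    mean = sum(outputs) / len(outputs)
    variance = sum((v - mean) ** 2 for v in outputs) / len(outputs)

    first_order, total_order = [], []
    for i in range(n_inputs):
        # Matrix A with column i replaced by column i of B
        f_ABi = [model(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        # Saltelli (2010) first-order estimator; f_B is centred to reduce noise
        v_i = sum((fb - mean) * (fab - fa) for fa, fb, fab in zip(f_A, f_B, f_ABi)) / n_base
        # Jansen total-order estimator
        vt_i = sum((fa - fab) ** 2 for fa, fab in zip(f_A, f_ABi)) / (2 * n_base)
        first_order.append(v_i / variance)
        total_order.append(vt_i / variance)
    return first_order, total_order


def toy_model(x):
    """Additive toy function: the third input has no effect on the output."""
    return x[0] + 2.0 * x[1]


S1, ST = sobol_indices(toy_model, n_inputs=3)
print("S1:", [round(s, 2) for s in S1])
print("ST:", [round(s, 2) for s in ST])
```

For this toy function the analytic first-order indices are 0.2, 0.8 and 0, so the estimates should land close to those values. Each index needs tens of thousands of model evaluations, which is why a fast emulator is used in place of the simulator.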

We can select the top 5 parameters that have the biggest influence on the pressure wave summary statistics extracted from the Naghavi model.

In [None]:
top_parameters_sa = si.top_n_sobol_params(si_df, top_n=5)
top_parameters_sa

The parameters found to be less influential are fixed to the midpoint of their range.

In [None]:
updated_range = {}
for param_name, (min_val, max_val) in simulator.parameters_range.items():
    if param_name not in top_parameters_sa:
        # Non-influential parameter: fix it to the midpoint of its range
        midpoint_value = (max_val + min_val) / 2.0
        print(f"Fixing parameter {param_name} to the midpoint {midpoint_value} of its range ({min_val}, {max_val})")
        updated_range[param_name] = (midpoint_value, midpoint_value)
    else:
        # Influential parameter: keep its original range
        updated_range[param_name] = simulator.parameters_range[param_name]
        

In [None]:
print("Updated parameters range with fixed values for non-sensitive parameters:")
print(updated_range)
simulator.parameters_range = updated_range

### Patient level calibration

To refine our emulator, we need real-world observations to compare against. These observations can come from experiments reported in the literature. 

In this example, we'll generate synthetic "observations" by running the simulator at the midpoint of each parameter range, treating these as our "ground truth" values for calibration. Note that in a real world example one can have multiple observations.

In [None]:
# Calculate midpoint parameters
midpoint_params_patient = []
patient_true_values = {}
for param_name in simulator.parameters_range:
    # Calculate the midpoint of the parameter range
    min_val, max_val = simulator.parameters_range[param_name]
    midpoint_params_patient.append((max_val + min_val) / 2.0)
    patient_true_values[param_name] = midpoint_params_patient[-1]

# Run the simulator with midpoint parameters
midpoint_results = simulator.forward(torch.tensor(midpoint_params_patient).reshape(1, -1))

In [None]:
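# Create observations dictionary: output name -> (observed value, noise level),
# with the noise level set to 1% of the value and floored at 0.01
observations = {
    name: (val.item(), max(abs(val.item()) * 0.01, 0.01))
    for name, val in zip(simulator.output_names, midpoint_results[0])
}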
observations


### History Matching

Once the influential parameters have been selected with sensitivity analysis, we want to find which values of those parameters are consistent with the clinical data for a specific patient. Rather than directly estimating the parameters, history matching first focuses on excluding regions of the parameter space that are not plausible.

AutoEmulate provides a history matching workflow in which the simulator and a fast emulator are used to generate model predictions for many parameter combinations.

For each candidate parameter set $\theta$:

- Compare the model output $f(\theta)$ to the observed data $z$.
- Compute an implausibility measure for each output $i$: $I_i(\theta) = \frac{|z_i - \mathbb{E}[f_i(\theta)]|}{\sqrt{\text{Var}[z_i - \mathbb{E}[f_i(\theta)]]}}$
- Rule out all $\theta$ such that $I(\theta)$ exceeds a threshold (e.g., 3).

Repeat this in waves:

- After each wave, retrain the emulator on the non-implausible region (NROY).
- Stop when the NROY region changes little between waves (e.g., <10% of new points are excluded).
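
The implausibility check in a single wave can be sketched as follows. This is a simplified illustration under stated assumptions rather than AutoEmulate's implementation: the variance in the denominator is taken as the sum of the emulator's predictive variance and the observation variance, a parameter set is ruled out when its largest implausibility across outputs exceeds the threshold, and the output name and all numbers are made up.

```python
import math


def implausibility(pred_mean, pred_var, obs_value, obs_var):
    """|z - E[f(theta)]| scaled by the combined emulator and observation uncertainty."""
    return abs(obs_value - pred_mean) / math.sqrt(pred_var + obs_var)


def is_nroy(predictions, observations, threshold=3.0):
    """Return True if the parameter set is Not Ruled Out Yet.

    predictions:  {output_name: (emulator mean, emulator variance)}
    observations: {output_name: (observed value, observation variance)}
    """
    scores = [
        implausibility(*predictions[name], *observations[name])
        for name in observations
    ]
    # Assumption: combine outputs with the maximum implausibility
    return max(scores) <= threshold


# Hypothetical observation and emulator predictions for three candidate parameter sets
observations = {"p_max": (120.0, 4.0)}
candidates = [
    {"p_max": (118.0, 1.0)},  # close to the observation
    {"p_max": (140.0, 5.0)},  # far from the observation
    {"p_max": (125.0, 0.0)},  # moderately off, emulator certain
]
nroy_mask = [is_nroy(pred, observations) for pred in candidates]
print(nroy_mask)  # [True, False, True]
```

Note that a candidate can survive either because its prediction is close to the observation or because the emulator is still very uncertain there, which is why the emulator is retrained between waves.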


We now need to train a Gaussian Process emulator, since history matching requires uncertainty quantification. Let's start by generating a new dataset that samples only the most sensitive parameters and use it to train the GP emulator.

In [None]:
x = simulator.sample_inputs(N_samples,random_seed=seed)
y, x = simulator.forward_batch_skip_failures(x)


In [None]:
x

In [None]:
from autoemulate.emulators.gaussian_process.kernel import matern_3_2_kernel


ae_hm = AutoEmulate(
    x, 
    y, 
    models=["GaussianProcess"],  
    model_tuning=False,
    model_params = {
        'covar_module': matern_3_2_kernel,
        'standardize_x': True,
        'standardize_y': True
        
    }
)

res = ae_hm.best_result()
gp_matern = res.model

Create a HistoryMatchingWorkflow object.

In [None]:
from autoemulate.calibration.history_matching import HistoryMatchingWorkflow

hmw = HistoryMatchingWorkflow(
    simulator=simulator,
    result=res,
    observations=observations,
    threshold=3.0,
    train_x=x.float(),
    train_y=y.float()
)

Run waves.

In [None]:
# Run the waves and keep the per-wave results
history_matching_results = hmw.run_waves(
    n_waves=6,
    n_simulations=N_samples,
    n_test_samples=2000,
    max_retries=1000,
    # refit_on_all_data=True: refit the emulator on the simulated data from all waves,
    # not only on the data from the most recent wave
    refit_on_all_data=True,
    refit_emulator_on_last_wave=True,
)

In [None]:
import pandas as pd
import numpy as np

all_df = []
sa_parameter_idx = [simulator.get_parameter_idx(param) for param in top_parameters_sa]


for wave_idx, (test_parameters, impl_scores) in enumerate(history_matching_results):
    test_parameters_plausible = hmw.get_nroy(impl_scores,test_parameters) 
    impl_scores_plausible = hmw.get_nroy(impl_scores,impl_scores)
    
    # Create DataFrame
    df = pd.DataFrame(
        test_parameters_plausible[:, sa_parameter_idx], 
        columns=top_parameters_sa
    )
    df["Implausibility"] = impl_scores_plausible.mean(axis=1)
    df["Wave"] = wave_idx

    all_df.append(df)

# Concatenate all waves into a single DataFrame
result_df = pd.concat(all_df, ignore_index=True)


The following figures show the implausibility scores of the parameter combinations that remain in the NROY region (i.e., not ruled out by the observed data), first for wave 0 and then for wave 5. Comparing the two shows how history matching shrinks the plausible region of the parameter space.

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
import warnings

warnings.filterwarnings("ignore", category=FutureWarning, module="seaborn")


# Select the wave to plot (first wave)
wave = 0

df = result_df[result_df['Wave'] == wave]


g = sns.PairGrid(df, vars=top_parameters_sa, corner=True)

# Normalization + colormap for continuous values
norm = Normalize(vmin=df["Implausibility"].min(), vmax=df["Implausibility"].max())
cmap = plt.cm.viridis

def scatter_continuous(x, y, color=None, **kwargs):
    ax = plt.gca()
    sc = ax.scatter(
        x, y,
        c=df.loc[x.index, "Implausibility"],
        cmap=cmap,
        norm=norm,
        s=15,
        alpha=0.7
    )

g.map_lower(scatter_continuous)
g.map_diag(sns.histplot, kde=False, color="gray")
sm = plt.cm.ScalarMappable(cmap=cmap, norm=norm)
sm.set_array([])

plt.colorbar(sm, ax=plt.gcf().axes, shrink=0.7, label="Implausibility")
plt.show()

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
import warnings

warnings.filterwarnings("ignore", category=FutureWarning, module="seaborn")


# Select the wave to plot (last wave)
wave = 5

df = result_df[result_df['Wave'] == wave]


g = sns.PairGrid(df, vars=top_parameters_sa, corner=True)

# Normalization + colormap for continuous values
norm = Normalize(vmin=df["Implausibility"].min(), vmax=df["Implausibility"].max())
cmap = plt.cm.viridis

def scatter_continuous(x, y, color=None, **kwargs):
    ax = plt.gca()
    sc = ax.scatter(
        x, y,
        c=df.loc[x.index, "Implausibility"],
        cmap=cmap,
        norm=norm,
        s=15,
        alpha=0.7
    )

g.map_lower(scatter_continuous)
g.map_diag(sns.histplot, kde=False, color="gray")
sm = plt.cm.ScalarMappable(cmap=cmap, norm=norm)
sm.set_array([])

plt.colorbar(sm, ax=plt.gcf().axes, shrink=0.7, label="Implausibility")
plt.show()

In [None]:
# Look at how the distribution of the NROY points evolves across waves

for param in top_parameters_sa:
    plt.figure(figsize=(8, 5))
    sns.boxplot(data=result_df, x="Wave", y=param)
    plt.title(f"Distribution of {param} by Wave")
    plt.xlabel("Wave")
    plt.ylabel(param)
    plt.tight_layout()
    plt.show()


In [None]:
# Get the results of the last wave
test_parameters, impl_scores = hmw.wave_results[-1]
nroy_points = hmw.get_nroy(impl_scores, test_parameters)  # Implausibility < 3.0
# Get exact min/max bounds for the parameters from the NROY points
params_post_hm = hmw.generate_param_bounds(
    nroy_x=nroy_points, param_names=simulator.param_names, buffer_ratio=0.0
)

In [None]:
for param_name, bounds in params_post_hm.items():
    print(f"Pre HM parameter bounds for {param_name}: {simulator.parameters_range[param_name]}")
    print(f"Post HM parameter bounds for {param_name}: {bounds}")
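
One plausible reading of the bounds computation above is a per-parameter minimum and maximum over the NROY points, optionally widened by a fraction of the range. The sketch below illustrates that reading only; `bounds_from_points` is a hypothetical helper, the interpretation of `buffer_ratio` is an assumption, and the parameter names and values are made up.

```python
def bounds_from_points(points, param_names, buffer_ratio=0.0):
    """Per-parameter (min, max) over the points, widened by buffer_ratio * range."""
    bounds = {}
    for idx, name in enumerate(param_names):
        column = [point[idx] for point in points]
        low, high = min(column), max(column)
        margin = buffer_ratio * (high - low)
        bounds[name] = (low - margin, high + margin)
    return bounds


# Hypothetical NROY points for two parameters
nroy = [[0.9, 2.1], [1.1, 1.8], [1.0, 2.4]]
names = ["resistance", "compliance"]
tight = bounds_from_points(nroy, names)
padded = bounds_from_points(nroy, names, buffer_ratio=0.1)
print(tight)
print(padded)
```

With a buffer of zero the box hugs the sampled NROY points exactly, so a small positive buffer is a way to avoid cutting off plausible values that simply were not sampled.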
    

### Bayesian calibration

With the reduced and plausible parameter space from history matching, we now perform Bayesian inference to estimate the posterior distribution of parameters given patient data. We apply the following steps:

- Define a prior over parameters using the NROY region from history matching.

- Define a likelihood function that compares model predictions to patient data, including observation and model error.

- Use a Bayesian method (MCMC) to sample from the posterior.
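
These three steps can be sketched with a plain random-walk Metropolis sampler. This is a deliberately simple stand-in for illustration, not the NUTS sampler used in the cell that follows: the prior is assumed uniform over the parameter bounds, the likelihood is assumed Gaussian with a known noise standard deviation, and `toy_emulator` together with all numbers is hypothetical.

```python
import math
import random


def log_posterior(theta, bounds, emulator, observations):
    """Uniform prior over the bounds plus a Gaussian log-likelihood (up to a constant)."""
    for value, (low, high) in zip(theta, bounds):
        if not low <= value <= high:
            return -math.inf  # outside the prior support
    predictions = emulator(theta)
    return sum(
        -0.5 * ((obs - pred) / noise_sd) ** 2
        for pred, (obs, noise_sd) in zip(predictions, observations)
    )


def metropolis(bounds, emulator, observations, n_samples=5000, warmup=1000, step=0.03, seed=0):
    """Random-walk Metropolis; the proposal width is a fraction of each parameter range."""
    rng = random.Random(seed)
    theta = [(low + high) / 2 for low, high in bounds]
    current = log_posterior(theta, bounds, emulator, observations)
    samples = []
    for iteration in range(warmup + n_samples):
        proposal = [
            value + rng.gauss(0.0, step * (high - low))
            for value, (low, high) in zip(theta, bounds)
        ]
        candidate = log_posterior(proposal, bounds, emulator, observations)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        if iteration >= warmup:
            samples.append(list(theta))
    return samples


def toy_emulator(theta):
    """Hypothetical stand-in for a trained emulator: two outputs, linear in the inputs."""
    return [100.0 * theta[0], 50.0 * theta[1]]


bounds = [(0.5, 1.5), (1.0, 3.0)]            # e.g. a box around the NROY region
observations = [(110.0, 2.0), (100.0, 2.0)]  # (observed value, noise standard deviation)
samples = metropolis(bounds, toy_emulator, observations)
posterior_mean = [sum(s[i] for s in samples) / len(samples) for i in range(2)]
print(posterior_mean)
```

Gradient-based samplers such as NUTS typically explore the posterior more efficiently than a random walk, especially as the number of calibrated parameters grows.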



In [None]:
from autoemulate.calibration.bayes import BayesianCalibration

model_post_hm = hmw.emulator  # Use the emulator from history matching

bc = BayesianCalibration(
    emulator=model_post_hm,
    parameter_range=params_post_hm,
    observations = {k: torch.tensor(v[0]) for k,v in observations.items()},
    observation_noise={k: v[1] for k,v in observations.items()},
    calibration_params = top_parameters_sa
)

mcmc = bc.run_mcmc(warmup_steps=250, num_samples=1000, sampler='nuts')


In [None]:
mcmc.summary()


We can check if the posterior samples are consistent with the true values of the parameters.

In [None]:
print(patient_true_values)


In [None]:
import arviz as az
import matplotlib.pyplot as plt
idata = bc.to_arviz(mcmc)

# Reference values: the true patient parameter values, as a list of floats
# in the same order as top_parameters_sa
ref_val = [float(patient_true_values[param]) for param in top_parameters_sa]

az.plot_posterior(
    idata, 
    var_names=top_parameters_sa, 
    kind='hist', 
    figsize=(10, 6), 
    ref_val=ref_val
)
plt.tight_layout()
plt.show()
