sequentPSS

sequential parameter space search method based on sensitivity analysis


sequentPSS / version 0.1.3

  • install:
!pip install sequentPSS

Usage (using the sample simulation in the library)

1. Preprocessing

The SPS algorithm consists of a preprocessing stage and a sequential calibration stage, with an optional validation stage. The k parameters are denoted as X, and the d outcomes as Y. The mathematical representation is:

$$ X = \{X_1, X_2, \cdots, X_i, \cdots, X_k\} \in \mathbb{R}^k $$

$$ Y = \{Y_1, Y_2, \cdots, Y_j, \cdots, Y_d\} \in \mathbb{R}^d $$

Each parameter X takes a parameter value x within its parameter space.

1.1 setting parameters and hyperparameters

import sequentPSS as sqp

# set parameter spaces
x1_list = [1,2,3,4,5]
x2_list = [1,2,3,4,5]
x3_list = [1,2,3,4,5]

# set hyper parameters
M = 150
k = 3

# --- run simulations M(2k+2) times with random parameter values ---
multi_simul_df = sqp.multiple_simple_simulation(x1_list, x2_list, x3_list, M, k)

multi_simul_df.head()


Here's the DataFrame representing the simulation results with three parameters (x1, x2, x3) and three simulation outcomes (y1, y2, y3)
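
The sampling stage can be sketched as follows. This is a minimal illustration, not the library's internal code: `toy_simulation` is a hypothetical stand-in for the bundled sample model, and the loop simply draws each parameter uniformly from its list, runs the model, and records the parameters alongside the three outcomes.

```python
import random
import pandas as pd

def toy_simulation(x1, x2, x3):
    # hypothetical stand-in for the sample model: returns three outcomes
    return x1 * x2 + x3, x1 - x3, x2 * x3

def random_sampling_runs(x1_list, x2_list, x3_list, M, k):
    rows = []
    for _ in range(M * (2 * k + 2)):  # M(2k+2) iterations, as in the text
        x1 = random.choice(x1_list)   # uniform draw from the parameter space
        x2 = random.choice(x2_list)
        x3 = random.choice(x3_list)
        y1, y2, y3 = toy_simulation(x1, x2, x3)
        rows.append({'x1': x1, 'x2': x2, 'x3': x3, 'y1': y1, 'y2': y2, 'y3': y3})
    return pd.DataFrame(rows)

df = random_sampling_runs([1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5], M=150, k=3)
```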

1.2 determining rmse_sel for calibration

Algorithm 1. Preprocessing (1): Determining a Criterion for Calibration


In the preprocessing step, the criterion for calibration, RMSEsel, is determined as illustrated in Algorithm 1. In process (1), a parameter value x is drawn uniformly at random for each Xi; these values are combined, and RMSEtem is computed in each iteration. This procedure repeats for M(2k+2) iterations.

RMSE, a widely used metric for model calibration, is employed here to assess the discrepancy between simulated outcomes and observed data. The threshold RMSEsel is set for each Yj from the distribution of RMSE values across all sampled parameter combinations. The leniency index μ lets users control the calibration rigor: with μ = 0.1, for instance, the boundary of the lower 10% of all RMSE values becomes the RMSEsel criterion. Setting μ too low risks overfitting, while setting it too high introduces undue uncertainty.
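
This thresholding rule can be sketched as follows (a minimal sketch, assuming the per-run RMSE values are already computed; `np.quantile` stands in for however `prep1_criterion` extracts the lower-μ cut-off internally):

```python
import numpy as np

def rmse(simulated, observed):
    # root-mean-square error between simulated outcomes and observed data
    s, o = np.asarray(simulated, float), np.asarray(observed, float)
    return float(np.sqrt(np.mean((s - o) ** 2)))

def select_rmse_sel(rmse_values, u=0.1):
    # the boundary of the lower u-fraction of all RMSE values becomes RMSE_sel
    return float(np.quantile(rmse_values, u))

rmse_values = [5.0, 3.0, 8.0, 1.0, 9.0, 2.0, 7.0, 4.0, 6.0, 10.0, 11.0]
threshold = select_rmse_sel(rmse_values, u=0.1)
```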
# --- preprocessing 1: determining a criterion for calibration ---

O_list = [sqp.O1, sqp.O2, sqp.O3] # observed data, as a list
u = 0.1
rmse_sel_df, multi_simul_df_rmse_sel = sqp.prep1_criterion(O_list, multi_simul_df, u, k)

# now, we have the rmse_sel for all O (observed data O1, O2, O3 corresponding to y1, y2, y3).
rmse_sel_df


1.3 sorting Y and X for calibration

Algorithm 2. preprocessing (2): sorting X and Y


Algorithm 2 details the procedure for ordering j and i before calibration, reusing the simulations generated in Algorithm 1 in processes (2) and (3).
In process (2), c(Yj) is the proportion of cases, out of all n cases, in which RMSEtem is smaller than RMSEsel. A high c(Yj) indicates a broad parameter space suitable for calibration, so j is arranged in descending order of c(Yj) for the subsequent calibration phases.
In process (3), the first-order sensitivity index of each Xi with respect to Yj (denoted Sji) is sorted in descending order. This index gauges how much Xi alone contributes to the variance of Yj. A parameter with a very low sensitivity index has little influence on the outcome variance and can be skipped during calibration; calibration therefore begins with the most influential parameters.
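
The ordering in process (2) can be sketched as follows, assuming a DataFrame with one per-run RMSE column per outcome (the column names here are hypothetical). The X ordering in process (3) comes from a first-order sensitivity index such as RBD-FAST, as the library call below shows.

```python
import pandas as pd

def sort_outcomes_by_coverage(df, rmse_sel):
    # c(Y_j): share of runs whose RMSE for outcome j falls below RMSE_sel;
    # outcomes with a broader calibratable space come first
    coverage = {y: float((df[y] < rmse_sel[y]).mean()) for y in rmse_sel}
    order = sorted(coverage, key=coverage.get, reverse=True)
    return order, coverage

# hypothetical per-run RMSE columns for three outcomes
runs = pd.DataFrame({
    'rmse_y1': [0.1, 0.2, 0.9, 0.3],
    'rmse_y2': [0.9, 0.8, 0.7, 0.9],
    'rmse_y3': [0.1, 0.9, 0.9, 0.9],
})
sel = {'rmse_y1': 0.5, 'rmse_y2': 0.5, 'rmse_y3': 0.5}
order, cov = sort_outcomes_by_coverage(runs, sel)
```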
# --- preprocessing 2: sorting Y for calibration ---

y_seq_df = sqp.sorting_Y(multi_simul_df_rmse_sel)
y_seq_df


# --- preprocessing 3: sorting X based on sensitivity analysis for calibration ---
problem = {
    'num_vars': 3,
    'names': ['x1', 'x2', 'x3'],
    'bounds': [[1, 5],
               [1, 5],
               [1, 5]]
}

x_seq_df = sqp.sorting_X(problem, multi_simul_df_rmse_sel, SA = 'RBD-FAST') 
x_seq_df


Now that we have rmse_sel and the sorted y and x sequences, we can run the sequential calibration.

2. sequential calibration

Algorithm 3. parameter space searching and calibration for each yj


Figure 1. parameter space search and calibration process for each yj


Algorithm 3 details the sequential calibration using the ordered Y and X values from Algorithm 2. In process (4), γ stores a selected parameter combination: a fixed value v is chosen from X1, and the remaining parameters are drawn at random from their respective sets. If the RMSE of the combination γ falls below the RMSEsel threshold, γ is added to C and its RMSE is recorded in R. This iterates M times.
Subsequently, the parameter space of X1 is reduced by eliminating v if too few combinations involving v meet the RMSEsel threshold: if v occurs in C fewer than τ * M times, v is removed from X1. The user-defined tolerance index τ dictates the intensity of the reduction. A high τ leads to aggressive parameter space reduction and stringent calibration, while a low τ retains more values of v in C, making the reduction inefficient. The process then repeats for X2 using the already reduced X1 space, ultimately yielding the condensed parameter spaces for the X sets, as displayed in Figure 1. If additional Y outcomes remain, the loop restarts with the shrunk X parameter spaces.
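
The τ-based reduction rule can be sketched as follows (a minimal sketch, not the library's function; `accepted` stands for the occurrences of the fixed parameter's values in the accepted set C):

```python
from collections import Counter

def reduce_parameter_space(space, accepted_values, M, tau):
    # drop a value v from the space when it appears in the accepted set C
    # fewer than tau * M times
    counts = Counter(accepted_values)
    return [v for v in space if counts[v] >= tau * M]

# hypothetical occurrences of the fixed parameter's values in C over M = 10 runs
accepted = [3, 3, 4, 4, 4, 5, 5, 5, 5, 2]
reduced = reduce_parameter_space([1, 2, 3, 4, 5], accepted, M=10, tau=0.2)
```

With τ = 0.2 and M = 10, a value must appear in C at least twice to survive, so 1 (never accepted) and 2 (accepted once) are dropped.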
Equation 1. uncertainty of the calibrated parameter combination for each yj:

$$ U = 1 - \frac{|C|}{|R|} $$

After completing sequential calibration for all Y, the refined parameter combinations are the final output. The uncertainty index U helps identify the optimal parameter set, with lower values indicating higher trustworthiness. Here R encompasses all RMSE outcomes, while C captures only those RMSEs falling beneath RMSEsel. The ratio of C to R thus reflects the reliability of each parameter set, and subtracting this ratio from 1 yields the uncertainty U. For instance, if a parameter set yields an acceptable RMSE in 7 out of 10 runs, there is 70% confidence in that result and an uncertainty of 0.3 for that combination.
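
The uncertainty computation reduces to a one-liner (a sketch of the ratio described above, not the library's function):

```python
def uncertainty(n_accepted, n_total):
    # U = 1 - |C| / |R|: the share of runs whose RMSE missed the RMSE_sel threshold
    return 1.0 - n_accepted / n_total

u_value = uncertainty(7, 10)  # 7 acceptable runs out of 10 -> uncertainty 0.3
```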

2.1 round 1: calibrate parameters with y1

# -- now we need to run sequential calibration with the previous sequence of y and x (y1 -> y3 -> y2 / x3 -> x2 -> x1) --
# First round of y1: fix x3
x1_list = [1,2,3,4,5]
x2_list = [1,2,3,4,5]
x3_list = [1,2,3,4,5]

fix_x3_y1_simul_result_df = sqp.fix_param_simple_simulation(x1_list, x2_list, x3_list, fix_x = 'x3', M = 100) # fix x3: fix each x3 value one by one and run the simulation 100 times

x3_list, result_df = sqp.seqCalibration(fix_x = 'x3', fix_y = 'y1', rmse_sel = 401.295316, simul_result_df = fix_x3_y1_simul_result_df,  O_list = O_list, t = 0.2, df_return = True)

print('updated x3 parameter space:', x3_list)
reliability of 'x3' for 'y1' (1 - uncertainty degree):  {3: 0.59, 4: 0.91, 5: 1.0}
updated x3 parameter space: [3, 4, 5]
Sequential calibration is conducted in the order of the sorted y and x values. The first step fixes x3 and calibrates with y1, using the rmse_sel value corresponding to y1 and its matching observed data O1 (401.295316), along with the tolerance index t.

# Second round of y1: fix x2

fix_x2_y1_simul_result_df = sqp.fix_param_simple_simulation(x1_list, x2_list, x3_list, fix_x = 'x2', M = 100) # fix x2: fix each x2 value one by one and run the simulation 100 times

x2_list, result_df = sqp.seqCalibration(fix_x = 'x2', fix_y = 'y1', rmse_sel = 401.295316, simul_result_df = fix_x2_y1_simul_result_df,  O_list = O_list, t = 0.2, df_return = True)

print('updated x2 parameter space:', x2_list)
reliability of 'x2' for 'y1' (1 - uncertainty degree):  {1: 0.93, 2: 0.88, 3: 0.79, 4: 0.79, 5: 0.58}
updated x2 parameter space: [1, 2, 3, 4, 5]

# Third round of y1: fix x1

fix_x1_y1_simul_result_df = sqp.fix_param_simple_simulation(x1_list, x2_list, x3_list, fix_x = 'x1', M = 100) # fix x1: fix each x1 value one by one and run the simulation 100 times

x1_list, result_df = sqp.seqCalibration(fix_x = 'x1', fix_y = 'y1', rmse_sel = 401.295316, simul_result_df = fix_x1_y1_simul_result_df,  O_list = O_list, t = 0.2, df_return = True)

print('updated x1 parameter space:', x1_list)
reliability of 'x1' for 'y1' (1 - uncertainty degree):  {1: 0.726, 2: 0.869, 4: 0.909, 3: 0.85, 5: 0.729}
updated x1 parameter space: [1, 2, 3, 4, 5]

2.2 round 2: calibrate parameters with y3

# First round of y3: fix x3

fix_x3_y3_simul_result_df = sqp.fix_param_simple_simulation(x1_list, x2_list, x3_list, fix_x = 'x3', M = 100) # fix x3: fix each x3 value one by one and run the simulation 100 times

x3_list, result_df = sqp.seqCalibration(fix_x = 'x3', fix_y = 'y3', rmse_sel = 3.176924, simul_result_df = fix_x3_y3_simul_result_df,  O_list = O_list, t = 0.2, df_return = True)

print('updated x3 parameter space:', x3_list)
reliability of 'x3' for 'y3' (1 - uncertainty degree):  {4: 0.41, 5: 0.62}
updated x3 parameter space: [4, 5]

# second round of y3: fix x2

fix_x2_y3_simul_result_df = sqp.fix_param_simple_simulation(x1_list, x2_list, x3_list, fix_x = 'x2', M = 100) # fix x2: fix each x2 value one by one and run the simulation 100 times

x2_list, result_df = sqp.seqCalibration(fix_x = 'x2', fix_y = 'y3', rmse_sel = 3.176924, simul_result_df = fix_x2_y3_simul_result_df,  O_list = O_list, t = 0.2, df_return = True)

print('updated x2 parameter space:', x2_list)
reliability of 'x2' for 'y3' (1 - uncertainty degree):  {1: 0.689, 5: 0.531, 4: 0.515, 3: 0.657, 2: 0.478}
updated x2 parameter space: [1, 2, 3, 4, 5]

# Third round of y3: fix x1

fix_x1_y3_simul_result_df = sqp.fix_param_simple_simulation(x1_list, x2_list, x3_list, fix_x = 'x1', M = 100) # fix x1: fix each x1 value one by one and run the simulation 100 times

x1_list, result_df = sqp.seqCalibration(fix_x = 'x1', fix_y = 'y3', rmse_sel = 3.176924, simul_result_df = fix_x1_y3_simul_result_df,  O_list = O_list, t = 0.2, df_return = True)

print('updated x1 parameter space:', x1_list)
reliability of 'x1' for 'y3' (1 - uncertainty degree):  {1: 0.67, 2: 0.43, 3: 0.68, 4: 0.65, 5: 0.56}
updated x1 parameter space: [1, 2, 3, 4, 5]

2.3 round 3: calibrate parameters with y2

# First round of y2: fix x3

fix_x3_y2_simul_result_df = sqp.fix_param_simple_simulation(x1_list, x2_list, x3_list, fix_x = 'x3', M = 100) # fix x3: fix each x3 value one by one and run the simulation 100 times

x3_list, result_df = sqp.seqCalibration(fix_x = 'x3', fix_y = 'y2', rmse_sel = 50.487752, simul_result_df = fix_x3_y2_simul_result_df,  O_list = O_list, t = 0.2, df_return = True)

print('updated x3 parameter space:', x3_list)
reliability of 'x3' for 'y2' (1 - uncertainty degree):  {5: 0.678, 4: 0.429}
updated x3 parameter space: [4, 5]

# second round of y2: fix x2

fix_x2_y2_simul_result_df = sqp.fix_param_simple_simulation(x1_list, x2_list, x3_list, fix_x = 'x2', M = 100) # fix x2: fix each x2 value one by one and run the simulation 100 times

x2_list, result_df = sqp.seqCalibration(fix_x = 'x2', fix_y = 'y2', rmse_sel = 50.487752, simul_result_df = fix_x2_y2_simul_result_df,  O_list = O_list, t = 0.2, df_return = True)

print('updated x2 parameter space:', x2_list)
reliability of 'x2' for 'y2' (1 - uncertainty degree):  {3: 0.25, 1: 0.421, 4: 0.396, 2: 0.333}
updated x2 parameter space: [1, 2, 3, 4]

# Third round of y2: fix x1

fix_x1_y2_simul_result_df = sqp.fix_param_simple_simulation(x1_list, x2_list, x3_list, fix_x = 'x1', M = 100) # fix x1: fix each x1 value one by one and run the simulation 100 times

x1_list, result_df = sqp.seqCalibration(fix_x = 'x1', fix_y = 'y2', rmse_sel = 50.487752, simul_result_df = fix_x1_y2_simul_result_df,  O_list = O_list, t = 0.2, df_return = True)

print('updated x1 parameter space:', x1_list)
reliability of 'x1' for 'y2' (1 - uncertainty degree):  {3: 0.443, 1: 0.34, 2: 0.634, 4: 0.541}
updated x1 parameter space: [1, 2, 3, 4]

The calibration results are as follows:

  1. Calibration based on y1 in round 1 led to the following outcomes:
  • x1: [1,2,3,4,5] -> [1,2,3,4,5]
  • x2: [1,2,3,4,5] -> [1,2,3,4,5]
  • x3: [1,2,3,4,5] -> [3,4,5]
  2. Calibration based on y3 in round 2 led to the following outcomes:
  • x1: [1,2,3,4,5] -> [1,2,3,4,5]
  • x2: [1,2,3,4,5] -> [1,2,3,4,5]
  • x3: [3,4,5] -> [4,5]
  3. Calibration based on y2 in round 3 led to the following outcomes:
  • x1: [1,2,3,4,5] -> [1,2,3,4]
  • x2: [1,2,3,4,5] -> [1,2,3,4]
  • x3: [4,5] -> [4,5]
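
Per the final "updated parameter space" outputs above (x1 = [1, 2, 3, 4], x2 = [1, 2, 3, 4], x3 = [4, 5]), the surviving candidate combinations are simply the Cartesian product of the reduced spaces, from which the lowest-uncertainty set can be chosen:

```python
from itertools import product

# final reduced parameter spaces taken from the transcript above
x1_space = [1, 2, 3, 4]
x2_space = [1, 2, 3, 4]
x3_space = [4, 5]

# every remaining candidate combination after the three calibration rounds
final_combinations = list(product(x1_space, x2_space, x3_space))
```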

Related Document:

Moongi Choi, Andrew Crooks, Neng Wan, Simon Brewer, Thomas J. Cova & Alexander Hohl (2024) Addressing equifinality in agent-based modeling: a sequential parameter space search method based on sensitivity analysis, International Journal of Geographical Information Science, DOI: 10.1080/13658816.2024.2331536

Author