Surrogate Library

The SGTELIB library is a dynamic surrogate modelling library. It is used in the Search step of Mads to dynamically construct models from the previous evaluations. During a Search step that uses SGTELIB, models of the objective and the constraints are constructed and a surrogate subproblem involving these models is optimized. The resulting solutions are the next candidates for evaluation by the true problem.

Models from the SGTELIB library can be used by setting the parameter SGTELIB_MODEL_SEARCH to yes or true.

Models

Models in SGTELIB are defined by a succession of field names and field values. To choose a model, use the parameter SGTELIB_MODEL_DEFINITION followed by the field name TYPE and then the model type. The subsequent fields define the settings of the model. Each field name is a single word, and each field value is a single word or a numerical value.

Example : SGTELIB_MODEL_DEFINITION TYPE <model type> FIELD1 <field 1 value> FIELD2 <field 2 value>
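The name/value structure described above can be illustrated with a short Python sketch. This is not part of SGTELIB or NOMAD: the function `parse_model_definition` is hypothetical, and upper-casing the field names is an assumption made only for convenient lookups.

```python
from typing import Dict


def parse_model_definition(definition: str) -> Dict[str, str]:
    """Split a model definition into {FIELD_NAME: field_value} pairs.

    Hypothetical helper: it only mirrors the documented structure
    (single-word names followed by single-word values).
    """
    tokens = definition.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected name/value pairs, got an odd number of words")
    # Even positions hold field names, odd positions hold field values.
    fields = {name.upper(): value for name, value in zip(tokens[0::2], tokens[1::2])}
    if "TYPE" not in fields:
        raise ValueError("a model definition must include the TYPE field")
    return fields


fields = parse_model_definition("TYPE PRS DEGREE OPTIM RIDGE 0.001")
print(fields["TYPE"], fields["DEGREE"], fields["RIDGE"])
```

Field values are kept as strings here because a value such as OPTIM and a numerical value such as 0.001 can appear in the same position.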

The section below describes the models and settings available.

Types of models

Below is the list of all possible models and their authorized fields.

`PRS`

PRS (Polynomial Response Surface) is a type of model.
Authorized fields:

degree (Can be optimized)
ridge (Can be optimized)
budget: Defines the budget allocated for parameter optimization.
output: Defines the output text file.

Examples:
TYPE PRS DEGREE 2
TYPE PRS DEGREE OPTIM RIDGE OPTIM

`PRS_EDGE`

PRS_EDGE (Polynomial Response Surface EDGE) is a type of model that can represent discontinuities at 0 by using additional basis functions.
Authorized fields:

degree (Can be optimized)
ridge (Can be optimized)
budget: Defines the budget allocated for parameter optimization.
output: Defines the output text file.

Examples:
TYPE PRS_EDGE DEGREE 2
TYPE PRS_EDGE DEGREE OPTIM RIDGE OPTIM

`PRS_CAT`

PRS_CAT (Categorical Polynomial Response Surface) is a type of model that builds one PRS model for each distinct value of the first component of x.
Authorized fields:

degree (Can be optimized)
ridge (Can be optimized)
budget: Defines the budget allocated for parameter optimization.
output: Defines the output text file.

Example:
TYPE PRS_CAT DEGREE 2
TYPE PRS_CAT DEGREE OPTIM RIDGE OPTIM

`RBF`

RBF (Radial Basis Function) is a type of model.
Authorized fields:

kernel_type (Can be optimized)
kernel_shape (Can be optimized)
distance_type (Can be optimized)
ridge (Can be optimized)
preset: Defines the type of RBF model used.
budget: Defines the budget allocated for parameter optimization.
output: Defines the output text file.

Example:
TYPE RBF KERNEL_TYPE D1 KERNEL_SHAPE OPTIM DISTANCE_TYPE NORM2

`KS`

KS (Kernel Smoothing) is a type of model.
Authorized fields:

kernel_type (Can be optimized)
kernel_shape (Can be optimized)
distance_type (Can be optimized)
budget: Defines the budget allocated for parameter optimization.
output: Defines the output text file.

Example:
TYPE KS KERNEL_TYPE OPTIM KERNEL_SHAPE OPTIM

`KRIGING`

KRIGING is a type of model.
Authorized fields:

ridge (Can be optimized)
distance_type (Can be optimized)
budget: Defines the budget allocated for parameter optimization.
output: Defines the output text file.

Example:
TYPE KRIGING

`LOWESS`

LOWESS (Locally Weighted Regression) is a type of model (from [TaAuKoLed2016]).
Authorized fields:

degree: Must be 1 (default) or 2 (Can be optimized).
ridge (Can be optimized)
kernel_type (Can be optimized)
kernel_shape (Can be optimized)
distance_type (Can be optimized)
preset: Defines how the weight of each data point is computed.
budget: Defines the budget allocated for parameter optimization.
output: Defines the output text file.

Example:
TYPE LOWESS DEGREE 1
TYPE LOWESS DEGREE OPTIM KERNEL_SHAPE OPTIM KERNEL_TYPE D1
TYPE LOWESS DEGREE OPTIM KERNEL_SHAPE OPTIM KERNEL_TYPE OPTIM DISTANCE_TYPE OPTIM

`CN`

CN (Closest Neighbours) is a type of model.
Authorized fields:

distance_type (Can be optimized)
budget: Defines the budget allocated for parameter optimization.
output: Defines the output text file.

Example:
TYPE CN

`ENSEMBLE`

ENSEMBLE is a type of model that uses multiple models simultaneously.
Authorized fields:

weight: Defines how the ensemble weights are computed.
metric: Defines which metric is used to compute the weights.
distance_type: This parameter is transferred to the models contained in the Ensemble.
preset: Defines the selection of models in the ensemble.
budget: Defines the budget allocated for parameter optimization.
output: Defines the output text file.

Example:
TYPE ENSEMBLE WEIGHT SELECT METRIC OECV
TYPE ENSEMBLE WEIGHT OPTIM METRIC RMSECV DISTANCE_TYPE NORM2 BUDGET 100

`ENSEMBLE_STAT`

ENSEMBLE_STAT is a type of model (from [AuLedSa2021]).
Authorized fields:

all the fields from ensemble (with different default values though).
uncertainty: Selects an alternative for the uncertainty (smooth or nonsmooth).
size_param: Defines the size parameter (different meaning depending on the value of UNCERTAINTY).
sigma_mult: Defines the scaling factor of the uncertainty.
lambda_p: Defines the shape parameter of the probability of feasibility.
lambda_pi: Defines the shape parameter of the probability of improvement.

Example:
TYPE ENSEMBLE_STAT UNCERTAINTY SMOOTH WEIGHT SELECT5 METRIC RMSECV SIZE_PARAM 15

The following table summarizes the possible fields for every model.

Model authorized fields

Model type	`degree`	`ridge`	`kernel_type`	`kernel_shape`	`distance_type`	`preset`	`weight`	`metric`	`uncertainty`	`budget`	`output`
`prs`	✔	✔								✔	✔
`prs_edge`	✔	✔								✔	✔
`prs_cat`	✔	✔								✔	✔
`rbf`		✔	✔	✔	✔	✔				✔	✔
`ks`			✔	✔	✔					✔	✔
`kriging`		✔			✔					✔	✔
`lowess`	✔	✔	✔	✔	✔	✔				✔	✔
`cn`					✔					✔	✔
`ensemble`					✔	✔	✔	✔		✔	✔
`ensemble_stat`					✔	✔	✔	✔	✔	✔	✔

Main model parameters

Below is the list of fields and their descriptions.

`DEGREE`

The field name DEGREE defines the degree of a polynomial response surface. The value must be an integer ≥ 1.
Allowed for models of type: prs, prs_edge, prs_cat and lowess.
Default value: 5

For PRS models, the default degree is 2.
For LOWESS models, the degree must be 1 (default) or 2.

Example:
TYPE PRS DEGREE 3 defines a PRS model of degree 3.
TYPE PRS_EDGE DEGREE 2 defines a PRS_EDGE model of degree 2.
TYPE LOWESS DEGREE OPTIM defines a LOWESS model where the degree is optimized.

`RIDGE`

The field name RIDGE defines the regularization parameter of the model.
Allowed for models of type: prs, prs_edge, prs_cat, rbf, kriging and lowess.
Possible values: Real value ≥ 0. Recommended values are 0 and 0.001.
Default value: 0.001.

Example:
TYPE PRS DEGREE 3 RIDGE 0 defines a PRS model of degree 3 with no ridge.
TYPE PRS DEGREE OPTIM RIDGE OPTIM defines a PRS model where the degree and ridge coefficient are optimized.

`KERNEL_TYPE`

The field name KERNEL_TYPE defines the type of kernel used in the model. The field name KERNEL is equivalent.
Allowed for models of type: rbf, lowess and ks.
Possible values:

D1: Gaussian kernel
D2: Inverse Quadratic Kernel
D3: Inverse Multiquadratic Kernel
D4: Bi-quadratic Kernel
D5: Tri-cubic Kernel
D6: Exponential Sqrt Kernel
D7: Epanechnikov Kernel
I0: Multiquadratic Kernel
I1: Polyharmonic splines, degree 1
I2: Polyharmonic splines, degree 2
I3: Polyharmonic splines, degree 3
I4: Polyharmonic splines, degree 4
OPTIM: The type of kernel is optimized

Default value: D1, except for RBF models where it is I2.

Example:
TYPE KS KERNEL_TYPE D2 defines a KS model with Inverse Quadratic Kernel.
TYPE KS KERNEL_TYPE OPTIM KERNEL_SHAPE OPTIM defines a KS model with optimized kernel shape and type.

`KERNEL_SHAPE`

The field name KERNEL_SHAPE defines the shape coefficient of the kernel function. The field name KERNEL_COEF is equivalent. Note that this field name has no impact for kernel types I1, I2, I3 and I4 because these kernels do not include a shape parameter.
Allowed for models of type: rbf, ks and lowess.
Possible values: Real value ≥ 0. Recommended range is [0.1; 10]. For KS and LOWESS models, small values lead to smoother models.
Default value: By default, the kernel coefficient is optimized.

Example:
TYPE RBF KERNEL_TYPE D4 KERNEL_SHAPE 10 defines an RBF model with a bi-quadratic kernel of shape coefficient 10.
TYPE KS KERNEL_TYPE OPTIM KERNEL_SHAPE OPTIM defines a KS model with optimized kernel shape and type.

`DISTANCE_TYPE`

The field name DISTANCE_TYPE defines the distance function used in the model.
Allowed for models of type: rbf, ks, kriging, lowess, cn, ensemble and ensemble_stat.
Possible values:

NORM1: Distance based on norm 1
NORM2: Euclidean distance (norm 2)
NORMINF: Distance based on norm infinity
NORM2_IS0: Tailored distance for discontinuity in 0
NORM2_CAT: Tailored distance for categorical models

Default value: NORM2.

Example:
TYPE KS DISTANCE_TYPE NORM2_IS0 defines a KS model tailored for VAN optimization.
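As an illustration of the three standard distance types, here is a minimal Python sketch that takes NORM1, NORM2 and NORMINF to mean the 1-norm, the Euclidean norm and the infinity norm of the difference between two points. The function `distance` is hypothetical; the library may scale the inputs before measuring distances, and the tailored types NORM2_IS0 and NORM2_CAT are not covered.

```python
import math
from typing import Sequence


def distance(x: Sequence[float], y: Sequence[float], distance_type: str = "NORM2") -> float:
    """Distance between two points for the three standard norm-based types."""
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if distance_type == "NORM1":
        return float(sum(diffs))  # sum of absolute differences
    if distance_type == "NORM2":
        return math.sqrt(sum(d * d for d in diffs))  # Euclidean distance
    if distance_type == "NORMINF":
        return float(max(diffs))  # largest absolute difference
    raise ValueError("unsupported distance type in this sketch: " + distance_type)


for name in ("NORM1", "NORM2", "NORMINF"):
    print(name, distance([0.0, 0.0], [3.0, 4.0], name))
```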

`PRESET`

The field name PRESET defines the type of model used when applicable.
Allowed for models of type: rbf, lowess, ensemble and ensemble_stat.

When applied to rbf models, PRESET defines the type of RBF.
Possible values:
- O: RBF with linear terms and orthogonal constraints
- R: RBF with linear terms and regularization term
- I: RBF with incomplete set of basis functions (see [AuKoLedTa2016] for RBFI models)
Default value: I.

Example:
TYPE RBF PRESET O

When applied to lowess models [TaAuKoLed2016], PRESET defines how the weight w_i of each data point x_i is computed.
Possible values:
- D: w_i = ϕ(d_i) where ϕ is the kernel of type and shape defined by the fields kernel_type and kernel_shape, respectively, and d_i is the distance between the prediction point and the data point x_i
- DEN: w_i = ϕ(d_i/d_q) where d_q is the distance between the prediction point and the q^th closest data point, and d_q is computed with an empirical method
- DGN: w_i = ϕ(d_i/d_q) where d_q is computed with the Gamma method
- RE: w_i = ϕ(r_i) where r_i is the rank of x_i in terms of distance to the prediction point, and r_i is computed with an empirical method
- RG: w_i = ϕ(r_i) where r_i is computed with the Gamma method
- REN: same as RE but the ranks are normalized in [0, 1]
- RGN: same as RG but the ranks are normalized in [0, 1]
Default value: DGN.

Example:
TYPE LOWESS PRESET RE

When applied to ensemble or ensemble_stat models, PRESET determines the selection of models in the ensemble.
Possible values:
- DEFAULT: selection of 18 models of types prs, ks, rbf and cn with various settings
- KS: selection of 7 models of type ks with various kernel shapes
- PRS: selection of 7 models of type prs with various degrees
- IS0: selection of 30 models of type prs_edge, ks, rbf with various settings and DISTANCE_TYPE set to NORM2_IS0
- CAT: selection of 30 models of type prs_edge, ks, rbf with various settings and DISTANCE_TYPE set to NORM2_CAT
- SUPER1: selection of 4 models of types prs, ks, rbf and lowess
- SMALL: selection of 3 models of types prs, ks and rbf
Default value: DEFAULT.

Example:
TYPE ENSEMBLE PRESET SUPER1

`WEIGHT`

The field name WEIGHT defines the method used to compute the weights w of the ensemble of models. The field name WEIGHT_TYPE is equivalent.
Allowed for models of type: ensemble and ensemble_stat.
Possible values:

WTA1: w_k ∝ ℰ_sum − ℰ_k
WTA3: w_k ∝ (ℰ_k + αℰ_mean)^β
SELECT: w_k ∝ 1 if ℰ_k = ℰ_min (only the best model is selected)
SELECTN: w_k ∝ ℰ_sum^N − ℰ_k (for N = 1, 2, …, 6)
OPTIM: w minimizes ℰ(w)

Where ℰ_k is the error metric (defined by the field name metric) of the k^th model in the ensemble, ℰ_sum is the cumulated error of all models, ℰ_min is the minimal error, ℰ_mean is the average error, α = 0.05, β = − 1, and ℰ_sum^N is the cumulated error metric of the N best models.

Default value: SELECT for ensemble models, SELECT3 for ensemble_stat models with uncertainty set to SMOOTH, and SELECT4 for ensemble_stat models with uncertainty set to NONSMOOTH.

Example:
TYPE ENSEMBLE WEIGHT SELECT METRIC RMSECV defines an ensemble of models which selects the model that has the best RMSECV.
TYPE ENSEMBLE WEIGHT OPTIM METRIC RMSECV defines an ensemble of models where the weights w are computed to minimize the RMSECV of the model.
TYPE ENSEMBLE WEIGHT SELECT3 METRIC OECV defines an ensemble of models which selects the 3 models that have the best OECV.
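The proportionality rules for WTA1, WTA3 and SELECT can be sketched in Python as follows. This is only an illustration under stated assumptions: the function `ensemble_weights` is hypothetical, the weights are normalized to sum to one (the text above only gives them up to a proportionality constant), the errors are assumed strictly positive, and WTA1 needs at least two models so that the weights do not all vanish.

```python
from typing import List, Sequence


def ensemble_weights(errors: Sequence[float], method: str = "SELECT",
                     alpha: float = 0.05, beta: float = -1.0) -> List[float]:
    """Weights w_k computed from the error metric E_k of each model."""
    e_sum = sum(errors)
    e_mean = e_sum / len(errors)
    e_min = min(errors)
    if method == "WTA1":
        raw = [e_sum - e for e in errors]
    elif method == "WTA3":
        raw = [(e + alpha * e_mean) ** beta for e in errors]
    elif method == "SELECT":
        # Only the model(s) reaching the minimal error receive weight.
        raw = [1.0 if e == e_min else 0.0 for e in errors]
    else:
        raise ValueError("unsupported weight type in this sketch: " + method)
    total = sum(raw)
    # Normalization to sum one is an assumption of this sketch.
    return [r / total for r in raw]


errors = [1.0, 2.0, 3.0]
for method in ("WTA1", "WTA3", "SELECT"):
    print(method, ensemble_weights(errors, method))
```

In all three rules, a model with a smaller error receives a larger weight; SELECT is the extreme case where only the best model contributes.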

`UNCERTAINTY`

(specific to ensemble_stat models)

The field name UNCERTAINTY defines the type of uncertainty used in ENSEMBLE_STAT models.
Possible values:

SMOOTH: Smooth alternative of the uncertainty (default)
NONSMOOTH: Nonsmooth alternative of the uncertainty

Example:
TYPE ENSEMBLE_STAT UNCERTAINTY NONSMOOTH

`SIZE_PARAM`

(advanced parameter specific to ensemble_stat models)

The field name SIZE_PARAM defines the size of the directions of either :

the simplex used to compute the simplex gradients of the models if the field uncertainty is set to SMOOTH
the positive spanning set used to compare models values if the field uncertainty is set to NONSMOOTH

Possible values: Real value ≥ 0. Recommended range is [0.001; 0.1].
Default value: 0.001 if the field UNCERTAINTY is set to SMOOTH, 0.005 if the field UNCERTAINTY is set to NONSMOOTH.

Example:
TYPE ENSEMBLE_STAT UNCERTAINTY SMOOTH SIZE_PARAM 0.003

`SIGMA_MULT`

(advanced parameter specific to ensemble_stat models)

The field name SIGMA_MULT defines the scaling factor of the uncertainty, to be multiplied by the variance of the already sampled function values.

Possible values: Real value ≥ 0. Recommended range is [1; 100].
Default value: 10.

Example:
TYPE ENSEMBLE_STAT UNCERTAINTY NONSMOOTH SIGMA_MULT 30

`LAMBDA_P`

(advanced parameter specific to ensemble_stat models)

The field name LAMBDA_P defines the shape parameter of the probability of feasibility (P).

Possible values: Real value ≥ 0. Recommended range is [0.1; 10].
Default value: 3 if the field UNCERTAINTY is set to SMOOTH, 1 if the field UNCERTAINTY is set to NONSMOOTH.

Example:
TYPE ENSEMBLE_STAT UNCERTAINTY NONSMOOTH LAMBDA_P 1.5

`LAMBDA_PI`

(advanced parameter specific to ensemble_stat models)

The field name LAMBDA_PI defines the shape parameter of the probability of improvement (PI).

Possible values: Real value ≥ 0. Recommended range is [0.01; 3].
Default value: 0.1 if the field UNCERTAINTY is set to SMOOTH, 0.5 if the field UNCERTAINTY is set to NONSMOOTH.

Example:
TYPE ENSEMBLE_STAT UNCERTAINTY NONSMOOTH LAMBDA_PI 0.3

`OUTPUT`

Defines a text file in which model information is recorded. Allowed for ALL types of model.

Parameter optimization and selection

Below is the list of some field names and values that influence the behaviour of other fields.

`OPTIM`

The field value OPTIM indicates that the model parameter must be optimized. The default optimization criterion is the AOECV error metric (except for ENSEMBLE_STAT models where it is OECV).
Parameters that can be optimized:

degree
ridge
kernel_type
kernel_shape
distance_type

Example:
TYPE PRS DEGREE OPTIM
TYPE LOWESS DEGREE OPTIM KERNEL_TYPE OPTIM KERNEL_SHAPE OPTIM METRIC ARMSECV

`METRIC`

The field name METRIC defines the metric used to select the parameters of the model (including the weights of Ensemble models).
Allowed for ALL types of model.
Possible values:

EMAX: Error Max
EMAXCV: Error Max with Cross-Validation
RMSE: Root Mean Square Error
RMSECV: RMSE with Cross-Validation
OE: Order Error
OECV: Order Error with Cross-Validation [AuKoLedTa2016]
LINV: Inverse of the Likelihood
AOE: Aggregate Order Error
AOECV: Aggregate Order Error with Cross-Validation [TaAuKoLed2016]

Default value: AOECV, except for ensemble_stat models where it is OECV.

Example:
TYPE ENSEMBLE WEIGHT SELECT METRIC RMSECV defines an ensemble of models which selects the model that has the best RMSECV.

`BUDGET`

Budget for model parameter optimization. The number of sets of model parameters that are tested is equal to the optimization budget multiplied by the number of parameters to optimize.
Allowed for ALL types of model.
Default value: 20

Example:
TYPE LOWESS KERNEL_SHAPE OPTIM METRIC AOECV BUDGET 100
TYPE ENSEMBLE WEIGHT OPTIM METRIC RMSECV BUDGET 50
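The budget rule stated above (number of tested parameter sets = budget × number of parameters to optimize) can be worked through with a small Python sketch. The function `count_parameter_sets` is hypothetical, and it counts only the fields explicitly set to OPTIM in the definition; parameters that are optimized by default (such as the kernel shape) are not counted here, and whether every OPTIM field enters the count in the library is an assumption.

```python
DEFAULT_BUDGET = 20  # default value of the BUDGET field stated above


def count_parameter_sets(definition: str) -> int:
    """Budget multiplied by the number of fields explicitly set to OPTIM."""
    tokens = definition.split()
    fields = dict(zip(tokens[0::2], tokens[1::2]))
    budget = int(fields.get("BUDGET", DEFAULT_BUDGET))
    # Assumption: only explicit OPTIM values are counted in this sketch.
    n_optimized = sum(1 for value in fields.values() if value == "OPTIM")
    return budget * n_optimized


print(count_parameter_sets("TYPE LOWESS KERNEL_SHAPE OPTIM METRIC AOECV BUDGET 100"))
print(count_parameter_sets("TYPE PRS DEGREE OPTIM RIDGE OPTIM"))
```

Under these assumptions, the first definition tests 100 × 1 parameter sets and the second tests 20 × 2 with the default budget.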

Surrogate subproblem formulations

The SGTELIB library offers different formulations of the surrogate subproblem to be optimized at the Search step (see [TaLeDKo2014]). The parameter SGTELIB_MODEL_FORMULATION selects a formulation, and the parameter SGTELIB_MODEL_DIVERSIFICATION adjusts a diversification parameter.

`SGTELIB_MODEL_FORMULATION`

The formulations of the surrogate subproblem involve various quantities.
f̂ denotes a model of the objective f and ĉ_j a model of the constraint c_j, j = 1, 2, …, m. For x ∈ X, σ_f(x) denotes the uncertainty associated with the prediction f̂(x), and σ_j(x) denotes the uncertainty associated with the prediction ĉ_j(x), j = 1, 2, …, m. This uncertainty depends on the model chosen.

For a kriging model, σ_f(x) (or σ_j(x)) is readily available through the standard deviation that the model natively produces.
For an ensemble_stat model, the uncertainty is constructed by comparing the predictions of the ensemble models (see [AuLedSa2021]).
For any other model except ENSEMBLE, σ_f(x) (or σ_j(x)) is computed with the distance from x to previously evaluated points.
Finally, for an ensemble model, the uncertainty is computed through a weighted sum of the squared uncertainties of the ensemble models.

There are eight different formulations that can be chosen with the parameter SGTELIB_MODEL_FORMULATION. Some formulations involve a parameter λ that is described later.

FS (default):

$$\begin{aligned} \min_{x\in X}&\ \ \hat f(x)-\lambda\hat\sigma_f(x) \\\ \mathrm{s.t.}&\ \ \hat c_j(x)-\lambda\hat\sigma_j(x)\leq0,\ \ j=1,2,\dots,m \end{aligned}$$

FSP:

$$\begin{aligned} \min_{x\in X}&\ \ \hat f(x)-\lambda\hat\sigma_f(x) \\\ \mathrm{s.t.}&\ \ \mathrm{P}(x)\geq 0.5 \end{aligned}$$

where P is the probability of feasibility which is the probability that a given point is feasible.

EIS:

$$\begin{aligned} \min_{x\in X}&\ -\mathrm{EI}(x)-\lambda\hat\sigma_f(x) \\\ \mathrm{s.t.}&\ \ \hat c_j(x)-\lambda\hat\sigma_j(x)\leq0,\ \ j=1,2,\dots,m \end{aligned}$$

where EI is the expected improvement that takes into account the probability of improvement and the expected amplitude thereof.

EFI:

$$\min_{x\in X}\ -\mathrm{EFI}(x)$$

where EFI is the expected feasible improvement : EFI = EI × P.

EFIS:

$$\min_{x\in X}\ -\mathrm{EFI}(x)-\lambda\hat\sigma_f(x)$$

EFIM:

$$\min_{x\in X}\ -\mathrm{EFI}(x)-\lambda\hat\sigma_f(x)\mu(x)$$

where μ is the uncertainty in the feasibility : μ = 4P × (1 − P).

EFIC:

$$\min_{x\in X}\ -\mathrm{EFI}(x)-\lambda\left(\mathrm{EI}(x)\mu(x)+\mathrm{P}(x)\hat\sigma_f(x)\right)$$

PFI:

$$\min_{x\in X}\ -\mathrm{PFI}(x)$$

where PFI is the probability of feasible improvement: PFI = PI × P, with PI being the probability of improvement, that is, the probability that the objective decreases from the best known value at a given point.
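To make the relations between the unconstrained formulations concrete, the following Python sketch evaluates the objective of EFI, EFIS, EFIM, EFIC and PFI at a single point. The function `subproblem_objective` is hypothetical: it takes the values of EI, P, PI and the objective uncertainty at that point as given inputs and does not compute them from a model.

```python
def subproblem_objective(formulation: str, ei: float, p: float, pi: float,
                         sigma_f: float, lam: float) -> float:
    """Objective value (to be minimized) of a formulation at one point."""
    efi = ei * p                # expected feasible improvement, EFI = EI x P
    mu = 4.0 * p * (1.0 - p)    # uncertainty in the feasibility
    if formulation == "EFI":
        return -efi
    if formulation == "EFIS":
        return -efi - lam * sigma_f
    if formulation == "EFIM":
        return -efi - lam * sigma_f * mu
    if formulation == "EFIC":
        return -efi - lam * (ei * mu + p * sigma_f)
    if formulation == "PFI":
        return -(pi * p)        # PFI = PI x P
    raise ValueError("unsupported formulation in this sketch: " + formulation)


for name in ("EFI", "EFIS", "EFIM", "EFIC", "PFI"):
    print(name, subproblem_objective(name, ei=2.0, p=0.5, pi=0.8, sigma_f=1.0, lam=0.1))
```

Note that μ reaches its maximum of 1 at P = 0.5 and vanishes at P = 0 and P = 1, so the EFIM and EFIC exploration terms are largest where feasibility is most uncertain.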

Example:
SGTELIB_MODEL_DEFINITION TYPE KRIGING
SGTELIB_MODEL_FORMULATION EFIC
The two lines above define a surrogate subproblem based on the EFIC formulation that will involve kriging models.

`SGTELIB_MODEL_DIVERSIFICATION`

The exploration parameter λ controls the balance between exploration of the search space and intensification in the most promising areas. A higher λ favors exploration whereas a lower λ favors intensification.

λ is a real value in [0, 1] defined by the parameter SGTELIB_MODEL_DIVERSIFICATION.
Default value : 0.01.

Example:
SGTELIB_MODEL_DEFINITION TYPE ENSEMBLE
SGTELIB_MODEL_FORMULATION FSP
SGTELIB_MODEL_DIVERSIFICATION 0.1
The three lines above define a surrogate subproblem based on the FSP formulation, with an exploration parameter equal to 0.1, that will involve ensemble models.
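The effect of λ can be seen on the FS formulation, where the uncertainty is subtracted from both the objective model and the constraint models. The Python sketch below evaluates the FS objective and constraints at a single point; the function `fs_subproblem` is hypothetical and takes the model predictions and uncertainties as given inputs.

```python
from typing import List, Sequence, Tuple


def fs_subproblem(f_hat: float, sigma_f: float,
                  c_hat: Sequence[float], sigma_c: Sequence[float],
                  lam: float = 0.01) -> Tuple[float, List[float], bool]:
    """FS objective, FS constraint values, and feasibility at one point."""
    objective = f_hat - lam * sigma_f
    constraints = [c - lam * s for c, s in zip(c_hat, sigma_c)]
    feasible = all(g <= 0.0 for g in constraints)
    return objective, constraints, feasible


# Same predictions, two values of the exploration parameter.
for lam in (0.0, 0.1):
    print(lam, fs_subproblem(1.0, 2.0, [0.05, -1.0], [1.0, 1.0], lam))
```

With λ = 0 the first constraint model is slightly violated, whereas with λ = 0.1 the uncertainty term makes the point acceptable for the subproblem and lowers its objective value, which is how a larger λ draws the search toward poorly known regions.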

References
