# Uncertainty Sampling (US) for a High-Throughput Reactor
* This example considers a specific multi-channel (16-channel) reactor architecture.
* The reactor is composed of 4 reactor blocks, each containing 4 individual reactors.
* The temperature of each reactor block is controlled through a single thermocouple, so the 4 reactors in a block share one temperature.
* Thus, the US algorithm first selects 4 temperatures based on posterior uncertainty, and then suggests 4 experimental conditions for each selected temperature using a q-batch sampling approach.

## 1. Set path to a data file

In [2]:
import catdegus.active_learning.acquisition as aq
import catdegus.active_learning.gaussian_process as gpc
import catdegus.visualization.plot as pl

# Define the path to the data file
# Target metric: initial CO2 conversion
path = "./20250228_sheet_for_ML_unique.xlsx"


## 2. Preprocess data and train a Gaussian process model 
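* `path`: path to the data file (Excel)
* `target`: name of the target column
* `x_range_min`, `x_range_max`: lower and upper bounds of each feature, in the order `reaction_temp`, `Rh_weight_loading`, `Rh_total_mass`, `synth_method`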

In [10]:
# Train the Gaussian Process model
GP = gpc.GaussianProcess()
GP.preprocess_data_at_once(
    path=path,
    target='CO2 Conversion (%)_initial value',
    x_range_min=[300, 0.1, 0.005, 0],
    x_range_max=[550, 1.0, 0.02, 1],
)
GP.train_gp()

self.df.dtypes: reaction_temp                         int64
Rh_weight_loading                   float64
Rh_total_mass                       float64
synth_method                          int64
CO2 Conversion (%)_initial value    float64
dtype: object
numerical_features (selected):  ['reaction_temp', 'Rh_weight_loading', 'Rh_total_mass', 'synth_method']
categorical_features (selected):  []


SingleTaskGP(
  (likelihood): GaussianLikelihood(
    (noise_covar): HomoskedasticNoise(
      (noise_prior): LogNormalPrior()
      (raw_noise_constraint): GreaterThan(1.000E-04)
    )
  )
  (mean_module): ConstantMean()
  (covar_module): RBFKernel(
    (lengthscale_prior): LogNormalPrior()
    (raw_lengthscale_constraint): GreaterThan(2.500E-02)
  )
  (outcome_transform): Standardize()
)

## 3. Construct a discrete grid for the optimization of an acquisition function

In [11]:
# Construct the discrete grid for optimization
Grid = aq.DiscreteGrid(
    GP=GP,
    x_range_min=[300, 0.1, 0.005, 0],
    x_range_max=[550, 1.0, 0.02, 1],
    x_step=[50, 0.1, 0.0025, 1],
)
Grid.construct_grid()

840 combinations are possible in the constructed grid.
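
The grid size reported above can be checked by hand. The following is a minimal sketch, assuming the grid is the full Cartesian product of evenly spaced values for each feature. It does not call `aq.DiscreteGrid`, and `axis_values` is a hypothetical helper introduced only for this illustration.

```python
import itertools

import numpy as np

x_range_min = [300, 0.1, 0.005, 0]
x_range_max = [550, 1.0, 0.02, 1]
x_step = [50, 0.1, 0.0025, 1]


def axis_values(lo, hi, step):
    """Evenly spaced values from lo to hi (hypothetical helper)."""
    # Round the interval count to avoid floating-point drift in (hi - lo) / step
    n_points = int(round((hi - lo) / step)) + 1
    return np.linspace(lo, hi, n_points)


axes = [axis_values(lo, hi, st) for lo, hi, st in zip(x_range_min, x_range_max, x_step)]
grid = np.array(list(itertools.product(*axes)))

print([len(a) for a in axes])  # [6, 10, 7, 2]
print(grid.shape)              # (840, 4)
```

The product 6 x 10 x 7 x 2 gives the 840 combinations, covering 6 temperatures, 10 Rh weight loadings, 7 Rh total masses, and 2 synthesis methods.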


## 4. Suggest 16 experimental conditions by US

* Select the 4 temperatures with the highest posterior uncertainty (standard deviation), averaged over the other features.

In [12]:
# Select the 4 temperatures with the highest average uncertainty for the NP synthesis method
top_temps = Grid.select_uncertain_temperatures(synth_method='NP', n_temperatures=4)

Average Std. Dev. for each temperature:
Temperature: 300.0 C, Average Std. Dev.: 0.7316451870272337
Temperature: 350.0 C, Average Std. Dev.: 0.7431894608164192
Temperature: 400.0 C, Average Std. Dev.: 0.719630454017343
Temperature: 450.0 C, Average Std. Dev.: 0.5931891681930836
Temperature: 500.0 C, Average Std. Dev.: 0.43990275692642694
Temperature: 550.0 C, Average Std. Dev.: 0.681404212131157
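
The ranking step itself is simple to reproduce from the averages printed above. The sketch below only re-ranks those printed values; it does not reproduce how `select_uncertain_temperatures()` computes the averages, and the names `avg_std` and `top4` are introduced here for illustration.

```python
# Average posterior standard deviation per temperature, copied from the output above
avg_std = {
    300.0: 0.7316451870272337,
    350.0: 0.7431894608164192,
    400.0: 0.719630454017343,
    450.0: 0.5931891681930836,
    500.0: 0.43990275692642694,
    550.0: 0.681404212131157,
}

# Sort temperatures by average standard deviation, highest first, and keep four
top4 = sorted(avg_std, key=avg_std.get, reverse=True)[:4]
print(top4)  # [350.0, 300.0, 400.0, 550.0]
```

The 450 C and 500 C slices have the lowest average uncertainty, so they are left out of this round.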


* With `optimize_posterior_std_dev_discrete_batch()`, q-batch sampling can be performed for a specific synthesis method and temperature (two equality constraints). The sampled points are then snapped to the closest grid points.
* Four batch-sampled conditions are suggested for each selected temperature, resulting in 16 conditions in total.
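
To see why a batch is chosen jointly rather than by taking the 4 individually most uncertain points, consider the sketch below. It is not the method used by `optimize_posterior_std_dev_discrete_batch()`; it swaps in a simpler greedy scheme (pick the most uncertain point, condition the model on it, repeat) with a plain NumPy RBF-kernel GP. The lengthscale, noise level, candidate slice, and the single prior experiment are all made-up values for illustration.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.3):
    """Squared-exponential kernel with unit prior variance."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)


def posterior_variance(candidates, observed, noise=1e-4):
    """GP posterior variance at `candidates` given observed inputs."""
    K = rbf_kernel(observed, observed) + noise * np.eye(len(observed))
    K_s = rbf_kernel(candidates, observed)
    # diag(K_s K^-1 K_s^T) without forming the full matrix
    reduction = np.einsum("ij,ji->i", K_s, np.linalg.solve(K, K_s.T))
    return 1.0 - reduction


def greedy_uncertain_batch(candidates, observed, q=4):
    """Pick q points one at a time, conditioning on each pick."""
    observed = observed.copy()
    picked = []
    for _ in range(q):
        idx = int(np.argmax(posterior_variance(candidates, observed)))
        picked.append(idx)
        # The posterior variance does not depend on the measured value,
        # so the pick can be added to the inputs before it is run.
        observed = np.vstack([observed, candidates[idx]])
    return picked


# Hypothetical 2-D slice at one fixed temperature and synthesis method:
# normalized Rh weight loading x normalized Rh total mass
loading = np.linspace(0, 1, 10)
mass = np.linspace(0, 1, 7)
candidates = np.array([(w, m) for w in loading for m in mass])
observed = np.array([[0.5, 0.5]])  # made-up prior experiment

batch = greedy_uncertain_batch(candidates, observed)
print(candidates[batch])
```

Because each pick reduces the uncertainty around itself, later picks land elsewhere in the slice, which is the behavior a batch acquisition aims for.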

In [13]:
for i, temp in enumerate(top_temps):
    print(f'{i+1}. Selected temperature with high uncertainty: {temp} C')
    # Batch sampling with two equality constraints (synthesis method and temperature)
    display(
        Grid.optimize_posterior_std_dev_discrete_batch(
            synth_method='NP',
            temperature=temp,
            n_candidates=4,
        )
    )

1. Selected temperature with high uncertainty: 350.0 C
Temperature 350.0 C is transformed to 0.2.

Batch candidates shape: torch.Size([4, 4])
Acquisition values shape: torch.Size([])
Acquisition values: 1.49676142981457

Batch candidates:
tensor([[2.0000e-01, 6.1851e-01, 4.6427e-15, 1.0000e+00],
        [2.0000e-01, 3.0485e-01, 1.0000e+00, 1.0000e+00],
        [2.0000e-01, 3.1947e-01, 0.0000e+00, 1.0000e+00],
        [2.0000e-01, 6.1720e-01, 1.0000e+00, 1.0000e+00]])

Batch candidates (closest grid points):
tensor([[0.2000, 0.6667, 0.0000, 1.0000],
        [0.2000, 0.3333, 1.0000, 1.0000],
        [0.2000, 0.3333, 0.0000, 1.0000],
        [0.2000, 0.6667, 1.0000, 1.0000]], dtype=torch.float64)


Unnamed: 0,reaction_temp,Rh_weight_loading,Rh_total_mass,synth_method
112,350,0.7,0.005,1
97,350,0.4,0.02,1
91,350,0.4,0.005,1
118,350,0.7,0.02,1


2. Selected temperature with high uncertainty: 300.0 C
Temperature 300.0 C is transformed to 0.0.

Batch candidates shape: torch.Size([4, 4])
Acquisition values shape: torch.Size([])
Acquisition values: 1.5324495186741025

Batch candidates:
tensor([[0.0000e+00, 3.7753e-01, 1.0000e+00, 1.0000e+00],
        [0.0000e+00, 4.8473e-01, 1.0750e-16, 1.0000e+00],
        [0.0000e+00, 1.0000e+00, 1.0000e+00, 1.0000e+00],
        [0.0000e+00, 6.7328e-01, 1.0000e+00, 1.0000e+00]])

Batch candidates (closest grid points):
tensor([[0.0000, 0.3333, 1.0000, 1.0000],
        [0.0000, 0.4444, 0.0000, 1.0000],
        [0.0000, 1.0000, 1.0000, 1.0000],
        [0.0000, 0.6667, 1.0000, 1.0000]], dtype=torch.float64)


Unnamed: 0,reaction_temp,Rh_weight_loading,Rh_total_mass,synth_method
27,300,0.4,0.02,1
28,300,0.5,0.005,1
69,300,1.0,0.02,1
48,300,0.7,0.02,1


3. Selected temperature with high uncertainty: 400.0 C
Temperature 400.0 C is transformed to 0.4.

Batch candidates shape: torch.Size([4, 4])
Acquisition values shape: torch.Size([])
Acquisition values: 1.4504838237775561

Batch candidates:
tensor([[4.0000e-01, 1.0665e-16, 5.6509e-15, 1.0000e+00],
        [4.0000e-01, 5.5549e-01, 1.0387e-16, 1.0000e+00],
        [4.0000e-01, 1.0000e+00, 4.2738e-16, 1.0000e+00],
        [4.0000e-01, 4.0327e-01, 1.0000e+00, 1.0000e+00]])

Batch candidates (closest grid points):
tensor([[0.4000, 0.0000, 0.0000, 1.0000],
        [0.4000, 0.5556, 0.0000, 1.0000],
        [0.4000, 1.0000, 0.0000, 1.0000],
        [0.4000, 0.4444, 1.0000, 1.0000]], dtype=torch.float64)


Unnamed: 0,reaction_temp,Rh_weight_loading,Rh_total_mass,synth_method
140,400,0.1,0.005,1
175,400,0.6,0.005,1
203,400,1.0,0.005,1
174,400,0.5,0.02,1


4. Selected temperature with high uncertainty: 550.0 C
Temperature 550.0 C is transformed to 1.0.

Batch candidates shape: torch.Size([4, 4])
Acquisition values shape: torch.Size([])
Acquisition values: 1.5221659698050514

Batch candidates:
tensor([[1.0000e+00, 6.9227e-01, 1.0890e-14, 1.0000e+00],
        [1.0000e+00, 1.0000e+00, 3.9371e-16, 1.0000e+00],
        [1.0000e+00, 1.0000e+00, 1.0000e+00, 1.0000e+00],
        [1.0000e+00, 4.1390e-16, 9.5716e-15, 1.0000e+00]])

Batch candidates (closest grid points):
tensor([[1.0000, 0.6667, 0.0000, 1.0000],
        [1.0000, 1.0000, 0.0000, 1.0000],
        [1.0000, 1.0000, 1.0000, 1.0000],
        [1.0000, 0.0000, 0.0000, 1.0000]], dtype=torch.float64)


Unnamed: 0,reaction_temp,Rh_weight_loading,Rh_total_mass,synth_method
392,550,0.7,0.005,1
413,550,1.0,0.005,1
419,550,1.0,0.02,1
350,550,0.1,0.005,1
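
The normalized coordinates in the outputs above can be mapped back to the tabulated conditions by hand. The sketch below assumes min-max scaling with the bounds from section 2 and per-feature rounding to the nearest grid value. `snap_to_grid` and `to_physical` are hypothetical helpers, not part of `catdegus`, and the input is the first candidate of the 350 C batch.

```python
import numpy as np

x_min = np.array([300, 0.1, 0.005, 0])
x_max = np.array([550, 1.0, 0.02, 1])
x_step = np.array([50, 0.1, 0.0025, 1])

# Number of grid intervals per feature: 5, 9, 6, 1
n_intervals = np.round((x_max - x_min) / x_step)


def snap_to_grid(x_norm):
    """Round each normalized coordinate to the nearest grid value."""
    return np.round(x_norm * n_intervals) / n_intervals


def to_physical(x_norm):
    """Invert min-max scaling."""
    return x_min + x_norm * (x_max - x_min)


candidate = np.array([0.2, 0.61851, 4.6427e-15, 1.0])
snapped = snap_to_grid(candidate)
print(snapped)               # approximately [0.2, 0.6667, 0.0, 1.0]
print(to_physical(snapped))  # approximately [350, 0.7, 0.005, 1]
```

The result matches the first suggested row for 350 C (Rh weight loading 0.7, Rh total mass 0.005). The same scaling explains the log line "Temperature 350.0 C is transformed to 0.2", since (350 - 300) / (550 - 300) = 0.2.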
