Import the optimizer

In [1]:
from edbo.plus.optimizer_botorch import EDBOplus

Define and generate the reaction scope

In [2]:
reaction_components = {
    'solvent': ['THF', 'Toluene', 'DMSO'],
    'T': [-10, 0, 10, 25],
    'concentration': [0.1, 0.2, 1.0]
}

In [3]:
EDBOplus().generate_reaction_scope(
    components=reaction_components, 
    filename='my_optimization.csv',
    check_overwrite=False
)

Generating a reaction scope...
The scope was generated and contains 36 possible reactions!


Unnamed: 0,solvent,T,concentration
0,THF,-10,0.1
1,THF,-10,0.2
2,THF,-10,1.0
3,THF,0,0.1
4,THF,0,0.2
5,THF,0,1.0
6,THF,10,0.1
7,THF,10,0.2
8,THF,10,1.0
9,THF,25,0.1
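
The scope reported above is the full set of combinations of one value per component (3 solvents x 4 temperatures x 3 concentrations = 36). A minimal sketch of that enumeration, using only the standard library, is shown below. It only illustrates the idea of a Cartesian product and is not EDBO+'s own implementation; the name `scope` is chosen here for illustration.

```python
from itertools import product

reaction_components = {
    'solvent': ['THF', 'Toluene', 'DMSO'],
    'T': [-10, 0, 10, 25],
    'concentration': [0.1, 0.2, 1.0],
}

# One dict per reaction: every combination of one value per component.
names = list(reaction_components.keys())
scope = [dict(zip(names, values))
         for values in product(*reaction_components.values())]

print(len(scope))   # size of the scope
print(scope[0])     # first combination
print(scope[9])     # tenth combination
```

Because `product` varies the last component fastest, the rows come out in the same order as the first ten rows of the table above.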


Run the optimizer without prior experimental observations. With no data to train a model on, the first batch of experiments is suggested by (pseudo-)random sampling of the scope.

In [4]:
EDBOplus().run(
    filename='my_optimization.csv',  # Previously generated scope.
    objectives=['yield'],  # Objectives to be optimized.
    objective_mode=['max'],  # Maximize yield (the only objective in this run).
    batch=3,  # Number of experiments to run in parallel in this round.
    columns_features='all',  # Features to be included in the model.
    init_sampling_method='cvt'  # initialization method.
)

There are no experimental observations yet. Random samples will be drawn.
The following columns are categorical and will be encoded using One-Hot-Encoding: ['solvent']
Generated 3 initial samples using cvt sampling (seed = 0). Run finished!


Unnamed: 0,solvent,T,concentration,yield,priority
32,DMSO,10,1.0,PENDING,1
8,THF,10,1.0,PENDING,1
19,Toluene,10,0.2,PENDING,1
0,THF,-10,0.1,PENDING,0
26,DMSO,-10,1.0,PENDING,0
21,Toluene,25,0.1,PENDING,0
22,Toluene,25,0.2,PENDING,0
23,Toluene,25,1.0,PENDING,0
24,DMSO,-10,0.1,PENDING,0
25,DMSO,-10,0.2,PENDING,0
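
The log above notes that the categorical column 'solvent' is one-hot encoded. As a rough, plain-Python sketch of what that means, each solvent becomes a 0/1 vector with a single 1, placed next to the numeric features. The column order and any scaling that EDBO+ applies internally are not shown in this notebook, so the layout below is an assumption, and `one_hot` is a hypothetical helper.

```python
solvents = ['THF', 'Toluene', 'DMSO']

def one_hot(value, categories):
    """Return a 0/1 vector with a single 1 at the position of `value`."""
    return [1 if value == category else 0 for category in categories]

# The three suggested experiments (priority 1) from the table above:
# (solvent, T, concentration).
suggested = [('DMSO', 10, 1.0), ('THF', 10, 1.0), ('Toluene', 10, 0.2)]

# Assumed layout: [is_THF, is_Toluene, is_DMSO, T, concentration].
encoded = [one_hot(solvent, solvents) + [T, conc]
           for solvent, T, conc in suggested]

for row in encoded:
    print(row)
```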


Add experimental data

In [5]:
import pandas as pd
df_edbo = pd.read_csv('my_optimization.csv')

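# Record the measured yield for the first suggested experiment (row 0).
df_edbo.loc[0, 'yield'] = 20.5
# df_edbo.loc[0, 'ee'] = 70  # Uncomment if 'ee' is also an objective.

# Record the measured yield for the second suggested experiment (row 1).
df_edbo.loc[1, 'yield'] = 50.3
# df_edbo.loc[1, 'ee'] = 30  # Uncomment if 'ee' is also an objective.
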
df_edbo.to_csv('my_optimization.csv', index=False)

df_edbo.head(5)


Unnamed: 0,solvent,T,concentration,yield,priority
0,DMSO,10,1.0,20.5,1
1,THF,10,1.0,50.3,1
2,Toluene,10,0.2,PENDING,1
3,THF,-10,0.1,PENDING,0
4,DMSO,-10,1.0,PENDING,0
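
Writing results by row position works here because the suggested experiments sit at the top of the file, but it is easy to write a value to the wrong row. One possible alternative is to look up the row by its reaction conditions. The sketch below uses a small in-memory stand-in for the first rows of 'my_optimization.csv'; `record_result` is a hypothetical helper and not part of EDBO+.

```python
import pandas as pd

# Stand-in for the first rows of 'my_optimization.csv' (values from the table above).
df_edbo = pd.DataFrame({
    'solvent': ['DMSO', 'THF', 'Toluene'],
    'T': [10, 10, 10],
    'concentration': [1.0, 1.0, 0.2],
    # Object dtype so the column can hold both 'PENDING' and numeric results.
    'yield': pd.Series(['PENDING'] * 3, dtype=object),
    'priority': [1, 1, 1],
})

def record_result(df, conditions, objective, value):
    """Hypothetical helper: set `objective` on the single row matching `conditions`."""
    mask = pd.Series(True, index=df.index)
    for column, wanted in conditions.items():
        mask &= df[column] == wanted
    if mask.sum() != 1:
        raise ValueError(f'Expected exactly one matching row, found {mask.sum()}')
    df.loc[mask, objective] = value

record_result(df_edbo, {'solvent': 'DMSO', 'T': 10, 'concentration': 1.0}, 'yield', 20.5)
record_result(df_edbo, {'solvent': 'THF', 'T': 10, 'concentration': 1.0}, 'yield', 50.3)
print(df_edbo)
```

The helper raises an error when zero or several rows match, so a typo in the conditions is caught instead of silently writing nothing. Exact equality on floating-point columns such as 'concentration' is only reliable when the values are typed exactly as they appear in the scope.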


Run with the added experimental observations

In [6]:
EDBOplus().run(
    filename='my_optimization.csv',  # Previous scope (including observations).
    objectives=['yield'],  # Objectives to be optimized.
    objective_mode=['max'],  # Maximize yield (the only objective in this run).
    batch=3,  # Number of experiments to run in parallel in this round.
    columns_features='all',  # Features to be included in the model.
    init_sampling_method='cvtsampling',  # Initialization method.
    acquisition_function='NoisyEHVI'
)

This run will optimize for the following objectives: ['yield']
The following features will be used: ['concentration', 'T', 'solvent']
The following columns are categorical and will be encoded using One-Hot-Encoding: ['solvent']
Generating surrogate model...
Model generated!
Optimizing acqusition function...
Acquisition function optimized.
Predictions and expected improvement obtained.
Run finished!


Unnamed: 0,solvent,T,concentration,yield,priority
19,THF,-10,0.2,PENDING,1.0
14,DMSO,10,0.2,PENDING,1.0
8,DMSO,-10,0.1,PENDING,1.0
7,Toluene,25,1.0,PENDING,0.0
6,Toluene,25,0.2,PENDING,0.0
5,Toluene,25,0.1,PENDING,0.0
17,Toluene,10,1.0,PENDING,0.0
2,Toluene,10,0.2,PENDING,0.0
18,Toluene,10,0.1,PENDING,0.0
28,Toluene,0,1.0,PENDING,0.0


Inspect the model predictions, which are read here from 'pred_my_optimization.csv'.

In [7]:
df_predictions_round0 = pd.read_csv('pred_my_optimization.csv')
df_predictions_round0.style.background_gradient(subset=['priority'], cmap='plasma')

Unnamed: 0,solvent,T,concentration,yield,priority,yield_predicted_mean,yield_predicted_variance,yield_expected_improvement
0,THF,-10,0.2,PENDING,1.0,35.401515,106.348013,77.612274
1,DMSO,10,0.2,PENDING,1.0,34.316046,22.042096,10.738592
2,DMSO,-10,0.1,PENDING,1.0,35.398487,106.348019,77.61085
3,Toluene,25,1.0,PENDING,0.0,35.389097,106.282733,77.554453
4,Toluene,25,0.2,PENDING,0.0,35.389187,106.283698,77.555264
5,Toluene,25,0.1,PENDING,0.0,35.389211,106.283951,77.555477
6,Toluene,10,1.0,PENDING,0.0,34.34167,23.938856,12.172579
7,Toluene,10,0.2,PENDING,0.0,34.383269,27.152537,14.630186
8,Toluene,10,0.1,PENDING,0.0,34.393614,27.974466,15.263226
9,Toluene,0,1.0,PENDING,0.0,35.328145,104.684494,76.253683
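
To compare candidates by hand, the prediction table can be sorted by expected improvement, and the predictive variance can be converted to a standard deviation so that it has the same units as the predicted mean. The sketch below uses a few rows copied from the table above rather than reading the file; the names `df_pred`, `ranked` and the added column `yield_predicted_std` are chosen here for illustration.

```python
import io

import pandas as pd

# A few rows copied from the prediction table above.
csv_text = """solvent,T,concentration,yield,priority,yield_predicted_mean,yield_predicted_variance,yield_expected_improvement
THF,-10,0.2,PENDING,1.0,35.401515,106.348013,77.612274
DMSO,10,0.2,PENDING,1.0,34.316046,22.042096,10.738592
DMSO,-10,0.1,PENDING,1.0,35.398487,106.348019,77.61085
Toluene,25,1.0,PENDING,0.0,35.389097,106.282733,77.554453
Toluene,10,1.0,PENDING,0.0,34.34167,23.938856,12.172579
"""
df_pred = pd.read_csv(io.StringIO(csv_text))

# Standard deviation is in the same units as the predicted mean.
df_pred['yield_predicted_std'] = df_pred['yield_predicted_variance'] ** 0.5

# Rank candidates from highest to lowest expected improvement.
ranked = df_pred.sort_values('yield_expected_improvement', ascending=False)
print(ranked[['solvent', 'T', 'concentration', 'priority',
              'yield_predicted_mean', 'yield_predicted_std',
              'yield_expected_improvement']])
```

In this table the priority column does not simply follow the ranking: DMSO / 10 / 0.2 has priority 1 although its expected improvement is the lowest of the rows shown.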
