# Introduction

This project explores how Bayesian optimization, implemented with BayBE, can support the discovery of novel corrosion inhibitors in materials design. We start with a randomly chosen subset of a comprehensive database of electrochemical responses of small organic molecules. The goal is to assess how much Bayesian optimization can speed up screening of the design space for promising compounds. We compare different strategies for incorporating alloy information while optimizing the experimental parameters with respect to the inhibitive performance of the screened compounds.

# Initialization

Loading libraries and data files:

In [1]:
import pandas as pd
import numpy as np
from baybe import Campaign

df_AA2024 = pd.read_excel('data/filtered_AA2024.xlsx')
df_AA1000 = pd.read_excel('data/filtered_AA1000.xlsx')
df_Al = pd.read_excel('data/filtered_Al.xlsx')



In [2]:
df_AA1000.describe()

statistic,Time_h,pH,Inhib_Concentrat_M,Salt_Concentrat_M,Efficiency
count,1966.0,1966.0,1966.0,1966.0,1966.0
mean,71.332309,3.626585,0.02918525,0.077655,48.38354
std,154.594325,4.089542,0.2442926,0.217774,163.164194
min,0.0,-0.6,1e-07,0.0,-4834.0
25%,1.67,0.0,0.0001,0.0,35.6025
50%,24.0,1.0,0.001,0.0,61.49
75%,24.0,7.0,0.002,0.1,82.0
max,720.0,14.30103,3.28,2.0,100.0


In [3]:
df_AA2024.describe()

statistic,Time_h,pH,Inhib_Concentrat_M,Salt_Concentrat_M,Efficiency
count,611.0,611.0,611.0,611.0,611.0
mean,135.801964,6.342062,0.006808,0.14545,26.736841
std,201.683867,2.52908,0.014059,0.200575,288.788317
min,0.5,0.0,1e-05,0.0,-4834.0
25%,24.0,4.0,0.0005,0.01,30.0
50%,24.0,7.0,0.001,0.1,58.0
75%,144.0,7.0,0.003,0.1,87.95
max,672.0,10.0,0.1,0.6,100.0


In [4]:
df_Al.describe()

statistic,Time_h,pH,Inhib_Concentrat_M,Salt_Concentrat_M,Efficiency
count,1966.0,1966.0,1966.0,1966.0,1966.0
mean,71.332309,3.626585,0.02918525,0.077655,48.38354
std,154.594325,4.089542,0.2442926,0.217774,163.164194
min,0.0,-0.6,1e-07,0.0,-4834.0
25%,1.67,0.0,0.0001,0.0,35.6025
50%,24.0,1.0,0.001,0.0,61.49
75%,24.0,7.0,0.002,0.1,82.0
max,720.0,14.30103,3.28,2.0,100.0


# Data Processing

In [8]:
df = df_AA2024
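
The summary statistics above report an `Efficiency` minimum of -4834, far outside the 0 to 100 range one would expect for an inhibition efficiency given in percent. A minimal sketch of one possible cleaning step follows. The helper name `filter_rows` and the thresholds `eff_min` and `eff_max` are assumptions for illustration, not part of the original analysis, and the toy frame only stands in for `df_AA2024`.

```python
import pandas as pd


def filter_rows(df: pd.DataFrame, eff_min: float = 0.0, eff_max: float = 100.0) -> pd.DataFrame:
    """Keep rows whose Efficiency lies in [eff_min, eff_max] (assumed thresholds)."""
    mask = df["Efficiency"].between(eff_min, eff_max)
    return df.loc[mask].reset_index(drop=True)


# Toy stand-in for df_AA2024; the rows are illustrative, not taken from the data files.
toy = pd.DataFrame(
    {
        "Time_h": [24.0, 24.0, 144.0, 0.5],
        "pH": [7.0, 4.0, 7.0, 10.0],
        "Inhib_Concentrat_M": [1e-3, 5e-4, 3e-3, 1e-5],
        "Salt_Concentrat_M": [0.1, 0.01, 0.1, 0.6],
        "Efficiency": [58.0, -4834.0, 87.95, 30.0],
    }
)

clean = filter_rows(toy)
print(f"{len(toy)} rows before filtering, {len(clean)} rows after")
```

Whether strongly negative efficiencies should be dropped, clipped, or kept depends on how they arose in the source database, so the thresholds are best treated as tunable.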

# Data Analysis

# Bayesian Optimization

## Search Space

Define the search-space parameters:

In [23]:
from baybe.parameters import NumericalDiscreteParameter, NumericalContinuousParameter
from baybe.searchspace import SearchSpace

# Continuous parameters. Bounds come from the observed range in df,
# except for pH, which uses a fixed range of 1 to 14.
parameters = [
    NumericalContinuousParameter(
        name="Time (h)",
        bounds=(df['Time_h'].min(), df['Time_h'].max()),
    ),
    NumericalContinuousParameter(
        name="pH",
        bounds=(1, 14),
    ),
    NumericalContinuousParameter(
        name="Inhibitor Concentration (M)",
        bounds=(df['Inhib_Concentrat_M'].min(), df['Inhib_Concentrat_M'].max()),
    ),
    NumericalContinuousParameter(
        name="Salt_Concentrat_M",
        bounds=(df['Salt_Concentrat_M'].min(), df['Salt_Concentrat_M'].max()),
    ),
]

TypeError: NumericalContinuousParameter.__init__() got an unexpected keyword argument 'tolerance'
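
Two of the parameter names defined above ("Time (h)" and "Inhibitor Concentration (M)") differ from the column names in the loaded data (`Time_h` and `Inhib_Concentrat_M`). If rows from the data files are later fed back to a campaign as measurements, the columns would likely need to carry the same names as the parameters. A minimal sketch of such a renaming step, assuming a hypothetical `COLUMN_TO_PARAMETER` mapping and a toy frame in place of `df`:

```python
import pandas as pd

# Hypothetical mapping from data-file columns to the parameter names used above.
# "pH", "Salt_Concentrat_M" and "Efficiency" already match and are left untouched.
COLUMN_TO_PARAMETER = {
    "Time_h": "Time (h)",
    "Inhib_Concentrat_M": "Inhibitor Concentration (M)",
}


def to_parameter_names(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose columns use the search-space parameter names."""
    missing = [col for col in COLUMN_TO_PARAMETER if col not in df.columns]
    if missing:
        raise KeyError(f"Columns missing from data: {missing}")
    return df.rename(columns=COLUMN_TO_PARAMETER)


# Toy stand-in for df; the row is illustrative only.
toy = pd.DataFrame(
    {
        "Time_h": [24.0],
        "pH": [7.0],
        "Inhib_Concentrat_M": [1e-3],
        "Salt_Concentrat_M": [0.1],
        "Efficiency": [58.0],
    }
)

renamed = to_parameter_names(toy)
print(list(renamed.columns))
```

An alternative is to name the parameters after the data columns directly, which avoids the mapping altogether.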

## Objective

In [19]:
from baybe.targets import NumericalTarget
from baybe.objective import Objective

target = NumericalTarget(
    name="Efficiency",
    mode="MAX",
)
objective = Objective(mode="SINGLE", targets=[target])


In [20]:
from baybe.searchspace import SearchSpace

searchspace = SearchSpace.from_product(parameters)

## Recommender

In [21]:
from baybe import Campaign

campaign = Campaign(searchspace, objective)

df_recommend = campaign.recommend(batch_size=3)
print(df_recommend)

     Time (h)        pH  Inhibitor Concentration (M)  Salt_Concentrat_M
0  139.744749  9.907613                     0.003735           0.418879
1  516.446510  8.229525                     0.009470           0.463482
2   99.505625  7.063403                     0.030336           0.543470


ValueError: operands could not be broadcast together with shapes (6,) (3,4) 
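
Because all four parameters are continuous, the recommended conditions will generally not coincide with any experiment already recorded in the data files. One option for evaluating recommendations against existing data is to look up the closest recorded experiment after scaling each parameter to its observed range. This is an assumption made for illustration, not the notebook's method. In the sketch below, the helper `nearest_rows` and the toy `data` frame are hypothetical, while the `recommended` frame reuses the three rows printed above.

```python
import numpy as np
import pandas as pd

COLUMNS = ["Time (h)", "pH", "Inhibitor Concentration (M)", "Salt_Concentrat_M"]


def nearest_rows(recommended: pd.DataFrame, data: pd.DataFrame, columns: list) -> pd.DataFrame:
    """For each recommended point, return the closest row of data (range-scaled Euclidean distance)."""
    lo = data[columns].min()
    # Guard against constant columns, which would otherwise give a zero range.
    span = (data[columns].max() - lo).replace(0, 1.0)
    rec = ((recommended[columns] - lo) / span).to_numpy()
    dat = ((data[columns] - lo) / span).to_numpy()
    # Pairwise squared distances with shape (n_recommended, n_data).
    d2 = ((rec[:, None, :] - dat[None, :, :]) ** 2).sum(axis=2)
    return data.iloc[d2.argmin(axis=1)]


# The three recommendations printed above.
recommended = pd.DataFrame(
    [
        [139.744749, 9.907613, 0.003735, 0.418879],
        [516.446510, 8.229525, 0.009470, 0.463482],
        [99.505625, 7.063403, 0.030336, 0.543470],
    ],
    columns=COLUMNS,
)

# Toy stand-in for the measured data, with an illustrative Efficiency column.
data = pd.DataFrame(
    [
        [144.0, 10.0, 0.003, 0.40, 80.0],
        [504.0, 8.0, 0.01, 0.50, 65.0],
        [96.0, 7.0, 0.03, 0.55, 40.0],
        [24.0, 4.0, 0.0005, 0.01, 20.0],
    ],
    columns=COLUMNS + ["Efficiency"],
)

matched = nearest_rows(recommended, data, COLUMNS)
print(matched)
```

Scaling by the observed range keeps the time axis, which spans hundreds of hours, from dominating the distance. A distance cutoff could be added so that recommendations far from any recorded experiment are flagged rather than matched.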

# Benchmarking

# Transfer Learning