# 1. Introduction

This notebook demonstrates how to build a Bayesian optimization task with manual acquisition. Manual acquisition means that the notebook proposes which new samples should be created, but NOMAD users need to create them manually. This is required when sample creation cannot be automated or fully controlled by the notebook, e.g. when it requires manual work in a lab.

## What is Bayesian Optimization?

Bayesian Optimization is a technique for finding the best solution to a problem when testing every possible option is too time-consuming or expensive. Imagine you're trying to make the perfect cake, but each ingredient combination takes a lot of time and effort to try. Instead of baking every possible cake, Bayesian Optimization helps you decide which combinations to try next, based on what you’ve learned so far.

It does this by building a probabilistic model (often called a surrogate model) that predicts how good different options might be, together with how uncertain those predictions are. After a few options have been tried, the model suggests the next option to test: one that has a good chance of being the best, or one that teaches you something new. Over time, this approach zeroes in on the best solution without needing to try everything.

This method is widely used in areas like machine learning, where testing models can be very expensive, or in engineering, where designing experiments can be costly. It’s like having a smart assistant that helps you explore the most promising paths first, saving you time and resources.
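The loop described above can be sketched in a few lines of NumPy. This is an illustrative toy and not how baybe implements it. It assumes a one-dimensional problem on [0, 1], a Gaussian-process surrogate with a fixed squared-exponential kernel (the length scale 0.2 is an arbitrary choice), and expected improvement as the rule for picking the next point. All function names are hypothetical.

```python
import math

import numpy as np


def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential covariance between two 1-D arrays of points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Posterior mean and standard deviation of a Gaussian process.

    The observations are centred so that the prior mean equals their average.
    """
    y_offset = y_train.mean()
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    mean = y_offset + k_qt @ np.linalg.solve(k_tt, y_train - y_offset)
    v = np.linalg.solve(k_tt, k_qt.T)
    var = 1.0 - np.sum(k_qt * v.T, axis=1)  # prior variance k(x, x) is 1
    return mean, np.sqrt(np.clip(var, 1e-12, None))


def expected_improvement(mean, std, best):
    """Expected improvement over `best` (maximization).

    High where the predicted value is high (exploitation) or where the
    prediction is uncertain (exploration).
    """
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def bayesian_optimize(objective, n_initial=3, n_iter=10, seed=0):
    """Maximize `objective` on [0, 1]: random start, then GP + EI."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)  # candidate points to choose from
    x = rng.uniform(0.0, 1.0, n_initial)  # a few random initial experiments
    y = np.array([objective(v) for v in x])
    for _ in range(n_iter):
        mean, std = gp_posterior(x, y, grid)  # update beliefs from all data
        ei = expected_improvement(mean, std, y.max())
        x_next = grid[np.argmax(ei)]  # most promising next experiment
        x = np.append(x, x_next)
        y = np.append(y, objective(x_next))
    best = np.argmax(y)
    return x[best], y[best]


x_best, y_best = bayesian_optimize(lambda v: -((v - 0.7) ** 2))
print(f'best x = {x_best:.3f}, best value = {y_best:.4f}')
```

On this toy objective, the best point found should lie close to 0.7 after only 13 evaluations. The exact numbers depend on the random initial points.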

# 2. Defining the search space

The first task is to define the search space for the optimization. This simply means listing the parameters that can be controlled, together with reasonable limits for each of them.

In the NOMAD context, this means that we define a list of `quantities` that are used as the inputs, and give them some ranges. These quantities need to be part of some existing schema that describes the samples that will be created. E.g. for an experiment, they could be quantities in a schema that describes how the sample is created. If you do not already have a schema for the samples you are working with, please head over to our documentation on creating new schemas.

In [1]:
from baybe.parameters import CategoricalParameter, NumericalContinuousParameter
from baybe.searchspace import SearchSpace

api_url = ''
schema_name = 'nomad_bayesian_optimization.schema_packages.experiments.CVDExperiment'

parameters = [
    CategoricalParameter(
        name='substrate',
        values=['Silicon carbide', 'Silicon', 'Gallium nitride'],
        encoding='OHE',  # one-hot encoding of categories
    ),
    NumericalContinuousParameter(
        name='temperature',
        bounds=(300, 600),
    ),
    NumericalContinuousParameter(
        name='gas_flow_rate',
        bounds=(0.2, 5),
    ),
]

searchspace = SearchSpace.from_product(parameters)
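Setting `encoding='OHE'` tells baybe to represent the `substrate` categories with one-hot encoding. This way, the surrogate model works with numbers instead of strings, and no artificial ordering between substrates is implied. The idea can be sketched in plain Python. The helper `one_hot` is hypothetical and does not reproduce the column names baybe uses internally.

```python
def one_hot(value, categories):
    """Encode `value` as a list of 0/1 flags, one flag per category."""
    if value not in categories:
        raise ValueError(f'unknown category: {value!r}')
    return [1 if value == category else 0 for category in categories]


substrates = ['Silicon carbide', 'Silicon', 'Gallium nitride']
print(one_hot('Silicon', substrates))  # [0, 1, 0]
```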

# 3. Define the optimization target

Next, we need to define the optimization target. Depending on the use case, we may want a certain property to achieve a specific value, or we may want to minimize or maximize some value.

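In the NOMAD context, we need to define a quantity in our sample schema to use as the optimization target, and then specify what an optimal value is. Here, the target is the `refractive_index` quantity, which should match the value 2.6473, with the bounds set to ±0.2 around it.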

In [2]:
from baybe.objectives import SingleTargetObjective
from baybe.targets import NumericalTarget

refractive_index_target = 2.6473
refractive_index_sigma = 0.2
target = NumericalTarget(
    name='refractive_index',
    mode='MATCH',
    bounds=(
        refractive_index_target - refractive_index_sigma,
        refractive_index_target + refractive_index_sigma,
    ),
    transformation='BELL',
)
objective = SingleTargetObjective(target=target)
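With `mode='MATCH'` and `transformation='BELL'`, each measured value is mapped to a desirability score that peaks in the middle of `bounds`. The following is a minimal sketch. It assumes the bell is a Gaussian centred at the midpoint of the bounds, with a standard deviation of half the bound width. That is our reading of baybe's bell transformation, so check the documentation for the baybe version you use. The function `bell_desirability` is hypothetical.

```python
import math


def bell_desirability(value, lower, upper):
    """Gaussian bell centred between the bounds (assumed shape, see above)."""
    center = (lower + upper) / 2
    sigma = (upper - lower) / 2
    return math.exp(-((value - center) ** 2) / (2 * sigma**2))


lower, upper = 2.6473 - 0.2, 2.6473 + 0.2
print(bell_desirability(2.6473, lower, upper))  # 1.0 at the target value
print(round(bell_desirability(upper, lower, upper), 4))  # 0.6065 at a bound
```

Under this assumption, a value at the edge of the bounds still scores about 0.61. The bounds therefore set the width of the bell rather than a hard cut-off.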

# 4. Define the acquisition strategy

Acquisition is the process of suggesting which new samples should be tried out. Here you control how the algorithm performs this step, e.g. by balancing "exploration" (sampling regions of the search space that are still poorly understood) against "exploitation" (sampling near the best results found so far). In other words, you control how adventurously the algorithm picks new samples, given the existing ones. When starting an acquisition from scratch, the process often runs in two phases. First, samples are picked more or less randomly from the search space. Then a model-based approach takes over and uses the knowledge gained so far.

Fully understanding and properly tuning this step requires knowledge of Bayesian optimization theory. We do, however, give some reasonable defaults here for different optimization use cases.

In [3]:
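from baybe.recommenders import NaiveHybridSpaceRecommender, TwoPhaseMetaRecommender

# Phase 1: before any measurements exist, the meta recommender uses its
# initial recommender (random sampling by default).
# Phase 2: afterwards, NaiveHybridSpaceRecommender takes over. It handles search
# spaces that mix discrete (substrate) and continuous (temperature, gas flow)
# parameters.
recommender = TwoPhaseMetaRecommender(recommender=NaiveHybridSpaceRecommender())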
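The two-phase idea and the exploration/exploitation trade-off can be illustrated with a toy selection rule. This sketch is hypothetical. It uses the upper confidence bound (UCB) acquisition function with a tunable `kappa`, which is only one of several possible acquisition functions and not necessarily what baybe uses by default. The model predictions are made-up numbers chosen for illustration.

```python
import random


def ucb_score(mean, std, kappa):
    """Upper confidence bound: a larger kappa rewards uncertainty more."""
    return mean + kappa * std


def pick_next(candidates, n_measurements, kappa=2.0, rng=None):
    """Hypothetical two-phase rule: random first, then model-based.

    `candidates` is a list of (name, predicted_mean, predicted_std) tuples.
    """
    rng = rng or random.Random(0)
    if n_measurements == 0:
        # Phase 1: nothing measured yet, so the model has nothing to learn from.
        return rng.choice(candidates)[0]
    # Phase 2: choose the candidate with the highest UCB score.
    return max(candidates, key=lambda c: ucb_score(c[1], c[2], kappa))[0]


# 'A' is predicted to be good and well understood; 'B' is worse but uncertain.
candidates = [('A', 2.5, 0.05), ('B', 2.2, 0.5)]
print(pick_next(candidates, n_measurements=5, kappa=0.1))  # A (exploitation)
print(pick_next(candidates, n_measurements=5, kappa=2.0))  # B (exploration)
```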

# 5. Define how new samples are fetched

In [4]:
import numpy as np

from nomad_bayesian_optimization.schema_packages.experiments import CVDExperiment


def get_samples(recommendations):
    """In this function you can decide how the actual experiment/simulation is
    performed. There are several alternatives:

    - Maybe you can control measurement devices directly through API calls.
    - Maybe you create a loop that waits until someone manually inserts the
      experiment results into NOMAD, and then query the results from it using
      the NOMAD API.
    - Maybe you run a simulation in this notebook.
    - Maybe you run a simulation using an HPC batch system.

    In this example, we create an entry by sampling from a fake model. The
    optimization loop below requests one recommendation at a time
    (batch_size=1), so only the first row of `recommendations` is used.
    """
    row = recommendations.iloc[0]
    cvd_experiment = CVDExperiment().m_from_dict(row.to_dict())

    # Fake model: the refractive index peaks at 2.6473 for temperature=400 and
    # gas_flow_rate=2, and decays as a Gaussian-shaped curve in each parameter.
    temp_mu = 400
    temp_sigma = 200
    gas_flow_mu = 2
    gas_flow_sigma = 3
    ideal_substrate = 'Silicon carbide'
    # `.m` is the magnitude of the unit-aware (pint) quantity.
    refractive_index = float(
        2.6473
        * np.exp(-((cvd_experiment.temperature.m - temp_mu) ** 2 / temp_sigma**2))
        * np.exp(
            -(
                (cvd_experiment.gas_flow_rate.m - gas_flow_mu) ** 2
                / gas_flow_sigma**2
            )
        )
    )
    # Any substrate other than the ideal one lowers the result by 10 %.
    if cvd_experiment.substrate != ideal_substrate:
        refractive_index *= 0.9
    cvd_experiment.refractive_index = refractive_index
    return cvd_experiment
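For manual acquisition, the docstring's second option (waiting until someone uploads results to NOMAD) comes down to a polling loop. Below is a minimal sketch. `fetch_result` is a hypothetical callable you would implement yourself, for example with a NOMAD API query that returns the measured value or `None` if the entry does not exist yet. The polling interval and timeout defaults are arbitrary.

```python
import time


def wait_for_result(fetch_result, poll_interval=60.0, timeout=None):
    """Call `fetch_result` until it returns something other than None.

    `fetch_result` is a hypothetical callable (e.g. a NOMAD API query).
    Raises TimeoutError if `timeout` seconds pass without a result.
    """
    start = time.monotonic()
    while True:
        result = fetch_result()
        if result is not None:
            return result
        if timeout is not None and time.monotonic() - start > timeout:
            raise TimeoutError('no result was uploaded in time')
        time.sleep(poll_interval)  # wait before asking again


# Simulated uploads: nothing for two polls, then a measured value appears.
responses = iter([None, None, 2.61])
print(wait_for_result(lambda: next(responses), poll_interval=0))  # 2.61
```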


# 6. Start the optimization loop

In [None]:
import json

from baybe import Campaign
from nomad.datamodel import EntryArchive

from nomad_bayesian_optimization.schema_packages.bayesian_optimization import (
    BayesianOptimization,
)

# Start a new optimization task. This task will run until the desired accuracy
# has been achieved. You can leave this running as a NORTH tool and come back to
# it later.
campaign = Campaign(searchspace, objective, recommender)
i = 0
result = 0
threshold = 0.05
while abs(refractive_index_target - result) > threshold:
    df = campaign.recommend(batch_size=1)
    print('New recommendation:')
    print(df)
    print('Start testing recommendation...')
    archive = get_samples(df)
    result = archive.refractive_index
    print(f'Testing finished, refractive_index: {result}')
    df['refractive_index'] = [result]
    campaign.add_measurements(df)
print('Optimization finished!')

# At the end of the run, let's store the whole optimization run in an entry
archive = EntryArchive()
bopt = BayesianOptimization.from_baybe(campaign)
archive.data = bopt
bopt.normalize(archive, None)
with open('example.archive.json', 'w') as fout:
    json.dump(archive.m_to_dict(), fout, indent=2)

New recommendation:
         substrate  gas_flow_rate  temperature
0  Gallium nitride       0.388846    466.73735
Start testing recommendation...
Testing finished, refractive_index: 1.597450224449405
Optimization finished!
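The `while` loop above only stops once a result lands within `threshold` of the target. If that target is hard to reach, the loop can run indefinitely. If you prefer a bounded run, one option is to cap the number of rounds. The sketch below is hypothetical. `recommend` and `evaluate` stand in for `campaign.recommend` and `get_samples`, and `max_iterations` is an illustrative parameter, not part of baybe.

```python
import math


def optimize(recommend, evaluate, target, threshold, max_iterations=50):
    """Run recommend/evaluate rounds until a result is within `threshold`.

    Returns (candidate, result, history); raises RuntimeError if the
    threshold is not reached within `max_iterations` rounds.
    """
    history = []
    for _ in range(max_iterations):
        candidate = recommend(history)
        result = evaluate(candidate)
        history.append((candidate, result))
        if abs(target - result) <= threshold:
            return candidate, result, history
    raise RuntimeError(f'no result within {threshold} after {max_iterations} rounds')


# Toy stand-ins: sweep the temperature in steps of 50 and evaluate only the
# temperature part of the fake model used in get_samples.
candidate, result, history = optimize(
    recommend=lambda h: 300 + 50 * len(h),
    evaluate=lambda t: 2.6473 * math.exp(-((t - 400) ** 2) / 200**2),
    target=2.6473,
    threshold=0.05,
)
print(candidate, result, len(history))  # 400 2.6473 3
```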
