# Self-Driving Lab Demo: Light Mixing

[Pico W microcontroller setup](https://projects.raspberrypi.org/en/projects/get-started-pico-w/1)

[YouTube tutorial](https://www.youtube.com/watch?v=D54yfxRSY6s&t=557s)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sparks-baird/self-driving-lab-demo/blob/main/notebooks/4.2-paho-mqtt-colab-sdl-demo-test.ipynb)

In [None]:
!pip3 install self-driving-lab-demo

Next, we'll import the `SelfDrivingLabDemoLight` class and
instantiate it. We'll pass an observation function compatible with MQTT (i.e. the interface
that makes this demo cloud-accessible), the pico ID (`pico_id`), and a session ID (`session_id`).

The pico ID is a unique identifier for the microcontroller (e.g. `a123b456`). The publicly
accessible remote SDL-Demo uses the value `'test'`; to run this notebook against that device,
set `PICO_ID = 'test'`. If you run into problems with the physical demo, you can also check the box for `dummy` (i.e., set `dummy=True`), which runs a very basic simulation instead of a physical experiment on the hardware. Normally, the results are logged to a database, but to keep the time per iteration short, we skip that in this tutorial. To turn logging back on, check the box for `log_to_database` (i.e., set `log_to_database=True`).

To make sure that experiments that are requested at the
same time don't get mixed up, an experiment ID (`experiment_id`) is generated
internally for each experiment. We can also pass a session ID (`session_id`) to make it easier to
distinguish experiments from multiple sessions, though in-depth treatment of database management is planned for a separate tutorial (TBD).




In [3]:
from uuid import uuid4
import random
from self_driving_lab_demo import (
    SelfDrivingLabDemoLight,
    mqtt_observe_sensor_data,
    get_paho_client,
)

PICO_ID = 'e6616407e352322c' # Change to 'test' for remote device
dummy = False
log_to_database = False 
SESSION_ID = str(uuid4())
print(f'session ID: {SESSION_ID}')

R_target = 60 
G_target = 15 
B_target = 20 

target_inputs = {'R': R_target, 'G': G_target, 'B': B_target}

# Instantiate client once and reuse (to avoid opening too many connections)
client = get_paho_client(f'sdl-demo/picow/{PICO_ID}/as7341/')

sdl = SelfDrivingLabDemoLight(
    autoload=True,  # perform target data experiment automatically
    target_inputs=target_inputs, # if None, then defaults to random color using `target_seed` attribute
    simulation=dummy, # run simulation instead of physical experiment
    observe_sensor_data_fn=mqtt_observe_sensor_data,  # (default)
    observe_sensor_data_kwargs=dict(
        pico_id=PICO_ID, session_id=SESSION_ID, client=client, mongodb=log_to_database,
    ),
    target_seed=random.randint(0, 100000)
)

session ID: 7034207d-0bcc-4c09-bbb3-1f91afedaef3
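To illustrate how a per-experiment ID can keep simultaneous requests apart, here is a minimal sketch using only the standard library. The payload field names (`_session_id`, `_experiment_id`) and both helper functions are hypothetical and chosen for illustration; the package's actual message schema is not shown in this notebook.

```python
import json
from uuid import uuid4

PICO_ID = "test"  # ID of the public demo device mentioned above
SESSION_ID = str(uuid4())


def make_command(rgb, session_id):
    """Build a JSON command tagged with a fresh experiment ID (hypothetical schema)."""
    experiment_id = str(uuid4())
    payload = {**rgb, "_session_id": session_id, "_experiment_id": experiment_id}
    return experiment_id, json.dumps(payload)


def is_my_result(message, experiment_id):
    """Return True only if the reply carries the experiment ID we sent."""
    return json.loads(message).get("_experiment_id") == experiment_id


topic = f"sdl-demo/picow/{PICO_ID}/as7341/"  # same topic pattern as the cell above

# Two identical requests issued in the same session still get distinct IDs
exp_a, cmd_a = make_command({"R": 0, "G": 55, "B": 0}, SESSION_ID)
exp_b, cmd_b = make_command({"R": 0, "G": 55, "B": 0}, SESSION_ID)

# Simulate the device echoing the experiment ID back with a reading
reply_b = json.dumps({"ch510": 9474, "_experiment_id": exp_b})
print(is_my_result(reply_b, exp_a))  # False
print(is_my_result(reply_b, exp_b))  # True
```

Because every request carries its own ID, two users sending the same RGB values at the same moment can still tell their replies apart.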


Next, we'll observe the sensor data for the following red/green/blue (RGB) values. The microcontroller will briefly turn the LED green and collect the data from the spectrophotometer, then publish this data to the HiveMQ MQTT server, the go-between for the microcontroller and this notebook.

In [4]:
sdl.observe_sensor_data({'R': 0, 'G': 55, 'B': 0})

{'utc_time_str': '2024-3-31 20:13:22',
 'utc_timestamp': 1711916002,
 'ch470': 6845,
 'ch550': 1453,
 'ch670': 754,
 'ch410': 293,
 'logged_to_mongodb': False,
 'background': {'ch583': 300,
  'ch670': 539,
  'ch510': 493,
  'ch410': 119,
  'ch620': 392,
  'ch470': 2363,
  'ch550': 490,
  'ch440': 549},
 'ch620': 611,
 'sd_card_ready': True,
 'ch510': 9474,
 'ch583': 498,
 'device_nickname': 'For MongoDB, enter whatever name you want here (optional)',
 'ch440': 633,
 'onboard_temperature_K': 303.9395,
 'encrypted_device_id_truncated': 'test'}
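The returned dictionary mixes channel readings with metadata. As a rough sketch, the channel entries can be pulled out and sorted into a spectrum. This assumes that the number in each `chNNN` key is the channel's nominal wavelength in nanometers; `to_spectrum` is a hypothetical helper, not part of the package.

```python
import re

# Channel values and two metadata fields copied from the output above
reading = {
    "ch470": 6845, "ch550": 1453, "ch670": 754, "ch410": 293,
    "ch620": 611, "ch510": 9474, "ch583": 498, "ch440": 633,
    "sd_card_ready": True, "utc_timestamp": 1711916002,
}


def to_spectrum(data):
    """Return (wavelength_nm, counts) pairs sorted by wavelength."""
    pairs = []
    for key, value in data.items():
        match = re.fullmatch(r"ch(\d+)", key)
        if match:  # skip metadata keys such as 'utc_timestamp'
            pairs.append((int(match.group(1)), value))
    return sorted(pairs)


spectrum = to_spectrum(reading)
peak_nm, peak_counts = max(spectrum, key=lambda pair: pair[1])
print(spectrum)
print(peak_nm)  # 510, the strongest channel for this green LED setting
```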

## 'Hello, World!' of Optimization

Now, let's do the 'Hello, World!' of optimization tasks and compare grid search vs.
random search vs. Bayesian optimization. If you don't know what those are, see [this
Towards Data Science
Post](https://towardsdatascience.com/grid-search-vs-random-search-vs-bayesian-optimization-2e68f57c3c46).
These algorithms are the artificial intelligence that suggests which experiment to run next (though grid and random search are 'uninformed' methods). We will use the predefined search space shown below, with RGB values capped at 35% power (89 out of 255), since a NeoPixel LED at 100% power can be painful to look at directly. You can still manually send commands that use RGB values up to 255. Note that `atime`, `astep`, and `gain` (sensor parameters) are fixed for the following search campaigns.

In [7]:
sdl.bounds

{'R': [0, 89],
 'G': [0, 89],
 'B': [0, 89],
 'atime': [0, 255],
 'astep': [0, 65534],
 'gain': [0.5, 512]}
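To make the two uninformed methods concrete, the sketch below generates candidate RGB inputs within the bounds above using only the standard library. `grid_points` and `random_points` are hypothetical helpers; the package's own `grid_search` and `random_search` may place points differently (for example, a grid that avoids the edges).

```python
import itertools
import random

MAX_PWM = 255
upper = int(0.35 * MAX_PWM)  # 89, matching the R/G/B upper bound above


def grid_points(lower, upper, points_per_axis):
    """Evenly spaced integer grid over R, G, and B (endpoints included)."""
    step = (upper - lower) / (points_per_axis - 1)
    axis = [round(lower + i * step) for i in range(points_per_axis)]
    return [dict(zip("RGB", combo)) for combo in itertools.product(axis, repeat=3)]


def random_points(lower, upper, n, seed=0):
    """Uniformly random integer RGB points within the bounds."""
    rng = random.Random(seed)
    return [{c: rng.randint(lower, upper) for c in "RGB"} for _ in range(n)]


grid = grid_points(0, upper, 3)     # 3 levels per axis -> 3**3 = 27 points
rand = random_points(0, upper, 27)  # same budget, no structure
print(len(grid), grid[0], grid[-1])
print(rand[:3])
```

A budget of 27 experiments corresponds to exactly three levels per color channel in a full grid, which is why grid search covers each axis so coarsely.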

### Run Search Algorithms

Next, we'll use some convenience functions to run each of the searches. The following cell may take approximately 20 minutes to run. The `ax_bayesian_optimization` function uses the [Ax Platform](https://ax.dev/) to run Bayesian optimization.

In [None]:
from self_driving_lab_demo.utils.search import (
    grid_search,
    random_search,
    ax_bayesian_optimization,
)

num_iter = 27

grid, grid_data = grid_search(sdl, num_iter)
random_inputs, random_data = random_search(sdl, num_iter)
best_parameters, values, experiment, model = ax_bayesian_optimization(sdl, num_iter)

## Results

### Best error so far vs. iteration

Let's compare how each optimization algorithm did as a function of the number of iterations. The faster the error goes down, the better.

In [50]:
import plotly.express as px
import pandas as pd
grid_input_df = pd.DataFrame(grid)
grid_output_df = pd.DataFrame(grid_data)[['frechet']]
grid_df = pd.concat([grid_input_df, grid_output_df], axis=1)
grid_df['best_so_far'] = grid_df['frechet'].cummin()

random_input_df = pd.DataFrame(random_inputs, columns=['R', 'G', 'B'])
random_output_df = pd.DataFrame(random_data)[['frechet']]
random_df = pd.concat([random_input_df, random_output_df], axis=1)
random_df['best_so_far'] = random_df['frechet'].cummin()

trials = list(experiment.trials.values())
bayes_input_df = pd.DataFrame([t.arm.parameters for t in trials])
bayes_output_df = pd.Series([t.objective_mean for t in trials], name='frechet').to_frame()
bayes_df = pd.concat([bayes_input_df, bayes_output_df], axis=1)
bayes_df['best_so_far'] = bayes_df['frechet'].cummin()

grid_df['type'] = 'grid'
random_df['type'] = 'random'
bayes_df['type'] = 'bayesian'
df = pd.concat([grid_df, random_df, bayes_df], axis=0)
px.line(df, x=df.index, y='best_so_far', color='type').update_layout(
    xaxis_title='iteration',
    yaxis_title='Best error so far',
)
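The `best_so_far` columns above are running minima of the error. The same idea in plain Python, using made-up error values purely for illustration:

```python
from itertools import accumulate

errors = [12.0, 9.5, 11.0, 4.2, 6.3, 4.0]  # illustrative values, not measured data

# Running minimum: each entry is the lowest error seen up to that iteration
best_so_far = list(accumulate(errors, min))
print(best_so_far)  # [12.0, 9.5, 9.5, 4.2, 4.2, 4.0]
```

The curve never rises, so a worse later experiment (such as 11.0 after 9.5) leaves the plotted value unchanged.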

### Observed Points and Corresponding Errors

Let's take a look at the points that were observed for each of the search algorithms. The axes correspond to red (R), green (G), and blue (B) input values, and the color corresponds to 'Fréchet distance' (pronounced like freh-shay). Fréchet distance is a measure of how close the measured spectrum is to the target and
should be considered simply as an error metric for this demo. Lower Fréchet distance is better,
and zero Fréchet distance is perfect.

In [42]:
px.scatter_3d(grid_df, x='R', y='G', z='B', color='frechet', title='grid')

In [43]:
px.scatter_3d(random_df, x='R', y='G', z='B', color='frechet', title='random')

In [45]:
px.scatter_3d(bayes_df, x='R', y='G', z='B', color='frechet', title='Bayesian')
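For readers curious about what a Fréchet-style error measures, here is a minimal sketch of the textbook discrete Fréchet distance between two curves given as lists of points. This is not necessarily how the package computes its `frechet` value (any normalization or choice of channels is not shown in this notebook), and the example curves are invented.

```python
import math


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two polylines of (x, y) points."""
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]  # ca[i][j]: coupling distance up to p[i], q[j]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[-1][-1]


# Illustrative (wavelength, intensity) points, not measured data
target = [(410, 1.0), (510, 5.0), (620, 2.0)]
shifted = [(x, y + 3.0) for x, y in target]

print(discrete_frechet(target, target))   # 0.0 for identical curves
print(discrete_frechet(target, shifted))  # 3.0, the constant vertical offset
```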

### Accuracy

Finally, we can look at how closely the best experiment from each algorithm matches the true target inputs. You may need to rotate the image to get a better view.

In [46]:
target_inputs = sdl.get_target_inputs()
true_inputs = pd.DataFrame(
    {key: target_inputs[key] for key in target_inputs}, index=[0]
)
true_inputs['type'] = 'true'
# `idxmin` returns an index label, so select the row with `.loc`
best_grid_inputs = grid_df.loc[grid_df['frechet'].idxmin()][['R', 'G', 'B', 'type']]
best_random_inputs = random_df.loc[random_df['frechet'].idxmin()][
    ['R', 'G', 'B', 'type']
]
best_bayes_inputs = bayes_df.loc[bayes_df['frechet'].idxmin()][['R', 'G', 'B', 'type']]

best_df = pd.concat([best_grid_inputs, best_random_inputs, best_bayes_inputs], axis=1).T
best_df['marker'] = 'observed'
true_inputs['marker'] = 'target'
best_df = pd.concat([best_df, true_inputs], axis=0)
bnds = sdl.bounds
fig = px.scatter_3d(
    best_df, x='R', y='G', z='B', color='type', symbol='marker', title='best'
).update_layout(
    scene=dict(
        xaxis=dict(
            nticks=4,
            range=[bnds['R'][0], bnds['R'][1]],
        ),
        yaxis=dict(
            nticks=4,
            range=[bnds['G'][0], bnds['G'][1]],
        ),
        zaxis=dict(
            nticks=4,
            range=[bnds['B'][0], bnds['B'][1]],
        ),
    ),
)
fig.update_traces(marker={'opacity': 0.75})
fig.data[-1].marker.symbol = 'diamond-open'
fig
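A single number can complement the 3D view. The sketch below computes the Euclidean distance in RGB input space between a candidate and the target. The candidate values are hypothetical placeholders, not results from the searches above, and `input_distance` is an illustrative helper.

```python
import math

target = {"R": 60, "G": 15, "B": 20}  # target inputs from the setup cell

# Hypothetical candidates for illustration only (not measured results)
candidates = {
    "candidate_a": {"R": 60, "G": 15, "B": 25},
    "candidate_b": {"R": 57, "G": 19, "B": 20},
}


def input_distance(a, b):
    """Euclidean distance between two RGB input dictionaries."""
    return math.dist([a[k] for k in "RGB"], [b[k] for k in "RGB"])


for name, rgb in candidates.items():
    print(name, input_distance(rgb, target))  # both are 5.0 here
```

With real data, the rows in `best_df` could be passed through the same function to rank the three methods by how close their best inputs landed to the target.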