# Two-Interface Nanoparticle Discovery with updated BO agent

**Notebook last update: 3/27/2021** (clean up)

This notebook covers the discovery of two-interface NPs with BO, using SPBCL synthesis and STEM-EDS characterization, as reported in Wahl et al., *to be submitted*, 2021. It is the last exploratory acquisition step in the manuscript and uses all data collected up to this point.

In [1]:
import pandas as pd
from IPython.display import display
import matplotlib.pyplot as plt
import numpy as np
import os
import itertools
import io
from matminer.featurizers.composition import ElementProperty
from pymatgen.core import Composition  # root-level import was removed in newer pymatgen releases

from nanoparticle_project import EmbedCompGPUCB, get_comps, compare_to_seed

path = os.getcwd()

Let's load the NP data, including the samples collected in prior BO steps of this study:

In [2]:
df = pd.read_pickle('megalibray_updated_dec15-2020.pickle')
df = df.sample(frac=1, random_state=42) # shuffling the dataframe
df['target'] = -1*np.abs(df["Interfaces"]-2) # target is two interfaces
df = df[~df.duplicated()]
df

Unnamed: 0,Au%,Ag%,Cu%,Co%,Ni%,Pt%,Pd%,Sn%,Phases,Interfaces,Composition,n_elems,target
mirkin_r4_4,0.14,0.00,0.15,0.44,0.18,0.00,0.09,0.00,2,1,"(Co, Ni, Cu, Pd, Au)",5,-1
mirkin_r4_16,0.54,0.10,0.06,0.08,0.23,0.00,0.00,0.00,2,1,"(Co, Ni, Cu, Ag, Au)",5,-1
108,0.27,0.00,0.00,0.34,0.00,0.00,0.27,0.12,3,2,"(Au, Co, Pd, Sn)",4,0
mirkin_r2_12,0.31,0.00,0.15,0.43,0.10,0.00,0.00,0.00,2,1,"(Ni, Cu, Au, Co)",4,-1
41,0.32,0.19,0.27,0.00,0.22,0.00,0.00,0.00,3,2,"(Au, Ag, Cu, Ni)",4,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
117,0.00,0.00,0.00,0.50,0.00,0.00,0.32,0.18,2,1,"(Co, Pd, Sn)",3,-1
166,0.54,0.00,0.00,0.00,0.00,0.46,0.00,0.00,2,1,"(Au, Pt)",2,-1
98,0.33,0.00,0.00,0.34,0.00,0.00,0.18,0.15,3,3,"(Au, Co, Pd, Sn)",4,-1
mirkin_r3_0,0.08,0.06,0.02,0.30,0.55,0.00,0.00,0.00,2,1,"(Ni, Co, Ag, Au, Cu)",5,-1
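The `target` column defined in the cell above is `-|Interfaces - 2|`, so it equals 0 for a two-interface particle and becomes more negative as the interface count moves away from two. Maximizing it therefore steers the agent toward two-interface NPs. A minimal sketch on a toy frame (the `toy` name and its values are illustrative, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Toy interface counts (illustrative values, not rows from the pickle file)
toy = pd.DataFrame({"Interfaces": [0, 1, 2, 3, 4]})

# Same transformation as in the notebook cell: the maximum (0) is reached
# only when a particle has exactly two interfaces.
toy["target"] = -1 * np.abs(toy["Interfaces"] - 2)

print(toy)
```

Note that one-interface and three-interface particles both map to -1, which matches the `target` values shown in the table above.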


Let's also create our featurized composition space:

In [3]:
ep = ElementProperty.from_preset(preset_name='magpie')
featurized_df = ep.featurize_dataframe(df[ ['Composition','target'] ],'Composition').drop('Composition',axis=1)

ElementProperty:   0%|          | 0/215 [00:00<?, ?it/s]

Create the search space (we will load and featurize it):

In [4]:
elements = ['Au%', 'Ag%', 'Cu%', 'Co%', 'Ni%', 'Pd%', 'Sn%'] # We'll make Pt-free acquisitions
candidate_data = pd.read_pickle('megalibray_updated_candidate_data_dec15-2020.pickle')
candidate_data['Composition'] = candidate_data.apply(get_comps,axis=1)
candidate_feats = ep.featurize_dataframe(candidate_data, 'Composition')
candidate_feats = candidate_feats.drop(elements+['Pt%']+['Composition'],axis=1)
candidate_data = candidate_data.drop(['Composition'],axis=1)

ElementProperty:   0%|          | 0/7581 [00:00<?, ?it/s]

Now we will partition our search space into ternary, quaternary and pentanary NPs, and use the BO agent to suggest two-interface particles in each subspace. The research team will then select the NP compositions of interest from the top 10 suggestions in each:

In [5]:
seed_data = featurized_df
ternaries = candidate_data[ ((candidate_data != 0).sum(axis=1) == 3)]
ternary_feats = candidate_feats.loc[ternaries.index]
quaternaries = candidate_data[ ((candidate_data != 0).sum(axis=1) ==4)]
quaternary_feats = candidate_feats.loc[quaternaries.index]
pentanaries = candidate_data[ ((candidate_data != 0).sum(axis=1) == 5)]
pentanary_feats = candidate_feats.loc[pentanaries.index]
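The partition above counts the nonzero element fractions in each candidate row and then reuses the row index to pull the matching feature rows. A small sketch of the same pattern on a made-up grid (all `toy_*` names and values are hypothetical):

```python
import pandas as pd

# Hypothetical candidate grid: each row is a composition, columns are element fractions
toy_candidates = pd.DataFrame(
    {
        "Au%": [0.5, 0.0, 0.2, 1.0],
        "Ag%": [0.5, 0.3, 0.3, 0.0],
        "Cu%": [0.0, 0.7, 0.5, 0.0],
    },
    index=[10, 11, 12, 13],
)

# Hypothetical feature table sharing the same index as the candidates
toy_feats = pd.DataFrame({"feat": [1.0, 2.0, 3.0, 4.0]}, index=toy_candidates.index)

# Number of elements present in each composition
n_elems = (toy_candidates != 0).sum(axis=1)

# Select by element count, then align the features through the shared index
toy_binaries = toy_candidates[n_elems == 2]
toy_binary_feats = toy_feats.loc[toy_binaries.index]

print(toy_binaries)
print(toy_binary_feats)
```

This only works because the candidate and feature frames share an index, which is why the notebook drops the composition columns from `candidate_feats` without resetting it.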

In [6]:
# ternaries
agent = EmbedCompGPUCB(n_query=10)
suggestions_ternary = agent.get_hypotheses(candidate_data=ternary_feats, seed_data=seed_data)
display(ternaries.loc[ suggestions_ternary.index])
compare_to_seed(ternaries.loc[ suggestions_ternary.index ], df)

- beta**0.5:0:  0.32091850005741746
- beta**0.5:1:  0.3209907903268685
- beta**0.5:2:  0.32106273052665363
- beta**0.5:3:  0.3211343239607329
- beta**0.5:4:  0.32120557388686616
- beta**0.5:5:  0.3212764835174676
- beta**0.5:6:  0.3213470560204396
- beta**0.5:7:  0.3214172945199882
- beta**0.5:8:  0.32148720209741977
- beta**0.5:9:  0.32155678179191993


Unnamed: 0,Au%,Ag%,Cu%,Co%,Ni%,Pt%,Pd%,Sn%
2458,0.0,0.3,0.5,0.2,0.0,0,0.0,0.0
2145,0.0,0.2,0.6,0.2,0.0,0,0.0,0.0
1656,0.0,0.1,0.6,0.3,0.0,0,0.0,0.0
2115,0.0,0.2,0.4,0.4,0.0,0,0.0,0.0
2431,0.0,0.3,0.3,0.4,0.0,0,0.0,0.0
2400,0.0,0.3,0.2,0.5,0.0,0,0.0,0.0
2084,0.0,0.2,0.3,0.5,0.0,0,0.0,0.0
4584,0.1,0.3,0.6,0.0,0.0,0,0.0,0.0
2598,0.0,0.4,0.1,0.5,0.0,0,0.0,0.0
5977,0.2,0.4,0.0,0.4,0.0,0,0.0,0.0


           Au%   Ag%   Cu%   Co%  Ni%  Pt%  Pd%  Sn%  target
suggested  0.0  0.30  0.50  0.20  0.0  0.0  0.0  0.0     NaN
inseed     0.0  0.39  0.45  0.16  0.0  0.0  0.0  0.0     0.0
           Au%   Ag%   Cu%  Co%  Ni%  Pt%  Pd%  Sn%  target
suggested  0.0  0.20  0.60  0.2  0.0  0.0  0.0  0.0     NaN
inseed     0.0  0.23  0.47  0.3  0.0  0.0  0.0  0.0     0.0
           Au%   Ag%   Cu%  Co%  Ni%  Pt%  Pd%  Sn%  target
suggested  0.0  0.10  0.60  0.3  0.0  0.0  0.0  0.0     NaN
inseed     0.0  0.23  0.47  0.3  0.0  0.0  0.0  0.0     0.0
           Au%   Ag%   Cu%   Co%  Ni%  Pt%  Pd%  Sn%  target
suggested  0.0  0.20  0.40  0.40  0.0  0.0  0.0  0.0     NaN
inseed     0.0  0.14  0.45  0.41  0.0  0.0  0.0  0.0     0.0
           Au%   Ag%   Cu%   Co%  Ni%  Pt%  Pd%  Sn%  target
suggested  0.0  0.30  0.30  0.40  0.0  0.0  0.0  0.0     NaN
inseed     0.0  0.33  0.33  0.33  0.0  0.0  0.0  0.0     0.0
           Au%   Ag%   Cu%   Co%  Ni%  Pt%  Pd%  Sn%  target
suggested  0.0  0.30  0.20  0.
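The `suggested`/`inseed` pairs printed above appear to match each suggestion with a similar composition already present in the seed data. The implementation of `compare_to_seed` is not shown in this notebook, so the following is only a sketch of one way to do such a lookup, assuming Euclidean distance over the element fractions. The helper `nearest_seed`, the `*_toy` frames and the seed labels are hypothetical; the numbers are copied from three `inseed` rows and two `suggested` rows in the ternary output.

```python
import numpy as np
import pandas as pd

elements_toy = ["Ag%", "Cu%", "Co%"]

# Three seed compositions (values as printed in the `inseed` rows above; labels are made up)
seed_toy = pd.DataFrame(
    {
        "Ag%": [0.39, 0.23, 0.33],
        "Cu%": [0.45, 0.47, 0.33],
        "Co%": [0.16, 0.30, 0.33],
        "target": [0, 0, 0],
    },
    index=["seed_a", "seed_b", "seed_c"],
)

# Two suggested compositions (first two rows of the ternary suggestions)
suggestions_toy = pd.DataFrame(
    {"Ag%": [0.3, 0.2], "Cu%": [0.5, 0.6], "Co%": [0.2, 0.2]},
    index=[2458, 2145],
)


def nearest_seed(suggested, seed, columns):
    """Return the index label of the seed row closest to `suggested` (Euclidean distance)."""
    diffs = seed[columns].to_numpy() - suggested[columns].to_numpy(dtype=float)
    distances = np.sqrt((diffs ** 2).sum(axis=1))
    return seed.index[int(np.argmin(distances))]


matches = {
    idx: nearest_seed(row, seed_toy, elements_toy)
    for idx, row in suggestions_toy.iterrows()
}
print(matches)
```

On this toy subset the closest seed rows agree with the first two pairs printed above, but the real function may use a different metric or the full seed set.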

In [7]:
# quaternaries
agent = EmbedCompGPUCB(n_query=10)
suggestions_quaternary = agent.get_hypotheses(candidate_data=quaternary_feats, seed_data=seed_data)
display(quaternaries.loc[ suggestions_quaternary.index])
compare_to_seed(quaternaries.loc[ suggestions_quaternary.index ], df)

- beta**0.5:0:  0.3275876537150563
- beta**0.5:1:  0.32765847259318764
- beta**0.5:2:  0.32772894916479933
- beta**0.5:3:  0.3277990866569795
- beta**0.5:4:  0.32786888825172794
- beta**0.5:5:  0.32793835708678953
- beta**0.5:6:  0.3280074962564675
- beta**0.5:7:  0.32807630881241917
- beta**0.5:8:  0.3281447977644324
- beta**0.5:9:  0.3282129660811855


Unnamed: 0,Au%,Ag%,Cu%,Co%,Ni%,Pt%,Pd%,Sn%
2134,0.0,0.2,0.5,0.2,0.1,0,0.0,0.0
2113,0.0,0.2,0.4,0.2,0.2,0,0.0,0.0
2446,0.0,0.3,0.4,0.1,0.2,0,0.0,0.0
2429,0.0,0.3,0.3,0.2,0.2,0,0.0,0.0
1638,0.0,0.1,0.5,0.3,0.1,0,0.0,0.0
2080,0.0,0.2,0.3,0.3,0.2,0,0.0,0.0
5805,0.2,0.2,0.3,0.3,0.0,0,0.0,0.0
2399,0.0,0.3,0.2,0.4,0.1,0,0.0,0.0
2390,0.0,0.3,0.2,0.2,0.3,0,0.0,0.0
2131,0.0,0.2,0.5,0.1,0.2,0,0.0,0.0


           Au%   Ag%   Cu%  Co%  Ni%  Pt%  Pd%  Sn%  target
suggested  0.0  0.20  0.50  0.2  0.1  0.0  0.0  0.0     NaN
inseed     0.0  0.23  0.47  0.3  0.0  0.0  0.0  0.0     0.0
           Au%   Ag%   Cu%  Co%  Ni%  Pt%  Pd%  Sn%  target
suggested  0.0  0.20  0.40  0.2  0.2  0.0  0.0  0.0     NaN
inseed     0.0  0.23  0.47  0.3  0.0  0.0  0.0  0.0     0.0
           Au%   Ag%   Cu%   Co%  Ni%  Pt%  Pd%  Sn%  target
suggested  0.0  0.30  0.40  0.10  0.2  0.0  0.0  0.0     NaN
inseed     0.0  0.39  0.45  0.16  0.0  0.0  0.0  0.0     0.0
           Au%   Ag%   Cu%   Co%   Ni%  Pt%  Pd%  Sn%  target
suggested  0.0  0.30  0.30  0.20  0.20  0.0  0.0  0.0     NaN
inseed     0.0  0.27  0.22  0.35  0.16  0.0  0.0  0.0     0.0
           Au%   Ag%   Cu%  Co%  Ni%  Pt%  Pd%  Sn%  target
suggested  0.0  0.10  0.50  0.3  0.1  0.0  0.0  0.0     NaN
inseed     0.0  0.23  0.47  0.3  0.0  0.0  0.0  0.0     0.0
           Au%   Ag%   Cu%   Co%   Ni%  Pt%  Pd%  Sn%  target
suggested  0.0  0.20  0.30  0
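The `beta**0.5` values logged by each `get_hypotheses` call are consistent with an upper-confidence-bound (UCB) acquisition, where candidates are ranked by `mean + sqrt(beta) * std` from a Gaussian process. The internals of `EmbedCompGPUCB` are not shown here, so the block below is a generic GP-UCB sketch in plain NumPy, not the agent's implementation. The kernel, noise level, toy data and all function names are assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between the rows of a and b."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / length_scale**2)


def gp_posterior(x_seed, y_seed, x_cand, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_cand."""
    k_ss = rbf_kernel(x_seed, x_seed) + noise * np.eye(len(x_seed))
    k_sc = rbf_kernel(x_seed, x_cand)
    solved = np.linalg.solve(k_ss, k_sc)  # K_ss^{-1} K_sc
    mean = solved.T @ y_seed
    # Prior variance of the RBF kernel is 1 on the diagonal
    var = 1.0 - np.einsum("ij,ij->j", k_sc, solved)
    return mean, np.sqrt(np.clip(var, 0.0, None))


def ucb_pick(x_seed, y_seed, x_cand, sqrt_beta):
    """Index of the candidate maximizing mean + sqrt_beta * std."""
    mean, std = gp_posterior(x_seed, y_seed, x_cand)
    return int(np.argmax(mean + sqrt_beta * std))


# Toy 1-D problem: two observed points, four candidates
x_seed = np.array([[0.0], [1.0]])
y_seed = np.array([0.0, -1.0])  # 0 is the best possible target value
x_cand = np.array([[0.0], [0.5], [1.0], [3.0]])

greedy = ucb_pick(x_seed, y_seed, x_cand, sqrt_beta=0.0)    # pure exploitation
explore = ucb_pick(x_seed, y_seed, x_cand, sqrt_beta=10.0)  # uncertainty dominates

print(greedy, explore)
```

With `sqrt_beta=0` the pick falls on the candidate next to the best observation, while a large `sqrt_beta` favors the far-away candidate with the highest predictive uncertainty. The small values logged in this notebook (about 0.32 to 0.33) would correspond to a fairly exploitative setting under this reading.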

In [8]:
# pentanaries
agent = EmbedCompGPUCB(n_query=10)
suggestions_pentanary = agent.get_hypotheses(candidate_data=pentanary_feats, seed_data=seed_data)
display(pentanaries.loc[ suggestions_pentanary.index])
compare_to_seed(pentanaries.loc[ suggestions_pentanary.index ], df)

- beta**0.5:0:  0.32696697701551375
- beta**0.5:1:  0.32703793029896167
- beta**0.5:2:  0.32710854056840444
- beta**0.5:3:  0.32717881105792557
- beta**0.5:4:  0.3272487449564198
- beta**0.5:5:  0.32731834540842725
- beta**0.5:6:  0.32738761551494955
- beta**0.5:7:  0.32745655833424636
- beta**0.5:8:  0.32752517688261473
- beta**0.5:9:  0.32759347413515016


Unnamed: 0,Au%,Ag%,Cu%,Co%,Ni%,Pt%,Pd%,Sn%
5803,0.2,0.2,0.3,0.1,0.2,0,0.0,0.0
5910,0.2,0.3,0.1,0.3,0.1,0,0.0,0.0
5901,0.2,0.3,0.1,0.1,0.3,0,0.0,0.0
6679,0.3,0.2,0.2,0.1,0.2,0,0.0,0.0
5605,0.2,0.1,0.4,0.1,0.2,0,0.0,0.0
730,0.0,0.0,0.3,0.2,0.1,0,0.2,0.2
5582,0.2,0.1,0.3,0.1,0.3,0,0.0,0.0
6682,0.3,0.2,0.2,0.2,0.1,0,0.0,0.0
5940,0.2,0.3,0.3,0.1,0.1,0,0.0,0.0
5588,0.2,0.1,0.3,0.2,0.2,0,0.0,0.0


            Au%   Ag%   Cu%   Co%   Ni%  Pt%  Pd%  Sn%  target
suggested  0.20  0.20  0.30  0.10  0.20  0.0  0.0  0.0     NaN
inseed     0.19  0.24  0.28  0.14  0.15  0.0  0.0  0.0     0.0
            Au%   Ag%   Cu%   Co%  Ni%  Pt%  Pd%  Sn%  target
suggested  0.20  0.30  0.10  0.30  0.1  0.0  0.0  0.0     NaN
inseed     0.15  0.29  0.18  0.38  0.0  0.0  0.0  0.0     0.0
            Au%   Ag%  Cu%   Co%   Ni%  Pt%  Pd%  Sn%  target
suggested  0.20  0.30  0.1  0.10  0.30  0.0  0.0  0.0     NaN
inseed     0.26  0.22  0.0  0.16  0.36  0.0  0.0  0.0    -1.0
           Au%  Ag%  Cu%  Co%  Ni%  Pt%  Pd%  Sn%  target
suggested  0.3  0.2  0.2  0.1  0.2  0.0  0.0  0.0     NaN
inseed     0.2  0.2  0.2  0.2  0.2  0.0  0.0  0.0     0.0
            Au%  Ag%   Cu%   Co%   Ni%  Pt%  Pd%  Sn%  target
suggested  0.20  0.1  0.40  0.10  0.20  0.0  0.0  0.0     NaN
inseed     0.23  0.0  0.42  0.16  0.19  0.0  0.0  0.0    -1.0
           Au%  Ag%   Cu%   Co%   Ni%  Pt%  Pd%   Sn%  target
suggested  0.0  0