## Example for crystal_morphing_bayesian_vs_xrd_csv

In [4]:
ls

INPUT.csv           bo-opt-pdf-vs-exp-csv.py*
INPUT_allpairs.csv  crystal_interpolation_bayesian_vs_exp_csv_pdf.ipynb
INPUT_limit.csv     pdf_func/
INPUT_seq.csv       struct/


#### Prepare INPUT.csv
- **target** : path to the target CSV file containing the XRD pattern (2theta, XRD intensity)
- **input_struct** : paths to the input structures
- (optional : only when structure_selection_pair is "Limited") <br>
  **input_struct1** & **input_struct2** : paths to the input structures, separated into two groups
<br>

- **num_init_data** : number of initial random data samples for Bayesian optimization
- **num_core** : number of data points for validation in Bayesian optimization; if >1, they are run in parallel
- **num_iter** : number of Bayesian optimization iterations
- **initial_step_boolean** : generate the initial points for Bayesian optimization at fixed ratios (0%, 25%, 50%, 75%, 100%)
- **random_seed** : random seed for the initial random points (if initial_step_boolean is set, these are generated as a second step)
<br>
- **soap_max_steps** : maximum number of SOAP iterations (the default optimizer is steepest descent; if >15, it changes to L-BFGS)
- **multi_species** : set True if there are two or more chemical elements
<br>
- **structure_selection_pair** : how the input structures are selected for the exploration path <br>
  True : all-pairs investigation, False : greedy algorithm, Limited : limited-pairs investigation (see the input_struct1 & input_struct2 entry above)
<br>
- **xray_wavelength** : XRD source wavelength for the pymatgen option
- **volume_list** : list of volume ratios for the cosine similarity evaluation, such as 0.90, 0.95, 1.00, 1.05, 1.10
- **xrd_sigma** : smearing factor (default : 0.5)
<br>
- **bool_gsas** : if True, the XRD pattern is generated with GSAS-II; if False, it is generated with pymatgen
<br>
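
To illustrate what a smearing factor such as xrd_sigma does, the sketch below broadens a stick pattern onto a 2theta grid with Gaussians. It assumes that xrd_sigma acts as a Gaussian standard deviation in degrees 2theta, which is not stated above; `smear_pattern` and the peak values are hypothetical and not part of the tool.

```python
import math

def smear_pattern(peaks, grid, sigma=0.5):
    """Broaden (two_theta, intensity) stick peaks onto a 2theta grid.

    Assumption: each peak is a Gaussian whose standard deviation is sigma.
    """
    profile = []
    for x in grid:
        total = 0.0
        for position, intensity in peaks:
            total += intensity * math.exp(-((x - position) ** 2) / (2.0 * sigma ** 2))
        profile.append(total)
    return profile

# Hypothetical stick pattern: two peaks at 20 and 35 degrees.
peaks = [(20.0, 100.0), (35.0, 40.0)]
grid = [10.0 + 0.1 * i for i in range(401)]  # 10 to 50 degrees in 0.1 steps
profile = smear_pattern(peaks, grid, sigma=0.5)
```

A larger sigma gives wider, more overlapping peaks, which generally makes the comparison with an experimental pattern less sensitive to small peak shifts.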


#### Prepare INST_XRD.PRM (necessary when bool_gsas is True)
  The 5th row contains the X-ray source wavelength. It takes precedence over the setting in additional_INPUT.csv.

In [1]:
cat INPUT.csv

### structures ###
target,struct/Li5BiO5.csv
input_struct,struct/Li5BiO5_1.cif,struct/Li5BiO5_2.cif,struct/Li5BiO5_3.cif

### bayesian parameter ###
num_init_data,2
num_core,4
num_iter,4
initial_step_boolean,True
random_seed,99

### soap parameter ###
soap_max_steps,15
multi_species,True

### structure selection ###
structure_selection_pair,False

### xrd paprameter ####
xray_wavelength,CuKa
xrd_sigma,0.5
volume_list,0.80,0.82,0.85,0.87,0.90,0.92,0.94,0.96,0.97,0.98,0.99,1.00,1.01,1.02,1.03,1.04,1.06,1.08,1.10,1.15,1.17,1.20

### gsas use ###
bool_gsas,True
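
The file layout above (comment lines starting with `#`, blank lines, and `key,value[,value...]` rows) can be read with the standard `csv` module. The following is a minimal sketch of such a reader; `read_input_csv` is a hypothetical helper, and the tool's own parser may behave differently.

```python
import csv
import io

def read_input_csv(text):
    """Parse 'key,value[,value...]' rows, skipping blank and '#' comment lines."""
    params = {}
    for row in csv.reader(io.StringIO(text)):
        if not row or row[0].startswith("#"):
            continue
        key = row[0].strip()
        values = [v.strip() for v in row[1:]]
        # A single value is stored as a string, several values as a list.
        params[key] = values[0] if len(values) == 1 else values
    return params

example = """### structures ###
target,struct/Li5BiO5.csv
input_struct,struct/Li5BiO5_1.cif,struct/Li5BiO5_2.cif,struct/Li5BiO5_3.cif

### bayesian parameter ###
num_init_data,2
volume_list,0.90,0.95,1.00
"""

params = read_input_csv(example)
volume_list = [float(v) for v in params["volume_list"]]
print(params["target"], len(params["input_struct"]), volume_list)
```

All values are kept as strings here, so numeric and boolean entries such as num_iter or bool_gsas would still need an explicit conversion.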


#### Run Bayesian

> python morphing_with_bo_vs_xrd_csv.py |tee log

### Example of structure_selection_pair,False : greedy algorithm

First, the two input structures with the highest XRD cosine similarity are chosen. <br>
Then, morphing with Bayesian optimization is performed between them. <br>
The newly generated intermediate structure is then used as another input structure, and morphing with Bayesian optimization is performed again.
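
The greedy procedure can be sketched as a short loop. This is only an illustration under assumptions: `greedy_path`, `cs_of`, and `morph` are hypothetical names, and `morph` is a placeholder for the Bayesian morphing step, which in the real tool produces a new structure file.

```python
def greedy_path(structures, cs_of, morph):
    """Chain morphing steps, starting from the best-matching inputs.

    structures : list of structure names
    cs_of      : function returning the cosine similarity of a structure
    morph      : function (a, b) -> intermediate structure (placeholder here)
    """
    ranked = sorted(structures, key=cs_of, reverse=True)
    current = ranked[0]
    steps = []
    for nxt in ranked[1:]:
        result = morph(current, nxt)
        steps.append((current, nxt, result))
        current = result  # the new intermediate becomes the next input
    return steps

# Cosine similarities of the three inputs, rounded from this example's log.
cs_inputs = {"Li5BiO5_1": 0.7914, "Li5BiO5_2": 0.7683, "Li5BiO5_3": 0.7384}

steps = greedy_path(
    list(cs_inputs),
    cs_of=cs_inputs.get,
    morph=lambda a, b: f"({a}+{b})",  # stand-in for Bayesian morphing
)
for a, b, result in steps:
    print(a, "->", b, "gives", result)
```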

In [3]:
!cp INPUT_greedy.csv INPUT.csv   ### prepare INPUT.csv
!python ./morphing_with_bo_vs_xrd_csv.py > log-greedy   ### run

#### Check result
The cosine similarity of each input structure can be checked in the log (for example, 79.1%, 76.8%, 73.8%). <br>
Results are written to the output folder. <br>
The cosine similarity of the output structures is improved (for example, 87.4%, 89.5%).
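
For reference, the cosine similarity between two XRD patterns sampled on the same 2theta grid can be computed as below. This is a generic sketch of the metric, not the tool's code; the function name and the intensity values are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two intensity vectors on a common 2theta grid."""
    if len(a) != len(b):
        raise ValueError("patterns must be sampled on the same grid")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for an empty pattern
    return dot / (norm_a * norm_b)

# Hypothetical intensities for a target and a simulated pattern.
target = [0.0, 10.0, 50.0, 10.0, 0.0]
simulated = [0.0, 12.0, 45.0, 8.0, 1.0]
cs = cosine_similarity(target, simulated)
print(f"{100 * cs:.1f}%")
```

Because intensities are non-negative, the value lies between 0 and 1, which is why the results above can be read as percentages.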

In [8]:
!grep 'cs_list of input' log-greedy|head -1

-----cs_list of input structures :  [0.7914279417872684, 0.768256946541049, 0.7383764065271623]


In [11]:
!tail log-greedy

gpx file saved as /home/e1739/crystal_interpolation_bayesian_vs_exp_csv_cleaning_230614_Li5BiO5_ver/gsas-tmp/32736_sim.gpx
volume, volratio, cs :  156.65879583262577 1.2 0.299492853237875
---cs-matrix-max-and-vol--- 0.89539197631936 1.0
Found : cos_similarity = 0.89539197631936 1.0 at ratio =  [13.23333011]
--------
struct_path
['struct/Li5BiO5_1.cif', 'struct/Li5BiO5_2.cif', 'struct/Li5BiO5_3.cif', 'output/Li5BiO5_search/Li5BiO5_A_0_vol.cif', 'output/Li5BiO5_search/Li5BiO5_B_1_vol.cif']

===== Ending : output/Li5BiO5_search =====



In [14]:
!cat output/Li5BiO5_search/output_list.csv

A,output/Li5BiO5_search/Li5BiO5_1_to_Li5BiO5_2_4944_0
B,output/Li5BiO5_search/Li5BiO5_A_0_vol_to_Li5BiO5_3_1323_1


In [15]:
!cat output/Li5BiO5_search/all_cossim_output.csv

output/Li5BiO5_search/Li5BiO5_A_0_vol.cif,0.8739169011832949
output/Li5BiO5_search/Li5BiO5_B_1_vol.cif,0.89539197631936
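
Since all_cossim_output.csv holds one `path,cosine_similarity` row per found structure, the best candidate can be picked with a few lines of Python. The sketch below uses the two rows shown above as inline text; `best_structure` is a hypothetical helper.

```python
import csv
import io

def best_structure(csv_text):
    """Return (path, cosine_similarity) of the row with the highest score."""
    rows = [
        (row[0], float(row[1]))
        for row in csv.reader(io.StringIO(csv_text))
        if row  # skip blank lines
    ]
    return max(rows, key=lambda item: item[1])

cossim_text = """output/Li5BiO5_search/Li5BiO5_A_0_vol.cif,0.8739169011832949
output/Li5BiO5_search/Li5BiO5_B_1_vol.cif,0.89539197631936
"""

path, score = best_structure(cossim_text)
print(path, score)
```

To read the real file instead, the text can be loaded with `open("output/Li5BiO5_search/all_cossim_output.csv").read()`.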


In [16]:
ls output/Li5BiO5_search/found_structure

Li5BiO5_A_0_vol.cif  Li5BiO5_B_1_vol.cif


In [17]:
!mv output output-greedy

- Check result

Results are in the output folder (moved to output-greedy above).


### Example of structure_selection_pair, True : all pairs investigation

Edit INPUT.csv and set "structure_selection_pair,True" (a prepared INPUT_allpairs.csv is provided).

Morphing with Bayesian optimization is performed for all pairs of input structures.
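
The output file names in this example (for instance both `Li5BiO5_1_to_Li5BiO5_2` and `Li5BiO5_2_to_Li5BiO5_1`) suggest that pairs are treated as ordered. Under that assumption, the set of pairs can be enumerated as in the sketch below; the variable names are illustrative only.

```python
from itertools import permutations

structures = ["Li5BiO5_1", "Li5BiO5_2", "Li5BiO5_3"]

# Ordered pairs (start, end): n * (n - 1) combinations for n inputs.
pairs = list(permutations(structures, 2))

for start, end in pairs:
    print(f"{start}_to_{end}")
```

The number of morphing runs therefore grows quadratically with the number of input structures, which is worth keeping in mind when many inputs are supplied.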

In [18]:
!cp INPUT_allpairs.csv INPUT.csv
!python ./morphing_with_bo_vs_xrd_csv.py > log-allpairs

In [19]:
!tail log-allpairs

gpx file saved as /home/e1739/crystal_interpolation_bayesian_vs_exp_csv_cleaning_230614_Li5BiO5_ver/gsas-tmp/5227_sim.gpx
volume, volratio, cs :  148.20574916758338 1.2 0.4499551854675649
---cs-matrix-max-and-vol--- 0.768256946541049 1.0
Found : cos_similarity = 0.768256946541049 1.0 at ratio =  [100.]
--------
struct_path
['struct/Li5BiO5_1.cif', 'struct/Li5BiO5_2.cif', 'struct/Li5BiO5_3.cif', 'output/Li5BiO5_search/Li5BiO5_1_to_Li5BiO5_2_4944_0_vol.cif', 'output/Li5BiO5_search/Li5BiO5_1_to_Li5BiO5_3_0000_1_vol.cif', 'output/Li5BiO5_search/Li5BiO5_2_to_Li5BiO5_1_10000_2_vol.cif', 'output/Li5BiO5_search/Li5BiO5_2_to_Li5BiO5_3_0640_3_vol.cif', 'output/Li5BiO5_search/Li5BiO5_3_to_Li5BiO5_1_9941_4_vol.cif', 'output/Li5BiO5_search/Li5BiO5_3_to_Li5BiO5_2_10000_5_vol.cif']

===== Ending : output/Li5BiO5_search =====



In [20]:
!cat output/Li5BiO5_search/all_cossim_output.csv

output/Li5BiO5_search/Li5BiO5_1_to_Li5BiO5_2_4944_0_vol.cif,0.8739169011832949
output/Li5BiO5_search/Li5BiO5_1_to_Li5BiO5_3_0000_1_vol.cif,0.7914279417872684
output/Li5BiO5_search/Li5BiO5_2_to_Li5BiO5_1_10000_2_vol.cif,0.7914279417872684
output/Li5BiO5_search/Li5BiO5_2_to_Li5BiO5_3_0640_3_vol.cif,0.770830595696563
output/Li5BiO5_search/Li5BiO5_3_to_Li5BiO5_1_9941_4_vol.cif,0.820871445193972
output/Li5BiO5_search/Li5BiO5_3_to_Li5BiO5_2_10000_5_vol.cif,0.768256946541049


In [21]:
!ls output/Li5BiO5_search/found_structure

Li5BiO5_1_to_Li5BiO5_2_4944_0_vol.cif	Li5BiO5_2_to_Li5BiO5_3_0640_3_vol.cif
Li5BiO5_1_to_Li5BiO5_3_0000_1_vol.cif	Li5BiO5_3_to_Li5BiO5_1_9941_4_vol.cif
Li5BiO5_2_to_Li5BiO5_1_10000_2_vol.cif	Li5BiO5_3_to_Li5BiO5_2_10000_5_vol.cif


In [22]:
!mv output output-allpairs

- Check result

Results are in the output folder (moved to output-allpairs above).

### Example of structure_selection_pair, Limited : Limited pairs

Edit INPUT.csv and set "structure_selection_pair,Limited". <br>
In addition, replace input_struct with input_struct1 & input_struct2 (a prepared INPUT_limit.csv is provided).

Morphing with Bayesian optimization is performed for the selected pairs between the input_struct1 and input_struct2 structures.
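
A minimal sketch of how such limited pairs could be enumerated is given below. It assumes that every structure in the first group is paired with every structure in the second group in both directions, and that the groups are split as shown; both points are inferred from the file names in this example rather than confirmed, and `limited_pairs` is a hypothetical helper.

```python
def limited_pairs(group1, group2):
    """Pair each structure of group1 with each of group2, in both directions."""
    pairs = []
    for a in group1:
        for b in group2:
            pairs.append((a, b))
            pairs.append((b, a))
    return pairs

# Assumed grouping for illustration only.
input_struct1 = ["Li5BiO5_2"]
input_struct2 = ["Li5BiO5_1", "Li5BiO5_3"]

pairs = limited_pairs(input_struct1, input_struct2)
for start, end in pairs:
    print(f"{start}_to_{end}")
```

Compared with the all-pairs mode, pairs within the same group are skipped, which reduces the number of morphing runs.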

In [24]:
!cp INPUT_limitedpairs.csv INPUT.csv
!python ./morphing_with_bo_vs_xrd_csv.py > log-limited-pairs

In [25]:
!tail log-limited-pairs

gpx file saved as /home/e1739/crystal_interpolation_bayesian_vs_exp_csv_cleaning_230614_Li5BiO5_ver/gsas-tmp/19801_sim.gpx
volume, volratio, cs :  148.20574916758338 1.2 0.4499551854675649
---cs-matrix-max-and-vol--- 0.768256946541049 1.0
Found : cos_similarity = 0.768256946541049 1.0 at ratio =  [100.]
--------
struct_path
['struct/Li5BiO5_2.cif', 'struct/Li5BiO5_1.cif', 'struct/Li5BiO5_3.cif', 'output/Li5BiO5_search/Li5BiO5_2_to_Li5BiO5_1_10000_0_vol.cif', 'output/Li5BiO5_search/Li5BiO5_1_to_Li5BiO5_2_4944_1_vol.cif', 'output/Li5BiO5_search/Li5BiO5_2_to_Li5BiO5_3_0640_2_vol.cif', 'output/Li5BiO5_search/Li5BiO5_3_to_Li5BiO5_2_10000_3_vol.cif']

===== Ending : output/Li5BiO5_search =====



In [26]:
!cat output/Li5BiO5_search/all_cossim_output.csv

output/Li5BiO5_search/Li5BiO5_2_to_Li5BiO5_1_10000_0_vol.cif,0.7914279417872684
output/Li5BiO5_search/Li5BiO5_1_to_Li5BiO5_2_4944_1_vol.cif,0.8739169011832949
output/Li5BiO5_search/Li5BiO5_2_to_Li5BiO5_3_0640_2_vol.cif,0.770830595696563
output/Li5BiO5_search/Li5BiO5_3_to_Li5BiO5_2_10000_3_vol.cif,0.768256946541049


In [27]:
!ls output/Li5BiO5_search/found_structure

Li5BiO5_1_to_Li5BiO5_2_4944_1_vol.cif	Li5BiO5_2_to_Li5BiO5_3_0640_2_vol.cif
Li5BiO5_2_to_Li5BiO5_1_10000_0_vol.cif	Li5BiO5_3_to_Li5BiO5_2_10000_3_vol.cif


In [28]:
!mv output output-limited-pairs

- Check result

Results are in the output folder (moved to output-limited-pairs above).