## Evolutionary algorithm (USPEX) for target XRD csv file (without DFT)

In [1]:
!ls

INPUT.txt	    additional_INPUT.csv
INPUT_with_DFT.txt  evolutionary_algorithm_vs_target_xrd_csv_with_DFT.ipynb
INST_XRY.PRM	    evolutionary_algorithm_vs_target_xrd_csv_without_DFT.ipynb
Seeds		    ux-test.sh
Specific


### Check INPUT.txt

INPUT.txt contains the input parameters for the USPEX code.<br>

For details, see the original USPEX manual. <br>

- **optType** : optimization function. <br>
  negative sign : maximization, positive sign : minimization <br>
  -104 : maximize the cosine similarity between the candidate XRD and the target XRD generated from a cif file (./Specific/xrd_target.cif) <br>
  -105 : maximize the cosine similarity between the candidate XRD and the target XRD csv file (./Specific/xrd_target.csv) <br>
  1 : enthalpy, ... (see USPEX manual)
<br>
- **atomType** <br>
  Element1 Element2 ... Element-N <br>
  ex) Mg O
<br>
- **numSpecies** <br>
  #ofElement1 #ofElement2 ... #ofElement-N <br>
  ex) 4 4
<br>
- **populationSize** : # of structures in one generation
<br>
- **initialPopSize** : # of structures in 1st generation
<br>
- **numGenerations** : # of generations
<br>
- **stopCrit** : stop criterion, i.e. the number of consecutive generations with an unchanged best fitness after which the run stops.
<br>
- **fracGene** : fraction of structures produced by heredity
<br>
- **fracRand** : fraction of structures produced by random construction
<br>
- **fracAtomsMut** : fraction of structures produced by atom mutation. Without DFT this must be 0 (a nonzero value causes a loop error); enable it only for runs with DFT.
<br>
- **fracPerm** : fraction of structures produced by permutation <br>
  The fractions of the genetic operations above must sum to 1.00.
<br>
- **bestFrac** : fraction of the best structures in the current generation that is used to produce the next generation
<br>
- **IonDistances** : minimum interatomic distance matrix
<br>
- **symmetries** : space group for random structures
<br>
- **abinitioCode** : set 0 <br>
  0 : no DFT, 1 : VASP
<br>
- **whichCluster** : set 0 <br>
  0 : local PC, 2 : supercomputer (remote)
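
The two constraints stated in the list above (the operator fractions sum to 1.00, and fracAtomsMut is 0 when no DFT code is used) can be checked before launching a run. The sketch below is only an illustration: `check_variation_fractions` is a hypothetical helper, the values in `example` are made up, and the parameters are passed as a plain dictionary rather than parsed from INPUT.txt.

```python
import math

FRACTION_KEYS = ("fracGene", "fracRand", "fracAtomsMut", "fracPerm")


def check_variation_fractions(params):
    """Return a list of problems found in the genetic-operator settings."""
    problems = []
    total = sum(params.get(key, 0.0) for key in FRACTION_KEYS)
    if not math.isclose(total, 1.0, abs_tol=1e-6):
        problems.append(f"fractions sum to {total:.2f}, expected 1.00")
    # Without DFT (abinitioCode = 0) atom mutation has to stay switched off.
    if params.get("abinitioCode", 0) == 0 and params.get("fracAtomsMut", 0.0) != 0.0:
        problems.append("fracAtomsMut must be 0 when abinitioCode is 0")
    return problems


# Hypothetical values, for illustration only.
example = {
    "abinitioCode": 0,
    "fracGene": 0.5,
    "fracRand": 0.3,
    "fracAtomsMut": 0.0,
    "fracPerm": 0.2,
}
print(check_variation_fractions(example))
```

An empty list means both constraints hold for the given values.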

### Prepare additional_INPUT.csv

- **xray_wavelength** : X-ray source wavelength, used when the XRD pattern is generated with pymatgen.
- **xrd_sigma** : smearing factor (default : 0.5)
- **bool_gsas** : True : the XRD pattern is generated with GSAS-II. False : it is generated with pymatgen.
- **volume_list** : list of volume scale factors used when evaluating the cosine similarity, such as 0.90, 0.95, 1.00, 1.05, 1.10
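
To make the fitness concrete, here is a minimal sketch of a cosine similarity between two smeared XRD patterns. It does not reproduce what pymatgen or GSAS-II do internally: it assumes that xrd_sigma acts as the standard deviation of a Gaussian in degrees, and the peak positions and intensities are invented for the example. `smear` and `cosine_similarity` are hypothetical helper names.

```python
import math


def smear(peaks, grid, sigma=0.5):
    """Broaden stick peaks [(two_theta, intensity), ...] with Gaussians on a grid."""
    return [
        sum(inten * math.exp(-0.5 * ((x - pos) / sigma) ** 2) for pos, inten in peaks)
        for x in grid
    ]


def cosine_similarity(a, b):
    """Cosine of the angle between two intensity vectors on the same grid."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# 2-theta grid from 10.0 to 89.9 degrees in steps of 0.1 (illustrative choice).
grid = [0.1 * k for k in range(100, 900)]

# Made-up stick patterns: a target and a slightly shifted trial structure.
target = smear([(31.7, 100.0), (45.4, 55.0)], grid)
trial = smear([(31.9, 90.0), (45.1, 60.0)], grid)

print(round(cosine_similarity(target, trial), 3))
```

Identical patterns give a similarity of 1, which corresponds to the fitness of -1.000 reported when optType is negative (maximization).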

#### INST_XRY.PRM (necessary when bool_gsas is True)

### Check Specific folder

The target XRD csv file (xrd_target.csv) is located in the Specific folder.
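
A quick way to inspect the target file from Python is sketched below. It assumes the layout printed later in this notebook: two comma-separated columns without a header, with the first column taken to be the angle axis and the second the intensity. `load_xrd_csv` is a hypothetical helper, and the in-memory sample stands in for the real file.

```python
import csv
import io


def load_xrd_csv(handle):
    """Read a headerless two-column csv into (angles, intensities)."""
    angles, intensities = [], []
    for row in csv.reader(handle):
        if len(row) < 2:
            continue  # skip blank or incomplete lines
        angles.append(float(row[0]))
        intensities.append(float(row[1]))
    return angles, intensities


# In-memory sample in the same number format as xrd_target.csv.
sample = io.StringIO(
    "1.000000000000000021e-02,1.000000000000000000e+00\n"
    "2.000000000000000042e-02,1.000000000000000000e+00\n"
    "2.999999999999999889e-02,1.000000000000000000e+00\n"
)
x, y = load_xrd_csv(sample)
print(len(x), x[0], y[0])
```

For the real file, the same function can be called as `load_xrd_csv(open("Specific/xrd_target.csv", newline=""))`.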

### Check Seeds folder

The Seeds folder holds seed files (structures inserted as good candidates). Each file is named POSCARS_N, where N is the generation in which the seed is inserted.
- ex) POSCARS_3 : the seed structure is inserted in the 3rd generation.
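
As an illustration of the naming rule, the sketch below builds the text of a seed file and the matching file name. The layout simply mirrors the POSCARS_3 file shown in this notebook; `seed_filename` and `format_seed` are hypothetical helpers, not part of USPEX.

```python
def seed_filename(generation):
    """File name for a seed that should be inserted in the given generation."""
    return f"POSCARS_{generation}"


def format_seed(lattice, species, counts, frac_coords, title="Seed"):
    """Return POSCAR-style text for one seed structure."""
    if sum(counts) != len(frac_coords):
        raise ValueError("number of coordinates does not match the atom counts")
    lines = [title, "1.0"]
    lines += [" ".join(f"{v:.6f}" for v in vec) for vec in lattice]
    lines.append(" ".join(species))
    lines.append(" ".join(str(n) for n in counts))
    lines.append("direct")
    lines += [" ".join(f"{v:.6f}" for v in xyz) for xyz in frac_coords]
    return "\n".join(lines) + "\n"


# Rock-salt NaCl cell, same values as the POSCARS_3 example in this notebook.
lattice = [(5.63, 0.0, 0.0), (0.0, 5.63, 0.0), (0.0, 0.0, 5.63)]
coords = [
    (0.0, 0.0, 0.0), (0.0, 0.5, 0.5), (0.5, 0.0, 0.5), (0.5, 0.5, 0.0),  # Na
    (0.0, 0.5, 0.0), (0.0, 0.0, 0.5), (0.5, 0.5, 0.5), (0.5, 0.0, 0.0),  # Cl
]
text = format_seed(lattice, ["Na", "Cl"], [4, 4], coords)
print(seed_filename(3))
print(text)
```

Writing `text` to `Seeds/POSCARS_3` would then request insertion in the 3rd generation.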


In [2]:
!cat Seeds/POSCARS_3

Seed
1.0
5.630000 0.000000 0.000000
0.000000 5.630000 0.000000
0.000000 0.000000 5.630000
Na Cl
4 4
direct
0.000000 0.000000 0.000000 
0.000000 0.500000 0.500000 
0.500000 0.000000 0.500000 
0.500000 0.500000 0.000000 
0.000000 0.500000 0.000000
0.000000 0.000000 0.500000 
0.500000 0.500000 0.500000 
0.500000 0.000000 0.000000 


In [3]:
!cat Specific/xrd_target.csv |head -20

1.000000000000000021e-02,1.000000000000000000e+00
2.000000000000000042e-02,1.000000000000000000e+00
2.999999999999999889e-02,1.000000000000000000e+00
4.000000000000000083e-02,1.000000000000000000e+00
5.000000000000000278e-02,1.000000000000000000e+00
6.000000000000000472e-02,1.000000000000000000e+00
6.999999999999999278e-02,1.000000000000000000e+00
8.000000000000000167e-02,1.000000000000000000e+00
8.999999999999999667e-02,1.000000000000000000e+00
9.999999999999999167e-02,1.000000000000000000e+00
1.100000000000000006e-01,1.000000000000000000e+00
1.199999999999999956e-01,1.000000000000000000e+00
1.300000000000000044e-01,1.000000000000000000e+00
1.400000000000000133e-01,1.000000000000000000e+00
1.500000000000000222e-01,1.000000000000000000e+00
1.600000000000000033e-01,1.000000000000000000e+00
1.700000000000000122e-01,1.000000000000000000e+00
1.800000000000000211e-01,1.000000000000000000e+00
1.900000000000000022e-01,1.000000000000000000e+00
2.000000000000000111e-01,1.0000


### Run USPEX

Execute USPEX.

> ./ux-test.sh &

The script keeps running until the USPEX_IS_DONE file is generated.


In [1]:
%%bash
./ux-test.sh &

### Check result

If the USPEX_IS_DONE file has been generated, USPEX finished successfully.
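
Because the script runs in the background, it can be convenient to poll for the marker file from a notebook cell. The following is a minimal sketch; `wait_for_done` and its parameters are hypothetical, and the only thing taken from this notebook is the marker file name USPEX_IS_DONE.

```python
import time
from pathlib import Path


def wait_for_done(run_dir=".", marker="USPEX_IS_DONE", poll_seconds=30.0, timeout=None):
    """Poll until the marker file exists; return False if the timeout expires first."""
    deadline = None if timeout is None else time.monotonic() + timeout
    path = Path(run_dir) / marker
    while not path.exists():
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True


if __name__ == "__main__":
    # timeout=0 performs a single check of the current directory and returns at once.
    print(wait_for_done(timeout=0))
```

With `timeout=None` the call blocks until the run finishes, so a finite timeout is the safer choice inside a notebook.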


### Check results_N folder

The result files are located in the ./results_N folder (./results1 in this example).

- BESTIndividuals file in results_N folder : the best structure of each generation.
- goodStructures file in results_N folder : structures listed in order of fitness.
- ./CalcFoldTemp-allraw folder : raw result files and all structures.

### Check rawdata

The raw data is generated in ./CalcFoldTemp-allraw.
The best structure can be extracted from this folder.
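
One way to locate the best structure is to read the ID with the lowest fitness from BESTIndividuals and look up the matching file. The sketch below parses the table layout printed in this notebook; the regular expression and `parse_best_individuals` are hypothetical, and the file name pattern `CONTCAR-volchanged-<ID>.cif` is inferred from the single file shown in this notebook, so treat it as an assumption.

```python
import re

# Gen, ID, Origin, [Composition], Enthalpy, Volume, Density, Fitness
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\S+)\s+\[([^\]]*)\]\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)"
)


def parse_best_individuals(text):
    """Return one dict per data row; header lines do not match and are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "gen": int(m.group(1)),
                "id": int(m.group(2)),
                "origin": m.group(3),
                "volume": float(m.group(6)),
                "fitness": float(m.group(8)),
            })
    return rows


# Two rows copied from the BESTIndividuals output of this notebook.
sample = """Gen   ID    Origin   Composition    Enthalpy   Volume  Density   Fitness
  1    1   Random    [     4  4  ] 100000.000   231.446   1.677     -0.712 [ 1  1  1] 186  0.148  3.623  3.122
  3   18   Seeds     [     4  4  ] 100000.000   178.454   2.175     -1.000 [ 1  1  1] 225 -0.000  6.275  6.278
"""

# With a negative optType the fitness is minimized, so the lowest value is best.
best = min(parse_best_individuals(sample), key=lambda row: row["fitness"])
print(best["id"], best["fitness"])
print(f"./CalcFoldTemp-allraw/CONTCAR-volchanged-{best['id']}.cif")
```

For a real run, `sample` would be replaced by the contents of `./results1/BESTIndividuals`.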

In [2]:
!ls ./results1/

AuxiliaryFiles		   Parameters.txt	    gatheredPOSCARS_unrelaxed
BESTIndividuals		   Properties		    generation1
BESTgatheredPOSCARS	   Seeds_history	    generation2
BESTgatheredPOSCARS_order  USPEX.mat		    generation3
Individuals		   USPEX.mat.backup	    goodStructures
OUTPUT.txt		   compositionStatistic     goodStructures_POSCARS
POOL.mat		   enthalpies_complete.dat  origin
POOL.mat.backup		   gatheredPOSCARS	    symmetrized_structures.cif
POSCAR			   gatheredPOSCARS_order


In [3]:
!cat ./results1/BESTIndividuals

Gen   ID    Origin   Composition    Enthalpy   Volume  Density   Fitness   KPOINTS  SYMM  Q_entr A_order S_order
                                      (eV)     (A^3)  (g/cm^3)
  1    1   Random    [     4  4  ] 100000.000   231.446   1.677     -0.712 [ 1  1  1] 186  0.148  3.623  3.122
  2   11 keptBest    [     4  4  ] 100000.000   231.446   1.677     -0.712 [ 1  1  1] 186  0.148  3.623  3.122
  3   18   Seeds     [     4  4  ] 100000.000   178.454   2.175     -1.000 [ 1  1  1] 225 -0.000  6.275  6.278


In [4]:
!cat ./CalcFoldTemp-allraw/CONTCAR-volchanged-18.cif

# generated using pymatgen
data_NaCl
_symmetry_space_group_name_H-M   'P 1'
_cell_length_a   5.63000000
_cell_length_b   5.63000000
_cell_length_c   5.63000000
_cell_angle_alpha   90.00000000
_cell_angle_beta   90.00000000
_cell_angle_gamma   90.00000000
_symmetry_Int_Tables_number   1
_chemical_formula_structural   NaCl
_chemical_formula_sum   'Na4 Cl4'
_cell_volume   178.45354700
_cell_formula_units_Z   4
loop_
 _symmetry_equiv_pos_site_id
 _symmetry_equiv_pos_as_xyz
  1  'x, y, z'
loop_
 _atom_site_type_symbol
 _atom_site_label
 _atom_site_symmetry_multiplicity
 _atom_site_fract_x
 _atom_site_fract_y
 _atom_site_fract_z
 _atom_site_occupancy
  Na  Na0  1  0.00000000  0.00000000  0.00000000  1.0
  Na  Na1  1  0.00000000  0.50000000  0.50000000  1.0
  Na  Na2  1  0.50000000  0.00000000  0.50000000  1.0
  Na  Na3  1  0.50000000  0.50000000  0.00000000  1.0
  Cl  Cl4  1  0.00000000  0.50000000  0.00000000  1.0
  Cl  Cl5  1  0.00000000  0.00000000  0.500000

Note that a seed file whose XRD pattern has a cosine similarity of 100% with the target was prepared so that this test calculation finishes quickly.