# `SnB` Python API

## `ShakeNBreak` applied to Te interstitials in CdTe ($Te_{i}$)

In this notebook we show how to generate interstitial defects using `doped`, and apply `ShakeNBreak` (`SnB`) to them. To see the full `ShakeNBreak` workflow, have a look at the notebook `ShakeNBreak_Example_Workflow.ipynb` (we recommend viewing it on the [SnB Python API tutorial](https://shakenbreak.readthedocs.io/en/latest/ShakeNBreak_Example_Workflow.html) docs page).

## Table of contents
* [Generate defects with doped/pymatgen](#generate)
* [Apply SnB to defects](#SnB)
* [Send to HPCs and run calculations](#HPCs)

In [2]:
import os
import sys

import ase
import numpy as np
import pymatgen
from importlib.metadata import version  # standard library (Python >= 3.8)

import shakenbreak

# Check versions
print("Pymatgen version:", version('pymatgen') )
print("Pymatgen-analysis-defects version:", version('pymatgen-analysis-defects') )
print("Ase version:", version('ase') )
print("ShakeNBreak version:", version('shakenbreak') )

Pymatgen version: 2022.11.7
Pymatgen-analysis-defects version: 2022.11.21
Ase version: 3.22.1
ShakeNBreak version: 22.11.19


#### Rationale for `SnB`   

Defect distortions often follow the change in electron count when introducing that defect to the system. For the neutral Cd vacancy ($V_{Cd}^0$) for example, the removal of Cd and its two valence electrons means that local distortions are likely to involve two neighbouring Te atoms moving closer/further apart to accommodate the broken bonds. For the singly-charged vacancy, we are likely to have just one neighbouring Te moving, etc. This isn't always the case, but typically points us in the right direction to search the PES, and has been confirmed to yield the best performance (see SI of _Identifying the ground state structures of point defects in solids_ Mosquera-Lois, Kavanagh, Walsh and Scanlon 2022).

So, the `SnB` method involves distorting the initial bond lengths around the defect for a mesh of trial distortions, with the number of neighbours to distort dictated by the change in valence electron count, performing coarse $\Gamma$-only (`vasp_gam`) relaxations and then comparing the final energies, to see if we identify any lower energy defect structures.
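The electron-counting heuristic described above can be sketched in a few lines. This is a simplified illustration, not `ShakeNBreak`'s own implementation: the function name `neighbours_to_distort` is hypothetical, and any special cases the package handles are ignored. It takes the number of electrons missing in the neutral defect and adds the charge state, so each added electron reduces the number of neighbours to distort by one.

```python
def neighbours_to_distort(missing_electrons_neutral: int, charge: int) -> int:
    """Hypothetical helper: number of nearest neighbours to distort.

    Simplified rule of thumb from the text above: the count follows the
    change in valence electron count, adjusted by the defect charge state.
    """
    return abs(missing_electrons_neutral + charge)


# Neutral Cd vacancy: Cd is removed along with its two valence electrons,
# so two electrons are missing in the neutral state.
for charge in (0, -1, -2):
    n = neighbours_to_distort(missing_electrons_neutral=2, charge=charge)
    print(f"charge {charge:+d}: distort {n} neighbour(s)")
```

With two missing electrons this gives 2, 1 and 0 neighbours for charge states 0, -1 and -2, matching the neutral and singly-charged vacancy examples in the text.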

<a id='generate'></a>

## 1. Generate defects with `doped/PyCDT`

If you prefer to use `doped`/`PyCDT` to generate defects, you should do so in a different `Python` environment to the `ShakeNBreak` one, as currently (18/10/22) they require different `pymatgen` versions.

The easiest way is to generate the defects in a different notebook (using your `doped/PyCDT` environment).
To do this, you can copy the cell below to another notebook. Alternatively, you can do everything in one notebook and just switch environments after generating the defects (saving them to a pickle file before switching!).

In [1]:
# To generate the CdTe defects (vacancies and interstitials) with doped, we can use the lines below.
# As this requires pymatgen < 2022.8.23, we need to do it in a different `Python` environment.

from pymatgen.core.structure import Structure
from doped.pycdt.core.defectsmaker import ChargedDefectsStructures

bulk_supercell = Structure.from_file("../tests/data/vasp/CdTe/CdTe_Bulk_Supercell_POSCAR")

def_structs = ChargedDefectsStructures(
    bulk_supercell,
    cellmax=bulk_supercell.num_sites,
    antisites_flag=False,  # Don't include antisites
    include_interstitials=True,
)



Setting up vacancies
Searching for Voronoi interstitial sites (this can take a while)

Number of jobs created:
    bulk = 1
    vacancies:
        vac_1_Cd = 5 with site multiplicity 32
        vac_2_Te = 5 with site multiplicity 32
    substitutions:
    interstitials:
        Int_Cd_1 = 3 with site multiplicity 32
        Int_Cd_2 = 3 with site multiplicity 128
        Int_Cd_3 = 3 with site multiplicity 1
        Int_Te_1 = 9 with site multiplicity 1
        Int_Te_2 = 9 with site multiplicity 128
        Int_Te_3 = 9 with site multiplicity 32
Total (non dielectric) jobs created = 47



In [2]:
# Run this cell to see the ChargedDefectsStructures output dictionary keys
[key for key in def_structs.defects.keys()]

['bulk', 'vacancies', 'substitutions', 'interstitials']

In [3]:
print("Interstitials generated:", [entry["name"] for entry in def_structs.defects["interstitials"]])

Interstitials generated: ['Int_Cd_1', 'Int_Cd_2', 'Int_Cd_3', 'Int_Te_1', 'Int_Te_2', 'Int_Te_3']


In [4]:
# Look at the PyCDT-proposed likely defect charge states, and consider whether you want to change them
for val in def_structs.defects["interstitials"]:
    print(f"Defect: {val['name']}")
    print(f"PyCDT-proposed defect charge states: {val['charges']}")
    print("Happy with this? If not look at the next code block \n")

Defect: Int_Cd_1
PyCDT-proposed defect charge states: [0, 1, 2]
Happy with this? If not look at the next code block 

Defect: Int_Cd_2
PyCDT-proposed defect charge states: [0, 1, 2]
Happy with this? If not look at the next code block 

Defect: Int_Cd_3
PyCDT-proposed defect charge states: [0, 1, 2]
Happy with this? If not look at the next code block 

Defect: Int_Te_1
PyCDT-proposed defect charge states: [-2, -1, 0, 1, 2, 3, 4, 5, 6]
Happy with this? If not look at the next code block 

Defect: Int_Te_2
PyCDT-proposed defect charge states: [-2, -1, 0, 1, 2, 3, 4, 5, 6]
Happy with this? If not look at the next code block 

Defect: Int_Te_3
PyCDT-proposed defect charge states: [-2, -1, 0, 1, 2, 3, 4, 5, 6]
Happy with this? If not look at the next code block 



We now recommend performing a $\Gamma$-point relaxation for the neutral state of each of these interstitial candidates, and selecting the ones that are lowest in energy.
For demonstration purposes, we'll focus on `Int_Te_3`, which has been found to be the lowest energy Te interstitial.

In [5]:
# Keep only Int_Te_3 (the last interstitial entry) for this example:
Te_i_dict = {"interstitials": [def_structs.defects["interstitials"][-1]], "bulk": def_structs.defects["bulk"]} # We need the bulk entry for later
Te_i_dict["interstitials"][0]["charges"] = [0,-1,-2]

In [6]:
import pickle

# Save the defects dictionary to pickle
file = "../tests/data/vasp/CdTe/doped_Te_i_dict.pickle"
with open(file, "wb") as f:
    pickle.dump(Te_i_dict, f)
print(f"Saved doped defects dict to {file}")

Saved doped defects dict to ../tests/data/vasp/CdTe/doped_Te_i_dict.pickle


Now we switch our virtual environment to the `ShakeNBreak` one.

In [1]:
# After generating the Doped defects dict using a different environment and saving it to pickle,
# we load it in the ShakeNBreak environment
import pickle

file = "../tests/data/vasp/CdTe/doped_Te_i_dict.pickle"  # Path of the pickle file where we saved the Doped dictionary
with open(file, "rb") as f:
    Te_i_dict = pickle.load(f)

In [2]:
# Check Doped defects dict was loaded ok
for single_defect_dict in Te_i_dict["interstitials"]:
    print(single_defect_dict["name"])

# Check bulk entry is present
print("Keys of bulk entry:", Te_i_dict["bulk"].keys())

Int_Te_3
Keys of bulk entry: dict_keys(['name', 'supercell'])


<a id='SnB'></a>

## 2. Apply the `SnB` method to your defects

The default settings and parameter choices in this package are those that have performed best in testing so far (i.e. the widest range of distortions leading to the ground-state structure, at the lowest computational cost); see the SI of _Identifying the ground state structures of point defects in solids_, Mosquera-Lois, Kavanagh, Walsh and Scanlon 2022.

If you encounter improved performance with non-default parameter choices, we'd love to know! Please get in touch via GitHub or by email: sean.kavanagh.19@ucl.ac.uk & i.mosquera-lois22@imperial.ac.uk

If you are investigating defects in hard/ionic/magnetic/correlated materials, or systems involving spectator ions (like A in ABX$_3$), there are some extra considerations for boosting the performance & efficiency of `SnB` listed on the [Miscellaneous Tips & Tricks](https://shakenbreak.readthedocs.io/en/latest/Tips.html) docs page.

### 2.1 Generating distorted structures

In [3]:
from shakenbreak import energy_lowering_distortions, input
from shakenbreak.input import Distortions

In [4]:
# In order to determine the number of the defect nearest neighbours to distort (based on the change 
# in valence electrons mentioned above), SnB uses the oxidation states of atoms in our material:
# If not specified, the code will guess these, otherwise you can specify as such:
# oxidation_states = {"Cd": +2, "Te": -2}  # specify atom oxidation states

# Create an instance of the Distortions class with the defects and distortion parameters
# If distortion parameters are not specified, the default values are used
Dist = Distortions(defects=Te_i_dict)

Oxidation states were not explicitly set, thus have been guessed as {'Cd': 2.0, 'Te': -2.0}. If this is unreasonable you should manually set oxidation_states


The `Distortions()` class accepts several input types: a single `pymatgen` `Defect` object, a list of `Defect`s, or a dictionary of `Defect`s (in which case the dictionary keys are used as the defect names).

The defect dictionary output by `ChargedDefectsStructures` in `doped`/`PyCDT` can also be used to initialise `Distortions`, as done in the previous cell.

These possibilities as well as the optional distortion parameters are detailed in the `Distortions` class docstring:

In [7]:
Distortions?

Init signature:
Distortions(
    defects: Union[list, dict],
    oxidation_states: Union[dict, NoneType] = None,
    padding: int = 1,
    dict_number_electrons_user: Union[dict, NoneType] = None,
    distortion_increment: float = 0.1,
    bond_distortions: Union[list, NoneType] = None,

In [5]:
# We can check the distortion parameters using some of the class properties
print(f"Bond distortions: {Dist.bond_distortions}")
print(f"Rattle standard deviation: {Dist.stdev:.2f} Å")  # set to 10% of the bulk bond length by default, typically a reasonable value

Bond distortions: [-0.6, -0.5, -0.4, -0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
Rattle standard deviation: 0.28 Å
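The values printed above can be reproduced with a short standalone sketch. It assumes the mesh runs symmetrically from -0.6 to +0.6 in steps of `distortion_increment = 0.1` (as in the printed list), that folder names follow the `Bond_Distortion_-10.0%` pattern used elsewhere in this notebook, and that the 13.086768 Å cubic supercell is a 2×2×2 repeat of the conventional zinc blende cell, so the bulk bond length is $\sqrt{3}a/4$. The helper names are hypothetical and not part of `ShakeNBreak`.

```python
import math


def distortion_mesh(increment=0.1, max_distortion=0.6):
    """Symmetric mesh of fractional bond distortions (hypothetical helper)."""
    steps = round(max_distortion / increment)
    return [round(i * increment, 3) for i in range(-steps, steps + 1)]


def folder_name(distortion):
    """Folder label for a given distortion, e.g. -0.1 -> 'Bond_Distortion_-10.0%'."""
    return f"Bond_Distortion_{distortion:.1%}"


mesh = distortion_mesh()
print("Bond distortions:", mesh)
print("Example folder:", folder_name(-0.1))

# Rattle standard deviation as 10% of the bulk Cd-Te bond length.
# Assumption: zinc blende CdTe, supercell = 2x2x2 conventional cells.
supercell_length = 13.086768  # Å, from the lattice printed in this notebook
conventional_a = supercell_length / 2
bond_length = math.sqrt(3) / 4 * conventional_a
stdev = 0.1 * bond_length
print(f"Rattle standard deviation: {stdev:.2f} Å")
```

Under these assumptions the sketch gives the same 13 distortion values and the same 0.28 Å rattle standard deviation as the `Distortions` properties.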


In [6]:
# You can restrict the ions that are distorted to a certain element using the keyword distorted_elements
# We can check it using the class attribute
print("User defined elements to distort:", Dist.distorted_elements)
# If None, it means no restrictions so nearest neighbours are distorted (recommended default, 
# unless you have reason to suspect otherwise; see shakenbreak.readthedocs.io/en/latest/Tips.html)

User defined elements to distort: None


If we're only interested in generating distorted structures, but not in writing `VASP`/other codes input files, we can use the class method `Distortions.apply_distortions()` to do this.

In [7]:
defects_dict, distortion_metadata = Dist.apply_distortions()

Applying ShakeNBreak... Will apply the following bond distortions: ['-0.6', '-0.5', '-0.4', '-0.3', '-0.2', '-0.1', '0.0', '0.1', '0.2', '0.3', '0.4', '0.5', '0.6']. Then, will rattle with a std dev of 0.28 Å 

Defect: Int_Te_3
Number of missing electrons in neutral state: 2

Defect Int_Te_3 in charge state: 0. Number of distorted neighbours: 2

Defect Int_Te_3 in charge state: -1. Number of distorted neighbours: 1

Defect Int_Te_3 in charge state: -2. Number of distorted neighbours: 0


In [9]:
defects_dict["Int_Te_3"].keys()

dict_keys(['defect_type', 'defect_site', 'defect_supercell_site', 'defect_multiplicity', 'charges'])

In [11]:
# The output dictionary contains information about each defect:
print("Keys for each defect entry:", defects_dict["Int_Te_3"].keys())

# As well as the distorted structures for each charge state of all defects
# We can access the distorted structures of Int_Te_3 in the neutral charge state like this:
print("\nUndistorted and distorted structures:")
defects_dict["Int_Te_3"]["charges"][0]["structures"]

Keys for each defect entry: dict_keys(['defect_type', 'defect_site', 'defect_supercell_site', 'defect_multiplicity', 'charges'])

Undistorted and distorted structures:


{'Unperturbed': Structure Summary
 Lattice
     abc : 13.086768 13.086768 13.086768
  angles : 90.0 90.0 90.0
  volume : 2241.2856479961474
       A : 13.086768 0.0 0.0
       B : 0.0 13.086768 0.0
       C : 0.0 0.0 13.086768
     pbc : True True True
 PeriodicSite: Te4+ (9.8151, 3.2717, 9.8151) [0.7500, 0.2500, 0.7500]
 PeriodicSite: Cd2+ (0.0000, 0.0000, 0.0000) [0.0000, 0.0000, 0.0000]
 PeriodicSite: Cd2+ (0.0000, 0.0000, 6.5434) [0.0000, 0.0000, 0.5000]
 PeriodicSite: Cd2+ (0.0000, 6.5434, 0.0000) [0.0000, 0.5000, 0.0000]
 PeriodicSite: Cd2+ (0.0000, 6.5434, 6.5434) [0.0000, 0.5000, 0.5000]
 PeriodicSite: Cd2+ (6.5434, 0.0000, 0.0000) [0.5000, 0.0000, 0.0000]
 PeriodicSite: Cd2+ (6.5434, 0.0000, 6.5434) [0.5000, 0.0000, 0.5000]
 PeriodicSite: Cd2+ (6.5434, 6.5434, 0.0000) [0.5000, 0.5000, 0.0000]
 PeriodicSite: Cd2+ (6.5434, 6.5434, 6.5434) [0.5000, 0.5000, 0.5000]
 PeriodicSite: Cd2+ (0.0000, 3.2717, 3.2717) [0.0000, 0.2500, 0.2500]
 PeriodicSite: Cd2+ (0.0000, 3.2717, 9.8151) [0

_(If you are viewing this on the [SnB Python API tutorial](https://shakenbreak.readthedocs.io/en/latest/ShakeNBreak_Example_Workflow.html) docs page, long output cells like this and printed dictionaries/structures below are scrollable!)_

### 2.2 Generating `VASP` input files for the distorted structures

If we want to generate `VASP` input files, we can use the class method `Distortions.write_vasp_files()` (instead of `Distortions.apply_distortions()`).

In [5]:
defects_dict, distortion_metadata = Dist.write_vasp_files()

Applying ShakeNBreak... Will apply the following bond distortions: ['-0.6', '-0.5', '-0.4', '-0.3', '-0.2', '-0.1', '0.0', '0.1', '0.2', '0.3', '0.4', '0.5', '0.6']. Then, will rattle with a std dev of 0.28 Å 

Defect: Int_Te_3
Number of missing electrons in neutral state: 2

Defect Int_Te_3 in charge state: 0. Number of distorted neighbours: 2

Defect Int_Te_3 in charge state: -1. Number of distorted neighbours: 1

Defect Int_Te_3 in charge state: -2. Number of distorted neighbours: 0


Using the optional `incar_settings` argument of `Distortions.write_vasp_files()` above, we can also specify custom `INCAR` tags, for example to match our converged `ENCUT` for this system and the optimal `NCORE` for the HPC we will run the calculations on. More information on the generated distortions can be obtained by setting `verbose = True`. Note that numerical values (e.g. `{"IBRION": 1}`) and booleans (e.g. `{"LREAL": False}`) can be given directly, while any other `INCAR` value needs to be input as a string with quotation marks (e.g. `{"ALGO": "All"}`).
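As a minimal sketch of the value-type rule just described, the snippet below builds an `incar_settings` dictionary and checks it with a small helper. The helper `check_incar_settings` is hypothetical (it only mirrors the rule stated above and is not the package's own validation), and the `ENCUT` and `NCORE` values are placeholders to be replaced with your own converged and machine-specific choices.

```python
from numbers import Number


def check_incar_settings(settings):
    """Return the tags whose values are not numbers, booleans or strings.

    Hypothetical helper illustrating the rule above; booleans count as
    numbers in Python, so True/False pass the check.
    """
    return [tag for tag, value in settings.items()
            if not isinstance(value, (Number, str))]


incar_settings = {
    "ENCUT": 450,     # placeholder: use your converged value
    "NCORE": 24,      # placeholder: use the optimal value for your HPC
    "LREAL": False,   # booleans can be given directly
    "ALGO": "All",    # anything else must be a quoted string
}
bad_settings = {"ROPT": [1e-3, 1e-3, 1e-3]}  # should be the string "1e-3 1e-3 1e-3"

print("Problem tags:", check_incar_settings(incar_settings))
print("Problem tags:", check_incar_settings(bad_settings))

# In the notebook, the dictionary would then be passed as described above:
# defects_dict, distortion_metadata = Dist.write_vasp_files(
#     incar_settings=incar_settings, verbose=True
# )
```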

Our distorted structures and VASP input files have now been generated in the `Int_Te_3_X` folders.

For the recommended default coarse structure-searching `INCAR` settings, either have a look at the `incar.yaml` file in the `SnB_input_files` folder or at the generated files:

In [6]:
!cat ./Int_Te_3_0/Bond_Distortion_-10.0%/INCAR

# May want to change NCORE, KPAR, AEXX, ENCUT, NUPDOWN, ISPIN, POTIM = 
# ShakeNBreak INCAR with coarse settings to maximise speed with sufficient accuracy for qualitative structure searching = 
# KPAR = # No KPAR, only one kpoint
ALGO = Normal
EDIFFG = -0.01
ENCUT = 300
HFSCREEN = 0.2
IBRION = 2 # While often slower than '1' (RMM-DIIS), this is more stable and reliable, and vasp_gam relaxations are typically cheap enough to justify it
ISIF = 2
ISMEAR = 0
ISPIN = 2 # Spin polarisation likely for defects
ISYM = 0 # Symmetry breaking extremely likely for defects
LASPH = True
LCHARG = False
LHFCALC = True
LORBIT = 11
LREAL = Auto
LWAVE = False
NCORE = 12
NEDOS = 2000
NELM = 100
NSW = 300
PREC = Accurate
PRECFOCK = Fast
SIGMA = 0.05
NELECT = 582.0
NUPDOWN = 0 # But could be 2 if strong spin polarisation or magnetic behaviour present
EDIFF = 1e-05 # May need to reduce for tricky relaxations
ROPT = 1e-3 1e-3 1e-3


Note that the `NELECT` `INCAR` tag (number of electrons) is automatically determined based on the choice of `POTCAR`s. The default in `ShakeNBreak` is to use the [`MPRelaxSet` `POTCAR` choices](https://github.com/materialsproject/pymatgen/blob/master/pymatgen/io/vasp/MPRelaxSet.yaml), but if you're using different ones, make sure to set `potcar_settings` in `Distortions.write_vasp_files()`, so that `NELECT` is then set accordingly. This requires the `pymatgen` config file `$HOME/.pmgrc.yaml` to be properly set up as detailed on the [GitHub `README`](https://github.com/SMTG-UCL/ShakeNBreak) and [Installation](https://shakenbreak.readthedocs.io/en/latest/Installation.html) docs page.
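To see why `NELECT` depends on the `POTCAR` choice, the sketch below sums the valence electrons of every atom in the supercell and subtracts the defect charge. The valence counts (12 for Cd, 6 for Te) are assumptions for the default Cd and Te `POTCAR`s, and the composition assumes a 64-atom CdTe supercell plus one Te interstitial; both are consistent with the `NELECT = 582.0` shown in the neutral `INCAR` above. The function `nelect` is a hypothetical helper, not part of `ShakeNBreak`.

```python
# Assumed valence electrons per atom for the POTCARs in use
VALENCE_ELECTRONS = {"Cd": 12, "Te": 6}


def nelect(composition, charge, valence=VALENCE_ELECTRONS):
    """Total electron count for a supercell with the given defect charge.

    A negative charge state means extra electrons, so the charge is subtracted.
    """
    neutral = sum(valence[element] * count for element, count in composition.items())
    return float(neutral - charge)


# Assumed composition: 32 Cd + 32 Te bulk supercell, plus one Te interstitial
int_te_supercell = {"Cd": 32, "Te": 33}

for charge in (0, -1, -2):
    print(f"Int_Te_3 charge {charge:+d}: NELECT = {nelect(int_te_supercell, charge)}")
```

A different `POTCAR` (for example one with a different number of valence electrons treated explicitly) would change the per-atom counts, and therefore the `NELECT` needed for each charge state.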

For generating the input files for other electronic structure codes (`Quantum Espresso`, `FHI-aims`, `CP2K`, `CASTEP`), see the [(Optional) Generate input files for other codes](#other) section at the end of this notebook.

<a id='HPCs'></a>

## 3. Send to HPCs and run relaxations

You can use the `snb-run` CLI command to quickly run calculations; see the [Submitting the geometry optimisations](https://shakenbreak.readthedocs.io/en/latest/Generation.html#submitting-the-geometry-optimisations) section of the CLI tutorial for this.

#### a) For `VASP` users:

Then parse the energies by running the `snb-parse` command from the top-level folder containing your defect folders (e.g. `Int_Te_3_0` etc., with subfolders `Int_Te_3_0/Bond_Distortion_10.0%` etc.). This will parse the energies and store them in a `.yaml` file in each defect folder (e.g. `Int_Te_3_0.yaml`), to allow easy plotting and analysis.

It is also recommended to parse the final structures (`CONTCAR` files if using `VASP`) obtained from each distortion relaxation for further structural analysis, which is done automatically once they are downloaded to your local folders as below.

These data can be copied quickly to your local PC by running the following code from your local top-level folder (containing `Int_Te_3_0` etc.):

```bash
for defect in ./*{_,_-}[0-9]/; do cd $defect; scp {remote_machine}:{path to ShakeNBreak folders}/${defect}${defect%?}.yaml .; for distortion in (Bond_Distortion|Unperturbed|Rattled)*/; do scp {remote_machine}:{path to ShakeNBreak folders}/${defect}${distortion}CONTCAR ${distortion}; done; cd ..; done
```
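making sure to change `{remote_machine}` and `{path to ShakeNBreak folders}` to the correct values in your case. Note that the `(Bond_Distortion|Unperturbed|Rattled)*/` glob pattern is `zsh` syntax; in `bash`, use `@(Bond_Distortion|Unperturbed|Rattled)*/` after running `shopt -s extglob`.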
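If you would rather not depend on shell globbing, the same file list can be assembled in `Python`. The sketch below only builds the `scp` commands (it does not run them) for the `.yaml` energy file of each defect folder and the `CONTCAR` of each distortion subfolder. The function `build_scp_commands`, the remote name `my_hpc` and the remote path `/scratch/snb` are all hypothetical placeholders, and the folder-name patterns are assumptions based on the names used in this notebook.

```python
import re
import tempfile
from pathlib import Path

# Defect folders end in a charge state, e.g. Int_Te_3_0 or Int_Te_3_-1
DEFECT_DIR = re.compile(r".*_-?[0-9]$")
DISTORTION_PREFIXES = ("Bond_Distortion", "Unperturbed", "Rattled")


def build_scp_commands(local_root, remote_machine, remote_root):
    """Build (but do not run) scp commands mirroring the shell loop above."""
    commands = []
    for defect_dir in sorted(Path(local_root).iterdir()):
        if not (defect_dir.is_dir() and DEFECT_DIR.match(defect_dir.name)):
            continue
        name = defect_dir.name
        remote_defect = f"{remote_machine}:{remote_root}/{name}"
        # Parsed energies file, copied into the local defect folder
        commands.append(["scp", f"{remote_defect}/{name}.yaml", str(defect_dir)])
        # Relaxed structure of each distortion, copied into its subfolder
        for sub in sorted(defect_dir.iterdir()):
            if sub.is_dir() and sub.name.startswith(DISTORTION_PREFIXES):
                commands.append(["scp", f"{remote_defect}/{sub.name}/CONTCAR", str(sub)])
    return commands


# Demonstration on a throwaway folder tree
with tempfile.TemporaryDirectory() as tmp:
    for sub in ("Int_Te_3_0/Bond_Distortion_-10.0%", "Int_Te_3_0/Unperturbed",
                "Int_Te_3_-1/Rattled", "notes"):
        (Path(tmp) / sub).mkdir(parents=True)
    commands = build_scp_commands(tmp, "my_hpc", "/scratch/snb")

for command in commands:
    print(" ".join(command))
```

Each returned command is a list of arguments, so it could be passed to `subprocess.run` once the remote machine and path have been set to real values.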

#### b) If using `CP2K`, `Quantum Espresso`, `CASTEP` or `FHI-aims`:
Then parse the energies by running the `snb-parse` command from the top-level folder containing your defect folders (e.g. `Int_Te_3_0` etc., with subfolders `Int_Te_3_0/Bond_Distortion_10.0%` etc.) and setting the `--code` option (e.g. `snb-parse --code cp2k`). This will parse the energies and store them in a `.yaml` file in each defect folder (e.g. `Int_Te_3_0.yaml`), to allow easy plotting and analysis.

It is also recommended to parse the final structures obtained with each relaxation for further structural analysis. Depending on the code, the structure information is read from:
* `CP2K`: restart file
* `Quantum Espresso`: output file
* `CASTEP`: output file (i.e. `.castep`)
* `FHI-aims`: output file