# Welcome to the Catplat CLI tutorial

Catplat is a high-throughput screening tool for heterogeneous catalytic descriptors, with database storage functionality.

## 1. Setting up and installing catplat

### Set up environment
Catplat is developed and built on top of a variety of packages, such as ase, pymatgen, catkit and pyatoms. For convenience, we have set up the environment with the necessary dependencies and packages on ACRC supercomputers (Pluto, Corona, Stratus). When using these supercomputers, we can simply activate the environment to start using catplat.

**1.1 Activate centrally managed conda on your server**

In [None]:
# for Pluto (ACRC)
!source /apps/anaconda3-individual-edition/2020.11/etc/profile.d/conda.sh

# for Corona/ Stratus (ACRC)
!source /apps/anaconda3-individual-edition/install/etc/profile.d/conda.sh

**1.2 Activate catplat environment**

In [None]:
!conda activate ~chenwjb/miniconda3/envs/catplat

**1.3 Project.yaml file config**

The project.yaml file contains the INCAR, KPOINTS and dft functional information for calculations. A template can be found in this repo, named project.yaml.

In [None]:
# 1) Make a copy and rename the template project.yaml file
!cp project.yaml {project_name}.yaml

# 2) Edit the .yaml file with the provided template.
!vi {project_name}.yaml

# 3) Make a copy of the yaml file in the using_settings_dir of pyatoms.
!pyatoms config vasp-project --yaml-file {project_name}.yaml

## 2. Catplat CLI commands

At this stage, you should have access to all of the catplat functionalities and to the command line interface.

There are three main commands for using catplat. For more details on each command, run *!catplat {script_name} --help*.

        2.1 catplat config
        2.2 catplat calculate
        2.3 catplat retrieve

### 2.1  Catplat Config

<img src="https://github.com/chryston/catplat_tutorial/blob/main/pictures/flow_config.PNG?raw=true" alt="flowchart-config" style="width: 500px;"/>

catplat config is used to configure and view project settings.

1. catplat config user -> Sets the main run dir for catplat calculations
2. catplat config project -> Configures the calculation and database path for project
3. catplat config show -> Displays the METADATA file in user_project_settings


**2.1.1 Set up calculation and database path for project**


In [None]:
# create local calculation and database
!catplat config project --name wgs --calc-path ~/catplat/wgs/calc --db-path ~/catplat/wgs/db


# displays config information
!catplat config show

This specifies the path for carrying out calculations and the path where the database will be stored for the wgs Project.

--db-path can also be a path to a mysql database in the format:

    mysql://<user>:<password>@<host>:<port>/<dbname>
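
Before passing such a URL to --db-path, it can help to check that it splits into the expected parts. The following is a minimal sketch using only Python's standard library. It is not part of catplat, and the helper name `split_db_url` is hypothetical.

```python
from urllib.parse import urlsplit

def split_db_url(url):
    """Split a mysql://<user>:<password>@<host>:<port>/<dbname> URL into its parts."""
    parts = urlsplit(url)
    if parts.scheme != "mysql":
        raise ValueError(f"expected a mysql:// URL, got scheme {parts.scheme!r}")
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "dbname": parts.path.lstrip("/"),
    }

# Same example URL as used in this tutorial.
info = split_db_url("mysql://wgsuser:wgspassword@127.0.0.1:3306/wgs")
print(info)
```

A missing port or database name shows up as `None` or an empty string in the returned dictionary, which makes typos in the URL easy to spot.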

In [None]:
!catplat config project --name wgs --calc-path ~/catplat/wgs/calc --db-path mysql://wgsuser:wgspassword@127.0.0.1:3306/wgs

### 2.2 Catplat Calculate

<img src="https://github.com/chryston/catplat_tutorial/blob/main/pictures/flow_calculate.PNG?raw=true" alt="flowchart-calculate" style="width: 500px;"/>


catplat calculate retrieves data from the catplat database and performs any calculations needed if no relevant entries are found. It follows the same syntax as catplat retrieve. There are many options for this command; however, they can be grouped into 4 categories.

The following subsections contain the details and usage of the options:

        2.2.1 Calculation Options
        2.2.2 Bulk Options
        2.2.3 Slab Options
        2.2.4 Adsorbate Options
    

**Output file:**

A sample of the output file can be found in this repo as sample_out.txt.

1. Header: Contains catplat version information and start time.
2. Inputs: Displays inputs such as project, calculation path, database path, num_nodes, num_job_run_slots, test_mode, job_runner.
3. Query: Displays inputs used for the query.
4. Retrieved data: Displays data retrieved from the database.

#### 2.2.1 Calculation Options



The first group of options is the calculation options. These options describe the settings used for the calculation (e.g. project name, number of nodes, etc.).

**Options:**

    -p / --project      Name of project to run. Contains information such as database_path, incar settings, etc.
    -n / --num-nodes    Number of nodes to assign for each num-job-run-slots.
    --test              Flag for testing. Results will not be written to the database.
    --fakerunner        Flag to use FakeVaspOverallJobRunner(). Results will be written to the database 
                        with placeholder energy value of 123.0 eV.

**Examples:**

In [None]:
# --project option is mandatory
!catplat calculate --project wgs

# request for more nodes for single job
!catplat calculate --project wgs --num-nodes 4

# request parallel running of 4 jobs, each using a single node
!catplat calculate --project wgs --num-job-run-slots 4

# the total number of nodes requested for the job would be --num-nodes * --num-job-run-slots (default = 1 node)
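
The node arithmetic in the comment above can be sketched as follows. This only illustrates the multiplication. The function name `total_nodes` is hypothetical and not a catplat API, and treating 1 as the default for both options is an assumption based on the comment above.

```python
def total_nodes(num_nodes=1, num_job_run_slots=1):
    """Total nodes requested = nodes per job slot * number of parallel job slots."""
    if num_nodes < 1 or num_job_run_slots < 1:
        raise ValueError("both values must be positive integers")
    return num_nodes * num_job_run_slots

print(total_nodes(num_nodes=4))                       # one job on 4 nodes
print(total_nodes(num_job_run_slots=4))               # 4 parallel single-node jobs
print(total_nodes(num_nodes=2, num_job_run_slots=4))  # 4 parallel jobs, 2 nodes each
```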

**Testing**:

It is often useful to preview the outcome of the calculations by looking at the initial structures generated by catplat before they are optimized. To do this, add the --test flag. The output file will look like that of a normal calculation, but the results will not be stored in the database. This shows how many calculations will likely be needed. Additionally, a GUI window will pop up with the expected initial structures.

In [None]:
# for testing
!catplat calculate --project wgs --test

In this tutorial, we use the --test flag in all of the sample commands.

#### 2.2.2 Bulk Options

The bulk workflow is usually the first workflow to be initiated. It obtains the specified bulk structure and relaxes it using cell optimization. The bulk structure can be defined in 2 ways: reading a structure file from the user, or querying the structure from the Materials Project.

**Options:**

    --bulk-atoms        Name of bulk file in user's bulk directory. When --bulk-atoms is specified, other bulk attributes such as 'e_above_hull' should not be specified.
    --e-above-hull      Energy above hull of the bulk structure as indicated on Materials Project. Comparator strings are preferred over float unless exact match is desired.
    --bulk-formula      Bulk formula as indicated on Materials Project.
    --chemsys           Chemical system is a string of a list of elements sorted alphabetically and joined by dashes, by convention for use in database keys.
    --spacegroup        Spacegroup number of bulk as indicated on Materials Project.
    --bulk-provenance   String for the origin of bulk atoms.
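
As an illustration of the --chemsys convention described above (elements sorted alphabetically and joined by dashes), here is a minimal Python sketch. The helper `make_chemsys` is hypothetical and only demonstrates the string format.

```python
def make_chemsys(elements):
    """Build a chemical-system string: unique element symbols, sorted, joined by dashes."""
    return "-".join(sorted(set(elements)))

print(make_chemsys(["Pt", "Ag"]))  # Ag-Pt
print(make_chemsys(["Ti", "O"]))   # O-Ti
print(make_chemsys(["Cu"]))        # Cu
```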

**Example 1: Bulk structure from User**

<img src="https://github.com/chryston/catplat_tutorial/blob/main/pictures/flow_user_bulk.PNG?raw=true" alt="flowchart-user-bulk" style="width: 500px;"/>

Bulk atoms can be obtained by reading a structure defined by the user. When bulk structure is specified, other bulk attributes should not be specified.



In [2]:
# specifying user bulk
!catplat calculate --project wgs --bulk-atoms valid_bulk

# Try it yourself!
# Try to read the provided structure "POSCAR_Cu_bulk" from path.
# !catplat calculate --project wgs --bulk-atoms /path/to/structure/file


In [None]:
# examples of commands that would result in an error
!catplat calculate --project wgs --bulk-atoms user-bulk --e-above-hull 0
!catplat calculate --project wgs --bulk-atoms user-bulk --spacegroup 225
!catplat calculate --project wgs --bulk-atoms user-bulk --chemsys Cu
!catplat calculate --project wgs --bulk-atoms user-bulk --bulk-formula AgPt3

**Example 2: Bulk from Materials Project**

<img src="https://github.com/chryston/catplat_tutorial/blob/main/pictures/flow_mp_bulk.PNG?raw=true" alt="flowchart-user-bulk" style="width: 500px;"/>

Bulk structures can also be obtained by querying the Materials Project (https://materialsproject.org/materials).

In [None]:
# pure metal systems
!catplat calculate --project wgs --chemsys Cu --test # returns 8 bulk Cu structures

**Filtering bulk by energy_above_hull**

Sieving of bulk structures using --e-above-hull is highly encouraged. Otherwise, many bulk structures would be returned. It is also recommended to use --e-above-hull as a comparator string unless exact matching is needed.
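
To make the idea of a comparator string concrete, the sketch below shows one way such a string could be interpreted. This is an assumption for illustration only: catplat's actual parsing is not shown in this tutorial, and the names `parse_comparator` and `matches` are hypothetical.

```python
import operator
import re

# Map each supported comparison symbol to the matching Python operator.
_OPS = {"<=": operator.le, ">=": operator.ge, "<": operator.lt, ">": operator.gt}

def parse_comparator(text):
    """Return (compare_function, threshold) for strings such as "<0.01" or "0.0"."""
    match = re.fullmatch(r"\s*(<=|>=|<|>)?\s*([-+]?\d*\.?\d+)\s*", str(text))
    if match is None:
        raise ValueError(f"cannot interpret {text!r}")
    symbol, number = match.groups()
    # A bare number (no symbol) means an exact match.
    return _OPS.get(symbol, operator.eq), float(number)

def matches(value, text):
    """Check whether a numeric value satisfies the comparator string."""
    compare, threshold = parse_comparator(text)
    return compare(value, threshold)

# Hypothetical e_above_hull values in eV/atom, chosen only for this example.
values = [0.0, 0.005, 0.02]
print([v for v in values if matches(v, "<0.01")])
print([v for v in values if matches(v, "<0.001")])
```

With a bare number such as "0", only values exactly equal to the threshold pass, which is why a comparator is usually the safer choice for floating-point quantities.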

In [None]:
!catplat calculate --project wgs --chemsys Cu --e-above-hull "<0.01" --test # 3 bulk structures with e_above_hull <0.01
!catplat calculate --project wgs --chemsys Cu --e-above-hull "<0.001" --test # 1 structure with e_above_hull <0.001
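
The quotes around the comparator strings above matter because an unquoted `<` is treated by the shell as input redirection. When building such commands from Python, the standard library's `shlex.join` adds the quoting automatically. The sketch below only assembles and prints the command string; it does not run catplat.

```python
import shlex

# Arguments as a list, exactly as they should reach the program.
args = [
    "catplat", "calculate",
    "--project", "wgs",
    "--chemsys", "Cu",
    "--e-above-hull", "<0.01",
    "--test",
]

# shlex.join quotes any argument containing shell-special characters.
command = shlex.join(args)
print(command)
```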

**Filtering bulk by spacegroup**

In [None]:
!catplat calculate --project wgs --chemsys Cu --spacegroup 225 --test # returns 1 bulk structure with spacegroup of 225
!catplat calculate --project wgs --chemsys Cu --spacegroup 194 --test # returns 2 bulk structures with spacegroup of 194 

**Filtering bulk alloy structures with bulk formula**


In [None]:
!catplat calculate --project wgs --chemsys Ag-Pt --test # returns 5 bulk alloy structures of Ag-Pt
!catplat calculate --project wgs --chemsys Ag-Pt --bulk-formula AgPt3 --test # returns 1 bulk alloy structure of Ag-Pt

#### 2.2.3 Slab Options

The slab workflow is the next workflow to be initiated. The slab workflow gets the structure of the slab and relaxes it.

Similar to bulk, the slab structure can be defined using 2 methods - Reading structure file from user or from the bulk obtained in the bulk workflow.

**Options:**

    --slab-atoms        Name of slab file in user's slab directory. When slab is specified, other bulk and slab attributes should not be specified.
    --termination       Slab termination of alloy.
    --overlayer         Replaces top layer atoms with overlayer atoms.
    --miller-index      Miller index of slab to be created.
    --unitcell-size     Size of slab to be created.
    --num-layers        Number of layers of the slab.
    --num-fixed-layers  Number of layers to be fixed.
    --vacuum            Amount of vacuum (in angstroms) to be applied in the z-direction.
    --conventional      Toggle for creation of conventional slabs.
    --slab-provenance   String for the origin of slab atoms.

**Example 1: Slab from User**

<img src="https://github.com/chryston/catplat_tutorial/blob/main/pictures/flow_user_slab.PNG?raw=true" alt="flowchart-user-bulk" style="width: 500px;"/>

Slab atoms can be obtained by reading a structure defined by the user. When a user slab structure is specified, other bulk and slab attributes should not be specified.

In [None]:
!catplat calculate --project wgs --slab-atoms valid_slab --test
# from path

In [None]:
# examples of commands that would result in an error
!catplat calculate --project wgs --bulk-atoms user-bulk --slab-atoms user-slab --test
!catplat calculate --project wgs --chemsys Cu --slab-atoms user-slab --test
!catplat calculate --project wgs --miller-index 1 1 1 --slab-atoms user-slab --test
!catplat calculate --project wgs --overlayer Cu --slab-atoms user-slab --test

**Example 2: Slab creation from bulk workflow structure**

Slab creation is often initiated after the bulk structure has been obtained from the bulk structure workflow.

<img src="https://github.com/chryston/catplat_tutorial/blob/main/pictures/flow_slab.PNG?raw=true" alt="flowchart-user-bulk" style="width: 500px;"/>

In [None]:
# example to continue slab workflow from user's bulk structure
!catplat calculate --project wgs --bulk-atoms Cu-bulk --miller-index 1 0 0 --unitcell-size 4 4 --test

# example to continue slab workflow from Materials Project bulk structure
!catplat calculate --project wgs --chemsys Cu --e-above-hull "<0.001" --miller-index 1 0 0 --unitcell-size 4 4 --test

The table below illustrates some of the options for slab creation.

<img src="https://github.com/chryston/catplat_tutorial/blob/main/pictures/slab_options.PNG?raw=true" alt="flowchart-user-bulk" style="width: 500px;"/>

**Miller index**

The Miller index of the slab must be specified for slab creation. This can be done with the --miller-index option.

In [None]:
# fcc(111)
!catplat calculate --project wgs --chemsys Cu --e-above-hull "<0.001" --miller-index 1 1 1 --unitcell-size 4 4 --test

# fcc(100)
!catplat calculate --project wgs --chemsys Cu --e-above-hull "<0.001" --miller-index 1 0 0 --unitcell-size 4 4 --test

# fcc(211)
# Due to the uneven z-positions of the 211 slab atoms, --num-layers and --num-fixed-layers must be set to 3 times the usual values (here 12 and 6).
!catplat calculate --project wgs --chemsys Cu --e-above-hull "<0.001" --miller-index 2 1 1 --unitcell-size 1 3 --num-layers 12 --num-fixed-layers 6 --test

# bcc(110)
!catplat calculate --project wgs --chemsys Fe --e-above-hull "<0.001" --miller-index 1 1 0 --unitcell-size 4 2 --test

# Rutile
!catplat calculate --project wgs --chemsys O-Ti --bulk-formula TiO2 --spacegroup 136 --e-above-hull "<0.04" --miller-index 1 1 0 --unitcell-size 4 2 --test

# for hcp bulk structures
!catplat calculate --project wgs --chemsys Co --e-above-hull "<0.001" --miller-index 0 0 1 --unitcell-size 4 4 --test # hcp(0001) surface
!catplat calculate --project wgs --chemsys Co --e-above-hull "<0.001" --miller-index 1 0 0 --unitcell-size 4 4 --test # 2 terminations for hcp(1010) surface

**Unit cell size**

The default value for the unit cell size is 1,1. Otherwise, the unit cell size can be specified using the --unitcell-size option.

In [None]:
!catplat calculate --project wgs --chemsys Cu --e-above-hull "<0.001" --miller-index 1 1 1 --unitcell-size 4 4 --test # creates a 4x4 slab

**Number of layers**

The default value for the number of layers is 4. Otherwise, the number of layers can be specified using the --num-layers option.

In [None]:
!catplat calculate --project wgs --chemsys Cu --e-above-hull "<0.001" --miller-index 1 1 1 --unitcell-size 4 4 --num-layers 6 --test # creates a 6 layered slab

**Number of fixed layers**

The default value for the number of fixed layers is 2. Otherwise, the number of fixed layers can be specified using the --num-fixed-layers option.

In [None]:
!catplat calculate --project wgs --chemsys Cu --e-above-hull "<0.001" --miller-index 1 1 1 --unitcell-size 4 4 --num-fixed-layers 1 --test # creates a slab with 1 fixed layer

**Vacuum**

The default value for vacuum is 10 angstroms. A different value can be set using the --vacuum option.


In [None]:
!catplat calculate --project wgs --chemsys Cu --e-above-hull "<0.001" --miller-index 1 1 1 --unitcell-size 4 4 --vacuum 5.0 --test # creates a slab with vacuum 5

**Overlayer**

Overlayer replaces the chemical species of the surface atoms, which is useful to model core@shell alloys.

In [None]:
!catplat calculate --project wgs --chemsys Cu --e-above-hull "<0.001" --miller-index 1 1 1 --unitcell-size 4 4 --overlayer Rh --test # creates a slab with Rh overlayer

**Termination**

Termination is used to distinguish different surfaces for alloy slabs.

In [None]:
!catplat calculate --project wgs --chemsys Cu-Pd --bulk-formula CuPd --e-above-hull "<0.001" --miller-index 1 0 0 --unitcell-size 4 4 --termination Cu --test # filters alloy slabs with Cu termination
!catplat calculate --project wgs --chemsys Cu-Pd --bulk-formula CuPd --e-above-hull "<0.001" --miller-index 1 0 0 --unitcell-size 4 4 --termination Pd --test # filters alloy slabs with Pd termination

#### 2.2.4 Adsorbate Options

The adsorbate workflow is the final workflow to be initiated. It analyzes the unique sites of the relaxed slab and generates monodentate or bidentate complexes on the slab. Adsorbate options describe how the adsorbate binds to the slab (for example, which adsorbate, which atom binds to the slab, which site of the slab to bind to, rotation of the adsorbate, etc.).


Adsorbate structures should first be created and stored in the user's adsorbate folder. During operation, catplat reads the structures from this folder. Some common adsorbates are also available by default from the ASE database.

<img src="https://github.com/chryston/catplat_tutorial/blob/main/pictures/flow_adsorbate.PNG?raw=true" alt="flowchart-user-bulk" style="width: 500px;"/>




**Options:**

    --adsorbate-atoms       Name of adsorbate. Adsorbate can be read from user's adsorbate directory. 
                            Some common adsorbates are available from the ASE database.
    --adsorbate-formula     Formula of adsorbate.
    --bonds                 Atom index of adsorbate to bind to the slab. Default bond = [[0]] for monodentate
                            and [[0,1]] for bidentate adsorbates.
    --connectivity          Connectivity of adsorption site for adsorbate to bind.
    --avg-coord-num         Average coordination of adsorption site for adsorbate to bind.
    --rotation              Rotation of adsorbate. Symmetrical adsorbates (e.g. C) have no rotations.

>**Previewing adsorbate structure**
>
>Prior to specification of --adsorbate-atoms, adsorbate structures and information can be previewed using the *catplat adsorbate* command. This preview would allow you to determine the desired "--bonds" attribute and provide some useful information regarding the adsorbate.
>
>From the ase gui, atoms indices can be visualised by clicking View > Show Labels > Atom Index.

In [None]:
# preview of adsorbate
!catplat adsorbate --adsorbate-atoms CO

>**Previewing surface sites**
>
> To determine the desired "--connectivity" and "--avg-coord-num", the slab surface and slab information can be first previewed using the *catplat surface* command. 

In [None]:
# preview of surface sites
!catplat surface --slab-atoms valid_slab

**Example 1: Monodentate adsorption**

Monodentate adsorption is automatically determined when len(bonds) == 1.

**Connectivity**

FCC(100) slabs have 3 unique sites (top, bridge, hollow). We can choose to analyze only top sites by specifying --connectivity "[1]".


**Bonds**

Bonds is the atom index of the adsorbate which binds to the slab.

In [None]:
# binds to slab through carbon
!catplat calculate --project wgs --chemsys Cu --e-above-hull "<0.001" --miller-index 1 0 0 --unitcell-size 4 4 --adsorbate-atoms CO --bonds "[0]" --test

# binds to slab through oxygen
!catplat calculate --project wgs --chemsys Cu --e-above-hull "<0.001" --miller-index 1 0 0 --unitcell-size 4 4 --adsorbate-atoms CO --bonds "[1]" --test

**Average Coordination Number**

Average coordination number of the surrounding atom(s) of the adsorption site. Average coordination number specification is not useful for flat slab surfaces such as fcc100 and fcc111 as the average coordination number is the same for all sites. However, average coordination number is very useful for uneven surfaces such as fcc211.

<img src="https://github.com/chryston/catplat_tutorial/blob/main/pictures/acn_rh2111.png?raw=true" alt="avg_coord_num of 211 slab" style="width: 500px;"/>
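
As a rough illustration of the quantity being filtered on, the sketch below averages the coordination numbers of the atoms that make up a site. The coordination numbers used here are made-up inputs for the example, not values computed by catplat, and `avg_coord_num` is a hypothetical helper.

```python
def avg_coord_num(coord_nums):
    """Mean coordination number of the slab atoms surrounding an adsorption site."""
    if not coord_nums:
        raise ValueError("a site needs at least one neighbouring atom")
    return sum(coord_nums) / len(coord_nums)

# Hypothetical sites: each list holds the coordination numbers of the site's atoms.
top_site = [7]        # one atom
bridge_site = [7, 9]  # two atoms
print(avg_coord_num(top_site))     # 7.0
print(avg_coord_num(bridge_site))  # 8.0
```

On a flat surface every surface atom has the same coordination number, so every site gives the same average, which is why this filter is only informative on stepped surfaces.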

In [None]:
# returns 2 complex structures on the step edge 
!catplat calculate --project wgs --chemsys Cu --e-above-hull "<0.001" --miller-index 2 1 1 --unitcell-size 1 3 --num-layers 12 --num-fixed-layers 6 --adsorbate-atoms H --avg-coord-num "[7]" --test

# returns 1 complex structure
!catplat calculate --project wgs --chemsys Cu --e-above-hull "<0.001" --miller-index 2 1 1 --unitcell-size 1 3 --num-layers 12 --num-fixed-layers 6 --adsorbate-atoms H --avg-coord-num "[7.7]" --test

# returns 4 complex structures with average coordination number <=8
!catplat calculate --project wgs --chemsys Cu --e-above-hull "<0.001" --miller-index 2 1 1 --unitcell-size 1 3 --num-layers 12 --num-fixed-layers 6 --adsorbate-atoms H --avg-coord-num "['<=8']" --test

# further narrow down using connectivity + avg-coord-num
# using these 2 filters, we can select the specific sites of interest
!catplat calculate --project wgs --chemsys Cu --e-above-hull "<0.001" --miller-index 2 1 1 --unitcell-size 1 3 --num-layers 12 --num-fixed-layers 6 --adsorbate-atoms H --avg-coord-num "[7]" --connectivity "[1]" --test

**Rotation**

Rotation of adsorbate on the slab. Defaults to rotate from 0° to 360° in steps of 30° (12 structures; rotation: 0==360, 30, 60, 90, 120, 150, 180, 210, 240, 270, 300, 330). Symmetrical structures are filtered off.

When rotation is not required, --rotation 0 is recommended to prevent excessive generation of structures.

Adsorbates and slabs with high symmetry result in fewer rotational structures.
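
The default rotation set described above can be enumerated in a couple of lines. This sketch only lists the candidate angles. The symmetry filtering that catplat applies afterwards is not reproduced here.

```python
# Default rotations: 0 to 360 degrees in 30-degree steps; 360 is equivalent to 0.
step = 30
angles = list(range(0, 360, step))
print(angles)
print(len(angles))  # 12 candidate structures before symmetry filtering
```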

In [None]:
# Only 2 structures generated after symmetrical structures are filtered.
!catplat calculate --project wgs --chemsys Cu --e-above-hull '<0.001' --miller-index 1 1 1 --unitcell-size 4 4 --adsorbate-atoms CH3 --connectivity "[1]" --test

# CH3 rotated by 30°
!catplat calculate --project wgs --chemsys Cu --e-above-hull '<0.001' --miller-index 1 1 1 --unitcell-size 4 4 --adsorbate-atoms CH3 --connectivity "[1]" --rotation 30 --test

**Example 2: Bidentate adsorption**

Likewise, bidentate adsorption is automatically selected when len(bonds) == 2.

For bidentate adsorption, note the following:

1. --bonds, --connectivity and --avg-coord-num lists must contain 2 values for bidentate adsorption
2. --rotation is unavailable for bidentate adsorption
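
The monodentate/bidentate rule and the list-length requirement above can be expressed as a small check. This is a sketch under the assumption that list arguments are written as Python-style literals, as in the examples in this tutorial. The function names are hypothetical and do not reflect catplat's internal code.

```python
import ast

def adsorption_mode(bonds_text):
    """Classify a --bonds string such as "[0]" or "[0,1]" by its length."""
    bonds = ast.literal_eval(bonds_text)
    if not isinstance(bonds, list) or len(bonds) not in (1, 2):
        raise ValueError(f"expected a list with 1 or 2 atom indices, got {bonds_text!r}")
    return "monodentate" if len(bonds) == 1 else "bidentate"

def check_bidentate_lists(**options):
    """Raise if any supplied option list does not contain exactly 2 values."""
    for name, text in options.items():
        if len(ast.literal_eval(text)) != 2:
            raise ValueError(f"{name} must contain 2 values for bidentate adsorption")

print(adsorption_mode("[0]"))
print(adsorption_mode("[0,1]"))
# Passes silently because every list has 2 entries.
check_bidentate_lists(bonds="[0,1]", connectivity="[1,1]", avg_coord_num="['<=8','<=8']")
```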
    

In [None]:
# top-top binding of O2 on Cu 100 slab
!catplat calculate --project wgs --chemsys Cu --e-above-hull "<0.001" --miller-index 1 0 0 --unitcell-size 4 4 --adsorbate-atoms O2 --bonds "[0,1]" --connectivity "[1,1]" --test

# top-top binding on the step-edge of the Cu 211 slab
!catplat calculate --project wgs --chemsys Cu --e-above-hull "<0.001" --miller-index 2 1 1 --unitcell-size 1 3 --num-layers 12 --num-fixed-layers 6 --adsorbate-atoms O2  --bonds "[0,1]" --connectivity "[1,1]" --avg-coord-num "[7,7]" --test

# using comparator strings for bidentate avg coord num
!catplat calculate --project wgs --chemsys Cu --e-above-hull "<0.001" --miller-index 2 1 1 --unitcell-size 1 3 --num-layers 12 --num-fixed-layers 6 --adsorbate-atoms O2  --bonds "[0,1]" --avg-coord-num "['<=8','<=8']" --test

### 2.3 Catplat Retrieve

Used to retrieve data from the catplat database.

For example:

In [None]:
!catplat retrieve --project wgs --miller-index 1 0 0 --chemsys Cu

This retrieves all calculations with matching miller indices and chemical system for the wgs Project.