# Welcome to the Catplat CLI tutorial

Catplat is a high-throughput screening tool for heterogeneous catalytic descriptors, with database storage functionality.

## 1. Setting up and installing catplat

### Set up environment
Catplat is developed and built on top of a variety of packages, such as pyatoms. For convenience, we have set up an environment with the necessary dependencies and packages on the ACRC and NSCC supercomputers. Simply activate the environment to start using catplat.

1.1 Activate the centrally managed conda on your server

In [None]:
# for Pluto (ACRC)
!source /apps/anaconda3-individual-edition/2020.11/etc/profile.d/conda.sh

# for Corona/ Stratus (ACRC)
!source /apps/anaconda3-individual-edition/install/etc/profile.d/conda.sh

1.2 Activate pyatoms library

In [None]:
# for Pluto (ACRC) 
!conda activate ~chenwjb/miniconda3/envs/catplat

# for Corona/ Stratus (ACRC)


## 2. Catplat CLI commands

At this stage, you should have access to all of the catplat functionality and command line interface.

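There are three main commands for using catplat: catplat config, catplat calculate and catplat retrieve.

catplat config: configures catplat settings such as the project name, calculation path and database path.

catplat calculate: retrieves data from the catplat database and performs any calculations needed if no relevant entries are found.

catplat retrieve: retrieves data from the catplat database.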

For more details on each command, simply run !catplat {script_name} --help

### 2.1 Catplat Config

catplat config is used to configure catplat. With this command, we can define settings such as the project name, the calculation path and the database path.

Options:

In [None]:
!catplat config --project wgs --calc-path ~/catplat/wgs/calc --db-path ~/catplat/wgs/db

This specifies the path for carrying out calculations and the path where the database will be stored for the wgs Project.

--db-path can also be a path to a mysql database in the format:

In [None]:
# format: mysql://<user>:<password>@<host>:<port>/<dbname>

!catplat config --project wgs --calc-path ~/catplat/wgs/calc --db-path mysql://wgsuser:wgspassword@127.0.0.1:3306/wgs

# provide template yaml file
!pyatoms config vasp-project --help
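
To see how the pieces of the MySQL path fit together, the sketch below splits a URL of that form with Python's standard urllib.parse. It only illustrates the format: how catplat itself parses --db-path is not shown in this tutorial, the dictionary keys are names chosen for this example, and the credentials are the placeholder values from the command above.

```python
from urllib.parse import urlsplit

# Placeholder URL from the tutorial example; substitute your own credentials.
db_url = "mysql://wgsuser:wgspassword@127.0.0.1:3306/wgs"

parts = urlsplit(db_url)

# Illustrative key names (assumption), one per component of the format string.
db_settings = {
    "scheme": parts.scheme,             # "mysql"
    "user": parts.username,             # <user>
    "password": parts.password,         # <password>
    "host": parts.hostname,             # <host>
    "port": parts.port,                 # <port>, parsed as an integer
    "dbname": parts.path.lstrip("/"),   # <dbname>
}
print(db_settings)
```

Checking the URL this way before running catplat config can catch a missing port or database name early.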

### 2.2 Catplat Calculate
Used to retrieve data from the catplat database, and perform any calculations needed if no relevant entries are found. This follows the same syntax as catplat retrieve.

**OPTIONS:**

    **--slab-atoms**
        Name of slab file in user's slab directory. When a slab is specified, other bulk and slab attributes should not be specified.

    **--termination**
        Slab termination of alloy.
    
    **--overlayer**
        Replaces top layer atoms with overlayer atoms.
    
    **--miller-index**
        Miller index of slab to be created.

    **--unitcell-size**
        Size of slab to be created.
    
    **--num-layers**
        Number of layers of the slab.

    **--num-fixed-layers**
        Number of layers to be fixed.


#### 2.2.1 Calculation options

Before the attributes of the system are specified, we must first specify the settings used for the calculation.

**Options:**

    -p / --project      Name of project to run. Contains information such as database_path, incar settings, etc.

    -n / --num-nodes    Number of nodes to assign to each job run slot (see --num-job-run-slots).
        
    --test              Flag for testing. Results will not be written to the database.
    
    --fakerunner        Flag to use FakeVaspOverallJobRunner(). Results will be written to the database with placeholder energy value of 123.0 eV.

**Note:**

Only the --project option is mandatory.

The total number of nodes requested for the job is --num-nodes * --num-job-run-slots (default = 1 node).

In [None]:
# minimally specify project
!catplat calculate --project wgs

# request more nodes for a single job
!catplat calculate --project wgs --num-nodes 4

# request 4 jobs to run in parallel, each using a single node
!catplat calculate --project wgs --num-job-run-slots 4
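
As a quick check of the node arithmetic in the note above, the following sketch multiplies the two settings. The function name is made up for this example, and treating both options as defaulting to 1 is an assumption based on the stated default of 1 node in total.

```python
def total_nodes(num_nodes: int = 1, num_job_run_slots: int = 1) -> int:
    """Total nodes requested = nodes per job slot * number of parallel job slots."""
    return num_nodes * num_job_run_slots


print(total_nodes())                      # only --project given
print(total_nodes(num_nodes=4))           # --num-nodes 4
print(total_nodes(num_job_run_slots=4))   # --num-job-run-slots 4
print(total_nodes(4, 4))                  # both options set to 4
```

Under these assumptions, setting both options to 4 would request 16 nodes, so it is worth working out the product before submitting.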

**Testing**

It is often useful to preview the outcome of the calculations by looking at the initial structures generated by catplat before they have been optimized. To do this, add the --test flag. The output file will look like that of a normal calculation, but the results will not be stored in the database. This shows how many calculations are likely to be needed. Additionally, a GUI window will pop up with the expected initial structures.

In [None]:
# for testing
!catplat calculate --project wgs --test

#### 2.2.2 Bulk options

The bulk workflow is usually the first workflow to be initiated. It performs a cell optimization and relaxation of the bulk structure of the metal, which is later used for slab creation.

The bulk structure can be defined in two ways: by reading a structure file supplied by the user, or by querying the structure from the Materials Project.

**Options:**

    --bulk-atoms        Name of bulk file in user's bulk directory. When a bulk file is specified, other bulk attributes such as 'e_above_hull' should not be specified.
        
    --e-above-hull      Energy above hull of the bulk structure as indicated on Materials Project. Comparator strings are preferred over float unless exact match is desired.

    --bulk-formula      Bulk formula as indicated on Materials Project.
    
    --chemsys           Chemical system: a string of elements sorted alphabetically and joined by dashes (e.g. Ag-Pt), used by convention in database keys.

    --spacegroup        Spacegroup number of bulk as indicated on Materials Project.
    
    --bulk-provenance   String for the origin of bulk atoms.
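
The chemsys convention described above (elements sorted alphabetically and joined by dashes) can be sketched in a few lines. The helper name make_chemsys is hypothetical and is not part of catplat.

```python
def make_chemsys(elements):
    """Sort element symbols alphabetically and join them with dashes."""
    # set() drops repeated symbols so each element appears once.
    return "-".join(sorted(set(elements)))


print(make_chemsys(["Pt", "Ag"]))  # Ag-Pt
print(make_chemsys(["Cu"]))        # Cu
```

Sorting first means that Pt-Ag and Ag-Pt map to the same key, which is the point of the convention.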

**Bulk from User**

Bulk atoms can be obtained by reading a structure defined by the user. When a user bulk structure is specified, other bulk attributes should not be specified.

In [None]:
# specifying user bulk
!catplat calculate --project wgs --bulk-atoms valid_bulk

In [None]:
# examples of commands that would result in an error
!catplat calculate --p wgs --bulk-atoms user-bulk --e-above-hull 0
!catplat calculate --p wgs --bulk-atoms user-bulk --spacegroup 225
!catplat calculate --p wgs --bulk-atoms user-bulk --chemsys Cu
!catplat calculate --p wgs --bulk-atoms user-bulk --bulk-formula AgPt3

**Bulk from Materials Project**

Alternatively, bulk atoms can be obtained by querying the Materials Project (https://materialsproject.org/materials).

Filtering bulk structures with --e-above-hull is strongly encouraged; otherwise, many bulk structures may be returned. It is also recommended to pass --e-above-hull as a comparator string unless exact matching is needed.

--bulk-formula is mainly used to filter alloy structures.

In [None]:
# pure metal systems
!catplat calculate --p wgs --chemsys Cu # returns 8 bulk Cu structures
!catplat calculate --p wgs --chemsys Cu --e-above-hull '<0.001' # returns 1 bulk Cu structure
!catplat calculate --p wgs --chemsys Cu --spacegroup 194 # returns 2 bulk Cu structures

# alloy specification
!catplat calculate --p wgs --chemsys Ag-Pt --e-above-hull '<0.1' --bulk-formula AgPt3 # returns 1 bulk AgPt3 structure

Example: Filtering bulk by energy_above_hull

In [None]:
!catplat calculate --project wgs --chemsys Cu --e-above-hull '<0.01' --test # 3 bulk structures with e_above_hull <0.01
!catplat calculate --project wgs --chemsys Cu --e-above-hull '<0.001' --test # 1 structure with e_above_hull <0.001
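
Comparator strings such as '<0.001' must be quoted on the command line, because an unquoted < is treated by the shell as input redirection. The sketch below shows one way such a string could be turned into a filter. It is an illustration only: parse_comparator is a hypothetical helper, the supported operators are an assumption, and catplat's own parsing may differ.

```python
import operator
import re

# Operators assumed for this sketch; a bare number is treated as an exact match.
_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}


def parse_comparator(text):
    """Turn a string such as '<0.001' into a function value -> bool."""
    match = re.fullmatch(r"\s*(<=|>=|==|<|>)?\s*([0-9.eE+-]+)\s*", text)
    if match is None:
        raise ValueError(f"not a comparator string: {text!r}")
    compare = _OPS[match.group(1) or "=="]
    threshold = float(match.group(2))
    return lambda value: compare(value, threshold)


is_near_hull = parse_comparator("<0.001")
print(is_near_hull(0.0))    # True
print(is_near_hull(0.05))   # False
```

A bare float only matches structures whose energy above hull equals it exactly, which is why the comparator form is usually the more practical choice.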

Example: Filtering bulk by spacegroup

In [None]:
!catplat calculate --project wgs --chemsys Cu --spacegroup 225 --test # returns 1 bulk structure with spacegroup of 225
!catplat calculate --project wgs --chemsys Cu --spacegroup 194 --test # returns 2 bulk structures with spacegroup of 194 

Example: Filtering bulk alloy structures with bulk formula

In [None]:
!catplat calculate --project wgs --chemsys Ag-Pt --test # returns 5 bulk alloy structures of Ag-Pt
!catplat calculate --project wgs --chemsys Ag-Pt --bulk-formula AgPt3 --test # returns 1 bulk alloy structure of Ag-Pt

#### 2.2.3 Slab options

The slab workflow is the next workflow to be initiated. It obtains the structure of the slab and relaxes it.

As with the bulk, the slab structure can be defined in two ways: by reading a structure file supplied by the user, or by building it from the bulk obtained in the bulk workflow.

**OPTIONS:**

    --slab-atoms        Name of slab file in user's slab directory. When a slab is specified, other bulk and slab attributes should not be specified.

    --termination       Slab termination of alloy.
    
    --overlayer         Replaces top layer atoms with overlayer atoms.
    
    --miller-index      Miller index of slab to be created.

    --unitcell-size     Size of slab to be created.
    
    --num-layers        Number of layers of the slab.

    --num-fixed-layers  Number of layers to be fixed.

    --vacuum            Amount of vacuum (in angstroms) to be applied in the z-direction.

    --conventional      Toggle for creation of conventional slabs.

    --slab-provenance   String for the origin of slab atoms.

**Slab from User**

Slab atoms can be obtained by reading a structure defined by the user. When a user slab structure is specified, no other bulk or slab attributes should be specified.

In [None]:
!catplat calculate --p wgs --slab-atoms user-slab

In [None]:
# examples of commands that would result in an error
!catplat calculate --p wgs --bulk-atoms user-bulk --slab-atoms user-slab
!catplat calculate --p wgs --chemsys Cu --slab-atoms user-slab
!catplat calculate --p wgs --miller-index 1 1 1 --slab-atoms user-slab
!catplat calculate --p wgs --overlayer Cu --slab-atoms user-slab
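
The rule that a user-supplied slab excludes all other bulk and slab attributes can be pictured as a simple conflict check. The sketch below is not catplat's implementation: find_conflicts is a hypothetical helper, and the option names are the tutorial's flags rewritten with underscores.

```python
# Option groups taken from the bulk and slab option lists in this tutorial.
BULK_OPTIONS = {"bulk_atoms", "e_above_hull", "bulk_formula", "chemsys", "spacegroup"}
SLAB_OPTIONS = {
    "termination", "overlayer", "miller_index", "unitcell_size",
    "num_layers", "num_fixed_layers", "vacuum", "conventional",
}


def find_conflicts(options):
    """Return the sorted option names that conflict with a user-supplied slab."""
    given = {name for name, value in options.items() if value is not None}
    if "slab_atoms" not in given:
        return []
    return sorted(given & (BULK_OPTIONS | SLAB_OPTIONS))


print(find_conflicts({"project": "wgs", "slab_atoms": "user-slab"}))
print(find_conflicts({"project": "wgs", "slab_atoms": "user-slab", "chemsys": "Cu"}))
```

The first call reports no conflicts, while the second flags chemsys, mirroring the failing commands listed above.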

**Slab creation from bulk workflow structure**

Slab creation is often initiated after the bulk structure has been obtained from the bulk workflow.

In [None]:
# example to continue slab workflow from user's bulk structure
!catplat calculate --project wgs --bulk-atoms Cu-bulk --miller-index 1 0 0 --unitcell-size 4 4 --test
!catplat calculate --project wgs --bulk-atoms Cu-bulk --miller-index 1 0 0

# example to continue slab workflow from a Materials Project bulk structure
!catplat calculate --p wgs --chemsys Cu --e-above-hull '<0.001' --miller-index 1 1 1

Starting from this slab, several customization options are available.

**Example 1: Miller index**

The Miller index of the slab must be specified with the --miller-index option.

In [None]:
!catplat calculate --project wgs --bulk-atoms Cu-bulk --miller-index 1 0 0 
!catplat calculate --p wgs --chemsys Cu --e-above-hull '<0.001' --miller-index 1 1 1

# highly customized scenario demonstrating the customization features
!catplat calculate --p wgs --chemsys Cu --e-above-hull '<0.001' --miller-index 1 1 1 --overlayer Rh --unitcell-size 4 4 --num-layers 4 --num-fixed-layers 2 --vacuum 20 --conventional False

**Example 2: Unit cell size**


In [None]:
!catplat calculate --project wgs --chemsys Cu --e-above-hull '<0.001' --miller-index 1 1 1 --unitcell-size 3 3 --test
!catplat calculate --project wgs --chemsys Cu --e-above-hull '<0.001' --miller-index 1 1 1 --unitcell-size 4 4 --test

**Specifying 211 slabs**

When creating 211 slabs, ensure that --conventional is True.

Because the atoms of a 211 slab sit at uneven z-positions, --num-layers and --num-fixed-layers must be set to 3 times the intended number of layers.

In [None]:
# creates Cu 211 slab with 4 layers and 2 fixed layers
!catplat calculate --p wgs --chemsys Cu --e-above-hull '<0.001' --miller-index 2 1 1 --unitcell-size 1 3 --num-layers 12 --num-fixed-layers 6 --conventional True
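
The factor of 3 for 211 slabs is easy to get wrong by hand, so here is a small worked sketch of the conversion. The helper name layers_for_211 is made up for this example.

```python
def layers_for_211(intended_layers, intended_fixed_layers):
    """Convert intended layer counts into the values to pass for a 211 slab."""
    factor = 3  # the tutorial's rule: specify 3 times the intended number
    return intended_layers * factor, intended_fixed_layers * factor


num_layers, num_fixed_layers = layers_for_211(4, 2)
print(f"--num-layers {num_layers} --num-fixed-layers {num_fixed_layers}")
```

For 4 layers with 2 fixed, this gives the values 12 and 6 used in the command above.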

#### 2.2.4 Adsorbate options

The adsorbate workflow is the final workflow to be initiated. It analyzes the unique sites of the relaxed slab and generates monodentate or bidentate complexes on the slab. These structures are then relaxed.
Adsorbate structures should first be created and stored in the user's adsorbate folder. During operation, catplat reads the structures from this folder. Some common adsorbates are also available by default from the ASE database.

**OPTIONS:**

    --adsorbate-atoms       Name of adsorbate. Adsorbate can be read from user's adsorbate directory. Some common adsorbates are available from the ASE database.

    --adsorbate-formula     Formula of adsorbate.
    
    --bonds                 Atom index of adsorbate to bind to the slab. Default bond = [[0]] for monodentate and [[0,1]] for bidentate adsorbates.
    
    --connectivity          Connectivity of adsorption site for adsorbate to bind.

    --avg-coord-num         Average coordination of adsorption site for adsorbate to bind.
    
    --rotation              Rotation of adsorbate. Symmetrical adsorbates (e.g. C) have no rotations.

**Monodentate adsorption**
Monodentate adsorption is automatically selected when len(bonds) == 1. Likewise, bidentate adsorption would be automatically selected when len(bonds) == 2.
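
The selection rule above can be sketched as a small function. This is an illustration, not catplat's code: adsorption_mode is a hypothetical name, and bonds is assumed to be a flat list of adsorbate atom indices, as in a --bonds "[0]" argument.

```python
def adsorption_mode(bonds):
    """Pick the adsorption mode from the number of binding atom indices."""
    if len(bonds) == 1:
        return "monodentate"
    if len(bonds) == 2:
        return "bidentate"
    raise ValueError(f"expected 1 or 2 binding atoms, got {len(bonds)}")


print(adsorption_mode([0]))     # monodentate
print(adsorption_mode([0, 1]))  # bidentate
```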

Example 1: H binding on fcc100 top site

fcc(100) slabs have 3 unique sites (top, bridge, hollow). We can choose to analyze only the top sites by specifying --connectivity.


In [None]:
!catplat calculate --p wgs --chemsys Cu --e-above-hull '<0.001' --miller-index 1 0 0 --unitcell-size 4 4 --adsorbate-atoms H --connectivity "(1)"

Example 2: Specifying bonds

Bonds is the atom index of the adsorbate which binds to the slab. 
#TODO: add script to view adsorbate.

In [None]:
# binds to slab through carbon
!catplat calculate --p wgs --chemsys Cu --e-above-hull '<0.001' --miller-index 1 0 0 --unitcell-size 4 4 --adsorbate-atoms CO --connectivity "[1]" --bonds "[0]"

# binds to slab through oxygen
!catplat calculate --p wgs --chemsys Cu --e-above-hull '<0.001' --miller-index 1 0 0 --unitcell-size 4 4 --adsorbate-atoms CO --connectivity "[1]" --bonds "[1]"

Example 3: Specifying average coordination number

--avg-coord-num is the average coordination number of the atom(s) surrounding the adsorption site. It is not useful for flat slab surfaces such as fcc100 and fcc111, where the average coordination number is the same for all sites. However, it is very useful for uneven surfaces such as fcc211.

<img src="/home/chryston/catplat_tutorial/pictures/acn_rh2111.png" alt="avg_coord_num of 211 slab" style="width: 300px;"/>

In [None]:
# returns H binding on the 3 sites on the 211 surface
!catplat calculate --p wgs --chemsys Cu --e-above-hull '<0.001' --miller-index 2 1 1 --unitcell-size 1 3 --num-layers 12 --num-fixed-layers 6 --conventional True --adsorbate-atoms H --avg-coord-num "[<=7]"

# further narrow down on site of interest using connectivity + avg-coord-num
!catplat calculate --p wgs --chemsys Cu --e-above-hull '<0.001' --miller-index 2 1 1 --unitcell-size 1 3 --num-layers 12 --num-fixed-layers 6 --conventional True --adsorbate-atoms H --avg-coord-num "[<=7]" --connectivity "[1]"
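
To make the idea of an average coordination number concrete, the sketch below averages the coordination numbers of the slab atoms that make up each site and keeps the sites that satisfy "<=7". Everything in it is assumed for illustration: the atom indices, the coordination numbers (7, 9 and 10, chosen to resemble a stepped surface), the site names, and the reading of connectivity as the number of slab atoms in a site.

```python
def average_coordination_number(site_atoms, coordination_numbers):
    """Mean coordination number of the slab atoms that form an adsorption site."""
    return sum(coordination_numbers[i] for i in site_atoms) / len(site_atoms)


# Hypothetical coordination numbers for three surface atoms of a stepped slab.
coordination_numbers = {0: 7, 1: 9, 2: 10}

# Hypothetical sites: one slab atom = top site, two slab atoms = bridge site.
sites = {"top@0": (0,), "bridge@0-1": (0, 1), "top@2": (2,)}

acn = {
    name: average_coordination_number(atoms, coordination_numbers)
    for name, atoms in sites.items()
}
selected = {name: value for name, value in acn.items() if value <= 7}
print(acn)
print(selected)
```

On a flat surface every entry of coordination_numbers would be identical, so the filter could not distinguish between sites, which is the limitation noted above.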

Example 4: Specifying rotation

Rotation of the adsorbate on the slab. By default, the adsorbate is rotated from 0° to 360° in steps of 30° (12 structures; rotation: 0==360, 30, 60, 90, 120, 150, 180, 210, 240, 270, 300, 330). Symmetrical structures are filtered out.
Specifying --rotation 0 is recommended if rotations are not required.

In [None]:
!catplat calculate --p wgs --chemsys Cu --e-above-hull '<0.001' --miller-index 1 0 0 --unitcell-size 4 4 --adsorbate-atoms CH3 --connectivity "[1]" --bonds "[0]"
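
The default set of rotations described above can be reproduced with a short sketch. The function name rotation_angles is hypothetical; only the 0° to 360° range and the 30° step come from the tutorial.

```python
def rotation_angles(step=30):
    """Angles in degrees from 0 up to, but not including, 360 (since 0 == 360)."""
    return list(range(0, 360, step))


angles = rotation_angles()
print(len(angles))
print(angles)
```

With the default step this yields 12 candidate orientations per site before symmetry filtering, which is why fixing the rotation can cut the number of calculations considerably.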

### 2.3 Catplat Retrieve

Used to retrieve data from the catplat database.

For example:

In [None]:
!catplat retrieve --p wgs --miller-index 1 0 0 --chemsys Cu

This retrieves all calculations with a matching Miller index and chemical system for the wgs project.