# <font color='green'> TUTORIAL </font>

## <font color='green'> Table of contents </font>

1. **Installation**
2. **Packages**
3. **Settings**
4. **Templates**
5. **Molecules**
6. **Running a quantum mechanics simulation**
7. **Restarting a simulation**



## <font color='green'> 1. Installation in Unix </font> 

  - Conda installation. Type the following command in your console to download the Miniconda installer:
   ```bash
   wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
   ```
   
  - Then install Miniconda in `$HOME/miniconda` (afterwards, make sure that `$HOME/miniconda/bin` is on your `PATH`):
   ```bash
   bash miniconda.sh -b -p $HOME/miniconda
   ```
   
  
  - Create a new virtual environment
   ```bash
   conda create -q -n qmworks python=3.5
   ```
    
  - Install the dependencies
   ```bash 
   conda install --name qmworks -c anaconda hdf5
   conda install --name qmworks -c https://conda.anaconda.org/rdkit rdkit
   ```
    
  - Activate the environment
   ```bash
   source activate qmworks
   ``` 
   
  - Install **qmworks** and **plams**
    ```bash
     pip install https://github.com/SCM-NV/qmworks/tarball/master#egg=qmworks https://github.com/SCM-NV/plams/tarball/master#egg=plams --upgrade
    ```

**You are ready to start!**

## <font color='green'> Starting the environment  </font>
Once *QMWORKS*  has been installed the user should run the following command to initialize the environment:

```bash
[user@int1 ~]$ source activate qmworks
discarding /home/user/anaconda3/bin from PATH
prepending /home/user/anaconda3/envs/qmworks/bin to PATH
(qmworks)[user@int1 ~]$ python --version
Python 3.5.2 :: Anaconda custom (64-bit)
```

To leave the environment, use the following command:

```bash
(qmworks)[user@int1 ~]$ source deactivate
discarding /home/user/anaconda3/envs/qmworks/bin from PATH
```

## <font color='green'> 2. QMWorks Packages </font> 
Currently `qmworks` offers an interface to the following simulation packages:
* SCM (ADF and DFTB)
* CP2K
* ORCA
* GAMESS-US
* DIRAC

If you are interested in support for other packages, request it through the [github-issues](https://github.com/SCM-NV/qmworks/issues) system (sorry, but Gaussian is off the menu!).

The interaction between the aforementioned packages and *qmworks* is carried out through a set of Python functions: 
*adf, dftb, cp2k, orca, gamess and dirac*. These functions invoke the quantum package using two mandatory arguments: a **Settings** object describing the input and a **Molecule** object containing the molecular geometry, as detailed below.

#### <font color='orange'> Technical note: </font> 
It is the user's responsibility to install or load the simulation packages that they want to use. On most supercomputers these packages are made available with a command like the following (consult your system administrator):
```bash
module load superAwesomeQuantumPackage/3.1421
```
Some simulation packages also require you to configure a `scratch` folder. For instance, *Orca* requires a ``SCR`` folder to be defined, while *ADF* calls it ``SCM_TMPDIR``.

## <font color='green'> 3. QMWorks Settings </font> 
*Settings* is a subclass of the Python [dictionary](https://docs.python.org/3.5/tutorial/datastructures.html#dictionaries) that represents hierarchical structures, like 

<img src="files/simpleTree.png">

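```python
from qmworks import Settings

s = Settings()
s.b.z      # accessing a missing key creates an empty nested level
s.c.f
s.c.g = 0  # leaves of the tree hold the actual values
```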

These hierarchical structures resemble the input structure used in most quantum simulation packages. For instance, the basis set section in ADF is given by something like:
```
Basis
  Type DZP
  Core Large
End
```
We can reproduce this structure using **Settings**:

```python
s = Settings()
s.specific.adf.basis.type = "DZP"
s.specific.adf.basis.core = "Large"
```

We are creating the *adf* hierarchy under a key called *specific*. This key is used to differentiate keywords that are unique to a certain quantum package from those that can be used in several packages, as we will see in the next section.

Similarly, we can define ``Settings`` for several sections at once. The input
```
Basis
  Type DZP
  Core Large
End

Constraints
  Dist 1 2 1.0
End

Geometry
  Optim delocal
End

Integration
  Accint 6.0
End

Scf
  Converge 1e-06
  Iterations 100
End

Xc
  Lda
End
```
This input is represented by the following code:

```python
s = Settings()

# Basis
s.specific.adf.basis.type = "DZP"
s.specific.adf.basis.core = "Large"

# Constraints
s.specific.adf.constraints.dist = "1 2 1.0"

# Geometry
s.specific.adf.geometry.optim = 'delocal'

# Integration
s.specific.adf.integration.accint = 6.0

# SCF
s.specific.adf.scf.converge = 1e-6
s.specific.adf.scf.iterations = 100

# Functional (bare access creates the empty `lda` keyword)
s.specific.adf.xc.lda

print(s)
```

```
specific: 	
         adf: 	
             basis: 	
                   core: 	Large
                   type: 	DZP
             constraints: 	
                         dist: 	1 2 1.0
             geometry: 	
                      optim: 	delocal
             integration: 	
                         accint: 	6.0
             scf: 	
                 converge: 	1e-06
                 iterations: 	100
             xc: 	
                lda: 	
```



You don't need to explicitly declare the `End` keyword; *qmworks* knows how to handle it.
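
As a rough illustration of why the closing keyword never has to be written by hand, the sketch below renders a nested dictionary as ADF-style block input and closes every block itself. The function `render_adf_blocks` is hypothetical and only mimics the idea; the real input generation in *qmworks* and *plams* handles many more cases.

```python
def render_adf_blocks(sections):
    """Render {block: {keyword: value}} as ADF-style text, closing each block with End."""
    lines = []
    for block, keywords in sorted(sections.items()):
        lines.append(block.capitalize())
        for key, value in sorted(keywords.items()):
            # An empty value stands for a bare keyword such as "Lda".
            lines.append("  {} {}".format(key.capitalize(), value).rstrip())
        # The closing keyword is added by the renderer, not by the user.
        lines.append("End")
        lines.append("")
    return "\n".join(lines)


adf_section = {
    "basis": {"type": "DZP", "core": "Large"},
    "xc": {"lda": ""},
}
text = render_adf_blocks(adf_section)
print(text)
```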

### <font color='green'> Generic Keywords </font> 
Many quantum chemistry packages use Gaussian-type orbitals (GTO) to perform the simulation (in contrast to Slater-type orbitals). These packages use the same standards for the basis set, so it is handy to define a "generic" keyword for basis sets.
Fortunately, ``qmworks`` already offers such a keyword, which can be used with all the packages that share the same basis set standard:

```python
s = Settings()
s.basis = "DZP"
```

Internally, **qmworks** creates a hierarchical structure representing the basis *DZP* for the packages that can handle that basis set.
Other generic keywords, like ``functional``, ``inithess``, etc., have also been implemented. 
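
One way to picture the translation of a generic keyword is a lookup table that maps it to a package-specific path. The sketch below is only an illustration under that assumption: the table `GENERIC_TO_SPECIFIC` and the function `expand_generic` are hypothetical names, and the paths are taken from the examples in this tutorial rather than from the *qmworks* source.

```python
# Hypothetical translation table: generic keyword -> path inside each package's input.
GENERIC_TO_SPECIFIC = {
    "basis": {
        "adf": ("basis", "type"),
        "orca": ("basis", "basis"),
    },
}


def expand_generic(keyword, value, package):
    """Return the nested, package-specific dictionary for one generic keyword."""
    path = GENERIC_TO_SPECIFIC[keyword][package]
    nested = value
    # Build the nesting from the innermost key outwards.
    for key in reversed(path):
        nested = {key: nested}
    return nested


adf_input = expand_generic("basis", "DZP", "adf")
orca_input = expand_generic("basis", "DZP", "orca")
print(adf_input)
print(orca_input)
```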

## <font color='green'>  4. Templates </font> 

As shown so far, **Settings** can be specified in two ways: generic or specific. Generic keywords represent input properties that are present in most simulation packages, like a *basis set*, while *specific* keywords resemble the input structure of a given package.
 
*Generic* and *Specific* **Settings** can express both simple and complex simulation inputs, but it is convenient to have a pre-defined set of templates for the most common quantum chemistry simulations: single point calculations, geometry optimizations, transition state optimizations, frequency calculations, etc.
*qmworks* already has a pre-defined set of templates containing some defaults that users can modify for their own purposes. ``Templates`` are stored inside the ``qmworks.templates`` module and are loaded from *JSON* files. A JSON file is basically a nested dictionary that is translated to a ``Settings`` object by *qmworks*.
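
The JSON-to-dictionary step can be sketched with the standard library alone. The template text below is a shortened, illustrative fragment, and the final conversion to a ``Settings`` object (done by *qmworks*) is left out.

```python
import json

# A shortened, illustrative template fragment written inline for this example.
template_text = """
{
    "specific": {
        "adf": {
            "basis": {"type": "SZ"},
            "scf": {"converge": 1e-6, "iterations": 100}
        }
    }
}
"""

# json.loads turns the JSON text into nested Python dictionaries.
template = json.loads(template_text)
adf_defaults = template["specific"]["adf"]
print(adf_defaults["basis"]["type"])
print(adf_defaults["scf"]["iterations"])
```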

The defaults for a single point calculation are shown below:

```python
single_point = {
    "specific": {
        "adf": {
            "basis": {"type": "SZ"},
            "xc": {"lda": ""},
            "integration": {"accint": 4.0},
            "scf": {
                "converge": 1e-6,
                "iterations": 100} },
        "dftb": {
            "task": {"runtype": "SP"},
            "dftb": {"resourcesdir": "DFTB.org/3ob-3-1"} },
        "cp2k" : {
          "force_eval": {
              "dft": {
                  "basis_set_file_name": "",
                  "mgrid": {
                      "cutoff": 400,
                      "ngrids": 4
                  },
                  "potential_file_name": "",
                  "print": {
                      "mo": {
                          "add_last"  : "numeric",
                          "each": {
                              "qs_scf": 0
                          },
                          "eigenvalues" : "",
                          "eigenvectors": "",
                          "filename": "./mo.data",
                          "ndigits": 36,
                          "occupation_numbers": ""
                      }
                  },
                  "qs": {
                      "method": "gpw"
                  },
                  "scf": {
                      "added_mos": "",
                      "eps_scf": 1e-06,
                      "max_scf": 200,
                      "scf_guess": "restart"
                  },
                  "xc": {
                      "xc_functional": "pbe"
                  }
              },
              "subsys": {
                  "cell": {
                      "periodic": "xyz"
                  },
                  "topology": {
                      "coordinate": "xyz",
                      "coord_file_name": ""
                  }
              }
          },
          "global": {
              "print_level": "low",
              "project": "qmworks-cp2k",
              "run_type": "energy_force"
          }
        },
        "dirac": {
            "DIRAC": "WAVEFUNCTION",
            "HAMILTONIAN": "LEVY-LEBLOND",
            "WAVE FUNCTION": "SCF"
        },
        "gamess": {
            "basis": {"gbasis": "sto", "ngauss": 3},
            "contrl": {"scftyp": "rhf", "dfttyp": "pbe"}
        },
        "orca": {
            "method": {
                "method": "dft",
                "functional": "lda"},
            "basis": {
                "basis": "sto_sz"}
        }
    }
}
```

The question is then: how can I modify a template with my own changes?

Suppose you are performing a series of constrained *DFT* optimizations using `ADF`. You first need to define a basis set and the constraints.

```python
s = Settings()
# Basis
s.basis = "DZP"
s.specific.adf.basis.core = "Large"

# Constraint
s.freeze = [1, 2, 3]
```

We use two *generic* keywords: `freeze` to indicate a constraint and `basis` to provide the basis set. We also introduce a specific `ADF` keyword, `core = Large`.
Now you merge your **Settings** with the corresponding template to carry out molecular geometry optimizations, using a method called `overlay`.

```python
from qmworks import templates
inp = templates.geometry.overlay(s)
```

The ``overlay`` method takes as input a template containing a default set of keywords for the different packages, together with the arguments provided by the user, as shown schematically:
<img src="files/merged.png">

The `overlay` method merges the defaults for a given package (*ADF* in this case) with the input supplied by the user, always giving preference to the user input:
<img src="files/result_merged.png" width="700">
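
The merge rule described above (defaults first, user input wins) can be sketched as a recursive dictionary merge. This is a minimal illustration on plain dictionaries; the standalone function `overlay` below is a hypothetical re-implementation of the idea, not the method shipped with *qmworks*.

```python
import copy


def overlay(defaults, user):
    """Recursively merge `user` into a copy of `defaults`; user values win."""
    merged = copy.deepcopy(defaults)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are nested: merge one level deeper.
            merged[key] = overlay(merged[key], value)
        else:
            # Leaf value or new key: the user input takes preference.
            merged[key] = copy.deepcopy(value)
    return merged


template = {"adf": {"basis": {"type": "SZ"}, "scf": {"iterations": 100}}}
user = {"adf": {"basis": {"type": "DZP", "core": "Large"}}}
merged = overlay(template, user)
print(merged)
```

Working on a deep copy keeps the template untouched, so the same defaults can be reused for the next calculation.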

## <font color='green'> 5. Molecule </font>
The next component needed to carry out a simulation is a molecular geometry. *qmworks* offers a convenient way to read molecular geometries using the [Plams](https://www.scm.com/doc/plams/molecule.html) library in several formats, like *xyz* (default), *pdb*, *mol*, etc.

```python
from plams import Molecule
acetonitrile = Molecule("files/acetonitrile.xyz")
print(acetonitrile)
```

```
  Atoms: 
    1         C      2.419290      0.606560      0.000000 
    2         C      1.671470      1.829570      0.000000 
    3         N      1.065290      2.809960      0.000000 
    4         H      2.000000      0.000000      1.000000 
    5         H      2.000000      0.000000     -1.000000 
    6         H      3.600000      0.800000      0.000000 
```



You can also create the molecule one atom at a time:

```python
from plams import (Atom, Molecule)
m = Molecule()
m.add_atom(Atom(symbol='C', coords=(2.41929, 0.60656, 0.0)))
m.add_atom(Atom(symbol='C', coords=(1.67147, 1.82957, 0.0)))
m.add_atom(Atom(symbol='N', coords=(1.06529, 2.80996, 0.0)))
m.add_atom(Atom(symbol='H', coords=(2.0, 0.0, 1.0)))
m.add_atom(Atom(symbol='H', coords=(2.0, 0.0, -1.0)))
m.add_atom(Atom(symbol='H', coords=(3.6, 0.8, 0.0)))
print(m)
```

```
  Atoms: 
    1         C      2.419290      0.606560      0.000000 
    2         C      1.671470      1.829570      0.000000 
    3         N      1.065290      2.809960      0.000000 
    4         H      2.000000      0.000000      1.000000 
    5         H      2.000000      0.000000     -1.000000 
    6         H      3.600000      0.800000      0.000000 
```
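
For readers unfamiliar with the *xyz* format, the following sketch parses it with the standard library only: the first line holds the number of atoms, the second is a comment, and each remaining line holds an element symbol and three Cartesian coordinates. The function `parse_xyz` is a hypothetical helper for illustration; in practice the `Molecule` class does this work.

```python
def parse_xyz(text):
    """Parse XYZ-format text into a list of (symbol, (x, y, z)) tuples."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])
    atoms = []
    # Line 2 is a free-form comment; coordinates start on line 3.
    for line in lines[2:2 + n_atoms]:
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, (float(x), float(y), float(z))))
    return atoms


xyz_text = """6
acetonitrile
C 2.41929 0.60656 0.0
C 1.67147 1.82957 0.0
N 1.06529 2.80996 0.0
H 2.0 0.0 1.0
H 2.0 0.0 -1.0
H 3.6 0.8 0.0
"""
atoms = parse_xyz(xyz_text)
for symbol, coords in atoms:
    print(symbol, coords)
```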



## <font color='green'> 6. Running a quantum mechanics simulation </font>
We now have the components needed to perform a calculation: **Settings** and **Molecule**. We can now invoke a quantum chemistry package to perform the computation:

```python
from qmworks import adf
job = adf(inp, acetonitrile, job_name='acetonitrile_opt')
```

The previous code snippet *does not execute the calculation immediately*; instead, the simulation starts when the user invokes the `run` function, as shown below:
```python
from plams import Molecule
from qmworks import (adf, run, Settings, templates)

# Settings
s = Settings()
s.basis = "DZP"
s.specific.adf.basis.core = "Large"
s.freeze = [1, 2, 3]

# Merge the user settings with the geometry optimization template
inp = templates.geometry.overlay(s)

# Molecule
acetonitrile = Molecule("files/acetonitrile.xyz")

# Job
job = adf(inp, acetonitrile, job_name='acetonitrile_opt')

# Run the job
results = run(job)
```
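
The idea of declaring a job first and executing it later can be illustrated with a small, self-contained sketch of deferred evaluation. The names `Promise`, `schedule` and `run_promise` are hypothetical and exist only in this example; they are not the mechanism used internally by *qmworks*.

```python
class Promise:
    """Hypothetical placeholder for a computation that has not run yet."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args


def schedule(func):
    """Make `func` return a Promise instead of executing immediately."""
    def wrapper(*args):
        return Promise(func, *args)
    return wrapper


def run_promise(obj):
    """Evaluate a Promise, first resolving any Promises among its arguments."""
    if not isinstance(obj, Promise):
        return obj
    resolved = [run_promise(arg) for arg in obj.args]
    return obj.func(*resolved)


executed = []


@schedule
def optimize(name):
    executed.append("optimize")
    return name + "_optimized"


@schedule
def frequencies(geometry):
    executed.append("frequencies")
    return "frequencies of " + geometry


# Building the workflow only records what should happen.
workflow = frequencies(optimize("acetonitrile"))
executed_before_run = list(executed)  # still empty at this point

# Only now are the two steps executed, dependencies first.
result = run_promise(workflow)
print(executed_before_run, executed, result)
```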

## <font color='green'>  Extracting Properties </font> 
We can easily extract the *optimized geometry* from the *ADF* calculation using the following notation

```python
optmized_mol_adf = job.molecule
```

In general, properties are extracted using the standard `object.attribute` notation in Python, as shown in the example above. 

Users can request properties such as energy, frequencies, dipole, etc. The available properties depend on the package (for more detailed information, have a look at the [package tutorial](https://github.com/SCM-NV/qmworks/tree/develop/jupyterNotebooks)).

## <font color='green'>  Communicating between different packages </font>

We can use the previously optimized geometry for further calculations with another package, for instance *Orca*, to run a frequency calculation:

```python
from qmworks import orca
s2 = Settings()
s2.specific.orca.main = "freq"
s2.specific.orca.basis.basis = 'sto_sz'
s2.specific.orca.method.functional = 'lda'
s2.specific.orca.method.method = 'dft'

job_freq = orca(s2, optmized_mol_adf)

frequencies = job_freq.frequencies
```

The whole script is:
```python
from qmworks import (adf, orca, run, Settings, templates)
from plams import Molecule


def main():
    # Molecular geometry
    acetonitrile = Molecule("files/acetonitrile.xyz")

    # ADF settings merged with the geometry optimization template
    s = Settings()
    s.basis = "DZP"
    s.specific.adf.basis.core = "large"
    s.freeze = [1, 2, 3]
    inp = templates.geometry.overlay(s)

    # Geometry optimization with ADF
    job = adf(inp, acetonitrile)
    optimized_mol_adf = job.molecule

    # Frequency calculation with Orca on the optimized geometry
    s2 = Settings()
    s2.specific.orca.main = "freq"
    s2.specific.orca.basis.basis = 'sto_sz'
    s2.specific.orca.method.functional = 'lda'
    s2.specific.orca.method.method = 'dft'

    job_freq = orca(s2, optimized_mol_adf)
    frequencies = job_freq.frequencies

    # Nothing is computed until `run` is called
    print(run(frequencies))


if __name__ == "__main__":
    main()
```

Once you run the script, input files for the *ADF* and *Orca* jobs are created. The *ADF* input looks like:

```
Atoms
      1         C      2.419290      0.606560      0.000000 
      2         C      1.671470      1.829570      0.000000 
      3         N      1.065290      2.809960      0.000000 
      4         H      2.000000      0.000000      1.000000 
      5         H      2.000000      0.000000     -1.000000 
      6         H      3.600000      0.800000      0.000000 
End

Basis
  Type DZP
End

Constraints
  Atom 1
  Atom 2
  Atom 3
End

Geometry
  Optim cartesian
End

Integration
  Accint 6.0
End

Scf
  Converge 1e-06
  Iterations 100
End
```

## <font color='green'> How does the run function work? </font>
### <font color='green'> A little discussion about graphs </font>

*qmworks* is meant to be used for both workflow generation and execution. When you write a Python script representing a workflow, you are explicitly declaring a set of computations and their dependencies. For instance, the following workflow represents the *ADF* and *Orca* computations of the examples above. In this [graph](https://en.wikipedia.org/wiki/Graph_theory) the octagons represent quantum simulations using a package, while the ovals represent either user input or data extracted from a simulation. Finally, the arrows (called edges) represent the dependencies between all these objects.
<img src="files/simple_graph.png">

**QMWorks** automatically identifies the dependencies between computations and runs them in the correct order (in parallel, if possible).
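
Finding a valid execution order for such a dependency graph is a topological sort. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (available from Python 3.9); the node names are made up to mirror the *ADF* and *Orca* example, and this is not how *QMWorks* schedules jobs internally.

```python
from graphlib import TopologicalSorter

# Each key depends on the nodes listed in its value (hypothetical node names).
dependencies = {
    "adf_optimization": {"settings", "acetonitrile"},
    "optimized_geometry": {"adf_optimization"},
    "orca_frequencies": {"optimized_geometry", "orca_settings"},
    "frequencies": {"orca_frequencies"},
}

# static_order yields every node after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Nodes that do not depend on each other, such as the two sets of settings, may appear in any relative order, which is also what makes parallel execution possible.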

## <font color='green'> Running on a supercomputer </font>

Running on **Cartesius** or **Bazis** through the *Slurm* resource manager can be done using a script like:

```bash
#!/bin/bash
#SBATCH -t 00:10:00
#SBATCH -N 1
#SBATCH -n 8

module load orca
module load adf/2016.102

source activate qmworks
python optimization_ADF_freq_ORCA.py
```

The Slurm output looks like:

```
load orca/3.0.3 (PATH)
discarding /home/user/anaconda3/envs/qmworks/bin from PATH
prepending /home/user/anaconda3/envs/qmworks/bin to PATH
[11:17:59] PLAMS working folder: /nfs/home/user/orca/Opt/example/plams.23412
+-(running jobs)
| Running adf ...
[11:17:59] Job ADFjob started
[11:18:18] Job ADFjob finished with status 'successful' 
[11:18:18] Job ORCAjob started
[11:18:26] Job ORCAjob finished with status 'successful' 

[    0.           0.           0.           0.           0.           0.
  -360.547382  -360.14986    953.943089   954.3062    1049.2305
  1385.756519  1399.961717  1399.979552  2602.599662  3080.45671
  3175.710785  3177.612274]
  ```

## <font color='green'> 7. Restarting a simulation </font>

If you are running many computationally expensive calculations on a supercomputer, the computations may take more time than the resource manager allows, and the workflow gets cancelled. But do not worry: you do not need to re-run all the computations, because *QMWorks* offers a mechanism to restart the workflow.

When running a workflow you will see that *QMWorks* creates a set of files called ``cache``. These files contain the information about the workflow and its calculations. **In order to restart a workflow you only need to relaunch it**, that's it!
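
The principle behind such a restart can be sketched as a result cache on disk: a result that is already stored is returned directly, and only missing results are computed. The helper `cached_call`, the JSON file layout and the energy value below are all made up for illustration and do not describe the actual format of the ``cache`` files.

```python
import json
import os
import tempfile


def cached_call(cache_file, key, compute):
    """Return the stored result for `key`, computing and saving it only if missing."""
    cache = {}
    if os.path.exists(cache_file):
        with open(cache_file) as handle:
            cache = json.load(handle)
    if key not in cache:
        cache[key] = compute()
        with open(cache_file, "w") as handle:
            json.dump(cache, handle)
    return cache[key]


calls = []


def expensive_job():
    calls.append("ran")
    return -132.5  # made-up number standing in for a computed energy


cache_path = os.path.join(tempfile.mkdtemp(), "cache.json")
first = cached_call(cache_path, "acetonitrile_opt", expensive_job)
# Simulated relaunch: the stored result is reused, the job is not run again.
second = cached_call(cache_path, "acetonitrile_opt", expensive_job)
print(first, second, len(calls))
```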
