# Computing the molecular orbital overlaps and time-overlaps using Libra/CP2K interface in xTB framework

In this tutorial, we will compute the molecular orbital (MO) overlaps and time-overlaps using the Libra/CP2K interface and the Libint2 library in the extended tight-binding (xTB) framework. Detailed information is given in the [README.md](../README.md) file. In this notebook, we test the workflow with a single job on the pre-computed MD trajectory obtained in the [previous step](../../6_step1_cp2k).

## Table of contents
<a name="toc"></a>
1. [Importing needed libraries](#import)
2. [Overview of required files](#required_files)
3. [Computing the overlap calculations](#comp_overlap)     
    3.1. [Loading compilers to run CP2K](#load_compilers)\
    3.2. [Starting the calculations](#start_overlap_calculations)
4. [Checking the orthonormality of the wavefunctions](#check_ortho)

### A. Learning objectives

* To be able to run the molecular orbital overlap calculations (running step2)
* To be able to load and read `scipy.sparse` data files (`.npz` files) and check the orthonormality of the wavefunctions

### B. Use cases

* [Running molecular orbital overlap calculations](#comp_overlap)
* [Checking the orthonormality of the wavefunctions](#check_ortho)


### C. Functions

- `libra_py`
  - `CP2K_methods`
    -  [`generate_translational_vectors`](#start_overlap_calculations)
  - `workflows`
    - `nbra`
      - [`step2`](#start_overlap_calculations)


## 1. Importing needed libraries <a name="import"></a>
[Back to TOC](#toc)

Since the data are stored in sparse format using the `scipy.sparse` library, we need this library to read the data and check the orthonormality of the wavefunctions.
Import `numpy`, `scipy.sparse`, `CP2K_methods`, and `step2` (along with `os` and `sys`) using the following commands:

```python
import os
import sys
import numpy as np
import scipy.sparse as sp
from libra_py import CP2K_methods
from libra_py.workflows.nbra import step2
```
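The MO overlap data produced in this workflow are `scipy.sparse` matrices saved as `.npz` files. As a minimal illustration of that format, independent of Libra, the sketch below saves a small complex sparse matrix with `sp.save_npz` and reads it back with `sp.load_npz`, the same loading call used in Section 4. The file name `S_demo.npz` is hypothetical.

```python
import os
import tempfile
import numpy as np
import scipy.sparse as sp

# A small complex matrix stored in compressed sparse column (CSC) format
M = sp.csc_matrix(np.array([[1.0, 0.0], [0.0, 1.0]], dtype=complex))

with tempfile.TemporaryDirectory() as tmp:
    fname = os.path.join(tmp, 'S_demo.npz')   # hypothetical file name
    sp.save_npz(fname, M)
    M_loaded = sp.load_npz(fname)             # data are read into memory here

# Convert to a dense NumPy array for inspection
print(M_loaded.toarray())
```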

## 2. Overview of required files <a name="required_files"></a>
[Back to TOC](#toc)

The following files are needed to run the calculations for computing the MO overlaps.

* `es_ot_temp.inp`

A sample CP2K input file for running the electronic structure calculations with the orbital transformation (OT) method. This file can be a copy of the [MD input](../../6_step1_cp2k/2_xTB/md.inp) but with `RUN_TYPE ENERGY` in the `&GLOBAL` section.

* `es_diag_temp.inp`

This file is identical to `es_ot_temp.inp`, except that the SCF cycle is done using diagonalization instead of OT.

* `../../../6_step1_cp2k/2_xTB/c3n4_1x1_MD_xTB-pos-1.xyz`

The MD trajectory `.xyz` file obtained from [step1](../../6_step1_cp2k/2_xTB/tutorial.ipynb). CP2K stores the MD trajectory data in `*-pos-*.xyz` files.
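Since `es_ot_temp.inp` can be a copy of the MD input with `RUN_TYPE ENERGY`, that one edit can be scripted. The following is a minimal sketch, assuming `RUN_TYPE` appears uncommented on its own line of the `&GLOBAL` section; the helper `make_energy_template` is hypothetical and not part of Libra. The SCF settings (OT vs. diagonalization) still need to be edited by hand.

```python
import re

def make_energy_template(md_input_text: str) -> str:
    """Return a copy of a CP2K input with RUN_TYPE set to ENERGY (hypothetical helper)."""
    # Match an uncommented RUN_TYPE line, keeping its indentation and spacing
    pattern = re.compile(r"^([ \t]*RUN_TYPE[ \t]+)\S+", flags=re.MULTILINE | re.IGNORECASE)
    new_text, n_subs = pattern.subn(r"\g<1>ENERGY", md_input_text)
    if n_subs == 0:
        raise ValueError("No RUN_TYPE keyword found in the input text")
    return new_text

# Usage on a tiny inline fragment; a real md.inp would be read from disk
md_fragment = """&GLOBAL
  PROJECT c3n4
  RUN_TYPE MD
&END GLOBAL
"""
energy_fragment = make_energy_template(md_fragment)
print(energy_fragment)
```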


## 3. Computing the overlap calculations <a name="comp_overlap"></a>
[Back to TOC](#toc)

### 3.1 Loading compilers to run CP2K <a name="load_compilers"></a>


We start by loading the compilers needed to run CP2K in the background. These are the compilers that were used to compile CP2K; here, we compiled it with Intel Parallel Studio 2020. More information on the compilation can be found [here](https://github.com/compchem-cybertraining/Tutorials_CP2K/blob/master/INSTALLATION.md).

```
!module load intel/20.2
!module load intel-mpi/2020.2
```

```text
 The Intel 20.2 compilers are in your path. This is adequate for compiling and
running most codes. Source compilervars.sh for more features including the
debugger. 
 The Intel MPI 2020.2 is in your path. This is adequate for compiling and
running most codes. Run "source
/util/academic/intel/20.2/compilers_and_libraries_2020.2.254/linux/mpi/intel64/bin/mpivars.sh"
for more features including the debugger. 
```
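In IPython/Jupyter, each `!` command runs in its own subshell, so environment changes made by `module load` above may not be visible to later cells or to processes started from the Python kernel. This is one reason to give `params['cp2k_exe']` as a full path. The sketch below checks that a CP2K executable can actually be found before launching the calculations; `find_executable` is a hypothetical helper built only on the standard library.

```python
import os
import shutil
from typing import Optional

def find_executable(exe: str) -> Optional[str]:
    """Return a runnable path for `exe`, or None if it cannot be found (hypothetical helper)."""
    if os.path.sep in exe:
        # A full or relative path was given: check it directly
        return exe if os.path.isfile(exe) and os.access(exe, os.X_OK) else None
    # A bare name such as 'cp2k.psmp': search the PATH of the current (kernel) process
    return shutil.which(exe)

# Usage: a bare name (after `module load`) or the full path used later in this notebook
for candidate in ['cp2k.psmp',
                  '/projects/academic/cyberwksp21/Software/cp2k-intel/cp2k-8.2/exe/Linux-x86-64-intelx/cp2k.psmp']:
    print(candidate, '->', find_executable(candidate))
```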


### 3.2 Starting the calculations <a name="start_overlap_calculations"></a>
[Back to TOC](#toc)

The following cell performs the MO overlap calculations using the MD trajectory file and the CP2K input template files (`es_ot_temp.inp` and `es_diag_temp.inp`). The parameters are described in detail in the [Readme.md](../Readme.md) file, but we repeat them here:

`path`: This will be the path to the job folder in which the calculations will be done. It is set to `os.getcwd()`. Please do not change this.

`params['nprocs']`: The number of processors to use for the calculations. This will be both the number of processors used
by CP2K and the number of processors that will be used to compute the AO overlap matrices.

`params['istep']`: The initial time step for this job. The geometry for this step is taken from the trajectory `.xyz` file.

`params['fstep']`: The final time step for this job. Its geometry is also taken from the trajectory `.xyz` file.

_*Note:*_ If you want to split the calculations over multiple submitted jobs, do not set these two parameters. Libra fills them automatically based on the number of jobs and the number of steps.

`params['init_state']`: The index of the lowest orbital to include in the calculations. Orbital indices start from 1.

`params['final_state']`: The index of the highest orbital to include in the calculations. Orbital indices start from 1.
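As a worked example of these two parameters, the calculation cell in this section takes the HOMO index to be 128 and keeps 20 orbitals below it and 21 above it. For a closed-shell system, the HOMO index equals half the number of electrons treated explicitly (in xTB, the valence electrons). The electron count below is a hypothetical value chosen to give HOMO = 128, not one read from this system.

```python
def homo_index(n_electrons: int) -> int:
    """1-based HOMO index for a closed-shell system (two electrons per orbital)."""
    if n_electrons % 2:
        raise ValueError("The closed-shell formula needs an even number of electrons")
    return n_electrons // 2

n_electrons = 256          # hypothetical value chosen so that the HOMO index is 128
homo = homo_index(n_electrons)
init_state = homo - 20     # lowest orbital kept (1-based)
final_state = homo + 21    # highest orbital kept (1-based)
n_orbitals = final_state - init_state + 1
print(init_state, final_state, n_orbitals)   # 108 149 42

# 0-based positions of the HOMO and LUMO inside the selected orbital window
homo_pos = homo - init_state
lumo_pos = homo_pos + 1
print(homo_pos, lumo_pos)                    # 20 21
```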

`params['isxTB']`: A boolean flag for xTB calculations. If set to `False`, DFT calculations are assumed. The difference between xTB and DFT calculations is that diagonalization in xTB needs a converged OT wavefunction as an initial guess. Therefore, an extra input template for the OT calculations is needed (`params['cp2k_ot_input_template']`).

`params['isUKS']`: A boolean flag for unrestricted Kohn-Sham (UKS) calculations. If set to `True`, unrestricted (spin-polarized) calculations are performed.
Make sure the CP2K input template uses consistent keywords for spin-polarized calculations (the `UKS` or `LSD` keywords).

`params['is_periodic']`: A boolean flag for periodic calculations. If set to `True`, a periodic AO overlap matrix is computed.

`params['A_cell_vector']`: The Cartesian A cell vector, as given in the CP2K input file used for the electronic structure calculations.

`params['B_cell_vector']`: The Cartesian B cell vector, as given in the CP2K input file used for the electronic structure calculations.

`params['C_cell_vector']`: The Cartesian C cell vector, as given in the CP2K input file used for the electronic structure calculations.


`params['periodicity_type']`: Specifies along which of the Cartesian X, Y, and Z axes the system is periodic; it is used to generate the translational vectors. For example, for a bulk structure you can set it to `'XYZ'`, and for a monolayer with vacuum along the Z axis you can set it to `'XY'`.


`params['translational_vectors']`: For periodic calculations, CP2K uses a periodic Kohn-Sham Hamiltonian and AO overlap matrix. To compute the MO overlaps accurately, we therefore need the periodic AO overlap matrix. It is obtained by computing the overlaps between the central cell and its periodic images, which are generated from the translational vectors using `CP2K_methods.generate_translational_vectors`. The translational vectors are defined with respect to the `origin`, which here is `[0,0,0]`. The second argument of this function is a list of 3 elements giving the number of periodic images along the X, Y, and Z axes, respectively. Note that this also includes the periodic images in the negative direction of each axis. For example, `[1,1,1]` with `params['periodicity_type']='XY'` computes the AO overlaps between the central cell and itself as well as its 8 neighboring cells, and then sums them to get the periodic AO overlap. Since the periodicity is set to `'XY'`, Libra ignores the 3rd element of this list and generates the translational vectors only along the X and Y directions. The following image shows the periodic cells for this configuration:

<div>
<img src="attachment:cell.png" width="200"/>
</div>
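To make the periodic-image bookkeeping concrete, the sketch below reproduces the integer translational vectors that the calculation cell in this section prints for `[1,1,1]` and `'XY'`, and converts each one into a Cartesian shift i*A + j*B + k*C using this system's cell vectors. `make_translational_vectors` is a hypothetical stand-in written for illustration; it is not Libra's `CP2K_methods.generate_translational_vectors` and has no `origin` argument.

```python
import itertools
import numpy as np

def make_translational_vectors(n_images, periodicity_type):
    """Integer image indices (i, j, k) around the central cell, excluding (0, 0, 0).

    Hypothetical re-implementation for illustration only.
    """
    ranges = []
    for axis, n in zip("XYZ", n_images):
        # Non-periodic axes contribute only the index 0
        ranges.append(range(-n, n + 1) if axis in periodicity_type.upper() else range(0, 1))
    vecs = [v for v in itertools.product(*ranges) if v != (0, 0, 0)]
    return np.array(vecs, dtype=int).reshape(-1, 3)

# Cell vectors of this system (rows are A, B, C)
cell = np.array([
    [14.2415132523, 0.0000000000, 0.0000000000],
    [0.0002151559, 12.3343009930, 0.0000000000],
    [0.0018975023, 0.0028002808, 14.9999996186],
])

tr_vecs = make_translational_vectors([1, 1, 1], 'XY')
shifts = tr_vecs @ cell    # row n is i*A + j*B + k*C for tr_vecs[n] = (i, j, k)
print(tr_vecs)
print(f'{len(tr_vecs)} images + the central cell = {len(tr_vecs) + 1} cells')
```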


`params['is_spherical']`: A boolean flag that selects whether the AO overlaps are computed with spherical (pure) or Cartesian Gaussian basis functions. If set to `True`, spherical basis functions are used.
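The two conventions differ in the number of basis functions per shell once the angular momentum l reaches 2: a Cartesian shell has (l+1)(l+2)/2 functions, while a spherical (pure) shell has 2l+1. The short sketch below tabulates both; it illustrates a general property of Gaussian basis sets and is not a Libra function.

```python
def n_cartesian(l: int) -> int:
    """Number of Cartesian Gaussian functions in a shell of angular momentum l."""
    return (l + 1) * (l + 2) // 2

def n_spherical(l: int) -> int:
    """Number of spherical (pure) Gaussian functions in a shell of angular momentum l."""
    return 2 * l + 1

for l, label in enumerate("spdf"):
    print(label, n_cartesian(l), n_spherical(l))   # e.g. d: 6 Cartesian vs 5 spherical
```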

`params['remove_molden']`: A boolean flag to remove or keep the `molden` files after the computations are done.

`params['res_dir']`: The full path to where the MO overlap files will be stored. 

`params['all_pdosfiles']`: The full path to where the `.pdos` files for each step will be stored.

`params['all_logfiles']`: The full path to where the `.log` files for each step will be stored.

`params['cp2k_exe']`: The full path to the CP2K executable. If you load CP2K using `module load`, you only need to set the executable name,
such as `'cp2k.popt'` or `'cp2k.psmp'`.

`params['cp2k_ot_input_template']`: The full path to the CP2K OT input template for xTB calculations. As mentioned above, the diagonalization algorithm in the xTB calculations needs a good initial guess, which can be obtained using the OT method. Libra ignores this parameter if `params['isxTB'] = False`, in which case you can set it to an empty string.

`params['cp2k_diag_input_template']`: The full path to the CP2K diagonalization input template, either for DFT or xTB.

`params['trajectory_xyz_filename']`: The full path to the trajectory `.xyz` file. 

The calculations are then run using the function `step2.run_cp2k_libint_step2(params)`.
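Before launching the calculations, it can save a failed run to confirm that the input files referenced in `params` exist. The sketch below is a hypothetical pre-flight helper, not part of Libra; it assumes the key names used in the calculation cell of this section and only requires the OT template when `params['isxTB']` is `True`.

```python
import os

def missing_input_files(params: dict) -> list:
    """Return the params keys whose file paths do not exist (hypothetical pre-flight check)."""
    keys = ['cp2k_diag_input_template', 'trajectory_xyz_filename']
    # The OT template is only needed for xTB calculations
    if params.get('isxTB', False):
        keys.append('cp2k_ot_input_template')
    return [key for key in keys if not os.path.isfile(params.get(key, ''))]

# Usage, once `params` has been filled in:
# missing = missing_input_files(params)
# if missing:
#     raise FileNotFoundError(f'Missing input files for: {missing}')
```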

```python
path = os.getcwd()
params = {}
# Number of processors
params['nprocs'] = 12
# The initial and final steps
params['istep'] = 1
params['fstep'] = 5
# Lowest and highest orbitals; here, the HOMO index is 128
params['init_state'] = 128-20
params['final_state'] = 128+21
# Extended tight-binding (xTB) calculation
params['isxTB'] = True
# Restricted (False) or unrestricted (True) Kohn-Sham calculation
params['isUKS'] = False
# Periodic calculations flag
params['is_periodic'] = True
# Set the cell parameters for periodic calculations
if params['is_periodic']:
    params['A_cell_vector'] = [14.2415132523,0.0000000000,0.0000000000]
    params['B_cell_vector'] = [0.0002151559,12.3343009930,0.0000000000]
    # The C vector is along the vacuum (non-periodic) direction
    params['C_cell_vector'] = [0.0018975023,0.0028002808,14.9999996186]
    params['periodicity_type'] = 'XY'
    # Set the origin
    origin = [0,0,0]
    tr_vecs = params['translational_vectors'] = CP2K_methods.generate_translational_vectors(origin, [1,1,1],
                                                                                            params['periodicity_type'])

    print('The translational vectors for the current periodic system are:\n')
    print(tr_vecs)
    print(F'Will compute the S^AO between R(0,0,0) and {tr_vecs.shape[0]+1} translational vectors')

# Compute the AO overlaps with spherical (True) or Cartesian (False) basis functions
params['is_spherical'] = True
# Remove the molden files, which can be large for some systems,
# after the computation for each step is done
params['remove_molden'] = True
# The results are stored in these folders
params['res_dir'] = path + '/res'
params['all_pdosfiles'] = path + '/all_pdosfiles'
params['all_logfiles'] = path + '/all_logfiles'
# CP2K executable
params['cp2k_exe'] = '/projects/academic/cyberwksp21/Software/cp2k-intel/cp2k-8.2/exe/Linux-x86-64-intelx/cp2k.psmp'
# For xTB calculations, an OT input template is also needed
params['cp2k_ot_input_template'] = path + '/es_ot_temp.inp'
params['cp2k_diag_input_template'] = path + '/es_diag_temp.inp'
# The trajectory xyz file path
params['trajectory_xyz_filename'] = path + '/../../../6_step1_cp2k/2_xTB/c3n4_1x1_MD_xTB-pos-1.xyz'

step2.run_cp2k_libint_step2(params)
```

```text
The translational vectors for the current periodic system are:

[[-1 -1  0]
 [-1  0  0]
 [-1  1  0]
 [ 0 -1  0]
 [ 0  1  0]
 [ 1 -1  0]
 [ 1  0  0]
 [ 1  1  0]]
Will compute the S^AO between R(0,0,0) and 9 translational vectors
-----------------------Start-----------------------
-----------------------Step 1-----------------------
**************** Running CP2K ****************
Step 1 Computing the OT method wfn file...
Done with OT wfn. Elapsed time: 1.9777138233184814
Computing the wfn file using diagonalization method...
Done with diagonalization. Elapsed time: 1.1838455200195312
Done with step 1 Elapsed time: 3.188774347305298
Creating shell...
Done with creating shell. Elapsed time: 0.04091310501098633
Reading energies and eigenvectors....
Done with reading energies and eigenvectors. Elapsed time: 0.15270137786865234
Computing atomic orbital overlap matrix...
Computing the AO overlaps between R(-1,-1,0) and R(0,0,0)
Computing the AO overlaps between R(-1,0,0) and R(0,0,0)
Computing
```

## 4. Checking the orthonormality of the wavefunctions <a name="check_ortho"></a>
[Back to TOC](#toc)

Sometimes, atoms in the system still have significant overlap with atoms in periodic images that lie beyond the range of the specified translational vectors. In that case, more translational vectors are needed to compute the AO overlaps accurately enough for the wavefunctions to be orthonormal. We can check this by printing the diagonal elements of the S matrices stored in the `res` directory; for orthonormal wavefunctions, these should be close to 1.
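Printing the diagonal works for a quick look; a more compact check is to report the largest deviation of the diagonal from 1 and the largest off-diagonal magnitude. The sketch below does that for a dense or `scipy.sparse` matrix. The function name `orthonormality_errors` is hypothetical, and the demo uses a small synthetic matrix rather than the `res/S_ks_1.npz` file; what counts as an acceptable deviation is a judgment call for your system.

```python
import numpy as np
import scipy.sparse as sp

def orthonormality_errors(S):
    """Return (max |S_ii - 1|, max |S_ij| for i != j) for a dense or sparse overlap matrix."""
    S = S.toarray() if sp.issparse(S) else np.asarray(S)
    diag = np.diag(S)
    diag_err = np.max(np.abs(diag - 1.0))
    off_diag = S - np.diag(diag)          # zero out the diagonal
    off_err = np.max(np.abs(off_diag))
    return diag_err, off_err

# Synthetic example; for real data use e.g. sp.load_npz('res/S_ks_1.npz')
S_demo = sp.csc_matrix(np.array([[0.9999, 0.002], [0.002, 1.00002]], dtype=complex))
diag_err, off_err = orthonormality_errors(S_demo)
print(f'max |S_ii - 1| = {diag_err:.2e}, max |S_ij| = {off_err:.2e}')
```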

```python
# Load sample files for the S and St matrices and then
# print the diagonals to check the orthonormality of the wavefunctions
S = sp.load_npz('res/S_ks_1.npz').toarray()
print('S matrix:\n',np.diag(S))
St = sp.load_npz('res/St_ks_1.npz').toarray()
print('St matrix:\n',np.diag(St))
```

S matrix:
 [0.99999975+0.j 0.99999986+0.j 0.99999972+0.j 0.99999967+0.j
 0.9999997 +0.j 0.99999987+0.j 0.99999988+0.j 0.99999983+0.j
 0.99999978+0.j 0.99999963+0.j 0.99999965+0.j 0.99999948+0.j
 0.99999988+0.j 0.99999989+0.j 0.99999967+0.j 0.99999968+0.j
 0.99999949+0.j 0.99999952+0.j 0.99999953+0.j 0.99999944+0.j
 0.9999998 +0.j 0.99999963+0.j 0.99999959+0.j 0.9999996 +0.j
 0.99999966+0.j 0.99999954+0.j 0.99999959+0.j 0.99999962+0.j
 0.99999945+0.j 0.9999994 +0.j 0.99999955+0.j 0.99999937+0.j
 0.99999949+0.j 0.99999941+0.j 0.9999993 +0.j 0.99999941+0.j
 0.99999928+0.j 0.99999925+0.j 0.99999927+0.j 0.99999927+0.j
 0.99999922+0.j 0.99999923+0.j 0.99999975+0.j 0.99999986+0.j
 0.99999972+0.j 0.99999967+0.j 0.9999997 +0.j 0.99999987+0.j
 0.99999988+0.j 0.99999983+0.j 0.99999978+0.j 0.99999963+0.j
 0.99999965+0.j 0.99999948+0.j 0.99999988+0.j 0.99999989+0.j
 0.99999967+0.j 0.99999968+0.j 0.99999949+0.j 0.99999952+0.j
 0.99999953+0.j 0.99999944+0.j 0.9999998 +0.j 0.99999963+0.j
 0.99999959+0