# Tutorial 3

This tutorial is about generating parameter files for subsequent 'ab initio'
tight-binding (AITB) calculations with [Tibi](https://mvdb.gitlab.io/tibi),
again taking silicon as an example.

In contrast to the previous tutorials, which involved writing and running
Python code, we will here make use of the `hotcent-basis` and `hotcent-tables`
command line tools. Note that these rely on pseudopotential calculations (as
required for AITB) and do not support all-electron calculations.

## Step 1: selecting a pseudopotential

Pseudoatomic DFT calculations in Hotcent currently require norm-conserving
pseudopotentials (NCPPs) in the Siesta format (`.psf` extension). These can be

* downloaded from online repositories such as the
  [NNIN/C Pseudopotential Virtual Vault](https://nninc.cnf.cornell.edu)
* generated using programs such as the [Atomic Pseudopotential Engine (APE)](
  https://www.tddft.org/programs/APE/)

A couple of things to consider:

* The `psf`-formatted NCPPs from NNIN/C appear to be fine for main group elements.
  The ones for transition metals (especially $3d$ ones) provide an appropriate
  starting point but can be further improved by regenerating them with e.g.
  shorter core radii and with non-linear core corrections.

* For certain elements, such as alkali(ne earth) and transition metals, it is
  common to include semi-core electrons as part of the valence for accurate DFT
  calculations (e.g. iron's $3s^2$ $3p^6$ states). For AITB applications,
  however, the accuracy gains (if any) will usually be considered too small to
  justify the large increase in computational cost. Note also that Hotcent
  currently always uses one Kleinman-Bylander projector per angular momentum,
  which is not sufficient when including semi-core states.

For this tutorial we will simply use the Si LDA pseudopotential from NNIN/C:

In [None]:
! wget https://nninc.cnf.cornell.edu/psp_files/Si.psf -O Si.psf

# To plot the NCPP's local and semilocal potentials that we'll be using:
! python -m hotcent.kleinman_bylander Si.psf --lmax=3 --local-component=siesta

<img src="./Si_potentials.png" style="width: 500px;">

## Step 2: generating the basis set

Now we can generate an LCAO basis set with the `hotcent-basis` tool
(which is in some ways similar to how the [gpaw-basis](
https://wiki.fysik.dtu.dk/gpaw/documentation/lcao/lcao.html#basis-set-generation)
tool works).

The primary attributes that need to be determined here are the cutoff radii
$r_{cut}$ for the soft confinement potentials of the first-$\zeta$ basis
functions. This is usually done by first specifying how much an atomic
eigenvalue should shift upward with respect to the free atom and then searching
for the cutoff that produces this shift. Another attribute is the
characteristic radius $r_{pol}$ for the polarization functions.

These $r_{cut}$ and $r_{pol}$ values, together with other influential
parameters, will be stored in a YAML file which encodes how the pseudoatomic
DFT calculations need to be run and how the main (and auxiliary) basis sets
should be generated.

The cell below will generate such a YAML file for Si, which will be called
`Si.lda.yaml`. We'll be using the LDA functional and will ask for a
double-$\zeta$-plus-polarization ('dzp') main basis with $r_{cut}$ values
obtained via a 'self-consistent' energy shift approach with an upshift of
2.2 eV for the first-$\zeta$ $3s$ and $3p$ main basis functions. We also
already specify the auxiliary basis size ('3D', i.e. a set of triple-$\zeta$
$s-$, $p-$ and $d-$type functions).

In [None]:
# This can take up to one minute
! hotcent-basis Si --pseudo-path=. --label=lda --xcfunctional=LDA --basis=dzp \
                   --configuration=[Ne],3s2,3p2 --valence=3s,3p --aux-basis=3D \
                   --rcut-approach=energy_shift_user_sc --energy-shift=2.2 \
                   --plot --quiet

In [None]:
# To show the contents of the YAML file:
from IPython.display import display, Pretty
display(Pretty('Si.lda.yaml'))

As you can see, the YAML file also contains a number of default choices that
were not entered via the command line, such as the amplitude and inner cutoff
fraction for the soft confinement and the parameters for generating the
higher-$\zeta$ basis functions.

Some more notes:
* Execute `hotcent-basis --help` for an overview of all available options.
* If unsure about which electronic configuration to choose, it is usually best
  to take the same as the one with which the pseudopotential was generated.
* For transition metals it can be necessary to manually modify the start of
  the radial mesh with the option `--rmin=1e-4`.
* The run also produced a `Si.lda.ion` file which can be used to perform
  DFT calculations with the same basis set and pseudopotential, using the
  `User.Basis` option of the [Siesta](https://siesta-project.org/siesta) code.

The YAML file does not contain the basis functions themselves, but these did
get plotted via the `--plot` option:

* the main basis functions (`Si_Rnl.lda.png`):

<img src="./Si_Rnl.lda.png" style="width: 500px;">

* the auxiliary basis functions (`Si_Anl.lda.png`):

<img src="./Si_Anl.lda.png" style="width: 500px;">

## Step 3: generating the integral tables

Everything is now in place to generate the various integral tables needed for
AITB calculations with Tibi (more precisely the 3cTB-GY method). This is where
the `hotcent-tables` tool comes in. Let's start with a dry run to see which
tasks will need to be performed (each task roughly corresponds to one table
type):

In [None]:
! hotcent-tables Si --tasks=all --dry-run

The task names should be interpreted like this:

* '2c' and '3c' stand for 'two-center' and 'three-center'
* 'on' and 'off' stand for 'on-site' and 'off-site'
* 'chg' refers to the Hartree-XC ('U') kernel
* 'mag' refers to the spin-resolved XC ('W') kernel
* 'map' refers to the Giese-York mapping integrals
* 'rep' refers to the repulsive potential
* 'on2c/on3c/off2c/off3c' are for the Hamiltonian integrals
  (in the case of 'off2c' also including the overlap integrals)
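
The naming scheme above can be decoded mechanically. The following is a minimal
sketch, not part of Hotcent: the `describe_task` helper and its lookup tables
are hypothetical names introduced for illustration, and '1c' is assumed to mean
'one-center'.

```python
import re

# Labels taken from the list above; '1c' is assumed to mean 'one-center'.
KINDS = {
    "chg": "Hartree-XC ('U') kernel",
    "mag": "spin-resolved XC ('W') kernel",
    "map": "Giese-York mapping integrals",
    "rep": "repulsive potential",
    None: "Hamiltonian integrals",  # for 'off2c' this includes the overlap
}
SITES = {"on": "on-site", "off": "off-site", None: None}
CENTERS = {"1": "one-center", "2": "two-center", "3": "three-center"}

TASK_PATTERN = re.compile(r"^(chg|mag|map|rep)?(on|off)?([123])c$")


def describe_task(name):
    """Return a readable description of a task name such as 'chgoff2c'."""
    match = TASK_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized task name: {name!r}")
    kind, site, centers = match.groups()
    parts = [KINDS[kind], SITES[site], CENTERS[centers]]
    return ", ".join(part for part in parts if part is not None)


for task in ("off2c", "chgon1c", "magoff2c", "map2c", "rep3c"):
    print(f"{task:>9}: {describe_task(task)}")
```

The pattern only reflects the task names that appear in this tutorial. The
dry-run output remains the authoritative list.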

For a real `hotcent-tables` run we also need to pass the appropriate `--label`
and `--pseudo-path`. We'll furthermore apply loose grid settings via the
`--grid-opt-{int,tab}=...` options -- this is just to speed up the tutorial and
is not recommended for production runs! Finally, we set `--processes=2` to
distribute the tasks over two (single-core) processes.

In [None]:
# This will take about 10 minutes
! hotcent-tables Si --tasks=all --label=lda --pseudo-path=. \
                    --grid-opt-int=grid_opt_int.yaml \
                    --grid-opt-tab=grid_opt_tab.yaml \
                    --processes=2

This hopefully resulted in a bunch of files inside the `tables_Si` directory.

There is typically more than one file per task because each file only covers
(main or auxiliary) basis functions from one '$\zeta$-pair'. For example, the
`Si-Si++_onsiteU.1ck` file contains the matrix elements of the
one-center-expanded U kernel involving same-center pairs of first- and
third-$\zeta$ auxiliary basis functions for Si.

By default an SCF-related task will also create files for both Giese-York and
Mulliken mapping (indicated with 'GY' and 'Mu' below). This can be controlled
via the `--aux-mappings` option.

In [None]:
! echo '### Overview of the different tasks and integral table types'
! echo '# from chgon1c (GY):'; ls tables_Si/*_onsiteU.1ck
! echo '# from chgon1c (Mu):'; ls tables_Si/*_hubbard_values.txt
! echo '# from magon1c (GY):'; ls tables_Si/*_onsiteW.1ck
! echo '# from magon1c (Mu):'; ls tables_Si/*_spin_constants.txt
! echo '# from map1c (GY):'; ls tables_Si/*_onsiteM_*.1cm
! echo '# from off2c:'; ls tables_Si/*_offsite2c.skf
! echo '# from on2c:'; ls tables_Si/*_onsite2c_*.skf
! echo '# from chgoff2c (GY):'; ls tables_Si/*_offsiteU.2ck
! echo '# from chgoff2c (Mu):'; ls tables_Si/*_offsiteU.2cl
! echo '# from chgon2c (GY):'; ls tables_Si/*_onsiteU_*.2ck
! echo '# from chgon2c (Mu):'; ls tables_Si/*_onsiteU_*.2cl
! echo '# from magoff2c (GY):'; ls tables_Si/*_offsiteW.2ck
! echo '# from magoff2c (Mu):'; ls tables_Si/*_offsiteW.2cl
! echo '# from magon2c (GY):'; ls tables_Si/*_onsiteW_*.2ck
! echo '# from magon2c (Mu):'; ls tables_Si/*_onsiteW_*.2cl
! echo '# from map2c (GY):'; ls tables_Si/*_offsiteM_*.2cm
! echo '# from rep2c:'; ls tables_Si/*_repulsion2c.spl
! echo '# from off3c:'; ls tables_Si/*_offsite3c_*.3cf
! echo '# from on3c:'; ls tables_Si/*_onsite3c_*.3cf
! echo '# from rep3c:'; ls tables_Si/*_repulsion3c_*.3cf
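
As a quick cross-check of the listing above, the files can also be counted per
extension from Python. This is a small sketch using only the standard library;
`count_by_suffix` is a hypothetical helper name and the `tables_Si` folder is
assumed to sit in the current working directory.

```python
from collections import Counter
from pathlib import Path


def count_by_suffix(directory):
    """Count the regular files in `directory` per file extension."""
    path = Path(directory)
    if not path.is_dir():
        # Nothing to count if the folder has not been generated yet
        return Counter()
    return Counter(p.suffix for p in path.iterdir() if p.is_file())


for suffix, count in sorted(count_by_suffix("tables_Si").items()):
    print(f"{suffix:>6}: {count} file(s)")
```

A missing extension in this summary (for example no `.3cf` files) is a hint
that the corresponding task did not finish.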

To complete the dataset we still need to paste the `Si-Si_offsite2c.skf` and
`Si-Si_repulsion2c.spl` files together into `Si-Si.skf` and copy the other
`*_offsite2c.skf` files to just `*.skf`. This is to comply with the usual
(DFTB) format for this kind of tables and is easily accomplished with
`hotcent-concat`:

In [None]:
! cd tables_Si && hotcent-concat -b dzp Si
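
To make the file operations described above concrete, here is a rough Python
sketch that runs on dummy files in a temporary directory. It is not the
`hotcent-concat` implementation: the `concat_sketch` helper is hypothetical, and
it assumes that "pasting together" amounts to appending the `.spl` content to
the `.skf` content. Use the real tool for actual parameter sets.

```python
import shutil
import tempfile
from pathlib import Path

SUFFIX = "_offsite2c.skf"


def concat_sketch(directory):
    """Mimic the file operations described above (not the real tool)."""
    directory = Path(directory)
    written = []
    for skf in sorted(directory.glob("*" + SUFFIX)):
        stem = skf.name[: -len(SUFFIX)]  # e.g. 'Si-Si' or 'Si-Si+'
        target = directory / (stem + ".skf")
        spl = directory / (stem + "_repulsion2c.spl")
        if spl.exists():
            # Assumption: "pasting together" = appending the repulsion file
            target.write_text(skf.read_text() + spl.read_text())
        else:
            # No repulsion file for this pair: plain copy under the new name
            shutil.copyfile(skf, target)
        written.append(target.name)
    return written


# Demonstration on dummy files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    for name in ("Si-Si_offsite2c.skf", "Si-Si+_offsite2c.skf",
                 "Si-Si_repulsion2c.spl"):
        (Path(tmp) / name).write_text(f"dummy content of {name}\n")
    print(concat_sketch(tmp))
```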

For a 3cTB-GY calculation with Tibi you will need all these files, except those
marked with 'Mu', which are for 3cTB-Mu calculations. After running
`hotcent-concat`, the `*_offsite2c.skf` and `*_repulsion2c.spl` files are no
longer needed either, since the corresponding `*.skf` files are read instead.
If there is no need for spin polarization, the different `mag` tasks can also
be skipped (tip: you can do
this via `--tasks=all,^magon1c,^magon2c,^magoff2c`).
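
When the list of skipped tasks is assembled in a script, the `--tasks` value
can be built programmatically. The sketch below only reproduces the
`all,^name` syntax shown above; `tasks_option` is a hypothetical helper, not a
Hotcent function.

```python
def tasks_option(skip=()):
    """Build a --tasks value that runs everything except the tasks in `skip`."""
    return ",".join(["all"] + [f"^{name}" for name in skip])


# The three spin-related tasks named in the text above
MAG_TASKS = ("magon1c", "magon2c", "magoff2c")

print("--tasks=" + tasks_option(skip=MAG_TASKS))
```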

Note: for a (SCC-)DFTB-like calculation, with a minimal main basis set, a file
like `Si-Si.skf` is the only one you would need. The present `Si-Si.skf` file,
however, is not well suited for this purpose (except perhaps in the case of
gas-phase Si dimers). That is because a (semi-)empirical method like DFTB needs
empirical corrections to function (e.g. a repulsive potential fitted to
reference data so as to absorb errors arising from the more approximate DFTB
Hamiltonian).

## What's next

The parameter set inside the `tables_Si` folder can now be used as input
for the corresponding [Tibi tutorial](https://gitlab.com/mvdb/tibi/tutorials).

### Grid settings

Keep in mind that the integration and tabulation grids are quite coarse in this
example (chosen via the `grid_opt_int.yaml` and `grid_opt_tab.yaml` files).
For production runs you can rely on the default grid settings and you should
therefore not need to use the `--grid-opt-int` and `--grid-opt-tab` options.
With these default (rather tight) settings the tasks will of course take a
longer time to complete.

### Beyond LDA

Switching to a GGA functional is mostly a matter of choosing another value
for the `-f/--xcfunctional` option in `hotcent-basis`. As Hotcent relies on
[Libxc](https://www.tddft.org/programs/Libxc/) for GGA functionals, this means
choosing values like `GGA_X_PBE+GGA_C_PBE` (for PBE) and
`GGA_X_PBE_SOL+GGA_C_PBE_SOL` (for PBEsol). For a complete overview of the
(GGA) functionals available in Libxc, either consult the [Libxc homepage](
https://www.tddft.org/programs/libxc/functionals/#gga-functionals-) or look
inside Libxc's `xc_funcs.h` header file.

Note that with GGAs it is usually a good idea to also use a pseudopotential
that has been generated with a GGA functional (preferably even the very same
GGA functional if possible).

Also keep in mind that e.g. the three-center tasks will take significantly
more time with GGA functionals than with LDA (roughly a factor of 5).

### Multiple elements

What if your target compounds contain more than one element, e.g. both
silicon and oxygen? Then Step 2 (with `hotcent-basis`) will need to be repeated
for oxygen. And Step 3 (with `hotcent-tables`) will need to not only deal with
the corresponding O-specific tasks, but also with the ones that involve
combinations of O and Si atoms, preferably without redoing the Si-specific
tasks. You would then go about it as follows:

```bash
hotcent-tables Si,O --exclude=Si --tasks=all ...
```

Note that without the `--exclude=Si` option the Si-specific tasks would also
be redone, thereby (needlessly) regenerating and overwriting the existing
Si-specific parameter files residing in the `tables_Si` folder.

Another example: say that you also became interested in titanium and aluminium
oxides (just the binary compounds, not the ternary ones) and that you would
like to handle all the tasks in the same run for convenience or for
better parallel efficiency (as touched upon below). Assuming that you still
have the parameters for Si and O from the previous step, you could
launch the new tasks with:

```bash
hotcent-tables Al,O Ti,O --exclude=O --tasks=all ...
```

Suppose now that you did get interested in the mixed oxides of Al and Si
after all. To then obtain the remaining parameters (involving combinations
of Al & Si and of Al, Si & O) you would need to run:

```bash
hotcent-tables Al,Si,O --exclude=Al-O,Si-O,Al,Si,O --tasks=all ...
```
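
The bookkeeping behind such an `--exclude` list can be checked with a few lines
of Python. This is only an illustration of the combinatorics, assuming that
the work is grouped per combination of up to three distinct elements (in line
with the three-center integrals). It does not reproduce how `hotcent-tables`
itself parses `--exclude`, and the helper names are hypothetical.

```python
from itertools import combinations


def element_combinations(elements, max_size=3):
    """All unordered combinations of 1..max_size distinct elements."""
    unique = sorted(set(elements))
    return {
        frozenset(combo)
        for size in range(1, max_size + 1)
        for combo in combinations(unique, size)
    }


def parse(label):
    """Turn a label such as 'Al-O' into an order-independent set."""
    return frozenset(label.split("-"))


wanted = element_combinations(["Al", "Si", "O"])
already_done = {parse(s) for s in "Al-O,Si-O,Al,Si,O".split(",")}
remaining = wanted - already_done

for combo in sorted(remaining, key=lambda c: (len(c), sorted(c))):
    print("-".join(sorted(combo)))
```

The two combinations left over are the Al & Si pair and the Al, Si & O triple,
matching the description above.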

Finally: the output of such multi-element runs will be stored in
folders named `tables_O-Si`, `tables_O-Si-Al` and so forth. 
For subsequent Tibi runs you will need to move their contents to a single
directory (and also apply `hotcent-concat`), e.g.:

```bash
mv tables_*/* .
hotcent-concat -b dzp Al Si O
```
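
A slightly more defensive variant of the `mv` step can be written in Python.
The sketch below moves the files into a separate folder and refuses to
overwrite anything, which guards against accidentally mixing parameter files
from different runs. The `merge_tables` helper and the `merged_tables` folder
name are assumptions made for this example.

```python
import shutil
from pathlib import Path


def merge_tables(sources, destination):
    """Move all files from the source folders into one destination folder."""
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    moved = []
    for source in map(Path, sources):
        for path in sorted(source.iterdir()):
            target = destination / path.name
            if target.exists():
                # Stop rather than silently replacing an existing table
                raise FileExistsError(f"refusing to overwrite {target}")
            shutil.move(str(path), str(target))
            moved.append(target.name)
    return moved


# Collect every 'tables_*' folder in the current working directory
sources = sorted(p for p in Path(".").glob("tables_*") if p.is_dir())
if sources:
    print(merge_tables(sources, "merged_tables"))
```

`hotcent-concat` would then be run inside the destination folder, as in the
shell example above.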

### Parallelization

When the workload increases due to production-quality grids or the need for
multiple elements or GGA functionals, it will be useful to employ more
processes than in the present `--processes=2` example. Take into account,
though, that `hotcent-tables` can only parallelize over tasks and that the
three-center tasks are significantly more time consuming than the other ones.
This is also the reason why the three-center tasks are the first ones to be
processed, as it helps with the load balancing / parallel efficiency.
`hotcent-tables` furthermore uses the [multiprocessing module](
https://docs.python.org/3/library/multiprocessing.html) for parallelization,
which means that it will not be able to spawn processes across multiple nodes.
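
The benefit of starting with the longest tasks can be illustrated with a toy
scheduling simulation. This is a sketch and not the scheduler used by
`hotcent-tables`: the task durations are made up for illustration, and
`makespan` is a hypothetical helper that hands each task to whichever process
becomes free first.

```python
import heapq


def makespan(durations, processes):
    """Wall time when each task goes to whichever process frees up first."""
    finish_times = [0.0] * processes
    heapq.heapify(finish_times)
    for duration in durations:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + duration)
    return max(finish_times)


# Hypothetical durations in minutes, for illustration only
durations = {"off3c": 10, "on3c": 10, "rep3c": 10,
             "off2c": 1, "on2c": 1, "chgoff2c": 1,
             "chgon2c": 1, "map2c": 1, "rep2c": 1}

longest_first = sorted(durations.values(), reverse=True)
shortest_first = sorted(durations.values())

for n in (2, 4):
    print(f"{n} processes: longest first = {makespan(longest_first, n)}, "
          f"shortest first = {makespan(shortest_first, n)}")
```

Because the parallelization is over whole tasks, the wall time in this model
can never drop below the duration of the single longest task, no matter how
many processes are used.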