# Basics of BigDFT: N2 molecule as example

This notebook shows how to run a simple calculation with BigDFT.
You will learn how to manipulate basic DFT objects from a Python script.
It expands on some of the concepts briefly introduced in the quickstart tutorial.

## Setup Environment

Recall that installing conda through `condacolab` will crash (restart) the kernel. This is expected and necessary.

In [None]:
! pip install -q condacolab
import condacolab
condacolab.install()

In [None]:
! conda install -c "conda-forge/label/bigdft_dev" bigdft-suite  > /dev/null  2> /dev/null

In [None]:
! pip install PyBigDFT  > /dev/null  2> /dev/null
! pip install py3dmol  > /dev/null  2> /dev/null

## How to perform a first run with default parameters

For this tutorial, we will run just a simple N2 molecule, taken from the BigDFT database (available [here](https://gitlab.com/l_sim/bigdft-suite/-/tree/devel/PyBigDFT/BigDFT/Database/XYZs)).

In [None]:
from BigDFT.Systems import System
from BigDFT.Fragments import Fragment
from BigDFT.IO import XYZReader

N2 = System()
with XYZReader("N2") as ifile:
    N2["N:0"] = Fragment([next(ifile)])
    N2["N:1"] = Fragment([next(ifile)])

Every system object has a `posinp` representation, which is a dictionary representation of the geometry.

In [None]:
print(N2.get_posinp())
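
To make the dictionary form concrete, here is a minimal sketch of the kind of structure `get_posinp()` returns, written by hand rather than produced by BigDFT. The layout (a `positions` list of one-atom dictionaries plus a `units` key) and the coordinates are assumptions for illustration; check the printed output above for the actual content.

```python
import math

# Hand-written stand-in for a posinp dictionary (assumed layout and coordinates)
posinp = {
    "units": "angstroem",
    "positions": [
        {"N": [0.0, 0.0, 0.0]},
        {"N": [0.0, 0.0, 1.1]},
    ],
}


def atom_coordinates(entry):
    """Return (symbol, xyz) for a one-atom dictionary such as {"N": [x, y, z]}."""
    # Ignore auxiliary keys whose values are not coordinate triples
    for key, value in entry.items():
        if isinstance(value, (list, tuple)) and len(value) == 3:
            return key, list(value)
    raise ValueError("no coordinates found in entry")


(sym_a, xyz_a), (sym_b, xyz_b) = [atom_coordinates(e) for e in posinp["positions"]]
bond_length = math.dist(xyz_a, xyz_b)
print(f"{sym_a}-{sym_b} distance: {bond_length:.3f} {posinp['units']}")
```

Working on the dictionary directly like this is handy for quick geometry checks before launching a run.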

We now run the DFT calculation (with the default input parameters) of this molecule.
This can be done by instantiating a class from the `Calculators` module. Then we call the run method of that calculator with the system of interest.

In [None]:
from BigDFT import Calculators as C
study = C.SystemCalculator(verbose=False, skip=True) #Create a calculator
log = study.run(sys=N2, name="N2", run_dir='scratch') #run the code

The `run` method of the `BigDFT.Calculators.SystemCalculator` class prints the command that is invoked to the standard output.
It then returns an instance of the `Logfile` class.
This class can be used (as we will see later) to extract information about the electronic structure of the system.

Now we can retrieve important information about the run. See the examples below:

In [None]:
log.energy #this value is in Ha

In [None]:
log.evals[0].tolist() # the eigenvalues in Ha ([0] stands for the first K-point, here meaningless)

In [None]:
log.get_dos().plot() #the density of states

If you are interested, the tutorial [DoS Manipulation](https://githubtocolab.com/BigDFT-group/bigdft-school/blob/main/CCP_tutorials/1.J.DoS-Manipulation.ipynb) (created for a separate BigDFT tutorial) explains how to plot the Density of States from a Logfile.
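
Conceptually, a density of states can be pictured as a sum of broadened peaks, one per eigenvalue. The sketch below illustrates this with Gaussian broadening in plain Python; it is not necessarily how `get_dos()` is implemented, and the eigenvalues, the broadening `sigma`, and the energy grid are all hypothetical choices.

```python
import math


def gaussian_dos(energies, eigenvalues, sigma):
    """Sum one normalized Gaussian of width sigma per eigenvalue."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return [
        sum(norm * math.exp(-0.5 * ((e - ev) / sigma) ** 2) for ev in eigenvalues)
        for e in energies
    ]


# Hypothetical eigenvalues in Ha, for illustration only
eigenvalues = [-1.03, -0.49, -0.43, -0.43, -0.38]
sigma = 0.02  # broadening in Ha (arbitrary choice)

# Uniform energy grid that comfortably contains all the peaks
step = 0.001
energies = [-1.5 + i * step for i in range(1501)]
dos = gaussian_dos(energies, eigenvalues, sigma)

# Integrating the DOS over the grid should recover the number of states
n_states = sum(dos) * step
print(f"integrated DOS: {n_states:.3f}")
```

Checking that the integral matches the number of eigenvalues is a quick sanity test for the chosen grid and broadening.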

## How to use the SystemCalculator instance 

Before managing different calculations at once, let us look closer at the `SystemCalculator` that was presented above.

In [None]:
study = C.SystemCalculator(verbose=False, omp=1, mpi_run='mpirun -np 2')

As you can see, instantiating the calculator lets you set computational parameters such as the OpenMP and MPI parallelisation. In this case, each of the two MPI processes running in parallel uses a single OpenMP thread, which reduces the computation time. 
The global options of the runner (or calculator) can then be accessed with:

In [None]:
study.global_options()

Global options can also be added as follows:

In [None]:
study.update_global_options(new_option = 'value')
study.global_options()

Similarly, global options can also be removed. Since we can't run with OpenMP or MPI on Colab, we'll remove the `omp` and `mpi_run` arguments:

In [None]:
study.pop_global_option('omp')
study.pop_global_option('mpi_run')
study.global_options()

Lastly, note that PyBigDFT also supports running workflows remotely, although this will not be covered in this school.

## How to modify the input parameters from the default ones

To specify non-default input parameters to the code, we should employ an `Inputfile` object. One can choose a naming prefix for a run, which lets us classify the runs that are performed and eases the subsequent postprocessing.

Methods of the class can be employed for modifying the input parameters. For instance, the XC functional can be chosen via the `set_xc` method.
Imagine for example that you are interested in visualizing the wavefunctions output of the calculation. To do that, the suitable method of the class instance should be called.
Create a new calculation set by using, for instance, the `LDA` prefix.

In [None]:
from BigDFT import Inputfiles as I

inp = I.Inputfile()
inp.set_xc('LDA')
inp.write_orbitals_on_disk()

This sets the value of the *orbitals* key in the output dictionary, so that the wavefunctions will be written to disk at the end of the calculation.
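
Because an input file is ultimately a nested dictionary, the two method calls above amount to setting two keys. The sketch below imitates this with a plain dictionary and a small hypothetical helper `set_key`; the key names (`dft`/`ixc` and `output`/`orbitals`) and the value `'binary'` are assumptions about what the methods write, so print `inp` itself to confirm.

```python
def set_key(inputs, section, key, value):
    """Set inputs[section][key], creating the section if needed (hypothetical helper)."""
    inputs.setdefault(section, {})[key] = value
    return inputs


sketch = {}
set_key(sketch, "dft", "ixc", "LDA")             # assumed effect of set_xc('LDA')
set_key(sketch, "output", "orbitals", "binary")  # assumed effect of write_orbitals_on_disk()
print(sketch)
```

Seeing the input as a dictionary also explains why arbitrary keys can be assigned directly when no dedicated method exists.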

In [None]:
study.update_global_options(skip=True) # avoids rerunning the calculation when its data are already on the drive

In [None]:
#Run the code with the LDA prefix
LDA = study.run(name="LDA",input=inp,posinp=N2.get_posinp(),run_dir='scratch')

When using a naming scheme, the output files are placed in a directory called **data-LDA**. In our LDA example, the wavefunctions of the system can thus be found in the **data-LDA** directory:

In [None]:
!ls scratch/data-LDA

Here 001 means the first K-point (meaningless in this case), N stands for non spin-polarized, R for real part and the remaining number is the orbital ID. Post-processing of these files can be done for visualisation purposes, or they can be used as an input for another calculation, e.g. for restarting an unconverged calculation.
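
The naming convention can be decoded programmatically. The sketch below assumes file names of the form `wavefunction-k001-NR.b000001`; this pattern is an assumption, so compare it with the listing above and adapt the regular expression if needed.

```python
import re

# Assumed pattern: wavefunction-k<kpoint>-<spin><part>.<format letter><orbital id>
PATTERN = re.compile(r"wavefunction-k(\d+)-([NUD])([RI])\.[a-z]?(\d+)$")


def parse_wavefunction_name(filename):
    """Split a wavefunction file name into its components, or return None."""
    match = PATTERN.search(filename)
    if match is None:
        return None
    kpoint, spin, part, orbital = match.groups()
    return {
        "kpoint": int(kpoint),
        "spin": spin,   # N: non spin-polarized (other letters are assumptions)
        "part": part,   # R: real part (I for an imaginary part is an assumption)
        "orbital": int(orbital),
    }


info = parse_wavefunction_name("wavefunction-k001-NR.b000001")
print(info)
```

A helper of this kind makes it easy to select, for example, only the files of a given orbital for visualisation.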

In the same spirit, another calculation can be done with different parameters. Imagine we want to perform a Hartree-Fock calculation.
We should simply change the XC functional used. However, we also have to specify the pseudopotential.

In [None]:
inp.set_xc('HF')
HF = study.run(name="HF",input=inp,posinp=N2.get_posinp(),run_dir='scratch') #Run the code with the name scheme HF

Without specifying the pseudopotential, an error will occur:

In [None]:
!tail scratch/log-HF.yaml

The same error is reported in [debug/bigdft-err-0.yaml](./scratch/debug/bigdft-err-0.yaml).

This is because the code assigns a default pseudopotential only for the LDA and PBE XC approximations. 
One way around this is to specify which internal PSP should be used, taken from the default database of BigDFT. This can be done as follows:

In [None]:
inp['psppar.N']={'Pseudopotential XC': 1} #here 1 stands for LDA as per the XC codes

There are also some routines built into PyBigDFT for setting Krack or NLCC pseudopotentials.

In [None]:
# inp.set_psp_krack(functional="LDA")
# inp.set_psp_nlcc()

When possible, care should be taken to choose a pseudopotential that was generated with the same XC approximation as the one used in the calculation. Unfortunately, at present HGH data are only available for semilocal functionals. The same exercise as the one that follows could also have been done with hybrid XC functionals, for example PBE0 (ixc=-406). In the case of Hartree-Fock calculations, using pseudopotentials generated with semilocal functionals generally yields accurate results (see [Physical Review B 37.5 (1988): 2674](https://journals.aps.org/prb/pdf/10.1103/PhysRevB.37.2674)). 

In BigDFT, XC functionals are specified using the built-in named functionals, or using the [LibXC codes](https://www.tddft.org/programs/libxc/functionals/).
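
For numerical codes, a negative `ixc` refers to LibXC identifiers. The sketch below assumes the ABINIT-style packing, in which the absolute value holds up to two three-digit LibXC IDs (for instance the exchange and correlation parts); this packing and the helper name `decode_ixc` are assumptions, so check them against the BigDFT documentation.

```python
def decode_ixc(ixc):
    """Return the LibXC IDs packed in a negative ixc code (assumed convention)."""
    if ixc >= 0:
        raise ValueError("only negative (LibXC) codes are handled in this sketch")
    first, second = divmod(-ixc, 1000)
    # A zero leading part means that a single functional ID is given
    return (second,) if first == 0 else (first, second)


print(decode_ixc(-406))     # PBE0, a single hybrid functional ID
print(decode_ixc(-101130))  # two IDs: exchange and correlation parts of PBE
```

The resulting IDs can then be looked up in the LibXC functional list linked above.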

Now we can run the Hartree-Fock calculation, which will take around 30 seconds.

In [None]:
HF = study.run(name="HF",input=inp,posinp=N2.get_posinp(),run_dir='scratch') #Run the code with the name scheme HF

Let's now also run with the PBE0 functional:

In [None]:
inp.set_xc('PBE0')
PBE0 = study.run(name="PBE0",input=inp,posinp=N2.get_posinp(),run_dir='scratch') #Run the code with the name scheme PBE0

The variables *LDA*, *HF*, and *PBE0* contain all the information about the corresponding calculations. Each is an instance of the `Logfile` class, which considerably simplifies the extraction of parameters from the associated output file (*log-LDA.yaml* for the LDA run). If we simply type:

In [None]:
print(LDA)

This displays some information about the LDA calculation. We can also extract the eigenvalues of the Hamiltonian (in Ha), from which the DOS (Density of States) is built:

In [None]:
LDA.evals[0].tolist()

Note that *LDA.log* is the Python dictionary associated with the full output file, which is in yaml format. If we know the relevant key, we can therefore extract arbitrary information; for example, we can print information about the Poisson solver:

In [None]:
LDA.log["Poisson Solver"]
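
When the exact keys are not known in advance, a defensive lookup avoids a `KeyError` on nested entries. The sketch below uses a hypothetical helper `lookup` on a stand-in dictionary; the sub-keys and values shown are placeholders, not actual BigDFT output.

```python
def lookup(log, *keys, default=None):
    """Follow a chain of keys through nested dictionaries, returning default if any is missing."""
    node = log
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


# Stand-in for LDA.log; the keys and values below are placeholders, not real output
fake_log = {"Poisson Solver": {"BC": "Free", "Box": [100, 100, 120]}}

print(lookup(fake_log, "Poisson Solver", "BC"))
print(lookup(fake_log, "Poisson Solver", "Missing key", default="not found"))
```

The same function can be applied to the real `LDA.log` dictionary to probe keys while exploring the log.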

The results can be compared to all-electron calculations, done with different basis sets, from the following references (units are eV):
(1) S.&nbsp;Hamel <i>et&nbsp;al.</i> J. Electron Spectroscopy and Related Phenomena 123 (2002) 345-363 and (2) P.&nbsp;Politzer, F.&nbsp;Abu-Awwad, Theor. Chem. Acc. (1998), 99, 83-87:

eigenvalues          | LDA(1) | HF(1) | HF(2) | (Exp.)
---------------------|--------|-------|-------|-------
3&sigma;<sub>g</sub> | 10.36  | 17.25 | 17.31 | (15.60)
1&pi;<sub>u</sub>    | 11.84  | 16.71 | 17.02 | (16.98)
2&sigma;<sub>u</sub> | 13.41  | 21.25 | 21.08 | (18.78)
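
Since the logfile reports eigenvalues in Ha while the table is in eV, a conversion is needed before comparing. The sketch below uses 1 Ha ≈ 27.2114 eV and flips the sign, on the assumption that the table lists the negated eigenvalues of occupied states; the input values are hypothetical placeholders to be replaced with values from a logfile.

```python
HARTREE_TO_EV = 27.211386  # 1 Ha in eV (rounded)


def to_binding_energies_ev(eigenvalues_ha):
    """Convert eigenvalues in Ha to positive binding energies in eV."""
    return [-value * HARTREE_TO_EV for value in eigenvalues_ha]


# Hypothetical occupied eigenvalues in Ha, to be replaced with values from a logfile
placeholder = [-0.50, -0.40]
for value in to_binding_energies_ev(placeholder):
    print(f"{value:.2f} eV")
```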

### Exercise

How do the results from the different BigDFT runs compare? What approximations are we making?

Example [solution](./solutions/02-N2-solution.py)

## The overall structure of files on disk

After the run is performed, different files are created on disk:
* [input_minimal.yaml](./N2_minimal) which contains the minimal set of input variables to perform the run;
* [log.yaml](./log-N2.yaml) which contains the log of the run with all calculated quantities;
* [time.yaml](./time-N2.yaml) and [forces_posinp.yaml](./forces_N2.yaml) which we will see later.

For its I/O, BigDFT uses the [yaml](http://www.yaml.org/spec) format.
If you look at the [N2_minimal.yaml](./scratch/N2_minimal.yaml) file, you can see:

In [None]:
! cat scratch/N2_minimal.yaml

This file could be used to rerun the calculations, e.g. on scarf.

In this case, only the atomic positions are indicated using a yaml format.

However, there is a one-to-one correspondence between a yaml file and a Python dictionary. For this reason
we will create the input parameters of these runs from dictionaries. See the useful [yaml online parser](http://yaml-online-parser.appspot.com/) page to understand the correspondence.
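
The dictionary/yaml correspondence can be seen with a few lines of code. The sketch below uses a deliberately tiny, hypothetical emitter `to_yaml` that only handles nested dictionaries of scalars (a real workflow would rely on a YAML library); the keys and values of the input dictionary are illustrative.

```python
def to_yaml(data, indent=0):
    """Render nested dictionaries of scalars as block-style YAML text."""
    lines = []
    pad = " " * indent
    for key, value in data.items():
        if isinstance(value, dict):
            # A nested dictionary becomes an indented block under its key
            lines.append(f"{pad}{key}:")
            lines.append(to_yaml(value, indent + 2))
        else:
            lines.append(f"{pad}{key}: {value}")
    return "\n".join(lines)


# Illustrative input dictionary (keys and values are assumptions)
inputs = {"dft": {"ixc": "LDA", "hgrids": 0.4}, "output": {"orbitals": "binary"}}
text = to_yaml(inputs)
print(text)
```

Each level of nesting in the dictionary maps to one level of indentation in the yaml text, which is the correspondence exploited throughout this tutorial.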

In the log file [log.yaml](./scratch/log-N2.yaml), BigDFT automatically displays all the input parameters used for the calculations. 

Parameters that were not explicitly given are set to their default values, except of course the atomic positions, which must be provided. As we did not specify any input file here, our run is a single-point LDA calculation without spin-polarisation.