# Saving SCF results on disk and SCF checkpoints

For longer DFT calculations it is common to run them on a cluster
in advance and to perform postprocessing (band structure calculation,
plotting of the density, etc.) at a later point, potentially on a different
machine.

To support such workflows DFTK offers the two functions `save_scfres`
and `load_scfres`, which save the data structure returned
by `self_consistent_field` to disk and load it back into memory,
respectively. For this purpose DFTK uses the
[JLD2.jl](https://github.com/JuliaIO/JLD2.jl) file format and Julia package.

> **Availability of `load_scfres`, `save_scfres` and checkpointing**
>
> As JLD2 is an optional dependency of DFTK, these features are only
> available once *both* DFTK and JLD2 have been imported (`using DFTK`
> and `using JLD2`).

To illustrate the use of these functions in practice we compute
the total energy of the O₂ molecule at the PBE level. To obtain the triplet
ground state we use collinear spin polarisation
(see Collinear spin and magnetic systems for details)
and a small smearing temperature to ease convergence:

In [1]:
using DFTK
using LinearAlgebra
using JLD2

d = 2.079  # oxygen-oxygen bondlength
a = 9.0    # size of the simulation box
lattice = a * I(3)
O = ElementPsp(:O, load_psp("hgh/pbe/O-q6.hgh"))
atoms     = [O, O]
positions = d / 2a * [[0, 0, 1], [0, 0, -1]]
magnetic_moments = [1., 1.]

Ecut  = 10  # Far too small to be converged
model = model_DFT(lattice, atoms, positions;
                  functionals=PBE(),
                  temperature=0.02, smearing=Smearing.Gaussian(),
                  magnetic_moments)
basis = PlaneWaveBasis(model; Ecut, kgrid=[1, 1, 1])

scfres = self_consistent_field(basis, tol=1e-2, ρ=guess_density(basis, magnetic_moments))
save_scfres("scfres.jld2", scfres);

n     Energy            log10(ΔE)   log10(Δρ)   Magnet   Diag   Δtime
---   ---------------   ---------   ---------   ------   ----   ------
  1   -27.64472902151                   -0.13    0.001    6.0    150ms
  2   -28.92293329081        0.11       -0.82    0.672    2.0    105ms
  3   -28.93095543028       -2.10       -1.14    1.171    2.0   98.5ms
  4   -28.93763235131       -2.18       -1.18    1.765    2.0   81.2ms
  5   -28.93954492025       -2.72       -1.51    1.997    2.0   89.1ms
  6   -28.93959878008       -4.27       -2.00    1.978    1.0   79.1ms
  7   -28.93961180431       -4.89       -2.79    1.986    1.0   93.4ms


In [2]:
scfres.energies

Energy breakdown (in Ha):
    Kinetic             16.7715337
    AtomicLocal         -58.4947213
    AtomicNonlocal      4.7096877 
    Ewald               -4.8994689
    PspCorrection       0.0044178 
    Hartree             19.3610006
    Xc                  -6.3912242
    Entropy             -0.0008372

    total               -28.939611804315

The `scfres.jld2` file could now be transferred to a different computer,
where one could fire up a REPL to inspect the results of the above
calculation:

In [3]:
using DFTK
using JLD2
loaded = load_scfres("scfres.jld2")
propertynames(loaded)

(:α, :history_Δρ, :converged, :occupation, :occupation_threshold, :algorithm, :basis, :runtime_ns, :n_iter, :n_matvec, :history_Etot, :εF, :energies, :ρ, :timedout, :n_bands_converge, :eigenvalues, :ψ, :ham)

In [4]:
loaded.energies

Energy breakdown (in Ha):
    Kinetic             16.7715337
    AtomicLocal         -58.4947213
    AtomicNonlocal      4.7096877 
    Ewald               -4.8994689
    PspCorrection       0.0044178 
    Hartree             19.3610006
    Xc                  -6.3912242
    Entropy             -0.0008372

    total               -28.939611804315

Since the loaded data is exactly the same as the `scfres` returned by the
SCF calculation, one could use it to plot a band structure directly from the stored data,
e.g. `plot_bandstructure(load_scfres("scfres.jld2"))`.
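
Before postprocessing on another machine it can be useful to check that the file holds what one expects. The following is a minimal sketch, assuming `scfres.jld2` from the calculation above is in the working directory. The fields accessed are among those listed by `propertynames(loaded)` above.

```julia
using DFTK
using JLD2

loaded = load_scfres("scfres.jld2")

# Quick sanity checks on the stored SCF result
println("Converged:       ", loaded.converged)
println("SCF iterations:  ", loaded.n_iter)
println("Fermi level:     ", loaded.εF, " Ha")
println("Total energy:    ", loaded.energies.total, " Ha")
```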

Notice that both `load_scfres` and `save_scfres` work by transferring all data
to/from the master process, which performs the IO operations without parallelisation.
Since this can become slow, both functions support optional arguments to speed up
the processing. An overview:
- `save_scfres("scfres.jld2", scfres; save_ψ=false)` avoids saving
  the Bloch waves, which is usually faster and saves storage space.
- `load_scfres("scfres.jld2", basis)` avoids reconstructing the basis from the file
  and uses the passed basis instead. This saves the time of constructing the basis
  twice and allows specifying parallelisation options (via the passed basis). Usually
  this is useful for continuing a calculation on a supercomputer or cluster.
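
The two options can be combined, as in the sketch below. It assumes that `scfres` and `basis` from the cells above are still in scope. The file name `scfres_nopsi.jld2` is chosen purely for illustration.

```julia
using DFTK
using JLD2

# Save without the Bloch waves (file name chosen for illustration only)
save_scfres("scfres_nopsi.jld2", scfres; save_ψ=false)

# Compare the on-disk sizes of the full and the reduced file
println("with ψ:    ", filesize("scfres.jld2"), " bytes")
println("without ψ: ", filesize("scfres_nopsi.jld2"), " bytes")

# Reuse the basis already in memory instead of rebuilding it from the file
loaded_nopsi = load_scfres("scfres_nopsi.jld2", basis)
println("Fermi level: ", loaded_nopsi.εF, " Ha")

rm("scfres_nopsi.jld2")  # clean up the illustrative file
```

Since the Bloch waves are not stored in the reduced file, any postprocessing that needs them has to recompute them first.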

See also the discussion of JLD2 files in Input and output formats.

## Checkpointing of SCF calculations
A related feature, which is especially useful for longer calculations with DFTK,
is automatic checkpointing, where the state of the SCF is periodically written to disk.
The advantage is that if the calculation errors or is aborted because it
overruns the walltime limit, one does not need to start from scratch
but can continue the calculation from the last checkpoint.

The easiest way to enable checkpointing is to use the `kwargs_scf_checkpoints`
function, which does two things: (1) it sets up checkpointing using the
`ScfSaveCheckpoints` callback and (2) if a checkpoint file is detected,
the stored density is used to continue the calculation instead of the usual
atomic-orbital based guess. In practice this is done by modifying the keyword arguments
passed to `self_consistent_field` appropriately, e.g. by using the density
or orbitals from the checkpoint file. For example:

In [5]:
checkpointargs = kwargs_scf_checkpoints(basis; ρ=guess_density(basis, magnetic_moments))
scfres = self_consistent_field(basis; tol=1e-2, checkpointargs...);

n     Energy            log10(ΔE)   log10(Δρ)   Magnet   α      Diag   Δtime
---   ---------------   ---------   ---------   ------   ----   ----   ------
  1   -27.64322612132                   -0.13    0.001   0.80    6.5    223ms
  2   -28.92298364772        0.11       -0.82    0.675   0.80    2.0    863ms
  3   -28.93101523705       -2.10       -1.14    1.176   0.80    2.5    124ms
  4   -28.93768029200       -2.18       -1.19    1.769   0.80    2.0   88.5ms
  5   -28.93946466596       -2.75       -1.29    1.999   0.80    1.5   92.1ms
  6   -28.93959412470       -3.89       -1.94    1.977   0.80    1.0   86.1ms
  7   -28.93961177868       -4.75       -3.16    1.985   0.80    1.0   85.9ms


Notice that the `ρ` argument is now passed to `kwargs_scf_checkpoints` instead.
If we run the SCF again in the same folder (here using a tighter tolerance),
the calculation just continues.

In [6]:
checkpointargs = kwargs_scf_checkpoints(basis; ρ=guess_density(basis, magnetic_moments))
scfres = self_consistent_field(basis; tol=1e-3, checkpointargs...);

n     Energy            log10(ΔE)   log10(Δρ)   Magnet   α      Diag   Δtime
---   ---------------   ---------   ---------   ------   ----   ----   ------
  1   -28.93961281556                   -3.34    1.985   0.80   12.0    245ms


Since only the density is stored in a checkpoint
(and not the Bloch waves), the first step needs a slightly elevated number
of diagonalisation iterations. Notice that reconstructing the `checkpointargs` in this second
call is important, as the `checkpointargs` now contain different data,
such that the SCF continues from the checkpoint.
By default the checkpoint is saved in the file `dftk_scf_checkpoint.jld2`, which can be changed
using the `filename` keyword argument of `kwargs_scf_checkpoints`. Note that the
file is not deleted by DFTK, so it is your responsibility to clean it up. Further note
that warnings or errors will arise if you try to use a checkpoint that is incompatible
with your calculation.
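
As a sketch of the `filename` keyword, the following stores the checkpoint under a custom name and removes it afterwards. It assumes `basis` and `magnetic_moments` from the cells above are still in scope. The name `o2_checkpoint.jld2` is illustrative and not a DFTK default.

```julia
using DFTK
using JLD2

checkpointfile = "o2_checkpoint.jld2"  # illustrative name, not a DFTK default
checkpointargs = kwargs_scf_checkpoints(basis; filename=checkpointfile,
                                        ρ=guess_density(basis, magnetic_moments))
scfres_custom = self_consistent_field(basis; tol=1e-2, checkpointargs...);

# DFTK leaves the checkpoint in place, so remove it once it is no longer needed
isfile(checkpointfile) && rm(checkpointfile)
```

Using a distinct file name per calculation helps to avoid accidentally resuming from a checkpoint that belongs to a different system.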

We can also inspect the checkpoint file ourselves using the `load_scfres` function
and use its contents to continue the calculation by hand:

In [7]:
oldstate = load_scfres("dftk_scf_checkpoint.jld2")
scfres   = self_consistent_field(oldstate.basis, ρ=oldstate.ρ, ψ=oldstate.ψ, tol=1e-4);

n     Energy            log10(ΔE)   log10(Δρ)   Magnet   Diag   Δtime
---   ---------------   ---------   ---------   ------   ----   ------
  1   -28.93872217176                   -2.33    1.986    5.0    134ms
  2   -28.93948954523       -3.11       -2.73    1.985    1.0   86.5ms
  3   -28.93961175159       -3.91       -2.63    1.985    4.0    119ms
  4   -28.93961242720       -6.17       -2.76    1.985    1.0   73.2ms
  5   -28.93961285253       -6.37       -2.92    1.985    1.0   73.2ms
  6   -28.93961293015       -7.11       -2.99    1.985    1.0   73.4ms
  7   -28.93961311080       -6.74       -3.31    1.985    1.0   74.4ms
  8   -28.93961315397       -7.36       -3.64    1.985    1.5   96.6ms
  9   -28.93961316724       -7.88       -3.80    1.985    2.0   83.9ms
 10   -28.93961317214       -8.31       -4.84    1.985    1.5   80.2ms


Some details on what happens under the hood in this mechanism: When using the
`kwargs_scf_checkpoints` function, the `ScfSaveCheckpoints` callback is employed
during the SCF, which causes the density to be written to the JLD2 file in every iteration.
When a checkpoint file is found, `kwargs_scf_checkpoints` transparently discards the `ψ`
and `ρ` keyword arguments it was given and replaces them by the data obtained from the file.
For more details on using callbacks with DFTK's `self_consistent_field` function
see Monitoring self-consistent field calculations.
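
To see which keyword arguments are affected, one can inspect the object returned by `kwargs_scf_checkpoints` before splatting it into `self_consistent_field`. This is a small sketch, assuming `basis` and `magnetic_moments` from the cells above are still in scope. The exact set of keys may differ between DFTK versions, so none is listed here.

```julia
using DFTK
using JLD2

checkpointargs = kwargs_scf_checkpoints(basis; ρ=guess_density(basis, magnetic_moments))

# Names of the keyword arguments that will be splatted into self_consistent_field
println(keys(checkpointargs))

# Whether a checkpoint at the default location was available to resume from
println("Checkpoint present: ", isfile("dftk_scf_checkpoint.jld2"))
```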

(Clean up the files generated by this notebook)

In [8]:
rm("dftk_scf_checkpoint.jld2")
rm("scfres.jld2")