
Support for NC pseudos in UPF format #741

Merged: 78 commits, Oct 21, 2022

Conversation

@azadoks (Contributor) commented Oct 3, 2022

Based on @unkcpz's PR.

To do:

  • Non-linear core correction
  • Implement initial guesses
    • Charge density from atomic charge density (PP_RHO)
    • Wavefunction from pseudo-atomic wave functions (PP_CHI)
  • Improve documentation
    • Example script
    • Docstrings
  • Improve performance
    • Parallelize over G-vectors / k-points?
    • Calculate only for unique values of |G|?
    • Precompute common integrand terms
    • Create interpolators at parse time
  • Clean up unit handling
  • Settle on an integrator / integrator interface (Not needed)
  • Settle on local potential correction / correction interface (Only Coulomb)
  • Add / remove included UPF pseudopotentials (Kept a few for testing)
  • Register PseudoPotentialIO
  • Improve testing
    • Figure out why quadgk references used for HGH don't agree well with UPFs (Needed correction term)
    • Refine HGH comparison tolerances

@antoine-levitt (Member):

If the rab are already the integration weights, why do you do a quadrature (ctrap) rather than just a dot product?

@azadoks (Contributor, author) commented Oct 6, 2022

I wasn't sure if they account for the half-weight at zero. I did a bit of testing comparing the dot product against ctrap with the half-weight, and they seem to be in similar agreement with HGH.

I've changed to using a dot product; it's probably a good step for better performance, assuming it's the right thing to do.
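
To illustrate the equivalence being tested here, a minimal sketch (with hypothetical r, rab, and f; not code from the PR): if the rab weights already encode the trapezoidal rule, including the half-weights at the endpoints, the quadrature collapses to a dot product.

using LinearAlgebra

# Trapezoidal quadrature written out explicitly
trapezoid(r, f) = sum((f[i] + f[i+1]) / 2 * (r[i+1] - r[i]) for i in 1:length(r)-1)

r = range(0, 10; length=1001)  # hypothetical radial grid
f = exp.(-r) .* r.^2           # hypothetical integrand samples

# Trapezoid weights on a uniform grid: full step inside, half-steps at the ends
rab = fill(step(r), length(r))
rab[1] /= 2
rab[end] /= 2

@assert trapezoid(r, f) ≈ dot(rab, f)  # the dot product is the same quadrature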

@azadoks (Contributor, author) commented Oct 6, 2022

It looks like ~78% of the time spent initializing the PlaneWaveBasis (small Ecut, few k-points, UPF pseudos) is taken up by _besselj, which calls into the Amos library.

[profiling screenshot]

Allocations could still be improved:

function init_si(pseudo_type, n)
    a = 10.26  # Silicon lattice constant in Bohr
    lattice = a / 2 * [[0 1 1.];
                       [1 0 1.];
                       [1 1 0.]]
    Si = ElementPsp(:Si, psp=load_psp("$pseudo_type/pbe/si-q4.$pseudo_type"))
    atoms     = [Si, Si]
    positions = [ones(3)/8, -ones(3)/8]

    model = model_PBE(lattice, atoms, positions)
    for _ = 1:n-1
        PlaneWaveBasis(model; Ecut=15, kgrid=[4, 4, 4])
    end
    PlaneWaveBasis(model; Ecut=15, kgrid=[4, 4, 4])
end

basis = @btime init_si("upf", 1);     # benchmark
  6.753 s (26559047 allocations: 865.72 MiB)

Initializing the PlaneWaveBasis for, e.g., Ecut=60, kgrid=[12, 12, 12] takes quite a few minutes. Maybe we need to implement an approximate but optimized sphericalbesselj, like what is done in QE?
I'm waiting on the profiling + benchmarking for this setup, but I'm guessing that reducing allocations will also be a top priority to get this down to a reasonable time.

@antoine-levitt (Member):

If the profiling clearly shows the besselj function, working on allocations isn't going to do anything.
Note https://discourse.julialang.org/t/ann-bessels-jl-bessel-hankel-and-airy-functions/87941, but that's only a factor of 2. Note also that those packages target general (non-integer) order, and the integer-order Bessel functions are considerably simpler to compute: https://en.wikipedia.org/wiki/Bessel_function#Spherical_Bessel_functions:_jn,_yn . So at worst we could just implement them by hand for low orders (only up to l = 3 is needed, right?)
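
For reference, the closed forms from that Wikipedia page, hand-coded for the low orders needed here (valid for x != 0; the limits at zero are j_0(0) = 1 and j_l(0) = 0 for l > 0):

# Closed-form spherical Bessel functions j_l(x) for l = 0..3 and x != 0
sphericalbesselj0(x) = sin(x) / x
sphericalbesselj1(x) = sin(x) / x^2 - cos(x) / x
sphericalbesselj2(x) = (3 / x^2 - 1) * sin(x) / x - 3cos(x) / x^2
sphericalbesselj3(x) = (15 / x^3 - 6 / x) * sin(x) / x - (15 / x^2 - 1) * cos(x) / x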

@antoine-levitt (Member):

With judicious use of @inline and @simd, it should be possible to get the code to vectorize. If that still does not result in acceptable performance, we bring out the big guns (LoopVectorization)
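
As a sketch of what the @inline + @simd route could look like (hand-coded j_2 as above; an illustration, not the PR's code):

# Hand-coded j_2, inlined so the loop body can vectorize
@inline sphericalbesselj2(x) = (3 / x^2 - 1) * sin(x) / x - 3cos(x) / x^2

# SIMD-friendly accumulation of the radial integral
function compute_simd(q, rgrid, drgrid, rprojs)
    s = 0.0
    @inbounds @simd for ir in eachindex(rgrid)
        s += drgrid[ir] * sphericalbesselj2(q * rgrid[ir]) * rprojs[ir]
    end
    s
end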

@antoine-levitt (Member) commented Oct 6, 2022

Just did some quick tests integrating an array of size 100000 with l = 2:

  • Naive version (what you have, essentially): 15 ms
  • Hand-coded spherical Bessel of order 2: 1.5 ms
  • Hand-coded + @inline + @simd: 1 ms
  • Hand-coded + @turbo: 400 μs
  • Hand-coded + not building x but just summing up the result: 300 μs

So no need for complicated optimizations, it looks like :-) Here's the final version:

using LoopVectorization
using BenchmarkTools

# Hand-coded spherical Bessel function of order 2 (valid for x != 0)
@inline function sphericalbesselj2(x)
    (3 / x^2 - 1) * sin(x) / x - 3cos(x) / x^2
end

# Accumulate the radial integral directly instead of building an
# intermediate array (the "not building x" variant from the timings above)
function compute(q, rgrid, drgrid, rprojs)
    N = length(rgrid)
    s = 0.0
    @turbo for ir = 1:N
        s += drgrid[ir] * sphericalbesselj2(q * rgrid[ir]) * rprojs[ir]
    end
    s
end

N = 100000
rgrid  = rand(N)
drgrid = rand(N)
rprojs = rand(N)
q = 2.3
@btime compute($q, $rgrid, $drgrid, $rprojs)

@antoine-levitt (Member):

Sorry, there was a mistake in the previous post (if you read this via email notifications), updated now.

@azadoks (Contributor, author) commented Oct 6, 2022

Ok sweet, I did write up the hand-coded spherical Bessels, which was already much faster. I thought that LoopVectorization wasn't a dependency of DFTK, so I didn't give it a go.

I'll work a bit on implementing it like this with @turbo and @inline and check back in, thanks!

@azadoks (Contributor, author) commented Oct 6, 2022

# For Silicon, 2-atom cell, si-q4.upf
Ecut = 80
kgrid = [12, 12, 12]
julia> basis = @benchmark PlaneWaveBasis(model; Ecut=Ecut, kgrid=kgrid)  # benchmark
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 32.746 s (0.65% GC) to evaluate,
 with a memory estimate of 1.41 GiB, over 3165129 allocations.

Much better!

[profiling screenshot]

@antoine-levitt (Member):

Still looks expensive, but combined with q interpolation it should be OK. You can avoid the code duplication by using Val types and function barriers: make the order a Val type, and convert from Int to Val in a function separate from the loop, so that the loop is compiled with static type information for the order and propagates it to the inlined Bessel function. There are examples of such tricks in the "performance tips" section of the Julia manual.

@azadoks (Contributor, author) commented Oct 6, 2022

I think this is as you suggest(?), but it seems to break @turbo.

@inline function sphericalbesselj(order::Val{0}, x)
    sin(x) / x
end
@inline function sphericalbesselj(order::Val{1}, x)
    (sin(x) / x - cos(x)) / x
end
...
function psp_projector_fourier_inner_loop(order::Val, psp::PspUpf, i, l, q::T)::T where {T <: Real}
    N = length(psp.rprojs[l+1][i])
    s = zero(T)
    @turbo for ir = 1:N
        s += sphericalbesselj(order, q * psp.rgrid[ir]) * psp.rprojs[l+1][i][ir] * psp.drgrid[ir]
    end
    s
end
function eval_psp_projector_fourier(psp::PspUpf, i, l, q::T)::T where {T <: Real}
    if iszero(q)
        s = l == 0 ? one(T) : zero(T)
    else
        s = psp_projector_fourier_inner_loop(Val(l), psp, i, l, q)
    end
    return s
end
julia> PlaneWaveBasis(model; Ecut=Ecut, kgrid=kgrid);                    # warm up
┌ Warning: #= /data/azadoks/source/git/DFTK.jl/src/pseudo/PspUpf.jl:172 =#:
│ `LoopVectorization.check_args` on your inputs failed; running fallback `@inbounds @fastmath` loop instead.
│ Use `warn_check_args=false`, e.g. `@turbo warn_check_args=false ...`, to disable this warning.

@antoine-levitt (Member):

Hm, curious (and annoying). That looks like a limitation of LoopVectorization, and I'm not sure how to work around it... That said, if hand-coded + SIMD already gives reasonable enough performance, we might just as well not use LV...

@azadoks (Contributor, author) commented Oct 7, 2022

It seems like @simd also doesn't work with the Val{l} dispatch.
With @simd:

# For Silicon, 2-atom cell, si-q4.upf
Ecut = 80
kgrid = [12, 12, 12]
julia> basis = @time PlaneWaveBasis(model; Ecut=Ecut, kgrid=kgrid);      # benchmark
 70.301417 seconds (29.15 M allocations: 2.576 GiB, 0.61% gc time, 0.08% compilation time)

With @inbounds + @fastmath:

# For Silicon, 2-atom cell, si-q4.upf
Ecut = 80
kgrid = [12, 12, 12]
julia> basis = @time PlaneWaveBasis(model; Ecut=Ecut, kgrid=kgrid);      # benchmark
 69.722348 seconds (29.08 M allocations: 2.573 GiB, 0.43% gc time)

A glance through the LoopVectorization documentation makes me think that @simd and @turbo might give us problems down the line with dual numbers anyway. It would be a bit messy and repetitive, but I'd propose having both (see the sketch below):

  • The verbose @turbo version for Float64 + Float32, which can be turbocharged easily without messing with array-of-structs -> array-of-numbers transforms
  • The safe and succinct Val{l}-dispatched @inbounds + @fastmath version for everything else
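
A minimal sketch of how that two-version split could be wired up via dispatch (all function names hypothetical; the @turbo kernel inlines the Bessel function by hand per l, since Val dispatch breaks @turbo as seen above):

using LoopVectorization

# Verbose @turbo kernel for plain floats, one per angular momentum (l = 0 shown)
function inner_loop_l0_turbo(r, dr, rproj, q::T) where {T <: Union{Float32, Float64}}
    s = zero(T)
    @turbo for ir in eachindex(r)
        x = q * r[ir]
        s += sin(x) / x * rproj[ir] * dr[ir]  # j_0 written out by hand
    end
    s
end

# Safe Val-dispatched kernel for any real number type (e.g. dual numbers)
@inline sphericalbesselj(::Val{0}, x) = sin(x) / x

function inner_loop(order::Val, r, dr, rproj, q::T) where {T <: Real}
    s = zero(T)
    @inbounds @fastmath for ir in eachindex(r)
        s += sphericalbesselj(order, q * r[ir]) * rproj[ir] * dr[ir]
    end
    s
end

# Plain floats get routed to the turbocharged kernel
inner_loop(::Val{0}, r, dr, rproj, q::Union{Float32, Float64}) =
    inner_loop_l0_turbo(r, dr, rproj, q)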

@antoine-levitt (Member):

DFTK does pride itself on simplicity, so I'd say let's forget about LV and just use @simd and Val. The timings you report are reasonable, I'd say. In any event, if we want more performance, we just do interpolation in q space (which, even with a very fine mesh, should give an extremely significant boost, since we are basically sampling all of 3D q space here, and these should be very smooth functions of q).
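
A minimal sketch of the q-space interpolation idea, hand-rolled to stay dependency-free (all names hypothetical; mesh spacing and upper bound would need testing):

# Tabulate the smooth form factor F(|q|) once on a fine mesh, then answer
# every |G + k| query by cheap linear interpolation.
struct QInterpolator{T <: Real}
    dq::T
    table::Vector{T}
end

function QInterpolator(eval_exact, qmax; dq=1e-2)
    table = [eval_exact(q) for q in 0:dq:qmax]
    QInterpolator{eltype(table)}(eltype(table)(dq), table)
end

function (itp::QInterpolator)(q)
    i = floor(Int, q / itp.dq)  # left grid point (0-based)
    i >= length(itp.table) - 1 && return itp.table[end]
    t = q / itp.dq - i          # fractional position within the cell
    (1 - t) * itp.table[i + 1] + t * itp.table[i + 2]
end

# Hypothetical usage: tabulate up to the largest |q| needed on the grid
# itp = QInterpolator(q -> eval_psp_projector_fourier(psp, i, l, q), 2sqrt(2Ecut))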

@antoine-levitt (Member):

Also, LV is kind of in flux and due for a rewrite from what I understood, so hopefully in a year or so we can just slap @turbo on the Val version and it'll just work.

@mfherbst (Member) left a review:

Some notes.

@mfherbst (Member) commented Oct 8, 2022

Another thing that could be worth trying is to parallelise over the evaluation of the projectors, or over the evaluation of Fourier components of the local potential. This will certainly help for larger calculations on clusters.
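
A minimal sketch of such a parallelisation over q values using Julia threads (not the PR's code; eval_psp_projector_fourier as in the snippets above):

# Evaluate the Fourier-space projector for many q values in parallel
function eval_projector_batch(psp, i, l, qs::AbstractVector)
    out = similar(qs)
    Threads.@threads for iq in eachindex(qs)
        out[iq] = eval_psp_projector_fourier(psp, i, l, qs[iq])
    end
    out
end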

@mfherbst (Member) commented Oct 8, 2022

I agree with Antoine regarding the simplicity/speed tradeoff. I think what we have now is OK in terms of speed, and I would avoid making the code more complicated if possible.

@mfherbst (Member) commented Oct 8, 2022

> Add / remove included UPF pseudopotentials

On that point: I think we should certainly not try to have a comprehensive UPF pseudo library in our code base. I see two options:

  • Introduce a data artefact containing all sorts of pseudo libraries (which could then be referenced by whoever needs it: UPF.jl, DFTK, other people ...).
  • Have only a very stripped-down version in DFTK (implying potential false promises about their quality, what works, etc.) and beyond that have people manage their pseudos themselves (this can already be done now: load_psp takes an arbitrary path to load your custom pseudos).

I lean towards the first option, but I don't have an overview of how many pseudo libraries are actually used, i.e. whether there is a combinatorial explosion sitting behind this solution that one had better not buy into.

@mfherbst (Member) commented Oct 8, 2022

Also: I think there are still some stability issues. Just out of curiosity I tried a UPF for a pretty big atom (Hafnium) and got greeted with NaNs. I have not investigated further as I think we should slowly make our way up the complexity ladder, but just to warn everyone :).

@antoine-levitt (Member):

Agreed, we should not have these in the repo. GTH are fine because they are super small, but beyond that it's fine to require people to get the UPFs themselves (and having a repo with many of them is a separate project).

@azadoks (Contributor, author) commented Oct 8, 2022

> Just out of curiosity I tried a UPF for a pretty big atom (Hafnium) and got greeted with NaNs. I have not investigated further as I think we should slowly make our way up the complexity ladder, but just to warn everyone :).

I suspect this is because it's on a linear grid (is it from ONCVPSP?), and linear grids usually contain r = 0. The sphericalbesselj functions divide by q * r, and we only check for iszero(q), not iszero(psp.rgrid[ir]).
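
A minimal sketch of the missing guard, using the analytic limits j_0(0) = 1 and j_l(0) = 0 for l > 0 (signatures follow the Val-dispatch snippet above):

@inline function sphericalbesselj(order::Val{0}, x::T) where {T <: Real}
    iszero(x) && return one(T)   # j_0(0) = 1
    sin(x) / x
end
@inline function sphericalbesselj(order::Val{1}, x::T) where {T <: Real}
    iszero(x) && return zero(T)  # j_l(0) = 0 for l > 0
    (sin(x) / x - cos(x)) / x
end

The extra branch would again get in the way of @turbo/@simd; an alternative is to drop the r = 0 grid point from the quadrature before the loop, since the r-weighted projector vanishes there anyway.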

@azadoks (Contributor, author) commented Oct 8, 2022

> Agreed, we should not have these in the repo. GTH are fine because they are super small, but beyond that it's fine to require people to get the UPFs themselves (and having a repo with many of them is a separate project).

Sounds good. It could be nice eventually to have something à la aiida-pseudo which packages the pseudos, recommended cutoffs, and other metadata from projects like the SSSP and PseudoDojo along with the UPFs / PSPs. We can kick this down the road for now.

Shall we leave in a few pseudos for testing purposes? Maybe si-q4.upf for testing consistency with HGH, and the PseudoDojo silicon pseudo for testing ONCVPSP (i.e. linear-grid) pseudos?

There might be some other pseudo generators with their own quirks for which we could keep an example for testing; I should probably look into this...

@azadoks (Contributor, author) commented Oct 11, 2022

I made a bit of a mess yesterday trying to rebase, but I've fixed up a few things:

  • Fixed (I hope) the NaN problem
  • Moved the sphericalbesselj functions to common
  • Refactored eval_psp_projector_fourier
  • Fixed the real-Fourier consistency tests
    • Instantiate real-space interpolators at parse time so that tests take a reasonable amount of time
  • Trimmed down the number of UPF files to a few used for testing
  • Renamed the UPF.jl package to PseudoPotentialIO.jl

What do you guys think are the remaining key points to address to wrap this all up?

@mfherbst (Member) commented Oct 11, 2022

Re PseudoPotentialIO.jl: why so complicated? Would not PseudoPotential be sufficient? I mean, long-term I guess we should move the PspUpf and PspHgh business from DFTK there as well, don't you think?

> Trimmed down the number of UPF files to a few used for testing

I'm not sure we should have them in data as that could suggest too much. Perhaps just have them in the test subfolder somewhere?

> What do you guys think are the remaining key points to address to wrap this all up?

From my point of view there are two blocking things:

  • The unit issue
  • In case not yet done: Some careful test against QE or ABINIT in an example (i.e. show that you can reproduce the numbers over a reasonable range of Ecuts). Maybe with something non-HGH?

A nice example could also be good (I guess you can just download the UPF file from e.g. https://pseudopotentials.quantum-espresso.org/legacy_tables/hartwigesen-goedecker-hutter-pp/si in the test and do the SCF once with the UPF and once with the analytical HGH?)

@azadoks (Contributor, author) commented Oct 11, 2022

> Re PseudoPotentialIO.jl: why so complicated? Would not PseudoPotential be sufficient? I mean, long-term I guess we should move the PspUpf and PspHgh business from DFTK there as well, don't you think?

I agree, but PseudoPotentials.jl was already registered. I guess I could have registered PseudoPotential.jl (maybe a bit too close?); I can always change it.

> I'm not sure we should have them in data as that could suggest too much. Perhaps just have them in the test subfolder somewhere?

True, I'll move them into test/

> In case not yet done: Some careful test against QE or ABINIT in an example (i.e. show that you can reproduce the numbers over a reasonable range of Ecuts). Maybe with something non-HGH?

Ah yeah, I think this will definitely take some careful thought to try to get all the inputs lined up as well as we can. I'll give it a first shot and share the input files.

@mfherbst (Member):

> Ah yeah, I think this will definitely take some careful thought to try to get all the inputs lined up as well as we can. I'll give it a first shot and share the input files.

Oh yes, this is tricky. Don't try to be too fancy at first (i.e. use a simple functional and setup; first one projector, then many, etc.). But it's worth doing. I found quite a few smaller errors along the way when I did this with ABINIT back in the day.

@azadoks (Contributor, author) commented Oct 11, 2022

> The unit issue

Yeah, this is quite messy.

  • rgrid -- Always in Bohr
  • vloc -- Always in Ry
  • beta -- Either in Bohr^{-1/2} (Vanderbilt USPP) or in Ry Bohr^{-1/2} ("some converters")
  • Dion / Dij / h -- Either in Ry (Vanderbilt USPP) or in Ry^{-1} ("some converters")

> A note on units: the beta functions and D matrix are defined as in Vanderbilt's US PP code and should have Bohr^{-1/2} and Ry units, respectively. Since they enter the calculation only as (beta · D · beta), one might as well have "dion" in Ry^{-1} units and "beta" in Ry Bohr^{-1/2} units, in line with what is suggested by the Kleinman-Bylander transformation. Some converters actually do exactly that.

Because the units of beta and D are undefined, I'm not sure how to deal with this generally. For the pseudos I've tested, it seems like the Kleinman-Bylander units are being used.

For the local potential, I don't understand why, but if I convert before the FT, V(q) has strong oscillations. I also still don't completely understand why the valence number Z needs to be multiplied by two. I'll spend some time staring at it again, but if anyone can see why all this is happening more easily, let me know!
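
A minimal sketch of the Ry -> Hartree normalisation at parse time (all variable names hypothetical). One possible source of the factor of two on Z: in Rydberg units e^2 = 2, so the Coulomb tail of the local potential is -2*Zval/r in Ry but -Zval/r in Hartree, which DFTK uses.

# Hypothetical parse-time unit normalisation. UPF stores the local potential
# in Ry and the radial grid in Bohr; DFTK works in Hartree, and 1 Ry = 0.5 Ha.
Zval  = 4.0                              # e.g. 4 valence electrons (Si)
rgrid = range(0.01, 20.0; length=2000)   # Bohr
vloc_ry = -2Zval ./ rgrid                # pure-Coulomb stand-in for PP_LOCAL, in Ry
vloc_ha = vloc_ry ./ 2                   # Ry -> Hartree

# Far from the nucleus the Hartree tail must be -Zval/r; in Ry units it is
# -2*Zval/r, i.e. "Z multiplied by two".
@assert vloc_ha[end] ≈ -Zval / rgrid[end]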

@azadoks (Contributor, author) commented Oct 21, 2022

I've just committed a (non-working) draft of the example: the idea is to run an Ecut convergence study on an HGH and a PseudoDojo UPF pseudo (to show the difference in how hard they are).

Then, use the converged Ecut to plot a band structure (hopefully showing that the results are quite similar).

I chose magnesium because it's one of the few heavier elements (i.e. not H, He, Li) in PseudoDojo that doesn't contain NLCC (which we don't yet support). I can't seem to get the mg-q10.hgh pseudo to converge (up to 76 Ha; I haven't tried higher).
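
A minimal sketch of the convergence study described here, written against DFTK's public API (silicon stands in for magnesium, and the UPF path is a placeholder):

using DFTK

# Total energy of bulk silicon at a given Ecut, for an arbitrary pseudo
function silicon_energy(psp; Ecut)
    a = 10.26
    lattice = a / 2 * [[0 1 1.]; [1 0 1.]; [1 1 0.]]
    Si = ElementPsp(:Si, psp=psp)
    model = model_PBE(lattice, [Si, Si], [ones(3)/8, -ones(3)/8])
    basis = PlaneWaveBasis(model; Ecut, kgrid=[4, 4, 4])
    self_consistent_field(basis; tol=1e-8).energies.total
end

psp_upf = load_psp("./Si.upf")  # placeholder path to a PseudoDojo UPF

Ecuts  = 10:5:50
E_ref  = silicon_energy(psp_upf; Ecut=60)  # crude reference at a high cutoff
errors = [abs(silicon_energy(psp_upf; Ecut) - E_ref) for Ecut in Ecuts]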

@mfherbst (Member):

Hmm. I think this is too large an Ecut for the docs to handle in CI. E.g. for https://docs.dftk.org/stable/examples/convergence_study/ we only run rather moderate Ecuts on the CI and show static images for the more reasonable sizes.

Regarding the large Ecuts: I'm not super surprised, as HGHs are indeed quite hard. I also know other cases where we had to go beyond 100 Ha to converge to 4-5 digits.

I'm not sure it's necessary to do a full convergence + bandstructure example here. I would literally just show how to obtain the pseudos (for now, just download them from the git repo, I guess) and then run an SCF at a moderate Ecut. Surely the results will not be 100% the same, but that's fine.

@azadoks (Contributor, author) commented Oct 21, 2022

> I'm not sure it's necessary to do a full convergence + bandstructure example here. I would literally just show how to obtain the pseudos (for now, just download them from the git repo, I guess) and then run an SCF at a moderate Ecut. Surely the results will not be 100% the same, but that's fine.

I guess this is probably more reasonable; I still think it's interesting to show the convergence, though, so I could do that with static images.

Running the SCF won't really show anything on its own, because the total energies will be quite different. Something like a band structure or an equation of state should show that the pseudos give results that agree, at least somewhat.
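
For the equation-of-state idea, a self-contained sketch (again with silicon as a stand-in): scan the lattice constant with each pseudo and compare the shapes and minima of the E(a) curves rather than absolute energies.

using DFTK

# Energy vs. lattice constant for one pseudo
function silicon_eos(psp; Ecut=40, scales=0.96:0.02:1.04)
    map(scales) do s
        a = 10.26 * s
        lattice = a / 2 * [[0 1 1.]; [1 0 1.]; [1 1 0.]]
        Si = ElementPsp(:Si, psp=psp)
        model = model_PBE(lattice, [Si, Si], [ones(3)/8, -ones(3)/8])
        basis = PlaneWaveBasis(model; Ecut, kgrid=[4, 4, 4])
        self_consistent_field(basis; tol=1e-8).energies.total
    end
end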

@mfherbst (Member):

> Running the SCF won't really show anything on its own, because the total energies will be quite different. Something like a band structure or an equation of state should show that the pseudos give results that agree, at least somewhat.

Makes sense. In any case: If you create static images, please always add the code to generate the images to the https://github.com/JuliaMolSim/DFTK.jl/blob/master/docs/src/assets/0_pregenerate.jl file to ensure we can easily update them.

Comment on lines 53 to 54 of examples/pseudopotentials.jl:
plot!(plt, conv_hgh.Ecuts, conv_hgh.errors, label="HGH")
plot!(plt, conv_upf.Ecuts, conv_upf.errors, label="PseudoDojo UPF")
Member:

Show convergence on a log plot, with thicker lines.

Contributor (author):

Done!

Contributor (author):

[updated convergence plot]

Member:

Very fishy. Why is there such a big plateau? Is it because of the tolerance? If so, can you just not display that part of the curve?

Member:

Otherwise very interesting! I knew the HGHs were kind of hard, but the difference is huge!

Member:

😱 That does look like a numerical grid issue. Do you by any chance have graphs of |V(q)| and |beta(q)| on a semilog scale at the ready (so we can see at what rate they decay)? Otherwise I can take a look.

Contributor (author):

I can probably get it in a few mins.

Member:

Yup, it's a grid issue. I'm opening a separate issue; it shouldn't hold back the merge.

@azadoks (Contributor, author), Oct 21, 2022:

[semilog plots of |V(q)| and |beta(q)|]

Member:

Note the potential seems stagnant around q = 10, which corresponds to Ecut = q^2/2 = 50 Ha, so that's consistent.

Member (same thread):

Yes, it's the SCF tolerance. By default in this example it is tol/100, where tol is the convergence target we are aiming for (so 1e-3).

@mfherbst (Member):

Apart from the cosmetic nits on the plot script and the pregenerated plot, this LGTM.

@mfherbst enabled auto-merge (squash), October 21, 2022 12:56
@mfherbst (Member):

@azadoks Great PR, thanks!

auto-merge was automatically disabled October 21, 2022 15:26 (head branch was pushed to by a user without write access)

@mfherbst enabled auto-merge (squash), October 21, 2022 16:34
@mfherbst merged commit 3b1bfff into JuliaMolSim:master, Oct 21, 2022