
LOBPCG stability: Suggestion of improvements #246

Open
mfherbst opened this issue May 20, 2019 · 38 comments · May be fixed by #247

@mfherbst

We currently use the lobpcg solver from this Julia package in our DFTK.jl electronic-structure code. In our experience, the current Cholesky-QR-based LOBPCG implementation can become numerically unstable and sometimes even produces spurious eigenvalues.

As an illustrative example, run the following Julia code:

using IterativeSolvers, LinearAlgebra

A = [1.25, 1.5, 1.5, 1.25, 1.5, 1.25, 1.5, 0, 1.13, 1.13, 1.5, 1.13, 1.5, 1.5, 1.13]
res = lobpcg(Diagonal(A), false, 5)

In my experiments with this simple test problem, it fails roughly every second time with a PosDefException from the Cholesky factorization. In the remaining cases, it returns two approximations to the zero eigenvalue at res.λ[1] and res.λ[2]. In all such instances I further observed the first two eigenvectors to be non-orthogonal. That is to say,

dot(res.X[:, 1], res.X[:, 2])

returns a clearly non-zero answer, on the order of 1e-4.
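
For reference, a small script along the following lines (a sketch; it assumes the same A as above and relies on the randomized default initial guess, so each run differs) reproduces these failure statistics:

using IterativeSolvers, LinearAlgebra

A = [1.25, 1.5, 1.5, 1.25, 1.5, 1.25, 1.5, 0, 1.13, 1.13, 1.5, 1.13, 1.5, 1.5, 1.13]

# Count PosDefExceptions over repeated solves and flag runs where the
# first two Ritz vectors come out visibly non-orthogonal.
failures = count(1:100) do _
    try
        res = lobpcg(Diagonal(A), false, 5)
        abs(dot(res.X[:, 1], res.X[:, 2])) > 1e-8 && @warn "non-orthogonal eigenvectors"
        false
    catch err
        err isa PosDefException || rethrow()
        true
    end
end
@show failures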

A couple of strategies to improve the numerical stability of LOBPCG have been discussed in the literature, e.g. in https://arxiv.org/abs/1704.07458. As far as I can judge from reading through the code, the suggested basis truncation and selection strategies are not yet present, and it might be advantageous to look into implementing them.

@mfherbst mfherbst changed the title LOBPCG stability improvements LOBPCG stability: Suggestion of improvements May 20, 2019
@mfherbst
Author

mfherbst commented May 21, 2019

Note: The original scipy implementation does not suffer from this problem. For example,

from scipy.sparse.linalg import lobpcg
from scipy.linalg import orth
import numpy as np

A = np.array([1.25, 1.5, 1.5, 1.25, 1.5, 1.25, 1.5, 0, 1.13, 1.13, 1.5, 1.13, 1.5, 1.5, 1.13])
lobpcg(np.diag(A), orth(np.random.randn(len(A), 5)), largest=False)

returns the correct result every time I have tried it.

@mohamed82008
Member

mohamed82008 commented May 21, 2019

Seems like a duplicate of #223.

@mohamed82008
Member

I am traveling this week, but I will give it a look next week. Thanks for the issue.

@mfherbst
Author

mfherbst commented May 21, 2019

Thanks for looking into this. Indeed, this shows some similarity with #223, e.g. block size 5 is a magic number: 4 or 6 work much better with the A defined above.

I can assure you, however, that we see similarly frequent problems in our application code, where the matrices are much larger relative to the block size than in the example shown. I'll try to come up with a reproducible example for you.

@mfherbst
Author

I came up with a larger example that still illustrates the PosDefException problem; see this gist.

On my machine this has a success rate of about 97%. That is not good enough for our needs, since we run on the order of hundreds, potentially even thousands, of such eigensolves.

@mohamed82008
Member

OK, this is interesting. I managed to reproduce it with a specific seed. I will look into it.

@lobpcg

lobpcg commented Jun 4, 2019

@mohamed82008 see #246 (comment), but the scipy version also uses Cholesky, so there's nothing wrong with Cholesky in this example. It looks like there's just a bug in your Julia code, unrelated to your QR "fix". You may want to just compare your code against the latest scipy version rather than adding new QR.

@mohamed82008
Member

It is definitely possible that it is a bug in my code, but given the number of passing test cases, I am very curious where I might have gone wrong. I can do a close comparison when I get some time.

@lobpcg

lobpcg commented Jun 5, 2019

@mohamed82008 this may be a bug in the core LOBPCG algorithm after all, also found by @mfherbst in scipy; see scipy/scipy#10258 (comment)

Let me think how to fix it easily...

@mfherbst
Author

mfherbst commented Jun 5, 2019

@mohamed82008 (related to scipy/scipy#10258) I added a Julia version to the example repository mfherbst/issue_scipy_lobpcg_guess, for convenience and out of curiosity. As it turns out, and as suspected by @lobpcg, the Julia implementation of LOBPCG also fails on this example, in fact both with and without the guess vectors.

@mfherbst
Author

mfherbst commented Jun 5, 2019

The naive Julia implementation referenced in scipy/scipy#10258 actually does pretty well on the mfherbst/issue_scipy_lobpcg_guess example. Even converging down to a 1e-14 tolerance in a reasonable number of iterations (some 250 without a preconditioner and 100 with one) is possible.

@mohamed82008
Member

@mfherbst I think full orthogonalization of the basis with QR is an almost sure-fire way to make LOBPCG as stable as the QR algorithm (there is a loophole when using constraints). The only problem is the complexity the QR approach introduces in the form of additional matrix-matrix multiplications, e.g. https://gist.github.com/antoine-levitt/f8ac3637c5d5d778a448780172e63e28#file-lobpcg-jl-L30, which is why I suggested it as a final measure in #247 if nothing else works. The QR "fix" in #247 is actually a less extreme measure than full orthogonalization, as it only does matrix-matrix multiplications of P and/or R but not X, for example.

@lobpcg

lobpcg commented Jun 5, 2019

QR is also very bad for parallel execution, which is the actual reason I avoid it in the original LOBPCG algorithm, the version implemented in many parallel libraries.

@mohamed82008
Member

mohamed82008 commented Jun 5, 2019

I think the following approach has potential. Let U be the basis [X P R].

  1. Remove the components of U along the constraints matrix C.
  2. Form the Gram matrix gramB = U' * B * U.
  3. Find the "compact" eigenvalue decomposition of gramB, eliminating the nullspace basis: gramB = V_c * L_c * V_c'.
  4. Update U via U = U * V_c * L_c^(-1/2), so that U' * B * U == I.
  5. Efficiently update A*U and B*U by right-multiplying them by V_c * L_c^(-1/2).

Note that the number of columns of U, A*U and B*U can decrease in steps 4 and 5. The above guarantees U' * B * U == I without introducing new basis vectors that may conflict with the constraints matrix C.

Similarly, if using QR, I think we need to make sure not to include any additional basis vectors from Q that are not spanned by the matrix we are trying to orthogonalize, i.e. to take only the columns whose corresponding diagonal entry of R is non-zero. Such additional basis vectors can conflict with the constraint matrix C, which will backfire at the end.

The eigenvalue decomposition approach above avoids additional matrix-matrix multiplications involving A or B and is more parallelizable, so it should be more efficient without sacrificing stability.
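
A minimal Julia sketch of steps 2 to 5 (hypothetical code with my naming; step 1, the constraint projection, is omitted and the drop tolerance is illustrative):

using LinearAlgebra

# B-orthonormalize the basis U via an eigendecomposition of its Gram
# matrix, dropping (near-)nullspace directions instead of calling QR.
function svqb_ortho(U, AU, BU; tol=1e-12)
    gramB = Hermitian(U' * BU)             # step 2: gramB = U' * B * U
    L, V = eigen(gramB)                    # step 3: gramB = V * Diagonal(L) * V'
    keep = L .> tol * maximum(abs, L)      # eliminate the nullspace basis
    W = V[:, keep] * Diagonal(1 ./ sqrt.(L[keep]))  # W = V_c * L_c^(-1/2)
    return U * W, AU * W, BU * W           # steps 4-5: (U*W)' * B * (U*W) == I
end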

Comments? @lobpcg @antoine-levitt

@antoine-levitt

As soon as you discard information in this way, you will slow down convergence (essentially downgrading to LOBPG, i.e. without the P) for precisions below ~1e-8 (try it, you'll see), which matter for a number of applications.

@mohamed82008
Member

mohamed82008 commented Jun 5, 2019

Well, if the whole P matrix disappears due to zero eigenvalues, LOBPCG basically turns into a preconditioned gradient descent algorithm rather than a preconditioned conjugate gradient algorithm, so convergence can take a hit. But increasing the subspace dimension is not computationally trivial either, so perhaps the cost of the additional LOBPG iterations (without the P matrix, the "conjugate" dropped) can be compensated by the savings from not expanding the basis via QR, at least in the unconstrained case. Besides, if a good preconditioner is used, the difference in convergence speed in terms of iteration counts may be even less serious. I think it is worth a try.

@mohamed82008
Member

Perhaps both options can be provided once the constraint issue for QR is figured out, not that I am volunteering to implement both! If this turns out to be too much work, I may not have the time to do it any time soon; this is a somewhat busy period for me. But let's see.

@mohamed82008
Member

The constrained case for QR seems simple to handle. We just need to check that any additional column of Q we include in the basis, i.e. one whose corresponding diagonal element of R is 0, has a component orthogonal to C, and to orthogonalize it against C before adding it.
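
A hypothetical sketch of that check (my naming; it assumes the constraint block C is already B-orthonormal and that BC = B * C is available):

using LinearAlgebra

# Project a candidate basis vector q against the constraint block C and
# admit it only if a non-negligible component orthogonal to C remains.
function c_orthogonalize(q, C, BC; droptol=1e-8)
    q = q - C * (BC' * q)            # remove the components of q along C
    nq = norm(q)
    nq < droptol && return nothing   # no component orthogonal to C: discard q
    return q / nq
end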

@mohamed82008
Member

So it seems a method similar to the one proposed above was also proposed in https://arxiv.org/pdf/1704.07458.pdf under the name svqbDrop. I haven't read the paper in detail, but it seems to have interesting ideas.

@lobpcg

lobpcg commented Jun 6, 2019

So it seems a method similar to the one proposed above was also proposed in https://arxiv.org/pdf/1704.07458.pdf under the name svqbDrop. I haven't read the paper in detail, but it seems to have interesting ideas.

We discussed this paper already, e.g., #247 (comment)

@lobpcg

lobpcg commented Jun 6, 2019

Well, if the whole P matrix disappears due to zero eigenvalues, LOBPCG basically turns into a preconditioned gradient descent algorithm rather than a preconditioned conjugate gradient algorithm, so convergence can take a hit. But increasing the subspace dimension is not computationally trivial either, so perhaps the cost of the additional LOBPG iterations (without the P matrix, the "conjugate" dropped) can be compensated by the savings from not expanding the basis via QR, at least in the unconstrained case. Besides, if a good preconditioner is used, the difference in convergence speed in terms of iteration counts may be even less serious. I think it is worth a try.

Dropping P for the rest of the iterations is extreme (I do it in MATLAB under some conditions) and of course results in a dramatic slowdown, as already discussed in #247 (comment).

To fix the current test case, you need to drop P only on iterations where P is linearly dependent, as I now do in https://github.com/lobpcg/blopex/blob/master/blopex_tools/matlab/lobpcg/lobpcg.m, but not yet in scipy. Please see the updated scipy/scipy#10258 (comment).
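
A rough Julia sketch of that per-iteration drop (hypothetical naming; not the BLOPEX or scipy code):

using LinearAlgebra

# Build the trial subspace [X R P]; if the Gram matrix of the full block
# fails Cholesky, P is (nearly) linearly dependent, so drop it for this
# iteration only and try it again on the next one.
function trial_subspace(X, R, P)
    S = [X R P]
    F = cholesky(Hermitian(S' * S); check=false)
    issuccess(F) && return S
    return [X R]
end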

@lobpcg

lobpcg commented Jun 7, 2019

Let me ping an expert, @joseeroman, in case he has some advice.

@lobpcg

lobpcg commented Jun 8, 2019

@mohamed82008
Member

I have my doubts about this "fix" since it doesn't really change the overall basis. I would be very surprised if it actually fixed all of @mfherbst's test cases.

@lobpcg

lobpcg commented Jun 12, 2019

I have updated my comment: to fix the current test case, you need to drop P on iterations where P is linearly dependent, as I now do in https://github.com/lobpcg/blopex/blob/master/blopex_tools/matlab/lobpcg/lobpcg.m, but not yet in scipy. Please see scipy/scipy#10258 (comment).

@stevengj
Member

When I was implementing a related algorithm many years ago (https://www.osapublishing.org/oe/fulltext.cfm?uri=oe-8-3-173&id=63584), a cruder predecessor of LOBPCG (at the time I didn't realize you could perform the line minimizations by solving a Ritz problem), I found that it was sufficient to re-orthogonalize occasionally (not every step). Maybe you could estimate the condition number from the Cholesky factors (which can be done cheaply, I think?) in order to decide whether to re-orthogonalize?

In that work I used the polar decomposition A = QP. This can be computed relatively straightforwardly in parallel, since P = sqrt(A'A) is a small matrix: you just need a parallel matrix transpose and a reduction for the Gram matrix A'A. Maybe this is related to @mohamed82008's suggestion above.
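
For illustration, a minimal sketch of that polar-factor orthonormalization (hypothetical naming; X here plays the role of the tall matrix A above, to avoid clashing with the system matrix):

using LinearAlgebra

# Orthonormalize the columns of a tall matrix X via the polar
# decomposition X = Q * P: only the small Gram matrix X'X needs a dense
# factorization, so the parallel cost is one reduction plus one matmul.
function polar_ortho(X)
    P = sqrt(Hermitian(X' * X))   # Hermitian polar factor P = sqrt(X'X)
    return X / P                  # Q = X * inv(P) has orthonormal columns
end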

@antoine-levitt

Oh wow, that's an old thread. Since then we ended up writing our own implementation in https://github.com/JuliaMolSim/DFTK.jl/blob/master/src/eigen/lobpcg_hyper_impl.jl. It uses a number of tricks for maximal stability, we haven't been able to make it break yet (and not for lack of trying :-)), and it keeps the full convergence rate even when converging close to machine precision, which I have never been able to do with any other LOBPCG implementation. It also fixes a very tricky issue whereby locking degraded convergence, which took me a good while to figure out; I think no other implementation has that fix. I wanted to add some other tricks (like avoiding a full diagonalization of X'AX by using perturbation theory close to convergence, which gives quite a nice speedup in certain circumstances) and possibly make a paper out of it, but other things got in the way. If somebody is interested in picking this up again, I'd be happy to share.

The choice there was to use only Cholesky for orthogonalization, because it's the fastest in a parallel setting (it's like the polar decomposition, but Choleskys are faster than square roots). It's also very unstable, so we do two things: 1) we add something to the diagonal when the Cholesky fails, and 2) we re-orthogonalize when needed, indeed using the condition number of the Cholesky factors as you suggest, @stevengj (see https://github.com/JuliaMolSim/DFTK.jl/blob/master/src/eigen/lobpcg_hyper_impl.jl#L77).
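
A condensed sketch of those two tricks (hypothetical code in the spirit of lobpcg_hyper_impl.jl, not a copy of it; the shift growth and the 1e8 threshold are illustrative):

using LinearAlgebra

# Orthonormalize X in place via Cholesky-QR, shifting the diagonal when
# the factorization fails and re-orthogonalizing when the triangular
# factor is badly conditioned.
function chol_ortho!(X; maxpasses=3)
    for _ in 1:maxpasses
        G = Hermitian(X' * X)
        shift = zero(real(eltype(X)))
        F = cholesky(G; check=false)
        while !issuccess(F)
            # Trick 1: add something to the diagonal when Cholesky fails.
            shift = max(2 * shift, eps(real(eltype(X))) * norm(G))
            F = cholesky(G + shift * I; check=false)
        end
        rdiv!(X, F.U)   # X ← X / U, so that X'X ≈ I
        # Trick 2: take another orthogonalization pass only when the
        # Cholesky factor was badly conditioned.
        cond(F.U, 1) < 1e8 && break
    end
    return X
end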

@ViralBShah
Contributor

Would it be useful for others to bring your implementation into this package?

@mfherbst
Author

mfherbst commented Nov 18, 2020

As I said on the Discourse thread, our implementation at its current stage is not a complete drop-in replacement for the lobpcg of this package (e.g. generalised eigensolves are not tested very thoroughly, only smallest eigenvalues are implemented, etc.). So getting it to fully replace the existing implementation would be a bit of work. Coexistence might be a little easier short-term, I suppose.

Other than that, it would lower the maintenance burden on our side, so I would not mind helping to get it integrated elsewhere 😄. What do you think @antoine-levitt?

@lobpcg

lobpcg commented Nov 18, 2020

Since my last comment in this thread, I updated
https://github.com/scipy/scipy/blob/v1.5.4/scipy/sparse/linalg/eigen/lobpcg/lobpcg.py
a year ago to make it more stable; e.g., it now also runs in single precision to the full attainable accuracy. My tricks are different from those @antoine-levitt describes above, so stability and performance should also differ. It would be interesting for someone to compare, and maybe combine some of the tricks...

@antoine-levitt

antoine-levitt commented Nov 18, 2020

I already put it up at https://github.com/antoine-levitt/LOBPCG last year. It's the version that DFTK's is based on. Anybody is welcome to do what they want with it. It's definitely more stable than the one in IterativeSolvers, and therefore I would recommend it as a first attempt. Generalized eigenvalue problems are OK (definitely more stable than the implementation in IterativeSolvers) as long as B is not horribly badly conditioned; there's an alternative tweak that doesn't assume B to be relatively well conditioned, but it needs additional B multiplications for stability, so I didn't put it in by default.

I did not benchmark extensively outside of DFTK's operating point, so I can't say for sure that the implementation is unambiguously better than the one in IterativeSolvers regarding performance: there might be more matmuls or more memory usage in our version; to be checked.

LOBPCG, like most Krylov methods, is a very delicate algorithm that is fundamentally unstable, and implementations have to walk a fine line between stability and efficiency, so I think it's fine to have several implementations coexist. If someone does the work of comparing implementations and finds that one (possibly after tweaks) is unambiguously better than the others, then it might make sense to keep just one, or possibly two Pareto-optimal implementations, one geared more towards stability and one more towards performance. I can help, but I don't have time to actually do that non-trivial amount of work.

@mohamed82008
Member

Let's get a GSoC student on this.

@mohamed82008
Member

If there were a write-up somewhere describing the algorithm to be implemented, it would make this much easier for the student. Otherwise we would need a student more mathematically adept than the average GSoC participant to make sense of the arguments and scripts posted here. If a paper can come out of it, perhaps a graduate student might also be interested in volunteering and doing all the comparisons and benchmarks.

@lobpcg

lobpcg commented Nov 19, 2020

The problem is that there are several papers describing different competing tricks and, correspondingly, several different actual implementations of LOBPCG in various packages; see https://en.wikipedia.org/wiki/LOBPCG . In exact arithmetic, different implementations should produce exactly the same results, although at quite different computational costs and with various memory requirements. In double precision, and especially in single precision, implementations vary greatly in numerical stability and may simply fail depending on the input. There is no universally accepted "best" implementation. Arguably, the Python version https://github.com/scipy/scipy/blob/v1.5.4/scipy/sparse/linalg/eigen/lobpcg/lobpcg.py is the most used and the most tested by various users. If you just want to modify the existing Julia version to literally follow every detail of this Python version, it is a pure coding exercise, easy for a CS student: basically translating Python into Julia. If the goal is first to determine the "best" version to implement in Julia, it is a difficult, possibly open-ended, PhD-level math/CS research project.

@mohamed82008
Member

Either way, we can put it on the GSoC project list and depending on the level of the student, we can do good old scope creep.

@antoine-levitt

Agreed with Andrew here: the pure coding exercise is not very interesting, as one can just use PyCall to call scipy's version from Julia. The interesting part requires serious numerical skills.

@mfherbst
Author

A GSoC student for this would be pretty cool. Let me know if there is something I can do to help when it comes to it!

@ViralBShah
Contributor

I think the main thing to do is to list the project here and add yourself as a mentor (with any other potential co-mentors):

https://julialang.org/jsoc/projects/

I think students will soon start visiting this list, so it's best to put it up immediately.
