Implement norm-preserving retractions #151
Conversation
Codecov Report: all modified and coverable lines are covered by tests ✅
The norm-preserving retraction may not totally resolve the issue in #137 and #148, as we still can't avoid a small norm of the PEPS when the gradient is large during the optimization, e.g. norm(peps + alpha * grad). Previously I used the sphere manifold in Optim.jl to implement the norm-preserving retraction, and also found some cases where the singular values are small during the CTMRG. Actually, the small singular values have little effect on the forward evaluation of the energy, but they may cause instability in the AD of tsvd.
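For intuition, the sphere retraction mentioned here can be sketched in a few lines of NumPy (a toy flat-vector version, not the actual Optim.jl or PEPSKit implementation): even for a huge gradient step, the retracted point keeps unit norm, though, as noted, this by itself does not prevent small singular values during CTMRG.

```python
import numpy as np

def sphere_retract(x, d, alpha):
    """Step along d, then rescale back onto the unit sphere
    (hypothetical NumPy sketch of a norm-preserving retraction)."""
    y = x + alpha * d
    return y / np.linalg.norm(y)

rng = np.random.default_rng(0)
x = rng.normal(size=8)
x /= np.linalg.norm(x)
grad = rng.normal(size=8) * 1e3     # deliberately huge "gradient"
y = sphere_retract(x, -grad, 0.1)
print(np.linalg.norm(y))            # stays at 1.0 up to rounding
```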
As observed in #148, the use of

If we can just set the tolerance of the

I feel like the discrepancy issue in
Another potential improvement is to integrate a backtracking line-search into the LBFGS algorithm. In certain non-physical states, small singular values occur frequently, which can destabilize the automatic differentiation of the SVD. Fortunately, these problematic states tend to be rejected based on their energy evaluations, which are generally robust. Consequently, after the line-search, the accepted states typically exhibit stable singular values, allowing us to compute the gradient only for these states. Currently, it appears that
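The key property of backtracking described here, that candidate states are accepted or rejected using only (robust) energy evaluations, can be sketched with a standard Armijo backtracking loop (a generic NumPy sketch, not the OptimKit.jl interface):

```python
import numpy as np

def backtracking(f, x, fx, g, d, alpha0=1.0, c1=1e-4, rho=0.5, maxiter=50):
    """Armijo backtracking: the loop only evaluates f, never the gradient,
    so problematic states are rejected on their (robust) energy alone."""
    slope = np.dot(g, d)              # directional derivative; < 0 for descent
    alpha = alpha0
    for _ in range(maxiter):
        if f(x + alpha * d) <= fx + c1 * alpha * slope:
            break                     # sufficient decrease reached
        alpha *= rho
    return alpha

f = lambda x: float(np.dot(x, x))
x = np.array([1.0, 2.0])
g = 2 * x                             # exact gradient of f
alpha = backtracking(f, x, f(x), g, -g)
print(alpha, f(x - alpha * g))        # accepted step strictly lowers f
```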
I fully agree, as I've also had some issues with the Hager-Zhang linesearch. I have an attempt at a backtracking implementation in OptimKit.jl here which helps in exactly the way you mention. I've been meaning to file an actual PR but have not gotten around to it. It's just copied from LineSearches.jl, so I want to give proper attribution of credit at the very least, but it's also not as efficient as it could be, since it always computes the gradient inside the linesearch but never uses it. So I agree we should definitely do this, but this is really a separate issue for a follow-up.
Issue Jutho/KrylovKit.jl#110 appears to stem from multiple sources. One potential factor is the tolerance setting in the
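For context on why small singular values destabilize the SVD pullback: the standard reverse rule weights off-diagonal cotangent terms by factors 1/(s_j² − s_i²), which blow up for tiny or near-degenerate singular values. A small NumPy illustration with a hypothetical spectrum:

```python
import numpy as np

# Off-diagonal weights of the standard SVD reverse rule:
# F[i, j] = 1 / (s[j]**2 - s[i]**2).  Near-degenerate or tiny singular
# values make these weights explode, amplifying noise in the cotangents.
s = np.array([1.0, 1e-6, 1e-6 + 1e-12])   # hypothetical spectrum
d = s[None, :]**2 - s[:, None]**2
F = 1.0 / (d + np.eye(len(s)))            # eye only guards the zero diagonal
print(np.max(np.abs(F)))                  # ~5e17: numerically useless
```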
I agree this will still be an issue, but the way I see it, it's really separate from the one being addressed here. This PR is just meant to prevent runaway norms during optimization. If singular values during CTMRG on normalized states are small during optimization, that's not a bug anymore, but rather an algorithmic issue that should be addressed separately. We want to prevent runaway norms anyway, since these make no sense, so this is a logical first step. Let's tackle the problem of small singular values in a separate issue and collect all our thoughts there to keep things clearer. This PR is not meant to fix that problem anyway.
If I got it right, the backtracking in
As for how to proceed, I don't think it's a good idea to switch to
I'm happy to keep this open until either of those is done.
In this implementation, the
Indeed, the title of the PR may be a bit misleading, in the sense that the 'norm' we're preserving is not at all the physical norm of the two-dimensional quantum state we're optimizing. To your question of whether we can preserve the

Firstly, preserving the physical norm is not as simple as changing all the

Secondly, I don't know if purely preserving the physical norm is actually a real benefit from the point of view of optimization. When viewing the individual PEPS tensors as our degrees of freedom, since the energy is invariant under arbitrary rescalings of individual tensors, we really want the tensors themselves to be well-conditioned in terms of their norm. Even when preserving the total physical norm, the optimization could still become unstable with respect to redistributing this norm across the individual tensors in an unbalanced way. In that sense, preserving the Euclidean norm of each individual tensor, while not really physically motivated, is exactly what we want from the point of view of numerical stability.

In short, while it's indeed not the 'right' thing to do from a physics point of view, I think this is perfect as a stability improvement.
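The rescaling invariance argued here can be illustrated with a toy multilinear "state": rescaling individual tensors leaves a Rayleigh-quotient energy unchanged, so nothing in the objective prevents the norm from being distributed across tensors in an arbitrarily unbalanced way (a hypothetical toy model, not an actual PEPS contraction):

```python
import numpy as np

# Toy "energy": a Rayleigh quotient of a state depending multilinearly
# on two tensors A and B (hypothetical stand-in for a PEPS unit cell).
def energy(tensors, H):
    psi = np.einsum('i,j->ij', *tensors).ravel()
    return (psi @ H @ psi) / (psi @ psi)

rng = np.random.default_rng(1)
A, B = rng.normal(size=3), rng.normal(size=3)
H = rng.normal(size=(9, 9))
H = H + H.T
e0 = energy([A, B], H)
e1 = energy([1e6 * A, 1e-8 * B], H)  # wildly unbalanced rescaling
print(np.isclose(e0, e1))            # True: the energy cannot see it
```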
Seems like #150 indeed fixed the tests, but two observations:
Co-authored-by: Lukas Devos <ldevos98@gmail.com>
lkdvos left a comment
I think I got a little lost in all the PRs, but this looks reasonable, so if the tests pass but take more iterations because our effective convergence criterion changed, that seems okay to me.
To add a small comment on the tolerance: I think it makes sense to use

I'd be fine with merging the PR as is! The optimization now taking more iterations is not a problem in my opinion, and perhaps it is even advantageous in proper applications, since it might alleviate the issue of barren plateaus (where the gradient norm plateaus for extended periods during the optimization)?

Maybe a last comment in terms of user experience (but this can be addressed in a separate PR): I'm not sure these gauge warnings, even with the tolerance we have right now where the warnings are not that frequent anymore, should really be displayed to such accuracy by default. I would imagine that this is confusing to users who are not super familiar with the SVD adjoint/gauge-dependency details. Should we maybe try to hide this by default unless the gauge dependency is very high, and only enable accurate warnings if a high verbosity is chosen?
Yes, I'm quite happy with the tolerance now in principle (

I think that to properly address the gauge sensitivity warnings, we need a dedicated verbosity and tolerance for those. Right now the tolerance for the warning is also used in the actual implementation, so messing with it just to get rid of the warnings does not seem like a good idea. So I agree it would be good to deal with this separately later.
Okay, some unsuccessful attempts to get rid of gauge sensitivity warnings later this is basically in the same state as before, sorry about that. Should be pretty much ready to go now. |
* Implement norm-preserving retractions
* Drop implicit assumption that state has unit norm
* Update src/algorithms/optimization/peps_optimization.jl (Co-authored-by: Lukas Devos <ldevos98@gmail.com>)
* Increase tol for cotangent gauge sensitivity warnings
* Fix typo
* Not too high, not too low?
* Definitely too low
* Very arbitrary
* Restore tolerance, reduce bond dimension instead?
* No real difference, so restore bond dimension

---------

Co-authored-by: Lukas Devos <ldevos98@gmail.com>
…`KrylovKit.svdsolve` pullback (#155)

* Scale `rrule_alg` tolerance with singular value magnitude when using `KrylovKit.svdsolve` pullback
* Format
* Be more explicit
* Done
* Update src/utility/svd.jl (Co-authored-by: Lukas Devos <ldevos98@gmail.com>)
* Address review suggestions
* Fix
* Increase gradient test coverage
* Wrong filter
* Filter was right all along
* Fix SVD pullback typing
* Implement norm-preserving retractions (#151) (squashed commit list identical to the one above)
* Fix SVD mess
* Fix typo
* Split off gradient tests
* Attempt at testing all working combinations
* A bit less also works
* Cut down on algorithm combinations for naive gradient test
* Fix testset name
* Increase test timeout

---------

Co-authored-by: Lukas Devos <ldevos98@gmail.com>
Co-authored-by: Paul Brehmer <paul.brehmer@univie.ac.at>
Implements a norm-preserving retraction for `InfinitePEPS` such that all of the individual PEPS tensors in the unit cell keep a constant unit Euclidean norm, using the scheme suggested by @Jutho. The details of and intuition behind the procedure are explained in the corresponding docstrings.

I did not add any explicit projections killing parallel components anywhere, since all the gradients and descent directions satisfy the natural orthogonality conditions up to high precision. Something of note is that these conditions are satisfied up to machine precision when using `iterscheme=:diffgauge`, but only up to approximately `1e-9` relative precision when using `iterscheme=:fixed` in the cases I checked. Perhaps in the latter case this can add up over time and give some stability issues, but I did not check this extensively. If we think this is useful, I can add some explicit orthogonality tests and investigate the effect of explicit projections on stability in a follow-up.

Co-authored-by: Jutho <Jutho@users.noreply.github.com>
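The orthogonality conditions mentioned above amount to tangent vectors having no component along the current (unit-norm) state. A minimal NumPy sketch (hypothetical helper name, not PEPSKit code) of the kind of explicit projection that would kill such parallel components:

```python
import numpy as np

def project_tangent(x, v):
    """Project v onto the tangent space of the unit sphere at x
    (x assumed unit-norm): removes the component parallel to x."""
    return v - np.dot(x, v) * x

rng = np.random.default_rng(2)
x = rng.normal(size=10)
x /= np.linalg.norm(x)
v = rng.normal(size=10)
eta = project_tangent(x, v)
print(abs(np.dot(x, eta)))  # orthogonal up to machine precision
```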