
Conversation

@leburgel
Member

Implements a norm-preserving retraction for InfinitePEPS such that all of the individual PEPS tensors in the unit cell have a constant unit Euclidean norm, using the scheme suggested by @Jutho. The details of and intuition behind the procedure should be explained in the corresponding docstrings.

I did not add any explicit projections to kill parallel components anywhere, since all gradients and descent directions already satisfy the natural orthogonality conditions to high precision.

Something of note is that these conditions are satisfied up to machine precision when using iterscheme=:diffgauge, but only up to approximately 1e-9 relative precision when using iterscheme=:fixed in the cases I checked. Perhaps in the latter case this can accumulate over time and cause stability issues, but I did not check this extensively. If we think this is useful I can add some explicit orthogonality tests and investigate the effect of explicit projections on stability in a follow-up.
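
For illustration, a rough sketch of the kind of per-tensor retraction this is about, written for plain arrays; the function names and formulas below are only a schematic example of a geodesic retraction on the unit sphere, the actual scheme and its derivation are documented in the docstrings.

```julia
using LinearAlgebra

# Geodesic retraction on the unit sphere: given x with norm(x) == 1 and a tangent
# direction d with dot(x, d) == 0, move along the great circle by arc length
# α * norm(d) and transport the direction accordingly.
function sphere_retract(x, d, α)
    nd = norm(d)
    iszero(nd) && return copy(x), copy(d)
    θ = α * nd
    x′ = cos(θ) * x + sin(θ) * (d / nd)   # retracted point, still unit norm
    d′ = cos(θ) * d - sin(θ) * nd * x     # transported direction at x′
    return x′, d′
end

x = normalize(randn(4))
d = randn(4); d -= dot(x, d) * x          # project out the parallel component
x′, d′ = sphere_retract(x, d, 0.1)        # norm(x′) ≈ 1
```

Applied independently to every tensor in the unit cell, this keeps each tensor at unit Euclidean norm throughout the optimization.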

Co-authored-by: Jutho <Jutho@users.noreply.github.com>

@lkdvos
Member

lkdvos commented Mar 10, 2025

Should also resolve #137, and we can discuss whether it is still necessary to normalize the expanded corners in #148: if the PEPS tensors as well as the corners and edges all have norm one, nothing should explode to the point where LAPACK becomes unstable?

@codecov

codecov bot commented Mar 10, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

| Files with missing lines | Coverage Δ |
| --- | --- |
| src/algorithms/optimization/peps_optimization.jl | 96.55% <100.00%> (+0.89%) ⬆️ |
| src/utility/svd.jl | 91.17% <100.00%> (ø) |

... and 1 file with indirect coverage changes


@Confusio

Should also resolve #137, and we can discuss whether it is still necessary to normalize the expanded corners in #148: if the PEPS tensors as well as the corners and edges all have norm one, nothing should explode to the point where LAPACK becomes unstable?

The norm-preserving retraction may not fully resolve the issues in #137 and #148, since we still can't avoid a small PEPS norm when the gradient is large during the optimization, e.g. norm(peps + alpha * grad). Previously I used the sphere manifold in Optim.jl to implement a norm-preserving retraction, and still found cases where the singular values become small during CTMRG. The small singular values have little effect on the forward evaluation of the energy, but they can cause instability in the AD of tsvd.

@leburgel
Member Author

leburgel commented Mar 10, 2025

As observed in #148, the use of iterscheme=:fixed is running into the issue described in Jutho/KrylovKit.jl#110 a lot more often now that the singular values are properly normalized and therefore much smaller. If I understand correctly from @Confusio's observation here, the issue is really just that the tolerance of the rrule_alg is hardcoded to some specific value, which causes issues for any singular values whose magnitude falls below this tolerance.

If we can set the tolerance of the rrule_alg dynamically based on the magnitude of the smallest singular value, that would help, but maybe we can just kill singular values that small in a better way, to avoid having to try so hard in the backward pass in the first place.

I feel like the discrepancy between the rrule_alg tolerance and the singular value magnitudes is more of a KrylovKit thing, and I'm not sure how much effort we want to put into it here, since the whole issue is hopefully circumvented by #150 anyway?
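
For concreteness, something along these lines (the function and keyword names are just placeholders, not an existing API):

```julia
# Illustrative only: pick the SVD pullback tolerance relative to the smallest
# retained singular value, clamped so it never becomes unreasonably tight or loose.
function pullback_tolerance(S::AbstractVector{<:Real};
                            tol_min=eps(eltype(S))^(3 / 4),
                            tol_max=eps(eltype(S))^(1 / 2))
    return clamp(minimum(S), tol_min, tol_max)
end

pullback_tolerance([0.8, 0.1, 3e-10])   # ≈ 3e-10 for Float64 input
```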

@Confusio

Should also resolve #137, and we can discuss whether it is still necessary to normalize the expanded corners in #148: if the PEPS tensors as well as the corners and edges all have norm one, nothing should explode to the point where LAPACK becomes unstable?

The norm-preserving retraction may not fully resolve the issues in #137 and #148, since we still can't avoid a small PEPS norm when the gradient is large during the optimization, e.g. norm(peps + alpha * grad). Previously I used the sphere manifold in Optim.jl to implement a norm-preserving retraction, and still found cases where the singular values become small during CTMRG. The small singular values have little effect on the forward evaluation of the energy, but they can cause instability in the AD of tsvd.

Another potential improvement is to integrate a backtracking line-search into the LBFGS algorithm. In certain non-physical states, small singular values occur frequently, which can destabilize the automatic differentiation of the SVD. Fortunately, these problematic states tend to be rejected based on their energy evaluations, which are generally robust. Consequently, after the line-search, the accepted states typically exhibit stable singular values, allowing us to compute the gradient only for these states.

Currently, it appears that OptimKit.jl only provides an implementation of the Hager-Zhang method. While Hager-Zhang performs very well near the ground state, its reliability diminishes during the initial steps due to AD instability. A good choice would be to use backtracking first and then switch to Hager-Zhang once we are close to the ground state.
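
For concreteness, a minimal Armijo-style backtracking sketch in plain Julia (all names are placeholders, nothing to do with OptimKit.jl's internals); the point is that only objective evaluations are needed inside the loop:

```julia
using LinearAlgebra

# Minimal Armijo backtracking: shrink the step until the sufficient-decrease
# condition holds. `f` is the objective, `x` the current point, `d` the descent
# direction and `g` the gradient at `x`; the gradient is never re-evaluated here.
function backtrack(f, x, d, g; α=1.0, ρ=0.5, c=1e-4, maxiter=50)
    f0 = f(x)
    slope = dot(g, d)                 # negative for a descent direction
    for _ in 1:maxiter
        f(x + α * d) ≤ f0 + c * α * slope && return α
        α *= ρ                        # reject the trial step and backtrack
    end
    return α
end
```

Only once a step is accepted would the gradient be computed at the new point, so unstable AD on rejected trial states is avoided.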

@leburgel
Member Author

Another potential improvement is to integrate a backtracking line-search into the LBFGS algorithm.

I fully agree, as I've also had some issues with the Hager-Zhang linesearch. I have an attempt at a backtracking implementation in OptimKit.jl here which helps in exactly the way you mention. I've been meaning to file an actual PR but have not gotten around to it. It's just copied from LineSearches.jl so I want to give a proper attribution of credit at the very least, but it's also not as efficient as it could be since it always computes the gradient as well inside the linesearch but never uses it.

So I agree we should definitely do this, but this is really a separate issue for a follow-up.

@Confusio

As observed in #148, the use of iterscheme=:fixed is running into the issue described in Jutho/KrylovKit.jl#110 a lot more often now that the singular values are properly normalized and therefore much smaller. If I understand correctly from @Confusio's observation here, the issue is really just that the tolerance of the rrule_alg is hardcoded to some specific value, which causes issues for any singular values whose magnitude falls below this tolerance.

If we can set the tolerance of the rrule_alg dynamically based on the magnitude of the smallest singular value, that would help, but maybe we can just kill singular values that small in a better way, to avoid having to try so hard in the backward pass in the first place.

I feel like the discrepancy between the rrule_alg tolerance and the singular value magnitudes is more of a KrylovKit thing, and I'm not sure how much effort we want to put into it here, since the whole issue is hopefully circumvented by #150 anyway?

Issue Jutho/KrylovKit.jl#110 appears to stem from multiple sources. One potential factor is the tolerance setting in the rrule_alg, and adjusting it may address the problem in certain cases. Another possibility involves the use of FixedSpaceTruncation during automatic differentiation, particularly for symmetric PEPS. In the line-search implementation we apply FixedSpaceTruncation, but the fixed subspace for the PEPS does not align well with the updated state, defined as peps' = peps + α * grad. In the worst case, this misalignment might lead to retaining only part of a degenerate multiplet of singular values, which then complicates solving the Sylvester equation. While incorporating a backtracking line-search could help mitigate these issues, it would still be necessary to adjust the fixed subspace for the accepted PEPS, a refinement that remains to be implemented.

@leburgel
Member Author

The norm-preserving retraction may not fully resolve the issues in #137 and #148, since we still can't avoid a small PEPS norm when the gradient is large during the optimization, e.g. norm(peps + alpha * grad). The small singular values have little effect on the forward evaluation of the energy, but they can cause instability in the AD of tsvd.

I agree this will still be an issue, but the way I see it, it's really a separate issue from the one being addressed here. This is just meant to prevent runaway norms during optimization. If the singular values during CTMRG on normalized states are small, that's not really a bug anymore, but rather an algorithmic issue that should be addressed separately.

We want to prevent runaway norms in any case, since these make no sense, so this is a logical first step. Let's tackle the problem of small singular values in a separate issue and collect all our thoughts there to keep things clear. This PR is not meant to fix that problem anyway.

@Confusio

Another potential improvement is to integrate a backtracking line-search into the LBFGS algorithm.

I fully agree, as I've also had some issues with the Hager-Zhang linesearch. I have an attempt at a backtracking implementation in OptimKit.jl here which helps in exactly the way you mention. I've been meaning to file an actual PR but have not gotten around to it. It's just copied from LineSearches.jl so I want to give a proper attribution of credit at the very least, but it's also not as efficient as it could be since it always computes the gradient as well inside the linesearch but never uses it.

So I agree we should definitely do this, but this is really a separate issue for a follow-up.

If I understand correctly, the backtracking line search in LineSearches.jl only evaluates the gradient once a state is accepted, provided you define f and g separately in fg! for only_fg!? See here (https://github.com/JuliaNLSolvers/LineSearches.jl/blob/3259cd240144b96a5a3a309ea96dfb19181058b2/src/backtracking.jl#L37) and here (https://github.com/JuliaNLSolvers/LineSearches.jl/blob/3259cd240144b96a5a3a309ea96dfb19181058b2/src/LineSearches.jl#L41).
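
That is, something like this toy example of the interface (standard Optim.jl / LineSearches.jl usage, nothing PEPS-specific is assumed):

```julia
using Optim, LineSearches

# With the fg!(F, G, x) form, the gradient branch is only taken when the caller
# actually asks for G, so value-only evaluations can skip the gradient entirely.
function fg!(F, G, x)
    G !== nothing && (G .= 2 .* x)    # gradient of the toy objective sum(abs2, x)
    F !== nothing && return sum(abs2, x)
    return nothing
end

res = optimize(Optim.only_fg!(fg!), randn(4),
               LBFGS(; linesearch=LineSearches.BackTracking()))
```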

@leburgel
Member Author

As for how to proceed, I don't think it's a good idea to switch to iterscheme=:diffgauge in the failing tests or to lower the default tolerance in the SVD rrule_alg, since this just circumvents the problem. Maybe we should either

  • Scale the tolerance of the rrule_alg with respect to the magnitude of the smallest singular value, either from PEPSKit.jl or directly in KrylovKit.jl (I assume this problem might pop up in other contexts as well, so the latter might be best?).
  • Finish #150 (Implement TensorKit's tsvd! reverse-rule for :fixed differentiation mode) to make sure iterscheme=:fixed does not run into this problem by default in the first place.

I'm happy to keep this open until either of those is done.

@Confusio

Implements a norm-preserving retraction for InfinitePEPS such that all of the individual PEPS tensors in the unit cell have a constant unit Euclidean norm, using the scheme suggested by @Jutho. The details of and intuition behind the procedure should be explained in the corresponding docstrings.

I did not add any explicit projections to kill parallel components anywhere, since all gradients and descent directions already satisfy the natural orthogonality conditions to high precision.

Something of note is that these conditions are satisfied up to machine precision when using iterscheme=:diffgauge, but only up to approximately 1e-9 relative precision when using iterscheme=:fixed in the cases I checked. Perhaps in the latter case this can accumulate over time and cause stability issues, but I did not check this extensively. If we think this is useful I can add some explicit orthogonality tests and investigate the effect of explicit projections on stability in a follow-up.

Co-authored-by: Jutho <Jutho@users.noreply.github.com>

In this implementation, the norm is the Euclidean norm of the underlying array. For a PEPS, the physical norm should be network_value(peps, env), which is also exposed as norm(peps, env). Can we simply replace the Euclidean norm by the network_value norm here?

@leburgel
Member Author

In this implementation, the norm is the Euclidean norm of the underlying array. For a PEPS, the physical norm should be network_value(peps, env), which is also exposed as norm(peps, env). Can we simply replace the Euclidean norm by the network_value norm here?

Indeed, the title of the PR may be a bit misleading, in the sense that the 'norm' we're preserving is not at all the physical norm of the two-dimensional quantum state we're optimizing. To your question whether we can preserve the network_value instead: in principle we could, but not as easily as the Euclidean norm, and I'm not sure if this would actually be better.

Firstly, preserving the physical norm is not as simple as changing all the norms above to network_value. I think the most straightforward way to do this is to simply keep the retraction exactly as above, and at the end rescale the entire PEPS by the appropriate root of the physical norm of the retracted state such that the rescaled state has unit physical norm. The problem with this is that to compute the physical norm of the retracted state, we really need to compute a new environment to evaluate this norm with. Since we don't really have access to any of the contraction algorithm options from within the retract, it is not at all obvious with what settings to contract the retracted state. To do what you suggest I think it would be best to actually retract the environment along with the state according to the given gradient, at which point we could use this to preserve the physical norm. This is certainly possible in principle since we know the environment satisfies some fixed-point equations at the original point (and in fact we've already computed the environment cotangent during the fixed-point differentiation process). But again, this would require bits and pieces that we currently don't have access to in retract. So retracting the state and environment simultaneously in a way that preserves the physical norm is a great idea, but I think it's a bit involved to do it properly. It would be great to follow up on this at some point in the future.

Secondly, I don't know if just purely preserving the physical norm is actually a real benefit from the point of view of optimization. When viewing the individual PEPS tensors as our degrees of freedom, since the energy is invariant under arbitrary rescalings of individual tensors, we really want the tensors themselves to be well-conditioned in terms of their norm. Even when preserving the total physical norm, the optimization could still become unstable with respect to redistributing this norm across individual tensors in an unbalanced way. In that sense, preserving the Euclidean norm of each individual tensor, while not really physically motivated, is really what we want to do from the point of view of numerical stability.

In short, while it's indeed not the 'right' thing to do from a physics point of view, I think this is perfect as a stability improvement.
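
To make the distinction concrete, a schematic sketch (placeholder functions operating on a collection of tensors, not the actual PEPSKit.jl interface):

```julia
using LinearAlgebra

# Per-tensor Euclidean normalization: every tensor in the unit cell individually
# gets unit (Frobenius) norm; this is the conditioning preserved by the retraction here.
normalize_tensors(tensors) = map(t -> t / norm(t), tensors)

# Global rescaling to unit physical norm: divide all tensors by a common factor
# derived from nv = network_value(peps, env). This pins the overall scale but says
# nothing about how the norm is distributed over the individual tensors; the exact
# root depends on the per-unit-cell convention of network_value.
rescale_physical(tensors, nv) = map(t -> t / abs(nv)^(1 / (2 * length(tensors))), tensors)
```

The first keeps every individual tensor well-conditioned; the second only fixes the global scale.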

@leburgel
Member Author

Seems like #150 indeed fixed the tests, but two observations:

  • There are a lot of seemingly harmless SVD cotangent gauge sensitivity warnings in the examples compared to master. All of the actual sensitivity values seem very small though (in particular, all far below the contraction and fixed-point gradient tolerances). Maybe scaling the tolerance for the gauge sensitivity check as 1e-2 * smallest_singular_value was a bit too strict? It was quite arbitrary to begin with, so maybe we can relax this a bit?
  • Some tests take a bit more time now, because all optimizations perform a few more iterations. It seems that the silently growing runaway norms gave rise to gradient norms that were unnaturally small, so with the same gradient tolerances we're actually demanding more accuracy now. This doesn't seem like a real problem, but if the tests take too long maybe we can relax some of them.

Co-authored-by: Lukas Devos <ldevos98@gmail.com>
lkdvos previously approved these changes Mar 12, 2025
Member

@lkdvos lkdvos left a comment


I think I got a little lost in all the PRs, but this looks reasonable, so if the tests pass but take more iterations because our effective convergence criterion changed, that seems okay to me.

@pbrehmer
Collaborator

To add a small comment on the tolerance: I think it makes sense to use tol=smallest_sval (and then clamp it) since in the TensorKit svd_pullback! the rank is computed based on that tol using searchsortedlast(S, tol; rev=true).
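
For illustration, assuming the singular values are sorted in descending order:

```julia
S = [1.0, 0.5, 1e-10, 1e-16]                          # singular values, descending
tol = 1e-12
effective_rank = searchsortedlast(S, tol; rev=true)   # 3: values below tol are treated as zero
```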

I'd be fine with merging the PR as is! The optimization now taking more iterations is not a problem in my opinion and perhaps this is even advantageous in proper applications since this might alleviate the issue of barren plateaus (where the gradient norm plateaus for extended periods in the optimization)?

Maybe a last comment in terms of user experience (but this can be addressed in a separate PR): I'm not sure if these gauge warnings, even with the tolerance we have right now where the warnings are not that frequent anymore, should really be displayed at such a strict level by default. I would imagine that this is confusing to users who are not super familiar with the SVD adjoint/gauge dependency details. Should we maybe try to hide this by default unless the gauge dependency is very high, and only enable accurate warnings if a high verbosity is chosen?
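
Something along these lines, purely as a hypothetical sketch (names and thresholds made up):

```julia
# Hypothetical: only surface the gauge sensitivity warning at high verbosity,
# unless the sensitivity is severe enough that it should always be shown.
function maybe_warn_gauge(sensitivity; warn_tol=1e-6, severe_tol=1e-2, verbosity=1)
    if sensitivity > severe_tol || (verbosity ≥ 3 && sensitivity > warn_tol)
        @warn "SVD cotangent may be gauge sensitive" sensitivity
    end
    return nothing
end
```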

@leburgel
Member Author

Yes, I'm quite happy with the tolerance now in principle (pullback_tol = clamp(smallest_sval, eps(scalartype(S̃))^(3 / 4), eps(scalartype(S̃))^(1 / 2))), at least from the point of view of the actual algorithm. I'm a bit annoyed by all the extra gauge sensitivity warnings we get now that the singular values are properly normalized and therefore much smaller, but after playing around a bit I don't think there's much to do about this right now.

I think that to properly address the gauge sensitivity warnings we need dedicated verbosity and tolerance for those. Right now the tolerance for the warning is also used in the actual implementation, so messing with that just to get rid of the warnings does not seem like a good idea. So I agree it would be good to deal with this separately later.

@leburgel
Member Author

Okay, a few unsuccessful attempts to get rid of the gauge sensitivity warnings later, this is basically in the same state as before, sorry about that. Should be pretty much ready to go now.

@leburgel leburgel requested a review from lkdvos March 13, 2025 10:31
@leburgel leburgel merged commit 135c5c1 into master Mar 13, 2025
33 checks passed
@leburgel leburgel deleted the lb/better_retract branch March 13, 2025 11:02
leburgel added a commit that referenced this pull request Mar 13, 2025
* Implement norm-preserving retractions

* Drop implicit assumption that state has unit norm

* Update src/algorithms/optimization/peps_optimization.jl

Co-authored-by: Lukas Devos <ldevos98@gmail.com>

* Increase tol for cotangent gauge sensitivity warnings

* Fix typo

* Not too high, not too low?

* Definitely too low

* Very arbitrary

* Restore tolerance, reduce bond dimension instead?

* No real difference, so restore bond dimension

---------

Co-authored-by: Lukas Devos <ldevos98@gmail.com>
leburgel added a commit that referenced this pull request Mar 17, 2025
Scale `rrule_alg` tolerance with singular value magnitude when using `KrylovKit.svdsolve` pullback (#155)

* Scale `rrule_alg` tolerance with singular value magnitude when using `KrylovKit.svdsolve` pullback

* Format

* Be more explicit

* Done

* Update src/utility/svd.jl

Co-authored-by: Lukas Devos <ldevos98@gmail.com>

* Address review suggestions

* Fix

* Increase gradient test coverage

* Wrong filter

* Filter was right all along

* Fix SVD pullback typing

* Implement norm-preserving retractions (#151)

* Implement norm-preserving retractions

* Drop implicit assumption that state has unit norm

* Update src/algorithms/optimization/peps_optimization.jl

Co-authored-by: Lukas Devos <ldevos98@gmail.com>

* Increase tol for cotangent gauge sensitivity warnings

* Fix typo

* Not too high, not too low?

* Definitely too low

* Very arbitrary

* Restore tolerance, reduce bond dimension instead?

* No real difference, so restore bond dimension

---------

Co-authored-by: Lukas Devos <ldevos98@gmail.com>

* Fix SVD mess

* Fix typo

* Split off gradient tests

* Attempt at testing all working combinations

* A bit less also works

* Cut down on algorithm combinations for naive gradient test

* Fix testset name

* Increase test timeout

---------

Co-authored-by: Lukas Devos <ldevos98@gmail.com>
Co-authored-by: Paul Brehmer <paul.brehmer@univie.ac.at>