Further correcting grad_eigh to support hermitian matrices and the UPLO kwarg properly #527
Conversation
The version of `grad_eigh` used previously only supported real symmetric inputs to `eigh`. Changing `v` to `conj(v)` in two places makes this more general, allowing `eigh` to support arbitrary Hermitian matrices.
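For concreteness, here is a minimal sketch of what such a VJP can look like in the plain 2D (non-batched) case; the function name `eigh_vjp` and the exact arrangement of the terms are illustrative rather than the literal diff:

```python
import autograd.numpy as npa

def eigh_vjp(w, v, wg, vg, UPLO='L'):
    """Sketch of the reverse-mode rule for eigh on a Hermitian matrix (2D case only).

    w, v   -- the outputs of npa.linalg.eigh(a)
    wg, vg -- gradients of the loss with respect to w and v
    """
    N = v.shape[-1]
    vc = npa.conj(v)  # conj(v) instead of v: the change that makes Hermitian inputs work

    # Eigenvalue term: conj(v) @ diag(wg) @ v.T
    out = npa.dot(vc * wg[npa.newaxis, :], v.T)

    # Eigenvector term. The 1 / (w_j - w_i) factor is what blows up when
    # eigenvalues are degenerate (this comes up later in the thread).
    off_diag = npa.ones((N, N)) - npa.eye(N)
    F = off_diag / (w[npa.newaxis, :] - w[:, npa.newaxis] + npa.eye(N))
    out = out + npa.dot(npa.dot(vc, F * npa.dot(v.T, vg)), v.T)

    # eigh only reads one triangle of its input, so fold the gradient onto the
    # diagonal (real part only) and the triangle selected by UPLO.
    tri = npa.tril(npa.ones((N, N)), -1) if UPLO == 'L' else npa.triu(npa.ones((N, N)), 1)
    return npa.real(out) * npa.eye(N) + (out + npa.conj(out.T)) * tri
```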
I think the loss you defined depends on the phase of the eigenvectors, which is arbitrary. The good news is, normally we don't use a loss with phase dependency, since such a loss itself is not so well defined. This is only a question of how to test it; here is an example (in Julia, though) of how it can be tested.
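A rough Python equivalent of that idea (the matrix values are made up for illustration): the eigenvectors are only defined up to a phase, so only phase-invariant quantities belong in a test:

```python
import autograd.numpy as npa

# Made-up 2x2 Hermitian matrix for illustration.
A = npa.array([[1.0, 0.5 + 0.3j],
               [0.5 - 0.3j, 2.0]])

w, v = npa.linalg.eigh(A)

# Multiplying the eigenvectors by a phase gives equally valid eigenvectors,
# so eigh is free to return either set (the "gauge" is arbitrary).
v_rotated = v * npa.exp(1j * 0.7)

print(npa.allclose(npa.dot(A, v_rotated), v_rotated * w))   # True: still eigenvectors
print(npa.real(v[0, 0]), npa.real(v_rotated[0, 0]))         # a gauge-dependent quantity changes
print(npa.sum(npa.abs(v)), npa.sum(npa.abs(v_rotated)))     # a gauge-invariant one does not
```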
I see, I did realize that the gauge might be a problem, but didn't investigate further. But I think there's something more to it: it doesn't work even with some of the losses I tried. Not sure what more could be done about this; perhaps a warning to use at your own risk for complex-valued eigenvectors? That also reminds me that the gradient breaks when there are degenerate eigenvalues, so perhaps a warning is warranted there too?
Oh, dang, I mixed up the indices, my bad. However, it's still not clear to me why the more complicated loss breaks.
Ok, I just made the test function return `np.abs(v)`.
This looks good and I'm happy to merge. However, I'd like to be clear: is there still an issue here that we don't fully understand? The fact that the function being differentiated needs to not be gauge-dependent is fine; this is analogous to e.g. the singular value decomposition, which also has multiple valid output values. We assume that the user is knowledgeable enough to know that their loss function should be invariant to this 🙂.
No other issues that I'm aware of. The last thing to be careful about is degenerate eigenvalues. However, this is once again a known problem, and we can hope the user is knowledgeable enough. That said, I just pushed one final improvement. Basically, the problem with degenerate eigenvalues is in the backprop of the eigenvector gradient. Previously, even if your function only depended on the eigenvalues, if there were degenerate ones you would get a "division by zero" warning and nans in the gradient. Now the eigenvector part of the backprop is only computed if the incoming eigenvector gradient is above a small cutoff, so functions that depend only on the eigenvalues behave correctly even with degeneracies.
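A small illustration of that case, using the identity matrix as a convenient fully degenerate example and assuming the eigenvector term is skipped when its incoming gradient vanishes, as described above:

```python
import autograd.numpy as npa
from autograd import grad

def sum_of_squared_eigenvalues(a):
    # Depends on the eigenvalues only, so the eigenvector gradient is zero.
    w, _ = npa.linalg.eigh(a)
    return npa.sum(w ** 2)

# Fully degenerate spectrum: every eigenvalue equals 1.
A = npa.eye(3)

# With the eigenvector term skipped, this should print a finite gradient
# (2 * identity here) instead of emitting a division-by-zero warning and
# returning nans from the 1 / (w_i - w_j) factor.
print(grad(sum_of_squared_eigenvalues)(A))
```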
I used a similar check.
Isn't there a danger that a wrong gradient is returned in that case, if the backprop eigenvector gradient is nonzero and there is a degeneracy? That is to say, the problem is not the division by zero itself, but the fact that the gradient is not well defined when there is a degeneracy. Otherwise yeah, I can change the cutoff.
Well, this is a known (maybe intrinsically) hard problem. Reducing the cutoff or making it a tunable parameter is what you can do.
Ok, I just changed the check to simply use `np.any(vg)`.
Thanks! |
Edit: as discussed in the comments below, the issue with the complex eigenvectors is the gauge, which is arbitrary. However, this updated code should work for complex-valued matrices and functions that do not depend on the gauge. So, for example, the test for the complex case uses `np.abs(v)`.

What this update does:

- Changes `v` to `conj(v)` in two places in `grad_eigh`, so the gradient is correct for arbitrary Hermitian matrices rather than only real symmetric ones.
- Handles the `UPLO` keyword argument of `eigh` properly.

However:

The gradient for the eigenvectors does not pass a general test. However, it works in some cases. For example, a test comparing against a numerical gradient (a sketch follows below) returns a difference smaller than 1e-6 for any individual component of `vs` that is put in the return statement. However, it breaks for a more complicated function, e.g. `return npa.abs(vs[0, 0] + vs[1, 1])`. It would be great if someone can address this further. Still, for now this PR is a significant improvement in the behavior of the `linalg.eigh` function.
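A sketch of that kind of check, written with autograd's `check_grads` utility and a made-up matrix (the original test may have been structured differently):

```python
import autograd.numpy as npa
from autograd.test_util import check_grads

# Made-up 2x2 Hermitian matrix for illustration.
A = npa.array([[1.0, 0.5 + 0.3j],
               [0.5 - 0.3j, 2.0]])

def single_component(a):
    # |vs[i, j]| does not change under the arbitrary phase of each eigenvector.
    _, vs = npa.linalg.eigh(a)
    return npa.abs(vs[0, 0])

def mixed_components(a):
    # Mixes the phases of two different eigenvectors, so it is gauge-dependent.
    _, vs = npa.linalg.eigh(a)
    return npa.abs(vs[0, 0] + vs[1, 1])

check_grads(single_component, modes=['rev'], order=1)(A)   # expected to pass
check_grads(mixed_components, modes=['rev'], order=1)(A)   # the kind of loss that breaks
```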