
#481 possibly broke ishermitian #606

Open
torfjelde opened this issue Nov 11, 2022 · 15 comments

Comments

@torfjelde
Contributor

After #481, ishermitian is now broken (maybe other methods too?) when working with ForwardDiff.Dual.

julia> Pkg.status()
Status `/tmp/jl_gs3tnK/Project.toml`
  [f6369f11] ForwardDiff v0.10.33
  [37e2e46d] LinearAlgebra

julia> using LinearAlgebra, ForwardDiff

julia> x = [1 0.5; 0.5 1]
2×2 Matrix{Float64}:
 1.0  0.5
 0.5  1.0

julia> ishermitian(x)
true

julia> f(x) = sum(cholesky(reshape(x, 2, 2)))
f (generic function with 1 method)

julia> ForwardDiff.gradient(f, vec(x))
ERROR: PosDefException: matrix is not Hermitian; Cholesky factorization failed.
Stacktrace:
  [1] checkpositivedefinite(info::Int64)
    @ LinearAlgebra /opt/julia-1.8.2/share/julia/stdlib/v1.8/LinearAlgebra/src/factorization.jl:18
  [2] cholesky!(A::Matrix{ForwardDiff.Dual{ForwardDiff.Tag{typeof(f), Float64}, Float64, 4}}, ::NoPivot; check::Bool)
    @ LinearAlgebra /opt/julia-1.8.2/share/julia/stdlib/v1.8/LinearAlgebra/src/cholesky.jl:299
  [3] #cholesky#162
    @ /opt/julia-1.8.2/share/julia/stdlib/v1.8/LinearAlgebra/src/cholesky.jl:402 [inlined]
  [4] cholesky (repeats 2 times)
    @ /opt/julia-1.8.2/share/julia/stdlib/v1.8/LinearAlgebra/src/cholesky.jl:402 [inlined]
  [5] f(x::Vector{ForwardDiff.Dual{ForwardDiff.Tag{typeof(f), Float64}, Float64, 4}})
    @ Main ./REPL[9]:1
  [6] vector_mode_dual_eval!
    @ ~/.julia/packages/ForwardDiff/eqMFf/src/apiutils.jl:37 [inlined]
  [7] vector_mode_gradient(f::typeof(f), x::Vector{Float64}, cfg::ForwardDiff.GradientConfig{ForwardDiff.Tag{typeof(f), Float64}, Float64, 4, Vector{ForwardDiff.Dual{ForwardDiff.Tag{typeof(f), Float64}, Float64, 4}}})
    @ ForwardDiff ~/.julia/packages/ForwardDiff/eqMFf/src/gradient.jl:106
  [8] gradient(f::Function, x::Vector{Float64}, cfg::ForwardDiff.GradientConfig{ForwardDiff.Tag{typeof(f), Float64}, Float64, 4, Vector{ForwardDiff.Dual{ForwardDiff.Tag{typeof(f), Float64}, Float64, 4}}}, ::Val{true})
    @ ForwardDiff ~/.julia/packages/ForwardDiff/eqMFf/src/gradient.jl:19
  [9] gradient(f::Function, x::Vector{Float64}, cfg::ForwardDiff.GradientConfig{ForwardDiff.Tag{typeof(f), Float64}, Float64, 4, Vector{ForwardDiff.Dual{ForwardDiff.Tag{typeof(f), Float64}, Float64, 4}}}) (repeats 2 times)
    @ ForwardDiff ~/.julia/packages/ForwardDiff/eqMFf/src/gradient.jl:17
 [10] top-level scope
    @ REPL[11]:1

This is because the A[i,j] != adjoint(A[j,i]) check performed in ishermitian now returns true: the Dual entries have equal values but different partials.
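To see concretely why the check fails (a minimal sketch, not from the original report): during ForwardDiff.gradient, every input entry is seeded with its own partial, so symmetric entries carry different partials even though their values agree.

```julia
using LinearAlgebra
using ForwardDiff: Dual, value

# Seed each entry of a symmetric 2×2 matrix with a distinct partial,
# mimicking what ForwardDiff.gradient does internally.
A = [Dual(1.0, 1.0, 0.0, 0.0, 0.0)  Dual(0.5, 0.0, 1.0, 0.0, 0.0);
     Dual(0.5, 0.0, 0.0, 1.0, 0.0)  Dual(1.0, 0.0, 0.0, 0.0, 1.0)]

ishermitian(A)          # false after #481: equal values, different partials
ishermitian(value.(A))  # true: the primal values are still symmetric
```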

@devmotion
Member

Maybe #481 should have been put in a breaking release after all. Would it be safer to revert it now and reintroduce it in a breaking release?

@mcabbott
Member

I'm sorry it broke things, but I think this is the intention.

While x really is Hermitian, x .+ Dual.(0, rand.()) represents a small perturbation around x, and most such perturbed matrices will not be Hermitian. Using an algorithm which assumes they are will tend to give zero gradients in some directions. This seems exactly analogous to checking det(x) == 0 etc.

Hermitian(x .+ Dual.(0, rand.())) is of course still Hermitian, and has fewer independent directions of perturbation.
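For instance (a quick illustration of the point above; the perturbation values are arbitrary):

```julia
using LinearAlgebra
using ForwardDiff: Dual

x = [1 0.5; 0.5 1]
xp = x .+ Dual.(0, rand.())  # an independent perturbation direction in every entry

ishermitian(xp)             # false: the perturbations don't respect the symmetry
ishermitian(Hermitian(xp))  # true by construction, with fewer free directions
```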

@torfjelde
Contributor Author

torfjelde commented Nov 11, 2022

While x really is Hermitian, x .+ Dual.(0, rand.()) represents a small perturbation around x, and most such perturbed matrices will not be Hermitian.

I'm with you on this, but at the same time it seems quite detrimental that we can no longer compute gradients through methods such as cholesky using ForwardDiff.jl.

EDIT: I'm aware that we can just pass check=false, but it still seems quite annoying in practice 😕

@torfjelde
Contributor Author

I see there have been quite extensive discussions about this change, so I'll stop complaining about the resulting inconvenience 👍

But do we have an idea of the best approach to deal with all of the downstream breaking code?

@devmotion
Member

While x really is Hermitian, x .+ Dual.(0, rand.()) represents a small perturbation around x, and most such perturbed matrices will not be Hermitian. Using an algorithm which assumes they are will tend to give zero gradients in some directions. This seems exactly analogous to checking det(x) == 0 etc.

I'm with you regarding this argument. But I think it is problematic that 1) such a major change was released in a non-breaking release and breaks many downstream packages and applications, and 2) it seems inconsistent with e.g. ChainRules, which allows computing derivatives of cholesky(::StridedMatrix): https://github.com/JuliaDiff/ChainRules.jl/blob/1f4a8a9d86c79f024a911f61aa180bdc094bb8a3/src/rulesets/LinearAlgebra/factorization.jl#L491

@mcabbott
Member

deal with all of the downstream breaking code

breaks many downstream packages and applications

That's a lot of plurals, but so far one link. Is the problem only cholesky?

Are there any examples where cholesky is used by some meta-algorithm behind a check like ishermitian, or must it always be explicitly called? Knowingly calling it should perhaps be regarded as an assertion that you wish to constrain x to the subspace, like writing Hermitian(x).

julia> x = [1 0.5; 0.5 1]; #  as above

julia> ForwardDiff.jacobian(x -> x \ [1,-2], x)  # without constraining x
2×4 Matrix{Float64}:
 -3.55556   1.77778   4.44444  -2.22222
  1.77778  -3.55556  -2.22222   4.44444

julia> x \ [1,-2] ≈ cholesky(x) \ [1,-2]
true

julia> ForwardDiff.jacobian(x -> cholesky(x) \ [1,-2], x)
2×4 Matrix{Float64}:  # before, implicit constraint
 -3.55556  0.0   6.22222  -2.22222
  1.77778  0.0  -5.77778   4.44444
# after, an error.

julia> ForwardDiff.jacobian(x -> Hermitian(x) \ [1,-2], x)  # with explicit constraint
2×4 Matrix{Float64}:
 -3.55556  0.0   6.22222  -2.22222
  1.77778  0.0  -5.77778   4.44444

julia> ForwardDiff.jacobian(x -> cholesky(Hermitian(x)) \ [1,-2], x)  # with explicit constraint
2×4 Matrix{Float64}:  # still works after PR
 -3.55556  0.0   6.22222  -2.22222
  1.77778  0.0  -5.77778   4.44444

A nearby example of checks working as intended is:

julia> ForwardDiff.jacobian(x -> x \ [1,-2], [1 0; 0 0.1])
2×4 Matrix{Float64}:  # before, chooses the wrong algorithm (but not cholesky) hence imposes constraint:
 -1.0  -0.0  -0.0   -0.0
  0.0   0.0   0.0  200.0
2×4 Matrix{Float64}:  # after, avoids that (because istril(x) and istriu(x) are false on Dual):
 -1.0    0.0  20.0    0.0
  0.0  -10.0   0.0  200.0

julia> FiniteDifferences.jacobian(central_fdm(5, 1), x -> x \ [1,-2], [1 0; 0 0.1])[1]
2×4 Matrix{Float64}:
 -1.0           6.79813e-15  20.0            1.14103e-12
  6.1814e-13  -10.0           3.81529e-14  200.0

@torfjelde
Contributor Author

I think it's fair to assume that this one particular test is not the only case where people are using ForwardDiff + cholesky, even if you exclude all the Turing-models using Wishart (of which I know there are way more than just this one test).

Are there any examples where cholesky is used by some meta-algorithm behind a check like ishermitian, or must it always be explicitly called? Knowingly calling it should perhaps be regarded as an assertion that you wish to constrain x to the subspace, like writing Hermitian(x).

I'm a bit uncertain what you're asking here tbh. But I think it all just comes down to the following question: do you not think we should be able to differentiate through cholesky?

If yes, then we need checkpositivedefinite to return true if value.(x) is PD. Yes, you can wrap it in Hermitian, or just pass check=false, but that implies that you know it's Hermitian, meaning that, unless you constructed the matrix yourself, you need to call checkpositivedefinite first.

@mcabbott
Member

mcabbott commented Nov 12, 2022

I'm a bit uncertain what you're asking here

I went looking and found many checks of what algorithm to use based on istriu etc. I am literally asking whether there are such paths which use ishermitian and lead to cholesky. In LinearAlgebra or elsewhere. And if there are, whether they have additional non-error paths for generic matrices.

I ask this to get an idea of the lay of the land, since probably you know this better. I have never wanted to have anything to do with cholesky except for fixing problems like this.

should be able to differentiate through cholesky

Maybe? I have not formed a very clear opinion.

then we need checkpositivedefinite to return true

Not obviously. We could also have a method like cholesky(::AbstractArray{Dual}) which does not check.

Overloading isposdef or ishermitian to always ignore Dual means that things unrelated to Cholesky will get those versions.
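For concreteness, such a blanket overload (sketched here for illustration; it is not something being proposed) would look like:

```julia
using LinearAlgebra
using ForwardDiff: Dual, value

# Hypothetical blanket overload: compare only the primal values.
# This would restore the pre-#481 behaviour everywhere ishermitian is
# consulted, including code paths that have nothing to do with cholesky.
LinearAlgebra.ishermitian(A::AbstractMatrix{<:Dual}) = ishermitian(value.(A))
```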

but that implies that you know it's Hermitian, meaning that, unless you constructed the matrix yourself, you need to call checkpositivedefinite first

Yes. I think the question is what should happen if this matrix (which you didn't construct yourself) is not positive definite. Do you want an error? Or is there another path in your function?

  • If there is another path, then the claim of 481 is that AD ought to take that. Your function knows about dimensions outside of Hermitian, and AD ought to see them. The check is simply about what algorithm to use internally, not part of your function's contract.

  • If there is no other path, and the domain of your function is Hermitian matrices, it just happens to accept Matrix as a convenience. Then I guess it makes more sense to project the gradient onto that subspace.

The awkward thing is that I don't think it's easy for ForwardDiff to distinguish these situations. Is this "if" statement choosing an algorithm, or checking that input is legal?

The one clean way seems to be to push this to the type level. The idiom is: don't check isinteger(x) inside a function and throw; rather, define the function for ::Integer and let Julia throw a MethodError for you. But presumably it's hard to teach the whole world to call cholesky(Hermitian(x)).

@devmotion
Member

Maybe the cleanest and safest way would be to just fix and implement cholesky similarly to ChainRules (which is also what ForwardDiff returned in previous versions), and only turn to more general approaches (which could have other side effects) if it turns out that they are really needed?

Not being able to use cholesky(x) with ForwardDiff and x::StridedMatrix is a major issue for Turing and anyone who wants to use cholesky, so maybe we should not try to come up with a solution to every possible issue around cholesky but just fix these regressions or revert the PR.

ChainRules (and hence also Zygote) already decided how to handle these cases. A comparison of ForwardDiff 0.10.32, 0.10.33, and ChainRules/Zygote:

ForwardDiff 0.10.32:

julia> using ForwardDiff, LinearAlgebra

julia> x = [1 0.5; 0.5 1]
2×2 Matrix{Float64}:
 1.0  0.5
 0.5  1.0

julia> cholesky(x)
Cholesky{Float64, Matrix{Float64}}
U factor:
2×2 UpperTriangular{Float64, Matrix{Float64}}:
 1.0  0.5
  ⋅   0.866025

julia> f(x) = sum(cholesky(x).U);

julia> ForwardDiff.gradient(f, x)
2×2 Matrix{Float64}:
 0.394338  0.42265
 0.0       0.57735

julia> ForwardDiff.gradient(f, Symmetric(x))
ERROR: ArgumentError: Cannot set a non-diagonal index in a symmetric matrix
...

julia> ForwardDiff.gradient(f, Hermitian(x))
ERROR: ArgumentError: Cannot set a non-diagonal index in a Hermitian matrix
...

julia> g(x) = sum(cholesky(Hermitian(x)).L);

julia> ForwardDiff.gradient(g, x)
2×2 Matrix{Float64}:
 0.394338  0.42265
 0.0       0.57735

julia> ForwardDiff.gradient(g, Symmetric(x))
ERROR: ArgumentError: Cannot set a non-diagonal index in a symmetric matrix
...

julia> ForwardDiff.gradient(g, Hermitian(x))
ERROR: ArgumentError: Cannot set a non-diagonal index in a Hermitian matrix
...

ForwardDiff 0.10.33:

julia> using ForwardDiff, LinearAlgebra

julia> x = [1 0.5; 0.5 1]
2×2 Matrix{Float64}:
 1.0  0.5
 0.5  1.0

julia> cholesky(x)
Cholesky{Float64, Matrix{Float64}}
U factor:
2×2 UpperTriangular{Float64, Matrix{Float64}}:
 1.0  0.5
  ⋅   0.866025

julia> f(x) = sum(cholesky(x).U);

julia> ForwardDiff.gradient(f, x)
ERROR: PosDefException: matrix is not Hermitian; Cholesky factorization failed.
...

julia> ForwardDiff.gradient(f, Symmetric(x))
ERROR: ArgumentError: Cannot set a non-diagonal index in a symmetric matrix
...

julia> ForwardDiff.gradient(f, Hermitian(x))
ERROR: ArgumentError: Cannot set a non-diagonal index in a Hermitian matrix
...

julia> g(x) = sum(cholesky(Hermitian(x)).L);

julia> ForwardDiff.gradient(g, x)
2×2 Matrix{Float64}:
 0.394338  0.42265
 0.0       0.57735

julia> ForwardDiff.gradient(g, Symmetric(x))
ERROR: ArgumentError: Cannot set a non-diagonal index in a symmetric matrix
...

julia> ForwardDiff.gradient(g, Hermitian(x))
ERROR: ArgumentError: Cannot set a non-diagonal index in a Hermitian matrix
...

ChainRules 1.45.0:

julia> using ChainRules, LinearAlgebra

julia> x = [1 0.5; 0.5 1]
2×2 Matrix{Float64}:
 1.0  0.5
 0.5  1.0

julia> cholesky(x)
Cholesky{Float64, Matrix{Float64}}
U factor:
2×2 UpperTriangular{Float64, Matrix{Float64}}:
 1.0  0.5
  ⋅   0.866025

julia> ChainRules.rrule(cholesky, x, NoPivot())[1]
Cholesky{Float64, Matrix{Float64}}
U factor:
2×2 UpperTriangular{Float64, Matrix{Float64}}:
 1.0  0.5
  ⋅   0.866025

julia> ChainRules.rrule(cholesky, Symmetric(x), NoPivot())[1]
Cholesky{Float64, Matrix{Float64}}
U factor:
2×2 UpperTriangular{Float64, Matrix{Float64}}:
 1.0  0.5
  ⋅   0.866025

julia> ChainRules.rrule(cholesky, Hermitian(x), NoPivot())[1]
Cholesky{Float64, Matrix{Float64}}
U factor:
2×2 UpperTriangular{Float64, Matrix{Float64}}:
 1.0  0.5
  ⋅   0.866025

julia> Δ = Cholesky(UpperTriangular(ones(2, 2)))
Cholesky{Float64, Matrix{Float64}}
U factor:
2×2 UpperTriangular{Float64, Matrix{Float64}}:
 1.0  1.0
  ⋅   1.0

julia> ChainRules.rrule(cholesky, x, NoPivot())[2](Δ)[2]
2×2 UpperTriangular{Float64, Matrix{Float64}}:
 0.394338  0.42265
  ⋅        0.57735

julia> ChainRules.rrule(cholesky, Symmetric(x), NoPivot())[2](Δ)[2]
2×2 Symmetric{Float64, Matrix{Float64}}:
 0.394338  0.211325
 0.211325  0.57735

julia> ChainRules.rrule(cholesky, Hermitian(x), NoPivot())[2](Δ)[2]
2×2 Hermitian{Float64, Matrix{Float64}}:
 0.394338  0.211325
 0.211325  0.57735

Zygote 0.6.49:

julia> using Zygote, LinearAlgebra

julia> x = [1 0.5; 0.5 1]
2×2 Matrix{Float64}:
 1.0  0.5
 0.5  1.0

julia> cholesky(x)
Cholesky{Float64, Matrix{Float64}}
U factor:
2×2 UpperTriangular{Float64, Matrix{Float64}}:
 1.0  0.5
  ⋅   0.866025

julia> f(x) = sum(cholesky(x).U);

julia> only(Zygote.gradient(f, x))
2×2 UpperTriangular{Float64, Matrix{Float64}}:
 0.394338  0.42265
  ⋅        0.57735

julia> only(Zygote.gradient(f, Symmetric(x)))
2×2 Symmetric{Float64, Matrix{Float64}}:
 0.394338  0.211325
 0.211325  0.57735

julia> only(Zygote.gradient(f, Hermitian(x)))
2×2 Hermitian{Float64, Matrix{Float64}}:
 0.394338  0.211325
 0.211325  0.57735

julia> g(x) = sum(cholesky(Hermitian(x)).U);

julia> only(Zygote.gradient(g, x))
2×2 UpperTriangular{Float64, Matrix{Float64}}:
 0.394338  0.42265
  ⋅        0.57735

julia> only(Zygote.gradient(g, Symmetric(x)))
2×2 Symmetric{Float64, Matrix{Float64}}:
 0.394338  0.211325
 0.211325  0.57735

julia> only(Zygote.gradient(g, Hermitian(x)))
2×2 Hermitian{Float64, Matrix{Float64}}:
 0.394338  0.211325
 0.211325  0.57735

Interestingly, FiniteDifferences is broken for all examples (it errors or returns incorrect gradients).
FiniteDifferences:

julia> using FiniteDifferences, LinearAlgebra

julia> x = [1 0.5; 0.5 1]
2×2 Matrix{Float64}:
 1.0  0.5
 0.5  1.0

julia> cholesky(x)
Cholesky{Float64, Matrix{Float64}}
U factor:
2×2 UpperTriangular{Float64, Matrix{Float64}}:
 1.0  0.5
  ⋅   0.866025

julia> fdm = FiniteDifferences.central_fdm(5, 1);

julia> f(x) = sum(cholesky(x).U);

julia> only(FiniteDifferences.grad(fdm, f, x))
ERROR: PosDefException: matrix is not Hermitian; Cholesky factorization failed.
...

julia> only(FiniteDifferences.grad(fdm, f, Symmetric(x)))
2×2 Symmetric{Float64, Matrix{Float64}}:
 0.398523  0.41868
 0.41868   0.579496

julia> only(FiniteDifferences.grad(fdm, f, Hermitian(x)))
2×2 Hermitian{Float64, Matrix{Float64}}:
 0.398523  0.41868
 0.41868   0.579496

julia> g(x) = sum(cholesky(Hermitian(x)).U);

julia> only(FiniteDifferences.grad(fdm, g, x))
2×2 Matrix{Float64}:
 0.398523     0.41868
 6.01822e-15  0.579496

julia> only(FiniteDifferences.grad(fdm, g, Symmetric(x)))
2×2 Symmetric{Float64, Matrix{Float64}}:
 0.398523  0.41868
 0.41868   0.579496

julia> only(FiniteDifferences.grad(fdm, g, Hermitian(x)))
2×2 Hermitian{Float64, Matrix{Float64}}:
 0.398523  0.41868
 0.41868   0.579496

There is an open issue that discusses issues with positive semi-definite matrices (JuliaDiff/FiniteDifferences.jl#52).

@mcabbott
Member

mcabbott commented Nov 12, 2022

One idea is to overload ishermitian with some kind of depwarn which does the old thing, and tells you to put Hermitian there in future.

To try this out: the example in the top message doesn't work on any version, as the function doesn't make a scalar. Here's a version of it, and a comparison to finite differences:

julia> y = [1 0.3; 0.3 0.7]; ishermitian(y)
true

# cholesky example
# on v0.10.33, PosDefException: matrix is not Hermitian
# with depwarn, same result as v0.10.32

julia> ForwardDiff.gradient(x -> sum(sin, cholesky(x).U), y)
┌ Warning: ishermitian(A) was true for this matrix, but will become false due to Dual numbers.
│   caller = cholesky!(A::Matrix{ForwardDiff.Dual{ForwardDiff.Tag{var"#35#36", Float64}, Float64, 4}}, ::NoPivot; check::Bool) at cholesky.jl:296
└ @ LinearAlgebra ~/.julia/dev/julia/usr/share/julia/stdlib/v1.9/LinearAlgebra/src/cholesky.jl:296
2×2 Matrix{Float64}:
 0.16777  0.682544
 0.0      0.454654

julia> ForwardDiff.gradient(x -> sum(sin, cholesky(Hermitian(x)).U), y)  # silences warning
2×2 Matrix{Float64}:
 0.16777  0.682544
 0.0      0.454654

julia> Zygote.gradient(x -> sum(sin, cholesky(x).U), y)[1]  # agrees
2×2 UpperTriangular{Float64, Matrix{Float64}}:
 0.16777  0.682544
  ⋅       0.454654

julia> FiniteDifferences.grad(central_fdm(7, 1), x -> sum(sin, cholesky(x).U), y)[1]
ERROR: PosDefException: matrix is not Hermitian; Cholesky factorization failed.

julia> FiniteDifferences.grad(central_fdm(7, 1), x -> sum(sin, cholesky(Hermitian(x)).U), y)[1]
2×2 Matrix{Float64}:
 0.16777     0.682544
 1.9345e-16  0.454654

Looking for what else depends on ishermitian, here's another example with a matrix function. Notice that the old gradient from ForwardDiff has a zero, where FiniteDifferences instead produces the matrix symmetrised.

Notice also that inserting Hermitian wrapper here does not send ForwardDiff down a path it can handle, and never did so.

# cos example
# on v0.10.33, MethodError: no method matching exp!(::Matrix{Complex{ForwardDiff.Dual
# with depwarn, same result as v0.10.32

julia> g2 = ForwardDiff.gradient(x -> sum(cos(x)), y)
┌ Warning: ishermitian(A) was true for this matrix, but will become false due to Dual numbers.
│   caller = issymmetric at generic.jl:1177 [inlined]
└ @ Core ~/.julia/dev/julia/usr/share/julia/stdlib/v1.9/LinearAlgebra/src/generic.jl:1177
2×2 Matrix{Float64}:
 -0.989728  -1.81858
  0.0       -0.81771

julia> ForwardDiff.gradient(x -> sum(cos(Hermitian(x))), y)  # doesn't work, also fails on v0.10.32
ERROR: MethodError: no method matching eigen!(::Hermitian{ForwardDiff.Dual

julia> FiniteDifferences.grad(central_fdm(7, 1), x -> sum(cos(x)), y)[1]
2×2 Matrix{Float64}:
 -0.989728  -0.90929
 -0.90929   -0.81771

julia> (g2 + g2')./2
2×2 Matrix{Float64}:
 -0.989728  -0.90929
 -0.90929   -0.81771

julia> FiniteDifferences.grad(central_fdm(7, 1), x -> sum(cos(Hermitian(x))), y)[1]
2×2 Matrix{Float64}:
 -0.989728     -1.81858
 -8.01163e-16  -0.81771

@mcabbott
Member

mcabbott commented Nov 12, 2022

That's a lot of text; note that your f is different from the first message's f.

FiniteDifferences is broken for all examples (errors or returns incorrect gradients).

No, FiniteDifferences.grad(fdm, g, x) is returning the same result as pre-481 ForwardDiff.

I believe it is broken when it has Hermitian(x) as an argument. In my example immediately above, this makes a Hermitian and writes into it, rather than projecting onto the symmetric space:

julia> FiniteDifferences.grad(central_fdm(7, 1), x -> sum(cos(x)), Hermitian(y))[1]
2×2 Hermitian{Float64, Matrix{Float64}}:
 -0.989728  -1.81858
 -1.81858   -0.81771
 
julia> FiniteDifferences.grad(central_fdm(7, 1), x -> x[1,2] + x[2,1], Hermitian(y))[1]
2×2 Hermitian{Float64, Matrix{Float64}}:
 9.35703e-16  2.0
 2.0          9.35703e-16

At least this is wrong for reverse-mode AD, cotangents. I am less than 100% sure it's wrong for forward mode, tangents.

@mcabbott
Member

mcabbott commented Nov 12, 2022

safest way would be to just fix and implement cholesky similarly to ChainRules (which is also what ForwardDiff returned in previous versions)

Does this work? I am a little concerned it's going to introduce ambiguities. Also not sure it covers all cases:

@eval ForwardDiff begin
using LinearAlgebra: NoPivot, checksquare, checkpositivedefinite, Cholesky, BlasInt, Hermitian
function LinearAlgebra.cholesky!(A::AbstractMatrix{<:Dual}, ::NoPivot = NoPivot(); check::Bool = true)
    checksquare(A)
    if !ishermitian(value.(A))  # this check is quite expensive, could be done better?
        @info "path 1" check
        check && checkpositivedefinite(-1)
        return Cholesky(A, 'U', convert(BlasInt, -1))
    else
        @info "path 2"
        return cholesky!(Hermitian(A), NoPivot(); check = check)
    end
end
end

julia> ForwardDiff.gradient(x -> sum(sin, cholesky(x).U), y)  # matches Zygote
[ Info: path 2
2×2 Matrix{Float64}:
 0.16777  0.682544
 0.0      0.454654

@torfjelde
Contributor Author

torfjelde commented Nov 14, 2022

I am a little concerned it's going to introduce ambiguities.

The reason I didn't mention this approach is that I was thinking the same :/ It seems doomed to introduce a bunch of ambiguities, given the number of AbstractArray implementations floating around. But I don't currently see a better approach.

@torfjelde
Contributor Author

I ask this to get an idea of the lay of the land, since probably you know this better. I have never wanted to have anything to do with cholesky except for fixing problems like this.

Ah, gotcha! cholesky is used heavily (and explicitly) in many implementations of distributions, and there are many use-cases where you want to differentiate both with respect to realizations and the parameters of a distribution (any gradient-based sampler, MLE/MAP, etc.). In these cases you're usually updating these values, so you cannot guarantee that the resulting matrix is Hermitian at any given time, i.e. you cannot just do cholesky(Hermitian(x)) or cholesky(x, check=false), unfortunately.

@mcabbott
Member

Not a very careful check, but with the above code I wasn't sure whether Diagonal{Dual} would be ambiguous. It seems not to be:

julia> methods(cholesky!)
# 19 methods for generic function "cholesky!" from LinearAlgebra:
...
 [12] cholesky!(A::Diagonal)
     @ ~/.julia/dev/julia/usr/share/julia/stdlib/v1.9/LinearAlgebra/src/diagonal.jl:818
 [13] cholesky!(A::Diagonal, ::NoPivot; check)
     @ ~/.julia/dev/julia/usr/share/julia/stdlib/v1.9/LinearAlgebra/src/diagonal.jl:818
 [14] cholesky!(A::Diagonal, ::Val{false}; check)
     @ deprecated.jl:103
 [15] cholesky!(A::AbstractMatrix{<:Dual})
     @ ForwardDiff REPL[19]:3
...
 [18] cholesky!(A::AbstractMatrix)
     @ ~/.julia/dev/julia/usr/share/julia/stdlib/v1.9/LinearAlgebra/src/cholesky.jl:294

julia> cholesky!(Diagonal(rand(3) .+ Dual(0,1)))
Cholesky{Dual{Nothing, Float64, 1}, Diagonal{Dual{Nothing, Float64, 1}, Vector{Dual{Nothing, Float64, 1}}}}
U factor:
3×3 Diagonal{Dual{Nothing, Float64, 1}, Vector{Dual{Nothing, Float64, 1}}}:
 Dual{Nothing}(0.546063,0.915645)                         
                                                          
                                     Dual{Nothing}(0.909924,0.549496)
