Useful packages for (sparse) differentiation? #229

Open

gdalle opened this issue May 16, 2024 · 11 comments

gdalle commented May 16, 2024

Hey there friends of JSO!

@adrhill and I have been hard at work on two autodiff toolkits which you might find useful:

  • DifferentiationInterface.jl, a common interface to a wide range of autodiff packages, with support for derivatives, gradients, Jacobians, Hessians, JVPs, VJPs and HVPs. It also supports sparse autodiff with a coloring-based approach (a quick usage sketch follows right after this list).
  • SparseConnectivityTracer.jl, a much faster alternative to Symbolics.jl for sparsity pattern detection.
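
A minimal sketch of the intended usage, assuming recent versions where ADTypes' AutoSparse wrapper, SparseConnectivityTracer's TracerSparsityDetector and SparseMatrixColorings' GreedyColoringAlgorithm are available:

using DifferentiationInterface
using SparseConnectivityTracer: TracerSparsityDetector
using SparseMatrixColorings: GreedyColoringAlgorithm
import ForwardDiff

f(x) = sum(abs2, x)
g(x) = x .^ 2  # element-wise, so the Jacobian is diagonal
x = rand(5)

# dense usage
grad = gradient(f, AutoForwardDiff(), x)

# sparse usage: wrap a dense backend with a sparsity detector and a coloring algorithm
sparse_backend = AutoSparse(
    AutoForwardDiff();
    sparsity_detector = TracerSparsityDetector(),
    coloring_algorithm = GreedyColoringAlgorithm(),
)
J = jacobian(g, sparse_backend, x)  # returned as a sparse matrix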

We would be happy to discuss potential uses, and integration with your ecosystem. We're having similar talks with Optimization.jl and the SciML organization at large.

Ping @amontoison @abelsiqueira since we've already chatted a few times :)


amontoison commented May 17, 2024

@gdalle
I propose that we start by replacing SparseDiffTools.jl and Symbolics.jl with SparseConnectivityTracer.jl.

We just need to update sparse_sym.jl and sparse_diff_tools.jl.


gdalle commented May 17, 2024

Sounds good! We're currently in the middle of a big refactor of SparseConnectivityTracer.jl, so it will have to wait a few days.


adrhill commented May 17, 2024

The refactor has been merged and SparseConnectivityTracer v0.4.0 has just been tagged, so it's ready to be broken by you folks! 😁

PRs adding @test_broken tests are more than welcome.


amontoison commented May 18, 2024

@adrhill @gdalle
I started interfacing SparseConnectivityTracer.jl tonight and have some feedback:

  • Is SparseConnectivityTracer.jacobian_pattern(c!, cx, x0) more efficient than SparseConnectivityTracer.jacobian_pattern(c, x0)?
    Note that SparseConnectivityTracer.jacobian_pattern(c!, y, x) is not documented.
  • Is it possible to add an option to only return one triangle of the sparsity pattern of the Hessian? You could add a keyword argument with possible values F (default), L or U.
  • Can we exploit the symmetry of the Hessian to determine it more efficiently?
  • In optimization, when we have constraints we want the Hessian of the augmented Lagrangian, which means we want the sparsity structure of the Hessian of ℓ(x) = f(x) + dot(c(x), y). I get an error because SparseConnectivityTracer doesn't like dot. I can work around it with a sum.
  • Can you do better if we provide f and c instead of ℓ directly as input?
  • Can we compute the sparsity structures of the Jacobian of c(x) and the Hessian of the augmented Lagrangian ℓ(x) = f(x) + dot(c(x), y) together to get a speed-up?

MWE:

using SparseConnectivityTracer
using LinearAlgebra: dot

y = zeros(3)
x = rand(3)

function J(y, x)
  y[1] = x[1]^2
  y[2] = 2 * x[1] * x[2]^2
  y[3] = sin(x[3])
  return y
end

S = jacobian_pattern(J, y, x)


x = rand(3)
y = rand(2)

f(x) = 2 * x[1] * x[2]^2
c(x) = [x[3]^3; x[3]^2]
S = hessian_pattern(f, x)

ℓ(x) = f(x) + dot(c(x), y)
S = hessian_pattern(ℓ, x)
ERROR: MethodError: no method matching conj(::SparseConnectivityTracer.HessianTracer{BitSet, Set{Tuple{Int64, Int64}}})

Closest candidates are:
  conj(::Missing)
   @ Base missing.jl:101
  conj(::Complex)
   @ Base complex.jl:282
  conj(::SymTridiagonal)
   @ LinearAlgebra ~/julia/julia-1.10.3/share/julia/stdlib/v1.10/LinearAlgebra/src/tridiag.jl:160
  ...

Stacktrace:
 [1] dot(x::SparseConnectivityTracer.HessianTracer{BitSet, Set{Tuple{Int64, Int64}}}, y::Float64)
   @ LinearAlgebra ~/julia/julia-1.10.3/share/julia/stdlib/v1.10/LinearAlgebra/src/generic.jl:876
 [2] dot(x::Vector{SparseConnectivityTracer.HessianTracer{BitSet, Set{Tuple{Int64, Int64}}}}, y::Vector{Float64})
   @ LinearAlgebra ~/julia/julia-1.10.3/share/julia/stdlib/v1.10/LinearAlgebra/src/generic.jl:886
 [3] ℓ(x::Vector{SparseConnectivityTracer.HessianTracer{BitSet, Set{Tuple{Int64, Int64}}}})
   @ Main ./REPL[41]:1
 [4] trace_function(::Type{SparseConnectivityTracer.HessianTracer{BitSet, Set{Tuple{}}}}, f::typeof(ℓ), x::Vector{Float64})
   @ SparseConnectivityTracer ~/.julia/packages/SparseConnectivityTracer/JvHcU/src/pattern.jl:32
 [5] hessian_pattern(f::Function, x::Vector{Float64}, ::Type{BitSet}, ::Type{Set{Tuple{Int64, Int64}}})
   @ SparseConnectivityTracer ~/.julia/packages/SparseConnectivityTracer/JvHcU/src/pattern.jl:326
 [6] top-level scope
   @ REPL[42]:1
Some type information was truncated. Use `show(err)` to see complete types.

If I also preallocate the storage of c for the Hessian of the augmented Lagrangian, SCT.jl returns an error because the element type of the preallocated storage for c is not correct.

function compute_hessian_sparsity(f, nvar, c!, ncon)
  if ncon == 0
    x0 = rand(nvar)
    S = SparseConnectivityTracer.hessian_pattern(f, x0)
    return S
  else
    x0 = rand(nvar)
    y0 = rand(ncon)
    cx = similar(y0)
    ℓ(x) = f(x) + sum(c!(cx, x)[i] * y0[i] for i = 1:ncon)
    S = SparseConnectivityTracer.hessian_pattern(ℓ, x0)
    return S
  end
end
Basic Jacobian derivative with backend=ADNLPModels.SparseADJacobian and T=Float64: Error During Test at /home/alexis/Bureau/git/ADNLPModels.jl/test/sparse_jacobian.jl:6
  Got exception outside of a @test
  TypeError: in typeassert, expected Float64, got a value of type SparseConnectivityTracer.HessianTracer{BitSet, Set{Tuple{Int64, Int64}}}
  Stacktrace:
    [1] setindex!(A::Vector{Float64}, x::SparseConnectivityTracer.HessianTracer{BitSet, Set{Tuple{Int64, Int64}}}, i1::Int64)
      @ Base ./array.jl:1021
    [2] (::var"#c!#21")(cx::Vector{Float64}, x::Vector{SparseConnectivityTracer.HessianTracer{BitSet, Set{Tuple{Int64, Int64}}}})
      @ Main ~/Bureau/git/ADNLPModels.jl/test/sparse_jacobian.jl:10
    [3] (::ADNLPModels.var"#35#37"{Vector{SparseConnectivityTracer.HessianTracer{BitSet, Set{Tuple{Int64, Int64}}}}, var"#c!#21", Vector{Float64}, Vector{Float64}})(i::Int64)

Maybe the best solution is to not provide c in-place?


tmigot commented May 18, 2024

Hey @gdalle @adrhill! I love the idea, and I agree with @amontoison that the sparse Jacobian/Hessian is a good starting point.

For DifferentiationInterface.jl, the best would be if you could provide an example that does the following:

T = Float64
n = 2
x0 = rand(T, n)
f(x) = sum(x)  # or something
function c!(cx, x)
    cx[1] = x[1]^2 + x[2]^2 + x[1] * x[2]
    return cx
end

# a) compute gradient in-place
gx = similar(x0)
# ... ∇f(x)

# b) Hessian-vector product in-place
v = ones(2)
# ... ∇²f(x) * v

# c) Jacobian-vector product
# ∇c!(cx, x) * v

# d) transpose Jacobian-vector product
# ∇c!(cx, x)ᵀ * v

# e) Lagrangian Hessian-vector product
# ... (∇²f(x) + Σᵢ yᵢ ∇²cᵢ(x)) * v

Usually, we store everything needed to compute the derivatives in a backend that is initialized with the model. The assumption is that the derivatives will be computed many times.


gdalle commented May 18, 2024

Here are some answers to your questions:

Is SparseConnectivityTracer.jacobian_pattern(c!, cx, x0) more efficient than SparseConnectivityTracer.jacobian_pattern(c, x0)?
Note that SparseConnectivityTracer.jacobian_pattern(c!, y, x) is not documented.

The rationale behind it is that f!(y, x) is more efficient than y = f(x) for array-to-array functions. Thus, some functions are implemented in-place, and we need to support them in Jacobian sparsity detection. However, the actual gain of efficiency within the sparsity detection is minimal, since lots of allocations occur anyway due to set manipulations.

Is it possible to add an option to only return one triangle of the sparsity pattern of the Hessian? You could add a keyword argument with possible values F (default), L or U.
Can we exploit the symmetry of the Hessian to determine it more efficiently?

At the moment we do not exploit the symmetry of the Hessian. Internally, the Hessian pattern is represented as a Set{Tuple{Int,Int}}, and we make no effort to store only one triangle. It's "only" a factor-2 efficiency gain, so it's clearly something we would like to do, but:

  • it would complicate the code, which is still very much in flux;
  • there are other optimizations with much larger potential gains that we want to try first.
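
In the meantime, a caller can always discard one triangle after the fact. A minimal sketch, assuming the pattern comes back as a sparse Boolean matrix:

using SparseConnectivityTracer
using LinearAlgebra: tril

f(x) = x[1] * x[2] + x[3]^2
H = hessian_pattern(f, rand(3))  # full symmetric pattern
L = tril(H)                      # keep only the lower triangle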

In optimization, when we have constraints we want the Hessian of the augmented Lagrangian, which means we want the sparsity structure of the Hessian of ℓ(x) = f(x) + dot(c(x), y). I get an error because SparseConnectivityTracer doesn't like dot. I can work around it with a sum.

Essentially we need to add some array rules to SparseConnectivityTracer, and dot will be one of the very first.
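
Until then, a one-line workaround on the user side is to spell out the sum so that only scalar rules are needed (tracer_dot is just an illustrative name, not an SCT function):

tracer_dot(a, b) = sum(a[i] * b[i] for i in eachindex(a, b))

# usage: ℓ(x) = f(x) + tracer_dot(c(x), y)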

In optimization, when we have constraints we want the Hessian of the augmented Lagrangian, which means we want the sparsity structure of the Hessian of ℓ(x) = f(x) + dot(c(x), y). Can you do better if we provide f and c instead of ℓ directly as input? Can we compute the sparsity structures of the Jacobian of c(x) and the Hessian of the augmented Lagrangian ℓ(x) = f(x) + dot(c(x), y) together to get a speed-up?

For SparseConnectivityTracer you can always detect the sparsity patterns of f and c separately, then combine them. But of course this won't be done automatically by the package, so you need to assemble the parts manually, for instance as sketched below.
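
A possible sketch of that manual assembly, assuming the patterns are returned as sparse Boolean matrices: take the union of the objective's Hessian pattern and the Hessian pattern of the summed constraints, which over-approximates the Lagrangian Hessian pattern for arbitrary multipliers.

using SparseConnectivityTracer

f(x) = 2 * x[1] * x[2]^2
c(x) = [x[3]^3; x[3]^2]
x0 = rand(3)

Jc = jacobian_pattern(c, x0)              # Jacobian pattern of the constraints
Hf = hessian_pattern(f, x0)               # Hessian pattern of the objective
Hc = hessian_pattern(x -> sum(c(x)), x0)  # union of the constraint Hessian patterns
Hl = Hf .| Hc                             # pattern of ∇²ℓ(x) = ∇²f(x) + Σᵢ yᵢ ∇²cᵢ(x)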

For DifferentiationInterface, @timholy asked a similar question a week ago and had some suggestions. Feel free to weigh in on those discussions.

If I also preallocate the storage of c for the Hessian of the augmented Lagrangian, SCT.jl returns an error because the element type of the preallocated storage for c is not correct. Maybe the best solution is to not provide c in-place?

Don't provide c in-place, because SparseConnectivityTracer uses custom number types and operator overloading (like ForwardDiff): a preallocated Float64 buffer cannot hold the tracer values. A possible wrapper is sketched below.
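
If only an in-place c! is available, one option is to wrap it in an out-of-place function whose output buffer follows the element type of the input, so that tracer numbers can flow through (out_of_place is just an illustrative name, not an SCT or ADNLPModels API):

function out_of_place(c!, ncon)
    return x -> begin
        cx = similar(x, ncon)  # eltype follows x, so tracer types are accepted
        c!(cx, x)
        cx
    end
end

# usage sketch:
# c = out_of_place(c!, ncon)
# S = hessian_pattern(x -> f(x) + sum(c(x) .* y0), x0)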


gdalle commented May 18, 2024

@tmigot here's an example doing everything you asked. The mantra of DifferentiationInterface is "prepare once, reuse multiple times", so it fits well with an optimization point of view.
Note that although the differentiation overwrites the storage you provide, we cannot guarantee that other allocations won't occur.

using DifferentiationInterface
import Enzyme, ForwardDiff, Zygote

forward_backend = AutoForwardDiff()
reverse_backend = AutoEnzyme(Enzyme.Reverse)  # supports mutation
second_order_backend = SecondOrder(AutoForwardDiff(), AutoZygote())

T = Float64
n = 2  # variables
m = 1  # constraints
x0 = rand(T, n)
v0 = ones(T, n)  # direction vector in input space, used below for HVPs and pushforwards

obj(x) = sum(x)

function cons!(c, x)
    c[1] = sum(abs2, x) - one(eltype(x))
    return nothing
end

# a) gradient

extras_a = prepare_gradient(obj, reverse_backend, x0)
g0 = similar(x0)
gradient!(obj, g0, reverse_backend, x0, extras_a)

# b) Hessian-vector product

extras_b = prepare_hvp(obj, second_order_backend, x0, v0)
p0 = similar(x0)
hvp!(obj, p0, second_order_backend, x0, v0, extras_b)

# c) Jacobian-vector product

c0 = ones(T, m)
extras_c = prepare_pushforward(cons!, c0, forward_backend, x0, v0)
p0 = similar(c0)
pushforward!(cons!, c0, p0, forward_backend, x0, v0, extras_c)

# d) transpose Jacobian-vector product

w0 = ones(T, m)  # the pullback seed lives in the output (constraint) space, hence length m
extras_d = prepare_pullback(cons!, c0, reverse_backend, x0, w0)
p0 = similar(x0)
pullback!(cons!, c0, p0, reverse_backend, x0, w0, extras_d)

# e) Lagrangian Hessian-vector product

# not sure what you mean by this one, can either be obtained by summing the HVPs of `f` and `ci` or by taking the HVP of the Lagrangian as a whole
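
# One possible sketch for (e): fix the multipliers y0 and take the HVP of the scalar
# Lagrangian as a whole, reusing the same prepare_hvp/hvp! calls as in (b). An out-of-place
# constraint function is used because Zygote (inside second_order_backend) does not support
# mutation; `cons_oop`, `lag` and `y0` are illustrative names, not part of any API.

y0 = ones(T, m)                                # assumed fixed multipliers
cons_oop(x) = [sum(abs2, x) - one(eltype(x))]  # same constraint as cons!, out-of-place
lag(x) = obj(x) + sum(y0 .* cons_oop(x))       # ℓ(x) = f(x) + yᵀ c(x)

extras_e = prepare_hvp(lag, second_order_backend, x0, v0)
pL = similar(x0)
hvp!(lag, pL, second_order_backend, x0, v0, extras_e)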

I think, as above, the main point of discussion will be how to handle the Lagrangian aspect.


gdalle commented Jun 27, 2024

@amontoison as of v0.5.6 (released this morning), DifferentiationInterface supports vector-mode pushforwards, pullbacks and HVPs, which makes Jacobians and Hessians much faster. It only works with ForwardDiff for now, but I want to add Enzyme soon.
That was the last ingredient necessary for DI to be feature-complete, at least for one-argument functions. From now on, if something doesn't work well, it will likely be due to a missing or suboptimal implementation, but not to a hole in the design.
I think we should start discussing its integration in ADNLPModels. The hardest remaining issue will be handling the Lagrangian, which takes 2 inputs ($x$ and $\lambda$), but I would very much benefit from hashing out possible solutions with you.

@amontoison

@gdalle We can have a meeting next week, if you are available, to talk about it.


gdalle commented Jul 3, 2024

Unfortunately I'll be at JuliaCon all week! The week after?

@amontoison

Yes, perfect.
Enjoy JuliaCon 2024!
