
Start using DifferentiationInterface #140

Conversation

@gdalle (Contributor) commented May 29, 2024

This PR is the beginning of a solution for #25. It shows how you can use DifferentiationInterface to compute derivatives in a backend-agnostic fashion.
It seems mergeable to me, but here are some more things you can do to increase performance:

  • Figure out if you need several backend objects or just one. For instance, derivatives are more efficient with a forward-mode backend, while gradients are usually more efficient with a reverse-mode backend.
  • Adjust the public API so that you can pass the backend object(s) from the user-facing functions all the way down to the utility functions.
  • Decide if you can reuse the extras that arise from the preparation step, and if so, adjust the code to initialize them once and then pass them around.
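For reference, here is a minimal sketch of the backend-agnostic calls involved (illustrative functions and values, not the actual CTBase code):

using DifferentiationInterface
import ForwardDiff, Zygote

f(x) = sum(abs2, x)

# the same call works with any backend object
gradient(f, AutoForwardDiff(), rand(3))
gradient(f, AutoZygote(), rand(3))

g(t) = cos(t) * exp(t)
derivative(g, AutoForwardDiff(), 1.0)   # scalar derivative: forward mode is a natural fit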

Ping @ocots @jbcaillau

@jbcaillau changed the base branch from main to differentiationinterface on May 29, 2024, 21:10
@jbcaillau (Member)

Hi @gdalle; many thanks for the PR!

This PR is the beginning of a solution for #25. It shows how you can use DifferentiationInterface to compute derivatives in a backend-agnostic fashion. It seems mergeable to me, but here are some more things you can do to increase performance:

  • Figure out if you need several backend objects or just one. For instance, derivatives are more efficient with a forward-mode backend, while gradients are usually more efficient with a reverse-mode backend.

Sure. Right now forward mode is OK for the internal AD we need (mostly taking gradients of functions with fewer than ~100 variables). There would be good reasons to switch to reverse mode, though. To be tested.

  • Adjust the public API so that you can pass the backend object(s) from the user-facing functions all the way down to the utility functions.

Yes. Right now, defining a single default __auto = AutoForwardDiff can do the job, but good point.
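A sketch of what threading the backend through the API could look like (hypothetical function names, only the pattern matters):

using DifferentiationInterface
import ForwardDiff

const __auto = AutoForwardDiff()   # single package-wide default backend

# user-facing function: accepts a backend and defaults to __auto
solve_something(f, x; backend = __auto) = _utility(f, x, backend)

# utility function: receives the backend instead of hard-coding ForwardDiff
_utility(f, x, backend) = gradient(f, backend, x)

solve_something(x -> sum(abs2, x), rand(3))                                          # default backend
solve_something(x -> sum(abs2, x), rand(3); backend = AutoForwardDiff(chunksize=1))  # override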

  • Decide if you can reuse the extras that arise from the preparation step, and if so, adjust the code to initialize them once and then pass them around.

Just had a look at the DifferentiationInterface docs: very nice mechanism. It will be very interesting to use when (i) solving an ADNLPModel in CTDirect.jl (iterative calls to gradients of the objective and constraints), and (ii) solving a shooting equation (Newton-like method).
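A minimal sketch of that reuse pattern for iterative calls (note: the exact place of the extras argument has changed across DifferentiationInterface versions; this follows the API current at the time of this thread):

using DifferentiationInterface
import ForwardDiff

f(x) = sum(abs2, x)
backend = AutoForwardDiff()
x = rand(50)

extras = prepare_gradient(f, backend, x)   # pay the setup cost once
for k in 1:1000                            # e.g. iterations of an NLP or Newton solver
    g = gradient(f, backend, x, extras)    # reuse extras on every call
    # ... use g in the solver update ...
end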

@gdalle (Contributor, Author) commented May 30, 2024

I made the changes. Not sure why CI failed the first time, but the tests pass locally (at least on Julia 1.10). I guess it's related to your local registry.

@jbcaillau (Member) left a comment

Thanks @gdalle, your suggestion to change the API is indeed the right one to get a full parametrisation by a single backend. The more sophisticated alternative would be to allow specifying and combining several backends (e.g. to compute second-order derivatives).

@gdalle (Contributor, Author) commented May 30, 2024

The more sophisticated alternative would be to allow specifying and combining several backends (e.g. to compute second-order derivatives).

Yes, in the general case you may want to let the user specify

  • a forward mode backend for scalar derivatives
  • a reverse mode backend for gradients
  • a (sparse) forward mode backend for Jacobians
  • a (sparse) forward-over-reverse DifferentiationInterface.SecondOrder backend for Hessians

But for problems with <100 variables, I think ForwardDiff is close to optimal in all of these settings, provided you reuse the preparation step.
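A sketch of mixing backends per operator (the sparse variants additionally need sparsity-detection and coloring packages, omitted here):

using DifferentiationInterface
import ForwardDiff, Zygote

f(x) = sum(abs2, x) + prod(x)
x = rand(5)

grad_backend = AutoZygote()                                   # reverse mode for gradients
hess_backend = SecondOrder(AutoForwardDiff(), AutoZygote())   # forward-over-reverse for Hessians

g = gradient(f, grad_backend, x)
H = hessian(f, hess_backend, x)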

@gdalle (Contributor, Author) commented May 30, 2024

Did you figure out why CI fails?

@jbcaillau merged commit 7440621 into control-toolbox:differentiationinterface on May 30, 2024
0 of 4 checks passed
@ocots (Member) commented May 30, 2024

Did you figure out why CI fails?

Maybe the file .github/workflows/CI.yml has to be updated:

  • uses: actions/checkout@v4
  • uses: julia-actions/add-julia-registry@v2

    steps:
      - uses: actions/checkout@v4
      - uses: julia-actions/setup-julia@latest
        with:
          version: ${{ matrix.version }}
          arch: ${{ matrix.arch }}
      - uses: julia-actions/add-julia-registry@v2
        with:
          key: ${{ secrets.SSH_KEY }}
          registry: control-toolbox/ct-registry
      - uses: julia-actions/julia-runtest@latest
      - uses: julia-actions/julia-uploadcodecov@latest
        env:
          CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}

@ocots (Member) commented May 30, 2024

In the package CTFlows.jl, we build the right-hand side of an ODE by computing derivatives of a function. See here.

Here is the piece of code:

function rhs(h::AbstractHamiltonian)
    function rhs!(dz::DCoTangent, z::CoTangent, v::Variable, t::Time)
        n      = size(z, 1) ÷ 2
        foo(z) = h(t, z[rg(1,n)], z[rg(n+1,2n)], v)
        dh     = ctgradient(foo, z)
        dz[1:n]    =  dh[n+1:2n]   # ẋ =  ∂h/∂p
        dz[n+1:2n] = -dh[1:n]      # ṗ = -∂h/∂x
    end
    return rhs!
end

If we want to use the preparation step, should we keep track of extras inside rhs or ctgradient?

function rhs(h::AbstractHamiltonian)

    # compute preparation step here to get extras?

    function rhs!(dz::DCoTangent, z::CoTangent, v::Variable, t::Time)
        n      = size(z, 1) ÷ 2
        foo(z) = h(t, z[rg(1,n)], z[rg(n+1,2n)], v)

        dh     = ctgradient(foo, z)  # use extras as an argument to ctgradient?

        dz[1:n]    =  dh[n+1:2n]
        dz[n+1:2n] = -dh[1:n]
    end
    return rhs!
end

@jbcaillau (Member)

@ocots side note: any reason to write this

        foo(z) = h(t, z[rg(1,n)], z[rg(n+1,2n)], v)

instead of

        foo(z) = h(t, z[1:n], z[n+1:2n], v)

@jbcaillau (Member)

The more sophisticated alternative would be to allow specifying and combining several backends (e.g. to compute second-order derivatives).

Yes, in the general case you may want to let the user specify

  • a forward mode backend for scalar derivatives
  • a reverse mode backend for gradients
  • a (sparse) forward mode backend for Jacobians
  • a (sparse) forward-over-reverse DifferentiationInterface.SecondOrder backend for Hessians

✅ check this WIP

But for problems with <100 variables, I think ForwardDiff is close to optimal in all of these settings, provided you reuse the preparation step.

@jbcaillau (Member)

Checks passed, well done @ocots 👍🏽! Please document somewhere what you did to solve the CI issue (Error: Input required and not supplied: key).

Did you figure out why CI fails?

Maybe the file .github/workflows/CI.yml has to be updated:

@gdalle (Contributor, Author) commented May 30, 2024

If we want to use the preparation step, should we keep track of extras inside rhs or ctgradient?

The general rules of preparation are given in this section of the docs. See in particular the paragraph on reusing extras.

The trouble here is that you need the function foo to prepare the gradient operator. If it is generated inside rhs! as a closure, then it seems you cannot do much preparation at all.
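One possible restructuring, sketched under a backend-specific assumption: for AutoForwardDiff() the gradient extras essentially depend only on the input size, so preparing once with a representative closure (dummy t and v) and reusing the extras inside rhs! will typically work. DifferentiationInterface only guarantees preparation for the same function, so treat this as an assumption to verify, not an official recipe:

function rhs(h::AbstractHamiltonian, backend, z0, v0, t0)
    n = size(z0, 1) ÷ 2
    foo0(z) = h(t0, z[rg(1, n)], z[rg(n+1, 2n)], v0)
    extras = prepare_gradient(foo0, backend, z0)   # preparation done once

    function rhs!(dz::DCoTangent, z::CoTangent, v::Variable, t::Time)
        foo(z) = h(t, z[rg(1, n)], z[rg(n+1, 2n)], v)
        dh = gradient(foo, backend, z, extras)     # reuse extras (see assumption above)
        dz[1:n]    =  dh[n+1:2n]
        dz[n+1:2n] = -dh[1:n]
        return nothing
    end
    return rhs!
end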

@ocots (Member) commented May 30, 2024

@ocots side note: any reason to write this

        foo(z) = h(t, z[rg(1,n)], z[rg(n+1,2n)], v)

instead of

        foo(z) = h(t, z[1:n], z[n+1:2n], v)

I think it is because of the scalar case.

(Screenshot attached, 2024-05-30.)

@ocots (Member) commented May 30, 2024

Checks passed, well done @ocots 👍🏽! Please document somewhere what you did to solve the CI issue (Error: Input required and not supplied: key).

Did you figure out why CI fails?

Maybe the file .github/workflows/CI.yml has to be updated:

See here.

@jbcaillau (Member)

Checks passed, well done @ocots 👍🏽! Please document somewhere what you did to solve the CI issue (Error: Input required and not supplied: key).

Did you figure out why CI fails?

Maybe the file .github/workflows/CI.yml has to be updated:

See here.

🙏🏽 It reads: "If your package depends on private packages registered in a private registry..." Indeed, another reason for the upcoming move to the General registry 🤞🏾

@jbcaillau (Member)

@ocots side note: any reason to write this

        foo(z) = h(t, z[rg(1,n)], z[rg(n+1,2n)], v)

instead of

        foo(z) = h(t, z[1:n], z[n+1:2n], v)

I think it is because of the scalar case.

Oh right, again: there is no z[1:1] on a scalar.
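For the record, a plausible definition consistent with that remark (an assumption, not necessarily the actual CTBase code): rg returns a plain integer when the range has length one, so indexing still works when z is a scalar.

rg(i::Int, j::Int) = i == j ? i : (i:j)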

@ocots (Member) commented May 30, 2024

If we want to use the preparation step, should we keep track of extras inside rhs or ctgradient?

The general rules of preparation are given in this section of the docs. See in particular the paragraph on reusing extras.

The trouble here is that you need the function foo to prepare the gradient operator. If it is generated inside rhs! as a closure, then it seems you cannot do much preparation at all.

Thanks for the links. I see your point.

Is there no way to differentiate a parametric function f(x, p) = y with respect to x and still benefit from a preparation step?

@gdalle (Contributor, Author) commented May 30, 2024

There is no possibility to differentiate a parametric function f(x, p) = y with respect to x and do a preparation step also?

DifferentiationInterface was built to support 1-argument functions f(x) = y or f!(y, x). At the moment it does not support multiple arguments, and I don't think it ever will. The reason is that many AD backends themselves (like ForwardDiff) don't support multiple arguments.
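The usual workaround is to close over the parameter and hand DifferentiationInterface a one-argument function (illustrative sketch):

using DifferentiationInterface
import ForwardDiff

f(x, p) = p * sum(abs2, x)

backend = AutoForwardDiff()
p = 2.0
x = rand(4)

g = gradient(x -> f(x, p), backend, x)   # differentiate with respect to x only, p held fixed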
