
Update gradient interface, support AbstractDifferentiation #90

Merged: 25 commits merged into JuliaFirstOrder:master from the new-gradients branch on Jan 21, 2024.

Conversation

@lostella (Member) commented on Jan 6, 2024:

  1. Remove the dependency on Zygote in favor of AbstractDifferentiation (#71 "Autodiff backends", #80 "Remove Zygote dependency by switching to AbstractDifferentiation", cc @gdalle)
  2. Update the gradient-computation interface used by algorithms: see value_and_pullback (similar to AbstractDifferentiation's value_and_pullback_function); a sketch follows this list
  3. Update the stepsize backtracking routines so that gradients are only evaluated once a step is accepted (made possible by the change in 2.); this should speed things up, although it still needs benchmarking
  4. Update tests and documentation
  5. Add a justfile (see just)
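
To make items 2 and 3 concrete, here is a minimal, self-contained sketch of how the new primitive composes with a backtracking loop. The value_and_pullback_sketch helper mirrors the snippet reviewed further down in this thread; the backtracking condition, the ZygoteBackend choice, and all *_sketch names are illustrative assumptions, not code from this PR.

using AbstractDifferentiation, Zygote, LinearAlgebra

# Mirror of the new primitive: return f(x) together with a zero-argument
# closure that evaluates the gradient of f at x only when called.
function value_and_pullback_sketch(backend, f, x)
    fx, pb = AbstractDifferentiation.value_and_pullback_function(backend, f, x)
    return fx, () -> pb(one(fx))[1]
end

# Item 3 in a nutshell: inside the backtracking loop only function values at
# trial points are needed; the gradient at the accepted point is computed
# once, at the very end, by calling the closure.
function backtracking_step_sketch(backend, f, x, fx, grad; gamma = 1.0, beta = 0.5)
    z = x - gamma * grad
    fz, grad_closure = value_and_pullback_sketch(backend, f, z)
    while fz > fx + real(dot(grad, z - x)) + norm(z - x)^2 / (2 * gamma)
        gamma *= beta
        z = x - gamma * grad
        fz, grad_closure = value_and_pullback_sketch(backend, f, z)
    end
    return gamma, z, fz, grad_closure()  # the only gradient evaluation at z
end

backend = AbstractDifferentiation.ZygoteBackend()
f = x -> sum(abs2, x) / 2
x = ones(2)
fx, cl = value_and_pullback_sketch(backend, f, x)
gamma, z, fz, grad_z = backtracking_step_sketch(backend, f, x, fx, cl())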

Work in progress, opening for feedback and to take a look at CI workflows.

TODOs

  • expand on AD in README and docs landing page
  • update benchmarks code
  • benchmark changes to backtracking


codecov bot commented Jan 6, 2024

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (4d07b28) 89.84% compared to head (fd98409) 89.03%.

❗ Current head fd98409 differs from pull request most recent head 68a80a0. Consider uploading reports for the commit 68a80a0 to get more accurate results

Files                      Patch %   Lines
src/algorithms/li_lin.jl   66.66%    2 Missing ⚠️
src/algorithms/panoc.jl    75.00%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master      #90      +/-   ##
==========================================
- Coverage   89.84%   89.03%   -0.81%     
==========================================
  Files          21       20       -1     
  Lines         955      985      +30     
==========================================
+ Hits          858      877      +19     
- Misses         97      108      +11     


Review thread on docs/src/guide/custom_objectives.jl (outdated):
@@ -70,7 +73,8 @@ solution, iterations = ffb(x0=ones(2), f=quadratic_cost, g=box_indicator)

 # We can verify the correctness of the solution by checking that the negative gradient is orthogonal to the constraints, pointing outwards:

-ProximalAlgorithms.gradient(quadratic_cost, solution)[1]
+v, pb = ProximalAlgorithms.value_and_pullback(quadratic_cost, solution)
+-pb()
Review comment:
Maybe explain that the pullback outputs the gradient?

lostella (Member Author):
Addressed


# Return f(x) together with a zero-argument closure that evaluates the gradient of f at x.
function value_and_pullback(f::AutoDifferentiable, x)
    fx, pb = AbstractDifferentiation.value_and_pullback_function(f.backend, f.f, x)
    return fx, () -> pb(one(fx))[1]
end
Review comment:

Strictly speaking this is not a pullback if it takes no input: the point of a pullback is to take a cotangent and pull it back into the input space.
In this case the cotangent is a scalar, and taking this scalar to be 1 returns the gradient, so it's okay, but the terminology might confuse autodiff people.
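
For reference (standard autodiff background, not something added by the PR): if f maps R^n to R, the pullback at x acts on a scalar cotangent w as

    pullback_x(w) = w * ∇f(x)

so evaluating it at w = one(fx) = 1 returns exactly ∇f(x), which is the identity the () -> pb(one(fx))[1] closure above relies on.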

lostella (Member Author):

Yeah right, I was unsure here, thanks for pointing that out. I guess I could call it simply “closure”

lostella (Member Author):

value_and_gradient_closure or something like that

lostella (Member Author):

Addressed

"""
AutoDifferentiable(f, backend)

Wrap function `f` to be auto-differentiated using `backend`.
Review comment:

Specify that it's a callable struct?

lostella (Member Author):

Addressed
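
For readers following along, a minimal sketch of what such a callable wrapper can look like, inferred from the snippets in this thread (the field names and exact definition are assumptions, not necessarily the merged code):

struct AutoDifferentiable{F,B}
    f::F        # wrapped function
    backend::B  # AbstractDifferentiation backend used to differentiate it
end

# The "callable struct" part: evaluating the wrapper just evaluates the wrapped
# function, without involving the AD backend.
(f::AutoDifferentiable)(x) = f.f(x)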


(f::ZygoteFunction)(x) = f.f(x)

function ProximalCore.gradient!(grad, f::ZygoteFunction, x)
Review comment:

The diff would be much smaller if you kept your functions gradient and gradient! but made them rely on value_and_pullback instead, right?

lostella (Member Author):

I guess, but I’m thinking that these primitives maybe don’t really fit the interface of ProximalCore; see JuliaFirstOrder/ProximalCore.jl#3
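
For the record, a sketch of the alternative the reviewer describes: keep gradient and gradient! as entry points but implement them on top of the new primitive. This is not what the PR ends up doing; the bodies below are assumptions built on the value_and_pullback and AutoDifferentiable snippets above.

# gradient! fills `grad` in place and returns the value of f at x.
function gradient!(grad, f::AutoDifferentiable, x)
    fx, grad_closure = value_and_pullback(f, x)
    grad .= grad_closure()   # the closure evaluates the gradient at x
    return fx
end

# gradient allocates the output, matching the (gradient, value) return order
# used in the guide snippet earlier in this PR.
function gradient(f::AutoDifferentiable, x)
    grad = similar(x)
    fx = gradient!(grad, f, x)
    return grad, fx
end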

Review thread on test/utilities/test_ad.jl (resolved).
lostella and others added 2 commits January 17, 2024 17:25
Co-authored-by: Guillaume Dalle <22795598+gdalle@users.noreply.github.com>
@lostella mentioned this pull request on Jan 20, 2024.
@lostella merged commit 9067e90 into JuliaFirstOrder:master on Jan 21, 2024 (7 of 8 checks passed).
@lostella deleted the new-gradients branch on January 21, 2024 at 10:56.

Successfully merging this pull request may close these issues: Autodiff backends (#71), Remove Zygote dependency by switching to AbstractDifferentiation (#80).