use FiniteDifferences.jl for gradient checks #464

Open · wants to merge 18 commits into master
Conversation

oxinabox (Member):

It's really annoying worrying whether your gradient check failed because you did something slightly less accurate in a custom adjoint, or because ngradient is baby's first finite differencing function. (OK, baby's third; it is central FDM, after all.)

This PR changes it over to use FiniteDifferences.jl.
One could probably switch to FiniteDiff.jl instead, but I couldn't work it out in the two minutes I spent looking at the README.
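
For reference, a FiniteDiff.jl version would presumably look something like this sketch (assuming its finite_difference_gradient function; not verified against this PR):

using FiniteDiff
# FiniteDiff's gradient estimator for a scalar-valued function of an array;
# it uses central differencing by default.
FiniteDiff.finite_difference_gradient(x -> sum(sin.(x)), collect(1:0.1:100))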

This is substantially more accurate than before. Below, neogradient is the new code and ngradient is the old.

Demo:

using FiniteDifferences

# The new implementation: a 5-point central finite difference method
# for the 1st derivative, from FiniteDifferences.jl.
function neogradient(f, xs::AbstractArray...)
  grad(central_fdm(5,1), f, xs...)
end

# The old implementation: naive two-point central differencing,
# perturbing each input element in place.
function ngradient(f, xs::AbstractArray...)
  grads = zero.(xs)
  for (x, Δ) in zip(xs, grads), i in 1:length(x)
    δ = sqrt(eps())
    tmp = x[i]
    x[i] = tmp - δ/2
    y1 = f(xs...)
    x[i] = tmp + δ/2
    y2 = f(xs...)
    x[i] = tmp          # restore the original value
    Δ[i] = (y2 - y1)/δ
  end
  return grads
end

const v = collect(1:0.1:100)
n2, = ngradient(x->sum(sin.(x)), v)
m2, = neogradient(x->sum(sin.(x)), v)

gold = cos.(v)  # analytic gradient of sum(sin.(x))

maximum(abs.(n2 .- gold))
maximum(abs.(m2 .- gold))

Results:

julia> maximum(abs.(n2 .- gold))
1.1342001304814886e-7

julia> maximum(abs.(m2 .- gold))
6.887490577867084e-12

oxinabox (Member Author):

This is good to go now.

oxinabox requested review from MikeInnes and Roger-luo and removed the review request for MikeInnes (January 23, 2020).
MikeInnes (Member) left a comment:

ngradient is baby's first finite differencing function.

Haha, very fair. Fun fact about finite differences, though (which I bet FDM doesn't do either): you technically should deepcopy the function the first time you call it (e.g. (deepcopy(f)(x+ϵ) - f(x))/ϵ). If you don't, you can get perturbation-confusion-like bugs from numerical diff.

Anyway, I don't fully understand the FDM usage, but as long as tests pass I think it's pretty certain that this is more solid than the current setup.


From the diff under review:

function default_fdm(f::F) where F
  # Attempt to choose a way of finite differencing that will avoid escaping the domain
  lower, upper = realdomainrange(F)
MikeInnes (Member):
Using realdomainrange here makes me a bit uneasy... if you're implementing an adjoint, it seems like it'd be hard to know that you need to do this (if you do?) and how to do it right. Not a deal-breaker, but maybe there's some way to avoid it.

oxinabox (Member Author), Jan 23, 2020:

You don't need to do it.
realdomainrange falls back to (-Inf, Inf), and in basically every test bar a few critical ones that fallback is already what's hit, because the tests use anonymous functions.

I actually just moved this code up from below, where it was already defined for those few critical ones.
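
A minimal sketch of the fallback being described (the signatures here are assumed for illustration, not taken from the PR):

# Assumed shape of realdomainrange: unknown function types get the whole
# real line, while specific functions can restrict the sampling domain.
realdomainrange(::Type) = (-Inf, Inf)               # generic fallback
realdomainrange(::Type{typeof(sqrt)}) = (0.0, Inf)  # e.g. keep sqrt's inputs nonnegative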

oxinabox (Member Author):

Haha, very fair. Fun fact about finite differences, though (which I bet FDM doesn't do either): you technically should deepcopy the function the first time you call it (e.g. (deepcopy(f)(x+ϵ) - f(x))/ϵ). If you don't, you can get perturbation-confusion-like bugs from numerical diff.

Can you open an issue about this in FiniteDifferences.jl?
Also, !!wat!!

MikeInnes (Member):

Thought you'd like that one. The classic example is the one that's currently screwing with FD and FD2:

D(1) do x
  f(y) = (x = x*y)
  D(f, 1)
  D(f, 1)
end # => 0

The correct answer is 1 (both of our forward-mode packages say 2).

I suspect this is absolutely not worth worrying about in a finite differences package, but I can open an issue if it seems useful.
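
A runnable finite-differencing analogue of that bug, as a sketch (the make_stateful closure here is illustrative, not from either package):

# A closure whose captured state changes on every call. Naive central
# differencing calls f twice, so the second evaluation sees mutated state.
function make_stateful()
    n = 0
    return x -> (n += 1; n * x)  # value depends on the call count, not just x
end

δ = sqrt(eps())
f = make_stateful()
(f(1.0 + δ/2) - f(1.0 - δ/2)) / δ            # wildly wrong: state advanced between calls
g = make_stateful()
(deepcopy(g)(1.0 + δ/2) - g(1.0 - δ/2)) / δ  # ≈ 1.0: the copy carries its own state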

Roger-luo (Contributor) left a comment:

The rest seems fine to me; it basically stays the same as what the current ngradient has in Zygote, but using FiniteDifferences. I'm just not sure whether we are going to overwrite this PR later when #235 is completely done.

oxinabox (Member Author):

I'm just not sure whether we are going to overwrite this PR later when #235 is completely done.

That's fine even if we do.
If this is an easy win we can get fast, then we can take it even if it is later eclipsed.

oxinabox (Member Author):

The problem is that FiniteDifferences is just kind of slow. It really needs some work done to go through and cut down on allocations. It was timing out on conv because central_fdm(5,1) is just too slow: over 7x slower than the current code.

neogradient2 follows; it should be the same as the current ngradient, and indeed it gives results that are only about as accurate as the current code. But it is over 3x slower.

function neogradient2(f, xs::AbstractArray...)
  grad(central_fdm(2,1), f, xs...)  # 2-point method, comparable to the old ngradient
end
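
(The neogradient3 and neogradient5 variants benchmarked below aren't spelled out in the thread; presumably, from the naming and the matching accuracy numbers, they are the analogous definitions:)

function neogradient3(f, xs::AbstractArray...)
  grad(central_fdm(3,1), f, xs...)
end
function neogradient5(f, xs::AbstractArray...)
  grad(central_fdm(5,1), f, xs...)
end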
julia> @btime n, = ngradient(x->sum(sin.(x)), v);
  15.735 ms (1984 allocations: 15.25 MiB)

julia> @btime m2 = neogradient2(x->sum(sin.(x)), v);
  56.336 ms (80768 allocations: 47.82 MiB)

julia> m3, = @btime neogradient3(x->sum(sin.(x)), v);
  73.885 ms (85720 allocations: 63.28 MiB)

julia> @btime m5 = neogradient5(x->sum(sin.(x)), v);
  111.846 ms (90675 allocations: 94.14 MiB)

central_fdm(3,1) might be a better operating point, since it's still quite accurate:

julia> maximum(abs.(n .- gold))
1.1342001304814886e-7

julia> maximum(abs.(m2 .- gold))
1.3454777503252302e-7

julia> maximum(abs.(m3 .- gold))
6.615960002065435e-10

julia> maximum(abs.(m5 .- gold))
6.887490577867084e-12

oxinabox (Member Author), Feb 1, 2020:

Ah, so the way to really speed up FiniteDifferences.jl is to turn off adapt.
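
Something like this sketch (assuming the adapt keyword on FiniteDifferences.jl's method constructors):

# adapt=0 disables adaptive step-size estimation, which otherwise re-runs
# the whole estimate extra times to tune the step.
fdm = central_fdm(5, 1; adapt=0)
grad(fdm, x -> sum(sin.(x)), v)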

oxinabox (Member Author):

No longer timing out, but there are some failures.

CarloLucibello mentioned this pull request on Jan 24, 2023.