
Special case x^0 #331

Merged
merged 1 commit into from Jul 19, 2018

Conversation

@tkf (Contributor) commented Jul 12, 2018

In the current ForwardDiff.jl release (and on master; 3f62885), the (n+1)-th derivative of x^n at x = 0 is NaN:

julia> Pkg.installed("ForwardDiff")
v"0.7.5"

julia> using ForwardDiff

julia> ForwardDiff.NANSAFE_MODE_ENABLED
true

julia> nthderiv(f, n, x = 0.0) =
           if n == 1
               ForwardDiff.derivative(f, x)
           else
               ForwardDiff.derivative(z -> nthderiv(f, n - 1, z), x)
           end
nthderiv (generic function with 2 methods)

julia> @assert all(nthderiv(x -> x^n, n) == factorial(n) for n in 1:5)

julia> nthderiv(x -> x^0, 1)
NaN

julia> nthderiv(x -> x^1, 2)
NaN

julia> nthderiv(x -> x^2, 3)
NaN

julia> nthderiv(x -> x^3, 4)
NaN
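
The NaN comes from the generic power rule used to propagate partials: they are multiplied by y * x^(y - 1), and at x = 0.0 with y = 0 that multiplies 0 by Inf. A minimal sketch of the failing arithmetic (plain floats, no Dual involved):

```julia
# The generic power rule propagates partials with the factor y * x^(y - 1).
# At x = 0.0, y = 0 this is 0 * 0.0^(-1) == 0 * Inf, which is NaN:
x, y = 0.0, 0
x^(y - 1)        # Inf
y * x^(y - 1)    # NaN
```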

This issue can be observed more directly with nested dual numbers. Namely, if I have an "n-th order" dual number (is that the right term?) whose .value component is 0, its (n-1)-th power produces NaN in the .partials component:

julia> using ForwardDiff: Dual

julia> x0 = 0.0
0.0

julia> x1 = Dual{:t1}(x0, 1.0)
Dual{:t1}(0.0,1.0)

julia> x2 = Dual{:t2}(x1, 1.0)
Dual{:t2}(Dual{:t1}(0.0,1.0),Dual{:t1}(1.0,0.0))

julia> x3 = Dual{:t3}(x2, 1.0)
Dual{:t3}(Dual{:t2}(Dual{:t1}(0.0,1.0),Dual{:t1}(1.0,0.0)),
          Dual{:t2}(Dual{:t1}(1.0,0.0),Dual{:t1}(0.0,0.0)))

julia> x3 * x3  # OK
Dual{:t3}(Dual{:t2}(Dual{:t1}(0.0,0.0),Dual{:t1}(0.0,2.0)),
          Dual{:t2}(Dual{:t1}(0.0,2.0),Dual{:t1}(2.0,0.0)))

julia> x3^2
Dual{:t3}(Dual{:t2}(Dual{:t1}(0.0,0.0),Dual{:t1}(0.0,2.0)),
          Dual{:t2}(Dual{:t1}(0.0,2.0),Dual{:t1}(2.0,NaN)))

julia> x2^1
Dual{:t2}(Dual{:t1}(0.0,1.0),Dual{:t1}(1.0,NaN))

julia> x1^0
Dual{:t1}(1.0,NaN)

All I need to make it "work" is to special-case Dual(0.0, ...) ^ 0 (x1^0 above). I'm not familiar with the whole ForwardDiff.jl codebase, so I'm not entirely sure this is the right "fix" (or whether what I see here is actually a bug). But I can offer some rationale for why this PR could be the right way to go:

  1. It's more consistent. Identities such as x3^2 === x3 * x3 and x2^1 === x2 (note the triple equal === for comparing .partials) for nested Dual didn't hold without this PR.
  2. Taking the derivative of x^y at x = y = 0 is problematic if both x and y are variables. However, if y = 0 is a non-Dual constant, I suppose we don't have such a problem, since this case corresponds to taking the limit y -> 0 first and then taking the limit in x to compute the derivative.
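
A standalone sketch of the special case (a hypothetical helper, not the exact PR diff): guard the partials computation on iszero(y) so the generic rule is never evaluated at y = 0.

```julia
# Hypothetical sketch mirroring the PR's logic for the partials factor of
# x^y when y is a non-Dual exponent (not the actual ForwardDiff code):
function pow_partials(v, px, y)
    if iszero(y)
        return zero(px)            # d/dx x^0 == 0, even at v == 0
    else
        return px * y * v^(y - 1)  # generic power rule
    end
end

pow_partials(0.0, 1.0, 0)  # 0.0 instead of NaN
pow_partials(2.0, 1.0, 3)  # 12.0
```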

What do you think?

fixes #330

@@ -455,4 +455,14 @@ for N in (0,3), M in (0,4), V in (Int, Float32)
end
end

@testset "Exponentiation of zero" begin
tkf (Contributor, author):
I thought about mixing this in with the rest of the tests, but I couldn't think of a good way to turn it into a randomized test, so I created a separate testset. I also noticed that @testset isn't used anywhere else, but I assumed that's only because ForwardDiff.jl predates @testset. I hope using @testset is OK.

jrevels (Member):

Yup, that's fine.

else
new_partials = partials(x) * y * ($f)(v, y - 1)
end
return Dual{Tx}(expv, new_partials)
jrevels (Member):
Looks good. I think the main concern here is performance/SIMDability.

A cleaner way to write this that might be easier on the compiler is

new_partials = ifelse(iszero(y), zero(partials(x)), partials(x) * y * ($f)(v, y - 1))

It'd be worth benchmarking this/looking at generated LLVM code to ensure there aren't any substantial regressions.
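
The point of ifelse here is that, unlike if/else, it eagerly evaluates both arms and merely selects one result, so the generated code has no branch; the NaN-producing arm is computed and then discarded. A minimal standalone illustration (not the PR code itself):

```julia
# ifelse evaluates both arguments before selecting, so even though the
# generic arm is NaN at v = 0, y = 0, the zero arm is what gets returned:
v, px, y = 0.0, 1.0, 0
new_partials = ifelse(iszero(y), zero(px), px * y * v^(y - 1))
new_partials  # 0.0, branchless, no NaN
```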

tkf (Contributor, author):

Sure, I'll do some benchmarks.

I guess we could also define Base.literal_pow to do the branching at compile time for some extra boost, though that only works when the exponent is an integer literal.
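
A toy sketch of the literal_pow idea (hypothetical wrapper type, not the ForwardDiff implementation): when the exponent is an integer literal like x^0, Julia lowers the call to Base.literal_pow with a Val-typed exponent, so the y == 0 case can be resolved by dispatch at compile time.

```julia
# Toy wrapper type to illustrate compile-time dispatch on a literal
# exponent (hypothetical; not the actual ForwardDiff Dual type):
struct Toy
    v::Float64
end
Base.:^(x::Toy, n::Integer) = Toy(x.v^n)                         # runtime path
Base.literal_pow(::typeof(^), x::Toy, ::Val{0}) = Toy(one(x.v))  # x^0 literal

Toy(0.0)^0  # Toy(1.0), resolved via literal_pow, no runtime branch
```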

@tkf (Contributor) commented Jul 14, 2018

So I tried some benchmarks to compare this branch and the master. For example:

using ForwardDiff: Dual
using BenchmarkTools

const SUITE = BenchmarkGroup()

function log1mx_taylor(p, x)
    y = -x
    @simd for n in 2:p
        @inbounds y -= x^n / n
    end
    return y
end

for x in Dual.(-0.1:0.1:0.1, 1.0)
    SUITE["log(1-$x)"] = @benchmarkable log1mx_taylor(1000, $x)
end

judge reports a time ratio of 1.07 (with 5% tolerance) only for x = Dual(0.0, 1.0).

Link to the full output: https://gist.github.com/tkf/03e0b87c37db85b60bad91fef0a3efcb/967886d655b2268a9a3978700a02cd5f1e3acd15#file-bench_log1mx_taylor_judge-md

The gist includes another benchmark (exp_taylor) with similar "almost invariant" result.

I looked at a diff of the code_native output, but I don't see a big difference (by that I mean something very naive: they appear to touch the same number of xmm registers):
https://gist.github.com/tkf/03e0b87c37db85b60bad91fef0a3efcb/967886d655b2268a9a3978700a02cd5f1e3acd15#file-bench_log1mx_taylor_native-diff

In the other case (exp_taylor), code_native outputs were identical: https://gist.github.com/tkf/03e0b87c37db85b60bad91fef0a3efcb/967886d655b2268a9a3978700a02cd5f1e3acd15#file-bench_exp_taylor_native-diff

So, does this mean it's good to go? The fact that the code_native outputs were identical in one case (exp_taylor) makes me wonder whether I'm writing a proper benchmark, or whether Julia and LLVM are just super smart.

@tkf (Contributor) commented Jul 18, 2018

So I kept playing with the benchmarks, and I couldn't find a clear difference.

I also wondered whether this PR eliminates any possibility of SIMD utilization that was previously available. First of all, we preserve "trivial" SIMD-ability. For example, with -O3,

using ForwardDiff: Dual
a = Dual(1:1.0:5...)
@code_llvm a^2

does print fmul <4 x double> on both master and this branch. This is because the SIMD-able code (multiplying the 4-element partials(x)) is inside the if block in this branch.

So, I think the loss of SIMD-ability has to be assessed against the possibility of SIMD across multiple ^ calls. For example, I tried

using ForwardDiff: Dual
using StaticArrays
x = Dual(1.0, 2.0)
xs = SVector(x, x, x, x)
@inline pow2(x) = x^2
f(xs) = pow2.(xs)
@code_llvm f(xs)

which does not print <4 x double> (or <2 x double>) even on the master branch. Note that the above example does produce SIMD code for a vector of floats: @code_llvm f(SVector(1:1.0:4...)).

The only way I could come up with to make the function f above generate SIMD code was to implement literal_pow. I'm preparing another PR for this change.

So, I think this means we don't lose much with this PR. The benchmarks (in the previous post) with more complex code support this as well. What do you think?

@jrevels (Member) commented Jul 19, 2018

Awesome, thanks for the performance analysis here. Let's merge this and I'll go take a look at #332.

@jrevels jrevels merged commit 123cc19 into JuliaDiff:master Jul 19, 2018
@dpsanders dpsanders mentioned this pull request Oct 9, 2019
Development

Successfully merging this pull request may close these issues.

Third derivative of x^2 at x=0 is NaN
2 participants