
Special case x^0 #331

Merged
merged 1 commit into from Jul 19, 2018

Conversation

@tkf (Contributor) commented Jul 12, 2018

In the current ForwardDiff.jl release (and on master; 3f62885), the (n+1)-th derivative of x^n at x = 0 is NaN:

julia> Pkg.installed("ForwardDiff")
v"0.7.5"

julia> using ForwardDiff

julia> ForwardDiff.NANSAFE_MODE_ENABLED
true

julia> nthderiv(f, n, x = 0.0) =
           if n == 1
               ForwardDiff.derivative(f, x)
           else
               ForwardDiff.derivative(z -> nthderiv(f, n - 1, z), x)
           end
nthderiv (generic function with 2 methods)

julia> @assert all(nthderiv(x -> x^n, n) == factorial(n) for n in 1:5)

julia> nthderiv(x -> x^0, 1)
NaN

julia> nthderiv(x -> x^1, 2)
NaN

julia> nthderiv(x -> x^2, 3)
NaN

julia> nthderiv(x -> x^3, 4)
NaN
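
The NaN comes from the generic power rule used to propagate partials: they are multiplied by y * x^(y - 1), and at x = 0.0 with y = 0 that multiplies 0 by Inf. A minimal sketch of the failing arithmetic (plain floats, no Dual involved):

```julia
# The generic power rule propagates partials with the factor y * x^(y - 1).
# At x = 0.0, y = 0 this is 0 * 0.0^(-1) == 0 * Inf, which is NaN:
x, y = 0.0, 0
x^(y - 1)        # Inf
y * x^(y - 1)    # NaN
```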

This issue can be observed more directly with nested dual numbers. Namely, if I have an "n-th order" dual number (is that the right term?) whose .value component is 0, its (n-1)-th power produces NaN in the .partials component:

julia> using ForwardDiff: Dual

julia> x0 = 0.0
0.0

julia> x1 = Dual{:t1}(x0, 1.0)
Dual{:t1}(0.0,1.0)

julia> x2 = Dual{:t2}(x1, 1.0)
Dual{:t2}(Dual{:t1}(0.0,1.0),Dual{:t1}(1.0,0.0))

julia> x3 = Dual{:t3}(x2, 1.0)
Dual{:t3}(Dual{:t2}(Dual{:t1}(0.0,1.0),Dual{:t1}(1.0,0.0)),
          Dual{:t2}(Dual{:t1}(1.0,0.0),Dual{:t1}(0.0,0.0)))

julia> x3 * x3  # OK
Dual{:t3}(Dual{:t2}(Dual{:t1}(0.0,0.0),Dual{:t1}(0.0,2.0)),
          Dual{:t2}(Dual{:t1}(0.0,2.0),Dual{:t1}(2.0,0.0)))

julia> x3^2
Dual{:t3}(Dual{:t2}(Dual{:t1}(0.0,0.0),Dual{:t1}(0.0,2.0)),
          Dual{:t2}(Dual{:t1}(0.0,2.0),Dual{:t1}(2.0,NaN)))

julia> x2^1
Dual{:t2}(Dual{:t1}(0.0,1.0),Dual{:t1}(1.0,NaN))

julia> x1^0
Dual{:t1}(1.0,NaN)

All I need to make it "work" is to special-case Dual(0.0, ...) ^ 0 (x1^0 above). I'm not familiar with the whole ForwardDiff.jl codebase, so I'm not entirely sure this is the right "fix" (or whether what I see here is actually a bug). But I can offer some rationale for why this PR could be the right way to go:

  1. It's more consistent. Identities such as x3^2 === x3 * x3 and x2^1 === x2 (note the triple equal === for comparing .partials) for nested Dual didn't hold without this PR.
  2. Taking the derivative of x^y at x = y = 0 is problematic if both x and y are variables. However, if y = 0 is a non-Dual constant, I suppose we don't have such a problem, since this case corresponds to taking the limit y -> 0 first and then taking the limit in x to compute the derivative.
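
A standalone sketch of the special case (a hypothetical helper, not the exact PR diff): guard the partials computation on iszero(y) so the generic rule is never evaluated at y = 0.

```julia
# Hypothetical sketch mirroring the PR's logic for the partials factor of
# x^y when y is a non-Dual exponent (not the actual ForwardDiff code):
function pow_partials(v, px, y)
    if iszero(y)
        return zero(px)            # d/dx x^0 == 0, even at v == 0
    else
        return px * y * v^(y - 1)  # generic power rule
    end
end

pow_partials(0.0, 1.0, 0)  # 0.0 instead of NaN
pow_partials(2.0, 1.0, 3)  # 12.0
```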

What do you think?

fixes #330

@@ -455,4 +455,14 @@ for N in (0,3), M in (0,4), V in (Int, Float32)
end
end

@testset "Exponentiation of zero" begin
tkf (Contributor, author):
I thought about mixing this in with the rest of the tests, but I couldn't think of a good way to turn it into a randomized test, so I created a separate testset. I also noticed that @testset isn't used anywhere else, but I assumed that's only because ForwardDiff.jl predates @testset. I hope using @testset is OK.

jrevels (Member):

Yup, that's fine.

else
new_partials = partials(x) * y * ($f)(v, y - 1)
end
return Dual{Tx}(expv, new_partials)
jrevels (Member):
Looks good. I think the main concern here is performance/SIMDability.

A cleaner way to write this that might be easier on the compiler is

new_partials = ifelse(iszero(y), zero(partials(x)), partials(x) * y * ($f)(v, y - 1))

It'd be worth benchmarking this/looking at generated LLVM code to ensure there aren't any substantial regressions.
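
The point of ifelse here is that, unlike if/else, it eagerly evaluates both arms and merely selects one result, so the generated code has no branch; the NaN-producing arm is computed and then discarded. A minimal standalone illustration (not the PR code itself):

```julia
# ifelse evaluates both arguments before selecting, so even though the
# generic arm is NaN at v = 0, y = 0, the zero arm is what gets returned:
v, px, y = 0.0, 1.0, 0
new_partials = ifelse(iszero(y), zero(px), px * y * v^(y - 1))
new_partials  # 0.0, branchless, no NaN
```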

tkf (Contributor, author):

Sure, I'll do some benchmarks.

I guess we could also define Base.literal_pow to do the branching at compile time for some extra boost, though that only works when the exponent is an integer literal.
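
A toy sketch of the literal_pow idea (hypothetical wrapper type, not the ForwardDiff implementation): when the exponent is an integer literal like x^0, Julia lowers the call to Base.literal_pow with a Val-typed exponent, so the y == 0 case can be resolved by dispatch at compile time.

```julia
# Toy wrapper type to illustrate compile-time dispatch on a literal
# exponent (hypothetical; not the actual ForwardDiff Dual type):
struct Toy
    v::Float64
end
Base.:^(x::Toy, n::Integer) = Toy(x.v^n)                         # runtime path
Base.literal_pow(::typeof(^), x::Toy, ::Val{0}) = Toy(one(x.v))  # x^0 literal

Toy(0.0)^0  # Toy(1.0), resolved via literal_pow, no runtime branch
```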

@tkf (Contributor) commented Jul 14, 2018

So I tried some benchmarks to compare this branch and the master. For example:

using ForwardDiff: Dual
using BenchmarkTools

const SUITE = BenchmarkGroup()

function log1mx_taylor(p, x)
    y = -x
    @simd for n in 2:p
        @inbounds y -= x^n / n
    end
    return y
end

for x in Dual.(-0.1:0.1:0.1, 1.0)
    SUITE["log(1-$x)"] = @benchmarkable log1mx_taylor(1000, $x)
end

judge reports a time ratio of 1.07 (with 5% tolerance) only for x = Dual(0.0, 1.0).

Link to the full output: https://gist.github.com/tkf/03e0b87c37db85b60bad91fef0a3efcb/967886d655b2268a9a3978700a02cd5f1e3acd15#file-bench_log1mx_taylor_judge-md

The gist includes another benchmark (exp_taylor) with similar "almost invariant" result.

I looked at a diff of the code_native output, but I don't see a big difference (by that I mean something very naive: they appear to touch the same number of xmm registers):
https://gist.github.com/tkf/03e0b87c37db85b60bad91fef0a3efcb/967886d655b2268a9a3978700a02cd5f1e3acd15#file-bench_log1mx_taylor_native-diff

In the other case (exp_taylor), code_native outputs were identical: https://gist.github.com/tkf/03e0b87c37db85b60bad91fef0a3efcb/967886d655b2268a9a3978700a02cd5f1e3acd15#file-bench_exp_taylor_native-diff

So, does this mean it's good to go? The fact that the code_native outputs were identical in one case (exp_taylor) makes me wonder whether I'm writing a proper benchmark, or whether Julia and LLVM are just super smart.

@tkf (Contributor) commented Jul 18, 2018

So I kept playing with the benchmarks, and I couldn't find a clear difference.

I also wondered whether this PR eliminates any possibility of SIMD utilization that was previously available. First of all, we preserve "trivial" SIMD-ability. For example, with -O3,

using ForwardDiff: Dual
a = Dual(1:1.0:5...)
@code_llvm a^2

does print fmul <4 x double> on both master and this branch. This is because the SIMD-able code (multiplying the 4-element partials(x)) is inside the if block in this branch.

So, I think the loss of SIMD-ability has to be assessed against the possibility of SIMD across multiple ^ calls. For example, I tried

using ForwardDiff: Dual
using StaticArrays
x = Dual(1.0, 2.0)
xs = SVector(x, x, x, x)
@inline pow2(x) = x^2
f(xs) = pow2.(xs)
@code_llvm f(xs)

which does not print <4 x double> (or <2 x double>) even on the master branch. Note that the above example does produce SIMD code for a vector of floats: @code_llvm f(SVector(1:1.0:4...)).

The only way I could come up with to make the function f above generate SIMD code was to implement literal_pow. I'm preparing another PR for this change.

So, I think this means we don't lose much with this PR. The benchmarks (in the previous post) with more complex code support this as well. What do you think?

@jrevels (Member) commented Jul 19, 2018

Awesome, thanks for the performance analysis here. Let's merge this and I'll go take a look at #332.

@jrevels jrevels merged commit 123cc19 into JuliaDiff:master Jul 19, 2018
@dpsanders dpsanders mentioned this pull request Oct 9, 2019
Development

Successfully merging this pull request may close these issues.

Third derivative of x^2 at x=0 is NaN
2 participants