@devmotion devmotion commented Oct 1, 2025

Fixes #774. Fixes #745. Closes #386.

With this PR, ForwardDiff in NaN-safe mode gives the following for the example in #774:

julia> using ForwardDiff

julia> log(ForwardDiff.Dual{:tag}(0.0, 0.0))
Dual{:tag}(-Inf,0.0)

julia> log(ForwardDiff.Dual{:tag}(0.0, 0.0, 1.0))
Dual{:tag}(-Inf,0.0,Inf)

julia> ForwardDiff.derivative(log ∘ zero, 1.0)
0.0

julia> f(x) = log(zero(x[1]) + x[2])
f (generic function with 1 method)

julia> ForwardDiff.gradient(f, [1.0, 0.0])
2-element Vector{Float64}:
  0.0
 Inf

and the example in #745:

julia> using ForwardDiff

julia> foo(a) = a[1] * exp(-a[2])
foo (generic function with 1 method)

julia> ForwardDiff.gradient(foo, [1., -1e3])
2-element Vector{Float64}:
  Inf
 -Inf
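
To make the `[Inf, -Inf]` result easier to follow, here is a hand-rolled sketch of the forward-mode propagation for `foo` in plain `Float64` arithmetic. It does not call ForwardDiff; `plain_mul`, `zero_mul`, and `grad_by_hand` are hypothetical names for illustration only, and `zero_mul` is just my reading of "a zero partial stays zero", not the code in this PR.

```julia
a1, a2 = 1.0, -1e3
e = exp(-a2)                  # exp(1000.0) overflows to Inf

# Partials with respect to (a1, a2).
d_a1  = (1.0, 0.0)            # partials of a1
d_neg = (0.0, -1.0)           # partials of -a2

# Two ways to scale a partial by a factor (both names are hypothetical).
plain_mul(p, x) = p * x                            # IEEE 754: 0.0 * Inf is NaN
zero_mul(p, x)  = iszero(p) ? zero(p) : p * x      # a zero partial stays zero

function grad_by_hand(mul)
    # Chain rule for exp: partials of exp(-a2) are e times the partials of -a2.
    d_exp = map(p -> mul(p, e), d_neg)
    # Product rule for a1 * exp(-a2).
    return map((p, q) -> mul(p, e) + mul(q, a1), d_a1, d_exp)
end

grad_by_hand(plain_mul)       # both entries are NaN
grad_by_hand(zero_mul)        # (Inf, -Inf)
```

The second call matches the analytic gradient `(exp(-a2), -a1 * exp(-a2))` evaluated at an overflowing `exp(-a2)`.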

Based on #776, so NaN-safe mode can actually be tested in CI.
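
For anyone reproducing the outputs above locally: as far as I know, the ForwardDiff documentation describes NaN-safe mode as a compile-time preference that is enabled via Preferences.jl. A minimal sketch, assuming both packages are installed in the active environment and that Julia is restarted afterwards so ForwardDiff recompiles with the new setting:

```julia
using ForwardDiff, Preferences

# Writes the preference to LocalPreferences.toml of the active project.
# It only takes effect after restarting Julia.
set_preferences!(ForwardDiff, "nansafe_mode" => true)
```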


Edit: I just realized that @jrevels already tried to address this issue in #386.


The Rosenbrock benchmark

julia> using ForwardDiff, Chairmarks

julia> function rosenbrock(x::Vector)
           a = 1.0
           b = 100.0
           result = 0.0
           for i in 1:length(x)-1
               result += (a - x[i])^2 + b*(x[i+1] - x[i]^2)^2
           end
           return result
       end

julia> @be rand(1000) ForwardDiff.gradient($rosenbrock, _)

and the native code analysis in https://github.com/JuliaDiff/ForwardDiff.jl/issues/719#issuecomment-2484955066

julia> a = ForwardDiff.Dual(1.0,2.0,3.0,4.0,5.0)
Dual{Nothing}(1.0,2.0,3.0,4.0,5.0)

julia> @code_native debuginfo=:none ForwardDiff._mul_partials(a.partials, a.partials, 2.0, 1.0)

indicate that this PR does not affect performance when NaN-safe mode is disabled and can actually improve performance when NaN-safe mode is enabled:

master

NaN-safe mode disabled:

julia> @be rand(1000) ForwardDiff.gradient($rosenbrock, _)
Benchmark: 152 samples with 1 evaluation
 min    546.375 μs (7 allocs: 121.266 KiB)
 median 587.812 μs (7 allocs: 121.266 KiB)
 mean   627.082 μs (7 allocs: 121.266 KiB, 1.16% gc time)
 max    2.580 ms (7 allocs: 121.266 KiB, 71.90% gc time)

julia> @code_native debuginfo=:none ForwardDiff._mul_partials(a.partials, a.partials, 2.0, 1.0)
        .section        __TEXT,__text,regular,pure_instructions
        .build_version macos, 16, 0
        .globl  _julia__mul_partials_12796      ; -- Begin function julia__mul_partials_12796
        .p2align        2
_julia__mul_partials_12796:             ; @julia__mul_partials_12796
; Function Signature: _mul_partials(ForwardDiff.Partials{4, Float64}, ForwardDiff.Partials{4, Float64}, Float64, Float64)
; %bb.0:                                ; %top
        ;DEBUG_VALUE: _mul_partials:a <- [DW_OP_deref] [$x0+0]
        ;DEBUG_VALUE: _mul_partials:b <- [DW_OP_deref] [$x1+0]
        ;DEBUG_VALUE: _mul_partials:x_a <- $d0
        ;DEBUG_VALUE: _mul_partials:x_b <- $d1
                                        ; kill: def $d1 killed $d1 def $q1
        ;DEBUG_VALUE: _mul_partials:x_b <- $d1
                                        ; kill: def $d0 killed $d0 def $q0
        ;DEBUG_VALUE: _mul_partials:x_a <- $d0
        ;DEBUG_VALUE: _mul_partials:b <- [DW_OP_deref] [$x1+0]
        ;DEBUG_VALUE: _mul_partials:a <- [DW_OP_deref] [$x0+0]
        ldp     q2, q3, [x0]
        fmul.2d v2, v2, v0[0]
        ldp     q4, q5, [x1]
        fmul.2d v4, v4, v1[0]
        fadd.2d v2, v2, v4
        fmul.2d v0, v3, v0[0]
        fmul.2d v1, v5, v1[0]
        fadd.2d v0, v0, v1
        stp     q2, q0, [x8]
        ret
                                        ; -- End function
        .section        __DATA,__const
        .p2align        3, 0x0                          ; @"+ForwardDiff.Partials#12798"
"l_+ForwardDiff.Partials#12798":
        .quad   "l_+ForwardDiff.Partials#12798.jit"

.set "l_+ForwardDiff.Partials#12798.jit", 5419293392
.subsections_via_symbols

NaN-safe mode enabled:

julia> @be rand(1000) ForwardDiff.gradient($rosenbrock, _)
Benchmark: 51 samples with 1 evaluation
 min    1.680 ms (7 allocs: 121.266 KiB)
 median 1.856 ms (7 allocs: 121.266 KiB)
 mean   1.985 ms (7 allocs: 121.266 KiB, 1.26% gc time)
 max    5.300 ms (7 allocs: 121.266 KiB, 64.43% gc time)

julia> @code_native debuginfo=:none ForwardDiff._mul_partials(a.partials, a.partials, 2.0, 1.0)
        .section        __TEXT,__text,regular,pure_instructions
        .build_version macos, 16, 0
        .globl  _julia__mul_partials_10411      ; -- Begin function julia__mul_partials_10411
        .p2align        2
_julia__mul_partials_10411:             ; @julia__mul_partials_10411
; Function Signature: _mul_partials(ForwardDiff.Partials{4, Float64}, ForwardDiff.Partials{4, Float64}, Float64, Float64)
; %bb.0:                                ; %top
        ;DEBUG_VALUE: _mul_partials:a <- [DW_OP_deref] [$x0+0]
        ;DEBUG_VALUE: _mul_partials:b <- [DW_OP_deref] [$x1+0]
        ;DEBUG_VALUE: _mul_partials:x_a <- $d0
        ;DEBUG_VALUE: _mul_partials:x_b <- $d1
                                        ; kill: def $d1 killed $d1 def $q1
        ;DEBUG_VALUE: _mul_partials:x_b <- $d1
        ;DEBUG_VALUE: _mul_partials:x_a <- $d0
        ;DEBUG_VALUE: _mul_partials:b <- [DW_OP_deref] [$x1+0]
        ;DEBUG_VALUE: _mul_partials:a <- [DW_OP_deref] [$x0+0]
        ldp     q4, q2, [x0]
        fsub    d5, d1, d1
        ldr     d3, [x1]
        fcmp    d5, d5
        b.vs    LBB0_3
; %bb.1:                                ; %top.L61_crit_edge
        ldr     d5, [x1, #8]
LBB0_2:                                 ; %L61
        fsub    d6, d0, d0
        fcmp    d6, d6
        cset    w9, vc
        fcmeq.2d        v6, v4, #0.0
        fcmeq.2d        v7, v2, #0.0
        uzp1.4s v6, v6, v7
        mvn.16b v6, v6
        xtn.4h  v6, v6
        umaxv.4h        h6, v6
        fmov    w10, s6
        orr     w9, w10, w9
        tst     w9, #0x1
        fmov    d6, #1.00000000
        fcsel   d0, d0, d6, ne
        fmul.2d v4, v4, v0[0]
        mov.d   v3[1], v5[0]
        fmul.2d v3, v3, v1[0]
        fadd.2d v3, v4, v3
        fmul.2d v0, v2, v0[0]
        ldr     q2, [x1, #16]
        fmul.2d v1, v2, v1[0]
        fadd.2d v0, v0, v1
        stp     q3, q0, [x8]
        ret
LBB0_3:                                 ; %L37
        ldr     d5, [x1, #8]
        fcmp    d3, #0.0
        b.ne    LBB0_2
; %bb.4:                                ; %L37
        fcmp    d5, #0.0
        b.ne    LBB0_2
; %bb.5:                                ; %L37
        ldr     d6, [x1, #16]
        fcmp    d6, #0.0
        b.ne    LBB0_2
; %bb.6:                                ; %L50
        ldr     d6, [x1, #24]
        fcmp    d6, #0.0
        fmov    d6, #1.00000000
        fcsel   d1, d1, d6, ne
        b       LBB0_2
                                        ; -- End function
        .section        __DATA,__const
        .p2align        3, 0x0                          ; @"+ForwardDiff.Partials#10413"
"l_+ForwardDiff.Partials#10413":
        .quad   "l_+ForwardDiff.Partials#10413.jit"

.set "l_+ForwardDiff.Partials#10413.jit", 4539581520
.subsections_via_symbols

this PR

NaN-safe mode disabled:

julia> @be rand(1000) ForwardDiff.gradient($rosenbrock, _)
Benchmark: 144 samples with 1 evaluation
 min    547.666 μs (7 allocs: 121.266 KiB)
 median 596.917 μs (7 allocs: 121.266 KiB)
 mean   664.194 μs (7 allocs: 121.266 KiB, 0.64% gc time)
 max    7.721 ms (7 allocs: 121.266 KiB, 92.06% gc time)

julia> @code_native debuginfo=:none ForwardDiff._mul_partials(a.partials, a.partials, 2.0, 1.0)
        .section        __TEXT,__text,regular,pure_instructions
        .build_version macos, 16, 0
        .globl  _julia__mul_partials_8061       ; -- Begin function julia__mul_partials_8061
        .p2align        2
_julia__mul_partials_8061:              ; @julia__mul_partials_8061
; Function Signature: _mul_partials(ForwardDiff.Partials{4, Float64}, ForwardDiff.Partials{4, Float64}, Float64, Float64)
; %bb.0:                                ; %top
        ;DEBUG_VALUE: _mul_partials:a <- [DW_OP_deref] [$x0+0]
        ;DEBUG_VALUE: _mul_partials:b <- [DW_OP_deref] [$x1+0]
        ;DEBUG_VALUE: _mul_partials:x_a <- $d0
        ;DEBUG_VALUE: _mul_partials:x_b <- $d1
                                        ; kill: def $d1 killed $d1 def $q1
        ;DEBUG_VALUE: _mul_partials:x_b <- $d1
                                        ; kill: def $d0 killed $d0 def $q0
        ;DEBUG_VALUE: _mul_partials:x_a <- $d0
        ;DEBUG_VALUE: _mul_partials:b <- [DW_OP_deref] [$x1+0]
        ;DEBUG_VALUE: _mul_partials:a <- [DW_OP_deref] [$x0+0]
        ldp     q2, q3, [x0]
        fmul.2d v2, v2, v0[0]
        ldp     q4, q5, [x1]
        fmul.2d v4, v4, v1[0]
        fadd.2d v2, v2, v4
        fmul.2d v0, v3, v0[0]
        fmul.2d v1, v5, v1[0]
        fadd.2d v0, v0, v1
        stp     q2, q0, [x8]
        ret
                                        ; -- End function
        .section        __DATA,__const
        .p2align        3, 0x0                          ; @"+ForwardDiff.Partials#8063"
"l_+ForwardDiff.Partials#8063":
        .quad   "l_+ForwardDiff.Partials#8063.jit"

.set "l_+ForwardDiff.Partials#8063.jit", 5182817040
.subsections_via_symbols

NaN-safe mode enabled:

julia> @be rand(1000) ForwardDiff.gradient($rosenbrock, _)
Benchmark: 111 samples with 1 evaluation
 min    789.709 μs (7 allocs: 121.266 KiB)
 median 853.167 μs (7 allocs: 121.266 KiB)
 mean   899.946 μs (7 allocs: 121.266 KiB, 0.60% gc time)
 max    2.531 ms (7 allocs: 121.266 KiB, 66.18% gc time)

julia> @code_native debuginfo=:none ForwardDiff._mul_partials(a.partials, a.partials, 2.0, 1.0)
        .section        __TEXT,__text,regular,pure_instructions
        .build_version macos, 16, 0
        .globl  _julia__mul_partials_15428      ; -- Begin function julia__mul_partials_15428
        .p2align        2
_julia__mul_partials_15428:             ; @julia__mul_partials_15428
; Function Signature: _mul_partials(ForwardDiff.Partials{4, Float64}, ForwardDiff.Partials{4, Float64}, Float64, Float64)
; %bb.0:                                ; %top
        ;DEBUG_VALUE: _mul_partials:a <- [DW_OP_deref] [$x0+0]
        ;DEBUG_VALUE: _mul_partials:b <- [DW_OP_deref] [$x1+0]
        ;DEBUG_VALUE: _mul_partials:x_a <- $d0
        ;DEBUG_VALUE: _mul_partials:x_b <- $d1
                                        ; kill: def $d1 killed $d1 def $q1
        ;DEBUG_VALUE: _mul_partials:x_b <- $d1
                                        ; kill: def $d0 killed $d0 def $q0
        ;DEBUG_VALUE: _mul_partials:x_a <- $d0
        ;DEBUG_VALUE: _mul_partials:b <- [DW_OP_deref] [$x1+0]
        ;DEBUG_VALUE: _mul_partials:a <- [DW_OP_deref] [$x0+0]
        ldp     q2, q3, [x0]
        fmul.2d v4, v2, v0[0]
        fcmeq.2d        v2, v2, #0.0
        bic.16b v2, v4, v2
        ldp     q4, q5, [x1]
        fmul.2d v6, v4, v1[0]
        fcmeq.2d        v4, v4, #0.0
        bic.16b v4, v6, v4
        fadd.2d v2, v2, v4
        fmul.2d v0, v3, v0[0]
        fcmeq.2d        v3, v3, #0.0
        bic.16b v0, v0, v3
        fmul.2d v1, v5, v1[0]
        fcmeq.2d        v3, v5, #0.0
        bic.16b v1, v1, v3
        fadd.2d v0, v0, v1
        stp     q2, q0, [x8]
        ret
                                        ; -- End function
        .section        __DATA,__const
        .p2align        3, 0x0                          ; @"+ForwardDiff.Partials#15430"
"l_+ForwardDiff.Partials#15430":
        .quad   "l_+ForwardDiff.Partials#15430.jit"

.set "l_+ForwardDiff.Partials#15430.jit", 5213265424
.subsections_via_symbols
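
One way to read the branch-free NaN-safe listing above: each lane is multiplied (`fmul`), the partial is compared against zero (`fcmeq ... #0.0`), and lanes whose partial is zero are cleared (`bic`) before the two halves are added. The following is a rough scalar model of that pattern on plain tuples. It is a sketch under my own assumptions, not ForwardDiff's source; `masked_scale` and `sketch_mul_partials` are hypothetical names.

```julia
# Mirror fmul / fcmeq / bic: compute the product unconditionally,
# then select +0.0 wherever the partial itself is zero.
function masked_scale(p::Float64, x::Float64)
    prod = p * x
    return ifelse(p == 0.0, 0.0, prod)
end

# Lane-wise model of a.partials * x_a + b.partials * x_b.
function sketch_mul_partials(a::NTuple{N,Float64}, b::NTuple{N,Float64},
                             x_a::Float64, x_b::Float64) where {N}
    return map((pa, pb) -> masked_scale(pa, x_a) + masked_scale(pb, x_b), a, b)
end

p = (2.0, 3.0, 4.0, 5.0)
sketch_mul_partials(p, p, 2.0, 1.0)                       # (6.0, 9.0, 12.0, 15.0)

# Zero partials stay zero even when the scale factor is Inf or NaN.
sketch_mul_partials((0.0, 1.0), (0.0, 0.0), Inf, NaN)     # (0.0, Inf)
```

Because `ifelse` evaluates both arguments and involves no branch, this shape is the kind of code a compiler can vectorize, which would be consistent with the shorter listing and the faster benchmark above.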

julia> versioninfo()
Julia Version 1.11.7
Commit f2b3dbda30a (2025-09-08 12:10 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin24.0.0)
  CPU: 10 × Apple M2 Pro
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, apple-m2)
Threads: 1 default, 0 interactive, 1 GC (on 6 virtual cores)
Environment:
  JULIA_PKG_USE_CLI_GIT = true
  JULIA_PKG_PRESERVE_TIERED_INSTALLED = true

@devmotion devmotion force-pushed the dw/safer_nansafe_mode branch from 78d9db0 to f33d535 Compare October 1, 2025 13:47
@devmotion devmotion marked this pull request as ready for review October 1, 2025 14:23
@KristofferC

More correctness and better performance. What's not to like? :)

Base automatically changed from dw/test_nansafe_mode to master October 6, 2025 14:41
Successfully merging this pull request may close these issues.

NaN-safe mode is not NaN-safe enough
Product of Inf terms leading to NaNs