Trap floating point exceptions #27705

Open
antoine-levitt opened this issue Jun 21, 2018 · 21 comments · May be fixed by #47930
Labels
domain:error handling Handling of exceptions by Julia or the user

Comments

@antoine-levitt
Contributor

Some languages and compilers allow trapping of floating point exceptions, e.g. gfortran -ffpe-trap https://gcc.gnu.org/onlinedocs/gfortran/Debugging-Options.html

Is it possible to have a similar functionality in julia? That would be very useful to debug a NaN or Inf suddenly appearing in a program.

#6170 looks related

@c42f
Member

c42f commented Aug 16, 2018

I'd say it would be relatively easy to get this working on linux as there's the feenableexcept() function which we can use to change the floating point exception mask. This should generate SIGFPE which we can turn into an exception using the same machinery as DivideError.

It will require minor changes to the runtime (see, eg, https://github.com/JuliaLang/julia/blob/master/src/signals-unix.c#L743 ) so that the SIGFPE is turned into something other than a DivideError.

As to the correct julia API for calling feenableexcept, perhaps we'd want a context manager style approach there

with_fpe(FE_OVERFLOW) do
   some_code_generating_nans()
end

See also #5234 for somewhat related discussion.

@c42f
Member

c42f commented Aug 16, 2018

As it turns out, we can do the following (at least on linux x86_64 with julia >= 0.6) without changing the runtime:

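# Exception flag bits, as defined in glibc's <fenv.h> for x86
const FE_INVALID    = 0x1
const FE_DIVBYZERO  = 0x4
const FE_OVERFLOW   = 0x8
const FE_UNDERFLOW  = 0x10
const FE_INEXACT    = 0x20

# Currently enabled (trapping) exceptions
fpexceptions() = ccall(:fegetexcept, Cint, ())

# Run f() with the exceptions in `mode` trapping, then restore the previous state
function setfpexceptions(f, mode)
    # feenableexcept returns the set of exceptions that was enabled before the call
    prev = ccall(:feenableexcept, Cint, (Cint,), mode)
    try
        f()
    finally
        # Disable only the exceptions that were not already enabled beforehand
        ccall(:fedisableexcept, Cint, (Cint,), mode & ~prev)
    end
end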

[edit: fixed some brokenness]

Thence,

julia> x = 0.0
0.0

julia> 1.0/x
Inf

julia> setfpexceptions(FE_DIVBYZERO) do
           1.0/x 
       end
ERROR: DivideError: integer division error
Stacktrace:
 [1] /(::Float64, ::Float64) at ./float.jl:0
 [2] setfpexceptions(::##1#2, ::UInt8) at /home/tcfoster/sigfpe.jl:13

Unfortunately the runtime reports every SIGFPE as an integer division error (DivideError), but at least you get a backtrace.

@StefanKarpinski
Sponsor Member

Seems like a good thing to have official support for and throw the right exception.

@c42f
Member

c42f commented Aug 18, 2018

Yep. I wonder how these are best mapped to exceptions. The IEEE 754 standard defines five standard exceptions. We could just map these to our existing exceptions where possible:

  • Invalid operation -> DomainError? (normally gives qNaN instead)
  • Division by zero -> DivideError (currently we document that this is for integer division by zero. But I'm not sure there's a reason to distinguish 1/0 from 1.0/0.0?)
  • Overflow -> OverflowError
  • Underflow -> A new UnderflowError?
  • Inexact -> InexactError? Difficult to make useful, as any inexact floating point operations in the runtime will cause a trap if this is enabled which seems to cause LLVM to go boom. Probably among other things.

Alternatively we could just define a new FloatingPointError(reason) and map them all to that? That might be simpler and more useful as this is more likely to be a debugging tool than anything else.
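A rough sketch of what the single-type option could look like. Everything here is hypothetical: the FloatingPointError struct, its flags field and the FPE_NAMES table are illustrative names, not an existing Base API, and the flag values are assumed to be the glibc x86 ones quoted earlier in this thread.

```julia
# Hypothetical names throughout; nothing here exists in Base.
# Flag values assumed to match glibc's <fenv.h> on x86.
const FE_INVALID   = 0x01
const FE_DIVBYZERO = 0x04
const FE_OVERFLOW  = 0x08
const FE_UNDERFLOW = 0x10
const FE_INEXACT   = 0x20

# One exception type carrying the raw flag bits as its "reason".
struct FloatingPointError <: Exception
    flags::UInt8
end

# Human-readable names for the five IEEE 754 exceptions.
const FPE_NAMES = (
    FE_INVALID   => "invalid operation",
    FE_DIVBYZERO => "division by zero",
    FE_OVERFLOW  => "overflow",
    FE_UNDERFLOW => "underflow",
    FE_INEXACT   => "inexact",
)

# Names of all flags set in the error, in the order of FPE_NAMES.
reasons(e::FloatingPointError) =
    [name for (flag, name) in FPE_NAMES if (e.flags & flag) != 0]

function Base.showerror(io::IO, e::FloatingPointError)
    print(io, "FloatingPointError: ", join(reasons(e), ", "))
end
```

With these definitions, FloatingPointError(FE_OVERFLOW | FE_INVALID) would display as "FloatingPointError: invalid operation, overflow". Keeping the raw bits in one field means code that only wants "some FPE happened" never has to distinguish subtypes.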

@antoine-levitt
Contributor Author

> Alternatively we could just define a new FloatingPointError(reason) and map them all to that? That might be simpler and more useful as this is more likely to be a debugging tool than anything else.

That seems like the best option since, as you say, trapping FPE is basically a debugging tool, and it would be annoying to have them caught by code that is not expecting them. Maybe an abstract HardwareFPException with more specific exceptions inheriting from it?

@c42f
Member

c42f commented Aug 24, 2018

> HardwareFPException with more specific exceptions inheriting from it

That would also work and is easy to implement. On balance I'm inclined to have a single type for simplicity, given it's probably a debugging tool and that we don't catch exceptions by type in any case.

@c42f
Member

c42f commented Oct 24, 2018

I just discovered significant prior discussion related to these issues, particularly at

#2976
#5234 (comment)

@simonbyrne are you still interested in thinking about floating point exceptions? This issue is slightly different from the previous ones, in that it asks whether we should have a way to turn SIGFPEs into Julia exceptions immediately via the signal handler. That should be fairly easy, but I'm not completely sure about the correct API. Currently I think it should be a debugging tool only, perhaps emitting a single FloatingPointException type with an internal error code.

I do think the prior discussion (eg, #5234 (comment)) shows that using dynamically scoped FPE masks leads to inherently non-composable code, and should not be used for "real work". This is also my experience in trying to turn on Inexact -> InexactError, which breaks the assumptions of pretty much every piece of code which ever dreams of using a floating point number. In my opinion only a statically scoped solution (applying to floating point operations strictly within the current function) would allow floating point exception flags to be used in a composable way for real production use. But I think that would be a separate issue, and much more difficult to implement.

@simonbyrne
Contributor

The other issue is that LLVM isn't aware of exceptions, so it may reorder operations or propagate constants in a way that means exceptions aren't triggered. The situation has changed somewhat with the addition of LLVM constrained intrinsics, but we need to figure out how to integrate them.

My current thinking is that floating point exceptions and rounding should be done using Cassette.jl, as this would let you overload the necessary intrinsics and allow users to add custom hooks.

@c42f
Member

c42f commented Oct 25, 2018

That's interesting, thanks. I figured having a solid general solution for FPEs would require some fairly deep integration with the compiler.

Would that subsume the feature request in this issue (ie, the ability to do simple fail-fast SIGFPE trapping for debugging)? To me these seem like they might be somewhat orthogonal features.

@simonbyrne
Contributor

My comment was specifically referring to your concerns about the dynamic scoping, but you're right they are somewhat orthogonal.

I actually did try this out on a branch 4 years ago, and I was surprised how well it worked given my scant knowledge of C, but there were a few issues that would need to be figured out.

@c42f
Member

c42f commented Oct 25, 2018

Hah, I had an extremely similar branch with the following relevant commit: adeaa4b

Enabling and disabling the FPE processor flags seemed pretty ugly and system dependent when I looked into it.

@c42f
Member

c42f commented Oct 27, 2018

So, if we were going to implement a version of this for debugging purposes, how about the following concrete and minimal proposal:

  • Add a single concrete type FloatingPointException(code)
  • Add a function setfpe(code1 | code2 | ...) which enables floating point exceptions with the given bitmask codes, and returns the previously set fpe bits.
  • Do not supply dynamic scoping (ie, avoid a with_fpe style interface), so as to not hide the global fpe state that this manipulates, nor pretend that it is composable.
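To make the proposal concrete, here is a minimal sketch of what setfpe might do. It is an assumption-laden illustration, not the proposed implementation: it assumes glibc on Linux x86_64 (where fegetexcept, feenableexcept and fedisableexcept are available), assumes that setfpe sets the trap mask to exactly the given bits rather than adding to it, and the constant FE_ALL_EXCEPT_MASK is a name made up for this example.

```julia
# Sketch only: assumes glibc on Linux x86_64. The ccalls are resolved at call
# time, so merely defining these functions is harmless on other platforms.

# All five standard exception flags on x86 glibc
# (invalid | divbyzero | overflow | underflow | inexact). Hypothetical name.
const FE_ALL_EXCEPT_MASK = 0x3d

# Set the trapping mask to exactly `mask` and return the previously enabled bits.
# This mutates process-global (per-thread) FPU state; nothing is restored
# automatically, in line with the "no dynamic scoping" point of the proposal.
function setfpe(mask::Integer)
    prev = ccall(:fegetexcept, Cint, ())
    ccall(:fedisableexcept, Cint, (Cint,), FE_ALL_EXCEPT_MASK)
    ccall(:feenableexcept, Cint, (Cint,), mask)
    return prev
end
```

Under these assumptions, setfpe(0) turns all traps off and returns whatever was enabled before, so the returned value can be passed back to setfpe later to restore the earlier state by hand.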

@antoine-levitt
Contributor Author

The interface is a bit low-level; adding exception types for every exception would allow for an interface like setfpe(FloatOverFlowError, FloatUnderFlowError) which feels cleaner. But then you wouldn't be able to do setfpe(setfpe() | code), and this is pretty low-level anyway, so your proposal looks good! It would also be useful to add a FPE_ALL_BUT_INEXACT code, which is the one likely to be used in practice.
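For illustration, the combined code suggested here could be as simple as the following. It assumes the glibc x86 flag values quoted earlier in the thread; FPE_ALL_BUT_INEXACT is the name floated in this comment, not an existing constant.

```julia
# Flag values assumed to match glibc's <fenv.h> on x86.
const FE_INVALID   = 0x01
const FE_DIVBYZERO = 0x04
const FE_OVERFLOW  = 0x08
const FE_UNDERFLOW = 0x10
const FE_INEXACT   = 0x20

# Hypothetical convenience mask: every standard exception except inexact,
# which fires on almost any rounding and is rarely useful to trap.
const FPE_ALL_BUT_INEXACT = FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW | FE_UNDERFLOW

println(string(FPE_ALL_BUT_INEXACT, base = 16))   # prints 1d
```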

@c42f
Member

c42f commented Oct 27, 2018

Yes, I'm not sure about the bitmasks. But I think you want setfpe to be able to return the current flags in some form so that if really necessary you can simulate dynamic scoping with

old_fpes = setfpe(new_fpes)
some_code_to_be_debugged()
setfpe(old_fpes)

and this seems like the simplest way to achieve it with the least number of new functions and types. I guess setfpe could also in principle take a Function so that it works with the do syntax

setfpe(new_fpes) do
    some_code_to_be_debugged()
end

though I'm not sure we should encourage that!

@simonbyrne
Contributor

Given all the trouble we had with setrounding (#27166), I would suggest we put the minimal necessary internal changes in Julia (basically, figuring out which error is triggered), and everything else in a package.

@c42f
Member

c42f commented Oct 28, 2018

Ok, so the single exception type and support for recognizing SIGFPE is the minimal possible change, though testing this properly will also require setfpe (or equivalent) so I think that's also required. I won't do the version that takes the Function... that's just asking for a recurrence of the issues which led to setrounding's demise :-)

@johnomotani

This issue has been quiet for a long time - but this would be a very useful debugging tool! Just arrived here searching for the ability to use SIGFPE...

@StefanKarpinski
Sponsor Member

One thing that may make setfpe more tractable than setrounding is that it seems naively more reasonable to ask for FPE globally, whereas with rounding, you'll always want to switch back to other rounding modes in a dynamically scoped way in order to compute things like transcendental functions.

@mvsoom

mvsoom commented Oct 9, 2022

This would be a killer feature. Especially in the age of machine learning.

@brenhinkeller added the domain:error handling label Nov 21, 2022
@simonbyrne linked a pull request that will close this issue Dec 19, 2022

@chriselrod
Contributor

> As it turns out, we can do the following (at least on linux x86_64 with julia >= 0.6) without changing the runtime:

This is really cool.

I'm imagining a debug mode where every Array{T}(undef, sz...) with T<:Union{Float32,Float64}, along with other operations that hand out uninitialized memory, fills with signaling NaNs by default, to catch accidental uses of uninitialized memory.
I'd also extend it to arrays of aggregates in the obvious way (duals with values and partials all being sNaNs).

sNaN demo, first a normal qNaN and then the sNaN:

julia> x = NaN
NaN

julia> setfpexceptions(FE_INVALID) do
           2.0*x
       end
NaN

julia> x = reinterpret(Float64,8189<<50)
NaN

julia> setfpexceptions(FE_INVALID) do
           2.0*x
       end
ERROR: DivideError: integer division error
Stacktrace:
 [1] *(x::Float64, y::Float64)
   @ Base ./float.jl:410
 [2] (::var"#19#20")()
   @ Main ./REPL[43]:2
 [3] setfpexceptions(f::var"#19#20", mode::UInt8)
   @ Main ./REPL[18]:4
 [4] top-level scope
   @ REPL[43]:1

Another fun use case is to use Float64 for your exact integer math, while supporting SIMD and efficiently checking for "overflow".

julia> x = Float64.(1:256);

julia> function mysum(x) # simd
           s = zero(eltype(x))
           for i in eachindex(x)
               @fastmath s += x[i]
           end
           s
       end
mysum (generic function with 1 method)

julia> setfpexceptions(FE_INEXACT) do
           mysum(x)
       end
32896.0

julia> x[5] = 1e18 # too big for exact
1.0e18

julia> x[101] = -1e18 # cancels
-1.0e18

julia> mysum(x) # intermediate rounding inside SIMD code
32812.0

julia> Float64(mysum(big.(x)))
32790.0

julia> setfpexceptions(FE_INEXACT) do
           mysum(x)
       end
ERROR: DivideError: integer division error
Stacktrace:
 [1] add_fast
   @ ./fastmath.jl:172 [inlined]
 [2] mysum(x::Vector{Float64})
   @ Main ./REPL[59]:4
 [3] (::var"#35#36")()
   @ Main ./REPL[69]:2
 [4] setfpexceptions(f::var"#35#36", mode::UInt8)
   @ Main ./REPL[18]:4
 [5] top-level scope
   @ REPL[69]:1

In theory, you could try/catch.
If it fails, you could try again with BigInt.
Or if you're doing something like running a performance optimization pass, you could simply bail out without transforming anything to save on compile time.
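The try/catch idea might look roughly like this. It is a sketch with made-up names (try_fast_then_exact, istrap, trapping_sum, exact_sum); the default istrap checks for DivideError only because that is what the runtime currently throws for a trapped SIGFPE in the examples above. To keep the example independent of platform trap support, the fast path here simply throws instead of enabling FE_INEXACT for real.

```julia
# Hypothetical helper: run `fast`; if it raises what looks like a trapped
# floating point exception, fall back to `exact`. Other errors are rethrown.
function try_fast_then_exact(fast, exact; istrap = err -> err isa DivideError)
    try
        return fast()
    catch err
        istrap(err) || rethrow()
        return exact()
    end
end

x = Float64.(1:256)
x[5] = 1e18      # too big for exact accumulation
x[101] = -1e18   # cancels

# Stand-in for the trapping fast path: pretend the inexact trap fired.
trapping_sum() = throw(DivideError())
# Slow but exact fallback using arbitrary precision.
exact_sum() = Float64(sum(big.(x)))

println(try_fast_then_exact(trapping_sum, exact_sum))   # 32790.0
```

In real use, the fast argument would wrap the trapping call, for instance the setfpexceptions(FE_INEXACT) do-block from earlier in the thread.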

@chriselrod
Contributor

Unfortunately, this doesn't seem to work on my M1/ARM Linux.
