Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NaN vs wild (or, what's a DomainError, really?) #5234

Open
jiahao opened this issue Dec 25, 2013 · 37 comments
Open

NaN vs wild (or, what's a DomainError, really?) #5234

jiahao opened this issue Dec 25, 2013 · 37 comments
Labels
design Design of APIs or of the language itself domain:complex Complex numbers domain:error handling Handling of exceptions by Julia or the user

Comments

@jiahao
Copy link
Member

jiahao commented Dec 25, 2013

(Context: this issue has come up recently in #4967 and elsewhere)

We currently don't treat the results of indeterminate computations consistently between real and complex arithmetic.

Example 1: cosine

julia> cos(Inf)
ERROR: DomainError
 in cos at math.jl:277

julia> cos(complex(Inf,0))
NaN - 0.0im

Example 2: inverse

inverse of 0

julia> inv(0.0)
Inf

julia> inv(complex(0.0, 0.0))
NaN + NaN*im

This illustrates a problem with infinity and signed zero. Unlike real infinities, of which there are only two FloatingPoint values, you can have 13 possible representations of complex infinities which convey different information about the phase (argument) of the complex infinity:

complex(+Inf, +0.0) #phase is exactly 0 or approaches 0 from the first quadrant
complex(+Inf, +Inf) #phase is in (0, pi/2), i.e. the first quadrant excluding the edges
complex(+0.0, +Inf) #phase is exactly pi/2 or approaches pi/2 from the first quadrant
complex(-0.0, +Inf) #phase approaches pi/2 from the second quadrant
complex(-Inf, +Inf) #phase is in (pi/2, pi), i.e. the second quadrant excluding the edges
complex(-Inf, +0.0) #phase is exactly pi or approaches pi from the second quadrant
complex(-Inf, -0.0) #phase approaches pi from the third quadrant
complex(-Inf, -Inf) #phase is in (pi, 3pi/2), i.e. the third quadrant excluding the edges
complex(-0.0, -Inf) #phase approaches 3pi/2 from the third quadrant
complex(+0.0, -Inf) #phase is exactly 3pi/2 or approaches 3pi/2 from the fourth quadrant
complex(+Inf, -Inf) #phase is in (3pi/2, 2pi), i.e. the fourth quadrant excluding the edges
complex(+Inf, -0.0) #phase approaches 2pi from the fourth quadrant
complex(NaN, NaN) #phase cannot be determined to lie in exactly one of the above regions
#                  and hence the infinity has no valid non-NaN representation in floating point

The result is correct if we work with unsigned zeros. However, after accounting for the signed zeros in complex(+0.0, +0.0), this result should be complex(+Inf, +Inf). A DivideError might also be a reasonable alternative here.

inverse of infinities

The inverse mapping is also problematic:

julia> inv(Inf)
0.0

julia> inv(complex(Inf,0))
0.0 - 0.0im

julia> inv(complex(Inf,Inf))
NaN + NaN*im

inverse of indeterminates

julia> inv(NaN)
NaN

julia> inv(complex(0,NaN))
NaN + NaN*im

julia> inv(complex(NaN, 0))
NaN + NaN*im

julia> inv(complex(NaN, NaN))
NaN + NaN*im

Is it meaningful to distinguish between these three possible complex NaNs? It seems silly in this example, but consider also:

julia> complex(0, NaN) + complex(0, NaN)
complex(0.0,NaN)

julia> complex(0, NaN) * complex(0, NaN)
NaN + NaN*im

etc.

Example 3: roots (nonintegral powers)

julia> sqrt(-1)
ERROR: DomainError

julia> sqrt(complex(-1))
0.0 + 1.0im

julia> (-1)^(-1/2)
NaN

julia> complex(-1)^(-1/2)
6.123233995736766e-17 - 1.0im

Whereas sqrt(-1) returns the notorious DomainError, the inverse square root (as computed by x->x^-1/2 does not, but returns a NaN instead. This use of NaN is sanctioned by IEEE 754 in the specific case of a real operation with no real output.

tl;dr: if a floating-point computation returns an indeterminate value, when should it return a NaN (or any of its complex variants), and when should it throw an error like DivideError or DomainError? By my reading of IEEE 754, both of these behaviors are allowed (throwing an error would correspond to a signaling NaN which is trapped by the error handler). Each of these examples in isolation show valid behavior; however, we should be consistent.

@StefanKarpinski
Copy link
Sponsor Member

I think we should throw an error whenever the condition was explicitly checked for anyway.

@jiahao
Copy link
Member Author

jiahao commented Jan 10, 2014

To summarize the prevailing thinking with @StefanKarpinski @loladiro @JeffBezanson:

Operations whose results would produce NaNs should throw Exceptions if the cost of testing a NaN would not dominate the cost of the operation.

By this rule of thumb, elementary arithmetic on real floating-point numbers would be allowed to return NaNs if necessary, but more complicated operations like complex division should throw a DivideError.

@ivarne
Copy link
Sponsor Member

ivarne commented Jan 10, 2014

I don't understand. What operations should be allowed to return NaN, (unless the argument is NaN)? Only /(a::FloatingPoint, b::FloatingPoint)?

@jiahao
Copy link
Member Author

jiahao commented Jan 10, 2014

I think we only have a clear idea that real elementary arithmetic (+, -, x, /) should be allowed to produce and propagate NaNs, since the cost of testing for a NaN or inputs that would produce NaNs would be comparable (or more expensive) than the actual operation itself, but not much else.

@ivarne
Copy link
Sponsor Member

ivarne commented Jan 10, 2014

So all other functions (also those who does not return NaN for any real input) must check for NaN on input and throw DomainError?

@JeffBezanson
Copy link
Sponsor Member

No, NaN input can just propagate. We mostly don't want to produce NaNs from non-NaNs.

@StefanKarpinski
Copy link
Sponsor Member

The poster child is complex division: rather than generating a complex NaN when dividing by zero, just raise an error. Since complex division is so, um, complex, checking for a zero divisor is not exactly going to affect it's performance in any meaningful way. Raising and error is simpler and alerts the user to the problem sooner.

@nalimilan
Copy link
Member

nalimilan commented Jan 10, 2014

Why not, but if that's the only operation which would be affected, is it worth introducing an inconsistency? It would be good to have a simple rule for which functions return NaN, and which throw an error (I mean, an user-understandable rule, not something based on computing costs).

@JeffBezanson
Copy link
Sponsor Member

My guess at the moment is that the rule is: only real floating-point +, -, *, / can produce NaNs from non-NaNs.

@JeffBezanson
Copy link
Sponsor Member

I would also add complex floating-point +, -, *, / to that list. The 1/complex(0.0,0.0) case is a divide-by-zero error, so you can still get NaNs from complex / in other ways.

@simonbyrne
Copy link
Contributor

As outlined in this discussion, a more systematic solution would be to take advantage of floating-point exception masking. This would allow users to turn exceptions on and off as required (e.g. have exceptions on for development/debugging, and off for production code), reduce the overhead (by fewer range checks) and generally make functions more vectorisation friendly.

@elextr
Copy link

elextr commented Nov 20, 2014

As noted in the discussion linked above, it becomes necessary to turn off hardware floating point exceptions in some cases:

  1. calling libraries that take the valid design decision to allow NaNs to appear and then propagate and to test for them at the end
  2. calling libraries that are not expecting exceptions
  3. calling libraries that are written in non-exception aware languages (C, Fortran etc)
  4. preventing exceptions escaping Julia code that is called from languages that are not exception aware

So it becomes rather difficult to decide when to allow floating point (and by extension complex) exceptions and when to disallow them, and the one piece of Julia code may be used in many situations.

@simonbyrne
Copy link
Contributor

How about the following proposal:

  • We provide functions for masking/unmasking different floating point exceptions.
  • In a Julia session, the invalid exception is unmasked (i.e. 0.0/0.0 throws InvalidException) by default, but all others are masked (1.0/0.0 returns an Inf). We provide a command line argument (--mask-invalid?) to disable this behaviour.
  • Julia code embedded in other languages, or called via cfunction should use the floating point masks of the caller.
  • We can provide wrappers (ccallmask?) to enable masking for the duration of the ccall.
  • Functions and libraries should not change floating point masks without resetting them to their original setting.
  • Functions which return floating point values should be made to trigger appropriate behaviour, especially for invalid values (see point 2 below). The IEEE standard (chapter 9) provides guidance for basic math functions, which openlibm seems to follow. DomainErrors can be retained for integer-valued functions.

I feel that this should address the above concerns of @elextr, as well as address the needs of @eschnett and @mlubin mentioned on the mailing list discussion, and generally make us much closer to getting the proverbial Kahan tick of approval.

Possible issues:

  1. Users will not be able to reliably catch InvalidException (unless, of course, they explicitly disable masking beforehand).
  2. There doesn't appear to be a set way to trigger an InvalidException. Just using 0.0/0.0 doesn't work, as LLVM changes it to a constant NaN. The openlibm trick of using return (x-x)/0.0, where x is the argument to the function, seems to work though: this should always raise an invalid exception unless x is already a NaN (which is the usual desired behaviour).
  3. LLVM will sometimes rearrange operations involving MXCSR access incorrectly (see LLVM bug #6393). This shouldn't be a big issue for the above, as most changes will occur either globally (invoked by the user in the REPL), or around ccalls (for which LLVM should keep the correct order). But it does mean that you won't be able to enable it reliably for short snippets of Julia code.
    • Given the lack of activity on this issue, it doesn't seem like something LLVM are rushing to address. Perhaps this could be a good GSOC project for a student interested in compilers?

@elextr
Copy link

elextr commented Nov 20, 2014

@simonbyrne looks a good basic proposal but for the notes below.

In a Julia session, the invalid exception is unmasked (i.e. 0.0/0.0 throws InvalidException) by default, but all others are masked (1.0/0.0 returns an Inf). We provide a command line argument (--mask-invalid?) to disable this behaviour.

Thats ok, but it forces all ccalls to mask it off. So a default of all (or most?) exceptions on is not going to be any worse and that will provide the benefits to pure Julia code. Otherwise all off.

We can provide wrappers (ccallmask?) to enable masking for the duration of the ccall.

If at least one exception is going to be enabled by default (as above) then it is probably better that the default ccall turns them all off and restores the mask and ccallmask is used for the exceptional circumstances where it is known safe to leave some on. Turning all exceptions off is going to be needed for nearly all ccalls and doing it manually is a subtlety that shouldn't be required of newcomers to Julia. Note this will also need to be applied to all ccalls inside Julia, and its libraries and packages where it can't be proven that the code ccalled is exception safe or cannot generate exceptions.

Julia code embedded in other languages, or called via cfunction should use the floating point masks of the caller.

Functions and libraries should not change floating point masks without resetting them to their original setting.

I would just add "and must catch all exceptions" to both of these.

@simonbyrne
Copy link
Contributor

I would disagree that ccall should mask by default, as it does add to the overhead of calling external functions.

@elextr
Copy link

elextr commented Nov 20, 2014

Thinking about it some more (and at this point I am talking about Julia code only, the issues with foreign code remain as above) if the FPE masks can be changed:

  • code that is written expecting exceptions but which is called with exceptions off might still work if any NaNs just propogate. But it may also fail if it finds an unexpected NaN.
  • code that is written expecting NaNs to propagate, and then tests for them, will fail if called with exceptions on, since it will never get to the test if a NaN occurs.

So to avoid failures all Julia FP code is going to have to do one of:

  • every function will have to set and restore the FPE mask to match its expectations, or
  • callers must always know what the expectations of functions they call are and set and restore the masks if the called function needs something different to the current setting, or
  • callers must know which functions expect the same FP mask as the caller is using and only call those functions.

None of these is particularly attractive, the first is very noisy and the other two are exceptionally error prone. At this point I admit I don't have a "nice" solution.

@StefanKarpinski
Copy link
Sponsor Member

I fear that this would work about as well as dynamically scoped rounding modes – i.e. not well at all :-\

@simonbyrne
Copy link
Contributor

@elextr I don't think this should be an issue: the way I see it, the whole point of raising floating point exceptions is so that you can get rid of whatever is causing them, you don't actually want to try/catch them (which is why I don't think my issue 1 above is such a problem). So a function should not expect an exception to be raised (because the function will simply stop anyway if one is raised), and should always accept NaN as an argument.

Basically, masks should only be changed for two reasons:

  • by the user in the REPL
  • around a ccall for "exception-unaware" numerical libraries

I would go so far as to say that packages which modify masks in any other way shouldn't be accepted in METADATA unless they can provide a good justification for why they need to do it.

@simonbyrne
Copy link
Contributor

@StefanKarpinski I guess the flip side is that I feel that DomainError is basically trying to be an InvalidException, except that it's slow, inconsistent and inflexible.

@jiahao jiahao added the domain:error handling Handling of exceptions by Julia or the user label Nov 20, 2014
@eschnett
Copy link
Contributor

@simonbyrne It is often very difficult to ensure ahead of time that no nans appear in a calculation. Thus in my code, I often rather accept that the result is a nan, and fix up the problem later. These calculations happen in a tight inner loop, and handling exceptions in this loop is a non-starter, as it would prevent vectorization.

I appreciate that many people don't like having to deal with nans explicitly, because in most cases, nans result from errors e.g. in the input data. However, I quite like this part of the IEEE standard, and would like a way to use nans in Julia.

I'm not saying this should be the default, or that all library functions should be nan-safe in this way -- but there needs to be a way to use nans for efficiency in an HPC environment. I'd be happy to use a macro similar to @inbounds for this. I'd also be happy to re-write Julia's standard math functions (sqrt, exp, log) myself. However, there needs to be a reasonable way for experts to do so.

@mlubin
Copy link
Member

mlubin commented Nov 20, 2014

Maybe the solution is to introduce alternate versions of the built-in math functions which return NaN instead of throwing an exception?

@simonbyrne
Copy link
Contributor

@mlubin The problem is then the functions that call the built-in ones, and the ones that call those...

I do think the IEEE exception functionality can be made to work in a sensible fashion, and that it provides our best hope for ensuring consistency and interoperability with other programs.

@mlubin
Copy link
Member

mlubin commented Nov 20, 2014

@simonbyrne, any library/package functions that call log should be expecting it to throw an exception, changing that behavior with a flag seems crazy.

@mlubin
Copy link
Member

mlubin commented Nov 21, 2014

What I belatedly came to realise, is that being consistent throughout the program becomes impossible once you use external libraries that may be written to the opposite strategy to yours.

Agreed here. The logical alternative is to provide versions of the basic functions that return NaNs. This at least allows everyone to be consistent according to the situation. I'm guessing that typically one would want a DomainError to be thrown for user friendliness, but in specialized cases, e.g., vectorization, interacting with C code, etc., switching log to log_nan (horrible name) makes more sense.

@staticfloat
Copy link
Sponsor Member

log�()? ;)

@elextr
Copy link

elextr commented Nov 21, 2014

@mlubin that requires that the code of every library that you want to use (including the C ones) is changed to use log_nan since thats what the current behaviour of log is.

It would be better if the new behaviour has the new name that generates exceptions, eg ex_log as an example of an equally bad name :)

@mlubin
Copy link
Member

mlubin commented Nov 22, 2014

Following the open-source philosophy, I've created a package which solves my problem: https://github.com/mlubin/NaNMath.jl.

@simonbyrne
Copy link
Contributor

I had a bit of time on Friday to try out my idea:
https://github.com/simonbyrne/julia/tree/fpe
when combined with my mxcsr script gives some interesting results.

On Linux it seems to work fairly well, the only problem being that after throwing one error it seems re-enables the mask. Hopefully there is a sensible way to fix this. One idea would be to use the "official" feenableexcept interface, but I haven't had a chance to look at it yet.

On OS X, my experience was more frustrating: it does throw exceptions at the correct times, but it doesn't seem to return the correct exception code that indicates the type of exception (Invalid, Inexact, DivByZero, etc). Unfortunately, it seems that OS X doesn't officially support throwing floating point exceptions at all (e.g. see here), so we might have a hard time getting Apple to fix it.

If anyone has used this sort of stuff in C or C++, I'd appreciate any help or suggestions that I could try.

@eschnett
Copy link
Contributor

@simonbyrne This example http://www-personal.umich.edu/~williams/archive/computation/fe-handling-example.c can be found by chasing the link you provided. It seems to provide an implementation of the necessary functionality for OS X, both for PPC (!) and Intel Macs.

The respective function calls translate to a few machine instructions, so we don't need any Apple support for this.

@simonbyrne
Copy link
Contributor

Thanks @eschnett. It looks pretty similar to what I was doing, but I'll try it out. I seem to recall that the problem was the signals returned by siginfo were completely arbitrary.

@StefanKarpinski StefanKarpinski added design Design of APIs or of the language itself and removed needs decision A decision on this change is needed labels Sep 13, 2016
@StefanKarpinski StefanKarpinski added this to the 0.6.0 milestone Sep 13, 2016
@StefanKarpinski
Copy link
Sponsor Member

Bump.

@oscardssmith
Copy link
Member

What would happen if functions had a nan mode in them, so sqrt would be sqrt(x,nan=false) with a default value? To me this seems to let libraries do what they want, let users do what they want, and not mess with code generation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design Design of APIs or of the language itself domain:complex Complex numbers domain:error handling Handling of exceptions by Julia or the user
Projects
None yet
Development

No branches or pull requests