New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

should min(0.0, NaN) be NaN? #7866

Closed
StefanKarpinski opened this Issue Aug 6, 2014 · 51 comments

Comments

Projects
None yet
@StefanKarpinski
Member

StefanKarpinski commented Aug 6, 2014

I know this has come up before and we opted to go with Matlab's behavior, but it strikes me that NaN poisoning really might be safer. After all a NaN could be larger or smaller than any value.

@ViralBShah

This comment has been minimized.

Show comment
Hide comment
@ViralBShah

ViralBShah Aug 6, 2014

Member

+1 for poisoning. The same should apply to minimum too then.

Member

ViralBShah commented Aug 6, 2014

+1 for poisoning. The same should apply to minimum too then.

@cartazio

This comment has been minimized.

Show comment
Hide comment
@cartazio

cartazio Aug 6, 2014

One subtlety is which NaN should you pick when Both arguments are NaN? I think in that case one valid choice is to use the IEEE specified total order.

cartazio commented Aug 6, 2014

One subtlety is which NaN should you pick when Both arguments are NaN? I think in that case one valid choice is to use the IEEE specified total order.

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Aug 6, 2014

Member

x86 floating-point instructions all use the leftmost NaN argument, which seems like a reasonable thing to do. Adding more work to pick an ordering among NaNs seems unnecessary – I can't imagine any situation where which of two NaNs one picks is significant.

Member

StefanKarpinski commented Aug 6, 2014

x86 floating-point instructions all use the leftmost NaN argument, which seems like a reasonable thing to do. Adding more work to pick an ordering among NaNs seems unnecessary – I can't imagine any situation where which of two NaNs one picks is significant.

@JeffBezanson

This comment has been minimized.

Show comment
Hide comment
@JeffBezanson

JeffBezanson Aug 6, 2014

Member

We went with standards here, e.g. fmin and, I believe, IEEE. This is not just taken from matlab.

Member

JeffBezanson commented Aug 6, 2014

We went with standards here, e.g. fmin and, I believe, IEEE. This is not just taken from matlab.

@cartazio

This comment has been minimized.

Show comment
Hide comment
@cartazio

cartazio Aug 6, 2014

IEEE does no specify min or max, just >, ==, <, <=,>= and friends.

 #include <stdlib.h>
#include <stdio.h>
#include <math.h>

void main(){
  double nann = 0.0/0.0 ;
  printf("%f\n%f\n%f",fmax(nann,1.0),fmax(nann,1.0),fmax(1.20,3.0) );
}

prints

~/D/r/intFloatRegisterFoo $ ./a.out
1.000000
1.000000
3.000000⏎

additionally, according to the c language reference:

  Returns the larger of two floating point arguments, treating NaNs as missing data (between a NaN and a    
  numeric value, the numeric value is chosen)

that seems redundant in the context of language support for explicitly coding missing data.

cartazio commented Aug 6, 2014

IEEE does no specify min or max, just >, ==, <, <=,>= and friends.

 #include <stdlib.h>
#include <stdio.h>
#include <math.h>

void main(){
  double nann = 0.0/0.0 ;
  printf("%f\n%f\n%f",fmax(nann,1.0),fmax(nann,1.0),fmax(1.20,3.0) );
}

prints

~/D/r/intFloatRegisterFoo $ ./a.out
1.000000
1.000000
3.000000⏎

additionally, according to the c language reference:

  Returns the larger of two floating point arguments, treating NaNs as missing data (between a NaN and a    
  numeric value, the numeric value is chosen)

that seems redundant in the context of language support for explicitly coding missing data.

@JeffBezanson

This comment has been minimized.

Show comment
Hide comment
@JeffBezanson

JeffBezanson Aug 6, 2014

Member

Section 5.3 of the IEEE fp standard specifies minNum and maxNum which behave this way.

Member

JeffBezanson commented Aug 6, 2014

Section 5.3 of the IEEE fp standard specifies minNum and maxNum which behave this way.

@cartazio

This comment has been minimized.

Show comment
Hide comment
@cartazio

cartazio Aug 6, 2014

you are correct sir! (though i must admit it bothers me, but it makes sense in the context of C)

section 5.3

― sourceFormat minNum(source, source)
sourceFormat maxNum(source, source)
sourceFormat minNumMag(source, source)
sourceFormat maxNumMag(source, source)

minNum(x, y) is the canonicalized number x if x<y, y if y<x, the canonicalized number if one operand is a number and the other a quiet NaN. Otherwise it is either x or y, canonicalized (this means results might differ among implementations). When either x or y is a signalingNaN, then the result is according to 6.2.
maxNum(x, y) is the canonicalized number y if x<y, x if y<x, the canonicalized number if one operand is a number and the other a quiet NaN. Otherwise it is either x or y, canonicalized (this means results might differ among implementations). When either x or y is a signalingNaN, then the result is according to 6.2.

section 6.2 Operations with NaNs 6.2.0

Two different kinds of NaN, signaling and quiet, shall be supported in all floating-point operations. Signaling NaNs afford representations for uninitialized variables and arithmetic-like enhancements (such as complex-affine infinities or extremely wide range) that are not in the scope of this standard. Quiet NaNs should, by means left to the implementer’s discretion, afford retrospective diagnostic information inherited from invalid or unavailable data and results. To facilitate propagation of diagnostic information contained in NaNs, as much of that information as possible should be preserved in NaN results of operations.
Under default exception handling, any operation signaling an invalid operation exception and for which a floating-point result is to be delivered shall deliver a quiet NaN.
Signaling NaNs shall be reserved operands that, under default exception handling, signal the invalid operation exception (see 7.2) for every general-computational and signaling-computational operation except for the conversions described in 5.12. For non-default treatment, see 8.

cartazio commented Aug 6, 2014

you are correct sir! (though i must admit it bothers me, but it makes sense in the context of C)

section 5.3

― sourceFormat minNum(source, source)
sourceFormat maxNum(source, source)
sourceFormat minNumMag(source, source)
sourceFormat maxNumMag(source, source)

minNum(x, y) is the canonicalized number x if x<y, y if y<x, the canonicalized number if one operand is a number and the other a quiet NaN. Otherwise it is either x or y, canonicalized (this means results might differ among implementations). When either x or y is a signalingNaN, then the result is according to 6.2.
maxNum(x, y) is the canonicalized number y if x<y, x if y<x, the canonicalized number if one operand is a number and the other a quiet NaN. Otherwise it is either x or y, canonicalized (this means results might differ among implementations). When either x or y is a signalingNaN, then the result is according to 6.2.

section 6.2 Operations with NaNs 6.2.0

Two different kinds of NaN, signaling and quiet, shall be supported in all floating-point operations. Signaling NaNs afford representations for uninitialized variables and arithmetic-like enhancements (such as complex-affine infinities or extremely wide range) that are not in the scope of this standard. Quiet NaNs should, by means left to the implementer’s discretion, afford retrospective diagnostic information inherited from invalid or unavailable data and results. To facilitate propagation of diagnostic information contained in NaNs, as much of that information as possible should be preserved in NaN results of operations.
Under default exception handling, any operation signaling an invalid operation exception and for which a floating-point result is to be delivered shall deliver a quiet NaN.
Signaling NaNs shall be reserved operands that, under default exception handling, signal the invalid operation exception (see 7.2) for every general-computational and signaling-computational operation except for the conversions described in 5.12. For non-default treatment, see 8.

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Aug 7, 2014

Member

Treating NaN as missing data seems incorrect and dangerous, despite the C standard.

Member

StefanKarpinski commented Aug 7, 2014

Treating NaN as missing data seems incorrect and dangerous, despite the C standard.

@staticfloat

This comment has been minimized.

Show comment
Hide comment
@staticfloat

staticfloat Aug 7, 2014

Member

I'd like to add a +1 for poisoning. I think if you're wanting to treat something as missing data, you have much better options than using NaN.

Member

staticfloat commented Aug 7, 2014

I'd like to add a +1 for poisoning. I think if you're wanting to treat something as missing data, you have much better options than using NaN.

@cartazio

This comment has been minimized.

Show comment
Hide comment
@cartazio

cartazio Aug 7, 2014

i'd favor poisoning too, I'm actually prepping a proposal to change how GHC haskell min and max work on floats to have the poisoning semantics (ie nan propagating).

cartazio commented Aug 7, 2014

i'd favor poisoning too, I'm actually prepping a proposal to change how GHC haskell min and max work on floats to have the poisoning semantics (ie nan propagating).

@JeffBezanson

This comment has been minimized.

Show comment
Hide comment
@JeffBezanson

JeffBezanson Aug 7, 2014

Member

I agree that not propagating NaNs here doesn't make sense. There's nothing special about min that grants it a definite value on undefined input. It's just that following IEEE is a good default decision.

One can have a vigorous debate about 1^NaN, but for min I'm not sure what the argument would be. Although this does raise the question of min(-Inf,NaN) :)

Member

JeffBezanson commented Aug 7, 2014

I agree that not propagating NaNs here doesn't make sense. There's nothing special about min that grants it a definite value on undefined input. It's just that following IEEE is a good default decision.

One can have a vigorous debate about 1^NaN, but for min I'm not sure what the argument would be. Although this does raise the question of min(-Inf,NaN) :)

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Aug 7, 2014

Member

I would just keep it simple and say that if any argument is NaN, the result is NaN.

Member

StefanKarpinski commented Aug 7, 2014

I would just keep it simple and say that if any argument is NaN, the result is NaN.

@ArchRobison

This comment has been minimized.

Show comment
Hide comment
@ArchRobison

ArchRobison Aug 8, 2014

Contributor

How about providing minnum and maxnum for programmers who want the IEEE semantics (or want the associated hardware efficiencies), and do NAN poisoning for min and max? Numpy calls the IEEE version nanmin.

I became curious about the rationale for the IEEE rules. William Kahan ("father of IEEE 754") seems to be the origin of the rules according to committee minutes:

Kahan proposed that it must be commutative & the default shall be to pass numbers over NaNs & he doesn't care about the sign of zero.

Kahan is an expert and so presumably had some motivation (possibly for experts), but I haven't been able to find it. I've sent him an email query.

Contributor

ArchRobison commented Aug 8, 2014

How about providing minnum and maxnum for programmers who want the IEEE semantics (or want the associated hardware efficiencies), and do NAN poisoning for min and max? Numpy calls the IEEE version nanmin.

I became curious about the rationale for the IEEE rules. William Kahan ("father of IEEE 754") seems to be the origin of the rules according to committee minutes:

Kahan proposed that it must be commutative & the default shall be to pass numbers over NaNs & he doesn't care about the sign of zero.

Kahan is an expert and so presumably had some motivation (possibly for experts), but I haven't been able to find it. I've sent him an email query.

@simonster

This comment has been minimized.

Show comment
Hide comment
@simonster

simonster Aug 9, 2014

Member

It seems that the minsd instruction is ifelse(x < y, x, y), which is neither the NaN poisioning nor IEEE definition.

Member

simonster commented Aug 9, 2014

It seems that the minsd instruction is ifelse(x < y, x, y), which is neither the NaN poisioning nor IEEE definition.

@ArchRobison

This comment has been minimized.

Show comment
Hide comment
@ArchRobison

ArchRobison Aug 11, 2014

Contributor

I erred when I wrote "associated hardware efficiencies". I had misinterpreted the description of minsd. The semantics of minsd were designed back in the '90s, well before IEEE 754-2008, to that compilers could optimize the common C idiom ```x<y?x:y".

As for the semantics that we seem to favor, the Intel manual coyly states:

If only one value is a NaN (SNaN or QNaN) for this instruction, the second operand (source operand), either a NaN or a valid floating-point value, is written to the result. If instead of this behavior, it is required that the NaN source operand (from either the first or second operand) be returned, the action of MINPD can be emulated using a sequence of instructions, such as, a comparison followed by AND, ANDN and OR.

I'm puzzling over the emulation that the author had in mind. I don't see how a single comparison suffices, since if one of the operands is a NaN, the information from the comparison does not distinguish which operand is the NaN.

Contributor

ArchRobison commented Aug 11, 2014

I erred when I wrote "associated hardware efficiencies". I had misinterpreted the description of minsd. The semantics of minsd were designed back in the '90s, well before IEEE 754-2008, to that compilers could optimize the common C idiom ```x<y?x:y".

As for the semantics that we seem to favor, the Intel manual coyly states:

If only one value is a NaN (SNaN or QNaN) for this instruction, the second operand (source operand), either a NaN or a valid floating-point value, is written to the result. If instead of this behavior, it is required that the NaN source operand (from either the first or second operand) be returned, the action of MINPD can be emulated using a sequence of instructions, such as, a comparison followed by AND, ANDN and OR.

I'm puzzling over the emulation that the author had in mind. I don't see how a single comparison suffices, since if one of the operands is a NaN, the information from the comparison does not distinguish which operand is the NaN.

@ArchRobison

This comment has been minimized.

Show comment
Hide comment
@ArchRobison

ArchRobison Aug 11, 2014

Contributor

FYI, I inquired with the experts here, and the recommended AVX sequence for NaN-propagating min is:

VMIN R, a, b           // result is b if a or b are NaN, min(a,b) otherwise
                       // so Nan is not propagated only if a is the NaN
VCMPNEQ M, a, a        // M=11…11  if a is NaN, 0 otherwise
VBLENDV Res, R, a, M   // Res = R if M=0 (a not NaN), otherwise Res=a (if a is NaN)

It's only one instruction shorter than the code currently generated from the NaN-propagating definition min{T<:FloatingPoint}(x::T, y::T) = ifelse((x < y) | (x != x), x, y). If we lose sleep over that extra instruction, we can submit a patch to LLVM. :-)

Contributor

ArchRobison commented Aug 11, 2014

FYI, I inquired with the experts here, and the recommended AVX sequence for NaN-propagating min is:

VMIN R, a, b           // result is b if a or b are NaN, min(a,b) otherwise
                       // so Nan is not propagated only if a is the NaN
VCMPNEQ M, a, a        // M=11…11  if a is NaN, 0 otherwise
VBLENDV Res, R, a, M   // Res = R if M=0 (a not NaN), otherwise Res=a (if a is NaN)

It's only one instruction shorter than the code currently generated from the NaN-propagating definition min{T<:FloatingPoint}(x::T, y::T) = ifelse((x < y) | (x != x), x, y). If we lose sleep over that extra instruction, we can submit a patch to LLVM. :-)

@cartazio

This comment has been minimized.

Show comment
Hide comment
@cartazio

cartazio Aug 11, 2014

does that code sequence properly handle signedness of zero? Afaik, the VMIN operations don't distinguish -0 vs 0

does that code sequence properly handle signedness of zero? Afaik, the VMIN operations don't distinguish -0 vs 0

@ArchRobison

This comment has been minimized.

Show comment
Hide comment
@ArchRobison

ArchRobison Aug 11, 2014

Contributor

The sequence will compute min(-0,+0) as +0. Detecting signedness of zero is problematic for any comparison-based min routine, since IEEE 754-2008 says Comparisons shall ignore the sign of zero (so +0 = −0).

Contributor

ArchRobison commented Aug 11, 2014

The sequence will compute min(-0,+0) as +0. Detecting signedness of zero is problematic for any comparison-based min routine, since IEEE 754-2008 says Comparisons shall ignore the sign of zero (so +0 = −0).

@porterjamesj

This comment has been minimized.

Show comment
Hide comment
@porterjamesj

porterjamesj Aug 13, 2014

Member

As someone who has only an introductory systems course level understanding of IEEE 754, the one thing that sticks out in my mind about NaNs is that they tend to poison computations, so this change might be intuitive for the average user regardless of what the standard actually says :)

I like the idea of providing separate functions for those who do want the standard-specified behavior if we decide to do this.

Member

porterjamesj commented Aug 13, 2014

As someone who has only an introductory systems course level understanding of IEEE 754, the one thing that sticks out in my mind about NaNs is that they tend to poison computations, so this change might be intuitive for the average user regardless of what the standard actually says :)

I like the idea of providing separate functions for those who do want the standard-specified behavior if we decide to do this.

@cartazio

This comment has been minimized.

Show comment
Hide comment
@cartazio

cartazio Aug 13, 2014

one small detail that I think might play nice with future nan interactions, would be that, in the case that both arguments are nans, perhaps the result nan should be the bitwise OR of the two input nans (idea being that those 53 bits otherwise unused in NANs could represent different causes of the original nan error, and that would give a neat trick for saying "heres the set of ways your program borked the math!")

one small detail that I think might play nice with future nan interactions, would be that, in the case that both arguments are nans, perhaps the result nan should be the bitwise OR of the two input nans (idea being that those 53 bits otherwise unused in NANs could represent different causes of the original nan error, and that would give a neat trick for saying "heres the set of ways your program borked the math!")

@cartazio

This comment has been minimized.

Show comment
Hide comment
@cartazio

cartazio Aug 18, 2014

this LLVM thread is relevant http://article.gmane.org/gmane.comp.compilers.llvm.cvs/201804
i'm going to ask them about the nan returning semantics option

this LLVM thread is relevant http://article.gmane.org/gmane.comp.compilers.llvm.cvs/201804
i'm going to ask them about the nan returning semantics option

@ArchRobison

This comment has been minimized.

Show comment
Hide comment
@ArchRobison

ArchRobison Aug 22, 2014

Contributor

William Kahan kindly replied to my enquiry about the IEEE definition of min. His note suggests a 4th possible definition of min that propagates NaN except when one of the arguments is -∞, in order to preserve the identity min(-∞,y)==-∞.

Here is his reply in full:

Arch:
The recommendation that  min{ x, NaN}  and  min{NaN, x}  both be  x  instead of  NaN 
arose at a time when graph plotting programs had nowhere to put a  NaN,  and could
abort instead.  In my experience the most common occasions when  min{x, NaN}  arose 
were in plotting graphs of functions whose domains had holes.  For instance,  let  Y(x) 
be the solutions of the equation   x^2 - y^2 = 1 ,  namely  Y(x) = */- sqrt( (x-1)(x+1) ) ;
and plot  Y(x)  vs.  x  over,  say,  -2 < x < 2 .  Then the result should be two branches of 
an hyperbola separated by a gap where  -1 < x < 1 .  To bridge the gap we perform a 
"Windowing"  operation by plotting  max{-2,  min{+2, Y(x)} }  in a square window  that
barely contains  -2 < x < 2   and  -2 < y < 2 ,  putting the  NaNs  on the window's upper
edge where they can be ignored.  This resolution of the problems posed by domains 
with holes is not perfect,  but better than the alternatives available at the time to cope 
with holes whose locations could.not easily be predicted before the attempt to plot.

There must be occasions where  minm(x, NaN}  and  maxm{x, NaN}  should be  NaN 
instead of  x  to serve a computational necessity.  Here I have used different names for 
the minimum and maximum functions;  you might disagree with my choice of names.

However,  maxm(+Infinity, NaN)  must be  +Infinity,  NOT NaN,  to honor the rule that 
if a function  f(x, y)  has the property for some value  X  that  f(X, y)  is independent of 
y ,  be it finite or infinite,  then that  f(X, NaN)  must be the same as  f(X, y) .  This may 
make  maxm  harder to implement than  max . The foregoing rule is crucially necessary. 
If there were no way to get rid of an irrelevant  NaN  then it might as well stop computation 
at its earliest encounter.

I hope that my explanation helps.  If you have a better rationale I will be glad to entertain
it.

With best wishes,
                                                                    Prof. W. Kahan
Contributor

ArchRobison commented Aug 22, 2014

William Kahan kindly replied to my enquiry about the IEEE definition of min. His note suggests a 4th possible definition of min that propagates NaN except when one of the arguments is -∞, in order to preserve the identity min(-∞,y)==-∞.

Here is his reply in full:

Arch:
The recommendation that  min{ x, NaN}  and  min{NaN, x}  both be  x  instead of  NaN 
arose at a time when graph plotting programs had nowhere to put a  NaN,  and could
abort instead.  In my experience the most common occasions when  min{x, NaN}  arose 
were in plotting graphs of functions whose domains had holes.  For instance,  let  Y(x) 
be the solutions of the equation   x^2 - y^2 = 1 ,  namely  Y(x) = */- sqrt( (x-1)(x+1) ) ;
and plot  Y(x)  vs.  x  over,  say,  -2 < x < 2 .  Then the result should be two branches of 
an hyperbola separated by a gap where  -1 < x < 1 .  To bridge the gap we perform a 
"Windowing"  operation by plotting  max{-2,  min{+2, Y(x)} }  in a square window  that
barely contains  -2 < x < 2   and  -2 < y < 2 ,  putting the  NaNs  on the window's upper
edge where they can be ignored.  This resolution of the problems posed by domains 
with holes is not perfect,  but better than the alternatives available at the time to cope 
with holes whose locations could.not easily be predicted before the attempt to plot.

There must be occasions where  minm(x, NaN}  and  maxm{x, NaN}  should be  NaN 
instead of  x  to serve a computational necessity.  Here I have used different names for 
the minimum and maximum functions;  you might disagree with my choice of names.

However,  maxm(+Infinity, NaN)  must be  +Infinity,  NOT NaN,  to honor the rule that 
if a function  f(x, y)  has the property for some value  X  that  f(X, y)  is independent of 
y ,  be it finite or infinite,  then that  f(X, NaN)  must be the same as  f(X, y) .  This may 
make  maxm  harder to implement than  max . The foregoing rule is crucially necessary. 
If there were no way to get rid of an irrelevant  NaN  then it might as well stop computation 
at its earliest encounter.

I hope that my explanation helps.  If you have a better rationale I will be glad to entertain
it.

With best wishes,
                                                                    Prof. W. Kahan
@cartazio

This comment has been minimized.

Show comment
Hide comment
@cartazio

cartazio Aug 22, 2014

huh, this is a very interesting and good point about the "Laws" of min and max on the reals

huh, this is a very interesting and good point about the "Laws" of min and max on the reals

@JeffBezanson

This comment has been minimized.

Show comment
Hide comment
@JeffBezanson

JeffBezanson Aug 22, 2014

Member

That is a great email, as expected. I think propagating NaN except for +-Inf is a very good definition for min and max.

Honestly I find the plotting justification shockingly weak. There's no reason to assume plotting must be done by applying min and max to points, and nothing further. It would be far better to check for NaN or out-of-range values and simply omit them. If there is no better justification, I would hope this is changed in the next IEEE standard revision, if there is one. They could make it backwards compatible by adding new operations with this behavior.

Member

JeffBezanson commented Aug 22, 2014

That is a great email, as expected. I think propagating NaN except for +-Inf is a very good definition for min and max.

Honestly I find the plotting justification shockingly weak. There's no reason to assume plotting must be done by applying min and max to points, and nothing further. It would be far better to check for NaN or out-of-range values and simply omit them. If there is no better justification, I would hope this is changed in the next IEEE standard revision, if there is one. They could make it backwards compatible by adding new operations with this behavior.

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Aug 22, 2014

Member

We should get "I ❤️ Kahan" t-shirts made.

Member

StefanKarpinski commented Aug 22, 2014

We should get "I ❤️ Kahan" t-shirts made.

@sunfishcode

This comment has been minimized.

Show comment
Hide comment
@sunfishcode

sunfishcode Sep 18, 2014

"if a function f(x, y) has the property for some value X that f(X, y) is independent of y, be it finite or infinite, then that f(X, NaN) must be the same as f(X, y) ."

Does anyone know the justification for this rule? I'm not necessarily disputing it, I primarily want to understand it.

I have also asked Professor Kahan about this, and one thing he said was that this rule assumes that the environment has floating-point flags which record floating-point exceptions that have occurred, and which can be easily queried. We don't necessarily need the return value of an expression to record that a NaN happened if there's a flag which holds that information that we can test.

However, Julia, and every other programming language I'm aware of besides assembly, doesn't provide easy and reliable access to these flags (including C, even with fenv.h, because in practice compilers like clang and GCC don't implement #pragma STDC FENV_ACCESS). It's not a coincidence that modelling floating-point operations as operations that mutate global state is not something that easily fits into high-level languages.

Given that Julia doesn't expose the floating-point exception flags, it's not clear that this justification applies here.

"if a function f(x, y) has the property for some value X that f(X, y) is independent of y, be it finite or infinite, then that f(X, NaN) must be the same as f(X, y) ."

Does anyone know the justification for this rule? I'm not necessarily disputing it, I primarily want to understand it.

I have also asked Professor Kahan about this, and one thing he said was that this rule assumes that the environment has floating-point flags which record floating-point exceptions that have occurred, and which can be easily queried. We don't necessarily need the return value of an expression to record that a NaN happened if there's a flag which holds that information that we can test.

However, Julia, and every other programming language I'm aware of besides assembly, doesn't provide easy and reliable access to these flags (including C, even with fenv.h, because in practice compilers like clang and GCC don't implement #pragma STDC FENV_ACCESS). It's not a coincidence that modelling floating-point operations as operations that mutate global state is not something that easily fits into high-level languages.

Given that Julia doesn't expose the floating-point exception flags, it's not clear that this justification applies here.

@eschnett

This comment has been minimized.

Show comment
Hide comment
@eschnett

eschnett Sep 19, 2014

Contributor

This rule does not apply e.g. for adding 0.0. 0.0+y == y for all values of y, except if y is a nan. (I hope I'm not stumbling about a special case for -0.0 here.)

This actually inhibits compiler optimizations. With full IEEE compatibility, 0.0+y cannot be optimized to y.

Contributor

eschnett commented Sep 19, 2014

This rule does not apply e.g. for adding 0.0. 0.0+y == y for all values of y, except if y is a nan. (I hope I'm not stumbling about a special case for -0.0 here.)

This actually inhibits compiler optimizations. With full IEEE compatibility, 0.0+y cannot be optimized to y.

@ArchRobison

This comment has been minimized.

Show comment
Hide comment
@ArchRobison

ArchRobison Sep 19, 2014

Contributor

My impression is that the rule is derived from the assumption that NaN represents "could be any real number". But I'm not so sure about the rule if the NaN came from sqrt(-1.0).

0.0+y is "the same as" y when y is NaN. It's just that == with a NaN is not the same as "same as". It's 0.0+(-0.0) that's the thorn in the side for optimizing 0.0+y, since 0.0+(-0.0) is 0.0. According to Muchnick's text the only safe optimization of IEEE arithmetic is replacing x/c by x*(1/c) when c and 1/c can be represented exactly (which means c has to be an integral power of 2).

The mutable global state of IEEE arithmetic was a big mistake in my opinion. It plays badly in highly pipelined processors and parallel programming environments. I once heard a talk by Guy Steele where he suggested it was time to revise IEEE 754 to remove the global state. I recall that he had to shorten the significand by one bit so that he could use the bit for other purposes, thus his proposal was not a backwards-compatible format.

Contributor

ArchRobison commented Sep 19, 2014

My impression is that the rule is derived from the assumption that NaN represents "could be any real number". But I'm not so sure about the rule if the NaN came from sqrt(-1.0).

0.0+y is "the same as" y when y is NaN. It's just that == with a NaN is not the same as "same as". It's 0.0+(-0.0) that's the thorn in the side for optimizing 0.0+y, since 0.0+(-0.0) is 0.0. According to Muchnick's text the only safe optimization of IEEE arithmetic is replacing x/c by x*(1/c) when c and 1/c can be represented exactly (which means c has to be an integral power of 2).

The mutable global state of IEEE arithmetic was a big mistake in my opinion. It plays badly in highly pipelined processors and parallel programming environments. I once heard a talk by Guy Steele where he suggested it was time to revise IEEE 754 to remove the global state. I recall that he had to shorten the significand by one bit so that he could use the bit for other purposes, thus his proposal was not a backwards-compatible format.

@danluu

This comment has been minimized.

Show comment
Hide comment
@danluu

danluu Sep 19, 2014

Contributor

I don't really follow the argument for making max(Infinity, NaN) = Infinity. This debate is very similar to the X-optimism vs. X-pessimism debate in hardware simulators, but there's a distinction, which is that X in a hardware simulator represents some unknown (but representable) value. As Arch points out, NaN can represent all sorts of crazy stuff. Why should max(Infinity, some nonsensical thing) be Infinity?

Contributor

danluu commented Sep 19, 2014

I don't really follow the argument for making max(Infinity, NaN) = Infinity. This debate is very similar to the X-optimism vs. X-pessimism debate in hardware simulators, but there's a distinction, which is that X in a hardware simulator represents some unknown (but representable) value. As Arch points out, NaN can represent all sorts of crazy stuff. Why should max(Infinity, some nonsensical thing) be Infinity?

@cartazio

This comment has been minimized.

Show comment
Hide comment
@cartazio

cartazio Sep 20, 2014

The infinity / infinity is also Nan.
On Sep 19, 2014 4:04 PM, "Dan Luu" notifications@github.com wrote:

I don't really follow the argument for making max(Infinity, NaN) =
Infinity. This debate is very similar to the X-optimism vs. X-pessimism
debate in hardware simulators, but there's a distinction, which is that X
in a hardware simulator represents "either 0 or 1", whereas NaN can
represent all sorts of crazy stuff, like Infinity/Infinity. What's max(Infinity,
Infinity/Infinity)?


Reply to this email directly or view it on GitHub
#7866 (comment).

The infinity / infinity is also Nan.
On Sep 19, 2014 4:04 PM, "Dan Luu" notifications@github.com wrote:

I don't really follow the argument for making max(Infinity, NaN) =
Infinity. This debate is very similar to the X-optimism vs. X-pessimism
debate in hardware simulators, but there's a distinction, which is that X
in a hardware simulator represents "either 0 or 1", whereas NaN can
represent all sorts of crazy stuff, like Infinity/Infinity. What's max(Infinity,
Infinity/Infinity)?


Reply to this email directly or view it on GitHub
#7866 (comment).

@danluu

This comment has been minimized.

Show comment
Hide comment
@danluu

danluu Sep 20, 2014

Contributor

Sorry, I edited the message on github so the email notification is out of
date.
On Sep 19, 2014 9:19 PM, "Carter Tazio Schonwald" notifications@github.com
wrote:

The infinity / infinity is also Nan.
On Sep 19, 2014 4:04 PM, "Dan Luu" notifications@github.com wrote:

I don't really follow the argument for making max(Infinity, NaN) =
Infinity. This debate is very similar to the X-optimism vs. X-pessimism
debate in hardware simulators, but there's a distinction, which is that
X
in a hardware simulator represents "either 0 or 1", whereas NaN can
represent all sorts of crazy stuff, like Infinity/Infinity. What's
max(Infinity,
Infinity/Infinity)?

Reply to this email directly or view it on GitHub
#7866 (comment).

Reply to this email directly or view it on GitHub
#7866 (comment).

Contributor

danluu commented Sep 20, 2014

Sorry, I edited the message on github so the email notification is out of
date.
On Sep 19, 2014 9:19 PM, "Carter Tazio Schonwald" notifications@github.com
wrote:

The infinity / infinity is also Nan.
On Sep 19, 2014 4:04 PM, "Dan Luu" notifications@github.com wrote:

I don't really follow the argument for making max(Infinity, NaN) =
Infinity. This debate is very similar to the X-optimism vs. X-pessimism
debate in hardware simulators, but there's a distinction, which is that
X
in a hardware simulator represents "either 0 or 1", whereas NaN can
represent all sorts of crazy stuff, like Infinity/Infinity. What's
max(Infinity,
Infinity/Infinity)?

Reply to this email directly or view it on GitHub
#7866 (comment).

Reply to this email directly or view it on GitHub
#7866 (comment).

@cartazio

This comment has been minimized.

Show comment
Hide comment
@cartazio

cartazio Sep 20, 2014

some nonsensical thing is always NaN

some nonsensical thing is always NaN

@cartazio

This comment has been minimized.

Show comment
Hide comment
@cartazio

cartazio Sep 20, 2014

though you do raise a good point indirectly, that the IEEE spec gives language implementors leeway on how they can use / treat / distinguish (different non signalling) nans. This isn't used in many languages, but it is possible!

though you do raise a good point indirectly, that the IEEE spec gives language implementors leeway on how they can use / treat / distinguish (different non signalling) nans. This isn't used in many languages, but it is possible!

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Sep 20, 2014

Member

The sqrt(-1) example shows that NaN might not be real at all, in which case max(Inf,NaN) == Inf would be erroneous. However, we raise an error for that case, rather than returning NaN, so I think that it's arguable that if you have a value of type Float64, you can assume that it's some kind of real value (even if it's not a number). I'm not sure this is super important though. Either max(Inf,NaN) => Inf or max(Inf,NaN) => NaN seem reasonable to me.

Member

StefanKarpinski commented Sep 20, 2014

The sqrt(-1) example shows that NaN might not be real at all, in which case max(Inf,NaN) == Inf would be erroneous. However, we raise an error for that case, rather than returning NaN, so I think that it's arguable that if you have a value of type Float64, you can assume that it's some kind of real value (even if it's not a number). I'm not sure this is super important though. Either max(Inf,NaN) => Inf or max(Inf,NaN) => NaN seem reasonable to me.

@danluu

This comment has been minimized.

Show comment
Hide comment
@danluu

danluu Sep 22, 2014

Contributor

I definitely don't have a strong opinion on this, since it's a corner case of a corner case. I especially don't have strong opinion if NaN can't be a stand-in for an unrepresentable value in Julia.

Unless there's something I'm missing, I'd vote for not special casing Inf because that would add an extra branch. The thing I might be missing is, does anyone actually use Inf to squash NaN, as Kahan implies above?

In hardware, knowing whether your simulator uses X-optimism or X-pessimism is critically important, because "regular" values can cause X to be squashed in one of the two cases (e.g., if you have X coming into a mux leg that isn't selected), and you have to adjust your coding style appropriately. This seems pretty different to me, unless people actually code things where they expect NaN to get killed by Inf. That seems odd to me, but I don't have nearly as much experience as Kahan.

Contributor

danluu commented Sep 22, 2014

I definitely don't have a strong opinion on this, since it's a corner case of a corner case. I especially don't have strong opinion if NaN can't be a stand-in for an unrepresentable value in Julia.

Unless there's something I'm missing, I'd vote for not special casing Inf because that would add an extra branch. The thing I might be missing is, does anyone actually use Inf to squash NaN, as Kahan implies above?

In hardware, knowing whether your simulator uses X-optimism or X-pessimism is critically important, because "regular" values can cause X to be squashed in one of the two cases (e.g., if you have X coming into a mux leg that isn't selected), and you have to adjust your coding style appropriately. This seems pretty different to me, unless people actually code things where they expect NaN to get killed by Inf. That seems odd to me, but I don't have nearly as much experience as Kahan.

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Sep 22, 2014

Member

but I don't have nearly as much experience as Kahan.

LoL. Who does?

Member

StefanKarpinski commented Sep 22, 2014

but I don't have nearly as much experience as Kahan.

LoL. Who does?

@cartazio

This comment has been minimized.

Show comment
Hide comment
@cartazio

cartazio Sep 22, 2014

I don't think the cost of branch should determine what the default end user semantics are for min/max. For those who want last mile microoptimizations, having an alternative "whatever the hardware supports most efficiently" version somewhere else but easily found is probably best wrt balancing default UX

I don't think the cost of branch should determine what the default end user semantics are for min/max. For those who want last mile microoptimizations, having an alternative "whatever the hardware supports most efficiently" version somewhere else but easily found is probably best wrt balancing default UX

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Sep 22, 2014

Member

Honestly, I just don't think this matters very much – neither option is clearly right. We should just do whatever can be done most efficiently.

Member

StefanKarpinski commented Sep 22, 2014

Honestly, I just don't think this matters very much – neither option is clearly right. We should just do whatever can be done most efficiently.

@eschnett

This comment has been minimized.

Show comment
Hide comment
@eschnett

eschnett Sep 22, 2014

Contributor

We should by default do whatever the standard prescribes. If people want a different option, then we can introduce a new function for this, with a name that indicates that it doesn't adhere to the standard. And if people want performance (I do!), there can be macros or options or other mechanisms to ignore the inconvenient parts of the standard.

Contributor

eschnett commented Sep 22, 2014

We should by default do whatever the standard prescribes. If people want a different option, then we can introduce a new function for this, with a name that indicates that it doesn't adhere to the standard. And if people want performance (I do!), there can be macros or options or other mechanisms to ignore the inconvenient parts of the standard.

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Sep 22, 2014

Member

If you look at the rest of this issue, the standard doesn't take a stand on this point, which is why @ArchRobison wrote to Kahan about this. But it's still unclear since his reasoning seems to assume properties of languages that just aren't true of any language but assembly.

Member

StefanKarpinski commented Sep 22, 2014

If you look at the rest of this issue, the standard doesn't take a stand on this point, which is why @ArchRobison wrote to Kahan about this. But it's still unclear since his reasoning seems to assume properties of languages that just aren't true of any language but assembly.

@eschnett

This comment has been minimized.

Show comment
Hide comment
@eschnett

eschnett Sep 22, 2014

Contributor

I believe the IEEE standard (as well as POSIX and OpenCL) do take a stand -- if one argument is a number and the other is nan, the number shall be returned. POSIX and OpenCL call this operation fmin, IEEE calls it minNum, but this corresponds to min for Julia.

Contributor

eschnett commented Sep 22, 2014

I believe the IEEE standard (as well as POSIX and OpenCL) do take a stand -- if one argument is a number and the other is nan, the number shall be returned. POSIX and OpenCL call this operation fmin, IEEE calls it minNum, but this corresponds to min for Julia.

@cartazio

This comment has been minimized.

Show comment
Hide comment
@cartazio

cartazio Sep 22, 2014

This is one of the few cases where the standards body IEEE floating point defaults are wrong (for languages that are not assembly language or its equiv).

This is one of the few cases where the standards body IEEE floating point defaults are wrong (for languages that are not assembly language or its equiv).

@cartazio

This comment has been minimized.

Show comment
Hide comment
@cartazio

cartazio Sep 22, 2014

the point being, in a high level language, NAN values are used to denote erroneous answers, and having that propagate upwards visibly is good engineering (given the constraints in ease of seeing CPU flags about floating point state)

the point being, in a high level language, NAN values are used to denote erroneous answers, and having that propagate upwards visibly is good engineering (given the constraints in ease of seeing CPU flags about floating point state)

@danluu

This comment has been minimized.

Show comment
Hide comment
@danluu

danluu Sep 22, 2014

Contributor

The consensus in this thread seems to be that the behavior in the existing standard is both surprising and not a good idea. The rationale given for the standard behavior, given upthread, certainly doesn't make sense now.

FWIW, the IEEE standard behavior is non-standard among actual implementations (see table 1). The only modern language I could find that matches the standard is Julia; the other two that do are C and Octave/MATLAB.

Contributor

danluu commented Sep 22, 2014

The consensus in this thread seems to be that the behavior in the existing standard is both surprising and not a good idea. The rationale given for the standard behavior, given upthread, certainly doesn't make sense now.

FWIW, the IEEE standard behavior is non-standard among actual implementations (see table 1). The only modern language I could find that matches the standard is Julia; the other two that do are C and Octave/MATLAB.

@jiahao

This comment has been minimized.

Show comment
Hide comment
@jiahao

jiahao Sep 22, 2014

Member

The discussion here is going toward #5234

Member

jiahao commented Sep 22, 2014

The discussion here is going toward #5234

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Sep 13, 2016

Member

I say we go ahead with this in 0.6.

Member

StefanKarpinski commented Sep 13, 2016

I say we go ahead with this in 0.6.

@StefanKarpinski StefanKarpinski added this to the 0.6.0 milestone Sep 13, 2016

stevengj added a commit to stevengj/julia that referenced this issue Sep 14, 2016

@simonbyrne

This comment has been minimized.

Show comment
Hide comment
@simonbyrne

simonbyrne Sep 14, 2016

Contributor

We should probably have a function that gives the old behaviour for standards-junkies. numpy and matlab use nanmin.

Contributor

simonbyrne commented Sep 14, 2016

We should probably have a function that gives the old behaviour for standards-junkies. numpy and matlab use nanmin.

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Sep 14, 2016

Member

Eh, let's wait and see how much complaining there is first. Can also go in a package.

Member

StefanKarpinski commented Sep 14, 2016

Eh, let's wait and see how much complaining there is first. Can also go in a package.

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@AgnerF

This comment has been minimized.

Show comment
Hide comment
@AgnerF

AgnerF Mar 1, 2018

FYI, this question is also discussed here: []https://stackoverflow.com/questions/49011370/nan-propagation-and-ieee-754-standard/49040225
I agree that a global state does not work well with parallel processing. Now that more and more platforms support vector processing, it is better to rely on NAN propagation than fault trapping because you can have multiple faults simultaneously in one vector. In my opinion, min and max should propagate NAN always.

AgnerF commented Mar 1, 2018

FYI, this question is also discussed here: []https://stackoverflow.com/questions/49011370/nan-propagation-and-ieee-754-standard/49040225
I agree that a global state does not work well with parallel processing. Now that more and more platforms support vector processing, it is better to rely on NAN propagation than fault trapping because you can have multiple faults simultaneously in one vector. In my opinion, min and max should propagate NAN always.

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Mar 1, 2018

Member

That's good to hear 👍 – that is what Julia does these days.

Member

StefanKarpinski commented Mar 1, 2018

That's good to hear 👍 – that is what Julia does these days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment