Error that doesn't cause failure when producing mrsos for Lmon #495

ehogan · 2019-06-03T15:06:48Z

I'm using CMOR 3.4.0 to produce mrsos for Lmon. The output netCDF file is produced successfully, yet the CMOR log contains:

C Traceback:
! In function: cmor_treat_axis_values
! called from: cmor_axis
! 

!!!!!!!!!!!!!!!!!!!!!!!!!
!
! Error: axis sdepth1 (table: Lmon) , detected value at:
! 0.100000 when valid_max is 0.100000
!
!!!!!!!!!!!!!!!!!!!!!!!!!

The bounds I'm providing for sdepth1 are [0, 0.100000001490116] (this is what is written to the output netCDF file). Should cmor_treat_axis_values accept that 0.100000001490116 is close enough to 0.1 by performing an appropriate float comparison instead of producing this error (that doesn't cause CMOR to fail anyway)?

The text was updated successfully, but these errors were encountered:

taylor13 · 2019-07-26T00:30:42Z

I think we should either allow the value and not generate an error message, or error exit and genera te the message. It doesn't seem to me that this would be too difficult in theory. When it finds the error, why don't we throw an error? (i.e., I don't think we should allow it) Does the problem have to do with a conversion from single to double precision?

ehogan · 2019-07-26T12:57:03Z

I think we should either allow the value and not generate an error message, or error exit and generate the message.

Agreed :)

Does the problem have to do with a conversion from single to double precision?

It's a float representation issue:

% python
Python 2.7.15 |Anaconda, Inc.| (default, Dec 14 2018, 19:04:19) 
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> my_float = np.float32(0.1)
>>> print('{0:.15f}'.format(my_float))
0.100000001490116

The line I linked to in my first comment performs a if value > valid_max test; this line (and others like it) should be adapted to ensure this is taken into account :)

taylor13 · 2019-07-26T16:07:20Z

It seems to me that valid_max and valid_min when applied to (or compared with) an "axis" value should be declared as "double precision", but it appears they have been declared single precision (with a value of +- 1.e20). I don't know c syntax, but in fortran that's a single precision float. If I'm right, this should be the first thing we correct.

In your code segment above (#495 (comment)), there must be a way to generate a 32 bit float with the value of 0.1000000000000000. Do you know how to do that?

Also, can we get the error message given in #495 (comment) to output the full precision of the value and the value_max?

mauzey1 · 2019-07-26T16:57:20Z

The valid_max and valid_min values are stored as doubles.

cmor/include/cmor.h

Lines 326 to 327 in 5f1844b

    
           double valid_min; 
        
           double valid_max;

The code is currently converting the axis values to double-precision floats before doing the min/max comparisons.

cmor/Src/cmor_axes.c

Lines 1831 to 1851 in 5f1844b

    
           if (type == 'f') { 
        
               for (i = 0; i < length; i++) { 
        
                   cmor_axes[cmor_naxes].values[i] = 
        
                     (double)((float *)coord_vals)[i]; 
        
               } 
        
           } else if (type == 'l') { 
        
               for (i = 0; i < length; i++) { 
        
                   cmor_axes[cmor_naxes].values[i] = 
        
                     (double)((long *)coord_vals)[i]; 
        
               } 
        
           } else if (type == 'i') { 
        
               for (i = 0; i < length; i++) { 
        
                   cmor_axes[cmor_naxes].values[i] = 
        
                     (double)((int *)coord_vals)[i]; 
        
               } 
        
           } else if (type == 'd') { 
        
               for (i = 0; i < length; i++) { 
        
                   cmor_axes[cmor_naxes].values[i] = 
        
                     (double)((double *)coord_vals)[i]; 
        
               } 
        
           }

Maybe we should have separate functions for processing each datatype before converting the values to double.

taylor13 · 2019-07-26T17:23:06Z

So If the axis values are passed to CMOR as double, then no conversion is done. Right? Maybe we should warn the user that they have passed an incorrect data type. All coordinate axes are supposed to be double precision. Other ideas?

mauzey1 · 2019-07-26T17:32:42Z

There is some code that is converting doubles to doubles, which is redundant and should be removed.

We could add a warning about the expected data type and the conversion to be performed if other data types are passed.

taylor13 · 2019-07-26T23:27:24Z

I agree; converting double to double is unnecessary.

Also agree that whenever a data type is passed that is different from the type output to the netCDF file, we should warn users. This should probably apply to attributes, variables, and axis values. This might require a number of changes to the coding in various places. How much work would it be?

If differences in data type are encountered, the error messages could read something like

"Warning: You have passed as an argument of ", <name of function>, " a ", <data type of
 passed variable(s)>, " for ", <description of variable>, "but the variable should be ", <data
 type required by the CMIP specifications>, ".  CMOR will convert the variable to the correct
 data type, but be aware that this can alter the value, which may be undesirable.  It is
 recommended that your code be corrected to pass arguments of the correct type."

I think we should consider making this a fatal error, rather than a warning. What do you think?

ehogan · 2019-07-29T08:27:46Z

In your code segment above (#495 (comment)), there must be a way to generate a 32 bit float with the value of 0.1000000000000000. Do you know how to do that?

It's not possible, this is just how floats are represented (see https://floating-point-gui.de/basic/ for more details).

taylor13 · 2019-07-29T17:13:19Z

Yes, my mistake. I meant to say "there must be a way to generate a 64 bit float with the value of 0.100000000000000. Do you know how to do that?"
Does my_float = np.float64(0.1) do the trick?
When you create longitude and latitude values for CMOR, shouldn't they be created initially as double precision, rather than having CMOR convert them from single to double (which I presume leads to a value with only 32 bit accuracy, not 64 bit accuracy.
I know how it all works in fortran, but not in python.

durack1 · 2019-07-29T18:50:50Z

@taylor13 @ehogan @mauzey1 here's the output from python at least

>>> import numpy as np
>>> my_float = np.float64(0.1)
>>> my_float
0.1
>>> print np.version.version
1.14.2

@ehogan be a little careful with the numpy version, this has changed a little over their numerous releases, 1.14.2 is relatively recent (1.17.0 is the latest, released 26th July 2019)

ehogan · 2019-08-05T12:54:02Z

@taylor13, this is a generic computer issue rather than a language-specific issue (I was using Python for demonstration purposes only).

@durack1, if you want to show the full float representation, string formatting should be used:

>>> import numpy as np
>>> my_float = np.float64(0.1)
>>> print(my_float)
0.1
>>> print('{0:.31f}'.format(my_float))
0.1000000000000000055511151231258
>>> print(np.version.version)
1.15.4

taylor13 · 2019-08-06T00:15:27Z

@ehogan, I agree this is a generic computer issue. I think, however, it is odd that the default for a print statement is to display more digits than can be precisely represented in binary. For float32, I would prefer to see only 7 or so digits be displayed, since anything beyond that is beyond the precision that can be accurately represented. Similarly 16 or so digits would seem to be appropriate, as a default, for float64 numbers.

But all that is somewhat beside the point. The question is why does CMOR raise an error when comparing two floats that are near 0.1. Is it possible that valid_max has been assigned originally as a single precision variable (with the nominal value 0.1) and then converted to double precision, whereas sdepth1 is defined all along to be double precision (again with nominal value 0.1)? If so, then we can't expect sdepth1 to be precisely the same as valid_max. This could explain the error.

To work around this we should add epsilon x valid_max to valid_max before comparing (and subtract epsilon x valid_min from valid_min before comparing), where epsilon might be 1.e-6 (near the fractional precision limit of float32.)

Do you think that would be o.k.?

mauzey1 · 2019-08-06T00:56:54Z

@taylor13
The valid_max value of sdepth1 is stored as a float64 in CMOR. If the sdepth1 data being passed into cmor_axis has float32 values of 0.1 they will be a number that is larger then the float64 value of 0.1 stored in CMOR.

In [10]: print(numpy.version.version)                                                                                                                                                          
1.15.4

In [11]: f = numpy.float32(0.1)                                                                                                                                                                

In [12]: d = numpy.float64(0.1)                                                                                                                                                                

In [13]: print('{0:.31f}'.format(f))                                                                                                                                                           
0.1000000014901161193847656250000

In [14]: print('{0:.31f}'.format(d))                                                                                                                                                           
0.1000000000000000055511151231258

In [15]: print('{0:.31f}'.format(numpy.float64(f)))                                                                                                                                            
0.1000000014901161193847656250000

In [16]: numpy.float64(f) > d                                                                                                                                                                  
Out[16]: True

taylor13 · 2019-08-07T16:43:24Z

Yes, this makes sense.

All coordinate values should be originally defined by the user as float64, but some users will define these as float32. So for float32 coordinate values passed by users to be acceptable, we will have to allow for truncation errors. I think an epsilon of 1.e-6 for fractional error should allow float32's and probably not make a practical difference. We might consider resetting the coordinate value to the actually valid_max (in float64) if it falls within epsilon of valid_max. @ehogan, what do you think of this idea?

ehogan · 2019-08-08T12:11:57Z

Thanks @taylor13; a few things:

according to the documentation, the type specified by the user can be 'd' (double), 'f' (float), 'l' (long) or 'i' (int); there's no requirement to pass d (float64)
https://floating-point-gui.de/errors/comparison/ might be useful when writing a solution
I would strongly recommend not changing the value of the data (valid_max is just an upper limit; the value of the data can be anything up to that value)

taylor13 · 2019-08-09T14:35:38Z

O.K. let's just relax the strictness of the test and let data through that is within epsilon*valid_max (or valid_min) of the limit. Any objections?

@ehogan Thanks for identifying this problem and helping to find a solution.

taylor13 · 2019-10-15T17:58:31Z

@mauzey1 Is this still on the "to do" list?

mauzey1 · 2020-02-24T19:57:29Z

@ehogan @taylor13 Now that changes that allow the axis values to be within a fractional error have been added to CMOR, can we close this issue?

taylor13 · 2020-02-25T18:19:48Z

Yes, that's fine with me. Let's give @ehogan until next week to respond, and if we don't hear from her otherwise, close it then.
thanks!

matthew-mizielinski · 2020-02-26T12:29:37Z

Hi @taylor13, @mauzey1, I've had a quick chat with Emma, who has moved to a new role within the Met Office, and we think this change looks fine.

mauzey1 added this to the 4.0/Future milestone Jul 26, 2019

mauzey1 added this to To do in 3.6.0 Aug 13, 2019

mauzey1 moved this from To do to In progress in 3.6.0 Jan 10, 2020

mauzey1 mentioned this issue Feb 1, 2020

Allow axis values to be within 1.e-6 fractional error of the valid_min and valid_max #577

Merged

mauzey1 closed this as completed Feb 26, 2020

mauzey1 moved this from In progress to Done in 3.6.0 Feb 26, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Error that doesn't cause failure when producing mrsos for Lmon #495

Error that doesn't cause failure when producing mrsos for Lmon #495

ehogan commented Jun 3, 2019

taylor13 commented Jul 26, 2019

ehogan commented Jul 26, 2019 •

edited

Loading

taylor13 commented Jul 26, 2019

mauzey1 commented Jul 26, 2019

taylor13 commented Jul 26, 2019

mauzey1 commented Jul 26, 2019

taylor13 commented Jul 26, 2019 •

edited

Loading

ehogan commented Jul 29, 2019

taylor13 commented Jul 29, 2019 •

edited

Loading

durack1 commented Jul 29, 2019

ehogan commented Aug 5, 2019

taylor13 commented Aug 6, 2019

mauzey1 commented Aug 6, 2019

taylor13 commented Aug 7, 2019

ehogan commented Aug 8, 2019

taylor13 commented Aug 9, 2019 •

edited

Loading

taylor13 commented Oct 15, 2019

mauzey1 commented Feb 24, 2020

taylor13 commented Feb 25, 2020

matthew-mizielinski commented Feb 26, 2020

Error that doesn't cause failure when producing mrsos for Lmon #495

Error that doesn't cause failure when producing mrsos for Lmon #495

Comments

ehogan commented Jun 3, 2019

taylor13 commented Jul 26, 2019

ehogan commented Jul 26, 2019 • edited Loading

taylor13 commented Jul 26, 2019

mauzey1 commented Jul 26, 2019

taylor13 commented Jul 26, 2019

mauzey1 commented Jul 26, 2019

taylor13 commented Jul 26, 2019 • edited Loading

ehogan commented Jul 29, 2019

taylor13 commented Jul 29, 2019 • edited Loading

durack1 commented Jul 29, 2019

ehogan commented Aug 5, 2019

taylor13 commented Aug 6, 2019

mauzey1 commented Aug 6, 2019

taylor13 commented Aug 7, 2019

ehogan commented Aug 8, 2019

taylor13 commented Aug 9, 2019 • edited Loading

taylor13 commented Oct 15, 2019

mauzey1 commented Feb 24, 2020

taylor13 commented Feb 25, 2020

matthew-mizielinski commented Feb 26, 2020

ehogan commented Jul 26, 2019 •

edited

Loading

taylor13 commented Jul 26, 2019 •

edited

Loading

taylor13 commented Jul 29, 2019 •

edited

Loading

taylor13 commented Aug 9, 2019 •

edited

Loading