Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error that doesn't cause failure when producing mrsos for Lmon #495

Closed
ehogan opened this issue Jun 3, 2019 · 20 comments
Closed

Error that doesn't cause failure when producing mrsos for Lmon #495

ehogan opened this issue Jun 3, 2019 · 20 comments
Projects
Milestone

Comments

@ehogan
Copy link
Contributor

ehogan commented Jun 3, 2019

I'm using CMOR 3.4.0 to produce mrsos for Lmon. The output netCDF file is produced successfully, yet the CMOR log contains:

C Traceback:
! In function: cmor_treat_axis_values
! called from: cmor_axis
! 

!!!!!!!!!!!!!!!!!!!!!!!!!
!
! Error: axis sdepth1 (table: Lmon) , detected value at:
! 0.100000 when valid_max is 0.100000
!
!!!!!!!!!!!!!!!!!!!!!!!!!

The bounds I'm providing for sdepth1 are [0, 0.100000001490116] (this is what is written to the output netCDF file). Should cmor_treat_axis_values accept that 0.100000001490116 is close enough to 0.1 by performing an appropriate float comparison instead of producing this error (that doesn't cause CMOR to fail anyway)?

@taylor13
Copy link
Collaborator

I think we should either allow the value and not generate an error message, or error exit and genera te the message. It doesn't seem to me that this would be too difficult in theory. When it finds the error, why don't we throw an error? (i.e., I don't think we should allow it) Does the problem have to do with a conversion from single to double precision?

@ehogan
Copy link
Contributor Author

ehogan commented Jul 26, 2019

I think we should either allow the value and not generate an error message, or error exit and generate the message.

Agreed :)

Does the problem have to do with a conversion from single to double precision?

It's a float representation issue:

% python
Python 2.7.15 |Anaconda, Inc.| (default, Dec 14 2018, 19:04:19) 
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> my_float = np.float32(0.1)
>>> print('{0:.15f}'.format(my_float))
0.100000001490116

The line I linked to in my first comment performs a if value > valid_max test; this line (and others like it) should be adapted to ensure this is taken into account :)

@taylor13
Copy link
Collaborator

It seems to me that valid_max and valid_min when applied to (or compared with) an "axis" value should be declared as "double precision", but it appears they have been declared single precision (with a value of +- 1.e20). I don't know c syntax, but in fortran that's a single precision float. If I'm right, this should be the first thing we correct.

In your code segment above (#495 (comment)), there must be a way to generate a 32 bit float with the value of 0.1000000000000000. Do you know how to do that?

Also, can we get the error message given in #495 (comment) to output the full precision of the value and the value_max?

@mauzey1
Copy link
Collaborator

mauzey1 commented Jul 26, 2019

The valid_max and valid_min values are stored as doubles.

cmor/include/cmor.h

Lines 326 to 327 in 5f1844b

double valid_min;
double valid_max;

The code is currently converting the axis values to double-precision floats before doing the min/max comparisons.

cmor/Src/cmor_axes.c

Lines 1831 to 1851 in 5f1844b

if (type == 'f') {
for (i = 0; i < length; i++) {
cmor_axes[cmor_naxes].values[i] =
(double)((float *)coord_vals)[i];
}
} else if (type == 'l') {
for (i = 0; i < length; i++) {
cmor_axes[cmor_naxes].values[i] =
(double)((long *)coord_vals)[i];
}
} else if (type == 'i') {
for (i = 0; i < length; i++) {
cmor_axes[cmor_naxes].values[i] =
(double)((int *)coord_vals)[i];
}
} else if (type == 'd') {
for (i = 0; i < length; i++) {
cmor_axes[cmor_naxes].values[i] =
(double)((double *)coord_vals)[i];
}
}

Maybe we should have separate functions for processing each datatype before converting the values to double.

@mauzey1 mauzey1 added this to the 4.0/Future milestone Jul 26, 2019
@taylor13
Copy link
Collaborator

So If the axis values are passed to CMOR as double, then no conversion is done. Right? Maybe we should warn the user that they have passed an incorrect data type. All coordinate axes are supposed to be double precision. Other ideas?

@mauzey1
Copy link
Collaborator

mauzey1 commented Jul 26, 2019

There is some code that is converting doubles to doubles, which is redundant and should be removed.

We could add a warning about the expected data type and the conversion to be performed if other data types are passed.

@taylor13
Copy link
Collaborator

taylor13 commented Jul 26, 2019

I agree; converting double to double is unnecessary.

Also agree that whenever a data type is passed that is different from the type output to the netCDF file, we should warn users. This should probably apply to attributes, variables, and axis values. This might require a number of changes to the coding in various places. How much work would it be?

If differences in data type are encountered, the error messages could read something like

"Warning: You have passed as an argument of ", <name of function>, " a ", <data type of
 passed variable(s)>, " for ", <description of variable>, "but the variable should be ", <data
 type required by the CMIP specifications>, ".  CMOR will convert the variable to the correct
 data type, but be aware that this can alter the value, which may be undesirable.  It is
 recommended that your code be corrected to pass arguments of the correct type."  

I think we should consider making this a fatal error, rather than a warning. What do you think?

@ehogan
Copy link
Contributor Author

ehogan commented Jul 29, 2019

In your code segment above (#495 (comment)), there must be a way to generate a 32 bit float with the value of 0.1000000000000000. Do you know how to do that?

It's not possible, this is just how floats are represented (see https://floating-point-gui.de/basic/ for more details).

@taylor13
Copy link
Collaborator

taylor13 commented Jul 29, 2019

Yes, my mistake. I meant to say "there must be a way to generate a 64 bit float with the value of 0.100000000000000. Do you know how to do that?"
Does my_float = np.float64(0.1) do the trick?
When you create longitude and latitude values for CMOR, shouldn't they be created initially as double precision, rather than having CMOR convert them from single to double (which I presume leads to a value with only 32 bit accuracy, not 64 bit accuracy.
I know how it all works in fortran, but not in python.

@durack1
Copy link
Contributor

durack1 commented Jul 29, 2019

@taylor13 @ehogan @mauzey1 here's the output from python at least

>>> import numpy as np
>>> my_float = np.float64(0.1)
>>> my_float
0.1
>>> print np.version.version
1.14.2

@ehogan be a little careful with the numpy version, this has changed a little over their numerous releases, 1.14.2 is relatively recent (1.17.0 is the latest, released 26th July 2019)

@ehogan
Copy link
Contributor Author

ehogan commented Aug 5, 2019

@taylor13, this is a generic computer issue rather than a language-specific issue (I was using Python for demonstration purposes only).

@durack1, if you want to show the full float representation, string formatting should be used:

>>> import numpy as np
>>> my_float = np.float64(0.1)
>>> print(my_float)
0.1
>>> print('{0:.31f}'.format(my_float))
0.1000000000000000055511151231258
>>> print(np.version.version)
1.15.4

@taylor13
Copy link
Collaborator

taylor13 commented Aug 6, 2019

@ehogan, I agree this is a generic computer issue. I think, however, it is odd that the default for a print statement is to display more digits than can be precisely represented in binary. For float32, I would prefer to see only 7 or so digits be displayed, since anything beyond that is beyond the precision that can be accurately represented. Similarly 16 or so digits would seem to be appropriate, as a default, for float64 numbers.

But all that is somewhat beside the point. The question is why does CMOR raise an error when comparing two floats that are near 0.1. Is it possible that valid_max has been assigned originally as a single precision variable (with the nominal value 0.1) and then converted to double precision, whereas sdepth1 is defined all along to be double precision (again with nominal value 0.1)? If so, then we can't expect sdepth1 to be precisely the same as valid_max. This could explain the error.

To work around this we should add epsilon x valid_max to valid_max before comparing (and subtract epsilon x valid_min from valid_min before comparing), where epsilon might be 1.e-6 (near the fractional precision limit of float32.)

Do you think that would be o.k.?

@mauzey1
Copy link
Collaborator

mauzey1 commented Aug 6, 2019

@taylor13
The valid_max value of sdepth1 is stored as a float64 in CMOR. If the sdepth1 data being passed into cmor_axis has float32 values of 0.1 they will be a number that is larger then the float64 value of 0.1 stored in CMOR.

In [10]: print(numpy.version.version)                                                                                                                                                          
1.15.4

In [11]: f = numpy.float32(0.1)                                                                                                                                                                

In [12]: d = numpy.float64(0.1)                                                                                                                                                                

In [13]: print('{0:.31f}'.format(f))                                                                                                                                                           
0.1000000014901161193847656250000

In [14]: print('{0:.31f}'.format(d))                                                                                                                                                           
0.1000000000000000055511151231258

In [15]: print('{0:.31f}'.format(numpy.float64(f)))                                                                                                                                            
0.1000000014901161193847656250000

In [16]: numpy.float64(f) > d                                                                                                                                                                  
Out[16]: True

@taylor13
Copy link
Collaborator

taylor13 commented Aug 7, 2019

Yes, this makes sense.

All coordinate values should be originally defined by the user as float64, but some users will define these as float32. So for float32 coordinate values passed by users to be acceptable, we will have to allow for truncation errors. I think an epsilon of 1.e-6 for fractional error should allow float32's and probably not make a practical difference. We might consider resetting the coordinate value to the actually valid_max (in float64) if it falls within epsilon of valid_max. @ehogan, what do you think of this idea?

@ehogan
Copy link
Contributor Author

ehogan commented Aug 8, 2019

Thanks @taylor13; a few things:

  • according to the documentation, the type specified by the user can be 'd' (double), 'f' (float), 'l' (long) or 'i' (int); there's no requirement to pass d (float64)
  • https://floating-point-gui.de/errors/comparison/ might be useful when writing a solution
  • I would strongly recommend not changing the value of the data (valid_max is just an upper limit; the value of the data can be anything up to that value)

@taylor13
Copy link
Collaborator

taylor13 commented Aug 9, 2019

O.K. let's just relax the strictness of the test and let data through that is within epsilon*valid_max (or valid_min) of the limit. Any objections?

@ehogan Thanks for identifying this problem and helping to find a solution.

@mauzey1 mauzey1 added this to To do in 3.6.0 Aug 13, 2019
@taylor13
Copy link
Collaborator

@mauzey1 Is this still on the "to do" list?

@mauzey1
Copy link
Collaborator

mauzey1 commented Feb 24, 2020

@ehogan @taylor13 Now that changes that allow the axis values to be within a fractional error have been added to CMOR, can we close this issue?

@taylor13
Copy link
Collaborator

Yes, that's fine with me. Let's give @ehogan until next week to respond, and if we don't hear from her otherwise, close it then.
thanks!

@matthew-mizielinski
Copy link

Hi @taylor13, @mauzey1, I've had a quick chat with Emma, who has moved to a new role within the Met Office, and we think this change looks fine.

@mauzey1 mauzey1 closed this as completed Feb 26, 2020
@mauzey1 mauzey1 moved this from In progress to Done in 3.6.0 Feb 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
3.6.0
  
Done
Development

No branches or pull requests

5 participants