Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve floating point representation in generation of regular grids that use a reference point and spacings #3559

Open
alastair-gemmell opened this issue Nov 22, 2019 · 4 comments

Comments

@alastair-gemmell
Copy link
Contributor

alastair-gemmell commented Nov 22, 2019

Some regular grids in Iris are generated from a reference point along a given coordinate direction along with an associated spacing.

The way this is currently implemented uses numpy.arange as follows:

points = (zeroth + step) + step * np.arange(count, dtype=np.float32)

Where zeroth is the value prior to the first point value, step is the numeric difference between successive point values and count is the number of point values.

As discussed at https://stackoverflow.com/questions/47243190/numpy-arange-how-to-make-precise-array-of-floats (amongst other places), this results in grids that contain fractionally incorrectly represented floats. E.g. using zeroth = 352.84, step = 0.11, count = 146, it produces the following array:

array([352.95 , 353.06 , 353.17 , 353.28 , 353.39 , 353.5 , 353.61002, 353.72 , 353.83002, 353.94 , 354.05002, 354.16 , 354.27002, 354.38 , 354.49002, 354.6 , 354.71002, 354.82 , 354.93002, 355.04 , 355.15002, 355.26 , 355.37003, 355.48 , 355.59003, 355.7 , 355.81 , 355.92 , 356.03 , 356.14 , 356.25 , 356.36002, 356.47 , 356.58002, 356.69 , 356.80002, 356.91 , 357.02002, 357.13 , 357.24002, 357.35 , 357.46002, 357.57 , 357.68002, 357.79 , 357.90002, 358.01 , 358.12003, 358.23 , 358.34003, 358.45 , 358.56 , 358.67 , 358.78 , 358.89 , 359. , 359.11002, 359.22 , 359.33002, 359.44 , 359.55002, 359.66 , 359.77002, 359.88 , 359.99002, 360.1 , 360.21002, 360.32 , 360.43002, 360.54 , 360.65002, 360.76 ], dtype=float32)

One consequence of this is that two grids that should be identical cannot be manipulated together (e.g. subtract one from the other) if one is fully specified from a source file, and the other is generated in this way from a reference point and spacings.

A potential solution as mentioned in the above link is to use an approach involving numpy.linspace instead such as:

start = zeroth + step
end = zeroth + (count * step)
points = np.linspace(start, end, num=count, dtype=np.float32)

Using the same values of zeroth, step and count as above this produces the following array:

array([352.95, 353.06, 353.17, 353.28, 353.39, 353.5 , 353.61, 353.72, 353.83, 353.94, 354.05, 354.16, 354.27, 354.38, 354.49, 354.6 , 354.71, 354.82, 354.93, 355.04, 355.15, 355.26, 355.37, 355.48, 355.59, 355.7 , 355.81, 355.92, 356.03, 356.14, 356.25, 356.36, 356.47, 356.58, 356.69, 356.8 , 356.91, 357.02, 357.13, 357.24, 357.35, 357.46, 357.57, 357.68, 357.79, 357.9 , 358.01, 358.12, 358.23, 358.34, 358.45, 358.56, 358.67, 358.78, 358.89, 359. , 359.11, 359.22, 359.33, 359.44, 359.55, 359.66, 359.77, 359.88, 359.99, 360.1 , 360.21, 360.32, 360.43, 360.54, 360.65, 360.76])

@rcomer
Copy link
Member

rcomer commented Jan 24, 2023

I believe this issue triggered odd behaviour in Cartopy for me: SciTools/cartopy#2128

@rcomer
Copy link
Member

rcomer commented Jan 24, 2023

My problem pp-fields have

lbrow: 325
bzy: -90.55556
bdy: 0.5555556

The linspace solution would still give me a maximum latitude of 90.00001

@larsbarring
Copy link
Contributor

We have seen similar problems in various regional model output datasets. This is why CORDEX specifies that projection (CRS) parameters and coordinate data have to be saved as double precision. But despite this, it occasionally happens that the historic simulation is not perfectly consistent with one (or several) of the future projection simulations. They may have been run on different machines, or using different library or OS versions, or ...

When I way back first stumbled on the problem that concatenation did not work as expected it was suggested that I should filter (round) the coordinate data. OK, this is a reasonable solution when the difference in double precision coordinate values amounts to nanometres in the real world (see this old google group comment).

Another solution is to disregard the 2d latitude/longitude data and recreate this information from the model grid (in my case either rotated pole or Lambert conformal conic grid coordinates). This approach has generally be en successful in alleviating the problem.

But recently a colleague struggled with a historic and future part of an experiment that still could not be concatenated. After quite some data debugging we found out that the problem was that for one of the projection parameters the values in the two files differed at the double precision last significant bit. This was not at all easy to spot and only underlines that generally exact comparisons should not be used with floating point numbers.

If I run the same example as in the first post, but change to float64 the arange and linspace alternatives still do not match each other exactly (16 elements differs by 5.68434189e-14). The question then is how the coordinate data in the OP's source file actually was created, was it with python, the same libraries (and versions)....?

@rcomer
Copy link
Member

rcomer commented Feb 9, 2023

Thanks @larsbarring, yes I think I've concluded that for my case the file format was the problem. The header of my file states that there are 325 rows with spacing of 0.5555556 degrees, so if Iris takes that at face value it will come out with a latitude range of slightly more than 180 degrees. It's not obvious to me what, if anything Iris should do instead. The file is generated by the Met Office Unified Model and so decisions about that format were made decades before we thought about using python!

I don't know about the OP, but it seems likely that that was also Met Office UM data.

I think we should remove the "Good First Issue" label as this is problem is clearly more complicated that was originally assumed.

@rcomer rcomer removed Experience: Low Good First Issue A good issue to take on if you're just getting started with Iris development labels Feb 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: No status
Development

Successfully merging a pull request may close this issue.

6 participants