RFC: add support for closeness comparison #170

Closed
pmeier opened this issue Apr 26, 2021 · 14 comments
Labels
API extension (Adds new functions or objects to the API.) · Needs Discussion (Needs further discussion.) · RFC (Request for comments. Feature requests and proposed changes.)
Comments

@pmeier

pmeier commented Apr 26, 2021

Due to the limitations of floating point arithmetic, comparing floating point values for bitwise equality is only appropriate in very few situations. In typical situations, for example comparing the output of a function against an expected result, it has thus become best practice to compare the values for closeness rather than equality. Python added built-in support for closeness comparisons (math.isclose) with PEP485, which was introduced in Python 3.5.

With this in mind, I'm proposing to add an elementwise isclose operator:

def isclose(x1, x2, *, rtol: float, atol: float):
    pass

Similar to equal, x1 and x2 as well as the return value are arrays. The returned array will be of type bool.

The relative tolerance rtol and absolute tolerance atol should have default values which are discussed below.

Status quo

All actively considered libraries already at least partially support closeness comparisons. In addition to the elementwise isclose operation, usually also allclose is defined. Since allclose(a, b) == all(isclose(a, b)) and all is already part of the standard, I don't think adding allclose is helpful. Otherwise, we would also need to consider allequal and so on.

Library      isclose                                  allclose
NumPy        numpy.isclose                            numpy.allclose
TensorFlow   tensorflow.experimental.numpy.isclose    tensorflow.experimental.numpy.allclose
PyTorch      torch.isclose                            torch.allclose
MXNet        (none)                                   mxnet.contrib.ndarray.allclose
JAX          jax.numpy.isclose                        jax.numpy.allclose
Dask         dask.array.isclose                       dask.array.allclose
CuPy         cupy.isclose                             cupy.allclose
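
As a minimal sketch, the identity allclose(a, b) == all(isclose(a, b)) can be written out with numpy standing in for the proposed operators:

import numpy as np

def allclose(x1, x2, *, rtol=1e-5, atol=1e-8):
    # reduce the elementwise closeness result to a single boolean
    return bool(np.all(np.isclose(x1, x2, rtol=rtol, atol=atol)))

allclose(np.array([1.0, 2.0]), np.array([1.0, 2.0 + 1e-6]))  # True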

Closeness definition

All the libraries above define closeness like this:

abs(actual - expected) <= atol + rtol * abs(expected)

PEP485 states about this:

In this approach, the absolute and relative tolerance are added together, rather than the or method used in [math.isclose]. This is computationally more simple, and if relative tolerance is larger than the absolute tolerance, then the addition will have no effect. However, if the absolute and relative tolerances are of similar magnitude, then the allowed difference will be about twice as large as expected.
[...]
Even more critically, if the values passed in are small compared to the absolute tolerance, then the relative tolerance will be completely swamped, perhaps unexpectedly.

math.isclose overcomes this and additionally is symmetric:

abs(actual - expected) <= max(atol, rtol * max(abs(actual), abs(expected)))

Thus, in addition to adding the isclose operator, I think the standard should stick to the objectively better definition used by math.isclose.
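
For reference, a minimal sketch of both definitions for scalar inputs (the function names are illustrative only; the defaults follow the respective sources):

def isclose_added_tolerances(actual, expected, *, rtol=1e-5, atol=1e-8):
    # definition used by NumPy & co: tolerances are added, asymmetric in expected
    return abs(actual - expected) <= atol + rtol * abs(expected)

def isclose_pep485(actual, expected, *, rtol=1e-9, atol=0.0):
    # definition used by math.isclose: the larger tolerance wins, symmetric
    return abs(actual - expected) <= max(atol, rtol * max(abs(actual), abs(expected)))

# With atol and rtol of similar magnitude, the added tolerances are roughly twice as permissive:
isclose_added_tolerances(1.0, 1.000015, rtol=1e-5, atol=1e-5)  # True  (allowed difference ~2e-5)
isclose_pep485(1.0, 1.000015, rtol=1e-5, atol=1e-5)            # False (allowed difference ~1e-5)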

Non-finite numbers

In addition to finite numbers, the standard should also define how non-finite numbers (NaN, inf, and -inf) are to be handled. Again, I propose to stick to the rationale of PEP485, which in turn is based on IEEE 754:

  • NaN is never close to anything. All library implementations add an equal_nan: bool = False flag to the functions. If True, two NaN values are considered close. Still, a comparison between a NaN and any other value is never considered close.
  • inf and -inf are only close to themselves (illustrated with math.isclose below).
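
For illustration, math.isclose already behaves this way for non-finite values:

import math

math.isclose(math.nan, math.nan)   # False: NaN is never close to anything, not even itself
math.isclose(math.inf, math.inf)   # True: inf is only close to itself
math.isclose(math.inf, -math.inf)  # False
math.isclose(math.inf, 1e308)      # False: inf is not close to any finite value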

Default tolerances

In addition to fixed default values (math.isclose: rtol=1e-9, atol=0.0; all libraries: rtol=1e-5, atol=1e-8), the default tolerances could also vary with the promoted dtype. For example, arrays of dtype float64 could use stricter default tolerances than float32.

For integer dtypes, I propose using rtol = atol = 0.0, which would be identical to comparing them for equality. For floating point dtypes, I would use the rationale of PEP485 as a base (see the sketch after this list):

  • rtol: Approximately half the precision of the promoted dtype
  • atol: 0.0
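A rough sketch of such dtype-dependent defaults, assuming numpy-style dtype introspection and reading "approximately half the precision" as the square root of the machine epsilon of the promoted dtype (the helper default_tolerances is made up for illustration):

import numpy as np

def default_tolerances(dtype):
    # integer dtypes: rtol = atol = 0.0, i.e. plain equality
    if np.issubdtype(dtype, np.integer):
        return 0.0, 0.0
    # floating point dtypes: rtol ~ half the precision, atol = 0.0
    eps = float(np.finfo(dtype).eps)
    return eps ** 0.5, 0.0

default_tolerances(np.int32)    # (0.0, 0.0)
default_tolerances(np.float32)  # (~3.5e-4, 0.0)
default_tolerances(np.float64)  # (~1.5e-8, 0.0)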
@oleksandr-pavlyk
Contributor

oleksandr-pavlyk commented Apr 26, 2021

👍 for the adoption of a symmetric proximity comparator.

All library implementations add an equal_nan: bool = False flag to the functions.

Can you indicate where this might be useful? PEP485 stipulates "NaN is not considered close to any other value, including NaN."

@pmeier
Author

pmeier commented Apr 26, 2021

Can you indicate where this might be useful? PEP485 stipulates "NaN is not considered close to any other value, including NaN."

True, and I haven't encountered a use case for this yet. I just wanted to add it for completeness.

I'm only familiar with the PyTorch test suite, and there it is only used in very few cases. I'll try to dig up why it is necessary there. Plus, I'll try to dig up why it was added to NumPy in the first place. I believe all the other libraries simply adopted the flag.

@rgommers
Member

Thanks @pmeier!

equal_nan is quite useful for testing. There you typically compare against expected values; if the expected result of some function call is [1.5, 2.5, nan], then clearly you want equal_nan=True, otherwise you have to special-case NaNs everywhere.

For library code though, it's typically the opposite - you want the regular IEEE 754 behavior where nans are not equal.

That does of course make it a little questionable to have equal_nan in isclose/allclose rather than only in testing functions like assert_allclose. Maybe for statistical algorithms like the hypothesis tests in scipy.stats it still makes sense.

@pmeier
Author

pmeier commented Apr 26, 2021

That does of course make it a little questionable to have equal_nan in isclose/allclose rather than only in testing functions like assert_allclose

Good point. Since boolean operations and isnan are supported, the behavior can simply be re-added in test functions:

import numpy as np

matches = np.isclose(a, b, equal_nan=False)  # NaN is never close to anything here
if equal_nan:
    matches |= np.isnan(a) & np.isnan(b)     # opt back in to NaN == NaN on request

@rgommers
Member

I think there's a discussion elsewhere about nanfunctions (nansum, nanmax, etc.), which we all left out. The NumPy selection for what functions get a nan* variant is pretty ad hoc. There's probably a better solution: use the where= keyword consistently. For example, see equal:

equal(x1, x2, /, out=None, *, where=True, ...)
>>> x1 = np.array([1.5, 2.5, np.nan])
>>> x2 = x1
>>> np.equal(x1, x2)
array([ True,  True, False])
>>> np.equal(x1, x2, where=~np.isnan(x1))
array([ True,  True,  True])

>>> np.sum(x1, where=~np.isnan(x1))
4.0
>>> np.sum(x1)
nan

Most other libraries do not support a where= keyword, and numpy only has it in ufuncs and a random selection of other functions - but leaving equal_nan= out for isclose/allclose and leaving the option open to add where= in the future is probably the sensible thing to do.

And as you show, it's easy enough to work around for now.

@leofang
Contributor

leofang commented Apr 27, 2021

I think where in this case is overkill, and there is value in having equal_nan in isclose() beyond the purpose of testing library code. For example, when developing and debugging algorithms it's useful to check whether the output contains nonsensical outcomes.

@leofang
Contributor

leofang commented Apr 27, 2021

Sent too fast: if we remove equal_nan from the arguments, we should definitely include an example in the docstring for how to test NaNs.

@rgommers
Member

rgommers commented May 6, 2021

We had a discussion about this today. The most important point of that was: while the PEP 485 definition would be preferred overall if we were designing this API today, all libraries use the atol + rtol * abs(expected) definition today - so in order to deviate from that definition, we must have a good introduction story to get from one definition to the other in a gradual fashion. And make sure especially NumPy would accept that; then other libraries will follow.

Overall there was agreement that these are important functions to have, and we should add them.

Other points regarding the definition differences:

  • rtol * max(abs(actual), abs(expected)) and rtol * abs(expected) are both valid choices; the latter is actually better in case expected is a known reference value. Hence the "symmetry" argument is not a strong one.
  • The use of atol + rtol... vs max(atol, rtol...) is the more important difference. There's no good reason for the +.

tl;dr see if there's a way to introduce the "nicer" behavior.

@pmeier
Author

pmeier commented May 7, 2021

we must have a good introduction story to get from one definition to the other in a gradual fashion. And make sure especially NumPy would accept that; then other libraries will follow.

True, and unfortunately I can't offer one apart from "we would be compatible with the standard library" and "it is objectively better".

From my own perspective (and we should definitely hear other opinions here), the only times I fiddled with the default tolerances were when I knew I needed to expect more deviation than normal. In these cases I didn't really care about the actual values as long as they were "low enough" and I had manually checked that it was fine.

I imagine the libraries are already using their isclose / allclose functions in their test suites. If that is the case, we could send a PR changing the definition and see how many tests break in CI. That could give us a good picture of how much pushback to expect when changing the closeness definition.


  • rtol * max(abs(actual), abs(expected)) and rtol * abs(expected) are both valid choices; the latter is actually better in case expected is a known reference value. Hence the "symmetry" argument is not a strong one.

Trying to look at this without bias, the name "close" does not imply asymmetric behavior for me. It is in the same category as "equal" and not "greater than" or the like. From that perspective the current implementation can be confusing:

import numpy as np

a = 1
b = 10
rtol = 1
atol = 0
np.isclose(a, b, rtol=rtol, atol=atol)  # True
np.isclose(b, a, rtol=rtol, atol=atol)  # False
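
Continuing with the values above, the symmetric math.isclose definition gives the same answer for both argument orders:

import math

math.isclose(a, b, rel_tol=rtol, abs_tol=atol)  # True
math.isclose(b, a, rel_tol=rtol, abs_tol=atol)  # True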

The only practical difference between the two implementations occurs when abs(actual) is significantly larger than abs(expected). I've struggled to come up with a practical example where asymmetric behavior is desired and the values can still be considered close in a "human understandable" way. Maybe someone else can come up with one?

@rgommers
Member

rgommers commented May 8, 2021

I imagine the libraries are already using their isclose / allclose functions in their test suites. If that is the case, we could send a PR changing the definition and see how many tests break in CI. That could give us a good picture of how much pushback to expect when changing the closeness definition.

xref numpy/numpy#10161. Changing the existing definition looks like a nonstarter. And I haven't been able to come up with a good introduction strategy. In the linked numpy issue, even the PEP author says he would himself have preferred compatibility rather than do something incompatible for the sake of what is objectively best. "But is it "better" enough to break backward compatibility? I don't think so."

It looks like this has been very extensively litigated. We should stay with the existing definition that NumPy and other libraries have.

@pmeier
Author

pmeier commented May 18, 2021

To investigate the impact of the proposed change, I've implemented it for numpy and run the numpy, scipy, and scikit-learn test suites against this patched version.

  • numpy

    Only a single test fails with respect to the numerics (there are < 10 additional failures in tests of the testing functions themselves, which break due to my PoC implementation):

    python runtests.py \
        -t numpy/random/tests/test_generator_mt19937.py::TestMultivariateHypergeometric::test_typical_cases \
        -- \
        -k "150000-count-8"
    
    E           Not equal to tolerance rtol=0.001, atol=0.005
    E           
    E           Mismatched elements: 1 / 4 (25%)
    E           Max absolute difference: 0.00528
    E           Max relative difference: 0.00228
    E            x: array([1.330967, 0.665147, 2.671947, 3.33194 ])
    E            y: array([1.333333, 0.666667, 2.666667, 3.333333])
    

    Since rtol and atol have a similar magnitude, the addition of both tolerances made this test pass. Individually the maximum absolute difference is greater than atol (0.00528 > 0.005) and the maximum relative difference is greater than rtol (0.00228 > 0.001).

  • scipy

    Internally scipy relies at one point on the asymmetry of np.isclose. I think this is not intended, but I opened scipy/scipy#14081 ("scipy.optimize._linprog_simplex._apply_pivot relies on asymmetric closeness definition") to make sure. After patching this, we have 10 failing tests left, with 3 of them again related to the PoC implementation. 3 fall into the same category as the failing numpy test:

    python runtests.py \
        -t scipy.optimize.tests \
        -- \
        -k "test_L6 or test_bug_7237 or test_singleton_row_ub_2"
    

    4 more tests have very strict tolerances which are manually set:

    python runtests.py \
        -t scipy \
        -- \
        -k "test_bug_10124 or test_bisect or test_simple_exact or test_check_finite"
    

    For example:

    E   Not equal to tolerance rtol=1e-08, atol=1e-08
    E   converged to an unexpected solution
    E   Mismatched elements: 1 / 2 (50%)
    E   Max absolute difference: 1.09516519e-07
    E   Max relative difference: 1.09516519e-08
    E    x: array([10., -3.])
    E    y: array([10, -3])
    
  • scikit-learn

    No failing tests.

IMO, this means two things:

  1. The number of tests we would break by going for this change is fairly low, considering that each test suite comprises > 10k tests.
  2. Most of the tests that would fail arguably shouldn't have passed in the first place, since the absolute and the relative tolerance check each fail individually; the tests only pass because the sum of the two tolerances is checked (see the worked check below).
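
As a concrete check, assuming the mismatched element in the numpy failure above is the one with the maximum absolute difference (2.671947 vs. 2.666667, with rtol=0.001 and atol=0.005):

diff = abs(2.671947 - 2.666667)  # ~0.00528

# current definition: tolerances are summed, so the comparison passes
diff <= 0.005 + 0.001 * abs(2.666667)  # True (allowed difference ~0.00767)

# PEP485 definition: the larger of the two tolerances is used, so it fails
diff <= max(0.005, 0.001 * max(abs(2.671947), abs(2.666667)))  # False (allowed difference 0.005)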

@leofang
Contributor

leofang commented May 19, 2021

To investigate the impact of the proposed change, I've implemented it for numpy and run the numpy, scipy, and scikit-learn test suites against this patched version.

Cool, thanks for looking into this @pmeier. Just wanna clarify: your patch used the math.isclose() relation, right?

@pmeier
Author

pmeier commented May 19, 2021

Just wanna clarify: your patch used the math.isclose() relation, right?

Yes, here are the details:

I've used numpy-fb586a4 and applied this patch:

diff --git a/numpy/core/numeric.py b/numpy/core/numeric.py
index 8bb37e291..a75eb1959 100644
--- a/numpy/core/numeric.py
+++ b/numpy/core/numeric.py
@@ -2336,7 +2336,9 @@ def isclose(a, b, rtol=1.e-5, atol=1.e-8, equal_nan=False):
     """
     def within_tol(x, y, atol, rtol):
         with errstate(invalid='ignore'):
-            return less_equal(abs(x-y), atol + rtol * abs(y))
+            diff = abs(x - y)
+            tolerance = np.maximum(atol, rtol * np.maximum(abs(x), abs(y)))
+            return less_equal(diff, tolerance)
 
     x = asanyarray(a)
     y = asanyarray(b)

I've run this against the test suites of numpy-fb586a4, scipy-6667d4b (applying the patch I outlined in scipy/scipy#14081), and scikit-learn-053d2d1.

@kgryte added this to the v2022 milestone Oct 4, 2021
@kgryte added the API extension label Oct 4, 2021
@kgryte added this to Stage 0 in Proposals Oct 4, 2021
@rgommers removed this from the v2022 milestone Nov 28, 2022
@kgryte changed the title from "Proposal: support for closeness comparison" to "RFC: add support for closeness comparison" Apr 4, 2024
@kgryte added the RFC and Needs Discussion labels Apr 4, 2024
@kgryte closed this as not planned Apr 4, 2024
@kgryte
Contributor

kgryte commented Apr 4, 2024

As this proposal lacks consensus for the best path forward (e.g., (1) standardizing what many consider to be a less-than-optimal closeness definition as implemented by NumPy or (2) moving the ecosystem to a more desirable definition), I'll go ahead and close this issue.

Given that there's general agreement that the NumPy definition is not ideal, I'm personally leery about codifying such behavior in the specification and thus forcing every future array library adopting the standard to adopt it. IMO, libraries should be free to innovate here and implement more ideal APIs and behavior.
