RFC: add support for closeness comparison #170

Closed
pmeier opened this issue Apr 26, 2021 · 14 comments
Labels
API extension (Adds new functions or objects to the API.) · Needs Discussion (Needs further discussion.) · RFC (Request for comments. Feature requests and proposed changes.)
Comments

@pmeier

pmeier commented Apr 26, 2021

Due to the limitations of floating point arithmetic, comparing floating point values for bitwise equality is only appropriate in very few situations. In typical situations, for example comparing the output of a function against an expected result, it has thus become best practice to compare the values for closeness rather than equality. Python added built-in support for closeness comparisons (math.isclose) with PEP485, which was introduced in Python 3.5.

With this in mind, I'm proposing to add an elementwise isclose operator:

def isclose(x1, x2, *, rtol: float, atol: float):
    pass

Similar to equal, x1 and x2 as well as the return value are arrays. The returned array will be of type bool.

The relative tolerance rtol and absolute tolerance atol should have default values which are discussed below.

Status quo

All actively considered libraries already at least partially support closeness comparisons. In addition to the elementwise isclose operation, usually also allclose is defined. Since allclose(a, b) == all(isclose(a, b)) and all is already part of the standard, I don't think adding allclose is helpful. Otherwise, we would also need to consider allequal and so on.

Library      isclose                                  allclose
NumPy        numpy.isclose                            numpy.allclose
TensorFlow   tensorflow.experimental.numpy.isclose    tensorflow.experimental.numpy.allclose
PyTorch      torch.isclose                            torch.allclose
MXNet        (none)                                   mxnet.contrib.ndarray.allclose
JAX          jax.numpy.isclose                        jax.numpy.allclose
Dask         dask.array.isclose                       dask.array.allclose
CuPy         cupy.isclose                             cupy.allclose
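
As a minimal sketch, the identity allclose(a, b) == all(isclose(a, b)) can be written out with numpy standing in for the proposed operators:

import numpy as np

def allclose(x1, x2, *, rtol=1e-5, atol=1e-8):
    # reduce the elementwise closeness result to a single boolean
    return bool(np.all(np.isclose(x1, x2, rtol=rtol, atol=atol)))

allclose(np.array([1.0, 2.0]), np.array([1.0, 2.0 + 1e-6]))  # True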

Closeness definition

All the libraries above define closeness like this:

abs(actual - expected) <= atol + rtol * abs(expected)

PEP485 states about this:

In this approach, the absolute and relative tolerance are added together, rather than the or method used in [math.isclose]. This is computationally more simple, and if relative tolerance is larger than the absolute tolerance, then the addition will have no effect. However, if the absolute and relative tolerances are of similar magnitude, then the allowed difference will be about twice as large as expected.
[...]
Even more critically, if the values passed in are small compared to the absolute tolerance, then the relative tolerance will be completely swamped, perhaps unexpectedly.

math.isclose overcomes this and additionally is symmetric:

abs(actual - expected) <= max(atol, rtol * max(abs(actual), abs(expected)))

Thus, in addition to adding the isclose operator, I think the standard should stick to the objectively better definition used by math.isclose.
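
For reference, a minimal sketch of both definitions for scalar inputs (the function names are illustrative only; the defaults follow the respective sources):

def isclose_added_tolerances(actual, expected, *, rtol=1e-5, atol=1e-8):
    # definition used by NumPy & co: tolerances are added, asymmetric in expected
    return abs(actual - expected) <= atol + rtol * abs(expected)

def isclose_pep485(actual, expected, *, rtol=1e-9, atol=0.0):
    # definition used by math.isclose: the larger tolerance wins, symmetric
    return abs(actual - expected) <= max(atol, rtol * max(abs(actual), abs(expected)))

# With atol and rtol of similar magnitude, the added tolerances are roughly twice as permissive:
isclose_added_tolerances(1.0, 1.000015, rtol=1e-5, atol=1e-5)  # True  (allowed difference ~2e-5)
isclose_pep485(1.0, 1.000015, rtol=1e-5, atol=1e-5)            # False (allowed difference ~1e-5)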

Non-finite numbers

In addition to finite numbers, the standard should also define how non-finite numbers (NaN, inf, and -inf) are to be handled. Again, I propose to stick to the rationale of PEP485, which in turn is based on IEEE 754:

  • NaN is never close to anything. All library implementations add an equal_nan: bool = False flag to the functions. If True, two NaN values are considered close. Still, a comparison between a NaN and any other value is never considered close.
  • inf and -inf are only close to themselves (illustrated with math.isclose below).
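
For illustration, math.isclose already behaves this way for non-finite values:

import math

math.isclose(math.nan, math.nan)   # False: NaN is never close to anything, not even itself
math.isclose(math.inf, math.inf)   # True: inf is only close to itself
math.isclose(math.inf, -math.inf)  # False
math.isclose(math.inf, 1e308)      # False: inf is not close to any finite value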

Default tolerances

In addition to fixed default values (math.isclose: rtol=1e-9, atol=0.0; all libraries: rtol=1e-5, atol=1e-8), the default tolerances could also vary with the promoted dtype. For example, arrays of dtype float64 could use stricter default tolerances than float32.

For integer dtypes, I propose using rtol = atol = 0.0, which would be identical to comparing them for equality. For floating point dtypes, I would use the rationale of PEP485 as a base (see the sketch after this list):

  • rtol: Approximately half the precision of the promoted dtype
  • atol: 0.0
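A rough sketch of such dtype-dependent defaults, assuming numpy-style dtype introspection and reading "approximately half the precision" as the square root of the machine epsilon of the promoted dtype (the helper default_tolerances is made up for illustration):

import numpy as np

def default_tolerances(dtype):
    # integer dtypes: rtol = atol = 0.0, i.e. plain equality
    if np.issubdtype(dtype, np.integer):
        return 0.0, 0.0
    # floating point dtypes: rtol ~ half the precision, atol = 0.0
    eps = float(np.finfo(dtype).eps)
    return eps ** 0.5, 0.0

default_tolerances(np.int32)    # (0.0, 0.0)
default_tolerances(np.float32)  # (~3.5e-4, 0.0)
default_tolerances(np.float64)  # (~1.5e-8, 0.0)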
@oleksandr-pavlyk
Contributor

oleksandr-pavlyk commented Apr 26, 2021

👍 for the adoption of a symmetric proximity comparator.

All library implementations add an equal_nan: bool = False flag to the functions.

Can you indicate where this might be useful? PEP485 stipulates "NaN is not considered close to any other value, including NaN."

@pmeier
Author

pmeier commented Apr 26, 2021

Can you indicate where this might be useful? PEP485 stipulates "NaN is not considered close to any other value, including NaN."

True, and I haven't encountered a use case for this yet. I just wanted to add it for completeness.

I'm only familiar with the PyTorch test suite, and there it is only used in very few cases. I'll try to dig up why it is necessary there. Plus, I'll try to dig up why it was added to NumPy in the first place. I believe all the other libraries simply adopted the flag.

@rgommers
Member

Thanks @pmeier!

equal_nan is quite useful for testing. There you typically compare against expected values; if the expected result of some function call is [1.5, 2.5, nan], then clearly you want equal_nan=True, otherwise you have to special-case NaNs everywhere.

For library code though, it's typically the opposite - you want the regular IEEE 754 behavior where nans are not equal.

That does of course make it a little questionable to have equal_nan in isclose/allclose rather than only in testing functions like assert_allclose. Maybe for statistical algorithms like the hypothesis tests in scipy.stats it still makes sense.

@pmeier
Author

pmeier commented Apr 26, 2021

That does of course make it a little questionable to have equal_nan in isclose/allclose rather than only in testing functions like assert_allclose

Good point. Since boolean operations and isnan are supported, the behavior can simply be re-added in test functions:

import numpy as np

matches = np.isclose(a, b, equal_nan=False)  # NaN is never close to anything here
if equal_nan:
    matches |= np.isnan(a) & np.isnan(b)     # opt back in to NaN == NaN on request

@rgommers
Member

I think there's a discussion elsewhere about nanfunctions (nansum, nanmax, etc.), which we all left out. The NumPy selection for what functions get a nan* variant is pretty ad hoc. There's probably a better solution: use the where= keyword consistently. For example, see equal:

equal(x1, x2, /, out=None, *, where=True, ...)
>>> x1 = np.array([1.5, 2.5, np.nan])
>>> x2 = x1
>>> np.equal(x1, x2)
array([ True,  True, False])
>>> np.equal(x1, x2, where=~np.isnan(x1))
array([ True,  True,  True])

>>> np.sum(x1, where=~np.isnan(x1))
4.0
>>> np.sum(x1)
nan

Most other libraries do not support a where= keyword, and numpy only has it in ufuncs and a random selection of other functions - but leaving equal_nan= out for isclose/allclose and leaving the option open to add where= in the future is probably the sensible thing to do.

And as you show, it's easy enough to work around for now.

@leofang
Contributor

leofang commented Apr 27, 2021

I think where in this case is overkill, and there is value in having equal_nan in isclose() beyond the purpose of testing library code. For example, when developing and debugging algorithms it's useful to check whether the output contains nonsensical outcomes.

@leofang
Contributor

leofang commented Apr 27, 2021

Sent too fast: if we remove equal_nan from the arguments, we should definitely include an example in the docstring for how to test NaNs.

@rgommers
Member

rgommers commented May 6, 2021

We had a discussion about this today. The most important point of that was: while the PEP 485 definition would be preferred overall if we were designing this API today, all libraries use the atol + rtol * abs(expected) definition today - so in order to deviate from that definition, we must have a good introduction story to get from one definition to the other in a gradual fashion. And make sure especially NumPy would accept that; then other libraries will follow.

Overall there was agreement that these are important functions to have, and we should add them.

Other points regarding the definition differences:

  • rtol * max(abs(actual), abs(expected)) and rtol * abs(expected) are both valid choices; the latter is actually better in case expected is a known reference value. Hence the "symmetry" argument is not a strong one.
  • The use of atol + rtol... vs max(atol, rtol...) is the more important difference. There's no good reason for the +.

tl;dr see if there's a way to introduce the "nicer" behavior.

@pmeier
Author

pmeier commented May 7, 2021

we must have a good introduction story to get from one definition to the other in a gradual fashion. And make sure especially NumPy would accept that; then other libraries will follow.

True, and unfortunately I can't offer one apart from "we would be compatible with the standard library" and "it is objectively better".

From my own perspective (and we should definitely hear other opinions here), the only times I fiddled with the default tolerances were when I knew I needed to expect more deviation than normal. In these cases I didn't really care about the actual values as long as they were "low enough" and I had manually checked that it was fine.

I imagine the libraries are already using their isclose / allclose functions in their test suites. If that is the case, we could send a PR changing the definition and see how many tests break in CI. That could give us a good picture of how much pushback to expect when changing the closeness definition.


  • rtol * max(abs(actual), abs(expected)) and rtol * abs(expected) are both valid choices; the latter is actually better in case expected is a known reference value. Hence the "symmetry" argument is not a strong one.

Trying to look at this without bias, the name "close" does not imply asymmetric behavior for me. It is in the same category as "equal" and not "greater than" or the like. From that perspective the current implementation can be confusing:

import numpy as np

a = 1
b = 10
rtol = 1
atol = 0
np.isclose(a, b, rtol=rtol, atol=atol)  # True
np.isclose(b, a, rtol=rtol, atol=atol)  # False
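
Continuing with the values above, the symmetric math.isclose definition gives the same answer for both argument orders:

import math

math.isclose(a, b, rel_tol=rtol, abs_tol=atol)  # True
math.isclose(b, a, rel_tol=rtol, abs_tol=atol)  # True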

The only practical difference between the two implementations occurs when abs(actual) is significantly larger than abs(expected). I've struggled to come up with a practical example where asymmetric behavior is desired and the values can still be considered close in a "human understandable" way. Maybe someone else can come up with one?

@rgommers
Member

rgommers commented May 8, 2021

I imagine the libraries are already using their isclose / allclose functions in their test suites. If that is the case, we could send a PR changing the definition and see how many tests break in CI. That could give us a good picture of how much pushback to expect when changing the closeness definition.

xref numpy/numpy#10161. Changing the existing definition looks like a nonstarter. And I haven't been able to come up with a good introduction strategy. In the linked numpy issue, even the PEP author says he would himself have preferred compatibility rather than do something incompatible for the sake of what is objectively best. "But is it "better" enough to break backward compatibility? I don't think so."

It looks like this has been very extensively litigated. We should stay with the existing definition that NumPy and other libraries have.

@pmeier
Author

pmeier commented May 18, 2021

To investigate the impact of the proposed change, I've implemented it for numpy and run the numpy, scipy, and scikit-learn test suites against this patched version.

  • numpy

    Only a single test fails with respect to the numerics (there are < 10 additional failures in tests of the testing functions themselves, which break due to my PoC implementation):

    python runtests.py \
        -t numpy/random/tests/test_generator_mt19937.py::TestMultivariateHypergeometric::test_typical_cases \
        -- \
        -k "150000-count-8"
    
    E           Not equal to tolerance rtol=0.001, atol=0.005
    E           
    E           Mismatched elements: 1 / 4 (25%)
    E           Max absolute difference: 0.00528
    E           Max relative difference: 0.00228
    E            x: array([1.330967, 0.665147, 2.671947, 3.33194 ])
    E            y: array([1.333333, 0.666667, 2.666667, 3.333333])
    

    Since rtol and atol have a similar magnitude, the addition of both tolerances made this test pass. Individually the maximum absolute difference is greater than atol (0.00528 > 0.005) and the maximum relative difference is greater than rtol (0.00228 > 0.001).

  • scipy

    Internally scipy relies at one point on the asymmetry of np.isclose. I think this is not intended, but I opened scipy/scipy#14081 ("scipy.optimize._linprog_simplex._apply_pivot relies on asymmetric closeness definition") to make sure. After patching this, we have 10 failing tests left, with 3 of them again related to the PoC implementation. 3 fall into the same category as the failing numpy test:

    python runtests.py \
        -t scipy.optimize.tests \
        -- \
        -k "test_L6 or test_bug_7237 or test_singleton_row_ub_2"
    

    4 more tests have very strict tolerances which are manually set:

    python runtests.py \
        -t scipy \
        -- \
        -k "test_bug_10124 or test_bisect or test_simple_exact or test_check_finite"
    

    For example:

    E   Not equal to tolerance rtol=1e-08, atol=1e-08
    E   converged to an unexpected solution
    E   Mismatched elements: 1 / 2 (50%)
    E   Max absolute difference: 1.09516519e-07
    E   Max relative difference: 1.09516519e-08
    E    x: array([10., -3.])
    E    y: array([10, -3])
    
  • scikit-learn

    No failing tests.

IMO, this means two things:

  1. The number of tests we would break by going for this change is fairly low, considering that each test suite comprises > 10k tests.
  2. Most of the tests that would fail arguably shouldn't have passed in the first place, since the absolute and the relative tolerance check each fail individually; the tests only pass because the sum of the two tolerances is checked (see the worked check below).
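
As a concrete check, assuming the mismatched element in the numpy failure above is the one with the maximum absolute difference (2.671947 vs. 2.666667, with rtol=0.001 and atol=0.005):

diff = abs(2.671947 - 2.666667)  # ~0.00528

# current definition: tolerances are summed, so the comparison passes
diff <= 0.005 + 0.001 * abs(2.666667)  # True (allowed difference ~0.00767)

# PEP485 definition: the larger of the two tolerances is used, so it fails
diff <= max(0.005, 0.001 * max(abs(2.671947), abs(2.666667)))  # False (allowed difference 0.005)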

@leofang
Contributor

leofang commented May 19, 2021

To investigate the impact of the proposed change, I've implemented it for numpy and run the numpy, scipy, and scikit-learn test suites against this patched version.

Cool, thanks for looking into this @pmeier. Just wanna clarify: your patch used the math.isclose() relation, right?

@pmeier
Author

pmeier commented May 19, 2021

Just wanna clarify: your patch used the math.isclose() relation, right?

Yes, here are the details:

I've used numpy-fb586a4 and applied this patch:

diff --git a/numpy/core/numeric.py b/numpy/core/numeric.py
index 8bb37e291..a75eb1959 100644
--- a/numpy/core/numeric.py
+++ b/numpy/core/numeric.py
@@ -2336,7 +2336,9 @@ def isclose(a, b, rtol=1.e-5, atol=1.e-8, equal_nan=False):
     """
     def within_tol(x, y, atol, rtol):
         with errstate(invalid='ignore'):
-            return less_equal(abs(x-y), atol + rtol * abs(y))
+            diff = abs(x - y)
+            tolerance = np.maximum(atol, rtol * np.maximum(abs(x), abs(y)))
+            return less_equal(diff, tolerance)
 
     x = asanyarray(a)
     y = asanyarray(b)

I've run this against the test suites of numpy-fb586a4, scipy-6667d4b (applying the patch I outlined in scipy/scipy#14081), and scikit-learn-053d2d1.

@kgryte added this to the v2022 milestone Oct 4, 2021
@kgryte added the API extension label Oct 4, 2021
@kgryte added this to Stage 0 in Proposals Oct 4, 2021
@rgommers removed this from the v2022 milestone Nov 28, 2022
@kgryte changed the title from "Proposal: support for closeness comparison" to "RFC: add support for closeness comparison" Apr 4, 2024
@kgryte added the RFC and Needs Discussion labels Apr 4, 2024
@kgryte closed this as not planned Apr 4, 2024
@kgryte
Contributor

kgryte commented Apr 4, 2024

As this proposal lacks consensus for the best path forward (e.g., (1) standardizing what many consider to be a less-than-optimal closeness definition as implemented by NumPy or (2) moving the ecosystem to a more desirable definition), I'll go ahead and close this issue.

Given that there's general agreement that the NumPy definition is not ideal, I'm personally leery about codifying such behavior in the specification and thus forcing every future array library adopting the standard to adopt it. IMO, libraries should be free to innovate here and implement more ideal APIs and behavior.
