BUG: fix an unintended exception being raised when attempting to compare two unequal `Table` instances. #15845

neutrinoceros · 2024-01-10T15:35:11Z

Description

Will fix #13421

The essence of the patch is to wrap existing logic in a try/except block as discussed in the issue, but I also tried to reduce code duplication, which resulted in a larger diff.

The test is also incomplete at the moment (see embedded comment). I'll come back to it when I'm confident that the current state survives CI.

By checking this box, the PR author has requested that maintainers do NOT use the "Squash and Merge" button. Maintainers should respect this when possible; however, the final decision is at the discretion of the maintainer that merges the PR.

github-actions · 2024-01-10T15:35:38Z

github-actions · 2024-01-10T15:35:42Z

👋 Thank you for your draft pull request! Do you know that you can use [ci skip] or [skip ci] in your commit messages to skip running continuous integration tests until you are ready?

pllim · 2024-01-10T16:12:12Z

docs/changes/table/15845.bugfix.rst

Might want to add a few more words on what behavior was actually fixed. Thanks!

better now ?

The original example in #13421 is a little confusing because it has two different things, the length and the column names. As noted earlier comparing a length=2 and length=3 table will fail in a different way. So maybe this test should only change one thing at a time. Or maybe test the three cases:

Data same, names different

Data different, names same

Data different, names different

neutrinoceros · 2024-01-11T09:30:44Z

astropy/table/table.py

+        eq = self.__eq__(other)
+        if isinstance(eq, bool):
+            # bitwise operators on bool values not reliable (e.g. `bool(~True) == True`)
+            # and are deprecated in Python 3.12
+            # see https://github.com/python/cpython/pull/103487
+            return not eq
+        else:
+            return ~eq


this fixes a secondary buglet that I discovered with the test I added. I made an attempt at separating it into its own PR, but I couldn't find a way to test it without fixing the first bug too. I could however move it to a follow up PR if requested.

Good catch.

taldcroft

@neutrinoceros - overall this looks nice! I have a number of comments, some nit-picky. 😄

taldcroft · 2024-01-11T11:55:33Z

astropy/table/table.py

+        eq = self.__eq__(other)
+        if isinstance(eq, bool):
+            # bitwise operators on bool values not reliable (e.g. `bool(~True) == True`)
+            # and are deprecated in Python 3.12
+            # see https://github.com/python/cpython/pull/103487
+            return not eq
+        else:
+            return ~eq


Good catch.

taldcroft · 2024-01-11T11:59:23Z

astropy/table/table.py

+        try:
+            self_is_masked = self.has_masked_columns
+            other_is_masked = isinstance(other, np.ma.MaskedArray)
+            if (self_is_masked + other_is_masked) == 1:


I'm not a fan of relying on the int value of bool. Basically this is the XOR, which doesn't exist in Python but for bools can be written as below:

Suggested change

if (self_is_masked + other_is_masked) == 1:

# One table is masked and the other is not

if self_is_masked != other_is_masked:

agreed, that's cleaner. Thank you !

I agree that if (self_is_masked + other_is_masked) == 1: isn't the best code, but interpreting True as 1 and False as 0 is safe.

>>> issubclass(bool, int) True

I don't know if the XOR operator ^ should be preferred over != here, but it does exist:

>>> for x in (True, False): ... for y in (True, False): ... print(f"{x = }, {y = }, {x ^ y = }") ... x = True, y = True, x ^ y = False x = True, y = False, x ^ y = True x = False, y = True, x ^ y = True x = False, y = False, x ^ y = False

oh ! I think I learned about it a while back but probably never encountered it in production. It's indeed a perfect use case, but I'm still hesitant to use it since it's so rarely seen that it might affect readability.

I agree that it is "safe", but I've always thought of it as an accidental implementation detail not a feature. As we get more into typing it just feels wrong. There is no argument that the strict rules say that bool + bool => int, but it feels ugly.

That's funny about the ^ XOR. I wasn't sure so I googled it and got to a stackoverflow page that said it doesn't exist. Go figure, and I learned something new today. Definitely using that is better than !=.

I won't debate further, I'm too happy that I get to use something that feels new 😄

taldcroft · 2024-01-11T11:59:44Z

astropy/table/table.py

+            self_is_masked = self.has_masked_columns
+            other_is_masked = isinstance(other, np.ma.MaskedArray)
+            if (self_is_masked + other_is_masked) == 1:
+                # remap variables to a and b where a is masked and b isn't


Nice refactor to remove duplication!

taldcroft · 2024-01-11T12:05:16Z

astropy/table/table.py

+        except TypeError:
+            # numpy may complain that structured array are not comparable
+            # see https://github.com/astropy/astropy/issues/13421
+            return False


There is also the potential for a ValueError if the tables are not broadcastable (this was also previously a warning). E.g.

In [48]: np.array([1, 2]) == np.array([1, 2, 3]) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) Cell In[48], line 1 ----> 1 np.array([1, 2]) == np.array([1, 2, 3]) ValueError: operands could not be broadcast together with shapes (2,) (3,)

Even though it is a bit of duplication, you should try/except for (TypeError, ValueError) around only the two lines which actually do the comparison. Otherwise those broad catches could be masking real errors in all the other (non-trivial) code.

If there isn't a test for this case you should add it.

Good thinking. I'm on it

taldcroft · 2024-01-11T12:12:11Z

astropy/table/table.py

-                false_mask = np.zeros(1, dtype=[(n, bool) for n in other.dtype.names])
-                result = (self.as_array() == other.data) & (other.mask == false_mask)
+                false_mask = np.zeros(1, dtype=[(n, bool) for n in a.dtype.names])
+                return (a.data == b) & (a.mask == false_mask)


Can you revert to setting result and returning that at the end? Overall I prefer this pattern and try to avoid early return if it doesn't require contorted logic or a giant if block.

This is one thing in astropy that each subpackage has its own style and it helps to maintain that style FWIW. 😄

sure, no problem

taldcroft · 2024-01-11T12:17:40Z

docs/changes/table/15845.bugfix.rst

The original example in #13421 is a little confusing because it has two different things, the length and the column names. As noted earlier comparing a length=2 and length=3 table will fail in a different way. So maybe this test should only change one thing at a time. Or maybe test the three cases:

Data same, names different

Data different, names same

Data different, names different

taldcroft · 2024-01-11T12:29:08Z

Another thing is that this is a good opportunity to update the narrative docs: https://docs.astropy.org/en/stable/table/access_table.html#table-equality

What is said there is incomplete or not quite true any more: "This is the same as the behavior of numpy structured arrays.". That used to be correct, but in modern numpy comparing structured arrays with different columns or not-broadcastable will give an exception. So for == it should be noted that it returns False if the arrays have different columns or are not broadcastable, unlike comparing structured arrays.

It might also be worth adding a note that both == and .values_equal() report equality after broadcasting. Honestly I find this a bit confusing so I might not be alone. It matches numpy so maybe it makes sense, but a user might expect that if t1 has 100 rows and t2 has 1 rows that t1 == t2 would be False. Indeed if t2 has 2 rows then t1 == t2 is False.

neutrinoceros · 2024-01-11T13:02:39Z

So for == it should be noted that it returns False if the arrays have different columns or are not broadcastable, unlike comparing structured arrays.

@taldcroft Actually Table._rows_equal's docstring explicitly states that it "Returns a 1-D boolean numpy array", so I used np.array([False]) as the default return value instead of False (this was also necessary to avoid conditional branches in the now generalised test). Let me know if I should change it. I'll update narrative docs depending on your reply :)

neutrinoceros · 2024-01-11T14:07:18Z

switching to draft until I get old deps CI green again

taldcroft · 2024-01-11T14:08:06Z

So for == it should be noted that it returns False if the arrays have different columns or are not broadcastable, unlike comparing structured arrays.

@taldcroft Actually Table._rows_equal's docstring explicitly states that it "Returns a 1-D boolean numpy array", so I used np.array([False]) as the default return value instead of False (this was also necessary to avoid conditional branches in the now generalised test). Let me know if I should change it. I'll update narrative docs depending on your reply :)

I also noticed that and was originally going to say your update to __ne__ was not necessary if there can never be a scalar bool. But in fact the == has always returned a scalar bool in certain cases, following the old behavior of numpy. So I think the answer is to fix the docstrings and documentation to match the existing legacy behavior. As we saw in #13421 this behavior is expected by users who mostly don't read the docs 😄.

…are two unequal ``Table`` instances.

… bool values are deprecated in Python 3.12)

neutrinoceros · 2024-01-11T14:47:46Z

Ok, this makes our new test significantly more complicated so let's see how this performs. @taldcroft I'd appreciate if you could review the current state before I commit changes to documentation. Thanks !

taldcroft

This is looking great now, thanks! Just a couple of minor suggestions.

taldcroft · 2024-01-11T15:10:29Z

astropy/table/table.py

+        self_is_masked = self.has_masked_columns
+        other_is_masked = isinstance(other, np.ma.MaskedArray)
+
+        whitelist = (TypeError, ValueError if not NUMPY_LT_1_25 else DeprecationWarning)


We don't use whitelist now, in favor of allowlist / blocklist or other equivalents. Here you might just be more descriptive and call this allowed_numpy_exceptions. This will probably also have a good side effect of formatting so that it is bit more clear:

allowed_numpy_exceptions = ( TypeError, ValueError if not NUMPY_LT_1_25 else DeprecationWarning )

taldcroft · 2024-01-11T15:10:35Z

astropy/table/tests/test_operations.py

+    except (TypeError, ValueError):
+        # dtypes are not comparable or arrays can't be broadcasted:
+        # a simple bool should be returned
+        assert not t1 == t2


While we are here what about explicitly checking the return type in all 8 asserts. I think this can be done compactly with this but you should check.

assert not (cmp := t1 == t2) and isinstance(cmp, bool)

technically true, but it is not needed: the expression t1 == t2 raises a DeprecationWarning on current numpy (treated as an error), so the test would already fail if the return type didn't match the expectation.

neutrinoceros · 2024-01-11T15:28:55Z

Alright, I think I've taken all your suggestions into account, including docs updates. Also, CI should be green now, so undrafting !

mhvk

Looks good! Two suggestions, one you probably should not implement, the other (for the tests), maybe something to think about: I personally like tests that are explicit about what is expected. But absolutely fine without this too, so approving.

mhvk · 2024-01-11T15:38:10Z

astropy/table/table.py

@@ -3683,15 +3684,26 @@ def __eq__(self, other):
        return self._rows_equal(other)

    def __ne__(self, other):
-        return ~self.__eq__(other)
+        eq = self.__eq__(other)
+        if isinstance(eq, bool):


Could do return self.__eq__(other) == False, but arguably trying to be too clever...

Actually this would probably be more performant (isinstance is infamously slow). If this function is considered performance-critical in any way I think it's worth considering, otherwise I think it just hurts readability (I know it's hard to resist the call of golfing sometimes !)

Indeed, readability is why I felt I was trying to be too clever. Let's stick with what you have!

mhvk · 2024-01-11T15:48:47Z

astropy/table/tests/test_operations.py

@@ -2424,3 +2424,63 @@ def test_mixin_join_regression():
    t12 = table.join(t1, t2, keys=("index", "flux1", "flux2"), join_type="outer")

    assert len(t12) == 6
+
+
+@pytest.mark.parametrize(


I wonder if the test would be clearer if it gave the result in the parametrization rather than do a check to see what would be expected. I.e., "t1, t2, eq"

I asked myself the same question as I wrote the test, but I found that the try/except pattern expressed the documented behaviour more clearly than what could feel like a list of ad-hoc expectations. Honestly it seems arguable both ways. Maybe @taldcroft has stronger opinions that could help forming a decision ?

Also reasonable. As said, I am absolutely fine with just merging the PR as is! (in another PR, @taldcroft pointed out the old adage of perfect being the enemy of good enough)

When I looked at this I had the same idea of "t1, t2, eq" being a little more clear, but then I reminded myself of perfect and good. I think this is quite good. 😄

Having said that @neutrinoceros - I generally agree with @mhvk's sentiment so that is something to keep in mind going forward.

mhvk · 2024-01-11T15:56:23Z

Ah, one more point: in MaskedArray, equality tests for structured arrays explicitly ignore masked entries. This is similar to (np.array([0, 1, 2]) == np.ma.MaskedArray([0, 1, 1], mask=[False, False, True])).all() being True. For this PR, I think it makes sense to just follow what was there and fix the actual bug, but it may be worth thinking a bit whether this is actually the behaviour that is wanted.

neutrinoceros · 2024-01-11T16:01:38Z

I my opinion this is the most reasonable choice of behaviour. As a user I think I'd be unpleasantly surprised if masked values were taken into account here.

mhvk · 2024-01-11T16:06:48Z

As a user I think I'd be unpleasantly surprised if masked values were taken into account here.

The thing is that for the table comparison, masked values are explicitly counted as always unequal to unmasked ones, a bit as if they were NaN. Which is a reasonable choice, but perhaps one worth thinking more about (if so, in a separate issue!).

pllim · 2024-01-11T16:19:33Z

docs/table/access_table.rst

Does this count as behavior change that should not be backported? If so, please update milestone and remove backport label. Thanks!

@taldcroft should answer here but my understanding is that we're just making the intended behaviour actually work and align out-of-dat documentation, so I think it's okay to backport.

@pllim - I agree with what @neutrinoceros said, with the conclusion that this should be backported.

taldcroft

Looks good to me!

…sed when attempting to compare two unequal ``Table`` instances.

…n attempting to compare two unequal ``Table`` instances.

github-actions bot added the table label Jan 10, 2024

neutrinoceros changed the title ~~TST: add a regression test for bug 13421~~ BUG: fix comparing unequal tables Jan 10, 2024

neutrinoceros force-pushed the table/bug/table_comparison branch from b1995e7 to 452571d Compare January 10, 2024 15:36

pllim reviewed Jan 10, 2024

View reviewed changes

pllim added this to the v6.0.1 milestone Jan 10, 2024

pllim added Bug 💤 backport-v6.0.x on-merge: backport to v6.0.x labels Jan 10, 2024

neutrinoceros force-pushed the table/bug/table_comparison branch from 452571d to 8a8f419 Compare January 11, 2024 09:09

neutrinoceros changed the title ~~BUG: fix comparing unequal tables~~ BUG: fix an unintended exception being raised when attempting to compare two unequal Table instances. Jan 11, 2024

neutrinoceros force-pushed the table/bug/table_comparison branch from 8a8f419 to f1a5b29 Compare January 11, 2024 09:28

neutrinoceros commented Jan 11, 2024

View reviewed changes

neutrinoceros marked this pull request as ready for review January 11, 2024 09:31

neutrinoceros requested a review from taldcroft as a code owner January 11, 2024 09:31

taldcroft requested changes Jan 11, 2024

View reviewed changes

neutrinoceros force-pushed the table/bug/table_comparison branch from f1a5b29 to e79f465 Compare January 11, 2024 12:54

TST: add a regression test for bug 13421

966b373

neutrinoceros force-pushed the table/bug/table_comparison branch from e79f465 to b370b9e Compare January 11, 2024 12:56

neutrinoceros marked this pull request as draft January 11, 2024 14:07

neutrinoceros added 3 commits January 11, 2024 15:45

BUG: fix an unintended exception being raised when attempting to comp…

d016355

…are two unequal ``Table`` instances.

BUG: avoid a deprecation warning in Table.__ne__ (bitwise operator on…

367d63c

… bool values are deprecated in Python 3.12)

return a simple bool for uncomparable data

5bc36ec

neutrinoceros force-pushed the table/bug/table_comparison branch from d354fbb to 5bc36ec Compare January 11, 2024 14:45

xor

7bf55f4

taldcroft reviewed Jan 11, 2024

View reviewed changes

neutrinoceros added 2 commits January 11, 2024 16:16

rename whitelist->allowed_numpy_exceptions

5c7e471

DOC: update narrative docs

f4ff2f7

neutrinoceros marked this pull request as ready for review January 11, 2024 15:29

mhvk approved these changes Jan 11, 2024

View reviewed changes

pllim reviewed Jan 11, 2024

View reviewed changes

taldcroft approved these changes Jan 11, 2024

View reviewed changes

taldcroft merged commit c714978 into astropy:main Jan 11, 2024
25 of 26 checks passed

meeseeksmachine pushed a commit to meeseeksmachine/astropy that referenced this pull request Jan 11, 2024

Backport PR astropy#15845: BUG: fix an unintended exception being rai…

8d1ac7e

…sed when attempting to compare two unequal ``Table`` instances.

meeseeksmachine mentioned this pull request Jan 11, 2024

Backport PR #15845 on branch v6.0.x (BUG: fix an unintended exception being raised when attempting to compare two unequal Table instances.) #15869

Closed

neutrinoceros deleted the table/bug/table_comparison branch January 11, 2024 20:57

saimn pushed a commit that referenced this pull request Mar 25, 2024

Backport PR #15845: BUG: fix an unintended exception being raised whe…

e122e88

…n attempting to compare two unequal ``Table`` instances.

neutrinoceros mentioned this pull request Mar 25, 2024

Finalizing changelog for v6.0.1 #16230

Merged

1 task

	if (self_is_masked + other_is_masked) == 1:
	# One table is masked and the other is not
	if self_is_masked != other_is_masked:

BUG: fix an unintended exception being raised when attempting to compare two unequal Table instances. #15845

BUG: fix an unintended exception being raised when attempting to compare two unequal Table instances. #15845

Conversation

neutrinoceros commented Jan 10, 2024

Description

github-actions bot commented Jan 10, 2024 • edited by taldcroft

github-actions bot commented Jan 10, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

taldcroft left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

taldcroft commented Jan 11, 2024

neutrinoceros commented Jan 11, 2024

neutrinoceros commented Jan 11, 2024

taldcroft commented Jan 11, 2024

neutrinoceros commented Jan 11, 2024

taldcroft left a comment • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

neutrinoceros commented Jan 11, 2024

mhvk left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mhvk commented Jan 11, 2024

neutrinoceros commented Jan 11, 2024

mhvk commented Jan 11, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

taldcroft left a comment

Choose a reason for hiding this comment

BUG: fix an unintended exception being raised when attempting to compare two unequal `Table` instances. #15845

BUG: fix an unintended exception being raised when attempting to compare two unequal `Table` instances. #15845

github-actions bot commented Jan 10, 2024 •

edited by taldcroft

taldcroft left a comment •

edited