BUG: fix an unintended exception being raised when attempting to compare two unequal ``Table`` instances.
#15845
Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.
👋 Thank you for your draft pull request! Do you know that you can use |
b1995e7
to
452571d
Compare
docs/changes/table/15845.bugfix.rst
Outdated
Might want to add a few more words on what behavior was actually fixed. Thanks!
better now ?
The original example in #13421 is a little confusing because it has two different things, the length and the column names. As noted earlier comparing a length=2 and length=3 table will fail in a different way. So maybe this test should only change one thing at a time. Or maybe test the three cases:
- Data same, names different
- Data different, names same
- Data different, names different
done !
452571d
to
8a8f419
Compare
8a8f419
to
f1a5b29
Compare
    eq = self.__eq__(other)
    if isinstance(eq, bool):
        # bitwise operators on bool values not reliable (e.g. `bool(~True) == True`)
        # and are deprecated in Python 3.12
        # see https://github.com/python/cpython/pull/103487
        return not eq
    else:
        return ~eq
this fixes a secondary buglet that I discovered with the test I added. I made an attempt at separating it into its own PR, but I couldn't find a way to test it without fixing the first bug too. I could however move it to a follow up PR if requested.
Good catch.
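To make the buglet concrete: `~` applied to a plain Python `bool` is integer bitwise inversion, so `~True` evaluates to `-2`, which is still truthy. The following is a minimal stdlib-only sketch, not the astropy implementation. `BoolVector` is a hypothetical stand-in for a boolean array that supports elementwise `~`, and `negate_eq` is an illustrative helper that mirrors the branch in the patch.

```python
import warnings


class BoolVector:
    """Hypothetical stand-in for a boolean array supporting elementwise ``~``."""

    def __init__(self, values):
        self.values = list(values)

    def __invert__(self):
        return BoolVector(not v for v in self.values)


def negate_eq(eq):
    """Negate an equality result that is either a plain bool or array-like."""
    if isinstance(eq, bool):
        # ``~`` on a bool is integer inversion, so use logical ``not`` instead
        return not eq
    return ~eq


flag = True
with warnings.catch_warnings():
    # ``~`` on a bool emits a DeprecationWarning on Python 3.12 and later
    warnings.simplefilter("ignore", DeprecationWarning)
    inverted = ~flag

print(inverted, bool(inverted))  # -2 True
print(negate_eq(True))  # False
print(negate_eq(BoolVector([True, False])).values)  # [False, True]
```

The `isinstance` check keeps the array path unchanged while making the scalar path return a real `bool`.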
@neutrinoceros - overall this looks nice! I have a number of comments, some nit-picky. 😄
astropy/table/table.py
Outdated
    try:
        self_is_masked = self.has_masked_columns
        other_is_masked = isinstance(other, np.ma.MaskedArray)
        if (self_is_masked + other_is_masked) == 1:
I'm not a fan of relying on the `int` value of `bool`. Basically this is the XOR, which doesn't exist in Python but for bools can be written as below:
    -    if (self_is_masked + other_is_masked) == 1:
    +    # One table is masked and the other is not
    +    if self_is_masked != other_is_masked:
agreed, that's cleaner. Thank you !
I agree that `if (self_is_masked + other_is_masked) == 1:` isn't the best code, but interpreting `True` as `1` and `False` as `0` is safe.
>>> issubclass(bool, int)
True
I don't know if the XOR operator `^` should be preferred over `!=` here, but it does exist:
>>> for x in (True, False):
...     for y in (True, False):
...         print(f"{x = }, {y = }, {x ^ y = }")
...
x = True, y = True, x ^ y = False
x = True, y = False, x ^ y = True
x = False, y = True, x ^ y = True
x = False, y = False, x ^ y = False
oh ! I think I learned about it a while back but probably never encountered it in production. It's indeed a perfect use case, but I'm still hesitant to use it since it's so rarely seen that it might affect readability.
I agree that it is "safe", but I've always thought of it as an accidental implementation detail not a feature. As we get more into typing it just feels wrong. There is no argument that the strict rules say that `bool + bool => int`, but it feels ugly.
That's funny about the `^` XOR. I wasn't sure so I googled it and got to a stackoverflow page that said it doesn't exist. Go figure, and I learned something new today. Definitely using that is better than `!=`.
I won't debate further, I'm too happy that I get to use something that feels new 😄
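For reference, the three spellings discussed in this thread agree whenever both operands are real `bool` values, which is the case for the two flags here. The sketch below is a standalone illustration; the variable names `via_sum`, `via_ne`, `via_xor`, `xor_table`, and `int_xor` are made up for the example.

```python
from itertools import product

xor_table = {}
for x, y in product((True, False), repeat=2):
    via_sum = (x + y) == 1  # relies on bool being a subclass of int
    via_ne = x != y  # inequality of two bools
    via_xor = x ^ y  # XOR of two bools returns a bool
    assert via_sum == via_ne == via_xor
    xor_table[(x, y)] = via_xor

# Caveat: with non-bool operands, ^ is integer bitwise XOR and returns an int,
# so none of the spellings is a logical XOR of truthiness any more.
int_xor = 1 ^ 3
print(xor_table)
print(int_xor, 1 != 3, (1 + 3) == 1)  # 2 True False
```

The equivalence only holds for actual bools, so `^` is a safe choice here only because both flags are guaranteed to be `bool`.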
astropy/table/table.py
Outdated
    self_is_masked = self.has_masked_columns
    other_is_masked = isinstance(other, np.ma.MaskedArray)
    if (self_is_masked + other_is_masked) == 1:
        # remap variables to a and b where a is masked and b isn't
Nice refactor to remove duplication!
astropy/table/table.py
Outdated
    except TypeError:
        # numpy may complain that structured array are not comparable
        # see https://github.com/astropy/astropy/issues/13421
        return False
There is also the potential for a `ValueError` if the tables are not broadcastable (this was also previously a warning). E.g.
In [48]: np.array([1, 2]) == np.array([1, 2, 3])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[48], line 1
----> 1 np.array([1, 2]) == np.array([1, 2, 3])
ValueError: operands could not be broadcast together with shapes (2,) (3,)
Even though it is a bit of duplication, you should try/except for `(TypeError, ValueError)` around only the two lines which actually do the comparison. Otherwise those broad catches could be masking real errors in all the other (non-trivial) code.
If there isn't a test for this case you should add it.
Good thinking. I'm on it
done
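The narrow-guard pattern requested in this thread can be sketched without NumPy. This is an illustration under stated assumptions, not the code that was merged: `compare_columns` is a hypothetical stand-in for the array comparison (it raises `ValueError` on a length mismatch, loosely mimicking a broadcast failure), and `safe_eq` is an invented name.

```python
def compare_columns(a, b):
    """Hypothetical stand-in for an elementwise array comparison."""
    if len(a) != len(b):
        raise ValueError(f"shapes ({len(a)},) and ({len(b)},) cannot be broadcast")
    return [x == y for x, y in zip(a, b)]


def safe_eq(a, b):
    """Return elementwise results, or a plain False if inputs are not comparable."""
    # Preparation happens outside the try block, so errors here are not swallowed.
    a, b = list(a), list(b)
    try:
        # Only the comparison itself is guarded.
        result = compare_columns(a, b)
    except (TypeError, ValueError):
        result = False
    return result


print(safe_eq([1, 2], [1, 3]))  # [True, False]
print(safe_eq([1, 2], [1, 2, 3]))  # False
```

Because `list(a)` sits outside the `try`, a genuine mistake such as passing `None` still surfaces as a `TypeError` instead of silently becoming `False`.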
astropy/table/table.py
Outdated
    -    false_mask = np.zeros(1, dtype=[(n, bool) for n in other.dtype.names])
    -    result = (self.as_array() == other.data) & (other.mask == false_mask)
    +    false_mask = np.zeros(1, dtype=[(n, bool) for n in a.dtype.names])
    +    return (a.data == b) & (a.mask == false_mask)
Can you revert to setting `result` and returning that at the end? Overall I prefer this pattern and try to avoid early return if it doesn't require contorted logic or a giant `if` block.
One thing about astropy is that each subpackage has its own style, and it helps to maintain that style, FWIW. 😄
sure, no problem
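The expression `(a.data == b) & (a.mask == false_mask)` in the diff above marks an entry as equal only when the data match and the entry is not masked. A flat-list analogue, offered as a rough sketch rather than the structured-array logic used in `table.py`, might look like this; `masked_rows_equal` and the sample values are hypothetical.

```python
def masked_rows_equal(data, mask, other):
    """Elementwise equality in which a masked entry never compares equal."""
    return [(d == o) and not m for d, m, o in zip(data, mask, other)]


data = [1, 2, 3]
mask = [False, True, False]  # the second entry is masked
other = [1, 2, 4]

print(masked_rows_equal(data, mask, other))  # [True, False, False]
```

The second entry is reported as unequal even though the underlying values match, because it is masked.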
docs/changes/table/15845.bugfix.rst
Outdated
The original example in #13421 is a little confusing because it has two different things, the length and the column names. As noted earlier comparing a length=2 and length=3 table will fail in a different way. So maybe this test should only change one thing at a time. Or maybe test the three cases:
- Data same, names different
- Data different, names same
- Data different, names different
Another thing is that this is a good opportunity to update the narrative docs: https://docs.astropy.org/en/stable/table/access_table.html#table-equality What is said there is incomplete or not quite true any more: "This is the same as the behavior of numpy structured arrays.". That used to be correct, but in modern numpy comparing structured arrays with different columns or not-broadcastable will give an exception. So for It might also be worth adding a note that both |
f1a5b29
to
e79f465
Compare
e79f465
to
b370b9e
Compare
@taldcroft Actually |
switching to draft until I get old deps CI green again
I also noticed that and was originally going to say your update to |
…are two unequal ``Table`` instances.
… bool values are deprecated in Python 3.12)
d354fbb
to
5bc36ec
Compare
Ok, this makes our new test significantly more complicated so let's see how this performs. @taldcroft I'd appreciate if you could review the current state before I commit changes to documentation. Thanks!
This is looking great now, thanks! Just a couple of minor suggestions.
astropy/table/table.py
Outdated
    self_is_masked = self.has_masked_columns
    other_is_masked = isinstance(other, np.ma.MaskedArray)

    whitelist = (TypeError, ValueError if not NUMPY_LT_1_25 else DeprecationWarning)
We don't use `whitelist` now, in favor of `allowlist` / `blocklist` or other equivalents. Here you might just be more descriptive and call this `allowed_numpy_exceptions`. This will probably also have a good side effect of formatting so that it is a bit more clear:
    allowed_numpy_exceptions = (
        TypeError,
        ValueError if not NUMPY_LT_1_25 else DeprecationWarning
    )
    except (TypeError, ValueError):
        # dtypes are not comparable or arrays can't be broadcasted:
        # a simple bool should be returned
        assert not t1 == t2
While we are here what about explicitly checking the return type in all 8 asserts. I think this can be done compactly with this but you should check.
assert not (cmp := t1 == t2) and isinstance(cmp, bool)
technically true, but it is not needed: the expression `t1 == t2` raises a `DeprecationWarning` on current numpy (treated as an error), so the test would already fail if the return type didn't match the expectation.
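For readers unfamiliar with the compact assertion suggested above, here is a small sketch of how it evaluates. Since `not` binds more tightly than `and`, the expression reads as `(not cmp) and isinstance(cmp, bool)`. The helper name `is_plain_false` is invented for this illustration.

```python
def is_plain_false(value):
    """Return True only if ``value`` is falsy and is also a real bool."""
    return not (cmp := value) and isinstance(cmp, bool)


print(is_plain_false(False))  # True: falsy and a bool
print(is_plain_false(0))  # False: falsy, but an int
print(is_plain_false([]))  # False: falsy, but a list
print(is_plain_false(True))  # False: not falsy
```

A bare `assert not t1 == t2` would also pass for other falsy results such as `0` or an empty list, which is the gap the extra `isinstance` check closes.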
Alright, I think I've taken all your suggestions into account, including docs updates. Also, CI should be green now, so undrafting!
Looks good! Two suggestions, one you probably should not implement, the other (for the tests), maybe something to think about: I personally like tests that are explicit about what is expected. But absolutely fine without this too, so approving.
@@ -3683,15 +3684,26 @@ def __eq__(self, other):
         return self._rows_equal(other)

     def __ne__(self, other):
-        return ~self.__eq__(other)
+        eq = self.__eq__(other)
+        if isinstance(eq, bool):
Could do `return self.__eq__(other) == False`, but arguably trying to be too clever...
Actually this would probably be more performant (`isinstance` is infamously slow). If this function is considered performance-critical in any way I think it's worth considering, otherwise I think it just hurts readability (I know it's hard to resist the call of golfing sometimes!)
Indeed, readability is why I felt I was trying to be too clever. Let's stick with what you have!
@@ -2424,3 +2424,63 @@ def test_mixin_join_regression():
     t12 = table.join(t1, t2, keys=("index", "flux1", "flux2"), join_type="outer")

     assert len(t12) == 6
+
+
+@pytest.mark.parametrize(
I wonder if the test would be clearer if it gave the result in the parametrization rather than do a check to see what would be expected. I.e., "t1, t2, eq"
I asked myself the same question as I wrote the test, but I found that the `try/except` pattern expressed the documented behaviour more clearly than what could feel like a list of ad-hoc expectations. Honestly it seems arguable both ways. Maybe @taldcroft has stronger opinions that could help form a decision?
Also reasonable. As said, I am absolutely fine with just merging the PR as is! (in another PR, @taldcroft pointed out the old adage of perfect being the enemy of good enough)
When I looked at this I had the same idea of "t1, t2, eq" being a little more clear, but then I reminded myself of perfect and good. I think this is quite good. 😄
Having said that @neutrinoceros - I generally agree with @mhvk's sentiment so that is something to keep in mind going forward.
I'll try !
Ah, one more point: in |
In my opinion this is the most reasonable choice of behaviour. As a user I think I'd be unpleasantly surprised if masked values were taken into account here.
The thing is that for the table comparison, masked values are explicitly counted as always unequal to unmasked ones, a bit as if they were |
Does this count as behavior change that should not be backported? If so, please update milestone and remove backport label. Thanks!
@taldcroft should answer here, but my understanding is that we're just making the intended behaviour actually work and aligning out-of-date documentation, so I think it's okay to backport.
@pllim - I agree with what @neutrinoceros said, with the conclusion that this should be backported.
Looks good to me!
Description
Will fix #13421
The essence of the patch is to wrap existing logic in a `try/except` block as discussed in the issue, but I also tried to reduce code duplication, which resulted in a larger diff. The test is also incomplete at the moment (see embedded comment). I'll come back to it when I'm confident that the current state survives CI.