Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: fix an unintended exception being raised when attempting to compare two unequal Table instances. #15845

Merged
merged 7 commits into from
Jan 11, 2024
57 changes: 41 additions & 16 deletions astropy/table/table.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
from astropy.io.registry import UnifiedReadWriteMethod
from astropy.units import Quantity, QuantityInfo
from astropy.utils import ShapedLikeNDArray, isiterable
from astropy.utils.compat import NUMPY_LT_1_25
from astropy.utils.console import color_print
from astropy.utils.data_info import BaseColumnInfo, DataInfo, MixinInfo
from astropy.utils.decorators import format_doc
Expand Down Expand Up @@ -3683,15 +3684,26 @@
return self._rows_equal(other)

def __ne__(self, other):
return ~self.__eq__(other)
eq = self.__eq__(other)
if isinstance(eq, bool):

Check warning on line 3688 in astropy/table/table.py

View check run for this annotation

Codecov / codecov/patch

astropy/table/table.py#L3687-L3688

Added lines #L3687 - L3688 were not covered by tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could do return self.__eq__(other) == False, but arguably trying to be too clever...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this would probably be more performant (isinstance is infamously slow). If this function is considered performance-critical in any way I think it's worth considering, otherwise I think it just hurts readability (I know it's hard to resist the call of golfing sometimes !)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, readability is why I felt I was trying to be too clever. Let's stick with what you have!

# bitwise operators on bool values not reliable (e.g. `bool(~True) == True`)
# and are deprecated in Python 3.12
# see https://github.com/python/cpython/pull/103487
return not eq

Check warning on line 3692 in astropy/table/table.py

View check run for this annotation

Codecov / codecov/patch

astropy/table/table.py#L3692

Added line #L3692 was not covered by tests
else:
return ~eq

Check warning on line 3694 in astropy/table/table.py

View check run for this annotation

Codecov / codecov/patch

astropy/table/table.py#L3694

Added line #L3694 was not covered by tests
Comment on lines +3687 to +3694
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this fixes a secondary buglet that I discovered with the test I added. I made an attempt at separating it into its own PR, but I couldn't find a way to test it without fixing the first bug too. I could however move it to a follow up PR if requested.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.


def _rows_equal(self, other):
"""
Row-wise comparison of table with any other object.

This is actual implementation for __eq__.

Returns a 1-D boolean numpy array showing result of row-wise comparison.
Returns a 1-D boolean numpy array showing result of row-wise comparison,
or a bool (False) in cases where comparison isn't possible (uncomparable dtypes
or unbroadcastable shapes). Intended to follow legacy numpy's elementwise
comparison rules.

This is the same as the ``==`` comparison for tables.

Parameters
Expand All @@ -3712,22 +3724,35 @@
if isinstance(other, Table):
other = other.as_array()

if self.has_masked_columns:
if isinstance(other, np.ma.MaskedArray):
result = self.as_array() == other
else:
# If mask is True, then by definition the row doesn't match
# because the other array is not masked.
false_mask = np.zeros(1, dtype=[(n, bool) for n in self.dtype.names])
result = (self.as_array().data == other) & (self.mask == false_mask)
self_is_masked = self.has_masked_columns
other_is_masked = isinstance(other, np.ma.MaskedArray)

Check warning on line 3728 in astropy/table/table.py

View check run for this annotation

Codecov / codecov/patch

astropy/table/table.py#L3727-L3728

Added lines #L3727 - L3728 were not covered by tests

allowed_numpy_exceptions = (

Check warning on line 3730 in astropy/table/table.py

View check run for this annotation

Codecov / codecov/patch

astropy/table/table.py#L3730

Added line #L3730 was not covered by tests
TypeError,
ValueError if not NUMPY_LT_1_25 else DeprecationWarning,
)
# One table is masked and the other is not
if self_is_masked ^ other_is_masked:

Check warning on line 3735 in astropy/table/table.py

View check run for this annotation

Codecov / codecov/patch

astropy/table/table.py#L3735

Added line #L3735 was not covered by tests
# remap variables to a and b where a is masked and b isn't
a, b = (

Check warning on line 3737 in astropy/table/table.py

View check run for this annotation

Codecov / codecov/patch

astropy/table/table.py#L3737

Added line #L3737 was not covered by tests
(self.as_array(), other) if self_is_masked else (other, self.as_array())
)

# If mask is True, then by definition the row doesn't match
# because the other array is not masked.
false_mask = np.zeros(1, dtype=[(n, bool) for n in a.dtype.names])
try:
result = (a.data == b) & (a.mask == false_mask)
except allowed_numpy_exceptions:

Check warning on line 3746 in astropy/table/table.py

View check run for this annotation

Codecov / codecov/patch

astropy/table/table.py#L3743-L3746

Added lines #L3743 - L3746 were not covered by tests
# numpy may complain that structured array are not comparable (TypeError)
# or that operands are not brodcastable (ValueError)
# see https://github.com/astropy/astropy/issues/13421
result = False

Check warning on line 3750 in astropy/table/table.py

View check run for this annotation

Codecov / codecov/patch

astropy/table/table.py#L3750

Added line #L3750 was not covered by tests
else:
if isinstance(other, np.ma.MaskedArray):
# If mask is True, then by definition the row doesn't match
# because the other array is not masked.
false_mask = np.zeros(1, dtype=[(n, bool) for n in other.dtype.names])
result = (self.as_array() == other.data) & (other.mask == false_mask)
else:
try:

Check warning on line 3752 in astropy/table/table.py

View check run for this annotation

Codecov / codecov/patch

astropy/table/table.py#L3752

Added line #L3752 was not covered by tests
result = self.as_array() == other
except allowed_numpy_exceptions:
result = False

Check warning on line 3755 in astropy/table/table.py

View check run for this annotation

Codecov / codecov/patch

astropy/table/table.py#L3754-L3755

Added lines #L3754 - L3755 were not covered by tests

return result

Expand Down
60 changes: 60 additions & 0 deletions astropy/table/tests/test_operations.py
Original file line number Diff line number Diff line change
Expand Up @@ -2424,3 +2424,63 @@ def test_mixin_join_regression():
t12 = table.join(t1, t2, keys=("index", "flux1", "flux2"), join_type="outer")

assert len(t12) == 6


@pytest.mark.parametrize(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the test would be clearer if it gave the result in the parametrization rather than do a check to see what would be expected. I.e., "t1, t2, eq"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked myself the same question as I wrote the test, but I found that the try/except pattern expressed the documented behaviour more clearly than what could feel like a list of ad-hoc expectations. Honestly it seems arguable both ways. Maybe @taldcroft has stronger opinions that could help forming a decision ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also reasonable. As said, I am absolutely fine with just merging the PR as is! (in another PR, @taldcroft pointed out the old adage of perfect being the enemy of good enough)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I looked at this I had the same idea of "t1, t2, eq" being a little more clear, but then I reminded myself of perfect and good. I think this is quite good. 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having said that @neutrinoceros - I generally agree with @mhvk's sentiment so that is something to keep in mind going forward.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try !

"t1, t2",
[
# different names
(
Table([np.array([1])], names=["a"]),
Table([np.array([1])], names=["b"]),
),
# different data (broadcastable)
(
Table([np.array([])], names=["a"]),
Table([np.array([1])], names=["a"]),
),
# different data (not broadcastable)
(
Table([np.array([1, 2])], names=["a"]),
Table([np.array([1, 2, 3])], names=["a"]),
),
# different names and data (broadcastable)
(
Table([np.array([])], names=["a"]),
Table([np.array([1])], names=["b"]),
),
# different names and data (not broadcastable)
(
Table([np.array([1, 2])], names=["a"]),
Table([np.array([1, 2, 3])], names=["b"]),
),
# different data and array type (broadcastable)
(
Table([np.array([])], names=["a"]),
Table([np.ma.MaskedArray([1])], names=["a"]),
),
# different data and array type (not broadcastable)
(
Table([np.array([1, 2])], names=["a"]),
Table([np.ma.MaskedArray([1, 2, 3])], names=["a"]),
),
],
)
def test_table_comp(t1, t2):
# see https://github.com/astropy/astropy/issues/13421
try:
np.result_type(t1.dtype, t2.dtype)
np.broadcast_shapes((len(t1),), (len(t2),))
except (TypeError, ValueError):
# dtypes are not comparable or arrays can't be broadcasted:
# a simple bool should be returned
assert not t1 == t2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we are here what about explicitly checking the return type in all 8 asserts. I think this can be done compactly with this but you should check.

        assert not (cmp := t1 == t2) and isinstance(cmp, bool)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically true, but it is not needed: the expression t1 == t2 raises a DeprecationWarning on current numpy (treated as an error), so the test would already fail if the return type didn't match the expectation.

assert not t2 == t1
assert t1 != t2
assert t2 != t1
else:
# otherwise, the general case is to return a 1D array with dtype=bool
assert not any(t1 == t2)
assert not any(t2 == t1)
assert all(t1 != t2)
assert all(t2 != t1)
1 change: 1 addition & 0 deletions docs/changes/table/15845.bugfix.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Fix an unintended exception being raised when attempting to compare two unequal ``Table`` instances.
12 changes: 9 additions & 3 deletions docs/table/access_table.rst
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this count as behavior change that should not be backported? If so, please update milestone and remove backport label. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@taldcroft should answer here but my understanding is that we're just making the intended behaviour actually work and align out-of-dat documentation, so I think it's okay to backport.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pllim - I agree with what @neutrinoceros said, with the conclusion that this should be backported.

Original file line number Diff line number Diff line change
Expand Up @@ -351,13 +351,19 @@ Table Equality

We can check table data equality using two different methods:

- The ``==`` comparison operator. This returns a `True` or `False` for
each row if the *entire row* matches. This is the same as the behavior of
``numpy`` structured arrays.
- The ``==`` comparison operators. In the general case, this returns a 1D array
with ``dtype=bool`` mapping each row to ``True`` if and only if the *entire row*
matches. For incomparable data (different ``dtype`` or unbroacastable lengths),
a boolean ``False`` is returned.
This is in contrast to the behavior of ``numpy`` where trying to compare
structured arrays might raise exceptions.
- Table :meth:`~astropy.table.Table.values_equal` to compare table values
element-wise. This returns a boolean `True` or `False` for each table
*element*, so you get a `~astropy.table.Table` of values.

.. note:: both methods will report equality *after* broadcasting, which
matches ``numpy`` array comparison.

Examples
^^^^^^^^

Expand Down