Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: fix a crash when retrieving rows from a multiple-index table #15826

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

neutrinoceros
Copy link
Contributor

Description

Fixes #13176
I need to refine the test, it's currently run 3 times with no variations because of how I introduced it in a heavily parametrized test class.
Opening as a draft for now to see if it already survives full CI.

  • By checking this box, the PR author has requested that maintainers do NOT use the "Squash and Merge" button. Maintainers should respect this when possible; however, the final decision is at the discretion of the maintainer that merges the PR.

@github-actions github-actions bot added the table label Jan 8, 2024
Copy link

github-actions bot commented Jan 8, 2024

Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.

  • Do the proposed changes actually accomplish desired goals?
  • Do the proposed changes follow the Astropy coding guidelines?
  • Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
  • Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
  • Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see instructions for rebase and squash.
  • Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
  • Is a change log needed? If yes, did the change log check pass? If no, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
  • Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
  • Is a milestone set? Milestone must be set but we cannot check for it on Actions; do not let the green checkmark fool you.
  • At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.

Copy link

github-actions bot commented Jan 8, 2024

👋 Thank you for your draft pull request! Do you know that you can use [ci skip] or [skip ci] in your commit messages to skip running continuous integration tests until you are ready?

Comment on lines +837 to +839
def _get_rows(
self,
item: int | Iterable[int] | slice | tuple[str, slice],
) -> list[np.int64]:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

obviously, these annotations are not part of the fix, rather a byproduct of my time spent looking for the bug :)

Comment on lines 448 to 450
if isinstance(main_col, Time):
pytest.skip()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know enough about whether this case should be supported too, and since it doesn't pass the test yet, I just skip it. I can look more into it if needed.

@@ -853,7 +864,7 @@ def _get_rows(self, item):
# item should be a list or ndarray of values
rows = []
for key in item:
p = index.find((key,))
p = index.find(key if isinstance(key, tuple) else (key,))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@taldcroft this part of the fix is yours. Let me know if I should explicitly add you as a co-author of the commit.

@neutrinoceros neutrinoceros marked this pull request as ready for review January 8, 2024 14:36
@pllim pllim added this to the v6.0.1 milestone Jan 8, 2024
@pllim pllim added Bug 💤 backport-v6.0.x on-merge: backport to v6.0.x labels Jan 8, 2024
"""
Retrieve Table rows indexes by value slice.
"""
if isinstance(item, tuple):
if (
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I note that this block is identical to

if isinstance(item, tuple):
key, item = item
else:
key = self.table.primary_key

So I suppose it should be updated as well, and probably even abstracted away in a helper function to avoid repeated code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand I don't know if that would fix anything currently broken. In other words, I don't know that iloc needs it.

@neutrinoceros
Copy link
Contributor Author

rebased to resolve a merge conflict

@neutrinoceros
Copy link
Contributor Author

I'm not sure whether RTD failures are related or even reproducible. @pllim can you retrigger please ?

@pllim

This comment was marked as resolved.

@pllim

This comment was marked as resolved.

@pllim
Copy link
Member

pllim commented Feb 12, 2024

Interesting... It fails again but at a new place docs/convolution/index.rst. I guess now I am convinced that the failures are unrelated.

@neutrinoceros
Copy link
Contributor Author

thank you very much !

@pllim
Copy link
Member

pllim commented Feb 12, 2024

RTD green now!

@taldcroft
Copy link
Member

I'm looking at this now...

@taldcroft
Copy link
Member

One quick comment is that we should treat this as a feature enhancement not a bug. The original code specifically disallowed this case, so it was not really a bug. It does make me wonder if the original developer had some reason to be worried about opening up the multi-index case, but so far it looks ok.

I'm going to need to play around with this a little out of your branch so no need to make the bug -> feature change just yet.

@pllim pllim added Enhancement and removed Bug 💤 backport-v6.0.x on-merge: backport to v6.0.x labels Feb 12, 2024
@pllim pllim modified the milestones: v6.0.1, v6.1.0 Feb 12, 2024
@taldcroft
Copy link
Member

@neutrinoceros - here is my playing in the gist below. Interestingly for me it actually works fine with Time but not Quantity. That doesn't quite match the testing code. I don't know what is happening for Quantity but I guess that is your task. 😄
https://gist.github.com/taldcroft/c3d9f56a509a16bf3e402d41ecdf2d6c

I might suggest writing the test more like what I did. For this test we can trust that QTable works the same as Table or a subclass so no need for table_types. I'm not a huge fan of the way the indexing tests are written for this case. Using the inherited table and fixtures just makes it so much harder to look at the test and see if it is doing the right thing. Also the standard column values are not ideal because we want the test to be able to return both a Row (1 match) and a Table (2 matches).

@neutrinoceros
Copy link
Contributor Author

Very interesting, thank you very much !
Indeed, I'll be the first one to admit I don't understand what my test actually does a month after writing it. I'll work out something more useful using your notebook as a base, and I'll try to figure out what's wrong with Quantities. Let's switch this to draft in the mean time.

@neutrinoceros neutrinoceros marked this pull request as draft February 13, 2024 10:35
@neutrinoceros
Copy link
Contributor Author

neutrinoceros commented Feb 13, 2024

I don't know what is happening for Quantity but I guess that is your task. 😄

I dug into it, and found that the root cause is that index.find(...) ultimately resolves to SortedArray.find(...), where we find the following line

if t == len(key_slice) or key_slice[t] != key[i]:

and in the example you wrote using a Quantity column, key_slice ends up being a Column and key[i] is a Quantity, so even if a match is found (it is !), key_slice[t] and key[I] compare unequal !
Now I'm wondering if the bug is that key_slice is a Column instead of a Quantity, but my understanding of the object hierarchy is too limited to answer that right away. Is it obvious to you or should I research it ?
I'll push an incomplete test based on your notebook now.

@neutrinoceros
Copy link
Contributor Author

Actually, even a simple index will break:

import astropy.units as u
from astropy.table import QTable

t = QTable({"a": u.Quantity([0], unit="m")})
t.add_index(["a"])
t.loc[t["a"][0]]

fails with

Traceback (most recent call last):
  File "/Users/clm/dev/astropy-project/coordinated/astropy/t.py", line 6, in <module>
    t.loc[t["a"][0]]
    ~~~~~^^^^^^^^^^^
  File "/Users/clm/dev/astropy-project/coordinated/astropy/astropy/table/index.py", line 888, in __getitem__
    rows = self._get_rows(item)
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/clm/dev/astropy-project/coordinated/astropy/astropy/table/index.py", line 867, in _get_rows
    for key in item:
  File "/Users/clm/dev/astropy-project/coordinated/astropy/astropy/units/quantity.py", line 1283, in __iter__
    raise TypeError(
TypeError: 'Quantity' object with a scalar value is not iterable

(same error on main)
but this works

t.loc[t["a"][0].value]

So, it seems to me that either we have a different bug on our hand (in which case it should be reported separately and not block this PR), or the test case needs to be adapted, depending on wether this behaviour is intentional or not. What do you think @taldcroft ?

@neutrinoceros
Copy link
Contributor Author

I opened #16036 to discuss this issue separately. Meanwhile, I'll drop the test case using Quantity here and undraft.

@neutrinoceros neutrinoceros marked this pull request as ready for review February 14, 2024 10:25
@astrofrog astrofrog modified the milestones: v6.1.0, v7.0.0 Apr 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Retrieving rows in a table via a multi-column index fails (with possible fix)
4 participants