-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: fix a crash when calling Column.pprint
on a scalar column
#15749
base: main
Are you sure you want to change the base?
BUG: fix a crash when calling Column.pprint
on a scalar column
#15749
Conversation
Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.
|
Column.pprint
on a scalar column
astropy/table/_column_mixins.pyx
Outdated
@@ -52,6 +52,9 @@ ctypedef object (*item_getter)(object, object) | |||
|
|||
|
|||
cdef inline object base_getitem(object self, object item, item_getter getitem): | |||
if (<np.ndarray>self).ndim == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change may be the one causing problems... Though see larger comment on what would be the right solution in the original issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like my latest attempt actually works (and doesn't break anything) !
astropy/table/tests/test_pprint.py
Outdated
def test_pprint_scalar(self, scalar, show_dtype): | ||
# see https://github.com/astropy/astropy/issues/12584 | ||
c = Column(scalar) | ||
c.pprint(show_dtype=show_dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that I am deliberately not checking what's actually printed because I think it may be incorrect at the moment for reasons that go beyond the scope of the PR.
Specifically I'm looking at
astropy/astropy/table/column.py
Lines 1279 to 1281 in 7a7b6ba
# If scalar then just convert to correct numpy type and use numpy repr | |
if self.ndim == 0: | |
return repr(self.item()) |
data
is a Quantity
!). I'll open a separate issue for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be valuable to see the scalar aspect of the printing, even if the unit is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm with @nstarman here, we should test the actual output, especially now that I think we have a more permanent solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly #15754 should be closed as "not a bug" and the behaviour, as of the current state of this branch, should be made the baseline of the test ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for .pprint
, we can call the current behaviour fine. But #15754 is about repr
, which I'm not sure about. Since it is not addressed here, I think we should just leave it open.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed on testing the output. In fact this whole issue has been focused on pprint
but really the problem is in pformat
. I would not worry about show_dtype
here and just have the two cases of 1
and 1.0 eV
. (There is no need here to test formatting of the Quantity itself, so it helps in testing to pick values that have an exact floating point repr).
In a lot of tests I use a pattern like:
c = Column(scalar, name="a")
out = c.pformat()
exp = [' a ', '---', ' 1']
assert out == exp
This way if there are diffs then running pytest with -vv
will show what's going on. You can parametrize the scalar
and exp
values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done !
This comment was marked as resolved.
This comment was marked as resolved.
f1367da
to
e2eb735
Compare
@pllim actually I think it's good now. Hopefully this is simple enough ! |
@mhvk can I ask you to review this again ? |
@neutrinoceros - can you do some quick performance checking to see how much speed impact there is to |
@taldcroft Here's a very quick benchmark #benchmark.py
from time import monotonic_ns
from astropy.table import Column
import numpy as np
SMALL_C = Column(np.random.random_sample(16))
BIG_C = Column(np.random.random_sample(2048))
NRUNS = 1_000_000
SMALL_random_points = np.random.randint(0, len(SMALL_C) - 1, NRUNS)
BIG_random_points = np.random.randint(0, len(BIG_C) - 1, NRUNS)
for data, points, label in [
(SMALL_C, SMALL_random_points, "small column"),
(BIG_C, BIG_random_points, "big column"),
]:
tstart = monotonic_ns()
for p in points:
data[p]
tstop = monotonic_ns()
res_ns = (tstop - tstart) / NRUNS
print(
f"Accessed one item from {label} in {res_ns:.2f} ns (averaged over {NRUNS:g} runs)"
) Main
This branch
So I see about 4% overhead on |
Copying from #12584 (comment), I think we first need to decide how a zero-length column would actually be pretty-printed. Note that the
I do think |
@neutrinoceros - is there any solution that does not require changing that Cython mixin code? I haven't dug into this, but what is driving that exactly? Can't we just do something trivial in I'm a little hesitant to sacrifice 3-4% performance for this bugfix given the low likelihood of normal users hitting it. |
@taldcroft it's very hard to inspect what happening exactly before the cython function is called because the callsite is actually visited many times before it crashes and it happens in repr-related code, making it difficult/impossible to inspect variables in a debugger REPL, so patching the Cython function is the only way I was able to find so far. |
@neutrinoceros - below is a patch to your PR that passes the new tests. This reverts the Cython update and deals with a scalar column only in the pprint code.
It also occurred to me that the Cython update would have the undesired effect of ignoring the index in getitem. I didn't try, but I think that code would mean that Here is the diff:
|
e2eb735
to
5ef71ba
Compare
I confirm that's exactly what happens, nice catch ! I've added this check to the test to make sure this behaviour never gets checked in. I took your patch in and rebase the whole branch too; thanks a lot for your help |
5ef71ba
to
ae5e2ae
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good - just a few nits.
astropy/table/tests/test_pprint.py
Outdated
def test_pprint_scalar(self, scalar, show_dtype): | ||
# see https://github.com/astropy/astropy/issues/12584 | ||
c = Column(scalar) | ||
c.pprint(show_dtype=show_dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm with @nstarman here, we should test the actual output, especially now that I think we have a more permanent solution.
5236612
to
c9f1ca6
Compare
c9f1ca6
to
62ad3e2
Compare
rebased to refresh CI. ping @hamogu and @taldcroft for second review |
62ad3e2
to
c501c93
Compare
c501c93
to
46c2e30
Compare
Description
Fixes #12584
This is a proof of concept fix because it breaks another regression test, however I haven't been able to find a general solution that passes all tests yet, so instead of going in circles I'd rather open this to early feedback.
ping @hamogu @mhvk @taldcroft and @nstarman