BUG: fix a crash when calling `Column.pprint` on a scalar column #15749

neutrinoceros · 2023-12-15T17:17:04Z

Description

This is a proof-of-concept fix: it currently breaks another regression test, and I haven't yet found a general solution that passes all tests. Rather than going in circles, I'd rather open this for early feedback.

ping @hamogu @mhvk @taldcroft and @nstarman

github-actions · 2023-12-15T17:17:33Z

mhvk · 2023-12-15T17:40:15Z

astropy/table/_column_mixins.pyx

@@ -52,6 +52,9 @@ ctypedef object (*item_getter)(object, object)


 cdef inline object base_getitem(object self, object item, item_getter getitem):
+    if (<np.ndarray>self).ndim == 0:


This change may be the one causing problems... Though see larger comment on what would be the right solution in the original issue.

It looks like my latest attempt actually works (and doesn't break anything)!

neutrinoceros · 2023-12-15T20:07:25Z

astropy/table/tests/test_pprint.py

+    def test_pprint_scalar(self, scalar, show_dtype):
+        # see https://github.com/astropy/astropy/issues/12584
+        c = Column(scalar)
+        c.pprint(show_dtype=show_dtype)


Note that I am deliberately not checking what's actually printed because I think it may be incorrect at the moment for reasons that go beyond the scope of the PR.
Specifically I'm looking at

astropy/astropy/table/column.py

Lines 1279 to 1281 in 7a7b6ba

    # If scalar then just convert to correct numpy type and use numpy repr
    if self.ndim == 0:
        return repr(self.item())

which effectively forces scalar Columns to print like pure numerical scalars (leaving the unit out if the data is a Quantity!). I'll open a separate issue for this.

I think it would be valuable to see the scalar aspect of the printing, even if the unit is wrong.

I'm with @nstarman here, we should test the actual output, especially now that I think we have a more permanent solution.

If I understand correctly #15754 should be closed as "not a bug" and the behaviour, as of the current state of this branch, should be made the baseline of the test ?

For `.pprint`, I think we can call the current behaviour fine. But #15754 is about `repr`, which I'm not sure about. Since it is not addressed here, I think we should just leave it open.

Agreed on testing the output. In fact this whole issue has been focused on pprint but really the problem is in pformat. I would not worry about show_dtype here and just have the two cases of 1 and 1.0 eV. (There is no need here to test formatting of the Quantity itself, so it helps in testing to pick values that have an exact floating point repr).

In a lot of tests I use a pattern like:

c = Column(scalar, name="a")
out = c.pformat()
exp = [' a ', '---', ' 1']
assert out == exp

This way if there are diffs then running pytest with -vv will show what's going on. You can parametrize the scalar and exp values.
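The pattern described above, comparing a list of formatted lines against an expected list, can be sketched without astropy. This is only an illustration: `format_scalar_column` is a hypothetical stand-in for `Column.pformat`, and its layout (centered name, dashed rule, right-justified value) is an assumption rather than astropy's actual output. With pytest, the `CASES` table would typically feed `pytest.mark.parametrize` instead of a loop.

```python
def format_scalar_column(value, name):
    """Hypothetical stand-in for Column.pformat(); not astropy's real layout."""
    text = str(value)
    width = max(len(name), len(text))
    return [name.center(width), "-" * width, text.rjust(width)]


# Each case pairs an input scalar with the exact lines we expect back.
CASES = [
    (1, ["a", "-", "1"]),
    (1.0, [" a ", "---", "1.0"]),
]


def check_cases():
    for scalar, exp in CASES:
        out = format_scalar_column(scalar, name="a")
        # Comparing whole lists lets the test runner show a line-by-line diff.
        assert out == exp, f"{out!r} != {exp!r}"


check_cases()
```

Keeping the expected value as a list of strings means a failure points at the exact line that differs, rather than at one long joined string.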

neutrinoceros · 2023-12-15T20:33:04Z

@pllim actually I think it's good now. Hopefully this is simple enough !

neutrinoceros · 2023-12-18T08:45:59Z

@mhvk can I ask you to review this again ?

taldcroft · 2023-12-18T14:15:45Z

@neutrinoceros - can you do some quick performance checking to see how much speed impact there is to Column getitem? I would not want to slow down astropy for this fix.

neutrinoceros · 2023-12-18T15:35:54Z

@taldcroft Here's a very quick benchmark

#benchmark.py
from time import monotonic_ns
from astropy.table import Column
import numpy as np


SMALL_C = Column(np.random.random_sample(16))
BIG_C = Column(np.random.random_sample(2048))
NRUNS = 1_000_000

# randint's upper bound is exclusive, so pass len(...) to sample every valid index
SMALL_random_points = np.random.randint(0, len(SMALL_C), NRUNS)
BIG_random_points = np.random.randint(0, len(BIG_C), NRUNS)
for data, points, label in [
    (SMALL_C, SMALL_random_points, "small column"),
    (BIG_C, BIG_random_points, "big column"),
]:
    tstart = monotonic_ns()
    for p in points:
        data[p]
    tstop = monotonic_ns()
    res_ns = (tstop - tstart) / NRUNS
    print(
        f"Accessed one item from {label} in {res_ns:.2f} ns (averaged over {NRUNS:g} runs)"
    )

Main

Accessed one item from small column in 145.71 ns (averaged over 1e+06 runs)
Accessed one item from big column in 135.75 ns (averaged over 1e+06 runs)

This branch

Accessed one item from small column in 149.53 ns (averaged over 1e+06 runs)
Accessed one item from big column in 139.42 ns (averaged over 1e+06 runs)

So I see just under 4 ns (roughly 3%) of overhead per call on `Column.__getitem__` from this branch. Is this acceptable to you?
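As a cross-check, the relative slowdown can be computed directly from the four timings quoted above. This is a minimal sketch; the dictionary names are chosen only for illustration.

```python
# Timings in ns per access, copied from the benchmark output above.
main = {"small column": 145.71, "big column": 135.75}
branch = {"small column": 149.53, "big column": 139.42}

overhead = {}
for label, base in main.items():
    delta = branch[label] - base
    overhead[label] = 100 * delta / base
    print(f"{label}: +{delta:.2f} ns ({overhead[label]:.1f}% slower)")
```

On these numbers the slowdown comes out a little under 3% for both column sizes. A single run like this is noisy, so the figure is indicative only.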

astropy/table/pprint.py

mhvk · 2023-12-18T15:59:57Z

Copying from #12584 (comment), I think we first need to decide how a zero-dimensional (scalar) column should actually be pretty-printed. Note that the repr is consciously different:

In [11]: Column(1)
Out[11]: 1

In [12]: Column([1])
Out[12]: 
<Column dtype='int64' length=1>
1

I do think `pprint()` should not fail, but it may be OK to just typeset the number with the format function, without the column name, etc. I.e., I'd advocate some form of `if self.ndim == 0: return <something-simple>`.
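The early-return idea can be sketched with a toy model. Everything here is hypothetical: `FakeColumn` and `pformat` are illustrative stand-ins, not astropy's classes, and the header layout is an assumption.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeColumn:
    """Hypothetical stand-in for a column: `data` is a scalar or a list."""

    data: Any
    name: str = ""

    @property
    def ndim(self):
        return 1 if isinstance(self.data, list) else 0


def pformat(col, fmt=""):
    # Scalar case: typeset the bare value and skip the name and rule.
    if col.ndim == 0:
        return [format(col.data, fmt)]
    values = [format(v, fmt) for v in col.data]
    width = max([len(col.name)] + [len(v) for v in values])
    return [col.name.center(width), "-" * width] + [v.rjust(width) for v in values]


print(pformat(FakeColumn(1)))
print(pformat(FakeColumn([1, 20], name="ab")))
```

The scalar branch never indexes into the data, which is the property that matters for avoiding the crash.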

neutrinoceros · 2023-12-18T16:29:08Z

@mhvk if we want to go with your suggestion (just typesetting the number), then this patch is sufficient and we can close #15754 as "not a bug". In that case I'll just need to complete my test.

taldcroft · 2023-12-18T16:36:43Z

@neutrinoceros - is there any solution that does not require changing that Cython mixin code? I haven't dug into this, but what is driving that exactly? Can't we just do something trivial in `pprint()` to catch this, like what @mhvk suggested: `if self.ndim == 0: return <something-simple>`?

I'm a little hesitant to sacrifice 3-4% performance for this bugfix given the low likelihood of normal users hitting it.

neutrinoceros · 2023-12-20T15:05:09Z

@taldcroft it's very hard to inspect what's happening right before the Cython function is called: the call site is visited many times before the crash, and it sits in repr-related code, which makes it difficult or impossible to inspect variables in a debugger REPL. Patching the Cython function is the only approach I've been able to find so far.
Is it possible that my benchmark isn't representative of actual user code that may be impacted? In other words, do we know of an actual performance-critical use case?

taldcroft · 2023-12-22T11:12:51Z

@neutrinoceros - below is a patch to your PR that passes the new tests. This reverts the Cython update and deals with a scalar column only in the pprint code.

In [2]: c = Column(0, name="scalar")

In [3]: c.pprint()  # Not obvious it is scalar, but mostly we just want to prevent a crash
scalar
------
     0

It also occurred to me that the Cython update would have the undesired effect of ignoring the index in getitem. I didn't try, but I think that code would mean that c[250] == 0 for the above column would be True instead of raising IndexError.

Here is the diff:

diff --git a/astropy/table/_column_mixins.pyx b/astropy/table/_column_mixins.pyx
index bdd075528b..5ab4fe66d3 100644
--- a/astropy/table/_column_mixins.pyx
+++ b/astropy/table/_column_mixins.pyx
@@ -52,9 +52,6 @@ ctypedef object (*item_getter)(object, object)
 
 
 cdef inline object base_getitem(object self, object item, item_getter getitem):
-    if (<np.ndarray>self).ndim == 0:
-        return self.data
-
     if (<np.ndarray>self).ndim > 1 and isinstance(item, INTEGER_TYPES):
         return self.data[item]
 
diff --git a/astropy/table/pprint.py b/astropy/table/pprint.py
index 25d990ec0c..e78f099d35 100644
--- a/astropy/table/pprint.py
+++ b/astropy/table/pprint.py
@@ -424,6 +424,7 @@ class TableFormatter:
         """
         max_lines, _ = self._get_pprint_size(max_lines, -1)
         dtype = getattr(col, "dtype", None)
+        is_onedim = getattr(col, "ndim", 1) == 1
         multidims = getattr(col, "shape", [0])[1:]
         if multidims:
             multidim0 = tuple(0 for n in multidims)
@@ -524,8 +525,11 @@ class TableFormatter:
                     left = format_func(col_format, col[(idx,) + multidim0])
                     right = format_func(col_format, col[(idx,) + multidim1])
                     return f"{left} .. {right}"
-            else:
+            elif is_onedim:
                 return format_func(col_format, col[idx])
+            else:
+                # Scalar column
+                return format_func(col_format, col)
 
         # Add formatted values if within bounds allowed by max_lines
         for idx in indices:
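The three-way branch in the patch above can be illustrated with a simplified, astropy-free sketch. `ArrayLike` and `format_value` are hypothetical names. The multidimensional summary is reduced to "first .. last" of one row, and the scalar branch formats the stored value, whereas the real patch hands the column itself to the format function.

```python
class ArrayLike:
    """Hypothetical stand-in exposing just `ndim`, `shape` and indexing."""

    def __init__(self, data, shape):
        self.data = data
        self.shape = shape
        self.ndim = len(shape)

    def __getitem__(self, idx):
        return self.data[idx]


def format_value(col, idx):
    # Objects without `ndim`/`shape` (e.g. a plain list) default to 1-D.
    is_onedim = getattr(col, "ndim", 1) == 1
    multidims = getattr(col, "shape", [0])[1:]
    if multidims:
        row = col[idx]
        return f"{row[0]} .. {row[-1]}"  # simplified first/last summary
    elif is_onedim:
        return str(col[idx])
    else:
        # Scalar: there is nothing to index, so format the value itself.
        return str(col.data)


print(format_value([10, 20], 1))
print(format_value(ArrayLike(7, ()), 0))
print(format_value(ArrayLike([[1, 2, 3], [4, 5, 6]], (2, 3)), 1))
```

Because of the `getattr` defaults, an object with neither `ndim` nor `shape` keeps taking the existing one-dimensional path, so only true zero-dimensional inputs reach the new branch.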

neutrinoceros · 2023-12-22T13:35:17Z

> I didn't try, but I think that code would mean that c[250] == 0 for the above column would be True instead of raising IndexError.

I confirm that's exactly what happens, nice catch! I've added this check to the test to make sure this behaviour never gets checked in. I took your patch in and rebased the whole branch too; thanks a lot for your help.
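The side effect discussed here is easy to reproduce in a toy model. The names below (`ScalarBox`, `getitem_early_return`, `getitem_strict`) are hypothetical and only mirror the shape of the reverted check, not the actual Cython code.

```python
class ScalarBox:
    """Hypothetical 0-dimensional container standing in for a scalar column."""

    def __init__(self, value):
        self.value = value
        self.ndim = 0


def getitem_early_return(box, item):
    # Same shape as the reverted check: the index is never looked at.
    if box.ndim == 0:
        return box.value
    raise NotImplementedError("only the scalar case is modelled here")


def getitem_strict(box, item):
    # A 0-d object has no axis to index, so any integer index is an error.
    if box.ndim == 0:
        raise IndexError("too many indices for a 0-dimensional object")
    raise NotImplementedError("only the scalar case is modelled here")


box = ScalarBox(0)
print(getitem_early_return(box, 250) == 0)  # the out-of-range index is silently accepted

try:
    getitem_strict(box, 250)
except IndexError as exc:
    print(f"IndexError: {exc}")
```

A regression test along these lines would assert that indexing a scalar column with an integer raises `IndexError` rather than returning the value.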

mhvk

This looks good - just a few nits.

astropy/table/pprint.py

mhvk · 2023-12-22T18:53:40Z

astropy/table/tests/test_pprint.py

+    def test_pprint_scalar(self, scalar, show_dtype):
+        # see https://github.com/astropy/astropy/issues/12584
+        c = Column(scalar)
+        c.pprint(show_dtype=show_dtype)


I'm with @nstarman here, we should test the actual output, especially now that I think we have a more permanent solution.

neutrinoceros · 2024-02-12T10:05:47Z

rebased to refresh CI. ping @hamogu and @taldcroft for second review

neutrinoceros requested a review from taldcroft as a code owner December 15, 2023 17:17

neutrinoceros marked this pull request as draft December 15, 2023 17:17

github-actions bot added table visualization labels Dec 15, 2023

neutrinoceros changed the title ~~TST: add a regression test for bug 15736~~ BUG: fix a crash when calling Column.pprint on a scalar column Dec 15, 2023

neutrinoceros mentioned this pull request Dec 15, 2023

Column.pprint fails for scalars #12584

Open

mhvk reviewed Dec 15, 2023

neutrinoceros commented Dec 15, 2023

pllim added Bug and removed visualization labels Dec 15, 2023

pllim added this to the v6.1.0 milestone Dec 15, 2023

neutrinoceros mentioned this pull request Dec 15, 2023

BUG: unit is not included in repr of a scalar Column #15754

Open

neutrinoceros force-pushed the table/bug/pprint_scalar_12584 branch from f1367da to e2eb735 Compare December 15, 2023 20:32

neutrinoceros marked this pull request as ready for review December 15, 2023 20:33

pllim modified the milestones: v6.1.0, v6.0.1 Dec 15, 2023

pllim added the 💤 backport-v6.0.x on-merge: backport to v6.0.x label Dec 15, 2023

nstarman reviewed Dec 18, 2023

neutrinoceros force-pushed the table/bug/pprint_scalar_12584 branch from e2eb735 to 5ef71ba Compare December 22, 2023 13:34

neutrinoceros force-pushed the table/bug/pprint_scalar_12584 branch from 5ef71ba to ae5e2ae Compare December 22, 2023 15:12

mhvk reviewed Dec 22, 2023

neutrinoceros force-pushed the table/bug/pprint_scalar_12584 branch 2 times, most recently from 5236612 to c9f1ca6 Compare December 26, 2023 10:37

neutrinoceros force-pushed the table/bug/pprint_scalar_12584 branch from c9f1ca6 to 62ad3e2 Compare February 12, 2024 10:05

saimn modified the milestones: v6.0.1, v6.0.2 Mar 25, 2024

neutrinoceros force-pushed the table/bug/pprint_scalar_12584 branch from 62ad3e2 to c501c93 Compare April 2, 2024 09:23

astrofrog modified the milestones: v6.0.2, v6.1.1 Apr 4, 2024

pllim added backport-v6.1.x on-merge: backport to v6.1.x and removed 💤 backport-v6.0.x on-merge: backport to v6.0.x labels May 6, 2024

BUG: fix a crash when calling Column.pprint on a scalar column

46c2e30

neutrinoceros force-pushed the table/bug/pprint_scalar_12584 branch from c501c93 to 46c2e30 Compare May 16, 2024 09:24
