Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Column.pprint fails for scalars #12584

Open
nstarman opened this issue Dec 10, 2021 · 16 comments · May be fixed by #15749
Open

Column.pprint fails for scalars #12584

nstarman opened this issue Dec 10, 2021 · 16 comments · May be fixed by #15749

Comments

@nstarman
Copy link
Member

nstarman commented Dec 10, 2021

Description

Raises TypeError because object is scalar and the code expects an array object.

Expected behavior

If it can be made into a Column, pprint works.

Actual behavior

See Description.

Steps to Reproduce

import astropy.units as u
from astropy.table import Column

Column(1).pprint()

TypeError: len() of unsized object

System Details

macOS-10.16-x86_64-i386-64bit
Python 3.9.5 (default, May 18 2021, 12:31:01)
[Clang 10.0.0 ]
Numpy 1.21.4
pyerfa 2.0.0.1
astropy 5.1.dev207+gbfb9252df.d20211130
Scipy 1.7.1
Matplotlib 3.3.4

@nstarman
Copy link
Member Author

@mhvk, I suspect many issues similar to this one exist across Astropy. numpy.void is consistently an edge case: a pseudo-scalar that can contain heterogenous data, including non-scalars.

@mhvk
Copy link
Contributor

mhvk commented Dec 10, 2021

I think this may not be related to void, but rather to .pprint being unable to print scalars generally:

Column(1).pprint()
TypeError: len() of unsized object

@dhomeier
Copy link
Contributor

Yep

table.Column([m_nu]).pprint(show_dtype=True)
     None    
   void192   
-------------
(0., 0., 0.6)

@nstarman nstarman changed the title Column.pprint can't display dtype for numpy.void Column.pprint fails for scalars Dec 14, 2021
@nstarman
Copy link
Member Author

Thanks @mhvk, @dhomeier. Changing the title to reflect the true issue.

@datajungler
Copy link

m_nu = u.Quantity((0, 0, 0.6), unit="eV")
Column(m_nu).pprint(show_dtype=True)

pprint method for the "Column" class is originally designed for handling monotonous units. Therefore, the issue can be resolved by declaring a new Quantity with single-string unit.

Output:

None
float64
-------
    0.0
    0.0
    0.6

@nstarman
Copy link
Member Author

nstarman commented Jan 22, 2022

@datajungler, u.Quantity((0, 0, 0.6), unit="eV") and u.Quantity((0, 0, 0.6), unit="(eV, eV, eV)") are fundamentally different objects. The former is a (3,) vector, while the latter is a () scalar. A better comparison is u.Quantity(0.6, unit="eV") , which will also fail with Column.pprint:

m_nu = u.Quantity(0.6, unit="eV")
Column(m_nu).pprint(show_dtype=True)

@taldcroft
Copy link
Member

I edited the original description to show the real problem, since the use of a structured Quantity is just a misdirection that is confusing since it appears somewhat array-like at first glance.

At some level I feel like Column itself should raise an exception for a scalar input since it is really meant to be part of a table. I can't see where a scalar Column makes any sense, but I suspect that raising an exception in that case will break stuff.

So what should pprint() do in this case? The current behavior of outputting as a column really implies that it has a length.

@nstarman
Copy link
Member Author

nstarman commented Feb 5, 2022

At some level I feel like Column itself should raise an exception for a scalar input since it is really meant to be part of a table. I can't see where a scalar Column makes any sense, but I suspect that raising an exception in that case will break stuff.

While I agree on the principle, pragmatically I think the most important thing is consistency with np.array behavior. numpy.ndarray allows for array scalars, so I think Column should as well.
Agreed that a behavior change will probably break stuff.

So what should pprint() do in this case? The current behavior of outputting as a column really implies that it has a length.

I think array scalars should be special cased. Unfortunate, but probably necessary.

@neutrinoceros
Copy link
Contributor

I've opened #15749 to attempt to fix this. I went with what felt like the most natural approach (also suggested by @nstarman) to special-case scalar columns. The patch is really small at the moment but it's not completely functional as it breaks at least one existing test. Feedback and suggestions are most welcome !

@mhvk
Copy link
Contributor

mhvk commented Dec 15, 2023

@neutrinoceros - I'm less sure we even want to print scalar column that way, since also the regular repr is different:

In [11]: Column(1)
Out[11]: 1

In [12]: Column([1])
Out[12]: 
<Column dtype='int64' length=1>
1

I do think pprint() should not fail, but it may be OK to just typeset the number with the format function without having the column name, etc. I.e., I'd advocate some form of if self.ndim == 0; return <something-simple>.

@tactipus
Copy link

I've opened #15749 to attempt to fix this. I went with what felt like the most natural approach (also suggested by @nstarman) to special-case scalar columns. The patch is really small at the moment but it's not completely functional as it breaks at least one existing test. Feedback and suggestions are most welcome !

hi! would you like to share the testing with us? like how you wrote it & what the results were. TYIA!

@neutrinoceros
Copy link
Contributor

@mhvk sorry I opened #15754 before I saw your comment, I just noticed that independently and assumed it could be considered a bug. If you believe that behaviour to be desirable feel free to close my issue !

@tactipus My pull request is publicly available. Actually by now I think I figured it out !

@taldcroft
Copy link
Member

I'm still on the fence about this. I think of Column as an nd.array subclass that is required to have a length. Like you can have a Box subclass of Rectangle where the edge lengths are required to be the same.

@taldcroft
Copy link
Member

However, I've confirmed that not allowing Column to be initialized with a scalar does indeed create problems in astropy code that are not immediately trivial. This suggests that user code might break as well, so it looks like #15749 is the better choice if it doesn't impact performance.

@tactipus
Copy link

sup. just saw the notif for this thread. are we still good on this?

@neutrinoceros
Copy link
Contributor

@tactipus We're working on a fix in the linked PR #15749

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants