
Quantities from dtype object arrays #14014

Closed
HealthyPear opened this issue Nov 16, 2022 · 10 comments

Comments

@HealthyPear
Contributor

Description

This might be related to #13836, but I am not sure.

Expected behavior

u.Quantity(np.array([[1,2], [1,2,3], [1, 2]], dtype=object),u.m)

should work and give back

[[1, 2], [1, 2, 3], [1, 2]] m

Similarly, I would expect that

u.Quantity(np.array([[1,2], [], [1, 2]], dtype=object),u.m)

would give

[[1, 2], [], [1, 2]] m

If all elements have the same length, it already works:

u.Quantity(np.array([[1,2], [1,3], [1, 2]], dtype=object),u.m)

gives

[[1, 2], [1, 3], [1, 2]] m

Actual behavior

See above.

Steps to Reproduce

See above

System Details

macOS-11.7.1-x86_64-i386-64bit
Python 3.9.13 | packaged by conda-forge | (main, May 27 2022, 17:01:00)
[Clang 13.0.1 ]
Numpy 1.23.4
pyerfa 2.0.0.1
astropy 5.1.1
Scipy 1.9.3
Matplotlib 3.6.2

@pllim pllim added the units label Nov 16, 2022
@pllim
Member

pllim commented Nov 16, 2022

FWIW

>>> np.array([[1,2], [], [1, 2]], dtype=object)
array([list([1, 2]), list([]), list([1, 2])], dtype=object)
>>> np.array([[1,2], [1,3], [1, 2]], dtype=object)
array([[1, 2],
       [1, 3],
       [1, 2]], dtype=object)

Maybe @mhvk or @nstarman have ideas.

@mhvk
Contributor

mhvk commented Nov 16, 2022

This is mostly a numpy quirk: np.array with object dtype does not know at which nesting level the elements should become objects, so it tries to "guess", deciding that anything that can still form a regularly shaped array "clearly" should not be an object:

In [2]: np.array([[1,2], [1,2,3], [1, 2]], dtype=object)
Out[2]: array([list([1, 2]), list([1, 2, 3]), list([1, 2])], dtype=object)

In [3]: np.array([[1,2], [1,3], [1, 2]], dtype=object)
Out[3]: 
array([[1, 2],
       [1, 3],
       [1, 2]], dtype=object)

Now, Quantity is not designed to work with object elements, since units really only make sense with numbers. The only reason the regularly shaped example works is that Quantity can convert all the objects in the array to numbers directly:

In [9]: u.Quantity(np.array([[1,2], [1,3], [1, 2]], dtype=object), u.m).dtype
Out[9]: dtype('float64')

In contrast, for the ragged arrays the objects are lists, and Quantity does not attempt to convert them; this cannot change without adding support for ragged arrays to Quantity (and ragged arrays are poorly supported in numpy too).
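The distinction can be reproduced at the numpy level alone. This is a minimal sketch, not Quantity's actual code path: `as_float_or_none` is a hypothetical helper that simply tries the object-to-float cast that makes the regular case work, and reports failure for the ragged case, whose elements are lists.

```python
import numpy as np


def as_float_or_none(arr):
    """Try to cast an object array to float64; return None if that fails."""
    try:
        return arr.astype(float)
    except (TypeError, ValueError):
        # Elements that are sequences (e.g. lists) cannot become single floats.
        return None


regular = np.array([[1, 2], [1, 3], [1, 2]], dtype=object)   # shape (3, 2), int elements
ragged = np.array([[1, 2], [1, 2, 3], [1, 2]], dtype=object)  # shape (3,), list elements

print(as_float_or_none(regular).dtype)  # float64
print(as_float_or_none(ragged))         # None
```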

@mhvk
Contributor

mhvk commented Nov 16, 2022

Note that I think this is a "bug" only in the sense that perhaps we should not accept any object arrays. The main reason we do is to support arrays of Decimal and Fraction.
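As an illustration of why object arrays of Fraction (or Decimal) are worth keeping, here is a numpy-only sketch; it does not show how Quantity itself handles such arrays, only that numpy delegates arithmetic on object elements to the Python objects, so the results stay exact.

```python
from fractions import Fraction

import numpy as np

# An object array holding Fraction instances: numpy calls the objects'
# own arithmetic, so the sum is computed exactly.
exact = np.array([Fraction(1, 10), Fraction(2, 10)], dtype=object)

# The same values as binary floating point, for comparison.
approx = np.array([0.1, 0.2])

print(exact.sum())          # 3/10
print(approx.sum() == 0.3)  # False (0.1 + 0.2 is not exactly 0.3 in float64)
```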

@nstarman
Copy link
Member

I have complex thoughts on whether and how we should support object arrays. For now I think we should, for the reasons @mhvk mentioned. numpy has tricky issues with object arrays (e.g. see the very bottom of https://docs.astropy.org/en/latest/api/astropy.cosmology.z_at_value.html), so I think this is mostly their problem.

@github-actions

Hi humans 👋 - this issue was labeled as Close? approximately 9 hours ago. If you think this issue should not be closed, a maintainer should remove the Close? label - otherwise, I will close this issue in 7 days.


@HealthyPear
Contributor Author

Hello,

Thanks to all, it is indeed the same root cause then...

It is quite annoying because I have columns (with units) like this one,

[screenshot of such a column]

and even though it works for display purposes, I don't know how to use it to extract or manipulate the data (bonus: as quantities).

A friend pointed me to the Awkward Array library, which is designed for this kind of data (records with variable-length array fields), and I plan to look into it.

I have no idea how difficult it would be, but do you think it would be useful if QTable supported this kind of array?
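One common way to make variable-length rows usable is to store one flat, regular array of values plus an offsets array marking where each row starts, similar in spirit to the offset-based lists Awkward Array uses. The sketch below is numpy-only; `to_flat_offsets`, `get_row` and `row_sums` are hypothetical helper names, and the layout is an assumption rather than anything astropy provides.

```python
import numpy as np


def to_flat_offsets(rows):
    """Flatten variable-length rows into a regular values array plus offsets.

    Row i is values[offsets[i]:offsets[i + 1]]; empty rows are allowed.
    """
    lengths = np.array([len(r) for r in rows], dtype=np.intp)
    offsets = np.zeros(len(rows) + 1, dtype=np.intp)
    np.cumsum(lengths, out=offsets[1:])  # offsets[0] stays 0
    values = np.array([x for r in rows for x in r], dtype=float)
    return values, offsets


def get_row(values, offsets, i):
    """Return row i as a view into the flat values array."""
    return values[offsets[i]:offsets[i + 1]]


def row_sums(values, offsets):
    """Sum each row without a Python loop; empty rows sum to 0.0."""
    n_rows = len(offsets) - 1
    row_ids = np.repeat(np.arange(n_rows), np.diff(offsets))
    return np.bincount(row_ids, weights=values, minlength=n_rows)


rows = [[1, 2], [], [1, 2, 3]]
values, offsets = to_flat_offsets(rows)
print(values)                       # [1. 2. 1. 2. 3.]
print(offsets)                      # [0 2 2 5]
print(get_row(values, offsets, 2))  # [1. 2. 3.]
print(row_sums(values, offsets))    # [3. 0. 6.]
```

Because `values` is an ordinary float array, it can be wrapped as a regular Quantity (e.g. `u.Quantity(values, u.m)`), with `offsets` kept alongside to recover the rows.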

@mhvk
Contributor

mhvk commented Nov 17, 2022

@HealthyPear - you could create an object Column with Quantity instances inside. The annoying thing is that, to avoid problems when they all happen to have the same length, you have to create the column first and then fill it, as in (not tested):

from astropy.table import Column
col = Column(length=..., dtype=object)
col[:] = [q0, q1, q2, ...]
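The same create-then-fill idea can be tried with plain numpy. This is a minimal sketch; `object_array_of_rows` is a hypothetical helper, and it fills element by element so that each row is stored as one object whatever the row lengths, rather than relying on how a given numpy version treats slice assignment of nested sequences.

```python
import numpy as np


def object_array_of_rows(rows):
    """Return a 1-D object array whose elements are the given rows.

    Unlike np.array(rows, dtype=object), the result is 1-D even when
    every row happens to have the same length.
    """
    out = np.empty(len(rows), dtype=object)  # create first...
    for i, r in enumerate(rows):             # ...then fill, one object per slot
        out[i] = r
    return out


regular = object_array_of_rows([[1, 2], [1, 3], [1, 2]])
ragged = object_array_of_rows([[1, 2], [1, 2, 3], [1, 2]])
print(regular.shape, ragged.shape)  # (3,) (3,)
```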

@mhvk
Copy link
Contributor

mhvk commented Nov 17, 2022

P.S. In principle, QTable should be able to hold such a Column, but you may have to set the unit afterwards; I am not sure.

@pllim
Copy link
Member

pllim commented Nov 17, 2022

I have never used Awkward Array, but it looks like it has an API to convert back to NumPy, so in theory, after you clean things up there and get back a "less awkward" array, you can build a Quantity from it.
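One way to get a "less awkward", regular array from variable-length rows is to pad them to a common width. This is a numpy-only sketch of that idea, not an Awkward Array call; `pad_ragged` is a hypothetical helper, and NaN is assumed as the fill value, with a boolean mask recording which entries are padding.

```python
import numpy as np


def pad_ragged(rows, fill=np.nan):
    """Pad variable-length rows into a regular 2-D float array.

    Returns (padded, mask), where mask is True at padded positions.
    """
    width = max((len(r) for r in rows), default=0)
    padded = np.full((len(rows), width), fill, dtype=float)
    mask = np.ones((len(rows), width), dtype=bool)
    for i, r in enumerate(rows):
        padded[i, :len(r)] = r  # copy the real values into the row's prefix
        mask[i, :len(r)] = False
    return padded, mask


rows = [[1, 2], [], [1, 2, 3]]
padded, mask = pad_ragged(rows)
print(padded.shape)               # (3, 3)
print(np.nansum(padded, axis=1))  # [3. 0. 6.]
```

The padded array is regular, so it can be passed to `u.Quantity(padded, u.m)`; the mask can be kept separately or combined with the data via `np.ma.MaskedArray(padded, mask=mask)`.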

@github-actions

I'm going to close this issue as per my previous message, but if you feel that this issue should stay open, then feel free to re-open and remove the Close? label.



4 participants