Table cannot read numpy void dtype #12577

nstarman · 2021-12-09T14:08:33Z

Description

Cannot read a Table from a file with a numpy.void dtype column.

Expected behavior

Roundtripping for all numpy array types.

Actual behavior

A ValueError is raised that void is not a recognized dtype

Steps to Reproduce

import tempfile
import os

with tempfile.TemporaryDirectory() as tempdir:

    tbl = QTable({"a": [u.Quantity((0, 0, 0.6), unit="(eV, eV, eV)")]})
    tbl.write(os.path.join(tempdir, "temp.ecsv"))

    QTable.read(os.path.join(tempdir, "temp.ecsv"))

ValueError: datatype 'void192' of column 'a' is not in allowed values ('bool', 'int8', 'int16', 'int32', 'int64', 'uint8', 'uint16', 'uint32', 'uint64', 'float16', 'float32', 'float64', 'float128', 'string')

System Details

macOS-10.16-x86_64-i386-64bit
Python 3.9.5 (default, May 18 2021, 12:31:01)
[Clang 10.0.0 ]
Numpy 1.21.4
pyerfa 2.0.0.1
astropy 5.1.dev207+gbfb9252df.d20211130
Scipy 1.7.1
Matplotlib 3.3.4

The text was updated successfully, but these errors were encountered:

nstarman · 2021-12-09T14:19:34Z

@mhvk, I'm not sure if ECSV is saving enough info to reconstruct the full dtype. It might be, because the type isn't just void but void192, but I'm not sure what's the difference...

mhvk · 2021-12-09T15:58:38Z

Yikes, I thought we had just solved that... Though it may just be to include 'void' in the first place in that list (which I'm guessing is not in the same io.misc.yaml place... but I haven't looked yet)

nstarman · 2021-12-10T11:23:41Z

Unfortunately it doesn't appear that easy a fix. Adding void or void192 to ECSV_DATATYPES in ecsv.py does not fix the problem.

mhvk · 2021-12-10T19:28:24Z

Probably good to get @taldcroft to look at this case. Let me also give a more minimal example, since it is not related to Quantity but just to the structured dtype (I add a string column as that is special-cased as well).

import numpy as np
from astropy.table import Table, Column

tbl = Table({"a": Column([(0, 0, 0.6)], dtype='f8,f8,f8'), "b": ['abcdef']})
tbl.write('temp.ecsv', overwrite=True)
Table.read('temp.ecsv')
ValueError: datatype 'void192' of column 'a' is not in allowed values ('bool', 'int8', 'int16', 'int32', 'int64', 'uint8', 'uint16', 'uint32', 'uint64', 'float16', 'float32', 'float64', 'float128', 'string')

!cat temp.ecsv
# %ECSV 1.0
# ---
# datatype:
# - {name: a, datatype: void192}
# - {name: b, datatype: string}
# schema: astropy-2.0
a b
"(0., 0., 0.6)" abcdef

So, I think the basic first step is to represent the data type correctly, as a structured one (which of course we just did, but may need to be careful about strings inside structured data types...)

taldcroft · 2021-12-10T20:50:27Z

There is no support in ECSV for structured data like dtype='f8,f8,f8'. That would be a whole new can of worms after the extension to allow multidimensional data. I suppose one could imagine subtype: 'float64,float64,int32' or similar to map to the structured dtype. (See https://github.com/astropy/astropy-APEs/blob/main/APE6.rst#multidimensional-array-data).

I think for now this could be called a bug in the ECSV writer that it doesn't detect such a datatype and stop with a good exception message.

mhvk · 2021-12-10T21:22:47Z

Indeed, I think that subtype trick could work! Though it would loose the names of the structure elements.

Alternatively, and perhaps easier, would treat this case more like a Representation and just write out individual columns according to the names of the dtype - dealing with it would then become something for ColumnInfo and QuantityInfo, and we don't need to change the ECSV standard.

nstarman · 2021-12-11T00:12:35Z

@taldcroft @mhvk, I think the Representation approach may be the optimal route since a structured dtype can mix scalars and non-scalars, and the latter would require ECSV's multidimensional capability

>>> x = np.array([(1, np.ones((2, 2))), (2, np.zeros((2, 2)))],
...              dtype=[("a", "i4"), ("b", "float64", (2, 2))])
>>> x["b"]
array([[[1., 1.],
        [1., 1.]],

       [[0., 0.],
        [0., 0.]]])

Making a problematic Table

>>> QTable({"name": [x[0][()]]})
<QTable length=1>
          name            
         void288         
-------------------------
(1, [[1., 1.], [1., 1.]])

Saving similarly to Representation, the void can be saved to look something approximately like this:

>>> print(repr(QTable(x)[0]))
<Row index=0>
  a    b [2,2]  
int32  float64  
----- ----------
    1 1.0 .. 1.0

though of course be read back to look like the void Table, above.

nstarman · 2021-12-11T00:23:16Z

Continuing the above comment, building support for this in ColumnInfo (or wherever appropriate) would presumably serve as the basis / base class for saving all structured data, like Representation.

Alternatively, I recall there was discussion in an old Issue/PR of allowing Table nesting. Would that be a similarly viable solution?
Found it: #3963

mhvk · 2021-12-11T01:36:25Z

I'm playing around with ColumnInfo - I think that can work. An additional benefit is that one gets writing of structured data for FITS, etc., for free.

mhvk · 2021-12-13T02:45:05Z

See #12589 for an idea.

mhvk · 2021-12-13T16:01:15Z

Am reconsidering the approach. Right now, the following actually works!

In [19]: q = u.Quantity([(0.1, 0.3, 0.6)], dtype='f8,f8,f8', unit='m,cm,s')

In [20]: QTable(q)
Out[20]: 
<QTable length=1>
   f0      f1      f2  
   m       cm      s   
float64 float64 float64
------- ------- -------
    0.1     0.3     0.6

In [21]: u.Quantity(QTable(q))
Out[21]: <Quantity [(0.1, 0.3, 0.6)]>

taldcroft · 2021-12-13T16:29:18Z

Am reconsidering the approach. Right now, the following actually works!

Humm. I'm not sure if I think that is a feature or a bug. The result is surprising to me, somewhat like how people are often surprised that list('abc') gives ['a', 'b', 'c']. But I guess there is some precedent since you can put in a 3x2 float array as the arg to Table and get some columns.

(I was wondering what Table('abc') would give, and it does the right thing of ValueError: Data type <class 'str'> not allowed to init Table.)

mhvk · 2021-12-13T16:34:52Z

The direction from Table to Column makes some sense - it calls table.__array__() (EDIT: that is almost certainly why the unit is lost). The other direction, there is a specific stanza:

astropy/astropy/table/table.py

Lines 749 to 753 in 2158c36

    
           elif isinstance(data, np.ndarray): 
        
               if data.dtype.names: 
        
                   init_func = self._init_from_ndarray  # _struct 
        
                   n_cols = len(data.dtype.names) 
        
                   default_names = data.dtype.names

nstarman · 2021-12-15T22:24:44Z

@mhvk, will #12589 close this Issue?

mhvk · 2021-12-15T22:27:22Z

Yes, it should - but need to decide on implementation!

nstarman · 2024-05-22T05:03:21Z

This appears to have been fixed

nstarman added table Bug labels Dec 9, 2021

taldcroft added io.ascii and removed table labels Dec 10, 2021

mhvk mentioned this issue Dec 13, 2021

Allow structured dtype for column serialization. #12589

Closed

nstarman closed this as completed May 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Table cannot read numpy void dtype #12577

Table cannot read numpy void dtype #12577

nstarman commented Dec 9, 2021 •

edited

nstarman commented Dec 9, 2021 •

edited

mhvk commented Dec 9, 2021

nstarman commented Dec 10, 2021

mhvk commented Dec 10, 2021

taldcroft commented Dec 10, 2021

mhvk commented Dec 10, 2021

nstarman commented Dec 11, 2021 •

edited

nstarman commented Dec 11, 2021 •

edited

mhvk commented Dec 11, 2021

mhvk commented Dec 13, 2021

mhvk commented Dec 13, 2021

taldcroft commented Dec 13, 2021

mhvk commented Dec 13, 2021 •

edited

nstarman commented Dec 15, 2021

mhvk commented Dec 15, 2021

nstarman commented May 22, 2024

Table cannot read numpy void dtype #12577

Table cannot read numpy void dtype #12577

Comments

nstarman commented Dec 9, 2021 • edited

Description

Expected behavior

Actual behavior

Steps to Reproduce

System Details

nstarman commented Dec 9, 2021 • edited

mhvk commented Dec 9, 2021

nstarman commented Dec 10, 2021

mhvk commented Dec 10, 2021

taldcroft commented Dec 10, 2021

mhvk commented Dec 10, 2021

nstarman commented Dec 11, 2021 • edited

nstarman commented Dec 11, 2021 • edited

mhvk commented Dec 11, 2021

mhvk commented Dec 13, 2021

mhvk commented Dec 13, 2021

taldcroft commented Dec 13, 2021

mhvk commented Dec 13, 2021 • edited

nstarman commented Dec 15, 2021

mhvk commented Dec 15, 2021

nstarman commented May 22, 2024

nstarman commented Dec 9, 2021 •

edited

nstarman commented Dec 9, 2021 •

edited

nstarman commented Dec 11, 2021 •

edited

nstarman commented Dec 11, 2021 •

edited

mhvk commented Dec 13, 2021 •

edited