ENH: BUG: multidimensional tables are not supported #706

Closed
mikofski opened this issue Aug 10, 2018 · 1 comment
mikofski commented Aug 10, 2018

Problem

Given an HDF5 file that contains a multidimensional table, opening it with PyTables has two problems:

  • the reported table dimensions are incorrect
  • accessing data in the table
    • raises an exception on Windows
    • on macOS causes an "Abort trap: 6" and quits Python unexpectedly

HDF5ExtError: HDF5 error back trace

  File "C:\ci\hdf5_1525883595717\work\src\H5Dio.c", line 216, in H5Dread
    can't read data
  File "C:\ci\hdf5_1525883595717\work\src\H5Dio.c", line 471, in H5D__read
    src and dest data spaces have different sizes

End of HDF5 error back trace

Problems reading records.

Reproducible example

import h5py
import tables
import numpy as np

x = np.array([[(1, 2, 3), (4, 5, 6)], [(7, 8, 9), (10, 11, 12)]],
             dtype=[('a', float), ('b', float), ('c', float)])

x.shape  # (2, 2)

# Write the 2x2 structured array with h5py.
with h5py.File('multidim_table.h5', 'w') as f:
    f['x'] = x

# Open the same file with PyTables (with or without read mode).
y = tables.open_file('multidim_table.h5')

On macOS, if I attempt to view y, Python quits unexpectedly and I see this message:

Abort trap: 6

On Windows, if I attempt to view y, I see this:

File(filename=multidim_table.h5, title='', mode='r', root_uep='/', filters=Filters(complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None))
/ (RootGroup) ''
/x (Table(2,)) ''
  description := {
  "a": Float64Col(shape=(), dflt=0.0, pos=0),
  "b": Float64Col(shape=(), dflt=0.0, pos=1),
  "c": Float64Col(shape=(), dflt=0.0, pos=2)}
  byteorder := 'little'
  chunkshape := (2730,)

The shape is wrong: PyTables reports (2,) instead of (2, 2).
I can also see the /x dataset, but I can only read index [1]; the other indices fail, as the session that follows shows for [0].

In [5]: y.get_node('/x')
Out[5]:
/x (Table(2,)) ''
  description := {
  "a": Float64Col(shape=(), dflt=0.0, pos=0),
  "b": Float64Col(shape=(), dflt=0.0, pos=1),
  "c": Float64Col(shape=(), dflt=0.0, pos=2)}
  byteorder := 'little'
  chunkshape := (2730,)

In [6]: y.get_node('/x')[0]
---------------------------------------------------------------------------
HDF5ExtError                              Traceback (most recent call last)
<ipython-input-6-876392551204> in <module>()
----> 1 y.get_node('/x')[0]

~\AppData\Local\Continuum\miniconda3\envs\py36\lib\site-packages\tables\table.py in __getitem__(self, key)
   2077                 key += self.nrows
   2078             (start, stop, step) = self._process_range(key, key + 1, 1)
-> 2079             return self.read(start, stop, step)[0]
   2080         elif isinstance(key, slice):
   2081             (start, stop, step) = self._process_range(

~\AppData\Local\Continuum\miniconda3\envs\py36\lib\site-packages\tables\table.py in read(self, start, stop, step, field, out)
   1932                                                 warn_negstep=False)
   1933
-> 1934         arr = self._read(start, stop, step, field, out)
   1935         return internal_to_flavor(arr, self.flavor)
   1936

~\AppData\Local\Continuum\miniconda3\envs\py36\lib\site-packages\tables\table.py in _read(self, start, stop, step, field, out)
   1846             # This optimization works three times faster than
   1847             # the row._fill_col method (up to 170 MB/s on a pentium IV @ 2GHz)
-> 1848             self._read_records(start, stop - start, result)
   1849         # Warning!: _read_field_name should not be used until
   1850         # H5TBread_fields_name in tableextension will be finished

tables\tableextension.pyx in tables.tableextension.Table._read_records()

HDF5ExtError: HDF5 error back trace

  File "C:\ci\hdf5_1525883595717\work\src\H5Dio.c", line 216, in H5Dread
    can't read data
  File "C:\ci\hdf5_1525883595717\work\src\H5Dio.c", line 471, in H5D__read
    src and dest data spaces have different sizes

End of HDF5 error back trace

Problems reading records.

In [7]: y.get_node('/x')[1]
Out[7]: (10., 11., 12.)

Reading the file with h5ls shows that it is a 2x2 table:

$ h5ls -rd multidim_table.h5 
/                        Group
/x                       Dataset {2, 2}
    Data:
        (0,0) {1, 2, 3}, {4, 5, 6}, {7, 8, 9}, {10, 11, 12}
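
For anyone reading the h5ls listing above: the records are printed in flat order starting at (0,0). A minimal sketch of how each flat position maps back to a (row, col) index, assuming the row-major (C) ordering that HDF5 uses. The helper name `flat_to_index` is made up for illustration and is not part of h5py or PyTables.

```python
shape = (2, 2)  # dataset dimensions reported by h5ls
records = [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12)]  # flat order from h5ls


def flat_to_index(flat_index, shape):
    """Map a flat position to (row, col) for a 2-D row-major layout (hypothetical helper)."""
    row, col = divmod(flat_index, shape[1])
    return row, col


# Build a lookup from 2-D index to record and print it.
index_to_record = {flat_to_index(i, shape): rec for i, rec in enumerate(records)}
for index, rec in index_to_record.items():
    print(index, rec)
```

This matches what h5py returns for the same file, where x[1][0] is (7., 8., 9.).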

Rereading it with h5py also works:

In [1]: import h5py
In [2]: import numpy as np
In [3]: with h5py.File('multidim_table.h5','r') as f: x = np.array(f['x'])
In [4]: x
Out[4]: 
array([[( 1.,  2.,  3.), ( 4.,  5.,  6.)],
       [( 7.,  8.,  9.), (10., 11., 12.)]],
      dtype=[('a', '<f8'), ('b', '<f8'), ('c', '<f8')])

In [5]: x['a']
Out[5]: 
array([[ 1.,  4.],
       [ 7., 10.]])

In [6]: x['a'].shape
Out[6]: (2, 2)
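
Since h5py reports the shape correctly, one way to avoid the crash might be to inspect a dataset before handing the file to PyTables. This is only a sketch: `is_multidim_table` is a hypothetical helper, not an h5py or PyTables function, and it relies only on the `.shape` and `.dtype` attributes that both NumPy arrays and h5py datasets expose.

```python
import numpy as np


def is_multidim_table(dataset):
    """Return True for a compound (structured) dataset with more than one dimension.

    `dataset` can be a NumPy array or an h5py dataset; both have
    `.shape` and `.dtype` (hypothetical helper, for illustration only).
    """
    is_compound = dataset.dtype.names is not None
    return is_compound and len(dataset.shape) > 1


# Same dtype and shape as the array in the reproducible example.
x = np.zeros((2, 2), dtype=[('a', float), ('b', float), ('c', float)])
print(is_multidim_table(x))              # True
print(is_multidim_table(x.reshape(-1)))  # False
```

With a file on disk, the same check could be applied to f['x'] inside a `with h5py.File(...)` block before calling tables.open_file.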

Relevant info

I also reported this issue in the PyTables-users Google group.

Versions

tables-3.4.4
python-3.6 and 3.7 (Anaconda and Homebrew)
h5py-2.8.0
macOS-10.13.6 and Windows-10
numpy-1.15.0

FrancescAlted (Member) commented

Yes, multidimensional tables have never been supported in PyTables and will likely stay like this for the foreseeable future (unless there is a nice PR contributing it). Closing for now.
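
A possible workaround until then, sketched under assumptions rather than tested against PyTables: flatten the structured array to one dimension before writing it, keep the original shape somewhere (for example in a dataset attribute), and reshape after reading. The example uses NumPy only and leaves out the file I/O; the assumption is that a 1-D compound dataset is what PyTables expects for a table.

```python
import numpy as np

# Same structured array as in the reproducible example.
x = np.array([[(1, 2, 3), (4, 5, 6)], [(7, 8, 9), (10, 11, 12)]],
             dtype=[('a', float), ('b', float), ('c', float)])

orig_shape = x.shape   # (2, 2); would have to be stored alongside the data
flat = x.reshape(-1)   # 1-D array of 4 records, in row-major order

# ... write `flat` instead of `x`, read it back later, then restore the shape ...
restored = flat.reshape(orig_shape)

print(flat.shape)          # (4,)
print(restored['a'].shape) # (2, 2)
```

The round trip loses nothing because reshape only changes how the records are indexed, not the records themselves.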
