Honor padding in compound types in native HDF5 files #720

FrancescAlted · 2019-02-19T19:22:00Z

This PR adds support for general compound types with padding. Until now, PyTables always removed any padding (i.e. 'holes') from compound datatypes, so files created with other tools lost their padding whenever PyTables was used to manipulate or copy them.

With this PR, the HDF5 types are used internally with their padding, so it is preserved during output operations, most notably copies (e.g. via ptrepack). This should allow smoother interaction with the wider HDF5 ecosystem (see the messages sent by Ken Walker to the mailing list on 2018-10-30).


FrancescAlted · 2019-02-20T13:33:09Z

For reference, here is the message from Ken Walker to pytables-users on 2018-10-30 about the issue addressed in this PR:

This is related to my previous question about H5Tpack().
I am working thru a problem reading/writing data with Pytables. I read some data rows from one HDF5 file/dataset into a numpy record array, then write that array to a dataset in a different HDF5 file (no change to the data). The data in the new file looks fine when interrogated with Pytables or viewed with HDFView. However, a downstream C++ app can't read the Pytables data.
I am told (by the developers) that the compiler for the upstream program is set to pad the data when it writes the original file (that I am reading), and the pad is expected by the downstream reader (that reads the file I created). Padding adds 4 pad characters to a 4-byte S4 field so the next field starts at an 8-byte memory boundary. Based on observed behavior, they have inferred that Pytables removes the pad characters when reading the dataset, and does not add a pad when writing the new dataset. (all perfectly legal in hdf5 and does not affect data integrity). However the missing pad is expected by the downstream reader, and causes an error (I know, bad code design).

So....I'm wondering...is there something in Pytables that controls padding when reading/writing datasets like this?

FYI, I recreated this read/write process with h5py, and the output file is compatible with my downstream app. Apparently h5py retains the padded characters. This is confirmed when I write the dataset.dtype: h5py reports itemsize:384, vs itemsize:380 when Pytables reads the dataset.
I could rewrite my utility with h5py...but hope to avoid (if possible) because I leverage a lot of pytables unique functionality.
Thanks in advance for any insights into this quirky padding behavior.
-Ken
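
The layout described in the message can be sketched with plain NumPy. This is only an illustration: the field names `name` and `value` and the two-field record are assumptions made for the example, not the actual dataset from the message (whose itemsize is 384 vs. 380). It shows how 4 pad bytes after an S4 field move the next field to an 8-byte boundary.

```python
import numpy as np

# Packed layout: the 8-byte field starts right after the 4-byte string.
packed = np.dtype([("name", "S4"), ("value", "<f8")])

# Padded layout: explicit offsets put "value" on an 8-byte boundary,
# leaving 4 unused bytes (a "hole") after "name".
# Field names here are hypothetical, chosen only for illustration.
padded = np.dtype({
    "names": ["name", "value"],
    "formats": ["S4", "<f8"],
    "offsets": [0, 8],
    "itemsize": 16,
})


def field_offsets(dtype):
    """Return {field name: byte offset} for a structured dtype."""
    return {name: dtype.fields[name][1] for name in dtype.names}


print(packed.itemsize, field_offsets(packed))
print(padded.itemsize, field_offsets(padded))
```

The two dtypes hold the same values per record. Only the byte layout differs, which is why a reader that hard-codes the padded layout can fail on the packed one.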

FrancescAlted · 2019-02-20T13:47:41Z

With this, PyTables can create tables with paddings as long as they come from NumPy arrays with paddings (i.e. paddings in NumPy structured arrays are respected), and the original paddings are respected during copies too.

To do:

Add tests for copies and check that padding is respected (fixed in 51a221a)
Document the new _v_offsets attribute in the Description class (fixed in 7fa825e)
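
As a rough NumPy-only sketch of what "respecting" versus "removing" padding means for a structured array, the snippet below builds a padded array and then repacks it. It does not call PyTables; `repack_fields` from `numpy.lib.recfunctions` merely stands in for the padding removal described in this thread, and the field names are hypothetical.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# A padded record type: 4 pad bytes follow the S4 field.
# Field names are hypothetical, chosen only for illustration.
padded = np.dtype({
    "names": ["name", "value"],
    "formats": ["S4", "<f8"],
    "offsets": [0, 8],
    "itemsize": 16,
})

rows = np.zeros(3, dtype=padded)
rows["name"] = [b"abcd", b"efgh", b"ijkl"]
rows["value"] = [1.0, 2.0, 3.0]

# repack_fields returns an array whose dtype has no holes;
# the field values are unchanged, only the byte layout differs.
packed_rows = rfn.repack_fields(rows)

print(rows.dtype.itemsize, packed_rows.dtype.itemsize)
print(np.array_equal(rows["value"], packed_rows["value"]))
```

Writing `rows` keeps the 16-byte records, while writing `packed_rows` produces 12-byte records. That is the kind of layout difference the new copy tests check for.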

FrancescAlted · 2019-02-21T08:00:14Z

The heavy test suite passes on Linux:

Ran 67234 tests in 6323.946s

OK (skipped=304)

tomkooij · 2019-02-21T10:10:02Z

The tests LGTM, so if they pass, go ahead and merge. If you'd like an actual review, I can maybe do that tomorrow. Let me know.

Nice work!

FrancescAlted · 2019-02-21T11:15:49Z

Hi @tomkooij. Yes, since this change affects the format of dataset copies (again, only when padding is present), a review would be greatly appreciated. Thanks!


FrancescAlted · 2019-03-08T13:10:26Z

Let's merge.

FrancescAlted added 5 commits February 15, 2019 13:49

Change the name of enum.py to avoid collisions with the submodule

cadb09b

Preliminary support for keeping the padding within HDF5/NumPy compoun…

4a84d81

…d types

All tests in attributes pass with types with offset

9097fb3

Final fix for let the complete suite to pass

9b4dc73

Fix the order of the offsets (follow the positions)

03dfe54

FrancescAlted changed the title ~~Support for padding in native HDF5 files~~ Respect padding in compound types in native HDF5 files Feb 20, 2019

FrancescAlted added 2 commits February 21, 2019 10:45

Add tests for copying tables while respecting the padding

51a221a

Add entry in documentation for

7fa825e

Be explicit in the size of integers for avoiding diffs among platforms

edffb45

FrancescAlted added 3 commits February 21, 2019 15:17

New ALLOW_PADDING parameter for specifying whether padding is to be h…

d6c0732

…onored or not

Make sure the struct array is consistent with offsets of the description

33b5105

Add tests on not allowing padding when copying tables

f285118

FrancescAlted changed the title ~~Respect padding in compound types in native HDF5 files~~ Honor padding in compound types in native HDF5 files Feb 21, 2019

FrancescAlted added 2 commits February 21, 2019 18:59

Add release notes about the change in padding treatment

62ffd75

Add --dont-allow-padding flag for ptrepack

ab83619

FrancescAlted merged commit 34bbace into master Mar 8, 2019

FrancescAlted deleted the padding branch March 8, 2019 13:10

FrancescAlted mentioned this pull request Mar 12, 2019

Failure when trying to store recarray with a non-packed (aligned) dtype #661

Closed

tomkooij mentioned this pull request Sep 23, 2019

Alignment issues with python3 #734

Closed
