Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Foreign b2nd array compatibility #1072

Merged
merged 12 commits into from Oct 19, 2023
Merged

Foreign b2nd array compatibility #1072

merged 12 commits into from Oct 19, 2023

Conversation

ivilata
Copy link
Contributor

@ivilata ivilata commented Oct 19, 2023

This expands the work on b2nd array support with direct chunking (#1056) to better handle such arrays created with other tools, e.g. when the filter values with chunk rank & shape are missing, or by being more relaxed about the format of the b2nd array used for storing the chunk (e.g. by not requiring it to consist of a single inner chunk, or have a specific blocksize). Some missing tests on chunk data shape have been added in the optimized path.

A new example script has been added to test writing and reading partial chunks.

Finally, the workaround for stack smashing on some versions of GCC has been removed (it seems not to be needed anymore).

To avoid confusion with the HDF5 array dataset itself.
Instead of checking that the array's chunk shape matches the dataset's chunk
shape, check that the array's whole shape matched the dataset's chunk shape.
PyTables stores one Blosc2 chunk per HDF5 chunk, but it should be able to cope
with Blosc2 frames containing several chunks (since reading does not really
operate at the Blosc2 chunk level), as long as the whole Blosc2 array has the
proper shape.

This may ease having PyTables read b2nd-compressed arrays where dataset chunks
contain several Blosc2 chunks.
These checks were already made by the filter, but still missing here.
This increases compatibility with datasets written with other tools, esp. if
they also use b2nd for scalar or unidimensional data, as chunk rank/shape
filter values used by PyTables for the checks may be missing (and filter set
function would not set them either since rank < 2).
The issue seems to have vanished on Guix (GCC 12.3.0), let us see what CI says
about Ubuntu.
@ivilata ivilata self-assigned this Oct 19, 2023
Copy link
Member

@FrancescAlted FrancescAlted left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good. Thanks @ivilata !

@ivilata ivilata merged commit bb02c88 into master Oct 19, 2023
30 checks passed
@ivilata ivilata deleted the direct-chunking-b2ndarray branch October 19, 2023 15:05
ivilata added a commit that referenced this pull request Nov 9, 2023
Fix broken b2nd optimized slice assembly and tests

This fixes the assembly of slices obtained via Blosc2 ND optimized slicing, which was using `memcpy` from the outer dimension of each chunk slice instead of the inner one. The new code avoids the manual assembly of the slice altogether by leaving the job to `b2nd_copy_buffer`, which was published in C-Blosc2 2.11.0 (thus the dependencies on C-Blosc2 and python-blosc2 are updated too).

A new unit test `tables.test_carray.Blosc2Ndim3MinChunkOptTestCase` has been added that would trigger the error in case of the bug, to avoid regressions. Also, this fixes other unit tests that had been added for b2nd optimized slicing but were not enabled.

Finally, `tables.test_carray.Blosc2NDNoChunkshape` has been added to check the compatibility with arrays that contain b2nd chunks but do not include the extra filter parameters with the chunk rank and shape (e.g. because they were created with code other than `hdf5-blosc2`, see #1072).
/* Although blosc2_decompress_ctx ("else" branch) can decompress b2nd-formatted data,
* there may be padding bytes when the chunkshape is not a multiple of the blockshape,
* and only b2nd machinery knows how to handle these correctly.
*/
if (blosc2_meta_exists(schunk, "b2nd") >= 0
|| blosc2_meta_exists(schunk, "caterva") >= 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove the "caterva" check here, as it never reached the production status, and its functionality has been included in "b2nd".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants