Drop the hdf5storage dependency (write-side breaking change) #141
Draft
KenyaOtsuka wants to merge 3 commits into
_mat_v73 now only reads MATLAB v7.3 / hdf5storage / bdpy .mat files; the plain-HDF5 savemat()/write_dataset() helpers are removed (writing moves to the save sites). read_cell() now routes the plain-matrix branch through read_dataset() so MATLAB-style transposed matrices with MATLAB_class are de-transposed before being split into rows. Read compatibility is unchanged.
save_array (dense), save_multiarrays, SparseArray.save and save_feature now write plain HDF5 directly with h5py instead of going through a savemat wrapper. SparseArray.save inlines the former __save_h5py body. New files are bdpy-native plain HDF5 and intentionally not MATLAB-load compatible; existing files still load. Tests cover dense/sparse/multiarray round-trips, save_feature, and that new datasets carry no MATLAB_class/Python.Shape.
Remove hdf5storage from the project dependencies, mypy config, README and the legacy test Pipfiles now that nothing imports it, and update a stale hdf5storage reference in a datastore comment.
Author
This PR is intended to be merged several releases after #142 has been merged.
Summary
Removes the `hdf5storage` dependency entirely. This follows up on #137, which
moved the read path off `hdf5storage` (onto an h5py reader) but deliberately
left the dense/sparse write paths on `hdf5storage.savemat`, keeping the
dependency and the NumPy < 2 pin on Python 3.8/3.9.
The guiding principle here:
- Files written by `hdf5storage` / MATLAB v7.3 (and by older bdpy) still load.
- New files are written as bdpy-native plain HDF5 and are not meant to be opened by MATLAB's `load`.

This is a deliberate write-side breaking change (see Compatibility below).
Why
`hdf5storage` broke under NumPy 2.0 (it referenced the removed `np.unicode_`;
see #106). Rather than reimplement a full MATLAB-v7.3-compatible writer just to
keep the on-disk format, we drop the dependency and write plain HDF5 at the
save sites. bdpy reads its own output back through the existing h5py reader.
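To illustrate what "plain HDF5 at the save sites" can look like, here is a minimal sketch of a dense write and read-back with h5py. The helper names (`save_dense_plain`, `load_dense_plain`, `dataset_attr_names`) and the `data` key are hypothetical and are not bdpy's actual functions; the point is only that a dataset written this way carries no `MATLAB_class` / `Python.Shape` attributes.

```python
import os
import tempfile

import h5py
import numpy as np


def save_dense_plain(path, key, array):
    """Write one array as a plain HDF5 dataset (hypothetical helper)."""
    with h5py.File(path, "w") as f:
        f.create_dataset(key, data=np.asarray(array))


def load_dense_plain(path, key):
    """Read the dataset back as a NumPy array."""
    with h5py.File(path, "r") as f:
        return f[key][()]


def dataset_attr_names(path, key):
    """Return the names of all HDF5 attributes attached to the dataset."""
    with h5py.File(path, "r") as f:
        return set(f[key].attrs.keys())


data = np.arange(6, dtype=np.float64).reshape(2, 3)
path = os.path.join(tempfile.mkdtemp(), "dense.h5")

save_dense_plain(path, "data", data)
restored = load_dense_plain(path, "data")
attrs = dataset_attr_names(path, "data")
```

Because nothing is transposed or annotated on write, the array comes back with the shape it was saved with, which is also why MATLAB's `load` is not expected to interpret such a file.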
Changes
`bdpy/dataform/_mat_v73.py` → read-only legacy reader
- Removed `savemat()` / `write_dataset()` (and from `__all__`); the module no longer writes anything.
- Kept read support for MATLAB v7.3 / hdf5storage `.mat` files.
- `read_cell()`: the plain-matrix branch now routes through `read_dataset()`, so MATLAB-style transposed matrices carrying `MATLAB_class` are de-transposed before being split into rows. Read behavior for bdpy's own (plain) `index`/`shape` datasets is unchanged.

Write plain HDF5 directly at the save sites
- `bdpy/dataform/sparse.py`: `save_array` (dense), `save_multiarrays`, and `SparseArray.save` write datasets/groups directly with h5py (`SparseArray.save` inlines the former `__save_h5py` body; append-mode still preserves other top-level variables).
- `bdpy/dataform/features.py`: `save_feature` writes the `feat` dataset directly with h5py.
Drop the dependency
- Removed `hdf5storage` from `pyproject.toml`, `mypy.ini`, `README.md`, and the legacy `tests/env/*/Pipfile`s; cleaned up a stale comment in `datastore.py`.

Compatibility
[Compatibility table; surviving entries: `hdf5storage` / MATLAB v7.3 / bdpy files, `feat` dataset]

The only compatibility promise is reading existing files. If MATLAB-readable
output is required, pin an older bdpy release.
Tests
- `_mat_v73`: removed the writer tests; added `read_cell` coverage for the transposed-matrix path.
- `sparse`: dense/sparse round-trips, `save_multiarrays`, preserve-other-variables, and an assertion that dense saves carry no `MATLAB_class` / `Python.Shape`.
- `features`: mock data now written as plain HDF5; added an explicit `save_feature()` test (round-trips via `loadmat_key` and `Features`).
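For readers unfamiliar with the transposed-matrix path that the `read_cell` test covers: MATLAB stores arrays column-major, so a MATLAB-written matrix appears in h5py with its dimensions reversed. The sketch below shows the idea of de-transposing before splitting into rows. `read_matrix` and `read_rows` are hypothetical stand-ins, not the `read_dataset()` / `read_cell()` code in `_mat_v73`, and keying the transpose on the mere presence of `MATLAB_class` is an assumption.

```python
import os
import tempfile

import h5py
import numpy as np


def read_matrix(dset):
    """Return the dataset as an array, undoing MATLAB's transposed layout."""
    arr = dset[()]
    # Assumption: MATLAB_class marks a MATLAB-written dataset, whose
    # column-major storage shows up in h5py with reversed dimensions.
    if "MATLAB_class" in dset.attrs and arr.ndim >= 2:
        arr = arr.T
    return arr


def read_rows(dset):
    """Split a 2-D dataset into a list of row arrays (hypothetical helper)."""
    return [row for row in read_matrix(dset)]


matrix = np.arange(6.0).reshape(2, 3)  # logical shape: 2 rows, 3 columns
path = os.path.join(tempfile.mkdtemp(), "cell.h5")
with h5py.File(path, "w") as f:
    # MATLAB-style: stored transposed and tagged with MATLAB_class.
    matlab_like = f.create_dataset("matlab_like", data=matrix.T)
    matlab_like.attrs["MATLAB_class"] = np.bytes_(b"double")
    # Plain: stored as-is, with no attributes.
    f.create_dataset("plain", data=matrix)

with h5py.File(path, "r") as f:
    matlab_rows = read_rows(f["matlab_like"])
    plain_rows = read_rows(f["plain"])
```

Both reads yield the same two rows of three values; without the de-transpose step, the MATLAB-style dataset would instead be split into three rows of two.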