Npy file format support #2437
- Move files fetch_store.* from core/image_io/ to core/.
- Rename some functions to reflect the fact that they perform a linear transform of intensities in addition to fetching or storing raw data.
- Add new functions that provide access to such template functions but without the use of that linear transformation, which tends to be specific to image data.
New file core/file/matrix.h now contains functions relating to the loading and saving of numerical matrix data from / to text files. In addition, these functions are now in namespace File::Matrix::*, rather than MR::*. The exception is function parse_matrix(), which has instead been moved to file core/mrtrix.h.
Any MRtrix3 command that loads vector / matrix data from text files, or saves such data to text files, should now transparently handle binary NumPy .npy files. This includes support for float16 data; floating-point numerical data can be saved as float16 by setting the corresponding MRtrix config file entry.
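For readers unfamiliar with the .npy layout, the following is a minimal sketch of reading the file preamble; it is not the code in this PR. The struct NpyPreamble and the functions parse_npy_preamble() and npy_descr() are hypothetical names introduced for illustration only. It assumes the whole file is already in memory as a std::string and only extracts the 'descr' entry (e.g. "<f2" for little-endian float16) rather than parsing the full header dictionary.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical struct for this sketch; not the ReadInfo type from the PR.
struct NpyPreamble {
  int major_version;
  int minor_version;
  std::size_t data_offset;  // byte offset at which the array data begin
  std::string header;       // Python dict literal, e.g. {'descr': '<f2', ...}
};

NpyPreamble parse_npy_preamble(const std::string& bytes) {
  // Every .npy file starts with the 6-byte magic string \x93NUMPY.
  static const std::string magic("\x93NUMPY", 6);
  if (bytes.size() < 10 || bytes.compare(0, 6, magic) != 0)
    throw std::runtime_error("not an NPY file");
  NpyPreamble out;
  out.major_version = static_cast<unsigned char>(bytes[6]);
  out.minor_version = static_cast<unsigned char>(bytes[7]);
  // The header length is little-endian: 2 bytes in format version 1,
  // 4 bytes in versions 2 and 3.
  const std::size_t len_bytes = (out.major_version == 1) ? 2 : 4;
  if (bytes.size() < 8 + len_bytes)
    throw std::runtime_error("truncated NPY preamble");
  std::size_t header_len = 0;
  for (std::size_t i = 0; i < len_bytes; ++i)
    header_len |= static_cast<std::size_t>(static_cast<unsigned char>(bytes[8 + i])) << (8 * i);
  out.data_offset = 8 + len_bytes + header_len;
  if (bytes.size() < out.data_offset)
    throw std::runtime_error("truncated NPY header");
  out.header = bytes.substr(8 + len_bytes, header_len);
  return out;
}

// Extract the value of the 'descr' key from the header text.
std::string npy_descr(const std::string& header) {
  const std::string key = "'descr': '";
  const std::size_t start = header.find(key);
  if (start == std::string::npos)
    throw std::runtime_error("no 'descr' entry in NPY header");
  const std::size_t begin = start + key.size();
  const std::size_t end = header.find('\'', begin);
  if (end == std::string::npos)
    throw std::runtime_error("malformed 'descr' entry");
  return header.substr(begin, end - begin);
}
```

A real reader would also need to parse 'fortran_order' and 'shape' from the same header text before interpreting the data that follow data_offset.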
👍 Happy with that.
Fully agree. I've had similar thoughts in the past, but not done anything about them... External code that we import wholesale includes the original NIfTI headers, glLoadGen to generate the
Yes, that sounds like the most sensible way to do it!
I wouldn't worry about MRView - as long as the internal in-RAM representation is one it can deal with, it doesn't matter how it was stored on file. I'd be surprised if that broke anything...
The MRtrix image format now supports half-precision floating-point data. Some handling of the NumPy .npy array file format has been simplified to accommodate these deeper changes.
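To illustrate what decoding a half-precision value involves, here is a minimal sketch that expands an IEEE 754 binary16 bit pattern into a 32-bit float. This is not the header library imported by the PR, and half_bits_to_float() is a hypothetical name; it only shows the layout (1 sign bit, 5 exponent bits, 10 mantissa bits) that any such get/set function has to handle.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Decode an IEEE 754 binary16 bit pattern into a float.
float half_bits_to_float(std::uint16_t h) {
  const int sign = (h >> 15) & 0x1;
  const int exponent = (h >> 10) & 0x1F;
  const int mantissa = h & 0x3FF;
  float value;
  if (exponent == 0) {
    // Zero or subnormal: mantissa * 2^-24
    value = std::ldexp(static_cast<float>(mantissa), -24);
  } else if (exponent == 31) {
    // All exponent bits set: infinity (mantissa == 0) or NaN
    value = mantissa ? std::numeric_limits<float>::quiet_NaN()
                     : std::numeric_limits<float>::infinity();
  } else {
    // Normal: (1 + mantissa / 1024) * 2^(exponent - 15)
    value = std::ldexp(static_cast<float>(mantissa + 1024), exponent - 25);
  }
  return sign ? -value : value;
}
```

Every binary16 value is exactly representable as a float, so the conversion in this direction is lossless; it is the float-to-half direction that requires rounding and overflow handling.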
16-bit floating-point images working, and appear properly in
Conflicts:
cmd/connectomestats.cpp
core/image_io/fetch_store.cpp
core/math/stats/shuffle.cpp
src/dwi/tractography/SIFT2/tckfactor.cpp
- Prevent endianness warning on single-byte data types.
- Explicitly disable complex datatype support.
- Comment out residual debugging print statements.
- Move large functions to .cpp file by isolating those sections that were dependent on templates and only having those within the .h file.
- Explicitly handle bitwise data in the same way in which the NPY format supports it: as one byte per bit of data. This requires explicit special handling at both read and write time as it is incompatible with packed bit storage.
- Provide a ReadInfo intermediate struct during file read that may potentially be used to construct an Eigen::Map if the data type / dimension / order are known a priori.
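The special handling of bitwise data mentioned in the list above amounts to converting between packed bits and one byte per element. A minimal sketch follows; unpack_bits() and pack_bits() are hypothetical names, and the bit order (first element in the most significant bit of each byte) is an assumption made for this example rather than something stated in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Expand packed bits into one byte per element, as NPY stores boolean data.
// Assumption for this sketch: the first element occupies the most
// significant bit of each packed byte.
std::vector<std::uint8_t> unpack_bits(const std::vector<std::uint8_t>& packed,
                                      std::size_t count) {
  std::vector<std::uint8_t> out(count);
  for (std::size_t i = 0; i < count; ++i)
    out[i] = (packed[i / 8] >> (7 - i % 8)) & 1u;
  return out;
}

// Inverse operation: collapse one-byte-per-element data into packed bits.
// Any non-zero byte is treated as "true"; unused trailing bits stay zero.
std::vector<std::uint8_t> pack_bits(const std::vector<std::uint8_t>& bytes) {
  std::vector<std::uint8_t> out((bytes.size() + 7) / 8, 0);
  for (std::size_t i = 0; i < bytes.size(); ++i)
    if (bytes[i])
      out[i / 8] |= static_cast<std::uint8_t>(0x80u >> (i % 8));
  return out;
}
```

Because the element count is generally not a multiple of eight, the count has to come from the shape recorded in the file header rather than from the size of the packed buffer.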
Worth noting on this PR, given its import of the |
In #1509, a CI failure was encountered where on Mac ARM64 the shared library would not link, complaining about the absence of __set_fetch_function() for unsigned long datatype. It is believed that this is being caused by the presence of MR::Math::load_matrix() with size_t template type being used for loading pre-defined permutation matrices. To address this, the new typedef MR::Math::Stats::index_type is being introduced to refer to any item being indexed into an array or matrix. A 32-bit unsigned integer should be sufficient for all cases of such (the only possible exception is the calculation of the maximal number of shuffles, for which 64-bit unsigned integers are used, just for the sake of potentially calculating and reporting that number, even if it turns out to be much larger than the number of shuffles actually utilised).
Still having some outstanding issues with CI on this PR. I had a couple of issues compiling on Mac that have been resolved. One of them I'm still a little confused by: it didn't like compiling Where it's still stuck now is my new unit tests for NPY support, which require importing
the shebang in
⬆️ will be the macos-latest xcode python version (3.8 or 9 I think?) and will look for packages in somewhere like but it looks like macos-latest also has homebrew python (3.11) already installed to /usr/local/bin/python3 and the associated this will be the python3 invoked by So either
or
should sort it out |
9dcc4a8 to 8e29ee7
Thanks @fionaEyoung and @daljit46 for tips on the Python multiple version issue; Mac checks are now working. There, I've chosen to stick to the same shebang for these unit test files as the Python executable scripts ( Unfortunately MSYS2 has a similar issue, and that one may be harder to fix.
Regardless, we will need to reconsider the hardcoded shebang on |
Note, that we are no longer relying on I think symlinking could be the least disrupting option here. Also, even if we manage to successfully install the I personally think we should change the shebang for the scripts as mentioned in #2261. Relying on hardcoded paths is not appropriate in my opinion, and I would rather use |
If that's the case, that would IMO be an argument for bypassing those checks in the case where
I think I'm leaning in the same direction, though I haven't re-familiarised myself with the debate that led to the current decision. Importantly " |
Superset of #2436. I'll leave that open for now, but showing this PR might help to more clearly provide a justification for those changes.
Listed as draft PR for now as I would like to add some unit tests, which will require additions to the test data repo.
Discussion points:
- The functions for loading & saving matrix data from / to the filesystem are here proposed to be moved from the ::MR namespace in file core/math/math.h to the ::MR::File::Matrix namespace in file core/file/matrix.h. I always found it unusual to have those functions in our base namespace, as well as having those functions defined in a header file where the content of the first half is so drastically different and in no way related to filesystem interfacing. While this change would break backward compatibility with any external C++ projects, it to me feels like a much more appropriate location, and external projects could get away with testing the content of MRTRIX_MINOR_VERSION at compile time if preserving 3.0.x compatibility is desired.
- Support for 16-bit floating-point data is via a single header include file, which is integrated into the repository, just as was done previously with JSON for Modern C++. It however occurs to me that the utilisation of such libraries is not broadcast to the same extent as are external dependencies that are listed as prerequisites for compilation. I wonder if we should have somewhere listed all packages on which the software depends, regardless of whether they are compile-time dependencies or imported directly into the repository.
- 16-bit floating-point support is currently implemented in a manner specific to the NPY file format for matrix or vector data. It does however occur to me that Float16 could theoretically be just another type in MR::DataType, with get/set functions that use the new 16-bit floating-point library, and the MRtrix image format would inherit support for such. That would potentially simplify the NPY code slightly, but would open up a whole bunch of requisite testing to make sure it doesn't break something else (e.g. no idea how easy or difficult this data type might be to support in mrview without trying it).
- There is scope to speed up the NPY load / save operations in cases where data type / endianness / row vs. column ordering is identical between disk and RAM, by doing a direct memcpy() with the memory-mapped region.
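The memcpy() idea in the last discussion point could look roughly like the sketch below. It is an illustration under simplifying assumptions, not a proposed implementation: load_float32() and host_is_little_endian() are hypothetical names, the data type is fixed to 32-bit float, and row vs. column ordering is assumed to already match, so only endianness decides between the fast and slow paths.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Detect host byte order at run time.
inline bool host_is_little_endian() {
  const std::uint16_t probe = 1;
  unsigned char first;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}

// Load `count` float32 values from a raw (e.g. memory-mapped) byte region.
std::vector<float> load_float32(const unsigned char* src, std::size_t count,
                                bool file_is_little_endian) {
  std::vector<float> out(count);
  if (count == 0)
    return out;
  if (file_is_little_endian == host_is_little_endian()) {
    // Fast path: on-disk layout matches in-RAM layout, so copy in one call.
    std::memcpy(out.data(), src, count * sizeof(float));
    return out;
  }
  // Slow path: reverse the bytes of each element individually.
  for (std::size_t i = 0; i < count; ++i) {
    unsigned char tmp[sizeof(float)];
    for (std::size_t b = 0; b < sizeof(float); ++b)
      tmp[b] = src[i * sizeof(float) + (sizeof(float) - 1 - b)];
    std::memcpy(&out[i], tmp, sizeof(float));
  }
  return out;
}
```

The same check would need to be extended to data type and ordering before the fast path could be taken in general, which is why it is listed here as a possible optimisation rather than part of this PR.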