Npy file format support #2437

Lestropie · 2022-02-25T10:47:06Z

Superset of #2436. I'll leave that open for now, but showing this PR might help to more clearly provide a justification for those changes.

Listed as draft PR for now as I would like to add some unit tests, which will require additions to the test data repo.

Discussion points:

The functions for loading & saving matrix data from / to the filesystem are here proposed to be moved from the ::MR namespace in file core/math/math.h to the ::MR::File::Matrix namespace in file core/file/matrix.h. I always found it unusual to have those functions in our base namespace, as well as having those functions defined in a header file where the content of the first half is so drastically different and in no way related to filesystem interfacing. While this change would break backward compatibility with any external C++ projects, it to me feels like a much more appropriate location, and external projects could get away with testing the content of MRTRIX_MINOR_VERSION at compile time if preserving 3.0.x compatibility is desired.
Support for 16-bit floating-point data is via a single header include file, which is integrated into the repository, just as was done previously with JSON for Modern C++. It however occurs to me that the utilisation of such libraries is not broadcast to the same extent as are external dependencies that are listed as prerequisites for compilation. I wonder if we should have somewhere listed all packages on which the software depends, regardless of whether they are compile-time dependencies or imported directly into the repository.
16-bit floating-point support is currently implemented in a manner specific to the NPY file format for matrix or vector data. It does however occur to me that Float16 could theoretically be just another type in MR::DataType, with get/set functions that use the new 16-bit floating-point library, and the MRtrix image format would inherit support for such. That would potentially simplify the NPY code slightly, but would open up a whole bunch of requisite testing to make sure it doesn't break something else (e.g. no idea how easy or difficult this data type might be to support in mrview without trying it).
There is scope to speed up the NPY load / save operations in cases where data type / endianness / row vs. column ordering is identical between disk and RAM, by doing a direct memcpy() with the memory-mapped region.

- Move files fetch_store.* from core/image_io/ to core/. - Rename some functions to reflect the fact that they perform a linear transform of intensities in addition to fetching or storing raw data. - Add new functions that provide access to such template functions but without the use of that linear transformation, which tends to be specific to image data.

New file core/file/matrix.h now contains functions relating to the loading and saving of numerical matrix data from / to text file. In addition, these functions are now in namespace File::Matrix::*, rather than MR::*. The exception is function parse_matrix(), which has instead been moved to file core/mrtrix.h.

Any MRtrix3 command that loads vector / matrix data from text files, or saves such data to text files, should now transparently handle binary NumPy .npy files. This includes support for float16 data, and floating-point numerical data can be saved in float16 with setting of the corresponding MRtrix config file entry.

jdtournier · 2022-02-25T18:14:07Z

The functions for loading & saving matrix data from / to the filesystem are here proposed to be moved from the ::MR namespace in file core/math/math.h to the ::MR::File::Matrix namespace in file core/file/matrix.h.

👍 Happy with that.

I wonder if we should have somewhere listed all packages on which the software depends, regardless of whether they are compile-time dependencies or imported directly into the repository.

Fully agree. I've had similar thoughts in the past, but not done anything about them... External code that we import wholesale includes the original NIfTI headers, glLoadGen to generate the src/gui/opengl/gl_core_3_3.* files (though the project seems defunct now - this is the only reference I can find to it on the web...), along with JSON for Modern C++. There may be others I've forgotten about...

It does however occur to me that Float16 could theoretically be just another type in MR::DataType, with get/set functions that use the new 16-bit floating-point library, and the MRtrix image format would inherit support for such.

Yes, that sounds like the most sensible way to do it!

but would open up a whole bunch of requisite testing to make sure it doesn't break something else (e.g. no idea how easy or difficult this data type might be to support in mrview without trying it).

I wouldn't worry about MRView - as long as the internal in-RAM representation is one it can deal with, it doesn't matter how it was stored on file. I'd be surprised if that broke anything...

core/datatype.h

core/fetch_store.cpp

The MRtrix image format now supports half-precision floating-point data. Some handling of the NumPy .npy array file format has been simplified to accommodate these deeper changes.

Lestropie · 2022-03-01T05:29:42Z

16-bit floating-point images working, and appear properly in mrview:

$ mrconvert dti_FA.nii.gz -datatype float16le - | mrinfo -
mrconvert: [100%] uncompressing image "dti_FA.nii.gz"
mrconvert: [100%] copying from "dti_FA.nii.gz" to "./mrtrix-tmp-Oi6GFd.mif"
************************************************
Image name:          "./mrtrix-tmp-Oi6GFd.mif"
************************************************
  Dimensions:        104 x 104 x 72
  Voxel size:        2.01923 x 2.01923 x 2
  Data strides:      [ -1 2 3 ]
  Format:            Internal pipe
  Data type:         16 bit float (little endian)
  Intensity scaling: offset = 0, multiplier = 1
  Transform:               0.9953    -0.08673     0.04235      -90.83
                          0.09543      0.9501     -0.2969      -106.5
                         -0.01449      0.2996       0.954        -127
  command_history:   C:\msys64\home\rob\src\mrtrix3\bin\mrconvert.exe dti_FA.nii.gz -datatype float16le -  (version=3.0.3-346-g0dc2a396-dirty)
  comments:          FSL5.0
  mrtrix_version:    3.0.3-346-g0dc2a396-dirty

Conflicts: cmd/connectomestats.cpp core/image_io/fetch_store.cpp core/math/stats/shuffle.cpp src/dwi/tractography/SIFT2/tckfactor.cpp

- Prevent endianness warning on single-byte data types. - Explicitly disable complex datatype support. - Comment out residual debugging print statements. - Move large functions to .cpp file by isolating those sections that were dependent on templates and only having those within the .h file. - Explicitly handle bitwise data in the same way in which the NPY format supports it: as one byte per bit of data. This requires explicit special handling at both read and write time as it is incompatible with packed bit storage. - Provide a ReadInfo intermediate struct during file read that may potentially be used to construct an Eigen::Map if the data type / dimension / order are known a priori.

Lestropie · 2023-07-16T06:05:28Z

Worth noting on this PR, given its import of the half header-only library, that C++23 integrates a 16-bit floating-point format into the standard.

In #1509, a CI failure was encountered where on Mac ARM64 the shared library would not link, complaining about the absence of __set_fetch_function() for unsigned long datatype. It is believed that this is being caused by the presence of MR::Math::load_matrix() with size_t template type being used for loading pre-defined permutation matrices. To address this, the new typedef MR::Math::Stats::index_type is being introduced to refer to any item being indexed into an array or matrix. A 32-bit unsigned integer should be sufficient for all cases of such (the only possible exception is the calculation of the maximal number of shuffles, for which 64-bit unsigned integers are used, just for the sake of potentially calculating and reporting that number, even if it turns out to be much larger than the number of shuffles actually utilised).

Changing indexing variable from signed to unsigned integer in 9a6f0bb introduced problem because of decrementing loop attempting to detect a transition past zero.

Further continuation of typedef adoption in 9a6f0bb.

Lestropie · 2023-07-21T06:03:51Z

Still having some outstanding issues with CI on this PR.

I had a couple of issues compiling on Mac that have been resolved. One of them I'm still a little confused by: it didn't like compiling File::NPY::load_matrix<size_t>() because it claimed __set_store_functions<unsigned long>() didn't exist, even though the functions for uint64_t should all have been explicitly instantiated for the sake of image IO. But I got around it by just not calling load_matrix<size_t>(), and instead typedef-ing an index_type as uint32_t for use all throughout the relevant code.

Where it's still stuck now is my new unit tests for NPY support, which require importing numpy in CI (I both generate NPYs in C++ and read them in Python, and generate NPYs in Python and read them in C++).

On Mac, it's added to the list of packages to install, and looks to be installed just fine, but then when the corresponding script is executed Python claims there's no such module. Maybe a Mac person can help me with that one.
On MSYS2, numpy refuses to install. By default, it's using the msys version of Python. There's no pacman package for an msys version of numpy. So I'm trying to install numpy through pip. But it fails, regardless of whether or not I upgrade pip beforehand. I have alternatively tried instead installing both Python and numpy through the mingw64 channel; but in that case Python scripts just don't execute at all.

fionaEyoung · 2023-08-04T12:04:19Z

On Mac, it's added to the list of packages to install, and looks to be installed just fine, but then when the corresponding script is executed Python claims there's no such module. Maybe a Mac person can help me with that one.

the shebang in testing/bin/testing_check_npy:

#!/usr/bin/python3

⬆️ will be the macos-latest xcode python version (3.8 or 9 I think?) and will look for packages in somewhere like
/Library/Frameworks/Python*.framework/Versions/3.9/lib/python3.9/site-packages

but it looks like macos-latest also has homebrew python (3.11) already installed to /usr/local/bin/python3 and the associated pip3 will install to /usr/local/lib/python3.11/site-packages ( or possibly /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages I'm pretty lost about where / why homebrew puts things sometimes.)

this will be the python3 invoked by python3 -m ensurepip && pip3 install numpy

So either

/usr/bin/pip3 install numpy and #!/usr/bin/python3

or

pip3 install numpy and #!/usr/bin/env python3

should sort it out

Lestropie · 2023-08-07T05:12:11Z

Thanks @fionaEyoung and @daljit46 for tips on the Python multiple version issue; Mac checks are now working. There, I've chosen to stick to the same shebang for these unit test files as the Python executable scripts (#!/usr/bin/python3).

Unfortunately MSYS2 has a similar issue, and that one may be harder to fix. /usr/bin/python3 corresponds to the MSYS2 version of Python, which unlike the MinGW64 version, not only does not have a numpy package that can be installed through pacman, but pip fails to install the module even with python-dev installed. Possible strategies:

On the CI machine, uninstall the MSYS2 version of Python and softlink /usr/bin/python3 to /mingw64/bin/python3.
Permit the NPY unit tests to fail if numpy is not installed.
Find some other solution to get numpy working on the MSYS2 version of Python.

Regardless, we will need to reconsider the hardcoded shebang on dev (#2261, #2047).

daljit46 · 2023-08-07T09:59:35Z

Note, that we are no longer relying on MINGW64, instead the CI checks now use the newer UCRT64 environment (which uses the ucrt C library instead of the old msvcrt).

I think symlinking could be the least disrupting option here. Also, even if we manage to successfully install the numpy package for MSYS2, it would noticeably increase the CI checks time as the package needs to be built from source (there is no pre-packaged numpy for MSYS).

I personally think we should change the shebang for the scripts as mentioned in #2261. Relying on hardcoded paths is not appropriate in my opinion, and I would rather use #!/usr/bin/env python3 to give the user the possibility to choose what python executable they wish to use. I see very little benefit in using #!/usr/bin/python.

Lestropie · 2023-08-08T03:11:39Z

even if we manage to successfully install the numpy package for MSYS2, it would noticeably increase the CI checks time

If that's the case, that would IMO be an argument for bypassing those checks in the case where numpy is not installed. As long as the relevant unit tests get executed in at least one environment it should be fine; it's not something for which I expect the behaviour to differ between OS's.

I personally think we should change the shebang for the scripts as mentioned in #2261.

I think I'm leaning in the same direction, though I haven't re-familiarised myself with the debate that led to the current decision. Importantly "#!/usr/bin/python3" is only on dev and is not included in any tag. But we can flesh that out in #2261, and figure out there what the consequences might be in the context of these changes.

Lestropie and others added 5 commits February 21, 2022 22:01

Import half-precision floating-point library

5260dd2

core/file/npy.h: Fix compiler warning

0af1457

Lestropie added the enhancement label Feb 25, 2022

Lestropie added this to the 3.1.0 updates milestone Feb 25, 2022

Lestropie self-assigned this Feb 25, 2022

Lestropie mentioned this pull request Feb 28, 2022

Connectome mapping execution duration sina-mansour/UKB-connectomics#21

Closed

tckstats: -dump option supports .npy format

0dc2a39

Lestropie commented Mar 1, 2022

View reviewed changes

core/datatype.h Show resolved Hide resolved

Lestropie commented Mar 1, 2022

View reviewed changes

core/fetch_store.cpp Show resolved Hide resolved

Generalise half-precision float support

58b5bb1

The MRtrix image format now supports half-precision floating-point data. Some handling of the NumPy .npy array file format has been simplified to accommodate these deeper changes.

Lestropie force-pushed the npy branch from b5c672c to 58b5bb1 Compare March 1, 2022 04:55

Lestropie mentioned this pull request Mar 10, 2022

Better handling of < 3D images #1664

Open

Lestropie mentioned this pull request Apr 5, 2022

NPY file format for regular matrix data tee-ar-ex/trx-python#21

Open

Lestropie mentioned this pull request Sep 15, 2022

[ENH] Proposal for multidimensional array file format bids-standard/bids-specification#197

Open

Lestropie mentioned this pull request Dec 21, 2022

Open discussion RE fixel data handling #2551

Open

Lestropie mentioned this pull request May 10, 2023

Track files: Implement file format handlers #411

Open

Lestropie mentioned this pull request Jun 7, 2023

New fixel correspondence: Data format #2200

Open

This was referenced Jun 30, 2023

WIP: Implement Pydra code-generators #2665

Draft

fixelconnectivity: Use NPY #2670

Open

Lestropie added 6 commits July 14, 2023 12:33

Merge branch 'dev' into npy

80a0134

Conflicts: cmd/connectomestats.cpp core/image_io/fetch_store.cpp core/math/stats/shuffle.cpp src/dwi/tractography/SIFT2/tckfactor.cpp

Resolve compilation of NPY additions against latest dev branch

a3be07c

NPY support: Remove unnecessary variable ghosting

25c2b74

Update test commands to reflect 6e1562f

08836f5

Add unit tests for NPY handling

fc18ea2

Lestropie marked this pull request as ready for review July 15, 2023 14:03

Lestropie mentioned this pull request Jul 15, 2023

NPY: Potential enhancements #2682

Open

Lestropie added 2 commits July 16, 2023 16:20

Neaten header include for half-precision floating-point

749eb2c

core/half.h: Modifications for compilation on Mac

088ca97

Lestropie force-pushed the npy branch from c6aa282 to 088ca97 Compare July 21, 2023 00:18

Lestropie mentioned this pull request Jul 21, 2023

CI: Install Python / numpy via mingw64 #2686

Merged

Lestropie force-pushed the npy branch from ee9c289 to 9a6f0bb Compare July 21, 2023 05:15

Lestropie added 2 commits July 21, 2023 15:32

Fix Math::Stats::Shuffler::generate_all_signflips()

45ada4f

Changing indexing variable from signed to unsigned integer in 9a6f0bb introduced problem because of decrementing loop attempting to detect a transition past zero.

Math::Stats::FWE: Fix signed / unsigned integer comparisons

b5849d0

Further continuation of typedef adoption in 9a6f0bb.

Lestropie mentioned this pull request Jul 24, 2023

Experimental CMake support #2620

Closed

CI: Alternative solution to installing numpy on Mac

864c3da

Lestropie force-pushed the npy branch from f93def4 to 864c3da Compare August 5, 2023 08:13

Docs: Include float16 in -datatype choices for various commands

8b22c75

Lestropie force-pushed the npy branch 2 times, most recently from 9dcc4a8 to 8e29ee7 Compare August 7, 2023 03:32

CI: Bypass NPY unit tests on Windows

361ea9e

Lestropie force-pushed the npy branch from 8e29ee7 to 361ea9e Compare August 8, 2023 03:16

Merge branch 'dev' into npy

fc91c33

Lestropie merged commit 598b076 into dev Aug 8, 2023
5 checks passed

Lestropie deleted the npy branch August 8, 2023 05:01

Lestropie mentioned this pull request Mar 1, 2024

cmake: tests that are not .cpp standalones #2836

Closed

daljit46 mentioned this pull request Mar 4, 2024

Fix the execution of NPY tests #2842

Merged

daljit46 mentioned this pull request May 11, 2024

Testing: shell script per test #2865

Merged

9 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Npy file format support #2437

Npy file format support #2437

Lestropie commented Feb 25, 2022 •

edited by jdtournier

Loading

jdtournier commented Feb 25, 2022

Lestropie commented Mar 1, 2022

Lestropie commented Jul 16, 2023

Lestropie commented Jul 21, 2023

fionaEyoung commented Aug 4, 2023

Lestropie commented Aug 7, 2023

daljit46 commented Aug 7, 2023 •

edited

Loading

Lestropie commented Aug 8, 2023

Npy file format support #2437

Npy file format support #2437

Conversation

Lestropie commented Feb 25, 2022 • edited by jdtournier Loading

jdtournier commented Feb 25, 2022

Lestropie commented Mar 1, 2022

Lestropie commented Jul 16, 2023

Lestropie commented Jul 21, 2023

fionaEyoung commented Aug 4, 2023

Lestropie commented Aug 7, 2023

daljit46 commented Aug 7, 2023 • edited Loading

Lestropie commented Aug 8, 2023

Lestropie commented Feb 25, 2022 •

edited by jdtournier

Loading

daljit46 commented Aug 7, 2023 •

edited

Loading