Python: cmake-generated wrappers for executables #2741

Closed

Lestropie
Member

Experimental alternative to #2737 for addressing #2730.

Some comments in the initial commit, but I'll try to explain the wider concept here, especially given I don't have the capability with cmake to churn out a fully working proposal.

Here is what the source code looks like with this PR (reduced):

├── algorithm.py
├── app.py
├── CMakeLists.txt
├── fsl.py
├── image.py
├── __init__.py
├── matrix.py
├── path.py
├── phaseencoding.py
├── run.py
├── scripts
│   ├── 5ttgen
│   │   ├── freesurfer.py
│   │   ├── fsl.py
│   │   ├── gif.py
│   │   ├── hsvs.py
│   │   └── __init__.py
│   ├── dwi2response
│   │   ├── dhollander.py
│   │   ├── fa.py
│   │   ├── __init__.py
│   │   ├── manual.py
│   │   ├── msmt_5tt.py
│   │   ├── tax.py
│   │   └── tournier.py
│   ├── dwishellmath.py
│   ├── ...
│   └── responsemean.py
├── sh.py
├── utils.py
└── _version.py.in

Note one change in particular: 5ttgen and dwi2response don't have .py files in python/scripts/; instead, the corresponding code lies in eg. python/scripts/5ttgen/__init__.py.

Within the build directory, this will be arranged as follows:

.
├── bin
│   ├── 5tt2gmwmi
│   ├── 5tt2vis
│   ├── ...
│   ├── 5ttgen
│   ├── dwi2response
│   ├── dwishellmath
│   └── responsemean
├── lib
│   └── mrtrix3
│       ├── algorithm.py
│       ├── app.py
│       ├── fsl.py
│       ├── image.py
│       ├── __init__.py
│       ├── matrix.py
│       ├── path.py
│       ├── phaseencoding.py
│       ├── run.py
│       ├── sh.py
│       ├── utils.py
│       ├── _version.py
│       └── _version.py.in
└── src
    └── mrtrix3
        ├── _5ttgen
        │   ├── freesurfer.py
        │   ├── fsl.py
        │   ├── gif.py
        │   ├── hsvs.py
        │   └── __init__.py
        ├── dwi2response
        │   ├── dhollander.py
        │   ├── fa.py
        │   ├── __init__.py
        │   ├── manual.py
        │   ├── msmt_5tt.py
        │   ├── tax.py
        │   └── tournier.py
        ├── dwishellmath.py
        └── responsemean.py

(Note one minor change: need to use directory name "_5ttgen" as Python modules can't start with a number)
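
A minimal sketch of how that renaming could be automated when mapping command names to module names. The helper name `module_name_for_command` is hypothetical and not part of this PR; it only assumes that a leading underscore is the chosen convention, as with "_5ttgen" above.

```python
import keyword


def module_name_for_command(command: str) -> str:
    """Return an importable module name for a command (hypothetical helper)."""
    # Names such as "5ttgen" are not valid identifiers; prefix an underscore.
    name = command if command.isidentifier() else "_" + command
    # Anything still invalid (e.g. containing "-") or a reserved word is rejected.
    if not name.isidentifier() or keyword.iskeyword(name):
        raise ValueError(f"cannot derive a module name from {command!r}")
    return name


print(module_name_for_command("5ttgen"))        # _5ttgen
print(module_name_for_command("dwi2response"))  # dwi2response
```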

Files within bin/ corresponding to Python commands would be generated during the cmake build stage. Their contents would look something like (with relevant content substituted on a per-command basis):

#!/usr/bin/python
# -*- coding: utf-8 -*-
import importlib.util
import os
import sys

api_location = os.path.normpath(os.path.join(os.path.dirname(os.path.realpath(__file__)), '..', 'lib', 'mrtrix3', '__init__.py'))
api_spec = importlib.util.spec_from_file_location('mrtrix3', api_location)
api_module = importlib.util.module_from_spec(api_spec)
sys.modules['mrtrix3'] = api_module
api_spec.loader.exec_module(api_module)

src_spec = importlib.util.spec_from_file_location('_5ttgen',
    os.path.normpath(os.path.join(os.path.dirname(os.path.realpath(__file__)), '..', 'src', 'mrtrix3', '_5ttgen', '__init__.py')))
src_module = importlib.util.module_from_spec(src_spec)
sys.modules[src_spec.name] = src_module
src_spec.loader.exec_module(src_module)

from mrtrix3.app import _execute
import _5ttgen

_execute(_5ttgen)

(Note: would need to deal with the fact that for different commands, the source code module might be at src/mrtrix3/*.py or src/mrtrix3/*/__init__.py)
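
One way the two cases could be handled is to probe for the package form first and fall back to the single-file form. This is only a sketch: `find_command_source` is a hypothetical helper, and in practice the decision might instead be made by cmake at configure time.

```python
from pathlib import Path


def find_command_source(src_root: Path, module_name: str) -> Path:
    """Locate the entry-point file for a command (hypothetical helper).

    Package-style commands (a directory containing __init__.py) take
    precedence over single-file commands (<module_name>.py).
    """
    candidates = [
        src_root / module_name / "__init__.py",
        src_root / f"{module_name}.py",
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(
        f"no source found for {module_name!r} under {src_root}"
    )
```

The returned path could then be handed to `importlib.util.spec_from_file_location()` exactly as in the wrapper above.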

Potential disadvantage of an approach like this is that it's no longer possible to have just a standalone Python file that can execute against the API as long as the API can be found; one would need to do whatever setup ends up being requisite for external projects.

Had planned to write up a pros & cons list, but have to hit the sack.
Open to thoughts.

Change filesystem structure of python/.
No need to preserve the lib/mrtrix3/ sub-directory structure in the repository; that can be constructed exclusively in the build directory.
Executables now reside in sub-directory python/scripts/.
For algorithm-based scripts, all code is placed in a sub-directory of python/scripts/, with the previous bin/ file contents now placed in __init__.py of that directory, thus co-locating all relevant source code for such commands.
The algorithm module is simplified somewhat since the algorithm files are co-located with the interface source file.
Delete file python/bin/mrtrix3.py, and script source files no longer load the mrtrix3 module and invoke the execute function at the end of the file; an alternative mechanism for loading the API and script entrypoint will be created subsequently to this commit.
@Lestropie
Member Author

Okay, extended this a little.

Successfully have a configuration where the cmake build directory contains softlinks to the relevant source files, such that edits can be tested without touching cmake. When cmake --install is run, the source files are copied into the destination directory.
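
The link-versus-copy behaviour can be illustrated in Python. This is a sketch of the idea only, not the cmake logic in this PR: `stage_file` is a hypothetical helper, and it assumes a filesystem on which symlinks can be created.

```python
import shutil
from pathlib import Path


def stage_file(source: Path, destination: Path, install: bool) -> None:
    """Place `source` at `destination` (hypothetical helper).

    Build tree: create a symlink, so edits to the source are picked up
    immediately. Install tree: make a real copy, so the installation
    does not depend on the source directory.
    """
    destination.parent.mkdir(parents=True, exist_ok=True)
    # Remove a stale link or file so the call can be repeated safely.
    if destination.is_symlink() or destination.exists():
        destination.unlink()
    if install:
        shutil.copy2(source, destination)
    else:
        destination.symlink_to(source.resolve())
```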

So overall this seems to tick a lot of boxes:

  1. Ability to run code either from the cmake build directory or from the installed directory, and have the utilised library files / source code always correspond to that location
  2. Ability to test changes to scripts without having to run build

Potential downsides:

  1. Can't have just a standalone .py file that runs against the API; have to generate a corresponding binary executable that imports and executes the custom module
  2. On my system at least, cmake currently hardwires the Python interpreter inserted into the shebangs of the generated executables to a specific version of Python3, rather than a generic Python3 link. Could just ignore this and hardwire the shebangs to /usr/bin/python3 I suppose?

Other notes:

  1. Generated binaries could refer to absolute filesystem paths rather than relative paths. It would be necessary to generate two separate executables for each command, one for the build directory to permit execution from there and one tailored for the installation location.
  2. API files go into <InstallDir>/lib/mrtrix3/*.py because want "mrtrix3" to be the module name. Source code is currently going into <InstallDir>/src/mrtrix3/*. The extra "mrtrix3/" sub-directory there technically isn't needed; but if someone is to install MRtrix3 into a directory that is not specific to MRtrix3, it might be preferable to preserve that distinction.
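
To make the absolute-path variant in note 1 concrete, here is a sketch that renders one wrapper per location from a single template. It is a Python analogue of what cmake's `configure_file()` substitution would do, not the cmake code itself. The template, `render_wrapper`, and the `/path/to/...` prefixes are hypothetical; the sketch assumes a package-style command and hardwires the shebang to /usr/bin/python3 as floated above.

```python
from pathlib import Path
from string import Template

# Hypothetical template; "$name" placeholders stand in for cmake's "@NAME@".
WRAPPER_TEMPLATE = Template('''\
#!$shebang
# -*- coding: utf-8 -*-
import importlib.util
import sys


def _load(name, location):
    spec = importlib.util.spec_from_file_location(name, location)
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module


_load('mrtrix3', $api_init)
command = _load($module_name, $command_init)

from mrtrix3.app import _execute
_execute(command)
''')


def render_wrapper(prefix: Path, module_name: str,
                   shebang: str = "/usr/bin/python3") -> str:
    """Render wrapper text with absolute paths rooted at `prefix`."""
    api_init = prefix / "lib" / "mrtrix3" / "__init__.py"
    command_init = prefix / "src" / "mrtrix3" / module_name / "__init__.py"
    return WRAPPER_TEMPLATE.substitute(
        shebang=shebang,
        api_init=repr(str(api_init)),          # emitted as a string literal
        module_name=repr(module_name),
        command_init=repr(str(command_init)),
    )


# One wrapper for the build directory, one for the installation location.
build_text = render_wrapper(Path("/path/to/build"), "_5ttgen")
install_text = render_wrapper(Path("/path/to/install"), "_5ttgen")
```

The two rendered texts differ only in the embedded paths, which is why two executables per command would be needed under this variant.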

Would very much like feedback on this proposal. If we want to go for something else I need to know why & how.

@Lestropie
Member Author

Here's what the structure looks like under this PR.

~/src/mrtrix3$ tree python/
python/
├── algorithm.py
├── app.py
├── CMakeLists.txt
├── fsl.py
├── image.py
├── __init__.py
├── matrix.py
├── path.py
├── phaseencoding.py
├── run.py
├── scripts
│   ├── 5ttgen
│   │   ├── freesurfer.py
│   │   ├── fsl.py
│   │   ├── gif.py
│   │   ├── hsvs.py
│   │   └── __init__.py
│   ├── blend.py
│   ├── convert_bruker.py
│   ├── dwi2mask
│   │   ├── 3dautomask.py
│   │   ├── ants.py
│   │   ├── b02template.py
│   │   ├── consensus.py
│   │   ├── fslbet.py
│   │   ├── hdbet.py
│   │   ├── __init__.py
│   │   ├── legacy.py
│   │   ├── mean.py
│   │   ├── mtnorm.py
│   │   ├── synthstrip.py
│   │   └── trace.py
│   ├── dwi2response
│   │   ├── dhollander.py
│   │   ├── fa.py
│   │   ├── __init__.py
│   │   ├── manual.py
│   │   ├── msmt_5tt.py
│   │   ├── tax.py
│   │   └── tournier.py
│   ├── dwibiascorrect
│   │   ├── ants.py
│   │   ├── fsl.py
│   │   ├── __init__.py
│   │   └── mtnorm.py
│   ├── dwibiasnormmask.py
│   ├── dwicat.py
│   ├── dwifslpreproc.py
│   ├── dwigradcheck.py
│   ├── dwinormalise
│   │   ├── group.py
│   │   ├── __init__.py
│   │   ├── manual.py
│   │   └── mtnorm.py
│   ├── dwishellmath.py
│   ├── for_each.py
│   ├── gen_scheme.py
│   ├── labelsgmfix.py
│   ├── mask2glass.py
│   ├── mrtrix_cleanup.py
│   ├── notfound.py
│   ├── population_template.py
│   └── responsemean.py
├── sh.py
├── utils.py
├── _version.py
└── _version.py.in
~/src/mrtrix3$ tree build/lib
...
├── lib
│   └── mrtrix3
│       ├── algorithm.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/algorithm.py
│       ├── app.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/app.py
│       ├── fsl.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/fsl.py
│       ├── image.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/image.py
│       ├── __init__.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/__init__.py
│       ├── matrix.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/matrix.py
│       ├── path.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/path.py
│       ├── phaseencoding.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/phaseencoding.py
│       ├── __pycache__
│       │   ├── app.cpython-310.pyc
│       │   ├── image.cpython-310.pyc
│       │   ├── __init__.cpython-310.pyc
│       │   ├── path.cpython-310.pyc
│       │   ├── run.cpython-310.pyc
│       │   ├── utils.cpython-310.pyc
│       │   └── _version.cpython-310.pyc
│       ├── run.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/run.py
│       ├── sh.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/sh.py
│       ├── utils.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/utils.py
│       └── _version.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/_version.py
...
└── src
    ├── mrtrix3
    │   ├── _5ttgen
    │   │   ├── freesurfer.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/5ttgen/freesurfer.py
    │   │   ├── fsl.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/5ttgen/fsl.py
    │   │   ├── gif.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/5ttgen/gif.py
    │   │   ├── hsvs.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/5ttgen/hsvs.py
    │   │   └── __init__.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/5ttgen/__init__.py
    │   ├── blend.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/blend.py
    │   ├── convert_bruker.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/convert_bruker.py
    │   ├── dwi2mask
    │   │   ├── 3dautomask.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/dwi2mask/3dautomask.py
    │   │   ├── ants.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/dwi2mask/ants.py
    │   │   ├── b02template.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/dwi2mask/b02template.py
    │   │   ├── consensus.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/dwi2mask/consensus.py
    │   │   ├── fslbet.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/dwi2mask/fslbet.py
    │   │   ├── hdbet.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/dwi2mask/hdbet.py
    │   │   ├── __init__.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/dwi2mask/__init__.py
    │   │   ├── legacy.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/dwi2mask/legacy.py
    │   │   ├── mean.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/dwi2mask/mean.py
    │   │   ├── mtnorm.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/dwi2mask/mtnorm.py
    │   │   ├── synthstrip.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/dwi2mask/synthstrip.py
    │   │   └── trace.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/dwi2mask/trace.py
    │   ├── dwi2response
    │   │   ├── dhollander.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/dwi2response/dhollander.py
    │   │   ├── fa.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/dwi2response/fa.py
    │   │   ├── __init__.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/dwi2response/__init__.py
    │   │   ├── manual.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/dwi2response/manual.py
    │   │   ├── msmt_5tt.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/dwi2response/msmt_5tt.py
    │   │   ├── tax.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/dwi2response/tax.py
    │   │   └── tournier.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/dwi2response/tournier.py
    │   ├── dwibiascorrect
    │   │   ├── ants.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/dwibiascorrect/ants.py
    │   │   ├── fsl.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/dwibiascorrect/fsl.py
    │   │   ├── __init__.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/dwibiascorrect/__init__.py
    │   │   └── mtnorm.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/dwibiascorrect/mtnorm.py
    │   ├── dwibiasnormmask.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/dwibiasnormmask.py
    │   ├── dwicat.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/dwicat.py
    │   ├── dwifslpreproc.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/dwifslpreproc.py
    │   ├── dwigradcheck.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/dwigradcheck.py
    │   ├── dwinormalise
    │   │   ├── group.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/dwinormalise/group.py
    │   │   ├── __init__.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/dwinormalise/__init__.py
    │   │   ├── manual.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/dwinormalise/manual.py
    │   │   └── mtnorm.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/dwinormalise/mtnorm.py
    │   ├── dwishellmath.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/dwishellmath.py
    │   ├── for_each.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/for_each.py
    │   ├── gen_scheme.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/gen_scheme.py
    │   ├── labelsgmfix.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/labelsgmfix.py
    │   ├── mask2glass.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/mask2glass.py
    │   ├── mrtrix_cleanup.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/mrtrix_cleanup.py
    │   ├── notfound.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/notfound.py
    │   ├── population_template.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/population_template.py
    │   ├── __pycache__
    │   │   └── dwishellmath.cpython-310.pyc
    │   └── responsemean.py -> /home/unimelb.edu.au/robertes/src/mrtrix3/python/scripts/responsemean.py

~/src/mrtrix3$ tree ~/bin/mrtrix3 # install location
.
├── bin
│   ├── 5tt2gmwmi
│   ├── 5tt2vis
│   ├── 5ttcheck
│   ├── 5ttedit
│   ├── 5ttgen
│   ├── afdconnectivity
│   ├── amp2response
│   ├── amp2sh
│   ├── blend
│   ├── connectome2tck
│   ├── connectomeedit
│   ├── connectomestats
│   ├── convert_bruker
│   ├── dcmedit
│   ├── dcminfo
│   ├── dirflip
│   ├── dirgen
│   ├── dirmerge
│   ├── dirorder
│   ├── dirsplit
│   ├── dirstat
│   ├── dwi2adc
│   ├── dwi2fod
│   ├── dwi2mask
│   ├── dwi2response
│   ├── dwi2tensor
│   ├── dwibiascorrect
│   ├── dwibiasnormmask
│   ├── dwicat
│   ├── dwidenoise
│   ├── dwiextract
│   ├── dwifslpreproc
│   ├── dwigradcheck
│   ├── dwinormalise
│   ├── dwishellmath
│   ├── fixel2peaks
│   ├── fixel2sh
│   ├── fixel2tsf
│   ├── fixel2voxel
│   ├── fixelcfestats
│   ├── fixelconnectivity
│   ├── fixelconvert
│   ├── fixelcorrespondence
│   ├── fixelcrop
│   ├── fixelfilter
│   ├── fixelreorient
│   ├── fod2dec
│   ├── fod2fixel
│   ├── for_each
│   ├── gen_scheme
│   ├── label2colour
│   ├── label2mesh
│   ├── labelconvert
│   ├── labelsgmfix
│   ├── labelstats
│   ├── mask2glass
│   ├── maskdump
│   ├── maskfilter
│   ├── mesh2voxel
│   ├── meshconvert
│   ├── meshfilter
│   ├── mraverageheader
│   ├── mrcalc
│   ├── mrcat
│   ├── mrcentroid
│   ├── mrcheckerboardmask
│   ├── mrclusterstats
│   ├── mrcolour
│   ├── mrconvert
│   ├── mrdegibbs
│   ├── mrdump
│   ├── mredit
│   ├── mrfilter
│   ├── mrgrid
│   ├── mrhistmatch
│   ├── mrhistogram
│   ├── mrinfo
│   ├── mrmath
│   ├── mrmetric
│   ├── mrregister
│   ├── mrstats
│   ├── mrthreshold
│   ├── mrtransform
│   ├── mrtrix_cleanup
│   ├── mrview
│   ├── mtnormalise
│   ├── notfound
│   ├── peaks2amp
│   ├── peaks2fixel
│   ├── population_template
│   ├── responsemean
│   ├── sh2amp
│   ├── sh2peaks
│   ├── sh2power
│   ├── sh2response
│   ├── shbasis
│   ├── shconv
│   ├── shview
│   ├── tck2connectome
│   ├── tck2fixel
│   ├── tckconvert
│   ├── tckdfc
│   ├── tckedit
│   ├── tckgen
│   ├── tckglobal
│   ├── tckinfo
│   ├── tckmap
│   ├── tckresample
│   ├── tcksample
│   ├── tcksift
│   ├── tcksift2
│   ├── tckstats
│   ├── tcktransform
│   ├── tensor2metric
│   ├── transformcalc
│   ├── transformcompose
│   ├── transformconvert
│   ├── tsfdivide
│   ├── tsfinfo
│   ├── tsfmult
│   ├── tsfsmooth
│   ├── tsfthreshold
│   ├── tsfvalidate
│   ├── vectorstats
│   ├── voxel2fixel
│   ├── voxel2mesh
│   ├── warp2metric
│   ├── warpconvert
│   ├── warpcorrect
│   ├── warpinit
│   └── warpinvert
├── lib
│   ├── libmrtrix-core.so
│   ├── libmrtrix-gui.so
│   ├── libmrtrix-headless.so
│   └── mrtrix3
│       ├── algorithm.py
│       ├── app.py
│       ├── fsl.py
│       ├── image.py
│       ├── __init__.py
│       ├── matrix.py
│       ├── path.py
│       ├── phaseencoding.py
│       ├── __pycache__
│       │   ├── app.cpython-310.pyc
│       │   ├── image.cpython-310.pyc
│       │   ├── __init__.cpython-310.pyc
│       │   ├── path.cpython-310.pyc
│       │   ├── run.cpython-310.pyc
│       │   ├── utils.cpython-310.pyc
│       │   └── _version.cpython-310.pyc
│       ├── run.py
│       ├── sh.py
│       ├── utils.py
│       └── _version.py
├── share
│   └── mrtrix3
│       ├── 5ttgen
│       │   ├── FreeSurfer2ACT_sgm_amyg_hipp.txt
│       │   ├── FreeSurfer2ACT.txt
│       │   └── hsvs
│       │       ├── AmygSubfields.txt
│       │       └── HippSubfields.txt
│       ├── labelconvert
│       │   ├── aal2.txt
│       │   ├── aal.txt
│       │   ├── fs2lobes_cinginc_convert.txt
│       │   ├── fs2lobes_cinginc_labels.txt
│       │   ├── fs2lobes_cingsep_convert.txt
│       │   ├── fs2lobes_cingsep_labels.txt
│       │   ├── fs_a2009s.txt
│       │   ├── fs_default.txt
│       │   ├── hcpmmp1_ordered.txt
│       │   ├── hcpmmp1_original.txt
│       │   └── lpba40.txt
│       └── labelsgmfix
│           └── FreeSurferSGM.txt
└── src
    └── mrtrix3
        ├── _5ttgen
        │   ├── freesurfer.py
        │   ├── fsl.py
        │   ├── gif.py
        │   ├── hsvs.py
        │   └── __init__.py
        ├── blend.py
        ├── convert_bruker.py
        ├── dwi2mask
        │   ├── 3dautomask.py
        │   ├── ants.py
        │   ├── b02template.py
        │   ├── consensus.py
        │   ├── fslbet.py
        │   ├── hdbet.py
        │   ├── __init__.py
        │   ├── legacy.py
        │   ├── mean.py
        │   ├── mtnorm.py
        │   ├── synthstrip.py
        │   └── trace.py
        ├── dwi2response
        │   ├── dhollander.py
        │   ├── fa.py
        │   ├── __init__.py
        │   ├── manual.py
        │   ├── msmt_5tt.py
        │   ├── tax.py
        │   └── tournier.py
        ├── dwibiascorrect
        │   ├── ants.py
        │   ├── fsl.py
        │   ├── __init__.py
        │   └── mtnorm.py
        ├── dwibiasnormmask.py
        ├── dwicat.py
        ├── dwifslpreproc.py
        ├── dwigradcheck.py
        ├── dwinormalise
        │   ├── group.py
        │   ├── __init__.py
        │   ├── manual.py
        │   └── mtnorm.py
        ├── dwishellmath.py
        ├── for_each.py
        ├── gen_scheme.py
        ├── labelsgmfix.py
        ├── mask2glass.py
        ├── mrtrix_cleanup.py
        ├── notfound.py
        ├── population_template.py
        ├── __pycache__
        │   └── dwishellmath.cpython-310.pyc
        └── responsemean.py

@jdtournier
Member

This general idea might indeed end up being the cleanest solution we can come up with...

A few quick questions:

  • why are the algorithms in a separate src/ folder, and no longer in the lib/mrtrix3/ folder? I still reckon the relevant algorithms ought be labelled as e.g. mrtrix3.dwibiascorrect.ants - i.e. be submodules of the wider mrtrix3 module, in which case I would have expected them to be located within the lib/mrtrix3/ folder. Also, I would naively expect the src/ folder to contain C/C++ sources - particularly for a project like MRtrix which is primarily written in C++. Might be OK simply to rename src/ to something else like algorithms or something (though I still reckon they ought be within the mrtrix3 module hierarchy, personally).

  • Any chance we can simplify the code to be generated by cmake for each command? It feels quite a bit more verbose than I would have expected.

  • Having looked into the problem over the last week, and what different people seem to recommend, it really feels like the site_packages approach might be the cleanest overall. It's not all that different from what you have here, but would potentially simplify things considerably - and mirrors the way virtualenvs work.

That said, having looked at Python's own packaging page, there seem to be a million recommendations, all relying on different tooling and more or less suitable for different types of packages and environments. I think it's fair to say there's no good answer here...

@Lestropie
Member Author

why are the algorithms in a separate src/ folder, and no longer in the lib/mrtrix3/ folder?

  1. I've always felt slightly strange about those algorithms being in the "lib/" directory, since they're not "libraries" in the way I've always thought about them; in particular each file is only accessed by exactly one command. Not a problem, just ... a bit of an ick.
    But this also contrasts with your interpretation of "src/" being C++.
    Looking at the FHS Wikipedia entry, "lib/" links to "libraries", for which the definition perhaps doesn't have anything in there that completely violates such a usage. So maybe my ick about "lib" is unwarranted. Conversely, "src/" isn't even listed in FHS.

    If these algorithm source files were to move back to lib/ where they were before, the difference compared to pre-cmake is that the main command code that used to be in bin/ would now be alongside them. So there would be a slightly unusual result there in that eg. lib/mrtrix3/dwifslpreproc.py and lib/mrtrix3/app.py would live alongside one another. I'd prefer that there be a better separation between the API and the code for the commands.
    One option would be to have API exclusively as .py files and commands exclusively as sub-directories; so one would just have lib/mrtrix3/dwifslpreproc/__init__.py. Or maybe better would be to split into lib/mrtrix3/dwifslpreproc/usage.py and lib/mrtrix3/dwifslpreproc/execute.py. I'll muck around and see what I think might work, but very happy to take suggestions also.

  2. Trying to be prospective with regards to external projects. We may not yet have determined exactly how to deal with external projects following "Change build system to CMake" (#2689), which should perhaps be a priority since we can't tag 3.1.0 and just break external project capability. But I'm thinking about the case where one writes a Python command against the API, and that command makes use of the algorithm module. In this case, the external project would make reference to the API from the main MRtrix3 installation, and shouldn't need to make its own lib/ directory. But the source code for those algorithms needs to go somewhere. And it should be a relative path from the executable of the external project, not related to the installation location of MRtrix3 itself in any way.

    In retrospect after extending point 1 above, for external projects files would end up in lib/<projectname>/*, so a project could still be co-installed in the exact same location as MRtrix3 and there still wouldn't be conflict, or it could be installed in a different location and as long as the file in bin/ is generated correctly (ie. different locations for importing API vs. command code) it should be fine.

Any chance we can simplify the code to be generated by cmake for each command?

Probably. I just ripped #2735 to get the concept demonstration done.

... it really feels like the site_packages approach might be the cleanest overall.

I didn't completely follow the logic when you looked at this option in the meeting last week. I wanted to generate this proposal as it was on my mind but wasn't sure if it (or the differences to what you were thinking) were coming across. I need to figure out exactly what's being proposed there, and then do it and see if it's different to what you intended. From memory I did have some kind of concern with it at the time. I think it had to do with forcing creation of a python softlink and then what the guarantees were around the order of identifying modules. My suspicion right now is that it will just reduce the amount of code necessary in the bin/ executables, as the shebang-to-python-softlink will serve the same purpose as what's currently the first half of that code.

Also would need clarification on exactly what you intend by "the site_packages approach". It could be just the use of the shebang-to-softlink, or you could be proposing actually putting the Python API inside the user's Python site-packages directory.

@jdtournier
Member

On the lib/ vs. src/ folder issue: to my mind, the algorithms should definitely be considered as part of a 'library'. Really, that's all (computing) libraries are: a collection of functions, many of which will implement various algorithms. I don't think it makes a great deal of difference whether they're specific to one command or not, really. But ultimately, it's all a bit subjective. The main issue for me is the name src/: when you see it on the filesystem (and you do see it, I have a /usr/src on my Arch system), it's invariably to hold the source C/C++ code for various bits that may need to be recompiled at some point (e.g. kernel modules).

I have to admit, I've not so far given a great deal of thought as to how we might end up dealing with external modules... Good to see you've got your eye on that! But like you said, I think this would work regardless of whether it's in lib/ or src/.

As to how the site_packages idea is supposed to work, I'll try to sketch it out here, but bear in mind I could easily have misunderstood certain aspects... I've had a go at doing this in my local build/ folder, and after a fair bit of trial and error, this seemed to work, at least for dwi2response as a proof of concept. Note that it still doesn't actually run as there would need to be quite a few additional changes to the BIN_PATH, etc. but at least it finds the mrtrix3 module, etc.

├── bin
│   ├── dwi2response -> ../lib/mrtrix3/bin/dwi2response
│   └── ...
└── lib
   └── mrtrix3
       ├── bin
       │   ├── dwi2response                    <-- see contents below
       │   ├── ...
       │   ├── python -> /usr/bin/python
       │   ├── python3 -> python
       │   └── python3.11 -> python
       ├── lib
       │   └── python3.11
       │       └── site-packages
       │           └── mrtrix3
       │               ├── algorithm.py
       │               ├── app.py
       │               ├── dwi2response
       │               │   ├── dhollander.py
       │               │   ├── fa.py
       │               │   ├── __init__.py      <-- see contents below
       │               │   ├── manual.py
       │               │   ├── msmt_5tt.py
       │               │   ├── tax.py
       │               │   └── tournier.py
       │               ├── ...
       │               ├── __init__.py
       │               └── _version.py
       ├── lib64 -> lib
       └── pyvenv.cfg      <-- empty file, just needs to be present

lib/mrtrix3/bin/dwi2response

#!/home/donald/exp/mrtrix3/build/lib/mrtrix3/bin/python
# -*- coding: utf-8 -*-
import re
import sys

from mrtrix3.dwi2response import main

if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    sys.exit(main())

lib/mrtrix3/lib/python3.11/site-packages/mrtrix3/dwi2response/__init__.py

Contains everything that was in the original dwi2response script, with the final invocation at the end replaced with this:

...

# Execute the script
def main():
  mrtrix3.execute() #pylint: disable=no-member

Having gone through that exercise, I'm not sure I like this approach after all, it feels like an even more convoluted hack than anything we've discussed so far...

@jdtournier
Member

jdtournier commented Oct 31, 2023

Having looked at various posts and suggestions online, it seems to me that the simplest solution really is just to add the path to the mrtrix3 module by prepending to sys.path. It's probably the most common suggestion I've seen on the various tutorials I've come across...

I had a go at mocking this up, it feels a lot more sane to me, and pretty close to what we have already. See what you think:

├── bin
│   ├── ...
│   ├── dwi2response      <-- see contents below
│   └── ...               <-- note: no mrtrix3.py
└── lib
    └── mrtrix3
        ├── app.py
        ├── ...
        ├── dwi2response
        │   ├── dhollander.py
        │   ├── fa.py
        │   ├── __init__.py                <-- see contents below
        │   ├── manual.py
        │   ├── msmt_5tt.py
        │   ├── tax.py
        │   └── tournier.py
        ├── ...
        ├── __init__.py
        ├── ...
        └── _version.py

bin/dwi2response

This could be trivially generated by cmake, following your original suggestion:

#!/usr/bin/python3
# -*- coding: utf-8 -*-
import sys, os

sys.path.insert (0, os.path.join (os.path.dirname (os.path.realpath(__file__)), os.pardir, 'lib'))
from mrtrix3.dwi2response import main

if __name__ == '__main__':
    sys.exit(main())

lib/mrtrix3/dwi2response/__init__.py

Original contents of bin/dwi2response, with final invocation replaced with:

...

# Execute the script
def main():
  import mrtrix3
  mrtrix3.execute()

This one seems to work fully as expected out of the box... It doesn't solve the problem that the bin/ and lib/ folders need to be co-located, so this approach still can't be used to modify the python sources in place in the original source folder, but I guess that's something we may have to live with - did your original suggestion allow for that kind of use case?

As to handling external modules, it's a trivial addition to add one more entry to sys.path in the short script in the bin/ folder if required, which I reckon can be handled with a bit of cmake magic?
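
A sketch of what that extra entry could look like for an external project's wrapper, assuming the project keeps its own code in a lib/ directory next to its bin/ and that the location of the core MRtrix3 lib/ is supplied at generation time. `prepend_library_paths` is a hypothetical helper, not existing MRtrix3 API.

```python
import os
import sys


def prepend_library_paths(wrapper_file, extra_lib_dirs=()):
    """Put the wrapper's sibling lib/ directory, followed by any extra
    directories, at the front of sys.path (hypothetical helper)."""
    bin_dir = os.path.dirname(os.path.realpath(wrapper_file))
    paths = [os.path.normpath(os.path.join(bin_dir, os.pardir, "lib"))]
    paths.extend(os.path.normpath(p) for p in extra_lib_dirs)
    # Insert in reverse so that paths[0] ends up first in sys.path.
    for path in reversed(paths):
        sys.path.insert(0, path)
    return paths
```

A generated wrapper would call this with `__file__` and the core lib/ location before any `from mrtrix3... import main` statement.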

This was referenced Feb 26, 2024
@Lestropie
Copy link
Member Author

Need to have a go at restarting & resolving this one.

Reading latest comments from @jdtournier: I'm dissuaded by solutions that involve any content in the code repository residing in a directory called bin/. I feel like one of the benefits of the cmake transition is the better separation of code vs. executable tools; previously our bin/ directory being shared by Python code files stored in the repository and C++ compiled binaries generated by build was kinda hacky, and we needed weird tricks to deal with it.

It is also in retrospect kind of strange to have such a wide separation between the location of eg. dwi2response, and the location of the source files for the various algorithms. Previously it was somewhat motivated in that the former was executable whereas the latter was not; but with the actual executables being short cmake-generated files, it makes more sense to me to co-locate them.

So I think what I'm leaning towards at the moment, at least as far as repository filesystem structure is concerned, is:

└── python
    └── lib
        └── mrtrix3
            ├── __init__.py
            ├── app.py
            ├── ...
            ├── dwi2response      <- Uses algorithm module
            │   ├── __init__.py
            │   ├── usage.py
            │   ├── execute.py
            │   ├── dhollander
            │   │   ├── __init__.py
            │   │   ├── usage.py
            │   │   └── execute.py
            │   ├── fa
            │   │   ├── __init__.py
            │   │   ├── usage.py
            │   │   └── execute.py
            │   ├── manual
            │   │   ├── __init__.py
            │   │   ├── usage.py
            │   │   └── execute.py
            │   ├── msmt_5tt
            │   │   ├── __init__.py
            │   │   ├── usage.py
            │   │   └── execute.py
            │   ├── tax
            │   │   ├── __init__.py
            │   │   ├── usage.py
            │   │   └── execute.py
            │   └── tournier
            │       ├── __init__.py
            │       ├── usage.py
            │       └── execute.py
            ├── dwishellmath      <- Does not use algorithm module
            │   ├── __init__.py
            │   ├── usage.py
            │   └── execute.py
            └── utils.py

# Note: python/lib/mrtrix3/_version.py not shown, as it's only generated in the build directory

That would:

  • Co-locate all source code
  • Arguably be a more Python-conventional structure
  • Encourage functionalisation of different algorithmic components to reduce file size
    (currently looking at you, population_template...)
  • Mean that generation of the executable files would be based on only one source code location within the build directory, not two (eg. if split between */bin/* / */lib/* / */src/*)

Potential downside is that file names in your editor won't intrinsically convey which particular command is currently being edited. I've noticed that at least VSCode is quite good in that it shows more of the file path if you have concurrently opened two files with the same base name.


Please do chime in if you think I'm going down the wrong path.

The topics of:

  1. What should be the content of the cmake-generated executables;
  2. How these should add the appropriate filesystem path to sys.path; ie. explicitly or through softlink trickery;
  3. Whether these should be softlinked or copied to the build directory

can I think be handled separately once the filesystem structure is decided.

@Lestropie
Member Author

Whether these should be softlinked or copied to the build directory

Could / should this be toggled with an environment variable at the cmake configure stage?

@Lestropie
Member Author

Lestropie commented Mar 5, 2024

Actually, revising the proposal above. Within the code repository, there is no need to have the lib/mrtrix3/ sub-directory structure. The code repo could just have API files within python/, and only at build / install stage are they placed into lib/mrtrix3/.

Edit: Note that this was already the case with the current state of the code at time of writing (72ee94f).

@Lestropie Lestropie mentioned this pull request Mar 5, 2024
Lestropie and others added 2 commits March 6, 2024 19:04
- Do not split between "lib" and "src": All Python content goes into lib/mrtrix3/.
- Rather than having some executable scripts living as standalone .py files and others as sub-directories (to deal with separate algorithm files), place all files that are not part of the API into sub-directory trees. For now, all files have been renamed to __init__.py.
- For each script (or algorithm thereof), split code across at least usage.py and execute.py files.
- Remove Python scripts that are not based on the Python API.
@Lestropie Lestropie modified the milestones: 3.2.0 updates, 3.1.0 updates Mar 6, 2024
- Remove mrtrix3.algorithm module. Its operation was incommensurate with the transition to cmake, where dependencies are expected to be known prior to execution, as it scanned the filesystem at execution time for the sake of discovering any newly added algorithms. Most of the prior functionality is replaced with overloading function app.Parser.add_subparsers().
- Greatly simplify the content of cmake-generated executables for Python commands.
@Lestropie
Member Author

Lestropie commented Mar 8, 2024

Candidate further changes to this proposal are in #2850.

I think my mind is converging on the solution in terms of:

  • cmake-generated executable files, which have a minimal amount of code to them (#2850 reduces this drastically compared to what's currently in here at time of writing): just loading the version-matched mrtrix3 module based on a relative path from the executable, and passing the module for the executed command to app._execute(), which invokes usage() and execute() appropriately.
    • Going for a genuine "site-packages-like" solution would require someone with a proper understanding of how that all works. Personally I'm not invested in that level of homologation, especially for a piece of software that is primarily not written in Python.
    • Having cmake create a python softlink, in order to intrinsically include the relevant location in sys.path, feels sort of obfuscated. Having each file in bin/ do a sys.path.insert() is IMO far more explicit and transparent.
  • Code pertaining to individual commands living proximally to the API modules.

What is still up in the air is the filesystem locations of command source code relative to API source code. This has to take into consideration the fact that currently, some commands are a standalone .py, whereas others have "algorithms" each specified as a .py file in a subdirectory named according to the corresponding command. There have been a few options pushed around, so I want to make a list and try to draw opinions on them.

If we here assume a root of python/mrtrix3/, the questions look something like:

1. Sub-directory to separate commands from API modules?

__init__.py
app.py
...
utils.py
bin/    # or "scripts"
    # Command content appears here

vs.

__init__.py
app.py
...
# Command content interspersed
utils.py

In the scenario where some commands appear as a single .py file (see below), this results in a lack of distinction between shared utilities and individual commands; so I would only potentially favour the second option if all commands are stored in their own sub-directories, such that the classification of file vs. directory distinguishes the two.

2. Sub-directories for only algorithm-based commands, or all commands?

Assuming just for this example that:

  • Commands are stored in a sub-directory to separate them from the shared utilities;
  • For non-algorithm-based commands, all content is placed in __init__.py

__init__.py
app.py
...
utils.py
bin/
    5ttgen/
        __init__.py
        freesurfer.py
        ...
        hsvs.py
    ...
    dwinormalise/
        __init__.py
        group.py
        ...
        mtnorm.py
    dwicat.py
    ...
    responsemean.py
    

Note that you can't have a 5ttgen.py for the main interface and a 5ttgen/ sub-directory for the underlying algorithms; pretty sure the Python module loader doesn't like it.

vs.

__init__.py
app.py
...
utils.py
bin/
    5ttgen/
        __init__.py
        freesurfer.py
        ...
        hsvs.py
    ...
    dwinormalise/
        __init__.py
        group.py
        ...
        mtnorm.py
    dwicat/
        __init__.py
    ...
    responsemean/
        __init__.py

The former is a bit clumsy in cmake because generation of the executable has to branch depending on whether the usage() and execute() functions are to be found in python/mrtrix3/command.py or python/mrtrix3/command/__init__.py. It's also inconsistent just navigating the code; the latter is more consistent across commands. But it also means placement of a lot of source code in __init__.py files, which is not what they're supposed to be for.
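
On the earlier note about having both a 5ttgen.py and a 5ttgen/ sub-directory: as far as I can tell, CPython's default path-based finder does tolerate both on disk, but a directory containing an __init__.py takes precedence over a same-named .py file, so the standalone file would simply never be imported. A small self-contained check, using a made-up module name cmd_demo in a temporary directory:

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Same name as both a standalone module and a package directory
    (root / 'cmd_demo.py').write_text("ORIGIN = 'module file'\n")
    (root / 'cmd_demo').mkdir()
    (root / 'cmd_demo' / '__init__.py').write_text("ORIGIN = 'package directory'\n")

    sys.path.insert(0, str(root))
    try:
        importlib.invalidate_caches()
        cmd_demo = importlib.import_module('cmd_demo')
        origin = cmd_demo.ORIGIN
    finally:
        # Leave the interpreter state as it was found
        sys.path.remove(str(root))
        sys.modules.pop('cmd_demo', None)

print(origin)  # package directory
```

So the combination is not rejected outright, but it is a trap: the .py file would sit in the repository looking like the main interface while never being loaded.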

3. Split command code across multiple files?

Assuming 1. A separate directory for command source code, and 2. sub-directories for all commands:

__init__.py
app.py
...
utils.py
bin/
    5ttgen/
        __init__.py
        freesurfer.py
        ...
        hsvs.py
    ...
    dwinormalise/
        __init__.py
        group.py
        ...
        mtnorm.py
    dwicat/
        __init__.py
    ...
    responsemean/
        __init__.py

vs.

__init__.py
app.py
...
utils.py
bin/
    5ttgen/
        freesurfer/
            __init__.py
            execute.py
            usage.py
        ...
        hsvs/
            __init__.py
            execute.py
            ...
            usage.py
        __init__.py
        execute.py
        usage.py
    ...
    dwinormalise/
        group/
            __init__.py
            execute.py
            usage.py
        ...
        mtnorm/
            __init__.py
            execute.py
            usage.py
    dwicat/
        __init__.py
        execute.py
        ...
        usage.py
    ...
    responsemean/
        __init__.py
        execute.py
        usage.py

The latter:

  • Moves source code out of __init__.py;
  • Makes the filesystem storage of commands the same regardless of whether or not they contain separate internal algorithms;
  • Facilitates the separation of more complex script code across multiple source files if the developer wishes

It does however result in a lot of files called execute.py and usage.py, which need to be disambiguated by parent directory. While this was quite annoying in generating the proposal, typically a developer will be working on just one such command at a time, and so that will be less of an issue.

I note the possibility that for algorithm-based commands like 5ttgen and dwinormalise shown here, they could be just stored as individual .py files rather than each having its own sub-directory. However I think that would be self-contradictory: not only must they still obey the usage() & execute() structure, but such algorithms may themselves be highly complex, and so being able to split source code across multiple files may be beneficial.


If anyone has an opinion on any of these three separately, or has a preference for one specific combination, please do say so; otherwise I might need to just pull the trigger based on my own intuition / preference.

For clarity, what I ended up generating in #2850 has: 1. No child directory separating commands from utilities, combined with 2. All commands get their own sub-directory, such that the distinction between utility modules and command source code is done based on files vs. directories. Then for 3. I did the separation of source code into usage.py and execute.py for all commands, and a small amount of further code splitting for some commands. I'd also note that this structure results in:

  • Being able to remove pylint disable= directives for no-name-in-module;
  • More sensibly moving imports currently done at function scope to the outer scope, as pylint recommends, since the files themselves better define the scope of applicability of those imports;
  • Detection of some previously missed erroneous source code due to pylint previously not being able to assess the content of utility modules when parsing command source code.
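
For what it's worth, the files-versus-directories distinction is easy to exploit mechanically. The sketch below shows how a build step might enumerate API modules and commands under that layout; it is written in Python rather than cmake purely for illustration, and classify_sources is a hypothetical helper that exists neither in this PR nor in #2850.

```python
import tempfile
from pathlib import Path


def classify_sources(root):
    """Split the entries under `root` into (api_modules, commands):
    .py files are treated as API modules, directories holding an
    __init__.py as commands."""
    api_modules, commands = [], []
    for entry in sorted(Path(root).iterdir()):
        if entry.is_file() and entry.suffix == '.py':
            api_modules.append(entry.stem)
        elif entry.is_dir() and (entry / '__init__.py').is_file():
            commands.append(entry.name)
    return api_modules, commands


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Throwaway tree mimicking the layout described above
        for name in ('__init__.py', 'app.py', 'utils.py'):
            (root / name).touch()
        for command in ('5ttgen', 'dwicat'):
            (root / command).mkdir()
            for name in ('__init__.py', 'usage.py', 'execute.py'):
                (root / command / name).touch()
        print(classify_sources(root))
```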

Python cmake binwrappers: Split all commands into multiple files
@Lestropie
Member Author

Going to muse a little on the whole usage() / execute() split for point 3 above as implemented in #2850.
Don't think this is going to change anything in the immediate future, just want to write my thinking down for posterity.

This echoes how things have been done in MRtrix3 C++ commands since before the Python API existed. Over and above being a logical modularisation of a command-line executable, in C++, separation of usage() and run() allows standard execution of code in between those functions:

  • verify_usage()
  • parse_special_options()
  • Optional GUI initialisation
  • parse()

In Python, something reasonably similar occurs:

  • Check for special dunder options
  • argparse.parse_args()
  • Configure based on special options
  • (Not in linked code, but present in Python CLI changes #2678) Check output filesystem paths subsequent to parsing of -force
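
The last item in that list is the clearest case of something that can only happen after parsing. A minimal sketch, assuming a single positional output and a -force flag; parse_and_check is a hypothetical name and this is not the code from #2678:

```python
import argparse
import os


def parse_and_check(argv):
    """Parse the command line, then validate the output path once the
    presence or absence of -force is known."""
    parser = argparse.ArgumentParser(prog='demo')
    parser.add_argument('output')
    parser.add_argument('-force', action='store_true',
                        help='overwrite existing output files')
    args = parser.parse_args(argv)
    # The check has to follow parsing: whether an existing output is an
    # error depends on a flag that may appear anywhere on the command line.
    if os.path.exists(args.output) and not args.force:
        parser.error(f"output '{args.output}' already exists (use -force to overwrite)")
    return args


if __name__ == '__main__':
    print(parse_and_check(['some_new_output.mif']))
```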

It occurs to me that if supplanting argparse (#1608), this split may not strictly be necessary. The dunder options, setting of verbosity based on -debug / -info / -quiet, checking output filesystem paths after checking for the presence of -force, and ANSI code configuration could all be integrated into the parser class itself.

Where this would however break is in those commands that provide multiple algorithms. Configuration of the subparser per algorithm would be best done using a function per algorithm that configures the command-line parser for just that algorithm, arguably within the module defined by that algorithm as is currently performed.
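
A rough sketch of that per-algorithm configuration, using plain argparse rather than app.Parser. The usage_fa / usage_manual functions, the ALGORITHMS table and the option names are all illustrative assumptions, not the actual MRtrix3 interfaces:

```python
import argparse


# Hypothetical per-algorithm hooks; in the layout discussed above each
# would live in its own algorithm module.
def usage_fa(subparser):
    subparser.add_argument('-threshold', type=float, default=0.7)


def usage_manual(subparser):
    subparser.add_argument('in_voxels')


# Explicit table rather than a filesystem scan at execution time
ALGORITHMS = {'fa': usage_fa, 'manual': usage_manual}


def build_parser():
    parser = argparse.ArgumentParser(prog='dwi2response')
    subparsers = parser.add_subparsers(dest='algorithm', required=True)
    for name, configure in ALGORITHMS.items():
        # Each algorithm configures only its own subparser
        configure(subparsers.add_parser(name))
    return parser


if __name__ == '__main__':
    print(build_parser().parse_args(['fa', '-threshold', '0.5']))
```

Even if everything else were folded into the parser class, some per-algorithm function of this kind would still be needed, which is why the usage() half of the split survives for algorithm-based commands.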

This was referenced May 9, 2024
@Lestropie Lestropie mentioned this pull request Jun 9, 2024
@Lestropie
Member Author

Superseded by #2920.

@Lestropie Lestropie closed this Jun 9, 2024
@Lestropie Lestropie deleted the cmake_python_binwrappers branch June 22, 2024 07:40