Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

External projects for 3.1 #2901

Open
Lestropie opened this issue May 13, 2024 · 19 comments
Open

External projects for 3.1 #2901

Lestropie opened this issue May 13, 2024 · 19 comments
Labels
Milestone

Comments

@Lestropie
Copy link
Member

Given that the software is openly broadcast as being capable of supporting compilation of external projects making use of the API, ideally prior to tagging 3.1 we should have a template demonstrating how such external projects could work following 3.1 release.

@Lestropie Lestropie added this to the 3.1.0 updates milestone May 13, 2024
@daljit46
Copy link
Member

daljit46 commented May 14, 2024

I think it'd be really helpful to spell the exact requirements of "external projects", since there are countless ways to implement an extension system. Previously, I made a small test example that shows how can one use Git submodules to write their own commands in a separate repo. This works for C++ commands only. A similar approach could work for Python executables too I guess, at least looking at something like tracfinder.
@MRtrix3/mrtrix3-devs anyone has any further inputs on this?

@Lestropie
Copy link
Member Author

https://github.com/dchristiaens/shard-recon has both C++ and Python commands.

I wonder if the first port of call is whether external projects should be entirely self-contained, or should make use of an already built version of MRtrix3. On 3.0.x, we make a softlink to build, which self-detects that it's being executed for an external project, and links against the shared library / finds #included files in either the linked core or the local project code. That doesn't translate trivially to cmake. But there's nevertheless possibly two prospects that don't involve major changes to the core:

  1. Building of an external project entirely encapsulates the MRtrix3 version it depends upon.

    • +ve: Provides strong version matching guarantees
    • +ve: Existing cmake ecosystem to handle it
    • -ve: Redundant compilation for items from the API used by the external project
    • -ve: Possible path conflicts. To my knowledge, a user couldn't have within PATH both a shared DLL for an MRtrix3 core installation, and another shared DLL generated for compilation of an external project, and those two DLLs could have the same filename but be of different versions. Could insert the name of the external project into the DLL(s) it generates to disambiguate? Or external project builds never use shared DLLs? Other solutions?
  2. Building of an external project takes as inputs the locations of MRtrix3 API code and installed shared binaries.

    • -ve: Slightly clumsy; the old build softlink technique intrinsically captures both code and shared DLLs, whereas with cmake they'd need to be separate paths provided separately.
    • -ve: User could attempt to build external project against any version of MRtrix3, potentially incompatible, outside of control of the external project maintainer.
    • -ve: Guessing it's not conventional in a cmake environment.
    • +ve: More familiar to MRtrix3 precedent.
    • +ve: Reduced redundancy of compilation.

I've a feeling that we'd discussed and were drawn to 1., but don't recall if it was definitive.

Our code might be unusual in that an external C++ command might make use of both a precompiled core DLL and templated source code from the core. Altering for explicit instantiation of plausible templates and possibly a different resolution of shared library generation might modify this, but those might be much larger proposed changes. Plus there's situations where an external project may have some unforeseen template utilisation, which explicit instantiation into a shared DLL wouldn't cover.

(Can edit in light of suggested modifications or complete alternatives)

Does anyone know of any other neuroimaging analysis packages that support any form of "extensions" where we might see precedent / success or horror stories?

@daljit46
Copy link
Member

daljit46 commented May 14, 2024

-ve: Possible path conflicts. To my knowledge, a user couldn't have within PATH both a shared DLL for an MRtrix3 core installation, and another shared DLL generated for compilation of an external project, and those two DLLs could have the same filename but be of different versions. Could insert the name of the external project into the DLL(s) it generates to disambiguate? Or external project builds never use shared DLLs? Other solutions?

Hmm, I can't fully grasp whether this would be an issue. Do we expect users to install external commands in the same location as a standard MRtrix3 installation (considering also that our new packaging strategy would be to provide an extractable tarball as the standard way of distributing the project)? Our commands are installed with a relative rpath set to ../lib so if two different mrtrix3 installations are in PATH (but in separate directories) then things should still work as expected, right?

@daljit46
Copy link
Member

Does anyone know of any other neuroimaging analysis packages that support any form of "extensions" where we might see precedent / success or horror stories?

Looking around, I found that Slicer and MIRTK support extensions (although their project structure is quite different from ours).

@Lestropie
Copy link
Member Author

If shared libraries are found via relative paths, then that would I think alleviate path conflicts; unless someone installs both core and extensions directly to eg. /usr/bin. Would eg. Anaconda installations potentially have issues like this?

I'm being somewhat speculative here, I'm not across every possible use case. Mostly cognizant that if we propose something that prevents what people are already doing with 3.0.x, there may be unhappiness.

I'm trying to crunch out #2678 so that I can then do (possibly redo) #2741 dealing with conflicts. The outcome there will colour how to deal with Python commands for external projects here.

@Lestropie
Copy link
Member Author

For external projects with Python scripts, run.command() may be employed to execute MRtrix3 commands, and there may be an expectation that the versions of MRtrix3 commands being invoked are version-matched to what the project maintainer expects the MRtrix3 version to be. To guarantee expected execution, then however the external project mechanism works, it would need to be compiling whatever set of MRtrix3 commands may be invoked by commands within that external project, and installing them to whatever location the external project is installed.

@Lestropie
Copy link
Member Author

MRtrix3 C++ binaries that are built version-matched for the external project need to be available for Python scripts to execute via run.command(), but should not be in PATH.

@daljit46
Copy link
Member

daljit46 commented May 17, 2024

MRtrix3 C++ binaries that are built version-matched for the external project need to be available for Python scripts to execute via run.command(), but should not be in PATH.

Why not? My thinking is along the following. When MRtrix3 is built as a part of an external project, the installation directory would contain two folders: bin and mrtrix3 (and potentially a lib folder but let's ignore that).
The mrtrix3 folder will contain all the content of a standard mrtrix3 installation directory except for C++ and Python binaries.
The bin folder will contain all C++ executables (default + custom) with an RPATH set to ../mrtrix3/lib and all Python binaries (again default + custom).
Then to use the commands built by the project the developer of the extension will add installation/bin to PATH and run their commands as usual. The only change that would be needed here would be to allow the Python executables to locate the mrtrix3 package.

Is there a reason why the above wouldn't work?

@jdtournier
Copy link
Member

Is there a reason why the above wouldn't work?

The issue isn't whether it would work - it would, as far as the external project is concerned. It's about ensuring that the user's environment doesn't suddenly start using a different version of MRtrix simply because they've added some external project to their PATH.

External projects are bound to be relying on outdated versions of MRtrix. If these are listed first in the PATH, they'll take precedence over the user's main install, and cause no end of confusion. The external projects should only expose the commands that are specific to them, nothing else.

@daljit46
Copy link
Member

daljit46 commented May 17, 2024

External projects are bound to be relying on outdated versions of MRtrix. If these are listed first in the PATH, they'll take precedence over the user's main install, and cause no end of confusion. The external projects should only expose the commands that are specific to them, nothing else.

Ok, but I think there is a problem (at least in the way I envisage a user of an extension using MRtrix3) with this approach then. If the user expects their commands to work a specific version of MRtrix3 then surely they also expect the C++ executables in their Python scripts to be built against that specific version. The strategy I just outlined above ensures that this is the case, but, unless I'm misunderstanding something, not if PATH contains C++ binaries not built by the developer of the extension system.

@jdtournier
Copy link
Member

We've long had mechanisms in place to ensure that the python scripts always use the specific executables that match their install - even if that isn't in the PATH. If you look at the machinery in the python run.command() function, you'll see that it always searches first in the expected bin folder that matches the script's own location, only reverting back to PATH if nothing matches in there. We've ourselves been caught out by trying to run our (development) scripts (which are rarely in the PATH) and wondering why things don't work as expected, which is why there are these additional checks in place.

What we'd need to do here is simply extend that machinery to look in the extension's main bin/ folder (same place as the script itself), then in the ../mrtrix3/bin/ folder, and only rely on the PATH if nothing matches.

@daljit46
Copy link
Member

daljit46 commented May 17, 2024

Ok yes that makes sense. For some reason, I was under the impression that executables in PATH would take precedence over those found relatively, when it's actually the other way around.

@daljit46
Copy link
Member

A question that came to my mind is: how would this all work on Windows? RPATH isn't available, but you could theoretically use SetDllDirectoryA to achieve something similar.

@daljit46
Copy link
Member

daljit46 commented May 21, 2024

Here is how I thought an extension developer could write their CMakeLists.txt to create an extension that both has C++ and Python code:

cmake_minimum_required(VERSION 3.16 FATAL_ERROR)

project(TestExtension VERSION 0.0.1 LANGUAGES CXX)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

include(MRtrix3Extension.cmake)

add_executable(extension_cmd1 extension_cmd1.cpp)

add_library(extension_lib STATIC
    extension_source1.cpp
    extension_source2.cpp
)

target_link_libraries(extension_cmd1 PRIVATE
    ${MRtrix3_LIBRARIES}
    extension_lib
)

target_link_libraries(extension_lib PRIVATE
    ${MRtrix3_LIBRARIES}
)

add_mrtrix_python_cmd(PythonCMD extension_python_cmd1.py)
add_mrtrix_python_lib_sources(PythonLibSources
    extension_python_lib1.py
    extension_python_lib2.py
)

install_mrtrix3_extension(
    CPP_COMMANDS extension_cmd1
    PYTHON_COMMANDS PythonCMD
    PYTHON_LIB_SOURCES PythonLibSources
)

One question to ponder on is whether the developer should have full freedom to define their directory structure (perhaps limiting ourselves to giving guidelines on how an extension should be structured) or being more stringent and enforcing a fixed way of doing things.

@daljit46
Copy link
Member

In #2920, we implemented the possibility of creating either as standalone files or within various files within a directory. Now, during the build and installation stage, Python commands are executed from bin but their code can reside inside ${PROJECT_BUILD_DIR}/lib/mrtrix3/commands/. This works because a command expects the MRtrix3 library files to be in ../lib. For external projects, this cannot be the case as we intended to separate the core mrtrix3 code into a mrtrix3 folder inside the installation directory so that the library files for a Python command will be located at ../mrtrix3/lib.
I've been thinking about what would be the best way to handle this. Do we want to firmly keep a clean separation between the core mrtrix3 code or would it be permissible for the installation code of an external project to also reside in the mrtrix3 folder? I think this would allow the logic to be simpler, so I'm unsure if it's worth striving for "purity".
Either way, some extra CMake logic will be needed to deal with the different locations of external project commands and core mrtrix3 commands.
@Lestropie any feedback on how to handle this best?

@Lestropie
Copy link
Member Author

The way I think I had envisioned it was that, for an external project called "project", with a single-file Python command called "cmdpython", there would be:

bin/
    cmdpython
lib/
    mrtrix3/
        app.py
        ....
    project/
        commands/
            cmdpython.py

The CMake function that generates bin/cmdpython would need to be modified to support the case where it loads the MRtrix3 API from one location and the command usage() and execute() functions from another. Once that's working, it should also be made possible for lib/project/ to contain custom libraries used across multiple commands, and for the different per-command filesystem structures provided in #2920 to be applicable to external project commands.

In retrospect, while this works just fine for a build directory, there is a potential issue if the installation directory coincides with a location where there already exists a core MRtrix3 installation. One way to get around this would be to rename lib/mrtrix3/ to a full Git description of the version of MRtrix3 that that project is configured to build against; eg. lib/mrtrix3-<tag>/ or lib/mrtrix3-<tag>-###-g########/.

If that either doesn't work or is not preferred, it might be worth a quick call to flesh out ideas, as it can get clunky trying to explain filesystem structure via written text.


PS. As an aside: With #2920 & #2824, should C++'s "cmd/" and Python's "commands/" have the same name?

@Lestropie
Copy link
Member Author

In looking into #2960, I realised that on dev there's still this code within the Python parser, which is still explicitly executing git to get a version number. Part of the intention there is to be able to report a version identifier for the external project, rather than the MRtrix3 core it is executing against.

  1. Ideally that shouldn't be executing at all in core when unnecessary; that's an error on my part, and not specific to this Issue.

  2. What is specific to this Issue is how to obtain version information and report it in the interface. So in the above example, there would need to be something like:

    bin/
        cmdpython
    lib/
        mrtrix3-<tag>/
            app.py
            ....
            version.py
        project/
            commands/
                cmdpython.py
            version.py
    

    What I'm not sure about is how to make the external project's version information visible to the core API so that the CLI reports it.

@daljit46
Copy link
Member

Is there any benefit of separating the code within lib into mrtrix3 and project though? The user of the project is completely unaffected by this and I see no practical downsides.

If that either doesn't work or is not preferred, it might be worth a quick call to flesh out ideas, as it can get clunky trying to explain filesystem structure via written text.

On this, I think if we want to maintain FHS guidelines (personally I'm not a fan of FHS, I think it has more cons than pros and really just used for historical reasons at this point), we should just consider isolating all the library installation code for a project into lib/project rather than creating two separate folders. This would be cleaner IMO.

@Lestropie
Copy link
Member Author

I would have thought that from a code perspective it would be preferable for module access to be done through "mrtrix.*" and "project.*". Saying "from project import app" to import the MRtrix3 app module feels wrong to me.

Though in retrospect I suppose that the former wouldn't work if specifically within lib/ the MRtrix3 installation directory needs to be potentially disambiguated from other MRtrix3 installed in the same location, as I'd proposed earlier.


What about if the MRtrix3 Python API were to go into lib/project/mrtrix3/? That would be stringently tied to the external project. As long as it's still possible for external project code (whether in lib/project/*.py or lib/project/commands/*) to do "from mrtrix3 import *", that would be suitable. The CMake-generated executables might need additional security to make sure that such imports would pull from the project-tied MRtrix3 API rather than anything else that may have made its way into the Python path. Also the CMake-generated Python executables for MRtrix3 commands would need to point to lib/project/mrtrix3/commands/*.

So it would look like:

bin/
    mrtrix3cppcmd
    mrtrix3pythoncmd -> imports project.mrtrix3.commands.mrtrix3pythoncmd.py
    projectcppcmd
    projectpythoncmd -> imports project.commands.projectpythoncmd.py
lib/
    project/
        mrtrix3/
            commands/
                mrtrix3pythoncmd.py
            app.py
            ...
            utils.py
        commands/
            projectpythoncmd.py
        projectpythonmodule.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants