Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow scripted modules to declare pip requirements and wrap pip dependencies. #7697

Closed
allemangD opened this issue Apr 15, 2024 · 4 comments
Closed
Assignees
Labels
Domain: build-system make-simple-things-simple "Make simple things simple and complex things possible" - to improve Slicer usability

Comments

@allemangD
Copy link
Contributor

allemangD commented Apr 15, 2024

This is tightly related to #7171 and #6913, but since this is more an RFC for a particular solution, so I'm creating a separate issue for discussion specifically on this proposal.


Install scripted modules python dependencies during module discovery to support new use cases:

Safely import pip-installed dependencies at the global level without catch ImportError: slicer.util.pip_install(...) wrappers.

Create scripted CLI modules which wrap python package entrypoints.

Supersede slicer.util.pip_install usage in scripted loadable modules to help prevent/debug environment breakage from incompatible dependencies.

Create scripted CLI modules implemented in multiple Python source files.

Proposed solution

When scripted modules are discovered (ie. $NAME.py or $NAME.xml is discovered) check for a corresponding $NAME-requirements.txt. When the module is initialized (lazily), check the dependencies and install. Show a summary and confirmation dialog to the user, similar to pip install when -y is not given. If installation is rejected, do not load the module.

For an extension to install $NAME-requirements.txt, they should use the RESOURCES argument to SlicerMacroBuildScriptedModule. This requires the $NAME-requirements.txt appear, with that name, in the source tree so that "installing" a module from source via module search path will still discover the file.

SlicerMacroBuildScriptedCLI does not currently have a RESOURCES argument, so additionally: add SCRIPTS and RESOURCES arguments and update it to use ctkMacroCompilePythonScript in the same way as SlicerMacroBuildScriptedModule. If neither RESOURCES nor SCRIPTS are provided, automatically add $NAME.py and $NAME.xml to each, respectively, for backwards compatibility.

Note - this also allows splitting CLI modules into multiple files and multiple scripts. Only the script with a matching $NAME.xml is discovered at runtime.

Add a convenience argument ENTRYPOINT to slicerMacroBuildScriptedCLI (with the same semantics as setuptools console script entrypoints) which is mutually exclusive with SCRIPTS. If provided, at configure time, generate a $NAME.py which invokes the function identified by ENTRYPOINT.

Examples

The current behavior would be unchanged:

slicerMacroBuildScriptedCLI(
	NAME Foo
)

slicerMacroBuildScriptedModule(
	NAME Bar
	SCRIPTS ...
	RESOURCES ...
)

Now it would be possible to explicitly list the script and xml for ScriptedCLI, or to declare additional scripts/resources:

slicerMacroBuildScriptedCLI(
	NAME Foo
	SCRIPTS Foo.py ...
	RESOURCES Foo.xml ...
)

Each module could declare a requirements.txt to be checked just before module initialization:

slicerMacroBuildScriptedCLI(
	NAME Foo
	SCRIPTS Foo.py ...
	RESOURCES Foo.xml Foo-requirements.txt ...
)

slicerMacroBuildScriptedModule(
	NAME Bar
	SCRIPTS Bar.py ...
	RESOURCES Bar-requirements.txt ...
)

And the CLI module could wrap a pip-installed entrypoint:

slicerMacroBuildScriptedCLI(
	NAME Baz
	ENTRYPOINT libbaz.run:main
	RESOURCES Baz.xml Baz-requirements.txt  # installs `libbaz`
)

Alternatives considered

  • Dependencies declared per-extension, rather than per-module.
  • Dependencies declared in a requirements.txt which is renamed to $NAME-requirements.txt.
  • Dependencies declared directly in CMake.
  • REQUIREMENTS_FILE or PIP_DEPENDENCIES or similar dedicated CMake macro argument.

All these suffer issues when developers "install" modules by adding the source tree to Slicer module path. Placing the dependencies in a $NAME-requirements.txt forces them to be picked up at module discovery and ensures they are updated as the file is edited.

Even if an extension developer does not build the project with CMake, the requirements are sure to be handled for users when the project is built and published via CI. If the developer forgets to declare the requirements file in RESOURCES, there would be a clear import error in user logs that the required packages are missing.

  • Include pip_install calls in the generated $NAME.py file for ENTRYPOINT.

This only supports the ENTRYPOINT mechanism. This invokes dependency resolution on every CLI invocation, not just on CLI instantiation. It is incompatible with installing modules from source via module path.

Future Planning

CLI module virtual environments

In the future if CLI modules get their own virtual environment, module initialization must create the virtual environment and install $NAME-requirements.txt to that virtual environment, then launch $NAME.py in the same environment. Behavior would be the same with or without the requirements.txt.

Slicer Constraints File

If a Slicer constraints file is provided to lock versions of built-in packages like numpy, vtk, SimpleITK, etc. it may be provided with pip -r $NAME-requirements.txt -c path-to-slicer-constraints.txt. Should not be done for CLI modules if they have their own virtual environments.

Dependency resolution errors

Discover all loadable module requirements.txt togethre and invoke pip once with all the requirements listed together? This would detect incompatibilities between modules and mitigate startup time cost - especially if using uv. Should not be done for CLI modules if they have their own virtual environments.

e.g. uv install -r Foo-requirements.txt -r Bar-requirements.txt -c path-to-slicer-constraints.txt; might detect an incompatibility between Foo and Bar dependencies. How should this error be handled? The user should have a means to downgrade or remove one or both of the infringing extensions.

Deprecate slicer.util.pip_install

Once requirements.txt is deemed mature, we should show a deprecation warning on slicer.util.pip_install to encourage extension developers to migrate to this mechanism; else dependency resolution will not be useful.

Questions

ENTRYPOINT would not be supported by install-from-source, since it depends on CMake configure time. Is this a problem?

Scripted Loadable modules are initialized during startup - this means all scripted modules dependencies would be resolved at startup. Is this overhead significant? We could mitigate this by:

  • using uv instead of pip
  • collecting all requirements and installing them together with multiple -r arguments. (This would also identify incompatibilities between extensions' dependencies).
  • doing a fast-check with pip freeze to avoid dependency resolution in the happy path when everything is already installed.

Should pip installation be done at extension installation time? How would from-source modules be handled? How would extension installation time module discovery be handled?

Should pyproject.toml be involved in any way? How do we handle from-source installation and module discovery? tomllib is available to Python in 3.11, or tomli for prior versions. Setuptools supports dynamic dependencies, but this is a beta feature.

This technically allows installing multiple .xml files in a single slicerMacroBuildScriptedCLI call. Should this be explictly forbidden? Should this be explicitly allowed? Should multiple ENTRYPOINT arguments be allowed?


I am working on a draft implementation of these changes but it is not yet complete. I invite comments and concerns in the meantime.

cc @jcfr @ebrahimebrahim

@allemangD allemangD added make-simple-things-simple "Make simple things simple and complex things possible" - to improve Slicer usability Domain: build-system labels Apr 15, 2024
@allemangD allemangD self-assigned this Apr 15, 2024
@pieper
Copy link
Member

pieper commented Apr 15, 2024

A few things:

  • Let's not deprecate pip_install when it's used in the python console
  • If you give the user a dialog asking them if they want to install a particular dependency version be sure to list what other modules (and Slicer itself) are using the installed version so they'll have an idea what might break if they go ahead

@allemangD
Copy link
Contributor Author

Note - this also allows splitting CLI modules into multiple files and multiple scripts. Only the script with a matching $NAME.xml is discovered at runtime.

This was based on a misconception on my part. I think supporting this properly is out of scope for this.

It is possible as long as the helper scripts are in sub-directories.

The issue is that both qSlicerScriptedLoadableModuleFactory and qSlicerCLIExecutableModuleFactory search cli-modules; so if a script is missing a corresponding .xml, it is skipped by qSlicerCLIExecutableModuleFactory but it is still recognized by qSlicerScriptedLoadableModuleFactory. It fails to load and shows as an error.

This isn't so easy to fix since all factories search all module paths. We could change this so that the CLI module factory searches only CLI paths, and the loadable module factories search only loadable module paths - but this might have unintended consequences. eg. in-source module loading might break.

Better probably to leave the module factory search paths as-is, and require helper scripts be placed in sub-directories or pip-installed. This is the same limitation we've always had in loadable modules. It might be nice to fix this or use some more declarative manifest for this, but either way is out of scope for this issue.

@allemangD
Copy link
Contributor Author

allemangD commented Apr 16, 2024

Notes from the weekly hangout today:

  • Strongly opposed to installations at startup time. Installations should be done at the very last minute possible ("Apply" button) or done during extension installation time.
  • It makes sense for CLI, but there are to many pitfalls for loadable modules. Remove loadable modules from scope in this.

Questions about conditional installation. requirements.txt doesn't support that but pyproject.toml does.

Comments on usability difficulties with CLI modules. There's usually a CLI and a scripted wrapper for a CLI. Maybe instead we just push for scripted modules in general. Maybe a new module type that uses a loadable-style GUI/Widget combined with a CLI-style Logic?

  • Benefit here is everything is in python-space and doesn't have a cost for startup/installation time. Costs only at runtime.

Do we even need CLI modules at all? Originally the intent was to provide an easy on-ramp for people to add functionality. That's been superseded by Python interaction. Do we even want to keep the CLIs long-term?

See https://napari.org/stable/plugins/first_plugin.html for another project plugin structure.

Really the discriminator is in the environment for the logic of the thing. Need first-class:

  • CLI-as-logic
    • wrap a cli executable as an invokable logic for a module
  • logic-with-pip-dependencies
    • dependency installation only when the logic is instantiated
  • logic-in-environment (venv? conda?)

"The module logic should almost always be a pip-installable module"

Moving module logic into a separate file impacts reload functionality. Not an issue if the logic runs in a separate process (ie CLI) but it requires chain-reload or other unreliable complexity.

@allemangD
Copy link
Contributor Author

allemangD commented Apr 16, 2024

I'm closing this issue and will open a revision, rather than heavily editing this one. Plan to pivot to managing environment for module logic rather than strictly for CLI. Implementation of this will hopefully be more straightforward and extensible in the future. It also avoids the issues around module discovery paths I described in #7697 (comment).

Basically we split the module into a "loadable" part - module description and widget - and a "runtime" part - logic. The "loadable" part is very sensitive to import order and environment. Only the runtime part really needs the installable dependencies.

Critical questions:

  • How are the dependencies listed? requirements.txt? pyproject.toml?
  • How is the environment for the module logic described? venv? conda? poetry? Are multiple backends allowed?
  • What is the structure for the "runtime" part that prevents discovering it as loadable module?

Critical support infrastructure:

  • Wrap a CLI module or other command-line entrypoint as a "runtime" part.
  • Transparently invoke the "runtime" part from the "loadable" part, triggering dependency resolution and user confirmation only if necessary. Support type hints.

Perhaps we install the "runtime" parts as distribution packages and use importlib and importlib.metadata to get information about them? Not sure if this is a good idea.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Domain: build-system make-simple-things-simple "Make simple things simple and complex things possible" - to improve Slicer usability
Development

No branches or pull requests

2 participants