Dependency Management Strategies... #741
Comments
what I am guessing from the above:
Does anybody have any literature to review on this topic?
Another, related question: should we make more dependencies optional, and degrade further to allow simpler installation when dependencies are hard to resolve (so HPC is no longer a special case)? Degrade when watchdog is missing, when appdirs is missing, etc.
Should we make feature enablement an additional switch, beyond mere presence of a module? Maybe when appdirs is present, people still don't want to use it?
There is currently a mix of all these approaches in the code, which is a bit incoherent. It might be good to increase consistency.
Platform-dependent dependencies were tried in the past. At one point it appeared that the dependencies were evaluated when building the wheel, rather than when installing, so one needed a wheel built on Windows and another built on Linux... but the details of that testing are lost to time. We likely want to validate this again.
If this makes sense to people as an approach, it will need to be documented.
As an example of dependency shenanigans, while trying to build a Windows executable:
so then I edit the file and remove the netifaces package download. When running on Windows with the executable built this way:
the vip line will say absent, and the description will say it will not be able to use the vip option for high availability clustering. Even if that had worked, the pynsist generation just failed later anyway.
In previous releases, it would complain only at run time, when someone tried to use it, and gave no hint as to what was missing. With the features... um... feature, at least the package can be interrogated.
Similarly, on Red Hat, it is not possible to get an OS package for the python watchdog package. So sr3 should report that the watch feature is missing. If a Red Hat user chooses to install watchdog in their account via pip, they can then confirm with sr3 features that sr3 will use that package.
I have now built a self-extracting executable and installed it on Windows, and the result is:
In an experimental build: https://hpfx.collab.science.gc.ca/~pas037/Sarracenia_Releases/ see release 3.00.42_pre1
xattr has been folded into the proposed features function, as has paramiko... so consistency for the modules that give us trouble has been improved.
with the last patches:
The functionality degrades nicely if paramiko is missing now. The question is: do we remove the hard dependency?
So now the idea is to go through the list of all dependencies and, for each one, analyze what functionality we lose and degrade nicely, or list them as part of existing features.
So in the end, when deps are missing: don't crash, just work worse (degrade, and do everything else that you can without the feature). Also: report the breakage via the sr3 features command.
After an audit of all the dependencies, it seems that they are all used for very specific purposes, and putting an if features['x']['present'] guard around their usage is not that big of a problem. It also makes things...
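The guard pattern mentioned above can be sketched as follows. The dict layout and feature name here are assumed simplifications, not sarracenia's exact data structures:

```python
# Sketch of the feature-guard pattern (the dict layout is an assumed
# simplification, not sarracenia's exact data structure).
features = {'xattr': {'present': True}}

try:
    import xattr  # optional dependency
except ImportError:
    features['xattr']['present'] = False

def save_metadata(path, key, value):
    # degrade gracefully: skip extended attributes when xattr is absent
    if not features['xattr']['present']:
        return False
    xattr.setxattr(path, key, value)
    return True
```

Callers keep working either way; when the module is absent the function quietly does less, and the features dict records that fact so it can be reported.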
I've gone through the pull request and the issue description/comments. I like the changes you've added. In the documentation, I think we should encourage users to run sr3 features right after an install, just so they can make sure that all the dependencies they want/need are present. I may have missed it in the PR, but did you add some output to the logs? It could be helpful for analysts to have the sr3 features output in the logs when debug mode is run, or something of the like.
released as part of 3.0.42 |
The Problem
Sarracenia uses a lot of other packages to provide functionality. These are called dependencies. In its native environment (Ubuntu Linux) most of these dependencies are easily resolved using the built-in Debian packaging tools (apt-get), but in many other environments it is more complex, like: https://xkcd.com/1987/ Even in environments where dependencies are installed somewhere, it is not always clear which ones are available to a given program.
On redhat-8, for example, there does not seem to be a wide variety of python packages available in the operating system repositories. Rather, the specific minimal packages needed for the OS's own uses of python are all that seem to be available. This makes it challenging to install on Red Hat, as one now has to package many dependencies as well as the main package. The typical approach is to hunt for individual dependencies in different third-party repositories, or rebuild them from source... This is a bit haphazard, and in some cases, like watchdog or dateparser, the dependency itself has dependencies, and one ends up having to create dozens of python packages.
On Red Hat, as in many other environments, it seems more practical to use python-native packaging rather than the incomplete OS packages, since pip does dependency resolution and all the dependencies can be brought in with it. The result of this, if done system-wide, is a mix of distro packages and pip-provided packages, which complicates auditing and patching. System administrators may also object to the use of pip packages in the base operating system.
Windows is another example of an environment where pre-existing package availability is unclear. On Windows, the natural distribution format would be a self-extracting EXE, but how plugins work with such a method is unclear, and all the dependencies need to be packaged within it. People also install python distributions such as ActiveState, Anaconda, or the more traditional CPython, and each of those has its own installation methods.
The complications mostly arise from dependencies such as xattr, python3-magic, watchdog, etc., that is, packages that are wrappers around C libraries or use C libraries as part of their implementation. In these cases, pure python packaging often fails, as more environmental support is needed. For example, the python-magic package requires the C library libmagic1 to be installed. If using OS packages, this is just an additional dependency, no problem, but with pip it will just fail, and the user needs to find the OS package, install that, and then try installing the python package again.
Another complication results from all these different platforms having different methods of installation, which means it is not obvious what advice to give users when a dependency is missing: pip install? conda install? apt install? yum install? The package naming conventions vary by distribution, and differ from the module names used to test their presence.
Approaches to Dependency Management
Manual Tailoring
For HPC (which runs redhat 8.x) there are a few dependencies brought in by EPEL packages, some built from source, but some had to be left out. The setup.py file, when building packages on Red Hat, is typically hand edited to work around packages that are not available. So manual editing of packages is done. After the RPM is generated, it is then tested on another system, and as a different user, to see whether it runs (as the local user doing the build may have pip packages which provide dependencies not available to others).
implementation: manual editing of setup.py to remove dependencies.
(Mostly) Silent Disable
Looking at xattr, the import is in a try/except, and if it fails, the storing of metadata in extended file attributes is disabled. There is a loss of functionality, or different behaviour, on these systems as a result. There is no way to query the system for which degrades are active, and nothing to prompt the user about what to do to address it, if they want to.
implementation in filemetadata.py:
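A sketch of this silent-disable pattern follows; it is a simplification for illustration, and the real filemetadata.py differs in detail (class names and method signatures here are assumptions):

```python
# Sketch of the (mostly) silent-disable pattern (simplified; the real
# filemetadata.py differs in detail).
supports_extended_attributes = True
try:
    import xattr
except ImportError:
    supports_extended_attributes = False

class FileMetadata:
    def __init__(self, path):
        self.path = path

    def get(self, key):
        # silently return nothing when the dependency is missing
        if not supports_extended_attributes:
            return None
        try:
            return xattr.getxattr(self.path, key)
        except OSError:
            return None
```

The drawback described above is visible here: a caller cannot easily distinguish "no metadata stored" from "the xattr module is not installed".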
There are also tests in sarracenia/__init__.py for the code to degrade/understand when dependencies are missing:
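The detection can be imagined roughly like this; the feature names and table structure below are hypothetical, not necessarily what sarracenia/__init__.py actually contains:

```python
import importlib

# Hypothetical table mapping feature names to the modules they need;
# the real names and structure in sarracenia/__init__.py may differ.
feature_modules = {
    'vip': ['netifaces'],
    'watch': ['watchdog'],
    'filetypes': ['magic'],
}

features = {}
for name, modules_needed in feature_modules.items():
    present = True
    for m in modules_needed:
        try:
            importlib.import_module(m)
        except ImportError:
            present = False
    features[name] = {'present': present, 'modules_needed': modules_needed}
```

A report command can then just walk the features dict and print present/absent for each entry, which is what makes the package interrogable.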
Demotion to Extras
The Python Packaging tool has a concept of extras, sort of the inverse of batteries included... in setup.py one can put extras that are available with additional dependencies being installed:
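A sketch of what extras look like; the group and package names here are illustrative, not sarracenia's actual extras:

```python
# Sketch of the "extras" concept: optional dependency groups declared
# in setup.py (group and package names here are illustrative).
extras_require = {
    'vip': ['netifaces'],           # pip install sarracenia[vip]
    'filetypes': ['python-magic'],  # pip install sarracenia[filetypes]
    'all': ['netifaces', 'python-magic'],
}
# in setup.py this dict is passed as:
#   setup(..., extras_require=extras_require)
```

With extras, a bare pip install brings in only the hard dependencies, and users opt in to the heavier ones per feature.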
Platform Dependent Deps
One can add dependencies that vary depending on the platform being installed on.
( this is in the v03_issue721_platdep branch)
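Platform-dependent dependencies are usually expressed with PEP 508 environment markers; a sketch, with illustrative package choices rather than sarracenia's actual list:

```python
# Sketch of platform-dependent dependencies using PEP 508 environment
# markers (package choices are illustrative, not sarracenia's list).
install_requires = [
    'appdirs',
    'netifaces; sys_platform != "win32"',  # skip on Windows
    'pywin32; sys_platform == "win32"',    # Windows only
]
# in setup.py: setup(..., install_requires=install_requires)
```

Markers like these are preserved in the wheel's metadata and evaluated by pip at install time, which should avoid the earlier problem of dependencies being baked in when the wheel is built; this is worth re-validating as noted above.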
What do we do?
So all of the approaches above (and perhaps others?) are used in the code, and someone using an installation will have a subset of functionality available, and sr3 has no way of reporting what is available or not. There is a branch, #738, that provides an example report of available modules using an sr3 extras command.
Should we at least report what is working and what isn't? An additional problem is that configured plugins may have additional dependencies. The mechanism in the pull request also provides a way for plugins to register those, so they show up in the inventory command.
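One way plugins could register extra dependencies so they appear in the report might look like this; the registration API below is entirely hypothetical, not necessarily the one in the pull request:

```python
import importlib

# Entirely hypothetical registration API: a sketch of how plugins
# could declare extra dependencies so they appear in a features report.
features = {}

def register_feature(name, modules_needed):
    # check whether every module the feature needs can be imported
    present = True
    for m in modules_needed:
        try:
            importlib.import_module(m)
        except ImportError:
            present = False
    features[name] = {'present': present, 'modules_needed': modules_needed}

# a plugin registering a dependency that is unlikely to be installed:
register_feature('my_plugin_feature', ['module_that_does_not_exist_xyz'])
```

The core package and the plugins then share one inventory, so a single report covers both built-in and plugin-supplied features.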
Is this a reasonable/advisable approach?