Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always set $EBPYTHONPREFIXES instead of PYTHONPATH #2887

Open
Flamefire opened this issue Feb 10, 2023 · 9 comments
Open

Always set $EBPYTHONPREFIXES instead of PYTHONPATH #2887

Flamefire opened this issue Feb 10, 2023 · 9 comments
Milestone

Comments

@Flamefire
Copy link
Contributor

Flamefire commented Feb 10, 2023

A short time ago I did a major replacement of setting $EBPYTHONPREFIXES instead of PYTHONPATH on the modules on our HPC cluster because the latter does not play nice with virtualenvs: The $PYTHONPATH packages take precedence over the virtualenv making it impossible for users to update/downgrade packages (in their virtualenv by explicitly issuing the requests via e.g. pip).

Hence we should always set $EBPYTHONPREFIXES (in PythonPackage/PythonBundle) or at least provide an option to set this for each site.

@boegel boegel added the change label Feb 15, 2023
@boegel boegel added this to the 4.x milestone Feb 15, 2023
@smoors
Copy link
Contributor

smoors commented Mar 3, 2023

i personally think it is a good thing that users cannot upgrade/downgrade packages that are already loaded with modules:

  • for reproducibility
  • for performance

also, if you don't want to use the module provided version, why load the module in the first place?

so i think if this is implemented it should indeed be optional

@Flamefire
Copy link
Contributor Author

i personally think it is a good thing that users cannot upgrade/downgrade packages that are already loaded with modules:

The point is: They can. It just silently does not work. See below

if you don't want to use the module provided version, why load the module in the first place?

Because the module might contain a dependency which is not the (primary) reason for using that module, and that might need changing.

Example: TensorFlow requires a lot of additional Python packages. E.g. for 2.11 we have tensorboard-plugin-profile==2.11.1

After loading the relevant module(s) the user creates a virtualenv and installs additional packages with pip into that as required. What is already part of the module(s) will be skipped from installation --> All good so far.

Now suppose that they release a version 2.11.2 of the package with a fix we didn't anticipate when creating the easyconfig but is for some reason important for users.

  • If they do pip install tensorboard-plugin-profile nothing will happen as that package is already installed.
  • pip install --upgrade tensorboard-plugin-profile==2.11.2 will download and install that package into the virtualenv where it will be simply ignored due to our use of $PYTHONPATH in the module file. So the user might rightfully think that the new version is used while it is not possibly wasting a lot of compute resources until an issue occurs rendering the results useless.

A similar thing happens, when a user installs some additional python package into the virtualenv which has a dependency on a Python package contained (as a dependency/extension) in a module. But the users python package requires a different version. pip will resolve the version requirements and show that it installs a different version into the virtualenv. But again although it looks like all is good, it is not: The package in the virtualenv will be ignored.

This has actually caused issues for us in the past which led me to "fixing" all module files. So this is not theoretically.

so i think if this is implemented it should indeed be optional

Probably, but the default should be to use $EBPYTHONPREFIXES as this is/was actually an unannounced, breaking change, which I'd call a bug, caused by us switching from the Python2/3 multi-dep approach (which used $EBPYTHONPREFIXES ) to suffixed ECs (which use $PYTHONPATH) and the implication are so subtly wrong that I guess that is the safer approach.

Coming back to

if you don't want to use the module provided version, why load the module in the first place?

Another viewpoint is: We shouldn't restrict users when we don't have to. If a user actively installs a different version of a package there might be reasons for doing so which I think are hard to anticipate in general. I mean it is nothing that can happen implicitly: The user has to actively do that.

@smoors
Copy link
Contributor

smoors commented Mar 4, 2023

i see your point now, thanks for explaining.

this is/was actually an unannounced, breaking change

true, but adopting the multi_deps approach was a breaking change too, so you could argue it was reverted to the original approach :)

@bartoldeman
Copy link
Contributor

bartoldeman commented Mar 16, 2023

I think this disagreement also came up in Julia easybuilders/easybuild-easyconfigs#17455
where @lexming mentioned "The issue is that this approach allows user installs to overwrite what is loaded with modules from EasyBuild (not desirable)."

I'm with @Flamefire here, and it was actually the major second reason for us to use $EBPYTHONPREFIXES, as PYTHONPATH overrides everything (including OS python, which may even be 2.7!). We had the issue that a user had modules loaded, then install a virtual env but because PYTHONPATH was set found the virtual env didn't work as intended.

One could actually use EBPYTHONPREFIXES everywhere and still have the (configurable, if we agree to disagree?) effect of overriding user installs.

See this script that is installed (sitecustomize.py)

# sitecustomize.py script installed by EasyBuild,
# to pick up Python packages installed with `--prefix` into folders listed in $%(EBPYTHONPREFIXES)s

import os
import site
import sys

# print debug messages when $%(EBPYTHONPREFIXES)s_DEBUG is defined
debug = os.getenv('%(EBPYTHONPREFIXES)s_DEBUG')

# use prefixes from $EBPYTHONPREFIXES, so they have lower priority than
# virtualenv-installed packages, unlike $PYTHONPATH

ebpythonprefixes = os.getenv('%(EBPYTHONPREFIXES)s')

if ebpythonprefixes:
    postfix = os.path.join('lib', 'python' + '.'.join(map(str, sys.version_info[:2])), 'site-packages')
    if debug:
        print("[%(EBPYTHONPREFIXES)s] postfix subdirectory to consider in installation directories: %%s" %% postfix)

    potential_sys_prefixes = (getattr(sys, attr, None) for attr in ("real_prefix", "base_prefix", "prefix"))
    sys_prefix = next(p for p in potential_sys_prefixes if p)
    base_paths = [p for p in sys.path if p.startswith(sys_prefix)]

    for prefix in ebpythonprefixes.split(os.pathsep):
        if debug:
            print("[%(EBPYTHONPREFIXES)s] prefix: %%s" %% prefix)
        sitedir = os.path.join(prefix, postfix)
        if os.path.isdir(sitedir):
            if debug:
                print("[%(EBPYTHONPREFIXES)s] adding site dir: %%s" %% sitedir)
            site.addsitedir(sitedir)

    # Move base python paths to the end of sys.path so modules can override packages from the core Python module
    sys.path = [p for p in sys.path if p not in base_paths] + base_paths

if you change
base_paths = [p for p in sys.path if p.startswith(sys_prefix)]
to
base_paths = sys.path[:]
it'll override everything. Then you'll also need to parse PYTHONPATH to give that the absolute priority though, e.g.:

base_paths = [p for p in sys.path if p.startswith(sys_prefix)]
python_path = [p for p in sys.path if p in os.getenv("PYTHONPATH", "").split(os.pathsep)]
... #... site.addsitedir(...).
sys.path = python_path + [p for p in sys.path if p not in python_path+base_path] + base_paths

which switches then from
PYTHONPATH - virtualenv paths - EBPYTHONPREFIXES - system
to
PYTHONPATH - EBPYTHONPREFIXES - virtualenv paths - system

@ocaisa
Copy link
Member

ocaisa commented Feb 22, 2024

I think we should do this, is this perhaps something we could squeeze into EB5 since it would be a pretty major change?

@Flamefire
Copy link
Contributor Author

I think we should do this, is this perhaps something we could squeeze into EB5 since it would be a pretty major change?

This could even be guided by an option. Might be hard to get always correct when custom easyblocks are involved though. Currently we have some code in a hook that replaces PYTHONPATH by EBPYTHONPREFIXES

@Micket
Copy link
Contributor

Micket commented Feb 22, 2024

I'm in favor of this. Given that I used this with multideps and never say any issues, I think it should be safe to default to this.

Might be hard to get always correct when custom easyblocks are involved though.

Even if we don't cover those cases it's still worth a fix for "just" doing this in the PythonBundle and PythonPackage's, as those are most likely the stuff that the user wants to shadow with their venv anyway.

@boegel boegel added the EasyBuild-5.0 EasyBuild 5.0 label Feb 29, 2024
@boegel boegel modified the milestones: 4.x, 5.0 Feb 29, 2024
@Micket
Copy link
Contributor

Micket commented Feb 29, 2024

I brought this up on the 5.0 meeting today

  • many of us were in favor of this (at least in that meeting)
  • the need for this seems to be on the rise (more venv use, more PythonBundles to potentially need to shadow)
  • this is similar to the behavior that we already have in R and Julia, so making it consistent is also good.
  • has a nice side benefit of totally not messing up your OS python3.6 since python bundles wouldn't be picked up.
  • definitely want to have a framework option for this (maybe --prefer-ebpythonprefix-over-pythonpath, --prefer-ebpythonprefix, --avoid-pythonpath ?).
  • the only thing that makes this a 5.0 feature is whether this gets behavior is default. Opt-in there is no objections to adding this for any release.

I'm not sure how we want to tackle this in practice though.
Approach A:
Adding the option in framework, and then customizing our easyblocks to prefer EBPYTHONPREFIX + updating our easyconfigs that happen to hardcode PYTHONPATH today.
The (rare) corner cases like, e.g. SRA-Toolkit:
https://github.com/easybuilders/easybuild-easyconfigs/blob/1485d4b19ecb2e0d72a7ef27c9b64521a391265a/easybuild/easyconfigs/s/SRA-Toolkit/SRA-Toolkit-3.0.10-gompi-2023a.eb#L71-L74
which puts

modextrapaths = {
    'CLASSPATH': 'jar/ngs-java.jar',
    'PYTHONPATH': 'ngs-python',
}

we either just ignore if the use is intentional, or possibly fix so that they do install into the correct sub-directory.

Approach B:
Some form of built-in hook that rewrites the PYTHONPATH's into corresponding EBPYTHONPREFIX's.
This would still need to at least avoid some exceptions i think, like system level Spark packages does

modextrapaths = {'PYTHONPATH': 'python'}

and of course, we have to avoid it in all cases where the target path doesn't follow the expected site-packages directory structure.

I'm leaning option towards A.

Edit: Corrections, I thought about it some more, and I think it requires a hook to deal with just the easyconfigs that specify their own PYTHONPATH + having it optionally converted to EBPYTHONPREFIX so, option B is the only alternative.

@Micket
Copy link
Contributor

Micket commented Apr 3, 2024

I began working on this in
easybuilders/easybuild-framework#4496

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants