run python in the same process as eb wrapper script by using exec #4048

Merged (3 commits) on Sep 9, 2022

Flamefire
Contributor

Use exec in the eb wrapper script to avoid creating a new process.
This makes it easier to work with e.g. SLURM: signals sent to the main process (eb) may not be forwarded to EasyBuild (python), which results in e.g. stale locks when the process is later force-killed.
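
To illustrate the effect of `exec` described above, here is a minimal sketch (not the actual `eb` script). It runs a tiny stand-in wrapper through `sh`, once with and once without `exec`, and compares the shell's PID with the PID that Python reports. The helper name `wrapper_pids` and the inline wrapper are hypothetical, and a POSIX `sh` is assumed.

```python
import subprocess
import sys


def wrapper_pids(use_exec):
    """Return (shell PID, Python PID) for a stand-in wrapper script.

    Hypothetical helper for illustration only; it is not part of EasyBuild.
    """
    launch = "exec " if use_exec else ""
    # The shell prints its own PID ($$), then starts Python, which prints its PID.
    # "$0" is the Python interpreter path passed as the first extra argument.
    # The trailing `true` keeps the shell from silently exec'ing the last
    # command on its own, so only an explicit `exec` replaces the shell.
    script = 'echo $$; ' + launch + '"$0" -c "import os; print(os.getpid())"; true'
    result = subprocess.run(
        ["sh", "-c", script, sys.executable],
        capture_output=True,
        text=True,
        check=True,
    )
    shell_pid, python_pid = (int(value) for value in result.stdout.split())
    return shell_pid, python_pid


if __name__ == "__main__":
    for use_exec in (False, True):
        shell_pid, python_pid = wrapper_pids(use_exec)
        print(f"exec={use_exec}: shell PID {shell_pid}, Python PID {python_pid}")
```

With `exec` the two PIDs are identical because Python replaces the shell process, so a signal delivered to the wrapper's PID reaches Python directly. Without `exec`, Python runs as a child with its own PID.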

Member

@boegel boegel left a comment

@Flamefire Please sync this with develop now that #4049 is merged

@Flamefire
Contributor Author

Rebased

@boegel boegel changed the title from "Run python in the same process as eb" to "run python in the same process as eb wrapper script by using exec" on Aug 5, 2022
@boegel
Member

boegel commented Aug 5, 2022

This change looks harmless, but when playing with it on our systems I actually run into problems because of this...
It's not entirely clear to me what's going wrong here, but it has something to do with the environment in which EasyBuild is being run not being exactly the same...

== Temporary log file in case of crash /tmp/eb-ypv4xy3v/easybuild-12uyhfz0.log
ERROR: Failed to process easyconfig /tmp/CFITSIO-4.1.0-GCCcore-11.3.0.eb: Module command '/usr/share/lmod/lmod/libexec/lmod python --terse --show-hidden avail ' failed with exit code 1; stderr: no such variable
    (read trace on "env(_)")
    invoked from within
"string match "*tcl2lua.tcl" $env(_)"
    (file "/etc/modulefiles/vsc/cluster/.modulerc" line 4)
    invoked from within
"source $mRcFile"
    (procedure "main" line 15)
    invoked from within
"main $fn"
    (file "/usr/share/lmod/lmod/libexec/RC2lua.tcl" line 137)
Lmod has detected the following error: Unable to parse: "/etc/modulefiles/vsc/cluster/.modulerc". Aborting!

If you don't understand the warning or error, contact the helpdesk at hpc@ugent.be


; stdout: _mlstatus = False

Here's the contents of the .modulerc file:

#%Module1.0
# Legacy modulerc. Lmod should always take default.lua first. It's only here to ensure
# falling back to environment-modules keeps on working.
if { ![string match "*tcl2lua.tcl" $env(_)] } {
    if {[info exists ::env(VSC_DEFAULT_CLUSTER_MODULE)]} {
        module-version cluster/$::env(VSC_DEFAULT_CLUSTER_MODULE) default
    } else {
        puts stderr "The default cluster module cannot be determined. Please set \$VSC_DEFAULT_CLUSTER_MODULE."
        exit 1
    }
}

The $env(_) part is where it tries to figure out how this file is being processed (via the tcl2lua script in Lmod, or not), so somehow the value of the $_ environment variable is different with and without using exec in the eb wrapper script?

@Flamefire
Contributor Author

> The $env(_) part is where it tries to figure out how this file is being processed (via the tcl2lua script in Lmod, or not), so somehow the value of the $_ environment variable is different with and without using exec in the eb wrapper script?

In fact, it is not set at all: I tested this with a Python script that prints os.environ.
That variable is a bit special: https://unix.stackexchange.com/questions/280453/understand-the-meaning-of

The actual bug in your combination of config, Lmod, this change, etc. is that /usr/share/lmod/lmod/libexec/lmod python --terse --show-hidden avail, as run by EB, isn't run in a shell, and hence $_ doesn't get set. So the value inherited from EB is used, and after this change that value is no longer there either (exec replaces the process, so there is no "last command", I guess).
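
A small sketch of the behaviour described above, under stated assumptions: bash places `_` in the environment of commands it launches, while a process started directly (no shell involved) only sees `_` if it inherits it. The helper names below are hypothetical, `bash` is assumed to be on `PATH` (the second helper returns `None` otherwise), and the sketch deliberately does not cover what happens under `exec`.

```python
import os
import shutil
import subprocess
import sys

# Child program: report the value of `_`, or a marker if it is not set.
PRINT_UNDERSCORE = "import os; print(os.environ.get('_', '<unset>'))"


def underscore_without_shell():
    """Start Python directly, with `_` removed from the inherited environment."""
    env = {key: value for key, value in os.environ.items() if key != "_"}
    result = subprocess.run(
        [sys.executable, "-c", PRINT_UNDERSCORE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def underscore_via_bash():
    """Start Python as an ordinary (non-exec) command from bash.

    Returns None if bash is not available.
    """
    bash = shutil.which("bash")
    if bash is None:
        return None
    # "$0" is the interpreter path, "$1" the Python code to run.
    # The trailing `true` makes sure bash forks a normal child process.
    script = '"$0" -c "$1"; true'
    result = subprocess.run(
        [bash, "-c", script, sys.executable, PRINT_UNDERSCORE],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print("without a shell:", underscore_without_shell())
    print("launched by bash:", underscore_via_bash())
```

This mirrors the situation in the traceback: the `lmod` command is started without a shell, so the `.modulerc` file can only see `_` if the EasyBuild process itself passes it on.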

I added a commit which re-adds this variable in main. Hope that helps.
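
A minimal sketch of what re-adding the variable could look like, not necessarily the code that was merged: if `_` is missing from the environment, set it before any module commands are run. The function name `ensure_underscore_env` and the choice of `sys.executable` as the fallback value are assumptions made for illustration.

```python
import os
import sys


def ensure_underscore_env(environ=None, fallback=None):
    """Make sure `_` is defined in the given environment mapping.

    Hypothetical helper: the name, signature, and fallback value are
    assumptions, not EasyBuild's actual implementation.
    """
    if environ is None:
        environ = os.environ
    if "_" not in environ:
        # Emulate what a shell would have provided; an existing value is kept.
        environ["_"] = fallback if fallback is not None else sys.executable
    return environ["_"]


if __name__ == "__main__":
    # Child processes started afterwards inherit `_` from os.environ.
    print(ensure_underscore_env())
```

Only filling in a missing value keeps the change invisible in setups where the shell already provided `_`.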

Flamefire and others added 2 commits August 23, 2022 10:41
Use `exec` in the `eb` wrapper script to avoid creating a new process.
This makes it easier to work with e.g. SLURM: signals sent to the main process (`eb`) may not be forwarded to EasyBuild (python), which results in e.g. stale locks.
@boegel boegel merged commit c84d1ac into easybuilders:develop Sep 9, 2022
@Flamefire Flamefire deleted the patch-1 branch September 9, 2022 10:08