Clean up SLURM/Batchsystem environment before doing builds #4434

Open
Flamefire opened this issue Jan 8, 2024 · 4 comments

@Flamefire
Contributor

Some software tests fail when run inside a SLURM job. One example is OpenMPI: its tests call mpirun, which picks up the SLURM job it is running in and fails because the resulting configuration doesn't match what the job is expecting.

I have 2 workarounds in my EB wrapper script:

  # Re-run this script in a fresh login shell on the node when inside a SLURM job
  if [[ -n "${SLURM_NODELIST:-}" ]]; then
    ssh "$SLURM_NODELIST" bash -l "$0"
    exit $?
  fi

This restarts the current script via ssh when it is run from inside a SLURM job, assuming the job has only 1 node.

  # Unset every SLURM_* variable in the current shell
  for i in $(env | grep '^SLURM_' | cut -f1 -d=); do
    unset "$i"
  done

This removes all SLURM_* variables from the current environment.

Since this is a common pitfall with EB, and the 2nd variant is easy to implement in EB via os.environ, I'd suggest doing this by default, possibly with a --no-cleanup-slurm-env option to opt out.
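To illustrate how small the second variant is in Python, here is a minimal sketch. It assumes nothing about EasyBuild's internals: the function name `unset_slurm_env` and its parameters are hypothetical and not an existing EasyBuild API. It removes every variable whose name starts with `SLURM_` from `os.environ` and returns what was removed, so the caller could log it.

```python
import os


def unset_slurm_env(environ=os.environ, prefix="SLURM_"):
    """Remove all variables whose name starts with `prefix` from `environ`.

    Hypothetical helper for illustration, not an existing EasyBuild function.
    Returns a dict holding the removed variables and their values.
    """
    # Collect the names first: deleting entries while iterating over the
    # mapping itself would raise an error.
    names = [name for name in environ if name.startswith(prefix)]
    removed = {}
    for name in names:
        # Removing a key from os.environ also unsets it for child processes.
        removed[name] = environ.pop(name)
    return removed
```

Calling `unset_slurm_env()` once, before any build or test command is started, would have the same effect as the shell loop above for every subprocess launched afterwards.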

@akesandgren
Contributor

Note that some sites (ours, for example) don't allow ssh between nodes in a Slurm job.

@Flamefire
Contributor Author

That was just an example. Both methods seem to work; I had the 2nd in use while our nodes were updated and didn't allow SSH yet. And only the 2nd can reasonably be done in EasyBuild.

@boegel
Member

boegel commented Jan 10, 2024

Makes sense to me, and it probably makes sense to implement this in EasyBuild 5.0 (although it shouldn't actually break anything, only fix builds that break/hang because they're running in a Slurm environment which may cause trouble with MPI).

@boegel added this to the 5.0 milestone on Jan 10, 2024
@boegel changed the title from "Cleanup SLURM/Batchsystem environment before doing builds" to "Clean up SLURM/Batchsystem environment before doing builds" on Jan 17, 2024
@boegel
Member

boegel commented Jan 31, 2024

We briefly discussed this during the EasyBuild conf call today, and the general consensus seemed to be that this should be made opt-in rather than opt-out (which makes sense to me).
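An opt-in variant could look roughly like the sketch below. This is an illustration only: the flag name `--cleanup-slurm-env` and the functions `parse_args` and `prepare_env` are hypothetical, and this is not how EasyBuild defines its configuration options. The point is just that the cleanup is off by default and happens only when requested.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the command line of a toy build wrapper (illustration only)."""
    parser = argparse.ArgumentParser(description="Toy build wrapper")
    # Hypothetical flag name; store_true defaults to False, so cleanup is opt-in.
    parser.add_argument("--cleanup-slurm-env", action="store_true",
                        help="unset all SLURM_* environment variables before building")
    return parser.parse_args(argv)


def prepare_env(args, environ=os.environ):
    """Unset SLURM_* variables only if the user asked for it.

    Returns the sorted list of variable names that were removed.
    """
    if not args.cleanup_slurm_env:
        return []
    names = sorted(name for name in environ if name.startswith("SLURM_"))
    for name in names:
        del environ[name]
    return names
```

With this shape, running without the flag leaves the environment untouched, which matches the expectation that existing setups keep behaving as before.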
