{mpich,openmpi}: optionally link against slurm's libpmi2 #283071

Open
wants to merge 7 commits into
base: master

SomeoneSerge
Contributor

Description of changes

Closes #280406. Partially addresses #274584 (the memory leaks are still there, but the segfaults seem to be gone, at least on Aalto's Triton).

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.05 Release Notes (or backporting 23.05 and 23.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

pmixSupport ? false,
withSlurm ? true,
slurm,
}:
Contributor Author

Ended up just reformatting with nixfmt...

Contributor Author

As a general comment: the reformatting and reordering of code makes the PR quite difficult to read. If you reformat a package, please put it at least in a different commit. @markuskowa

Note: now in a separate commit

markuskowa

This comment was marked as outdated.

Closes NixOS#280406

(cherry picked from commit 3e98731)
# PMIX support is likely incompatible with process managers (`--with-pm`)
# https://github.com/NixOS/nixpkgs/pull/274804#discussion_r1432601476
pmixSupport ? false,
withPmilib ? "slurm",
Contributor Author

Actually, this now overlaps with pmixSupport

Contributor Author

Alternatively, there's also --with-pmi, which could also take "slurm". On the one hand, mirroring the upstream configuration options is a sane default. On the other hand, upstream seems to maintain a variety of options that do the same or related things and potentially conflict with each other. We should expose the bare minimum that we can truly guarantee to work, plus a way to disable all of our heuristics and do the ad hoc job in an override.

https://github.com/pmodels/mpich/blob/c2f04da1b3371b0e85c77761649adfa30a0f5269/configure.ac#L1524-L1527
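
As one possible reading of "do the ad hoc job in an override", the sketch below sidesteps whatever the package decides and passes the flag by hand. It assumes the `mpich` and `slurm` attributes of a plain `<nixpkgs>` import, and that the MPICH version being built still accepts `--with-pmilib=slurm`; neither the flag choice nor the input list is taken from this PR's final diff.

```nix
# Hypothetical override: append a configure flag and an input manually.
{ pkgs ? import <nixpkgs> { } }:

pkgs.mpich.overrideAttrs (old: {
  # `--with-pmilib=slurm` is the flag discussed in this thread; whether
  # configure accepts it depends on the MPICH version (an assumption here).
  configureFlags = (old.configureFlags or [ ]) ++ [ "--with-pmilib=slurm" ];

  # `lib.getLib` falls back to the default output if there is no `lib` output.
  buildInputs = (old.buildInputs or [ ]) ++ [ (pkgs.lib.getLib pkgs.slurm) ];
})
```

The trade-off is that such an override is unchecked: nothing in the package expression validates the combination, which is the point of keeping it outside the supported interface.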

Contributor Author

Notably, when we pass both pmix and slurm.lib to buildInputs, there are two paths that provide libpmi2.so. Edit: no, there aren't.

Contributor Author

Force-pushed again. Now explicitly mirroring upstream and erroring out on any surprising inputs by means of broken = true
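
A minimal sketch of the "error out via `broken = true`" idea, assuming a hypothetical `withPmi` string argument and a hypothetical list of accepted values; the actual argument name and value set in the diff may differ.

```nix
# Sketch only: `withPmi` and `supported` are illustrative assumptions.
{ lib ? (import <nixpkgs> { }).lib, withPmi ? "pmi2" }:

let
  # Values the package expression claims to handle (hypothetical list).
  supported = [ "pmi1" "pmi2" "pmix" ];
in
{
  # Any value outside the list marks the package as broken, so evaluation
  # refuses to build it instead of letting configure guess.
  meta.broken = !(lib.elem withPmi supported);
}
```

Evaluating this with an unexpected value such as `withPmi = "typo"` yields `meta.broken = true`, while the default leaves it `false`.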

Contributor Author

Rebased on master after the split-pmix PR

Comment on lines 71 to 81
# https://www.open-mpi.org/faq/?category=buildcuda
(lib.withFeature true "ucc")
(lib.withFeature true "ucx")

(lib.enableFeature (!cudaSupport) "mca-dso")
(lib.withFeatureAs cudaSupport "cuda" cudaPackages.cuda_cudart)
(lib.enableFeature cudaSupport "dlopen")

(lib.withFeatureAs stdenv.isLinux "pmix" (lib.getDev pmix))
(lib.withFeatureAs stdenv.isLinux "pmix-libdir" "${lib.getLib pmix}/lib")
Contributor Author

Motivation: this prevents autoconf from guessing.
I didn't migrate all of the flags because that is out of scope. The ones I migrated I had already touched for other reasons.
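
For reference, the `lib` helpers used in the hunk above render to plain configure flags, so each feature is stated explicitly whether it is on or off. A small sketch that can be evaluated on its own (the path `/some/path` is a placeholder, not a real store path):

```nix
{ lib ? (import <nixpkgs> { }).lib }:

[
  (lib.withFeature true "ucx")                   # "--with-ucx"
  (lib.withFeature false "ucx")                  # "--without-ucx"
  (lib.enableFeature true "dlopen")              # "--enable-dlopen"
  (lib.enableFeature false "mca-dso")            # "--disable-mca-dso"
  (lib.withFeatureAs true "pmix" "/some/path")   # "--with-pmix=/some/path"
  (lib.withFeatureAs false "pmix" "/some/path")  # "--without-pmix"
]
```

Because the false branch still emits a `--without-…` or `--disable-…` flag, configure never falls back to probing the build environment for that feature.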

# PMIX support is likely incompatible with process managers (`--with-pm`)
# https://github.com/NixOS/nixpkgs/pull/274804#discussion_r1432601476
enablePmix ? false,
enablePmi1 ? false,
Contributor Author

Damn... those are internal variables, not options I guess:

configure: WARNING: unrecognized options: --disable-pmix, --disable-pmi1, --enable-pmi2

SomeoneSerge force-pushed the feat/libpmi2 branch 2 times, most recently from 6959731 to b5108e5 on January 24, 2024 19:52
Comment on lines +18 to +20
] ++ lib.optionals (withPmi == "pmi1") [
"gforker"
],
Contributor Author

I had a failure saying something like "gforker needs pmi1 but you chose pmi2". I don't remember the details now; I should check again.
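
For context, `lib.optionals` returns its list argument when the condition holds and an empty list otherwise, so gforker is only requested together with PMI1. A standalone sketch, where the `"hydra"` entry is an assumption standing in for whatever precedes the `]` in the diff:

```nix
{ lib ? (import <nixpkgs> { }).lib, withPmi ? "pmi2" }:

# With the default argument this evaluates to [ "hydra" ];
# called with withPmi = "pmi1" it evaluates to [ "hydra" "gforker" ].
[ "hydra" ] ++ lib.optionals (withPmi == "pmi1") [ "gforker" ]
```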

Contributor Author

I'm looking at pmodels/mpich#6662 and trying to work out which callPackage interface we could fix now and reuse once we update to 4.2.x. The --with-pmi{1,2} options are claimed to expect paths, except I think in the case of pmix their values are optional and just dropping stuff in buildInputs still "works". There's no explicit option to mirror for slurm's pmi support, because --with-pmilib=slurm was a temporary hack? And I don't think we want packages or strings with store paths as arguments, because that might be bad for splicing.

@markuskowa?

Contributor Author

@ofborg build mpich openmpi slurm
@ofborg test slurm

...which were previously ignored; these have already supported CUDA for a while

(cherry picked from commit 5596085)
(cherry picked from commit 17dd38f46978cf7d0ed6780b298c5fb31f548e49)
(cherry picked from commit c920deb49a83cf504b251588e9e2210a674ff407)
Successfully merging this pull request may close these issues.

slurm: doesn't build libpmi.so
3 participants