
add PSM2 dependency to libfabric 1.12.1 and newer #20501

Merged

jfgrimm (Member) commented May 9, 2024

motivation:

  • performance of OpenMPI, MPICH, etc. is poor on OmniPath systems with the default EasyBuild configuration
  • on OmniPath, you want to use either PSM2 or OPX (via libfabric)
  • adding the PSM2 dependency has no real downside for those on Mellanox IB systems, but it simplifies the changes needed for those with OPA (no need to hook in extra dependencies)

will create a follow-up PR that adds a build_info_msg to OpenMPI and MPICH

(created using eb --new-pr)

jfgrimm requested a review from branfosj May 9, 2024 13:22

jfgrimm (Member, Author) commented May 9, 2024

Test report by @jfgrimm
SUCCESS
Build succeeded for 8 out of 8 (8 easyconfigs in total)
node048.viking2.yor.alces.network - Linux Rocky Linux 8.9, x86_64, AMD EPYC 7643 48-Core Processor, Python 3.6.8
See https://gist.github.com/jfgrimm/6c1083187d0f5f75207010e845162344 for a full test report.

branfosj (Member) commented May 9, 2024

@boegelbot please test @ generoso

boegelbot (Collaborator)

@branfosj: Request for testing this PR well received on login1

PR test command 'EB_PR=20501 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_20501 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 13436

Test results coming soon (I hope)...

- notification for comment with ID 2102701752 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

branfosj (Member) commented May 9, 2024

Test report by @branfosj
SUCCESS
Build succeeded for 8 out of 8 (8 easyconfigs in total)
bear-pg0105u03a - Linux RHEL 8.6, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz (icelake), Python 3.6.8
See https://gist.github.com/branfosj/e4f56531d01700895e14eb5f7ac55fbd for a full test report.

boegelbot (Collaborator)

Test report by @boegelbot
SUCCESS
Build succeeded for 8 out of 8 (8 easyconfigs in total)
cns1 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/c9ba5b04dd129a19f04fa04c6095199f for a full test report.

branfosj added this to the release after 4.9.1 milestone May 9, 2024

branfosj (Member) commented May 9, 2024

@boegelbot please test @ generoso
EB_ARGS="libfabric-1.13.0-GCCcore-11.2.0.eb libfabric-1.13.1-GCCcore-11.2.0.eb libfabric-1.13.2-GCCcore-11.2.0.eb libfabric-1.15.1-GCCcore-11.3.0.eb libfabric-1.16.1-GCCcore-12.2.0.eb libfabric-1.18.0-GCCcore-12.3.0.eb libfabric-1.19.0-GCCcore-13.2.0.eb"

boegelbot (Collaborator)

@branfosj: Request for testing this PR well received on login1

PR test command 'EB_PR=20501 EB_ARGS="libfabric-1.13.0-GCCcore-11.2.0.eb libfabric-1.13.1-GCCcore-11.2.0.eb libfabric-1.13.2-GCCcore-11.2.0.eb libfabric-1.15.1-GCCcore-11.3.0.eb libfabric-1.16.1-GCCcore-12.2.0.eb libfabric-1.18.0-GCCcore-12.3.0.eb libfabric-1.19.0-GCCcore-13.2.0.eb" EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_20501 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 13437

Test results coming soon (I hope)...

- notification for comment with ID 2102819249 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

branfosj (Member) commented May 9, 2024

@boegelbot please test @ jsc-zen3
EB_ARGS="libfabric-1.13.0-GCCcore-11.2.0.eb libfabric-1.13.1-GCCcore-11.2.0.eb libfabric-1.13.2-GCCcore-11.2.0.eb libfabric-1.15.1-GCCcore-11.3.0.eb libfabric-1.16.1-GCCcore-12.2.0.eb libfabric-1.18.0-GCCcore-12.3.0.eb libfabric-1.19.0-GCCcore-13.2.0.eb"

boegelbot (Collaborator)

@branfosj: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=20501 EB_ARGS="libfabric-1.13.0-GCCcore-11.2.0.eb libfabric-1.13.1-GCCcore-11.2.0.eb libfabric-1.13.2-GCCcore-11.2.0.eb libfabric-1.15.1-GCCcore-11.3.0.eb libfabric-1.16.1-GCCcore-12.2.0.eb libfabric-1.18.0-GCCcore-12.3.0.eb libfabric-1.19.0-GCCcore-13.2.0.eb" EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_20501 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 4096

Test results coming soon (I hope)...

- notification for comment with ID 2102853899 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

boegelbot (Collaborator)

Test report by @boegelbot
SUCCESS
Build succeeded for 7 out of 7 (7 easyconfigs in total)
cns1 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/9c69372388a3682f127e5ffc91ba8e7f for a full test report.

boegelbot (Collaborator)

Test report by @boegelbot
SUCCESS
Build succeeded for 7 out of 7 (7 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.3, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.18
See https://gist.github.com/boegelbot/93a2be41facd6539673c64bfc6b39c1d for a full test report.

branfosj (Member) commented May 9, 2024

Going in, thanks @jfgrimm!

branfosj merged commit 2a99211 into easybuilders:develop May 9, 2024
9 checks passed
ocaisa added a commit to EESSI/software-layer that referenced this pull request May 15, 2024
`PSM2` was introduced as a dependency of `libfabric` in easybuilders/easybuild-easyconfigs#20501. 

We already have PSM2 in the compat layer, so we can filter this dependency out, but longer term we probably actually want it since it should be built with accelerator support.

boegel (Member) commented May 22, 2024

@jfgrimm Can you update the PR description to motivate this a bit (just for future reference)?

@@ -43,6 +43,7 @@ builddependencies = [

 dependencies = [
     ('numactl', '2.0.14'),
+    ('PSM2', '12.0.1'),
Member

As discussed during today's EasyBuild conf call, this should be done conditionally, only for x86_64 since PSM2 is not compatible with Arm (and probably also not with RISC-V).

@SebastianAchilles will open a PR for that...
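
The review asks for PSM2 to be added only on x86_64. Below is a minimal sketch of how that could be expressed in a libfabric easyconfig, assuming EasyBuild's `ARCH` easyconfig constant holds the machine architecture string (stubbed here with `platform.machine()` so the snippet runs as plain Python). It is one plausible shape only; the change actually made in #20585 may look different.

```python
import platform

# Stand-in for EasyBuild's ARCH easyconfig constant (assumption: it mirrors
# platform.machine(), e.g. 'x86_64' or 'aarch64'). In a real easyconfig ARCH
# is provided by EasyBuild and this line would not be needed.
ARCH = platform.machine()

dependencies = [
    ('numactl', '2.0.14'),
]

# PSM2 is not compatible with Arm (and probably not RISC-V), so only add it on x86_64
if ARCH == 'x86_64':
    dependencies.append(('PSM2', '12.0.1'))
```

Because easyconfigs are evaluated as Python, appending inside an `if` keeps the dependency list unchanged on non-x86_64 builds while keeping PSM2 on x86_64 (OmniPath) systems.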

Member

Done in #20585
