add patch for adding a write memory barrier to all OpenMPI 4.1.x easyconfigs #19940

Merged (3 commits, Feb 21, 2024)

@bedroge (Contributor) commented Feb 20, 2024

This fixes a bug in the smcuda BTL (byte transfer layer) that causes MPI applications to crash or hang on Neoverse V1 CPUs (see open-mpi/ompi#12270). The issue was fixed for Open MPI 4.1.x in open-mpi/ompi#12344, so 4.1.7 should include the fix.

@bedroge (Contributor, Author) commented Feb 20, 2024

@boegelbot please test @ generoso

@boegelbot (Collaborator)

@bedroge: Request for testing this PR well received on login1

PR test command 'EB_PR=19940 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_19940 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 12926

Test results coming soon (I hope)...

- notification for comment with ID 1953959787 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@bedroge (Contributor, Author) commented Feb 20, 2024

@boegelbot please test @ jsc-zen3

@boegelbot (Collaborator)

@bedroge: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=19940 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_19940 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 3642

Test results coming soon (I hope)...

- notification for comment with ID 1954126426 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot (Collaborator)

Test report by @boegelbot
SUCCESS
Build succeeded for 14 out of 14 (14 easyconfigs in total)
cns1 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/ef2751a131a3499141f02d45a7eb29aa for a full test report.

@boegelbot (Collaborator)

Test report by @boegelbot
FAILED
Build succeeded for 15 out of 16 (14 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.3, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.18
See https://gist.github.com/boegelbot/7b076361a2cf4873b7486a6b07a4ed4a for a full test report.

@boegel added this to the release after 4.9.0 milestone Feb 20, 2024
@bedroge (Contributor, Author) commented Feb 21, 2024

They all succeeded except OpenMPI-4.1.1-intel-compilers-2021.4.0.eb:

============================================================================
== Configuring Open MPI
============================================================================

*** Startup tests
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking target system type... x86_64-pc-linux-gnu
checking for x86_64-pc-linux-gnu-gcc... icc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... configure: error: in `/tmp/boegelbot/OpenMPI/4.1.1/intel-compilers-2021.4.0/openmpi-4.1.1':
configure: error: cannot run C compiled programs.
If you meant to cross compile, use `--host'.
See `config.log' for more details
 (at easybuild/easybuild-framework/easybuild/tools/run.py:682 in parse_cmd_output)

Not sure what's wrong here...

@boegel (Member) commented Feb 21, 2024

Test report by @boegel
SUCCESS
Build succeeded for 14 out of 14 (14 easyconfigs in total)
node3106.skitty.os - Linux RHEL 8.8, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz, Python 3.6.8
See https://gist.github.com/boegel/fb69197b37e2604f56a805224de670e5 for a full test report.

@boegel (Member) commented Feb 21, 2024

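It's related to RHEL 9 and its newer glibc. The configure test program (conftest.c) doesn't even compile against the system headers, so there is no ./conftest to run, which configure then reports as "cannot run C compiled programs":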

In file included from conftest.c(10):
/usr/include/stdio.h(824): error: attribute "__malloc__" does not take arguments
    __attribute_malloc__ __attr_dealloc (pclose, 1) __wur;
                         ^

compilation aborted for conftest.c (code 2)
configure:6725: $? = 2
configure:6732: ./conftest
./configure: line 6734: ./conftest: No such file or directory

It seems like that Intel compilers version is basically not compatible with RHEL 9.

I won't let that block this PR, since the problem is not caused by the patch being added.

@boegel (Member) left a comment

lgtm

@boegel (Member) commented Feb 21, 2024

Going in, thanks @bedroge!

@boegel merged commit 01084d1 into easybuilders:develop Feb 21, 2024
9 checks passed
@bedroge deleted the openmpi_4.1.x_add_wmb branch February 21, 2024 20:04
@bedroge added the EESSI label (Related to EESSI project) Feb 21, 2024
@boegel added the aarch64 label (Related to Arm 64-bit (aarch64)) Feb 28, 2024
Labels: aarch64, bug fix, EESSI

3 participants