{lib}[GCCcore/10.2.0] OpenMPI v4.0.5, libevent v2.1.12, libfabric v1.11.0, PMIx 3.1.5 #11333
@boegelbot please test @ generoso
@boegel: Request for testing this PR well received on generoso PR test command '
Test results coming soon (I hope)... - notification for comment with ID 697295413 processed Message to humans: this is just bookkeeping information for me, |
Test report by @boegelbot
Test report by @boegel
Test report by @boegel
Test report by @boegel
…asyconfigs into 20200923113619_new_pr_OpenMPI405
Test report by @lexming
Test report by @lexming
This OpenMPI build is not working well on my side. A simple MPI hello world program fails to initialise the OpenFabrics device:
$ mpirun ./test
[node379.hydra.os:24944] [[51950,0],0] ORTE_ERROR_LOG: Out of resource in file util/show_help.c at line 501
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.
Local host: node378
Local device: mlx5_0
--------------------------------------------------------------------------
Hello world from processor node379.hydra.os, rank 0 out of 2 processors
Hello world from processor node378.hydra.os, rank 1 out of 2 processors
OSU-Micro-Benchmarks shows the same issue:
# OSU MPI Latency Test v5.6.3
# Size Latency (us)
1024 2.08
2048 2.83
4096 3.72
8192 5.46
16384 7.56
32768 9.83
65536 14.34
131072 22.34
262144 32.28
524288 54.46
1048576 97.91
2097152 181.33
4194304 354.37
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.
Local host: node378
Local device: mlx5_0
--------------------------------------------------------------------------
[node379.hydra.os:15539] [[38701,0],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file util/show_help.c at line 501
The execution completes in both cases, but those errors are not good.
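Because both runs exit successfully, warnings like these are easy to miss in automated test runs. As a minimal sketch (not part of this PR), the captured `mpirun` output could be scanned for the warning block. It assumes the block keeps the layout shown in the logs above, and the function name `find_openfabrics_warnings` is hypothetical:

```python
import re

# Text taken from the warning shown in the logs above.
WARNING_MARKER = "error initializing an OpenFabrics device"


def find_openfabrics_warnings(output):
    """Return one dict (host, device) per OpenFabrics init warning in output."""
    found = []
    lines = output.splitlines()
    for i, line in enumerate(lines):
        if WARNING_MARKER not in line:
            continue
        entry = {"host": None, "device": None}
        # Host and device lines follow the warning, up to the closing dashes.
        for follow in lines[i + 1:]:
            if follow.startswith("---"):
                break
            match = re.match(r"\s*Local (host|device):\s*(\S+)", follow)
            if match:
                entry[match.group(1)] = match.group(2)
        found.append(entry)
    return found


if __name__ == "__main__":
    sample = "\n".join([
        "-" * 74,
        "WARNING: There was an error initializing an OpenFabrics device.",
        "Local host: node378",
        "Local device: mlx5_0",
        "-" * 74,
        "Hello world from processor node379.hydra.os, rank 0 out of 2 processors",
    ])
    print(find_openfabrics_warnings(sample))
    # prints [{'host': 'node378', 'device': 'mlx5_0'}]
```

A test harness could treat a non-empty result as a failure even when `mpirun` returns exit code 0.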
Started a test build on a "clean" arm box. It'll take a bit. It started building M4... The box has no toolchains. :)
Test report by @terjekv
The problem here is that we should be configuring OpenMPI with Please try again with the updated OpenMPI easyblock from easybuilders/easybuild-easyblocks#2188 . |
Test report by @lexming
Test report by @lexming
@boegel thanks a lot, that was indeed the issue. We have already been disabling verbs in our production system, but I was totally misled by the
LGTM
@lexming So let's merge? Or do you want to see more tests?
Test report by @boegel
Going in, thanks @boegel!
(created using eb --new-pr)
requires easybuilders/easybuild-easyblocks#2184 + #11320 (UCX) + #11332 (hwloc)