Slurm nodenames not matching CycleCloud hostnames cause some MPI variants to fail #65

Open
anhoward opened this issue Sep 16, 2021 · 2 comments

Comments

@anhoward
Contributor

Depending on the version of MPI or the ISV code being used, some applications rely on the Slurm nodenames, which aren't always actual resolvable hostnames. When that happens, the jobs fail.

It would be good if the actual hostnames on the nodes and in Azure DNS matched the nodename used in Slurm.
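
A minimal sketch of how one might check for this mismatch, assuming the Slurm nodenames for a job have already been collected (for example from `scontrol show hostnames "$SLURM_JOB_NODELIST"`). The function names and the example nodenames are hypothetical and are not part of CycleCloud or Slurm; the lookup uses the standard system resolver only.

```python
import socket
from typing import Callable, Iterable, List


def resolves(name: str) -> bool:
    """Return True if `name` resolves via the system resolver (DNS, /etc/hosts)."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except socket.gaierror:
        return False


def unresolvable_nodes(
    nodenames: Iterable[str],
    resolver: Callable[[str], bool] = resolves,
) -> List[str]:
    """Return the nodenames the resolver cannot look up, in input order."""
    return [name for name in nodenames if not resolver(name)]


if __name__ == "__main__":
    # Hypothetical nodenames; replace with the names Slurm reports for the job.
    for name in unresolvable_nodes(["htc-1", "htc-2"]):
        print(f"Slurm nodename does not resolve: {name}")
```

Running this on a compute node before launching MPI would show whether the names that MPI may receive from Slurm can be resolved at all. The resolver is passed in as a parameter so the check can be exercised without a live cluster.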

@gjhw

gjhw commented Nov 8, 2021

We are seeing this with Abaqus. It's worth noting that we are confined to running in UK South, where we only have H Series available. H Series does not have SR-IOV support, which limits us to Intel MPI. When HC Series lands later this year (with SR-IOV support), we expect to be able to use the MPI that ships with Abaqus, and we will see whether this allows multi-node jobs to run when Slurm node names do not match hostnames.

@tbugfinder

It looks like this is now supported with v2.5.0 / v2.5.1
