
The script doesn't create a correct Standard_HC44rs cluster with Mellanox EDR InfiniBand #374

Closed
Smahane opened this issue Sep 16, 2020 · 13 comments
Labels
bug Something isn't working

Comments

@Smahane

Smahane commented Sep 16, 2020

I need to run HPC applications (like LAMMPS) over a high-bandwidth interconnect, but the cluster doesn't seem to be configured correctly.

Describe the bug

  • MPI is not picking up the Mellanox (mlx) provider
  • LAMMPS doesn't scale as expected (performance is lower over TCP)

To Reproduce
Steps to reproduce the behavior:

  1. Create a cluster using this config.json:
    mlx-fail.config.txt

  2. Set up the environment and run (a sketch of the assumed environment setup follows these steps):
    mpiexec.hydra -f ${hostfile} -n 88 -ppn 44 ./lmp_intel_cpu_intelmpi -in in.intel.lc -v x 4 -v y 2 -v z 2 -pk intel 0 omp 1 -sf intel
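
The environment setup in step 2 is not spelled out in the report; a minimal sketch, assuming the Intel MPI psxe_runtime install shown later in this thread and a hypothetical hostfile:

    # assumed environment setup, not from the original report
    source /opt/intel/psxe_runtime/linux/mpi/intel64/bin/mpivars.sh   # put mpiexec.hydra on PATH
    export FI_PROVIDER=mlx                                            # request the Mellanox (mlx) libfabric provider
    hostfile=./hosts                                                  # hypothetical hostfile, one hostname per line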

Expected behavior

  • Get the expected network bandwidth
  • The mlx provider should be detected and used:
    [0] MPI startup(): libfabric provider: mlx
  • LAMMPS should scale on 2 or more nodes.

Other details
Created a cluster with OpenLogic:CentOS:7_7-gen2:latest
Used Standard_HC44rs for the compute nodes
Used Standard_D8s_v3 for the headnode

Additional details

To try to guarantee that Mellanox is installed and set up correctly, I tried to set the headnode to Standard_HC44rs, but it fails with the error below:

[screenshot of the error]

@Smahane added the bug label on Sep 16, 2020
@jithinjosepkl

AN (Accelerated Networking) is not available for HC. Please set "accelerated_networking": false for the HC nodes.
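
For illustration, the setting sits on the HC compute resource in the config file; the surrounding structure below is an assumption about the azurehpc config layout, only the "accelerated_networking": false key comes from this thread:

    "resources": {
        "compute": {
            "vm_type": "Standard_HC44rs",
            "accelerated_networking": false
        }
    }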

@Smahane
Author

Smahane commented Sep 16, 2020

@jithinjosepkl in the config file I shared? Will that fix the interconnect issue?

@jithinjosepkl

Please follow this article to make IMPI pick the mlx provider.

The next update of the CentOS-HPC images will include IMPI 2019 U8, where you won't have to specify this environment variable (it picks up the mlx provider by default).
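
The linked article isn't quoted here; judging by the commands used later in this thread, the environment variable in question is most likely the libfabric provider selection:

    export FI_PROVIDER=mlx   # tell Intel MPI (via libfabric) to use the Mellanox (mlx) provider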

@Smahane
Author

Smahane commented Sep 16, 2020

@jithinjosepkl I would appreciate a working scenario, as the cost is getting very high without us getting a job to run correctly.
I'm interested in a Standard_HC44rs cluster with Mellanox and a Gen2 VM.

Surprisingly, the OpenLogic:CentOS:7_7-gen2:latest image didn't come with any IMPI installed (it sounds like it gets overridden, maybe via the azurehpc scripts?), so I had to install IMPI manually.

When I set up the mlx provider, I get an error because the interconnect is not installed/set up correctly:

[hpcadmin@headnode lammps-avx512]$ export FI_PROVIDER=mlx
[hpcadmin@headnode lammps-avx512]$ mpirun -n 2 -ppn 44 /opt/intel/psxe_runtime/linux/mpi/intel64/bin/IMB-MPI1 pingpong
[0] MPI startup(): Intel(R) MPI Library, Version 2019 Update 8  Build 20200624 (id: 4f16ad915)
[0] MPI startup(): Copyright (C) 2003-2020 Intel Corporation.  All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric version: 1.10.1-impi
Abort(1091215) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(136)........:
MPID_Init(1138)..............:
MPIDI_OFI_mpi_init_hook(1061): OFI addrinfo() failed (ofi_init.c:1061:MPIDI_OFI_mpi_init_hook:No data available)

@garvct
Collaborator

garvct commented Sep 16, 2020

The CentOS images do not contain any pre-installed OFED drivers or MPI libraries (including Intel MPI); try using the CentOS-HPC images instead (e.g. OpenLogic:CentOS-HPC:7_7-gen2:latest). The CentOS-HPC images should be ready to go if you want to use InfiniBand on HC44 SKUs.

@Smahane
Author

Smahane commented Sep 16, 2020

@garvct I'm already using the OpenLogic:CentOS-HPC:7_7-gen2:latest image. Please see my original post.

@jithinjosepkl

@Smahane, based on your config file, you are using OpenLogic:CentOS:7_7-gen2:latest:
"hpc_image": "OpenLogic:CentOS:7_7-gen2:latest",

You need the OpenLogic:CentOS-HPC:7_7-gen2:latest image instead for the MPIs to be pre-installed.
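
In other words, the fix in config.json is a one-line change (sketch; the rest of the file stays as it is):

    "hpc_image": "OpenLogic:CentOS-HPC:7_7-gen2:latest",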

@garvct
Collaborator

garvct commented Sep 16, 2020

Once you have sorted out your image, you can run the IMB-MPI1 benchmark using the scripts in azurehpc/apps/imb-mpi; examples of running IMB-MPI1 with different MPI libraries are provided there.
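
Not the azurehpc scripts themselves, just a rough illustration of a two-node pingpong check with Intel MPI (the IMB-MPI1 path is taken from earlier in this thread; the hostfile is a placeholder):

    export FI_PROVIDER=mlx    # use the Mellanox (mlx) libfabric provider
    mpirun -f ${hostfile} -n 2 -ppn 1 \
        /opt/intel/psxe_runtime/linux/mpi/intel64/bin/IMB-MPI1 pingpong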

@Smahane
Author

Smahane commented Sep 17, 2020

@garvct and @jithinjosepkl thank you for pointing me to this HPC image.

  • Should the headnode have this image too?
  • Should I still set "accelerated_networking": false? And what does it mean?
  • Any instructions on how to add my own post-installation script to the config file?

@xpillons
Collaborator

xpillons commented Sep 17, 2020

@Smahane

  • The headnode can have the HPC image too.
  • Accelerated Networking offloads TCP processing and boosts the frontend NIC, but it is not available on all VM SKUs; see the public documentation here. It doesn't apply to the InfiniBand NIC and is not yet supported on our HPC VM SKUs.
  • You can add your own scripts as follows (a sketch of the resulting config follows this list):
    • Add your scripts to the scripts directory where your config file is stored
    • Add a custom tag on the resources you want your scripts to be applied to
    • In the install array, add a section for each script you want applied, specify whether sudo is required, and add params and deps files if needed
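
A rough sketch of what that could look like in the config file; the key and tag names below are illustrative assumptions based on the description above, not copied from the azurehpc documentation:

    "resources": {
        "compute": {
            "tags": ["custom-scripts"]
        }
    },
    "install": [
        {
            "script": "my_post_install.sh",
            "tag": "custom-scripts",
            "sudo": true
        }
    ]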

@Smahane
Author

Smahane commented Sep 18, 2020

This worked. Thank you everyone

@Smahane closed this as completed on Sep 18, 2020
@Smahane
Author

Smahane commented Sep 22, 2020

Hello @xpillons, @garvct, and @jithinjosepkl. I ran LAMMPS on up to 8 Standard_HC44rs nodes, but I'm having performance issues at 8 nodes:

[screenshot of LAMMPS performance results]

I think one or more nodes are bad. Do you know of any smoke test or script that can help me check the nodes and detect which one is not working well?

The azhpc_install_config/install/11_node_healthchecks.log doesn't show any errors.

Thank you,

@xpillons
Collaborator

@Smahane you can use the MPI PingPong test; we have an example here: https://github.com/Azure/azurehpc/tree/master/apps/imb-mpi
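
To hunt for a single bad node, one approach (illustrative only, not an azurehpc script) is to run pingpong between a reference node and each other node in turn and compare the reported bandwidth. This sketch assumes Intel MPI is loaded and a plain-text hostfile with one hostname per line:

    ref=$(head -n 1 ${hostfile})                  # reference node
    for node in $(tail -n +2 ${hostfile}); do     # every other node
        echo "=== ${ref} <-> ${node} ==="
        mpirun -hosts ${ref},${node} -n 2 -ppn 1 \
            /opt/intel/psxe_runtime/linux/mpi/intel64/bin/IMB-MPI1 pingpong | tail -n 5
    done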
