AWS OFI NCCL v1.7.3

@bwbarrett bwbarrett released this 05 Oct 19:26
· 262 commits to master since this release
v1.7.3-aws

This release is intended only for use on AWS P* instances. A general release that supports other Libfabric networks will be made in the near future. This release includes the following changes:

  • Do not disable LL and LL128 protocols on P5 instances.
  • Add support for g5.48xlarge instance types.
  • Fix a block-in-use leak in the freelist implementation.
  • For NCCL 2.18.5 or later, don't disable NVLS support.
  • Fix a bug in the handling of retry errors from Libfabric in the RDMA transport (P5 instance types).

This release has been tested on P3dn, P4d/P4de, and P5 using the EFA provider in Libfabric.