AWS OFI NCCL v1.8.1

@rajachan rajachan released this 25 Feb 21:40
v1.8.1-aws

This is a bugfix release that requires Libfabric v1.18.0 or later and supports NCCL v2.19.4-1 while maintaining backward compatibility with older NCCL versions (NCCL v2.4.8 and later).

Bug Fixes:

  • Fix an issue with the ID pool's reference counting and allocation.
  • Improve error propagation for failed NCCL requests, so that applications fail early instead of blocking on requests that can never complete.

The plugin has been tested with the following Libfabric providers, using the tests bundled in the source code and the nccl-tests suite:

  • efa

Checksum (sha512) for the release tarball:

4ee21380176d5a76e4af0233ac44d1d46f92fd34941ecfaa104b7567a16cc84503c0abe59e540d36d79675bb3cc443979ed319f39582e301814d0653ea184508  aws-ofi-nccl-1.8.1-aws.tar.gz
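
One way to check a downloaded tarball against the checksum above is sketched here in Python, using only the standard library. The helper names `sha512_of_file` and `matches_expected` are illustrative and not part of the plugin, and the sketch assumes the tarball has already been downloaded to a local path.

```python
import hashlib
import hmac

# Digest copied verbatim from the release notes above.
EXPECTED_SHA512 = "4ee21380176d5a76e4af0233ac44d1d46f92fd34941ecfaa104b7567a16cc84503c0abe59e540d36d79675bb3cc443979ed319f39582e301814d0653ea184508"


def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large tarballs are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA512):
    """Return True if the file's SHA-512 digest equals `expected` (case-insensitive)."""
    actual = sha512_of_file(path)
    return hmac.compare_digest(actual, expected.strip().lower())
```

For example, `matches_expected("aws-ofi-nccl-1.8.1-aws.tar.gz")` returns `True` only if the local file hashes to the published digest; a `False` result suggests a corrupted or altered download.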