v2.4.2-1

sjeaugey tagged this 29 Jan 23:19
Add tree algorithms for allreduce to improve performance at scale.
Add ncclCommAbort() and ncclCommGetAsyncError() to properly handle
network errors and permit recovery.
Detect initial CPU affinity and no longer escape it.
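
To illustrate the idea behind a tree-based allreduce, here is a minimal single-process sketch in C. It is not NCCL's implementation, which these notes do not describe. The function name tree_allreduce_sum and the heap-style layout (the parent of rank r is (r - 1) / 2) are assumptions made for the example. Each array slot stands in for one rank's value. The sketch reduces up the tree to rank 0, then broadcasts the total back down, so the longest dependency chain grows with the tree depth rather than with the number of ranks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: simulates a sum-allreduce over n "ranks" arranged
 * as a binary tree stored heap-style (parent of rank r is (r - 1) / 2).
 * On return, every slot of vals holds the sum of all original values. */
static void tree_allreduce_sum(int *vals, int n)
{
    assert(vals != NULL && n > 0);

    /* Reduce phase: walk from the highest rank down to rank 1. The
     * children of r have larger indices than r, so by the time r is
     * visited its slot already holds the sum of its whole subtree. */
    for (int r = n - 1; r >= 1; r--) {
        vals[(r - 1) / 2] += vals[r];
    }

    /* Broadcast phase: rank 0 now holds the total. Walk upward in rank
     * order so each parent is final before its children copy from it. */
    for (int r = 1; r < n; r++) {
        vals[r] = vals[(r - 1) / 2];
    }
}
```

Calling tree_allreduce_sum on an array holding 1, 2, 3, 4, 5 leaves 15 in every slot. In a real multi-GPU setting the two loops would become parallel sends and receives along the tree edges; the sequential loops here only show the data dependencies.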