@ndryden released this
· 260 commits to master since this release

New features/changes:

  • Host-transfer implementations of standard collectives in the MPI-CUDA backend: AllGather, AllToAll, Broadcast, Gather, Reduce, ReduceScatter, and Scatter.
  • The progress engine is now aware of separate compute streams, which enables better scheduling of non-interfering operations.
  • Experimental RMA Put/Get operations.
  • Improved Aluminum algorithm specification.
  • Non-blocking point-to-point operations.
  • Improved testing and benchmarks.
  • Bug fixes and performance improvements.