v0.2 (commit 5ddc442)

@ndryden released this on Jan 30, 2019 · 68 commits to master since this release

New features/changes:

  • Host-transfer implementations of standard collectives in the MPI-CUDA backend: AllGather, AllToAll, Broadcast, Gather, Reduce, ReduceScatter, and Scatter.
  • The progress engine is now aware of separate compute streams, which enables better scheduling of operations that do not interfere with each other.
  • Experimental RMA Put/Get operations.
  • Improved Aluminum algorithm specification.
  • Non-blocking point-to-point operations.
  • Improved testing and benchmarks.
  • Bugfixes and performance improvements.