@bvanessen bvanessen released this Feb 12, 2019 · 4 commits to master since this release

Fixed the internal version number to properly reflect the release
number. This is required for checking API compatibility.
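
As a rough illustration of why an accurate internal version number matters, the sketch below shows one way a downstream project might check compatibility at compile time. Everything in it is an assumption made for illustration: the `Version` struct, `is_compatible`, and the semantic-versioning-style rule are hypothetical and are not taken from Aluminum's headers.

```cpp
// Hypothetical version triple. The field names are illustrative only.
struct Version {
  int major_v;
  int minor_v;
  int patch_v;
};

// Assumed compatibility rule (semantic-versioning style): the major versions
// must match, and the library must be at least as new as the version the
// caller was written against.
constexpr bool is_compatible(Version lib, Version required) {
  return lib.major_v == required.major_v &&
         (lib.minor_v > required.minor_v ||
          (lib.minor_v == required.minor_v &&
           lib.patch_v >= required.patch_v));
}
```

Under this assumed rule, a library whose internal number lags behind its actual release would be rejected by a caller that requires the newer release, even though the code itself is new enough.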


@ndryden ndryden released this Jan 30, 2019 · 26 commits to master since this release

New features/changes:

  • Host-transfer implementations of standard collectives in the MPI-CUDA backend: AllGather, AllToAll, Broadcast, Gather, Reduce, ReduceScatter, and Scatter.
  • Progress engine is now aware of separate compute streams. This enables better scheduling of non-interfering operations.
  • Experimental RMA Put/Get operations.
  • Improved Aluminum algorithm specification.
  • Non-blocking point-to-point operations.
  • Improved testing and benchmarks.
  • Bug fixes and performance improvements.

@bvanessen bvanessen released this Sep 14, 2018 · 139 commits to master since this release

Aluminum provides a generic interface to high-performance communication libraries, with a focus on allreduce algorithms. It supports blocking, non-blocking, and GPU-aware algorithms, and includes custom implementations of select algorithms optimized for particular situations.
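
For readers unfamiliar with the operation, the following is a minimal single-process sketch of what a sum-allreduce computes: every rank contributes a buffer, and every rank receives the element-wise sum. It does not use Aluminum's API or any of its algorithms; the function name `allreduce_sum` and the representation of ranks as entries of a vector are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference semantics of a sum-allreduce, simulated in one process.
// buffers[r] stands in for the send buffer of rank r (hypothetical layout).
// Returns one result buffer per rank; all of them hold the element-wise sum.
std::vector<std::vector<double>> allreduce_sum(
    const std::vector<std::vector<double>>& buffers) {
  if (buffers.empty()) return {};
  const std::size_t count = buffers.front().size();
  std::vector<double> total(count, 0.0);
  for (const auto& buf : buffers) {
    // Every rank must contribute the same number of elements.
    assert(buf.size() == count);
    for (std::size_t i = 0; i < count; ++i) total[i] += buf[i];
  }
  // Each rank receives its own copy of the reduced data.
  return std::vector<std::vector<double>>(buffers.size(), total);
}
```

A real implementation distributes this work across processes or GPUs and chooses among algorithms to reduce communication cost; the sketch only pins down the result that any such algorithm must produce.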

Features:

  • Blocking and non-blocking algorithms
  • GPU-aware algorithms
  • Implementations/interfaces:
    • MPI: MPI and custom algorithms implemented on top of MPI
    • NCCL: Interface to NVIDIA's NCCL 2 library
    • MPI-CUDA: Custom GPU-aware algorithms