
============================== Release Notes: v0.102 ==============================

Support for new training algorithms:

  • LTFB is now a first-class training algorithm.
  • LTFB now allows multiple metrics. Each trainer favors its local
    model, and a partner model must win every metric to be declared
    the tournament winner (see the sketch after this list).
  • The batched iterative optimizer (sgd_training_algorithm) was
    refactored for consistency.
  • Improved documentation of training algorithm infrastructure.
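
As a concrete illustration of the multi-metric tournament rule, a
minimal sketch in C++ (the function name and signature are
hypothetical, not LBANN's API; only the decision logic follows the
rule described above):

    #include <cstddef>
    #include <vector>

    // Hypothetical sketch of the LTFB multi-metric rule: the partner
    // model must beat the local model on every metric to win the
    // tournament; any tie or loss keeps the local model.
    bool partner_wins_tournament(
        const std::vector<double>& local_scores,
        const std::vector<double>& partner_scores,
        const std::vector<bool>& higher_is_better) {
      for (std::size_t i = 0; i < local_scores.size(); ++i) {
        const bool partner_better =
            higher_is_better[i] ? partner_scores[i] > local_scores[i]
                                : partner_scores[i] < local_scores[i];
        if (!partner_better) { return false; }  // local model favored
      }
      return true;
    }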

Support for new network structures:

  • ATOM WAE model - character-based Wasserstein Autoencoder
  • Community GAN model for graph data sets

Support for new layers:

  • "DFTAbs" layer that computes the absolute value of the channel-wise
    DFT of the input data
  • Added support for 3D matrix multiplication
  • Added scatter and gather neural network layers (see the sketch
    after this list)
  • CPU-based GRU layers using oneDNN
  • Added batch-wise reduce-sum
  • ArcFace loss
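
For reference, the scatter and gather layers implement the usual
index-based semantics. A minimal 1-D sketch in C++ (illustrative
only, not LBANN's implementation; the real layers operate on
distributed tensors):

    #include <cstddef>
    #include <vector>

    // Gather: out[i] = values[indices[i]]
    std::vector<float> gather(const std::vector<float>& values,
                              const std::vector<int>& indices) {
      std::vector<float> out(indices.size());
      for (std::size_t i = 0; i < indices.size(); ++i) {
        out[i] = values[indices[i]];
      }
      return out;
    }

    // Scatter: out[indices[i]] = values[i] into a zero-filled buffer
    std::vector<float> scatter(const std::vector<float>& values,
                               const std::vector<int>& indices,
                               std::size_t out_size) {
      std::vector<float> out(out_size, 0.0f);
      for (std::size_t i = 0; i < indices.size(); ++i) {
        out[indices[i]] = values[i];
      }
      return out;
    }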

Python front-end:

  • Added 3D U-Net model
  • Added CosmoFlow model
  • Ported CANDLE Pilot1 models
  • Added support for nvprof
  • Added channel-wise fully-connected layer
  • Added support for non-square kernels, padding, stride, and
    dilation for the convolution module
  • Added support for the OpenMPI launcher

Performance optimizations:

  • Used the cuDNN 8 RNN API and CUDA Graphs in the GRU layer
  • Cached CUDA Graphs for each active mini-batch size (see the sketch
    after this list)
  • Tuned performance of the slice, concatenate, and tessellate layers
    on ARM processors
  • Parallelized computation of Gaussian random numbers
  • Optimized the tessellate, concatenate, and slice layers on CPU
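
The CUDA Graphs cache can be pictured as one executable graph per
active mini-batch size, captured on first use and replayed
afterwards. A minimal sketch using the standard CUDA Graph API (the
cache layout and function are assumptions for illustration, not
LBANN's code):

    #include <cuda_runtime.h>
    #include <map>

    std::map<int, cudaGraphExec_t> graph_cache;

    void launch_gru(int mini_batch_size, cudaStream_t stream,
                    void (*run_kernels)(int, cudaStream_t)) {
      auto it = graph_cache.find(mini_batch_size);
      if (it == graph_cache.end()) {
        // First use of this mini-batch size: capture the kernels.
        cudaGraph_t graph;
        cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
        run_kernels(mini_batch_size, stream);
        cudaStreamEndCapture(stream, &graph);
        cudaGraphExec_t exec;
        cudaGraphInstantiate(&exec, graph, nullptr, nullptr, 0);
        cudaGraphDestroy(graph);  // exec keeps what it needs
        it = graph_cache.emplace(mini_batch_size, exec).first;
      }
      // Replay the cached graph; avoids per-kernel launch overhead.
      cudaGraphLaunch(it->second, stream);
    }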

Experiments & Applications:

  • Added experiment scripts for ATOM cWAE Gordon Bell simulations
  • LBANN-ATOM model inference and analysis

Internal features:

  • Wrapper classes for CUDA Graphs API
  • Elementary examples of using complex numbers
  • cuDNN handles are now wrapped in RAII management classes (see the
    sketch after this list)
  • Improved HWLOC compatibility for v1.11 and v2.x
  • Added an enum type for visitor hooks that will eventually be used
    to allow callbacks or other visitors to operate at user-defined
    hook points
  • Changed checkpoint logic to checkpoint at the start of epochs
    and changed the naming scheme to use the callback phase (visitor
    hook) in the name rather than the current execution context.
  • Added in-memory binary model exchange for LTFB.
  • Added support for ROCm and MIOpen
  • Added support for oneDNN
  • Updated the Bamboo test environment to use a local executable
    rather than hard-coded executables
  • Overhauled and refactored serialization throughout the code base
    to use the Cereal serialization library
  • Significant cleanup and refactoring of the code base to improve
    compile times. The code is moving toward the standard split of
    headers between declaration and implementation (for templated
    code), with specific focus on the serialization functions and the
    comm class. Reduced dependencies by removing overreaching header
    inclusions.
  • The relationship between execution_contexts and
    training_algorithms was clarified; there is still work to do here.
  • Added DistConv tests for both convolution and pooling layers
  • Added support for padding in the distributed embedding layer
  • Added dump model graph callback
  • Added perturb learning rate callback
  • Added batched inference algorithm
  • Switched ATOM tests to use CPU embedding and tessellate layers to
    minimize noise
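
As an illustration of the RAII pattern now used for cuDNN handles, a
minimal sketch (the class name is hypothetical; LBANN's wrappers
differ in detail):

    #include <cudnn.h>

    // The handle is created on construction and destroyed on
    // destruction, so it cannot leak on early returns or exceptions.
    class CudnnHandle {
    public:
      CudnnHandle() { cudnnCreate(&handle_); }
      ~CudnnHandle() { cudnnDestroy(handle_); }
      CudnnHandle(const CudnnHandle&) = delete;  // non-copyable
      CudnnHandle& operator=(const CudnnHandle&) = delete;
      cudnnHandle_t get() const noexcept { return handle_; }
    private:
      cudnnHandle_t handle_;
    };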

I/O & data readers:

  • Experimental data reader that generates graph random walks with
    HavoqGT
  • Added explicit tournament execution mode
  • Added support to split training data reader into validation and
    tournament readers
  • node2vec data reader

Build system:

  • Hydrogen v1.5.0+
  • Aluminum v0.5.0+
  • DiHydrogen v0.2.0 is required
  • C++14 or newer standard with CUDA (CMake: "-DCMAKE_CUDA_STANDARD=14")
  • OpenCV is now an optional dependency via CMake "LBANN_WITH_VISION"
  • CNPY is now an optional dependency via CMake "LBANN_WITH_CNPY"
  • Added support in the build_lbann.sh script for concretizing extra
    packages with the primary LBANN installation
  • Added features to the build script that set up and configure the
    build environment, then stop and allow the user to manually add
    extra packages
  • Added a set of user-focused build scripts that use the main
    build_lbann.sh script to set up good defaults on known systems
  • Added application-specific build scripts for user communities such
    as ATOM
  • Added support for pulling from Spack mirrors and setting them up
  • Split embedded Python support from Python Front End
  • Switched the Spack-based build script to use Spack's clingo
    concretizer

Bug fixes:

  • Fixed a bug where LBANN didn't set the Hydrogen RNG seed
  • Fixed the Python front end (PFE) for both the CosmoFlow and U-Net
    models and addressed issues in the data reader and data
    coordinator.
  • Fixed the HDF5 data reader to properly specify the supported I/O
    types
  • Fixed calculation of the linearized response size
  • Fixed the data coordinator's interface to input_layer
  • Fixed error with deterministic execution of dropout layers

Retired features:

  • Removed the deprecated JAG leader mode, which was made obsolete
    when the data reader moved into the data coordinator
  • Removed the deprecated partitioned data reader modes that were used
    to partition and overlap data sets for multiple models
  • Removed deprecated ActivationDescriptor class