============================== Release Notes: v0.102 ==============================

Support for new training algorithms:

  • LTFB is now a first-class training algorithm.
  • LTFB now supports multiple metrics. Each trainer favors its local
    model, and a partner model must win every metric to be declared
    the tournament winner.
  • The batched iterative optimizer (sgd_training_algorithm) was
    refactored for consistency.
  • Improved documentation of training algorithm infrastructure.

Support for new network structures:

  • ATOM WAE model - character-based Wasserstein Autoencoder
  • Community GAN model for graph data sets

Support for new layers:

  • "DFTAbs" layer that computes the absolute value of the channel-wise
    DFT of the input data
  • Added support for 3D matrix multiplication
  • Added scatter and gather neural network layers (see the sketch after
    this list)
  • CPU-based GRU layers using oneDNN
  • Added batch-wise reduce-sum
  • ArcFace loss
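
  For illustration, a minimal Python front-end sketch of how the new
  scatter/gather layers might be wired up. The layer names (lbann.Gather,
  lbann.Scatter) and their arguments are assumptions based on LBANN's usual
  naming conventions, not a verbatim API reference:

      # Hypothetical sketch; layer names and arguments are assumptions,
      # not a verbatim API reference.
      import lbann

      values  = lbann.Input()                              # tensor of values
      indices = lbann.Constant(value=0, num_neurons='8')   # placeholder index tensor

      # Gather entries of `values` at the positions given by `indices`
      gathered = lbann.Gather(values, indices)

      # Scatter `values` into a larger output tensor at the same positions
      scattered = lbann.Scatter(values, indices, dims='32')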

Python front-end:

  • Added 3D U-Net Model
  • Added Cosmoflow Model
  • Ported CANDLE Pilot1 models
  • Support nvprof
  • Added channelwise fully connected layer
  • Added support for non-square kernels, padding, stride, and
    dilation for the convolution module (illustrated in the sketch after
    this list)
  • Support for OpenMPI launcher
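
  To illustrate the convolution module changes above, a hedged sketch using
  the Python front-end's 2-D convolution module; the module name
  (lbann.modules.Convolution2dModule) and the list-valued arguments shown
  here are assumptions about the interface, not a verbatim reference:

      # Hypothetical sketch: non-square kernel, padding, stride, and
      # dilation passed to the convolution module. Names are assumptions.
      import lbann
      import lbann.modules

      x = lbann.Input()
      conv = lbann.modules.Convolution2dModule(
          out_channels=16,
          kernel_size=[3, 5],   # non-square kernel
          stride=[1, 2],
          padding=[1, 2],
          dilation=[1, 1],
      )
      y = conv(x)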

Performance optimizations:

  • Use cuDNN 8 RNN API and CUDA Graphs in GRU layer
  • Cache CUDA Graphs for each active mini-batch size
  • Tuned performance of slice, concatenate, and tessellate layers on
    ARM processors
  • Parallelized computation of Gaussian random numbers
  • Optimized tessellate, concatenate, and slice layers on CPU

Experiments & Applications:

  • Added experiment scripts for ATOM cWAE Gordon Bell simulations
  • LBANN-ATOM model inference and analysis

Internal features:

  • Wrapper classes for CUDA Graphs API
  • Elementary examples of using complex numbers
  • cuDNN handles are now wrapped in RAII management classes
  • Improved HWLOC compatibility for v1.11 and v2.x
  • Added an enum type of visitor hooks that will eventually be used to
    allow callbacks or other visitors to operate at user-defined hook
    points
  • Changed checkpoint logic to checkpoint at the start of epochs
    and changed the naming scheme to use the callback phase (visitor
    hook) in the name rather than the current execution context.
  • Added in-memory binary model exchange for LTFB.
  • Added support for ROCm and MIOpen
  • Added support for oneDNN
  • Updated the Bamboo test environment to use locally built executables
    rather than hard-coded ones
  • Overhauled and refactored serialization throughout the code base to
    use the Cereal serialization library
  • Significant cleanup and refactoring of the code base to improve
    compile times. Code is being moved to adhere to the standard split
    between header declarations and implementations (for templated code),
    with particular focus on serialization functions and the comm class.
    Reduced dependencies by removing overreaching header inclusions.
  • Clarified the relationship between execution_contexts and
    training_algorithms; further work remains here.
  • Added DistConv tests for both convolution and pooling layers
  • Support padding in distributed embedding layer
  • Added dump model graph callback
  • Added perturb learning rate callback
  • Added batched inference algorithm
  • Switched ATOM tests to use CPU embedding and tessellate layers to
    minimize noise

I/O & data readers:

  • Experimental data reader that generates graph random walks with
    HavoqGT
  • Added explicit tournament execution mode
  • Added support to split training data reader into validation and
    tournament readers
  • node2vec data reader

Build system:

  • Hydrogen v1.5.0+
  • Aluminum v0.5.0+
  • DiHydrogen v0.2.0 is required
  • C++14 or newer standard with CUDA (CMake: "-DCMAKE_CUDA_STANDARD=14")
  • OpenCV is now an optional dependency via CMake "LBANN_WITH_VISION"
  • CNPY is now an optional dependency via CMake "LBANN_WITH_CNPY"
  • Added support in the build_lbann.sh script for concretizing extra
    packages with the primary LBANN installation
  • New features in the build script to set up and configure the build
    environment, then stop and allow the user to manually add extra
    packages
  • Added a set of user-focused build scripts that use the main
    build_lbann.sh script to set up good defaults on known systems
  • Added application-specific build scripts for users, such as ATOM
  • Added support for pulling from Spack mirrors and setting them up
  • Split embedded Python support from Python Front End
  • Switched Spack-based build script to use Spack's clingo concretizer

Bug fixes:

  • Fixed a bug where LBANN didn't set the Hydrogen RNG seed
  • Fixed the Python front end for both the CosmoFlow and U-Net models,
    and addressed issues in the data reader and data coordinator.
  • Fixed the HDF5 data reader to properly specify the supported I/O
    types
  • Fixed calculation of the linearized response size
  • Fixed the data coordinator's interface to input_layer
  • Fixed error with deterministic execution of dropout layers

Retired features:

  • Removed deprecated JAG leader mode which was made obsolete when the
    data reader moved into the data coordinator
  • Removed the deprecated partitioned data reader modes that were used
    to partition and overlap data sets for multiple models
  • Removed deprecated ActivationDescriptor class

============================== Release Notes: v0.101 ==============================

Support for new network structures:

  • ATOM VAE model
  • Graph neural networks
  • Graph Convolutional Networks (GCN)
  • 3D U-Net Model

Support for new layers:

  • Implemented optimized GRU layer using cuDNN kernel
  • Graph Layers: GCN, GIN, Graph, GatedGraph

Python front-end:

  • Support for Graph and Graph Convolutional Networks
  • Added support for OLCF data center (Summit)

Performance optimizations:

  • Optimize CUDA kernel for tensor reordering in GRU layer
  • Enabled TensorCore optimization for GRU layer
  • GCN and Graph layers also have a faster dense variant that uses only matrix multiplication

Model portability & usability:

  • Added Users Quickstart section to documentation including PyTorch
    to LBANN mini-tutorial
  • Added section on callbacks with detailed instructions on summarize
    images callback

Internal features:

  • Support for double data type in distributed embedding layer
  • Support for large number of channels in GPU batchnorm layer
  • Modified LTFB so that NaNs lose tournaments
  • Improved numerical stability of reconstruction loss in ATOM VAE
    model
  • Skip bad gradients in Adam

I/O & data readers:

  • Added support for ImageNet data reader to use sample lists
  • Refactored sample list code to be more flexible and generalize
    beyond JAG data reader
  • Added support for slab-based I/O in HDF5 data reader required by
    DistConv implementations of CosmoFlow 3D volumes
  • Extended slab-based HDF5 data reader to support labels and
    reconstruction modes for use with U-Net architecture

Datasets:

  • Added two graph datasets (MNIST and PROTEINS)

Build system and Dependent Libraries:

  • Hydrogen 1.4.0
  • Aluminum 0.4.0
  • Spack v0.15.4+ (Requires new format for environments)
  • cuDNN 8.0.2
  • Require C++14
  • Added Spack build support for OLCF data center (Summit)

Bug fixes:

  • Properly reset data coordinator after each LTFB round
  • Fixed bug in weights proxy when weights buffer is reallocated
  • Fixed bounds checking in the SMILES data reader and simple LTFB data
    distribution
  • Eliminated a race condition observed in the ATOM VAE model with the
    SMILES data reader. Added a barrier after each data store mini-batch
    exchange to avoid a race between non-blocking sends and receives and
    later GPU kernel communication.

============================== Release Notes: v0.100 ==============================

Support for new network structures:

  • 3D molecular generation models for Metal Organic Frameworks from the CoRE MOF Database.
  • 3D CosmoFlow Model
  • DenseNet
  • ATOM LSTM model
  • RAS state classifier
  • node2vec
  • Transformer and other attention-based models
  • ExaGAN (formerly CosmoGAN)
  • MaCC ICF surrogate model

Applications:

  • Created a directory of example applications, deprecating the "model zoo" directory

Support for new layers:

  • Embedding layer
  • Distributed embedding layer
  • Channel-wise scale/bias layer
  • Entry-wise scale/bias layer
  • Gated-Recurrent Units (GRU)
  • Entry-wise batchnorm
  • Argmax, Argmin, and one-hot layers
  • Layer norm
  • Deconvolution layer (transposed convolution)
  • Layers for channel-wise operations (channel-wise fully-connected, channel-wise softmax, channel-wise scale/bias, instance norm)
  • Matrix multiply layer (see the sketch after this list)
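
  As a flavor of the matrix multiply layer (see also the transpose support
  noted under Internal features below), a hedged Python front-end sketch;
  the layer name lbann.MatMul and its transpose flags are assumptions
  rather than an exact API listing:

      # Hypothetical sketch of the matrix multiply layer. The layer name
      # and transpose flags are assumptions, not a spec.
      import lbann

      x = lbann.Input()                   # assume a 32-element sample
      a = lbann.Reshape(x, dims='4 8')    # interpret it as a 4x8 matrix
      b = lbann.Reshape(x, dims='8 4')    # and as an 8x4 matrix

      c = lbann.MatMul(a, b)                       # 4x4 product
      d = lbann.MatMul(a, a, transpose_b=True)     # A * A^T, also 4x4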

Python front-end:

  • Can now configure contrib launcher with environment variables
  • Added NERSC compute center
  • Per-layer specification of compute device (CPU or GPU); see the
    sketch after this list
  • Option to write custom batch scripts with Python front-end
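
  A small hedged sketch of the per-layer device placement mentioned above;
  the device keyword shown here is an assumption about the generated layer
  arguments:

      # Hypothetical sketch: pinning individual layers to CPU or GPU from
      # the Python front end. The `device` keyword is an assumption.
      import lbann

      x = lbann.Input()
      y = lbann.FullyConnected(x, num_neurons=128, device='GPU')  # run on GPU
      z = lbann.Softmax(y, device='CPU')                          # force CPU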

Performance optimizations:

  • Parallelized Python data reader with "multiprocessing" module
  • Fused batchnorm statistics allreduces in forward and backward prop
  • Tuned concatenate and slice layers
  • Dynamically allocate and free memory for layer error signals (halves LBANN's memory footprint)

Model portability & usability:

  • Bamboo tests for individual layers

Internal features:

  • Added support for DistConv features (distributed, generalized,
    parallel convolution)
  • Added support for NVSHMEM 1.0 API (used in distributed embedding
    layer and DistConv halo exchange)
  • Support for multiple data types per model (per-layer)
  • Support for per-layer mixed-precision weight training and inference,
    including per-weight-object and objective-function mixed precision
  • Improved how and when the RNGs are initialized
  • Callback to dump images to TensorBoard
  • Callback to save model weights (useful to export to PyTorch)
  • Callback to save top K models (LTFB)
  • Improved run-to-run reproducibility by initializing weights in alphabetical order
  • Moved models from model_zoo directory to applications directory
  • Cleanup and refactoring of callbacks and layer instantiation
  • Grouped batchnorm statistics
  • Callback to print model description
  • Refactored trainer and training-state out of the model class
  • Support for transposing data in matrix multiply layers
  • Added DiHydrogen tensor and DistConv library
  • Added parallel strategy to layer class to support DistConv
  • LBANN inference mode supports loading models from multiple directories
  • Cleanup of checkpoint and restart logic

I/O & data readers:

  • Added in-memory data store that caches samples in CPU memory. It can be loaded
    during the first epoch or preloaded
  • Added new "transform" data preprocessing ingestion pipeline
  • Added sample list format for specifying data sets
  • Introduced data coordinator that manages data readers and extracts them from
    the input layers
  • Data store is able to checkpoint / spill its contents to local disk
  • Data reader for SMILES strings

Build system:

  • Hydrogen 1.3.4
  • Aluminum 0.3.3
  • Improved documentation on read the docs (RTD)
  • Robust support for using Spack as a build system around CMake
  • Identified compute centers for specifying build and run dependencies
  • Added Catch2-based tests

Bug fixes:

  • Fixed path resolution for dump weights, save model, and checkpoint callbacks
  • Added mutexes for preloading the data store
  • Fixed the LTFB exchange to include all ADAM optimizer state
  • Fixed the mapping of I/O RNGs to I/O processing threads to ensure
    consistent and correct multi-threaded performance

Retired features:

  • The moving MNIST data reader has been replaced by the Python data reader
  • The ASCII data reader is deprecated

============================== Release Notes: v0.99 ==============================

Support for new training algorithms:

  • Improvements to LTFB infrastructure (including transfer of SGD and Adam hyperparameters)

Support for new network structures:

  • Support for Wide ResNets

Support for new layers:

Python front-end:

  • Python front-end for generating neural network architectures (lbann
    namespace), including layers, objective functions, callbacks, metrics,
    and optimizers (a minimal sketch follows this list)
  • Python interface for launching (SLURM or LSF) jobs on HPC systems
  • Support for running LBANN experiments and capturing experimental output
  • Network templates for AlexNet, LeNet, arbitrary ResNet models, and Wide ResNet models
  • Python scripts for LeNet, AlexNet, and (Wide) ResNets in the model zoo.
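
  To give a flavor of the front-end described in this list, a minimal,
  hedged sketch of building and launching a small model from Python. Class
  and function names follow the lbann namespace conventions but should be
  treated as assumptions rather than an exact API listing:

      # Hypothetical end-to-end sketch of the Python front-end. Names and
      # signatures are assumptions, not a spec; the data reader is omitted.
      import lbann
      import lbann.contrib.launcher

      # Layer graph: the input layer is split into samples and labels
      input_ = lbann.Input()
      images = lbann.Identity(input_)
      labels = lbann.Identity(input_)
      x = lbann.FullyConnected(images, num_neurons=128)
      x = lbann.Relu(x)
      x = lbann.FullyConnected(x, num_neurons=10)
      probs = lbann.Softmax(x)
      loss = lbann.CrossEntropy(probs, labels)
      acc = lbann.CategoricalAccuracy(probs, labels)

      # Model, metric, callback, and optimizer objects
      model = lbann.Model(
          epochs=10,
          layers=lbann.traverse_layer_graph(input_),
          objective_function=loss,
          metrics=[lbann.Metric(acc, name='accuracy')],
          callbacks=[lbann.CallbackPrint()],
      )
      opt = lbann.SGD(learn_rate=0.01, momentum=0.9)

      # Launch on SLURM or LSF (data reader construction not shown)
      # lbann.contrib.launcher.run(model, data_reader, opt, nodes=1)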

Performance optimizations:

  • GPU implementation of RMSprop optimizer.
  • cuDNN convolution algorithms are determined by empirically measuring
    performance rather than using heuristics.
  • Avoid setting up unused bias weights.
  • Perform gradient accumulations in-place when possible.

Model portability & usability:

Internal features:

  • Weight gradient allreduces are in-place rather than on a staging buffer.
  • Fully connected and convolution layers only create bias weights when
    needed.
  • Optimizer exposes gradient buffers so they can be updated in-place.
  • Added callback support to explicitly save model
  • Min-max metric for reporting on multiple LTFB trainers
  • Cleanup of Hydrogen interface to match Hydrogen v1.2.0
  • Added type-erased matrix class for internal refactoring
  • Made CUB always log performance-critical events

I/O & data readers:

  • Python data reader that interacts with an embedded Python session
    (see the sketch after this list)
  • Optimized data store to provide preload option
  • Extended data store to operate with Cosmoflow-numpy data reader
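
  A hedged sketch of the kind of user module the Python data reader can
  consume: LBANN is pointed at a Python module and calls user-specified
  functions for a sample, the sample count, and the sample dimensions. The
  function names below are placeholders chosen for this sketch, not fixed
  names required by LBANN:

      # Hypothetical user module for the Python data reader. The functions
      # to call are named in the data reader prototext; these names are
      # placeholders for this sketch.
      import numpy as np

      _num_samples = 1000
      _dims = 28 * 28

      def get_sample(index):
          """Return one flattened sample as 32-bit floats."""
          rng = np.random.default_rng(index)
          return rng.random(_dims, dtype=np.float32)

      def num_samples():
          return _num_samples

      def sample_dims():
          return (_dims,)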

Build system:

  • Added documentation for how users can use Spack to install LBANN
    either directly or via environments.
  • Conduit is a required dependency.
  • Provided Spack environment for installing LBANN as a user
  • Improved documentation on lbann.readthedocs.io
  • CMake installs a module file in the installation directory that
    sets up PATH and PYTHONPATH variables appropriately

Bug fixes:

  • Models can now be copied or setup multiple times.
  • Fixed incorrect weight initialization with multiple trainers.
  • Updated I/O random number generators to be C++ thread safe (rather than OpenMP)
  • Added an I/O random number generator for preprocessing that is independent
    of the data sequence RNG.
  • Fixed initialization order of RNGs and multiple models / trainers.
  • General fixes for I/O and LTFB interaction.

Retired features:

  • "Zero" layer (hack for early GAN implementation).
  • Removed data reader specific implementations of data store (in favor of Conduit-based
    data store)

============================== Release Notes: v0.98.1 ==============================

Bug Fixes:

  • Added missing header

============================== Release Notes: v0.98 ==============================

Support for new training algorithms:

  • Hyperparameter exploration with Adam optimizers
  • LTFB can perform inter-trainer communication via checkpoint files

Support for new network structures:

  • Wasserstein autoencoder

Support for new layers:

  • Squared difference
  • Tessellate
  • Clamp

Performance optimizations:

  • Added support for node-local batch normalization

Model portability & usability:

  • Added prototype Python front end for generating model prototext files
    that is inspired by PyTorch's interface
  • Created Python library of networks and modules used for prototext
    generation
  • Support for exporting and importing models in ONNX format
  • Output dumping callback exports in CSV, TSV, .npy, or .npz formats
  • Added dedicated inference front end

Internal features:

  • Expanded layer documentation
  • Utility class for nicely formatted descriptions
  • Switched to using ReadTheDocs for documentation, which uses a
    combination of Doxygen, Breathe, and Sphinx
  • Provided distinction between trainer and model objects
  • Added a generic factory template
  • Refactored front-end functionality into library class

I/O & data readers:

  • Overhauled the I/O system to use an independent background thread
    pool for fetching data
  • Added support for data set metadata file that provides both schema
    and normalization values unique to a given data set. Demonstrated
    use in JAG Conduit data reader.
  • Added support for an index-list-based approach for describing the
    samples to use in training and testing. Note that this is currently
    only supported in the JAG Conduit data reader
  • Created a general-purpose data store that operates on generic
    Conduit node data structures. This should provide an extensible
    and generic approach for holding and exchanging data between
    epochs. Note that this is currently only supported in the JAG
    Conduit data reader.

Build system:

  • Support for using Spack environments feature when building

Retired features:

  • Removed deprecated objective functions and target layer
  • Removed the distributed I/O buffer layer; it has been superseded by
    the background I/O threads

============================== Release Notes: v0.97.1 ==============================

Bug Fixes:

  • Removed deprecated header file include statement

============================== Release Notes: v0.97 ==============================

Support for new layers:

  • Mean absolute error and L1 norm
  • GPU implementation for activation layers
  • Log sigmoid and softsign
  • Channel-wise mean (temporary kludge)

Model portability & usability:

  • Hints for layer output dimensions
  • Confusion matrix callback
  • Metric checking callback

Internal features:

  • Removed target-layer-based features from model zoo
  • Layer unit tests check for expected output values

Retired features:

  • Smooth ReLU, bent identity, and swish layers
  • Target-layer-based metrics
  • Target-layer-based models (sequential, greedy layer-wise autoencoder, Siamese)

============================== Release Notes: v0.96 ==============================

Support for new layers:

  • Log softmax
  • Basic math functions
  • Weights layer, which outputs a weights tensor
  • L2 norm squared
  • Binary cross entropy loss and sigmoid binary cross entropy loss
  • Boolean accuracy, Boolean false negative rate, Boolean false positive rate
  • Bilinear resize
  • Variance and covariance
  • Dilated and grouped convolution (GPU only)

Performance optimizations:

  • Optimized GPU model-parallel softmax layer

Model portability & usability:

  • Option for weight initialization with user-provided list of values
  • Callback to save any layer output as an image

Internal features:

  • Provide compile time option to selectively disable OpenMP for data fetching loop
  • Thrust calls no longer involve the default CUDA stream

I/O & data readers:

  • Reworked jag_conduit data reader:
    • Support the updated JAG simulation data output format
    • Use direct HDF5 I/O for on-demand data loading with Conduit
    • Ingest a unique set of data files per instance
    • Allow exclusive data partitioning among multiple trainers
    • Multi-channel images
    • Normalization of JAG data
    • Interface to select images of specific views and time indices
    • Interface to describe how to slice JAG data
    • Avoid redundant fetching and incoherent random number pulls in the group of local data readers
  • Improved threading performance by preallocating scratch space for loading samples

Build system:

  • Support cross-compilation configurations in superbuild and SetupProtobuf

============================== Release Notes: v0.95 ==============================

Support for new training algorithms:

  • Generative Adversarial Networks (GAN)

Support for new network structures:

  • Variational Autoencoders
  • GAN
  • CycleGAN
  • Combined Autoencoders with CycleGAN
  • Deep Recurrent Attention Model (DRAM), Ba et al. (2015)
  • Video Recurrent Attention Model (VRAM)

Support for new layers:

  • Optimized Top-K accuracy (CPU, GPU)
  • Crop (CPU, GPU)
  • Sort (CPU, GPU) both ascending and descending order
  • Absolute value (CPU, GPU)
  • Mean-squared (CPU, GPU)
  • Top-K categorical accuracy (CPU, GPU)
  • Cross-entropy (CPU, GPU)
  • Stop gradient (CPU, GPU)

Performance optimizations:

  • Use pinned memory for CPU activation matrices
  • Non-blocking GPU computation of objective functions and metrics
  • Refactored weight matrices and weight initialization
  • Manage GPU workspace buffers with memory pool
  • Slice and concatenation layers emit matrix views when possible
  • Used more fine-grained asynchronous calls when using Aluminum Library
    • Minimized GPU stream synchronization events per call
  • Improved / minimized synchronization events when using a single GPU
  • Fixed GPU workspace size
  • GPU implementation of Adagrad optimizer
  • GPU model-parallel softmax
  • Optimized local CUDA kernel implementations
  • Support for distributed matrices with arbitrary alignment

Model portability & usability:

  • Keras to LBANN prototext conversion tool

Internal features:

  • Support for multiple objective functions and metrics per network with arbitrary placement
    • Objective functions represented as layers
    • Metrics represented as layers
    • Introduced evaluation layer construct
  • Ability to freeze specific layers for pre-training / fine-tuning
  • Refactored tensor setup in setup, forward prop, and back prop
  • Layers store matrices in private smart pointers
  • Model automatically inserts evaluation layers where needed
  • Copy Layer activations between models
  • Annotated GPU profiling output with training phases
  • Fixed initialization of Comm object and Grid objects when using multiple models
  • General code cleanup, refactoring, and various bug fixes.
  • All layers overwrite error signal matrices
  • NCCL backend is now implemented via Aluminum Library
  • MPI calls are routed through the LBANN Comm object into Hydrogen or Aluminum
  • Provide runtime statistics summary from every rank
  • Reworked LBANN to use Hydrogen to manage GPU memory
  • GPU allocations now via CUB memory pool
  • Fixed Spack build interaction with Hydrogen Library

I/O & data readers:

  • Support for Conduit objects with HDF5 formatting
  • In-memory and locally offloaded data store
    • Data Store can hold the entire training set in memory (or node-local storage)
    • Data store will shuffle data samples between epochs and present samples to input layer
  • Updated synthetic data reader
  • Modified data readers to handle bad samples in JAG conduit data
  • Reworked the I/O layers (input and target) so that the input layer produces both the
    sample and label / response if necessary.
    • Target layer is being deprecated
  • Updated image data reader to use cv::imdecode to accelerate image load times
  • Allow users to specify an array of data sources for the independent/dependent
    variables via prototext