
Version 0.6.0

@jackalcooper released this 07 Jan 06:06
eabe79e

OneFlow v0.6.0 Release Notes

OneFlow has been open source for 528 days since July 31, 2020, and today OneFlow v0.6.0 is out. Welcome to try OneFlow v0.6.0; we would love to hear your feedback!

This version mainly updates three parts: the framework, models, and OneFlow-ONNX. Highlights include:

  • Performance optimizations in static graphs, dynamic graphs, operators, memory occupation, and more
  • A larger number of common operators
  • Improvements in static graphs and ConsistentTensor
  • Serving functionality as Nvidia Triton's backend
  • Richer visual pre-trained models similar to torchvision and timm
  • Better OneFlow-ONNX conversion functionality

The following are the detailed release notes.

Framework

1. Performance Optimization of nn.Graph

  • Compared to v0.5.0, nn.Graph in v0.6.0 delivers a 10% training speedup on models such as ResNet (AMP) and WDL (a minimal nn.Graph sketch follows this list)
    • Optimized nn.Graph's performance in high-frequency iterative training scenarios
    • Redesigned nn.Graph's scheduling instructions and refactored the interaction logic between the Actor Graph and the Eager VM so that the Graph's runtime execution is asynchronous and overlaps with Python-side input/output Tensor handling as much as possible
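
For context, training with nn.Graph means wrapping an eager model and optimizer in a static graph. A minimal sketch (the model, loss, and random data here are illustrative, not from the release):

```python
import oneflow as flow
import oneflow.nn as nn

class TrainGraph(nn.Graph):
    def __init__(self, model, loss_fn, optimizer):
        super().__init__()
        self.model = model
        self.loss_fn = loss_fn
        self.add_optimizer(optimizer)

    def build(self, x, y):
        # build() declares one training step; OneFlow compiles it once and
        # then schedules the compiled plan asynchronously at runtime.
        loss = self.loss_fn(self.model(x), y)
        loss.backward()
        return loss

model = nn.Linear(784, 10)
optimizer = flow.optim.SGD(model.parameters(), lr=0.1)
graph = TrainGraph(model, nn.CrossEntropyLoss(), optimizer)
loss = graph(flow.randn(32, 784), flow.randint(0, 10, (32,)))
```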

2. Performance Optimization of Eager

  • Compared to v0.5.0, OneFlow v0.6.0's Eager training speed increases dramatically in small-batch scenarios (the snippet after this list shows the kinds of calls affected)
    • Optimized the scheduling logic for virtual machines
    • Optimized get/set item
    • Optimized tensor.numel()
    • Optimized oneflow.Size()
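
The items above are everyday eager-mode calls; a trivial sketch:

```python
import oneflow as flow

t = flow.ones(4, 4)
t[1:3, :] = 0.5   # set item (optimized in this release)
v = t[0, 0]       # get item
n = t.numel()     # element-count query
s = t.size()      # returns an oneflow.Size
```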

3. Performance Optimization of Operators

  • Optimized some operators that affect the performance of new models, significantly improving the training speed of those models

4. Performance Optimization of Eager's Memory Occupation

  • Optimized some operators' memory occupation during network training, enabling the same computing device to run larger models or more data
    • Optimized the backward memory occupation of broadcast binary operators
    • Optimized the backward memory occupation of Slice operator
    • Optimized the memory occupation of LayerNorm operator

5. More Useful Features to Static Computation Graph (nn.Graph)

  • The newly added features concern the efficiency, debugging, completeness, and usability of static graphs (a short usage sketch follows this list)
    • To help the debugging of static graphs, we added the following features:
      • debug mode: graph.debug(1) prints more information about the graph composition
      • Provided the environment variable ONEFLOW_DEBUG_PASS to show the changes in the computation graph before and after compile-time optimization
      • Added human-readable thread-naming information to the Nsight profile to make it easier to locate target threads
      • Added many static graph test cases, plus automatic nn.Graph tests that accompany the Eager tests
    • Provided graph.save() and load() interfaces to support the deployment of models (Serving) using nn.Graph
    • To enable AMP acceleration on GPUs with TensorCore, the environment variable ONEFLOW_ENABLE_NHWC is provided to make CNN-related operators compute in channels-last (NHWC) layout
    • Enabled nn.Graph to support more usage scenarios:
      • Support for the Sparse Update Optimizer, for sparse parameter updates in WDL scenarios
      • Support for using the following nn.Module containers with nn.Graph:
        Sequential, ModuleList, ModuleDict, ParameterList, and ParameterDict
      • Support for creating Optimizers in the init function of nn.Graph
      • Support for multiple parameters sharing the same Tensor within nn.Graph
      • Support for scenarios where the actual number of processes is greater than the number of GPU devices
      • Support for more inplace execution for Consistent SBP inference under nn.Graph
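
A hedged sketch combining the debugging and deployment hooks above, reusing the TrainGraph from the earlier sketch (the value expected by ONEFLOW_DEBUG_PASS and the path-based save() signature are assumptions):

```python
import os
import oneflow as flow

# Assumption: setting this before compilation dumps the computation graph
# before and after compile-time optimization passes.
os.environ["ONEFLOW_DEBUG_PASS"] = "1"

graph = TrainGraph(model, flow.nn.CrossEntropyLoss(), optimizer)  # from the earlier sketch
graph.debug(1)  # verbosity 1: print more information about the graph composition

loss = graph(flow.randn(32, 784), flow.randint(0, 10, (32,)))

# Persist the graph for deployment (e.g. Serving, described below);
# the path argument here is an assumption.
graph.save("saved_model_dir")
```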

6. A Larger Number of Operators

7. User-Defined autograd.Function

Users can define a custom autograd.Function, just as in PyTorch.
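
A minimal sketch following the PyTorch convention the notes reference (the ctx helper names are assumed to match PyTorch's):

```python
import oneflow as flow

class Exp(flow.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        y = flow.exp(x)
        ctx.save_for_backward(y)   # stash the forward result for the backward pass
        return y

    @staticmethod
    def backward(ctx, grad_output):
        (y,) = ctx.saved_tensors
        return grad_output * y     # d/dx exp(x) = exp(x)

x = flow.randn(3, requires_grad=True)
Exp.apply(x).sum().backward()
```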

8. Added Basic Serving Functionality

OneFlow now provides model serving functionality as an Nvidia Triton backend.

9. Added Some Functionalities of Tensor (ConsistentTensor)

  • Supported Tensors using 2-D SBP to represent arbitrary hybrid parallelism (e.g., a Linear operation running data parallelism in the row direction of the device matrix and model parallelism in the column direction); see the sketch after this list
  • Supported Tensor's conversion from arbitrary 1-D SBP to 2-D SBP (the network can mix 1-D and 2-D parallelism)
  • Supported constructing a ConsistentTensor from numpy
  • Added new interfaces: oneflow.from_numpy(), oneflow.numel(), and tensor.expand_as()
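
A hedged sketch of these interfaces with 2-D SBP (four GPUs arranged as a 2x2 device matrix; the ranks-based placement constructor is an assumption about the v0.6-era API):

```python
import numpy as np
import oneflow as flow

# 2x2 device matrix over GPUs 0-3: rows are the data-parallel axis,
# columns the model-parallel axis (assumed placement constructor).
placement = flow.placement("cuda", ranks=[[0, 1], [2, 3]])
sbp = [flow.sbp.split(0), flow.sbp.broadcast]  # 2-D SBP signature

x = flow.from_numpy(np.ones((8, 4), dtype=np.float32))
x = x.to_consistent(placement=placement, sbp=sbp)
```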

Model

Released flowvision 0.0.54; a model-loading sketch follows the model lists below.

1. Richer Visual Pre-trained Models

Image Classification

  • CNN series: ResNet, DenseNet, VGG, ResNeXt, EfficientNet, etc
  • Vision Transformer series: ViT, PVT, Swin Transformer, etc
  • Vision MLP series: MLP-Mixer, ResMLP, gMLP, etc

Object Detection

  • SSD, SSDLite
  • Faster R-CNN
  • RetinaNet

Image Segmentation

  • FCN
  • DeepLabV3

Style Transfer

  • StyleNet: supports the styles sketch, candy, mosaic, rain_princess, and undie
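
A typical way to load one of the pre-trained classification models listed above (the pretrained flag follows the torchvision convention that flowvision mirrors):

```python
import flowvision

# Load an ImageNet-pretrained ResNet-50 and switch to inference mode
model = flowvision.models.resnet50(pretrained=True)
model.eval()
```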

2. Implemented Data Augmentation Operations Similar to torchvision

For data augmentation operations like CenterCrop and ColorJitter that mirror torchvision, developers can run import flowvision as torchvision and existing torchvision code will work in most scenarios.
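
A minimal sketch of the drop-in alias (transform names mirror torchvision's):

```python
import flowvision as torchvision  # drop-in alias, as described above

transform = torchvision.transforms.Compose([
    torchvision.transforms.CenterCrop(224),
    torchvision.transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    torchvision.transforms.ToTensor(),
])
```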

3. Implemented Advanced Data Augmentation Operations Similar to timm

Advanced data augmentation operations implemented in flowvision.data (a usage sketch follows the list):

  • Mixup
  • CutMix
  • Random-Erasing
  • AutoAugment
  • RandAugment
  • AugMix
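
A hedged usage sketch of Mixup/CutMix (the keyword names follow the timm convention that flowvision.data mirrors and are assumptions here):

```python
import oneflow as flow
from flowvision.data import Mixup

# Keyword names assumed to follow timm's Mixup
mixup_fn = Mixup(mixup_alpha=0.8, cutmix_alpha=1.0, num_classes=1000)

images = flow.randn(8, 3, 224, 224)       # stand-in batch
labels = flow.randint(0, 1000, (8,))
mixed_images, soft_labels = mixup_fn(images, labels)  # returns mixed inputs and soft targets
```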

4. Separated the Layers Module and Provided a Plug-and-play Block when Building a Model

flowvision.layers.attention

  • Implemented plug-and-play attention modules like Non-Local, SELayer, CBAM, BAM, ECA, etc (see the sketch after this section)

flowvision.layers.blocks

  • Provided modules that might be used for model building like PatchEmb, Pooler, ConvBnAct, etc

flowvision.layers.regularization

  • Provided regularization modules such as drop-path, drop-block, and stochastic depth to improve model generalization ability
  • Provided separate files such as activation and weight_init to organize components like activation functions and initialization methods
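
A hedged sketch of plugging these blocks into a model (the SELayer and DropPath constructor arguments are assumptions based on the usual squeeze-and-excitation and stochastic-depth designs):

```python
import oneflow as flow
from flowvision.layers.attention import SELayer
from flowvision.layers.regularization import DropPath

se = SELayer(channel=64)             # squeeze-and-excitation over 64 channels (assumed args)
drop_path = DropPath(drop_prob=0.1)  # stochastic depth on the residual branch (assumed args)

x = flow.randn(1, 64, 56, 56)
out = x + drop_path(se(x))           # typical residual-branch usage
```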

OneFlow-ONNX Conversion

Updated the OneFlow-to-ONNX toolkit (a conversion sketch follows the list):

  • Supported converting OneFlow models to ONNX models in CPU or GPU mode
  • Added test cases for operators and models to align all classification models in the flowvision library
  • Fixed onnx-runtime bugs during PReLU conversion
  • Compatible with onnxruntime v1.9.0 and later versions
  • Released the v0.5.4 oneflow-onnx package; developers can run pip install oneflow-onnx to try it out
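
A hedged conversion sketch (the helper name and arguments are assumptions based on the oneflow-onnx package; consult its README for the exact signature):

```python
import oneflow as flow
import flowvision
from oneflow_onnx.oneflow2onnx.util import convert_to_onnx_and_check

model = flowvision.models.resnet50(pretrained=True).eval()

class InferGraph(flow.nn.Graph):
    def __init__(self):
        super().__init__()
        self.model = model

    def build(self, x):
        return self.model(x)

graph = InferGraph()
graph(flow.randn(1, 3, 224, 224))   # run once so the graph is built/compiled
convert_to_onnx_and_check(graph, onnx_model_path="/tmp")  # assumed signature
```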