szha released this Sep 11, 2018

MXNet Change Log

1.3.0

New Features - Gluon RNN layers are now HybridBlocks

  • In this release, Gluon RNN layers such as gluon.rnn.RNN, gluon.rnn.LSTM, and gluon.rnn.GRU become HybridBlocks as part of the gluon.rnn improvements project (#11482).
  • This was made possible by the newly added fused RNN operators for CPU: LSTM (#10104), vanilla RNN (#11399), and GRU (#10311).
  • Many dynamic networks based on Gluon RNN layers can now be completely hybridized, exported, and used through the inference APIs of other language bindings such as R and Scala.

MKL-DNN improvements

  • MKL-DNN support now covers more functionality:
    • Added support for more activation functions: "sigmoid", "tanh", and "softrelu" (#10336).
    • Added debugging functionality: result check (#12069) and backend switch (#12058).

New Features - Gluon Model Zoo Pre-trained Models

  • Gluon Vision Model Zoo now provides MobileNetV2 pre-trained models (#10879) in addition to
    AlexNet, DenseNet, Inception V3, MobileNetV1, ResNet V1 and V2, SqueezeNet 1.0 and 1.1, and VGG
    pre-trained models.
  • Updated pre-trained models provide state-of-the-art performance for all resnetv1 and resnetv2 models and for the vgg16, vgg19, vgg16_bn, and vgg19_bn models (#11327, #11860, #11830).

New Features - Clojure package (experimental)

  • MXNet now supports the Clojure programming language. The MXNet Clojure package brings flexible and efficient GPU computing and state-of-the-art deep learning to Clojure. It lets you write seamless tensor/matrix computations across multiple GPUs in Clojure, and construct and customize state-of-the-art deep learning models and apply them to tasks such as image classification and data science challenges (#11205).
  • Check out the Clojure package examples and API documentation.

New Features - Synchronized Cross-GPU Batch Norm (experimental)

  • Gluon now supports Synchronized Batch Normalization (#11502).
  • This enables stable training on large-scale networks with high memory consumption such as FCN for image segmentation.
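The core idea behind synchronized Batch Norm can be sketched in plain Python, assuming each device computes partial sums for a channel that are then all-reduced, so that every device normalizes with the same global mean and variance instead of its own per-device statistics. The helper names are hypothetical, and this is not the SyncBatchNorm implementation itself.

```python
import math
from typing import List, Tuple

def local_stats(xs: List[float]) -> Tuple[float, float, int]:
    """Partial statistics one device computes for a channel: (sum, sum of squares, count)."""
    return sum(xs), sum(x * x for x in xs), len(xs)

def synced_mean_var(per_device: List[List[float]]) -> Tuple[float, float]:
    """Add up the partial statistics of all devices (the all-reduce step) into global ones."""
    total, total_sq, count = 0.0, 0.0, 0
    for xs in per_device:
        s, sq, n = local_stats(xs)
        total += s
        total_sq += sq
        count += n
    mean = total / count
    var = total_sq / count - mean * mean  # biased variance, as used for normalization
    return mean, var

def normalize(xs: List[float], mean: float, var: float, eps: float = 1e-5) -> List[float]:
    """Normalize one device's activations with the shared (global) statistics."""
    return [(x - mean) / math.sqrt(var + eps) for x in xs]

# Two devices, each holding a tiny per-device batch for one channel.
device_batches = [[1.0, 2.0], [3.0, 4.0]]
mean, var = synced_mean_var(device_batches)
print(mean, var)  # 2.5 1.25
```

Without synchronization, device 0 would normalize with its own mean of 1.5 rather than the global 2.5. Small per-device batches give noisy statistics, which is the problem synchronization addresses.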

New Features - Sparse Tensor Support for Gluon (experimental)

  • Sparse gradient support has been added to gluon.nn.Embedding. To enable it, set sparse_grad=True when constructing the Embedding block (#10924).
  • Gluon Parameter now supports the "row_sparse" storage type, which reduces communication cost and memory consumption in multi-GPU training of large models. gluon.contrib.nn.SparseEmbedding is an example built on this support (#11001, #11429).
  • Gluon HybridBlock now supports hybridization with sparse operators (#11306).
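To see why a row_sparse gradient saves memory and communication, here is a plain-Python sketch assuming an embedding weight of shape (vocab_size, dim). The weight gradient only has non-zero rows for the token ids present in the batch, so it can be stored as a list of row indices plus those rows, which is the layout a row_sparse array uses. The function names are hypothetical; in Gluon, this kind of gradient comes from an Embedding block constructed with sparse_grad=True.

```python
from typing import Dict, List, Tuple

def embedding_grad_rows(indices: List[int], out_grad: List[List[float]]) -> Dict[int, List[float]]:
    """Accumulate the embedding weight gradient, keeping only rows that were looked up.

    Rows whose ids never appear in `indices` are implicitly zero and are not stored.
    """
    grad: Dict[int, List[float]] = {}
    for idx, g in zip(indices, out_grad):
        row = grad.setdefault(idx, [0.0] * len(g))
        for j, v in enumerate(g):
            row[j] += v  # repeated ids accumulate into the same row
    return grad

def to_row_sparse(grad: Dict[int, List[float]]) -> Tuple[List[int], List[List[float]]]:
    """Return the (row indices, row data) pair a row_sparse array would store."""
    rows = sorted(grad)
    return rows, [grad[r] for r in rows]

# A batch of 3 lookups into a (hypothetical) 50,000-row vocabulary with dim = 2.
indices = [3, 7, 3]
out_grad = [[1.0, 1.0], [2.0, 2.0], [0.5, 0.5]]
rows, data = to_row_sparse(embedding_grad_rows(indices, out_grad))
print(rows, data)  # [3, 7] [[1.5, 1.5], [2.0, 2.0]]
```

Only 2 of the 50,000 rows need to be stored or sent between devices, instead of the full dense gradient.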

New Features - Control flow operators (experimental)

  • This is the first step towards optimizing dynamic neural networks with variable computation graphs: symbolic and imperative control flow operators have been added. See the design proposal for details.
  • New operators introduced: foreach (#11531), while_loop (#11566), and cond (#11760).
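The loop semantics of foreach can be modeled in plain Python. As we understand the operator, the body receives one slice of the data along the first axis together with the current loop states, and returns an output plus the new states; outputs are collected per step and the final states are returned. This sketch only models that data flow (the real operator runs the body as a graph that can be hybridized), and foreach_reference and cumsum_body are hypothetical names.

```python
from typing import Any, Callable, List, Sequence, Tuple

def foreach_reference(body: Callable[[Any, List[Any]], Tuple[Any, List[Any]]],
                      data: Sequence[Any],
                      init_states: List[Any]) -> Tuple[List[Any], List[Any]]:
    """Iterate over the first axis of `data`, threading `states` through `body`."""
    states = init_states
    outputs = []
    for item in data:
        out, states = body(item, states)  # one step of the loop
        outputs.append(out)
    return outputs, states

def cumsum_body(x, states):
    """Loop body for a running sum: emit the new sum and carry it as the next state."""
    total = states[0] + x
    return total, [total]

outs, final = foreach_reference(cumsum_body, [1, 2, 3, 4], [0])
print(outs, final)  # [1, 3, 6, 10] [10]
```

Because the number of iterations follows the data length, the same body works for sequences of any length, which is what makes these operators useful for dynamic networks.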

New Features - Scala API Improvements (experimental)

  • Improvements to MXNet Scala API usability (#10660, #10787, #10991).
  • Symbol.api and NDArray.api provide a new set of functions with complete definitions for all arguments.
  • See the type-safe API design document for more details.

New Features - Rounding GPU Memory Pool for dynamic networks with variable-length inputs and outputs (experimental)

  • MXNet now supports a new memory pool type for GPU memory (#11041).
  • Unlike the default memory pool, which requires an exact size match to reuse a released memory chunk, this new pool uses exponential-linear rounding so that chunks of similar size can be reused. This makes it better suited to workloads with dynamic-shape inputs and outputs. Set the environment variable MXNET_GPU_MEM_POOL_TYPE=Round to enable it.
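The exponential-linear rounding can be sketched as a size-rounding function. This assumes the behavior described for the MXNET_GPU_MEM_POOL_ROUND_LINEAR_CUTOFF setting (default 24, to our knowledge): requests up to 2**cutoff bytes round up to the next power of two, and larger requests round up to the next multiple of 2**cutoff. Treat the exact boundary handling as an assumption; round_size is a hypothetical name.

```python
def round_size(nbytes: int, linear_cutoff: int = 24) -> int:
    """Round a requested allocation size to the size of the chunk the pool would hand out."""
    cutoff = 1 << linear_cutoff  # 2**24 bytes = 16 MiB with the default
    if nbytes <= cutoff:
        # Exponential region: next power of two >= nbytes.
        return 1 << max(nbytes - 1, 0).bit_length()
    # Linear region: next multiple of the cutoff size (ceiling division).
    return -(-nbytes // cutoff) * cutoff

# Variable-length inputs produce slightly different request sizes...
requests = [3000, 3500, 4000]
print([round_size(n) for n in requests])  # [4096, 4096, 4096]
```

All three requests map to the same 4096-byte chunk, so a chunk released after one step can be reused by the next even though the exact sizes differ, which an exact-size-match pool cannot do. The trade-off is internal waste of up to almost half a chunk in the power-of-two region.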

New Features - Topology-aware AllReduce (experimental)

  • This feature uses trees to perform the Reduce and Broadcast steps. Building on the idea of minimum spanning trees, it performs Reduce with a binary-tree communication pattern. This topology-aware approach addresses limitations of existing single-machine communication methods such as parameter server and NCCL ring reduction. It is an experimental feature (#11591).
  • The implementation follows the paper "Optimal message scheduling for aggregation".
  • Set the environment variable MXNET_KVSTORE_USETREE=1 to enable it.
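The Reduce-then-Broadcast pattern over a binary tree can be simulated in a few lines. This sketch assumes a fixed binary tree over device ids 0..n-1 and ignores the link topology that the real feature uses to choose its trees; it only shows how about log2(n) rounds of pairwise sums reach device 0 and how the total then flows back down the same tree. tree_allreduce is a hypothetical name.

```python
from typing import List

def tree_allreduce(values: List[float]) -> List[float]:
    """Simulate a binary-tree Reduce to device 0 followed by a Broadcast back down."""
    n = len(values)
    buf = list(values)
    # Reduce: in each round, device i adds in the value of device i + stride.
    stride = 1
    while stride < n:
        for i in range(0, n, 2 * stride):
            if i + stride < n:
                buf[i] += buf[i + stride]
        stride *= 2
    # Broadcast: walk the same tree in reverse, copying each parent's total to its child.
    stride //= 2
    while stride >= 1:
        for i in range(0, n, 2 * stride):
            if i + stride < n:
                buf[i + stride] = buf[i]
        stride //= 2
    return buf

print(tree_allreduce([1.0, 2.0, 3.0, 4.0]))  # [10.0, 10.0, 10.0, 10.0]
```

Each round only involves disjoint pairs of devices, so the pairs can communicate in parallel; choosing which devices form pairs according to the actual interconnect is what the topology-aware part adds.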

New Features - Export MXNet models to ONNX format (experimental)

  • With this feature, MXNet models can now be exported to the ONNX format (#11213). Currently, MXNet supports ONNX v1.2.1. See the API documentation.
  • Check out the tutorial that shows how to use the MXNet-to-ONNX exporter APIs to save models as ONNX protobuf files so that they can be imported into other frameworks for inference.

New Features - TensorRT Runtime Integration (experimental)

  • TensorRT provides significant acceleration of model inference on NVIDIA GPUs compared to running the full graph in MXNet using unfused GPU operators. In addition to faster fp32 inference, TensorRT optimizes fp16 inference and is capable of int8 inference (provided the quantization steps are performed). Besides increasing throughput, TensorRT significantly reduces inference latency, especially for small batches.
  • This release introduces runtime integration of TensorRT into MXNet to accelerate inference (#11325).
  • Currently, it is in the contrib package.

New Examples - Scala

  • Refurbished the Scala examples with an improved API, documentation, and CI test coverage (#11753, #11621).
  • Now all Scala examples have:
    • no blocking bugs
    • a good README to start with
    • type-safe API usage
    • CI monitoring on every PR run

Maintenance - Flaky Tests improvement effort

  • Fixed 130 flaky tests on CI. Progress of the project was tracked here.
  • Added a flakiness checker (#11572).

Maintenance - MXNet Model Backwards Compatibility Checker

  • This tool (#11626) helps in ensuring consistency and sanity while performing inference on the latest version of MXNet using models trained on older versions of MXNet.
  • This tool helps detect changes that break backwards compatibility earlier in the development cycle, contributing to healthy and stable MXNet releases.

Maintenance - Integrated testing for "the Straight Dope"

  • "Deep Learning - The Straight Dope" is a deep learning book based on Apache MXNet Gluon, with contributions from many Gluon users.
  • Testing of this book is now integrated into the nightly tests.

Bug-fixes

  • Fix gperftools/jemalloc and lapack warning bug. (#11110)
  • Fix mkldnn performance regression + improve test logging (#11262)
  • Fix row_sparse_param.save() (#11266)
  • Fix trainer init_kvstore (#11266)
  • Fix axis Bug in MKLDNN Softmax (#11335)
  • Fix 'AttributeError: '_thread._local' object has no attribute 'value'' on distributed processing applications (#11332)
  • Fix recordfile dataset with multi worker (#11370)
  • Manually check node existence in CachedOp (#11545)
  • Javadoc fix (#11239)
  • Fix bugs in MKLDNN operators to handle the kAddTo request (#11129)
  • Fix InferStorage for sparse fallback in FullyConnected (#11498)
  • Fix batchnorm problem with sparse matrices when fix_gamma=True (#11656)
  • Fix rnn layer save (#11776)
  • Fix BucketSentenceIter bug related to #11430 (#11580)
  • Fix for _backward_softsign activation (#11827)
  • Fix a bug in CachedOp. (#11675)
  • Fix quantization divide by zero errors (#11833)
  • Refactor R optimizers to fix memory leak (#11374)
  • Avoid use of troublesome cudnnFind() results when grad_req='add' (#11338)
  • Fix shared memory with gluon dataloader, add option pin_memory (#11908)
  • Fix quantized graph pass bug (#11937)
  • Fix MXPredReshape in the c_predict_api (#11493)
  • Fix the topk regression issue (#12197)
  • Fix image-classification example and add missing optimizers w/ momentum support (#11826)

Performance Improvements

  • Added static allocation and static shape for Gluon HybridBlock (#11320)
  • Fix RecordIO augmentation speed (#11474)
  • Improve sparse pull performance for gluon trainer (#11429)
  • CTC operator performance improvement from HawkAaron/MXNet-CTC (#11834)
  • Improve performance of broadcast ops backward pass (#11252)
  • Improved numerical stability as a result of using stable L2 norm (#11573)
  • Accelerate topk on both GPU and CPU (#12085, #10997; this changes the behavior of topk when the input contains nan values)
  • Support for dot(dns, csr) = dns and dot(dns, csr.T) = dns on CPU (#11113)
  • Performance improvement for Batch Dot on CPU from mshadow (mshadow PR#342)
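These notes do not spell out how #11573 stabilizes the L2 norm. One common technique, shown here as an assumption rather than as MXNet's exact code, is to scale by the largest magnitude before squaring, so that intermediate squares can neither overflow to inf nor underflow to 0.

```python
import math
from typing import Sequence

def naive_l2(xs: Sequence[float]) -> float:
    """Textbook L2 norm: squares can overflow (to inf) or underflow (to 0)."""
    return math.sqrt(sum(x * x for x in xs))

def stable_l2(xs: Sequence[float]) -> float:
    """L2 norm scaled by the largest magnitude, so every squared term is at most 1."""
    scale = max((abs(x) for x in xs), default=0.0)
    if scale == 0.0 or math.isinf(scale):
        return scale
    return scale * math.sqrt(sum((x / scale) ** 2 for x in xs))

print(naive_l2([1e200, 1e200]), stable_l2([1e200, 1e200]))  # inf, then approximately 1.414e+200
print(stable_l2([3.0, 4.0]))  # 5.0
```

The extra pass to find the maximum costs a little time, but keeps the result finite whenever the true norm is representable.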

API Changes

  • Allow Scala users to specify data/label names for NDArrayIter (#11256)
  • Allow users to define the unknown token symbol in rnn encode_sentences() (#10461)
  • Added count_include_pad argument for Avg Pooling (#11021)
  • Add standard ResNet data augmentation for ImageRecordIter (#11027)
  • Add seed_aug parameter for ImageRecordIter to fix random seed for default augmentation (#11247)
  • Add support for accepting MXNet NDArrays in ColorNormalizeAug (#11606)
  • Enhancement of take operator (#11326)
  • Add temperature parameter in Softmax operator (#11466)
  • Add support for 1D inputs in leaky relu (#11850)
  • Add verify_ssl option to gluon.utils.download (#11546)
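Two of these options are easiest to understand from their arithmetic. The plain-Python sketch below assumes that the Softmax temperature T divides the logits (softmax(x / T)) and that count_include_pad decides whether zero-padded positions count toward the divisor in average pooling. The function names and the 1-D, stride-1, symmetric-padding setup are illustrative assumptions, not MXNet operator signatures.

```python
import math
from typing import List

def softmax(xs: List[float], temperature: float = 1.0) -> List[float]:
    """softmax(x / T): T > 1 flattens the distribution, T < 1 sharpens it."""
    scaled = [x / temperature for x in xs]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def avg_pool1d(xs: List[float], kernel: int, pad: int, count_include_pad: bool = True) -> List[float]:
    """1-D average pooling with stride 1 and `pad` zeros added on each side."""
    padded = [0.0] * pad + list(xs) + [0.0] * pad
    is_real = [False] * pad + [True] * len(xs) + [False] * pad
    out = []
    for start in range(len(padded) - kernel + 1):
        window_sum = sum(padded[start:start + kernel])
        if count_include_pad:
            divisor = kernel  # padded zeros count as elements of the window
        else:
            divisor = max(sum(is_real[start:start + kernel]), 1)  # only real elements count
        out.append(window_sum / divisor)
    return out

print(avg_pool1d([3.0, 3.0, 3.0], kernel=3, pad=1, count_include_pad=True))   # [2.0, 3.0, 2.0]
print(avg_pool1d([3.0, 3.0, 3.0], kernel=3, pad=1, count_include_pad=False))  # [3.0, 3.0, 3.0]
```

With count_include_pad=False, the border outputs of a constant input stay constant instead of being pulled toward zero by the padding.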

Other features

  • Added ccache reporting to CI (#11322)
  • Restructure dockcross dockerfiles to fix caching (#11302)
  • Added tests for MKLDNN backward operators (#11232)
  • Add elemwise_add/sub between rsp and rsp on GPU (#11179)
  • Add clip_global_norm(row_sparse_grad) (#11266)
  • Add subgraph storage type inference to CachedOp (#11306)
  • Enable support for dense weight and sparse grad Adagrad updates (#11355)
  • Added Histogram Operator (#10931)
  • Added Matthews Correlation Coefficient to metrics (#10524)
  • Added support for add_n(dense, csr, dense) = dense on CPU & GPU (#11330)
  • Added support for add_n(any combination longer than 4 with at least one dense storage) = dense on CPU & GPU (#11330)
  • L1 Normalization (#11229)
  • Add support for int64 data type in CSVIter (#11446)
  • Add test for new int64 type in CSVIter (#11499)
  • Add sample ratio for ROI Align (#11145)
  • Shape and Size Operator (#10889)
  • Add HybridSequentialRNNCell, which can be nested in HybridBlock (#11003)
  • Support for several unary functions for csr matrices (#11559)
  • Added NDArrayCollector to dispose intermediate allocated NDArrays automatically (#11751)
  • Added the diag() operator (#11643)
  • Added broadcast_like operator (#11820)
  • Allow Partial shape infer for Slice (#11406)
  • Added support to profile kvstore server during distributed training (#11215)
  • Add function for GPU Memory Query to C API (#12083)
  • Generalized reshape_like operator to be more flexible (#11928)
  • Add support for selu activation function (#12059)
  • Add support for accepting NDArray as input to Module predict API (#12166)
  • Add DataDesc type for the Scala Package (#11844)
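For reference, the Matthews Correlation Coefficient (#10524) and the selu activation (#12059) follow standard formulas, sketched here in plain Python. Returning 0 when the MCC denominator is zero is a common convention that we assume here; MXNet's metric may handle that edge case differently. The SELU constants are the published values from the original SELU paper, rounded to double precision.

```python
import math

def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), in [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column of the confusion matrix is empty
    return (tp * tn - fp * fn) / denom

SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(x: float) -> float:
    """scale * x for x > 0, scale * alpha * (exp(x) - 1) otherwise."""
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))

print(matthews_corrcoef(tp=5, tn=5, fp=0, fn=0))  # 1.0: perfect prediction
print(matthews_corrcoef(tp=0, tn=0, fp=5, fn=5))  # -1.0: always wrong
```

Unlike accuracy, MCC stays informative on imbalanced data because it uses all four cells of the confusion matrix.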

Usability Improvements

  • Added NDArray auto-collector for Scala (#11751, #12232)
  • Added docs for mx.initializer.Constant (#10637)
  • Added build-from-source instructions for Windows (#11276)
  • Added a tutorial explaining how to use the profiler (#11274)
  • Added two tutorials on Learning Rate Schedules (#11296)
  • Added a tutorial for mixed precision training with float16 (#10391)
  • Create CPP test for concat MKLDNN operator (#11371)
  • Update large word language model example (#11405)
  • MNIST Examples for Scala new API (#11250)
  • Updated installation info to have latest packages and more clarity (#11503)
  • GAN MNIST Examples for Scala new API (#11547)
  • Added Learning Rate Finder tutorial (#11304)
  • Fix Installation instructions for R bindings on Linux systems. (#11590)
  • Integration Test for Scala (#11596)
  • Documentation enhancement for optimizers (#11657)
  • Update rcnn example (#11373)
  • Gluon ModelZoo, Gluon examples for Perl APIs (#11642)
  • Fix R installation in CI (#11761, #11755, #11768, #11805, #11954, #11976)
  • CNN Examples for Scala new API (#11292)
  • Custom Operator Example for Scala (#11401)
  • Added detailed doc about global pool layers in Gluon (#11832)
  • Updated MultiTask example to use new infer api (#11605)
  • Added logistic regression tutorial (#11651)
  • Added Support for integer type in ImageIter (#11864)
  • Added depth_to_space and space_to_depth operators (#11587)
  • Increased operator support for ONNX to MXNet importer (#11856)
  • Add Linux and macOS MKLDNN build instructions (#11049)
  • Add download utility for Scala APIs (#11866)
  • Improving documentation and error messages for Async distributed training with Gluon (#11910)
  • Added NeuralStyle Example for Scala (#11621)
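depth_to_space and space_to_depth (#11587) move values between the channel and spatial dimensions. The plain-Python sketch below works on a single (C, H, W) sample and assumes the ordering out[c][h*b + i][w*b + j] = x[(i*b + j)*C_out + c][h][w] with C_out = C / b**2, which we believe matches the example in MXNet's depth_to_space documentation; check the operator docs before relying on it.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # one sample laid out as (C, H, W)

def depth_to_space(x: Tensor3, block: int) -> Tensor3:
    """Move groups of block**2 channels into block x block spatial tiles."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    if C % (block * block):
        raise ValueError("number of channels must be divisible by block**2")
    c_out = C // (block * block)
    out = [[[0.0] * (W * block) for _ in range(H * block)] for _ in range(c_out)]
    for c in range(c_out):
        for h in range(H):
            for w in range(W):
                for i in range(block):
                    for j in range(block):
                        out[c][h * block + i][w * block + j] = x[(i * block + j) * c_out + c][h][w]
    return out

def space_to_depth(y: Tensor3, block: int) -> Tensor3:
    """Inverse of depth_to_space for the same block size."""
    c_out, HB, WB = len(y), len(y[0]), len(y[0][0])
    H, W = HB // block, WB // block
    x = [[[0.0] * W for _ in range(H)] for _ in range(c_out * block * block)]
    for c in range(c_out):
        for h in range(H):
            for w in range(W):
                for i in range(block):
                    for j in range(block):
                        x[(i * block + j) * c_out + c][h][w] = y[c][h * block + i][w * block + j]
    return x

x = [[[0, 1, 2], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]],
     [[12, 13, 14], [15, 16, 17]], [[18, 19, 20], [21, 22, 23]]]
y = depth_to_space(x, 2)
print(y)  # [[[0, 6, 1, 7, 2, 8], [12, 18, 13, 19, 14, 20], [3, 9, 4, 10, 5, 11], [15, 21, 16, 22, 17, 23]]]
```

A (4, 2, 3) input becomes a (1, 4, 6) output, and space_to_depth with the same block size restores the original layout.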

Known Issues

  • ARMv7 Docker builds are broken due to a problem with dockcross