MXNet 0.11.0

Released by @nswamy
Major Features

API Changes

  • Added CachedOp. You can now cache operators that are called frequently with the same set of arguments to reduce overhead.
  • Added sample_multinomial for sampling from multinomial distributions.
  • Added trunc operator for rounding towards zero.
  • Added linalg_gemm, linalg_potrf, ... operators for LAPACK support.
  • Added verbose option to Initializer for printing out initialization details.
  • Added DeformableConvolution to contrib from the Deformable Convolutional Networks paper.
  • Added float64 support for the dot and batch_dot operators.
  • Added allow_extra option to Module.set_params to ignore extra parameters.
  • Added mod operator for modulo.
  • Added multi_precision option to SGD optimizer to improve training with float16. ResNet-50 now achieves the same accuracy when trained with float16, with a 50% speedup on Titan XP.
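
The multi_precision item above can be illustrated without MXNet. The sketch below is plain Python, not the MXNet API: it emulates float16 and float32 rounding with the standard struct module and compares SGD on a float16 weight with SGD on a float32 master copy that is cast to float16 afterwards. The helper names (to_f16, to_f32, sgd_fp16_only, sgd_with_fp32_master) and the constant gradient are assumptions made for this example only.

```python
import struct


def to_f16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_f32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sgd_fp16_only(weight, grad, lr, steps):
    """Plain SGD where the weight is stored and updated in float16."""
    w = to_f16(weight)
    for _ in range(steps):
        # Every update is rounded back to float16 immediately.
        w = to_f16(w - lr * grad)
    return w


def sgd_with_fp32_master(weight, grad, lr, steps):
    """SGD on a float32 master copy; the float16 weight is derived from it."""
    master = to_f32(weight)
    for _ in range(steps):
        # Small updates accumulate in the higher-precision master copy.
        master = to_f32(master - lr * grad)
    return master, to_f16(master)


# Hypothetical toy problem: constant gradient, small learning rate.
w16 = sgd_fp16_only(1.0, grad=1.0, lr=1e-4, steps=100)
master, w16_from_master = sgd_with_fp32_master(1.0, grad=1.0, lr=1e-4, steps=100)

print("float16 only:        ", w16)  # stays at 1.0, each step is rounded away
print("float32 master copy: ", master, "->", w16_from_master)
```

Near 1.0 the neighbouring float16 values are further apart than the update of 1e-4, so the float16-only weight never moves, while the master copy accumulates all 100 updates before being cast down.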

Performance Improvements

  • ImageRecordIter now stores data in pinned memory to improve GPU memcopy speed.

Bug Fixes

  • Fixed a bug in Adam that caused weight decay to be handled incorrectly. If you are using Adam, you may need to tune the learning rate a little to get the same performance as previous versions.
  • Removed WaitToRead in dist-kvstore, which improves performance by 20-30% for distributed training.
  • Fixed the Cython interface. make cython and python install --with-cython should install the Cython interface and reduce overhead in applications that use imperative/bucketing.
  • Fixed various bugs in Faster-RCNN example: #6486
  • Fixed various bugs in SSD example.
  • Fixed out argument not working for zeros, ones, full, etc.
  • expand_dims now supports backward shape inference.
  • Fixed a bug in rnn.BucketingSentenceIter that caused incorrect layout handling on multi-GPU.
  • Fixed context mismatch when loading optimizer states.
  • Fixed a bug in ReLU activation when using MKL.
  • Fixed a few race conditions that caused crashes on shutdown.
  • Fixed image-classification example code.
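
As a rough illustration of the expand_dims item in the list above: forward shape inference derives the output shape from the input shape, and backward shape inference recovers the input shape from a known output shape. The sketch below is plain Python with hypothetical helper names, not MXNet's internal implementation, and it assumes a non-negative axis.

```python
def expand_dims_forward_shape(in_shape, axis):
    """Forward inference: insert a size-1 axis at position `axis`."""
    shape = list(in_shape)
    shape.insert(axis, 1)
    return tuple(shape)


def expand_dims_backward_shape(out_shape, axis):
    """Backward inference: recover the input shape from the output shape."""
    if out_shape[axis] != 1:
        raise ValueError("axis %d of %r does not have size 1" % (axis, out_shape))
    # Drop the inserted axis to get back the shape the operator was given.
    return tuple(out_shape[:axis]) + tuple(out_shape[axis + 1:])


out_shape = expand_dims_forward_shape((2, 3), axis=1)
in_shape = expand_dims_backward_shape(out_shape, axis=1)
print(out_shape, in_shape)
```

With backward inference available, a graph in which only the downstream shape is known can still determine the shape of the operator's input.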

Refactors

  • Refactored TShape/TBlob to use int64 dimensions and DLTensor as internal storage, in preparation for migration to DLPack. As a result, TBlob::dev_mask_ and TBlob::stride_ are removed.

Known Issues

  • The Inception-V3 model can be converted into CoreML format but cannot run in Xcode.