v0.12.0
Release log
Major Improvements
Reader Prototype. Data can be read asynchronously through a C++ reader, potentially with higher performance.
ParallelExecutor. Significantly improves multi-GPU performance over the previous solution.
Distributed Training. Major performance and stability improvements.
Inplace Activation. Significantly reduces GPU memory requirements, which allows larger batch sizes.
Operator Optimizations. Performance improvements for many operators.
Timeline Profiling. Allows performance to be visualized as a time series.
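To illustrate the idea behind asynchronous reading mentioned in the Reader Prototype item, here is a minimal pure-Python sketch. It is not the C++ reader from this release: `async_reader`, `sample_reader`, and `buffer_size` are hypothetical names chosen for illustration. It only shows the general pattern of prefetching data on a background thread while the consumer processes earlier items.

```python
import queue
import threading


def async_reader(reader, buffer_size=4):
    """Wrap a reader (a callable returning an iterator) so that items are
    prefetched on a background thread. Hypothetical helper, not a framework API."""
    end = object()  # sentinel marking the end of the data

    def wrapped():
        q = queue.Queue(maxsize=buffer_size)

        def producer():
            try:
                for item in reader():
                    q.put(item)  # blocks when the buffer is full
            finally:
                q.put(end)  # always signal completion to the consumer

        threading.Thread(target=producer, daemon=True).start()

        while True:
            item = q.get()
            if item is end:
                return
            yield item

    return wrapped


def sample_reader():
    # Stand-in for a real data source (assumption for this example).
    for i in range(5):
        yield i


batches = list(async_reader(sample_reader)())
print(batches)
```

A single producer feeding a FIFO queue preserves the original item order, and the bounded buffer keeps memory use limited when the consumer is slower than the reader.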
Major Bug Fixes
Fixed cuBLAS/cuDNN library calls that passed wrong argument types.
Evaluated Models
Image Classification
Object Detection
OCR
Machine Translation
Text Classification
Language Model
Sequence Tagging