Comparison with Other Frameworks

A table for quick comparison

This table compares Chainer with other actively developed deep learning frameworks. Content is current as of July 2017.

Frameworks compared (column order): Chainer, PyTorch, TensorFlow, Theano-based, Caffe1/Caffe2, Torch7, MXNet, DyNet, PaddlePaddle, DL4J, CNTK, neon, Knet.jl, Darknet, Thinc. Each row below lists its values per framework in this order; frameworks without an entry for a row are skipped.

Basics

- Language: Python | Python | Python | Python | Python/C++/MATLAB | LuaJIT | Python/others | Python/C++ | Python/C++ | Java | BrainScript/Python/C++ | Python | Julia | C | Python
- Approach: define-by-run | define-by-run | symbolic autograd | symbolic autograd | static | static/manual grads | symbolic autograd/manual grads/define-by-run [1] | define-by-run | symbolic autograd | static/manual grads/symbolic autograd [2] | static/symbolic autograd | static/symbolic autograd [3] | define-by-run | static | callback-based define-by-run
- CPU backend package: NumPy | TH | Eigen | NumPy | TH | mshadow | Eigen | ND4J | NumPy | Julia | NumPy
- GPU backend package: CuPy | THC | Eigen | libgpuarray | THC | mshadow | Eigen | ND4J | neon | KnetArrays | CuPy
- Primary sponsor: Preferred Networks | Facebook | Google | MILA | Facebook | Facebook | Amazon/Apache | CMU | Baidu | Skymind | Microsoft | Intel Nervana | Koç University | Joe Redmon | Explosion AI

NNs

- CNNs: full | full | full | full | full | full | full | partial | full | full | full | full | partial | full | none
- RNNs: full | full | full | full | partial | full | full | full | full | full | full | partial | partial | partial | partial
- Reverse-mode autograd: Y | Y | Y | Y | torch-autograd | Y | Y | Y | Y | ngraph | Y | with closures
- Forward-mode autograd: tensorflow-forward-ad | Y
- Higher-order grads: Y | Y | Y | Y
- Variable-length loops: native | native | while_loop | scan | RNNs only | native | 2017 | native | RNNs only | none | dynamic axis | none | native | none | native
- Different architectures per batch: native | native | fold | torch-autograd | MinPy | native | native | native

Performance

- cuDNN support: full | full | partial | partial | full | full | full | partial | full | partial | full | N/A [4] | partial
- CPU/GPU generic backend: Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y
- Multi-GPU data parallelism: Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y
- Multi-GPU model parallelism: Y | Y | Y | Y | Y | Y | Y | Y | Y | Y
- Multiprocessing [5]: full | partial | full
- Distributed training: ChainerMN | THD | Y | 2017 | torch-distlearn | Y | Y | Spark | Y | Y

Misc

- Runtime debugging: debug mode, typechecking, pdb (see the sketch after this table) | pdb | tfdbg | Monitor | pdb | Java debuggers | cntk.debugging | Gallium.jl | gdb | pdb
- Trainer abstraction: native | tnt | Blocks, Lasagne, Keras | native | torchnet | native | native | native | native | native
- Reporter abstraction: native | tnt | native | torchnet | native | native | native
- Web interface: TensorBoard | DL4J-UI | Nervana Cloud
- Graph compilation engine: 2017 | XLA | 2017 | NNVM | ngraph
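The "Runtime debugging" row above lists debug mode, type checking, and pdb for Chainer. As a minimal sketch of how these switches might be toggled (toy data invented for illustration; details vary by Chainer version), both are controlled through chainer.using_config:

    import numpy as np
    import chainer
    import chainer.functions as F

    # 'type_check' validates the shapes/dtypes passed to each function;
    # 'debug' enables extra checks (e.g. NaN detection during backprop)
    # and reports which function raised the problem.
    x = chainer.Variable(np.random.rand(4, 3).astype(np.float32))

    with chainer.using_config('type_check', True), chainer.using_config('debug', True):
        y = F.relu(x)                    # inputs are type-checked here
        y.grad = np.ones_like(y.data)
        y.backward()                     # debug mode adds checks during backprop

Because the forward pass is ordinary Python, a standard pdb breakpoint placed inside it also stops at the exact operation of interest.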

Benchmarks

Benchmarks for convolutional networks can be found at convnet-benchmarks, and some NLP benchmarks are at dynet-benchmark. Chainer wraps the latest available cuDNN kernels for CNNs and RNNs, so the performance of most common networks that use these kernels is typically similar to that of other modern frameworks. Because Chainer's define-by-run approach executes the user's Python code directly at runtime, networks that are particularly complex or that operate on very small tensors may run slower than in static-graph frameworks.
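To make "Python code executed directly at runtime" concrete, here is a minimal define-by-run sketch (the model, names, and sizes are invented for illustration): the forward pass is ordinary Python that runs on every call, so runtime control flow such as a loop whose length is decided per call works naturally, and the graph used for backpropagation is recorded while that code runs.

    import numpy as np
    import chainer
    import chainer.functions as F
    import chainer.links as L

    class TinyMLP(chainer.Chain):
        def __init__(self):
            super(TinyMLP, self).__init__()
            with self.init_scope():
                self.l1 = L.Linear(None, 10)   # input size inferred at first call
                self.l2 = L.Linear(10, 10)
                self.l3 = L.Linear(10, 2)

        def __call__(self, x, n_steps=1):
            h = F.relu(self.l1(x))
            for _ in range(n_steps):           # plain Python loop, length chosen at runtime
                h = F.relu(self.l2(h))
            return self.l3(h)

    model = TinyMLP()
    x = np.random.rand(4, 5).astype(np.float32)
    loss = F.sum(model(x, n_steps=3))          # the graph is built during this call
    loss.backward()                            # reverse-mode autograd over the recorded graph

This flexibility is what the "Approach" row refers to, and it is also the source of the Python overhead noted above for very small workloads.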


  1. Define-by-run is in development as of June 2017 and tracked in dmlc/mxnet#5705. It is also possible using the much slower MinPy extension.

  2. Symbolic autograd is in development as of June 2017 and tracked in deeplearning4j/nd4j#1750.

  3. Symbolic autograd is available only with the ngraph backend (experimental).

  4. Nervana provides kernels that are meant to compete with cuDNN.

  5. Multiprocessing provides a significant performance improvement only for frameworks that use Python at runtime.
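As a rough illustration of the kind of multiprocessing meant in footnote 5 (the dataset and parameters below are made up), Chainer's MultiprocessIterator moves minibatch preparation into worker processes, so the Python work of assembling each batch overlaps with training instead of blocking it:

    import numpy as np
    from chainer.datasets import TupleDataset
    from chainer.iterators import MultiprocessIterator

    def main():
        # Hypothetical toy dataset of 1000 examples.
        x = np.random.rand(1000, 5).astype(np.float32)
        y = np.random.randint(0, 2, size=1000).astype(np.int32)
        dataset = TupleDataset(x, y)

        # Four worker processes prefetch and collate batches in the background.
        iterator = MultiprocessIterator(dataset, batch_size=32, n_processes=4)
        batch = next(iterator)   # a list of (x, y) examples assembled by the workers
        print(len(batch))
        iterator.finalize()      # shut the worker processes down

    if __name__ == '__main__':
        main()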