@kuenishi released this Dec 22, 2017

ChainerMN 1.1.0 release notes

ChainerMN is a multi-node extension of the deep learning framework Chainer that adds scalability to over 1000 GPUs. The 1.1.0 release is a minor update that adds several enhancements and bug fixes to 1.0 and supports the latest Chainer release.

New experimental features include multi-node checkpointing and resuming. This release also includes several enhancements to dataset distribution and support for dynamically changing networks. It adds support for the latest Chainer release, 3.2.0, and drops support for older Chainer versions such as the 1.x and 2.x series. Also, the pure_nccl communicator is now generally available and is the most recommended communicator.
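Resuming a multi-node job needs a restart point that every process agrees on. As an illustration only (this is not ChainerMN's checkpointer code), the sketch below assumes each process saves snapshots independently and tags them with an iteration number; on restart, the processes pick the newest iteration that all of them finished saving. The helper name `latest_common_iteration` and the plain lists standing in for per-process snapshot inventories are hypothetical.

```python
from typing import List, Optional


def latest_common_iteration(saved_per_rank: List[List[int]]) -> Optional[int]:
    """Return the newest iteration saved by every rank, or None.

    Hypothetical helper: saved_per_rank[r] lists the iterations for which
    rank r finished writing a snapshot. In a real multi-node job these lists
    would first have to be exchanged between processes (assumption).
    """
    if not saved_per_rank:
        return None
    # Only an iteration present on every rank is a consistent restart point.
    common = set(saved_per_rank[0])
    for iters in saved_per_rank[1:]:
        common &= set(iters)
    return max(common) if common else None


# Rank 2 stopped before writing its snapshot for iteration 3000,
# so every rank rolls back to iteration 2000.
inventories = [[1000, 2000, 3000], [1000, 2000, 3000], [1000, 2000]]
print(latest_common_iteration(inventories))  # prints 2000
```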

bugfix

  • Fix an array length bug in PureNcclCommunicator (#127)
  • Fix setup.py (#119)

enhancement

  • Support a wider range of dynamically initialized models for MultiNodeOptimizer (#148)
  • Remove outdated cudnn variable to make it compatible with CuPy v4 (#147, thanks @tkerola!)
  • Avoid sending SubDataset and use broadcast for datasets (#140)
  • Support tuple data communication (#139)
  • Chainer v3 support (#123)
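The dataset-distribution change (#140) means each process no longer receives a ready-made SubDataset. A minimal sketch of the idea, assuming every process already holds the full dataset (for example after a broadcast) and shares a random seed, is that each process can then compute its own shard of sample indices locally. The helper `shard_indices` is hypothetical, and ChainerMN's actual splitting rule may differ, for example in how it handles dataset lengths that are not divisible by the number of processes.

```python
import random
from typing import List


def shard_indices(n: int, size: int, rank: int, seed: int = 0) -> List[int]:
    """Return the sample indices assigned to `rank` out of `size` processes.

    Hypothetical helper, not ChainerMN's API. Every process derives the same
    shuffled order from the shared seed, so the shards never overlap.
    """
    order = list(range(n))
    random.Random(seed).shuffle(order)
    # Contiguous, non-overlapping slices whose sizes differ by at most one.
    begin = n * rank // size
    end = n * (rank + 1) // size
    return order[begin:end]


shards = [shard_indices(10, 3, r, seed=42) for r in range(3)]
print([len(s) for s in shards])  # prints [3, 3, 4]
```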

feature

  • pure_nccl communicator is now generally available (#165)
  • Add simple and distributed checkpointing and automatic recovery (#144)
  • Support all-to-all (#135)
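All-to-all is the collective in which every process sends a distinct piece of data to every other process. The sketch below only simulates those semantics with nested lists inside a single Python process; it does not use ChainerMN or MPI, and the function name `simulate_all_to_all` is hypothetical.

```python
from typing import List, TypeVar

T = TypeVar("T")


def simulate_all_to_all(send: List[List[T]]) -> List[List[T]]:
    """Simulate an all-to-all exchange among len(send) processes.

    send[i][j] is the item process i sends to process j. The result recv
    satisfies recv[j][i] == send[i][j], i.e. the exchange is a transpose.
    """
    size = len(send)
    # Each process must provide exactly one item per destination.
    if any(len(row) != size for row in send):
        raise ValueError("each process must send exactly one item to every process")
    return [[send[i][j] for i in range(size)] for j in range(size)]


send = [["a0", "a1", "a2"], ["b0", "b1", "b2"], ["c0", "c1", "c2"]]
print(simulate_all_to_all(send))
# prints [['a0', 'b0', 'c0'], ['a1', 'b1', 'c1'], ['a2', 'b2', 'c2']]
```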

document

  • Update supported Chainer version in the document (#162)

installation

  • Update docs and add CuPy as a requirement (#171)

example

  • Model-parallel seq2seq example (#122)
  • Dual parallel example (#121)

test

  • Fix a bug in point-to-point communication with GPU (#174)
  • Pass unit tests with more than 3 processes (#172)
  • Refactor test directory structure to align with Chainer's test dir (#169)
  • Move from nose to pytest (#167)
  • Refactor tests directory (#155)
  • Reduce the number of processes in MPI tests for more robust CI (#136)
  • Add Chainer v3 Test to Travis CI (#141)

other

  • Add chainer.utils.experimental to create_multi_node_n_step_rnn (#153)
  • Add chainer.utils.experimental to distributed_cpr (#152)
  • Update README (cache information for seq2seq) (#126)