OD-SGD: One-step Delay Stochastic Gradient Descent for Distributed Training

The implementation of OD-SGD is based on the popular deep learning framework MXNet; only a small number of source files are modified to adjust the execution order of operations, so the file list is almost identical to MXNet's. OD-SGD is proposed to improve distributed deep learning training performance by increasing the overlap between the computation and communication processes.
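
The overlap can be illustrated with a conceptual sketch of a worker-side loop. This is only an illustration of the general one-step-delay idea, not the code in this repository; the interfaces below (server.push_gradient, server.pull_weights_async, model.forward_backward, local_optimizer.update) are hypothetical placeholders. The point is that the pull of global weights stays in flight while the next iteration computes, and the worker consumes global weights that are one step old, bridging the gap with its local update.

```python
# Conceptual sketch of a one-step-delay worker loop (illustration only; the
# server/model/optimizer interfaces are hypothetical, not this repository's code).

def worker_loop(model, data_iter, server, local_optimizer, num_iters):
    pending_pull = None  # weight pull issued in the previous iteration, still in flight
    for t, batch in zip(range(num_iters), data_iter):
        # Forward/backward runs on the locally updated weights, so it does not
        # wait for the global weights requested in the previous iteration.
        grad = model.forward_backward(batch)
        server.push_gradient(grad)            # send the gradient to the server (non-blocking)

        if pending_pull is not None:
            # The global weights requested one step earlier were transferred while
            # this iteration was computing; adopt them now (one-step delay).
            model.weights = pending_pull.wait()

        # The local update with the local optimizer and local learning rate keeps the
        # worker's weights moving between two global (synchronous SGD) updates.
        local_optimizer.update(model.weights, grad)

        pending_pull = server.pull_weights_async()  # request fresh global weights, do not block
    return model.weights
```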

Features

  • OD-SGD can be applied to both parameter-server based frameworks (MXNet, TensorFlow) and end-to-end frameworks (PyTorch, Caffe).
  • For parameter-server based platforms, the global update on the server node uses the synchronous SGD algorithm, while the local update on each worker can adopt a different algorithm; a delay-compensation algorithm such as DC-ASGD is a better choice for ensuring good convergence accuracy.
  • When training with OD-SGD, you need to specify the local optimizer, the local learning rate, and the steps at which the local learning rate changes through "--local-optimizer", "--local-lr" and "--local-lr-steps"; see the usage sketch after this list.
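
As a usage sketch, the new options would plug into the usual MXNet distributed launch command. Everything except the three --local-* options below is taken from MXNet's standard image-classification example and may differ in this repository; the script paths, host file name, the dcasgd optimizer name, and the value format for --local-lr-steps are assumptions, while --local-optimizer, --local-lr and --local-lr-steps are the OD-SGD options described above.

```bash
# Hypothetical launch command; only --local-optimizer, --local-lr and
# --local-lr-steps are OD-SGD additions, the rest is standard MXNet usage.
python tools/launch.py -n 4 -s 4 --launcher ssh -H hosts \
    python example/image-classification/train_imagenet.py \
        --network resnet --batch-size 256 --kv-store dist_sync \
        --lr 0.1 \
        --local-optimizer dcasgd --local-lr 0.1 --local-lr-steps 30,60,90
```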

Ask Questions

Please send an email to xuyemaovip@nudt.edu.cn for more details.
