
update en doc for Distributed Training #9130

Merged

Conversation

putcn (Contributor) commented Mar 15, 2018

fix: #8911

abhinavarora (Contributor) previously approved these changes Mar 15, 2018

abhinavarora left a comment:


LGTM! Just a minor typo.

.. image:: src/ps_en.png
:width: 500

- Data shard: the training data is split into multiple partitions; the trainers use these partitions of the whole dataset to do the training job.
- Trainer: each trainer reads its data shard and trains the neural network. The trainer then uploads the calculated "gradients" to the parameter servers and waits for the parameters to be optimized on the parameter-server side. When that finishes, the trainer downloads the optimized parameters and continues its training.
- Parameter server: every parameter server stores part of the whole neural network model's parameters. The parameter servers perform the optimization calculations when gradients are uploaded from trainers, and then send the updated parameters back to the trainers (a toy sketch of this data flow follows the list).
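The following is a hypothetical, self-contained sketch of the data flow described in the list above. It is not PaddlePaddle code: the ``ParameterServer`` class, the ``trainer_step`` function, and the linear-regression problem are illustrative stand-ins, written in plain NumPy, for the real trainer and parameter-server components.

.. code-block:: python

    import numpy as np

    np.random.seed(0)

    # A toy dataset: linear regression y = X @ w_true + noise.
    X = np.random.randn(400, 3)
    w_true = np.array([2.0, -1.0, 0.5])
    y = X @ w_true + 0.01 * np.random.randn(400)

    NUM_TRAINERS = 4

    # Data shard: split the training data into one partition per trainer.
    shards = list(zip(np.array_split(X, NUM_TRAINERS),
                      np.array_split(y, NUM_TRAINERS)))

    class ParameterServer:
        """Stores the model parameters and applies the SGD update."""

        def __init__(self, dim, lr=0.1):
            self.w = np.zeros(dim)
            self.lr = lr

        def apply_gradients(self, grads):
            # Optimization happens on the server: average the uploaded
            # gradients and take one SGD step.
            self.w -= self.lr * np.mean(grads, axis=0)

        def pull(self):
            return self.w.copy()

    def trainer_step(shard, w):
        """Trainer: compute the gradient on its own shard with the current parameters."""
        Xs, ys = shard
        pred = Xs @ w
        return 2.0 * Xs.T @ (pred - ys) / len(ys)  # gradient of mean squared error

    pserver = ParameterServer(dim=3)
    for step in range(100):
        w = pserver.pull()                            # trainers download parameters
        grads = [trainer_step(s, w) for s in shards]  # each trainer uploads its gradient
        pserver.apply_gradients(grads)                # server optimizes, stores new params

    print("learned parameters:", pserver.pull())      # close to w_true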

PaddlePaddle supports both synchronous stochastic gradient descent (SGD) and asynchronous SGD.
The training of synchronous random gradient descent for neural network can be archieved by cooperation of trainers and parameter servers.
abhinavarora commented on this line:

achieved
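To make the synchronous/asynchronous distinction in the quoted paragraph concrete, here is a hypothetical sketch, again in plain NumPy rather than the PaddlePaddle API. The ``sync_sgd_step`` and ``async_sgd_step`` functions are illustrative names: synchronous SGD waits for every trainer and applies one combined update, while asynchronous SGD applies each trainer's gradient as soon as it arrives, so trainers may compute on slightly stale parameters.

.. code-block:: python

    import numpy as np

    def sync_sgd_step(w, trainer_grads, lr=0.1):
        # Barrier: collect gradients from all trainers, then do a single
        # averaged update that every trainer will see next round.
        return w - lr * np.mean(trainer_grads, axis=0)

    def async_sgd_step(w, trainer_grad, lr=0.1):
        # No barrier: update immediately with whichever gradient arrived,
        # while the other trainers keep working on older parameters.
        return w - lr * trainer_grad

    w = np.zeros(3)
    grads = [np.array([0.2, -0.1, 0.3]), np.array([0.1, 0.0, 0.2])]

    w_sync = sync_sgd_step(w, grads)      # one update per round
    w_async = w
    for g in grads:                       # several smaller updates per round
        w_async = async_sgd_step(w_async, g)

    print(w_sync, w_async)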

putcn (Contributor, Author) commented Mar 15, 2018

thanks @abhinavarora, typo fixed.

abhinavarora (Contributor) left a comment:


LGTM!

@abhinavarora abhinavarora merged commit e382e42 into PaddlePaddle:develop Mar 15, 2018
@putcn putcn deleted the translate-distributed-training branch April 25, 2018 00:21

Successfully merging this pull request may close these issues.

Translation Plan - Distributed Training - Chinese-to-English - Overview