SINGA-487 Parallelize Comp. and Comm. using CUDA stream concurrency #558

Merged
merged 3 commits into from Nov 6, 2019

chrishkchris

This PR parallelizes computation and communication using CUDA stream concurrency, which can reduce the communication overhead.
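
To illustrate why running communication on its own stream can reduce overhead, here is a toy timing model in Python. It is only a sketch under stated assumptions, not the code in this PR: the function names and the per-layer timings are hypothetical, and it assumes each layer's gradient can be sent as soon as it is computed while the compute stream moves on to the next layer.

```python
def serial_time(compute, comm):
    """Single stream: every compute step and every transfer runs back to back."""
    return sum(compute) + sum(comm)


def overlapped_time(compute, comm):
    """Two streams: a transfer starts once its gradient is ready and the
    communication stream is free; the compute stream does not wait for it."""
    compute_done = 0.0  # time at which the compute stream finishes the current layer
    comm_done = 0.0     # time at which the communication stream becomes free
    for c, m in zip(compute, comm):
        compute_done += c
        comm_done = max(comm_done, compute_done) + m
    return max(compute_done, comm_done)


# Hypothetical per-layer timings in milliseconds (not measured from SINGA).
compute_ms = [4.0, 3.0, 2.0]
comm_ms = [2.0, 2.0, 2.0]

print(serial_time(compute_ms, comm_ms))      # 15.0
print(overlapped_time(compute_ms, comm_ms))  # 11.0
```

In this model only the last transfer is fully exposed; the earlier ones are hidden behind the computation of later layers. Real gains depend on how compute and transfer times compare per layer.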

Together with the result of PR #555, here is a simple test to make sure the training is correct:

ubuntu@ip-172-31-29-119:~/singa/examples/autograd$ /home/ubuntu/mpich-3.3/build/bin/mpiexec --hostfile host_file python3 mnist_dist.py
Starting Epoch 0:
Training loss = 931.969849, training accuracy = 0.675197
Evaluation accuracy = 0.913137, Elapsed Time = 0.733470s
Starting Epoch 1:
Training loss = 280.136505, training accuracy = 0.910273
Evaluation accuracy = 0.954975, Elapsed Time = 0.642032s
Starting Epoch 2:
Training loss = 188.183517, training accuracy = 0.939837
Evaluation accuracy = 0.967619, Elapsed Time = 0.650523s
Starting Epoch 3:
Training loss = 147.724915, training accuracy = 0.952941
Evaluation accuracy = 0.971012, Elapsed Time = 0.639127s
Starting Epoch 4:
Training loss = 125.514275, training accuracy = 0.959402
Evaluation accuracy = 0.974404, Elapsed Time = 0.637774s
Starting Epoch 5:
Training loss = 113.583031, training accuracy = 0.963174
Evaluation accuracy = 0.974918, Elapsed Time = 0.638678s
Starting Epoch 6:
Training loss = 105.422485, training accuracy = 0.965895
Evaluation accuracy = 0.979852, Elapsed Time = 0.637032s
Starting Epoch 7:
Training loss = 94.718765, training accuracy = 0.968850
Evaluation accuracy = 0.976871, Elapsed Time = 0.638873s
Starting Epoch 8:
Training loss = 87.026405, training accuracy = 0.971421
Evaluation accuracy = 0.976768, Elapsed Time = 0.637387s
Starting Epoch 9:
Training loss = 79.878670, training accuracy = 0.973708
Evaluation accuracy = 0.981805, Elapsed Time = 0.639177s

@chrishkchris chrishkchris changed the title SINGA-487 Parallelize Computation and Communication using CUDA stream concurrency SINGA-487 Parallelize Comp. and Comm. using CUDA stream concurrency Nov 5, 2019
@nudles nudles merged commit 9c8995e into apache:master Nov 6, 2019