
SINGA-487 Fusing gradients to reduce network latency #560

Merged
merged 2 commits into apache:master on Nov 13, 2019

Conversation

Contributor

@chrishkchris commented Nov 12, 2019

This PR reduces network latency by fusing gradients into the same memory buffer before sending them out with NCCL.
This removes much of the TCP/IP latency by reducing the number of NCCL API calls.
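
For illustration, here is a minimal Python sketch of the fusion idea (not the actual SINGA communicator code): per-tensor gradients are packed into one contiguous buffer, a single collective call is issued on that buffer, and the averaged results are copied back into the original shapes. The `all_reduce_sum` function is a hypothetical stand-in for the underlying NCCL all-reduce call.

```python
import numpy as np

def all_reduce_sum(buffer):
    # Hypothetical stand-in for the NCCL all-reduce (e.g. ncclAllReduce);
    # in a real multi-GPU run this would sum `buffer` across all ranks.
    return buffer

def fused_all_reduce(grads, world_size):
    # Pack all gradient tensors into one contiguous buffer so that only a
    # single collective call is needed, instead of one call per tensor.
    sizes = [g.size for g in grads]
    fused = np.concatenate([g.ravel() for g in grads])

    # One collective call over the fused buffer amortizes the per-call latency.
    fused = all_reduce_sum(fused) / world_size

    # Unpack the averaged gradients back into their original shapes.
    out, offset = [], 0
    for g, n in zip(grads, sizes):
        out.append(fused[offset:offset + n].reshape(g.shape))
        offset += n
    return out
```

The key point is that the cost of issuing a collective call is paid once per fused buffer rather than once per parameter tensor, which is where the latency saving comes from.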

Together with the changes from PR #555, here is a simple test to confirm that training remains correct:

ubuntu@ip-172-31-26-214:~/singa/examples/autograd$ python3 mnist_multiprocess.py
Starting Epoch 0:
Training loss = 831.072205, training accuracy = 0.700454
Evaluation accuracy = 0.927015, Elapsed Time = 0.676089s
Starting Epoch 1:
Training loss = 248.684601, training accuracy = 0.916183
Evaluation accuracy = 0.958265, Elapsed Time = 0.545179s
Starting Epoch 2:
Training loss = 172.330597, training accuracy = 0.943042
Evaluation accuracy = 0.967928, Elapsed Time = 0.543617s
Starting Epoch 3:
Training loss = 139.254807, training accuracy = 0.953425
Evaluation accuracy = 0.973067, Elapsed Time = 0.530805s
Starting Epoch 4:
Training loss = 115.329491, training accuracy = 0.960737
Evaluation accuracy = 0.976049, Elapsed Time = 0.530590s
Starting Epoch 5:
Training loss = 101.911728, training accuracy = 0.966179
Evaluation accuracy = 0.974095, Elapsed Time = 0.529574s
Starting Epoch 6:
Training loss = 90.820244, training accuracy = 0.969969
Evaluation accuracy = 0.980983, Elapsed Time = 0.530502s
Starting Epoch 7:
Training loss = 86.718071, training accuracy = 0.971037
Evaluation accuracy = 0.977590, Elapsed Time = 0.531085s
Starting Epoch 8:
Training loss = 79.507553, training accuracy = 0.973675
Evaluation accuracy = 0.976562, Elapsed Time = 0.529935s
Starting Epoch 9:
Training loss = 78.784409, training accuracy = 0.974025
Evaluation accuracy = 0.980469, Elapsed Time = 0.530919s

@chrishkchris changed the title from "SINGA-487 Accumulate gradients to reduce network latency" to "SINGA-487 Fusing gradients to reduce network latency" on Nov 12, 2019
@nudles merged commit 58e346e into apache:master on Nov 13, 2019