info.json
{
"abstract": "Decentralization is a promising method of scaling up parallel machine learning systems. In this paper, we provide a tight lower bound on the iteration complexity for such methods in a stochastic non-convex setting. Our lower bound reveals a theoretical gap in known convergence rates of many existing decentralized training algorithms, such as D-PSGD. We prove by construction this lower bound is tight and achievable. Motivated by our insights, we further propose DeTAG, a practical gossip-style decentralized algorithm that achieves the lower bound with only a logarithm gap. While a simple version of DeTAG with plain SGD and constant step size suffice for achieving theoretical limits, we additionally provide convergence bound for DeTAG under general non-increasing step size and momentum. Empirically, we compare DeTAG with other decentralized algorithms on multiple vision benchmarks, including CIFAR10/100 and ImageNet. We substantiate our theory and show DeTAG converges faster on unshuffled data and in sparse networks. Furthermore, we study a DeTAG variant, DeTAG*, that practically speeds up data-center-scale model training. This manuscript provides extended contents to its ICML version.",
"authors": [
"Yucheng Lu",
"Christopher De Sa"
],
"emails": [
"yl2967@cornell.edu",
"cdesa@cs.cornell.edu"
],
"id": "22-0044",
"issue": 93,
"pages": [
1,
62
],
"title": "Decentralized Learning: Theoretical Optimality and Practical Improvements",
"volume": 24,
"year": 2023
}