
Simplify distributed training #864

Closed

futurely opened this issue Sep 19, 2019 · 3 comments

@futurely

🚀 Feature

Simplify distributed training so that users do not have to manually set up the graph store, sampler, and kvstore, which is inefficient for development and error-prone.

Motivation

The only difference between distributed and non-distributed training in PyTorch-BigGraph is adding a command-line argument "--rank rank" and a few more configs.

The training script automatically handles both situations.
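
The pattern is roughly the following sketch, which uses torch.distributed for illustration; it is not PyTorch-BigGraph's actual code, and everything beyond the --rank flag (the --world-size and --master options, the gloo/TCP rendezvous) is an assumption for concreteness:

```python
# Illustrative sketch (not PyTorch-BigGraph's actual code): one training script
# that switches between single-machine and distributed mode based on --rank.
import argparse
import torch.distributed as dist


def train(distributed):
    ...  # model construction and training loop (omitted)


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--rank", type=int, default=None,
                        help="Rank of this machine; omit for single-machine training.")
    parser.add_argument("--world-size", type=int, default=1,
                        help="Total number of machines (stand-in for the extra configs).")
    parser.add_argument("--master", default="127.0.0.1:29500",
                        help="Placeholder address of the coordinating machine.")
    args = parser.parse_args()

    if args.rank is not None and args.world_size > 1:
        # Distributed mode: join the process group; everything after this point
        # is the same code path as the single-machine run.
        dist.init_process_group(backend="gloo",
                                init_method=f"tcp://{args.master}",
                                rank=args.rank,
                                world_size=args.world_size)
        train(distributed=True)
    else:
        # Single-machine mode: no extra setup required.
        train(distributed=False)


if __name__ == "__main__":
    main()
```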

Euler requires knowing the hosts in the cluster, but its training script for launching distributed training is also very concise.

Pitch

The framework should transparently set up distributed training so that users can run it with minimal effort and without worrying about the underlying details.
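
For concreteness, one way this could surface to users is sketched below; the helper name auto_initialize and the environment-variable convention are assumptions for illustration, not DGL's design:

```python
# A minimal sketch of the pitched behavior (an illustration, not an existing DGL
# API): user code calls one helper, which decides from launcher-provided
# environment variables whether this is a distributed run. Ideally the framework
# would also start the graph store, sampler, and kvstore inside such a call.
import os
import torch.distributed as dist


def auto_initialize():
    """Hypothetical helper: join a process group only if a launcher configured one."""
    if int(os.environ.get("WORLD_SIZE", "1")) > 1:
        # MASTER_ADDR, MASTER_PORT, RANK, and WORLD_SIZE are the standard variables
        # consumed by torch.distributed's "env://" rendezvous.
        dist.init_process_group(backend="gloo", init_method="env://")
        return True   # distributed run
    return False      # single-machine run


def main():
    distributed = auto_initialize()
    # The training code below would be identical in both modes; the framework
    # hides the cluster setup instead of the user wiring it up by hand.
    print("running distributed:", distributed)


if __name__ == "__main__":
    main()
```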

@zheng-da
Collaborator

I totally agree with you that the training script should set up everything for distributed training. It's also in our plan.

@github-actions

This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you.

@github-actions

This issue is closed due to lack of activity. Feel free to reopen it if you still have questions.
