01 adam optimizer by EugeneLYC · Pull Request #1790 · deepspeedai/DeepSpeed

EugeneLYC · 2022-02-24T11:53:44Z

Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam

Authors: @EugeneLYC, @conglongli, @minjiaz, Christopher De Sa, Yuxiong He
Paper: https://arxiv.org/abs/2202.06009

conglongli

Thanks Yucheng for the great work! I left a few comments that should be easy to fix. I think this PR should be ready to merge after the fix and adding the doc/tutorial.

deepspeed/runtime/engine.py

deepspeed/runtime/fp16/onebit/zoadam.py

conglongli

Thanks Yucheng for applying the requested changes and adding the doc/tutorial. I reviewed and left a few minor comments.

docs/_tutorials/zero-one-adam.md

deepspeed/runtime/fp16/onebit/zoadam.py

conglongli

LGTM, thank you Yucheng!

EugeneLYC added 2 commits February 22, 2022 09:04

add 0/1 Adam implementation

fcfe263

add pytest for 0/1 Adam and fix loading issue

8885646

EugeneLYC requested review from RezaYazdaniAminabadi, ShadenSmith, awan-10, cli99, conglongli, eltonzheng, jeffra, minjiaz, samyam and tjruwase as code owners February 24, 2022 11:53

conglongli and others added 4 commits February 24, 2022 20:33

Merge branch 'master' into 01Adam

4e7e9bb

fix formatting issue and a type

c17741b

Merge branch 'master' into 01Adam

cd69b55

Merge branch 'master' into 01Adam

adb7151

conglongli suggested changes Mar 5, 2022

deepspeed/runtime/engine.py (outdated, resolved)

deepspeed/runtime/fp16/onebit/zoadam.py (resolved)

deepspeed/runtime/fp16/onebit/zoadam.py (outdated, resolved)

conglongli and others added 3 commits March 4, 2022 22:48

Merge branch 'master' into 01Adam

6c9bdc1

Merge branch 'master' into 01Adam

6ddaf2e

fix mask/warning issues and add 0/1 Adam docs/tutorial

6c1946f

conglongli suggested changes Mar 8, 2022

docs/_tutorials/zero-one-adam.md (outdated, resolved)

docs/_tutorials/zero-one-adam.md (resolved)

deepspeed/runtime/fp16/onebit/zoadam.py (resolved)

EugeneLYC and others added 4 commits March 8, 2022 11:09

fix formatting and add more details to 0/1 Adam tutorial

99d35a2

Merge branch 'master' into 01Adam

db6f5b8

Merge branch 'master' into 01Adam

75d09d9

mark new tests as forced sequential

dfd6cf6

conglongli approved these changes Mar 8, 2022

jeffra and others added 4 commits March 8, 2022 14:59

disable new tests (testing hang issue in CI)

3750c2c

Revert "disable new tests (testing hang issue in CI)"

f90a5d7

This reverts commit 3750c2c.

fix naive all reduce hanging issue (still need testing)

727a043

remove the scaling after naive allreduce

419486e

conglongli and others added 6 commits March 9, 2022 15:24

mention 0/1 Adam in 1-bit Adam tutorial

7190a99

Merge branch 'master' into 01Adam

900cdf2

remove pytest.mark.sequential

a4a86f8

Merge branch '01Adam' of github.com:EugeneLYC/DeepSpeed-1 into 01Adam

ac08717

fix the comm volume saving number with FP16

a1e1615

Using GPT results (FP16) to update comm volume saving number

93680e7

conglongli enabled auto-merge (squash) March 10, 2022 04:48

fail fast during tests

a2f273e

jeffra disabled auto-merge March 11, 2022 05:28

jeffra merged commit b80e562 into deepspeedai:master Mar 11, 2022

jeffra merged 24 commits into deepspeedai:master from EugeneLYC:01Adam

3 participants
