Another thing to merge. (MY EYES HURT) #1

bobisapotato · 2021-01-24T02:37:07Z

Commits on Jul 21, 2020
only global rank 0 can log tensorboard data; avoid multi gpu/node rac… …
1f97242
Commits on Jul 22, 2020
Update setup.py (microsoft#298)
871f7e6
Avoid deadlock for unsynchronized non-zero checkpointing (microsoft#297) …
3cc96e1
Commits on Jul 23, 2020
updates to amp to support grad clip and grad accumulation (microsoft#290 …
eb74c3f
pass steps_per_print to tput timer (microsoft#299)
ec94341
Commits on Jul 24, 2020
bump DSExamples (microsoft#300)
0f94f7e
Commits on Jul 25, 2020
DeepSpeed webinar announcement (microsoft#301)
7ae8f8b
Commits on Jul 27, 2020
Update README.md (microsoft#302)
67821f9
Commits on Jul 28, 2020
Fixing a typo (microsoft#303)
97c5427
Fix nv_peer_mem version (microsoft#304) …
e50b883
Commits on Aug 01, 2020
NameError: name 'mpu' is not defined (microsoft#305) …
9d07d75
Commits on Aug 07, 2020
Removing () from assertion. (microsoft#307) …
c35e944
Add webinar link (microsoft#309) …
29c5fe2
Commits on Aug 08, 2020
updates website gems after kramdown alert (microsoft#311)
903a41a
Commits on Aug 10, 2020
Fix+tests for get_lr from lr_scheduler before training starts (micros… …
cd68e6e
Commits on Aug 12, 2020
bumping DSE commit for pillow security fix (microsoft#312)
892ece6
Update deepspeed_lr_schedules.py (microsoft#314)
3437342
Commits on Aug 13, 2020
Update fan out flag for pdsh (microsoft#315) …
6855ba1
attach empty grad to its param to ensure it's copied after reduction (m… …
e1bea67
Commits on Aug 14, 2020
bump DSE (microsoft#317)
de0523d
Commits on Aug 18, 2020
Turn off multi-node launch if only 1 node (microsoft#322) …
e69b1ee
Commits on Aug 27, 2020
Add code owners for DeepSpeed team (microsoft#335) …
21d5f63
Commits on Aug 28, 2020
bump DSE
6823db3
Commits on Aug 31, 2020
Update deepspeed_checkpointing.py (microsoft#336) …
458c0d9
Samyamr/grad acc stage2 (microsoft#338) …
7240abf
Rename ds_config_func_bs8_zero2_gas10.json to ds_config_func_bs8_zero… …
7a356b2
Rename ds_config_func_bs8_zero0_gas10.json to ds_config_func_bs8_zero… …
6122a74
Update run_func_test.py …
f4726b7
Update .gitignore
e8dd47d
Commits on Sep 01, 2020
Switches BBS example to use mbsize=3 and gas=2 to fit in 16GB of memo… …
838f53b
Sparse attn + ops/runtime refactor + v0.3.0 (microsoft#343) …
e5bbc2e
Update Dockerfile
8716540
Update Dockerfile …
5518aae
Commits on Sep 02, 2020
update DSE and rename SA tests
1661e83
Commits on Sep 03, 2020
Update test_sparse_attention.py
1ebcd6c
Adding link to Sparse Attention in Navigation page (microsoft#355) …
6deac82
Commits on Sep 04, 2020
Jekyll installation instructions (microsoft#351)
ac12833
Commits on Sep 05, 2020
fixed a typo; this was fixed before but seems like it has been lost i… …
a64b0ab
Move code quality tests to Azure-hosted agents. (microsoft#368)
4d4eafb
Commits on Sep 06, 2020
Update installation instructions (microsoft#362)
9e83ef2
Update Sparse Attention Tutorial (microsoft#357) …
9dadf38
Commits on Sep 08, 2020
adding sparse attention to feature index page (microsoft#377)
b73894d
Commits on Sep 09, 2020
temp disable model tests
234bba0
Add 1-bit Adam support to DeepSpeed (microsoft#380) …
01726ce
fixing a link issue with SA tutorial (microsoft#387) …
161e8e6
Update test triggers to exclude docs
79093d7
ZeRO-Offload release (microsoft#391) …
41db1c2
Commits on Sep 10, 2020
Pipeline parallel training engine. (microsoft#392) …
65c2f97
Update documentation for 1-bit Adam (microsoft#388) …
093f09f
Fix datatype issue with sparse attention softmax (microsoft#363) …
dca0b78
Add openmpi to dockerfile
c0d5424
ZeRO tutorials (microsoft#384) …
2dea61f
fix for 16GB v100 nodes (microsoft#393)
b1d4bd7
Sparse attention: updating code tag in documentation (microsoft#394) …
be4b94b
Minjiaz/zero offload (microsoft#382) …
59ce90d
Adding sparse attention news index item (microsoft#376) …
c76769c
Landing page updates (microsoft#395) …
a8a8b3d
Update README.md
7baf3c3
Website edits (microsoft#398) …
6bb5c69
update docker image and bump DSE
b29229b
only add 1bit adam reqs if mpi is installed, update cond build for cp… …
240ea97
bump DSE and doc tweak
4b1df25
Update README.md
9693595
Update _config.yml
ea92ed2
Update news site with press release link
5dc4d6c
Update ZeRO-Offload blog post link (microsoft#401) …
d15015e
remove old pt file
15ca99c
readthedocs upgrade (microsoft#402)
c82756c
Commits on Sep 11, 2020
supporting different intermediate sizes other than 4 * hidden_dim (mi… …
e549be6
Revert "supporting different intermediate sizes other than 4 * hidden… …
4ac9bf6
Commits on Sep 13, 2020
scales throughput by logging freq (microsoft#408)
473ff98
Commits on Sep 15, 2020
pytest skips for tests requiring certain ops (microsoft#411) …
91b4a93
fix bug related to stitching reduced grads across communication parti… …
55ed105
add cpu-adam, reformat, add colors (microsoft#413)
a9e8325
Commits on Sep 16, 2020
Add Linear warmup+decay lr schedule (microsoft#414) …
0e942df
Minor doc fixes (microsoft#417) …
7d91be9
Overflow fix (microsoft#416) …
f5cce75
Fix a typo in comments (microsoft#415) …
4fef478
readthedocs yaml configuration (microsoft#410) …
5812e84
Commits on Sep 17, 2020
Fix few typos in the docs (microsoft#418)
c66f388
Remove pip --use-feature (microsoft#419)
5bc7d4e
Commits on Sep 18, 2020
Activation checkpointing bugfix and unit tests (microsoft#420) …
01b6e27
Revert "Activation checkpointing bugfix and unit tests (microsoft#420)… …
a74a604
Fix activation checkpoint unit tests for GPU systems (microsoft#421)
a825f99
Commits on Sep 21, 2020
Add configurable intermediate size to transformer kernels (microsoft#423 …
a148bd3
DSE bump (microsoft#427)
71f7df3
support dynamic sequence length in transformer kernels (microsoft#424) …
f0f2a70
Commits on Sep 24, 2020
Fix urls in tutorial (microsoft#436) …
5d40f00
Update azure.md (microsoft#437)
192cf7c
Update pipeline.md (microsoft#439)
0ca8215
Commits on Sep 25, 2020
link fix part two :-) (microsoft#441)
6d176c4
unit test rename (microsoft#442)
5412a33
Commits on Sep 28, 2020
fix typos (microsoft#446)
6f28ea3
Commits on Sep 29, 2020
Disable default installation of CPU Adam (microsoft#450) …
7b8be2a
Commits on Oct 01, 2020
Use parentesis around min and max to enable Windows build (microsoft#449 …
9557557
Commits on Oct 05, 2020
Update engine.py (microsoft#458) …
6717638
Commits on Oct 06, 2020
temporarily disable lr unit tests
11cf47e
turning off different tests (temp)
679fc13
Commits on Oct 07, 2020
gan tutorial (microsoft#462) …
2efea69
Fix printing momentum for non-deepspeed optimizer (microsoft#464) …
c39a76f
Commits on Oct 10, 2020
Add DeepSpeed_Adam optimizer (microsoft#468) …
23fc48f
Commits on Oct 12, 2020
fixing typo (microsoft#460)
e25f2a2
add compute cap of 6.0 to transformer kernels …
b8eb40e
revert previous (accidental) change
1afca8f
Commits on Oct 14, 2020
Add support for p100 in transformer kernels (microsoft#470) …
7ddfda8
Commits on Oct 19, 2020
updating website dependencies (microsoft#475)
d720fdb
Commits on Oct 30, 2020
Add CPUAdam optimizer for zero-offload in deepspeed engine (microsoft… …
f5aa254
fixing the AVX_256 compatibility (microsoft#497)
4c37d70
Commits on Nov 05, 2020
Fixing CPU-Adam convergence issue (microsoft#503) …
7d4d742
Commits on Nov 09, 2020
PLD documentation (microsoft#514) …
e351090
Fix PLD news url (microsoft#515) …
41fb24b
Commits on Nov 10, 2020
updating pld docs (microsoft#517)
e082d47
PLD release (microsoft#513) …
be1147c
Commits on Nov 11, 2020
fix bug on non-DLTS infra when no output path set (microsoft#523)
eea1c28
Update zero.md tutorial (microsoft#495) …
0ad4fd8
Commits on Nov 12, 2020
DeepSpeed JIT op + PyPI support (microsoft#496) …
31f46fe
ds_report bug fix on cpu and guard torch import in setup.py (microsof… …
ca9ab12
Installation documentation updates. (microsoft#525) …
d779bd5
Commits on Nov 13, 2020
Dependency pruning (microsoft#528) …
0dc8420
bump version
9941ce7
Commits on Nov 17, 2020
Fix layout bug in ZeRO Stage 1 checkpoint logic (microsoft#531) …
7752dc5
Commits on Nov 18, 2020
append job-name if explicit output dir is given (microsoft#539)
5b09be6
more fine-grained manifest file for includes/excludes (microsoft#540)
fdd81c3
Commits on Nov 19, 2020
ZeRO-1 tune max-elems + bug fix (microsoft#532) …
08c96a1
bump to v0.3.3
9de21b7
backwards compatability w. v020 ckpts, fix issue with zero-1 ckpts (m… …
dce054d
Fix setup.py for cpu-only environment installation (microsoft#538) …
d81cb26
Discover variables for NCCL backend on AML without mpi4py (microsoft#542 …
1b45917
bump version 0.3.4
6b28bc5
Commits on Nov 20, 2020
Fix unbalanced gradients bug in ZeRO-2 gradient accumulation (microso… …
0178e6c
Commits on Nov 21, 2020
Support non-tensor state in checkpoint (microsoft#548)
6021b70
Commits on Nov 22, 2020
Adding static_loss_scale to unfused optimizer (microsoft#546)
bcd56f9
Commits on Nov 23, 2020
Bug fix for norm calculation in absence of model parallel group (micr… …
00c3a25
bump to 0.3.5
16313a9
Commits on Nov 24, 2020
Create main.yml
c18fb0d
Switch to CI to GitHub Actions (microsoft#556)
3347460
Update badges and CI name (microsoft#557)
1ef5cd2
Deprecate client ability to disable gradient reduction (microsoft#552) …
6e65c2c
Simplify dist init and only init if needed. (microsoft#553) …
0e831e2
Turn back on PP tests (microsoft#558)
eec44af
Commits on Nov 25, 2020
Adds long_description to setup.py (microsoft#560)
6009713
bump to 0.3.6 and fix manifest to include reqs (microsoft#561)
73c3262
update manifest
e4e2066
bump to 0.3.7
c51fa65
Commits on Nov 27, 2020
[doc] typo fix and clarification (microsoft#563) …
17f36f1
Commits on Dec 01, 2020
supporting different hidden dimensions (microsoft#559) …
c78c29f
tracking optimizer step in cpu-adam when loading checkpoint (microsof… …
9f52a36
Commits on Dec 02, 2020
[cifar tutorial] improve readability (microsoft#567) …
7a75f8b
Add 'latest' checkpoint save/load support (microsoft#569)
845921b
[engine] train should be able to get mode arg (microsoft#571)
2d1f7c0
Add compute capability 8.0 if on cuda 11+ (microsoft#572)
be33bea
[build] build against installed cuda-11.1 while torch built w/ cuda-1… …
ff58fa7
Commits on Dec 04, 2020
Fix potential random layout inconsistency issues in sparse attention … …
1e44d48
Commits on Dec 07, 2020
[build] make builder smarter and configurable wrt compute capabilitie… …
ce363d0
[build] add compute_86 (microsoft#577) …
e8b126d
Commits on Dec 08, 2020
Pipeline warnings and checkpoint portability (microsoft#588) …
2f62697
Commits on Dec 09, 2020
Pin triton to 0.2.3 for now, 0.3.0 is broken
d901a6d
bump to 0.3.8
cb7c7da
Add papers/videos to readme/website (microsoft#592)
19acd6c
Add AML video link
7300f3e
Commits on Dec 11, 2020
add manual workflow to run tests with precompiled ops
0518252
[build] fix computer capability arch flags, add PTX, handle PTX (micr… …
8a184b6
add DeepSpeedZeroConfig repr method (microsoft#596) …
66268bd
Supported customizing kwargs for lr_scheduler (microsoft#584) …
a4763f5
Update launcher to set local rank environ variable (microsoft#597) …
c5a449f
Commits on Dec 14, 2020
implement missing get_last_lr (microsoft#595) …
9f8e8f3
Commits on Dec 15, 2020
[doc] xref to hostfile discussion (microsoft#604) …
007466e
Fixes for RTD build errors (microsoft#606) …
6380ee3
Commits on Dec 17, 2020
Transformer-kernel - supporting any arbitrary sequence-length (micros… …
fd2f970
Commits on Dec 18, 2020
Ability to initialize distributed backend outside deepspeed runtime (m… …
7435b2f
Commits on Dec 23, 2020
Elastic training support (microsoft#602) …
81aeea3
Commits on Jan 04, 2021
update SA comp check to fix torch-cpu issue (microsoft#631)
24e0739
Support initialization with dict configuration (microsoft#632)
e6ac731
Commits on Jan 05, 2021
Allow DeepSpeed models to be initialized with optimizer=None (microso… …
a9a83a6
change dist to torch.distributed to fix bug in assert. (microsoft#638)
d38ad6a
docs: minor spelling tweaks (microsoft#623) …
46d2e28
Fix docstring format (microsoft#640)
5ab1279
Commits on Jan 06, 2021
Module replacement support (microsoft#586) …
44bd538
Update builder.py (microsoft#642)
64461da
Commits on Jan 07, 2021
Bump nokogiri from 1.10.10 to 1.11.0 in /docs (microsoft#630) …
8cea96d
Add deepspeed.init_distributed to RTD page (microsoft#645) …
4e2dc4e
Commits on Jan 08, 2021
document deepspeed.initialize() (microsoft#644) …
828d75b
add additional validation checks in elastic config (microsoft#646)
bc046dc
Remove a very verbose print statement. (microsoft#649) …
af212f6
version bump to 0.3.10
c14b839
LR scheduler unit tests (microsoft#429) …
da5563a
Commits on Jan 12, 2021
Handle actvitation checkpointing args that are None or non-tensors (m… …
adcfd26
squash latest flops profiling changes (microsoft#1) (microsoft#664) …
e2fbe4d
Move workspace memory-allocation to PyTorch (microsoft#661) …
981bc7d
Commits on Jan 14, 2021
Validate consistent ckpt tags across ranks (microsoft#667)
f032e56
Commits on Jan 15, 2021
Support optimizer AdamW type (microsoft#670)
865104b
skip empty lines in hostfile (microsoft#669)
6217a6c
Add AdamW to the supported optimizers (microsoft#672) …
c5e4264
add missing config menu entries (microsoft#652) …
e729a3f
doc fix (microsoft#651) …
7b07e12
Commits on Jan 19, 2021
add zero-offload paper (microsoft#680) …
82cecf6
Commits on Jan 20, 2021
[tutorials] typos (microsoft#676) …
7b0bee0
make test_pipe more stable (microsoft#683)
e59ba12
Fix ZeRO 2 + Pipelining (microsoft#677) …
34c83a5

Samyamr/grad acc stage2 (#338)

7240abf

* Adding gradient accumulation support for ZeRO Stage 2. Changing all Megatron-LM tests to also test gradient accumulation
* Gradient Accumulation support for Stage 2. Model tests added to test the feature
* formatting
* Update deepspeed_light.py removing comment
* Update ds_config_func_bs8_zero1.json reverting this file back. It's not needed for this PR
* defining baseline prefix

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

Bump nokogiri from 1.10.10 to 1.11.0 in /docs (#630)

8cea96d

Bumps [nokogiri](https://github.com/sparklemotion/nokogiri) from 1.10.10 to 1.11.0.
- [Release notes](https://github.com/sparklemotion/nokogiri/releases)
- [Changelog](https://github.com/sparklemotion/nokogiri/blob/master/CHANGELOG.md)
- [Commits](sparklemotion/nokogiri@v1.10.10...v1.11.0)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

LR scheduler unit tests (#429)

da5563a

* Add Linear warmup+decay lr schedule Update lr schedule unit tests
* LR scheduler unit tests for LR Range Test and 1Cycle
* Disable yapf to preserve parameterization
* Disable test_pipe.py for CI debugging
* Disable test_lr_scheduler for CI debugging
* Disable test_lr_scheduler for CI debugging
* Enable all unit tests for CI debugging

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

arashashari and others added 30 commits July 21, 2020 11:11

only global rank 0 can log tensorboard data; avoid multi gpu/node race for the log directory (#296)

1f97242

Update setup.py (#298)

871f7e6

Avoid deadlock for unsynchronized non-zero checkpointing (#297)

3cc96e1

* Avoid deadlock for unsynchronized non-zero checkpointing
* Fix formatting issues

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

updates to amp to support grad clip and grad accumulation (#290)

eb74c3f

* updates to amp to support grad clip and grad accumulation
* zero grad using optimizer if in amp mode

pass steps_per_print to tput timer (#299)

ec94341

bump DSExamples (#300)

0f94f7e

DeepSpeed webinar announcement (#301)

7ae8f8b

Update README.md (#302)

67821f9

Fixing a typo (#303)

97c5427

Fix nv_peer_mem version (#304)

e50b883

* fix nv_peer_mem version in dockerfile
* fix security issue, remove pillow dependency (this is only needed for cifar example which has its own requirements.txt)

NameError: name 'mpu' is not defined (#305)

9d07d75

The mpu object is bound to the class instance. The if statement uses `self.mpu`, but the following lines call just `mpu`, which raises a NameError.
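The failure mode described in this commit message can be reproduced in a few lines. This is only a minimal sketch: the `Engine` and `FakeMPU` classes and the `get_world_size` method are hypothetical stand-ins for illustration, not DeepSpeed's actual code.

```python
class FakeMPU:
    """Hypothetical model-parallel unit used only for this illustration."""

    def get_world_size(self):
        return 4


class Engine:
    """Hypothetical engine; it only mirrors the pattern from the commit message."""

    def __init__(self, mpu=None):
        # The object is stored on the instance, so it is reachable only as self.mpu.
        self.mpu = mpu

    def buggy_world_size(self):
        if self.mpu is not None:
            # Bug: the bare name `mpu` is neither a local nor a global here,
            # so this line raises NameError when it runs.
            return mpu.get_world_size()
        return 1

    def fixed_world_size(self):
        if self.mpu is not None:
            # Fix: go through the instance attribute.
            return self.mpu.get_world_size()
        return 1
```

Note that the buggy branch only fails when an mpu object is actually supplied, which is one reason this kind of error can go unnoticed in runs without model parallelism.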

Removing () from assertion. (#307)

c35e944

The parentheses alter the evaluation of the assert, so it always evaluates to True.
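A likely reading of this message is the common Python pitfall where the condition and message are wrapped in parentheses, which turns them into a two-element tuple. The sketch below assumes that form; the function names and the condition are made up for illustration and are not taken from the DeepSpeed source.

```python
def buggy_check(value):
    # Writing `assert (value > 0, "value must be positive")` asserts this tuple.
    # A non-empty tuple is always truthy, so the check can never fail.
    # (The tuple is built separately here because CPython emits a
    # SyntaxWarning for the literal parenthesized form.)
    condition_and_message = (value > 0, "value must be positive")
    assert condition_and_message
    return True


def fixed_check(value):
    # Without the parentheses, the first expression is the condition
    # and the second is the failure message.
    assert value > 0, "value must be positive"
    return True
```

Calling `buggy_check(-1)` returns normally, while `fixed_check(-1)` raises `AssertionError` (provided Python is not run with `-O`, which strips assert statements).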

Add webinar link (#309)

29c5fe2

Add webinar on-demand links and update readme

updates website gems after kramdown alert (#311)

903a41a

Fix+tests for get_lr from lr_scheduler before training starts (#310)

cd68e6e

* add fix and tests for get_lr from lr_scheduler before training starts

bumping DSE commit for pillow security fix (#312)

892ece6

Update deepspeed_lr_schedules.py (#314)

3437342

Update fan out flag for pdsh (#315)

6855ba1

* update fan out flag for pdsh

attach empty grad to its param to ensure it's copied after reduction (#316)

e1bea67

bump DSE (#317)

de0523d

Turn off multi-node launch if only 1 node (#322)

e69b1ee

* turn off multi-node launch if only 1 node

Add code owners for DeepSpeed team (#335)

21d5f63

* Create CODEOWNERS

bump DSE

6823db3

Update deepspeed_checkpointing.py (#336)

458c0d9

* Update deepspeed_checkpointing.py
* formatting

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

Rename ds_config_func_bs8_zero2_gas10.json to ds_config_func_bs8_zero2_gas3.json

7a356b2

Rename ds_config_func_bs8_zero0_gas10.json to ds_config_func_bs8_zero0_gas3.json

6122a74

Update run_func_test.py

f4726b7

Renaming config files to gas3

Update .gitignore

e8dd47d

Switches BBS example to use mbsize=3 and gas=2 to fit in 16GB of memory. (#341)

838f53b

jeffra and others added 29 commits December 22, 2020 22:26

Elastic training support (#602)

81aeea3

Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>

update SA comp check to fix torch-cpu issue (#631)

24e0739

Support initialization with dict configuration (#632)

e6ac731

Allow DeepSpeed models to be initialized with optimizer=None (#469)

a9a83a6

Allow DeepSpeed models to be initialized with optimizer=None Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>

change dist to torch.distributed to fix bug in assert. (#638)

d38ad6a

docs: minor spelling tweaks (#623)

46d2e28

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

Fix docstring format (#640)

5ab1279

Module replacement support (#586)

44bd538

Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

Update builder.py (#642)

64461da

Add deepspeed.init_distributed to RTD page (#645)

4e2dc4e

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

document deepspeed.initialize() (#644)

828d75b

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

add additional validation checks in elastic config (#646)

bc046dc

Remove a very verbose print statement. (#649)

af212f6

* Remove a very verbose print statement.
* Update engine.py

version bump to 0.3.10

c14b839

Handle actvitation checkpointing args that are None or non-tensors (#660)

adcfd26

Special thanks to @g-karthik for tracking this issue down.

squash latest flops profiling changes (#1) (#664)

e2fbe4d

Co-authored-by: Cheng Li <pistasable@gmail.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

Move workspace memory-allocation to PyTorch (#661)

981bc7d

* move workspace memory-allocation to PyTorch
* refine the code based on the comments
* remove unnecessary options
* remove bsz from set_seq_len function

Validate consistent ckpt tags across ranks (#667)

f032e56

Support optimizer AdamW type (#670)

865104b

skip empty lines in hostfile (#669)

6217a6c

Add AdamW to the supported optimizers (#672)

c5e4264

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

add missing config menu entries (#652)

e729a3f

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

doc fix (#651)

7b07e12

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

add zero-offload paper (#680)

82cecf6

* Update README.md
* Update index.md

[tutorials] typos (#676)

7b0bee0

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

make test_pipe more stable (#683)

e59ba12

Fix ZeRO 2 + Pipelining (#677)

34c83a5

* Fix ZeRO 2 + Pipelining

bobisapotato merged commit 6163a3c into bobisai:master Jan 24, 2021
