Merge Changes from DS Master #30

Merged — sdtblck merged 81 commits from sampling into main on Apr 22, 2021

Conversation

@sdtblck commented Apr 22, 2021

No description provided.

sid and others added 30 commits March 12, 2021 00:56
* fix log(0) & 1/log(1) bugs

* simplify

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Cheng Li <pistasable@gmail.com>
…have 'params' (microsoft#827)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Admin merging for pure-doc PR that does not trigger build.
* Fix mis-aligned-grad

When a parameter is not divisible by world size, the partitioned gradients are mis-aligned due to incorrect padding handling. This PR fixes that (see the sketch after this commit list).

* Formatting fix

* Adding static_scale test back for Z3, and also changing hidden size to be not divisible by world_size

* also removing alignment from flat fp16 buffers

* Testing for hidden dim alignment

* inference hook fix

* Update stage3.py

* formatting

* [bug-fix] move params to gpu if offload params is turned off

Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
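
A minimal sketch of the padding fix described above, assuming a flat 1-D parameter buffer partitioned across ranks (the helper name is illustrative, not DeepSpeed's actual API):

import torch

def partition_with_padding(flat: torch.Tensor, world_size: int, rank: int) -> torch.Tensor:
    # Pad the flat buffer so it divides evenly across ranks; without the
    # padding, every partition after the remainder ends up shifted.
    remainder = flat.numel() % world_size
    if remainder != 0:
        flat = torch.cat([flat, flat.new_zeros(world_size - remainder)])
    partition_size = flat.numel() // world_size
    return flat.narrow(0, rank * partition_size, partition_size)
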
* Ensure gradients of other partitions are cleared after reduction

* Remove redundant code

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Authors: @awan-10 @conglongli @samyam @jeffra

What's new:

An NCCL-based implementation that provides better performance and usability than the MPI-based implementation.
Add support for momentum masks for parameters with constant zero gradients during training (see the sketch below).
Bug fixes (e.g., microsoft#813).
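
A hedged sketch of the momentum-mask idea (names are illustrative, not the optimizer's actual attributes): for a parameter whose gradient is constantly zero, the first-moment update is masked so compression error never moves it.

import torch

def masked_momentum_update(exp_avg, grad, beta1, momentum_mask=None):
    # Standard Adam first-moment update...
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
    # ...then zero the momentum wherever gradients are known to be zero,
    # so 1-bit compression error cannot update frozen/padding parameters.
    if momentum_mask is not None:
        exp_avg.mul_(momentum_mask)
    return exp_avg
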

* NCCL-based 1-bit Adam + Code Refactor for Comm. Backends (microsoft#594)

* NCCL based 1-bit Implementation + Refactor to add communication backends (microsoft#593)

* add nccl 1-bit optim.

* temporary commit to save stuff.

* Use dist collectives instead of mpi routines.

* remove old code for comm.

* Fix bugs. still does not work.

* modify to test the nccl side code path

* Initial gather impl. Works intra-node.

* Updates to comm. phase 2. nccl comm. passed the tests.

* refactor code to introduce nccl/mpi as backends for onebit adam.

* Refactor updates to test/engine.

* Fix compile/runtime errors.

* simplify support for nccl/mpi backends.

* Add missing file

* Add compression backend in constructor. Revert later.

* modify test with some perf counting.

* Implement a true non-blocking gather for nccl side.

* Revert "Add compression backend in constructor. Revert later."

This reverts commit df8c40d.

* improve the 1-bit adam test.

* Refactor comm. and compression backend in 1-bit adam.

* Fix the test.

* Fix runtime errors and typos in nccl backend

* fix mpi backend. modify tests.

* modify nccl perf test.

* fix mpi side errors.

* Add an mpi perf test

* Sync DSE.

* Remove old collectives file.

* Undo a typo.

* Graceful failure for torch versions that don't support NCCL point-to-point (see the version-check sketch after this commit list).

* Revert "Merge branch 'master' into staging-1bit-nccl-v2"

This reverts commit 7840085, reversing
changes made to a6dba72.

* Revert "Revert "Merge branch 'master' into staging-1bit-nccl-v2""

This reverts commit 6dbdd98.

* comm optimization + 1-bit lamb

* Saving/debugging commit.

* finalizing 1-bit lamb

* finalizing 1-bit lamb

* add momentum mask and chkpt handling for 1-bit adam

* Cleanup and modify nccl test to be runnable with deepspeed launcher.

* Fix format.

* fix formatting again.

* make test runnable without mpi4py

* Add dist.alltoall and dist.allgather instead of custom functions.

* remove debug prints.

* formatting and renaming

* renaming

* renaming

* add unit test, fix existing tests

* skip unit test when torch < 1.8

* revert 1-bit lamb

* flatten momentum when dimension is more than 1

* add warning message for 1-bit adam under fp32

* improve version check

* add fp32 test

* 1-bit adam doc

* fix file name

* doc fix

* torch 1.8 is released

* doc fix

* fix tests

* update news

* add doc for momentum mask

* fix checkpoint handling, add unit test

* checkpoint handling doc

* doc final cleanup

* bump dates

* update tests

* url change

* doc fix

* fix test

* doc update

Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
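
A minimal sketch of the torch version gate several bullets above refer to ("graceful failure", "skip unit test when torch < 1.8", "improve version check"), assuming the NCCL backend needs the point-to-point API introduced in torch 1.8:

import torch

TORCH_MAJOR = int(torch.__version__.split('.')[0])
TORCH_MINOR = int(torch.__version__.split('.')[1])

def has_nccl_pt2pt() -> bool:
    # torch.distributed send/recv over NCCL requires torch >= 1.8.
    return (TORCH_MAJOR, TORCH_MINOR) >= (1, 8)

if not has_nccl_pt2pt():
    # Fail gracefully with a clear message instead of erroring deep inside NCCL.
    raise RuntimeError("The NCCL backend for 1-bit Adam requires torch >= 1.8; "
                       "upgrade torch or fall back to the MPI backend.")
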
* consistent checkpoint filenaming

* backward compatible rename

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
As discussed in microsoft#662 this PR modifies the doc:
* explains what to use instead of CUDA_VISIBLE_DEVICES
* puts the `--hostfile` command-line argument in the correct place in the invocation script

Fixes: microsoft#662

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* [doc] pipeline

As @g-karthik flagged in microsoft#659 (comment), my previous correction PR had one sentence that said the wrong thing; this PR rectifies that.

Thank you!

* tweak
* see_memory_usage fixes

* didn't expect pt-1.2

* fix the order of things

* fix the order of things
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
…ped in training (microsoft#861)

* Fix ZeRO stage 2 cpu_offload when some trainable model parameters are skipped in training, as in microsoft#707

Because some trainable model parameters are skipped during training, their backward hooks in self.create_reduce_and_remove_grad_hooks() never run, so they have no norm_for_param_grads entries (see the sketch after this commit list).

* Trim space

* Trim space

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
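
A minimal sketch of the guard described above, assuming a mapping from parameter id to recorded gradient norm; ids whose hooks never fired are simply skipped rather than looked up unconditionally:

def global_grad_norm(param_ids, norm_for_param_grads: dict) -> float:
    total_norm_sq = 0.0
    for param_id in param_ids:
        # Parameters skipped in training never ran their backward hooks,
        # so they recorded no norm; guard the lookup instead of crashing.
        if param_id not in norm_for_param_grads:
            continue
        total_norm_sq += norm_for_param_grads[param_id] ** 2
    return total_norm_sq ** 0.5
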
security alert related to older kramdown version
Bumps [kramdown](https://github.com/gettalong/kramdown) from 2.3.0 to 2.3.1.
- [Release notes](https://github.com/gettalong/kramdown/releases)
- [Changelog](https://github.com/gettalong/kramdown/blob/master/doc/news.page)
- [Commits](https://github.com/gettalong/kramdown/commits)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* zero.Init() clarification

clarify that if `model.half()` can't fit into GPU memory, `zero.Init()` is a must (usage sketched after this commit list).

this proposal is via @samyam's clarification shared elsewhere.

Thank you.

* style

* add clarity

* style

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
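
A hedged usage sketch for the clarification above; zero.Init() is DeepSpeed's real context manager (it expects a ZeRO stage 3 setup at deepspeed.initialize time), while the model and sizes here are illustrative:

import torch.nn as nn
import deepspeed

# Build the model under zero.Init() so parameters are partitioned across
# ranks at creation time, instead of first materializing a full
# half-precision copy that may not fit on a single GPU.
with deepspeed.zero.Init():
    model = nn.Sequential(nn.Linear(8192, 8192), nn.ReLU(), nn.Linear(8192, 8192))
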
This test has been giving us trouble for a while, with nondeterministic failures; skipping it for now so it doesn't break our CI. It needs revisiting soon, though.
stas00 and others added 25 commits April 14, 2021 07:46
…#955)

* e-notation for large floats (formatting rule sketched after this commit list)

* handle ints too

* readability

* handle bool

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
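
A minimal sketch of the formatting rule those bullets describe, with a threshold of my own choosing:

def format_config_value(v):
    # bool is a subclass of int in Python, so it must be checked first.
    if isinstance(v, bool):
        return str(v)
    # Render very large numbers (ints included) in e-notation for readability.
    if isinstance(v, (int, float)) and abs(v) >= 1e8:
        return f"{v:.1e}"
    return str(v)
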
* faster flatten/unflatten with apex (torch fallback sketched after this commit list)

* switch to cpp flatten/unflatten

* style

* better comment

* missing import

* switch to build ops at run time

* fixes

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
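
For contrast, a sketch of the slow pure-torch flatten/unflatten path this commit moves away from; the PR instead builds a fused C++ op at run time:

import torch
from torch._utils import _flatten_dense_tensors, _unflatten_dense_tensors

tensors = [torch.randn(3, 4), torch.randn(5)]
flat = _flatten_dense_tensors(tensors)              # one contiguous 1-D buffer
restored = _unflatten_dense_tensors(flat, tensors)  # shaped like the originals
assert all(torch.equal(a, b) for a, b in zip(tensors, restored))
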
…#913)

* update lr scheduler doc for doing per step or epoch update

* work

* trigger build

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
* Fix UnboundLocalError

* Get full partition size
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
* zinf tutorial

* more megatron integration docs

* ZInf + tiling docs
…icrosoft#983)

* Add check to see if json file is already loaded

* Update doc

* Address review

* Remove doc comment

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed.
Author: @conglongli, @awan-10, @samyam, Hanlin Tang, Yuxiong He
Paper: https://arxiv.org/abs/2104.06069

Co-authored-by: sdtblck <46172032+sdtblck@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
…microsoft#981)

* use weird-shaped tensor to avoid silent failures when not registering external params

* fix typo

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
* Make reduce scatter optional for ZeRO-1 as workaround

* Make allreduce the default for ZeRO-1 (config sketched below)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
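
A hedged example of the workaround as a DeepSpeed config fragment, written as a Python dict; reduce_scatter is the ZeRO option being toggled, the rest is illustrative:

ds_config = {
    "train_batch_size": 16,
    "zero_optimization": {
        "stage": 1,
        # Workaround: use allreduce (now the ZeRO-1 default) instead of
        # reduce-scatter; set True to opt back in.
        "reduce_scatter": False,
    },
}
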
@accesslint bot left a comment

There are accessibility issues in these changes.

@@ -0,0 +1,59 @@
#!/bin/bash
if [[ $# -ne 2 ]]; then
echo "Usage: $0 <input file> <output log dir>"

Looks like this element is missing an accessible name or label. That makes it hard for people using screen readers or voice control to use the control.

@sdtblck merged commit 33e37e1 into main on Apr 22, 2021
@sdtblck deleted the sampling branch on April 22, 2021 at 17:15