merge upstream 20221023 (microsoft#108)
* Fix the layer-past for GPT based models (microsoft#2196)

* Add gradient_average flag support for sparse grads (microsoft#2188)

* Add gradient_average flag support for sparse grads

* formatting fixes

* Add tests

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* Adding additional instructions in the compression tutorial on pre-training distillation and quantization for GPT (microsoft#2197)

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* Log user config exactly (microsoft#2201)

* Fix the tensor-slicing copy for qkv parameters (microsoft#2198)

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* Refactor Distributed Tests (microsoft#2180)

Refactor Distributed unit tests

* fix table syntax (microsoft#2204)

Co-authored-by: Conglong Li <conglong.li@gmail.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* Correctly detect offload configuration (microsoft#2208)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* add cuda 11.7 (microsoft#2211)

* add cuda 11.7

* formatting

* use torch 1.9 (microsoft#2215)

* [zero-3] print warning once and support torch parameter (microsoft#2127)

* print warning only once.

* add support for torch param and only warn on gpu 0

* remove type checking. will be done on a new PR with more tests.

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* Add support of OPT models (microsoft#2205)

* add opt replace policy

* simplify inf. api

* fix opt replace policy

* fix use-cache & add relu

* Add support of custom MLP act. function

* Revert "simplify inf. api"

This reverts commit 9e910fc.

* fix the inference API (temp. solution)

* fix code formatting

* add unit tests for OPT models.

* refactor pre-attention layer norm configuration

* add support of opt-350m model

* refactor the HF model config initialization

* fix hf model config issue

Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>

* fix typos in readme. (microsoft#2218)

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* [device abstraction] add device abstraction to allow devices other than CUDA to be used

* Fix regression w. dist_init_required (microsoft#2225)

* add doc for new bert example (microsoft#2224)

* Remove the random-generator from context during inference (microsoft#2228)

* Fix the tensor-slicing copy for qkv parameters

* remove the random-generator from context during inference

* formatting

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* allow saving ckpt w/o ckpt json + bloom copy fix (microsoft#2237)

* Correctly detect zero_offload (microsoft#2213)

* Correctly detect offload configuration

* Correctly detect offload configuration

* Handle deprecated cpu offload setting

* Correctly detect zero_offload setting

* Minor tweak

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>

* update videos (microsoft#2249)

* Refactor dist tests: Checkpointing (microsoft#2202)

Refactor distributed tests: checkpointing

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

* Make OPT policy backward compatible with pre-OPT transformers versions (microsoft#2254)

* fix ds-inference without policy (microsoft#2247)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* bump to 0.7.2

* Enable contiguous gradients with Z1+MoE (microsoft#2250)

MoE training with ZeRO stage 1 only works with `contiguous_gradients=True`.
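
For context, a minimal sketch of a ZeRO stage 1 config that satisfies this constraint (values outside the `zero_optimization` block are illustrative placeholders, not taken from the PR):

```python
# Illustrative DeepSpeed config dict for ZeRO stage 1 + MoE training.
ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 1,
        "contiguous_gradients": True,  # must remain True when training MoE models
    },
}
```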

* [rebase-202208] additional changes needed when rebasing to 202208

* [rebase] cleanup direct cuda usage after merge

* Correctly detect CPU optimizer usage (microsoft#2257)

* Correctly detect CPU optimizer usage

* Update nv-transformers-v100.yml (microsoft#2259)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* [precommit] fix pre-commit issues

* Update half precision header guards (microsoft#2261)

* fix microsoft#2240: wrong time unit in flops_profiler (microsoft#2241)

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* bump to 0.7.3

* Add blob storage to CI runners (microsoft#2260)

Add blob storage to CI runners and enable for transformers cache on inference tests

* Update replace_module.py, test-gptj.py related fix (microsoft#2269)

Fix RuntimeError: Boolean value of Tensor with more than one value is ambiguous when running test-gptj.py
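
For readers unfamiliar with this error, a small self-contained illustration of the ambiguity and the usual way to resolve it (a generic sketch, not the actual replace_module.py code):

```python
import torch

scores = torch.tensor([0.1, 0.9])

# Truth-testing a tensor with more than one element raises:
# RuntimeError: Boolean value of Tensor with more than one value is ambiguous
try:
    if scores:
        print("unreachable")
except RuntimeError as err:
    print(err)

# The intent must be made explicit with .any() or .all():
if (scores > 0.5).any():
    print("at least one score above the threshold")
```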

* Fix OrderedDict import for python3.6 (microsoft#2267)

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* Ds inference/fix mp2 (microsoft#2270)

* Trajepl: nebula load fix (microsoft#2182)

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: chenguo <chenguo@microsoft.com>

* prevent torch ext folder mkdir at tmp (microsoft#2274)

* Ds-inference Int8 support through ZeroQuant technology (microsoft#2217)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* add a new unit test for cuda ops (microsoft#2278)

Co-authored-by: cmikeh2 <connorholmes@microsoft.com>

* Add to codeowners file (microsoft#2279)

* [pin_memory] make pin_memory select device type

* Memory Access Utility (microsoft#2276)

Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>

* Fp32 accuracy bug fix (microsoft#2285)

Co-authored-by: Arash Bakhtiari <arash@bakhtiari.org>
Co-authored-by: Arash Bakhtiari <arashb@users.noreply.github.com>

* Refactor universal checkpointing and tensor fragments (microsoft#2253)

* Refactor universal checkpointing and tensor fragments

* Formatting

* [ds-inference] fix progress bar (microsoft#2286)

when loading the non-sharded checkpoint, update the progress bar (fix by @RezaYazdaniAminabadi) - I've tested it and it works.

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* Offload all gradients to nvme (microsoft#2282)

* fused bias relu unittest (microsoft#2297)

* fix for pytest picking up local deepspeed dir instead of installed deepspeed (microsoft#2299)

* Fix for Zero3 when MP>1 and at least one batch param undefined (microsoft#2289)

Co-authored-by: anthony.301 <anthony.301@mri.cluster>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* [downstream] merge from xpu support downstream

* Unit test for bias add kernel (microsoft#2298)

* added unit test

* Update pt_binding.cpp

* formatting

* Update test_bias_add.py

* Update relu.cu with mem_access_utils (microsoft#2306)

* Add tensor parallel inference unit tests (microsoft#2232)

Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Sam Ade Jacobs <samjacobs@microsoft.com>

* Fix the residual add mp scaling for  GPTNeoX (microsoft#2310)

* Add unit tests for residual_add kernels (microsoft#2307)

* add inference eval scripts (microsoft#2303)

* Upgrade P40 tests to torch 1.8 (microsoft#2316)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* ZeRO-Inference blog (microsoft#2271)

* ZeRO-Inference blog

* ZeRO-Inference blog

* Format fixes

* Apply feedback

* Feedback

* Update docs/_posts/2022-08-27-zero-inference.md

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update docs/_posts/2022-08-27-zero-inference.md

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Address feedback

* Format fixes

* More tweaks

* long sequence, nvme offload

* Add image

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* ZeRO-Inference blog - wrap up  (microsoft#2321)

* ZeRO-Inference blog - Update README (microsoft#2322)

* refactor to use mem_access (microsoft#2317)

* add quant unit test (microsoft#2315)

* add quant unit test

* add codeowner

* format fix

* fix undefined symbol: curandSetPseudoRandomGeneratorSeed

* modify ref fn name and add comment

* add comments

* add 4bit quant 16groups

* fix

* modify groups in ref code

* parameterize tensor shape

* single param

* detach tensor

* remove -lcurand flag

* add back -lcurand flag

Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>

* only override forward if using cuda-graph (microsoft#2291)

* Add more options to inference benchmark (microsoft#2325)

* bump to 0.7.4

* MoE residual matmul unit test (microsoft#2323)

MoE residual matmul unit tests

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>

* [device] port cuda device to literal_device() in new tests

* MoE matmul with mem_access (microsoft#2336)

* Fix formatting

* Remove redundant variable

* Refactor residual add kernels (microsoft#2333)

Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>

* [accel_runtime] add pin_memory to accelerator runtime interface.

* mem access for quantize kernel (microsoft#2331)

* mem access for quantize kernel

* format

* format fp32

* modify quant kernel

* modify quant kernel2

* modify format

* format

* fix comments in pytest

* fix comments in pytest

* format

* rerun

Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Connor Holmes <connorholmes@microsoft.com>

* increase min pre-commit versions (microsoft#2346)

* Extend scratch buffer for long prompts (microsoft#2212)

Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* fix zero docs (microsoft#2350)

* Inference profiling updates/fixes (microsoft#2348) (microsoft#2349)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

* Kernel Data Conversion Utility (microsoft#2327)

* Unify macro definitions and constants in a single file

* Conversion utility implementation.

* Fix reversion from formatting

* Bugfixes after testing with correct DeepSpeed

* Inline markers are available on both HIP + CUDA

* Add Onebit Optimizers in __init__ (microsoft#2340)

Co-authored-by: Saeyeol Lee <sylee@si-anlaytics.ai>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* [accelerator abstraction] merge from microsoft#2320

* docs(mixture-of-experts-inference): fix typo in tutorial (microsoft#2345)

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* download cifar to blob storage (microsoft#2342)

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* Refactor gptj_residual_add kernels for better readability (microsoft#2358)

Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>

* Updated issue templates (microsoft#2363)

* Update issue templates

* fix cuda invalid config error in dequant kernel (microsoft#2362)

* format

* remove round fn

* Add missing pytest fixture scope (microsoft#2353)

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

* Extend residual_add kernel tests to cover pre_attn_norm (microsoft#2354)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* Refactor fused_bias_residual kernels for better readability (microsoft#2356)

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* Capture error message during sweep tests (microsoft#2351)

* Collect error messages in results.csv

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* fix an exception when recursively casting dicts to fp16 (microsoft#2370)
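
A sketch of the general shape of such a recursive cast (hypothetical helper, not the exact DeepSpeed code); the subtle part is passing through non-floating tensors and non-tensor leaves instead of calling `.half()` on everything:

```python
import torch

def cast_to_fp16(value):
    # Recursively walk dicts/lists/tuples, casting floating-point tensors
    # to fp16 and leaving ints, bools, masks, and other leaves untouched.
    if torch.is_tensor(value):
        return value.half() if value.is_floating_point() else value
    if isinstance(value, dict):
        return {k: cast_to_fp16(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(cast_to_fp16(v) for v in value)
    return value

batch = {"input_ids": torch.ones(2, 4, dtype=torch.long),
         "pixel_values": torch.rand(2, 3, 8, 8)}
batch = cast_to_fp16(batch)  # only pixel_values is cast to fp16
```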

* Refactor remaining distributed tests (microsoft#2216)

* batch of refactored tests

* more test refactoring

* fp16 test refactor

* more refactors

* added DistributedFixture class

* applied DistributedFixture to first batch of tests as a trial

* added DistributedFixture test and documentation

* last tests

* fixes for refactored tests

* remove subdirs in workflow files

* fix pytest syntax error

* fix another syntax error

* update imports

* use DistFixture with elastic checkpoint test

* missing import

* update to shared class tmpdir for elastic test

* moved test files

* avoid duplicate test file name

* last refactor and moving test files

* formatting

* fix broken import

* testing forked AMD tests

* update abstract method

* use blob storage for accelerate and transformers tests

* upgrade torch for accelerate CI

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* Fix the MLP output tensor's shape (microsoft#2380)

* allow building with latest CUDA (11.8), it is backwards compatible (microsoft#2390)

* pin transformers version for unit tests (microsoft#2402)

* Change type to tuple in replace_wo_policy isinstance check (microsoft#2387)

Update the isinstance check inside the `replace_wo_policy` function to `tuple` and `str` instead of `dict`, since the layers are provided as a `tuple` type.

Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
Co-authored-by: Molly Smith <mosm@microsoft.com>
Co-authored-by: Lok Chand Koppaka <lokoppak@microsoft.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
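
A toy illustration of the corrected type check (simplified; the real logic lives in `replace_wo_policy`):

```python
def accepts_layers(layers):
    # Layer names arrive as a tuple of module-name strings (or one string),
    # so an isinstance(layers, dict) check never matched; use (tuple, str).
    return isinstance(layers, (tuple, str))

assert accepts_layers(("attention.out_proj", "mlp.fc2"))
assert accepts_layers("mlp.fc2")
assert not accepts_layers({"attention.out_proj": None})
```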

* Checkpoint backwards-compatibility workaround (microsoft#2384)

* Add predicated global load (microsoft#2373)

Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>

* change call site of literal_device, on_accel_device and accel_runtime to get_accelerator() call

* add new interface definition from olruwase/accelerator_abstraction

* MII blog post (microsoft#2418)

Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>

* Fix figure reference (microsoft#2419)

* [docs] update news items

* [docs] add mii repo link

* Add SLURM Multinode Runner (microsoft#2404)

Signed-off-by: Dashiell Stander <dstander@protonmail.com>
Co-authored-by: Dashiell Stander <dashiell@ip-172-31-45-20.ec2.internal>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* Fix issue with corrupted output on long generation for GPT (microsoft#2359)

Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* MII blog title update on Readme

* DeepSpeed-MII title change in website

* Fix GPT Neo-X multi-gpu inference (microsoft#2401)

Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* MII-Public and MII-Azure subheading in mii post

* CI fixes related to triton (microsoft#2422)

* [docs] update mii blog title (microsoft#2423)

* add SD injection policy (microsoft#2381)

Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>

* [accelerator abstraction] remove name() from interface; device_name() should be used instead.

* merge with master (ec13da6)

* fix checkpoint loading when it is a dictionary (microsoft#2425)

* Make error regex more generic in collect_results.py (microsoft#2415)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* fixes microsoft#2389 (microsoft#2411)

truncating expert param storage for checkpointing

Co-authored-by: Alexander Jipa <azzhipa@amazon.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

* Fix for inference gpt-j test (microsoft#2430)

* fix for gpt-j failing due to tokenizer error

* limit number of gpt-j tokens generated due to low memory

* Fixing bug 2361 (microsoft#2410)

* fixing bug 2361

* adding pytest for config initialization

* changing expected output to FusedAdam

* remove print statement

* running yapf on modified files

* running pre-commit formatting

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* Universal checkpoint for zero stage 1 (microsoft#2284)

* Refactor universal checkpointing and tensor fragments

* Formatting

* Support zero stage1; Expand TP dim

* Remove debug prints

* Detect sharded optimizer state

* Format fixes

* Encode reshaping guide

* More symbolic constants

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

* only add deps if extra is explicitly called (microsoft#2432)

* Add TestInjectionPolicy inference unittest class for testing custom injection policies (microsoft#2426)

This PR adds a TestInjectionPolicy inference unittest class for testing custom injection policies.

This test differs from the existing tests in that the injection_policy dictionary is explicitly specified when calling the DeepSpeed init_inference API.

The google/t5-v1_1-small text2text-generation model and the roberta-large fill-mask model are added as tests with the injection policy explicitly specified.

This is done to expand our unittest coverage to test the path where the replace_wo_policy function is invoked (see microsoft#2387).

Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
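
For reference, explicitly specifying an injection policy follows the pattern from DeepSpeed's inference tutorial; a sketch along those lines (the T5 module names match that tutorial, other values are illustrative):

```python
import torch
import deepspeed
from transformers import AutoModelForSeq2SeqLM
from transformers.models.t5.modeling_t5 import T5Block

model = AutoModelForSeq2SeqLM.from_pretrained("google/t5-v1_1-small")

# Map the transformer block class to the names of its output linear layers
# so DeepSpeed knows where to place tensor-parallel all-reduces.
model = deepspeed.init_inference(
    model,
    mp_size=2,
    dtype=torch.float,
    injection_policy={
        T5Block: ("SelfAttention.o", "EncDecAttention.o", "DenseReluDense.wo")
    },
)
```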

* [memory estimators] new config args sync (microsoft#2431)

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* parallelize writing of layer checkpoint files across data parallel instances (microsoft#1419)

* parallelize layer checkpoints across data parallel groups

* use partition_uniform to determine start/end index values

* formatting fix

* config: add option for parallel write of layer checkpoints in pipeline stage

* yapf fixes

* enable parallel layer write according to config param

* avoid extraneous makedir when rank 0 writes all layers

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
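
A rough sketch of how uniform partitioning assigns layer files to data-parallel ranks (a hypothetical stand-in for the `partition_uniform` helper named above):

```python
def partition_uniform(num_items, num_parts):
    # Split num_items into num_parts contiguous chunks, spreading any
    # remainder across the first chunks; returns num_parts + 1 boundaries.
    bounds = [0] * (num_parts + 1)
    chunk, rem = divmod(num_items, num_parts)
    for p in range(num_parts):
        bounds[p + 1] = bounds[p] + chunk + (1 if p < rem else 0)
    return bounds

num_layers, dp_world_size, dp_rank = 24, 4, 1
bounds = partition_uniform(num_layers, dp_world_size)
start, end = bounds[dp_rank], bounds[dp_rank + 1]
# This rank writes only layer files [start, end) instead of rank 0 writing all.
print(f"rank {dp_rank} saves layers {start}..{end - 1}")
```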

* Fix broken link to DeepSpeed Megatron fork (microsoft#2440)

Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>

* bump to 0.7.5

* [OpBuilder] Add op builder abstraction

* convert op builder usage in merged code

* merge diff files from upstream

* [OpBuilder] add create_op_builder interface in abstract_accelerator.py

* remove files that is deleted from upstream

* [OpBuilder] add left over op builder usage in tests

* [OpBuilder] fix op builder usage in tests

* [OpBuilder] fix <op builder>.NAME usage in tests to follow op builder abstraction design

* import get_accelerator from deepspeed.accelerator directly

* [OpBuilder] remove unused function and sync with main

* add missing import

* revert changes in device.py to avoid conflict with main

* fix alexnet_model to use /tmp instead of /blob

* Mingzhi/solve pr108 b (microsoft#115)

* move ALL_OPs from __init__.py to all_Op.py to solve circular import

* delete deepspeedexamples

* fix import

* fix regression (microsoft#117)

* fix pin_memory

* fix regression

* fix error

Signed-off-by: Dashiell Stander <dstander@protonmail.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Mikhail Druzhinin <dipetm@gmail.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Minjia Zhang <33713995+minjiaz@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Kamal Raj <kamalraj97@gmail.com>
Co-authored-by: Conglong Li <conglong.li@gmail.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Arash Bakhtiari <arashb@users.noreply.github.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Zhihong Chen <gdst_czh@163.com>
Co-authored-by: Siddharth Singh <siddharth9820@gmail.com>
Co-authored-by: Connor Holmes <connorholmes@microsoft.com>
Co-authored-by: 叶志晟 <yzs981130@126.com>
Co-authored-by: Molly Smith <112220543+molly-smith@users.noreply.github.com>
Co-authored-by: trajep <trajepl@gmail.com>
Co-authored-by: chenguo <chenguo@microsoft.com>
Co-authored-by: Arash Bakhtiari <arash@bakhtiari.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: anthony.301 <anthony.301@mri.cluster>
Co-authored-by: Sam Ade Jacobs <samjacobs@microsoft.com>
Co-authored-by: Guanhua Wang <alexwgh333@gmail.com>
Co-authored-by: Saeyeol Lee <78332687+l4d2boomer@users.noreply.github.com>
Co-authored-by: Saeyeol Lee <sylee@si-anlaytics.ai>
Co-authored-by: Jean-Louis Queguiner <jean-louis.queguiner@gadz.org>
Co-authored-by: Matt Smith <matt@mjksmith.com>
Co-authored-by: Thomas-MMJ <112830596+Thomas-MMJ@users.noreply.github.com>
Co-authored-by: lekurile <113481193+lekurile@users.noreply.github.com>
Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
Co-authored-by: Molly Smith <mosm@microsoft.com>
Co-authored-by: Lok Chand Koppaka <lokoppak@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Dashiell Stander <dstander@protonmail.com>
Co-authored-by: Dashiell Stander <dashiell@ip-172-31-45-20.ec2.internal>
Co-authored-by: Andrey Chernykh <andrew.chernyh@gmail.com>
Co-authored-by: Alexander Jipa <alexander.jipa@gmail.com>
Co-authored-by: Alexander Jipa <azzhipa@amazon.com>
Co-authored-by: Joe Mayer <114769929+jomayeri@users.noreply.github.com>
Co-authored-by: Adam Moody <moody20@llnl.gov>
Co-authored-by: AGUL <mingzhi.liu@intel.com>
Showing 220 changed files with 6,778 additions and 6,542 deletions.
43 changes: 43 additions & 0 deletions .github/ISSUE_TEMPLATE/compression_bug_report.md
@@ -0,0 +1,43 @@
---
name: Bug report (compression)
about: Create a DeepSpeed compression related issue to help us improve
title: "[BUG]"
labels: bug,compression
assignees: ''

---

**Describe the bug**
A clear and concise description of what the bug is.

**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error

**Expected behavior**
A clear and concise description of what you expected to happen.

**ds_report output**
Please run `ds_report` to give us details about your setup.

**Screenshots**
If applicable, add screenshots to help explain your problem.

**System info (please complete the following information):**
- OS: [e.g. Ubuntu 18.04]
- GPU count and types [e.g. two machines with x8 A100s each]
- Interconnects (if applicable) [e.g., two machines connected with 100 Gbps IB]
- Python version
- Any other relevant info about your setup

**Launcher context**
Are you launching your experiment with the `deepspeed` launcher, MPI, or something else?

**Docker context**
Are you using a specific docker image that you can share?

**Additional context**
Add any other context about the problem here.
41 changes: 41 additions & 0 deletions .github/ISSUE_TEMPLATE/inference_bug_report.md
@@ -0,0 +1,41 @@
---
name: Bug report (inference)
about: Create a DeepSpeed inference related issue to help us improve
title: "[BUG]"
labels: bug,inference
assignees: ''

---

**Describe the bug**
A clear and concise description of what the bug is.

**To Reproduce**
Steps to reproduce the behavior:
1. Simple inference script to reproduce
2. What packages are required and their versions
3. How to run the script
4. ...

**Expected behavior**
A clear and concise description of what you expected to happen.

**ds_report output**
Please run `ds_report` to give us details about your setup.

**Screenshots**
If applicable, add screenshots to help explain your problem.

**System info (please complete the following information):**
- OS: [e.g. Ubuntu 18.04]
- GPU count and types [e.g. two machines with x8 A100s each]
- (if applicable) what [DeepSpeed-MII](https://github.com/microsoft/deepspeed-mii) version are you using
- (if applicable) Hugging Face Transformers/Accelerate/etc. versions
- Python version
- Any other relevant info about your setup

**Docker context**
Are you using a specific docker image that you can share?

**Additional context**
Add any other context about the problem here.
@@ -1,8 +1,8 @@
---
-name: Bug report
-about: Create a report to help us improve
+name: Bug report (training)
+about: Create a DeepSpeed training related issue to help us improve
title: "[BUG]"
-labels: bug
+labels: bug,training
assignees: ''

---
6 changes: 3 additions & 3 deletions .github/workflows/amd.yml
@@ -35,7 +35,7 @@ jobs:
which hipcc
hipcc --version
pip install --upgrade pip
-pip uninstall --yes torch torchvision
+pip uninstall --yes torch torchvision triton
pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/rocm5.1.1
python -c "import torch; print('torch:', torch.__version__, torch)"
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
@@ -67,5 +67,5 @@ jobs:
run: |
if [[ -d ./torch-extensions ]]; then rm -rf ./torch-extensions; fi
cd tests
-TORCH_EXTENSIONS_DIR=./torch-extensions pytest --color=yes --durations=0 --verbose unit/{autotuning,checkpoint,comm,compression,elasticity,inference,launcher,monitor,ops,profiling,runtime,utils}
-#TORCH_EXTENSIONS_DIR=./torch-extensions pytest --color=yes --durations=0 --verbose -m 'sequential' unit/{autotuning,checkpoint,comm,compression,elasticity,inference,launcher,monitor,ops,profiling,runtime,utils}
+TORCH_EXTENSIONS_DIR=./torch-extensions pytest --color=yes --durations=0 --forked -n 4 --verbose unit/
+TORCH_EXTENSIONS_DIR=./torch-extensions pytest --color=yes --durations=0 --forked --verbose -m 'sequential' unit/
6 changes: 3 additions & 3 deletions .github/workflows/nv-accelerate-v100.yml
@@ -31,8 +31,8 @@ jobs:
which nvcc
nvcc --version
pip install --upgrade pip
-pip uninstall --yes torch torchvision
-pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
+pip uninstall --yes torch torchvision triton
+pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu111
python -c "import torch; print('torch:', torch.__version__, torch)"
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
@@ -58,4 +58,4 @@
# tmp fix: force newer datasets version
pip install "datasets>=2.0.0"
pip list
-TORCH_EXTENSIONS_DIR=./torch-extensions pytest --color=yes --durations=0 --verbose tests/deepspeed
+HF_DATASETS_CACHE=/blob/datasets_cache/ TRANSFORMERS_CACHE=/blob/transformers_cache/ TORCH_EXTENSIONS_DIR=./torch-extensions pytest --color=yes --durations=0 --verbose tests/deepspeed
4 changes: 2 additions & 2 deletions .github/workflows/nv-inference.yml
@@ -31,7 +31,7 @@ jobs:
which nvcc
nvcc --version
pip install --upgrade pip
-pip uninstall --yes torch torchvision
+pip uninstall --yes torch torchvision triton
pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu113
python -c "import torch; print('torch:', torch.__version__, torch)"
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
@@ -51,7 +51,7 @@ jobs:
- name: Install deepspeed
run: |
pip uninstall --yes deepspeed
-pip install .[dev,1bit,autotuning,sparse_attn,inf]
+pip install .[dev,1bit,autotuning,inf]
ds_report
- name: Unit tests
4 changes: 2 additions & 2 deletions .github/workflows/nv-nightly.yml
@@ -24,7 +24,7 @@ jobs:
which nvcc
nvcc --version
pip install --upgrade pip
-pip uninstall --yes torch torchvision
+pip uninstall --yes torch torchvision triton
pip install torch==1.8.2+cu111 torchvision==0.9.2+cu111 -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html
python -c "import torch; print('torch:', torch.__version__, torch)"
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
@@ -42,7 +42,7 @@ jobs:
- name: Install deepspeed
run: |
pip uninstall --yes deepspeed
-pip install .[dev,1bit,autotuning,sparse_attn,inf]
+pip install .[dev,1bit,autotuning,inf]
ds_report
- name: Unit tests
8 changes: 4 additions & 4 deletions .github/workflows/nv-torch-latest-v100.yml
@@ -31,7 +31,7 @@ jobs:
which nvcc
nvcc --version
pip install --upgrade pip
-pip uninstall --yes torch torchvision
+pip uninstall --yes torch torchvision triton
pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu113
python -c "import torch; print('torch:', torch.__version__, torch)"
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
@@ -53,13 +53,13 @@ jobs:
- name: Install deepspeed
run: |
pip uninstall --yes deepspeed
-pip install .[dev,1bit,autotuning,sparse_attn]
+pip install .[dev,1bit,autotuning]
ds_report
- name: Unit tests
run: |
unset TORCH_CUDA_ARCH_LIST # only jit compile for current arch
if [[ -d ./torch-extensions ]]; then rm -rf ./torch-extensions; fi
cd tests
-TORCH_EXTENSIONS_DIR=./torch-extensions pytest --color=yes --durations=0 --forked --verbose -n 4 unit/{autotuning,checkpoint,comm,compression,elasticity,inference,launcher,monitor,ops,profiling,runtime,utils} --torch_ver="1.12" --cuda_ver="11.3"
-TORCH_EXTENSIONS_DIR=./torch-extensions pytest --color=yes --durations=0 --forked --verbose -m 'sequential' unit/{autotuning,checkpoint,comm,compression,elasticity,inference,launcher,monitor,ops,profiling,runtime,utils} --torch_ver="1.12" --cuda_ver="11.3"
+TORCH_EXTENSIONS_DIR=./torch-extensions pytest --color=yes --durations=0 --forked --verbose -n 4 unit/ --torch_ver="1.12" --cuda_ver="11.3"
+TORCH_EXTENSIONS_DIR=./torch-extensions pytest --color=yes --durations=0 --forked --verbose -m 'sequential' unit/ --torch_ver="1.12" --cuda_ver="11.3"
4 changes: 2 additions & 2 deletions .github/workflows/nv-torch-nightly-v100.yml
@@ -24,7 +24,7 @@ jobs:
which nvcc
nvcc --version
pip install --upgrade pip
-pip uninstall --yes torch torchvision
+pip uninstall --yes torch torchvision triton
pip install --pre torch torchvision --extra-index-url https://download.pytorch.org/whl/nightly/cu113
python -c "import torch; print('torch:', torch.__version__, torch)"
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
@@ -46,7 +46,7 @@ jobs:
- name: Install deepspeed
run: |
pip uninstall --yes deepspeed
-pip install .[dev,1bit,autotuning,sparse_attn]
+pip install .[dev,1bit,autotuning]
ds_report
- name: Unit tests
4 changes: 2 additions & 2 deletions .github/workflows/nv-torch18-p40.yml
@@ -31,7 +31,7 @@ jobs:
which nvcc
nvcc --version
pip install --upgrade pip
-pip uninstall --yes torch torchvision
+pip uninstall --yes torch torchvision triton
pip install torch==1.8.2 torchvision==0.9.2 --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu101
python -c "import torch; print('torch:', torch.__version__, torch)"
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
@@ -53,7 +53,7 @@ jobs:
- name: Install deepspeed
run: |
pip uninstall --yes deepspeed
-pip install .[dev,1bit,autotuning,sparse_attn]
+pip install .[dev,1bit,autotuning]
ds_report
- name: Unit tests
4 changes: 2 additions & 2 deletions .github/workflows/nv-torch18-v100.yml
@@ -31,7 +31,7 @@ jobs:
which nvcc
nvcc --version
pip install --upgrade pip
-pip uninstall --yes torch torchvision
+pip uninstall --yes torch torchvision triton
pip install torch==1.8.2+cu111 torchvision==0.9.2+cu111 -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html
python -c "import torch; print('torch:', torch.__version__, torch)"
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
@@ -53,7 +53,7 @@ jobs:
- name: Install deepspeed
run: |
pip uninstall --yes deepspeed
-pip install .[dev,1bit,autotuning,sparse_attn]
+pip install .[dev,1bit,autotuning]
ds_report
- name: Unit tests
6 changes: 3 additions & 3 deletions .github/workflows/nv-transformers-v100.yml
@@ -31,7 +31,7 @@ jobs:
which nvcc
nvcc --version
pip install --upgrade pip
-pip uninstall --yes torch torchvision
+pip uninstall --yes torch torchvision triton
pip install torch==1.8.2+cu111 torchvision==0.9.2+cu111 -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html
python -c "import torch; print('torch:', torch.__version__, torch)"
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
@@ -54,7 +54,7 @@ jobs:
git clone https://github.com/huggingface/transformers
cd transformers
# if needed switch to the last known good SHA until transformers@master is fixed
-# git checkout 1cc453d33
+git checkout 6268694e2
git rev-parse --short HEAD
# scipy/sklearn required for tests, using the 'dev' extra forces torch re-install
pip install .[testing]
@@ -65,4 +65,4 @@
# force protobuf version due to issues
pip install "protobuf<4.21.0"
pip list
-WANDB_DISABLED=true TORCH_EXTENSIONS_DIR=./torch-extensions RUN_SLOW=1 pytest --color=yes --durations=0 --verbose tests/deepspeed
+HF_DATASETS_CACHE=/blob/datasets_cache/ TRANSFORMERS_CACHE=/blob/transformers_cache/ WANDB_DISABLED=true TORCH_EXTENSIONS_DIR=./torch-extensions RUN_SLOW=1 pytest --color=yes --durations=0 --verbose tests/deepspeed
6 changes: 6 additions & 0 deletions .pre-commit-config.yaml
@@ -54,3 +54,9 @@ repos:
--check-filenames,
--check-hidden
]

- repo: https://github.com/pycqa/flake8
rev: 4.0.1
hooks:
- id: flake8
args: ['--ignore=E,F403,F405,F541,F841,W', '--select=E9,F,W6', '--per-file-ignores=__init__.py:F401']
2 changes: 1 addition & 1 deletion README.md
@@ -12,11 +12,11 @@
## Latest News
<b> DeepSpeed trained the world's most powerful language models ([MT-530B](https://www.microsoft.com/en-us/research/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/), [BLOOM](https://huggingface.co/blog/bloom-megatron-deepspeed)); [learn how](https://www.deepspeed.ai/tutorials/large-models-w-deepspeed/).</b>

+* [2022/10] [DeepSpeed-MII: instant speedup on 24,000+ open-source DL models with up to 40x cheaper inference](https://www.deepspeed.ai/2022/10/10/mii.html)
* [2022/09] [ZeRO-Inference: Democratizing massive model inference](https://www.deepspeed.ai/2022/09/09/zero-inference.html)
* [2022/07] [Azure and DeepSpeed empower easy-to-use and high-performance model training](https://azure.microsoft.com/en-us/blog/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed/)
* [2022/07] [DeepSpeed Compression: A composable library for extreme compression](https://www.microsoft.com/en-us/research/blog/deepspeed-compression-a-composable-library-for-extreme-compression-and-zero-cost-quantization/)
* [2022/03] [Supporting efficient large model training on AMD Instinct GPUs with DeepSpeed](https://cloudblogs.microsoft.com/opensource/2022/03/21/supporting-efficient-large-model-training-on-amd-instinct-gpus-with-deepspeed/)
-* [2022/03] [Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam](https://www.deepspeed.ai/tutorials/zero-one-adam/)

---

2 changes: 1 addition & 1 deletion benchmarks/communication/all_gather.py
@@ -1,6 +1,6 @@
from benchmarks.communication.utils import *
from benchmarks.communication.constants import *
-from deepspeed.accelerator.real_accelerator import get_accelerator
+from deepspeed.accelerator import get_accelerator

import time

2 changes: 1 addition & 1 deletion benchmarks/communication/all_reduce.py
@@ -1,6 +1,6 @@
from benchmarks.communication.utils import *
from benchmarks.communication.constants import *
-from deepspeed.accelerator.real_accelerator import get_accelerator
+from deepspeed.accelerator import get_accelerator

import time

2 changes: 1 addition & 1 deletion benchmarks/communication/all_to_all.py
@@ -1,6 +1,6 @@
from benchmarks.communication.utils import *
from benchmarks.communication.constants import *
-from deepspeed.accelerator.real_accelerator import get_accelerator
+from deepspeed.accelerator import get_accelerator

import time

2 changes: 1 addition & 1 deletion benchmarks/communication/broadcast.py
@@ -1,7 +1,7 @@
import torch
from benchmarks.communication.utils import *
from benchmarks.communication.constants import *
-from deepspeed.accelerator.real_accelerator import get_accelerator
+from deepspeed.accelerator import get_accelerator

import time

2 changes: 1 addition & 1 deletion benchmarks/communication/constants.py
@@ -1,4 +1,4 @@
-from deepspeed.accelerator.real_accelerator import get_accelerator
+from deepspeed.accelerator import get_accelerator

DEFAULT_WARMUPS = 5
DEFAULT_TRIALS = 50
2 changes: 1 addition & 1 deletion benchmarks/communication/pt2pt.py
@@ -1,6 +1,6 @@
from benchmarks.communication.utils import *
from benchmarks.communication.constants import *
-from deepspeed.accelerator.real_accelerator import get_accelerator
+from deepspeed.accelerator import get_accelerator

import time

2 changes: 1 addition & 1 deletion benchmarks/communication/utils.py
@@ -3,7 +3,7 @@
import math
import argparse
from benchmarks.communication.constants import *
-from deepspeed.accelerator.real_accelerator import get_accelerator
+from deepspeed.accelerator import get_accelerator

global dist

2 changes: 1 addition & 1 deletion benchmarks/inference/bert-bench.py
@@ -3,7 +3,7 @@
import deepspeed
import argparse
from transformers import pipeline
-from deepspeed.accelerator.real_accelerator import get_accelerator
+from deepspeed.accelerator import get_accelerator

parser = argparse.ArgumentParser()
parser.add_argument("--model", "-m", type=str, help="hf model name")
20 changes: 15 additions & 5 deletions benchmarks/inference/collect_results.py
@@ -75,6 +75,14 @@ def get_generated_text(file_content, gen_text_n):
return {f"generated-text-{key}": val for key, val in matches}


def get_error(file_content):
    # Grab any "Error: ..." lines from the benchmark output.
    matches = re.findall(r"Error:\s+(.+?)\n", file_content)
    if not matches:  # note: `matches is []` is always False; test for emptiness instead
        return False
    else:
        return {"error": val for val in matches}


if __name__ == "__main__":
# List to collect data from all benchmarks
benchmarks_data = []
@@ -112,15 +120,17 @@ def get_generated_text(file_content, gen_text_n):
perf_data = get_perf_data(file_content)
if not perf_data:
print(
f"WARNING: Could not detect benchmark performance data for file {file_path}, skipping"
f"WARNING: Could not detect benchmark performance data for file {file_path}"
)
continue

generated_text = get_generated_text(file_content, args.gen_text_n)
if not generated_text:
-print(
-f"WARNING: Could not detect generated text for file {file_path}, skipping"
-)
+print(f"WARNING: Could not detect generated text for file {file_path}")

error = get_error(file_content)
if error:
print(f"Error found in {file_path}, collecting error info...")
benchmarks_data.append({"branch": branch, **params, **error})
continue

benchmarks_data.append({