Skip to content

Commit

Permalink
Merge r1.13.0 main (#5570)
Browse files Browse the repository at this point in the history
* update branch

Signed-off-by: ericharper <complex451@gmail.com>

* Rename Speech Dataset Processor to Speech Data Processor (#5378)

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Megatron Export Update (#5343)

* export update for Megatron + change ORT optimization

Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated export_utils to use autocast instead of manually casting >:/

Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>

* removed dtype from LayerNorm

Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>

* added comment

Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>

* reverting changes on FloatCast

Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>

* Cherry-picked changes from megatron-norm

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* updated asr_model import to cast_utils

Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>

* updated del onnx_model place

Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>

* changed ort optimization to basic -> temp fix

Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>

Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>
Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Co-authored-by: David Mosallanezhad <dmosallanezh@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Disable sync_batch_comm in validation_step for GPT (#5397)

* disable sync_batch_comm in validation_step

Signed-off-by: ericharper <complex451@gmail.com>

* Read sync_batch_comm from config or default to False

Signed-off-by: Markel Sanz Ausin <markelsanz14@gmail.com>

* Update megatron_gpt_config to default sync_batch_comm to False to avoid CUDA error

Signed-off-by: Markel Sanz Ausin <markelsanz14@gmail.com>

* Empty

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Comment out test

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Markel Sanz Ausin <markelsanz14@gmail.com>
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Markel Sanz Ausin <markelsanz14@gmail.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>

* Radtts 1.13 (#5451)

* [TTS] Fixing RADTTS training - removing view buffer and fixing accuracy issue (#5358)
* [TTS] add CI test for RADTTS training recipe.

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

* Support for finetuning and finetuning inference with .ckpt files & batch size refactoring (#5339) (#5478)

* Initial refactor

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Resolve config before passing to load_from_checkpoint

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes for model parallel and nemo restore

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes for eval

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert config changes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Refactor

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix typo

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove comments

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Minor

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix validation reconfiguration

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove old comment

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixes for test_ds

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* export_utils bugfix (#5480)

* updated export_utils

Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>
Co-authored-by: David Mosallanezhad <dmosallanezh@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Export fixes for Riva (#5496)

* Export fixes for Riva

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Cleaning up training_utils

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* added set_start_method + function param bugfix (#5539)

* added set_start_method + function param bugfix

Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* upper bound torchmetrics

Signed-off-by: ericharper <complex451@gmail.com>

Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>
Signed-off-by: ericharper <complex451@gmail.com>
Co-authored-by: David Mosallanezhad <dmosallanezh@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: ericharper <complex451@gmail.com>

* remove notebook (#5548)

Signed-off-by: ericharper <complex451@gmail.com>

Signed-off-by: ericharper <complex451@gmail.com>

* update readme

Signed-off-by: ericharper <complex451@gmail.com>

* update branch

Signed-off-by: ericharper <complex451@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert

Signed-off-by: ericharper <complex451@gmail.com>

* revert

Signed-off-by: ericharper <complex451@gmail.com>

* revert

Signed-off-by: ericharper <complex451@gmail.com>

* revert

Signed-off-by: ericharper <complex451@gmail.com>

* revert

Signed-off-by: ericharper <complex451@gmail.com>

* revert

Signed-off-by: ericharper <complex451@gmail.com>

* revert

Signed-off-by: ericharper <complex451@gmail.com>

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>
Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: Markel Sanz Ausin <markelsanz14@gmail.com>
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com>
Co-authored-by: David <amosalla@asu.edu>
Co-authored-by: David Mosallanezhad <dmosallanezh@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Boris Fomitchev <bfomitchev@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Markel Sanz Ausin <markelsanz14@gmail.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
  • Loading branch information
12 people committed Dec 8, 2022
1 parent a15aabd commit 584ca56
Show file tree
Hide file tree
Showing 6 changed files with 4 additions and 904 deletions.
2 changes: 1 addition & 1 deletion README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -224,7 +224,7 @@ Install it manually if not using the NVIDIA PyTorch container.
git clone https://github.com/ericharper/apex.git
cd apex
git checkout nm_v1.11.0
git checkout nm_v1.13.0
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--fast_layer_norm" --global-option="--distributed_adam" --global-option="--deprecated_fused_adam" ./
Transformer Engine
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -106,7 +106,6 @@ model:
apex_transformer_log_level: 30 # Python logging level displays logs with severity greater than or equal to this
gradient_as_bucket_view: True # PyTorch DDP argument. Allocate gradients in a contiguous bucket to save memory (less fragmentation and buffer memory)
sync_batch_comm: False # Enable stream synchronization after each p2p communication between pipeline stages
use_unified_checkpoint: True # Use model parallel independent checkpointing

## Activation Checkpointing
# NeMo Megatron supports 'selective' activation checkpointing where only the memory intensive part of attention is checkpointed.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
# limitations under the License.

import torch.multiprocessing as mp
from lightning_lite.plugins.environments import TorchElasticEnvironment
from omegaconf.omegaconf import OmegaConf, open_dict
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks.timer import Timer
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
# limitations under the License.

import torch.multiprocessing as mp
from lightning_lite.plugins.environments import TorchElasticEnvironment
from omegaconf.omegaconf import OmegaConf, open_dict
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks.timer import Timer
Expand Down
4 changes: 1 addition & 3 deletions tests/collections/tts/test_tts_exportables.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,6 @@
import tempfile

import pytest
import torch
from omegaconf import OmegaConf

from nemo.collections.tts.models import FastPitchModel, HifiGanModel, RadTTSModel
Expand Down Expand Up @@ -81,5 +80,4 @@ def test_RadTTSModel_export_to_torchscript(self, radtts_model):
model = radtts_model.cuda()
with tempfile.TemporaryDirectory() as tmpdir:
filename = os.path.join(tmpdir, 'rad.ts')
with torch.cuda.amp.autocast(enabled=True):
model.export(output=filename, verbose=True, check_trace=True)
model.export(output=filename, verbose=True, check_trace=True)
Loading

0 comments on commit 584ca56

Please sign in to comment.