From f9898b7444a0ecfa66ae606b682186d3cb499803 Mon Sep 17 00:00:00 2001 From: ericharper Date: Thu, 10 Sep 2020 15:22:28 -0600 Subject: [PATCH 01/51] initial commit Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 35 +++++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+) create mode 100644 docs/source/conversational_ai.rst diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst new file mode 100644 index 0000000000000..5ee9a8193a2ba --- /dev/null +++ b/docs/source/conversational_ai.rst @@ -0,0 +1,35 @@ +Conversational AI +----------------- + +Using NeMo Models +^^^^^^^^^^^^^^^^^ + +NeMo is a conversational AI toolkit that uses PyTorch Lightning for +training and fine-tuning of automatic speech recognition(ASR), +natural language processing (NLP), and text-to-speech (TTS) applications and research. + +.. note:: Every NeMo model is a LightningModule that comes equipped with all supporting infrastructure for training and reproducibility. + +Example: Speech to Text (ASR) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Train Convolutional ASR models with NeMo and PyTorch Lightning. + +.. code-block:: python + + trainer = Trainer(**cfg.trainer) + asr_model = EncDecCTCModel(cfg.model, trainer) + trainer.fit(asr_model) + +.. note:: NeMo models and PyTorch Lightning Trainer can be fully configured from .yaml files using Hydra. + +Transcribe audio with QuartzNet pretrained on 1000's of hours of audio. + +.. code-block:: python + + quartznet = EncDecCTCModel.from_pretrained('QuartzNet15x5Base-En') + + files = ['path/to/my.wav'] # file should be less than 25 seconds + + for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)): + print(f"Audio in {fname} was recognized as: {transcription}") \ No newline at end of file From 371242ee4093d3b08593f65d3d37aa9f33aa5dbe Mon Sep 17 00:00:00 2001 From: ericharper Date: Thu, 10 Sep 2020 15:41:56 -0600 Subject: [PATCH 02/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 38 ++++++++++++++++++++++++++++++- 1 file changed, 37 insertions(+), 1 deletion(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index 5ee9a8193a2ba..2977c2af3070f 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -10,6 +10,23 @@ natural language processing (NLP), and text-to-speech (TTS) applications and res .. note:: Every NeMo model is a LightningModule that comes equipped with all supporting infrastructure for training and reproducibility. +Installing NeMo +^^^^^^^^^^^^^^^ + +Installing the latest NeMo release is a simple pip install. + +.. code-block:: bash + + pip install nemo_toolkit[all] + +To install a specific branch from GitHub: + +.. code-block:: bash + + python -m pip install git+https://github.com/NVIDIA/NeMo.git@{BRANCH}#egg=nemo_toolkit[nlp] + +.. note:: Replace {BRANCH} with the specific branch name from GitHub. + Example: Speech to Text (ASR) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -23,7 +40,26 @@ Train Convolutional ASR models with NeMo and PyTorch Lightning. .. note:: NeMo models and PyTorch Lightning Trainer can be fully configured from .yaml files using Hydra. -Transcribe audio with QuartzNet pretrained on 1000's of hours of audio. +Training NeMo models with PyTorch Lightning and Hydra is simple from the command line. + +.. 
code-block:: bash + + python examples/asr/speech_to_text.py --config-name=quartznet_15x5 \ + model.train_ds.manifest_filepath=/librispeech-train-all.json \ + model.validation_ds.manifest_filepath=/librispeech-dev-other.json \ + trainer.gpus=4 trainer.max_epochs=128 model.train_ds.batch_size=64 \ + +trainer.precision=16 \ + +model.validation_ds.num_workers=16 \ + +model.train_ds.num_workers=16 + +Optionally launch Tensorboard to view training results + +.. code-block:: bash + + tensorboard --bind_all --logdir nemo_experiments + + +Transcribe audio with QuartzNet pretrained on 7000+ hours of audio. .. code-block:: python From 547f02e550288c1f674c069cb56ee049c2ce276a Mon Sep 17 00:00:00 2001 From: ericharper Date: Thu, 10 Sep 2020 16:54:05 -0600 Subject: [PATCH 03/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 32 +++++++++++++++++++++++-------- 1 file changed, 24 insertions(+), 8 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index 2977c2af3070f..db8f8b3ad1d23 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -44,13 +44,18 @@ Training NeMo models with PyTorch Lightning and Hydra is simple from the command .. code-block:: bash - python examples/asr/speech_to_text.py --config-name=quartznet_15x5 \ - model.train_ds.manifest_filepath=/librispeech-train-all.json \ - model.validation_ds.manifest_filepath=/librispeech-dev-other.json \ - trainer.gpus=4 trainer.max_epochs=128 model.train_ds.batch_size=64 \ - +trainer.precision=16 \ - +model.validation_ds.num_workers=16 \ - +model.train_ds.num_workers=16 + python NeMo/examples/asr/speech_to_text.py --config-name=quartznet_15x5 \ + trainer.gpus=4 \ + trainer.max_epochs=128 \ + +trainer.precision=16 \ + +trainer.amp_level=O1 \ + model.train_ds.manifest_filepath=/librispeech-train-all.json \ + model.validation_ds.manifest_filepath=/librispeech-dev-other.json \ + model.train_ds.batch_size=64 \ + +model.validation_ds.num_workers=16 \ + +model.train_ds.num_workers=16 + +.. note:: Training NeMo ASR models can take days/weeks so it is highly recommended to use multiple GPUs and multiple nodes with the PyTorch Lightning Trainer. Optionally launch Tensorboard to view training results @@ -68,4 +73,15 @@ Transcribe audio with QuartzNet pretrained on 7000+ hours of audio. files = ['path/to/my.wav'] # file should be less than 25 seconds for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)): - print(f"Audio in {fname} was recognized as: {transcription}") \ No newline at end of file + print(f"Audio in {fname} was recognized as: {transcription}") + +Example: Voice Activity Detection (VAD) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Train a MatchboxNet model with a modified decoder head for recognizing speakers. + +.. 
code-block:: python + + trainer = Trainer(**cfg.trainer) + speaker_model = EncDecSpeakerLabelModel(cfg=cfg.model, trainer=trainer) + trainer.fit(speaker_model) \ No newline at end of file From 455e123476149685d6195728cfbcc0a75992957f Mon Sep 17 00:00:00 2001 From: ericharper Date: Fri, 11 Sep 2020 15:21:35 -0600 Subject: [PATCH 04/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 27 ++++++++++++++++++++++++--- 1 file changed, 24 insertions(+), 3 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index db8f8b3ad1d23..dc1d33565287d 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -4,12 +4,33 @@ Conversational AI Using NeMo Models ^^^^^^^^^^^^^^^^^ -NeMo is a conversational AI toolkit that uses PyTorch Lightning for -training and fine-tuning of automatic speech recognition(ASR), -natural language processing (NLP), and text-to-speech (TTS) applications and research. +NeMo is a toolkit for doing research in Conversational AI. NeMo makes it easy to build complex +automatic speech recognition (ASR), natural language processing (NLP), and text-to-speech (TTS) +applications. + +Conversational AI architectures are typically very large and require a lot of data and compute +for training. NeMo uses PyTorch Lightning for for easy and performant multi-gpu/multi-node +mixed-precision training. .. note:: Every NeMo model is a LightningModule that comes equipped with all supporting infrastructure for training and reproducibility. +NeMo Models contain everything needed to to train and reproduce state of the art Conversational AI +research and applications. This includes + +- neural network architectures +- datasets/dataloaders +- data preprocessing/postprocessing +- data augmentors +- optimizers and schedulers +- tokenizers, language models + +NeMo uses Hydra for configuring both NeMo models and the PyTorch Lightning Trainer. +Depending on the domain and application, many different AI libraries will have to be configured +to build the application. Hydra makes it easy to bring all of these libraries together +and do all the configuration from .yaml or the Hydra CLI. + +.. note:: Every NeMo model has an example configuration and run script that contains all configuration needed for training. + Installing NeMo ^^^^^^^^^^^^^^^ From ebe0b844d07e1fa61e31e3f940739d7dafc137fa Mon Sep 17 00:00:00 2001 From: ericharper Date: Fri, 11 Sep 2020 15:32:22 -0600 Subject: [PATCH 05/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 38 ++++++++++++++++++++++++++++++- 1 file changed, 37 insertions(+), 1 deletion(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index dc1d33565287d..432cd51d8cebb 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -31,6 +31,10 @@ and do all the configuration from .yaml or the Hydra CLI. .. note:: Every NeMo model has an example configuration and run script that contains all configuration needed for training. +The end result of using NeMo, Pytorch Lightning, and Hydra is that +NeMo models all have the same look and feel so that is easy to do Conversational AI research +across multiple domains and all NeMo models are fully compatible with the PyTorch ecosystem. + Installing NeMo ^^^^^^^^^^^^^^^ @@ -51,7 +55,39 @@ To install a specific branch from GitHub: Example: Speech to Text (ASR) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Train Convolutional ASR models with NeMo and PyTorch Lightning. 
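As a quick way to see what is available before training, NeMo model classes expose a ``list_available_models()`` helper (a minimal sketch, assuming ``nemo_toolkit[all]`` is installed; the attribute name on the returned entries follows NeMo's ``PretrainedModelInfo``):

.. code-block:: python

    import nemo.collections.asr as nemo_asr

    # each entry describes a downloadable pretrained checkpoint
    for model_info in nemo_asr.models.EncDecCTCModel.list_available_models():
        print(model_info.pretrained_model_name)
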
+Everything needed to train Convolutional ASR models with NeMo, PyTorch Lightning, and Hydra is +included with NeMo. + +Configurations are in .yaml files included with NeMo/examples + +.. code-block:: yaml + + # configure the PyTorch Lightning Trainer + trainer: + gpus: 0 # number of gpus + max_epochs: 5 + max_steps: null # computed at runtime if not set + num_nodes: 1 + distributed_backend: ddp + ... + # configure the ASR model + encoder: + cls: nemo.collections.asr.modules.ConvASREncoder + params: + feat_in: *n_mels + activation: relu + conv_mask: true + + jasper: + - filters: 128 + repeat: 1 + kernel: [11] + stride: [1] + dilation: [1] + dropout: *dropout + # an all other configuration, data, optimizer, etc + + .. code-block:: python From 8762d81074915c01cedbdad82d93a8dcf569b342 Mon Sep 17 00:00:00 2001 From: ericharper Date: Fri, 11 Sep 2020 15:41:08 -0600 Subject: [PATCH 06/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 36 ++++++++++++------------------- 1 file changed, 14 insertions(+), 22 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index 432cd51d8cebb..b4c3e029a5ba2 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -22,7 +22,8 @@ research and applications. This includes - data preprocessing/postprocessing - data augmentors - optimizers and schedulers -- tokenizers, language models +- tokenizers +- language models NeMo uses Hydra for configuring both NeMo models and the PyTorch Lightning Trainer. Depending on the domain and application, many different AI libraries will have to be configured @@ -55,8 +56,7 @@ To install a specific branch from GitHub: Example: Speech to Text (ASR) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Everything needed to train Convolutional ASR models with NeMo, PyTorch Lightning, and Hydra is -included with NeMo. +Everything needed to train Convolutional ASR models is included with NeMo. Configurations are in .yaml files included with NeMo/examples @@ -85,19 +85,22 @@ Configurations are in .yaml files included with NeMo/examples stride: [1] dilation: [1] dropout: *dropout - # an all other configuration, data, optimizer, etc - - + ... + # all other configuration, data, optimizer, etc + ... .. code-block:: python - trainer = Trainer(**cfg.trainer) - asr_model = EncDecCTCModel(cfg.model, trainer) - trainer.fit(asr_model) + @hydra.main(config_name="config") + def main(cfg): + trainer = Trainer(**cfg.trainer) + asr_model = EncDecCTCModel(cfg.model, trainer) + trainer.fit(asr_model) .. note:: NeMo models and PyTorch Lightning Trainer can be fully configured from .yaml files using Hydra. -Training NeMo models with PyTorch Lightning and Hydra is simple from the command line. +Hydra makes it so that every aspect of the NeMo model, +including the PyTorch Lightning Trainer can be modified from the command line. .. code-block:: bash @@ -130,15 +133,4 @@ Transcribe audio with QuartzNet pretrained on 7000+ hours of audio. files = ['path/to/my.wav'] # file should be less than 25 seconds for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)): - print(f"Audio in {fname} was recognized as: {transcription}") - -Example: Voice Activity Detection (VAD) -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -Train a MatchboxNet model with a modified decoder head for recognizing speakers. - -.. 
code-block:: python - - trainer = Trainer(**cfg.trainer) - speaker_model = EncDecSpeakerLabelModel(cfg=cfg.model, trainer=trainer) - trainer.fit(speaker_model) \ No newline at end of file + print(f"Audio in {fname} was recognized as: {transcription}") \ No newline at end of file From e79dad5bbba4cd48ac0dbba9cdd0555c8b252f18 Mon Sep 17 00:00:00 2001 From: ericharper Date: Fri, 11 Sep 2020 15:49:49 -0600 Subject: [PATCH 07/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index b4c3e029a5ba2..6e57ac4a40a6b 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -56,7 +56,18 @@ To install a specific branch from GitHub: Example: Speech to Text (ASR) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Everything needed to train Convolutional ASR models is included with NeMo. +Everything needed to train Convolutional ASR models is included with NeMo. +NeMo supports multiple Speech Recognition architectures, including Jasper +and QuartzNet. These models can be trained from scratch on custom datasets or +pretrained checkpoints trained on thousands of hour of audio can be restored for +immediate use. + +Some typical ASR tasks are included with NeMo: + +- Audio transcription +- Speech Commands +- Voice Activity Detection +- Byte Pair/Word Piece Training Configurations are in .yaml files included with NeMo/examples @@ -72,7 +83,7 @@ Configurations are in .yaml files included with NeMo/examples ... # configure the ASR model encoder: - cls: nemo.collections.asr.modules.ConvASREncoder + _target_: nemo.collections.asr.modules.ConvASREncoder params: feat_in: *n_mels activation: relu From 1f15741780e54827f8cd4a3dd090e8e4a3e167c4 Mon Sep 17 00:00:00 2001 From: ericharper Date: Fri, 11 Sep 2020 16:00:00 -0600 Subject: [PATCH 08/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 69 ++++++++++++++++++++++++++++++- 1 file changed, 68 insertions(+), 1 deletion(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index 6e57ac4a40a6b..a561743ab5f70 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -100,6 +100,8 @@ Configurations are in .yaml files included with NeMo/examples # all other configuration, data, optimizer, etc ... +The example speech-to-text script is just: + .. code-block:: python @hydra.main(config_name="config") @@ -144,4 +146,69 @@ Transcribe audio with QuartzNet pretrained on 7000+ hours of audio. files = ['path/to/my.wav'] # file should be less than 25 seconds for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)): - print(f"Audio in {fname} was recognized as: {transcription}") \ No newline at end of file + print(f"Audio in {fname} was recognized as: {transcription}") + +Any aspect of ASR training or model architecture design can easily be customized +since every NeMo model is a Lightning Module. + +.. code-block:: python + + class EncDecCTCModel(ASRModel): + """Base class for encoder decoder CTC-based models.""" + ... 
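        # forward() below chains the NeMo neural modules together:
        # raw audio -> preprocessor (mel features) -> optional SpecAugment (training only)
        # -> convolutional encoder -> CTC decoder log-probs -> greedy predictions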
+ @typecheck() + def forward(self, input_signal, input_signal_length): + processed_signal, processed_signal_len = self.preprocessor( + input_signal=input_signal, length=input_signal_length, + ) + # Spec augment is not applied during evaluation/testing + if self.spec_augmentation is not None and self.training: + processed_signal = self.spec_augmentation(input_spec=processed_signal) + encoded, encoded_len = self.encoder(audio_signal=processed_signal, length=processed_signal_len) + log_probs = self.decoder(encoder_output=encoded) + greedy_predictions = log_probs.argmax(dim=-1, keepdim=False) + return log_probs, encoded_len, greedy_predictions + + # PTL-specific methods + def training_step(self, batch, batch_nb): + audio_signal, audio_signal_len, transcript, transcript_len = batch + log_probs, encoded_len, predictions = self.forward( + input_signal=audio_signal, input_signal_length=audio_signal_len + ) + loss_value = self.loss( + log_probs=log_probs, targets=transcript, input_lengths=encoded_len, target_lengths=transcript_len + ) + wer_num, wer_denom = self._wer(predictions, transcript, transcript_len) + tensorboard_logs = { + 'train_loss': loss_value, + 'training_batch_wer': wer_num / wer_denom, + 'learning_rate': self._optimizer.param_groups[0]['lr'], + } + return {'loss': loss_value, 'log': tensorboard_logs} + +Additionally, NeMo Models and Neural Modules come with Neural Type checking. +Neural type checking is extremely use when combining many different neural +network architectures for a production grade application. + +.. code-block:: python + + @property + def input_types(self) -> Optional[Dict[str, NeuralType]]: + if hasattr(self.preprocessor, '_sample_rate'): + audio_eltype = AudioSignal(freq=self.preprocessor._sample_rate) + else: + audio_eltype = AudioSignal() + return { + "input_signal": NeuralType(('B', 'T'), audio_eltype), + "input_signal_length": NeuralType(tuple('B'), LengthsType()), + } + + @property + def output_types(self) -> Optional[Dict[str, NeuralType]]: + return { + "outputs": NeuralType(('B', 'T', 'D'), LogprobsType()), + "encoded_lengths": NeuralType(tuple('B'), LengthsType()), + "greedy_predictions": NeuralType(('B', 'T'), LabelsType()), + } + + From 5f95d3933e1c2bf81cae9500bf187a3e35acde0c Mon Sep 17 00:00:00 2001 From: ericharper Date: Fri, 11 Sep 2020 16:07:04 -0600 Subject: [PATCH 09/51] updated Signed-off-by: ericharper --- docs/source/index.rst | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/docs/source/index.rst b/docs/source/index.rst index c683f44a431e6..50a6fe2c06cc9 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -6,6 +6,13 @@ PyTorch Lightning Documentation =============================== +.. toctree:: + :maxdepth: 1 + :name: conversational_ai + :caption: Conversational AI + + conversational_ai + .. toctree:: :maxdepth: 1 :name: start From c71ced7dca51da4eade48bc42cbe505914d95cd6 Mon Sep 17 00:00:00 2001 From: ericharper Date: Mon, 14 Sep 2020 11:41:06 -0600 Subject: [PATCH 10/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 21 ++++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index a561743ab5f70..cc14afa65b9a6 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -1,21 +1,24 @@ Conversational AI ----------------- -Using NeMo Models -^^^^^^^^^^^^^^^^^ - -NeMo is a toolkit for doing research in Conversational AI. 
NeMo makes it easy to build complex -automatic speech recognition (ASR), natural language processing (NLP), and text-to-speech (TTS) +NVIDIA NeMo Models +^^^^^^^^^^^^^^^^^^ + +`NVIDIA NeMo `_ is a toolkit for building +Conversational AI applications. NeMo has separate collections for Automatic Speech Recognition (ASR), +Natural Language Processing (NLP), and Text-to-Speech (TTS) models. Each collection consists of +prebuilt modules that include everything needed to train on your own data. +Every module can easily be customized, extended, and composed to create complex Conversational AI applications. Conversational AI architectures are typically very large and require a lot of data and compute -for training. NeMo uses PyTorch Lightning for for easy and performant multi-gpu/multi-node +for training. NeMo uses PyTorch Lightning for easy and performant multi-GPU/multi-node mixed-precision training. .. note:: Every NeMo model is a LightningModule that comes equipped with all supporting infrastructure for training and reproducibility. -NeMo Models contain everything needed to to train and reproduce state of the art Conversational AI -research and applications. This includes +NeMo Models contain everything needed to train and reproduce state of the art Conversational AI +research and applications, including: - neural network architectures - datasets/dataloaders @@ -25,7 +28,7 @@ research and applications. This includes - tokenizers - language models -NeMo uses Hydra for configuring both NeMo models and the PyTorch Lightning Trainer. +NeMo uses `Hydra `_ for configuring both NeMo models and the PyTorch Lightning Trainer. Depending on the domain and application, many different AI libraries will have to be configured to build the application. Hydra makes it easy to bring all of these libraries together and do all the configuration from .yaml or the Hydra CLI. From c5980c750008f58ae7fc1ed5bbc2bcad14b8db28 Mon Sep 17 00:00:00 2001 From: ericharper Date: Mon, 14 Sep 2020 12:01:18 -0600 Subject: [PATCH 11/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index cc14afa65b9a6..9aa5abcf8a637 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -31,12 +31,12 @@ research and applications, including: NeMo uses `Hydra `_ for configuring both NeMo models and the PyTorch Lightning Trainer. Depending on the domain and application, many different AI libraries will have to be configured to build the application. Hydra makes it easy to bring all of these libraries together -and do all the configuration from .yaml or the Hydra CLI. +so that each can be configured from .yaml or the Hydra CLI. -.. note:: Every NeMo model has an example configuration and run script that contains all configuration needed for training. +.. note:: Every NeMo model has an example configuration file and run script that contains all configurations needed for training. The end result of using NeMo, Pytorch Lightning, and Hydra is that -NeMo models all have the same look and feel so that is easy to do Conversational AI research +NeMo models all have the same look and feel so that it is easy to do Conversational AI research across multiple domains and all NeMo models are fully compatible with the PyTorch ecosystem. Installing NeMo @@ -56,6 +56,19 @@ To install a specific branch from GitHub: .. 
note:: Replace {BRANCH} with the specific branch name from GitHub. +For Docker users, the NeMo container is available on +`NGC `_ + +.. code-block:: bash + + # TODO: update container tag when available + docker pull nvcr.io/nvidia/nemo:v0.11 + + +.. code-block:: bash + + docker run --runtime=nvidia -it --rm -v --shm-size=16g -p 8888:8888 -p 6006:6006 --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/nemo:v0.11 + Example: Speech to Text (ASR) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ From 3d684fd84e604552b786908b1f4b715e2580863a Mon Sep 17 00:00:00 2001 From: ericharper Date: Mon, 14 Sep 2020 16:04:57 -0600 Subject: [PATCH 12/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 58 +++++++++++++++++++++++-------- 1 file changed, 44 insertions(+), 14 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index 9aa5abcf8a637..560ea24e55f39 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -69,23 +69,30 @@ For Docker users, the NeMo container is available on docker run --runtime=nvidia -it --rm -v --shm-size=16g -p 8888:8888 -p 6006:6006 --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/nemo:v0.11 -Example: Speech to Text (ASR) -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Automatic Speech Recognition (ASR) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Everything needed to train Convolutional ASR models is included with NeMo. NeMo supports multiple Speech Recognition architectures, including Jasper and QuartzNet. These models can be trained from scratch on custom datasets or -pretrained checkpoints trained on thousands of hour of audio can be restored for +pretrained checkpoints trained on thousands of hours of audio that can be restored for immediate use. Some typical ASR tasks are included with NeMo: -- Audio transcription -- Speech Commands -- Voice Activity Detection -- Byte Pair/Word Piece Training +- `Audio transcription `_ +- `Byte Pair/Word Piece Training `_ +- `Speech Commands `_ +- `Voice Activity Detection `_ +- `Speaker Recognition `_ -Configurations are in .yaml files included with NeMo/examples +Specify Model Configurations with YAML File +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +NeMo Models and the PyTorch Lightning Trainer can be fully configured from .yaml files using Hydra. + +See `here `_ +for the entire speech to text .yaml file. .. code-block:: yaml @@ -116,7 +123,10 @@ Configurations are in .yaml files included with NeMo/examples # all other configuration, data, optimizer, etc ... -The example speech-to-text script is just: +Developing ASR Model From Scratch +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +`speech_to_text.py `_ .. code-block:: python @@ -126,10 +136,9 @@ The example speech-to-text script is just: asr_model = EncDecCTCModel(cfg.model, trainer) trainer.fit(asr_model) -.. note:: NeMo models and PyTorch Lightning Trainer can be fully configured from .yaml files using Hydra. -Hydra makes it so that every aspect of the NeMo model, -including the PyTorch Lightning Trainer can be modified from the command line. +Hydra makes every aspect of the NeMo model, +including the PyTorch Lightning Trainer, customizable from the command line. .. code-block:: bash @@ -146,14 +155,35 @@ including the PyTorch Lightning Trainer can be modified from the command line. .. note:: Training NeMo ASR models can take days/weeks so it is highly recommended to use multiple GPUs and multiple nodes with the PyTorch Lightning Trainer. 
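The ``trainer`` section of the config maps directly onto PyTorch Lightning Trainer arguments, so scaling out is just a matter of changing those values. A minimal sketch (flag names assume a 2020-era PyTorch Lightning release, matching the ``distributed_backend`` key used in the example configs):

.. code-block:: python

    import pytorch_lightning as pl

    # equivalent to setting these keys under `trainer:` in the .yaml config
    trainer = pl.Trainer(
        gpus=4,                     # GPUs per node
        num_nodes=2,                # number of nodes in the job
        distributed_backend="ddp",  # same value as in the example configs
        precision=16,               # mixed-precision training
        max_epochs=128,
    )
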
-Optionally launch Tensorboard to view training results +NeMo Experiment Manager +^^^^^^^^^^^^^^^^^^^^^^^ + +NeMo's Experiment Manager leverages PyTorch Lightning for model checkpointing, +TensorBoard Logging and Weights and Biases logging. The Experiment Manager is included by default +in all NeMo example scripts. + +.. code-block:: python + + exp_manager(trainer, cfg.get("exp_manager", None)) + +And is configurable via .yaml with Hydra. + +.. code-block:: bash + + exp_manager: + exp_dir: null + name: *name + create_tensorboard_logger: True + create_checkpoint_callback: True + +Optionally launch Tensorboard to view training results in ./nemo_experiments (by default). .. code-block:: bash tensorboard --bind_all --logdir nemo_experiments -Transcribe audio with QuartzNet pretrained on 7000+ hours of audio. +Transcribe audio with QuartzNet pretrained on ~3300 hours of audio. .. code-block:: python From 492836e51bcfacd1c9f7a2e30ce97b98df3554fe Mon Sep 17 00:00:00 2001 From: ericharper Date: Mon, 14 Sep 2020 19:08:21 -0600 Subject: [PATCH 13/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index 560ea24e55f39..b35221c6fd61a 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -182,8 +182,10 @@ Optionally launch Tensorboard to view training results in ./nemo_experiments (by tensorboard --bind_all --logdir nemo_experiments +Using State-Of-The-Art Pre-trained Model +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Transcribe audio with QuartzNet pretrained on ~3300 hours of audio. +Transcribe audio with QuartzNet model pretrained on ~3300 hours of audio. .. code-block:: python @@ -194,8 +196,11 @@ Transcribe audio with QuartzNet pretrained on ~3300 hours of audio. for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)): print(f"Audio in {fname} was recognized as: {transcription}") +NeMo Model Under the Hood +^^^^^^^^^^^^^^^^^^^^^^^^^ + Any aspect of ASR training or model architecture design can easily be customized -since every NeMo model is a Lightning Module. +with PyTorch Lightning since every NeMo model is a Lightning Module. .. code-block:: python @@ -232,9 +237,12 @@ since every NeMo model is a Lightning Module. } return {'loss': loss_value, 'log': tensorboard_logs} +Neural Types in NeMo +^^^^^^^^^^^^^^^^^^^^ + Additionally, NeMo Models and Neural Modules come with Neural Type checking. -Neural type checking is extremely use when combining many different neural -network architectures for a production grade application. +Neural type checking is extremely useful when combining many different neural +network architectures for a production-grade application. .. code-block:: python @@ -256,5 +264,3 @@ network architectures for a production grade application. 
"encoded_lengths": NeuralType(tuple('B'), LengthsType()), "greedy_predictions": NeuralType(('B', 'T'), LabelsType()), } - - From 03b6346104210e1ac62230773b11acb13a53329b Mon Sep 17 00:00:00 2001 From: ericharper Date: Tue, 15 Sep 2020 09:28:39 -0600 Subject: [PATCH 14/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 47 +++++++++++++++++++++++++++++-- 1 file changed, 44 insertions(+), 3 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index b35221c6fd61a..e260385b5a293 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -52,7 +52,7 @@ To install a specific branch from GitHub: .. code-block:: bash - python -m pip install git+https://github.com/NVIDIA/NeMo.git@{BRANCH}#egg=nemo_toolkit[nlp] + python -m pip install git+https://github.com/NVIDIA/NeMo.git@{BRANCH}#egg=nemo_toolkit[all] .. note:: Replace {BRANCH} with the specific branch name from GitHub. @@ -158,8 +158,8 @@ including the PyTorch Lightning Trainer, customizable from the command line. NeMo Experiment Manager ^^^^^^^^^^^^^^^^^^^^^^^ -NeMo's Experiment Manager leverages PyTorch Lightning for model checkpointing, -TensorBoard Logging and Weights and Biases logging. The Experiment Manager is included by default +The Experiment Manager leverages PyTorch Lightning for model checkpointing, +TensorBoard Logging, and Weights and Biases logging. The Experiment Manager is included by default in all NeMo example scripts. .. code-block:: python @@ -264,3 +264,44 @@ network architectures for a production-grade application. "encoded_lengths": NeuralType(tuple('B'), LengthsType()), "greedy_predictions": NeuralType(('B', 'T'), LabelsType()), } + +Natural Language Processing (NLP) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Everything needed to train BERT based NLP models is included with NeMo. +NeMo supports language models from `HuggingFace Transformers `_ +and model parallel architectures from `NVIDIA Megatron-LM `_. + +With NeMo, any of the HuggingFace encoders or Megatron-LM encoders can easily be used for the NLP tasks +that are included with NeMo: + +- `Glue Benchmark (All tasks) `_ +- Intent Slot Classification +- `Language Modeling (BERT Pretraining) `_ +- `Question Answering `_ +- Text Classification (including Sentiment Analysis) +- Token Classifcation +- `Punctuation and Capitalization `_ + + +Tokenizers +^^^^^^^^^^ + +Tokenization is the process of converting natural langauge text into integer arrays +which can be used for machine learning. +For NLP tasks, tokenization is an essential part of data preprocessing. +NeMo supports all BERT-like model tokenizers from +`HuggingFace's AutoTokenizer `_ +and also supports `Google's SentencePieceTokenizer `_ +which can be trained on custom data. + +To see the list of supported tokenizers: + +..code-block:: python + + from nemo.collections import nlp as nemo_nlp + + nemo_nlp.modules.get_tokenizer_list() + +See `here `_ +for a full tutorial on using tokenizers in NeMO. 
\ No newline at end of file From 7719148433552e20ccf2c77b68d819ea16bb1e3f Mon Sep 17 00:00:00 2001 From: ericharper Date: Tue, 15 Sep 2020 09:38:42 -0600 Subject: [PATCH 15/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index e260385b5a293..8dae8334d17b3 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -276,13 +276,15 @@ With NeMo, any of the HuggingFace encoders or Megatron-LM encoders can easily be that are included with NeMo: - `Glue Benchmark (All tasks) `_ -- Intent Slot Classification +- `Intent Slot Classification `_ - `Language Modeling (BERT Pretraining) `_ - `Question Answering `_ -- Text Classification (including Sentiment Analysis) -- Token Classifcation +- `Text Classification `_ (including Sentiment Analysis) +- `Token Classifcation `_ (including Named Entity Recognition) - `Punctuation and Capitalization `_ +Named Entity Recognition +^^^^^^^^^^^^^^^^^^^^^^^^ Tokenizers ^^^^^^^^^^ From bead0cc8199e9909412850015855b5135b11dde8 Mon Sep 17 00:00:00 2001 From: ericharper Date: Tue, 15 Sep 2020 10:18:51 -0600 Subject: [PATCH 16/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 159 +++++++++++++++++++++++++++++- 1 file changed, 154 insertions(+), 5 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index 8dae8334d17b3..5e9febd8e5715 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -237,8 +237,8 @@ with PyTorch Lightning since every NeMo model is a Lightning Module. } return {'loss': loss_value, 'log': tensorboard_logs} -Neural Types in NeMo -^^^^^^^^^^^^^^^^^^^^ +Neural Types in NeMo ASR +^^^^^^^^^^^^^^^^^^^^^^^^ Additionally, NeMo Models and Neural Modules come with Neural Type checking. Neural type checking is extremely useful when combining many different neural @@ -283,8 +283,106 @@ that are included with NeMo: - `Token Classifcation `_ (including Named Entity Recognition) - `Punctuation and Capitalization `_ -Named Entity Recognition -^^^^^^^^^^^^^^^^^^^^^^^^ +Named Entity Recognition (NER) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +NER (or more generally token classifcation) is the NLP task of detecting and classifying key information (entities) in text. +This task is very popular in Healthcare and Finance. In finance, for example, it can be important to identify +geographical, geopolitical, organizational, persons, events, and natural phenomenon entities. + +Specify NER Model Configurations with YAML File +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +..note NeMo Models and the PyTorch Lightning Trainer can be fully configured from .yaml files using Hydra. + +See `here `_ +for the entire NER (token classification) .yaml file. + +.. code-block:: yaml + + # configure any argument of the PyTorch Lightning Trainer + trainer: + gpus: 1 # the number of gpus, 0 for CPU + num_nodes: 1 + max_epochs: 5 + ... + # configure any aspect of the token classification model here + model: + dataset: + data_dir: ??? # /path/to/data + class_balancing: null # choose from [null, weighted_loss]. Weighted_loss enables the weighted class balancing of the loss, may be used for handling unbalanced classes + max_seq_length: 128 + ... + tokenizer: + tokenizer_name: ${model.language_model.pretrained_model_name} # or sentencepiece + vocab_file: null # path to vocab file + ... 
+ # the language model can be from HuggingFace or Megatron-LM + language_model: + pretrained_model_name: bert-base-uncased + lm_checkpoint: null + ... + # the classifier for the downstream task + head: + num_fc_layers: 2 + fc_dropout: 0.5 + activation: 'relu' + ... + # all other configuration: train/val/test/ data, optimizer, experiment manager, etc + ... + +Developing NER Model From Scratch +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +`token_classification.py `_ + +.. code-block:: python + + @hydra.main(config_path="conf", config_name="token_classification_config") + def main(cfg: DictConfig) -> None: + trainer = pl.Trainer(**cfg.trainer) + model = TokenClassificationModel(cfg.model, trainer=trainer) + trainer.fit(model) + +After training, we can do inference with the saved NER model using PyTorch Lightning. + +Inference from file: + +.. code-block:: python + + gpu = 1 if cfg.trainer.gpus != 0 else 0 + trainer = pl.Trainer(gpus=gpu) + model.set_trainer(trainer) + model.evaluate_from_file( + text_file=os.path.join(cfg.model.dataset.data_dir, cfg.model.validation_ds.text_file), + labels_file=os.path.join(cfg.model.dataset.data_dir, cfg.model.validation_ds.labels_file), + output_dir=exp_dir, + add_confusion_matrix=True, + normalize_confusion_matrix=True, + ) + +Or we can run inference on a few examples: + +..code-block:: python + + queries = ['we bought four shirts from the nvidia gear store in santa clara.', 'Nvidia is a company in Santa Clara.'] + results = model.add_predictions(queries) + + for query, result in zip(queries, results): + logging.info(f'Query : {query}') + logging.info(f'Result: {result.strip()}\n') + +Hydra makes every aspect of the NeMo model, including the PyTorch Lightning Trainer, customizable from the command line. + +.. code-block:: bash + + python token_classification.py \ + model.language_model.pretrained_model_name=bert-base-cased \ + model.head.num_fc_layers=2 \ + model.dataset.data_dir=/path/to/my/data \ + trainer.max_epochs=5 \ + trainer.gpus=[0,1] + Tokenizers ^^^^^^^^^^ @@ -306,4 +404,55 @@ To see the list of supported tokenizers: nemo_nlp.modules.get_tokenizer_list() See `here `_ -for a full tutorial on using tokenizers in NeMO. \ No newline at end of file +for a full tutorial on using tokenizers in NeMO. + +NeMo NER Model Under the Hood +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Any aspect of NLP training or model architecture design can easily be customized with PyTorch Lightning +since every NeMo model is a Lightning Module. + +.. code-block:: python + + class TokenClassificationModel(ModelPT): + """ + Token Classification Model with BERT, applicable for tasks such as Named Entity Recognition + """ + ... + @typecheck() + def forward(self, input_ids, token_type_ids, attention_mask): + hidden_states = self.bert_model( + input_ids=input_ids, token_type_ids=token_type_ids, attention_mask=attention_mask + ) + logits = self.classifier(hidden_states=hidden_states) + return logits + + def training_step(self, batch, batch_idx): + """ + Lightning calls this inside the training loop with the data from the training dataloader + passed in as `batch`. + """ + input_ids, input_type_ids, input_mask, subtokens_mask, loss_mask, labels = batch + logits = self(input_ids=input_ids, token_type_ids=input_type_ids, attention_mask=input_mask) + + loss = self.loss(logits=logits, labels=labels, loss_mask=loss_mask) + tensorboard_logs = {'train_loss': loss, 'lr': self._optimizer.param_groups[0]['lr']} + return {'loss': loss, 'log': tensorboard_logs} + ... 
+ +Neural Types in NeMo NLP +^^^^^^^^^^^^^^^^^^^^^^^^ + +Additionally, NeMo Models and Neural Modules come with Neural Type checking. +Neural type checking is extremely useful when combining many different neural network architectures +for a production-grade application. + +.. code-block:: python + + @property + def input_types(self) -> Optional[Dict[str, NeuralType]]: + return self.bert_model.input_types + + @property + def output_types(self) -> Optional[Dict[str, NeuralType]]: + return self.classifier.output_types \ No newline at end of file From ef533f99514ec99daf3a082db5ca515cd44d3bc6 Mon Sep 17 00:00:00 2001 From: ericharper Date: Tue, 15 Sep 2020 10:54:30 -0600 Subject: [PATCH 17/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index 5e9febd8e5715..0275a75297fa6 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -363,7 +363,7 @@ Inference from file: Or we can run inference on a few examples: -..code-block:: python +.. code-block:: python queries = ['we bought four shirts from the nvidia gear store in santa clara.', 'Nvidia is a company in Santa Clara.'] results = model.add_predictions(queries) @@ -397,7 +397,7 @@ which can be trained on custom data. To see the list of supported tokenizers: -..code-block:: python +.. code-block:: python from nemo.collections import nlp as nemo_nlp From c75727bacb7880e0afce5c49de10fc01f987c3c7 Mon Sep 17 00:00:00 2001 From: ericharper Date: Tue, 15 Sep 2020 16:28:28 -0600 Subject: [PATCH 18/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index 0275a75297fa6..6078ca03f2da4 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -1,5 +1,5 @@ Conversational AI ------------------ +================= NVIDIA NeMo Models ^^^^^^^^^^^^^^^^^^ @@ -146,7 +146,6 @@ including the PyTorch Lightning Trainer, customizable from the command line. trainer.gpus=4 \ trainer.max_epochs=128 \ +trainer.precision=16 \ - +trainer.amp_level=O1 \ model.train_ds.manifest_filepath=/librispeech-train-all.json \ model.validation_ds.manifest_filepath=/librispeech-dev-other.json \ model.train_ds.batch_size=64 \ @@ -455,4 +454,8 @@ for a production-grade application. 
@property def output_types(self) -> Optional[Dict[str, NeuralType]]: - return self.classifier.output_types \ No newline at end of file + return self.classifier.output_types + +Text-To-Speech (TTS) +^^^^^^^^^^^^^^^^^^^^ + From dba591154b22396cb54e8670863eea8207f8f8f0 Mon Sep 17 00:00:00 2001 From: ericharper Date: Wed, 16 Sep 2020 12:39:31 -0600 Subject: [PATCH 19/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 155 ++++++++++++++++++++++++++---- 1 file changed, 136 insertions(+), 19 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index 6078ca03f2da4..582d40d5013fa 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -86,8 +86,8 @@ Some typical ASR tasks are included with NeMo: - `Voice Activity Detection `_ - `Speaker Recognition `_ -Specify Model Configurations with YAML File -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Specify ASR Model Configurations with YAML File +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ NeMo Models and the PyTorch Lightning Trainer can be fully configured from .yaml files using Hydra. @@ -105,23 +105,25 @@ for the entire speech to text .yaml file. distributed_backend: ddp ... # configure the ASR model - encoder: - _target_: nemo.collections.asr.modules.ConvASREncoder - params: - feat_in: *n_mels - activation: relu - conv_mask: true - - jasper: - - filters: 128 - repeat: 1 - kernel: [11] - stride: [1] - dilation: [1] - dropout: *dropout - ... - # all other configuration, data, optimizer, etc - ... + model: + ... + encoder: + _target_: nemo.collections.asr.modules.ConvASREncoder + params: + feat_in: *n_mels + activation: relu + conv_mask: true + + jasper: + - filters: 128 + repeat: 1 + kernel: [11] + stride: [1] + dilation: [1] + dropout: *dropout + ... + # all other configuration, data, optimizer, preprocessor, etc + ... Developing ASR Model From Scratch ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -459,3 +461,118 @@ for a production-grade application. Text-To-Speech (TTS) ^^^^^^^^^^^^^^^^^^^^ +Everything needed to train TTS models and generate audio is included with NeMo. +Models can be trained from scratch on your own data or pretrained models can be downloaded +automatically. NeMo currently supports: + +Mel Spectogram Generators: + +- `Tacotron 2 `_ +- `Glow-TTS `_ + +Audio Generators: + +- Griffin-Lim +- `WaveGlow `_ +- `SqueezeWave `_ + +Specify TTS Model Configurations with YAML File +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +..note NeMo Models and PyTorch Lightning Trainer can be fully configured from .yaml files using Hydra. + +`glow_tts.yaml `_ + +.. code-block:: yaml + + # configure the PyTorch Lightning Trainer + trainer: + gpus: -1 # number of gpus + max_epochs: 350 + num_nodes: 1 + distributed_backend: ddp + ... + + # configure the TTS model + model: + ... + encoder: + _target_: nemo.collections.tts.modules.glow_tts.TextEncoder + params: + n_vocab: 148 + out_channels: *n_mels + hidden_channels: 192 + filter_channels: 768 + filter_channels_dp: 256 + ... + # all other configuration, data, optimizer, parser, preprocessor, etc + ... + +Developing TTS Model From Scratch +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +`glow_tts.py `_ + +.. 
code-block:: python + + @hydra.main(config_path="conf", config_name="glow_tts") + def main(cfg): + trainer = pl.Trainer(**cfg.trainer) + model = GlowTTSModel(cfg=cfg.model, trainer=trainer) + trainer.fit(model) + +Hydra makes every aspect of the NeMo model, including the PyTorch Lightning Trainer, customizable from the command line. + +.. code-block:: bash + + python NeMo/examples/tts/glow_tts.py \ + trainer.gpus=4 \ + trainer.max_epochs=400 \ + ... + train_dataset=/path/to/train/data \ + validation_datasets=/path/to/val/data \ + model.train_ds.batch_size = 64 \ + +..note Training NeMo TTTs models from scratch take days/weeks so it is highly recommended to use multiple GPUs and multiple nodes with the PyTorch Lightning Trainer. + +Using State-Of-The-Art Pre-trained TTS Model +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Generate speech using models trained on `LJSpeech `, +around 24 hours of single speaker data. + +.. code-block:: python + + # load pretrained spectrogram model + spec_gen = SpecModel.from_pretrained('GlowTTS-22050Hz').cuda() + + # load pretrained Generators + vocoder = WaveGlowModel.from_pretrained('WaveGlow-22050Hz').cuda() + + def infer(spec_gen_model, vocder_model, str_input): + with torch.no_grad(): + parsed = spec_gen.parse(text_to_generate) + spectrogram = spec_gen.generate_spectrogram(tokens=parsed) + audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram) + if isinstance(spectrogram, torch.Tensor): + spectrogram = spectrogram.to('cpu').numpy() + if len(spectrogram.shape) == 3: + spectrogram = spectrogram[0] + if isinstance(audio, torch.Tensor): + audio = audio.to('cpu').numpy() + return spectrogram, audio + + text_to_generate = input("Input what you want the model to say: ") + spec, audio = infer(spec_gen, vocoder, text_to_generate) + +NeMo TTS Model Under the Hood +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + + + + + + + + From 4bcefcf1d333d5ae88ce8c99f1369c755cdb29bd Mon Sep 17 00:00:00 2001 From: ericharper Date: Wed, 16 Sep 2020 14:27:09 -0600 Subject: [PATCH 20/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 114 +++++++++++++++++++++++++++++- 1 file changed, 112 insertions(+), 2 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index 582d40d5013fa..2cfb0c9fbfeee 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -86,6 +86,9 @@ Some typical ASR tasks are included with NeMo: - `Voice Activity Detection `_ - `Speaker Recognition `_ +See `here `_ +for a full tutorial on doing ASR with NeMo, PyTorch Lightning, and Hydra. + Specify ASR Model Configurations with YAML File ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -183,8 +186,8 @@ Optionally launch Tensorboard to view training results in ./nemo_experiments (by tensorboard --bind_all --logdir nemo_experiments -Using State-Of-The-Art Pre-trained Model -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Using State-Of-The-Art Pre-trained ASR Model +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Transcribe audio with QuartzNet model pretrained on ~3300 hours of audio. @@ -290,6 +293,8 @@ Named Entity Recognition (NER) NER (or more generally token classifcation) is the NLP task of detecting and classifying key information (entities) in text. This task is very popular in Healthcare and Finance. In finance, for example, it can be important to identify geographical, geopolitical, organizational, persons, events, and natural phenomenon entities. 
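For intuition, token classification assigns one tag per input token; a purely illustrative example (the actual tag set depends on the training data) might look like:

.. code-block:: python

    # hypothetical IOB-style tags for a single query
    tokens = ["Mark", "Twain", "lived", "in", "Hartford", ",", "Connecticut", "."]
    tags   = ["B-PER", "I-PER", "O",    "O",  "B-LOC",    "O", "B-LOC",       "O"]
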
+See `here `_ +for a full tutorial on doing NER with NeMo, PyTorch Lightning, and Hydra. Specify NER Model Configurations with YAML File ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -407,6 +412,35 @@ To see the list of supported tokenizers: See `here `_ for a full tutorial on using tokenizers in NeMO. + +Using a Pre-trained NER Model +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +NeMo has pre-trained NER models that can be used to get +to get started with Token Classification right away. +Models are automatically downloaded from NGC, +cached locally to disk, +and loaded into GPU memory using the `.from_pretrained` method. + +.. code-block:: python + + # load pre-trained NER model + pretrained_ner_model = TokenClassificationModel.from_pretrained(model_name="NERModel") + + # define the list of queries for inference + queries = [ + 'we bought four shirts from the nvidia gear store in santa clara.', + 'Nvidia is a company.', + 'The Adventures of Tom Sawyer by Mark Twain is an 1876 novel about a young boy growing ' + + 'up along the Mississippi River.', + ] + results = pretrained_ner_model.add_predictions(queries) + + for query, result in zip(queries, results): + print() + print(f'Query : {query}') + print(f'Result: {result.strip()}\n') + NeMo NER Model Under the Hood ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -428,6 +462,7 @@ since every NeMo model is a Lightning Module. logits = self.classifier(hidden_states=hidden_states) return logits + # PTL-specfic methods def training_step(self, batch, batch_idx): """ Lightning calls this inside the training loop with the data from the training dataloader @@ -476,6 +511,7 @@ Audio Generators: - `WaveGlow `_ - `SqueezeWave `_ + Specify TTS Model Configurations with YAML File ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -541,6 +577,9 @@ Using State-Of-The-Art Pre-trained TTS Model Generate speech using models trained on `LJSpeech `, around 24 hours of single speaker data. +See `here `_ +for a full tutorial on generating speech with NeMo, PyTorch Lightning, and Hydra. + .. code-block:: python # load pretrained spectrogram model @@ -568,7 +607,78 @@ around 24 hours of single speaker data. NeMo TTS Model Under the Hood ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Any aspect of TTS training or model architecture design can easily +be customized with PyTorch Lightning since every NeMo model is a LightningModule. + +`glow_tts.py `_ + +.. code-block:: python + +class GlowTTSModel(SpectrogramGenerator): + """ + GlowTTS model used to generate spectrograms from text + Consists of a text encoder and an invertible spectrogram decoder + """ + ... + # NeMo models come with neural type checking + @typecheck( + input_types={ + "x": NeuralType(('B', 'T'), TokenIndex()), + "x_lengths": NeuralType(('B'), LengthsType()), + "y": NeuralType(('B', 'D', 'T'), MelSpectrogramType(), optional=True), + "y_lengths": NeuralType(('B'), LengthsType(), optional=True), + "gen": NeuralType(optional=True), + "noise_scale": NeuralType(optional=True), + "length_scale": NeuralType(optional=True), + } + ) + def forward(self, *, x, x_lengths, y=None, y_lengths=None, gen=False, noise_scale=0.3, length_scale=1.0): + if gen: + return self.glow_tts.generate_spect( + text=x, text_lengths=x_lengths, noise_scale=noise_scale, length_scale=length_scale + ) + else: + return self.glow_tts(text=x, text_lengths=x_lengths, spect=y, spect_lengths=y_lengths) + ... 
+ def step(self, y, y_lengths, x, x_lengths): + z, y_m, y_logs, logdet, logw, logw_, y_lengths, attn = self( + x=x, x_lengths=x_lengths, y=y, y_lengths=y_lengths, gen=False + ) + + l_mle, l_length, logdet = self.loss( + z=z, + y_m=y_m, + y_logs=y_logs, + logdet=logdet, + logw=logw, + logw_=logw_, + x_lengths=x_lengths, + y_lengths=y_lengths, + ) + + loss = sum([l_mle, l_length]) + + return l_mle, l_length, logdet, loss, attn + + # PTL-specfic methods + def training_step(self, batch, batch_idx): + y, y_lengths, x, x_lengths = batch + + y, y_lengths = self.preprocessor(input_signal=y, length=y_lengths) + + l_mle, l_length, logdet, loss, _ = self.step(y, y_lengths, x, x_lengths) + + output = { + "loss": loss, # required + "progress_bar": {"l_mle": l_mle, "l_length": l_length, "logdet": logdet}, + "log": {"loss": loss, "l_mle": l_mle, "l_length": l_length, "logdet": logdet}, + } + + return output + ... + Neural Types in NeMo TTS + ^^^^^^^^^^^^^^^^^^^^^^^^ From c781eca47a65225ec4a2585243ee590337ed5c7e Mon Sep 17 00:00:00 2001 From: ericharper Date: Wed, 16 Sep 2020 14:31:37 -0600 Subject: [PATCH 21/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 23 +++++++++++++++++++++-- 1 file changed, 21 insertions(+), 2 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index 2cfb0c9fbfeee..d16bd5d6b585a 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -244,7 +244,7 @@ with PyTorch Lightning since every NeMo model is a Lightning Module. Neural Types in NeMo ASR ^^^^^^^^^^^^^^^^^^^^^^^^ -Additionally, NeMo Models and Neural Modules come with Neural Type checking. +NeMo Models and Neural Modules come with Neural Type checking. Neural type checking is extremely useful when combining many different neural network architectures for a production-grade application. @@ -479,7 +479,7 @@ since every NeMo model is a Lightning Module. Neural Types in NeMo NLP ^^^^^^^^^^^^^^^^^^^^^^^^ -Additionally, NeMo Models and Neural Modules come with Neural Type checking. +NeMo Models and Neural Modules come with Neural Type checking. Neural type checking is extremely useful when combining many different neural network architectures for a production-grade application. @@ -680,6 +680,25 @@ class GlowTTSModel(SpectrogramGenerator): Neural Types in NeMo TTS ^^^^^^^^^^^^^^^^^^^^^^^^ + NeMo Models and Neural Modules come with Neural Type checking. + Neural type checking is extremely useful when combining many different neural network architectures + for a production-grade application. + + .. code-block:: python + + @typecheck( + input_types={ + "x": NeuralType(('B', 'T'), TokenIndex()), + "x_lengths": NeuralType(('B'), LengthsType()), + "y": NeuralType(('B', 'D', 'T'), MelSpectrogramType(), optional=True), + "y_lengths": NeuralType(('B'), LengthsType(), optional=True), + "gen": NeuralType(optional=True), + "noise_scale": NeuralType(optional=True), + "length_scale": NeuralType(optional=True), + } + ) + def forward(self, *, x, x_lengths, y=None, y_lengths=None, gen=False, noise_scale=0.3, length_scale=1.0): + ... 
From 7d4b332577e2a69201bee2b1234e30c32e254c76 Mon Sep 17 00:00:00 2001 From: ericharper Date: Wed, 16 Sep 2020 14:37:25 -0600 Subject: [PATCH 22/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 144 +++++++++++++++--------------- 1 file changed, 72 insertions(+), 72 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index d16bd5d6b585a..430eed473c7d6 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -614,91 +614,91 @@ be customized with PyTorch Lightning since every NeMo model is a LightningModule .. code-block:: python -class GlowTTSModel(SpectrogramGenerator): - """ - GlowTTS model used to generate spectrograms from text - Consists of a text encoder and an invertible spectrogram decoder - """ - ... - # NeMo models come with neural type checking - @typecheck( - input_types={ - "x": NeuralType(('B', 'T'), TokenIndex()), - "x_lengths": NeuralType(('B'), LengthsType()), - "y": NeuralType(('B', 'D', 'T'), MelSpectrogramType(), optional=True), - "y_lengths": NeuralType(('B'), LengthsType(), optional=True), - "gen": NeuralType(optional=True), - "noise_scale": NeuralType(optional=True), - "length_scale": NeuralType(optional=True), - } - ) - def forward(self, *, x, x_lengths, y=None, y_lengths=None, gen=False, noise_scale=0.3, length_scale=1.0): - if gen: - return self.glow_tts.generate_spect( - text=x, text_lengths=x_lengths, noise_scale=noise_scale, length_scale=length_scale - ) - else: - return self.glow_tts(text=x, text_lengths=x_lengths, spect=y, spect_lengths=y_lengths) - ... - def step(self, y, y_lengths, x, x_lengths): - z, y_m, y_logs, logdet, logw, logw_, y_lengths, attn = self( - x=x, x_lengths=x_lengths, y=y, y_lengths=y_lengths, gen=False + class GlowTTSModel(SpectrogramGenerator): + """ + GlowTTS model used to generate spectrograms from text + Consists of a text encoder and an invertible spectrogram decoder + """ + ... + # NeMo models come with neural type checking + @typecheck( + input_types={ + "x": NeuralType(('B', 'T'), TokenIndex()), + "x_lengths": NeuralType(('B'), LengthsType()), + "y": NeuralType(('B', 'D', 'T'), MelSpectrogramType(), optional=True), + "y_lengths": NeuralType(('B'), LengthsType(), optional=True), + "gen": NeuralType(optional=True), + "noise_scale": NeuralType(optional=True), + "length_scale": NeuralType(optional=True), + } ) + def forward(self, *, x, x_lengths, y=None, y_lengths=None, gen=False, noise_scale=0.3, length_scale=1.0): + if gen: + return self.glow_tts.generate_spect( + text=x, text_lengths=x_lengths, noise_scale=noise_scale, length_scale=length_scale + ) + else: + return self.glow_tts(text=x, text_lengths=x_lengths, spect=y, spect_lengths=y_lengths) + ... 
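        # step() below runs the forward pass and combines the GlowTTS losses:
        # maximum-likelihood loss on the latent (l_mle) plus the duration loss (l_length)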
+ def step(self, y, y_lengths, x, x_lengths): + z, y_m, y_logs, logdet, logw, logw_, y_lengths, attn = self( + x=x, x_lengths=x_lengths, y=y, y_lengths=y_lengths, gen=False + ) - l_mle, l_length, logdet = self.loss( - z=z, - y_m=y_m, - y_logs=y_logs, - logdet=logdet, - logw=logw, - logw_=logw_, - x_lengths=x_lengths, - y_lengths=y_lengths, - ) + l_mle, l_length, logdet = self.loss( + z=z, + y_m=y_m, + y_logs=y_logs, + logdet=logdet, + logw=logw, + logw_=logw_, + x_lengths=x_lengths, + y_lengths=y_lengths, + ) - loss = sum([l_mle, l_length]) + loss = sum([l_mle, l_length]) - return l_mle, l_length, logdet, loss, attn + return l_mle, l_length, logdet, loss, attn - # PTL-specfic methods - def training_step(self, batch, batch_idx): - y, y_lengths, x, x_lengths = batch + # PTL-specfic methods + def training_step(self, batch, batch_idx): + y, y_lengths, x, x_lengths = batch - y, y_lengths = self.preprocessor(input_signal=y, length=y_lengths) + y, y_lengths = self.preprocessor(input_signal=y, length=y_lengths) - l_mle, l_length, logdet, loss, _ = self.step(y, y_lengths, x, x_lengths) + l_mle, l_length, logdet, loss, _ = self.step(y, y_lengths, x, x_lengths) - output = { - "loss": loss, # required - "progress_bar": {"l_mle": l_mle, "l_length": l_length, "logdet": logdet}, - "log": {"loss": loss, "l_mle": l_mle, "l_length": l_length, "logdet": logdet}, - } + output = { + "loss": loss, # required + "progress_bar": {"l_mle": l_mle, "l_length": l_length, "logdet": logdet}, + "log": {"loss": loss, "l_mle": l_mle, "l_length": l_length, "logdet": logdet}, + } - return output - ... + return output + ... - Neural Types in NeMo TTS - ^^^^^^^^^^^^^^^^^^^^^^^^ +Neural Types in NeMo TTS +^^^^^^^^^^^^^^^^^^^^^^^^ - NeMo Models and Neural Modules come with Neural Type checking. - Neural type checking is extremely useful when combining many different neural network architectures - for a production-grade application. +NeMo Models and Neural Modules come with Neural Type checking. +Neural type checking is extremely useful when combining many different neural network architectures +for a production-grade application. - .. code-block:: python +.. code-block:: python - @typecheck( - input_types={ - "x": NeuralType(('B', 'T'), TokenIndex()), - "x_lengths": NeuralType(('B'), LengthsType()), - "y": NeuralType(('B', 'D', 'T'), MelSpectrogramType(), optional=True), - "y_lengths": NeuralType(('B'), LengthsType(), optional=True), - "gen": NeuralType(optional=True), - "noise_scale": NeuralType(optional=True), - "length_scale": NeuralType(optional=True), - } - ) - def forward(self, *, x, x_lengths, y=None, y_lengths=None, gen=False, noise_scale=0.3, length_scale=1.0): - ... + @typecheck( + input_types={ + "x": NeuralType(('B', 'T'), TokenIndex()), + "x_lengths": NeuralType(('B'), LengthsType()), + "y": NeuralType(('B', 'D', 'T'), MelSpectrogramType(), optional=True), + "y_lengths": NeuralType(('B'), LengthsType(), optional=True), + "gen": NeuralType(optional=True), + "noise_scale": NeuralType(optional=True), + "length_scale": NeuralType(optional=True), + } + ) + def forward(self, *, x, x_lengths, y=None, y_lengths=None, gen=False, noise_scale=0.3, length_scale=1.0): + ... 
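
Note that the ``training_step`` shown above returns the pre-1.0 Lightning dictionary format with ``progress_bar`` and ``log`` keys. On PyTorch Lightning 1.0 and later the same metrics would normally be reported through ``self.log``; the fragment below is a sketch of that equivalent and is not part of the NeMo source.

.. code-block:: python

    # Sketch only: the same GlowTTS training_step written against the self.log API
    # from PyTorch Lightning >= 1.0 instead of the "progress_bar"/"log" dict keys.
    def training_step(self, batch, batch_idx):
        y, y_lengths, x, x_lengths = batch
        y, y_lengths = self.preprocessor(input_signal=y, length=y_lengths)

        l_mle, l_length, logdet, loss, _ = self.step(y, y_lengths, x, x_lengths)

        self.log("loss", loss, prog_bar=True)
        self.log("l_mle", l_mle, prog_bar=True)
        self.log("l_length", l_length, prog_bar=True)
        self.log("logdet", logdet, prog_bar=True)

        return loss
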
From 721147885806add5d061595efc69748786a8600c Mon Sep 17 00:00:00 2001 From: ericharper Date: Wed, 16 Sep 2020 14:47:41 -0600 Subject: [PATCH 23/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 6 ++++-- docs/source/index.rst | 13 +++++++------ 2 files changed, 11 insertions(+), 8 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index 430eed473c7d6..7b1ce6b744b0a 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -1,5 +1,7 @@ -Conversational AI -================= +.._ NeMo: + +NVIDIA NeMo +=========== NVIDIA NeMo Models ^^^^^^^^^^^^^^^^^^ diff --git a/docs/source/index.rst b/docs/source/index.rst index 50a6fe2c06cc9..bf7f9c11f8667 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -6,12 +6,6 @@ PyTorch Lightning Documentation =============================== -.. toctree:: - :maxdepth: 1 - :name: conversational_ai - :caption: Conversational AI - - conversational_ai .. toctree:: :maxdepth: 1 @@ -113,6 +107,13 @@ PyTorch Lightning Documentation test_set production_inference +.. toctree:: + :maxdepth: 1 + :name: conversational_ai + :caption: Conversational AI + + conversational_ai + .. toctree:: :maxdepth: 1 :name: community From c4af30855e934f66e1a8a13842ddff546458947a Mon Sep 17 00:00:00 2001 From: ericharper Date: Wed, 16 Sep 2020 15:09:34 -0600 Subject: [PATCH 24/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 83 ++++++++++++++++--------------- docs/source/index.rst | 2 +- 2 files changed, 44 insertions(+), 41 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index 7b1ce6b744b0a..ed83466290097 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -1,10 +1,10 @@ -.._ NeMo: +NeMo +==== -NVIDIA NeMo -=========== +---------- NVIDIA NeMo Models -^^^^^^^^^^^^^^^^^^ +------------------ `NVIDIA NeMo `_ is a toolkit for building Conversational AI applications. NeMo has separate collections for Automatic Speech Recognition (ASR), @@ -71,8 +71,37 @@ For Docker users, the NeMo container is available on docker run --runtime=nvidia -it --rm -v --shm-size=16g -p 8888:8888 -p 6006:6006 --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/nemo:v0.11 +Experiment Manager +^^^^^^^^^^^^^^^^^^ + +NeMo's Experiment Manager leverages PyTorch Lightning for model checkpointing, +TensorBoard Logging, and Weights and Biases logging. The Experiment Manager is included by default +in all NeMo example scripts. + +.. code-block:: python + + exp_manager(trainer, cfg.get("exp_manager", None)) + +And is configurable via .yaml with Hydra. + +.. code-block:: bash + + exp_manager: + exp_dir: null + name: *name + create_tensorboard_logger: True + create_checkpoint_callback: True + +Optionally launch Tensorboard to view training results in ./nemo_experiments (by default). + +.. code-block:: bash + + tensorboard --bind_all --logdir nemo_experiments + +-------- + Automatic Speech Recognition (ASR) -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +---------------------------------- Everything needed to train Convolutional ASR models is included with NeMo. NeMo supports multiple Speech Recognition architectures, including Jasper @@ -161,32 +190,6 @@ including the PyTorch Lightning Trainer, customizable from the command line. .. note:: Training NeMo ASR models can take days/weeks so it is highly recommended to use multiple GPUs and multiple nodes with the PyTorch Lightning Trainer. 
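
Instead of training from scratch, the same model class can also start from one of the pretrained checkpoints and be fine-tuned on new manifests. The sketch below is illustrative only: the manifest path is a placeholder and the data-config keys are assumed to match the QuartzNet .yaml file referenced above.

.. code-block:: python

    import pytorch_lightning as pl
    from omegaconf import OmegaConf
    from nemo.collections.asr.models import EncDecCTCModel

    # Illustrative fine-tuning sketch; the manifest path below is a placeholder.
    asr_model = EncDecCTCModel.from_pretrained('QuartzNet15x5Base-En')

    train_ds = OmegaConf.create({
        'manifest_filepath': '/data/my_train_manifest.json',  # placeholder
        'sample_rate': 16000,
        'labels': asr_model.decoder.vocabulary,
        'batch_size': 32,
        'shuffle': True,
    })
    asr_model.setup_training_data(train_data_config=train_ds)

    trainer = pl.Trainer(gpus=1, max_epochs=5)
    trainer.fit(asr_model)
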
-NeMo Experiment Manager -^^^^^^^^^^^^^^^^^^^^^^^ - -The Experiment Manager leverages PyTorch Lightning for model checkpointing, -TensorBoard Logging, and Weights and Biases logging. The Experiment Manager is included by default -in all NeMo example scripts. - -.. code-block:: python - - exp_manager(trainer, cfg.get("exp_manager", None)) - -And is configurable via .yaml with Hydra. - -.. code-block:: bash - - exp_manager: - exp_dir: null - name: *name - create_tensorboard_logger: True - create_checkpoint_callback: True - -Optionally launch Tensorboard to view training results in ./nemo_experiments (by default). - -.. code-block:: bash - - tensorboard --bind_all --logdir nemo_experiments Using State-Of-The-Art Pre-trained ASR Model ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -202,8 +205,8 @@ Transcribe audio with QuartzNet model pretrained on ~3300 hours of audio. for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)): print(f"Audio in {fname} was recognized as: {transcription}") -NeMo Model Under the Hood -^^^^^^^^^^^^^^^^^^^^^^^^^ +NeMo ASR Model Under the Hood +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Any aspect of ASR training or model architecture design can easily be customized with PyTorch Lightning since every NeMo model is a Lightning Module. @@ -271,8 +274,10 @@ network architectures for a production-grade application. "greedy_predictions": NeuralType(('B', 'T'), LabelsType()), } +-------- + Natural Language Processing (NLP) -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +--------------------------------- Everything needed to train BERT based NLP models is included with NeMo. NeMo supports language models from `HuggingFace Transformers `_ @@ -495,8 +500,10 @@ for a production-grade application. def output_types(self) -> Optional[Dict[str, NeuralType]]: return self.classifier.output_types +-------- + Text-To-Speech (TTS) -^^^^^^^^^^^^^^^^^^^^ +-------------------- Everything needed to train TTS models and generate audio is included with NeMo. Models can be trained from scratch on your own data or pretrained models can be downloaded @@ -702,8 +709,4 @@ for a production-grade application. def forward(self, *, x, x_lengths, y=None, y_lengths=None, gen=False, noise_scale=0.3, length_scale=1.0): ... - - - - - +-------- diff --git a/docs/source/index.rst b/docs/source/index.rst index bf7f9c11f8667..9fc966ef0de28 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -109,7 +109,7 @@ PyTorch Lightning Documentation .. toctree:: :maxdepth: 1 - :name: conversational_ai + :name: Conversational AI :caption: Conversational AI conversational_ai From 3dfd02ad68f7d8cbda4e2c445d22af6c0b11799a Mon Sep 17 00:00:00 2001 From: ericharper Date: Wed, 16 Sep 2020 15:24:16 -0600 Subject: [PATCH 25/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 28 +++++++++++++--------------- 1 file changed, 13 insertions(+), 15 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index ed83466290097..0e56a7340976e 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -1,11 +1,6 @@ NeMo ==== ----------- - -NVIDIA NeMo Models ------------------- - `NVIDIA NeMo `_ is a toolkit for building Conversational AI applications. NeMo has separate collections for Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS) models. Each collection consists of @@ -19,6 +14,11 @@ mixed-precision training. .. 
note:: Every NeMo model is a LightningModule that comes equipped with all supporting infrastructure for training and reproducibility. +---------- + +NeMo Models +----------- + NeMo Models contain everything needed to train and reproduce state of the art Conversational AI research and applications, including: @@ -117,7 +117,7 @@ Some typical ASR tasks are included with NeMo: - `Voice Activity Detection `_ - `Speaker Recognition `_ -See `here `_ +See this `asr notebook `_ for a full tutorial on doing ASR with NeMo, PyTorch Lightning, and Hydra. Specify ASR Model Configurations with YAML File @@ -125,7 +125,7 @@ Specify ASR Model Configurations with YAML File NeMo Models and the PyTorch Lightning Trainer can be fully configured from .yaml files using Hydra. -See `here `_ +See this `asr config `_ for the entire speech to text .yaml file. .. code-block:: yaml @@ -300,7 +300,7 @@ Named Entity Recognition (NER) NER (or more generally token classifcation) is the NLP task of detecting and classifying key information (entities) in text. This task is very popular in Healthcare and Finance. In finance, for example, it can be important to identify geographical, geopolitical, organizational, persons, events, and natural phenomenon entities. -See `here `_ +See this `NER notebook `_ for a full tutorial on doing NER with NeMo, PyTorch Lightning, and Hydra. Specify NER Model Configurations with YAML File @@ -308,7 +308,7 @@ Specify NER Model Configurations with YAML File ..note NeMo Models and the PyTorch Lightning Trainer can be fully configured from .yaml files using Hydra. -See `here `_ +See this `token classification config `_ for the entire NER (token classification) .yaml file. .. code-block:: yaml @@ -416,7 +416,7 @@ To see the list of supported tokenizers: nemo_nlp.modules.get_tokenizer_list() -See `here `_ +See this `tokenizer notebook `_ for a full tutorial on using tokenizers in NeMO. @@ -526,7 +526,7 @@ Specify TTS Model Configurations with YAML File ..note NeMo Models and PyTorch Lightning Trainer can be fully configured from .yaml files using Hydra. -`glow_tts.yaml `_ +`tts/conf/glow_tts.yaml `_ .. code-block:: yaml @@ -556,7 +556,7 @@ Specify TTS Model Configurations with YAML File Developing TTS Model From Scratch ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -`glow_tts.py `_ +`tts/glow_tts.py `_ .. code-block:: python @@ -586,7 +586,7 @@ Using State-Of-The-Art Pre-trained TTS Model Generate speech using models trained on `LJSpeech `, around 24 hours of single speaker data. -See `here `_ +See this `TTS notebook `_ for a full tutorial on generating speech with NeMo, PyTorch Lightning, and Hydra. .. code-block:: python @@ -708,5 +708,3 @@ for a production-grade application. ) def forward(self, *, x, x_lengths, y=None, y_lengths=None, gen=False, noise_scale=0.3, length_scale=1.0): ... - --------- From e3c90004de3c8a23c9a94af11c264971888cf345 Mon Sep 17 00:00:00 2001 From: ericharper Date: Thu, 17 Sep 2020 10:31:56 -0600 Subject: [PATCH 26/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index 0e56a7340976e..a51ae0f02654a 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -35,7 +35,7 @@ Depending on the domain and application, many different AI libraries will have t to build the application. 
Hydra makes it easy to bring all of these libraries together so that each can be configured from .yaml or the Hydra CLI. -.. note:: Every NeMo model has an example configuration file and run script that contains all configurations needed for training. +.. note:: Every NeMo model has an example configuration file and a corresponding script that contains all configurations needed for training. The end result of using NeMo, Pytorch Lightning, and Hydra is that NeMo models all have the same look and feel so that it is easy to do Conversational AI research @@ -50,13 +50,12 @@ Installing the latest NeMo release is a simple pip install. pip install nemo_toolkit[all] -To install a specific branch from GitHub: +To install the main branch from GitHub: .. code-block:: bash - python -m pip install git+https://github.com/NVIDIA/NeMo.git@{BRANCH}#egg=nemo_toolkit[all] + python -m pip install git+https://github.com/NVIDIA/NeMo.git@main#egg=nemo_toolkit[all] -.. note:: Replace {BRANCH} with the specific branch name from GitHub. For Docker users, the NeMo container is available on `NGC `_ @@ -106,8 +105,8 @@ Automatic Speech Recognition (ASR) Everything needed to train Convolutional ASR models is included with NeMo. NeMo supports multiple Speech Recognition architectures, including Jasper and QuartzNet. These models can be trained from scratch on custom datasets or -pretrained checkpoints trained on thousands of hours of audio that can be restored for -immediate use. +finetuned using pretrained checkpoints trained on thousands of hours of audio +that can be restored for immediate use. Some typical ASR tasks are included with NeMo: @@ -166,7 +165,9 @@ Developing ASR Model From Scratch .. code-block:: python - @hydra.main(config_name="config") + # TODO: add comment explaining hydra_runner + # hydra_runner calls hydra.main and is useful for multi-node experiments + @hydra_runner(config_path="conf", config_name="config") def main(cfg): trainer = Trainer(**cfg.trainer) asr_model = EncDecCTCModel(cfg.model, trainer) @@ -200,7 +201,7 @@ Transcribe audio with QuartzNet model pretrained on ~3300 hours of audio. quartznet = EncDecCTCModel.from_pretrained('QuartzNet15x5Base-En') - files = ['path/to/my.wav'] # file should be less than 25 seconds + files = ['path/to/my.wav'] # file duration should be less than 25 seconds for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)): print(f"Audio in {fname} was recognized as: {transcription}") @@ -279,9 +280,10 @@ network architectures for a production-grade application. Natural Language Processing (NLP) --------------------------------- -Everything needed to train BERT based NLP models is included with NeMo. +Everything needed to finetune BERT-like language models for NLP tasks is included with NeMo. NeMo supports language models from `HuggingFace Transformers `_ -and model parallel architectures from `NVIDIA Megatron-LM `_. +and `NVIDIA Megatron-LM `_. +NeMo can also be used for pretraining BERT-based language models from HuggingFace. 
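
Concretely, the encoder a task model uses is just a field in its .yaml configuration, so switching between HuggingFace and Megatron-LM encoders is a one-line change. The sketch below assumes a local clone of NeMo and uses the ``model.language_model.pretrained_model_name`` key and the ``megatron-bert-345m-uncased`` name purely as examples; check the example configs for the exact key and the supported model names.

.. code-block:: python

    from omegaconf import OmegaConf

    # Assumes a local clone of NeMo; the config key and model name are illustrative.
    cfg = OmegaConf.load('NeMo/examples/nlp/token_classification/conf/token_classification_config.yaml')
    cfg.model.language_model.pretrained_model_name = 'megatron-bert-345m-uncased'
    print(OmegaConf.to_yaml(cfg.model.language_model))
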
With NeMo, any of the HuggingFace encoders or Megatron-LM encoders can easily be used for the NLP tasks that are included with NeMo: From e49401ec738ef7c0a8d6a1b6cd6feb653b11a149 Mon Sep 17 00:00:00 2001 From: ericharper Date: Thu, 17 Sep 2020 10:46:32 -0600 Subject: [PATCH 27/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index a51ae0f02654a..61fce8ec18079 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -282,10 +282,10 @@ Natural Language Processing (NLP) Everything needed to finetune BERT-like language models for NLP tasks is included with NeMo. NeMo supports language models from `HuggingFace Transformers `_ -and `NVIDIA Megatron-LM `_. +and `NVIDIA Megatron-LM `_ BERT and Bio-Megatron models. NeMo can also be used for pretraining BERT-based language models from HuggingFace. -With NeMo, any of the HuggingFace encoders or Megatron-LM encoders can easily be used for the NLP tasks +Any of the HuggingFace encoders or Megatron-LM encoders can easily be used for the NLP tasks that are included with NeMo: - `Glue Benchmark (All tasks) `_ @@ -353,7 +353,8 @@ Developing NER Model From Scratch .. code-block:: python - @hydra.main(config_path="conf", config_name="token_classification_config") + # TODO: add comment explaining hydra_runner + @hydra_runner(config_path="conf", config_name="token_classification_config") def main(cfg: DictConfig) -> None: trainer = pl.Trainer(**cfg.trainer) model = TokenClassificationModel(cfg.model, trainer=trainer) @@ -425,7 +426,7 @@ for a full tutorial on using tokenizers in NeMO. Using a Pre-trained NER Model ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -NeMo has pre-trained NER models that can be used to get +NeMo has pre-trained NER models that can be used to get started with Token Classification right away. Models are automatically downloaded from NGC, cached locally to disk, @@ -562,7 +563,8 @@ Developing TTS Model From Scratch .. code-block:: python - @hydra.main(config_path="conf", config_name="glow_tts") + # TODO: add comment explaining hydra_runner + @hydra_runner(config_path="conf", config_name="glow_tts") def main(cfg): trainer = pl.Trainer(**cfg.trainer) model = GlowTTSModel(cfg=cfg.model, trainer=trainer) From 8ce740d28b0e68b91a5b665dd75e078f7dd4bd1e Mon Sep 17 00:00:00 2001 From: ericharper Date: Thu, 17 Sep 2020 10:53:16 -0600 Subject: [PATCH 28/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index 61fce8ec18079..a88a09df3258a 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -510,7 +510,9 @@ Text-To-Speech (TTS) Everything needed to train TTS models and generate audio is included with NeMo. Models can be trained from scratch on your own data or pretrained models can be downloaded -automatically. NeMo currently supports: +automatically. NeMo currently supports a two step inference procedure. +First, a model is used to generate a mel spectrogram from text. +Second, a model is used to generate audio from a mel spectrogram. 
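
As a rough sketch of those two steps in code: the checkpoint names below are assumptions, and the ``parse``, ``generate_spectrogram``, and ``convert_spectrogram_to_audio`` calls follow the spectrogram-generator and vocoder interfaces, so adjust them to the models you actually load.

.. code-block:: python

    import soundfile as sf
    from nemo.collections.tts.models import GlowTTSModel, WaveGlowModel

    # Illustrative two-step inference sketch; checkpoint names are assumptions.
    spec_gen = GlowTTSModel.from_pretrained('GlowTTS-22050Hz')
    vocoder = WaveGlowModel.from_pretrained('WaveGlow-22050Hz')

    # step 1: text -> mel spectrogram
    tokens = spec_gen.parse('NeMo and PyTorch Lightning generate speech together.')
    spectrogram = spec_gen.generate_spectrogram(tokens=tokens)

    # step 2: mel spectrogram -> audio
    audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)
    sf.write('speech.wav', audio.detach().cpu().numpy()[0], samplerate=22050)
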
Mel Spectogram Generators: From f22314bf044f1c43256aa97acbd143971b8953f5 Mon Sep 17 00:00:00 2001 From: ericharper Date: Thu, 17 Sep 2020 13:39:18 -0600 Subject: [PATCH 29/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index a88a09df3258a..2b74d84abbe1f 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -165,7 +165,6 @@ Developing ASR Model From Scratch .. code-block:: python - # TODO: add comment explaining hydra_runner # hydra_runner calls hydra.main and is useful for multi-node experiments @hydra_runner(config_path="conf", config_name="config") def main(cfg): @@ -353,7 +352,7 @@ Developing NER Model From Scratch .. code-block:: python - # TODO: add comment explaining hydra_runner + # hydra_runner calls hydra.main and is useful for multi-node experiments @hydra_runner(config_path="conf", config_name="token_classification_config") def main(cfg: DictConfig) -> None: trainer = pl.Trainer(**cfg.trainer) @@ -565,7 +564,7 @@ Developing TTS Model From Scratch .. code-block:: python - # TODO: add comment explaining hydra_runner + # hydra_runner calls hydra.main and is useful for multi-node experiments @hydra_runner(config_path="conf", config_name="glow_tts") def main(cfg): trainer = pl.Trainer(**cfg.trainer) From 6a96c11bd71f035c19c6a247fa75d1ccb8a5aa0d Mon Sep 17 00:00:00 2001 From: ericharper Date: Thu, 17 Sep 2020 14:16:49 -0600 Subject: [PATCH 30/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 27 ++++++++++++++++++++++++++- 1 file changed, 26 insertions(+), 1 deletion(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index 2b74d84abbe1f..6de77f613c6b9 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -419,8 +419,33 @@ To see the list of supported tokenizers: nemo_nlp.modules.get_tokenizer_list() See this `tokenizer notebook `_ -for a full tutorial on using tokenizers in NeMO. +for a full tutorial on using tokenizers in NeMo. +Language Models +^^^^^^^^^^^^^^^ + +Language models are used to extract information from (tokenized) text. +Much of the state-of-the-art in natural language processing is achieved +by fine-tuning pretrained language models on the downstream task. + +With NeMo, you can either `pretrain `_ +a BERT model on your data or use a pretrained lanugage model from `HuggingFace Transformers `_ +or `NVIDIA Megatron-LM `_. + +To see the list of language models available in NeMo: + +.. code-block:: python + + nemo_nlp.modules.get_pretrained_lm_models_list(include_external=True) + +Easily switch between any language model in the above list by using `.get_lm_model`. + +.. code-block:: python + + nemo_nlp.modules.get_lm_model(pretrained_model_name='distilbert-base-uncased') + +See this `language model notebook `_ +for a full tutorial on using pretrained language models in NeMo. 
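
To make the pipeline concrete, the sketch below wires a tokenizer and a pretrained encoder together. The tokenizer name and the HuggingFace-style ``input_ids``/``attention_mask``/``token_type_ids`` argument names are assumptions, so adapt them to the encoder you pick.

.. code-block:: python

    import torch
    from nemo.collections import nlp as nemo_nlp

    # Sketch only: tokenize a sentence and run it through a pretrained encoder.
    tokenizer = nemo_nlp.modules.get_tokenizer(tokenizer_name='bert-base-uncased')
    bert = nemo_nlp.modules.get_lm_model(pretrained_model_name='bert-base-uncased')

    ids = torch.tensor([tokenizer.text_to_ids('language models turn tokens into features')])
    mask = torch.ones_like(ids)

    # roughly [batch, seq_len, hidden] contextual embeddings for a downstream task head
    hidden_states = bert(input_ids=ids, attention_mask=mask, token_type_ids=torch.zeros_like(ids))
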
Using a Pre-trained NER Model ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ From 057de92786ec1d5414dba51b905ea7cd97f4849c Mon Sep 17 00:00:00 2001 From: ericharper Date: Thu, 17 Sep 2020 17:33:08 -0600 Subject: [PATCH 31/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index 6de77f613c6b9..c87fc9392ac08 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -4,7 +4,7 @@ NeMo `NVIDIA NeMo `_ is a toolkit for building Conversational AI applications. NeMo has separate collections for Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS) models. Each collection consists of -prebuilt modules that include everything needed to train on your own data. +prebuilt modules that include everything needed to train on your data. Every module can easily be customized, extended, and composed to create complex Conversational AI applications. @@ -23,7 +23,7 @@ NeMo Models contain everything needed to train and reproduce state of the art Co research and applications, including: - neural network architectures -- datasets/dataloaders +- datasets/data loaders - data preprocessing/postprocessing - data augmentors - optimizers and schedulers @@ -38,8 +38,8 @@ so that each can be configured from .yaml or the Hydra CLI. .. note:: Every NeMo model has an example configuration file and a corresponding script that contains all configurations needed for training. The end result of using NeMo, Pytorch Lightning, and Hydra is that -NeMo models all have the same look and feel so that it is easy to do Conversational AI research -across multiple domains and all NeMo models are fully compatible with the PyTorch ecosystem. +NeMo models all have the same look and feel. This makes it easy to do Conversational AI research +across multiple domains. NeMo models are also fully compatible with the PyTorch ecosystem. Installing NeMo ^^^^^^^^^^^^^^^ From 921b8ba8fa1c0756b1b04179c8ed164cba01cd9d Mon Sep 17 00:00:00 2001 From: ericharper Date: Thu, 17 Sep 2020 17:42:25 -0600 Subject: [PATCH 32/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index c87fc9392ac08..bba6b68fc076d 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -105,7 +105,7 @@ Automatic Speech Recognition (ASR) Everything needed to train Convolutional ASR models is included with NeMo. NeMo supports multiple Speech Recognition architectures, including Jasper and QuartzNet. These models can be trained from scratch on custom datasets or -finetuned using pretrained checkpoints trained on thousands of hours of audio +fine-tuned using pre-trained checkpoints trained on thousands of hours of audio that can be restored for immediate use. Some typical ASR tasks are included with NeMo: @@ -205,6 +205,12 @@ Transcribe audio with QuartzNet model pretrained on ~3300 hours of audio. for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)): print(f"Audio in {fname} was recognized as: {transcription}") +To see the available pretrained checkpoints: + +.. 
code-block python + + EncDecCTCModel.list_available_models() + NeMo ASR Model Under the Hood ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ From 0ebbc072652662ef3c9c61422d148c042e02c375 Mon Sep 17 00:00:00 2001 From: ericharper Date: Thu, 17 Sep 2020 17:42:48 -0600 Subject: [PATCH 33/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index bba6b68fc076d..8b9af1bba3bcd 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -207,7 +207,7 @@ Transcribe audio with QuartzNet model pretrained on ~3300 hours of audio. To see the available pretrained checkpoints: -.. code-block python +.. code-block:: python EncDecCTCModel.list_available_models() From 4f2d5b948009272e0a4f1b568059de05512ca119 Mon Sep 17 00:00:00 2001 From: ericharper Date: Thu, 17 Sep 2020 17:50:24 -0600 Subject: [PATCH 34/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index 8b9af1bba3bcd..f99c73ec720b7 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -649,6 +649,16 @@ for a full tutorial on generating speech with NeMo, PyTorch Lightning, and Hydra text_to_generate = input("Input what you want the model to say: ") spec, audio = infer(spec_gen, vocoder, text_to_generate) +To see the available pretrained checkpoints: + +.. code-block:: python + + # spec generator + GlowTTSModel.list_available_models() + + # vocoder + WaveGlowModel.list_available_models() + NeMo TTS Model Under the Hood ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ From 4f6c9c5ff61c94a610e23d8f6b22dfdc69e8e060 Mon Sep 17 00:00:00 2001 From: ericharper Date: Mon, 21 Sep 2020 08:56:34 -0600 Subject: [PATCH 35/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index f99c73ec720b7..89cc9a9d876ec 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -1,12 +1,12 @@ NeMo ==== -`NVIDIA NeMo `_ is a toolkit for building -Conversational AI applications. NeMo has separate collections for Automatic Speech Recognition (ASR), +`NVIDIA NeMo `_ is a toolkit for building new State-of-the-Art +Conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS) models. Each collection consists of prebuilt modules that include everything needed to train on your data. -Every module can easily be customized, extended, and composed to create complex Conversational AI -applications. +Every module can easily be customized, extended, and composed to create new Conversational AI +model architectures. Conversational AI architectures are typically very large and require a lot of data and compute for training. 
NeMo uses PyTorch Lightning for easy and performant multi-GPU/multi-node From e949d5fbcca2f057c9a4a057cc4f194c16a4f034 Mon Sep 17 00:00:00 2001 From: ericharper Date: Wed, 23 Sep 2020 10:25:17 -0600 Subject: [PATCH 36/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index 89cc9a9d876ec..d9b65ea193f3a 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -103,8 +103,9 @@ Automatic Speech Recognition (ASR) ---------------------------------- Everything needed to train Convolutional ASR models is included with NeMo. -NeMo supports multiple Speech Recognition architectures, including Jasper -and QuartzNet. These models can be trained from scratch on custom datasets or +NeMo supports multiple Speech Recognition architectures, including Jasper and QuartzNet. +`NeMo Speech Models ` +can be trained from scratch on custom datasets or fine-tuned using pre-trained checkpoints trained on thousands of hours of audio that can be restored for immediate use. From 199b4f5f8a83b1426a468c4a127239849e52eee2 Mon Sep 17 00:00:00 2001 From: ericharper Date: Wed, 23 Sep 2020 10:31:57 -0600 Subject: [PATCH 37/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index d9b65ea193f3a..d480a4a37c90d 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -287,7 +287,8 @@ Natural Language Processing (NLP) --------------------------------- Everything needed to finetune BERT-like language models for NLP tasks is included with NeMo. -NeMo supports language models from `HuggingFace Transformers `_ +`NeMo NLP Models ` +include `HuggingFace Transformers `_ and `NVIDIA Megatron-LM `_ BERT and Bio-Megatron models. NeMo can also be used for pretraining BERT-based language models from HuggingFace. @@ -540,12 +541,13 @@ Text-To-Speech (TTS) -------------------- Everything needed to train TTS models and generate audio is included with NeMo. -Models can be trained from scratch on your own data or pretrained models can be downloaded +`NeMo TTS Models ` +can be trained from scratch on your own data or pretrained models can be downloaded automatically. NeMo currently supports a two step inference procedure. First, a model is used to generate a mel spectrogram from text. Second, a model is used to generate audio from a mel spectrogram. -Mel Spectogram Generators: +Mel Spectrogram Generators: - `Tacotron 2 `_ - `Glow-TTS `_ From a57cee96a6cc678a6343e108c58f212095d8fc04 Mon Sep 17 00:00:00 2001 From: ericharper Date: Wed, 23 Sep 2020 11:11:59 -0600 Subject: [PATCH 38/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index d480a4a37c90d..10a390e0d65f1 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -757,3 +757,19 @@ for a production-grade application. ) def forward(self, *, x, x_lengths, y=None, y_lengths=None, gen=False, noise_scale=0.3, length_scale=1.0): ... + +-------- + +Learn More +^^^^^^^^^^ + +`NVIDIA NeMo ` is actively being developed on GitHub. +Go there to see our +`tutorials `, +or make a `contribution `! + +.. 
note:: Most NeMo tutorial notebooks can be run on `Google Colab ` + + + From f93b1646e17b346102cfd476d81103e59cba5281 Mon Sep 17 00:00:00 2001 From: ericharper Date: Wed, 23 Sep 2020 11:14:55 -0600 Subject: [PATCH 39/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index 10a390e0d65f1..acd0c7db6ae9f 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -104,7 +104,7 @@ Automatic Speech Recognition (ASR) Everything needed to train Convolutional ASR models is included with NeMo. NeMo supports multiple Speech Recognition architectures, including Jasper and QuartzNet. -`NeMo Speech Models ` +`NeMo Speech Models `_ can be trained from scratch on custom datasets or fine-tuned using pre-trained checkpoints trained on thousands of hours of audio that can be restored for immediate use. @@ -287,7 +287,7 @@ Natural Language Processing (NLP) --------------------------------- Everything needed to finetune BERT-like language models for NLP tasks is included with NeMo. -`NeMo NLP Models ` +`NeMo NLP Models `_ include `HuggingFace Transformers `_ and `NVIDIA Megatron-LM `_ BERT and Bio-Megatron models. NeMo can also be used for pretraining BERT-based language models from HuggingFace. @@ -541,7 +541,7 @@ Text-To-Speech (TTS) -------------------- Everything needed to train TTS models and generate audio is included with NeMo. -`NeMo TTS Models ` +`NeMo TTS Models `_ can be trained from scratch on your own data or pretrained models can be downloaded automatically. NeMo currently supports a two step inference procedure. First, a model is used to generate a mel spectrogram from text. @@ -763,13 +763,13 @@ for a production-grade application. Learn More ^^^^^^^^^^ -`NVIDIA NeMo ` is actively being developed on GitHub. +`NVIDIA NeMo `_ is actively being developed on GitHub. Go there to see our -`tutorials `, -or make a `contribution `! +`tutorials `_, +or make a `contribution `_! -.. note:: Most NeMo tutorial notebooks can be run on `Google Colab ` +.. note:: Most NeMo tutorial notebooks can be run on `Google Colab `_ From f4ea2f057c03e4d97635ed5e0cdf4d55973697ae Mon Sep 17 00:00:00 2001 From: ericharper Date: Wed, 23 Sep 2020 11:16:12 -0600 Subject: [PATCH 40/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index acd0c7db6ae9f..e4a7fbfad0832 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -765,7 +765,7 @@ Learn More `NVIDIA NeMo `_ is actively being developed on GitHub. Go there to see our -`tutorials `_, or make a `contribution `_! From 8a6b7d6a8b73d80a52f83ab74c58d2d5e7ca15a4 Mon Sep 17 00:00:00 2001 From: ericharper Date: Wed, 23 Sep 2020 11:16:49 -0600 Subject: [PATCH 41/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index e4a7fbfad0832..f182e69c222e3 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -764,8 +764,7 @@ Learn More ^^^^^^^^^^ `NVIDIA NeMo `_ is actively being developed on GitHub. -Go there to see our -`tutorials `_ , `example scripts `_, or make a `contribution `_! 
From 4df0bb86ab1e03dcf2f16b7022586cb42c4ef1aa Mon Sep 17 00:00:00 2001 From: ericharper Date: Wed, 23 Sep 2020 11:22:05 -0600 Subject: [PATCH 42/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index f182e69c222e3..86e7b81e0b426 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -764,11 +764,15 @@ Learn More ^^^^^^^^^^ `NVIDIA NeMo `_ is actively being developed on GitHub. -Go there to see our `tutorials `_ , +Go there to see our `tutorials `_, `example scripts `_, or make a `contribution `_! .. note:: Most NeMo tutorial notebooks can be run on `Google Colab `_ - - +Also, see our `developer guide `_ and +our collections of pretrained +`ASR `_, +`NLP `_, +and `TTS `_ models +on NVIDIA NGC. \ No newline at end of file From 42fe63a67c6cae8304b38988e6ddbf7cd5200aaa Mon Sep 17 00:00:00 2001 From: ericharper Date: Wed, 23 Sep 2020 11:22:52 -0600 Subject: [PATCH 43/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index 86e7b81e0b426..bead99bb3f7ff 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -768,7 +768,7 @@ Go there to see our `tutorials `_, `example scripts `_, or make a `contribution `_! -.. note:: Most NeMo tutorial notebooks can be run on `Google Colab `_ +.. note:: Most NeMo tutorial notebooks can be run on `Google Colab `_. Also, see our `developer guide `_ and our collections of pretrained From fadef5d8b61991def89f4265a6a00c38b4314740 Mon Sep 17 00:00:00 2001 From: ericharper Date: Wed, 23 Sep 2020 11:24:03 -0600 Subject: [PATCH 44/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index bead99bb3f7ff..e4af2c699c3d9 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -775,4 +775,4 @@ our collections of pretrained `ASR `_, `NLP `_, and `TTS `_ models -on NVIDIA NGC. \ No newline at end of file +on `NVIDIA NGC `_. \ No newline at end of file From b57d7e68918d46e68e31b9f87690b8d21407b03f Mon Sep 17 00:00:00 2001 From: ericharper Date: Wed, 23 Sep 2020 11:39:52 -0600 Subject: [PATCH 45/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 26 ++++++++++++++++---------- 1 file changed, 16 insertions(+), 10 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index e4af2c699c3d9..cc87fa1e0051f 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -763,16 +763,22 @@ for a production-grade application. Learn More ^^^^^^^^^^ -`NVIDIA NeMo `_ is actively being developed on GitHub. -Go there to see our `tutorials `_, -`example scripts `_, -or make a `contribution `_! - -.. note:: Most NeMo tutorial notebooks can be run on `Google Colab `_. - -Also, see our `developer guide `_ and -our collections of pretrained +Download pre-trained `ASR `_, `NLP `_, and `TTS `_ models -on `NVIDIA NGC `_. \ No newline at end of file +on `NVIDIA NGC `_ to quickly get started with NeMo. + + +Become an expert on Building Conversational AI applications with +our `tutorials `_, +and `example scripts `_, + +.. 
note:: Most NeMo tutorial notebooks can be run on `Google Colab `_. + +`NVIDIA NeMo `_ is actively being developed on GitHub. +`Contributions `_ are welcome! + +See our `developer guide `_ for +more information on core NeMo concepts, ASR/NLP/TTS collections, +and the NeMo API. From 568f44ba4da50729e7068ee470c2945c6b2ccd84 Mon Sep 17 00:00:00 2001 From: ericharper Date: Wed, 23 Sep 2020 14:10:10 -0600 Subject: [PATCH 46/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index cc87fa1e0051f..90b91ed924264 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -48,7 +48,7 @@ Installing the latest NeMo release is a simple pip install. .. code-block:: bash - pip install nemo_toolkit[all] + pip install nemo_toolkit[all]==1.0.0a1 To install the main branch from GitHub: @@ -56,6 +56,11 @@ To install the main branch from GitHub: python -m pip install git+https://github.com/NVIDIA/NeMo.git@main#egg=nemo_toolkit[all] +To install from a local clone of NeMo: + +.. code-block:: bash + + ./reinstall.sh # from cloned NeMo's git root For Docker users, the NeMo container is available on `NGC `_ From b5704f462b37cefc5347852efd6a41289784fd78 Mon Sep 17 00:00:00 2001 From: ericharper Date: Mon, 28 Sep 2020 10:39:48 -0600 Subject: [PATCH 47/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index 90b91ed924264..7b3b5d3e2fcf2 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -48,7 +48,7 @@ Installing the latest NeMo release is a simple pip install. .. code-block:: bash - pip install nemo_toolkit[all]==1.0.0a1 + pip install nemo_toolkit[all]==1.0.0b1 To install the main branch from GitHub: @@ -68,12 +68,12 @@ For Docker users, the NeMo container is available on .. code-block:: bash # TODO: update container tag when available - docker pull nvcr.io/nvidia/nemo:v0.11 + docker pull nvcr.io/nvidia/nemo:1.0.0b1 .. code-block:: bash - docker run --runtime=nvidia -it --rm -v --shm-size=16g -p 8888:8888 -p 6006:6006 --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/nemo:v0.11 + docker run --runtime=nvidia -it --rm -v --shm-size=16g -p 8888:8888 -p 6006:6006 --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/nemo:1.0.0b1 Experiment Manager ^^^^^^^^^^^^^^^^^^ From f1da81e880c676554d02acee9b886f5f6195e336 Mon Sep 17 00:00:00 2001 From: ericharper Date: Mon, 28 Sep 2020 11:22:48 -0600 Subject: [PATCH 48/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index 7b3b5d3e2fcf2..a687601f97207 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -63,7 +63,7 @@ To install from a local clone of NeMo: ./reinstall.sh # from cloned NeMo's git root For Docker users, the NeMo container is available on -`NGC `_ +`NGC `_: .. code-block:: bash @@ -73,7 +73,7 @@ For Docker users, the NeMo container is available on .. 
code-block:: bash - docker run --runtime=nvidia -it --rm -v --shm-size=16g -p 8888:8888 -p 6006:6006 --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/nemo:1.0.0b1 + docker run --runtime=nvidia -it --rm -v --shm-size=8g -p 8888:8888 -p 6006:6006 --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/nemo:1.0.0b1 Experiment Manager ^^^^^^^^^^^^^^^^^^ @@ -766,7 +766,7 @@ for a production-grade application. -------- Learn More -^^^^^^^^^^ +---------- Download pre-trained `ASR `_, From a1e3063868688347d80fa8f4291634a0d76578c9 Mon Sep 17 00:00:00 2001 From: ericharper Date: Thu, 1 Oct 2020 12:20:21 -0600 Subject: [PATCH 49/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index a687601f97207..a0a9461c927a9 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -44,7 +44,19 @@ across multiple domains. NeMo models are also fully compatible with the PyTorch Installing NeMo ^^^^^^^^^^^^^^^ -Installing the latest NeMo release is a simple pip install. +Before installing NeMo, please install Cython first. + +.. code-block:: bash + + pip install Cython + +For ASR and TTS models, also install these linux utilities. + +.. code-block:: bash + + apt-get update && apt-get install -y libsndfile1 ffmpeg + +Then installing the latest NeMo release is a simple pip install. .. code-block:: bash @@ -63,14 +75,12 @@ To install from a local clone of NeMo: ./reinstall.sh # from cloned NeMo's git root For Docker users, the NeMo container is available on -`NGC `_: +`NGC `_. .. code-block:: bash - # TODO: update container tag when available docker pull nvcr.io/nvidia/nemo:1.0.0b1 - .. code-block:: bash docker run --runtime=nvidia -it --rm -v --shm-size=8g -p 8888:8888 -p 6006:6006 --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/nemo:1.0.0b1 From 3d3882d508ebc243decb95fc90bc61ea1b63b2f9 Mon Sep 17 00:00:00 2001 From: ericharper Date: Mon, 5 Oct 2020 09:51:17 -0600 Subject: [PATCH 50/51] updated Signed-off-by: ericharper --- docs/source/conversational_ai.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/conversational_ai.rst b/docs/source/conversational_ai.rst index a0a9461c927a9..cbcf42a8ef17c 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/conversational_ai.rst @@ -79,7 +79,7 @@ For Docker users, the NeMo container is available on .. code-block:: bash - docker pull nvcr.io/nvidia/nemo:1.0.0b1 + docker pull nvcr.io/nvidia/nemo:v1.0.0b1 .. code-block:: bash From da4f8f158fed6fd79e4973683bfbf7700bb3d775 Mon Sep 17 00:00:00 2001 From: William Falcon Date: Wed, 7 Oct 2020 14:20:19 -0400 Subject: [PATCH 51/51] doc clean up --- .../{conversational_ai.rst => asr_tts.rst} | 61 +++++++++++-------- docs/source/index.rst | 6 +- 2 files changed, 38 insertions(+), 29 deletions(-) rename docs/source/{conversational_ai.rst => asr_tts.rst} (97%) diff --git a/docs/source/conversational_ai.rst b/docs/source/asr_tts.rst similarity index 97% rename from docs/source/conversational_ai.rst rename to docs/source/asr_tts.rst index cbcf42a8ef17c..5d47ccdc8e5a8 100644 --- a/docs/source/conversational_ai.rst +++ b/docs/source/asr_tts.rst @@ -1,5 +1,13 @@ +######### +ASR & TTS +######### +These are amazing ecosystems to help with Automatic Speech Recognition (ASR) and Text to speech (TTS). 
+ +---- + +**** NeMo -==== +**** `NVIDIA NeMo `_ is a toolkit for building new State-of-the-Art Conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR), @@ -17,7 +25,7 @@ mixed-precision training. ---------- NeMo Models ------------ +=========== NeMo Models contain everything needed to train and reproduce state of the art Conversational AI research and applications, including: @@ -42,7 +50,7 @@ NeMo models all have the same look and feel. This makes it easy to do Conversati across multiple domains. NeMo models are also fully compatible with the PyTorch ecosystem. Installing NeMo -^^^^^^^^^^^^^^^ +--------------- Before installing NeMo, please install Cython first. @@ -86,7 +94,7 @@ For Docker users, the NeMo container is available on docker run --runtime=nvidia -it --rm -v --shm-size=8g -p 8888:8888 -p 6006:6006 --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/nemo:1.0.0b1 Experiment Manager -^^^^^^^^^^^^^^^^^^ +------------------ NeMo's Experiment Manager leverages PyTorch Lightning for model checkpointing, TensorBoard Logging, and Weights and Biases logging. The Experiment Manager is included by default @@ -115,7 +123,7 @@ Optionally launch Tensorboard to view training results in ./nemo_experiments (by -------- Automatic Speech Recognition (ASR) ----------------------------------- +================================== Everything needed to train Convolutional ASR models is included with NeMo. NeMo supports multiple Speech Recognition architectures, including Jasper and QuartzNet. @@ -136,7 +144,7 @@ See this `asr notebook `_ @@ -208,7 +216,7 @@ including the PyTorch Lightning Trainer, customizable from the command line. Using State-Of-The-Art Pre-trained ASR Model -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +-------------------------------------------- Transcribe audio with QuartzNet model pretrained on ~3300 hours of audio. @@ -228,7 +236,7 @@ To see the available pretrained checkpoints: EncDecCTCModel.list_available_models() NeMo ASR Model Under the Hood -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +----------------------------- Any aspect of ASR training or model architecture design can easily be customized with PyTorch Lightning since every NeMo model is a Lightning Module. @@ -269,7 +277,7 @@ with PyTorch Lightning since every NeMo model is a Lightning Module. return {'loss': loss_value, 'log': tensorboard_logs} Neural Types in NeMo ASR -^^^^^^^^^^^^^^^^^^^^^^^^ +------------------------ NeMo Models and Neural Modules come with Neural Type checking. Neural type checking is extremely useful when combining many different neural @@ -299,7 +307,7 @@ network architectures for a production-grade application. -------- Natural Language Processing (NLP) ---------------------------------- +================================= Everything needed to finetune BERT-like language models for NLP tasks is included with NeMo. `NeMo NLP Models `_ @@ -319,7 +327,7 @@ that are included with NeMo: - `Punctuation and Capitalization `_ Named Entity Recognition (NER) -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +------------------------------ NER (or more generally token classifcation) is the NLP task of detecting and classifying key information (entities) in text. This task is very popular in Healthcare and Finance. 
In finance, for example, it can be important to identify @@ -328,7 +336,7 @@ See this `NER notebook `_ @@ -421,9 +429,10 @@ Hydra makes every aspect of the NeMo model, including the PyTorch Lightning Trai trainer.max_epochs=5 \ trainer.gpus=[0,1] +----------- Tokenizers -^^^^^^^^^^ +========== Tokenization is the process of converting natural langauge text into integer arrays which can be used for machine learning. @@ -445,7 +454,7 @@ See this `tokenizer notebook `_ @@ -575,7 +584,7 @@ Audio Generators: Specify TTS Model Configurations with YAML File -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +----------------------------------------------- ..note NeMo Models and PyTorch Lightning Trainer can be fully configured from .yaml files using Hydra. @@ -607,7 +616,7 @@ Specify TTS Model Configurations with YAML File ... Developing TTS Model From Scratch -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +--------------------------------- `tts/glow_tts.py `_ @@ -635,7 +644,7 @@ Hydra makes every aspect of the NeMo model, including the PyTorch Lightning Trai ..note Training NeMo TTTs models from scratch take days/weeks so it is highly recommended to use multiple GPUs and multiple nodes with the PyTorch Lightning Trainer. Using State-Of-The-Art Pre-trained TTS Model -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +-------------------------------------------- Generate speech using models trained on `LJSpeech `, around 24 hours of single speaker data. @@ -678,7 +687,7 @@ To see the available pretrained checkpoints: WaveGlowModel.list_available_models() NeMo TTS Model Under the Hood -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +----------------------------- Any aspect of TTS training or model architecture design can easily be customized with PyTorch Lightning since every NeMo model is a LightningModule. @@ -751,7 +760,7 @@ be customized with PyTorch Lightning since every NeMo model is a LightningModule ... Neural Types in NeMo TTS -^^^^^^^^^^^^^^^^^^^^^^^^ +------------------------ NeMo Models and Neural Modules come with Neural Type checking. Neural type checking is extremely useful when combining many different neural network architectures @@ -776,7 +785,7 @@ for a production-grade application. -------- Learn More ----------- +========== Download pre-trained `ASR `_, diff --git a/docs/source/index.rst b/docs/source/index.rst index 9fc966ef0de28..39995447b6546 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -109,10 +109,10 @@ PyTorch Lightning Documentation .. toctree:: :maxdepth: 1 - :name: Conversational AI - :caption: Conversational AI + :name: Partner Domain Frameworks + :caption: Partner Domain Frameworks - conversational_ai + asr_tts .. toctree:: :maxdepth: 1