Merge branch 'main' into dialogue_state_tracking_refactor
Zhilin123 committed Apr 28, 2022
2 parents f7f3177 + f776442 commit fb32ff0
Showing 416 changed files with 57,420 additions and 40,695 deletions.
3 changes: 3 additions & 0 deletions .github/workflows/import-test.yml
@@ -2,6 +2,9 @@ name: CI-Import-Check

 on:
   push:
+  pull_request:
+    paths:
+      - "**"

jobs:
ci-import-check:
6 changes: 1 addition & 5 deletions Dockerfile
@@ -56,10 +56,6 @@ WORKDIR /tmp/nemo
COPY requirements .
RUN for f in $(ls requirements*.txt); do pip install --disable-pip-version-check --no-cache-dir -r $f; done

-# install nemo_text_processing dependencies
-COPY nemo_text_processing /tmp/nemo/nemo_text_processing/
-RUN /bin/bash /tmp/nemo/nemo_text_processing/setup.sh

# install k2, skip if installation fails
COPY scripts /tmp/nemo/scripts/
RUN /bin/bash /tmp/nemo/scripts/speech_recognition/k2/setup.sh; exit 0
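The trailing `; exit 0` makes the k2 step best-effort: the RUN instruction reports success even when setup.sh fails. A minimal stand-alone sketch of the same pattern (the function name and messages are illustrative, not from this commit):

```shell
# Best-effort step: the install may fail, but the script continues
# and the step as a whole still exits 0.
install_k2() {
  return 1   # stand-in for a failing k2 setup.sh
}
k2_status=ok
install_k2 || k2_status=skipped
echo "k2 install: $k2_status"
```

Note that `cmd; exit 0` masks every failure in the step, while `cmd || <fallback>` scopes the tolerance to a single command, which is usually the safer pattern.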
@@ -70,7 +66,7 @@ COPY . .

# start building the final container
FROM nemo-deps as nemo
-ARG NEMO_VERSION=1.8.0
+ARG NEMO_VERSION=1.9.0

# Check that NEMO_VERSION is set. Build will fail without this. Expose NEMO and base container
# version information as runtime environment variable for introspection purposes
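The check described in the comment sits below this excerpt; a hypothetical sketch of such a build-time guard, using the POSIX `${VAR:?message}` expansion, which aborts the shell with the message when the variable is unset or empty:

```shell
# Hypothetical guard, not the actual Dockerfile line: abort unless
# NEMO_VERSION is set, then expose it for runtime introspection.
NEMO_VERSION=1.9.0          # in the real build this comes from ARG
: "${NEMO_VERSION:?NEMO_VERSION is not set - build cannot continue}"
export NEMO_VERSION
echo "building NeMo ${NEMO_VERSION}"
```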
269 changes: 192 additions & 77 deletions Jenkinsfile

Large diffs are not rendered by default.

22 changes: 21 additions & 1 deletion README.rst
@@ -68,7 +68,7 @@ Key Features
* `Information retrieval <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/information_retrieval.html>`_
* `Entity Linking <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/entity_linking.html>`_
* `Dialogue State Tracking <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/sgd_qa.html>`_
-* `Prompt Tuning <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/megatron_finetuning.html#prompt-tuning>`_
+* `Prompt Tuning <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/prompt_learning.html>`_
* `NGC collection of pre-trained NLP models. <https://ngc.nvidia.com/catalog/collections/nvidia:nemo_nlp>`_
* `Speech synthesis (TTS) <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/tts/intro.html#>`_
* Spectrogram generation: Tacotron2, GlowTTS, TalkNet, FastPitch, FastSpeech2, Mixer-TTS, Mixer-TTS-X
@@ -123,6 +123,26 @@ FAQ can be found on NeMo's `Discussions board <https://github.com/NVIDIA/NeMo/di
Installation
------------

Conda
~~~~~

We recommend installing NeMo in a fresh Conda environment.

.. code-block:: bash

    conda create --name nemo python==3.8
    conda activate nemo
Install PyTorch using their `configurator <https://pytorch.org/get-started/locally/>`_.

.. code-block:: bash

    conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

.. note::

    The command used to install PyTorch may depend on your system.

Pip
~~~
Use this installation mode if you want the latest released version.
23 changes: 19 additions & 4 deletions docs/source/_static/css/custom.css
@@ -1,3 +1,11 @@
+/* Import the Roboto Thin Font */
+@import url('https://fonts.googleapis.com/css2?family=Roboto:wght@400&display=swap');
+
+body {
+    font-size: 100%;
+    font-family: 'Roboto', sans-serif;
+}


/* Width of template */

@@ -13,21 +21,21 @@ h1
{
color: #76b900;
text-align: center;
-background-color: #333333;
+background-color: #ffffff;
}

h2
{
color: #ffffff;
-background-color: #76b900;
+background-color: #ffffff; /* #76b900 */
Padding: 5px;
}

h3
{
padding-top: 0px;
-border-top: solid 3px #76b900;
-border-bottom: solid 3px #76b900;
+border-top: solid 3px #000000; /* #76b900 */
+border-bottom: solid 3px #000000; /* #76b900 */
}

p
@@ -197,3 +205,10 @@ thead td
{
margin-top: 50px;
}


+/* Logo */
+.navbar-brand-box {
+background-color: #ffffff;
+}

1 change: 1 addition & 0 deletions docs/source/asr/data/benchmark_zh.csv
@@ -1,3 +1,4 @@
Model,Model Base Class,Model Card
stt_zh_citrinet_512,EncDecCTCModel,"https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_zh_citrinet_512"
stt_zh_citrinet_1024_gamma_0_25,EncDecCTCModel,"https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_zh_citrinet_1024_gamma_0_25"
+stt_zh_conformer_transducer_large,EncDecCTCModel,"https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_zh_conformer_transducer_large"
6 changes: 3 additions & 3 deletions docs/source/asr/datasets.rst
@@ -74,7 +74,7 @@ are located in the remaining directories in an ``audio`` subdirectory.

.. code-block:: bash

-    cd <nemo_root>/scripts
+    cd <nemo_root>/scripts/dataset_processing
     python fisher_audio_to_wav.py \
         --data_root=<fisher_root> --dest_root=<conversion_target_dir>
@@ -235,7 +235,7 @@ Conversion to Tarred Datasets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can easily convert your existing NeMo-compatible ASR datasets using the
-`conversion script here <https://github.com/NVIDIA/NeMo/blob/v1.0.2/scripts/speech_recognition/convert_to_tarred_audio_dataset.py>`_.
+`conversion script here <https://github.com/NVIDIA/NeMo/blob/main/scripts/speech_recognition/convert_to_tarred_audio_dataset.py>`_.

.. code::
@@ -320,4 +320,4 @@ The parameter train_ds.bucketing_strategy can be set to specify one of these str
The fully_randomized strategy would have lower speedup than synced_randomized but may give better accuracy.

Bucketing may improve the training speed by more than 2x, but may slightly affect the final accuracy of the model. Training for more epochs and using the 'synced_randomized' strategy can help close this gap.
Currently, the bucketing feature is only supported for tarred datasets.
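As a command-line sketch only (the script name and exact Hydra keys below are assumptions for illustration, not taken from this diff), the bucketing options named above would be selected via overrides of the training config:

```shell
# Illustrative training invocation (script path and config keys assumed):
# bucketing requires a tarred dataset, and the strategy is chosen per run.
train_cmd='python speech_to_text_ctc.py
  model.train_ds.is_tarred=true
  model.train_ds.tarred_audio_filepaths=<path_to_tarred_audio>
  model.train_ds.bucketing_strategy=synced_randomized'
echo "$train_cmd"
```

Per the paragraph above, `fully_randomized` may trade some of the speedup for accuracy, so `synced_randomized` plus more training epochs is the suggested choice when accuracy matters.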
19 changes: 2 additions & 17 deletions docs/source/asr/intro.rst
@@ -39,21 +39,6 @@ The full documentation tree is as follows:
results
configs
api
resources

Resources and Documentation
---------------------------

Hands-on speech recognition tutorial notebooks can be found under `the ASR tutorials folder <https://github.com/NVIDIA/NeMo/tree/v1.0.2/tutorials/asr/>`_.
If you are a beginner to NeMo, consider trying out the `ASR with NeMo <https://github.com/NVIDIA/NeMo/tree/v1.0.2/tutorials/asr/ASR_with_NeMo.ipynb>`_ tutorial.
This and most other tutorials can be run on Google Colab by specifying the link to the notebooks' GitHub pages on Colab.

If you are looking for information about a particular ASR model, or would like to find out more about the model
architectures available in the `nemo_asr` collection, refer to the :doc:`Models <./models>` section.

NeMo includes preprocessing scripts for several common ASR datasets. The :doc:`Datasets <./datasets>` section contains instructions on
running those scripts. It also includes guidance for creating your own NeMo-compatible dataset, if you have your own data.

Information about how to load model checkpoints (either local files or pretrained ones from NGC), as well as a list of the checkpoints
available on NGC are located on the :doc:`Checkpoints <./results>` section.

Documentation regarding the configuration files specific to the ``nemo_asr`` models can be found on the :doc:`Configuration Files <./configs>` section.
.. include:: resources.rst
17 changes: 17 additions & 0 deletions docs/source/asr/resources.rst
@@ -0,0 +1,17 @@
Resources and Documentation
---------------------------

Hands-on speech recognition tutorial notebooks can be found under `the ASR tutorials folder <https://github.com/NVIDIA/NeMo/tree/v1.0.2/tutorials/asr/>`_.
If you are a beginner to NeMo, consider trying out the `ASR with NeMo <https://github.com/NVIDIA/NeMo/tree/v1.0.2/tutorials/asr/ASR_with_NeMo.ipynb>`_ tutorial.
This and most other tutorials can be run on Google Colab by specifying the link to the notebooks' GitHub pages on Colab.

If you are looking for information about a particular ASR model, or would like to find out more about the model
architectures available in the `nemo_asr` collection, refer to the :doc:`Models <./models>` section.

NeMo includes preprocessing scripts for several common ASR datasets. The :doc:`Datasets <./datasets>` section contains instructions on
running those scripts. It also includes guidance for creating your own NeMo-compatible dataset, if you have your own data.

Information about how to load model checkpoints (either local files or pretrained ones from NGC), as well as a list of the checkpoints
available on NGC are located on the :doc:`Checkpoints <./results>` section.

Documentation regarding the configuration files specific to the ``nemo_asr`` models can be found on the :doc:`Configuration Files <./configs>` section.
25 changes: 2 additions & 23 deletions docs/source/asr/speaker_diarization/intro.rst
@@ -26,27 +26,6 @@ The full documentation tree is as follows:
results
configs
api
resources

Resource and Documentation Guide
--------------------------------

Hands-on speaker diarization tutorial notebooks can be found under ``<NeMo_git_root>/tutorials/speaker_tasks/``.

There are tutorials for performing inference using :ref:`MarbleNet_model` and :ref:`TitaNet_model`,
and how one can get ASR transcriptions combined with Speaker labels along with voice activity time stamps with NeMo asr collections.

Most of the tutorials can be run on Google Colab by specifying the link to the notebooks' GitHub pages on Colab.

If you are looking for information about a particular model used for speaker diarization inference, or would like to find out more about the model
architectures available in the `nemo_asr` collection, check out the :doc:`Models <./models>` page.

Documentation on dataset preprocessing can be found on the :doc:`Datasets <./datasets>` page.
NeMo includes preprocessing scripts for several common ASR datasets, and this page contains instructions on running
those scripts.
It also includes guidance for creating your own NeMo-compatible dataset, if you have your own data.

Information about how to load model checkpoints (either local files or pretrained ones from NGC), perform inference, as well as a list
of the checkpoints available on NGC are located on the :doc:`Checkpoints <./results>` page.

Documentation for configuration files specific to the ``nemo_asr`` models can be found on the
:doc:`Configuration Files <./configs>` page.
.. include:: resources.rst
24 changes: 24 additions & 0 deletions docs/source/asr/speaker_diarization/resources.rst
@@ -0,0 +1,24 @@

Resource and Documentation Guide
--------------------------------

Hands-on speaker diarization tutorial notebooks can be found under ``<NeMo_git_root>/tutorials/speaker_tasks/``.

There are tutorials for performing inference using :ref:`MarbleNet_model` and :ref:`TitaNet_model`,
and how one can get ASR transcriptions combined with Speaker labels along with voice activity time stamps with NeMo asr collections.

Most of the tutorials can be run on Google Colab by specifying the link to the notebooks' GitHub pages on Colab.

If you are looking for information about a particular model used for speaker diarization inference, or would like to find out more about the model
architectures available in the `nemo_asr` collection, check out the :doc:`Models <./models>` page.

Documentation on dataset preprocessing can be found on the :doc:`Datasets <./datasets>` page.
NeMo includes preprocessing scripts for several common ASR datasets, and this page contains instructions on running
those scripts.
It also includes guidance for creating your own NeMo-compatible dataset, if you have your own data.

Information about how to load model checkpoints (either local files or pretrained ones from NGC), perform inference, as well as a list
of the checkpoints available on NGC are located on the :doc:`Checkpoints <./results>` page.

Documentation for configuration files specific to the ``nemo_asr`` models can be found on the
:doc:`Configuration Files <./configs>` page.
23 changes: 2 additions & 21 deletions docs/source/asr/speaker_recognition/intro.rst
@@ -18,25 +18,6 @@ The full documentation tree:
datasets
results
api
resources

Resource and Documentation Guide
--------------------------------

Hands-on speaker recognition tutorial notebooks can be found under
`the speaker recognition tutorials folder <https://github.com/NVIDIA/NeMo/tree/main/tutorials/speaker_tasks/>`_. This and most other tutorials can be run on Google Colab by specifying the link to the notebooks' GitHub pages on Colab.

If you are looking for information about a particular SpeakerNet model, or would like to find out more about the model
architectures available in the ``nemo_asr`` collection, check out the :doc:`Models <./models>` page.

Documentation on dataset preprocessing can be found on the :doc:`Datasets <./datasets>` page.
NeMo includes preprocessing and other scripts for speaker_recognition in <nemo/scripts/speaker_tasks/> folder, and this page contains instructions on running
those scripts. It also includes guidance for creating your own NeMo-compatible dataset, if you have your own data.

Information about how to load model checkpoints (either local files or pretrained ones from NGC), perform inference, as well as a list
of the checkpoints available on NGC are located on the :doc:`Checkpoints <./results>` page.

Documentation for configuration files specific to the ``nemo_asr`` models can be found on the
:doc:`Configuration Files <./configs>` page.


For a clear step-by-step walkthrough, we advise you to refer to the tutorials in `this folder <https://github.com/NVIDIA/NeMo/tree/main/tutorials/speaker_tasks/>`_.
.. include:: resources.rst
22 changes: 22 additions & 0 deletions docs/source/asr/speaker_recognition/resources.rst
@@ -0,0 +1,22 @@

Resource and Documentation Guide
--------------------------------

Hands-on speaker recognition tutorial notebooks can be found under
`the speaker recognition tutorials folder <https://github.com/NVIDIA/NeMo/tree/main/tutorials/speaker_tasks/>`_. This and most other tutorials can be run on Google Colab by specifying the link to the notebooks' GitHub pages on Colab.

If you are looking for information about a particular SpeakerNet model, or would like to find out more about the model
architectures available in the ``nemo_asr`` collection, check out the :doc:`Models <./models>` page.

Documentation on dataset preprocessing can be found on the :doc:`Datasets <./datasets>` page.
NeMo includes preprocessing and other scripts for speaker_recognition in <nemo/scripts/speaker_tasks/> folder, and this page contains instructions on running
those scripts. It also includes guidance for creating your own NeMo-compatible dataset, if you have your own data.

Information about how to load model checkpoints (either local files or pretrained ones from NGC), perform inference, as well as a list
of the checkpoints available on NGC are located on the :doc:`Checkpoints <./results>` page.

Documentation for configuration files specific to the ``nemo_asr`` models can be found on the
:doc:`Configuration Files <./configs>` page.


For a clear step-by-step walkthrough, we advise you to refer to the tutorials in `this folder <https://github.com/NVIDIA/NeMo/tree/main/tutorials/speaker_tasks/>`_.
23 changes: 2 additions & 21 deletions docs/source/asr/speech_classification/intro.rst
@@ -23,25 +23,6 @@ The full documentation tree is as follows:
datasets
results
configs
resources.rst


Resource and Documentation Guide
--------------------------------

Hands-on speech classification tutorial notebooks can be found under ``<NeMo_git_repo>/tutorials/asr/``.
There are training and offline & online microphone inference tutorials for Speech Command Detection and Voice Activity Detection tasks.
This and most other tutorials can be run on Google Colab by specifying the link to the notebooks' GitHub pages on Colab.

If you are looking for information about a particular Speech Classification model or would like to find out more about the model
architectures available in the `nemo_asr` collection, check out the :doc:`Models <./models>` page.

Documentation on dataset preprocessing can be found on the :doc:`Datasets <./datasets>` page.
NeMo includes preprocessing scripts for several common ASR datasets, and this page contains instructions on running
those scripts.
It also includes guidance for creating your own NeMo-compatible dataset, if you have your own data.

Information about how to load model checkpoints (either local files or pretrained ones from NGC), perform inference, as well as a list
of the checkpoints available on NGC are located on the :doc:`Checkpoints <./results>` page.

Documentation for configuration files specific to the ``nemo_asr`` models can be found on the
:doc:`Configuration Files <./configs>` page.
.. include:: resources.rst
20 changes: 20 additions & 0 deletions docs/source/asr/speech_classification/resources.rst
@@ -0,0 +1,20 @@
Resource and Documentation Guide
--------------------------------

Hands-on speech classification tutorial notebooks can be found under ``<NeMo_git_repo>/tutorials/asr/``.
There are training and offline & online microphone inference tutorials for Speech Command Detection and Voice Activity Detection tasks.
This and most other tutorials can be run on Google Colab by specifying the link to the notebooks' GitHub pages on Colab.

If you are looking for information about a particular Speech Classification model or would like to find out more about the model
architectures available in the `nemo_asr` collection, check out the :doc:`Models <./models>` page.

Documentation on dataset preprocessing can be found on the :doc:`Datasets <./datasets>` page.
NeMo includes preprocessing scripts for several common ASR datasets, and this page contains instructions on running
those scripts.
It also includes guidance for creating your own NeMo-compatible dataset, if you have your own data.

Information about how to load model checkpoints (either local files or pretrained ones from NGC), perform inference, as well as a list
of the checkpoints available on NGC are located on the :doc:`Checkpoints <./results>` page.

Documentation for configuration files specific to the ``nemo_asr`` models can be found on the
:doc:`Configuration Files <./configs>` page.
