diff --git a/docs/source/asr_tts.rst b/docs/source/asr_tts.rst new file mode 100644 index 0000000000000..5d47ccdc8e5a8 --- /dev/null +++ b/docs/source/asr_tts.rst @@ -0,0 +1,808 @@ +######### +ASR & TTS +######### +These are amazing ecosystems to help with Automatic Speech Recognition (ASR) and Text to speech (TTS). + +---- + +**** +NeMo +**** + +`NVIDIA NeMo `_ is a toolkit for building new State-of-the-Art +Conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR), +Natural Language Processing (NLP), and Text-to-Speech (TTS) models. Each collection consists of +prebuilt modules that include everything needed to train on your data. +Every module can easily be customized, extended, and composed to create new Conversational AI +model architectures. + +Conversational AI architectures are typically very large and require a lot of data and compute +for training. NeMo uses PyTorch Lightning for easy and performant multi-GPU/multi-node +mixed-precision training. + +.. note:: Every NeMo model is a LightningModule that comes equipped with all supporting infrastructure for training and reproducibility. + +---------- + +NeMo Models +=========== + +NeMo Models contain everything needed to train and reproduce state of the art Conversational AI +research and applications, including: + +- neural network architectures +- datasets/data loaders +- data preprocessing/postprocessing +- data augmentors +- optimizers and schedulers +- tokenizers +- language models + +NeMo uses `Hydra `_ for configuring both NeMo models and the PyTorch Lightning Trainer. +Depending on the domain and application, many different AI libraries will have to be configured +to build the application. Hydra makes it easy to bring all of these libraries together +so that each can be configured from .yaml or the Hydra CLI. + +.. note:: Every NeMo model has an example configuration file and a corresponding script that contains all configurations needed for training. 
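The dotted-key configuration style described above can be pictured with a small, library-free sketch. This is an illustration only, not Hydra's parser: `apply_overrides` is a hypothetical helper, the config keys are made-up examples, and the handling of the leading `+` (treated here as "add this key") is a simplifying assumption.

```python
from copy import deepcopy


def apply_overrides(config, overrides):
    """Return a copy of a nested dict with `key.path=value` overrides applied.

    Hypothetical helper for illustration; it only mimics the dotted-key idea.
    """
    result = deepcopy(config)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        # Assumption: a leading "+" marks a key that is added to the config.
        keys = dotted_key.lstrip("+").split(".")
        node = result
        for key in keys[:-1]:
            # Walk down the nested dicts, creating levels that do not exist yet.
            node = node.setdefault(key, {})
        # Keep values as strings unless they consist only of digits.
        node[keys[-1]] = int(raw_value) if raw_value.isdigit() else raw_value
    return result


base = {"trainer": {"gpus": 0, "max_epochs": 5}, "model": {"train_ds": {"batch_size": 32}}}
merged = apply_overrides(
    base, ["trainer.gpus=4", "+trainer.precision=16", "model.train_ds.batch_size=64"]
)
print(merged["trainer"])
```

The base dictionary is left untouched, which mirrors the idea that a .yaml file holds the defaults while command-line overrides apply to a single run.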
+ +The end result of using NeMo, PyTorch Lightning, and Hydra is that +NeMo models all have the same look and feel. This makes it easy to do Conversational AI research +across multiple domains. NeMo models are also fully compatible with the PyTorch ecosystem. + +Installing NeMo +--------------- + +Before installing NeMo, please install Cython first. + +.. code-block:: bash + + pip install Cython + +For ASR and TTS models, also install these Linux utilities. + +.. code-block:: bash + + apt-get update && apt-get install -y libsndfile1 ffmpeg + +Then install the latest NeMo release with pip. + +.. code-block:: bash + + pip install nemo_toolkit[all]==1.0.0b1 + +To install the main branch from GitHub: + +.. code-block:: bash + + python -m pip install git+https://github.com/NVIDIA/NeMo.git@main#egg=nemo_toolkit[all] + +To install from a local clone of NeMo: + +.. code-block:: bash + + ./reinstall.sh # from cloned NeMo's git root + +For Docker users, the NeMo container is available on +`NGC `_. + +.. code-block:: bash + + docker pull nvcr.io/nvidia/nemo:v1.0.0b1 + +.. code-block:: bash + + docker run --runtime=nvidia -it --rm -v --shm-size=8g -p 8888:8888 -p 6006:6006 --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/nemo:v1.0.0b1 + +Experiment Manager +------------------ + +NeMo's Experiment Manager leverages PyTorch Lightning for model checkpointing, +TensorBoard logging, and Weights and Biases logging. The Experiment Manager is included by default +in all NeMo example scripts. + +.. code-block:: python + + exp_manager(trainer, cfg.get("exp_manager", None)) + +It is configurable via .yaml with Hydra. + +.. code-block:: yaml + + exp_manager: + exp_dir: null + name: *name + create_tensorboard_logger: True + create_checkpoint_callback: True + +Optionally launch TensorBoard to view training results in ./nemo_experiments (by default). + +.. 
code-block:: bash + + tensorboard --bind_all --logdir nemo_experiments + +-------- + +Automatic Speech Recognition (ASR) +================================== + +Everything needed to train Convolutional ASR models is included with NeMo. +NeMo supports multiple Speech Recognition architectures, including Jasper and QuartzNet. +`NeMo Speech Models `_ +can be trained from scratch on custom datasets or +fine-tuned using pre-trained checkpoints trained on thousands of hours of audio +that can be restored for immediate use. + +Some typical ASR tasks are included with NeMo: + +- `Audio transcription `_ +- `Byte Pair/Word Piece Training `_ +- `Speech Commands `_ +- `Voice Activity Detection `_ +- `Speaker Recognition `_ + +See this `asr notebook `_ +for a full tutorial on doing ASR with NeMo, PyTorch Lightning, and Hydra. + +Specify ASR Model Configurations with YAML File +----------------------------------------------- + +NeMo Models and the PyTorch Lightning Trainer can be fully configured from .yaml files using Hydra. + +See this `asr config `_ +for the entire speech to text .yaml file. + +.. code-block:: yaml + + # configure the PyTorch Lightning Trainer + trainer: + gpus: 0 # number of gpus + max_epochs: 5 + max_steps: null # computed at runtime if not set + num_nodes: 1 + distributed_backend: ddp + ... + # configure the ASR model + model: + ... + encoder: + _target_: nemo.collections.asr.modules.ConvASREncoder + params: + feat_in: *n_mels + activation: relu + conv_mask: true + + jasper: + - filters: 128 + repeat: 1 + kernel: [11] + stride: [1] + dilation: [1] + dropout: *dropout + ... + # all other configuration, data, optimizer, preprocessor, etc + ... + +Developing ASR Model From Scratch +--------------------------------- + +`speech_to_text.py `_ + +.. 
code-block:: python + + # hydra_runner calls hydra.main and is useful for multi-node experiments + @hydra_runner(config_path="conf", config_name="config") + def main(cfg): + trainer = Trainer(**cfg.trainer) + asr_model = EncDecCTCModel(cfg.model, trainer) + trainer.fit(asr_model) + + +Hydra makes every aspect of the NeMo model, +including the PyTorch Lightning Trainer, customizable from the command line. + +.. code-block:: bash + + python NeMo/examples/asr/speech_to_text.py --config-name=quartznet_15x5 \ + trainer.gpus=4 \ + trainer.max_epochs=128 \ + +trainer.precision=16 \ + model.train_ds.manifest_filepath=/librispeech-train-all.json \ + model.validation_ds.manifest_filepath=/librispeech-dev-other.json \ + model.train_ds.batch_size=64 \ + +model.validation_ds.num_workers=16 \ + +model.train_ds.num_workers=16 + +.. note:: Training NeMo ASR models can take days/weeks so it is highly recommended to use multiple GPUs and multiple nodes with the PyTorch Lightning Trainer. + + +Using State-Of-The-Art Pre-trained ASR Model +-------------------------------------------- + +Transcribe audio with QuartzNet model pretrained on ~3300 hours of audio. + +.. code-block:: python + + quartznet = EncDecCTCModel.from_pretrained('QuartzNet15x5Base-En') + + files = ['path/to/my.wav'] # file duration should be less than 25 seconds + + for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)): + print(f"Audio in {fname} was recognized as: {transcription}") + +To see the available pretrained checkpoints: + +.. code-block:: python + + EncDecCTCModel.list_available_models() + +NeMo ASR Model Under the Hood +----------------------------- + +Any aspect of ASR training or model architecture design can easily be customized +with PyTorch Lightning since every NeMo model is a Lightning Module. + +.. code-block:: python + + class EncDecCTCModel(ASRModel): + """Base class for encoder decoder CTC-based models.""" + ... 
+ @typecheck() + def forward(self, input_signal, input_signal_length): + processed_signal, processed_signal_len = self.preprocessor( + input_signal=input_signal, length=input_signal_length, + ) + # Spec augment is not applied during evaluation/testing + if self.spec_augmentation is not None and self.training: + processed_signal = self.spec_augmentation(input_spec=processed_signal) + encoded, encoded_len = self.encoder(audio_signal=processed_signal, length=processed_signal_len) + log_probs = self.decoder(encoder_output=encoded) + greedy_predictions = log_probs.argmax(dim=-1, keepdim=False) + return log_probs, encoded_len, greedy_predictions + + # PTL-specific methods + def training_step(self, batch, batch_nb): + audio_signal, audio_signal_len, transcript, transcript_len = batch + log_probs, encoded_len, predictions = self.forward( + input_signal=audio_signal, input_signal_length=audio_signal_len + ) + loss_value = self.loss( + log_probs=log_probs, targets=transcript, input_lengths=encoded_len, target_lengths=transcript_len + ) + wer_num, wer_denom = self._wer(predictions, transcript, transcript_len) + tensorboard_logs = { + 'train_loss': loss_value, + 'training_batch_wer': wer_num / wer_denom, + 'learning_rate': self._optimizer.param_groups[0]['lr'], + } + return {'loss': loss_value, 'log': tensorboard_logs} + +Neural Types in NeMo ASR +------------------------ + +NeMo Models and Neural Modules come with Neural Type checking. +Neural type checking is extremely useful when combining many different neural +network architectures for a production-grade application. + +.. 
code-block:: python + + @property + def input_types(self) -> Optional[Dict[str, NeuralType]]: + if hasattr(self.preprocessor, '_sample_rate'): + audio_eltype = AudioSignal(freq=self.preprocessor._sample_rate) + else: + audio_eltype = AudioSignal() + return { + "input_signal": NeuralType(('B', 'T'), audio_eltype), + "input_signal_length": NeuralType(tuple('B'), LengthsType()), + } + + @property + def output_types(self) -> Optional[Dict[str, NeuralType]]: + return { + "outputs": NeuralType(('B', 'T', 'D'), LogprobsType()), + "encoded_lengths": NeuralType(tuple('B'), LengthsType()), + "greedy_predictions": NeuralType(('B', 'T'), LabelsType()), + } + +-------- + +Natural Language Processing (NLP) +================================= + +Everything needed to fine-tune BERT-like language models for NLP tasks is included with NeMo. +`NeMo NLP Models `_ +include `HuggingFace Transformers `_ +and `NVIDIA Megatron-LM `_ BERT and Bio-Megatron models. +NeMo can also be used for pretraining BERT-based language models from HuggingFace. + +Any of the HuggingFace encoders or Megatron-LM encoders can easily be used for the NLP tasks +that are included with NeMo: + +- `GLUE Benchmark (All tasks) `_ +- `Intent Slot Classification `_ +- `Language Modeling (BERT Pretraining) `_ +- `Question Answering `_ +- `Text Classification `_ (including Sentiment Analysis) +- `Token Classification `_ (including Named Entity Recognition) +- `Punctuation and Capitalization `_ + +Named Entity Recognition (NER) +------------------------------ + +NER (or more generally token classification) is the NLP task of detecting and classifying key information (entities) in text. +This task is very popular in healthcare and finance. In finance, for example, it can be important to identify +geographical, geopolitical, organizational, person, event, and natural phenomenon entities. +See this `NER notebook `_ +for a full tutorial on doing NER with NeMo, PyTorch Lightning, and Hydra. 
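To make the token classification task described above concrete, here is a minimal sketch that groups per-token labels into entity spans. It assumes the common BIO tagging scheme and made-up label names (`ORG`, `LOC`); `group_entities` is a hypothetical helper, and the label format that NeMo models actually emit may differ.

```python
def group_entities(tokens, tags):
    """Collect (entity_text, entity_type) pairs from BIO-tagged tokens."""
    entities = []
    current_words, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A "B-" tag starts a new entity, closing any entity in progress.
            if current_words:
                entities.append((" ".join(current_words), current_type))
            current_words, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An "I-" tag of the same type continues the current entity.
            current_words.append(token)
        else:
            # "O" (or an inconsistent "I-" tag) closes the current entity.
            if current_words:
                entities.append((" ".join(current_words), current_type))
            current_words, current_type = [], None
    if current_words:
        entities.append((" ".join(current_words), current_type))
    return entities


tokens = ["Nvidia", "is", "a", "company", "in", "Santa", "Clara", "."]
tags = ["B-ORG", "O", "O", "O", "O", "B-LOC", "I-LOC", "O"]
print(group_entities(tokens, tags))
```

Each token receives exactly one label, which is why NER can be treated as a classification problem over tokens rather than over whole sentences.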
+ +Specify NER Model Configurations with YAML File +----------------------------------------------- + +.. note:: NeMo Models and the PyTorch Lightning Trainer can be fully configured from .yaml files using Hydra. + +See this `token classification config `_ +for the entire NER (token classification) .yaml file. + +.. code-block:: yaml + + # configure any argument of the PyTorch Lightning Trainer + trainer: + gpus: 1 # the number of gpus, 0 for CPU + num_nodes: 1 + max_epochs: 5 + ... + # configure any aspect of the token classification model here + model: + dataset: + data_dir: ??? # /path/to/data + class_balancing: null # choose from [null, weighted_loss]; weighted_loss weights the loss per class and may help with unbalanced classes + max_seq_length: 128 + ... + tokenizer: + tokenizer_name: ${model.language_model.pretrained_model_name} # or sentencepiece + vocab_file: null # path to vocab file + ... + # the language model can be from HuggingFace or Megatron-LM + language_model: + pretrained_model_name: bert-base-uncased + lm_checkpoint: null + ... + # the classifier for the downstream task + head: + num_fc_layers: 2 + fc_dropout: 0.5 + activation: 'relu' + ... + # all other configuration: train/val/test data, optimizer, experiment manager, etc + ... + +Developing NER Model From Scratch +--------------------------------- + +`token_classification.py `_ + +.. code-block:: python + + # hydra_runner calls hydra.main and is useful for multi-node experiments + @hydra_runner(config_path="conf", config_name="token_classification_config") + def main(cfg: DictConfig) -> None: + trainer = pl.Trainer(**cfg.trainer) + model = TokenClassificationModel(cfg.model, trainer=trainer) + trainer.fit(model) + +After training, we can do inference with the saved NER model using PyTorch Lightning. + +Inference from file: + +.. 
code-block:: python + + gpu = 1 if cfg.trainer.gpus != 0 else 0 + trainer = pl.Trainer(gpus=gpu) + model.set_trainer(trainer) + model.evaluate_from_file( + text_file=os.path.join(cfg.model.dataset.data_dir, cfg.model.validation_ds.text_file), + labels_file=os.path.join(cfg.model.dataset.data_dir, cfg.model.validation_ds.labels_file), + output_dir=exp_dir, + add_confusion_matrix=True, + normalize_confusion_matrix=True, + ) + +Or we can run inference on a few examples: + +.. code-block:: python + + queries = ['we bought four shirts from the nvidia gear store in santa clara.', 'Nvidia is a company in Santa Clara.'] + results = model.add_predictions(queries) + + for query, result in zip(queries, results): + logging.info(f'Query : {query}') + logging.info(f'Result: {result.strip()}\n') + +Hydra makes every aspect of the NeMo model, including the PyTorch Lightning Trainer, customizable from the command line. + +.. code-block:: bash + + python token_classification.py \ + model.language_model.pretrained_model_name=bert-base-cased \ + model.head.num_fc_layers=2 \ + model.dataset.data_dir=/path/to/my/data \ + trainer.max_epochs=5 \ + trainer.gpus=[0,1] + +----------- + +Tokenizers +========== + +Tokenization is the process of converting natural language text into integer arrays +which can be used for machine learning. +For NLP tasks, tokenization is an essential part of data preprocessing. +NeMo supports all BERT-like model tokenizers from +`HuggingFace's AutoTokenizer `_ +and also supports `Google's SentencePieceTokenizer `_ +which can be trained on custom data. + +To see the list of supported tokenizers: + +.. code-block:: python + + from nemo.collections import nlp as nemo_nlp + + nemo_nlp.modules.get_tokenizer_list() + +See this `tokenizer notebook `_ +for a full tutorial on using tokenizers in NeMo. + +Language Models +--------------- + +Language models are used to extract information from (tokenized) text. 
+Much of the state of the art in natural language processing is achieved +by fine-tuning pretrained language models on the downstream task. + +With NeMo, you can either `pretrain `_ +a BERT model on your data or use a pretrained language model from `HuggingFace Transformers `_ +or `NVIDIA Megatron-LM `_. + +To see the list of language models available in NeMo: + +.. code-block:: python + + nemo_nlp.modules.get_pretrained_lm_models_list(include_external=True) + +Easily switch between any language model in the above list by using `.get_lm_model`. + +.. code-block:: python + + nemo_nlp.modules.get_lm_model(pretrained_model_name='distilbert-base-uncased') + +See this `language model notebook `_ +for a full tutorial on using pretrained language models in NeMo. + +Using a Pre-trained NER Model +----------------------------- + +NeMo has pre-trained NER models that can be used +to get started with Token Classification right away. +Models are automatically downloaded from NGC, +cached locally to disk, +and loaded into GPU memory using the `.from_pretrained` method. + +.. code-block:: python + + # load pre-trained NER model + pretrained_ner_model = TokenClassificationModel.from_pretrained(model_name="NERModel") + + # define the list of queries for inference + queries = [ + 'we bought four shirts from the nvidia gear store in santa clara.', + 'Nvidia is a company.', + 'The Adventures of Tom Sawyer by Mark Twain is an 1876 novel about a young boy growing ' + + 'up along the Mississippi River.', + ] + results = pretrained_ner_model.add_predictions(queries) + + for query, result in zip(queries, results): + print() + print(f'Query : {query}') + print(f'Result: {result.strip()}\n') + +NeMo NER Model Under the Hood +----------------------------- + +Any aspect of NLP training or model architecture design can easily be customized with PyTorch Lightning +since every NeMo model is a Lightning Module. + +.. 
code-block:: python + + class TokenClassificationModel(ModelPT): + """ + Token Classification Model with BERT, applicable for tasks such as Named Entity Recognition + """ + ... + @typecheck() + def forward(self, input_ids, token_type_ids, attention_mask): + hidden_states = self.bert_model( + input_ids=input_ids, token_type_ids=token_type_ids, attention_mask=attention_mask + ) + logits = self.classifier(hidden_states=hidden_states) + return logits + + # PTL-specific methods + def training_step(self, batch, batch_idx): + """ + Lightning calls this inside the training loop with the data from the training dataloader + passed in as `batch`. + """ + input_ids, input_type_ids, input_mask, subtokens_mask, loss_mask, labels = batch + logits = self(input_ids=input_ids, token_type_ids=input_type_ids, attention_mask=input_mask) + + loss = self.loss(logits=logits, labels=labels, loss_mask=loss_mask) + tensorboard_logs = {'train_loss': loss, 'lr': self._optimizer.param_groups[0]['lr']} + return {'loss': loss, 'log': tensorboard_logs} + ... + +Neural Types in NeMo NLP +------------------------ + +NeMo Models and Neural Modules come with Neural Type checking. +Neural type checking is extremely useful when combining many different neural network architectures +for a production-grade application. + +.. code-block:: python + + @property + def input_types(self) -> Optional[Dict[str, NeuralType]]: + return self.bert_model.input_types + + @property + def output_types(self) -> Optional[Dict[str, NeuralType]]: + return self.classifier.output_types + +-------- + +Text-To-Speech (TTS) +==================== + +Everything needed to train TTS models and generate audio is included with NeMo. +`NeMo TTS Models `_ +can be trained from scratch on your own data or pretrained models can be downloaded +automatically. NeMo currently supports a two-step inference procedure. +First, a model is used to generate a mel spectrogram from text. +Second, a model is used to generate audio from a mel spectrogram. 
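For readers new to the term, the "mel" in mel spectrogram refers to a perceptual frequency scale. The sketch below uses one widely used conversion formula, 2595 * log10(1 + f / 700). Several variants of the mel scale exist, and the one used by NeMo's preprocessor is not stated here, so treat the formula, the helper names, and the 8000 Hz upper limit as assumptions for illustration.

```python
import math


def hz_to_mel(frequency_hz):
    """Convert a frequency in Hz to mels using 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + frequency_hz / 700.0)


def mel_to_hz(mel):
    """Invert hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(low_hz, high_hz, n_mels):
    """Return n_mels + 2 band edges in Hz, evenly spaced on the mel scale."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (n_mels + 1)
    return [mel_to_hz(low_mel + i * step) for i in range(n_mels + 2)]


# Hypothetical settings: 4 mel bands between 0 Hz and 8000 Hz.
edges = mel_band_edges(0.0, 8000.0, n_mels=4)
print([round(edge) for edge in edges])
```

Because the edges are evenly spaced in mels rather than in Hz, the bands grow wider as frequency increases, which is the property a mel spectrogram uses to give low frequencies finer resolution.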
+ +Mel Spectrogram Generators: + +- `Tacotron 2 `_ +- `Glow-TTS `_ + +Audio Generators: + +- Griffin-Lim +- `WaveGlow `_ +- `SqueezeWave `_ + + +Specify TTS Model Configurations with YAML File +----------------------------------------------- + +.. note:: NeMo Models and the PyTorch Lightning Trainer can be fully configured from .yaml files using Hydra. + +`tts/conf/glow_tts.yaml `_ + +.. code-block:: yaml + + # configure the PyTorch Lightning Trainer + trainer: + gpus: -1 # number of gpus (-1 uses all available gpus) + max_epochs: 350 + num_nodes: 1 + distributed_backend: ddp + ... + + # configure the TTS model + model: + ... + encoder: + _target_: nemo.collections.tts.modules.glow_tts.TextEncoder + params: + n_vocab: 148 + out_channels: *n_mels + hidden_channels: 192 + filter_channels: 768 + filter_channels_dp: 256 + ... + # all other configuration, data, optimizer, parser, preprocessor, etc + ... + +Developing TTS Model From Scratch +--------------------------------- + +`tts/glow_tts.py `_ + +.. code-block:: python + + # hydra_runner calls hydra.main and is useful for multi-node experiments + @hydra_runner(config_path="conf", config_name="glow_tts") + def main(cfg): + trainer = pl.Trainer(**cfg.trainer) + model = GlowTTSModel(cfg=cfg.model, trainer=trainer) + trainer.fit(model) + +Hydra makes every aspect of the NeMo model, including the PyTorch Lightning Trainer, customizable from the command line. + +.. code-block:: bash + + python NeMo/examples/tts/glow_tts.py \ + trainer.gpus=4 \ + trainer.max_epochs=400 \ + ... + train_dataset=/path/to/train/data \ + validation_datasets=/path/to/val/data \ + model.train_ds.batch_size=64 + +.. note:: Training NeMo TTS models from scratch takes days/weeks, so it is highly recommended to use multiple GPUs and multiple nodes with the PyTorch Lightning Trainer. + +Using State-Of-The-Art Pre-trained TTS Model +-------------------------------------------- + +Generate speech using models trained on `LJSpeech `_, +around 24 hours of single-speaker data. 
+ +See this `TTS notebook `_ +for a full tutorial on generating speech with NeMo, PyTorch Lightning, and Hydra. + +.. code-block:: python + + import torch + from nemo.collections.tts.models import GlowTTSModel, WaveGlowModel + + # load pretrained spectrogram model + spec_gen = GlowTTSModel.from_pretrained('GlowTTS-22050Hz').cuda() + + # load pretrained vocoder + vocoder = WaveGlowModel.from_pretrained('WaveGlow-22050Hz').cuda() + + def infer(spec_gen_model, vocoder_model, str_input): + # use the models and text passed in as arguments rather than module-level globals + with torch.no_grad(): + parsed = spec_gen_model.parse(str_input) + spectrogram = spec_gen_model.generate_spectrogram(tokens=parsed) + audio = vocoder_model.convert_spectrogram_to_audio(spec=spectrogram) + if isinstance(spectrogram, torch.Tensor): + spectrogram = spectrogram.to('cpu').numpy() + if len(spectrogram.shape) == 3: + spectrogram = spectrogram[0] + if isinstance(audio, torch.Tensor): + audio = audio.to('cpu').numpy() + return spectrogram, audio + + text_to_generate = input("Input what you want the model to say: ") + spec, audio = infer(spec_gen, vocoder, text_to_generate) + +To see the available pretrained checkpoints: + +.. code-block:: python + + # spec generator + GlowTTSModel.list_available_models() + + # vocoder + WaveGlowModel.list_available_models() + +NeMo TTS Model Under the Hood +----------------------------- + +Any aspect of TTS training or model architecture design can easily +be customized with PyTorch Lightning since every NeMo model is a LightningModule. + +`glow_tts.py `_ + +.. code-block:: python + + class GlowTTSModel(SpectrogramGenerator): + """ + GlowTTS model used to generate spectrograms from text + Consists of a text encoder and an invertible spectrogram decoder + """ + ... 
+ # NeMo models come with neural type checking + @typecheck( + input_types={ + "x": NeuralType(('B', 'T'), TokenIndex()), + "x_lengths": NeuralType(('B'), LengthsType()), + "y": NeuralType(('B', 'D', 'T'), MelSpectrogramType(), optional=True), + "y_lengths": NeuralType(('B'), LengthsType(), optional=True), + "gen": NeuralType(optional=True), + "noise_scale": NeuralType(optional=True), + "length_scale": NeuralType(optional=True), + } + ) + def forward(self, *, x, x_lengths, y=None, y_lengths=None, gen=False, noise_scale=0.3, length_scale=1.0): + if gen: + return self.glow_tts.generate_spect( + text=x, text_lengths=x_lengths, noise_scale=noise_scale, length_scale=length_scale + ) + else: + return self.glow_tts(text=x, text_lengths=x_lengths, spect=y, spect_lengths=y_lengths) + ... + def step(self, y, y_lengths, x, x_lengths): + z, y_m, y_logs, logdet, logw, logw_, y_lengths, attn = self( + x=x, x_lengths=x_lengths, y=y, y_lengths=y_lengths, gen=False + ) + + l_mle, l_length, logdet = self.loss( + z=z, + y_m=y_m, + y_logs=y_logs, + logdet=logdet, + logw=logw, + logw_=logw_, + x_lengths=x_lengths, + y_lengths=y_lengths, + ) + + loss = sum([l_mle, l_length]) + + return l_mle, l_length, logdet, loss, attn + + # PTL-specific methods + def training_step(self, batch, batch_idx): + y, y_lengths, x, x_lengths = batch + + y, y_lengths = self.preprocessor(input_signal=y, length=y_lengths) + + l_mle, l_length, logdet, loss, _ = self.step(y, y_lengths, x, x_lengths) + + output = { + "loss": loss, # required + "progress_bar": {"l_mle": l_mle, "l_length": l_length, "logdet": logdet}, + "log": {"loss": loss, "l_mle": l_mle, "l_length": l_length, "logdet": logdet}, + } + + return output + ... + +Neural Types in NeMo TTS +------------------------ + +NeMo Models and Neural Modules come with Neural Type checking. +Neural type checking is extremely useful when combining many different neural network architectures +for a production-grade application. + +.. 
code-block:: python + + @typecheck( + input_types={ + "x": NeuralType(('B', 'T'), TokenIndex()), + "x_lengths": NeuralType(('B'), LengthsType()), + "y": NeuralType(('B', 'D', 'T'), MelSpectrogramType(), optional=True), + "y_lengths": NeuralType(('B'), LengthsType(), optional=True), + "gen": NeuralType(optional=True), + "noise_scale": NeuralType(optional=True), + "length_scale": NeuralType(optional=True), + } + ) + def forward(self, *, x, x_lengths, y=None, y_lengths=None, gen=False, noise_scale=0.3, length_scale=1.0): + ... + +-------- + +Learn More +========== + +Download pre-trained +`ASR `_, +`NLP `_, +and `TTS `_ models +on `NVIDIA NGC `_ to quickly get started with NeMo. + + +Become an expert on building Conversational AI applications with +our `tutorials `_ +and `example scripts `_. + +.. note:: Most NeMo tutorial notebooks can be run on `Google Colab `_. + +`NVIDIA NeMo `_ is actively being developed on GitHub. +`Contributions `_ are welcome! + +See our `developer guide `_ for +more information on core NeMo concepts, ASR/NLP/TTS collections, +and the NeMo API. diff --git a/docs/source/index.rst b/docs/source/index.rst index c683f44a431e6..39995447b6546 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -6,6 +6,7 @@ PyTorch Lightning Documentation =============================== + .. toctree:: :maxdepth: 1 :name: start @@ -106,6 +107,13 @@ PyTorch Lightning Documentation test_set production_inference +.. toctree:: + :maxdepth: 1 + :name: Partner Domain Frameworks + :caption: Partner Domain Frameworks + + asr_tts + .. toctree:: :maxdepth: 1 :name: community