## NVIDIA NeMo: A Toolkit for Building Conversational AI Applications

NVIDIA NeMo (Neural Modules) is an open-source toolkit for building conversational AI applications. It provides pre-trained models and reusable building blocks for tasks such as speech recognition, natural language understanding, and text-to-speech.

**Usage:**

NeMo lets developers and researchers build conversational AI applications from pre-trained models and reusable modules instead of starting from scratch. Typical uses include:

* **Speech Recognition:** Develop accurate and robust speech recognition systems for various languages and accents.
* **Natural Language Understanding:** Build models for tasks like intent classification, named entity recognition, and question answering.
* **Text-to-Speech:** Generate high-quality and natural-sounding speech from text.
* **Dialogue Systems:** Create chatbots and virtual assistants that can engage in natural and meaningful conversations.
* **Custom Model Training:** Leverage pre-trained models and fine-tune them for specific tasks and datasets.


**Why Use NeMo?**

* **Ease of Use:** Provides a modular and intuitive API for building and training AI models.
* **Pre-trained Models:** Offers a collection of high-quality pre-trained models for various tasks, reducing the need for extensive training from scratch.
* **Scalability:** Designed to use NVIDIA GPUs for fast training and inference.
* **Customization:** Allows for flexibility in adapting models to specific needs and datasets.
* **Active Development and Community Support:** Backed by NVIDIA and a growing community of developers and researchers who contribute ongoing improvements and updates.

**In essence, NeMo simplifies the process of developing and deploying powerful conversational AI applications. It empowers developers to focus on building innovative solutions rather than getting bogged down with complex model development and training.**

In [1]:
# Install NeMo and required dependencies
!apt-get update && apt-get install -y libsndfile1 ffmpeg
!pip install Cython packaging
!pip install "nemo_toolkit[all]"


Hit:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease
Hit:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Ign:3 https://r2u.stat.illinois.edu/ubuntu jammy InRelease
Hit:4 https://r2u.stat.illinois.edu/ubuntu jammy Release
Hit:6 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:7 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Hit:8 http://security.ubuntu.com/ubuntu jammy-security InRelease
Hit:9 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease
Hit:10 http://archive.ubuntu.com/ubuntu jammy InRelease
Hit:11 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:12 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Reading package lists... Done
W: Skipping acquire of configured file 'main/source/Sources' as repository 'https://r2u.stat.illinois.edu/ubuntu jammy InRelease' does not seem to provide it (sources.list e

In [2]:
!pip install huggingface-hub==0.23.2
!pip install transformers==4.40.0

Collecting transformers==4.40.0
  Using cached transformers-4.40.0-py3-none-any.whl.metadata (137 kB)
Collecting tokenizers<0.20,>=0.19 (from transformers==4.40.0)
  Downloading tokenizers-0.19.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Using cached transformers-4.40.0-py3-none-any.whl (9.0 MB)
Downloading tokenizers-0.19.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.6 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m3.6/3.6 MB[0m [31m50.4 MB/s[0m eta [36m0:00:00[0m
[0mInstalling collected packages: tokenizers, transformers
  Attempting uninstall: tokenizers
    Found existing installation: tokenizers 0.20.1
    Uninstalling tokenizers-0.20.1:
      Successfully uninstalled tokenizers-0.20.1
  Attempting uninstall: transformers
    Found existing installation: transformers 4.46.0
    Uninstalling transformers-4.46.0:
      Successfully uninstalled transformers-4.46.0
ERROR: pip's dependency resolver d
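
The pinned installs above downgrade packages that were already present, and the resolver reported an error whose text is cut off in the captured output. One way to confirm what actually ended up in the environment is to compare installed versions against the pins. This is a minimal sketch using only the standard library; `check_pins` is a hypothetical helper name, not part of pip or NeMo.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pins(pins):
    """Return {package: (expected, found)} for every pin that is not satisfied.

    `found` is None when the package is not installed at all.
    """
    mismatches = {}
    for package, expected in pins.items():
        try:
            found = version(package)
        except PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


# Pins taken from the install cell above.
pins = {"huggingface-hub": "0.23.2", "transformers": "4.40.0"}
for package, (expected, found) in check_pins(pins).items():
    print(f"{package}: expected {expected}, found {found}")
```

If the loop prints nothing, both pins are in place; any printed line names a package whose installed version differs from the pin.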

# Main imports

In [55]:
import os
import torch
import pytorch_lightning as pl
from nemo.collections.nlp.models import TokenClassificationModel, TextClassificationModel
from nemo.collections.nlp.models import QAModel, PunctuationCapitalizationModel
from nemo.collections.nlp.models.machine_translation import MTEncDecModel
from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel
from nemo.collections.asr.models import EncDecCTCModel
from nemo.collections.tts.models import FastPitchModel, HifiGanModel
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

In [13]:
import os
os.environ['CUDA_DEVICE_ORDER'] = 'PCI_BUS_ID'
os.environ['NEMO_LOGGING_LEVEL'] = 'ERROR'  # Reduces verbosity

# Main Demo Class

In [101]:

class NeMoDemo:
    def __init__(self):
        """Initialize NeMo demo with proper environment setup"""
        # Set environment variables
        os.environ['CUDA_DEVICE_ORDER'] = 'PCI_BUS_ID'
        os.environ['NEMO_LOGGING_LEVEL'] = 'ERROR'  # Reduces verbosity

        # Setup device
        self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
        print(f"Using device: {self.device}")

        # Print GPU info if available
        if self.device == 'cuda':
            print(f"GPU: {torch.cuda.get_device_name(0)}")
            print(f"Total GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
            # Note: memory_allocated() reports memory currently in use by tensors, not free memory
            print(f"Available GPU memory: {torch.cuda.memory_allocated(0) / 1e9:.2f} GB used")

    def setup_and_verify(self):
        """Verify NeMo setup and GPU availability"""
        try:
            print("\n=== System Information ===")
            print(f"PyTorch version: {torch.__version__}")
            print(f"CUDA available: {torch.cuda.is_available()}")
            if torch.cuda.is_available():
                print(f"CUDA device: {torch.cuda.get_device_name(0)}")
                print(f"CUDA version: {torch.version.cuda}")
                print(f"Memory allocated: {torch.cuda.memory_allocated(0)/1e9:.2f} GB")
                print(f"Memory cached: {torch.cuda.memory_reserved(0)/1e9:.2f} GB")

            # Test basic model loading
            print("\n=== Testing Model Loading ===")
            try:
                test_model = TokenClassificationModel.from_pretrained("ner_en_bert")
                print("✓ Successfully loaded test model (NER)")
                del test_model
                torch.cuda.empty_cache()
            except Exception as e:
                print(f"× Error loading test model: {str(e)}")

            return True
        except Exception as e:
            print(f"Error in setup and verification: {str(e)}")
            return False

    # Optional: Helper function to inspect model methods
    def inspect_qa_model(self, model):
        """Inspect QA model methods and properties"""
        import inspect  # local import: only this helper needs it

        print("\n=== QA Model Inspection ===")

        # Get all public attribute names
        methods = [method for method in dir(model) if not method.startswith('_')]

        print("\nPublic Methods:")
        for method in methods:
            try:
                attr = getattr(model, method)
                if callable(attr):
                    sig = inspect.signature(attr)
                    print(f"{method}{sig}")
            except Exception as e:
                print(f"{method}: Could not get signature - {str(e)}")

        # Get model configuration
        if hasattr(model, 'cfg'):
            print("\nModel Configuration:")
            print(model.cfg)

        # Get model parameters
        print("\nModel Parameters:")
        for name, param in model.named_parameters():
            print(f"{name}: {param.shape}")

    def demonstrate_question_answering(self):
      """Question Answering demo with NeMo QAModel"""

      print("\n=== Question Answering Demo ===")

      # Load NeMo QA model
      model = QAModel.from_pretrained("qa_squadv1.1_bertbase")
      model.eval()

      # Load the HuggingFace tokenizer separately
      tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

      # Example context and questions
      context = """
      NVIDIA Corporation is a technology company founded in 1993.
      The company designs graphics processing units (GPUs) for gaming
      and professional markets, as well as system on chip units (SoCs)
      for the mobile computing and automotive market. The company's
      headquarters are located in Santa Clara, California.
      """

      questions = [
          "When was NVIDIA founded?",
          "What does NVIDIA design?",
          "Where is NVIDIA headquartered?"
      ]

      # Process each question
      for question in questions:

          # Tokenize the question and context using HuggingFace's tokenizer
          inputs = tokenizer(
              question, context, add_special_tokens=True,
              max_length=512, truncation=True, return_tensors='pt'
          )

          # Move inputs to model's device
          input_ids = inputs['input_ids'].to(model.device)
          attention_mask = inputs['attention_mask'].to(model.device)
          token_type_ids = inputs['token_type_ids'].to(model.device)

          # Forward pass through the model
          with torch.no_grad():
              # Now include token_type_ids in the model input
              outputs = model(input_ids=input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)

              # The output shape is [batch_size, sequence_length, 2]
              # Split the tensor into start and end logits
              start_logits, end_logits = outputs[..., 0], outputs[..., 1]

          # Get the most probable start and end positions as plain Python ints
          start_idx = int(torch.argmax(start_logits))
          end_idx = int(torch.argmax(end_logits))

          # Convert the token IDs back to tokens
          input_tokens = tokenizer.convert_ids_to_tokens(input_ids[0])
          # If end_idx < start_idx this slice is empty and the answer is blank
          answer_tokens = input_tokens[start_idx:end_idx + 1]

          # Join the tokens to form the answer
          answer = tokenizer.convert_tokens_to_string(answer_tokens).strip()

          print(f"\nQuestion: {question}")
          print(f"Answer: {answer}")


    def demonstrate_ner(self):
        """Named Entity Recognition demo"""
        print("\n=== Named Entity Recognition Demo ===")
        try:
            model = TokenClassificationModel.from_pretrained("ner_en_bert")
            texts = [
                "NVIDIA released its new GPU in New York last September.",
                "Jensen Huang discussed AI developments at GTC in San Francisco."
            ]
            results = model.add_predictions(texts)
            for text, entities in zip(texts, results):
                print(f"\nText: {text}")
                print("Entities:", entities)
        except Exception as e:
            print(f"NER Error: {str(e)}")

    def demonstrate_text_classification(self):
        """Text Classification demo with improved error handling"""
        print("\n=== Text Classification Demo ===")
        try:
            # Check available models first
            available_models = TextClassificationModel.list_available_models()
            if not available_models:
                print("No Text Classification models available")
                return

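            # list_available_models() returns PretrainedModelInfo records;
            # from_pretrained() expects the model name string
            model = TextClassificationModel.from_pretrained(
                available_models[0].pretrained_model_name
            )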
            model.eval()

            texts = [
                "This GPU performs incredibly well for deep learning tasks.",
                "The customer service was very disappointing.",
                "The product arrived on time but was damaged."
            ]

            print("\nAnalyzing texts...")
            for text in texts:
                try:
                    # classifytext() returns one predicted label id per query
                    prediction = model.classifytext(queries=[text], batch_size=1)[0]
                    print(f"\nText: {text}")
                    print(f"Classification: {prediction}")
                except Exception as e:
                    print(f"Error classifying text: {str(e)}")

        except Exception as e:
            print(f"Classification Error: {str(e)}")

    def demonstrate_punctuation_capitalization(self):
        """Punctuation and Capitalization demo"""
        print("\n=== Punctuation and Capitalization Demo ===")
        try:
            model = PunctuationCapitalizationModel.from_pretrained("punctuation_en_bert")
            model.eval()

            texts = [
                "nvidia is a technology company based in santa clara california that develops gpus",
                "ai and deep learning are transforming various industries globally"
            ]

            print("\nProcessing texts...")
            for text in texts:
                try:
                    processed_text = model.add_punctuation_capitalization([text])[0]
                    print(f"\nOriginal: {text}")
                    print(f"Processed: {processed_text}")
                except Exception as e:
                    print(f"Error processing text: {str(e)}")

        except Exception as e:
            print(f"Punctuation Error: {str(e)}")

    def demonstrate_machine_translation(self):
        """Machine Translation demo"""
        print("\n=== Machine Translation Demo ===")
        try:
            available_models = MTEncDecModel.list_available_models()
            if not available_models:
                print("No Translation models available")
                return

            model = MTEncDecModel.from_pretrained("nmt_en_zh_transformer24x6")
            model.eval()

            texts = [
                "NVIDIA GPUs are excellent for deep learning.",
                "Artificial intelligence is transforming the technology landscape."
            ]

            print("\nTranslating texts...")
            for text in texts:
                try:
                    translation = model.translate([text])[0]
                    print(f"\nEnglish: {text}")
                    print(f"Translated: {translation}")
                except Exception as e:
                    print(f"Error translating text: {str(e)}")

        except Exception as e:
            print(f"Translation Error: {str(e)}")

    def demonstrate_text_to_speech(self):
        """Text-to-Speech demo"""
        print("\n=== Text-to-Speech Demo ===")
        try:
            # Load FastPitch and HifiGan models
            spec_generator = FastPitchModel.from_pretrained("tts_en_fastpitch")
            vocoder = HifiGanModel.from_pretrained("tts_en_hifigan")

            spec_generator.eval()
            vocoder.eval()

            text = "Welcome to the NVIDIA NeMo demonstration!"
            print(f"Converting text to speech: '{text}'")

            try:
                # Generate spectrogram
                with torch.no_grad():
                    parsed = spec_generator.parse(text)
                    spectrogram = spec_generator.generate_spectrogram(tokens=parsed)
                    audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)

                print("✓ Audio generation successful!")
                # Inline playback works only in notebook environments (e.g., Colab or Jupyter)
                from IPython.display import Audio, display
                display(Audio(audio.detach().cpu().numpy()[0], rate=22050))

            except Exception as e:
                print(f"Error generating audio: {str(e)}")

        except Exception as e:
            print(f"TTS Error: {str(e)}")

    def list_available_models(self):
        """List all available models for each task"""
        print("\n=== Available Models ===")

        model_types = {
            "NER Models": TokenClassificationModel,
            "QA Models": QAModel,
            "Text Classification Models": TextClassificationModel,
            "Translation Models": MTEncDecModel,
            "TTS Models": FastPitchModel
        }

        for model_name, model_class in model_types.items():
            print(f"\n{model_name}:")
            try:
                models = model_class.list_available_models()
                if isinstance(models, list):
                    for model in models:
                        print(f"- {model}")
                else:
                    print(f"- {models}")
            except Exception as e:
                print(f"Error listing {model_name}: {str(e)}")

    def cleanup(self):
        """Cleanup resources"""
        print("\n=== Cleanup ===")
        torch.cuda.empty_cache()
        print("✓ GPU memory cleared")
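
The text-to-speech method above only plays the audio inline. To keep the result, the waveform can also be written to disk. The following is a minimal sketch using only the standard library; `save_wav` is a hypothetical helper, and it assumes mono float samples in the range [-1.0, 1.0] at the 22050 Hz rate used in the playback call. A sine tone stands in for the vocoder output so the example runs without NeMo.

```python
import math
import os
import struct
import tempfile
import wave


def save_wav(samples, path, sample_rate=22050):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Returns the number of samples written.
    """
    frames = bytearray()
    count = 0
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # guard against overshoot
        frames += struct.pack("<h", int(round(clipped * 32767)))
        count += 1
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
    return count


# Stand-in for the vocoder output: one second of a 440 Hz tone.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 22050) for n in range(22050)]
out_path = os.path.join(tempfile.gettempdir(), "nemo_tts_demo.wav")
print(save_wav(tone, out_path), "samples written to", out_path)
```

Inside `demonstrate_text_to_speech`, the same helper could presumably be called as `save_wav(audio.detach().cpu().numpy()[0], "speech.wav")`.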

# Setup NeMo

In [102]:
# Initialize demo
demo = NeMoDemo()

# Setup and verify
if not demo.setup_and_verify():
    print("Setup failed; the demonstrations below may not run.")

Using device: cuda
GPU: Tesla T4
Total GPU memory: 15.84 GB
Available GPU memory: 1.48 GB used

=== System Information ===
PyTorch version: 2.5.0+cu121
CUDA available: True
CUDA device: Tesla T4
CUDA version: 12.1
Memory allocated: 1.48 GB
Memory cached: 5.74 GB

=== Testing Model Loading ===
[NeMo I 2024-10-24 11:03:09 cloud:58] Found existing object /root/.cache/torch/NeMo/NeMo_1.23.0/ner_en_bert/8186f86c83b11d70b43b9ead695e7eda/ner_en_bert.nemo.
[NeMo I 2024-10-24 11:03:09 cloud:64] Re-using file from: /root/.cache/torch/NeMo/NeMo_1.23.0/ner_en_bert/8186f86c83b11d70b43b9ead695e7eda/ner_en_bert.nemo
[NeMo I 2024-10-24 11:03:09 common:924] Instantiating model from pre-trained checkpoint
[NeMo I 2024-10-24 11:03:16 tokenizer_utils:130] Getting HuggingFace AutoTokenizer with pretrained_model_name: bert-base-uncased, vocab_file: /tmp/tmpkz0yfr_n/tokenizer.vocab_file, merges_files: None, special_tokens_dict: {}, and use_fast: False


[NeMo W 2024-10-24 11:03:17 modelPT:258] You tried to register an artifact under config key=tokenizer.vocab_file but an artifact for it has already been registered.
[NeMo W 2024-10-24 11:03:17 modelPT:165] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config : 
    text_file: text_train.txt
    labels_file: labels_train.txt
    shuffle: true
    num_samples: -1
    batch_size: 64
    
[NeMo W 2024-10-24 11:03:17 modelPT:172] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s). 
    Validation config : 
    text_file: text_dev.txt
    labels_file: labels_dev.txt
    shuffle: false
    num_samples: -1
    batch_size: 64
    
[NeMo W 2024-10-24 11:03:17 modelPT:178] Please call the ModelPT.setup_test_dat

[NeMo I 2024-10-24 11:03:19 save_restore_connector:249] Model TokenClassificationModel was successfully restored from /root/.cache/torch/NeMo/NeMo_1.23.0/ner_en_bert/8186f86c83b11d70b43b9ead695e7eda/ner_en_bert.nemo.
✓ Successfully loaded test model (NER)
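
The report above shows both "allocated" and "cached" memory. In PyTorch, allocated memory is held by live tensors, while cached (reserved) memory is everything the caching allocator has claimed from the GPU, including the allocated part. The arithmetic below is a small sketch of how the figures relate; `gpu_memory_summary` is a hypothetical helper, and the inputs are the rounded values from the output converted back to bytes.

```python
def gpu_memory_summary(total_bytes, allocated_bytes, reserved_bytes):
    """Summarise GPU memory figures in decimal gigabytes (1 GB = 1e9 bytes)."""
    gb = 1e9
    return {
        "total_gb": round(total_bytes / gb, 2),
        "allocated_gb": round(allocated_bytes / gb, 2),
        # Reserved but unallocated: what torch.cuda.empty_cache() can hand back.
        "cached_unused_gb": round((reserved_bytes - allocated_bytes) / gb, 2),
        # Rough headroom, ignoring memory used by other processes.
        "unreserved_gb": round((total_bytes - reserved_bytes) / gb, 2),
    }


# Values from the output above: 15.84 GB total, 1.48 GB allocated, 5.74 GB cached.
summary = gpu_memory_summary(15.84e9, 1.48e9, 5.74e9)
print(summary)
```

On a live system the three inputs would come from `torch.cuda.get_device_properties(0).total_memory`, `torch.cuda.memory_allocated(0)`, and `torch.cuda.memory_reserved(0)`, the same calls the demo class uses.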


## Check available models

In [41]:


# List available models
demo.list_available_models()

Using device: cuda
GPU: Tesla T4
Total GPU memory: 15.84 GB
Available GPU memory: 8.96 GB used

=== System Information ===
PyTorch version: 2.5.0+cu121
CUDA available: True
CUDA device: Tesla T4
CUDA version: 12.1
Memory allocated: 8.96 GB
Memory cached: 9.68 GB

=== Testing Model Loading ===
[NeMo I 2024-10-24 09:57:48 cloud:58] Found existing object /root/.cache/torch/NeMo/NeMo_1.23.0/ner_en_bert/8186f86c83b11d70b43b9ead695e7eda/ner_en_bert.nemo.
[NeMo I 2024-10-24 09:57:48 cloud:64] Re-using file from: /root/.cache/torch/NeMo/NeMo_1.23.0/ner_en_bert/8186f86c83b11d70b43b9ead695e7eda/ner_en_bert.nemo
[NeMo I 2024-10-24 09:57:48 common:924] Instantiating model from pre-trained checkpoint
[NeMo I 2024-10-24 09:57:56 tokenizer_utils:130] Getting HuggingFace AutoTokenizer with pretrained_model_name: bert-base-uncased, vocab_file: /tmp/tmpw72s6xeb/tokenizer.vocab_file, merges_files: None, special_tokens_dict: {}, and use_fast: False


[NeMo W 2024-10-24 09:57:56 modelPT:258] You tried to register an artifact under config key=tokenizer.vocab_file but an artifact for it has already been registered.
[NeMo W 2024-10-24 09:57:56 modelPT:165] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config : 
    text_file: text_train.txt
    labels_file: labels_train.txt
    shuffle: true
    num_samples: -1
    batch_size: 64
    
[NeMo W 2024-10-24 09:57:56 modelPT:172] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s). 
    Validation config : 
    text_file: text_dev.txt
    labels_file: labels_dev.txt
    shuffle: false
    num_samples: -1
    batch_size: 64
    
[NeMo W 2024-10-24 09:57:56 modelPT:178] Please call the ModelPT.setup_test_dat

[NeMo I 2024-10-24 09:57:59 save_restore_connector:249] Model TokenClassificationModel was successfully restored from /root/.cache/torch/NeMo/NeMo_1.23.0/ner_en_bert/8186f86c83b11d70b43b9ead695e7eda/ner_en_bert.nemo.
✓ Successfully loaded test model (NER)

=== Available Models ===

NER Models:
- PretrainedModelInfo(
	pretrained_model_name=ner_en_bert,
	description=The model was trained on GMB (Groningen Meaning Bank) corpus for entity recognition and achieves 74.61 F1 Macro score.,
	location=https://api.ngc.nvidia.com/v2/models/nvidia/nemo/ner_en_bert/versions/1.10/files/ner_en_bert.nemo
)

QA Models:
- PretrainedModelInfo(
	pretrained_model_name=qa_squadv1.1_bertbase,
	description=Question answering model finetuned from NeMo BERT Base Uncased on SQuAD v1.1 dataset which obtains an exact match (EM) score of 82.78% and an F1 score of 89.97%.,
	location=https://api.ngc.nvidia.com/v2/models/nvidia/nemo/qa_squadv1_1_bertbase/versions/1.0.0rc1/files/qa_squadv1.1_bertbase.nemo
)
- Pretrained
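
Each entry in the listing above is a record with `pretrained_model_name`, `description`, and `location` fields, whereas `from_pretrained()` is called with the name string. The sketch below shows one way to pull out just the names. `ModelInfo` is a stand-in dataclass (so the example runs without NeMo), `pretrained_names` is a hypothetical helper, and the descriptions and `example.invalid` locations are placeholders rather than real values.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ModelInfo:
    """Stand-in with the same field names as the records printed above."""
    pretrained_model_name: str
    description: str
    location: str


def pretrained_names(infos: Optional[Iterable[ModelInfo]]) -> List[str]:
    """Return the names that can be passed to from_pretrained().

    None is treated the same as an empty listing.
    """
    return [info.pretrained_model_name for info in (infos or [])]


listing = [
    ModelInfo("ner_en_bert", "placeholder description", "https://example.invalid/ner"),
    ModelInfo("qa_squadv1.1_bertbase", "placeholder description", "https://example.invalid/qa"),
]
print(pretrained_names(listing))
print(pretrained_names(None))
```

With NeMo installed, the same function should work on the result of a call such as `QAModel.list_available_models()`, since it only relies on the `pretrained_model_name` attribute.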

# NER

In [42]:
# Run demonstrations
demo.demonstrate_ner()



=== Named Entity Recognition Demo ===
[NeMo I 2024-10-24 09:58:03 cloud:58] Found existing object /root/.cache/torch/NeMo/NeMo_1.23.0/ner_en_bert/8186f86c83b11d70b43b9ead695e7eda/ner_en_bert.nemo.
[NeMo I 2024-10-24 09:58:03 cloud:64] Re-using file from: /root/.cache/torch/NeMo/NeMo_1.23.0/ner_en_bert/8186f86c83b11d70b43b9ead695e7eda/ner_en_bert.nemo
[NeMo I 2024-10-24 09:58:03 common:924] Instantiating model from pre-trained checkpoint
[NeMo I 2024-10-24 09:58:10 tokenizer_utils:130] Getting HuggingFace AutoTokenizer with pretrained_model_name: bert-base-uncased, vocab_file: /tmp/tmps3_5_ke6/tokenizer.vocab_file, merges_files: None, special_tokens_dict: {}, and use_fast: False


[NeMo W 2024-10-24 09:58:11 modelPT:258] You tried to register an artifact under config key=tokenizer.vocab_file but an artifact for it has already been registered.
[NeMo W 2024-10-24 09:58:11 modelPT:165] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config : 
    text_file: text_train.txt
    labels_file: labels_train.txt
    shuffle: true
    num_samples: -1
    batch_size: 64
    
[NeMo W 2024-10-24 09:58:11 modelPT:172] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s). 
    Validation config : 
    text_file: text_dev.txt
    labels_file: labels_dev.txt
    shuffle: false
    num_samples: -1
    batch_size: 64
    
[NeMo W 2024-10-24 09:58:11 modelPT:178] Please call the ModelPT.setup_test_dat

[NeMo I 2024-10-24 09:58:14 save_restore_connector:249] Model TokenClassificationModel was successfully restored from /root/.cache/torch/NeMo/NeMo_1.23.0/ner_en_bert/8186f86c83b11d70b43b9ead695e7eda/ner_en_bert.nemo.
[NeMo I 2024-10-24 09:58:14 token_classification_dataset:123] Setting Max Seq length to: 16
[NeMo I 2024-10-24 09:58:14 data_preprocessing:404] Some stats of the lengths of the sequences:
[NeMo I 2024-10-24 09:58:14 data_preprocessing:406] Min: 14 |                  Max: 16 |                  Mean: 15.0 |                  Median: 15.0
[NeMo I 2024-10-24 09:58:14 data_preprocessing:412] 75 percentile: 15.50
[NeMo I 2024-10-24 09:58:14 data_preprocessing:413] 99 percentile: 15.98


[NeMo W 2024-10-24 09:58:14 token_classification_dataset:152] 0 are longer than 16


[NeMo I 2024-10-24 09:58:14 token_classification_dataset:155] *** Example ***
[NeMo I 2024-10-24 09:58:14 token_classification_dataset:156] i: 0
[NeMo I 2024-10-24 09:58:14 token_classification_dataset:157] subtokens: [CLS] n ##vid ##ia released its new gp ##u in new york last september . [SEP]
[NeMo I 2024-10-24 09:58:14 token_classification_dataset:158] loss_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[NeMo I 2024-10-24 09:58:14 token_classification_dataset:159] input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[NeMo I 2024-10-24 09:58:14 token_classification_dataset:160] subtokens_mask: 0 1 0 0 1 1 1 1 0 1 1 1 1 1 0 0

Text: NVIDIA released its new GPU in New York last September.
Entities: NVIDIA[B-ORG] released its new GPU in New[B-LOC] York[I-LOC] last September[B-TIME].

Text: Jensen Huang discussed AI developments at GTC in San Francisco.
Entities: Jensen[B-PER] Huang[I-PER] discussed AI developments at GTC[B-ORG] in San[B-LOC] Francisco[I-LOC].
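
The tagged strings above mark each entity word with a `B-` (begin) or `I-` (inside) prefix, so a multi-word entity such as "New York" is spread over two tokens. The sketch below groups those tokens back into whole entities. `extract_entities` is a hypothetical helper, and the `word[B-XXX]` format it parses is assumed from the two output lines shown here.

```python
import re

# A token such as "York[I-LOC]" or "September[B-TIME]." (trailing punctuation allowed).
TAGGED = re.compile(r"^(?P<word>.+?)\[(?P<prefix>[BI])-(?P<label>[A-Za-z]+)\]")


def extract_entities(tagged_text):
    """Group word[B-XXX] / word[I-XXX] tokens into (entity_text, label) pairs."""
    entities = []
    words, label = [], None

    for token in tagged_text.split():
        match = TAGGED.match(token)
        if match and match.group("prefix") == "I" and match.group("label") == label:
            words.append(match.group("word"))  # continuation of the open entity
            continue
        # Anything else closes the open entity, if there is one.
        if words:
            entities.append((" ".join(words), label))
            words, label = [], None
        if match:  # a B- tag (or a stray I- tag) opens a new entity
            words, label = [match.group("word")], match.group("label")

    if words:
        entities.append((" ".join(words), label))
    return entities


line = "NVIDIA[B-ORG] released its new GPU in New[B-LOC] York[I-LOC] last September[B-TIME]."
print(extract_entities(line))
# [('NVIDIA', 'ORG'), ('New York', 'LOC'), ('September', 'TIME')]
```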


# QnA

In [92]:
demo.demonstrate_question_answering()




=== Question Answering Demo ===
[NeMo I 2024-10-24 10:50:10 cloud:58] Found existing object /root/.cache/torch/NeMo/NeMo_1.23.0/qa_squadv1.1_bertbase/ec1e93dd2fa3f3cbba91db856722c541/qa_squadv1.1_bertbase.nemo.
[NeMo I 2024-10-24 10:50:10 cloud:64] Re-using file from: /root/.cache/torch/NeMo/NeMo_1.23.0/qa_squadv1.1_bertbase/ec1e93dd2fa3f3cbba91db856722c541/qa_squadv1.1_bertbase.nemo
[NeMo I 2024-10-24 10:50:10 common:924] Instantiating model from pre-trained checkpoint
[NeMo I 2024-10-24 10:50:19 tokenizer_utils:130] Getting HuggingFace AutoTokenizer with pretrained_model_name: bert-base-uncased, vocab_file: /tmp/tmptjlfqksg/tokenizer.vocab_file, merges_files: None, special_tokens_dict: {}, and use_fast: False


[NeMo W 2024-10-24 10:50:19 modelPT:258] You tried to register an artifact under config key=tokenizer.vocab_file but an artifact for it has already been registered.
[NeMo W 2024-10-24 10:50:19 modelPT:165] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config : 
    file: /datasets/squad/v1.1/train-v1.1.json
    batch_size: 3
    shuffle: true
    num_samples: -1
    num_workers: 2
    drop_last: false
    pin_memory: false
    
[NeMo W 2024-10-24 10:50:19 modelPT:172] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s). 
    Validation config : 
    file: /datasets/squad/v1.1/dev-v1.1.json
    batch_size: 3
    shuffle: false
    num_samples: -1
    num_workers: 2
    drop_last: false
    pin_memory: 

[NeMo I 2024-10-24 10:50:21 bert_module:44] Restoring weights from /WS/bert/bert_base_uncased_mlm_final_1074591/BERT-STEP-2285714.pt


[NeMo W 2024-10-24 10:50:21 bert_module:47] Path /WS/bert/bert_base_uncased_mlm_final_1074591/BERT-STEP-2285714.pt not found
[NeMo W 2024-10-24 10:50:21 modelPT:258] You tried to register an artifact under config key=language_model.config_file but an artifact for it has already been registered.


[NeMo I 2024-10-24 10:50:21 save_restore_connector:249] Model QAModel was successfully restored from /root/.cache/torch/NeMo/NeMo_1.23.0/qa_squadv1.1_bertbase/ec1e93dd2fa3f3cbba91db856722c541/qa_squadv1.1_bertbase.nemo.

Question: When was NVIDIA founded?
Answer: 1993

Question: What does NVIDIA design?
Answer: graphics processing units

Question: Where is NVIDIA headquartered?
Answer: santa clara, california
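
The demo picks the start and end positions with two independent `argmax` calls. That works for the questions above, but nothing prevents the end position from landing before the start, which would produce an empty answer. A common refinement is to score every valid pair jointly and keep the best one. The sketch below does this on plain Python lists; `best_span` and `max_answer_len` are hypothetical names, and this is not a description of NeMo's own inference code.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end) maximising start_logits[start] + end_logits[end]
    subject to start <= end < start + max_answer_len."""
    best_score = float("-inf")
    best = (0, 0)
    for start, start_score in enumerate(start_logits):
        last = min(len(end_logits), start + max_answer_len)
        for end in range(start, last):
            score = start_score + end_logits[end]
            if score > best_score:
                best_score, best = score, (start, end)
    return best


# Independent argmax would pick start=3 and end=1 here, an empty slice.
start_logits = [0.1, 0.2, 0.3, 5.0, 0.0]
end_logits = [0.0, 4.0, 0.1, 3.5, 0.2]
print(best_span(start_logits, end_logits))  # (3, 3)
```

With the tensors from the demo, the inputs could be obtained as `start_logits[0].tolist()` and `end_logits[0].tolist()`. The search is quadratic in the worst case, which is acceptable for sequences capped at 512 tokens.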


In [93]:
# Optional: Inspect model
model = QAModel.from_pretrained("qa_squadv1.1_bertbase")
demo.inspect_qa_model(model)

[NeMo I 2024-10-24 10:51:45 cloud:58] Found existing object /root/.cache/torch/NeMo/NeMo_1.23.0/qa_squadv1.1_bertbase/ec1e93dd2fa3f3cbba91db856722c541/qa_squadv1.1_bertbase.nemo.
[NeMo I 2024-10-24 10:51:45 cloud:64] Re-using file from: /root/.cache/torch/NeMo/NeMo_1.23.0/qa_squadv1.1_bertbase/ec1e93dd2fa3f3cbba91db856722c541/qa_squadv1.1_bertbase.nemo
[NeMo I 2024-10-24 10:51:45 common:924] Instantiating model from pre-trained checkpoint
[NeMo I 2024-10-24 10:51:58 tokenizer_utils:130] Getting HuggingFace AutoTokenizer with pretrained_model_name: bert-base-uncased, vocab_file: /tmp/tmpwjc427l0/tokenizer.vocab_file, merges_files: None, special_tokens_dict: {}, and use_fast: False


[NeMo W 2024-10-24 10:51:59 modelPT:258] You tried to register an artifact under config key=tokenizer.vocab_file but an artifact for it has already been registered.
[NeMo W 2024-10-24 10:51:59 modelPT:165] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config : 
    file: /datasets/squad/v1.1/train-v1.1.json
    batch_size: 3
    shuffle: true
    num_samples: -1
    num_workers: 2
    drop_last: false
    pin_memory: false
    
[NeMo W 2024-10-24 10:51:59 modelPT:172] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s). 
    Validation config : 
    file: /datasets/squad/v1.1/dev-v1.1.json
    batch_size: 3
    shuffle: false
    num_samples: -1
    num_workers: 2
    drop_last: false
    pin_memory: 

[NeMo I 2024-10-24 10:52:00 bert_module:44] Restoring weights from /WS/bert/bert_base_uncased_mlm_final_1074591/BERT-STEP-2285714.pt


[NeMo W 2024-10-24 10:52:00 bert_module:47] Path /WS/bert/bert_base_uncased_mlm_final_1074591/BERT-STEP-2285714.pt not found
[NeMo W 2024-10-24 10:52:00 modelPT:258] You tried to register an artifact under config key=language_model.config_file but an artifact for it has already been registered.


[NeMo I 2024-10-24 10:52:01 save_restore_connector:249] Model QAModel was successfully restored from /root/.cache/torch/NeMo/NeMo_1.23.0/qa_squadv1.1_bertbase/ec1e93dd2fa3f3cbba91db856722c541/qa_squadv1.1_bertbase.nemo.

=== QA Model Inspection ===

Public Methods:
add_module(name: str, module: Optional[ForwardRef('Module')]) -> None
all_gather(data: Union[torch.Tensor, Dict, List, Tuple], group: Optional[Any] = None, sync_grads: bool = False) -> Union[torch.Tensor, Dict, List, Tuple]
apply(fn: Callable[[ForwardRef('Module')], NoneType]) -> ~T
backward(loss: torch.Tensor, *args: Any, **kwargs: Any) -> None
bert_model(*args, **kwargs)
bfloat16() -> ~T
buffers(recurse: bool = True) -> Iterator[torch.Tensor]
children() -> Iterator[ForwardRef('Module')]
classifier(*args, **kwargs)
clip_gradients(optimizer: torch.optim.optimizer.Optimizer, gradient_clip_val: Union[int, float, NoneType] = None, gradient_clip_algorithm: Optional[str] = None) -> None
compile(*args, **kwargs)
configure_callback

# Text Classification

In [94]:
demo.demonstrate_text_classification()



=== Text Classification Demo ===
No Text Classification models available


# Punctuation and Capitalization

In [95]:
demo.demonstrate_punctuation_capitalization()



=== Punctuation and Capitalization Demo ===
[NeMo I 2024-10-24 10:53:45 cloud:58] Found existing object /root/.cache/torch/NeMo/NeMo_1.23.0/punctuation_en_bert/93b0369b5e0d147f61895feffcbcfb88/punctuation_en_bert.nemo.
[NeMo I 2024-10-24 10:53:45 cloud:64] Re-using file from: /root/.cache/torch/NeMo/NeMo_1.23.0/punctuation_en_bert/93b0369b5e0d147f61895feffcbcfb88/punctuation_en_bert.nemo
[NeMo I 2024-10-24 10:53:45 common:924] Instantiating model from pre-trained checkpoint
[NeMo I 2024-10-24 10:53:52 tokenizer_utils:130] Getting HuggingFace AutoTokenizer with pretrained_model_name: bert-base-uncased, vocab_file: /tmp/tmpre1x1ohe/tokenizer.vocab_file, merges_files: None, special_tokens_dict: {}, and use_fast: False


[NeMo W 2024-10-24 10:53:52 modelPT:258] You tried to register an artifact under config key=tokenizer.vocab_file but an artifact for it has already been registered.
[NeMo W 2024-10-24 10:53:52 modelPT:165] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config : 
    use_audio: false
    audio_file: null
    sample_rate: 16000
    use_bucketing: true
    batch_size: 32
    preload_audios: true
    use_tarred_dataset: false
    label_info_save_dir: null
    text_file: text_train.txt
    labels_file: labels_train.txt
    tokens_in_batch: null
    max_seq_length: 128
    num_samples: -1
    use_cache: true
    cache_dir: null
    get_label_frequences: false
    verbose: true
    n_jobs: 0
    tar_metadata_file: null
    tar_shuffle_n: 1
    shard_strategy: scatter
    shuffle: true
    drop_last: false
    pin_memory: true
    num_workers: 8
    persistent_wor

[NeMo I 2024-10-24 10:53:54 save_restore_connector:249] Model PunctuationCapitalizationModel was successfully restored from /root/.cache/torch/NeMo/NeMo_1.23.0/punctuation_en_bert/93b0369b5e0d147f61895feffcbcfb88/punctuation_en_bert.nemo.

Processing texts...
[NeMo I 2024-10-24 10:53:54 punctuation_capitalization_model:1167] Using batch size 1 for inference
[NeMo I 2024-10-24 10:53:54 punctuation_capitalization_infer_dataset:127] Max length: 18
[NeMo I 2024-10-24 10:53:54 data_preprocessing:404] Some stats of the lengths of the sequences:
[NeMo I 2024-10-24 10:53:54 data_preprocessing:406] Min: 16 |                  Max: 16 |                  Mean: 16.0 |                  Median: 16.0
[NeMo I 2024-10-24 10:53:54 data_preprocessing:412] 75 percentile: 16.00
[NeMo I 2024-10-24 10:53:54 data_preprocessing:413] 99 percentile: 16.00


100%|██████████| 1/1 [00:00<00:00, 45.89batch/s]


Original: nvidia is a technology company based in santa clara california that develops gpus
Processed: Nvidia is a technology company based in Santa Clara, California, that develops Gpus.
[NeMo I 2024-10-24 10:53:54 punctuation_capitalization_model:1167] Using batch size 1 for inference
[NeMo I 2024-10-24 10:53:54 punctuation_capitalization_infer_dataset:127] Max length: 11
[NeMo I 2024-10-24 10:53:54 data_preprocessing:404] Some stats of the lengths of the sequences:
[NeMo I 2024-10-24 10:53:54 data_preprocessing:406] Min: 9 |                  Max: 9 |                  Mean: 9.0 |                  Median: 9.0
[NeMo I 2024-10-24 10:53:54 data_preprocessing:412] 75 percentile: 9.00
[NeMo I 2024-10-24 10:53:54 data_preprocessing:413] 99 percentile: 9.00



100%|██████████| 1/1 [00:00<00:00, 42.50batch/s]


Original: ai and deep learning are transforming various industries globally
Processed: Ai and deep Learning are transforming various industries globally.
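
Both results show the model title-casing acronyms ("Gpus", "Ai"). One light-weight remedy is a post-processing pass over the model output. The sketch below assumes a small hand-maintained lookup table; `ACRONYMS` and `restore_acronyms` are hypothetical names and not part of NeMo. It does not address other errors such as the capitalized "Learning".

```python
import re

# Hypothetical, hand-maintained mapping from lowercase token to preferred casing.
ACRONYMS = {"ai": "AI", "gpu": "GPU", "gpus": "GPUs"}


def restore_acronyms(text, acronyms=ACRONYMS):
    """Replace whole words whose lowercase form appears in `acronyms`."""

    def fix(match):
        word = match.group(0)
        return acronyms.get(word.lower(), word)

    # Match runs of letters so that punctuation and spacing are left untouched.
    return re.sub(r"[A-Za-z]+", fix, text)


print(restore_acronyms("Nvidia is a technology company based in Santa Clara, California, that develops Gpus."))
print(restore_acronyms("Ai and deep Learning are transforming various industries globally."))
```

Because only whole words are looked up, a word such as "said" is left alone even though it contains "ai".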





# Machine Translation

In [99]:
demo.demonstrate_machine_translation()




=== Machine Translation Demo ===
[NeMo I 2024-10-24 10:57:57 cloud:68] Downloading from: https://api.ngc.nvidia.com/v2/models/nvidia/nemo/nmt_en_zh_transformer24x6/versions/1.5/files/en_zh_24x6.nemo to /root/.cache/torch/NeMo/NeMo_1.23.0/en_zh_24x6/e514abee8ddb6edba76343b6a5b53394/en_zh_24x6.nemo
[NeMo I 2024-10-24 10:59:08 common:924] Instantiating model from pre-trained checkpoint
[NeMo I 2024-10-24 10:59:41 tokenizer_utils:179] Getting YouTokenToMeTokenizer with model: /tmp/tmp_dmz_0ku/75ff3fef67d84ce59bb34e2c0299c9b2_tokenizer.encoder.32000.BPE.model with r2l: False.
[NeMo I 2024-10-24 10:59:41 tokenizer_utils:179] Getting YouTokenToMeTokenizer with model: /tmp/tmp_dmz_0ku/d65d217cc9984019b65cf43cf579561a_tokenizer.decoder.32000.BPE.model with r2l: False.


[NeMo W 2024-10-24 10:59:42 modelPT:165] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config : 
    src_file_name: null
    tgt_file_name: null
    use_tarred_dataset: true
    tar_files:
    - /data/tarred_dataset_old_plus_paracrawl_4k_tokens/parallel.batches.tokens.4000._OP_0..769_CL_.tar
    - /data/tarred_dataset_old_plus_paracrawl_en_zh_r2l_distilled_4k_tokens/parallel.batches.tokens.4000._OP_0..745_CL_.tar
    metadata_file:
    - /data/tarred_dataset_old_plus_paracrawl_4k_tokens/metadata.tokens.4000.json
    - /data/tarred_dataset_old_plus_paracrawl_en_zh_r2l_distilled_4k_tokens/metadata.tokens.4000.json
    lines_per_dataset_fragment: 1000000
    num_batches_per_tarfile: 100
    shard_strategy: scatter
    tokens_in_batch: 512
    clean: true
    max_seq_length: 512
    min_seq_length: 1
    cache_ids: false
    cache_data_per_node: false
    use

[NeMo I 2024-10-24 10:59:53 save_restore_connector:249] Model MTEncDecModel was successfully restored from /root/.cache/torch/NeMo/NeMo_1.23.0/en_zh_24x6/e514abee8ddb6edba76343b6a5b53394/en_zh_24x6.nemo.

Translating texts...

English: NVIDIA GPUs are excellent for deep learning.
Translated: NVIDIA GPU 非常适合深度学习。

English: Artificial intelligence is transforming the technology landscape.
Translated: 人工智能正在改变技术格局。


# Text-to-Speech

In [103]:
demo.demonstrate_text_to_speech()


=== Text-to-Speech Demo ===
[NeMo I 2024-10-24 11:03:26 cloud:58] Found existing object /root/.cache/torch/NeMo/NeMo_1.23.0/tts_en_fastpitch_align/b7d086a07b5126c12d5077d9a641a38c/tts_en_fastpitch_align.nemo.
[NeMo I 2024-10-24 11:03:26 cloud:64] Re-using file from: /root/.cache/torch/NeMo/NeMo_1.23.0/tts_en_fastpitch_align/b7d086a07b5126c12d5077d9a641a38c/tts_en_fastpitch_align.nemo
[NeMo I 2024-10-24 11:03:26 common:924] Instantiating model from pre-trained checkpoint


 NeMo-text-processing :: INFO     :: Creating ClassifyFst grammars.
INFO:NeMo-text-processing:Creating ClassifyFst grammars.
[NeMo W 2024-10-24 11:04:03 en_us_arpabet:66] apply_to_oov_word=None, This means that some of words will remain unchanged if they are not handled by any of the rules in self.parse_one_word(). This may be intended if phonemes and chars are both valid inputs, otherwise, you may see unexpected deletions in your input.
[NeMo W 2024-10-24 11:04:03 modelPT:165] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config : 
    dataset:
      _target_: nemo.collections.tts.torch.data.TTSDataset
      manifest_filepath: /ws/LJSpeech/nvidia_ljspeech_train_clean_ngc.json
      sample_rate: 22050
      sup_data_path: /raid/LJSpeech/supplementary
      sup_data_types:
      - align_prior_matrix
      - pitch
      n_fft: 1024
      win_length: 1024
  

[NeMo I 2024-10-24 11:04:03 features:289] PADDING: 1
[NeMo I 2024-10-24 11:04:03 save_restore_connector:249] Model FastPitchModel was successfully restored from /root/.cache/torch/NeMo/NeMo_1.23.0/tts_en_fastpitch_align/b7d086a07b5126c12d5077d9a641a38c/tts_en_fastpitch_align.nemo.
[NeMo I 2024-10-24 11:04:03 cloud:58] Found existing object /root/.cache/torch/NeMo/NeMo_1.23.0/tts_hifigan/e6da322f0f7e7dcf3f1900a9229a7e69/tts_hifigan.nemo.
[NeMo I 2024-10-24 11:04:03 cloud:64] Re-using file from: /root/.cache/torch/NeMo/NeMo_1.23.0/tts_hifigan/e6da322f0f7e7dcf3f1900a9229a7e69/tts_hifigan.nemo
[NeMo I 2024-10-24 11:04:03 common:924] Instantiating model from pre-trained checkpoint


[NeMo W 2024-10-24 11:04:11 modelPT:165] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config : 
    dataset:
      _target_: nemo.collections.tts.data.datalayers.MelAudioDataset
      manifest_filepath: /home/fkreuk/data/train_finetune.txt
      min_duration: 0.75
      n_segments: 8192
    dataloader_params:
      drop_last: false
      shuffle: true
      batch_size: 64
      num_workers: 4
    
[NeMo W 2024-10-24 11:04:11 modelPT:172] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s). 
    Validation config : 
    dataset:
      _target_: nemo.collections.tts.data.datalayers.MelAudioDataset
      manifest_filepath: /home/fkreuk/data/val_finetune.txt
      min_duration: 3
      n_segments: 66150


[NeMo I 2024-10-24 11:04:11 features:289] PADDING: 0


[NeMo W 2024-10-24 11:04:11 features:266] Using torch_stft is deprecated and has been removed. The values have been forcibly set to False for FilterbankFeatures and AudioToMelSpectrogramPreprocessor. Please set exact_pad to True as needed.


[NeMo I 2024-10-24 11:04:11 features:289] PADDING: 0
[NeMo I 2024-10-24 11:04:13 save_restore_connector:249] Model HifiGanModel was successfully restored from /root/.cache/torch/NeMo/NeMo_1.23.0/tts_hifigan/e6da322f0f7e7dcf3f1900a9229a7e69/tts_hifigan.nemo.
Converting text to speech: 'Welcome to the NVIDIA NeMo demonstration!'
✓ Audio generation successful!
Note: Audio playback is not available in this environment
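
Since playback is unavailable here, the generated waveform can be written to disk and played elsewhere. The following is a minimal sketch using only the standard library. It assumes the vocoder output has already been converted to a flat sequence of floats in the range [-1.0, 1.0] (for a PyTorch tensor, something like `audio.squeeze().cpu().tolist()`), and that the audio uses the 22050 Hz sample rate shown in the FastPitch config above. `write_wav` and the output file name are hypothetical, and a synthetic tone stands in for the real audio.

```python
import math
import struct
import wave


def write_wav(path, samples, sample_rate=22050):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM WAV."""
    frames = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))  # guard against overshoot
        frames += struct.pack("<h", int(clipped * 32767))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))


# Placeholder signal: one second of a 440 Hz tone instead of real vocoder output.
tone = [0.3 * math.sin(2 * math.pi * 440 * n / 22050) for n in range(22050)]
write_wav("nemo_tts_demo.wav", tone)
```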


In [104]:

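# Cleanup: collect unreachable objects, then release PyTorch's cached GPU memory
# (memory held by still-referenced models is not freed)
import gc
print("\n=== Cleanup ===")
gc.collect()
torch.cuda.empty_cache()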


=== Cleanup ===
