In [1]:
"""
You can either run this notebook locally (if you have all the dependencies and a GPU) or on Google Colab.

Instructions for setting up Colab are as follows:
1. Open a new Python 3 notebook.
2. Import this notebook from GitHub (File -> Upload Notebook -> "GITHUB" tab -> copy/paste GitHub URL)
3. Connect to an instance with a GPU (Runtime -> Change runtime type -> select "GPU" for hardware accelerator)
4. Run this cell to set up dependencies.
"""
# If you're using Google Colab and not running locally, run this cell

# install NeMo
!python -m pip install --upgrade git+https://github.com/NVIDIA/NeMo.git@main#egg=nemo_toolkit[nlp]

Collecting nemo_toolkit[nlp]
  Cloning https://github.com/NVIDIA/NeMo.git (to revision main) to /tmp/pip-install-t5_s8_64/nemo-toolkit
  Running command git clone -q https://github.com/NVIDIA/NeMo.git /tmp/pip-install-t5_s8_64/nemo-toolkit
Building wheels for collected packages: nemo-toolkit
  Building wheel for nemo-toolkit (setup.py) ... done
  Created wheel for nemo-toolkit: filename=nemo_toolkit-1.0.0a0-cp36-none-any.whl size=398162 sha256=0547674ef74d2c94f48cb4b849d2b06750ecf6effbc218ae91f1aeabdddc98e5
  Stored in directory: /tmp/pip-ephem-wheel-cache-scm6lg2t/wheels/7d/eb/d8/9e9a57ec1209168e720c7eeb7040f76c6417fff3ce490bfb05
Successfully built nemo-toolkit
Installing collected packages: nemo-toolkit
  Found existing installation: nemo-toolkit 1.0.0a0
    Uninstalling nemo-toolkit-1.0.0a0:
      Successfully uninstalled nemo-toolkit-1.0.0a0
Successfully installed nemo-toolkit-1.0.0a0


In [2]:
import os
import wget
from nemo.collections import nlp as nemo_nlp
from omegaconf import OmegaConf

[NeMo W 2020-09-09 16:10:53 experimental:28] Module <class 'nemo.collections.nlp.modules.common.huggingface.auto.AutoModelEncoder'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2020-09-09 16:10:53 experimental:28] Module <class 'nemo.collections.nlp.modules.common.megatron.megatron_bert.MegatronBertEncoder'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2020-09-09 16:10:53 experimental:28] Module <class 'nemo.collections.nlp.modules.common.sequence_token_classifier.SequenceTokenClassifier'> is experimental, not ready for production and is not fully supported. Use at your own risk.




# Language models

The field of Natural Language Processing (NLP) has made a huge leap in recent years thanks to transfer learning, which is enabled by pretrained language models.

[BERT](https://arxiv.org/abs/1810.04805), [RoBERTa](https://arxiv.org/abs/1907.11692), [Megatron-LM](https://arxiv.org/abs/1909.08053), and many other proposed language models achieve state-of-the-art results on many NLP tasks, such as:
* question answering
* sentiment analysis
* named entity recognition and many others.

In NeMo, most NLP models consist of a pretrained language model followed by a token classification layer, a sequence classification layer, or a combination of both. By changing the language model, you can improve the performance of your final model on the specific downstream task you are solving.
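
To make this structure concrete, here is a minimal PyTorch sketch. It is not NeMo's actual implementation: `TokenClassifierSketch` is a hypothetical class, and a plain `nn.Embedding` stands in for the pretrained language model so the example runs without downloading any weights. The only point is that a per-token linear layer sits on top of whatever encoder produces the hidden states.

```python
import torch
from torch import nn


class TokenClassifierSketch(nn.Module):
    """Hypothetical illustration: an encoder followed by a token classification layer."""

    def __init__(self, encoder: nn.Module, hidden_size: int, num_labels: int):
        super().__init__()
        self.encoder = encoder  # stand-in for a pretrained language model
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden_states = self.encoder(input_ids)  # [batch, seq_len, hidden_size]
        return self.classifier(hidden_states)    # [batch, seq_len, num_labels]


# Toy encoder (assumption): an embedding lookup instead of a real transformer.
toy_encoder = nn.Embedding(num_embeddings=100, embedding_dim=16)
model = TokenClassifierSketch(toy_encoder, hidden_size=16, num_labels=5)

input_ids = torch.randint(0, 100, (2, 7))  # batch of 2 sequences, 7 tokens each
logits = model(input_ids)
print(logits.shape)
```

Swapping in a different encoder leaves the classification layer unchanged, as long as `hidden_size` matches the width of the encoder's output.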

With NeMo, you can either pretrain a BERT model on your own data or use a pretrained language model from the [HuggingFace transformers](https://github.com/huggingface/transformers) or [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) libraries.

Let's take a look at the list of available pretrained language models (the complete list of HuggingFace models can be found at [https://huggingface.co/models](https://huggingface.co/models)):


In [3]:
nemo_nlp.modules.get_pretrained_lm_models_list(include_external=True)

['megatron-bert-345m-uncased',
 'megatron-bert-345m-cased',
 'megatron-bert-uncased',
 'megatron-bert-cased',
 'biomegatron-bert-345m-uncased',
 'biomegatron-bert-345m-cased',
 'bert-base-uncased',
 'bert-large-uncased',
 'bert-base-cased',
 'bert-large-cased',
 'bert-base-multilingual-uncased',
 'bert-base-multilingual-cased',
 'bert-base-chinese',
 'bert-base-german-cased',
 'bert-large-uncased-whole-word-masking',
 'bert-large-cased-whole-word-masking',
 'bert-large-uncased-whole-word-masking-finetuned-squad',
 'bert-large-cased-whole-word-masking-finetuned-squad',
 'bert-base-cased-finetuned-mrpc',
 'bert-base-german-dbmdz-cased',
 'bert-base-german-dbmdz-uncased',
 'cl-tohoku/bert-base-japanese',
 'cl-tohoku/bert-base-japanese-whole-word-masking',
 'cl-tohoku/bert-base-japanese-char',
 'cl-tohoku/bert-base-japanese-char-whole-word-masking',
 'TurkuNLP/bert-base-finnish-cased-v1',
 'TurkuNLP/bert-base-finnish-uncased-v1',
 'wietsedv/bert-base-dutch-cased',
 'facebook/bart-base',
 '

NLP models for downstream tasks use the `get_lm_model` helper function, which makes it easy to switch from one language model in the list above to another:

In [4]:
# use any pretrained model name from the list above
nemo_nlp.modules.get_lm_model(pretrained_model_name='distilbert-base-uncased')

DistilBertEncoder(
  (embeddings): Embeddings(
    (word_embeddings): Embedding(30522, 768, padding_idx=0)
    (position_embeddings): Embedding(512, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (transformer): Transformer(
    (layer): ModuleList(
      (0): TransformerBlock(
        (attention): MultiHeadSelfAttention(
          (dropout): Dropout(p=0.1, inplace=False)
          (q_lin): Linear(in_features=768, out_features=768, bias=True)
          (k_lin): Linear(in_features=768, out_features=768, bias=True)
          (v_lin): Linear(in_features=768, out_features=768, bias=True)
          (out_lin): Linear(in_features=768, out_features=768, bias=True)
        )
        (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (ffn): FFN(
          (dropout): Dropout(p=0.1, inplace=False)
          (lin1): Linear(in_features=768, out_features=3072, bias=True)
          (lin2): Linear

All NeMo [NLP models](https://github.com/NVIDIA/NeMo/tree/main/examples/nlp) have an associated config file. As an example, let's examine the config file for the Named Entity Recognition (NER) model (more details about the model and the NER task can be found [here](https://github.com/NVIDIA/NeMo/blob/main/tutorials/nlp/Token_Classification_Named_Entity_Recognition.ipynb)).

In [5]:
MODEL_CONFIG = "token_classification_config.yaml"

# download the model's configuration file 
if not os.path.exists(MODEL_CONFIG):
    print('Downloading config file...')
    wget.download('https://raw.githubusercontent.com/NVIDIA/NeMo/main/examples/nlp/token_classification/conf/' + MODEL_CONFIG)
else:
    print('Config file already exists')

Config file already exists


In [6]:
# this line will print the entire config of the model
config = OmegaConf.load(MODEL_CONFIG)
print(OmegaConf.to_yaml(config))

trainer:
  gpus: 1
  num_nodes: 1
  max_epochs: 5
  max_steps: null
  accumulate_grad_batches: 1
  gradient_clip_val: 0.0
  amp_level: O0
  precision: 16
  distributed_backend: ddp
  checkpoint_callback: false
  logger: false
  row_log_interval: 1
  val_check_interval: 1.0
  resume_from_checkpoint: null
exp_manager:
  exp_dir: null
  name: token_classification_model
  create_tensorboard_logger: true
  create_checkpoint_callback: true
model:
  nemo_path: null
  label_ids: null
  dataset:
    data_dir: ???
    class_balancing: null
    max_seq_length: 128
    pad_label: O
    ignore_extra_tokens: false
    ignore_start_end: false
    use_cache: true
    num_workers: 2
    pin_memory: false
    drop_last: false
  train_ds:
    text_file: text_train.txt
    labels_file: labels_train.txt
    shuffle: true
    num_samples: -1
    batch_size: 64
  validation_ds:
    text_file: text_dev.txt
    labels_file: labels_dev.txt
    shuffle: false
    num_samples: -1
    batch_size: 64
  tokenizer:
 

For this tutorial, we are interested in the `language_model` part of the Named Entity Recognition model.

In [7]:
print(OmegaConf.to_yaml(config.model.language_model))

pretrained_model_name: bert-base-uncased
lm_checkpoint: null
config_file: null
config: null



There might be slight differences from one model to another, but most of them have the following important parameters associated with the language model:
* `pretrained_model_name` - the name of a pretrained model from either the HuggingFace or Megatron-LM libraries, for example, `bert-base-uncased` or `megatron-bert-345m-uncased`
* `lm_checkpoint` - a path to the pretrained model checkpoint if, for example, you trained a BERT model on your own data
* `config_file` - a path to the model configuration file
* `config` or `config_dict` - the model configuration as a dictionary

To modify the default language model, specify the desired language model name with the `model.language_model.pretrained_model_name` argument, like this:

In [8]:
config.model.language_model.pretrained_model_name = 'roberta-base'

and then start the training as usual (please see [tutorials/nlp](https://github.com/NVIDIA/NeMo/tree/main/tutorials/nlp) for more details about training a particular model). 

You can also provide a pretrained language model checkpoint and a configuration file if available.

Note that `pretrained_model_name` is used to set up both the language model and the tokenizer.

All the above holds for both HuggingFace and Megatron-LM pretrained language models. Let's separately examine some specifics of finetuning with Megatron-LM and a HuggingFace model.

# Downstream tasks with Megatron and BioMegatron LM

[Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) is a large, powerful transformer developed by the Applied Deep Learning Research team at NVIDIA. More details can be found in the [Megatron-LM GitHub repo](https://github.com/NVIDIA/Megatron-LM).

To see the list of available Megatron-LM models in NeMo, run:


In [9]:
nemo_nlp.modules.get_megatron_lm_models_list()

['megatron-bert-345m-uncased',
 'megatron-bert-345m-cased',
 'megatron-bert-uncased',
 'megatron-bert-cased',
 'biomegatron-bert-345m-uncased',
 'biomegatron-bert-345m-cased']

If you want to use one of the available Megatron-LM models, specify its name with the `model.language_model.pretrained_model_name` argument, for example:

In [10]:
config.model.language_model.pretrained_model_name = 'megatron-bert-345m-uncased'

If you have a different checkpoint or a model configuration file, use these general Megatron-LM model names:
* `megatron-bert-uncased` or 
* `megatron-bert-cased` 

and provide the associated `bert_config` and `bert_checkpoint` files, as follows:

`model.language_model.pretrained_model_name=megatron-bert-uncased \
model.language_model.lm_checkpoint=<PATH_TO_CHECKPOINT> \
model.language_model.config_file=<PATH_TO_CONFIG>`
 
 or 
 
`model.language_model.pretrained_model_name=megatron-bert-cased \
model.language_model.lm_checkpoint=<PATH_TO_CHECKPOINT> \
model.language_model.config_file=<PATH_TO_CONFIG>`

The general Megatron-LM model names are used to download the correct vocabulary file needed to set up the model correctly. Note that data preprocessing and model training are done in NeMo. Megatron-LM has its own set of training arguments (including the tokenizer) that are ignored during finetuning in NeMo. Please see the downstream task [config files and training scripts](https://github.com/NVIDIA/NeMo/tree/main/examples/nlp) for all NeMo-supported arguments.
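
The overrides above can also be assembled programmatically. The helper below is a hypothetical convenience function, not part of NeMo: it picks the general model name from a `cased` flag and refuses to continue if either path is empty, which is a design choice of this sketch rather than documented NeMo behavior. The paths in the usage lines are placeholders.

```python
from typing import List


def megatron_overrides(cased: bool, checkpoint: str, config_file: str) -> List[str]:
    """Hypothetical helper: build the three language_model overrides shown above."""
    if not checkpoint or not config_file:
        # Assumption of this sketch: a custom checkpoint needs its config file too.
        raise ValueError("Both a checkpoint path and a config file path are required.")
    name = "megatron-bert-cased" if cased else "megatron-bert-uncased"
    return [
        f"model.language_model.pretrained_model_name={name}",
        f"model.language_model.lm_checkpoint={checkpoint}",
        f"model.language_model.config_file={config_file}",
    ]


# Placeholder paths, to be replaced with your own files.
args = megatron_overrides(
    cased=False,
    checkpoint="/path/to/checkpoint",
    config_file="/path/to/config",
)

# Print the overrides in the backslash-continued form used above.
print(" \\\n".join(args))
```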

## Download pretrained model

With NeMo, the original and domain-specific Megatron-LM BERT models and model configuration files are downloaded automatically, but they can also be downloaded from the links below:

[Megatron-LM BERT Uncased 345M (~345M parameters): https://ngc.nvidia.com/catalog/models/nvidia:megatron_bert_345m](https://ngc.nvidia.com/catalog/models/nvidia:megatron_bert_345m/files?version=v0.1_uncased)

[Megatron-LM BERT Cased 345M (~345M parameters): https://ngc.nvidia.com/catalog/models/nvidia:megatron_bert_345m](https://ngc.nvidia.com/catalog/models/nvidia:megatron_bert_345m/files?version=v0.1_cased)

[BioMegatron-LM BERT Cased 345M (~345M parameters): https://ngc.nvidia.com/catalog/models/nvidia:biomegatron345mcased](https://ngc.nvidia.com/catalog/models/nvidia:biomegatron345mcased)

[BioMegatron-LM BERT Uncased 345M (~345M parameters): https://ngc.nvidia.com/catalog/models/nvidia:biomegatron345muncased](https://ngc.nvidia.com/catalog/models/nvidia:biomegatron345muncased)

# Using any HuggingFace Pretrained Model

Currently, four HuggingFace language models have the most extensive support in [NeMo](https://github.com/NVIDIA/NeMo/tree/main/nemo/collections/nlp/modules/common/huggingface):

* BERT
* RoBERTa
* ALBERT
* DistilBERT

As mentioned before, just set `model.language_model.pretrained_model_name` to the desired model name in your config, and `get_lm_model()` will take care of the rest.

If you want to use another language model from [https://huggingface.co/models](https://huggingface.co/models), NeMo will use `AutoModelEncoder`.

In [11]:
new_model_name = 't5-small'
# change your config like this:
# model.language_model.pretrained_model_name = new_model_name
nemo_nlp.modules.get_lm_model(pretrained_model_name=new_model_name)

[NeMo W 2020-09-09 16:10:57 lm_utils:69] t5-small is not in get_pretrained_lm_models_list(include_external=False), will be using AutoModel from HuggingFace.
Some weights of T5Model were not initialized from the model checkpoint at t5-small and are newly initialized: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[NeMo I 2020-09-09 16:11:00 huggingface_utils:102] Using HuggingFace AutoModel


AutoModelEncoder(
  (lm_model): T5Model(
    (shared): Embedding(32128, 512)
    (encoder): T5Stack(
      (embed_tokens): Embedding(32128, 512)
      (block): ModuleList(
        (0): T5Block(
          (layer): ModuleList(
            (0): T5LayerSelfAttention(
              (SelfAttention): T5Attention(
                (q): Linear(in_features=512, out_features=512, bias=False)
                (k): Linear(in_features=512, out_features=512, bias=False)
                (v): Linear(in_features=512, out_features=512, bias=False)
                (o): Linear(in_features=512, out_features=512, bias=False)
                (relative_attention_bias): Embedding(32, 8)
              )
              (layer_norm): T5LayerNorm()
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (1): T5LayerFF(
              (DenseReluDense): T5DenseReluDense(
                (wi): Linear(in_features=512, out_features=2048, bias=False)
                (wo): Linear(in_features=2048, out

Then continue training your PyTorch Lightning model as usual. More details on model training can be found in the [tutorials](https://github.com/NVIDIA/NeMo/tree/main/tutorials).