# Train the model to restore punctuation and capitalization

## Install dependencies

We will use NeMo's punctuation and capitalization model and the Tatoeba dataset.

In [1]:
%%capture

BRANCH = 'main'
!python -m pip install git+https://github.com/NVIDIA/NeMo.git@$BRANCH#egg=nemo_toolkit[nlp]

In [2]:
%%capture
# can take up to 10 minutes

!git clone https://github.com/NVIDIA/apex
%cd apex

!pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--fast_layer_norm" ./

In [1]:
from nemo.utils.exp_manager import exp_manager
from nemo.collections import nlp as nemo_nlp

import os
import wget 
import torch
import pytorch_lightning as pl
from omegaconf import OmegaConf

[NeMo W 2022-04-02 11:15:25 experimental:28] Module <class 'nemo.collections.nlp.data.language_modeling.megatron.megatron_batch_samplers.MegatronPretrainingRandomBatchSampler'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2022-04-02 11:15:31 __init__:23] `pynini` is not installed ! 
    Please run the `nemo_text_processing/setup.sh` script prior to usage of this toolkit.


## Download the data

To prepare the data, please refer to the Habr article (in Russian) or to the official NeMo documentation.

The adapted Python scripts can be found in averkij's GitHub repo.

We'll download the previously prepared training data.
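
As a rough illustration of what that preparation produces, the sketch below turns a punctuated sentence into a lowercased text line and a matching label line. It assumes NeMo's two-character label format: the first character is the punctuation mark that follows the word (`O` for none), the second is `U` if the word is capitalized and `O` otherwise. The helper name `make_example` and the set of marks are hypothetical and are not taken from the original scripts.

```python
import re

# Punctuation marks the model is trained to restore (assumed from the label set).
PUNCT_MARKS = ".,!?"


def make_example(sentence: str) -> tuple[str, str]:
    """Convert a punctuated sentence into a (text, labels) training pair."""
    words, labels = [], []
    for token in sentence.split():
        # Punctuation label: the trailing mark, or "O" if there is none.
        punct = token[-1] if token[-1] in PUNCT_MARKS else "O"
        # Keep only word characters (letters, digits, underscore).
        core = re.sub(r"[^\w]", "", token)
        if not core:
            continue  # the token was pure punctuation
        # Capitalization label: "U" if the word starts with an uppercase letter.
        capit = "U" if core[0].isupper() else "O"
        words.append(core.lower())
        labels.append(punct + capit)
    return " ".join(words), " ".join(labels)


text, labels = make_example("Мне пора идти спать. Что ты делаешь?")
print(text)
print(labels)
```

The real scripts may treat hyphens, quotes, and numbers differently, so use this only to get a feel for the format.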

In [33]:
%cd /content
!gdown --id 1LDfesVeRco_YcP_bvzfFSJ7-BhFgnXC5

%mkdir data

/content
Downloading...
From: https://drive.google.com/uc?id=1LDfesVeRco_YcP_bvzfFSJ7-BhFgnXC5
To: /content/dataset.zip
100% 3.45M/3.45M [00:00<00:00, 170MB/s]


In [34]:
!unzip /content/dataset.zip -d /content/data

Archive:  /content/dataset.zip
replace /content/data/labels_dev.txt? [y]es, [n]o, [A]ll, [N]one, [r]ename: A
  inflating: /content/data/labels_dev.txt  
  inflating: /content/data/labels_train.txt  
  inflating: /content/data/text_dev.txt  
  inflating: /content/data/text_train.txt  


In [2]:
DATA_DIR = "/content/data"
WORK_DIR = "/content/work"

In [3]:
! ls -l $DATA_DIR

total 67868
-rw-r--r-- 1 root root  3421144 Apr  2 10:48 cached.text_dev.DistilBertTokenizer.max_seq_length512.vocab30522.num_samples10000.punctuation_capitalization.pkl
-rw-r--r-- 1 root root 21846208 Apr  2 11:13 cached.text_train.DistilBertTokenizer.max_seq_length512.vocab30522.num_samples50000.punctuation_capitalization.pkl
-rw-r--r-- 1 root root 27503273 Apr  2 10:48 cached.text_train.DistilBertTokenizer.max_seq_length512.vocab30522.num_samples80000.punctuation_capitalization.pkl
drwxr-xr-x 2 root root     4096 Apr  2 10:48 label_id_files_for_nemo_checkpoint
-rw-r--r-- 1 root root   726188 Apr  2  2022 labels_dev.txt
-rw-r--r-- 1 root root  2935193 Apr  2  2022 labels_train.txt
-rw-r--r-- 1 root root  2538430 Apr  2  2022 text_dev.txt
-rw-r--r-- 1 root root 10506330 Apr  2  2022 text_train.txt


In [4]:
# let's take a look at the data 
print('Text:')
! head -n 5 $DATA_DIR/text_train.txt

print('\nLabels:')
! head -n 5 $DATA_DIR/labels_train.txt

Text:
один раз в жизни я делаю хорошее дело и оно бесполезно давайте чтонибудь попробуем
мне пора идти спать что ты делаешь
что это сегодня 18 июня и это день рождения мюриэл
с днём рождения мюриэл мюриэл сейчас 20
пароль muiriel у меня нет слов

Labels:
OU OO OO OO OO OO OO .O OU OO .O OU OO !O
OU OO OO .O OU OO ?O
OU ?O OU OO ,O OO OO OO OO !U
OU OO ,O !U OU OO .O
OU .U OU OO OO .O
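
Each line in the labels file has one two-character label per word of the corresponding text line. To see how the two files line up, here is a minimal sketch that applies a label line back to its text line. The helper `apply_labels` is hypothetical and assumes the format described in the NeMo documentation (punctuation after the word first, capitalization flag second).

```python
def apply_labels(text: str, labels: str) -> str:
    """Restore punctuation and capitalization from a text line and its label line."""
    words = text.split()
    tags = labels.split()
    if len(words) != len(tags):
        raise ValueError("text and labels must contain the same number of items")

    restored = []
    for word, tag in zip(words, tags):
        punct, capit = tag[0], tag[1]
        if capit == "U":
            # Uppercase only the first character, leave the rest untouched.
            word = word[:1].upper() + word[1:]
        if punct != "O":
            word += punct
        restored.append(word)
    return " ".join(restored)


print(apply_labels("мне пора идти спать что ты делаешь", "OU OO OO .O OU OO ?O"))
```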


# Model Configuration

In the Punctuation and Capitalization Model, we are jointly training two token-level classifiers on top of the pretrained [BERT](https://arxiv.org/pdf/1810.04805.pdf) model: 
- one classifier to predict punctuation, and
- the other to predict capitalization.

The model is defined in a config file which declares multiple important sections. They are:
- **model**: All arguments that are related to the Model - language model, token classifiers, optimizer and schedulers, dataset and any other related information

- **trainer**: Any argument to be passed to PyTorch Lightning

See [docs](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/punctuation_and_capitalization.html#training-punctuation-and-capitalization-model) for full config description.

In [5]:
MODEL_CONFIG = "punctuation_capitalization_config.yaml"
TOKENS_IN_BATCH = 1024
MAX_SEQ_LENGTH = 64
LEARNING_RATE = 0.00002
NUM_SAMPLES = 50000

config_dir = WORK_DIR + '/configs/'
os.makedirs(config_dir, exist_ok=True)
if not os.path.exists(config_dir + MODEL_CONFIG):
    print('Downloading config file...')
    wget.download(f'https://raw.githubusercontent.com/NVIDIA/NeMo/{BRANCH}/examples/nlp/token_classification/conf/' + MODEL_CONFIG, config_dir)
else:
    print('Config file already exists')

Config file already exists


In [6]:
config_path = f'{WORK_DIR}/configs/{MODEL_CONFIG}'
config = OmegaConf.load(config_path)
config.model.train_ds.ds_item = DATA_DIR
config.model.validation_ds.ds_item = DATA_DIR

del config.model.test_ds

# Building the PyTorch Lightning Trainer

NeMo models are primarily PyTorch Lightning modules and are therefore fully compatible with the PyTorch Lightning ecosystem.

Let's first instantiate a Trainer object!

In [9]:
cuda = 1 if torch.cuda.is_available() else 0
config.trainer.gpus = cuda
config.trainer.precision = 16 if torch.cuda.is_available() else 32
config.trainer.strategy = 'dp'

trainer = pl.Trainer(**config.trainer)

      f"The flag `devices={devices}` will be ignored, "
    
Using 16bit native Automatic Mixed Precision (AMP)
      "Setting `max_steps = None` is deprecated in v1.5 and will no longer be supported in v1.7."
    
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..


# Setting up a NeMo Experiment

NeMo has an experiment manager that handles logging and checkpointing for us, so let's use it!

In [10]:
exp_dir = exp_manager(trainer, config.get("exp_manager", None))
exp_dir = str(exp_dir)

      "`Trainer.num_gpus` was deprecated in v1.6 and will be removed in v1.8."
    


[NeMo I 2022-04-02 11:16:05 exp_manager:283] Experiments will be logged at /content/nemo_experiments/Punctuation_and_Capitalization/2022-04-02_11-16-05
[NeMo I 2022-04-02 11:16:05 exp_manager:649] TensorboardLogger has been set up


      rank_zero_deprecation("`Trainer.weights_save_path` has been deprecated in v1.6 and will be removed in v1.8.")
    
[NeMo W 2022-04-02 11:16:05 exp_manager:882] The checkpoint callback was told to monitor a validation value and trainer's max_steps was set to -1. Please ensure that max_steps will run for at least 1 epochs to ensure that checkpointing will not error out.


# Model Training

Before initializing the model, we might want to modify some of the model settings. For example, we might want to swap the pretrained BERT model for another one.

In [11]:
print(nemo_nlp.modules.get_pretrained_lm_models_list())

# change to the appropriate model
PRETRAINED_BERT_MODEL = "DeepPavlov/distilrubert-tiny-cased-conversational-v1"

['bert-base-uncased', 'bert-large-uncased', 'bert-base-cased', 'bert-large-cased', 'bert-base-multilingual-uncased', 'bert-base-multilingual-cased', 'bert-base-chinese', 'bert-base-german-cased', 'bert-large-uncased-whole-word-masking', 'bert-large-cased-whole-word-masking', 'bert-large-uncased-whole-word-masking-finetuned-squad', 'bert-large-cased-whole-word-masking-finetuned-squad', 'bert-base-cased-finetuned-mrpc', 'bert-base-german-dbmdz-cased', 'bert-base-german-dbmdz-uncased', 'cl-tohoku/bert-base-japanese', 'cl-tohoku/bert-base-japanese-whole-word-masking', 'cl-tohoku/bert-base-japanese-char', 'cl-tohoku/bert-base-japanese-char-whole-word-masking', 'TurkuNLP/bert-base-finnish-cased-v1', 'TurkuNLP/bert-base-finnish-uncased-v1', 'wietsedv/bert-base-dutch-cased', 'distilbert-base-uncased', 'distilbert-base-uncased-distilled-squad', 'distilbert-base-cased', 'distilbert-base-cased-distilled-squad', 'distilbert-base-german-cased', 'distilbert-base-multilingual-cased', 'distilbert-base

In [12]:
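NUM_SAMPLES = 50000

# NOTE: the Trainer was already created from config.trainer above, so changing
# max_epochs here does not affect it. To train for 15 epochs, set this value
# before calling pl.Trainer(**config.trainer).
config.trainer.max_epochs = 15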
config.model.language_model.pretrained_model_name = PRETRAINED_BERT_MODEL
config.model.train_ds.tokens_in_batch = TOKENS_IN_BATCH
config.model.validation_ds.tokens_in_batch = TOKENS_IN_BATCH
config.model.optim.lr = LEARNING_RATE
config.model.train_ds.num_samples = NUM_SAMPLES
config.model.validation_ds.num_samples = 10000

Now we are ready to initialize the model. During initialization, the datasets and data loaders will be prepared for training and evaluation.
The pretrained BERT model will also be downloaded; this can take up to a few minutes, depending on the size of the chosen model.
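
The `tokens_in_batch` setting above limits a batch by its padded token count rather than by the number of sentences. To get an intuition for it, here is a simplified sketch of batching under a token budget. It is a hypothetical illustration, not NeMo's implementation: the function name `batch_by_token_budget` is made up, and the real dataset code also counts special tokens and tries to keep batch sizes a multiple of 8.

```python
def batch_by_token_budget(lengths: list[int], tokens_in_batch: int) -> list[list[int]]:
    """Group sequence indices so that batch_size * longest_sequence <= tokens_in_batch."""
    # Sorting by length keeps similar sequences together and reduces padding.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])

    batches, current, current_max = [], [], 0
    for idx in order:
        new_max = max(current_max, lengths[idx])
        # Padded size of the batch if this sequence were added to it.
        if current and new_max * (len(current) + 1) > tokens_in_batch:
            batches.append(current)
            current, new_max = [], lengths[idx]
        current.append(idx)
        current_max = new_max
    if current:
        batches.append(current)
    return batches


# Five sequences with a budget of 16 padded tokens per batch.
print(batch_by_token_budget([4, 4, 8, 8, 16], tokens_in_batch=16))
```

With a fixed budget, batches of short sentences hold many samples and batches of long sentences hold few, which is why the number of batches depends on the data and not only on `num_samples`.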

In [13]:
model = nemo_nlp.models.PunctuationCapitalizationModel(cfg=config.model, trainer=trainer)

[NeMo I 2022-04-02 11:16:10 tokenizer_utils:130] Getting HuggingFace AutoTokenizer with pretrained_model_name: DeepPavlov/distilrubert-tiny-cased-conversational-v1, vocab_file: None, special_tokens_dict: {}, and use_fast: False


Using eos_token, but it is not set yet.
Using bos_token, but it is not set yet.
      "`Trainer.num_gpus` was deprecated in v1.6 and will be removed in v1.8."
    


[NeMo I 2022-04-02 11:16:11 punctuation_capitalization_dataset:984] Features restored from /content/data/cached.text_train.DistilBertTokenizer.max_seq_length512.vocab30522.num_samples50000.punctuation_capitalization.pkl


Batch mark up:   0%|          | 0/50000 [00:00<?, ?query/s][NeMo W 2022-04-02 11:16:12 punctuation_capitalization_dataset:1211] Could not create batch with multiple of 8 size. Probably there is a too long sequence in the dataset. current_max_length=136. Batch size will be reduced to 7. tokens_in_batch=1024. The batch includes sequences from 49984 to 49990.
[NeMo W 2022-04-02 11:16:12 punctuation_capitalization_dataset:1211] Could not create batch with multiple of 8 size. Probably there is a too long sequence in the dataset. current_max_length=160. Batch size will be reduced to 7. tokens_in_batch=1024. The batch includes sequences from 49991 to 49997.
Batch mark up: 100%|██████████| 50000/50000 [00:00<00:00, 479995.24query/s]
Batch building: 100%|██████████| 1035/1035 [00:01<00:00, 967.11batch/s]
      cpuset_checked))
    


[NeMo I 2022-04-02 11:16:13 punctuation_capitalization_dataset:984] Features restored from /content/data/cached.text_dev.DistilBertTokenizer.max_seq_length512.vocab30522.num_samples10000.punctuation_capitalization.pkl


Batch mark up: 100%|██████████| 10000/10000 [00:00<00:00, 373162.04query/s]
Batch building: 100%|██████████| 125/125 [00:00<00:00, 568.15batch/s]
[NeMo W 2022-04-02 11:16:13 lm_utils:80] DeepPavlov/distilrubert-tiny-cased-conversational-v1 is not in get_pretrained_lm_models_list(include_external=False), will be using AutoModel from HuggingFace.
Some weights of the model checkpoint at DeepPavlov/distilrubert-tiny-cased-conversational-v1 were not used when initializing DistilBertModel: ['vocab_layer_norm.weight', 'vocab_transform.bias', 'vocab_projector.weight', 'vocab_projector.bias', 'vocab_transform.weight', 'vocab_layer_norm.bias']
- This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly 

In [14]:
trainer.fit(model)

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[NeMo W 2022-04-02 11:16:19 modelPT:497] The lightning trainer received accelerator: <pytorch_lightning.accelerators.gpu.GPUAccelerator object at 0x7f99609c8a90>. We recommend to use 'ddp' instead.
      "`Trainer.num_gpus` was deprecated in v1.6 and will be removed in v1.8."
    


[NeMo I 2022-04-02 11:16:19 modelPT:587] Optimizer config = Adam (
    Parameter Group 0
        amsgrad: False
        betas: (0.9, 0.999)
        eps: 1e-08
        lr: 2e-05
        weight_decay: 0.0
    )
[NeMo I 2022-04-02 11:16:19 lr_scheduler:837] Scheduler "<nemo.core.optim.lr_scheduler.WarmupAnnealing object at 0x7f994fec4b10>" 
    will be used during training (effective maximum steps = 3105) - 
    Parameters : 
    (warmup_steps: null
    warmup_ratio: 0.1
    last_epoch: -1
    max_steps: 3105
    )
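
The step count in the scheduler log can be checked by hand. The training set was split into 1035 batches, and 3105 / 1035 = 3, so this run covers 3 epochs. The warmup length computed below assumes the scheduler truncates `warmup_ratio * max_steps` to an integer; that is an assumption and is not confirmed by the log.

```python
# Values copied from the logs above.
batches_per_epoch = 1035   # "Batch building: 1035/1035"
max_steps = 3105           # "effective maximum steps = 3105"
warmup_ratio = 0.1

# Number of epochs implied by the scheduler's step count.
epochs = max_steps // batches_per_epoch
assert epochs * batches_per_epoch == max_steps

# Assumption: warmup steps are the truncated product of ratio and max steps.
warmup_steps = int(warmup_ratio * max_steps)

print(f"epochs: {epochs}, warmup steps (assumed rounding): {warmup_steps}")
```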



  | Name             | Type              | Params
-------------------------------------------------------
0 | metrics          | ModuleDict        | 0     
1 | bert_model       | DistilBertEncoder | 10.3 M
2 | punct_classifier | TokenClassifier   | 1.3 K 
3 | capit_classifier | TokenClassifier   | 530   
4 | loss             | CrossEntropyLoss  | 0     
5 | agg_loss         | AggregatorLoss    | 0     
-------------------------------------------------------
10.3 M    Trainable params
0         Non-trainable params
10.3 M    Total params
20.591    Total estimated model params size (MB)


Sanity Checking: 0it [00:00, ?it/s]

      cpuset_checked))
    


[NeMo I 2022-04-02 11:16:20 punctuation_capitalization_model:333] Punctuation report: 
    label                                                precision    recall       f1           support   
    O (label_id: 0)                                         72.80      45.21      55.78       1024
    ! (label_id: 1)                                          0.00       0.00       0.00          6
    , (label_id: 2)                                          6.07      16.30       8.85         92
    . (label_id: 3)                                          9.29      13.64      11.05        154
    ? (label_id: 4)                                          0.00       0.00       0.00         16
    -------------------
    micro avg                                               38.62      38.62      38.62       1292
    macro avg                                               17.63      15.03      15.14       1292
    weighted avg                                            59.24      38.62      46.16  

Training: 0it [00:00, ?it/s]

    


Validation: 0it [00:00, ?it/s]

[NeMo I 2022-04-02 11:17:05 punctuation_capitalization_model:333] Punctuation report: 
    label                                                precision    recall       f1           support   
    O (label_id: 0)                                         97.44      98.47      97.95      47166
    ! (label_id: 1)                                         20.00       0.59       1.15        339
    , (label_id: 2)                                         91.73      58.27      71.27       3731
    . (label_id: 3)                                         80.46      95.75      87.45       8504
    ? (label_id: 4)                                         69.04      49.40      57.59       1490
    -------------------
    micro avg                                               93.91      93.91      93.91      61230
    macro avg                                               71.74      60.50      63.08      61230
    weighted avg                                            93.62      93.91      93.35  


Batch mark up:   0%|          | 0/50000 [00:00<?, ?query/s]
[NeMo W 2022-04-02 11:17:06 punctuation_capitalization_dataset:1211] Could not create batch with multiple of 8 size. Probably there is a too long sequence in the dataset. current_max_length=136. Batch size will be reduced to 7. tokens_in_batch=1024. The batch includes sequences from 49984 to 49990.
[NeMo W 2022-04-02 11:17:06 punctuation_capitalization_dataset:1211] Could not create batch with multiple of 8 size. Probably there is a too long sequence in the dataset. current_max_length=160. Batch size will be reduced to 7. tokens_in_batch=1024. The batch includes sequences from 49991 to 49997.
Batch mark up: 100%|██████████| 50000/50000 [00:00<00:00, 542637.28query/s]

Batch building:   0%|          | 0/1035 [00:00<?, ?batch/s]
Batch building:   6%|▌         | 64/1035 [00:00<00:01, 639.86batch/s]
Batch building:  13%|█▎        | 135/1035 [00:00<00:01, 677.93batch/s]
Batch building:  20%|█▉        | 206/1035 [00:00<0

Validation: 0it [00:00, ?it/s]

[NeMo I 2022-04-02 11:17:52 punctuation_capitalization_model:333] Punctuation report: 
    label                                                precision    recall       f1           support   
    O (label_id: 0)                                         97.92      98.25      98.09      47166
    ! (label_id: 1)                                         20.93       2.65       4.71        339
    , (label_id: 2)                                         91.29      60.65      72.88       3731
    . (label_id: 3)                                         81.51      95.31      87.87       8504
    ? (label_id: 4)                                         68.26      65.97      67.10       1490
    -------------------
    micro avg                                               94.24      94.24      94.24      61230
    macro avg                                               71.98      64.57      66.13      61230
    weighted avg                                            94.09      94.24      93.86  


Batch mark up:   0%|          | 0/50000 [00:00<?, ?query/s]
[NeMo W 2022-04-02 11:17:52 punctuation_capitalization_dataset:1211] Could not create batch with multiple of 8 size. Probably there is a too long sequence in the dataset. current_max_length=136. Batch size will be reduced to 7. tokens_in_batch=1024. The batch includes sequences from 49984 to 49990.
[NeMo W 2022-04-02 11:17:52 punctuation_capitalization_dataset:1211] Could not create batch with multiple of 8 size. Probably there is a too long sequence in the dataset. current_max_length=160. Batch size will be reduced to 7. tokens_in_batch=1024. The batch includes sequences from 49991 to 49997.
Batch mark up: 100%|██████████| 50000/50000 [00:00<00:00, 541944.55query/s]

Batch building:   0%|          | 0/1035 [00:00<?, ?batch/s]
Batch building:   5%|▍         | 50/1035 [00:00<00:01, 497.18batch/s]
Batch building:  11%|█         | 111/1035 [00:00<00:01, 562.34batch/s]
Batch building:  17%|█▋        | 180/1035 [00:00<0

Validation: 0it [00:00, ?it/s]

[NeMo I 2022-04-02 11:18:37 punctuation_capitalization_model:333] Punctuation report: 
    label                                                precision    recall       f1           support   
    O (label_id: 0)                                         98.01      98.22      98.11      47166
    ! (label_id: 1)                                         28.30       4.42       7.65        339
    , (label_id: 2)                                         91.95      61.81      73.92       3731
    . (label_id: 3)                                         81.38      95.94      88.06       8504
    ? (label_id: 4)                                         69.48      64.16      66.71       1490
    -------------------
    micro avg                                               94.33      94.33      94.33      61230
    macro avg                                               73.82      64.91      66.89      61230
    weighted avg                                            94.25      94.33      93.98  


Batch mark up:   0%|          | 0/50000 [00:00<?, ?query/s]
Batch mark up:  92%|█████████▏| 46106/50000 [00:00<00:00, 461036.26query/s]
[NeMo W 2022-04-02 11:18:38 punctuation_capitalization_dataset:1211] Could not create batch with multiple of 8 size. Probably there is a too long sequence in the dataset. current_max_length=136. Batch size will be reduced to 7. tokens_in_batch=1024. The batch includes sequences from 49984 to 49990.
[NeMo W 2022-04-02 11:18:38 punctuation_capitalization_dataset:1211] Could not create batch with multiple of 8 size. Probably there is a too long sequence in the dataset. current_max_length=160. Batch size will be reduced to 7. tokens_in_batch=1024. The batch includes sequences from 49991 to 49997.
Batch mark up: 100%|██████████| 50000/50000 [00:00<00:00, 416600.02query/s]

Batch building:   0%|          | 0/1035 [00:00<?, ?batch/s]
Batch building:   6%|▋         | 67/1035 [00:00<00:01, 669.24batch/s]
Batch building:  13%|█▎        | 136/1035 [00

## Test

For better quality, make sure that your training dataset is balanced and train the model for more epochs.
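
The imbalance is visible in the last validation report above: the `!` class has only 339 of 61230 labels and a very low F1, which pulls the macro average far below the micro average. The sketch below recomputes the macro F1 and the class shares from the reported numbers. The function names are ours; the values are copied from the report.

```python
# label -> (F1 in percent, support), copied from the last punctuation report.
report = {
    "O": (98.11, 47166),
    "!": (7.65, 339),
    ",": (73.92, 3731),
    ".": (88.06, 8504),
    "?": (66.71, 1490),
}


def macro_f1(rep: dict[str, tuple[float, int]]) -> float:
    """Unweighted mean of per-class F1: every class counts equally."""
    return sum(f1 for f1, _ in rep.values()) / len(rep)


def support_share(rep: dict[str, tuple[float, int]]) -> dict[str, float]:
    """Share of each class in the validation labels, in percent."""
    total = sum(support for _, support in rep.values())
    return {label: 100 * support / total for label, (_, support) in rep.items()}


print(f"macro F1: {macro_f1(report):.2f}")
for label, share in support_share(report).items():
    print(f"{label}: {share:.2f}% of labels")
```

Classes with a small share, such as `!` and `?`, are the ones that benefit most from a more balanced training set.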

In [20]:
queries = [
        'меня зовут сергей а как тебя',
        'подскажи пожалуйста сегодня вторник или среда',
        'закрой за мной дверь я ухожу'
    ]

inference_results = model.add_punctuation_capitalization(queries)

for query, result in zip(queries, inference_results):
    print(f'Query   : {query}')
    print(f'Combined: {result.strip()}\n')

[NeMo I 2022-04-02 11:19:39 punctuation_capitalization_model:1056] Using batch size 3 for inference
[NeMo I 2022-04-02 11:19:39 punctuation_capitalization_infer_dataset:91] Max length: 9
[NeMo I 2022-04-02 11:19:39 data_preprocessing:404] Some stats of the lengths of the sequences:
[NeMo I 2022-04-02 11:19:39 data_preprocessing:410] Min: 7 |                  Max: 7 |                  Mean: 7.0 |                  Median: 7.0
[NeMo I 2022-04-02 11:19:39 data_preprocessing:412] 75 percentile: 7.00
[NeMo I 2022-04-02 11:19:39 data_preprocessing:413] 99 percentile: 7.00


100%|██████████| 1/1 [00:00<00:00, 44.16batch/s]

Query   : меня зовут сергей а как тебя
Combined: Меня зовут Сергей. А как тебя?

Query   : подскажи пожалуйста сегодня вторник или среда
Combined: Подскажи, пожалуйста, сегодня вторник или среда.

Query   : закрой за мной дверь я ухожу
Combined: Закрой за мной дверь. Я ухожу.






## Mount drive and save

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
!mkdir -p /content/drive/MyDrive/nemo

In [None]:
%cp -r /content/nemo_experiments /content/drive/MyDrive/nemo/exp2

In [None]:
%cp -r /content/work /content/drive/MyDrive/nemo/exp2


## Load saved checkpoint

In [None]:
!ls /content/drive/MyDrive/nemo/nemo_experiments/Punctuation_and_Capitalization/2022-01-31_10-13-25/checkpoints/

 Punctuation_and_Capitalization.nemo
'Punctuation_and_Capitalization--val_loss=0.1276-epoch=0.ckpt'
'Punctuation_and_Capitalization--val_loss=0.1276-epoch=0-last.ckpt'
'Punctuation_and_Capitalization--val_loss=0.1491-epoch=0.ckpt'


In [None]:
%cp -r /content/drive/MyDrive/nemo/nemo_experiments/Punctuation_and_Capitalization/2022-01-31_10-13-25/checkpoints/Punctuation_and_Capitalization.nemo /content

In [None]:
%cp -r /content/drive/MyDrive/nemo/nemo_experiments/Punctuation_and_Capitalization/2022-01-31_10-13-25/checkpoints/Punctuation_and_Capitalization--val_loss=0.1276-epoch=0-last.ckpt /content

In [None]:
checkpoint_path = "/content/Punctuation_and_Capitalization.nemo"

In [None]:
pretrained_model = nemo_nlp.models.PunctuationCapitalizationModel.restore_from(checkpoint_path)

[NeMo I 2022-02-01 09:00:11 tokenizer_utils:126] Getting HuggingFace AutoTokenizer with pretrained_model_name: DeepPavlov/rubert-base-cased, vocab_file: /tmp/tmp6k9g1ro3/0456104bb45245438462aa1eb7174c15_vocab.txt, special_tokens_dict: {}, and use_fast: False


Downloading:   0%|          | 0.00/24.0 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/642 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/1.57M [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/112 [00:00<?, ?B/s]

Using eos_token, but it is not set yet.
Using bos_token, but it is not set yet.
[NeMo W 2022-02-01 09:00:13 modelPT:143] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config : 
    use_tarred_dataset: false
    ds_item: /content/data
    text_file: text_train.txt
    labels_file: labels_train.txt
    shuffle: true
    num_samples: 50000
    tokens_in_batch: 1024
    max_seq_length: 512
    n_jobs: 0
    tar_metadata_file: null
    tar_shuffle_n: 1
    
[NeMo W 2022-02-01 09:00:13 modelPT:150] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s). 
    Validation config : 
    use_tarred_dataset: false
    ds_item: /content/data
    text_file: text_dev.txt
    labels_file: labels_dev.txt
    shuffle: fal

Downloading:   0%|          | 0.00/681M [00:00<?, ?B/s]

Some weights of the model checkpoint at DeepPavlov/rubert-base-cased were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.decoder.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of the model checkpoint at DeepPavlov/rubert-base-cased were not used when initializing BertEncoder

[NeMo I 2022-02-01 09:00:51 save_restore_connector:154] Model PunctuationCapitalizationModel was successfully restored from /content/Punctuation_and_Capitalization.nemo.
