<center>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepmipt/DeepPavlov/blob/master/examples/mt_pal_bert_tutorial_mrpc_rte.ipynb)

![Alt Text](https://static.tildacdn.com/tild3762-3666-4530-b139-666433343863/_DeepPavlov_-5.png)

</center>
<br>

# **Multi-Task Pal Bert in DeepPavlov**

The DeepPavlov library provides many state-of-the-art NLP techniques, and Multi-Task BERT is one of them.

Multi-task learning shares information between related tasks, reducing the number of parameters required. State-of-the-art results on the natural language understanding tasks of the GLUE benchmark have previously relied on transfer learning from a large task: unsupervised pre-training with BERT, after which a separate BERT model was fine-tuned for each task.

In multi-task BERT we share a single BERT model along with a small number of task-specific parameters and match the performance of separately fine-tuned models with fewer parameters on the GLUE benchmark.

The Multi-Task Pal Bert model is based on the Bert-n-Pals paper: [arxiv.org/pdf/1902.02671.pdf](https://arxiv.org/pdf/1902.02671.pdf)


This model uses additional task-specific **Projected Attention Layers (PALs)** in parallel with the self-attention layers.
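
To get a feel for how small the task-specific part can be, the sketch below counts the weights of the PALs for one task. Every size in it is an assumption made for illustration: a hidden size of 768 and 12 layers (as in bert-base), a projected size of 204, projection matrices shared by all layers of a task, a round figure of 110 million weights for the shared BERT model, and biases and classification heads ignored. None of these numbers are read from the DeepPavlov config.

```python
# Rough parameter count for projected attention layers (PALs).
# All sizes below are illustrative assumptions, not values from the tutorial config.
HIDDEN_SIZE = 768               # hidden size of bert-base
NUM_LAYERS = 12                 # transformer layers in bert-base
PROJECTED_SIZE = 204            # assumed size of the low-dimensional PAL space
BERT_BASE_PARAMS = 110_000_000  # round figure for the shared BERT model


def pal_params_per_task(hidden=HIDDEN_SIZE, projected=PROJECTED_SIZE, layers=NUM_LAYERS):
    """Count PAL weights for one task, ignoring biases."""
    # Down- and up-projection, assumed to be shared by all layers of one task.
    projections = 2 * hidden * projected
    # Query, key and value matrices of the small attention, one set per layer.
    attention = layers * 3 * projected * projected
    return projections + attention


def total_params(num_tasks, shared):
    """Total weights for `num_tasks` tasks, with or without sharing BERT."""
    if shared:
        return BERT_BASE_PARAMS + num_tasks * pal_params_per_task()
    return num_tasks * BERT_BASE_PARAMS


per_task = pal_params_per_task()
print(f"PAL weights per task: {per_task:,}")
print(f"two separate models:  {total_params(2, shared=False):,}")
print(f"one shared model:     {total_params(2, shared=True):,}")
```

Under these assumptions each extra task adds a small fraction of the weights that a second full BERT model would add.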


Besides using these additional layers, the model does not train on each task sequentially. At every training step it picks the task to train on by making a random choice based on the provided *probs*. These *probs* are directly proportional to the *size of the training data* of each task.


- `List of Probabilities = [List of probabilities for each task (proportional to train size)]`

- `Task_id = np.random.choice(number_tasks, p=List of Probabilities)`

- `Train only on the batch of that Task_id`
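
The three steps above can be sketched in a few lines. This is a minimal illustration, not the DeepPavlov implementation: the training-set sizes are hypothetical, and the standard-library `random.choices` stands in for `np.random.choice(..., p=...)` so that the snippet has no dependencies.

```python
import random

# Hypothetical training-set sizes; replace them with the sizes of your own data.
train_sizes = {"mrpc": 4000, "rte": 2500}


def task_probabilities(sizes):
    """Return one probability per task, proportional to its training-set size."""
    total = sum(sizes.values())
    return {task: size / total for task, size in sizes.items()}


def sample_task(probs, rng):
    """Draw one task name according to the given probabilities."""
    tasks = list(probs)
    weights = [probs[task] for task in tasks]
    return rng.choices(tasks, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the draws are repeatable
probs = task_probabilities(train_sizes)
schedule = [sample_task(probs, rng) for _ in range(10)]
print(probs)
print(schedule)
```

Each entry of `schedule` names the task whose batch would be used for that training step.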




Using Multi-Task Bert we can achieve better results while using less memory.
To use Multi-Task Pal Bert we need **4 basic components**:


- `multitask_reader`

- `multitask_pal_bert_iterator`

- `multitask_pal_bert_preprocessor`

- `multitask_pal_bert`

In this tutorial we will train a multitask model on two GLUE benchmark datasets, MRPC and RTE. You can read more about the GLUE benchmark here: [gluebenchmark.com](https://gluebenchmark.com/)



## Download the datasets

First we need to download the data locally and place each dataset in its own folder.

In [None]:
! wget https://dl.fbaipublicfiles.com/glue/data/RTE.zip && unzip RTE.zip
! wget https://dl.fbaipublicfiles.com/senteval/senteval_data/msr_paraphrase_train.txt -P /content/MRPC
! wget https://dl.fbaipublicfiles.com/senteval/senteval_data/msr_paraphrase_test.txt -P /content/MRPC

--2021-08-17 08:50:09--  https://dl.fbaipublicfiles.com/glue/data/RTE.zip
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 104.22.74.142, 172.67.9.4, 104.22.75.142, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|104.22.74.142|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 697150 (681K) [application/zip]
Saving to: ‘RTE.zip’


2021-08-17 08:50:11 (709 KB/s) - ‘RTE.zip’ saved [697150/697150]

Archive:  RTE.zip
   creating: RTE/
  inflating: RTE/dev.tsv             
  inflating: RTE/test.tsv            
  inflating: RTE/train.tsv           
--2021-08-17 08:50:11--  https://dl.fbaipublicfiles.com/senteval/senteval_data/msr_paraphrase_train.txt
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 172.67.9.4, 104.22.75.142, 104.22.74.142, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|172.67.9.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1047044 (1023K) [text/plain]
Saving to: 

## Explore the Data

Next we explore the data. We open the TSV files for RTE using pandas.

**NOTE** - `quoting` should be set to `3` (`csv.QUOTE_NONE`) to avoid errors while reading *train.tsv* and *dev.tsv*.
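
The value `3` is the constant `csv.QUOTE_NONE` from the Python standard library, which pandas uses to interpret its `quoting` argument. The sketch below shows why it matters, using the standard `csv` reader on a made-up sample rather than pandas itself, so the exact failure mode in pandas may differ: a field that starts with an unmatched double quote swallows the rest of the input unless quotes are treated as ordinary characters.

```python
import csv
import io

print(csv.QUOTE_NONE)  # the constant behind quoting=3

# A tiny tab-separated sample; the second row has a field that starts
# with an unmatched double quote.
raw = (
    "index\tsentence1\tlabel\n"
    '0\t"He said hello\tentailment\n'
    "1\tPlain text\tnot_entailment\n"
)


def parse(text, quoting):
    """Parse tab-separated text with the standard csv reader."""
    return list(csv.reader(io.StringIO(text), delimiter="\t", quoting=quoting))


rows_default = parse(raw, csv.QUOTE_MINIMAL)  # quotes open and close fields
rows_no_quote = parse(raw, csv.QUOTE_NONE)    # quotes are ordinary characters

print(len(rows_default), len(rows_no_quote))
print(rows_no_quote[1])
```

With `csv.QUOTE_NONE` every input line stays a separate row, and the quote character is kept as part of the sentence.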

In [None]:
import pandas as pd

df_train  = pd.read_csv("/content/RTE/train.tsv", sep="\t", quoting=3)
df_valid  = pd.read_csv("/content/RTE/dev.tsv", sep="\t", quoting=3)
df_train.head()

| | index | sentence1 | sentence2 | label |
|---|---|---|---|---|
| 0 | 0 | No Weapons of Mass Destruction Found in Iraq Yet. | Weapons of Mass Destruction Found in Iraq. | not_entailment |
| 1 | 1 | A place of sorrow, after Pope John Paul II die... | Pope Benedict XVI is the new leader of the Rom... | entailment |
| 2 | 2 | Herceptin was already approved to treat the si... | Herceptin can be used to treat breast cancer. | entailment |
| 3 | 3 | Judie Vivian, chief executive at ProMedica, a ... | The previous name of Ho Chi Minh City was Saigon. | entailment |
| 4 | 4 | A man is due in court later charged with the m... | Paul Stewart Hutchinson is accused of having s... | not_entailment |


We can see that this is a sentence-pair entailment prediction task.

Next we create a new directory `glue_csv` and, within it, one directory per task. There we save the train and validation data (dev.tsv) in CSV format, because we will use the `basic_classification_reader` to read the data.

## Make the new directories

In [None]:
! mkdir /content/glue_csv /content/glue_csv/RTE /content/glue_csv/MRPC

Save the data in CSV format

In [None]:
df_train.to_csv("/content/glue_csv/RTE/train.csv")
df_valid.to_csv("/content/glue_csv/RTE/valid.csv")

We will do the same with the data for MRPC. If you want, you can also explore this data by following the same steps as above.

In [None]:
df_train  = pd.read_csv("/content/MRPC/msr_paraphrase_train.txt", sep="\t", quoting=3)
df_valid  = pd.read_csv("/content/MRPC/msr_paraphrase_test.txt", sep="\t", quoting=3)
df_train.to_csv("/content/glue_csv/MRPC/train.csv")
df_valid.to_csv("/content/glue_csv/MRPC/valid.csv")

## Install DeepPavlov Using pip

In [None]:
! pip install deeppavlov

### Now let us explore the config for this tutorial

For this tutorial we will train a multitask model on two tasks, MRPC and RTE. There is already a config file to make things easier, so let's explore it.

In [2]:
import json
from pprint import pprint
from deeppavlov import configs

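# Open the config with a context manager so that the file is closed after reading.
with open(configs.tutorials.multitask_pal_bert.mt_pal_bert_mrpc_rte_tutorial) as config_file:
    train_config = json.load(config_file)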

pprint(train_config)

{'chainer': {'in': ['x_mrpc_with_id', 'x_rte_with_id'],
             'in_y': ['y_mrpc', 'y_rte'],
             'out': ['y_mrpc_pred_labels', 'y_rte_pred_labels'],
             'pipe': [{'class_name': 'multitask_pal_bert_preprocessor',
                       'in': ['x_mrpc_with_id', 'x_rte_with_id'],
                       'out': ['task_id', 'x_mrpc', 'x_rte']},
                      {'class_name': 'input_splitter',
                       'in': ['x_mrpc'],
                       'keys_to_extract': [0, 1],
                       'out': ['x_mrpc1', 'x_mrpc2']},
                      {'class_name': 'input_splitter',
                       'in': ['x_rte'],
                       'keys_to_extract': [0, 1],
                       'out': ['x_rte1', 'x_rte2']},
                      {'class_name': 'torch_transformers_preprocessor',
                       'in': ['x_mrpc1', 'x_mrpc2'],
                       'max_seq_length': 128,
                       'out': ['bert_features_mrpc'],
            

Now, we can see that the config has a lot of components.
Here we will cover the components related to Multi-Task Pal Bert. For the other components you may refer to the documentation [here](http://docs.deeppavlov.ai/en/master/intro/quick_start.html).

As discussed above, to use Multi-Task Pal Bert we need **4 basic components**:


- `multitask_reader`

- `multitask_pal_bert_iterator`

- `multitask_pal_bert_preprocessor`

- `multitask_pal_bert`

We will explore these components one by one.
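
The cells in the following sections pick components out of `train_config["chainer"]["pipe"]` by position. Looking a component up by its `class_name` is an alternative that keeps working if the order of the pipe changes. The sketch below uses a trimmed-down stand-in config so that it runs on its own; `find_components` is a hypothetical helper, not part of DeepPavlov, and the positions in the stand-in do not match the real config.

```python
# A trimmed-down stand-in for the tutorial config, with only the fields used here.
toy_config = {
    "chainer": {
        "pipe": [
            {"class_name": "multitask_pal_bert_preprocessor"},
            {"class_name": "input_splitter"},
            {"class_name": "input_splitter"},
            {"class_name": "torch_transformers_preprocessor"},
            {"class_name": "multitask_pal_bert"},
        ]
    }
}


def find_components(config, class_name):
    """Return (position, component) pairs for every pipe entry with this class name."""
    pipe = config["chainer"]["pipe"]
    return [(i, c) for i, c in enumerate(pipe) if c.get("class_name") == class_name]


for position, component in find_components(toy_config, "input_splitter"):
    print(position, component["class_name"])
```

The same helper can be called with `train_config` in place of `toy_config`.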

## Multi-Task Reader

In [3]:
pprint(train_config["dataset_reader"])

{'class_name': 'multitask_reader',
 'data_path': 'null',
 'tasks': {'mrpc': {'data_path': '{GLUE_CSV}/MRPC',
                    'reader_class_name': 'basic_classification_reader',
                    'x': ['#1 String', '#2 String'],
                    'y': 'Quality'},
           'rte': {'data_path': '{GLUE_CSV}/RTE',
                   'reader_class_name': 'basic_classification_reader',
                   'x': ['sentence1', 'sentence2'],
                   'y': 'label'}}}


`multitask_reader`: As we can see, this is a collection of the readers we use for each task. In the `tasks` dict we define each task name along with all other parameters required by its `reader_class_name`.

Here we are using the `basic_classification_reader` to read the data. You can use other readers in DeepPavlov or create your own reader as well.
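
The idea of one reader per task, collected under the task name, can be sketched with the standard library alone. This is an illustration of the pattern and not DeepPavlov's code: `read_classification_csv` and `read_all_tasks` are hypothetical helpers, and the two CSV files are tiny made-up samples that only reuse the column names from the config.

```python
import csv
import tempfile
from pathlib import Path


def read_classification_csv(data_path, x, y):
    """Read train.csv from `data_path` and return a list of (inputs, label) pairs."""
    with open(Path(data_path) / "train.csv", newline="") as f:
        return [([row[col] for col in x], row[y]) for row in csv.DictReader(f)]


def read_all_tasks(tasks):
    """Call one reader per task and collect the results under the task name."""
    return {name: {"train": read_classification_csv(**params)}
            for name, params in tasks.items()}


# Build two tiny CSV files so that the sketch runs on its own.
root = Path(tempfile.mkdtemp())
samples = {
    "mrpc": (["#1 String", "#2 String", "Quality"],
             ["A cat sat.", "A cat was sitting.", "1"]),
    "rte": (["sentence1", "sentence2", "label"],
            ["It rained.", "It was dry.", "not_entailment"]),
}
for task, (header, row) in samples.items():
    (root / task).mkdir()
    with open(root / task / "train.csv", "w", newline="") as f:
        csv.writer(f).writerows([header, row])

tasks = {
    "mrpc": {"data_path": root / "mrpc", "x": ["#1 String", "#2 String"], "y": "Quality"},
    "rte": {"data_path": root / "rte", "x": ["sentence1", "sentence2"], "y": "label"},
}
data = read_all_tasks(tasks)
print(data["rte"]["train"])
```

Each task keeps its own column names for `x` and `y`, which is why the config repeats these parameters per task.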

## Multi-Task Pal Bert Iterator

This iterator is specific to this PAL BERT model: it takes care of switching the task while training, as explained above.

In [4]:
pprint(train_config["dataset_iterator"])

{'class_name': 'multitask_pal_bert_iterator',
 'gradient_accumulation_steps': '{GRADIENT_ACC_STEPS}',
 'num_train_epochs': '{NUM_TRAIN_EPOCHS}',
 'steps_per_epoch': '{STEPS_PER_EPOCH}',
 'tasks': {'mrpc': {'iterator_class_name': 'basic_classification_iterator',
                    'seed': 12},
           'rte': {'iterator_class_name': 'basic_classification_iterator',
                   'seed': 12}}}


`multitask_pal_bert_iterator`: We can see this is also a collection of other iterators present in DeepPavlov, but it also requires some other parameters:

  - `num_train_epochs`: Total number of training epochs. This is also used for task selection during training.

  - `steps_per_epoch`: Number of steps taken per epoch. This is also used for task selection during training.

  - `gradient_accumulation_steps`: Number of gradient accumulation steps. This is required because we train on the same task for the number of gradient accumulation steps.

  - `tasks`: This is similar to the dict we have in the `multitask_reader`, except that the readers are replaced with iterators and the parameters for each iterator should be provided.
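
The role of `gradient_accumulation_steps` in task selection can be illustrated with a small generator. It is a sketch under assumptions and not the iterator's actual code: the probabilities and the three numeric values are made up, and the total number of steps is assumed to be `num_train_epochs * steps_per_epoch`.

```python
import random


def task_schedule(probs, total_steps, gradient_accumulation_steps, seed=0):
    """Yield one task name per training step.

    A new task is drawn only every `gradient_accumulation_steps` steps, so all
    batches that contribute to one optimizer update come from the same task.
    """
    rng = random.Random(seed)
    tasks = list(probs)
    weights = [probs[t] for t in tasks]
    current = None
    for step in range(total_steps):
        if step % gradient_accumulation_steps == 0:
            current = rng.choices(tasks, weights=weights, k=1)[0]
        yield current


# Hypothetical values; in the config they come from NUM_TRAIN_EPOCHS,
# STEPS_PER_EPOCH and GRADIENT_ACC_STEPS.
num_train_epochs = 2
steps_per_epoch = 6
gradient_accumulation_steps = 3

schedule = list(task_schedule({"mrpc": 0.6, "rte": 0.4},
                              num_train_epochs * steps_per_epoch,
                              gradient_accumulation_steps))
print(schedule)
```

The printed schedule changes task only at multiples of `gradient_accumulation_steps`.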

## Multi-Task Pal Bert Preprocessor

This is used to extract the `task_id` from the inputs for each task. Later we need to pass this `task_id` to the `multitask_pal_bert` component.

In [7]:
pprint(train_config["chainer"]["pipe"][0])

{'class_name': 'multitask_pal_bert_preprocessor',
 'in': ['x_mrpc_with_id', 'x_rte_with_id'],
 'out': ['task_id', 'x_mrpc', 'x_rte']}
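
What "extracting the `task_id`" means can be sketched as follows. The input layout, a list of `[task_id, sample]` pairs per task, is inferred from the inference calls at the end of this tutorial; `split_task_id` is a hypothetical function, and how the real component chooses the id when samples disagree is not covered here.

```python
def split_task_id(*batches_with_id):
    """Separate the task id from the samples of every task.

    Each batch is a list of `[task_id, sample]` pairs, one batch per task.
    Returns the task id of the first pair found, followed by one list of
    samples per task.
    """
    task_id = None
    stripped = []
    for batch in batches_with_id:
        samples = []
        for sample_task_id, sample in batch:
            if task_id is None:
                task_id = sample_task_id
            samples.append(sample)
        stripped.append(samples)
    return (task_id, *stripped)


# Made-up inputs, one sentence pair per task.
x_mrpc_with_id = [[0, ["A cat sat.", "A cat was sitting."]]]
x_rte_with_id = [[0, ["It rained.", "It was dry."]]]

task_id, x_mrpc, x_rte = split_task_id(x_mrpc_with_id, x_rte_with_id)
print(task_id, x_mrpc, x_rte)
```

The three return values correspond to the `out` names of the component: `task_id`, `x_mrpc` and `x_rte`.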


## The Model - Multi-Task Pal Bert

In [9]:
pprint(train_config["chainer"]["pipe"][7])

{'class_name': 'multitask_pal_bert',
 'gradient_accumulation_steps': '{GRADIENT_ACC_STEPS}',
 'id': 'multitask_pal_bert',
 'in': ['task_id', 'bert_features_mrpc', 'bert_features_rte'],
 'in_distribution': {'mrpc': 1, 'rte': 1},
 'in_y': ['y_ids_mrpc', 'y_ids_rte'],
 'in_y_distribution': {'mrpc': 1, 'rte': 1},
 'learning_rate_drop_div': 2.0,
 'learning_rate_drop_patience': 2,
 'load_path': '{MODELS_PATH}/model',
 'optimizer_parameters': {'lr': 4e-05},
 'out': ['y_mrpc_pred_probas', 'y_rte_pred_probas'],
 'pretrained_bert': '{PRETRAINED_BERT}/pytorch_model.bin',
 'return_probas': True,
 'save_path': '{MODELS_PATH}/model',
 'steps_per_epoch': '{STEPS_PER_EPOCH}',
 'tasks': {'mrpc': {'n_classes': '#vocab_mrpc.len'},
           'rte': {'n_classes': '#vocab_rte.len'}}}


`multitask_pal_bert`: This is the model described in the Bert-n-Pals paper, along with the functions to load, train and predict. Parameters used:


  - `in`: This is the input to the component. When using the `multitask_pal_bert` the first input should be `task_id` which we extracted using the `multitask_pal_bert_preprocessor`.

  - `in_distribution`: This is a dict that contains the number of input parameters that would be needed for each task, since these two are classification tasks we would only require 1 input feature for each task.

  - `in_y_distribution`: This is similar to `in_distribution` but for the labels.

  - `num_train_epochs`: Total number of training epochs. This is also used for task selection during training.

  - `steps_per_epoch`: Number of steps taken per epoch. This is also used for task selection during training.

  - `gradient_accumulation_steps`: Number of gradient accumulation steps. This is required because we train on the same task for the number of gradient accumulation steps.

  - `pretrained_bert`: Path to the pretrained bert-base-uncased PyTorch model from Hugging Face.
  
  - `tasks`: dict of task names with
    - `n_classes`: Number of prediction classes for the task.
    - `task_type`: Defaults to classification. Can also be set to `"regression"` if the task is a regression task. 
 
All other parameters are not specific to this model.
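
One way to read `in_distribution` is as a recipe for slicing the flat list of inputs that follows `task_id` into per-task groups. The sketch below illustrates that reading with a hypothetical helper, `split_by_distribution`; it is an assumption about the meaning of the parameter and not the model's code.

```python
def split_by_distribution(values, distribution):
    """Slice a flat list of inputs into per-task groups.

    `distribution` maps each task name to the number of inputs it consumes;
    tasks are served in the order of the dict.
    """
    expected = sum(distribution.values())
    if len(values) != expected:
        raise ValueError(f"expected {expected} inputs, got {len(values)}")
    groups, start = {}, 0
    for task, count in distribution.items():
        groups[task] = values[start:start + count]
        start += count
    return groups


# Names taken from the config above: one feature input per task after `task_id`.
inputs = ["bert_features_mrpc", "bert_features_rte"]
in_distribution = {"mrpc": 1, "rte": 1}
print(split_by_distribution(inputs, in_distribution))
```

A task that needs two inputs would simply have `2` in the dict and receive two consecutive entries.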


## Before Training - Install the dependencies

In [13]:
! python -m deeppavlov install mt_pal_bert_mrpc_rte_tutorial

2021-08-18 12:02:03.399 INFO in 'deeppavlov.core.common.file'['file'] at line 32: Interpreting 'mt_pal_bert_mrpc_rte_tutorial' as '/usr/local/lib/python3.7/dist-packages/deeppavlov/configs/tutorials/multitask_pal_bert/mt_pal_bert_mrpc_rte_tutorial.json'
Collecting transformers==4.6.0
  Downloading transformers-4.6.0-py3-none-any.whl (2.3 MB)
     |████████████████████████████████| 2.3 MB 5.3 MB/s 
Collecting tokenizers<0.11,>=0.10.1
  Downloading tokenizers-0.10.3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (3.3 MB)
     |████████████████████████████████| 3.3 MB 26.9 MB/s 
Collecting huggingface-hub==0.0.8
  Downloading huggingface_hub-0.0.8-py3-none-any.whl (34 kB)
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.0.8 tokenizers-0.10.3 transformers-4.6.0
Collecting git+https://github.com/deepmipt/bert.git@feat/multi_gpu
  Cloning https://github.com/deepmipt/bert.g

## Training the model

We set `download=True` because we are training the model from scratch.

In [None]:
from deeppavlov import train_model

model = train_model(train_config, download=True)

2021-08-17 08:57:38.423 INFO in 'deeppavlov.core.data.utils'['utils'] at line 95: Downloading from https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin to /content/pretrained_bert/pytorch_model.bin
100%|██████████| 440M/440M [00:03<00:00, 130MB/s]
2021-08-17 08:57:52.453 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 115: [loading vocabulary from /content/models/mrpc.dict]
2021-08-17 08:57:52.466 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 101: [saving vocabulary to /content/models/mrpc.dict]
2021-08-17 08:57:52.468 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 115: [loading vocabulary from /content/models/rte.dict]
2021-08-17 08:57:52.479 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 101: [saving vocabulary to /content/models/rte.dict]
2021-08-17 08:57:52.488 INFO in 'deeppavlov.models.multitask_pal_bert.multitask_pal_bert'['multitask_pal_bert'] at line 210: Load path /content/m

{"valid": {"eval_examples_count": 1741, "metrics": {"multitask_accuracy": 0.3142, "accuracy": 0.6646, "f1": 0.642}, "time_spent": "0:01:07", "epochs_done": 0, "batches_seen": 0, "train_examples_seen": 0, "impatience": 0, "patience_limit": 5}}
{"train": {"eval_examples_count": 16, "metrics": {"multitask_accuracy": 0.5, "accuracy": 0.75, "f1": 0.6}, "time_spent": "0:04:20", "epochs_done": 1, "batches_seen": 210, "train_examples_seen": 3360, "losses": [0.2801889181137085, 0.32641640305519104]}}


2021-08-17 09:03:21.290 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 207: Improved best multitask_accuracy of 0.3963
2021-08-17 09:03:21.292 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 209: Saving model
2021-08-17 09:03:21.298 INFO in 'deeppavlov.core.models.torch_model'['torch_model'] at line 191: Saving model to /content/models/model.pth.tar.


{"valid": {"eval_examples_count": 1741, "metrics": {"multitask_accuracy": 0.3963, "accuracy": 0.6772, "f1": 0.4722}, "time_spent": "0:05:27", "epochs_done": 1, "batches_seen": 210, "train_examples_seen": 3360, "impatience": 0, "patience_limit": 5}}
{"train": {"eval_examples_count": 16, "metrics": {"multitask_accuracy": 0.5, "accuracy": 0.8125, "f1": 0.5}, "time_spent": "0:08:45", "epochs_done": 2, "batches_seen": 420, "train_examples_seen": 6720, "losses": [0.20978917181491852, 0.3954741358757019]}}


2021-08-17 09:07:46.301 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 207: Improved best multitask_accuracy of 0.502
2021-08-17 09:07:46.303 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 209: Saving model
2021-08-17 09:07:46.313 INFO in 'deeppavlov.core.models.torch_model'['torch_model'] at line 191: Saving model to /content/models/model.pth.tar.


{"valid": {"eval_examples_count": 1741, "metrics": {"multitask_accuracy": 0.502, "accuracy": 0.772, "f1": 0.5236}, "time_spent": "0:09:52", "epochs_done": 2, "batches_seen": 420, "train_examples_seen": 6720, "impatience": 0, "patience_limit": 5}}
{"train": {"eval_examples_count": 16, "metrics": {"multitask_accuracy": 1.0, "accuracy": 1.0, "f1": 1.0}, "time_spent": "0:13:10", "epochs_done": 3, "batches_seen": 630, "train_examples_seen": 10080, "losses": [0.2072412222623825, 0.3265302777290344]}}


2021-08-17 09:12:11.280 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 207: Improved best multitask_accuracy of 0.5704
2021-08-17 09:12:11.281 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 209: Saving model
2021-08-17 09:12:11.289 INFO in 'deeppavlov.core.models.torch_model'['torch_model'] at line 191: Saving model to /content/models/model.pth.tar.


{"valid": {"eval_examples_count": 1741, "metrics": {"multitask_accuracy": 0.5704, "accuracy": 0.8208, "f1": 0.6287}, "time_spent": "0:14:17", "epochs_done": 3, "batches_seen": 630, "train_examples_seen": 10080, "impatience": 0, "patience_limit": 5}}
{"train": {"eval_examples_count": 16, "metrics": {"multitask_accuracy": 0.875, "accuracy": 0.9375, "f1": 0.9091}, "time_spent": "0:17:35", "epochs_done": 4, "batches_seen": 840, "train_examples_seen": 13440, "losses": [0.2256726622581482, 0.16952921450138092]}}


2021-08-17 09:16:36.410 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 212: Did not improve on the multitask_accuracy of 0.5704


{"valid": {"eval_examples_count": 1741, "metrics": {"multitask_accuracy": 0.548, "accuracy": 0.8076, "f1": 0.592}, "time_spent": "0:18:42", "epochs_done": 4, "batches_seen": 840, "train_examples_seen": 13440, "impatience": 1, "patience_limit": 5}}
{"train": {"eval_examples_count": 16, "metrics": {"multitask_accuracy": 0.875, "accuracy": 0.9375, "f1": 0.9412}, "time_spent": "0:21:55", "epochs_done": 5, "batches_seen": 1050, "train_examples_seen": 16800, "losses": [0.04493341222405434, 0.2591383159160614]}}


2021-08-17 09:20:56.192 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 212: Did not improve on the multitask_accuracy of 0.5704
2021-08-17 09:20:56.193 INFO in 'deeppavlov.core.models.torch_model'['torch_model'] at line 219: ----------Current LR is decreased in 2.0 times----------
2021-08-17 09:20:56.199 INFO in 'deeppavlov.models.multitask_pal_bert.multitask_pal_bert'['multitask_pal_bert'] at line 210: Load path /content/models/model is given.
2021-08-17 09:20:56.202 INFO in 'deeppavlov.models.multitask_pal_bert.multitask_pal_bert'['multitask_pal_bert'] at line 219: Load path /content/models/model.pth.tar exists.
2021-08-17 09:20:56.208 INFO in 'deeppavlov.models.multitask_pal_bert.multitask_pal_bert'['multitask_pal_bert'] at line 221: Initializing `MultiTaskPalBert` from saved.
2021-08-17 09:20:58.818 INFO in 'deeppavlov.models.multitask_pal_bert.multitask_pal_bert'['multitask_pal_bert'] at line 166: Bert Model Weights Loaded.
2021-08-17 09:20:58.852 INFO in 'dee

{"valid": {"eval_examples_count": 1741, "metrics": {"multitask_accuracy": 0.5606, "accuracy": 0.8185, "f1": 0.7083}, "time_spent": "0:23:01", "epochs_done": 5, "batches_seen": 1050, "train_examples_seen": 16800, "impatience": 2, "patience_limit": 5}}
{"train": {"eval_examples_count": 16, "metrics": {"multitask_accuracy": 0.875, "accuracy": 1.0, "f1": 0.8571}, "time_spent": "0:26:16", "epochs_done": 6, "batches_seen": 1260, "train_examples_seen": 20160, "losses": [0.15154743194580078, 0.05896526947617531]}}


2021-08-17 09:25:17.313 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 207: Improved best multitask_accuracy of 0.5761
2021-08-17 09:25:17.315 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 209: Saving model
2021-08-17 09:25:17.322 INFO in 'deeppavlov.core.models.torch_model'['torch_model'] at line 191: Saving model to /content/models/model.pth.tar.


{"valid": {"eval_examples_count": 1741, "metrics": {"multitask_accuracy": 0.5761, "accuracy": 0.826, "f1": 0.6636}, "time_spent": "0:27:23", "epochs_done": 6, "batches_seen": 1260, "train_examples_seen": 20160, "impatience": 0, "patience_limit": 5}}
{"train": {"eval_examples_count": 16, "metrics": {"multitask_accuracy": 1.0, "accuracy": 1.0, "f1": 1.0}, "time_spent": "0:30:41", "epochs_done": 7, "batches_seen": 1470, "train_examples_seen": 23520, "losses": [0.011690197512507439, 0.25385284423828125]}}


2021-08-17 09:29:42.611 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 207: Improved best multitask_accuracy of 0.5853
2021-08-17 09:29:42.613 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 209: Saving model
2021-08-17 09:29:42.619 INFO in 'deeppavlov.core.models.torch_model'['torch_model'] at line 191: Saving model to /content/models/model.pth.tar.


{"valid": {"eval_examples_count": 1741, "metrics": {"multitask_accuracy": 0.5853, "accuracy": 0.8363, "f1": 0.6588}, "time_spent": "0:31:48", "epochs_done": 7, "batches_seen": 1470, "train_examples_seen": 23520, "impatience": 0, "patience_limit": 5}}
{"train": {"eval_examples_count": 16, "metrics": {"multitask_accuracy": 0.9375, "accuracy": 0.9375, "f1": 1.0}, "time_spent": "0:35:07", "epochs_done": 8, "batches_seen": 1680, "train_examples_seen": 26880, "losses": [0.10359664261341095, 0.027497243136167526]}}


2021-08-17 09:34:08.24 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 207: Improved best multitask_accuracy of 0.5933
2021-08-17 09:34:08.26 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 209: Saving model
2021-08-17 09:34:08.32 INFO in 'deeppavlov.core.models.torch_model'['torch_model'] at line 191: Saving model to /content/models/model.pth.tar.


{"valid": {"eval_examples_count": 1741, "metrics": {"multitask_accuracy": 0.5933, "accuracy": 0.8357, "f1": 0.6799}, "time_spent": "0:36:13", "epochs_done": 8, "batches_seen": 1680, "train_examples_seen": 26880, "impatience": 0, "patience_limit": 5}}
{"train": {"eval_examples_count": 16, "metrics": {"multitask_accuracy": 0.9375, "accuracy": 0.9375, "f1": 1.0}, "time_spent": "0:39:32", "epochs_done": 9, "batches_seen": 1890, "train_examples_seen": 30240, "losses": [0.0039653959684073925, 0.011465203948318958]}}


2021-08-17 09:38:33.514 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 207: Improved best multitask_accuracy of 0.6048
2021-08-17 09:38:33.516 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 209: Saving model
2021-08-17 09:38:33.525 INFO in 'deeppavlov.core.models.torch_model'['torch_model'] at line 191: Saving model to /content/models/model.pth.tar.


{"valid": {"eval_examples_count": 1741, "metrics": {"multitask_accuracy": 0.6048, "accuracy": 0.842, "f1": 0.6781}, "time_spent": "0:40:39", "epochs_done": 9, "batches_seen": 1890, "train_examples_seen": 30240, "impatience": 0, "patience_limit": 5}}
{"train": {"eval_examples_count": 16, "metrics": {"multitask_accuracy": 0.9375, "accuracy": 0.9375, "f1": 1.0}, "time_spent": "0:43:58", "epochs_done": 10, "batches_seen": 2100, "train_examples_seen": 33600, "losses": [0.005740293301641941, 0.0423043817281723]}}


2021-08-17 09:42:59.48 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 207: Improved best multitask_accuracy of 0.6071
2021-08-17 09:42:59.49 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 209: Saving model
2021-08-17 09:42:59.57 INFO in 'deeppavlov.core.models.torch_model'['torch_model'] at line 191: Saving model to /content/models/model.pth.tar.


{"valid": {"eval_examples_count": 1741, "metrics": {"multitask_accuracy": 0.6071, "accuracy": 0.8386, "f1": 0.7181}, "time_spent": "0:45:04", "epochs_done": 10, "batches_seen": 2100, "train_examples_seen": 33600, "impatience": 0, "patience_limit": 5}}


2021-08-17 09:43:13.285 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 115: [loading vocabulary from /content/models/mrpc.dict]
2021-08-17 09:43:13.295 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 115: [loading vocabulary from /content/models/rte.dict]
2021-08-17 09:43:13.300 INFO in 'deeppavlov.models.multitask_pal_bert.multitask_pal_bert'['multitask_pal_bert'] at line 210: Load path /content/models/model is given.
2021-08-17 09:43:13.302 INFO in 'deeppavlov.models.multitask_pal_bert.multitask_pal_bert'['multitask_pal_bert'] at line 219: Load path /content/models/model.pth.tar exists.
2021-08-17 09:43:13.306 INFO in 'deeppavlov.models.multitask_pal_bert.multitask_pal_bert'['multitask_pal_bert'] at line 221: Initializing `MultiTaskPalBert` from saved.
2021-08-17 09:43:15.784 INFO in 'deeppavlov.models.multitask_pal_bert.multitask_pal_bert'['multitask_pal_bert'] at line 166: Bert Model Weights Loaded.
2021-08-17 09:43:15.796 INFO in 'deeppavlo

{"valid": {"eval_examples_count": 1741, "metrics": {"multitask_accuracy": 0.6071, "accuracy": 0.8386, "f1": 0.7181}, "time_spent": "0:01:08"}}


2021-08-17 09:44:32.364 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 115: [loading vocabulary from /content/models/mrpc.dict]
2021-08-17 09:44:32.367 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 115: [loading vocabulary from /content/models/rte.dict]
2021-08-17 09:44:32.373 INFO in 'deeppavlov.models.multitask_pal_bert.multitask_pal_bert'['multitask_pal_bert'] at line 210: Load path /content/models/model is given.
2021-08-17 09:44:32.380 INFO in 'deeppavlov.models.multitask_pal_bert.multitask_pal_bert'['multitask_pal_bert'] at line 219: Load path /content/models/model.pth.tar exists.
2021-08-17 09:44:32.381 INFO in 'deeppavlov.models.multitask_pal_bert.multitask_pal_bert'['multitask_pal_bert'] at line 221: Initializing `MultiTaskPalBert` from saved.
2021-08-17 09:44:34.878 INFO in 'deeppavlov.models.multitask_pal_bert.multitask_pal_bert'['multitask_pal_bert'] at line 166: Bert Model Weights Loaded.
2021-08-17 09:44:34.899 INFO in 'deeppavlo

In [None]:
model([[-1, ["The increase reflects lower credit losses and favorable interest rates.", "The gain came as a result of fewer credit losses and lower interest rates."]]], 
      [[-1, ["Mount Olympus towers up from the center of the earth.", "Mount Olympus is in the center of the earth."]]])

In [None]:
model([[-1, ["The increase reflects lower credit losses and favorable interest rates.", "Mount Olympus is in the center of the earth."]]], 
      [[-1, ["Mount Olympus towers up from the center of the earth.", "The gain came as a result of fewer credit losses and lower interest rates."]]])