If you have not already installed DeepPavlov, run:

In [None]:
!pip install "deeppavlov>=1.1.1"

Multitask models are supported in DeepPavlov starting from version 1.1.1.

Below we walk through what a multitask config in DeepPavlov looks like, using the config multitask/multitask_example.json as an example.

## Dataset reader

As a dataset reader, we use the `multitask_reader` class. It requires a `tasks` parameter, which is a dictionary of the form `{task name: parameters for the task}`. The order of the tasks in this dictionary must be exactly the same as in the later stages of the config.
Any parameter that is not set for a task in that dictionary is taken from another parameter, `task_defaults`. This parameter holds the default dictionary for every task, and it may be empty.
The dataset_reader, path, train, validation, and test fields must exist for all tasks, either as default fields or as fields given explicitly in the task's dictionary.
```
{
    "dataset_reader": {
   	 "class_name": "multitask_reader",
   	 "task_defaults": {
   		 "class_name": "huggingface_dataset_reader",
   		 "path": "glue",
   		 "train": "train",
   		 "valid": "validation",
   		 "test": "test"
   	 },
   	 "tasks": {
   		 "cola": {
   			 "name": "cola"
   		 },
   		 "rte": {
   			 "name": "rte"
   		 },
   		 "stsb": {
   			 "name": "stsb"
   		 },
   		 "copa": {
   			 "path": "super_glue",
   			 "name": "copa"
   		 },
   		 "conll": {
   			 "class_name": "conll2003_reader",
   			 "use_task_defaults": false,
   			 "data_path": "{DOWNLOADS_PATH}/conll2003/",
   			 "dataset_name": "conll2003",
   			 "provide_pos": false
   		 },
   		 "squad": {
   			 "class_name": "squad_dataset_reader",
   			 "dataset": "squad",
   			 "url": "http://files.deeppavlov.ai/datasets/squad-v1.1.tar.gz",
   			 "data_path": "{DOWNLOADS_PATH}/squad_ru_clean/"
   		 }
   	 }
    },
```
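The defaulting rule described above can be sketched in plain Python. This is an illustration only, not DeepPavlov's implementation: the helper name `resolve_task_params` is hypothetical, and treating `"use_task_defaults": false` as "ignore the defaults for this task" is an assumption based on how the `conll` entry is written in the config.

```python
from typing import Any, Dict


def resolve_task_params(task_defaults: Dict[str, Any],
                        tasks: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Fill in missing per-task parameters from task_defaults (hypothetical helper)."""
    resolved = {}
    # Dictionaries keep insertion order, so the task order is preserved.
    for name, params in tasks.items():
        params = dict(params)
        # Assumption: this flag switches the defaults off for one task.
        use_defaults = params.pop("use_task_defaults", True)
        merged = dict(task_defaults) if use_defaults else {}
        merged.update(params)  # explicit task parameters override the defaults
        resolved[name] = merged
    return resolved


task_defaults = {"class_name": "huggingface_dataset_reader", "path": "glue",
                 "train": "train", "valid": "validation", "test": "test"}
tasks = {
    "cola": {"name": "cola"},
    "copa": {"path": "super_glue", "name": "copa"},
    "conll": {"class_name": "conll2003_reader", "use_task_defaults": False,
              "data_path": "{DOWNLOADS_PATH}/conll2003/"},
}

resolved = resolve_task_params(task_defaults, tasks)
print(resolved["cola"]["path"])   # taken from task_defaults
print(resolved["copa"]["path"])   # overridden by the task itself
print(resolved["conll"])          # defaults are not applied
```

In this sketch `cola` inherits `path` from the defaults, `copa` overrides it, and `conll` keeps only its own fields.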

## Dataset iterator

As a dataset iterator, we use the `multitask_iterator` class. It also takes a `tasks` dictionary, which contains the iterator class name and parameters (if any are set) for every task, analogously to the `multitask_reader`.
The same class also sets the number of gradient accumulation steps, the number of training epochs, and the batch size (these parameters must also be set in the trainer).
Finally, the `multitask_iterator` takes a sampling mode, which defines for every task the probability that samples are drawn from that task's dataset. We support uniform sampling (the same sampling probability for all tasks), plain sampling (sampling probability proportional to the number of samples in the task), and annealed sampling.

```
"dataset_iterator": {
   	 "class_name": "multitask_iterator",
   	 "num_train_epochs": "{NUM_TRAIN_EPOCHS}",
   	 "gradient_accumulation_steps": "{GRADIENT_ACC_STEPS}",
   	 "seed": 42,
   	 "task_defaults": {
   		 "class_name": "huggingface_dataset_iterator",
   		 "label": "label",
   		 "use_label_name": false,
   		 "seed": 42
   	 },
   	 "tasks": {
   		 "cola": {
   			 "features": ["sentence"]
   		 },
   		 "rte": {
   			 "features": ["sentence1", "sentence2"]
   		 },
   		 "stsb": {
   			 "features": ["sentence1", "sentence2"]
   		 },
   		 "copa": {
   			 "features": ["contexts", "choices"]
   		 },
   		 "conll": {
   			 "class_name": "basic_classification_iterator",
   			 "seed": 42,
   			 "use_task_defaults": false
   		 },
   		 "squad": {
   			 "class_name": "squad_iterator",
   			 "seed": 1337,
   			 "shuffle": true
   		 }
   	 }
    },

```
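To make the three sampling modes more concrete, here is a minimal sketch of how per-task sampling probabilities could be computed. It is not the code of `multitask_iterator`. The function name, the task sizes, and in particular the annealing schedule are assumptions: the schedule used here raises the task size to a power that decays linearly from 1.0 to 0.2 over the epochs, which is one common formulation and may differ from what DeepPavlov does.

```python
from typing import Dict


def sampling_probabilities(task_sizes: Dict[str, int], mode: str,
                           epoch: int = 0, num_epochs: int = 1) -> Dict[str, float]:
    """Return the probability of drawing the next batch from each task (sketch)."""
    if mode == "uniform":
        weights = {task: 1.0 for task in task_sizes}
    elif mode == "plain":
        weights = {task: float(size) for task, size in task_sizes.items()}
    elif mode == "annealed":
        # Assumed schedule: alpha goes linearly from 1.0 (first epoch) to 0.2 (last epoch).
        alpha = 1.0 - 0.8 * epoch / max(num_epochs - 1, 1)
        weights = {task: size ** alpha for task, size in task_sizes.items()}
    else:
        raise ValueError(f"Unknown sampling mode: {mode}")
    total = sum(weights.values())
    return {task: weight / total for task, weight in weights.items()}


# Illustrative dataset sizes, not the real ones.
sizes = {"cola": 8000, "rte": 2000}
uniform = sampling_probabilities(sizes, "uniform")
plain = sampling_probabilities(sizes, "plain")
annealed_first = sampling_probabilities(sizes, "annealed", epoch=0, num_epochs=5)
annealed_last = sampling_probabilities(sizes, "annealed", epoch=4, num_epochs=5)
print(uniform, plain, annealed_first, annealed_last, sep="\n")
```

Under this assumed schedule, annealed sampling starts out identical to plain sampling and moves toward uniform sampling as training progresses, so small tasks are sampled more often in later epochs.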
## Chainer

The chainer uses separate elements for every task.

However, to streamline multi-task preprocessing, we have introduced the optional `multitask_pipeline_preprocessor` class. For this class, one should set the `vocab_file` for the tokenizer and either a default preprocessor class name or a list of preprocessor names (the class names defined in the library, not the names used in configs). The user can also set whether to lowercase the input and whether to print the first example for debugging purposes.

```
	"chainer": {
   	 "in": ["x_cola", "x_rte", "x_stsb", "x_copa", "x_conll", "x_squad"],
   	 "in_y": ["y_cola", "y_rte", "y_stsb", "y_copa", "y_conll", "y_squad"],
   	 "pipe": [{
   			 "class_name": "multitask_input_splitter",
                            	"keys_to_extract": [0,1],
   			 "in": ["x_squad"],
   			 "out": ["question_raw_squad", "context_raw_squad"]
   		 },
   		 {
   			 "class_name": "multitask_input_splitter",
                            	"keys_to_extract": [0,1],
   			 "in": ["y_squad"],
   			 "out": ["ans_raw_squad", "ans_raw_start_squad"]
   		 },
   		 {
   			 "class_name": "torch_squad_transformers_preprocessor",
   			 "add_token_type_ids": true,
   			 "vocab_file": "{BACKBONE}",
   			 "do_lower_case": true,
   			 "max_seq_length": 384,
   			 "in": [
   				 "question_raw_squad",
   				 "context_raw_squad"
   			 ],
   			 "out": [
   				 "bert_features_squad",
   				 "subtokens_squad",
   				 "split_context_squad"
   			 ]
   		 },
   		 {
   			 "class_name": "squad_bert_mapping",
   			 "do_lower_case": true,
   			 "in": [
   				 "split_context_squad",
   				 "bert_features_squad",
   				 "subtokens_squad"
   			 ],
   			 "out": [
   				 "subtok2chars_squad",
   				 "char2subtoks_squad"
   			 ]
   		 },
   		 {
   			 "class_name": "squad_bert_ans_preprocessor",
   			 "do_lower_case": true,
   			 "in": [
   				 "ans_raw_squad",
   				 "ans_raw_start_squad",
   				 "char2subtoks_squad"
   			 ],
   			 "out": [
   				 "ans_squad",
   				 "ans_start_squad",
   				 "ans_end_squad"
   			 ]
   		 },
   		 {
   			 "class_name": "multitask_pipeline_preprocessor",
   			 "possible_keys_to_extract": [0, 1],
   			 "preprocessors": [
   				 "TorchTransformersPreprocessor",
   				 "TorchTransformersPreprocessor",
   				 "TorchTransformersPreprocessor",
   				 "TorchTransformersMultiplechoicePreprocessor",
   				 "TorchTransformersNerPreprocessor"
   			 ],
   			 "do_lower_case": true,
   			 "n_task": 5,
   			 "vocab_file": "{BACKBONE}",
   			 "max_seq_length": 200,
   			 "max_subword_length": 15,
   			 "token_masking_prob": 0.0,
   			 "return_features": true,
   			 "in": ["x_cola", "x_rte", "x_stsb", "x_copa", "x_conll"],
   			 "out": [
   				 "bert_features_cola",
   				 "bert_features_rte",
   				 "bert_features_stsb",
   				 "bert_features_copa",
   				 "bert_features_conll"
   			 ]
   		 },
   		 {
   			 "id": "vocab_conll",
   			 "class_name": "simple_vocab",
   			 "unk_token": ["O"],
   			 "pad_with_zeros": true,
   			 "save_path": "{MODELS_PATH}/tag.dict",
   			 "load_path": "{MODELS_PATH}/tag.dict",
   			 "fit_on": ["y_conll"],
   			 "in": ["y_conll"],
   			 "out": ["y_ids_conll"]
   		 },
```
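The `multitask_input_splitter` entries above turn one input stream into several, one per extracted key. The sketch below shows that idea in plain Python. The function `split_inputs` and the sample texts are hypothetical, and the real component may handle more input formats than this.

```python
from typing import Any, List, Sequence


def split_inputs(samples: Sequence[Sequence[Any]],
                 keys_to_extract: Sequence[int]) -> List[List[Any]]:
    """Return one list per requested key, holding that field for every sample."""
    return [[sample[key] for sample in samples] for key in keys_to_extract]


# Each SQuAD-style input is assumed to be a (question, context) pair.
x_squad = [("Who wrote it?", "It was written by Ann."),
           ("Where is it?", "It is in Paris.")]
question_raw_squad, context_raw_squad = split_inputs(x_squad, keys_to_extract=[0, 1])
print(question_raw_squad)
print(context_raw_squad)
```

With `"keys_to_extract": [0,1]` and two names in `out`, the first output collects element 0 of every sample and the second collects element 1.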

## Multitask transformer

As a class for multi-task training, we use the `multitask_transformer` class. The backbone model for multi-task training is defined in this class; it is advisable to use the same backbone as for the tokenization in the previous components.
The `tasks` parameter of this class must be a dictionary whose tasks are in exactly the same order as in the reader, the iterator, and the `in` and `in_y` fields of the chainer.
For every task, the number of options (`options`) and the task type (`type`) must be set.
You pass `in` (the bert_features, in the same order as the tasks) and `in_y` (the y for every task, also in the same order), and you obtain probabilities if return_probas=True or label ids if return_probas=False. The exceptions are the regression task (sts-b in the config), where scores are always returned, and the NER task (conll in the config), where label ids for every token are always returned.

```
   	 	{
   			 "id": "multitask_transformer",
   			 "class_name": "multitask_transformer",
   			 "optimizer_parameters": {
   				 "lr": 2e-5
   			 },
   			 "gradient_accumulation_steps": "{GRADIENT_ACC_STEPS}",
   			 "learning_rate_drop_patience": 2,
   			 "learning_rate_drop_div": 2.0,
   			 "return_probas": true,
   			 "backbone_model": "{BACKBONE}",
   			 "save_path": "{MODEL_PATH}",
   			 "load_path": "{MODEL_PATH}",
   			 "tasks": {
   				 "cola": {
   					 "type": "classification",
   					 "options": 2
   				 },
   				 "rte": {
   					 "type": "classification",
   					 "options": 2
   				 },
   				 "stsb": {
   					 "type": "regression",
   					 "options": 1
   				 },
   				 "copa": {
   					 "type": "multiple_choice",
   					 "options": 2
   				 },
   				 "conll": {
   					 "type": "sequence_labeling",
   					 "options": "#vocab_conll.len"
   				 },
   				 "squad":{"type":"question_answering",
   				 "options":2}
   			 },
   			 "in": [
   				 "bert_features_cola",
   				 "bert_features_rte",
   				 "bert_features_stsb",
   				 "bert_features_copa",
   				 "bert_features_conll",
   				 "bert_features_squad"
   			 ],
   			 "in_y": ["y_cola", "y_rte", "y_stsb", "y_copa", "y_ids_conll", "ans_squad"],
   			 "out": [
   				 "y_cola_pred_probas",
   				 "y_rte_pred_probas",
   				 "y_stsb_pred",
   				 "y_copa_pred_probas",
   				 "y_conll_pred_ids",
   				 "results_squad"
   			 ]
   		 },
```
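Since the task order has to match across the reader, the iterator, and the `multitask_transformer`, it can help to check a config before training. The helper below is a hypothetical sketch that works on a config already parsed into a Python dictionary (for example with `json.load`, which preserves key order). It is not part of DeepPavlov, and the toy configs contain only the fields the check needs.

```python
from typing import Any, Dict, List


def find_task_order_mismatches(config: Dict[str, Any]) -> List[str]:
    """Return the config sections whose task order differs from the dataset reader."""
    orders = {
        "dataset_reader": list(config["dataset_reader"]["tasks"]),
        "dataset_iterator": list(config["dataset_iterator"]["tasks"]),
    }
    for component in config["chainer"]["pipe"]:
        if component.get("class_name") == "multitask_transformer":
            orders["multitask_transformer"] = list(component["tasks"])
    reference = orders["dataset_reader"]
    return [section for section, order in orders.items() if order != reference]


good_config = {
    "dataset_reader": {"tasks": {"cola": {}, "rte": {}}},
    "dataset_iterator": {"tasks": {"cola": {}, "rte": {}}},
    "chainer": {"pipe": [{
        "class_name": "multitask_transformer",
        "tasks": {"cola": {"type": "classification", "options": 2},
                  "rte": {"type": "classification", "options": 2}},
    }]},
}
# Same config, but with the iterator tasks listed in a different order.
bad_config = dict(good_config, dataset_iterator={"tasks": {"rte": {}, "cola": {}}})

print(find_task_order_mismatches(good_config))  # []
print(find_task_order_mismatches(bad_config))   # ['dataset_iterator']
```

The same idea could be extended to the `in`, `in_y`, and `out` lists, whose variable names follow the task order by convention in this config.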
## Multitask metrics
After the `multitask_transformer`, almost all other components are the same as in the single-task setting or as described above…

```
   	 	{
   			 "class_name": "multitask_input_splitter",
   			 "in": ["results_squad"],
                            	"keys_to_extract": [0,1,2,3,4],
   			 "out": ["ans_start_predicted_squad",
   				 "ans_end_predicted_squad",
   				 "logits_squad",
   				 "scores_squad",
   				 "inds_squad"
   			 ]
   		 },
   		 {
   			 "class_name": "squad_bert_ans_postprocessor",
   			 "in": [
   				 "ans_start_predicted_squad",
   				 "ans_end_predicted_squad",
   				 "split_context_squad",
   				 "subtok2chars_squad",
   				 "subtokens_squad",
   				 "inds_squad"
   			 ],
   			 "out": [
   				 "ans_predicted_squad",
   				 "ans_start_predicted_squad",
   				 "ans_end_predicted_squad"
   			 ]
   		 },
   		 {
   			 "in": ["y_cola_pred_probas"],
   			 "out": ["y_cola_pred_ids"],
   			 "class_name": "proba2labels",
   			 "max_proba": true
   		 },
   		 {
   			 "in": ["y_rte_pred_probas"],
   			 "out": ["y_rte_pred_ids"],
   			 "class_name": "proba2labels",
   			 "max_proba": true
   		 },
   		 {
   			 "in": ["y_copa_pred_probas"],
   			 "out": ["y_copa_pred_ids"],
   			 "class_name": "proba2labels",
   			 "max_proba": true
   		 },
   		 {
   			 "in": ["y_conll_pred_ids"],
   			 "out": ["y_conll_pred_labels"],
   			 "ref": "vocab_conll"
   		 }
   	 ],
   	 "out": ["y_cola_pred_ids", "y_rte_pred_ids", "y_stsb_pred", "y_copa_pred_ids", "y_conll_pred_labels"]
    },
    "train": {
   	 "epochs": "{NUM_TRAIN_EPOCHS}",
   	 "batch_size": 32,

```


…apart from the metrics multitask_accuracy, multitask_f1_macro and multitask_f1_weighted, which calculate the corresponding metric (accuracy, F1-macro, or F1-weighted) for each task and then average the results over the tasks. As in any DeepPavlov config, early stopping is performed on the first metric in the metric list.
```
    	"metrics": [{
   			 "name": "multitask_accuracy",
   			 "inputs": ["y_rte", "y_cola", "y_copa", "y_rte_pred_ids", "y_cola_pred_ids", "y_copa_pred_ids"]
   		 },
```
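The averaging behind such a metric can be sketched as follows. This is a simplified illustration, not the library's `multitask_accuracy`: the function name is hypothetical, the argument layout (all true labels first, then all predictions, as in the `inputs` list above) is taken from the config, and skipping tasks without examples is an assumption.

```python
from typing import Sequence


def multitask_accuracy_sketch(*args: Sequence) -> float:
    """Average per-task accuracy; first half of args are labels, second half predictions."""
    n_tasks = len(args) // 2
    y_true, y_pred = args[:n_tasks], args[n_tasks:]
    accuracies = []
    for true, pred in zip(y_true, y_pred):
        if len(true) == 0:
            continue  # assumption: tasks without examples do not affect the average
        correct = sum(t == p for t, p in zip(true, pred))
        accuracies.append(correct / len(true))
    return sum(accuracies) / len(accuracies) if accuracies else 0.0


y_rte, y_cola = [1, 0], [1, 1, 0, 0]
y_rte_pred_ids, y_cola_pred_ids = [1, 1], [1, 1, 0, 1]
score = multitask_accuracy_sketch(y_rte, y_cola, y_rte_pred_ids, y_cola_pred_ids)
print(score)  # mean of 0.5 (rte) and 0.75 (cola)
```

Note that every task contributes equally to the average here, regardless of how many examples it has.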
However, one can also calculate single-task metrics.
```
   	 	{
   			 "name": "ner_f1",
   			 "inputs": ["y_conll", "y_conll_pred_labels"]
   		 },
   		 {
   			 "name": "ner_token_f1",
   			 "inputs": ["y_conll", "y_conll_pred_labels"]
   		 },
   		 {
   			 "name": "accuracy",
   			 "alias": "accuracy_cola",
   			 "inputs": ["y_cola", "y_cola_pred_ids"]
   		 },
   		 {
   			 "name": "accuracy",
   			 "alias": "accuracy_rte",
   			 "inputs": ["y_rte", "y_rte_pred_ids"]
   		 },
   		 {
   			 "name": "accuracy",
   			 "alias": "accuracy_copa",
   			 "inputs": ["y_copa", "y_copa_pred_ids"]
   		 },
   		 {
   			 "name": "pearson_correlation",
   			 "alias": "pearson_stsb",
   			 "inputs": ["y_stsb", "y_stsb_pred"]
   		 },
   		 {
   			 "name": "spearman_correlation",
   			 "alias": "spearman_stsb",
   			 "inputs": ["y_stsb", "y_stsb_pred"]
   		 },
   		 {
   			 "name": "squad_v1_f1",
   			 "inputs": [
   				 "ans_squad",
   				 "ans_predicted_squad"
   			 ]
   		 },
   		 {
   			 "name": "squad_v1_em",
   			 "inputs": [
   				 "ans_squad",
   				 "ans_predicted_squad"
   			 ]
   		 }
   	 ],
   	 "validation_patience": 3,
   	 "val_every_n_epochs": 1,
   	 "log_every_n_epochs": 1,
   	 "show_examples": false,
   	 "evaluation_targets": ["valid"],
   	 "class_name": "torch_trainer"
    },
    "metadata": {
   	 "variables": {
   		 "ROOT_PATH": "~/.deeppavlov",
   		 "MODELS_PATH": "{ROOT_PATH}/models/multitask_example",
   		 "DOWNLOADS_PATH": "{ROOT_PATH}/downloads",
   		 "BACKBONE": "distilbert-base-uncased",
   		 "MODEL_PATH": "{MODELS_PATH}/{BACKBONE}",
   		 "NUM_TRAIN_EPOCHS": 5,
   		 "GRADIENT_ACC_STEPS": 1
   	 },
   	 "download": [{
   		 "url": "http://files.deeppavlov.ai/deeppavlov_data/multitask/multitask_example_v2.tar.gz",
   		 "subdir": "{MODELS_PATH}"
   	 }]
    }
}
```


To run inference with a multitask config in DeepPavlov, one first needs to build the model.
If you want to use our pretrained config, first install its requirements by running the following command:

In [None]:
!python -m deeppavlov install multitask_example

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers<4.25.0,>=4.13.0
  Downloading transformers-4.24.0-py3-none-any.whl (5.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.5/5.5 MB 53.9 MB/s eta 0:00:00
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.8/7.8 MB 85.8 MB/s eta 0:00:00
Collecting huggingface-hub<1.0,>=0.10.0
  Downloading huggingface_hub-0.14.1-py3-none-any.whl (224 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 224.5/224.5 kB 24.9 MB/s eta 0:00:00
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.14.1 tokenizers-0.13.3 transformers-4.24.0
Looking in indexes: https://pypi.org/si

Then build the model:

In [None]:
from deeppavlov import build_model, configs
model = build_model('multitask_example', download=True)


# If you use your config from scratch, it should look like
# model = build_model('path/to/your/config.json')

2023-04-26 14:23:32.835 INFO in 'deeppavlov.core.data.utils'['utils'] at line 95: Downloading from http://files.deeppavlov.ai/deeppavlov_data/multitask/multitask_example.tar.gz to /root/.deeppavlov/models/multitask_example.tar.gz
INFO:deeppavlov.core.data.utils:Downloading from http://files.deeppavlov.ai/deeppavlov_data/multitask/multitask_example.tar.gz to /root/.deeppavlov/models/multitask_example.tar.gz
100%|██████████| 682M/682M [00:37<00:00, 18.3MB/s]
2023-04-26 14:24:11.318 INFO in 'deeppavlov.core.data.utils'['utils'] at line 276: Extracting /root/.deeppavlov/models/multitask_example.tar.gz archive into /root/.deeppavlov/models/multitask_example
INFO:deeppavlov.core.data.utils:Extracting /root/.deeppavlov/models/multitask_example.tar.gz archive into /root/.deeppavlov/models/multitask_example


Downloading (…)okenizer_config.json:   0%|          | 0.00/28.0 [00:00<?, ?B/s]

Downloading (…)lve/main/config.json:   0%|          | 0.00/483 [00:00<?, ?B/s]

Downloading (…)solve/main/vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

2023-04-26 14:24:36.836 INFO in 'deeppavlov.models.torch_bert.multitask_transformer'['multitask_transformer'] at line 413: Load path /root/.deeppavlov/models/multitask_example/distilbert-base-uncased is given.
INFO:deeppavlov.models.torch_bert.multitask_transformer:Load path /root/.deeppavlov/models/multitask_example/distilbert-base-uncased is given.
2023-04-26 14:24:36.844 INFO in 'deeppavlov.models.torch_bert.multitask_transformer'['multitask_transformer'] at line 422: Load path /root/.deeppavlov/models/multitask_example/distilbert-base-uncased.pth.tar exists.
INFO:deeppavlov.models.torch_bert.multitask_transformer:Load path /root/.deeppavlov/models/multitask_example/distilbert-base-uncased.pth.tar exists.
2023-04-26 14:24:36.847 INFO in 'deeppavlov.models.torch_bert.multitask_transformer'['multitask_transformer'] at line 423: Initializing `MultiTaskTransformer` from saved.
INFO:deeppavlov.models.torch_bert.multitask_transformer:Initializing `MultiTaskTransformer` from saved.


Downloading pytorch_model.bin:   0%|          | 0.00/268M [00:00<?, ?B/s]

Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertModel: ['vocab_projector.weight', 'vocab_projector.bias', 'vocab_layer_norm.weight', 'vocab_transform.bias', 'vocab_layer_norm.bias', 'vocab_transform.weight']
- This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
2023-04-26 14:24:41.655 INFO in 'deeppavlov.models.torch_bert.multitask_transformer'['multitask_transformer'] at line 431: Loading weights from /root/.deeppavlov/models/multitask_example/distilbert-base-uncased.pth.tar.
INFO:deeppavlov.models.torch_bert.multitask_trans

Then, to run inference on a config with N tasks, one needs to define a list of N lists,
where every inner list holds the examples for one task.

Mind that the order of the lists must be exactly the same as the order of tasks in the config.

If the same phrase needs to be classified for many tasks, it is cached,
which speeds up the computation compared to using different phrases.
If there are no examples for some task, one can just pass an empty list for it.


Here is how one can make the list of x.

In [None]:
tasks = ['cola', 'rte', 'stsb', 'copa', 'conll']
# the same order as in the config
x = dict()
for task in tasks:
    if task == 'rte':  # Sentence pair classification/regression
        # An example can be a tuple
        x[task] = [('pair 1 phrase 1', 'pair 1 phrase 2'),
                   ('pair 2 phrase 1', 'pair 2 phrase 2')]
    elif task == 'cola':  # Single sentence classification/regression
        # An example can be a string
        x[task] = ['phrase1']
    elif task == 'conll':  # NER
        # For NER, examples are strings
        x[task] = ['first second']
    elif task == 'stsb':  # Sentence pair regression
        # The list of examples for any task can be empty, as in this case
        x[task] = []
    elif task == 'copa':  # Multiple choice
        # An example is a (context, list of choices) tuple
        x[task] = [('context in pair 1', ['choice 1 in pair 1', 'choice 2 in pair 1']),
                   ('context in pair 2', ['choice 1 in pair 2', 'choice 2 in pair 2'])]
    else:
        x[task] = ['test phrase']
list_of_x = [x[task] for task in tasks]

To run inference, one needs to pass the model the concatenation of the list of x and the list of y.

The list of y has the same structure as the list of x, but any list for y can be empty.

In [None]:
list_of_y = [[] for _ in tasks]
args = list_of_x + list_of_y
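The argument layout used above (all x lists in task order, followed by all y lists in task order) can be wrapped in a small helper. `build_model_args` is a hypothetical convenience function, not a DeepPavlov API. It only assembles the list and does not call the model.

```python
from typing import Any, Dict, List, Optional, Sequence


def build_model_args(task_order: Sequence[str],
                     examples_by_task: Dict[str, List[Any]],
                     labels_by_task: Optional[Dict[str, List[Any]]] = None) -> List[List[Any]]:
    """Return [x_task1, ..., x_taskN, y_task1, ..., y_taskN] in the given task order."""
    labels_by_task = labels_by_task or {}
    unknown = (set(examples_by_task) | set(labels_by_task)) - set(task_order)
    if unknown:
        raise ValueError(f"Tasks not present in the config order: {sorted(unknown)}")
    # Tasks without examples or labels get an empty list.
    list_of_x = [list(examples_by_task.get(task, [])) for task in task_order]
    list_of_y = [list(labels_by_task.get(task, [])) for task in task_order]
    return list_of_x + list_of_y


task_order = ['cola', 'rte', 'stsb', 'copa', 'conll']
example_args = build_model_args(task_order, {
    'rte': [('pair 1 phrase 1', 'pair 1 phrase 2')],
    'cola': ['phrase1'],
})
print(len(example_args))   # 10: five x lists followed by five y lists
print(example_args[:2])
```

Passing a dictionary keyed by task name avoids accidentally putting the examples of one task in the position of another.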

Then we run inference as with any other DeepPavlov model:

In [None]:
outputs = model(*args)
print(outputs)

You're using a DistilBertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


[[1], [1, 1], [], [0, 1], [['O', 'O']]]
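Judging by the printed result, the model returns one list of predictions per task, in the same order as the tasks in the config. Under that assumption, the outputs can be paired with the task names to make them easier to read. The snippet below copies the printed output as a literal so that it runs without the model.

```python
tasks = ['cola', 'rte', 'stsb', 'copa', 'conll']
# Copied from the printed output above; in a live session use the real `outputs`.
outputs = [[1], [1, 1], [], [0, 1], [['O', 'O']]]

predictions = dict(zip(tasks, outputs))
for task, preds in predictions.items():
    print(f"{task}: {preds if preds else 'no examples passed'}")
```

Here `stsb` is empty because no examples were passed for it, and `conll` holds one list of tags per input sentence.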
