# Multi-task Supervision Tutorial

In this tutorial we demonstrate how to use the multi-task versions of the label model and end model. We do this with a simple synthetic dataset, focusing primarily on the input/output interfaces of these models. In a future tutorial, we will demonstrate the multi-task workflow on a real-world domain of larger scale and complexity, and the benefits that come from jointly modeling the weak supervision.

For multi-task problems, we execute our pipeline in five steps:
1. Load Data
2. Define Task Graph
3. Train Label Model
4. Train End Model
5. Evaluate

## Step 1: Load Data

We first load our data.

The data types for the multi-task setting mirror those of the single-task setting, but with an extra dimension for the number of tasks (t), and with the cardinality (k) being replaced by task-specific cardinalities (K_t):

* X: a t-length list of \[n\]-dim iterables of end model inputs OR a single \[n\]-dim iterable of inputs if all tasks operate on the same input.
* Y: a t-length list of \[n\]-dim numpy.ndarray of target labels (Y[i] $\in$ {1,...,K_t})
* L: a t-length list of \[n,m\] scipy.sparse matrices of noisy labels (L[i,j] $\in$ {0,...,K_t}, with label 0 reserved for abstentions)

And optionally (for use with some debugging/analysis tools):
* D: a t-length list of \[n\]-dim iterables of human-readable examples OR a single \[n\]-dim iterable of examples if all tasks operate on the same data.
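
To make these shapes concrete, here is a minimal sketch that builds random stand-ins for X, Y, L, and D. The sizes `n`, `m`, and the feature width of 5 are made up for illustration, and the labels are random, so only the container types and value ranges are meaningful. This is not the tutorial's actual dataset.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)

n, m = 6, 4                 # hypothetical: 6 examples, 4 noisy label sources per task
cardinalities = [2, 3, 3]   # K_t for each of the t = 3 tasks

# X: a single [n]-dim iterable of inputs shared by all tasks
X = rng.normal(size=(n, 5))

# Y: t-length list of [n]-dim arrays with labels in {1, ..., K_t}
Y = [rng.integers(1, k + 1, size=n) for k in cardinalities]

# L: t-length list of [n, m] sparse matrices with entries in {0, ..., K_t};
# 0 means the source abstained on that example
L = [sparse.csr_matrix(rng.integers(0, k + 1, size=(n, m))) for k in cardinalities]

# D: a single [n]-dim iterable of human-readable examples shared by all tasks
D = [f"example {i}" for i in range(n)]

# Check that every task's labels have the expected shape and range
for task, k in enumerate(cardinalities):
    assert Y[task].shape == (n,)
    assert L[task].shape == (n, m)
    assert 1 <= Y[task].min() and Y[task].max() <= k
    assert 0 <= L[task].min() and L[task].max() <= k
```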

In [54]:
import pickle

with open("data/multitask_tutorial.pkl", 'rb') as f:
    X, Y, L, D = pickle.load(f)

## Step 2: Define Task Graph

The primary role of the task graph is to define a set of feasible target label vectors.
For example, consider the following set of classification tasks, wherein we assign text entities to one of the given labels:

T0: Y0 ∈ {PERSON, ORG}  
T1: Y1 ∈ {DOCTOR, OTHER PERSON, NOT APPLICABLE}  
T2: Y2 ∈ {HOSPITAL, OTHER ORG, NOT APPLICABLE}  

Observe that the tasks are related by logical implication relationships: if Y0 = PERSON,
then Y2 = NOT APPLICABLE, since Y2 classifies ORGs. Thus, in this task structure, [PERSON, DOCTOR, NOT APPLICABLE] is a feasible label vector, whereas [PERSON, DOCTOR, HOSPITAL] is not.
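
The feasibility rule for this example can be sketched in plain Python by enumerating all label vectors and keeping only the consistent ones. The predicate below assumes the implications run both ways (Y1 applies only when Y0 = PERSON, and Y2 only when Y0 = ORG), which goes slightly beyond the single implication stated above. It illustrates the idea and does not show how the library computes its feasible set.

```python
from itertools import product

Y0 = ["PERSON", "ORG"]
Y1 = ["DOCTOR", "OTHER PERSON", "NOT APPLICABLE"]
Y2 = ["HOSPITAL", "OTHER ORG", "NOT APPLICABLE"]
NA = "NOT APPLICABLE"

def is_feasible(y0, y1, y2):
    """Assumed rule: Y1 refines PERSONs only, Y2 refines ORGs only."""
    if y0 == "PERSON":
        return y1 != NA and y2 == NA
    return y1 == NA and y2 != NA

# Keep the consistent vectors out of the 2 * 3 * 3 = 18 candidates
feasible = [v for v in product(Y0, Y1, Y2) if is_feasible(*v)]
for v in feasible:
    print(v)
```

Under this rule, only 4 of the 18 possible label vectors are feasible.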

To reflect this feasible label set, we define our task graph for this problem with a TaskHierarchy, a subclass of TaskGraph which assumes that label K_t for each non-root node is the "NOT APPLICABLE" class.

In [56]:
from metal.multitask import TaskHierarchy
task_graph = TaskHierarchy(cardinalities=[2,3,3], edges=[(0,1), (0,2)])

## Step 3: Train Label Model

We now pass our TaskGraph into the multi-task label model to instantiate a model with the appropriate structure.

In [57]:
from metal.multitask import MTLabelModel
label_model = MTLabelModel(task_graph=task_graph)

In [58]:
label_model.train(L, n_epochs=100, seed=123)

Computing O...
Estimating \mu...
[E:0]	Train Loss: 3.967
[E:10]	Train Loss: 1.118
[E:20]	Train Loss: 0.587
[E:30]	Train Loss: 0.186
[E:40]	Train Loss: 0.081
[E:50]	Train Loss: 0.049
[E:60]	Train Loss: 0.026
[E:70]	Train Loss: 0.026
[E:80]	Train Loss: 0.024
[E:90]	Train Loss: 0.023
[E:99]	Train Loss: 0.023
Finished Training


As with the single-task case, we can score this trained model to evaluate it directly, or use it to make predictions for our training set that will then be used to train a multi-task end model.

In [59]:
label_model.score(L, Y)

Accuracy: 0.939


0.9390000000000001

In [60]:
# Y_train_ps stands for "Y[labels]_train[split]_p[redicted]s[oft]"
Y_train_ps = label_model.predict_proba(L)
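
As a rough illustration of what "soft" predictions mean, the sketch below assumes that the soft labels for a single task form an \[n, K_t\] array whose rows are probability distributions over that task's labels. This layout is an assumption, not something the output above confirms. The array `Y_ps_task` is hand-written for illustration.

```python
import numpy as np

# Hypothetical soft labels for one task with K_t = 3 over n = 4 examples
Y_ps_task = np.array([
    [0.70, 0.20, 0.10],
    [0.10, 0.80, 0.10],
    [0.30, 0.30, 0.40],
    [0.05, 0.05, 0.90],
])

# Each row should be a probability distribution over the task's labels
assert np.allclose(Y_ps_task.sum(axis=1), 1.0)

# argmax is 0-indexed, while labels are 1-indexed (0 is reserved for abstentions)
Y_p_task = Y_ps_task.argmax(axis=1) + 1
print(Y_p_task)  # [1 2 3 3]
```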

## Step 4: Train End Model

As with the single-task end model, the multi-task end model consists of three components: input layers, middle layers, and task head layers. Again, each layer consists of a torch.nn.Module followed by various optional additional operators (e.g., a ReLU nonlinearity, batch normalization, and/or dropout).

**Input layers**: The input module is an IdentityModule by default. If your tasks accept inputs of different types (e.g., one task over images and another over text), you may pass in a t-length list of input modules.

**Middle layers**: The middle modules are nn.Linear by default and are shared by all tasks.

**Head layers**: The t task head modules are nn.Linear modules by default. You may instead pass in a custom module to be used by all tasks or a t-length list of modules. These task heads are unique to each task, sharing no parameters with other tasks. Their outputs are fed to softmax operators whose output dimensions equal the cardinalities of the corresponding tasks.
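
The three components described above can be sketched as a framework-free forward pass in NumPy. This is only an illustration of how a shared trunk feeds task-specific softmax heads. It is not the MTEndModel implementation, and the layer sizes and random initialization are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(123)

def linear(in_dim, out_dim):
    """Return randomly initialized (weights, bias) for a dense layer."""
    return rng.normal(scale=0.1, size=(in_dim, out_dim)), np.zeros(out_dim)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

layer_dims = [1000, 100, 10]   # input width, then two shared middle layers
cardinalities = [2, 3, 3]      # one head per task, with K_t outputs each

# Middle layers are shared by all tasks; heads share no parameters
middle = [linear(a, b) for a, b in zip(layer_dims[:-1], layer_dims[1:])]
heads = [linear(layer_dims[-1], k) for k in cardinalities]

def forward(X):
    h = X  # identity input layer
    for W, b in middle:
        h = np.maximum(h @ W + b, 0.0)  # linear layer followed by ReLU
    # Every head reads the same top-layer representation
    return [softmax(h @ W + b) for W, b in heads]

X = rng.normal(size=(5, 1000))
probs = forward(X)
print([p.shape for p in probs])  # [(5, 2), (5, 3), (5, 3)]
```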

Here we construct a simple graph with a single (identity) input module, two intermediate layers, and linear task heads attached to the top layer.

In [61]:
from metal.multitask import MTEndModel

end_model = MTEndModel([1000,100,10], task_graph=task_graph, seed=123)


Network architecture:

--Input Layer--
IdentityModule()

--Middle Layers--
(layer1):
Sequential(
  (0): Linear(in_features=1000, out_features=100, bias=True)
  (1): ReLU()
)

(layer2):
Sequential(
  (0): Linear(in_features=100, out_features=10, bias=True)
  (1): ReLU()
)
(head0)
Linear(in_features=10, out_features=2, bias=True)
(head1)
Linear(in_features=10, out_features=3, bias=True)
(head2)
Linear(in_features=10, out_features=3, bias=True)




In [70]:
end_model.train(X, Y_train_ps, X, Y)

Saving model at iteration 0 with best score 0.946
[E:0]	Train Loss: 1.967	Dev score: 0.946
Saving model at iteration 1 with best score 0.968
[E:1]	Train Loss: 0.866	Dev score: 0.968
[E:2]	Train Loss: 0.614	Dev score: 0.956
[E:3]	Train Loss: 0.548	Dev score: 0.957
[E:4]	Train Loss: 0.501	Dev score: 0.945
[E:5]	Train Loss: 0.486	Dev score: 0.953
[E:6]	Train Loss: 0.471	Dev score: 0.948
[E:7]	Train Loss: 0.447	Dev score: 0.948
[E:8]	Train Loss: 0.431	Dev score: 0.949
[E:9]	Train Loss: 0.425	Dev score: 0.949
Restoring best model from iteration 1 with score 0.968
Finished Training


## Step 5: Evaluate

When it comes to scoring our multi-task models, the mean task accuracy is reported by default. We can also, however, pass `reduce=None` to get back a list of task-specific accuracies.

In [73]:
print("Label Model:")
score = label_model.score(L, Y)

print()

print("End Model:")
score = end_model.score(X, Y)

Label Model:
Accuracy: 0.939

End Model:
Accuracy: 0.968
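
To illustrate the difference between the mean task accuracy and task-specific accuracies, here is a small NumPy sketch. The helper `task_accuracies` and the toy labels are hypothetical and only mirror the two kinds of score described above. They are not part of the library.

```python
import numpy as np

def task_accuracies(Y_true, Y_pred):
    """Return one accuracy per task from t-length lists of [n]-dim label arrays."""
    return [
        float(np.mean(np.asarray(yt) == np.asarray(yp)))
        for yt, yp in zip(Y_true, Y_pred)
    ]

# Toy gold labels and predictions for t = 3 tasks over n = 4 examples
Y_true = [np.array([1, 2, 1, 2]), np.array([1, 3, 2, 3]), np.array([3, 1, 3, 2])]
Y_pred = [np.array([1, 2, 1, 1]), np.array([1, 3, 2, 3]), np.array([3, 1, 1, 1])]

per_task = task_accuracies(Y_true, Y_pred)  # analogous to task-specific scores
mean_acc = float(np.mean(per_task))         # analogous to the default mean score

print(per_task)  # [0.75, 1.0, 0.5]
print(mean_acc)  # 0.75
```

A single mean can hide a weak task, as the third task does here, which is why the per-task view is useful when debugging.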
