In [1]:
# Copyright 2023 NVIDIA Corporation. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ================================

# Each user is responsible for checking the content of datasets and the
# applicable licenses and determining if suitable for the intended use.

<img src="https://developer.download.nvidia.com/notebooks/dlsw-notebooks/merlinmodelsrankingwithmultitasklearning/nvidia_logo.png" style="width: 90px; float: right;">
    

# Multi-Task Learning for Ranking

This notebook is created using the latest stable [merlin-tensorflow](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow/tags) container. 
    
In industry, it is common to need to score the likelihood of several types of user-item interactions, e.g., clicking, liking, sharing, commenting, or following the author. Multi-Task Learning (MTL) techniques are a popular way to train a single model that predicts multiple targets. MTL can improve the accuracy of somewhat correlated tasks, in particular for sparser targets. It also simplifies the ML pipeline and reduces the computational resources needed, compared with training and deploying a separate model for each task.

In this example, we demonstrate how to build and train ranking models with multiple targets. We introduce the building blocks that Merlin Models provides for MTL, including MTL-specific architectures designed to improve accuracy across many different tasks: [**Multi-gate Mixture-of-Experts (MMoE)**](https://dl.acm.org/doi/pdf/10.1145/3219819.3220007) and [**Progressive Layered Extraction (PLE)**](https://dl.acm.org/doi/10.1145/3383313.3412236).

In this example notebook, we use a synthetic dataset generated from the schema of the dataset released in the [TenRec paper](https://arxiv.org/abs/2210.10629), which is suitable for multi-task learning because it provides multiple targets (types of user-item events).

### Learning objectives
- Getting to know the building blocks Merlin provides for MTL
- Training different deep learning-based ranking models with multi-task learning using Merlin Models

In [2]:
import os
import tensorflow as tf

#os.environ["TF_GPU_ALLOCATOR"] = "cuda_malloc_async"
os.environ["TF_MEMORY_ALLOCATION"] = "0.9"

import merlin.models.tf as mm
from merlin.schema.tags import Tags

2023-02-27 16:59:58.633805: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
  warn(f"Triton dtype mappings did not load successfully due to an error: {exc.msg}")
2023-02-27 17:00:01.508087: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-27 17:00:03.638395: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 29249 MB memory:  -> device: 0, name: Quadro GV100, pci bus id: 0000:15:00.0, compute capability: 7.0
2023-0

## Generating data

Here we generate a synthetic dataset based on the [schema](https://github.com/NVIDIA-Merlin/models/blob/main/merlin/datasets/entertainment/tenrec_video/schema.pbtxt) of the `tenrec-video` dataset. The original dataset was released with the [TenRec paper](https://arxiv.org/abs/2210.10629), and is suitable for multi-task learning because it provides multiple targets (types of user-item events). 
To make the synthetic data more realistic, our data generator takes into account the original cardinalities of the categorical features, as well as the dependency of user features on the user id and of item features on the item id. For more information about how the schema API works, you can check this [example](https://github.com/NVIDIA-Merlin/models/blob/main/examples/02-Merlin-Models-and-NVTabular-integration.ipynb).

In [3]:
import os
from merlin.datasets.synthetic import generate_data

NUM_ROWS = os.environ.get("NUM_ROWS", 100_000)

train_ds, valid_ds = generate_data("tenrec-video", int(NUM_ROWS), set_sizes=(0.8, 0.2))
schema = train_ds.schema



By inspecting the column tags in the dataset schema, we can see that there are a number of user features (`user_id`, `gender`, `age`) and item features (`item_id`, `video_category`). There are also four binary classification targets (`click`, `follow`, `like`, and `share`) and one regression target (`watching_times`).

In [4]:
schema

Unnamed: 0,name,tags,dtype,is_list,is_ragged,properties.freq_threshold,properties.cat_path,properties.start_index,properties.max_size,properties.num_buckets,properties.embedding_sizes.cardinality,properties.embedding_sizes.dimension,properties.domain.min,properties.domain.max,properties.domain.name
0,user_id,"(Tags.ID, Tags.USER_ID, Tags.CATEGORICAL, Tags...","DType(name='int32', element_type=<ElementType....",False,False,0.0,.//categories/unique.user_id.parquet,1.0,0.0,,2633851.0,512.0,0.0,100000.0,user_id
1,item_id,"(Tags.ID, Tags.CATEGORICAL, Tags.ITEM, Tags.IT...","DType(name='int32', element_type=<ElementType....",False,False,0.0,.//categories/unique.item_id.parquet,1.0,0.0,,179280.0,512.0,0.0,179280.0,item_id
2,video_category,"(Tags.CATEGORICAL, Tags.ITEM)","DType(name='int32', element_type=<ElementType....",False,False,0.0,.//categories/unique.video_category.parquet,1.0,0.0,,5.0,16.0,0.0,5.0,video_category
3,gender,"(Tags.CATEGORICAL, Tags.USER)","DType(name='int32', element_type=<ElementType....",False,False,0.0,.//categories/unique.gender.parquet,1.0,0.0,,5.0,16.0,0.0,5.0,gender
4,age,"(Tags.CATEGORICAL, Tags.USER)","DType(name='int32', element_type=<ElementType....",False,False,0.0,.//categories/unique.age.parquet,1.0,0.0,,10.0,16.0,0.0,10.0,age
5,click,"(Tags.BINARY_CLASSIFICATION, Tags.TARGET, Tags...","DType(name='int8', element_type=<ElementType.I...",False,False,,,,,,,,,,
6,follow,"(Tags.BINARY_CLASSIFICATION, Tags.TARGET, Tags...","DType(name='int8', element_type=<ElementType.I...",False,False,,,,,,,,,,
7,like,"(Tags.BINARY_CLASSIFICATION, Tags.TARGET, Tags...","DType(name='int8', element_type=<ElementType.I...",False,False,,,,,,,,,,
8,share,"(Tags.BINARY_CLASSIFICATION, Tags.TARGET, Tags...","DType(name='int8', element_type=<ElementType.I...",False,False,,,,,,,,,,
9,watching_times,"(Tags.REGRESSION, Tags.TARGET)","DType(name='int16', element_type=<ElementType....",False,False,,,,,,,,0.0,5.0,watching_times


In [5]:
# Printing first rows of the generated dataframe
train_ds.to_ddf().head()

Unnamed: 0,user_id,gender,age,item_id,video_category,click,follow,like,share,watching_times
0,27,1,1,59,1,1,1,1,0,4
1,23,1,1,90,1,1,1,0,1,3
2,50,1,1,54,1,0,0,0,0,4
3,139,1,1,18,1,1,0,0,1,1
4,14,1,1,11,1,0,0,0,1,4


## Building and training MTL models

In [6]:
BATCH_SIZE = 4 * 1024

The simplest way to build a model with Merlin Models is to use the `InputBlockV2` and `OutputBlock` building blocks, which infer the input features and target columns from the schema.
The `InputBlockV2` creates the embedding layers for categorical features and concatenates all features. The `OutputBlock` creates a `ModelOutput` head for each target depending on the task type (tagged in the column schema), e.g. `RegressionOutput()` for regression, `BinaryOutput()` for binary classification, and `CategoricalOutput()` for multi-class classification.  

You can inspect below a multi-task learning model created for this dataset with just four lines of code.

In [7]:
model = mm.Model(
    mm.InputBlockV2(schema),
    mm.MLPBlock([32,16]),
    mm.OutputBlock(schema)
)

model

Model(
  (blocks): _TupleWrapper((ParallelBlock(
    (_pre): PrepareFeatures()
    (_aggregation): ConcatFeatures()
    (parallel_layers): Dict(
      (categorical): ParallelBlock(
        (parallel_layers): Dict(
          (user_id): EmbeddingTable(
            (features): Dict(
              (user_id): ColumnSchema(name='user_id', tags={<Tags.ID: 'id'>, <Tags.USER_ID: 'user_id'>, <Tags.CATEGORICAL: 'categorical'>, <Tags.USER: 'user'>}, properties={'freq_threshold': 0.0, 'cat_path': './/categories/unique.user_id.parquet', 'start_index': 1.0, 'max_size': 0.0, 'num_buckets': None, 'embedding_sizes': {'cardinality': 2633851.0, 'dimension': 512.0}, 'domain': {'min': 0, 'max': 100000, 'name': 'user_id'}}, dtype=DType(name='int32', element_type=<ElementType.Int: 'int'>, element_size=32, element_unit=None, signed=True, shape=Shape(dims=None)), is_list=False, is_ragged=False)
            )
            (table): Embedding()
          )
          (item_id): EmbeddingTable(
            (features)

*Note*: If you want to build a model for just a subset of the targets, you can either remove the unwanted target columns from the schema with `schema.without(["like", "follow", "share"])`,

or replace `mm.OutputBlock(schema)` with a `ParallelBlock` containing only the desired targets:
```python
mm.ParallelBlock(
  mm.BinaryOutput("click"), mm.RegressionOutput("watching_times")
)
```

### Training and evaluating MTL models

In [8]:
model.compile(optimizer="adam", run_eagerly=False)
model.fit(train_ds, batch_size=BATCH_SIZE)





<keras.callbacks.History at 0x7f0fc4149100>

By inspecting the metrics output from model evaluation, we can observe that each task type has its own default metrics: `precision`, `recall`, `binary_accuracy` and `auc` for binary classification, and `root_mean_squared_error` for regression.  
Each task has its own loss (e.g. `click/binary_output_loss`, `watching_times/regression_output_loss`), and the `loss` is the sum of all task losses.

In [9]:
model.evaluate(valid_ds, batch_size=BATCH_SIZE, return_dict=True)



{'loss': 2.9355478286743164,
 'click/binary_output_loss': 0.6932715773582458,
 'follow/binary_output_loss': 0.6933135986328125,
 'like/binary_output_loss': 0.693278431892395,
 'share/binary_output_loss': 0.6933173537254333,
 'watching_times/regression_output_loss': 0.16236688196659088,
 'click/binary_output/precision': 0.5111662745475769,
 'click/binary_output/recall': 0.04149043187499046,
 'click/binary_output/binary_accuracy': 0.5044000148773193,
 'click/binary_output/auc': 0.5021954774856567,
 'follow/binary_output/precision': 0.49819493293762207,
 'follow/binary_output/recall': 0.8840841054916382,
 'follow/binary_output/binary_accuracy': 0.49729999899864197,
 'follow/binary_output/auc': 0.4973853826522827,
 'like/binary_output/precision': 0.5026715993881226,
 'like/binary_output/recall': 0.4886624813079834,
 'like/binary_output/binary_accuracy': 0.50204998254776,
 'like/binary_output/auc': 0.49843934178352356,
 'share/binary_output/precision': 0.4973011612892151,
 'share/binary_out

### Setting loss weights

You can balance the importance of the individual task losses in the final `loss` by setting `loss_weights`.

In [10]:
loss_weights = {
    "click/binary_output": 5.0,
    "like/binary_output": 4.0,
    "share/binary_output": 3.0,
    "follow/binary_output": 2.0,
    "watching_times/regression_output": 1.0,
}


model.compile(optimizer="adam", run_eagerly=False, loss_weights=loss_weights)
model.fit(train_ds, batch_size=BATCH_SIZE)



<keras.callbacks.History at 0x7f08484bdd60>
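
To make the effect of `loss_weights` concrete, here is a minimal sketch in plain Python (no Merlin or Keras calls) of how the total loss is combined from the per-task losses. The `total_loss` helper is hypothetical and only written for this illustration. The per-task values are copied from the evaluation output of the unweighted model above, so the weighted total is an illustration of the arithmetic, not a result of the retrained model.

```python
# Per-task validation losses copied from the earlier `model.evaluate()` output
# (keys follow the naming used in `loss_weights`).
task_losses = {
    "click/binary_output": 0.6932715773582458,
    "follow/binary_output": 0.6933135986328125,
    "like/binary_output": 0.693278431892395,
    "share/binary_output": 0.6933173537254333,
    "watching_times/regression_output": 0.16236688196659088,
}

loss_weights = {
    "click/binary_output": 5.0,
    "like/binary_output": 4.0,
    "share/binary_output": 3.0,
    "follow/binary_output": 2.0,
    "watching_times/regression_output": 1.0,
}


def total_loss(losses, weights=None):
    """Hypothetical helper: weighted sum of per-task losses (missing weight = 1.0)."""
    weights = weights or {}
    return sum(weights.get(name, 1.0) * value for name, value in losses.items())


unweighted = total_loss(task_losses)
weighted = total_loss(task_losses, loss_weights)
print(f"unweighted total loss: {unweighted:.4f}")
print(f"weighted total loss:   {weighted:.4f}")
```

The unweighted total reproduces the `loss` value reported by `model.evaluate()` above (up to floating-point rounding). With the weights applied, the `click` loss dominates the total, so its gradients have the largest influence on the shared layers.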

### Setting task-specific class / sample weights

Keras supports setting `class_weight` and `sample_weight` for **single-task models** in `model.fit()`.  

The `class_weight` allows weighting the classes of categorical/binary target in the loss, so that model training can pay more attention to samples from an under-represented class.  

The `sample_weight` allows weighting data samples which should account more or less for the loss during training. If `weighted_metrics` is provided in `model.compile()`, then those metrics will also be weighted by `sample_weight` during training and testing.

Merlin Models provides building blocks for **task-specific class and sample weights** with the `ColumnBasedSampleWeight` block. Here are some examples for different use cases.

#### 1. Setting class weights per task
Here we create an MTL model to predict the `click` and `like` targets. We give negative events (0s) a weight of 1.0 and positive events (1s) a higher weight. Because `like` is a rarer (sparser) event than `click`, we use a higher weight for its positive examples.

In [11]:
output_block = mm.ParallelBlock(
  mm.BinaryOutput("click",
                  post=mm.ColumnBasedSampleWeight(
                        binary_class_weights=(1.0, 5.0), 
                  )), 
  mm.BinaryOutput("like",
                  post=mm.ColumnBasedSampleWeight(
                        binary_class_weights=(1.0, 20.0), 
                  ))
)

In [12]:
model = mm.Model(
    mm.InputBlockV2(schema),
    mm.MLPBlock([32,16]),
    output_block
)

model.compile(optimizer="adam", run_eagerly=False)
model.fit(train_ds, batch_size=BATCH_SIZE)



<keras.callbacks.History at 0x7f08419662b0>
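
The following plain-Python sketch illustrates what binary class weights do to a binary cross-entropy loss. It is not the Merlin Models implementation: the `weighted_bce` function is hypothetical, and it assumes that the weighted per-sample losses are averaged over the batch size.

```python
import math


def weighted_bce(y_true, y_pred, binary_class_weights=(1.0, 1.0), eps=1e-7):
    """Binary cross-entropy where each sample is weighted by its class.

    binary_class_weights is (weight for label 0, weight for label 1).
    Assumption: weighted per-sample losses are averaged over the batch size.
    """
    neg_weight, pos_weight = binary_class_weights
    total = 0.0
    for label, prob in zip(y_true, y_pred):
        prob = min(max(prob, eps), 1.0 - eps)  # clip to avoid log(0)
        sample_loss = -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))
        sample_weight = pos_weight if label == 1 else neg_weight
        total += sample_weight * sample_loss
    return total / len(y_true)


# Toy batch: one positive among four samples, every prediction is 0.2.
labels = [0, 0, 0, 1]
preds = [0.2, 0.2, 0.2, 0.2]

plain = weighted_bce(labels, preds)
boosted = weighted_bce(labels, preds, binary_class_weights=(1.0, 20.0))
print(f"plain BCE:          {plain:.4f}")
print(f"class-weighted BCE: {boosted:.4f}")
```

With a weight of 20 on the positive class, the single under-predicted positive accounts for almost all of the loss, which is the intended effect for a sparse target such as `like`.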

#### 2. Using other target / feature as weight per task
Another use case is using a feature or another target as the sample weight. For example, the `watching_times` target column represents the number of times the user has watched the video, so it can serve as an indicator of how relevant the video is to the user. We can use it as the sample weight for `click`, so that samples with more watches contribute more to the loss.  
Note: input (non-target) columns can also be used as sample weights.

In [13]:
output_block = mm.BinaryOutput("click",
                  post=mm.ColumnBasedSampleWeight("watching_times")
               )

In [14]:
model = mm.Model(
    mm.InputBlockV2(schema),
    mm.MLPBlock([32,16]),
    output_block
)

model.compile(optimizer="adam", run_eagerly=False)
model.fit(train_ds, batch_size=BATCH_SIZE)



<keras.callbacks.History at 0x7f08412f3e50>

#### 3. Using another binary target as sample space
In some cases, a target is conditioned on another binary target. For example, the system the user interacts with might impose an event dependency, so that the user can only `like` or `share` an item after a `click` event. Because these more specific events are usually much less frequent than `click`, their targets are sparser and suffer more from class imbalance during training. In such cases, since a positive event (e.g., `like=1`) can only occur if `click=1`, we can use `click` as the sample space for training `like`, i.e., a sample contributes to the `like` loss only if `click=1`. Here is how you can set such a sample space dependency among targets.  

In [15]:
output_block = mm.ParallelBlock(
  mm.BinaryOutput("click"), 
  mm.BinaryOutput("like", post=mm.ColumnBasedSampleWeight("click"))
)

In such cases you might want to compute metrics for `like` considering only its sample space, rather than the entire space. The **`weighted_metrics`** argument can be used for that, as regular metrics are not influenced by sample weights.  
We also demonstrate below how to override the default **`metrics`** per task. Metrics can be either Keras-like metrics or string aliases supported by Merlin Models (e.g., "auc", "precision", "recall", "binary_accuracy", "rmse", "mse").

In [16]:
model = mm.Model(
    mm.InputBlockV2(schema),
    mm.MLPBlock([32,16]),
    output_block
)

metrics = {
        "click/binary_output": [tf.keras.metrics.AUC(name="auc", num_thresholds=200)],
        "like/binary_output": ["auc"],
    }

weighted_metrics = {
        "click/binary_output": ["auc"],
        "like/binary_output": ["auc"],
    }

model.compile(optimizer="adam", run_eagerly=False, metrics=metrics, weighted_metrics=weighted_metrics)
model.fit(train_ds, batch_size=BATCH_SIZE)



<keras.callbacks.History at 0x7f0840ea0fd0>

Notice that when `weighted_metrics` is set, we get the specified metrics prefixed by `weighted_`. The regular metric for `like` (`auc`) differs from the weighted metric (`weighted_auc`) because the latter is affected by sample weights, i.e., it is computed only for samples where `click=1`.

In [17]:
model.evaluate(valid_ds, batch_size=BATCH_SIZE, return_dict=True)



{'loss': 1.0423791408538818,
 'click/binary_output_loss': 0.6933351755142212,
 'like/binary_output_loss': 0.34904393553733826,
 'click/binary_output/auc': 0.5008733868598938,
 'click/binary_output/weighted_auc': 0.5008733868598938,
 'like/binary_output/auc': 0.49714359641075134,
 'like/binary_output/weighted_auc': 0.5003456473350525,
 'regularization_loss': 0.0,
 'loss_batch': 1.0625323057174683}
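
As a rough illustration of why a regular metric and its weighted counterpart can differ, the sketch below computes binary accuracy (chosen instead of AUC only to keep the arithmetic simple) with and without `click` as the sample weight. The `binary_accuracy` function and the toy data are hypothetical; the sketch assumes a weighted metric is the weighted mean of per-sample values, so samples with weight 0 are ignored.

```python
def binary_accuracy(y_true, y_pred, sample_weight=None, threshold=0.5):
    """Accuracy as a (possibly weighted) mean of per-sample correctness."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    correct = [float((p > threshold) == bool(t)) for t, p in zip(y_true, y_pred)]
    weighted_hits = sum(w * c for w, c in zip(sample_weight, correct))
    return weighted_hits / sum(sample_weight)


# Toy data: a `like` can only happen when the item was clicked.
click = [1, 1, 0, 0, 1, 0]
like = [1, 0, 0, 0, 1, 0]
like_pred = [0.9, 0.7, 0.1, 0.2, 0.4, 0.3]

overall = binary_accuracy(like, like_pred)
in_click_space = binary_accuracy(like, like_pred, sample_weight=click)
print(f"accuracy on all samples:      {overall:.3f}")
print(f"accuracy where click == 1:    {in_click_space:.3f}")
```

In this toy example the non-clicked samples are all easy negatives, so the metric over the entire space looks better than the metric restricted to the `click=1` sample space.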

You can also cascade multiple sample weights for a target by using a `SequentialBlock`, for example, setting class weights for a binary target and also using another binary column as the sample space. If there are multiple `ColumnBasedSampleWeight` blocks, the sample weights are multiplied element-wise.
```python
mm.BinaryOutput("like", 
                post=mm.SequentialBlock(
                    [mm.ColumnBasedSampleWeight(binary_class_weights=(1.0, 5.0)),
                     mm.ColumnBasedSampleWeight("click")]
                )
)
```
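
The element-wise multiplication of cascaded weights can be sketched in plain Python as follows. The `class_weights` and `combine` helpers and the toy labels are hypothetical and only mirror the description above; they are not Merlin Models functions.

```python
def class_weights(labels, binary_class_weights):
    """Per-sample weights from (weight for label 0, weight for label 1)."""
    neg_weight, pos_weight = binary_class_weights
    return [pos_weight if y == 1 else neg_weight for y in labels]


def combine(*weight_lists):
    """Element-wise product of several per-sample weight lists."""
    combined = [1.0] * len(weight_lists[0])
    for weights in weight_lists:
        combined = [c * w for c, w in zip(combined, weights)]
    return combined


like = [1, 0, 0, 0, 1]
click = [1, 1, 0, 0, 1]

# Class weights for `like`, then `click` as the sample space.
final_weights = combine(class_weights(like, (1.0, 5.0)), click)
print(final_weights)
```

Samples with `click=0` end up with weight 0 regardless of their class weight, while clicked positives keep the boosted weight of 5.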

## Multi-task learning architectures

In this section we describe different architectures for multi-task learning, which are summarized in the following illustration. The blue shapes are the ones that are shared for all tasks, and the other colored shapes are task-specific ones. We explain each of those architectures in the next sub-sections.

<img src="../images/mtl_architectures.png"  width="90%">

Image adapted from: [Progressive Layered Extraction (PLE): A Novel Multi-Task
Learning (MTL) Model for Personalized Recommendations](https://dl.acm.org/doi/10.1145/3383313.3412236)

### Hard parameter sharing

The examples above used **hard parameter sharing**, where all tasks share the MLP layers at the bottom, and each task has its own single-layer tower that projects the shared-bottom output to a single neuron per task (for binary classification / regression tasks).  
We can specify more powerful task towers, so that tasks have more freedom to learn different things, in either of the following ways.

In [18]:
output_block = mm.OutputBlock(schema, task_blocks=mm.MLPBlock([32]))

or...

In [19]:
output_block = mm.ParallelBlock(
  mm.BinaryOutput("click", pre=mm.MLPBlock([64])), 
  mm.BinaryOutput("like",  pre=mm.MLPBlock([32]))
)

In [20]:
model = mm.Model(
    mm.InputBlockV2(schema),
    mm.MLPBlock([32,16]),
    output_block
)
model.compile(optimizer="adam", run_eagerly=False)
model.fit(train_ds, batch_size=BATCH_SIZE)



<keras.callbacks.History at 0x7f08406bdf70>

### MMoE architecture

The [**Multi-gate Mixture-of-Experts (MMoE)**](https://dl.acm.org/doi/pdf/10.1145/3219819.3220007) architecture was introduced by Google in 2018 and is one of the most popular models for multi-task learning on tabular data. It builds on the earlier one-gate **Mixture of Experts (MoE)** architecture, in which several sub-networks (experts) project the inputs independently, and a single gate computes a weighted average of the expert outputs to form a shared representation used by all tasks. MMoE goes a step further and gives each task an independent gate, so that each task can learn how best to combine the expert outputs. You can find more details in the [MMoE paper](https://dl.acm.org/doi/pdf/10.1145/3219819.3220007).

The MMoE architecture can be created for your dataset with just a few lines of code!

In [21]:
inputs = mm.InputBlockV2(schema)
output_block = mm.OutputBlock(schema, task_blocks=mm.MLPBlock([32]))
mmoe = mm.MMOEBlock(
    output_block,
    expert_block=mm.MLPBlock([64]),
    num_experts=4,
    gate_block=mm.MLPBlock([16]),
)
model = mm.Model(inputs, mmoe, output_block)
print(model)

Model(
  (blocks): _TupleWrapper((ParallelBlock(
    (_pre): PrepareFeatures()
    (_aggregation): ConcatFeatures()
    (parallel_layers): Dict(
      (categorical): ParallelBlock(
        (parallel_layers): Dict(
          (user_id): EmbeddingTable(
            (features): Dict(
              (user_id): ColumnSchema(name='user_id', tags={<Tags.ID: 'id'>, <Tags.USER_ID: 'user_id'>, <Tags.CATEGORICAL: 'categorical'>, <Tags.USER: 'user'>}, properties={'freq_threshold': 0.0, 'cat_path': './/categories/unique.user_id.parquet', 'start_index': 1.0, 'max_size': 0.0, 'num_buckets': None, 'embedding_sizes': {'cardinality': 2633851.0, 'dimension': 512.0}, 'domain': {'min': 0, 'max': 100000, 'name': 'user_id'}}, dtype=DType(name='int32', element_type=<ElementType.Int: 'int'>, element_size=32, element_unit=None, signed=True, shape=Shape(dims=None)), is_list=False, is_ragged=False)
            )
            (table): Embedding()
          )
          (item_id): EmbeddingTable(
            (features)

In [22]:
model.compile(optimizer="adam", run_eagerly=False)
model.fit(train_ds, batch_size=BATCH_SIZE)



<keras.callbacks.History at 0x7f0827f5ae80>
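
To show the gating idea behind MMoE without any framework, here is a minimal forward-pass sketch in plain Python. It is not how `MMOEBlock` is implemented: all function names, dimensions and the random weights are hypothetical, each expert is a single dense layer with ReLU, and each gate is a single linear layer followed by a softmax (the notebook's `gate_block` adds a hidden layer that is omitted here).

```python
import math
import random


def linear(x, weights):
    """Dense layer without bias: `weights` has one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rng, n_out, n_in):
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def mmoe_forward(x, expert_weights, gate_weights):
    """Return one mixed representation per task.

    Every expert sees the same input; each task owns a gate that produces
    softmax weights over all experts and averages their outputs.
    """
    expert_outputs = [relu(linear(x, w)) for w in expert_weights]
    task_representations = {}
    for task, gate_w in gate_weights.items():
        gate = softmax(linear(x, gate_w))  # one weight per expert
        task_representations[task] = [
            sum(g * out[i] for g, out in zip(gate, expert_outputs))
            for i in range(len(expert_outputs[0]))
        ]
    return task_representations


rng = random.Random(0)
input_dim, expert_dim, num_experts = 6, 4, 3
x = [rng.uniform(-1.0, 1.0) for _ in range(input_dim)]
expert_weights = [random_matrix(rng, expert_dim, input_dim) for _ in range(num_experts)]
gate_weights = {
    task: random_matrix(rng, num_experts, input_dim) for task in ("click", "like")
}

representations = mmoe_forward(x, expert_weights, gate_weights)
for task, vector in representations.items():
    print(task, [round(v, 3) for v in vector])
```

Each task receives a different mixture of the same expert outputs, which then feeds that task's tower. In the one-gate MoE variant, `gate_weights` would contain a single gate whose mixture is shared by all tasks.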

### CGC and PLE architectures

The **CGC** and **PLE** architectures were introduced in this [paper](https://dl.acm.org/doi/10.1145/3383313.3412236) (2020). The authors observed that architectures like **MMoE** presented a "seesaw" phenomenon, where improving the accuracy of one task hurts the accuracy of other tasks.  
So instead of having all tasks sharing all the experts, they proposed allowing for some task-specific experts and shared experts, which they named **Customized Gate Control (CGC) Model**, for which we provide a building block.   
Notice that `CGCBlock` has separate arguments for `num_task_experts` and `num_shared_experts`.

#### CGC

In [23]:
inputs = mm.InputBlockV2(schema)
output_block = mm.OutputBlock(schema, task_blocks=mm.MLPBlock([32]))

cgc = mm.CGCBlock(
    output_block,
    expert_block=mm.MLPBlock([64]),
    num_task_experts=2,
    num_shared_experts=3,
)
model = mm.Model(inputs, cgc, output_block)

In [24]:
model.compile(optimizer="adam", run_eagerly=False)
model.fit(train_ds, batch_size=BATCH_SIZE)



<keras.callbacks.History at 0x7f0840af0040>
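
The difference between CGC and MMoE is which experts each gate can see. The plain-Python sketch below illustrates this with hypothetical, precomputed expert outputs and fixed gate scores. In a real model the gate scores are computed from the input, and `cgc_mix` is not a Merlin Models function.

```python
import math


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cgc_mix(task, expert_outputs, gate_scores):
    """Mix the experts visible to `task`: its own experts plus the shared ones."""
    visible = expert_outputs[task] + expert_outputs["shared"]
    gate = softmax(gate_scores[task])
    if len(gate) != len(visible):
        raise ValueError("expected one gate score per visible expert")
    dim = len(visible[0])
    return [sum(g * out[i] for g, out in zip(gate, visible)) for i in range(dim)]


# Hypothetical expert outputs (2-dimensional for readability).
expert_outputs = {
    "click": [[1.0, 0.0]],               # seen only by the click gate
    "like": [[0.0, 1.0]],                # seen only by the like gate
    "shared": [[0.5, 0.5], [2.0, 2.0]],  # seen by every gate
}
# Hypothetical gate scores: 1 task expert + 2 shared experts per task.
gate_scores = {
    "click": [0.0, 0.0, 0.0],
    "like": [0.0, 0.0, 0.0],
}

click_repr = cgc_mix("click", expert_outputs, gate_scores)
like_repr = cgc_mix("like", expert_outputs, gate_scores)
print("click:", [round(v, 3) for v in click_repr])
print("like: ", [round(v, 3) for v in like_repr])
```

Even with identical gate scores, the two tasks end up with different representations, because the `like` expert never contributes to `click` and vice versa. This separation is what `num_task_experts` and `num_shared_experts` control in `CGCBlock`.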

#### PLE

Furthermore, the [paper](https://dl.acm.org/doi/10.1145/3383313.3412236) authors proposed stacking multiple **CGC** models on top of each other to form a multi-level MTL model, which they called **Progressive Layered Extraction (PLE)**. In their experiments, PLE was able to alleviate the *seesaw problem*, improving the accuracy of all tasks. The `PLEBlock` introduces the `num_layers` argument, which controls the number of levels.   

In [25]:
inputs = mm.InputBlockV2(schema)
output_block = mm.OutputBlock(schema, task_blocks=mm.MLPBlock([32]))

ple = mm.PLEBlock(
    num_layers=2,
    outputs=output_block,
    expert_block=mm.MLPBlock([64]),
    num_task_experts=2,
    num_shared_experts=3,
)
model = mm.Model(inputs, ple, output_block)

In [26]:
model.compile(optimizer="adam", run_eagerly=False)
model.fit(train_ds, batch_size=BATCH_SIZE)



<keras.callbacks.History at 0x7f08260acdf0>

In [27]:
metrics_results = model.evaluate(valid_ds, batch_size=BATCH_SIZE, return_dict=True)
metrics_results



{'loss': 2.9349355697631836,
 'click/binary_output_loss': 0.6931571364402771,
 'follow/binary_output_loss': 0.6931573152542114,
 'like/binary_output_loss': 0.6931440234184265,
 'share/binary_output_loss': 0.6931401491165161,
 'watching_times/regression_output_loss': 0.16233690083026886,
 'click/binary_output/precision': 0.4973677694797516,
 'click/binary_output/recall': 0.7991943359375,
 'click/binary_output/binary_accuracy': 0.4993000030517578,
 'click/binary_output/auc': 0.49983569979667664,
 'follow/binary_output/precision': 0.0,
 'follow/binary_output/recall': 0.0,
 'follow/binary_output/binary_accuracy': 0.5005000233650208,
 'follow/binary_output/auc': 0.5,
 'like/binary_output/precision': 0.5017586350440979,
 'like/binary_output/recall': 0.7409849166870117,
 'like/binary_output/binary_accuracy': 0.50204998254776,
 'like/binary_output/auc': 0.5,
 'share/binary_output/precision': 0.75,
 'share/binary_output/recall': 0.00030090269865468144,
 'share/binary_output/binary_accuracy': 0.

## Conclusion

In this notebook we introduced multi-task learning use cases for ranking models and the building blocks provided by Merlin Models to build, train and evaluate such models.  
You can see how easy it is to build state-of-the-art MTL architectures with just a few lines of code with Merlin Models!