# Losses Recipes

This page shows how to customize your own losses. In `carefree-learn`, losses of various kinds (ML, CV, etc.) are defined through a unified API, `register_loss_module`.

> You might notice that if you run a block containing a `register_loss_module` call more than once, `carefree-learn` will emit a warning saying " '...' has already been registered ", and your changes will have no effect. This is intentional, because we normally **DO NOT** want to register anything more than once.
> 
> However, if you are using interactive development tools (e.g. Jupyter Notebook), it is very common to modify an implementation several times. In this case, we can set `allow_duplicate=True` in the `register_loss_module` call to bypass this check. Of course, this should **NEVER** happen in production, for safety!

# Table of Contents

- [Typical Losses](#Typical-Losses)
- [Complex Losses](#Complex-Losses)
- [Integration](#Integration)
  - [Single Loss](#Single-Loss)
  - [Multi-Task Loss](#Multi-Task-Loss)
- [Q&A](#Q&A)

> You might also notice that the class names defined below resemble their registered names. This is not required: `carefree-learn` only uses the name that you pass to the `register_loss_module` function, and does not check the actual class name.

# Preparations

In [1]:
import torch
import cflearn

import numpy as np
import torch.nn as nn

from torch import Tensor
from typing import Dict
from cflearn.constants import LOSS_KEY
from cflearn.constants import LABEL_KEY
from cflearn.constants import PREDICTIONS_KEY

# Typical Losses

In the most typical cases, a loss function receives `predictions` and `labels` and calculates the loss from them:

In [2]:
# typical classification loss - cross entropy
@cflearn.register_loss_module("my_cross_entropy", allow_duplicate=False)
class MyCrossEntropy(nn.Module):
    def __init__(self):
        super().__init__()
        self.ce = nn.CrossEntropyLoss(reduction="none")

    # logits : [N, K, ...]
    # labels : [N, 1, ...]
    def forward(self, predictions: Tensor, labels: Tensor) -> Tensor:
        return self.ce(predictions, labels.squeeze(1))

# typical regression loss - mean absolute error
@cflearn.register_loss_module("my_mae", allow_duplicate=False)
class MyMAE(nn.Module):
    # predictions : [N, ...]
    # labels : [N, ...]
    def forward(self, predictions: Tensor, labels: Tensor) -> Tensor:
        return (predictions - labels).abs()

> You might notice that the returned `Tensor` is not 'reduced'. That is because `carefree-learn` supports specifying the `reduction` (default: `mean`) of your losses, as `pytorch` does. See the following `Usages` section for more details.
>
> However, you can still 'reduce' your loss in the definition. In this case, your loss will return the same value no matter what the `reduction` is.

## Usages

In [3]:
logits = torch.randn(100, 5)
labels = torch.argmax(logits, dim=1, keepdim=True)
inv_labels = torch.argmin(logits, dim=1, keepdim=True)
fw = {PREDICTIONS_KEY: logits}
b = {LABEL_KEY: labels}
ib = {LABEL_KEY: inv_labels}
my_ce = cflearn.api.make_loss("my_cross_entropy")
my_ce_sum = cflearn.api.make_loss("my_cross_entropy", reduction="sum")
print(my_ce.core(logits, labels).shape)
print(my_ce(fw, b)[LOSS_KEY].item())
print(my_ce.core(logits, inv_labels).shape)
print(my_ce(fw, ib)[LOSS_KEY].item())
print(my_ce_sum(fw, ib)[LOSS_KEY].item())
print()
logits = torch.randn(10, 5, 10, 10)
labels = torch.argmax(logits, dim=1, keepdim=True)
inv_labels = torch.argmin(logits, dim=1, keepdim=True)
fw = {PREDICTIONS_KEY: logits}
b = {LABEL_KEY: labels}
ib = {LABEL_KEY: inv_labels}
print(my_ce.core(logits, labels).shape)
print(my_ce(fw, b)[LOSS_KEY].item())
print(my_ce.core(logits, inv_labels).shape)
print(my_ce(fw, ib)[LOSS_KEY].item())
print(my_ce_sum(fw, ib)[LOSS_KEY].item())

predictions = torch.randn(100, 5)
labels = predictions - 0.123
my_mae = cflearn.api.make_loss("my_mae")
print()
print(my_mae.core(predictions, labels).shape)
predictions = torch.randn(4, 5, 64, 64)
labels = predictions - 0.123
print()
print(my_mae.core(predictions, labels).shape)

torch.Size([100])
0.8284955024719238
torch.Size([100])
3.1444976329803467
314.44976806640625

torch.Size([10, 10, 10])
0.7982355952262878
torch.Size([10, 10, 10])
3.1760029792785645
3176.0029296875

torch.Size([100, 5])

torch.Size([4, 5, 64, 64])
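
The relation between the printed `mean` and `sum` values can be reproduced in plain Python. This is a minimal sketch of what a reduction step does to an unreduced loss; `reduce_loss` is a hypothetical helper written for illustration, not a `carefree-learn` function.

```python
from typing import List, Union


def reduce_loss(values: List[float], reduction: str = "mean") -> Union[float, List[float]]:
    """Apply a PyTorch-style reduction to a flat list of per-element losses."""
    if reduction == "none":
        return values
    if reduction == "sum":
        return sum(values)
    if reduction == "mean":
        return sum(values) / len(values)
    raise ValueError(f"unknown reduction: {reduction}")


per_element = [0.5, 1.5, 2.0, 4.0]  # an unreduced loss, one value per element
mean_loss = reduce_loss(per_element, "mean")
sum_loss = reduce_loss(per_element, "sum")

# `sum` equals `mean` times the number of elements.
assert sum_loss == mean_loss * len(per_element)

# A loss that is already reduced to one value is unaffected by the reduction.
already_reduced = [mean_loss]
assert reduce_loss(already_reduced, "mean") == reduce_loss(already_reduced, "sum")
```

The same relation holds for the outputs above: the first run has 100 per-sample losses, so 3.1445 × 100 ≈ 314.45, and the second has 10 × 10 × 10 = 1000 elements, so 3.1760 × 1000 ≈ 3176.0.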


# Complex Losses

In some complex situations, we may:
- have multiple values in our predictions / inputs (e.g. multi-style transfer).
- need to record multiple losses for debugging / logging purposes.

`carefree-learn` therefore allows your custom losses to:
- receive a `dict` of `Tensor`s.
- return a `dict` of `Tensor`s, in which the value of `LOSS_KEY` should indicate the final loss.

In [4]:
import torch.nn.functional as F

@cflearn.register_loss_module("my_vq_vae_loss", allow_duplicate=False)
class MyVQVAELoss(nn.Module):
    def forward(
        self,
        forward_results: Dict[str, Tensor],
        batch: Dict[str, Tensor],
    ) -> Dict[str, Tensor]:
        # reconstruction loss
        target = batch[LABEL_KEY]
        reconstruction = forward_results[PREDICTIONS_KEY]
        mse = F.mse_loss(reconstruction, target)
        # vq & commit loss
        z_e = forward_results["z_e"]
        z_q_g = forward_results["z_q_g"]
        vq_loss = F.mse_loss(z_q_g, z_e.detach())
        commit_loss = F.mse_loss(z_e, z_q_g.detach())
        # gather
        loss = mse + vq_loss + commit_loss
        return {"mse": mse, "commit": commit_loss, LOSS_KEY: loss}

Naming is important here: you should use `forward_results` and `batch` as the argument names to let `carefree-learn` know that you require the full data instead of a single `Tensor`.
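
One way a framework can make this kind of decision is to inspect the parameter names of `forward`. The sketch below illustrates that general technique and is an assumption, not `carefree-learn`'s actual code: `call_loss` is a hypothetical helper, and the `"predictions"` / `"labels"` keys are placeholders for the real constants in `cflearn.constants`.

```python
import inspect
from typing import Any, Callable, Dict

# Placeholder keys for this sketch; the real constants live in `cflearn.constants`.
PREDICTIONS = "predictions"
LABELS = "labels"


def call_loss(forward: Callable, forward_results: Dict[str, Any], batch: Dict[str, Any]) -> Any:
    """Pass full dicts or single values, depending on the parameter names."""
    names = list(inspect.signature(forward).parameters)
    first = forward_results if names[0] == "forward_results" else forward_results[PREDICTIONS]
    second = batch if names[1] == "batch" else batch[LABELS]
    return forward(first, second)


def simple(predictions: float, labels: float) -> float:
    return abs(predictions - labels)


def full(forward_results: Dict[str, float], batch: Dict[str, float]) -> float:
    return abs(forward_results[PREDICTIONS] - batch[LABELS])


def mixed(forward_results: Dict[str, float], target: float) -> float:
    return abs(forward_results[PREDICTIONS] - target)


fw = {PREDICTIONS: 1.5, "z_e": 0.0}
b = {LABELS: 1.0}
results = [call_loss(fn, fw, b) for fn in (simple, full, mixed)]
```

All three functions receive the same data and compute the same value; only the amount of context each one sees differs.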

You can also simplify your implementation with this design if you only require parts of the full data:

In [5]:
@cflearn.register_loss_module("my_vq_vae_loss2", allow_duplicate=False)
class MyVQVAELoss2(nn.Module):
    def forward(
        self,
        forward_results: Dict[str, Tensor],
        target: Tensor,
    ) -> Dict[str, Tensor]:
        # reconstruction loss
        reconstruction = forward_results[PREDICTIONS_KEY]
        mse = F.mse_loss(reconstruction, target)
        # vq & commit loss
        z_e = forward_results["z_e"]
        z_q_g = forward_results["z_q_g"]
        vq_loss = F.mse_loss(z_q_g, z_e.detach())
        commit_loss = F.mse_loss(z_e, z_q_g.detach())
        # gather
        loss = mse + vq_loss + commit_loss
        return {"mse": mse, "commit": commit_loss, LOSS_KEY: loss}

In some rare scenarios, we may even need the training `state` (e.g. current `step` / `epoch`) to calculate our losses. This is also accessible in `carefree-learn` by simply adding a `state` argument to the `forward` method:

In [6]:
from typing import Optional
from cflearn.protocol import TrainerState

@cflearn.register_loss_module("my_state_loss", allow_duplicate=False)
class MyStateLoss(nn.Module):
    def forward(
        self,
        forward_results: Dict[str, Tensor],
        target: Tensor,
        state: Optional[TrainerState] = None,
    ) -> Dict[str, Tensor]:
        # placeholder: do something with `state` here (it may be `None`)
        return {LOSS_KEY: torch.tensor(0.0)}

> Notice that `state` is not always available (for example, at the testing stage), so we always need to handle the `state is None` case.
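
A common use of `state` is to schedule a loss weight over the training steps. Here is a minimal sketch in plain Python; `FakeState`, its `step` field, `warmup_weight`, and `warmup_steps` are all hypothetical stand-ins, and the real `TrainerState` may expose different attributes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeState:
    """Hypothetical stand-in for a trainer state that tracks the current step."""

    step: int


def warmup_weight(state: Optional[FakeState], warmup_steps: int = 100) -> float:
    """Linearly ramp a weight from 0 to 1; fall back to 1 when no state is given."""
    if state is None:
        # e.g. at the testing stage: use the final weight
        return 1.0
    return min(1.0, state.step / warmup_steps)


weights = [warmup_weight(s) for s in (None, FakeState(0), FakeState(50), FakeState(200))]
print(weights)
```

Choosing the final weight as the fallback keeps the loss at test time comparable to the loss at the end of training; other fallbacks are equally possible.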

## Usages

In [7]:
target = torch.randn(4, 3, 32, 32)
reconstruction = target + 0.123
code = torch.randn(4, 32)
forward_results = {
    PREDICTIONS_KEY: reconstruction,
    "z_e": code,
    "z_q_g": code + 0.234,
}
batch = {
    LABEL_KEY: target,
}
my_vq_vae_loss = cflearn.api.make_loss("my_vq_vae_loss")
my_vq_vae_loss2 = cflearn.api.make_loss("my_vq_vae_loss2")
print(my_vq_vae_loss(forward_results, batch))
print(my_vq_vae_loss2(forward_results, batch))

{'mse': tensor(0.0151), 'commit': tensor(0.0548), 'loss': tensor(0.1246)}
{'mse': tensor(0.0151), 'commit': tensor(0.0548), 'loss': tensor(0.1246)}
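
The printed values can be verified by hand. Both inputs differ from their targets by a constant offset, so each MSE term is simply the squared offset. A short check in plain Python, using the offsets from the usage cell above:

```python
mse = 0.123 ** 2          # reconstruction = target + 0.123
vq_loss = 0.234 ** 2      # z_q_g = z_e + 0.234
commit_loss = 0.234 ** 2  # same difference, with `detach` on the other operand
loss = mse + vq_loss + commit_loss

print(round(mse, 4), round(commit_loss, 4), round(loss, 4))
```

Note that `vq_loss` contributes to the final loss but is not included in the returned `dict`, which is why `'loss'` is larger than `'mse'` plus `'commit'`.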


# Integration

After defining our own losses, we need to know how to integrate them into existing `carefree-learn` pipelines for training, testing and deploying. Basically, losses can be specified across various APIs with `loss_name` and `loss_config`. We will use `fit_ml` to demonstrate the core concepts, and the same recipes can be applied elsewhere.

## Preparations

In [8]:
x = np.random.random([100, 5])
y = np.random.random([100, 1])
common_kwargs = dict(
    x_train=x,
    y_train=y,
    x_valid=x,
    y_valid=y,
    core_name="linear",
    input_dim=5,
    output_dim=1,
    is_classification=False,
    fixed_steps=1,
)

## Single Loss

In this case, simply set `loss_name` to the name of our target loss; `loss_config` should hold the `kwargs` that will be passed to the loss's `__init__` method:

In [9]:
@cflearn.register_loss_module("my_foo_loss", allow_duplicate=False)
class MyFooLoss(nn.Module):
    def __init__(self, foo):
        super().__init__()
        self.foo = foo
        print(f"\n>>>>>> MyFooLoss.foo: {foo}\n")

    def forward(self, logits, labels):
        return self.foo * (logits - labels).abs()


m = cflearn.api.fit_ml(
    loss_name="my_foo_loss",
    loss_config=dict(foo=1.2345),
    **common_kwargs,
)


>>>>>> MyFooLoss.foo: 1.2345

Layer (type)                             Input Shape                             Output Shape    Trainable Param #
------------------------------------------------------------------------------------------------------------------------
MLModel                                                                                                           
  _                                                                                                               
    Linear                                   [-1, 5]                                  [-1, 1]                    6
      Linear                                 [-1, 5]                                  [-1, 1]                    6
Total params: 6
Trainable params: 6
Non-trainable params: 0
------------------------------------------------------------------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated

## Multi-Task Loss

In some cases, we might want to use multiple losses at the same time. Although defining a new loss is usually the recommended way to achieve this, `carefree-learn` also provides a `multi_task` 'hook' for convenience. The format of `loss_name` in this case should be:

```text
multi_task:{loss1},{loss2},...,{lossk}
```

For example:

In [10]:
m = cflearn.api.fit_ml(
    loss_name="multi_task:mae,mse",
    **common_kwargs,
)

Layer (type)                             Input Shape                             Output Shape    Trainable Param #
------------------------------------------------------------------------------------------------------------------------
MLModel                                                                                                           
  _                                                                                                               
    Linear                                   [-1, 5]                                  [-1, 1]                    6
      Linear                                 [-1, 5]                                  [-1, 1]                    6
Total params: 6
Trainable params: 6
Non-trainable params: 0
------------------------------------------------------------------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
--------

> When using the `multi_task` hook, all losses will be summed up to construct the final loss.
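
The naming format and the summation rule can be sketched as follows. This is a hypothetical illustration in plain Python: `parse_loss_name`, `LOSSES`, and `multi_task_loss` are not `carefree-learn` APIs, and the `mae` / `mse` entries are scalar stand-ins for the registered losses.

```python
from typing import Callable, Dict, List

PREFIX = "multi_task:"

# Scalar stand-ins for registered losses, for illustration only.
LOSSES: Dict[str, Callable[[float, float], float]] = {
    "mae": lambda prediction, label: abs(prediction - label),
    "mse": lambda prediction, label: (prediction - label) ** 2,
}


def parse_loss_name(loss_name: str) -> List[str]:
    """Split 'multi_task:a,b' into ['a', 'b']; a plain name yields a one-item list."""
    if loss_name.startswith(PREFIX):
        return loss_name[len(PREFIX):].split(",")
    return [loss_name]


def multi_task_loss(loss_name: str, prediction: float, label: float) -> float:
    """Sum the individual losses named in `loss_name`."""
    return sum(LOSSES[name](prediction, label) for name in parse_loss_name(loss_name))


names = parse_loss_name("multi_task:mae,mse")
total = multi_task_loss("multi_task:mae,mse", 3.0, 1.0)  # |3 - 1| + (3 - 1) ** 2
```

Because the terms are added without weights, losses on very different scales can dominate each other; that is one reason a dedicated custom loss is often the better choice.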

# Q&A

## Does the order of the `forward` arguments matter?

- Yes, it matters: we should always use `predictions` as the first argument, and `labels` (the targets) as the second. This is aligned with `pytorch`'s API, where losses are called as `loss(input, target)`.