# Metrics Recipes

This page shows how to define your own metrics. In `carefree-learn`, it is fairly easy to define various kinds of metrics (ML, CV, etc.) with a unified API.

> If you run a block containing `register_metric` calls more than once, `carefree-learn` will emit a warning saying " '...' has already been registered ", and your changes will have no effect. This is intentional, because normally we DO NOT want to register anything more than once.
> 
> However, when using interactive development tools (e.g. Jupyter Notebook), it is very common to modify an implementation several times. In this case, we can set `allow_duplicate=True` in the `register_*` functions to bypass this check. Of course, for safety, this should NEVER happen in production!

# Table of Contents

- [ML Metrics](#ML-Metrics)
- [CV Metrics](#CV-Metrics)
- [Integration](#Integration)
  - [Single Metric](#Single-Metric)
  - [Multiple Metrics](#Multiple-Metrics)
  - [Weighted Metrics](#Weighted-Metrics)
  - [Complex Metrics](#Complex-Metrics)

> You might also notice that:
> - All classes inherit from `MetricInterface`. This is not required, but it lets your IDE guide you through implementing the essential parts.
> - Each class name roughly matches its registered name. This is also not required, since `carefree-learn` only cares about the name that you pass to the `register_*` functions, and will not check the actual class name.

# Preparations

In [1]:
import torch
import cflearn

import numpy as np
import torch.nn as nn

from typing import Dict
from cftool.array import iou
from cftool.array import corr
from cftool.array import softmax

try:
    from sklearn import metrics
except ImportError:
    metrics = None

np.random.seed(142857)
torch.manual_seed(142857)

<torch._C.Generator at 0x2164caac350>

# ML Metrics

In [2]:
# typical classification metric
@cflearn.register_metric("my_binary_accuracy", allow_duplicate=False)
class MyBinaryAccuracy(cflearn.MetricInterface):
    def __init__(self, threshold: float = 0.0):
        super().__init__()
        self.threshold = threshold

    # True means that the larger this metric is, the better.
    @property
    def is_positive(self) -> bool:
        return True

    # logits : [N, 1]
    # labels : [N, 1]
    def forward(self, logits: np.ndarray, labels: np.ndarray) -> float:
        predictions = (logits > self.threshold).astype(int)
        return (predictions == labels).mean().item()

# typical regression metric
@cflearn.register_metric("my_l1", allow_duplicate=False)
class MyL1(cflearn.MetricInterface):
    # False means that the smaller this metric is, the better.
    @property
    def is_positive(self) -> bool:
        return False

    # predictions : [N, 1]
    # labels      : [N, 1]
    def forward(self, predictions: np.ndarray, labels: np.ndarray) -> float:
        return np.abs(predictions - labels).mean().item()
    
# special classification metric, which requires the whole dataset to evaluate.
@cflearn.register_metric("my_auc", allow_duplicate=False)
class MyAUC(cflearn.MetricInterface):
    def __init__(self) -> None:
        super().__init__()
        if metrics is None:
            print("`scikit-learn` needs to be installed for `AUC`")

    # True means that the larger this metric is, the better.
    @property
    def is_positive(self) -> bool:
        return True

    #   True means that this metric requires the entire dataset to evaluate.
    #   For AUC, this has to be True because, on imbalanced datasets, some batches
    # are very likely to contain only one class of labels, which
    # will crash the AUC calculation.
    #   Notice that this is only useful when this metric is integrated in `carefree-learn`'s 
    # pipeline, where `carefree-learn` will read this flag and decide whether to pass
    # the entire dataset to this metric or not.
    @property
    def requires_all(self) -> bool:
        return True

    # K >= 2
    # logits : [N, K]
    # labels : [N, 1]
    def forward(self, logits: np.ndarray, labels: np.ndarray) -> float:
        if metrics is None:
            return 0.0
        num_classes = logits.shape[1]  # K
        probabilities = softmax(logits)
        labels = labels.ravel()
        if num_classes == 2:
            return metrics.roc_auc_score(labels, probabilities[..., 1])
        return metrics.roc_auc_score(labels, probabilities, multi_class="ovr")

# special regression metric, which requires the whole dataset to evaluate.
@cflearn.register_metric("my_corr", allow_duplicate=False)
class MyCorrelation(cflearn.MetricInterface):
    # True means that the larger this metric is, the better.
    @property
    def is_positive(self) -> bool:
        return True

    #   True means that this metric requires the entire dataset to evaluate.
    #   For correlation, it is better to set it to True because correlation works better
    # with more data. But this is kind of a trade-off: requiring the full dataset can
    # indeed increase the accuracy of correlation estimation, but will also have much
    # greater impact on your RAM.
    @property
    def requires_all(self) -> bool:
        return True

    # predictions : [N, K]
    # labels      : [N, K]
    def forward(self, predictions: np.ndarray, labels: np.ndarray) -> float:
        return corr(predictions, labels, get_diagonal=True).mean().item()

## Usages

In [3]:
logits = np.random.random([100, 1]) - 0.5
labels = (logits > 0).astype(int)
my_binary_accuracy = cflearn.api.make_metric("my_binary_accuracy")
print(my_binary_accuracy.core.forward(logits, labels))

predictions = np.random.random([100, 1])
labels = predictions - 0.01
my_l1 = cflearn.api.make_metric("my_l1")
print(my_l1.core.forward(predictions, labels))

logits = np.random.random([100, 2])
labels = np.argmax(logits, axis=1, keepdims=True)
my_auc = cflearn.api.make_metric("my_auc")
print(my_auc.core.forward(logits, labels))
logits = np.random.random([100, 10])
labels = np.argmax(logits, axis=1, keepdims=True)
print(my_auc.core.forward(logits, labels))
logits[range(100), labels.ravel()] += 1.0
print(my_auc.core.forward(logits, labels))

predictions = np.random.random([100, 1])
labels = predictions - 0.01
my_corr = cflearn.api.make_metric("my_corr")
print(my_corr.core.forward(predictions, labels))
predictions = np.random.random([100, 10])
labels = predictions - 0.01
print(my_corr.core.forward(predictions, labels))

1.0
0.010000000000000007
1.0
0.9687798633063698
1.0
0.9999999999999996
1.0
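
To see why a metric like AUC needs `requires_all`, the sketch below computes AUC as the fraction of correctly ordered (positive, negative) score pairs, using NumPy only. `pairwise_auc` is a hypothetical helper written for this illustration (it is not how `scikit-learn` implements `roc_auc_score`), and the toy data is made up.

```python
import numpy as np


def pairwise_auc(scores: np.ndarray, labels: np.ndarray) -> float:
    # AUC as the probability that a positive sample is scored above a negative one.
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    if len(pos) == 0 or len(neg) == 0:
        # No (positive, negative) pair exists, so AUC is undefined for this batch.
        return float("nan")
    diff = pos[:, None] - neg[None, :]
    # Ties count as half a correctly ordered pair.
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


# Imbalanced toy dataset: only 2 positives out of 8 samples.
labels = np.array([0, 0, 0, 0, 0, 0, 1, 1])
scores = np.array([0.1, 0.2, 0.3, 0.4, 0.35, 0.6, 0.5, 0.9])

# Evaluate per batch (batch size 4), then on the full dataset.
batches = [slice(0, 4), slice(4, 8)]
per_batch = [pairwise_auc(scores[b], labels[b]) for b in batches]
full = pairwise_auc(scores, labels)
print(per_batch, full)
```

The first batch contains negatives only, so its AUC is undefined, and the second batch gives a different value from the full dataset. Averaging per-batch values would therefore be both fragile and biased here.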


# CV Metrics

> For **C**omputer **V**ision metrics, if the metric is image-based, we should never set its `requires_all` to `True`, because loading all of your images into RAM would be a disaster.

In [4]:
# Intersection over Union, only supports binary situations
@cflearn.register_metric("my_iou", allow_duplicate=True)
class MyIOU(cflearn.MetricInterface):
    # True means that the larger this metric is, the better.
    @property
    def is_positive(self) -> bool:
        return True
    
    # K ∈ {1, 2}
    # logits : [N, K, H, W]
    # labels : [N, 1, H, W]
    def forward(self, logits: np.ndarray, labels: np.ndarray) -> float:
        return iou(logits, labels).mean().item()

## Usages

In [5]:
logits = np.random.random([4, 2, 224, 224])
labels = (logits[:, [1]] > 0.5).astype(int)
concat = np.concatenate([1 - labels, labels], axis=1)
my_iou = cflearn.api.make_metric("my_iou")
print(my_iou.core.forward(logits, labels))
print(my_iou.core.forward(concat, labels))
print(my_iou.core.forward(concat * 5, labels))
print(my_iou.core.forward(concat * 10, labels))
print(my_iou.core.forward(concat * 50, labels))
print(my_iou.core.forward((labels - 0.5) * 2, labels))
print(my_iou.core.forward((labels - 0.5) * 10, labels))
print(my_iou.core.forward((labels - 0.5) * 20, labels))
print(my_iou.core.forward((labels - 0.5) * 100, labels))

0.38977155824353243
0.5762673693012536
0.9867113608015362
0.9999092642198215
1.0
0.5762673693012536
0.9867113608015362
0.9999092642198215
1.0
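
The printed values grow towards `1.0` as the logits are scaled up, which is the pattern a "soft" IoU computed on probabilities would produce. The sketch below is an assumption inferred from that trend, not taken from `cftool`'s source: `soft_binary_iou` and `sigmoid` are hypothetical helpers, and only the single-channel (`K = 1`) case is covered.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def soft_binary_iou(logits: np.ndarray, labels: np.ndarray) -> np.ndarray:
    # logits, labels : [N, 1, H, W]; returns one IoU value per sample, shape [N].
    probs = sigmoid(logits)
    axes = (1, 2, 3)
    intersection = (probs * labels).sum(axis=axes)
    union = (probs + labels - probs * labels).sum(axis=axes)
    return intersection / union


labels = np.zeros([1, 1, 4, 4])
labels[..., ::2] = 1.0  # exactly half of the pixels are positive

# Same construction as `(labels - 0.5) * scale` in the usage cell above.
results = {}
for scale in (2, 10, 20, 100):
    logits = (labels - 0.5) * scale
    results[scale] = soft_binary_iou(logits, labels).mean().item()
    print(scale, results[scale])
```

Under this assumption, perfectly "correct" logits still score below `1.0` until they are confident enough for the probabilities to saturate.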


# Integration

After defining our own metrics, we need to know how to integrate them into existing `carefree-learn` pipelines for training, testing and deploying. Basically, metrics can be specified across various APIs with `metric_names` and `metric_configs`. We will use `fit_ml` to demonstrate the core concepts, and the same recipes can be applied elsewhere.

## Preparations

In [6]:
x = np.random.random([100, 5])
y = np.random.randint(0, 2, [100, 1])
common_kwargs = dict(
    x_train=x,
    y_train=y,
    x_valid=x,
    y_valid=y,
    core_name="linear",
    input_dim=5,
    output_dim=1,
    is_classification=True,
    fixed_steps=1,
)

## Single Metric

In this case, `metric_names` should be a `str`, and `metric_configs` should be the `kwargs` that will be passed to your metric's `__init__` method:

In [7]:
@cflearn.register_metric("my_foo_metric", allow_duplicate=False)
class MyFooMetric(cflearn.MetricInterface):
    def __init__(self, foo):
        super().__init__()
        self.foo = foo
        print(f"\n>>>>>> MyFooMetric.foo: {foo}\n")

    @property
    def is_positive(self) -> bool:
        return True

    def forward(self, logits, labels) -> float:
        return self.foo


m = cflearn.api.fit_ml(
    metric_names="my_foo_metric",
    metric_configs=dict(foo=1.2345),
    **common_kwargs,
)


>>>>>> MyFooMetric.foo: 1.2345

Layer (type)                             Input Shape                             Output Shape    Trainable Param #
------------------------------------------------------------------------------------------------------------------------
MLModel                                                                                                           
  _                                                                                                               
    Linear                                   [-1, 5]                                  [-1, 1]                    6
      Linear                                 [-1, 5]                                  [-1, 1]                    6
Total params: 6
Trainable params: 6
Non-trainable params: 0
------------------------------------------------------------------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimat

> You might notice that `carefree-learn` prints out the metrics and the final score of the current model. The final score is simply the mean of all metrics, except that metrics with `is_positive=False` are negated during the calculation:

In [8]:
@cflearn.register_metric("my_foo_negative_metric", allow_duplicate=False)
class MyFooNegativeMetric(cflearn.MetricInterface):
    def __init__(self, foo):
        super().__init__()
        self.foo = foo
        print(f"\n>>>>>> MyFooNegativeMetric.foo: {foo}\n")

    @property
    def is_positive(self) -> bool:
        return False

    def forward(self, logits, labels) -> float:
        return self.foo


m = cflearn.api.fit_ml(
    metric_names="my_foo_negative_metric",
    metric_configs=dict(foo=1.2345),
    **common_kwargs,
)


>>>>>> MyFooNegativeMetric.foo: 1.2345

Layer (type)                             Input Shape                             Output Shape    Trainable Param #
------------------------------------------------------------------------------------------------------------------------
MLModel                                                                                                           
  _                                                                                                               
    Linear                                   [-1, 5]                                  [-1, 1]                    6
      Linear                                 [-1, 5]                                  [-1, 1]                    6
Total params: 6
Trainable params: 6
Non-trainable params: 0
------------------------------------------------------------------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00

> As shown above, the final score is now negative.

## Multiple Metrics

In this case, the `metric_names` should be a list of `str`, and the `metric_configs` should be a `dict`, where the keys are the names and the values will be passed to the corresponding `__init__` method:

In [9]:
@cflearn.register_metric("my_bar_metric", allow_duplicate=False)
class MyBarMetric(cflearn.MetricInterface):
    def __init__(self, bar):
        super().__init__()
        self.bar = bar
        print(f"\n>>>>>> MyBarMetric.bar: {bar}\n")

    @property
    def is_positive(self) -> bool:
        return True

    def forward(self, logits, labels) -> float:
        return self.bar


m = cflearn.api.fit_ml(
    metric_names=["my_foo_metric", "my_bar_metric"],
    metric_configs=dict(
        my_foo_metric=dict(foo=1.2345),
        my_bar_metric=dict(bar=2.3456),
    ),
    **common_kwargs,
)


>>>>>> MyFooMetric.foo: 1.2345


>>>>>> MyBarMetric.bar: 2.3456

Layer (type)                             Input Shape                             Output Shape    Trainable Param #
------------------------------------------------------------------------------------------------------------------------
MLModel                                                                                                           
  _                                                                                                               
    Linear                                   [-1, 5]                                  [-1, 1]                    6
      Linear                                 [-1, 5]                                  [-1, 1]                    6
Total params: 6
Trainable params: 6
Non-trainable params: 0
------------------------------------------------------------------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.
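
The two accepted shapes of `metric_names` / `metric_configs` (single metric vs. multiple metrics) can be summarized by normalizing both into one "name -> kwargs" mapping. `normalize_metric_settings` is a hypothetical helper written for this illustration only. It is not part of `carefree-learn`, and the library may handle these arguments differently internally.

```python
from typing import Any, Dict, List, Optional, Union


def normalize_metric_settings(
    metric_names: Union[str, List[str]],
    metric_configs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Dict[str, Any]]:
    configs = metric_configs or {}
    if isinstance(metric_names, str):
        # Single metric: `metric_configs` holds the kwargs directly.
        return {metric_names: dict(configs)}
    # Multiple metrics: `metric_configs` maps each name to its own kwargs.
    return {name: dict(configs.get(name, {})) for name in metric_names}


single = normalize_metric_settings("my_foo_metric", dict(foo=1.2345))
multiple = normalize_metric_settings(
    ["my_foo_metric", "my_bar_metric"],
    dict(my_foo_metric=dict(foo=1.2345), my_bar_metric=dict(bar=2.3456)),
)
print(single)
print(multiple)
```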

## Weighted Metrics

`carefree-learn` also supports weighted metrics. The formula under the hood is:

$$
\text{score}=\frac1{\sum_{i=1}^{k}w_i}\cdot\sum_{i=1}^{k}\hat m_i(\hat y, y)\cdot w_i
$$

where the default value of $w_i$ is $1$, and

$$
\hat m_i(\hat y, y)\triangleq I(\text m_i)\cdot \text{m}_i\text{.forward}(\hat y, y)
$$

where $\text m_i$ is the $i$th metric, and

$$
I(\text m_i)\triangleq
\begin{cases}
 1, & \text{if m}_i\text{.is_positive} \\
 -1, & \text{otherwise}
\end{cases}
$$
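
As a quick sanity check, the formula can be written out in plain Python. `final_score` is a hypothetical helper that mirrors the equations above term by term. It is a sketch, not `carefree-learn`'s own code, and the metric names and values are simply the ones used in this page.

```python
from typing import Dict, Optional


def final_score(
    values: Dict[str, float],
    is_positive: Dict[str, bool],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    weights = weights or {}
    total, weight_sum = 0.0, 0.0
    for name, value in values.items():
        w = weights.get(name, 1.0)  # default weight is 1
        sign = 1.0 if is_positive[name] else -1.0  # I(m_i)
        total += sign * value * w
        weight_sum += w
    return total / weight_sum


values = {"my_foo_metric": 1.2345, "my_bar_metric": 2.3456}
positive = {"my_foo_metric": True, "my_bar_metric": True}

# Without weights, the score is a plain mean.
unweighted = final_score(values, positive)
# With weights, each metric contributes proportionally to its weight.
weighted = final_score(values, positive, {"my_foo_metric": 0.123, "my_bar_metric": 0.234})
# A single metric with `is_positive=False` yields a negated score.
negative = final_score({"my_foo_negative_metric": 1.2345}, {"my_foo_negative_metric": False})
print(unweighted, weighted, negative)
```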

In order to use weighted metrics, we can specify `metric_weights`, where the keys are the metric names and the values are the weights:

In [10]:
m = cflearn.api.fit_ml(
    metric_names=["my_foo_metric", "my_bar_metric"],
    metric_configs=dict(
        my_foo_metric=dict(foo=1.2345),
        my_bar_metric=dict(bar=2.3456),
    ),
    metric_weights=dict(
        my_foo_metric=0.123,
        my_bar_metric=0.234,
    ),
    **common_kwargs,
)

print(">>> target score", (1.2345 * 0.123 + 2.3456 * 0.234) / (0.123 + 0.234))


>>>>>> MyFooMetric.foo: 1.2345


>>>>>> MyBarMetric.bar: 2.3456

Layer (type)                             Input Shape                             Output Shape    Trainable Param #
------------------------------------------------------------------------------------------------------------------------
MLModel                                                                                                           
  _                                                                                                               
    Linear                                   [-1, 5]                                  [-1, 1]                    6
      Linear                                 [-1, 5]                                  [-1, 1]                    6
Total params: 6
Trainable params: 6
Non-trainable params: 0
------------------------------------------------------------------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.

## Complex Metrics

In some complex situations, our inputs / labels may have multiple values (e.g. multi-task problems). For such cases, `carefree-learn` lets your custom metrics receive a `dict` of `np.ndarray`s:

In [11]:
@cflearn.register_metric("my_complex_metric", allow_duplicate=False)
class MyComplexMetric(cflearn.MetricInterface):
    @property
    def is_positive(self) -> bool:
        return True

    def forward(
        self,
        np_outputs: Dict[str, np.ndarray],
        np_batch: Dict[str, np.ndarray],
    ) -> float:
        print(f"\n>>> detected prediction keys : {list(np_outputs.keys())}")
        print(f">>> detected input      keys : {list(np_batch.keys())}\n")
        return 0.0

m = cflearn.api.fit_ml(
    train_others={"foo_input": np.random.random(x.shape)},
    valid_others={"bar_input": np.random.random(x.shape)},
    metric_names="my_complex_metric",
    **common_kwargs,
)

Layer (type)                             Input Shape                             Output Shape    Trainable Param #
------------------------------------------------------------------------------------------------------------------------
MLModel                                                                                                           
  _                                                                                                               
    Linear                                   [-1, 5]                                  [-1, 1]                    6
      Linear                                 [-1, 5]                                  [-1, 1]                    6
Total params: 6
Trainable params: 6
Non-trainable params: 0
------------------------------------------------------------------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
--------

> You might notice that only `bar_input` is available. This is expected, because `carefree-learn` uses the validation data to calculate the metrics.

It's important to name the first & second arguments **EXACTLY** `np_outputs` & `np_batch`, because this is how `carefree-learn` knows that you require the full data instead of a single `np.ndarray`.

You can also simplify your implementation with this design if you only require parts of the full data:

In [12]:
@cflearn.register_metric("my_complex_metric2", allow_duplicate=False)
class MyComplexMetric2(cflearn.MetricInterface):
    @property
    def is_positive(self) -> bool:
        return True

    def forward(
        self,
        predictions: np.ndarray,
        np_batch: Dict[str, np.ndarray],
    ) -> float:
        print(f"\n>>> detected predictions : {predictions.shape}")
        print(f">>> detected input keys  : {list(np_batch.keys())}\n")
        return 0.0

m = cflearn.api.fit_ml(
    train_others={"foo_input": np.random.random(x.shape)},
    valid_others={"bar_input": np.random.random(x.shape)},
    metric_names="my_complex_metric2",
    **common_kwargs,
)

Layer (type)                             Input Shape                             Output Shape    Trainable Param #
------------------------------------------------------------------------------------------------------------------------
MLModel                                                                                                           
  _                                                                                                               
    Linear                                   [-1, 5]                                  [-1, 1]                    6
      Linear                                 [-1, 5]                                  [-1, 1]                    6
Total params: 6
Trainable params: 6
Non-trainable params: 0
------------------------------------------------------------------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
--------

In some rare scenarios, we may even need the entire `DataLoader` to calculate our metrics. This is also possible in `carefree-learn` by simply adding a `loader` argument to the `forward` method:

In [13]:
@cflearn.register_metric("my_metric_with_loader", allow_duplicate=False)
class MyMetricWithLoader(cflearn.MetricInterface):
    @property
    def is_positive(self) -> bool:
        return True

    def forward(self, logits, labels, loader) -> float:
        print(f"\n>>> loader      : {loader}")
        print(f">>> loader.data : {loader.data}\n")
        return 0.0

m = cflearn.api.fit_ml(
    metric_names="my_metric_with_loader",
    **common_kwargs,
)

Layer (type)                             Input Shape                             Output Shape    Trainable Param #
------------------------------------------------------------------------------------------------------------------------
MLModel                                                                                                           
  _                                                                                                               
    Linear                                   [-1, 5]                                  [-1, 1]                    6
      Linear                                 [-1, 5]                                  [-1, 1]                    6
Total params: 6
Trainable params: 6
Non-trainable params: 0
------------------------------------------------------------------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
--------
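
The name-based conventions in this section (`np_outputs`, `np_batch`, `loader`) can be pictured as signature inspection. The sketch below is one plausible way to implement such dispatch. `call_forward` and the dictionary keys `"predictions"` / `"labels"` are assumptions made for illustration, and `carefree-learn` may do this differently.

```python
import inspect
from typing import Any, Callable, Dict


def call_forward(
    forward: Callable[..., Any],
    outputs: Dict[str, Any],
    batch: Dict[str, Any],
    loader: Any = None,
) -> Any:
    names = list(inspect.signature(forward).parameters)
    # Pass the full dict only when the argument uses the reserved name.
    first = outputs if names[0] == "np_outputs" else outputs["predictions"]
    second = batch if names[1] == "np_batch" else batch["labels"]
    args = [first, second]
    if "loader" in names:
        args.append(loader)
    return forward(*args)


def plain(logits, labels):
    return type(logits).__name__, type(labels).__name__


def full(np_outputs, np_batch):
    return sorted(np_outputs), sorted(np_batch)


def partial(predictions, np_batch):
    return type(predictions).__name__, sorted(np_batch)


def with_loader(logits, labels, loader):
    return loader


outputs = {"predictions": [0.1, 0.9]}
batch = {"labels": [0, 1], "bar_input": [0.5, 0.5]}

print(call_forward(plain, outputs, batch))
print(call_forward(full, outputs, batch))
print(call_forward(partial, outputs, batch))
print(call_forward(with_loader, outputs, batch, loader="my-loader"))
```

Dispatching on parameter names keeps simple metrics simple, at the cost of making those names part of the contract, which is why they have to be spelled exactly.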