  
<td>
    <a target="_blank" href="https://labelbox.com" ><img src="https://labelbox.com/blog/content/images/2021/02/logo-v4.svg" width=256/></a>
</td>

----

# Model Diagnostics - Custom Metrics Basics


* Measuring model quality is critical to building models efficiently. The metrics used to measure model quality should closely align with the business objectives for the model. Otherwise, slight changes in model quality, as they relate to these core objectives, are lost to noise. Custom metrics enable users to measure model quality in terms of their exact business goals. By incorporating custom metrics into workflows, users can:
    * Iterate faster
    * Measure and report on model quality
    * Understand marginal value of additional labels and modeling efforts


* For an end-to-end demo of diagnostics using custom metrics, check out this [notebook](custom_metrics_demo.ipynb).



## Environment Setup

Install dependencies

In [None]:
!pip install -q "labelbox[data]"

Import libraries

In [1]:
from labelbox.data.serialization import NDJsonConverter
from labelbox.data.annotation_types import (
    ScalarMetric, 
    Label, 
    ImageData, 
    Point, 
    Rectangle, 
    ObjectAnnotation,
    ClassificationAnnotation,
    ClassificationAnswer,
    Radio
)

## Custom Metrics
* Users can provide metrics at the following levels of granularity:
    1. data rows
    2. features
    3. subclasses
* Additionally, metrics can be given custom names to best describe what they are measuring.
    
* Limits and Behavior:
    * A data row cannot have more than 20 metrics
    * Metrics are upserted, so if a metric already exists, its value will be replaced
    * Metrics can have values in the range [0,100000]
* Currently `ScalarMetric`s and `ConfusionMatrixMetric`s are supported. 
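
The limits above could be checked locally before an upload. The sketch below is a plain-Python illustration only: `validate_data_row_metrics`, the constants, and the dictionary layout are hypothetical names chosen for this example, not part of the Labelbox SDK, and the platform may enforce the limits differently.

```python
MAX_METRICS_PER_DATA_ROW = 20      # limit stated in the list above
MIN_VALUE, MAX_VALUE = 0, 100000   # value range stated in the list above


def validate_data_row_metrics(metrics):
    """Return a list of problems found in one data row's metrics.

    `metrics` is a list of dicts with a "metric_name" and a scalar "value".
    This layout is an assumption made for the sketch.
    """
    problems = []
    if len(metrics) > MAX_METRICS_PER_DATA_ROW:
        problems.append(
            f"{len(metrics)} metrics exceeds the limit of {MAX_METRICS_PER_DATA_ROW}"
        )
    for metric in metrics:
        value = metric["value"]
        if not MIN_VALUE <= value <= MAX_VALUE:
            problems.append(
                f"{metric['metric_name']}={value} is outside [{MIN_VALUE}, {MAX_VALUE}]"
            )
    return problems


# An in-range metric produces no problems; a negative value is reported.
print(validate_data_row_metrics([{"metric_name": "iou", "value": 0.5}]))
print(validate_data_row_metrics([{"metric_name": "iou", "value": -1}]))
```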

### ScalarMetric
* A `ScalarMetric` is a metric with just a single scalar value.

In [None]:
from labelbox.data.annotation_types import (
    ScalarMetric, 
    ScalarMetricAggregation, 
    ConfusionMatrixMetric
)

In [3]:
data_row_metric = ScalarMetric(
    metric_name = "iou",
    value = 0.5
)

feature_metric = ScalarMetric(
    metric_name = "iou",
    feature_name = "cat",
    value = 0.5
)

subclass_metric = ScalarMetric(
    metric_name = "iou",
    feature_name = "cat",
    subclass_name = "orange",
    value = 0.5
)


### ConfusionMatrixMetric
- A `ConfusionMatrixMetric` contains 4 numbers: [True Positive, False Positive, True Negative, False Negative].
- Confidence is also supported as key-value pairs, where the confidence score is the key and the metric value is the value.
- In the user interface, these metrics are used to derive precision, recall, and F1 scores. They are uploaded as raw counts rather than as derived scores so that the front end can do this processing itself.
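
To make the derivation concrete, here is a minimal sketch of how precision, recall, and F1 follow from the four counts. It uses the standard textbook formulas. The function name and the choice to return 0.0 when a denominator is zero are assumptions made for this example, not a description of what the Labelbox UI does internally.

```python
def precision_recall_f1(confusion_matrix):
    """Derive (precision, recall, f1) from [TP, FP, TN, FN] counts."""
    tp, fp, tn, fn = confusion_matrix
    # Guard against empty denominators; returning 0.0 is a convention chosen here.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# 3 true positives, 1 false positive, 0 true negatives, 1 false negative
print(precision_recall_f1([3, 1, 0, 1]))
```

True negatives do not enter any of the three scores, which is why object detection metrics often leave that count at 0.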


In [4]:

# Data row level: no feature or subclass is given
data_row_metric = ConfusionMatrixMetric(
    metric_name = "50pct_iou",
    value = [1,0,1,0]
)

# Feature level: only the feature name is given
feature_metric = ConfusionMatrixMetric(
    metric_name = "50pct_iou",
    feature_name = "cat",
    value = [1,0,1,0]
)

# Subclass level: both the feature name and the subclass name are given
subclass_metric = ConfusionMatrixMetric(
    metric_name = "50pct_iou",
    feature_name = "cat",
    subclass_name = "orange",
    value = [1,0,1,0]
)


### Confidence
* Users can provide confidence scores along with metrics
* This enables them to explore their model performance without necessarily knowing the optimal thresholds for each class.
* Users can filter on confidence and value in the UI to perform powerful queries.
* The keys are confidence scores (each must be between 0 and 1). The values are either scalar metric values or, for confusion matrix metrics, [TP,FP,TN,FN] lists.
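
One way to produce such a confidence-keyed dictionary is to recompute the counts at several thresholds, keeping only the predictions whose score is at least the threshold. The sketch below is a simplified illustration: the helper name and its inputs are hypothetical, it assumes every correct prediction matches a distinct ground truth object, and it does not count true negatives.

```python
def confusion_matrix_by_confidence(scored_predictions, num_ground_truths, thresholds):
    """Build {threshold: [TP, FP, TN, FN]} for a list of scored predictions.

    `scored_predictions` holds (confidence, is_correct) pairs, where
    is_correct means the prediction matched a ground truth object.
    """
    result = {}
    for threshold in thresholds:
        # Keep only the predictions the model is confident enough about.
        kept = [ok for confidence, ok in scored_predictions if confidence >= threshold]
        tp = sum(kept)
        fp = len(kept) - tp
        fn = num_ground_truths - tp  # ground truths left without a match
        result[threshold] = [tp, fp, 0, fn]  # TN is not tracked in this sketch
    return result


scored = [(0.95, True), (0.6, True), (0.4, False)]
values = confusion_matrix_by_confidence(scored, num_ground_truths=2,
                                        thresholds=[0.1, 0.5, 0.9])
print(values)
```

A dictionary shaped like `values` is the kind of object passed as `value` in the cell that follows.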

In [5]:

confusion_matrix_metric_with_confidence = ConfusionMatrixMetric(
    metric_name = "confusion_matrix_50pct_iou",
    feature_name = "cat",  
    subclass_name = "orange",
    value = {0.1 : [1,0,1,0], 0.3 : [1,0,1,0], 0.5 : [1,0,1,0], 0.7 : [1,0,1,0], 0.9 : [1,0,1,0]}
)

scalar_metric_with_confidence = ScalarMetric(
    metric_name = "iou",
    value = {0.1 : 0.2, 0.3 : 0.25, 0.5 : 0.3, 0.7 : 0.4, 0.9: 0.3}
)



### Aggregations
* Aggregation is an optional field on the `ScalarMetric` object (by default it uses the arithmetic mean).
* Aggregations occur in two cases:
    1. When a user provides a feature or subclass level metric, Labelbox automatically aggregates all metrics with the same parent to create a value for that parent.
        * For example, if a user provides iou for cat and for dog, the data row level iou is the average of those two values.
        * The exception is when the data row level iou is set explicitly. In that case the aggregation does not take effect for that data row.
    2. When users create slices or want aggregate statistics on their models, the selected aggregation is applied.
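
The first case can be sketched in a few lines of plain Python. This is an illustration of the behavior described above, not the platform's implementation: the function name and the string constants are chosen for this example, and only the two aggregations that appear in this notebook are handled.

```python
def data_row_value(feature_values, aggregation="ARITHMETIC_MEAN", explicit_value=None):
    """Combine feature-level values into one data row level value.

    `feature_values` maps a feature name to its metric value. If the data
    row value was set explicitly, it wins and no aggregation is applied.
    """
    if explicit_value is not None:
        return explicit_value
    values = list(feature_values.values())
    if aggregation == "SUM":
        return sum(values)
    if aggregation == "ARITHMETIC_MEAN":
        return sum(values) / len(values)
    raise ValueError(f"aggregation not covered by this sketch: {aggregation}")


print(data_row_value({"cat": 0.25, "dog": 0.75}))                      # mean of the two
print(data_row_value({"cat": 3, "dog": 4}, aggregation="SUM"))         # total count
print(data_row_value({"cat": 0.25, "dog": 0.75}, explicit_value=0.9))  # explicit value wins
```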

In [6]:
"""
If the following metrics are uploaded then
in the web app, users will see:
true positives dog = 4
true positives cat = 3
true positives = 7
"""

# Use a separate variable per feature so that neither metric is overwritten
cat_metric = ScalarMetric(
    metric_name = "true_positives",
    feature_name = "cat",
    value = 3,
    aggregation = ScalarMetricAggregation.SUM
)

dog_metric = ScalarMetric(
    metric_name = "true_positives",
    feature_name = "dog",
    value = 4,
    aggregation = ScalarMetricAggregation.SUM
)


### Built-in Metrics:
* The SDK provides a set of built-in metric functions so that common metrics do not have to be computed by hand.
1. `confusion_matrix_metric()`
    * Computes a single confusion matrix metric for all the predictions and labels provided.
2. `miou_metric()`
    * Computes a single iou score for all predictions and labels provided.
3. `feature_confusion_matrix_metric()`
    * Computes a confusion matrix metric for each of the classes found in the predictions and labels.
4. `feature_miou_metric()`
    * Computes the iou score for each of the classes found in the predictions and labels.

------
* Note that all of these functions expect the prediction and ground truth annotations to correspond to the same data row. These functions should be called for each data row that you need metrics for.

In [None]:
from labelbox.data.metrics import (
    feature_miou_metric, 
    miou_metric, 
    confusion_matrix_metric, 
    feature_confusion_matrix_metric
)

In [7]:

predictions = [
    ObjectAnnotation(
        name="cat",
        value=Rectangle(start=Point(x=0, y=0),
                        end=Point(x=10, y=10))
    )
]
        
ground_truths = [
    ObjectAnnotation(
        name="cat",
        value=Rectangle(start=Point(x=0, y=0),
                        end=Point(x=8, y=8)))
]

In [8]:
print(feature_miou_metric(ground_truths, predictions))
print(miou_metric(ground_truths, predictions))
print(confusion_matrix_metric(ground_truths, predictions))
print(feature_confusion_matrix_metric(ground_truths, predictions))

[ScalarMetric(value=0.64, feature_name='cat', subclass_name=None, extra={}, metric_name='iou', aggregation=<ScalarMetricAggregation.ARITHMETIC_MEAN: 'ARITHMETIC_MEAN'>)]
[ScalarMetric(value=0.64, feature_name=None, subclass_name=None, extra={}, metric_name='iou', aggregation=<ScalarMetricAggregation.ARITHMETIC_MEAN: 'ARITHMETIC_MEAN'>)]
[ConfusionMatrixMetric(value=(1, 0, 0, 0), feature_name=None, subclass_name=None, extra={}, metric_name='50pct_iou', aggregation=<ConfusionMatrixAggregation.CONFUSION_MATRIX: 'CONFUSION_MATRIX'>)]
[ConfusionMatrixMetric(value=(1, 0, 0, 0), feature_name='cat', subclass_name=None, extra={}, metric_name='50pct_iou', aggregation=<ConfusionMatrixAggregation.CONFUSION_MATRIX: 'CONFUSION_MATRIX'>)]
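
The iou value of 0.64 printed above can be checked by hand. The sketch below computes intersection over union for two axis-aligned boxes in plain Python. `box_iou` is a helper written for this example and is not the SDK's implementation.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Width and height of the overlap, clamped at 0 when the boxes are disjoint.
    inter_w = max(0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0, min(ay1, by1) - max(ay0, by0))
    intersection = inter_w * inter_h
    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    union = area_a + area_b - intersection
    return intersection / union if union else 0.0


# Prediction spans (0,0)-(10,10), ground truth spans (0,0)-(8,8):
# intersection = 64, union = 100 + 64 - 64 = 100
print(box_iou((0, 0, 10, 10), (0, 0, 8, 8)))
```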


In [9]:
# Adjust the iou threshold used for the confusion matrix calculation.
# The boxes overlap with an iou of 0.64, so any threshold above that turns the
# prediction into a false positive and leaves the ground truth unmatched (a false negative).
print(feature_confusion_matrix_metric(ground_truths, predictions, iou = 0.9))

[ConfusionMatrixMetric(value=(0, 1, 0, 1), feature_name='cat', subclass_name=None, extra={}, metric_name='90pct_iou', aggregation=<ConfusionMatrixAggregation.CONFUSION_MATRIX: 'CONFUSION_MATRIX'>)]
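
The effect of the threshold can be illustrated for the simplest case of one prediction and one ground truth of the same class. This is a simplified sketch with a hypothetical helper. In particular, whether the SDK treats an iou exactly equal to the threshold as a match is an assumption here.

```python
def single_pair_confusion_matrix(iou, threshold=0.5):
    """Return (TP, FP, TN, FN) for one prediction and one ground truth."""
    if iou >= threshold:
        # The prediction matches the ground truth: one true positive.
        return (1, 0, 0, 0)
    # No match: the prediction is a false positive and the
    # ground truth object is missed, giving a false negative.
    return (0, 1, 0, 1)


print(single_pair_confusion_matrix(0.64))                 # default threshold of 0.5
print(single_pair_confusion_matrix(0.64, threshold=0.9))  # stricter threshold
```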


In [10]:
# Subclasses are included by default, so a prediction whose subclass differs
# from the ground truth does not count as a match.
predictions = [
    ObjectAnnotation(
        name="cat",
        value=Rectangle(start=Point(x=0, y=0),
                        end=Point(x=10, y=10)),
    classifications = [
        ClassificationAnnotation(
        name="height", value=Radio(answer=ClassificationAnswer(name="tall")))
    
    ])
]

ground_truths = [
    ObjectAnnotation(
        name="cat",
        value=Rectangle(start=Point(x=0, y=0),
                        end=Point(x=10, y=10)),
    classifications = [
        ClassificationAnnotation(
        name="height", value=Radio(answer=ClassificationAnswer(name="short")))
    ])
]
conf_matrix_metrics = feature_confusion_matrix_metric(ground_truths, predictions)
iou_metrics = feature_confusion_matrix_metric(ground_truths, predictions, include_subclasses = False)

In [11]:
print("Subclasses:", conf_matrix_metrics[0].value)
print("Excluding Subclasses:", iou_metrics[0].value)

Subclasses: (0, 1, 0, 1)
Excluding Subclasses: (1, 0, 0, 0)


### Uploading Custom Metrics
* Custom metrics are uploaded the same way as any MEA upload: the metrics must be serialized to NDJSON, which the converter functions make easy.
* First construct a metric annotation in one of two ways:
    1. Manually
    2. Using one of the provided functions `feature_miou_metric`, `miou_metric`, `confusion_matrix_metric`, `feature_confusion_matrix_metric`.
* Then add the metric annotation to a label (this step associates the metrics with a data row).
* Finally, convert to NDJSON and upload.
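
To show what the serialized form looks like, the sketch below builds one row by hand with the same keys that appear in the serialized output printed at the end of this section. It is illustrative only: `metric_to_row` is a hypothetical helper, and in practice `NDJsonConverter.serialize` produces these rows.

```python
import json
import uuid


def metric_to_row(data_row_id, metric_name, value, aggregation, feature_name=None):
    """Build one NDJSON row for a metric (keys mirror the printed output)."""
    row = {
        "uuid": str(uuid.uuid4()),       # unique id for this row
        "dataRow": {"id": data_row_id},  # associates the metric with a data row
        "metricName": metric_name,
        "metricValue": value,
        "aggregation": aggregation,
    }
    if feature_name is not None:
        row["featureName"] = feature_name
    return row


rows = [
    metric_to_row("cktiom8osh4210ytmevuk7lfh", "50pct_iou", [0, 1, 0, 1],
                  "CONFUSION_MATRIX", feature_name="cat"),
]
# NDJSON is one JSON document per line.
ndjson_text = "\n".join(json.dumps(row) for row in rows)
print(ndjson_text)
```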

In [12]:
import json
import uuid  # only needed for the commented-out upload call below

# Continuing with the last example:
metrics = [*conf_matrix_metrics, *iou_metrics]
labels = [Label(data = ImageData(uid = "cktiom8osh4210ytmevuk7lfh"), annotations = metrics)]
ndjson_predictions = list(NDJsonConverter.serialize(labels))
print(json.dumps(ndjson_predictions, indent = 2, sort_keys = True))
# We can upload these metrics along with other annotations
#model_run.add_predictions(f'diagnostics-import-{uuid.uuid4()}', ndjson_predictions)

[
  {
    "aggregation": "CONFUSION_MATRIX",
    "dataRow": {
      "id": "cktiom8osh4210ytmevuk7lfh"
    },
    "featureName": "cat",
    "metricName": "50pct_iou",
    "metricValue": [
      0,
      1,
      0,
      1
    ],
    "uuid": "f36e393d-e98a-498c-977a-181cde417921"
  },
  {
    "aggregation": "CONFUSION_MATRIX",
    "dataRow": {
      "id": "cktiom8osh4210ytmevuk7lfh"
    },
    "featureName": "cat",
    "metricName": "50pct_iou",
    "metricValue": [
      1,
      0,
      0,
      0
    ],
    "uuid": "c6f32f7b-9391-4ebe-8bf4-d3459ca1e65e"
  }
]
