# Custom Metrics

In {doc}`model_validation`, we described how to choose a built-in evaluation metric to guide model selection. In this tutorial, we show how to add your own custom metric to AutoGluon.

A metric measures a model's performance by comparing the true values with the predicted values. In the following example, we implement an accuracy metric from scratch.

In [12]:
import numpy as np

def my_accuracy(y_true, y_pred):
    return (y_true == y_pred).sum() / y_true.size

Verify its correctness with toy data.

In [13]:
y_true = np.array([0, 1, 0, 0])
y_pred = np.array([0, 1, 1, 0])

my_accuracy(y_true, y_pred)

0.75
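The function above relies on `y_true.size`, so it expects NumPy arrays (or objects with the same attribute). As a minimal sketch, assuming you may also want to accept plain Python lists, the hypothetical variant `my_accuracy_safe` below coerces its inputs first. The name and the shape check are illustrative additions, not part of AutoGluon.

```python
import numpy as np

def my_accuracy_safe(y_true, y_pred):
    # Hypothetical variant of my_accuracy: coerce inputs to flat arrays
    # so that lists and other array-likes are handled as well.
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same length")
    # The mean of a boolean array is the fraction of matching entries.
    return float((y_true == y_pred).mean())

score = my_accuracy_safe([0, 1, 0, 0], [0, 1, 1, 0])
print(score)
```

The rest of this tutorial keeps the simpler `my_accuracy`.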

Next, we need to wrap our metric in {class}`autogluon.core.metrics.Scorer`, AutoGluon's class for metrics. The easiest way to do this is through {func}`autogluon.core.metrics.make_scorer`, to which we pass four arguments:

- the string `name`, which appears in printed output
- the metric function (`score_func`), which accepts two arguments, `y_true` and `y_pred`, and returns a score
- the optimal value (`optimum`), attained when the prediction is perfect; it is 1.0 for accuracy and often 0.0 for a loss
- whether a larger returned value is better (`greater_is_better`); this is `True` for accuracy and `False` for a loss
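To make the roles of `optimum` and `greater_is_better` more concrete, here is a toy stand-in for a metric wrapper. `ToyScorer` is a hypothetical class written only for illustration and is not AutoGluon's `Scorer`. It assumes the common convention of reporting every metric so that higher is better, which means that the sign of a loss is flipped.

```python
class ToyScorer:
    """Hypothetical, simplified metric wrapper (not AutoGluon's Scorer)."""

    def __init__(self, name, score_func, optimum, greater_is_better):
        self.name = name
        self.score_func = score_func
        self.optimum = optimum
        # +1 keeps the raw value; -1 flips a loss so that higher is better.
        self.sign = 1 if greater_is_better else -1

    def __call__(self, y_true, y_pred):
        return self.sign * self.score_func(y_true, y_pred)

    def is_perfect(self, y_true, y_pred):
        # The raw metric value equals `optimum` when predictions are perfect.
        return self.score_func(y_true, y_pred) == self.optimum


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_abs_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

acc = ToyScorer('accuracy', accuracy, optimum=1.0, greater_is_better=True)
mae = ToyScorer('mean_absolute_error', mean_abs_error,
                optimum=0.0, greater_is_better=False)

acc_score = acc([0, 1, 0, 0], [0, 1, 1, 0])  # raw accuracy, unchanged
mae_score = mae([1.0, 2.0], [1.5, 2.5])      # raw loss with its sign flipped
print(acc_score, mae_score)
```

With this convention, a model with a higher reported score is always the better one, regardless of whether the underlying metric is a score or a loss.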

Note that we need to save our code into a `.py` file so that it can be pickled when saving models. Otherwise, you will see errors such as `Can't pickle <function...`. We use the `%%writefile` magic to save the following code into `my_accuracy_ag.py`.

In [23]:
%%writefile my_accuracy_ag.py
from autogluon.core.metrics import make_scorer

def my_accuracy(y_true, y_pred):
    return (y_true == y_pred).sum() / y_true.size

my_accuracy_ag = make_scorer(
    name='accuracy', score_func=my_accuracy,
    optimum=1, greater_is_better=True)

Writing my_accuracy_ag.py
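The pickling requirement comes from how Python's `pickle` module handles functions: it stores a reference (module name plus function name) rather than the code itself, so the function must be importable when the object is loaded. The sketch below illustrates this with the standard library only. `statistics.mean` is used merely as an example of a function that lives in an importable module, and the lambda stands in for a metric that has no importable name.

```python
import pickle
import statistics

# A function defined in an importable module round-trips through pickle,
# because only the reference "statistics.mean" is stored and re-imported.
restored = pickle.loads(pickle.dumps(statistics.mean))
same_object = restored is statistics.mean

# A lambda has no importable name, so pickle cannot store a reference to it.
try:
    pickle.dumps(lambda y_true, y_pred: 0.0)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False

print(same_object, lambda_pickled)
```

Placing the metric in `my_accuracy_ag.py` gives it such an importable name.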


To use our metric during training, we need to import it and pass it to `TabularPredictor` through the `eval_metric` argument before calling `fit`.

In [15]:
# Load the knot theory data
from autogluon.tabular import TabularDataset, TabularPredictor

url = 'https://raw.githubusercontent.com/mli/ag-docs/main/knot_theory/'
train_data = TabularDataset(url+'train.csv')
test_data = TabularDataset(url+'test.csv')
label = 'signature'

In [24]:
from my_accuracy_ag import my_accuracy_ag

predictor = TabularPredictor(
    label=label, eval_metric=my_accuracy_ag).fit(train_data)

No path specified. Models will be saved in: "AutogluonModels/ag-20220712_214440/"
Beginning AutoGluon training ...
AutoGluon will save models to "AutogluonModels/ag-20220712_214440/"
AutoGluon Version:  0.5.0
Python Version:     3.9.12
Operating System:   Linux
Train Data Rows:    10000
Train Data Columns: 18
Label Column: signature
Preprocessing data ...
AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == int, but few unique label-values observed).
	First 10 (of 13) unique label values:  [-2, 0, 2, -8, 4, -4, -6, 8, 6, 10]
	If 'multiclass' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Fraction of data from classes with at least 10 examples that will be kept for training models: 0.9984
Train Data Class Count: 9
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Ava

We can also use the metric to evaluate trained models. For example, we can pass it through the `extra_metrics` argument of the `leaderboard` method. The result contains a new column named after the `name` we specified in `make_scorer`, here `accuracy`.

In [29]:
predictor.leaderboard(test_data, extra_metrics=[my_accuracy_ag], 
                      silent=True).head()

| | model | score_test | accuracy | score_val | pred_time_test | pred_time_val | fit_time | pred_time_test_marginal | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | WeightedEnsemble_L2 | 0.95 | 0.95 | 0.964965 | 0.298955 | 0.050804 | 15.585609 | 0.005762 | 0.000497 | 0.57559 | 2 | True | 14 |
| 1 | LightGBM | 0.9456 | 0.9456 | 0.955956 | 0.072992 | 0.025439 | 4.202675 | 0.072992 | 0.025439 | 4.202675 | 1 | True | 5 |
| 2 | XGBoost | 0.9448 | 0.9448 | 0.956957 | 0.064141 | 0.025232 | 6.077209 | 0.064141 | 0.025232 | 6.077209 | 1 | True | 11 |
| 3 | LightGBMLarge | 0.9444 | 0.9444 | 0.94995 | 0.145704 | 0.02791 | 9.42897 | 0.145704 | 0.02791 | 9.42897 | 1 | True | 13 |
| 4 | CatBoost | 0.9432 | 0.9432 | 0.955956 | 0.022018 | 0.009241 | 18.190912 | 0.022018 | 0.009241 | 18.190912 | 1 | True | 8 |


Beyond implementing metrics from scratch, we can wrap metrics from other libraries. Here are examples of wrapping scikit-learn metrics. The first is the MSE loss for regression, whose optimal value is 0 and for which a smaller value is better.

In [32]:
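import sklearn.metrics  # import the submodule explicitly
from autogluon.core.metrics import make_scorer

# MSE is a loss: 0 is optimal and smaller values are better.
mse_ag = make_scorer(
    name='mean_squared_error', score_func=sklearn.metrics.mean_squared_error,
    optimum=0, greater_is_better=False)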
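For reference, the quantity being wrapped can be written in a few lines of NumPy. The function `my_mse` below is a hypothetical from-scratch version shown only to make the choice `optimum=0` concrete; it follows the same `(y_true, y_pred)` signature as `my_accuracy`.

```python
import numpy as np

def my_mse(y_true, y_pred):
    # Mean of the squared differences between targets and predictions.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mse_value = my_mse(y_true, y_pred)    # (0.25 + 0.25 + 0 + 1) / 4
perfect_value = my_mse(y_true, y_true)  # perfect predictions reach the optimum
print(mse_value, perfect_value)
```

Because the value can only grow as predictions get worse, it is a loss, which is why `greater_is_better=False` is the appropriate setting.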

Then we wrap the area under the ROC curve for binary classification. Since computing the curve requires sweeping over multiple classification thresholds, the metric needs continuous prediction scores rather than hard class labels, so we set `needs_threshold=True`.

In [None]:
roc_auc_ag = make_scorer(
    name='roc_auc', score_func=sklearn.metrics.roc_auc_score,
    optimum=1, greater_is_better=True, needs_threshold=True)
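To see why this metric needs continuous scores, recall that the area under the ROC curve equals the probability that a randomly chosen positive example is ranked above a randomly chosen negative one, with ties counted as one half. The sketch below uses a hypothetical helper, `pairwise_auc`, written in plain Python for illustration. It is not scikit-learn's implementation, and the toy data and the 0.5 cutoff are made up.

```python
def pairwise_auc(y_true, y_score):
    # Compare every positive example with every negative example.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.45, 0.8]                 # continuous prediction scores
labels = [1 if s >= 0.5 else 0 for s in scores]  # hard labels at a 0.5 cutoff

auc_from_scores = pairwise_auc(y_true, scores)
auc_from_labels = pairwise_auc(y_true, labels)
print(auc_from_scores, auc_from_labels)
```

The scores rank both positives above both negatives, while the thresholded labels discard part of that ranking, so the second value is lower.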