# Using a Custom Metric Function

In this notebook, we will show an example of how to calculate custom performance metrics on an H2O model. The notebook will go through the following steps:

1. Train a GBM model in H2O
2. Write a script to calculate Mean Absolute Percent Error (MAPE)
3. Train a GBM model in H2O using MAPE as a [`custom_metric_func`](https://github.com/h2oai/h2o-3/blob/master/h2o-docs/src/dev/custom_functions.md)
4. Train a grid of GBMs and choose a model based on MAPE


## 1. Train a GBM Model in H2O

In [1]:
# Only the modules used in this notebook are imported
import random

import h2o
import psutil

In [2]:
# Let H2O use about half of the currently available memory, in whole GB
pct_memory = 0.5
virtual_memory = psutil.virtual_memory()
min_mem_size = int(round(int(pct_memory * virtual_memory.available) / 1024 ** 3, 0))
print(min_mem_size)

3


In [3]:
# Pick a random port to avoid clashing with another instance (the highest valid port is 65535)
port_no = random.randint(5555, 55555)
h2o.init(strict_version_check=False, min_mem_size_GB=min_mem_size, port=port_no)  # start H2O

Checking whether there is an H2O instance running at http://localhost:7235..... not found.
Attempting to start a local H2O server...
  Java Version: openjdk version "1.8.0_121"; OpenJDK Runtime Environment (Zulu 8.20.0.5-macosx) (build 1.8.0_121-b15); OpenJDK 64-Bit Server VM (Zulu 8.20.0.5-macosx) (build 25.121-b15, mixed mode)
  Starting server from /Users/bear/anaconda/lib/python3.6/site-packages/h2o/backend/bin/h2o.jar
  Ice root: /var/folders/lh/42j8mfjx069d1bkc2wlf2pw40000gn/T/tmpptr4zi7o
  JVM stdout: /var/folders/lh/42j8mfjx069d1bkc2wlf2pw40000gn/T/tmpptr4zi7o/h2o_bear_started_from_python.out
  JVM stderr: /var/folders/lh/42j8mfjx069d1bkc2wlf2pw40000gn/T/tmpptr4zi7o/h2o_bear_started_from_python.err
  Server is running at http://127.0.0.1:7235
Connecting to H2O server at http://127.0.0.1:7235... successful.


| Property | Value |
|---|---|
| H2O cluster uptime | 01 secs |
| H2O cluster timezone | America/New_York |
| H2O data parsing timezone | UTC |
| H2O cluster version | 3.22.1.3 |
| H2O cluster version age | 14 days, 4 hours and 31 minutes |
| H2O cluster name | H2O_from_python_bear_xhvtoz |
| H2O cluster total nodes | 1 |
| H2O cluster free memory | 3.556 Gb |
| H2O cluster total cores | 8 |
| H2O cluster allowed cores | 8 |


In [4]:
# Import Data
train_path = "data/loan.csv"
train = h2o.import_file(train_path, destination_frame = "loan_train")

Parse progress: |█████████████████████████████████████████████████████████| 100%


In [5]:
# Set target and predictor variables
y = "int_rate"
x = train.col_names
x.remove(y)
x.remove("bad_loan")

In [6]:
# Train GBM Model
from h2o.estimators import H2OGradientBoostingEstimator

gbm_v1 = H2OGradientBoostingEstimator(model_id = "gbm_v1.hex")

gbm_v1.train(y = y, x = x, training_frame = train)

gbm Model Build progress: |███████████████████████████████████████████████| 100%


In [7]:
print(gbm_v1)

Model Details
H2OGradientBoostingEstimator :  Gradient Boosting Machine
Model Key:  gbm_v1.hex


ModelMetricsRegression: gbm
** Reported on train data. **

MSE: 10.889553004292022
RMSE: 3.2999322726825806
MAE: 2.6385683124779997
RMSLE: 0.23820098598547484
Mean Residual Deviance: 10.889553004292022
Scoring History: 


| timestamp | duration | number_of_trees | training_rmse | training_mae | training_deviance |
|---|---|---|---|---|---|
| 2019-02-09 00:20:50 | 0.023 sec | 0.0 | 4.3919265 | 3.5248550 | 19.2890183 |
| 2019-02-09 00:20:50 | 0.694 sec | 1.0 | 4.2431782 | 3.4077985 | 18.0045609 |
| 2019-02-09 00:20:50 | 0.892 sec | 2.0 | 4.1176423 | 3.3086766 | 16.9549780 |
| 2019-02-09 00:20:51 | 1.036 sec | 3.0 | 4.0115703 | 3.2245441 | 16.0926965 |
| 2019-02-09 00:20:51 | 1.225 sec | 4.0 | 3.9217182 | 3.1531439 | 15.3798737 |
| ... | ... | ... | ... | ... | ... |
| 2019-02-09 00:20:52 | 2.885 sec | 16.0 | 3.4798396 | 2.8014681 | 12.1092835 |
| 2019-02-09 00:20:53 | 3.038 sec | 17.0 | 3.4659556 | 2.7895055 | 12.0128484 |
| 2019-02-09 00:20:53 | 3.191 sec | 18.0 | 3.4536410 | 2.7790570 | 11.9276360 |



See the whole table with table.as_data_frame()
Variable Importances: 


| variable | relative_importance | scaled_importance | percentage |
|---|---|---|---|
| term | 3314836.2500000 | 1.0 | 0.4574651 |
| revol_util | 2051101.2500000 | 0.6187640 | 0.2830629 |
| purpose | 387989.8750000 | 0.1170465 | 0.0535447 |
| delinq_2yrs | 317014.2812500 | 0.0956350 | 0.0437497 |
| loan_amnt | 274672.9375000 | 0.0828617 | 0.0379063 |
| longest_credit_length | 191578.5468750 | 0.0577943 | 0.0264389 |
| verification_status | 174609.3906250 | 0.0526751 | 0.0240970 |
| home_ownership | 155358.8125000 | 0.0468677 | 0.0214403 |
| dti | 117325.0937500 | 0.0353939 | 0.0161915 |





## 2. Write Script to Calculate Mean Absolute Percent Error (MAPE)

### Function to Calculate MAPE in H2O
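
For reference, the quantity computed below is the standard (unweighted) mean absolute percent error. Writing $y_i$ for the actual value and $\hat{y}_i$ for the prediction on row $i$ of $n$ rows (notation introduced here for illustration):

$$
\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|
$$

The ratio is undefined for any row where $y_i = 0$, so the metric assumes a strictly nonzero target.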

In [8]:
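def MAPE(actual, predict):
    # actual and predict are single-column H2OFrames; the arithmetic is row-wise
    abs_pct_error = abs((actual - predict) / actual)
    # mean() returns one value per column, so take the first (and only) entry
    mape = abs_pct_error.mean()[0]
    return mape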

In [9]:
mape_v1 = MAPE(train[y], gbm_v1.predict(train))
print("MAPE: " + str(round(mape_v1, 4)))

gbm prediction progress: |████████████████████████████████████████████████| 100%
MAPE: 0.2195


### Python Script to calculate MAPE in custom_metric_func

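The MAPE metric is defined in a class stored in `utils_model_metrics.py`. The class has three methods: `map`, `reduce`, and `metric`. The `map` method takes five arguments (`predicted`, `actual`, `weight`, `offset`, and `model`) and returns, for a single row, the weighted absolute percent error together with the weight. The `reduce` method adds two such partial results element-wise, and `metric` divides the accumulated error by the accumulated weight to produce the final value.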

```
class MapeMetric:
    def map(self, predicted, actual, weight, offset, model):
        return [weight * abs((actual[0] - predicted[0]) / actual[0]), weight]

    def reduce(self, left, right):
        return [left[0] + right[0], left[1] + right[1]]

    def metric(self, last):
        return last[0] / last[1]
```
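
To make the `map` / `reduce` / `metric` flow concrete, here is a minimal local sketch that runs the same class over a few rows in plain Python. The driver `simulate_custom_metric` and the sample values are hypothetical and only illustrate the pattern; this is not how H2O schedules the calls internally.

```python
from functools import reduce


class MapeMetric:
    def map(self, predicted, actual, weight, offset, model):
        return [weight * abs((actual[0] - predicted[0]) / actual[0]), weight]

    def reduce(self, left, right):
        return [left[0] + right[0], left[1] + right[1]]

    def metric(self, last):
        return last[0] / last[1]


def simulate_custom_metric(metric, actuals, predictions, weights=None):
    """Hypothetical local driver: map each row, fold with reduce, finish with metric."""
    if weights is None:
        weights = [1.0] * len(actuals)
    # One partial result per row: [weighted abs pct error, weight]
    partials = [
        metric.map([p], [a], w, 0.0, None)
        for a, p, w in zip(actuals, predictions, weights)
    ]
    # Combine the partial results pairwise into a single [error sum, weight sum]
    total = reduce(metric.reduce, partials)
    return metric.metric(total)


# Illustrative values only, not taken from the loan data
actuals = [10.0, 20.0, 40.0]
predictions = [12.0, 18.0, 40.0]
local_mape = simulate_custom_metric(MapeMetric(), actuals, predictions)
print(round(local_mape, 4))
```

Because `reduce` only adds partial sums, the rows can be combined in any order or grouping, which is what lets the computation be split across chunks of data.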

This class definition is uploaded to the H2O cluster using [`h2o.upload_custom_metric`](http://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/h2o.html?highlight=custom_metric#h2o.upload_custom_metric).

In [10]:
from utils_model_metrics import MapeMetric

mape_func = h2o.upload_custom_metric(MapeMetric, func_name = "MAPE", func_file = "mape.py")

In [11]:
type(mape_func)

str

In [12]:
print(mape_func)

python:MAPE=mape.MapeMetricWrapper


## 3. Train a GBM Model using custom_metric_func to calculate MAPE

The [`H2OGeneralizedLinearEstimator`](http://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/modeling.html?highlight=automl#h2ogeneralizedlinearestimator),
[`H2ORandomForestEstimator`](http://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/modeling.html?highlight=automl#h2orandomforestestimator), and
[`H2OGradientBoostingEstimator`](http://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/modeling.html?highlight=automl#h2ogradientboostingestimator) models accept a `custom_metric_func` argument.

In [13]:
# Train GBM Model with custom_metric_function
gbm_v2 = H2OGradientBoostingEstimator(model_id = "gbm_v2.hex",
                                      custom_metric_func = mape_func)

gbm_v2.train(y = y, x = x, training_frame = train)

gbm Model Build progress: |███████████████████████████████████████████████| 100%


In [14]:
perf = gbm_v2.model_performance()
perf


ModelMetricsRegression: gbm
** Reported on train data. **

MSE: 10.889553004292022
RMSE: 3.2999322726825806
MAE: 2.6385683124779997
RMSLE: 0.23820098598547484
Mean Residual Deviance: 10.889553004292022
MAPE: 0.21950013462414486




In [15]:
perf.custom_metric_name()

'MAPE'

In [16]:
perf.custom_metric_value()

0.21950013462414486

We can see that our custom MAPE function appears in the model performance metrics, labeled `MAPE`. This value matches the MAPE we calculated for our original GBM model.

In [None]:
print("MAPE V1: " + str(round(mape_v1, 4)))
print("MAPE V2: " + str(round(gbm_v2.model_performance().custom_metric_value(), 4)))

MAPE V1: 0.2195
MAPE V2: 0.2195


## 4. Train a Grid of GBMs and choose model based on MAPE

In [None]:
from h2o.grid.grid_search import H2OGridSearch
gbm_hyper_parameters = {'max_depth': [7, 8, 9]}
gbm_grid = H2OGridSearch(H2OGradientBoostingEstimator(custom_metric_func = mape_func,
                                                      nfolds = 5),
                           gbm_hyper_parameters)
gbm_grid.train(x = x, y = y, training_frame = train, grid_id = "gbm_grid")

gbm Grid Build progress: |█████████████████████████████████████████

In [None]:
# Rank the grid models by cross-validated MAPE, lowest (best) first
sorted([[h2o.get_model(model_id).model_performance(xval = True).custom_metric_value(), model_id] for model_id in gbm_grid.model_ids])
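
The list above pairs each model's cross-validated MAPE with its ID, so the first entry after sorting is the preferred model. A minimal sketch of that selection step, run on placeholder pairs rather than real grid output (the values, the IDs, and the helper name `best_by_metric` are all hypothetical):

```python
def best_by_metric(scored_models):
    """Return the [metric_value, model_id] pair with the smallest metric value."""
    return min(scored_models, key=lambda pair: pair[0])


# Placeholder pairs standing in for [custom_metric_value, model_id]; not real results
scored = [
    [0.23, "grid_model_a"],
    [0.21, "grid_model_b"],
    [0.22, "grid_model_c"],
]

best_value, best_id = best_by_metric(scored)
print(best_id, best_value)
```

With a real grid, the chosen ID could then be passed to `h2o.get_model` to retrieve the corresponding model.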

## Shut Down the H2O Cluster

In [None]:
h2o.cluster().shutdown()