# Deploying an H2O model on Verta

Within Verta, a "Model" can be any arbitrary function: a traditional ML model (e.g., sklearn, PyTorch, TF, etc.); a function (e.g., squaring a number, making a DB call, etc.); or a mixture of the above (e.g., pre-processing code, a DB call, and then a model application). See more in the [Verta registry concepts documentation](https://docs.verta.ai/verta/registry/concepts).

This notebook provides an example of how to deploy an H2O model on Verta as a Verta Standard Model by extending [VertaModelBase](https://verta.readthedocs.io/en/master/_autogen/verta.registry.VertaModelBase.html?highlight=VertaModelBase#verta.registry.VertaModelBase).
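A Verta Standard Model is a class whose constructor receives a dict of artifact paths and whose `predict` method maps JSON-compatible input to JSON-compatible output. The sketch below reproduces that shape in plain Python so it runs without Verta installed. `SquareModel` is a hypothetical example based on the "squaring a number" case above. A real model would subclass `verta.registry.VertaModelBase` rather than `object`.

```python
from typing import Dict, List


class SquareModel:
    """Hypothetical model with the same shape as a VertaModelBase subclass.

    A real Verta Standard Model would inherit from verta.registry.VertaModelBase;
    a plain class is used here so the sketch runs without Verta installed.
    """

    def __init__(self, artifacts: Dict[str, str]):
        # Verta passes a dict mapping artifact keys to local file paths;
        # this toy model has nothing to load, so it only stores them.
        self.artifacts = artifacts

    def predict(self, model_input: List[float]) -> List[float]:
        # Input and output are plain JSON-compatible values.
        return [x * x for x in model_input]


model = SquareModel(artifacts={})
print(model.predict([1, 2, 3]))  # [1, 4, 9]
```

The H2O wrapper in section 2 has the same structure. Its constructor loads the serialized H2O model from `artifacts`, and its `predict` method runs inference.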

## 0. Imports

In [None]:
# Install the Verta client and H2O; restart the runtime if prompted (e.g., on Colab)
!python -m pip install verta h2o

In [None]:
import h2o
from h2o.estimators import H2OGradientBoostingEstimator

h2o.init()

### 0.1 Verta import and setup

In [None]:
import os

# Ensure your credentials are set; if they are not, uncomment and fill in the lines below
# os.environ['VERTA_EMAIL'] = ""
# os.environ['VERTA_DEV_KEY'] = ""
# os.environ['VERTA_HOST'] = ""

from verta import Client
client = Client(os.environ['VERTA_HOST'])
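If a `VERTA_*` variable is missing, `os.environ['VERTA_HOST']` fails with a bare `KeyError`. You could put a small pre-flight check at the top of the setup cell to report every missing variable at once. `missing_verta_credentials` is a hypothetical helper, not part of the Verta client. The three variable names are the ones used in the cell above.

```python
import os
from typing import List, Mapping, Optional

# The environment variables referenced in the setup cell
REQUIRED_VARS = ("VERTA_EMAIL", "VERTA_DEV_KEY", "VERTA_HOST")


def missing_verta_credentials(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required Verta variables that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_verta_credentials()
if missing:
    print("Set these before creating the Client:", ", ".join(missing))
```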

## 1. Model Training

### 1.1 Load training data

In [None]:
h2o_df = h2o.load_dataset("prostate.csv")
h2o_df["CAPSULE"] = h2o_df["CAPSULE"].asfactor()
h2o_df["GLEASON"] = h2o_df["GLEASON"].asfactor()

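### 1.2 Train and save the model

In [None]:
import time

# A deliberately tiny GBM (2 trees of depth 2) that regresses AGE on four predictors, to keep the demo fast
h2o_model = H2OGradientBoostingEstimator(
    distribution="gaussian",
    ntrees=2,
    max_depth=2,
    learn_rate=1,
)
predictors = ["CAPSULE", "RACE", "PSA", "GLEASON"]
response = "AGE"
# Hold out 25% of the rows; they are sent to the deployed endpoint in section 4.2
train, test = h2o_df.split_frame(ratios=[.75], seed=1234)
h2o_model.train(
    x=predictors,
    y=response,
    training_frame=train,
)

# Save into a unique directory; h2o.save_model returns the full path of the saved model file
MODEL_PATH = "h2o_model_file" + str(time.time())
saved_model_path = h2o.save_model(model=h2o_model, path=MODEL_PATH, force=True)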

## 2. Register Model for deployment

In [None]:
import os

import h2o
import pandas as pd

from verta.registry import VertaModelBase, verify_io

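class H2OModelWrapper(VertaModelBase):
    def __init__(self, artifacts):
        import h2o
        import jdk  # provided by install-jdk (see requirements); H2O runs on the JVM

        # Based on our testing, making predictions requires about this much memory to avoid "Out Of Memory" crashes
        h2o.init(max_mem_size="250M")
        # Load the saved model from the artifact path that Verta provides
        self.model = h2o.load_model(artifacts["serialized_model"])

    @verify_io
    def predict(self, model_input):
        # model_input is JSON: either {"column": [values, ...]} or a list of row dicts
        frame = h2o.H2OFrame(pd.DataFrame(model_input))
        predictions = self.model.predict(frame)
        # Convert the H2OFrame of predictions into a JSON-serializable list of records
        return predictions.as_data_frame().to_dict("records")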
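`@verify_io` is there so that `predict` only accepts and returns values that survive JSON serialization, which matches how requests reach the endpoint. The sketch below approximates that idea with a hypothetical `json_roundtrip` decorator built on the standard library. It is not Verta's implementation. It only shows why `predict` converts the H2O output into a list of plain dicts instead of returning an `H2OFrame`. `EchoRecords` is likewise a hypothetical toy model.

```python
import functools
import json
from typing import Any, Callable


def json_roundtrip(predict: Callable) -> Callable:
    """Hypothetical stand-in for the idea behind verify_io (not Verta's code).

    The input is passed through json.dumps/json.loads before predict sees it,
    and the output must also be JSON-serializable, as it would be over HTTP.
    """

    @functools.wraps(predict)
    def wrapper(self, model_input: Any) -> Any:
        model_input = json.loads(json.dumps(model_input))
        output = predict(self, model_input)
        # Raises TypeError if the output cannot be serialized (e.g. a set or a DataFrame)
        return json.loads(json.dumps(output))

    return wrapper


class EchoRecords:
    """Toy model that turns columnar input into a list of records."""

    @json_roundtrip
    def predict(self, model_input):
        columns = list(model_input)
        n_rows = len(model_input[columns[0]]) if columns else 0
        return [{c: model_input[c][i] for c in columns} for i in range(n_rows)]


print(EchoRecords().predict({"PSA": [51.9], "GLEASON": ["6"]}))
# [{'PSA': 51.9, 'GLEASON': '6'}]
```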

In [None]:
from verta.environment import Python

model_version = client.get_or_create_registered_model(name="h2o_model").create_standard_model(
    model_cls=H2OModelWrapper,
    environment=Python(requirements=['h2o', 'install-jdk==0.3.0']),
    artifacts={"serialized_model":saved_model_path}
)

## 3. Deploy model to endpoint

In [None]:
from verta.endpoint.resources import Resources

h2o_endpoint = client.get_or_create_endpoint(path="/h2o")
# Ensure the endpoint has enough memory to handle h2o's 250M requirement
resources = Resources(cpu=.25, memory="512Mi")
h2o_endpoint.update(model_version, resources=resources, wait=True)
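The `512Mi` request and the 250M H2O figure use different suffix conventions. The sketch below makes two assumptions. First, the endpoint memory string follows Kubernetes quantity rules (`Mi` = 2^20 bytes, `M` = 10^6 bytes). Second, H2O's `M` means 2^20 bytes. Under those assumptions, the hypothetical `k8s_memory_bytes` helper computes how much memory remains outside H2O's heap. That headroom matters because the JVM and the Python model server both need memory beyond the heap itself.

```python
import re

# Kubernetes-style memory suffixes: binary (powers of 1024) and decimal (powers of 1000)
BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def k8s_memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as "512Mi" or "1G" to bytes (simplified parser)."""
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", quantity)
    if match is None:
        raise ValueError(f"unrecognized quantity: {quantity!r}")
    number, suffix = int(match.group(1)), match.group(2)
    if suffix == "":
        return number
    if suffix in BINARY:
        return number * BINARY[suffix]
    if suffix in DECIMAL:
        return number * DECIMAL[suffix]
    raise ValueError(f"unsupported suffix: {suffix!r}")


MIB = 2**20
endpoint_bytes = k8s_memory_bytes("512Mi")
h2o_heap_bytes = 250 * MIB  # assumption: H2O's "250M" means 250 * 2**20 bytes
headroom_mib = (endpoint_bytes - h2o_heap_bytes) / MIB
print(f"Headroom outside the H2O heap: {headroom_mib:.0f} MiB")  # 262 MiB
```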

In [None]:
deployed_model = h2o_endpoint.get_deployed_model()

## 4. Predict

### 4.1 Quick-Predict with JSON

In [None]:
data = {"CAPSULE":["0"], "RACE":["2"], "PSA":[51.9], "GLEASON":["6"]}
deployed_model.predict(data)
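The quick-predict payload is column-oriented: each key maps to a list of values. The test-data loop instead sends a list of row records. Both shapes work with `H2OModelWrapper.predict` because `pd.DataFrame` accepts either. The hypothetical helpers below convert between the two shapes in plain Python, which can help when you assemble a multi-row request by hand.

```python
from typing import Any, Dict, List

Columns = Dict[str, List[Any]]
Records = List[Dict[str, Any]]


def columns_to_records(columns: Columns) -> Records:
    """Turn {"col": [v0, v1]} into [{"col": v0}, {"col": v1}]."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    n_rows = lengths.pop() if lengths else 0
    return [{name: values[i] for name, values in columns.items()} for i in range(n_rows)]


def records_to_columns(records: Records) -> Columns:
    """Turn a list of row dicts back into column lists (assumes every row has the same keys)."""
    if not records:
        return {}
    return {name: [row[name] for row in records] for name in records[0]}


data = {"CAPSULE": ["0"], "RACE": ["2"], "PSA": [51.9], "GLEASON": ["6"]}
rows = columns_to_records(data)
print(rows)  # [{'CAPSULE': '0', 'RACE': '2', 'PSA': 51.9, 'GLEASON': '6'}]
```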

### 4.2 (OR) Predict with Test Data

In [None]:
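# Convert the held-out rows to pandas once, keeping only the model's predictor columns
test_records = test[predictors].as_data_frame().to_dict("records")
for row in test_records:
    # Each request carries a single-record list
    prediction = deployed_model.predict([row])
    print(prediction)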
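The per-row loop sends one HTTP request per test row. `H2OModelWrapper.predict` builds a DataFrame from whatever list of records it receives, so you could send rows in batches instead. The hypothetical `batched` helper below splits records into fixed-size chunks. The batch size of 50 in the usage comment is an arbitrary assumption, and a suitable value depends on the endpoint's payload and memory limits.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Usage sketch against the deployed model (not run here):
# records = test[predictors].as_data_frame().to_dict("records")
# for batch in batched(records, 50):
#     print(deployed_model.predict(batch))

print([len(chunk) for chunk in batched(list(range(7)), 3)])  # [3, 3, 1]
```

Python 3.12 and later also provide `itertools.batched`, which yields tuples instead of lists.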

## 5. Cleanup

In [None]:
h2o.cluster().shutdown()
h2o_endpoint.delete()

---