# Deploying an H2O model on Verta

Within Verta, a "Model" can be any arbitrary function: a traditional ML model (e.g., sklearn, PyTorch, TF); a function (e.g., squaring a number, making a DB call); or a mixture of the above (e.g., pre-processing code, a DB call, and then a model application). See the [registry concepts documentation](https://docs.verta.ai/verta/registry/concepts) for more.

This notebook provides an example of how to deploy an H2O model on Verta as a Verta Standard Model by extending [VertaModelBase](https://verta.readthedocs.io/en/master/_autogen/verta.registry.VertaModelBase.html?highlight=VertaModelBase#verta.registry.VertaModelBase).
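As a minimal sketch of the "any arbitrary function" idea, the class below wraps a plain squaring function in the shape used later in this notebook: an `__init__` that takes `artifacts` and a `predict` method. The name `SquareModel` is hypothetical, and the fallback to `object` is an assumption added only so the snippet also runs where the `verta` package is not installed.

```python
try:
    from verta.registry import VertaModelBase
except ImportError:
    # Assumption: stand-in base class so the sketch runs without verta installed.
    VertaModelBase = object


class SquareModel(VertaModelBase):
    """Hypothetical model that squares a number; it needs no trained artifacts."""

    def __init__(self, artifacts=None):
        # Nothing to load: this "model" is a pure function.
        pass

    def predict(self, model_input):
        return model_input ** 2


model = SquareModel(artifacts=None)
print(model.predict(3))  # 9
```

The H2O wrapper in section 2 follows the same pattern, except that its `__init__` loads a serialized model from `artifacts`.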

## 0. Imports

In [4]:
# restart your notebook if prompted on Colab
!python -m pip install verta

In [16]:
import h2o
from h2o.estimators import H2OGradientBoostingEstimator

h2o.init()

Checking whether there is an H2O instance running at http://localhost:54321 ..... not found.
Attempting to start a local H2O server...
  Java Version: openjdk version "1.8.0_282"; OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_282-b08); OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.282-b08, mixed mode)
  Starting server from /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/h2o/backend/bin/h2o.jar
  Ice root: /var/folders/f7/r9486pbd77j4bppmmcpnxw6c0000gp/T/tmpmbqz88f0
  JVM stdout: /var/folders/f7/r9486pbd77j4bppmmcpnxw6c0000gp/T/tmpmbqz88f0/h2o_hmacdonald_started_from_python.out
  JVM stderr: /var/folders/f7/r9486pbd77j4bppmmcpnxw6c0000gp/T/tmpmbqz88f0/h2o_hmacdonald_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.


H2O_cluster_uptime:         03 secs
H2O_cluster_timezone:       America/Los_Angeles
H2O_data_parsing_timezone:  UTC
H2O_cluster_version:        3.36.1.4
H2O_cluster_version_age:    1 month and 23 days
H2O_cluster_name:           H2O_from_python_hmacdonald_ofeofj
H2O_cluster_total_nodes:    1
H2O_cluster_free_memory:    1.778 Gb
H2O_cluster_total_cores:    8
H2O_cluster_allowed_cores:  8


### 0.1 Verta import and setup

In [19]:
import os

# Make sure your credentials are set in the environment; if they are not, set them here.
# Do not commit a real dev key to a shared notebook.
# os.environ['VERTA_EMAIL'] = ""
# os.environ['VERTA_DEV_KEY'] = ""
os.environ['VERTA_HOST'] = "staging.dev.verta.ai"

from verta import Client
client = Client(os.environ['VERTA_HOST'])

got VERTA_EMAIL from environment
got VERTA_DEV_KEY from environment
connection successfully established
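
Before constructing the client, it can help to confirm that the environment variables are actually set. The helper below is a small sketch using only the standard library. The function name `missing_verta_settings` is hypothetical, and the variable names are simply the three referenced in the cell above.

```python
import os

# The three variables referenced in the setup cell.
REQUIRED_VARS = ("VERTA_EMAIL", "VERTA_DEV_KEY", "VERTA_HOST")


def missing_verta_settings(environ=None):
    """Return the names of the variables above that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_verta_settings()
if missing:
    print("Set these before creating the Client:", ", ".join(missing))
```

Passing a plain dict instead of `os.environ` makes the check easy to try out without touching the real environment.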


## 1. Model Training

### 1.1 Load training data

In [20]:
h2o_df = h2o.load_dataset("prostate.csv")
h2o_df["CAPSULE"] = h2o_df["CAPSULE"].asfactor()
h2o_df["GLEASON"] = h2o_df["GLEASON"].asfactor()

Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%


In [21]:
import time

h2o_model = H2OGradientBoostingEstimator(
    distribution="gaussian",
    ntrees=2,
    max_depth=2,
    learn_rate=1,
)
predictors = ["CAPSULE", "RACE", "PSA", "GLEASON"]
response = "AGE"
train, test = h2o_df.split_frame(ratios=[.75], seed=1234)
h2o_model.train(
    x=predictors,
    y=response,
    training_frame=train,
)

# Save into a timestamped directory; h2o.save_model returns the path of the saved model file
MODEL_PATH = "h2o_model_file" + str(time.time())
saved_model_path = h2o.save_model(model=h2o_model, path=MODEL_PATH, force=True)

gbm Model Build progress: |██████████████████████████████████████████████████████| (done) 100%


## 2. Register Model for deployment

In [8]:
import h2o
import pandas as pd

from verta.registry import VertaModelBase, verify_io

class H2OModelWrapper(VertaModelBase):
    def __init__(self, artifacts):
        # Start (or connect to) an H2O instance where the model is served,
        # then load the model from the path supplied for the artifact.
        import h2o
        h2o.init()
        self.model = h2o.load_model(artifacts["serialized_model"])

    @verify_io
    def predict(self, model_input):
        # Convert the payload to an H2OFrame, score it, and return
        # one dict per row.
        frame = h2o.H2OFrame(pd.DataFrame(model_input))
        predictions = self.model.predict(frame)
        return predictions.as_data_frame().to_dict("records")
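
The `verify_io` decorator is applied to `predict` so that its inputs and outputs are checked against the types a deployed model can exchange. The snippet below is not Verta's implementation. It is a rough local stand-in, assuming the check is roughly equivalent to JSON-serializability, that can be used to sanity-check a payload before deploying. `is_json_serializable` is a hypothetical helper.

```python
import json


def is_json_serializable(value):
    """Return True if `value` can be encoded with the standard json module."""
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        return False
    return True


# A column-oriented payload of lists, strings, and floats encodes cleanly.
payload = {"CAPSULE": ["0"], "RACE": ["2"], "PSA": [51.9], "GLEASON": ["6"]}
print(is_json_serializable(payload))

# A set is not a JSON type, so this payload would be rejected.
print(is_json_serializable({"PSA": {51.9}}))
```

This is also why the wrapper returns `to_dict("records")` rather than a DataFrame: a list of dicts is made of plain Python containers.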

In [9]:
from verta.environment import Python

model_version = client.get_or_create_registered_model(name="h2o_model").create_standard_model(
    model_cls=H2OModelWrapper,
    environment=Python(requirements=['h2o']),
    artifacts={"serialized_model":saved_model_path}
)

created new RegisteredModel: h2o_model in workspace: OctReleaseTestingOrg
created new ModelVersion: ModelVersion 149081664315034294484
uploading serialized_model to Registry
uploading part 1
upload complete
uploading model to Registry
uploading part 1
upload complete
uploading model_api.json to Registry
uploading part 1
upload complete
uploading custom_modules to Registry
uploading part 1
upload complete


## 3. Deploy model to endpoint

In [None]:
from verta.endpoint.resources import Resources

h2o_endpoint = client.get_or_create_endpoint(path="/h2o")
# H2O has high memory requirements, so make sure the endpoint has enough memory.
# You may need to increase the memory value below.
resources = Resources(cpu=.25, memory="512Mi")
h2o_endpoint.update(model_version, resources=resources, wait=True)

waiting for update.......
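
When tuning the `memory` value, it can be useful to see how much memory a string such as `"512Mi"` represents. The sketch below assumes the string uses Kubernetes-style binary suffixes (`Ki`, `Mi`, `Gi`, `Ti`). The helper `memory_to_bytes` is hypothetical and is not part of the Verta client.

```python
# Binary (power-of-two) suffixes, assuming Kubernetes-style quantities.
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def memory_to_bytes(quantity):
    """Convert a quantity such as "512Mi" to a byte count."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    # No recognized suffix: treat the string as a plain byte count.
    return int(quantity)


print(memory_to_bytes("512Mi"))  # 536870912
print(memory_to_bytes("2Gi"))    # 2147483648
```

For comparison, the local H2O cluster started earlier in this notebook reported 1.778 Gb of free memory, which is considerably more than 512Mi.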

In [11]:
deployed_model = h2o_endpoint.get_deployed_model()

## 4. Predict

### 4.1 Quick-Predict with JSON

In [23]:
data = {"CAPSULE":["0"], "RACE":["2"], "PSA":[51.9], "GLEASON":["6"]}
deployed_model.predict(data)

[{'predict': 64.6670710707533}]

### 4.2 (OR) Predict with Test Data

In [22]:
for i in range(1):
    row = test[i, :].as_data_frame().to_dict("records")
    print(row)
    prediction = deployed_model.predict(row)
    print(prediction)

[{'ID': 4, 'CAPSULE': 0, 'AGE': 76, 'RACE': 2, 'DPROS': 2, 'DCAPS': 1, 'PSA': 51.2, 'VOL': 20, 'GLEASON': 7}]
[{'predict': 64.32512482651349}]
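
Sections 4.1 and 4.2 send two different payload shapes: a dict of column lists and a list of row records. Both work because the wrapper passes `model_input` to `pd.DataFrame`, which accepts either layout. The sketch below relates the two shapes using only the standard library. `columns_to_records` is a hypothetical helper, not part of Verta or H2O.

```python
def columns_to_records(columns):
    """Turn {"col": [v1, v2, ...]} into [{"col": v1}, {"col": v2}, ...]."""
    names = list(columns)
    rows = zip(*(columns[name] for name in names))
    return [dict(zip(names, row)) for row in rows]


column_payload = {"CAPSULE": ["0"], "RACE": ["2"], "PSA": [51.9], "GLEASON": ["6"]}
print(columns_to_records(column_payload))
# [{'CAPSULE': '0', 'RACE': '2', 'PSA': 51.9, 'GLEASON': '6'}]
```

Either shape can be sent to the endpoint. The record form is convenient when rows come from `as_data_frame().to_dict("records")`, as in section 4.2.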


## 5. Cleanup

In [14]:
h2o.cluster().shutdown()
# h2o_endpoint.delete()

H2O session _sid_b4e5 closed.


---