# Deploying a scikit-learn model on Verta

Within Verta, a "Model" can be any arbitrary function: a traditional ML model (e.g., sklearn, PyTorch, TF); a function (e.g., squaring a number, making a DB call); or a mixture of the above (e.g., pre-processing code, a DB call, and then a model application). See the [registry concepts documentation](https://docs.verta.ai/verta/registry/concepts) for more.

This notebook shows how to deploy a scikit-learn model on Verta as a Verta Standard Model, either via convenience functions or by extending [VertaModelBase](https://verta.readthedocs.io/en/master/_autogen/verta.registry.VertaModelBase.html?highlight=VertaModelBase#verta.registry.VertaModelBase).
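To make the "any arbitrary function" idea concrete, here is a minimal sketch of the "squaring a number" case. `SquareModel` is a hypothetical, plain-Python stand-in that only mirrors the `__init__(self, artifacts)` / `predict` shape used later in this notebook; it does not import Verta, and on Verta the class would extend `VertaModelBase`.

```python
class SquareModel:
    """Plain-Python stand-in; on Verta this would extend VertaModelBase."""

    def __init__(self, artifacts=None):
        # A pure function needs no serialized artifacts.
        self.artifacts = artifacts or {}

    def predict(self, batch_input):
        # Square every number in the batch.
        return [x * x for x in batch_input]


print(SquareModel().predict([1, 2, 3]))  # [1, 4, 9]
```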

## 0. Imports

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

### 0.1 Verta import and setup

In [2]:
# restart your notebook if prompted on Colab
try:
    import verta
except ImportError:
    !pip install verta

In [3]:
import os

# Ensure credentials are set up; if they are not, set them below
# os.environ['VERTA_EMAIL'] = ''
# os.environ['VERTA_DEV_KEY'] = ''
# os.environ['VERTA_HOST'] = ''

from verta import Client
client = Client(os.environ['VERTA_HOST'])
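Before constructing the client, it can help to confirm that the three environment variables named in the cell above are actually set. The helper below is a minimal sketch; `missing_verta_env` and `REQUIRED_VARS` are hypothetical names introduced here, not part of the Verta client.

```python
import os

# Variable names taken from the commented-out lines in the cell above.
REQUIRED_VARS = ("VERTA_EMAIL", "VERTA_DEV_KEY", "VERTA_HOST")


def missing_verta_env(environ=None):
    """Return the required variable names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_verta_env()
if missing:
    print("Set these before creating the Client:", ", ".join(missing))
```

Running this first turns a bare `KeyError` on `os.environ['VERTA_HOST']` into a message that names every missing variable.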

## 1. Model Training

### 1.1 Load training data

In [4]:
# Load data
melbourne_file_path = 'melb_data.csv'
melbourne_data = pd.read_csv(melbourne_file_path) 

# Filter rows with missing values
melbourne_data = melbourne_data.dropna(axis=0)

# Choose target and features
y = melbourne_data.Price
melbourne_features = ['Rooms', 'Bathroom', 'Landsize', 'BuildingArea', 'Distance',
                      'YearBuilt', 'Car', 'Propertycount']
X = melbourne_data[melbourne_features]

# Split the data into training and validation sets, for both features and target.
# The split is based on a random number generator. Supplying a numeric value to
# the random_state argument guarantees we get the same split every time we
# run this cell.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
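The effect of a fixed seed can be illustrated without scikit-learn. The sketch below is a simplified, standard-library-only stand-in and not the algorithm `train_test_split` uses; `seeded_split` is a hypothetical helper. The 0.25 test fraction matches scikit-learn's default `test_size`.

```python
import random


def seeded_split(rows, test_fraction=0.25, seed=0):
    """Shuffle indices with a seeded RNG and hold out a fraction as test data."""
    rng = random.Random(seed)          # same seed -> same shuffle order
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = int(round(len(rows) * test_fraction))
    test_idx = set(indices[:n_test])
    train = [r for i, r in enumerate(rows) if i not in test_idx]
    test = [r for i, r in enumerate(rows) if i in test_idx]
    return train, test


first = seeded_split(list(range(20)), seed=0)
second = seeded_split(list(range(20)), seed=0)
print(first == second)  # True: the same seed reproduces the same split
```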

### 1.2 Train/test code

In [5]:
forest_model = RandomForestRegressor(random_state=1)
forest_model.fit(X_train, y_train)
melb_preds = forest_model.predict(X_test)
print(mean_absolute_error(y_test, melb_preds))
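The printed number is the mean absolute error: the average of |actual - predicted| over the validation rows, in the same units as `Price`. The sketch below computes the same quantity by hand on made-up prices; `mean_abs_error` is a hypothetical helper, and the numbers are illustrative only, not taken from the Melbourne dataset.

```python
def mean_abs_error(y_true, y_pred):
    """Average absolute difference between paired actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up prices, for illustration only.
actual = [500_000, 750_000, 1_000_000]
predicted = [520_000, 700_000, 1_000_000]

# Absolute errors are 20,000, 50,000 and 0, so the mean is 70,000 / 3.
print(mean_abs_error(actual, predicted))
```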

## 2. Register Model for deployment

In [6]:
import cloudpickle
with open("model.pkl", "wb") as f:
    cloudpickle.dump(forest_model, f)

In [7]:
from verta.registry import VertaModelBase

class HousingPriceRegressor(VertaModelBase):
    def __init__(self, artifacts):
        # Use a context manager so the file handle is closed after loading
        with open(artifacts["serialized_model"], "rb") as f:
            self.model = cloudpickle.load(f)
        
    def predict(self, batch_input):
        results = []
        for one_input in batch_input:
            results.append(self.model.predict(one_input))
        return results

In [8]:
artifacts_dict = {"serialized_model" : "model.pkl"}
clf = HousingPriceRegressor(artifacts_dict)
clf.predict([X_test.values.tolist()[:5]])
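Note the extra pair of brackets in the call above: `predict` loops over `batch_input` and passes each element to the underlying model, which itself expects a list of rows. The sketch below reproduces that loop with a stand-in model so the nesting is easy to see. `SumModel` and `BatchWrapper` are hypothetical names used only for illustration and do not depend on Verta or scikit-learn.

```python
class SumModel:
    """Hypothetical stand-in: 'predicts' the sum of each row's features."""

    def predict(self, rows):
        return [sum(row) for row in rows]


class BatchWrapper:
    """Mirrors the loop in HousingPriceRegressor.predict, without Verta."""

    def __init__(self, model):
        self.model = model

    def predict(self, batch_input):
        results = []
        for one_input in batch_input:
            # Each element of the batch is itself a list of rows.
            results.append(self.model.predict(one_input))
        return results


rows = [[1, 2], [3, 4], [5, 6]]
wrapper = BatchWrapper(SumModel())

one_group = wrapper.predict([rows])                  # one element holding three rows
two_groups = wrapper.predict([rows[:1], rows[1:]])   # two elements: one row, then two rows

print(one_group)   # [[3, 7, 11]]
print(two_groups)  # [[3], [7, 11]]
```

The output keeps the grouping of the input: one list of predictions per batch element.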

In [9]:
from verta import Client
client = Client(os.environ['VERTA_HOST'])
registered_model = client.get_or_create_registered_model(
    name="melbourne-housing-data")

In [10]:
from verta.environment import Python
from verta.utils import ModelAPI

model_version = registered_model.create_standard_model(
    model_cls=HousingPriceRegressor,
    environment=Python(requirements=["scikit-learn"]),
    artifacts=artifacts_dict,
    name="v0",
    model_api=ModelAPI(X_train, y_train)
)

## 3. Deploy model to endpoint

In [11]:
endpoint = client.get_or_create_endpoint("melbourne-housing-data")
endpoint.update(model_version, wait=True)

In [12]:
deployed_model = endpoint.get_deployed_model()
deployed_model.predict([X_train.values.tolist()[:1]])