# Deploying a scikit-learn model on Verta

Within Verta, a "Model" can be any arbitrary function: a traditional ML model (e.g., sklearn, PyTorch, TF); a function (e.g., squaring a number, making a DB call); or a mixture of the above (e.g., pre-processing code, a DB call, and then a model application). See the [registry concepts documentation](https://docs.verta.ai/verta/registry/concepts) for more.
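
To make the "any arbitrary function" idea concrete, here is a minimal sketch of a non-ML model that squares numbers. The class name `SquaringModel` is hypothetical, and the sketch deliberately avoids importing Verta so it runs anywhere; in an actual deployment such a class would presumably extend `VertaModelBase`, as in section 2.2.

```python
class SquaringModel(object):
    """Hypothetical example: a 'model' that is just a plain function."""

    def __init__(self, artifacts=None):
        # This toy model needs no artifacts; the argument is kept only to
        # mirror the constructor shape used later in this notebook.
        self.artifacts = artifacts

    def predict(self, batch_input):
        # Square every number in the batch.
        return [x ** 2 for x in batch_input]


squarer = SquaringModel()
print(squarer.predict([1, 2, 3]))  # [1, 4, 9]
```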

This notebook shows how to deploy a scikit-learn model on Verta as a Verta Standard Model, either via convenience functions or by extending [VertaModelBase](https://verta.readthedocs.io/en/master/_autogen/verta.registry.VertaModelBase.html?highlight=VertaModelBase#verta.registry.VertaModelBase).

## 0. Imports

In [1]:
from __future__ import print_function

import warnings
from sklearn.exceptions import ConvergenceWarning
warnings.filterwarnings("ignore", category=ConvergenceWarning)
warnings.filterwarnings("ignore", category=FutureWarning)

import itertools
import os
import time

import six

import numpy as np
import pandas as pd

import sklearn
from sklearn import model_selection
from sklearn import linear_model
from sklearn import metrics

### 0.1 Verta import and setup

In [2]:
# restart your notebook if prompted on Colab
try:
    import verta
except ImportError:
    !pip install verta
    import verta

In [3]:
import os

# Ensure credentials are set up; if not, uncomment and fill in the lines below
# os.environ['VERTA_EMAIL'] = 
# os.environ['VERTA_DEV_KEY'] = 
# os.environ['VERTA_HOST'] = 

from verta import Client
PROJECT_NAME = "Census"
EXPERIMENT_NAME = "sklearn"
client = Client(os.environ['VERTA_HOST'])
proj = client.set_project(PROJECT_NAME)
expt = client.set_experiment(EXPERIMENT_NAME)
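
Before creating the `Client`, it can help to confirm that the three environment variables named above are actually set. The following is a minimal sketch using only the standard library; the helper name `missing_verta_vars` is hypothetical and not part of the Verta client.

```python
import os

# Variable names taken from the setup cell above.
REQUIRED_VARS = ("VERTA_EMAIL", "VERTA_DEV_KEY", "VERTA_HOST")


def missing_verta_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_verta_vars()
if missing:
    print("Set these before creating the Client: {}".format(", ".join(missing)))
```

Passing a plain dict instead of `os.environ` makes the check easy to try out without touching the real environment.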

## 1. Model Training

### 1.1 Load training data

In [4]:
try:
    import wget
except ImportError:
    !pip install wget  # you may need pip3
    import wget

In [5]:
train_data_url = "http://s3.amazonaws.com/verta-starter/census-train.csv"
train_data_filename = wget.detect_filename(train_data_url)
if not os.path.isfile(train_data_filename):
    wget.download(train_data_url)

test_data_url = "http://s3.amazonaws.com/verta-starter/census-test.csv"
test_data_filename = wget.detect_filename(test_data_url)
if not os.path.isfile(test_data_filename):
    wget.download(test_data_url)
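
If installing `wget` is not an option, roughly the same download-if-missing pattern can be sketched with the standard library. This assumes Python 3; the helper names `filename_from_url` and `download_if_missing` are hypothetical, and the sketch is a stand-in for the cell above rather than a description of how the `wget` package works internally.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve


def filename_from_url(url):
    """Return the last path component of a URL."""
    return os.path.basename(urlparse(url).path)


def download_if_missing(url, directory="."):
    """Download url into directory unless the file is already present."""
    path = os.path.join(directory, filename_from_url(url))
    if not os.path.isfile(path):
        urlretrieve(url, path)
    return path


print(filename_from_url("http://s3.amazonaws.com/verta-starter/census-train.csv"))
# census-train.csv
```

Calling `download_if_missing(train_data_url)` would then return the local path to pass to `pd.read_csv`.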

In [6]:
df_train = pd.read_csv(train_data_filename)
X_train = df_train.iloc[:,:-1]
y_train = df_train.iloc[:, -1]

df_test = pd.read_csv(test_data_filename)
X_test = df_test.iloc[:,:-1]
y_test = df_test.iloc[:, -1]


df_train.head()

### 1.2 Define hyperparameters

In [7]:
hyperparam_candidates = {
    'C': [1e-6, 1e-4],
    'solver': ['lbfgs'],
    'max_iter': [15, 28],
}
hyperparam_sets = [dict(zip(hyperparam_candidates.keys(), values))
                   for values
                   in itertools.product(*hyperparam_candidates.values())]
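
To see what the list comprehension above produces, the sketch below repeats the same expansion on a copy of the grid and prints each combination. The names `grid` and `combos` are used only for this illustration.

```python
import itertools

grid = {
    'C': [1e-6, 1e-4],
    'solver': ['lbfgs'],
    'max_iter': [15, 28],
}

# itertools.product yields one tuple per combination of values;
# zipping each tuple with the keys turns it back into a dict.
combos = [dict(zip(grid.keys(), values))
          for values in itertools.product(*grid.values())]

print(len(combos))  # 4
for combo in combos:
    print(combo)
```

With two values for `C`, one for `solver`, and two for `max_iter`, the grid expands to 2 x 1 x 2 = 4 hyperparameter sets, so `run_experiment()` is called four times.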

### 1.3 Train/test code

In [8]:
def run_experiment(hyperparams):
    # create object to track experiment run
    run = client.set_experiment_run()
    
    # create validation split
    (X_val_train, X_val_test,
     y_val_train, y_val_test) = model_selection.train_test_split(X_train, y_train,
                                                                 test_size=0.2,
                                                                 shuffle=True)

    # log hyperparameters
    run.log_hyperparameters(hyperparams)
    print(hyperparams, end=' ')
    
    # create and train model on the training portion of the split only,
    # so that the validation rows stay unseen
    model = linear_model.LogisticRegression(**hyperparams)
    model.fit(X_val_train, y_val_train)
    
    # calculate and log validation accuracy
    val_acc = model.score(X_val_test, y_val_test)
    run.log_metric("val_acc", val_acc)
    print("Validation accuracy: {:.4f}".format(val_acc))
    
# NOTE: run_experiment() could also be defined in a module, and executed in parallel
for hyperparams in hyperparam_sets:
    run_experiment(hyperparams)

In [9]:
best_run = expt.expt_runs.sort("metrics.val_acc", descending=True)[0]
print("Validation Accuracy: {:.4f}".format(best_run.get_metric("val_acc")))

best_hyperparams = best_run.get_hyperparameters()
print("Hyperparameters: {}".format(best_hyperparams))
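
The selection step above amounts to sorting runs by a metric in descending order and taking the first one. Here is a minimal sketch of that logic with plain dictionaries standing in for experiment runs; the accuracy values are invented purely for illustration and are not results from this notebook.

```python
# Hypothetical stand-ins for logged runs (made-up numbers).
runs = [
    {"hyperparameters": {"C": 1e-6, "max_iter": 15}, "metrics": {"val_acc": 0.70}},
    {"hyperparameters": {"C": 1e-4, "max_iter": 28}, "metrics": {"val_acc": 0.80}},
    {"hyperparameters": {"C": 1e-4, "max_iter": 15}, "metrics": {"val_acc": 0.75}},
]

# Sort by the metric, highest first, and keep the top entry.
ranked = sorted(runs, key=lambda run: run["metrics"]["val_acc"], reverse=True)
best = ranked[0]

print(best["metrics"]["val_acc"])  # 0.8
```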

In [10]:
model = linear_model.LogisticRegression(multi_class='auto', **best_hyperparams)
model.fit(X_train, y_train)
train_acc = model.score(X_train, y_train)
print("Training accuracy: {:.4f}".format(train_acc))

## 2. Register Model for deployment

In [11]:
registered_model = client.get_or_create_registered_model(
    name="census-sklearn", labels=["tabular", "sklearn"])

### 2.1 Register from the model object

If the model object is available in the same file, use the code below to package it.

In [12]:
from verta.environment import Python

model_version_v1 = registered_model.create_standard_model_from_sklearn(
    model,
    environment=Python(requirements=["scikit-learn"]),
    name="v1",
)

### 2.2 (OR) Register a serialized version of the model using VertaModelBase

In [13]:
import cloudpickle
with open("model.pkl", "wb") as f:
    cloudpickle.dump(model, f)
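
The serialize-then-reload round trip can be tried without a trained model. The sketch below uses the standard library's `pickle` (not `cloudpickle`, which offers the same `dump`/`load` calls) and a plain dict as a hypothetical stand-in for the model; the parameter values are arbitrary.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model: a plain dict of parameters.
model_state = {"coef": [0.5, -1.25], "intercept": 0.1}

# Write into a fresh temporary directory to avoid clobbering a real model.pkl.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")

# Serialize to disk in binary mode.
with open(path, "wb") as f:
    pickle.dump(model_state, f)

# Load it back and confirm nothing was lost.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == model_state)  # True
```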

In [14]:
from verta.registry import VertaModelBase

class CensusIncomeClassifier(VertaModelBase):
    def __init__(self, artifacts):
        # load the serialized model; the context manager closes the file handle
        with open(artifacts["serialized_model"], "rb") as f:
            self.model = cloudpickle.load(f)
        
    def predict(self, batch_input):
        # batch_input is a list of batches; each batch is a list of feature rows
        results = []
        for one_input in batch_input:
            results.append(self.model.predict(one_input))
        return results

In [15]:
artifacts_dict = {"serialized_model" : "model.pkl"}
clf = CensusIncomeClassifier(artifacts_dict)
clf.predict([X_test.values.tolist()[:5]])
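
Note the extra pair of brackets in the call above: `predict()` loops over its argument and hands each element to the underlying model, so each element has to be a whole list of rows. The sketch below mirrors the loop in `CensusIncomeClassifier` with a hypothetical `StubModel` in place of the fitted estimator, so it runs without Verta or scikit-learn.

```python
class StubModel(object):
    """Hypothetical stand-in for a fitted estimator."""

    def predict(self, rows):
        # One label per row: 1 if the row sum is positive, else 0.
        return [int(sum(row) > 0) for row in rows]


class BatchClassifier(object):
    """Same predict() loop as CensusIncomeClassifier, minus Verta."""

    def __init__(self, model):
        self.model = model

    def predict(self, batch_input):
        results = []
        for one_input in batch_input:
            results.append(self.model.predict(one_input))
        return results


rows = [[1, 2], [-3, 1], [0, 5]]
stub_clf = BatchClassifier(StubModel())

# The outer list holds one batch of three rows.
print(stub_clf.predict([rows]))  # [[1, 0, 1]]
```

Passing `rows` without the outer list would hand single rows to the inner `predict()`, which a scikit-learn estimator rejects because it expects 2-D input.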

In [16]:
model_version_v2 = registered_model.create_standard_model(
    model_cls=CensusIncomeClassifier,
    environment=Python(requirements=["scikit-learn"]),
    artifacts=artifacts_dict,
    name="v2"
)

## 3. Deploy model to endpoint

In [17]:
census_endpoint = client.get_or_create_endpoint("census-model")
census_endpoint.update(model_version_v1, wait=True)

In [18]:
deployed_model = census_endpoint.get_deployed_model()
deployed_model.predict(X_test.values.tolist()[:5])

In [19]:
census_endpoint.update(model_version_v2, wait=True)

In [20]:
deployed_model = census_endpoint.get_deployed_model()
# v2 iterates over a list of batches, hence the extra pair of brackets
deployed_model.predict([X_test.values.tolist()[:5]])

---