## Chassis Example Notebooks
Welcome to the examples section for [Chassis](https://chassis.ml), which contains notebooks that auto-containerize models built using the most common machine learning (ML) frameworks. 

#### What is Chassis?
Chassis automatically builds a Docker container image from your model code and pushes that image to a Docker registry. All you need is your model loaded into memory and a few lines of Chassis code! This example bank provides reference examples for many common ML frameworks.

Can't find the framework you are looking for or need help? Fork this repository and open a PR, or list the desired framework in a new issue. We're always interested in growing this example bank! 

The primary maintainers of Chassis also actively monitor our [Discord Server](https://discord.gg/cHpzY9yCcM), so feel free to join and ask any questions you might have. We'll be there to respond and help out promptly.

In [21]:
import chassisml
from io import StringIO
import numpy as np
import pandas as pd
import getpass
import xgboost as xgb
# note: load_boston was removed in scikit-learn 1.2, so this notebook needs scikit-learn<1.2
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

## Enter credentials
Enter your Docker Hub username and password. They are used later to push the container image to Docker Hub.

In [None]:
dockerhub_user = getpass.getpass('docker hub username')
dockerhub_pass = getpass.getpass('docker hub password')
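
For non-interactive runs (for example, in CI) you might prefer to read credentials from environment variables and only prompt when they are missing. A minimal sketch; the helper `get_credential` and the variable names `DOCKERHUB_USER` and `DOCKERHUB_PASS` are hypothetical, not something Chassis reads on its own.

```python
import getpass
import os


def get_credential(env_var, prompt):
    """Return the value of env_var if it is set; otherwise prompt without echoing input."""
    # env_var names such as DOCKERHUB_USER / DOCKERHUB_PASS are hypothetical choices
    value = os.environ.get(env_var)
    if value:
        return value
    return getpass.getpass(prompt)
```

Usage would look like `dockerhub_user = get_credential("DOCKERHUB_USER", "docker hub username")`, falling back to the same interactive prompt as the cell above.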

## Load sample data and train XGBoost model

In [13]:
import os

# load data
boston = load_boston()
X = pd.DataFrame(boston.data, columns=boston.feature_names)
y = pd.Series(boston.target)
X_train, X_test, y_train, y_test = train_test_split(X, y)

# save the first 10 test rows as sample data for testing the Chassis model later
os.makedirs("data", exist_ok=True)  # make sure the output directory exists
X_test[:10].to_csv("data/sample_house_data.csv", index=False)

In [14]:
# build XGBoost regressor
regressor = xgb.XGBRegressor(
    n_estimators=100,
    reg_lambda=1,
    gamma=0,
    max_depth=3
)

In [15]:
# train model
regressor.fit(X_train, y_train)

XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
             gamma=0, gpu_id=-1, importance_type=None,
             interaction_constraints='', learning_rate=0.300000012,
             max_delta_step=0, max_depth=3, min_child_weight=1, missing=nan,
             monotone_constraints='()', n_estimators=100, n_jobs=12,
             num_parallel_tree=1, predictor='auto', random_state=0, reg_alpha=0,
             reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact',
             validate_parameters=1, verbosity=None)

In [16]:
# examine feature importances (one column per input feature)
pd.DataFrame(regressor.feature_importances_.reshape(1, -1), columns=boston.feature_names)

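| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.02066 | 0.017033 | 0.012711 | 0.017154 | 0.084881 | 0.296908 | 0.009907 | 0.039272 | 0.018087 | 0.021914 | 0.088921 | 0.00966 | 0.362893 |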
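
Ranking the importances makes the table easier to read. The sketch below copies the values printed above into a plain dictionary, so it runs without the trained model, and sorts them. With the live model you could build the same mapping with `dict(zip(boston.feature_names, regressor.feature_importances_))`. The exact values depend on the random train/test split.

```python
# Feature importances copied from the output above (they vary with the random split)
importances = {
    "CRIM": 0.02066, "ZN": 0.017033, "INDUS": 0.012711, "CHAS": 0.017154,
    "NOX": 0.084881, "RM": 0.296908, "AGE": 0.009907, "DIS": 0.039272,
    "RAD": 0.018087, "TAX": 0.021914, "PTRATIO": 0.088921, "B": 0.00966,
    "LSTAT": 0.362893,
}

# Sort features from most to least important
ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)

# Show the three most important features
for name, score in ranked[:3]:
    print(f"{name:8s} {score:.3f}")
```

For this run, LSTAT and RM together account for roughly two thirds of the total importance.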


In [17]:
# run inference
y_pred = regressor.predict(X_test)

# evaluate model
mean_squared_error(y_test, y_pred)

7.658965947061991
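
Mean squared error averages the squared differences between true and predicted values. Because the Boston target (`MEDV`) is expressed in thousands of dollars, taking the square root (RMSE) puts the error back in those units. A stdlib-only sketch of both calculations; `mse` is a hypothetical helper named so it does not shadow scikit-learn's `mean_squared_error` in the notebook.

```python
import math


def mse(y_true, y_pred):
    """Unweighted mean of squared residuals."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy check: residuals are 0, 0 and 2, so MSE = (0 + 0 + 4) / 3
print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))

# RMSE for the MSE reported above, in $1000s
rmse = math.sqrt(7.658965947061991)
print(f"RMSE: {rmse:.2f} (in $1000s)")
```

An MSE of about 7.66 therefore corresponds to a typical error of about 2.77, i.e. roughly $2,770 per house.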

## Write process function

* The function must take raw bytes as its only input
* It should decode and preprocess the bytes, run inference, post-process the model output, and return the results

In [39]:
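def process(input_bytes):
    # decode the raw bytes and load the CSV rows into a DataFrame
    inputs = pd.read_csv(StringIO(str(input_bytes, "utf-8")))

    # run inference (the Boston target is the median home value in $1000s)
    preds = regressor.predict(inputs)

    # structure results: round to the nearest $1000, convert to dollars, and
    # cast to a plain Python float so the output is JSON serializable
    inference_result = {
        "housePricePredictions": [
            {"row": i + 1, "price": float(pred.round(0) * 1000)}
            for i, pred in enumerate(preds)
        ]
    }

    structured_output = {
        "data": {
            "result": inference_result,
            "explanation": None,
            "drift": None,
        }
    }
    return structured_output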
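
To see the bytes-in, dict-out contract in isolation, here is a stdlib-only sketch that follows the same steps with a stand-in model. `StubRegressor`, its constant prediction, `process_sketch`, and the three-column sample CSV are hypothetical placeholders; only the output structure mirrors `process` above. Casting each price to a plain Python `float` matters because `json.dumps` rejects NumPy `float32` scalars, which is what XGBoost predictions are.

```python
import csv
import io
import json


class StubRegressor:
    """Hypothetical stand-in for the trained model: predicts 21.6 ($1000s) for every row."""

    def predict(self, rows):
        return [21.6 for _ in rows]


model = StubRegressor()


def process_sketch(input_bytes):
    # decode bytes -> text -> list of dict rows (CSV with a header line)
    rows = list(csv.DictReader(io.StringIO(input_bytes.decode("utf-8"))))

    # run inference (predictions are in $1000s, like the Boston target)
    preds = model.predict(rows)

    # round to the nearest $1000 (round-half-to-even, like NumPy), convert to dollars
    inference_result = {
        "housePricePredictions": [
            {"row": i + 1, "price": float(round(pred) * 1000)}
            for i, pred in enumerate(preds)
        ]
    }
    return {"data": {"result": inference_result, "explanation": None, "drift": None}}


# Hypothetical sample input with a subset of the Boston columns
sample = b"CRIM,RM,LSTAT\n0.1,6.5,5.0\n0.2,5.9,12.0\n"
print(json.dumps(process_sketch(sample)))
```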

## Initialize Chassis Client
The client is used to interact with the Chassis service. This example assumes the service is running locally at `http://localhost:5000`.

In [40]:
chassis_client = chassisml.ChassisClient("http://localhost:5000")
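
If later client calls fail with connection errors, the service may not be running at that address. The hypothetical helper below uses only the standard library to check whether anything answers HTTP at a URL. It treats any HTTP response, even an error status, as "reachable", because it makes no assumption about which endpoints Chassis exposes.

```python
import urllib.error
import urllib.request


def service_reachable(url, timeout=3.0):
    """Return True if an HTTP server answers at url (any status code), else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        # the server responded, just not with a 2xx status
        err.close()
        return True
    except OSError:
        # covers URLError (connection refused, DNS failure) and timeouts
        return False
```

For this notebook you would call `service_reachable("http://localhost:5000")` before creating the model.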

## Create and test Chassis model
* Requires the `process` function defined above, which is passed as `process_fn`

In [41]:
# create Chassis model
chassis_model = chassis_client.create_model(process_fn=process)

# test Chassis model locally (accepts a file path, BufferedReader, bytes, or text):
sample_filepath = './data/sample_house_data.csv'
results = chassis_model.test(sample_filepath)
print(results)

b'{"data":{"result":{"housePricePredictions":[{"row":1,"price":34000.0},{"row":2,"price":30000.0},{"row":3,"price":19000.0},{"row":4,"price":21000.0},{"row":5,"price":25000.0},{"row":6,"price":48000.0},{"row":7,"price":12000.0},{"row":8,"price":16000.0},{"row":9,"price":6000.0},{"row":10,"price":25000.0}]},"explanation":null,"drift":null}}'
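
`test` returns the model output as raw JSON bytes. To work with the predictions programmatically, decode them with `json.loads`, which accepts `bytes` directly. The literal below is the output printed above; `prices` is just an illustrative name.

```python
import json

# Bytes returned by chassis_model.test(...) in the cell above
results = b'{"data":{"result":{"housePricePredictions":[{"row":1,"price":34000.0},{"row":2,"price":30000.0},{"row":3,"price":19000.0},{"row":4,"price":21000.0},{"row":5,"price":25000.0},{"row":6,"price":48000.0},{"row":7,"price":12000.0},{"row":8,"price":16000.0},{"row":9,"price":6000.0},{"row":10,"price":25000.0}]},"explanation":null,"drift":null}}'

payload = json.loads(results)
predictions = payload["data"]["result"]["housePricePredictions"]

# Map row number -> predicted price in dollars
prices = {p["row"]: p["price"] for p in predictions}
print(prices[1], max(prices.values()))
```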


In [42]:
# test the environment and model inside the Chassis service (a file path is required here);
# this is a dry run before the build
test_env_result = chassis_model.test_env(sample_filepath)
print(test_env_result)

Starting test job... Ok!
{'model_output': 'Single input prediction:\n\nb\'{"data":{"result":{"housePricePredictions":[{"row":1,"price":34000.0},{"row":2,"price":30000.0},{"row":3,"price":19000.0},{"row":4,"price":21000.0},{"row":5,"price":25000.0},{"row":6,"price":48000.0},{"row":7,"price":12000.0},{"row":8,"price":16000.0},{"row":9,"price":6000.0},{"row":10,"price":25000.0}]},"explanation":null,"drift":null}}\'\n'}
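
`test_env` wraps the same output in a human-readable string under `model_output`. Based only on the format printed above (a header line, a blank line, then a Python bytes literal), a hedged sketch for recovering the JSON could look like this. The layout is an observation from this run, not a documented contract, so the hypothetical helper `extract_model_json` may need adjusting for other Chassis versions. The example string is abbreviated to one row.

```python
import ast
import json


def extract_model_json(model_output):
    """Parse the bytes literal that follows the first blank line of a model_output string."""
    # format observed in this notebook: "Single input prediction:\n\nb'{...}'\n"
    _, _, literal = model_output.partition("\n\n")
    raw = ast.literal_eval(literal.strip())  # safely evaluates the b'...' literal to bytes
    return json.loads(raw)


# Abbreviated example in the same shape as test_env_result["model_output"] above
example_output = "Single input prediction:\n\nb'{\"data\":{\"result\":{\"housePricePredictions\":[{\"row\":1,\"price\":34000.0}]},\"explanation\":null,\"drift\":null}}'\n"
parsed = extract_model_json(example_output)
print(parsed["data"]["result"]["housePricePredictions"][0])
```

In the notebook, comparing `extract_model_json(test_env_result["model_output"])` with `json.loads(results)` is one way to confirm the containerized environment reproduces the local predictions.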


## Publish model to Docker Hub
Provide a model name, a model version, and your Docker Hub credentials.

In [43]:
response = chassis_model.publish(
    model_name="XGBoost Boston Housing Price Predictions",
    model_version="0.0.1",
    registry_user=dockerhub_user,
    registry_pass=dockerhub_pass,
)

job_id = response.get('job_id')
final_status = chassis_client.block_until_complete(job_id)

Starting build job... Ok!
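
`block_until_complete` is called here without a timeout. If you would rather stop waiting after a fixed time, a generic polling loop can wrap any status function, for example `lambda: chassis_client.get_job_status(job_id)`. The hypothetical helper `poll_until` takes the completion test as `is_done`, because the exact fields of a finished job status are not shown in this notebook; any predicate you pass is an assumption about that format.

```python
import time


def poll_until(get_status, is_done, timeout_s=600.0, interval_s=5.0):
    """Call get_status() until is_done(status) is True or timeout_s elapses.

    Returns the first status for which is_done is True; raises TimeoutError otherwise.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if is_done(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job not finished after {timeout_s} seconds: {status!r}")
        time.sleep(interval_s)
```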


In [44]:
# fetch the job status once, then report the container URL or the failure details
job_status = chassis_client.get_job_status(job_id)
if job_status["result"] is not None:
    print("New model URL: {}".format(job_status["result"]["container_url"]))
else:
    print("Chassis job failed \n\n {}".format(job_status))

New model URL: https://integration.modzy.engineering/models/1lwzghnayr/0.0.1
