## Chassisml Example Notebooks
Welcome to the examples section for [Chassis.ml](https://chassis.ml). These notebooks use Chassisml to automatically containerize models built with the most common machine learning frameworks.

**NOTE:** Chassisml provides two key functions:
1. Create a Docker container from your model code and push that container image to a Docker registry. This is the default behavior.
2. If you pass valid Modzy credentials as optional parameters, Chassisml also uploads the container directly to the Modzy environment you specify. Most of these notebooks deploy the model to one of Modzy's internal development environments.

Can't find the framework you are looking for, or need help? Fork this repository and open a PR. We're always interested in growing this example bank!

The primary maintainers of Chassis also actively monitor our [Discord Server](https://discord.gg/tdfXFY2y), so feel free to join and ask any questions you might have. We'll be there to respond and help out promptly. 

In [11]:
import chassisml
import numpy as np
import getpass
import json
import pandas as pd
from io import StringIO
import lightgbm as lgb
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

## Enter credentials
Docker Hub credentials and Modzy API key.

In [3]:
dockerhub_user = getpass.getpass('docker hub username')
dockerhub_pass = getpass.getpass('docker hub password')
modzy_api_key = getpass.getpass('modzy api key')

docker hub username········
docker hub password········
modzy api key········
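
For non-interactive runs (for example, executing the notebook from a script), the prompts above could be bypassed by checking environment variables first. This is a minimal sketch, not something Chassis or Modzy requires: the `read_secret` helper and the variable names `DOCKERHUB_USER`, `DOCKERHUB_PASS`, and `MODZY_API_KEY` are hypothetical.

```python
import getpass
import os


def read_secret(env_var, prompt):
    """Return the value of env_var if it is set, otherwise prompt without echoing."""
    value = os.environ.get(env_var)
    if value:
        return value
    return getpass.getpass(prompt)


# Hypothetical usage; the environment variable names are illustrative only:
# dockerhub_user = read_secret("DOCKERHUB_USER", "docker hub username")
# dockerhub_pass = read_secret("DOCKERHUB_PASS", "docker hub password")
# modzy_api_key = read_secret("MODZY_API_KEY", "modzy api key")
```

The fallback to `getpass` keeps the interactive behavior of the cell above when no variable is set.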


## Train model

In [5]:
# load breast cancer dataset
df = pd.read_csv('./data/Breast_cancer_data.csv')
print(df.head())
print(df['diagnosis'].value_counts())

   mean_radius  mean_texture  mean_perimeter  mean_area  mean_smoothness  \
0        17.99         10.38          122.80     1001.0          0.11840   
1        20.57         17.77          132.90     1326.0          0.08474   
2        19.69         21.25          130.00     1203.0          0.10960   
3        11.42         20.38           77.58      386.1          0.14250   
4        20.29         14.34          135.10     1297.0          0.10030   

   diagnosis  
0          0  
1          0  
2          0  
3          0  
4          0  
1    357
0    212
Name: diagnosis, dtype: int64


In [21]:
# preprocess data
X = df[['mean_radius','mean_texture','mean_perimeter','mean_area','mean_smoothness']]
y = df['diagnosis']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 0)

# save sample data for testing
with open("./data/sample_cancer_data.csv", 'w') as sample_out:
    X_test[:10].to_csv(sample_out, index=False)

In [10]:
# build and train model
clf = lgb.LGBMClassifier()
clf.fit(X_train, y_train)

LGBMClassifier()

In [12]:
# make predictions and evaluate model accuracy
y_pred = clf.predict(X_test)

# accuracy_score expects (y_true, y_pred); compute once and reuse
accuracy = accuracy_score(y_test, y_pred)
print('LightGBM Model accuracy score: {0:0.4f}'.format(accuracy))

LightGBM Model accuracy score: 0.9298
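
Accuracy is the fraction of test rows whose predicted label matches the true label. The class counts printed earlier add up to 569 rows, so with `test_size = 0.3` the test set holds 171 rows (scikit-learn rounds a fractional test size up), and a score of 0.9298 corresponds to 159 correct predictions. The dependency-free sketch below reproduces that arithmetic. The `accuracy` helper is illustrative, not part of scikit-learn, and the row count assumes no rows are missing a diagnosis.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


n_rows = 357 + 212                 # class counts printed by value_counts()
n_test = math.ceil(0.3 * n_rows)   # a float test_size is rounded up: 171 rows

print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))  # 3 of 4 match -> 0.75
print(round(159 / n_test, 4))                # 159 of 171 match -> 0.9298
```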


## Prepare context dict
Initialize anything here that should persist across inference runs

In [31]:
# This will be passed to Chassis:
context = {
    "model": clf,
    "classes": ["No Cancer", "Cancer"]
}

## Write process function

* Must take bytes and context dict as input
* Preprocess bytes, run inference, postprocess model output, return results

In [32]:
def process(input_bytes, context):
    # preprocess: decode the raw bytes and parse them as CSV
    inputs = pd.read_csv(StringIO(str(input_bytes, "utf-8")))

    # run inference: one row of class probabilities per input row
    preds = context["model"].predict_proba(inputs)

    # postprocess: report the most likely class and its probability for each row
    inference_result = {
        "classPredictions": [
            {
                "row": i + 1,
                "class": context["classes"][np.argmax(pred)],
                "score": np.max(pred),
            }
            for i, pred in enumerate(preds)
        ]
    }

    structured_output = {
        "data": {
            "result": inference_result,
            "explanation": None,
            "drift": None,
        }
    }

    return structured_output
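
Because the process function is an ordinary Python function, its bytes-in, dict-out contract can be checked before the Chassis service is involved. The sketch below mirrors the same three steps using only the standard library. `StubModel` and `process_sketch` are hypothetical stand-ins (not part of Chassis or LightGBM), and the threshold and probabilities are arbitrary values chosen for illustration.

```python
import csv
from io import StringIO


class StubModel:
    """Hypothetical stand-in for a trained classifier."""

    def predict_proba(self, rows):
        # Arbitrary rule for illustration only: threshold on mean_radius.
        return [
            [0.9, 0.1] if float(row["mean_radius"]) > 15 else [0.2, 0.8]
            for row in rows
        ]


def process_sketch(input_bytes, context):
    # Preprocess: decode the raw bytes and parse them as CSV with a header row.
    rows = list(csv.DictReader(StringIO(input_bytes.decode("utf-8"))))
    # Inference: one probability vector per input row.
    preds = context["model"].predict_proba(rows)
    # Postprocess: pick the most likely class and its score for each row.
    class_predictions = []
    for i, pred in enumerate(preds):
        best = max(range(len(pred)), key=pred.__getitem__)
        class_predictions.append(
            {"row": i + 1, "class": context["classes"][best], "score": pred[best]}
        )
    return {
        "data": {
            "result": {"classPredictions": class_predictions},
            "explanation": None,
            "drift": None,
        }
    }


sample_bytes = b"mean_radius,mean_texture\n17.99,10.38\n11.42,20.38\n"
stub_context = {"model": StubModel(), "classes": ["No Cancer", "Cancer"]}
output = process_sketch(sample_bytes, stub_context)
print(output["data"]["result"]["classPredictions"])
```

Swapping the stub for a real model only changes the `context` dict; the function signature and output shape stay the same.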

## Initialize Chassis Client
We'll use this client to interact with the Chassis service, which this notebook expects to be running locally on port 5000.

In [33]:
chassis_client = chassisml.ChassisClient("http://localhost:5000")

## Create and test Chassis model
* Requires `context` dict containing all variables which should be loaded once and persist across inferences
* Requires `process_fn` defined above

In [34]:
# create Chassis model
chassis_model = chassis_client.create_model(context=context,process_fn=process)

# test Chassis model locally (can pass filepath, bufferedreader, bytes, or text here):
sample_filepath = 'data/sample_cancer_data.csv'
results = chassis_model.test(sample_filepath)
print(results)

b'{"data":{"result":{"classPredictions":[{"row":1,"class":"No Cancer","score":0.9190364807929879},{"row":2,"class":"Cancer","score":0.9786904097867372},{"row":3,"class":"Cancer","score":0.9999388760861525},{"row":4,"class":"Cancer","score":0.9983723025640089},{"row":5,"class":"Cancer","score":0.9996343485822657},{"row":6,"class":"Cancer","score":0.9629658081706469},{"row":7,"class":"Cancer","score":0.9993079644951554},{"row":8,"class":"Cancer","score":0.9998238024829422},{"row":9,"class":"Cancer","score":0.9998382123967896},{"row":10,"class":"Cancer","score":0.9999329888634424}]},"explanation":null,"drift":null}}'
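
As the output above shows, the local test returns JSON-encoded bytes, so the standard `json` module can decode it. The short sketch below uses a truncated copy of that output (first two rows only) and pulls out the per-row predictions.

```python
import json

# First two predictions copied from the output above; the remaining rows are omitted.
raw = b'{"data":{"result":{"classPredictions":[{"row":1,"class":"No Cancer","score":0.9190364807929879},{"row":2,"class":"Cancer","score":0.9786904097867372}]},"explanation":null,"drift":null}}'

parsed = json.loads(raw)  # json.loads accepts bytes as well as str
predictions = parsed["data"]["result"]["classPredictions"]
labels = [p["class"] for p in predictions]

for p in predictions:
    print('row {}: {} ({:.3f})'.format(p["row"], p["class"], p["score"]))
```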


In [35]:
# test environment and model within Chassis service, must pass filepath here:

# dry run before build
test_env_result = chassis_model.test_env(sample_filepath)
print(test_env_result)

Starting test job... Ok!
{'model_output': 'Single input prediction:\n\nb\'{"data":{"result":{"classPredictions":[{"row":1,"class":"No Cancer","score":0.9190364807929879},{"row":2,"class":"Cancer","score":0.9786904097867372},{"row":3,"class":"Cancer","score":0.9999388760861525},{"row":4,"class":"Cancer","score":0.9983723025640089},{"row":5,"class":"Cancer","score":0.9996343485822657},{"row":6,"class":"Cancer","score":0.9629658081706469},{"row":7,"class":"Cancer","score":0.9993079644951554},{"row":8,"class":"Cancer","score":0.9998238024829422},{"row":9,"class":"Cancer","score":0.9998382123967896},{"row":10,"class":"Cancer","score":0.9999329888634424}]},"explanation":null,"drift":null}}\'\n'}
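
In the output above, the `model_output` value is a text log that embeds the prediction as a Python bytes literal. If the structured result is needed, one possible way to recover it is sketched below. The `extract_prediction` helper is hypothetical, and it assumes the log has the shape shown here (a header line followed by a single `b'...'` literal), which may differ between Chassis versions.

```python
import ast
import json


def extract_prediction(model_output):
    """Pull the first bytes literal out of a test log and decode it as JSON."""
    start = model_output.find("b'")
    if start == -1:
        raise ValueError("no bytes literal found in model_output")
    literal = model_output[start:].strip()
    payload = ast.literal_eval(literal)  # safely turns the b'...' text into bytes
    return json.loads(payload)


# Shortened example in the same shape as the output above (first row only).
example = {
    "model_output": 'Single input prediction:\n\nb\'{"data":{"result":{"classPredictions":[{"row":1,"class":"No Cancer","score":0.9190364807929879}]},"explanation":null,"drift":null}}\'\n'
}
result = extract_prediction(example["model_output"])
print(result["data"]["result"]["classPredictions"][0]["class"])
```

`ast.literal_eval` is used instead of `eval` because it only accepts literals, so it cannot execute arbitrary code from the log.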


## Publish model to Modzy
Publishing requires a model name, a model version, Docker Hub credentials, and the Modzy details (API key, sample input path, and URL).

In [39]:
MODZY_URL = "https://integration.modzy.engineering/api"

response = chassis_model.publish(
    model_name="Chassis LightGBM Breast Cancer Classification",
    model_version="0.0.1",
    registry_user=dockerhub_user,
    registry_pass=dockerhub_pass,
    modzy_api_key=modzy_api_key,
    modzy_sample_input_path=sample_filepath,
    modzy_url=MODZY_URL
)

job_id = response.get('job_id')
final_status = chassis_client.block_until_complete(job_id)

Starting build job... Ok!
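
The call to `block_until_complete` above waits for the build job to finish and returns its final status. The general pattern behind a blocking wait like this is a polling loop. The sketch below illustrates that pattern only; it is not the Chassis client's implementation. `wait_until`, the fake status source, and the `"done"` key are all hypothetical.

```python
import time


def wait_until(get_status, is_done, poll_interval=5.0, timeout=None):
    """Call get_status() repeatedly until is_done(status) is true, then return status."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        status = get_status()
        if is_done(status):
            return status
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        time.sleep(poll_interval)


# Hypothetical status source that reports completion on the third poll.
fake_statuses = iter([{"done": False}, {"done": False}, {"done": True}])
final = wait_until(
    get_status=lambda: next(fake_statuses),
    is_done=lambda status: status["done"],
    poll_interval=0.01,
)
print(final)
```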


In [40]:
# fetch the job status once and reuse it
job_status = chassis_client.get_job_status(job_id)
if job_status["result"] is not None:
    print("New model URL: {}".format(job_status["result"]["container_url"]))
else:
    print("Chassis job failed \n\n {}".format(job_status))

New model URL: https://integration.modzy.engineering/models/cla9darwuf/0.0.1


## Run sample job using Modzy SDK
Submit an inference job to our newly deployed model running on Modzy.

In [None]:
from modzy import ApiClient

# reuse the Modzy URL defined in the publish step
client = ApiClient(base_url=MODZY_URL, api_key=modzy_api_key)

input_name = final_status['result']['inputs'][0]['name']
model_id = final_status['result'].get("model").get("modelId")
model_version = final_status['result'].get("version")

inference_job = client.jobs.submit_file(model_id, model_version, {input_name: sample_filepath})
inference_job_result = client.results.block_until_complete(inference_job, timeout=None)
inference_job_results_json = inference_job_result.get_first_outputs()['results.json']
print(inference_job_results_json)