# Blue Classification

This notebook walks through the basic interface for creating a model in a Jupyter Workspace. We'll then upload it to Domo's Model Management interface, where it can be deployed for real-time or batch inference.

To keep things simple, we won't use machine learning to train a model in this notebook. Instead, given a list of colored shapes, we'll define a simple algorithm that classifies each shape as blue or not blue.

In [1]:
import pandas as pd
from io import StringIO

In [2]:

data = [['Circle', 'Red'], ['Square', 'Blue'], ['Oval', 'Green'], ['Rectangle', 'Orange'], ['Rectangle', 'Pink']]
train_x = pd.DataFrame(data, columns=['Shape', 'Color'])

# For each row in the training data, 1 if the Color is Blue, and 0 otherwise. This is the value that we want to predict.
train_y = pd.DataFrame({'Blue': [0,1,0,0,0]})

# View first few rows of data when joined
train_x.join(train_y).head()

|   | Shape | Color | Blue |
|---|---|---|---|
| 0 | Circle | Red | 0 |
| 1 | Square | Blue | 1 |
| 2 | Oval | Green | 0 |
| 3 | Rectangle | Orange | 0 |
| 4 | Rectangle | Pink | 0 |
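
The labels above are typed in by hand. As a minimal sketch, the same labels can be derived from the `Color` column so that they stay in sync if rows are added; the name `derived_y` is introduced here for illustration only.

```python
import pandas as pd

data = [['Circle', 'Red'], ['Square', 'Blue'], ['Oval', 'Green'],
        ['Rectangle', 'Orange'], ['Rectangle', 'Pink']]
train_x = pd.DataFrame(data, columns=['Shape', 'Color'])

# 1 where Color is exactly 'Blue', 0 otherwise (derived_y is an illustrative name)
derived_y = pd.DataFrame({'Blue': (train_x['Color'] == 'Blue').astype(int)})
print(derived_y['Blue'].tolist())  # [0, 1, 0, 0, 0]
```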


## Hyperparameters

When using machine learning to train a model, we want to keep track of the hyperparameters used to configure the training process. Although they are not used in this case, we'll set a couple of hyperparameters here as a reference and show later in this notebook how they can be included as metadata with the model in the Model Management interface.

In [3]:
hyperparameters = {
    "alpha": "2.35e-05",
    "lambda": "0.25"
}

## Model Training
At this point we would normally use a machine learning library to train a model to fit our training dataset. For the purposes of this notebook, our sample model has already been defined in [model.py](model.py).

## Validation
Now that we have a model, it's time to test. But first, a note on [model.py](model.py) and its purpose.

[model.py](model.py) implements an `invoke` function, which the test cell below calls as a method of the `EndpointHandler` class:

    def invoke(data, content_type_header, accept_header):

For this simple model, the `invoke` function gives us a convenient place to implement our algorithm. When we deploy the model in Domo, this function also acts as the entrypoint for executing our model. If you want to execute the model in a dataflow, the `invoke` function should both accept and return data as a CSV string.
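
The contents of model.py are not reproduced in this notebook. As a rough sketch only, a handler with this interface might look like the following. The class body, the assumed column order (Shape, Color), and the choice to ignore the two header arguments are illustrative assumptions, not the actual file.

```python
import pandas as pd
from io import StringIO


class EndpointHandler:
    """Hypothetical stand-in for the handler in model.py (not the actual file)."""

    def invoke(self, data, content_type_header, accept_header):
        # Parse the headerless CSV payload; the column order is assumed to be Shape, Color.
        frame = pd.read_csv(StringIO(data), header=None, names=['Shape', 'Color'])
        # 1 where the color is Blue, 0 otherwise.
        is_blue = (frame['Color'] == 'Blue').astype(int)
        # Return a headerless CSV string with one prediction per input row.
        # The header arguments are unused in this sketch because only CSV is handled.
        return is_blue.to_csv(header=False, index=False)


sample_output = EndpointHandler().invoke('Circle,Red\nSquare,Blue\n', 'text/csv', 'text/csv')
print(sample_output)
```

Keeping both the input and the output as headerless CSV strings mirrors how the training data is written out for testing in this notebook.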

In order to ensure that your model is ready for deployment, we recommend always testing your model using the `invoke` function. It's usually best practice to keep some of the labeled examples separate from the training data to be used for validation/testing. This allows you to test how well the model generalizes to inputs not seen in the training dataset. Again, to keep things simple, we'll just test using the training dataset.

In [4]:
# Write training dataset as csv without headers or index column
train_csv = train_x.to_csv(header=False, index=False)

# Execute invoke function from model.py
from model import EndpointHandler
predicted_y = EndpointHandler().invoke(train_csv, content_type_header='text/csv', accept_header='text/csv')
print(predicted_y)

0
1
0
0
0
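
The hold-out approach mentioned above can be sketched with pandas alone. The names `examples`, `holdout_x`, `fit_x`, and `holdout_y`, and the 40% split, are illustrative assumptions.

```python
import pandas as pd

data = [['Circle', 'Red'], ['Square', 'Blue'], ['Oval', 'Green'],
        ['Rectangle', 'Orange'], ['Rectangle', 'Pink']]
examples = pd.DataFrame(data, columns=['Shape', 'Color'])
labels = (examples['Color'] == 'Blue').astype(int)

# Hold out 40% of the rows (2 of 5) for validation; random_state keeps the split reproducible.
holdout_x = examples.sample(frac=0.4, random_state=0)
# The remaining rows are the ones a real training step would see.
fit_x = examples.drop(holdout_x.index)
# Labels for the held-out rows, aligned by index.
holdout_y = labels.loc[holdout_x.index]

# Headerless CSV in the same shape that invoke receives in the test above.
holdout_csv = holdout_x.to_csv(header=False, index=False)
print(len(fit_x), len(holdout_x))
```

With a dataset this small, a random split may leave no blue examples in the held-out rows, so a real project would want more data or a stratified split.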



## Metrics

During training and validation we can calculate metrics to help us measure model performance. Example metrics are included below as a reference.

In addition to metric name and value, standard deviation and timestamp may be included.

In [5]:
from domojupyter.ai import Metric
from datetime import datetime

metrics = {
    "accuracy": 1.0,
    "recall": 1.0,
    "precision": 1.0
}
now = datetime.now()
domo_metrics = {k: Metric(k, v, None, now) for (k,v) in metrics.items()}
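
The metric values above are entered by hand. As a minimal sketch, they can instead be computed from the CSV string returned by `invoke` and the known labels. The variables `predicted_csv` and `actual` below are stand-ins for `predicted_y` and the `Blue` column of `train_y`, and `computed_metrics` is an illustrative name.

```python
import pandas as pd
from io import StringIO

# Stand-ins for the values produced earlier in the notebook
predicted_csv = "0\n1\n0\n0\n0\n"
actual = pd.Series([0, 1, 0, 0, 0])

# Parse the headerless, single-column CSV of predictions
predicted = pd.read_csv(StringIO(predicted_csv), header=None)[0]

# Confusion-matrix counts for the positive class (1 = blue)
true_pos = int(((predicted == 1) & (actual == 1)).sum())
false_pos = int(((predicted == 1) & (actual == 0)).sum())
false_neg = int(((predicted == 0) & (actual == 1)).sum())

computed_metrics = {
    "accuracy": float((predicted == actual).mean()),
    # Guard against division by zero when a class is absent
    "recall": true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0,
    "precision": true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0,
}
print(computed_metrics)
```

For this dataset all three values come out as 1.0, matching the reference values above.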

## Model Task

Domo lets you specify which task(s) your model is trained to perform, such as `TEXT_GENERATION`, `CLASSIFICATION`, or `OTHER` if you don't see an appropriate `ModelTaskType`.

Model input and output may also be configured as part of the task definition. We'll configure the input and output type as CSV so that we can execute our model using the Model Inference tile in Magic ETL.

In [6]:
from domojupyter.ai import ModelTask, ModelTaskType
from domojupyter.ai import CSVModelIOConfiguration

# Infer the input column names and types from our training dataset
input_config = CSVModelIOConfiguration(data_frame=train_x)
# Infer the output column names and types from our training label dataset
output_config = CSVModelIOConfiguration(data_frame=train_y)
task = ModelTask(ModelTaskType.CLASSIFICATION, input_config=input_config, output_config=output_config)
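
How `CSVModelIOConfiguration` performs its inference is not shown in this notebook. As a rough check before building the task, and assuming the inference is based on the data frame's column names and dtypes, those can be previewed with pandas. The small frames below are stand-ins for `train_x` and `train_y`.

```python
import pandas as pd

# Stand-ins for the training frames defined earlier in the notebook
train_x = pd.DataFrame([['Circle', 'Red'], ['Square', 'Blue']], columns=['Shape', 'Color'])
train_y = pd.DataFrame({'Blue': [0, 1]})

# Collect column name -> pandas dtype for the input and output frames
schemas = {
    label: {column: str(dtype) for column, dtype in frame.dtypes.items()}
    for label, frame in (('input', train_x), ('output', train_y))
}
for label, schema in schemas.items():
    print(label, schema)
```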

## Kernel Snapshots

Domo's Jupyter Workspaces allow you to customize your environment by installing third-party libraries. To ensure that the model hosting environment matches your customized Jupyter environment, a Snapshot is created of the conda environment running the Jupyter kernel.

A Kernel Snapshot is automatically created the first time you create a model in a workspace. If one or more Snapshots already exist, the most recent snapshot is used for your model. If your environment has changed and you need to create a new snapshot, you can call `create_model` with `create_snapshot=True`.

Creating a new snapshot can take several minutes.
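
As a sketch of how that flag might be passed, the wrapper below forwards its arguments to `create_model` and adds `create_snapshot=True`. The wrapper name is hypothetical, and the argument order is assumed to match the `create_model` call used later in this notebook. The import is done inside the function because `domojupyter` is only expected to be available inside a Domo Jupyter Workspace.

```python
def create_model_with_new_snapshot(model_name, entrypoint, extra_files, training, tasks):
    """Hypothetical helper: create a model and request a fresh Kernel Snapshot."""
    # Imported lazily so this definition also loads outside a Domo workspace.
    import domojupyter.ai.model as ml

    # Same call shape as the create_model cell in this notebook, plus create_snapshot=True.
    return ml.create_model(
        model_name,
        entrypoint,
        extra_files,
        training=training,
        tasks=tasks,
        create_snapshot=True,
    )
```

Since a new snapshot can take several minutes, it is probably worth requesting one only after the environment has actually changed.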

## Creating the Model

Now we upload the model to Domo's Model Management interface, where we can compare its performance with other models and deploy it as an endpoint or dataflow tile when ready.

The following information is included:

- name - The name of the model
- entrypoint - The file containing our `invoke` function that is executed once deployed
- files - The serialized model or any other files required to execute our model
- training - Hyperparameters and metrics discovered during training
- tasks - A list of tasks our model supports

In [7]:
from domojupyter.ai import ModelTrainingInformation
import domojupyter.ai.model as ml

model_name = 'Blue Classification'
entrypoint = 'model.py'
extra_files = []
training = ModelTrainingInformation(metrics=domo_metrics, hyperparameters=hyperparameters, algorithm="Custom")
tasks = [task]

ml.create_model(model_name, entrypoint, extra_files, training=training, tasks=tasks)

Creating model
Successfully created model with name: Blue Classification
