In [1]:
from arthurai import ArthurAI
from arthurai.client.apiv3 import InputType, OutputType, Stage
import numpy as np
import joblib
import datetime
import time
import pandas as pd

In [2]:
import sys
sys.path.append("..")
from model_utils import transformations, load_datasets

In this guide, we'll use the credit dataset (and a pre-trained model) to onboard a new model to the Arthur platform. We'll walk through registering the model using a sample of the training data. This is an example of a batch model.

#### Set up connection
Supply your API key below to authenticate with the platform.

In [3]:
URL = "app.arthur.ai"
ACCESS_KEY = "..."

connection = ArthurAI(url=URL, access_key=ACCESS_KEY, client_version=3)
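
If you would rather not paste the key into the notebook, one option is to read it from an environment variable. The sketch below is only an illustration: the variable name `ARTHUR_API_KEY` and the helper `read_access_key` are hypothetical and not part of the Arthur SDK.

```python
import os


def read_access_key(env_var="ARTHUR_API_KEY"):
    """Return the API key stored in `env_var` (a hypothetical name), or raise if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key


# Possible usage in the cell above (assumes the variable was exported beforehand):
# ACCESS_KEY = read_access_key()
```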

## Create Model

We'll instantiate a model object with a small amount of metadata about the model input and output types. Then, we'll use a sample of the training data to register the full data schema for this Tabular model.

In [24]:
arthur_model = connection.model(partner_model_id="CreditRisk_BatchDeployment_v0.0.1",
                               input_type=InputType.Tabular,
                               output_type=OutputType.Multiclass,
                               is_batch=True)

In [25]:
(X_train, Y_train), (X_test, Y_test) = load_datasets("../fixtures/datasets/credit_card_default.csv")

In [6]:
Y_train.head()

15693    0
27174    0
24501    0
15849    0
27983    1
Name: default payment next month, dtype: int64

In [7]:
X_train.head()

Unnamed: 0,LIMIT_BAL,SEX,EDUCATION,MARRIAGE,AGE,PAY_0,PAY_2,PAY_3,PAY_4,PAY_5,...,BILL_AMT3,BILL_AMT4,BILL_AMT5,BILL_AMT6,PAY_AMT1,PAY_AMT2,PAY_AMT3,PAY_AMT4,PAY_AMT5,PAY_AMT6
15693,360000,1,1,2,29,0,0,0,0,0,...,209680,213527,174907,126309,100000,8270,8423,6753,5139,5000
27174,190000,1,2,2,38,0,0,0,0,-1,...,5029,6494,678,11928,1100,1300,2000,678,11514,0
24501,100000,1,2,2,32,-1,2,-1,-1,-1,...,4100,7980,0,9487,0,4107,7980,0,9487,8333
15849,110000,2,1,1,28,0,0,0,0,0,...,111125,106828,84729,82910,11602,4238,4316,11190,0,3000
27983,60000,2,1,2,28,0,0,2,0,0,...,10232,11237,11427,20973,4278,0,1483,673,10000,11540


We need to register the data schema for the inputs to the model. Since your model might have hundreds or thousands of input features, you can just pass us a pandas DataFrame of your training data, and we'll handle the rest.

In [None]:
arthur_model.from_dataframe(X_train, Stage.ModelPipelineInput)
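
To build some intuition for what inferring a schema from a DataFrame involves, here is a rough sketch that records each column's dtype and then either its category values or its numeric range. It is an assumption-laden illustration, not how `from_dataframe` is implemented: the helper `sketch_schema` and the `max_categories` cutoff are hypothetical.

```python
import pandas as pd


def sketch_schema(df, max_categories=20):
    """Summarize each column as categorical (few distinct values) or continuous (min/max range)."""
    rows = []
    for name in df.columns:
        col = df[name]
        # Hypothetical rule: a column with few distinct values is treated as categorical.
        is_categorical = col.nunique(dropna=True) <= max_categories
        rows.append({
            "name": name,
            "dtype": str(col.dtype),
            "categorical": is_categorical,
            "categories": sorted(col.dropna().unique().tolist()) if is_categorical else [],
            "range": [None, None] if is_categorical else [col.min(), col.max()],
        })
    return pd.DataFrame(rows)


# Toy data using two column names from the credit dataset
toy = pd.DataFrame({
    "LIMIT_BAL": [360000, 190000, 100000, 110000, 60000],
    "SEX": [1, 1, 1, 2, 2],
})
schema = sketch_schema(toy, max_categories=3)
print(schema)
```

The layout loosely mirrors the table that *arthur_model.review()* prints, which is the authoritative view of what was actually registered.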

We need to register the schema for the outputs of the model: what will a typical prediction look like and what will a typical ground truth look like? What names, shapes, and datatypes should Arthur expect for these objects?

Since this is a binary classification model, we'll do this all in one step with the *.add_binary_classifier_output_attributes()* method. All we need to supply is a mapping that establishes:
  * names for the model's predictions
  * names for the model's ground truth
  * the mapping that relates these two
  
Our classifier makes predictions about class *0* and class *1* and returns a probability score for each class, so we'll set up the names *prediction_0* and *prediction_1*. The ground truth will be either 0 or 1, but we'll always represent it in one-hot-encoded form, so we create two fields called *gt_0* and *gt_1*. We link these all up in a dictionary and pass that to the model.

In [26]:
prediction_to_ground_truth_map = {
    "prediction_0": "gt_0",
    "prediction_1": "gt_1"
}

arthur_model.add_binary_classifier_output_attributes("prediction_1", prediction_to_ground_truth_map)

{'prediction_0': <arthurai.client.apiv3.attributes.ArthurAttribute at 0x1214656d0>,
 'gt_0': <arthurai.client.apiv3.attributes.ArthurAttribute at 0x12b17ff90>,
 'prediction_1': <arthurai.client.apiv3.attributes.ArthurAttribute at 0x122448490>,
 'gt_1': <arthurai.client.apiv3.attributes.ArthurAttribute at 0x122448f90>}

Note that the first argument to *.add_binary_classifier_output_attributes()* is the name of the "positive predicted class", for purposes of calculating accuracy metrics. 
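
As a small illustration of the one-hot ground truth form used here, the sketch below turns a toy 0/1 label series into *gt_0* and *gt_1* columns. The label values are made up for the example.

```python
import pandas as pd

# Toy binary labels (1 = default); made-up values for illustration
labels = pd.Series([0, 0, 1, 0, 1], name="default payment next month")

# One-hot form: gt_1 is the label itself, gt_0 is its complement
ground_truth = pd.DataFrame({"gt_0": 1 - labels, "gt_1": labels})

# Exactly one of the two columns is 1 in every row
rows_are_one_hot = bool((ground_truth.sum(axis=1) == 1).all())
print(ground_truth)
```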

Before saving, you can review a model to make sure everything is correct.

In [11]:
arthur_model.review()

Unnamed: 0,name,stage,value_type,categorical,is_unique,categories,range,monitor_for_bias
0,gt_0,GROUND_TRUTH,INTEGER,True,False,"[{value: 0}, {value: 1}]","[None, None]",False
1,gt_1,GROUND_TRUTH,INTEGER,True,False,"[{value: 0}, {value: 1}]","[None, None]",False
2,LIMIT_BAL,PIPELINE_INPUT,INTEGER,False,False,[],"[10000, 1000000]",False
3,SEX,PIPELINE_INPUT,INTEGER,True,False,"[{value: 1}, {value: 2}]","[None, None]",True
4,EDUCATION,PIPELINE_INPUT,INTEGER,True,False,"[{value: 0}, {value: 1}, {value: 2}, {value: 3...","[None, None]",True
5,MARRIAGE,PIPELINE_INPUT,INTEGER,True,False,"[{value: 0}, {value: 1}, {value: 2}, {value: 3}]","[None, None]",False
6,AGE,PIPELINE_INPUT,INTEGER,False,False,[],"[21, 75]",False
7,PAY_0,PIPELINE_INPUT,INTEGER,True,False,"[{value: 0}, {value: 1}, {value: 2}, {value: 3...","[None, None]",False
8,PAY_2,PIPELINE_INPUT,INTEGER,True,False,"[{value: 0}, {value: 1}, {value: 2}, {value: 3...","[None, None]",False
9,PAY_3,PIPELINE_INPUT,INTEGER,True,False,"[{value: 0}, {value: 1}, {value: 2}, {value: 3...","[None, None]",False


In [28]:
arthur_model.save()

'cc4d356d-a828-4c52-9650-505595c9ea20'

### Setting baseline data
For tracking data drift, you can upload a dataset to serve as the baseline or reference set. Often, this is a sample of your training data for the associated model. Our reference dataset should ideally include examples of
  * inputs 
  * ground truth
  * model predictions
  
for a sample of the training set. This way, Arthur can monitor for drift and stability in all of these aspects. 
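
To make "drift relative to a reference set" a little more concrete, here is a sketch of one common drift statistic, the population stability index (PSI), computed for a single numeric column. This is only an example of the general idea: nothing in this guide says which metrics Arthur computes, and the function name, bin count, and `eps` floor are assumptions.

```python
import numpy as np


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """PSI between two 1-D samples, using bins defined by quantiles of the reference sample."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)

    # Bin edges from reference quantiles; duplicates are dropped for low-cardinality data
    edges = np.unique(np.quantile(reference, np.linspace(0.0, 1.0, n_bins + 1)))
    n = max(len(edges) - 1, 1)

    # Assign by interior edges so out-of-range values fall into the first or last bin
    ref_idx = np.searchsorted(edges[1:-1], reference, side="right")
    cur_idx = np.searchsorted(edges[1:-1], current, side="right")
    ref_frac = np.bincount(ref_idx, minlength=n) / reference.size
    cur_frac = np.bincount(cur_idx, minlength=n) / current.size

    # Floor the fractions so the logarithm is defined for empty bins
    ref_frac = np.clip(ref_frac, eps, None)
    cur_frac = np.clip(cur_frac, eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))


rng = np.random.default_rng(0)
reference_sample = rng.normal(0.0, 1.0, 5000)
similar_sample = rng.normal(0.0, 1.0, 5000)
shifted_sample = rng.normal(1.0, 1.0, 5000)

psi_similar = population_stability_index(reference_sample, similar_sample)
psi_shifted = population_stability_index(reference_sample, shifted_sample)
```

A sample drawn from the same distribution as the reference gives a PSI near zero, while the shifted sample gives a clearly larger value.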

In [29]:
# load our pre-trained classifier so we can generate predictions
sk_model = joblib.load("../fixtures/serialized_models/credit_model.pkl")

In [30]:
# get all input columns
reference_set = X_train.copy()

# get ground truth labels
reference_set["gt_1"] = Y_train
reference_set["gt_0"] = 1-Y_train

# get model predictions
preds = sk_model.predict_proba(X_train)
reference_set["prediction_1"] = preds[:, 1]
reference_set["prediction_0"] = preds[:, 0]


Note that the column names of inputs, predictions, and ground truths need to match the schema we established at model onboarding. Use *arthur_model.review()* to remind yourself of your naming conventions.
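
A quick local check can catch naming mismatches before uploading. The sketch below compares a DataFrame's columns against a list of expected names; the helper `missing_columns` is hypothetical, and the expected names are typed in by hand here rather than read from the registered model.

```python
import pandas as pd


def missing_columns(df, expected):
    """Return the expected column names that are absent from df, sorted alphabetically."""
    return sorted(set(expected) - set(df.columns))


# Toy frame standing in for a reference set; note the misspelled prediction column
toy_reference = pd.DataFrame({
    "LIMIT_BAL": [360000, 190000],
    "gt_0": [1, 0],
    "gt_1": [0, 1],
    "prediction_0": [0.93, 0.41],
    "predicton_1": [0.07, 0.59],
})

expected = ["LIMIT_BAL", "gt_0", "gt_1", "prediction_0", "prediction_1"]
print(missing_columns(toy_reference, expected))
```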

In [31]:
arthur_model.set_reference_data(data=reference_set)

{'counts': {'success': 21000, 'failure': 0, 'total': 21000}, 'failures': [[]]}

## Sending Batches of Inferences

Load test data and trained model. Let's familiarize ourselves with the data and the model.


In [15]:
X_test.shape

In [16]:
sk_model

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
                       max_depth=15, max_features='auto', max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=500,
                       n_jobs=None, oob_score=False, random_state=None,
                       verbose=0, warm_start=False)

In [18]:
sk_model.predict_proba(X_train.iloc[0:1, :])[0]

array([0.93286639, 0.06713361])

There are a few columns in the uploaded dataframe that we should take special note of.

First, the predictions/scores from your model should match the column names in the registered schema. Recalling the schema we registered earlier (see *arthur_model.review()*), the columns we created correspond to the classifier's output probabilities over the classes ("prediction_1" and "prediction_0") and the corresponding ground truth over the possible classes in one-hot form ("gt_1" and "gt_0").

Aside from these model-specific columns, some standard columns are needed to identify inferences and batches.
* First, each inference needs a unique identifier so that it can later be joined with ground truth. Include a column named **partner_inference_id** and ensure these IDs are unique across batches. For example, if you run predictions across your customer base on a daily batch cadence, then a unique identifier could be composed of your customer_id plus the date.
* Second, each inference needs to be associated with a **batch_id**; this ID is shared by all inferences in the same batch.
* Finally, each inference needs an **inference_timestamp**; these don't have to be unique.
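
The customer_id-plus-date idea from the list above can be sketched as follows. The helper `make_inference_ids`, the ID format, and the toy customer IDs are all hypothetical; any scheme works as long as the resulting IDs are unique across batches.

```python
import datetime

import pandas as pd


def make_inference_ids(customer_ids, batch_date):
    """Build one ID per customer by appending the batch date, e.g. '101-20201011'."""
    date_part = batch_date.strftime("%Y%m%d")
    return [f"{customer_id}-{date_part}" for customer_id in customer_ids]


customers = [101, 102, 103]  # toy customer IDs
batch_date = datetime.date(2020, 10, 11)

id_columns = pd.DataFrame({
    "partner_inference_id": make_inference_ids(customers, batch_date),
    # one batch_id shared by every row in this daily batch
    "batch_id": "batch_" + batch_date.isoformat(),
    # timestamps may repeat across rows
    "inference_timestamp": datetime.datetime(2020, 10, 11, 18, 20, 54),
})
print(id_columns)
```

Because the date is part of the ID, rerunning the same customers on a later day produces new IDs rather than duplicates.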

We'll use our classifier to score a batch of inputs and then assemble those inputs and predictions into a dataframe with the matching column names.

In [47]:
num_batches = 50
batch_size = 3000

for i in range(num_batches):
    # random batch label for this demo (not guaranteed to be unique across iterations)
    batch_id = "batch_{}".format(np.random.randint(int(1e3)))

    # sample a batch of rows (with replacement) from the test set and give each row a random id;
    # random ids can collide, so in production prefer a deterministic key (e.g. customer_id plus date)
    rows_inds = np.random.randint(X_test.shape[0], size=batch_size)
    batch_inputs_df = X_test.iloc[rows_inds, :]
    inference_id = [str(num) for num in np.random.randint(int(1e11), size=batch_size, dtype=np.int64)]

    # calculate predictions on those rows, fetch ground truth for those rows
    batch_predictions = sk_model.predict_proba(batch_inputs_df)
    batch_ground_truths = Y_test.values[rows_inds]

    # next, put these components together into the dataframe sent to Arthur,
    # starting with all input columns to the model
    batch_df = batch_inputs_df.copy()

    # include the model prediction columns and partner_inference_id
    batch_df["prediction_0"] = batch_predictions[:, 0]
    batch_df["prediction_1"] = batch_predictions[:, 1]
    batch_df["partner_inference_id"] = inference_id

    # also include a batch_id for the upload, and a timestamp for all the inferences
    batch_df["batch_id"] = batch_id
    batch_df["inference_timestamp"] = [datetime.datetime.utcnow()] * batch_size

    # send the batch of inputs and predictions
    arthur_model.send_batch_inferences(data=batch_df)

    # assemble the inference-wise ground truth and upload
    ground_truth_df = pd.DataFrame({"partner_inference_id": inference_id,
                                    "gt_0": 1 - batch_ground_truths,
                                    "gt_1": batch_ground_truths,
                                    "batch_id": "gt_" + batch_id,
                                    "ground_truth_timestamp": [datetime.datetime.utcnow()] * batch_size})
    arthur_model.send_batch_ground_truths(data=ground_truth_df)
    
    



## Sending Batches of Ground Truth

Realistically, there will be some delay before you have ground truth for your model's predictions. Whether that ground truth is accessible after one minute or one year, the *send_batch_ground_truths()* method can be called at any later time. The ground truth (labels) will be joined with their corresponding predictions to yield accuracy measures.

Note that the dataframe here contains similar required columns:
 * **partner_inference_id**
 * **batch_id**
 * **ground_truth_timestamp**
 
It also contains columns that are specific to our model and match the ground truth schema that we established at onboarding (*gt_0* and *gt_1*).
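
To see why **partner_inference_id** matters, the sketch below joins a few toy predictions to late-arriving ground truth and computes accuracy locally with pandas. It only illustrates the join; it is not how the platform computes its metrics, and the 0.5 decision threshold and all data values are assumptions.

```python
import pandas as pd

# Toy predictions sent earlier; inference "d" has no ground truth yet
inferences = pd.DataFrame({
    "partner_inference_id": ["a", "b", "c", "d"],
    "prediction_0": [0.9, 0.2, 0.6, 0.3],
    "prediction_1": [0.1, 0.8, 0.4, 0.7],
})

# Ground truth that arrived later, in one-hot form
ground_truth = pd.DataFrame({
    "partner_inference_id": ["a", "b", "c"],
    "gt_0": [1, 0, 0],
    "gt_1": [0, 1, 1],
})

# Inner join keeps only the inferences that already have labels
joined = inferences.merge(ground_truth, on="partner_inference_id", how="inner")

# Assumed decision rule: predict the positive class when prediction_1 >= 0.5
predicted_positive = joined["prediction_1"] >= 0.5
actual_positive = joined["gt_1"] == 1
accuracy = float((predicted_positive == actual_positive).mean())
```

Inferences without labels simply drop out of the join, so accuracy is computed only over the rows whose ground truth has arrived.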

In [43]:
ground_truth_df.head()

Unnamed: 0,partner_inference_id,gt_0,gt_1,batch_id,ground_truth_timestamp
0,88483590978,1,0,gt_batch_968gt_batch_968gt_batch_968gt_batch_9...,2020-10-11 18:20:54.491177
1,7996408539,0,1,gt_batch_968gt_batch_968gt_batch_968gt_batch_9...,2020-10-11 18:20:54.491177
2,92548791191,1,0,gt_batch_968gt_batch_968gt_batch_968gt_batch_9...,2020-10-11 18:20:54.491177
3,89802159216,1,0,gt_batch_968gt_batch_968gt_batch_968gt_batch_9...,2020-10-11 18:20:54.491177
4,47576849716,1,0,gt_batch_968gt_batch_968gt_batch_968gt_batch_9...,2020-10-11 18:20:54.491177


In [46]:
arthur_model.send_batch_ground_truths(data=ground_truth_df)

{'counts': {'success': 3000, 'failure': 0, 'total': 3000}, 'failures': [[]]}