In [1]:
from arthurai import ArthurAI
from arthurai.common.constants import OutputType, InputType, Stage
import numpy as np
import joblib
import datetime
import time

In [2]:
import sys
sys.path.append("..")
from model_utils import transformations, load_datasets

In this guide, we'll use the credit dataset (and a pre-trained model) to onboard a new model to the Arthur platform. We'll walk through registering the model using a sample of the training data. This is an example of a streaming model.

#### Set up connection
Supply your API key below to authenticate with the platform.

In [3]:
URL = "dev-v3.arthur.ai"
ACCESS_KEY = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdXRob3JpemVkIjp0cnVlLCJjb250ZXh0cyI6W3siY29udGV4dF9pZCI6ImJmZDEyZTcwLWQwMTgtNDlmMy1hYTY3LTRjYzFmOGRhNWZlNiIsImNvbnRleHRfdHlwZSI6Ik9yZ2FuaXphdGlvbiIsInJvbGUiOiJNb2RlbCBPd25lciJ9XSwiZXhwIjoxNjY2MjIyMDAxfQ.8bmVJLwfKYJ1IA6_KCFybNGk1Zn2QklfmL-4CpGOLH0"
connection = ArthurAI(url=URL, access_key=ACCESS_KEY)
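
Hardcoding an access key in a notebook makes it easy to leak. As a minimal sketch, the credentials could instead be read from environment variables. The variable names `ARTHUR_API_URL` and `ARTHUR_API_KEY` and the helper `load_arthur_credentials` are assumptions made for illustration, not part of the Arthur SDK.

```python
import os


def load_arthur_credentials(env=os.environ):
    """Return (url, access_key) read from environment variables.

    ARTHUR_API_URL and ARTHUR_API_KEY are hypothetical names chosen for
    this sketch; use whatever names fit your deployment.
    """
    url = env.get("ARTHUR_API_URL", "dev-v3.arthur.ai")
    access_key = env.get("ARTHUR_API_KEY")
    if not access_key:
        raise RuntimeError("Set ARTHUR_API_KEY before running this notebook")
    return url, access_key


# Usage, mirroring the connection cell above:
# URL, ACCESS_KEY = load_arthur_credentials()
# connection = ArthurAI(url=URL, access_key=ACCESS_KEY)
```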

## Create Model

We'll instantiate a model object with a small amount of metadata about the model input and output types. Then, we'll use a sample of the training data to register the full data schema for this Tabular model.

In [4]:
arthur_model = connection.model(partner_model_id="CreditRiskModel_KC_testing_nans",
                               input_type=InputType.Tabular,
                               output_type=OutputType.Multiclass)

In [5]:
(X_train, Y_train), (X_test, Y_test) = load_datasets("../fixtures/datasets/credit_card_default.csv")

In [6]:
Y_train.head()

24864    0
12073    0
8490     0
19658    0
16536    0
Name: default payment next month, dtype: int64

In [7]:
# X_train["PAY_0"] = np.nan
# X_train["PAY_2"] = None
# X_train["PAY_3"][:100] = np.nan
# X_train["PAY_4"][:100] = None

X_train.head()

Unnamed: 0,LIMIT_BAL,SEX,EDUCATION,MARRIAGE,AGE,PAY_0,PAY_2,PAY_3,PAY_4,PAY_5,...,BILL_AMT3,BILL_AMT4,BILL_AMT5,BILL_AMT6,PAY_AMT1,PAY_AMT2,PAY_AMT3,PAY_AMT4,PAY_AMT5,PAY_AMT6
24864,130000,1,2,1,34,0,0,0,0,0,...,128615,130553,100976,96560,6543,6700,6050,4040,3700,3562
12073,320000,2,1,1,42,0,0,0,0,0,...,103278,62891,64948,66961,9000,6000,3000,3000,3000,3000
8490,10000,1,3,2,21,0,0,2,0,0,...,6154,6154,6280,0,2400,0,0,126,0,0
19658,180000,2,1,2,33,-1,0,0,0,0,...,39634,42625,44935,48436,10000,10000,5000,3000,10000,30000
16536,500000,1,2,2,26,0,0,0,0,0,...,129327,130136,127699,124623,6005,6005,6094,4161,5021,4350


We need to register the data schema for the inputs to the model. Since your model might have hundreds or thousands of input features, you can just pass us a pandas DataFrame of your training data, and we'll handle the rest.

In [8]:
arthur_model.from_dataframe(X_train, Stage.ModelPipelineInput)

We need to register the schema for the outputs of the model: what will a typical prediction look like and what will a typical ground truth look like? What names, shapes, and datatypes should Arthur expect for these objects?

Since this is a binary classification model, we'll do this all in one step with the *.add_binary_classifier_output_attributes()* method. All we need to supply is a mapping that establishes:
  * names for the model's predictions
  * names for the model's ground truth
  * the mapping that relates these two
  
Our classifier will be making predictions about class *0* and class *1* and will return a probability score for each class. Therefore, we'll set up a name *prediction_0* and a name *prediction_1*. Additionally, our ground truth will be either a 0 or 1, but we'll always represent ground truth in the one-hot-encoded form. Therefore, we create two fields called *gt_0* and *gt_1*. We link these all up in a dictionary and pass that to the model.
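
To make the one-hot representation concrete, here is a small sketch of how a single 0/1 label maps onto the *gt_0* and *gt_1* fields. The helper name `one_hot_ground_truth` is hypothetical and used only for illustration.

```python
def one_hot_ground_truth(label):
    """Convert a binary label (0 or 1) into the one-hot gt_0 / gt_1 fields."""
    if label not in (0, 1):
        raise ValueError("label must be 0 or 1")
    # gt_1 is set when the label is the positive class, gt_0 otherwise
    return {"gt_1": label, "gt_0": 1 - label}


print(one_hot_ground_truth(0))
print(one_hot_ground_truth(1))
```

Exactly one of the two fields is 1 for any given record.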

In [9]:
prediction_to_ground_truth_map = {
    "prediction_0": "gt_0",
    "prediction_1": "gt_1"
}

arthur_model.add_binary_classifier_output_attributes("prediction_1", prediction_to_ground_truth_map)

{'prediction_0': ArthurAttribute(name='prediction_0', value_type='FLOAT', stage='PREDICTED_VALUE', id=None, label=None, position=0, categorical=False, min_range=0, max_range=1, monitor_for_bias=False, categories=None, bins=None, is_unique=False, is_positive_predicted_attribute=False, attribute_link='gt_0'),
 'gt_0': ArthurAttribute(name='gt_0', value_type='INTEGER', stage='GROUND_TRUTH', id=None, label=None, position=0, categorical=True, min_range=None, max_range=None, monitor_for_bias=False, categories=[AttributeCategory(value='0', label=None), AttributeCategory(value='1', label=None)], bins=None, is_unique=False, is_positive_predicted_attribute=False, attribute_link='prediction_0'),
 'prediction_1': ArthurAttribute(name='prediction_1', value_type='FLOAT', stage='PREDICTED_VALUE', id=None, label=None, position=1, categorical=False, min_range=0, max_range=1, monitor_for_bias=False, categories=None, bins=None, is_unique=False, is_positive_predicted_attribute=True, attribute_link='gt_1')

Note that the first argument to *.add_binary_classifier_output_attributes()* is the name of the "positive predicted class", for purposes of calculating accuracy metrics. 

Before saving, you can review a model to make sure everything is correct.

In [10]:
arthur_model.review()

Unnamed: 0,name,stage,value_type,categorical,is_unique,categories,bins,range,monitor_for_bias
0,LIMIT_BAL,PIPELINE_INPUT,INTEGER,False,False,[],,"[10000, 1000000]",False
1,SEX,PIPELINE_INPUT,INTEGER,True,False,"[{value: 1}, {value: 2}]",,"[None, None]",False
2,EDUCATION,PIPELINE_INPUT,INTEGER,True,False,"[{value: 0}, {value: 1}, {value: 2}, {value: 3...",,"[None, None]",False
3,MARRIAGE,PIPELINE_INPUT,INTEGER,True,False,"[{value: 0}, {value: 1}, {value: 2}, {value: 3}]",,"[None, None]",False
4,AGE,PIPELINE_INPUT,INTEGER,False,False,[],,"[21, 79]",False
5,PAY_0,PIPELINE_INPUT,INTEGER,True,False,"[{value: 0}, {value: 1}, {value: 2}, {value: 3...",,"[None, None]",False
6,PAY_2,PIPELINE_INPUT,INTEGER,True,False,"[{value: 0}, {value: 1}, {value: 2}, {value: 3...",,"[None, None]",False
7,PAY_3,PIPELINE_INPUT,INTEGER,True,False,"[{value: 0}, {value: 1}, {value: 2}, {value: 3...",,"[None, None]",False
8,PAY_4,PIPELINE_INPUT,INTEGER,True,False,"[{value: 0}, {value: 1}, {value: 2}, {value: 3...",,"[None, None]",False
9,PAY_5,PIPELINE_INPUT,INTEGER,True,False,"[{value: 0}, {value: 2}, {value: 3}, {value: 4...",,"[None, None]",False


In [11]:
# arthur_model.save()
arthur_model= connection.get_model("0a8ac46e-c366-4cf3-9b8f-bd1df6870b94")

### Setting baseline data
Next, we'll use the training data to set a baseline reference for calculating data drift.

For tracking data drift, you can upload a dataset to serve as the baseline or reference set. Often, this is a sample of your training data for the associated model. Our reference dataset should ideally include examples of
  * inputs 
  * ground truth
  * model predictions
  
for a sample of the training set. This way, Arthur can monitor for drift and stability in all of these aspects. 

In [12]:
# load our pre-trained classifier so we can generate predictions
sk_model = joblib.load("../fixtures/serialized_models/credit_model.pkl")

In [13]:
# get all input columns
reference_set = X_train.copy()

# get ground truth labels
reference_set["gt_1"] = Y_train
reference_set["gt_0"] = 1-Y_train

# get model predictions
preds = sk_model.predict_proba(X_train)
reference_set["prediction_1"] = preds[:, 1]
reference_set["prediction_0"] = preds[:, 0]

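# deliberately corrupt the reference set with missing values
reference_set["PAY_0"] = np.nan
reference_set["PAY_2"] = None
# use positional indexing instead of chained assignment so the writes always land on reference_set
reference_set.iloc[:100, reference_set.columns.get_loc("PAY_3")] = np.nan
reference_set.iloc[:100, reference_set.columns.get_loc("PAY_4")] = None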


In [14]:
arthur_model.set_reference_data(data=reference_set)

20900 inferences failed to upload
Reference dataset auto-close was aborted because not all inferences in the reference set were successfully uploaded


 'float64 is unsupported for Avro long',
     'row_number': 903,
     'status': 400},
    {'message': 'float64 is unsupported for Avro long',
     'row_number': 904,
     'status': 400},
    {'message': 'float64 is unsupported for Avro long',
     'row_number': 905,
     'status': 400},
    {'message': 'float64 is unsupported for Avro long',
     'row_number': 906,
     'status': 400},
    {'message': 'float64 is unsupported for Avro long',
     'row_number': 907,
     'status': 400},
    {'message': 'float64 is unsupported for Avro long',
     'row_number': 908,
     'status': 400},
    {'message': 'float64 is unsupported for Avro long',
     'row_number': 909,
     'status': 400},
    {'message': 'float64 is unsupported for Avro long',
     'row_number': 910,
     'status': 400},
    {'message': 'float64 is unsupported for Avro long',
     'row_number': 911,
     'status': 400},
    {'message': 'float64 is unsupported for Avro long',
     'row_number': 912,
     'status': 400},
    {
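
The `float64 is unsupported for Avro long` messages are consistent with a pandas behavior worth knowing: an integer column that contains a missing value is stored as `float64`. The sketch below uses plain pandas with no Arthur calls, and `float_columns` is a hypothetical helper. It shows the dtype change and a simple check that could be run before uploading. Whether this is the only cause of the failures above is an assumption.

```python
import numpy as np
import pandas as pd

# Two toy frames: the same integer columns, with and without a missing value
clean = pd.DataFrame({"PAY_3": [0, 2, -1], "AGE": [34, 42, 21]})
dirty = pd.DataFrame({"PAY_3": [0, np.nan, -1], "AGE": [34, 42, 21]})


def float_columns(df, integer_columns):
    """Return the columns expected to hold integers that pandas stores as floats."""
    return [c for c in integer_columns if pd.api.types.is_float_dtype(df[c])]


print(clean.dtypes)  # both columns are integer typed
print(dirty.dtypes)  # PAY_3 is float64 because it holds a NaN
print(float_columns(dirty, ["PAY_3", "AGE"]))
```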

## Sending Inferences

Load test data and trained model. Let's familiarize ourselves with the data and the model.


In [15]:
X_test.shape

(9000, 23)

In [16]:
sk_model

RandomForestClassifier(max_depth=15, n_estimators=500)

In [17]:
sk_model.predict_proba(X_train.iloc[0:1, :])

array([[0.91252733, 0.08747267]])

To send inferences, we'll iterate through datapoints in the test set and send telemetry to Arthur. You can send inferences one at a time or in a list. We will combine our model inputs and our model predictions into a single dictionary, which is sent under the *inference_data* key.

In [18]:
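# deliberately corrupt X_test with missing values
X_test["PAY_0"] = np.nan
X_test["PAY_2"] = None
# use positional indexing instead of chained assignment so the writes always land on X_test
X_test.iloc[:10, X_test.columns.get_loc("PAY_3")] = np.nan
X_test.iloc[:10, X_test.columns.get_loc("PAY_4")] = None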

for i in range(100):
    datarecord = X_test.iloc[i:i+1, :]
    # np.int was removed from recent NumPy releases; the builtin int is equivalent here
    ground_truth = int(Y_test.iloc[i])
    external_id = str(np.random.randint(1e9))

    inputs = datarecord.to_dict(orient='records')[0]
    try:
        predicted_probs = sk_model.predict_proba(datarecord)[0]
        prediction = {"prediction_1": predicted_probs[1],
                      "prediction_0": predicted_probs[0]}
    except Exception:
        # scoring can fail on rows with missing values; send the inputs without predictions
        prediction = {}
    ground_truth = {"gt_1": ground_truth,
                    "gt_0": 1 - ground_truth}
    cur_time = datetime.datetime.utcnow().isoformat()
    inf_data = inputs.copy()
    inf_data.update(prediction)

    arthur_model.send_inferences([{
        "inference_data": inf_data,
        "ground_truth_data": ground_truth,
        "partner_inference_id" : external_id,
        "inference_timestamp" : cur_time,
        "ground_truth_timestamp" : cur_time
    }])
    
    print("Sent inference with id {}".format(external_id))
    time.sleep(0.001 * np.random.random())

Sent inference with id 72021358
Sent inference with id 767436995
Sent inference with id 75255912
Sent inference with id 145771778
Sent inference with id 41536182
Sent inference with id 249007838
Sent inference with id 237068844
Sent inference with id 294804132
Sent inference with id 513661377
Sent inference with id 433528419
Sent inference with id 675074630
Sent inference with id 847234131
Sent inference with id 346956327
Sent inference with id 344023102
Sent inference with id 368333043
Sent inference with id 93556362
Sent inference with id 26507653
Sent inference with id 298690157
Sent inference with id 563746073
Sent inference with id 265807220
Sent inference with id 420305966
Sent inference with id 837005156
Sent inference with id 522470871
Sent inference with id 718979498
Sent inference with id 728679828
Sent inference with id 674507667
Sent inference with id 715878959
Sent inference with id 277427893
Sent inference with id 266556382
Sent inference with id 664027745
Sent inference 

You can send inferences one at a time, but you can also send them in small batches using the *send_inferences()* method. In that case, you would send a list of dictionaries, each structured like the one above.
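
As a rough sketch of sending a list of dictionaries in small groups, the records could be split into fixed-size chunks first. The helper `chunked`, the chunk size, and the toy payloads are assumptions for illustration. The `send_inferences` call is left commented out because it needs a live connection.

```python
def chunked(records, size):
    """Yield consecutive lists of at most `size` records."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(records), size):
        yield records[start:start + size]


# Toy payloads standing in for the inference dictionaries built in the loop above
records = [{"partner_inference_id": str(i)} for i in range(5)]
batches = list(chunked(records, 2))
print([len(batch) for batch in batches])

# for batch in batches:
#     arthur_model.send_inferences(batch)
```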

If your model scoring system is set up as a batch process that runs as a daily, weekly, or monthly job, then we recommend setting up a batch model with Arthur and using the corresponding *send_batch_inferences()* method. An example batch model can be found [here](../../credit_risk_batch/notebooks/Quickstart.ipynb).