In [1]:
from arthurai import ArthurAI
from arthurai.common.constants import InputType, OutputType, Stage
import numpy as np
import joblib
import datetime
import time
import pandas as pd

In [2]:
import sys
sys.path.append("..")
from model_utils import transformations, load_datasets

In this guide, we'll use the credit dataset (and a pre-trained model) to onboard a new model to the Arthur platform. We'll walk through registering the model using a sample of the training data. This is an example of a batch model.

#### Set up connection
Supply your API key below to authenticate with the platform.

In [3]:
URL = "dev-v3.arthur.ai"
ACCESS_KEY = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdXRob3JpemVkIjp0cnVlLCJjb250ZXh0cyI6W3siY29udGV4dF9pZCI6ImJmZDEyZTcwLWQwMTgtNDlmMy1hYTY3LTRjYzFmOGRhNWZlNiIsImNvbnRleHRfdHlwZSI6Ik9yZ2FuaXphdGlvbiIsInJvbGUiOiJBZG1pbmlzdHJhdG9yIn1dLCJleHAiOjE2MDk5NjI4ODksInVzZXJfaWQiOiI1NDhlZmMxZi0yODZkLTQ0MDctOWU0YS1hM2NhMjgwMDJkMGEifQ.NfsYkMuzlAI1234UzqyRkSfD9Gorg17oJFwDM3JnwHU"

connection = ArthurAI(url=URL, access_key=ACCESS_KEY)
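
A key pasted directly into a notebook is easy to leak when the notebook is shared. As a minimal sketch, the key could instead be read from an environment variable. The variable name `ARTHUR_API_KEY` and the helper `load_access_key` are hypothetical choices for this example, not part of the Arthur SDK.

```python
import os


def load_access_key(env_var: str = "ARTHUR_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The default variable name is a hypothetical choice for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before connecting.")
    return key


# Example usage (assumes the variable was exported in your shell beforehand):
# ACCESS_KEY = load_access_key()
# connection = ArthurAI(url=URL, access_key=ACCESS_KEY)
```

Failing early with a clear message is usually easier to debug than an authentication error raised later by the platform.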

## Create Model

We'll instantiate a model object with a small amount of metadata about the model input and output types. Then, we'll use a sample of the training data to register the full data schema for this Tabular model.

In [4]:
arthur_model = connection.model(partner_model_id="CreditRisk_BatchDeployment_v0.0.1-Harrison-Test",
                               input_type=InputType.Tabular,
                               output_type=OutputType.Multiclass,
                               is_batch=True)

In [5]:
(X_train, Y_train), (X_test, Y_test) = load_datasets("../fixtures/datasets/credit_card_default.csv")

In [6]:
Y_train.head()

19260    1
6871     0
1915     0
19927    0
20055    0
Name: default payment next month, dtype: int64

In [7]:
X_train.head()

Unnamed: 0,LIMIT_BAL,SEX,EDUCATION,MARRIAGE,AGE,PAY_0,PAY_2,PAY_3,PAY_4,PAY_5,...,BILL_AMT3,BILL_AMT4,BILL_AMT5,BILL_AMT6,PAY_AMT1,PAY_AMT2,PAY_AMT3,PAY_AMT4,PAY_AMT5,PAY_AMT6
19260,250000,2,1,2,29,2,2,2,2,2,...,125721,122209,126129,127847,5900,4500,0,5954,3900,4826
6871,360000,1,1,1,58,-1,-1,-1,-1,-1,...,43000,4650,12132,13953,15008,43065,4650,12132,13953,2200
1915,240000,2,1,2,29,0,0,0,-1,-1,...,22081,3428,67,1832,5009,2000,5000,0,3332,0
19927,330000,2,2,1,39,1,-1,-1,-1,2,...,0,404,404,24420,1380,0,404,0,24420,480
20055,230000,2,1,1,45,-2,-2,-2,-2,-2,...,0,0,0,136,0,0,0,0,136,1233


We need to register the data schema for the inputs to the model. Since your model might have hundreds or thousands of input features, you can just pass us a pandas DataFrame of your training data, and we'll handle the rest.

In [8]:
arthur_model.from_dataframe(X_train, Stage.ModelPipelineInput)
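
To build intuition for what a schema inferred from a DataFrame can look like, here is a rough sketch that summarizes each column's dtype, whether it looks categorical, and its value range. This is not Arthur's actual inference logic: the helper `summarize_schema` and the "few distinct values means categorical" heuristic are assumptions made for illustration.

```python
import pandas as pd


def summarize_schema(df: pd.DataFrame, max_categories: int = 20) -> pd.DataFrame:
    """Summarize each column: dtype, whether it looks categorical, and value range."""
    rows = []
    for name in df.columns:
        col = df[name]
        n_unique = col.nunique(dropna=True)
        is_numeric = pd.api.types.is_numeric_dtype(col)
        # Heuristic (an assumption of this sketch): few distinct values -> categorical.
        categorical = n_unique <= max_categories
        rows.append({
            "name": name,
            "dtype": str(col.dtype),
            "categorical": categorical,
            "n_unique": n_unique,
            # Only report a range for numeric, non-categorical columns.
            "min": col.min() if is_numeric and not categorical else None,
            "max": col.max() if is_numeric and not categorical else None,
        })
    return pd.DataFrame(rows)


# Two columns taken from the X_train.head() output shown earlier.
example = pd.DataFrame({
    "LIMIT_BAL": [250000, 360000, 240000, 330000, 230000],
    "SEX": [2, 1, 2, 2, 2],
})
schema = summarize_schema(example, max_categories=2)
print(schema)
```

Comparing a summary like this with the output of *arthur_model.review()* is one way to spot columns whose registered type differs from what you expected.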

We need to register the schema for the outputs of the model: what will a typical prediction look like and what will a typical ground truth look like? What names, shapes, and datatypes should Arthur expect for these objects?

Since this is a binary classification model, we'll do this all in one step with the *.add_binary_classifier_output_attributes()* method. All we need to supply is a mapping that establishes:
  * names for the model's predictions
  * names for the model's ground truth
  * the mapping that relates these two
  
Our classifier makes predictions about class *0* and class *1* and returns a probability score for each class. Therefore, we'll set up the names *prediction_0* and *prediction_1*. Additionally, our ground truth will be either a 0 or a 1, but we'll always represent ground truth in one-hot-encoded form. Therefore, we create two fields called *gt_0* and *gt_1*. We link these all up in a dictionary and pass that to the model.
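
As a small illustration of the one-hot form described above, the following sketch converts a Series of 0/1 labels into *gt_0* and *gt_1* columns. The helper name `one_hot_binary` is hypothetical; the labels are the five values shown in the `Y_train.head()` output.

```python
import pandas as pd


def one_hot_binary(labels: pd.Series) -> pd.DataFrame:
    """Turn 0/1 labels into the two one-hot columns gt_0 and gt_1."""
    labels = labels.astype(int)
    if not labels.isin([0, 1]).all():
        raise ValueError("labels must contain only 0 and 1")
    # Exactly one of gt_0 / gt_1 is 1 in every row.
    return pd.DataFrame({"gt_0": 1 - labels, "gt_1": labels})


labels = pd.Series([1, 0, 0, 0, 0], index=[19260, 6871, 1915, 19927, 20055])
encoded = one_hot_binary(labels)
print(encoded)
```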

In [9]:
prediction_to_ground_truth_map = {
    "prediction_0": "gt_0",
    "prediction_1": "gt_1"
}

arthur_model.add_binary_classifier_output_attributes("prediction_1", prediction_to_ground_truth_map)

{'prediction_0': ArthurAttribute(name='prediction_0', value_type='FLOAT', stage='PREDICTED_VALUE', id=None, label=None, position=0, categorical=False, min_range=0, max_range=1, monitor_for_bias=False, categories=None, bins=None, is_unique=False, is_positive_predicted_attribute=False, attribute_link='gt_0'),
 'gt_0': ArthurAttribute(name='gt_0', value_type='INTEGER', stage='GROUND_TRUTH', id=None, label=None, position=0, categorical=True, min_range=None, max_range=None, monitor_for_bias=False, categories=[AttributeCategory(value='0', label=None), AttributeCategory(value='1', label=None)], bins=None, is_unique=False, is_positive_predicted_attribute=False, attribute_link='prediction_0'),
 'prediction_1': ArthurAttribute(name='prediction_1', value_type='FLOAT', stage='PREDICTED_VALUE', id=None, label=None, position=1, categorical=False, min_range=0, max_range=1, monitor_for_bias=False, categories=None, bins=None, is_unique=False, is_positive_predicted_attribute=True, attribute_link='gt_1')

Note that the first argument to *.add_binary_classifier_output_attributes()* is the name of the "positive predicted class", for purposes of calculating accuracy metrics. 

Before saving, you can review a model to make sure everything is correct.

In [10]:
arthur_model.review()

Unnamed: 0,name,stage,value_type,categorical,is_unique,categories,bins,range,monitor_for_bias
0,LIMIT_BAL,PIPELINE_INPUT,INTEGER,False,False,[],,"[10000, 1000000]",False
1,SEX,PIPELINE_INPUT,INTEGER,True,False,"[{value: 1}, {value: 2}]",,"[None, None]",False
2,EDUCATION,PIPELINE_INPUT,INTEGER,True,False,"[{value: 0}, {value: 1}, {value: 2}, {value: 3...",,"[None, None]",False
3,MARRIAGE,PIPELINE_INPUT,INTEGER,True,False,"[{value: 0}, {value: 1}, {value: 2}, {value: 3}]",,"[None, None]",False
4,AGE,PIPELINE_INPUT,INTEGER,False,False,[],,"[21, 79]",False
5,PAY_0,PIPELINE_INPUT,INTEGER,True,False,"[{value: 0}, {value: 1}, {value: 2}, {value: 3...",,"[None, None]",False
6,PAY_2,PIPELINE_INPUT,INTEGER,True,False,"[{value: 0}, {value: 1}, {value: 2}, {value: 3...",,"[None, None]",False
7,PAY_3,PIPELINE_INPUT,INTEGER,True,False,"[{value: 0}, {value: 1}, {value: 2}, {value: 3...",,"[None, None]",False
8,PAY_4,PIPELINE_INPUT,INTEGER,True,False,"[{value: 0}, {value: 2}, {value: 3}, {value: 4...",,"[None, None]",False
9,PAY_5,PIPELINE_INPUT,INTEGER,True,False,"[{value: 0}, {value: 2}, {value: 3}, {value: 4...",,"[None, None]",False


In [11]:
arthur_model.save()

'6b679915-9882-4027-89a5-36f4399e1484'

### Setting baseline data
For tracking data drift, you can upload a dataset to serve as the baseline or reference set. Often, this is a sample of your training data for the associated model. Our reference dataset should ideally include examples of
  * inputs 
  * ground truth
  * model predictions
  
for a sample of the training set. This way, Arthur can monitor for drift and stability in all of these aspects. 
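
To make the idea of drift against a reference set concrete, here is a sketch of one commonly used drift statistic, the population stability index (PSI), computed for a single numeric feature. This guide does not say which drift metrics Arthur computes, so treat the choice of PSI, the function name, and the synthetic age data as assumptions for illustration only.

```python
import numpy as np


def population_stability_index(reference, current, n_bins: int = 10) -> float:
    """Compare two samples of one numeric feature; 0.0 means identical binned distributions."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    # Bin edges come from the reference sample so both samples share the same bins.
    edges = np.histogram_bin_edges(reference, bins=n_bins)
    # Clip so that values outside the reference range fall into the end bins.
    current = np.clip(current, edges[0], edges[-1])
    ref_frac = np.histogram(reference, bins=edges)[0] / reference.size
    cur_frac = np.histogram(current, bins=edges)[0] / current.size
    # A small floor avoids division by zero and log(0) for empty bins.
    eps = 1e-6
    ref_frac = np.clip(ref_frac, eps, None)
    cur_frac = np.clip(cur_frac, eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))


# Synthetic data: one sample from the same distribution, one shifted upward.
rng = np.random.default_rng(0)
reference_ages = rng.normal(35, 9, size=5000)
same_ages = rng.normal(35, 9, size=5000)
older_ages = rng.normal(45, 9, size=5000)

psi_same = population_stability_index(reference_ages, same_ages)
psi_shifted = population_stability_index(reference_ages, older_ages)
print(psi_same, psi_shifted)
```

The shifted sample yields a much larger value than the sample drawn from the same distribution, which is the kind of signal a reference set makes possible.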

In [12]:
# load our pre-trained classifier so we can generate predictions
sk_model = joblib.load("../fixtures/serialized_models/credit_model.pkl")

The sklearn.ensemble.forest module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.ensemble. Anything that cannot be imported from sklearn.ensemble is now part of the private API.
The sklearn.tree.tree module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.tree. Anything that cannot be imported from sklearn.tree is now part of the private API.
Trying to unpickle estimator DecisionTreeClassifier from version 0.21.3 when using version 0.23.2. This might lead to breaking code or invalid results. Use at your own risk.
Trying to unpickle estimator RandomForestClassifier from version 0.21.3 when using version 0.23.2. This might lead to breaking code or invalid results. Use at your own risk.


In [13]:
# get all input columns
reference_set = X_train.copy()

# get ground truth labels
reference_set["gt_1"] = Y_train
reference_set["gt_0"] = 1-Y_train

# get model predictions
preds = sk_model.predict_proba(X_train)
reference_set["prediction_1"] = preds[:, 1]
reference_set["prediction_0"] = preds[:, 0]


Note that the column names of inputs, predictions, and ground truths need to match the schema we established at model onboarding. Use *arthur_model.review()* to remind yourself of your naming conventions.
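
A quick local check can catch naming mismatches before anything is uploaded. The sketch below compares a dataframe's columns against a list of expected names. The helper `find_schema_mismatches` is hypothetical, and the expected list here covers only the four output columns; in this notebook you would likely also include the input column names from `X_train`.

```python
import pandas as pd

EXPECTED_OUTPUT_COLUMNS = ["prediction_0", "prediction_1", "gt_0", "gt_1"]


def find_schema_mismatches(df: pd.DataFrame, expected_columns) -> dict:
    """Report expected columns that are absent and columns that were not expected."""
    expected = list(expected_columns)
    return {
        "missing": [c for c in expected if c not in df.columns],
        "unexpected": [c for c in df.columns if c not in expected],
    }


# "gt1" is a deliberate typo to show what the report looks like.
frame = pd.DataFrame({
    "prediction_0": [0.17], "prediction_1": [0.83], "gt_0": [0], "gt1": [1],
})
report = find_schema_mismatches(frame, EXPECTED_OUTPUT_COLUMNS)
print(report)
```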

In [14]:
arthur_model.set_reference_data(data=reference_set)

({'counts': {'success': 21000, 'failure': 0, 'total': 21000},
  'failures': [[]]},
 {'dataset_close_result': 'success'})

## Sending Batches of Inferences

Load test data and trained model. Let's familiarize ourselves with the data and the model.


In [15]:
X_test.shape

(9000, 23)

In [16]:
sk_model

From version 0.24, get_params will raise an AttributeError if a parameter cannot be retrieved as an instance attribute. Previously it would return None.


RandomForestClassifier(ccp_alpha=None, max_depth=15, n_estimators=500)

In [17]:
sk_model.predict_proba(X_train.iloc[0:1, :])[0]

array([0.1731164, 0.8268836])

There are a few columns in the uploaded dataframe that we should take special note of.

The predictions/scores from your model should match the column names in the registered schema. If we look back at *arthur_model.review()*, we'll recall that the columns we created correspond to the classifier's output probabilities over the classes ("prediction_1" and "prediction_0") and the corresponding ground truth over the possible classes in one-hot form ("gt_1" and "gt_0").

Aside from these model-specific columns, some standard columns are needed to identify inferences and batches.
* First, each inference needs a unique identifier so that it can later be joined with ground truth. Include a column named **partner_inference_id** and ensure these IDs are unique across batches. For example, if you run predictions across your customer base on a daily-batch cadence, then a unique identifier could be composed of your customer_id plus the date.  
* Second, each inference needs to be associated with a **batch_id**, but this id will be shared among one or more inferences. 
* Finally, each inference needs an **inference_timestamp** and these don't have to be unique.
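
The "customer_id plus the date" idea from the list above can be sketched as follows. The helper `make_inference_ids` and the `<customer_id>_<YYYY-MM-DD>` format are illustrative assumptions; any scheme works as long as the resulting IDs stay unique across batches.

```python
import datetime

import pandas as pd


def make_inference_ids(customer_ids, batch_date: datetime.date) -> pd.Series:
    """Build IDs of the form '<customer_id>_<YYYY-MM-DD>' and verify they are unique."""
    ids = pd.Series([f"{cid}_{batch_date.isoformat()}" for cid in customer_ids])
    if ids.duplicated().any():
        raise ValueError("customer_ids must not repeat within a batch")
    return ids


# The same customers scored on two consecutive days get distinct IDs.
day_one = make_inference_ids([101, 102, 103], datetime.date(2021, 1, 4))
day_two = make_inference_ids([101, 102, 103], datetime.date(2021, 1, 5))
print(day_one.tolist())
```

Deterministic IDs like these also make it straightforward to rebuild the same identifier later, when the ground truth for that customer and date becomes available.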

We'll use our classifier to score a batch of inputs and then assemble those inputs and predictions into a dataframe with the matching column names.

In [18]:
import os
def make_data_folders(path):
    if not os.path.exists(f"{path}/inferences"):
        os.makedirs(f"{path}/inferences")
    if not os.path.exists(f"{path}/ground_truth"):
        os.makedirs(f"{path}/ground_truth")

num_batches = 1
batch_ids = []

for i in range(num_batches):
    batch_size=3000
    batch_id = f"batch_{np.random.randint(1e3)}"
    batch_ids.append(batch_id)
    file_loc = f"../inference_data/{batch_id}"
    make_data_folders(file_loc)

    # generate a small batch of rows from the test set, create unique id for each row
    rows_inds = np.random.randint(X_test.shape[0], size=batch_size)
    batch_inputs_df = X_test.iloc[rows_inds, :]
    inference_id = [str(num) for num in np.random.randint(1e11, size=batch_size)]
    
    # calculate predictions on those rows, fetch ground truth for those rows
    batch_predictions = sk_model.predict_proba(batch_inputs_df)
    batch_ground_truths = Y_test.values[rows_inds]
    
    # Next, we put these components together into the dataframe sent to Arthur
    # include all input columns to the model
    batch_df = batch_inputs_df.copy()
    
    # need to include model prediction columns, and partner_inference_id
    batch_df["prediction_0"] = batch_predictions[:, 0]
    batch_df["prediction_1"] = batch_predictions[:, 1]
    batch_df["partner_inference_id"] = inference_id
    # add a timestamp for all the inferences; the batch_id is passed separately when the batch is sent
    batch_df["inference_timestamp"]=[(datetime.datetime.utcnow())]*batch_size
    # Save batch data to a parquet file
    batch_df.to_parquet(f"{file_loc}/inferences/data.parquet")
    
    
    # assemble the inference-wise groundtruth and upload
    ground_truth_df = pd.DataFrame({"partner_inference_id":inference_id,
                                   "gt_0": 1 - batch_ground_truths ,
                                   "gt_1": batch_ground_truths,
                                   "ground_truth_timestamp":[(datetime.datetime.utcnow())]*batch_size})
    # Save ground truth data to a parquet file
    ground_truth_df.to_parquet(f"{file_loc}/ground_truth/data.parquet")

Once the inference and ground truth data are saved to parquet files, they can later be retrieved and sent to the platform.

In [19]:
for batch_id in batch_ids:
    file_loc = f"../inference_data/{batch_id}"
    res = arthur_model.send_batch_inferences(directory_path=f"{file_loc}/inferences", batch_id=batch_id)
    print(res)
    
    res = arthur_model.send_batch_ground_truths(directory_path=f"{file_loc}/ground_truth")
    print(res)

({'counts': {'success': 3000, 'failure': 0, 'total': 3000}, 'failures': [[]]}, {'dataset_close_result': 'success'})
{'counts': {'success': 3000, 'failure': 0, 'total': 3000}, 'failures': [[]]}


## Sending Batches of Ground Truth

Realistically, there will be some delay before you have ground truth for your model's predictions. Whether that ground truth is accessible after one minute or one year, the *send_batch_ground_truths()* method can be called at any later time. The ground truth (labels) will be joined with their corresponding predictions to yield accuracy measures.

Note that the dataframe here contains similar required columns such as:
 * **partner_inference_id**
 * **batch_id**
 * **ground_truth_timestamp**
 
And then it also contains columns that are specific to our model and match the ground truth schema that we established at onboarding (*gt_0* and *gt_1*).
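
To illustrate how late-arriving ground truth can be matched to earlier predictions, here is a local pandas sketch of a join on **partner_inference_id**. It only mimics the idea; it is not how the platform performs the join. The toy data and the 0.5 decision threshold on *prediction_1* are assumptions.

```python
import pandas as pd

inferences = pd.DataFrame({
    "partner_inference_id": ["a", "b", "c", "d"],
    "prediction_0": [0.9, 0.2, 0.6, 0.3],
    "prediction_1": [0.1, 0.8, 0.4, 0.7],
})
# Ground truth for inference "d" has not arrived yet.
ground_truth = pd.DataFrame({
    "partner_inference_id": ["a", "b", "c"],
    "gt_0": [1, 0, 0],
    "gt_1": [0, 1, 1],
})

# A left join keeps every inference; the indicator column marks which ones matched.
joined = inferences.merge(ground_truth, on="partner_inference_id", how="left", indicator=True)
matched = joined[joined["_merge"] == "both"]

# Predict the positive class when prediction_1 is at least 0.5 (threshold is an assumption).
predicted_positive = (matched["prediction_1"] >= 0.5).astype(int)
accuracy = (predicted_positive == matched["gt_1"]).mean()
pending = joined.loc[joined["_merge"] == "left_only", "partner_inference_id"].tolist()
print(accuracy, pending)
```

Inferences listed as pending simply have no label yet; accuracy is computed only over the rows where both sides of the join are present.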

In [None]:
ground_truth_df.head()

In [20]:
arthur_model.send_batch_ground_truths(data=ground_truth_df)
# Can also supply the directory with the ground truth parquet files
# arthur_model.send_batch_ground_truths(directory_path=f"{file_loc}/ground_truth")

{'counts': {'success': 3000, 'failure': 0, 'total': 3000}, 'failures': [[]]}