# 05Tools: Understanding Model Performance and Fairness with the What-If Tool (WIT)

The [What-If Tool (WIT)](https://pair-code.github.io/what-if-tool/) helps you understand a model's behavior across a wide range of inputs. In this notebook, a model from the 05 series is evaluated with WIT.

This notebook shows how to connect the tool to a model deployed on a Vertex AI Endpoint, after first loading example data from BigQuery into a pandas DataFrame.

### Prerequisites:
- Run at least one of the notebooks in this series [05, 05a-05i]; this notebook uses the endpoint it creates (labeled `series` = `05`)

### Conceptual Flow & Workflow
<p align="center">
  <img alt="Conceptual Flow" src="../architectures/slides/05tools_WIT_arch.png" width="45%">
&nbsp; &nbsp; &nbsp; &nbsp;
  <img alt="Workflow" src="../architectures/slides/05tools_WIT_console.png" width="45%">
</p>

---
## Known Issue & Workaround
The What-If Tool does not appear to work with the JupyterLab versions used in Vertex AI Workbench (see [issue](https://github.com/PAIR-code/what-if-tool/issues/200)). As a workaround, open the notebook in Google Colab. Colab needs two extra steps: installing the `witwidget` package and authenticating to the Google Cloud project. The code in this section covers both; run it after opening the notebook in Colab with the button below:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/05%20-%20TensorFlow/05Tools%20-%20Interpretability%20with%20WIT.ipynb)


In [135]:
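try:
    import google.colab  # only importable when running inside Colab
    # install WIT and upgrade the Google Cloud client libraries
    !pip install --upgrade witwidget -q
    !pip install --upgrade google-cloud-aiplatform -q
    !pip install --upgrade google-cloud-bigquery-storage -q
    # authenticate this Colab session to Google Cloud
    from google.colab import auth
    auth.authenticate_user()
except ImportError:
    pass  # not running in Colab: nothing to do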

The next cell restarts the runtime: it kills the current process, and Colab then restarts the runtime automatically.

In [None]:
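import os
import signal

# kill the current kernel process; Colab restarts the runtime automatically
os.kill(os.getpid(), signal.SIGKILL)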

In [None]:
PROJECT_ID = 'statmike-mlops-349915'  # replace with your own project ID
!gcloud config set project {PROJECT_ID}

---
## Setup

inputs:

In [1]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [2]:
REGION = 'us-central1'
SERIES = '05'

# source data
BQ_PROJECT = PROJECT_ID
BQ_DATASET = 'fraud'
BQ_TABLE = 'fraud_prepped'

# Model Training
VAR_TARGET = 'Class'
VAR_OMIT = 'transaction_id' # space-delimited: add more column names separated by spaces

packages:

In [3]:
from google.cloud import aiplatform
from google.cloud import bigquery

from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value

import witwidget
from witwidget.notebook.visualization import WitWidget, WitConfigBuilder

import numpy as np

clients:

In [4]:
aiplatform.init(project=PROJECT_ID, location=REGION)
bq = bigquery.Client(project=PROJECT_ID)

parameters:

In [5]:
BUCKET = PROJECT_ID

---
## Get Vertex AI Endpoint And Deployed Model

In [7]:
endpoints = aiplatform.Endpoint.list(filter = f"labels.series={SERIES}")
endpoint = endpoints[0]

In [8]:
endpoint.display_name

'05_fraud'

In [9]:
deployed_model = endpoint.list_models()[0]  # first model deployed to the endpoint
model = aiplatform.Model(
    model_name = f"{deployed_model.model}@{deployed_model.model_version_id}"
)

In [10]:
model.display_name

'05i_fraud'

In [11]:
model.versioned_resource_name

'projects/1026793852137/locations/us-central1/models/model_05i_fraud@1'
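
The versioned resource name printed above appears to follow the pattern `<model resource name>@<version id>`, the same form the earlier cell builds for `model_name`. A minimal sketch of splitting such a name back into its parts; `split_versioned_name` is a hypothetical helper, not part of the `google-cloud-aiplatform` SDK, and it assumes the version id is whatever follows the last `@`:

```python
def split_versioned_name(versioned_name: str) -> tuple[str, str]:
    """Split 'projects/.../models/<id>@<version>' into (resource name, version id).

    Hypothetical helper: assumes the version id follows the last '@'.
    """
    resource_name, sep, version_id = versioned_name.rpartition("@")
    if not sep:
        raise ValueError(f"no '@<version>' suffix in {versioned_name!r}")
    return resource_name, version_id


name, version = split_versioned_name(
    "projects/1026793852137/locations/us-central1/models/model_05i_fraud@1"
)
print(name)     # projects/1026793852137/locations/us-central1/models/model_05i_fraud
print(version)  # 1
```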

In [12]:
model.uri

'gs://statmike-mlops-349915/fraud/models/05i/20220728003419/18/model'

## Get Data for Model Exploration
Retrieve the test data for this series:

In [13]:
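# test split only; EXCEPT(...) followed by the target moves the target to the last column
query = f"""SELECT * EXCEPT({VAR_TARGET}), {VAR_TARGET} FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}`
WHERE splits = 'TEST' ORDER BY {VAR_TARGET} DESC"""
pred = bq.query(query = query).to_dataframe()
pred = pred[pred.columns[~pred.columns.isin(VAR_OMIT.split() + ['splits'])]]  # drop omitted columns and `splits`
pred.head()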

| | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 148074 | -2.219219 | 0.727831 | -5.45823 | 5.92485 | 3.932464 | -3.085984 | -1.67787 | 0.865075 | -3.17726 | ... | 0.417472 | -0.817343 | -0.028752 | 0.025723 | -0.825835 | -0.013089 | 0.413291 | -0.131387 | 0.0 | 1 |
| 1 | 129668 | 0.753356 | 2.284988 | -5.164492 | 3.831112 | -0.073622 | -1.316596 | -1.855495 | 0.831079 | -1.567514 | ... | 0.382007 | 0.033958 | 0.187697 | 0.358433 | -0.488934 | -0.258802 | 0.296145 | -0.047174 | 2.0 | 1 |
| 2 | 56887 | -0.075483 | 1.812355 | -2.566981 | 4.127549 | -1.628532 | -0.805895 | -3.390135 | 1.019353 | -2.451251 | ... | 0.794372 | 0.270471 | -0.143624 | 0.013566 | 0.634203 | 0.213693 | 0.773625 | 0.387434 | 5.0 | 1 |
| 3 | 146998 | -2.06424 | 2.629739 | -0.748406 | 0.694992 | 0.418178 | 1.39252 | -1.697801 | -6.333065 | 1.724184 | ... | 6.215514 | -1.276909 | 0.459861 | -1.051685 | 0.209178 | -0.319859 | 0.015434 | -0.050117 | 8.0 | 1 |
| 4 | 78725 | -4.312479 | 1.886476 | -2.338634 | -0.475243 | -1.185444 | -2.112079 | -2.122793 | 0.272565 | 0.290273 | ... | 0.550541 | -0.06787 | -1.114692 | 0.269069 | -0.020572 | -0.963489 | -0.918888 | 0.001454 | 60.0 | 1 |


In [14]:
len(pred.index)

28522

In [15]:
pred.shape

(28522, 31)
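
The 31 columns follow from the query and the column filter: `SELECT * EXCEPT(Class), Class` moves the target to the last position, and the filter drops `transaction_id` (from `VAR_OMIT`) and `splits`. The sketch below replays that rule on a plain list of column names. The source column list and its order are assumptions inferred from the `pred.head()` output, not read from the BigQuery table:

```python
VAR_TARGET = "Class"
VAR_OMIT = "transaction_id"  # space-delimited, as in the Setup section

# Assumed source columns (order is illustrative): id, Time, V1..V28, Amount, splits, Class
source_columns = (["transaction_id", "Time"] + [f"V{i}" for i in range(1, 29)]
                  + ["Amount", "splits", VAR_TARGET])

# SELECT * EXCEPT(Class), Class: every other column first, the target last
selected = [c for c in source_columns if c != VAR_TARGET] + [VAR_TARGET]

# same rule as ~pred.columns.isin(VAR_OMIT.split() + ['splits'])
dropped = set(VAR_OMIT.split() + ["splits"])
kept = [c for c in selected if c not in dropped]

print(len(kept))  # 31  (Time + 28 V columns + Amount + Class)
print(kept[-1])   # Class
```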

## Python Function For Predictions
WIT can connect to a model for predictions in several ways. To demonstrate a custom prediction function, this section builds a Python function that calls the Vertex AI Endpoint.

Try 1: Dictionaries

In [16]:
newobs_dicts = pred.to_dict(orient='records')

In [17]:
#newobs_dicts[0]

In [18]:
def remote_predictor_dicts(obs):
    # WIT passes a list of examples; also accept a single example
    if isinstance(obs, dict):
        obs = [obs]
    predictions = []
    batch_size = 1000
    for i in range(0, len(obs), batch_size):  # send requests in batches
        # drop the target column and convert each example to a protobuf Value
        instances = [
            json_format.ParseDict({key: value for key, value in example.items() if key != VAR_TARGET}, Value())
            for example in obs[i:i+batch_size]
        ]
        predictions.extend(endpoint.predict(instances = instances).predictions)
    return predictions

In [19]:
remote_predictor_dicts(newobs_dicts[0:2])

[[0.00166473, 0.998335302], [0.0168009363, 0.98319906]]
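
The function sends at most `batch_size` instances per `endpoint.predict` call, so 1,000 examples fit in a single request, while the full 28,522-row test set would need 29 requests. The sketch below isolates that batching loop with a stand-in for the endpoint so it can be checked without a deployed model; `batched_predict` and `fake_predict` are hypothetical names, and `fake_predict` just returns fixed scores:

```python
from typing import Callable, Dict, List, Sequence

Example = Dict[str, float]


def batched_predict(
    examples: Sequence[Example],
    predict: Callable[[List[Example]], List[List[float]]],
    target: str = "Class",
    batch_size: int = 1000,
) -> List[List[float]]:
    """Drop the target column and call `predict` once per batch of examples."""
    predictions: List[List[float]] = []
    for start in range(0, len(examples), batch_size):
        batch = examples[start:start + batch_size]
        instances = [{k: v for k, v in ex.items() if k != target} for ex in batch]
        predictions.extend(predict(instances))
    return predictions


calls = []

def fake_predict(instances):
    # Hypothetical stand-in for endpoint.predict(...).predictions:
    # records the batch size and returns [P(Normal), P(Fraud)] per instance.
    calls.append(len(instances))
    return [[0.9, 0.1] for _ in instances]


examples = [{"Time": float(i), "Amount": 1.0, "Class": 0} for i in range(28522)]
scores = batched_predict(examples, fake_predict)
print(len(scores), len(calls), calls[-1])  # 28522 29 522
```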

Try 2: Lists

In [20]:
newobs_lists = pred.values.tolist()
newobs_names = pred.columns.values.tolist()

In [21]:
#newobs_lists[0], newobs_names

In [22]:
def remote_predictor_lists(obs):
    # rebuild each row (a list of values) as a dict keyed by column name
    examples = [dict(zip(newobs_names, ob)) for ob in obs]
    predictions = []
    batch_size = 1000
    for i in range(0, len(examples), batch_size):  # send requests in batches
        # drop the target column and convert each example to a protobuf Value
        instances = [
            json_format.ParseDict({key: value for key, value in example.items() if key != VAR_TARGET}, Value())
            for example in examples[i:i+batch_size]
        ]
        predictions.extend(endpoint.predict(instances = instances).predictions)
    return predictions

In [23]:
remote_predictor_lists(newobs_lists[0:2])

[[0.00166473, 0.998335302], [0.0168009363, 0.98319906]]
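
Both predictors send the same instances: `remote_predictor_lists` only rebuilds each row as a dictionary keyed by `newobs_names` before dropping the target, which is why the two outputs match. A small pure-Python check of that equivalence, using a subset of columns with values copied from the first two rows of `pred.head()` (the variable names here are illustrative):

```python
names = ["Time", "V1", "Amount", "Class"]                         # stands in for newobs_names
rows = [[148074, -2.219219, 0.0, 1], [129668, 0.753356, 2.0, 1]]  # like newobs_lists
records = [                                                       # like pred.to_dict(orient='records')
    {"Time": 148074, "V1": -2.219219, "Amount": 0.0, "Class": 1},
    {"Time": 129668, "V1": 0.753356, "Amount": 2.0, "Class": 1},
]

def strip_target(example, target="Class"):
    # same dict comprehension both predictor functions apply before ParseDict
    return {key: value for key, value in example.items() if key != target}

from_lists = [strip_target(dict(zip(names, row))) for row in rows]
from_dicts = [strip_target(example) for example in records]
print(from_lists == from_dicts)  # True
print(from_lists[0])             # {'Time': 148074, 'V1': -2.219219, 'Amount': 0.0}
```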

---
## Configure WIT
WIT needs two things: a set of examples with which to profile the model, and a connection to the model that generates predictions for the "what-if" scenarios explored during evaluation.

Here the model deployed on the Vertex AI Endpoint is connected by wrapping the endpoint calls in a Python function:
- Guide for [WitConfigBuilder](https://github.com/pair-code/what-if-tool/blob/master/witwidget/notebook/visualization.py)

Input data as dictionaries:
```python
config_builder = (
    WitConfigBuilder(newobs_dicts[0:n_examples])
    .set_custom_predict_fn(remote_predictor_dicts)
    .set_target_feature('Class')
    .set_label_vocab(['Normal', 'Fraud'])
)
```

Input data as lists:
```python
config_builder = (
    WitConfigBuilder(newobs_lists[0:n_examples], newobs_names)
    .set_custom_predict_fn(remote_predictor_lists)
    .set_target_feature('Class')
    .set_label_vocab(['Normal', 'Fraud'])
)
```
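
Whichever input format is used, the function passed to `set_custom_predict_fn` receives a list of examples and, for a classification model, should return one list of class scores per example, ordered like `set_label_vocab` (here `[P(Normal), P(Fraud)]`, consistent with the endpoint output above, where fraud rows score high in the second position). A hedged sketch of a local stand-in with that shape, which could help check the WIT configuration without calling the endpoint; `local_stub_predictor` is hypothetical and its scoring rule is arbitrary, not the deployed model:

```python
from typing import Any, Dict, List, Union


def local_stub_predictor(
    examples: Union[Dict[str, Any], List[Dict[str, Any]]]
) -> List[List[float]]:
    """Hypothetical offline stand-in for remote_predictor_dicts.

    Returns [P(Normal), P(Fraud)] per example using an arbitrary rule
    (larger Amount -> higher fraud score); it is NOT the deployed model.
    """
    if isinstance(examples, dict):  # accept a single example, like the remote version
        examples = [examples]
    scores = []
    for ex in examples:
        p_fraud = min(float(ex.get("Amount", 0.0)) / 1000.0, 1.0)
        scores.append([1.0 - p_fraud, p_fraud])
    return scores


print(local_stub_predictor([{"Amount": 0.0}, {"Amount": 250.0}]))
# [[1.0, 0.0], [0.75, 0.25]]
```

For a quick offline test it could be swapped in as `WitConfigBuilder(newobs_dicts[0:n_examples]).set_custom_predict_fn(local_stub_predictor)`.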

In [144]:
n_examples = 1000
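
Because the query sorts by `Class` in descending order, the slice `newobs_dicts[0:n_examples]` starts with every fraud row (`Class` = 1) before any normal rows, so the examples given to WIT may be far more fraud-heavy than the test set as a whole. If a more representative mix is wanted, one option (an assumption, not what this notebook does) is a seeded random sample of row indices; `sample_indices` is a hypothetical helper:

```python
import random


def sample_indices(n_rows: int, n_examples: int, seed: int = 42) -> list:
    """Return distinct row indices drawn uniformly from range(n_rows), in ascending order."""
    rng = random.Random(seed)  # local, seeded generator for reproducibility
    return sorted(rng.sample(range(n_rows), k=min(n_examples, n_rows)))


idx = sample_indices(28522, 1000)
print(len(idx), len(set(idx)))  # 1000 1000
```

The sampled rows could then replace the slice, e.g. `[newobs_dicts[i] for i in sample_indices(len(newobs_dicts), n_examples)]`.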

In [None]:
config_builder = (
    WitConfigBuilder(newobs_dicts[0:n_examples])
    .set_custom_predict_fn(remote_predictor_dicts)
    .set_target_feature('Class')
    .set_label_vocab(['Normal', 'Fraud'])
)

---
## Explore The Model

In [None]:
WitWidget(config_builder, height = 800)

---
## Example Screenshot

<img src="../architectures/notebooks/05/wit.png">