# LABXX: What-if Tool: Model Interpretability Using Mortgage Data 

**Learning Objectives**

1. Create a What-if Tool visualization
2. Explore the XGBoost model with the What-if Tool
 
 
## Introduction 

This notebook shows how to use the [What-if Tool (WIT)](https://pair-code.github.io/what-if-tool/) on a deployed [Cloud AI Platform](https://cloud.google.com/ai-platform/) model. The What-If Tool provides an easy-to-use interface for understanding black-box classification and regression ML models. With the plugin, you can perform inference on a large set of examples and immediately visualize the results in a variety of ways. You can also edit examples manually or programmatically and re-run them through the model to see the effect of the changes. The tool also includes features for investigating model performance and fairness over subsets of a dataset. Its purpose is to give people a simple, intuitive, and powerful way to explore and investigate trained ML models through a visual interface with no code required.

[Extreme Gradient Boosting (XGBoost)](https://xgboost.ai/) is a decision-tree-based ensemble machine learning algorithm that uses a gradient boosting framework. In prediction problems involving unstructured data (images, text, etc.), artificial neural networks tend to outperform other algorithms and frameworks. For small-to-medium structured (tabular) data, however, decision-tree-based algorithms are often considered best-in-class. Please see the chart below for the evolution of tree-based algorithms over the years.

*You don't need your own cloud project* to run this notebook. 

**UPDATE LINK BEFORE PRODUCTION:** Each learning objective corresponds to a __#TODO__ in the [student lab notebook](https://github.com/GoogleCloudPlatform/training-data-analyst/blob/gwendolyn-dev/courses/machine_learning/deepdive2/ml_on_gc/what_if_mortgage.ipynb). Try to complete that notebook first before reviewing this solution notebook.

## Set up environment variables and load necessary libraries 
We will start by importing the necessary libraries for this lab.

In [5]:
import sys
python_version = sys.version_info[0]
print("Python Version: ", python_version)

Python Version:  3


In [None]:
!pip3 install witwidget

In [10]:
import pandas as pd
import numpy as np
import witwidget

from witwidget.notebook.visualization import WitWidget, WitConfigBuilder

## Loading the mortgage test dataset

The model we'll be exploring here is a binary classification model built with XGBoost and trained on a [mortgage dataset](https://www.ffiec.gov/hmda/hmdaflat.htm). It predicts whether or not a mortgage application will be approved. In this section we'll:

* Download some test data from Cloud Storage and load it into NumPy arrays and a pandas DataFrame
* Preview the features for our model in Pandas

In [19]:
# Download our Pandas dataframe and our test features and labels
!gsutil cp gs://mortgage_dataset_files/data.pkl .
!gsutil cp gs://mortgage_dataset_files/x_test.npy .
!gsutil cp gs://mortgage_dataset_files/y_test.npy .

Copying gs://mortgage_dataset_files/data.pkl...
| [1 files][104.0 MiB/104.0 MiB]                                                
Operation completed over 1 objects/104.0 MiB.                                    
Copying gs://mortgage_dataset_files/x_test.npy...
/ [1 files][172.0 KiB/172.0 KiB]                                                
Operation completed over 1 objects/172.0 KiB.                                    
Copying gs://mortgage_dataset_files/y_test.npy...
/ [1 files][  628.0 B/  628.0 B]                                                
Operation completed over 1 objects/628.0 B.                                      


## Preview the Features 

Preview the features from our model as a pandas DataFrame

In [41]:
features = pd.read_pickle('data.pkl')
features.head()

Unnamed: 0,as_of_year,occupancy,loan_amt_thousands,county_code,applicant_income_thousands,population,ffiec_median_fam_income,tract_to_msa_income_pct,num_owner_occupied_units,num_1_to_4_family_units,...,"purchaser_type_Life insurance company, credit union, mortgage bank, or finance company",purchaser_type_Loan was not originated or was not sold in calendar year covered by register,purchaser_type_Other type of purchaser,purchaser_type_Private securitization,hoepa_status_HOEPA loan,hoepa_status_Not a HOEPA loan,lien_status_Not applicable (purchased loans),lien_status_Not secured by a lien,lien_status_Secured by a first lien,lien_status_Secured by a subordinate lien
310650,2016,1,110.0,119.0,55.0,5930.0,64100.0,98.81,1305.0,1631.0,...,0,0,0,0,0,1,0,0,1,0
630129,2016,1,480.0,33.0,270.0,4791.0,90300.0,144.06,1420.0,1450.0,...,0,1,0,0,0,1,0,0,1,0
715484,2016,2,240.0,59.0,96.0,3439.0,105700.0,104.62,853.0,1076.0,...,0,0,0,0,0,1,0,0,1,0
887708,2016,1,76.0,65.0,85.0,3952.0,61300.0,90.93,1272.0,1666.0,...,0,1,0,0,0,1,0,0,0,1
719598,2016,1,100.0,127.0,70.0,2422.0,46400.0,88.37,650.0,1006.0,...,0,1,0,0,0,1,0,0,1,0


In [42]:
features.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 999999 entries, 310650 to 875688
Data columns (total 44 columns):
as_of_year                                                                                     999999 non-null int16
occupancy                                                                                      999999 non-null int8
loan_amt_thousands                                                                             999999 non-null float64
county_code                                                                                    999999 non-null float64
applicant_income_thousands                                                                     999999 non-null float64
population                                                                                     999999 non-null float64
ffiec_median_fam_income                                                                        999999 non-null float64
tract_to_msa_income_pct                                 

## Load the test features and labels into numpy arrays

Developing machine learning models in Python often requires NumPy arrays. Recall that NumPy, which stands for Numerical Python, is a library of multidimensional array objects and routines for processing those arrays. NumPy arrays are efficient data structures, and machine learning libraries such as scikit-learn and Keras typically accept input data and return predictions as NumPy arrays. As a result, it is common to save NumPy arrays to files, as was done for the test features and labels downloaded above.

Note that the output of `features.info()` reports the following dtypes: float64(8), int16(1), int8(1), uint8(34). There are no strings or "object" columns, so every feature is numeric. Let's now load the test features and labels into NumPy arrays.

In [43]:
x_test = np.load('x_test.npy')
y_test = np.load('y_test.npy')
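As a side illustration of how `.npy` files like these can be produced and read back, here is a minimal sketch of a save/load round trip. The arrays `toy_x` and `toy_y` and their values are made up for this example; they are not the lab's data, and this is not necessarily how the lab's files were created.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-ins for the lab's test features and labels.
toy_x = np.array([[2016.0, 1.0, 417.0], [2016.0, 2.0, 276.0]])
toy_y = np.array([1, 0])

with tempfile.TemporaryDirectory() as tmp_dir:
    x_path = os.path.join(tmp_dir, "toy_x.npy")
    y_path = os.path.join(tmp_dir, "toy_y.npy")

    # np.save writes one array per .npy file, keeping its shape and dtype.
    np.save(x_path, toy_x)
    np.save(y_path, toy_y)

    # np.load reads each file back into an in-memory ndarray.
    loaded_x = np.load(x_path)
    loaded_y = np.load(y_path)

print(loaded_x.shape, loaded_x.dtype)
print(loaded_y.shape, loaded_y.dtype)
```

Because shape and dtype are stored in the file, the loaded arrays can be used directly without re-parsing or casting.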

Let's take a look at the contents of the 'x_test.npy' file.  You can see the "array" structure.

In [44]:
print(x_test)

[[2.016e+03 1.000e+00 4.170e+02 ... 0.000e+00 1.000e+00 0.000e+00]
 [2.016e+03 1.000e+00 2.760e+02 ... 0.000e+00 1.000e+00 0.000e+00]
 [2.016e+03 1.000e+00 6.000e+01 ... 0.000e+00 1.000e+00 0.000e+00]
 ...
 [2.016e+03 1.000e+00 5.000e+02 ... 0.000e+00 0.000e+00 0.000e+00]
 [2.016e+03 1.000e+00 1.100e+02 ... 0.000e+00 1.000e+00 0.000e+00]
 [2.016e+03 1.000e+00 3.680e+02 ... 0.000e+00 1.000e+00 0.000e+00]]


## Combine the features and labels into one array for the What-if Tool

Note that `numpy.hstack()` stacks a sequence of input arrays horizontally (i.e., column-wise) into a single array. In the following cell, the label array `y_test` is first reshaped with `.reshape(-1, 1)` into a single-column matrix so that it can be stacked to the right of the feature matrix `x_test`. Each row of the result then holds one example's features followed by its label.

In [45]:
test_examples = np.hstack((x_test,y_test.reshape(-1,1)))
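To make the shapes concrete, here is a small sketch of the same reshape-and-stack step on made-up data. The names `toy_x`, `toy_y`, `label_column`, and `toy_examples` are hypothetical and exist only for this illustration.

```python
import numpy as np

# Hypothetical data: 3 examples with 2 features each, plus 3 labels.
toy_x = np.array([[417.0, 1.0], [276.0, 0.0], [60.0, 1.0]])  # shape (3, 2)
toy_y = np.array([1, 0, 1])                                   # shape (3,)

# reshape(-1, 1) turns the flat label array into a single column;
# -1 lets NumPy infer the number of rows.
label_column = toy_y.reshape(-1, 1)                           # shape (3, 1)

# hstack appends the label column to the right of the features.
toy_examples = np.hstack((toy_x, label_column))               # shape (3, 3)

print(toy_examples.shape)
print(toy_examples.tolist())
```

Without the reshape, `np.hstack` would raise an error here, because a 2-D array and a 1-D array cannot be concatenated along the column axis.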

## Using the What-if Tool to interpret our model
With our test examples ready, we can now connect our model to the What-if Tool using the `WitWidget`. To use the What-if Tool with Cloud AI Platform, we need to send it:
* A Python list of our test features + ground truth labels
* Optionally, the names of our columns
* Our Cloud project, model, and version name (we've created a public one for you to play around with)

See the "What-if Tool exploration using the XGBoost Model" section at the end of this notebook for some exploration ideas.

##  Create a What-if Tool visualization

The `adjust_prediction` function used in the cells below is needed because this XGBoost model returns just a score for the positive class of the binary classification, whereas the What-If Tool expects a list of scores, one per class (in this case, the negative class followed by the positive class).
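As a quick illustration of that adjustment, the sketch below applies it to a single made-up score and maps the result onto the label vocabulary. The values of `sample_score` and the helper variables `scores` and `predicted_label` are assumptions for this example only.

```python
def adjust_prediction(pred):
    """Expand one positive-class score into [negative score, positive score]."""
    return [1 - pred, pred]


# Label order must match the score order: index 0 = negative, index 1 = positive.
label_vocab = ["denied", "approved"]

sample_score = 0.75  # hypothetical positive-class score from the model
scores = adjust_prediction(sample_score)

# Pick the label whose score is highest.
predicted_label = label_vocab[scores.index(max(scores))]

print(scores)
print(predicted_label)
```

The two scores sum to 1, so they can be read as the model's estimated probabilities for "denied" and "approved".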



**NOTE:** The WIT may take a minute to load. While it is loading, review the parameters defined in the next cell, but do NOT run it; it is provided for reference only.

In [None]:

# ******** DO NOT RUN THIS CELL ********

# TODO 1

PROJECT_ID = 'YOUR_PROJECT_ID'
MODEL_NAME = 'YOUR_MODEL_NAME'
VERSION_NAME = 'YOUR_VERSION_NAME'
TARGET_FEATURE = 'mortgage_status'
LABEL_VOCAB = ['denied', 'approved']

# TODO 1a

config_builder = (WitConfigBuilder(test_examples.tolist(), features.columns.tolist() + ['mortgage_status'])
  .set_ai_platform_model(PROJECT_ID, MODEL_NAME, VERSION_NAME, adjust_prediction=adjust_prediction)
  .set_target_feature(TARGET_FEATURE)
  .set_label_vocab(LABEL_VOCAB))

Run this cell to build the WIT configuration and display the widget. **NOTE:** The WIT may take a minute to load.

In [24]:
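# TODO 1b

# The deployed XGBoost model returns only the positive-class ("approved") score.
# WIT expects one score per class, in label-vocab order: [denied, approved].
def adjust_prediction(pred):
  return [1 - pred, pred]

# Pass the examples as a Python list, plus the column names (features + label column).
config_builder = (WitConfigBuilder(test_examples.tolist(), features.columns.tolist() + ['mortgage_status'])
  .set_ai_platform_model('wit-caip-demos', 'xgb_mortgage', 'v1', adjust_prediction=adjust_prediction)
  .set_target_feature('mortgage_status')
  .set_label_vocab(['denied', 'approved']))
WitWidget(config_builder, height=800)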

WitWidget(config={'use_aip': True, 'model_name': 'xgb_mortgage', 'uses_json_list': True, 'get_explanations': T…

## What-if Tool exploration using the XGBoost Model

#### TODO 2

* **Individual data points**: The default graph shows all data points from the test set, colored by their ground truth label (approved or denied)
  * Try selecting data points close to the middle and tweaking some of their feature values. Then run inference again to see if the model prediction changes
  * Select a data point and then move the "Show nearest counterfactual datapoint" slider to the right. This will highlight a data point with feature values closest to your original one, but with a different prediction
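The idea behind the nearest counterfactual can be sketched in a few lines: among the points with a different predicted label, find the one closest to the selected point. This is a simplified illustration using raw L1 distance on made-up data; the function `nearest_counterfactual` and the toy arrays are hypothetical, and the tool's own distance computation may differ (for example, in how features are scaled).

```python
import numpy as np


def nearest_counterfactual(features, predicted_labels, index):
    """Return the index of the closest row (L1 distance) whose predicted
    label differs from that of row `index`, or None if there is none."""
    candidates = np.flatnonzero(predicted_labels != predicted_labels[index])
    if candidates.size == 0:
        return None
    distances = np.abs(features[candidates] - features[index]).sum(axis=1)
    return int(candidates[np.argmin(distances)])


# Hypothetical data: 4 examples with 2 features, and the model's predicted labels.
toy_features = np.array([
    [100.0, 50.0],
    [110.0, 55.0],
    [300.0, 40.0],
    [105.0, 52.0],
])
toy_predictions = np.array([1, 1, 0, 0])

match = nearest_counterfactual(toy_features, toy_predictions, index=0)
print(match)
```

Comparing the selected point with its counterfactual shows which small feature differences coincide with a flipped prediction.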

####  TODO 2a

* **Binning data**: Create separate graphs for individual features
  * From the "Binning - X axis" dropdown, try selecting one of the agency codes, for example "Department of Housing and Urban Development (HUD)". This will create two separate graphs: one for loan applications from the HUD (graph labeled 1) and one for all other agencies (graph labeled 0). This shows us that loans from this agency are more likely to be denied.
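Binning on a one-hot column is roughly equivalent to grouping the examples by that column and comparing outcomes per group. The sketch below shows this with pandas on made-up data; the column name `agency_is_hud` and all values are hypothetical and do not come from the mortgage dataset.

```python
import pandas as pd

# Hypothetical examples: a binary agency indicator and the ground truth label
# (1 = approved, 0 = denied).
toy = pd.DataFrame({
    "agency_is_hud": [1, 1, 1, 0, 0, 0, 0, 0],
    "mortgage_status": [0, 0, 1, 1, 1, 0, 1, 1],
})

# The mean of a 0/1 label within each bin is that bin's approval rate.
approval_rate_by_bin = toy.groupby("agency_is_hud")["mortgage_status"].mean()

print(approval_rate_by_bin)
```

In this toy data the bin labeled 1 has a lower approval rate than the bin labeled 0, which is the kind of pattern the two binned graphs make visible.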

####  TODO 2b

* **Exploring overall performance**: Click on the "Performance & Fairness" tab to view overall performance statistics on the model's results on the provided dataset, including confusion matrices, PR curves, and ROC curves.
   * Experiment with the threshold slider, raising and lowering the positive classification score the model needs to return before it decides to predict "approved" for the loan, and see how it changes accuracy, false positives, and false negatives.
   * On the left side "Slice by" menu, select "loan_purpose_Home purchase". You'll now see performance on the two subsets of your data: the "0" slice shows when the loan is not for a home purchase, and the "1" slice is for when the loan is for a home purchase. Notice that the model's false positive rate is much higher on loans for home purchases. If you expand the rows to look at the confusion matrices, you can see that the model predicts "approved" more often for home purchase loans.
   * You can use the optimization buttons on the left side to have the tool auto-select different positive classification thresholds for each slice in order to achieve different goals. If you select the "Demographic parity" button, then the two thresholds will be adjusted so that the model predicts "approved" for a similar percentage of applicants in both slices. What does this do to the accuracy, false positives and false negatives for each slice?
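The effect of the threshold slider and of per-slice thresholds can be sketched with plain NumPy. The code below is an illustration on made-up scores, not the tool's implementation: the helpers `confusion_counts` and `approval_rate`, the toy arrays, and the choice to treat scores at or above the threshold as "approved" are all assumptions.

```python
import numpy as np


def confusion_counts(scores, labels, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are predicted positive."""
    predicted = scores >= threshold
    actual = labels == 1
    return {
        "tp": int(np.sum(predicted & actual)),
        "fp": int(np.sum(predicted & ~actual)),
        "tn": int(np.sum(~predicted & ~actual)),
        "fn": int(np.sum(~predicted & actual)),
    }


def approval_rate(scores, threshold):
    """Fraction of examples predicted positive ("approved") at this threshold."""
    return float(np.mean(scores >= threshold))


# Hypothetical positive-class scores, ground truth labels, and a binary slice feature.
toy_scores = np.array([0.9, 0.8, 0.6, 0.4, 0.3, 0.7, 0.55, 0.2])
toy_labels = np.array([1, 1, 0, 1, 0, 1, 0, 0])
toy_slice = np.array([1, 1, 1, 1, 0, 0, 0, 0])

# Raising the threshold trades false positives for false negatives.
low = confusion_counts(toy_scores, toy_labels, threshold=0.5)
high = confusion_counts(toy_scores, toy_labels, threshold=0.75)
print(low)
print(high)

# With one shared threshold, the two slices are approved at different rates.
rate_slice_1 = approval_rate(toy_scores[toy_slice == 1], 0.5)
rate_slice_0 = approval_rate(toy_scores[toy_slice == 0], 0.5)

# A higher threshold for slice 1 brings its approval rate in line with slice 0,
# which is the goal of demographic parity.
rate_slice_1_adjusted = approval_rate(toy_scores[toy_slice == 1], 0.7)
print(rate_slice_1, rate_slice_0, rate_slice_1_adjusted)
```

Equalizing approval rates this way changes each slice's false positives and false negatives separately, which is the trade-off the question above asks you to examine.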


Copyright 2020 Google Inc.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.