# End-to-end Responsible AI lifecycle walkthrough

The goal of this notebook is to walk you through a concrete use case by following the three phases of the [ML workflow](https://www.microsoft.com/en-us/research/publication/software-engineering-for-machine-learning-a-case-study/) and applying the most prominent recommendations from the Responsible AI lifecycle at each stage. This will be done in a cloud-native manner by leveraging [Azure ML MLOps capabilities]().

This use case uses the well-known [UCI adult census dataset](https://archive.ics.uci.edu/ml/datasets/Adult). For our purposes, we treat this as a loan decision classification problem: we pretend that the label indicates whether each individual repaid a loan in the past, and we use the data to train a predictor that estimates whether previously unseen individuals will repay a loan. The assumption is that the model's predictions will be used to decide whether an individual should be offered a loan.


## Initial Setup

### Connecting to your Azure ML workspace

In [19]:
import azureml.core
from azureml.core import Workspace

ws = Workspace.from_config()
print(ws.name, ws.location, ws.resource_group, sep='\t')

abdou-azure-ml-workspace	francecentral	abdou-resources-group


The above cell creates a workspace object from the existing workspace. ``Workspace.from_config()`` reads the file ``config.json`` and loads the details into an object named ``ws``. The compute instance has a copy of this file saved in its root directory. If you run the code elsewhere, you'll need to [create the file](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-configure-environment#workspace).
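
For reference, here is a minimal sketch of what such a file contains, assuming the three keys that `Workspace.from_config()` looks for (`subscription_id`, `resource_group`, `workspace_name`). The workspace and resource group names are taken from the output above; the subscription ID is a placeholder, and the file is written to a temporary directory so that no real `config.json` is overwritten.

```python
import json
import tempfile
from pathlib import Path

# Names taken from the cell output above; the subscription ID is a placeholder.
workspace_config = {
    "subscription_id": "<your-subscription-id>",
    "resource_group": "abdou-resources-group",
    "workspace_name": "abdou-azure-ml-workspace",
}

# Written to a temporary directory here so that no real config.json is overwritten.
config_dir = Path(tempfile.mkdtemp())
config_path = config_dir / "config.json"
config_path.write_text(json.dumps(workspace_config, indent=4))

# Read the file back to confirm it is valid JSON with the expected keys.
loaded_config = json.loads(config_path.read_text())
print(sorted(loaded_config))
```

If the file lives somewhere other than the working directory, its location can be passed explicitly through the `path` argument of `Workspace.from_config()`.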

## Data loading

We use the **adult census** dataset, which we obtain through the **shap** library. Let's load it and take a first look at the data.

In [20]:
import shap # Data is collected through the shap library
import pandas as pd

# Load the adult census dataset
X_raw, Y = shap.datasets.adult()
# Features and label side by side (the label becomes a column named 'label', not the index)
df = X_raw.assign(label=Y)
print("X_raw shape:", X_raw.shape)
X_raw.head()

X_raw shape: (32561, 12)


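| | Age | Workclass | Education-Num | Marital Status | Occupation | Relationship | Race | Sex | Capital Gain | Capital Loss | Hours per week | Country |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39.0 | 7 | 13.0 | 4 | 1 | 0 | 4 | 1 | 2174.0 | 0.0 | 40.0 | 39 |
| 1 | 50.0 | 6 | 13.0 | 2 | 4 | 4 | 4 | 1 | 0.0 | 0.0 | 13.0 | 39 |
| 2 | 38.0 | 4 | 9.0 | 0 | 6 | 0 | 4 | 1 | 0.0 | 0.0 | 40.0 | 39 |
| 3 | 53.0 | 4 | 7.0 | 2 | 6 | 4 | 2 | 1 | 0.0 | 0.0 | 40.0 | 39 |
| 4 | 28.0 | 4 | 13.0 | 2 | 10 | 5 | 2 | 0 | 0.0 | 0.0 | 40.0 | 5 |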



## Data preprocessing and cleaning

### Identifying and handling the missing values

In [26]:
# Number of missing values over all columns
X_raw.isna().sum().sum()

0

All features above look numeric. However, some of them are just numeric codes for features that are really categorical.
For more accurate results, we therefore separate the categorical features from the truly numeric ones.

In [3]:
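import numpy as np
print(X_raw.dtypes)

# Integer-coded columns are treated as categorical, float columns as numeric.
# Caveat: only int8 and int32 are matched, so a column stored as int64 ('Relationship' in the
# output below) lands in neither group and is dropped by the ColumnTransformer (remainder='drop').
categorical_features_indices = np.where(np.logical_or(X_raw.dtypes == np.int8, X_raw.dtypes == np.int32))[0]
print('categorical_features_indices:', categorical_features_indices)

numeric_features_indices = np.where(X_raw.dtypes == np.float32)[0]
print('numeric_features_indices:', numeric_features_indices)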

from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

column_transformer = ColumnTransformer ([
    ('onehot', OneHotEncoder(handle_unknown='ignore'),
    categorical_features_indices),
    ('scaler', StandardScaler(),
    numeric_features_indices)
])

Age               float32
Workclass            int8
Education-Num     float32
Marital Status       int8
Occupation           int8
Relationship        int64
Race                 int8
Sex                  int8
Capital Gain      float32
Capital Loss      float32
Hours per week    float32
Country              int8
dtype: object
categorical_features_indices: [ 1  3  4  6  7 11]
numeric_features_indices: [ 0  2  8  9 10]
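
The exact-width comparison above depends on how each column happens to be stored (`Relationship` is `int64` in this run and matches neither test). A minimal sketch of a width-independent alternative follows, assuming that every integer-typed column is a category code. The small frame is a hypothetical stand-in for `X_raw`, not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for X_raw with the same mix of dtypes as the output above.
sample = pd.DataFrame({
    "Age": np.array([39.0, 50.0], dtype=np.float32),
    "Workclass": np.array([7, 6], dtype=np.int8),
    "Relationship": np.array([0, 4], dtype=np.int64),
    "Capital Gain": np.array([2174.0, 0.0], dtype=np.float32),
})

# dtype.kind is 'i' for any signed integer width and 'f' for any float width.
kinds = np.array([dtype.kind for dtype in sample.dtypes])
categorical_idx = np.where(kinds == "i")[0]
numeric_idx = np.where(kinds == "f")[0]

print("categorical:", categorical_idx)
print("numeric:", numeric_idx)
```

Selecting by kind keeps the two index lists exhaustive, so no column is silently dropped by a `ColumnTransformer` that uses `remainder='drop'`.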


In [4]:
from sklearn.preprocessing import LabelEncoder
le=LabelEncoder()
print("Before label encoding:",Y) # --> [False False False  ... False False True]
Y=le.fit_transform(Y)
print("After label encoding:",Y) # --> [0 0 0  ... 0 0 1]

Before label encoding: [False False False ... False False  True]
After label encoding: [0 0 0 ... 0 0 1]


## Data split and feature enrichment

In [5]:
from sklearn.model_selection import train_test_split

A=X_raw[['Sex', 'Race']]

X_train, X_test, Y_train, Y_test, A_train, A_test = train_test_split(
    X_raw, Y, A,
    test_size=0.2, random_state=0, stratify=Y)

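# reset_index() returns a new DataFrame rather than modifying in place. The original calls
# discarded that result, so the splits keep their original row labels (as the output below shows).
# Assign the result (e.g. X_test = X_test.reset_index(drop=True)) if a fresh 0..n-1 index is wanted.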

print("X_raw shape: {}, X_train shape: {}, X_test shape: {}".format(
    X_raw.shape, X_train.shape, X_test.shape))
    
# Test dataframe: feature enrichment.
# Replace the numeric codes of the sensitive features with readable labels.
# Working on an explicit copy and using .map() avoids the pandas warning
# 'A value is trying to be set on a copy of a slice from a DataFrame'.
A_test = A_test.copy()
A_test['Sex'] = A_test['Sex'].map({0: 'female', 1: 'male'})
A_test['Race'] = A_test['Race'].map({
    0: 'Amer-Indian-Eskimo',
    1: 'Asian-Pac-Islander',
    2: 'Black',
    3: 'Other',
    4: 'White'})

A_test.head()

X_raw shape: (32561, 12), X_train shape: (26048, 12), X_test shape: (6513, 12)


| | Sex | Race |
|---|---|---|
| 13077 | male | White |
| 25002 | male | Asian-Pac-Islander |
| 23777 | female | White |
| 71 | female | Black |
| 955 | male | White |


## Training

### Logistic regression

In [6]:
# Train your first classification model with Logistic Regression
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression

clf = Pipeline(steps=[
    ('preprocessor', column_transformer),
    ('classifier_LR', LogisticRegression(solver='liblinear', fit_intercept=True))])

unmitigated_predictor1 = clf.fit(X_train, Y_train)
print('unmitigated_predictor1.score:', unmitigated_predictor1.score(X_test, Y_test))

unmitigated_predictor1.score: 0.8461538461538461


### SVM

In [7]:
# Train your second classification model with SVM
from sklearn import svm
svm_predictor = svm.SVC()
clf = Pipeline(steps=[
    ('preprocessor', column_transformer),
    ('classifier_SVM', svm_predictor)])

unmitigated_predictor2 = clf.fit(X_train, Y_train)
print('unmitigated_predictor2.score:', unmitigated_predictor2.score(X_test, Y_test))

unmitigated_predictor2.score: 0.8509135575003839


### CatBoost Classifier

In [8]:
# !pip install catboost

In [9]:
# Train your third classification model with Catboost Classifier
from catboost import CatBoostClassifier # !pip install catboost==0.18.1

cbc = CatBoostClassifier(
    random_seed=42, logging_level="Silent", iterations=150)


clf = Pipeline(steps=[
    ('preprocessor', column_transformer),
    ('classifier_CBC', cbc)])

unmitigated_predictor3 = clf.fit(X_train, 
                                 Y_train
                                 #classifier_CBC__eval_set(X_test, Y_test)
                                 #classifier_CBC__cat_features=categorical_features_indices
                                )
print('unmitigated_predictor3.score:', unmitigated_predictor3.score(X_test, Y_test))

unmitigated_predictor3.score: 0.873637340703209


### Collect the trained models in a list

In [10]:
unmitigated_predictors = [
    unmitigated_predictor1,
    unmitigated_predictor2,
    unmitigated_predictor3]
unmitigated_predictors

[Pipeline(memory=None,
          steps=[('preprocessor',
                  ColumnTransformer(n_jobs=None, remainder='drop',
                                    sparse_threshold=0.3,
                                    transformer_weights=None,
                                    transformers=[('onehot',
                                                   OneHotEncoder(categories='auto',
                                                                 drop=None,
                                                                 dtype=<class 'numpy.float64'>,
                                                                 handle_unknown='ignore',
                                                                 sparse=True),
                                                   array([ 1,  3,  4,  6,  7, 11])),
                                                  ('scaler',
                                                   StandardScaler(copy=True,
                                               

### Models registration

In [22]:
import os

from azureml.core.model import Model
from joblib import dump

# dump() does not create missing directories; without this step the cell fails with
# FileNotFoundError: [Errno 2] No such file or directory: './outputs/...pkl'
os.makedirs('./outputs', exist_ok=True)

registered_unmitigated_predictors = []
for trained_model in unmitigated_predictors:
    # Name each model after its final pipeline step, e.g. 'classifier_LR'
    model_name_toregister = trained_model.steps[-1][0]
    model_path_local = os.path.join('./outputs', model_name_toregister + '.pkl')

    dump(value=trained_model, filename=model_path_local)
    registered_model = Model.register(
        workspace=ws,
        model_name=model_name_toregister,
        model_path=model_path_local)

    registered_unmitigated_predictors.append(registered_model)
    print(registered_model)

In [11]:
# Predict with the first model (logistic regression); these predictions feed the confusion matrix below
unmitigated_predictor = unmitigated_predictors[0]
Y_pred = unmitigated_predictor.predict(X_test)
print(Y_pred)

[0 0 0 ... 0 0 1]


## Prepare the confusion matrix

In [13]:
from sklearn.metrics import confusion_matrix

import sklearn.metrics as skm

conf_mx = confusion_matrix(Y_test, Y_pred)

# confusion matrix
print("Confusion matrix:\n",conf_mx)

Confusion matrix:
 [[4596  349]
 [ 653  915]]
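
To make the layout of that matrix explicit, here is a minimal pure-Python sketch of the convention scikit-learn uses for binary labels: rows are true classes and columns are predicted classes, so the cells read `[[TN, FP], [FN, TP]]` when class 1 is the positive class. The label lists are toy values, not the census data, and `binary_confusion_matrix` is a hypothetical helper.

```python
def binary_confusion_matrix(y_true, y_pred):
    """Count outcomes with rows = true class and columns = predicted class."""
    matrix = [[0, 0], [0, 0]]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix

# Toy labels for illustration only.
y_true_toy = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred_toy = [0, 1, 0, 1, 0, 1, 1, 0]

toy_matrix = binary_confusion_matrix(y_true_toy, y_pred_toy)
(tn, fp), (fn, tp) = toy_matrix
print(toy_matrix)
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)
```

Read this way, the matrix printed above contains 4596 true negatives, 349 false positives, 653 false negatives and 915 true positives.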


### Metrics

In [15]:
print("Confusion matrix:\n",conf_mx)

# scikit-learn layout for labels (0, 1): rows are true classes, columns are predicted classes,
# so conf_mx is [[TN, FP], [FN, TP]] with class 1 as the positive class.
true_negative  = conf_mx[0,0]
false_positive = conf_mx[0,1]
false_negative = conf_mx[1,0]
true_positive  = conf_mx[1,1]

total_positive = true_positive + false_negative
total_negative = true_negative + false_positive
total_population  = total_positive + total_negative

recall      = true_positive/total_positive # also called Sensitivity or True Positive Rate
specificity = true_negative/total_negative # also called True Negative Rate
accuracy    = (true_positive + true_negative) / total_population
precision   = true_positive/(true_positive + false_positive)
f1_score    = 2 * (precision*recall) / (precision+recall)

print ("Recall = Sensitivity = True Positive Rate =", recall)
print ("Specificity = True Negative Rate =", specificity)
print ("Accuracy =", accuracy)
print ("Precision =", precision)
print ("F1 Score=", f1_score)

Confusion matrix:
 [[4596  349]
 [ 653  915]]
Recall = Sensitivity = True Positive Rate = 0.5835459183673469
Specificity = True Negative Rate = 0.9294236602628918
Accuracy = 0.8461538461538461
Precision = 0.7238924050632911
F1 Score= 0.646186440677966


In [16]:
import sklearn.metrics as skm

for unmitigated_predictor in unmitigated_predictors:
    Y_pred=unmitigated_predictor.predict(X_test)
    conf_mx = skm.confusion_matrix(Y_test, Y_pred)
    print("CLASSIFIER:",unmitigated_predictor.steps[-1][0])
    print("Confusion matrix:\n",conf_mx)

    print("Recall: {}\nAccuracy: {}\nPrecision: {}\nF1 Score: {}\n".format(
        skm.recall_score(Y_test, Y_pred,average='binary'),
        skm.accuracy_score(Y_test, Y_pred),
        skm.precision_score(Y_test, Y_pred),
        skm.f1_score(Y_test, Y_pred)))

CLASSIFIER: classifier_LR
Confusion matrix:
 [[4596  349]
 [ 653  915]]
Recall: 0.5835459183673469
Accuracy: 0.8461538461538461
Precision: 0.7238924050632911
F1 Score: 0.646186440677966

CLASSIFIER: classifier_SVM
Confusion matrix:
 [[4636  309]
 [ 662  906]]
Recall: 0.5778061224489796
Accuracy: 0.8509135575003839
Precision: 0.745679012345679
F1 Score: 0.651095939633489

CLASSIFIER: classifier_CBC
Confusion matrix:
 [[4677  268]
 [ 555 1013]]
Recall: 0.6460459183673469
Accuracy: 0.873637340703209
Precision: 0.7907884465261514
F1 Score: 0.7111267111267112
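
The loop above reports recall but not specificity. As a small sketch, the true negative rate of each model can be derived directly from the printed matrices, assuming the scikit-learn layout `[[TN, FP], [FN, TP]]`. The matrices are copied from the output above; `specificity` and `recall` are hypothetical helper names.

```python
# Confusion matrices copied from the output above, laid out as [[TN, FP], [FN, TP]].
confusion_matrices = {
    "classifier_LR": [[4596, 349], [653, 915]],
    "classifier_SVM": [[4636, 309], [662, 906]],
    "classifier_CBC": [[4677, 268], [555, 1013]],
}

def specificity(matrix):
    """True negative rate: TN / (TN + FP)."""
    (tn, fp), _ = matrix
    return tn / (tn + fp)

def recall(matrix):
    """True positive rate: TP / (TP + FN)."""
    _, (fn, tp) = matrix
    return tp / (tp + fn)

for name, matrix in confusion_matrices.items():
    print(f"{name}: specificity={specificity(matrix):.4f}, recall={recall(matrix):.4f}")
```

For all three models the specificity is well above the recall, so each is considerably better at recognizing the negative class than the positive one.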



### Fairlearn dashboard

In [17]:
# fairlearn.widget (FairlearnDashboard) is not available in recent Fairlearn releases, which is
# why the original import fails with "ModuleNotFoundError: No module named 'fairlearn.widget'".
# The dashboard is provided by the raiwidgets package as FairnessDashboard.
from raiwidgets import FairnessDashboard # !pip install raiwidgets
import joblib

# the following dict contains (<model_id>, <predictions>) pairs
ys_pred = {}

for rup in registered_unmitigated_predictors:
    model_id = rup.id # extract <model_id> from the registered model
    model_path = Model.get_model_path(model_name=rup.name, version=rup.version, _workspace=ws)
    unmitigated_predictor = joblib.load(model_path) # load the registered pipeline
    ys_pred[model_id] = unmitigated_predictor.predict(X_test) # compute <predictions>

# Feature names are read from the columns of A_test
FairnessDashboard(
    sensitive_features=A_test,
    y_true=Y_test,
    y_pred=ys_pred)
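
Independently of the dashboard, the kind of group comparison it visualizes can be sketched in a few lines. The example below computes the selection rate (the share of positive predictions) per group and the gap between the highest and lowest rate, often called the demographic parity difference. The predictions and group labels are toy values, and `selection_rate_by_group` is a hypothetical helper, not a Fairlearn function.

```python
from collections import defaultdict

def selection_rate_by_group(y_pred, groups):
    """Return the share of positive (1) predictions for each group."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for prediction, group in zip(y_pred, groups):
        totals[group] += 1
        positives[group] += int(prediction == 1)
    return {group: positives[group] / totals[group] for group in totals}

# Toy data for illustration only.
y_pred_toy = [1, 0, 1, 1, 0, 0, 1, 0]
sex_toy = ["male", "male", "male", "male", "female", "female", "female", "female"]

rates = selection_rate_by_group(y_pred_toy, sex_toy)
parity_difference = max(rates.values()) - min(rates.values())
print(rates)
print("Demographic parity difference:", parity_difference)
```

With the notebook's own data, the same helper could be called with one model's predictions on `X_test` and a column of `A_test`, such as `A_test['Sex']`.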