# End-to-end Responsible AI lifecycle walkthrough

The goal of this notebook is to walk you through a concrete use case by following the [ML workflow](https://www.microsoft.com/en-us/research/publication/software-engineering-for-machine-learning-a-case-study/) and applying the most prominent recommendations from the Responsible AI lifecycle at each stage. This will be done in a cloud-native manner by leveraging [Azure ML MLOps capabilities](https://docs.microsoft.com/en-us/azure/machine-learning/concept-model-management-and-deployment).

This use case uses the well-known [UCI adult census dataset](https://archive.ics.uci.edu/ml/datasets/Adult). For our purposes, we treat it as a loan decision classification problem: we pretend that the label indicates whether each individual repaid a loan in the past, and we use the data to train a predictor of whether previously unseen individuals will repay a loan. The assumption is that the model's predictions will be used to decide whether an individual should be offered a loan.

In what follows, we go through the stages of the ML workflow sequentially. Please consult the accompanying whitepaper, under the whitepaper folder of the repo, for more details on the phases of the workflow and the Responsible AI lifecycle activities we carry out at each stage.


## Initial Setup

### Connecting to your Azure ML workspace

This step is not needed immediately but will be very useful for model training and deployment later on.

In [2]:
import azureml.core
from azureml.core import Workspace

ws = Workspace.from_config()
print(ws.name, ws.location, ws.resource_group, sep='\t')

msazureml	australiaeast	mlservices-rg


The above cell creates a workspace object from the existing workspace. ``Workspace.from_config()`` reads the file ``config.json`` and loads the details into an object named ``ws``. The compute instance has a copy of this file saved in its root directory. If you run the code elsewhere, you'll need to [create the file](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-configure-environment#workspace).
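If `config.json` is missing, a file of the following shape can be written by hand. This is a minimal sketch: the three keys are the ones `Workspace.from_config()` reads, while the placeholder values, the helper name `write_workspace_config`, and the temporary target directory are assumptions made for illustration. Substitute your own workspace details and location.

```python
import json
import tempfile
from pathlib import Path


def write_workspace_config(directory, subscription_id, resource_group, workspace_name):
    """Write a config.json with the keys Workspace.from_config() expects."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = Path(directory) / "config.json"
    path.write_text(json.dumps(config, indent=4))
    return path


# Placeholder values (hypothetical): replace them with your own workspace details.
# A temporary directory is used here so that no existing config.json is overwritten.
config_path = write_workspace_config(
    tempfile.mkdtemp(),
    subscription_id="<subscription-id>",
    resource_group="<resource-group>",
    workspace_name="<workspace-name>",
)
print(config_path.read_text())
```

If the file does not live in the working directory, its location can be passed explicitly with `Workspace.from_config(path=...)`.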

## Data loading

We use the **adult census** dataset, which we obtain through the **shap** library. Let's load it and take a first look at the data.

In [3]:
import shap # Data is collected through the shap library
import pandas as pd

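# Load the adult census dataset
X_raw, Y = shap.datasets.adult()
# Keep features and label side by side in one frame for inspection
# (pd.DataFrame(X_raw, Y) would have used Y as the row index instead)
df = X_raw.assign(Label=Y)
print("X_raw shape:", X_raw.shape)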
X_raw.head()

X_raw shape: (32561, 12)


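| | Age | Workclass | Education-Num | Marital Status | Occupation | Relationship | Race | Sex | Capital Gain | Capital Loss | Hours per week | Country |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39.0 | 7 | 13.0 | 4 | 1 | 0 | 4 | 1 | 2174.0 | 0.0 | 40.0 | 39 |
| 1 | 50.0 | 6 | 13.0 | 2 | 4 | 4 | 4 | 1 | 0.0 | 0.0 | 13.0 | 39 |
| 2 | 38.0 | 4 | 9.0 | 0 | 6 | 0 | 4 | 1 | 0.0 | 0.0 | 40.0 | 39 |
| 3 | 53.0 | 4 | 7.0 | 2 | 6 | 4 | 2 | 1 | 0.0 | 0.0 | 40.0 | 39 |
| 4 | 28.0 | 4 | 13.0 | 2 | 10 | 5 | 2 | 0 | 0.0 | 0.0 | 40.0 | 5 |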


## Data preprocessing and cleaning

### Identifying and handling the missing values

In our case, the dataset contains no missing values, as the count below confirms, so no imputation is needed.

In [5]:
# Number of missing values over all columns
X_raw.isna().sum().sum()

0

All features above look numeric. However, some of them are just numeric codes for features that are really categorical.
For more accurate results, we separate the categorical features from the “real” numeric ones, so that the former can be one-hot encoded and the latter standardized.

In [6]:
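import numpy as np
print(X_raw.dtypes)
# Integer-coded columns are treated as categorical.
# Note: only int8 and int32 are matched, so an int64 column (such as
# 'Relationship' in the output below) lands in neither list and is dropped
# by the ColumnTransformer, whose `remainder` defaults to 'drop'.
categorical_features_indices = np.where(np.logical_or(X_raw.dtypes == np.int8, X_raw.dtypes == np.int32))[0]

print('categorical_features_indices:', categorical_features_indices)

# Float columns are treated as genuinely numeric.
numeric_features_indices = np.where(X_raw.dtypes == np.float32)[0]
print('numeric_features_indices:', numeric_features_indices)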

from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

column_transformer = ColumnTransformer ([
    ('onehot', OneHotEncoder(handle_unknown='ignore'),
    categorical_features_indices),
    ('scaler', StandardScaler(),
    numeric_features_indices)
])

Age               float32
Workclass            int8
Education-Num     float32
Marital Status       int8
Occupation           int8
Relationship        int64
Race                 int8
Sex                  int8
Capital Gain      float32
Capital Loss      float32
Hours per week    float32
Country              int8
dtype: object
categorical_features_indices: [ 1  3  4  6  7 11]
numeric_features_indices: [ 0  2  8  9 10]
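Note that in the output above `Relationship` is stored as `int64`, so it appears in neither index list. A minimal sketch of a split that does not depend on the integer width is shown below. It runs on a toy frame whose column names mirror the dataset but whose values are made up, and it assumes that every integer column is a categorical code, which holds for this dataset but not in general.

```python
import numpy as np
import pandas as pd

# Toy frame (made-up values) with the three dtypes seen in the output above
toy = pd.DataFrame({
    "Age": np.array([39.0, 50.0], dtype=np.float32),
    "Workclass": np.array([7, 6], dtype=np.int8),
    "Relationship": np.array([0, 4], dtype=np.int64),
})

# Any integer dtype counts as categorical, any float dtype as numeric
cat_idx = np.where([pd.api.types.is_integer_dtype(t) for t in toy.dtypes])[0]
num_idx = np.where([pd.api.types.is_float_dtype(t) for t in toy.dtypes])[0]

print("categorical:", list(toy.columns[cat_idx]))
print("numeric:", list(toy.columns[num_idx]))

# Sanity check: every column is assigned to exactly one of the two lists
assert sorted(np.concatenate([cat_idx, num_idx]).tolist()) == list(range(toy.shape[1]))
```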


## Data Labeling

In [7]:
from sklearn.preprocessing import LabelEncoder
le=LabelEncoder()
print("Before label encoding:",Y) # --> [False False False  ... False False True]
Y=le.fit_transform(Y)
print("After label encoding:",Y) # --> [0 0 0  ... 0 0 1]

Before label encoding: [False False False ... False False  True]
After label encoding: [0 0 0 ... 0 0 1]
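To see which integer stands for which original label, the fitted encoder can be inspected. The sketch below uses a made-up four-element label array rather than the notebook's `Y`; `classes_` and `inverse_transform` are standard `LabelEncoder` members.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

labels = np.array([False, False, True, False])  # made-up labels

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

# classes_ is sorted, and the position of a class is its integer code
mapping = {bool(label): int(code) for code, label in enumerate(encoder.classes_)}
print("mapping:", mapping)

# inverse_transform recovers the original labels from the codes
decoded = encoder.inverse_transform(encoded)
print("round trip ok:", bool((decoded == labels).all()))
```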


## Data split and feature enrichment

In [13]:
from sklearn.model_selection import train_test_split

A=X_raw[['Sex']]

X_train, X_test, Y_train, Y_test, A_train, A_test = train_test_split(
    X_raw, Y, A,
    test_size=0.2, random_state=0, stratify=Y)

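# Note: reset_index() returns a new frame rather than modifying in place, so use
# e.g. X_train = X_train.reset_index(drop=True) if positional indices are needed.
# Here the splits keep their original row indices, as the A_test output below shows.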

print("X_raw shape: {}, X_train shape: {}, X_test shape: {}".format(
    X_raw.shape, X_train.shape, X_test.shape))
    
# Test dataframe: feature enrichment
# Replace the numeric codes with readable labels. Mapping the whole column at
# once avoids chained assignment and the warning
# 'A value is trying to be set on a copy of a slice from a DataFrame'.
A_test = A_test.assign(Sex=A_test['Sex'].map({0: 'female', 1: 'male'}))

A_test.head()

X_raw shape: (32561, 12), X_train shape: (26048, 12), X_test shape: (6513, 12)


| | Sex |
|---|---|
| 13077 | male |
| 25002 | male |
| 23777 | female |
| 71 | female |
| 955 | male |
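Passing several arrays to `train_test_split` splits them with the same row selection, and `stratify` keeps the class balance close to identical in both parts. The sketch below checks both properties on synthetic data; the arrays and the 25% positive rate are made up and only stand in for `X_raw`, `Y` and `A`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                 # synthetic features
y = (rng.random(1000) < 0.25).astype(int)      # synthetic labels, ~25% positive
a = rng.integers(0, 2, size=1000)              # synthetic sensitive attribute

X_tr, X_te, y_tr, y_te, a_tr, a_te = train_test_split(
    X, y, a, test_size=0.2, random_state=0, stratify=y)

# All three arrays are cut at the same rows
print("test sizes:", len(X_te), len(y_te), len(a_te))

# Stratification keeps the positive rate nearly equal across the parts
print("positive rate overall / train / test:",
      round(y.mean(), 3), round(y_tr.mean(), 3), round(y_te.mean(), 3))
```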


## Training

### Fitting a CatBoost Classifier

In [25]:
# !pip install catboost

In [9]:
# Train a classification model with the CatBoost classifier
from catboost import CatBoostClassifier # !pip install catboost==0.18.1

model_1 = CatBoostClassifier(
    random_seed=42, logging_level="Silent", iterations=150)


pipeline_1 = Pipeline(steps=[
    ('preprocessor', column_transformer),
    ('classifier_CBC', model_1)])

catboost_predictor = pipeline_1.fit(X_train, Y_train)

print('catboost_predictor.score:', catboost_predictor.score(X_test, Y_test))

catboost_predictor.score: 0.873637340703209


### Model transparency

In [39]:
# Using InterpretML

from interpret import show
from interpret.perf import ROC


# 1. Blackbox model performance
blackbox_perf = ROC(catboost_predictor.predict_proba).explain_perf(X_test, Y_test, name='Catboost Classifier')

# 2. Local Explanations
from interpret.blackbox import LimeTabular

#Blackbox explainers need a predict function, and optionally a dataset
lime = LimeTabular(predict_fn=catboost_predictor.predict_proba, data=X_train, random_state=1)

#Pick the instances to explain, optionally pass in labels if you have them
lime_local = lime.explain_local(X_test[:5], Y_test[:5], name='LIME')

from interpret.blackbox import PartialDependence

pdp = PartialDependence(predict_fn=catboost_predictor.predict_proba, data=X_train)
pdp_global = pdp.explain_global(name='Partial Dependence')

# Show them all in one dashboard
show([blackbox_perf, lime_local, pdp_global])

In [16]:
# Using raiwidgets

from raiwidgets import ExplanationDashboard
from interpret.ext.blackbox import TabularExplainer

# explain predictions on your local machine
# "features" and "classes" fields are optional
explainer = TabularExplainer(catboost_predictor, 
                             X_train)

# explain overall model predictions (global explanation)
global_explanation = explainer.explain_global(X_test)

ExplanationDashboard(global_explanation, catboost_predictor)

The option feature_dependence has been renamed to feature_perturbation!
The option feature_perturbation="independent" is has been renamed to feature_perturbation="interventional"!
The feature_perturbation option is now deprecated in favor of using the appropriate masker (maskers.Independent, or maskers.Impute)


  0%|          | 0/6513 [00:00<?, ?it/s]

Interpret started at https://abdou-default-compute-5000.francecentral.instances.azureml.ms


<raiwidgets.explanation_dashboard.ExplanationDashboard at 0x7ff1aad143c8>

In [19]:
# Global explanation

ranked_global_importance_names = global_explanation.get_ranked_global_names() 
ranked_global_importance_values = global_explanation.get_ranked_global_values()  
shap.summary_plot(np.array([ranked_global_importance_values]), ranked_global_importance_names, plot_type="bar")


### Using a glassbox model: EBM

In [28]:
from interpret.glassbox import ExplainableBoostingClassifier

seed = 1
model_2 = ExplainableBoostingClassifier(random_state=seed, n_jobs=-1)


pipeline_2 = Pipeline(steps=[
    ('preprocessor', column_transformer),
    ('classifier_EBM', model_2)])

ebm_predictor = pipeline_2.fit(X_train, Y_train)
print('ebm_predictor.score:', ebm_predictor.score(X_test, Y_test))

Sparse data not fully supported, will be densified for now, may cause OOM


### Assessing fairness issues

In [27]:
from raiwidgets import FairnessDashboard
Y_pred = catboost_predictor.predict(X_test)
FairnessDashboard(sensitive_features=A_test,
                  y_true=Y_test,
                  y_pred=Y_pred)

Fairness started at https://abdou-default-compute-5001.francecentral.instances.azureml.ms


<raiwidgets.fairness_dashboard.FairnessDashboard at 0x7fd0324082e8>
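One of the disparities the dashboard reports is the gap in selection rate between groups, that is, the fraction of each group that receives a positive prediction. A minimal hand computation is sketched below on eight made-up predictions (not the notebook's `Y_pred`); the helper name `selection_rates` is hypothetical.

```python
from collections import defaultdict


def selection_rates(y_pred, groups):
    """Fraction of positive predictions within each group."""
    totals = defaultdict(int)
    selected = defaultdict(int)
    for pred, group in zip(y_pred, groups):
        totals[group] += 1
        selected[group] += pred
    return {group: selected[group] / totals[group] for group in totals}


# Made-up predictions and sensitive feature values
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
sex = ["male"] * 4 + ["female"] * 4

rates = selection_rates(y_pred, sex)
# Demographic parity difference: largest minus smallest group selection rate
dp_difference = max(rates.values()) - min(rates.values())
# Demographic parity ratio: smallest divided by largest group selection rate
dp_ratio = min(rates.values()) / max(rates.values())

print(rates)
print("difference:", dp_difference, "ratio:", round(dp_ratio, 3))
```

A difference of 0 and a ratio of 1 would mean both groups are selected at the same rate.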

### Mitigating fairness issues

In [21]:
from fairlearn.reductions import GridSearch
from fairlearn.reductions import DemographicParity, ErrorRate

sweep = GridSearch(
    model_1,
    constraints=DemographicParity(),
    grid_size=70)

sweep.fit(X_train, Y_train, sensitive_features=A_train.Sex)

In [24]:
from raiwidgets import FairnessDashboard
mitigated_predictors = sweep.predictors_

ys_mitigated_predictors = {} # it contains (<model_id>, <predictions>) pairs

# the original prediction:
ys_mitigated_predictors["census_unmitigated"]=catboost_predictor.predict(X_test)

base_predictor_name = "mitigated_predictor_{0}"

# Number the mitigated predictors from 1 (enumerate avoids shadowing the built-in `id`)
for model_id, mp in enumerate(mitigated_predictors, start=1):
    ys_mitigated_predictors[base_predictor_name.format(model_id)] = mp.predict(X_test)
    
FairnessDashboard(
    sensitive_features=A_test,
    y_true=Y_test,
    y_pred=ys_mitigated_predictors)

Fairness started at https://abdou-default-compute-5001.francecentral.instances.azureml.ms


<raiwidgets.fairness_dashboard.FairnessDashboard at 0x7ff19599a128>

You have provided 'metrics', 'y_true', 'y_pred' as positional arguments. Please pass them as keyword arguments. From version 0.10.0 passing them as positional arguments will result in an error.
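The sweep returns one predictor per grid point, and typically some of them are worse than another candidate on both error and disparity. One common way to shortlist them, sketched below, is to keep only the non-dominated candidates. The error and disparity numbers are invented for illustration and are not results from this notebook; in practice they would be computed on held-out data for each entry of `ys_mitigated_predictors`. The helper name `non_dominated` is hypothetical.

```python
def non_dominated(candidates):
    """Keep candidates that no other candidate beats on both criteria.

    `candidates` maps a model name to an (error, disparity) pair,
    where lower is better for both values.
    """
    kept = {}
    for name, (err, disp) in candidates.items():
        dominated = any(
            other_err <= err and other_disp <= disp
            and (other_err < err or other_disp < disp)
            for other, (other_err, other_disp) in candidates.items()
            if other != name
        )
        if not dominated:
            kept[name] = (err, disp)
    return kept


# Invented (error, disparity) values, for illustration only
candidates = {
    "census_unmitigated": (0.13, 0.18),
    "mitigated_predictor_1": (0.15, 0.05),
    "mitigated_predictor_2": (0.16, 0.09),  # worse than predictor_1 on both
    "mitigated_predictor_3": (0.20, 0.01),
}

shortlist = non_dominated(candidates)
print(sorted(shortlist))
```

The remaining candidates each represent a different trade-off, and choosing among them is a judgment call for the use case rather than something the code can decide.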


## Model evaluation

In [14]:
# Metrics
from fairlearn.metrics import (
    MetricFrame,
    selection_rate, demographic_parity_difference, demographic_parity_ratio,
    false_positive_rate, false_negative_rate,
    false_positive_rate_difference, false_negative_rate_difference,
    equalized_odds_difference)
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

# Some helper functions to be used later
def get_metrics_df(models_dict, y_true, group):
    metrics_dict = {
        "Overall selection rate": (
            lambda x: selection_rate(y_true, x), True),
        "Demographic parity difference": (
            lambda x: demographic_parity_difference(y_true, x, sensitive_features=group), True),
        "Demographic parity ratio": (
            lambda x: demographic_parity_ratio(y_true, x, sensitive_features=group), True),
        "------": (lambda x: "", True),
        "Overall balanced error rate": (
            lambda x: 1-balanced_accuracy_score(y_true, x), True),
        "Balanced error rate difference": (
            lambda x: MetricFrame(metrics=balanced_accuracy_score, y_true=y_true, y_pred=x, sensitive_features=group).difference(method='between_groups'), True),
        " ------": (lambda x: "", True),
        "False positive rate difference": (
            lambda x: false_positive_rate_difference(y_true, x, sensitive_features=group), True),
        "False negative rate difference": (
            lambda x: false_negative_rate_difference(y_true, x, sensitive_features=group), True),
        "Equalized odds difference": (
            lambda x: equalized_odds_difference(y_true, x, sensitive_features=group), True),
        "  ------": (lambda x: "", True),
        "Overall AUC": (
            lambda x: roc_auc_score(y_true, x), False),
        "AUC difference": (
            lambda x: MetricFrame(metrics=roc_auc_score, y_true=y_true, y_pred=x, sensitive_features=group).difference(method='between_groups'), False),
    }
    df_dict = {}
    for metric_name, (metric_func, use_preds) in metrics_dict.items():
        df_dict[metric_name] = [metric_func(preds) if use_preds else metric_func(scores) 
                                for model_name, (preds, scores) in models_dict.items()]
    return pd.DataFrame.from_dict(df_dict, orient="index", columns=models_dict.keys())

In [15]:
# Scores on test set
test_scores = catboost_predictor.predict_proba(X_test)[:, 1]
# Predictions (0 or 1) on test set: threshold the scores at the training-set base rate
test_preds = (test_scores >= np.mean(Y_train)) * 1
# Metrics
models_dict = {"catboost": (test_preds, test_scores)}

get_metrics_df(models_dict, Y_test, A_test)

You have provided 'metrics', 'y_true', 'y_pred' as positional arguments. Please pass them as keyword arguments. From version 0.10.0 passing them as positional arguments will result in an error.
You have provided 'metrics', 'y_true', 'y_pred' as positional arguments. Please pass them as keyword arguments. From version 0.10.0 passing them as positional arguments will result in an error.


| | catboost |
|---|---|
| Overall selection rate | 0.340396 |
| Demographic parity difference | 0.305226 |
| Demographic parity ratio | 0.306794 |
| ------ | |
| Overall balanced error rate | 0.15928 |
| Balanced error rate difference | 0.0426844 |
| ------ | |
| False positive rate difference | 0.195099 |
| False negative rate difference | 0.10973 |
| Equalized odds difference | 0.195099 |
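The numbers in this table are related to one another, and a little arithmetic makes them easier to read. The sketch below assumes the sensitive feature has exactly two groups (as `Sex` does here) and uses the fairlearn definitions: the demographic parity difference is the larger group selection rate minus the smaller, the ratio is the smaller divided by the larger, and the equalized odds difference is the larger of the true positive rate and false positive rate differences.

```python
# Values copied from the metrics table above
dp_difference = 0.305226
dp_ratio = 0.306794
fpr_difference = 0.195099
fnr_difference = 0.10973

# With two groups: difference = high - low and ratio = low / high,
# hence high = difference / (1 - ratio) and low = high * ratio.
high_rate = dp_difference / (1 - dp_ratio)
low_rate = high_rate * dp_ratio
print(f"group selection rates: {high_rate:.3f} vs {low_rate:.3f}")

# TPR = 1 - FNR, so the TPR difference equals the FNR difference;
# equalized odds difference is the larger of the TPR and FPR differences.
equalized_odds = max(fpr_difference, fnr_difference)
print("equalized odds difference:", equalized_odds)
```

Under these assumptions, one group is selected for a loan roughly three times as often as the other, and the equalized odds figure in the table is driven entirely by the false positive rate gap.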


## Model deployment

### Step 1: Model registration

First, we save the model to a file and verify we can properly load it.

In [12]:
import pickle

with open('catboost_predictor', 'wb') as file:
    pickle.dump(catboost_predictor,file)

# Checking the model can be properly loaded from the file
with open('catboost_predictor', 'rb') as file:
    loaded_model = pickle.load(file)

loaded_model.predict(X_test)

array([0, 0, 0, ..., 0, 0, 0])

Next, we use the ``Model`` class to register the model in the Azure ML workspace by providing the path to the model file we just saved.

In [13]:
from azureml.core.model import Model

# Register model
model = Model.register(ws, model_name="catboost_predictor", model_path="./catboost_predictor")
print('Name:', model.name)
print('Version:', model.version)

Registering model catboost_predictor


### Step 2: Define an entry script

Please follow the instructions in the whitepaper to create a first entry script ``score.py`` under the ``source_dir`` directory before continuing.


### Step 3: Define an inference configuration

In [66]:
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.model import InferenceConfig

env = Environment(name="project_environment")

conda_dep = CondaDependencies()

# Installs azure-ml-api-sdk package
conda_dep.add_pip_package("azure-ml-api-sdk")

# Installs catboost package
conda_dep.add_pip_package("catboost")

# Installs numpy package
conda_dep.add_pip_package("numpy")

# Installs pandas package
conda_dep.add_pip_package("pandas")

# Installs scikit-learn (the PyPI package that provides the `sklearn` module)
conda_dep.add_pip_package("scikit-learn")

# Adds dependencies to PythonSection of env
env.python.conda_dependencies=conda_dep

inference_config = InferenceConfig(
    environment=env,
    source_directory="./source_dir",
    entry_script="./score.py",
)

### Step 4: Define a deployment configuration

In [60]:
from azureml.core.webservice import LocalWebservice

deployment_config = LocalWebservice.deploy_configuration(port=6789)

### Step 5: Deploy your ML model

In [61]:
service = Model.deploy(
    ws,
    "myservice",
    [model],
    inference_config,
    deployment_config,
    overwrite=True,
)
service.wait_for_deployment(show_output=True)

print(service.get_logs())


Downloading model catboost_predictor:1 to /tmp/azureml_uikco5i9/catboost_predictor/1
Generating Docker build context.
Package creation Succeeded
Logging into Docker registry abdoudefaultcontainerregistry.azurecr.io
Logging into Docker registry abdoudefaultcontainerregistry.azurecr.io
Building Docker image from Dockerfile...
Step 1/5 : FROM abdoudefaultcontainerregistry.azurecr.io/azureml/azureml_c2add3fa2bf2efed1a677650ec3e09e9
 ---> e53cd91d6f55
Step 2/5 : COPY azureml-app /var/azureml-app
 ---> 771555cb3126
Step 3/5 : RUN mkdir -p '/var/azureml-app' && echo eyJhY2NvdW50Q29udGV4dCI6eyJzdWJzY3JpcHRpb25JZCI6IjU2N2Q3MGNkLWVlMTYtNDlkOS1iZTlmLTMxMDg4YjY1ZTUzYiIsInJlc291cmNlR3JvdXBOYW1lIjoiYWJkb3UtcmVzb3VyY2VzLWdyb3VwIiwiYWNjb3VudE5hbWUiOiJhYmRvdS1henVyZS1tbC13b3Jrc3BhY2UiLCJ3b3Jrc3BhY2VJZCI6Ijg4OWVkZjRlLTBhMTktNDY3Yi05NTZiLWJiZmY4YTI0OWY4YyJ9LCJtb2RlbHMiOnt9LCJtb2RlbHNJbmZvIjp7fX0= | base64 --decode > /var/azureml-app/model_config_map.json
 ---> Running in 237b9f91108b
 ---> 734c3fc6ef74
S

### Step 6: Call into our model

In [62]:
import requests
import json

uri = service.scoring_uri
headers = {"Content-Type": "application/json"}

json_test = X_test.head().to_json()
data = json.dumps(json_test)
response = requests.post(uri, data=data, headers=headers)
print(response.json())

{'data': [0, 0, 0, 0, 0], 'message': 'Successfully classified loan'}


In [64]:
Y_test[:5]

array([0, 0, 0, 0, 0])

It looks like our model works: the service responds, and its predictions match the true labels for the first five entries of the test set.
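One detail of the request above is worth spelling out: `DataFrame.to_json()` already returns a JSON string, so passing it through `json.dumps` encodes it a second time, and whatever receives the body has to decode twice to get back to the column dictionary. The sketch below illustrates this with the standard library only; the two-row payload is made up and stands in for `X_test.head().to_json()`.

```python
import json

# Stand-in for X_test.head().to_json(): a JSON *string*, not a dict
frame_json = json.dumps({"Age": {"0": 39.0, "1": 50.0}, "Sex": {"0": 1, "1": 0}})

# What the client sends: the string encoded once more
body = json.dumps(frame_json)

decoded_once = json.loads(body)           # still a string
decoded_twice = json.loads(decoded_once)  # now the column dictionary

print(type(decoded_once).__name__, type(decoded_twice).__name__)
print(decoded_twice["Age"])
```

Sending `frame_json` directly as the request body would avoid the second decode, provided the entry script is written to expect that.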

### Deployment to Azure Container Instance: repeating steps 4-6 with an ACI compute target

In [67]:
from azureml.core.webservice import AciWebservice, Webservice
from azureml.core.model import Model

# Step 4: Define deployment config
deployment_config = AciWebservice.deploy_configuration(cpu_cores = 1, memory_gb = 1)

# Step 5: Deploying the model
service = Model.deploy(ws, "aciservice", [model], inference_config, deployment_config)
service.wait_for_deployment(show_output = True)

# Step 6: Consuming the endpoint
uri = service.scoring_uri
headers = {"Content-Type": "application/json"}
json_test = X_test.head().to_json()
data = json.dumps(json_test)
response = requests.post(uri, data=data, headers=headers)
print(response.json())


Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2021-09-07 15:01:33+00:00 Creating Container Registry if not exists.
2021-09-07 15:01:33+00:00 Registering the environment.
2021-09-07 15:01:36+00:00 Use the existing image.
2021-09-07 15:01:36+00:00 Generating deployment configuration.
2021-09-07 15:01:37+00:00 Submitting deployment to compute.
2021-09-07 15:01:42+00:00 Checking the status of deployment aciservice..
2021-09-07 15:04:12+00:00 Checking the status of inference endpoint aciservice.
Succeeded
ACI service creation operation finished, operation "Succeeded"
Healthy
