# Responsible ML - Fairness, Explainability, and Error Analysis

## Install prerequisites

Before running the notebook, make sure the correct versions of these libraries are installed.

In [None]:
!pip install azureml-sdk --upgrade

In [None]:
!pip install fairlearn==0.6.2 --upgrade

In [None]:
!pip install azureml-contrib-interpret --upgrade

In [None]:
!pip install azureml-contrib-fairness --upgrade

In [None]:
!pip install azureml-interpret --upgrade

In [None]:
!pip install gevent requests flask flask-cors

In [None]:
!pip install interpret-community --upgrade

In [None]:
!pip install --upgrade azureml.contrib.interpret

In [None]:
!pip install raiwidgets==0.8.0 --upgrade

In [None]:
!pip install azureml-interpret --upgrade

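### ---- Restart Kernel ----

Restart the notebook kernel now so that the remaining cells import the package versions installed above.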
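As an optional check after the restart, the sketch below reads the installed versions through the standard library's `importlib.metadata` (Python 3.8+). It lets you confirm that pins such as `fairlearn==0.6.2` and `raiwidgets==0.8.0` took effect. The helper name `installed_versions` is our own and is not part of any of these packages.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as used in the pip commands above
PACKAGES = ["fairlearn", "raiwidgets", "interpret-community",
            "azureml-interpret", "azureml-contrib-fairness"]


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```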

## Create working directory

The cell below creates our working directory. This will hold our generated scripts.

In [1]:
import warnings
import os
warnings.filterwarnings('ignore')
project_folder = './scripts'

if not os.path.exists(project_folder):
    os.makedirs(project_folder)

## Write utils.py into working directory

The `sklearn.preprocessing.LabelEncoder` class encodes target labels with values between 0 and n_classes-1.

The `sklearn.model_selection.train_test_split` function splits arrays or matrices into random train and test subsets.
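To illustrate the two behaviors that `split_dataset` relies on, here is a minimal sketch with hypothetical string labels. `LabelEncoder` assigns integer codes in sorted class order, and `stratify=y` keeps the class ratio the same in the train and test subsets.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels with a 70/30 class imbalance
labels = np.array(["approve"] * 70 + ["reject"] * 30)

le = LabelEncoder()
y = le.fit_transform(labels)          # classes are sorted: approve -> 0, reject -> 1
print(le.classes_)                    # ['approve' 'reject']

X = np.arange(len(y)).reshape(-1, 1)  # dummy feature matrix
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123, stratify=y)

# stratify=y preserves the 30% share of class 1 in both subsets
print(y_train.mean(), y_test.mean())  # 0.3 0.3
```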

The `sklearn.metrics.accuracy_score` function computes the accuracy classification score. In multilabel classification, this function computes subset accuracy: the set of labels predicted for a sample must exactly match the corresponding set of labels in `y_true`.

The `sklearn.metrics.confusion_matrix` function computes a confusion matrix to evaluate the accuracy of a classification.

The `sklearn.metrics.f1_score` function computes the F1 score, also known as balanced F-score or F-measure. The F1 score is the harmonic mean of precision and recall, F1 = 2 * (precision * recall) / (precision + recall); it reaches its best value at 1 and its worst value at 0.

The `sklearn.metrics.precision_score` computes the precision. The precision is the ratio tp / (tp + fp) where tp is the number of true positives and fp the number of false positives.

The `sklearn.metrics.recall_score` computes the recall. The recall is the ratio tp / (tp + fn) where tp is the number of true positives and fn the number of false negatives. The recall is intuitively the ability of the classifier to find all the positive samples.

The `sklearn.metrics.roc_auc_score` computes Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.

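The `sklearn.metrics.roc_curve` function computes the points (false positive rate, true positive rate) of the Receiver Operating Characteristic (ROC) curve over a range of decision thresholds.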
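As a worked example of these definitions, the sketch below derives the scores from the four confusion-matrix counts for eight hypothetical predictions and compares them with scikit-learn. Note that `analyze_model` in `utils.py` passes `average="macro"`, which takes the unweighted mean of the per-class scores, so its numbers can differ from the binary (positive-class) scores.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hypothetical labels and predictions for eight applicants (1 = approved)
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_pred = np.array([0, 1, 0, 1, 1, 0, 0, 0])

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()   # 3, 1, 2, 2

accuracy = (tp + tn) / len(y_true)                   # 5/8 = 0.625
precision = tp / (tp + fp)                           # 2/3
recall = tp / (tp + fn)                              # 2/4 = 0.5
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean = 4/7

# Macro averaging (as in analyze_model) is the unweighted mean over both classes
macro_precision = (tn / (tn + fn) + precision) / 2   # (3/5 + 2/3) / 2

print(np.isclose(accuracy, accuracy_score(y_true, y_pred)),
      np.isclose(precision, precision_score(y_true, y_pred)),
      np.isclose(recall, recall_score(y_true, y_pred)),
      np.isclose(f1, f1_score(y_true, y_pred)),
      np.isclose(macro_precision, precision_score(y_true, y_pred, average="macro")))
```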

The `azureml.core.Model` class represents the result of machine learning training. A model can be produced by an Azure Machine Learning training run or by any other model training process outside of Azure. Regardless of how the model is produced, it can be registered in a workspace, where it is represented by a name and a version.

For more information on **Model Class**, please visit: [Microsoft Model Class Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.model.model?view=azure-ml-py)
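`register_model` in `utils.py` first serializes the model with `joblib.dump` and then registers the resulting `.pkl` file with `Model.register`. The sketch below exercises only the serialization half on a toy model, without any Azure calls, to show that a reloaded model predicts exactly like the original. The data and the file name `loan_approval_example.pkl` are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data standing in for the loan features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

model = LogisticRegression(solver="liblinear").fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "loan_approval_example.pkl")
    joblib.dump(value=model, filename=model_path)   # same call register_model() makes
    restored = joblib.load(model_path)
    same = np.array_equal(model.predict(X), restored.predict(X))

print("Predictions identical after reload:", same)  # True
```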

In [None]:
%%writefile $project_folder/utils.py
import os

import joblib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from azureml.core import Dataset, Model
from azureml.data.datapath import DataPath
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score, precision_score,
                             recall_score, roc_auc_score, roc_curve)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

def split_dataset(X_raw, Y):
    # Sensitive features, kept aside for the fairness assessment
    A = X_raw[['SEX', 'RACE']]

    # Encode the target labels as integers 0..n_classes-1
    le = LabelEncoder()
    Y = le.fit_transform(Y)

    # X_raw (including SEX and RACE) is split as-is; the training pipeline's
    # ColumnTransformer one-hot encodes the categorical columns itself
    X_train, X_test, Y_train, Y_test, A_train, A_test = train_test_split(X_raw,
                                                                         Y,
                                                                         A,
                                                                         test_size=0.2,
                                                                         random_state=123,
                                                                         stratify=Y)

    # Reset indices so that rows line up positionally across X, Y and A
    X_train = X_train.reset_index(drop=True)
    A_train = A_train.reset_index(drop=True)
    X_test = X_test.reset_index(drop=True)
    A_test = A_test.reset_index(drop=True)

    # Replace the integer codes with readable labels (used by the dashboards)
    A_test['SEX'] = A_test['SEX'].replace({0: 'female', 1: 'male'})
    A_test['RACE'] = A_test['RACE'].replace({0: 'Amer-Indian-Eskimo',
                                             1: 'Asian-Pac-Islander',
                                             2: 'Black',
                                             3: 'Other',
                                             4: 'White'})
    return X_train, X_test, Y_train, Y_test, A_train, A_test

def prepareDataset(X_raw):
    df = X_raw.to_pandas_dataframe()
    df = df.drop(columns=['HOURS PER WEEK', 'COUNTRY'])
    # Fixed random_state so repeated calls return the same 30,000-row sample
    df = df.sample(n=30000, random_state=123)
    df = df.astype(int)
    Y = df['SHOULD_APPROVE'].values
    synth_df = df.drop(columns=['SHOULD_APPROVE'])
    return synth_df, Y

def fetch_registered_dataset(ws):
    datastore = ws.get_default_datastore()
    datastore.upload_files(files=['x_raw.csv'], overwrite=True)
    datastore_path = [DataPath(datastore, 'x_raw.csv')]
    tabular = Dataset.Tabular.from_delimited_files(path=datastore_path)
    return tabular
    
def analyze_model(clf, X_test, Y_test, preds):
    # Macro averaging gives each class equal weight
    accuracy = accuracy_score(Y_test, preds)
    print(f'Accuracy: {accuracy:.4f}')

    precision = precision_score(Y_test, preds, average="macro")
    print(f'Precision: {precision:.4f}')

    recall = recall_score(Y_test, preds, average="macro")
    print(f'Recall: {recall:.4f}')

    f1score = f1_score(Y_test, preds, average="macro")
    print(f'F1 Score: {f1score:.4f}')

    # Confusion matrix heatmap (rows = actual labels, columns = predicted labels)
    class_names = list(clf.classes_)
    fig, ax = plt.subplots()
    sns.heatmap(pd.DataFrame(confusion_matrix(Y_test, preds)), annot=True, cmap='YlGnBu', fmt='g',
                xticklabels=class_names, yticklabels=class_names, ax=ax)
    ax.xaxis.set_label_position('top')
    plt.tight_layout()
    plt.title('Confusion Matrix', y=1.1)
    plt.ylabel('Actual label')
    plt.xlabel('Predicted label')
    plt.show()
    plt.close()

    # ROC curve from the predicted probability of the positive class (column 1)
    preds_proba = clf.predict_proba(X_test)[:, 1]
    fpr, tpr, _ = roc_curve(Y_test, preds_proba, pos_label=clf.classes_[1])
    auc = roc_auc_score(Y_test, preds_proba)
    plt.plot(fpr, tpr, label=f"data 1, auc={auc:.4f}")
    plt.legend(loc=4)
    plt.show()
    plt.close()

def register_model(name, model, ws):
    print("Registering ", name)
    model_path = "models/{0}.pkl".format(name)
    if (name == "loan_approval_grid_model_30"):
        print (model.coef_)
    joblib.dump(value=model, filename=model_path)
    registered_model = Model.register(model_path=model_path,
                                      model_name=name,
                                      workspace=ws)
    print("Registered ", registered_model.id)
    return registered_model.id

## Setup Azure ML

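In the next cell, we connect to an existing Workspace using your `<subscription_id>`, `<resource_group_name>`, and `<workspace_name>`. The `Workspace` constructor fetches the matching workspace and may prompt you to authenticate; if it does, open the link it prints and enter the details it asks for. `ws.write_config()` then saves the workspace details to a local config file so that later sessions can load the workspace with `Workspace.from_config()`.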

For more information on **Workspace**, please visit: [Microsoft Workspace Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace.workspace?view=azure-ml-py)

`<subscription_id>` = The ID of the Azure subscription that contains your workspace. You can find it on the overview page of your Resource Group.

`<resource_group_name>` = This is the name of your Resource Group.

`<workspace_name>` = This is the name of your Workspace.

In [None]:
from azureml.core.workspace import Workspace

try:
    ws = Workspace(
        subscription_id='<subscription_id>',
        resource_group='<resource_group_name>',
        workspace_name='<workspace_name>')

    # Writes workspace config file
    ws.write_config()
    
    print('Library configuration succeeded')
except Exception as e:
    print(e)
    print('Workspace not found')

## Fetch Privatized Data

The `fetch_registered_dataset` helper uploads the local `x_raw.csv` file to the workspace's default datastore and creates a tabular dataset from it.

In [None]:
from scripts.utils import *

tabular = fetch_registered_dataset(ws)

# Build Model

<img align="left" src="./images/MLOPs-1.gif"/>

## Create experiment

The training cell below has two distinct sections:

1. Feature encoding for the scikit-learn training.
1. Training and evaluating the scikit-learn model.

The `sklearn.pipeline.Pipeline` class chains several steps (here, preprocessing followed by a classifier) so that they can be fit, cross-validated, and tuned together as a single estimator.

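The `sklearn.linear_model.LogisticRegression` class implements regularized logistic regression using the `liblinear` library and the `newton-cg`, `sag`, `saga` and `lbfgs` solvers; this notebook uses `solver='liblinear'`.

The `sklearn.preprocessing.StandardScaler` class standardizes features by removing the mean and scaling to unit variance.

The `sklearn.compose.ColumnTransformer` class applies a different transformer to each group of columns: here `OneHotEncoder` encodes the categorical columns and `StandardScaler` scales the numeric ones. Columns that are not listed in any transformer are dropped by default.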
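A small, self-contained sketch of how these pieces fit together, using a hypothetical four-column frame. `ColumnTransformer` one-hot encodes two columns, standardizes one, and drops the unlisted `AGE` column (the default `remainder="drop"`). The sketch also checks that `StandardScaler` matches `(x - mean) / std` computed with the population standard deviation.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical mini dataset with the same kind of integer-coded columns
df = pd.DataFrame({
    "SEX": [0, 1, 0, 1, 1, 0],
    "OCCUPATION": [3, 1, 2, 3, 1, 2],
    "CAPITAL GAIN": [0.0, 500.0, 1200.0, 0.0, 300.0, 50.0],
    "AGE": [25, 40, 33, 51, 29, 45],   # not listed below, so it is dropped
})
y = np.array([0, 1, 1, 0, 1, 0])

preprocessor = ColumnTransformer([
    ("onehot", OneHotEncoder(handle_unknown="ignore"), ["SEX", "OCCUPATION"]),
    ("scaler", StandardScaler(), ["CAPITAL GAIN"]),
])  # remainder="drop" is the default

Xt = preprocessor.fit_transform(df)
print(Xt.shape)   # (6, 6): 2 SEX columns + 3 OCCUPATION columns + 1 scaled column

# StandardScaler uses the population standard deviation (ddof=0)
gain = df["CAPITAL GAIN"]
manual = (gain - gain.mean()) / gain.std(ddof=0)
scaled = StandardScaler().fit_transform(df[["CAPITAL GAIN"]]).ravel()
print(np.allclose(manual, scaled))   # True

# The pipeline fits the preprocessor and the classifier in one call
clf = Pipeline(steps=[("preprocessor", preprocessor),
                      ("classifier", LogisticRegression(solver="liblinear"))])
clf.fit(df, y)
print(clf.predict(df))
```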

In [None]:
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from scripts.utils import *

# Load the sampled data from the tabular dataset fetched above
synth_df, Y = prepareDataset(tabular)

# Split the dataset; A_train and A_test hold the sensitive features (SEX, RACE)
X_train, X_test, Y_train, Y_test, A_train, A_test = split_dataset(synth_df, Y)

# Set up the scikit-learn pipeline: one-hot encode the categorical columns and
# standardize the numeric ones (all other columns are dropped)
preprocessor = ColumnTransformer([
    ('onehot', OneHotEncoder(handle_unknown='ignore'),
     ['SEX', 'RACE', 'WORKCLASS', 'MARITAL STATUS', 'OCCUPATION', 'RELATIONSHIP']),
    ('scaler', StandardScaler(), ['CAPITAL GAIN', 'CAPITAL LOSS', 'EDUCATION-NUM'])])

clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('classifier', LogisticRegression(solver='liblinear', fit_intercept=True))])

# fit() returns the fitted pipeline itself, so model and clf refer to the same object
model = clf.fit(X_train, Y_train)
preds = clf.predict(X_test)
analyze_model(clf, X_test, Y_test, preds)

# Fairlearn

<img align="left" src="./images/RespML-FairnessBias.gif"/>


Artificial intelligence and machine learning systems can display unfair behavior.

Let's use the open-source Fairlearn Python package with Azure Machine Learning to perform the following tasks:

* Assess the fairness of your model predictions. To learn more about fairness in machine learning, see the fairness in machine learning article.
* Upload, list and download fairness assessment insights to/from Azure Machine Learning studio.
* See a fairness assessment dashboard in Azure Machine Learning studio to interact with your model(s)' fairness insights.

The `FairlearnDashboard` class wraps the dashboard component. It takes the sensitive features, the true labels, and one or more sets of predictions to compare.
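The core quantity behind demographic parity is the *selection rate*: the fraction of each group that receives a positive prediction. Below is a minimal pandas sketch on hypothetical predictions. Fairlearn also provides `fairlearn.metrics.demographic_parity_difference` for the same between-group gap.

```python
import pandas as pd

# Hypothetical predictions (1 = approved) and sensitive feature values
results = pd.DataFrame({
    "sex":  ["female", "female", "female", "female", "male", "male", "male", "male"],
    "pred": [1, 0, 0, 0, 1, 1, 1, 0],
})

# Selection rate = share of each group that receives a positive prediction
selection_rate = results.groupby("sex")["pred"].mean()
print(selection_rate)           # female: 0.25, male: 0.75

# Demographic parity difference = largest gap in selection rate between groups
dp_difference = selection_rate.max() - selection_rate.min()
print(dp_difference)            # 0.5
```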

In [None]:
from fairlearn.widget import FairlearnDashboard

FairlearnDashboard(sensitive_features=A_test,
                   sensitive_feature_names=['Sex', 'Race'],
                   y_true=Y_test.tolist(),
                   y_pred=[preds.tolist()])

# InterpretML

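As our next step, we explain the trained model with interpret-community's `KernelExplainer` and load the resulting global explanation into the Explanation Dashboard together with the test data. Because the explainer receives the pipeline's `preprocessor` as `transformations`, feature importances are reported for the raw input columns rather than for the one-hot encoded ones.

After the Explanation Dashboard has loaded, you can navigate through the user interface to identify the most important features of your new model.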

<img align="left" src="./images/RespML-Explainability.gif"/>
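`KernelExplainer` estimates SHAP values by sampling, which works for any model. For a linear model with independent features, the values have a closed form: in log-odds space, feature j contributes `coef_j * (x_j - mean_j)`. The sketch below uses that closed form on hypothetical data and aggregates a global importance as the mean absolute local importance, a common aggregation that we assume is similar in spirit to what the dashboard shows. It explains log-odds rather than predicted probabilities, so its numbers will not match the dashboard's.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical standardized features; only the first two influence the label
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = (2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=500) > 0).astype(int)
model = LogisticRegression().fit(X, y)

# Background (reference) point, analogous to summarizing initialization_examples
background = X.mean(axis=0)

# Closed-form Shapley values for a linear model (log-odds space): coef_j * (x_j - mean_j)
local_importance = model.coef_[0] * (X - background)

# Local values add up to the difference from the background prediction
baseline = model.decision_function(background.reshape(1, -1))[0]
print(np.allclose(local_importance.sum(axis=1) + baseline, model.decision_function(X)))

# Global importance: mean absolute local importance of each feature
global_importance = np.abs(local_importance).mean(axis=0)
print(global_importance)
```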

In [None]:
from interpret.ext.blackbox import KernelExplainer

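# Explain the logistic regression step of the pipeline; passing the preprocessor as
# `transformations` reports importances for the raw input columns
explainer = KernelExplainer(clf.steps[-1][1],
                            initialization_examples=X_train,
                            features=X_train.columns,
                            classes=['Rejected', 'Approved'],
                            transformations=preprocessor)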

global_explanation = explainer.explain_global(X_test)

In [None]:
from interpret_community.widget import ExplanationDashboard
ExplanationDashboard._cdn_path = "newDash2.js"
ExplanationDashboard(global_explanation, model, datasetX=X_test, trueY=Y_test)

# Error Analysis

The Error Analysis widget shows how a model's errors are distributed across the data, so that you can find cohorts whose error rate is higher than the model's overall rate. It also aids in debugging ML errors through interactive data exploration and interpretability techniques.
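The widget's tree and heat-map views automate a search for high-error cohorts. The sketch below does the same comparison by hand for one bucketed feature on hypothetical test-set rows, which shows the quantity being compared: each cohort's error rate against the overall error rate.

```python
import pandas as pd

# Hypothetical test-set slice: one feature, true labels and predictions
df = pd.DataFrame({
    "EDUCATION-NUM": [9, 9, 10, 13, 13, 14, 9, 10, 13, 14],
    "y_true":        [0, 1, 0, 1, 1, 1, 0, 1, 0, 1],
    "y_pred":        [0, 0, 0, 1, 1, 1, 1, 0, 0, 1],
})
df["error"] = (df["y_true"] != df["y_pred"]).astype(int)

# Bucket the feature into cohorts and compare each cohort's error rate with the overall rate
df["cohort"] = pd.cut(df["EDUCATION-NUM"], bins=[0, 10, 16], labels=["<=10", ">10"])
cohort_report = df.groupby("cohort", observed=True)["error"].agg(["count", "mean"])

print("overall error rate:", df["error"].mean())   # 0.3
print(cohort_report)                               # <=10: 0.6, >10: 0.0
```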

In [None]:
from raiwidgets import ErrorAnalysisDashboard

ErrorAnalysisDashboard(global_explanation, model, dataset=X_test, true_y=Y_test)

# Mitigation

Next, we train mitigated models on the same encoded data and split that the training script used, and prepare their predictions in the shape the Fairlearn Dashboard expects. Once the Fairlearn Dashboard has loaded, we can use its interface to check each model's predictions for unfairness across the sensitive groups.

## Mitigate training script

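The script below trains a set of new models under a fairness constraint to mitigate the unfairness identified above. Fairlearn's `GridSearch` reduction refits the estimator once for each of `grid_size` (here 70) Lagrange multiplier settings under a `DemographicParity` constraint on `SEX`, and stores each fitted model in `predictors_`.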

In [10]:
from fairlearn.reductions import DemographicParity, GridSearch
from sklearn.linear_model import LogisticRegression
import os
import pandas as pd
from scripts.utils import *

# Reuse X_train/X_test and A_train/A_test from the training cell instead of re-sampling,
# so the unmitigated and mitigated models are compared on the same test set

# Train one logistic regression per grid point under a demographic parity constraint on SEX
first_sweep = GridSearch(LogisticRegression(solver='liblinear', fit_intercept=True),
                         constraints=DemographicParity(),
                         grid_size=70)

first_sweep.fit(X_train, Y_train, sensitive_features=A_train.SEX)

predictors = first_sweep.predictors_

## Unfair model vs Mitigated model

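Below, we instantiate the Fairlearn Dashboard to compare the unmitigated model with the mitigated models and check the accuracy and disparity of each. To keep the comparison readable, the cell keeps only the *dominant* mitigated models (plus the unmitigated one): a model is dropped when another model has an equal or lower disparity and a strictly lower error.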
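To make the dominance rule concrete, here is a sketch that applies the same filter to hypothetical (error, disparity) pairs for five candidate models. Model `m3` is dropped because `m4` has both a lower error and a lower disparity; the helper name `dominant_models` is our own.

```python
import pandas as pd

# Hypothetical (error, disparity) pairs for five candidate models
results = pd.DataFrame({
    "model":     ["m0", "m1", "m2", "m3", "m4"],
    "error":     [0.20, 0.18, 0.25, 0.22, 0.19],
    "disparity": [0.02, 0.10, 0.01, 0.05, 0.03],
})


def dominant_models(df):
    """Keep a model only if no model with equal or lower disparity has a lower error."""
    keep = []
    for row in df.itertuples():
        errors_at_or_below = df["error"][df["disparity"] <= row.disparity]
        if row.error <= errors_at_or_below.min():
            keep.append(row.model)
    return keep


print(dominant_models(results))   # ['m0', 'm1', 'm2', 'm4']
```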

In [None]:
from fairlearn.widget import FairlearnDashboard
from fairlearn.reductions import DemographicParity, ErrorRate
import joblib

# Load the training data into the error and disparity moments once
error = ErrorRate()
error.load_data(X_train, pd.Series(Y_train), sensitive_features=A_train.SEX)
disparity = DemographicParity()
disparity.load_data(X_train, pd.Series(Y_train), sensitive_features=A_train.SEX)

# Training error and demographic parity disparity of each grid-search model
errors, disparities = [], []
for m in predictors:
    classifier = lambda X, m=m: m.predict(X)
    errors.append(error.gamma(classifier).iloc[0])
    disparities.append(disparity.gamma(classifier).max())

all_results = pd.DataFrame({"predictor": predictors, "error": errors, "disparity": disparities})

all_models_dict = {"loan_approval_unmitigated": model}
dominant_models_dict = {"loan_approval_unmitigated": model}
base_name_format = "loan_approval_grid_model_{0}"
for row_id, row in enumerate(all_results.itertuples()):
    model_name = base_name_format.format(row_id)
    all_models_dict[model_name] = row.predictor
    # Keep the model only if no model with equal or lower disparity has a lower error
    errors_for_lower_or_eq_disparity = all_results["error"][all_results["disparity"] <= row.disparity]
    if row.error <= errors_for_lower_or_eq_disparity.min():
        dominant_models_dict[model_name] = row.predictor

dashboard_all = dict()
models_all = dict()
for name, predictor in all_models_dict.items():
    value = predictor.predict(X_test)
    dashboard_all[name] = value
    models_all[name] = predictor
    
dominant_all = dict()
for n, p in dominant_models_dict.items():
    dominant_all[n] = p.predict(X_test)
    
FairlearnDashboard(sensitive_features=A_test, sensitive_feature_names=['Sex', 'Race'],
                   y_true=Y_test.tolist(), y_pred=dominant_all)

## Register all models

Register all models with Azure ML.

In [None]:
os.makedirs('models', exist_ok=True)

model_name_id_mapping = dict()
# Use a separate loop variable so the unmitigated `model` defined earlier is not overwritten
for name, mdl in models_all.items():
    m_id = register_model(name, mdl, ws)
    model_name_id_mapping[name] = m_id

dominant_all_ids = dict()
for name, y_pred in dominant_all.items():
    dominant_all_ids[model_name_id_mapping[name]] = y_pred

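## Register model 30

Register our preferred model, `loan_approval_grid_model_30`, with Azure ML. The previous cell already registered it once, so registering it again under the same name creates a new version of that model.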

In [None]:
# register_model() prints the coefficients of this model, serializes it with joblib
# and registers the .pkl file with Model.register
model_name = "loan_approval_grid_model_30"
registered_model_id = register_model(model_name, models_all[model_name], ws)

## Uploading a dashboard

We create a _dashboard dictionary_ using Fairlearn's `metrics` package. The private `_create_group_metric_set` function takes arguments similar to the Dashboard constructor, except that the sensitive features are passed as a dictionary (to ensure that names are available), and we must specify the type of prediction. Note that we pass the `dominant_all_ids` dictionary created above, which maps each registered model ID to its test-set predictions:

In [14]:
from azureml.contrib.fairness import upload_dashboard_dictionary, download_dashboard_by_upload_id
from fairlearn.metrics._group_metric_set import _create_group_metric_set

sf = { 'sex': A_test.SEX, 'race': A_test.RACE }
dash_dict_all = _create_group_metric_set(y_true=Y_test,
                                         predictions=dominant_all_ids,
                                         sensitive_features=sf,
                                         prediction_type='binary_classification')

## Upload Explanations

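The `Experiment` constructor takes the current workspace and an experiment name and returns an experiment instance. The cell below starts a run in that experiment, uploads the fairness dashboard dictionary to it, and then uploads the global explanation.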

The ExplanationClient object defines the client that uploads and downloads explanations. 

For more information on **ExplanationClient**, please visit: [Microsoft ExplanationClient Class Documentation](https://docs.microsoft.com/en-us/python/api/azureml-interpret/azureml.interpret.explanationclient?view=azure-ml-py)

In [None]:
from azureml.interpret import ExplanationClient
from azureml.core import Experiment

exp = Experiment(ws, "Loan_Approval_Exp")
print(exp)

run = exp.start_logging()
try:
    # Upload the fairness dashboard dictionary, then download it to verify the round trip
    dashboard_title = "Upload MultiAsset from Grid Search with Loan Approval dataset"
    upload_id = upload_dashboard_dictionary(run,
                                            dash_dict_all,
                                            dashboard_name=dashboard_title)
    print("\nUploaded to id: {0}\n".format(upload_id))

    downloaded_dict = download_dashboard_by_upload_id(run, upload_id)

    # Upload the global explanation to the same run before it is marked complete
    client = ExplanationClient.from_run(run)
    client.upload_model_explanation(global_explanation, comment="Loan Approval data global explanation")
finally:
    run.complete()