# Watson OpenScale Fairness Metrics and Transformers

## 1. Introduction <a name="introduction"></a>
This notebook computes the fairness metrics **Statistical Parity Difference** and **Smoothed Empirical Differential** on the model predictions, then shows how the **Fair Score Transformer** can be used to transform the model output for fairer predictions. It then computes **Performance measures** to evaluate the model performance and runs a **Multi-Dimensional Subset Scan** to search for potential biases.<br/>

This document contains the sections listed below. In the **Setup** section you will *`edit`* values and *`restart`* the notebook kernel:

- [1. Introduction](#introduction)
- [2. Setup Environments](#setup)
- [3. Statistical Parity Difference](#spd)
- [4. Smoothed Empirical Differential](#sed)
- [5. Fair Score Transformer](#fst)
- [6. Performance measures](#measures)
- [7. Multi-Dimensional Subset Scan](#mdss)

**Note:** This notebook should be run with a **Python 3.9.x** runtime. It requires service credentials for the following services:
  * Watson OpenScale <br/>
  * IBM Analytics Engine <br/>

## 2. Setup Environments <a name="setup"></a>

### 2.1 Package installation

*[Optional]* Ignore warning messages to make the output clearer.

In [None]:
import warnings
warnings.filterwarnings("ignore")

Install packages:

In [None]:
!pip install --upgrade ibm-watson-openscale --no-cache | tail -n 1
!pip install --upgrade ibm-metrics-plugin --no-cache | tail -n 1

**Action: Restart the kernel!**

### 2.2 Configure credentials for Watson OpenScale
Configure the credentials for Watson OpenScale in the authenticator, which will be used by the OpenScale client.

In [None]:
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

WOS_CREDENTIALS = {
    "url": "<cluster-url>",
    "username": "<username>",
    "password": "<password>",
    "instance_id": "<openscale instance id>"
}

authenticator = CloudPakForDataAuthenticator(
    url=WOS_CREDENTIALS["url"],
    username=WOS_CREDENTIALS["username"],
    password=WOS_CREDENTIALS["password"],
    disable_ssl_verification=True
)

### 2.3 Configure credentials for IBM Analytics Engine - Spark

Make sure that the Apache Spark manager on IBM Analytics Engine is running, and then provide the following details:

- IAE_SPARK_DISPLAY_NAME: _Display Name of the Spark instance in IBM Analytics Engine_
- IAE_SPARK_JOBS_ENDPOINT: _Spark Jobs Endpoint for IBM Analytics Engine_
- IBM_CPD_VOLUME: _IBM Cloud Pak for Data storage volume name_
- IBM_CPD_USERNAME: _IBM Cloud Pak for Data username_
- IBM_CPD_APIKEY: _IBM Cloud Pak for Data API key_
- IAE_INSTANCE_ID: _IBM Analytics Engine spark instance id_

In [None]:
IAE_SPARK_DISPLAY_NAME = "<spark-engine-name>"
IAE_SPARK_JOBS_ENDPOINT = "<spark-job-endpoint-for-ibm-analytics-engine>"
IBM_CPD_VOLUME = "<ibm-cpd-volume>"
IBM_CPD_USERNAME = "<ibm-cloud-pak-for-data-username>"
IBM_CPD_APIKEY = "<ibm-cloud-pak-for-data-apikey>"
IAE_INSTANCE_ID = "<ibm-analytics-engine-spark-instance-id>"

This credential information will be used to run the Spark job.

In [None]:
credentials = {
            "connection": {
                "endpoint": IAE_SPARK_JOBS_ENDPOINT,
                "location_type": "cpd_iae",
                "display_name": IAE_SPARK_DISPLAY_NAME,
                "instance_id": IAE_INSTANCE_ID,
                "volume": IBM_CPD_VOLUME
            },
            "credentials": {
                "username": IBM_CPD_USERNAME,
                "apikey":IBM_CPD_APIKEY
            }
        }

### 2.4 Configure resource setting for Spark job

To configure how much of your Spark Cluster resources this job can consume, edit the following values:

- max_num_executors: _Maximum Number of executors to launch for this session_
- min_executors: _Minimum Number of executors to launch for this session_
- executor_cores: _Number of cores to use for each executor_   
- executor_memory: _Amount of memory (in GBs) to use per executor process_
- driver_cores: _Number of cores to use for the driver process_
- driver_memory: _Amount of memory (in GBs) to use for the driver process_

This information will be configured into the Spark job parameters.

In [None]:
spark_parameters = {
                "max_num_executors": 4,
                "min_executors": 1,
                "executor_cores": 1,
                "executor_memory": 1,
                "driver_cores": 1,
                "driver_memory": 1
}
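
As a rough sanity check, the settings above imply an upper bound on the resources the job can request. The sketch below assumes the job may scale up to `max_num_executors` executors plus one driver; the helper name `max_resources` is hypothetical and is not part of any OpenScale or Spark API.

```python
def max_resources(params):
    """Upper bound on cores and memory (GB) for one driver plus all executors."""
    executors = params["max_num_executors"]
    return {
        "cores": executors * params["executor_cores"] + params["driver_cores"],
        "memory_gb": executors * params["executor_memory"] + params["driver_memory"],
    }


# Same values as the spark_parameters cell above.
example_parameters = {
    "max_num_executors": 4,
    "min_executors": 1,
    "executor_cores": 1,
    "executor_memory": 1,
    "driver_cores": 1,
    "driver_memory": 1,
}

print(max_resources(example_parameters))
```

With the default values this comes to 5 cores and 5 GB, which can be compared against the capacity of the Spark cluster before submitting the job.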

### 2.5 Configure storage parameters for Spark job

In this notebook, DB2 is used as the storage source of input datasets. 
#### Storage Inputs

Please enter a name and description for your JDBC Storage

- JDBC_CONNECTION_NAME: _Custom display name for the JDBC Storage Connection_
- JDBC_CONNECTION_DESCRIPTION: _Custom description for the JDBC Storage Connection_

To connect to your JDBC storage, you must provide the following details:

 - JDBC_HOST: Hostname of the JDBC Connection
 - JDBC_PORT: Port of the JDBC Connection
 - JDBC_USE_SSL: Boolean Flag to indicate whether to use SSL while connecting.
 - JDBC_SSL_CERTIFICATE: SSL Certificate [Base64 encoded string] of the JDBC Connection. Ignored if JDBC_USE_SSL is False.
 - JDBC_DRIVER: Class name of the JDBC driver to use to connect.
 - JDBC_USERNAME: Username of the JDBC Connection
 - JDBC_PASSWORD: Password of the JDBC Connection
 - JDBC_DATABASE_NAME: Name of the Database to connect to.

In [None]:
JDBC_HOST = "<Hostname of the JDBC Connection>"
JDBC_PORT = "<Port of the JDBC Connection>"
JDBC_USE_SSL = "<Boolean Flag to indicate whether to use SSL while connecting.>"
JDBC_SSL_CERTIFICATE = "<SSL Certificate [Base64 encoded string] of the JDBC Connection. Ignored if JDBC_USE_SSL is False.>"
JDBC_DRIVER = "<Class name of the JDBC driver to use to connect.>"
JDBC_USERNAME = "<Username of the JDBC Connection>"
JDBC_PASSWORD = "<Password of the JDBC Connection>"
JDBC_DATABASE_NAME = "<Name of the Database to connect to.>"

jdbc_url = "jdbc:db2://{}:{}/{}".format(JDBC_HOST, JDBC_PORT, JDBC_DATABASE_NAME)

This information will be configured into the `storage_details` section of the Spark job parameters.

In [None]:
storage_details = {
    "type": "jdbc",
    "connection": {
        "jdbc_driver": JDBC_DRIVER,
        "jdbc_url": jdbc_url,
        "use_ssl": JDBC_USE_SSL,
        "certificate": JDBC_SSL_CERTIFICATE,
        "location_type": "jdbc"
    },
    "credentials":{
        "username": JDBC_USERNAME,
        "password": JDBC_PASSWORD,
    }
}

#### Training table metadata

Each quality monitor can use its own training table from the database, so the table metadata is configured separately in each of the sections below.

### 2.6 Setup OpenScale client 
Set up a Python OpenScale client with the settings above.

In [None]:
from ibm_watson_openscale import APIClient as OpenScaleAPIClient

client = OpenScaleAPIClient(
    service_url=WOS_CREDENTIALS['url'],
    service_instance_id=WOS_CREDENTIALS["instance_id"],
    authenticator=authenticator
)

client.version

## 3. Statistical Parity Difference <a name="spd"></a>

**Statistical Parity Difference** (SPD) is a fairness metric that describes the fairness of the model predictions.
It is the difference between the rates of favourable outcomes in the unprivileged and privileged groups. It can
be computed from either the input dataset or the dataset output from a classifier (the predicted dataset). A value
of 0 implies that both groups benefit equally, a value less than 0 implies higher benefit for the privileged group, and a value greater than 0 implies higher benefit for the unprivileged group.<br>
$$P(Y=1 \mid D=\text{unprivileged}) - P(Y=1 \mid D=\text{privileged})$$

Take the German credit risk dataset as an example. If the user sets
+ the privileged group as Sex="male"
+ the unprivileged group as Sex="female"

and sets
+ the favourable label as Risk="No Risk"
+ the unfavourable label as Risk="Risk"

then the SPD result is interpreted as follows:
+ spd > 0 means the unprivileged group Sex="female" has a higher rate of receiving the favourable label "No Risk" than the privileged group Sex="male".
+ spd = 0 means the unprivileged group Sex="female" has the same rate of receiving the favourable label "No Risk" as the privileged group Sex="male".
+ spd < 0 means the unprivileged group Sex="female" has a lower rate of receiving the favourable label "No Risk" than the privileged group Sex="male".
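
The definition can be illustrated locally with a minimal sketch in plain Python. It is not the OpenScale implementation: the function name `statistical_parity_difference` and the toy rows are made up for illustration, and every row whose feature value is not in the privileged set is assumed to belong to the unprivileged group.

```python
def statistical_parity_difference(rows, feature, privileged_values, label, favourable_labels):
    """SPD = P(favourable | unprivileged) - P(favourable | privileged)."""
    privileged = [r for r in rows if r[feature] in privileged_values]
    unprivileged = [r for r in rows if r[feature] not in privileged_values]
    if not privileged or not unprivileged:
        raise ValueError("both groups must contain at least one row")

    def favourable_rate(group):
        # Fraction of rows in the group that carry a favourable label.
        return sum(r[label] in favourable_labels for r in group) / len(group)

    return favourable_rate(unprivileged) - favourable_rate(privileged)


# Toy data, invented for illustration (not the real German credit risk data).
rows = [
    {"Sex": "male", "Risk": "No Risk"},
    {"Sex": "male", "Risk": "No Risk"},
    {"Sex": "male", "Risk": "No Risk"},
    {"Sex": "male", "Risk": "Risk"},
    {"Sex": "female", "Risk": "No Risk"},
    {"Sex": "female", "Risk": "Risk"},
    {"Sex": "female", "Risk": "Risk"},
    {"Sex": "female", "Risk": "Risk"},
]

spd = statistical_parity_difference(rows, "Sex", {"male"}, "Risk", {"No Risk"})
print(spd)  # 0.25 - 0.75 = -0.5
```

Here the favourable rate is 0.25 for Sex="female" and 0.75 for Sex="male", so the result is negative and the privileged group benefits more.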


### 3.1 Prepare input for Statistical Parity Difference
The quality monitor stores metadata in the training table.

- TRAIN_SCHEMA_NAME: _Schema name where training table is present_
- TRAIN_TABLE_NAME: _Name of the training table_

In [None]:
TRAIN_SCHEMA_NAME = "***"
TRAIN_TABLE_NAME = "***"

This information will be configured into the `tables` section of the Spark job parameters.

In [None]:
tables = [
    {
        "database": JDBC_DATABASE_NAME,
        "schema": TRAIN_SCHEMA_NAME,
        "table": TRAIN_TABLE_NAME,
        "type": "training"
    }
]

tables

### 3.2  Statistical Parity Difference Configurations

Set up the configuration to compute *Statistical Parity Difference*.<br/>

Configure the label and problem type in the overall section.
- **problem_type(str)**: `binary` and `multi-classification` are supported.
- **label_column(str)**: Column name of the label in the data frame.

Inside `fairness`, as shown below, there are three sections that must be configured.
- **metrics_configuration(dict)**: Configure *Statistical Parity Difference* as one of the metrics with the name `FairnessMetricType.SPD.value`. It requires a `features` property that describes which features the metric will be computed on. *Statistical Parity Difference* can run with individual features (e.g. `[["a"],["b"]]`), but not with intersectional features (e.g. `[["a", "b"]]`).

- **protected_attributes(list)**: Describes the privileged group definition for the features on which this metric will be computed. Configure each feature with the following information:
  - feature(str): Name of the feature, which should be same as configured in `features` of `metrics_configuration` section.
  - reference_group(list): List of feature values which make a sample privileged. 

- **favourable_label(list)**: A list of favourable labels or outcomes of the model.


**Note:** The `label_column` used here is the newly added `pred` column.<br/>

In [None]:
from ibm_metrics_plugin.common.utils.constants import FairnessMetricType, MetricGroupType
spd_config = {
            "problem_type": "***",
            "label_column" : "***",
            "fairness": {
                            "metrics_configuration": {
                                FairnessMetricType.SPD.value: {
                                    "features": [ ["***"] ]                                
                                }
                            },
                            "protected_attributes": [
                                {
                                    "feature": "***",
                                    "reference_group": ["***"]
                                }
                            ],
                            "favourable_label": ["***"]
                        }
        }

spd_config

### 3.3 Compute Statistical Parity Difference
The IAE credentials and Spark job parameters are used here. The timeout is in seconds.

Add all configurations to the Spark job parameters.

In [None]:
job_params = {
            "spark_settings": spark_parameters,
            "arguments": {
                "monitoring_run_id": "my_monitoring_run_id",
                "storage": storage_details,
                "tables": tables,
                "metric_configuration": spd_config
            }
        }

job_params

In [None]:
response = client.ai_metrics.compute_metrics_as_job(credentials, job_params, timeout=600)
metrics = client.ai_metrics.get_job_output(response)

### 3.4 Check Statistical Parity Difference result
The feature and its statistical parity difference value will be stored as a pair under FairnessMetricType.

In [None]:
metrics

## 4. Smoothed Empirical Differential<a name="sed"></a>

**Smoothed Empirical Differential (SED)** is a fairness metric that describes the fairness of the model predictions. It quantifies the differential in the probability of favorable and unfavorable outcomes between the intersecting groups defined by the chosen features. All intersecting groups are treated equally; there are no unprivileged or privileged groups.

The SED value is the minimum ratio of the Dirichlet-smoothed probabilities of favorable and unfavorable outcomes between the different intersecting groups in the dataset. Its value is between 0 and 1, excluding 0 and 1. The bigger, the better.

Take the German credit risk dataset as an example, and assume:

+ the favorable outcomes of label column is "No Risk",
+ the unfavorable outcomes of label column is "Risk".

If the user divides the dataset by the *feature "Sex"*, there will be two intersecting groups:
+ intersecting group Sex="male" 
+ intersecting group Sex="female"

and assume:

+ the Dirichlet smoothed probability of favorable outcomes "No Risk" in intersecting group "Sex"="male" is 0.2
+ the Dirichlet smoothed probability of unfavorable outcomes "Risk" in intersecting group "Sex"="male" is 0.8
+ the Dirichlet smoothed probability of favorable outcomes "No Risk" in intersecting group "Sex"="female" is 0.4
+ the Dirichlet smoothed probability of unfavorable outcomes "Risk" in intersecting group "Sex"="female" is 0.6

Then calculate the per-outcome differential between the intersecting groups (*note that the smaller value is always the numerator and the bigger value is always the denominator*):

+ the favorable outcomes' differential between the intersecting groups "Sex"="male" and "Sex"="female" is 0.2/0.4=0.5
+ the unfavorable outcomes' differential between the intersecting groups "Sex"="male" and "Sex"="female" is 0.6/0.8=0.75

Then calculate the differential between the intersecting groups:
+ the differential between the intersecting groups "Sex"="male" and "Sex"="female" is min(0.5, 0.75)=0.5

Since there are only two intersecting groups:

+ the final differential of the dataset is 0.5.
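
The worked example can be reproduced with a short sketch. It is not the code used by the metrics plugin: the function names are hypothetical, and the smoothing formula `(n + alpha) / (N + alpha * number_of_outcomes)` with `alpha=1.0` is an assumption based on the usual Dirichlet (additive) smoothing; the plugin may use a different concentration parameter.

```python
from itertools import combinations


def dirichlet_smoothed(counts, alpha=1.0):
    """Smoothed outcome probabilities for one group (assumed additive smoothing)."""
    total = sum(counts.values()) + alpha * len(counts)
    return {outcome: (n + alpha) / total for outcome, n in counts.items()}


def smoothed_empirical_differential(probs_by_group):
    """Minimum, over all group pairs and outcomes, of smaller / bigger probability."""
    ratios = []
    for group_a, group_b in combinations(probs_by_group, 2):
        for outcome in probs_by_group[group_a]:
            p = probs_by_group[group_a][outcome]
            q = probs_by_group[group_b][outcome]
            ratios.append(min(p, q) / max(p, q))
    return min(ratios)


# Smoothed probabilities taken from the example above.
example = {
    "male": {"No Risk": 0.2, "Risk": 0.8},
    "female": {"No Risk": 0.4, "Risk": 0.6},
}

sed = smoothed_empirical_differential(example)
print(sed)
```

Because smoothing with a positive `alpha` keeps every probability above zero, the ratio is always defined. With more than two intersecting groups, the same function takes the minimum over every pair of groups.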

*References: James R. Foulds, Rashidul Islam, Kamrun Naher Keya, Shimei Pan, "An Intersectional Definition of Fairness", Department of Information Systems, University of Maryland, Baltimore County, USA*


### 4.1 Prepare input for Smoothed Empirical Differential 
Update the schema and table for Smoothed Empirical Differential in the Spark job parameters.

In [None]:
TRAIN_SCHEMA_NAME = "***"
TRAIN_TABLE_NAME = "***"

In [None]:
tables = [
    {
        "database": JDBC_DATABASE_NAME,
        "schema": TRAIN_SCHEMA_NAME,
        "table": TRAIN_TABLE_NAME,
        "type": "training"
    }
]
tables

### 4.2 Smoothed Empirical Differential Configurations 

Configure the label and problem type in the overall section.
- **problem_type(str)**: `binary` and `multi-classification` are supported.
- **label_column(str)**: Column name of the label in the data frame.

Inside `fairness`, as shown below, there are three sections that must be configured.
- **metrics_configuration(dict)**: Configure *Smoothed Empirical Differential* as one of the metrics with the name `FairnessMetricType.SED.value`. It requires a `features` property that describes which features the metric will be computed on. *Smoothed Empirical Differential* can run with individual features (e.g. `[["a"],["b"]]`) and with intersectional features (e.g. `[["a", "b"]]`).

- **protected_attributes(list)**: Describes the protected features on which this metric will be computed. Configure each feature with the following information:
  - feature(str): Name of the feature, which should be same as configured in `features` of `metrics_configuration` section.

- **favourable_label(list)**: A list of favourable labels or outcomes of the model.

Update the Smoothed Empirical Differential configuration, which will be added to the Spark job parameters.

In [None]:
sed_config = {
            "problem_type":"binary",
            "label_column" : "***",
            "fairness": {
                            "metrics_configuration": {
                                FairnessMetricType.SED.value: {
                                    "features": [ ["***"] ]                                
                                }
                            },
                            "protected_attributes": [
                                {
                                    "feature": "***"
                                }
                            ],
                            "favourable_label": ["***"]
                        }
        }

sed_config

### 4.3 Compute Smoothed Empirical Differential
Add all configurations to the Spark job parameters.

In [None]:
job_params = {
            "spark_settings": spark_parameters,
            "arguments": {
                "monitoring_run_id": "my_monitoring_run_id",
                "storage": storage_details,
                "tables": tables,
                "metric_configuration": sed_config
            }
        }

job_params

In [None]:
response = client.ai_metrics.compute_metrics_as_job(credentials, job_params, timeout=600)
metrics = client.ai_metrics.get_job_output(response)

### 4.4 Check Smoothed Empirical Differential result
The features and smoothed empirical differential values will be stored as a pair under FairnessMetricType.

In [None]:
metrics

## 5. Fair Score Transformer <a name="fst"></a>

**Fair Score Transformer** can be used as a post-processing technique that transforms the probability estimates (or scores) of a `probabilistic binary classification` model with respect to fairness goals such as statistical parity or equalized odds. To use the **Fair Score Transformer** in OpenScale, you first need to train a **Fair Score Transformer** and then use it to transform scores.

*References: D. Wei, K. Ramamurthy, and F. Calmon, "Optimized Score Transformation for Fair Classification", International Conference on Artificial Intelligence and Statistics, 2020.* 

### 5.1 Prepare input for Fair Score Transformer

To train a Fair Score Transformer, the following columns in the dataframe are used:<br/>
***estimate column***: contains the estimate values calculated by the trained classification model.<br/>
***protected attribute column***: contains the corresponding protected attributes the trained classification model uses.<br/>
***label column (optional)***: contains the ground truth values for the estimate column. It is not required to train the transformer, but it is required to compute accuracy with the trained transformer.<br/>

In [None]:
TRAIN_SCHEMA_NAME = "***"
TRAIN_TABLE_NAME = "***"

ESTIMATE_COLUMN = "***"
PROTECTED_ATTRIBUTE_COLUMN = "***"
LABEL_COLUMN = "***"

Add this information to the Spark job parameters.

In [None]:
tables = [
    {
        "database": JDBC_DATABASE_NAME,
        "schema": TRAIN_SCHEMA_NAME,
        "table": TRAIN_TABLE_NAME,
        "type": "training"
    }
]

### 5.2 Fair Score Transformer Configuration

Set up the configuration to fit the **Fair Score Transformer**. Inside `metrics_configuration`, as shown below, specify the name of the transformer with `FairnessMetricType.FST.value`. To configure it, you need to provide the `params` and `features` information described below. This notebook transforms scores with respect to the **Statistical Parity Difference** fairness goal (set `criteria` to `MSP`).

- **params**: Parameters of Fair Score Transformer
  - epsilon (float): Bound on mean statistical parity or mean equalized odds.
  - criteria (str): Optimize for mean statistical parity ("MSP") or mean equalized odds ("MEO").
  - Aprobabilistic (bool): Indicator of whether actual protected attribute values (False) or probabilistic estimates (True) are provided. Default False.
  - iterMax (float): Maximum number of ADMM iterations. Default 1e3.
- **features**: Columns definition in the dataframe
  - probabilities: Column name of probability estimates.
  - protected: Column name of protected attributes.


In [None]:
columns = {"probabilities": ESTIMATE_COLUMN, "protected": PROTECTED_ATTRIBUTE_COLUMN}
fst_configuration = {
    "fairness": {
        "metrics_configuration": {
            FairnessMetricType.FST.value: {
                "params": {
                    "epsilon": 0.01,
                    "criteria": "MSP",
                    "Aprobabilistic": False,
                    "iterMax": 1e3
                },
                "features": columns
            }
        }     
    }
}

fst_configuration

### 5.3 Fit Fair Score Transformer
Add all configurations to the Spark job parameters.

In [None]:
job_params = {
            "spark_settings": spark_parameters,
            "arguments": {
                "monitoring_run_id": "my_monitoring_run_id",
                "storage": storage_details,
                "tables": tables,
                "metric_configuration": fst_configuration
            }
        }

job_params

In [None]:
fst = client.ai_metrics.fit_transformer_as_job(credentials, job_params, timeout=600)

### 5.4 Use the Trained Fair Score Transformer

#### Compute transformed estimates with Fair Score Transformer
The trained transformer can be used to compute new probability estimates. It requires exactly the same columns as the fitting phase.<br/>

**Note:** No matter what column name is used for the existing probability estimates, the new probability estimates column will be named as **r_transformed**.

In [None]:
# SPARK_DF = spark.createDataFrame("***")
# probs_df = fst.predict_proba(spark, SPARK_DF, columns, keep_cols=LABEL_COLUMN)
# probs_df.show()

#### Compute new labels based on transformed estimates with Fair Score Transformer

The trained transformer can also be used to compute new class labels based on the transformed probability estimates. It requires exactly the same columns as the fitting phase.

**Note:** No matter what column name is used for the `label` column, the new class labels column will be named as **r_transformed_thresh**.

In [None]:
# SPARK_DF = spark.createDataFrame("***")
# predict_df = fst.predict(spark, SPARK_DF, columns, keep_cols=LABEL_COLUMN)
# predict_df.show()

#### Save the trained Fair Score Transformer

In [None]:
# import pickle
# pickle.dump(fst, open("fst.pkl", "wb"))

## 6. Performance measures <a name="measures"></a>


#### Confusion matrix

*Confusion matrix* is used to measure the performance of the classification model. It has a table of 4 different combinations. 

|  | Confusion Matrix |  |
| :-: | :-: | :-: |
| Actual\Predicted | Negative | Positive |
| Negative | TN | FP |
| Positive | FN | TP |

Two things to note in the table above:

    Predicted values: Values that are predicted by the model.
    Actual values: Values that are actually in the dataset.

Consider binary classification: positive points belong to the positive class and negative points to the negative class. The four cells of the table can then be understood as follows:

    True Positive (TP): Values that are actually positive and predicted positive.
    False Positive (FP): Values that are actually negative but predicted positive.
    False Negative (FN): Values that are actually positive but predicted negative.
    True Negative (TN): Values that are actually negative and predicted negative.
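
The four cells can be counted directly from paired actual and predicted labels. The sketch below is a plain-Python illustration with a hypothetical helper name `confusion_counts` and invented toy labels; it is independent of the OpenScale job.

```python
def confusion_counts(actual, predicted, positive):
    """Count TP, FP, FN and TN for a binary problem."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: correct if the actual value is positive too.
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted negative: a miss if the actual value is positive.
            counts["FN" if a == positive else "TN"] += 1
    return counts


# Toy labels, invented for illustration (1 = positive, 0 = negative).
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

print(confusion_counts(actual, predicted, positive=1))
# {'TP': 3, 'FP': 1, 'FN': 1, 'TN': 3}
```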
  

#### Performance measures

A rate is a measure derived from the confusion matrix. There are 4 basic types:

    True Positive Rate(TPR): True Positive/All Positive. The bigger, the better.

![title](https://latex.codecogs.com/svg.image?TPR&space;=&space;\frac{TP}{P}) 

    False Positive Rate(FPR): False Positive/All Negative. The smaller, the better.

![title](https://latex.codecogs.com/svg.image?FPR&space;=&space;\frac{FP}{N}) 

    False Negative Rate(FNR): False Negative/All Positive. The smaller, the better.

![title](https://latex.codecogs.com/svg.image?FNR&space;=&space;\frac{FN}{P})

    True Negative Rate(TNR): True Negative/All Negative. The bigger, the better.

![title](https://latex.codecogs.com/svg.image?TNR&space;=&space;\frac{TN}{N}) 

and 3 variant types:

    False Discovery Rate(FDR): False Positive/(True Positive+False Positive). The smaller, the better.
![title](https://latex.codecogs.com/svg.image?FDR&space;=&space;\frac{FP}{TP&plus;FP})

    False Omission Rate(FOR): False Negative/(True Negative+False Negative). The smaller, the better.
![title](https://latex.codecogs.com/svg.image?FOR&space;=&space;\frac{FN}{TN&plus;FN})

    Error Rate(ER): (False Positive+False Negative)/(All Positive + All Negative). The smaller, the better.
![title](https://latex.codecogs.com/svg.image?ER&space;=&space;\frac{FP&plus;FN}{P&plus;N}) 
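
The seven rates follow directly from the four confusion matrix counts. This is a minimal sketch with a hypothetical helper name `performance_rates`; it assumes every denominator is non-zero and raises `ZeroDivisionError` otherwise.

```python
def performance_rates(tp, fp, fn, tn):
    """Compute the basic and variant rates from confusion matrix counts."""
    positives = tp + fn  # all actual positives (P)
    negatives = tn + fp  # all actual negatives (N)
    return {
        "TPR": tp / positives,
        "FPR": fp / negatives,
        "FNR": fn / positives,
        "TNR": tn / negatives,
        "FDR": fp / (tp + fp),
        "FOR": fn / (tn + fn),
        "ER": (fp + fn) / (positives + negatives),
    }


# Counts invented for illustration.
rates = performance_rates(tp=3, fp=1, fn=1, tn=3)
print(rates)
```

For these counts TPR and TNR are 0.75 and all other rates are 0.25. Note that TPR + FNR = 1 and FPR + TNR = 1 by construction.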


#### Performance measure metrics

Based on the performance measures, there are 12 metrics to measure model fairness:

**false_positive_rate_difference**: Returns the difference in FPR for the unprivileged and privileged groups. 
![title](https://latex.codecogs.com/svg.image?FPR_{D=unprivileged}-FPR_{D=privileged})

Since a smaller FPR means better performance:

    A value greater than 0 indicates that the privileged group has an advantage in performance.
    A value lower than 0 indicates that the unprivileged group has an advantage in performance.
    A value of 0 indicates that the unprivileged and privileged groups have the same performance (no bias).
    The closer to 0, the better.

**false_positive_rate_ratio**: Returns the ratio of FPR for the unprivileged and privileged groups. 

![title](https://latex.codecogs.com/svg.image?\frac{FPR_{D=unprivileged}}{FPR_{D=privileged}})

Since a smaller FPR means better performance:

    A value greater than 1 indicates that the privileged group has an advantage in performance.
    A value lower than 1 indicates that the unprivileged group has an advantage in performance.
    A value of 1 indicates that the unprivileged and privileged groups have the same performance (no bias).
    The closer to 1, the better.

**false_negative_rate_difference**: Returns the difference in FNR for the unprivileged and privileged groups. 

![title](https://latex.codecogs.com/svg.image?FNR_{D=unprivileged}-FNR_{D=privileged})

Since a smaller FNR means better performance:

    A value greater than 0 indicates that the privileged group has an advantage in performance.
    A value lower than 0 indicates that the unprivileged group has an advantage in performance.
    A value of 0 indicates that the unprivileged and privileged groups have the same performance (no bias).
    The closer to 0, the better.

**false_negative_rate_ratio**: Returns the ratio of FNR for the unprivileged and privileged groups. 

![title](https://latex.codecogs.com/svg.image?\frac{FNR_{D=unprivileged}}{FNR_{D=privileged}})

Since a smaller FNR means better performance:

    A value greater than 1 indicates that the privileged group has an advantage in performance.
    A value lower than 1 indicates that the unprivileged group has an advantage in performance.
    A value of 1 indicates that the unprivileged and privileged groups have the same performance (no bias).
    The closer to 1, the better.

**false_discovery_rate_difference**: Returns the difference in FDR for the unprivileged and privileged groups. 

![title](https://latex.codecogs.com/svg.image?FDR_{D=unprivileged}-FDR_{D=privileged})

Since a smaller FDR means better performance:

    A value greater than 0 indicates that the privileged group has an advantage in performance.
    A value lower than 0 indicates that the unprivileged group has an advantage in performance.
    A value of 0 indicates that the unprivileged and privileged groups have the same performance (no bias).
    The closer to 0, the better.

**false_discovery_rate_ratio**: Returns the ratio of FDR for the unprivileged and privileged groups. 

![title](https://latex.codecogs.com/svg.image?\frac{FDR_{D=unprivileged}}{FDR_{D=privileged}})

Since a smaller FDR means better performance:

    A value greater than 1 indicates that the privileged group has an advantage in performance.
    A value lower than 1 indicates that the unprivileged group has an advantage in performance.
    A value of 1 indicates that the unprivileged and privileged groups have the same performance (no bias).
    The closer to 1, the better.

**false_omission_rate_difference**: Returns the difference in FOR for the unprivileged and privileged groups. 

![title](https://latex.codecogs.com/svg.image?FOR_{D=unprivileged}-FOR_{D=privileged})

Since a smaller FOR means better performance:

    A value greater than 0 indicates that the privileged group has an advantage in performance.
    A value lower than 0 indicates that the unprivileged group has an advantage in performance.
    A value of 0 indicates that the unprivileged and privileged groups have the same performance (no bias).
    The closer to 0, the better.

**false_omission_rate_ratio**: Returns the ratio of FOR for the unprivileged and privileged groups. 

![title](https://latex.codecogs.com/svg.image?\frac{FOR_{D=unprivileged}}{FOR_{D=privileged}})

Since a smaller FOR means better performance:

    A value greater than 1 indicates that the privileged group has an advantage in performance.
    A value lower than 1 indicates that the unprivileged group has an advantage in performance.
    A value of 1 indicates that the unprivileged and privileged groups have the same performance (no bias).
    The closer to 1, the better.

**error_rate_difference**: Returns the difference in ER for the unprivileged and privileged groups. 

![title](https://latex.codecogs.com/svg.image?ER_{D=unprivileged}-ER_{D=privileged})

Since a smaller ER means better performance:

    A value greater than 0 indicates that the privileged group has an advantage in performance.
    A value lower than 0 indicates that the unprivileged group has an advantage in performance.
    A value of 0 indicates that the unprivileged and privileged groups have the same performance (no bias).
    The closer to 0, the better.

**error_rate_ratio**: Returns the ratio of ER for the unprivileged and privileged groups. 

![title](https://latex.codecogs.com/svg.image?\frac{ER_{D=unprivileged}}{ER_{D=privileged}}) 

Since a smaller ER means better performance:

    A value greater than 1 indicates that the privileged group has an advantage in performance.
    A value lower than 1 indicates that the unprivileged group has an advantage in performance.
    A value of 1 indicates that the unprivileged and privileged groups have the same performance (no bias).
    The closer to 1, the better.

**average_odds_difference**: Returns the average of the difference in FPR and TPR for the unprivileged and privileged groups.

![title](https://latex.codecogs.com/svg.image?\frac{FPR_{D=unprivileged}-FPR_{D=privileged}&plus;TPR_{D=unprivileged}-TPR_{D=privileged}}{2})

Since a smaller FPR means better performance, while a bigger TPR means better performance:

    A value not equal to 0 indicates that the unprivileged and privileged groups have different performance.
    A value of 0 indicates that the unprivileged and privileged groups have the same performance (no bias).
    The closer to 0, the better.

**average_abs_odds_difference**: Returns the average of the absolute difference in FPR and TPR for the unprivileged and privileged groups.

![title](https://latex.codecogs.com/svg.image?\frac{|FPR_{D=unprivileged}-FPR_{D=privileged}|&plus;|TPR_{D=unprivileged}-TPR_{D=privileged}|}{2})

Since absolute values are used, the result cannot be negative:

    A value greater than 0 indicates that the unprivileged and privileged groups have different performance.
    A value of 0 indicates that the unprivileged and privileged groups have the same performance (no bias).
    The closer to 0, the better.
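
The difference, ratio, and odds metrics above can be sketched from per-group rates. The helper names and the rate values below are hypothetical and chosen only to make the arithmetic easy to follow; they are not output from the OpenScale job.

```python
def rate_difference(unprivileged, privileged, name):
    """Difference of one rate: unprivileged minus privileged."""
    return unprivileged[name] - privileged[name]


def rate_ratio(unprivileged, privileged, name):
    """Ratio of one rate: unprivileged divided by privileged."""
    return unprivileged[name] / privileged[name]


def average_odds_difference(unprivileged, privileged):
    """Mean of the signed FPR and TPR differences."""
    return (rate_difference(unprivileged, privileged, "FPR")
            + rate_difference(unprivileged, privileged, "TPR")) / 2


def average_abs_odds_difference(unprivileged, privileged):
    """Mean of the absolute FPR and TPR differences."""
    return (abs(rate_difference(unprivileged, privileged, "FPR"))
            + abs(rate_difference(unprivileged, privileged, "TPR"))) / 2


# Per-group rates, invented for illustration.
unprivileged_rates = {"FPR": 0.5, "TPR": 0.5}
privileged_rates = {"FPR": 0.25, "TPR": 0.75}

print(rate_difference(unprivileged_rates, privileged_rates, "FPR"))       # 0.25
print(rate_ratio(unprivileged_rates, privileged_rates, "FPR"))            # 2.0
print(average_odds_difference(unprivileged_rates, privileged_rates))      # 0.0
print(average_abs_odds_difference(unprivileged_rates, privileged_rates))  # 0.25
```

In this example the signed average is 0 even though both rates differ between the groups, because the FPR and TPR differences cancel. The absolute variant does not cancel, so it is worth checking both values together.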
  

### Configure Performance Measures

To setup configurations to compute Performance Measures, the Top level parameters:

- problem_type(enum): problem type. Possible values are "binary", "multiclass", "regression".
- label_column: the column where OpenScale to get the label value.
- fairness(dict(dict)): parameters for computing fairness metrics.

Parameters in the fairness body:
- metrics_configuration(dict(dict)): the metrics configuration. It is a dict whose keys are FairnessMetricType values and whose values are dicts listing the metrics to calculate.
- protected_attributes: the list of columns that are kept and protected when processing data. 
- favourable_label: the positive outcomes in OpenScale, which can be found in label_column and predict_column.
- unfavourable_label: the negative outcomes in OpenScale, which can be found in label_column and predict_column.

Metric parameters in metrics_configuration:
- features(list): the columns used to identify the unprivileged/privileged groups. Each entry must contain one feature at a time, or the calculation will fail.
- predict_column: the column from which OpenScale reads the model prediction output.


Attribute parameters in protected_attributes:
- feature: the feature column that needs to be protected.
- reference_group: the privileged groups, which have a systematic advantage; their values can be found in the feature column.
- monitored_group: the unprivileged groups, which are at a systematic disadvantage; their values can be found in the feature column.
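
Because a malformed configuration only fails once the Spark job runs, it can help to check the dictionary locally first. The following is a minimal sketch, not part of the OpenScale SDK: `check_measure_config` is a hypothetical helper, and the rule that every feature used by a metric should also appear in `protected_attributes` is an assumption based on the parameter descriptions above.

```python
def check_measure_config(config):
    """Return a list of human-readable problems found in a measures configuration."""
    fairness = config["fairness"]
    protected = {attr["feature"] for attr in fairness["protected_attributes"]}
    problems = []
    # The outer key is the FairnessMetricType value; only its contents matter here.
    for metrics in fairness["metrics_configuration"].values():
        for metric_name, metric in metrics.items():
            for feature_list in metric["features"]:
                # Each entry must name exactly one feature.
                if len(feature_list) != 1:
                    problems.append(
                        f"{metric_name}: expected one feature per entry, got {feature_list}"
                    )
                for feature in feature_list:
                    # Assumption: groups are looked up in protected_attributes.
                    if feature not in protected:
                        problems.append(
                            f"{metric_name}: feature '{feature}' has no protected_attributes entry"
                        )
    return problems
```

A usage example, with made-up column names and a placeholder outer key standing in for the real FairnessMetricType value:

```python
sample_config = {
    "fairness": {
        "metrics_configuration": {
            "placeholder_measures_key": {
                "error_rate_ratio": {"features": [["Sex"], ["Age", "Sex"]], "predict_column": "prediction"},
                "error_rate_difference": {"features": [["Income"]], "predict_column": "prediction"},
            }
        },
        "protected_attributes": [
            {"feature": "Sex", "reference_group": ["male"], "monitored_group": ["female"]},
            {"feature": "Age", "reference_group": [[26, 75]], "monitored_group": [[18, 25]]},
        ],
    }
}

for problem in check_measure_config(sample_config):
    print(problem)
```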

### 6.1 Prepare input to compute Performance Measures
The quality monitor stores metadata in the training table.

- TRAIN_SCHEMA_NAME: _Schema name where training table is present_
- TRAIN_TABLE_NAME: _Name of the training table_

In [None]:
TRAIN_SCHEMA_NAME = "******"
TRAIN_TABLE_NAME = "******"

In [None]:
tables = [
    {
        "database": JDBC_DATABASE_NAME,
        "schema": TRAIN_SCHEMA_NAME,
        "table": TRAIN_TABLE_NAME,
        "type": "training"
    }
]
tables

### 6.2 Configure metrics of Performance measures

In [None]:
from ibm_metrics_plugin.common.utils.constants import FairnessMetricType, MetricGroupType

measure_config = {
            "problem_type": "binary",
            "label_column": "***",
            "fairness": {
                "metrics_configuration": {
                    FairnessMetricType.MEASURES.value: {
                        "average_odds_difference": {
                            "features": [["***"], ["***"]],
                            "predict_column": "***"
                        },
                        "average_abs_odds_difference": {
                            "features": [["***"], ["***"]],
                            "predict_column": "***"
                        },
                        "false_negative_rate_difference": {
                            "features": [["***"], ["***"]],
                            "predict_column": "***"
                        },
                        "false_negative_rate_ratio": {
                            "features": [["***"], ["***"]],
                            "predict_column": "***"
                        },
                        "false_positive_rate_difference": {
                            "features": [["***"]],  
                            "predict_column": "***"
                        },
                        "false_positive_rate_ratio": {
                            "features": [["***"], ["***"]],
                            "predict_column": "***"
                        },
                        "false_discovery_rate_ratio": {
                            "features": [["***"], ["***"]],
                            "predict_column": "***"
                        },
                        "false_discovery_rate_difference": {
                            "features": [["***"]],  
                            "predict_column": "***"
                        },
                        "false_omission_rate_difference": {
                            "features": [["***"], ["***"]],
                            "predict_column": "***"
                        },
                        "false_omission_rate_ratio": {
                            "features": [["***"], ["***"]],
                            "predict_column": "***"
                        },
                        "error_rate_difference": {
                            "features": [["***"], ["***"]],
                            "predict_column": "***"
                        },
                        "error_rate_ratio": {
                            "features": [["***"], ["***"]],
                            "predict_column": "***"
                        }
                    }
                },
                "protected_attributes": [
                    {
                        "feature": "***",
                        "reference_group": ["***"],
                        "monitored_group": ["***"]
                    },
                    {
                        "feature": "***",
                        "reference_group": ***,
                        "monitored_group": ***
                    }
                ],
                "favourable_label": ["***"],
                "unfavourable_label": ["***"]
            }
}

measure_config

### 6.3 Compute performance measures
Add all configurations to the Spark job parameters.

In [None]:
job_params = {
            "spark_settings": spark_parameters,
            "arguments": {
                "monitoring_run_id": "my_monitoring_run_id",
                "storage": storage_details,
                "tables": tables,
                "metric_configuration": measure_config
            }
        }

job_params

In [None]:
response = client.ai_metrics.compute_metrics_as_job(credentials, job_params)
metrics = client.ai_metrics.get_job_output(response)

### 6.4 Check Performance Measures result
Each feature and its performance measure values are stored as a pair under FairnessMetricType.

In [None]:
metrics
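
The exact layout of the returned `metrics` object is not described here, so the following sketch makes no assumption about it beyond it being built from nested dicts and lists. `flatten_metrics` is a hypothetical helper, not an SDK function; it turns any such structure into `(path, value)` rows, which can make a deeply nested result easier to scan.

```python
def flatten_metrics(obj, prefix=""):
    """Flatten nested dicts/lists into a list of (dotted_path, leaf_value) tuples."""
    rows = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            rows.extend(flatten_metrics(value, path))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            rows.extend(flatten_metrics(value, f"{prefix}[{index}]"))
    else:
        rows.append((prefix, obj))
    return rows

# Made-up structure, used only to demonstrate the helper.
example = {"group": {"metric_a": [{"feature": "x", "value": 0.1}], "metric_b": 2}}
for path, value in flatten_metrics(example):
    print(path, "=", value)
```

In the notebook, the same call would be `flatten_metrics(metrics)`.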

## 7. Multi-Dimensional Subset Scan <a name="mdss"></a>

Multi-Dimensional Subset Scan defines a general bias scan method to detect and identify which subgroup(s) of features have statistically significant predictive bias for a probabilistic binary classifier.

References: Zhang, Z., & Neill, D. B. (2016). Identifying significant predictive bias in classifiers. arXiv preprint arXiv:1611.08292.

### 7.1 Prepare input for Multi-Dimensional Subset Scan
The quality monitor stores metadata in the training table.

- TRAIN_SCHEMA_NAME: _Schema name where training table is present_
- TRAIN_TABLE_NAME: _Name of the training table_

In [None]:
TRAIN_SCHEMA_NAME = "***"
TRAIN_TABLE_NAME = "***"

In [None]:
tables = [
    {
        "database": JDBC_DATABASE_NAME,
        "schema": TRAIN_SCHEMA_NAME,
        "table": TRAIN_TABLE_NAME,
        "type": "training"
    }
]
tables

### 7.2 Multi-Dimensional Subset Scan Configurations

Set up the configuration to compute *Multi-Dimensional Subset Scan*.<br/>

#### Multi-Dimensional Subset Scan parameters:

Configure the label at the top level.
- **label_column(str)**: Column name of the label in the data frame

Inside `fairness`, the following sections must be configured.
- **metrics_configuration(dict)**: Configure *Multi-Dimensional Subset Scan* as one of the metrics under the name `FairnessMetricType.MDSS.value`. It accepts the following properties, all of which have default values.   
`expectation_column` specifies which column holds the expectations. By default, the mean of the label column over the entire population is used.      
`penalty` specifies the penalty coefficient. The default is 1.   
`num_iters` specifies the number of iterations. The default is 1.     
`direction` specifies the desired direction (positive or negative). The default is "positive".           
`scoreFunction` specifies which scoring function is used to score a subset of records. The default is "Bernoulli". The "BerkJones" scoring function requires binary classification where the expectation is constant across all data records.   
`alpha` specifies the alpha threshold used to compute the score when scoreFunction is Bernoulli. The default is 0.2.   

- **favourable_label(list)**: A list of favourable labels or outcomes of the model.

This information is passed in the metric_configuration section of the Spark job parameters.

In [None]:
from ibm_metrics_plugin.common.utils.constants import FairnessMetricType, MetricGroupType

mdss_config = {
    "label_column": "***",
    "fairness": {
        "metrics_configuration": {
            FairnessMetricType.MDSS.value: {
                "expectation_column": "SCORES",
                "penalty": 1,
                "num_iters": 1,
                "direction": "negative",
                "scoreFunction": "Bernoulli",
                "alpha": 0.4
            }
        },
        "favourable_label": [1]
    }
}

mdss_config
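
To make the roles of `penalty`, `direction`, and the "Bernoulli" score function more concrete, here is a minimal sketch of the Bernoulli log-likelihood-ratio score described in the referenced Zhang and Neill paper. It is an illustration in plain Python, not the implementation used by the metrics plugin: the function names, the grid search over the odds multiplier `q`, and the grid bounds are all assumptions made for this example.

```python
import math

def bernoulli_score(observed, expected, q, penalty=0.0):
    """Score one subset for a given odds multiplier q.

    observed: 0/1 outcomes of the records in the subset
    expected: predicted probabilities for the same records
    """
    log_norm = sum(math.log(1.0 - p + q * p) for p in expected)
    return sum(observed) * math.log(q) - log_norm - penalty

def best_score(observed, expected, direction="positive", penalty=0.0, steps=200):
    """Maximise the score over a simple grid of q values (grid bounds are arbitrary)."""
    if direction == "positive":
        # q > 1: more favourable outcomes observed than expected
        grid = [1.0 + 9.0 * i / steps for i in range(1, steps + 1)]
    else:
        # 0 < q < 1: fewer favourable outcomes observed than expected
        grid = [i / (steps + 1) for i in range(1, steps + 1)]
    return max(bernoulli_score(observed, expected, q, penalty) for q in grid)

# Made-up subset: 3 of 4 records are positive, but the model expected 25% each.
observed = [1, 1, 1, 0]
expected = [0.25, 0.25, 0.25, 0.25]

print("positive direction:", round(best_score(observed, expected, "positive"), 3))
print("negative direction:", round(best_score(observed, expected, "negative"), 3))
```

In this example the subset scores well above 0 in the positive direction and below 0 in the negative direction, which matches the intuition that the model under-predicts for these records. A larger `penalty` lowers every score by the same amount.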

### 7.3 Perform Multi-Dimensional Subset Scan
The IAE credentials and Spark job parameters are used here.

In [None]:
job_params = {
            "spark_settings": spark_parameters,
            "arguments": {
                "monitoring_run_id": "my_monitoring_run_id",
                "storage": storage_details,
                "tables": tables,
                "metric_configuration": mdss_config
            }
        }

job_params

In [None]:
response = client.ai_metrics.compute_metrics_as_job(credentials, job_params, timeout=600)
metrics = client.ai_metrics.get_job_output(response)

### 7.4 Check Multi-Dimensional Subset Scan result
Each subset and its score from the Multi-Dimensional Subset Scan are stored as a pair under FairnessMetricType.

In [None]:
metrics