<a href="https://colab.research.google.com/github/Sundragon1993/tensorflow_advanced/blob/main/Fairness_Exercise_2_Remediate_Bias.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fairness Exercise 2: Remediate Bias

**Learning Objectives:**
* Remediate subgroup bias in the toxic text classifier by upweighting negative examples.
* Re-evaluate the revised model to confirm successful remediation using Fairness Indicators and the What-If tool.

**Prerequisites**

This exercise builds on [**Fairness Exercise 1: Explore the Model**](https://colab.research.google.com/github/google/eng-edu/blob/master/ml/pc/exercises/fairness_text_toxicity_part1.ipynb?utm_source=external-colab&utm_campaign=colab-external&utm_medium=referral&utm_content=fairnessexercise1-colab). It is strongly recommended that you complete **Fairness Exercise 1** prior to working through this exercise.

## Overview

In [**Fairness Exercise 1: Explore the Model**](https://colab.research.google.com/github/google/eng-edu/blob/master/ml/pc/exercises/fairness_text_toxicity_part1.ipynb?utm_source=external-colab&utm_campaign=colab-external&utm_medium=referral&utm_content=fairnessexercise1-colab), you trained a toxicity classifier on the Civil Comments dataset and used Fairness Indicators to identify some unintended bias issues related to gender. In this exercise, you'll apply remediation techniques and retrain the model to mitigate this bias. You'll then use Fairness Indicators and the What-If tool to evaluate the results and confirm that the remediation efforts were successful.

## Setup

First, run the cell below to install Fairness Indicators. 

**NOTE:** You **MUST RESTART** the Colab runtime after doing this installation, either by clicking the **RESTART RUNTIME** button at the bottom of this cell or by selecting **Runtime->Restart runtime...** from the menu bar above.

In [None]:
!pip install fairness-indicators \
  "absl-py==0.8.0" \
  "pyarrow==0.15.1" \
  "apache-beam==2.17.0" \
  "avro-python3==1.9.1" \
  "tfx-bsl==0.21.4" \
  "tensorflow-data-validation==0.21.5"

Collecting fairness-indicators
  Downloading https://files.pythonhosted.org/packages/69/0a/402f2654951250b638b3a75eff7c5af2cae0bd33583a051d9f13205627cf/fairness_indicators-0.26.0-py3-none-any.whl
Collecting absl-py==0.8.0
  Downloading https://files.pythonhosted.org/packages/3c/0d/7cbf64cac3f93617a2b6b079c0182e4a83a3e7a8964d3b0cc3d9758ba002/absl-py-0.8.0.tar.gz (102kB)
Collecting pyarrow==0.15.1
  Downloading https://files.pythonhosted.org/packages/6c/32/ce1926f05679ea5448fd3b98fbd9419d8c7a65f87d1a12ee5fb9577e3a8e/pyarrow-0.15.1-cp36-cp36m-manylinux2010_x86_64.whl (59.2MB)
Collecting apache-beam==2.17.0
  Downloading https://files.pythonhosted.org/packages/46/80/b561617b7820c5607ed96e624e0b380dc613dcde70d5f39bc30c4345f5c0/apache_beam-2.17.0-cp36-cp36m-manylinux1_x86_64.whl (3.0MB)

Next, import all the dependencies we'll use in this exercise, which include Fairness Indicators, TensorFlow Model Analysis (tfma), and the What-If tool (WIT):

In [None]:
%tensorflow_version 2.x
import os
import tempfile
import apache_beam as beam
import numpy as np
import pandas as pd
from datetime import datetime

import tensorflow_hub as hub
import tensorflow as tf
import tensorflow_model_analysis as tfma
from tensorflow_model_analysis.addons.fairness.post_export_metrics import fairness_indicators
from tensorflow_model_analysis.addons.fairness.view import widget_view

from witwidget.notebook.visualization import WitConfigBuilder
from witwidget.notebook.visualization import WitWidget

Run the following code to download and import the training and validation datasets. By default, the code loads the preprocessed data (see [**Fairness Exercise 1: Explore the Model**](https://colab.research.google.com/github/google/eng-edu/blob/master/ml/pc/exercises/fairness_text_toxicity_part1.ipynb?utm_source=external-colab&utm_campaign=colab-external&utm_medium=referral&utm_content=fairnessexercise1-colab) for more details). If you prefer, you can enable the `download_original_data` checkbox at the right to download the original dataset and preprocess it as described in the previous exercise (this may take 5-10 minutes).

In [None]:
download_original_data = False #@param {type:"boolean"}

if download_original_data:
  train_tf_file = tf.keras.utils.get_file('train_tf.tfrecord',
                                          'https://storage.googleapis.com/civil_comments_dataset/train_tf.tfrecord')
  validate_tf_file = tf.keras.utils.get_file('validate_tf.tfrecord',
                                             'https://storage.googleapis.com/civil_comments_dataset/validate_tf.tfrecord')

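  # The identity terms list will be grouped together by their categories
  # (see 'IDENTITY_COLUMNS') on threshold 0.5. Only the identity term column,
  # text column and label column will be kept after processing.
  # NOTE: `util` is not imported in the import cell above. The import path
  # below is an assumption based on older fairness-indicators releases; adjust
  # it if your installed version places `util` in a different module.
  from fairness_indicators.examples import util
  train_tf_file = util.convert_comments_data(train_tf_file)
  validate_tf_file = util.convert_comments_data(validate_tf_file)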

else:
  train_tf_file = tf.keras.utils.get_file('train_tf_processed.tfrecord',
                                          'https://storage.googleapis.com/civil_comments_dataset/train_tf_processed.tfrecord')
  validate_tf_file = tf.keras.utils.get_file('validate_tf_processed.tfrecord',
                                             'https://storage.googleapis.com/civil_comments_dataset/validate_tf_processed.tfrecord')

Downloading data from https://storage.googleapis.com/civil_comments_dataset/train_tf_processed.tfrecord
Downloading data from https://storage.googleapis.com/civil_comments_dataset/validate_tf_processed.tfrecord


Next, train the original model from [**Fairness Exercise 1: Explore the Model**](https://colab.research.google.com/github/google/eng-edu/blob/master/ml/pc/exercises/fairness_text_toxicity_part1.ipynb?utm_source=external-colab&utm_campaign=colab-external&utm_medium=referral&utm_content=fairnessexercise1-colab), which we'll use as the baseline model for this exercise:

In [None]:
#@title Run this cell to train the baseline model from Exercise 1
TEXT_FEATURE = 'comment_text'
LABEL = 'toxicity'

FEATURE_MAP = {
    # Label:
    LABEL: tf.io.FixedLenFeature([], tf.float32),
    # Text:
    TEXT_FEATURE:  tf.io.FixedLenFeature([], tf.string),

    # Identities:
    'sexual_orientation':tf.io.VarLenFeature(tf.string),
    'gender':tf.io.VarLenFeature(tf.string),
    'religion':tf.io.VarLenFeature(tf.string),
    'race':tf.io.VarLenFeature(tf.string),
    'disability':tf.io.VarLenFeature(tf.string),
}

def train_input_fn():
  def parse_function(serialized):
    parsed_example = tf.io.parse_single_example(
        serialized=serialized, features=FEATURE_MAP)
    # Adds a weight column to deal with unbalanced classes.
    parsed_example['weight'] = tf.add(parsed_example[LABEL], 0.1)
    return (parsed_example,
            parsed_example[LABEL])
  train_dataset = tf.data.TFRecordDataset(
      filenames=[train_tf_file]).map(parse_function).batch(512)
  return train_dataset

BASE_DIR = tempfile.gettempdir()

model_dir = os.path.join(BASE_DIR, 'train', datetime.now().strftime(
    "%Y%m%d-%H%M%S"))

embedded_text_feature_column = hub.text_embedding_column(
    key=TEXT_FEATURE,
    module_spec='https://tfhub.dev/google/nnlm-en-dim128/1')

classifier = tf.estimator.DNNClassifier(
    hidden_units=[500, 100],
    weight_column='weight',
    feature_columns=[embedded_text_feature_column],
    optimizer=tf.optimizers.Adagrad(learning_rate=0.003),
    loss_reduction=tf.losses.Reduction.SUM,
    n_classes=2,
    model_dir=model_dir)

classifier.train(input_fn=train_input_fn, steps=1000)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/train/20210117-184441', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
Instructions for updating:
The old _FeatureColumn APIs are being deprecated. Please use the new FeatureColumn APIs instead.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into /tmp/train/20210117-184441/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 58.808598, step = 0
INFO:tensorflow:global_step/sec: 25.3466
INFO:tensorflow:loss = 56.941917, step = 100 (3.950 sec)
INFO:tensorflow:global_step/sec: 28.3964
INFO:tensorflow:loss = 47.683044, step = 200 (3.525 sec)
INFO:tensorflow:global_step/sec: 30.0361
INFO:tensorflow:loss = 55.866592, step = 300 (3.329 sec)
INFO:tensorflow:global_step/sec: 28.2857
INFO:tensorflow:loss = 55.982487, step = 400 (3.535 sec)
INFO:tensorflow:global_step/sec: 29.3726
INFO:tensorflow:loss = 41.726414, step = 500 (3.404 sec)
INFO:tensorflow:global_step/sec: 28.7573
INFO:tensorflow:loss = 45.51366, step = 600 (3.473 sec)
INFO:tensorflow:global_step/sec: 28.9493
INFO:tensorflow:loss = 51.450493, step = 700 (3.454 sec)
INFO:tensorflow:global_step/sec: 29.5232
INFO:tensorflow:loss = 47.50316, step = 800 (3.394 sec)
INFO:tensorflow:global_step/sec: 28.4508
INFO:tensorflow:loss = 47.661552, step = 900 (3.513 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 1000...
INFO:tensorflow:Saving checkpoints for 1000 into /tmp/train/20210117-184441/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 1000...
INFO:tensorflow:Loss for final step: 51.000168.


<tensorflow_estimator.python.estimator.canned.dnn.DNNClassifierV2 at 0x7fd5462ad160>

In the next section, we'll apply bias-remediation techniques on our data and then train a revised model on the updated data.

## Remediate Bias

To remediate bias in our model, we'll first need to define the remediation metrics we'll use to gauge success and choose an appropriate remediation technique. Then we'll retrain the model using the technique we've selected.

### Define the remediation metrics

Before we can apply bias-remediation techniques to our model, we first need to define what successful remediation looks like in the context of our particular problem. As we saw in [**Fairness Exercise 1: Explore the Model**](https://colab.research.google.com/github/google/eng-edu/blob/master/ml/pc/exercises/fairness_text_toxicity_part1.ipynb?utm_source=external-colab&utm_campaign=colab-external&utm_medium=referral&utm_content=fairnessexercise1-colab), there are often tradeoffs that come into play when optimizing a model (for example, adjustments that decrease false positives may increase false negatives), so we need to choose the evaluation metrics that best align with our priorities.

For our toxicity classifier, we've identified that our primary concern is ensuring that gender-related comments are not disproportionately misclassified as toxic, which could result in constructive discourse being suppressed. So here, we will define successful remediation as a **decrease in the FPR (false-positive rate) for gender subgroups relative to the overall FPR**.
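
To make this metric concrete, here is a minimal sketch of how an FPR gap between a subgroup and the overall population could be computed. The arrays are hypothetical toy data, not values from the Civil Comments dataset, and `false_positive_rate` is an illustrative helper rather than part of Fairness Indicators.

```python
import numpy as np

def false_positive_rate(labels, predictions):
    """FPR = FP / (FP + TN), computed over the truly nontoxic examples."""
    labels = np.asarray(labels, dtype=bool)
    predictions = np.asarray(predictions, dtype=bool)
    negatives = ~labels
    if negatives.sum() == 0:
        return float("nan")  # FPR is undefined without nontoxic examples
    return float((predictions & negatives).sum() / negatives.sum())

# Hypothetical toy data: 1 = toxic, 0 = nontoxic.
labels      = np.array([0, 0, 0, 0, 1, 0, 0, 1, 0, 0])
predictions = np.array([1, 0, 0, 0, 1, 1, 1, 1, 0, 0])
# Marks which examples belong to the subgroup of interest.
in_subgroup = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 0], dtype=bool)

overall_fpr = false_positive_rate(labels, predictions)
subgroup_fpr = false_positive_rate(labels[in_subgroup], predictions[in_subgroup])
fpr_gap = subgroup_fpr - overall_fpr

print(f"overall FPR:  {overall_fpr:.3f}")
print(f"subgroup FPR: {subgroup_fpr:.3f}")
print(f"gap:          {fpr_gap:+.3f}")
```

Under this framing, successful remediation means the gap for each gender subgroup shrinks toward zero.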

### Choose a remediation technique

To reduce the false-positive rate for gender subgroups, we want to help the model "unlearn" any false correlations it's learned between gender-related terminology and toxicity. We've determined that this false correlation likely stems from an insufficient number of training examples in which gender terminology was used in nontoxic contexts.

One excellent way to remediate this issue would be to add more nontoxic examples to each gender subgroup to balance out the dataset, and then retrain on the amended data. However, we've already trained on all the data we have, so what can we do? This is a common problem ML engineers face. Collecting additional data can be costly, resource-intensive, and time-consuming, and as a result, it may just not be feasible in certain circumstances.

One alternative solution is to simulate additional data by *upweighting* the existing examples in the disproportionately underrepresented group (increasing the loss penalty for errors for these examples) so they carry more weight and are not as easily overwhelmed by the rest of the data.
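
As a rough illustration of what "increasing the loss penalty" means, the sketch below computes a weighted log loss with sum reduction on three hypothetical examples. It is a simplified stand-in for the estimator's internal loss, not its actual implementation; the probabilities and weights are made up for illustration.

```python
import numpy as np

def weighted_log_loss(labels, probs, weights):
    """Sum of per-example binary cross-entropy, scaled by example weights."""
    labels = np.asarray(labels, dtype=float)
    # Clip to avoid log(0) for extreme probabilities.
    probs = np.clip(np.asarray(probs, dtype=float), 1e-7, 1 - 1e-7)
    weights = np.asarray(weights, dtype=float)
    per_example = -(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))
    return float(np.sum(weights * per_example))

# Hypothetical examples: the first is nontoxic (label 0) but scored 0.8,
# i.e. a false positive at a 0.5 threshold.
labels = [0, 0, 1]
probs = [0.8, 0.1, 0.9]

uniform = weighted_log_loss(labels, probs, [1.0, 1.0, 1.0])
upweighted = weighted_log_loss(labels, probs, [5.0, 1.0, 1.0])

print(f"uniform weights:    {uniform:.4f}")
print(f"first example x5:   {upweighted:.4f}")
```

Because the misclassified nontoxic example now contributes five times as much to the total loss, gradient updates push harder to correct it, which is the effect upweighting relies on.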

Let's update the input function of our model to implement upweighting for nontoxic examples belonging to one or more gender subgroups. In the `UPDATES FOR UPWEIGHTING` section of the code below, we've increased the `weight` values for nontoxic examples that contain a `gender` value of `transgender`, `female`, or `male`:

In [None]:
def train_input_fn_with_remediation():
  def parse_function(serialized):
    parsed_example = tf.io.parse_single_example(
        serialized=serialized, features=FEATURE_MAP)
    # Adds a weight column to deal with unbalanced classes.
    parsed_example['weight'] = tf.add(parsed_example[LABEL], 0.1)

    # BEGIN UPDATES FOR UPWEIGHTING
    # Up-weighting non-toxic examples to balance toxic and non-toxic examples
    # for gender slice.
    values = parsed_example['gender'].values
    # 'toxicity' label zero represents the example is non-toxic.
    if tf.equal(parsed_example[LABEL], 0):
      # We tuned the upweighting hyperparameters, and found we got good
      # results by setting `weight`s of 0.4 for `transgender`,
      # 0.5 for `female`, and 0.7 for `male`.
      # NOTE: `other_gender` is not upweighted separately, because all examples
      # tagged with `other_gender` were also tagged with one of the other
      # values below.
      # The checks run in order, so when an example carries several gender
      # values, the last matching weight wins.
      if tf.greater(tf.math.count_nonzero(tf.equal(values, 'transgender')), 0):
        parsed_example['weight'] = tf.constant(0.4)
      if tf.greater(tf.math.count_nonzero(tf.equal(values, 'female')), 0):
        parsed_example['weight'] = tf.constant(0.5)
      if tf.greater(tf.math.count_nonzero(tf.equal(values, 'male')), 0):
        parsed_example['weight'] = tf.constant(0.7)
    # END UPDATES FOR UPWEIGHTING

    return (parsed_example,
            parsed_example[LABEL])

  train_dataset = tf.data.TFRecordDataset(
      filenames=[train_tf_file]).map(parse_function).batch(512)
  return train_dataset
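
To see which weight a given example ends up with, the weighting rules above can be mirrored in plain Python. This is only an illustrative sketch for reasoning about the logic; `remediation_weight` is a hypothetical helper and is not used by the training pipeline.

```python
def remediation_weight(label, gender_values):
    """Plain-Python mirror of the weighting rules in parse_function above."""
    # Baseline weight used to deal with unbalanced classes.
    weight = label + 0.1
    # Only nontoxic examples (label 0) are upweighted.
    if label == 0:
        # Checks run in order, so the last matching value wins.
        if 'transgender' in gender_values:
            weight = 0.4
        if 'female' in gender_values:
            weight = 0.5
        if 'male' in gender_values:
            weight = 0.7
    return weight

for label, genders in [(0, []), (0, ['transgender']), (0, ['female']),
                       (0, ['female', 'male']), (1, ['male'])]:
    print(label, genders, remediation_weight(label, genders))
```

Note that a nontoxic example with no gender value keeps the baseline weight of 0.1, so the upweighted gender examples count four to seven times as much as other nontoxic examples.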

### Retrain the model

Now, let's retrain the model with our upweighted examples:

In [None]:
BASE_DIR = tempfile.gettempdir()
  
model_dir_with_remediation = os.path.join(BASE_DIR, 'train', datetime.now().strftime(
    "%Y%m%d-%H%M%S"))

embedded_text_feature_column = hub.text_embedding_column(
    key=TEXT_FEATURE,
    module_spec='https://tfhub.dev/google/nnlm-en-dim128/1')

classifier_with_remediation = tf.estimator.DNNClassifier(
    hidden_units=[500, 100],
    weight_column='weight',
    feature_columns=[embedded_text_feature_column],
    n_classes=2,
    optimizer=tf.optimizers.Adagrad(learning_rate=0.003),
    loss_reduction=tf.losses.Reduction.SUM,
    model_dir=model_dir_with_remediation)

classifier_with_remediation.train(input_fn=train_input_fn_with_remediation, steps=1000)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/train/20210117-184528', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into /tmp/train/20210117-184528/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 64.947586, step = 0
INFO:tensorflow:global_step/sec: 14.244
INFO:tensorflow:loss = 67.70737, step = 100 (7.032 sec)
INFO:tensorflow:global_step/sec: 15.6842
INFO:tensorflow:loss = 51.013687, step = 200 (6.383 sec)
INFO:tensorflow:global_step/sec: 16.3235
INFO:tensorflow:loss = 62.358994, step = 300 (6.113 sec)
INFO:tensorflow:global_step/sec: 15.7264
INFO:tensorflow:loss = 59.535213, step = 400 (6.358 sec)
INFO:tensorflow:global_step/sec: 16.1847
INFO:tensorflow:loss = 46.94717, step = 500 (6.176 sec)
INFO:tensorflow:global_step/sec: 16.6326
INFO:tensorflow:loss = 49.001816, step = 600 (6.012 sec)
INFO:tensorflow:global_step/sec: 15.4248
INFO:tensorflow:loss = 58.11805, step = 700 (6.483 sec)
INFO:tensorflow:global_step/sec: 16.3842
INFO:tensorflow:loss = 53.743984, step = 800 (6.103 sec)
INFO:tensorflow:global_step/sec: 15.3171
INFO:tensorflow:loss = 53.192184, step = 900 (6.529 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 1000...
INFO:tensorflow:Saving checkpoints for 1000 into /tmp/train/20210117-184528/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 1000...
INFO:tensorflow:Loss for final step: 57.424736.


<tensorflow_estimator.python.estimator.canned.dnn.DNNClassifierV2 at 0x7fd54639aa58>

## Recompute fairness metrics

Now that we've retrained the model, let's recompute our fairness metrics. First, export the model:

In [None]:
def eval_input_receiver_fn():
  serialized_tf_example = tf.compat.v1.placeholder(
      dtype=tf.string, shape=[None], name='input_example_placeholder')

  receiver_tensors = {'examples': serialized_tf_example}

  features = tf.io.parse_example(serialized_tf_example, FEATURE_MAP)
  features['weight'] = tf.ones_like(features[LABEL])

  return tfma.export.EvalInputReceiver(
    features=features,
    receiver_tensors=receiver_tensors,
    labels=features[LABEL])

tfma_export_dir_with_remediation = tfma.export.export_eval_savedmodel(
  estimator=classifier_with_remediation,
  export_dir_base=os.path.join(BASE_DIR, 'tfma_eval_model_with_remediation'),
  eval_input_receiver_fn=eval_input_receiver_fn)

Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Signatures INCLUDED in export for Classify: None
INFO:tensorflow:Signatures INCLUDED in export for Regress: None
INFO:tensorflow:Signatures INCLUDED in export for Predict: None
INFO:tensorflow:Signatures INCLUDED in export for Train: None
INFO:tensorflow:Signatures INCLUDED in export for Eval: ['eval']
INFO:tensorflow:Restoring parameters from /tmp/train/20210117-184528/model.ckpt-1000
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:Assets written to: /tmp/tfma_eval_model_with_remediation/temp-1610909343/assets
INFO:tensorflow:SavedModel written to: /tmp/tfma_eval_model_with_remediation/temp-1610909343/saved_model.pb


Next, run the fairness evaluation using TFMA:

In [None]:
tfma_eval_result_path_with_remediation = os.path.join(BASE_DIR, 'tfma_eval_result_with_remediation')

slice_selection = 'gender'
compute_confidence_intervals = False

# Define slices that you want the evaluation to run on.
slice_spec = [
    tfma.slicer.SingleSliceSpec(), # Overall slice
    tfma.slicer.SingleSliceSpec(columns=['gender']),
]

# Add the fairness metrics.
add_metrics_callbacks = [
  tfma.post_export_metrics.fairness_indicators(
      thresholds=[0.1, 0.3, 0.5, 0.7, 0.9],
      labels_key=LABEL
      )
]

eval_shared_model_with_remediation = tfma.default_eval_shared_model(
    eval_saved_model_path=tfma_export_dir_with_remediation,
    add_metrics_callbacks=add_metrics_callbacks)

validate_dataset = tf.data.TFRecordDataset(filenames=[validate_tf_file])

# Run the fairness evaluation.
with beam.Pipeline() as pipeline:
  _ = (
      pipeline
      | 'ReadData' >> beam.io.ReadFromTFRecord(validate_tf_file)
      | 'ExtractEvaluateAndWriteResults' >>
       tfma.ExtractEvaluateAndWriteResults(
                 eval_shared_model=eval_shared_model_with_remediation,
                 slice_spec=slice_spec,
                 compute_confidence_intervals=compute_confidence_intervals,
                 output_path=tfma_eval_result_path_with_remediation)
  )

eval_result_with_remediation = tfma.load_eval_result(output_path=tfma_eval_result_path_with_remediation)

Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.
INFO:tensorflow:Restoring parameters from /tmp/tfma_eval_model_with_remediation/1610909343/variables/variables
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.get_tensor_from_tensor_info or tf.compat.v1.saved_model.get_tensor_from_tensor_info.

Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`
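
For intuition about what the evaluation above reports, the sketch below computes a false-positive rate per slice at each of the same thresholds on a handful of hypothetical examples. It is not TFMA's implementation: `fpr_by_slice` is an illustrative helper, the scores are made up, and treating a score strictly greater than the threshold as a "toxic" prediction is an assumption.

```python
import numpy as np

THRESHOLDS = [0.1, 0.3, 0.5, 0.7, 0.9]  # same values passed to fairness_indicators

def fpr_by_slice(labels, scores, slice_values, thresholds=THRESHOLDS):
    """Returns {slice_name: {threshold: FPR}} for the overall set and each slice."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    # The overall slice contains every example.
    slices = {'Overall': np.ones(len(labels), dtype=bool)}
    # An example belongs to a slice if the slice value appears in its list.
    for value in sorted({v for values in slice_values for v in values}):
        slices[value] = np.array([value in values for values in slice_values])
    table = {}
    for name, mask in slices.items():
        negatives = mask & (labels == 0)
        table[name] = {
            t: float((scores[negatives] > t).mean()) if negatives.any() else float('nan')
            for t in thresholds
        }
    return table

# Hypothetical examples: label 0 = nontoxic, 1 = toxic.
labels = [0, 0, 0, 1, 0, 1]
scores = [0.2, 0.3, 0.8, 0.9, 0.6, 0.7]
genders = [[], ['female'], ['male'], ['male'], ['female', 'male'], []]

table = fpr_by_slice(labels, scores, genders)
for name, row in table.items():
    print(name, row)
```

Raising the threshold lowers the FPR for every slice, which is why the widget lets you compare slices at several thresholds rather than at a single operating point.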


## Load evaluation results

Run the following two cells to load results in the What-If tool and Fairness Indicators. 

In the What-If tool, we'll load 1,000 examples with the corresponding predictions returned from both the baseline model and the remediated model.

#### **WARNING: When you launch the What-If tool widget below, the left panel will display the full text of individual comments from the Civil Comments dataset. Some of these comments include profanity, offensive statements, and offensive statements involving identity terms. If this is a concern, run the Alternative cell at the end of this section instead of the two code cells below, and skip question #4 in the following [Exercise](#scrollTo=QrecUFHfBCyC).**

In [None]:
DEFAULT_MAX_EXAMPLES = 1000

# Load 100000 examples in memory. When first rendered, What-If Tool only
# displays 1000 of these examples to ensure data loads successfully for most
# browser/machine configurations. 
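def wit_dataset(file, num_examples=100000):
  # Read from the file passed in (previously the global train_tf_file was
  # used here regardless of the argument).
  dataset = tf.data.TFRecordDataset(
      filenames=[file]).take(num_examples)
  # Parse each serialized record into a tf.train.Example proto.
  return [tf.train.Example.FromString(d.numpy()) for d in dataset]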

wit_data = wit_dataset(train_tf_file)

# Configure WIT with 1000 examples, the FEATURE_MAP we defined above, and
# a label of 1 for positive (toxic) examples and 0 for negative (nontoxic)
# examples
config_builder = WitConfigBuilder(wit_data[:DEFAULT_MAX_EXAMPLES]).set_estimator_and_feature_spec(
    classifier, FEATURE_MAP).set_compare_estimator_and_feature_spec(
    classifier_with_remediation, FEATURE_MAP).set_label_vocab(['0', '1']).set_target_feature(LABEL)
wit = WitWidget(config_builder)



In Fairness Indicators, we'll display the remediated model's evaluation results on the validation set.

In [None]:
# Link Fairness Indicators widget with WIT widget above,
# so that clicking a slice in FI below will load its data in WIT above.
event_handlers={'slice-selected':
              wit.create_selection_callback(wit_data, DEFAULT_MAX_EXAMPLES)}
widget_view.render_fairness_indicator(eval_result=eval_result_with_remediation,
                                      slicing_column=slice_selection,
                                      event_handlers=event_handlers)

In [None]:
#@title Alternative: Run this cell only if you intend to skip the What-If tool exercises (see Warning above)
# Render the Fairness Indicators widget on its own, without linking it to
# the WIT widget (no 'slice-selected' event handler is registered here).
widget_view.render_fairness_indicator(eval_result=eval_result_with_remediation,
                                      slicing_column=slice_selection)

## Exercise: Analyze the results

Use the What-If Tool and Fairness Indicators widgets above to answer the following questions.

#### **1. In [Fairness Exercise 1: Explore the Model](https://colab.research.google.com/github/google/eng-edu/blob/master/ml/pc/exercises/fairness_text_toxicity_part1.ipynb?utm_source=external-colab&utm_campaign=colab-external&utm_medium=referral&utm_content=fairnessexercise1-colab), our baseline model had an FPR of 0.28 overall and FPRs of 0.51 and 0.47 for `male` and `female` examples, respectively. In our revised model, what are the FPRs for `male` and `female` subgroups? How do these values compare to the overall FPR?** 

#### Solution

Click below for the solution.

When we evaluated our model against the validation set, we got an FPR of 0.28 for `male` and 0.24 for `female`. The overall FPR was 0.23.

![FPR results for the revised model displayed in Fairness Indicators. The "male" and "female" FPR values in the table are circled, showing an FPR of 0.28 for "male" and 0.24 for "female", and an overall FPR of 0.23.](http://developers.google.com/machine-learning/practica/fairness-indicators/colab-images/fairness_indicators_colab2_exercise1.png)

The FPR for `male` is now approximately 20% higher than the overall rate, and the FPR for `female` is now approximately 5% higher than the overall rate. This is a significant improvement over our previous model, where the FPRs for `male` and `female` were 83% and 69% higher, respectively, than the overall FPR. 

**NOTE:** *Model training is not deterministic, so your exact results may vary slightly from ours.*
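
The relative differences quoted above can be approximated from the rounded FPRs reported in this exercise. The following is a minimal sketch; the helper name `relative_difference` is our own and not part of Fairness Indicators, and because the widget works from unrounded values, the output can differ from the quoted percentages by a point or two. A positive result means the subgroup rate is above the overall rate.

```python
def relative_difference(subgroup_rate, overall_rate):
    """Return how far a subgroup rate is from the overall rate, in percent."""
    return (subgroup_rate - overall_rate) / overall_rate * 100.0

# Rounded FPRs quoted in this exercise (hypothetical container names).
baseline_fpr = {"overall": 0.28, "male": 0.51, "female": 0.47}
revised_fpr = {"overall": 0.23, "male": 0.28, "female": 0.24}

for model_name, rates in (("baseline", baseline_fpr), ("revised", revised_fpr)):
    for group in ("male", "female"):
        diff = relative_difference(rates[group], rates["overall"])
        print(f"{model_name:8s} {group:6s} FPR: {diff:+.0f}% relative to overall")
```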

#### **2. What other metrics should we audit to confirm gender subgroup biases have been successfully remediated? What are the results on these metrics?**

#### Solution

Click below for the solution.

We should also review FNR. 

A model optimized solely to decrease FPR could learn to always predict the negative class ("nontoxic"), which would result in an FPR of 0. However, this would cause the FNR to skyrocket, because every actual positive ("toxic") example would be misclassified as nontoxic, making it a false negative. 

While our primary metric for evaluating remediation is FPR, we still want to make sure we're OK with any tradeoff in increased FNR that we incur to decrease FPR.
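
To make the tradeoff concrete, here is a small sketch using made-up toy labels (not data from Civil Comments). The helper `error_rates` is a hypothetical name of our own; it computes FPR and FNR for a degenerate classifier that always predicts "nontoxic".

```python
def error_rates(labels, predictions):
    """Return (FPR, FNR) for binary labels, where 1 = toxic and 0 = nontoxic."""
    false_positives = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    false_negatives = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    negatives = labels.count(0)
    positives = labels.count(1)
    return false_positives / negatives, false_negatives / positives

# Toy ground-truth labels, invented for illustration only.
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

# A degenerate "model" that predicts nontoxic for every example.
always_nontoxic = [0] * len(labels)

fpr, fnr = error_rates(labels, always_nontoxic)
print(f"FPR = {fpr:.2f}, FNR = {fnr:.2f}")
```

The FPR looks perfect only because the model never predicts the positive class, which is why FNR needs to be audited alongside it.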

If we take a look at FNR results for the revised model, we see that the overall FNR is 0.34, `male` FNR is 1% lower at 0.33, and `female` FNR is 12% higher at 0.38. So we can confirm that our subgroup FNRs are not dramatically higher than overall FNR, and overall FNR itself is not sky-high.

**NOTE:** *Model training is not deterministic, so your exact results may vary slightly from ours.*

![False negative rate results for gender subgroups displayed in the Fairness Indicators widget. Overall FNR is 0.34, "male" FNR is 1.08485% lower at 0.33358, and "female" FNR is 11.86052% higher at 0.38](http://developers.google.com/machine-learning/practica/fairness-indicators/colab-images/fairness_indicators_colab2_exercise2.png)

#### **3. Do you see any areas where further improvement is needed?** 

#### Solution

Click below for one possible solution.

If we hover over the `other_gender` slice, as shown below, we see that there are only 6 examples in this slice. This is an extremely small number of examples in comparison to the `male` and `female` groups, which each have over 15,000 examples. 

![FNR results for gender subgroups displayed in the Fairness Indicators widget. A pop-up is displayed above the "other gender" slice, which shows an Example Count of 6 for this subgroup.](http://developers.google.com/machine-learning/practica/fairness-indicators/colab-images/fairness_indicators_colab2_exercise3.png)

**NOTE:** *Model training is not deterministic, so your exact results may vary slightly from ours shown above.*

With an `other_gender` slice this small, we can't make any statistically significant assertions about the model's performance on this subgroup (changing the classification of just one example would cause a swing of at least 16.7 percentage points in FNR or FPR). Upweighting the existing examples is not sufficient here; we need to add more examples to the `other_gender` subgroup that the model can learn from.
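
The arithmetic behind that swing can be sketched as follows. The function name `single_example_swing` is our own. We use the full slice size as the denominator, which gives a lower bound: the true denominator for FPR (actual negatives) or FNR (actual positives) is at most the slice size, so the real swing can only be larger.

```python
def single_example_swing(denominator):
    """Percentage-point change in a rate when one example flips classification."""
    return 100.0 / denominator

# Slice sizes mentioned in this exercise: 6 for other_gender, and roughly
# 15,000 for each of the male and female slices.
for slice_name, size in (("other_gender", 6), ("male/female (approx.)", 15000)):
    swing = single_example_swing(size)
    print(f"{slice_name}: one example moves the rate by at least {swing:.3f} points")
```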

#### **4. Compare the performance of the baseline model and the revised model on the `female` subgroup as follows:**

Click on the bar of the _female_ slice in the Fairness Indicators widget to load the corresponding individual female examples in the What-If Tool widget above. Create a scatterplot that plots toxicity scores for the baseline model (**Inference Score 1**) against toxicity scores for the revised model (**Inference Score 2**), with each example color-coded by ground-truth label (**toxicity**).

#### **What trends can you identify from this graph?**

#### Solution

Click below for a solution.

Here's our graph, with toxicity scores for the baseline model plotted along the x-axis, and toxicity scores for the revised model plotted along the y-axis. Actual toxic examples are colored red, and actual nontoxic examples are colored blue.

**NOTE:** *Model training is not deterministic, so your exact results may vary slightly from ours.*

![Scatterplot in the What-If tool, plotting toxicity score of the baseline model along the x-axis ("Scatter | X-Axis" set to "Inference score 1") and toxicity score of the revised model along the y-axis ("Scatter | Y-Axis" set to "Inference score 2"), with "Color By" set to "toxicity" so that examples are color-coded by their actual toxicity labels. The relationship between the two scores is generally linear, with a few clusters of negative-example outliers circled where toxicity score is significantly lower for the revised model.](http://developers.google.com/machine-learning/practica/fairness-indicators/colab-images/wit_colab2_exercise4.png)

The relationship between the two scores is generally linear, but we can see a few clusters of blue outliers (circled above) where the revised model predicts a significantly lower toxicity score than the baseline model. We can infer that the revised model does a better job of predicting low toxicity scores for a portion of nontoxic `female` examples (though there's still room for further improvement).
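
If you wanted to find such outliers programmatically rather than by eye, one possible approach is sketched below. Everything here is illustrative: the tuples are invented toy scores rather than output from either model, and the helper `find_score_drops` and its `min_drop` threshold of 0.3 are assumptions of ours, not part of the What-If Tool.

```python
def find_score_drops(examples, min_drop=0.3):
    """Return examples whose revised score is at least min_drop below baseline.

    Each example is a (label, baseline_score, revised_score) tuple, where
    label 0 = nontoxic and label 1 = toxic.
    """
    return [ex for ex in examples if ex[1] - ex[2] >= min_drop]

# Invented toy scores for illustration only.
toy_examples = [
    (0, 0.80, 0.35),  # nontoxic, large drop in predicted toxicity
    (0, 0.20, 0.18),  # nontoxic, scores nearly unchanged
    (1, 0.90, 0.88),  # toxic, scores nearly unchanged
    (0, 0.65, 0.30),  # nontoxic, large drop in predicted toxicity
    (1, 0.55, 0.50),  # toxic, small drop
]

drops = find_score_drops(toy_examples)
nontoxic_drops = [ex for ex in drops if ex[0] == 0]
print(f"{len(nontoxic_drops)} of {len(drops)} large drops are nontoxic examples")
```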