### What-If Tool toxicity text model comparison

Copyright 2019 Google LLC.
SPDX-License-Identifier: Apache-2.0

This notebook shows how to use the [What-If Tool](https://pair-code.github.io/what-if-tool) to compare two text models that score sentence toxicity, one of which had some debiasing applied during training.

This notebook loads two pretrained toxicity models from [ConversationAI](https://github.com/conversationai/unintended-ml-bias-analysis) and compares them on the [Wikipedia comments dataset](https://figshare.com/articles/Wikipedia_Talk_Labels_Toxicity/4563973).

This notebook also shows how the What-If Tool can be used on non-TensorFlow models. In this case, the models are Keras models that do not take TensorFlow Examples as their input format. A simple wrapper around WitWidget lets these models be analyzed in the What-If Tool.
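
As a rough illustration of the preprocessing such a wrapper has to perform, the sketch below turns raw comment strings into fixed-length integer sequences. It is a simplified pure-Python stand-in, not the Keras implementation: `toy_vocab`, `texts_to_ids`, and `pad_left` are hypothetical names, whitespace splitting replaces the real tokenizer, and the left-padding and left-truncation are meant to mirror the default behavior of Keras's `pad_sequences`.

```python
# Hypothetical vocabulary mapping words to integer ids (0 is reserved for padding).
toy_vocab = {'you': 1, 'are': 2, 'great': 3, 'awful': 4}

def texts_to_ids(texts, vocab):
  """Maps each text to a list of word ids, skipping out-of-vocabulary words."""
  return [[vocab[w] for w in text.lower().split() if w in vocab]
          for text in texts]

def pad_left(sequences, maxlen, value=0):
  """Left-pads (or left-truncates) every sequence to exactly maxlen ids.

  Assumes maxlen is a positive integer.
  """
  padded = []
  for seq in sequences:
    seq = seq[-maxlen:]  # keep only the last maxlen ids of long sequences
    padded.append([value] * (maxlen - len(seq)) + seq)
  return padded

model_inputs = pad_left(
    texts_to_ids(['You are great', 'you are truly awful'], toy_vocab), maxlen=5)
print(model_inputs)
```

The word "truly" is not in the toy vocabulary, so it is dropped before padding. Every row ends up the same length, which is what a fixed-input-size model expects.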

## WARNING: Some text examples in this notebook include profanity, offensive statements, and offensive statements involving identity terms. Please feel free to avoid using this notebook.


In [0]:
#@title Install latest TensorFlow and the What-If Tool widget if running in colab {display-mode: "form"}

# If running in colab then pip install, otherwise no need.
try:
  import google.colab
  !pip install --upgrade tf-nightly witwidget
except Exception:
  pass

In [0]:
#@title Download the pretrained keras model files
!curl -L https://storage.googleapis.com/what-if-tool-resources/computefest2019/cnn_wiki_tox_v3_model.h5 -o ./cnn_wiki_tox_v3_model.h5
!curl -L https://storage.googleapis.com/what-if-tool-resources/computefest2019/cnn_wiki_tox_v3_hparams.h5 -o ./cnn_wiki_tox_v3_hparams.h5
!curl -L https://storage.googleapis.com/what-if-tool-resources/computefest2019/cnn_wiki_tox_v3_tokenizer.pkl -o ./cnn_wiki_tox_v3_tokenizer.pkl

!curl -L https://storage.googleapis.com/what-if-tool-resources/computefest2019/cnn_debias_tox_v3_model.h5 -o ./cnn_debias_tox_v3_model.h5
!curl -L https://storage.googleapis.com/what-if-tool-resources/computefest2019/cnn_debias_tox_v3_hparams.h5 -o ./cnn_debias_tox_v3_hparams.h5
!curl -L https://storage.googleapis.com/what-if-tool-resources/computefest2019/cnn_debias_tox_v3_tokenizer.pkl -o ./cnn_debias_tox_v3_tokenizer.pkl

!curl -L https://storage.googleapis.com/what-if-tool-resources/computefest2019/wiki_test.csv -o ./wiki_test.csv

In [0]:
#@title Load the keras models
from keras.models import load_model
try:
  import cPickle as pkl  # Python 2
except ImportError:
  import pickle as pkl  # Python 3

model1 = load_model('cnn_wiki_tox_v3_model.h5')
with open('cnn_wiki_tox_v3_tokenizer.pkl', 'rb') as f:
  tokenizer1 = pkl.load(f)
tokenizer1.oov_token = None # quick fix for version issues

model2 = load_model('cnn_debias_tox_v3_model.h5')
with open('cnn_debias_tox_v3_tokenizer.pkl', 'rb') as f:
  tokenizer2 = pkl.load(f)
tokenizer2.oov_token = None # quick fix for version issues

In [0]:
#@title Extend the WitWidget class so that it infers using keras models
# For this demo, we do not implement infer_mutants since there are no
# input features other than a single string.
import numpy as np
import tensorflow as tf
import pandas as pd
import witwidget.notebook.visualization as visualization
import tensorboard.plugins.interactive_inference.utils.common_utils as common_utils
import tensorboard.plugins.interactive_inference.utils.inference_utils as inference_utils
from keras.preprocessing.sequence import pad_sequences
PADDING_LEN = 250

def prep_texts(texts, tokenizer):
  text_sequences = tokenizer.texts_to_sequences(texts)
  return pad_sequences(text_sequences, maxlen=PADDING_LEN)

def examples_to_model_in(examples, tokenizer):
  # bytes_list values are byte strings; decode them to text for the tokenizer.
  texts = [ex.features.feature['comment'].bytes_list.value[0].decode('utf-8')
           for ex in examples]
  model_ins = prep_texts(texts, tokenizer)
  return model_ins

def wrap_model_predictions(preds, serving_bundle):
  preds = common_utils.convert_prediction_values(preds, serving_bundle)
  preds = inference_utils.wrap_inference_results(preds)
  preds = inference_utils.json_format.MessageToJson(
    preds, including_default_value_fields=True)
  preds = visualization.json.loads(preds)
  return preds

WitWidget = visualization.WitWidget
WitConfigBuilder = visualization.WitConfigBuilder

class WitWidgetKeras(WitWidget):
  """Overrides the infer method so that WIT works with the Keras toxicity models."""

  def __init__(self, config_builder, height=1000):
    super(WitWidgetKeras, self).__init__(config_builder, height=height)
    # create a dummy serving bundle
    self.serving_bundle = inference_utils.ServingBundle(
      self.config.get('inference_address'),
      self.config.get('model_name'),
      self.config.get('model_type'),
      self.config.get('model_version'),
      self.config.get('model_signature'),
      self.config.get('uses_predict_api'),
      self.config.get('predict_input_tensor'),
      self.config.get('predict_output_tensor'),
      self.estimator_and_spec.get('estimator'),
      self.estimator_and_spec.get('feature_spec'))
  
  def infer(self):
    indices_to_infer = sorted(self.updated_example_indices)
    examples_to_infer = [
        self.json_to_proto(self.examples[index]) for index in indices_to_infer]
    infer_objs = []
    model_ins = examples_to_model_in(examples_to_infer, tokenizer1)
    preds = model1.predict(model_ins)
    infer_objs.append(wrap_model_predictions(preds, self.serving_bundle))
    model_ins = examples_to_model_in(examples_to_infer, tokenizer2)
    preds = model2.predict(model_ins)
    infer_objs.append(wrap_model_predictions(preds, self.serving_bundle))
    self.updated_example_indices = set()
    inferences = {
      'inferences': {'indices': indices_to_infer, 'results': infer_objs},
      'label_vocab': self.config.get('label_vocab')}
    visualization.output.eval_js("""inferenceCallback('{inferences}')""".format(
          inferences=visualization.json.dumps(inferences)))
    
  def infer_mutants(self, info):
    visualization.output.eval_js("""inferMutantsCallback('{json_mapping}')""".format(
      json_mapping=visualization.json.dumps([])))

# Converts a dataframe into a list of tf.Example protos.
def df_to_examples(df, columns=None):
  examples = []
  if columns is None:
    columns = df.columns.values.tolist()
  for index, row in df.iterrows():
    example = tf.train.Example()
    for col in columns:
      if df[col].dtype == np.dtype(np.int64):
        example.features.feature[col].int64_list.value.append(int(row[col]))
      elif df[col].dtype == np.dtype(np.float64):
        example.features.feature[col].float_list.value.append(row[col])
      elif row[col] == row[col]:  # NaN != NaN, so missing values are skipped
        example.features.feature[col].bytes_list.value.append(row[col].encode('utf-8'))
    examples.append(example)
  return examples

# Converts a dataframe column into a column of 0's and 1's based on the provided test.
# Used to force label columns to be numeric for binary classification using a TF estimator.
def make_label_column_numeric(df, label_column, test):
  df[label_column] = np.where(test(df[label_column]), 1, 0)

In [0]:
#@title Read the dataset from CSV and process it for model {display-mode: "form"}

# Set the path to the CSV containing the dataset to analyze.
csv_path = 'wiki_test.csv'

# Set the column names for the columns in the CSV. If the CSV's first line is a header line containing
# the column names, then set this to None.
csv_columns = None

# Read the dataset from the provided CSV and print out information about it.
df = pd.read_csv(csv_path, names=csv_columns, skipinitialspace=True)
df = df[['is_toxic', 'comment']]

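# Remove non-ASCII characters. Comments that cannot be processed (for example,
# missing values) become empty strings.
comments = df['comment'].values
proc_comments = []
for c in comments:
  try:
    # Work on bytes so the same code runs under Python 2 and Python 3.
    raw = c if isinstance(c, bytes) else c.encode('utf-8')
    text = raw.decode('unicode_escape').encode('ascii', 'ignore').decode('ascii')
    proc_comments.append(text.strip())
  except Exception:
    proc_comments.append('')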

df = df.assign(comment=proc_comments)

label_column = 'is_toxic'
make_label_column_numeric(df, label_column, lambda val: val)

examples = df_to_examples(df)

To better see comment text in the What-If Tool, on the "Datapoint Editor" tab, click the button pointed to by the red arrow in the image below.
![](https://drive.google.com/uc?id=1Qq7mkhDe4TxM__UCAwF9B2ECzZCXmHwx)


In [0]:
#@title Invoke What-If Tool for the data and two models {display-mode: "form"}

num_datapoints = 3000  #@param {type: "number"}
tool_height_in_px = 700  #@param {type: "number"}

# Set up the tool with the test examples and the two pretrained models
config_builder = WitConfigBuilder(examples[:num_datapoints])
# Need to call this so we have inference_address and model_name initialized
config_builder = config_builder.set_estimator_and_feature_spec('', '')
config_builder = config_builder.set_compare_estimator_and_feature_spec('', '')
wv = WitWidgetKeras(config_builder, height=tool_height_in_px)

#### Exploration ideas

- Organize datapoints by setting X-axis scatter to "inference score 1" and Y-axis scatter to "inference score 2" to see how each datapoint differs in score between the original model (1) and debiased model (2). Points off the diagonal have differences in results between the two models.
  - Are there patterns of which datapoints don't agree between the two models?
  - If you set the ground truth feature dropdown in the "Performance + Fairness" tab to "is_toxic", then you can color or bin the datapoints by "inference correct 1" or "inference correct 2". Are there patterns of which datapoints are incorrect for model 1? For model 2?
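
The same off-diagonal comparison can be approximated outside the tool. The sketch below assumes you already have per-comment toxicity scores from both models. The scores, labels, 0.5 decision threshold, and 0.3 score gap are made-up illustrative values, and `find_disagreements` and `is_correct` are hypothetical helpers, not part of the What-If Tool.

```python
def find_disagreements(scores1, scores2, min_gap=0.3):
  """Returns indices whose scores differ by at least min_gap between models."""
  return [i for i, (s1, s2) in enumerate(zip(scores1, scores2))
          if abs(s1 - s2) >= min_gap]

def is_correct(scores, labels, threshold=0.5):
  """Returns, per datapoint, whether the thresholded score matches the label."""
  return [int(s >= threshold) == y for s, y in zip(scores, labels)]

# Made-up scores for four comments (model 1 = original, model 2 = debiased).
scores1 = [0.91, 0.75, 0.10, 0.55]
scores2 = [0.88, 0.20, 0.12, 0.15]
labels = [1, 0, 0, 0]

off_diagonal = find_disagreements(scores1, scores2)
print(off_diagonal)                  # indices of points far from the diagonal
print(is_correct(scores1, labels))   # analogous to "inference correct 1"
print(is_correct(scores2, labels))   # analogous to "inference correct 2"
```

Reading the comments at the returned indices is a quick way to look for patterns in where the two models part ways.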

You may want to focus on comments that contain the identity terms listed [here](https://github.com/conversationai/unintended-ml-bias-analysis/blob/master/unintended_ml_bias/bias_madlibs_data/adjectives_people.txt).
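
One way to flag which comments mention a given term is a whole-word regular-expression match. The sketch below is illustrative only: the terms and comments are made up, and `tag_terms` is a hypothetical helper. It wraps each escaped term in `\b` word boundaries so that, for example, "old" does not match inside "bold", and it ignores case, which is a design choice of this sketch.

```python
import re

def tag_terms(comments, terms):
  """Returns {term: [0 or 1 per comment]} using whole-word, case-insensitive matches."""
  tags = {}
  for term in terms:
    # re.escape keeps any regex metacharacters in the term literal.
    pattern = re.compile(r'\b' + re.escape(term) + r'\b', re.IGNORECASE)
    tags[term] = [1 if pattern.search(c) else 0 for c in comments]
  return tags

comments = ['That was a bold move', 'My old laptop died', 'Middle aged people rock']
tags = tag_terms(comments, ['old', 'middle aged'])
print(tags)
```

A plain substring search would flag the first comment for "old"; the word boundaries avoid that kind of false match.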

In [0]:
#@title Add a feature column for each identity term to indicate if it exists in the comment
!wget https://raw.githubusercontent.com/conversationai/unintended-ml-bias-analysis/master/unintended_ml_bias/bias_madlibs_data/adjectives_people.txt

import re

with open('adjectives_people.txt', 'r') as f:
  segments = f.read().strip().split('\n')
print(segments)

# For each identity term, tag every comment with 1 if it contains the term, else 0
comments = df['comment'].values
seg_anns = {}
selected_segments = segments
for s in selected_segments:
  is_seg = []
  for c in comments:
    if re.search(s, c):
      is_seg.append(1)
    else:
      is_seg.append(0)
  seg_anns[s] = is_seg

for seg_key, seg_ann in seg_anns.items():
  df[seg_key] = pd.Series(seg_ann, index=df.index)

examples = df_to_examples(df)