# Exercise set 9: Bias detection and correction
In this set you will practice: bias detection, correction, and mitigation.
Again you will use the [Toxicity classification dataset](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/data) from the other exercises.
Furthermore, you should download the dataset provided by Kiritchenko & Mohammad 2018 [(data)](https://saifmohammad.com/WebDocs/EEC/Equity-Evaluation-Corpus.zip) for testing biases in text classification systems.


The classifier we will test is based on the Universal Sentence Encoder that we used in [exercise set 8](https://github.com/ulfaslak/sds_tddl_2020/blob/master/exercises/week8_exercises_transferlearning.ipynb).


## 9.0 Create a function that takes train and test data and trains the transfer learning model from last week's exercises.
  - **Hint:** initialize the hub layer outside the function, and use the `tf.keras.models.clone_model()` function to avoid downloading the layer every time you reinitialize.

## 9.1 Estimate Differential Bias 
Here we shall look at both individual classification bias and proportional classification bias.
1. Train a classifier on the Toxicity dataset.
2. Estimate differential biases for each of the minority populations, i.e. the `white`, `black`, `asian`, `jewish`, etc. columns.
  - You need to set a threshold for the share of annotators who agreed on the minority group (e.g. >0.5).
  - Using the test set, construct a confusion matrix for each minority.
  - Report the accuracy, precision, recall and F1 score.
3. Test the *"Classify&Count"* method for estimating proportions, first for the general population and then for the subpopulation of each minority group.
  - See which groups have the largest error.
  - Report proportional classification accuracy using Pearson's product-moment correlation (`np.corrcoef`) and the root mean square deviation (RMSD).
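
The measures listed above can be sketched with plain NumPy. This is a minimal illustration on made-up arrays, not a reference solution: the helper names (`binary_metrics`, `classify_and_count`, `proportion_errors`) and the toy data are assumptions introduced here, and the group mask uses the >0.5 annotator threshold suggested in the exercise.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Confusion-matrix counts and summary scores for 0/1 label arrays."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(y_true & y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return {'tp': tp, 'fp': fp, 'fn': fn, 'tn': tn, 'accuracy': accuracy,
            'precision': precision, 'recall': recall, 'f1': f1}

def classify_and_count(y_pred):
    """Classify&Count: the estimated proportion is the share of positive predictions."""
    return float(np.mean(y_pred))

def proportion_errors(true_props, est_props):
    """RMSD and Pearson correlation between true and estimated proportions."""
    true_props = np.asarray(true_props, dtype=float)
    est_props = np.asarray(est_props, dtype=float)
    rmsd = float(np.sqrt(np.mean((est_props - true_props) ** 2)))
    r = float(np.corrcoef(true_props, est_props)[0, 1])
    return rmsd, r

# Toy data (hypothetical): labels, predictions and annotator share for one group.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])
group_share = np.array([0.8, 0.1, 0.9, 0.0, 0.7, 0.2, 0.6, 0.0])

mask = group_share > 0.5                      # documents assigned to the group
overall = binary_metrics(y_true, y_pred)
group = binary_metrics(y_true[mask], y_pred[mask])
rmsd, r = proportion_errors([0.5, 0.2, 0.8], [0.4, 0.3, 0.7])
print(overall['accuracy'], group['accuracy'], classify_and_count(y_pred[mask]))
```

In the exercise, the first argument to `proportion_errors` would hold the true proportion per group and the second the Classify&Count estimate per group.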



In [0]:
# load dataset
import pandas as pd
path2tox_data = '/content/drive/My Drive/lm/toxic_train.csv'
tox_df = pd.read_csv(path2tox_data)

tox_df['label'] = (tox_df.target>0.5).astype(int)
print(tox_df.shape)
# subsample data to allow faster prototyping
# df = tox_df.sample(5000) # simple solution
# stratified solution: subsample from each metadata column so that every group is represented.
strat_sample_cols = list(tox_df.columns[3:23])+['physical_disability',
       'psychiatric_or_mental_illness', 'transgender', 'white']
samples = []
n = 500
for col in strat_sample_cols:
    binary = pd.DataFrame((tox_df[col]>0.5).astype(int))
    samples+=[j for _,j in binary.groupby(col).apply(lambda x: x.sample(min(len(x),n//2))).index]
idx = list(set(samples))
df = tox_df.iloc[idx].copy()

sample = df.groupby('label').apply(lambda x: x.sample(500))
sample_texts = sample.comment_text.values
print(df.shape,tox_df.shape)


(1804874, 46)
(11138, 46) (1804874, 46)


In [0]:
# Solution 9.0
## Initialize hublayer outside of train_function.
# Make a base model to avoid downloading every time.
# Add USE layer to model  
# Define train_transfer_use function

## 9.2 Correct the bias using the "Ideal Method" (i.e. assuming that labelled and unlabelled data are drawn randomly from the same distribution).
Here we need to split the data into 3 sets: train, eval, and test. Because we need a decent number of samples in the evaluation set, we need to use more of the "expensive" labelled data. Luckily we have a large toxicity dataset, referenced as `tox_df` if you use my loading cell.
  1. Train a new classifier on the train data.
  2. Estimate the aggregate confusion matrix on the evaluation data, and then the confusion matrix for each individual minority group.
  3. Apply the classifier to the rest of the data (i.e. the test set) and correct the predictions as described in the lecture and Hopkins and King (2010:235): $P(D)=\frac{TP-FP}{TP+FP} * \hat{P}+\frac{FN}{FN+TN} * \hat{N}$, where $P(D)$ is the probability of a document category, and $\hat{P}$ and $\hat{N}$ are the positive and negative predictions.
  4. Report the proportional classification error as above, and comment on the improvement.
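
The correction step can be sketched as follows, using confusion counts from the evaluation set to reweight the predicted shares on the test set. The function name `corrected_proportion` and the toy arrays are assumptions for illustration. The sketch weights positive predictions by the precision TP/(TP+FP) and negative predictions by FN/(FN+TN), which is what the law of total probability gives; treat that weighting as an assumption of this sketch and check it against the lecture formula.

```python
import numpy as np

def corrected_proportion(eval_true, eval_pred, test_pred):
    """Correct a Classify&Count estimate with rates measured on the eval set."""
    eval_true = np.asarray(eval_true).astype(bool)
    eval_pred = np.asarray(eval_pred).astype(bool)
    tp = np.sum(eval_true & eval_pred)
    fp = np.sum(~eval_true & eval_pred)
    fn = np.sum(eval_true & ~eval_pred)
    tn = np.sum(~eval_true & ~eval_pred)
    # Share of positive predictions that are truly positive (assumed weighting).
    pos_weight = tp / (tp + fp) if (tp + fp) else 0.0
    # Share of negative predictions that are in fact positive.
    neg_weight = fn / (fn + tn) if (fn + tn) else 0.0
    p_hat = float(np.mean(test_pred))   # share of positive predictions on test
    n_hat = 1.0 - p_hat                 # share of negative predictions on test
    return float(pos_weight * p_hat + neg_weight * n_hat)

# Toy example (hypothetical values).
eval_true = [1, 1, 0, 0, 1, 0, 0, 0]
eval_pred = [1, 0, 1, 0, 1, 0, 0, 0]
test_pred = [1, 0, 0, 0]
estimate = corrected_proportion(eval_true, eval_pred, test_pred)
print(estimate)
```

For the minority groups, the same function can be applied to the rows of the eval and test sets that belong to each group.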

In [0]:
## Solution 9.2 Bias Correction.

## 9.3. Compare to the direct estimation method suggested by Hopkins & King 2010 and Jerzak, King, and Strezhnev 2020.
Since the methods proposed in the above papers are based on other feature representation schemes, a direct comparison to the *"Classify-and-count"* method is not entirely meaningful. Instead we use the same feature representation as the classifier, i.e. the Universal Sentence Encoder, and estimate the equation.

As referenced from the text by Hopkins and King:

"*think of P(D) as the unknown “regression coefficients” $\beta$, P(S|D) as the “explanatory variables” matrix X, and P(S) as the “dependent variable” Y.*"

Where P(D) is the probability of a document category (observed as no. of Positive Labels in the training set). P(S) is the probability of the document feature representation (again observed in the training data). P(S|D) is the probability of the Document Representation given Document Category D, which should be estimated using the standard linear regression calculations. I.e. solve for $\beta$: 

$\beta=\left(X^{\prime} X\right)^{-1} X^{\prime} y$

So what you need to do is the following:
1. Encode the texts from the data to define your x_train and x_test.
2. Estimate P(S|D), i.e. $\beta$. **Hint**: use the `np.linalg.inv` function, `.T` and the `.dot` function.
3. Estimate the proportion for each document in the test data by taking the dot product between P(S) and $\beta$.
4. Aggregate to estimate proportions.
  - Report the estimated overall and individual minority proportions compared to those in the test set.
  - Report measures for proportional classification (RMSD and correlation).
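
The regression steps above can be sketched with NumPy as below. Random vectors stand in for the Universal Sentence Encoder embeddings, so all data here is synthetic, and the added intercept column is an assumption of this sketch rather than part of the formula in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8                                   # stand-in for the embedding size
true_w = rng.normal(size=dim)

def make_data(n):
    """Synthetic 'embeddings' and 0/1 labels (placeholder for encoded texts)."""
    X = rng.normal(size=(n, dim))
    y = (X @ true_w + rng.normal(scale=0.5, size=n) > 0).astype(float)
    return X, y

x_train, y_train = make_data(400)
x_test, y_test = make_data(200)

def add_intercept(X):
    """Prepend a column of ones (assumption: the model includes an intercept)."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def fit_ols(X, y):
    """Solve beta = (X'X)^-1 X'y with the functions named in the hint."""
    X1 = add_intercept(X)
    return np.linalg.inv(X1.T.dot(X1)).dot(X1.T).dot(y)

beta = fit_ols(x_train, y_train)
doc_scores = add_intercept(x_test).dot(beta)      # one score per test document
estimated_proportion = float(doc_scores.mean())   # aggregate to a proportion
true_proportion = float(y_test.mean())
print(estimated_proportion, true_proportion)
```

Group-level estimates follow by averaging `doc_scores` over the test rows that belong to each minority group.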



In [0]:

# Solution 9.3: Direct estimation
## Encode text using the Universal Sentence Encoder (hint: use the "hublayer-only" model you clone when training your classifier)

## Exercise 9.4: Bias detection using critical test cases
This is about making the bias of the model visible by probing specific dimensions (e.g. race, gender, age). The method is demonstrated in the paper ["Examining Gender and Race Bias in Two Hundred Sentiment Analysis Systems"](https://arxiv.org/abs/1805.04508) by Kiritchenko & Mohammad 2018 ([data](https://saifmohammad.com/WebDocs/EEC/Equity-Evaluation-Corpus.zip)).
You will use this data to test a pretrained system. Then you will practice curating your own specific "bias-detection scheme" by combining the methods you have learned so far: keyword exploration using the method of [King, Lam and Roberts 2017](https://gking.harvard.edu/publications/computer-assisted-keyword-and-document-set-discovery-fromunstructured-text) (exercise set for week 7), and universal dependency parsing with stanfordnlp, now known as the [`stanza`](https://stanfordnlp.github.io/stanza/) package.
You will use this to create what is known as a data augmentation scheme, which can be used both for "bias detection" and for "bias mitigation".

First come some helper functions for preparing the data and models you should use.



## Loading bias detection data

In [0]:
## Download the equity evaluation corpus
### linux commandline version
link = 'https://saifmohammad.com/WebDocs/EEC/Equity-Evaluation-Corpus.zip'
! wget {link}
path = link.split('/')[-1]
dir_to_extr = 'bias_dataset'
import os
if not os.path.isdir(dir_to_extr):
  os.mkdir(dir_to_extr)

#! unzip {path} -d {dir_to_extr}
os.system('unzip %s -d %s'%(path,dir_to_extr))

--2020-04-02 09:24:29--  https://saifmohammad.com/WebDocs/EEC/Equity-Evaluation-Corpus.zip
Resolving saifmohammad.com (saifmohammad.com)... 192.185.17.122
Connecting to saifmohammad.com (saifmohammad.com)|192.185.17.122|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1669592 (1.6M) [application/zip]
Saving to: ‘Equity-Evaluation-Corpus.zip.7’


2020-04-02 09:24:30 (1.44 MB/s) - ‘Equity-Evaluation-Corpus.zip.7’ saved [1669592/1669592]



In [0]:
## Download the equity evaluation corpus
### Python version
import requests
link = 'https://saifmohammad.com/WebDocs/EEC/Equity-Evaluation-Corpus.zip'
session = requests.session()
session.headers = '' # for some reason they block explicit python requests.
response = session.get(link)
with open('Equity-Evaluation-Corpus.zip','wb') as f:
    f.write(response.content)
import zipfile
zip_ref = zipfile.ZipFile('Equity-Evaluation-Corpus.zip', 'r')
dir_to_extr = 'bias_dataset'
import os
if not os.path.isdir(dir_to_extr):
    os.mkdir(dir_to_extr)
zip_ref.extractall(dir_to_extr)
zip_ref.close()

In [0]:
import pandas as pd
dir_to_extr = 'bias_dataset/Equity-Evaluation-Corpus'
bias_df = pd.read_csv(dir_to_extr+'/Equity-Evaluation-Corpus.csv')

## Loading the DeepMoji model.
As transfer learning is about using pretrained models, one has to be flexible about the choice of deep learning framework. A working DeepMoji model is implemented by the [HuggingFace team](https://huggingface.co/welcome) under the name [TorchMoji](https://github.com/huggingface/torchMoji), which is basically a PyTorch implementation adapted from the Python 2.7 Keras implementation made by Bjarke Felbo (https://github.com/bfelbo/DeepMoji/tree/master/deepmoji).

The script will do the following:
- Clone the GitHub repo.
- Download the model weights.
- Install dependencies, including the `emoji` Python package.
- Add torchmoji to the Python `sys.path` for easy import.
- Load the tokenizer that DeepMoji depends on.
- Load the model.
- Define a helper function for translating "literal emojis" to Unicode emojis.

In [0]:
## clone the repository
! git clone https://github.com/huggingface/torchMoji.git
## download the pretrained model's weights using their script
import os
cwd = os.getcwd()
os.chdir('torchMoji')
! python scripts/download_weights.py

import os
#os.chdir('torchMoji')
# navigate to the torchmoji folder
## install dependencies
#! pip install -e 
! pip install emoji

Cloning into 'torchMoji'...
remote: Enumerating objects: 143, done.
remote: Total 143 (delta 0), reused 0 (delta 0), pack-reused 143
Receiving objects: 100% (143/143), 2.41 MiB | 4.19 MiB/s, done.
Resolving deltas: 100% (49/49), done.
About to download the pretrained weights file from https://www.dropbox.com/s/q8lax9ary32c7t9/pytorch_model.bin?dl=0#
The size of the file is roughly 85MB. Continue? [y/n]
y
Downloading...
Running system call: wget https://www.dropbox.com/s/q8lax9ary32c7t9/pytorch_model.bin?dl=0# -O /content/torchMoji/model/pytorch_model.bin
--2020-04-02 12:00:35--  https://www.dropbox.com/s/q8lax9ary32c7t9/pytorch_model.bin?dl=0
Resolving www.dropbox.com (www.dropbox.com)... 162.125.65.1, 2620:100:6021:1::a27d:4101
Connecting to www.dropbox.com (www.dropbox.com)|162.125.65.1|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /s/raw/q8lax9ary32c7t9/pytorch_model.bin [following]
--2020-04-02 12:00:35--  https://www.dropbox.com/s

In [0]:
# add to sys.path
import sys
base_path = '' # change if you have downloaded folder elsewhere.
base_path = 'torchMoji/' ## path to the torchmoji directory
sys.path.insert(0, base_path)

In [0]:
from torchmoji.sentence_tokenizer import SentenceTokenizer
# load the DeepMoji encoder that transforms text to emojis.
from torchmoji.model_def import torchmoji_emojis
from torchmoji.global_variables import PRETRAINED_PATH, VOCAB_PATH
import json,csv, numpy as np
import warnings; warnings.simplefilter('ignore')


## set the max context length
max_token = 30 ## This will not work for longer texts,
################# here you should consider splitting each text into smaller segments.

# Load vocab (i.e. the index of each word in the vector representation)
with open(VOCAB_PATH, 'r') as f:
    vocabulary = json.load(f)

# initialize tokenizer
sentence_tokenizer = SentenceTokenizer(vocabulary, max_token)
# load model
model = torchmoji_emojis(PRETRAINED_PATH)

### Load the emoji translator that maps the output dimensions of DeepMoji to Unicode emojis.

In [0]:
# Change working directory back to normal. 
os.chdir(cwd)
with open(base_path+'data/emoji_codes.json') as f:
    emoji_desc = json.load(f)
print(list(emoji_desc.items())[0:10])
import emoji
def translate_emoji(emoji_descr):
    if emoji_descr in emoji.unicode_codes.EMOJI_ALIAS_UNICODE:
        return emoji.unicode_codes.EMOJI_ALIAS_UNICODE[emoji_descr]
    if emoji_descr in emoji.unicode_codes.EMOJI_UNICODE:
        return emoji.unicode_codes.EMOJI_UNICODE[emoji_descr]
    return emoji_descr
to_emoji = [translate_emoji(desc) for i,desc in sorted(emoji_desc.items(),key=lambda x: int(x[0]))]
to_emoji_desc = [desc for i,desc in sorted(emoji_desc.items(),key=lambda x: int(x[0]))]
## index 
to_emoji[0],to_emoji_desc[0]

[('0', ':joy:'), ('1', ':unamused:'), ('2', ':weary:'), ('3', ':sob:'), ('4', ':heart_eyes:'), ('5', ':pensive:'), ('6', ':ok_hand:'), ('7', ':blush:'), ('8', ':heart:'), ('9', ':smirk:')]


('😂', ':joy:')

### Now we are ready to encode the text as emojis using the pretrained model.


## 9.5.1: Investigate the bias of the [DeepMoji](https://deepmoji.mit.edu/) classifier from the paper ["Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm"](https://arxiv.org/pdf/1708.00524.pdf) using the Kiritchenko & Mohammad 2018 [dataset](https://saifmohammad.com/WebDocs/EEC/Equity-Evaluation-Corpus.zip)

The dataset contains otherwise identical sentences that change only the person being referenced.

The dataset is referenced as `bias_df`.

|    | ID                    | Sentence                | Template                               | Person   | Gender   | Race             | Emotion   | Emotion word   |
|---:|:----------------------|:------------------------|:---------------------------------------|:---------|:---------|:-----------------|:----------|:---------------|
|  0 | 2018-En-mystery-05498 | Alonzo feels angry.     | <person subject> feels <emotion word>. | Alonzo   | male     | African-American | anger     | angry          |
|  1 | 2018-En-mystery-11722 | Alonzo feels furious.   | <person subject> feels <emotion word>. | Alonzo   | male     | African-American | anger     | furious        |
|  2 | 2018-En-mystery-11364 | Alonzo feels irritated. | <person subject> feels <emotion word>. | Alonzo   | male     | African-American | anger     | irritated      |


The DeepMoji model is referenced as `model`.

And the tokenizer is referenced as `sentence_tokenizer`.

You should now do the following:

1. Tokenization. Tokenize the documents in `bias_df`.
  - Use the `sentence_tokenizer` defined above to tokenize the documents.
  - See the [torchMoji examples](https://github.com/huggingface/torchMoji/blob/master/examples/encode_texts.py) for help.
  - Inspect the tokenized documents to see the format.
  - Try to convert them back using the <code>vocabulary</code> variable defined earlier. **Hint: this means reversing the vocabulary dictionary.**
2. DeepMoji encoding
  - Encode the tokenized sentences and wrap this in a function.
  - Hint: do a forward pass of the model on the tokenized data. Check [here](https://github.com/huggingface/torchMoji/blob/master/examples/encode_texts.py) for help.
  - For larger datasets and longer sentences, encoding is problematic if not done in batches.
  - Write a for loop that takes only 256 tokenized documents at a time, and concatenate the results into a dataframe at the end.
  - Use the <code>to_emoji</code> list as columns in the dataframe.
3. Join the DeepMoji encoding with `bias_df`.
  - Join the output of DeepMoji with the bias dataframe columns (Race, Gender and Emotion).
  - Make sure the Race and Gender counts are equal after the join.
4. Investigate whether there are significant differences in relation to **Race** (Race column).
  - See which types of emojis change most with a change in race or gender.
  - See which *Emotions* (Emotion column) have the largest difference in encoding across races.
    - I.e. group by Emotion and Race and calculate the absolute difference in emoji encoding.
    - Hint: first group by emotion and race, calculate the mean, then diff, then abs, and then sum.
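
Steps 2 to 4 can be sketched on toy data as follows. The real DeepMoji forward pass is replaced by a hypothetical stand-in (`fake_encode`) that looks up fixed rows, and the column names, batch size of 3 and toy dataframe are all assumptions chosen so the example runs without the model.

```python
import numpy as np
import pandas as pd

emoji_cols = [':joy:', ':unamused:']      # stand-in for the `to_emoji` list
toy_probs = np.array([[0.6, 0.4], [0.5, 0.5], [0.2, 0.8], [0.2, 0.8]])

def fake_encode(batch):
    """Hypothetical stand-in for the model's forward pass on one batch."""
    return toy_probs[batch[:, 0]]

def encode_in_batches(tokenized, encode_fn, columns, batch_size=256):
    """Encode `batch_size` documents at a time and stack the results."""
    frames = []
    for start in range(0, len(tokenized), batch_size):
        batch = tokenized[start:start + batch_size]
        frames.append(pd.DataFrame(encode_fn(batch), columns=columns))
    return pd.concat(frames, ignore_index=True)

tokenized = np.array([[0], [1], [2], [3]])            # toy "token id" matrix
emoji_df = encode_in_batches(tokenized, fake_encode, emoji_cols, batch_size=3)

toy_bias_df = pd.DataFrame({
    'Race': ['African-American', 'European', 'African-American', 'European'],
    'Emotion': ['anger', 'anger', 'joy', 'joy'],
})
joined = toy_bias_df.join(emoji_df)                   # row-aligned join

# groupby -> mean -> diff -> abs -> sum, as in the hint.
group_means = joined.groupby(['Emotion', 'Race'])[emoji_cols].mean()
race_gap = (group_means.groupby(level='Emotion').diff()
            .dropna().abs().sum(axis=1))
print(race_gap)
```

With two races per emotion, `race_gap` holds one value per emotion: the summed absolute difference between the mean emoji encodings of the two groups.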


In [0]:
## Solution 9.5.1 Tokenization

In [0]:
## Solution 9.5.2 DeepMoji Encoding

In [0]:
# Solution 9.5.3 Join with bias_df

In [0]:
# Solution 9.5.4 Analyzing how Gender and Race alone change the DeepMoji encoding

## Exercise 9.6.1: Create your own test case by substituting minority identifiers in the toxicity dataset.

The key to the bias test was using the same sentences but with different subjects (Alonzo, Alan, he, she, etc.). We can construct a similar dataset by *augmenting* the toxicity dataset (referenced as `df`).
- First we create the minority identifiers that we want to substitute or remove. This could be done iteratively using exploration methods (e.g. word similarity search or active-learning-style keyword discovery from the most predictive features).
  - Instead we will pick the most predictive phrases from each minority category in the data.
    1. Because we will use them for our data augmentation scheme, we want slightly more information than just words. We therefore tokenize and process documents using the stanfordnlp package (now `stanza`):
       ```
       ! pip install stanza
       import stanza
       stanza.download('en') # download English model
       nlp = stanza.Pipeline('en') # initialize English neural pipeline
       ```
    2. Apply the nlp pipeline to all documents.
    3. Extract word and word type pairs from all documents, i.e. a document will look like this: `[(w, wtyp), (w1,wtyp) ... (wi,wtyp)]`. Remember to lowercase.
    4. Use the `bow_to_sparse` helper function to create an index of the most prevalent (word, word type) pairs and transform the documents to BoWs. The function returns `sparse_matrix,index`, i.e. a matrix of word-pair counts and the corresponding index of each word pair.
    5. Train a classifier (logistic regression) for each minority column.
    6. Extract the most predictive features (i.e. `.coef_`).
    7. Go through the phrases and pick the most useful (at least 10).
  - Do the above for at least 3 different minorities.

- Now we want to remove the minority identifiers and see how our model does.
  - Write a function that takes a list of identifiers and replaces them with a pattern.
    - `def change_identifier(identifiers,replace_pattern=''):`
  - Apply the `change_identifier` function to the texts to create your *"augmented"* dataset.
  - Train two models:
    - One on all the texts.
    - One on the *augmented* texts, where all minority identifiers are replaced.
  - Report the differences in predictions.
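
One possible reading of the `change_identifier` signature is a function that builds and returns a text transformer. The sketch below takes that route with a case-insensitive, whole-word regular expression; the returned-callable design, the whitespace cleanup and the example identifiers are assumptions, not part of the exercise text.

```python
import re

def change_identifier(identifiers, replace_pattern=''):
    """Return a function that replaces every identifier in a text."""
    if not identifiers:
        return lambda text: text
    # Longest identifiers first, so multi-word phrases win over their parts.
    ordered = sorted(identifiers, key=len, reverse=True)
    pattern = re.compile(
        r'\b(?:' + '|'.join(re.escape(ident) for ident in ordered) + r')\b',
        flags=re.IGNORECASE,
    )

    def apply(text):
        replaced = pattern.sub(replace_pattern, text)
        # Collapse the double spaces left behind when an identifier is removed.
        return re.sub(r'\s{2,}', ' ', replaced).strip()

    return apply

remove_ids = change_identifier(['black', 'white'])
augmented = [remove_ids(t) for t in ['The black cat sat.', 'A White house']]
masked = change_identifier(['black', 'white'], '[ID]')('black and white')
print(augmented, masked)
```

On the toxicity data, the transformer could be applied with something like `df.comment_text.apply(remove_ids)`.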



In [0]:
! pip install stanza
import stanza
stanza.download('en') # download English model
nlp = stanza.Pipeline('en') # initialize English neural pipeline
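
Extracting the (word, word type) pairs from a parsed document can be sketched as below. The sketch assumes that "word type" means the universal POS tag, which stanza exposes as `word.upos`; the `fake_doc` object only mimics the attribute layout of a stanza document so that the example runs without downloading the model.

```python
from types import SimpleNamespace

def word_type_pairs(doc):
    """Return lowercased (word, universal POS tag) pairs for one parsed document."""
    return [(word.text.lower(), word.upos)
            for sentence in doc.sentences
            for word in sentence.words]

# Stand-in with the same attributes as a parsed document (hypothetical data).
fake_doc = SimpleNamespace(sentences=[SimpleNamespace(words=[
    SimpleNamespace(text='Alonzo', upos='PROPN'),
    SimpleNamespace(text='feels', upos='VERB'),
    SimpleNamespace(text='angry', upos='ADJ'),
])])
pairs = word_type_pairs(fake_doc)
print(pairs)
```

With the real pipeline the call would be `word_type_pairs(nlp(text))` for each text.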

In [0]:
# Solution 9.6.1 Apply stanza pipeline

In [0]:
# Solution 9.6.2 Extract word pairs

In [0]:
## Helper function bow_to_sparse.
import numpy as np
import scipy.sparse as sp
from collections import Counter

def bow_to_sparse(docs, max_vocab_size=32768):
    """Turn tokenized docs into a sparse count matrix plus its column index."""
    c = Counter()  # corpus-wide counts, used to pick the vocabulary
    bows = []
    for doc in docs:
        bow = Counter(doc)
        bows.append(bow)
        c.update(bow)
    # keep the most frequent items; their rank becomes the column number
    w2i = {w: num for num, (w, count) in enumerate(c.most_common(max_vocab_size))}
    idx = sorted(w2i, key=lambda x: w2i[x])
    X = sp.dok_matrix((len(docs), len(idx)), dtype=np.int32)
    for num, bow in enumerate(bows):
        for w, count in bow.items():
            wi = w2i.get(w)
            if wi is None:  # item fell outside the vocabulary
                continue
            X[num, wi] = count
    X = X.tocsr()
    print(X.shape)
    return X, idx
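
Given a count matrix and its index, the most predictive features for one minority column can be read off a logistic regression as sketched below. The helper name `top_features`, the toy matrix and the toy labels are assumptions; the sketch uses scikit-learn's `LogisticRegression` and its `.coef_` attribute, and it accepts a sparse matrix in place of the dense toy array.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def top_features(X, y, index, k=10):
    """Fit a logistic regression and return the k largest positive coefficients."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, y)
    coefs = clf.coef_[0]                      # one weight per column of X
    order = np.argsort(coefs)[::-1][:k]       # largest coefficients first
    return [(index[i], float(coefs[i])) for i in order]

# Toy data (hypothetical): columns follow `index`, y marks the minority column.
index = [('black', 'ADJ'), ('the', 'DET'), ('run', 'VERB')]
X = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1],
              [0, 1, 0],
              [1, 1, 0],
              [0, 1, 1]])
y = np.array([1, 1, 0, 0, 1, 0])
ranked = top_features(X, y, index, k=2)
print(ranked)
```

The ranked pairs are only candidates; the exercise still asks you to go through them by hand and keep the useful ones.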


In [0]:
# Solution 9.6.3 Get minority identifiers using a variation of the King, Lam and Roberts 2017 method.

## 9.7: Data augmentation to mitigate bias.
Here you will create synthetic data to *mitigate* the minority biases by training the model on data where the different minority identifiers are used interchangeably.

The strategy is to create new synthetic data by substituting an identifier from e.g. the "black" identifiers with one from the "white" identifiers, and we only want to substitute columns with the same word type.

- Define a function that takes two sets of identifiers and substitutes identifiers from each set:
  1. Make a copy of the training BoWs (i.e. the `.copy()` function). Run through each identifier in set 1, locate the rows where it is active and make a copy of these rows.
  2. Set the identifier column to 0.
  3. For each identifier in set 2 that matches the same word type, make a copy of the rows. Set the identifier column to 1 and append to a list.
  4. Concatenate these sparse matrices into one sparse matrix using the `scipy.sparse.vstack` function.

- Apply the function to create extra synthetic data using all identifier sets.
- Train a classifier using both the old and the new synthetic data and investigate its differential bias.
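
The four substitution steps can be sketched with `scipy.sparse` as below. The function name `substitute_identifiers`, the choice to also return the source row numbers (so labels can be copied along) and the toy matrix are assumptions made for this illustration.

```python
import numpy as np
import scipy.sparse as sp

def substitute_identifiers(X, index, set_from, set_to):
    """Create synthetic BoW rows by swapping identifiers of the same word type."""
    col_of = {pair: i for i, pair in enumerate(index)}
    X = X.tocsr()
    blocks, source_rows = [], []
    for ident in set_from:
        if ident not in col_of:
            continue
        col = col_of[ident]
        rows = X[:, col].nonzero()[0]         # rows where the identifier is active
        if len(rows) == 0:
            continue
        base = X[rows].tolil()                # copy of those rows
        base[:, col] = 0                      # switch the original identifier off
        for other in set_to:
            same_type = other[1] == ident[1]
            if other == ident or not same_type or other not in col_of:
                continue
            new = base.copy()
            new[:, col_of[other]] = 1         # switch the substitute on
            blocks.append(new.tocsr())
            source_rows.append(rows)
    if not blocks:
        return sp.csr_matrix((0, X.shape[1]), dtype=X.dtype), np.array([], dtype=int)
    return sp.vstack(blocks).tocsr(), np.concatenate(source_rows)

# Toy BoW matrix (hypothetical): columns follow `index`.
index = [('black', 'ADJ'), ('white', 'ADJ'), ('run', 'VERB')]
X = sp.csr_matrix(np.array([[1, 0, 1],
                            [0, 1, 0],
                            [1, 0, 0]]))
synthetic, rows = substitute_identifiers(
    X, index, [('black', 'ADJ')], [('white', 'ADJ')])
print(synthetic.toarray(), rows)
```

The synthetic rows can then be stacked under the original training matrix with `sp.vstack([X, synthetic])`, with labels taken from `y_train[rows]`.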



In [0]:
# Solution 9.7.2 Data augmentation
# Substitution function

# Create synthetic dataset

# Train Classifier.

## Extra
- Implement a simple semi-supervised learning classifier. 

- Compute the lift curve. What does the graph tell you about the Saturation of your classifier?
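
A lift curve can be computed in a few lines of NumPy, as in the sketch below. Definitions of "lift" vary; this sketch assumes lift at a given fraction is the share of positives among the top-scored documents divided by the overall base rate, and the function name and toy scores are made up for illustration.

```python
import numpy as np

def lift_curve(y_true, scores, n_points=10):
    """Lift of the top-scored fraction of documents over the base rate."""
    y_true = np.asarray(y_true, dtype=float)
    order = np.argsort(-np.asarray(scores, dtype=float), kind='stable')
    sorted_true = y_true[order]               # labels, highest score first
    base_rate = y_true.mean()
    fractions = np.arange(1, n_points + 1) / n_points
    cutoffs = np.maximum(1, np.round(fractions * len(y_true)).astype(int))
    lift = np.array([sorted_true[:k].mean() / base_rate for k in cutoffs])
    return fractions, lift

# Toy example (hypothetical scores): the two positives are ranked first.
fractions, lift = lift_curve([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1], n_points=2)
print(fractions, lift)
```

A curve that starts well above 1 and falls towards 1 as the fraction grows indicates that the highest-scored documents are much richer in positives than the data as a whole.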


