# Jigsaw Unintended Bias in Toxicity Classification - Kaggle competition

## Problem description

*Can you help detect toxic comments and minimize unintended model bias? That's your challenge in this competition.*

*In this competition, the challenge is to build a model that recognizes toxicity in comments and minimizes unintended bias with respect to mentions of identities. You'll be using a dataset labeled for identity mentions and optimizing a metric designed to measure unintended bias. Develop strategies to reduce unintended bias in machine learning models, and you'll help the Conversation AI team, and the entire industry, build models that work well for a wide range of conversations.*

## Evaluation

This competition uses a newly developed metric that combines several submetrics to balance overall performance with various aspects of unintended bias.

First, let's define each submetric.

#### Overall AUC
This is the ROC-AUC for the full evaluation set.

#### Bias AUCs:
To measure unintended bias, we again calculate the ROC-AUC, this time on three specific subsets of the test set for each identity, each capturing a different aspect of unintended bias.

#### Subgroup AUC
Here, we restrict the data set to only the examples that mention the specific identity subgroup. A low value in this metric means the model does a poor job of distinguishing between toxic and non-toxic comments that mention the identity.

#### BPSN (Background Positive, Subgroup Negative) AUC: 
Here, we restrict the test set to the non-toxic examples that mention the identity and the toxic examples that do not. A low value in this metric means that the model confuses non-toxic examples that mention the identity with toxic examples that do not, likely meaning that the model predicts higher toxicity scores than it should for non-toxic examples mentioning the identity.

#### BNSP (Background Negative, Subgroup Positive) AUC:
Here, we restrict the test set to the toxic examples that mention the identity and the non-toxic examples that do not. A low value here means that the model confuses toxic examples that mention the identity with non-toxic examples that do not, likely meaning that the model predicts lower toxicity scores than it should for toxic examples mentioning the identity.
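
To make the three subsets concrete, here is a minimal sketch on made-up data for a single identity. The labels, the scores, and both helpers are illustrative assumptions, not part of the competition code. `pairwise_auc` is a plain-Python stand-in for `sklearn.metrics.roc_auc_score`: ROC-AUC equals the share of (positive, negative) pairs that the model ranks correctly, with ties counted as half.

```python
def pairwise_auc(labels, scores):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up example: 8 comments, one identity.
toxic    = [0, 0, 1, 1, 0, 0, 1, 1]   # 1 = toxic
mentions = [1, 1, 1, 1, 0, 0, 0, 0]   # 1 = mentions the identity
scores   = [0.2, 0.7, 0.8, 0.9, 0.1, 0.3, 0.6, 0.4]

def subset_auc(keep):
    """AUC restricted to the rows where keep(toxic, mentions) is true."""
    rows = [(y, s) for y, m, s in zip(toxic, mentions, scores) if keep(y, m)]
    return pairwise_auc([y for y, _ in rows], [s for _, s in rows])

subgroup_auc = subset_auc(lambda y, m: m == 1)  # only comments mentioning the identity
bpsn_auc = subset_auc(lambda y, m: y != m)      # non-toxic mentions + toxic background
bnsp_auc = subset_auc(lambda y, m: y == m)      # toxic mentions + non-toxic background

print(subgroup_auc, bpsn_auc, bnsp_auc)
```

In this toy data the non-toxic identity comment scored 0.7 outranks both toxic background comments (0.6 and 0.4). That pulls the BPSN AUC down to 0.5 while the Subgroup and BNSP AUCs stay at 1.0.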




## Generalized Mean of Bias AUCs

To combine the per-identity Bias AUCs into one overall measure, we calculate their generalized mean as defined below:


<center>
$M_{p}(m_{s})  =  (\frac{1}{N}\sum_{s=1}^{N} m^{p}_{s})^{\frac{1}{p}}$
</center>

where:
* $M_{p}$ - the pth-power mean function
* $m_{s}$ - the bias metric m calculated for subgroup s
* N = number of identity subgroups

For this competition, we use a p value of -5 to encourage improvements of the models for the identity subgroups with the lowest model performance.
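
A small sketch of why a negative power rewards fixing the weakest subgroup. The AUC values below are hypothetical and `power_mean` is a helper written only for this illustration.

```python
def power_mean(values, p):
    """Generalized (power) mean M_p of a list of positive values."""
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)

# Hypothetical per-identity AUCs: eight strong subgroups and one weak one.
aucs = [0.95] * 8 + [0.70]

arithmetic = power_mean(aucs, 1)    # ordinary average
competition = power_mean(aucs, -5)  # the p used in this competition

print(round(arithmetic, 3), round(competition, 3))
```

With these inputs the ordinary average is about 0.92, while the p = -5 mean is about 0.89, noticeably closer to the weak subgroup's 0.70. A single poorly served identity therefore costs more under this metric than under a plain average.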

### Final metric

We combine the overall AUC with the generalized mean of the Bias AUCs to calculate the final model score:

<center>
$score = 0.25 * AUC_{overall} + \sum_{a=1}^{A}(0.25*M_{p}(m_{s,a}))$
</center>

where:

* A = 3 (number of submetrics)
* $m_{s,a}$ = bias metric for identity subgroup s using submetric a
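
As a worked example of the arithmetic, the sketch below plugs hypothetical numbers into the formula. `final_score` and all four input values are assumptions chosen for illustration.

```python
def final_score(overall_auc, bias_means, weight=0.25):
    """Weighted sum from the formula above: every term gets the same weight."""
    return weight * overall_auc + sum(weight * m for m in bias_means)

# Hypothetical values: overall AUC, then the generalized means of the
# Subgroup, BPSN and BNSP AUCs.
overall_auc = 0.96
bias_means = [0.88, 0.86, 0.94]

score = final_score(overall_auc, bias_means)

# Equivalent form: 0.25 * overall + 0.75 * (plain average of the three means).
alt = 0.25 * overall_auc + 0.75 * sum(bias_means) / len(bias_means)

print(round(score, 4), round(alt, 4))
```

Both forms give 0.91 here. The second form is the one to expect in implementations that first average the three bias means and then weight that average by 1 - 0.25.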




In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

import sys
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical
from keras.layers import Dense, Input
from keras.layers import Embedding, Bidirectional, LSTM
from keras.models import Model, Sequential
from keras.initializers import Constant

from sklearn.metrics import roc_auc_score


import time
import os
print(os.listdir("../input"))



Using TensorFlow backend.


['glove-global-vectors-for-word-representation', 'jigsaw-unintended-bias-in-toxicity-classification']


In [2]:
tic = time.time()

identity_columns = [
    'male', 'female', 'homosexual_gay_or_lesbian', 'christian', 'jewish',
    'muslim', 'black', 'white', 'psychiatric_or_mental_illness']

df_raw = pd.read_csv('../input/jigsaw-unintended-bias-in-toxicity-classification/train.csv',usecols= ['target'] + ['comment_text'] + identity_columns)

toc = time.time()
print("Run of this cell took: " + str(round(toc-tic)) + " seconds")

Run of this cell took: 10 seconds


In [3]:
df_raw.head()

Unnamed: 0,target,comment_text,black,christian,female,homosexual_gay_or_lesbian,jewish,male,muslim,psychiatric_or_mental_illness,white
0,0.0,"This is so cool. It's like, 'would you want yo...",,,,,,,,,
1,0.0,Thank you!! This would make my life a lot less...,,,,,,,,,
2,0.0,This is such an urgent design problem; kudos t...,,,,,,,,,
3,0.0,Is this something I'll be able to install on m...,,,,,,,,,
4,0.893617,haha you guys are a bunch of losers.,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [4]:
# binarize the target and identity columns at 0.5 (missing identity values stay NaN)
threshold = 0.5
for col in ['target'] + identity_columns:
    # assign through .loc; chained indexing (df[col][mask] = ...) raises
    # SettingWithCopyWarning and may write to a temporary copy
    df_raw.loc[df_raw[col] < threshold, col] = 0
    df_raw.loc[df_raw[col] >= threshold, col] = 1


In [5]:
#extract dependent and independent variables

comment_text = df_raw['comment_text']

target = df_raw['target']

In [6]:
#tokenize comment_text (independent variable)

tic = time.time()

#choose vocabulary and max sequence size
VOCABULARY_SIZE = 20000 
MAX_SEQUENCE_LENGTH = 150

#create instance of keras Tokenizer class
tokenizer = Tokenizer(num_words=VOCABULARY_SIZE)
tokenizer.fit_on_texts(comment_text)

# pad sequences to MAX_SEQUENCE_LENGTH
comment_text = pad_sequences(tokenizer.texts_to_sequences(comment_text),MAX_SEQUENCE_LENGTH)

toc = time.time()

print("Run of this cell took: " + str(round(toc-tic)) + " seconds")
word_index = tokenizer.word_index
print('Found %s unique tokens.' % len(word_index))


Run of this cell took: 194 seconds
Found 397708 unique tokens.


In [7]:
# separating data into train/validation sets

train_text = comment_text[:1700000]
train_target = target[:1700000]

validate_text = comment_text[1700000:]
validate_target = target[1700000:]

print("Number of training examples is : " + str(train_target.shape[0]))
print("Number of validation examples is : " + str(validate_target.shape[0]))


Number of training examples is : 1700000
Number of validation examples is : 104874


### Preparing the pretrained embedding layer (GloVe)

In [8]:
EMBEDDING_DIM = 200

In [9]:
#Next, we compute an index mapping words to known embeddings, by parsing the data dump of pre-trained embeddings:
EMBEDDINGS_PATH = '../input/glove-global-vectors-for-word-representation/glove.6B.200d.txt'
embeddings_index = {}
# the context manager closes the file even if parsing fails
with open(EMBEDDINGS_PATH, encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs

print('Found %s word vectors.' % len(embeddings_index))


Found 400000 word vectors.


In [10]:
# rows for words that are missing from the GloVe index stay all-zeros
embedding_matrix = np.zeros((len(word_index) + 1, EMBEDDING_DIM))
for word, i in word_index.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector


In [11]:
#load this embedding_matrix into a layer
embedding_layer = Embedding(len(word_index) + 1,
                            EMBEDDING_DIM,
                            weights=[embedding_matrix],
                            input_length=MAX_SEQUENCE_LENGTH,
                            trainable=False)


In [12]:
model = Sequential()

#adding layers
model.add(embedding_layer)
model.add(Bidirectional(LSTM(166,activation='tanh')))
model.add(Dense(80,activation='tanh'))
model.add(Dense(1,activation='sigmoid'))

model.compile(loss='binary_crossentropy',optimizer='adam')


In [13]:
#fit the model
tic = time.time()

model.fit(train_text,train_target,batch_size=1024*4,epochs=7)

toc = time.time()

print("Run of this cell took: " + str(round(toc-tic)) + " seconds")


Epoch 1/7
Epoch 2/7
Epoch 3/7
Epoch 4/7
Epoch 5/7
Epoch 6/7
Epoch 7/7
Run of this cell took: 3457 seconds


### Calculate train and validation score

#### Class for evaluation metric


In [14]:
class JigsawEvaluator:

    def __init__(self, y_true, y_identity, power=-5, overall_model_weight=0.25):
        self.y = (y_true >= 0.5).astype(int)
        self.y_i = (y_identity >= 0.5).astype(int)
        self.n_subgroups = self.y_i.shape[1]
        self.power = power
        self.overall_model_weight = overall_model_weight

    @staticmethod
    def _compute_auc(y_true, y_pred):
        try:
            return roc_auc_score(y_true, y_pred)
        except ValueError:
            return np.nan

    def _compute_subgroup_auc(self, i, y_pred):
        mask = self.y_i[:, i] == 1
        return self._compute_auc(self.y[mask], y_pred[mask])

    def _compute_bpsn_auc(self, i, y_pred):
        mask = self.y_i[:, i] + self.y == 1
        return self._compute_auc(self.y[mask], y_pred[mask])

    def _compute_bnsp_auc(self, i, y_pred):
        mask = self.y_i[:, i] + self.y != 1
        return self._compute_auc(self.y[mask], y_pred[mask])

    def compute_bias_metrics_for_model(self, y_pred):
        records = np.zeros((3, self.n_subgroups))
        for i in range(self.n_subgroups):
            records[0, i] = self._compute_subgroup_auc(i, y_pred)
            records[1, i] = self._compute_bpsn_auc(i, y_pred)
            records[2, i] = self._compute_bnsp_auc(i, y_pred)
        return records

    def _calculate_overall_auc(self, y_pred):
        return roc_auc_score(self.y, y_pred)

    def _power_mean(self, array):
        total = sum(np.power(array, self.power))
        return np.power(total / len(array), 1 / self.power)

    def get_final_metric(self, y_pred):
        bias_metrics = self.compute_bias_metrics_for_model(y_pred)
        bias_score = np.average([
            self._power_mean(bias_metrics[0]),
            self._power_mean(bias_metrics[1]),
            self._power_mean(bias_metrics[2])
        ])
        overall_score = self.overall_model_weight * self._calculate_overall_auc(y_pred)
        bias_score = (1 - self.overall_model_weight) * bias_score
        return overall_score + bias_score


In [15]:
# identity_columns = [
#     'male', 'female', 'homosexual_gay_or_lesbian', 'christian', 'jewish',
#     'muslim', 'black', 'white', 'psychiatric_or_mental_illness']

# TRAIN SCORE
# calculate it on the first n examples (should be a close enough approximation)

tic = time.time()
n = 100000

y_true = train_target.values[:n]
y_identity = df_raw[identity_columns].iloc[:n].values
y_pred = model.predict(train_text[:n],batch_size=1024)

# evaluate
evaluator = JigsawEvaluator(y_true, y_identity)
auc_test_score = evaluator.get_final_metric(y_pred)
print(auc_test_score)

toc = time.time()
print("Run of this cell took: " + str(round(toc-tic)) + " seconds")



0.9202662695278806
Run of this cell took: 17 seconds


In [16]:
# VALIDATION SCORE

tic = time.time()

y_true = validate_target.values
y_identity = df_raw[identity_columns].iloc[1700000:].values
y_pred = model.predict(validate_text,batch_size=1024)

# evaluate
evaluator = JigsawEvaluator(y_true, y_identity)
auc_test_score = evaluator.get_final_metric(y_pred)
print(auc_test_score)


toc = time.time()
print("Run of this cell took: " + str(round(toc-tic)) + " seconds")



0.9071720077108385
Run of this cell took: 17 seconds


### Train a little more on the validation set

In [17]:
tic = time.time()

model.fit(validate_text,validate_target,batch_size=1024*4,epochs=5)

toc = time.time()

print("Run of this cell took: " + str(round(toc-tic)) + " seconds")


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Run of this cell took: 151 seconds


### Make predictions on test set

In [18]:
df_test = pd.read_csv('../input/jigsaw-unintended-bias-in-toxicity-classification/test.csv')
submission = pd.read_csv('../input/jigsaw-unintended-bias-in-toxicity-classification/sample_submission.csv', index_col='id')


In [19]:
tic = time.time()

submission['prediction'] = model.predict(pad_sequences(tokenizer.texts_to_sequences(df_test['comment_text']),MAX_SEQUENCE_LENGTH),batch_size=1024)[:, 0]
submission.to_csv('submission.csv')

toc = time.time()
print("Run of this cell took: " + str(round(toc-tic)) + " seconds")


Run of this cell took: 21 seconds


## History

#### First try:

Text preprocessing: none (only default tokenization)

Model architecture: vocabulary size = 20000, sequence length = 150

Layers:
* Embedding(vocabulary_size, 300)
* Bidirectional(LSTM(166))
* Dense(80, activation='tanh')
* Dense(1, activation='sigmoid')

Optimizer: adam, epochs = 5

| Set  |Score|  
|---|---|
|  Train | 0.91871 | 
|  Validate | 0.90468 |  
|   Test|  0.90673 |

More detailed diagnostics on validation set:

| Identity | Subgroup AUC | BPSN AUC | BNSP AUC |
|---|---|---|---|
| male | 0.901 | 0.886 | 0.958 |
| female | 0.897 | 0.898 | 0.951 |
| homosexual_gay_or_lesbian | 0.842 | 0.795 | 0.971 |
| christian | 0.920 | 0.930 | 0.941 |
| jewish | 0.867 | 0.904 | 0.926 |
| muslim | 0.860 | 0.842 | 0.962 |
| black | 0.825 | 0.800 | 0.965 |
| white | 0.831 | 0.805 | 0.964 |
| psychiatric_or_mental_illness | 0.872 | 0.850 | 0.963 |
<br>
* BPSN is the lowest of the three bias AUCs for six of the nine identities and includes the lowest single value in the table (0.795 for homosexual_gay_or_lesbian). A low value in this metric means that the model confuses non-toxic examples that mention the identity with toxic examples that do not, likely meaning that the model predicts higher toxicity scores than it should for non-toxic examples mentioning the identity.
* The model does well on BNSP (every value is above 0.92), meaning that it rarely confuses toxic examples that mention the identity with non-toxic examples that do not.
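
As a rough check of this reading, the per-identity values from the table can be passed through the generalized mean with p = -5. This is only a sketch: `power_mean` is a helper written for illustration, and the inputs are the rounded table values, so the results will not exactly reproduce the reported scores.

```python
def power_mean(values, p=-5):
    """Generalized (power) mean M_p of a list of positive values."""
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)

# Columns of the diagnostics table, in the row order of identity_columns.
subgroup = [0.901, 0.897, 0.842, 0.920, 0.867, 0.860, 0.825, 0.831, 0.872]
bpsn     = [0.886, 0.898, 0.795, 0.930, 0.904, 0.842, 0.800, 0.805, 0.850]
bnsp     = [0.958, 0.951, 0.971, 0.941, 0.926, 0.962, 0.965, 0.964, 0.963]

means = {
    "subgroup": power_mean(subgroup),
    "bpsn": power_mean(bpsn),
    "bnsp": power_mean(bnsp),
}

# Print the three generalized means from weakest to strongest.
for name, value in sorted(means.items(), key=lambda kv: kv[1]):
    print(name, round(value, 3))
```

The BPSN mean comes out lowest and the BNSP mean highest, so reducing false positives on non-toxic comments that mention an identity looks like the most direct way to raise the final score.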