<br/>

# Part 1: A Good Teacher Is Helpful For All Students -- Custom Loss

<br/>

Basically, the final score is an average of 4 AUCs. 3 of them take into account only parts of the dataset, selected by whether the comment mentions an identity word such as 'gay' and whether it is toxic.

![metric.png](Pictures/metric.png)![metric2.png](Pictures/metric2.png)
<br/>
Because the evaluation metric is an average of 4 AUCs, we built a custom loss function instead of just using binary cross-entropy.
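
As a rough illustration of how the four terms fit together, the sketch below computes them for a single identity subgroup on made-up data. The names `auc`, `four_term_score` and `toy_rows` are hypothetical, and the real competition metric aggregates over several identity subgroups (see the figures above), so treat this as a simplified sketch rather than the official scorer.

```python
def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


def four_term_score(rows):
    """rows: list of (is_toxic, in_subgroup, score) tuples; one subgroup only."""
    def pick(toxic, subgroup):
        return [s for t, g, s in rows if t == toxic and g == subgroup]

    sub_pos, sub_neg = pick(1, 1), pick(0, 1)   # comments that mention the identity
    bg_pos, bg_neg = pick(1, 0), pick(0, 0)     # all other comments

    terms = {
        "overall": auc(sub_pos + bg_pos, sub_neg + bg_neg),
        "subgroup": auc(sub_pos, sub_neg),
        "bpsn": auc(bg_pos, sub_neg),   # background positive, subgroup negative
        "bnsp": auc(sub_pos, bg_neg),   # background negative, subgroup positive
    }
    terms["final"] = sum(terms.values()) / 4
    return terms


# Made-up predictions: (is_toxic, in_subgroup, score)
toy_rows = [
    (1, 1, 0.9), (0, 1, 0.6), (1, 0, 0.5),
    (0, 0, 0.1), (1, 1, 0.7), (0, 0, 0.3),
]
scores = four_term_score(toy_rows)
```

In this toy data the model ranks the non-toxic identity comment (0.6) above the toxic background comment (0.5), so the BPSN term drops to 0 and drags the final score to about 0.72 even though the overall AUC is about 0.89.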

## There are 2 main changes to the loss function:

### 1. Weight Each Sample

The main idea is:<br/>
Each sample participates in some of these AUC terms. **A sample that participates in 3 terms is more important than a sample that participates in 2 terms**, since giving a bad score to that sample affects the overall score more.

What we do is as follows:
We calculate the weight of each sample based on how many AUC terms it belongs to.


In [None]:
import numpy as np

# Overall: every sample counts toward the overall AUC
weights = np.ones((len(train_df),)) / 4

# Subgroup: at least one identity column is >= 0.5
weights += (train_df[identity_columns].fillna(0).values>=0.5).sum(axis=1).astype(bool).astype(int) / 4

# Background Positive, Subgroup Negative: target >= 0.5 and at least one identity column < 0.5
weights += (( (train_df['target'].values>=0.5).astype(bool).astype(int) +
   (train_df[identity_columns].fillna(0).values<0.5).sum(axis=1).astype(bool).astype(int) ) > 1 ).astype(bool).astype(int) / 4

# Background Negative, Subgroup Positive: target < 0.5 and at least one identity column >= 0.5
weights += (( (train_df['target'].values<0.5).astype(bool).astype(int) +
   (train_df[identity_columns].fillna(0).values>=0.5).sum(axis=1).astype(bool).astype(int) ) > 1 ).astype(bool).astype(int) / 4

# Used later to normalize the loss
loss_weight = 1.0 / weights.mean()
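
To see what these rules give for individual rows, here is a plain-Python restatement applied to four made-up samples with two identity columns. The function `sample_weight` and the toy values are hypothetical and only mirror the vectorized code above; they are not part of the original notebook.

```python
def sample_weight(target, identity_values, threshold=0.5):
    """Weight of one row under the four rules above (missing identities count as 0)."""
    ids = [0.0 if v is None else v for v in identity_values]
    toxic = target >= threshold
    any_identity = any(v >= threshold for v in ids)
    any_non_identity = any(v < threshold for v in ids)

    weight = 0.25                                    # overall
    weight += 0.25 * any_identity                    # subgroup
    weight += 0.25 * (toxic and any_non_identity)    # background positive, subgroup negative
    weight += 0.25 * ((not toxic) and any_identity)  # background negative, subgroup positive
    return weight


# (target, identity column values) for four made-up rows
toy_rows = [
    (0.8, [0.9, 0.0]),  # toxic, mentions an identity
    (0.1, [0.9, 0.0]),  # non-toxic, mentions an identity
    (0.8, [0.0, 0.0]),  # toxic, no identity
    (0.1, [0.0, 0.0]),  # non-toxic, no identity
]
toy_weights = [sample_weight(t, ids) for t, ids in toy_rows]
toy_loss_weight = 1.0 / (sum(toy_weights) / len(toy_weights))
```

The two rows that mention an identity end up with weight 0.75 (three terms), the toxic background row with 0.5, and the non-toxic background row with 0.25. Note that the third rule, like the `.sum(axis=1).astype(bool)` expression it mirrors, only requires at least one identity column below the threshold, so with several identity columns it holds for almost every toxic row.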

### 2. Auxiliary Target 
<br/>
Actually, the toxicity column is not the only column in the competition dataset that can be treated as a target.
There are 6 more columns that are highly correlated with the target column:

**['severe_toxicity', 'obscene', 'identity_attack', 'insult', 'threat','sexual_explicit']**

For more explanation of the dataset, please check [here](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/data).

What's more, the official baseline (score about 0.87) just converts toxicity >= 0.5 to 1 and everything else to 0, which may lose some information that is useful to the model. So we also use the toxicity probability as an auxiliary target.

In [None]:
y_columns = ['target'] #0/1
y_aux_columns = \
['target_prob','target_prob','severe_toxicity', 'obscene', 'identity_attack', 'insult', 'threat','sexual_explicit']
# 'target_prob' is listed twice to give it double weight among the aux_columns
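
Since `nn.BCEWithLogitsLoss()` averages over every element by default, each listed column contributes equally to the auxiliary loss, so listing a column twice doubles its share. A quick check of the resulting shares (the name `aux_share` is only for illustration):

```python
from collections import Counter

y_aux_columns = ['target_prob', 'target_prob', 'severe_toxicity', 'obscene',
                 'identity_attack', 'insult', 'threat', 'sexual_explicit']

# Share of the averaged auxiliary loss that each distinct column receives
aux_share = {col: count / len(y_aux_columns)
             for col, count in Counter(y_aux_columns).items()}
```

With this list, `target_prob` receives a quarter of the auxiliary loss and each of the other six columns receives an eighth.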

In [None]:
from torch import nn

def custom_loss(preds,targets,weights):
    ''' Custom loss: sample-weighted BCE on the 'target' column plus unweighted BCE on the auxiliary columns '''
    bce_loss_1 = nn.BCEWithLogitsLoss(weight=weights)(preds[:,0],targets[:,0]) #weighted y_columns
    bce_loss_2 = nn.BCEWithLogitsLoss()(preds[:,1:],targets[:,1:]) # y_aux_columns
    # loss_weight (computed above) rescales the weighted term; the two terms are mixed 60/40
    return ((bce_loss_1 * loss_weight)*0.60 + bce_loss_2*0.40)*2 
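
To make the arithmetic explicit, the following framework-free sketch restates the same formula in plain Python, assuming the default mean reduction of `nn.BCEWithLogitsLoss`. The names `bce_with_logits` and `custom_loss_reference` and the toy inputs are hypothetical; this is meant for checking the numbers by hand, not as a replacement for the PyTorch version.

```python
import math


def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy on a raw logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def custom_loss_reference(preds, targets, weights, loss_weight):
    """preds/targets: rows of [main, aux_1, ..., aux_k]; weights: one value per row."""
    # Sample-weighted BCE on the main target, averaged over the batch
    loss_1 = sum(w * bce_with_logits(p[0], t[0])
                 for p, t, w in zip(preds, targets, weights)) / len(preds)
    # Unweighted BCE on the auxiliary targets, averaged over every element
    aux = [bce_with_logits(x, y)
           for p, t in zip(preds, targets)
           for x, y in zip(p[1:], t[1:])]
    loss_2 = sum(aux) / len(aux)
    return (loss_1 * loss_weight * 0.60 + loss_2 * 0.40) * 2


# Toy batch: two samples, one main logit and two auxiliary logits each
toy_preds = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
toy_targets = [[1.0, 0.7, 0.0], [0.0, 0.1, 1.0]]
toy_sample_weights = [0.75, 0.25]
toy_value = custom_loss_reference(toy_preds, toy_targets, toy_sample_weights,
                                  loss_weight=2.0)
# Every element costs log(2) at logit 0, so toy_value equals 2 * log(2)
```

Multiplying by `loss_weight` (the inverse of the mean sample weight) brings the weighted term back to roughly the scale of an unweighted BCE, and the final factor of 2 offsets the 0.60/0.40 mix so that the total stays near the sum of the two terms.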

It turns out that the custom loss function works well: it boosted the LSTM model's AUC from 0.930 to 0.934 in our experiments, and it also gave the models a great boost when we used it for fine-tuning BERT and GPT-2.