In [193]:
import pandas as pd
import numpy as np
from utils import myutils, myutils_analysis
from sklearn.metrics import classification_report

## Qualitative analysis
In this notebook, we qualitatively analyse the performance of the baseline and the hierarchical model, and examine what each model does better and worse.
The findings and the extracted text examples are discussed in the paper.
### Setup
We load the texts and gold labels of the test split, along with the predictions of the transformer baseline and of our hierarchical approach.
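Every later cell selects rows of `text`, `labels` and the prediction Series with boolean masks. That only works if all of them share the same row index. The following is a minimal sketch of that setup. It assumes `myutils.read_data` (whose code is not shown here) reads the CSV and returns the text column and the label column `c` as two pandas Series. `read_data` below is a hypothetical stand-in, and the in-memory CSVs are toy data:

```python
import io

import pandas as pd


def read_data(path_or_buffer, text_col, label_col):
    """Hypothetical stand-in for myutils.read_data: return (texts, labels) as two Series."""
    df = pd.read_csv(path_or_buffer)
    return df[text_col], df[label_col]


# Toy CSVs with the same column names as the real files (assumed layout)
csv = io.StringIO("text,c\nfirst text,0\nsecond text,3\nthird text,4\n")
preds_csv = io.StringIO("prediction\n0\n3\n7\n")

text, labels = read_data(csv, "text", "c")
hier_predictions = pd.read_csv(preds_csv).prediction

# All masks in the notebook rely on these Series sharing one row index
assert text.index.equals(labels.index) and labels.index.equals(hier_predictions.index)
print(text[(hier_predictions == 3) & (labels == 3)].tolist())  # ['second text']
```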

In [223]:
text, labels = myutils.read_data('../data/test_int_label.csv', 'text', 'c')
hier_predictions = pd.read_csv('../label_preds/label_pred_singleMLM.csv.csv')
hier_predictions = hier_predictions.prediction
baseline_predictions = pd.read_csv('../label_preds/label_pred_baseline.csv')
baseline_predictions = baseline_predictions.prediction

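### Distribution of gold and predicted labels (hierarchical model)
The two distributions look similar, but similar counts do not mean the individual predictions are correct. The hierarchical model predicts label 0 (non-sexist) somewhat less often than it occurs in the gold labels (2918 vs. 3030), and it over-predicts labels 7 (184 vs. 119) and 8 (36 vs. 18).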
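A predicted label distribution can match the gold distribution exactly even when every individual prediction is wrong. Here is a toy illustration with made-up labels (not data from this test set):

```python
import pandas as pd

# Toy labels: the predictions are a permutation of the gold labels
gold = pd.Series([0, 0, 3, 4])
pred = pd.Series([3, 4, 0, 0])

# Per-label counts over indices 0-11, as in the cell below
gold_counts = gold.value_counts().reindex(range(12), fill_value=0)
pred_counts = pred.value_counts().reindex(range(12), fill_value=0)
print(gold_counts.equals(pred_counts))  # True: identical distributions

# ...yet not a single prediction matches its gold label
print((gold == pred).mean())  # 0.0
```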

In [28]:
print([len(labels[labels == i]) for i in range(12)])
print([len(hier_predictions[hier_predictions == i]) for i in range(12)])

[3030, 16, 73, 205, 192, 57, 182, 119, 18, 14, 21, 73]
[2918, 25, 82, 202, 196, 63, 192, 184, 36, 17, 27, 58]


## Analysis of the hierarchical model
In the following sections, we examine the hierarchical model's predictions more closely and look at the texts behind them.
The analyses and examples themselves are presented in the paper.

### Descriptive attacks (2.1)
We take a closer look at the most frequent sexist label: the third fine-grained category (label 3, not counting label 0, non-sexist),
"descriptive attacks" (2.1).
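The cells use integer label indices, while the text refers to category codes. This notebook states that label 0 is non-sexist, that labels 3 and 4 are 2.1 and 2.2, and that the sexist labels run from 1.1 to 4.2. The full mapping below is a sketch. It assumes the codes in between follow the EDOS (SemEval-2023 Task 10) fine-grained taxonomy, with two, three, four and two sub-categories per group:

```python
# Assumed mapping from label index to fine-grained code (EDOS-style grouping)
FINE_GRAINED_CODES = ["1.1", "1.2", "2.1", "2.2", "2.3",
                      "3.1", "3.2", "3.3", "3.4", "4.1", "4.2"]

LABEL_TO_CODE = {0: "non-sexist"}
LABEL_TO_CODE.update(enumerate(FINE_GRAINED_CODES, start=1))  # 1 -> "1.1", ..., 11 -> "4.2"
CODE_TO_LABEL = {code: label for label, code in LABEL_TO_CODE.items()}

print(LABEL_TO_CODE[3])      # 2.1
print(LABEL_TO_CODE[4])      # 2.2
print(CODE_TO_LABEL["2.1"])  # 3
```

With this mapping, a mask such as `hier_predictions == CODE_TO_LABEL["2.1"]` spells out which category is meant.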

In [107]:
# Combine masks with & (text, labels and predictions share the test-set index)
pred_3 = hier_predictions == 3
label_3_correct = text[pred_3 & (labels == 3)]
label_3_wrong = text[pred_3 & (labels != 3)]
# Predicted 2.1, but the gold label is a different sexist category
wrong_but_sexist_3 = pred_3 & (labels != 3) & (labels != 0)
label_3_wrong_but_sexist = text[wrong_but_sexist_3]
print(f'Label 3 correctly predicted: {len(label_3_correct)}')
print(f'Label 3 wrongly predicted: {len(label_3_wrong)}')
print(f'Label 3 detected sexism but mislabelled: {len(label_3_wrong_but_sexist)}')
print(f'Percentage of detected sexism but mislabelled: {len(label_3_wrong_but_sexist)/len(label_3_wrong)}')
# The remaining wrong predictions have gold label 0 (non-sexist)
print(f'Wrongly detected as sexist: {len(label_3_wrong)-len(label_3_wrong_but_sexist)}')
# Gold-label counts (indices 0-11) of the texts mislabelled as 2.1
label_3_wrong_but_sexist_golds = labels[wrong_but_sexist_3]
print(f'How many were mislabelled of each category: {[len(label_3_wrong_but_sexist_golds[label_3_wrong_but_sexist_golds == i]) for i in range(12)]}')
label_3_wrong

Label 3 correctly predicted: 81
Label 3 wrongly predicted: 121
Label 3 detected sexism but mislabelled: 69
Percentage of detected sexism but mislabelled: 0.5702479338842975
Wrongly detected as sexist: 52
How many were mislabelled of each category: [0, 0, 0, 0, 14, 9, 2, 26, 2, 2, 2, 12]


44      Yes, it is new. What remains constant is that ...
319     Jesus if looks could kill: she could kill my c...
386                Don't pay attention to what women say.
1114    TFM? Anti male agenda is very much at large. M...
1377    Wives and girlfriends cheating is violence aga...
1522    It's TRUE. Never let a woman rule over you. I ...
1724    Just like they did to Stormy. People now see t...
1762    Average man works 8 hours more thats a full da...
1770    wymyn shouldn't be allowed to vote, let alone ...
2717    It's sad. A few years ago, it was all positive...
3014    Most women are Leftists. and all left paradigm...
3300    She got trolled so hard, women shuoldn't be al...
Name: text, dtype: object
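The cell above, and the analogous one for 2.2, splits all predictions of one label into three groups: correct predictions, wrong predictions whose gold label is another sexist category, and false positives on non-sexist texts. Below is a minimal sketch of that breakdown as one reusable function that computes the same quantities. It assumes gold labels and predictions are integer pandas Series sharing one index. `prediction_breakdown` is a hypothetical helper, not part of `myutils`, and the data is a toy example:

```python
import pandas as pd


def prediction_breakdown(gold: pd.Series, pred: pd.Series, label: int) -> dict:
    """Split all predictions of `label` into correct, wrong-but-sexist and false positives."""
    predicted = pred == label
    correct = int((predicted & (gold == label)).sum())
    wrong = int((predicted & (gold != label)).sum())
    # Wrong, but the gold label is another sexist category (label 0 = non-sexist)
    other_sexist = predicted & (gold != label) & (gold != 0)
    wrong_but_sexist = int(other_sexist.sum())
    return {
        "correct": correct,
        "wrong": wrong,
        "wrong_but_sexist": wrong_but_sexist,
        "false_positive": wrong - wrong_but_sexist,  # gold label is non-sexist
        "share_wrong_but_sexist": wrong_but_sexist / wrong if wrong else float("nan"),
        # Gold labels of the wrong-but-sexist texts, counted per label index 0-11
        "gold_of_wrong_but_sexist": gold[other_sexist]
        .value_counts()
        .reindex(range(12), fill_value=0)
        .tolist(),
    }


# Toy example (not the test set)
gold = pd.Series([3, 3, 0, 7, 7, 4])
pred = pd.Series([3, 3, 3, 3, 3, 0])
result = prediction_breakdown(gold, pred, 3)
print(result)
```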

### Aggressive and emotive attacks (2.2)
We take a closer look at the second most frequent sexist label: the fourth fine-grained category (label 4, not counting label 0, non-sexist),
"aggressive and emotive attacks" (2.2).

In [82]:
# Combine masks with & (text, labels and predictions share the test-set index)
pred_4 = hier_predictions == 4
label_4_correct = text[pred_4 & (labels == 4)]
label_4_wrong = text[pred_4 & (labels != 4)]
# Predicted 2.2, but the gold label is a different sexist category
wrong_but_sexist_4 = pred_4 & (labels != 4) & (labels != 0)
label_4_wrong_but_sexist = text[wrong_but_sexist_4]
# False positives: gold non-sexist texts predicted as 2.2
label_4_wrong_and_not_sexist = text[pred_4 & (labels == 0)]
print(f'Label 4 correctly predicted: {len(label_4_correct)}')
print(f'Label 4 wrongly predicted: {len(label_4_wrong)}')
print(f'Label 4 detected sexism but mislabelled: {len(label_4_wrong_but_sexist)}')
print(f'Percentage of detected sexism but mislabelled: {len(label_4_wrong_but_sexist)/len(label_4_wrong)}')
print(f'Wrongly detected as sexist: {len(label_4_wrong)-len(label_4_wrong_but_sexist)}')
# Gold-label counts (indices 0-11) of the texts mislabelled as 2.2
label_4_wrong_but_sexist_golds = labels[wrong_but_sexist_4]
print(f'How many were mislabelled of each category: {[len(label_4_wrong_but_sexist_golds[label_4_wrong_but_sexist_golds == i]) for i in range(12)]}')

label_4_wrong_and_not_sexist

Label 4 correctly predicted: 94
Label 4 wrongly predicted: 102
Label 4 detected sexism but mislabelled: 76
Percentage of detected sexism but mislabelled: 0.7450980392156863
Wrongly detected as sexist: 26
How many were mislabelled of each category: [0, 2, 12, 12, 0, 2, 45, 0, 0, 0, 1, 2]


131     A false dichotomy. Apparently we believe that ...
519     Never LTR club girls. Fucking stupid. I dont e...
525     See how her first reaction is to try to hurt h...
582     HITLERY is the evilest woman on the planet. pe...
632     This sounds like it was written by a feminist ...
970     One, she is 15, uglier than me, femlet, fat, a...
1152    I really hate when females want to turn into m...
1187    Are they trying to poop on the floor? Nasty as...
1350      We don't hate women... we hate *you* fuckstick.
1881    To minimize the percentage of deleterious muta...
2290    Do we really care what this witch thinks? WHY ...
2494    I don't care if women wear makeup. It's worthl...
2751    I don't care if she's Trump's daughter.., she'...
2763    I am not an expert on trannies, but they are l...
2800    [USER] also, I have never and would never call...
2897    Like the "stunning and brave" woman that """st...
2950                                   Damn ugly she-ape!
3044          
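Two forms of row selection give the same result:

- chaining boolean indexers, e.g. `text[labels == 0][hier_predictions == 4]`;
- one combined mask, `text[(labels == 0) & (hier_predictions == 4)]`.

They agree because pandas aligns a boolean Series key on the index of the Series being indexed. Here is a toy check of that equivalence (made-up data):

```python
import pandas as pd

text = pd.Series(["a", "b", "c", "d", "e"])
labels = pd.Series([0, 0, 4, 0, 2])
hier_predictions = pd.Series([4, 0, 4, 4, 4])

# Chained: the full-length second mask is aligned to the index of the filtered Series
chained = text[labels == 0][hier_predictions == 4]
# Combined: one element-wise AND over the whole index
combined = text[(labels == 0) & (hier_predictions == 4)]

print(chained.tolist())          # ['a', 'd']
print(chained.equals(combined))  # True
```

The combined form states the whole condition in one place and evaluates each mask over the full index only once.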

## Hierarchical vs baseline
We compare the texts that the hierarchical model labelled correctly but the baseline did not, and vice versa.

We start with the labels for which the baseline was most often right while the hierarchical model was wrong.
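For each gold label i, the comparison in the following cell counts texts with gold label i that one model labels i and the other does not. Below is a minimal sketch of that count as a function. It assumes integer label Series with a shared index. `one_model_right` is a hypothetical helper, and the data is a toy example:

```python
import pandas as pd


def one_model_right(gold: pd.Series, pred_a: pd.Series, pred_b: pd.Series, labels=range(1, 12)) -> list:
    """Per gold label: number of texts that model A labels correctly while model B does not."""
    a_right_b_wrong = (pred_a == gold) & (pred_b != gold)
    return [int((a_right_b_wrong & (gold == i)).sum()) for i in labels]


# Toy data (not the test set)
gold = pd.Series([1, 2, 2, 3, 0])
hier = pd.Series([1, 2, 0, 3, 0])
base = pd.Series([0, 2, 2, 0, 0])
print(one_model_right(gold, hier, base, labels=range(1, 4)))  # [1, 0, 1]
print(one_model_right(gold, base, hier, labels=range(1, 4)))  # [0, 1, 0]
```

With the default `labels=range(1, 12)`, the first entry is label 1.1 and the last is 4.2, matching the printed lists below.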

In [230]:
# For each sexist label i (1-11): texts with gold label i that one model labels correctly and the other does not
hierarch_hit_baseline_miss = [text[(hier_predictions == i) & (baseline_predictions != i) & (labels == i)] for i in range(1, 12)]
hierarch_miss_baseline_hit = [text[(baseline_predictions == i) & (hier_predictions != i) & (labels == i)] for i in range(1, 12)]
print('Number of times hierarchical labeled a text correctly and baseline labeled it wrong:')
print('(first index represents label 1.1 and last represents 4.2)')
print([len(t) for t in hierarch_hit_baseline_miss])
print(f'Sum: {sum(len(t) for t in hierarch_hit_baseline_miss)}')
print('Number of times baseline labeled a text correctly and hierarchical labeled it wrong:')
print('(first index represents label 1.1 and last represents 4.2)')
print([len(t) for t in hierarch_miss_baseline_hit])
print(f'Sum: {sum(len(t) for t in hierarch_miss_baseline_hit)}')

Number of times hierarchical labeled a text correctly and baseline labeled it wrong:
(first index represents label 1.1 and last represents 4.2)
[1, 2, 17, 28, 4, 6, 8, 1, 0, 1, 6]
Sum: 74
Number of times baseline labeled a text correctly and hierarchical labeled it wrong:
(first index represents label 1.1 and last represents 4.2)
[1, 10, 41, 21, 15, 27, 22, 1, 0, 3, 3]
Sum: 144


### Confusions of the hierarchical model - 2.1
The comparison above shows that label 2.1 is the label the hierarchical model most often gets wrong where the baseline gets it right (41 texts).
We take a closer look at those texts:

In [246]:
# Gold 2.1 texts that the baseline labels correctly but the hierarchical model does not
missed_2_1 = (labels == 3) & (baseline_predictions == 3) & (hier_predictions != 3)
print('Label and number of times 2.1 was confused as the label:')
print(hier_predictions[missed_2_1].value_counts())
pd.concat([text[missed_2_1], hier_predictions[missed_2_1]], axis=1)

Label and number of times 2.1 was confused as the label:
7     16
0     15
4      3
5      2
8      2
10     2
11     1
Name: prediction, dtype: int64


      text                                                 prediction
110   Fuck sorry to hear that bro, don't think about...    7
170   I hope there will be a time women take cobras'...    0
233   My girlfriend is mentally retarded. Or if you ...    4
289   And the vast majority of those disdainful indi...    0
470   So you run back to daddy whenever you need hel...    7
478   The West treats women like small children, i h...     0
484   It's the advanced version of the "let's take a...    7
545   No way she's a feminist. She's repairing a veh...    0
605   Another explanation is these women are herding...    7
724   Well said. Single mothers would not get this p...    7
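The value counts above use integer label indices. The sketch below relabels them with category codes. It assumes the index-to-code mapping 0 = non-sexist and 1-11 = 1.1, 1.2, 2.1, 2.2, 2.3, 3.1, 3.2, 3.3, 3.4, 4.1, 4.2. Only the endpoints and labels 2-4 are stated in this notebook; the rest is assumed to follow the EDOS taxonomy. The counts are copied from the output above:

```python
import pandas as pd

# Assumed index-to-code mapping (EDOS-style grouping of the fine-grained labels)
CODES = ["non-sexist", "1.1", "1.2", "2.1", "2.2", "2.3",
         "3.1", "3.2", "3.3", "3.4", "4.1", "4.2"]

# Counts copied from the value_counts output above (gold 2.1, baseline right, hierarchical wrong)
confused_as = pd.Series({7: 16, 0: 15, 4: 3, 5: 2, 8: 2, 10: 2, 11: 1})

by_code = confused_as.rename(index=dict(enumerate(CODES)))
print(by_code)
print(by_code.sum())  # 41, the 2.1 entry of the comparison above
```

Under that mapping, most missed 2.1 texts end up as 3.2 or as non-sexist.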


### Confusions of the hierarchical model - 1.2
The comparison above also shows a large gap for label 1.2: the baseline labels 10 of these texts correctly where the hierarchical model does not, while the reverse happens only twice.
We take a closer look at those texts:

In [247]:
# Gold 1.2 texts that the baseline labels correctly but the hierarchical model does not
missed_1_2 = (labels == 2) & (baseline_predictions == 2) & (hier_predictions != 2)
print('Label, number of times 1.2 was confused as that label:')
print(hier_predictions[missed_1_2].value_counts())
pd.concat([text[missed_1_2], hier_predictions[missed_1_2]], axis=1)

Label, number of times 1.2 was confused as that label:
1    4
0    4
4    2
Name: prediction, dtype: int64


      text                                                 prediction
115   You should've hit her.                               1
610   LOL more like the bitch doesn't want to be on ...   4
914   Can't hit girls though......but.... a c@#t pun...    1
1227  Yeah the only way a man can satisfy those wome...    1
1645  I d love to read tomorrow that Susan Collins g...     0
2026  They need to throw her under the prison, disgu...    0
2037  What is there to think about? Kill her, kill t...    4
2794  Dress up as Santa Claus this #Halloween .... I...     0
2889  What a bloody simp he must had his penis cut o...     1
3016  GERMANY: Teenage daughter of high-ranking EU o...     0
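The two confusion cells above inspect one gold label at a time. The sketch below makes the same selection for all labels at once, as a gold-by-prediction table. Each row corresponds to one of the value_counts outputs; for example, row 3 holds the 2.1 confusions and row 2 the 1.2 confusions. Row 0, if present, holds non-sexist texts that the hierarchical model flagged. `baseline_only_confusions` is a hypothetical helper, and the data is a toy example:

```python
import pandas as pd


def baseline_only_confusions(gold: pd.Series, hier: pd.Series, base: pd.Series) -> pd.DataFrame:
    """Gold label vs. hierarchical prediction, restricted to texts the baseline
    labels correctly and the hierarchical model does not."""
    mask = (base == gold) & (hier != gold)
    return pd.crosstab(gold[mask], hier[mask], rownames=["gold"], colnames=["hier"])


# Toy data (not the test set)
gold = pd.Series([3, 3, 2, 2, 2, 0])
hier = pd.Series([7, 0, 1, 1, 4, 0])
base = pd.Series([3, 3, 2, 2, 2, 0])
table = baseline_only_confusions(gold, hier, base)
print(table)
```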
