# Test biaslyze with the toxic comments dataset

Data source: https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import sys
sys.path.append('/home/tobias/Repositories/biaslyze/')

In [3]:
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

## Load and prepare data

In [4]:
df = pd.read_csv("../data/jigsaw-toxic-comment-classification/train.csv")
df.head()

|   | id | comment_text | toxic | severe_toxic | obscene | threat | insult | identity_hate |
|---|----|--------------|-------|--------------|---------|--------|--------|---------------|
| 0 | 0000997932d777bf | Explanation\nWhy the edits made under my usern... | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 000103f0d9cfb60f | D'aww! He matches this background colour I'm s... | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 000113f07ec002fd | Hey man, I'm really not trying to edit war. It... | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0001b41b1c6bb37e | "\nMore\nI can't make any real suggestions on ... | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0001d958c54c6e35 | You, sir, are my hero. Any chance you remember... | 0 | 0 | 0 | 0 | 0 | 0 |


In [5]:
# make the classification problem binary
df["target"] = df[["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]].sum(axis=1) > 0
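
To make the binarization above concrete, here is a minimal sketch on a hypothetical three-row frame (the rows are made up for illustration and are not taken from the dataset). It also shows that `.any(axis=1)` should give the same result as summing and comparing with zero, assuming the label columns only hold 0 or 1.

```python
import pandas as pd

LABEL_COLUMNS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Hypothetical toy rows, not taken from the Jigsaw data.
toy = pd.DataFrame(
    [
        [0, 0, 0, 0, 0, 0],  # no label set   -> not toxic
        [1, 0, 1, 0, 0, 0],  # two labels set -> toxic
        [0, 0, 0, 0, 0, 1],  # one label set  -> toxic
    ],
    columns=LABEL_COLUMNS,
)

# Same expression as in the notebook cell: at least one label is set.
target_sum = toy[LABEL_COLUMNS].sum(axis=1) > 0

# Equivalent formulation for 0/1 labels.
target_any = toy[LABEL_COLUMNS].any(axis=1)

print(target_sum.tolist())
print(target_any.tolist())
```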

## Train a BoW-model

In [6]:
clf = make_pipeline(TfidfVectorizer(min_df=10, max_features=10000, stop_words="english"), LogisticRegression())

In [7]:
clf.fit(df.comment_text, df.target)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
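
The message above suggests raising `max_iter`. A minimal sketch of that option follows, on a hypothetical toy corpus rather than the Jigsaw data; the value `1000` is an arbitrary assumption, and whether it is enough for the full dataset has not been checked here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus standing in for df.comment_text / df.target.
texts = ["you are great", "thanks for the help", "you are an idiot", "shut up idiot"] * 5
labels = [False, False, True, True] * 5

clf_more_iter = make_pipeline(
    TfidfVectorizer(),
    LogisticRegression(max_iter=1000),  # scikit-learn's default is max_iter=100
)
clf_more_iter.fit(texts, labels)

# n_iter_ reports how many iterations the solver actually used.
n_iter = clf_more_iter[-1].n_iter_
print(n_iter)
```

If `n_iter_` stays below `max_iter`, the solver stopped because it converged rather than because it hit the limit.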


In [8]:
train_pred = clf.predict(df.comment_text)
print(accuracy_score(df.target, train_pred))

0.9605755431751384
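
Note that this accuracy is measured on the same comments the model was trained on. A hedged sketch of a held-out evaluation is shown below, again on a hypothetical toy corpus; the split size, `random_state` and variable names are assumptions made for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus standing in for df.comment_text / df.target.
texts = ["you are great", "thanks for the help", "you are an idiot", "shut up idiot"] * 10
labels = [False, False, True, True] * 10

# Keep 25% of the texts aside and preserve the class ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

holdout_clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
holdout_clf.fit(X_train, y_train)

# Accuracy on texts the model has not seen during fitting.
holdout_acc = accuracy_score(y_test, holdout_clf.predict(X_test))
print(holdout_acc)
```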


## Test LIME-based bias detection with keywords

In [9]:
from biaslyze.evaluators import LimeBiasEvaluator
from biaslyze.bias_detectors import LimeKeywordBiasDetector



In [10]:
bias_detector = LimeKeywordBiasDetector(
    bias_evaluator=LimeBiasEvaluator(n_lime_samples=2000),
    n_top_keywords=10,
    use_tokenizer=True
)

In [11]:
test_texts = df.comment_text.sample(200)
detection_res = bias_detector.detect(texts=test_texts, predict_func=clf.predict_proba)

2023-03-07 18:20:56.374 | INFO     | biaslyze.concept_detectors:detect:33 - Started keyword-based concept detection on 200 texts...
100%|██████████| 200/200 [00:05<00:00, 37.22it/s]
2023-03-07 18:21:01.753 | INFO     | biaslyze.concept_detectors:detect:49 - Done. Found 48 texts with protected concepts.
2023-03-07 18:21:01.754 | INFO     | biaslyze.evaluators:evaluate:42 - Started bias detection on 48 samples...
100%|██████████| 48/48 [01:39<00:00,  2.08s/it]


In [12]:
detection_res.summary()

Detected 4 samples with potential issues.
    Potentially problematic concepts detected: [('gender', 3), ('nationality', 1)]
    Based on keywords: [('mother', 2), ('his', 1), ('english', 1)].


In [13]:
detection_res.details(group_by_concept=True)

Concept: gender
[{'reason': ['his'],
  'text': 'has determined for that individual this particular trial for his '
          'betterment in the end, whether in this life or in the life to come. '
          'That is also why we say that G-d'},
 {'reason': ['mother'],
  'text': "I'm back mother trucker, did you miss me? 94.192.243.100"},
 {'reason': ['mother'],
  'text': 'I Should be Surprised\n'
          '\n'
          "But I'm not. Its exactly the kind of spineless response I "
          "anticipated from you. Look Schumin, I don't care about you "
          "remotely. I'm not here to bring you some sort of pain, I don't care "
          'anything, at all about you. What I do care about, and care about '
          'deeply is that the number one-stop shop for knowledge on the '
          'internet is even fractionally run by someone like you. This is a '
          "serious project, and attitudes like yours aren't needed. You are "
          "one of these people who wouldn't last a day 

In [14]:
# test_texts.index holds index labels, so select by label rather than by position
test_df = df.loc[test_texts.index]
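
`sample` returns rows together with their index labels, so mapping the sampled texts back to the frame is a label lookup. Label-based and position-based selection only coincide while the frame has its default 0..n-1 index. A small sketch with a hypothetical frame whose index does not start at zero:

```python
import pandas as pd

# Hypothetical frame with index labels 10..13 instead of 0..3.
toy = pd.DataFrame({"comment_text": ["a", "b", "c", "d"]}, index=[10, 11, 12, 13])
sampled = toy.comment_text.sample(2, random_state=0)

# Label-based lookup returns exactly the sampled rows.
by_label = toy.loc[sampled.index]
print(by_label.comment_text.tolist() == sampled.tolist())

# Position-based lookup treats the labels as positions and fails here,
# because positions 10..13 do not exist in a four-row frame.
try:
    toy.iloc[sampled.index]
    iloc_failed = False
except IndexError:
    iloc_failed = True
print(iloc_failed)
```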

In [15]:
test_df[test_df.target].reset_index().comment_text[4]

"Stupid people and words don't mix. Most people never read a book once they quit going to school and enter the adult world. This article proves it wholeheartedly."

## Testing a sentiment analysis model from Hugging Face

In [16]:
from transformers import pipeline
from torch.utils.data import Dataset


classifier = pipeline(
    model="distilbert-base-uncased-finetuned-sst-2-english",
    top_k=None,
    padding=True,
    truncation=True
)

In [17]:
class MyDataset(Dataset):
    def __init__(self, data):
        super().__init__()
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        return self.data[i]


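def predict_sentiment(texts):
    """Return an array of shape (n_texts, n_labels) with class probabilities."""
    data = MyDataset(texts)
    proba = []
    for res in classifier(data):
        # Sort by label (reverse alphabetical) so the columns always come in the same order.
        ordered = sorted(res, key=lambda d: d["label"], reverse=True)
        proba.append(np.array([p.get("score") for p in ordered]))
    proba = np.array(proba)
    # Renormalize so that every row sums to 1.
    return proba / proba.sum(axis=1, keepdims=True)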
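
The ordering and normalization steps in `predict_sentiment` can be checked without loading the model. The sketch below uses made-up scores and assumes the pipeline yields one list of label/score dicts per text, which is what the loop above relies on; `fake_results` and `to_proba_matrix` are hypothetical names introduced only for this illustration.

```python
import numpy as np

# Hypothetical pipeline output for two texts (scores are made up).
# The label order differs between the two entries on purpose.
fake_results = [
    [{"label": "NEGATIVE", "score": 0.9}, {"label": "POSITIVE", "score": 0.1}],
    [{"label": "POSITIVE", "score": 0.7}, {"label": "NEGATIVE", "score": 0.3}],
]


def to_proba_matrix(results):
    """Turn per-text label/score dicts into a row-normalized probability matrix."""
    rows = []
    for res in results:
        # Reverse alphabetical order: POSITIVE first, NEGATIVE second.
        ordered = sorted(res, key=lambda d: d["label"], reverse=True)
        rows.append([d["score"] for d in ordered])
    proba = np.array(rows)
    # Divide each row by its sum so the rows add up to 1.
    return proba / proba.sum(axis=1, keepdims=True)


proba = to_proba_matrix(fake_results)
print(proba)
```

With this ordering, column 0 holds the `POSITIVE` score and column 1 the `NEGATIVE` score, regardless of the order in which the pipeline returns the labels.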

In [60]:
bias_detector = LimeKeywordBiasDetector(
    bias_evaluator=LimeBiasEvaluator(n_lime_samples=500),
    n_top_keywords=10,
    use_tokenizer=True
)

In [61]:
test_texts = df.comment_text.sample(50)
detection_res = bias_detector.detect(texts=test_texts, predict_func=predict_sentiment)

2023-03-07 18:51:17.965 | INFO     | biaslyze.concept_detectors:detect:33 - Started keyword-based concept detection on 50 texts...
100%|██████████| 50/50 [00:00<00:00, 56.40it/s]
2023-03-07 18:51:18.858 | INFO     | biaslyze.concept_detectors:detect:49 - Done. Found 14 texts with protected concepts.
2023-03-07 18:51:18.859 | INFO     | biaslyze.evaluators:evaluate:42 - Started bias detection on 14 samples...
100%|██████████| 14/14 [10:18<00:00, 44.15s/it]


In [62]:
detection_res.summary()

Detected 2 samples with potential issues.
    Potentially problematic concepts detected: [('religion', 1), ('gender', 1)]
    Based on keywords: [('muslim', 1), ('he', 1)].


In [63]:
detection_res.details(group_by_concept=True)

Concept: religion
[{'reason': ['muslim'], 'text': 'Fuck my stupid muslim ass!!!!!!!!!'}]
Concept: gender
[{'reason': ['he'], 'text': '"But he does say ""NOICE!""...\n\n"'}]


## Very experimental: test masked-language-model-based bias detection with keywords

In [55]:
from biaslyze.bias_detectors import MaskedKeywordBiasDetector
from biaslyze.concept_detectors import KeywordConceptDetector

In [56]:
bias_detector = MaskedKeywordBiasDetector(n_resample_keywords=15, use_tokenizer=True)

In [57]:
detection_res = bias_detector.detect(
    texts=df.comment_text.sample(1000),
    predict_func=predict_sentiment
)

2023-03-07 18:37:55.292 | INFO     | biaslyze.concept_detectors:detect:33 - Started keyword-based concept detection on 1000 texts...
100%|██████████| 1000/1000 [00:23<00:00, 41.72it/s]
2023-03-07 18:38:19.267 | INFO     | biaslyze.concept_detectors:detect:49 - Done. Found 238 texts with protected concepts.
2023-03-07 18:38:19.268 | INFO     | biaslyze.evaluators:evaluate:98 - Started bias detection on 238 samples...
100%|██████████| 238/238 [11:05<00:00,  2.79s/it]


In [58]:
detection_res.summary()

Detected 23 samples with potential issues.
    Potentially problematic concepts detected: [('gender', 21), ('religion', 1), ('nationality', 1)]
    Based on keywords: [('he', 7), ('his', 7), ('her', 4), ('man', 4), ('guys', 3), ('boy', 2), ('guy', 2), ('she', 2), ('christian', 2), ('husband', 1), ('son', 1), ('woman', 1), ('women', 1), ('men', 1), ('english', 1)].


In [59]:
detection_res.details()

''whatever you say gay boy. atleast i have a life and im not fucking unchanging shit on wikiepedia'' might contain bias ['gender']; reasons: ['boy', 'boy']
''He was a great player, he needs a much better page, many average players have several sections.'' might contain bias ['gender']; reasons: ['he']
''Yeah, darn right I am hard to block.  This is to fun.  You guys should have left well enough alone.  You helped bring this on Johntx.  Now the pages that you love will suffer because of it.  Tough beans.'' might contain bias ['gender']; reasons: ['guys']
''"
Just like I've been saying, you guys disagree.  — (Talk) "'' might contain bias ['gender']; reasons: ['guys']
''THE GUYS WHO INTERESTED IN THE ARTICLE TO BE IMPROVED ARE 
MUHAMMAD YUSUF ATTARI,SHAHIBA,MADINA MADINA AND ETC'' might contain bias ['gender']; reasons: ['guys']
''Ok, ok, I admit it. I'm that other guy you blocked and I created a new account. I'm sorry, please don't block me...'' might contain bias ['gender']; reasons: ['