# Toxic comment classification, BERT solution

## Description

There is a dataset of comments labelled in the column `toxic` (1 = toxic, 0 = non-toxic). The task is to build a classification model with an F1 score above 0.75.
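For reference, F1 is the harmonic mean of precision and recall on the positive (toxic) class. The minimal pure-Python sketch below should agree with `sklearn.metrics.f1_score` for binary 0/1 labels. The function name `f1_binary` and the labels are hypothetical and used only for illustration.

```python
def f1_binary(y_true, y_pred, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is 0 (or undefined), so F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels: 3 toxic comments, the model finds 2 of them and raises 1 false alarm
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
score = f1_binary(y_true, y_pred)  # precision = recall = 2/3, so F1 = 2/3
print(round(score, 3))
```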

## Data loading

In [None]:
import pandas as pd
import numpy as np
from sklearn.metrics import f1_score
from sklearn.linear_model import LogisticRegression
from tqdm.notebook import tqdm
tqdm.pandas()
!pip install transformers -q

In [None]:
df = pd.read_csv('/content/drive/MyDrive/colab_notebooks/13-toxic-comments/Data/toxic_comments.csv')

In [None]:
df.sample(3)

| | text | toxic |
|---|---|---|
| 30223 | Good job. It's now much better referenced. I... | 0 |
| 59775 | Circumcision in the 1880s (back in the days wh... | 0 |
| 87624 | Congrats\nSorry for the lateness of this, but ... | 0 |


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 159571 entries, 0 to 159570
Data columns (total 2 columns):
 #   Column  Non-Null Count   Dtype 
---  ------  --------------   ----- 
 0   text    159571 non-null  object
 1   toxic   159571 non-null  int64 
dtypes: int64(1), object(1)
memory usage: 2.4+ MB


In [None]:
print('Share of toxic comments in all data: {:.1%}'.format(df['toxic'].mean()))

Share of toxic comments in all data: 10.2%
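With only about 10% toxic comments, plain accuracy would be misleading, which is why the task is scored with F1. The tiny illustration below uses made-up labels with a similar imbalance: a trivial model that never flags anything reaches 90% accuracy, yet finds no toxic comment, so its recall and F1 are both 0.

```python
# Hypothetical labels with roughly the same imbalance as the data: about 10% toxic
y_true = [1] * 10 + [0] * 90
y_pred = [0] * 100  # a trivial "model" that never flags a comment as toxic

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
print(accuracy, true_positives)  # 0.9 0
```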


In [None]:
RANDOM = 22
df_sample = df.sample(10000, random_state=RANDOM)
print('Share of toxic comments in the data sample: {:.1%}'.format(df_sample['toxic'].mean()))

Share of toxic comments in the data sample: 10.7%


### Conclusion

To evaluate the models, a random sample `df_sample` of 10,000 comments was drawn. Its share of toxic comments (10.7%) is close to that of the full dataset (10.2%).
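A simple random sample only approximately preserves the class share (10.7% vs 10.2% here). A stratified sample keeps it as close as rounding allows; with scikit-learn this is what `train_test_split(..., stratify=labels)` does. The stdlib sketch below shows the idea. The helper `stratified_sample_indices` is hypothetical and was not used in this notebook.

```python
import random
from collections import defaultdict

def stratified_sample_indices(labels, n, seed=22):
    """Pick about n indices so that each class keeps its share of `labels`.

    The total can differ from n by a few items because per-class quotas are rounded.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    chosen = []
    for y, idx in sorted(by_class.items()):
        k = round(n * len(idx) / len(labels))  # quota proportional to the class size
        chosen.extend(rng.sample(idx, k))
    return chosen

labels = [1] * 102 + [0] * 898  # 10.2% positives, as in the full data
sample = stratified_sample_indices(labels, 100)
print(len(sample), sum(labels[i] for i in sample))  # 100 10
```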

## BERT models

In [None]:
import torch

def is_toxic(text, tokenizer, model):
    """Return the predicted class index (argmax over the logits) for a single text."""
    inputs = tokenizer(text,
                       return_tensors='pt',         # return PyTorch tensors instead of Python lists
                       padding=True,                # pad to the longest sequence in the batch (no-op for one text)
                       truncation=True,             # cut sequences longer than max_length tokens
                       max_length=512,              # maximum input length of BERT-style models
                       add_special_tokens=True,     # add model-specific markers such as [CLS]/[SEP] or <s>/</s>
                       return_attention_mask=True,  # mask marking real tokens (1) vs padding (0)
                       )
    with torch.no_grad():  # inference only: skip gradient tracking to save memory and time
        logits = model(**inputs).logits
    return logits.argmax().item()

The function `is_toxic(text, tokenizer, model)` tokenizes a text, runs the model and returns the index of the class with the highest logit (for the toxicity models below, index 1 means toxic).
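For a two-class head, taking the argmax of the logits is equivalent to thresholding the softmax probability of the "toxic" class at 0.5. The sketch below uses made-up logits and assumes the order [non-toxic, toxic]; for a real checkpoint the order should be checked in `model.config.id2label`.

```python
import math

def softmax(logits):
    """Turn raw logits into probabilities (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [-1.3, 2.1]  # hypothetical logits for [non-toxic, toxic]
probs = softmax(logits)
label = max(range(len(logits)), key=lambda i: logits[i])  # same as logits.argmax()
print(label, round(probs[1], 3))  # 1 0.968
```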

### autonlp-toxic-new-30516963

https://huggingface.co/abhishek/autonlp-toxic-new-30516963

In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("abhishek/autonlp-toxic-new-30516963")
model = AutoModelForSequenceClassification.from_pretrained("abhishek/autonlp-toxic-new-30516963")

Quick check on two comments, one toxic and one non-toxic:

In [None]:
text_index_20097 = df.loc[20097]['text']
print(text_index_20097)
text_index_147654 = df.loc[147654]['text']
print(text_index_147654)

been hrrassing people. FUCK YOU FAGGOT
Lebanese general election, 2009 

Good job on the edit to the Lebanese general elections, 2009. Way to WP:Be Bold!!!!!!! If you need any help doing stuff on Wikipedia feel free to contact me on my Talk page.


In [None]:
encoded_input = tokenizer(text_index_20097, return_tensors='pt')
output = model(**encoded_input)
print('text_index_20097 is toxic:', output.logits.argmax().item())

encoded_input = tokenizer(text_index_147654, return_tensors='pt')
output = model(**encoded_input)
print('text_index_147654 is toxic:', output.logits.argmax().item())

text_index_20097 is toxic: 1
text_index_147654 is toxic: 0


Prediction

In [None]:
df_sample['prediction'] = df_sample['text'].progress_apply(lambda x: is_toxic(x, tokenizer, model))

In [None]:
f1_score(df_sample['toxic'], df_sample['prediction'])

0.8693009118541033

### roberta_toxicity_classifier

https://huggingface.co/SkolkovoInstitute/roberta_toxicity_classifier

In [None]:
from transformers import RobertaTokenizer, RobertaForSequenceClassification

tokenizer = RobertaTokenizer.from_pretrained('SkolkovoInstitute/roberta_toxicity_classifier')
model = RobertaForSequenceClassification.from_pretrained('SkolkovoInstitute/roberta_toxicity_classifier')

Some weights of the model checkpoint at SkolkovoInstitute/roberta_toxicity_classifier were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Quick check on two comments, one toxic and one non-toxic:

In [None]:
text_index_20097 = df.loc[20097]['text']
print(text_index_20097)
text_index_147654 = df.loc[147654]['text']
print(text_index_147654)

been hrrassing people. FUCK YOU FAGGOT
Lebanese general election, 2009 

Good job on the edit to the Lebanese general elections, 2009. Way to WP:Be Bold!!!!!!! If you need any help doing stuff on Wikipedia feel free to contact me on my Talk page.


In [None]:
encoded_input = tokenizer(text_index_20097, return_tensors='pt')
output = model(**encoded_input)
print('text_index_20097 is toxic:', output.logits.argmax().item())

encoded_input = tokenizer(text_index_147654, return_tensors='pt')
output = model(**encoded_input)
print('text_index_147654 is toxic:', output.logits.argmax().item())

text_index_20097 is toxic: 1
text_index_147654 is toxic: 0


Prediction

In [53]:
df_sample['prediction'] = df_sample['text'].progress_apply(lambda x: is_toxic(x, tokenizer, model))

In [54]:
f1_score(df_sample['toxic'], df_sample['prediction'])

0.8715244487056567

### distilbert-base-uncased-finetuned-sst-2-english

https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english

In [55]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

Quick check on two comments, one toxic and one non-toxic:

In [None]:
text_index_20097 = df.loc[20097]['text']
print(text_index_20097)
text_index_147654 = df.loc[147654]['text']
print(text_index_147654)

In [None]:
encoded_input = tokenizer(text_index_20097, return_tensors='pt')
output = model(**encoded_input)
print('text_index_20097 is toxic:', output.logits.argmax().item())

encoded_input = tokenizer(text_index_147654, return_tensors='pt')
output = model(**encoded_input)
print('text_index_147654 is toxic:', output.logits.argmax().item())

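This model is a sentiment classifier fine-tuned on SST-2 (label 0 = NEGATIVE, 1 = POSITIVE), not a toxicity classifier, so its label order is effectively reversed relative to `toxic`. As a rough proxy, negative sentiment is treated as toxic, i.e. the prediction becomes `1 - argmax()` (for two classes this is the same as `argmin()`). The model is evaluated on a smaller sample of 1,000 comments.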
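The mapping from sentiment labels to the `toxic` column can be checked on made-up logits. The helper `sentiment_to_toxic` is hypothetical, and treating negative sentiment as toxicity is only a heuristic, not something the SST-2 model was trained for.

```python
# Label order of the SST-2 sentiment head: 0 = NEGATIVE, 1 = POSITIVE
def sentiment_to_toxic(pred_label):
    """Proxy mapping: negative sentiment -> toxic (1), positive sentiment -> non-toxic (0)."""
    return 1 - pred_label

logits = [[2.0, -1.0], [-0.5, 1.5]]  # hypothetical [NEGATIVE, POSITIVE] logits
for row in logits:
    argmax = max(range(2), key=lambda i: row[i])
    argmin = min(range(2), key=lambda i: row[i])
    # With exactly two classes (and no ties), argmin and 1 - argmax pick the same index
    assert argmin == sentiment_to_toxic(argmax)
    print(row, '-> toxic' if sentiment_to_toxic(argmax) else '-> non-toxic')
```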

In [58]:
df_sample_small = df.sample(1000, random_state=RANDOM)
df_sample_small['toxic'].sum() / len(df_sample_small)

0.104

Prediction

In [59]:
df_sample_small['prediction'] = df_sample_small['text'].progress_apply(lambda x: 1 - is_toxic(x, tokenizer, model))

In [None]:
f1_score(df_sample_small['toxic'], df_sample_small['prediction'])

## Custom classification model based on BERT-style sentence embeddings

https://github.com/UKPLab/sentence-transformers

In [None]:
!pip install -U sentence-transformers -q

In [None]:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')

In [None]:
df_train = df.drop(df_sample.index)  # every comment that is not in df_sample

In [64]:
# Encode all texts in batches, which is much faster than calling encode() row by row
df_train['embedding'] = list(model.encode(df_train['text'].tolist(), show_progress_bar=True))

In [65]:
model_logistic = LogisticRegression(random_state=RANDOM, solver='liblinear')

In [69]:
model_logistic.fit(df_train['embedding'].tolist(), df_train['toxic'])

LogisticRegression(random_state=22, solver='liblinear')

In [70]:
df_sample['embedding'] = list(model.encode(df_sample['text'].tolist(), show_progress_bar=True))

In [71]:
f1_score(df_sample['toxic'], model_logistic.predict(df_sample['embedding'].tolist()))

0.7156348373557188
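This score is below the 0.75 target. With about 10% positives, the default 0.5 threshold of `LogisticRegression.predict` is not necessarily F1-optimal. One option, not tried in this notebook, is to take `model_logistic.predict_proba(X)[:, 1]` and choose the threshold that maximizes F1 on a held-out validation split carved from `df_train` (not on `df_sample`, which would inflate the score). A stdlib sketch with made-up probabilities; the helper names are hypothetical:

```python
def f1_at_threshold(y_true, probs, threshold):
    """F1 of the positive class when predicting 1 for every probability >= threshold."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, preds))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, preds))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, preds))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)  # same as 2PR / (P + R)

def best_threshold(y_true, probs, grid=None):
    """Return the grid threshold with the highest F1 on a validation set."""
    grid = grid or [i / 100 for i in range(5, 96)]
    return max(grid, key=lambda t: f1_at_threshold(y_true, probs, t))

# Hypothetical validation labels and predicted probabilities of the toxic class
y_val = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
p_val = [0.45, 0.40, 0.80, 0.30, 0.10, 0.20, 0.05, 0.15, 0.35, 0.25]
t = best_threshold(y_val, p_val)
print(t, f1_at_threshold(y_val, p_val, 0.5), f1_at_threshold(y_val, p_val, t))
```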

## Conclusion

To score the models, a random sample `df_sample` of 10,000 comments was drawn; its share of toxic comments is close to that of the full dataset. The helper `is_toxic(text, tokenizer, model)` tokenizes a text, runs the model and returns the predicted class.

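Three pretrained BERT-family models from Hugging Face were evaluated:
- https://huggingface.co/abhishek/autonlp-toxic-new-30516963 (F1 0.869)
- https://huggingface.co/SkolkovoInstitute/roberta_toxicity_classifier (F1 0.872)
- https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english (a sentiment model, used with inverted labels as a proxy for toxicity)

In addition, a logistic regression trained on `all-MiniLM-L6-v2` sentence embeddings was tested; it reached F1 0.716, below the 0.75 target.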

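Both toxicity classifiers clear the 0.75 target with F1 ≈ 0.87 on the sample data. "SkolkovoInstitute/roberta_toxicity_classifier" is marginally ahead (0.872 vs 0.869), while "abhishek/autonlp-toxic-new-30516963" scores practically the same and was the fastest.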