
## Project: Toxic Comment Filter

Build a model that filters user comments according to how malicious their language is:

- Preprocess the text by removing the tokens that contribute little at the semantic level
- Transform the text corpus into sequences
- Build a deep learning model with recurrent layers for a multilabel classification task
- At prediction time, the model must return a vector containing a 1 or a 0 for each label (toxic, severe_toxic, obscene, threat, insult, identity_hate). A non-toxic comment is therefore classified by a vector of only 0s, [0,0,0,0,0,0], while a toxic comment has at least one 1 among the 6 labels.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import pandas as pd
BASE_URL = "https://s3.eu-west-3.amazonaws.com/profession.ai/datasets/"
df = pd.read_csv(BASE_URL+"Filter_Toxic_Comments_dataset.csv")

In [None]:
df.head()

Unnamed: 0,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate,sum_injurious
0,Explanation\nWhy the edits made under my usern...,0,0,0,0,0,0,0
1,D'aww! He matches this background colour I'm s...,0,0,0,0,0,0,0
2,"Hey man, I'm really not trying to edit war. It...",0,0,0,0,0,0,0
3,"""\nMore\nI can't make any real suggestions on ...",0,0,0,0,0,0,0
4,"You, sir, are my hero. Any chance you remember...",0,0,0,0,0,0,0


In [None]:
df[df['sum_injurious']==2]

Unnamed: 0,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate,sum_injurious
51,GET FUCKED UP. GET FUCKEEED UP. GOT A DRINK T...,1,0,1,0,0,0,2
58,My Band Page's deletion. You thought I was gon...,1,0,1,0,0,0,2
79,Hi! I am back again!\nLast warning!\nStop undo...,1,0,0,1,0,0,2
86,"Would you both shut up, you don't run wikipedi...",1,0,0,0,1,0,2
168,"You should be fired, you're a moronic wimp who...",1,0,0,0,1,0,2
...,...,...,...,...,...,...,...,...
159253,what do you mean \n\nwhy don't you keep your n...,1,0,1,0,0,0,2
159334,"Horse's ass \n\nSeriously, dude, what's that h...",1,0,1,0,0,0,2
159449,I think he is a gay fag!!!,1,0,0,0,0,1,2
159514,YOU ARE A MISCHIEVIOUS PUBIC HAIR,1,0,0,0,1,0,2


In [None]:
len(df)

159571

## Exploring and preprocessing

The dataset is **strongly imbalanced**: 143,346 of the 159,571 samples are labeled as non-toxic. In particular, toxic comments labeled as *severe_toxic*, *threat* and *identity_hate* are significantly fewer than the other toxic comments. Toxic and non-toxic comments are saved in two different dataframes.

In [None]:
df['sum_injurious'].value_counts()

sum_injurious
0    143346
1      6360
3      4209
2      3480
4      1760
5       385
6        31
Name: count, dtype: int64
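
As a quick sanity check on the counts above, the degree of imbalance can be computed directly. This is only a small sketch: the dictionary copies the `value_counts()` output by hand, and the variable names are mine, not part of the notebook.

```python
# Counts copied from the value_counts() output above:
# number of positive labels per comment -> number of comments
label_count_hist = {0: 143346, 1: 6360, 3: 4209, 2: 3480, 4: 1760, 5: 385, 6: 31}

total = sum(label_count_hist.values())   # all comments
non_toxic = label_count_hist[0]          # comments with no label set
toxic = total - non_toxic                # comments with at least one label

print(total, toxic)
print(round(non_toxic / total, 3))       # share of non-toxic comments
print(round(non_toxic / toxic, 1))       # non-toxic comments per toxic comment
```

The totals agree with `len(df)` and with the shapes of the two dataframes built in the next cells.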

In [None]:
df_nontoxic = df[df['sum_injurious']==0]
df_nontoxic.shape

(143346, 8)

In [None]:
df_toxic = df[df['sum_injurious']>0]
df_toxic.shape

(16225, 8)

In [None]:
df_toxic[['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']].sum()

toxic            15294
severe_toxic      1595
obscene           8449
threat             478
insult            7877
identity_hate     1405
dtype: int64

Since useful tools such as *RandomOverSampler* or *SMOTE* from the *imbalanced-learn* library are not suitable for multilabel classification problems, I proceed with manual oversampling, both to balance toxic and non-toxic comments and to reduce the gap in size among the different kinds of toxic comments.

While preserving every comment labeled at least *toxic* (there are 15,294 of them), I resample 15,000 comments for each of the *other labels*. These comments are drawn at random with replacement, so they are duplicates, and are appended to the *toxic==1* dataframe.

In [None]:
resample_list = ['severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']
df_toxic_resampled = df_toxic[df_toxic['toxic']==1]
for i in resample_list:
  df_1 = df_toxic[df_toxic[i]==1]
  df_2 = df_1.sample(n=15000, random_state=0, replace=True)
  df_toxic_resampled = pd.concat([df_toxic_resampled, df_2], ignore_index=True)

The toxic comments now number over 90,000. The difference among labels is still significant but, proportionally, much smaller than before.

In [None]:
df_toxic_resampled.shape

(90294, 8)

In [None]:
df_toxic_resampled[['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']].sum()

toxic            86363
severe_toxic     28754
obscene          69427
threat           18604
insult           68297
identity_hate    26341
dtype: int64
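
The label counts above exceed 15,000 per label, and *toxic* stays below the total row count, because every resampled row carries all of its labels with it. The sketch below illustrates this on made-up toy rows with only three labels; it uses the standard `random` module instead of pandas, and the data and helper name are hypothetical.

```python
import random

# Row count of the notebook's resampled dataframe:
# every toxic comment plus 15,000 draws for each of the 5 other labels
assert 15294 + 5 * 15000 == 90294

# Toy multilabel rows (hypothetical data): (toxic, obscene, threat)
labels_toy = ("toxic", "obscene", "threat")
rows = [
    (1, 1, 0),
    (1, 0, 0),
    (1, 1, 1),
    (0, 0, 1),  # a threat that is not labeled toxic
]

def oversample(rows, label_idx, n, rng):
    """Draw n rows with replacement among those where the given label is 1."""
    pool = [r for r in rows if r[label_idx] == 1]
    return rng.choices(pool, k=n)

rng = random.Random(0)
resampled = [r for r in rows if r[0] == 1]   # keep every toxic row
for idx in (1, 2):                           # oversample obscene, then threat
    resampled += oversample(rows, idx, n=5, rng=rng)

counts = {name: sum(r[i] for r in resampled) for i, name in enumerate(labels_toy)}
print(len(resampled), counts)
```

Rows drawn for *threat* can also be *obscene*, so the *obscene* count can end up above its own 5 draws plus the originals, while a non-toxic threat row keeps the *toxic* count below the number of rows.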

The same number of non-toxic comments is then sampled into a smaller dataset.

In [None]:
df_nontoxic_small = df_nontoxic.sample(n=df_toxic_resampled.shape[0], random_state=0)

In [None]:
df_nontoxic_small

Unnamed: 0,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate,sum_injurious
118677,"""I do not understand this sentence: """"Several ...",0,0,0,0,0,0,0
136088,"""\nThanks. I don't really mind the attacks. My...",0,0,0,0,0,0,0
52079,""", 29 October 2007 (UTC)\n\nThis is a """"spinou...",0,0,0,0,0,0,0
8219,2010 Formula One season,0,0,0,0,0,0,0
3084,"""Welcome!\n\n \n\nHello, , to Wikipedia! I'm ,...",0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...
44069,Away kit \n\nSomeone needs 2 update the away k...,0,0,0,0,0,0,0
23319,"Well, Tangobot and SQLBot have both been keepi...",0,0,0,0,0,0,0
140538,"""\n\nOrphaned fair use image (Image:Hansan7.pn...",0,0,0,0,0,0,0
77058,"""\nSpeedy It's always the band name, and never...",0,0,0,0,0,0,0


The resampled toxic and non-toxic dataframes are then merged into a new, **balanced** dataframe.

In [None]:
df_balanced = pd.concat([df_nontoxic_small, df_toxic_resampled])
df_balanced = df_balanced.reset_index(drop=True)
df_balanced

Unnamed: 0,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate,sum_injurious
0,"""I do not understand this sentence: """"Several ...",0,0,0,0,0,0,0
1,"""\nThanks. I don't really mind the attacks. My...",0,0,0,0,0,0,0
2,""", 29 October 2007 (UTC)\n\nThis is a """"spinou...",0,0,0,0,0,0,0
3,2010 Formula One season,0,0,0,0,0,0,0
4,"""Welcome!\n\n \n\nHello, , to Wikipedia! I'm ,...",0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...
180583,Fuck you you stupid and gay bastard who thinls...,1,1,1,0,1,1,5
180584,Islam is the fastest growing religion in the w...,1,0,0,0,1,1,3
180585,. I am a stupid whore who sucks dicks all day....,1,1,1,0,1,1,5
180586,So that's the bullshit they teach you in Engla...,1,0,1,0,1,1,4


The comments are saved in the feature variable X, while all the labels are saved in y, so that the data can be split into training and test sets.

In [None]:
X = df_balanced['comment_text'].values
y = df_balanced[['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']].values

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

The comments are processed using **TextVectorization**, which lets me create the tokens and the vocabulary, remove useless characters and produce padded token sequences.
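
To make these steps concrete, here is a plain-Python approximation of what such a vectorizer does: standardize, split on whitespace, map tokens to integer indices and pad with zeros. It is only an illustration under my own assumptions, not the Keras implementation: the helper names are hypothetical, and the exact punctuation set and the ordering of equally frequent tokens may differ from the real layer.

```python
import string
from collections import Counter

def standardize(text):
    # Lowercase and drop ASCII punctuation, roughly like 'lower_and_strip_punctuation'
    return text.lower().translate(str.maketrans("", "", string.punctuation))

def build_vocab(texts):
    counts = Counter(tok for t in texts for tok in standardize(t).split())
    # Index 0 is reserved for padding and index 1 for out-of-vocabulary tokens;
    # the remaining tokens are ordered from most to least frequent
    return ["", "[UNK]"] + [tok for tok, _ in counts.most_common()]

def vectorize(texts, vocab):
    index = {tok: i for i, tok in enumerate(vocab)}
    seqs = [[index.get(tok, 1) for tok in standardize(t).split()] for t in texts]
    width = max(len(s) for s in seqs)
    # Pad every sequence with zeros up to the longest one in the batch
    return [s + [0] * (width - len(s)) for s in seqs]

corpus = ["You are great!", "You, sir, are my hero."]
vocab = build_vocab(corpus)
print(vocab)
print(vectorize(corpus, vocab))
print(vectorize(["you are unknown"], vocab))  # unseen token maps to index 1
```

The trailing zeros play the same role as the padding visible in the tensor printed further down.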

In [None]:
import tensorflow as tf
from tensorflow.keras.layers import TextVectorization

In [None]:
vectorizer = TextVectorization(max_tokens=1000000,
                               standardize='lower_and_strip_punctuation',
                               split='whitespace',
                               output_mode='int')

In [None]:
vectorizer.adapt(X_train)

In [None]:
sequences = vectorizer(X_train)
sequences

<tf.Tensor: shape=(144470, 1403), dtype=int64, numpy=
array([[  101,     3, 12750, ...,     0,     0,     0],
       [  605,    89,    26, ...,     0,     0,     0],
       [    2,   448,   148, ...,     0,     0,     0],
       ...,
       [    2,    15,    23, ...,     0,     0,     0],
       [  268,    13,    12, ...,     0,     0,     0],
       [ 3851,  5694,   130, ...,     0,     0,     0]])>

In [None]:
vocabulary_size = len(vectorizer.get_vocabulary())
vocabulary_size

168597

In [None]:
max_len = sequences.shape[1]

## Model

To classify the comments, I define a recurrent neural network with an input Embedding layer, an LSTM layer and a Dense layer of size 6, so that the output is a 6-element array.

The model is then fitted with an early stopping callback and a learning rate of 0.0001.

Among all the tested models, this is the best performing in both validation and testing.
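
Since each of the 6 outputs is an independent yes/no decision, the model pairs a sigmoid output with binary cross-entropy. The following sketch computes both by hand for a single comment, only to show the idea; the logits are made-up values and the function names are mine.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over the labels of one comment."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)

# Hypothetical pre-activation values for the 6 labels of one comment
logits = [2.0, -3.0, 1.5, -4.0, 0.5, -2.0]
probs = [sigmoid(z) for z in logits]
target = [1, 0, 1, 0, 1, 0]

print([round(p, 3) for p in probs])
print(round(sum(probs), 3))                          # not constrained to sum to 1
print(round(binary_crossentropy(target, probs), 4))  # low when probs match target
```

Unlike a softmax, the sigmoid probabilities are not forced to sum to 1, which is what allows several labels to be active for the same comment.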



In [None]:
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense, SimpleRNN, Embedding, LSTM, GRU, Bidirectional, TimeDistributed, LayerNormalization
from tensorflow.keras.backend import clear_session
from tensorflow.keras.callbacks import EarlyStopping, LearningRateScheduler

In [None]:
clear_session()
model = Sequential()
model.add(Embedding(input_dim=vocabulary_size, output_dim = 128, input_shape=(max_len,), mask_zero=True))
model.add(LSTM(64,activation='tanh'))
model.add(Dense(6, activation='sigmoid'))
model.summary()

In [None]:
my_opt = tf.keras.optimizers.Adam(learning_rate=0.0001)
model.compile(optimizer=my_opt,loss='binary_crossentropy',metrics=['accuracy'])

In [None]:
callback_es = EarlyStopping(monitor='val_accuracy', min_delta=0.001, patience=2, verbose=1, restore_best_weights=True)

In [None]:
model.fit(sequences, y_train, epochs=25, validation_split=0.1, shuffle=True, callbacks=[callback_es])

Epoch 1/25
4064/4064 ━━━━━━━━━━━━━━━━━━━━ 125s 30ms/step - accuracy: 0.8877 - loss: 0.3886 - val_accuracy: 0.8003 - val_loss: 0.2133
Epoch 2/25
4064/4064 ━━━━━━━━━━━━━━━━━━━━ 124s 30ms/step - accuracy: 0.7825 - loss: 0.1925 - val_accuracy: 0.9289 - val_loss: 0.1421
Epoch 3/25
4064/4064 ━━━━━━━━━━━━━━━━━━━━ 122s 30ms/step - accuracy: 0.9328 - loss: 0.1233 - val_accuracy: 0.9205 - val_loss: 0.1045
Epoch 4/25
4064/4064 ━━━━━━━━━━━━━━━━━━━━ 121s 30ms/step - accuracy: 0.9013 - loss: 0.0888 - val_accuracy: 0.8740 - val_loss: 0.0871
Epoch 4: early stopping
Restoring model weights from the end of the best epoch: 2.


<keras.src.callbacks.history.History at 0x7fa17c1e7fa0>
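
The stopping point in the log can be reproduced with a simplified simulation of the patience logic. This is a sketch of the general idea, not the Keras source: the function name is hypothetical, and it assumes a metric that should increase, like `val_accuracy`.

```python
def simulate_early_stopping(history, patience, min_delta):
    """Return (epoch at which training stops, best epoch), both 1-based."""
    best = float("-inf")
    best_epoch = None
    wait = 0
    for epoch, value in enumerate(history, start=1):
        if value - min_delta > best:      # improvement larger than min_delta
            best, best_epoch, wait = value, epoch, 0
        else:
            wait += 1                     # one more epoch without improvement
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch               # never triggered

# val_accuracy values copied from the training log above
val_accuracy = [0.8003, 0.9289, 0.9205, 0.8740]
stopped_at, best_epoch = simulate_early_stopping(val_accuracy, patience=2, min_delta=0.001)
print(stopped_at, best_epoch)
```

With `patience=2`, epochs 3 and 4 both fail to beat epoch 2, so training stops after epoch 4 and the weights of epoch 2 are restored, as the log reports.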

In [None]:
#model.save("/content/drive/MyDrive/Colab Notebooks/Reti Neurali Ricorrenti/Progetto/model_def.keras")

In [None]:
#model = load_model("")

## Model evaluation

The model is now tested on the test set, followed by the classification report and the multi-label confusion matrix.

In [None]:
test_sequences = vectorizer(X_test)
test_sequences

<tf.Tensor: shape=(36118, 1403), dtype=int64, numpy=
array([[    4,    22,   101, ...,     0,     0,     0],
       [  255,   572,     3, ...,     0,     0,     0],
       [ 2129,  1037, 20613, ...,     0,     0,     0],
       ...,
       [   75,   303,    14, ...,     0,     0,     0],
       [ 1315,    11,  1060, ...,     0,     0,     0],
       [  372,    35,    84, ...,     0,     0,     0]])>

The model scores about 0.93 in accuracy on the test set.

In [None]:
model.evaluate(test_sequences, y_test)

1129/1129 ━━━━━━━━━━━━━━━━━━━━ 9s 8ms/step - accuracy: 0.9364 - loss: 0.1434


[0.14531107246875763, 0.9341879487037659]
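
For a multilabel task, "accuracy" can mean different things, and which one Keras reports for the string `'accuracy'` may depend on the Keras version and on the output shape; I have not checked which variant applies to the run above. As a hedged complement, the sketch below computes two explicit definitions on made-up 0/1 vectors; the function names are mine.

```python
def per_label_accuracy(y_true, y_pred):
    """Fraction of individual label decisions that are correct."""
    pairs = [(t, p) for row_t, row_p in zip(y_true, y_pred)
             for t, p in zip(row_t, row_p)]
    return sum(t == p for t, p in pairs) / len(pairs)

def subset_accuracy(y_true, y_pred):
    """Fraction of comments whose whole label vector is correct."""
    return sum(row_t == row_p for row_t, row_p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical ground truth and rounded predictions for 3 comments
toy_true = [[0, 0, 0, 0, 0, 0], [1, 0, 1, 0, 1, 0], [1, 0, 0, 1, 0, 0]]
toy_pred = [[0, 0, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 0, 0, 1, 0, 0]]

print(round(per_label_accuracy(toy_true, toy_pred), 3))  # 17 of 18 decisions
print(round(subset_accuracy(toy_true, toy_pred), 3))     # 2 of 3 comments
```

Subset accuracy is the stricter of the two, since a single wrong label makes the whole comment count as an error.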

The predictions are saved and rounded to integer values.

In [None]:
predictions = model.predict(test_sequences)

1129/1129 ━━━━━━━━━━━━━━━━━━━━ 8s 7ms/step


In [None]:
import numpy as np

In [None]:
predictions = np.rint(predictions)
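
Rounding with `np.rint` is equivalent to a 0.5 threshold except at exactly 0.5, which is rounded to 0 (round half to even). The sketch below compares it with an explicit threshold on made-up probabilities; the per-label thresholds are purely hypothetical values, shown only as one possible variation.

```python
import numpy as np

# Hypothetical sigmoid outputs for one comment (6 labels)
probs = np.array([[0.49, 0.35, 0.51, 0.97, 0.03, 0.50]])

rounded = np.rint(probs)                    # 0.50 is rounded to 0
thresholded = (probs >= 0.5).astype(int)    # 0.50 counts as 1

# Hypothetical per-label thresholds, lower for the rarer labels
thresholds = np.array([0.5, 0.3, 0.5, 0.3, 0.5, 0.3])
custom = (probs >= thresholds).astype(int)

print(rounded)
print(thresholded)
print(custom)
```

Lowering a threshold trades precision for recall on that label, so any such values would need to be tuned on validation data.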

In particular, as shown in the multilabel classification report, the model achieves good precision and recall in assigning the labels.

Looking at the aggregate metrics, the model performs well on the micro, macro and weighted averages, while the samples average hints that there is room for improvement.

In [None]:
from sklearn.metrics import classification_report, multilabel_confusion_matrix

In [None]:
labels = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']
print(classification_report(y_test, predictions, target_names=labels))

               precision    recall  f1-score   support

        toxic       0.95      0.96      0.95     17406
 severe_toxic       0.77      0.80      0.78      5817
      obscene       0.93      0.95      0.94     14025
       threat       0.93      0.80      0.86      3806
       insult       0.88      0.94      0.91     13718
identity_hate       0.89      0.74      0.81      5333

    micro avg       0.90      0.91      0.91     60105
    macro avg       0.89      0.87      0.88     60105
 weighted avg       0.90      0.91      0.90     60105
  samples avg       0.43      0.44      0.43     60105
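
The low samples average deserves a closer look. It averages precision and recall computed per comment, and for a comment with no true and no predicted label both are undefined; my understanding is that scikit-learn, with its default settings, scores them as 0 and emits a warning. The sketch below mimics that convention in plain Python on made-up data (the helper name is mine).

```python
def per_sample_scores(true_row, pred_row):
    """Precision and recall for one comment, with undefined values set to 0."""
    tp = sum(t * p for t, p in zip(true_row, pred_row))
    n_pred = sum(pred_row)
    n_true = sum(true_row)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_true if n_true else 0.0
    return precision, recall

# Hypothetical, perfectly predicted test set: half of the comments are non-toxic
toy_true = [[0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0],
            [1, 0, 1, 0, 1, 0], [1, 0, 0, 0, 0, 0]]
toy_pred = [row[:] for row in toy_true]

scores = [per_sample_scores(t, p) for t, p in zip(toy_true, toy_pred)]
samples_avg_precision = sum(p for p, _ in scores) / len(scores)
samples_avg_recall = sum(r for _, r in scores) / len(scores)
print(samples_avg_precision, samples_avg_recall)
```

Even with perfect predictions the average is capped at the share of toxic comments, so on a test set that is about half non-toxic a samples average near 0.43 is less alarming than it looks.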





In [None]:
ml = multilabel_confusion_matrix(y_test, predictions)
print('ML confusion matrix')
for i in range(len(labels)):
  print(labels[i])
  print(ml[i])

ML confusion matrix
toxic
[[17834   878]
 [  720 16686]]
severe_toxic
[[28890  1411]
 [ 1159  4658]]
obscene
[[21061  1032]
 [  670 13355]]
threat
[[32077   235]
 [  746  3060]]
insult
[[20593  1807]
 [  874 12844]]
identity_hate
[[30303   482]
 [ 1380  3953]]
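
Each 2x2 matrix above follows the scikit-learn layout `[[TN, FP], [FN, TP]]`, so the per-label figures of the classification report can be recovered from it. A small sketch for the *threat* label, with the numbers copied from the output above:

```python
# Confusion matrix for 'threat', copied from the output above: [[TN, FP], [FN, TP]]
threat_matrix = [[32077, 235],
                 [746, 3060]]

(tn, fp), (fn, tp) = threat_matrix

precision = tp / (tp + fp)   # how many predicted threats are real threats
recall = tp / (tp + fn)      # how many real threats are detected
f1 = 2 * precision * recall / (precision + recall)
support = tp + fn            # number of true 'threat' comments in the test set

print(round(precision, 2), round(recall, 2), round(f1, 2), support)
```

The results agree with the *threat* row of the classification report (0.93, 0.80, 0.86, support 3806).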


I define a simple filter function that prints whether a comment is toxic or not and, if it is toxic, the labels that apply.

In [None]:
def toxicity_filter(comment):
  # comment is a list with one string; round the sigmoid outputs to 0/1
  pred = np.rint(model.predict(vectorizer(comment)))
  print(pred[0])
  if np.sum(pred) == 0:
    print('NOT toxic comment')
  else:
    print('Toxicity report:')
    # Print the name of every label predicted as 1
    for i in range(len(labels)):
      if pred[0][i]==1:
        print(labels[i],' comment')

A few examples, taken from a dataset found online (https://www.kaggle.com/datasets/reihanenamdari/youtube-toxicity-data)

In [None]:
comment = ['I agree with the protestor']
toxicity_filter(comment)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step
[0. 0. 0. 0. 0. 0.]
NOT toxic comment


In [None]:
comment = ['CNN mother fuckers....fuck you peace of shit!']
toxicity_filter(comment)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step
[1. 0. 1. 0. 1. 0.]
Toxicity report:
toxic  comment
obscene  comment
insult  comment
