# Predict toxic comments

by Bill Wan, Tom Tsang\
2023/12/02

In [5]:
import pandas as pd
from myst_nb import glue
import os

## Introduction

The digital age, accelerated by the ongoing pandemic, has led to an increase in online activity. A significant 84% of young Canadians are now actively using social media platforms like Facebook for communication. However, this increased online presence has also led to a rise in the propagation of extreme ideologies and hate speech.

This project aims to mitigate the impact of such toxic behavior by detecting and reporting harmful comments on social media. The industry has trended towards neural networks for such tasks (with an estimated market size of US$26.4 billion by 2024), but these models often lack interpretability [1].

In contrast, this project will utilize a decision tree model, which is known for its interpretability and ease of understanding. Decision trees provide a clear visualization of the decision-making process, making it easier to understand why a particular prediction was made. This transparency is particularly beneficial in areas like toxic comment classification, where understanding the reasoning behind a prediction can provide valuable insights.

By employing a decision tree model, this project aims to create a balance between effective detection of toxic comments and interpretability of the model. This approach will not only contribute to a safer online environment but also promote healthier and more respectful interactions in the digital world.

## Method

### Data

The dataset utilized in this project is sourced from Kaggle's Toxic Comment Classification Challenge 2018 (https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge). It comprises a substantial number of Wikipedia comments that have been manually labeled by human raters to identify toxic behavior. The comments are categorized into six groups: toxic, severe toxic, obscene, threat, insult, and identity hate.

An initial examination of the data shows that the majority of comments are clean, i.e., all of their target values are 0. Additionally, certain classes contain very few comments. For instance, while there are over 15,000 comments labeled as toxic, there are fewer than 500 labeled as threats. The low volume of comments in specific classes poses a challenge, as it can lead models to overfit and yield lower accuracy on the validation and test sets.

This imbalance in class distribution might hurt the performance of machine learning models, particularly for classes with limited samples. Typical symptoms are overfitting and failure to detect rare classes such as threat comments.
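The imbalance described above can be checked with a simple count of positive labels per class. The following is a minimal sketch on hypothetical toy rows (not the Kaggle data); the helper name `label_counts` is an assumption made for illustration, and the label order follows the six classes listed earlier.

```python
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Hypothetical toy rows: one tuple of six 0/1 flags per comment (not real data).
toy_rows = [
    (0, 0, 0, 0, 0, 0),
    (1, 0, 1, 0, 1, 0),
    (0, 0, 0, 0, 0, 0),
    (1, 0, 0, 1, 0, 0),
    (0, 0, 0, 0, 0, 0),
]


def label_counts(rows, labels=LABELS):
    """Count positive comments per class and comments with no label at all."""
    counts = {label: 0 for label in labels}
    clean = 0
    for row in rows:
        if not any(row):
            clean += 1  # every target value is 0, so the comment is clean
        for label, flag in zip(labels, row):
            counts[label] += flag
    return counts, clean


counts, clean = label_counts(toy_rows)
print(counts)
print("clean comments:", clean)
```

On the real dataset, the same count would show the gap between the toxic class and the threat class mentioned above.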

<iframe src="../results/figures/class_distribution.html" width="600" height="420"></iframe>

### Analysis
We employed a combination of Decision Tree, LightGBM, and Neural Network models in our analysis. We chose these models due to their diverse complexity levels and interpretability, aiming to identify the optimal model that strikes a balance between the two. Feature engineering techniques, including extracting comment length and sentiment scores, were implemented to enhance accuracy. Additionally, we conducted feature preprocessing, such as scaling for numeric features and using count vectorization for the comment text.
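To make the preprocessing steps concrete, here is a minimal standard-library sketch of count vectorization, comment-length extraction, and scaling. It is an illustration under stated assumptions, not the project's actual pipeline: the function names and the tokenization rule are hypothetical, and the sentiment score is left out because it would need an external lexicon.

```python
import re
import statistics
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def count_vectorize(comments):
    """Return (vocabulary, rows); each row holds per-word counts for one comment."""
    tokenized = [tokenize(comment) for comment in comments]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[word] for word in vocabulary])
    return vocabulary, rows


def standard_scale(values):
    """Scale a numeric feature to zero mean and unit (population) variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(value - mean) / std for value in values]


# Hypothetical toy comments for illustration only.
comments = ["You are great", "you you are awful", "ok"]
vocabulary, word_counts = count_vectorize(comments)
scaled_lengths = standard_scale([len(comment) for comment in comments])
print(vocabulary)
print(word_counts)
print(scaled_lengths)
```

The word counts and the scaled length would then be concatenated into one feature row per comment before model fitting.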

Recognizing the highly skewed nature of the data, we addressed the imbalance through Data Augmentation. We utilized the nlpaug package to artificially increase the data volume in minority classes by substituting and swapping synonyms [2]. The augmented comments retained the same target labels as the original ones. The class distribution is visualized in the graph below.
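The idea behind synonym-based augmentation can be sketched without any external package. The example below is not nlpaug and does not reproduce its API; it uses a small hypothetical synonym table (`SYNONYMS`) and a hypothetical helper (`augment`) to show how a new comment is produced while the target labels are copied unchanged.

```python
import random

# Hypothetical synonym table for illustration; the project used nlpaug instead.
SYNONYMS = {
    "hate": ["detest", "loathe"],
    "stupid": ["dumb", "foolish"],
}


def augment(comment, labels, rng, synonyms=SYNONYMS):
    """Replace each word that has known synonyms and keep the labels as they are."""
    new_words = [
        rng.choice(synonyms[word]) if word in synonyms else word
        for word in comment.split()
    ]
    return " ".join(new_words), list(labels)


rng = random.Random(0)  # seeded so the sketch is repeatable
original_text = "i hate this stupid page"
original_labels = [1, 0, 0, 0, 1, 0]
augmented_text, augmented_labels = augment(original_text, original_labels, rng)
print(augmented_text, augmented_labels)
```

Applying this only to comments from minority classes increases their share of the training data, which is the effect the augmentation step aims for.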

Given the ample size of the dataset, we opted for a train-validation split rather than cross-validation, allocating 80% to the training set and 20% to the validation set. We ensured that the validation set comprehensively covered every target class.
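An 80/20 split with a coverage check could look like the sketch below. How the project actually guaranteed coverage is not stated, so the retry-with-a-new-seed strategy and all function names here are assumptions for illustration.

```python
import random


def train_validation_split(rows, validation_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and cut off the validation share."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


def covers_every_class(rows, n_labels):
    """True if each label has at least one positive example in rows."""
    return all(any(labels[i] for _, labels in rows) for i in range(n_labels))


def split_with_coverage(rows, n_labels, validation_fraction=0.2, max_tries=100):
    """Try successive seeds until both parts contain every class."""
    for seed in range(max_tries):
        train, validation = train_validation_split(rows, validation_fraction, seed)
        if covers_every_class(train, n_labels) and covers_every_class(validation, n_labels):
            return train, validation
    raise ValueError("no split covering every class was found")


# Hypothetical toy data: 20 comments, two labels, alternating positives.
toy_rows = [(f"comment {i}", [1, 0] if i % 2 == 0 else [0, 1]) for i in range(20)]
train, validation = split_with_coverage(toy_rows, n_labels=2)
print(len(train), len(validation))
```

A stratified splitter from a machine learning library would be a more robust choice for six overlapping labels; the retry loop is only the simplest way to show the coverage requirement.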

This analysis was conducted using Python.

<iframe src="../results/figures/augmented_class_distribution.html" width="600" height="420"></iframe>

The table below shows the validation ROC-AUC and F1 scores for each model and class. We are particularly concerned with the F1 score, since our models aim to identify the comments that are toxic (or that belong to one of the other target classes) without producing too many false positives.

After comparing the models on fitting time and validation performance (see the table below), we believe the decision tree gives us an acceptable model.

In [4]:
validation_score = pd.read_csv('../results/tables/validation_result.csv')
glue("Validation Results", validation_score)

| Unnamed: 0 | model | label | val_roc_auc | val_f1 |
| --- | --- | --- | --- | --- |
| 0 | decision_tree | toxic | 0.892512 | 0.873373 |
| 1 | decision_tree | severe_toxic | 0.922888 | 0.768508 |
| 2 | decision_tree | obscene | 0.934865 | 0.908407 |
| 3 | decision_tree | threat | 0.921289 | 0.756894 |
| 4 | decision_tree | insult | 0.912911 | 0.871493 |
| 5 | decision_tree | identity_hate | 0.923639 | 0.781373 |
| 6 | lightgbm | toxic | 0.885844 | 0.866789 |
| 7 | lightgbm | severe_toxic | 0.886573 | 0.528753 |
| 8 | lightgbm | obscene | 0.900282 | 0.862038 |
| 9 | lightgbm | threat | 0.932733 | 0.43374 |


In addition, the decision tree has an advantage over the best-performing model, the neural network: it lets us understand why the model makes a given prediction. This is helpful when we need to explain to someone why we consider their comment toxic.

To achieve that, we used the decision tree's feature importances to show the top 10 features that contribute to the prediction for each class. Because the graphs contain inappropriate language, we do not show them here; they are available in the GitHub repository.
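The ranking step itself is small. The sketch below assumes that a list of feature names and a matching list of importance scores are already available (for a scikit-learn tree, these would typically come from the fitted model's `feature_importances_` attribute, which is an assumption about the project's tooling). The feature names and scores shown are hypothetical placeholders.

```python
def top_features(feature_names, importances, k=10):
    """Return the k (name, importance) pairs with the highest importance."""
    ranked = sorted(
        zip(feature_names, importances),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k]


# Hypothetical placeholder features and scores, not results from the project.
names = ["comment_length", "word_a", "word_b", "sentiment", "word_c"]
scores = [0.10, 0.45, 0.05, 0.25, 0.15]
for name, score in top_features(names, scores, k=3):
    print(f"{name}: {score:.2f}")
```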

### Evaluation of the model

In [11]:
tables_dir = '../results/tables'  # renamed from `dir` to avoid shadowing the built-in
labels = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']
for class_name in labels:
    # Load the saved test-set scores and confusion matrix for this class
    scores_file = os.path.join(tables_dir, f'{class_name}_test_scores.csv')
    cm_file = os.path.join(tables_dir, f'{class_name}_test_confusion_matrix.csv')
    scores_df = pd.read_csv(scores_file)
    cm_df = pd.read_csv(cm_file)

    # Print a caption, then glue each table so the report can reference it
    print(f'Scores for {class_name} class:')
    glue(f'{class_name}_scores', scores_df)
    print(f'Confusion Matrix for {class_name} class:')
    glue(f'{class_name}_cm', cm_df)

Scores for toxic class:

| Unnamed: 0 | ROC-AUC | F1 Score |
| --- | --- | --- |
| 0 | 0.794911 | 0.461141 |

Confusion Matrix for toxic class:

| Unnamed: 0 | 0 | 1 |
| --- | --- | --- |
| 0 | 48801 | 9087 |
| 1 | 1542 | 4548 |


Scores for severe_toxic class:

| Unnamed: 0 | ROC-AUC | F1 Score |
| --- | --- | --- |
| 0 | 0.676766 | 0.171393 |

Confusion Matrix for severe_toxic class:

| Unnamed: 0 | 0 | 1 |
| --- | --- | --- |
| 0 | 62527 | 1084 |
| 1 | 231 | 136 |


Scores for obscene class:

| Unnamed: 0 | ROC-AUC | F1 Score |
| --- | --- | --- |
| 0 | 0.825093 | 0.489632 |

Confusion Matrix for obscene class:

| Unnamed: 0 | 0 | 1 |
| --- | --- | --- |
| 0 | 55711 | 4576 |
| 1 | 1011 | 2680 |


Scores for threat class:

| Unnamed: 0 | ROC-AUC | F1 Score |
| --- | --- | --- |
| 0 | 0.667213 | 0.200837 |

Confusion Matrix for threat class:

| Unnamed: 0 | 0 | 1 |
| --- | --- | --- |
| 0 | 63333 | 434 |
| 1 | 139 | 72 |


Scores for insult class:

| Unnamed: 0 | ROC-AUC | F1 Score |
| --- | --- | --- |
| 0 | 0.773075 | 0.409719 |

Confusion Matrix for insult class:

| Unnamed: 0 | 0 | 1 |
| --- | --- | --- |
| 0 | 55633 | 4918 |
| 1 | 1277 | 2150 |


Scores for identity_hate class:

| Unnamed: 0 | ROC-AUC | F1 Score |
| --- | --- | --- |
| 0 | 0.698036 | 0.283784 |

Confusion Matrix for identity_hate class:

| Unnamed: 0 | 0 | 1 |
| --- | --- | --- |
| 0 | 62200 | 1066 |
| 1 | 418 | 294 |


We used our Decision Tree model to predict the test data. The ROC-AUC scores, which ranged from about 0.67 for 'threat' to about 0.83 for 'obscene', were reasonable and acceptable. However, the F1 scores were notably low, especially for less common classes such as 'threat' (0.20) and 'severe_toxic' (0.17). Despite our efforts to augment the data, the inherent complexity of text data poses challenges, particularly when dealing with limited training data. While the model performs reasonably well in identifying toxic comments, it has difficulty capturing the nuanced relations within the dataset.
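The reported F1 scores can be recomputed from the confusion matrices, which also splits each score into precision and recall. The sketch below assumes that rows are actual labels and columns are predicted labels (the layout is not stated in the report). F1 is the same under either orientation, but precision and recall would swap if the layout were transposed.

```python
# Cells are (tn, fp, fn, tp), read from the confusion matrices above under the
# assumption that rows are actual labels and columns are predicted labels.
CONFUSION = {
    "toxic": (48801, 9087, 1542, 4548),
    "severe_toxic": (62527, 1084, 231, 136),
    "obscene": (55711, 4576, 1011, 2680),
    "threat": (63333, 434, 139, 72),
    "insult": (55633, 4918, 1277, 2150),
    "identity_hate": (62200, 1066, 418, 294),
}


def precision_recall_f1(tn, fp, fn, tp):
    """Compute precision, recall, and F1 for the positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, f1


metrics = {label: precision_recall_f1(*cells) for label, cells in CONFUSION.items()}
for label, (precision, recall, f1) in metrics.items():
    print(f"{label:14s} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

The recomputed F1 values agree with the scores tables above. Under the assumed layout, recall is higher than precision for every class, which would suggest that false positives are the main reason for the low F1 scores.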

### Discussion and Learnings

Classifying comments is hard because they don't follow a strict pattern, and there are so many different words. It's even tougher when there are only a few comments for certain categories. We have explored various strategies, such as feature engineering and data augmentation, to address this issue, but it's still hard to catch all the different types of comments.

So, it might be a good idea to explore pre-trained word embeddings such as GloVe. Leveraging such embeddings could enhance our model's understanding of each word and potentially improve its performance in handling the intricacies of diverse comments.
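One simple way to use such embeddings is to average the vectors of the words in a comment and feed the result to the classifier as extra features. The sketch below assumes the plain-text GloVe layout (one word per line followed by its space-separated vector components) and uses hypothetical 3-dimensional toy vectors rather than a real embedding file. The function names are illustrative.

```python
def parse_glove_lines(lines):
    """Parse lines of the form 'word v1 v2 ... vd' into a word-to-vector dict."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


def comment_vector(comment, vectors, dim):
    """Average the vectors of known words; return zeros if no word is known."""
    found = [vectors[word] for word in comment.lower().split() if word in vectors]
    if not found:
        return [0.0] * dim
    return [sum(column) / len(found) for column in zip(*found)]


# Hypothetical toy vectors in the assumed GloVe text layout (not real embeddings).
toy_lines = ["good 0.5 0.1 0.0", "bad -0.5 0.3 0.2"]
vectors = parse_glove_lines(toy_lines)
vec = comment_vector("Good bad unknown", vectors, dim=3)
print(vec)
```

Unknown words are skipped here. Averaging also discards word order, so this is only a first step toward capturing the nuances discussed above.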

## References

1. https://openreview.net/pdf?id=Lp-QFq2QRXA
2. https://github.com/makcedward/nlpaug
