Keeping internet comments safe using artificial intelligence 🤖

Toxic Comment Classification

The purpose of this project is to perform binary classification on natural language, labelling each internet comment as either safe or unsafe.

Data Processing

This project uses a combination of two Kaggle competition datasets, namely:

  1. Jigsaw Toxic Comment Classification Challenge
  2. Jigsaw Unintended Bias in Toxicity Classification

In total, these datasets contain around 2 million internet comments from a variety of sources, with roughly 10% of the comments labelled as unsafe. The two datasets were cleaned, recombined, and then rebalanced to form a new dataset of around 360,000 comments with a 50/50 split between the safe and unsafe classes.
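As a rough illustration, the sketch below shows how the two datasets might be combined and rebalanced with pandas. The file names, column names, and 0.5 toxicity threshold are assumptions for illustration, not the project's actual preprocessing code.

```python
import pandas as pd

# Hypothetical file and column names -- the actual Kaggle CSVs may differ.
jigsaw_2018 = pd.read_csv("jigsaw-toxic-comment-train.csv")
jigsaw_2019 = pd.read_csv("jigsaw-unintended-bias-train.csv")

# Reduce each dataset to a (comment, binary label) pair.
df_2018 = pd.DataFrame({
    "comment_text": jigsaw_2018["comment_text"],
    "unsafe": (jigsaw_2018["toxic"] == 1).astype(int),
})
df_2019 = pd.DataFrame({
    "comment_text": jigsaw_2019["comment_text"],
    # The 2019 dataset scores toxicity as a fraction; 0.5 is an assumed threshold.
    "unsafe": (jigsaw_2019["target"] >= 0.5).astype(int),
})

combined = pd.concat([df_2018, df_2019], ignore_index=True).dropna()
combined = combined.drop_duplicates(subset="comment_text")

# Downsample the (much larger) safe class to match the unsafe class,
# giving the ~50/50 balanced dataset described above.
unsafe = combined[combined["unsafe"] == 1]
safe = combined[combined["unsafe"] == 0].sample(n=len(unsafe), random_state=42)
balanced = pd.concat([safe, unsafe]).sample(frac=1, random_state=42)  # shuffle
```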

Classification

Four different models were experimented with in this project; a minimal training sketch follows the list below:

  • Random Forests
  • Logistic Regression
  • Multilayer Perceptron
  • Deep Learning
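
As a rough sketch of how one of these baselines could be trained, the snippet below fits a TF-IDF plus logistic regression pipeline with scikit-learn. The variable `balanced` refers to the hypothetical DataFrame from the data-processing sketch above; this is not the project's actual training code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# "balanced" is the hypothetical 50/50 DataFrame from the data-processing sketch.
X_train, X_test, y_train, y_test = train_test_split(
    balanced["comment_text"], balanced["unsafe"], test_size=0.2, random_state=42
)

# TF-IDF features feeding a logistic regression classifier; the random forest
# baseline can be tried by swapping out the final estimator.
model = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=50_000)),
    ("clf", LogisticRegression(max_iter=1000)),
])

model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # held-out accuracy
```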

The results of these techniques were then compared. Additionally, a few natural language processing (NLP) preprocessing techniques were applied to some of the models; they are illustrated in the sketch after the following list:

  • Alphanumeric tokens only
  • Stopword removal
  • Nouns only
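
The sketch below shows what these three token filters might look like using NLTK; the exact implementation used in this project may differ.

```python
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# One-time downloads: tokenizer, stopword list, and part-of-speech tagger.
nltk.download("punkt")
nltk.download("stopwords")
nltk.download("averaged_perceptron_tagger")

STOPWORDS = set(stopwords.words("english"))

def alphanumeric_only(text):
    # Keep only tokens made up of letters and digits.
    return [t for t in word_tokenize(text.lower()) if t.isalnum()]

def remove_stopwords(tokens):
    # Drop common function words ("the", "and", ...).
    return [t for t in tokens if t not in STOPWORDS]

def nouns_only(tokens):
    # Keep tokens whose part-of-speech tag marks them as nouns (NN, NNS, ...).
    return [t for t, tag in nltk.pos_tag(tokens) if tag.startswith("NN")]

tokens = alphanumeric_only("You are all wonderful people, keep it up!")
print(remove_stopwords(tokens))
print(nouns_only(tokens))
```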

Results

The following table summarizes the results achieved by the best models from each model category.

Method               Accuracy  Precision  Recall
Random Forests       0.854074  0.854281   0.854085
Logistic Regression  0.870507  0.871464   0.870392
Deep Learning        0.890919  0.891543   0.891011
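
Metrics like these could be reproduced from a fitted model with scikit-learn, as in the sketch below. The macro averaging of precision and recall is an assumption, since the README does not state how the scores were averaged.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# y_test and X_test come from the hypothetical held-out split above.
y_pred = model.predict(X_test)
print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred, average="macro"))
print("Recall   :", recall_score(y_test, y_pred, average="macro"))
```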