Toxic Comment Classification Challenge

This repository contains the solutions we developed for the Toxic Comment Classification Challenge on Kaggle. The challenge was to predict which tags apply to an online comment. The possible tags were:

toxic
severe_toxic
obscene
threat
insult
identity_hate
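
Because a single comment can carry several of these tags at once, the task is a multi-label one, and each comment's target is naturally a vector of six binary flags. The sketch below illustrates that encoding. The helper names `encode_tags` and `decode_tags` and the 0.5 threshold are assumptions made for illustration, not code taken from this repository.

```python
from typing import Iterable, List, Sequence

# Label order as listed above; the fixed order defines the target columns.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def encode_tags(tags: Iterable[str]) -> List[int]:
    """Turn a collection of tag names into a 6-element binary target vector."""
    tag_set = set(tags)
    unknown = tag_set - set(LABELS)
    if unknown:
        raise ValueError(f"unknown tags: {sorted(unknown)}")
    return [1 if label in tag_set else 0 for label in LABELS]


def decode_tags(probabilities: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Map per-label probabilities back to tag names (threshold is illustrative)."""
    return [label for label, p in zip(LABELS, probabilities) if p >= threshold]
```

A comment with no tags encodes to an all-zero vector. The competition metric is computed on the raw per-label probabilities, so a threshold like the one in `decode_tags` is only needed when hard decisions are required.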

Solution

We implemented three sequence models and compared their performance: an implementation of BERT, a bidirectional GRU followed by a Capsule layer (CapsuleNet), and a baseline model with an LSTM layer. The mean column-wise ROC AUC score on Kaggle for each model was:

BERT: 0.98437
CapsuleNet: 0.9765
Baseline: 0.9739
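
To make the metric concrete, here is a minimal sketch of a mean column-wise ROC AUC: the ROC AUC is computed separately for each label column and the results are averaged. It is written in plain Python for self-containment and is not the contents of `avg_auroc.py`. The function names are hypothetical, and the toy data is invented for illustration. In practice a library routine such as scikit-learn's `roc_auc_score` would likely be used for the per-column step.

```python
from typing import List, Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outranks a random
    negative, counting ties as half a win (pairwise formulation)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC is undefined when only one class is present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_columnwise_roc_auc(
    y_true: List[List[int]], y_score: List[List[float]]
) -> float:
    """Average the per-label ROC AUC over all label columns."""
    n_labels = len(y_true[0])
    aucs = [
        roc_auc([row[j] for row in y_true], [row[j] for row in y_score])
        for j in range(n_labels)
    ]
    return sum(aucs) / n_labels


# Toy example with two label columns and four comments (invented data).
toy_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_score = [[0.9, 0.2], [0.1, 0.8], [0.8, 0.3], [0.3, 0.4]]
print(mean_columnwise_roc_auc(toy_true, toy_score))  # 0.875
```

In the toy data the first column is ranked perfectly (AUC 1.0) and the second has one misordered pair out of four (AUC 0.75), giving a mean of 0.875. The pairwise loop is quadratic in the number of comments, so it is suitable for illustration only.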

