# Toxic Comment Classification Challenge
Identify and classify toxic online comments

![Toxic Comments](https://storage.googleapis.com/kaggle-media/competitions/jigsaw/003-avatar.png)

Discussing things you care about can be difficult. The threat of abuse and harassment online means that many people stop expressing themselves and give up on seeking different opinions. Platforms struggle to effectively facilitate conversations, leading many communities to limit or completely shut down user comments.

The [Conversation AI](https://conversationai.github.io/) team, a research initiative founded by [Jigsaw](https://jigsaw.google.com/) and Google (both part of Alphabet), is working on tools to help improve online conversation. One area of focus is the study of negative online behaviors, like toxic comments (i.e., comments that are rude, disrespectful, or otherwise likely to make someone leave a discussion). So far they’ve built a range of publicly available models served through the [Perspective API](https://perspectiveapi.com/), including one for toxicity. But the current models still make errors, and they don’t allow users to select which types of toxicity they’re interested in finding (e.g., some platforms may be fine with profanity, but not with other types of toxic content).

In this competition, you’re challenged to build a multi-headed model that’s capable of detecting different types of toxicity, like threats, obscenity, insults, and identity-based hate, better than Perspective’s [current models](https://github.com/conversationai/unintended-ml-bias-analysis). You’ll be using a dataset of comments from Wikipedia’s talk page edits. Improvements to the current model will hopefully help online discussion become more productive and respectful.

_Disclaimer: the dataset for this competition contains text that may be considered profane, vulgar, or offensive._

Dataset Description
-------------------

You are provided with a large number of Wikipedia comments which have been labeled by human raters for toxic behavior. The types of toxicity are:

*   `toxic`
*   `severe_toxic`
*   `obscene`
*   `threat`
*   `insult`
*   `identity_hate`

You must create a model which predicts a probability of each type of toxicity for each comment.
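
To make the expected output concrete, here is a minimal sketch of a prediction table: one row per comment `id` and one probability column per label. The ids and probability values are invented for illustration; only the six column names come from the list above.

```python
import pandas as pd

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Hypothetical per-label probabilities for two comments (illustrative values only).
fake_probs = {
    "toxic": [0.91, 0.03],
    "severe_toxic": [0.12, 0.00],
    "obscene": [0.75, 0.01],
    "threat": [0.02, 0.00],
    "insult": [0.64, 0.02],
    "identity_hate": [0.05, 0.00],
}

# Hypothetical comment ids; real ids would come from test.csv.
ids = ["example_id_1", "example_id_2"]

# One row per comment, columns in the same order as the label list.
predictions = pd.DataFrame(fake_probs, index=pd.Index(ids, name="id"))[LABELS]

# Every entry should be a probability in [0, 1].
assert ((predictions >= 0) & (predictions <= 1)).all().all()
print(predictions)
```

The labels are not mutually exclusive, so the six probabilities in a row do not need to sum to 1.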

File descriptions
-----------------

*   **train.csv** - the training set; contains comments with their binary labels
*   **test.csv** - the test set; you must predict the toxicity probabilities for these comments. To deter hand labeling, the test set contains some comments which are not included in scoring.
*   **sample\_submission.csv** - a sample submission file in the correct format
*   **test\_labels.csv** - labels for the test data; a value of `-1` indicates that the comment was not used for scoring (**Note:** file added after the competition closed)
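
Because unscored test comments are marked with `-1`, any local evaluation against `test_labels.csv` should drop those rows first. The sketch below uses a tiny invented stand-in for the file and assumes a comment counts as scored only when none of its six labels is `-1`.

```python
import pandas as pd

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Tiny stand-in for test_labels.csv; ids and values are invented for illustration.
rows = {
    "a": [0, 0, 0, 0, 0, 0],
    "b": [-1, -1, -1, -1, -1, -1],  # not used for scoring
    "c": [1, 0, 1, 0, 1, 0],
}
test_labels = pd.DataFrame.from_dict(rows, orient="index", columns=LABELS)
test_labels.index.name = "id"

# Assumption: a row is scored only if no label column holds -1.
scored_mask = (test_labels[LABELS] != -1).all(axis=1)
scored_labels = test_labels[scored_mask]

print(scored_labels.index.tolist())  # prints ['a', 'c']
```

The same boolean mask can be applied to a prediction frame indexed by `id`, so that predictions and labels stay aligned.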

Usage
-----

The dataset is released under [CC0](https://creativecommons.org/share-your-work/public-domain/cc0/), with the underlying comment text governed by [Wikipedia's CC BY-SA 3.0 license](https://creativecommons.org/licenses/by-sa/3.0/).

Link: https://www.kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge

In [None]:
import pandas as pd
import numpy as np
from fastai.text.all import *
from tqdm.notebook import tqdm
from sklearn.model_selection import train_test_split

In [None]:
%load_ext nb_black

In [None]:
sample_submission_df = pd.read_csv(
    "../../data/jigsaw-toxic-comment-classification-challenge/sample_submission.csv"
).set_index("id")
sample_submission_df

In [None]:
test_df = pd.read_csv(
    "../../data/jigsaw-toxic-comment-classification-challenge/test.csv"
).set_index("id")
test_df

In [None]:
test_labels_df = pd.read_csv(
    "../../data/jigsaw-toxic-comment-classification-challenge/test_labels.csv"
).set_index("id")
test_labels_df

In [None]:
train_df = pd.read_csv(
    "../../data/jigsaw-toxic-comment-classification-challenge/train.csv"
).set_index("id")
train_df

In [None]:
train_df[
    ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
].mean()
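
The mean of a binary label column is its positive rate, which also fixes the accuracy of a trivial model that always predicts "not toxic". The sketch below works that arithmetic through on a hypothetical label column (1 positive in 20 comments); the numbers are illustrative and are not taken from `train_df`.

```python
def always_negative_accuracy(labels):
    """Accuracy of a classifier that predicts 0 for every comment."""
    labels = list(labels)
    positive_rate = sum(labels) / len(labels)
    # Every negative example is classified correctly, every positive one is missed.
    return 1 - positive_rate


# Hypothetical label column: 1 positive out of 20 comments (5% positive rate).
example_labels = [1] + [0] * 19
baseline = always_negative_accuracy(example_labels)
print(f"{baseline:.2f}")  # prints 0.95
```

For a rare label, an accuracy figure is only informative when compared against this baseline.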

# Train

In [None]:
BATCH_SIZE = 64

## Toxic

In [None]:
toxic_dls = TextDataLoaders.from_df(
    train_df[["comment_text", "toxic"]],
    valid_pct=0.2,
    seed=42,
    bs=BATCH_SIZE,
)

In [None]:
toxic_dls.train.show_batch()

In [None]:
# https://docs.fast.ai/tutorial.text.html
learn = text_classifier_learner(toxic_dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn
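
The learner above reports `accuracy`. This notebook does not state the competition metric; to the best of my knowledge the Kaggle leaderboard used mean column-wise ROC AUC, so treat that as an assumption. As a hedged illustration of why the two metrics can disagree, here is a small pure-Python ROC AUC (the pairwise, Mann-Whitney formulation) applied to invented scores. The helper `roc_auc` is hypothetical and is not part of fastai.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented labels: three non-toxic comments and one toxic comment.
labels = [0, 0, 0, 1]

# A model that gives every comment the same score cannot rank anything.
constant_scores = [0.0, 0.0, 0.0, 0.0]
# A model that scores the toxic comment highest ranks perfectly.
ranked_scores = [0.1, 0.4, 0.2, 0.9]

print(roc_auc(labels, constant_scores))  # prints 0.5
print(roc_auc(labels, ranked_scores))    # prints 1.0
```

The constant scorer would still reach 75% accuracy on these four comments at a 0.5 threshold, while its AUC sits at chance level. This pairwise version is quadratic in the number of examples and is meant only for small checks.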

In [None]:
learn.fine_tune(4, 1e-2)

In [None]:
learn.show_results()

In [None]:
learn.save("toxic")

## Severe toxic

In [None]:
severe_toxic_dls = TextDataLoaders.from_df(
    train_df[["comment_text", "severe_toxic"]],
    valid_pct=0.2,
    seed=42,
    bs=BATCH_SIZE,
)

In [None]:
learn = text_classifier_learner(
    severe_toxic_dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy
)
learn

In [None]:
learn.fine_tune(4, 1e-2)

In [None]:
learn.show_results()

In [None]:
learn.save("severe_toxic")

## Obscene

In [None]:
obscene_dls = TextDataLoaders.from_df(
    train_df[["comment_text", "obscene"]],
    valid_pct=0.2,
    seed=42,
    bs=BATCH_SIZE,
)

In [None]:
learn = text_classifier_learner(obscene_dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn

In [None]:
learn.fine_tune(4, 1e-2)

In [None]:
learn.show_results()

In [None]:
learn.save("obscene")

## Threat

In [None]:
threat_dls = TextDataLoaders.from_df(
    train_df[["comment_text", "threat"]],
    valid_pct=0.2,
    seed=42,
    bs=BATCH_SIZE,
)

In [None]:
learn = text_classifier_learner(threat_dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn

In [None]:
learn.fine_tune(4, 1e-2)

In [None]:
learn.show_results()

In [None]:
learn.save("threat")

## Insult

In [None]:
insult_dls = TextDataLoaders.from_df(
    train_df[["comment_text", "insult"]],
    valid_pct=0.2,
    seed=42,
    bs=BATCH_SIZE,
)

In [None]:
learn = text_classifier_learner(insult_dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn

In [None]:
learn.fine_tune(4, 1e-2)

In [None]:
learn.show_results()

In [None]:
learn.save("insult")

## Identity hate

In [None]:
identity_hate_dls = TextDataLoaders.from_df(
    train_df[["comment_text", "identity_hate"]],
    valid_pct=0.2,
    seed=42,
    bs=BATCH_SIZE,
)

In [None]:
learn = text_classifier_learner(
    identity_hate_dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy
)
learn

In [None]:
learn.fine_tune(4, 1e-2)

In [None]:
learn.show_results()

In [None]:
learn.save("identity_hate")