# Toxic Comment Classification Challenge

<font color=red>**Warning:**</font>
The content of many of these comments is incredibly inappropriate and will be offensive. In the [View results](#view-results) section we examine some of the model's predictions, which will expose some of these terrible comments.

See https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge

In [1]:
# Reload modules before executing user code
%reload_ext autoreload
# Reload all modules (except those excluded by %aimport)
%autoreload 2
# Show plots within this notebook
%matplotlib inline

## Load training and test data into pandas dataframes

In [2]:
PATH='download/'
test_csv = f'{PATH}test.csv'
train_csv = f'{PATH}train.csv'
sample_submission_csv = f'{PATH}sample_submission.csv'

In [3]:
import pandas as pd

train_df = pd.read_csv(train_csv, na_filter=False)
test_df = pd.read_csv(test_csv, na_filter=False)
submission_df = pd.read_csv(sample_submission_csv, nrows=0) # copy column headers
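Both `read_csv` options in the cell above affect what we see later. The toy sketch below uses a made-up in-memory CSV to show what they do. `nrows=0` returns only the column headers, which gives an empty frame to fill with predictions. `na_filter=False` keeps empty fields as empty strings instead of turning them into NaN.

```python
import io

import pandas as pd

# Made-up CSV text standing in for the competition files; the second comment is empty
csv_text = "id,comment_text,toxic\nabc,hello,0\ndef,,1\n"

# nrows=0 parses only the header row: an empty frame that keeps the column names
header_only = pd.read_csv(io.StringIO(csv_text), nrows=0)

# By default the empty comment becomes NaN ...
default_df = pd.read_csv(io.StringIO(csv_text))
# ... while na_filter=False keeps it as an empty string
no_filter_df = pd.read_csv(io.StringIO(csv_text), na_filter=False)

print(list(header_only.columns), len(header_only))
print(repr(default_df.loc[1, "comment_text"]), repr(no_filter_df.loc[1, "comment_text"]))
```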

## Explore the data

Let's examine the labels and how the data is organized. First, add a "none" column that is 1 for comments with no labels and 0 otherwise.

In [4]:
label_cols = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']
train_df['none'] = 1-train_df[label_cols].max(axis=1)
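`1 - max(axis=1)` is 1 exactly when every label in a row is 0. The sketch below checks this on a made-up four-row frame, not the real data. It also counts how many labels each comment carries, because a comment can have several. On the real data, the same `sum(axis=1).value_counts()` call would apply to `train_df[label_cols]`.

```python
import pandas as pd

label_cols = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Hypothetical labels for four comments (not taken from the dataset)
toy = pd.DataFrame(
    [[1, 0, 0, 1, 0, 0],   # toxic and a threat
     [0, 0, 0, 0, 0, 0],   # clean
     [1, 0, 1, 0, 1, 0],   # toxic, obscene, and an insult
     [0, 0, 0, 0, 0, 0]],  # clean
    columns=label_cols,
)
toy["none"] = 1 - toy[label_cols].max(axis=1)

# Number of labels attached to each comment, then how many comments have each count
labels_per_comment = toy[label_cols].sum(axis=1)
print(toy["none"].tolist())
print(labels_per_comment.value_counts().sort_index())
```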

In [5]:
train_df.loc[train_df['threat'] == 1].head(1)

| | id | comment_text | toxic | severe_toxic | obscene | threat | insult | identity_hate | none |
|---|---|---|---|---|---|---|---|---|---|
| 79 | 003217c3eb469ba9 | Hi! I am back again!\nLast warning!\nStop undo... | 1 | 0 | 0 | 1 | 0 | 0 | 0 |


The labels are all binary (0 or 1), so they are already on the same scale and won't need to be standardized. Notice that a comment can have multiple labels (e.g., the comment above is both toxic and a threat). Let's now take a quick look at the shape of our dataset to get an idea of scale:

In [6]:
train_df.shape

(159571, 9)

There are 159,571 rows, one per comment, and 9 columns: `id`, `comment_text`, the 6 label columns we will use to characterize comments, and the `none` column we just added. But how many comments actually fall into each of these categories?

In [7]:
train_df.describe()

| | toxic | severe_toxic | obscene | threat | insult | identity_hate | none |
|---|---|---|---|---|---|---|---|
| count | 159571.0 | 159571.0 | 159571.0 | 159571.0 | 159571.0 | 159571.0 | 159571.0 |
| mean | 0.095844 | 0.009996 | 0.052948 | 0.002996 | 0.049364 | 0.008805 | 0.898321 |
| std | 0.294379 | 0.099477 | 0.223931 | 0.05465 | 0.216627 | 0.09342 | 0.302226 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 50% | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 75% | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| max | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |


There are some interesting statistics here. First, zeros dominate our dataset. For every label, the 25th, 50th, and 75th percentiles are all 0 and the mean is close to zero. The `none` column is the exception, with a mean of about 0.90, because roughly 89.8% of comments have no label at all. Because the values are zeros and ones, each mean is the fraction of comments with that label. Multiplying by 100 gives a percentage (i.e., 9.58% of comments are toxic, 1.0% are severely toxic, 5.3% are obscene, etc.). Lastly, let's do a quick check to make sure none of the values in our dataset are null:

In [8]:
train_df.info() # verify that there are no missing values in our dataset

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 159571 entries, 0 to 159570
Data columns (total 9 columns):
id               159571 non-null object
comment_text     159571 non-null object
toxic            159571 non-null int64
severe_toxic     159571 non-null int64
obscene          159571 non-null int64
threat           159571 non-null int64
insult           159571 non-null int64
identity_hate    159571 non-null int64
none             159571 non-null int64
dtypes: int64(7), object(2)
memory usage: 11.0+ MB


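Great! We have no null values (like NaNs) in our dataset. One caveat: we read the CSVs with `na_filter=False`, so pandas never converts empty fields to NaN, and this check cannot catch empty comments in the text columns. The label columns were parsed as `int64`, which does confirm that they have no missing values. Now that we have a good grip on what we are working with, let's figure out how to analyze the data.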

## Methods for analysis

This looks like a multilabel text classification problem, which can be solved in a variety of ways.

**(1) Problem transformation methods**

Problem transformation transforms the multilabel input into a representation suitable for single-label classification methods.

* **Binary Relevance** - Independently train one binary classifier for each label. The drawback of this method is that it does not take into account label correlation.

* **Label Powerset** - Generate a new class for every combination of labels and then use multiclass classification. Unlike binary relevance, this method takes into account label correlation. However, it can produce a large number of classes (up to 64 for our six labels) with few examples per class.

* **Classifier Chains** - Based on Binary Relevance, but the predictions of the binary classifiers are cascaded along a chain as additional features. This method takes into account label correlation, but the results depend on the order of the classifiers in the chain.
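To make the three transformations concrete, the stdlib-only sketch below applies them to a made-up label matrix with three of the six labels. It only builds the targets or inputs each method would use and does not train any classifiers.

```python
# Toy label matrix: rows are comments, columns are (toxic, obscene, insult).
# The values are made up for illustration.
labels = ["toxic", "obscene", "insult"]
Y = [
    (1, 1, 0),
    (0, 0, 0),
    (1, 0, 1),
    (1, 1, 0),
]

# Binary Relevance: one independent 0/1 target per label
binary_relevance = {name: [row[j] for row in Y] for j, name in enumerate(labels)}

# Label Powerset: each distinct label combination becomes one class id
powerset_classes = {}
powerset_targets = [powerset_classes.setdefault(row, len(powerset_classes)) for row in Y]

# Classifier Chains: the j-th classifier sees the input features plus labels 0..j-1
# (true labels while training, predicted ones at prediction time).
# Here we only list which extra label inputs each link in the chain receives.
chain_inputs = {name: labels[:j] for j, name in enumerate(labels)}

print(binary_relevance)
print(powerset_targets, powerset_classes)
print(chain_inputs)
```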

**(2) Algorithm adaptation methods**

Algorithm adaptation extends existing single-label classification algorithms to handle multilabel data directly.

- - - - - -

We are going to use the Binary Relevance method in order to classify comments into the different categories. We encourage readers to look into the other methods as well.

## Separate target features (y) from input features (X) 


Use `sklearn.model_selection.train_test_split` to hold out 20% of the training data as a validation set.

In [9]:
from sklearn.model_selection import train_test_split 

X = train_df['comment_text']
y = train_df[['obscene','insult','toxic','severe_toxic','identity_hate','threat']]

X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=1)
X_test = test_df['comment_text']

## Create a TF-IDF matrix

TF-IDF counts how many times each word appears in a comment (term frequency). It multiplies that count by a weight that is smaller for words appearing in many comments (inverse document frequency), so common words like "the" count for less than distinctive ones. Better explained here: https://www.quora.com/How-does-TfidfVectorizer-work-in-laymans-terms

Here we transform our input data into a TF-IDF-weighted document-term matrix, with one row per comment and one column per vocabulary term.
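With its default settings, scikit-learn's `TfidfVectorizer` computes the weight of a term t in document d as count(t, d) × idf(t). The idf uses idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) is the number of documents containing t. It then scales each row to unit L2 length. The stdlib sketch below reproduces that on a three-sentence toy corpus. The function name `tfidf` is ours, and this is a simplified illustration, not scikit-learn's implementation.

```python
import math
import re
from collections import Counter


def tfidf(docs):
    """Toy TF-IDF mirroring TfidfVectorizer's defaults: lowercased tokens of 2+ word
    characters, raw term counts, smoothed idf = ln((1 + n) / (1 + df)) + 1,
    and rows scaled to unit L2 norm."""
    token_re = re.compile(r"(?u)\b\w\w+\b")  # TfidfVectorizer's default token_pattern
    tokenized = [token_re.findall(doc.lower()) for doc in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

    rows = []
    for tokens in tokenized:
        weights = {term: count * idf[term] for term, count in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0  # guard empty docs
        rows.append({term: w / norm for term, w in weights.items()})
    return rows, idf


rows, idf = tfidf(["The cat sat", "the dog sat", "the cat ran"])
print(round(idf["the"], 4), round(idf["cat"], 4), round(idf["dog"], 4))
print({term: round(w, 4) for term, w in sorted(rows[0].items())})
```

"the" appears in every sentence, so its idf is 1, the minimum possible. "dog" appears in only one sentence and gets ln 2 + 1 ≈ 1.69. With scikit-learn installed, `TfidfVectorizer().fit_transform` on the same three sentences should yield the same weights.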

In [10]:
from sklearn.feature_extraction.text import TfidfVectorizer

# TODO: Research TfidfVectorizer and figure out what would be useful parameters to pass
vectorizer = TfidfVectorizer()

X_train_docterm = vectorizer.fit_transform(X_train)

# Transform validation and test data with the vocabulary and IDF weights learned
# from the training data, so every matrix has the same columns
# TODO: Ultimately we are going to want to use cross-validation so we don't lose out on training data
X_valid_docterm = vectorizer.transform(X_valid)
X_test_docterm = vectorizer.transform(X_test)
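The cell above fits the vectorizer on the training split only and calls `transform` on the others. This keeps every matrix on the same columns and keeps validation and test words out of the vocabulary and IDF weights. The sketch below uses made-up comments to show that a word seen only in the validation text is simply ignored.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical comments; the validation text contains a word ("zebra") never seen in training
train_text = ["thanks for the edit", "please stop vandalizing"]
valid_text = ["thanks zebra"]

vectorizer = TfidfVectorizer()
train_matrix = vectorizer.fit_transform(train_text)  # learns vocabulary and idf from training only
valid_matrix = vectorizer.transform(valid_text)      # reuses them; unseen words are dropped

print(train_matrix.shape, valid_matrix.shape)
print(sorted(vectorizer.vocabulary_))
```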

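In [11]:
# Examine the vocabulary and document-term matrix together, but keep the matrix sparse:
# X_train_docterm.toarray() would build a dense 127656 x 165609 float64 array (roughly 169 GB).
# get_feature_names_out() replaces get_feature_names(), which was removed in scikit-learn 1.2.
feature_names = vectorizer.get_feature_names_out()

In [12]:
X_train_docterm.shape

(127656, 165609)

In [13]:
# Show the non-zero TF-IDF weights of the first training comment, one column per term
first_row = X_train_docterm[0]
pd.DataFrame([first_row.data], columns=feature_names[first_row.indices]).sort_index(axis=1)

| | allowed | attack | be | blocked | but | comments | definitely | editor | editors | if | ... | that | the | their | they | this | to | ve | while | will | won |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.200271 | 0.184008 | 0.154242 | 0.299486 | 0.088722 | 0.159946 | 0.21692 | 0.165467 | 0.156545 | 0.171023 | ... | 0.066082 | 0.049339 | 0.266064 | 0.223291 | 0.07312 | 0.161087 | 0.133756 | 0.15515 | 0.105123 | 0.18268 |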


## Problem transformation

Train one binary classifier for each label. This is called binary relevance. 

In [14]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

logreg = LogisticRegression(C=4, dual=True, solver='liblinear')  # dual=True is only supported by the liblinear solver

In [15]:
# Train and check the prediction accuracy on the validation dataset
for label in y_train:
    y_train_col = y_train[label]
    logreg.fit(X_train_docterm, y_train_col)
    y_valid_col = y_valid[label]
    y_pred = logreg.predict(X_valid_docterm)
    print("Validation accuracy for {} comments is {}".format(label, accuracy_score(y_valid_col, y_pred)))  

Validation accuracy for obscene comments is 0.9786620711264296
Validation accuracy for insult comments is 0.9722387592041359
Validation accuracy for toxic comments is 0.9609901300328999
Validation accuracy for severe_toxic comments is 0.9904120319598935
Validation accuracy for identity_hate comments is 0.9924800250665831
Validation accuracy for threat comments is 0.9972426758577472
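Because the labels are so imbalanced, these accuracies are less impressive than they look. A model that predicts 0 for every comment already scores 1 minus the label's mean. The stdlib sketch below compares the two, using the means from `describe()` and the accuracies printed above. The baseline uses the label rates of the full training set, so it only approximates the rates in the validation split. The competition itself scores submissions by mean column-wise ROC AUC, not accuracy.

```python
# Label means from train_df.describe() above, i.e. the fraction of comments with each label
label_means = {
    "obscene": 0.052948, "insult": 0.049364, "toxic": 0.095844,
    "severe_toxic": 0.009996, "identity_hate": 0.008805, "threat": 0.002996,
}
# Validation accuracies printed by the training loop above
valid_accuracy = {
    "obscene": 0.9786620711264296, "insult": 0.9722387592041359,
    "toxic": 0.9609901300328999, "severe_toxic": 0.9904120319598935,
    "identity_hate": 0.9924800250665831, "threat": 0.9972426758577472,
}

# A "classifier" that always predicts 0 is right for every comment without the label
baseline = {label: 1 - mean for label, mean in label_means.items()}

for label, acc in valid_accuracy.items():
    print(f"{label:>13}: model {acc:.4f}  always-0 baseline {baseline[label]:.4f}  "
          f"gain {acc - baseline[label]:+.4f}")
```

For `threat` and `severe_toxic`, the model beats the always-0 baseline by less than 0.001.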


In [16]:
# Make predictions on test dataset
for label in y_train:
    # predict_proba returns one column per class, ordered [P(label = 0), P(label = 1)]
    y_prob_test = logreg.predict_proba(X_test_docterm)[:, 1] # we only want the probability that it has this label
    submission_df[label] = y_prob_test
    print("Prediction for {} comments is {}".format(label, y_prob_test)) 

Prediction for obscene comments is [0.08198686 0.00013583 0.00096841 ... 0.00041707 0.00052045 0.00222638]
Prediction for insult comments is [0.08198686 0.00013583 0.00096841 ... 0.00041707 0.00052045 0.00222638]
Prediction for toxic comments is [0.08198686 0.00013583 0.00096841 ... 0.00041707 0.00052045 0.00222638]
Prediction for severe_toxic comments is [0.08198686 0.00013583 0.00096841 ... 0.00041707 0.00052045 0.00222638]
Prediction for identity_hate comments is [0.08198686 0.00013583 0.00096841 ... 0.00041707 0.00052045 0.00222638]
Prediction for threat comments is [0.08198686 0.00013583 0.00096841 ... 0.00041707 0.00052045 0.00222638]


## View results

<font color=red>**Warning:**</font>
The content of many of these comments is incredibly inappropriate and will be offensive. In this section we examine some of the model's predictions, which will expose some of these terrible comments.

In [17]:
# Prepare submission
submission_df['id'] = test_df['id'].tolist()
submission_df.head(1)

| | id | toxic | severe_toxic | obscene | threat | insult | identity_hate |
|---|---|---|---|---|---|---|---|
| 0 | 00001cee341fdb12 | 0.081987 | 0.081987 | 0.081987 | 0.081987 | 0.081987 | 0.081987 |


### Look at a comment with a high probability of being labeled toxic

In [18]:
print(test_df.loc[submission_df['toxic'] > 0.9].head(3))

                    id                                       comment_text
277   00795a46cc1c7816  T IS PEOPLE LIKE YOU THAT MAKE WIKIPEDIA HORRI...
4318  075a19ec9c94796e  WHY ARE YOU BLOCKED ME, YOU LITTHE... BECAUSE ...
8067  0d8d8d475b4b676a                                DIE! DIE! DIE! DIE!


Double-check that the predicted probability of the first of these comments being toxic is above 0.9.

In [19]:
print(submission_df.loc[submission_df['id'] == '00795a46cc1c7816'][['toxic']])

       toxic
277  0.99468


Check the probabilities of several other labels for this comment.

In [20]:
print(submission_df.loc[submission_df['id'] == '00795a46cc1c7816'][['severe_toxic', 'obscene', 'threat', 'insult']])

     severe_toxic  obscene   threat   insult
277       0.99468  0.99468  0.99468  0.99468


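TODO: Fix the prediction loop. These probabilities are all the same because the loop in In [16] never refits `logreg`. Every label is therefore predicted by the one model that was fit last in In [15], which was trained on `threat`. (The model is logistic regression, not a neural net.) As a result, every column of `submission_df` actually holds threat probabilities, including the `toxic` column used to pick the comments above. The submission file saved below inherits the same problem until each label is predicted by its own fitted model.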
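One way to get distinct per-label probabilities is to fit and keep a separate model for each label, then predict each column with its own model. The sketch below shows this binary relevance pattern on made-up comments and labels, which stand in for `X_train_docterm`, `y_train`, and `X_test_docterm`. `models` is a hypothetical name of ours. In the notebook, the same idea would mean storing each fitted model in In [15] and calling `models[label].predict_proba` in In [16].

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Made-up stand-ins for the notebook's training comments, labels, and test comments
train_text = [
    "you are an idiot", "have a nice day", "i will hurt you",
    "thanks for the edit", "stupid idiot", "i will find you and hurt you",
    "great article", "what an idiot",
]
y_train = pd.DataFrame({
    "insult": [1, 0, 0, 0, 1, 0, 0, 1],
    "threat": [0, 0, 1, 0, 0, 1, 0, 0],
})
test_text = ["you idiot", "i will hurt you"]

vectorizer = TfidfVectorizer()
X_train_docterm = vectorizer.fit_transform(train_text)
X_test_docterm = vectorizer.transform(test_text)

# Binary relevance: fit and keep one independent model per label
models = {}
for label in y_train.columns:
    model = LogisticRegression(C=4, dual=True, solver="liblinear")
    model.fit(X_train_docterm, y_train[label])
    models[label] = model

# Predict each label with its own model; column 1 of predict_proba is P(label = 1)
submission = pd.DataFrame({
    label: models[label].predict_proba(X_test_docterm)[:, 1]
    for label in y_train.columns
})
print(submission)
```

Each column now comes from a different model, so "you idiot" should score higher for `insult` than for `threat`, and "i will hurt you" should score the other way round.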

### Look at a comment with a low probability of being labeled toxic

In [21]:
print(test_df.loc[submission_df['toxic'] < 0.05].head(3))

                 id                                       comment_text
1  0000247867823ef7  == From RfC == \n\n The title is fine as it is...
2  00013b17ad220c46  " \n\n == Sources == \n\n * Zawe Ashton on Lap...
3  00017563c3f7919a  :If you have a look back at the source, the in...


In [22]:
print(test_df.loc[submission_df['id'] == '0000247867823ef7'].comment_text.values)

['== From RfC == \n\n The title is fine as it is, IMO.']


## Save results to CSV for submission

In [23]:
submission_df.to_csv('submission.csv', index=False)