# Using CapsNet and Keras to Identify Toxic Online Comments

[Nick Walsh](https://twitter.com/thenickwalsh) 
-- Developer Evangelist, [Datmo](https://datmo.com)

### Background

The purpose of this notebook is to showcase **Capsule Layers** / **CapsNet** on an NLP problem, where the context and relative position of words in the data are important. You can read more about CapsNets here [1](https://towardsdatascience.com/capsule-neural-networks-are-here-to-finally-recognize-spatial-relationships-693b7c99b12), [2](https://hackernoon.com/what-is-a-capsnet-or-capsule-network-2bfbe48769cc), [3](https://en.wikipedia.org/wiki/Capsule_neural_network). In this case, we'll be classifying online comments as toxic or nontoxic. Toxic comments can additionally be tagged with one or more of the different "styles" of toxicity mentioned below. We'll be using **Datmo**'s [open source CLI](https://github.com/datmo/datmo) to set up our environment and to track our experiment results and repository states along the way.

The specific goal of this model, as mentioned in the original [Kaggle competition](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge) prompt, is to create a "multi-headed model that’s capable of detecting different types of of toxicity like threats, obscenity, insults, and identity-based hate". The __training and test datasets__ consist of comments from Wikipedia’s talk page edits.
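Here "multi-headed" means six independent yes/no outputs: `get_model()` ends in `Dense(6, activation='sigmoid')` and is trained with `binary_crossentropy`, so a comment can score high on several labels at once. A small NumPy sketch of that idea, using made-up logits (hypothetical values, not model output):

```python
import numpy as np

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over every (comment, label) pair."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return losses.mean()


# Hypothetical logits for two comments: one row per comment, one column per label
logits = np.array([[3.0, -0.5, 2.5, -2.0, 2.0, -1.5],
                   [-4.0, -5.0, -4.5, -5.0, -4.5, -5.0]])
probs = sigmoid(logits)  # each label gets its own independent probability
y_true = np.array([[1, 0, 1, 0, 1, 0],
                   [0, 0, 0, 0, 0, 0]])

print(probs.sum(axis=1))  # rows need not sum to 1, unlike a softmax over classes
print(binary_crossentropy(y_true, probs))
```

Because each output is its own sigmoid, the six probabilities are not competing with each other the way softmax classes would.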

This notebook is a fork of the final [CapsNet+GRU kernel](https://www.kaggle.com/chongjiujjin/capsule-net-with-gru) submitted to the competition by [chongjiujjin](https://www.kaggle.com/chongjiujjin/).

---

## Training the Model

We can start importing libraries right away without installing them on our system, since Datmo uses Docker under the hood to set up an environment for our notebook inside a container.

In [86]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

import os
from keras.layers import Dense,Input,LSTM,Bidirectional,Activation,Conv1D,GRU
from keras.callbacks import Callback
from keras.layers import Dropout,Embedding,GlobalMaxPooling1D, MaxPooling1D, Add, Flatten
from keras.preprocessing import text, sequence
from keras.layers import GlobalAveragePooling1D, GlobalMaxPooling1D, concatenate, SpatialDropout1D
from keras import initializers, regularizers, constraints, optimizers, layers, callbacks
from keras.callbacks import EarlyStopping,ModelCheckpoint
from keras.models import Model
from keras.optimizers import Adam
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import roc_auc_score
print(os.listdir("input"))


['train.csv', 'crawl-300d-2M.vec', 'glove.840B.300d.txt', 'test.csv', 'test_labels.csv']


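In NLP, an _embedding file_ maps each word to a dense vector (here, the 300-dimensional GloVe vectors in `glove.840B.300d.txt`), positioned so that words used in similar contexts have similar vectors. These dense representations convey more than sparse techniques such as 'bag of words', and we'll use them to initialize an _embedding layer_ in our model later on.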
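To see why dense vectors carry more information, compare cosine similarities. In this toy sketch the 3-dimensional "dense" vectors are invented for illustration (real GloVe vectors are 300-dimensional and learned from a large corpus); the point is only that one-hot, bag-of-words-style vectors make every pair of distinct words equally unrelated.

```python
import numpy as np


def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


vocab = ["idiot", "fool", "thanks"]

# Bag-of-words style: each word is its own axis, so distinct words are orthogonal
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}

# Hypothetical dense vectors, made up for this example
dense = {
    "idiot": np.array([0.9, 0.8, -0.1]),
    "fool": np.array([0.8, 0.9, 0.0]),
    "thanks": np.array([-0.7, 0.1, 0.9]),
}

print(cosine(one_hot["idiot"], one_hot["fool"]))  # no notion of similarity at all
print(cosine(dense["idiot"], dense["fool"]))      # high: similar words point the same way
print(cosine(dense["idiot"], dense["thanks"]))    # negative: dissimilar words diverge
```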

In [28]:
EMBEDDING_FILE = 'input/glove.840B.300d.txt'

train= pd.read_csv('input/train.csv')
test = pd.read_csv('input/test.csv')

In [29]:
# Extract the comment text (the id column isn't meaningful to the model), filling any missing comments
train["comment_text"] = train["comment_text"].fillna("fillna")
test["comment_text"] = test["comment_text"].fillna("fillna")
X_train = train["comment_text"].str.lower()
y_train = train[["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]].values

X_test = test["comment_text"].str.lower()
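`fillna` returns a new Series rather than modifying the column in place, so its result has to be assigned back; otherwise a missing comment stays `NaN` through `.str.lower()` and reaches the tokenizer as a float, which it is not designed to handle. A minimal pandas sketch with a hypothetical three-comment Series:

```python
import numpy as np
import pandas as pd

comments = pd.Series(["You are GREAT", np.nan, "Stop It"])

comments.fillna("fillna")              # returns a new Series; `comments` is unchanged
still_missing = comments.isna().sum()  # the NaN is still there

comments = comments.fillna("fillna")   # assign the result to keep it
cleaned = comments.str.lower()

print(still_missing, cleaned.tolist())
```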

In [9]:
# Define model hyperparameters
max_features=100000
maxlen=200
embed_size=300

In [10]:
# Define a callback that reports validation ROC-AUC (the competition's evaluation metric) after each epoch
class RocAucEvaluation(Callback):
    def __init__(self, validation_data=(), interval=1):
        super(RocAucEvaluation, self).__init__()

        self.interval = interval
        self.X_val, self.y_val = validation_data

    def on_epoch_end(self, epoch, logs=None):
        if epoch % self.interval == 0:
            y_pred = self.model.predict(self.X_val, verbose=0)
            score = roc_auc_score(self.y_val, y_pred)
            print("\n ROC-AUC - epoch: {:d} - score: {:.6f}".format(epoch+1, score))
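The competition scored submissions by mean column-wise ROC AUC. When `roc_auc_score` receives a 2-D label matrix like `y_val`, scikit-learn's default `average="macro"` computes one AUC per label column and averages them, so the callback reports that same quantity. A sketch checking this on random, hypothetical data:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical validation labels (200 comments x 6 labels) and noisy scores
y_val = rng.integers(0, 2, size=(200, 6))
y_pred = np.clip(y_val * 0.6 + rng.random((200, 6)) * 0.5, 0, 1)

# One AUC per label column, then the plain mean
per_label = [roc_auc_score(y_val[:, k], y_pred[:, k]) for k in range(y_val.shape[1])]
mean_column_auc = float(np.mean(per_label))

# The 2-D call used inside RocAucEvaluation
combined_auc = roc_auc_score(y_val, y_pred)
print(mean_column_auc, combined_auc)
```

Each label column needs at least one positive and one negative example; a very small validation split could lack positives for a rare label such as `threat`.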

In [32]:
# Tokenize: map each word to an integer index (keeping only the most frequent words), then pad/truncate each comment to maxlen tokens
tok=text.Tokenizer(num_words=max_features,lower=True)
tok.fit_on_texts(list(X_train)+list(X_test))
X_train=tok.texts_to_sequences(X_train)
X_test=tok.texts_to_sequences(X_test)
x_train=sequence.pad_sequences(X_train,maxlen=maxlen)
x_test=sequence.pad_sequences(X_test,maxlen=maxlen)
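A simplified pure-Python sketch of what `Tokenizer` and `pad_sequences` do here. It mirrors the Keras behaviour this notebook relies on (frequency-ranked indices starting at 1, dropping words whose index is `>= num_words`, and the default `padding='pre'` / `truncating='pre'`), but it skips the punctuation filtering the real `Tokenizer` applies, and the function names are our own:

```python
from collections import Counter


def fit_word_index(texts):
    """Rank words by frequency; index 1 is the most frequent (0 is reserved for padding)."""
    counts = Counter(word for t in texts for word in t.lower().split())
    ranked = sorted(counts, key=counts.get, reverse=True)  # stable: ties keep first-seen order
    return {word: i + 1 for i, word in enumerate(ranked)}


def texts_to_sequences(texts, word_index, num_words):
    # Unknown words and words with index >= num_words are dropped
    return [[word_index[w] for w in t.lower().split()
             if w in word_index and word_index[w] < num_words] for t in texts]


def pad_sequences(seqs, maxlen, value=0):
    # Keras defaults: pad at the front and, if too long, cut tokens from the front
    padded = []
    for s in seqs:
        s = s[-maxlen:]  # keep the last maxlen tokens
        padded.append([value] * (maxlen - len(s)) + s)
    return padded


word_index = fit_word_index(["the cat", "the dog"])
seqs = texts_to_sequences(["the dog cat"], word_index, num_words=3)
print(word_index, seqs, pad_sequences(seqs, maxlen=4))
```

Pre-padding keeps the real tokens at the end of each sequence, right next to the final GRU steps.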

In [12]:
# Load the GloVe embeddings
embeddings_index = {}
with open(EMBEDDING_FILE,encoding='utf8') as f:
    for line in f:
        values = line.rstrip().rsplit(' ', embed_size)  # split from the right so words containing spaces stay intact
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs
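Splitting each line from the right with a maxsplit equal to the vector size makes the parser robust to tokens that contain spaces; some entries in the large GloVe files are reported to have them, which breaks a plain `split(' ')`. A standalone sketch of that parsing step (the function name is hypothetical):

```python
import numpy as np


def parse_embedding_line(line, dim=300):
    """Split one 'word v1 v2 ... v_dim' line from the right.

    rsplit with maxsplit=dim leaves any spaces inside the token in the word part,
    so the vector always has exactly `dim` floats.
    """
    parts = line.rstrip().rsplit(" ", dim)
    word, coefs = parts[0], np.asarray(parts[1:], dtype="float32")
    if coefs.shape != (dim,):
        raise ValueError(f"expected {dim} values for {word!r}, got {coefs.shape[0]}")
    return word, coefs


print(parse_embedding_line("hello 0.1 0.2 0.3", dim=3))
print(parse_embedding_line(". . . 1 2 3", dim=3))  # the word itself contains spaces
```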


In [13]:
word_index = tok.word_index
# Prepare the embedding matrix: row i holds the GloVe vector for the word with index i
num_words = min(max_features, len(word_index) + 1)
embedding_matrix = np.zeros((num_words, embed_size))
for word, i in word_index.items():
    if i >= max_features:
        continue
    embedding_vector = embeddings_index.get(word)
    # words not found in the embedding index keep their all-zero row
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector
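The same row-filling logic on a toy scale, wrapped in a hypothetical helper that also counts how many vocabulary words were found in the embedding file (a useful sanity check on coverage). The vocabulary and 2-dimensional vectors are invented for illustration:

```python
import numpy as np


def build_embedding_matrix(word_index, embeddings_index, max_features, embed_size):
    num_words = min(max_features, len(word_index) + 1)
    matrix = np.zeros((num_words, embed_size), dtype="float32")  # row 0 is the padding index
    hits = 0
    for word, i in word_index.items():
        if i >= num_words:
            continue  # outside the tokenizer's vocabulary cutoff
        vector = embeddings_index.get(word)
        if vector is not None:
            matrix[i] = vector
            hits += 1
        # words missing from the embedding file keep an all-zero row
    return matrix, hits


word_index = {"you": 1, "idiot": 2, "zzzq": 3, "rare": 4}
embeddings = {
    "you": np.array([0.1, 0.2], dtype="float32"),
    "idiot": np.array([0.3, 0.4], dtype="float32"),
    "rare": np.array([0.5, 0.6], dtype="float32"),
}
matrix, hits = build_embedding_matrix(word_index, embeddings, max_features=4, embed_size=2)
print(matrix, f"{hits} of {matrix.shape[0] - 1} words found")
```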

In [14]:
# Define model architecture. This implementation is written entirely using Keras.

from keras import backend as K
from keras.layers import Activation
from keras.engine import Layer
from keras.layers import Dense, Input, Embedding, Dropout, Bidirectional, GRU, Flatten, SpatialDropout1D
gru_len = 128
Routings = 5
Num_capsule = 10
Dim_capsule = 16
dropout_p = 0.25
rate_drop_dense = 0.28

def squash(x, axis=-1):
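    # Note from the original kernel: s_squared_norm is really small, so the squash
    # variant below was commented out. The active code instead rescales each
    # capsule vector to (approximately) unit length.
    # s_squared_norm = K.sum(K.square(x), axis, keepdims=True) + K.epsilon()
    # scale = K.sqrt(s_squared_norm)/ (0.5 + s_squared_norm)
    # return scale * x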
    s_squared_norm = K.sum(K.square(x), axis, keepdims=True)
    scale = K.sqrt(s_squared_norm + K.epsilon())
    return x / scale


# A Capsule layer implemented in pure Keras
class Capsule(Layer):
    def __init__(self, num_capsule, dim_capsule, routings=3, kernel_size=(9, 1), share_weights=True,
                 activation='default', **kwargs):
        super(Capsule, self).__init__(**kwargs)
        self.num_capsule = num_capsule
        self.dim_capsule = dim_capsule
        self.routings = routings
        self.kernel_size = kernel_size
        self.share_weights = share_weights
        if activation == 'default':
            self.activation = squash
        else:
            self.activation = Activation(activation)

    def build(self, input_shape):
        super(Capsule, self).build(input_shape)
        input_dim_capsule = input_shape[-1]
        if self.share_weights:
            self.W = self.add_weight(name='capsule_kernel',
                                     shape=(1, input_dim_capsule,
                                            self.num_capsule * self.dim_capsule),
                                     # shape=self.kernel_size,
                                     initializer='glorot_uniform',
                                     trainable=True)
        else:
            input_num_capsule = input_shape[-2]
            self.W = self.add_weight(name='capsule_kernel',
                                     shape=(input_num_capsule,
                                            input_dim_capsule,
                                            self.num_capsule * self.dim_capsule),
                                     initializer='glorot_uniform',
                                     trainable=True)

    def call(self, u_vecs):
        if self.share_weights:
            u_hat_vecs = K.conv1d(u_vecs, self.W)
        else:
            u_hat_vecs = K.local_conv1d(u_vecs, self.W, [1], [1])

        batch_size = K.shape(u_vecs)[0]
        input_num_capsule = K.shape(u_vecs)[1]
        u_hat_vecs = K.reshape(u_hat_vecs, (batch_size, input_num_capsule,
                                            self.num_capsule, self.dim_capsule))
        u_hat_vecs = K.permute_dimensions(u_hat_vecs, (0, 2, 1, 3))
        # final u_hat_vecs.shape = [None, num_capsule, input_num_capsule, dim_capsule]

        b = K.zeros_like(u_hat_vecs[:, :, :, 0])  # shape = [None, num_capsule, input_num_capsule]
        for i in range(self.routings):
            b = K.permute_dimensions(b, (0, 2, 1))  # shape = [None, input_num_capsule, num_capsule]
            c = K.softmax(b)
            c = K.permute_dimensions(c, (0, 2, 1))
            b = K.permute_dimensions(b, (0, 2, 1))
            outputs = self.activation(K.batch_dot(c, u_hat_vecs, [2, 2]))
            if i < self.routings - 1:
                b = K.batch_dot(outputs, u_hat_vecs, [2, 3])

        return outputs

    def compute_output_shape(self, input_shape):
        return (None, self.num_capsule, self.dim_capsule)


def get_model():
    input1 = Input(shape=(maxlen,))
    # num_words rather than max_features, so the layer always matches embedding_matrix's shape
    embed_layer = Embedding(num_words,
                            embed_size,
                            input_length=maxlen,
                            weights=[embedding_matrix],
                            trainable=False)(input1)
    embed_layer = SpatialDropout1D(rate_drop_dense)(embed_layer)

    x = Bidirectional(
        GRU(gru_len, activation='relu', dropout=dropout_p, recurrent_dropout=dropout_p, return_sequences=True))(
        embed_layer)
    capsule = Capsule(num_capsule=Num_capsule, dim_capsule=Dim_capsule, routings=Routings,
                      share_weights=True)(x)
    # output_capsule = Lambda(lambda x: K.sqrt(K.sum(K.square(x), 2)))(capsule)
    capsule = Flatten()(capsule)
    capsule = Dropout(dropout_p)(capsule)
    output = Dense(6, activation='sigmoid')(capsule)
    model = Model(inputs=input1, outputs=output)
    model.compile(
        loss='binary_crossentropy',
        optimizer='adam',
        metrics=['accuracy'])
    model.summary()
    return model
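To make the routing loop in `Capsule.call` easier to follow, here is a single-example NumPy sketch of the same computation under the notebook's settings (`share_weights=True` and the unit-length `squash`). It is a reading aid, not a replacement for the Keras layer: the function names are hypothetical, and `np.einsum` stands in for the two `K.batch_dot` calls, which as far as we can tell contract over the input-capsule axis and the capsule-dimension axis respectively.

```python
import numpy as np


def unit_squash(x, axis=-1, eps=1e-7):
    # Mirrors the notebook's squash(): divide each vector by its norm
    s_squared_norm = np.sum(np.square(x), axis=axis, keepdims=True)
    return x / np.sqrt(s_squared_norm + eps)


def softmax(x, axis=-1):
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)


def capsule_forward(u_vecs, W, num_capsule, dim_capsule, routings=3):
    """Single-example sketch of Capsule.call with share_weights=True.

    u_vecs: (input_num_capsule, input_dim_capsule), e.g. the BiGRU outputs
    W:      (input_dim_capsule, num_capsule * dim_capsule)
    """
    input_num_capsule = u_vecs.shape[0]
    # A kernel-size-1 conv1d with shared weights is a matmul at every timestep
    u_hat = u_vecs @ W
    u_hat = u_hat.reshape(input_num_capsule, num_capsule, dim_capsule)
    u_hat = u_hat.transpose(1, 0, 2)  # (num_capsule, input_num_capsule, dim_capsule)

    b = np.zeros((num_capsule, input_num_capsule))  # routing logits
    for i in range(routings):
        # Each input capsule spreads its vote over the output capsules
        c = softmax(b, axis=0)
        # Coupling-weighted sum over input capsules -> (num_capsule, dim_capsule)
        outputs = unit_squash(np.einsum("ji,jid->jd", c, u_hat))
        if i < routings - 1:
            # New logits = agreement between each output and each prediction
            b = np.einsum("jd,jid->ji", outputs, u_hat)
    return outputs, c


rng = np.random.default_rng(0)
u = rng.normal(size=(200, 256))                 # maxlen x (2 * gru_len)
W = rng.normal(scale=0.05, size=(256, 10 * 16))
outputs, c = capsule_forward(u, W, num_capsule=10, dim_capsule=16, routings=5)
print(outputs.shape, W.size)
```

A `W` of shape `(256, 160)` has 40,960 entries, matching the `Param #` shown for `capsule_1` in the model summary. As in the layer, the logits `b` are overwritten with the latest agreement on each iteration rather than accumulated.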

In [15]:
# Instantiate architecture
model = get_model()

# Define training job parameters
batch_size = 256
epochs = 3

# Hold out 5% of the training data as a validation set for scoring the model
X_tra, X_val, y_tra, y_val = train_test_split(x_train, y_train, train_size=0.95, random_state=233)
RocAuc = RocAucEvaluation(validation_data=(X_val, y_val), interval=1)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 200)               0         
_________________________________________________________________
embedding_1 (Embedding)      (None, 200, 300)          30000000  
_________________________________________________________________
spatial_dropout1d_1 (Spatial (None, 200, 300)          0         
_________________________________________________________________
bidirectional_1 (Bidirection (None, 200, 256)          329472    
_________________________________________________________________
capsule_1 (Capsule)          (None, 10, 16)            40960     
_________________________________________________________________
flatten_1 (Flatten)          (None, 160)               0         
___________________________________________________________



In [16]:
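# Fit model (will take some time). Note: this run used a single epoch, which is what the
# output below reflects; pass epochs=epochs to train for the 3 epochs defined above.
hist = model.fit(X_tra, y_tra, batch_size=batch_size, epochs=1, validation_data=(X_val, y_val),
                 callbacks=[RocAuc], verbose=1)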

Train on 151592 samples, validate on 7979 samples
Epoch 1/1
 ROC-AUC - epoch: 1 - score: 0.976220


In [17]:
# Save the model weights. This lets us predict without retraining and share the model with others.
model.save_weights('best.hdf5')

### Let's create a snapshot of our model and training results!

Great, we've trained the model on the training data! There are a few things we'll want to do now.

**Save your notebook!**

 * `File --> Save and Checkpoint`


**Create a snapshot** (in your root project folder)
```bash
$ datmo snapshot create -m "Original capsulenet + GRU classifier" --stats acc:0.9800 --config batch_size:256 --config epochs:3
```

**Visualize the snapshot we just took**

```bash
$ datmo snapshot ls
+------------------------------------------+---------------------+------------------------------------------+---------------------+--------------------------------------+-------+
|                    id                    |      created at     |                  config                  |        stats        |               message                | label |
+------------------------------------------+---------------------+------------------------------------------+---------------------+--------------------------------------+-------+
| 7a21e186c3e6e778455eb95f27b9cb94c41bd3c6 | 2018-06-08 01:59:21 | {u'epochs': u'3', u'batch_size': u'256'} | {u'acc': u'0.9800'} | Original capsulenet + GRU classifier |  None |
+------------------------------------------+---------------------+------------------------------------------+---------------------+--------------------------------------+-------+
```

### Creating predictions for the Kaggle submission data

Given the competition's test set, we can use our model to predict labels for the unlabeled comments in `test.csv`.

In [36]:
# Perform prediction on the Kaggle test set previously imported
y_pred = model.predict(x_test, batch_size=1024, verbose=1)



In [64]:
# Create a pandas dataframe from the model predictions (a NumPy array)
prediction_df = pd.DataFrame(y_pred, columns=["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"])
prediction_df.head(15)

|   | toxic | severe_toxic | obscene | threat | insult | identity_hate |
|---|---|---|---|---|---|---|
| 0 | 0.977612 | 0.394969 | 0.95055 | 0.099204 | 0.901001 | 0.216824 |
| 1 | 0.004718 | 0.001969 | 0.002531 | 0.002945 | 0.002312 | 0.002568 |
| 2 | 0.003458 | 0.002344 | 0.002853 | 0.003776 | 0.002102 | 0.002878 |
| 3 | 0.003513 | 0.002292 | 0.00277 | 0.003654 | 0.002206 | 0.002168 |
| 4 | 0.009126 | 0.001767 | 0.003378 | 0.00232 | 0.003045 | 0.002157 |
| 5 | 0.003855 | 0.002208 | 0.002577 | 0.003744 | 0.002179 | 0.002267 |
| 6 | 0.017392 | 0.001712 | 0.003295 | 0.002054 | 0.004815 | 0.002133 |
| 7 | 0.633773 | 0.011423 | 0.094458 | 0.006996 | 0.210926 | 0.021639 |
| 8 | 0.051208 | 0.002009 | 0.007375 | 0.002079 | 0.0113 | 0.003268 |
| 9 | 0.004697 | 0.001848 | 0.002821 | 0.002583 | 0.002378 | 0.002094 |
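The model outputs one probability per label, not a decision. A sketch of turning probabilities into label names with a fixed cut-off; the 0.5 threshold is an assumption, not something the competition or this notebook prescribes (the Kaggle metric, ROC AUC, works on the raw probabilities). The two rows are rows 0 and 7 of the prediction table above.

```python
import numpy as np

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def to_labels(probs, threshold=0.5):
    """Return, for each row, the label names whose probability reaches `threshold`."""
    probs = np.asarray(probs)
    return [[name for name, p in zip(LABELS, row) if p >= threshold] for row in probs]


rows = [[0.977612, 0.394969, 0.95055, 0.099204, 0.901001, 0.216824],
        [0.633773, 0.011423, 0.094458, 0.006996, 0.210926, 0.021639]]
print(to_labels(rows))
```

Per-label thresholds tuned on the validation split may suit rare labels such as `threat` better than a single global value.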


In [65]:
# Let's combine the original comments with the model predictions so we can more easily make sense of the results.
combined_df = test.join(prediction_df)
combined_df.head(15)

|   | id | comment_text | toxic | severe_toxic | obscene | threat | insult | identity_hate |
|---|---|---|---|---|---|---|---|---|
| 0 | 00001cee341fdb12 | Yo bitch Ja Rule is more succesful then you'll... | 0.977612 | 0.394969 | 0.95055 | 0.099204 | 0.901001 | 0.216824 |
| 1 | 0000247867823ef7 | == From RfC == \n\n The title is fine as it is... | 0.004718 | 0.001969 | 0.002531 | 0.002945 | 0.002312 | 0.002568 |
| 2 | 00013b17ad220c46 | " \n\n == Sources == \n\n * Zawe Ashton on Lap... | 0.003458 | 0.002344 | 0.002853 | 0.003776 | 0.002102 | 0.002878 |
| 3 | 00017563c3f7919a | :If you have a look back at the source, the in... | 0.003513 | 0.002292 | 0.00277 | 0.003654 | 0.002206 | 0.002168 |
| 4 | 00017695ad8997eb | I don't anonymously edit articles at all. | 0.009126 | 0.001767 | 0.003378 | 0.00232 | 0.003045 | 0.002157 |
| 5 | 0001ea8717f6de06 | Thank you for understanding. I think very high... | 0.003855 | 0.002208 | 0.002577 | 0.003744 | 0.002179 | 0.002267 |
| 6 | 00024115d4cbde0f | Please do not add nonsense to Wikipedia. Such ... | 0.017392 | 0.001712 | 0.003295 | 0.002054 | 0.004815 | 0.002133 |
| 7 | 000247e83dcc1211 | :Dear god this site is horrible. | 0.633773 | 0.011423 | 0.094458 | 0.006996 | 0.210926 | 0.021639 |
| 8 | 00025358d4737918 | " \n Only a fool can believe in such numbers. ... | 0.051208 | 0.002009 | 0.007375 | 0.002079 | 0.0113 | 0.003268 |
| 9 | 00026d1092fe71cc | == Double Redirects == \n\n When fixing double... | 0.004697 | 0.001848 | 0.002821 | 0.002583 | 0.002378 | 0.002094 |


In [66]:
# Saving our dataframe to file
combined_df.to_csv('kaggle_results.csv', index=False)

### We've finished writing out our Kaggle results, let's take a snapshot.

We can create another Datmo snapshot here to save the state of our environment, notebook, and files now that we've finished with our Kaggle results. That way we can always revert to this state if we decide to keep tinkering, and others have a clean checkpoint to return to if they want to reproduce our work up to this point.

**Save your notebook!**

 * `File --> Save and Checkpoint`


**Create a snapshot** (in your root project folder)
```bash
$ datmo snapshot create -m "CapsNet prediction on Kaggle test data" --stats time:243s --config results_file:kaggle_results.csv 
```

**Visualize the snapshot we just took** 

Notice that the list shows both the snapshot we just created and the one we took immediately after training!

```bash
$ datmo snapshot ls
+------------------------------------------+---------------------+------------------------------------------+---------------------+----------------------------------------+-------+
|                    id                    |      created at     |                  config                  |        stats        |                message                 | label |
+------------------------------------------+---------------------+------------------------------------------+---------------------+----------------------------------------+-------+
| f398e5f6c50fc5a66f45bf77e81bcbcdeaafcdb8 | 2018-06-08 23:04:05 | {u'results_file': u'kaggle_results.csv'} |  {u'time': u'243s'} | CapsNet prediction on Kaggle test data |  None |
| 7a21e186c3e6e778455eb95f27b9cb94c41bd3c6 | 2018-06-08 01:59:21 | {u'epochs': u'3', u'batch_size': u'256'} | {u'acc': u'0.9800'} |  Original capsulenet + GRU classifier  |  None |
+------------------------------------------+---------------------+------------------------------------------+---------------------+----------------------------------------+-------+
```

---
# Using our saved model to predict on new data

Although the Kaggle competition is finished, we can use our model to classify other comments and see what it thinks!

Because the model weights are available (either the ones you saved after training or the ones provided in the repository), we can perform this classification without retraining the model.

## 1. Instantiate the model architecture

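This cell is only necessary if you are running this part of the notebook in a new session, without performing the training first.
Note: because we're using a custom layer (`Capsule`), we need to redefine the architecture before loading the weights. The architecture also relies on `maxlen`, `max_features`, `embed_size`, `embedding_matrix`, and the fitted tokenizer `tok`, so first run the earlier cells that import the libraries, load the data, fit the tokenizer, and build the embedding matrix.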

In [79]:
from keras import backend as K
from keras.layers import Activation
from keras.engine import Layer
from keras.layers import Dense, Input, Embedding, Dropout, Bidirectional, GRU, Flatten, SpatialDropout1D
gru_len = 128
Routings = 5
Num_capsule = 10
Dim_capsule = 16
dropout_p = 0.25
rate_drop_dense = 0.28

def squash(x, axis=-1):
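    # Note from the original kernel: s_squared_norm is really small, so the squash
    # variant below was commented out. The active code instead rescales each
    # capsule vector to (approximately) unit length.
    # s_squared_norm = K.sum(K.square(x), axis, keepdims=True) + K.epsilon()
    # scale = K.sqrt(s_squared_norm)/ (0.5 + s_squared_norm)
    # return scale * x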
    s_squared_norm = K.sum(K.square(x), axis, keepdims=True)
    scale = K.sqrt(s_squared_norm + K.epsilon())
    return x / scale


# A Capsule Layer implementation with pure Keras
class Capsule(Layer):
    def __init__(self, num_capsule, dim_capsule, routings=3, kernel_size=(9, 1), share_weights=True,
                 activation='default', **kwargs):
        super(Capsule, self).__init__(**kwargs)
        self.num_capsule = num_capsule
        self.dim_capsule = dim_capsule
        self.routings = routings
        self.kernel_size = kernel_size
        self.share_weights = share_weights
        if activation == 'default':
            self.activation = squash
        else:
            self.activation = Activation(activation)

    def build(self, input_shape):
        super(Capsule, self).build(input_shape)
        input_dim_capsule = input_shape[-1]
        if self.share_weights:
            self.W = self.add_weight(name='capsule_kernel',
                                     shape=(1, input_dim_capsule,
                                            self.num_capsule * self.dim_capsule),
                                     # shape=self.kernel_size,
                                     initializer='glorot_uniform',
                                     trainable=True)
        else:
            input_num_capsule = input_shape[-2]
            self.W = self.add_weight(name='capsule_kernel',
                                     shape=(input_num_capsule,
                                            input_dim_capsule,
                                            self.num_capsule * self.dim_capsule),
                                     initializer='glorot_uniform',
                                     trainable=True)

    def call(self, u_vecs):
        if self.share_weights:
            u_hat_vecs = K.conv1d(u_vecs, self.W)
        else:
            u_hat_vecs = K.local_conv1d(u_vecs, self.W, [1], [1])

        batch_size = K.shape(u_vecs)[0]
        input_num_capsule = K.shape(u_vecs)[1]
        u_hat_vecs = K.reshape(u_hat_vecs, (batch_size, input_num_capsule,
                                            self.num_capsule, self.dim_capsule))
        u_hat_vecs = K.permute_dimensions(u_hat_vecs, (0, 2, 1, 3))
        # final u_hat_vecs.shape = [None, num_capsule, input_num_capsule, dim_capsule]

        b = K.zeros_like(u_hat_vecs[:, :, :, 0])  # shape = [None, num_capsule, input_num_capsule]
        for i in range(self.routings):
            b = K.permute_dimensions(b, (0, 2, 1))  # shape = [None, input_num_capsule, num_capsule]
            c = K.softmax(b)
            c = K.permute_dimensions(c, (0, 2, 1))
            b = K.permute_dimensions(b, (0, 2, 1))
            outputs = self.activation(K.batch_dot(c, u_hat_vecs, [2, 2]))
            if i < self.routings - 1:
                b = K.batch_dot(outputs, u_hat_vecs, [2, 3])

        return outputs

    def compute_output_shape(self, input_shape):
        return (None, self.num_capsule, self.dim_capsule)


def get_model():
    input1 = Input(shape=(maxlen,))
    # num_words rather than max_features, so the layer always matches embedding_matrix's shape
    embed_layer = Embedding(num_words,
                            embed_size,
                            input_length=maxlen,
                            weights=[embedding_matrix],
                            trainable=False)(input1)
    embed_layer = SpatialDropout1D(rate_drop_dense)(embed_layer)

    x = Bidirectional(
        GRU(gru_len, activation='relu', dropout=dropout_p, recurrent_dropout=dropout_p, return_sequences=True))(
        embed_layer)
    capsule = Capsule(num_capsule=Num_capsule, dim_capsule=Dim_capsule, routings=Routings,
                      share_weights=True)(x)
    # output_capsule = Lambda(lambda x: K.sqrt(K.sum(K.square(x), 2)))(capsule)
    capsule = Flatten()(capsule)
    capsule = Dropout(dropout_p)(capsule)
    output = Dense(6, activation='sigmoid')(capsule)
    model = Model(inputs=input1, outputs=output)
    model.compile(
        loss='binary_crossentropy',
        optimizer='adam',
        metrics=['accuracy'])
    model.summary()
    return model

In [25]:
# Instantiate the architecture, then load the trained weights from disk into it
model = get_model()
model.load_weights('best.hdf5')

## 2. Predict on a handful of manually defined strings

In [80]:
X_test = ["this is an innocent comment", "stfu baddie", "shit fuck piss"] # create list of strings to test
comment_df = pd.DataFrame(data=X_test,columns=["comment_text"]) # create dataframe from this list for later

# tokenize comments for use with the model
X_test=tok.texts_to_sequences(X_test)
x_test=sequence.pad_sequences(X_test,maxlen=maxlen)

# perform prediction
predictions = model.predict(x_test, batch_size=1024, verbose=1)[0:len(X_test)]



In [81]:
# Visualize results
prediction_df = pd.DataFrame(predictions, columns=["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"])
combined_df = comment_df.join(prediction_df) # join the comment dataframe with the results dataframe
combined_df.head(15)

|   | comment_text | toxic | severe_toxic | obscene | threat | insult | identity_hate |
|---|---|---|---|---|---|---|---|
| 0 | this is an innocent comment | 0.041128 | 0.002055 | 0.004156 | 0.002384 | 0.008886 | 0.003515 |
| 1 | stfu baddie | 0.764418 | 0.02416 | 0.404104 | 0.00869 | 0.345152 | 0.035791 |
| 2 | shit fuck piss | 0.970915 | 0.411046 | 0.952534 | 0.080814 | 0.860406 | 0.196927 |


In [78]:
# Optional: Write dataframe out to CSV
combined_df.to_csv('manual_comment_results.csv', index=False)

## 3. Predicting on a new CSV of comments

Now that you understand how the code works on a handful of arbitrary strings, we can go a step further and perform prediction on a larger comment dataset loaded from a CSV file.
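For a CSV too large to tokenize in one go, one option is to stream it with `pandas.read_csv(..., chunksize=...)`. In this sketch `predict_fn` is a hypothetical stand-in for the tokenize, pad, and `model.predict` steps below; it must take a list of lowercased strings and return an `(n, 6)` array of probabilities:

```python
import io

import pandas as pd

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def predict_csv_in_chunks(csv_source, predict_fn, text_column="comment_text", chunksize=10000):
    """Score a CSV chunk by chunk so the whole file is never tokenized at once."""
    results = []
    for chunk in pd.read_csv(csv_source, chunksize=chunksize):
        # Same preprocessing as the training data: fill missing comments, lowercase
        texts = chunk[text_column].fillna("fillna").str.lower().tolist()
        preds = pd.DataFrame(predict_fn(texts), columns=LABELS, index=chunk.index)
        results.append(chunk.join(preds))
    return pd.concat(results)


# Tiny in-memory CSV and a dummy predictor, just to exercise the function
csv_text = "id,comment_text\n1,Hello\n2,\n3,YOU IDIOT\n"


def dummy_predict(texts):
    return [[float("idiot" in t)] * len(LABELS) for t in texts]


scored = predict_csv_in_chunks(io.StringIO(csv_text), dummy_predict, chunksize=2)
print(scored)
```

In this notebook, `predict_fn` could be `lambda texts: model.predict(sequence.pad_sequences(tok.texts_to_sequences(texts), maxlen=maxlen), batch_size=1024)`.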

In [82]:
# Load in your own data
new_comments_df = pd.read_csv('input/test.csv') # Replace 'input/test.csv' with the path to your dataset
# Replace "comment_text" with the name of the column containing your comments
X_test = new_comments_df["comment_text"].fillna("fillna").str.lower()

In [83]:
# Tokenizing your data for use with the model
X_test=tok.texts_to_sequences(X_test)
x_test=sequence.pad_sequences(X_test,maxlen=maxlen)

In [84]:
# Perform predictions and concatenate data into one dataframe
predictions = model.predict(x_test, batch_size=1024, verbose=1)[0:len(X_test)]
prediction_df = pd.DataFrame(predictions, columns=["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"])
combined_df = new_comments_df.join(prediction_df) # join the comment dataframe with the results dataframe

combined_df.head(15)



|   | id | comment_text | toxic | severe_toxic | obscene | threat | insult | identity_hate |
|---|---|---|---|---|---|---|---|---|
| 0 | 00001cee341fdb12 | Yo bitch Ja Rule is more succesful then you'll... | 0.977612 | 0.394969 | 0.95055 | 0.099204 | 0.901001 | 0.216824 |
| 1 | 0000247867823ef7 | == From RfC == \n\n The title is fine as it is... | 0.004718 | 0.001969 | 0.002531 | 0.002945 | 0.002312 | 0.002568 |
| 2 | 00013b17ad220c46 | " \n\n == Sources == \n\n * Zawe Ashton on Lap... | 0.003458 | 0.002344 | 0.002853 | 0.003776 | 0.002102 | 0.002878 |
| 3 | 00017563c3f7919a | :If you have a look back at the source, the in... | 0.003513 | 0.002292 | 0.00277 | 0.003654 | 0.002206 | 0.002168 |
| 4 | 00017695ad8997eb | I don't anonymously edit articles at all. | 0.009126 | 0.001767 | 0.003378 | 0.00232 | 0.003045 | 0.002157 |
| 5 | 0001ea8717f6de06 | Thank you for understanding. I think very high... | 0.003855 | 0.002208 | 0.002577 | 0.003744 | 0.002179 | 0.002267 |
| 6 | 00024115d4cbde0f | Please do not add nonsense to Wikipedia. Such ... | 0.017392 | 0.001712 | 0.003295 | 0.002054 | 0.004815 | 0.002133 |
| 7 | 000247e83dcc1211 | :Dear god this site is horrible. | 0.633773 | 0.011423 | 0.094458 | 0.006996 | 0.210926 | 0.021639 |
| 8 | 00025358d4737918 | " \n Only a fool can believe in such numbers. ... | 0.051208 | 0.002009 | 0.007375 | 0.002079 | 0.0113 | 0.003268 |
| 9 | 00026d1092fe71cc | == Double Redirects == \n\n When fixing double... | 0.004697 | 0.001848 | 0.002821 | 0.002583 | 0.002378 | 0.002094 |


In [None]:
# Optional: Create an output folder to store your predictions

outdir = './output'
os.makedirs(outdir, exist_ok=True)  # no error if the folder already exists

In [85]:
# Optional: Write the dataframe out to CSV in the output folder

# You can rename this file to whatever you'd like. Reusing the same filename will overwrite earlier results, so be careful!
outname = 'user_dataset_results.csv'
fullpath = os.path.join('./output', outname)

combined_df.to_csv(fullpath, index=False)
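To avoid the accidental overwriting the comment above warns about, a small hypothetical helper can pick the next free filename instead (it also creates the folder if needed):

```python
from pathlib import Path


def unique_output_path(outdir, filename):
    """Return outdir/filename, adding _1, _2, ... before the suffix if that file exists."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    candidate = outdir / filename
    stem, suffix = candidate.stem, candidate.suffix
    n = 1
    while candidate.exists():
        candidate = outdir / f"{stem}_{n}{suffix}"
        n += 1
    return candidate
```

Usage: `combined_df.to_csv(unique_output_path('./output', 'user_dataset_results.csv'), index=False)`.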