<center><h1 style="color:green; background-color:black;">Toxic Comment Classification</h1></center>

### Table of Contents
1. Importing Data and Libraries
2. Exploratory Data Analysis (EDA)
3. Data Pre-processing
4. Modeling
    * Naive Bayes SVM Model
    * LSTM
    * BERT model
5. Model Ensembling

<h2 style="color:blue">1. Importing Libraries</h2>

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.layers import Dense, Input, LSTM, Embedding, Dropout, Activation
from keras.layers import Bidirectional, GlobalMaxPool1D
from keras.models import Model
from keras import initializers, regularizers, constraints, optimizers, layers

import warnings
warnings.simplefilter(action="ignore")

<h2 style="color:blue">Loading the Data</h2>

In [None]:
train = pd.read_csv('../input/jigsaw-toxic-comment-classification-challenge/train.csv.zip')
test = pd.read_csv('../input/jigsaw-toxic-comment-classification-challenge/test.csv.zip')

In [None]:
train.head()

In [None]:
test.head()

<h2 style="color:blue">2. Exploratory Data Analysis</h2>

<h2 style="color:blue">3. Data Pre-Processing</h2>

In [None]:
# The six labels to predict; a single comment can carry several of them
classes = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
targets = train[classes].values

train_sentences = train['comment_text']
test_sentences = test['comment_text']

<h3 style="color:blue">Tokenization</h3>

In [None]:
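max_features = 20000  # cap on the vocabulary size (most frequent words are kept)
tokenizer = Tokenizer(num_words=max_features)
# Build the vocabulary from the training comments only
tokenizer.fit_on_texts(list(train_sentences))
# Convert each comment into a list of integer word indices
tokenized_train = tokenizer.texts_to_sequences(train_sentences)
tokenized_test = tokenizer.texts_to_sequences(test_sentences)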

In [None]:
tokenized_train[:1]

<h3 style="color:blue">Padding</h3>

In [None]:
maxlen = 200  # every comment is padded or truncated to this many tokens
X_train = pad_sequences(tokenized_train, maxlen=maxlen)
X_test = pad_sequences(tokenized_test, maxlen=maxlen)
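
As a rough illustration of what the padding step does, the sketch below mimics the default behaviour of `pad_sequences` (zeros added at the front, over-long sequences cut from the front) in plain NumPy. The helper name `pad_pre` and the toy sequences are made up for this example; it is not the Keras implementation itself.

```python
import numpy as np

def pad_pre(sequences, maxlen, value=0):
    """Hypothetical helper mimicking pad_sequences defaults
    (padding='pre', truncating='pre')."""
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for i, seq in enumerate(sequences):
        if len(seq) == 0:
            continue  # an empty comment stays all padding
        trunc = seq[-maxlen:]          # keep only the last maxlen tokens
        out[i, -len(trunc):] = trunc   # right-align, leaving zeros on the left
    return out

# Toy token-index sequences of different lengths (illustrative only)
toy = [[5, 8, 2], [7], [1, 2, 3, 4, 5, 6]]
padded = pad_pre(toy, maxlen=4)
print(padded)
```

Short comments end up right-aligned with leading zeros, while the long one keeps only its last four tokens.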

In [None]:
# Number of tokens in each training comment, before padding
totalNumWords = [len(comment) for comment in tokenized_train]
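
Per-comment token counts like these are typically used to sanity-check the choice of `maxlen`. A minimal sketch, using made-up lengths in place of `totalNumWords`, of how one might measure what share of comments fit within the cutoff:

```python
import numpy as np

# Made-up token counts standing in for totalNumWords (assumption for illustration)
lengths = np.array([12, 45, 180, 30, 950, 75, 210, 8, 60, 199])
cutoff = 200  # same value as maxlen in the notebook

median_len = float(np.median(lengths))
coverage = float(np.mean(lengths <= cutoff))  # share of comments that are not truncated

print(f"median length: {median_len:.1f}")
print(f"share of comments with <= {cutoff} tokens: {coverage:.0%}")
```

If the share is low, a larger `maxlen` may be worth the extra training cost; if it is close to 100%, a smaller value might do.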

<h2 style="color:Blue;">4. Modeling</h2>

<h3 style="color:green;">Naive Bayes SVM Model</h3>

<h3 style="color:green;">LSTM</h3>

In [None]:
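embed_size = 128  # dimensionality of the learned word embeddings

inp = Input(shape=(maxlen,))
x = Embedding(max_features, embed_size)(inp)
# return_sequences=True keeps the output at every time step for the pooling layer
x = LSTM(60, return_sequences=True, name='lstm_layer')(x)
x = GlobalMaxPool1D()(x)  # max over time steps for each LSTM unit
x = Dropout(0.1)(x)
x = Dense(50, activation="relu")(x)
x = Dropout(0.1)(x)
# One sigmoid output per label, since a comment can carry several labels at once
x = Dense(6, activation="sigmoid")(x)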

In [None]:
model = Model(inputs=inp, outputs=x)
model.compile(
    loss='binary_crossentropy',  # each of the 6 labels is an independent binary target
    optimizer='adam',
    metrics=['accuracy']
)

In [None]:
model.summary()

In [None]:
batch_size = 64
epochs = 2
# validation_split=0.1 holds out the last 10% of the training data for validation
model.fit(X_train, targets, batch_size=batch_size, epochs=epochs, validation_split=0.1)

<h3 style="color:blue;">Prediction</h3>

In [None]:
# Predicted probabilities: one row per test comment, one column per label
prediction = model.predict(X_test)
prediction
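
The prediction array has one row per test comment and one column per label, in the order of `classes`. A minimal sketch of wrapping such an array in a labelled DataFrame follows. It uses random numbers in place of real model output and made-up ids; the `id` column is assumed to match the competition's test file, and the output file name is hypothetical.

```python
import numpy as np
import pandas as pd

classes = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Stand-ins for the real objects (assumptions for illustration only)
rng = np.random.default_rng(0)
fake_prediction = rng.random((3, len(classes)))  # plays the role of model.predict(X_test)
fake_ids = ["a1", "b2", "c3"]                    # plays the role of test["id"]

# One probability column per label, with the comment id in front
submission = pd.DataFrame(fake_prediction, columns=classes)
submission.insert(0, "id", fake_ids)
print(submission.round(3))

# submission.to_csv("submission.csv", index=False)  # hypothetical output path
```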

<h3 style="color:green;">BERT model</h3>

<h2 style="color:Blue;">5. Model Ensembling</h2>
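
The notebook does not show how the models are combined. One common and simple option is a weighted average of the per-label probabilities from each model. The sketch below assumes every model yields an array of shape `(n_samples, n_labels)`; the arrays (two labels only, for brevity) and the weights are made up.

```python
import numpy as np

# Made-up probability outputs from three hypothetical models,
# each of shape (n_samples, n_labels)
pred_nbsvm = np.array([[0.9, 0.1], [0.2, 0.4]])
pred_lstm = np.array([[0.8, 0.2], [0.1, 0.6]])
pred_bert = np.array([[0.7, 0.3], [0.3, 0.5]])

weights = np.array([0.2, 0.3, 0.5])  # hypothetical weights, chosen to sum to 1

# Stack to (n_models, n_samples, n_labels), then take the weighted mean over models
stacked = np.stack([pred_nbsvm, pred_lstm, pred_bert])
ensemble = np.average(stacked, axis=0, weights=weights)
print(ensemble)
```

In practice the weights would usually be tuned on a held-out validation set rather than fixed by hand.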