# CNN Classification with MPQA Dataset
<hr>

The __modus operandi__ for text classification is to represent words with __word embeddings__ and to use a convolutional neural network (CNN) to learn how to discriminate between documents.

__Yoav Goldberg__ commented in _A Primer on Neural Network Models for Natural Language Processing, 2015._ :
> _The non-linearity of the network, as well as the ability to easily integrate pre-trained
word embeddings, often lead to superior classification accuracy._

He also commented in _Neural Network Methods for Natural Language Processing, 2017_ :
> ... _the CNN is in essence a feature-extracting architecture. ... . The CNNs layer's responsibility is to extract meaningful sub-structures that are useful for the overall prediction task at hand._

We will build a text classification model using a CNN on the MPQA dataset. Since there is no standard train/test split for this dataset, we will use 10-fold cross-validation (CV).
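As a minimal sketch of what 10-fold cross-validation does with the 10,606 phrases in this corpus, the plain-Python helper below partitions shuffled indices into ten folds. The name `kfold_indices` is hypothetical, and the fold-size rule (the first `n % k` folds receive one extra sample) is assumed to mirror scikit-learn's `KFold`.

```python
import math
import random

def kfold_indices(n_samples, n_splits=10, seed=0):
    """Yield (train, test) index lists for each fold of a shuffled k-fold split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds get one more sample than the rest
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

for train, test in kfold_indices(10606):
    steps = math.ceil(len(train) / 50)  # batches per epoch with batch size 50
    print(len(train), len(test), steps)
```

Each training fold holds 9,545 or 9,546 phrases, so a batch size of 50 gives 191 steps per epoch, which is consistent with the `191/191` prefix in the training logs.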

The CNN model is inspired by __Yoon Kim__'s paper on using word embeddings with a CNN for text classification. The hyperparameters we use, based on his study, are as follows:
- Transfer function: rectified linear.
- Kernel sizes: 1, 2, 3, 4, 5.
- Number of filters: 100.
- Dropout rate: 0.5.
- Weight regularization (L2) constraint: 3.
- Batch Size: 50.
- Update Rule: Adam
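The weight constraint in the list above caps the L2 norm of a weight vector at 3. A minimal sketch of that rescaling rule, assuming a single weight vector stored as a plain list (the helper name `max_norm` is hypothetical, and the small epsilon that Keras adds for numerical stability is left out):

```python
import math

def max_norm(weights, max_value=3.0):
    """Rescale `weights` so that its L2 norm does not exceed `max_value`."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm <= max_value:
        return list(weights)       # already inside the constraint
    scale = max_value / norm
    return [w * scale for w in weights]

print(max_norm([3.0, 4.0]))  # norm 5 is scaled down to norm 3
print(max_norm([1.0, 2.0]))  # norm below 3 is left unchanged
```

In Keras this rule is applied after each gradient update, which is why it is passed as a `kernel_constraint` rather than as a loss penalty.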

## Load the library

In [1]:
import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re
import nltk
import random
# from nltk.tokenize import TweetTokenizer
from sklearn.model_selection import KFold

%config IPCompleter.greedy=True
%config IPCompleter.use_jedi=False
# nltk.download('twitter_samples')

In [2]:
tf.config.list_physical_devices('GPU') 

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

## Load the Dataset

In [4]:
corpus = pd.read_pickle('../../../0_data/MPQA/MPQA.pkl')
corpus.label = corpus.label.astype(int)
print(corpus.shape)
corpus

(10606, 3)


,sentence,label,split
0,complaining,0,train
1,failing to support,0,train
2,desperately needs,0,train
3,many years of decay,0,train
4,no quick fix,0,train
...,...,...,...
10601,urged,1,train
10602,strictly abide,1,train
10603,hope,1,train
10604,strictly abide,1,train


In [5]:
corpus.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10606 entries, 0 to 10605
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   sentence  10606 non-null  object
 1   label     10606 non-null  int32 
 2   split     10606 non-null  object
dtypes: int32(1), object(2)
memory usage: 207.3+ KB


In [6]:
corpus.groupby( by='label').count()

label,sentence,split
0,7294,7294
1,3312,3312


In [7]:
# Separate the sentences and the labels
sentences, labels = list(corpus.sentence), list(corpus.label)

In [8]:
sentences[0]

'complaining'


# Data Preprocessing
<hr>

When preparing data for word embeddings, especially pre-trained embeddings such as Word2Vec or GloVe, __don't apply standard preprocessing steps like stemming or stopword removal__. When we cleaned text for count-based feature extraction (e.g. TF-IDF), we removed stopwords and stemmed the words. Here we keep these words, because we do not want to lose information that might help the model learn better.

__Tomas Mikolov__, one of the developers of Word2Vec, in _word2vec-toolkit: google groups thread., 2015_, suggests that only very minimal text cleaning is required when learning a word embedding model.

In short, what we will do is:
- Punctuation removal
- Lowercasing
- Tokenization

The process above will be handled by the __Tokenizer__ class in TensorFlow.

- <b>One way to choose the maximum sequence length is to just pick the length of the longest sentence in the training set.</b>
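The three cleaning steps and the padding to a common length can be sketched in plain Python as follows. This is an illustration only, not the Keras implementation: the helper names are hypothetical, the `FILTERS` string is assumed to match the default `filters` argument of the Keras `Tokenizer`, and ties between equally frequent words are broken alphabetically here, which may differ from Keras.

```python
from collections import Counter

# Characters treated as punctuation (assumed to equal the Keras Tokenizer default)
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'

def simple_tokenize(text):
    """Lowercase, replace punctuation with spaces, then split on whitespace."""
    table = str.maketrans({ch: " " for ch in FILTERS})
    return text.lower().translate(table).split()

def build_word_index(texts, oov_token="<UNK>"):
    """Index 0 is reserved for padding, 1 for OOV; other words follow by frequency."""
    counts = Counter(tok for text in texts for tok in simple_tokenize(text))
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {oov_token: 1, **{w: i + 2 for i, w in enumerate(ordered)}}

def encode_and_pad(texts, word_index, maxlen, oov_token="<UNK>"):
    """Map words to integers, truncate at the end, and pad at the end with zeros."""
    oov = word_index[oov_token]
    rows = []
    for text in texts:
        ids = [word_index.get(tok, oov) for tok in simple_tokenize(text)][:maxlen]
        rows.append(ids + [0] * (maxlen - len(ids)))
    return rows

train = ["No quick fix.", "a very complicated process", "no fix"]
word_index = build_word_index(train)
print(encode_and_pad(["No quick fix!", "unknown word"], word_index, maxlen=4))
```

Words that were not seen when the index was built map to the OOV index 1, and index 0 never refers to a word, which is why the vocabulary size passed to the embedding layer is `len(word_index) + 1`.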

In [9]:
# Define a function to compute the max length of sequence
def max_length(sequences):
    '''
    input:
        sequences: a 2D list of integer sequences
    output:
        max_length: the max length of the sequences
    '''
    # default=0 keeps the original behaviour for an empty list
    return max((len(seq) for seq in sequences), default=0)

In [12]:
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

trunc_type='post'
padding_type='post'
oov_tok = "<UNK>"

# Separate the sentences and the labels
sentences, labels = list(corpus.sentence), list(corpus.label)

# Cleaning and Tokenization
tokenizer = Tokenizer(oov_token=oov_tok)
tokenizer.fit_on_texts(sentences)

print("Example of sentence: ", sentences[8])

# Turn the text into sequence
training_sequences = tokenizer.texts_to_sequences(sentences)
max_len = max_length(training_sequences)

print('Into a sequence of int:', training_sequences[8])

# Pad the sequence to have the same size
training_padded = pad_sequences(training_sequences, maxlen=max_len, padding=padding_type, truncating=trunc_type)
print('Into a padded sequence:', training_padded[8])

Example of sentence:  a very complicated process
Into a sequence of int: [5, 44, 946, 581]
Into a padded sequence: [  5  44 946 581   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]


In [13]:
word_index = tokenizer.word_index
# See the first 10 words in the vocabulary
for i, word in enumerate(word_index):
    print(word, word_index.get(word))
    if i==9:
        break
vocab_size = len(word_index)+1
print(vocab_size)

<UNK> 1
the 2
of 3
to 4
a 5
and 6
not 7
is 8
in 9
be 10
6236


# Model 1: Embedding Random
<hr>

A __standard model__ for document classification is to use (quoted from __Jason Brownlee__, the author of [machinelearningmastery.com](https://machinelearningmastery.com)):
>- Word Embedding: A distributed representation of words where different words that have a similar meaning (based on their usage) also have a similar representation.
>- Convolutional Model: A feature extraction model that learns to extract salient features from documents represented using a word embedding.
>- Fully Connected Model: The interpretation of extracted features in terms of a predictive output.


Therefore, the model is comprised of the following elements:
- __Input layer__ that defines the length of input sequences.
- __Embedding layer__ set to the size of the vocabulary, with 300-dimensional real-valued representations.
- __Conv1D layer__ with 100 filters and a kernel size set to the number of words to read at once.
- __MaxPooling1D layer__ to consolidate the output from the convolutional layer.
- __Flatten layer__ to reduce the three-dimensional output to two dimensions for the fully connected layers.

The CNN model is inspired by __Yoon Kim__'s paper on using word embeddings with a CNN for text classification. The hyperparameters we use, based on his study, are as follows:
- Transfer function: rectified linear.
- Kernel sizes: 3, 4, 5.
- Number of filters: 100.
- Dropout rate: 0.5.
- Weight regularization (L2): 3.
- Batch Size: 50.
- Update Rule: Adam
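To make the convolution and pooling steps concrete, here is a minimal plain-Python sketch of a single filter sliding over a sequence of embedding vectors, followed by ReLU and max pooling. The function names and the toy numbers are hypothetical; a real `Conv1D` layer applies many such filters in parallel.

```python
def conv1d_single_filter(sequence, kernel, bias=0.0):
    """Slide one filter over the sequence and apply ReLU.

    sequence: list of T embedding vectors, each of length D
    kernel:   list of k weight vectors, each of length D (k = kernel size)
    """
    k = len(kernel)
    outputs = []
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        total = bias + sum(x * w
                           for vec, wvec in zip(window, kernel)
                           for x, w in zip(vec, wvec))
        outputs.append(max(0.0, total))  # ReLU
    return outputs

def max_pool1d(values, pool_size=2):
    """Keep the maximum of each non-overlapping window of `pool_size` values."""
    return [max(values[i:i + pool_size])
            for i in range(0, len(values) - pool_size + 1, pool_size)]

# Toy input: 5 "words" with 2-dimensional embeddings, one filter of kernel size 2
sequence = [[1, 0], [0, 1], [1, 1], [0, 0], [2, 1]]
kernel = [[1, -1], [0, 1]]
features = conv1d_single_filter(sequence, kernel)
print(features)              # one value per window of 2 consecutive words
print(max_pool1d(features))  # halves the length of the feature map
```

A kernel size of `k` means the filter reads `k` consecutive words at once, so a sequence of length `T` produces `T - k + 1` feature values before pooling.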

We will search for the best parameters using __grid search__ and 10-fold cross-validation.

## CNN Model

Now we will build Convolutional Neural Network (CNN) models to classify the encoded documents as either positive or negative.

The model takes inspiration from `Convolutional Neural Networks for Sentence Classification` by *Yoon Kim*.

We define our CNN model as follows:
- One Conv layer with 100 filters, with the kernel size and the activation function (ReLU or tanh) chosen by the grid search;
- One MaxPool layer with pool size = 2;
- One Dropout layer after flattening and another after the hidden Dense layer;
- Optimizer: Adam (a widely used adaptive optimizer);
- Loss function: binary cross-entropy (suited to binary classification problems).
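As a worked illustration of the loss named above, binary cross-entropy averages `-(y*log(p) + (1-y)*log(1-p))` over the samples, where `y` is the 0/1 label and `p` the sigmoid output. The sketch below is plain Python; the function name and the clipping constant `eps` are assumptions made for this example.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Confident and correct predictions give a small loss
print(binary_cross_entropy([1, 0], [0.9, 0.1]))
# Confident and wrong predictions give a large loss
print(binary_cross_entropy([1, 0], [0.1, 0.9]))
```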

**Note**: 
- The whole purpose of dropout layers is to tackle the problem of over-fitting and to introduce generalization to the model. Hence it is advisable to keep dropout parameter near 0.5 in hidden layers. 
- https://missinglink.ai/guides/keras/keras-conv1d-working-1d-convolutional-neural-networks-keras/
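The dropout behaviour described in the note can be sketched in a few lines of plain Python. This assumes the common "inverted dropout" formulation, in which surviving activations are scaled by `1 / (1 - rate)` during training so that the expected value is unchanged; the function name is hypothetical.

```python
import random

def dropout(activations, rate=0.5, training=True, rng=None):
    """Zero each activation with probability `rate`; rescale the survivors."""
    if not training or rate == 0.0:
        return list(activations)   # dropout is a no-op at inference time
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
out = dropout([1.0] * 10, rate=0.5, rng=rng)
print(out)                                      # a mix of 0.0 and 2.0
print(dropout([1.0, 2.0], training=False))      # unchanged at inference time
```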

In [14]:
from tensorflow.keras import regularizers
from tensorflow.keras.constraints import MaxNorm

def define_model(filters = 100, kernel_size = 3, activation='relu', input_dim = None, output_dim=300, max_length = None ):
    
    model = tf.keras.models.Sequential([
        # Use the input_dim argument (vocabulary size) rather than the global vocab_size
        tf.keras.layers.Embedding(input_dim=input_dim, 
                                  output_dim=output_dim, 
                                  input_length=max_length, 
                                  input_shape=(max_length, )),
        
        tf.keras.layers.Conv1D(filters=filters, kernel_size = kernel_size, activation = activation, 
                               # constrain the norm of each filter, computed over the
                               # kernel-size and input-channel axes of the Conv1D weights
                               kernel_constraint= MaxNorm( max_value=3, axis=[0,1])),
        
        tf.keras.layers.MaxPool1D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(10, activation=activation, 
                              # set axis to 0 to constrain each weight vector of length (input_dim,) in dense layer
                              kernel_constraint = MaxNorm( max_value=3, axis=0)),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(units=1, activation='sigmoid')
    ])
    
    model.compile( loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
#     model.summary()
    return model

In [15]:
# vocab_size (6236) comes from the tokenizer cell above
model_0 = define_model( input_dim=vocab_size, max_length=100)
model_0.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, 100, 300)          1870800   
_________________________________________________________________
conv1d (Conv1D)              (None, 98, 100)           90100     
_________________________________________________________________
max_pooling1d (MaxPooling1D) (None, 49, 100)           0         
_________________________________________________________________
flatten (Flatten)            (None, 4900)              0         
_________________________________________________________________
dropout (Dropout)            (None, 4900)              0         
_________________________________________________________________
dense (Dense)                (None, 10)                49010     
_________________________________________________________________
dropout_1 (Dropout)          (None, 10)                0
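The output shapes and parameter counts in the summary above can be reproduced by hand. The sketch below assumes "valid" convolution padding, a vocabulary size of 6236, and the default kernel size of 3; the helper name `cnn_summary` is hypothetical.

```python
def cnn_summary(vocab_size, embed_dim, seq_len, filters, kernel_size,
                pool_size=2, dense_units=10):
    """Compute shapes and parameter counts for Embedding -> Conv1D -> MaxPool -> Dense."""
    conv_len = seq_len - kernel_size + 1   # 'valid' convolution
    pooled_len = conv_len // pool_size     # non-overlapping max pooling
    flat = pooled_len * filters
    return {
        "embedding_params": vocab_size * embed_dim,
        "conv_output": (conv_len, filters),
        "conv_params": kernel_size * embed_dim * filters + filters,  # weights + biases
        "pool_output": (pooled_len, filters),
        "flatten_output": flat,
        "dense_params": flat * dense_units + dense_units,            # weights + biases
    }

for name, value in cnn_summary(6236, 300, 100, 100, 3).items():
    print(name, value)
```

The embedding layer accounts for most of the parameters (6236 × 300 = 1,870,800), while the convolution contributes 3 × 300 × 100 + 100 = 90,100.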

In [16]:
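class myCallback(tf.keras.callbacks.Callback):
    # Override on_epoch_end() to stop training once training accuracy exceeds 93%.
    # Note: this callback is kept for reference; it is not passed to model.fit() below.
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        if logs.get('accuracy', 0) > 0.93:
            print("\nReached 93% accuracy so cancelling training!")
            self.model.stop_training=True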


callbacks = tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', min_delta=0, 
                                             patience=5, verbose=2, 
                                             mode='auto', restore_best_weights=True)
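The patience logic of early stopping can be sketched in plain Python. This is a simplified illustration, not the Keras implementation: the function name is hypothetical, and it only tracks a metric that should increase, such as `val_accuracy`.

```python
def early_stopping_epoch(val_scores, patience=5, min_delta=0.0):
    """Return (stop_epoch, best_epoch); stop_epoch is None if training runs to the end."""
    best = float("-inf")
    best_epoch = 0
    wait = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score - min_delta > best:
            best, best_epoch, wait = score, epoch, 0   # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch               # no improvement for `patience` epochs
    return None, best_epoch

# Validation accuracies of the first fold in the training log
first_fold = [0.8435, 0.8549, 0.8483, 0.8435, 0.8530, 0.8464, 0.8464]
print(early_stopping_epoch(first_fold))
```

With `patience=5`, training stops five epochs after the best validation accuracy, and `restore_best_weights=True` rolls the model back to that best epoch, so the accuracy evaluated afterwards corresponds to the best epoch rather than the last one.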

## Train and Test the Model

In [17]:
# Parameter Initialization
trunc_type='post'
padding_type='post'
oov_tok = "<UNK>"
activations = ['relu', 'tanh']
filters = 100
kernel_sizes = [1, 2, 3, 4, 5, 6]

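# Note: the 'Filters' column records the kernel size; the number of filters is fixed at 100
columns = ['Activation', 'Filters', 'acc1', 'acc2', 'acc3', 'acc4', 'acc5', 'acc6', 'acc7', 'acc8', 'acc9', 'acc10', 'AVG']
record = pd.DataFrame(columns = columns)

# prepare cross validation with 10 splits and shuffle = True
# (keyword arguments are required by recent scikit-learn versions)
kfold = KFold(n_splits=10, shuffle=True)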

# Separate the sentences and the labels
sentences, labels = list(corpus.sentence), list(corpus.label)

for activation in activations:
    for kernel_size in kernel_sizes:
        # kfold.split() will return set indices for each split
        acc_list = []
        for train, test in kfold.split(sentences):
            
            train_x, test_x = [], []
            train_y, test_y = [], []
            
            for i in train:
                train_x.append(sentences[i])
                train_y.append(labels[i])

            for i in test:
                test_x.append(sentences[i])
                test_y.append(labels[i])

            # Turn the labels into a numpy array
            train_y = np.array(train_y)
            test_y = np.array(test_y)

            # Encode the data with a tokenizer fitted on the training fold only
            # Cleaning and Tokenization
            tokenizer = Tokenizer(oov_token=oov_tok)
            tokenizer.fit_on_texts(train_x)

            # Turn the text into sequence
            training_sequences = tokenizer.texts_to_sequences(train_x)
            test_sequences = tokenizer.texts_to_sequences(test_x)

            max_len = max_length(training_sequences)

            # Pad the sequence to have the same size
            Xtrain = pad_sequences(training_sequences, maxlen=max_len, padding=padding_type, truncating=trunc_type)
            Xtest = pad_sequences(test_sequences, maxlen=max_len, padding=padding_type, truncating=trunc_type)

            word_index = tokenizer.word_index
            vocab_size = len(word_index)+1

            # Define the input shape
            model = define_model(filters, kernel_size, activation, input_dim=vocab_size, max_length=max_len)

            # Train the model
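            # Note: the test fold is also used as validation data for early stopping,
            # so the accuracy reported below is an optimistic estimate
            model.fit(Xtrain, train_y, batch_size=50, epochs=15, verbose=2, 
                      callbacks=[callbacks], validation_data=(Xtest, test_y))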

            # evaluate the model
            loss, acc = model.evaluate(Xtest, test_y, verbose=0)
            print('Test Accuracy: {}'.format(acc*100))

            acc_list.append(acc*100)
            
        mean_acc = np.array(acc_list).mean()
        parameters = [activation, kernel_size]
        entries = parameters + acc_list + [mean_acc]

        temp = pd.DataFrame([entries], columns=columns)
        # DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
        record = pd.concat([record, temp], ignore_index=True)
        print()
        print(record)
        print()



Epoch 1/15
191/191 - 12s - loss: 0.5715 - accuracy: 0.7051 - val_loss: 0.4476 - val_accuracy: 0.8435
Epoch 2/15
191/191 - 6s - loss: 0.3333 - accuracy: 0.8698 - val_loss: 0.4152 - val_accuracy: 0.8549
Epoch 3/15
191/191 - 6s - loss: 0.2107 - accuracy: 0.9331 - val_loss: 0.4450 - val_accuracy: 0.8483
Epoch 4/15
191/191 - 7s - loss: 0.1787 - accuracy: 0.9459 - val_loss: 0.5061 - val_accuracy: 0.8435
Epoch 5/15
191/191 - 7s - loss: 0.1517 - accuracy: 0.9537 - val_loss: 0.5590 - val_accuracy: 0.8530
Epoch 6/15
191/191 - 7s - loss: 0.1390 - accuracy: 0.9577 - val_loss: 0.5864 - val_accuracy: 0.8464
Epoch 7/15
191/191 - 7s - loss: 0.1331 - accuracy: 0.9580 - val_loss: 0.6178 - val_accuracy: 0.8464
Restoring model weights from the end of the best epoch.
Epoch 00007: early stopping
Test Accuracy: 85.48539280891418
Epoch 1/15
191/191 - 9s - loss: 0.5902 - accuracy: 0.6865 - val_loss: 0.4577 - val_accuracy: 0.7936
Epoch 2/15
191/191 - 7s - loss: 0.3521 - accuracy: 0.8749 - val_loss: 0.3611 - val

Test Accuracy: 88.01887035369873
Epoch 1/15
191/191 - 8s - loss: 0.6219 - accuracy: 0.6873 - val_loss: 0.5207 - val_accuracy: 0.6745
Epoch 2/15
191/191 - 7s - loss: 0.4104 - accuracy: 0.8284 - val_loss: 0.3595 - val_accuracy: 0.8547
Epoch 3/15
191/191 - 11s - loss: 0.2541 - accuracy: 0.9301 - val_loss: 0.3622 - val_accuracy: 0.8651
Epoch 4/15
191/191 - 7s - loss: 0.2082 - accuracy: 0.9427 - val_loss: 0.4042 - val_accuracy: 0.8575
Epoch 5/15
191/191 - 8s - loss: 0.1782 - accuracy: 0.9510 - val_loss: 0.4418 - val_accuracy: 0.8585
Epoch 6/15
191/191 - 8s - loss: 0.1566 - accuracy: 0.9574 - val_loss: 0.5155 - val_accuracy: 0.8453
Epoch 7/15
191/191 - 8s - loss: 0.1425 - accuracy: 0.9614 - val_loss: 0.5399 - val_accuracy: 0.8462
Epoch 8/15
191/191 - 8s - loss: 0.1327 - accuracy: 0.9616 - val_loss: 0.5727 - val_accuracy: 0.8443
Restoring model weights from the end of the best epoch.
Epoch 00008: early stopping
Test Accuracy: 86.50943636894226

  Activation Filters       acc1       acc2      

Epoch 1/15
191/191 - 9s - loss: 0.5795 - accuracy: 0.7132 - val_loss: 0.4173 - val_accuracy: 0.8519
Epoch 2/15
191/191 - 8s - loss: 0.3426 - accuracy: 0.8757 - val_loss: 0.3596 - val_accuracy: 0.8708
Epoch 3/15
191/191 - 9s - loss: 0.2087 - accuracy: 0.9321 - val_loss: 0.3973 - val_accuracy: 0.8575
Epoch 4/15
191/191 - 9s - loss: 0.1678 - accuracy: 0.9463 - val_loss: 0.4674 - val_accuracy: 0.8594
Epoch 5/15
191/191 - 9s - loss: 0.1385 - accuracy: 0.9550 - val_loss: 0.5085 - val_accuracy: 0.8623
Epoch 6/15
191/191 - 8s - loss: 0.1165 - accuracy: 0.9580 - val_loss: 0.5853 - val_accuracy: 0.8557
Epoch 7/15
191/191 - 8s - loss: 0.1015 - accuracy: 0.9642 - val_loss: 0.6634 - val_accuracy: 0.8453
Restoring model weights from the end of the best epoch.
Epoch 00007: early stopping
Test Accuracy: 87.07547187805176
Epoch 1/15
191/191 - 11s - loss: 0.5824 - accuracy: 0.7064 - val_loss: 0.4451 - val_accuracy: 0.8274
Epoch 2/15
191/191 - 10s - loss: 0.3427 - accuracy: 0.8553 - val_loss: 0.3901 - va

Epoch 7/15
191/191 - 8s - loss: 0.0950 - accuracy: 0.9697 - val_loss: 0.5684 - val_accuracy: 0.8736
Restoring model weights from the end of the best epoch.
Epoch 00007: early stopping
Test Accuracy: 88.86792659759521
Epoch 1/15
191/191 - 9s - loss: 0.5950 - accuracy: 0.6976 - val_loss: 0.4756 - val_accuracy: 0.8113
Epoch 2/15
191/191 - 8s - loss: 0.3548 - accuracy: 0.8498 - val_loss: 0.4107 - val_accuracy: 0.8009
Epoch 3/15
191/191 - 11s - loss: 0.2177 - accuracy: 0.9279 - val_loss: 0.4689 - val_accuracy: 0.7943
Epoch 4/15
191/191 - 10s - loss: 0.1593 - accuracy: 0.9470 - val_loss: 0.5767 - val_accuracy: 0.7840
Epoch 5/15
191/191 - 8s - loss: 0.1354 - accuracy: 0.9556 - val_loss: 0.6309 - val_accuracy: 0.7764
Epoch 6/15
191/191 - 8s - loss: 0.1166 - accuracy: 0.9596 - val_loss: 0.6562 - val_accuracy: 0.7877
Restoring model weights from the end of the best epoch.
Epoch 00006: early stopping
Test Accuracy: 81.13207817077637
Epoch 1/15
191/191 - 14s - loss: 0.6138 - accuracy: 0.6867 - val

191/191 - 9s - loss: 0.2496 - accuracy: 0.8770 - val_loss: 0.3522 - val_accuracy: 0.8728
Epoch 4/15
191/191 - 9s - loss: 0.2137 - accuracy: 0.8983 - val_loss: 0.3822 - val_accuracy: 0.8586
Epoch 5/15
191/191 - 10s - loss: 0.1672 - accuracy: 0.9252 - val_loss: 0.4353 - val_accuracy: 0.8605
Epoch 6/15
191/191 - 9s - loss: 0.1339 - accuracy: 0.9417 - val_loss: 0.6284 - val_accuracy: 0.8351
Epoch 7/15
191/191 - 9s - loss: 0.1132 - accuracy: 0.9629 - val_loss: 0.6229 - val_accuracy: 0.8134
Epoch 8/15
191/191 - 9s - loss: 0.0980 - accuracy: 0.9661 - val_loss: 0.6948 - val_accuracy: 0.8407
Restoring model weights from the end of the best epoch.
Epoch 00008: early stopping
Test Accuracy: 87.27615475654602
Epoch 1/15
191/191 - 13s - loss: 0.6056 - accuracy: 0.6889 - val_loss: 0.5134 - val_accuracy: 0.6613
Epoch 2/15
191/191 - 11s - loss: 0.3700 - accuracy: 0.8467 - val_loss: 0.3830 - val_accuracy: 0.8462
Epoch 3/15
191/191 - 10s - loss: 0.2254 - accuracy: 0.9298 - val_loss: 0.4095 - val_accurac

Epoch 8/15
191/191 - 8s - loss: 0.0827 - accuracy: 0.9746 - val_loss: 0.7902 - val_accuracy: 0.8002
Epoch 9/15
191/191 - 8s - loss: 0.0849 - accuracy: 0.9727 - val_loss: 0.7253 - val_accuracy: 0.8388
Restoring model weights from the end of the best epoch.
Epoch 00009: early stopping
Test Accuracy: 86.33365035057068
Epoch 1/15
191/191 - 10s - loss: 0.5994 - accuracy: 0.6908 - val_loss: 0.4577 - val_accuracy: 0.8473
Epoch 2/15
191/191 - 9s - loss: 0.3672 - accuracy: 0.8356 - val_loss: 0.3643 - val_accuracy: 0.8680
Epoch 3/15
191/191 - 9s - loss: 0.2102 - accuracy: 0.9332 - val_loss: 0.3930 - val_accuracy: 0.8596
Epoch 4/15
191/191 - 9s - loss: 0.1539 - accuracy: 0.9576 - val_loss: 0.4313 - val_accuracy: 0.8567
Epoch 5/15
191/191 - 9s - loss: 0.1230 - accuracy: 0.9627 - val_loss: 0.5155 - val_accuracy: 0.8549
Epoch 6/15
191/191 - 9s - loss: 0.1001 - accuracy: 0.9696 - val_loss: 0.6146 - val_accuracy: 0.8068
Epoch 7/15
191/191 - 9s - loss: 0.0877 - accuracy: 0.9711 - val_loss: 0.7082 - val

Test Accuracy: 80.6786060333252
Epoch 1/15
191/191 - 10s - loss: 0.6090 - accuracy: 0.6935 - val_loss: 0.5094 - val_accuracy: 0.8058
Epoch 2/15
191/191 - 9s - loss: 0.3720 - accuracy: 0.8373 - val_loss: 0.4046 - val_accuracy: 0.8407
Epoch 3/15
191/191 - 9s - loss: 0.2212 - accuracy: 0.9141 - val_loss: 0.4670 - val_accuracy: 0.7917
Epoch 4/15
191/191 - 9s - loss: 0.1566 - accuracy: 0.9301 - val_loss: 0.4954 - val_accuracy: 0.8351
Epoch 5/15
191/191 - 10s - loss: 0.1354 - accuracy: 0.9451 - val_loss: 0.5613 - val_accuracy: 0.8266
Epoch 6/15
191/191 - 9s - loss: 0.1044 - accuracy: 0.9535 - val_loss: 0.7394 - val_accuracy: 0.8219
Epoch 7/15
191/191 - 9s - loss: 0.0932 - accuracy: 0.9586 - val_loss: 0.7349 - val_accuracy: 0.8275
Restoring model weights from the end of the best epoch.
Epoch 00007: early stopping
Test Accuracy: 84.0716302394867
Epoch 1/15
191/191 - 10s - loss: 0.6044 - accuracy: 0.6856 - val_loss: 0.4603 - val_accuracy: 0.7022
Epoch 2/15
191/191 - 8s - loss: 0.3899 - accuracy

Epoch 2/15
191/191 - 7s - loss: 0.2964 - accuracy: 0.9010 - val_loss: 0.3925 - val_accuracy: 0.8379
Epoch 3/15
191/191 - 7s - loss: 0.1903 - accuracy: 0.9459 - val_loss: 0.4217 - val_accuracy: 0.8417
Epoch 4/15
191/191 - 7s - loss: 0.1554 - accuracy: 0.9540 - val_loss: 0.4242 - val_accuracy: 0.8511
Epoch 5/15
191/191 - 7s - loss: 0.1352 - accuracy: 0.9604 - val_loss: 0.4668 - val_accuracy: 0.8417
Epoch 6/15
191/191 - 7s - loss: 0.1228 - accuracy: 0.9628 - val_loss: 0.4638 - val_accuracy: 0.8464
Epoch 7/15
191/191 - 7s - loss: 0.1052 - accuracy: 0.9689 - val_loss: 0.5371 - val_accuracy: 0.8322
Epoch 8/15
191/191 - 7s - loss: 0.0978 - accuracy: 0.9721 - val_loss: 0.5047 - val_accuracy: 0.8539
Epoch 9/15
191/191 - 7s - loss: 0.0903 - accuracy: 0.9737 - val_loss: 0.5596 - val_accuracy: 0.8445
Epoch 10/15
191/191 - 7s - loss: 0.0879 - accuracy: 0.9733 - val_loss: 0.5584 - val_accuracy: 0.8473
Epoch 11/15
191/191 - 7s - loss: 0.0821 - accuracy: 0.9739 - val_loss: 0.5751 - val_accuracy: 0.837

Epoch 3/15
191/191 - 7s - loss: 0.1842 - accuracy: 0.9422 - val_loss: 0.4148 - val_accuracy: 0.8226
Epoch 4/15
191/191 - 7s - loss: 0.1497 - accuracy: 0.9534 - val_loss: 0.4563 - val_accuracy: 0.8160
Epoch 5/15
191/191 - 7s - loss: 0.1281 - accuracy: 0.9606 - val_loss: 0.4586 - val_accuracy: 0.8283
Epoch 6/15
191/191 - 7s - loss: 0.1106 - accuracy: 0.9662 - val_loss: 0.5040 - val_accuracy: 0.8170
Restoring model weights from the end of the best epoch.
Epoch 00006: early stopping
Test Accuracy: 83.77358317375183

  Activation Filters       acc1       acc2       acc3       acc4       acc5  \
0       relu       1  85.485393  85.862392  86.993402  85.956645  86.616397   
1       relu       2  85.673892  86.145145  85.956645  86.050898  85.956645   
2       relu       3  86.333650  84.071630  85.862392  84.825635  85.768145   
3       relu       4  86.522150  85.862392  86.616397  85.108387  85.391140   
4       relu       5  84.260130  85.862392  87.276155  86.333650  86.804903   
5       

Test Accuracy: 88.49056363105774
Epoch 1/15
191/191 - 9s - loss: 0.5560 - accuracy: 0.7373 - val_loss: 0.3865 - val_accuracy: 0.8491
Epoch 2/15
191/191 - 7s - loss: 0.2947 - accuracy: 0.9021 - val_loss: 0.3623 - val_accuracy: 0.8651
Epoch 3/15
191/191 - 8s - loss: 0.1821 - accuracy: 0.9452 - val_loss: 0.3936 - val_accuracy: 0.8453
Epoch 4/15
191/191 - 7s - loss: 0.1439 - accuracy: 0.9614 - val_loss: 0.4341 - val_accuracy: 0.8481
Epoch 5/15
191/191 - 7s - loss: 0.1177 - accuracy: 0.9669 - val_loss: 0.4636 - val_accuracy: 0.8462
Epoch 6/15
191/191 - 7s - loss: 0.1017 - accuracy: 0.9716 - val_loss: 0.5090 - val_accuracy: 0.8481
Epoch 7/15
191/191 - 7s - loss: 0.0904 - accuracy: 0.9731 - val_loss: 0.5291 - val_accuracy: 0.8453
Restoring model weights from the end of the best epoch.
Epoch 00007: early stopping
Test Accuracy: 86.50943636894226
Epoch 1/15
191/191 - 9s - loss: 0.5636 - accuracy: 0.7362 - val_loss: 0.4125 - val_accuracy: 0.8292
Epoch 2/15
191/191 - 7s - loss: 0.3043 - accuracy:

Restoring model weights from the end of the best epoch.
Epoch 00008: early stopping
Test Accuracy: 84.91988778114319
Epoch 1/15
191/191 - 9s - loss: 0.5662 - accuracy: 0.7290 - val_loss: 0.3848 - val_accuracy: 0.8360
Epoch 2/15
191/191 - 7s - loss: 0.2967 - accuracy: 0.9024 - val_loss: 0.3337 - val_accuracy: 0.8671
Epoch 3/15
191/191 - 7s - loss: 0.1708 - accuracy: 0.9499 - val_loss: 0.3639 - val_accuracy: 0.8435
Epoch 4/15
191/191 - 8s - loss: 0.1266 - accuracy: 0.9629 - val_loss: 0.4507 - val_accuracy: 0.8153
Epoch 5/15
191/191 - 8s - loss: 0.1018 - accuracy: 0.9692 - val_loss: 0.4703 - val_accuracy: 0.8473
Epoch 6/15
191/191 - 8s - loss: 0.0842 - accuracy: 0.9746 - val_loss: 0.5587 - val_accuracy: 0.8369
Epoch 7/15
191/191 - 8s - loss: 0.0816 - accuracy: 0.9754 - val_loss: 0.5459 - val_accuracy: 0.8153
Restoring model weights from the end of the best epoch.
Epoch 00007: early stopping
Test Accuracy: 86.71064972877502
Epoch 1/15
191/191 - 9s - loss: 0.5675 - accuracy: 0.7260 - val_lo

Epoch 7/15
191/191 - 8s - loss: 0.0797 - accuracy: 0.9760 - val_loss: 0.5263 - val_accuracy: 0.8530
Restoring model weights from the end of the best epoch.
Epoch 00007: early stopping
Test Accuracy: 86.8991494178772
Epoch 1/15
191/191 - 10s - loss: 0.5713 - accuracy: 0.7221 - val_loss: 0.3984 - val_accuracy: 0.8238
Epoch 2/15
191/191 - 8s - loss: 0.2891 - accuracy: 0.9046 - val_loss: 0.3641 - val_accuracy: 0.8624
Epoch 3/15
191/191 - 8s - loss: 0.1770 - accuracy: 0.9517 - val_loss: 0.4073 - val_accuracy: 0.8483
Epoch 4/15
191/191 - 8s - loss: 0.1312 - accuracy: 0.9635 - val_loss: 0.4621 - val_accuracy: 0.8332
Epoch 5/15
191/191 - 8s - loss: 0.1035 - accuracy: 0.9717 - val_loss: 0.5072 - val_accuracy: 0.8068
Epoch 6/15
191/191 - 8s - loss: 0.0905 - accuracy: 0.9755 - val_loss: 0.5216 - val_accuracy: 0.8294
Epoch 7/15
191/191 - 8s - loss: 0.0820 - accuracy: 0.9763 - val_loss: 0.5289 - val_accuracy: 0.8473
Restoring model weights from the end of the best epoch.
Epoch 00007: early stopping

Epoch 8/15
191/191 - 11s - loss: 0.0607 - accuracy: 0.9795 - val_loss: 0.5990 - val_accuracy: 0.8294
Epoch 9/15
191/191 - 9s - loss: 0.0564 - accuracy: 0.9805 - val_loss: 0.6204 - val_accuracy: 0.8228
Epoch 10/15
191/191 - 10s - loss: 0.0597 - accuracy: 0.9806 - val_loss: 0.6378 - val_accuracy: 0.8238
Epoch 11/15
191/191 - 10s - loss: 0.0568 - accuracy: 0.9822 - val_loss: 0.6164 - val_accuracy: 0.8567
Epoch 12/15
191/191 - 10s - loss: 0.0556 - accuracy: 0.9794 - val_loss: 0.6754 - val_accuracy: 0.8153
Epoch 13/15
191/191 - 9s - loss: 0.0543 - accuracy: 0.9805 - val_loss: 0.6718 - val_accuracy: 0.8106
Epoch 14/15
191/191 - 12s - loss: 0.0486 - accuracy: 0.9838 - val_loss: 0.6473 - val_accuracy: 0.8454
Epoch 15/15
191/191 - 8s - loss: 0.0498 - accuracy: 0.9829 - val_loss: 0.6701 - val_accuracy: 0.8030
Test Accuracy: 80.30160069465637
Epoch 1/15
191/191 - 11s - loss: 0.5691 - accuracy: 0.7188 - val_loss: 0.4266 - val_accuracy: 0.8322
Epoch 2/15
191/191 - 9s - loss: 0.2822 - accuracy: 0.90

Epoch 4/15
191/191 - 8s - loss: 0.1208 - accuracy: 0.9671 - val_loss: 0.4694 - val_accuracy: 0.8132
Epoch 5/15
191/191 - 8s - loss: 0.0972 - accuracy: 0.9728 - val_loss: 0.5513 - val_accuracy: 0.8038
Epoch 6/15
191/191 - 9s - loss: 0.0846 - accuracy: 0.9776 - val_loss: 0.5648 - val_accuracy: 0.8000
Restoring model weights from the end of the best epoch.
Epoch 00006: early stopping
Test Accuracy: 83.96226167678833

   Activation Filters       acc1       acc2       acc3       acc4       acc5  \
0        relu       1  85.485393  85.862392  86.993402  85.956645  86.616397   
1        relu       2  85.673892  86.145145  85.956645  86.050898  85.956645   
2        relu       3  86.333650  84.071630  85.862392  84.825635  85.768145   
3        relu       4  86.522150  85.862392  86.616397  85.108387  85.391140   
4        relu       5  84.260130  85.862392  87.276155  86.333650  86.804903   
5        relu       6  84.542882  80.678606  84.071630  86.522150  85.862392   
6        tanh       1 

Epoch 7/15
191/191 - 9s - loss: 0.0742 - accuracy: 0.9773 - val_loss: 0.5734 - val_accuracy: 0.8292
Restoring model weights from the end of the best epoch.
Epoch 00007: early stopping
Test Accuracy: 85.94339489936829
Epoch 1/15
191/191 - 9s - loss: 0.5788 - accuracy: 0.7187 - val_loss: 0.4064 - val_accuracy: 0.8472
Epoch 2/15
191/191 - 8s - loss: 0.2947 - accuracy: 0.9045 - val_loss: 0.3850 - val_accuracy: 0.8179
Epoch 3/15
191/191 - 8s - loss: 0.1715 - accuracy: 0.9540 - val_loss: 0.4311 - val_accuracy: 0.8142
Epoch 4/15
191/191 - 8s - loss: 0.1317 - accuracy: 0.9666 - val_loss: 0.4440 - val_accuracy: 0.8566
Epoch 5/15
191/191 - 8s - loss: 0.1110 - accuracy: 0.9695 - val_loss: 0.5196 - val_accuracy: 0.8075
Epoch 6/15
191/191 - 8s - loss: 0.0969 - accuracy: 0.9738 - val_loss: 0.5448 - val_accuracy: 0.8132
Epoch 7/15
191/191 - 8s - loss: 0.0868 - accuracy: 0.9755 - val_loss: 0.5472 - val_accuracy: 0.8160
Epoch 8/15
191/191 - 9s - loss: 0.0753 - accuracy: 0.9783 - val_loss: 0.5871 - val_

## Summary

In [18]:
record.sort_values(by='AVG', ascending=False)

,Activation,Filters,acc1,acc2,acc3,acc4,acc5,acc6,acc7,acc8,acc9,acc10,AVG
0,relu,1,85.485393,85.862392,86.993402,85.956645,86.616397,84.825635,87.641507,85.849059,88.01887,86.509436,86.375874
3,relu,4,86.52215,85.862392,86.616397,85.108387,85.39114,87.276155,84.622639,87.075472,84.622639,86.320752,85.941812
4,relu,5,84.26013,85.862392,87.276155,86.33365,86.804903,84.542882,85.094339,86.320752,85.66038,85.283017,85.74386
1,relu,2,85.673892,86.145145,85.956645,86.050898,85.956645,83.129126,85.943395,87.73585,87.075472,83.396226,85.706329
11,tanh,6,83.977377,85.39114,86.239398,85.768145,85.57964,85.485393,85.943395,85.66038,85.377359,86.792451,85.621468
7,tanh,2,83.977377,85.673892,85.296887,85.768145,86.616397,84.637135,88.490564,86.509436,85.471696,83.584905,85.602643
8,tanh,3,86.616397,86.239398,85.768145,85.768145,84.919888,86.71065,84.90566,82.924527,84.90566,85.754716,85.451319
9,tanh,4,84.354383,85.296887,86.899149,86.239398,81.715363,85.485393,86.132073,85.000002,85.188681,87.169814,85.348114
2,relu,3,86.33365,84.07163,85.862392,84.825635,85.768145,84.26013,88.867927,81.132078,85.377359,85.283017,85.178196
6,tanh,1,85.39114,83.788878,85.485393,86.899149,87.558907,85.108387,86.98113,81.320757,82.641512,83.773583,84.894884
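When reading the table above, it helps to compare the gap between configurations with the spread across folds. The sketch below recomputes the average of the top row (relu, kernel size 1) and its sample standard deviation using only the standard library; the helper name `summarize` is hypothetical.

```python
from statistics import mean, stdev

def summarize(fold_accuracies):
    """Return the mean and sample standard deviation of the per-fold accuracies."""
    return mean(fold_accuracies), stdev(fold_accuracies)

# Fold accuracies copied from the first row of the summary table
relu_k1 = [85.485393, 85.862392, 86.993402, 85.956645, 86.616397,
           84.825635, 87.641507, 85.849059, 88.01887, 86.509436]

avg, spread = summarize(relu_k1)
print(f"mean = {avg:.6f}, std = {spread:.3f}")
```

The fold-to-fold spread for this row is on the order of one percentage point, which is comparable to the differences between the average accuracies of the configurations, so the ranking in the table should be read with some caution.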


In [19]:
record[['Activation', 'AVG']].groupby(by='Activation').max().sort_values(by='AVG', ascending=False)

Activation,AVG
relu,86.375874
tanh,85.621468


In [20]:
report = record.sort_values(by='AVG', ascending=False)
# to_excel() returns None, so do not assign its result back to report
report.to_excel('CNN_MPQA.xlsx', sheet_name='random')

# Model 2: Word2Vec Static

__Using and updating pre-trained embeddings__
* In this part, we will create an Embedding layer in TensorFlow Keras using a pre-trained word embedding: the 300-dimensional Word2Vec model that was trained on about 100 billion words from Google News.
* In this part, we will keep the embeddings fixed (static) instead of updating them during training (dynamic).

1. __Load `Word2Vec` Pre-trained Word Embedding__

In [21]:
from gensim.models import KeyedVectors
word2vec = KeyedVectors.load_word2vec_format('../GoogleNews-vectors-negative300.bin', binary=True)

In [22]:
# Access the dense vector value for the word 'handsome'
# word2vec.word_vec('handsome') # 0.11376953
word2vec.word_vec('cool') # 1.64062500e-01

array([ 1.64062500e-01,  1.87500000e-01, -4.10156250e-02,  1.25000000e-01,
       -3.22265625e-02,  8.69140625e-02,  1.19140625e-01, -1.26953125e-01,
        1.77001953e-02,  8.83789062e-02,  2.12402344e-02, -2.00195312e-01,
        4.83398438e-02, -1.01074219e-01, -1.89453125e-01,  2.30712891e-02,
        1.17675781e-01,  7.51953125e-02, -8.39843750e-02, -1.33666992e-02,
        1.53320312e-01,  4.08203125e-01,  3.80859375e-02,  3.36914062e-02,
       -4.02832031e-02, -6.88476562e-02,  9.03320312e-02,  2.12890625e-01,
        1.72119141e-02, -6.44531250e-02, -1.29882812e-01,  1.40625000e-01,
        2.38281250e-01,  1.37695312e-01, -1.76757812e-01, -2.71484375e-01,
       -1.36718750e-01, -1.69921875e-01, -9.15527344e-03,  3.47656250e-01,
        2.22656250e-01, -3.06640625e-01,  1.98242188e-01,  1.33789062e-01,
       -4.34570312e-02, -5.12695312e-02, -3.46679688e-02, -8.49609375e-02,
        1.01562500e-01,  1.42578125e-01, -7.95898438e-02,  1.78710938e-01,
        2.30468750e-01,  

2. __Check number of training words present in Word2Vec__

In [23]:
def training_words_in_word2vector(word_to_vec_map, word_to_index):
    '''
    input:
        word_to_vec_map: a word2vec GoogleNews-vectors-negative300.bin model loaded using gensim.models
        word_to_index: word to index mapping from training set
    '''
    
    vocab_size = len(word_to_index) + 1
    count = 0
    # Count the vocabulary words that have a vector in the pre-trained model
    for word, idx in word_to_index.items():
        if word in word_to_vec_map:
            count+=1
            
    # print() returns None, so report the result without returning it
    print('Found {} words present from {} training vocabulary in the set of pre-trained word vector'.format(count, vocab_size))

In [24]:
# Separate the sentences and the labels
sentences, labels = list(corpus.sentence), list(corpus.label)

# Cleaning and Tokenization
tokenizer = Tokenizer(oov_token=oov_tok)
tokenizer.fit_on_texts(sentences)

word_index = tokenizer.word_index
training_words_in_word2vector(word2vec, word_index)

Found 6083 words present from 6236 training vocabulary in the set of pre-trained word vector
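The count above can also be read as a coverage ratio. The reported vocabulary size of 6236 includes the extra slot added by `+ 1`, so 6083 of the 6235 actual tokens (about 97.6%) have a pre-trained vector. A minimal sketch, assuming a hypothetical helper `embedding_coverage` that returns the numbers instead of printing them; a plain dict stands in for the gensim model here:

```python
def embedding_coverage(word_to_vec_map, word_to_index):
    """Return (found, total, ratio) for the words in word_to_index.

    word_to_vec_map only needs to support the `in` operator, which both a
    dict and a gensim KeyedVectors object do.
    """
    total = len(word_to_index)
    found = sum(1 for word in word_to_index if word in word_to_vec_map)
    ratio = found / total if total else 0.0
    return found, total, ratio


# Toy stand-ins for the real model and tokenizer output (hypothetical data)
toy_vectors = {'cool': [0.16, 0.19], 'handsome': [0.11, 0.18]}
toy_index = {'<UNK>': 1, 'cool': 2, 'handsome': 3, 'zzyzx': 4}

found, total, ratio = embedding_coverage(toy_vectors, toy_index)
print('{} of {} words covered ({:.1%})'.format(found, total, ratio))
```

Returning the values rather than printing them makes it easier to log the coverage per fold, since the tokenizer is refit on each training split later on.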


3. __Define a `pretrained_embedding_matrix` function__

In [25]:
from tensorflow.keras.layers import Embedding

def pretrained_embedding_matrix(word_to_vec_map, word_to_index):
    '''
    input:
        word_to_vec_map: a word2vec GoogleNews-vectors-negative300.bin model loaded using gensim.models
        word_to_index: word to index mapping from training set
    '''
    
    # add 1 because index 0 is reserved for padding (the Tokenizer starts word indices at 1)
    vocab_size = len(word_to_index) + 1
    # dimensionality of the pre-trained word vectors (= 300)
    emb_dim = word_to_vec_map.vector_size
    
    
    embed_matrix = np.zeros((vocab_size, emb_dim))
    
    # Set each row "idx" of the embedding matrix to be 
    # the word vector representation of the idx'th word of the vocabulary
    for word, idx in word_to_index.items():
        if word in word_to_vec_map:
            embed_matrix[idx] = word_to_vec_map[word]
            
        # initialize the unknown word with standard normal distribution values
        else:
            embed_matrix[idx] = np.random.randn(emb_dim)
            
    return embed_matrix

In [26]:
# Test the function
w_2_i = {'<UNK>': 1, 'handsome': 2, 'cool': 3, 'shit': 4 }
em_matrix = pretrained_embedding_matrix(word2vec, w_2_i)
em_matrix

array([[ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       [ 0.55148904,  0.57902778, -0.11133335, ...,  0.74881703,
         1.93214372,  1.15349156],
       [ 0.11376953,  0.1796875 , -0.265625  , ..., -0.21875   ,
        -0.03930664,  0.20996094],
       [ 0.1640625 ,  0.1875    , -0.04101562, ...,  0.10888672,
        -0.01019287,  0.02075195],
       [ 0.10888672, -0.16699219,  0.08984375, ..., -0.19628906,
        -0.23144531,  0.04614258]])
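
In the test output, row 0 stays all zeros because index 0 is reserved for padding, row 1 (`<UNK>`) is filled with random values because that token has no Word2Vec entry, and the remaining rows are copied from the pre-trained model. Since the unknown-word rows come from `np.random.randn`, they differ between runs. A minimal sketch of a reproducible variant, assuming a hypothetical `seed` parameter and a plain dict of toy vectors in place of the gensim model:

```python
import numpy as np


def seeded_embedding_matrix(word_to_vec_map, word_to_index, emb_dim, seed=0):
    """Build an embedding matrix whose random rows are reproducible."""
    rng = np.random.default_rng(seed)
    # Row 0 is left as zeros for the padding index
    embed_matrix = np.zeros((len(word_to_index) + 1, emb_dim))
    for word, idx in word_to_index.items():
        if word in word_to_vec_map:
            embed_matrix[idx] = word_to_vec_map[word]
        else:
            # Unknown word: draw from a standard normal distribution
            embed_matrix[idx] = rng.standard_normal(emb_dim)
    return embed_matrix


# Toy 3-dimensional vectors (hypothetical values, not real Word2Vec entries)
toy_vectors = {'handsome': np.array([0.1, 0.2, 0.3]),
               'cool': np.array([0.4, 0.5, 0.6])}
toy_index = {'<UNK>': 1, 'handsome': 2, 'cool': 3}

m1 = seeded_embedding_matrix(toy_vectors, toy_index, emb_dim=3, seed=42)
m2 = seeded_embedding_matrix(toy_vectors, toy_index, emb_dim=3, seed=42)
print(m1.shape)
print(np.array_equal(m1, m2))
```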

## CNN Model

In [27]:
from tensorflow.keras import regularizers
from tensorflow.keras.constraints import MaxNorm

def define_model_2(filters = 100, kernel_size = 3, activation='relu', 
                 input_dim = None, output_dim=300, max_length = None, emb_matrix = None):
    
    model = tf.keras.models.Sequential([
        tf.keras.layers.Embedding(input_dim=vocab_size, 
                                  output_dim=output_dim, 
                                  input_length=max_length, 
                                  input_shape=(max_length, ),
                                  # Initialize the embedding weights with the word2vec embedding matrix
                                  weights = [emb_matrix],
                                  # Set the weight to be not trainable (static)
                                  trainable = False),
        
        tf.keras.layers.Conv1D(filters=filters, kernel_size = kernel_size, activation = activation, 
                               # set 'axis' value to the first and second axis of conv1D weights (rows, cols)
                               kernel_constraint= MaxNorm( max_value=3, axis=[0,1])),
        
        tf.keras.layers.MaxPool1D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(10, activation=activation, 
                              # set axis to 0 to constrain each weight vector of length (input_dim,) in dense layer
                              kernel_constraint = MaxNorm( max_value=3, axis=0)),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(units=1, activation='sigmoid')
    ])
    
    model.compile( loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
#     model.summary()
    return model
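
The `MaxNorm` constraint used above rescales the weights after each update so that the L2 norm of every filter stays at or below `max_value`. A rough NumPy sketch of that rescaling, assuming a `Conv1D` kernel of shape `(kernel_size, input_channels, filters)` and a small `epsilon` for numerical stability; the function name `max_norm` is illustrative and not part of Keras:

```python
import numpy as np


def max_norm(weights, max_value=3.0, axis=(0, 1), epsilon=1e-7):
    """Rescale weights so the L2 norm over `axis` is at most max_value."""
    norms = np.sqrt(np.sum(np.square(weights), axis=axis, keepdims=True))
    # Norms above max_value are clipped down; smaller norms are kept as they are
    desired = np.clip(norms, 0.0, max_value)
    return weights * (desired / (epsilon + norms))


rng = np.random.default_rng(0)
# One kernel per filter: kernel_size=3, 300 input channels, 100 filters
kernel = rng.standard_normal((3, 300, 100))
constrained = max_norm(kernel)

# With axis=(0, 1) there is one norm per filter (the last axis)
filter_norms = np.sqrt(np.sum(constrained ** 2, axis=(0, 1)))
print(filter_norms.shape)
print(bool(filter_norms.max() <= 3.0))
```

With `axis=[0,1]` the norm is taken over the kernel width and the embedding dimension, so each of the 100 filters is constrained separately.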

In [28]:
model_0 = define_model_2( input_dim=vocab_size, max_length=100, emb_matrix=np.random.rand(vocab_size, 300))
model_0.summary()

Model: "sequential_121"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_121 (Embedding)    (None, 100, 300)          1768800   
_________________________________________________________________
conv1d_121 (Conv1D)          (None, 98, 100)           90100     
_________________________________________________________________
max_pooling1d_121 (MaxPoolin (None, 49, 100)           0         
_________________________________________________________________
flatten_121 (Flatten)        (None, 4900)              0         
_________________________________________________________________
dropout_242 (Dropout)        (None, 4900)              0         
_________________________________________________________________
dense_242 (Dense)            (None, 10)                49010     
_________________________________________________________________
dropout_243 (Dropout)        (None, 10)             
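
The shapes and parameter counts in this summary can be checked by hand. A small sketch, assuming 'valid' padding and stride 1 for the convolution (the Keras defaults) and a vocabulary size of 5896, which is inferred here from the embedding parameter count (1768800 / 300) rather than stated in the notebook; the helper name is hypothetical:

```python
def cnn_shapes_and_params(vocab_size, emb_dim, seq_len, filters, kernel_size,
                          pool_size=2, dense_units=10):
    """Reproduce the layer output sizes and parameter counts of the model."""
    embedding_params = vocab_size * emb_dim
    # 'valid' convolution with stride 1 shortens the sequence by kernel_size - 1
    conv_len = seq_len - kernel_size + 1
    # One weight per (kernel position, input channel, filter) plus one bias per filter
    conv_params = kernel_size * emb_dim * filters + filters
    pooled_len = conv_len // pool_size
    flat_size = pooled_len * filters
    dense_params = flat_size * dense_units + dense_units
    return {
        'embedding_params': embedding_params,
        'conv_len': conv_len,
        'conv_params': conv_params,
        'pooled_len': pooled_len,
        'flat_size': flat_size,
        'dense_params': dense_params,
    }


summary = cnn_shapes_and_params(vocab_size=5896, emb_dim=300, seq_len=100,
                                filters=100, kernel_size=3)
for name, value in summary.items():
    print('{}: {}'.format(name, value))
```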

## Train and Test the Model

In [29]:
# Note: this callback is defined but not passed to model.fit() below
class myCallback(tf.keras.callbacks.Callback):
    # Override on_epoch_end() to stop training once accuracy reaches 90%
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        if (logs.get('accuracy', 0) >= 0.9):
            print("\nReached 90% accuracy so cancelling training!")
            self.model.stop_training=True

callbacks = tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', min_delta=0, 
                                             patience=5, verbose=2, 
                                             mode='auto', restore_best_weights=True)
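
Only the `EarlyStopping` instance is passed to `model.fit()` later on. With `patience=5` and `min_delta=0`, training stops once `val_accuracy` has failed to improve for five consecutive epochs, and `restore_best_weights=True` rolls the model back to the best epoch. The following is a simplified pure-Python sketch of that bookkeeping, not the Keras implementation; the function name and the accuracy history are made up for illustration:

```python
def early_stopping_epoch(val_accuracies, patience=5, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-based.

    stop_epoch is None when the history ends before patience runs out.
    """
    best = float('-inf')
    best_epoch = None
    wait = 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc - min_delta > best:
            # Improvement: remember this epoch and reset the counter
            best, best_epoch, wait = acc, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch


# Hypothetical validation accuracies for eight epochs
history = [0.80, 0.85, 0.86, 0.85, 0.84, 0.86, 0.85, 0.83]
stop_epoch, best_epoch = early_stopping_epoch(history)
print(stop_epoch, best_epoch)
```

Note that an epoch that only ties the best value (epoch 6 in this history) does not count as an improvement when `min_delta=0`.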

In [30]:
# Parameter Initialization
trunc_type='post'
padding_type='post'
oov_tok = "<UNK>"
activations = ['relu']
filters = 100
kernel_sizes = [1, 2, 3, 4, 5, 6, 7, 8]

# Note: the 'Filters' column stores the kernel size of each run
columns = ['Activation', 'Filters', 'acc1', 'acc2', 'acc3', 'acc4', 'acc5', 'acc6', 'acc7', 'acc8', 'acc9', 'acc10', 'AVG']
record2 = pd.DataFrame(columns = columns)

# prepare cross validation with 10 splits and shuffle = True
kfold = KFold(n_splits=10, shuffle=True)

# Separate the sentences and the labels
sentences, labels = list(corpus.sentence), list(corpus.label)

for activation in activations:
    for kernel_size in kernel_sizes:
        # kfold.split() will return set indices for each split
        acc_list = []
        for train, test in kfold.split(sentences):
            
            train_x, test_x = [], []
            train_y, test_y = [], []
            
            for i in train:
                train_x.append(sentences[i])
                train_y.append(labels[i])

            for i in test:
                test_x.append(sentences[i])
                test_y.append(labels[i])

            # Turn the labels into a numpy array
            train_y = np.array(train_y)
            test_y = np.array(test_y)

            # Fit the tokenizer on the training fold only,
            # so the vocabulary is not built from test sentences
            tokenizer = Tokenizer(oov_token=oov_tok)
            tokenizer.fit_on_texts(train_x)

            # Turn the text into sequence
            training_sequences = tokenizer.texts_to_sequences(train_x)
            test_sequences = tokenizer.texts_to_sequences(test_x)

            max_len = max_length(training_sequences)

            # Pad the sequence to have the same size
            Xtrain = pad_sequences(training_sequences, maxlen=max_len, padding=padding_type, truncating=trunc_type)
            Xtest = pad_sequences(test_sequences, maxlen=max_len, padding=padding_type, truncating=trunc_type)

            word_index = tokenizer.word_index
            vocab_size = len(word_index)+1
            
            
            emb_matrix = pretrained_embedding_matrix(word2vec, word_index)
            
            # Build the model with the static Word2Vec embedding matrix
            model = define_model_2(filters, kernel_size, activation, input_dim=vocab_size, 
                                 max_length=max_len, emb_matrix=emb_matrix)

            # Train the model
            model.fit(Xtrain, train_y, batch_size=50, epochs=30, verbose=0, 
                      callbacks=[callbacks], validation_data=(Xtest, test_y))

            # evaluate the model
            loss, acc = model.evaluate(Xtest, test_y, verbose=0)
            print('Test Accuracy: {}'.format(acc*100))

            acc_list.append(acc*100)
            
        mean_acc = np.array(acc_list).mean()
        parameters = [activation, kernel_size]
        entries = parameters + acc_list + [mean_acc]

        temp = pd.DataFrame([entries], columns=columns)
        # DataFrame.append() was removed in pandas 2.0; pd.concat() works in old and new versions
        record2 = pd.concat([record2, temp], ignore_index=True)
        print()
        print(record2)
        print()



Restoring model weights from the end of the best epoch.
Epoch 00012: early stopping
Test Accuracy: 87.84165978431702
Restoring model weights from the end of the best epoch.
Epoch 00012: early stopping
Test Accuracy: 88.31291198730469
Restoring model weights from the end of the best epoch.
Epoch 00009: early stopping
Test Accuracy: 86.52215003967285
Restoring model weights from the end of the best epoch.
Epoch 00015: early stopping
Test Accuracy: 87.55890727043152
Restoring model weights from the end of the best epoch.
Epoch 00007: early stopping
Test Accuracy: 85.48539280891418
Restoring model weights from the end of the best epoch.
Epoch 00009: early stopping
Test Accuracy: 88.87841701507568
Restoring model weights from the end of the best epoch.
Epoch 00008: early stopping
Test Accuracy: 85.4716956615448
Restoring model weights from the end of the best epoch.
Epoch 00018: early stopping
Test Accuracy: 87.35849261283875
Restoring model weights from the end of the best epoch.
Epoch 000

Restoring model weights from the end of the best epoch.
Epoch 00020: early stopping
Test Accuracy: 85.39113998413086
Restoring model weights from the end of the best epoch.
Epoch 00009: early stopping
Test Accuracy: 85.57963967323303
Restoring model weights from the end of the best epoch.
Epoch 00021: early stopping
Test Accuracy: 86.8991494178772
Restoring model weights from the end of the best epoch.
Epoch 00010: early stopping
Test Accuracy: 85.29688715934753
Restoring model weights from the end of the best epoch.
Epoch 00012: early stopping
Test Accuracy: 78.32233905792236
Restoring model weights from the end of the best epoch.
Epoch 00015: early stopping
Test Accuracy: 86.99340224266052
Restoring model weights from the end of the best epoch.
Epoch 00012: early stopping
Test Accuracy: 86.60377264022827
Restoring model weights from the end of the best epoch.
Epoch 00007: early stopping
Test Accuracy: 75.84905624389648
Restoring model weights from the end of the best epoch.
Epoch 000
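
In the loop above, the test fold is also passed as `validation_data`, so early stopping with `restore_best_weights=True` selects the epoch that scores best on the same data used for the reported test accuracy, which can make the numbers optimistic. One possible alternative, which is not what this notebook does, is to carve a small validation set out of each training fold and monitor that instead. A minimal sketch in plain Python, where `split_train_validation` and its parameters are hypothetical names:

```python
import random


def split_train_validation(train_indices, val_fraction=0.1, seed=0):
    """Split the indices of one training fold into train and validation parts."""
    indices = list(train_indices)
    random.Random(seed).shuffle(indices)
    n_val = max(1, int(len(indices) * val_fraction))
    # The first n_val shuffled indices become the validation set
    return indices[n_val:], indices[:n_val]


# Stand-in for the `train` index array yielded by kfold.split()
train_fold = list(range(100))
inner_train, inner_val = split_train_validation(train_fold)
print(len(inner_train), len(inner_val))
```

The validation sentences would then be passed as `validation_data`, leaving the test fold untouched until `model.evaluate()`.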

## Summary

In [31]:
record2.sort_values(by='AVG', ascending=False)

Unnamed: 0,Activation,Filters,acc1,acc2,acc3,acc4,acc5,acc6,acc7,acc8,acc9,acc10,AVG
0,relu,1,87.84166,88.312912,86.52215,87.558907,85.485393,88.878417,85.471696,87.358493,87.641507,88.962263,87.40334
1,relu,2,86.239398,85.485393,86.145145,88.124412,87.370408,81.526864,85.754716,88.679248,81.698114,88.01887,85.904257
7,relu,8,85.862392,86.050898,87.558907,85.485393,86.050898,88.689917,85.66038,82.54717,86.509436,84.622639,85.903803
4,relu,5,86.050898,82.280868,85.673892,85.673892,84.354383,83.129126,85.377359,85.377359,87.452829,85.000002,85.037061
6,relu,7,84.26013,80.772853,86.050898,86.52215,85.296887,83.788878,85.000002,85.188681,85.188681,88.01887,85.008803
3,relu,4,81.526864,87.370408,88.124412,84.448636,81.809616,83.977377,76.415092,87.73585,83.867925,86.320752,84.159693
5,relu,6,85.39114,85.57964,86.899149,85.296887,78.322339,86.993402,86.603773,75.849056,76.792455,85.283017,83.301086
2,relu,3,77.568334,78.133833,84.26013,80.490106,85.956645,86.71065,86.509436,83.867925,77.735847,85.66038,82.689329
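
The `AVG` column hides how much the accuracy varies between folds. In the table above, the column labelled `Filters` holds the kernel size, and some rows contain folds that fall well below the rest. A short sketch that recomputes the mean and adds the standard deviation for two rows copied from the table (kernel sizes 1 and 6), using only the standard library:

```python
from statistics import mean, stdev

# Per-fold accuracies copied from the table above, keyed by kernel size
fold_acc = {
    1: [87.84166, 88.312912, 86.52215, 87.558907, 85.485393,
        88.878417, 85.471696, 87.358493, 87.641507, 88.962263],
    6: [85.39114, 85.57964, 86.899149, 85.296887, 78.322339,
        86.993402, 86.603773, 75.849056, 76.792455, 85.283017],
}

for kernel_size, accs in fold_acc.items():
    print('kernel_size={}: mean={:.2f}, std={:.2f}, min={:.2f}, max={:.2f}'.format(
        kernel_size, mean(accs), stdev(accs), min(accs), max(accs)))
```

Reporting the spread next to the mean helps judge whether the gap between two kernel sizes is larger than the fold-to-fold noise.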


In [32]:
record2[['Activation', 'AVG']].groupby(by='Activation').max().sort_values(by='AVG', ascending=False)

Activation,AVG
relu,87.40334


In [33]:
report = record2.sort_values(by='AVG', ascending=False)
# to_excel() writes the file and returns None, so do not reassign the result
report.to_excel('CNN_MPQA_2.xlsx', sheet_name='static')

# Model 3: Word2Vec - Dynamic

* In this part, we start from the same Word2Vec vectors but fine-tune the embeddings during training (dynamic).
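
The only difference from the static model is whether the embedding rows receive gradient updates. A toy NumPy sketch of that difference, using a plain SGD step as a simplification (the notebook trains with Adam) and made-up gradients; `embedding_sgd_step` is a hypothetical helper:

```python
import numpy as np


def embedding_sgd_step(emb_matrix, token_ids, grads, lr=0.1, trainable=True):
    """Apply one toy SGD update to the rows looked up in a batch."""
    updated = emb_matrix.copy()
    if trainable:
        # subtract.at accumulates correctly when a token id appears more than once
        np.subtract.at(updated, token_ids, lr * grads)
    return updated


emb = np.ones((4, 3))              # 4 tokens, 3-dimensional vectors
token_ids = np.array([2, 2, 3])    # tokens seen in the batch
grads = np.full((3, 3), 0.5)       # one made-up gradient row per looked-up token

static = embedding_sgd_step(emb, token_ids, grads, trainable=False)
dynamic = embedding_sgd_step(emb, token_ids, grads, trainable=True)
print(static)
print(dynamic)
```

In the dynamic case only the rows for tokens that appear in the batch move away from their pre-trained values; rows for unseen tokens stay as initialized.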

## CNN Model

In [34]:
from tensorflow.keras import regularizers
from tensorflow.keras.constraints import MaxNorm

def define_model_3(filters = 100, kernel_size = 3, activation='relu', 
                 input_dim = None, output_dim=300, max_length = None, emb_matrix = None):
    
    model = tf.keras.models.Sequential([
        tf.keras.layers.Embedding(input_dim=vocab_size, 
                                  output_dim=output_dim, 
                                  input_length=max_length, 
                                  input_shape=(max_length, ),
                                  # Initialize the embedding weights with the word2vec embedding matrix
                                  weights = [emb_matrix],
                                  # Keep the weights trainable so they are fine-tuned (dynamic)
                                  trainable = True),
        
        tf.keras.layers.Conv1D(filters=filters, kernel_size = kernel_size, activation = activation, 
                               # set 'axis' value to the first and second axis of conv1D weights (rows, cols)
                               kernel_constraint= MaxNorm( max_value=3, axis=[0,1])),
        
        tf.keras.layers.MaxPool1D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(10, activation=activation, 
                              # set axis to 0 to constrain each weight vector of length (input_dim,) in dense layer
                              kernel_constraint = MaxNorm( max_value=3, axis=0)),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(units=1, activation='sigmoid')
    ])
    
    model.compile( loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
#     model.summary()
    return model

In [35]:
model_0 = define_model_3( input_dim=vocab_size, max_length=100, emb_matrix=np.random.rand(vocab_size, 300))
model_0.summary()

Model: "sequential_202"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_202 (Embedding)    (None, 100, 300)          1772100   
_________________________________________________________________
conv1d_202 (Conv1D)          (None, 98, 100)           90100     
_________________________________________________________________
max_pooling1d_202 (MaxPoolin (None, 49, 100)           0         
_________________________________________________________________
flatten_202 (Flatten)        (None, 4900)              0         
_________________________________________________________________
dropout_404 (Dropout)        (None, 4900)              0         
_________________________________________________________________
dense_404 (Dense)            (None, 10)                49010     
_________________________________________________________________
dropout_405 (Dropout)        (None, 10)             

## Train and Test the Model

In [36]:
# Note: this callback is defined but not passed to model.fit() below
class myCallback(tf.keras.callbacks.Callback):
    # Override on_epoch_end() to stop training once accuracy exceeds 93%
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        if (logs.get('accuracy', 0) > 0.93):
            print("\nReached 93% accuracy so cancelling training!")
            self.model.stop_training=True

callbacks = tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', min_delta=0, 
                                             patience=5, verbose=2, 
                                             mode='auto', restore_best_weights=True)

In [37]:
# Parameter Initialization
trunc_type='post'
padding_type='post'
oov_tok = "<UNK>"
activations = ['relu']
filters = 100
kernel_sizes = [1, 2, 3, 4, 5, 6, 7, 8]

# Note: the 'Filters' column stores the kernel size of each run
columns = ['Activation', 'Filters', 'acc1', 'acc2', 'acc3', 'acc4', 'acc5', 'acc6', 'acc7', 'acc8', 'acc9', 'acc10', 'AVG']
record3 = pd.DataFrame(columns = columns)

# prepare cross validation with 10 splits and shuffle = True
kfold = KFold(n_splits=10, shuffle=True)

# Separate the sentences and the labels
sentences, labels = list(corpus.sentence), list(corpus.label)

for activation in activations:
    for kernel_size in kernel_sizes:
        # kfold.split() will return set indices for each split
        acc_list = []
        for train, test in kfold.split(sentences):
            
            train_x, test_x = [], []
            train_y, test_y = [], []
            
            for i in train:
                train_x.append(sentences[i])
                train_y.append(labels[i])

            for i in test:
                test_x.append(sentences[i])
                test_y.append(labels[i])

            # Turn the labels into a numpy array
            train_y = np.array(train_y)
            test_y = np.array(test_y)

            # Fit the tokenizer on the training fold only,
            # so the vocabulary is not built from test sentences
            tokenizer = Tokenizer(oov_token=oov_tok)
            tokenizer.fit_on_texts(train_x)

            # Turn the text into sequence
            training_sequences = tokenizer.texts_to_sequences(train_x)
            test_sequences = tokenizer.texts_to_sequences(test_x)

            max_len = max_length(training_sequences)

            # Pad the sequence to have the same size
            Xtrain = pad_sequences(training_sequences, maxlen=max_len, padding=padding_type, truncating=trunc_type)
            Xtest = pad_sequences(test_sequences, maxlen=max_len, padding=padding_type, truncating=trunc_type)

            word_index = tokenizer.word_index
            vocab_size = len(word_index)+1
            
            
            emb_matrix = pretrained_embedding_matrix(word2vec, word_index)
            
            # Build the model with the trainable Word2Vec embedding matrix
            model = define_model_3(filters, kernel_size, activation, input_dim=vocab_size, 
                                 max_length=max_len, emb_matrix=emb_matrix)

            # Train the model
            model.fit(Xtrain, train_y, batch_size=50, epochs=20, verbose=0, 
                      callbacks=[callbacks], validation_data=(Xtest, test_y))

            # evaluate the model
            loss, acc = model.evaluate(Xtest, test_y, verbose=0)
            print('Test Accuracy: {}'.format(acc*100))

            acc_list.append(acc*100)
            
        mean_acc = np.array(acc_list).mean()
        parameters = [activation, kernel_size]
        entries = parameters + acc_list + [mean_acc]

        temp = pd.DataFrame([entries], columns=columns)
        # DataFrame.append() was removed in pandas 2.0; pd.concat() works in old and new versions
        record3 = pd.concat([record3, temp], ignore_index=True)
        print()
        print(record3)
        print()



Restoring model weights from the end of the best epoch.
Epoch 00006: early stopping
Test Accuracy: 81.99811577796936
Restoring model weights from the end of the best epoch.
Epoch 00008: early stopping
Test Accuracy: 86.23939752578735
Restoring model weights from the end of the best epoch.
Epoch 00006: early stopping
Test Accuracy: 83.31762552261353
Restoring model weights from the end of the best epoch.
Epoch 00007: early stopping
Test Accuracy: 86.42789721488953
Restoring model weights from the end of the best epoch.
Epoch 00008: early stopping
Test Accuracy: 88.31291198730469
Restoring model weights from the end of the best epoch.
Epoch 00008: early stopping
Test Accuracy: 87.65316009521484
Restoring model weights from the end of the best epoch.
Epoch 00006: early stopping
Test Accuracy: 81.22641444206238
Restoring model weights from the end of the best epoch.
Epoch 00008: early stopping
Test Accuracy: 87.73584961891174
Restoring model weights from the end of the best epoch.
Epoch 00

Restoring model weights from the end of the best epoch.
Epoch 00009: early stopping
Test Accuracy: 87.37040758132935
Restoring model weights from the end of the best epoch.
Epoch 00009: early stopping
Test Accuracy: 87.84165978431702
Restoring model weights from the end of the best epoch.
Epoch 00006: early stopping
Test Accuracy: 78.22808623313904
Restoring model weights from the end of the best epoch.
Epoch 00007: early stopping
Test Accuracy: 88.21865916252136
Restoring model weights from the end of the best epoch.
Epoch 00006: early stopping
Test Accuracy: 81.62111043930054
Restoring model weights from the end of the best epoch.
Epoch 00007: early stopping
Test Accuracy: 87.37040758132935
Restoring model weights from the end of the best epoch.
Epoch 00007: early stopping
Test Accuracy: 85.56603789329529
Restoring model weights from the end of the best epoch.
Epoch 00007: early stopping
Test Accuracy: 86.22641563415527
Restoring model weights from the end of the best epoch.
Epoch 00

## Summary

In [38]:
record3.sort_values(by='AVG', ascending=False)

Unnamed: 0,Activation,Filters,acc1,acc2,acc3,acc4,acc5,acc6,acc7,acc8,acc9,acc10,AVG
7,relu,8,82.469368,87.84166,87.181902,87.464654,86.71065,86.050898,86.037737,82.169813,87.26415,83.396226,85.658706
2,relu,3,87.935907,87.747407,78.793591,86.899149,83.129126,85.956645,85.849059,86.132073,87.26415,86.603773,85.631088
5,relu,6,87.370408,87.84166,78.228086,88.218659,81.62111,87.370408,85.566038,86.226416,87.169814,85.66038,85.527298
0,relu,1,81.998116,86.239398,83.317626,86.427897,88.312912,87.65316,81.226414,87.73585,87.169814,85.000002,85.508119
3,relu,4,87.181902,80.207354,88.124412,81.809616,87.84166,82.75212,85.66038,87.26415,81.415093,86.698115,84.89548
4,relu,5,81.432611,87.84166,81.244111,81.432611,86.616397,85.57964,86.226416,85.849059,86.792451,85.66038,84.867533
1,relu,2,85.485393,75.306314,83.788878,86.804903,88.689917,83.694625,80.943394,85.754716,84.528303,87.075472,84.207191
6,relu,7,80.584353,87.087655,85.296887,80.584353,81.526864,83.223373,83.679247,83.018869,81.415093,82.452828,82.886952
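
The best average accuracy of each model can be put side by side. The values below are copied from the summary tables in this notebook and keyed by the sheet names used in the Excel exports. Two caveats apply when reading the ranking: every run uses a fresh random shuffle of the folds, and the static model trains for up to 30 epochs while the dynamic model is capped at 20, so small gaps may not be meaningful.

```python
# Best AVG per model, copied from the summary tables above
best_avg = {
    'random': 86.375874,   # relu, best kernel size
    'static': 87.40334,    # relu, kernel size 1
    'dynamic': 85.658706,  # relu, kernel size 8
}

# Sort from highest to lowest average accuracy
ranking = sorted(best_avg.items(), key=lambda item: item[1], reverse=True)
for name, avg in ranking:
    print('{}: {:.2f}'.format(name, avg))
```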


In [39]:
report = record3.sort_values(by='AVG', ascending=False)
# to_excel() writes the file and returns None, so do not reassign the result
report.to_excel('CNN_MPQA_3.xlsx', sheet_name='dynamic')