# CNN Classification with TREC Dataset
<hr>

We will build a text classification model using a CNN on the TREC dataset.

## Load the library

In [1]:
import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re
import nltk
import random
from nltk.corpus import stopwords, twitter_samples
# from nltk.tokenize import TweetTokenizer
from sklearn.model_selection import KFold
from nltk.stem import PorterStemmer
from string import punctuation
from sklearn.preprocessing import OneHotEncoder
from tensorflow.keras.preprocessing.text import Tokenizer
import time

%config IPCompleter.greedy=True
%config IPCompleter.use_jedi=False
# nltk.download('twitter_samples')

## Load the Dataset

In [2]:
corpus = pd.read_pickle('../../../0_data/TREC/TREC.pkl')
corpus.label = corpus.label.astype(int)
print(corpus.shape)
corpus

(5952, 3)


| | sentence | label | split |
|---|---|---|---|
| 0 | how did serfdom develop in and then leave russ... | 0 | train |
| 1 | what films featured the character popeye doyle ? | 1 | train |
| 2 | how can i find a list of celebrities ' real na... | 0 | train |
| 3 | what fowl grabs the spotlight after the chines... | 1 | train |
| 4 | what is the full form of .com ? | 2 | train |
| ... | ... | ... | ... |
| 5947 | who was the 22nd president of the us ? | 3 | test |
| 5948 | what is the money they use in zambia ? | 1 | test |
| 5949 | how many feet in a mile ? | 5 | test |
| 5950 | what is the birthstone of october ? | 1 | test |


In [3]:
corpus.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5952 entries, 0 to 5951
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   sentence  5952 non-null   object
 1   label     5952 non-null   int32 
 2   split     5952 non-null   object
dtypes: int32(1), object(2)
memory usage: 116.4+ KB


In [4]:
corpus.groupby( by=['split','label']).count()

| split | label | sentence |
|---|---|---|
| test | 0 | 138 |
| test | 1 | 94 |
| test | 2 | 9 |
| test | 3 | 65 |
| test | 4 | 81 |
| test | 5 | 113 |
| train | 0 | 1162 |
| train | 1 | 1250 |
| train | 2 | 86 |
| train | 3 | 1223 |


In [5]:
corpus.groupby(by='split').count()

| split | sentence | label |
|---|---|---|
| test | 500 | 500 |
| train | 5452 | 5452 |


In [6]:
# Separate the sentences and the labels for training and testing
train_x = list(corpus[corpus.split=='train'].sentence)
train_y = np.array(corpus[corpus.split=='train'].label)
print(len(train_x))
print(len(train_y))

test_x = list(corpus[corpus.split=='test'].sentence)
test_y = np.array(corpus[corpus.split=='test'].label)
print(len(test_x))
print(len(test_y))

5452
5452
500
500


# Data Preprocessing
<hr>

When preparing data for word embeddings, especially pre-trained embeddings such as Word2Vec or GloVe, __don't apply standard preprocessing steps like stemming or stopword removal__. For count-based feature extraction (e.g. TF-IDF) we would clean the text by removing stopwords, stemming, and so on; here we keep those words, because they carry information that may help the model learn.

__Tomas Mikolov__, one of the developers of Word2Vec, suggests in _word2vec-toolkit: google groups thread., 2015_ that only very minimal text cleaning is required when learning a word embedding model.

In short, what we will do is:
- Punctuation removal
- Lowercasing
- Tokenization

The steps above are handled by the __Tokenizer__ class in TensorFlow.
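
To see roughly what these three steps do to a sentence, here is a minimal pure-Python sketch. It is not the Keras implementation: the helper name `simple_tokenize` is hypothetical, and the filter string is assumed to match the `Tokenizer` default.

```python
# Characters treated as punctuation (assumed to mirror the Keras Tokenizer default).
# The apostrophe is not in the list, so tokens such as "'s" survive.
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'

def simple_tokenize(text, filters=FILTERS):
    """Lowercase the text, replace filtered characters with spaces, split on whitespace."""
    table = str.maketrans({ch: " " for ch in filters})
    return text.lower().translate(table).split()

tokens = simple_tokenize("What is the full form of .com ?")
print(tokens)  # ['what', 'is', 'the', 'full', 'form', 'of', 'com']
```

The seven tokens line up with the seven integers that the real tokenizer produces for the same sentence in the cell output further down.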

- <b>One way to choose the maximum sequence length is to just pick the length of the longest sentence in the training set.</b>

## Develop Vocabulary

A part of preparing text for text classification involves defining and tailoring the vocabulary of words supported by the model. **We can do this by loading all of the documents in the dataset and building a set of words.**

The larger the vocabulary, the more sparse the representation of each word or document. So, we may decide to support all of these words, or perhaps discard some. The final chosen vocabulary can then be saved to a file for later use, such as filtering words in new documents in the future.
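
As a rough illustration of building and trimming a vocabulary, the sketch below counts words and assigns indices by frequency. The helper `build_vocab` and its `min_count` parameter are hypothetical; the index layout (0 reserved for padding, 1 for the out-of-vocabulary token) is an assumption chosen to resemble what the Keras `Tokenizer` produces with `oov_token` set.

```python
from collections import Counter

def build_vocab(sentences, min_count=1, oov_token="<UNK>"):
    """Map each word to an integer index, most frequent first.

    Index 0 is left free for padding and index 1 is given to the OOV token.
    Words seen fewer than min_count times are discarded.
    """
    counts = Counter(word for s in sentences for word in s.lower().split())
    word_to_index = {oov_token: 1}
    for word, count in counts.most_common():
        if count >= min_count:
            word_to_index[word] = len(word_to_index) + 1
    return word_to_index

toy_sentences = ["what is the capital", "what is the time", "the end"]
toy_index = build_vocab(toy_sentences)
toy_vocab_size = len(toy_index) + 1  # +1 for the padding index 0
print(toy_index, toy_vocab_size)

# Discarding words that occur only once shrinks the vocabulary
print(build_vocab(toy_sentences, min_count=2))
```

Raising `min_count` is one simple way to trade coverage for a smaller, less sparse vocabulary.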

In [7]:
# Define a function to compute the max length of sequence
def max_length(sequences):
    '''
    input:
        sequences: a 2D list of integer sequences
    output:
        max_length: the max length of the sequences
    '''
    # default=0 keeps the original behaviour for an empty list
    return max((len(seq) for seq in sequences), default=0)

In [8]:
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

trunc_type='post'
padding_type='post'
oov_tok = "<UNK>"

# Separate the sentences and the labels
train_x = list(corpus[corpus.split=='train'].sentence)
train_y = np.array(corpus[corpus.split=='train'].label)
test_x = list(corpus[corpus.split=='test'].sentence)
test_y = np.array(corpus[corpus.split=='test'].label)

# Cleaning and Tokenization
tokenizer = Tokenizer(oov_token=oov_tok)
tokenizer.fit_on_texts(train_x)

print("Example of sentence: ", train_x[4])

# Turn the text into sequence
training_sequences = tokenizer.texts_to_sequences(train_x)
max_len = max_length(training_sequences)

print('Into a sequence of int:', training_sequences[4])

# Pad the sequence to have the same size
training_padded = pad_sequences(training_sequences, maxlen=max_len, padding=padding_type, truncating=trunc_type)
print('Into a padded sequence:', training_padded[4])

Example of sentence:  what is the full form of .com ?
Into a sequence of int: [3, 4, 2, 471, 261, 5, 372]
Into a padded sequence: [  3   4   2 471 261   5 372   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
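
The padding step can be pictured with a small pure-Python sketch. The helper `pad_post` is hypothetical and only covers the `'post'` padding and `'post'` truncating case used here; it returns nested lists rather than a NumPy array.

```python
def pad_post(sequences, maxlen, value=0):
    """Cut each sequence to maxlen (dropping the tail), then right-pad with `value`."""
    padded = []
    for seq in sequences:
        seq = list(seq[:maxlen])
        padded.append(seq + [value] * (maxlen - len(seq)))
    return padded

toy_sequences = [[3, 4, 2], [8, 1], [5, 6, 7, 9, 10]]
toy_maxlen = max(len(seq) for seq in toy_sequences)
print(pad_post(toy_sequences, toy_maxlen))
# [[3, 4, 2, 0, 0], [8, 1, 0, 0, 0], [5, 6, 7, 9, 10]]
```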


In [9]:
# See the first 10 words in the vocabulary

word_index = tokenizer.word_index
for i, word in enumerate(word_index):
    print(word, word_index.get(word))
    if i==9:
        break
vocab_size = len(word_index)+1
print(vocab_size)

<UNK> 1
the 2
what 3
is 4
of 5
in 6
a 7
how 8
's 9
was 10
8461


# Model 1: Embedding Random
<hr>

A __standard model__ for document classification is to use (quoted from __Jason Brownlee__, the author of [machinelearningmastery.com](https://machinelearningmastery.com)):
>- Word Embedding: A distributed representation of words where different words that have a similar meaning (based on their usage) also have a similar representation.
>- Convolutional Model: A feature extraction model that learns to extract salient features from documents represented using a word embedding.
>- Fully Connected Model: The interpretation of extracted features in terms of a predictive output.


Therefore, the model comprises the following elements:
- __Input layer__ that defines the length of input sequences.
- __Embedding layer__ sized to the vocabulary, with 300-dimensional real-valued representations.
- __Conv1D layer__ with 100 filters and a kernel size set to the number of words to read at once.
- __MaxPooling1D layer__ to consolidate the output from the convolutional layer.
- __Flatten layer__ to reduce the three-dimensional output to two dimensions for the dense layers.
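
To make the layer shapes concrete, the sketch below works through the arithmetic for an unpadded ("valid") 1D convolution followed by pooling and flattening. The helper names are hypothetical, and the sizes (sequence length 100, 300-dimensional embeddings, 100 filters, kernel size 3, pool size 2) are illustrative assumptions.

```python
def conv1d_output_length(input_length, kernel_size, stride=1):
    """Output length of a 1D convolution with no padding ('valid')."""
    return (input_length - kernel_size) // stride + 1

def conv1d_params(in_channels, kernel_size, filters):
    """Number of kernel weights plus one bias per filter."""
    return in_channels * kernel_size * filters + filters

seq_len, emb_dim, n_filters, k_size, pool = 100, 300, 100, 3, 2

conv_len = conv1d_output_length(seq_len, k_size)   # 100 - 3 + 1 = 98
pooled_len = conv_len // pool                      # 98 // 2 = 49
flat_units = pooled_len * n_filters                # 49 * 100 = 4900
n_params = conv1d_params(emb_dim, k_size, n_filters)

print(conv_len, pooled_len, flat_units, n_params)  # 98 49 4900 90100
```

These numbers agree with the `Conv1D`, `MaxPooling1D` and `Flatten` rows of the model summary printed later in the notebook.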

The CNN model is inspired by __Yoon Kim__'s study on the use of word embeddings with CNNs for text classification. The hyperparameters we use, based on his study, are as follows:
- Transfer function: rectified linear.
- Kernel sizes: 1-8.
- Number of filters: 100.
- Dropout rate: 0.5.
- L2 Constraint: 3.
- Batch Size: 50.
- Update Rule: Adam
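
The "L2 constraint" entry caps the L2 norm of each weight vector: whenever the norm exceeds the limit, the vector is rescaled back to it. Below is a minimal sketch of that rule on a plain Python list. The helper `max_norm` is hypothetical and the small `eps` term is an assumption added to avoid division by zero.

```python
import math

def max_norm(weights, max_value=3.0, eps=1e-7):
    """Rescale a weight vector so that its L2 norm does not exceed max_value."""
    norm = math.sqrt(sum(w * w for w in weights))
    target = min(norm, max_value)
    scale = target / (eps + norm)
    return [w * scale for w in weights]

clipped = max_norm([3.0, 4.0])    # norm 5 is rescaled to (almost exactly) 3
unchanged = max_norm([1.0, 2.0])  # norm below 3, so effectively unchanged
print(clipped, unchanged)
```

In the model code this role is played by `MaxNorm(max_value=3, ...)` passed as `kernel_constraint`.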

We will search for the best parameters using __grid search__ and 10-fold cross validation.
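
As a sketch of how such a search could be enumerated, the code below lists every combination of activation and kernel size and splits sample indices into ten folds. The helper `k_fold_indices` is hypothetical, uses contiguous folds without shuffling, and is only meant to show the bookkeeping; the grid values are the ones used in the training cell later on.

```python
from itertools import product

def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    # The first (n_samples % k) folds get one extra sample
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size

grid = list(product(["relu", "tanh"], [1, 2, 3, 4, 5, 6]))
folds = list(k_fold_indices(n_samples=20, k=10))

print(len(grid), "settings x", len(folds), "folds")  # 12 settings x 10 folds
```

Each of the 12 settings would then be trained once per fold and scored by its average validation accuracy.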

## CNN Model

Now, we will build Convolutional Neural Network (CNN) models to classify each encoded question into one of the six TREC classes.

The model takes inspiration from `Convolutional Neural Networks for Sentence Classification` by *Yoon Kim*.

Now, we will define our CNN model as follows:
- One Conv layer with 100 filters, a kernel size between 1 and 6 (chosen by the search), and a relu or tanh activation function;
- One MaxPool layer with pool size = 2;
- One Dropout layer after flattening;
- Optimizer: Adam (a widely used adaptive optimizer)
- Loss function: sparse categorical cross-entropy (suited to multi-class classification with integer labels)

**Note**: 
- Dropout layers are there to reduce over-fitting and help the model generalize. A dropout rate near 0.5 is a common starting point for hidden layers.
- https://missinglink.ai/guides/keras/keras-conv1d-working-1d-convolutional-neural-networks-keras/
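
For intuition about what a dropout layer does during training, here is a minimal sketch of "inverted" dropout on a plain list. The helper `dropout` and its `seed` parameter are hypothetical; real layers operate on tensors and are switched off at inference time.

```python
import random

def dropout(values, rate=0.5, seed=0):
    """Zero each value with probability `rate`; scale the survivors by 1 / (1 - rate)."""
    rng = random.Random(seed)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

toy_activations = [1.0] * 10
dropped = dropout(toy_activations, rate=0.5)
print(dropped)  # a mix of 0.0 and 2.0 entries
```

Scaling the surviving values keeps the expected sum of the activations unchanged, which is why no rescaling is needed when dropout is disabled at test time.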

In [37]:
from tensorflow.keras import regularizers
from tensorflow.keras.constraints import MaxNorm

def define_model(filters = 100, kernel_size = 3, activation='relu', input_dim = None, output_dim=300, max_length = None ):
    
    model = tf.keras.models.Sequential([
        # NOTE: input_dim is taken from the global `vocab_size`, not from the `input_dim` argument
        tf.keras.layers.Embedding(input_dim=vocab_size, 
                                  output_dim=output_dim, 
                                  input_length=max_length, 
                                  input_shape=(max_length, )),
        
        tf.keras.layers.Conv1D(filters=filters, kernel_size = kernel_size, activation = activation, 
                               # set 'axis' value to the first and second axis of conv1D weights (rows, cols)
                               kernel_constraint= MaxNorm( max_value=3, axis=[0,1])),
        
        tf.keras.layers.MaxPool1D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(10, activation=activation, 
                              # set axis to 0 to constrain each weight vector of length (input_dim,) in dense layer
                              kernel_constraint = MaxNorm( max_value=3, axis=0)),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(units=6, activation='softmax')
    ])
    
    model.compile( loss = 'sparse_categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
#     model.summary()
    return model
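
The final `softmax` layer and the `sparse_categorical_crossentropy` loss used above can be sketched in a few lines of plain Python. The helper names and the example scores are hypothetical; the point is only that the loss is the negative log-probability assigned to the true class, with the label given as an integer rather than a one-hot vector.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(probs, label):
    """Negative log-probability of the true class, given as an integer label."""
    return -math.log(probs[label])

probs = softmax([2.0, 0.5, 0.1, 0.1, 0.0, -1.0])  # six scores, one per class
loss = sparse_categorical_crossentropy(probs, label=0)
print(round(sum(probs), 6), loss)
```

A model that spreads its probability evenly over the six classes would score a loss of `log(6)`, roughly 1.79, which is close to the loss values seen in the first training epochs.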

In [38]:
model_0 = define_model( input_dim=1000, max_length=100)
model_0.summary()

Model: "sequential_18"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_18 (Embedding)     (None, 100, 300)          2538300   
_________________________________________________________________
conv1d_18 (Conv1D)           (None, 98, 100)           90100     
_________________________________________________________________
max_pooling1d_18 (MaxPooling (None, 49, 100)           0         
_________________________________________________________________
flatten_18 (Flatten)         (None, 4900)              0         
_________________________________________________________________
dropout_36 (Dropout)         (None, 4900)              0         
_________________________________________________________________
dense_36 (Dense)             (None, 10)                49010     
_________________________________________________________________
dropout_37 (Dropout)         (None, 10)              

In [48]:
class myCallback(tf.keras.callbacks.Callback):
    # Override on_epoch_end() to stop once training accuracy passes 93%
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        # Default to 0 so a missing 'accuracy' entry does not raise a TypeError
        if logs.get('accuracy', 0) > 0.93:
            print("\nReached 93% accuracy so cancelling training!")
            self.model.stop_training = True


callbacks = tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', min_delta=0, 
                                             patience=20, verbose=2, 
                                             mode='auto', restore_best_weights=True)
# callbacks = myCallback()
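
The patience logic behind early stopping can be sketched as follows. This is a simplified stand-in, not the Keras callback: the helper `early_stopping_epoch` is hypothetical, it assumes a "higher is better" metric, and the scores are made up for illustration.

```python
def early_stopping_epoch(val_scores, patience, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 0-based.

    stop_epoch is None if training would run through every epoch.
    """
    best, best_epoch, wait = float("-inf"), None, 0
    for epoch, score in enumerate(val_scores):
        if score - min_delta > best:
            # Improvement: remember it and reset the patience counter
            best, best_epoch, wait = score, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

scores = [0.59, 0.70, 0.81, 0.86, 0.86, 0.85, 0.87, 0.88, 0.86, 0.85, 0.85]
print(early_stopping_epoch(scores, patience=3))  # (10, 7)
```

With `restore_best_weights=True`, the model would be rolled back to the weights from the best epoch (index 7 in this toy run) after stopping.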

## Train and Test the Model

In [49]:
# Parameter Initialization
trunc_type='post'
padding_type='post'
oov_tok = "<UNK>"
activations = ['relu', 'tanh']
filters = 100
kernel_sizes = [1, 2, 3, 4, 5, 6]

columns = ['Activation', 'Filters', 'Acc']
record = pd.DataFrame(columns = columns)

# Separate the sentences and the labels
train_x = list(corpus[corpus.split=='train'].sentence)
train_y = np.array(corpus[corpus.split=='train'].label)
test_x = list(corpus[corpus.split=='test'].sentence)
test_y = np.array(corpus[corpus.split=='test'].label)

for activation in activations:

    for kernel_size in kernel_sizes:

        # Clean and tokenize the text
        tokenizer = Tokenizer(oov_token=oov_tok)
        tokenizer.fit_on_texts(train_x)

        # Turn the text into sequence
        training_sequences = tokenizer.texts_to_sequences(train_x)
        test_sequences = tokenizer.texts_to_sequences(test_x)

        max_len = max_length(training_sequences)

        # Pad the sequence to have the same size
        Xtrain = pad_sequences(training_sequences, maxlen=max_len, padding=padding_type, truncating=trunc_type)
        Xtest = pad_sequences(test_sequences, maxlen=max_len, padding=padding_type, truncating=trunc_type)

        word_index = tokenizer.word_index
        vocab_size = len(word_index)+1

        # Define the input shape
        model = define_model(filters, kernel_size, activation, input_dim=vocab_size, max_length=max_len)

        # Train the model
        model.fit(Xtrain, train_y, batch_size=50, epochs=60, verbose=2, 
                  callbacks=[callbacks], validation_data=(Xtest, test_y))

        # evaluate the model
        loss, acc = model.evaluate(Xtest, test_y, verbose=0)
        print('Test Accuracy: {}'.format(acc*100))

        parameters = [activation, kernel_size]
        entries = parameters + [acc]

        temp = pd.DataFrame([entries], columns=columns)
        # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
        record = pd.concat([record, temp], ignore_index=True)
        print()
        print(record)
        print()

Epoch 1/60
110/110 - 4s - loss: 1.5717 - accuracy: 0.3254 - val_loss: 1.2709 - val_accuracy: 0.5900
Epoch 2/60
110/110 - 4s - loss: 1.1928 - accuracy: 0.5072 - val_loss: 0.9443 - val_accuracy: 0.6980
Epoch 3/60
110/110 - 4s - loss: 0.9403 - accuracy: 0.6286 - val_loss: 0.7039 - val_accuracy: 0.8140
Epoch 4/60
110/110 - 4s - loss: 0.7102 - accuracy: 0.7208 - val_loss: 0.5478 - val_accuracy: 0.8620
Epoch 5/60
110/110 - 4s - loss: 0.5936 - accuracy: 0.7619 - val_loss: 0.5087 - val_accuracy: 0.8600
Epoch 6/60
110/110 - 4s - loss: 0.5077 - accuracy: 0.7876 - val_loss: 0.4671 - val_accuracy: 0.8560
Epoch 7/60
110/110 - 4s - loss: 0.4710 - accuracy: 0.7997 - val_loss: 0.4799 - val_accuracy: 0.8740
Epoch 8/60
110/110 - 4s - loss: 0.4430 - accuracy: 0.8138 - val_loss: 0.5056 - val_accuracy: 0.8800
Epoch 9/60
110/110 - 4s - loss: 0.4102 - accuracy: 0.8278 - val_loss: 0.4895 - val_accuracy: 0.8580
Epoch 10/60
110/110 - 4s - loss: 0.4020 - accuracy: 0.8245 - val_loss: 0.5055 - val_accuracy: 0.8500

KeyboardInterrupt: ignored

## Summary

In [None]:
record.sort_values(by='Acc', ascending=False)

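| | Activation | Filters | Acc |
|---|---|---|---|
| 3 | relu | 4 | 0.898 |
| 7 | tanh | 2 | 0.898 |
| 11 | tanh | 6 | 0.894 |
| 8 | tanh | 3 | 0.888 |
| 9 | tanh | 4 | 0.888 |
| 0 | relu | 1 | 0.882 |
| 5 | relu | 6 | 0.882 |
| 10 | tanh | 5 | 0.882 |
| 1 | relu | 2 | 0.88 |
| 6 | tanh | 1 | 0.878 |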


In [None]:
report = record.sort_values(by='Acc', ascending=False)
# to_excel() writes the file and returns None, so do not reassign `report`
report.to_excel('CNN_TREC.xlsx', sheet_name='random')

# Model 2: Word2Vec Static

__Using and updating pre-trained embeddings__
* In this part, we will create an Embedding layer in TensorFlow Keras using a pre-trained word embedding: the 300-dimensional Word2Vec vectors trained on about 100 billion words from Google News.
* We will keep the embeddings fixed (static) instead of updating them during training (dynamic).

1. __Load `Word2Vec` Pre-trained Word Embedding__

In [10]:
from gensim.models import KeyedVectors
word2vec = KeyedVectors.load_word2vec_format('../GoogleNews-vectors-negative300.bin', binary=True)

In [11]:
# Access the dense vector value for the word 'handsome'
# word2vec.word_vec('handsome') # 0.11376953
word2vec.word_vec('cool') # 1.64062500e-01

array([ 1.64062500e-01,  1.87500000e-01, -4.10156250e-02,  1.25000000e-01,
       -3.22265625e-02,  8.69140625e-02,  1.19140625e-01, -1.26953125e-01,
        1.77001953e-02,  8.83789062e-02,  2.12402344e-02, -2.00195312e-01,
        4.83398438e-02, -1.01074219e-01, -1.89453125e-01,  2.30712891e-02,
        1.17675781e-01,  7.51953125e-02, -8.39843750e-02, -1.33666992e-02,
        1.53320312e-01,  4.08203125e-01,  3.80859375e-02,  3.36914062e-02,
       -4.02832031e-02, -6.88476562e-02,  9.03320312e-02,  2.12890625e-01,
        1.72119141e-02, -6.44531250e-02, -1.29882812e-01,  1.40625000e-01,
        2.38281250e-01,  1.37695312e-01, -1.76757812e-01, -2.71484375e-01,
       -1.36718750e-01, -1.69921875e-01, -9.15527344e-03,  3.47656250e-01,
        2.22656250e-01, -3.06640625e-01,  1.98242188e-01,  1.33789062e-01,
       -4.34570312e-02, -5.12695312e-02, -3.46679688e-02, -8.49609375e-02,
        1.01562500e-01,  1.42578125e-01, -7.95898438e-02,  1.78710938e-01,
        2.30468750e-01,  

2. __Check number of training words present in Word2Vec__

In [12]:
def training_words_in_word2vector(word_to_vec_map, word_to_index):
    '''
    input:
        word_to_vec_map: a word2vec GoogleNews-vectors-negative300.bin model loaded using gensim.models
        word_to_index: word to index mapping from training set
    '''
    
    vocab_size = len(word_to_index) + 1
    count = 0
    # Count the vocabulary words that have a vector in the pre-trained embedding
    for word, idx in word_to_index.items():
        if word in word_to_vec_map:
            count+=1
            
    print('Found {} words present from {} training vocabulary in the set of pre-trained word vector'.format(count, vocab_size))

In [13]:
# Separate the sentences and the labels
sentences, labels = list(corpus.sentence), list(corpus.label)

# Cleaning and Tokenization
tokenizer = Tokenizer(oov_token=oov_tok)
tokenizer.fit_on_texts(sentences)

word_index = tokenizer.word_index
training_words_in_word2vector(word2vec, word_index)

Found 7526 words present from 8761 training vocabulary in the set of pre-trained word vector


3. __Define a `pretrained_embedding_matrix` function__

In [14]:
emb_mean = word2vec.vectors.mean()
emb_std = word2vec.vectors.std()
print('emb_mean: ', emb_mean)
print('emb_std: ', emb_std)

emb_mean:  -0.003527845
emb_std:  0.13315111


In [15]:
from tensorflow.keras.layers import Embedding

def pretrained_embedding_matrix(word_to_vec_map, word_to_index, emb_mean, emb_std):
    '''
    input:
        word_to_vec_map: a word2vec GoogleNews-vectors-negative300.bin model loaded using gensim.models
        word_to_index: word to index mapping from training set
    '''
    np.random.seed(2021)
    
    # adding 1 to fit Keras embedding (requirement)
    vocab_size = len(word_to_index) + 1
    # define dimensionality of your pre-trained word vectors (= 300)
    emb_dim = word_to_vec_map.word_vec('handsome').shape[0]
    
    # initialize the matrix with generic normal distribution values
    embed_matrix = np.random.normal(emb_mean, emb_std, (vocab_size, emb_dim))
    
    # Set each row "idx" of the embedding matrix to be 
    # the word vector representation of the idx'th word of the vocabulary
    for word, idx in word_to_index.items():
        if word in word_to_vec_map:
            embed_matrix[idx] = word_to_vec_map.get_vector(word)
            
    return embed_matrix

In [16]:
# Test the function
w_2_i = {'<UNK>': 1, 'handsome': 2, 'cool': 3, 'shit': 4 }
em_matrix = pretrained_embedding_matrix(word2vec, w_2_i, emb_mean, emb_std)
em_matrix

array([[ 0.19468211,  0.08648376, -0.05924511, ..., -0.16683994,
        -0.09975549, -0.08595189],
       [-0.13509196, -0.07441947,  0.15388953, ..., -0.05400787,
        -0.13156594, -0.05996158],
       [ 0.11376953,  0.1796875 , -0.265625  , ..., -0.21875   ,
        -0.03930664,  0.20996094],
       [ 0.1640625 ,  0.1875    , -0.04101562, ...,  0.10888672,
        -0.01019287,  0.02075195],
       [ 0.10888672, -0.16699219,  0.08984375, ..., -0.19628906,
        -0.23144531,  0.04614258]])
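
The same idea can be tried without downloading the Google News vectors. The sketch below is a toy version using plain lists and a dictionary of made-up 3-dimensional vectors; the helper `toy_embedding_matrix`, the vectors, and the word `zzyzx` are all hypothetical.

```python
import random

def toy_embedding_matrix(vectors, word_to_index, mean=0.0, std=0.1, seed=2021):
    """Build a (vocab_size x dim) matrix as nested lists.

    Known words copy their pre-trained vector; every other row (including the
    padding row 0) keeps its random normal initialisation.
    """
    rng = random.Random(seed)
    dim = len(next(iter(vectors.values())))
    vocab_size = len(word_to_index) + 1  # +1 for the padding index 0
    matrix = [[rng.gauss(mean, std) for _ in range(dim)] for _ in range(vocab_size)]
    for word, idx in word_to_index.items():
        if word in vectors:
            matrix[idx] = list(vectors[word])
    return matrix

toy_vectors = {"cool": [0.1, 0.2, 0.3], "handsome": [0.4, 0.5, 0.6]}
toy_w2i = {"<UNK>": 1, "handsome": 2, "cool": 3, "zzyzx": 4}
toy_matrix = toy_embedding_matrix(toy_vectors, toy_w2i)

print(len(toy_matrix), toy_matrix[2], toy_matrix[3])
# 5 [0.4, 0.5, 0.6] [0.1, 0.2, 0.3]
```

Rows 0, 1 and 4 stay random because the padding index, `<UNK>` and `zzyzx` have no pre-trained vector in the toy dictionary.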

## CNN Model

In [17]:
from tensorflow.keras import regularizers
from tensorflow.keras.constraints import MaxNorm

def define_model_2(filters = 100, kernel_size = 3, activation='relu', 
                 input_dim = None, output_dim=300, max_length = None, emb_matrix = None):
    
    model = tf.keras.models.Sequential([
        tf.keras.layers.Embedding(input_dim=input_dim, 
                                  output_dim=output_dim, 
                                  input_length=max_length, 
                                  input_shape=(max_length, ),
                                  # Assign the embedding weight with word2vec embedding matrix
                                  weights = [emb_matrix],
                                  # Set the weight to be not trainable (static)
                                  trainable = False),
        
        tf.keras.layers.Conv1D(filters=filters, kernel_size = kernel_size, activation = activation, 
                               # set 'axis' value to the first and second axis of conv1D weights (rows, cols)
                               kernel_constraint= MaxNorm( max_value=3, axis=[0,1])),
        
        tf.keras.layers.MaxPool1D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(10, activation=activation, 
                              # set axis to 0 to constrain each weight vector of length (input_dim,) in dense layer
                              kernel_constraint = MaxNorm( max_value=3, axis=0)),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(units=6, activation='softmax')
    ])
    
    model.compile( loss = 'sparse_categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
#     model.summary()
    return model

In [18]:
model_0 = define_model_2( input_dim=1000, max_length=100, emb_matrix=np.random.rand(1000, 300))
model_0.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, 100, 300)          300000    
_________________________________________________________________
conv1d (Conv1D)              (None, 98, 100)           90100     
_________________________________________________________________
max_pooling1d (MaxPooling1D) (None, 49, 100)           0         
_________________________________________________________________
flatten (Flatten)            (None, 4900)              0         
_________________________________________________________________
dropout (Dropout)            (None, 4900)              0         
_________________________________________________________________
dense (Dense)                (None, 10)                49010     
_________________________________________________________________
dropout_1 (Dropout)          (None, 10)                0

## Train and Test the Model

In [20]:
class myCallback(tf.keras.callbacks.Callback):
    # Override on_epoch_end() to stop once training accuracy reaches 90%
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        # Default to 0 so a missing 'accuracy' entry does not raise a TypeError
        if logs.get('accuracy', 0) >= 0.9:
            print("\nReached 90% accuracy so cancelling training!")
            self.model.stop_training = True

callbacks = tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', min_delta=0, 
                                             patience=30, verbose=2, 
                                             mode='auto', restore_best_weights=True)

In [22]:
# Parameter Initialization
trunc_type='post'
padding_type='post'
oov_tok = "<UNK>"
activations = ['relu']
filters = 100
kernel_sizes = [1, 2, 3, 4, 5, 6]
# emb_mean and emb_std were computed above from the Word2Vec vectors

columns = ['Activation', 'Filters', 'Acc']
record2 = pd.DataFrame(columns = columns)

# Separate the sentences and the labels
train_x = list(corpus[corpus.split=='train'].sentence)
train_y = np.array(corpus[corpus.split=='train'].label)
test_x = list(corpus[corpus.split=='test'].sentence)
test_y = np.array(corpus[corpus.split=='test'].label)

for activation in activations:
    for kernel_size in kernel_sizes:
            
        # Clean and tokenize the text
        tokenizer = Tokenizer(oov_token=oov_tok)
        tokenizer.fit_on_texts(train_x)

        # Turn the text into sequence
        training_sequences = tokenizer.texts_to_sequences(train_x)
        test_sequences = tokenizer.texts_to_sequences(test_x)

        max_len = max_length(training_sequences)

        # Pad the sequence to have the same size
        Xtrain = pad_sequences(training_sequences, maxlen=max_len, padding=padding_type, truncating=trunc_type)
        Xtest = pad_sequences(test_sequences, maxlen=max_len, padding=padding_type, truncating=trunc_type)

        word_index = tokenizer.word_index
        vocab_size = len(word_index)+1
        
        emb_matrix = pretrained_embedding_matrix(word2vec, word_index, emb_mean, emb_std)
        
        # Define the input shape
        model = define_model_2(filters, kernel_size, activation, input_dim=vocab_size, 
                              max_length=max_len, emb_matrix=emb_matrix)

        # Train the model
        model.fit(Xtrain, train_y, batch_size=50, epochs=200, verbose=2, 
                  callbacks=[callbacks], validation_data=(Xtest, test_y))

        # evaluate the model
        loss, acc = model.evaluate(Xtest, test_y, verbose=0)
        print('Test Accuracy: {}'.format(acc*100))
            
        parameters = [activation, kernel_size]
        entries = parameters + [acc*100]

        temp = pd.DataFrame([entries], columns=columns)
        # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
        record2 = pd.concat([record2, temp], ignore_index=True)
        print()
        print(record2)
        print()

Epoch 1/200
110/110 - 2s - loss: 1.7617 - accuracy: 0.2284 - val_loss: 1.7203 - val_accuracy: 0.2400
Epoch 2/200
110/110 - 1s - loss: 1.6113 - accuracy: 0.3096 - val_loss: 1.4970 - val_accuracy: 0.3200
Epoch 3/200
110/110 - 1s - loss: 1.3879 - accuracy: 0.4320 - val_loss: 1.2308 - val_accuracy: 0.6540
Epoch 4/200
110/110 - 1s - loss: 1.1771 - accuracy: 0.5367 - val_loss: 0.9947 - val_accuracy: 0.6880
Epoch 5/200
110/110 - 1s - loss: 1.0652 - accuracy: 0.5776 - val_loss: 0.8678 - val_accuracy: 0.7420
Epoch 6/200
110/110 - 1s - loss: 1.0141 - accuracy: 0.5930 - val_loss: 0.7924 - val_accuracy: 0.7840
Epoch 7/200
110/110 - 1s - loss: 0.9387 - accuracy: 0.6288 - val_loss: 0.7072 - val_accuracy: 0.8240
Epoch 8/200
110/110 - 1s - loss: 0.9013 - accuracy: 0.6344 - val_loss: 0.6719 - val_accuracy: 0.8140
Epoch 9/200
110/110 - 1s - loss: 0.8645 - accuracy: 0.6568 - val_loss: 0.6207 - val_accuracy: 0.8400
Epoch 10/200
110/110 - 1s - loss: 0.8422 - accuracy: 0.6621 - val_loss: 0.5932 - val_accura

Epoch 27/200
110/110 - 1s - loss: 0.4549 - accuracy: 0.8344 - val_loss: 0.2896 - val_accuracy: 0.9180
Epoch 28/200
110/110 - 1s - loss: 0.4426 - accuracy: 0.8382 - val_loss: 0.3076 - val_accuracy: 0.9120
Epoch 29/200
110/110 - 2s - loss: 0.4370 - accuracy: 0.8377 - val_loss: 0.2949 - val_accuracy: 0.9160
Epoch 30/200
110/110 - 1s - loss: 0.4314 - accuracy: 0.8430 - val_loss: 0.2981 - val_accuracy: 0.9100
Epoch 31/200
110/110 - 1s - loss: 0.4334 - accuracy: 0.8386 - val_loss: 0.2999 - val_accuracy: 0.9140
Epoch 32/200
110/110 - 1s - loss: 0.4175 - accuracy: 0.8500 - val_loss: 0.2826 - val_accuracy: 0.9220
Epoch 33/200
110/110 - 1s - loss: 0.4182 - accuracy: 0.8459 - val_loss: 0.2871 - val_accuracy: 0.9120
Epoch 34/200
110/110 - 2s - loss: 0.4328 - accuracy: 0.8390 - val_loss: 0.2816 - val_accuracy: 0.9120
Epoch 35/200
110/110 - 1s - loss: 0.4137 - accuracy: 0.8509 - val_loss: 0.2809 - val_accuracy: 0.9120
Epoch 36/200
110/110 - 1s - loss: 0.4081 - accuracy: 0.8492 - val_loss: 0.2905 - v

Test Accuracy: 91.20000004768372

  Activation Filters        Acc
0       relu       1  88.599998
1       relu       2  92.199999
2       relu       3  91.200000

Epoch 1/200
110/110 - 3s - loss: 1.7289 - accuracy: 0.2324 - val_loss: 1.5705 - val_accuracy: 0.4920
Epoch 2/200
110/110 - 2s - loss: 1.4681 - accuracy: 0.3615 - val_loss: 1.2125 - val_accuracy: 0.6380
Epoch 3/200
110/110 - 2s - loss: 1.2327 - accuracy: 0.4776 - val_loss: 0.9330 - val_accuracy: 0.6500
Epoch 4/200
110/110 - 2s - loss: 1.0526 - accuracy: 0.5737 - val_loss: 0.7482 - val_accuracy: 0.8060
Epoch 5/200
110/110 - 2s - loss: 0.9326 - accuracy: 0.6335 - val_loss: 0.6408 - val_accuracy: 0.8440
Epoch 6/200
110/110 - 2s - loss: 0.8510 - accuracy: 0.6610 - val_loss: 0.5734 - val_accuracy: 0.8680
Epoch 7/200
110/110 - 2s - loss: 0.7547 - accuracy: 0.7069 - val_loss: 0.4702 - val_accuracy: 0.8680
Epoch 8/200
110/110 - 2s - loss: 0.6897 - accuracy: 0.7296 - val_loss: 0.4368 - val_accuracy: 0.8760
Epoch 9/200
...

110/110 - 3s - loss: 1.6664 - accuracy: 0.2616 - val_loss: 1.4479 - val_accuracy: 0.5580
Epoch 2/200
110/110 - 2s - loss: 1.4076 - accuracy: 0.3925 - val_loss: 1.1469 - val_accuracy: 0.6880
Epoch 3/200
110/110 - 2s - loss: 1.1498 - accuracy: 0.5167 - val_loss: 0.8717 - val_accuracy: 0.7780
Epoch 4/200
110/110 - 2s - loss: 1.0163 - accuracy: 0.5725 - val_loss: 0.7249 - val_accuracy: 0.8300
Epoch 5/200
110/110 - 2s - loss: 0.9012 - accuracy: 0.6300 - val_loss: 0.5996 - val_accuracy: 0.8620
Epoch 6/200
110/110 - 2s - loss: 0.8182 - accuracy: 0.6706 - val_loss: 0.5699 - val_accuracy: 0.8460
Epoch 7/200
110/110 - 2s - loss: 0.7670 - accuracy: 0.6871 - val_loss: 0.5215 - val_accuracy: 0.8640
Epoch 8/200
110/110 - 2s - loss: 0.7224 - accuracy: 0.7065 - val_loss: 0.4985 - val_accuracy: 0.8620
Epoch 9/200
110/110 - 2s - loss: 0.6883 - accuracy: 0.7177 - val_loss: 0.4569 - val_accuracy: 0.8760
Epoch 10/200
110/110 - 2s - loss: 0.6487 - accuracy: 0.7298 - val_loss: 0.4541 - val_accuracy: 0.8780
...

Epoch 12/200
110/110 - 2s - loss: 0.6099 - accuracy: 0.7850 - val_loss: 0.4991 - val_accuracy: 0.8740
Epoch 13/200
110/110 - 3s - loss: 0.6009 - accuracy: 0.7815 - val_loss: 0.4742 - val_accuracy: 0.8900
Epoch 14/200
110/110 - 2s - loss: 0.5581 - accuracy: 0.7984 - val_loss: 0.4001 - val_accuracy: 0.8920
Epoch 15/200
110/110 - 2s - loss: 0.5488 - accuracy: 0.8001 - val_loss: 0.4173 - val_accuracy: 0.8640
Epoch 16/200
110/110 - 2s - loss: 0.5098 - accuracy: 0.8195 - val_loss: 0.4181 - val_accuracy: 0.8820
Epoch 17/200
110/110 - 3s - loss: 0.5154 - accuracy: 0.8168 - val_loss: 0.3877 - val_accuracy: 0.8980
Epoch 18/200
110/110 - 3s - loss: 0.4855 - accuracy: 0.8208 - val_loss: 0.3768 - val_accuracy: 0.9040
Epoch 19/200
110/110 - 3s - loss: 0.4786 - accuracy: 0.8287 - val_loss: 0.3489 - val_accuracy: 0.9060
Epoch 20/200
110/110 - 2s - loss: 0.4821 - accuracy: 0.8250 - val_loss: 0.3507 - val_accuracy: 0.9040
Epoch 21/200
110/110 - 2s - loss: 0.4711 - accuracy: 0.8274 - val_loss: 0.4098 - v

## Summary

In [23]:
record2.sort_values(by='Acc', ascending=False)

| | Activation | Filters | Acc |
|---|---|---|---|
| 1 | relu | 2 | 92.199999 |
| 4 | relu | 5 | 92.000002 |
| 3 | relu | 4 | 91.799998 |
| 2 | relu | 3 | 91.2 |
| 5 | relu | 6 | 91.000003 |
| 0 | relu | 1 | 88.599998 |


In [24]:
report = record2.sort_values(by='Acc', ascending=False)
# to_excel() writes the file and returns None, so do not reassign `report`
report.to_excel('CNN_TREC_2.xlsx', sheet_name='static')

# Model 3: Word2Vec Dynamic

* In this part, we will fine-tune the embeddings while training (dynamic).
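
The difference between static and dynamic embeddings comes down to whether the optimizer is allowed to update the embedding weights. The toy sketch below uses single numbers in place of weight tensors; the helper `sgd_step` and all values are hypothetical.

```python
def sgd_step(params, grads, trainable, lr=0.1):
    """Apply one gradient-descent update, skipping parameters marked as frozen."""
    return {
        name: value - lr * grads[name] if trainable[name] else value
        for name, value in params.items()
    }

params = {"embedding": 0.50, "dense": 0.20}
grads = {"embedding": 1.00, "dense": -2.00}

static = sgd_step(params, grads, trainable={"embedding": False, "dense": True})
dynamic = sgd_step(params, grads, trainable={"embedding": True, "dense": True})

print(static)   # the embedding value is untouched
print(dynamic)  # the embedding value moves along with the dense weight
```

In Keras the same switch is the `trainable` argument of the `Embedding` layer: `False` for the static model above, `True` for the dynamic model below.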

## CNN Model

In [25]:
from tensorflow.keras import regularizers
from tensorflow.keras.constraints import MaxNorm

def define_model_3(filters = 100, kernel_size = 3, activation='relu', 
                 input_dim = None, output_dim=300, max_length = None, emb_matrix = None):
    
    model = tf.keras.models.Sequential([
        tf.keras.layers.Embedding(input_dim=input_dim, 
                                  output_dim=output_dim, 
                                  input_length=max_length, 
                                  input_shape=(max_length, ),
                                  # Assign the embedding weight with word2vec embedding matrix
                                  weights = [emb_matrix],
                                  # Set the weight to be trainable so it is fine-tuned (dynamic)
                                  trainable = True),
        
        tf.keras.layers.Conv1D(filters=filters, kernel_size = kernel_size, activation = activation, 
                               # take the norm over axes 0 and 1 of the Conv1D kernel (kernel_size, input channels): one norm per filter
                               kernel_constraint= MaxNorm( max_value=3, axis=[0,1])),
        
        tf.keras.layers.MaxPool1D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(10, activation=activation, 
                              # set axis to 0 to constrain each weight vector of length (input_dim,) in dense layer
                              kernel_constraint = MaxNorm( max_value=3, axis=0)),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(units=6, activation='softmax')
    ])
    
    model.compile( loss = 'sparse_categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
#     model.summary()
    return model
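
The `MaxNorm(max_value=3, axis=0)` constraint on the dense layer rescales any weight column whose L2 norm exceeds 3 and leaves the other columns essentially untouched. The following is a minimal pure-Python sketch of that rule, not the Keras implementation; `max_norm_columns` and the small `eps` term are illustrative assumptions.

```python
import math

def max_norm_columns(kernel, max_value=3.0, eps=1e-7):
    """Rescale each column of a 2-D kernel so its L2 norm is at most max_value.

    Hypothetical helper: kernel is a list of rows shaped (input_dim, units),
    so taking the norm over axis 0 gives one norm per output unit (column).
    """
    n_rows, n_cols = len(kernel), len(kernel[0])
    constrained = [row[:] for row in kernel]
    for j in range(n_cols):
        norm = math.sqrt(sum(kernel[i][j] ** 2 for i in range(n_rows)))
        desired = min(norm, max_value)      # clip the norm to [0, max_value]
        scale = desired / (eps + norm)      # ~1 when the norm is already small enough
        for i in range(n_rows):
            constrained[i][j] = kernel[i][j] * scale
    return constrained

# Toy kernel with column norms 5.0 (too large) and 0.5 (within the limit)
kernel = [[3.0, 0.3],
          [4.0, 0.4]]
constrained = max_norm_columns(kernel)
column_norms = [math.sqrt(sum(row[j] ** 2 for row in constrained)) for j in range(2)]
print(column_norms)
```

For the Conv1D layer, `axis=[0,1]` applies the same idea with the norm taken over the kernel-size and input-channel axes, so each filter is constrained separately.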

In [26]:
model_0 = define_model_3( input_dim=1000, max_length=100, emb_matrix=np.random.rand(1000, 300))
model_0.summary()

Model: "sequential_12"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_12 (Embedding)     (None, 100, 300)          300000    
_________________________________________________________________
conv1d_12 (Conv1D)           (None, 98, 100)           90100     
_________________________________________________________________
max_pooling1d_12 (MaxPooling (None, 49, 100)           0         
_________________________________________________________________
flatten_12 (Flatten)         (None, 4900)              0         
_________________________________________________________________
dropout_24 (Dropout)         (None, 4900)              0         
_________________________________________________________________
dense_24 (Dense)             (None, 10)                49010     
_________________________________________________________________
dropout_25 (Dropout)         (None, 10)              
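
The shapes and parameter counts in the summary above can be reproduced by hand. This is a minimal sketch, assuming Conv1D's default 'valid' padding with stride 1 and MaxPool1D's default stride equal to its pool size; the variable names are illustrative only.

```python
# Values passed to define_model_3 in the cell above, plus its defaults
vocab_size, emb_dim, seq_len = 1000, 300, 100
filters, kernel_size, pool_size = 100, 3, 2
hidden_units, n_classes = 10, 6

embedding_params = vocab_size * emb_dim                    # one 300-d vector per token id
conv_params = kernel_size * emb_dim * filters + filters    # kernel weights + one bias per filter
conv_len = seq_len - kernel_size + 1                       # 'valid' convolution shortens the sequence
pooled_len = conv_len // pool_size                         # non-overlapping windows of size 2
flat_size = pooled_len * filters                           # Flatten output size
dense_params = flat_size * hidden_units + hidden_units     # weights + biases of Dense(10)
output_params = hidden_units * n_classes + n_classes       # Dense(6); computed here, not visible in the summary

print(embedding_params, conv_params, conv_len, pooled_len, flat_size, dense_params, output_params)
```

Because `conv_len` depends on `kernel_size`, the flattened size and the Dense(10) parameter count change with every kernel size tried in the training loop.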

## Train and Test the Model

In [27]:
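class myCallback(tf.keras.callbacks.Callback):
    # Override on_epoch_end() to stop training once training accuracy reaches 90%
    # (kept for reference; the training loop below uses EarlyStopping instead)
    def on_epoch_end(self, epoch, logs=None):
        accuracy = (logs or {}).get('accuracy')
        if accuracy is not None and accuracy >= 0.9:
            print("\nReached 90% accuracy so cancelling training!")
            self.model.stop_training = True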

callbacks = tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', min_delta=0, 
                                             patience=30, verbose=2, 
                                             mode='auto', restore_best_weights=True)
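
With `patience=30` and `min_delta=0`, training stops once `val_accuracy` has gone 30 consecutive epochs without a strict improvement, and `restore_best_weights=True` rolls the model back to the best epoch. The following is a rough pure-Python sketch of that stopping rule, not the Keras source; `early_stopping_epoch` and the accuracy history are hypothetical.

```python
def early_stopping_epoch(val_accuracy, patience):
    """Return (stopped_epoch, best_epoch), both 1-based, under a patience rule.

    Hypothetical helper: the metric is maximized, and only a strict
    improvement resets the wait counter (min_delta=0).
    """
    best_value = float("-inf")
    best_epoch = 0
    wait = 0
    for epoch, value in enumerate(val_accuracy, start=1):
        if value > best_value:
            best_value, best_epoch, wait = value, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch    # stop here; weights come from best_epoch
    return len(val_accuracy), best_epoch    # ran out of epochs before patience expired

# Made-up validation accuracies, with a short patience to keep the example small
history = [0.80, 0.85, 0.90, 0.89, 0.88, 0.90, 0.87]
stopped, best = early_stopping_epoch(history, patience=3)
print(stopped, best)
```

Under this rule, a run that stops at epoch 42 with a patience of 30 reached its best `val_accuracy` at epoch 12.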

In [32]:
# Parameter Initialization
trunc_type='post'
padding_type='post'
oov_tok = "<UNK>"
activations = ['relu']
filters = 100
kernel_sizes = [1, 2, 3, 4, 5, 6, 7, 8]
# emb_mean and emb_std must already be defined by an earlier cell

columns = ['Activation', 'Filters', 'Acc']  # note: the 'Filters' column stores the kernel size
record3 = pd.DataFrame(columns = columns)

# Separate the sentences and the labels
train_x = list(corpus[corpus.split=='train'].sentence)
train_y = np.array(corpus[corpus.split=='train'].label)
test_x = list(corpus[corpus.split=='test'].sentence)
test_y = np.array(corpus[corpus.split=='test'].label)

for activation in activations:
    for kernel_size in kernel_sizes:
            
        # Tokenize the text (the tokenizer is fit on the training split only)
        tokenizer = Tokenizer(oov_token=oov_tok)
        tokenizer.fit_on_texts(train_x)

        # Turn the text into sequence
        training_sequences = tokenizer.texts_to_sequences(train_x)
        test_sequences = tokenizer.texts_to_sequences(test_x)

        max_len = max_length(training_sequences)

        # Pad the sequence to have the same size
        Xtrain = pad_sequences(training_sequences, maxlen=max_len, padding=padding_type, truncating=trunc_type)
        Xtest = pad_sequences(test_sequences, maxlen=max_len, padding=padding_type, truncating=trunc_type)

        word_index = tokenizer.word_index
        vocab_size = len(word_index)+1
        
        emb_matrix = pretrained_embedding_matrix(word2vec, word_index, emb_mean, emb_std)
        
        # Build the model for the current kernel size
        model = define_model_3(filters, kernel_size, activation, input_dim=vocab_size, 
                              max_length=max_len, emb_matrix=emb_matrix)

        # Train the model
        model.fit(Xtrain, train_y, batch_size=50, epochs=200, verbose=2, 
                  callbacks=[callbacks], validation_data=(Xtest, test_y))

        # evaluate the model
        loss, acc = model.evaluate(Xtest, test_y, verbose=0)
        print('Test Accuracy: {}'.format(acc*100))
            
        parameters = [activation, kernel_size]
        entries = parameters + [acc*100]

        temp = pd.DataFrame([entries], columns=columns)
        # DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
        record3 = pd.concat([record3, temp], ignore_index=True)
        print()
        print(record3)
        print()

Epoch 1/200
110/110 - 5s - loss: 1.6516 - accuracy: 0.2797 - val_loss: 1.3393 - val_accuracy: 0.6640
Epoch 2/200
110/110 - 4s - loss: 1.2424 - accuracy: 0.4949 - val_loss: 0.9631 - val_accuracy: 0.7620
Epoch 3/200
110/110 - 4s - loss: 0.9779 - accuracy: 0.6056 - val_loss: 0.7081 - val_accuracy: 0.8600
Epoch 4/200
110/110 - 4s - loss: 0.7964 - accuracy: 0.6920 - val_loss: 0.5762 - val_accuracy: 0.8520
Epoch 5/200
110/110 - 4s - loss: 0.6811 - accuracy: 0.7309 - val_loss: 0.5423 - val_accuracy: 0.8680
Epoch 6/200
110/110 - 4s - loss: 0.6048 - accuracy: 0.7652 - val_loss: 0.4952 - val_accuracy: 0.8820
Epoch 7/200
110/110 - 4s - loss: 0.5592 - accuracy: 0.7753 - val_loss: 0.4726 - val_accuracy: 0.8840
Epoch 8/200
110/110 - 4s - loss: 0.5132 - accuracy: 0.7999 - val_loss: 0.4633 - val_accuracy: 0.8860
Epoch 9/200
110/110 - 5s - loss: 0.4846 - accuracy: 0.8065 - val_loss: 0.4549 - val_accuracy: 0.8780
Epoch 10/200
110/110 - 5s - loss: 0.4617 - accuracy: 0.8133 - val_loss: 0.4648 - val_accura

Epoch 1/200
110/110 - 6s - loss: 1.7154 - accuracy: 0.2425 - val_loss: 1.4937 - val_accuracy: 0.4580
Epoch 2/200
110/110 - 6s - loss: 1.3110 - accuracy: 0.4406 - val_loss: 0.9345 - val_accuracy: 0.8000
Epoch 3/200
110/110 - 5s - loss: 0.9650 - accuracy: 0.5994 - val_loss: 0.6372 - val_accuracy: 0.8780
Epoch 4/200
110/110 - 5s - loss: 0.7752 - accuracy: 0.6889 - val_loss: 0.5680 - val_accuracy: 0.8900
Epoch 5/200
110/110 - 5s - loss: 0.6863 - accuracy: 0.7062 - val_loss: 0.4913 - val_accuracy: 0.8840
Epoch 6/200
110/110 - 5s - loss: 0.6109 - accuracy: 0.7379 - val_loss: 0.4747 - val_accuracy: 0.8920
Epoch 7/200
110/110 - 5s - loss: 0.5547 - accuracy: 0.7469 - val_loss: 0.4319 - val_accuracy: 0.8960
Epoch 8/200
110/110 - 5s - loss: 0.5004 - accuracy: 0.7663 - val_loss: 0.4203 - val_accuracy: 0.8820
Epoch 9/200
110/110 - 5s - loss: 0.4797 - accuracy: 0.7716 - val_loss: 0.3991 - val_accuracy: 0.9000
Epoch 10/200
110/110 - 5s - loss: 0.4423 - accuracy: 0.7808 - val_loss: 0.4113 - val_accura

Epoch 1/200
110/110 - 6s - loss: 1.6720 - accuracy: 0.2619 - val_loss: 1.4465 - val_accuracy: 0.6560
Epoch 2/200
110/110 - 5s - loss: 1.2612 - accuracy: 0.4784 - val_loss: 0.9203 - val_accuracy: 0.8120
Epoch 3/200
110/110 - 5s - loss: 0.9249 - accuracy: 0.6451 - val_loss: 0.6477 - val_accuracy: 0.8540
Epoch 4/200
110/110 - 5s - loss: 0.7256 - accuracy: 0.7232 - val_loss: 0.5076 - val_accuracy: 0.8460
Epoch 5/200
110/110 - 5s - loss: 0.6170 - accuracy: 0.7434 - val_loss: 0.4689 - val_accuracy: 0.8600
Epoch 6/200
110/110 - 5s - loss: 0.5558 - accuracy: 0.7595 - val_loss: 0.4467 - val_accuracy: 0.8760
Epoch 7/200
110/110 - 5s - loss: 0.5386 - accuracy: 0.7612 - val_loss: 0.4528 - val_accuracy: 0.8640
Epoch 8/200
110/110 - 5s - loss: 0.4685 - accuracy: 0.8083 - val_loss: 0.4472 - val_accuracy: 0.8660
Epoch 9/200
110/110 - 5s - loss: 0.4057 - accuracy: 0.8322 - val_loss: 0.4570 - val_accuracy: 0.8720
Epoch 10/200
110/110 - 5s - loss: 0.3908 - accuracy: 0.8353 - val_loss: 0.4618 - val_accura

Epoch 36/200
110/110 - 5s - loss: 0.2546 - accuracy: 0.8988 - val_loss: 0.6547 - val_accuracy: 0.8720
Epoch 37/200
110/110 - 5s - loss: 0.2611 - accuracy: 0.8722 - val_loss: 0.6417 - val_accuracy: 0.8840
Epoch 38/200
110/110 - 5s - loss: 0.2452 - accuracy: 0.8755 - val_loss: 0.7549 - val_accuracy: 0.8660
Epoch 39/200
110/110 - 6s - loss: 0.2579 - accuracy: 0.8632 - val_loss: 0.6498 - val_accuracy: 0.8760
Epoch 40/200
110/110 - 5s - loss: 0.2637 - accuracy: 0.8659 - val_loss: 0.6238 - val_accuracy: 0.8880
Epoch 41/200
110/110 - 6s - loss: 0.2670 - accuracy: 0.8654 - val_loss: 0.6745 - val_accuracy: 0.8820
Epoch 42/200
110/110 - 5s - loss: 0.2595 - accuracy: 0.8692 - val_loss: 0.6150 - val_accuracy: 0.8880
Restoring model weights from the end of the best epoch.
Epoch 00042: early stopping
Test Accuracy: 90.39999842643738

  Activation Filters        Acc
0       relu       1  88.599998
1       relu       2  90.200001
2       relu       3  91.000003
3       relu       4  90.200001
4       

Epoch 25/200
110/110 - 5s - loss: 0.2651 - accuracy: 0.9046 - val_loss: 0.5754 - val_accuracy: 0.8800
Epoch 26/200
110/110 - 6s - loss: 0.2621 - accuracy: 0.9066 - val_loss: 0.5495 - val_accuracy: 0.8720
Epoch 27/200
110/110 - 6s - loss: 0.2652 - accuracy: 0.9103 - val_loss: 0.4674 - val_accuracy: 0.8900
Epoch 28/200
110/110 - 6s - loss: 0.2748 - accuracy: 0.9109 - val_loss: 0.6235 - val_accuracy: 0.8720
Epoch 29/200
110/110 - 6s - loss: 0.2549 - accuracy: 0.9127 - val_loss: 0.5282 - val_accuracy: 0.8900
Epoch 30/200
110/110 - 5s - loss: 0.2578 - accuracy: 0.9087 - val_loss: 0.5377 - val_accuracy: 0.8780
Epoch 31/200
110/110 - 5s - loss: 0.2542 - accuracy: 0.9127 - val_loss: 0.5173 - val_accuracy: 0.8860
Epoch 32/200
110/110 - 5s - loss: 0.2495 - accuracy: 0.9131 - val_loss: 0.6545 - val_accuracy: 0.8640
Epoch 33/200
110/110 - 5s - loss: 0.2445 - accuracy: 0.9165 - val_loss: 0.6003 - val_accuracy: 0.8820
Epoch 34/200
110/110 - 6s - loss: 0.2485 - accuracy: 0.9105 - val_loss: 0.6341 - v
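
The training loop pads every tokenized question to the same length with `padding='post'` and `truncating='post'`. The following is a rough pure-Python sketch of that behaviour, not the Keras implementation; `pad_post` and the token ids are hypothetical.

```python
def pad_post(sequences, maxlen, value=0):
    """Pad or truncate each sequence at the end so all have length maxlen."""
    padded = []
    for seq in sequences:
        seq = list(seq)[:maxlen]                            # 'post' truncation keeps the first maxlen ids
        padded.append(seq + [value] * (maxlen - len(seq)))  # 'post' padding appends the pad value
    return padded

# Made-up token id sequences of different lengths
sequences = [[5, 2, 9], [7], [4, 4, 4, 4, 4]]
padded = pad_post(sequences, maxlen=4)
print(padded)
```

Since `max_len` is taken from the training sequences, only test questions longer than the longest training question are truncated.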

## Summary

In [33]:
record3.sort_values(by='Acc', ascending=False)

Unnamed: 0,Activation,Filters,Acc
2,relu,3,91.000003
6,relu,7,90.600002
5,relu,6,90.399998
1,relu,2,90.200001
3,relu,4,90.200001
7,relu,8,89.200002
0,relu,1,88.599998
4,relu,5,88.599998


In [34]:
report = record3.sort_values(by='Acc', ascending=False)
report.to_excel('CNN_TREC_3.xlsx', sheet_name='dynamic')  # to_excel() returns None, so do not reassign