# Data Set (Training and Test set)

We used the IMDb dataset, which can be downloaded from [here](https://ai.stanford.edu/~amaas/data/sentiment/).
This data set contains 50,000 reviews, evenly split into 25,000 for training and 25,000 for testing. The training and test sets contain disjoint sets of movies, so results on the test set should reasonably carry over to reviews of other movies.

Each group has the same number of positive and negative reviews: a positive review has a score from 7 to 10, while a negative review has a score from 1 to 4. Reviews with a score of 5 or 6 are excluded to avoid ambiguity.
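
The labeling rule above can be sketched as follows. This is a minimal illustration, assuming the archive's file naming of `<id>_<rating>.txt`; the helper names `rating_from_filename` and `label_from_rating` are hypothetical and are not used elsewhere in this notebook, which relies on the *pos* and *neg* directories instead.

```python
import os

def rating_from_filename(filename):
    """Extract the star rating from a name such as '200_8.txt' (assumed layout)."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    return int(stem.split('_')[1])

def label_from_rating(rating):
    """Map a 1-10 rating to 1 (positive), 0 (negative), or None (excluded)."""
    if rating >= 7:
        return 1
    if rating <= 4:
        return 0
    return None  # ratings 5 and 6 are not part of the labeled data set

print(label_from_rating(rating_from_filename('data/train/pos/200_8.txt')))
```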


# Environment

For this project, we used a Linux machine with an AMD Ryzen 7 2700X CPU, 16 GB of memory, and a GeForce RTX 2070 GPU.
In addition, Keras with the TensorFlow backend is used to build the deep learning models.

In [1]:
from string import punctuation
import os 
#from os import listdir
from numpy import array
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Dense
#
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers import Embedding
from keras.layers.convolutional import Conv1D
from keras.layers.convolutional import MaxPooling1D
#
from keras.callbacks import EarlyStopping, ModelCheckpoint
import keras.backend.tensorflow_backend as K
import tensorflow as tf
from nltk.tokenize import word_tokenize
from nltk.stem.porter import PorterStemmer

import string
from nltk.corpus import stopwords
from nltk.corpus import words
import glob
#from tqdm import tqdm
from tqdm import tqdm_notebook as tqdm
from collections import Counter
from operator import itemgetter
import numpy as np
from multiprocessing import Pool
import sys

from IPython.display import HTML, display
import tabulate
import pandas as pd
from keras_tqdm import TQDMNotebookCallback


Using TensorFlow backend.


# TensorFlow initial setup

To keep TensorFlow from reserving all of the GPU memory up front, the *allow_growth* option is turned on so that memory is allocated as needed.

In [2]:
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
K.set_session(sess)

# Loading data set files

First of all, all the documents are loaded. The data sets for training and testing are stored in *data/train* and *data/test*, respectively. For each data set, positive and negative reviews are stored in *pos* and *neg* sub-directories.

We attached progress bars using [tqdm](https://github.com/tqdm/tqdm), which is useful when dealing with large data because it lets us estimate how long each stage will take.

Referenced article for tqdm: https://towardsdatascience.com/progress-bars-in-python-4b44e8a4c482

In [3]:
# load all docs in a directory
def load_docs(directory):
    documents = list()
    # walk through all files in the folder
    for filename in tqdm(os.listdir(directory)):
        # create the full path of the file to open
        path = directory + '/' + filename
        with open(path, 'r') as f:
            # load the doc
            doc = f.read()
            # add to list
            documents.append(doc)
    return documents

# load all training reviews
print("Loading training-positive-docs")
global_train_positive_docs = load_docs('data/train/pos')
print("Loading training-negative-docs")
global_train_negative_docs = load_docs('data/train/neg')
# load all test reviews
print("Loading test-positive-docs")
global_test_positive_docs = load_docs('data/test/pos')
print("Loading test-negative-docs")
global_test_negative_docs = load_docs('data/test/neg')

Loading training-positive-docs
Loading training-negative-docs
Loading test-positive-docs
Loading test-negative-docs

# Cleaning documents

### Pre-processing techniques

In most NLP-related work, documents are pre-processed to get better performance.
We applied several well-known techniques, as follows:

**1. Removing punctuation**  
Punctuation marks mostly serve readability and carry little meaning on their own, so they should be removed. However, we did not remove the apostrophe (') because removing it caused incorrect stemming.

**2. Removing stop words**  
We filtered out the stop words.
Stop words are words that do not contribute to the deeper meaning of a phrase.
They are the most common words, such as "the", "a", and "is".
NLTK provides a list of commonly agreed upon stop words for a variety of languages.

**3. Stemming**    
The *PorterStemmer* is provided in the *NLTK* Python package.
We lowercased the words and applied stemming in order to both reduce the vocabulary and focus on the sense or sentiment of a document rather than its deeper meaning.

**4. Removing non-frequent words**  
It is important to define a vocabulary of known words when using a bag-of-words or embedding model.
The more words there are, the larger the representation of the documents becomes, so it is important to constrain the vocabulary to the words believed to be predictive.

In this project, **we set up the vocabulary dictionary by removing the non-frequent words to prevent a model from overfitting.**
This is implemented in [*vocab.ipynb*](https://github.com/ahrimhan/data-science-project/tree/master/project2/src/vocab.ipynb).

* After removing all words with a length of 1 character or less, we first construct the vocabulary dictionary from the reviews in the training data set only (vocabulary size: 52,826).
* Then, we count the occurrences of each word in the vocabulary and remove the non-frequent words, that is, those that occur only once. The remaining words therefore occur two or more times (filtered vocabulary size: 30,819). The filtered vocabulary is saved in [*vocab.txt*](https://github.com/ahrimhan/data-science-project/tree/master/project2/src/vocab.txt).
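
The two filtering steps above can be sketched with `collections.Counter`. This is a minimal sketch, not the code of *vocab.ipynb*: the function name `build_vocab`, its parameters, and the toy documents are assumptions made for illustration, and the input is assumed to be already tokenized and stemmed.

```python
from collections import Counter

def build_vocab(tokenized_docs, min_occurrence=2, min_length=2):
    """Count tokens over the training docs and keep the frequent ones."""
    counts = Counter()
    for tokens in tokenized_docs:
        # step 1: ignore tokens that are one character long or shorter
        counts.update(t for t in tokens if len(t) >= min_length)
    # step 2: keep only tokens seen at least min_occurrence times
    return sorted(t for t, c in counts.items() if c >= min_occurrence)

docs = [['good', 'movi', 'a'], ['bad', 'movi'], ['good', 'act']]
print(build_vocab(docs))  # ['good', 'movi']
```

Building the vocabulary from the training reviews only keeps information from the test set out of the pre-processing.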

In [4]:
# characters stripped from every token
remove_punctuation_table = str.maketrans('', '', '\'"!.,?:;')
stop_words = set(stopwords.words('english'))

# load the filtered vocabulary; a set keeps the membership test below fast
with open('./vocab/vocab.txt') as f:
    vocab = set(f.read().split())

# turn a doc into a string of clean tokens
def clean_doc(doc):
    # split into tokens with the NLTK tokenizer
    tokens = word_tokenize(doc)

    # remove punctuation from each token
    tokens = [w.translate(remove_punctuation_table) for w in tokens]

    # remove stop words (the comparison is case-sensitive,
    # so capitalized forms such as 'And' pass through)
    tokens = [w for w in tokens if w not in stop_words]

    # lowercase and stem
    porter = PorterStemmer()
    tokens = [porter.stem(w.lower()) for w in tokens]

    # filter out tokens not in vocab
    if len(vocab) > 0:
        tokens = [w for w in tokens if w in vocab]

    tokens = ' '.join(tokens)
    return tokens

### Multiprocessing

The pre-processing described above is computationally heavy.
To speed it up, we parallelized it using the *Pool* class of the *multiprocessing* package.
Since our CPU has 8 cores, the pool size is set to 8.
With this technique, **we achieved a 6-7x speed-up.**
With a single process, cleaning 12,500 documents takes 10-12 minutes, whereas with 8 worker processes it takes only 1 minute and 20-40 seconds.

In [5]:
# Serial version of clean_docs function
# def clean_docs(documents):
#     for doc in tqdm(documents):
#         clean_doc(doc)

# Parallel version of clean_docs function
def clean_docs(documents):
    # Since we use a CPU having 8 cores, the size of Pool is set as 8
    with Pool(8) as p:
        return list(tqdm(p.imap(clean_doc, documents), total=len(documents)))

print("Cleaning up for training-positive-docs")
cleaned_train_positive_docs = clean_docs(global_train_positive_docs)
print("Cleaning up for training-negative-docs")
cleaned_train_negative_docs = clean_docs(global_train_negative_docs)
print("Cleaning up for test-positive-docs")
cleaned_test_positive_docs = clean_docs(global_test_positive_docs)
print("Cleaning up for test-negative-docs")
cleaned_test_negative_docs = clean_docs(global_test_negative_docs) 

Cleaning up for training-positive-docs
Cleaning up for training-negative-docs
Cleaning up for test-positive-docs
Cleaning up for test-negative-docs

In [6]:
print(cleaned_test_negative_docs[0].split()[0:100])

['nt', 'realli', 'consid', 'conserv', 'nt', 'person', 'offend', 'film', 'pretti', 'clear', 'plot', 'character', 'film', 'secondari', 'messag', 'and', 'messag', 'conserv', 'either', 'evil', 'stupid', 'charact', 'either', 'good', 'american', 'brainless', 'greedi', 'evil', 'conserv', 'there', 'noth', 'clever', 'creativ', 'nt', 'realli', 'mind', 'polit', 'bia', 'nt', 'purpos', 'behind', 'movi', 'and', 'clearli', 'br', 'br', 'on', 'posit', 'side', 'cast', 'wonder', 'chri', 'cooper', 'impress', 'funni', 'first', 'two', 'three', 'time', 'old', 'joke', 'told', 'br', 'br', 'so', 'realli', 'hate', 'conserv', 'probabl', 'enjoy', 'film', 'look', 'someth', 'realist', 'charact', 'stori', 'less', 'better', 'watch', 'someth', 'els']


# Encoding data set into sequence

To use the documents as model input, each document is encoded with Keras as a padded sequence of integer word indices.
The function below encodes the documents as sequences and creates the list of labels: '0' for negative reviews and '1' for positive reviews.
We do not need one-hot encoding (the *to_categorical()* function in Keras) because there are only two classes, positive and negative.

In [7]:
def encode_data_set(tokenizer, positive_docs, negative_docs, max_word_length):
    docs = negative_docs + positive_docs
    # sequence encode
    encoded_docs = tokenizer.texts_to_sequences(docs)
    # pad sequences
    x = pad_sequences(encoded_docs, maxlen=max_word_length, padding='post')
    # define training labels
    y = array(([0] * len(negative_docs)) + ([1] * len(positive_docs)))
    return x, y
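
To make the encoding step concrete, here is a small pure-Python imitation of what the tokenizer and the post-padding do. It is a sketch under simplifying assumptions, not the Keras implementation: `fit_word_index` and `encode_and_pad` are hypothetical names, ties in word frequency are broken alphabetically, and no lowercasing or punctuation filtering is applied.

```python
from collections import Counter

def fit_word_index(docs):
    """Rank words by frequency; index 0 is reserved for padding."""
    counts = Counter(w for doc in docs for w in doc.split())
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return {w: i + 1 for i, w in enumerate(ranked)}

def encode_and_pad(docs, word_index, max_len):
    """Map words to indices, drop unknown words, and zero-pad at the end."""
    rows = []
    for doc in docs:
        seq = [word_index[w] for w in doc.split() if w in word_index]
        seq = seq[-max_len:]  # keep the last max_len items if the doc is too long
        rows.append(seq + [0] * (max_len - len(seq)))
    return rows

train_docs = ['good movi good', 'bad movi']
word_index = fit_word_index(train_docs)
print(word_index)                                  # {'good': 1, 'movi': 2, 'bad': 3}
print(encode_and_pad(train_docs, word_index, 4))   # [[1, 2, 1, 0], [3, 2, 0, 0]]
```

Because the index is fitted on the training documents only, words that appear only in test reviews are dropped during encoding.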

# Word Embedding
Word embedding is a common technique for handling text in deep learning.
To evaluate the effect of pre-trained word embeddings, we use both a pre-trained embedding and a new (not-trained) embedding.

## Pre-trained word embedding
In this project, we use GloVe embeddings, which you can read about [here](https://nlp.stanford.edu/projects/glove/). GloVe stands for "Global Vectors for Word Representation". It is a popular embedding technique based on factorizing a matrix of word co-occurrence statistics.

Specifically, we use the 200-dimensional GloVe embeddings of 400k words from the *glove.6B* release, computed on a 2014 dump of English Wikipedia together with Gigaword 5. You can download them [here](http://nlp.stanford.edu/data/glove.6B.zip).

In addition, to check whether the pre-trained embedding should be fine-tuned, the function below exposes the *trainable* parameter of the Embedding layer.

In [8]:
EMBEDDING_DIM = 200
def load_pre_trained_embedding(word_index, max_word_length, trainable_for_embedding):
    # map each GloVe word to its coefficient vector
    embeddings_index = {}
    with open('./glove/glove.6B.200d.txt', encoding='utf-8') as f:
        for line in f:
            values = line.split()
            word = values[0]
            coefs = np.asarray(values[1:], dtype='float32')
            embeddings_index[word] = coefs

    # row i holds the vector of the word with index i; row 0 is reserved for padding
    embedding_matrix = np.zeros((len(word_index) + 1, EMBEDDING_DIM))
    for word, i in word_index.items():
        embedding_vector = embeddings_index.get(word)
        if embedding_vector is not None:
            embedding_matrix[i] = embedding_vector
        # words not found in the embedding index keep their all-zero row
    return Embedding(len(word_index) + 1,
                     EMBEDDING_DIM,
                     input_length=max_word_length,
                     weights=[embedding_matrix],
                     trainable=trainable_for_embedding)
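
The way the embedding matrix is filled can be seen on a toy example. This is a minimal sketch using plain lists instead of NumPy; the three-dimensional vectors, the tiny `word_index`, and the helper names `parse_glove_lines` and `build_embedding_matrix` are made up for illustration.

```python
def parse_glove_lines(lines):
    """Parse 'word v1 v2 ...' lines into a dict of float lists."""
    vectors = {}
    for line in lines:
        parts = line.split()
        if parts:
            vectors[parts[0]] = [float(v) for v in parts[1:]]
    return vectors

def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector of the word with index i; unknown words stay zero."""
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, i in word_index.items():
        if word in vectors:
            matrix[i] = list(vectors[word])
    return matrix

toy_glove = ["good 0.1 0.2 0.3", "bad -0.1 -0.2 -0.3"]  # hypothetical vectors
word_index = {'good': 1, 'bad': 2, 'movi': 3}           # hypothetical tokenizer index
matrix = build_embedding_matrix(word_index, parse_glove_lines(toy_glove), dim=3)
for row in matrix:
    print(row)
```

Here row 0 (padding) and row 3 ('movi', which has no entry in the toy vectors) stay all-zero. A stemmed token that does not match any GloVe entry would likewise keep a zero row unless the layer is trainable.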

## New (not-trained) word embedding
A new embedding layer is created with no pre-trained weights, so it must always be trainable.

In [9]:
def new_embedding(word_index, max_word_length, trainable_for_embedding):
    # This is a new embedding layer, so trainable must be True regardless of the value of trainable_for_embedding
    return Embedding(len(word_index) + 1, EMBEDDING_DIM, input_length=max_word_length, trainable=True)

# Building a Deep Learning Model

To build the deep learning model, we use the Sequential model of Keras.

First comes the Embedding layer. There are two options for it: using the pre-trained word embedding or training a new embedding from scratch.

Second, a series of 1D convolution and pooling layers is added, following the typical CNN design for text analysis.

To study the effect of the number of convolution layers, the function below lets us configure the number of additional convolution layers.

Then, after a Flatten layer, fully connected Dense layers are added.
Since this is a binary classification problem, we use the sigmoid function as the activation of the final Dense layer. To predict the score of a review instead, which is a multi-class problem, a 'softmax' activation would be the better choice.
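
As a small numeric illustration of the sigmoid output and the binary cross-entropy loss used for it, here is a pure-Python sketch. The function names and the 0.5 decision threshold are assumptions for illustration; Keras computes the same quantities on whole batches.

```python
import math

def sigmoid(z):
    """Squash a real-valued score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(y_true, p, eps=1e-7):
    """Loss for one sample; p is the predicted probability of the positive class."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

def predict_label(p, threshold=0.5):
    """Turn a probability into a 0/1 label (threshold is an assumption)."""
    return 1 if p >= threshold else 0

p = sigmoid(2.0)
print(round(p, 4), predict_label(p), round(binary_crossentropy(1, p), 4))
```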

In [10]:
def build_model(word_index, max_word_length, number_of_additional_conv_layers, use_pre_trained_embedding, trainable_for_embedding, number_of_filters, use_dropout):
    # define model
    model = Sequential()
    embedding_func = new_embedding
    if use_pre_trained_embedding: 
        embedding_func = load_pre_trained_embedding
    model.add(embedding_func(word_index, max_word_length, trainable_for_embedding))
    if use_dropout:
        model.add(Dropout(0.5))
    
    for i in range(number_of_additional_conv_layers):
        model.add(Conv1D(filters=number_of_filters, kernel_size=5, activation='relu'))
        model.add(MaxPooling1D(pool_size=4))
        if use_dropout:
            model.add(Dropout(0.5))
    
    model.add(Conv1D(filters=number_of_filters, kernel_size=5, activation='relu'))
    model.add(MaxPooling1D(pool_size=10))
    if use_dropout:
        model.add(Dropout(0.5))

    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.summary()
    # compile network
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
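
The output shapes and parameter counts that `model.summary()` prints can be checked by hand. The helpers below are a sketch under the assumptions of stride-1 convolutions with 'valid' padding and pooling with a stride equal to the pool size (the Keras defaults); the helper names are hypothetical. The summary in the experiment logs further down reports 2466 → 493 after the first pooling layer, which corresponds to a pool size of 5 rather than the `pool_size=4` shown in `build_model`, so the pool size is left as an argument and both can be checked.

```python
def conv1d_out_len(length, kernel_size):
    """Output length of a stride-1 Conv1D with 'valid' padding."""
    return length - kernel_size + 1

def maxpool1d_out_len(length, pool_size):
    """Output length of MaxPooling1D with stride equal to pool_size (length >= pool_size)."""
    return length // pool_size

def conv1d_params(kernel_size, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel_size * in_channels * filters + filters

print(conv1d_out_len(2470, 5))       # 2466
print(conv1d_params(5, 200, 48))     # 48048
print(conv1d_params(5, 48, 48))      # 11568
print(maxpool1d_out_len(2466, 4), maxpool1d_out_len(2466, 5))  # 616 493
```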

# Building Deep Learning Models using Various Parameters

Now, all the functions defined above are combined in the function *build_and_train_model* below.
To compare results under each combination of settings, it takes six parameters, then builds a model and trains it accordingly. The experiment grid defined afterwards contains 24 combinations.

The parameters have the following meanings:
* **use_cleaned_docs**: whether to use the cleaned review documents.
* **number_of_additional_conv_layers**: the number of additional convolution layers. One convolution layer is always used; to add more, set this parameter to 1 or higher.
* **use_pre_trained_embedding**: whether to use the pre-trained embedding. If True, the GloVe embedding described above is used.
* **trainable_for_embedding**: whether the embedding layer is trained on the training data set or frozen. Note that a new embedding layer is always trained, whatever this value is.
* **number_of_filters**: the number of filters in each convolution layer.
* **use_dropout**: whether to add a Dropout layer (rate 0.5) after the embedding layer and after each pooling layer.

In [11]:
MODEL_DIR = './model/'
if not os.path.exists(MODEL_DIR):
    os.mkdir(MODEL_DIR)
    
#modelpath = "./model/{epoch:02d}-{val_loss:.4f}.hdf5"
modelpath = "./model/{epoch:02d}-{val_acc:.4f}.hdf5"
checkpointer = ModelCheckpoint(filepath=modelpath, monitor='val_acc', verbose=1, save_best_only=True)
#early_stopping_callback = EarlyStopping(monitor='val_loss', mode='min', patience=10)
early_stopping_callback = EarlyStopping(monitor='val_acc', mode='max', patience=2)
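
The effect of `patience=2` can be simulated on a list of validation accuracies. The sketch below is a simplified model of the early-stopping rule, not the Keras callback itself; `early_stop_epoch` is a hypothetical helper, and the accuracy values are copied from the first experiment's log further down.

```python
def early_stop_epoch(val_accs, patience):
    """Return the 1-based epoch at which training stops, or None if it never does."""
    best = float('-inf')
    wait = 0
    for epoch, acc in enumerate(val_accs, start=1):
        if acc > best:
            best, wait = acc, 0   # improvement: reset the counter
        else:
            wait += 1             # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

val_accs = [0.6770, 0.8651, 0.8848, 0.8882, 0.8935, 0.8871, 0.8830]
print(early_stop_epoch(val_accs, patience=2))  # 7
```

Note that training stops two epochs after the best one. As far as we can tell from the logs, the accuracy reported right after training is that of the last epoch, while the best epoch is the one saved by the checkpoint callback.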

In [12]:
def build_and_train_model(use_cleaned_docs=False, 
                          number_of_additional_conv_layers=2, 
                          use_pre_trained_embedding=True,
                          trainable_for_embedding=True,
                          number_of_filters=64, 
                          use_dropout=True):
    train_positive_docs = global_train_positive_docs
    train_negative_docs = global_train_negative_docs
    test_positive_docs = global_test_positive_docs
    test_negative_docs = global_test_negative_docs
    
    # clean up documents if required
    if use_cleaned_docs:
        train_positive_docs = cleaned_train_positive_docs
        train_negative_docs = cleaned_train_negative_docs
        test_positive_docs = cleaned_test_positive_docs
        test_negative_docs = cleaned_test_negative_docs
    
    # create the tokenizer
    tokenizer = Tokenizer()
    train_docs = train_positive_docs + train_negative_docs
    # fit the tokenizer on the documents
    tokenizer.fit_on_texts(train_docs)
    #
    print('Fitted tokenizer on {} documents'.format(tokenizer.document_count))
    #print('{} words in dictionary'.format(tokenizer.num_words))
    print('Top 5 most common words are:', Counter(tokenizer.word_counts).most_common(5))
    
    # calculate the maximum number of words in a training doc
    max_word_length = max([len(s.split()) for s in train_docs])
    # get word_index
    word_index = tokenizer.word_index
    #print('Found %s unique tokens.' % len(word_index))

    # encode data into two sequences: x = input, y = output
    x_train, y_train = encode_data_set(tokenizer, train_positive_docs, train_negative_docs, max_word_length)
    x_test, y_test = encode_data_set(tokenizer, test_positive_docs, test_negative_docs, max_word_length)

    # build a model
    model = build_model(word_index, max_word_length, number_of_additional_conv_layers, use_pre_trained_embedding, trainable_for_embedding, number_of_filters, use_dropout)
    # fit network (Training)
    history = model.fit(x_train, y_train, epochs=16, verbose=2, validation_data=(x_test, y_test), batch_size=96, callbacks=[TQDMNotebookCallback(), early_stopping_callback, checkpointer])
    # evaluate
    loss, acc = model.evaluate(x_test, y_test, verbose=0)
    print('Test Accuracy: %.2f%%' % (acc*100))
    return history

In [13]:
exp_conditions = []
for use_cleaned_docs in [False]:
    for number_of_additional_conv_layers in [2, 0]:
        for use_pre_trained_embedding in [True, False]:
            for trainable_for_embedding in [True, False]:
                if (not use_pre_trained_embedding) and (not trainable_for_embedding):
                    continue
                for number_of_filters in [48, 96]:
                    for use_dropout in [True, False]:
                        exp_conditions.append({
                            'use_cleaned_docs': use_cleaned_docs, 
                            'number_of_additional_conv_layers': number_of_additional_conv_layers,
                            'use_pre_trained_embedding': use_pre_trained_embedding,
                            'trainable_for_embedding': trainable_for_embedding,
                            'number_of_filters': number_of_filters,
                            'use_dropout': use_dropout
                        })
     

columns_display = ['use_cleaned_docs','number_of_additional_conv_layers', 'use_pre_trained_embedding', 'trainable_for_embedding','number_of_filters','use_dropout']
exp_cond_df = pd.DataFrame(exp_conditions, columns=columns_display)
exp_cond_df

| | use_cleaned_docs | number_of_additional_conv_layers | use_pre_trained_embedding | trainable_for_embedding | number_of_filters | use_dropout |
|---|---|---|---|---|---|---|
| 0 | False | 2 | True | True | 48 | True |
| 1 | False | 2 | True | True | 48 | False |
| 2 | False | 2 | True | True | 96 | True |
| 3 | False | 2 | True | True | 96 | False |
| 4 | False | 2 | True | False | 48 | True |
| 5 | False | 2 | True | False | 48 | False |
| 6 | False | 2 | True | False | 96 | True |
| 7 | False | 2 | True | False | 96 | False |
| 8 | False | 2 | False | True | 48 | True |
| 9 | False | 2 | False | True | 48 | False |


The grid contains 24 combinations (1 × 2 × 3 × 2 × 2), because the case of a new embedding layer that is not trainable
(`use_pre_trained_embedding` = **False** and `trainable_for_embedding` = **False**)
is not meaningful and is skipped.

In [14]:
print(len(exp_cond_df))

24
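
The same grid can be written more compactly with `itertools.product`. This is only an alternative sketch of the nested loops above; the function name `build_conditions` is hypothetical, and the value lists are copied from the loops.

```python
from itertools import product

KEYS = ['use_cleaned_docs', 'number_of_additional_conv_layers',
        'use_pre_trained_embedding', 'trainable_for_embedding',
        'number_of_filters', 'use_dropout']

def build_conditions():
    """Enumerate all settings, skipping the frozen new-embedding case."""
    grid = product([False], [2, 0], [True, False], [True, False], [48, 96], [True, False])
    conditions = []
    for values in grid:
        cond = dict(zip(KEYS, values))
        if not cond['use_pre_trained_embedding'] and not cond['trainable_for_embedding']:
            continue  # a new embedding layer must be trainable
        conditions.append(cond)
    return conditions

conditions = build_conditions()
print(len(conditions))  # 24
```

`product` varies the rightmost list fastest, so the conditions come out in the same order as in the nested loops.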


In [None]:
exp_result = []
exp_model_config = []

for i, exp_cond in enumerate(exp_conditions):
    K.clear_session()
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    sess = tf.Session(config=config)
    K.set_session(sess)
    display(exp_cond_df[i:i+1])
    history = build_and_train_model(**exp_cond)
    print(history)
    exp_result.append(history)


| | use_cleaned_docs | number_of_additional_conv_layers | use_pre_trained_embedding | trainable_for_embedding | number_of_filters | use_dropout |
|---|---|---|---|---|---|---|
| 0 | False | 2 | True | True | 48 | True |


Fitted tokenizer on 25000 documents
Top 5 most common words are: [('the', 336148), ('and', 164097), ('a', 163040), ('of', 145847), ('to', 135708)]
Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 2470, 200)         17716600  
_________________________________________________________________
dropout_1 (Dropout)          (None, 2470, 200)         0         
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 2466, 48)          48048     
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 493, 48)           0         
_________________________________________________________________
dropou

Epoch 1/16
 - 20s - loss: 0.6915 - acc: 0.5357 - val_loss: 0.6372 - val_acc: 0.6770

Epoch 00001: val_acc improved from -inf to 0.67696, saving model to ./model/01-0.6770.hdf5
Epoch 2/16
 - 16s - loss: 0.4743 - acc: 0.7781 - val_loss: 0.3933 - val_acc: 0.8651

Epoch 00002: val_acc improved from 0.67696 to 0.86508, saving model to ./model/02-0.8651.hdf5
Epoch 3/16
 - 16s - loss: 0.3288 - acc: 0.8643 - val_loss: 0.3507 - val_acc: 0.8848

Epoch 00003: val_acc improved from 0.86508 to 0.88476, saving model to ./model/03-0.8848.hdf5
Epoch 4/16
 - 16s - loss: 0.2650 - acc: 0.8965 - val_loss: 0.3228 - val_acc: 0.8882

Epoch 00004: val_acc improved from 0.88476 to 0.88824, saving model to ./model/04-0.8882.hdf5
Epoch 5/16
 - 16s - loss: 0.2203 - acc: 0.9146 - val_loss: 0.3026 - val_acc: 0.8935

Epoch 00005: val_acc improved from 0.88824 to 0.89352, saving model to ./model/05-0.8935.hdf5
Epoch 6/16
 - 16s - loss: 0.1843 - acc: 0.9293 - val_loss: 0.3143 - val_acc: 0.8871

Epoch 00006: val_acc did not improve from 0.89352
Epoch 7/16
 - 16s - loss: 0.1591 - acc: 0.9407 - val_loss: 0.2999 - val_acc: 0.8830

Epoch 00007: val_acc did not improve from 0.89352

Test Accuracy: 88.30%
<keras.callbacks.History object at 0x7f98aa6b2ef0>


| | use_cleaned_docs | number_of_additional_conv_layers | use_pre_trained_embedding | trainable_for_embedding | number_of_filters | use_dropout |
|---|---|---|---|---|---|---|
| 1 | False | 2 | True | True | 48 | False |


Fitted tokenizer on 25000 documents
Top 5 most common words are: [('the', 336148), ('and', 164097), ('a', 163040), ('of', 145847), ('to', 135708)]
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 2470, 200)         17716600  
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 2466, 48)          48048     
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 493, 48)           0         
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 489, 48)           11568     
_________________________________________________________________
max_pooling1d_2 (MaxPooling1 (None, 97, 48)            0         
_________________________________________________________________
conv1d_3 (Conv1D)            (None, 93, 48)            11568 

Epoch 1/16
 - 13s - loss: 0.4381 - acc: 0.7732 - val_loss: 0.2825 - val_acc: 0.8825

Epoch 00001: val_acc did not improve from 0.89352
Epoch 2/16
 - 13s - loss: 0.2048 - acc: 0.9204 - val_loss: 0.2664 - val_acc: 0.8924

Epoch 00002: val_acc did not improve from 0.89352
Epoch 3/16
 - 13s - loss: 0.1007 - acc: 0.9652 - val_loss: 0.3972 - val_acc: 0.8672

Epoch 00003: val_acc did not improve from 0.89352
Epoch 4/16
 - 13s - loss: 0.0394 - acc: 0.9884 - val_loss: 0.3957 - val_acc: 0.8924

Epoch 00004: val_acc did not improve from 0.89352
Epoch 5/16
 - 13s - loss: 0.0159 - acc: 0.9954 - val_loss: 0.4943 - val_acc: 0.8865

Epoch 00005: val_acc did not improve from 0.89352
Epoch 6/16
 - 13s - loss: 0.0040 - acc: 0.9992 - val_loss: 0.5592 - val_acc: 0.8868

Epoch 00006: val_acc did not improve from 0.89352

Test Accuracy: 88.68%
<keras.callbacks.History object at 0x7f98a9977e80>


| | use_cleaned_docs | number_of_additional_conv_layers | use_pre_trained_embedding | trainable_for_embedding | number_of_filters | use_dropout |
|---|---|---|---|---|---|---|
| 2 | False | 2 | True | True | 96 | True |


Fitted tokenizer on 25000 documents
Top 5 most common words are: [('the', 336148), ('and', 164097), ('a', 163040), ('of', 145847), ('to', 135708)]
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 2470, 200)         17716600  
_________________________________________________________________
dropout_1 (Dropout)          (None, 2470, 200)         0         
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 2466, 96)          96096     
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 493, 96)           0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 493, 96)           0         
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 489, 96)           46176 

Epoch 1/16
 - 22s - loss: 0.6725 - acc: 0.5682 - val_loss: 0.5590 - val_acc: 0.7941

Epoch 00001: val_acc did not improve from 0.89352
Epoch 2/16
 - 21s - loss: 0.3991 - acc: 0.8249 - val_loss: 0.3578 - val_acc: 0.8723

Epoch 00002: val_acc did not improve from 0.89352
Epoch 3/16
 - 21s - loss: 0.2888 - acc: 0.8836 - val_loss: 0.3143 - val_acc: 0.8872

Epoch 00003: val_acc did not improve from 0.89352
Epoch 4/16
 - 21s - loss: 0.2269 - acc: 0.9117 - val_loss: 0.3063 - val_acc: 0.8899

Epoch 00004: val_acc did not improve from 0.89352
Epoch 5/16
 - 21s - loss: 0.1888 - acc: 0.9298 - val_loss: 0.2762 - val_acc: 0.8918

Epoch 00005: val_acc did not improve from 0.89352
Epoch 6/16
 - 21s - loss: 0.1492 - acc: 0.9444 - val_loss: 0.2761 - val_acc: 0.8878

Epoch 00006: val_acc did not improve from 0.89352
Epoch 7/16
 - 21s - loss: 0.1197 - acc: 0.9559 - val_loss: 0.2846 - val_acc: 0.8828

Epoch 00007: val_acc did not improve from 0.89352

Test Accuracy: 88.28%
<keras.callbacks.History object at 0x7f98b11fbc88>


| | use_cleaned_docs | number_of_additional_conv_layers | use_pre_trained_embedding | trainable_for_embedding | number_of_filters | use_dropout |
|---|---|---|---|---|---|---|
| 3 | False | 2 | True | True | 96 | False |


Fitted tokenizer on 25000 documents
Top 5 most common words are: [('the', 336148), ('and', 164097), ('a', 163040), ('of', 145847), ('to', 135708)]
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 2470, 200)         17716600  
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 2466, 96)          96096     
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 493, 96)           0         
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 489, 96)           46176     
_________________________________________________________________
max_pooling1d_2 (MaxPooling1 (None, 97, 96)            0         
_________________________________________________________________
conv1d_3 (Conv1D)            (None, 93, 96)            46176 

HBox(children=(IntProgress(value=0, description='Training', max=16, style=ProgressStyle(description_width='ini…

Epoch 1/16


HBox(children=(IntProgress(value=0, description='Epoch 0', max=25000, style=ProgressStyle(description_width='i…

 - 18s - loss: 0.4407 - acc: 0.7672 - val_loss: 0.3243 - val_acc: 0.8588

Epoch 00001: val_acc did not improve from 0.89352
Epoch 2/16


HBox(children=(IntProgress(value=0, description='Epoch 1', max=25000, style=ProgressStyle(description_width='i…

 - 17s - loss: 0.2009 - acc: 0.9221 - val_loss: 0.2456 - val_acc: 0.9008

Epoch 00002: val_acc improved from 0.89352 to 0.90076, saving model to ./model/02-0.9008.hdf5
Epoch 3/16
 - 17s - loss: 0.0959 - acc: 0.9676 - val_loss: 0.3139 - val_acc: 0.8846

Epoch 00003: val_acc did not improve from 0.90076
Epoch 4/16
 - 17s - loss: 0.0287 - acc: 0.9913 - val_loss: 0.4159 - val_acc: 0.8912

Epoch 00004: val_acc did not improve from 0.90076

Test Accuracy: 89.12%
<keras.callbacks.History object at 0x7f98b13ce4e0>
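
In the run above, the printed test accuracy (89.12%) coincides with the `val_acc` of the last epoch, while the best epoch reached 0.9008. The sketch below shows one way to pull the per-epoch metrics out of log text like this and compare the two. The names `parse_epochs` and `summarize` are hypothetical helpers, not part of the notebook.

```python
import re

# Matches the metric part of a Keras verbose=2 epoch line as it appears in this log.
METRIC_LINE = re.compile(
    r"loss: (?P<loss>[\d.]+) - acc: (?P<acc>[\d.]+) - "
    r"val_loss: (?P<val_loss>[\d.]+) - val_acc: (?P<val_acc>[\d.]+)"
)


def parse_epochs(log_text):
    """Return one dict of float metrics per epoch line found in log_text."""
    return [
        {name: float(value) for name, value in match.groupdict().items()}
        for match in METRIC_LINE.finditer(log_text)
    ]


def summarize(epochs):
    """Report the best epoch (1-based) by val_acc and the last epoch's val_acc."""
    best_index = max(range(len(epochs)), key=lambda i: epochs[i]["val_acc"])
    return {
        "best_epoch": best_index + 1,
        "best_val_acc": epochs[best_index]["val_acc"],
        "last_val_acc": epochs[-1]["val_acc"],
    }


run_log = """
 - 18s - loss: 0.4407 - acc: 0.7672 - val_loss: 0.3243 - val_acc: 0.8588
 - 17s - loss: 0.2009 - acc: 0.9221 - val_loss: 0.2456 - val_acc: 0.9008
 - 17s - loss: 0.0959 - acc: 0.9676 - val_loss: 0.3139 - val_acc: 0.8846
 - 17s - loss: 0.0287 - acc: 0.9913 - val_loss: 0.4159 - val_acc: 0.8912
"""

print(summarize(parse_epochs(run_log)))
# {'best_epoch': 2, 'best_val_acc': 0.9008, 'last_val_acc': 0.8912}
```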


Unnamed: 0,use_cleaned_docs,number_of_additional_conv_layers,use_pre_trained_embedding,trainable_for_embedding,number_of_filters,use_dropout
4,False,2,True,False,48,True


Fitted tokenizer on 25000 documents
Top 5 most common words are: [('the', 336148), ('and', 164097), ('a', 163040), ('of', 145847), ('to', 135708)]
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 2470, 200)         17716600  
_________________________________________________________________
dropout_1 (Dropout)          (None, 2470, 200)         0         
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 2466, 48)          48048     
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 493, 48)           0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 493, 48)           0         
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 489, 48)           11568 

Epoch 1/16
 - 11s - loss: 0.6995 - acc: 0.5121 - val_loss: 0.6912 - val_acc: 0.6064

Epoch 00001: val_acc did not improve from 0.90076
Epoch 2/16
 - 11s - loss: 0.5970 - acc: 0.6712 - val_loss: 0.5170 - val_acc: 0.8091

Epoch 00002: val_acc did not improve from 0.90076
Epoch 3/16
 - 11s - loss: 0.4796 - acc: 0.7774 - val_loss: 0.4513 - val_acc: 0.8341

Epoch 00003: val_acc did not improve from 0.90076
Epoch 4/16
 - 11s - loss: 0.4372 - acc: 0.7997 - val_loss: 0.4476 - val_acc: 0.8300

Epoch 00004: val_acc did not improve from 0.90076
Epoch 5/16
 - 11s - loss: 0.4173 - acc: 0.8123 - val_loss: 0.3858 - val_acc: 0.8516

Epoch 00005: val_acc did not improve from 0.90076
Epoch 6/16
 - 11s - loss: 0.4040 - acc: 0.8204 - val_loss: 0.4067 - val_acc: 0.8592

Epoch 00006: val_acc did not improve from 0.90076
Epoch 7/16
 - 11s - loss: 0.3956 - acc: 0.8246 - val_loss: 0.4248 - val_acc: 0.8561

Epoch 00007: val_acc did not improve from 0.90076
Epoch 8/16
 - 11s - loss: 0.3894 - acc: 0.8268 - val_loss: 0.4034 - val_acc: 0.8610

Epoch 00008: val_acc did not improve from 0.90076
Epoch 9/16
 - 11s - loss: 0.3808 - acc: 0.8336 - val_loss: 0.3890 - val_acc: 0.8690

Epoch 00009: val_acc did not improve from 0.90076
Epoch 10/16
 - 11s - loss: 0.3766 - acc: 0.8362 - val_loss: 0.4332 - val_acc: 0.8296

Epoch 00010: val_acc did not improve from 0.90076
Epoch 11/16
 - 11s - loss: 0.3737 - acc: 0.8367 - val_loss: 0.3902 - val_acc: 0.8586

Epoch 00011: val_acc did not improve from 0.90076

Test Accuracy: 85.86%
<keras.callbacks.History object at 0x7f98263ad5f8>
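
The stopping points in these logs are consistent with early stopping on `val_acc` with a patience of 2: training ends at the second consecutive epoch that fails to beat the best `val_acc` of the current run. The notebook's actual `EarlyStopping` arguments are not shown in this section, so the patience value is an inference. The sketch below re-implements that rule in plain Python (`stopping_epoch` is a hypothetical helper) and replays the run above.

```python
def stopping_epoch(val_accs, patience=2):
    """Return the 1-based epoch at which a patience rule on val_acc would stop.

    Returns None if the rule never triggers within the given epochs.
    patience=2 is an assumption inferred from the logs.
    """
    best = float("-inf")
    wait = 0
    for epoch, value in enumerate(val_accs, start=1):
        if value > best:
            # New best for this run: reset the counter.
            best = value
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# val_acc per epoch for the run with index 4, copied from the log above.
run_4 = [0.6064, 0.8091, 0.8341, 0.8300, 0.8516, 0.8592,
         0.8561, 0.8610, 0.8690, 0.8296, 0.8586]

print(stopping_epoch(run_4))  # 11
```

The same rule applied to `val_loss` would have stopped this run at epoch 7, which suggests that `val_acc` is the monitored quantity.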


Unnamed: 0,use_cleaned_docs,number_of_additional_conv_layers,use_pre_trained_embedding,trainable_for_embedding,number_of_filters,use_dropout
5,False,2,True,False,48,False


Fitted tokenizer on 25000 documents
Top 5 most common words are: [('the', 336148), ('and', 164097), ('a', 163040), ('of', 145847), ('to', 135708)]
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 2470, 200)         17716600  
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 2466, 48)          48048     
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 493, 48)           0         
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 489, 48)           11568     
_________________________________________________________________
max_pooling1d_2 (MaxPooling1 (None, 97, 48)            0         
_________________________________________________________________
conv1d_3 (Conv1D)            (None, 93, 48)            11568 

Epoch 1/16
 - 9s - loss: 0.4921 - acc: 0.7346 - val_loss: 0.3601 - val_acc: 0.8431

Epoch 00001: val_acc did not improve from 0.90076
Epoch 2/16
 - 9s - loss: 0.3204 - acc: 0.8635 - val_loss: 0.3393 - val_acc: 0.8496

Epoch 00002: val_acc did not improve from 0.90076
Epoch 3/16
 - 9s - loss: 0.2712 - acc: 0.8878 - val_loss: 0.2919 - val_acc: 0.8746

Epoch 00003: val_acc did not improve from 0.90076
Epoch 4/16
 - 9s - loss: 0.2241 - acc: 0.9090 - val_loss: 0.3157 - val_acc: 0.8655

Epoch 00004: val_acc did not improve from 0.90076
Epoch 5/16
 - 9s - loss: 0.1761 - acc: 0.9324 - val_loss: 0.3397 - val_acc: 0.8604

Epoch 00005: val_acc did not improve from 0.90076

Test Accuracy: 86.04%
<keras.callbacks.History object at 0x7f97344ee550>


Unnamed: 0,use_cleaned_docs,number_of_additional_conv_layers,use_pre_trained_embedding,trainable_for_embedding,number_of_filters,use_dropout
6,False,2,True,False,96,True


Fitted tokenizer on 25000 documents
Top 5 most common words are: [('the', 336148), ('and', 164097), ('a', 163040), ('of', 145847), ('to', 135708)]
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 2470, 200)         17716600  
_________________________________________________________________
dropout_1 (Dropout)          (None, 2470, 200)         0         
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 2466, 96)          96096     
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 493, 96)           0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 493, 96)           0         
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 489, 96)           46176 

Epoch 1/16
 - 15s - loss: 0.7009 - acc: 0.5149 - val_loss: 0.6832 - val_acc: 0.5532

Epoch 00001: val_acc did not improve from 0.90076
Epoch 2/16
 - 15s - loss: 0.5476 - acc: 0.7220 - val_loss: 0.4513 - val_acc: 0.8271

Epoch 00002: val_acc did not improve from 0.90076
Epoch 3/16
 - 15s - loss: 0.4486 - acc: 0.7939 - val_loss: 0.4102 - val_acc: 0.8407

Epoch 00003: val_acc did not improve from 0.90076
Epoch 4/16
 - 15s - loss: 0.4104 - acc: 0.8150 - val_loss: 0.3802 - val_acc: 0.8570

Epoch 00004: val_acc did not improve from 0.90076
Epoch 5/16
 - 15s - loss: 0.4001 - acc: 0.8235 - val_loss: 0.3762 - val_acc: 0.8577

Epoch 00005: val_acc did not improve from 0.90076
Epoch 6/16
 - 15s - loss: 0.3915 - acc: 0.8274 - val_loss: 0.3851 - val_acc: 0.8627

Epoch 00006: val_acc did not improve from 0.90076
Epoch 7/16
 - 15s - loss: 0.3762 - acc: 0.8337 - val_loss: 0.3647 - val_acc: 0.8588

Epoch 00007: val_acc did not improve from 0.90076
Epoch 8/16
 - 15s - loss: 0.3717 - acc: 0.8368 - val_loss: 0.3478 - val_acc: 0.8643

Epoch 00008: val_acc did not improve from 0.90076
Epoch 9/16
 - 15s - loss: 0.3583 - acc: 0.8438 - val_loss: 0.3429 - val_acc: 0.8704

Epoch 00009: val_acc did not improve from 0.90076
Epoch 10/16
 - 15s - loss: 0.3586 - acc: 0.8445 - val_loss: 0.3549 - val_acc: 0.8736

Epoch 00010: val_acc did not improve from 0.90076
Epoch 11/16
 - 15s - loss: 0.3535 - acc: 0.8468 - val_loss: 0.3100 - val_acc: 0.8777

Epoch 00011: val_acc did not improve from 0.90076
Epoch 12/16
 - 15s - loss: 0.3435 - acc: 0.8499 - val_loss: 0.3355 - val_acc: 0.8671

Epoch 00012: val_acc did not improve from 0.90076
Epoch 13/16
 - 15s - loss: 0.3446 - acc: 0.8501 - val_loss: 0.3108 - val_acc: 0.8783

Epoch 00013: val_acc did not improve from 0.90076
Epoch 14/16
 - 15s - loss: 0.3378 - acc: 0.8545 - val_loss: 0.3215 - val_acc: 0.8791

Epoch 00014: val_acc did not improve from 0.90076
Epoch 15/16
 - 15s - loss: 0.3358 - acc: 0.8567 - val_loss: 0.3042 - val_acc: 0.8808

Epoch 00015: val_acc did not improve from 0.90076
Epoch 16/16
 - 15s - loss: 0.3257 - acc: 0.8602 - val_loss: 0.3155 - val_acc: 0.8728

Epoch 00016: val_acc did not improve from 0.90076

Test Accuracy: 87.28%
<keras.callbacks.History object at 0x7f97e35359e8>


Unnamed: 0,use_cleaned_docs,number_of_additional_conv_layers,use_pre_trained_embedding,trainable_for_embedding,number_of_filters,use_dropout
7,False,2,True,False,96,False


Fitted tokenizer on 25000 documents
Top 5 most common words are: [('the', 336148), ('and', 164097), ('a', 163040), ('of', 145847), ('to', 135708)]
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 2470, 200)         17716600  
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 2466, 96)          96096     
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 493, 96)           0         
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 489, 96)           46176     
_________________________________________________________________
max_pooling1d_2 (MaxPooling1 (None, 97, 96)            0         
_________________________________________________________________
conv1d_3 (Conv1D)            (None, 93, 96)            46176 

Epoch 1/16
 - 13s - loss: 0.4876 - acc: 0.7501 - val_loss: 0.3266 - val_acc: 0.8595

Epoch 00001: val_acc did not improve from 0.90076
Epoch 2/16
 - 13s - loss: 0.3002 - acc: 0.8719 - val_loss: 0.2978 - val_acc: 0.8729

Epoch 00002: val_acc did not improve from 0.90076
Epoch 3/16
 - 13s - loss: 0.2313 - acc: 0.9075 - val_loss: 0.3219 - val_acc: 0.8674

Epoch 00003: val_acc did not improve from 0.90076
Epoch 4/16
 - 13s - loss: 0.1666 - acc: 0.9371 - val_loss: 0.3089 - val_acc: 0.8798

Epoch 00004: val_acc did not improve from 0.90076
Epoch 5/16
 - 13s - loss: 0.1163 - acc: 0.9573 - val_loss: 0.3288 - val_acc: 0.8808

Epoch 00005: val_acc did not improve from 0.90076
Epoch 6/16
 - 13s - loss: 0.0609 - acc: 0.9808 - val_loss: 0.4670 - val_acc: 0.8699

Epoch 00006: val_acc did not improve from 0.90076
Epoch 7/16
 - 13s - loss: 0.0585 - acc: 0.9789 - val_loss: 0.4020 - val_acc: 0.8676

Epoch 00007: val_acc did not improve from 0.90076

Test Accuracy: 86.76%
<keras.callbacks.History object at 0x7f979c553908>


Unnamed: 0,use_cleaned_docs,number_of_additional_conv_layers,use_pre_trained_embedding,trainable_for_embedding,number_of_filters,use_dropout
8,False,2,False,True,48,True


Fitted tokenizer on 25000 documents
Top 5 most common words are: [('the', 336148), ('and', 164097), ('a', 163040), ('of', 145847), ('to', 135708)]
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 2470, 200)         17716600  
_________________________________________________________________
dropout_1 (Dropout)          (None, 2470, 200)         0         
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 2466, 48)          48048     
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 493, 48)           0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 493, 48)           0         
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 489, 48)           11568 

Epoch 1/16
 - 17s - loss: 0.6925 - acc: 0.5142 - val_loss: 0.6741 - val_acc: 0.7074

Epoch 00001: val_acc did not improve from 0.90076
Epoch 2/16
 - 17s - loss: 0.3643 - acc: 0.8440 - val_loss: 0.3061 - val_acc: 0.8768

Epoch 00002: val_acc did not improve from 0.90076
Epoch 3/16
 - 17s - loss: 0.1804 - acc: 0.9342 - val_loss: 0.3063 - val_acc: 0.8671

Epoch 00003: val_acc did not improve from 0.90076
Epoch 4/16
 - 17s - loss: 0.1051 - acc: 0.9626 - val_loss: 0.3277 - val_acc: 0.8718

Epoch 00004: val_acc did not improve from 0.90076

Test Accuracy: 87.18%
<keras.callbacks.History object at 0x7f97947ea8d0>


Unnamed: 0,use_cleaned_docs,number_of_additional_conv_layers,use_pre_trained_embedding,trainable_for_embedding,number_of_filters,use_dropout
9,False,2,False,True,48,False


Fitted tokenizer on 25000 documents
Top 5 most common words are: [('the', 336148), ('and', 164097), ('a', 163040), ('of', 145847), ('to', 135708)]
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 2470, 200)         17716600  
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 2466, 48)          48048     
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 493, 48)           0         
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 489, 48)           11568     
_________________________________________________________________
max_pooling1d_2 (MaxPooling1 (None, 97, 48)            0         
_________________________________________________________________
conv1d_3 (Conv1D)            (None, 93, 48)            11568 

Epoch 1/16
 - 13s - loss: 0.4129 - acc: 0.7760 - val_loss: 0.2745 - val_acc: 0.8834

Epoch 00001: val_acc did not improve from 0.90076
Epoch 2/16
 - 13s - loss: 0.1295 - acc: 0.9545 - val_loss: 0.3371 - val_acc: 0.8644

Epoch 00002: val_acc did not improve from 0.90076
Epoch 3/16
 - 13s - loss: 0.0277 - acc: 0.9921 - val_loss: 0.4477 - val_acc: 0.8697

Epoch 00003: val_acc did not improve from 0.90076

Test Accuracy: 86.97%
<keras.callbacks.History object at 0x7f98b159eba8>


Unnamed: 0,use_cleaned_docs,number_of_additional_conv_layers,use_pre_trained_embedding,trainable_for_embedding,number_of_filters,use_dropout
10,False,2,False,True,96,True


Fitted tokenizer on 25000 documents
Top 5 most common words are: [('the', 336148), ('and', 164097), ('a', 163040), ('of', 145847), ('to', 135708)]
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 2470, 200)         17716600  
_________________________________________________________________
dropout_1 (Dropout)          (None, 2470, 200)         0         
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 2466, 96)          96096     
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 493, 96)           0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 493, 96)           0         
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 489, 96)           46176 

Epoch 1/16
 - 22s - loss: 0.6828 - acc: 0.5302 - val_loss: 0.5427 - val_acc: 0.7465

Epoch 00001: val_acc did not improve from 0.90076
Epoch 2/16
 - 21s - loss: 0.3214 - acc: 0.8653 - val_loss: 0.2910 - val_acc: 0.8892

Epoch 00002: val_acc did not improve from 0.90076
Epoch 3/16
 - 21s - loss: 0.1548 - acc: 0.9426 - val_loss: 0.3386 - val_acc: 0.8607

Epoch 00003: val_acc did not improve from 0.90076
Epoch 4/16
 - 21s - loss: 0.0905 - acc: 0.9680 - val_loss: 0.3708 - val_acc: 0.8606

Epoch 00004: val_acc did not improve from 0.90076

Test Accuracy: 86.06%
<keras.callbacks.History object at 0x7f97583285f8>


Unnamed: 0,use_cleaned_docs,number_of_additional_conv_layers,use_pre_trained_embedding,trainable_for_embedding,number_of_filters,use_dropout
11,False,2,False,True,96,False


Fitted tokenizer on 25000 documents
Top 5 most common words are: [('the', 336148), ('and', 164097), ('a', 163040), ('of', 145847), ('to', 135708)]
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 2470, 200)         17716600  
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 2466, 96)          96096     
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 493, 96)           0         
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 489, 96)           46176     
_________________________________________________________________
max_pooling1d_2 (MaxPooling1 (None, 97, 96)            0         
_________________________________________________________________
conv1d_3 (Conv1D)            (None, 93, 96)            46176 

Epoch 1/16
 - 18s - loss: 0.4487 - acc: 0.7451 - val_loss: 0.2682 - val_acc: 0.8860

Epoch 00001: val_acc did not improve from 0.90076
Epoch 2/16
 - 17s - loss: 0.1438 - acc: 0.9487 - val_loss: 0.2838 - val_acc: 0.8888

Epoch 00002: val_acc did not improve from 0.90076
Epoch 3/16
 - 17s - loss: 0.0348 - acc: 0.9889 - val_loss: 0.4624 - val_acc: 0.8657

Epoch 00003: val_acc did not improve from 0.90076
Epoch 4/16
 - 17s - loss: 0.0082 - acc: 0.9978 - val_loss: 0.6121 - val_acc: 0.8747

Epoch 00004: val_acc did not improve from 0.90076

Test Accuracy: 87.47%
<keras.callbacks.History object at 0x7f98a2e4e7b8>


Unnamed: 0,use_cleaned_docs,number_of_additional_conv_layers,use_pre_trained_embedding,trainable_for_embedding,number_of_filters,use_dropout
12,False,0,True,True,48,True


Fitted tokenizer on 25000 documents
Top 5 most common words are: [('the', 336148), ('and', 164097), ('a', 163040), ('of', 145847), ('to', 135708)]
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 2470, 200)         17716600  
_________________________________________________________________
dropout_1 (Dropout)          (None, 2470, 200)         0         
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 2466, 48)          48048     
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 246, 48)           0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 246, 48)           0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 11808)             0     

Epoch 1/16
 - 16s - loss: 0.5539 - acc: 0.6979 - val_loss: 0.3557 - val_acc: 0.8600

Epoch 00001: val_acc did not improve from 0.90076
Epoch 2/16
 - 16s - loss: 0.3224 - acc: 0.8650 - val_loss: 0.3038 - val_acc: 0.8790

Epoch 00002: val_acc did not improve from 0.90076
Epoch 3/16
 - 16s - loss: 0.2517 - acc: 0.8992 - val_loss: 0.2696 - val_acc: 0.8931

Epoch 00003: val_acc did not improve from 0.90076
Epoch 4/16
 - 16s - loss: 0.2041 - acc: 0.9200 - val_loss: 0.2601 - val_acc: 0.8965

Epoch 00004: val_acc did not improve from 0.90076
Epoch 5/16
 - 16s - loss: 0.1630 - acc: 0.9370 - val_loss: 0.2626 - val_acc: 0.8943

Epoch 00005: val_acc did not improve from 0.90076
Epoch 6/16
 - 16s - loss: 0.1286 - acc: 0.9514 - val_loss: 0.3165 - val_acc: 0.8709

Epoch 00006: val_acc did not improve from 0.90076

Test Accuracy: 87.09%
<keras.callbacks.History object at 0x7f98b10e1c18>


Unnamed: 0,use_cleaned_docs,number_of_additional_conv_layers,use_pre_trained_embedding,trainable_for_embedding,number_of_filters,use_dropout
13,False,0,True,True,48,False


Fitted tokenizer on 25000 documents
Top 5 most common words are: [('the', 336148), ('and', 164097), ('a', 163040), ('of', 145847), ('to', 135708)]
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 2470, 200)         17716600  
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 2466, 48)          48048     
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 246, 48)           0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 11808)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 128)               1511552   
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 129   
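
The shapes and parameter counts in this summary can be checked by hand. The sketch below does the arithmetic, assuming a Conv1D kernel size of 5 with valid padding and stride 1, and a pooling window of 10 with matching stride. Both values are inferred from the logged shapes (2470 to 2466, then 2466 to 246), not read from the notebook code, and the helper names are hypothetical.

```python
def conv1d_out_len(length, kernel_size):
    """Output length of a stride-1 convolution with 'valid' padding."""
    return length - kernel_size + 1


def pool_out_len(length, pool_size):
    """Output length of non-overlapping max pooling (stride equals pool size)."""
    return length // pool_size


def conv1d_params(kernel_size, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel_size * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return inputs * units + units


SEQ_LEN, EMBED_DIM, FILTERS = 2470, 200, 48
KERNEL, POOL = 5, 10  # assumed values, inferred from the logged shapes

conv_len = conv1d_out_len(SEQ_LEN, KERNEL)  # 2466
pool_len = pool_out_len(conv_len, POOL)     # 246
flat = pool_len * FILTERS                   # 11808

print(conv_len, pool_len, flat)
print(conv1d_params(KERNEL, EMBED_DIM, FILTERS))  # 48048
print(dense_params(flat, 128))                    # 1511552
print(dense_params(128, 1))                       # 129
print(17716600 // EMBED_DIM)                      # 88583 embedding rows
```

The same formulas give 96096 parameters for a 96-filter first convolution and 46176 for a 96-to-96 convolution, matching the other summaries in this log.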

Epoch 1/16
 - 13s - loss: 0.4730 - acc: 0.7539 - val_loss: 0.2836 - val_acc: 0.8824

Epoch 00001: val_acc did not improve from 0.90076
Epoch 2/16
 - 12s - loss: 0.2044 - acc: 0.9214 - val_loss: 0.2695 - val_acc: 0.8896

Epoch 00002: val_acc did not improve from 0.90076
Epoch 3/16
 - 12s - loss: 0.0832 - acc: 0.9728 - val_loss: 0.3435 - val_acc: 0.8746

Epoch 00003: val_acc did not improve from 0.90076
Epoch 4/16
 - 12s - loss: 0.0218 - acc: 0.9956 - val_loss: 0.4261 - val_acc: 0.8790

Epoch 00004: val_acc did not improve from 0.90076

Test Accuracy: 87.90%
<keras.callbacks.History object at 0x7f98b16cd7b8>


Unnamed: 0,use_cleaned_docs,number_of_additional_conv_layers,use_pre_trained_embedding,trainable_for_embedding,number_of_filters,use_dropout
14,False,0,True,True,96,True


Fitted tokenizer on 25000 documents
Top 5 most common words are: [('the', 336148), ('and', 164097), ('a', 163040), ('of', 145847), ('to', 135708)]
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 2470, 200)         17716600  
_________________________________________________________________
dropout_1 (Dropout)          (None, 2470, 200)         0         
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 2466, 96)          96096     
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 246, 96)           0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 246, 96)           0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 23616)             0     

Epoch 1/16
 - 20s - loss: 0.7080 - acc: 0.5195 - val_loss: 0.6329 - val_acc: 0.6774

Epoch 00001: val_acc did not improve from 0.90076
Epoch 2/16
 - 19s - loss: 0.3941 - acc: 0.8252 - val_loss: 0.3069 - val_acc: 0.8778

Epoch 00002: val_acc did not improve from 0.90076
Epoch 3/16


 - 19s - loss: 0.2601 - acc: 0.8966 - val_loss: 0.2839 - val_acc: 0.8840

Epoch 00003: val_acc did not improve from 0.90076
Epoch 4/16


 - 20s - loss: 0.1938 - acc: 0.9263 - val_loss: 0.2699 - val_acc: 0.8907

Epoch 00004: val_acc did not improve from 0.90076
Epoch 5/16


 - 19s - loss: 0.1432 - acc: 0.9485 - val_loss: 0.3056 - val_acc: 0.8816

Epoch 00005: val_acc did not improve from 0.90076
Epoch 6/16


 - 19s - loss: 0.1092 - acc: 0.9607 - val_loss: 0.2941 - val_acc: 0.8877

Epoch 00006: val_acc did not improve from 0.90076

Test Accuracy: 88.77%
<keras.callbacks.History object at 0x7f98b16a9e10>


Unnamed: 0,use_cleaned_docs,number_of_additional_conv_layers,use_pre_trained_embedding,trainable_for_embedding,number_of_filters,use_dropout
15,False,0,True,True,96,False


Fitted tokenizer on 25000 documents
Top 5 most common words are: [('the', 336148), ('and', 164097), ('a', 163040), ('of', 145847), ('to', 135708)]
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 2470, 200)         17716600  
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 2466, 96)          96096     
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 246, 96)           0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 23616)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 128)               3022976   
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 129   

Epoch 1/16


 - 16s - loss: 0.4726 - acc: 0.7578 - val_loss: 0.2991 - val_acc: 0.8752

Epoch 00001: val_acc did not improve from 0.90076
Epoch 2/16


 - 16s - loss: 0.2025 - acc: 0.9216 - val_loss: 0.2922 - val_acc: 0.8822

Epoch 00002: val_acc did not improve from 0.90076
Epoch 3/16


 - 16s - loss: 0.0695 - acc: 0.9785 - val_loss: 0.3372 - val_acc: 0.8811

Epoch 00003: val_acc did not improve from 0.90076
Epoch 4/16


 - 16s - loss: 0.0166 - acc: 0.9962 - val_loss: 0.4271 - val_acc: 0.8806

Epoch 00004: val_acc did not improve from 0.90076

Test Accuracy: 88.06%
<keras.callbacks.History object at 0x7f97845d9438>
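
The layer shapes and parameter counts printed in the model summaries above can be cross-checked by hand. The sketch below recomputes them for the 48-filter and 96-filter models. The kernel size of 5 and the pool size of 10 are assumptions inferred from the printed shapes (2470 to 2466 to 246), not values confirmed by code shown in this section, and the helper names are hypothetical.

```python
# Cross-check the shapes and parameter counts printed by model.summary().
# KERNEL_SIZE and POOL_SIZE are assumptions inferred from the output shapes.
SEQ_LEN = 2470      # padded review length (embedding output is 2470 x 200)
EMBED_DIM = 200     # embedding vector size
KERNEL_SIZE = 5     # assumed: 2470 - 5 + 1 = 2466
POOL_SIZE = 10      # assumed: floor(2466 / 10) = 246
DENSE_UNITS = 128   # units in dense_1


def conv1d_params(kernel_size, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel_size * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return inputs * units + units


def layer_report(filters):
    """Return output lengths and parameter counts for a given filter count."""
    conv_len = SEQ_LEN - KERNEL_SIZE + 1   # 'valid' padding, stride 1
    pool_len = conv_len // POOL_SIZE       # non-overlapping pooling windows
    flat = pool_len * filters
    return {
        "conv_len": conv_len,
        "pool_len": pool_len,
        "flatten": flat,
        "conv1d_params": conv1d_params(KERNEL_SIZE, EMBED_DIM, filters),
        "dense_1_params": dense_params(flat, DENSE_UNITS),
        "dense_2_params": dense_params(DENSE_UNITS, 1),
    }


for n_filters in (48, 96):
    print(n_filters, layer_report(n_filters))

# The embedding layer has 17,716,600 parameters, i.e. 17716600 / 200 rows.
print("embedding rows:", 17716600 // EMBED_DIM)
```

Under these assumptions the computed values agree with the summaries: 96,096 and 3,022,976 parameters for `conv1d_1` and `dense_1` with 96 filters, and 48,048 and 1,511,552 with 48 filters.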


Unnamed: 0,use_cleaned_docs,number_of_additional_conv_layers,use_pre_trained_embedding,trainable_for_embedding,number_of_filters,use_dropout
16,False,0,True,False,48,True


Fitted tokenizer on 25000 documents
Top 5 most common words are: [('the', 336148), ('and', 164097), ('a', 163040), ('of', 145847), ('to', 135708)]
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 2470, 200)         17716600  
_________________________________________________________________
dropout_1 (Dropout)          (None, 2470, 200)         0         
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 2466, 48)          48048     
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 246, 48)           0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 246, 48)           0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 11808)             0     

Epoch 1/16


 - 10s - loss: 0.6296 - acc: 0.6316 - val_loss: 0.4535 - val_acc: 0.8091

Epoch 00001: val_acc did not improve from 0.90076
Epoch 2/16


 - 10s - loss: 0.4459 - acc: 0.7928 - val_loss: 0.3897 - val_acc: 0.8362

Epoch 00002: val_acc did not improve from 0.90076
Epoch 3/16


 - 10s - loss: 0.4077 - acc: 0.8150 - val_loss: 0.3572 - val_acc: 0.8544

Epoch 00003: val_acc did not improve from 0.90076
Epoch 4/16


 - 10s - loss: 0.3897 - acc: 0.8259 - val_loss: 0.3387 - val_acc: 0.8597

Epoch 00004: val_acc did not improve from 0.90076
Epoch 5/16


 - 10s - loss: 0.3772 - acc: 0.8328 - val_loss: 0.3344 - val_acc: 0.8564

Epoch 00005: val_acc did not improve from 0.90076
Epoch 6/16


 - 10s - loss: 0.3645 - acc: 0.8398 - val_loss: 0.3250 - val_acc: 0.8636

Epoch 00006: val_acc did not improve from 0.90076
Epoch 7/16


 - 10s - loss: 0.3593 - acc: 0.8417 - val_loss: 0.3175 - val_acc: 0.8671

Epoch 00007: val_acc did not improve from 0.90076
Epoch 8/16


 - 10s - loss: 0.3547 - acc: 0.8437 - val_loss: 0.3147 - val_acc: 0.8693

Epoch 00008: val_acc did not improve from 0.90076
Epoch 9/16


 - 10s - loss: 0.3433 - acc: 0.8487 - val_loss: 0.3141 - val_acc: 0.8711

Epoch 00009: val_acc did not improve from 0.90076
Epoch 10/16


 - 10s - loss: 0.3420 - acc: 0.8500 - val_loss: 0.3143 - val_acc: 0.8693

Epoch 00010: val_acc did not improve from 0.90076
Epoch 11/16


 - 10s - loss: 0.3361 - acc: 0.8525 - val_loss: 0.3088 - val_acc: 0.8704

Epoch 00011: val_acc did not improve from 0.90076

Test Accuracy: 87.04%
<keras.callbacks.History object at 0x7f98b12f2048>


Unnamed: 0,use_cleaned_docs,number_of_additional_conv_layers,use_pre_trained_embedding,trainable_for_embedding,number_of_filters,use_dropout
17,False,0,True,False,48,False


Fitted tokenizer on 25000 documents
Top 5 most common words are: [('the', 336148), ('and', 164097), ('a', 163040), ('of', 145847), ('to', 135708)]
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 2470, 200)         17716600  
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 2466, 48)          48048     
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 246, 48)           0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 11808)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 128)               1511552   
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 129   

Epoch 1/16


 - 8s - loss: 0.5704 - acc: 0.6750 - val_loss: 0.3943 - val_acc: 0.8249

Epoch 00001: val_acc did not improve from 0.90076
Epoch 2/16


 - 8s - loss: 0.3290 - acc: 0.8594 - val_loss: 0.3355 - val_acc: 0.8522

Epoch 00002: val_acc did not improve from 0.90076
Epoch 3/16


 - 8s - loss: 0.2536 - acc: 0.8982 - val_loss: 0.3270 - val_acc: 0.8592

Epoch 00003: val_acc did not improve from 0.90076
Epoch 4/16


 - 8s - loss: 0.1746 - acc: 0.9340 - val_loss: 0.3494 - val_acc: 0.8587

Epoch 00004: val_acc did not improve from 0.90076
Epoch 5/16


 - 8s - loss: 0.1073 - acc: 0.9627 - val_loss: 0.4468 - val_acc: 0.8400

Epoch 00005: val_acc did not improve from 0.90076

Test Accuracy: 84.00%
<keras.callbacks.History object at 0x7f97e3558358>


Unnamed: 0,use_cleaned_docs,number_of_additional_conv_layers,use_pre_trained_embedding,trainable_for_embedding,number_of_filters,use_dropout
18,False,0,True,False,96,True


Fitted tokenizer on 25000 documents
Top 5 most common words are: [('the', 336148), ('and', 164097), ('a', 163040), ('of', 145847), ('to', 135708)]
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 2470, 200)         17716600  
_________________________________________________________________
dropout_1 (Dropout)          (None, 2470, 200)         0         
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 2466, 96)          96096     
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 246, 96)           0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 246, 96)           0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 23616)             0     

Epoch 1/16


 - 13s - loss: 0.7023 - acc: 0.5392 - val_loss: 0.5971 - val_acc: 0.7317

Epoch 00001: val_acc did not improve from 0.90076
Epoch 2/16


 - 13s - loss: 0.4800 - acc: 0.7751 - val_loss: 0.3929 - val_acc: 0.8345

Epoch 00002: val_acc did not improve from 0.90076
Epoch 3/16


 - 13s - loss: 0.4015 - acc: 0.8198 - val_loss: 0.3634 - val_acc: 0.8445

Epoch 00003: val_acc did not improve from 0.90076
Epoch 4/16


 - 13s - loss: 0.3800 - acc: 0.8309 - val_loss: 0.3506 - val_acc: 0.8526

Epoch 00004: val_acc did not improve from 0.90076
Epoch 5/16


 - 13s - loss: 0.3622 - acc: 0.8421 - val_loss: 0.3377 - val_acc: 0.8565

Epoch 00005: val_acc did not improve from 0.90076
Epoch 6/16


 - 13s - loss: 0.3488 - acc: 0.8488 - val_loss: 0.3232 - val_acc: 0.8652

Epoch 00006: val_acc did not improve from 0.90076
Epoch 7/16


 - 13s - loss: 0.3404 - acc: 0.8522 - val_loss: 0.3137 - val_acc: 0.8653

Epoch 00007: val_acc did not improve from 0.90076
Epoch 8/16


 - 13s - loss: 0.3348 - acc: 0.8551 - val_loss: 0.3236 - val_acc: 0.8606

Epoch 00008: val_acc did not improve from 0.90076
Epoch 9/16


 - 13s - loss: 0.3220 - acc: 0.8619 - val_loss: 0.3678 - val_acc: 0.8301

Epoch 00009: val_acc did not improve from 0.90076

Test Accuracy: 83.01%
<keras.callbacks.History object at 0x7f95b44a5278>
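
In the run above, training stops at epoch 9 of 16, and the reported test accuracy (83.01%) equals the `val_acc` of that final epoch rather than the best value seen (0.8653 at epoch 7). The sketch below replays a patience-based stopping rule on the logged `val_acc` values. That the rule monitors `val_acc` with a patience of 2 is an inference from the logs in this section, not something the callback configuration here confirms, and `replay_early_stopping` is a hypothetical helper.

```python
def replay_early_stopping(val_acc, patience):
    """Replay a patience-based stopping rule on a list of per-epoch scores.

    Returns (stop_epoch, best_epoch, best_score), with epochs counted from 1.
    Training stops once `patience` consecutive epochs fail to beat the best.
    """
    best_score = float("-inf")
    best_epoch = 0
    wait = 0
    for epoch, score in enumerate(val_acc, start=1):
        if score > best_score:
            best_score, best_epoch, wait = score, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch, best_score
    # The list ended before the rule triggered.
    return len(val_acc), best_epoch, best_score


# val_acc values copied from the training log of experiment 18.
exp18_val_acc = [0.7317, 0.8345, 0.8445, 0.8526, 0.8565,
                 0.8652, 0.8653, 0.8606, 0.8301]

stop_epoch, best_epoch, best_score = replay_early_stopping(exp18_val_acc, patience=2)
print("stopped at epoch", stop_epoch)
print("best epoch", best_epoch, "with val_acc", best_score)
print("final val_acc", exp18_val_acc[-1])
```

With the assumed patience of 2, the replay stops at the same epoch as the log, which is consistent with (but does not prove) that configuration.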


Unnamed: 0,use_cleaned_docs,number_of_additional_conv_layers,use_pre_trained_embedding,trainable_for_embedding,number_of_filters,use_dropout
19,False,0,True,False,96,False


Fitted tokenizer on 25000 documents
Top 5 most common words are: [('the', 336148), ('and', 164097), ('a', 163040), ('of', 145847), ('to', 135708)]
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 2470, 200)         17716600  
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 2466, 96)          96096     
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 246, 96)           0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 23616)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 128)               3022976   
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 129   

Epoch 1/16


 - 11s - loss: 0.4921 - acc: 0.7552 - val_loss: 0.3835 - val_acc: 0.8292

Epoch 00001: val_acc did not improve from 0.90076
Epoch 2/16


 - 11s - loss: 0.3130 - acc: 0.8684 - val_loss: 0.3492 - val_acc: 0.8476

Epoch 00002: val_acc did not improve from 0.90076
Epoch 3/16


 - 11s - loss: 0.2189 - acc: 0.9160 - val_loss: 0.3388 - val_acc: 0.8576

Epoch 00003: val_acc did not improve from 0.90076
Epoch 4/16


 - 11s - loss: 0.1263 - acc: 0.9548 - val_loss: 0.4241 - val_acc: 0.8453

Epoch 00004: val_acc did not improve from 0.90076
Epoch 5/16


 - 11s - loss: 0.0623 - acc: 0.9798 - val_loss: 0.4952 - val_acc: 0.8444

Epoch 00005: val_acc did not improve from 0.90076

Test Accuracy: 84.44%
<keras.callbacks.History object at 0x7f9750174668>


Unnamed: 0,use_cleaned_docs,number_of_additional_conv_layers,use_pre_trained_embedding,trainable_for_embedding,number_of_filters,use_dropout
20,False,0,False,True,48,True


Fitted tokenizer on 25000 documents
Top 5 most common words are: [('the', 336148), ('and', 164097), ('a', 163040), ('of', 145847), ('to', 135708)]
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 2470, 200)         17716600  
_________________________________________________________________
dropout_1 (Dropout)          (None, 2470, 200)         0         
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 2466, 48)          48048     
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 246, 48)           0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 246, 48)           0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 11808)             0     

Epoch 1/16


 - 16s - loss: 0.5136 - acc: 0.7028 - val_loss: 0.2892 - val_acc: 0.8795

Epoch 00001: val_acc did not improve from 0.90076
Epoch 2/16


 - 15s - loss: 0.2048 - acc: 0.9207 - val_loss: 0.2774 - val_acc: 0.8842

Epoch 00002: val_acc did not improve from 0.90076
Epoch 3/16


 - 15s - loss: 0.0962 - acc: 0.9663 - val_loss: 0.3478 - val_acc: 0.8713

Epoch 00003: val_acc did not improve from 0.90076
Epoch 4/16


 - 15s - loss: 0.0542 - acc: 0.9816 - val_loss: 0.4167 - val_acc: 0.8654

Epoch 00004: val_acc did not improve from 0.90076

Test Accuracy: 86.54%
<keras.callbacks.History object at 0x7f95de0e3e48>


Unnamed: 0,use_cleaned_docs,number_of_additional_conv_layers,use_pre_trained_embedding,trainable_for_embedding,number_of_filters,use_dropout
21,False,0,False,True,48,False


Fitted tokenizer on 25000 documents
Top 5 most common words are: [('the', 336148), ('and', 164097), ('a', 163040), ('of', 145847), ('to', 135708)]
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 2470, 200)         17716600  
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 2466, 48)          48048     
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 246, 48)           0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 11808)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 128)               1511552   
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 129   

Epoch 1/16


 - 13s - loss: 0.4388 - acc: 0.7699 - val_loss: 0.3073 - val_acc: 0.8715

Epoch 00001: val_acc did not improve from 0.90076
Epoch 2/16


 - 12s - loss: 0.1406 - acc: 0.9497 - val_loss: 0.3061 - val_acc: 0.8797

Epoch 00002: val_acc did not improve from 0.90076
Epoch 3/16


 - 12s - loss: 0.0356 - acc: 0.9898 - val_loss: 0.4306 - val_acc: 0.8711

Epoch 00003: val_acc did not improve from 0.90076
Epoch 4/16


 - 12s - loss: 0.0068 - acc: 0.9983 - val_loss: 0.5587 - val_acc: 0.8702

Epoch 00004: val_acc did not improve from 0.90076

Test Accuracy: 87.02%
<keras.callbacks.History object at 0x7f98b1456198>


Unnamed: 0,use_cleaned_docs,number_of_additional_conv_layers,use_pre_trained_embedding,trainable_for_embedding,number_of_filters,use_dropout
22,False,0,False,True,96,True


Fitted tokenizer on 25000 documents
Top 5 most common words are: [('the', 336148), ('and', 164097), ('a', 163040), ('of', 145847), ('to', 135708)]
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 2470, 200)         17716600  
_________________________________________________________________
dropout_1 (Dropout)          (None, 2470, 200)         0         
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 2466, 96)          96096     
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 246, 96)           0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 246, 96)           0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 23616)             0     

Epoch 1/16


 - 20s - loss: 0.5752 - acc: 0.6496 - val_loss: 0.3135 - val_acc: 0.8721

Epoch 00001: val_acc did not improve from 0.90076
Epoch 2/16


 - 19s - loss: 0.2211 - acc: 0.9164 - val_loss: 0.2868 - val_acc: 0.8835

Epoch 00002: val_acc did not improve from 0.90076
Epoch 3/16


In [None]:
%matplotlib notebook

import matplotlib.pyplot as plt

for history in exp_result:
    plt.figure()
    # Plot training and validation accuracy and loss on the same axes
    plt.plot(history.history['acc'])
    plt.plot(history.history['val_acc'])
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])

    plt.title('Model accuracy and loss')
    plt.ylabel('accuracy or loss')
    plt.xlabel('epoch')
    plt.legend(['train_acc', 'test_acc', 'train_loss', 'test_loss'], loc='upper left')
    plt.show()
    
    
    # Plot training & validation loss values
#     plt.figure()
#     plt.plot(history.history['loss'])
#     plt.plot(history.history['val_loss'])
#     plt.title('Model loss')
#     plt.ylabel('loss')
#     plt.xlabel('epoch')
#     plt.legend(['train', 'test'], loc='upper left')
#     plt.show()
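
Alongside the plots, a compact numeric summary of each run can make the experiments easier to compare. The sketch below works on a plain dictionary shaped like `history.history` (keys `acc` and `val_acc`, as used in the plotting cell above). The function name `summarize_history` is hypothetical, and the sample values are copied from the log of experiment 15.

```python
def summarize_history(history_dict):
    """Summarize one run from a dict shaped like Keras' `history.history`."""
    acc = history_dict["acc"]
    val_acc = history_dict["val_acc"]
    # Index of the epoch with the highest validation accuracy (first on ties).
    best_idx = max(range(len(val_acc)), key=val_acc.__getitem__)
    return {
        "epochs_run": len(val_acc),
        "best_epoch": best_idx + 1,          # epochs counted from 1
        "best_val_acc": val_acc[best_idx],
        "final_val_acc": val_acc[-1],
        # Gap between training and validation accuracy at the last epoch,
        # a rough indicator of overfitting.
        "final_gap": round(acc[-1] - val_acc[-1], 4),
    }


# Values copied from the training log of experiment 15.
exp15 = {
    "acc": [0.7578, 0.9216, 0.9785, 0.9962],
    "val_acc": [0.8752, 0.8822, 0.8811, 0.8806],
}

summary = summarize_history(exp15)
for key, value in summary.items():
    print(key, value)
```

In the notebook itself, something like `summarize_history(history.history)` could be called inside the loop over `exp_result`, assuming each element is a Keras `History` object as the plotting code implies.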

In [None]:
# evaluate
# loss, acc = model.evaluate(x_test, y_test, verbose=0)
# print('Test Accuracy: %f' % (acc*100))