# Homework 3
## Sentiment analysis using Neural Networks

Total: 50 Points


In this homework we will perform sentiment analysis using a few simple neural-network-based architectures.
For this problem we use the IMDB Large Movie Review Dataset. The dataset contains 25,000 highly polar movie reviews in each of the train and test sets, each split into 12,500 positive reviews (rating greater than or equal to 7/10) and 12,500 negative reviews (rating less than or equal to 4/10).

See "https://keras.io/" for the Keras documentation. Please use Python 3. A GPU is not required, but it will help improve the training speed for each problem.

Please save the notebook with your cell outputs. You will not be graded if your outputs are not present below the homework cell. Also note that your outputs will be unique, since you will be using the last four digits of your UNI as your random seed (in the third cell). Make sure you submit this IPython file with the saved outputs. The submission format must be 'hw3/hw3.ipynb'. You will not submit any other files. If you do save your model weights, do not submit them. You will, however, make sure your model weights get saved in the 'weights' folder and can be retrieved from there as well.

Please fill your details below.



Name: Sihui Huang

Uni: sh3573

Email: sh3573@columbia.edu


In [1]:
from os import listdir
import random
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Embedding, Dense, Dropout, Reshape, Merge, BatchNormalization, TimeDistributed, Lambda, Activation, LSTM, Flatten, Convolution1D, GRU, MaxPooling1D
from keras.regularizers import l2
from keras.callbacks import Callback, ModelCheckpoint, EarlyStopping
#from keras import initializers
from keras import backend as K
from keras.optimizers import SGD
from keras.optimizers import Adadelta
from keras.utils import np_utils
from keras.preprocessing import sequence
from keras import optimizers
import numpy as np

Using TensorFlow backend.


In [2]:


#we retrieve train and test file names

train_dir = "./aclImdb/train/"
test_dir = "./aclImdb/test/"
tr_review = [re_filename for re_filename in listdir(train_dir)]
te_review = [re_filename for re_filename in listdir(test_dir)]

#we initialize the train and test arrays

tr_X = []
tr_Y = []
te_X = []
te_Y = []

#we arrange the reviews into the train and test arrays 

for review_file in tr_review:
    #only the first line is read (each review file is expected to hold one line);
    #the with-block closes the file after reading
    with open(train_dir+review_file, "r") as f_review:
        str_review = f_review.readline()
    tr_X.append(str_review)
    #file names look like <id>_<rating>.txt; a rating of 7 or more is positive
    y_truth = int(review_file.split('.')[0].split('_')[1])
    if y_truth>=7:
        tr_Y.append(1)
    else:
        tr_Y.append(0)

for review_file in te_review:
    with open(test_dir+review_file, "r") as f_review:
        str_review = f_review.readline()
    te_X.append(str_review)
    y_truth = int(review_file.split('.')[0].split('_')[1])
    if y_truth>=7:
        te_Y.append(1)
    else:
        te_Y.append(0)
        

We will now create the validation set from the train set

Use the last 4 digits of your UNI as the seed value to ensure all answers remain unique.
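The seeded shuffle-and-check idea can be sketched on toy data as follows. This is a minimal variant, not the notebook's exact cell: the helper name `split_validation`, its parameters, and the toy lists are assumptions made for illustration. It shuffles a copy of the data with `SEED`, `SEED + 1`, ... until the validation slice is roughly balanced, and it leaves its inputs untouched, so calling it twice gives the same split sizes.

```python
import random

def split_validation(texts, labels, seed, val_size, low, high):
    """Shuffle copies of the data with increasing seeds until the validation
    slice contains more than `low` and fewer than `high` positive labels."""
    offset = 0
    while True:
        pairs = list(zip(texts, labels))            # fresh copy on every attempt
        random.Random(seed + offset).shuffle(pairs)
        offset += 1
        val, train = pairs[:val_size], pairs[val_size:]
        positives = sum(label for _, label in val)
        if low < positives < high:
            return train, val

# Toy data (hypothetical): 20 "reviews", half of them positive.
toy_X = ["review %d" % i for i in range(20)]
toy_Y = [i % 2 for i in range(20)]

train, val = split_validation(toy_X, toy_Y, seed=3573, val_size=6, low=1, high=5)
print(len(train), len(val))  # 14 6
```

Returning new lists instead of reassigning the inputs keeps the cell safe to re-run.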

In [14]:
#replace the SEED value with the last 4 digits of your UNI
#Uni: sh3573
SEED = 3573
seed_counter = 0
while(1):

    shuffle_combine = list(zip(tr_X, tr_Y))
    random.seed(SEED+seed_counter)
    seed_counter+=1
    random.shuffle(shuffle_combine)

    tr_X, tr_Y = zip(*shuffle_combine)

    val_X = tr_X[:5000]
    val_Y = tr_Y[:5000]

    counter = 0
    for label in val_Y:
        counter+=label

    print (counter)
    print (seed_counter)
    if(counter>2400 and counter <2600):
        tr_X = tr_X[5000:]
        tr_Y = tr_Y[5000:]
        break

2500
1


In [15]:


print("Length of Train review set : " + str(len(tr_X)))
print("Length of Train label set : " + str(len(tr_Y)))
print("Length of Validation review set : " + str(len(val_X)))
print("Length of Validation label set : " + str(len(val_Y)))
print("Length of Test review set : " + str(len(te_X)))
print("Length of Test label set : " + str(len(te_Y)))
print("*****************************************")
print("Some sample Reviews Train sets and their labels")
print(tr_X[0][:150])
print(tr_Y[0])
print(tr_X[1][:150])
print(tr_Y[1])
print(tr_X[2][:150])
print(tr_Y[2])
print(tr_X[3][:150])
print(tr_Y[3])
print(tr_X[4][:150])
print(tr_Y[4])

Length of Train review set : 15000
Length of Train label set : 15000
Length of Validation review set : 5000
Length of Validation label set : 5000
Length of Test review set : 25000
Length of Test label set : 25000
*****************************************
Some sample Reviews Train sets and their labels
The action in this movie beats Sunny bhai in Gadar. Akshay Kumar possess the superpowers of Leonidus in 300, Neo in Matrix along with Spiderman and Su
0
The first hour or so of the movie was mostly boring to say the least. However it improved afterwards as the Valentine Party commenced. Apart from the 
1
But this movie was a bore. The history part was fine but the musical part was not. Not one song I cared about and no soundtrack to be heard.<br /><br 
0
Pier Paolo Pasolini, or Pee-pee-pee as I prefer to call him (due to his love of showing male genitals), is perhaps THE most overrated European Marxist
0
I must tell you right up front, I am certainly NOT an authority on Bollywood films an

In [16]:
#we collect all the reviews from the train, validation and test sets to build the tokenizer vocabulary
texts = []
texts += tr_X 
texts += te_X 
texts += val_X
len(texts)



#we clip each review to at most 250 tokens. 
MAX_SEQUENCE_LENGTH = 250

#length of vocab, Tokenizer will only use vocab_len most common words
vocab_len = 25000

#we tokenize the texts and convert all the words to tokens
tokenizer = Tokenizer(num_words=vocab_len)
tokenizer.fit_on_texts(texts)

token_tr_X = tokenizer.texts_to_sequences(tr_X)
token_te_X = tokenizer.texts_to_sequences(te_X)
token_val_X = tokenizer.texts_to_sequences(val_X)

#to ensure all reviews have the same length, we pad the shorter reviews with 0 
#and truncate the longer reviews to the max length 
#(we truncate from the start, since the end of a review generally holds a conclusion, which provides better features)
x_train = sequence.pad_sequences(token_tr_X, maxlen=MAX_SEQUENCE_LENGTH)
x_test = sequence.pad_sequences(token_te_X, maxlen=MAX_SEQUENCE_LENGTH)
x_val = sequence.pad_sequences(token_val_X, maxlen=MAX_SEQUENCE_LENGTH)


#changes the labels to one-hot encoding
y_train = np_utils.to_categorical(tr_Y)
y_test = np_utils.to_categorical(te_Y)
y_val = np_utils.to_categorical(val_Y)


In [17]:
print('X_train shape:', x_train.shape)
print('X_test shape:', x_test.shape)
print('X_val shape:', x_val.shape)

print('y_train shape:', y_train.shape)
print('y_test shape:', y_test.shape)
print('y_val shape:', y_val.shape)


print("*****************************************")
print("Tokenized Reviews Train sets and their labels")
print(x_train[0][:20])
print(y_train[0])
print()
print(x_train[1][:20])
print(y_train[1])
print()
print(x_train[2][:20])
print(y_train[2])
print()
print(x_train[3][:20])
print(y_train[3])
print()
print(x_train[4][:20])
print(y_train[4])
print()

X_train shape: (15000, 250)
X_test shape: (25000, 250)
X_val shape: (5000, 250)
y_train shape: (15000, 2)
y_test shape: (25000, 2)
y_val shape: (5000, 2)
*****************************************
Tokenized Reviews Train sets and their labels
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[ 1.  0.]

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[ 0.  1.]

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[ 1.  0.]

[1161   46  894   22  165  850 9019    8    3 3881   46    6    4  264  161
 3562   72  256   16    2]
[ 1.  0.]

[23082    60    10   111    58    25  2471    58    25    74   636    28
     6    34   623    28  4705     8     3   547]
[ 1.  0.]



********************************************

As you can see, the reviews have now been transformed into sequences of indices into the tokenizer vocabulary, and the labels have been converted to one-hot encoding. We can now go ahead and feed these sequences to neural network models.
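The leading zeros in the tokenized samples come from left-padding. The sketch below mimics, in plain Python, what `pad_sequences` does with its default settings (pad at the front, truncate from the front) and what one-hot encoding does to a binary label. The helper names `pad_pre` and `one_hot` are made up for this illustration and are not Keras functions.

```python
def pad_pre(seq, maxlen, value=0):
    """Keep only the last `maxlen` tokens, then left-pad with `value`."""
    seq = list(seq)[-maxlen:]
    return [value] * (maxlen - len(seq)) + seq

def one_hot(label, num_classes=2):
    """Return a vector with 1.0 at position `label` and 0.0 elsewhere."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

print(pad_pre([5, 8, 2], maxlen=5))              # [0, 0, 5, 8, 2]
print(pad_pre([1, 2, 3, 4, 5, 6, 7], maxlen=5))  # [3, 4, 5, 6, 7]
print(one_hot(0), one_hot(1))                    # [1.0, 0.0] [0.0, 1.0]
```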

********************************************

# Part A

Building your first model (5 Points)

Construct this sequential model using Keras :

![title](img/model1.jpg)

In [7]:
print('Build model...')

## implement model here

model = Sequential()
# Embedding: Dimension=128
model.add(Embedding(input_dim=vocab_len, output_dim=128, input_length=MAX_SEQUENCE_LENGTH))
# Flatten Embedding
model.add(Flatten())
# Dense: Dimension = 200
model.add(Dense(200))
# Activation: ReLU
model.add(Activation('relu'))
# Dense: Dimension = 2
model.add(Dense(2))
# Activation: Softmax
model.add(Activation('softmax'))
## compile it here according to instructions
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

model.summary()

print("Model Built")

Build model...
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 250, 128)          3200000   
_________________________________________________________________
flatten_1 (Flatten)          (None, 32000)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 200)               6400200   
_________________________________________________________________
activation_1 (Activation)    (None, 200)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 2)                 402       
_________________________________________________________________
activation_2 (Activation)    (None, 2)                 0         
Total params: 9,600,602
Trainable params: 9,600,602
Non-trainable params: 0
___________________________________________________
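The parameter counts in the summary above can be reproduced by hand. The sketch below assumes the usual counting rules (an embedding table has one row per vocabulary entry, and a dense layer has one weight per input-output pair plus one bias per output); the helper names are illustrative only.

```python
def embedding_params(vocab_size, dim):
    # one `dim`-sized vector per vocabulary index
    return vocab_size * dim

def dense_params(n_in, n_out):
    # weight matrix plus one bias per output unit
    return n_in * n_out + n_out

vocab_len, seq_len, emb_dim = 25000, 250, 128
counts = [
    embedding_params(vocab_len, emb_dim),   # 3,200,000
    dense_params(seq_len * emb_dim, 200),   # 6,400,200 (Flatten gives 250 * 128 = 32000 inputs)
    dense_params(200, 2),                   # 402
]
print(counts, sum(counts))  # [3200000, 6400200, 402] 9600602
```

Most of the weights sit in the first dense layer, because flattening turns the 250 x 128 embedding output into 32,000 inputs.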

In [8]:
from datetime import datetime
start_time = datetime.now()

print('Train...')
model.fit(x_train, y_train,
          batch_size=32,
          epochs=4,
          validation_data=(x_val, y_val),
          verbose = 1,
         shuffle = True)

end_time = datetime.now()
print('Time Used: {}'.format(end_time - start_time))

Train...
Train on 20000 samples, validate on 5000 samples
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4
Time Used: 0:03:55.007180


# Part B

Stacking Fully Connected Layers (5 points)

Construct this sequential model using Keras :

![title](img/model2.jpg)

In [9]:
print('Build model...')

## implement model here

model = Sequential()
# Embedding: Dimension=128
model.add(Embedding(input_dim=vocab_len, output_dim=128, input_length=MAX_SEQUENCE_LENGTH))
# Flatten Embedding
model.add(Flatten())
# Dense: Dimension = 200
model.add(Dense(200))
# Activation: ReLU
model.add(Activation('relu'))
# Dense: Dimension = 200
model.add(Dense(200))
# Activation: ReLU
model.add(Activation('relu'))
# Dense: Dimension = 2
model.add(Dense(2))
# Activation: Softmax
model.add(Activation('softmax'))
## compile it here according to instructions
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

model.summary()

print("Model Built")

Build model...
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, 250, 128)          3200000   
_________________________________________________________________
flatten_2 (Flatten)          (None, 32000)             0         
_________________________________________________________________
dense_3 (Dense)              (None, 200)               6400200   
_________________________________________________________________
activation_3 (Activation)    (None, 200)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 200)               40200     
_________________________________________________________________
activation_4 (Activation)    (None, 200)               0         
_________________________________________________________________
dense_5 (Dense)              (None, 2)                 402   

In [10]:
start_time = datetime.now()

print('Train...')
model.fit(x_train, y_train,
          batch_size=32,
          epochs=4,
          validation_data=(x_val, y_val),
          verbose = 1,
         shuffle = True)

end_time = datetime.now()
print('Time Used: {}'.format(end_time - start_time))

Train...
Train on 20000 samples, validate on 5000 samples
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4
Time Used: 0:03:56.617889


# Part C

Using LSTM-based networks (5 Points)

Construct this sequential model using Keras :

![title](img/model3.jpg)

In [11]:
print('Build model...')

## implement model here

model = Sequential()
# Embedding: Dimension=128
model.add(Embedding(input_dim=vocab_len, output_dim=128, input_length=MAX_SEQUENCE_LENGTH))
# LSTM: Dimension=128
model.add(LSTM(units=128))
# Dense: Dimension = 128
model.add(Dense(128))
# Activation: ReLU
model.add(Activation('relu'))
# Dense: Dimension = 2
model.add(Dense(2))
# Activation: Softmax
model.add(Activation('softmax'))
## compile it here according to instructions
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

model.summary()

print("Model Built")

Build model...
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_3 (Embedding)      (None, 250, 128)          3200000   
_________________________________________________________________
lstm_1 (LSTM)                (None, 128)               131584    
_________________________________________________________________
dense_6 (Dense)              (None, 128)               16512     
_________________________________________________________________
activation_6 (Activation)    (None, 128)               0         
_________________________________________________________________
dense_7 (Dense)              (None, 2)                 258       
_________________________________________________________________
activation_7 (Activation)    (None, 2)                 0         
Total params: 3,348,354
Trainable params: 3,348,354
Non-trainable params: 0
___________________________________________________
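The LSTM parameter count in the summary follows from the standard LSTM layout: four gates, each with input weights, recurrent weights, and a bias. The helper below is an illustrative sketch under that assumption, not part of Keras.

```python
def lstm_params(input_dim, units):
    # 4 gates x (input weights + recurrent weights + bias)
    return 4 * (units * (input_dim + units) + units)

print(lstm_params(128, 128))  # 131584, the lstm_1 row above
print(lstm_params(300, 128))  # 219648, the same layer fed 300-dimensional embeddings
```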

In [12]:
start_time = datetime.now()

print('Train...')
model.fit(x_train, y_train,
          batch_size=32,
          epochs=5,
          validation_data=(x_val, y_val),
          verbose = 1,
         shuffle = True)

end_time = datetime.now()
print('Time Used: {}'.format(end_time - start_time))

Train...
Train on 20000 samples, validate on 5000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Time Used: 0:21:04.937462


# Part D

Adding Pretrained Word Embeddings (10 Points)

Construct this sequential model using Keras :

Correction: The Embedding Layer Dimension (1st box) is 300, not 128.

![title](img/model4.jpg)

In [7]:
import codecs

#dimension of Glove Embeddings.
EMBEDDING_DIM = 300

word_index = tokenizer.word_index
print('Found %s unique tokens' % len(word_index))

#load glove embeddings
gembeddings_index = {}
with codecs.open('glove.42B.300d.txt', encoding='utf-8') as f:
    for line in f:
        values = line.split(' ')
        word = values[0]
        gembedding = np.asarray(values[1:], dtype='float32')
        gembeddings_index[word] = gembedding
#the with-block above closes the file, so no explicit close() is needed
print('G Word embeddings:', len(gembeddings_index))

# nb_words contains the total length of vocab
nb_words = len(word_index) +1

#get glove embeddings for each word in tokenizer.
#g_word_embedding_matrix holds the embeddings dictionary
g_word_embedding_matrix = np.zeros((nb_words, EMBEDDING_DIM))

for word, i in word_index.items():
    gembedding_vector = gembeddings_index.get(word)
    if gembedding_vector is not None:
        g_word_embedding_matrix[i] = gembedding_vector
        
#total words in the tokenizer not in Embedding matrix
print('G Null word embeddings: %d' % np.sum(np.sum(g_word_embedding_matrix, axis=1) == 0))



Found 124252 unique tokens
G Word embeddings: 1101020
G Null word embeddings: 118498
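How the embedding matrix is filled, and what the "null" count measures, can be seen on a toy vocabulary. The sketch below uses plain lists instead of NumPy, and the word index and vectors are invented for illustration. Row 0 is reserved for padding, and any word without a pretrained vector keeps an all-zero row.

```python
def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the pretrained vector of the word with index i, or zeros."""
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, i in word_index.items():
        vec = vectors.get(word)
        if vec is not None:
            matrix[i] = list(vec)
    return matrix

# Hypothetical toy data: "zzyzx" has no pretrained vector.
toy_word_index = {"the": 1, "movie": 2, "zzyzx": 3}
toy_vectors = {"the": [0.1, 0.2], "movie": [0.3, -0.4]}

matrix = build_embedding_matrix(toy_word_index, toy_vectors, dim=2)
missing = sum(1 for row in matrix if sum(row) == 0)
print(missing)  # 2: the padding row and "zzyzx"
```

Because the check tests whether a row sums to zero, it counts the padding row as well, and it would also count a real vector whose components happen to cancel out.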


In [14]:

print('Build model...')

## implement model here

model = Sequential()

## to use the GloVe embeddings, your embedding layer takes the vocab size as the input dimension, 
## the GloVe embedding dimension as the output dimension,
## and the embedding matrix as the 'weights' parameter (!important).

# Embedding: Dimension=300
model.add(Embedding(input_dim=nb_words, output_dim=EMBEDDING_DIM, input_length=MAX_SEQUENCE_LENGTH, weights=[g_word_embedding_matrix]))
# LSTM: Dimension=128
model.add(LSTM(units=128, recurrent_dropout=.2))
# Dropout: .2
model.add(Dropout(.2))
# Dense: Dimension = 128
model.add(Dense(128))
# Activation: ReLU
model.add(Activation('relu'))
# Dense: Dimension = 2
model.add(Dense(2))
# Activation: Softmax
model.add(Activation('softmax'))
## compile it here according to instructions
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

print("Model Built")

Build model...
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_4 (Embedding)      (None, 250, 300)          37275900  
_________________________________________________________________
lstm_2 (LSTM)                (None, 128)               219648    
_________________________________________________________________
dropout_1 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_8 (Dense)              (None, 128)               16512     
_________________________________________________________________
activation_8 (Activation)    (None, 128)               0         
_________________________________________________________________
dense_9 (Dense)              (None, 2)                 258       
_________________________________________________________________
activation_9 (Activation)    (None, 2)                 0     

In [15]:
from datetime import datetime
start_time = datetime.now()

print('Train...')
model.fit(x_train, y_train,
          batch_size=32,
          epochs=5,
          validation_data=(x_val, y_val),
          verbose = 1,
         shuffle = True)

end_time = datetime.now()
print('Time Used: {}'.format(end_time - start_time))

Train...
Train on 20000 samples, validate on 5000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Time Used: 0:38:23.702303


# Don't attempt this

Stacking LSTM layers

Unfortunately this takes very long to train, but be aware that we can stack LSTMs on top of each other like this.
This requires the bottom LSTM to return a sequence instead of a single vector, which becomes the input for the top LSTM.


![title](img/model5.jpg)
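The data flow in a stacked recurrent network can be illustrated without training anything. The sketch below deliberately uses a toy scalar recurrent cell (`h_t = tanh(w * x_t + u * h_{t-1})`), not an LSTM, and all weights are arbitrary values chosen for illustration. In Keras, the corresponding switch on the bottom layer is the `return_sequences=True` argument.

```python
import math

def toy_recurrent(xs, w, u, return_sequences):
    """Toy scalar recurrent cell (NOT an LSTM): h_t = tanh(w*x_t + u*h_{t-1})."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w * x + u * h)
        states.append(h)
    # a stacked layer needs the state at every time step, not only the last one
    return states if return_sequences else states[-1]

inputs = [0.5, -1.0, 0.25, 2.0]
bottom = toy_recurrent(inputs, w=0.8, u=0.3, return_sequences=True)   # one state per step
top = toy_recurrent(bottom, w=1.2, u=-0.5, return_sequences=False)    # single final state
print(len(bottom), top)
```

The bottom layer emits as many values as there are time steps, while the top layer collapses them into one value that a dense layer could consume.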

# Part E

Using Convolutional Networks (10 points)

Construct the model shown below. Use the same loss function and optimizer as before.

Correction: The Embedding Layer Dimension (1st box) is 300, not 128.

![title](img/model6.jpg)

In [16]:
print('Build model...')

## implement model here

model = Sequential()
# Embedding: Dimension=300
model.add(Embedding(input_dim=nb_words, output_dim=EMBEDDING_DIM, input_length=MAX_SEQUENCE_LENGTH, weights=[g_word_embedding_matrix]))
# Convolution 1D: Filters=128, Kernel=3
model.add(Convolution1D(filters=128, kernel_size=3))
# Dropout: .2
model.add(Dropout(.2))
# Activation: ReLU
model.add(Activation('relu'))
# Convolution 1D: Filters=64, Kernel=3
model.add(Convolution1D(filters=64, kernel_size=3))
# Dropout: .2
model.add(Dropout(.2))
# Activation: ReLU
model.add(Activation('relu'))
# Convolution 1D: Filters=32, Kernel=3
model.add(Convolution1D(filters=32, kernel_size=3))
# Dropout: .2
model.add(Dropout(.2))
# Activation: ReLU
model.add(Activation('relu'))
# Flatten
model.add(Flatten())
# Dense: Dimension = 256
model.add(Dense(256))
# Dropout: .2
model.add(Dropout(.2))
# Activation: ReLU
model.add(Activation('relu'))
# Dense: Dimension = 2
model.add(Dense(2))
# Activation: Softmax
model.add(Activation('softmax'))
## compile it here according to instructions
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

model.summary()

print("Model Built")

Build model...
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_5 (Embedding)      (None, 250, 300)          37275900  
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 248, 128)          115328    
_________________________________________________________________
dropout_2 (Dropout)          (None, 248, 128)          0         
_________________________________________________________________
activation_10 (Activation)   (None, 248, 128)          0         
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 246, 64)           24640     
_________________________________________________________________
dropout_3 (Dropout)          (None, 246, 64)           0         
_________________________________________________________________
activation_11 (Activation)   (None, 246, 64)           0     
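The shrinking sequence length (250, 248, 246) and the parameter counts in the summary follow from an unpadded, stride-1 convolution. The sketch below assumes that setting; the helper names are illustrative, and `conv1d_valid` slides a single kernel over a single channel without flipping it.

```python
def conv1d_output_length(input_length, kernel_size):
    # no padding, stride 1: the kernel fits in this many positions
    return input_length - kernel_size + 1

def conv1d_params(in_channels, filters, kernel_size):
    # one kernel_size x in_channels kernel per filter, plus one bias per filter
    return kernel_size * in_channels * filters + filters

def conv1d_valid(signal, kernel):
    """Slide `kernel` over `signal` and take the dot product at each position."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

print(conv1d_output_length(250, 3), conv1d_output_length(248, 3))  # 248 246
print(conv1d_params(300, 128, 3), conv1d_params(128, 64, 3))       # 115328 24640
print(conv1d_valid([1, 2, 3, 4, 5], [1, 0, -1]))                   # [-2, -2, -2]
```

The same kernel weights are reused at every position, which is why the convolution layers need far fewer parameters than the dense layer that follows the Flatten.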

In [17]:
start_time = datetime.now()

print('Train...')
model.fit(x_train, y_train,
          batch_size=32,
          epochs=5,
          validation_data=(x_val, y_val),
          verbose = 1,
         shuffle = True)

end_time =datetime.now()
print("Time used:", end_time-start_time)

Train...
Train on 20000 samples, validate on 5000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Time used: 0:23:27.623486


# Part F

Model constructed : (5 points)

Test Accuracy Over 87.5%: (5 Points)

Bonus: Min(10, Square of (test_score - 88%))

Create your best model. Use the validation score to choose your best model, then check its accuracy on the test set.


In [9]:
print('Build model...')

## implement model here

model = Sequential()
# Embedding: Dimension=300
model.add(Embedding(input_dim=nb_words, output_dim=EMBEDDING_DIM, input_length=MAX_SEQUENCE_LENGTH, weights=[g_word_embedding_matrix]))

# Convolution 1D: Filters=128, Kernel=5
model.add(Convolution1D(filters=128, kernel_size=5))
# Dropout: .4
model.add(Dropout(.4))
# Activation: ReLU
model.add(Activation('relu'))
# Convolution 1D: Filters=64, Kernel=5
model.add(Convolution1D(filters=64, kernel_size=5))
# Dropout: .4
model.add(Dropout(.4))
# Activation: ReLU
model.add(Activation('relu'))

#model.add(MaxPooling1D(pool_size=2))
# LSTM: Dimension=64, recurrent dropout .4
model.add(LSTM(64, recurrent_dropout=.4))
# Dropout: .4
model.add(Dropout(.4))
#model.add(Dense(1, activation='sigmoid'))
# Flatten (not needed here: the LSTM already returns a single vector)
#model.add(Flatten())
# Dense: Dimension = 256
model.add(Dense(256))
# Dropout: .2
model.add(Dropout(.2))
# Activation: ReLU
model.add(Activation('relu'))
# Dense: Dimension = 2
model.add(Dense(2))
# Activation: Softmax
model.add(Activation('softmax'))
## compile it here according to instructions
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

model.summary()

print("Model Built")


Build model...
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, 250, 300)          37275900  
_________________________________________________________________
conv1d_3 (Conv1D)            (None, 246, 128)          192128    
_________________________________________________________________
dropout_4 (Dropout)          (None, 246, 128)          0         
_________________________________________________________________
activation_4 (Activation)    (None, 246, 128)          0         
_________________________________________________________________
conv1d_4 (Conv1D)            (None, 242, 64)           41024     
_________________________________________________________________
dropout_5 (Dropout)          (None, 242, 64)           0         
_________________________________________________________________
activation_5 (Activation)    (None, 242, 64)           0     

In [10]:
from datetime import datetime
start_time = datetime.now()

print('Train...')
model.fit(x_train, y_train,
          batch_size=32,
          epochs=4,
          validation_data=(x_val, y_val),
          verbose = 1,
         shuffle = True)

end_time = datetime.now()
print('Time Used: {}'.format(end_time - start_time))

Train...
Train on 20000 samples, validate on 5000 samples
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4
Time Used: 0:32:31.598721


You can keep saving models with different names in model_name,

so you can retrieve their weights again for testing without having to retrain
(you would have to initialize the model definition again).

If you plan on using Ensemble averaging, feel free to edit the code below or add multiple models.

Make sure they get saved and can be retrieved when executing serially.
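Ensemble averaging can be sketched independently of any particular model: average the class probabilities that each model predicts for a sample, then pick the most probable class. The probabilities below are invented for illustration; in the notebook they would come from each trained model's predictions on the same inputs.

```python
def ensemble_average(prob_sets):
    """Average per-class probabilities across models, sample by sample.

    prob_sets[m][s][c] is model m's probability of class c for sample s.
    """
    n_models = len(prob_sets)
    return [
        [sum(class_probs) / n_models for class_probs in zip(*sample)]
        for sample in zip(*prob_sets)
    ]

# Hypothetical softmax outputs of three models on two samples.
model_a = [[0.9, 0.1], [0.4, 0.6]]
model_b = [[0.6, 0.4], [0.2, 0.8]]
model_c = [[0.3, 0.7], [0.6, 0.4]]

averaged = ensemble_average([model_a, model_b, model_c])
predictions = [row.index(max(row)) for row in averaged]
print(predictions)  # [0, 1]
```

Averaging probabilities rather than hard votes lets a confident model outweigh two uncertain ones, as with the first sample here.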

In [11]:
#model.load_weights(bst_model_path)
scores = model.evaluate(x_test, y_test, verbose=2)
print("Accuracy: %.2f%%" % (scores[1]*100))

Accuracy: 86.07%


# Part G

Explain how Dense, LSTM, and Convolution layers work.

Explain how ReLU, Dropout, and Softmax work.

Analyze the architectures you constructed, with the accuracies you achieved and the training time it took. 

What are some insights you gained with these experiments? 

(5 Points)


Dense Layer: a linear operation in which every input is connected to every output by a weight, with a bias added to each output. Generally followed by a non-linear activation function.

LSTM: a special kind of RNN layer for time series and sequence data that is capable of learning long-term dependencies, which means an LSTM is able to connect earlier information to the present task. The cell state is updated through additive interactions, which helps improve gradient flow over long sequences during training.

Convolution Layer: a linear operation that uses a subset of the weights of a dense layer. Nearby inputs are connected to nearby outputs, and the weights of the convolution are shared across all locations. Because of this weight sharing and local connectivity, there are far fewer weights than in a dense layer. Generally followed by a non-linear activation function.

ReLU: ReLU stands for rectified linear unit and is a non-linear operation, commonly applied after each convolution or dense layer. The output of ReLU is max(0, input), so it replaces all negative values with 0. Its purpose is to introduce non-linearity, since most real-world data we want to learn from is non-linear.

Dropout: Dropout is a regularization technique. During training it randomly sets a fraction of a layer's inputs to zero, which keeps the model from relying too heavily on any single unit and so helps prevent overfitting. It is turned off at test time.

Softmax: used in the output layer, and can handle classification with more than two classes. The softmax function squashes the output of each unit to lie between 0 and 1, just like a sigmoid function, but it also normalizes the outputs so that they sum to 1. The output of the softmax function is therefore a categorical probability distribution: it gives the probability of each class.
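The three operations described above are small enough to write out directly. The following is a plain-Python sketch rather than Keras's implementation. The dropout shown is the common "inverted" variant, which rescales the kept values by 1 / (1 - rate) during training so that nothing needs to change at test time; that choice, like the helper names, is an assumption of this example.

```python
import math
import random

def relu(xs):
    # negative values become 0, everything else passes through
    return [max(0.0, x) for x in xs]

def softmax(xs):
    # subtracting the maximum keeps exp() from overflowing; the result is unchanged
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dropout(xs, rate, rng):
    # training-time inverted dropout: zero each value with probability `rate`
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in xs]

print(relu([-2.0, 0.0, 3.5]))  # [0.0, 0.0, 3.5]
probs = softmax([2.0, 0.0])
print(probs, sum(probs))       # two probabilities that sum to 1 (up to rounding)
dropped = dropout([1.0, 2.0, 3.0, 4.0], rate=0.2, rng=random.Random(0))
print(dropped)                 # each entry is either 0.0 or the input divided by 0.8
```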


Analyze the architectures you constructed, with the accuracies you achieved and the training time it took. 
The best model combined Convolution1D and LSTM layers. The layers are: GloVe embedding, Convolution1D(filters=128, kernel_size=5), Dropout(.4), Activation('relu'), Convolution1D(filters=64, kernel_size=5), Dropout(.4), Activation('relu'), LSTM(64, recurrent_dropout=.4), Dropout(.4), Dense(256), Dropout(.2), Activation('relu'), Dense(2), Activation('softmax'), compiled with loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy']. It took about 32.5 minutes to train, and the accuracy on the test data is 86.07%.


Insights:
Adding more layers does not always improve accuracy on the test set and tends to make the model overfit. With Dropout, the model is less likely to overfit. Training takes longer if we increase the kernel size.