# HW8.1 Text classification (sentiment analysis) with deep learning models

In this homework, we will practice building various deep learning model architectures for text classification. We will use the IMDB movie review data for the sentiment classification task. First, we will load the data. Then you will use the model types we've learned so far to perform text classification.

Remember to use GPUs for this one; otherwise training will be slow.

Tips:
- print out the data types and shapes after each step to verify you are doing what you expected to do.
- when you build the model, pass the validation data to `model.fit()` with the argument `validation_data=(X_val, Y_val)`. The test data should be reserved until after you have trained the model; then you can evaluate on X_test and report the test accuracy. **Always report accuracy on Test data for each experiment you do and make the comparison based on Test data.**

## 1. Loading data

In [2]:
import tensorflow as tf
from tensorflow.keras.datasets import imdb
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Embedding
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.layers import Conv1D
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import MaxPooling1D
# fix random seed for reproducibility
tf.random.set_seed(7)


### Preprocessing data

There are a few hyperparameters you have to set and then preprocess the text data accordingly. These hyperparameters are:

1. `vocab_size`: if set to 5000, only the 5000 most frequent words in the vocabulary are kept while processing the text. Any less frequent word is replaced by an out-of-vocabulary token (shown as `<UNK>` when reviews are decoded later in this notebook).
2. `max_review_length`: if set to 500, every review is forced to a length of 500 tokens. Reviews longer than 500 words are truncated to 500 words, and shorter reviews are padded with 0s (a special padding token) to reach 500 tokens. This ensures that all input sequences in a training batch have the same size, which batched input requires.
3. `embedding_vector_length`: the dimensionality of the embedding vector that each word is projected to. For instance, we can set it to 256 or 512.
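
To make the effect of the first two hyperparameters concrete, here is a minimal pure-Python sketch. The helper names `cap_vocab` and `pad_or_truncate` are hypothetical and are not part of Keras. The sketch assumes the Keras defaults: out-of-vocabulary index 2, and padding and truncation both applied at the start of the sequence. In the notebook itself, `imdb.load_data` and `sequence.pad_sequences` do this work.

```python
def cap_vocab(seq, vocab_size, oov_index=2):
    """Replace every word index >= vocab_size with the OOV index."""
    return [tok if tok < vocab_size else oov_index for tok in seq]


def pad_or_truncate(seq, maxlen, pad_value=0):
    """Force a sequence to exactly maxlen tokens.

    Assumes 'pre' padding and 'pre' truncation: zeros are added at the
    front, and over-long sequences keep only their last maxlen tokens.
    """
    if len(seq) >= maxlen:
        return list(seq[-maxlen:])
    return [pad_value] * (maxlen - len(seq)) + list(seq)


review = [1, 14, 7000, 22, 9]                # toy review, not real data
capped = cap_vocab(review, vocab_size=5000)
print(capped)                                # [1, 14, 2, 22, 9]
print(pad_or_truncate(capped, maxlen=8))     # [0, 0, 0, 1, 14, 2, 22, 9]
print(pad_or_truncate(capped, maxlen=3))     # [2, 22, 9]
```

Padding at the front keeps the real words next to the end of the sequence, which is where a recurrent layer produces its final state.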



In [3]:

vocab_size = 5000
max_review_length = 500

# load the dataset but only keep the top n words with the argument num_words
# please read documentation here: https://keras.io/api/datasets/imdb/
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=vocab_size)



Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz


In [4]:
print('Training dataset:', type(X_train))
print('Training dataset label:', type(y_train))
print('Dimension Train:', X_train.shape)
print('Dimension Train label:', y_train.shape)
print('Dimension Test:', X_test.shape)
print('Dimension Test label:', y_test.shape)


Training dataset: <class 'numpy.ndarray'>
Training dataset label: <class 'numpy.ndarray'>
Dimension Train: (25000,)
Dimension Train label: (25000,)
Dimension Test: (25000,)
Dimension Test label: (25000,)


### task 0: review length before and after padding

Print out the lengths of the first 20 reviews below. Then execute the next cell to pad them, and print out the lengths again. What do you see?

In [5]:
# YOUR CODE HERE
for i, review in enumerate(X_train[:20], start=1):
    print(f"Review {i}: Length = {len(review)}")

Review 1: Length = 218
Review 2: Length = 189
Review 3: Length = 141
Review 4: Length = 550
Review 5: Length = 147
Review 6: Length = 43
Review 7: Length = 123
Review 8: Length = 562
Review 9: Length = 233
Review 10: Length = 130
Review 11: Length = 450
Review 12: Length = 99
Review 13: Length = 117
Review 14: Length = 238
Review 15: Length = 109
Review 16: Length = 129
Review 17: Length = 163
Review 18: Length = 752
Review 19: Length = 212
Review 20: Length = 177


In [6]:
for i, review in enumerate(X_test[:20], start=1):
    print(f"Review {i}: Length = {len(review)}")

Review 1: Length = 68
Review 2: Length = 260
Review 3: Length = 603
Review 4: Length = 181
Review 5: Length = 108
Review 6: Length = 132
Review 7: Length = 761
Review 8: Length = 180
Review 9: Length = 134
Review 10: Length = 370
Review 11: Length = 209
Review 12: Length = 248
Review 13: Length = 398
Review 14: Length = 326
Review 15: Length = 131
Review 16: Length = 255
Review 17: Length = 127
Review 18: Length = 184
Review 19: Length = 188
Review 20: Length = 105


In [7]:
# truncate and pad input sequences
X_train = sequence.pad_sequences(X_train, maxlen=max_review_length)
X_test = sequence.pad_sequences(X_test, maxlen=max_review_length)

for i, review in enumerate(X_train[:20], start=1):
    print(f"Train Review {i}: Length = {len(review)}")

for i, review in enumerate(X_test[:20], start=1):
    print(f"Test Review {i}: Length = {len(review)}")

Train Review 1: Length = 500
Train Review 2: Length = 500
Train Review 3: Length = 500
Train Review 4: Length = 500
Train Review 5: Length = 500
Train Review 6: Length = 500
Train Review 7: Length = 500
Train Review 8: Length = 500
Train Review 9: Length = 500
Train Review 10: Length = 500
Train Review 11: Length = 500
Train Review 12: Length = 500
Train Review 13: Length = 500
Train Review 14: Length = 500
Train Review 15: Length = 500
Train Review 16: Length = 500
Train Review 17: Length = 500
Train Review 18: Length = 500
Train Review 19: Length = 500
Train Review 20: Length = 500
Test Review 1: Length = 500
Test Review 2: Length = 500
Test Review 3: Length = 500
Test Review 4: Length = 500
Test Review 5: Length = 500
Test Review 6: Length = 500
Test Review 7: Length = 500
Test Review 8: Length = 500
Test Review 9: Length = 500
Test Review 10: Length = 500
Test Review 11: Length = 500
Test Review 12: Length = 500
Test Review 13: Length = 500
Test Review 14: Length = 500
Test Review 

After truncating and padding, each of the first 20 reviews in both sets has length 500, whereas the original lengths ranged from 43 to 761 tokens. Reviews longer than 500 tokens were truncated to the maximum length of 500, and reviews shorter than 500 tokens were padded with zeros to reach it.

### task 1: split training data and label into train and validation sets, X_train and X_val.

Use 20% of your training data for validation and the remaining 80% for training. Then pad your new training data and validation data the same way as you did for train and test. You can leave the Test data alone and reserve that portion for testing only after training the model.
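
As a rough picture of what an 80/20 split does, here is a minimal pure-Python sketch on toy data. The function `split_train_val` is hypothetical and written only for illustration; a library routine such as scikit-learn's `train_test_split` is the more usual choice in practice.

```python
import random


def split_train_val(features, labels, val_fraction=0.2, seed=25):
    """Shuffle indices once, then carve off val_fraction for validation."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_val = int(len(indices) * val_fraction)
    val_idx, train_idx = indices[:n_val], indices[n_val:]

    def pick(data, idx):
        return [data[i] for i in idx]

    return (pick(features, train_idx), pick(features, val_idx),
            pick(labels, train_idx), pick(labels, val_idx))


# Toy data: 10 "reviews" with alternating labels (not real IMDB data).
toy_x = [[i, i + 1] for i in range(10)]
toy_y = [i % 2 for i in range(10)]
x_tr, x_va, y_tr, y_va = split_train_val(toy_x, toy_y)
print(len(x_tr), len(x_va))   # 8 2
```

Features and labels are selected with the same shuffled indices, so each review keeps its own label after the split.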

In [5]:
# YOUR CODE HERE
from sklearn.model_selection import train_test_split

x_train, x_val, Y_train, Y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=25)

x_train = sequence.pad_sequences(x_train, maxlen=max_review_length)

x_val = sequence.pad_sequences(x_val, maxlen=max_review_length)

In [9]:
for i, review in enumerate(x_train[:20], start=1):
    print(f"New_Train Review {i}: Length = {len(review)}")

for i, review in enumerate(x_val[:20], start=1):
    print(f"Val Review {i}: Length = {len(review)}")

New_Train Review 1: Length = 500
New_Train Review 2: Length = 500
New_Train Review 3: Length = 500
New_Train Review 4: Length = 500
New_Train Review 5: Length = 500
New_Train Review 6: Length = 500
New_Train Review 7: Length = 500
New_Train Review 8: Length = 500
New_Train Review 9: Length = 500
New_Train Review 10: Length = 500
New_Train Review 11: Length = 500
New_Train Review 12: Length = 500
New_Train Review 13: Length = 500
New_Train Review 14: Length = 500
New_Train Review 15: Length = 500
New_Train Review 16: Length = 500
New_Train Review 17: Length = 500
New_Train Review 18: Length = 500
New_Train Review 19: Length = 500
New_Train Review 20: Length = 500
Val Review 1: Length = 500
Val Review 2: Length = 500
Val Review 3: Length = 500
Val Review 4: Length = 500
Val Review 5: Length = 500
Val Review 6: Length = 500
Val Review 7: Length = 500
Val Review 8: Length = 500
Val Review 9: Length = 500
Val Review 10: Length = 500
Val Review 11: Length = 500
Val Review 12: Length = 500
Va

### Understanding the text data format

The text data for deep learning is represented so that each word is first mapped to an integer index, 1, 2, 3, ..., N, assuming the vocabulary has N words. Then we construct a giant embedding matrix in which each word has an entry. At training time, you use the index to retrieve the corresponding word embedding (such as the k-th embedding) from this matrix. Let's inspect the input data representation before it is projected into the embedding space.
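
The lookup described above can be sketched in a few lines of plain Python. All sizes and names here (`toy_vocab_size`, `toy_embedding_dim`, `embed`) are illustrative assumptions; in the notebook, the Keras `Embedding` layer holds the matrix and learns its values during training.

```python
import random

toy_vocab_size = 10      # toy value; the notebook uses 5000
toy_embedding_dim = 4    # toy value; the notebook uses 256 or 512

rng = random.Random(7)
# One row per word index; row 0 belongs to the padding token.
embedding_matrix = [
    [rng.uniform(-0.05, 0.05) for _ in range(toy_embedding_dim)]
    for _ in range(toy_vocab_size)
]


def embed(sequence_of_indices):
    """Look up the embedding row for each word index in the sequence."""
    return [embedding_matrix[idx] for idx in sequence_of_indices]


padded_review = [0, 0, 1, 4, 2]          # two padding tokens, then three words
vectors = embed(padded_review)
print(len(vectors), len(vectors[0]))     # 5 4
```

A padded review of length 500 therefore becomes a 500 x `embedding_vector_length` array, which is the `(None, 500, 256)` shape reported for the embedding layer in the model summaries.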

In [11]:
print(x_train.shape, Y_train.shape, 'train sequences')
print(x_val.shape, Y_val.shape,'val sequences')
print(X_test.shape, y_test.shape, 'test sequences')

(20000, 500) (20000,) train sequences
(5000, 500) (5000,) val sequences
(25000, 500) (25000,) test sequences


In [12]:
# inspect the 24th review:
print(x_train[23])

# inspect the 2nd review:
print(x_train[1])

[   2   19    4    2   14  390   62  434  967   17 1732   94 1573    2
 2775  684   24    2  101    2   33    2  130   42  127   12 4562  125
   19    4 1295   20  812   57 4148  807   21  260  621    2   11   14
  420   45   23    4  481 1406   24    2    4  395    7    2   42 3654
   34   35   23 1682    2   14  173    7    4  390  367   19    4  277
 2424   82   93 3697  361 1604  425 4320   43   40    4  154   58  102
  137  134 2818   26    6  227 3694    5 3701   36   26   93   38 4645
    5   19   35    2   18  247   74  101    2   18    4 1295   22   10
   10 2496  212   22  167 3766 4543    2   16    4  132   11    4 3039
   18   14  320  534 3668   29   69   77    4  167    7  111    7    4
  833 1290    7   32   58 1890   84   40    4    2 1094 1992 1148    2
    2 3414    5 1738    2   29   16   57 3128    8    8  248   17   29
   69  224    6  176    7  157   23  699  201   10   10   12  152  977
   15   29    5  443 2065   69  126  952  295  159   17   13  566  169
  101 

### Q1: Do you see the difference between these two examples in terms of padding? Explain what is happening.

# YOUR ANSWER HERE
Yes, I can see the difference. The 24th review is at least 500 tokens long (if it was longer, it was truncated to 500), so it needs no padding. The second review has fewer tokens than the maximum review length, so zeros were added before its first actual word. This ensures that all sequences have the same length.


In [13]:
# now let's look at these reviews in actual words

INDEX_FROM = 3
word_index = imdb.get_word_index()
word_index = {key:(value+INDEX_FROM) for key,value in word_index.items()}
word_index["<PAD>"] = 0    # the padding token
word_index["<START>"] = 1  # the starting token
word_index["<UNK>"] = 2    # the unknown token
reverse_word_index = {value:key for key, value in word_index.items()}

def decode_review(text):
    return ' '.join([reverse_word_index.get(i, '?') for i in text])

decode_review(X_train[23])

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json


"story is also true to life if only it was developed a little better i felt that the more promising stories in this <UNK> <UNK> were told from the male point of view which is fine but it brings down the emotional <UNK> of these stories because the female characters <UNK> <UNK> and <UNK> <UNK> in particular have all the depth of a half filled bath <UNK> wasn't this film supposed to be about <UNK> different <UNK> now the actors john <UNK> still needs to learn acting while <UNK> <UNK> is <UNK> and endearing as ever <UNK> kapoor gets a role written just for him but sometimes <UNK> the boredom of his character since she didn't get a <UNK> character to portray <UNK> <UNK> uses her charming smile and natural acting style to cover up for it <UNK> <UNK> is fine despite going a <UNK> over the top in a few scenes <UNK> <UNK> has nothing much to do but she does remind us that she's the same girl who surprised us with her <UNK> performance in <UNK> <UNK> tries to make up for that huge mistake calle

## 2. Build models

In this homework you will demonstrate your ability to build various kinds of models for sequence (text) classification. Specifically:

- Using single architectures:
  - CNN 1d layer (https://keras.io/api/layers/convolution_layers/convolution1d/)
  - LSTM (https://keras.io/api/layers/recurrent_layers/lstm/)
  - Bidirectional LSTM (https://keras.io/api/layers/recurrent_layers/bidirectional/) (for this one, you want to do something like `Bidirectional(LSTM(num_units))`)

- Stacking these layers together (Conv-LSTM, Conv-BiLSTM): once you have your Conv1D layers, add one or several LSTM or Bidirectional LSTM layers on top of them.

#### 2.1 Conv1d

Tips: to get started, first build a Sequential model. Then add an `Embedding` layer (https://keras.io/api/layers/core_layers/embedding/) with `input_dim` equal to your `vocab_size` and `output_dim` equal to your `embedding_vector_length`. You should also set the `input_length` argument to your `max_review_length`. Then add a Conv1D layer with multiple filters (maybe 64), followed by a `MaxPooling1D` layer with a pooling factor of 2. Feel free to repeat this structure another 1 to 3 times. Before the Dense layers, you need to `Flatten` the output of the Conv layers. Once the output is flattened, you can optionally add another (nonlinear) Dense layer with some units (such as 128) before the final Dense layer with a sigmoid activation.

Overall the flow is:

Embedding -> (Conv1D->MaxPooling1D) * K times -> Flatten -> (Dense with relu activation) * M times -> output Dense layer with sigmoid activation.

Note that K>=1 but M>=0.
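
To check the shapes you should expect from this flow, the sequence-length and parameter arithmetic can be sketched without building a model. The helper names below are hypothetical. The sketch assumes 'valid' padding, stride 1 for the convolutions, a kernel size of 3, and pooling with stride equal to the pool size.

```python
def conv1d_out_len(length, kernel_size, stride=1):
    """Output length of a 1D convolution with 'valid' padding."""
    return (length - kernel_size) // stride + 1


def maxpool1d_out_len(length, pool_size):
    """Output length of max pooling with stride equal to pool_size."""
    return length // pool_size


def conv1d_params(in_channels, filters, kernel_size):
    """Kernel weights plus one bias per filter."""
    return kernel_size * in_channels * filters + filters


length, channels = 500, 256   # max_review_length, embedding size
for filters in (64, 64, 64):  # K = 3 Conv1D/MaxPooling1D blocks
    n_params = conv1d_params(channels, filters, kernel_size=3)
    length = conv1d_out_len(length, kernel_size=3)
    print("Conv1D       ->", (length, filters), "params:", n_params)
    length = maxpool1d_out_len(length, pool_size=2)
    print("MaxPooling1D ->", (length, filters))
    channels = filters

print("Flatten      ->", length * channels)   # 60 * 64 = 3840
```

With these assumptions the lengths go 500, 498, 249, 247, 123, 121, 60, and the first two convolutions have 49216 and 12352 parameters, which matches the `model.summary()` output for the CNN in this notebook.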

In [14]:
# task 2:CNN
from tensorflow.keras.layers import Flatten

model = Sequential()
model.add(Embedding(5000, 256, input_length=max_review_length))
model.add(Conv1D(64, 3, activation='relu'))
model.add(MaxPooling1D(2))
model.add(Conv1D(64, 3, activation='relu'))
model.add(MaxPooling1D(2))
model.add(Conv1D(64, 3, activation='relu'))
model.add(MaxPooling1D(2))

model.add(Flatten())
model.add(Dense(256))
model.add(Dense(1, activation='sigmoid'))

# see the model status now
model.summary()


Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 500, 256)          1280000   
                                                                 
 conv1d (Conv1D)             (None, 498, 64)           49216     
                                                                 
 max_pooling1d (MaxPooling1  (None, 249, 64)           0         
 D)                                                              
                                                                 
 conv1d_1 (Conv1D)           (None, 247, 64)           12352     
                                                                 
 max_pooling1d_1 (MaxPoolin  (None, 123, 64)           0         
 g1D)                                                            
                                                                 
 conv1d_2 (Conv1D)           (None, 121, 64)           1

In [15]:
model.compile(
    optimizer= 'adam',
    loss= 'binary_crossentropy',
    metrics=['accuracy'],
)

#model summary
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 500, 256)          1280000   
                                                                 
 conv1d (Conv1D)             (None, 498, 64)           49216     
                                                                 
 max_pooling1d (MaxPooling1  (None, 249, 64)           0         
 D)                                                              
                                                                 
 conv1d_1 (Conv1D)           (None, 247, 64)           12352     
                                                                 
 max_pooling1d_1 (MaxPoolin  (None, 123, 64)           0         
 g1D)                                                            
                                                                 
 conv1d_2 (Conv1D)           (None, 121, 64)           1

In [16]:
callback = tf.keras.callbacks.EarlyStopping(
    monitor="loss",
    patience=3,
    mode="auto",
    start_from_epoch=2,
)
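
The callback above watches the training loss (`monitor="loss"`). The sketch below shows the patience logic in plain Python, to illustrate how the choice of monitored quantity changes when training stops. It is a simplification: it ignores `start_from_epoch` and `min_delta`, the function `early_stop_epoch` is hypothetical, and both loss curves are made-up illustrative numbers, not results from this notebook.

```python
def early_stop_epoch(losses, patience=3):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(losses):
        if loss < best:
            best, wait = loss, 0      # improvement: reset the counter
        else:
            wait += 1                 # no improvement this epoch
            if wait >= patience:
                return epoch
    return None


# Made-up curves: training loss keeps falling, validation loss turns upward.
toy_train_loss = [0.60, 0.40, 0.30, 0.20, 0.12, 0.08]
toy_val_loss = [0.55, 0.42, 0.40, 0.43, 0.47, 0.52]

print(early_stop_epoch(toy_train_loss))   # None
print(early_stop_epoch(toy_val_loss))     # 5
```

Because training loss usually keeps decreasing, a callback that monitors it may never trigger. Monitoring `val_loss`, which is the default for Keras `EarlyStopping`, is a common alternative when the goal is to stop before the model overfits.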

In [17]:
his = model.fit(x_train, Y_train, epochs=10, batch_size=128, callbacks=[callback],
validation_data=(x_val, Y_val))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [18]:
#evaluate on test set
evaluation = model.evaluate(X_test, y_test)
for i in range(len(evaluation)):
  print(f'{model.metrics_names[i]} ---> {evaluation[i]}')

loss ---> 0.8265305757522583
accuracy ---> 0.861240029335022


#### 2.2 LSTM and Bi-LSTM

Once you have built the network with CNN, this will be easy. Simply replace the Conv1D layers with an LSTM layer or a Bidirectional LSTM layer (read the documentation linked above). Try using 64 units for the LSTM layer, and try 128 as well.
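
When comparing 64 and 128 units, it helps to know how the layer size grows. The sketch below uses the standard parameter count for an LSTM layer with biases (four gates, each with input weights, recurrent weights, and a bias). The function names are hypothetical, and the formula assumes the default Keras LSTM configuration.

```python
def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)


def bilstm_params(input_dim, units):
    """A bidirectional wrapper holds two independent LSTMs."""
    return 2 * lstm_params(input_dim, units)


print(lstm_params(256, 64))      # 82176
print(lstm_params(512, 128))     # 328192
print(bilstm_params(256, 64))    # 164352
```

These values agree with the parameter counts in the `model.summary()` outputs for the LSTM and Bi-LSTM models in this section.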

In [37]:
# task 3: LSTM and Bi-LSTM
model_lstm = Sequential()
model_lstm.add(Embedding(5000, 256, input_length=max_review_length))
#add lstm layer with 64 units
model_lstm.add(LSTM(64, activation='tanh'))

model_lstm.add(Flatten())
model_lstm.add(Dense(256))
model_lstm.add(Dense(1, activation='sigmoid'))

#model summary
model_lstm.summary()


Model: "sequential_9"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_9 (Embedding)     (None, 500, 256)          1280000   
                                                                 
 lstm_7 (LSTM)               (None, 64)                82176     
                                                                 
 flatten_6 (Flatten)         (None, 64)                0         
                                                                 
 dense_12 (Dense)            (None, 256)               16640     
                                                                 
 dense_13 (Dense)            (None, 1)                 257       
                                                                 
Total params: 1379073 (5.26 MB)
Trainable params: 1379073 (5.26 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


In [38]:
#compile the model
model_lstm.compile(
    optimizer= 'adam',
    loss= 'binary_crossentropy',
    metrics=['accuracy'],
)

#model training
his_lstm = model_lstm.fit(x_train, Y_train, epochs=10, batch_size=128, callbacks=[callback],
validation_data=(x_val, Y_val))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [39]:
#evaluate on test set
evaluation = model_lstm.evaluate(X_test, y_test)
for i in range(len(evaluation)):
  print(f'{model_lstm.metrics_names[i]} ---> {evaluation[i]}')

loss ---> 0.7022024393081665
accuracy ---> 0.8384400010108948


In [8]:
# Test for the error I asked you about after class

model_test = Sequential()
model_test.add(Embedding(5000, 256, input_length=max_review_length))
#add lstm layer with 64 units
model_test.add(LSTM(64, activation='tanh'))
model_test.add(LSTM(128, return_sequences=True))

model_test.add(Flatten())
model_test.add(Dense(256))
model_test.add(Dense(1, activation='sigmoid'))

#model summary
model_test.summary()


ValueError: Input 0 of layer "lstm_5" is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (None, 64)
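
A likely reading of this error is that the first LSTM returns only its final state, with shape `(None, 64)`, while the second LSTM expects a 3D sequence input. The sketch below only does shape bookkeeping in plain Python. `lstm_output_shape` is a hypothetical helper, not a Keras function, and it assumes the usual `(batch, timesteps, features)` input layout.

```python
def lstm_output_shape(input_shape, units, return_sequences=False):
    """Shape bookkeeping only: (batch, timesteps, features) in, LSTM shape out."""
    if len(input_shape) != 3:
        raise ValueError(f"expected ndim=3, found ndim={len(input_shape)}")
    batch, timesteps, _ = input_shape
    return (batch, timesteps, units) if return_sequences else (batch, units)


embedded = (None, 500, 256)   # output shape of the Embedding layer

# The first LSTM keeps the time axis, so a second LSTM can consume it.
first = lstm_output_shape(embedded, 64, return_sequences=True)
second = lstm_output_shape(first, 128)
print(first, second)          # (None, 500, 64) (None, 128)

# Dropping return_sequences on the first layer reproduces the shape problem.
try:
    lstm_output_shape(lstm_output_shape(embedded, 64), 128)
except ValueError as err:
    print("ValueError:", err)  # ValueError: expected ndim=3, found ndim=2
```

Under this reading, setting `return_sequences=True` on the first LSTM rather than on the last one should let the two layers stack.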

In [22]:
# task 3: LSTM and Bi-LSTM
model_lstm2 = Sequential()
model_lstm2.add(Embedding(5000, 512, input_length=max_review_length))
#add lstm layer with 128 units
model_lstm2.add(LSTM(128, activation='tanh'))

model_lstm2.add(Flatten())
model_lstm2.add(Dense(256))
model_lstm2.add(Dense(1, activation='sigmoid'))

#model summary
model_lstm2.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_2 (Embedding)     (None, 500, 512)          2560000   
                                                                 
 lstm_1 (LSTM)               (None, 128)               328192    
                                                                 
 flatten_2 (Flatten)         (None, 128)               0         
                                                                 
 dense_4 (Dense)             (None, 256)               33024     
                                                                 
 dense_5 (Dense)             (None, 1)                 257       
                                                                 
Total params: 2921473 (11.14 MB)
Trainable params: 2921473 (11.14 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


In [23]:
#compile the model
model_lstm2.compile(
    optimizer= 'adam',
    loss= 'binary_crossentropy',
    metrics=['accuracy'],
)

#model training
his_lstm2 = model_lstm2.fit(x_train, Y_train, epochs=10, batch_size=64, callbacks=[callback],
validation_data=(x_val, Y_val))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [24]:
#evaluate on test set
evaluation = model_lstm2.evaluate(X_test, y_test)
for i in range(len(evaluation)):
  print(f'{model_lstm2.metrics_names[i]} ---> {evaluation[i]}')

loss ---> 0.6367759108543396
accuracy ---> 0.8531200289726257


In [28]:
# task 3: LSTM and Bi-LSTM
from tensorflow.keras.layers import Bidirectional

model_bilstm = Sequential()
model_bilstm.add(Embedding(5000, 256, input_length=max_review_length))
#add bilstm layer with 64 units
model_bilstm.add(Bidirectional(LSTM(64)))

model_bilstm.add(Flatten())
model_bilstm.add(Dense(256))
model_bilstm.add(Dense(1, activation='sigmoid'))

#model summary
model_bilstm.summary()

Model: "sequential_6"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_6 (Embedding)     (None, 500, 256)          1280000   
                                                                 
 bidirectional (Bidirection  (None, 128)               164352    
 al)                                                             
                                                                 
 flatten_3 (Flatten)         (None, 128)               0         
                                                                 
 dense_6 (Dense)             (None, 256)               33024     
                                                                 
 dense_7 (Dense)             (None, 1)                 257       
                                                                 
Total params: 1477633 (5.64 MB)
Trainable params: 1477633 (5.64 MB)
Non-trainable params: 0 (0.00 Byte)
________________

In [29]:
#compile the model
model_bilstm.compile(
    optimizer= 'adam',
    loss= 'binary_crossentropy',
    metrics=['accuracy'],
)

#model training
his_bilstm = model_bilstm.fit(x_train, Y_train, epochs=15, batch_size=128, callbacks=[callback],
validation_data=(x_val, Y_val))

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


In [30]:
#evaluate on test set
evaluation = model_bilstm.evaluate(X_test, y_test)
for i in range(len(evaluation)):
  print(f'{model_bilstm.metrics_names[i]} ---> {evaluation[i]}')

loss ---> 0.9016522765159607
accuracy ---> 0.8438799977302551


#### 2.3 Conv-LSTM and Conv-BiLSTM

In this last task, you will stack the Conv1D layers and the LSTM layers together. Try adding an LSTM layer after the Conv1D layers and see if it works. You can try using two Conv1D layers followed by an LSTM layer. Then replace the LSTM layer with a Bi-LSTM layer. Try different combinations and see what the best accuracy is that you can get on the Test data.

In [40]:
# task 3: Conv-LSTM
model_clstm = Sequential()
model_clstm.add(Embedding(5000, 256, input_length=max_review_length))
model_clstm.add(Conv1D(64, 3, activation='relu'))
model_clstm.add(LSTM(128, activation='tanh'))

model_clstm.add(Flatten())
model_clstm.add(Dense(128))
model_clstm.add(Dense(1, activation='sigmoid'))

# see the model status now
model_clstm.summary()


Model: "sequential_10"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_10 (Embedding)    (None, 500, 256)          1280000   
                                                                 
 conv1d_3 (Conv1D)           (None, 498, 64)           49216     
                                                                 
 lstm_8 (LSTM)               (None, 128)               98816     
                                                                 
 flatten_7 (Flatten)         (None, 128)               0         
                                                                 
 dense_14 (Dense)            (None, 128)               16512     
                                                                 
 dense_15 (Dense)            (None, 1)                 129       
                                                                 
Total params: 1444673 (5.51 MB)
Trainable params: 144

In [41]:
#compile the model
model_clstm.compile(
    optimizer= 'adam',
    loss= 'binary_crossentropy',
    metrics=['accuracy'],
)

#model training
his_clstm = model_clstm.fit(x_train, Y_train, epochs=10, batch_size=128, callbacks=[callback],
validation_data=(x_val, Y_val))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [42]:
#evaluate on test set
evaluation = model_clstm.evaluate(X_test, y_test)
for i in range(len(evaluation)):
  print(f'{model_clstm.metrics_names[i]} ---> {evaluation[i]}')

loss ---> 0.8754912614822388
accuracy ---> 0.8568400144577026


In [43]:
#stacking 3 Conv1D layers and one LSTM layer
model_clstm2 = Sequential()
model_clstm2.add(Embedding(5000, 256, input_length=max_review_length))
model_clstm2.add(Conv1D(64, 3, activation='relu'))
model_clstm2.add(MaxPooling1D(2))
model_clstm2.add(Conv1D(64, 3, activation='relu'))
model_clstm2.add(MaxPooling1D(2))
model_clstm2.add(Conv1D(128, 3, activation='relu'))
model_clstm2.add(MaxPooling1D(2))
model_clstm2.add(LSTM(128, activation='tanh'))

model_clstm2.add(Flatten())
model_clstm2.add(Dense(256))
model_clstm2.add(Dense(1, activation='sigmoid'))

# see the model status now
model_clstm2.summary()

Model: "sequential_11"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_11 (Embedding)    (None, 500, 256)          1280000   
                                                                 
 conv1d_4 (Conv1D)           (None, 498, 64)           49216     
                                                                 
 max_pooling1d_3 (MaxPoolin  (None, 249, 64)           0         
 g1D)                                                            
                                                                 
 conv1d_5 (Conv1D)           (None, 247, 64)           12352     
                                                                 
 max_pooling1d_4 (MaxPoolin  (None, 123, 64)           0         
 g1D)                                                            
                                                                 
 conv1d_6 (Conv1D)           (None, 121, 128)        

In [44]:
#compile the model
model_clstm2.compile(
    optimizer= 'adam',
    loss= 'binary_crossentropy',
    metrics=['accuracy'],
)

#model training
his_clstm2 = model_clstm2.fit(x_train, Y_train, epochs=15, batch_size=128, callbacks=[callback],
validation_data=(x_val, Y_val))

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


In [45]:
#evaluate on test set
evaluation = model_clstm2.evaluate(X_test, y_test)
for i in range(len(evaluation)):
  print(f'{model_clstm2.metrics_names[i]} ---> {evaluation[i]}')

loss ---> 1.000058650970459
accuracy ---> 0.8634799718856812


In [63]:
#stacking 3 Conv1D layers and one BiLSTM layer
model_clstm3 = Sequential()
model_clstm3.add(Embedding(5000, 256, input_length=max_review_length))
model_clstm3.add(Conv1D(64, 3, activation='relu'))
model_clstm3.add(MaxPooling1D(2))
model_clstm3.add(Conv1D(64, 3, activation='relu'))
model_clstm3.add(MaxPooling1D(2))
model_clstm3.add(Conv1D(128, 3, activation='relu'))
model_clstm3.add(MaxPooling1D(2))
model_clstm3.add(Bidirectional(LSTM(128)))

model_clstm3.add(Flatten())
model_clstm3.add(Dense(128))
model_clstm3.add(Dense(1, activation='sigmoid'))

# see the model status now
model_clstm3.summary()

Model: "sequential_16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_16 (Embedding)    (None, 500, 256)          1280000   
                                                                 
 conv1d_17 (Conv1D)          (None, 498, 64)           49216     
                                                                 
 max_pooling1d_16 (MaxPooli  (None, 249, 64)           0         
 ng1D)                                                           
                                                                 
 conv1d_18 (Conv1D)          (None, 247, 64)           12352     
                                                                 
 max_pooling1d_17 (MaxPooli  (None, 123, 64)           0         
 ng1D)                                                           
                                                                 
 conv1d_19 (Conv1D)          (None, 121, 128)        

In [64]:
#compile the model
model_clstm3.compile(
    optimizer= 'adam',
    loss= 'binary_crossentropy',
    metrics=['accuracy'],
)

#model training
his_clstm3 = model_clstm3.fit(x_train, Y_train, epochs=15, batch_size=128, callbacks=[callback],
validation_data=(x_val, Y_val))

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


In [65]:
#evaluate on test set
evaluation = model_clstm3.evaluate(X_test, y_test)
for i in range(len(evaluation)):
  print(f'{model_clstm3.metrics_names[i]} ---> {evaluation[i]}')

loss ---> 1.0317059755325317
accuracy ---> 0.8666399717330933


# Wrap up

Report the accuracies you got from different architectures and write down any insights you have learned.

The Conv-BiLSTM model (three Conv1D layers followed by a BiLSTM) outperformed the other models, with an accuracy of 86.7% on the test data. It is followed by the three-layer Conv-LSTM model at 86.3% and the CNN model at 86.1%; the Conv-LSTM with a single Conv1D layer reached 85.7%. The purely recurrent models performed worst: the LSTM reached 85.3% with 128 units and 83.8% with 64 units, and the BiLSTM reached 84.4%.

From this dataset and these architectures, I conclude that models containing convolutional layers performed better on the test data than the purely recurrent ones. Increasing the number of Conv1D layers also helped: the Conv-LSTM improved from 85.7% with one Conv1D layer to 86.3% with three.
