## Using RNNs to classify sentiment on IMDB data
For this exercise, we will train a "vanilla" RNN to predict the sentiment of IMDB reviews. Our data consists of 25,000 training sequences and 25,000 test sequences. The outcome is binary (positive/negative), and both outcomes are equally represented in both the training and the test set.

Keras provides a convenient interface to load the data and immediately encode the words into integers (based on the most common words).  This will save us a lot of the drudgery that is usually involved when working with raw text.
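
To make the integer encoding concrete, here is a minimal sketch on a toy corpus. The helper names (`build_word_index`, `encode`) and the three toy reviews are invented for illustration. The reserved ids 1 and 2 and the offset of 3 mirror the `start_char`, `oov_char` and `index_from` defaults of `imdb.load_data`. This is not how Keras builds its index internally, only the same idea in miniature.

```python
from collections import Counter

def build_word_index(texts):
    """Rank words by frequency: the most common word gets rank 1."""
    counts = Counter(word for text in texts for word in text.split())
    # Sort by descending count, then alphabetically so ties are deterministic
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: rank for rank, word in enumerate(ranked, start=1)}

def encode(text, word_index, num_words, index_from=3, start_char=1, oov_char=2):
    """Map a review to integers; ids at or above num_words become oov_char."""
    ids = [start_char]
    for word in text.split():
        rank = word_index.get(word)
        idx = rank + index_from if rank is not None else oov_char
        ids.append(idx if idx < num_words else oov_char)
    return ids

reviews = ["the film was great", "the film was dull", "the plot was thin"]
word_index = build_word_index(reviews)
encoded = encode("the film was great", word_index, num_words=7)
print(word_index)
print(encoded)
```

With `num_words=7`, the rare word "great" falls outside the kept vocabulary and is replaced by the out-of-vocabulary id, which is what happens to rare words in the real data when `num_words=max_features` is passed.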

We will walk through the preparation of the data and the building of an RNN model. Then it will be your turn to build your own models (and prepare the data as you see fit).

In [3]:
from __future__ import print_function
import keras
from keras.utils import pad_sequences
from keras.models import Sequential
from keras.layers import Dense, Embedding
from keras.layers import SimpleRNN
from keras.datasets import imdb
from keras import initializers

### Load imdb Data

In [4]:
max_features = 20000  # This is used in loading the data, picks the most common (max_features) words
maxlen = 30  # maximum length of a sequence - truncate after this
batch_size = 32

In [5]:
## Load in the data.  The function automatically tokenizes the text into distinct integers
# tf.keras.datasets.imdb.load_data(
#     path='imdb.npz',
#     num_words=None,
#     skip_top=0,
#     maxlen=None,
#     seed=113,
#     start_char=1,
#     oov_char=2,
#     index_from=3,
#     **kwargs)
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=max_features)
print(len(X_train), 'train sequences')
print(len(X_test), 'test sequences')

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
17464789/17464789 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
25000 train sequences
25000 test sequences


### Decode data into Sentence

In [6]:
word_index = imdb.get_word_index()
word_index["<PAD>"] = -3
word_index["<START>"] = -2
word_index["<UNK>"] = -1
word_index["<UNUSED>"] = 0
# Offset the indices by 3 because of the value of starting index_from
# and imdb.get_word_index() dictionary begins at 1
inv_word_index = {v+3: k for k, v in word_index.items()}

def decode_review(encoded_review):
    decoded_review = " ".join([inv_word_index.get(i, "?") for i in encoded_review])
    return decoded_review

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json
1641221/1641221 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step


In [7]:
print(decode_review(X_train[0]))
print(y_train[0])
print(decode_review(X_train[1]))
print(y_train[1])
print(decode_review(X_train[2]))
print(y_train[2])

<START> this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert <UNK> is an amazing actor and now the same being director <UNK> father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for retail and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also congratulations to the two little boy's that played the <UNK> of norman and paul they were just brilliant children are often left out of the praising list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be p

### Pad (or Truncate) Sequences

In [8]:
sequence = [[1], [2, 3], [4, 5, 6]]
pad_sequences(sequence, maxlen=2, padding='pre', truncating='post')

array([[0, 1],
       [2, 3],
       [4, 5]], dtype=int32)

In [9]:
# Pad (or truncate) every sequence to exactly maxlen tokens.
# With the defaults (padding='pre', truncating='pre'), long reviews keep their last maxlen tokens.
X_train = pad_sequences(X_train, maxlen=maxlen)
X_test = pad_sequences(X_test, maxlen=maxlen)
print('X_train shape:', X_train.shape)
print('X_test shape:', X_test.shape)

X_train shape: (25000, 30)
X_test shape: (25000, 30)


In [10]:
print(X_train[0, :])  #Here's what an example sequence looks like
print(decode_review(X_train[0]))

[  18   51   36   28  224   92   25  104    4  226   65   16   38 1334
   88   12   16  283    5   16 4472  113  103   32   15   16 5345   19
  178   32]
for what they have done don't you think the whole story was so lovely because it was true and was someone's life after all that was shared with us all


## Keras layers for (Vanilla) RNNs

In this exercise, we will not use pre-trained word vectors.  Rather we will learn an embedding as part of the Neural Network.  This is represented by the Embedding Layer below.

### Embedding Layer
`keras.layers.embeddings.Embedding(input_dim, output_dim, embeddings_initializer='uniform', embeddings_regularizer=None, activity_regularizer=None, embeddings_constraint=None, mask_zero=False, input_length=None)`

- This layer maps each integer to a distinct (dense) word vector of length `output_dim`.
- You can think of this as learning a word vector embedding "on the fly" rather than using an existing mapping (like GloVe).
- The `input_dim` should be the size of the vocabulary (here, `max_features`).
- The `input_length` specifies the length of the sequences that the network expects. The model below does not pass it, so the sequence length is taken from the input data.
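
As a rough mental model, an embedding layer is a lookup table with one row per vocabulary id. The sketch below uses hypothetical helpers (`make_embedding_table`, `embed`) in plain Python. It only illustrates the lookup: in the real layer the rows are trainable weights that are updated during fitting.

```python
import random

def make_embedding_table(input_dim, output_dim, seed=0):
    """One row of output_dim small random floats per vocabulary id."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(output_dim)]
            for _ in range(input_dim)]

def embed(sequence, table):
    """Replace every integer id in the sequence by its row of the table."""
    return [table[token_id] for token_id in sequence]

table = make_embedding_table(input_dim=10, output_dim=4)
vectors = embed([3, 7, 3], table)
print(len(vectors), len(vectors[0]))  # time steps, features per step
```

The same id always maps to the same vector, so a sequence of 30 integers becomes a 30 x `output_dim` array that the recurrent layer then reads one row at a time.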

### SimpleRNN Layer
`keras.layers.recurrent.SimpleRNN(units, activation='tanh', use_bias=True, kernel_initializer='glorot_uniform', recurrent_initializer='orthogonal', bias_initializer='zeros', kernel_regularizer=None, recurrent_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, recurrent_constraint=None, bias_constraint=None, dropout=0.0, recurrent_dropout=0.0)`

- This is the basic RNN, where the output at each time step is also fed back as the "hidden state" for the next time step.
- The parameter `units` gives the dimensionality of the output (and therefore the hidden state). Note that typically there will be another layer after the RNN mapping the (RNN) output to the network output. So we should think of this value as the desired dimensionality of the hidden state and not necessarily the desired output of the network.
- Recall that there are two sets of weights: the "kernel", which is applied to the current input, and the "recurrent" weights, which are applied to the previous hidden state. These can be configured separately in terms of their initialization, regularization, etc.
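
The recurrence described above can be sketched in a few lines of plain Python. The function names and the toy weights are assumptions made for this example. The update rule is the standard vanilla RNN step, h_t = activation(x_t · kernel + h_{t-1} · recurrent_kernel + bias), and the identity recurrent matrix with a ReLU activation echoes the configuration used in the model cells of this notebook.

```python
import math

def simple_rnn_step(x_t, h_prev, kernel, recurrent_kernel, bias, activation=math.tanh):
    """One time step: h_t = activation(x_t @ kernel + h_prev @ recurrent_kernel + bias)."""
    units = len(bias)
    h_t = []
    for j in range(units):
        total = bias[j]
        total += sum(x_t[i] * kernel[i][j] for i in range(len(x_t)))
        total += sum(h_prev[k] * recurrent_kernel[k][j] for k in range(units))
        h_t.append(activation(total))
    return h_t

def simple_rnn(sequence, kernel, recurrent_kernel, bias, activation=math.tanh):
    """Run the step over a whole sequence and return the final hidden state."""
    h = [0.0] * len(bias)  # the hidden state starts at zero
    for x_t in sequence:
        h = simple_rnn_step(x_t, h, kernel, recurrent_kernel, bias, activation)
    return h

# Toy sizes: 2 input features, 2 hidden units
kernel = [[0.5, -0.5], [0.25, 0.75]]         # shape (input_dim, units)
recurrent_kernel = [[1.0, 0.0], [0.0, 1.0]]  # identity, shape (units, units)
bias = [0.0, 0.0]
relu = lambda v: max(0.0, v)

final_state = simple_rnn([[1.0, 0.0], [0.0, 1.0]], kernel, recurrent_kernel, bias, relu)
print(final_state)
```

Only the final hidden state is returned here, which matches what the `Dense` layer receives when `SimpleRNN` is used with its default of not returning the full sequence.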






In [18]:
## Let's build a RNN
rnn_hidden_dim = 5
word_embedding_dim = 50
model_rnn = Sequential()
model_rnn.add(Embedding(max_features, word_embedding_dim))  # This layer takes each integer in the sequence and
                                                            # embeds it in a 50-dimensional vector
model_rnn.add(SimpleRNN(rnn_hidden_dim,
                        kernel_initializer=initializers.RandomNormal(stddev=0.001),
                        recurrent_initializer=initializers.Identity(gain=1.0),
                        activation='relu'))  # input shape comes from the Embedding layer, so no input_shape here

model_rnn.add(Dense(1, activation='sigmoid'))

In [19]:
## Note that most of the parameters come from the embedding layer
model_rnn.summary()
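
The comment about the embedding layer can be checked by hand. The sketch below recomputes the parameter counts for the layer sizes used above (vocabulary 20000, embedding dimension 50, 5 hidden units, 1 output unit). The helper functions are written for this example and are not part of Keras.

```python
def embedding_params(input_dim, output_dim):
    # one output_dim-sized vector per vocabulary id
    return input_dim * output_dim

def simple_rnn_params(input_dim, units):
    # kernel (input_dim x units) + recurrent kernel (units x units) + bias (units)
    return units * (input_dim + units + 1)

def dense_params(input_dim, units):
    # weights (input_dim x units) + bias (units)
    return units * (input_dim + 1)

counts = {
    "embedding": embedding_params(20000, 50),
    "simple_rnn": simple_rnn_params(50, 5),
    "dense": dense_params(5, 1),
}
total = sum(counts.values())
for name, n in counts.items():
    print(f"{name:>10}: {n:>9,d} ({n / total:.2%})")
```

The recurrent and dense layers together account for fewer than 300 of roughly one million parameters, so almost all of the model's capacity sits in the word vectors.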

In [20]:
rmsprop = keras.optimizers.RMSprop(learning_rate=0.0001)

model_rnn.compile(loss='binary_crossentropy',
                  optimizer=rmsprop,
                  metrics=['accuracy'])

In [21]:
model_rnn.fit(X_train, y_train,
              batch_size=batch_size,
              epochs=10,
              validation_data=(X_test, y_test))

Epoch 1/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 15s 17ms/step - accuracy: 0.5450 - loss: 0.6884 - val_accuracy: 0.6413 - val_loss: 0.6444
Epoch 2/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 13s 16ms/step - accuracy: 0.6706 - loss: 0.6144 - val_accuracy: 0.7015 - val_loss: 0.5720
Epoch 3/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 20s 16ms/step - accuracy: 0.7281 - loss: 0.5454 - val_accuracy: 0.7259 - val_loss: 0.5329
Epoch 4/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 20s 16ms/step - accuracy: 0.7636 - loss: 0.4922 - val_accuracy: 0.7457 - val_loss: 0.5061
Epoch 5/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 13s 16ms/step - accuracy: 0.7884 - loss: 0.4556 - val_accuracy: 0.7604 - val_loss: 0.4878
Epoch 6/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 21s 16ms/step - accuracy: 0.8041 - loss: 0.4318 - val_accuracy: 0.7687 - val_loss: 0.4754
Epoch 7/10
[1m7

<keras.src.callbacks.history.History at 0x7b25199be910>

In [22]:
score, acc = model_rnn.evaluate(X_test, y_test, batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)

782/782 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7814 - loss: 0.4569
Test score: 0.4550922214984894
Test accuracy: 0.7826399803161621


## Exercise
### Your Turn

Now do it yourself:
- Prepare the data to use sequences of length 80 rather than length 30.  Did it improve the performance?
- Try different values of the "max_features".  Can you improve the performance?
- Try smaller and larger sizes of the RNN hidden dimension.  How does it affect the model performance?  How does it affect the run time?

### Question 01
*   Prepare the data to use sequences of length 80 rather than length 30. Did it improve the performance?



In [23]:
max_features = 20000  # This is used in loading the data, picks the most common (max_features) words
maxlen = 80  # maximum length of a sequence - truncate after this
batch_size = 32

In [24]:
## Let's build a RNN
rnn_hidden_dim = 5
word_embedding_dim = 50
model_rnnQ1 = Sequential()
model_rnnQ1.add(Embedding(max_features, word_embedding_dim))  # This layer takes each integer in the sequence and
                                                            # embeds it in a 50-dimensional vector
model_rnnQ1.add(SimpleRNN(rnn_hidden_dim,
                        kernel_initializer=initializers.RandomNormal(stddev=0.001),
                        recurrent_initializer=initializers.Identity(gain=1.0),
                        activation='relu'))  # input shape comes from the Embedding layer, so no input_shape here

model_rnnQ1.add(Dense(1, activation='sigmoid'))



In [25]:
## Note that most of the parameters come from the embedding layer
model_rnnQ1.summary()

In [26]:
rmsprop = keras.optimizers.RMSprop(learning_rate=0.0001)

model_rnnQ1.compile(loss='binary_crossentropy',
                  optimizer=rmsprop,
                  metrics=['accuracy'])

In [27]:
model_rnnQ1.fit(X_train, y_train,
              batch_size=batch_size,
              epochs=10,
              validation_data=(X_test, y_test))

Epoch 1/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 19s 22ms/step - accuracy: 0.5551 - loss: 0.6882 - val_accuracy: 0.6504 - val_loss: 0.6356
Epoch 2/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 13s 17ms/step - accuracy: 0.6726 - loss: 0.6103 - val_accuracy: 0.7046 - val_loss: 0.5645
Epoch 3/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 21s 18ms/step - accuracy: 0.7293 - loss: 0.5367 - val_accuracy: 0.7324 - val_loss: 0.5302
Epoch 4/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 23s 21ms/step - accuracy: 0.7677 - loss: 0.4890 - val_accuracy: 0.7504 - val_loss: 0.4999
Epoch 5/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 18s 17ms/step - accuracy: 0.7879 - loss: 0.4556 - val_accuracy: 0.7616 - val_loss: 0.4838
Epoch 6/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 14s 18ms/step - accuracy: 0.8051 - loss: 0.4261 - val_accuracy: 0.7684 - val_loss: 0.4729
Epoch 7/10
[1m7

<keras.src.callbacks.history.History at 0x7f645366a790>

In [28]:
scoreQ1, accQ1 = model_rnnQ1.evaluate(X_test, y_test, batch_size=batch_size)
print(" - - - > length 30 < - - - ")
print('Test score(loss):', score)
print('Test accuracy:', acc)
print(" - - - > length 80 < - - - ")
print('Test score(loss):', scoreQ1)
print('Test accuracy:', accQ1)

782/782 ━━━━━━━━━━━━━━━━━━━━ 4s 6ms/step - accuracy: 0.7833 - loss: 0.4520
 - - - > length 30 < - - - 
Test score(loss): 0.45039695501327515
Test accuracy: 0.7870399951934814
 - - - > length 80 < - - - 
Test score(loss): 0.45059385895729065
Test accuracy: 0.7842400074005127


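**Ans:** In this run, increasing `maxlen` did not improve performance: test accuracy was 0.7842 for length 80 versus 0.7870 for length 30. Note, however, that the cells shown for this question only reassign `maxlen`. `X_train` and `X_test` are never passed through `pad_sequences` again, so the model was still trained on the sequences that were padded earlier. For the change to take effect, the data has to be reloaded and re-padded with `maxlen=80` before the model is fit.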
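
A small stand-in makes the dependence on re-padding visible. `pad_pre` below is a hypothetical helper that imitates the default behaviour of `pad_sequences` (pad and truncate at the front). It is not the Keras function, and the two toy "reviews" are made up.

```python
def pad_pre(sequences, maxlen, value=0):
    """Stand-in for the pad_sequences defaults: keep the last maxlen items, pad in front."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded

raw_reviews = [list(range(1, 101)), list(range(1, 11))]  # lengths 100 and 10

maxlen = 30
X = pad_pre(raw_reviews, maxlen)

maxlen = 80                       # reassigning the variable alone...
stale_width = len(X[0])           # ...leaves the already-padded data at width 30

X = pad_pre(raw_reviews, maxlen)  # the raw data has to be padded again
fresh_width = len(X[0])

print(stale_width, fresh_width)
```

In the notebook, the raw reviews are overwritten when `X_train` is padded, so the equivalent step is to call `imdb.load_data(num_words=max_features)` again and then `pad_sequences(..., maxlen=maxlen)` with the new value.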

### Question 02
*   Try different values of the "max_features". Can you improve the performance?



In [29]:
max_features = 30000  # This is used in loading the data, picks the most common (max_features) words
maxlen = 30  # maximum length of a sequence - truncate after this
batch_size = 32

In [30]:
## Let's build a RNN
rnn_hidden_dim = 5
word_embedding_dim = 50
model_rnnQ2 = Sequential()
model_rnnQ2.add(Embedding(max_features, word_embedding_dim))  # This layer takes each integer in the sequence and
                                                            # embeds it in a 50-dimensional vector
model_rnnQ2.add(SimpleRNN(rnn_hidden_dim,
                        kernel_initializer=initializers.RandomNormal(stddev=0.001),
                        recurrent_initializer=initializers.Identity(gain=1.0),
                        activation='relu'))  # input shape comes from the Embedding layer, so no input_shape here

model_rnnQ2.add(Dense(1, activation='sigmoid'))



In [31]:
## Note that most of the parameters come from the embedding layer
model_rnnQ2.summary()

In [32]:
rmsprop = keras.optimizers.RMSprop(learning_rate=0.0001)

model_rnnQ2.compile(loss='binary_crossentropy',
                  optimizer=rmsprop,
                  metrics=['accuracy'])

In [33]:
model_rnnQ2.fit(X_train, y_train,
              batch_size=batch_size,
              epochs=10,
              validation_data=(X_test, y_test))

Epoch 1/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 21s 25ms/step - accuracy: 0.5452 - loss: 0.6869 - val_accuracy: 0.6616 - val_loss: 0.6210
Epoch 2/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 17s 22ms/step - accuracy: 0.6911 - loss: 0.5904 - val_accuracy: 0.7224 - val_loss: 0.5466
Epoch 3/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 17s 22ms/step - accuracy: 0.7531 - loss: 0.5097 - val_accuracy: 0.7452 - val_loss: 0.5096
Epoch 4/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 22s 24ms/step - accuracy: 0.7808 - loss: 0.4682 - val_accuracy: 0.7594 - val_loss: 0.4882
Epoch 5/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 19s 24ms/step - accuracy: 0.8032 - loss: 0.4325 - val_accuracy: 0.7706 - val_loss: 0.4722
Epoch 6/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 19s 22ms/step - accuracy: 0.8128 - loss: 0.4098 - val_accuracy: 0.7763 - val_loss: 0.4623
Epoch 7/10
[1m7

<keras.src.callbacks.history.History at 0x7f6453d58dd0>

In [34]:
scoreQ2, accQ2 = model_rnnQ2.evaluate(X_test, y_test, batch_size=batch_size)
print(" - - - > Base RNN < - - - ")
print('Test score(loss):', score)
print('Test accuracy:', acc)
print(" - - - > RNN with max_features = 30000 < - - - ")
print('Test score(loss):', scoreQ2)
print('Test accuracy:', accQ2)

782/782 ━━━━━━━━━━━━━━━━━━━━ 4s 5ms/step - accuracy: 0.7824 - loss: 0.4578
 - - - > Base RNN < - - - 
Test score(loss): 0.45039695501327515
Test accuracy: 0.7870399951934814
 - - - > RNN with max_features = 30000 < - - - 
Test score(loss): 0.45388925075531006
Test accuracy: 0.7842400074005127
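
`max_features` reaches the data only through `imdb.load_data(num_words=max_features)`, so the data has to be reloaded after the value changes. Otherwise a larger `Embedding` simply gains rows that no input ever refers to. The sketch below imitates the vocabulary cap on made-up id sequences. `cap_vocabulary` and `count_oov` are hypothetical helpers, and the value `2` follows the `oov_char` default listed in the `load_data` signature earlier in this notebook.

```python
def cap_vocabulary(sequences, num_words, oov_char=2):
    """Replace every id at or above num_words by the out-of-vocabulary id."""
    return [[w if w < num_words else oov_char for w in seq] for seq in sequences]

def count_oov(sequences, oov_char=2):
    """Count how many tokens were mapped to the out-of-vocabulary id."""
    return sum(seq.count(oov_char) for seq in sequences)

raw = [[1, 14, 22, 9500, 31000], [1, 25001, 7, 19999, 20000]]

for num_words in (10000, 20000, 30000):
    capped = cap_vocabulary(raw, num_words)
    print(num_words, capped, count_oov(capped))
```

A larger vocabulary keeps more rare words distinct, at the cost of more embedding rows to learn. Whether that helps on the real data is an empirical question that needs a rerun with reloaded data.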


### Question 03
*   Try smaller and larger sizes of the RNN hidden dimension. How does it affect the model performance? How does it affect the run time?




In [35]:
max_features = 20000  # This is used in loading the data, picks the most common (max_features) words
maxlen = 30  # maximum length of a sequence - truncate after this
batch_size = 32

In [36]:
## Let's build a RNN
rnn_hidden_dim = 10
word_embedding_dim = 50
model_rnnQ3 = Sequential()
model_rnnQ3.add(Embedding(max_features, word_embedding_dim))  # This layer takes each integer in the sequence and
                                                            # embeds it in a 50-dimensional vector
model_rnnQ3.add(SimpleRNN(rnn_hidden_dim,
                        kernel_initializer=initializers.RandomNormal(stddev=0.001),
                        recurrent_initializer=initializers.Identity(gain=1.0),
                        activation='relu'))  # input shape comes from the Embedding layer, so no input_shape here

model_rnnQ3.add(Dense(1, activation='sigmoid'))



In [37]:
## Note that most of the parameters come from the embedding layer
model_rnnQ3.summary()

In [38]:
rmsprop = keras.optimizers.RMSprop(learning_rate=0.0001)

model_rnnQ3.compile(loss='binary_crossentropy',
                  optimizer=rmsprop,
                  metrics=['accuracy'])

In [39]:
model_rnnQ3.fit(X_train, y_train,
              batch_size=batch_size,
              epochs=10,
              validation_data=(X_test, y_test))

Epoch 1/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 19s 21ms/step - accuracy: 0.5612 - loss: 0.6819 - val_accuracy: 0.6762 - val_loss: 0.6079
Epoch 2/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 15s 20ms/step - accuracy: 0.7045 - loss: 0.5762 - val_accuracy: 0.7250 - val_loss: 0.5380
Epoch 3/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 19s 18ms/step - accuracy: 0.7609 - loss: 0.4987 - val_accuracy: 0.7534 - val_loss: 0.4975
Epoch 4/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 16s 20ms/step - accuracy: 0.7915 - loss: 0.4522 - val_accuracy: 0.7708 - val_loss: 0.4743
Epoch 5/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 19s 18ms/step - accuracy: 0.8066 - loss: 0.4248 - val_accuracy: 0.7737 - val_loss: 0.4676
Epoch 6/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 17s 22ms/step - accuracy: 0.8200 - loss: 0.3969 - val_accuracy: 0.7832 - val_loss: 0.4545
Epoch 7/10
[1m7

<keras.src.callbacks.history.History at 0x7f64543df610>

In [40]:
scoreQ3, accQ3 = model_rnnQ3.evaluate(X_test, y_test, batch_size=batch_size)
print(" - - - > Base RNN < - - - ")
print('Test score(loss):', score)
print('Test accuracy:', acc)
print(" - - - > RNN with increase hidden dimension < - - - ")
print('Test score(loss):', scoreQ3)
print('Test accuracy:', accQ3)

782/782 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7870 - loss: 0.4602
 - - - > Base RNN < - - - 
Test score(loss): 0.45039695501327515
Test accuracy: 0.7870399951934814
 - - - > RNN with increase hidden dimension < - - - 
Test score(loss): 0.4550718367099762
Test accuracy: 0.788320004940033


**Ans:** The hidden dimension is the size of the state vector that the RNN carries from one word to the next. Increasing it makes the model more expensive to compute, because there are more weights to evaluate at every time step, but a larger hidden state can capture more of the relationship between the current word and the words before it. In this run, going from 5 to 10 hidden units left test accuracy nearly unchanged (0.7883 versus 0.7870).
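
As a rough proxy for the run-time effect, the sketch below counts multiply-accumulate operations for the recurrent layer at the hidden sizes tried in this notebook (5, 10, 15), with embedding dimension 50 and 30 time steps. The helper names are invented here, and the count ignores the bias, the activation and all framework overhead, so measured wall-clock times for models this small need not scale the same way.

```python
def macs_per_time_step(input_dim, units):
    """Multiply-accumulates for one SimpleRNN step on one example."""
    return input_dim * units + units * units

def macs_per_sequence(input_dim, units, timesteps):
    """Total for one example: the step cost repeated over the sequence."""
    return timesteps * macs_per_time_step(input_dim, units)

for units in (5, 10, 15):
    print(units, macs_per_time_step(50, units), macs_per_sequence(50, units, 30))
```

The `units * units` term grows quadratically, so the recurrent part of the cost rises faster than the hidden size itself once `units` becomes comparable to the embedding dimension.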

### Test by myself

In [11]:
max_features = 20000  # This is used in loading the data, picks the most common (max_features) words
maxlen = 30  # maximum length of a sequence - truncate after this
batch_size = 32

In [12]:
# Reload the data using imdb.load_data with updated max_features
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=max_features)

# Pad the sequences with the updated maxlen
X_train = pad_sequences(X_train, maxlen=maxlen)
X_test = pad_sequences(X_test, maxlen=maxlen)

In [13]:
## Let's build a RNN
rnn_hidden_dim = 15
word_embedding_dim = 50
model_rnnQ4 = Sequential()
model_rnnQ4.add(Embedding(max_features, word_embedding_dim))  # This layer takes each integer in the sequence and
                                                            # embeds it in a 50-dimensional vector
model_rnnQ4.add(SimpleRNN(rnn_hidden_dim,
                        kernel_initializer=initializers.RandomNormal(stddev=0.001),
                        recurrent_initializer=initializers.Identity(gain=1.0),
                        activation='relu'))  # input shape comes from the Embedding layer, so no input_shape here

model_rnnQ4.add(Dense(1, activation='sigmoid'))



In [14]:
## Note that most of the parameters come from the embedding layer
model_rnnQ4.summary()

In [15]:
rmsprop = keras.optimizers.RMSprop(learning_rate=0.0001)

model_rnnQ4.compile(loss='binary_crossentropy',
                  optimizer=rmsprop,
                  metrics=['accuracy'])

In [16]:
model_rnnQ4.fit(X_train, y_train,
              batch_size=batch_size,
              epochs=15,
              validation_data=(X_test, y_test))

Epoch 1/15
782/782 ━━━━━━━━━━━━━━━━━━━━ 19s 21ms/step - accuracy: 0.5577 - loss: 0.6769 - val_accuracy: 0.6821 - val_loss: 0.5953
Epoch 2/15
782/782 ━━━━━━━━━━━━━━━━━━━━ 13s 17ms/step - accuracy: 0.7095 - loss: 0.5623 - val_accuracy: 0.7301 - val_loss: 0.5311
Epoch 3/15
782/782 ━━━━━━━━━━━━━━━━━━━━ 21s 17ms/step - accuracy: 0.7652 - loss: 0.4896 - val_accuracy: 0.7548 - val_loss: 0.4924
Epoch 4/15
782/782 ━━━━━━━━━━━━━━━━━━━━ 24s 21ms/step - accuracy: 0.7922 - loss: 0.4438 - val_accuracy: 0.7694 - val_loss: 0.4715
Epoch 5/15
782/782 ━━━━━━━━━━━━━━━━━━━━ 17s 16ms/step - accuracy: 0.8122 - loss: 0.4129 - val_accuracy: 0.7788 - val_loss: 0.4583
Epoch 6/15
782/782 ━━━━━━━━━━━━━━━━━━━━ 20s 16ms/step - accuracy: 0.8240 - loss: 0.3940 - val_accuracy: 0.7800 - val_loss: 0.4561
Epoch 7/15
[1m7

<keras.src.callbacks.history.History at 0x7b251b082250>

In [23]:
scoreQ4, accQ4 = model_rnnQ4.evaluate(X_test, y_test, batch_size=batch_size)
print(" - - - > Base RNN < - - - ")
print('Test score(loss):', score)
print('Test accuracy:', acc)
print(" - - - > RNN with changing parameters < - - - ")
print('Test score(loss):', scoreQ4)
print('Test accuracy:', accQ4)

782/782 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7840 - loss: 0.4714
 - - - > Base RNN < - - - 
Test score(loss): 0.4550922214984894
Test accuracy: 0.7826399803161621
 - - - > RNN with changing parameters < - - - 
Test score(loss): 0.46902716159820557
Test accuracy: 0.7831599712371826


Examples of a correct and an incorrect prediction.

In [None]:
import numpy as np

# predict() returns shape (n_samples, 1); flatten it so it lines up with y_test
predictions = model_rnnQ4.predict(X_test).ravel()

# Threshold the sigmoid output at 0.5 to get a 0/1 label per review
predicted_labels = (predictions > 0.5).astype(int)

correct_indices = np.where(predicted_labels == y_test)[0]
incorrect_indices = np.where(predicted_labels != y_test)[0]

print("Correct Prediction:")
print(decode_review(X_test[correct_indices[0]]))  # Print the first correctly predicted review

print("\nIncorrect Prediction:")
print(decode_review(X_test[incorrect_indices[0]]))  # Print the first incorrectly predicted review

782/782 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step
