# RNN vs LSTM

All RNNs have feedback loops in their recurrent layers, which let them carry information forward in 'memory' from one time step to the next. Standard RNNs are nevertheless hard to train on problems that require learning long-term temporal dependencies, because the gradient of the loss function decays exponentially as it is propagated back through time (the vanishing gradient problem). LSTM networks are a type of RNN that use special units in addition to standard units. Each LSTM unit includes a 'memory cell' that can hold information for long periods of time, and a set of gates controls when information enters the cell, when it is output, and when it is forgotten.
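
The two ideas in the paragraph above can be sketched in a few lines of plain Python. This is only an illustration, not the model trained in this notebook: it assumes a scalar RNN `h_t = tanh(w * h_{t-1})` to show how the gradient shrinks as the number of time steps grows, and a scalar LSTM cell with the standard gate equations. The names `rnn_gradient_through_time`, `lstm_cell_step`, and the parameter dictionary `params` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rnn_gradient_through_time(w, steps, h0=0.5):
    """Return |d h_T / d h_0| for the scalar RNN h_t = tanh(w * h_{t-1})."""
    h, grad = h0, 1.0
    for _ in range(steps):
        h = math.tanh(w * h)
        # Chain rule: d tanh(w * h_prev) / d h_prev = w * (1 - tanh(w * h_prev)^2)
        grad *= w * (1.0 - h * h)
    return abs(grad)


def lstm_cell_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell; params maps gate name -> (w_x, w_h, b)."""
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    forget = sigmoid(pre_activation("forget"))      # how much old memory to keep
    write = sigmoid(pre_activation("input"))        # how much new content to store
    read = sigmoid(pre_activation("output"))        # how much memory to expose
    candidate = math.tanh(pre_activation("candidate"))

    c = forget * c_prev + write * candidate          # updated memory cell
    h = read * math.tanh(c)                          # updated hidden state
    return h, c


# The gradient factor shrinks as the sequence gets longer.
for steps in (5, 20, 50):
    print(steps, rnn_gradient_through_time(w=0.9, steps=steps))

# With the forget gate open and the input gate closed, the cell keeps its value.
params = {"forget": (0.0, 0.0, 20.0), "input": (0.0, 0.0, -20.0),
          "output": (0.0, 0.0, 0.0), "candidate": (1.0, 0.0, 0.0)}
print(lstm_cell_step(x=1.0, h_prev=0.0, c_prev=0.7, params=params))
```

Because the memory cell is updated additively (`forget * c_prev + write * candidate`), a forget gate close to 1 lets information, and its gradient, pass through many steps largely unchanged.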

In [None]:
import tensorflow as tf
import tensorflow_datasets as tfds

In [None]:
dataset_name = 'yelp_polarity_reviews/subwords8k'
text_feature = 'text'
encoder_subwords = 50
delimiter = '---------'
example = "the park is nice and quiet" # expected IDs include 1 ('the_'), 13 ('is_') and 3 ('and_')
examples_are_correct = "examples are correct"
examples_are_not_correct = "examples are not correct"
activation_type = 'relu'
learning_rate = 1e-4
metrics_type = 'accuracy'
model_name = 'lab7.h5'
model_weights_name = "lab7_weights.h5"

# Load Dataset

In [None]:
(train_dataset, test_dataset), dataset_info = tfds.load(name=dataset_name,
                                          split=(tfds.Split.TRAIN, tfds.Split.TEST),
                                          with_info=True,
                                          as_supervised=True)

In [None]:
encoder = dataset_info.features[text_feature].encoder

print(dataset_info.splits)
print(delimiter)
print(encoder.vocab_size)
print(delimiter)
print(encoder.subwords[:encoder_subwords])

{'test': <tfds.core.SplitInfo num_examples=38000>, 'train': <tfds.core.SplitInfo num_examples=560000>}
---------
8176
---------
['the_', ', ', 'and_', '. ', 'I_', 'a_', 'to_', 'was_', 'of_', '.  ', 's_', 'in_', 'is_', 'for_', 'it_', 'that_', 't_', 'my_', 'with_', 'on_', 'but_', 'The_', 'you_', 'this_', 'have_', 'they_', 'not_', 'we_', 'had_', 'at_', 'were_', '.\\', 'are_', 'be_', 'so_', 'as_', 'it', 'd_', 'place_', 'like_', 'me_', ' (', 'just_', 'get_', '. \\', 'ing_', 'ed_', 'our_', 'food_', 'or_']


In [None]:
example_ids = encoder.encode(example)
print(example_ids)

[1, 1984, 13, 151, 3, 5122]


In [None]:
example_from_ids = encoder.decode(example_ids)
print(example_from_ids)

the park is nice and quiet


In [None]:
if example == example_from_ids:
  print(examples_are_correct)
else:
  print(examples_are_not_correct)

examples are correct


# Training and Validation

In [None]:
buffer_size = 800
batch_size = 50

In [None]:
train_data = train_dataset.shuffle(buffer_size).padded_batch(batch_size = batch_size, padded_shapes = ([None],[]))
test_data = test_dataset.shuffle(buffer_size).padded_batch(batch_size = batch_size, padded_shapes = ([None],[]))
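
Reviews have different lengths, so each batch has to be padded to a common length before it can be stacked into one tensor. The following is a rough pure-Python sketch of what padding to the longest sequence in each batch looks like; it is not how `tf.data` implements `padded_batch`. The helper name `pad_batches` and the sample ID lists are hypothetical, and the sketch assumes 0 is the padding value.

```python
def pad_batches(sequences, batch_size, pad_value=0):
    """Group sequences into batches, padding each batch to its longest member."""
    batches = []
    for start in range(0, len(sequences), batch_size):
        chunk = sequences[start:start + batch_size]
        longest = max(len(seq) for seq in chunk)
        # Append pad_value to every sequence shorter than the longest one.
        batches.append([seq + [pad_value] * (longest - len(seq)) for seq in chunk])
    return batches


# Hypothetical encoded reviews of different lengths.
reviews = [[1, 1984, 13], [151, 3], [5122], [1, 13, 3, 5122]]
for batch in pad_batches(reviews, batch_size=2):
    print(batch)
```

Padding per batch rather than to the longest review in the whole dataset keeps the tensors small, since the padded length depends only on the sequences that happen to share a batch.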

# Model Definition

In [None]:
# 1) Embedding: maps each integer token ID to a dense, trainable vector,
#    e.g. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]].
# 2) Bidirectional LSTM: a plain LSTM is unidirectional, i.e. it reads the
#    tokens in a single direction (left-to-right or right-to-left).
#    The Bidirectional wrapper runs one LSTM in each direction and combines
#    their outputs, which can improve performance (an idea also used by
#    Transformer-based models such as BERT).
# 3) Dense: a regular fully connected layer.
# 4) Output: a single unit with no activation, so the model emits a logit.
embedding_dim = 50  # size of each embedding vector; unrelated to the batch size
model = tf.keras.Sequential([tf.keras.layers.Embedding(encoder.vocab_size, embedding_dim),
                           tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units = 64)),
                           tf.keras.layers.Dense(units = 64, activation = activation_type),
                           tf.keras.layers.Dense(units = 1)
])
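
The first two layers described in the comments can be sketched in plain Python. This is a toy illustration under simplifying assumptions, not what Keras does internally: the embedding table reuses the two example vectors from the comment, and a scalar `tanh` recurrence stands in for the LSTM. The names `embed`, `toy_rnn_last_state`, and `bidirectional_last_state` are hypothetical.

```python
import math


def embed(token_ids, table):
    """Embedding lookup: replace each integer ID with its vector."""
    return [table[token_id] for token_id in token_ids]


def toy_rnn_last_state(vectors, w_in=0.5, w_rec=0.5):
    """Toy recurrence standing in for an LSTM; returns the final scalar state."""
    h = 0.0
    for vector in vectors:
        h = math.tanh(w_in * sum(vector) + w_rec * h)
    return h


def bidirectional_last_state(vectors):
    """Read the sequence in both directions and concatenate the final states."""
    forward = toy_rnn_last_state(vectors)
    backward = toy_rnn_last_state(vectors[::-1])
    # Simplification: both directions share weights here, whereas the Keras
    # Bidirectional wrapper gives each direction its own copy of the layer.
    return [forward, backward]


table = {4: [0.25, 0.1], 20: [0.6, -0.2]}   # toy embedding table
vectors = embed([4, 20], table)
print(vectors)
print(bidirectional_last_state(vectors))
```

With the default `concat` merge mode, this concatenation is why wrapping `LSTM(units = 64)` in `Bidirectional` gives the next layer a 128-dimensional input.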

In [None]:
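# BinaryCrossentropy = loss for two label classes (0 = negative, 1 = positive).
# from_logits = True because the final Dense layer has no sigmoid activation.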
model.compile(optimizer = tf.keras.optimizers.Adam(learning_rate = learning_rate),
              loss = tf.keras.losses.BinaryCrossentropy(from_logits = True),
              metrics = [metrics_type])
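
To show what `from_logits = True` means, here is a small pure-Python sketch of binary cross-entropy computed directly from a logit, using the commonly used numerically stable form `max(x, 0) - x * y + log(1 + exp(-|x|))`. It is only an illustration and is not claimed to be the exact Keras implementation; the function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce_from_logit(logit, label):
    """Binary cross-entropy for one raw logit and a 0/1 label (stable form)."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def bce_from_probability(p, label):
    """Textbook binary cross-entropy for a probability p strictly between 0 and 1."""
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


# Both forms agree for moderate logits.
print(bce_from_logit(2.0, 1), bce_from_probability(sigmoid(2.0), 1))

# The logit form stays finite for extreme values, where the probability
# would round to exactly 1.0 and the textbook form would need log(0).
print(bce_from_logit(1000.0, 0))
```

Working from logits avoids taking the logarithm of a probability that has saturated to 0 or 1, which is the usual reason for leaving the sigmoid out of the last layer and folding it into the loss.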

# Model Training & Saving

In [None]:
epochs = 5
validation_cycles = 10

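# workers = maximum number of processes to spin up when using process-based threading.
# It only applies to generator or keras.utils.Sequence inputs, so it has no effect
# on a tf.data.Dataset such as train_data.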
hist = model.fit(train_data,
                 epochs = epochs,
                 validation_data = test_data,
                 validation_steps = validation_cycles,
                 workers = 8)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [None]:
model.save(model_name)
model.save_weights(model_weights_name)

# Trained Model Performance Evaluation

In [None]:
test_loss, test_acc = model.evaluate(test_data)



In [None]:
print('Accuracy:', test_acc)
print('Loss:', test_loss)

Accuracy: 0.9496579170227051
Loss: 0.13735729455947876


## Model Evaluation

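The model outputs a raw logit rather than a probability, because the final layer has no activation. A prediction >= 0 (equivalently, a sigmoid probability >= 0.5) is positive; otherwise it is negative.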

In [None]:
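def predict(text):
    """Return the model's raw logit for a single piece of text."""
    encoded = encoder.encode(text)              # text -> list of subword IDs
    encoded = tf.cast(encoded, tf.float32)
    # Add a leading batch dimension: shape (sequence_length,) -> (1, sequence_length)
    return model.predict(tf.expand_dims(encoded, 0))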

In [None]:
example_texts = ["This book is good",
                 "This book is bad",
                 "I'd rather have paid to prevent them from releasing this",
                 "this game came with none of the promised improvements and didn't even fix the old bugs",
                 "What an incredible game this is a wholesome openworld game I dont understand why some of the idiots are writing emotional review how could people without rational judgment write a review?",
                 "Great feeling of exploration, the world is huge",
                 "I really like this food from the store"]

In [None]:
for text in example_texts:
  print(predict(text))

[[0.53918487]]
[[-1.7114202]]
[[-1.8774989]]
[[-2.3849142]]
[[1.4655215]]
[[3.0931892]]
[[0.05660355]]
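
The values printed above are raw logits, since the final `Dense` layer has no activation and the loss uses `from_logits = True`. A minimal sketch of turning them into probabilities and labels with a sigmoid follows; the logits are copied from the output above, while the helper name and label strings are hypothetical.

```python
import math


def sigmoid(x):
    """Sigmoid that avoids overflow for large negative inputs."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Logits copied from the prediction output above, in the same order as example_texts.
logits = [0.53918487, -1.7114202, -1.8774989, -2.3849142,
          1.4655215, 3.0931892, 0.05660355]

probabilities = [sigmoid(value) for value in logits]
labels = ["positive" if p >= 0.5 else "negative" for p in probabilities]

for value, p, label in zip(logits, probabilities, labels):
    print(f"{value:+.4f} -> {p:.3f} ({label})")
```

A logit of 0 corresponds to a probability of exactly 0.5, so thresholding the logit at 0 gives the same labels. The last example sits just above that boundary, so the model is close to undecided on it.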
