#### Long short-term memory (LSTM) networks

LSTM is a type of RNN that can retain long-term dependencies in sequence data. It uses a memory cell, regulated by gates, to control the flow of information.

<u>Memory cell gates</u>

* Input gate decides which information to store in the memory cell. It is trained to open when the input is important and close when it is not.
* Forget gate decides which information to discard from the memory cell. It is trained to discard information that is no longer important and to keep information that still is.
* Output gate decides which information from the memory cell to use for the output of the LSTM. It is trained to open when the information is important and close when it is not.
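
The three gates can be sketched as a single LSTM step on scalar values. This is only an illustration, not the Keras implementation: the helper `lstm_step`, the weight layout, and all numbers are assumptions chosen to make the gates easy to follow.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM step with scalar input, hidden state and cell state.

    `w` maps each block name to (input weight, recurrent weight, bias).
    """
    def pre(name):
        wx, wh, b = w[name]
        return wx * x + wh * h_prev + b

    i = sigmoid(pre("input"))         # how much new information to store
    f = sigmoid(pre("forget"))        # how much of the old cell state to keep
    o = sigmoid(pre("output"))        # how much of the cell state to output
    g = math.tanh(pre("candidate"))   # candidate value to store

    c = f * c_prev + i * g            # updated memory cell
    h = o * math.tanh(c)              # output, also the next hidden state
    return h, c

# Hypothetical weights: large biases push a gate almost fully open or closed
weights = {
    "input": (0.0, 0.0, -10.0),     # nearly closed: ignore the new input
    "forget": (0.0, 0.0, 10.0),     # nearly 1: keep the old memory
    "output": (0.0, 0.0, 10.0),     # nearly open
    "candidate": (1.0, 0.0, 0.0),
}

h, c = lstm_step(x=5.0, h_prev=0.0, c_prev=0.8, w=weights)
print(c)  # stays very close to the previous cell state 0.8
```

With the input gate closed and the forget gate near 1, the cell state is carried over almost unchanged, which is how an LSTM keeps information across many steps.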

Keep the 1000 most frequent words and replace the less frequent ones with 0

In [1]:
from keras.datasets import imdb

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words = 1000, oov_char = 0)

In [2]:
word_index = imdb.get_word_index()

In [3]:
print(len(x_train[0]), x_train[0], y_train[0])

218 [1, 14, 22, 16, 43, 530, 973, 0, 0, 65, 458, 0, 66, 0, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 0, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 0, 336, 385, 39, 4, 172, 0, 0, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 0, 19, 14, 22, 4, 0, 0, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 0, 4, 22, 17, 515, 17, 12, 16, 626, 18, 0, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 0, 0, 16, 480, 66, 0, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 0, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 0, 8, 4, 107, 117, 0, 15, 256, 4, 0, 7, 0, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 0, 0, 13, 104, 88, 4, 381, 15, 297, 98, 32, 0, 56, 26, 141, 6, 194, 0, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 0, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 0, 88, 12, 16, 283, 5, 16, 0, 113, 103, 32, 15, 16, 0, 19, 178, 32] 1
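
To read an encoded review, the indices can be mapped back to words. By default `imdb.load_data` shifts every word rank by 3 (`index_from = 3`) and uses 1 as the start marker. The sketch below assumes that default. `decode_review` and `toy_word_index` are hypothetical names, and the ranks in the toy dictionary are assumed for illustration; in the notebook the real mapping is `word_index`.

```python
def decode_review(encoded, word_index, index_from=3):
    """Map encoded indices back to words; reserved or unknown indices become '?'."""
    reverse_index = {rank + index_from: word for word, rank in word_index.items()}
    return " ".join(reverse_index.get(i, "?") for i in encoded)

# Toy stand-in for imdb.get_word_index(); ranks are assumed for illustration
toy_word_index = {"this": 11, "film": 19, "was": 13, "just": 40, "brilliant": 527}

# First tokens of the review printed above; 1 is the start marker
decoded = decode_review([1, 14, 22, 16, 43, 530], toy_word_index)
print(decoded)  # ? this film was just brilliant
```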


In [4]:
# start at infinity so that the first review length always replaces it
min_length = float("inf")
max_length = 0

for sequence in x_train:
  sequence_length = len(sequence)
  min_length = min(min_length, sequence_length)
  max_length = max(max_length, sequence_length)

min_length, max_length

(11, 2494)

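Pad reviews with length < 512 with 0s at the left. Reviews with length > 512 are truncated to 512, and by default `pad_sequences` drops tokens from the start.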

In [5]:
from tensorflow.keras.preprocessing.sequence import pad_sequences

max_sequence_length = 512

x_train = pad_sequences(x_train, maxlen = max_sequence_length)
x_test = pad_sequences(x_test, maxlen = max_sequence_length)
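
What `pad_sequences` does to a single review with its default settings (`padding = "pre"`, `truncating = "pre"`) can be mimicked in plain Python. `pad_pre` is a hypothetical helper written only for this illustration; it is not part of Keras.

```python
def pad_pre(sequence, maxlen, value=0):
    """Left-pad with `value`, or keep only the last `maxlen` items."""
    trimmed = sequence[-maxlen:]
    return [value] * (maxlen - len(trimmed)) + trimmed

short = pad_pre([5, 6, 7], maxlen=5)
long = pad_pre([1, 2, 3, 4, 5, 6, 7], maxlen=5)

print(short)  # [0, 0, 5, 6, 7]
print(long)   # [3, 4, 5, 6, 7]
```

Short reviews keep all of their tokens at the right end, while long reviews keep only their last tokens.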

##### Embeddings

Embeddings are vector representations of values or objects like text, images, and audio, designed to be consumed by machine learning models.

They allow models to find similar objects like photos or documents, and make it possible to capture the relationship between words or other objects.

For example, documents near each other in an embedding space may be relevant to each other.

The **Embedding** layer converts every word index in the sequence into a numeric array (vector)

* input_dim: vocabulary size (number of distinct word indices)
* output_dim: size of the vector produced for each word
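
Conceptually, the layer is a lookup table with `input_dim` rows and `output_dim` columns. A minimal sketch in plain Python, with tiny sizes and random values standing in for the weights Keras would learn (`embedding_table` and `embed` are hypothetical names):

```python
import random

random.seed(0)
vocab_size, vector_size = 6, 4   # small stand-ins for input_dim and output_dim

# One row of weights per word index; Keras learns these values during training
embedding_table = [
    [random.uniform(-0.05, 0.05) for _ in range(vector_size)]
    for _ in range(vocab_size)
]

def embed(sequence):
    """Replace every word index with its row from the table."""
    return [embedding_table[index] for index in sequence]

vectors = embed([0, 3, 3, 5])
print(len(vectors), len(vectors[0]))  # 4 4
```

The same index always maps to the same vector, so a sequence of 512 indices becomes a 512 x `output_dim` matrix.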

In [6]:
from keras.models import Sequential
from keras.layers import LSTM, Dense, Embedding, Input

# must match num_words = 1000 used in imdb.load_data
input_dim = 1000
output_dim = 128

model = Sequential([
  Input(shape = (max_sequence_length, )),
  Embedding(input_dim = input_dim, output_dim = output_dim),
  LSTM(128),
  Dense(1, activation = "sigmoid")
])

model.compile(optimizer = "adam", loss = "binary_crossentropy", metrics = ["accuracy"])
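
The number of trainable parameters in this model can be estimated by hand. The sketch below assumes the default layer settings (biases enabled), and the helper names are hypothetical. The result can be compared with the output of `model.summary()`.

```python
def embedding_params(input_dim, output_dim):
    # one vector of size output_dim per word index
    return input_dim * output_dim

def lstm_params(input_size, units):
    # 4 blocks (input, forget and output gates plus the candidate), each with
    # input weights, recurrent weights and a bias vector
    return 4 * (units * (input_size + units) + units)

def dense_params(input_size, units):
    return input_size * units + units

counts = {
    "embedding": embedding_params(1000, 128),
    "lstm": lstm_params(128, 128),
    "dense": dense_params(128, 1),
}
total = sum(counts.values())

print(counts)  # {'embedding': 128000, 'lstm': 131584, 'dense': 129}
print(total)   # 259713
```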

In [7]:
model_output = model.fit(x_train, y_train, batch_size = 32, epochs = 20)

Epoch 1/20
782/782 ━━━━━━━━━━━━━━━━━━━━ 379s 480ms/step - accuracy: 0.6738 - loss: 0.5986
Epoch 2/20
782/782 ━━━━━━━━━━━━━━━━━━━━ 335s 429ms/step - accuracy: 0.8038 - loss: 0.4372
Epoch 3/20
782/782 ━━━━━━━━━━━━━━━━━━━━ 374s 478ms/step - accuracy: 0.7866 - loss: 0.4495
Epoch 4/20
782/782 ━━━━━━━━━━━━━━━━━━━━ 467s 586ms/step - accuracy: 0.8777 - loss: 0.2908
Epoch 5/20
782/782 ━━━━━━━━━━━━━━━━━━━━ 349s 447ms/step - accuracy: 0.8902 - loss: 0.2714
Epoch 6/20
782/782 ━━━━━━━━━━━━━━━━━━━━ 437s 559ms/step - accuracy: 0.8965 - loss: 0.2578
Epoch 7/20
782/782 ━━━━━━━━━━━━━━━━━━━━ 448s 573ms/step - accuracy: 0.9068 - loss: 0.2304
Epoch 8/20
782/782 ━━━━━━━━━━━━━━━━━━━━ 421s 539ms/step - accuracy: 0.9154 - loss: 0.2172
Epoch 9/

In [8]:
loss, accuracy = model.evaluate(x_test, y_test)

loss, accuracy

782/782 ━━━━━━━━━━━━━━━━━━━━ 151s 193ms/step - accuracy: 0.8605 - loss: 0.5990


(0.5712215900421143, 0.8644800186157227)

In [9]:
model.save("18-dumps/model.keras")

In [24]:
def process_review(review):
  review_words = review.lower().split()

  # the first 3 indexes are reserved in the imdb dataset
  # for example: "the" is the most frequent word, but actually its index is 4
  review = [word_index[word] + 3 if word in word_index else 0 for word in review_words]
  # keep only indexes below num_words (1000), as imdb.load_data does;
  # index 1000 itself would be outside the Embedding layer's range
  review = [x if x < 1000 else 0 for x in review]
  review = pad_sequences([review], maxlen = max_sequence_length)

  return review
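
The encoding step can be traced on a toy vocabulary. This is a sketch under stated assumptions: `encode` and `toy_word_index` are hypothetical, and the ranks are assumed for illustration. It shows the offset of 3 and what happens when punctuation stays attached to a word.

```python
# Toy stand-in for imdb.get_word_index(); ranks are assumed for illustration
toy_word_index = {"the": 1, "was": 13, "movie": 17, "great": 84}

def encode(review, word_index, num_words=1000, index_from=3, oov=0):
    """Encode a review the way the notebook does, one word at a time."""
    encoded = []
    for word in review.lower().split():
        rank = word_index.get(word)
        index = rank + index_from if rank is not None else oov
        encoded.append(index if index < num_words else oov)
    return encoded

clean = encode("The movie was great", toy_word_index)
with_punctuation = encode("The movie was great!", toy_word_index)

print(clean)             # [4, 20, 16, 87]
print(with_punctuation)  # [4, 20, 16, 0]
```

Because the text is split on whitespace only, a token such as `great!` is not found in the index and becomes 0. Stripping punctuation before the lookup is one way to avoid losing such words.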

In [15]:
from keras.models import load_model

model = load_model("18-dumps/model.keras")

In [16]:
def predict_sentiment(review):
  review = process_review(review)

  predictions = model.predict(review)

  # predictions has shape (1, 1): one review, one sigmoid output
  return "Positive" if predictions[0][0] > 0.5 else "Negative"

In [25]:
sample_data = [
  "The food was fantastic",
  "The movie was terrible",
  "I love this place, it is a great one"
]

predictions = [predict_sentiment(review) for review in sample_data]

for review, prediction in zip(sample_data, predictions):
  print(f"{prediction} sentiment: {review}")

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 180ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 57ms/step
Negative sentiment: The food was fantastic
Negative sentiment: The movie was terrible
Positive sentiment: I love this place, it is a great one
