# Sentiment Analysis using LSTM in TensorFlow


# How to create a Neural Network with LSTM layers in TensorFlow and Keras
Now that we understand how LSTMs work and how they are represented within TensorFlow, it’s time to actually build one with Python, TensorFlow and its Keras API. We’ll walk through the process step by step. It consists of the following steps:
<ol>
<li>Importing the Keras functionality that we need into the Python script.</li>
<li>Listing the configuration for our LSTM model and preparing for training.</li>
<li>Loading and preparing a dataset; we’ll use the IMDB dataset today.</li>
<li>Defining the Keras model.</li>
<li>Compiling the Keras model.</li>
<li>Training the Keras model.</li>
<li>Evaluating the Keras model.</li>
</ol>

## Defining the model imports
Let’s specify the model imports first:

In [1]:
import tensorflow as tf
from tensorflow.keras.datasets import imdb
from tensorflow.keras.layers import Embedding, Dense, LSTM
from tensorflow.keras.losses import BinaryCrossentropy
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.sequence import pad_sequences


<ul>
<li>We’ll need TensorFlow, so we import it as tf.</li>
<li>From the TensorFlow Keras Datasets, we import the imdb one.</li>
<li>We’ll need word embeddings (Embedding), fully connected layers (Dense) and LSTM layers (LSTM), so we import them as well.</li>
<li>Our loss function will be binary cross entropy.</li>
<li>As we’ll stack all layers on top of each other with model.add, we need Sequential (the Keras Sequential API) for constructing our model variable in the first place.</li>
<li>For optimization we use an extension of classic gradient descent called Adam.</li>
<li>Finally, we need to import pad_sequences. The IMDB dataset consists of reviews encoded as sequences of word indices, and these sequences differ in length. We’ll specify a maximum length: longer reviews are truncated to it, while shorter ones are padded with zeroes so that all sequences end up equally long.</li></ul>

## Listing model configuration

The next step is specifying the model configuration. While not strictly necessary (we could also hardcode the values), I think it’s a good idea to group the settings together. This way, you can see how your model is configured at a glance, without having to search through the rest of the code.

Below, we can see that our model will be trained with a batch size of 128, using binary crossentropy loss and Adam optimization, and only for five epochs (we only have to show you that it works). 20% of our training data will be used for validation purposes, and the output will be verbose, with verbosity mode set to 1 (the options are 0, 1 and 2). Our learned word embedding will have 15 dimensions, and each sequence passed through the model contains at most 300 tokens (words, not characters). Our vocabulary will contain at most 5000 distinct words.

In [2]:
# Model configuration
additional_metrics = ['accuracy']
batch_size = 128
embedding_output_dims = 15
loss_function = BinaryCrossentropy()
max_sequence_length = 300
num_distinct_words = 5000
number_of_epochs = 5
optimizer = Adam()
validation_split = 0.20
verbosity_mode = 1

You might also want to disable eager execution in TensorFlow. It doesn’t help for everyone, but some people report that the training process speeds up once eager execution is disabled. It’s not necessary to do so; simply test how it behaves on your machine:

In [3]:
# Disable eager execution
tf.compat.v1.disable_eager_execution()

## Loading and preparing the data
Once this is complete, we can load and prepare the data. To make things easier, Keras comes with a standard set of datasets, of which the IMDB dataset can be used for sentiment analysis (essentially text classification with two classes). Using imdb.load_data(...), we can load the data.

Once the data has been loaded, we apply pad_sequences. This brings all reviews to the same length: reviews shorter than the maximum sequence length are padded with zeroes, a value that is commonly reserved for the padding token, and longer reviews are truncated.

In [4]:
# Load dataset
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=num_distinct_words)
print(x_train.shape)
print(x_test.shape)

# Pad all sequences
padded_inputs = pad_sequences(x_train, maxlen=max_sequence_length, value=0.0)  # 0.0 because it corresponds with <PAD>
padded_inputs_test = pad_sequences(x_test, maxlen=max_sequence_length, value=0.0)  # 0.0 because it corresponds with <PAD>

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
(25000,)
(25000,)


In [12]:
len(x_train[890])

129

In [13]:
len(padded_inputs[890])

300
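
The effect shown above, where a review of 129 tokens becomes a sequence of 300, can be sketched in plain Python. The helper pad_pre is hypothetical and only mimics what pad_sequences does with its default settings, where both padding and truncation happen at the start of the sequence. The token values are made up for illustration.

```python
def pad_pre(sequence, maxlen, value=0):
    """Hypothetical helper: pad or truncate one sequence at its start."""
    # Keep only the last `maxlen` tokens if the sequence is too long
    truncated = sequence[-maxlen:]
    # Prepend the padding value until the sequence has `maxlen` items
    return [value] * (maxlen - len(truncated)) + truncated


# Made-up token indices: one short and one long "review"
short_review = list(range(1, 130))  # 129 tokens
long_review = list(range(1, 401))   # 400 tokens

padded_short = pad_pre(short_review, 300)
padded_long = pad_pre(long_review, 300)

print(len(padded_short), padded_short[:3], padded_short[-3:])
print(len(padded_long), padded_long[:3])
```

The short review gets 171 leading zeroes, while the long one loses its first 100 tokens.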

## Defining the Keras model
We can then define the Keras model. As we are using the Sequential API, we can initialize the model variable with Sequential(). The first layer is an Embedding layer, which learns a word embedding that in our case has a dimensionality of 15. This is followed by an LSTM layer with 10 units, which provides the recurrent part of the model (with the default tanh activation), and a Dense layer with a single output. Its Sigmoid activation turns that output into a number between 0 and 1, which indicates which of the two classes the review leans towards.

In [14]:
# Define the Keras model
model = Sequential()
model.add(Embedding(num_distinct_words, embedding_output_dims, input_length=max_sequence_length))
model.add(LSTM(10))
model.add(Dense(1, activation='sigmoid'))
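
To make the role of the final Sigmoid activation concrete, here is a minimal sketch in plain Python. It assumes the usual conventions: label 1 stands for a positive review in the IMDB dataset, and an output of 0.5 or higher is counted as class 1. The input values are made up.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Made-up raw outputs of the Dense layer before the activation
for z in (-3.0, 0.0, 3.0):
    probability = sigmoid(z)
    # Assumed decision rule: 0.5 or higher means class 1 (positive)
    label = 'positive' if probability >= 0.5 else 'negative'
    print(f'z={z:+.1f} -> {probability:.3f} -> {label}')
```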

## Compiling the Keras model
The model can then be compiled. This configures the model for training; so far, it has only been a skeleton. We do so by passing the optimizer, the loss function and the additional metrics that we specified before.

In [15]:
# Compile the model
model.compile(optimizer=optimizer, loss=loss_function, metrics=additional_metrics)
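
As a rough illustration of what the binary crossentropy loss measures, the sketch below computes it by hand for a few made-up predictions. The function binary_cross_entropy is a hypothetical helper, not the Keras implementation, and the clipping constant eps is an assumption added to avoid taking the logarithm of zero.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Hypothetical helper: mean binary cross entropy over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        # Clip predictions away from 0 and 1 (assumed safeguard)
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0]
# Confident and correct predictions give a low loss
good_loss = binary_cross_entropy(labels, [0.9, 0.1])
# Confident but wrong predictions give a high loss
bad_loss = binary_cross_entropy(labels, [0.1, 0.9])
print(good_loss, bad_loss)
```

The loss penalizes predictions more the further they are from the true label, which is what the optimizer tries to reduce during training.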

This is also a good place to generate a summary of what the model looks like.

In [16]:
# Give a summary
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, 300, 15)           75000     
_________________________________________________________________
lstm (LSTM)                  (None, 10)                1040      
_________________________________________________________________
dense (Dense)                (None, 1)                 11        
Total params: 76,051
Trainable params: 76,051
Non-trainable params: 0
_________________________________________________________________
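
The parameter counts in the summary above can be checked by hand. The sketch below assumes the standard LSTM layout: four sets of weights (three gates plus the candidate cell state), each with input weights, recurrent weights and a bias. The variable lstm_units is a name introduced here for the value 10 passed to the LSTM layer.

```python
num_distinct_words = 5000
embedding_output_dims = 15
lstm_units = 10  # the value passed to LSTM(10)

# Embedding: one vector of 15 values per word in the vocabulary
embedding_params = num_distinct_words * embedding_output_dims

# LSTM: four weight sets, each with input weights, recurrent weights and a bias
per_weight_set = (embedding_output_dims + lstm_units) * lstm_units + lstm_units
lstm_params = 4 * per_weight_set

# Dense: one weight per LSTM unit plus a single bias
dense_params = lstm_units * 1 + 1

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)
```

These values match the 75000, 1040, 11 and 76,051 reported by model.summary().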


## Training the Keras model
Then, we can instruct TensorFlow to start the training process.

In [17]:
# Train the model
history = model.fit(padded_inputs, y_train, batch_size=batch_size, epochs=number_of_epochs, verbose=verbosity_mode, validation_split=validation_split)

Train on 20000 samples, validate on 5000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


The (input, output) pairs passed to the model are the padded inputs and their corresponding class labels. Training happens with the batch size, number of epochs, verbosity mode and validation split that were also defined in the configuration section above.
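
The numbers in the training output can be reproduced with a short sketch. It assumes the behaviour described in the Keras documentation for validation_split, namely that the validation samples are taken from the end of the data before any shuffling. The index lists only stand in for the actual reviews.

```python
import math

num_reviews = 25000       # size of the IMDB training set loaded earlier
validation_split = 0.20
batch_size = 128

num_validation = int(num_reviews * validation_split)
num_training = num_reviews - num_validation

# Stand-in for the data: sample positions instead of real reviews
sample_indices = list(range(num_reviews))
training_indices = sample_indices[:num_training]    # first 80%
validation_indices = sample_indices[num_training:]  # last 20%

# Number of weight updates per epoch; the last batch is smaller
batches_per_epoch = math.ceil(num_training / batch_size)

print(num_training, num_validation, batches_per_epoch)
```

This gives the 20000 training and 5000 validation samples reported above, processed in 157 batches per epoch.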

## Evaluating the Keras model
We should not evaluate the model on the same data that was used for training it, as that would not tell us how well it generalizes. Fortunately, load_data(...) already returned a separate test set, so we can use the built-in evaluation facilities to evaluate the model on it. We then print the test results on screen.

In [18]:
# Test the model after training
test_results = model.evaluate(padded_inputs_test, y_test, verbose=False)
print(f'Test results - Loss: {test_results[0]} - Accuracy: {100*test_results[1]}%')

Test results - Loss: 0.3647162175369263 - Accuracy: 85.40400266647339%


## Summary
Long Short-Term Memory Networks (LSTMs) are a type of recurrent neural network that can be used in Natural Language Processing, time series and other sequence modeling tasks. In this article, we covered their usage within TensorFlow and Keras in a step-by-step fashion.

We first briefly looked at LSTMs in general. What are they? What can they be used for? How do they improve compared to previous RNN based approaches? This analysis gives you the necessary context in order to understand what is going on within your code.

We then looked at how LSTMs are represented in TensorFlow and Keras. We saw that there is a separate LSTM layer that can be configured with a wide variety of attributes. In the article, we looked at the meaning of each attribute and saw how everything interrelates. Once we understood this, we moved on to actually implementing the model with TensorFlow. Step by step, we explained why we made certain choices, allowing you to see exactly how the model was constructed.

After training on the IMDB dataset, we saw that the model achieves an accuracy of approximately 85.40% on the test set.