# Sentiment Analysis with TensorFlow Hub

In [1]:
# In this article, I explain sentiment analysis with TensorFlow Hub using the IMDB dataset.

In [2]:
# TensorFlow Hub is a repository of pre-trained machine learning models.
# You can reuse models from TensorFlow Hub, such as BERT, in your own work with just a few lines of code.

## What is Sentiment Analysis?
Analyzing whether a text is positive or negative is called sentiment analysis. Sentiment analysis is a subfield of natural language processing (NLP) and one of the most popular areas of NLP today.
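To make the task concrete, here is a deliberately naive, rule-based sketch that labels a text by counting words from two small word lists. The word lists and the `naive_sentiment` function are hypothetical and only illustrate the input/output of sentiment analysis (text in, 0 or 1 out); the model in this notebook learns these associations from labeled reviews instead of using hand-written lists.

```python
# Hypothetical, tiny word lists for illustration only.
POSITIVE_WORDS = {"great", "good", "excellent", "enjoyed", "wonderful"}
NEGATIVE_WORDS = {"terrible", "bad", "worst", "boring", "pathetic"}


def naive_sentiment(text: str) -> int:
    """Return 1 (positive) or 0 (negative) by counting word-list hits.

    Ties (including texts with no listed words) are treated as negative.
    """
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    return 1 if score > 0 else 0


print(naive_sentiment("A great film with excellent acting."))     # 1
print(naive_sentiment("This was an absolutely terrible movie."))  # 0
```

Rules like this break down quickly (negation, sarcasm, unseen words), which is why learned models are used in practice.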

In [3]:
# Install TensorFlow Hub and TensorFlow Datasets
!pip install -q tensorflow-hub       # pre-trained models from TensorFlow Hub
!pip install -q tensorflow-datasets  # ready-to-use datasets, including IMDB reviews

In [4]:
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds

print("Version: ", tf.__version__)
print("Eager mode: ", tf.executing_eagerly())
print("Hub version: ", hub.__version__)
print("GPU is", "available" if tf.config.list_physical_devices("GPU") else "NOT AVAILABLE")

Version:  2.5.0
Eager mode:  True
Hub version:  0.12.0
GPU is available


## Downloading the IMDB dataset

The IMDB dataset is available in [TensorFlow Datasets](https://www.tensorflow.org/datasets) as [imdb_reviews](https://www.tensorflow.org/datasets/catalog/imdb_reviews). The following code downloads it and splits it into training, validation and test sets.

In [5]:
# Split the training set into 60% and 40% to end up with 15,000 examples
# for training, 10,000 examples for validation and 25,000 examples for testing.
train_data, validation_data, test_data = tfds.load(
    name="imdb_reviews", 
    split=('train[:60%]', 'train[60%:]', 'test'),
    as_supervised=True)
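The comment in the cell above states the resulting split sizes. A quick check of that arithmetic in plain Python, assuming the 25,000 training and 25,000 test reviews that the IMDB reviews dataset provides (60% of 25,000 is a whole number, so no rounding is involved):

```python
# Sizes of the labeled IMDB reviews splits.
TRAIN_TOTAL = 25_000
TEST_TOTAL = 25_000

train_size = TRAIN_TOTAL * 60 // 100          # 'train[:60%]'
validation_size = TRAIN_TOTAL - train_size    # 'train[60%:]'

print(train_size, validation_size, TEST_TOTAL)  # 15000 10000 25000
```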

## Exploring the Dataset 

Each example in the IMDB dataset consists of a movie review and its label. The label is either 0 or 1: 0 means the review is negative and 1 means it is positive.

Let's look at the first 10 examples.

In [6]:
train_examples_batch, train_labels_batch = next(iter(train_data.batch(10)))
train_examples_batch

<tf.Tensor: shape=(10,), dtype=string, numpy=
array([b"This was an absolutely terrible movie. Don't be lured in by Christopher Walken or Michael Ironside. Both are great actors, but this must simply be their worst role in history. Even their great acting could not redeem this movie's ridiculous storyline. This movie is an early nineties US propaganda piece. The most pathetic scenes were those when the Columbian rebels were making their cases for revolutions. Maria Conchita Alonso appeared phony, and her pseudo-love affair with Walken was nothing but a pathetic emotional plug in a movie that was devoid of any real meaning. I am disappointed that there are movies like this, ruining actor's like Christopher Walken's good name. I could barely sit through it.",
       b'I have been known to fall asleep during films, but this is usually due to a combination of things including, really tired, being warm and comfortable on the sette and having just eaten a lot. However on this occasion I fell 

Let's also print the first 10 labels.

In [7]:
train_labels_batch

<tf.Tensor: shape=(10,), dtype=int64, numpy=array([0, 0, 0, 1, 1, 1, 0, 0, 0, 0])>
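The 0/1 labels are easier to read once mapped to names. This sketch copies the ten labels printed above into a Python list (in the notebook, `train_labels_batch.numpy().tolist()` would produce the same list) and counts each class. The `LABEL_NAMES` mapping is just the convention described above, not something stored in the dataset.

```python
from collections import Counter

# Labels copied from the output above: 0 = negative, 1 = positive.
labels = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
LABEL_NAMES = {0: "negative", 1: "positive"}

counts = Counter(LABEL_NAMES[label] for label in labels)
print(counts)  # Counter({'negative': 7, 'positive': 3})
```

A batch of 10 is too small to say anything about class balance; the full IMDB training set is split evenly between positive and negative reviews.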

## Building the Model

The first layer of this model is a pre-trained text embedding layer. It takes raw text as input and handles the preprocessing itself, so we can skip writing our own text preprocessing, which is usually one of the most tedious steps in an NLP project.

For this analysis, I'll use a **pre-trained text embedding model** from [TensorFlow Hub](https://tfhub.dev) called [google/nnlm-en-dim50/2](https://tfhub.dev/google/nnlm-en-dim50/2).
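Roughly speaking, and as I understand the model's TF Hub description, this module splits each input sentence on spaces, looks up a 50-dimensional vector for each token, and combines the token vectors with a "sqrtn" combiner (the sum divided by the square root of the token count). The sketch below imitates that pipeline with a hypothetical 3-dimensional vocabulary; `TOY_EMBEDDINGS` and `embed_sentence` are illustrative names, and mapping unknown words to zeros is a simplifying assumption (the real module handles out-of-vocabulary words differently).

```python
import math

# Hypothetical 3-dimensional vocabulary; the real module uses 50 dimensions
# and a vocabulary learned from a large English corpus.
TOY_EMBEDDINGS = {
    "great": [0.9, 0.1, 0.0],
    "movie": [0.1, 0.5, 0.2],
    "terrible": [-0.8, 0.2, 0.1],
}
DIM = 3


def embed_sentence(text: str) -> list[float]:
    """Split on spaces, look up word vectors, and combine them with a sqrt(n) sum."""
    tokens = text.split(" ")
    # Assumption: unknown words map to a zero vector in this sketch.
    vectors = [TOY_EMBEDDINGS.get(tok, [0.0] * DIM) for tok in tokens]
    total = [sum(v[i] for v in vectors) for i in range(DIM)]
    scale = math.sqrt(len(tokens))
    return [x / scale for x in total]


print(embed_sentence("great movie"))
```

Whatever the length of the review, the output has a fixed size (here 3, in the real layer 50), which is what lets the Dense layers that follow accept variable-length text.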

Let's first create a Keras layer from this model and look at the embeddings it produces for the first three examples in the training batch.

In [8]:
embedding = "https://tfhub.dev/google/nnlm-en-dim50/2"
hub_layer = hub.KerasLayer(embedding, input_shape=[], 
                           dtype=tf.string, trainable=True)
hub_layer(train_examples_batch[:3])

<tf.Tensor: shape=(3, 50), dtype=float32, numpy=
array([[ 0.5423195 , -0.0119017 ,  0.06337538,  0.06862972, -0.16776837,
        -0.10581174,  0.16865303, -0.04998824, -0.31148055,  0.07910346,
         0.15442263,  0.01488662,  0.03930153,  0.19772711, -0.12215476,
        -0.04120981, -0.2704109 , -0.21922152,  0.26517662, -0.80739075,
         0.25833532, -0.3100421 ,  0.28683215,  0.1943387 , -0.29036492,
         0.03862849, -0.7844411 , -0.0479324 ,  0.4110299 , -0.36388892,
        -0.58034706,  0.30269456,  0.3630897 , -0.15227164, -0.44391504,
         0.19462997,  0.19528408,  0.05666234,  0.2890704 , -0.28468323,
        -0.00531206,  0.0571938 , -0.3201318 , -0.04418665, -0.08550783,
        -0.55847436, -0.23336391, -0.20782952, -0.03543064, -0.17533456],
       [ 0.56338924, -0.12339553, -0.10862679,  0.7753425 , -0.07667089,
        -0.15752277,  0.01872335, -0.08169781, -0.3521876 ,  0.4637341 ,
        -0.08492756,  0.07166859, -0.00670817,  0.12686075, -0.19326553,
 

Now let's build the full model using the Keras Sequential API.

In [9]:
model = tf.keras.Sequential()
model.add(hub_layer)
model.add(tf.keras.layers.Dense(16, activation='relu'))
model.add(tf.keras.layers.Dense(1))

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
keras_layer (KerasLayer)     (None, 50)                48190600  
_________________________________________________________________
dense (Dense)                (None, 16)                816       
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 17        
Total params: 48,191,433
Trainable params: 48,191,433
Non-trainable params: 0
_________________________________________________________________
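The parameter counts in the summary can be reproduced by hand: a Dense layer has one weight per input/unit pair plus one bias per unit. All parameters are trainable here because the hub layer was created with trainable=True. Dividing the embedding layer's count by 50 suggests a table of 963,812 rows; that figure is my inference from the numbers, not a value taken from the model's documentation.

```python
def dense_params(inputs: int, units: int) -> int:
    """A Dense layer has inputs * units weights plus one bias per unit."""
    return inputs * units + units


EMBEDDING_PARAMS = 48_190_600  # taken from the summary output (keras_layer)
EMBEDDING_DIM = 50

hidden = dense_params(EMBEDDING_DIM, 16)  # 50 * 16 + 16 = 816
output = dense_params(16, 1)              # 16 * 1 + 1 = 17
total = EMBEDDING_PARAMS + hidden + output

print(hidden, output, total)               # 816 17 48191433
print(EMBEDDING_PARAMS // EMBEDDING_DIM)   # 963812 rows in the embedding table
```

Almost all of the model's parameters live in the pre-trained embedding; the classifier head on top is tiny.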


## Compiling the Model

Now we can compile the model with the compile() method. When compiling, we need to specify a loss function and an optimizer. Since this is a binary classification problem (positive vs. negative), we use binary cross-entropy as the loss. The last Dense layer has no activation function, so the model outputs raw scores (logits) rather than probabilities; that is why the loss is created with from_logits=True.
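To see what from_logits=True means, this plain-Python sketch computes binary cross-entropy two ways: on the probability sigmoid(z), and directly on the logit z using the numerically stable rearrangement max(z, 0) - z*y + log(1 + exp(-|z|)), which is the form TensorFlow documents for `tf.nn.sigmoid_cross_entropy_with_logits`. The function names are illustrative, not Keras internals.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def bce_from_probability(y: int, p: float) -> float:
    """Textbook binary cross-entropy for a label y in {0, 1} and probability p in (0, 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def bce_from_logit(y: int, z: float) -> float:
    """Same loss computed from the raw logit z, without forming sigmoid(z) first."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


for y, z in [(1, 2.0), (0, 2.0), (1, -0.5)]:
    print(y, z, bce_from_logit(y, z), bce_from_probability(y, sigmoid(z)))
```

Both forms agree for moderate logits, but only the logit form survives extreme values: for y = 0 and z = 1000, bce_from_logit returns 1000.0, while the probability form rounds sigmoid(1000) to 1.0 and then fails on math.log(0.0).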

We'll use Adam, a widely used optimizer, and track the accuracy metric to monitor the model's performance at the end of each epoch.
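Because the model outputs logits, accuracy depends on where the logits are cut into 0/1 predictions. A logit of 0 corresponds to a probability of sigmoid(0) = 0.5, so the natural cut-off on logits is 0. If I read Keras' behavior correctly, the string 'accuracy' with this loss resolves to binary accuracy with a 0.5 threshold applied to the raw outputs, i.e. a logit cut-off of 0.5; `tf.keras.metrics.BinaryAccuracy(threshold=0.0)` makes the cut-off explicit. The plain-Python sketch below (hypothetical logits) shows how the threshold changes the result.

```python
def binary_accuracy_from_logits(labels, logits, threshold=0.0):
    """Fraction of examples whose thresholded logit matches the 0/1 label.

    threshold=0.0 on logits is equivalent to a 0.5 cut-off on probabilities.
    """
    predictions = [1 if z > threshold else 0 for z in logits]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


labels = [0, 1, 1, 0]
logits = [-1.2, 0.3, 2.5, -0.4]   # hypothetical model outputs
print(binary_accuracy_from_logits(labels, logits))                 # 1.0
print(binary_accuracy_from_logits(labels, logits, threshold=0.5))  # 0.75
```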

In [10]:
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])

## Training the Model

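We train the model by calling the fit() method. The training data is shuffled and split into batches of 512 reviews, the model is trained for 10 epochs, and the validation data is used to monitor its performance after each epoch.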

In [11]:
history = model.fit(train_data.shuffle(10000).batch(512),
                    epochs=10,
                    validation_data=validation_data.batch(512),
                    verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
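train_data.shuffle(10000) does not shuffle all 15,000 training examples at once. The tf.data documentation describes a buffer: fill it with buffer_size elements, emit a randomly chosen one, and replace it with the next input element. The generator below is a plain-Python sketch of that idea, not TensorFlow's implementation; `buffered_shuffle` is a hypothetical name.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in a locally shuffled order using a fixed-size buffer."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    iterator = iter(items)
    buffer = []
    # Fill the buffer with the first buffer_size elements.
    for item in iterator:
        buffer.append(item)
        if len(buffer) == buffer_size:
            break
    # Emit a random buffered element and put the next input element in its slot.
    for item in iterator:
        index = rng.randrange(len(buffer))
        yield buffer[index]
        buffer[index] = item
    # Input exhausted: drain the remaining buffer in random order.
    rng.shuffle(buffer)
    yield from buffer


order = list(buffered_shuffle(range(10), buffer_size=3, seed=0))
print(order)
```

One consequence: the first element emitted always comes from the first buffer_size inputs. With a buffer of 10,000 and 15,000 examples the mixing is good, though not a full uniform shuffle; a buffer of 1 would not shuffle at all.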


## Evaluating the Model

Let's measure the model's accuracy on the test data by calling the evaluate() method.

In [12]:
results = model.evaluate(test_data.batch(512), verbose=2)

for name, value in zip(model.metrics_names, results):
  print("%s: %.3f" % (name, value))

49/49 - 2s - loss: 0.3545 - accuracy: 0.8566
loss: 0.355
accuracy: 0.857
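The "49/49" in the evaluation log is the number of test batches. Dataset.batch() keeps a final partial batch by default, so the number of steps is the split size divided by 512, rounded up. Using the split sizes from the loading step:

```python
import math

BATCH_SIZE = 512
split_sizes = {"train": 15_000, "validation": 10_000, "test": 25_000}

# A final partial batch is kept, so round up.
steps = {name: math.ceil(size / BATCH_SIZE) for name, size in split_sizes.items()}
print(steps)  # {'train': 30, 'validation': 20, 'test': 49}
```

So each training epoch above runs 30 steps, and validation runs 20.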


## Conclusion
In this post, I trained a machine learning model for sentiment analysis using the pre-trained nnlm-en-dim50/2 embedding layer from TensorFlow Hub. The model reached about 99 percent accuracy on the training set and about 86 percent on the test set.

Since the model is much more accurate on the training set than on the test set, it is overfitting: it has memorized the training data rather than learning patterns that generalize well to new reviews. Regularization techniques such as dropout can help reduce overfitting. I hope you enjoyed this article.
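Dropout randomly zeroes a fraction of a layer's activations during training, so the network cannot rely on any single unit. Keras' Dropout layer uses the "inverted" variant: surviving values are scaled by 1/(1 - rate) during training and the layer does nothing at inference time. Below is a plain-Python sketch of that behavior (`inverted_dropout` is a hypothetical helper). In this notebook's model, one option would be to add tf.keras.layers.Dropout(0.5) between the Dense(16) layer and the output layer; I haven't tested how much that closes the gap.

```python
import random


def inverted_dropout(values, rate, training=True, seed=None):
    """Zero each value with probability `rate` and scale survivors by 1 / (1 - rate).

    The scaling keeps the expected activation unchanged, so at inference time
    (training=False) the values pass through untouched.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training:
        return list(values)
    rng = random.Random(seed)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


activations = [0.2, 1.5, 0.7, 0.9, 0.4]
print(inverted_dropout(activations, rate=0.5, seed=42))
print(inverted_dropout(activations, rate=0.5, training=False))  # unchanged
```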

## References

* https://www.tensorflow.org/tutorials/keras/text_classification_with_hub

Don't forget to follow Tirendaz Academy on [YouTube-Tr](https://youtube.com/c/tirendazakademi), [YouTube-Eng](https://www.youtube.com/channel/UCFU9Go20p01kC64w-tmFORw), [Twitter](https://twitter.com/TirendazAcademy), [Medium](https://tirendazacademy.medium.com), [GitHub](https://github.com/TirendazAcademy) and [LinkedIn](https://www.linkedin.com/in/tirendaz-academy).