# TensorFlow Hub Text Classification

Source: https://github.com/tensorflow/hub/blob/master/examples/colab/tf2_text_classification.ipynb

Dataset Information: https://www.kaggle.com/iarunava/imdb-movie-reviews-dataset


We are going to classify movie reviews as *positive* or *negative* using the text of the review. 

We'll use the [IMDB dataset](https://www.tensorflow.org/api_docs/python/tf/keras/datasets/imdb) that contains the text of 50,000 movie reviews from the [Internet Movie Database](https://www.imdb.com/). These are split into 25,000 reviews for training and 25,000 reviews for testing. 

This notebook uses [tf.keras](https://www.tensorflow.org/guide/keras), a high-level API to build and train models in TensorFlow, and [TensorFlow Hub](https://www.tensorflow.org/hub), a library and platform for transfer learning. 

Libraries we'll need:
* numpy
* TensorFlow
* TensorFlow Hub
* TensorFlow Datasets
* matplotlib (pyplot)

In [0]:
!pip install tensorflow==2.0.0-alpha0
!pip install "tensorflow-hub>=0.3"

In [0]:
from __future__ import absolute_import, division, print_function

import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds
import matplotlib.pyplot as plt

print("Version: ", tf.__version__)
print("Hub version: ", hub.__version__)

## Download the IMDB dataset

The IMDB dataset is available on [TensorFlow datasets](https://github.com/tensorflow/datasets). The following code downloads the IMDB dataset.

We will also need to convert the datasets into numpy arrays using [tfds.as_numpy](https://www.tensorflow.org/datasets/api_docs/python/tfds/as_numpy).

In [0]:
train_data, test_data = tfds.load(name="imdb_reviews", split=["train", "test"], 
                                  batch_size=-1, as_supervised=True)

train_examples, train_labels = tfds.as_numpy(train_data)
test_examples, test_labels = tfds.as_numpy(test_data)

## Explore the data 

Let's take a moment to understand the format of the data. Each example is a sentence representing the movie review and a corresponding label. The sentence is not preprocessed in any way. The label is an integer value of either 0 or 1, where 0 is a negative review, and 1 is a positive review.

In [0]:
print("Training entries: {}, test entries: {}".format(len(train_examples), len(test_examples)))

Let's print the first 10 examples.

In [0]:
train_examples[:10]

Let's also print the first 10 labels.

In [0]:
train_labels[:10]

## Build the model

In this example, the input data consists of sentences. The labels to predict are either 0 or 1.

One way to represent the text is to convert sentences into embedding vectors. We can use a pre-trained text embedding as the first layer, which will have two advantages:
*   we don't have to worry about text preprocessing,
*   we can benefit from transfer learning.

We need embeddings because a neural network operates on numbers, but our dataset is text. Embedding the text converts each review into a numeric vector that the model can process.
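To make the idea concrete, here is a minimal sketch of a sentence embedding. The vocabulary, the 3-dimensional vectors, and the `embed_sentence` helper are all made up for illustration; the pre-trained Hub model has its own vocabulary, dimensionality, and way of combining word vectors, which may differ from the simple average used here.

```python
# Toy illustration only: the vocabulary and 3-dimensional vectors below are
# invented and are not taken from any TensorFlow Hub model.
TOY_EMBEDDINGS = {
    "great": [0.9, 0.1, 0.0],
    "movie": [0.1, 0.5, 0.2],
    "boring": [-0.8, 0.2, 0.1],
}
UNKNOWN = [0.0, 0.0, 0.0]  # fallback vector for out-of-vocabulary words


def embed_sentence(sentence):
    """Average the word vectors so any sentence maps to a fixed-length vector."""
    words = sentence.lower().split()
    vectors = [TOY_EMBEDDINGS.get(word, UNKNOWN) for word in words]
    if not vectors:
        return list(UNKNOWN)
    dim = len(UNKNOWN)
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


# Sentences of different lengths both become vectors of length 3.
print(embed_sentence("Great movie"))
print(embed_sentence("boring movie with a slow plot"))
```

Whatever the length of the review, the result has the same number of entries, which is what lets a dense layer consume it.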

For this example we will use a model from [TensorFlow Hub](https://www.tensorflow.org/hub) called [google/tf2-preview/gnews-swivel-20dim/1](https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1).


If you want to play around, there are three other models you can test:
* [google/tf2-preview/gnews-swivel-20dim-with-oov/1](https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim-with-oov/1) - same as [google/tf2-preview/gnews-swivel-20dim/1](https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1), but with 2.5% vocabulary converted to OOV buckets. This can help if vocabulary of the task and vocabulary of the model don't fully overlap.
* [google/tf2-preview/nnlm-en-dim50/1](https://tfhub.dev/google/tf2-preview/nnlm-en-dim50/1) - A much larger model with ~1M vocabulary size and 50 dimensions.
* [google/tf2-preview/nnlm-en-dim128/1](https://tfhub.dev/google/tf2-preview/nnlm-en-dim128/1) - Even larger model with ~1M vocabulary size and 128 dimensions.

Let's first create a Keras layer that uses a TensorFlow Hub model to embed the sentences and try it out on a few input examples. To wrap a TensorFlow Hub model as a Keras layer, we'll use the [hub.KerasLayer](https://www.tensorflow.org/hub/api_docs/python/hub/KerasLayer) class.

Link to the module:  "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1"

In [0]:
model =  "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1"

hub_layer = hub.KerasLayer(model, output_shape=[20], input_shape=[], dtype=tf.string, trainable=True)

hub_layer(train_examples[:3])

You'll notice that when we pass the first three reviews into our embedding model, it transforms each review into a vector of 20 numbers, so the output has shape `(3, 20)`. Those 20 numbers now represent the review.

Let's now build the full model:


1. Our embedding hub layer
2. A dense hidden layer with 16 nodes
3. A dense output layer with 1 node (prediction, use sigmoid)



In [0]:
model = tf.keras.Sequential([
    hub_layer,
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
    
])


model.summary()
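As a sanity check on the parameter counts that `model.summary()` reports for the two dense layers, the arithmetic can be sketched by hand. The helper name `dense_param_count` is hypothetical; it only assumes that a dense layer has one weight per input-unit pair plus one bias per unit, and that the embedding layer outputs 20 values.

```python
def dense_param_count(input_dim, units):
    """Weights (input_dim * units) plus one bias per unit."""
    return input_dim * units + units


# Hidden layer: 20-dimensional embedding in, 16 units out.
hidden_params = dense_param_count(input_dim=20, units=16)
# Output layer: 16 hidden activations in, 1 sigmoid unit out.
output_params = dense_param_count(input_dim=16, units=1)

print("hidden layer parameters:", hidden_params)
print("output layer parameters:", output_params)
```

Changing the number of hidden nodes changes both counts, since the output layer takes the hidden layer's activations as input.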

### Hidden units

Feel free to adjust the number of nodes in the hidden layer and see how it affects your model performance. 

**Remember:**
If a model has more hidden units (a higher-dimensional representation space), and/or more layers, then the network can learn more complex representations. However, it makes the network more computationally expensive and may lead to learning unwanted patterns—patterns that improve performance on training data but not on the test data. This is called *overfitting*.

### Compile our model

For our loss function we will use `binary_crossentropy`. It is well suited to models that output probabilities: it measures the "distance" between probability distributions, or in our case, between the ground-truth distribution and the predictions.
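The following is a rough sketch of what this loss computes, written in plain Python and not taken from the Keras implementation. The function name, the example labels and probabilities, and the clipping constant `eps` are illustrative assumptions.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over paired labels and predicted probabilities."""
    total = 0.0
    for label, prob in zip(y_true, y_pred):
        # Clip so that log(0) can never occur.
        prob = min(max(prob, eps), 1.0 - eps)
        total += -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))
    return total / len(y_true)


# Same labels, but the second set of predictions is confidently wrong.
confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])

print("confident and right:", confident_right)
print("confident and wrong:", confident_wrong)
```

Confident mistakes are penalized much more heavily than confident correct answers, which is the behavior we want when the model outputs a probability.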

For this model, stick with the `adam` optimizer and track `accuracy` as the metric.

In [0]:
model.compile(optimizer= 'adam',
              loss= 'binary_crossentropy',
              metrics= ['accuracy'])

## Create a validation set

When training, we want to check the accuracy of the model on data it hasn't seen before. Create a *validation set* by setting apart 10,000 examples from the original training data. (Why not use the testing set now? Our goal is to develop and tune our model using only the training data, then use the test data just once to evaluate our accuracy).

In [0]:
x_val = train_examples[:10000]
x_train = train_examples[10000:]

y_val = train_labels[:10000]
y_train = train_labels[10000:]

## Train the model

Train the model for 40 epochs in mini-batches of 512 samples. This is 40 iterations over all samples in the `x_train` and `y_train` tensors. While training, monitor the model's loss and accuracy on the 10,000 samples from the validation set:

See the [model.fit](https://www.tensorflow.org/api_docs/python/tf/keras/models/Model#fit) documentation for the full list of arguments.

In [0]:
history = model.fit(x_train, y_train,
                    epochs=40, batch_size=512,
                    validation_data=(x_val, y_val),
                    verbose=1
                   )
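To get a feel for how much work those settings imply, the number of weight updates can be estimated with a little arithmetic. This sketch assumes the 15,000 training reviews left after the validation split and that the final, smaller batch of each epoch is still used.

```python
import math

train_size = 25_000 - 10_000  # training reviews left after the validation split
batch_size = 512
epochs = 40

# The last batch of each epoch is smaller than 512 but still counts as a step.
steps_per_epoch = math.ceil(train_size / batch_size)
last_batch_size = train_size - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print("steps per epoch:", steps_per_epoch)
print("size of last batch:", last_batch_size)
print("total weight updates:", total_updates)
```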

## Evaluate the model

And let's see how the model performs. Two values will be returned. Loss (a number which represents our error, lower values are better), and accuracy.

In [0]:
results = model.evaluate(test_examples, test_labels)
print(results)
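The accuracy value can be understood with a small sketch: each sigmoid output is treated as a probability, thresholded at 0.5, and compared with the true label. The labels and probabilities below are made up, and `binary_accuracy` is a hypothetical helper written for illustration, not the Keras function.

```python
def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples whose thresholded probability matches the label."""
    predictions = [1 if p > threshold else 0 for p in y_prob]
    correct = sum(int(p == t) for p, t in zip(predictions, y_true))
    return correct / len(y_true)


labels = [1, 0, 1, 1, 0]                        # made-up ground truth
probabilities = [0.92, 0.30, 0.45, 0.71, 0.64]  # made-up sigmoid outputs

print("accuracy:", binary_accuracy(labels, probabilities))
```

Unlike the loss, accuracy ignores how confident each prediction was; it only counts which side of the threshold it fell on.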

This fairly simple approach achieves an accuracy of about 80%. With more advanced approaches, the model could get closer to 95%.

## Create a graph of accuracy and loss over time

`model.fit()` returns a `History` object that contains a dictionary with everything that happened during training:

In [0]:
history_dict = history.history
history_dict.keys()

There are four entries: one for each monitored metric during training and validation. We can use these to plot the training and validation loss for comparison, as well as the training and validation accuracy:

Grab from the history object:

* accuracy
* val_accuracy
* loss
* val_loss

And plot these using matplotlib.



In [0]:
acc = history_dict['accuracy']
val_acc = history_dict['val_accuracy']
loss = history_dict['loss']
val_loss = history_dict['val_loss']

epochs = range(1, len(acc) + 1)

# "bo" is for "blue dot"
plt.plot(epochs, loss, 'bo', label='Training loss')
# b is for "solid blue line"
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

plt.show()

In [0]:
plt.clf()   # clear figure

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

plt.show()


In these plots, the dots represent the training loss and accuracy, and the solid lines are the validation loss and accuracy.

Notice the training loss *decreases* with each epoch and the training accuracy *increases* with each epoch. This is expected when using a gradient descent optimization—it should minimize the desired quantity on every iteration.

This isn't the case for the validation loss and accuracy—they seem to peak after about twenty epochs. This is an example of overfitting: the model performs better on the training data than it does on data it has never seen before. After this point, the model over-optimizes and learns representations *specific* to the training data that do not *generalize* to test data.

For this particular case, we could prevent overfitting by simply stopping the training after twenty or so epochs.
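Instead of reading the stopping point off the plot, it can be located programmatically. The sketch below uses a short made-up list in place of `history_dict['val_loss']`, so the numbers are purely illustrative.

```python
# Made-up validation losses standing in for history_dict['val_loss'].
val_loss = [0.62, 0.48, 0.39, 0.34, 0.33, 0.35, 0.38, 0.42]

# Index of the smallest validation loss; epochs are numbered from 1 in the plots.
best_index = min(range(len(val_loss)), key=val_loss.__getitem__)
best_epoch = best_index + 1

print("lowest validation loss:", val_loss[best_index], "at epoch", best_epoch)
```

With the real history, the same two lines would suggest how many epochs to train for before the validation loss starts rising again.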

**Go back through the model and see if you can improve your results by fixing the overfitting**

  
  

*Hint*: TensorFlow has a built-in callback called [tf.keras.callbacks.EarlyStopping](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/EarlyStopping) that will monitor your validation loss and stop training once the model stops improving.

Try going back and passing it to `model.fit` through the `callbacks` argument (for example, `callbacks=[early_stop]`):
`early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=4)`
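To see what `patience` means, here is a simplified pure-Python sketch of the stopping rule. It is not the Keras implementation and leaves out options such as `min_delta` and `restore_best_weights`; the function name and the validation losses are invented for illustration.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None if it never does."""
    best = float("inf")
    wait = 0  # consecutive epochs without a new best validation loss
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses: the best value occurs at epoch 5.
val_losses = [0.62, 0.48, 0.39, 0.34, 0.33, 0.35, 0.38, 0.42, 0.47, 0.50]

print("stops at epoch:", early_stopping_epoch(val_losses, patience=4))
```

A larger `patience` tolerates more noisy epochs before giving up, at the cost of training a little longer past the best point.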
