<table align="center">
  <td align="center"><a target="_blank" href="https://colab.research.google.com/github/andrew-nash/CS6421-labs/blob/main/Lab8.ipynb">
        <img src="https://i.ibb.co/2P3SLwK/colab.png"  style="padding-bottom:5px;" />Run in Google Colab</a></td>
  <td align="center"><a target="_blank" href="https://github.com/andrew-nash/CS6421-labs/blob/main/Lab8.ipynb">
        <img src="https://i.ibb.co/xfJbPmL/github.png"  height="70px" style="padding-bottom:5px;"  />View Source on GitHub</a></td>
</table>

# Lab 8 - RNNs

Based on https://www.tensorflow.org/guide/keras/working_with_rnns, https://www.tensorflow.org/text/tutorials/text_classification_rnn

## Introduction

Recurrent neural networks (RNNs) are a class of neural networks that are powerful for
modeling sequence data such as time series or natural language.

Schematically, an RNN layer uses a `for` loop to iterate over the timesteps of a
sequence, while maintaining an internal state that encodes information about the
timesteps it has seen so far.
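
The loop described above can be sketched in plain Python. This is a minimal illustration, not how Keras implements its layers: the function name `simple_rnn`, the weight values, and the use of `tanh` as the activation are all assumptions made for the example.

```python
import math

def simple_rnn(inputs, w_x, w_h, bias):
    """Run a toy RNN over a sequence and return the final state.

    inputs: list of timesteps, each a list of floats (the input features)
    w_x:    input-to-state weights, indexed [unit][feature]
    w_h:    state-to-state weights, indexed [unit][unit]
    bias:   one float per unit
    """
    units = len(bias)
    state = [0.0] * units  # the internal state starts at zero
    for x_t in inputs:  # the `for` loop over timesteps
        new_state = []
        for u in range(units):
            total = bias[u]
            total += sum(w * x for w, x in zip(w_x[u], x_t))    # current input
            total += sum(w * h for w, h in zip(w_h[u], state))  # previous state
            new_state.append(math.tanh(total))
        state = new_state
    return state

# Hypothetical weights: 2 units, 1 input feature per timestep.
w_x = [[0.5], [-0.3]]
w_h = [[0.1, 0.0], [0.0, 0.2]]
bias = [0.0, 0.0]

final_state = simple_rnn([[1.0], [0.0], [-1.0]], w_x, w_h, bias)
print(final_state)
```

Feeding the same timesteps in reverse order gives a different final state, which is the sense in which the state encodes what the layer has seen so far, in order.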

The Keras RNN API is designed with a focus on:

- **Ease of use**: the built-in `keras.layers.RNN`, `keras.layers.LSTM`,
`keras.layers.GRU` layers enable you to quickly build recurrent models without
having to make difficult configuration choices.

- **Ease of customization**: You can also define your own RNN cell layer (the inner
part of the `for` loop) with custom behavior, and use it with the generic
`keras.layers.RNN` layer (the `for` loop itself). This allows you to quickly
prototype different research ideas in a flexible way with minimal code.

In [None]:
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
import matplotlib.pyplot as plt

In [None]:
!pip install wandb -qU
%load_ext tensorboard
%tensorboard --logdir tboard

In [None]:
import wandb
from wandb.keras import WandbMetricsLogger, WandbModelCheckpoint
wandb.login()

## Dataset

The example in this lab uses RNNs for **text classification**.

This text classification tutorial trains a recurrent neural network on the IMDB large movie review dataset for sentiment analysis (details at https://ai.stanford.edu/~amaas/data/sentiment/). Given a review of a particular movie, the task is to identify whether that review is positive or negative.

In [None]:
dataset, info = tfds.load('imdb_reviews', with_info=True,
                          as_supervised=True)


Looking at a sample from this data:

In [None]:
train_dataset, valid_dataset = dataset['train'], dataset['test']

train_dataset.element_spec

In [None]:
for example, label in train_dataset.take(1):
  print('text: ', example.numpy())
  print('label: ', label.numpy())

Finally, shuffle and batch the data:

In [None]:
BUFFER_SIZE = 10000
BATCH_SIZE = 64

train_dataset = train_dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
valid_dataset = valid_dataset.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)

for example, label in train_dataset.take(1):
  print('texts: ', example.numpy()[:3])
  print()
  print('labels: ', label.numpy()[:3])

For this to be usable as input to a deep model, the text must be converted to some form of numeric input. The details of the best way to accomplish this are outside the scope of this lab. More can be found about this at:

1. https://medium.com/data-science-in-your-pocket/text-vectorization-algorithms-in-nlp-109d728b2b63
2. https://www.tensorflow.org/api_docs/python/tf/keras/layers/TextVectorization

#### Text Vectorization

Keras includes a pre-defined layer, `TextVectorization`, for vectorizing texts.

This layer applies the following processing:

1. Standardize each example (usually lowercasing + punctuation stripping)
2. Split each example into substrings (usually words)
3. Recombine substrings into tokens (usually ngrams) (by default, this is skipped)
4. Index tokens (associate a unique int value with each token)
5. Transform each example using this index, either into a vector of ints or a dense float vector.
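
The steps above can be sketched in plain Python. This is a simplified stand-in for the Keras layer, not its implementation: the helper names `standardize`, `build_vocabulary`, and `vectorize` are hypothetical, step 3 (n-grams) is skipped as in the default configuration, and indices 0 and 1 are reserved for padding and unknown words to mirror the `''` and `[UNK]` entries of the layer's vocabulary.

```python
import re
from collections import Counter

def standardize(text):
    # Step 1: lowercase and strip punctuation.
    return re.sub(r"[^\w\s]", "", text.lower())

def build_vocabulary(texts, max_tokens):
    # Steps 2 and 4: split on whitespace, then keep the most frequent words.
    # Two slots are reserved: index 0 for padding, index 1 for unknown words.
    counts = Counter(word for text in texts for word in standardize(text).split())
    most_common = [word for word, _ in counts.most_common(max_tokens - 2)]
    return ["", "[UNK]"] + most_common

def vectorize(text, vocabulary):
    # Step 5: replace each word by its index, or by 1 if it is out of vocabulary.
    index = {word: i for i, word in enumerate(vocabulary)}
    return [index.get(word, 1) for word in standardize(text).split()]

texts = ["The cat sat on the mat.", "The cat sat.", "The cat."]
vocabulary = build_vocabulary(texts, max_tokens=5)
print(vocabulary)                              # ['', '[UNK]', 'the', 'cat', 'sat']
print(vectorize("The dog sat!", vocabulary))   # [2, 1, 4]
```

Note that the encoded example has one integer per word, so its length follows the sentence, while `max_tokens` only limits how many distinct integers can appear.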

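The vocabulary size is the maximum number of distinct tokens that the vectorization keeps in its index; any word outside this vocabulary is mapped to a single `[UNK]` token. It limits the range of integer ids that can appear, not the length of the encoded vector: in the default integer output mode, each example is encoded with one id per token.

The following code adapts a `TextVectorization` layer with a given maximum number of tokens to the IMDB training data: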

In [None]:
VOCAB_SIZE = 1000
encoder = tf.keras.layers.TextVectorization(
    max_tokens=VOCAB_SIZE)
encoder.adapt(train_dataset.map(lambda text, label: text))

We can look at some of the tokens learned (`[UNK]` corresponds to all words outside the vocabulary):

In [None]:
vocab = np.array(encoder.get_vocabulary())
vocab[:20]

We can also now look at some examples of vectorized sentences:

In [None]:
encoded_example = encoder(example)[:3].numpy()
encoded_example.shape,encoded_example

In [None]:
for n in range(3):
  print("Original: ", example[n].numpy())
  print("Round-trip: ", " ".join(vocab[encoded_example[n]]))
  print("First 10 elements of vectorized representation:", encoded_example[n][:10])
  print()
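
The encoded batch above is a rectangular array even though the reviews differ in length. With its default settings the layer pads shorter examples with index 0 up to the longest example in the batch, and the `mask_zero=True` option of the embedding layer used later tells downstream layers to ignore those positions. A small sketch of the idea in plain Python, with hypothetical helper names `pad_batch` and `padding_mask`:

```python
def pad_batch(sequences, pad_value=0):
    # Pad every sequence on the right up to the length of the longest one.
    longest = max(len(seq) for seq in sequences)
    return [seq + [pad_value] * (longest - len(seq)) for seq in sequences]

def padding_mask(batch, pad_value=0):
    # True where a real token is present, False where the position is padding.
    return [[token != pad_value for token in seq] for seq in batch]

# Made-up token ids for three sentences of different lengths.
batch = pad_batch([[2, 7, 4], [5], [3, 9]])
print(batch)                # [[2, 7, 4], [5, 0, 0], [3, 9, 0]]
print(padding_mask(batch))
```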

We are now ready to train a model!

## Embedding layers (https://www.tensorflow.org/text/guide/word_embeddings)

The vectorized inputs are a much more workable format for modelling, but they can still be improved. Each element of the vector is the index of a particular word in the vocabulary (you can think of it as that word's unique integer id).

This does not in itself capture any sense of the *meaning* of the words involved.

An embedding layer is used to create a vector that represents each word.

Consider the following example:

We have a vocabulary:

`{cat,mat,on,sat,the}`

And a sentence `"the cat sat"`.

This sentence is vectorized as:

`[4,0,3]`

An embedding would learn a unique **vector** for each word, such as:


<img src='https://www.tensorflow.org/static/text/guide/images/one-hot.png' width=30%/>


<img src='https://www.tensorflow.org/static/text/guide/images/embedding2.png' width=40%/>

The embedding layer can be thought of as a small model that learns one vector per word in the vocabulary, where the components of each vector capture features (i.e. meaning) of that word.

If we take the 4-D embedding, this sentence would become something like:


[ [-1.054,-0.75, 0.065,2.5] (the), [1.2,-0.1,4.3,3.2] (cat), [-0.75,0.5,1.0,5.0] (sat) ]
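
Mechanically, an embedding is a lookup table with one row per vocabulary entry. The sketch below uses the five-word vocabulary from the example; the random values stand in for what training would learn, and the names `embedding_table` and `embed` are hypothetical.

```python
import random

vocabulary = ["cat", "mat", "on", "sat", "the"]
embedding_dim = 4

random.seed(0)
# One row per word. Values are random here; training would adjust them.
embedding_table = [
    [random.uniform(-1.0, 1.0) for _ in range(embedding_dim)]
    for _ in vocabulary
]

def embed(token_ids, table):
    # Replace each integer id with the corresponding row of the table.
    return [table[i] for i in token_ids]

sentence_ids = [4, 0, 3]  # "the cat sat"
embedded = embed(sentence_ids, embedding_table)
for word_id, vector in zip(sentence_ids, embedded):
    print(vocabulary[word_id], vector)
```

The result has one 4-element vector per word, in sentence order, which is the shape of input the recurrent layer consumes.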


### Modelling

We now have a sequence of embedded vectors. Observe that this is temporal data: the embedded vectors are the words of the original sentence, in order.

Therefore, we need a temporal model, such as an RNN. The diagram below shows a bidirectional RNN in that position; the model in the next cell uses a plain `SimpleRNN` instead.

<img src='https://www.tensorflow.org/static/text/tutorials/images/bidirectional.png' />

In [None]:
model = tf.keras.Sequential()
# for simplicity, we will bake the text vectorization and embedding into our model
model.add(encoder)
# Add the text embedding to learn feature vectors on words
# the output_dim is the size of the learned vector (in the above example, it would be 4)
# mask_zero=True marks index 0 as padding so later layers skip it, allowing input sentences of different lengths
model.add(tf.keras.layers.Embedding(input_dim=len(encoder.get_vocabulary()), output_dim=64,mask_zero=True))
model.add(tf.keras.layers.SimpleRNN(32))
model.add(tf.keras.layers.Dense(64, activation='relu'))
# no output activation: the loss is configured with from_logits=True, so the model should emit raw logits
model.add(tf.keras.layers.Dense(1))

In [None]:
model.summary()

Even before training, we can see the model in action:

In [None]:
sample_text = ('The movie was cool. The animation and the graphics '
               'were out of this world. I would recommend this movie.')
predictions = model.predict(np.array([sample_text]))
print(predictions[0])

In [None]:
run_name="Basic RNN"
#   TextVectorization is incompatible with tboard weight histograms, so we have to disable these
#   in practice, for this reason it can be better to keep TextVectorization outside the trainable model
tensorboard_callback = tf.keras.callbacks.TensorBoard(f"./tboard/{run_name}", histogram_freq=0)
wandb.init(
        project = "Lab8",
        name =   run_name)

model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])

history = model.fit(train_dataset, epochs=1,
                    validation_data=valid_dataset,
                    validation_steps=30, callbacks=[tensorboard_callback,WandbMetricsLogger()])

In [None]:
sample_text = ('The movie was cool. The animation and the graphics '
               'were out of this world. I would recommend this movie.')
predictions = model.predict(np.array([sample_text]))
print(predictions)

In [None]:
sample_text = ('The movie was not good. The animation and the graphics '
               'were terrible. I would not recommend this movie.')
predictions = model.predict(np.array([sample_text]))
print(predictions)
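
Because the loss is configured with `from_logits=True`, the numbers printed by `model.predict` are best read as raw scores (logits) rather than probabilities: values above 0 lean positive and values below 0 lean negative. A minimal sketch of turning such a score into a probability with the sigmoid function follows. The helper names and the example scores are made up for illustration and are not outputs of the model.

```python
import math

def sigmoid(logit):
    # Map a raw score in (-inf, inf) to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-logit))

def describe(logit):
    # A logit of 0 corresponds to a probability of exactly 0.5.
    probability = sigmoid(logit)
    label = "positive" if probability >= 0.5 else "negative"
    return probability, label

for logit in (-1.2, 0.0, 0.8):  # hypothetical scores, not model outputs
    probability, label = describe(logit)
    print(f"logit={logit:+.1f}  probability={probability:.3f}  {label}")
```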

## RNN Hyper-parameters

RNNs can be configured in many ways:

When we say an RNN is set up as XX-to-YY, this means that it takes inputs over XX time steps, and makes outputs over YY time steps. For example, one-to-many means that the RNN takes an input at a single time step and outputs a series of values (in this case, the output of the RNN is recycled as its input at each subsequent time step).
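
The one-to-many case can be illustrated with a toy single-unit RNN in plain Python. The function name `one_to_many` and the weights `w_x` and `w_h` are assumptions chosen for the example, not values taken from any model in this lab.

```python
import math

def one_to_many(first_input, steps, w_x=0.9, w_h=0.5):
    # A single input is supplied; every later step reuses the previous output.
    outputs = []
    state = 0.0
    x_t = first_input
    for _ in range(steps):
        state = math.tanh(w_x * x_t + w_h * state)
        outputs.append(state)
        x_t = state  # recycle the output as the next input
    return outputs

generated = one_to_many(first_input=1.0, steps=4)
print(generated)  # four outputs produced from one input
```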

<img src="https://api.wandb.ai/files/ayush-thakur/images/projects/103390/4fc355be.png"/>

<img src='https://miro.medium.com/v2/resize:fit:640/format:webp/0*VO4DW_vN7ldqgEZg.png'/>

When we have a single, simple RNN layer such as

`model.add(tf.keras.layers.SimpleRNN(32))`

With multiple time steps as input, this can be considered a many-to-one model: it outputs a single 32-element vector at the last time step. In our model above, this acted as input to a series of Dense layers to perform classification.

If we want to turn this into a many-to-many model, we can use:

`model.add(tf.keras.layers.SimpleRNN(32, return_sequences=True))`

This will instruct the RNN to make an output at each time step. A major advantage of this is that it allows us to *stack* RNNs, with the output of one being passed to another at each time step.
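
The difference between the two settings, and why the second one makes stacking possible, can be seen in a toy single-unit RNN. This is a plain-Python sketch with made-up weights; `run_rnn` is a hypothetical helper, not a Keras function.

```python
import math

def run_rnn(inputs, w_x, w_h, return_sequences=False):
    # Toy single-unit RNN: one float per timestep in, one float state.
    state = 0.0
    outputs = []
    for x_t in inputs:
        state = math.tanh(w_x * x_t + w_h * state)
        outputs.append(state)
    # Either every per-step output, or only the output of the last step.
    return outputs if return_sequences else outputs[-1]

sequence = [0.5, -1.0, 0.25]

last_only = run_rnn(sequence, w_x=0.8, w_h=0.3)                         # many-to-one
all_steps = run_rnn(sequence, w_x=0.8, w_h=0.3, return_sequences=True)  # many-to-many

# Stacking: the per-step outputs of the first layer are the input sequence
# of the second layer, which here returns only its final output.
stacked = run_rnn(all_steps, w_x=1.1, w_h=-0.2)

print(last_only, all_steps, stacked)
```

The second layer needs one input per time step, which is why every recurrent layer except possibly the last must return its full sequence.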

An example of three stacked LSTM layers for a many-to-one problem:

<img src='https://miro.medium.com/v2/resize:fit:720/format:webp/0*sBfgsRRLyknLfca7.jpg'/>

If we were to set `return_sequences=True` in our final recurrent layer, the model could become many-to-many!

Other hyper-parameters of the RNN/LSTM/GRU layers can also be changed:

https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTM

1. `activation`
2. `recurrent_activation` (LSTM and GRU)
3. `use_bias`
4. regularizers (e.g. `kernel_regularizer`, `recurrent_regularizer`)

etc.

In [None]:
model = tf.keras.Sequential()
# for simplicity, we will bake the text vectorization and embedding into our model
model.add(encoder)
# Add the text embedding to learn feature vectors on words
# the output_dim is the size of the learned vector (in the above example, it would be 4)
# mask_zero=True marks index 0 as padding so later layers skip it, allowing input sentences of different lengths
model.add(tf.keras.layers.Embedding(input_dim=len(encoder.get_vocabulary()), output_dim=64,mask_zero=True))
model.add(tf.keras.layers.LSTM(32, return_sequences=True))
model.add(tf.keras.layers.LSTM(16, return_sequences=True))
model.add(tf.keras.layers.LSTM(8))
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(tf.keras.layers.Dense(1))

In [None]:
run_name="Stacked LSTM"
#   TextVectorization is incompatible with tboard weight histograms, so we have to disable these
#   in practice, for this reason it can be better to keep TextVectorization outside the trainable model
tensorboard_callback = tf.keras.callbacks.TensorBoard(f"./tboard/{run_name}", histogram_freq=0)
wandb.init(
        project = "Lab8",
        name =   run_name)

model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])

history = model.fit(train_dataset, epochs=20,
                    validation_data=valid_dataset,
                    validation_steps=30, callbacks=[tensorboard_callback,WandbMetricsLogger()])

In [None]:
sample_text = ('The movie was cool. The animation and the graphics '
               'were out of this world. I would recommend this movie.')
predictions = model.predict(np.array([sample_text]))
print(predictions)

In [None]:
sample_text = ('Great stuff, loved it. Best thing ever.')
predictions = model.predict(np.array([sample_text]))
print(predictions)

In [None]:
sample_text = ("That was the worst thing I've seen in ages")
predictions = model.predict(np.array([sample_text]))
print(predictions)

In [None]:
sample_text = ("Three hours of my life that I won't get back")
predictions = model.predict(np.array([sample_text]))
print(predictions)