# <font color=blue>Text classification with an RNN</font>

References: https://www.tensorflow.org/tutorials/text/text_classification_rnn

__Dataset__:

* The IMDB large movie review dataset is a binary classification dataset.

```python
import tensorflow_datasets as tfds
import tensorflow as tf

BATCH_SIZE = 64

dataset, info = tfds.load('imdb_reviews/subwords8k', with_info=True, as_supervised=True)
train_dataset = dataset['train'].shuffle(10000).padded_batch(BATCH_SIZE)
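test_dataset = dataset['test'].padded_batch(BATCH_SIZE)

# Vocabulary size of the subword encoder bundled with this dataset config
# (needed by the Embedding layer of the model below).
VOCAB_SIZE = info.features['text'].encoder.vocab_size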
```

__Model__:

```python
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 64),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)
])
```

* Shapes:
    * inputs: (batch_size, seq_length)
    * after Embedding(): (batch_size, seq_length, 64)
    * after Bidirectional(): (batch_size, 128)
    * after Dense(): (batch_size, 64)
    * after Dense(): (batch_size, 1)
    
```python
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])

history = model.fit(train_dataset, epochs=10, validation_data=test_dataset, validation_steps=30)

test_loss, test_acc = model.evaluate(test_dataset)
```

* The output of `model.predict()` is a raw logit and is not necessarily between 0 and 1, since the final dense layer of the model does not use an activation function such as `sigmoid`.

* If `model.predict()` > 0 (equivalently, if the sigmoid of the output is > 0.5), then the predicted label is 1. Otherwise, it is 0.
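
The thresholding rule above can be sketched in plain Python. The logit values here are hypothetical stand-ins for what `model.predict()` might return. The `sigmoid` helper is defined only for this illustration and is not part of the model.

```python
import math

def sigmoid(z):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical logits, standing in for values returned by model.predict().
logits = [-2.3, 0.0, 0.7, 4.1]

probs = [sigmoid(z) for z in logits]

# Thresholding the logit at 0 is the same as thresholding the probability at 0.5.
labels_from_logits = [1 if z > 0 else 0 for z in logits]
labels_from_probs = [1 if p > 0.5 else 0 for p in probs]

print(labels_from_logits)   # [0, 0, 1, 1]
print(labels_from_probs)    # [0, 0, 1, 1]
```

Since the sigmoid is monotonic and maps 0 to 0.5, both rules always agree, so there is no need to apply the sigmoid just to obtain a label.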

__Model__ (using two LSTM layers):

```python
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 64),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1)
])
```

* Shapes:
    * inputs: (batch_size, seq_length)
    * after Embedding: (batch_size, seq_length, 64)
    * after Bidirectional: (batch_size, seq_length, 128)
    * after Bidirectional: (batch_size, 64)
    * after Dense: (batch_size, 64)
    * after Dense: (batch_size, 1)
    
* The first LSTM layer uses `return_sequences=True` so that it outputs the hidden state at every timestep, i.e. a 3-D tensor (batch_size, seq_length, units). The second LSTM layer needs this timestep axis in its input.

* We can stack multiple RNN layers in such a way.
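
To make the role of `return_sequences` concrete, here is a minimal NumPy sketch of a toy RNN. It is not the Keras LSTM: `simple_rnn` is a hypothetical helper with random weights and a plain `tanh` cell, and it is unidirectional, so the feature sizes are half of those in the `Bidirectional` model above. Only the output shapes matter.

```python
import numpy as np

def simple_rnn(inputs, units, return_sequences=False, seed=0):
    """Toy tanh RNN with random weights; only the output shapes matter here."""
    rng = np.random.default_rng(seed)
    batch_size, seq_length, input_dim = inputs.shape
    w_in = rng.normal(scale=0.1, size=(input_dim, units))
    w_rec = rng.normal(scale=0.1, size=(units, units))
    state = np.zeros((batch_size, units))
    outputs = []
    for t in range(seq_length):
        state = np.tanh(inputs[:, t, :] @ w_in + state @ w_rec)
        outputs.append(state)
    if return_sequences:
        return np.stack(outputs, axis=1)   # (batch_size, seq_length, units)
    return state                           # (batch_size, units)

x = np.zeros((8, 20, 64))                        # e.g. the output of an Embedding layer
h1 = simple_rnn(x, 64, return_sequences=True)    # keeps the timestep axis
h2 = simple_rnn(h1, 32)                          # consumes the sequence, returns last state
print(h1.shape, h2.shape)                        # (8, 20, 64) (8, 32)
```

Without `return_sequences=True`, the first layer would return a 2-D array of shape (8, 64), which the second layer could not iterate over.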

# <font color=blue>Text generation with an RNN</font>

References: https://www.tensorflow.org/tutorials/text/text_generation

```python
import os

import numpy as np
import tensorflow as tf
```

__Dataset__:

```python
path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

text = open(path_to_file, 'rb').read().decode(encoding='utf-8')    # str

vocab = sorted(set(text))
vocab_size = len(vocab)     # 65

char2idx = {u:i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)

text_as_int = np.array([char2idx[c] for c in text])             # entries are integers ranging from 0 to 64.


seq_length = 100
BATCH_SIZE = 64

dataset = tf.data.Dataset.from_tensor_slices(text_as_int)
dataset = dataset.batch(seq_length+1, drop_remainder=True)
dataset = dataset.map(lambda chunk: (chunk[:-1], chunk[1:]))
dataset = dataset.batch(BATCH_SIZE, drop_remainder=True)
```

* Each batch is a pair (input, target), and both tensors have shape (64, 100). The target is the input shifted by one character.
* Entries of a batch are integers ranging from 0 to vocab_size-1.
* Note that we did not use `dataset.shuffle()` before batching the dataset.
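
The chunk-and-shift step can be illustrated without TensorFlow. The sketch below uses a short hypothetical string and a small `seq_length` so that the pairs are easy to read. It mirrors `batch(seq_length+1, drop_remainder=True)` followed by the `map` that splits each chunk.

```python
text = "First Citizen"
seq_length = 4

# Split the text into consecutive chunks of seq_length + 1 characters,
# dropping the incomplete final chunk (like drop_remainder=True).
chunk_size = seq_length + 1
chunks = [text[i:i + chunk_size]
          for i in range(0, len(text) - chunk_size + 1, chunk_size)]

# Each chunk yields an input and a target shifted by one character.
pairs = [(chunk[:-1], chunk[1:]) for chunk in chunks]

for inp, target in pairs:
    print(repr(inp), '->', repr(target))
# 'Firs' -> 'irst'
# ' Cit' -> 'Citi'
```

At every position the target character is the one that follows the input character, which is what the model is trained to predict.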


__Model__:

```python
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    return tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim, batch_input_shape=[batch_size, None]),
        tf.keras.layers.GRU(rnn_units, return_sequences=True, stateful=True, recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dense(vocab_size)
    ])

embedding_dim = 256
rnn_units = 1024

model = build_model(vocab_size, embedding_dim, rnn_units, BATCH_SIZE)

model.compile(optimizer='adam', loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_prefix, save_weights_only=True)

history = model.fit(dataset, epochs=10, callbacks=[checkpoint_callback], shuffle=False)
```


* Shapes through the model:
    * inputs: (batch_size, x)
    * after Embedding: (batch_size, x, embedding_dim)
    * after GRU: (batch_size, x, rnn_units)
    * after Dense: (batch_size, x, vocab_size)
    * Here x is set to `seq_length` during training, but it can be set to any number for a test.
* Roughly, if the input is a sequence of m characters, then `model(input)` returns m sets of logits, one next-character prediction for each input position.

* Note that `batch_input_shape` is set as `[batch_size, None]`. Moreover, `drop_remainder=True` was used in building the dataset. 
* Because of the way the RNN state is passed from timestep to timestep, the model only accepts a fixed batch size once built.
* `model.layers[1].states[0].shape` is (batch_size, rnn_units).
* Note also `stateful=True` in `GRU`. The following is from the source of recurrent.py:

Note on using statefulness in RNNs:
    You can set RNN layers to be 'stateful', which means that the states
    computed for the samples in one batch will be reused as initial states
    for the samples in the next batch. This assumes a one-to-one mapping
    between samples in different successive batches.
    To enable statefulness:
      - Specify `stateful=True` in the layer constructor.
      - Specify a fixed batch size for your model, by passing
        If sequential model:
          `batch_input_shape=(...)` to the first layer in your model.
        Else for functional model with 1 or more Input layers:
          `batch_shape=(...)` to all the first layers in your model.
        This is the expected shape of your inputs
        *including the batch size*.
        It should be a tuple of integers, e.g. `(32, 10, 100)`.
      - Specify `shuffle=False` when calling fit().
    To reset the states of your model, call `.reset_states()` on either
    a specific layer, or on your entire model.
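
What statefulness buys can be sketched with a toy NumPy RNN. This is an illustration under simplifying assumptions: `run_rnn` is a hypothetical helper with a plain `tanh` cell and random weights, not the Keras GRU. Carrying the final state of one batch into the next should give the same result as processing the whole sequence at once, whereas resetting the state in between gives a different result.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, units = 3, 4
w_in = rng.normal(scale=0.5, size=(input_dim, units))
w_rec = rng.normal(scale=0.5, size=(units, units))

def run_rnn(inputs, initial_state):
    """Run a toy tanh RNN over inputs and return the final state."""
    state = initial_state
    for t in range(inputs.shape[1]):
        state = np.tanh(inputs[:, t, :] @ w_in + state @ w_rec)
    return state

batch_size = 2
sequence = rng.normal(size=(batch_size, 10, input_dim))
zeros = np.zeros((batch_size, units))

whole = run_rnn(sequence, zeros)                  # all 10 timesteps at once
first_half = run_rnn(sequence[:, :5], zeros)      # first "batch" of 5 timesteps
stateful = run_rnn(sequence[:, 5:], first_half)   # state carried over
stateless = run_rnn(sequence[:, 5:], zeros)       # state reset to zeros

print(np.max(np.abs(whole - stateful)))    # difference when the state is carried over
print(np.max(np.abs(whole - stateless)))   # difference when the state is reset
```

This is also why row i of one batch must continue row i of the previous batch (the one-to-one mapping mentioned in the note), and why the batch size must stay fixed.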
    


__Rebuild the model__ (batch_size=1):

```python
model1 = build_model(vocab_size, embedding_dim, rnn_units, 1)
model1.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
model1.build(tf.TensorShape([1,None]))
```

__Generate text__:

```python
start_string = u"ROMEO: "
num_generate = 1000

input_eval = [char2idx[s] for s in start_string]
input_eval = tf.expand_dims(input_eval, 0)

text_generated = []
model1.reset_states()
for i in range(num_generate):
    predictions = model1(input_eval)                                 # (1, len(input_eval), vocab_size)
    predictions = tf.squeeze(predictions, 0)                         # (len(input_eval), vocab_size)
    predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()
    input_eval = tf.expand_dims([predicted_id], 0)
    text_generated.append(idx2char[predicted_id])

result = start_string + ''.join(text_generated)
```

* When i=0, the state of the RNN layer is updated by using start_string.
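* When i>0, input_eval has shape (1, 1): only the most recently predicted character is fed back, since the earlier context is held in the RNN state.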
* The state of the RNN layer is updated at each iteration.
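
In the loop above, `tf.random.categorical` draws the next character from the distribution defined by the logits instead of always taking the most likely one. The `[-1, 0]` index picks the sample for the last timestep. The sketch below reproduces that sampling step in plain Python. The logits and the `sample_from_logits` helper are hypothetical and written only for this illustration.

```python
import math
import random

def sample_from_logits(logits, rng):
    """Draw one index with probability softmax(logits)[index]."""
    max_logit = max(logits)                        # subtract the max for numerical stability
    weights = [math.exp(z - max_logit) for z in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

rng = random.Random(0)

# Hypothetical logits for the last timestep over a 4-character vocabulary.
last_step_logits = [2.0, 0.5, -1.0, 0.1]

samples = [sample_from_logits(last_step_logits, rng) for _ in range(1000)]
counts = [samples.count(i) for i in range(len(last_step_logits))]
print(counts)   # index 0 should dominate, but the other indices still appear
```

Sampling rather than taking the argmax keeps the generated text from getting stuck repeating the same most likely characters.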


__Customized training__:

```python
model = build_model(vocab_size, embedding_dim, rnn_units, BATCH_SIZE)

optimizer = tf.keras.optimizers.Adam()

@tf.function
def train_step(inp, target):
    with tf.GradientTape() as tape:
        predictions = model(inp)
        loss = tf.reduce_mean(tf.keras.losses.sparse_categorical_crossentropy(target, predictions, from_logits=True))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

EPOCHS = 10
for epoch in range(EPOCHS):
    # Reset the RNN state at the start of every epoch (reset_states() returns None).
    model.reset_states()
    for (batch_n, (inp, target)) in enumerate(dataset):
        loss = train_step(inp, target)
        if batch_n % 100 == 0:
            print('Epoch {} Batch {} Loss {}'.format(epoch+1, batch_n, loss))
    if (epoch+1) % 5 == 0:
        model.save_weights(checkpoint_prefix.format(epoch=epoch))
    print ('Epoch {} Loss {:.4f}'.format(epoch+1, loss))

model.save_weights(checkpoint_prefix.format(epoch=epoch))
```

* `sparse_categorical_crossentropy()` returns a tensor of shape (batch_size, seq_length).
* `tf.reduce_mean(x)` is a scalar tensor whose value is `x.numpy().flatten().mean()`, i.e. the loss averaged over all positions in the batch.
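
The loss computation can be checked by hand with NumPy. The function below is a minimal re-implementation written for this note, not the Keras function itself. It assumes integer targets and raw logits, as in `train_step`, and the all-zero logits are a hypothetical stand-in for an untrained model.

```python
import numpy as np

def sparse_categorical_crossentropy_from_logits(targets, logits):
    """Per-position loss: -log softmax(logits)[target]."""
    shifted = logits - logits.max(axis=-1, keepdims=True)       # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the log-probability of the target class at every position.
    picked = np.take_along_axis(log_probs, targets[..., None], axis=-1)
    return -picked.squeeze(-1)

batch_size, seq_length, vocab_size = 2, 5, 65
rng = np.random.default_rng(0)
targets = rng.integers(0, vocab_size, size=(batch_size, seq_length))
logits = np.zeros((batch_size, seq_length, vocab_size))         # uniform prediction

per_position = sparse_categorical_crossentropy_from_logits(targets, logits)
loss = per_position.mean()       # plays the role of tf.reduce_mean
print(per_position.shape)        # (2, 5)
print(loss)                      # log(65), about 4.17, for a uniform prediction
```

A loss near log(vocab_size) is therefore what to expect at the start of training, before the model has learned anything.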