# Recurrent Neural Networks (RNN)

RNNs are better suited than feed-forward networks to processing sequential data such as text

Commonly used for **natural language processing**

### Internal Loop
- A recurrent neural network does not process the entire input at once; it processes it one time step at a time
    - for text, feed one word at a time
- Model maintains an internal memory - remembers what it has seen previously
- Types of RNN layers:

#### Simple RNN Layer
- Data is passed in as a sequence
- The network contains a recurrent layer, which has a loop back to itself
- At each time step, the recurrent layer receives the current input together with its own output from the previous time step
    - At time step 0, the only input is the input data, x<sub>0</sub>, and it produces an output h<sub>0</sub>
    - At time step 1, the input to the recurrent layer is x<sub>1</sub> as well as h<sub>0</sub>, which produces an output h<sub>1</sub>
    - This repeats for each subsequent time step
- Each time step therefore builds on everything seen before
- Issue: for long sequences, the influence of early inputs tends to fade, because only the most recent output is fed back and older information survives only indirectly through it
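
The recurrence described above can be sketched in NumPy. This is a minimal illustration, not the Keras implementation: the weight names `W_x`, `W_h`, `b`, the `tanh` activation, and all sizes are assumptions chosen for the example.

```python
import numpy as np

def simple_rnn_forward(inputs, W_x, W_h, b):
    """Run a simple RNN over a sequence and return the hidden state at each step."""
    h = np.zeros(W_h.shape[0])          # no previous output exists before step 0
    outputs = []
    for x_t in inputs:                  # process one time step at a time
        # the new state depends on the current input and the previous state
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        outputs.append(h)
    return np.stack(outputs)

rng = np.random.default_rng(0)
input_dim, hidden_dim, steps = 3, 4, 5  # illustrative sizes only
W_x = rng.normal(size=(hidden_dim, input_dim))
W_h = rng.normal(size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)
sequence_inputs = rng.normal(size=(steps, input_dim))

hidden_states = simple_rnn_forward(sequence_inputs, W_x, W_h, b)
print(hidden_states.shape)  # (5, 4): one hidden vector per time step
```

Because the state starts at zero, the first output depends only on x<sub>0</sub>; every later output mixes the current input with the state carried over from the step before.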

#### LSTM layer
- Long Short-Term Memory
- Adds a cell state, controlled by gates, that lets the model carry information forward from much earlier time steps than a simple RNN can

## Data

### Sequence Data
- Long chains of text, weather patterns, videos, or anything else where the notion of a time step is relevant
- The order of the data is important to keep track of

### Textual Data
- A type of sequence data
- The text needs to be encoded as numerical data that can be fed to the neural network
- There are different methods of doing this:

#### Bag of Words
- Look at the entire training dataset and create a dictionary of the vocabulary
    - every unique word is part of the vocabulary
    - each word is represented by an integer
- Keep track of the frequency of each word in a sentence
- Flawed method because the order of the words is lost: it keeps only which words appear and how often
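
A minimal pure-Python sketch of the bag-of-words idea follows. The helper names `build_vocab` and `bag_of_words` and the two example sentences are made up for illustration; real pipelines usually also handle punctuation and unknown words.

```python
from collections import Counter

def build_vocab(sentences):
    """Assign an integer to every unique word, in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

def bag_of_words(sentence, vocab):
    """Map word id -> count for one sentence; word order is discarded."""
    return Counter(vocab[w] for w in sentence.lower().split() if w in vocab)

corpus = ["the movie was good not bad", "the movie was bad not good"]
vocab = build_vocab(corpus)
bags = [bag_of_words(s, vocab) for s in corpus]
print(bags[0] == bags[1])  # True: opposite meanings, identical bags
```

The two sentences mean opposite things, yet they produce the same bag, which is exactly the loss of word order noted above.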
    
#### Word Embedding
- Tries to represent similar words with similar vectors
- Represents each word as an n-dimensional vector (usually 64 or 128 dimensions)
    - the vector indicates how similar the word is to other words
    - the words "good" and "happy" will be represented by vectors with a small angle between them
    - opposite words will have very different vectors
- The word embedding is implemented as a layer in the neural network
    - the model learns the word embeddings from the context of the words in each sentence
- Pretrained word embedding layers can also be used
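
To make the "small angle" idea concrete, here is a short sketch that compares vectors by cosine similarity. The 3-dimensional vectors are hand-picked, hypothetical values for illustration only; learned embeddings have many more dimensions and are produced by training.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = same direction, -1 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, not taken from any trained model
embeddings = {
    "good":  [0.9, 0.8, 0.1],
    "happy": [0.8, 0.9, 0.2],
    "bad":   [-0.9, -0.7, 0.1],
}

sim_good_happy = cosine_similarity(embeddings["good"], embeddings["happy"])
sim_good_bad = cosine_similarity(embeddings["good"], embeddings["bad"])
print(sim_good_happy > sim_good_bad)  # True
```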

## Sentiment Analysis
Analyze how positive or negative a piece of text is

### Movie Review Dataset
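- IMDB movie review dataset from Keras
- contains 25,000 training reviews and 25,000 test reviews
- reviews are preprocessed and labeled as either positive or negative
    - each review is encoded as a list of integers, where each integer reflects how common the word is in the entire dataset
    - in the word index, a word with index 3 is the 3rd most common word in the dataset (by default, `load_data` shifts these indices up by 3 so that 0, 1, and 2 can mark padding, the start of a review, and unknown words)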

In [1]:
import tensorflow as tf
from tensorflow.keras.preprocessing import sequence
import os
import numpy as np


In [None]:
# Load data
from tensorflow.keras.datasets import imdb

VOCAB_SIZE = 88584      # Number of unique words in this dataset

MAX_LEN = 250           # Maximum number of words kept per review

(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=VOCAB_SIZE) # include all of the words

### Preprocessing
- All samples need to contain the same number of words
- if a review is longer than 250 words, trim off the extra words (by default, `pad_sequences` drops them from the start of the review)
- if a review is shorter than 250 words, add 0s on the left to pad it to 250
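
As a rough pure-Python sketch of what this padding step does, the hypothetical helper `pad_or_truncate` below mimics the default behavior of `pad_sequences` (padding and truncating both happen at the start of the sequence). It is for illustration only and is not a replacement for the Keras function.

```python
def pad_or_truncate(tokens, max_len, pad_value=0):
    """Left-pad with pad_value, or keep only the last max_len tokens."""
    if len(tokens) >= max_len:
        return tokens[len(tokens) - max_len:]   # drop words from the start
    return [pad_value] * (max_len - len(tokens)) + tokens

print(pad_or_truncate([5, 8, 2], 5))            # [0, 0, 5, 8, 2]
print(pad_or_truncate([1, 2, 3, 4, 5, 6], 4))   # [3, 4, 5, 6]
```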

In [None]:
train_data = sequence.pad_sequences(train_data, MAX_LEN)
test_data = sequence.pad_sequences(test_data, MAX_LEN)

### Create the Model
- First layer is the embedding layer, which learns a meaningful vector representation for each word index
- Second layer is the LSTM (recurrent) layer
- Third layer is a Dense classification layer with sigmoid activation, which outputs the probability that the review is positive

In [None]:
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 32),      # Embedding layer - each word is represented as a 32-dimensional vector
    tf.keras.layers.LSTM(32),                       # LSTM layer with 32 units - receives one 32-dimensional vector per word
    tf.keras.layers.Dense(1, activation='sigmoid')  # One output neuron - probability that the review is positive
])

model.summary()

### Train Model

In [None]:
model.compile(
    loss="binary_crossentropy",
    optimizer="rmsprop",
    metrics=['accuracy']
)

history = model.fit(
    train_data, 
    train_labels, 
    epochs=10, 
    validation_split=0.2    # Validate with 20% of data
)

In [None]:
# Test model
results = model.evaluate(test_data, test_labels)
print(f"Accuracy: {results[1]*100:0.2f}%")

### Make Prediction
- Any new review must be preprocessed the same way the training data was encoded

In [None]:
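word_index = imdb.get_word_index()

INDEX_FROM = 3  # imdb.load_data() shifts word indices by 3 by default (0 = padding, 1 = start, 2 = unknown)
OOV_INDEX = 2   # index load_data() uses for out-of-vocabulary words

def encode_text(text):
    tokens = tf.keras.preprocessing.text.text_to_word_sequence(text)
    encoded = []
    for word in tokens:
        index = word_index.get(word, -1) + INDEX_FROM   # apply the same offset as the training data
        # unknown words, and words outside the vocabulary kept by load_data(), map to OOV_INDEX
        encoded.append(index if INDEX_FROM < index < VOCAB_SIZE else OOV_INDEX)
    return sequence.pad_sequences([encoded], MAX_LEN)[0] # pad_sequences returns a list of sequences, take the first

def predict(text):
    encoded_text = encode_text(text)
    pred = np.zeros((1,MAX_LEN))    # input shape is 1 review with MAX_LEN (250) words
    pred[0] = encoded_text
    result = model.predict(pred)[0][0]  # sigmoid output: probability that the review is positive
    if result >= 0.5:
        print("Predicted: Positive")
    else:
        print("Predicted: Negative")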

review = "That movie was so awesome! I really loved it and would watch it again because it was amazingly great"
predict(review)

review = "That movie sucked. I hated it and wouldn't watch it again. Was one of the worst things I've ever watched"
predict(review)
    

## Character Generation
Generate the next characters in a sequence of text

### RNN Play Generator
- Show the neural network an example of the kind of text we want it to create, until it learns to write similar text itself
- Use a character-level predictive model that takes a variable-length input sequence and predicts the next character
- Calling the model repeatedly, feeding each prediction back in as the next input, generates a sequence
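
The feedback loop can be sketched without any neural network. In the toy example below, the lookup table `NEXT_CHAR` is a hypothetical stand-in for a trained model; the point is only the generation loop, where each predicted character is appended and fed back in.

```python
# Hypothetical stand-in for a trained model: maps the last character to the next one
NEXT_CHAR = {"h": "e", "e": "l", "l": "o", "o": " ", " ": "h"}

def predict_next(text):
    """Toy predictor that only looks at the final character."""
    return NEXT_CHAR[text[-1]]

def generate(start, num_chars):
    """Repeatedly predict the next character and feed it back in as input."""
    text = start
    for _ in range(num_chars):
        text += predict_next(text)
    return text

print(generate("h", 6))  # helo he
```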

### Dataset
- A Shakespeare text dataset (`shakespeare.txt`) hosted by TensorFlow, downloaded directly from its URL

In [2]:
from tensorflow.keras.preprocessing import sequence
import tensorflow.keras
import tensorflow as tf
import os
import numpy as np
import requests

# Load dataset
response = requests.get("https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt")
text = response.content.decode(encoding='utf-8')

### Encoding
Encode each character in text with an integer

In [3]:
vocab = sorted(set(text))   # get unique characters

# Create encode mapping

char2idx = {u:i for i,u in enumerate(vocab)}
idx2char = np.array(vocab)

# Convert text to integer encoding

def text_to_int(text):
    return np.array([char2idx[c] for c in text])

def int_to_text(ints):
    try:
        ints = ints.numpy()     # convert a TensorFlow tensor to a NumPy array
    except AttributeError:
        pass                    # already a NumPy array or list
    return ''.join(idx2char[ints])

text_as_int = text_to_int(text)

### Create Training Examples

The text needs to be split into shorter sequences that are passed to the model as training examples

The input will be a sequence of length *n*, and the target output will be a sequence of length *n* that is the input shifted one character to the right
- EX: input: Hell -> output: ello
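
A tiny sketch of the shift on a plain string shows how the pairs line up. The helper name `split_chunk` is made up for this illustration.

```python
def split_chunk(chunk):
    """Return (input, target) where target is the input shifted by one character."""
    return chunk[:-1], chunk[1:]

input_text, target_text = split_chunk("Hello")
print(input_text, "->", target_text)  # Hell -> ello

# At each position i the model sees input_text[i] and should predict target_text[i]
pairs = list(zip(input_text, target_text))
print(pairs)  # [('H', 'e'), ('e', 'l'), ('l', 'l'), ('l', 'o')]
```

A chunk of *n* + 1 characters therefore yields *n* prediction tasks, one per position.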

In [4]:
seq_len = 100   # length of each training example
examples_per_epoch = len(text) // (seq_len + 1)  # each training chunk holds seq_len + 1 characters
BATCH_SIZE = 64
VOCAB_SIZE = len(vocab)
EMBEDDING_DIM = 256     # Dimensions of the embedded character vectors
RNN_UNITS = 1024        # Number of units in the LSTM layer
BUFFER_SIZE = 10000     # Buffer to use during shuffling

char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)  # create character dataset from text as integer

# Group the characters into chunks of 101 (seq_len + 1), dropping the leftover text at the end
sequences = char_dataset.batch(seq_len+1, drop_remainder=True)

def split_input_target(chunk):
    input_text = chunk[:-1]     # All but the last character (indices 0-99)
    target_text = chunk[1:]     # All but the first character (indices 1-100) - the values to predict
    return input_text, target_text

dataset = sequences.map(split_input_target)     # split each entry in dataset

# Make Batches for final training sequence
data = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

### Building the Model
Use an embedding layer, an LSTM layer, and a dense layer that contains one node for each unique character the model can choose from

In [5]:
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(
            vocab_size, 
            embedding_dim, 
            batch_input_shape=[batch_size, None]        # batch_size x None: None means the sequence length is not fixed, so predictions can use inputs of any length
        ),
        tf.keras.layers.LSTM(
            rnn_units, 
            return_sequences=True,      # Return the output at every time step, not just the final one
            stateful=True,              # Carry the LSTM state over from one batch to the next
            recurrent_initializer='glorot_uniform'  # initializer for the recurrent weights in the LSTM
        ),
        tf.keras.layers.Dense(vocab_size)
    ])
    return model

model = build_model(VOCAB_SIZE, EMBEDDING_DIM, RNN_UNITS, BATCH_SIZE)
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (64, None, 256)           16640     
                                                                 
 lstm (LSTM)                 (64, None, 1024)          5246976   
                                                                 
 dense (Dense)               (64, None, 65)            66625     
                                                                 
Total params: 5,330,241
Trainable params: 5,330,241
Non-trainable params: 0
_________________________________________________________________


### Creating a Loss Function
The model outputs a tensor of shape (64, sequence_length, 65) containing a score (logit) for every possible next character at every position
- the batch size is 64, the sequence length during training is 100, and the vocabulary has 65 unique characters
- There are 64 output arrays, one for each sequence in the batch
- Each of those contains 100 arrays, one prediction for each position in the sequence
- Each of those contains 65 values, one score per character; applying softmax turns them into a probability distribution over the next character

Because the model outputs raw logits at every time step, we define our own loss function that wraps the built-in one with `from_logits=True`
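
For intuition, sparse categorical cross-entropy on logits can be written out by hand for a single time step. This is a simplified sketch rather than the TensorFlow implementation: the function names and the 3-character example logits are assumptions for illustration.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits)                              # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_cross_entropy(label, logits):
    """Negative log-probability assigned to the correct class index."""
    return -math.log(softmax(logits)[label])

# Hypothetical logits over a 3-character vocabulary for one time step
logits = [2.0, 0.5, -1.0]
loss_correct = sparse_cross_entropy(0, logits)     # label matches the highest score
loss_wrong = sparse_cross_entropy(2, logits)       # label matches the lowest score
print(loss_correct < loss_wrong)  # True
```

The loss is small when the model puts high probability on the true next character and large otherwise; over a batch, these per-position values are averaged.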

In [6]:
# See output shape by predicting on the first batch (model not trained, so predictions are essentially random)
for input_example, target_example in data.take(1):
    predictions = model(input_example)

print(predictions.shape)
# print(predictions)  # array of 64 arrays - one for each sequence in the batch

pred = predictions[0] # array of 100 sets of logits for the first sequence in the batch
print(pred.shape)
print(pred)

time_pred = pred[0]  # logits over the 65 characters for the first position in the sequence (first time step)
print(time_pred.shape)
print(time_pred)

(64, 100, 65)
(100, 65)
tf.Tensor(
[[ 6.1521893e-03  3.7606128e-03 -8.3994935e-04 ... -2.2678513e-03
   2.8230438e-03  5.0888341e-03]
 [ 5.8572795e-03  5.2563050e-03  5.6081782e-03 ... -2.4424936e-03
  -7.1421098e-03  6.9877715e-05]
 [ 2.3559784e-03  2.4909736e-03  8.3274571e-03 ...  3.3260849e-03
  -2.5877107e-03  4.3944325e-03]
 ...
 [ 5.5909844e-04 -4.5386464e-03 -1.1454150e-04 ... -1.4851980e-03
   2.2543040e-03 -2.2659467e-03]
 [ 2.9147640e-03 -2.4964965e-03 -5.4207887e-04 ... -1.6330150e-03
   1.8414885e-03  3.9077187e-03]
 [ 8.1501398e-03  3.9647724e-03  1.3153135e-03 ... -1.2689475e-03
   4.3814261e-03  4.4938019e-03]], shape=(100, 65), dtype=float32)
(65,)
tf.Tensor(
[ 0.00615219  0.00376061 -0.00083995  0.00180282  0.00013687  0.0022212
 -0.00454704 -0.00905489 -0.00017375  0.00349678 -0.00064473 -0.00045719
  0.00163089 -0.00304316 -0.00170851  0.00442004 -0.00225726  0.001726
  0.00239011  0.00727295 -0.00191833 -0.0019024   0.00155215 -0.0014876
 -0.00635435  0.00471327 -0

In [7]:
# Sample a predicted character for each time step:
sampled_indices = tf.random.categorical(pred, num_samples=1) # sample from the output distribution; always taking the most likely character can get the model stuck in a loop

# reshape and convert to integers
sampled_indices = np.reshape(sampled_indices, (1,-1))[0]
predicted_chars = int_to_text(sampled_indices)

predicted_chars

".CJtH'leWh!HB BNl\ngUZQtDp3TC&VlljYQQ&.&ORuhigyoIuA-cnY:y-JUnfUJSL!lYPdzP-FNHdSzDfRSm;I-l:3NjNtvViGl3"

In [8]:
def loss(labels, logits): # logits are the raw, unnormalized scores output by the model
    # use the built-in function to measure how far the predictions are from the labels
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

### Compile Model

In [9]:
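# labels are integer character ids, so use the sparse variant of the accuracy metric
model.compile(optimizer='adam', loss=loss, metrics=['sparse_categorical_accuracy'])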

#### Set Checkpoints
- allows the model to save checkpoints as it trains
- a checkpoint can be loaded later to continue training or to make predictions

In [12]:
# Directory to save checkpoints
checkpoint_dir = './training_checkpoints'
# name of files
checkpoint_prefix = os.path.join(checkpoint_dir, 'chpt_{epoch}')

checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_prefix, save_weights_only=True)

### Train Model

In [None]:
# tf.keras.backend.clear_session() # Resets the Keras global state (destroys the current graph and creates a new one)

In [11]:
history = model.fit(data, epochs=1, callbacks=[checkpoint_callback])



## Loading the Model
- Rebuild the model with a batch size of 1 so that a single piece of text can be fed to it for testing
- Then load the weights from the latest checkpoint into the rebuilt model

In [14]:
model = build_model(VOCAB_SIZE, EMBEDDING_DIM, RNN_UNITS, batch_size = 1)

model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
model.build(tf.TensorShape([1, None]))      # expect input of one and unknown second dimension (sequence length)

### Generate Text

In [13]:
def generate_text(model, start_string):
  # Evaluation step (generating text using the learned model)

  # Number of characters to generate
  num_generate = 800

  # Converting our start string to numbers (vectorizing)
  input_eval = [char2idx[s] for s in start_string]
  input_eval = tf.expand_dims(input_eval, 0)  # turns single list [1,2,3] into double list [[1,2,3]], as expected by model (single element batch)

  # Empty string to store our results
  text_generated = []

  # Low temperatures result in more predictable text.
  # Higher temperatures result in more surprising text.
  # Experiment to find the best setting.
  temperature = 1.0

  # Here batch size == 1
  model.reset_states()
  for i in range(num_generate):
      predictions = model(input_eval)

      # remove the batch dimension - turns shape (1, seq_len, vocab_size) into (seq_len, vocab_size)
      predictions = tf.squeeze(predictions, 0)

      # scale the logits by the temperature, then sample the next character from the resulting categorical distribution
      predictions = predictions / temperature
      predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()  # keep the sample for the last time step

      # We pass the predicted character as the next input to the model
      # along with the previous hidden state
      input_eval = tf.expand_dims([predicted_id], 0)

      text_generated.append(idx2char[predicted_id])

  return (start_string + ''.join(text_generated))
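
The effect of `temperature` can be seen on a small set of scores. The sketch below assumes three made-up logits and a hand-written softmax; dividing by the temperature before the softmax is the same scaling that `generate_text` applies before sampling.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale the logits by 1/temperature, then convert them to probabilities."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)                               # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]                            # hypothetical scores for three characters
cold = softmax_with_temperature(logits, 0.5)        # sharper: favors the top character
hot = softmax_with_temperature(logits, 2.0)         # flatter: choices are closer to uniform
print(max(cold) > max(hot))  # True
```

With a low temperature almost all of the probability goes to the top-scoring character, so sampling is predictable; a high temperature spreads probability across more characters, so the output is more varied.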

In [15]:
generate_text(model, "romeo")

"romeour, a sealt an wnos, theel'-fost derprcer apminging yo, wet andthe,\nWhest me wis not for sot devame, Bethitquaed rath.\n\nCARIUTI IUCISRE\nID:\nNave the praikit weat of int eanend.\n\nPatore,\nAnd nate and veikntowand,\n: mane not e emaligast bu ge a mes ore priathy thom, sull co murs, miks, what swacling fout you groukly minde youl lomelavi, fory thimy formy,\nand of will som wilg hissing meopt,\nIs nom bgenutndioks: my gaidessnted sond, my is itcengy, I withan; now hak' notheinth brilimdmy, by marderou; th thiallor wongs, whous taeringt\nAf phende bede yours hus cryoling sind dost abusith, of not in woth that in thagh\nI dazentive,\nHome, swe greead, Magiss,\nAnath not andereen'g tor bug thar suthts conest,\nI'd if sorou wuy pepe,\nEnd in you grond.\n\nDOMOGILUS:\nSitt s and hor Ry\nChan, ane fore cinty tho"