# Recurrent Neural Networks (RNN)

Well suited to processing sequential data such as text

Commonly used for **natural language processing**

### Internal Loop
- A recurrent neural network does not process the entire input at once; it processes it one time step at a time
    - for text, feed one word at a time
- The model maintains an internal memory of what it has seen previously
- Types of RNN layers:

#### Simple RNN Layer
- Data is passed in as a sequence
- The network contains a recurrent layer that loops back to itself
- At each time step, the recurrent layer receives the current input together with its own output from the previous time step
    - At time step 0, the only input is the input data, x<sub>0</sub>, and it produces an output h<sub>0</sub>
    - At time step 1, the input to the recurrent layer is x<sub>1</sub> as well as h<sub>0</sub>, which produces an output h<sub>1</sub>
    - This repeats for every later time step
- Each time step builds on everything seen before
- Issue: for long sequences, the influence of the older inputs can be lost, because only the previous time step's output is fed back and it is transformed again at every step
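
The loop described above can be sketched in plain Python. This is a minimal illustration, not how Keras implements its layers: the update rule h<sub>t</sub> = tanh(W<sub>x</sub>x<sub>t</sub> + W<sub>h</sub>h<sub>t-1</sub> + b), the starting state of zeros, and the names `rnn_step`, `run_rnn`, `W_x`, `W_h` and `b` are all assumptions made for this example.

```python
import math

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One recurrent step: h_t = tanh(W_x . x_t + W_h . h_prev + b)."""
    h_t = []
    for i in range(len(b)):
        total = b[i]
        total += sum(W_x[i][j] * x_t[j] for j in range(len(x_t)))
        total += sum(W_h[i][j] * h_prev[j] for j in range(len(h_prev)))
        h_t.append(math.tanh(total))
    return h_t

def run_rnn(inputs, W_x, W_h, b):
    """Feed the inputs one time step at a time, passing each output back in."""
    h = [0.0] * len(b)      # assumed start: nothing has been seen before step 0
    outputs = []
    for x_t in inputs:
        h = rnn_step(x_t, h, W_x, W_h, b)
        outputs.append(h)
    return outputs

# Hypothetical weights: 1 input feature, 2 hidden units
W_x = [[0.5], [-0.3]]
W_h = [[0.1, 0.2], [0.0, 0.4]]
b = [0.0, 0.1]
for t, h in enumerate(run_rnn([[1.0], [2.0], [3.0]], W_x, W_h, b)):
    print(f"h_{t} = {h}")
```

Because h<sub>1</sub> is computed from h<sub>0</sub>, changing x<sub>0</sub> changes every later output, which is the sense in which each step builds on what came before.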

#### LSTM Layer
- Long Short-Term Memory
- Keeps a longer-term memory alongside the previous output, so the model can retain information from much earlier time steps

## Data

### Sequence Data
- Long chains of text, weather patterns, videos, or anything where the notion of a time step is relevant
- The order of the data is important to keep track of

### Textual Data
- A type of sequence data
- need to encode the text into numerical data that can be fed to the neural network
- There are different methods of doing this:

    #### Bag of Words
    - Look at the entire training dataset and create a dictionary of the vocabulary
        - every unique word is part of the vocabulary
        - each word is represented by an integer
    - keep track of the frequency of each word in a sentence
    - flawed method because the order of words is lost; only the words and their frequencies are kept
    
    #### Word Embedding
    - Tries to represent similar words with similar vectors
    - represents each word as an n-dimensional vector (usually 64 or 128 dimensions)
        - the vector tells how similar the word is to other words
        - the words "good" and "happy" will be represented by vectors with a small angle between them
        - words with opposite meanings will have very different vectors
    - word embedding is implemented as a layer in the neural network
        - the model learns word embeddings through the context of the words in the sentence
    - can use pretrained word embedding layers
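
The two encodings can be contrasted in a short sketch. Everything here is illustrative: `toy_vocab` is a hand-written vocabulary, and the vectors in `toy_embeddings` are hand-picked 2-D values rather than learned embeddings (real ones would come from a trained embedding layer).

```python
import math
from collections import Counter

def bag_of_words(sentence, vocab):
    """Count how often each vocabulary id occurs; word order is discarded."""
    return Counter(vocab[w] for w in sentence.lower().split() if w in vocab)

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Bag of words: two sentences with different meanings get the same encoding
toy_vocab = {"the": 0, "dog": 1, "bit": 2, "man": 3}
first = bag_of_words("the dog bit the man", toy_vocab)
second = bag_of_words("the man bit the dog", toy_vocab)
print(first == second)  # True

# Word embedding: similar words point in similar directions
toy_embeddings = {"good": [0.9, 0.8], "happy": [0.8, 0.9], "bad": [-0.9, -0.7]}
print(cosine_similarity(toy_embeddings["good"], toy_embeddings["happy"]))  # close to 1
print(cosine_similarity(toy_embeddings["good"], toy_embeddings["bad"]))    # close to -1
```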

## Sentiment Analysis
Analyze how positive or negative a piece of text is

### Movie Review Dataset
- IMDB movie review dataset from keras
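- contains 50,000 movie reviews, split into 25,000 for training and 25,000 for testing
- reviews are preprocessed and labeled as either positive or negative
    - each review is encoded as a list of integers that represent how common each word is in the entire dataset
    - with the default `index_from=3` of `imdb.load_data`, ids 0 to 2 are reserved (padding, start of review, unknown word), so a word encoded by integer 4 is the most common word in the dataset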
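
To see how the integer encoding relates to words, here is a minimal decoding sketch. It assumes the Keras defaults for `imdb.load_data` (`index_from=3`, with 0 for padding, 1 for start of review, 2 for unknown words); `decode_review` and `toy_word_index` are hypothetical names, and the toy dictionary only imitates the shape of what `imdb.get_word_index()` returns (word to frequency rank).

```python
def decode_review(ids, word_index, index_from=3):
    """Map encoded ids back to words by undoing the `index_from` offset."""
    rank_to_word = {rank: word for word, rank in word_index.items()}
    special = {0: "<pad>", 1: "<start>", 2: "<unk>"}   # assumed reserved ids
    words = []
    for i in ids:
        if i in special:
            words.append(special[i])
        else:
            words.append(rank_to_word.get(i - index_from, "<unk>"))
    return " ".join(words)

# Hypothetical stand-in for imdb.get_word_index()
toy_word_index = {"the": 1, "movie": 2, "great": 3}
print(decode_review([0, 0, 1, 4, 5, 6], toy_word_index))
# <pad> <pad> <start> the movie great
```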

In [None]:
import tensorflow as tf
from tensorflow.keras.preprocessing import sequence
import os
import numpy as np

In [None]:
# Load data
from tensorflow.keras.datasets import imdb

VOCAB_SIZE = 88584      # Number of unique words in this dataset

MAX_LEN = 250           # Max word length of review we will use

(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=VOCAB_SIZE) # include all of the words

### Preprocessing
- All samples need to have the same length in words
- if a review is longer than 250 words, trim off the extra words (with its default settings, `pad_sequences` drops them from the start of the review)
- if a review is shorter than 250 words, add 0s to make it 250 (padding to the left)
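
The effect of padding and trimming can be shown without TensorFlow. This sketch imitates the default behaviour of `pad_sequences` (`padding='pre'`, `truncating='pre'`) for a single review; `pad_or_trim` is a hypothetical helper and assumes `max_len` is at least 1.

```python
def pad_or_trim(tokens, max_len, pad_value=0):
    """Return exactly max_len tokens: keep the last max_len, pad on the left."""
    tokens = list(tokens)[-max_len:]                    # drop extra tokens from the start
    return [pad_value] * (max_len - len(tokens)) + tokens

print(pad_or_trim([7, 8, 9], 5))            # [0, 0, 7, 8, 9]
print(pad_or_trim([1, 2, 3, 4, 5, 6], 4))   # [3, 4, 5, 6]
```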

In [None]:
train_data = sequence.pad_sequences(train_data, MAX_LEN)
test_data = sequence.pad_sequences(test_data, MAX_LEN)

### Create the Model
- First layer is the embedding layer, which learns a meaningful vector representation for each word id
- Second layer is the LSTM layer
- Third layer is a Dense classification layer with a sigmoid activation, which gives the probability that the review is positive
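
As a rough illustration of the final layer, the sigmoid squashes any score into the range 0 to 1, which can then be thresholded into a label. The names `sigmoid` and `label_from_score` and the 0.5 threshold are assumptions for this sketch; the input scores are made up, not model outputs.

```python
import math

def sigmoid(z):
    """Squash a real-valued score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def label_from_score(z, threshold=0.5):
    """Turn a raw score into a (label, probability) pair."""
    p = sigmoid(z)
    return ("Positive" if p >= threshold else "Negative"), p

print(label_from_score(2.0))    # label is "Positive", probability above 0.5
print(label_from_score(-2.0))   # label is "Negative", probability below 0.5
```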

In [None]:
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 32),      # Embedding layer - each word is represented as a 32-dimensional vector
    tf.keras.layers.LSTM(32),                       # LSTM layer with 32 units - receives one 32-dimensional vector per word
    tf.keras.layers.Dense(1, activation='sigmoid')  # One output neuron
])

model.summary()

### Train Model

In [None]:
model.compile(
    loss="binary_crossentropy",
    optimizer="rmsprop",
    metrics=['accuracy']
)

history = model.fit(
    train_data, 
    train_labels, 
    epochs=10, 
    validation_split=0.2    # Validate with 20% of data
)

In [None]:
# Test model
results = model.evaluate(test_data, test_labels)
print(f"Accuracy: {results[1]*100:0.2f}%")

### Make Prediction
- Any new review must be preprocessed and encoded in the same way as the training data

In [None]:
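word_index = imdb.get_word_index()  # maps each word to its frequency rank (1 = most common)
INDEX_FROM = 3  # imdb.load_data() shifts every rank by 3 by default (0 = padding, 1 = start, 2 = unknown)
OOV_ID = 2      # id used for words outside the vocabulary

def encode_text(text):
    words = tf.keras.preprocessing.text.text_to_word_sequence(text)
    # apply the same offset as the training data; unknown or out-of-range words become OOV_ID
    ids = [word_index.get(word, -1) + INDEX_FROM for word in words]
    tokens = [i if INDEX_FROM < i < VOCAB_SIZE else OOV_ID for i in ids]
    return sequence.pad_sequences([tokens], MAX_LEN)[0] # pad_sequences returns a list of sequences, take the first one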

def predict(text):
    encoded_text = encode_text(text)
    pred = np.zeros((1, MAX_LEN))   # model expects a batch: 1 review of MAX_LEN (250) words
    pred[0] = encoded_text
    result = model.predict(pred)[0][0]  # sigmoid output: probability that the review is positive
    if result >= 0.5:
        print("Predicted: Positive")
    else:
        print("Predicted: Negative")

review = "That movie was so awesome! I really loved it and would watch it again because it was amazingly great"
predict(review)

review = "That movie sucked. I hated it and wouldn't watch it again. Was one of the worst things I've ever watched"
predict(review)
    

## Character Generation
Generate the next characters in a sequence of text

### RNN Play Generator
- Show the neural network an example of something we want it to create, until it learns to write it itself
- Use a character-level predictive model that takes a variable-length input sequence and predicts the next character
- Calling the model many times in a row, feeding each prediction back in as part of the next input, generates a sequence
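
The generation loop can be sketched independently of any trained model. In this minimal example `generate` accepts any function that predicts the next character; `toy_predict_next` is a hypothetical stand-in that simply cycles through "abc", where a real setup would call the trained network instead.

```python
def generate(seed, predict_next, num_chars):
    """Repeatedly predict one character and append it to the running text."""
    text = seed
    for _ in range(num_chars):
        text += predict_next(text)      # the previous output becomes part of the next input
    return text

def toy_predict_next(text):
    """Stand-in predictor: returns the letter after the last one in the cycle a -> b -> c -> a."""
    cycle = "abc"
    return cycle[(cycle.index(text[-1]) + 1) % len(cycle)]

print(generate("a", toy_predict_next, 5))   # abcabc
```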

### Dataset
- A text file of Shakespeare's writing (`shakespeare.txt`), downloaded from TensorFlow's hosted example data

In [None]:
from tensorflow.keras.preprocessing import sequence
import tensorflow.keras
import tensorflow as tf
import os
import numpy as np
import requests

# Load dataset
response = requests.get("https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt")
text = response.content.decode(encoding='utf-8')

### Encoding
Encode each character in text with an integer

In [None]:
vocab = sorted(set(text))   # get unique characters

# Create encode mapping

char2idx = {u:i for i,u in enumerate(vocab)}
idx2char = np.array(vocab)

# Convert text to integer encoding

def text_to_int(text):
    return np.array([char2idx[c] for c in text])

text_as_int = text_to_int(text)
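
To check that an encoding like this is reversible, it helps to decode back to text. The sketch below repeats the same idea on a tiny string with plain Python lists; the `toy_` names and `int_to_text` are hypothetical and kept separate from the notebook variables above.

```python
toy_text = "hello world"
toy_vocab = sorted(set(toy_text))                       # unique characters, sorted
toy_char2idx = {ch: i for i, ch in enumerate(toy_vocab)}
toy_idx2char = toy_vocab                                # position i holds character i

def toy_text_to_int(s):
    return [toy_char2idx[c] for c in s]

def int_to_text(ids):
    """Inverse of the encoding: look each id up and join the characters."""
    return "".join(toy_idx2char[i] for i in ids)

encoded = toy_text_to_int("hello")
print(encoded)                 # [3, 2, 4, 4, 5]
print(int_to_text(encoded))    # hello
```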

### Create Training Examples

Need to split text into shorter sequences to pass to model as training examples

The input is a sequence of length *n* and the target is also a sequence of length *n*: the input shifted by one character, so the target at each position is the next character
- EX: input: Hell -> output: ello
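
The shift can be tried on plain strings first. This sketch assumes the text is cut into non-overlapping chunks of `seq_len + 1` characters with the leftover dropped; `shift_by_one` and `make_examples` are hypothetical helper names.

```python
def shift_by_one(chunk):
    """Input is everything but the last character, target everything but the first."""
    return chunk[:-1], chunk[1:]

def make_examples(text, seq_len):
    """Cut text into chunks of seq_len + 1 characters and split each into (input, target)."""
    step = seq_len + 1
    chunks = [text[i:i + step] for i in range(0, len(text) - step + 1, step)]
    return [shift_by_one(c) for c in chunks]

print(shift_by_one("Hello"))              # ('Hell', 'ello')
print(make_examples("Hello world!", 4))   # [('Hell', 'ello'), (' wor', 'worl')]
```

Each chunk needs one extra character because an input of length *n* plus a target shifted by one covers *n* + 1 characters in total.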

In [None]:
seq_len = 100   # length of each training example
examples_per_epoch = len(text) // (seq_len + 1)     # each example consumes seq_len + 1 characters
BATCH_SIZE = 64
VOCAB_SIZE = len(vocab)
EMBEDDING_DIM = 256     # Dimensions of the embedding vector for each character
RNN_UNITS = 1024        # Number of units in the LSTM layer
BUFFER_SIZE = 10000     # Buffer to use during shuffling

char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)  # create character dataset from text as integer

# Group the characters into chunks of 101 (seq_len + 1), dropping the leftover text at the end
sequences = char_dataset.batch(seq_len+1, drop_remainder=True)

def split_input_target(chunk):
    input_text = chunk[:-1]     # All but the last character (positions 0-99)
    target_text = chunk[1:]     # All but the first character (positions 1-100) - the values to predict
    return input_text, target_text

dataset = sequences.map(split_input_target)     # split each entry in dataset

# Make Batches for final training sequence
data = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

### Building the Model
Use an embedding layer, an LSTM layer, and a dense layer that contains one node for each unique character the model can choose from
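
Since the dense layer produces one score per character, choosing a character amounts to comparing those scores. The sketch below turns made-up scores into probabilities with a softmax and picks the most likely character; `toy_chars` and `toy_scores` are illustrative values, and always taking the highest score is only one possible way to choose.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)                                 # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

toy_chars = ["a", "b", "c"]         # one entry per output node
toy_scores = [0.5, 2.0, -1.0]       # made-up dense-layer outputs
probs = softmax(toy_scores)
best = toy_chars[probs.index(max(probs))]
print(best)     # b
```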

In [None]:
def built_model(vocab_size, embedding_dim, rnn_units, batch_size):
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(
            vocab_size, 
            embedding_dim, 
            batch_input_shape=[batch_size, None]        # batch_size sequences of unspecified (None) length, so inputs of any length can be used when predicting later
        ),
        tf.keras.layers.LSTM(
            rnn_units, 
            return_sequences=True,      # Return the output at every time step, not just the final one
            stateful=True,              # Carry the LSTM state over from one batch to the next
            recurrent_initializer='glorot_uniform'  # Initializer for the recurrent weights
        ),
        tf.keras.layers.Dense(vocab_size)   # One output per character in the vocabulary
    ])
    return model

model = built_model(VOCAB_SIZE, EMBEDDING_DIM, RNN_UNITS, BATCH_SIZE)
model.summary()