## Tensorflow Basic RNN Cell usage

In [1]:
import tensorflow as tf
import numpy as np
state_size = 37 # (parameter for the RNN cell; describes how wide the cell should be)
embedding_size = 27 #(parameter for embedding layer; describes the size of embedding for each token)

### Placeholder for input and target

In [2]:
ph_xs = tf.placeholder(shape=[None, None], dtype=tf.int32)
ph_ys = tf.placeholder(shape=[None, None], dtype=tf.int32)
ph_init_state = tf.placeholder(shape=[None, state_size], dtype=tf.float32, name="initial_state")
ph_batch_size = tf.placeholder(dtype=tf.int32)

### Data Acquisition
The data in the files were populated using the *Data_Provider* notebook. *train_x* and *train_y* are both arrays of shape: [10,200], where 200 is a fixed sequence length for this model and 10 is the total number of samples we have available. 

In [3]:
train_x = np.load("train_x.npy")
train_y = np.load("train_y.npy")

def get_train_batches(train_x, train_y, batch_size):
    for i in range(0, train_x.shape[0], batch_size):    
        yield train_x[i : i+batch_size], train_y[i : i+batch_size]

In [4]:
# Import some essential utilites from Data_Provider.py notebook
import pickle
vocab_to_int = pickle.load(open('vocab_to_int.txt','rb'))
int_to_vocab = pickle.load(open('int_to_vocab.txt','rb'))

### Embedding
Given an input data of shape: [batch_size, seq_length], rnn_inputs returns a tensor of shape: [batch_size, seq_length, embedding_size]. This is purely driven by our decision to use an embedding matrix to represent each token (characters). Otherwise, we would've had to one-hot encode each character, and modify our input layer. I'm not sure what the complexity of this task would need to be, but using embedding matrices seem to be the preferred approach in most NLP applications of neural network. I think it massively improves the underlying computational complexity as well.

In [5]:
# number of unique characters.. you'd normally do this by inspecting the data directly
num_classes = 83
embeddings = tf.Variable(initial_value=tf.random_normal(mean=0., stddev=0.1, shape=[num_classes,embedding_size]))
rnn_inputs = tf.nn.embedding_lookup(embeddings, ph_xs)

### Create the RNN layer using the Tensorflow BasicRNNCell API

In [6]:
rnn_cell = tf.contrib.rnn.BasicRNNCell(state_size)

In [7]:
initial_state = rnn_cell.zero_state(ph_batch_size, tf.float32)
outputs, final_state = tf.nn.dynamic_rnn(rnn_cell, rnn_inputs, initial_state=initial_state)

### Output from RNN Cell
- outputs:  [batch_size, Seq_length, state_size]
We need to collapse this to [batch_size x seq_length, state_size] to further process it.

- final_state:  [batch_size, state_size] 
The state is returned only for the *final* time-step. This is what we're interested in primarily during text generation.


In [8]:
outputs_reshaped = tf.reshape(outputs, [-1, state_size])

In [10]:
logits = tf.layers.dense(inputs=outputs_reshaped, units=num_classes, activation=None)
# The shape of predictions will be: [seqlen * batch_size, num_classes]
predictions = tf.nn.softmax(logits)

In [21]:
print(logits.shape.as_list())

[None, 83]


### Optimization
ph_ys is shaped as follows: [batch_size, seq_length]. In order to use it against logits obtained by multiplying hidden_states (states) and output matrix (V), we need to collapse it into a single dimension.

In [12]:
# basically flatted ys into a flat vector
ph_ys_reshaped = tf.reshape(ph_ys, shape=[-1])

In [22]:
print(ph_ys_reshaped.shape.as_list())

[None]


The tensorflow method: https://www.tensorflow.org/api_docs/python/tf/nn/sparse_softmax_cross_entropy_with_logits
is very interesting. It (the method) will basically look at (the integer representation of) each character and evaluate a one-hot representation based on its label. Thus, ph_ys_reshaped, of size [batch_size * seqlen] will effectively be unrolled into a vector of length [batch_size x seqlen x num_classes]. The method then uses this unrolled vector to evaluate the loss against *logits*

In [19]:
losses = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=ph_ys_reshaped, logits=logits)
loss = tf.reduce_mean(losses)
train_op = tf.train.AdamOptimizer(learning_rate=0.01).minimize(loss)

### Training

In [23]:
batch_size = 100 # Take 100 at a time. We have 10,000 needed, so 100 iterations per epochs
seq_length = 100 # each vector is fixed length of 100 (by inspecting data)
chkpt_path = "ckpts/"

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    train_loss = 0.
    n_epochs = 5
    for i in range(n_epochs):
        for xs, ys in get_train_batches(train_x, train_y, batch_size=100):
            _, train_loss_val = sess.run([train_op, loss], 
                                         feed_dict={ph_xs: xs, 
                                                    ph_ys: ys,
                                                    ph_init_state: np.zeros([batch_size, state_size]),
                                                    ph_batch_size: batch_size
                                                   })
        print("epochs: ",i, ", loss evaluated: ",train_loss_val)

    # Save the model at the end of the run.
    saver = tf.train.Saver()
    saver.save(sess, chkpt_path+"tensorflow_BasicRNNCell.ckpt", global_step=n_epochs)

epochs:  0 , loss evaluated:  2.32801
epochs:  1 , loss evaluated:  2.14393
epochs:  2 , loss evaluated:  2.05178
epochs:  3 , loss evaluated:  2.01087
epochs:  4 , loss evaluated:  1.98479


### Text Generation

In [24]:
def restore_session(sess):
    ckpt = tf.train.get_checkpoint_state(chkpt_path)
    
    saver = tf.train.Saver()
    if ckpt and ckpt.model_checkpoint_path:
        print("restoring model from ",ckpt.model_checkpoint_path)
        saver.restore(sess, ckpt.model_checkpoint_path)
    return sess

In [25]:
def printChars(chars):
    print('------- Generated Text ----------')
    print(''.join(str(c) for c in chars))
    print('-------      END        ---------')

In [26]:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess = restore_session(sess)
    
    current_char = vocab_to_int['g']
    num_chars = 100             # number of char sequence to try generate
    chars = [current_char]     # some input char to get started with text generation
    batch_size = 1             # generate 1 batch a time (i wonder if its possible to specify a larger batch)
    state = np.zeros([batch_size, state_size]) # initial state to start off with
    
    for i in range(num_chars):
        preds, state = sess.run([predictions, final_state], 
                                feed_dict={ph_xs: np.array(current_char).reshape([1,1]),
                                           ph_init_state: state,
                                           ph_batch_size: batch_size})
        current_char = np.random.choice(preds.shape[-1], 1, p=np.squeeze(preds))[0]
        chars.append(int_to_vocab[current_char])
    printChars(chars)

restoring model from  ckpts/tensorflow_BasicRNNCell.ckpt-5
------- Generated Text ----------
21 hotad y we ban'tr
alid ad bfoudaskf he o tinys anerere and abcho
ssktrhawhowhere hSinyesh" g a an w
-------      END        ---------
