## **Language Modeling with an RNN Using a Low-Level Implementation**
> Here we implement a character-level RNN. The program has two parts:
*  Custom training loop
*  Independent text generation

Our design choices for this implementation are:
> * Synchronous RNN architecture
* Hidden-to-hidden connection
* Sequence length of 200 characters
* Batch size of 128
* 512 hidden units
* 30 epochs
* Softmax cross-entropy loss function
* Backpropagation through the entire sequence
* Adam optimizer

We have not deviated from what was prescribed in the assignment.

For the generation part, we sample each character at random from the output probability distribution (softmax) rather than taking the argmax, because argmax cannot generate varied text: it produces the same output for the same input every time. The sampled character is then passed as the input for the next time step. We generate 1000 characters.
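
The difference between greedy (argmax) selection and sampling from the softmax distribution can be illustrated with a small standalone sketch. It uses only the Python standard library; the logits and the helper names `softmax`, `pick_argmax` and `pick_sampled` are made up for illustration and are not part of the notebook code.

```python
import math
import random

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pick_argmax(probs):
    """Greedy choice: always the most likely index."""
    return max(range(len(probs)), key=lambda i: probs[i])

def pick_sampled(probs, rng):
    """Stochastic choice: index i is drawn with probability probs[i]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.5]   # made-up scores for a 3-character vocabulary
probs = softmax(logits)
rng = random.Random(0)     # seeded only to make the demo repeatable

greedy = [pick_argmax(probs) for _ in range(10)]
sampled = [pick_sampled(probs, rng) for _ in range(10)]
print(greedy)   # the same index on every draw
print(sampled)  # indices drawn in proportion to probs; can differ between draws
```

With argmax, the next character is a deterministic function of the current state, so the generated text can never vary between runs. Sampling keeps likely characters likely while still allowing variation.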

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
import os
os.chdir('/content/drive/My Drive/assign5/')
os.getcwd()

'/content/drive/My Drive/assign5'

In [None]:
!python prepare_data.py Input.txt skp

2020-05-25 02:04:48.065686: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
Split input into 22981 sequences...
Serialized 100 sequences...
Serialized 200 sequences...
Serialized 300 sequences...
Serialized 400 sequences...
Serialized 500 sequences...
Serialized 600 sequences...
Serialized 700 sequences...
Serialized 800 sequences...
Serialized 900 sequences...
Serialized 1000 sequences...
Serialized 1100 sequences...
Serialized 1200 sequences...
Serialized 1300 sequences...
Serialized 1400 sequences...
Serialized 1500 sequences...
Serialized 1600 sequences...
Serialized 1700 sequences...
Serialized 1800 sequences...
Serialized 1900 sequences...
Serialized 2000 sequences...
Serialized 2100 sequences...
Serialized 2200 sequences...
Serialized 2300 sequences...
Serialized 2400 sequences...
Serialized 2500 sequences...
Serialized 2600 sequences...
Serialized 2700 sequences...
Serialized 2800 sequences...
Serialized 2900

Preparing the dataset

In [None]:
from prepare_data import parse_seq
import pickle
import tensorflow as tf
import pandas as pd
import numpy as np

# this is just a dataset of raw serialized "bytes" (not yet human-readable)
data = tf.data.TFRecordDataset("skp.tfrecords")

# map a parser function over the dataset that properly interprets the bytes
# (with fixed sequence length 200)
# if you change the sequence length in preprocessing you also need to change it here
data = data.map(lambda x: parse_seq(x, 200))

# a map from characters to indices
with open("skp_vocab", mode="rb") as f:
  vocab = pickle.load(f)
vocab_size = len(vocab)
# inverse mapping: indices to characters
ind_to_ch = {ind: ch for (ch, ind) in vocab.items()}
print(vocab)
print(vocab_size)

{'Q': 1, 'l': 2, 'y': 3, '?': 4, 'S': 5, 'B': 6, 'W': 7, 'n': 8, 'f': 9, 'z': 10, 'A': 11, 'q': 12, 'G': 13, "'": 14, 'Z': 15, 'P': 16, 'o': 17, ':': 18, 'E': 19, ',': 20, 'Y': 21, ']': 22, 'x': 23, 'R': 24, '[': 25, 'I': 26, 'L': 27, 'F': 28, 't': 29, 'd': 30, ' ': 31, '-': 32, 'c': 33, 'e': 34, 'N': 35, 'u': 36, 'a': 37, 'T': 38, 'X': 39, 'j': 40, 'H': 41, '&': 42, '3': 43, 'h': 44, 'V': 45, 'v': 46, '$': 47, 'U': 48, 'g': 49, 'C': 50, 'D': 51, 's': 52, 'p': 53, 'i': 54, '\n': 55, ';': 56, 'm': 57, 'b': 58, 'M': 59, 'r': 60, 'O': 61, 'w': 62, 'k': 63, 'K': 64, '.': 65, '!': 66, 'J': 67, '<S>': 0}
68


Low-level implementation of the RNN architecture with a custom training loop using `tf.GradientTape`. The trained weights are saved to disk after training.
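
The computation carried out per batch by the training cell that follows can be summarised as the recurrence below. This is a sketch read off the code rather than an official formulation: $x_t$ is the one-hot input at step $t$, $y_t$ is the index of the next character, and $h_{-1}$ is the initial hidden state. The symbols $B$ (batch size, 128), $T$ (sequence length, 200), $p_t$ and $\mathcal{L}$ are notation introduced here for illustration.

$$
\begin{aligned}
h_t &= \tanh\left(x_t W_{xh} + h_{t-1} W_{hh} + b_h\right) \\
o_t &= h_t W_{ho} + b_o \\
p_t &= \operatorname{softmax}(o_t) \\
\mathcal{L} &= \sum_{t=0}^{T-2} \frac{1}{B} \sum_{n=1}^{B} -\log p_t^{(n)}\left[y_t^{(n)}\right]
\end{aligned}
$$

Because the loss is summed over all $T-1$ steps before differentiating, the gradient flows back through the entire sequence.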

In [None]:
def rnn(learning_rate, seqlen, epochs):
  hidden_size = 512
  # pass the learning rate through to Adam (0.001 is also the Keras default)
  opt = tf.keras.optimizers.Adam(learning_rate=learning_rate)
  # small random initial weights, zero biases
  W_xh = tf.Variable((np.random.randn(vocab_size, hidden_size)).astype('float32')*0.01)
  W_hh = tf.Variable((np.random.randn(hidden_size,hidden_size)).astype('float32')*0.01)
  W_ho = tf.Variable((np.random.randn(hidden_size,vocab_size)).astype('float32')*0.01)
  b_h = tf.Variable(np.zeros((1, hidden_size), dtype= 'float32'))
  b_o = tf.Variable(np.zeros((1, vocab_size), dtype= 'float32'))
  # fixed (untrained) initial hidden state; broadcasts over the batch dimension
  h_init = tf.constant((np.random.randn(1,hidden_size)).astype('float32')*0.01)
  params = [W_xh, W_hh, W_ho, b_h, b_o]
  for epoch in range(epochs):
    batch_count = 0
    for batch_x in data.shuffle(100000).batch(128, drop_remainder=True):
      print("new Batch",batch_count)
      x_seq = batch_x
      # one-hot encode the whole batch once: (batch, seqlen, vocab_size)
      x1 = tf.one_hot(x_seq, vocab_size)
      seq_loss = 0
      # restart from the initial hidden state for every batch, so that no
      # state leaks between unrelated (shuffled) batches
      h = h_init
      with tf.GradientTape() as tape:
        for i in range(1,seqlen):
          time_step = i-1
          # the target is the character following the current input
          y = x_seq[:,time_step+1]
          y1 = tf.one_hot(y, vocab_size)
          x_seq_t = x1[:,time_step,:]
          h = tf.nn.tanh(tf.matmul(x_seq_t, W_xh)+tf.matmul(h, W_hh)+b_h)
          logits = tf.matmul(h, W_ho) + b_o
          loss_char = tf.nn.softmax_cross_entropy_with_logits(labels = y1, logits = logits)
          # mean over the batch, summed over time steps
          seq_loss += tf.reduce_mean(loss_char)
      # backpropagate through the entire sequence
      grads = tape.gradient(seq_loss, params)
      opt.apply_gradients(zip(grads, params))
      # (plain SGD alternative: p.assign_sub(g * learning_rate) for each parameter)
      # note: the loop runs seqlen-1 steps, so dividing by seqlen slightly
      # underestimates the per-step loss
      print("Epoch :{}, Average error: {} ".format(epoch,seq_loss/seqlen))
      batch_count +=1
  # store plain NumPy arrays; dtype=object because the shapes differ
  Trained_weights = np.array([p.numpy() for p in params], dtype=object)
  np.save('rnn_weights', Trained_weights)
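
The training loop pairs each character with the one that follows it, so a sequence of length `seqlen` yields `seqlen - 1` prediction steps. A minimal sketch of that pairing in plain Python, using a toy vocabulary and a helper `next_char_pairs` that are hypothetical and only serve as illustration:

```python
def next_char_pairs(indices):
    """Return (input, target) index pairs: the target at step t is the input at step t + 1."""
    return [(indices[t], indices[t + 1]) for t in range(len(indices) - 1)]

# Toy vocabulary, hypothetical and much smaller than the real one.
toy_vocab = {"<S>": 0, "h": 1, "e": 2, "l": 3, "o": 4}
encoded = [toy_vocab[ch] for ch in ["<S>", "h", "e", "l", "l", "o"]]

pairs = next_char_pairs(encoded)
print(pairs)  # → [(0, 1), (1, 2), (2, 3), (3, 3), (3, 4)]
```

For the sequence length of 200 used here, this gives 199 input/target pairs per sequence.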

In [None]:
rnn(0.001, 200, 30)

Streaming output truncated to the last 5000 lines.
Epoch :16, Average error: 1.7343926429748535 
new Batch 7
Epoch :16, Average error: 1.7362278699874878 
new Batch 8
Epoch :16, Average error: 1.7298883199691772 
new Batch 9
Epoch :16, Average error: 1.7321975231170654 
new Batch 10
Epoch :16, Average error: 1.7282882928848267 
new Batch 11
Epoch :16, Average error: 1.7484843730926514 
new Batch 12
Epoch :16, Average error: 1.7142646312713623 
new Batch 13
Epoch :16, Average error: 1.719982624053955 
new Batch 14
Epoch :16, Average error: 1.7411004304885864 
new Batch 15
Epoch :16, Average error: 1.7304999828338623 
new Batch 16
Epoch :16, Average error: 1.7120027542114258 
new Batch 17
Epoch :16, Average error: 1.7301477193832397 
new Batch 18
Epoch :16, Average error: 1.7493270635604858 
new Batch 19
Epoch :16, Average error: 1.7352381944656372 
new Batch 20
Epoch :16, Average error: 1.7105247974395752 
new Batch 21
Epoch :16, Average error: 1.7346972227096558 
new Batc

In [None]:
# Text generation code
weights = np.load('rnn_weights.npy', allow_pickle = True)
[W_xh, W_hh, W_ho, b_h, b_o] = weights
h = tf.constant((np.random.randn(1,512)).astype('float32')*0.01)
# start from the one-hot vector of the start token '<S>' (index 0)
char = np.zeros((1,vocab_size),dtype= 'float32')
char[0][0] = 1
finalString = ""

for i in range(1000):
  h = tf.nn.tanh(tf.matmul(char, W_xh)+tf.matmul(h, W_hh)+b_h)
  logits = tf.matmul(h, W_ho)+b_o
  # numerically stable softmax; renormalize in float64 so the probabilities sum to 1
  prob = tf.nn.softmax(logits).numpy().ravel().astype('float64')
  prob /= prob.sum()
  # Random choice of output character based on output discrete probability
  idx = np.random.choice(vocab_size, p=prob)
  # feed the sampled character back in as the next input
  char = np.zeros((1,vocab_size),dtype= 'float32')
  char[0][idx] = 1
  op = ind_to_ch[idx]
  finalString += op
print(finalString)

AD your unfeid
With kick and word, I come shorted spirith,
Hush te pue bowing them tham suppoud within his hand.

EMILIA:
There is anymudes, yourmed foul will faids a mortliss in a ridss:
As seek in goods, or she was far sire; all thing for daring forms, and go here
I'll come an homity.

KING EDWARD:

ICLUSDO:
I'll mus, that hady Marduse then with gured your heart as as the solding with truth them

EPHELUS:
He hath your monise,' it works or all rendex acy'd as doested or my abty'd her shame.

BOOTAND:

CASTARD:
I, my termand; you would beft,
And pray ady ords. Turden,
You ir, that the much againfessers, forsalians
A selold,
That it a were,
This adven,
And for your himest
Wither, and so thou me wray you.

ARLIAN:
What, thou thristed of his mansty you!

KATRICE:
Which to the cockence he dozands I much
ascit curl.
Then evore to be slasen'd Richard be betait?

LUCIO:
Come
I'wn Came that you never-way;
And I have there, as too; but then arm blairs no more orsuloes and stambe, o'
Clare man y