### Introduction
TODO: Explain a bit of sentence generation

#### What are RNNs

A feed-forward network maps an input $x = \{x_1, x_2, \dots, x_n\}$ to an output $y = \{y_1, y_2, \dots, y_n\}$, which lets us predict $y$ from $x$. If $x$ is a time series, however, all the events $x_1, \dots, x_{t-1}$ led to $x_t$, and a feed-forward network has no way to carry that history forward. What we want to model is a function of the form

$
h_t = g_1(x_t, h_{t-1};\theta) \quad\text{and}\quad y_t = g_2(h_t;\gamma)
$

A feed-forward network cannot model this recurrence, which is why we need RNNs.

#### Basic structure of an RNN

One cell of an RNN takes an input `x` and generates an output `y`. The intermediate state `h` is computed from `x` and from the intermediate state of the previous cell. The output `y` of the current cell is then computed from the current `h`.

Assume the input has dimension `d`, the hidden state `h` has dimension `s`, and there are `c` output classes. We then have three sets of weights:

- `w_1`: has dimension $s \times d$ and holds the weights between the input and the hidden state
- `w_2`: has dimension $s \times s$ and holds the weights between the previous hidden state and the current hidden state
- `w_3`: has dimension $c \times s$ and holds the weights between the current hidden state and the current output


Using these weights, we have

$h_t = g_1(w_1\cdot x_t + w_2\cdot h_{t-1}) \quad\text{and}\quad y_t = g_2(w_3\cdot h_t)$

where $g_1$ and $g_2$ are some activation functions.
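
As a rough illustration of these two equations, here is a minimal NumPy sketch of a single cell applied over a short sequence. The choices of `tanh` for $g_1$ and softmax for $g_2$, the function name `rnn_step`, and the sizes `d = 4`, `s = 3`, `c = 5` are assumptions made for this example only; the notebook does not fix them.

```python
import numpy as np

def rnn_step(x_t, h_prev, w_1, w_2, w_3):
    """Apply one RNN cell and return (h_t, y_t)."""
    # g_1 is assumed to be tanh
    h_t = np.tanh(w_1 @ x_t + w_2 @ h_prev)
    # g_2 is assumed to be a softmax over the c output classes
    logits = w_3 @ h_t
    exp = np.exp(logits - logits.max())
    y_t = exp / exp.sum()
    return h_t, y_t

d, s, c = 4, 3, 5  # illustrative sizes only
rng = np.random.default_rng(0)
w_1 = rng.normal(scale=0.1, size=(s, d))  # input -> hidden
w_2 = rng.normal(scale=0.1, size=(s, s))  # previous hidden -> hidden
w_3 = rng.normal(scale=0.1, size=(c, s))  # hidden -> output

h = np.zeros(s)                # initial hidden state
xs = rng.normal(size=(6, d))   # a toy sequence of 6 inputs
outputs = []
for x_t in xs:
    # the same three weight matrices are reused at every time step
    h, y = rnn_step(x_t, h, w_1, w_2, w_3)
    outputs.append(y)

print(h.shape, len(outputs), outputs[0].shape)
```

The point to notice is that `h` is threaded through the loop: each step sees the current input and whatever the previous steps left in the hidden state.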


#### Backpropagation in RNN

Backpropagation as used in a feed-forward network does not carry over directly to an RNN. The difficulty shows up when we compute the partial derivative of the loss `L` with respect to $w_2$.

Suppose the true label is `l`. By the chain rule, the derivative is

$\frac{\partial{L}}{\partial{w_2}} = \frac{\partial{L}}{\partial{y_t}} \frac{\partial{y_t}}{\partial{h_t}} \frac{\partial{h_t}}{\partial{w_2}}$

The term $\frac{\partial{h_t}}{\partial{w_2}}$ is tricky because $\frac{\partial{h_t}}{\partial{w_2}} = \frac{\partial{g_1(w_1\cdot x_t + w_2\cdot h_{t-1})}}{\partial{w_2}}$, and $h_{t-1}$ itself depends on $w_2$ through $h_{t-2}$, and so on back to the first step. The derivative is therefore recursive.

The usual solution is to truncate the derivative after some T steps back in time rather than following it all the way to the beginning (truncated backpropagation through time).
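
To make the recursion and its truncation concrete, the sketch below uses a scalar RNN with $g_1 = \tanh$, so that $\frac{\partial h_t}{\partial w_2} = (1 - h_t^2)\,(h_{t-1} + w_2 \frac{\partial h_{t-1}}{\partial w_2})$. The scalar setting, the helper names `run` and `dh_dw2`, and the sample values are all assumptions chosen for illustration; they are not part of the model built later.

```python
import math

def run(xs, w_1, w_2):
    """Return the hidden states h_0..h_n of a scalar RNN with g_1 = tanh."""
    hs = [0.0]  # h_0 is a fixed initial state
    for x in xs:
        hs.append(math.tanh(w_1 * x + w_2 * hs[-1]))
    return hs

def dh_dw2(xs, w_1, w_2, steps_back=None):
    """d h_n / d w_2, following the recursion at most `steps_back` steps."""
    hs = run(xs, w_1, w_2)
    n = len(xs)
    start = 1 if steps_back is None else max(1, n - steps_back + 1)
    grad = 0.0  # the state just before `start` is treated as a constant
    for t in range(start, n + 1):
        local = 1.0 - hs[t] ** 2                  # derivative of tanh
        grad = local * (hs[t - 1] + w_2 * grad)   # direct term + recursive term
    return grad

xs = [0.5, -0.3, 0.8, 0.1, -0.6, 0.4]
w_1, w_2 = 0.7, 0.9

full = dh_dw2(xs, w_1, w_2)                      # all the way to the beginning
truncated = dh_dw2(xs, w_1, w_2, steps_back=2)   # only T = 2 steps back

# Central finite difference as an independent check of the full gradient
eps = 1e-6
numeric = (run(xs, w_1, w_2 + eps)[-1] - run(xs, w_1, w_2 - eps)[-1]) / (2 * eps)

print('full     ', full)
print('numeric  ', numeric)
print('truncated', truncated)
```

The full recursion should agree with the finite-difference estimate, while the truncated value is only an approximation that ignores contributions from steps further back than T.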

#### Implementing sentence generation in TensorFlow using an RNN

First we will download the corpus from [https://www.cs.cmu.edu/~spok/grimmtmp/](https://www.cs.cmu.edu/~spok/grimmtmp/)


In [13]:
import os
from urllib.request import urlretrieve

def maybe_download(target_url, target_dir, target_file):
    if not os.path.exists(target_dir):
        os.mkdir(target_dir)
     
    target_file = os.path.join(target_dir, target_file)
    if os.path.exists(target_file):
        print('File', target_file, 'exists, skipping download')
    else:
        print('Downloading from', target_url, 'to', target_file)
        urlretrieve(target_url, target_file)
        
        
target_dir = 'fairytales'
base_url = 'https://www.cs.cmu.edu/~spok/grimmtmp/'

for i in range(1, 101):
    file_name = format(i, '03d') + '.txt'
    maybe_download(base_url +  file_name, target_dir, file_name)


Downloading from https://www.cs.cmu.edu/~spok/grimmtmp/001.txt to fairytales/001.txt
Downloading from https://www.cs.cmu.edu/~spok/grimmtmp/002.txt to fairytales/002.txt
Downloading from https://www.cs.cmu.edu/~spok/grimmtmp/003.txt to fairytales/003.txt
Downloading from https://www.cs.cmu.edu/~spok/grimmtmp/004.txt to fairytales/004.txt
Downloading from https://www.cs.cmu.edu/~spok/grimmtmp/005.txt to fairytales/005.txt
Downloading from https://www.cs.cmu.edu/~spok/grimmtmp/006.txt to fairytales/006.txt
Downloading from https://www.cs.cmu.edu/~spok/grimmtmp/007.txt to fairytales/007.txt
Downloading from https://www.cs.cmu.edu/~spok/grimmtmp/008.txt to fairytales/008.txt
Downloading from https://www.cs.cmu.edu/~spok/grimmtmp/009.txt to fairytales/009.txt
Downloading from https://www.cs.cmu.edu/~spok/grimmtmp/010.txt to fairytales/010.txt
Downloading from https://www.cs.cmu.edu/~spok/grimmtmp/011.txt to fairytales/011.txt
Downloading from https://www.cs.cmu.edu/~spok/grimmtmp/012.txt to

Downloading from https://www.cs.cmu.edu/~spok/grimmtmp/098.txt to fairytales/098.txt
Downloading from https://www.cs.cmu.edu/~spok/grimmtmp/099.txt to fairytales/099.txt
Downloading from https://www.cs.cmu.edu/~spok/grimmtmp/100.txt to fairytales/100.txt


Next we read every file in the directory as a lowercase string and split it into bigrams, which here means consecutive, non-overlapping pairs of characters.

In [42]:
def read_as_bigrams(file_name):
    with open(file_name) as f:
        content = f.read().lower()
        
    return [content[i:i + 2] for i in range(0, len(content) - 2, 2)]
    
bigrams = [read_as_bigrams(os.path.join(target_dir, f)) for f in os.listdir(target_dir)]

print('first 10 bigrams from first 10 files are\n')
for i in range(10):
    print('File ', (i + 1), ', bigrams[0:10] = ', bigrams[i][0:10])

first 10 bigrams from first 10 files are

File  1 , bigrams[0:10] =  ['in', ' o', 'ld', 'en', ' t', 'im', 'es', ' w', 'he', 'n ']
File  2 , bigrams[0:10] =  ['ha', 'rd', ' b', 'y ', 'a ', 'gr', 'ea', 't ', 'fo', 're']
File  3 , bigrams[0:10] =  ['a ', 'ce', 'rt', 'ai', 'n ', 'fa', 'th', 'er', ' h', 'ad']
File  4 , bigrams[0:10] =  ['th', 'er', 'e ', 'wa', 's ', 'on', 'ce', ' u', 'po', 'n ']
File  5 , bigrams[0:10] =  ['th', 'er', 'e ', 'wa', 's ', 'on', 'ce', ' u', 'po', 'n ']
File  6 , bigrams[0:10] =  ['th', 'er', 'e ', 'wa', 's ', 'on', 'ce', ' a', ' p', 'ea']
File  7 , bigrams[0:10] =  ['th', 'er', 'e ', 'we', 're', ' o', 'nc', 'e ', 'up', 'on']
File  8 , bigrams[0:10] =  ['li', 'tt', 'le', ' b', 'ro', 'th', 'er', ' t', 'oo', 'k ']
File  9 , bigrams[0:10] =  ['th', 'er', 'e ', 'we', 're', ' o', 'nc', 'e ', 'a ', 'ma']
File  10 , bigrams[0:10] =  ['th', 'er', 'e ', 'wa', 's ', 'on', 'ce', ' a', ' m', 'an']
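
To see what this slicing does on a small input, the sketch below applies the same list comprehension to a short string. Note that the range stops at `len(content) - 2`, so the final pair of an even-length string is not emitted. The second function, `to_bigrams_keep_tail`, is a hypothetical variant shown only for comparison; the notebook itself uses the first form.

```python
def to_bigrams(content):
    # Same slicing as read_as_bigrams, without the file I/O
    return [content[i:i + 2] for i in range(0, len(content) - 2, 2)]

def to_bigrams_keep_tail(content):
    # Hypothetical variant: iterate up to len(content) so the last
    # pair (or a trailing single character) is kept as well
    return [content[i:i + 2] for i in range(0, len(content), 2)]

sample = 'once upon a time'
print(to_bigrams(sample))
# ['on', 'ce', ' u', 'po', 'n ', 'a ', 'ti']
print(to_bigrams_keep_tail(sample))
# ['on', 'ce', ' u', 'po', 'n ', 'a ', 'ti', 'me']
```

Over a whole story the difference is at most one bigram per file, so it has little effect on training.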


In [64]:
import collections

def build_dataset(documents, threshold = 10):
    # Input
    # documents: list of lists of bigrams, one list per document
    # threshold: bigrams occurring at most this many times are mapped to UNK
    #
    # Returns
    #
    # dictionary: mapping from bigram to its numeric id
    # reverse_dictionary: mapping from numeric id back to the bigram
    # counts: list of (bigram, count) tuples, most common first
    # data: one list per document, with each bigram replaced by its id
    
    all_bigrams = []
    for d in documents:
        all_bigrams.extend(d)
        
    print('All bigrams make up', len(all_bigrams), 'words')
    counts = collections.Counter(all_bigrams).most_common()
    #
    dictionary = {word : (i + 1) for i, (word, count) in enumerate(counts) if count > threshold}
    dictionary['UNK'] = 0
    
    reverse_dictionary = {dictionary[k] : k for k in dictionary}
    
    print('Vocabulary is of size', len(reverse_dictionary))
    
    data = [[dictionary[b] if b in dictionary else dictionary['UNK'] for b in doc] for doc in documents]
    
    return dictionary, reverse_dictionary, counts, data
    
dictionary, reverse_dictionary, counts, data = build_dataset(bigrams)
print('5 most common words are', counts[0:5])
print('5 least common words are', counts[-5:])
print('Sample data 0 is', data[0][0:10])
print('Sample data 1 is', data[1][0:10])

All bigrams make up 449177 words
Vocabulary is of size 544
5 most common words are [('e ', 15229), ('he', 15164), (' t', 13443), ('th', 13076), ('d ', 10687)]
5 least common words are [('nm', 1), ('m?', 1), ('\t"', 1), ('\tw', 1), ('tz', 1)]
Sample data 0 is [15, 28, 86, 23, 3, 95, 74, 11, 2, 16]
Sample data 1 is [22, 156, 25, 37, 82, 185, 43, 9, 90, 19]
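
The id assignment and the effect of the threshold are easier to follow on a tiny input. The sketch below repeats the same steps as `build_dataset` on a hand-made token list; the helper name `encode`, the toy tokens, and `threshold=1` are assumptions for this example.

```python
import collections

def encode(tokens, threshold):
    # Rank tokens by frequency, most common first
    counts = collections.Counter(tokens).most_common()
    # Ids start at 1; tokens seen at most `threshold` times get no id
    vocab = {tok: i + 1 for i, (tok, n) in enumerate(counts) if n > threshold}
    vocab['UNK'] = 0
    # Anything missing from the vocabulary falls back to the UNK id
    return vocab, [vocab.get(tok, vocab['UNK']) for tok in tokens]

tokens = ['th', 'e ', 'th', 'e ', 'th', 'zq']
vocab, ids = encode(tokens, threshold=1)
print(vocab)  # {'th': 1, 'e ': 2, 'UNK': 0}
print(ids)    # [1, 2, 1, 2, 1, 0]
```

Because `most_common()` sorts by count, the tokens that pass the threshold always occupy the first positions, so the ids stay contiguous.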


In [67]:
[reverse_dictionary[i] for i in data[0][25:50]]

['e ',
 'li',
 've',
 'd ',
 'a ',
 'ki',
 'ng',
 '\nw',
 'ho',
 'se',
 ' d',
 'au',
 'gh',
 'te',
 'rs',
 ' w',
 'er',
 'e ',
 'al',
 'l ',
 'be',
 'au',
 'ti',
 'fu',
 'l,']

In [105]:
import numpy as np

class BatchGenerator(object):
    
    def __init__(self, text, vocab_size, batch_size, num_unroll):
        self.vocab_size = vocab_size
        self.text = text
        self._text_size = len(text)
        self.split_size = self._text_size // batch_size
        self.batch_size = batch_size
        self.num_unroll = num_unroll
        self._cursor = [i * self.split_size for i in range(batch_size)]
    
    
    def next_batch(self):
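        # Build one one-hot (input, label) pair per batch row: the input is the
        # token at that row's cursor and the label is the token right after it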
        batch_data = np.zeros(shape = [self.batch_size, self.vocab_size], dtype = np.float32)
        batch_labels = np.zeros(shape = [self.batch_size, self.vocab_size], dtype = np.float32)
        for b in range(self.batch_size):
            
            if self._cursor[b]+1>=self._text_size:
                self._cursor[b] = b * self.split_size
            
            batch_cursor = self._cursor[b]
            batch_data[b, self.text[batch_cursor]] = 1
            batch_labels[b, self.text[batch_cursor + 1]] = 1
            self._cursor[b] += 1
            
        return  batch_data, batch_labels
    
    
    def unroll_batches(self):
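        # Collect num_unroll consecutive batches; each next_batch() call
        # advances every cursor by one position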
        unrolled_batches, unrolled_labels = [], []
        for _ in range(self.num_unroll):
            batch, labels = self.next_batch()
            unrolled_batches.append(batch)
            unrolled_labels.append(labels)
            
        return unrolled_batches, unrolled_labels
        
    def reset(self):        
        self._cursor = [i * self.split_size for i in range(self.batch_size)]
        
vocab_size = len(dictionary)
batch_generator = BatchGenerator(data[0][25:50], vocab_size, 5, 5)
print('Input text is', [reverse_dictionary[i] for i in data[0][25:50]])

unrolled_batches, unrolled_labels = batch_generator.unroll_batches()
    
for i, (batch_data, batch_labels)  in enumerate(zip(unrolled_batches, unrolled_labels)):
    print('Batch in Iteration', i)
    print('Input is ',[reverse_dictionary[np.argmax(b)] for b in batch_data])
    print('Label is ',[reverse_dictionary[np.argmax(b)] for b in batch_labels])
    
    

Input text is ['e ', 'li', 've', 'd ', 'a ', 'ki', 'ng', '\nw', 'ho', 'se', ' d', 'au', 'gh', 'te', 'rs', ' w', 'er', 'e ', 'al', 'l ', 'be', 'au', 'ti', 'fu', 'l,']
Batch in Iteration 0
Input is  ['e ', 'ki', ' d', ' w', 'be']
Label is  ['li', 'ng', 'au', 'er', 'au']
Batch in Iteration 1
Input is  ['li', 'ng', 'au', 'er', 'au']
Label is  ['ve', '\nw', 'gh', 'e ', 'ti']
Batch in Iteration 2
Input is  ['ve', '\nw', 'gh', 'e ', 'ti']
Label is  ['d ', 'ho', 'te', 'al', 'fu']
Batch in Iteration 3
Input is  ['d ', 'ho', 'te', 'al', 'fu']
Label is  ['a ', 'se', 'rs', 'l ', 'l,']
Batch in Iteration 4
Input is  ['a ', 'se', 'rs', 'l ', 'be']
Label is  ['ki', ' d', ' w', 'be', 'au']


In [1]:
# Number of time steps to look back in time
num_unroll = 50

# Size of the hidden state
state_size = 64

# Training batch size
batch_size = 64