# Anna KaRNNa

In this notebook, we'll build a character-wise RNN trained on Anna Karenina, one of my all-time favorite books. It'll be able to generate new text based on the text from the book.

This network is based on Andrej Karpathy's [post on RNNs](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) and [implementation in Torch](https://github.com/karpathy/char-rnn). Some information also comes from [here at r2rt](http://r2rt.com/recurrent-neural-networks-in-tensorflow-ii.html) and from [Sherjil Ozair](https://github.com/sherjilozair/char-rnn-tensorflow) on GitHub. Below is the general architecture of the character-wise RNN.

<img src="assets/charseq.jpeg" width="500">

```python
import numpy as np
import tensorflow as tf
```

First we'll load the text file and convert it into integers for our network to use. Here I'm creating a couple of dictionaries to convert the characters to and from integers. Encoding the characters as integers makes them easier to use as input to the network.

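```python
with open('anna.txt', 'r') as f:
    text = f.read()

# Sorted list of the unique characters in the book
vocab = sorted(set(text))
# Map each character to an integer, and each integer back to its character
vocab_to_int = {c: i for i, c in enumerate(vocab)}
int_to_vocab = dict(enumerate(vocab))

# The whole book as one long array of integers
encoded = np.array([vocab_to_int[c] for c in text], dtype=np.int32)
```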
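To see the mapping in isolation, here is a toy version on a short string (the `toy_`-prefixed names are ours, for illustration only). `sorted` orders characters by Unicode code point, so in the toy vocabulary the space comes first, then the capital `H`, then the lowercase letters. Decoding with the reverse dictionary should give back exactly the original string.

```python
# Toy version of the encoding step, on a short hypothetical string
sample = "Happy families"

toy_vocab = sorted(set(sample))                      # unique characters, by code point
toy_vocab_to_int = {c: i for i, c in enumerate(toy_vocab)}
toy_int_to_vocab = dict(enumerate(toy_vocab))

toy_encoded = [toy_vocab_to_int[c] for c in sample]  # characters -> integers
decoded = "".join(toy_int_to_vocab[i] for i in toy_encoded)  # integers -> characters

print(toy_vocab)
print(toy_encoded)
print(decoded)
```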

Let's check out the first 100 characters to make sure everything is peachy. According to the [American Book Review](http://americanbookreview.org/100bestlines.asp), this is the 6th best first line of a book ever.

```python
text[:100]
```

```
'Chapter 1\n\n\nHappy families are all alike; every unhappy family is unhappy in its own\nway.\n\nEverythin'
```

And we can see the characters encoded as integers.

```python
encoded[:100]
```

```
array([31, 64, 57, 72, 76, 61, 74,  1, 16,  0,  0,  0, 36, 57, 72, 72, 81,
        1, 62, 57, 69, 65, 68, 65, 61, 75,  1, 57, 74, 61,  1, 57, 68, 68,
        1, 57, 68, 65, 67, 61, 26,  1, 61, 78, 61, 74, 81,  1, 77, 70, 64,
       57, 72, 72, 81,  1, 62, 57, 69, 65, 68, 81,  1, 65, 75,  1, 77, 70,
       64, 57, 72, 72, 81,  1, 65, 70,  1, 65, 76, 75,  1, 71, 79, 70,  0,
       79, 57, 81, 13,  0,  0, 33, 78, 61, 74, 81, 76, 64, 65, 70])
```

Since the network is working with individual characters, it's similar to a classification problem in which we are trying to predict the next character from the previous text.  Here's how many 'classes' our network has to pick from.

```python
len(vocab)
```

```
83
```
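Since every prediction is an 83-way classification, it helps to know what loss a model with no knowledge at all would get, as a baseline for the training losses printed later. A small sketch of that arithmetic, assuming the natural-log cross-entropy that `tf.nn.softmax_cross_entropy_with_logits` computes:

```python
import math

n_classes = 83  # len(vocab), as printed above

# A model that spreads probability evenly over all characters gives the correct
# one probability 1/n_classes, so its cross-entropy is -log(1/n_classes) = log(n_classes).
# Natural log, so the loss is in nats like TensorFlow's cross-entropy ops.
uniform_loss = math.log(n_classes)
print("Loss of a uniform guess: {:.4f}".format(uniform_loss))
```

A training loss well below about 4.42 means the network has learned something about which characters tend to follow which.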

## Making training mini-batches

Here is where we'll make our mini-batches for training. Remember that we want our batches to be multiple sequences of some desired number of sequence steps. Considering a simple example, our batches would look like this:

<img src="assets/sequence_batching@1x.png" width=500px>


<br>

We start with our text encoded as integers in one long array in `encoded`. Let's create a function that will give us an iterator for our batches. I like using [generator functions](https://jeffknupp.com/blog/2013/04/07/improve-your-python-yield-and-generators-explained/) to do this. Then we can pass `encoded` into this function and get our batch generator.

The first thing we need to do is discard some of the text so we only have completely full batches. Each batch contains $N \times M$ characters, where $N$ is the batch size (the number of sequences) and $M$ is the number of steps. To get the total number of batches $K$ we can make from the array `arr`, we divide the length of `arr` by the number of characters per batch and round down. Once we know the number of batches, the total number of characters to keep from `arr` is $N \times M \times K$.

After that, we need to split `arr` into $N$ sequences. We can do this with `arr.reshape(size)`, where `size` is a tuple containing the dimension sizes of the reshaped array. We want $N$ sequences (`batch_size` below), so that is the size of the first dimension. For the second dimension, you can use `-1` as a placeholder and NumPy will fill in the appropriate size for you. After this, we should have an array that is $N \times (M \times K)$.
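As a quick illustration of the trimming and reshaping arithmetic, here is a sketch on a toy array of 23 "characters" (the integers 0 to 22) with hypothetical sizes $N = 2$ and $M = 3$. Only 18 characters fit in full batches, so the last five are dropped.

```python
import numpy as np

# Hypothetical toy sizes, much smaller than the ones used for training
arr = np.arange(23)          # stands in for `encoded`
batch_size, n_steps = 2, 3   # N and M

characters_per_batch = batch_size * n_steps    # N * M = 6
n_batches = len(arr) // characters_per_batch   # K = 23 // 6 = 3
arr = arr[:characters_per_batch * n_batches]   # keep N * M * K = 18 characters

# N rows; -1 lets NumPy infer the second dimension, M * K = 9
arr = arr.reshape((batch_size, -1))
print(arr)
```

`get_batches` below writes the second dimension out explicitly as `n_steps*n_batches`, which gives the same shape as `-1`.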

Now that we have this array, we can iterate through it to get our batches. The idea is that each batch is an $N \times M$ window on the $N \times (M \times K)$ array. For each subsequent batch, the window moves over by `n_steps`. We also want to create both the input and target arrays. Remember that the targets are the inputs shifted over by one character.
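Here is a sketch of just the sliding window, on a small hypothetical array with $N = 2$ rows and $M \times K = 9$ columns ($M = 3$, $K = 3$), without the zero padding that `get_batches` adds. It shows the targets as the inputs shifted by one, and also why the final window needs special handling: its `y` comes out one column short, because there is no character after the last one we kept.

```python
import numpy as np

# Hypothetical toy array: N = 2 rows of M * K = 9 characters
arr = np.arange(18).reshape((2, 9))
n_steps = 3  # M

windows = []
for n in range(0, arr.shape[1], n_steps):
    x = arr[:, n:n + n_steps]          # N x M window of inputs
    y = arr[:, n + 1:n + n_steps + 1]  # targets: same window shifted one character
    windows.append((x, y))
    print("x:", x.tolist(), "y:", y.tolist())
```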

```python
def get_batches(arr, batch_size, n_steps):
    '''Create a generator that returns batches of size
       batch_size x n_steps from arr.
       
       Arguments
       ---------
       arr: Array you want to make batches from
       batch_size: Batch size, the number of sequences per batch
       n_steps: Number of sequence steps per batch
    '''
    # Get the number of characters per batch and number of batches we can make
    characters_per_batch = batch_size*n_steps
    n_batches = len(arr)//characters_per_batch
    
    # Keep only enough characters to make full batches
    arr = arr[:characters_per_batch*n_batches]
    
    # Reshape into batch_size rows
    arr = arr.reshape((batch_size, n_steps*n_batches))
    
    for n in range(0, arr.shape[1], n_steps):
        # The features
        x = arr[:, n:n+n_steps]
        # The targets, shifted by one
        y_temp = arr[:, n+1:n+n_steps+1]
        
        # For the very last batch, y will be one character short at the end of 
        # the sequences which breaks things. To get around this, I'll make an 
        # array of the appropriate size first, of all zeros, then add the targets.
        # This will introduce a small artifact in the last batch, but it won't matter.
        y = np.zeros(x.shape, dtype=x.dtype)
        y[:,:y_temp.shape[1]] = y_temp
        yield x, y
```

Now I'll make a batch generator and we can check out what's going on here. I'm going to use a batch size of 10 and 50 sequence steps.

```python
batches = get_batches(encoded, 10, 50)
x, y = next(batches)
```

```python
print('x\n', x[:10, :10])
print('\ny\n', y[:10, :10])
```

```
x
 [[31 64 57 72 76 61 74  1 16  0]
 [ 1 57 69  1 70 71 76  1 63 71]
 [78 65 70 13  0  0  3 53 61 75]
 [70  1 60 77 74 65 70 63  1 64]
 [ 1 65 76  1 65 75 11  1 75 65]
 [ 1 37 76  1 79 57 75  0 71 70]
 [64 61 70  1 59 71 69 61  1 62]
 [26  1 58 77 76  1 70 71 79  1]
 [76  1 65 75 70  7 76 13  1 48]
 [ 1 75 57 65 60  1 76 71  1 64]]

y
 [[64 57 72 76 61 74  1 16  0  0]
 [57 69  1 70 71 76  1 63 71 65]
 [65 70 13  0  0  3 53 61 75 11]
 [ 1 60 77 74 65 70 63  1 64 65]
 [65 76  1 65 75 11  1 75 65 74]
 [37 76  1 79 57 75  0 71 70 68]
 [61 70  1 59 71 69 61  1 62 71]
 [ 1 58 77 76  1 70 71 79  1 75]
 [ 1 65 75 70  7 76 13  1 48 64]
 [75 57 65 60  1 76 71  1 64 61]]
```


If you implemented `get_batches` correctly, the above output should look something like 
```
x
 [[55 63 69 22  6 76 45  5 16 35]
 [ 5 69  1  5 12 52  6  5 56 52]
 [48 29 12 61 35 35  8 64 76 78]
 [12  5 24 39 45 29 12 56  5 63]
 [ 5 29  6  5 29 78 28  5 78 29]
 [ 5 13  6  5 36 69 78 35 52 12]
 [63 76 12  5 18 52  1 76  5 58]
 [34  5 73 39  6  5 12 52 36  5]
 [ 6  5 29 78 12 79  6 61  5 59]
 [ 5 78 69 29 24  5  6 52  5 63]]

y
 [[63 69 22  6 76 45  5 16 35 35]
 [69  1  5 12 52  6  5 56 52 29]
 [29 12 61 35 35  8 64 76 78 28]
 [ 5 24 39 45 29 12 56  5 63 29]
 [29  6  5 29 78 28  5 78 29 45]
 [13  6  5 36 69 78 35 52 12 43]
 [76 12  5 18 52  1 76  5 58 52]
 [ 5 73 39  6  5 12 52 36  5 78]
 [ 5 29 78 12 79  6 61  5 59 63]
 [78 69 29 24  5  6 52  5 63 76]]
```
although the exact numbers will be different. Check to make sure the data is shifted over one step for `y`.
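That check can also be done in code. Below is a small helper (our own, not part of the original notebook) that compares the overlapping columns of `x` and `y`; `x_demo` and `y_demo` are the first three columns of the first two rows printed above.

```python
import numpy as np

def is_shifted_by_one(x, y):
    """Return True if each row of y is the same row of x shifted left by one.

    Only the overlapping columns are compared: the last column of y is the
    character after the window, which x does not contain.
    """
    return bool(np.array_equal(x[:, 1:], y[:, :-1]))

# First three columns of the first two rows of x and y printed above
x_demo = np.array([[31, 64, 57], [1, 57, 69]])
y_demo = np.array([[64, 57, 72], [57, 69, 1]])
print(is_shifted_by_one(x_demo, y_demo))
```

With the real batch from `next(batches)`, `is_shifted_by_one(x, y)` should return `True`. It also holds for the last batch, since only the final target column there is zero padding.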

### Building the LSTM network

Here we build the whole network with Keras. An `Embedding` layer maps each character index to a 128-dimensional vector, two stacked `LSTMCell`s with 128 units each (wrapped in a single `RNN` layer) form the hidden layers, and a `Dense` layer produces one logit per vocabulary character at every time step. Each cell applies dropout of 0.3 to its inputs during training.

For background on how LSTM cells work, see:

* https://zhuanlan.zhihu.com/p/58854907
* https://colah.github.io/posts/2015-08-Understanding-LSTMs/

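```python
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers import LSTMCell, StackedRNNCells

# A batch of integer-encoded character sequences of any length
inputs = layers.Input(shape=(None,))

# Map each of the len(vocab) character indices to a 128-dimensional vector
embed = layers.Embedding(len(vocab), 128)(inputs)

# Two stacked LSTM cells of 128 units with dropout on each cell's inputs;
# return_sequences=True gives an output at every time step
cells = StackedRNNCells([LSTMCell(128, dropout=0.3) for _ in range(2)])
lstm_layer = layers.RNN(cells, return_sequences=True)(embed)

# One logit per vocabulary character at every step (softmax is applied in the loss)
dense = layers.Dense(len(vocab), activation=None)(lstm_layer)

model = keras.Model(inputs=inputs, outputs=dense, name="test")

model.summary()
```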

```
Model: "test"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_8 (InputLayer)         [(None, None)]            0         
_________________________________________________________________
embedding_7 (Embedding)      (None, None, 128)         10624     
_________________________________________________________________
rnn_2 (RNN)                  (None, None, 128)         263168    
_________________________________________________________________
dense_6 (Dense)              (None, None, 83)          10707     
Total params: 284,499
Trainable params: 284,499
Non-trainable params: 0
_________________________________________________________________
```
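The parameter counts in the summary follow directly from the layer sizes. As a rough check (our own arithmetic, assuming Keras's default `use_bias=True` on every layer), they can be reproduced like this:

```python
def embedding_params(vocab_size, embed_dim):
    # One learned vector of length embed_dim per vocabulary entry
    return vocab_size * embed_dim

def lstm_params(input_dim, units):
    # Four gates (input, forget, cell candidate, output), each with an input kernel
    # (input_dim x units), a recurrent kernel (units x units) and a bias (units)
    return 4 * (input_dim * units + units * units + units)

def dense_params(input_dim, output_dim):
    # Weight matrix plus one bias per output
    return input_dim * output_dim + output_dim

vocab_size, embed_dim, units = 83, 128, 128
n_embedding = embedding_params(vocab_size, embed_dim)
# First cell reads the embedding, second cell reads the first cell's output
n_rnn = lstm_params(embed_dim, units) + lstm_params(units, units)
n_dense = dense_params(units, vocab_size)
print(n_embedding, n_rnn, n_dense, n_embedding + n_rnn + n_dense)
```

Both cells see 128-dimensional inputs here only because the embedding size and the LSTM size happen to be equal; a different embedding size would change the first cell's count.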


```python
epochs = 5
batch_size = 100        # Sequences per batch
num_steps = 50          # Number of sequence steps per batch
learning_rate = 0.01    # Learning rate
# The LSTM size (128) and dropout (0.3) were fixed when the model was built above

optimizer = tf.keras.optimizers.RMSprop(learning_rate=learning_rate)

for e in range(epochs):
    for x, y in get_batches(encoded, batch_size, num_steps):
        with tf.GradientTape() as tape:
            # logits has shape (batch_size, num_steps, len(vocab))
            logits = model(x, training=True)

            # One-hot targets already have the same shape as logits
            y_one_hot = tf.one_hot(y, len(vocab))

            # Softmax cross-entropy, averaged over every character in the batch
            loss = tf.nn.softmax_cross_entropy_with_logits(labels=y_one_hot, logits=logits)
            loss = tf.reduce_mean(loss)

        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))

        print('Training loss: {:.4f}... '.format(float(loss)))

    print('Epoch {} finished, last batch loss: {:.4f}'.format(e + 1, float(loss)))
```

```
Training loss: 1.6266... 
Training loss: 1.5438... 
Training loss: 1.6189... 
Training loss: 1.5267... 
Training loss: 1.5565... 
Training loss: 1.5396... 
Training loss: 1.5584... 
Training loss: 1.5891... 
Training loss: 1.5826... 
Training loss: 1.5596... 
Training loss: 1.6058... 
Training loss: 1.5864... 
Training loss: 1.5556... 
Training loss: 1.5472... 
Training loss: 1.5235... 
Training loss: 1.5218... 
Training loss: 1.5631... 
Training loss: 1.5439... 
Training loss: 1.5469... 
Training loss: 1.5719... 
Training loss: 1.7623... 
Training loss: 1.5861... 
Training loss: 1.5382... 
Training loss: 1.5452... 
Training loss: 1.5899... 
Training loss: 1.5573... 
Training loss: 1.5645... 
Training loss: 1.5825... 
Training loss: 1.5468... 
Training loss: 1.5761... 
Training loss: 1.5898... 
Training loss: 1.5286... 
Training loss: 1.5049... 
Training loss: 1.5185... 
Training loss: 1.5368... 
Training loss: 1.5670... 
Training loss: 1.5606... 
Training loss: 1.5811... 
...
```
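The training cell converts `y` to one-hot vectors before calling `tf.nn.softmax_cross_entropy_with_logits`. A NumPy sketch (not TensorFlow's implementation, with hypothetical random data) shows that this gives the same number as reading each target's log-probability directly by its integer index:

```python
import numpy as np

def log_softmax(logits):
    # Subtract the max over the vocabulary axis for numerical stability
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def loss_one_hot(logits, targets, n_classes):
    # One-hot labels times log-probabilities, as in the training cell
    one_hot = np.eye(n_classes)[targets]
    return -(one_hot * log_softmax(logits)).sum(axis=-1).mean()

def loss_sparse(logits, targets):
    # Same quantity, picking each target's log-probability by index
    picked = np.take_along_axis(log_softmax(logits), targets[..., None], axis=-1)
    return -picked.mean()

rng = np.random.default_rng(0)
demo_logits = rng.normal(size=(2, 5, 7))        # hypothetical (batch, steps, vocab)
demo_targets = rng.integers(0, 7, size=(2, 5))  # hypothetical integer targets
print(loss_one_hot(demo_logits, demo_targets, 7), loss_sparse(demo_logits, demo_targets))
```

So the `tf.one_hot` step is optional: `tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)` takes the integer targets directly and computes the same loss up to floating-point error.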

#### Saved checkpoints

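`model.save` writes the model's architecture and weights to disk (here as a TensorFlow SavedModel directory, as the log output shows). For background on saving and restoring variables and checkpoints, read up here: https://www.tensorflow.org/programmers_guide/variables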

```python
model.save("Anna_KaRNNA.model")
```

```
INFO:tensorflow:Assets written to: Anna_KaRNNA.model\assets
```


## Sampling

Now that the network is trained, we can use it to generate new text. The idea is that we pass in some characters and the network predicts the next one. We then append that character and use the extended text to predict the one after it, and we keep doing this to generate all new text.

Always taking the most likely character (the argmax) tends to get the network stuck repeating itself in a loop, so instead we sample the next character at random, weighted by the probabilities the network predicts.
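A minimal sketch of the difference between greedy and sampled decoding, on hypothetical scores for a 4-character vocabulary (the helper names are ours). `tf.random.categorical`, used in the generation cells, does the sampling step directly from logits.

```python
import numpy as np

def softmax(logits):
    # Convert raw scores to probabilities (shifted by the max for stability)
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

def pick_greedy(logits):
    # Always the single most likely index: deterministic, so it can loop
    return int(np.argmax(logits))

def pick_sampled(logits, rng):
    # Draw an index at random, weighted by its predicted probability
    probs = softmax(np.asarray(logits, dtype=np.float64))
    return int(rng.choice(len(probs), p=probs))

rng = np.random.default_rng(42)
demo_logits = np.array([2.0, 1.5, 0.5, -1.0])  # hypothetical scores, 4-character vocab
print("greedy: ", [pick_greedy(demo_logits) for _ in range(5)])
print("sampled:", [pick_sampled(demo_logits, rng) for _ in range(5)])
```

Greedy decoding returns the same index every time for the same input, while sampling still favors likely characters but occasionally picks less likely ones, which is what breaks the loops.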

```python
string = "The world is "
stringEnc = [vocab_to_int[c] for c in string]
# The model expects integer inputs of shape (batch, time steps)
temp = model(np.array(stringEnc).reshape(1, len(string)))
```

```python
# Most likely next character: argmax of the logits at the last time step
[int_to_vocab[np.argmax(temp[-1][-1].numpy())]]
```

```
['a']
```

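### Generating longer text

This isn't the most efficient way to do the prediction, since the model re-reads up to the last 50 characters at every step instead of carrying its LSTM state forward, but it's much easier.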

```python
string = "What is going"
stringEnc = [vocab_to_int[c] for c in string]

for i in range(1000):
    # Feed at most the last 50 characters (the training sequence length),
    # shaped (batch, time steps) = (1, length)
    context = np.array(stringEnc[-50:]).reshape(1, -1)
    temp = model(context)

    # Sample the next character from the logits at the last time step,
    # shape (1, len(vocab)), instead of taking the argmax
    predicted_ids = tf.random.categorical(temp[:, -1, :], num_samples=1)
    char_index_to_append = int(predicted_ids[0, 0])
    stringEnc.append(char_index_to_append)
```


```python
for i in stringEnc:
    print(int_to_vocab[i], end="")
```

```
What is going away, stegnious of the new, in the cuntures of incompared Sviazhsky placed.

They were sibesry.'"
```

