#### I have a GOT dataset for seasons 1 and 2
First, let's preprocess it before feeding it to our Keras model.

In [6]:
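with open("datasets/got.txt", "r", encoding="utf-8") as f:
    data = f.read()
# unique characters, sorted so the index mapping is identical on every run
# (needed for saved weights to stay meaningful when they are reloaded)
chars = sorted(set(data))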
VOCAB_SIZE = len(chars)
print(len(data),VOCAB_SIZE)

2000193 80


In [7]:
#initialize Mapping
idx_to_char = {i: char for i, char in enumerate(chars)}
char_to_idx = {char: i for i, char in enumerate(chars)}
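
As a quick sanity check on the two lookup tables, a text encoded with `char_to_idx` should decode back to itself with `idx_to_char`. The sketch below uses a short toy string in place of the GOT corpus (an assumption made only to keep the example self-contained):

```python
# Toy text standing in for the real corpus (illustrative assumption).
text = "winter is coming"

# Sorted so that the mapping is reproducible between runs.
chars = sorted(set(text))
idx_to_char = {i: char for i, char in enumerate(chars)}
char_to_idx = {char: i for i, char in enumerate(chars)}

# Encode every character to its integer index, then decode it back.
encoded = [char_to_idx[c] for c in text]
decoded = "".join(idx_to_char[i] for i in encoded)

print(len(chars), decoded == text)
```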

In [8]:
import numpy as np
"""
number_of_features = VOCAB_SIZE (one slot per unique character)
length_of_sequence = how many characters the model looks at at a time
number_of_sequences = (len(data) - 1) // length_of_sequence
"""
SEQ_LENGTH = 60 # input sequence length
N_FEATURES = VOCAB_SIZE # inputs are one-hot encoded, so there is one feature per unique character

N_SEQ = int(np.floor((len(data) - 1) / SEQ_LENGTH))

X = np.zeros((N_SEQ, SEQ_LENGTH, N_FEATURES))
y = np.zeros((N_SEQ, SEQ_LENGTH, N_FEATURES))

In [None]:
for i in range(N_SEQ):
    X_sequence = data[i * SEQ_LENGTH: (i + 1) * SEQ_LENGTH]
    X_sequence_ix = [char_to_idx[c] for c in X_sequence]
    input_sequence = np.zeros((SEQ_LENGTH, N_FEATURES))
    for j in range(SEQ_LENGTH):
        input_sequence[j][X_sequence_ix[j]] = 1. #one-hot encoding of the input characters
    X[i] = input_sequence

    y_sequence = data[i * SEQ_LENGTH + 1: (i + 1) * SEQ_LENGTH + 1] #shifted by 1 to the right
    y_sequence_ix = [char_to_idx[c] for c in y_sequence]
    target_sequence = np.zeros((SEQ_LENGTH, N_FEATURES))
    for j in range(SEQ_LENGTH):
        target_sequence[j][y_sequence_ix[j]] = 1. #one-hot encoding of the target characters
    y[i] = target_sequence
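
The nested loops above can be slow on a corpus of two million characters. A possible vectorized alternative is sketched below on a toy string: it indexes an identity matrix with the integer character ids, which yields the same one-hot layout. The names `x_ids`, `y_ids`, `X_toy`, and `y_toy` are illustrative and not part of the notebook.

```python
import numpy as np

# Toy corpus and a short sequence length (assumptions for illustration).
text = "the night is dark and full of terrors"
chars = sorted(set(text))
char_to_idx = {c: i for i, c in enumerate(chars)}
vocab_size = len(chars)
seq_length = 6
n_seq = (len(text) - 1) // seq_length  # leave one character for the shifted target

# Integer id of every character in the text.
ids = np.array([char_to_idx[c] for c in text])

# Inputs are consecutive chunks; targets are the same chunks shifted right by one.
x_ids = ids[: n_seq * seq_length].reshape(n_seq, seq_length)
y_ids = ids[1 : n_seq * seq_length + 1].reshape(n_seq, seq_length)

# Row i of the identity matrix is the one-hot vector for index i.
eye = np.eye(vocab_size, dtype=np.float32)
X_toy = eye[x_ids]  # shape: (n_seq, seq_length, vocab_size)
y_toy = eye[y_ids]

print(X_toy.shape, y_toy.shape)
```

Using `float32` here also halves the memory footprint compared with NumPy's default `float64`.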

#### OK, now let's create a Keras model
1. Model is described below

In [9]:
from keras.models import Sequential
from keras.layers import LSTM, TimeDistributed, Dense, Activation
# constant parameter for the model
HIDDEN_DIM = 700 # number of units in each LSTM layer
LAYER_NUM = 2 # number of stacked LSTM layers
NB_EPOCHS = 50 # maximum number of epochs to train
BATCH_SIZE = 128
VALIDATION_SPLIT = 0.1 # fraction of the training data held out for validation

def createModel():
    model = Sequential()
    model.add(LSTM(HIDDEN_DIM, 
               input_shape=(None, VOCAB_SIZE), 
               return_sequences=True))
    for _ in range(LAYER_NUM - 1):
        model.add(LSTM(HIDDEN_DIM, return_sequences=True))
    model.add(TimeDistributed(Dense(VOCAB_SIZE)))
    model.add(Activation('softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['acc'])
    return model

In [10]:
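def generate_text(model, length):
    """Greedily generate `length` characters after a random starting character."""
    ix = [np.random.randint(VOCAB_SIZE)]  # index of the random starting character
    y_char = [idx_to_char[ix[-1]]]
    X = np.zeros((1, length, VOCAB_SIZE))
    for i in range(length):
        X[0, i, ix[-1]] = 1.  # one-hot encode the most recent character
        # run the model on the prefix so far; argmax gives the most likely character per step
        ix = np.argmax(model.predict(X[:, :i+1, :])[0], 1)
        y_char.append(idx_to_char[ix[-1]])  # keep only the prediction for the last step
    return ''.join(y_char)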

In [11]:
from keras.callbacks import EarlyStopping, ModelCheckpoint, Callback
# callback to save the model if better
filepath="tgt_model.hdf5"
save_model_cb = ModelCheckpoint(filepath, monitor='val_acc', verbose=2, save_best_only=True, mode='max')
# callback to stop the training if no improvement
early_stopping_cb = EarlyStopping(monitor='val_loss', patience=10)
# callback to generate text at epoch end
class generateText(Callback):
    def on_epoch_end(self, epoch, logs=None):
        print(generate_text(self.model, 100))
generate_text_cb = generateText()

callbacks_list = [save_model_cb]

def train(model):
    model.fit(X, y, batch_size=BATCH_SIZE, verbose=2, 
              epochs=NB_EPOCHS, callbacks=callbacks_list, 
              validation_split=VALIDATION_SPLIT)
    model.save_weights('text_gen_got.hdf5')

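def load_weights(model):
    model.load_weights('text_gen_got.hdf5')

In [None]:
GENERATE_LENGTH = 500 # number of characters to generate per sample (assumed value, adjust as needed)
model = createModel()
load_weights(model)
for i in range(5):
    print(i, "\n", generate_text(model, GENERATE_LENGTH))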

![Model](images/model_description.png)
The input to the model has shape (batch size, number of time steps, number of features). In this notebook each time step is a single character, one-hot encoded as a vector of length VOCAB_SIZE (80 here), so there is no embedding layer to learn. The input is fed into two "stacked" LSTM layers, each with HIDDEN_DIM = 700 units. In the diagram above, the LSTM network is shown unrolled over all the time steps. The output of these unrolled cells has shape (batch size, number of time steps, hidden size).
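
To get a feel for the size of this network, the parameter counts can be worked out by hand. The sketch below assumes the standard LSTM parameterization (four gates, each with input weights, recurrent weights, and one bias vector) and uses the notebook's `VOCAB_SIZE = 80` and `HIDDEN_DIM = 700`; the helper names `lstm_params` and `dense_params` are made up for this example.

```python
def lstm_params(input_dim, units):
    # 4 gates, each with (input weights + recurrent weights + bias)
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # weight matrix plus bias vector
    return input_dim * units + units

VOCAB_SIZE = 80   # unique characters reported by the notebook
HIDDEN_DIM = 700  # units per LSTM layer

first_lstm = lstm_params(VOCAB_SIZE, HIDDEN_DIM)    # reads one-hot characters
second_lstm = lstm_params(HIDDEN_DIM, HIDDEN_DIM)   # reads the first layer's output
output_dense = dense_params(HIDDEN_DIM, VOCAB_SIZE) # maps hidden state to character scores

total = first_lstm + second_lstm + output_dense
print(first_lstm, second_lstm, output_dense, total)
```

If the model is built as in `createModel()`, `model.summary()` should be a convenient way to cross-check these numbers.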

This output is then passed to a Dense layer wrapped in a Keras layer called TimeDistributed, which is explained below. Finally, a softmax activation is applied to the result. The softmax output is compared to the training y data for each batch, and Keras backpropagates the error and gradients from there. The training y data in this case is the input x sequence advanced by one time step: at each time step the model tries to predict the very next character in the sequence. Because it does this at every time step, the output has the same number of time steps as the input.
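
The comparison between the softmax output and the one-hot targets is a categorical cross-entropy computed at every time step. A minimal NumPy sketch of that calculation follows; the random scores stand in for real model output, and the shapes are toy values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, steps, vocab = 2, 5, 4  # toy sizes, not the notebook's

# Stand-in for the raw scores produced by the last Dense layer.
logits = rng.normal(size=(batch, steps, vocab))

# Softmax over the vocabulary axis (shifted by the max for numerical stability).
shifted = logits - logits.max(axis=-1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)

# One-hot targets: the "next character" at every time step.
target_ids = rng.integers(0, vocab, size=(batch, steps))
targets = np.eye(vocab)[target_ids]

# Cross-entropy per time step, then averaged over batch and time.
per_step_loss = -(targets * np.log(probs)).sum(axis=-1)
mean_loss = per_step_loss.mean()

print(per_step_loss.shape, mean_loss)
```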

TimeDistributed is a Keras wrapper layer that is often used in recurrent neural networks. It applies the wrapped layer to every time step of its input independently, using the same weights at each step. So, for instance, if we have 10 time steps in a model, a TimeDistributed layer wrapping a Dense layer applies that one shared Dense layer 10 times, once per time step, rather than creating 10 separate layers. In our Keras LSTM model, the softmax activation is then applied to the output of this Dense layer as the final layer.
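
The effect of wrapping a Dense layer in TimeDistributed can be illustrated without Keras. The NumPy sketch below applies one weight matrix `W` and bias `b` to each time step in a loop, then shows that a single batched matrix product gives the same result. All sizes and variable names are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, steps, hidden, vocab = 3, 10, 8, 5  # toy sizes

h = rng.normal(size=(batch, steps, hidden))  # stand-in for the LSTM output
W = rng.normal(size=(hidden, vocab))         # one shared Dense weight matrix
b = rng.normal(size=(vocab,))                # one shared Dense bias

# Apply the same Dense transform to each time step separately.
per_step = np.stack([h[:, t, :] @ W + b for t in range(steps)], axis=1)

# Equivalent: one matrix product broadcast over the time axis.
all_at_once = h @ W + b

print(per_step.shape, np.allclose(per_step, all_at_once))
```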