An implementation of sequence to sequence learning for performing addition
Input: "535+61"
Output: "596"
Padding is handled by using a repeated sentinel character (space)
Input may optionally be reversed, shown to increase performance in many tasks in:
"Learning to Execute"
http://arxiv.org/abs/1410.4615
and
"Sequence to Sequence Learning with Neural Networks"
http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf
Theoretically it introduces shorter term dependencies between source and target.

Modified to train an RNN to produce the past tense of verbs. Data currently taken from the UCLA dataset on verb conjugation.

Code taken and modified from https://github.com/keras-team/keras/blob/master/examples/addition_rnn.py

In [1]:
from __future__ import print_function
from keras.models import Sequential
from keras import layers
import numpy as np
from six.moves import range
import csv
from keras.regularizers import L1L2
from matplotlib import pyplot

Using TensorFlow backend.


## Character Table

We treat each word as a string of characters, so we must vectorize each character as an input to our RNN.

We first create a table mapping each character in our vocabulary (in our case, the lowercase letters a-z and a space for padding) to a unique integer index. We also create a table mapping back the other way.

To encode a string, we convert each character to a one-hot vector with a 1 at the character's index and 0 everywhere else. The `num_rows` parameter fixes the number of rows in the encoding, so strings should be padded to that length first.

To decode an output array `x`, we take the argmax of each row (one row per output position) and map it back to its most likely character. Outputs are not truly variable-length: every target is padded with spaces to a fixed length, so the model learns to emit spaces after the word, and decoding returns a fixed-length string with trailing spaces.
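As a minimal, dependency-free sketch of the encode/decode round trip described above: the helper names `one_hot_encode` and `one_hot_decode` are illustrative and not part of this notebook, which uses NumPy arrays through `CharacterTable`. Note how the padded positions decode back to spaces.

```python
# Illustrative sketch only: a plain-Python version of the one-hot round trip.
CHARS = sorted(set('abcdefghijklmnopqrstuvwxyz '))
CHAR_TO_INDEX = {c: i for i, c in enumerate(CHARS)}
INDEX_TO_CHAR = {i: c for i, c in enumerate(CHARS)}


def one_hot_encode(text, num_rows):
    """Return num_rows one-hot rows; text is space-padded to num_rows first."""
    padded = text.ljust(num_rows)
    if len(padded) > num_rows:
        raise ValueError('text is longer than num_rows')
    rows = []
    for c in padded:
        row = [0] * len(CHARS)
        row[CHAR_TO_INDEX[c]] = 1
        rows.append(row)
    return rows


def one_hot_decode(rows):
    """Map each row to its highest-scoring character (argmax)."""
    return ''.join(INDEX_TO_CHAR[row.index(max(row))] for row in rows)


encoded = one_hot_encode('walk', 6)
decoded = one_hot_decode(encoded)
print(repr(decoded))           # 'walk  '
print(repr(decoded.rstrip()))  # 'walk'
```

Stripping trailing spaces with `rstrip()` recovers the word itself when a fixed-length decoded string is not wanted.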

In [2]:
class CharacterTable(object):
    """Given a set of characters:
    + Encode them to a one hot integer representation
    + Decode the one hot integer representation to their character output
    + Decode a vector of probabilities to their character output
    """
    def __init__(self, chars):
        """Initialize character table.
        # Arguments
            chars: Characters that can appear in the input.
        """
        self.chars = sorted(set(chars))
        self.char_indices = dict((c, i) for i, c in enumerate(self.chars))
        self.indices_char = dict((i, c) for i, c in enumerate(self.chars))

    def encode(self, C, num_rows):
        """One hot encode given string C.
        # Arguments
            num_rows: Number of rows in the returned one hot encoding. This is
                used to keep the # of rows for each data the same.
        """
        x = np.zeros((num_rows, len(self.chars)))
        for i, c in enumerate(C):
            x[i, self.char_indices[c]] = 1
        return x

    def decode(self, x, calc_argmax=True):
        """Decode one hot rows (or indices if calc_argmax=False) to a string."""
        if calc_argmax:
            x = x.argmax(axis=-1)
        return ''.join(self.indices_char[i] for i in x)


In [3]:
class colors:
    ok = '\033[92m'
    fail = '\033[91m'
    close = '\033[0m'


## Model Parameters

These parameters are chosen heuristically based on the data in our dataset.

`MAXLEN_INPUT` and `MAXLEN_OUTPUT` are the lengths to which inputs (present tense) and outputs (past tense) are padded. `MAXLEN_INPUT` is 14, chosen from the input lengths in the UCLA dataset, and `MAXLEN_OUTPUT` is 15.  
`TRAINING_SIZE` is left over from the addition example and is not used below; the number of examples is determined by the dataset.  
`chars` is a string with all characters that can be used in our inputs/outputs. We also add a space for padding.

`REVERSE` indicates whether we feed the inputs in standard order or reversed.
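One way to make these heuristics explicit is to derive the padding lengths and the vocabulary from the word pairs themselves. The following is a minimal sketch under that assumption; the `pairs` list is made-up sample data, not the UCLA dataset, and the lowercase variable names are illustrative.

```python
# Illustrative sketch: derive padding lengths and the vocabulary from the data.
# `pairs` is a made-up sample of (present, past) tuples.
pairs = [('abandon', 'abandoned'), ('abide', 'abode'),
         ('abbreviate', 'abbreviated')]

maxlen_input = max(len(present) for present, _ in pairs)
maxlen_output = max(len(past) for _, past in pairs)
# Every character seen in the data, plus a space for padding.
chars = ''.join(sorted(set(''.join(p + q for p, q in pairs)) | {' '}))

print(maxlen_input, maxlen_output)  # 10 11
print(repr(chars))                  # ' abdeinortv'
```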

In [4]:
# Parameters for the model and dataset.
TRAINING_SIZE = 50000
MAXLEN_INPUT = 14
MAXLEN_OUTPUT = 15
REVERSE = True

chars = 'abcdefghijklmnopqrstuvwxyz '
ctable = CharacterTable(chars)


## Data Processing for Verb Tenses

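We read in the data from our CSV of UCLA's verb tenses, keep only pairs without non-word characters, and pad (and optionally reverse) them. They are vectorized in the next section using the `CharacterTable` class and the model parameters defined above.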

In [5]:
data_file = 'wordlist.csv'

import re  # For detecting non-word characters
present = []
past = []
with open(data_file) as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    for row in readCSV:
        if not (re.search(r'\W+', row[0]) or re.search(r'\W+', row[1])):
            # Only keep pairs with no non-word characters (spaces, hyphens,
            # apostrophes, ...), then pad inputs with spaces.
            present_word = row[0] + ' ' * (MAXLEN_INPUT - len(row[0]))
            if REVERSE:
                # Reverse the query, e.g., 'abandon       ' becomes
                # '       nodnaba'.
                present_word = present_word[::-1]
            present.append(present_word)
            past_word = row[1] + ' ' * (MAXLEN_OUTPUT - len(row[1]))
            past.append(past_word)

print(list(zip(present, past))[:10])
print("MAXLEN of input: ", max([len(s) for s in present]))
print("MAXLEN of output: ", max([len(s) for s in past]))

assert(len(present)==len(past))
print("Num examples: ", len(present))

[('       nodnaba', 'abandoned      '), ('         esaba', 'abased         '), ('         hsaba', 'abashed        '), ('         etaba', 'abated         '), ('    etaiverbba', 'abbreviated    '), ('      etacidba', 'abdicated      '), ('        tcudba', 'abducted       '), ('          teba', 'abetted        '), ('         rohba', 'abhorred       '), ('         ediba', 'abode          ')]
MAXLEN of input:  16
MAXLEN of output:  17
Num examples:  6872
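The maximum lengths printed above (16 and 17) exceed `MAXLEN_INPUT` and `MAXLEN_OUTPUT`, so at least one word was longer than its limit and was left unpadded. One possible way to handle this, sketched below with a hypothetical helper `pad_pair` that is not in the notebook, is to skip over-long pairs before padding.

```python
def pad_pair(present, past, maxlen_input=14, maxlen_output=15, reverse=True):
    """Return (padded_present, padded_past), or None if either word is too long."""
    if len(present) > maxlen_input or len(past) > maxlen_output:
        return None
    padded_present = present.ljust(maxlen_input)
    if reverse:
        # Reverse after padding, so the padding ends up in front of the word.
        padded_present = padded_present[::-1]
    return padded_present, past.ljust(maxlen_output)


print(pad_pair('abandon', 'abandoned'))
# ('       nodnaba', 'abandoned      ')
print(pad_pair('a' * 16, 'a' * 17))  # None
```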


## Data Processing

Vectorize the data using the `CharacterTable` class and the previously set parameters, then shuffle it and hold out 10% for validation.

In [82]:
print('Vectorization...')
# Use the builtin bool: the np.bool alias was removed in NumPy 1.24.
x = np.zeros((len(present), MAXLEN_INPUT, len(chars)), dtype=bool)
y = np.zeros((len(past), MAXLEN_OUTPUT, len(chars)), dtype=bool)
for i, word in enumerate(present):
    x[i] = ctable.encode(word, MAXLEN_INPUT)
for i, word in enumerate(past):
    y[i] = ctable.encode(word, MAXLEN_OUTPUT)

# Shuffle (x, y) in unison, since the word list is in alphabetical order.
indices = np.arange(len(y))
np.random.shuffle(indices)
x = x[indices]
y = y[indices]

# Explicitly set apart 10% for validation data that we never train over.
split_at = len(x) - len(x) // 10
(x_train, x_val) = x[:split_at], x[split_at:]
(y_train, y_val) = y[:split_at], y[split_at:]

print('Training Data:')
print(x_train.shape)
print(y_train.shape)

print('Validation Data:')
print(x_val.shape)
print(y_val.shape)

Vectorization...
Training Data:
(1930, 14, 27)
(1930, 15, 27)
Validation Data:
(214, 14, 27)
(214, 15, 27)
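The shapes above follow from the split arithmetic: with 2144 examples, `len(x) // 10` holds out 214 for validation and leaves 1930 for training. A small standalone sketch of shuffling two parallel sequences with one permutation and then splitting them is given below; it uses plain lists instead of NumPy arrays, and the name `shuffle_and_split` is illustrative.

```python
import random


def shuffle_and_split(xs, ys, seed=0):
    """Shuffle two parallel lists with one permutation, then hold out 10%."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    xs = [xs[i] for i in indices]
    ys = [ys[i] for i in indices]
    split_at = len(xs) - len(xs) // 10
    return (xs[:split_at], ys[:split_at]), (xs[split_at:], ys[split_at:])


xs = list(range(2144))
ys = [v * 2 for v in xs]  # stand-in targets paired with xs
(x_tr, y_tr), (x_va, y_va) = shuffle_and_split(xs, ys)
print(len(x_tr), len(x_va))  # 1930 214
# Each target still belongs to its input after shuffling.
print(all(y == 2 * x for x, y in zip(x_tr, y_tr)))  # True
```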


## Setting up Model

In [89]:
# Try replacing LSTM with GRU or SimpleRNN.
RNN = layers.LSTM
HIDDEN_SIZE = 512
BATCH_SIZE = 128
LAYERS = 3


print('Build model...')
model = Sequential()
# "Encode" the input sequence using an RNN, producing an output of HIDDEN_SIZE.
# Note: In a situation where your input sequences have a variable length,
# use input_shape=(None, num_feature).
model.add(RNN(HIDDEN_SIZE, input_shape=(MAXLEN_INPUT, len(chars))))
# As the decoder RNN's input, repeatedly provide the last hidden state of the
# encoder RNN for each time step. Repeat MAXLEN_OUTPUT times, as that is the
# padded length of every output.
model.add(layers.RepeatVector(MAXLEN_OUTPUT))
# The decoder RNN could be multiple layers stacked or a single layer.
for _ in range(LAYERS):
    # By setting return_sequences to True, return not only the last output but
    # all the outputs so far in the form of (num_samples, timesteps,
    # output_dim). This is necessary as the TimeDistributed layer below
    # expects the first dimension to be the timesteps.
    model.add(RNN(HIDDEN_SIZE, return_sequences=True))

# Apply a dense layer to every temporal slice of the input. For each step
# of the output sequence, decide which character should be chosen.
model.add(layers.TimeDistributed(layers.Dense(len(chars))))
model.add(layers.Activation('softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.summary()

Build model...
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_27 (LSTM)               (None, 512)               1105920   
_________________________________________________________________
repeat_vector_11 (RepeatVect (None, 15, 512)           0         
_________________________________________________________________
lstm_28 (LSTM)               (None, 15, 512)           2099200   
_________________________________________________________________
lstm_29 (LSTM)               (None, 15, 512)           2099200   
_________________________________________________________________
lstm_30 (LSTM)               (None, 15, 512)           2099200   
_________________________________________________________________
time_distributed_11 (TimeDis (None, 15, 27)            13851     
_________________________________________________________________
activation_11 (Activation)   (None, 15, 27)            0     
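
The parameter counts in the summary above can be reproduced by hand. An LSTM layer has four gates, each with input weights, recurrent weights and a bias, and the dense layer has one weight per input-output pair plus a bias per output. The function names below are illustrative.

```python
def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """One weight per input-output pair plus one bias per output unit."""
    return input_dim * units + units


encoder = lstm_params(27, 512)    # 27 one-hot inputs into 512 units
decoder = lstm_params(512, 512)   # each of the 3 stacked decoder layers
dense = dense_params(512, 27)     # per-time-step character scores

print(encoder)                        # 1105920
print(decoder)                        # 2099200
print(dense)                          # 13851
print(encoder + 3 * decoder + dense)  # 7417371
```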

## Training, Validating Model

In [90]:
# Print validation examples and classifications from the dataset
def classify_val_examples():
    # Select 10 samples from the validation set at random so we can visualize
    # errors.
    for i in range(10):
        ind = np.random.randint(0, len(x_val))
        rowx, rowy = x_val[np.array([ind])], y_val[np.array([ind])]
        # predict_classes was removed from newer Keras; take the argmax instead.
        preds = model.predict(rowx, verbose=0).argmax(axis=-1)
        q = ctable.decode(rowx[0])
        correct = ctable.decode(rowy[0])
        guess = ctable.decode(preds[0], calc_argmax=False)
        print('Q', q[::-1] if REVERSE else q, end=' ')
        print('T', correct, end=' ')
        if correct == guess:
            print(colors.ok + '☑' + colors.close, end=' ')
        else:
            print(colors.fail + '☒' + colors.close, end=' ')
        print(guess)

# Takes lists of past loss and val_loss values and plots them.
def plot_learning_curves(loss, val_loss):
    pyplot.clf()
    pyplot.plot(loss)
    pyplot.plot(val_loss)
    pyplot.title('model train vs validation loss')
    pyplot.ylabel('loss')
    pyplot.xlabel('epoch')
    pyplot.legend(['train', 'validation'], loc='upper right')
    pyplot.show()
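
The `accuracy` metric reported by Keras for this model is computed per output character (padding included), so it can look high while whole words are still wrong. A minimal sketch of a word-level exact-match rate follows, assuming the decoded strings are already available; the function name `exact_match_rate` and the sample strings are illustrative.

```python
def exact_match_rate(targets, guesses):
    """Fraction of examples whose guess equals the target, ignoring padding."""
    if not targets:
        return 0.0
    hits = sum(t.rstrip() == g.rstrip() for t, g in zip(targets, guesses))
    return hits / len(targets)


# Made-up decoded strings, padded with spaces as in the notebook.
targets = ['pleased        ', 'worried        ', 'cored          ',
           'bayed          ']
guesses = ['pleased        ', 'worryed        ', 'cored          ',
           'bayed          ']
print(exact_match_rate(targets, guesses))  # 0.75
```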

In [None]:
# Train the model one epoch at a time and show predictions against the
# validation dataset after each epoch. Also graph the training and validation
# loss every 50 epochs.

loss = []
val_loss = []
for iteration in range(1, 500):
    print()
    print('-' * 50)
    print('Iteration', iteration)
    history = model.fit(x_train, y_train,
                        batch_size=BATCH_SIZE,
                        epochs=1,
                        validation_data=(x_val, y_val))
    # history.history holds one-element lists here, so extend keeps the
    # loss lists flat.
    loss.extend(history.history['loss'])
    val_loss.extend(history.history['val_loss'])
    classify_val_examples()
    if iteration % 50 == 0:
        plot_learning_curves(loss, val_loss)

# Plot the final curves; plot_learning_curves sets the title, labels and legend.
plot_learning_curves(loss, val_loss)


--------------------------------------------------
Iteration 1
Train on 1930 samples, validate on 214 samples
Epoch 1/1
Q please         T pleased         ☒ eeeeeed        
Q ponder         T pondered        ☒ eeeeeed        
Q coordinate     T coordinated     ☒ eeeeeeeeed     
Q reward         T rewarded        ☒ eeeeeed        
Q worry          T worried         ☒ eeeeee         
Q complement     T complemented    ☒ eeeeeeeeed     
Q foster         T fostered        ☒ eeeeeed        
Q bay            T bayed           ☒ eeeee          
Q core           T cored           ☒ eeeeed         
Q weigh          T weighed         ☒ eeeeed         

--------------------------------------------------
Iteration 2
Train on 1930 samples, validate on 214 samples
Epoch 1/1
Q glide          T glided          ☒ eeeeeed        
Q contain        T contained       ☒ eeeeeeeeedd    
Q expect      

## Assessing Underfitting, Overfitting

Here we plot the model's results to diagnose underfitting or overfitting.

We plot the learning curves of training loss and validation loss against the epoch number. Andrew Ng's Machine Learning Yearning discusses how to read such curves.
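
As a rough guide, validation loss that turns upward while training loss keeps falling suggests overfitting, and two curves that both stay high suggest underfitting. A minimal sketch of summarizing the curves numerically is shown below; the function name `summarize_curves` and the loss values are made up for illustration.

```python
def summarize_curves(loss, val_loss):
    """Return the best validation epoch (1-based) and the final val/train gap."""
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
    final_gap = val_loss[-1] - loss[-1]
    return best_epoch, final_gap


# Made-up curves: training loss keeps falling while validation loss turns up.
loss = [2.0, 1.5, 1.0, 0.5, 0.25]
val_loss = [2.0, 1.5, 1.25, 1.5, 1.75]
best_epoch, final_gap = summarize_curves(loss, val_loss)
print(best_epoch, final_gap)  # 3 1.5
```

A large positive gap combined with a best epoch well before the end of training is the pattern usually associated with overfitting.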

In [66]:
%matplotlib inline

history = model.fit(x_train, y_train,
              batch_size=BATCH_SIZE,
              epochs=600,
              validation_data=(x_val, y_val))
# plot train and validation loss
pyplot.plot(history.history['loss'])
pyplot.plot(history.history['val_loss'])
pyplot.title('model train vs validation loss')
pyplot.ylabel('loss')
pyplot.xlabel('epoch')
pyplot.legend(['train', 'validation'], loc='upper right')
pyplot.show()

Train on 1930 samples, validate on 214 samples
Epoch 1/600
...
Epoch 460/600

KeyboardInterrupt: 