Neural networks are quite adept at some complex problems that lack procedural solutions (e.g., image recognition), yet they still cannot reliably perform some rather basic tasks. For example, training a network to add two arbitrary numbers is non-trivial.

Now, when adding, humans perform a number of subtasks as part of the overall algorithm used. For example:

- Numbers are decomposed into individual digits. 
- Digits of the two numbers are paired based on corresponding places.
- Digits are added.
- A determination is made whether digit addition results in a carry-over to the next place.
- This sequence is iterated until all digits in both numbers have been processed.
- Output digits are concatenated into a final answer.
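
The subtasks above can be sketched as ordinary procedural code. This is only an illustration of the grade-school algorithm, not code from this notebook; the names `to_digits` and `schoolbook_add` are hypothetical and are unrelated to the helpers in `utils`.

```python
def to_digits(n):
    """Decompose a non-negative integer into digits, least significant first."""
    return [int(ch) for ch in reversed(str(n))]


def schoolbook_add(a, b):
    """Add two non-negative integers digit by digit, as taught in school."""
    xs, ys = to_digits(a), to_digits(b)

    # Pair digits by place: pad the shorter number with leading zeros.
    width = max(len(xs), len(ys))
    xs += [0] * (width - len(xs))
    ys += [0] * (width - len(ys))

    out, carry = [], 0
    for x, y in zip(xs, ys):
        total = x + y + carry    # add the paired digits plus any carry
        out.append(total % 10)   # digit that stays in this place
        carry = total // 10      # carry-over to the next place (0 or 1)
    if carry:
        out.append(carry)

    # Concatenate the output digits, most significant first.
    return int("".join(str(d) for d in reversed(out)))
```

Each commented step corresponds to one bullet in the list, which is the sense in which addition decomposes into atomic subtasks.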

Furthermore, even the addition of single digits, as first learned, involves carefully counting through successive numbers until the right answer is reached.
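
A minimal sketch of this counting procedure is shown below, assuming addition of non-negative integers. The helpers `succ` and `pred` are hypothetical stand-ins for "count up by one" and "count down by one"; they are not the `successor` and `predecessor` rule generators imported from `utils`.

```python
def succ(n):
    """Count up by one."""
    return n + 1


def pred(n):
    """Count down by one."""
    return n - 1


def add_by_counting(a, b):
    """Add b to a by counting up from a, one step per unit of b."""
    total, remaining = a, b
    while remaining != 0:
        total = succ(total)          # count one number further
        remaining = pred(remaining)  # one fewer step left to count
    return total
```

Here the only primitives are successor, predecessor, and an equality test against zero, which matches several of the atomic tasks trained later in this notebook.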

So, even though we may consider addition to be basic, there is a fair amount of complexity to it. We easily forget how long it took to learn full addition of arbitrarily sized numbers in elementary school, where it was fed to us in bite-sized chunks over a period of time.

If it takes a human (who has about 100 billion neurons) that long a progression to learn addition, it is unrealistic to expect a neural net (regardless of the types of nodes used) to learn the process in full generality from a single set of training data. A possibly better approach would be to decompose the overall algorithm into atomic subtasks, train on these subtasks, and find ways to allow networks to compose and iterate over previously learned tasks.

Graves (https://arxiv.org/pdf/1603.08983v4.pdf) has started to address providing networks with the capability to learn flexible iteration strategies, and Neelakantan et al. (https://arxiv.org/pdf/1511.04834v3.pdf) have proposed a method for allowing networks to learn by composing available functions. This work aims to utilize and extend these approaches to develop a network architecture capable of learning atomic tasks and using them to learn more complex tasks.

This notebook provides an exploratory analysis of how well LSTM-based networks perform some of the atomic tasks involved in addition, in order to gauge the computational resources required for the overall project.

In [1]:
# Imports
from keras.models import Sequential
from keras.layers import LSTM, Dense
import numpy as np
from utils import *

Using TensorFlow backend.


In [2]:
# List of rule-generating functions which will be used to train models
flist = [concatenation, successor, predecessor, extraction, equality, addition, carry]

models = {} 
history = {}
best_model = {}

for j, f in enumerate(flist):
    x_train, y_train = inflate(f(),50000)
    x_train = np.array(x_train)
    y_train = np.array(y_train)

    in_dim = len(x_train[0][0])
    timesteps = len(x_train[0])
    out_dim = len(y_train[0])
    max_scale = 3
    epochs = 5

    best_loss = float('inf')  # ensures the first trained model always sets the initial best
    best = None
    models[f] = []
    history[f] = []

    for i in range(max_scale):
        model = Sequential()
        model.add(LSTM(in_dim*(i+1), return_sequences=True, input_shape=(timesteps, in_dim)))  
        model.add(LSTM(in_dim*(i+1), return_sequences=True))  
        model.add(LSTM(in_dim*(i+1)))
        model.add(Dense(out_dim, activation='softmax'))
        model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
        print('Training LSTM-based model for {} with hidden layer size {}:'.format(f.__name__, in_dim*(i+1)))
        history[f].append(model.fit(x_train, y_train, batch_size=64, nb_epoch=epochs))
        models[f].append(model)
        if history[f][i].history['loss'][epochs-1] < best_loss:
            best_loss = history[f][i].history['loss'][epochs-1]
            best = i
    best_model[f] = best


Training LSTM-based model for concatenation with hidden layer size 10:
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Training LSTM-based model for concatenation with hidden layer size 20:
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Training LSTM-based model for concatenation with hidden layer size 30:
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Training LSTM-based model for successor with hidden layer size 10:
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Training LSTM-based model for successor with hidden layer size 20:
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Training LSTM-based model for successor with hidden layer size 30:
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Training LSTM-based model for predecessor with hidden layer size 10:
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Training LSTM-based model for predecessor with hidden layer size 20:
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Training LSTM-based model for predecessor with h