<h1><center>Recurrent Neural Network for Performing Integer Addition</center></h1>

# Introduction

This implementation has been adapted from an [example by Keras](https://github.com/keras-team/keras/blob/master/examples/addition_rnn.py). It is an implementation of sequence-to-sequence learning using an LSTM encoder followed by a single LSTM decoder layer.

# Implementation

In [125]:
# Imports
import numpy as np
import tensorflow as tf
from tensorflow.keras.utils import to_categorical
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential


We start by generating our training set. In many ML problems, finding a labelled dataset can be a challenge; fortunately for us, integer addition is something computers are pretty alright at doing already, so we can generate a dataset as large as we like with basically no effort.

We choose to reverse the input string as it's been shown to increase the model's accuracy (sources [here](https://arxiv.org/abs/1410.4615) and [here](http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf)).
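
To make the padding and reversal concrete, here is a minimal sketch of how a single question string is prepared. The helper name `format_question` is made up for illustration; it only mirrors the padding and reversal steps used in the data generation cell below.

```python
DIGITS_PER_SIDE = 3
MAX_INPUT_LEN = (2 * DIGITS_PER_SIDE) + 1  # two operands plus the '+' sign


def format_question(a, b, reverse=True):
    """Hypothetical helper: pad 'a+b' with trailing spaces, then optionally reverse."""
    question = '{}+{}'.format(a, b)
    question += ' ' * (MAX_INPUT_LEN - len(question))
    return question[::-1] if reverse else question


print(repr(format_question(12, 345)))                 # ' 543+21'
print(repr(format_question(12, 345, reverse=False)))  # '12+345 '
```

Note that the padding ends up at the front of the reversed string, so the digits the model reads last are the leading digits of the first operand.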

In [126]:
# Training data params
TRAINING_SET_SIZE = 20000
DIGITS_PER_SIDE = 3
REVERSE = True

MAX_INPUT_LEN = (2 * DIGITS_PER_SIDE) + 1
CHARS = list('0123456789+ ')
CHARS_INT_ENCODED = {x: i for i, x in enumerate(CHARS)}
ONE_HOT_CHARS = np.array(
    to_categorical(
        list(CHARS_INT_ENCODED.values()), 
        len(CHARS_INT_ENCODED), 
        'bool'
    )
)

questions = []
answers = []
seen_questions = set()

print('Generating training data...')
while len(questions) < TRAINING_SET_SIZE:

    # Generate question and answer
    random_number_gen = lambda: int(''.join(np.random.choice(CHARS[:-2])
        for i in range(np.random.randint(1, DIGITS_PER_SIDE + 1))))
    a, b = random_number_gen(), random_number_gen()
    answer_str = str(a + b)

    # Skip question if duplicate
    key = tuple(sorted((a, b)))
    if key in seen_questions:
        continue
    seen_questions.add(key)

    # Build (and pad) question and answer strings
    question_str = '{}+{}'.format(a, b)
    question_str += ' ' * (MAX_INPUT_LEN - len(question_str))
    answer_str += ' ' * (DIGITS_PER_SIDE + 1 - len(answer_str))

    # Reverse the question string
    if REVERSE:
        question_str = question_str[::-1]

    # Transform strings into one-hot representation
    char_to_one_hot = lambda x: ONE_HOT_CHARS[CHARS.index(x)]
    question_arr = np.array(list(map(char_to_one_hot, list(question_str))))
    answer_arr = np.array(list(map(char_to_one_hot, list(answer_str))))

    # Add to question/answer lists
    questions.append(question_arr)
    answers.append(answer_arr)

# Vectorise data lists
questions = np.array(questions)
answers = np.array(answers)
print('Finished generating {} questions'.format(len(questions)))

# Shuffle data
indices = np.arange(len(questions))
np.random.shuffle(indices)
questions = questions[indices]
answers = answers[indices]
print('Shuffled training data')

# Partition validation set
split_index = len(questions) - len(questions) // 10
(questions_train, questions_val) = questions[:split_index], questions[split_index:]
(answers_train, answers_val) = answers[:split_index], answers[split_index:]

print('Training set:')
print('Questions: {}'.format(questions_train.shape))
print('Answers: {}'.format(answers_train.shape))

print('Validation set:')
print('Questions: {}'.format(questions_val.shape))
print('Answers: {}'.format(answers_val.shape))

Generating training data...
Finished generating 20000 questions
Shuffled training data
Training set:
Questions: (18000, 7, 12)
Answers: (18000, 4, 12)
Validation set:
Questions: (2000, 7, 12)
Answers: (2000, 4, 12)


Next, we define some parameters for our model. Each LSTM layer has 128 units, and the decoder uses a single layer (`LAYERS = 1`). We're choosing LSTM for its suitability for processing sequences.

In [127]:
# Model params
RNN = layers.LSTM
HIDDEN_SIZE = 128
LAYERS = 1
OPTIMIZER = 'adam'

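Now we build our model. The first LSTM layer 'encodes' the whole input into a single hidden vector. That vector is then repeated once per output character (`DIGITS_PER_SIDE + 1` time steps), passed through a second LSTM layer that returns a sequence, and decoded through a dense softmax layer applied at every time step to retrieve our result.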

This network structure is chosen because some very smart data scientists did research and experiments until they got to a really accurate model. I'm not going to even pretend I fully understand it; if you want to learn more, I recommend reading the references linked at the bottom of the notebook.
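
The 'repeat' step is the least obvious part, so here is a small NumPy sketch of what it does to the shapes. It is only an illustration, assuming a batch of 2 questions and random numbers standing in for the encoder output; it is not how Keras implements `RepeatVector` internally.

```python
import numpy as np

HIDDEN_SIZE = 128
OUTPUT_LEN = 4  # DIGITS_PER_SIDE + 1

# Stand-in for the encoder LSTM's final output for a batch of 2 questions
encoded = np.random.rand(2, HIDDEN_SIZE)

# Copy each encoded vector once per output time step
repeated = np.repeat(encoded[:, np.newaxis, :], OUTPUT_LEN, axis=1)

print(encoded.shape)   # (2, 128)
print(repeated.shape)  # (2, 4, 128)
```

Every time step of the decoder therefore sees the same summary of the question, and it is up to the decoder LSTM's own state to work out which output character it is currently producing.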

In [128]:
print('Building model...')
model = Sequential()
model.add(RNN(HIDDEN_SIZE, input_shape=(MAX_INPUT_LEN, len(CHARS))))
model.add(layers.RepeatVector(DIGITS_PER_SIDE + 1))
for _ in range(LAYERS):
    model.add(RNN(HIDDEN_SIZE, return_sequences=True))
model.add(layers.TimeDistributed(layers.Dense(len(CHARS), activation='softmax')))
model.compile(loss='categorical_crossentropy', optimizer=OPTIMIZER, metrics=['accuracy'])
model.summary()

Building model...
Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_12 (LSTM)               (None, 128)               72192     
_________________________________________________________________
repeat_vector_6 (RepeatVecto (None, 4, 128)            0         
_________________________________________________________________
lstm_13 (LSTM)               (None, 4, 128)            131584    
_________________________________________________________________
time_distributed_6 (TimeDist (None, 4, 12)             1548      
Total params: 205,324
Trainable params: 205,324
Non-trainable params: 0
_________________________________________________________________
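
As a sanity check, the parameter counts in the summary can be reproduced by hand. The sketch below assumes the standard LSTM layout of four weight blocks (input, forget and output gates plus the cell candidate), each with input weights, recurrent weights and a bias; the function names are made up for illustration.

```python
def lstm_params(input_dim, units):
    """Four weight blocks, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """One weight per input/output pair plus one bias per output."""
    return input_dim * units + units


encoder = lstm_params(input_dim=12, units=128)   # reads one-hot characters
decoder = lstm_params(input_dim=128, units=128)  # reads the repeated encoding
output = dense_params(input_dim=128, units=12)   # one score per character class

print(encoder, decoder, output)    # 72192 131584 1548
print(encoder + decoder + output)  # 205324
```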


Finally, we use the data we generated earlier to train and evaluate our model, then save it so it can be used in other places for predictions.

In [129]:
EPOCHS = 50
BATCH_SIZE = 128
SAVE_PATH = './data/model'

model.fit(
    questions_train, 
    answers_train, 
    batch_size=BATCH_SIZE, 
    epochs=EPOCHS, 
    validation_data=(questions_val, answers_val)
)
print('Training completed')

print('Saving model to ', SAVE_PATH, ' ...')
model.save(SAVE_PATH)
print('Model saved')

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
Training completed
Saving model to  ./data/model  ...
INFO:tensorflow:Assets written to: ./data/model/assets
Model saved


# Results

Some more tinkering with our variables would be needed to get us a model as close to perfect as we could get, but with 20k training examples and 50 epochs, we're seeing accuracy of around 97%. With our model, we can accurately add together almost ANY pair of non-negative integers less than 1000. **Your move, Google**.
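
One caveat on that number: as far as I can tell, the `accuracy` metric Keras reports for this model is averaged over every output character, so an answer with one wrong digit still counts as mostly right. Below is a rough NumPy sketch of both ways of counting. The function names and the toy arrays are made up for illustration and are not results from the trained model.

```python
import numpy as np


def per_character_accuracy(y_true, y_pred):
    """Fraction of output positions whose predicted character is correct."""
    return float(np.mean(y_true.argmax(axis=-1) == y_pred.argmax(axis=-1)))


def exact_match_accuracy(y_true, y_pred):
    """Fraction of answers in which every character is correct."""
    matches = y_true.argmax(axis=-1) == y_pred.argmax(axis=-1)
    return float(np.mean(matches.all(axis=-1)))


# Toy data: 2 answers, 4 characters each, 12 character classes
y_true = np.eye(12)[np.array([[2, 11, 11, 11], [1, 0, 11, 11]])]
y_pred = np.eye(12)[np.array([[2, 11, 11, 11], [1, 1, 11, 11]])]  # one wrong digit

print(per_character_accuracy(y_true, y_pred))  # 0.875
print(exact_match_accuracy(y_true, y_pred))    # 0.5
```

With the trained model, something like `exact_match_accuracy(answers_val, model.predict(questions_val))` should give the stricter whole-answer figure.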

Below we use the model to obtain the answer to our question "what is 1 + 1?".

In [130]:
# Encode '1+1' into input format (padded to MAX_INPUT_LEN, then reversed as in training) and transform with model
char_to_one_hot = lambda x: ONE_HOT_CHARS[CHARS.index(x)]
input_q = list(map(char_to_one_hot, list('    1+1')))
input_q = tf.convert_to_tensor([input_q])
output = model(input_q)

# Convert result into a string
prediction = tf.argmax(output[0], axis=1, output_type=tf.int32)
prediction_string = ''.join(list(map(lambda x: CHARS[x], list(prediction.numpy()))))
print('The predicted answer is ', prediction_string)

The predicted answer is  2   
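
The cell above hard-codes the padded, reversed string for `1+1`. For other questions the same steps could be wrapped up as in the sketch below. The helpers `encode_question`, `decode_answer` and `ask` are hypothetical names, and `predict_fn` is assumed to be any callable that maps a `(1, 7, 12)` array to `(1, 4, 12)` character probabilities, such as `model.predict`.

```python
import numpy as np

DIGITS_PER_SIDE = 3
MAX_INPUT_LEN = (2 * DIGITS_PER_SIDE) + 1
CHARS = list('0123456789+ ')


def encode_question(a, b, reverse=True):
    """Pad 'a+b', reverse it as in training, and one-hot encode it as a batch of one."""
    text = '{}+{}'.format(a, b).ljust(MAX_INPUT_LEN)
    if reverse:
        text = text[::-1]
    one_hot = np.zeros((1, MAX_INPUT_LEN, len(CHARS)), dtype=np.float32)
    for position, char in enumerate(text):
        one_hot[0, position, CHARS.index(char)] = 1.0
    return one_hot


def decode_answer(probabilities):
    """Pick the most likely character per time step and strip the padding."""
    indices = np.asarray(probabilities)[0].argmax(axis=-1)
    return ''.join(CHARS[i] for i in indices).strip()


def ask(predict_fn, a, b):
    """Encode the question, run the supplied prediction function, decode the answer."""
    return decode_answer(predict_fn(encode_question(a, b)))
```

With the trained model in scope, `ask(model.predict, 12, 345)` would then return the predicted sum as a string.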


# References

1. [Addition RNN example by Keras](https://github.com/keras-team/keras/blob/master/examples/addition_rnn.py) 

2. [Sequence to Sequence Learning with Neural Networks - Sutskever, Vinyals, Le](http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf)

3. [Learning to Execute - Zaremba, Sutskever](https://arxiv.org/abs/1410.4615)