# Sequence to sequence learning for performing number addition (Perform add operation using RNN)

Link: https://keras.io/examples/nlp/addition_rnn/

## Introduction

In this example, we train a model to learn to add two numbers, provided as strings.

**Example:**

    Input: "535+61"
    Output: "596"

The input may optionally be reversed, which was shown to improve performance on many tasks in *Learning to Execute* and [Sequence to Sequence Learning with Neural Networks](http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf).

In theory, reversing the input sequence introduces shorter-term dependencies between source and target for this problem.

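To make the reversal concrete, the minimal sketch below pads a question to a fixed length and then reverses it, mirroring what the data generation code in this notebook does. The helper name `format_query` is hypothetical and exists only for this illustration. After reversal the padding sits at the front and each operand is read from its lowest digit to its highest.

```python
def format_query(a, b, digits=3, reverse=True):
    """Pad 'a+b' with spaces to a fixed length, then optionally reverse it."""
    maxlen = digits + 1 + digits  # two operands plus the '+' sign
    query = "{}+{}".format(a, b)
    query = query + " " * (maxlen - len(query))
    return query[::-1] if reverse else query


print(repr(format_query(12, 345, reverse=False)))  # '12+345 '
print(repr(format_query(12, 345)))                 # ' 543+21'
```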
**Results:**

Two digits (reversed):

    - One layer LSTM (128 HN), 5k training examples = 99% train/test accuracy in 55 epochs

Three digits (reversed):

    - One layer LSTM (128 HN), 50k training examples = 99% train/test accuracy in 100 epochs

Four digits (reversed):

    - One layer LSTM (128 HN), 400k training examples = 99% train/test accuracy in 20 epochs

Five digits (reversed):

    - One layer LSTM (128 HN), 550k training examples = 99% train/test accuracy in 30 epochs


## Setup

In [8]:
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np

# Parameters for the model and dataset.
TRAINING_SIZE = 50000
DIGITS = 3
REVERSE = True

# Maximum length of input is 'int + int' (e.g., '345+678').
# Maximum length of int is DIGITS.
MAXLEN = DIGITS + 1 + DIGITS

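As a quick sanity check on these lengths, the sketch below builds the longest possible question and answer for a given digit count and measures them. The function name `sequence_lengths` is an assumption made for this illustration and is not part of the original example.

```python
def sequence_lengths(digits):
    """Return (max question length, max answer length) for `digits`-digit operands."""
    largest = 10 ** digits - 1  # e.g. 999 when digits == 3
    max_input_len = len("{}+{}".format(largest, largest))
    max_output_len = len(str(largest + largest))
    return max_input_len, max_output_len


for d in (2, 3, 4, 5):
    print(d, sequence_lengths(d))
```

For `DIGITS = 3` this gives `(7, 4)`, matching `MAXLEN = DIGITS + 1 + DIGITS` and the answer length `DIGITS + 1` used later in the notebook.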
## Generate the data

In [9]:
class CharacterTable:
    """Given a set of characters:
    + Encode them to a one-hot integer representation
    + Decode the one-hot or integer representation to their character output
    + Decode a vector of probabilities to their character output
    """
    
    def __init__(self, chars):
        """Initialize character table.
        # Arguments
            chars: Characters that can appear in the input.
        """
        self.chars = sorted(set(chars))
        self.char_indices = dict((c, i) for i, c in enumerate(self.chars))
        self.indices_char = dict((i, c) for i, c in enumerate(self.chars))
        
    def encode(self, C, num_rows):
        """One-hot encode given string C.
        # Arguments:
            C: string, to be encoded.
            num_rows: Number of rows in the returned one-hot encoding.
                This is used to keep the number of rows the same for every sample.
        """
        x = np.zeros((num_rows, len(self.chars)))
        for i, c in enumerate(C):
            x[i, self.char_indices[c]] = 1
        return x
    
    def decode(self, x, calc_argmax=True):
        """Decode the given vector or 2D array to their character output.
        # Arguments
            x: A vector or a 2D array of probabilities or one-hot representations;
                or a vector of character indices (used with `calc_argmax=False`).
            calc_argmax: Whether to find the character index with maximum
                probability, defaults to `True`.
        """
        if calc_argmax:
            x = x.argmax(axis=-1)
        return "".join(self.indices_char[i] for i in x)
    
    
# All the numbers, plus sign and space for padding.
chars = "0123456789+ "
ctable = CharacterTable(chars)

questions = []
expected = []
seen = set()
print("Generating data...")
while len(questions) < TRAINING_SIZE:
    f = lambda: int(
        "".join(
            np.random.choice(list("0123456789"))
            for i in range(np.random.randint(1, DIGITS + 1))
        )
    )
    a, b = f(), f()
    # Skip any addition questions we've already seen
    # Also skip any such that a+b == b+a (hence the sorting).
    key = tuple(sorted((a, b)))
    if key in seen:
        continue
    seen.add(key)
    
    # Pad the data with spaces such that it is always MAXLEN.
    q = "{}+{}".format(a, b)
    query = q + " " * (MAXLEN - len(q))
    ans = str(a + b)
    # Answers can be of maximum size DIGITS + 1.
    ans += " " * (DIGITS + 1 - len(ans))
    if REVERSE:
        # Reverse the query, e.g., '12+345 ' becomes ' 543+21'.
        # (Note the space used for padding.)
        query = query[::-1]
    questions.append(query)
    expected.append(ans)
print("Total questions:", len(questions))

Generating data...
Total questions: 50000

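The deduplication step relies on sorting each operand pair so that `a+b` and `b+a` map to the same key. Below is a minimal sketch of just that logic. It assumes the standard library `random` module instead of NumPy and draws operands uniformly from the full range, which differs from the notebook's approach of first picking a random number of digits. The helper name `unique_pairs` is hypothetical.

```python
import random


def unique_pairs(count, digits=3, seed=0):
    """Draw `count` operand pairs, treating a+b and b+a as the same question."""
    rng = random.Random(seed)
    seen = set()
    pairs = []
    while len(pairs) < count:
        a = rng.randint(0, 10 ** digits - 1)
        b = rng.randint(0, 10 ** digits - 1)
        # Sorting makes the key order-independent, so (3, 7) and (7, 3) collide.
        key = tuple(sorted((a, b)))
        if key in seen:
            continue
        seen.add(key)
        pairs.append((a, b))
    return pairs


pairs = unique_pairs(5)
print(pairs)
```

Because the loop only stops once `count` distinct keys exist, `count` must not exceed the number of unordered pairs available for the chosen digit count.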

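To see what `CharacterTable.encode` and `CharacterTable.decode` do to a single string, here is a NumPy-free sketch that uses nested lists in place of arrays. The names `one_hot` and `from_one_hot` are hypothetical and are not part of the notebook. As in the class above, the characters are sorted, so the space used for padding gets index 0.

```python
CHARS = sorted(set("0123456789+ "))
CHAR_TO_INDEX = {c: i for i, c in enumerate(CHARS)}


def one_hot(text, num_rows):
    """Return num_rows rows, each with a single 1 at the character's index."""
    rows = [[0] * len(CHARS) for _ in range(num_rows)]
    for i, c in enumerate(text):
        rows[i][CHAR_TO_INDEX[c]] = 1
    return rows


def from_one_hot(rows):
    """Pick the index of the largest value in each row and map it back to a character."""
    return "".join(CHARS[row.index(max(row))] for row in rows)


encoded = one_hot(" 543+21", 7)
print(len(encoded), len(encoded[0]))  # 7 12
print(repr(from_one_hot(encoded)))    # ' 543+21'
```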
## Vectorize the data

In [13]:
print("Vectorization...")
x = np.zeros((len(questions), MAXLEN, len(chars)), dtype=bool)
y = np.zeros((len(questions), DIGITS + 1, len(chars)), dtype=bool)
for i, sentence in enumerate(questions):
    x[i] = ctable.encode(sentence, MAXLEN)
for i, sentence in enumerate(expected):
    y[i] = ctable.encode(sentence, DIGITS + 1)
    
# Shuffle (x, y) in unison as the later parts of x will almost
# all be larger digits.
indices = np.arange(len(y))
np.random.shuffle(indices)
x = x[indices]
y = y[indices]

# Explicitly set apart 10% for validation data that we never train over.
split_at = len(x) - len(x) // 10
(x_train, x_val) = x[:split_at], x[split_at:]
(y_train, y_val) = y[:split_at], y[split_at:]

print("Training Data:")
print(x_train.shape)
print(y_train.shape)

print("Validation Data:")
print(x_val.shape)
print(y_val.shape)

Vectorization...
Training Data:
(45000, 7, 12)
(45000, 4, 12)
Validation Data:
(5000, 7, 12)
(5000, 4, 12)

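The shuffle-then-split pattern above can be checked on a toy dataset. The sketch below is a plain-Python approximation that uses `random` instead of NumPy; the helper name `shuffle_and_split` is an assumption for illustration. It applies one shared permutation to both sequences so each question stays paired with its answer, then holds out the last 10%.

```python
import random


def shuffle_and_split(xs, ys, seed=0):
    """Shuffle xs and ys with the same permutation, then hold out the last 10%."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    xs = [xs[i] for i in indices]
    ys = [ys[i] for i in indices]
    split_at = len(xs) - len(xs) // 10
    return (xs[:split_at], ys[:split_at]), (xs[split_at:], ys[split_at:])


questions = list(range(50))
answers = [q * 2 for q in questions]
(x_tr, y_tr), (x_va, y_va) = shuffle_and_split(questions, answers)
print(len(x_tr), len(x_va))  # 45 5
```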

## Build the Model

In [17]:
print("Build model...")
num_layers = 1   # Try to add more LSTM layers!

model = keras.Sequential()
# "Encode" the input sequence using an LSTM, producing an output of size 128.
# Note: In a situation where our input sequences have a variable length,
# use input_shape=(None, num_feature).
model.add(layers.LSTM(128, input_shape=(MAXLEN, len(chars))))
# As the decoder RNN's input, repeat the encoder's last output once per
# output time step. Repeat 'DIGITS + 1' times as that's the maximum
# length of output, e.g., when DIGITS=3, max output is 999+999=1998.
model.add(layers.RepeatVector(DIGITS + 1))
# The decoder RNN could be multiple layers stacked or a single layer.
for _ in range(num_layers):
    # By setting return_sequences to True, return not only the last output but
    # all the outputs so far in the form of (num_samples, timesteps,
    # output_dim). This is necessary because the Dense layer below is applied
    # to every time step and therefore needs the full sequence.
    model.add(layers.LSTM(128, return_sequences=True))
    
# Apply a dense layer to every temporal slice of the input. For each step
# of the output sequence, decide which character should be chosen.
model.add(layers.Dense(len(chars), activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()

Build model...
Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_2 (LSTM)                (None, 128)               72192     
_________________________________________________________________
repeat_vector_2 (RepeatVecto (None, 4, 128)            0         
_________________________________________________________________
lstm_3 (LSTM)                (None, 4, 128)            131584    
_________________________________________________________________
dense (Dense)                (None, 4, 12)             1548      
Total params: 205,324
Trainable params: 205,324
Non-trainable params: 0
_________________________________________________________________

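The parameter counts in the summary can be reproduced by hand. The sketch below assumes the standard LSTM layout with a bias term: four weight sets (input gate, forget gate, cell candidate, output gate), each with an input kernel, a recurrent kernel, and a bias. The function names are hypothetical helpers for this check.

```python
def lstm_params(input_dim, units):
    """Four weight sets, each with input kernel, recurrent kernel, and bias."""
    return 4 * ((input_dim + units + 1) * units)


def dense_params(input_dim, units):
    """One weight per input-output pair, plus one bias per output unit."""
    return input_dim * units + units


encoder = lstm_params(12, 128)   # 12 input characters, 128 hidden units
decoder = lstm_params(128, 128)  # fed the repeated 128-dim encoder output
output = dense_params(128, 12)   # one softmax over 12 characters per step
print(encoder, decoder, output, encoder + decoder + output)
# 72192 131584 1548 205324
```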

## Train the Model

In [20]:
epochs = 30
batch_size = 32

# Train the model one epoch at a time and show predictions against
# the validation dataset after each epoch.
for epoch in range(1, epochs + 1):
    print()
    print("Iteration =", epoch)
    model.fit(
        x_train,
        y_train,
        batch_size=batch_size,
        epochs=1,
        validation_data=(x_val, y_val),
    )
    
    # Select 10 samples from the validation set at random
    # so we can visualize errors.
    for i in range(10):
        ind = np.random.randint(0, len(x_val))
        rowx, rowy = x_val[np.array([ind])], y_val[np.array([ind])]
        preds = np.argmax(model.predict(rowx), axis=-1)
        q = ctable.decode(rowx[0])
        correct = ctable.decode(rowy[0])
        guess = ctable.decode(preds[0], calc_argmax=False)
        print("Q", q[::-1] if REVERSE else q, end=" ")
        print("T", correct, end=" ")
        if correct == guess:
            print("☑ " + guess)
        else:
            print("☒ " + guess)


Iteration = 1
Q 46+352  T 398  ☒ 400 
Q 669+291 T 960  ☒ 102 
Q 64+475  T 539  ☒ 532 
Q 951+879 T 1830 ☒ 1588
Q 119+981 T 1100 ☒ 102 
Q 46+117  T 163  ☒ 211 
Q 27+855  T 882  ☒ 688 
Q 791+647 T 1438 ☒ 1333
Q 75+413  T 488  ☒ 410 
Q 725+8   T 733  ☒ 172 

Iteration = 2
Q 8+248   T 256  ☒ 259 
Q 963+41  T 1004 ☒ 901 
Q 422+9   T 431  ☒ 430 
Q 555+652 T 1207 ☒ 1211
Q 56+560  T 616  ☒ 614 
Q 88+60   T 148  ☒ 144 
Q 944+84  T 1028 ☒ 1024
Q 439+425 T 864  ☒ 764 
Q 26+333  T 359  ☒ 361 
Q 324+768 T 1092 ☒ 1011

Iteration = 3
Q 14+741  T 755  ☒ 747 
Q 315+8   T 323  ☒ 324 
Q 91+95   T 186  ☒ 176 
Q 0+578   T 578  ☒ 574 
Q 557+966 T 1523 ☒ 1419
Q 246+201 T 447  ☒ 449 
Q 356+454 T 810  ☒ 818 
Q 238+59  T 297  ☒ 295 
Q 239+729 T 968  ☒ 977 
Q 43+972  T 1015 ☒ 1018

Iteration = 4
Q 596+378 T 974  ☒ 968 
Q 717+225 T 942  ☒ 930 
Q 169+5   T 174  ☒ 172 
Q 735+6   T 741  ☒ 740 
Q 899+372 T 1271 ☒ 1265
Q 6+882   T 888  ☑ 888 
Q 659+806 T 1465 ☒ 1470
Q 29+941  T 970  ☒ 978 
Q 12+924  T 936  ☒ 930 
...

Q 61+452  T 513  ☑ 513 
Q 904+32  T 936  ☑ 936 
Q 4+930   T 934  ☑ 934 
Q 9+549   T 558  ☑ 558 
Q 809+3   T 812  ☑ 812 
Q 15+26   T 41   ☑ 41  
Q 7+957   T 964  ☑ 964 
Q 583+15  T 598  ☑ 598 
Q 5+884   T 889  ☑ 889 
Q 999+504 T 1503 ☒ 1403

Iteration = 23
Q 118+3   T 121  ☑ 121 
Q 998+99  T 1097 ☑ 1097
Q 494+263 T 757  ☑ 757 
Q 6+289   T 295  ☑ 295 
Q 84+645  T 729  ☑ 729 
Q 54+687  T 741  ☑ 741 
Q 407+337 T 744  ☑ 744 
Q 512+782 T 1294 ☑ 1294
Q 8+694   T 702  ☑ 702 
Q 424+84  T 508  ☑ 508 

Iteration = 24
Q 453+71  T 524  ☑ 524 
Q 101+256 T 357  ☑ 357 
Q 798+74  T 872  ☑ 872 
Q 933+591 T 1524 ☑ 1524
Q 56+421  T 477  ☑ 477 
Q 677+463 T 1140 ☑ 1140
Q 61+402  T 463  ☑ 463 
Q 340+0   T 340  ☑ 340 
Q 97+698  T 795  ☑ 795 
Q 63+95   T 158  ☑ 158 

Iteration = 25
Q 550+191 T 741  ☑ 741 
Q 985+62  T 1047 ☑ 1047
Q 58+790  T 848  ☑ 848 
Q 69+371  T 440  ☑ 440 
Q 26+229  T 255  ☑ 255 
Q 56+328  T 384  ☑ 384 
Q 82+633  T 715  ☑ 715 
Q 244+872 T 1116 ☑ 1116
Q 573+89  T 662  ☑ 662 
Q 464+296 T 760 

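When reading these results, it helps to keep two measures apart: with a softmax over every output position, the `accuracy` metric that Keras reports is computed per character, while the ☑/☒ marks above compare whole answers. The sketch below computes both on a few rows taken from the sample output above, padded with spaces to four characters. The function names are hypothetical, and the sketch assumes every target and guess have the same length.

```python
def char_accuracy(targets, guesses):
    """Fraction of individual output characters that match."""
    total = sum(len(t) for t in targets)
    hits = sum(
        tc == gc
        for t, g in zip(targets, guesses)
        for tc, gc in zip(t, g)
    )
    return hits / total


def exact_match_accuracy(targets, guesses):
    """Fraction of answers where every character matches."""
    return sum(t == g for t, g in zip(targets, guesses)) / len(targets)


targets = ["398 ", "960 ", "1830", "888 "]
guesses = ["400 ", "102 ", "1588", "888 "]
print(char_accuracy(targets, guesses))         # 0.4375
print(exact_match_accuracy(targets, guesses))  # 0.25
```

Per-character accuracy is the more forgiving of the two, since a wrong answer can still get some of its characters right, including the padding.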
## Save the Model

In [22]:
import os

model_name = 'add_operation_using_rnn.h5'
model.save(model_name)
print('model saved to', os.getcwd())

model saved to /home/nsl20/Desktop/Aminul(me)/Deep-Learning-Guide/RNN & LSTM
