# Deep Learning Foundation and Applications
## Assignment 2: Inferring Relation Among Very Large Numbers by Using RNNs
Author: 李宸綾 (first-year master's student, Department of Computer Science and Information Engineering), 610921231

## Step 1: Data loading and preprocessing
The first step of any AI model experiment is to load the data and preprocess it properly. (I will comment as many lines of code as I can to explain what they do. More detailed information is in the PDF report.)

In [1]:
# load train.txt
with open("dataset/train.txt") as f:
    # lists for the input pairs (A, B) and the ground-truth values (C)
    train_value = []
    predict_value = []

    # read the training data line by line
    for line in f:
        # separate the three number sequences at the commas
        sep_string = line.split(',')

        # slice each sequence to keep only the digits we actually need:
        # the first and second sequences keep their last three digits
        for count in range(2):
            sep_string[count] = sep_string[count][-3:]

        # the third sequence keeps its last four digits, without the trailing '\n'
        sep_string[2] = sep_string[2][-5:-1]

        # store the inputs and the ground truth
        train_value.append(sep_string[:2])
        predict_value.append(sep_string[-1])
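
The slicing in the cell above relies on every line ending with a newline character. As a minimal sketch (the helper name `parse_line` and the sample line are made up for illustration and are not taken from the dataset), the same extraction can be written so that it does not depend on the trailing `'\n'`:

```python
def parse_line(line):
    """Return ([A, B], C) as integers from one 'A,B,C' line.

    Keeps the last three digits of A and B and the last four digits of C,
    mirroring the slicing used in the loading cell.
    """
    # strip() removes the trailing newline if there is one
    a, b, c = line.strip().split(',')
    return [int(a[-3:]), int(b[-3:])], int(c[-4:])


# hypothetical sample line, not taken from train.txt
features, target = parse_line("1111686,2222617,33334983\n")
print(features, target)
```

Because the newline is stripped first, the last line of a file is handled the same way whether or not it ends with `'\n'`.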

In [2]:
# the values we loaded are strings,
# so convert them to integers
# train_value looks like [['1', '2'], ['3', '4'], ...]:
# a list of lists, each holding two strings,
# so map int over every inner list
train_value = [list(map(int, pair)) for pair in train_value]

# predict_value looks like ['1234', '5678', ...]:
# a flat list of strings
predict_value = list(map(int, predict_value))

# check that the elements are now integers
print(train_value[0], type(train_value[0][0]))
print(predict_value[0], type(predict_value[0]))

[686, 617] <class 'int'>
4983 <class 'int'>


In [16]:
from numpy import array

# decide a split ratio for validation split
split_percent = 0.80
split = int(split_percent*len(train_value))

# format as NumPy arrays
X, y = array(train_value[:split]), array(predict_value[:split])
val_X, val_y = array(train_value[split:]), array(predict_value[split:])

# get the maximum value from train data
largest_X = max(max(X, key=max))
# largest_y = max(y)
val_largest_X = max(max(val_X, key=max))
# val_largest_y = max(val_y)

# since we have A and B two elements, the n_numbers will be 2
# if there are A, B, C and want to predict D, the n_numbers will be 3
n_numbers = 2

# normalize
X = X.astype('float') / float(largest_X * n_numbers)
y = y.astype('float') / float(largest_X * n_numbers)
val_X = val_X.astype('float') / float(val_largest_X * n_numbers)
val_y = val_y.astype('float') / float(val_largest_X * n_numbers)

print('Max value in A and B (first 200000 rows):', largest_X)
print('Max value in val A and B (last 50000 rows): ', val_largest_X)
print('The amount of training data:', len(X))
print('The amount of validation data:', len(val_X))

Max value in A and B (first 200000 rows): 820
Max value in val A and B (last 50000 rows):  820
The amount of training data: 200000
The amount of validation data: 50000


In [17]:
# invert normalization function
def invert(value, n_numbers, largest):
	return round(value * float(largest * n_numbers))

# example: take the first value from the ground truth (column C),
# which is already normalized; the function returns
# the original value (before normalization)
print(invert(y[0], 2, largest_X))
print(invert(val_y[0], 2, val_largest_X))

4983
1363
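
To make the normalization round trip concrete, here is a small self-contained sketch. The `normalize` helper is a hypothetical name introduced only for this example; `invert` repeats the function defined above, and 820 is the maximum printed by the earlier cell.

```python
def normalize(value, n_numbers, largest):
    """Scale a raw value by largest * n_numbers (same rule as the normalize step)."""
    return value / float(largest * n_numbers)


def invert(value, n_numbers, largest):
    """Undo normalize() and round back to the nearest integer."""
    return round(value * float(largest * n_numbers))


largest = 820      # maximum printed by the earlier cell
n_numbers = 2

scaled = normalize(4983, n_numbers, largest)   # 4983 / 1640
print(scaled)
print(invert(scaled, n_numbers, largest))      # the round trip recovers 4983
```

Note that the scale is derived from the inputs, so normalized targets such as this one can be larger than 1.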


## Step 2: Creating different RNN models and compare the results
In this section, we define several RNN models and use them to predict the validation data, then compare the results to find out which one is most suitable for this assignment. (As many hyperparameters as possible are kept the same so that no other factors affect the comparison between the models' results.)

In [4]:
'''
fix random seeds to get a reproducible result
'''
# Apparently you may use different seed values at each stage
seed_value= 0

# 1. Set `PYTHONHASHSEED` environment variable at a fixed value
import os
os.environ['PYTHONHASHSEED']=str(seed_value)

# 2. Set `python` built-in pseudo-random generator at a fixed value
import random
random.seed(seed_value)

# 3. Set `numpy` pseudo-random generator at a fixed value
import numpy as np
np.random.seed(seed_value)

# 4. Set the `tensorflow` pseudo-random generator at a fixed value
import tensorflow as tf
tf.random.set_seed(seed_value)
# for TensorFlow 1.x-style code, use the compat API instead:
# tf.compat.v1.set_random_seed(seed_value)
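
The effect of fixing a seed can be checked with Python's built-in generator alone. This is only a sketch of the idea (the `draw` helper is a hypothetical name); it does not exercise NumPy or TensorFlow, and full reproducibility of a training run may depend on further factors such as hardware.

```python
import random


def draw(seed, n=3):
    """Seed Python's generator and return n pseudo-random floats."""
    random.seed(seed)
    return [random.random() for _ in range(n)]


first = draw(0)
second = draw(0)
print(first == second)    # same seed, same sequence
print(draw(1) == first)   # a different seed gives a different sequence
```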

In [19]:
'''
import some common libraries and declare variables whose values stay the same across the different models
'''
import keras
from keras.models import Sequential
from keras.layers import Dense
from math import sqrt
from sklearn.metrics import mean_squared_error

# number of training epochs
epochs = 20

# batch sizes are commonly chosen as powers of 2 (2^n, n = 1, 2, 3...),
# which may help training speed on some hardware
n_batch = 256

n_examples = split # n_examples = 200,000
val_numbers = len(train_value) - n_examples # val_numbers = 50,000

### Model GRU (Gated Recurrent Unit):
![GRU](figure/GRU.png)
![GRU&formula](figure/GRU&formula.png)

A GRU introduces an update gate that decides how much of the previous output (ht-1) is passed on to the next cell (as ht). It also has a reset gate; the GRU has no separate forget gate. Each gate is nothing more than an additional mathematical operation with its own set of weights.
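
As a rough illustration of the gating idea, the sketch below runs one common formulation of the GRU (update gate `z`, reset gate `r`) for a single unit with scalar weights. The parameter names and values are hypothetical, and the sign convention for `z` differs between references and libraries, so this should not be read as the exact computation Keras performs.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(x_t, h_prev, p):
    """One GRU step for a single unit with scalar weights (illustrative only)."""
    # update gate: how much of the new candidate replaces the old state
    z = sigmoid(p["w_z"] * x_t + p["u_z"] * h_prev + p["b_z"])
    # reset gate: how much of the old state feeds into the candidate
    r = sigmoid(p["w_r"] * x_t + p["u_r"] * h_prev + p["b_r"])
    # candidate state built from the input and the (reset) old state
    h_cand = math.tanh(p["w_h"] * x_t + p["u_h"] * (r * h_prev) + p["b_h"])
    # blend the old state and the candidate according to the update gate
    return (1.0 - z) * h_prev + z * h_cand


# hypothetical parameter values, chosen only to make the example run
names = ("w_z", "u_z", "b_z", "w_r", "u_r", "b_r", "w_h", "u_h", "b_h")
params = {name: 0.5 for name in names}

h = 0.0
for x in (0.42, 0.38):   # two time steps, like one normalized (A, B) pair
    h = gru_step(x, h, params)
print(h)
```

With `z` close to 0 the previous state is carried over almost unchanged; with `z` close to 1 it is replaced by the candidate.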

In [38]:
from keras.layers import GRU

# create GRU
model_GRU = Sequential()

'''
"return_sequences": Boolean. Whether to return the last output in the output sequence, or the full sequence. Default: False.
'''
# remember input shape looks like: [[0.001, 0.014], [0.023, 0.034], ...]
model_GRU.add(GRU(6, input_shape=(n_numbers, 1), return_sequences=True))

model_GRU.add(GRU(2))
model_GRU.add(Dense(1))
model_GRU.compile(loss='mean_squared_error', optimizer='adam')

In [39]:
# train GRU
X = X.reshape(n_examples, n_numbers, 1)
for _ in range(epochs):
	model_GRU.fit(X, y, epochs=1, batch_size=n_batch, verbose=1)

Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1


In [40]:
# evaluate on validation data
val_X = val_X.reshape(val_numbers, n_numbers, 1)
val_result = model_GRU.predict(val_X, batch_size=n_batch, verbose=1)



In [41]:
# calculate validation error (loss value)
expected = [invert(x, n_numbers, val_largest_X) for x in val_y]
predicted = [invert(x, n_numbers, val_largest_X) for x in val_result[:,0]]
rmse = sqrt(mean_squared_error(expected, predicted))

print('RMSE of GRU: %f' % rmse)

RMSE of GRU: 4.229000
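
For reference, the RMSE used here is the square root of the mean of the squared differences. A minimal pure-Python sketch, with made-up numbers rather than values from the validation set:

```python
from math import sqrt


def rmse(expected, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(e - p) ** 2 for e, p in zip(expected, predicted)]
    return sqrt(sum(squared) / len(squared))


# made-up values: the errors are -3 and 4, so the mean squared error is 12.5
print(rmse([10, 20], [13, 16]))
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones.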


In [24]:
# show some examples
for i in range(10):
	error = expected[i] - predicted[i]
	print('Expected = %d, Predicted = %d (err = %d)' % (expected[i], predicted[i], error))

Expected = 1363, Predicted = 1358 (err = 5)
Expected = 1962, Predicted = 1955 (err = 7)
Expected = 6813, Predicted = 6815 (err = -2)
Expected = 4184, Predicted = 4173 (err = 11)
Expected = 2029, Predicted = 2021 (err = 8)
Expected = 1354, Predicted = 1351 (err = 3)
Expected = 5371, Predicted = 5366 (err = 5)
Expected = 5614, Predicted = 5610 (err = 4)
Expected = 5568, Predicted = 5559 (err = 9)
Expected = 3746, Predicted = 3737 (err = 9)


### Model LSTM (Long Short-Term Memory layer):
![LSTM](figure/LSTM.png)
![LSTM&formula](figure/LSTM&formula.png)

An LSTM uses three gates (input, forget, and output) and keeps a separate cell state alongside the output (ht). As with the GRU, each gate is an additional mathematical operation on the same inputs (xt and ht-1) with its own set of weights, so the LSTM carries more weights than the GRU.
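
One way to make the extra sets of weights concrete is to count parameters. The sketch below assumes the textbook formulation, in which every gate or candidate block has an input kernel, a recurrent kernel, and one bias vector; `recurrent_params` is a hypothetical helper. Some Keras versions use an additional bias for the GRU, so `model.summary()` may report a slightly larger number for that layer.

```python
def recurrent_params(units, input_dim, n_blocks):
    """Number of weights in a recurrent layer made of n_blocks gate-like blocks.

    Each block has an input kernel (input_dim x units), a recurrent kernel
    (units x units), and a bias (units).
    """
    per_block = units * (input_dim + units + 1)
    return n_blocks * per_block


units, input_dim = 6, 1   # first recurrent layer of each model in this notebook

# SimpleRNN: 1 block; GRU: update, reset, candidate; LSTM: input, forget, output, candidate
print('SimpleRNN:', recurrent_params(units, input_dim, 1))
print('GRU      :', recurrent_params(units, input_dim, 3))
print('LSTM     :', recurrent_params(units, input_dim, 4))
```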

In [25]:
from keras.layers import LSTM

# create LSTM
model_LSTM = Sequential()

# remember input shape looks like: [[0.001, 0.014], [0.023, 0.034], ...]
model_LSTM.add(LSTM(6, input_shape=(n_numbers, 1), return_sequences=True))

model_LSTM.add(LSTM(2))
model_LSTM.add(Dense(1))
model_LSTM.compile(loss='mean_squared_error', optimizer='adam')

In [26]:
# train LSTM
X = X.reshape(n_examples, n_numbers, 1)
for _ in range(epochs):
	model_LSTM.fit(X, y, epochs=1, batch_size=n_batch, verbose=1)

Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1


In [27]:
# evaluate on validation data
val_X = val_X.reshape(val_numbers, n_numbers, 1)
val_result = model_LSTM.predict(val_X, batch_size=n_batch, verbose=1)



In [28]:
# calculate validation error (loss value)
expected = [invert(x, n_numbers, val_largest_X) for x in val_y]
predicted = [invert(x, n_numbers, val_largest_X) for x in val_result[:,0]]
rmse = sqrt(mean_squared_error(expected, predicted))

print('RMSE of LSTM: %f' % rmse)

RMSE of LSTM: 7.783351


In [29]:
# show some examples
for i in range(10):
	error = expected[i] - predicted[i]
	print('Expected = %d, Predicted = %d (err = %d)' % (expected[i], predicted[i], error))

Expected = 1363, Predicted = 1354 (err = 9)
Expected = 1962, Predicted = 1954 (err = 8)
Expected = 6813, Predicted = 6815 (err = -2)
Expected = 4184, Predicted = 4176 (err = 8)
Expected = 2029, Predicted = 2019 (err = 10)
Expected = 1354, Predicted = 1345 (err = 9)
Expected = 5371, Predicted = 5367 (err = 4)
Expected = 5614, Predicted = 5614 (err = 0)
Expected = 5568, Predicted = 5562 (err = 6)
Expected = 3746, Predicted = 3742 (err = 4)


### Model SimpleRNN:
![simpleRNN](figure/simpleRNN.png)

A SimpleRNN multiplies the input (xt) and the previous output (ht-1) by their weight matrices, adds the results, and passes the sum through a tanh activation function. No gates are present.
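
The recurrence can be sketched for a single unit with scalar weights. The weight values below are hypothetical and serve only to make the example run; a real layer uses weight matrices learned during training.

```python
import math


def simple_rnn_step(x_t, h_prev, w, u, b):
    """h_t = tanh(w * x_t + u * h_prev + b) for a single unit."""
    return math.tanh(w * x_t + u * h_prev + b)


# hypothetical weights
w, u, b = 0.5, 0.5, 0.0

h = 0.0
for x in (0.42, 0.38):   # two time steps, like one normalized (A, B) pair
    h = simple_rnn_step(x, h, w, u, b)
print(h)
```

Since there is no gate, the previous state always enters the next step through the same fixed weight `u`.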

In [30]:
from keras.layers import SimpleRNN

# create simpleRNN
model_sRNN = Sequential()

# remember input shape looks like: [[0.001, 0.014], [0.023, 0.034], ...]
model_sRNN.add(SimpleRNN(6, input_shape=(n_numbers, 1), return_sequences=True))

model_sRNN.add(SimpleRNN(2))
model_sRNN.add(Dense(1))
model_sRNN.compile(loss='mean_squared_error', optimizer='adam')

In [31]:
# train simpleRNN
X = X.reshape(n_examples, n_numbers, 1)
for _ in range(epochs):
	model_sRNN.fit(X, y, epochs=1, batch_size=n_batch, verbose=1)

Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1


In [32]:
# evaluate on validation data
val_X = val_X.reshape(val_numbers, n_numbers, 1)
val_result = model_sRNN.predict(val_X, batch_size=n_batch, verbose=1)



In [33]:
# calculate validation error (loss value)
expected = [invert(x, n_numbers, val_largest_X) for x in val_y]
predicted = [invert(x, n_numbers, val_largest_X) for x in val_result[:,0]]
rmse = sqrt(mean_squared_error(expected, predicted))

print('RMSE of SimpleRNN: %f' % rmse)

RMSE of SimpleRNN: 28.950295


In [34]:
# show some examples
for i in range(10):
	error = expected[i] - predicted[i]
	print('Expected = %d, Predicted = %d (err = %d)' % (expected[i], predicted[i], error))

Expected = 1363, Predicted = 1383 (err = -20)
Expected = 1962, Predicted = 1964 (err = -2)
Expected = 6813, Predicted = 6750 (err = 63)
Expected = 4184, Predicted = 4182 (err = 2)
Expected = 2029, Predicted = 2007 (err = 22)
Expected = 1354, Predicted = 1363 (err = -9)
Expected = 5371, Predicted = 5395 (err = -24)
Expected = 5614, Predicted = 5639 (err = -25)
Expected = 5568, Predicted = 5566 (err = 2)
Expected = 3746, Predicted = 3736 (err = 10)


## Step 3: Choose the best answer and save the output answer
Here I choose the GRU model to predict test.txt because it has the lowest validation error (RMSE) of the three models.

In [42]:
# load test.txt
with open("dataset/test.txt") as f:
    # list for the test input pairs (A, B); test.txt has no ground truth
    train_value = []

    # read the test data line by line
    for line in f:
        # separate the two number sequences at the comma
        sep_string = line.split(',')

        # slice each sequence to keep only the digits we actually need:
        # the first sequence keeps its last three digits
        sep_string[0] = sep_string[0][-3:]

        # the second sequence keeps its last three digits, without the trailing '\n'
        sep_string[1] = sep_string[1][-4:-1]

        # store the inputs
        train_value.append(sep_string)

In [43]:
print(train_value)

['323', '334'], ['351', '368'], ['360', '357'], ['332', '328'], ['344', '341'], ['354', '357'], ['340', '339'], ['330', '352'], ['352', '329'], ['329', '358'], ['335', '339'], ['341', '350'], ['367', '362'], ['323', '335'], ['352', '350'], ['363', '322'], ['338', '358'], ['336', '351'], ['339', '366'], ['339', '328'], ['331', '329'], ['338', '341'], ['357', '350'], ['328', '323'], ['330', '344'], ['355', '327'], ['365', '335'], ['343', '360'], ['359', '352'], ['354', '361'], ['359', '368'], ['343', '325'], ['329', '368'], ['352', '343'], ['337', '351'], ['366', '346'], ['347', '357'], ['334', '347'], ['326', '338'], ['344', '321'], ['321', '332'], ['329', '349'], ['324', '362'], ['322', '360'], ['359', '334'], ['339', '360'], ['326', '339'], ['335', '348'], ['343', '370'], ['339', '365'], ['341', '363'], ['324', '363'], ['362', '351'], ['363', '362'], ['366', '343'], ['340', '345'], ['336', '367'], ['368', '340'], ['357', '357'], ['344', '324'], ['321', '325'], ['364', '337'], ['324', 

In [44]:
# the values we loaded are strings,
# so convert them to integers
# train_value looks like [['1', '2'], ['3', '4'], ...]:
# a list of lists, each holding two strings,
# so map int over every inner list
train_value = [list(map(int, pair)) for pair in train_value]

# check that the elements are now integers
print(train_value[0], type(train_value[0][0]))

[370, 339] <class 'int'>


In [45]:
# format as NumPy arrays
X = array(train_value)

# get the maximum value from the test data
largest_X = max(max(X, key=max))

# since we have A and B two elements, the n_numbers will be 2
# if there are A, B, C and want to predict D, the n_numbers will be 3
n_numbers = 2

# normalize
X = X.astype('float') / float(largest_X * n_numbers)
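
The cell above recomputes `largest_X` from the test data, so the test inputs may end up on a different scale from the one used during training. Whether this matters depends on the data, but it may be worth checking. A minimal sketch, with hypothetical numbers and variable names (`train_largest`, `test_largest`), of how the same raw value maps to different model inputs under two scales:

```python
def normalize(value, n_numbers, largest):
    """Scale a raw value by largest * n_numbers."""
    return value / float(largest * n_numbers)


n_numbers = 2
train_largest = 820   # hypothetical: scale seen during training
test_largest = 370    # hypothetical: maximum found in the test inputs

raw = 339
print(normalize(raw, n_numbers, train_largest))  # input on the training scale
print(normalize(raw, n_numbers, test_largest))   # same raw value on the test scale
```

Keeping the training maximum in its own variable and reusing it for both normalization and `invert` would be one way to keep the two stages consistent.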

In [48]:
X = X.reshape(len(train_value), n_numbers, 1)

# use the GRU model which has the best result
test_result = model_GRU.predict(X, batch_size=n_batch, verbose=1)



In [50]:
predicted = [invert(x, n_numbers, largest_X) for x in test_result[:,0]]

In [54]:
import pprint as pp

pp.pprint(predicted)

[3099,
 2982,
 3023,
 2787,
 2816,
 2847,
 3120,
 2924,
 3083,
 2975,
 3284,
 3231,
 3220,
 2766,
 2954,
 3038,
 3144,
 2902,
 2879,
 3207,
 2846,
 3288,
 3187,
 2965,
 3083,
 3091,
 2962,
 3028,
 2930,
 3120,
 2998,
 3078,
 2824,
 2921,
 2714,
 2966,
 3110,
 2965,
 2880,
 3179,
 3169,
 3192,
 2950,
 2870,
 3020,
 3053,
 2846,
 3017,
 2957,
 2884,
 3112,
 3267,
 2981,
 3234,
 3231,
 2844,
 3069,
 2925,
 3044,
 2937,
 3115,
 2948,
 3100,
 2975,
 3218,
 2829,
 3001,
 2976,
 3156,
 2864,
 2878,
 3190,
 3073,
 2847,
 2827,
 2956,
 3033,
 3197,
 3194,
 3095,
 3107,
 3187,
 3088,
 3093,
 3024,
 2794,
 2879,
 3193,
 2772,
 2986,
 2871,
 2845,
 3272,
 2759,
 2865,
 3255,
 3157,
 3211,
 3122,
 2953,
 3057,
 3244,
 2797,
 2843,
 3144,
 3006,
 3107,
 3026,
 2798,
 2851,
 3107,
 3148,
 3012,
 2744,
 3176,
 3115,
 3274,
 3108,
 3058,
 2777,
 2811,
 2953,
 3197,
 2784,
 3227,
 3043,
 2995,
 3051,
 3296,
 2698,
 2939,
 3009,
 2774,
 3118,
 2953,
 3116,
 3027,
 3062,
 3148,
 3255,
 2892,
 3136,
 3065,

In [55]:
# now convert the predicted values from integers to strings
predicted = list(map(str, predicted))

print(predicted[0], type(predicted[0]))

3099 <class 'str'>


In [58]:
# don't forget to put the prefix in front of each predicted value
prefix = "283950461728395046172839505982716"

# write the predictions to C.txt
with open("dataset/C.txt", "w") as f:
    for line in predicted:
        f.write(prefix + line + '\n')
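
A quick way to confirm the output format is to read the file back. The sketch below writes to a temporary directory instead of `dataset/C.txt` (an assumption made so the real output is left untouched) and uses only the first two predictions shown above.

```python
import os
import tempfile

prefix = "283950461728395046172839505982716"
predicted = ["3099", "2982"]   # first two predictions shown above

# write to a temporary file so the real output file is left untouched
path = os.path.join(tempfile.mkdtemp(), "C.txt")
with open(path, "w") as f:
    for value in predicted:
        f.write(prefix + value + "\n")

# read the file back and check the format of each line
with open(path) as f:
    lines = f.read().splitlines()

print(len(lines) == len(predicted))
print(all(line.startswith(prefix) for line in lines))
```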