New task for adding scalar values (0 or 1) #4

Open · wants to merge 22 commits into base: master

Conversation

@Zeta36 commented Jan 7, 2017

Common Settings

The model is trained with a 2-layer feedforward controller (hidden sizes 128 and 256, respectively) and the following set of hyperparameters:

  • RMSProp Optimizer with learning rate of 10⁻⁴, momentum of 0.9.
  • Memory word size of 10, with a single read head.
  • A batch size of 1.
  • input_size = 3.
  • output_size = 1.
  • sequence_max_length = 100.
  • words_count = 15.
  • word_size = 10.
  • read_heads = 1.

A squared loss function of the form (y - y_)**2 is used, where both 'y' and 'y_' are scalars.

The input is a (1, random_length, 3) tensor, where the last dimension holds a one-hot encoding vector of size 3, where:

010 is a '0'
100 is a '1'
001 is the end mark

So, an example of an input of length 10 would be the following 3D tensor:

[[[ 0. 1. 0.]
[ 0. 1. 0.]
[ 0. 1. 0.]
[ 1. 0. 0.]
[ 0. 1. 0.]
[ 0. 1. 0.]
[ 1. 0. 0.]
[ 0. 1. 0.]
[ 0. 1. 0.]
[ 0. 0. 1.]]]

This input is a representation of a sequence of 0/1 additions of the form:

0 + 0 + 1 + 0 + 0 + 1 + 0 + 0 + (end_mark)

The target output is a 3D tensor with the result of this adding task. In the example above:

[[[2.0]]]
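
For reference, a sample input/target pair can be generated roughly like this (a sketch; the helper name is made up and the actual train.py may differ in details):

    import numpy as np

    def generate_adding_sample(random_length):
        # Encoding described above:
        #   [0, 1, 0] -> the digit 0
        #   [1, 0, 0] -> the digit 1
        #   [0, 0, 1] -> the end mark (last step only)
        digits = np.random.randint(0, 2, size=random_length - 1)
        input_data = np.zeros((1, random_length, 3), dtype=np.float32)
        input_data[0, :-1, :] = np.eye(3)[1 - digits]   # 1 -> [1,0,0], 0 -> [0,1,0]
        input_data[0, -1, 2] = 1.0                       # end mark
        target_output = np.array([[[float(digits.sum())]]], dtype=np.float32)
        return input_data, target_output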

The DNC output is a 3D-tensor of shape (1, random_length, 1). For example:

[[[ 0.45]
[ -0.11]
[ 1.3]
[ 5.0]
[ 0.5]
[ 0.1]
[ 1.0]
[ -0.5]
[ 0.33]
[ 0.12]]]

The target output and the DNC output are both then reduced with tf.reduce_sum() so we end up with two scalar values. For example:

Target_output: 2.0
DNC_output: 5.89

We then apply the squared loss function:

loss = (Target_o - DNC_o)**2

and finally the gradient update.
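
Putting these pieces together, the loss and update can be sketched like this (TensorFlow 1.x style, with hypothetical tensor names; the actual script may differ):

    import tensorflow as tf

    def adding_loss_and_update(output, target_output):
        # output: (1, random_length, 1) DNC output tensor
        # target_output: (1, 1, 1) target tensor, e.g. [[[2.0]]]
        dnc_sum = tf.reduce_sum(output)            # scalar, e.g. 5.89
        target_sum = tf.reduce_sum(target_output)  # scalar, e.g. 2.0
        loss = tf.square(target_sum - dnc_sum)     # (y - y_)**2
        optimizer = tf.train.RMSPropOptimizer(learning_rate=1e-4, momentum=0.9)
        gradient_update = optimizer.minimize(loss)
        return loss, gradient_update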

Results

The model receives as input a random-length sequence of 0 or 1 values, like:

Input: 1 + 0 + 1 + 1 + 0 + 0 + 0 + 0 + 1

Then it returns a scalar value for this input adding process. For example, the DNC will output something like 3.98824.
This value is the predicted result for the input adding sequence (we truncate it to its integer part):

DNC prediction: 1 + 0 + 1 + 1 + 0 + 0 + 0 + 0 + 1 = 3 [3.98824]

Once we train the model with:

$python tasks/copy/train.py --iterations=50000

we can see that the model learns to compute this adding function in less than 1000 iterations, and the loss drops from:

Iteration 0/1000
Avg. Logistic Loss: 24.9968

to:

Iteration 1000/1000
Avg. Logistic Loss: 0.0076

It seems like the DNC model is able to learn this pseudo-code:

function(x):
    if x == [ 1. 0. 0.]:
        return (near) 1.0 (float value)
    else:
        return (near) 0.0 (float value)
Generalization test

We use sequence_max_length = 100 for the model, but in the training process we use only random-length sequences of up to 10 steps (sequence_max_length/10). Once training is finished, we let the trained model generalize to random-length sequences of up to 100 steps (sequence_max_length).

The results show that the model successfully generalizes the adding task even to sequences 10 times longer than the training ones.

These are real data outputs:

Building Computational Graph ... Done!
Initializing Variables ... Done!

Iteration 0/1000
Avg. Logistic Loss: 24.9968
Real value: 0 + 1 + 0 + 0 + 0 + 1 + 0 + 1 + 1 + 1 = 5
Predicted: 0 + 1 + 0 + 0 + 0 + 1 + 0 + 1 + 1 + 1 = 0 [0.000319847]

Iteration 100/1000
Avg. Logistic Loss: 5.8042
Real value: 0 + 1 + 0 + 0 + 1 + 0 + 1 + 0 + 1 + 1 = 5
Predicted: 0 + 1 + 0 + 0 + 1 + 0 + 1 + 0 + 1 + 1 = 6 [6.1732]

Iteration 200/1000
Avg. Logistic Loss: 0.7492
Real value: 1 + 1 + 1 + 1 + 1 + 1 + 1 + 0 + 1 + 1 = 9
Predicted: 1 + 1 + 1 + 1 + 1 + 1 + 1 + 0 + 1 + 1 = 8 [8.91952]

Iteration 300/1000
Avg. Logistic Loss: 0.0253
Real value: 0 + 1 + 1 = 2
Predicted: 0 + 1 + 1 = 2 [2.0231]

Iteration 400/1000
Avg. Logistic Loss: 0.0089
Real value: 0 + 1 + 0 + 0 + 0 + 1 + 1 = 3
Predicted: 0 + 1 + 0 + 0 + 0 + 1 + 1 = 2 [2.83419]

Iteration 500/1000
Avg. Logistic Loss: 0.0444
Real value: 1 + 0 + 1 + 1 = 3
Predicted: 1 + 0 + 1 + 1 = 2 [2.95937]

Iteration 600/1000
Avg. Logistic Loss: 0.0093
Real value: 1 + 0 + 1 + 1 + 0 + 0 + 0 + 0 + 1 = 4
Predicted: 1 + 0 + 1 + 1 + 0 + 0 + 0 + 0 + 1 = 3 [3.98824]

Iteration 700/1000
Avg. Logistic Loss: 0.0224
Real value: 0 + 1 + 1 + 0 + 1 + 1 + 1 + 1 + 0 + 0 = 6
Predicted: 0 + 1 + 1 + 0 + 1 + 1 + 1 + 1 + 0 + 0 = 5 [5.93554]

Iteration 800/1000
Avg. Logistic Loss: 0.0115
Real value: 0 + 0 = 0
Predicted: 0 + 0 = -1 [-0.0118587]

Iteration 900/1000
Avg. Logistic Loss: 0.0023
Real value: 1 + 1 + 0 + 0 + 1 + 1 + 1 + 0 + 0 = 5
Predicted: 1 + 1 + 0 + 0 + 1 + 1 + 1 + 0 + 0 = 4 [4.97147]

Iteration 1000/1000
Avg. Logistic Loss: 0.0076
Real value: 1 + 0 + 0 + 1 + 1 + 0 + 0 + 1 = 4
Predicted: 1 + 0 + 0 + 1 + 1 + 0 + 0 + 1 = 4 [4.123]

Saving Checkpoint ... Done!

Testing generalization...

Iteration 0/1000
Real value: 1 + 1 + 0 + 0 + 0 + 1 + 0 + 0 + 1 + 1 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 1 + 0 + 0 = 6
Predicted: 1 + 1 + 0 + 0 + 0 + 1 + 0 + 0 + 1 + 1 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 1 + 0 + 0 = 6 [6.24339]

Iteration 1/1000
Real value: 1 + 0 + 1 + 0 + 0 + 1 + 1 + 0 + 1 + 1 + 1 + 0 + 1 + 0 + 0 + 0 + 0 + 1 + 0 + 0 + 1 + 1 = 11
Predicted: 1 + 0 + 1 + 0 + 0 + 1 + 1 + 0 + 1 + 1 + 1 + 0 + 1 + 0 + 0 + 0 + 0 + 1 + 0 + 0 + 1 + 1 = 11 [11.1931]

Iteration 2/1000
Real value: 0 + 0 + 0 + 1 + 0 + 0 + 0 + 1 + 1 + 1 + 0 + 1 + 1 + 0 + 1 + 0 + 1 + 0 + 0 + 1 + 0 + 1 + 1 + 0 + 0 + 0 + 1 + 0 + 1 + 0 + 0 + 1 + 0 + 1 + 1 + 0 + 1 + 0 + 1 + 1 + 0 + 1 + 1 + 1 + 0 + 1 + 0 + 0 + 1 + 1 + 0 + 0 + 1 + 0 + 0 + 0 + 1 + 0 + 1 + 0 + 0 + 0 + 1 + 0 + 0 + 0 + 1 + 1 + 1 + 1 = 33
Predicted: 0 + 0 + 0 + 1 + 0 + 0 + 0 + 1 + 1 + 1 + 0 + 1 + 1 + 0 + 1 + 0 + 1 + 0 + 0 + 1 + 0 + 1 + 1 + 0 + 0 + 0 + 1 + 0 + 1 + 0 + 0 + 1 + 0 + 1 + 1 + 0 + 1 + 0 + 1 + 1 + 0 + 1 + 1 + 1 + 0 + 1 + 0 + 0 + 1 + 1 + 0 + 0 + 1 + 0 + 0 + 0 + 1 + 0 + 1 + 0 + 0 + 0 + 1 + 0 + 0 + 0 + 1 + 1 + 1 + 1 = 32 [32.9866]

Iteration 3/1000
Real value: 1 + 0 + 1 + 1 + 0 + 1 + 0 + 0 + 1 + 1 + 0 + 0 + 0 + 1 + 0 + 0 + 1 + 1 + 0 + 1 + 1 + 1 + 0 + 1 + 0 + 0 + 1 + 0 + 0 + 0 + 1 + 1 = 16
Predicted: 1 + 0 + 1 + 1 + 0 + 1 + 0 + 0 + 1 + 1 + 0 + 0 + 0 + 1 + 0 + 0 + 1 + 1 + 0 + 1 + 1 + 1 + 0 + 1 + 0 + 0 + 1 + 0 + 0 + 0 + 1 + 1 = 16 [16.1541]

Iteration 4/1000
Real value: 1 + 0 + 0 + 1 + 1 + 1 + 0 + 0 + 1 + 0 + 0 + 0 + 0 + 1 + 1 + 1 + 0 + 1 + 1 + 1 + 1 + 0 + 0 + 0 + 0 + 0 + 1 + 0 + 1 + 0 + 1 + 1 + 0 + 0 + 0 + 1 + 0 + 1 + 1 + 1 + 1 + 0 + 0 + 1 + 0 + 1 + 1 + 1 + 1 + 0 + 1 + 0 + 0 + 1 + 0 + 1 + 1 + 0 + 1 + 0 + 0 + 0 + 0 + 1 + 1 + 1 + 1 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 1 + 1 + 0 + 1 + 0 + 1 + 0 + 1 + 1 + 1 + 1 + 0 + 1 + 0 + 0 = 44
Predicted: 1 + 0 + 0 + 1 + 1 + 1 + 0 + 0 + 1 + 0 + 0 + 0 + 0 + 1 + 1 + 1 + 0 + 1 + 1 + 1 + 1 + 0 + 0 + 0 + 0 + 0 + 1 + 0 + 1 + 0 + 1 + 1 + 0 + 0 + 0 + 1 + 0 + 1 + 1 + 1 + 1 + 0 + 0 + 1 + 0 + 1 + 1 + 1 + 1 + 0 + 1 + 0 + 0 + 1 + 0 + 1 + 1 + 0 + 1 + 0 + 0 + 0 + 0 + 1 + 1 + 1 + 1 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 1 + 1 + 0 + 1 + 0 + 1 + 0 + 1 + 1 + 1 + 1 + 0 + 1 + 0 + 0 = 43 [43.5211]

@Mostafa-Samir (Owner) commented Jan 14, 2017

Impressive work!
I'm certainly curious about how it was able to generalize with the same amount of memory locations!

What do you think about taking it up a notch?
Let's remove that reduce_sum and see if it can learn to add on its own. Here's how I think it could go: your input sequence would look something like this:
1 + 0 + 0 + 1 + 1 + 1 + 0 + 1 + 0 = - , and your target output would be the scalar 5. Instead of attempting to copy the sequence via adding, we make the task such that at the step containing '-' the model should output the value of the summation! Your loss would be the squared difference between the output at that step and your target output; the loss at all previous steps is omitted (you can find the technique of omitting the loss on specific steps in the recently pushed bAbI task).
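
Roughly, the kind of weighting I mean looks like this (just a sketch with hypothetical tensor names, not the exact bAbI-task code):

    import tensorflow as tf

    def masked_step_loss(output, target_output, loss_weights):
        # output, target_output, loss_weights: (1, seq_length, 1) tensors.
        # loss_weights is 1.0 at the step holding '-' and 0.0 everywhere else,
        # so the loss at all previous steps is omitted.
        return tf.reduce_mean(loss_weights * tf.square(output - target_output))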

I've just pushed new updates to the code that include optimizations in both memory and execution time, so you'll be able to train for more iterations and do it more quickly!

I'm looking forward to seeing your results with this!

@Zeta36 (Author) commented Jan 15, 2017

Hello, @Mostafa-Samir.

You can get the code of the adding task without the tf.reduce_sum() here: https://github.com/Zeta36/DNC-tensorflow/blob/master/tasks/adding/train_v2.py.

But I'm afraid that removing the tf.reduce_sum() makes the model unable to generalize successfully with a fixed memory size as before. In this new version of the code, the model is still able to learn to solve any sequence of 0 and 1 sums, but it fails when we apply the trained model to sequences longer than those used in the training process.

I think that's because the original version I submitted here makes use of tf.reduce_sum() as a kind of accumulator. I think the model learns an algorithm like this:

function(X):
    for each x in X:
        if x == [ 1. 0. 0.]:
            output (near) 1.0 (float value)
        else:
            output (near) 0.0 (float value)

Later, the tf.reduce_sum() performs the correct sum over the whole output sequence. The output is nearly 1 for each [ 1. 0. 0.] input vector and nearly 0 otherwise, so the tf.reduce_sum() gives the correct answer no matter how long the input is. And because this little "if/else" f(x) algorithm is so easy to learn, the model is able to generalize to arbitrarily long input sequences X with a fixed memory size.

As soon as we remove the tf.reduce_sum(), as in the version I made following your instructions, this trick no longer works and the model has to learn another algorithm that is more complex and less generalizable than the f(x) above.
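
Here is a toy illustration of what I mean, independent of the DNC itself (plain NumPy; the per-step function stands in for what the network seems to learn):

    import numpy as np

    def per_step_output(x):
        # What the DNC seems to learn: near 1.0 for [1, 0, 0], near 0.0 otherwise.
        return 1.0 if np.array_equal(x, [1.0, 0.0, 0.0]) else 0.0

    def predicted_sum(sequence):
        # sequence: (length, 3) one-hot input. The external reduce_sum plays the
        # accumulator, so the same per-step function works for any sequence length.
        return np.sum([per_step_output(x) for x in sequence])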

What do you think, @Mostafa-Samir?

Regards,
Samu.

@Zeta36 (Author) commented Jan 15, 2017

Here is a little excerpt from a real training run of the new version (https://github.com/Zeta36/DNC-tensorflow/blob/master/tasks/adding/train_v2.py):
Iteration 800/1001

Avg. Cross-Entropy: 0.0231753
Avg. 100 iterations time: 0.03 minutes
Approx. time to completion: 0.00 hours
DNC input
[[[ 0. 1. 0.]
[ 1. 0. 0.]
[ 0. 1. 0.]
[ 1. 0. 0.]
[ 0. 1. 0.]
[ 0. 1. 0.]
[ 0. 0. 1.]]]
Text input: 1 + 0 + 1 + 0 + 1 + 1 = -
Target_output
[[[ 0.]
[ 0.]
[ 0.]
[ 0.]
[ 0.]
[ 0.]
[ 4.]]]
DNC output
[[[ 0. ]
[ 0. ]
[ 0. ]
[ 0. ]
[ 0. ]
[ 0. ]
[ 3.52538943]]]
Real operation: 1 + 0 + 1 + 0 + 1 + 1 = 4
Predicted result: 1 + 0 + 1 + 0 + 1 + 1 = 4 [3.52539]
...
...
Iteration 1000/1001
Avg. Cross-Entropy: 0.0046492
Avg. 100 iterations time: 0.03 minutes
Approx. time to completion: 0.00 hours
DNC input
[[[ 0. 1. 0.]
[ 1. 0. 0.]
[ 1. 0. 0.]
[ 1. 0. 0.]
[ 1. 0. 0.]
[ 1. 0. 0.]
[ 1. 0. 0.]
[ 0. 0. 1.]]]
Text input: 1 + 0 + 0 + 0 + 0 + 0 + 0 = -
Target_output
[[[ 0.]
[ 0.]
[ 0.]
[ 0.]
[ 0.]
[ 0.]
[ 0.]
[ 1.]]]
DNC output
[[[ 0. ]
[ 0. ]
[ 0. ]
[ 0. ]
[ 0. ]
[ 0. ]
[ 0. ]
[ 0.86268544]]]
Real operation: 1 + 0 + 0 + 0 + 0 + 0 + 0 = 1
Predicted result: 1 + 0 + 0 + 0 + 0 + 0 + 0 = 1 [0.862685]

Iteration 1001/1001
Saving Checkpoint ... Done!

Testing generalization...

Iteration 0/1000
Real operation: 1 + 0 + 1 + 0 + 0 + 1 + 0 + 0 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 0 + 1 + 0 + 0 + 0 + 0 + 1 + 1 + 0 + 0 + 1 + 1 + 0 + 0 + 1 + 0 + 1 + 1 + 1 + 1 + 0 + 0 + 1 + 1 + 0 + 1 + 0 + 1 + 1 + 0 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 0 + 0 + 1 + 0 + 1 + 1 + 1 + 1 + 1 + 0 + 1 + 1 + 0 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 0 + 1 + 0 + 1 + 1 + 1 + 0 + 0 + 0 = 56
Predicted result: 1 + 0 + 1 + 0 + 0 + 1 + 0 + 0 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 0 + 1 + 0 + 0 + 0 + 0 + 1 + 1 + 0 + 0 + 1 + 1 + 0 + 0 + 1 + 0 + 1 + 1 + 1 + 1 + 0 + 0 + 1 + 1 + 0 + 1 + 0 + 1 + 1 + 0 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 0 + 0 + 1 + 0 + 1 + 1 + 1 + 1 + 1 + 0 + 1 + 1 + 0 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 0 + 1 + 0 + 1 + 1 + 1 + 0 + 0 + 0 = 9316 [[ 9316.20117188]]

Iteration 1/1000
Real operation: 1 + 0 = 1
Predicted result: 1 + 0 = 1 [[ 0.853342]]

Iteration 2/1000
Real operation: 1 + 0 + 1 + 1 + 1 + 1 + 0 + 0 + 1 + 0 + 1 + 0 + 0 + 0 + 0 + 0 + 0 + 1 + 1 + 0 + 0 + 0 + 0 + 1 + 1 + 1 + 1 + 0 + 1 + 1 + 0 + 0 + 0 + 0 + 1 + 0 + 1 = 17
Predicted result: 1 + 0 + 1 + 1 + 1 + 1 + 0 + 0 + 1 + 0 + 1 + 0 + 0 + 0 + 0 + 0 + 0 + 1 + 1 + 0 + 0 + 0 + 0 + 1 + 1 + 1 + 1 + 0 + 1 + 1 + 0 + 0 + 0 + 0 + 1 + 0 + 1 = 74 [[ 73.88546753]]

@Zeta36 (Author) commented Jan 15, 2017

@Mostafa-Samir, thanks to the great improvements in the core of your DNC implementation, I've developed another task for testing the project. I've built a model that is able to successfully learn an argmax function over its input.

The model is fed a sequence of one-hot encoded integer values, and the target output is the index of the maximum value in that sequence. I'm glad to tell you that your DNC is able to learn this function using just a feedforward controller, and even better, it is able to generalize to longer sequences than those used in the training process!

You can see my code here: https://github.com/Zeta36/DNC-tensorflow/blob/master/tasks/argmax/train_v2.py.
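
For reference, the data for this task can be generated roughly like this (a sketch; the actual train_v2.py may differ in details such as the value range or the end mark):

    import numpy as np

    def generate_argmax_sample(length, num_symbols=10):
        # One-hot encode `length` random integers, append an end mark, and put
        # the index of the (first) maximum value at the last step of the target.
        values = np.random.randint(0, num_symbols - 1, size=length)
        input_data = np.zeros((1, length + 1, num_symbols), dtype=np.float32)
        input_data[0, np.arange(length), values] = 1.0
        input_data[0, -1, -1] = 1.0                          # end mark
        target_output = np.zeros((1, length + 1, 1), dtype=np.float32)
        target_output[0, -1, 0] = float(np.argmax(values))   # answer at the last step
        return input_data, target_output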

And here you can see some results:
...
...
Iteration 9900/10001
Avg. Cross-Entropy: 0.1064857
Avg. 100 iterations time: 0.16 minutes
Approx. time to completion: 0.00 hours
DNC input [[[ 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
[ 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
[ 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
[ 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
[ 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]]
Target_output [[[ 0.]
[ 0.]
[ 0.]
[ 0.]
[ 0.]
[ 0.]
[ 0.]
[ 0.]
[ 0.]
[ 0.]
[ 0.]
[ 0.]
[ 1.]]]
DNC output [[[ 0. ]
[-0. ]
[ 0. ]
[ 0. ]
[ 0. ]
[ 0. ]
[ 0. ]
[ 0. ]
[ 0. ]
[ 0. ]
[ 0. ]
[ 0. ]
[ 1.44688594]]]
Real argmax(X): 1
Predicted f(X): 1

Iteration 10000/10001
Avg. Cross-Entropy: 0.0603415
Avg. 100 iterations time: 0.16 minutes
Approx. time to completion: 0.00 hours
DNC input [[[ 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
[ 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
[ 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
[ 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]]
Target_output [[[ 0.]
[ 0.]
[ 0.]
[ 0.]
[ 0.]
[ 0.]
[ 0.]
[ 0.]
[ 0.]
[ 0.]
[ 5.]]]
DNC output [[[ 0. ]
[ 0. ]
[ 0. ]
[ 0. ]
[ 0. ]
[ 0. ]
[ 0. ]
[ 0. ]
[ 0. ]
[ 0. ]
[ 4.93556786]]]
Real argmax(X): 5
Predicted f(X): 5

Saving Checkpoint ... Done!

Testing generalization...

Iteration 0/10000
Real argmax(X): 3
Predicted f(X): 3

Iteration 1/10000
Real argmax(X): 2
Predicted f(X): 2

Iteration 2/10000
Real argmax(X): 4
Predicted f(X): 3

Iteration 3/10000
Real argmax(X): 0
Predicted f(X): 0

Iteration 4/10000
Real argmax(X): 1
Predicted f(X): 1

Iteration 5/10000
Real argmax(X): 3
Predicted f(X): 3

Iteration 6/10000
Real argmax(X): 1
Predicted f(X): 2

Iteration 7/10000
Real argmax(X): 3
Predicted f(X): 2

Iteration 8/10000
Real argmax(X): 6
Predicted f(X): 6

Iteration 9/10000
Real argmax(X): 5
Predicted f(X): 4

Iteration 10/10000
Real argmax(X): 2
Predicted f(X): 2

Iteration 11/10000
Real argmax(X): 5
Predicted f(X): 4

Iteration 12/10000
Real argmax(X): 2
Predicted f(X): 2

Iteration 13/10000
Real argmax(X): 0
Predicted f(X): 2

Iteration 14/10000
Real argmax(X): 2
Predicted f(X): 5

Iteration 15/10000
Real argmax(X): 0
Predicted f(X): 0

Iteration 16/10000
Real argmax(X): 1
Predicted f(X): 1

Iteration 17/10000
Real argmax(X): 2
Predicted f(X): 2

Iteration 18/10000
Real argmax(X): 2
Predicted f(X): 2

Iteration 19/10000
Real argmax(X): 2
Predicted f(X): 2

Iteration 20/10000
Real argmax(X): 1
Predicted f(X): 1

Iteration 21/10000
Real argmax(X): 4
Predicted f(X): 4

Iteration 22/10000
Real argmax(X): 10
Predicted f(X): 10

Iteration 23/10000
Real argmax(X): 6
Predicted f(X): 5

Iteration 24/10000
Real argmax(X): 1
Predicted f(X): 2

Iteration 25/10000
Real argmax(X): 4
Predicted f(X): 3

Iteration 26/10000
Real argmax(X): 1
Predicted f(X): 3

Iteration 27/10000
Real argmax(X): 2
Predicted f(X): 2

Iteration 28/10000
Real argmax(X): 0
Predicted f(X): 0

Iteration 29/10000
Real argmax(X): 3
Predicted f(X): 3

Iteration 30/10000
Real argmax(X): 0
Predicted f(X): 0

Iteration 31/10000
Real argmax(X): 0
Predicted f(X): 0

Iteration 32/10000
Real argmax(X): 0
Predicted f(X): 0

Iteration 33/10000
Real argmax(X): 0
Predicted f(X): 0

Iteration 34/10000
Real argmax(X): 6
Predicted f(X): 6

Iteration 35/10000
Real argmax(X): 0
Predicted f(X): 0

Iteration 36/10000
Real argmax(X): 5
Predicted f(X): 4

Iteration 37/10000
Real argmax(X): 0
Predicted f(X): 0

Iteration 38/10000
Real argmax(X): 2
Predicted f(X): 2

Iteration 39/10000
Real argmax(X): 3
Predicted f(X): 3

Iteration 40/10000
Real argmax(X): 4
Predicted f(X): 4

Iteration 41/10000
Real argmax(X): 6
Predicted f(X): 6

Iteration 42/10000
Real argmax(X): 15
Predicted f(X): 14

Iteration 43/10000
Real argmax(X): 1
Predicted f(X): 1

Iteration 44/10000
Real argmax(X): 2
Predicted f(X): 2

Iteration 45/10000
Real argmax(X): 11
Predicted f(X): 10

Iteration 46/10000
Real argmax(X): 3
Predicted f(X): 3

Iteration 47/10000
Real argmax(X): 1
Predicted f(X): 1

Iteration 48/10000
Real argmax(X): 1
Predicted f(X): 1

Iteration 49/10000
Real argmax(X): 13
Predicted f(X): 13
...
...

I don't know how the model is able to figure out where the highest value appeared in the sequence of one-hot encoded input values, but it does, and it is even able to generalize this learned method to sequences twice the length used in the training process without using more memory. DeepMind has found something big with this DNC, and they are improving it with a sparse version that uses fewer resources: https://arxiv.org/pdf/1610.09027v1.pdf

Regards,
Samu.

@Mostafa-Samir (Owner)

Great work Samu @Zeta36 !

Regarding the adding task
I have a comment about how you apply the weights to the loss. You use the following:

loss = tf.reduce_mean(tf.square((loss_weights * output) - ncomputer.target_output))

while you should be using:

loss = tf.reduce_mean(loss_weights * tf.square(output - ncomputer.target_output))

Remember, you're weighting the contribution of the loss at each step, not the significance of each step on its own. Mathematically it's written as

loss = mean( loss_weights * (output - target)**2 )

not

loss = mean( (loss_weights * output - target)**2 )

I don't really know how you generate the output vector, but your original formulation can easily overestimate your loss value.

Try to adopt this change and see if it has any effect on the model. You should also test the generalization of the adding task by using the same trained model with a larger memory matrix (more locations), just as in the visualization notebook of the copy task. It'd also be a good idea to separate the generalization tests into different scripts from the training one, and to use a single descriptive statistic (like the percentage of correct answers, or the percentage of error, or whatever you decide) to describe your results. Then, instead of dumping the entire log in the README, you can just add one or two examples from the log and describe your results with that statistic!
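
For instance, something as simple as this would do (a sketch; run_generalization_episode is a hypothetical helper that returns the real and predicted sums for one random test sequence):

    def percent_correct(num_episodes, run_generalization_episode):
        # Summarize the generalization results with a single number instead of a full log.
        correct = 0
        for _ in range(num_episodes):
            real_value, predicted_value = run_generalization_episode()
            if int(round(predicted_value)) == real_value:
                correct += 1
        return 100.0 * correct / num_episodes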

I'll then be happy to merge your contributions into the repo!

@cornagli

Hi @Zeta36 and @Mostafa-Samir ,
I am really excited about the results of your tasks and about the DNC's potential.

For this reason, I am trying to implement a further task myself, and I am interested in understanding whether a DNC can solve it. I would really appreciate any feedback from you, thanks.

Task description

The task is to count the total number of repeated numbers in a list.

For example:

Input: [ 1, 2, 3] 
Output: [0]

Input: [ 1, 2, 3, 2, 4, 1, 5]
                  X     X     : Repetitions
Output: [2]

The pseudo code the DNC should learn is:

function(x, seenNumbers):
   if x in seenNumbers:
       return 1
   else:
       return 0

I am wondering if the DNC can manage the seenNumbers list by itself.

Settings

Assuming that the DNC can solve the task (I suppose a simple LSTM net can), I would structure the data as follows:

  • Input: a (1, random_length, 1) tensor (see the sketch below)
  • Output: either a (1, random_length, 1) tensor or a scalar containing the total number of repetitions
  • Loss: depending on the output structure, a squared loss either element by element or between two scalars
  • DNC parameters: it is currently unclear to me how to set the memory parameters (word size, number of words)
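
For concreteness, here is one way I imagine generating a sample under those settings (a sketch with assumed shapes and value range; the scalar-output variant would still need something like the reduce_sum trick or a masked loss on the final step, as discussed above):

    import numpy as np

    def generate_repetition_sample(length, max_value=10):
        # Input: (1, length, 1) tensor of integers; target: total count of repeated numbers.
        values = np.random.randint(0, max_value, size=length)
        seen, repeats = set(), 0
        for v in values:
            if v in seen:
                repeats += 1
            seen.add(v)
        input_data = values.reshape(1, length, 1).astype(np.float32)
        target_output = np.array([[[float(repeats)]]], dtype=np.float32)
        return input_data, target_output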

What do you think? Would it be feasible for the DNC to solve this task?

Thanks,
Alessandro
