# Stacks

Many differentiable stack implementations exist in the literature on Neural Turing Machines and related models, e.g. [Learning to Transduce with Unbounded Memory](http://papers.nips.cc/paper/5648-learning-to-transduce-with-unbounded-memory.pdf) and [Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets](https://papers.nips.cc/paper/5857-inferring-algorithmic-patterns-with-stack-augmented-recurrent-nets.pdf).

Here, however, we restrict ourselves to implementations that follow two rules:
1. The forward pass must be deterministic and lossless.
2. The backward pass must have well-defined gradients.

TensorFlow's automatic differentiation does a good job of keeping track of where the gradients should flow, so the differentiable implementation looks almost identical to the classical one.
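
For comparison, here is a minimal sketch of a classical fixed-capacity stack written in the same stateless style, using plain Python lists. The names `push` and `pop` are illustrative and are not part of this notebook's API. As an assumption made for this sketch, `pop` leaves the popped value in the buffer and only moves the index.

```python
# Plain-Python reference: a fixed-capacity stack in stateless style.
# `push` and `pop` are hypothetical names used only for this illustration.

def push(state, element):
    buffer, index = state
    # Build a new list instead of mutating, so the caller's state is untouched.
    buffer = buffer[:index] + [element] + buffer[index + 1:]
    return (buffer, index + 1)

def pop(state):
    buffer, index = state
    index -= 1
    # The popped value stays in the buffer; only the index moves.
    return (buffer, index), buffer[index]

state = ([0, 0, 0], 0)  # (buffer, index); index is the next free slot
state = push(state, 1)
state = push(state, 2)
state, top = pop(state)
print(state, top)  # ([1, 2, 0], 1) 2
```

Every call takes a state and returns a new one, which is the property that lets the gradient be traced through a sequence of stack operations.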

In [1]:
import tensorflow as tf

## Soft assignment
TensorFlow tensors do not support direct item assignment (`arr[index] = element`), so we build the updated array with a one-hot mask instead. For more information see [bubble-sort.ipynb](bubble-sort.ipynb) or [differentiable-indexed-arrays.ipynb](differentiable-indexed-arrays.ipynb).

In [2]:
@tf.function
def assign_index(arr, index, element):
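    arr_shape = tf.shape(arr)
    
    # One-hot column of shape (n, 1): 1 at `index`, 0 elsewhere
    pos_mask = tf.eye(arr_shape[0])[index]
    pos_mask = tf.transpose(tf.expand_dims(pos_mask, 0))
    neg_mask = 1 - pos_mask
    
    # Repeat `element` on every row so it has the same shape as `arr`
    tiled_element = tf.reshape(tf.tile(element, [arr_shape[0]]), arr_shape)
    
    # Keep the old rows where the mask is 0 and write `element` where it is 1
    arr = arr * neg_mask + tiled_element * pos_mask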
    
    return arr
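
The arithmetic behind `assign_index` can be followed without TensorFlow. The sketch below repeats the same blend with plain Python lists; `masked_assign` is a hypothetical name used only for this illustration, and it assumes a 2-D array with a 1-D `element`.

```python
# Plain-Python sketch of the masked write used by assign_index.
# `masked_assign` is a hypothetical helper, not part of the notebook.

def masked_assign(arr, index, element):
    n = len(arr)
    # One-hot mask: 1.0 on the row to overwrite, 0.0 elsewhere.
    pos_mask = [1.0 if i == index else 0.0 for i in range(n)]
    # Each output row is a blend: the old row where the mask is 0,
    # `element` where the mask is 1.
    return [
        [a * (1.0 - m) + e * m for a, e in zip(row, element)]
        for row, m in zip(arr, pos_mask)
    ]

arr = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
out = masked_assign(arr, 1, [9.0, 9.0])
print(out)  # [[1.0, 1.0], [9.0, 9.0], [3.0, 3.0]]
```

Because the write is only multiplications and additions, the written row has derivative 1 with respect to `element` and derivative 0 with respect to the old row contents, while every other row passes its gradient straight through.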

## Stack push
`stack_push` is a stateless function: it takes the current state and returns a new one. At the time of writing, AutoGraph has undefined behaviour when we try to build a stateful stack, for example with a Python class or closures.

The `state` tuple has two components, `buffer` and `index`. `buffer` is the writable array where the stack elements are stored. `index` points to the next free slot, i.e. one past the top of the stack.

In [3]:
@tf.function
def stack_push(state, element):
    buffer, index = state
    buffer = assign_index(buffer, index, element)
    index += 1
    
    state = (buffer, index)
    return state

buffer = tf.zeros((3,3), dtype=tf.float32)
index = tf.constant(0, dtype=tf.int32)
state = (buffer, index)
elements = tf.Variable([
    [1,1,1],
    [2,2,2],
    [3,3,3]
],dtype=tf.float32)

with tf.GradientTape() as tape:
    state = stack_push(state, elements[0])
    state = stack_push(state, elements[1])
    
print(state[0])
print(state[1])
print(tape.gradient(state[0], elements))

tf.Tensor(
[[1. 1. 1.]
 [2. 2. 2.]
 [0. 0. 0.]], shape=(3, 3), dtype=float32)
tf.Tensor(2, shape=(), dtype=int32)
tf.Tensor(
[[1. 1. 1.]
 [1. 1. 1.]
 [0. 0. 0.]], shape=(3, 3), dtype=float32)


## Stack pop
For buffer lookup we use the naive approach described in [differentiable-indexed-arrays.ipynb](differentiable-indexed-arrays.ipynb). We also decrement the index and return both the new state and the popped element. The popped row is left in the buffer; only the index moves.

In [4]:
@tf.function
def stack_pop(state):
    buffer, index = state
    index -= 1
    element = buffer[index]
    
    state = (buffer, index)
    return state, element

index = tf.constant(3, dtype=tf.int32)
buffer = tf.Variable([
    [1,1,1],
    [2,2,2],
    [3,3,3]
],dtype=tf.float32)
state = (buffer, index)

with tf.GradientTape() as tape:
    ns1, element = stack_pop(state)
    ns2, element = stack_pop(ns1)

print(ns2[0])
print(ns2[1])
print(tape.gradient(element, buffer))

tf.Tensor(
[[1. 1. 1.]
 [2. 2. 2.]
 [3. 3. 3.]], shape=(3, 3), dtype=float32)
tf.Tensor(1, shape=(), dtype=int32)
tf.Tensor(
[[0. 0. 0.]
 [1. 1. 1.]
 [0. 0. 0.]], shape=(3, 3), dtype=float32)


## Toy example: Reversing a list
Using two stacks, we can reverse a list. The algorithm has two steps:
* All elements are pushed onto Stack 1
* Elements are popped from Stack 1 one at a time, and each is pushed onto Stack 2

The buffer of Stack 2 is the solution.

In [5]:
@tf.function
def reverse_list(arr):
    arr_shape = tf.shape(arr)
    arr = tf.unstack(arr)
    
    buffer1 = tf.zeros(arr_shape, dtype=tf.float32)
    index1 = tf.constant(0, dtype=tf.int32)
    state1 = (buffer1, index1)
    
    # Step 1: Push all elements into stack 1
    for element in arr:
        state1 = stack_push(state1, element)
    
    buffer2 = tf.zeros(arr_shape, dtype=tf.float32)
    index2 = tf.constant(0, dtype=tf.int32)
    state2 = (buffer2, index2)
    
    # Step 2: Transfer all elements to stack 2
    for _ in tf.range(arr_shape[0]):
        state1, element = stack_pop(state1)
        state2 = stack_push(state2, element)
    
    # Return buffer of stack 2
    return state2[0]

arr = tf.Variable([
    [1,1,1,1],
    [2,2,2,2],
    [3,3,3,3],
    [4,4,4,4],
], dtype=tf.float32)

with tf.GradientTape() as tape:
    new_arr = reverse_list(arr)

print(new_arr)
print(tape.gradient(new_arr, arr))

tf.Tensor(
[[4. 4. 4. 4.]
 [3. 3. 3. 3.]
 [2. 2. 2. 2.]
 [1. 1. 1. 1.]], shape=(4, 4), dtype=float32)
tf.Tensor(
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]], shape=(4, 4), dtype=float32)
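
The all-ones gradient can be checked by hand. When the target is not a scalar, `tape.gradient` differentiates the sum of its entries, and reversal is a permutation, so every input row appears exactly once in the output. A minimal sketch with plain Python lists (the names `P` and `grad_per_input_row` are illustrative):

```python
# Reversal of n rows written as a permutation matrix P,
# where out[i] = arr[n - 1 - i].
n = 4
P = [[1 if j == n - 1 - i else 0 for j in range(n)] for i in range(n)]

# Summing over the outputs (rows of P) gives the gradient of sum(out)
# with respect to each input row.
grad_per_input_row = [sum(P[i][j] for i in range(n)) for j in range(n)]
print(grad_per_input_row)  # [1, 1, 1, 1]
```

Each input row receives a gradient of exactly 1, which is what a lossless forward pass should produce: no element is dropped, duplicated, or blended with another.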


## Backward pass
To demonstrate the backward pass, we fix a target array `reversed_arr` and make `input_arr` learnable. Gradient-based optimisation must then recover the `input_arr` whose reversal equals `reversed_arr`.

In [6]:
opt = tf.keras.optimizers.Adam(1e-1)

@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        y_ = reverse_list(x)
        loss = tf.nn.l2_loss(y - y_)
        
    grads = tape.gradient(loss, x)
    opt.apply_gradients(zip([grads], [x]))
    
    return loss

input_arr = tf.Variable([
    [1,1,1,1],
    [1,1,1,1],
    [1,1,1,1],
    [1,1,1,1]
], dtype=tf.float32)
reversed_arr = tf.constant([
    [4,4,4,4],
    [3,3,3,3],
    [2,2,2,2],
    [1,1,1,1],
], dtype=tf.float32)

for i in range(100):
    loss = train_step(input_arr, reversed_arr)
    if i % 10 == 0:
        tf.print(loss)
tf.print(tf.round(input_arr))

28
10.2250357
2.75192738
0.439089298
0.129742727
0.0685746
0.0525279418
0.0155251706
0.000200064795
0.00172046362
[[1 1 1 1]
 [2 2 2 2]
 [3 3 3 3]
 [4 4 4 4]]
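
The first printed loss can be verified by hand, since `tf.nn.l2_loss(t)` computes `sum(t ** 2) / 2`. The sketch below redoes that calculation with plain Python lists and also checks that reversing the learned array gives the target; the variable names are illustrative.

```python
# Initial input (all ones) and the target used in the training loop.
x0 = [[1.0] * 4 for _ in range(4)]
target = [[4.0] * 4, [3.0] * 4, [2.0] * 4, [1.0] * 4]

# Forward pass of the reversal, then half the sum of squared errors.
pred = x0[::-1]
loss0 = sum(
    (t - p) ** 2
    for target_row, pred_row in zip(target, pred)
    for t, p in zip(target_row, pred_row)
) / 2
print(loss0)  # 28.0

# The rounded result of training, reversed, should equal the target.
learned = [[1.0] * 4, [2.0] * 4, [3.0] * 4, [4.0] * 4]
print(learned[::-1] == target)  # True
```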
