<a href="https://colab.research.google.com/github/ak-7/TCN-Layer/blob/main/Experiment_with_parameters_of_TCN_layer_MNIST_example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Solving Sequential MNIST with Temporal Convolutional Networks (TCNs)

- Sequential MNIST: Based on the work of [Aymeric Damien](https://github.com/aymericdamien/TensorFlow-Examples/) and [Sungjoon](https://github.com/sjchoi86/tensorflow-101/blob/master/notebooks/rnn_mnist_simple.ipynb)
- Temporal Convolutional Networks: [Bai, S., Kolter, J. Z., & Koltun, V. (2018). An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling.](http://arxiv.org/abs/1803.01271)

### MNIST Dataset Overview

This example uses the MNIST database of handwritten digits. The dataset contains 60,000 examples for training and 10,000 examples for testing. The digits have been size-normalized and centered in a fixed-size image (28x28 pixels) with values from 0 to 1. For simplicity, each image has been flattened and converted to a 1-D NumPy array of 784 features (28*28).

![MNIST Dataset](http://neuralnetworksanddeeplearning.com/images/mnist_100_digits.png)

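To classify images with a sequence model, we treat each image as a sequence of pixels. An image can be read row by row, which gives one sequence of 28 timesteps with 28 features each. The Sequential MNIST setup later in this notebook instead feeds one pixel per timestep, which gives 784 timesteps with a single feature (`timesteps = 28 * 28`, `num_input = 1`).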

More info: http://yann.lecun.com/exdb/mnist/
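
As a minimal sketch of the two sequence views described above, the snippet below reshapes a stand-in image with NumPy. The random array is only a placeholder for a real MNIST digit, and the variable names are illustrative rather than taken from the notebook.

```python
import numpy as np

# Stand-in for one MNIST image: 28x28 values in [0, 1] (random, not real data).
rng = np.random.default_rng(0)
image = rng.random((28, 28)).astype(np.float32)

# Row-wise view: 28 timesteps, each carrying one image row of 28 features.
rows_as_sequence = image.reshape(28, 28)

# Pixel-wise view (Sequential MNIST): 784 timesteps, one pixel per step.
pixels_as_sequence = image.reshape(28 * 28, 1)

print(rows_as_sequence.shape)    # (28, 28)
print(pixels_as_sequence.shape)  # (784, 1)
```

The pixel-wise view is the harder task, since the model has to carry information across 784 steps instead of 28.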

### Temporal Convolutional Networks Overview

![TCNs](https://cdn-images-1.medium.com/max/1000/1*1cK-UEWHGaZLM-4ITCeqdQ.png)

In [None]:
32 x 128 x 256



input : 32 x 128 x 256
kernel, stride = 16, 16
conv1: 32 X 8 X 256
kernel_stride = 2, 1
conv2: 32 X 7 X 256


S1 S2 ..................... S128


C1 C1 C1 C1 ... C2 C2..             C7 C7 C7..

k=16,s=16
S1 S2 S3 ..... S128
(conv1) S0 - S16 | S16 - S32 | S32 - S48 | S48 - S47
(conv2) S0 S32 S16 - 48


S1 S2 S3 ..... S128
(conv1) S0 - S7 | S8 - S15 | S16 - S31 | S32 - S47
(conv2) S0 S15 S16 - 48


x: 32 x 8 x 256 (repeat 16 times) 32 x 128 x 256


input : 32 x 86 x 256
repeat 85th frame input: 32 x 96 x 256

x: 32 x 6 x 256 (repeat 16 times) 32 x 96 x 256

# Remove last 10 frames from x + original input


SyntaxError: ignored
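
The shape arithmetic in the scratch notes above (128 frames, then 8, then 7) appears to follow the usual output-length rule for a convolution with `valid` padding. The helper below is a hypothetical sketch of that rule, not a function from the notebook.

```python
def conv1d_valid_length(input_length, kernel_size, stride=1, dilation=1):
    """Output length of a 1-D convolution with 'valid' padding."""
    effective_kernel = dilation * (kernel_size - 1) + 1
    if input_length < effective_kernel:
        return 0
    return (input_length - effective_kernel) // stride + 1


# Values taken from the notes: 128 input frames, conv1 with kernel 16 and
# stride 16, then conv2 with kernel 2 and stride 1.
conv1_len = conv1d_valid_length(128, kernel_size=16, stride=16)
conv2_len = conv1d_valid_length(conv1_len, kernel_size=2, stride=1)
print(conv1_len, conv2_len)  # 8 7
```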

## System Information

In [None]:
## TCN experiments

In [5]:
 !pip install tensorflow==1.14



In [6]:
from pathlib import Path
import random 
from datetime import datetime

import tensorflow as tf
import numpy as np

# Import MNIST data
# from tensorflow.examples.tutorials.mnist import input_data
# mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

In [7]:
kernel_size_1=4
strides_1=4
kernel_size_2=8
strides_2=4
conv_maps=256

In [None]:
kernel_size_1=20
strides_1=4
kernel_size_2=4
strides_2=3
conv_maps=256

In [None]:
kernel_size_1=20
strides_1=4
kernel_size_2=2
strides_2=3
conv_maps=256

In [None]:
kernel_size_1=84
strides_1=4
kernel_size_2=8
strides_2=4
conv_maps=256

In [45]:
class CausalConv1D(tf.layers.Conv1D):
    def __init__(self, filters,
               kernel_size,
               strides=1,
               dilation_rate=1,
               activation=None,
               use_bias=True,
               kernel_initializer=None,
               bias_initializer=tf.zeros_initializer(),
               kernel_regularizer=None,
               bias_regularizer=None,
               activity_regularizer=None,
               kernel_constraint=None,
               bias_constraint=None,
               trainable=True,
               name=None,
               **kwargs):
        super(CausalConv1D, self).__init__(
            filters=filters,
            kernel_size=kernel_size,
            strides=strides,
            padding='valid',
            data_format='channels_last',
            dilation_rate=dilation_rate,
            activation=activation,
            use_bias=use_bias,
            kernel_initializer=kernel_initializer,
            bias_initializer=bias_initializer,
            kernel_regularizer=kernel_regularizer,
            bias_regularizer=bias_regularizer,
            activity_regularizer=activity_regularizer,
            kernel_constraint=kernel_constraint,
            bias_constraint=bias_constraint,
            trainable=trainable,
            name=name, **kwargs
        )
       
    def call(self, inputs, pad=False):
        return super(CausalConv1D, self).call(inputs)



class TemporalBlock(tf.layers.Layer):
  def __init__(self, n_outputs, kernel_size, strides, dilation_rate, dropout=0.2,
               trainable=True, name=None, dtype=None,
               activity_regularizer=None, **kwargs):
    super(TemporalBlock, self).__init__(
      trainable=trainable, dtype=dtype,
      activity_regularizer=activity_regularizer,
      name=name, **kwargs
    )
    self.dropout = dropout
    self.n_outputs = n_outputs

    # Kernel size for first layer
    self.kernel_size_1 = kernel_size_1
    self.strides_1 = strides_1

    ## shift length = stride 1 * stride 2
    ## block size = kernel_size1 * kernel_size_2

    # Kernel size for second layer
    self.kernel_size_2 = kernel_size_2
    self.strides_2 = strides_2

    self.conv1 = CausalConv1D(
      n_outputs, self.kernel_size_1, strides=self.strides_1,
      dilation_rate=dilation_rate, activation=tf.nn.relu,
      name="conv1")
    self.conv2 = CausalConv1D(
      n_outputs, self.kernel_size_2, strides=self.strides_2,
      dilation_rate=dilation_rate, activation=tf.nn.relu,
      name="conv2")
    self.down_sample = None

  def build(self, input_shape):
    channel_dim = 2
    self.dropout1 = tf.layers.Dropout(self.dropout, [tf.constant(1), tf.constant(1), tf.constant(self.n_outputs)])
    self.dropout2 = tf.layers.Dropout(self.dropout, [tf.constant(1), tf.constant(1), tf.constant(self.n_outputs)])
    if input_shape[channel_dim] != self.n_outputs:
      # self.down_sample = tf.layers.Conv1D(
      #     self.n_outputs, kernel_size=1,
      #     activation=None, data_format="channels_last", padding="valid")
      self.down_sample = tf.layers.Dense(self.n_outputs, activation=None)
    self.built = True

  def repeat(self, x, num_repetitions):
    # Repeat each element of x along axis 1 num_repetitions times

    N = num_repetitions
    K = tf.shape(x)[1]
    order = tf.range(0, N * K, K)
    K_array = tf.range(0, K)

    x_ = tf.expand_dims(order, 0)
    y_ = tf.expand_dims(K_array, 1)
    z = tf.reshape(tf.add(x_, y_), [-1, N])
    indices = tf.reshape(z, [-1])
    x_rep = tf.gather(tf.tile(x, [1, N, 1]), indices, axis=1)
    return x_rep

  def call(self, inputs, training=True):
    input_length = tf.shape(inputs)[1]
    #print('\nStep1: Input size: ', input_length)
    # Pad the input up to a multiple of shift_len by repeating the last frame
    shift_len = self.strides_1 * self.strides_2
    # Note: when input_length is already a multiple of shift_len, this adds a
    # full extra shift_len frames rather than zero
    len_repeat = shift_len - input_length % shift_len
    i_ = self.repeat(inputs[:, input_length - 1:, :], len_repeat)
    m_inputs = tf.concat([inputs, i_], 1)
    print('Modified inputs: ', m_inputs.shape, 'original: ', inputs.shape)


    # Prepend k1 - s1 copies of the first frame when conv1 windows overlap (kernel > stride)
    if self.kernel_size_1 > self.strides_1:
      len_repeat = self.kernel_size_1 - self.strides_1
      print("Add in the beginning: ", len_repeat)
      i_ = self.repeat(inputs[:, :1, :], len_repeat)
      m_inputs = tf.concat([i_, m_inputs], 1)
      print('Step 1.5 Modified inputs: ', m_inputs.shape)
    

    # then cut what you added in beginning
    #print('\n\nStep2: Perform dilated convolutions.....')
    # Perform dilated convolutions in residual block
    x = self.conv1(m_inputs)
    x = self.dropout1(x, training=training)
    print('X1 shape: ', x.shape) 
    x = self.conv2(x)
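    LayerNorm = LayerNormalization()
    y1 = LayerNorm(x)

    # Reference: built-in layer norm without learned scale or offset
    y2 = tf.contrib.layers.layer_norm(x, begin_norm_axis=1, center=False, scale=False, trainable=False)
    # Debug: return both normalizations for comparison; the code below this
    # return is unreachable while it is in place
    return y1, y2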
    x = self.dropout2(x, training=training)
    print('X2 shape: ', x.shape) 

    # print('\n\nStep3: Transform to input size and perform addition.....')
    # Transform output of dilated convolutions to input size. This involves repeating first block x times and the next blocks shift_len times. It also involves removing last x elements
    next_blocks_repeat = self.repeat(x, shift_len)
    num_repeat = tf.shape(m_inputs)[1] - tf.shape(next_blocks_repeat)[1]
    print(m_inputs.shape, next_blocks_repeat.shape)
    first_block_repeat = self.repeat(x[:, :1, :],num_repeat)

    x = tf.concat([first_block_repeat, next_blocks_repeat], 1)
    x = x[:, :input_length, :]
    #print('Final X shape after repeating', x.shape)
    if self.down_sample is not None:
      inputs = self.down_sample(inputs)
    return tf.nn.relu(x + inputs)
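
To make the length bookkeeping in `TemporalBlock.call` easier to follow, here is a small pure-Python sketch of the padding and the two strided `valid` convolutions. It assumes `dilation_rate=1`, and the function names are hypothetical; it only mirrors the arithmetic and is not a replacement for the layer.

```python
def valid_length(length, kernel_size, stride):
    """Output length of a 'valid' 1-D convolution without dilation."""
    return (length - kernel_size) // stride + 1


def temporal_block_lengths(input_length, k1, s1, k2, s2):
    """Return (padded length, conv1 length, conv2 length)."""
    shift_len = s1 * s2
    # Tail padding as written in call(): between 1 and shift_len frames.
    padded = input_length + (shift_len - input_length % shift_len)
    # Front padding of k1 - s1 frames when conv1 windows overlap.
    if k1 > s1:
        padded += k1 - s1
    conv1 = valid_length(padded, k1, s1)
    conv2 = valid_length(conv1, k2, s2)
    return padded, conv1, conv2


# Settings from the In [7] cell: k1=4, s1=4, k2=8, s2=4, input length 45.
print(temporal_block_lengths(45, k1=4, s1=4, k2=8, s2=4))  # (48, 12, 2)
```

With these settings, the result matches the `Modified inputs` and `X1 shape` lines printed by the test cell further down.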

In [65]:
class LayerNormalization(tf.layers.Layer):
    """Applies layer normalization."""

    def __init__(self, hidden_size=256):
        super(LayerNormalization, self).__init__()
        self.hidden_size = hidden_size

    def build(self, _):
        self.scale = tf.get_variable("layer_norm_scale", [self.hidden_size],
                                     initializer=tf.ones_initializer())
        self.bias = tf.get_variable("layer_norm_bias", [self.hidden_size],
                                    initializer=tf.zeros_initializer())
        self.built = True
    def cum_mean(self, arr):
        dim = 1
        cum_sum = tf.math.cumsum(arr, axis=dim)   
        length_tensor = tf.range(1, tf.shape(cum_sum)[dim] + 1)
        length_tensor = tf.cast(length_tensor, tf.float32)
        cum_sum = cum_sum / length_tensor
        return cum_sum
      
    def calculate_expanding_mean(self, x):
        mean_feature_maps = tf.reduce_mean(x, axis=[-1])
        print('Step 1 mean: ', mean_feature_maps.shape)
        mean = self.cum_mean(mean_feature_maps)
        # print('Step 2 mean: ', mean.shape)
        mean = tf.expand_dims(mean, axis=2)
        mean = tf.tile(mean, [1, 1, tf.shape(x)[2]])
        return mean
    
    def call(self, x, epsilon=1e-6):
        mean = self.calculate_expanding_mean(x)
        print('Step 1 mean: ', mean.shape)

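        # Overrides the epsilon argument; a zero variance then yields non-finite values
        epsilon = 0
        norm_x_intermediate = tf.square(x - mean)
        # variance = self.calculate_expanding_mean(norm_x_intermediate)
        # Per-timestep variance over channels, measured around the expanding mean
        variance = tf.reduce_mean(tf.square(x - mean), axis=[-1], keepdims=True)
        
        print('Step 2 variance: ', variance.shape)
        norm_x = (x - mean) * tf.rsqrt(variance + epsilon)
        # Debug: return the normalized tensor without the learned scale and bias
        return norm_x
        return norm_x * self.scale + self.bias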

In [66]:
tf.reset_default_graph()
tf.compat.v1.random.set_random_seed(1234)
with tf.Graph().as_default() as g:
    x = tf.random_normal((1, 45, 256)) # (batch_size, length, channel)
    # n_outputs, kernel_size, strides, dilation_rate
    tblock = TemporalBlock(n_outputs=256, kernel_size=2, strides=1, dilation_rate=1)
    output = tblock(x, training=tf.constant(True))
    init = tf.global_variables_initializer()
    
    
with tf.Session(graph=g) as sess:
    sess.run(init)
    res = sess.run(output)
    # print(res.shape)
    o1, o2 = res
    print('Vineet:', o1.shape) 
    print('TF layer norm:', o2.shape)
    print("\n\n")
    print(type(o1[0][0][0]), type(o2[0][0][1]))
    print(o1[0, -1, 240:])
    print("\n\n")
    print(o2[0, -1, 240:])
    
    print("\n\n")
    print(o1[0, -2, 240:])
    print("\n\n")
    print(o2[0, -2, 240:])

Modified inputs:  (1, 48, 256) original:  (1, 45, 256)
X1 shape:  (1, 12, 256)
Step 1 mean:  (1, 2)
Step 1 mean:  (1, 2, 256)
Step 2 variance:  (1, 2, 1)
Vineet: (1, 2, 256)
TF layer norm: (1, 2, 256)



<class 'numpy.float32'> <class 'numpy.float32'>
[ 0.69518757  1.3975507  -0.649979   -0.00726203 -0.5186878  -0.649979
  1.8926712   3.1499934   2.7983959   1.098325   -0.649979   -0.649979
 -0.649979   -0.38057926 -0.649979   -0.649979  ]



[ 0.67207235  1.3510818  -0.6283671  -0.00702065 -0.5014414  -0.6283671
  1.8297393   3.0452554   2.7053485   1.0618055  -0.6283671  -0.6283671
 -0.6283671  -0.367925   -0.6283671  -0.6283671 ]



[-0.64640695  1.324365   -0.64640695  0.7535492  -0.64640695 -0.64640695
 -0.64640695  0.9410761   0.22900899  0.3485515  -0.45108992 -0.64640695
  0.47109133  0.86711794 -0.63364714 -0.64640695]



[-0.6283671   1.4044166  -0.6283671   0.8156397  -0.6283671  -0.6283671
 -0.6283671   1.0090673   0.27459443  0.3978985  -0.42690432 -0.6283671
  0.5242941  

In [14]:
tf.reset_default_graph()
with tf.Graph().as_default() as g:
    x = tf.random_normal((10, 45, 256)) # (batch_size, length, channel)
    # n_outputs, kernel_size, strides, dilation_rate
    tblock = TemporalBlock(n_outputs=256, kernel_size=2, strides=1, dilation_rate=1)
    output = tblock(x, training=tf.constant(True))
    init = tf.global_variables_initializer()
    
    
with tf.Session(graph=g) as sess:
    sess.run(init)
    res = sess.run(output)
    print(res.shape)
    print(res)  
    print(res[0, :, 0])
    # print(res[1, :, 1])

Modified inputs:  (10, 48, 256) original:  (10, 45, 256)
X1 shape:  (10, 12, 256)
(10, 2, 256)
[[[ 1.076151    2.4806857   0.3438147  ...  0.78642607  0.77645195
   -0.3247258 ]
  [-0.01145346  0.71144634 -0.07523614 ... -0.6552048  -0.6552048
   -0.6552048 ]]

 [[ 2.142585    0.64131314 -0.54366225 ... -0.7358025  -0.5633334
   -0.53629136]
  [-0.66515595  0.9099496   1.4428179  ... -0.66515595 -0.10320154
   -0.66515595]]

 [[-0.6768589   0.02260999 -0.6768589  ...  2.1301756  -0.6768589
   -0.6768589 ]
  [-0.7614702   1.0422966  -0.28541365 ... -0.09996591  0.5962766
   -0.43815005]]

 ...

 [[ 1.6870708  -0.63675857  0.9309146  ...  1.2461637  -0.63675857
    1.4270027 ]
  [ 0.43117163 -0.5559139   4.4189005  ... -0.6597238  -0.6597238
   -0.6597238 ]]

 [[-0.1445648  -0.64350545  0.95935935 ... -0.64350545 -0.6288977
   -0.64350545]
  [-0.1837035   1.0011969  -0.69185734 ... -0.69185734 -0.69185734
   -0.69185734]]

 [[-0.64604884  0.48686978 -0.586327   ... -0.64604884 -0.6460488

In [None]:
shift_len > kernel_size_1
first block size = 2 * shift

In [None]:
Source TCN output 0 - 32 = Espresso TCN output 0-32

In [None]:
Source TCN output 16 - 48 = Espresso TCN output 16-48

In [None]:
Source TCN output 0 - 48 = Espresso TCN output 1 (0-32) (16-48) doesn't match

In [None]:
# tf.reset_default_graph()
# # tf.random.set_seed(5)
# with tf.Graph().as_default() as g:
#     x = tf.random_normal((1, 48, 256), seed=2)[:] # (batch_size, length, channel)
#     print(x[0, :, 0])
#     # n_outputs, kernel_size, strides, dilation_rate
#     tblock = TemporalBlock(n_outputs=256, kernel_size=2, strides=1, dilation_rate=1)
#     output = tblock(i, training=tf.constant(True))
#     init = tf.global_variables_initializer()
    
    
# with tf.Session(graph=g) as sess:
#     sess.run(init)
#     res = sess.run(output)
#     print(res.shape)   
#     print(res[0, :, 0])
#     # print(res[1, :, 1])

In [None]:
tf.reset_default_graph()
with tf.Graph().as_default() as g:
    x = tf.random_normal((1, 48, 256), seed=2) # (batch_size, length, channel)
    LayerNorm = LayerNormalization()
    x = LayerNorm(x)
    print(x)
    # x = tf.identity(i)
    # n_outputs, kernel_size, strides, dilation_rate
    # tblock = TemporalBlock(n_outputs=256, kernel_size=2, strides=1, dilation_rate=1)
    # output = tblock(x, training=tf.constant(True))
    # init = tf.global_variables_initializer()
    
    
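with tf.Session(graph=g) as sess:
    # `init` and `output` are not defined in this cell (their definitions above
    # are commented out), so these calls most likely pick up objects left over
    # from an earlier cell's graph and fail
    sess.run(init)
    res = sess.run(output)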
    print(res.shape)   
    print(res[0, :, 0])
    # print(res[1, :, 1])

Step 1 mean:  (1, 48)
Step 3 mean:  (1, 48)
Tensor("layer_normalization/add_1:0", shape=(1, 48, 256), dtype=float32)


ValueError: ignored

In [4]:
def cum_mean(arr):
    cum_sum = np.cumsum(arr, axis=0, dtype=float)    
    for i in range(cum_sum.shape[0]):       
        if i == 0:
            continue        
        # print(cum_sum[i] / (i + 1))
        cum_sum[i] =  cum_sum[i] / (i + 1)
    return cum_sum
epsilon=1e-6
x = np.random.random((1, 6, 2))
step1_mean = np.mean(x, axis=2)
step1_mean = step1_mean.flatten()

mean = cum_mean(step1_mean)
# repeat
mean = np.expand_dims(mean, axis=[0,2])
mean = np.tile(mean, x.shape[2])

print(x - mean)

[[[ 0.23122015 -0.23122015]
  [ 0.10669028  0.04515204]
  [-0.15216137 -0.10007275]
  [ 0.47764491  0.42627682]
  [-0.07139401 -0.36781364]
  [-0.27442334  0.29475225]]]
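
The loop in `cum_mean` above can also be written without explicit iteration. The sketch below compares a loop version with a vectorized one for 1-D input; both function names are illustrative and not part of the notebook.

```python
import numpy as np


def cum_mean_loop(arr):
    """Expanding mean via an explicit loop, as in the cell above."""
    out = np.cumsum(arr, axis=0, dtype=float)
    for i in range(1, out.shape[0]):
        out[i] = out[i] / (i + 1)
    return out


def cum_mean_vectorized(arr):
    """Expanding mean of a 1-D array: cumulative sum divided by element count."""
    arr = np.asarray(arr, dtype=float)
    counts = np.arange(1, arr.shape[0] + 1)
    return np.cumsum(arr, axis=0) / counts


values = np.array([2.0, 4.0, 6.0, 8.0])
print(cum_mean_vectorized(values).tolist())  # [2.0, 3.0, 4.0, 5.0]
```

This is the same idea the TensorFlow `cum_mean` method uses with `tf.math.cumsum` and `tf.range`.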


In [None]:
tf.reset_default_graph()
with tf.Graph().as_default() as g:
    x = tf.random_normal((2, 47, 256)) # (batch_size, length, channel)
    # n_outputs, kernel_size, strides, dilation_rate
    tblock = TemporalBlock(n_outputs=256, kernel_size=2, strides=1, dilation_rate=1)
    output = tblock(x, training=tf.constant(True))
    init = tf.global_variables_initializer()
    
with tf.Session(graph=g) as sess:
    sess.run(init)
    res = sess.run(output)
    print(res.shape)   
    print(res[0, :, 0])
    print(res[1, :, 1])

16
Modified inputs:  (2, 48, 256) original:  (2, 47, 256)
X1 shape:  (2, 8, 256)
X2 shape:  (2, 2, 256)
(2, 48, 256) (2, 32, 256)
(2, 47, 256)
[0.0000000e+00 0.0000000e+00 1.7744056e+00 9.0063453e-01 1.2888507e+00
 6.2661827e-01 9.1972822e-01 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 1.7668011e-03 1.7425556e+00 0.0000000e+00 7.1024650e-01
 8.6270563e-02 1.9126055e+00 6.6705036e-01 1.0853175e-01 0.0000000e+00
 1.9394204e-01 3.1168628e+00 6.4455308e-02 2.2087263e-01 0.0000000e+00
 0.0000000e+00 0.0000000e+00 7.3721683e-01 6.9630042e-02 9.0151519e-01
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 2.0612460e-02 0.0000000e+00 0.0000000e+00 7.9387873e-01
 0.0000000e+00 3.7305686e-01 6.3145775e-01 0.0000000e+00 0.0000000e+00
 3.5856661e-01 0.0000000e+00]
[0.7978143  0.         1.0751885  1.1837944  0.13714105 0.33656406
 1.3389802  0.         0.         1.4031792  1.4371414  1.9136165
 0.36016485 0.         1.2857449  0.         0.        

In [None]:
## END

## Experiment more below:
import itertools

In [None]:
tf.reset_default_graph()
with tf.Graph().as_default() as g:
    x = tf.random_normal((32, 8, 256)) # (batch_size, length, channel)
    
with tf.Session(graph=g) as sess:

    # Run the initializer
    N = tf.constant(8)  # number of repetitions
    # M = 3
    K = x.shape[1]  # sequence length (8 here)
    order = tf.range(0, N*K, K)
    K_array = tf.range(0, K)
    # print('Order: ', sess.run(order))
    # print('K: ', sess.run(K_array))

    x_ = tf.expand_dims(order, 0)
    y_ = tf.expand_dims(K_array, 1)
    # print(x_.shape, y_.shape, tf.add(x_, y_).shape)
    z = tf.reshape(tf.add(x_, y_), [-1, N])
    print('Sum: ', sess.run(z), z.shape)
    indices = tf.reshape(z, [-1])
    print(sess.run(indices), indices.shape)

    #
    # For checking purposes
    #
    # order = list(range(0, M*K, K))
    # order = [[x+i for x in order] for i in range(K)]
    # order = list(itertools.chain.from_iterable(order))
    # print('Should be', order)
    # x_rep = tf.gather(tf.tile(x, [1, N, 1]), indices, axis=1)
    # print(x_rep.shape)

    s = tf.reduce_sum(x, axis=(0, 2))
    s = sess.run(s)
    print('Sum original: ', s.shape, s)

    # s = tf.reduce_sum(x_rep, axis=(0, 2))
    # s = sess.run(s)
    # print('Sum original: ', s.shape, s)

Sum:  [[ 0  8 16 24 32 40 48 56]
 [ 1  9 17 25 33 41 49 57]
 [ 2 10 18 26 34 42 50 58]
 [ 3 11 19 27 35 43 51 59]
 [ 4 12 20 28 36 44 52 60]
 [ 5 13 21 29 37 45 53 61]
 [ 6 14 22 30 38 46 54 62]
 [ 7 15 23 31 39 47 55 63]] (8, 8)
[ 0  8 16 24 32 40 48 56  1  9 17 25 33 41 49 57  2 10 18 26 34 42 50 58
  3 11 19 27 35 43 51 59  4 12 20 28 36 44 52 60  5 13 21 29 37 45 53 61
  6 14 22 30 38 46 54 62  7 15 23 31 39 47 55 63] (64,)
Sum original:  (8,) [ 47.629395 -35.18349   46.662994 -68.17631   80.89191  175.77397
 117.67242   55.33221 ]
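
The index pattern printed above (each position `k` mapped to `k, k + K, k + 2K, ...` in the tiled tensor) amounts to repeating every timestep `N` times in place. The NumPy sketch below reproduces the tile-and-gather construction and checks it against `np.repeat`; the function name is hypothetical.

```python
import numpy as np


def repeat_via_gather(x, num_repetitions):
    """Repeat each timestep of x (batch, length, channel) along axis 1."""
    n = num_repetitions
    k = x.shape[1]
    order = np.arange(0, n * k, k)   # [0, K, 2K, ...]
    k_array = np.arange(0, k)        # [0, 1, ..., K-1]
    # Row i of the sum holds the positions of timestep i inside the tiled tensor.
    indices = (order[None, :] + k_array[:, None]).reshape(-1)
    tiled = np.tile(x, (1, n, 1))
    return np.take(tiled, indices, axis=1)


x = np.arange(2 * 3 * 1).reshape(2, 3, 1)
out = repeat_via_gather(x, 2)
print(out[0, :, 0].tolist())  # [0, 0, 1, 1, 2, 2]
print(np.array_equal(out, np.repeat(x, 2, axis=1)))  # True
```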


In [None]:
class TemporalConvNet(tf.layers.Layer):
    def __init__(self, num_channels, kernel_size=2, dropout=0.2,
                 trainable=True, name=None, dtype=None, 
                 activity_regularizer=None, **kwargs):
        super(TemporalConvNet, self).__init__(
            trainable=trainable, dtype=dtype,
            activity_regularizer=activity_regularizer,
            name=name, **kwargs
        )
        self.layers = []
        num_levels = len(num_channels)
        for i in range(num_levels):
            dilation_size = 2 ** i
            out_channels = num_channels[i]
            self.layers.append(
                TemporalBlock(out_channels, kernel_size, strides=1, dilation_rate=dilation_size,
                              dropout=dropout, name="tblock_{}".format(i))
            )
    
    def call(self, inputs, training=True):
        outputs = inputs
        for layer in self.layers:
            outputs = layer(outputs, training=training)
        return outputs


tf.reset_default_graph()
with tf.Graph().as_default() as g:
    x = tf.random_normal((32, 128, 280)) # (batch_size, length, channel)
    tcn = TemporalConvNet([8, 8, 8, 8], 2, 0.25)
    output = tcn(x, training=tf.constant(True))
    init = tf.global_variables_initializer()
    
with tf.Session(graph=g) as sess:
    # Run the initializer
    sess.run(init)
    res = sess.run(output)
    print(res.shape)   
    print(res[0, :, 0])
    print(res[1, :, 1])

## Building TCNs

### Causal Convolution

In [None]:
class CausalConv1D(tf.layers.Conv1D):
    def __init__(self, filters,
               kernel_size,
               strides=1,
               dilation_rate=1,
               activation=None,
               use_bias=True,
               kernel_initializer=None,
               bias_initializer=tf.zeros_initializer(),
               kernel_regularizer=None,
               bias_regularizer=None,
               activity_regularizer=None,
               kernel_constraint=None,
               bias_constraint=None,
               trainable=True,
               name=None,
               **kwargs):
        super(CausalConv1D, self).__init__(
            filters=filters,
            kernel_size=kernel_size,
            strides=strides,
            padding='valid',
            data_format='channels_last',
            dilation_rate=dilation_rate,
            activation=activation,
            use_bias=use_bias,
            kernel_initializer=kernel_initializer,
            bias_initializer=bias_initializer,
            kernel_regularizer=kernel_regularizer,
            bias_regularizer=bias_regularizer,
            activity_regularizer=activity_regularizer,
            kernel_constraint=kernel_constraint,
            bias_constraint=bias_constraint,
            trainable=trainable,
            name=name, **kwargs
        )
        
    def call(self, inputs):
        padding = (self.kernel_size[0] - 1) * self.dilation_rate[0]
        if self.data_format == 'channels_first':
            inputs = tf.pad(inputs, tf.constant([[0, 0], [0, 0], [padding, 0]], dtype=tf.int32))
        else:
            inputs = tf.pad(inputs, tf.constant([(0, 0,), (padding, 0), (0, 0)]))
        return super(CausalConv1D, self).call(inputs), inputs

In [None]:
tf.reset_default_graph()
with tf.Graph().as_default() as g:
    x = tf.random_normal((32, 10, 4)) # (batch_size, length, channel)
    with tf.variable_scope("tcn"):
        conv = CausalConv1D(8, 3, activation=tf.nn.relu)
    output = conv(x)
    init = tf.global_variables_initializer()
    
with tf.Session(graph=g) as sess:
    # Run the initializer
    sess.run(init)
    res, inputs = sess.run(output)
    print(inputs.shape)
    print(inputs[0, :, 0])
    print(res.shape)    
    print(res[0, :, 0])

(32, 12, 4)
[ 0.          0.          0.88044167 -1.2032915   0.29814827 -1.1900542
  1.0388981  -0.07884882 -1.2475842  -0.48100722 -1.7068204  -1.0657506 ]
(32, 10, 8)
[0.         0.         1.2657795  1.5949045  0.         0.
 0.         0.         0.27693093 0.        ]


In [None]:
tf.reset_default_graph()
with tf.Graph().as_default() as g:
    x = tf.expand_dims(
        tf.expand_dims(tf.constant([1, 0, 0, 1, 0, 0, 1], dtype=tf.float32), axis=0),
        axis=-1) # (batch_size, length, channel)
    with tf.variable_scope("tcn"):
        conv = CausalConv1D(8, 2, dilation_rate=2, activation=None)
    output = conv(x)
    init = tf.global_variables_initializer()
    
with tf.Session(graph=g) as sess:
    # Run the initializer
    sess.run(init)
    res, inputs = sess.run(output)
    print(inputs.shape)
    print(inputs[0, :, 0])
    print(res.shape)    
    print(res[0, :, 0])

(1, 9, 1)
[0. 0. 1. 0. 0. 1. 0. 0. 1.]
(1, 7, 8)
[0.1447475  0.         0.48786867 0.1447475  0.         0.48786867
 0.1447475 ]
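
The repeating pattern in the output above can be reproduced with a single-channel NumPy sketch of a causal dilated convolution. The weights `[0.5, 0.25]` are made up for illustration (the TensorFlow layer uses randomly initialized kernels), and `causal_conv1d` is a hypothetical helper, not the notebook's layer.

```python
import numpy as np


def causal_conv1d(x, weights, dilation=1):
    """Single-channel causal convolution (cross-correlation, as in TensorFlow)."""
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    k = len(weights)
    pad = (k - 1) * dilation              # left padding only
    padded = np.concatenate([np.zeros(pad), x])
    out = np.zeros(len(x))
    for t in range(len(x)):
        for j in range(k):
            out[t] += weights[j] * padded[t + j * dilation]
    return out


x = [1, 0, 0, 1, 0, 0, 1]
y = causal_conv1d(x, weights=[0.5, 0.25], dilation=2)
print(y.tolist())  # [0.25, 0.0, 0.5, 0.25, 0.0, 0.5, 0.25]
```

Because the padding is applied on the left only, the output has the same length as the input and `y[t]` never depends on inputs after step `t`.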


### Spatial Dropout

Reference: https://stats.stackexchange.com/questions/282282/how-is-spatial-dropout-in-2d-implemented

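Setting `noise_shape` in `tf.layers.Dropout` is enough: the dropout mask is drawn with that shape and broadcast over the input, so a dimension of size 1 shares one keep/drop decision along that axis. In the cell below, `noise_shape=[batch_size, channel, 1]` drops whole channels across all timesteps.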

In [None]:
tf.reset_default_graph()
with tf.Graph().as_default() as g:
    x = tf.random_normal((32, 4, 10)) # (batch_size, channel, length)
    dropout = tf.layers.Dropout(0.5, noise_shape=[x.shape[0], x.shape[1], tf.constant(1)])
    output = dropout(x, training=True)
    init = tf.global_variables_initializer()
    
with tf.Session(graph=g) as sess:
    # Run the initializer
    sess.run(init)
    res = sess.run(output)
    print(res.shape)   
    print(res[0, :, :])
    print(res[1, :, :])

(32, 4, 10)
[[ 0.          0.         -0.          0.          0.         -0.
  -0.         -0.          0.         -0.        ]
 [-1.3023647  -1.628788    0.20181039  1.554159    5.3209696   1.7473009
  -0.41964796 -0.6231473  -2.0326247  -0.21218014]
 [ 1.1885451  -1.3423481  -1.3000014  -1.132894   -0.20258099 -1.6488353
   1.3672652   3.4905746  -0.01186325 -0.8049923 ]
 [ 0.          0.          0.         -0.         -0.          0.
   0.          0.          0.         -0.        ]]
[[ 0.8047661  -2.3998349   0.68522704  0.7751469  -0.6081628  -4.7214503
  -1.3095977   0.8691299  -2.2773757  -0.2609347 ]
 [-0.          0.          0.         -0.          0.         -0.
   0.         -0.         -0.          0.        ]
 [ 0.         -0.          0.          0.         -0.          0.
   0.         -0.          0.         -0.        ]
 [ 0.         -0.         -0.          0.          0.         -0.
   0.          0.         -0.          0.        ]]
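
The all-or-nothing rows in the output above come from broadcasting a small mask over the input. The NumPy sketch below imitates that behavior under the assumption of inverted dropout (kept values scaled by `1 / (1 - rate)`); `dropout_with_noise_shape` is a hypothetical helper, not a TensorFlow function.

```python
import numpy as np


def dropout_with_noise_shape(x, rate, noise_shape, rng):
    """Inverted dropout whose mask has shape noise_shape and is broadcast over x."""
    keep_prob = 1.0 - rate
    keep_mask = rng.random(noise_shape) < keep_prob
    return x * keep_mask / keep_prob


rng = np.random.default_rng(0)
x = rng.standard_normal((32, 4, 10))  # (batch_size, channel, length)
y = dropout_with_noise_shape(x, rate=0.5, noise_shape=(32, 4, 1), rng=rng)

# Every (sample, channel) row is either fully dropped or fully kept and rescaled.
row_dropped = np.all(y == 0, axis=-1)
row_kept = np.all(np.isclose(y, x / 0.5), axis=-1)
print(bool(np.all(row_dropped | row_kept)))  # True
```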


### Temporal blocks

Note: `tf.contrib.layers.layer_norm` only supports `channels_last`.

In [None]:
# Redefining CausalConv1D to simplify its return values
class CausalConv1D(tf.layers.Conv1D):
    def __init__(self, filters,
               kernel_size,
               strides=1,
               dilation_rate=1,
               activation=None,
               use_bias=True,
               kernel_initializer=None,
               bias_initializer=tf.zeros_initializer(),
               kernel_regularizer=None,
               bias_regularizer=None,
               activity_regularizer=None,
               kernel_constraint=None,
               bias_constraint=None,
               trainable=True,
               name=None,
               **kwargs):
        super(CausalConv1D, self).__init__(
            filters=filters,
            kernel_size=kernel_size,
            strides=strides,
            padding='valid',
            data_format='channels_last',
            dilation_rate=dilation_rate,
            activation=activation,
            use_bias=use_bias,
            kernel_initializer=kernel_initializer,
            bias_initializer=bias_initializer,
            kernel_regularizer=kernel_regularizer,
            bias_regularizer=bias_regularizer,
            activity_regularizer=activity_regularizer,
            kernel_constraint=kernel_constraint,
            bias_constraint=bias_constraint,
            trainable=trainable,
            name=name, **kwargs
        )
       
    def call(self, inputs):
        padding = (self.kernel_size[0] - 1) * self.dilation_rate[0]
        inputs = tf.pad(inputs, tf.constant([(0, 0,), (1, 0), (0, 0)]) * padding)
        return super(CausalConv1D, self).call(inputs)

In [None]:
class TemporalBlock(tf.layers.Layer):
    def __init__(self, n_outputs, kernel_size, strides, dilation_rate, dropout=0.2, 
                 trainable=True, name=None, dtype=None, 
                 activity_regularizer=None, **kwargs):
        super(TemporalBlock, self).__init__(
            trainable=trainable, dtype=dtype,
            activity_regularizer=activity_regularizer,
            name=name, **kwargs
        )        
        self.dropout = dropout
        self.n_outputs = n_outputs
        self.conv1 = CausalConv1D(
            n_outputs, kernel_size, strides=strides, 
            dilation_rate=dilation_rate, activation=tf.nn.relu, 
            name="conv1")
        self.conv2 = CausalConv1D(
            n_outputs, kernel_size, strides=strides, 
            dilation_rate=dilation_rate, activation=tf.nn.relu, 
            name="conv2")
        self.down_sample = None

    
    def build(self, input_shape):
        channel_dim = 2
        self.dropout1 = tf.layers.Dropout(self.dropout, [tf.constant(1), tf.constant(1), tf.constant(self.n_outputs)])
        self.dropout2 = tf.layers.Dropout(self.dropout, [tf.constant(1), tf.constant(1), tf.constant(self.n_outputs)])
        if input_shape[channel_dim] != self.n_outputs:
            # self.down_sample = tf.layers.Conv1D(
            #     self.n_outputs, kernel_size=1, 
            #     activation=None, data_format="channels_last", padding="valid")
            self.down_sample = tf.layers.Dense(self.n_outputs, activation=None)
        self.built = True
    
    def call(self, inputs, training=True):
        x = self.conv1(inputs)
        x = tf.contrib.layers.layer_norm(x)
        x = self.dropout1(x, training=training)
        x = self.conv2(x)
        x = tf.contrib.layers.layer_norm(x)
        x = self.dropout2(x, training=training)
        if self.down_sample is not None:
            inputs = self.down_sample(inputs)
        return tf.nn.relu(x + inputs)

In [None]:
tf.reset_default_graph()
with tf.Graph().as_default() as g:
    x = tf.random_normal((32, 10, 4)) # (batch_size, length, channel)
    tblock = TemporalBlock(8, 2, 1, 1)
    output = tblock(x, training=tf.constant(True))
    init = tf.global_variables_initializer()
    
with tf.Session(graph=g) as sess:
    # Run the initializer
    sess.run(init)
    res = sess.run(output)
    print(res.shape)   
    print(res[0, :, 0])
    print(res[1, :, 1])

(32, 10, 8)
[0.         1.534969   0.         1.4863     2.3868496  0.9489206
 0.         0.42759717 0.         0.        ]
[0.04622638 0.         3.0465143  0.350043   0.         0.42763186
 0.         0.         0.         0.        ]


### Temporal convolutional networks

In [None]:
class TemporalConvNet(tf.layers.Layer):
    def __init__(self, num_channels, kernel_size=2, dropout=0.2,
                 trainable=True, name=None, dtype=None, 
                 activity_regularizer=None, **kwargs):
        super(TemporalConvNet, self).__init__(
            trainable=trainable, dtype=dtype,
            activity_regularizer=activity_regularizer,
            name=name, **kwargs
        )
        self.layers = []
        num_levels = len(num_channels)
        for i in range(num_levels):
            dilation_size = 2 ** i
            out_channels = num_channels[i]
            self.layers.append(
                TemporalBlock(out_channels, kernel_size, strides=1, dilation_rate=dilation_size,
                              dropout=dropout, name="tblock_{}".format(i))
            )
    
    def call(self, inputs, training=True):
        outputs = inputs
        for layer in self.layers:
            outputs = layer(outputs, training=training)
        return outputs

In [None]:
tf.reset_default_graph()
with tf.Graph().as_default() as g:
    x = tf.random_normal((32, 10, 4)) # (batch_size, length, channel)
    tcn = TemporalConvNet([8, 8, 8, 8], 2, 0.25)
    output = tcn(x, training=tf.constant(True))
    init = tf.global_variables_initializer()
    
with tf.Session(graph=g) as sess:
    # Run the initializer
    sess.run(init)
    res = sess.run(output)
    print(res.shape)   
    print(res[0, :, 0])
    print(res[1, :, 1])

(32, 10, 8)
[0.6988858  0.         0.7061405  0.15213768 0.93309164 0.
 1.3751144  2.6659122  1.8665429  0.        ]
[0.         0.         4.748131   0.24300182 0.         0.
 0.         0.         1.4196995  0.8674514 ]


In [None]:
tf.reset_default_graph()
g = tf.Graph()
with g.as_default():
    Xinput = tf.placeholder(tf.float32, shape=[None, 10, 4])
    tcn = TemporalConvNet([8, 8, 8, 8], 2, 0.25)
    output = tcn(Xinput, training=tf.constant(True))
    print(tcn.layers[0].down_sample)    
    init = tf.global_variables_initializer()
    
with tf.Session(graph=g) as sess:
    # Run the initializer
    sess.run(init)
    res = sess.run(output, {Xinput: np.random.randn(32, 10, 4)})
    print(res.shape)   
    print(res[0, :, 0])
    print(res[1, :, 1])

<tensorflow.python.layers.core.Dense object at 0x7fddee04f940>
(32, 10, 8)
[0.        0.        2.1618829 0.        3.862938  2.824544  4.999326
 1.7338551 6.2948    1.4054765]
[0.        0.        0.        0.        0.        0.        0.
 7.947054  5.4993486 0.       ]
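
The printed `down_sample` attribute is a `Dense` layer. Applied to a `(batch, length, channels)` tensor, a dense layer acts on the last axis only, so it behaves like a convolution with kernel size 1. Presumably it projects the 4 input channels to 8 so that the residual connection can be added, but that is an assumption about `TemporalBlock`. The NumPy sketch below, with made-up weights `W` and `b`, shows the equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 10, 4))   # (batch_size, length, channel)
W = rng.standard_normal((4, 8))        # hypothetical dense kernel: 4 -> 8 channels
b = np.zeros(8)                        # hypothetical bias

# Dense on a rank-3 input: matrix product over the last axis
dense_out = x @ W + b

# The same projection applied timestep by timestep (a kernel-size-1 convolution)
per_step_out = np.stack([x[:, t, :] @ W + b for t in range(x.shape[1])], axis=1)

print(dense_out.shape)                        # (32, 10, 8)
print(np.allclose(dense_out, per_step_out))   # True
```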


## Sequential MNIST

In [None]:
# Training Parameters
learning_rate = 0.001
batch_size = 64
display_step = 500
total_batch = int(mnist.train.num_examples / batch_size)
print("Number of batches per epoch:", total_batch)
training_steps = 3000

# Network Parameters
num_input = 1 # one pixel value per timestep
timesteps = 28 * 28 # each 28x28 image is flattened into a sequence of 784 timesteps
num_classes = 10 # MNIST total classes (0-9 digits)
dropout = 0.1
kernel_size = 8
levels = 6
nhid = 20 # hidden layer num of features

Number of batches per epoch: 859
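
With the final timestep used for classification, the receptive field needs to cover all 784 timesteps. The sketch below estimates it from `kernel_size` and `levels`, assuming each `TemporalBlock` contains two dilated causal convolutions as in Bai et al. (2018). The helper `tcn_receptive_field` and its `convs_per_block` argument are introduced here for illustration, and the count of two should be checked against the `TemporalBlock` definition.

```python
def tcn_receptive_field(kernel_size, levels, convs_per_block=2):
    """Receptive field of a stack of blocks with dilations 1, 2, 4, ..., 2 ** (levels - 1).

    Each causal convolution with dilation d widens the receptive field
    by (kernel_size - 1) * d timesteps.
    """
    receptive_field = 1
    for i in range(levels):
        dilation = 2 ** i
        receptive_field += convs_per_block * (kernel_size - 1) * dilation
    return receptive_field

rf = tcn_receptive_field(kernel_size=8, levels=6)
print(rf)              # 883
print(rf >= 28 * 28)   # True
```

Under the same formula, a single convolution per block would give 442 timesteps, which is less than 784, so the assumption about the block structure matters here.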


In [None]:
tf.reset_default_graph()
graph = tf.Graph()
with graph.as_default():
    tf.set_random_seed(10)
    # tf Graph input
    X = tf.placeholder("float", [None, timesteps, num_input])
    Y = tf.placeholder("float", [None, num_classes])
    is_training = tf.placeholder("bool")
    
    # Build the TCN and classify from its output at the last timestep
    logits = tf.layers.dense(
        TemporalConvNet([nhid] * levels, kernel_size, dropout)(
            X, training=is_training)[:, -1, :],
        num_classes, activation=None, 
        kernel_initializer=tf.orthogonal_initializer()
    )
    prediction = tf.nn.softmax(logits)
   
    # Define loss and optimizer
    loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(
        logits=logits, labels=Y))
    
    with tf.name_scope("optimizer"):
        # optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
        optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
        # gvs = optimizer.compute_gradients(loss_op)
        # for grad, var in gvs:
        #     if grad is None:
        #         print(var)
        # capped_gvs = [(tf.clip_by_value(grad, -.5, .5), var) for grad, var in gvs]
        # train_op = optimizer.apply_gradients(capped_gvs)    
        train_op = optimizer.minimize(loss_op)

    # Evaluate model (feed is_training=False at evaluation time to disable dropout)
    correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

    # Initialize the variables (i.e. assign their default value)
    init = tf.global_variables_initializer()
    saver = tf.train.Saver()
    # Global variables also include the optimizer's state, not only the model weights
    print("All parameters:", np.sum([np.prod([xi.value for xi in x.get_shape()]) for x in tf.global_variables()]))
    print("Trainable parameters:", np.sum([np.prod([xi.value for xi in x.get_shape()]) for x in tf.trainable_variables()]))

All parameters: 108992.0
Trainable parameters: 36330
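
The gap between the two counts is consistent with Adam's bookkeeping: `tf.train.AdamOptimizer` keeps two slot variables (the first and second moment estimates) per trainable variable, plus two scalars (`beta1_power` and `beta2_power`). The arithmetic below is one plausible accounting rather than something read off the graph itself.

```python
import numpy as np

trainable = 36330            # "Trainable parameters" printed above
adam_slots_per_variable = 2  # first and second moment estimates
adam_scalars = 2             # beta1_power and beta2_power

expected_all = trainable * (1 + adam_slots_per_variable) + adam_scalars
print(expected_all)  # 108992

# A scalar variable has an empty shape, and the product of an empty list is the
# float 1.0, which would explain why the first count prints as 108992.0.
scalar_size = np.prod([])
print(scalar_size)  # 1.0
```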


In [None]:
# Start training
log_dir = "logs/tcn/%s" % datetime.now().strftime("%Y%m%d_%H%M")
Path(log_dir).mkdir(exist_ok=True, parents=True)
tb_writer = tf.summary.FileWriter(log_dir, graph)
config = tf.ConfigProto()
config.gpu_options.allow_growth = False
best_val_acc = 0.8  # only save a checkpoint once test accuracy exceeds this value
with tf.Session(graph=graph, config=config) as sess:
    # Run the initializer
    sess.run(init)
    for step in range(1, training_steps+1):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # print(np.max(batch_x), np.mean(batch_x), np.median(batch_x))
        # Reshape each flattened image into a sequence of 784 timesteps with 1 feature each
        batch_x = batch_x.reshape((batch_size, timesteps, num_input))
        # Run optimization op (backprop)
        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y, is_training: True})
        if step % display_step == 0 or step == 1:
            # Calculate batch loss and accuracy
            loss, acc = sess.run([loss_op, accuracy], feed_dict={
                X: batch_x, Y: batch_y, is_training: False})
            # Calculate accuracy for 128 mnist test images
            test_len = 128
            test_data = mnist.test.images[:test_len].reshape((-1, timesteps, num_input))
            test_label = mnist.test.labels[:test_len]
            val_acc = sess.run(accuracy, feed_dict={X: test_data, Y: test_label, is_training: False})
            print("Step {}, Minibatch Loss= {:.4f}, Training Accuracy= {:.3f}, Test Accuracy= {:.3f}".format(
                step, loss, acc, val_acc))
            if val_acc > best_val_acc:
                best_val_acc = val_acc
                save_path = saver.save(sess, "/tmp/model.ckpt")
                print("Model saved in path: %s" % save_path)
    print("Optimization Finished!")

Step 1, Minibatch Loss= 3.6770, Training Accuracy= 0.109, Test Accuracy= 0.156
Step 500, Minibatch Loss= 0.1022, Training Accuracy= 0.953, Test Accuracy= 0.945
Model saved in path: /tmp/model.ckpt
Step 1000, Minibatch Loss= 0.2515, Training Accuracy= 0.922, Test Accuracy= 0.992
Model saved in path: /tmp/model.ckpt
Step 1500, Minibatch Loss= 0.0310, Training Accuracy= 0.984, Test Accuracy= 0.992
Step 2000, Minibatch Loss= 0.1406, Training Accuracy= 0.953, Test Accuracy= 0.984
Step 2500, Minibatch Loss= 0.0131, Training Accuracy= 1.000, Test Accuracy= 0.984
Step 3000, Minibatch Loss= 0.0228, Training Accuracy= 1.000, Test Accuracy= 0.984
Step 3500, Minibatch Loss= 0.1202, Training Accuracy= 0.969, Test Accuracy= 1.000
Model saved in path: /tmp/model.ckpt
Step 4000, Minibatch Loss= 0.0847, Training Accuracy= 0.984, Test Accuracy= 1.000
Step 4500, Minibatch Loss= 0.0906, Training Accuracy= 0.953, Test Accuracy= 0.992
Step 5000, Minibatch Loss= 0.0346, Training Accuracy= 0.984, Test Accurac
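
The test accuracy in this log is measured on the first 128 test images only, so it moves in steps of 1/128 (about 0.008): 0.992 corresponds to 127 of 128 correct. A sketch of evaluating a larger set in batches follows. `batched_accuracy` and `predict_fn` are hypothetical names, and `predict_fn` stands in for whatever returns predicted class ids for a batch of inputs.

```python
import numpy as np

def batched_accuracy(predict_fn, images, labels, batch_size=128):
    """Fraction of correct predictions, computed batch by batch.

    predict_fn: callable mapping an array of n inputs to n predicted class ids.
    labels: integer class ids (use np.argmax(one_hot, axis=1) for one-hot labels).
    """
    correct = 0
    for start in range(0, len(images), batch_size):
        batch = images[start:start + batch_size]
        preds = predict_fn(batch)
        correct += int(np.sum(preds == labels[start:start + batch_size]))
    return correct / len(images)

# Toy check with a dummy "model" whose input already encodes the label
images = np.arange(300).reshape(300, 1)
labels = images[:, 0] % 10
perfect = batched_accuracy(lambda b: b[:, 0] % 10, images, labels)
always_zero = batched_accuracy(lambda b: np.zeros(len(b), dtype=int), images, labels)
print(perfect, always_zero)  # 1.0 0.1
```

In this notebook, `predict_fn` could wrap a `sess.run` call on an argmax of `prediction` with `is_training: False`, after reshaping the images to `(-1, timesteps, num_input)`.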

## Permuted Sequential MNIST

In [None]:
training_steps = 5000

In [None]:
tf.reset_default_graph()
graph = tf.Graph()
with graph.as_default():
    tf.set_random_seed(10)
    # tf Graph input
    X = tf.placeholder("float", [None, timesteps, num_input])
    Y = tf.placeholder("float", [None, num_classes])
    is_training = tf.placeholder("bool")
    
    # Apply one fixed random permutation to the timesteps of every image
    np.random.seed(100)
    permute = np.random.permutation(timesteps)
    X_shuffled = tf.gather(X, permute, axis=1)
    
    # Build the TCN on the permuted input and classify from its output at the last timestep
    logits = tf.layers.dense(
        TemporalConvNet([nhid] * levels, kernel_size, dropout)(
            X_shuffled, training=is_training)[:, -1, :],
        num_classes, activation=None, 
        kernel_initializer=tf.orthogonal_initializer()
    )
    prediction = tf.nn.softmax(logits)
   
    # Define loss and optimizer
    loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(
        logits=logits, labels=Y))
    
    with tf.name_scope("optimizer"):
        # optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
        optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
        # gvs = optimizer.compute_gradients(loss_op)
        # for grad, var in gvs:
        #     if grad is None:
        #         print(var)
        # capped_gvs = [(tf.clip_by_value(grad, -.5, .5), var) for grad, var in gvs]
        # train_op = optimizer.apply_gradients(capped_gvs)    
        train_op = optimizer.minimize(loss_op)

    # Evaluate model (feed is_training=False at evaluation time to disable dropout)
    correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

    # Initialize the variables (i.e. assign their default value)
    init = tf.global_variables_initializer()
    saver = tf.train.Saver()
    # Global variables also include the optimizer's state, not only the model weights
    print("All parameters:", np.sum([np.prod([xi.value for xi in x.get_shape()]) for x in tf.global_variables()]))
    print("Trainable parameters:", np.sum([np.prod([xi.value for xi in x.get_shape()]) for x in tf.trainable_variables()]))

All parameters: 108992.0
Trainable parameters: 36330
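
The permutation step in the graph above reorders the 784 timesteps with a single fixed index array, so every image is shuffled the same way. The NumPy sketch below mirrors `tf.gather(X, permute, axis=1)` with fancy indexing on a random stand-in batch, and shows that the shuffle can be undone with `np.argsort`. The variable names other than `permute` are made up for the example.

```python
import numpy as np

np.random.seed(100)
permute = np.random.permutation(784)

batch = np.random.rand(2, 784, 1)   # stand-in for a (batch, timesteps, num_input) input
shuffled = batch[:, permute, :]     # NumPy counterpart of tf.gather(X, permute, axis=1)

# argsort of a permutation gives its inverse
inverse = np.argsort(permute)
restored = shuffled[:, inverse, :]

print(shuffled.shape)                   # (2, 784, 1)
print(np.array_equal(restored, batch))  # True
```

No pixel values are lost by the permutation. Only the temporal order changes, which removes the local structure that neighbouring pixels normally provide.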


In [None]:
# Start training
log_dir = "logs/tcn/%s" % datetime.now().strftime("%Y%m%d_%H%M")
Path(log_dir).mkdir(exist_ok=True, parents=True)
tb_writer = tf.summary.FileWriter(log_dir, graph)
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
best_val_acc = 0.8  # only save a checkpoint once test accuracy exceeds this value
with tf.Session(graph=graph, config=config) as sess:
    # Run the initializer
    sess.run(init)
    for step in range(1, training_steps+1):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # print(np.max(batch_x), np.mean(batch_x), np.median(batch_x))
        # Reshape each flattened image into a sequence of 784 timesteps with 1 feature each
        batch_x = batch_x.reshape((batch_size, timesteps, num_input))
        # Run optimization op (backprop)
        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y, is_training: True})
        if step % display_step == 0 or step == 1:
            # Calculate batch loss and accuracy
            loss, acc = sess.run([loss_op, accuracy], feed_dict={
                X: batch_x, Y: batch_y, is_training: False})
            # Calculate accuracy for 128 mnist test images
            test_len = 128
            test_data = mnist.test.images[:test_len].reshape((-1, timesteps, num_input))
            test_label = mnist.test.labels[:test_len]
            val_acc = sess.run(accuracy, feed_dict={X: test_data, Y: test_label, is_training: False})
            print("Step {}, Minibatch Loss= {:.4f}, Training Accuracy= {:.3f}, Test Accuracy= {:.3f}".format(
                step, loss, acc, val_acc))
            if val_acc > best_val_acc:
                best_val_acc = val_acc
                save_path = saver.save(sess, "/tmp/model.ckpt")
                print("Model saved in path: %s" % save_path)
    print("Optimization Finished!")

Step 1, Minibatch Loss= 3.7196, Training Accuracy= 0.125, Test Accuracy= 0.062
Step 500, Minibatch Loss= 0.3547, Training Accuracy= 0.906, Test Accuracy= 0.906
Model saved in path: /tmp/model.ckpt
Step 1000, Minibatch Loss= 0.2223, Training Accuracy= 0.922, Test Accuracy= 0.945
Model saved in path: /tmp/model.ckpt
Step 1500, Minibatch Loss= 0.4307, Training Accuracy= 0.906, Test Accuracy= 0.961
Model saved in path: /tmp/model.ckpt
Step 2000, Minibatch Loss= 0.1025, Training Accuracy= 0.938, Test Accuracy= 0.977
Model saved in path: /tmp/model.ckpt
Step 2500, Minibatch Loss= 0.2563, Training Accuracy= 0.891, Test Accuracy= 0.961
Step 3000, Minibatch Loss= 0.1184, Training Accuracy= 0.969, Test Accuracy= 0.969
Step 3500, Minibatch Loss= 0.1279, Training Accuracy= 0.953, Test Accuracy= 0.969
Step 4000, Minibatch Loss= 0.0419, Training Accuracy= 0.984, Test Accuracy= 0.984
Model saved in path: /tmp/model.ckpt
Step 4500, Minibatch Loss= 0.3604, Training Accuracy= 0.938, Test Accuracy= 0.969