# Keras/TensorFlow implementation of the Deep Adaptive Input Normalization layer for Time Series Forecasting

This notebook contains a Keras/TensorFlow layer implementation of the Deep Adaptive Input Normalization (DAIN) model for time series forecasting proposed by Passalis *et al.* ([Deep Adaptive Input Normalization for Time series Forecasting](https://arxiv.org/pdf/1902.07892.pdf)).

The authors of the paper provide a PyTorch implementation of the model ([PyTorch implementation](https://github.com/passalis/dain)). A slightly revised version of it (in terms of software structure) is reported here, and the results obtained by the two implementations are compared on an illustrative example.
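For reference, the computation performed by both layers in this notebook can be summarized in a few lines of NumPy. This is a minimal sketch, not the authors' code: the function name `dain_forward` and the parameter names `w_mean`, `w_scale`, `w_gate` and `b_gate` are hypothetical. It assumes the same layout as the notebook layers, i.e. inputs of shape (n_samples, n_features, n_timesteps), and applies each linear map to per-feature summaries over time, as `nn.Linear`/`Dense` do.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def dain_forward(x, w_mean, w_scale, w_gate, b_gate, eps=1e-8):
    """Hypothetical NumPy sketch of the three DAIN steps.

    x: array of shape (n_samples, n_features, n_timesteps).
    w_mean, w_scale, w_gate: (n_features, n_features) matrices, applied as
    summary @ W.T like nn.Linear. b_gate: (n_features,) gating bias.
    """
    # Step 1: adaptive average. Time-average each feature, map it linearly,
    # and subtract the result from every time step.
    shift = x.mean(axis=2) @ w_mean.T
    x = x - shift[:, :, None]

    # Step 2: adaptive scaling. RMS of each feature over time, mapped linearly;
    # (near-)zero scales are replaced by 1 as in the notebook layers.
    scale = np.sqrt((x ** 2).mean(axis=2) + eps) @ w_scale.T
    scale = np.where(scale <= eps, 1.0, scale)
    x = x / scale[:, :, None]

    # Step 3: gating. Sigmoid of an affine map of the time-averaged features.
    gate = sigmoid(x.mean(axis=2) @ w_gate.T + b_gate)
    return x * gate[:, :, None]


# Usage: identity shift/scale matrices and an all-zero gating layer
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3, 10))
eye = np.eye(3)
out = dain_forward(x, eye, eye, np.zeros((3, 3)), np.zeros(3))
```

With identity matrices for the first two steps, every feature ends up with zero mean and unit RMS over time; an all-zero gating layer then multiplies everything by the constant gate sigmoid(0) = 0.5.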

### Keras/TensorFlow implementation

In [85]:
import tensorflow as tf
tf.keras.backend.set_floatx('float64')

In [212]:
class Adaptive_Normalizer_Layer(tf.keras.layers.Layer):
    '''
    Deep Adaptive Input Normalization (DAIN) layer.

    PARAMETERS

    :param mode: Type of normalization to be performed.
                    - None returns the inputs unchanged
                    - 'adaptive_average' performs the adaptive average of the inputs
                    - 'adaptive_scale' performs the adaptive average followed by the adaptive z-score normalization of the inputs
                    - 'full' (Default) performs the complete normalization process: adaptive_average + adaptive_scale + gating
    :param input_dim: Number of features, i.e. the size of axis 1 of the inputs
    '''

    def __init__(self, mode = 'full', input_dim = 5, **kwargs):
        super(Adaptive_Normalizer_Layer, self).__init__(**kwargs)

        self.mode = mode
        self.eps = 1e-8

        # Identity initialization: at the start of training the adaptive
        # average/scale reduce to the plain per-feature mean and RMS
        initializer = tf.keras.initializers.Identity()
        # GlorotNormal is used for both the kernel and the bias of the gating layer
        gate_initializer = tf.keras.initializers.GlorotNormal()
        self.linear_1 = tf.keras.layers.Dense(input_dim, kernel_initializer=initializer, use_bias=False)
        self.linear_2 = tf.keras.layers.Dense(input_dim, kernel_initializer=initializer, use_bias=False)
        self.linear_3 = tf.keras.layers.Dense(input_dim, kernel_initializer=gate_initializer, bias_initializer=gate_initializer)

    def call(self, inputs):
        # Expecting (n_samples, n_features, n_timesteps)

        def adaptive_avg(x):
            # Per-feature time average, mapped by a learnable linear layer
            avg = tf.reduce_mean(x, axis=2)
            adaptive_avg = self.linear_1(avg)
            # Add a trailing axis so the shift broadcasts over the time axis
            # (unlike .numpy() on the shape, this also works in graph mode)
            return x - tf.expand_dims(adaptive_avg, axis=-1)

        def adaptive_std(x):
            # Per-feature RMS over time, mapped by a learnable linear layer
            std = tf.reduce_mean(x ** 2, axis=2)
            std = tf.sqrt(std + self.eps)
            adaptive_std = self.linear_2(std)
            # Replace (near-)zero scales by 1 to avoid dividing by zero
            adaptive_std = tf.where(adaptive_std <= self.eps, tf.ones_like(adaptive_std), adaptive_std)
            return x / tf.expand_dims(adaptive_std, axis=-1)

        def gating(x):
            # Sigmoid gate computed from the per-feature time average
            gate = tf.reduce_mean(x, axis=2)
            gate = tf.sigmoid(self.linear_3(gate))
            return x * tf.expand_dims(gate, axis=-1)

        if self.mode is None:
            x = inputs

        elif self.mode == 'adaptive_average':
            x = adaptive_avg(inputs)

        elif self.mode == 'adaptive_scale':
            x = adaptive_avg(inputs)
            x = adaptive_std(x)

        elif self.mode == 'full':
            x = adaptive_avg(inputs)
            x = adaptive_std(x)
            x = gating(x)

        else:
            raise ValueError('Unknown normalization mode: {}'.format(self.mode))

        return x

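Let's now run a small experiment to test the *Adaptive_Normalizer_Layer*. The example tensor contains 3 samples, each made of 2 time steps with 5 features (shape (3, 2, 5)); it is transposed to shape (3, 5, 2) because the layer expects the features on axis 1 and the time steps on axis 2.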

In [213]:
example_tensor = tf.constant([
  [[0.0, 1.0, 2.0, 3.0, 4.0],
   [5.0, 6.0, 7.0, 8.0, 9.0]],
  [[10.0, 11.0, 12.0, 13.0, 14.0],
   [15.0, 16.0, 17.0, 18.0, 19.0]],
  [[20.0, 21.0, 22.0, 23.0, 24.0],
   [25.0, 26.0, 27.0, 28.0, 29.0]]], dtype=tf.float64)

keras_layer = Adaptive_Normalizer_Layer()
# (samples, time steps, features) -> (samples, features, time steps), as expected by the layer
example_tensor = tf.transpose(example_tensor, perm=[0, 2, 1])

In [214]:
output = keras_layer(example_tensor)

In [215]:
output = tf.transpose(output, perm=[0, 2, 1])
output

<tf.Tensor: id=2926, shape=(3, 2, 5), dtype=float64, numpy=
array([[[-0.3957139 , -0.50543095, -0.60220335, -0.70781163,
         -0.3916883 ],
        [ 0.3957139 ,  0.50543095,  0.60220335,  0.70781163,
          0.3916883 ]],

       [[-0.3957139 , -0.50543095, -0.60220335, -0.70781163,
         -0.3916883 ],
        [ 0.3957139 ,  0.50543095,  0.60220335,  0.70781163,
          0.3916883 ]],

       [[-0.3957139 , -0.50543095, -0.60220335, -0.70781163,
         -0.3916883 ],
        [ 0.3957139 ,  0.50543095,  0.60220335,  0.70781163,
          0.3916883 ]]])>
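The output above has a simple structure: within every sample, the two time steps of each feature are exactly opposite, and all samples are identical. The NumPy check below suggests why. It rebuilds `example_tensor` with `np.arange` and applies the first two steps with the identity matrices both layers start from (the variable names are illustrative): each feature becomes (-1, +1) over time, so the time average fed to the gating layer is zero and each gate reduces to the sigmoid of the gating bias.

```python
import numpy as np

# Same values as example_tensor: 3 samples, 2 time steps, 5 features
x = np.arange(30, dtype=np.float64).reshape(3, 2, 5)
x = x.transpose(0, 2, 1)          # -> (samples, features, time steps)

eps = 1e-8
identity = np.eye(5)              # initial weights of the average/scale layers

# Step 1 at initialization: subtract each feature's time average
shifted = x - (x.mean(axis=2) @ identity.T)[:, :, None]

# Step 2 at initialization: divide by each feature's RMS over time
scale = np.sqrt((shifted ** 2).mean(axis=2) + eps) @ identity.T
normalized = shifted / scale[:, :, None]

# Input of the gating layer: time average of the normalized values
gate_input = normalized.mean(axis=2)

print(normalized[0])              # each feature is (-1, +1) up to eps
print(gate_input)                 # all zeros
```

Since the gating layer receives a zero vector, its output is just its bias b, and every entry of the final output is ±sigmoid(b) for the corresponding feature.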

### PyTorch implementation

In [216]:
import torch
import torch.nn as nn

import numpy as np
import torch.nn.functional as F 

In [217]:
class DAIN_Layer(nn.Module):
    def __init__(self, mode='full', mean_lr=0.00001, gate_lr=0.001, scale_lr=0.00001, input_dim=5):
        super(DAIN_Layer, self).__init__()

        self.mode = mode
        # Learning rates of the three sub-layers: not used in forward(),
        # they are meant for building the optimizer's parameter groups
        self.mean_lr = mean_lr
        self.gate_lr = gate_lr
        self.scale_lr = scale_lr

        # Parameters for adaptive average (identity initialization)
        self.mean_layer = nn.Linear(input_dim, input_dim, bias=False)
        nn.init.eye_(self.mean_layer.weight)

        # Parameters for adaptive std (identity initialization)
        self.scaling_layer = nn.Linear(input_dim, input_dim, bias=False)
        nn.init.eye_(self.scaling_layer.weight)

        # Parameters for gating (PyTorch default initialization)
        self.gating_layer = nn.Linear(input_dim, input_dim)

        self.eps = 1e-8

    def forward(self, x):
        # Expecting (batch_size, n_features, n_timesteps)

        def adaptive_avg(x):
            avg = torch.mean(x, 2)
            print(avg)          # debug: per-feature time averages
            print(avg.shape)
            adaptive_avg = self.mean_layer(avg)
            # unsqueeze(2) adds the time axis for broadcasting (Tensor.resize is deprecated)
            x = x - adaptive_avg.unsqueeze(2)
            return x

        def adaptive_std(x):
            std = torch.mean(x ** 2, 2)
            std = torch.sqrt(std + self.eps)
            adaptive_std = self.scaling_layer(std)
            # Replace (near-)zero scales by 1 to avoid dividing by zero
            adaptive_std = torch.where(adaptive_std <= self.eps, torch.ones_like(adaptive_std), adaptive_std)
            x = x / adaptive_std.unsqueeze(2)
            return x

        def gating(x):
            avg = torch.mean(x, 2)
            print(avg)          # debug: input of the gating layer
            avg = self.gating_layer(avg)
            print(avg)          # debug: output of the gating layer
            gate = torch.sigmoid(avg)
            x = x * gate.unsqueeze(2)
            return x

        # Nothing to normalize
        if self.mode is None:
            pass

        # Do simple average normalization
        elif self.mode == 'avg':
            avg = torch.mean(x, 2)
            x = x - avg.unsqueeze(2)

        # Perform only the adaptive averaging step
        elif self.mode == 'adaptive_avg':
            x = adaptive_avg(x)

        # Perform the adaptive averaging + adaptive scaling
        elif self.mode == 'adaptive_scale':
            # Step 1
            x = adaptive_avg(x)
            # Step 2
            x = adaptive_std(x)

        # Perform the adaptive averaging + adaptive scaling + gating
        elif self.mode == 'full':
            # Step 1
            x = adaptive_avg(x)
            # Step 2
            x = adaptive_std(x)
            # Step 3
            x = gating(x)

        else:
            raise ValueError('Unknown normalization mode: {}'.format(self.mode))

        return x
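Because `mean_layer` is initialized to the identity, the 'adaptive_avg' mode starts out identical to the plain 'avg' mode and only departs from it once training changes the weights. The NumPy sketch below illustrates this; `plain_avg` and `adaptive_avg` are hypothetical helpers mirroring the two branches of `forward`, and a random perturbation of the identity merely stands in for a trained matrix.

```python
import numpy as np


def plain_avg(x):
    # 'avg' mode: subtract each feature's time average
    return x - x.mean(axis=2, keepdims=True)


def adaptive_avg(x, w_mean):
    # 'adaptive_avg' mode: the time averages are first mapped by a
    # learnable matrix (applied as avg @ W.T, like nn.Linear without bias)
    shift = x.mean(axis=2) @ w_mean.T
    return x - shift[:, :, None]


rng = np.random.default_rng(42)
x = rng.normal(loc=3.0, size=(8, 5, 20))   # (samples, features, time steps)

# With the identity initialization the two modes coincide...
same = np.allclose(adaptive_avg(x, np.eye(5)), plain_avg(x))

# ...while a different matrix (stand-in for a trained one) mixes the feature averages
w = np.eye(5) + 0.1 * rng.normal(size=(5, 5))
different = not np.allclose(adaptive_avg(x, w), plain_avg(x))
print(same, different)
```

The same holds for `scaling_layer`: at initialization the adaptive scaling simply divides each feature by its RMS over time.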

Let's now run the same experiment with the *DAIN_Layer* and compare its results with those of the *Adaptive_Normalizer_Layer*.

In [218]:
example_tensor = torch.tensor([
  [[0.0, 1.0, 2.0, 3.0, 4.0],
   [5.0, 6.0, 7.0, 8.0, 9.0]],
  [[10.0, 11.0, 12.0, 13.0, 14.0],
   [15.0, 16.0, 17.0, 18.0, 19.0]],
  [[20.0, 21.0, 22.0, 23.0, 24.0],
   [25.0, 26.0, 27.0, 28.0, 29.0]],])

torch_layer = DAIN_Layer(mode='full')
example_tensor.shape

torch.Size([3, 2, 5])


In [220]:
example_tensor = example_tensor.transpose(1, 2)
output = torch_layer(example_tensor)

tensor([[ 2.5000,  3.5000,  4.5000,  5.5000,  6.5000],
        [12.5000, 13.5000, 14.5000, 15.5000, 16.5000],
        [22.5000, 23.5000, 24.5000, 25.5000, 26.5000]])
torch.Size([3, 5])
tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]], grad_fn=<MeanBackward1>)
tensor([[-0.4400,  0.0863,  0.1173, -0.2174, -0.0367],
        [-0.4400,  0.0863,  0.1173, -0.2174, -0.0367],
        [-0.4400,  0.0863,  0.1173, -0.2174, -0.0367]],
       grad_fn=<AddmmBackward>)


In [221]:
output.transpose(1, 2)

tensor([[[-0.3917, -0.5216, -0.5293, -0.4459, -0.4908],
         [ 0.3917,  0.5216,  0.5293,  0.4459,  0.4908]],

        [[-0.3917, -0.5216, -0.5293, -0.4459, -0.4908],
         [ 0.3917,  0.5216,  0.5293,  0.4459,  0.4908]],

        [[-0.3917, -0.5216, -0.5293, -0.4459, -0.4908],
         [ 0.3917,  0.5216,  0.5293,  0.4459,  0.4908]]],
       grad_fn=<TransposeBackward0>)

### Conclusions

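The two implementations produce outputs with the same structure, and they are identical up to the gating step. After the adaptive average and the adaptive scaling (both initialized to the identity), every feature becomes (-1, +1) over the two time steps, so the time average fed to the gating layer is zero, as the second tensor printed by the PyTorch layer shows. Each gate is therefore the sigmoid of the randomly initialized bias of the Dense/Linear gating layer, and these different random initializations are the only source of the numerical differences between the two outputs.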
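This can be checked with the values printed by the PyTorch cell. The third printed tensor is the gating-layer output for an all-zero input, i.e. its bias; applying the sigmoid to it and multiplying by the normalized values (-1, +1) reproduces the PyTorch output to four decimals. The variable names below are illustrative.

```python
import numpy as np

# Output of the PyTorch gating layer printed above (its input was all zeros,
# so these values are the layer's randomly initialized bias)
torch_gate_bias = np.array([-0.4400, 0.0863, 0.1173, -0.2174, -0.0367])

# Gate = sigmoid(bias), one value per feature
gate = 1.0 / (1.0 + np.exp(-torch_gate_bias))

# After the first two steps each feature is (-1, +1) over the two time steps,
# so every sample's output is -gate at the first step and +gate at the second
predicted = np.stack([-gate, gate])   # (time steps, features)
print(np.round(predicted, 4))
```

A like-for-like comparison would require copying the gating weights from one layer to the other. Note that a Keras `Dense` kernel has shape (input_dim, units), while a PyTorch `nn.Linear` weight has shape (out_features, in_features), so the matrix has to be transposed when copying.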

### Disclaimer

The author of this notebook only implemented the Keras/TensorFlow version of a model originally proposed by Passalis *et al.* ([Deep Adaptive Input Normalization for Time series Forecasting](https://arxiv.org/pdf/1902.07892.pdf)).