# Using Identity Links in Deep Neural Networks

Shallow deep neural networks are the de facto model for structured data consisting only of numeric data (e.g., sensor measurements).

When numeric data columns are not normalized to the same range, a deep neural network will often either fail to converge or overfit the training data. We will demonstrate that a shallow deep neural network can be trained to converge without overfitting on non-normalized numeric data by adding an identity link from the input vector to each layer that follows the first.

We observe the following from this technique:

    1. It introduces regularization into the model (preventing overfitting).
    2. It adds stability to the slope of the validation loss/accuracy with non-normalized data.
    
The identity link is added as a vector concatenation, which is a fast operation. It adds only a nominal number of parameters: each layer that receives the link gains one extra weight per input feature for each of its nodes.
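The parameter cost of the identity link can be checked with simple arithmetic. The sketch below assumes the three-layer architecture used later in this notebook (two 10-node dense layers and a 3-node output layer) and counts weights plus biases for each dense layer. The helper names `dense_params` and `count_params` are hypothetical, not Keras functions.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return (n_inputs + 1) * n_units


def count_params(n_features, hidden=(10, 10), n_classes=3, identity_link=False):
    """Total trainable parameters of the shallow model.

    With identity_link=True, every layer after the first receives the
    previous layer's output concatenated with the raw input vector.
    """
    total = 0
    width = n_features  # width of the vector fed into the next layer
    for units in list(hidden) + [n_classes]:
        total += dense_params(width, units)
        # the next layer sees this layer's output, plus the input if linked
        width = units + (n_features if identity_link else 0)
    return total


for name, n_features in (("iris", 4), ("wine", 13)):
    plain = count_params(n_features)
    linked = count_params(n_features, identity_link=True)
    print(f"{name}: {plain} params without link, {linked} with link")
```

For iris this gives 193 parameters without the link and 245 with it; for wine, 283 and 452.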

## Datasets

In this notebook, we use two well-known numeric-only datasets, 'iris' and 'wine', for demonstration. Both datasets consist of numeric data only, with three output classes.

### Iris Dataset

This dataset consists of 150 examples for three output classes (50 each). The data consists of 4 numeric columns.

See the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/iris) for more details.

In [31]:
from sklearn import datasets
iris = datasets.load_iris()
X_iris = iris.data
Y_iris = iris.target

### Wine Dataset

This dataset consists of 178 examples for three output classes (59, 71, and 48 examples, respectively). The data consists of 13 numeric columns.

See the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/wine) for more details.

In [47]:
wine = datasets.load_wine()
X_wine = wine.data
Y_wine = wine.target

## Models

For both datasets, we will use the same shallow deep model, with the only difference being the size of the input vector (4 features for iris, 13 for wine):

        input layer : 10 node dense layer
        hidden layer: 10 node dense layer
        output layer: 3 node dense layer
        
We will train two versions of the model for each dataset, both on non-normalized data. The first version is trained w/o the identity link, and the second with the identity link.
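To make the difference between the two versions concrete, the following is a minimal pure-Python sketch of one forward pass, with and without the identity link. The weights are random placeholders, and the helper names `dense` and `forward` are assumptions for illustration only; the notebook itself builds the models with Keras.

```python
import math
import random


def dense(x, n_units, rng, activation="relu"):
    """Fully connected layer with random placeholder weights."""
    z = [sum(rng.uniform(-1, 1) * xi for xi in x) + rng.uniform(-1, 1)
         for _ in range(n_units)]
    if activation == "relu":
        return [max(0.0, v) for v in z]
    # softmax, shifted by the maximum for numerical stability
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def forward(x, identity_link, seed=0):
    """Return the class probabilities and the width seen by each layer."""
    rng = random.Random(seed)
    widths = [len(x)]
    h = dense(x, 10, rng)
    if identity_link:
        h = x + h  # list concatenation: raw input followed by layer output
    widths.append(len(h))
    h = dense(h, 10, rng)
    if identity_link:
        h = x + h
    widths.append(len(h))
    probs = dense(h, 3, rng, activation="softmax")
    return probs, widths


sample = [5.1, 3.5, 1.4, 0.2]  # four non-normalized feature values
_, plain_widths = forward(sample, identity_link=False)
_, linked_widths = forward(sample, identity_link=True)
print(plain_widths)   # [4, 10, 10]
print(linked_widths)  # [4, 14, 14]
```

With the link, the second and third layers each see the 4 raw features alongside the 10 activations of the previous layer, so the raw input stays visible to every layer regardless of what the earlier layers have learned.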

### Iris Model

    iris_model  : model w/o identity link
    iris_modelx : model with identity link

In [39]:
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense, Concatenate

inputs = Input(shape=(4,))
x = Dense(10, activation='relu')(inputs)
x = Dense(10, activation='relu')(x)
outputs = Dense(3, activation='softmax')(x)
iris_model = Model(inputs, outputs)
iris_model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['acc'])
iris_model.summary()

Model: "model_10"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_11 (InputLayer)        [(None, 4)]               0         
_________________________________________________________________
dense_30 (Dense)             (None, 10)                50        
_________________________________________________________________
dense_31 (Dense)             (None, 10)                110       
_________________________________________________________________
dense_32 (Dense)             (None, 3)                 33        
Total params: 193
Trainable params: 193
Non-trainable params: 0
_________________________________________________________________


In [40]:
inputs = Input(shape=(4,))
x = Dense(10, activation='relu')(inputs)
x = Concatenate()([inputs, x])
x = Dense(10, activation='relu')(x)
x = Concatenate()([inputs, x])
outputs = Dense(3, activation='softmax')(x)
iris_modelx = Model(inputs, outputs)
iris_modelx.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['acc'])
iris_modelx.summary()

Model: "model_11"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_12 (InputLayer)           [(None, 4)]          0                                            
__________________________________________________________________________________________________
dense_33 (Dense)                (None, 10)           50          input_12[0][0]                   
__________________________________________________________________________________________________
concatenate_10 (Concatenate)    (None, 14)           0           input_12[0][0]                   
                                                                 dense_33[0][0]                   
__________________________________________________________________________________________________
dense_34 (Dense)                (None, 10)           150         concatenate_10[0][0]      

### Wine Model

    wine_model  : model w/o identity link
    wine_modelx : model with identity link

In [41]:
inputs = Input(shape=(13,))
x = Dense(10, activation='relu')(inputs)
x = Dense(10, activation='relu')(x)
outputs = Dense(3, activation='softmax')(x)
wine_model = Model(inputs, outputs)
wine_model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['acc'])
wine_model.summary()

Model: "model_12"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_13 (InputLayer)        [(None, 13)]              0         
_________________________________________________________________
dense_36 (Dense)             (None, 10)                140       
_________________________________________________________________
dense_37 (Dense)             (None, 10)                110       
_________________________________________________________________
dense_38 (Dense)             (None, 3)                 33        
Total params: 283
Trainable params: 283
Non-trainable params: 0
_________________________________________________________________


In [42]:
inputs = Input(shape=(13,))
x = Dense(10, activation='relu')(inputs)
x = Concatenate()([inputs, x])
x = Dense(10, activation='relu')(x)
x = Concatenate()([inputs, x])
outputs = Dense(3, activation='softmax')(x)
wine_modelx = Model(inputs, outputs)
wine_modelx.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['acc'])
wine_modelx.summary()

Model: "model_13"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_14 (InputLayer)           [(None, 13)]         0                                            
__________________________________________________________________________________________________
dense_39 (Dense)                (None, 10)           140         input_14[0][0]                   
__________________________________________________________________________________________________
concatenate_12 (Concatenate)    (None, 23)           0           input_14[0][0]                   
                                                                 dense_39[0][0]                   
__________________________________________________________________________________________________
dense_40 (Dense)                (None, 10)           240         concatenate_12[0][0]      

## Results

Each training session runs for 30 epochs with stochastic (per-example) updates (batch size = 1).
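Each `fit` call in this section uses `validation_split`, which Keras documents as holding out the last fraction of the samples, taken before any shuffling. The sketch below reproduces the split sizes reported in the training logs and shows which classes land in the held-out tail. It assumes the labels are ordered by class, as they are in the arrays returned by `load_iris`. The rounding rule and the helper name `split_sizes` are assumptions chosen to match the logs, not a copy of the Keras source.

```python
from collections import Counter


def split_sizes(n_samples, validation_split):
    """Number of (train, validation) samples when the tail is held out."""
    n_train = int(n_samples * (1.0 - validation_split))
    return n_train, n_samples - n_train


print(split_sizes(150, 0.2))  # iris, 20% held out
print(split_sizes(178, 0.2))  # wine, 20% held out
print(split_sizes(178, 0.1))  # wine, 10% held out

# Iris labels as load_iris returns them: 50 of each class, in class order.
iris_labels = [0] * 50 + [1] * 50 + [2] * 50
n_train, _ = split_sizes(len(iris_labels), 0.2)
train_classes = Counter(iris_labels[:n_train])
val_classes = Counter(iris_labels[n_train:])
print(dict(train_classes), dict(val_classes))
```

Under these assumptions the held-out iris samples all belong to the last class, so shuffling the rows before calling `fit` (for example with `sklearn.utils.shuffle`) may give a more representative validation set.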

### Iris Results

In [44]:
iris_model.fit(X_iris, Y_iris, epochs=30, batch_size=1, verbose=1, validation_split=0.2)

Train on 120 samples, validate on 30 samples
Epoch 1/30
...
Epoch 30/30


<tensorflow.python.keras.callbacks.History at 0x7fa9c897b0f0>

In [45]:
iris_modelx.fit(X_iris, Y_iris, epochs=30, batch_size=1, verbose=1, validation_split=0.2)

Train on 120 samples, validate on 30 samples
Epoch 1/30
...
Epoch 30/30


<tensorflow.python.keras.callbacks.History at 0x7fa9e931a7b8>

### Wine Results

In [48]:
wine_model.fit(X_wine, Y_wine, epochs=30, batch_size=1, verbose=1, validation_split=0.2)

Train on 142 samples, validate on 36 samples
Epoch 1/30
...
Epoch 30/30


<tensorflow.python.keras.callbacks.History at 0x7fa9e91d8198>

In [49]:
# Note: this run holds out 10% for validation (the other runs hold out 20%).
wine_modelx.fit(X_wine, Y_wine, epochs=30, batch_size=1, verbose=1, validation_split=0.1)

Train on 160 samples, validate on 18 samples
Epoch 1/30
...
Epoch 30/30


<tensorflow.python.keras.callbacks.History at 0x7fa9e9470f98>
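The `History` objects returned by `fit` hold the per-epoch metrics in their `history` dictionary, so the two observations from the introduction can be checked numerically. The following is a sketch only: `summarize` is a hypothetical helper, the mean absolute epoch-to-epoch change is one assumed way to quantify the stability of the validation loss, and the numbers in `toy_history` are made-up placeholders, not results from this notebook.

```python
def summarize(history):
    """Summarize a Keras-style history dict of per-epoch metric lists."""
    val_loss = history["val_loss"]
    # epoch-to-epoch changes in validation loss
    steps = [abs(b - a) for a, b in zip(val_loss, val_loss[1:])]
    return {
        "final_train_loss": history["loss"][-1],
        "final_val_loss": val_loss[-1],
        # a large positive gap suggests overfitting
        "generalization_gap": val_loss[-1] - history["loss"][-1],
        # smaller values mean a smoother validation curve
        "mean_abs_val_step": sum(steps) / len(steps),
    }


# Placeholder values for illustration only.
toy_history = {
    "loss": [1.0, 0.5, 0.25, 0.125],
    "val_loss": [1.0, 0.5, 0.75, 0.25],
}
print(summarize(toy_history))
```

With real runs, one would assign the return value of each `fit` call (for example `h = iris_model.fit(...)`) and pass `h.history` to the helper, then compare the summaries of the model w/o the identity link and the model with it.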