# Using Identity Links in Deep Neural Networks

Shallow deep neural networks are the de facto model for structured data consisting only of numeric columns (e.g., sensor measurements).

When numeric columns are not normalized to the same range, a deep neural network will often either fail to converge or overfit the training data. We will demonstrate that a shallow deep neural network can be trained to converge without overfitting on non-normalized numeric data by adding an identity link that feeds the input vector forward to each subsequent layer.

We observe the following from this technique:

    1. It introduces regularization into the model (preventing overfitting).
    2. It stabilizes the slope of the validation loss/accuracy curves on non-normalized data.
    
The identity link does add parameters, but it is implemented as a vector concatenation, which is a fast operation and adds only a nominal number of parameters at each layer.
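The parameter overhead of the identity link can be worked out by hand. The sketch below is plain Python and not part of the original notebook; `dense_params` and `count_params` are hypothetical helper names, and the layer sizes assume the 10-10-3 dense architecture used in the Models section, with the input vector concatenated onto the output of each of the first two layers.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def count_params(n_features, units=(10, 10, 3), identity=False):
    """Total trainable parameters of the dense stack.

    With identity=True, the input vector is concatenated onto the output of
    every layer except the last, so each later layer sees n_features extra inputs.
    """
    total = 0
    n_in = n_features
    for n_out in units:
        total += dense_params(n_in, n_out)
        # Width seen by the next layer: this layer's output, plus the raw
        # input when the identity link is present.
        n_in = n_out + (n_features if identity else 0)
    return total


for name, n_features in [("iris", 4), ("wine", 13)]:
    plain = count_params(n_features)
    linked = count_params(n_features, identity=True)
    print(f"{name}: {plain} -> {linked} parameters (+{linked - plain})")
```

Under these assumptions the overhead is one extra weight per input feature for every node that receives the link, which comes to 52 parameters for the 4-feature model and 169 for the 13-feature model.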

## Datasets

In this notebook, we use two well-known numeric-only datasets, 'iris' and 'wine', for demonstration. Both datasets consist of numeric data only, with three output classes.

### Iris Dataset

This dataset consists of 150 examples for three output classes (50 each). The data consists of 4 numeric columns.

See the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/iris) for more details.

In [None]:
from sklearn import datasets
iris = datasets.load_iris()
X_iris = iris.data
Y_iris = iris.target

### Wine Dataset

This dataset consists of 178 examples for three output classes (~60 each). The data consists of 13 numeric columns.

See the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/wine) for more details.

In [None]:
wine = datasets.load_wine()
X_wine = wine.data
Y_wine = wine.target

## Models

For both datasets, we will use the same shallow model; the only difference is the size of the input vector:

        input layer : 10 node dense layer
        hidden layer: 10 node dense layer
        output layer: 3 node dense layer
        
We will train two versions of the model on non-normalized data for each dataset: first without the identity link, and then with the identity link.
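The wiring of the identity-link variant can be traced with a few lines of NumPy. This is an illustrative sketch with random, untrained weights, not the notebook's Keras code; `forward_with_identity`, the weight initialization, and the input rows are assumptions made for the example.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    # Subtract the row maximum for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def forward_with_identity(x, rng, units=(10, 10, 3)):
    """Forward pass through a dense stack, re-attaching the input after each hidden layer."""
    h = x
    shapes = [h.shape[1]]
    for i, n_out in enumerate(units):
        w = rng.normal(scale=0.1, size=(h.shape[1], n_out))
        b = np.zeros(n_out)
        z = h @ w + b
        if i < len(units) - 1:
            # Hidden layer: activate, then concatenate the raw input (identity link).
            h = np.concatenate([x, relu(z)], axis=1)
        else:
            # Output layer: class probabilities, no link needed.
            h = softmax(z)
        shapes.append(h.shape[1])
    return h, shapes


rng = np.random.default_rng(0)
x = rng.uniform(0.0, 8.0, size=(5, 4))  # 5 made-up rows with 4 features
probs, shapes = forward_with_identity(x, rng)
print(shapes)  # widths along the way: [4, 14, 14, 3]
```

The widths show that each layer after the first receives the 10 activations of the previous layer plus the 4 raw input features.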

### Iris Model

    iris_model  : model w/o identity link
    iris_modelx : model with identity link

In [None]:
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense, Concatenate

def make_iris():
    inputs = Input(shape=(4,))
    x = Dense(10, activation='relu')(inputs)
    x = Dense(10, activation='relu')(x)
    outputs = Dense(3, activation='softmax')(x)
    return Model(inputs, outputs)

iris_model = make_iris()
iris_model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['acc'])
iris_model.summary()

In [None]:
def make_iris_x():
    inputs = Input(shape=(4,))
    x = Dense(10, activation='relu')(inputs)
    x = Concatenate()([inputs, x])
    x = Dense(10, activation='relu')(x)
    x = Concatenate()([inputs, x])
    outputs = Dense(3, activation='softmax')(x)
    return Model(inputs, outputs)

iris_modelx = make_iris_x()
iris_modelx.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['acc'])
iris_modelx.summary()

### Wine Model

    wine_model  : model w/o identity link
    wine_modelx : model with identity link

In [None]:
def make_wine():
    inputs = Input(shape=(13,))
    x = Dense(10, activation='relu')(inputs)
    x = Dense(10, activation='relu')(x)
    outputs = Dense(3, activation='softmax')(x)
    return Model(inputs, outputs)

wine_model = make_wine()
wine_model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['acc'])
wine_model.summary()

In [None]:
def make_wine_x():
    inputs = Input(shape=(13,))
    x = Dense(10, activation='relu')(inputs)
    x = Concatenate()([inputs, x])
    x = Dense(10, activation='relu')(x)
    x = Concatenate()([inputs, x])
    outputs = Dense(3, activation='softmax')(x)
    return Model(inputs, outputs)

wine_modelx = make_wine_x()
wine_modelx.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['acc'])
wine_modelx.summary()

## Results

Each training session runs for 30 epochs using stochastic (per-sample) updates with a batch size of 1.
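Keras documents that `validation_split` takes the validation rows from the end of the arrays, before any shuffling. The sketch below mimics that in plain Python to show which rows a split of 0.2 holds out and how many weight updates a 30-epoch run at batch size 1 performs. `split_tail` is a hypothetical helper, and the label lists assume the class-sorted row order and class sizes of the scikit-learn copies of the two datasets.

```python
from collections import Counter


def split_tail(labels, validation_split):
    """Mimic a tail hold-out: the last fraction of the rows becomes validation data."""
    split_at = int(len(labels) * (1 - validation_split))
    return labels[:split_at], labels[split_at:]


# Assumption: class-sorted labels with the class sizes of the two datasets.
iris_labels = [0] * 50 + [1] * 50 + [2] * 50
wine_labels = [0] * 59 + [1] * 71 + [2] * 48

for name, labels in [("iris", iris_labels), ("wine", wine_labels)]:
    train, val = split_tail(labels, validation_split=0.2)
    updates = len(train) * 30  # one update per training row, 30 epochs
    print(name, "train:", dict(Counter(train)), "val:", dict(Counter(val)),
          "updates:", updates)
```

Under these assumptions the held-out rows all belong to the last class, which is worth keeping in mind when reading the validation accuracies reported in this section.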

### Iris w/o Identity Link

In [None]:
for i in range(3):
    print("ITER", i + 1)
    # Build a fresh model each iteration so runs do not share weights.
    model = make_iris()
    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['acc'])
    model.fit(X_iris, Y_iris, epochs=30, batch_size=1, verbose=1, validation_split=0.2)

#### Results (Non-Identity)

    ZERO - Number of epochs where validation accuracy < 3%
    AVE  - Average validation accuracy across 30 epochs.

    ITER 1: ZERO 9, AVE 30%
    ITER 2: ZERO 16, AVE 25%
    ITER 3: ZERO 1, AVE 62%

    Total number of ZERO: 26
    Average Acc across iterations: 39%
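The ZERO and AVE figures can be derived from the per-epoch validation accuracies of a run. A minimal sketch, assuming the accuracies are available as a list of fractions (for example from the `History` object returned by `fit`, under a key such as `val_acc`); `summarize_run` is a hypothetical helper and the sample values are made up for illustration.

```python
def summarize_run(val_acc, threshold=0.03):
    """Return (ZERO, AVE) for one training run.

    ZERO counts epochs whose validation accuracy is below the threshold;
    AVE is the mean validation accuracy in percent, rounded to an integer.
    """
    zero = sum(1 for acc in val_acc if acc < threshold)
    ave = round(100 * sum(val_acc) / len(val_acc))
    return zero, ave


# Made-up accuracies for a 5-epoch run, not results from this notebook.
example = [0.0, 0.0, 0.5, 0.75, 0.75]
zero, ave = summarize_run(example)
print(f"ZERO {zero}, AVE {ave}%")
```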

In [None]:
for i in range(3):
    print("ITER", i + 1)
    # Build a fresh model each iteration so runs do not share weights.
    model = make_iris_x()
    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['acc'])
    model.fit(X_iris, Y_iris, epochs=30, batch_size=1, verbose=1, validation_split=0.2)

#### Results (Identity)

    ZERO - Number of epochs where validation accuracy < 3%
    AVE  - Average validation accuracy across 30 epochs.

    ITER 1: ZERO 7, AVE 30%
    ITER 2: ZERO 3, AVE 25%
    ITER 3: ZERO 5, AVE 62%

    Total number of ZERO: 15
    Average Acc across iterations: 49%

### Wine w/o Identity Link


In [None]:
for i in range(3):
    print("ITER", i + 1)
    # Build a fresh model each iteration so runs do not share weights.
    model = make_wine()
    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['acc'])
    model.fit(X_wine, Y_wine, epochs=30, batch_size=1, verbose=1, validation_split=0.2)

#### Results (Non-Identity)

    ZERO - Number of epochs where validation accuracy < 3%
    AVE  - Average validation accuracy across 30 epochs.

    ITER 1: ZERO 28, AVE 3%
    ITER 2: ZERO 14, AVE 28%
    ITER 3: ZERO 14, AVE 24%

    Total number of ZERO: 56
    Average Acc across iterations: 18%

In [None]:
for i in range(3):
    print("ITER", i + 1)
    # Build a fresh model each iteration so runs do not share weights.
    model = make_wine_x()
    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['acc'])
    model.fit(X_wine, Y_wine, epochs=30, batch_size=1, verbose=1, validation_split=0.1)

#### Results (Identity)

    ZERO - Number of epochs where validation accuracy < 3%
    AVE  - Average validation accuracy across 30 epochs.

    ITER 1: ZERO 5, AVE 58%
    ITER 2: ZERO 1, AVE 81%
    ITER 3: ZERO 2, AVE 81%

    Total number of ZERO: 8
    Average Acc across iterations: 73%