# Introduction

While FFNs (Feed-Forward Networks) with Batch Normalization have great potential to exploit the many levels of abstract representation that come with a deep network, the number of layers that can be trained in practice is limited. Beyond a certain depth, training with SGD becomes unstable, and the network runs into problems such as vanishing and exploding gradients. Moreover, SGD and regularization techniques such as dropout often perturb Batch Normalization, which leads to high variance in the training error. Self-Normalizing Neural Networks are designed to address these problems.

Self-Normalizing Neural Networks (SNNs) are neural networks whose activations are automatically pushed toward zero mean and unit variance (per neuron), without explicit normalization layers. This is achieved with the SELU (Scaled Exponential Linear Unit) activation function, which relies on LeCun Normal kernel initialization.
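The following NumPy sketch shows what "self-normalizing" means numerically. It is not the Keras implementation. It defines SELU with the constants from the paper (λ ≈ 1.0507, α ≈ 1.6733). It also defines a hypothetical `lecun_normal` helper that draws weights from a plain normal distribution with variance 1/fan_in; Keras' `LecunNormal` uses a truncated normal instead. Following the paper's analysis, the inputs are assumed to be standardized. The sketch checks two things: that SELU maps standard-normal inputs to roughly zero mean and unit variance, and that this property approximately survives a deep stack of randomly initialized layers.

```python
import numpy as np

# SELU constants from Klambauer et al. (2017); they make (mean, var) = (0, 1) a fixed point.
ALPHA = 1.6732632423543772848170429916717
LAMBDA = 1.0507009873554804934193349852946


def selu(x):
    """lambda * x for x > 0, lambda * alpha * (exp(x) - 1) otherwise."""
    # np.minimum avoids overflow in expm1 for large positive x (that branch is discarded anyway).
    return LAMBDA * np.where(x > 0, x, ALPHA * np.expm1(np.minimum(x, 0.0)))


def lecun_normal(fan_in, fan_out, rng):
    """Hypothetical helper: zero-mean Gaussian weights with variance 1 / fan_in (untruncated)."""
    return rng.normal(0.0, np.sqrt(1.0 / fan_in), size=(fan_in, fan_out))


rng = np.random.default_rng(0)

# 1) A single SELU applied to z ~ N(0, 1): mean stays ~0, variance stays ~1.
z = rng.standard_normal(1_000_000)
out = selu(z)
print(f"single SELU: mean={out.mean():.4f}, var={out.var():.4f}")

# 2) Standardized inputs pushed through a deep stack of SELU layers with LeCun normal weights.
width, depth = 256, 32
h = rng.standard_normal((1000, width))
for _ in range(depth):
    h = selu(h @ lecun_normal(width, width, rng))
print(f"after {depth} layers: mean={h.mean():.4f}, var={h.var():.4f}")
```

If you swap `selu` for `np.tanh` in the loop, the variance shrinks layer by layer. With SELU it stays close to 1, which is the behaviour the quote below refers to as "variance stabilization".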

The following is an excerpt from the [research paper](https://arxiv.org/pdf/1706.02515.pdf) on Self-Normalizing Neural Networks:

> Self-normalizing neural networks (SNNs) are robust to perturbations and do not have high variance
in their training errors. SNNs push neuron activations to zero mean and unit variance
thereby leading to the same effect as batch normalization, which enables to robustly learn many
layers. SNNs are based on scaled exponential linear units “SELUs” which induce self-normalizing
properties like variance stabilization which in turn avoids exploding and vanishing gradients.

# Code

## Some preprocessing

In [2]:
pip install tensorflow



In [2]:
import numpy as np
import pandas as pd

In [3]:
train_df = pd.read_csv("../input/tabular-playground-series-nov-2021/train.csv")
test_df = pd.read_csv("../input/tabular-playground-series-nov-2021/test.csv")
sub_df = pd.read_csv("../input/tabular-playground-series-nov-2021/sample_submission.csv")


Separating features and targets

In [3]:
train_df.drop(columns=["id"], inplace=True)
test_df.drop(columns=["id"], inplace=True)

X = train_df.drop(columns=["target"]).values
y = train_df["target"].values

## Building SNN Model

Notice that the **[LeCun Normal](https://www.tensorflow.org/api_docs/python/tf/keras/initializers/LecunNormal)** kernel initializer is used instead of the default one (Glorot uniform). This network does not contain dropout layers, but deep networks with a large number of neurons often use [dropout](https://keras.io/api/layers/regularization_layers/dropout/) layers. However, the authors of the SNN paper advise against standard dropout, because it disturbs the zero-mean, unit-variance property of the activations. They propose a new technique called **alpha dropout** instead, which is designed to preserve that property. **[Alpha dropout](https://keras.io/api/layers/regularization_layers/alpha_dropout/)** is available as a layer in Keras.
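The NumPy sketch below shows the training-time behaviour of alpha dropout. It follows the paper's construction rather than Keras' source. Dropped units are set to α' = -λα ≈ -1.7581 instead of 0; this is the value SELU saturates to for large negative inputs. An affine correction `a * x + b` then restores zero mean and unit variance. The sketch assumes the incoming activations already have mean 0 and variance 1, and it compares the result against ordinary (inverted) dropout at the same rate.

```python
import numpy as np

ALPHA = 1.6732632423543772848170429916717
LAMBDA = 1.0507009873554804934193349852946
ALPHA_PRIME = -LAMBDA * ALPHA  # SELU's negative saturation value, about -1.7581


def alpha_dropout(x, rate, rng):
    """Training-time alpha dropout (sketch); assumes x has mean 0 and variance 1."""
    keep = rng.random(x.shape) >= rate  # each unit is kept with probability 1 - rate
    # Affine correction so that the output again has mean 0 and variance 1.
    a = ((1.0 - rate) * (1.0 + rate * ALPHA_PRIME**2)) ** -0.5
    b = -a * ALPHA_PRIME * rate
    return a * np.where(keep, x, ALPHA_PRIME) + b


rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)

# Ordinary inverted dropout: zero out units, rescale the survivors by 1 / (1 - rate).
standard = np.where(rng.random(x.shape) >= 0.1, x, 0.0) / 0.9
alpha = alpha_dropout(x, rate=0.1, rng=rng)

print(f"standard dropout: mean={standard.mean():.3f}, var={standard.var():.3f}")
print(f"alpha dropout:    mean={alpha.mean():.3f}, var={alpha.var():.3f}")
```

Ordinary dropout keeps the mean at 0 but inflates the variance to about 1/(1 - rate) ≈ 1.11, which breaks self-normalization. Alpha dropout keeps both moments at their fixed point. At inference time, both are identity operations.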

Although I have built an SNN with only 2 hidden layers here, it is possible to stack many more layers in an SNN. The first hidden layer has 160 neurons and the second has 96.
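As a quick sanity check on model size, each `Dense` layer has `inputs × units` weights plus `units` biases. The sketch below uses a hypothetical helper, `dense_param_count`. It also assumes 100 input features, because the notebook never shows the column count of `X`, so treat the total as illustrative.

```python
def dense_param_count(layer_sizes):
    """Trainable parameters of a stack of Dense layers: in * out weights + out biases per layer."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))


N_FEATURES = 100  # assumption: number of columns left after dropping "id" and "target"
total = dense_param_count([N_FEATURES, 160, 96, 1])
print(total)  # 16160 + 15456 + 97 = 31713
```

If the input really has 100 columns, this total should match what `build_model().summary()` reports.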

In [1]:
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential


def build_model():
    model = Sequential([
        layers.Dense(units=160, activation="selu", kernel_initializer="lecun_normal", input_shape=X.shape[1:]),
        layers.Dense(units=96, activation="selu", kernel_initializer="lecun_normal"),
        layers.Dense(units=1, activation="sigmoid")
    ])

    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=["AUC"]
    )

    return model


build_model().summary()


Defining the callbacks

In [5]:
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping


reduce_lr = ReduceLROnPlateau(
    monitor="val_loss",
    factor=0.8,
    patience=10,
)

early_stop = EarlyStopping(
    monitor="val_loss",
    patience=60,
    restore_best_weights=True
)

callbacks = [reduce_lr, early_stop]
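The two callbacks interact, because both count epochs without improvement in `val_loss`. The sketch below is a simplified, hypothetical re-implementation of that waiting logic; `simulate_callbacks` is not a Keras function. It ignores `min_delta` (Keras' `ReduceLROnPlateau` defaults to 1e-4), `cooldown`, and `min_lr`, and it assumes Adam's default learning rate of 0.001.

```python
def simulate_callbacks(val_losses, lr=1e-3, factor=0.8, lr_patience=10, stop_patience=60):
    """Simplified ReduceLROnPlateau + EarlyStopping, both monitoring val_loss.

    Returns (stop_epoch, final_lr); stop_epoch is None if training never stops early.
    """
    best = float("inf")
    lr_wait = stop_wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:  # an improvement resets both counters
            best = loss
            lr_wait = stop_wait = 0
            continue
        lr_wait += 1
        stop_wait += 1
        if lr_wait >= lr_patience:  # plateau reached: shrink the learning rate
            lr *= factor
            lr_wait = 0
        if stop_wait >= stop_patience:  # no improvement for too long: stop training
            return epoch, lr
    return None, lr


# Hypothetical run: val_loss improves for 5 epochs, then stays flat.
losses = [1.0, 0.9, 0.8, 0.7, 0.6] + [0.6] * 100
stop_epoch, final_lr = simulate_callbacks(losses)
print(stop_epoch, final_lr)
```

With these settings, a run that stalls has its learning rate reduced up to six times before early stopping ends it. That leaves the learning rate at 0.8^6 ≈ 26% of its starting value. `restore_best_weights=True` then rolls the model back to the weights from the best epoch.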

## Training Model

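I have used stratified K-fold cross-validation (`StratifiedKFold`) with 7 folds. In each fold, the `MinMaxScaler` is fit on the training split only and then applied to the validation split and the test set. To speed up training, a batch size of 2048 is used.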
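The toy example below shows what the stratification buys. It uses hypothetical random data (700 rows, about 30% positives) instead of the competition files. Every validation fold keeps roughly the same positive rate as the full dataset. The scaler is fit inside the loop on the training indices only, as in the training cell.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(42)
# Hypothetical stand-in data: 700 rows, 3 features, ~30% positive labels.
X_toy = rng.normal(size=(700, 3))
y_toy = (rng.random(700) < 0.3).astype(int)

cv = StratifiedKFold(n_splits=7, shuffle=True, random_state=42)
fold_pos_rates = []
for train_idx, val_idx in cv.split(X_toy, y_toy):
    # Fit on the training split only, so validation statistics never leak into the scaling.
    scaler = MinMaxScaler().fit(X_toy[train_idx])
    X_tr = scaler.transform(X_toy[train_idx])  # lies in [0, 1] by construction
    X_va = scaler.transform(X_toy[val_idx])    # may fall slightly outside [0, 1]
    fold_pos_rates.append(y_toy[val_idx].mean())

print(f"overall positive rate: {y_toy.mean():.3f}")
print("per-fold positive rates:", np.round(fold_pos_rates, 3))
```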

In [6]:
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold


EPOCHS = 500
BATCH_SIZE = 2048
FOLDS = 7

cv = StratifiedKFold(n_splits=FOLDS, shuffle=True, random_state=42)
test_preds = []
mean_score = 0

for fold, (train_idx, val_idx) in enumerate(cv.split(X, y)):
    X_train, y_train = X[train_idx], y[train_idx]
    X_val, y_val = X[val_idx], y[val_idx]

    scaler = MinMaxScaler()

    X_train = scaler.fit_transform(X_train)
    X_val = scaler.transform(X_val)
    X_test = scaler.transform(test_df.values)  # plain array, matching how the scaler was fit

    model = build_model()

    model.fit(
        X_train,
        y_train,
        validation_data=(X_val, y_val),
        epochs=EPOCHS,
        batch_size=BATCH_SIZE,
        callbacks=callbacks,
        verbose=False
    )

    y_pred = model.predict(X_val)
    score = roc_auc_score(y_val, y_pred)
    mean_score += score

    print(f"FOLD {fold} | Score: {score}")

    test_preds.append(model.predict(X_test))


print()
print(f"Mean score of all folds: {mean_score/FOLDS}")



FOLD 0 | Score: 0.753520126342355
FOLD 1 | Score: 0.7517027327737468
FOLD 2 | Score: 0.7518236041418409
FOLD 3 | Score: 0.7550353868193339
FOLD 4 | Score: 0.7535631363554156
FOLD 5 | Score: 0.7536377961395959
FOLD 6 | Score: 0.7578906297465712

Mean score of all folds: 0.7538819160455513


In [7]:
sub_df["target"] = sum(test_preds)/FOLDS
sub_df.to_csv("submission.csv", index=False)

sub_df.head()

|   | id | target |
|---|---|---|
| 0 | 600000 | 0.729144 |
| 1 | 600001 | 0.748575 |
| 2 | 600002 | 0.756956 |
| 3 | 600003 | 0.346804 |
| 4 | 600004 | 0.731332 |
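The submitted `target` is the plain average of the seven fold models' test-set probabilities. Keras' `predict` returns an array of shape `(n_rows, 1)` for a single sigmoid output, so the sketch below uses hypothetical random predictions of that shape. It shows the averaging with `np.mean` over a stacked array, flattened to a 1-D column.

```python
import numpy as np

# Hypothetical stand-in for the per-fold predictions: 7 models x 5 test rows, shape (5, 1) each.
rng = np.random.default_rng(0)
fold_preds = [rng.random((5, 1)) for _ in range(7)]

# Average across the fold axis, then flatten (5, 1) -> (5,) for a DataFrame column.
averaged = np.mean(np.stack(fold_preds), axis=0).ravel()
print(averaged.shape)  # (5,)
```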
