# Classification with Gated Residual and Variable Selection Networks 🤖
## Using Gated Residual and Variable Selection Networks for Predicting States of Manufacturing Control Data

## Work in Progress...



**Introduction**

This example demonstrates the use of Gated Residual Networks (GRN) and Variable Selection Networks (VSN), proposed by Bryan Lim et al. in "Temporal Fusion Transformers (TFT) for Interpretable Multi-horizon Time Series Forecasting", for structured data classification. GRNs give the model the flexibility to apply non-linear processing only where needed. VSNs allow the model to softly remove unnecessary noisy inputs that could negatively impact performance. Together, these techniques help improve the learning capacity of deep neural network models.

Note that this example implements only the GRN and VSN components described in the paper, rather than the whole TFT model, as GRN and VSN can be useful on their own for structured data learning tasks.
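
For reference, the two components can be summarized as below. This is a simplified restatement of the paper's definitions that leaves out the optional context vector; $a$ is the GRN input, $\Xi$ is the concatenation of all encoded features $\xi^{(j)}$, and $m$ is the number of features. Note that the code in this notebook uses a swish activation where the paper uses ELU.

```latex
\begin{aligned}
\mathrm{GLU}(\gamma) &= \sigma(W_4 \gamma + b_4) \odot (W_5 \gamma + b_5) \\
\eta_2 &= \mathrm{ELU}(W_2 a + b_2) \\
\eta_1 &= W_1 \eta_2 + b_1 \\
\mathrm{GRN}(a) &= \mathrm{LayerNorm}\big(a + \mathrm{GLU}(\eta_1)\big) \\
v &= \mathrm{Softmax}\big(\mathrm{GRN}_v(\Xi)\big) \\
\tilde{\xi} &= \sum_{j=1}^{m} v^{(j)} \, \mathrm{GRN}_j\big(\xi^{(j)}\big)
\end{aligned}
```

The sigmoid gate in the GLU is what lets a GRN skip its non-linear branch, and the softmax weights $v^{(j)}$ are what let the VSN down-weight noisy inputs.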

To run the code you need to use TensorFlow 2.3 or higher.


**References**

https://keras.io/examples/structured_data/classification_with_grn_and_vsn/

**Notebooks Ideas and Credits**

I took ideas and inspiration from the following notebooks. If you enjoy my work, please take a look at the notebooks that inspired it.

**TPSMAY22 Gradient-Boosting Quickstart:** 

https://www.kaggle.com/code/ambrosm/tpsmay22-gradient-boosting-quickstart/notebook


**TPSMAY22 Advanced Keras:**

https://www.kaggle.com/code/ambrosm/tpsmay22-advanced-keras


# 1. Loading the Required Libraries

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
import os
import random
import time
from collections import defaultdict

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras import callbacks
from tensorflow.keras.callbacks import ReduceLROnPlateau, LearningRateScheduler, EarlyStopping

from sklearn import model_selection
from sklearn.metrics import roc_auc_score, log_loss
from sklearn.model_selection import KFold, StratifiedKFold
from sklearn.preprocessing import StandardScaler, RobustScaler, PowerTransformer, MinMaxScaler
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer

---

# 2. Setting the Notebook

In [None]:
%%time
# I like to disable my Notebook Warnings.
import warnings
warnings.filterwarnings('ignore')

In [None]:
%%time
# Notebook Configuration...

# Amount of data we want to load into the Model...
DATA_ROWS = None
# Dataframe, the amount of rows and cols to visualize...
NROWS = 15
NCOLS = 10
# Main data location path...
BASE_PATH = '...'

In [None]:
%%time
# Configure notebook display settings to only use 5 decimal places, tables look nicer.
pd.options.display.float_format = '{:,.5f}'.format
pd.set_option('display.max_columns', NCOLS) 
pd.set_option('display.max_rows', NROWS)

---

# 3. Loading the Information (CSV) Into A Dataframe

In [None]:
%%time
# Load the CSV information into a Pandas DataFrame...
trn_data = pd.read_csv('/kaggle/input/tabular-playground-series-may-2022/train.csv')
tst_data = pd.read_csv('/kaggle/input/tabular-playground-series-may-2022/test.csv')

sub = pd.read_csv('/kaggle/input/tabular-playground-series-may-2022/sample_submission.csv')

---

# 4. Exploring the Information Available

In [None]:
%%time
# Explore the shape of the DataFrame...
trn_data.shape

In [None]:
%%time
# Display simple information of the variables in the dataset...
trn_data.info()

In [None]:
%%time
# Display the first few rows of the DataFrame...
trn_data.head()

In [None]:
%%time
# Generate a simple statistical summary of the DataFrame, Only Numerical...
trn_data.describe()

In [None]:
%%time
# Calculates the total number of missing values...
trn_data.isnull().sum().sum()

In [None]:
%%time
# Display the number of missing values by variable...
trn_data.isnull().sum()

In [None]:
%%time
# Display the number of unique values for each variable...
trn_data.nunique()

In [None]:
# Display the number of unique values for each variable, sorted by quantity...
trn_data.nunique().sort_values(ascending = True)

In [None]:
%%time
# Check some of the categorical variables
categ_cols = ['f_29','f_30','f_13', 'f_18','f_17','f_14','f_11','f_10','f_09','f_15','f_07','f_12','f_16','f_08','f_27']
trn_data[categ_cols].sample(5)

In [None]:
%%time
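# Generate a quick correlation matrix to understand the dataset better
# (numeric columns only; f_27 is a string column and cannot be correlated)
correlation = trn_data.select_dtypes(include='number').corr()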

In [None]:
%%time
# Display the correlation matrix
correlation

In [None]:
%%time
# Check the most correlated variables to the target
correlation['target'].sort_values(ascending = False)[:5]

In [None]:
%%time
# Check the least correlated variables to the target
correlation['target'].sort_values(ascending = True)[:5]

In [None]:
%%time
# Check how well balanced the dataset is
trn_data['target'].value_counts()

In [None]:
%%time
# Check some statistics on the target variable
trn_data['target'].describe()

---

# 5. Feature Engineering

## 5.1 - Character Features.

In [None]:
%%time
def count_chars(df, field):
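    '''
    Add one ordinal feature per character position of a 10-character
    string column (ch_0 ... ch_9, with 'A' mapped to 0), plus the number
    of unique characters in the string.
    '''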
    
    for i in range(10):
        df[f'ch_{i}'] = df[field].str.get(i).apply(ord) - ord('A')
        
    df["unique_characters"] = df[field].apply(lambda s: len(set(s)))
    return df
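
To make the character features concrete, here is a small pure-Python sketch of the same two transformations applied to a single string. The helper name `char_features` and the sample string are made up for illustration; the notebook itself does this column-wise with pandas.

```python
def char_features(s):
    """Return per-position ordinals ('A' -> 0) and the number of distinct characters."""
    ordinals = [ord(c) - ord('A') for c in s]
    return ordinals, len(set(s))

sample = "ABABDQCAAB"  # made-up 10-character string in the style of f_27
ordinals, unique_characters = char_features(sample)
print(ordinals)            # [0, 1, 0, 1, 3, 16, 2, 0, 0, 1]
print(unique_characters)   # 5
```

Each position becomes its own integer feature, so the network can pick up on position-specific patterns, while `unique_characters` summarizes the string as a whole.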

In [None]:
%%time
# Use the newly created function to generate more features.
trn_data = count_chars(trn_data, 'f_27')
tst_data = count_chars(tst_data, 'f_27')

## 5.2 - Interaction Features

In [None]:
%%time
def calculate_feat_int(df):
    df['i_02_21'] = (df.f_21 + df.f_02 > 5.2).astype(int) - (df.f_21 + df.f_02 < -5.3).astype(int)
    df['i_05_22'] = (df.f_22 + df.f_05 > 5.1).astype(int) - (df.f_22 + df.f_05 < -5.4).astype(int)
    i_00_01_26 = df.f_00 + df.f_01 + df.f_26
    df['i_00_01_26'] = (i_00_01_26 > 5.0).astype(int) - (i_00_01_26 < -5.0).astype(int)
    return df

trn_data = calculate_feat_int(trn_data)
tst_data = calculate_feat_int(tst_data)
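
Each interaction feature above is a three-valued flag: +1 when a sum of features exceeds an upper threshold, -1 when it falls below a lower threshold, and 0 otherwise. A minimal NumPy sketch of that pattern follows; `ternary_flag` is a hypothetical helper and the input values are made up.

```python
import numpy as np

def ternary_flag(total, upper, lower):
    """+1 where total > upper, -1 where total < lower, 0 otherwise."""
    return (total > upper).astype(int) - (total < lower).astype(int)

# Made-up values standing in for f_21 + f_02.
total = np.array([6.0, 5.2, 0.0, -5.3, -6.0])
flags = ternary_flag(total, upper=5.2, lower=-5.3)
print(flags.tolist())  # [1, 0, 0, 0, -1]
```

Both comparisons are strict, so values exactly on a threshold map to 0.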

In [None]:
%%time
continuous_feat = ['f_00', 'f_01', 'f_02', 'f_03', 'f_04', 'f_05', 'f_06', 'f_19', 'f_20', 'f_21', 'f_22', 'f_23', 'f_24', 'f_25', 'f_26', 'f_28']

def stat_features(df, cols = continuous_feat):
    '''
    Calculate aggregated features across the selected continuous columns
    
    '''
    # Use the `cols` argument (not the global list) and take the slice once.
    values = df[cols]

    # Base statistical features.
    df['f_sum']  = values.sum(axis=1)
    df['f_min']  = values.min(axis=1)
    df['f_max']  = values.max(axis=1)
    df['f_std']  = values.std(axis=1)
    # Mean absolute deviation, computed directly (DataFrame.mad was removed in pandas 2.0).
    df['f_mad']  = values.sub(values.mean(axis=1), axis=0).abs().mean(axis=1)
    df['f_mean'] = values.mean(axis=1)
    df['f_kurt'] = values.kurt(axis=1)

    # Extra statistical features.
    df['f_prod'] = values.prod(axis=1)
    df['f_range'] = values.max(axis=1) - values.min(axis=1)
    df['f_count_pos']  = values.gt(0).sum(axis=1)
    df['f_count_neg']  = values.lt(0).sum(axis=1)

    return df
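
The `f_mad` feature is a row-wise mean absolute deviation. On pandas versions that no longer ship `DataFrame.mad`, the same quantity can be computed by hand; the sketch below shows the arithmetic in NumPy on made-up numbers.

```python
import numpy as np

# Two made-up rows of continuous features.
values = np.array([[1.0, 2.0, 3.0, 6.0],
                   [0.0, 0.0, 4.0, 4.0]])

# Mean absolute deviation per row: average distance from the row mean.
row_mean = values.mean(axis=1, keepdims=True)
row_mad = np.abs(values - row_mean).mean(axis=1)
print(row_mad.tolist())  # [1.5, 2.0]
```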

In [None]:
%%time
#trn_data = stat_features(trn_data, continuous_feat)
#tst_data = stat_features(tst_data, continuous_feat)

---

# 6. Feature Selection for Baseline Model

In [None]:
%%time
# Define what will be used in the training stage
ignore = ['id', 
          'f_27', 
          'f_27_enc', 
          'is_train', 
          'target'] # f_27 is dropped; its per-character features are used instead...

FEATURES = [feat for feat in trn_data.columns if feat not in ignore]
TARGET = 'target'

---

# 7. Pre-Processing for Training

In [None]:
# scaler = MinMaxScaler(feature_range = (0, 1))

# scaler = StandardScaler()

# for col in FEATURES:
#     trn_data[col] = scaler.fit_transform(trn_data[col].to_numpy().reshape(-1,1))
#     tst_data[col] = scaler.transform(tst_data[col].to_numpy().reshape(-1,1))
    
X = trn_data[FEATURES].to_numpy().astype(np.float32)
Y = trn_data[TARGET].to_numpy().astype(np.float32)
X_test = tst_data[FEATURES].to_numpy().astype(np.float32)

---

# 8. Model Construction, Gated Residual and Variable Selection Networks

## 8.1 - Creating the Layers for the Model.

In [None]:
def create_model_inputs():
    inputs = {}
    for feature_name in FEATURES:
        inputs[feature_name] = layers.Input(
            name=feature_name, shape=(), dtype=tf.float32
        )
    return inputs

In [None]:
def encode_inputs(inputs, encoding_size):
    encoded_features = []
    for i in range(inputs.shape[1]):
        encoded_feature = tf.expand_dims(inputs[:, i], -1)
        encoded_feature = layers.Dense(units=encoding_size)(encoded_feature)
        encoded_features.append(encoded_feature)
    return encoded_features   
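
`encode_inputs` gives every scalar feature its own dense projection into a vector of length `encoding_size`, so that all features share a common width before variable selection. The NumPy sketch below mirrors only the shapes involved; the random kernels are stand-ins for the trainable `Dense` weights, and the sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, num_features, encoding_size = 5, 3, 4
inputs = rng.normal(size=(batch, num_features))

encoded_features = []
for i in range(num_features):
    column = inputs[:, i][:, np.newaxis]          # (batch, 1), like tf.expand_dims(..., -1)
    kernel = rng.normal(size=(1, encoding_size))  # stand-in for the Dense kernel of feature i
    bias = np.zeros(encoding_size)
    encoded_features.append(column @ kernel + bias)

print(len(encoded_features), encoded_features[0].shape)
```

The result is a list with one `(batch, encoding_size)` array per feature, which is the structure the variable selection layer expects.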

In [None]:
# Creates network units to be used in the model.

class GatedLinearUnit(layers.Layer):
    def __init__(self, units):
        super(GatedLinearUnit, self).__init__()
        self.linear = layers.Dense(units)
        self.sigmoid = layers.Dense(units, activation="sigmoid")

    def call(self, inputs):
        return self.linear(inputs) * self.sigmoid(inputs)
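
The gated linear unit multiplies a linear projection by a sigmoid gate computed from the same input. The following NumPy sketch (not the Keras layer itself; weights and sizes are arbitrary) shows the two extremes: a gate pushed towards 1 passes the linear branch through, and a gate pushed towards 0 suppresses it.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def glu(x, w_lin, b_lin, w_gate, b_gate):
    """Gated linear unit: a linear projection scaled element-wise by a sigmoid gate."""
    return (x @ w_lin + b_lin) * sigmoid(x @ w_gate + b_gate)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))        # batch of 4 rows, 3 input features
w_lin = rng.normal(size=(3, 2))
b_lin = np.zeros(2)
no_weights = np.zeros((3, 2))      # gate driven purely by its bias

open_gate = glu(x, w_lin, b_lin, no_weights, np.full(2, 50.0))     # gate ~ 1
closed_gate = glu(x, w_lin, b_lin, no_weights, np.full(2, -50.0))  # gate ~ 0
```

During training the gate weights are learned, so the layer can settle anywhere between these two extremes for each unit.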

In [None]:
# Creates network units to be used in the model.

class GatedResidualNetwork(layers.Layer):
    def __init__(self, units, dropout_rate):
        super(GatedResidualNetwork, self).__init__()
        self.units = units
        self.elu_dense = layers.Dense(units, activation="swish") # Originally Was Utilizing 'elu' Activations.
        self.linear_dense = layers.Dense(units)
        self.dropout = layers.Dropout(dropout_rate)
        self.gated_linear_unit = GatedLinearUnit(units)
        self.layer_norm = layers.LayerNormalization()
        self.project = layers.Dense(units)

    def call(self, inputs):
        x = self.elu_dense(inputs)
        x = self.linear_dense(x)
        x = self.dropout(x)
        if inputs.shape[-1] != self.units:
            inputs = self.project(inputs)
        x = inputs + self.gated_linear_unit(x)
        x = self.layer_norm(x)
        return x
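
The GRN forward pass can be traced step by step with a NumPy sketch under simplifying assumptions: inference mode (dropout does nothing), layer normalization without its learned scale and offset, and arbitrary random weights. The parameter names in `params` are made up for this illustration. With the gate forced shut, the block reduces to a normalized skip connection, which is how a GRN can bypass non-linear processing.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def swish(z):
    return z * sigmoid(z)

def layer_norm(x, eps=1e-3):
    # Normalization over the last axis, without the learned scale and offset.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def grn_forward(x, p):
    """Inference-time GRN pass (dropout is a no-op at inference)."""
    h = swish(x @ p["w_act"] + p["b_act"])
    h = h @ p["w_lin"] + p["b_lin"]
    gate = sigmoid(h @ p["w_gate"] + p["b_gate"])
    gated = (h @ p["w_glu"] + p["b_glu"]) * gate
    units = p["w_lin"].shape[1]
    # Project the input only when its width differs from the block width.
    skip = x if x.shape[-1] == units else x @ p["w_proj"] + p["b_proj"]
    return layer_norm(skip + gated)

rng = np.random.default_rng(0)
units = 4
x = rng.normal(size=(2, units))
params = {
    "w_act": rng.normal(size=(units, units)), "b_act": np.zeros(units),
    "w_lin": rng.normal(size=(units, units)), "b_lin": np.zeros(units),
    "w_glu": rng.normal(size=(units, units)), "b_glu": np.zeros(units),
    # Zero gate weights and a large negative bias force the gate to ~0.
    "w_gate": np.zeros((units, units)), "b_gate": np.full(units, -50.0),
}
out = grn_forward(x, params)
```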

In [None]:
class VariableSelection(layers.Layer):
    def __init__(self, num_features, units, dropout_rate):
        super(VariableSelection, self).__init__()
        self.grns = list()
        # Create a GRN for each feature independently
        for idx in range(num_features):
            grn = GatedResidualNetwork(units, dropout_rate)
            self.grns.append(grn)
        # Create a GRN for the concatenation of all the features
        self.grn_concat = GatedResidualNetwork(units, dropout_rate)
        self.softmax = layers.Dense(units=num_features, activation="softmax")

    def call(self, inputs):
        v = layers.concatenate(inputs)
        v = self.grn_concat(v)
        v = tf.expand_dims(self.softmax(v), axis=-1)

        x = []
        for idx, feature in enumerate(inputs):  # avoid shadowing the built-in `input`
            x.append(self.grns[idx](feature))
        x = tf.stack(x, axis=1)

        outputs = tf.squeeze(tf.matmul(v, x, transpose_a=True), axis=1)
        return outputs
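
The last line of `call` is a compact way of writing a weighted sum: the softmax weights `v` have shape `(batch, num_features, 1)`, the per-feature GRN outputs `x` have shape `(batch, num_features, units)`, and the transposed matrix product collapses the feature axis. A NumPy sketch with arbitrary sizes and random values confirms the equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, num_features, units = 2, 3, 4

# Softmax over the feature axis gives one weight per feature and per row.
logits = rng.normal(size=(batch, num_features))
weights = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
v = weights[..., np.newaxis]                         # (batch, num_features, 1)
x = rng.normal(size=(batch, num_features, units))    # stacked per-feature outputs

# Equivalent of tf.squeeze(tf.matmul(v, x, transpose_a=True), axis=1).
combined = np.squeeze(np.swapaxes(v, 1, 2) @ x, axis=1)   # (batch, units)

# The same result written as an explicit weighted sum over features.
weighted_sum = (v * x).sum(axis=1)
```

Because the weights in each row sum to 1, a feature with a weight near 0 contributes almost nothing to the combined representation, which is the "soft removal" of noisy inputs.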

---

## 8.2 - Creating the Model GRN & VSN

In [None]:
def create_model(encoding_size, dropout_rate=0.10):
    inputs = layers.Input(shape=(len(FEATURES),))
    feature_list = encode_inputs(inputs, encoding_size)
    num_features = len(feature_list)

    features = VariableSelection(num_features, encoding_size, dropout_rate)(
        feature_list
    )

    outputs = layers.Dense(units=1, activation="sigmoid")(features)
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    return model

---

# 9. Training the Model, Cross Validation Loop

In [None]:
def format_time(seconds):
    """
    Formats time in human-readable form

    Args:
        seconds: seconds passed in a process
    Return:
        formatted string such as '05 sec', '02 min 05 sec' or '01 hr 02 min 05 sec'
    """
    h = int(seconds // 3600)
    m = int((seconds % 3600) // 60)
    s = int(seconds % 60)
    result = ''
    _h = ('0' + str(h)) if h < 10 else str(h)
    result += (_h + ' hr ') if h > 0 else ''
    _m = ('0' + str(m)) if m < 10 else str(m)
    result += (_m + ' min ') if m > 0 else ''
    _s = ('0' + str(s)) if s < 10 else str(s)
    result += (_s + ' sec')
    return result
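
For comparison, here is a more compact variant built on `divmod` that should produce the same strings for non-negative inputs. The name `format_time_compact` is made up; it is a sketch, not a replacement used elsewhere in this notebook.

```python
def format_time_compact(seconds):
    """Format elapsed seconds as e.g. '01 hr 02 min 05 sec', skipping zero hours/minutes."""
    minutes, s = divmod(int(seconds), 60)
    h, m = divmod(minutes, 60)
    parts = []
    if h > 0:
        parts.append(f"{h:02d} hr")
    if m > 0:
        parts.append(f"{m:02d} min")
    parts.append(f"{s:02d} sec")
    return " ".join(parts)

print(format_time_compact(59))     # 59 sec
print(format_time_compact(3725))   # 01 hr 02 min 05 sec
```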

In [None]:
import gc
import math

oof_df = defaultdict(lambda : [])
test_df = defaultdict(lambda : np.zeros((X_test.shape[0])))

N_FOLDS = 3
ENCODING_SIZE = 96 # Default Value = 32 ...
EPOCHS = 15
VERBOSE = 1
BATCH_SIZE = 2048
SEED = 42

start = time.time()
skfolds = model_selection.StratifiedKFold(n_splits=N_FOLDS, shuffle=True, random_state=SEED)

for fold, (t, v) in enumerate(skfolds.split(X, Y)):
    x_train, x_val = X[t], X[v]
    y_train, y_val = Y[t], Y[v]
    
    # Scaling features for improved training
    scaler = StandardScaler()
    x_train = scaler.fit_transform(x_train)
    x_val = scaler.transform(x_val)
    
    oof_df[TARGET].extend(y_val)
    print(f"\n{'-'*15} FOLD-{fold} {'-'*15}")
    
    tic = time.time()
    
    clf = create_model(ENCODING_SIZE)
    
    clf.compile(loss='binary_crossentropy', 
                optimizer='adam', 
                metrics=[tf.keras.metrics.AUC(name='auc'), 'acc'])
    
    lr = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.25, 
                               patience=4, verbose=VERBOSE)

    es = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=15, 
                       verbose=VERBOSE, mode="min", 
                       restore_best_weights=True)
    
    lr_start = 0.0100
    lr_end   = 0.0002
    
    epochs = EPOCHS
    def cosine_decay(epoch):
        if epochs > 1:
            w = (1 + math.cos(epoch / (epochs - 1) * math.pi)) / 2
        else:
            w = 1
        return w * lr_start + (1 - w) * lr_end
        
    tm = tf.keras.callbacks.TerminateOnNaN()
    lr = LearningRateScheduler(cosine_decay, verbose = 0)
    
    # callbacks = [es, lr]
    callbacks = [lr, tm]
    
    
    
    clf.fit(x_train, y_train, 
            epochs=EPOCHS, 
            batch_size=BATCH_SIZE,
            validation_data=(x_val, y_val),
            validation_batch_size=len(y_val),
            callbacks=callbacks,
            shuffle=True,
            verbose=VERBOSE)
    
    
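    # Scale a fresh copy of the test set with this fold's scaler; overwriting
    # X_test here would re-scale already scaled data in the following folds.
    x_test = scaler.transform(X_test)
    
    preds = np.squeeze(clf.predict(x_val, batch_size=len(y_val)))
    oof_df['nn'].extend(preds)
    test_df['nn'] += (np.squeeze(clf.predict(x_test, batch_size=BATCH_SIZE) / N_FOLDS))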

    score = roc_auc_score(y_val, preds)
    print(f"MODEL: nn\tSCORE: {score}\tTIME: {format_time(time.time()-tic)}")

    del clf
    gc.collect()
        
    del x_train, x_val, y_train, y_val
    gc.collect()
        
oof_df = pd.DataFrame(oof_df)
test_df = pd.DataFrame(test_df)

print()
print(f'TOTAL TIME: {format_time(time.time() - start)}')
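
The learning-rate schedule used inside the loop can be inspected on its own. The sketch below reuses the same formula and the same `lr_start`, `lr_end` and epoch count as the training cell, written as a standalone function with keyword defaults so it runs without TensorFlow.

```python
import math

LR_START = 0.0100
LR_END = 0.0002
EPOCHS = 15

def cosine_decay(epoch, epochs=EPOCHS, lr_start=LR_START, lr_end=LR_END):
    """Cosine interpolation from lr_start (first epoch) to lr_end (last epoch)."""
    if epochs > 1:
        w = (1 + math.cos(epoch / (epochs - 1) * math.pi)) / 2
    else:
        w = 1
    return w * lr_start + (1 - w) * lr_end

schedule = [cosine_decay(epoch) for epoch in range(EPOCHS)]
for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch:2d}: lr = {lr:.5f}")
```

The rate starts at 0.0100, passes through the midpoint 0.0051 at epoch 7, and reaches 0.0002 at the final epoch, changing slowly at both ends and fastest in the middle.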

In [None]:
score = roc_auc_score(oof_df[TARGET], oof_df['nn'])
print(f'Overall ROC AUC of: {score}')

In [None]:
# Overall ROC AUC of: 0.9968621943736978

---

# 10. Baseline Model Submission File Generation

In [None]:
%%time
# Review the format of the submission file
sub.head()

In [None]:
%%time
# Populate the submission dataset with the predictions and create an output file
sub['target'] = test_df['nn']
sub.to_csv('my_submission_051322.csv', index = False)

In [None]:
%%time
# Review the submission file as a final step to upload to Kaggle.
sub.head()