## Introduction 📝
🎯 Goal: Binary classification based on features

📖 Data:

train.csv / test.csv - the training and testing set

Submissions are evaluated on area under the ROC curve between the predicted probability and the observed target. <br>
______________________________________________________________________________________________________________________
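
To make the evaluation metric concrete, here is a minimal sketch of what the area under the ROC curve measures: the fraction of (positive, negative) pairs in which the positive example receives the higher predicted probability. The helper name `pairwise_auc` and the toy labels and scores are illustrative assumptions, not part of the competition code; `sklearn.metrics.roc_auc_score` computes the same quantity more efficiently.

```python
import numpy as np

def pairwise_auc(y_true, y_score):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Hypothetical helper for illustration only.
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]          # scores of the positive examples
    neg = y_score[y_true == 0]          # scores of the negative examples
    diff = pos[:, None] - neg[None, :]  # every positive compared with every negative
    correct = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(correct / (len(pos) * len(neg)))

# Toy data (made up): 3 of the 4 positive/negative pairs are ordered correctly
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
auc = pairwise_auc(y_true, y_prob)
print(auc)
```

Because only the ordering of the predictions matters, any monotonic rescaling of the predicted probabilities leaves the score unchanged.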

### What is the purpose of the notebook?

This notebook builds a simple neural network model and shows how to interpret its outputs and draw insights from them.

In [None]:
!pip install  tensorflow==2.5.0

In [None]:
import os
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno
import re
import time
import spacy
import gc
import shutil
import datatable as dt
from pathlib import Path
import warnings
import cupy as cp
import cudf
import dask_cudf


#### Get the data and create train, test split

In [None]:
!pip freeze | grep 'tensorflow'

In [None]:
train=pd.read_csv("/kaggle/input/tabular-playground-series-nov-2021/train.csv")
test=pd.read_csv("/kaggle/input/tabular-playground-series-nov-2021/test.csv")

In [None]:
train.head(20)

In [None]:
X=train.iloc[:,1:100]
Y=train['target']

In [None]:
gc.collect()

In [None]:
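# Import the submodule explicitly: a bare `import sklearn` does not guarantee that model_selection is loaded
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.33, random_state = 5)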
print('Y_train: ', Y_train.shape)
print('Y_test: ', Y_test.shape)

#### Create a simple DNN Model

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
def build_nn_model(input_size, output_size, dropout=0.2, loss='binary_crossentropy', optimizer='adam'):
    """Build and compile a small fully connected network with a sigmoid output."""
    model = keras.Sequential()
    model.add(layers.Dense(512, input_dim=input_size, activation='relu'))
    model.add(layers.Dropout(0.1))
    model.add(layers.Dense(256, activation='relu'))
    model.add(layers.Dropout(dropout))
    model.add(layers.Dense(128, activation='relu'))
    # Sigmoid output so that predictions can be read as probabilities of the positive class
    model.add(layers.Dense(output_size, activation='sigmoid'))

    # Compile model
    model.compile(loss=loss, optimizer=optimizer, metrics=['binary_crossentropy'])
    return model

In [None]:
epochs = 5
batch_size = 500
loss = 'binary_crossentropy'
dropout = 0.2
optimizer = 'adam'
zero_base = True
output_size=1
input_size = 99

In [None]:
from tensorflow.keras.callbacks import EarlyStopping
model = build_nn_model(
    input_size, output_size=1, dropout=dropout, loss=loss,
    optimizer=optimizer)
history = model.fit(
    X_train, Y_train, epochs=epochs, batch_size=batch_size, verbose=1, shuffle=True,
    validation_data = (X_test,Y_test),
    callbacks=[EarlyStopping(monitor='val_loss', patience=1,)])

In [None]:
model.predict(test.iloc[:,1:100][1:10])

### SHAP

You can read more about SHAP here: https://github.com/slundberg/shap

It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions (see papers for details and citations).
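
As a rough illustration of the "classic Shapley values" mentioned above, the sketch below computes them exactly for a tiny model by enumerating every coalition of features. The toy model, the input `x`, the all-zeros baseline, and the helper names are all assumptions made for this example; this is not how DeepSHAP computes its values (it uses an approximation suited to neural networks), only the game-theoretic definition it builds on.

```python
from itertools import combinations
from math import factorial

import numpy as np

def exact_shapley(value_fn, n_features):
    """Exact Shapley values by brute-force enumeration (only feasible for a few features)."""
    phi = np.zeros(n_features)
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Classic Shapley weight for a coalition of this size
            weight = factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            for coalition in combinations(others, size):
                with_i = value_fn(set(coalition) | {i})
                without_i = value_fn(set(coalition))
                phi[i] += weight * (with_i - without_i)
    return phi

# Hypothetical toy model: one additive term and one interaction term
def toy_model(z):
    return 2.0 * z[0] + z[1] * z[2]

x = np.array([1.0, 2.0, 3.0])   # the record being explained (made up)
baseline = np.zeros(3)          # stands in for "feature unknown" (an assumption)

def value_fn(coalition):
    # Features in the coalition take their real value, the rest stay at the baseline
    z = baseline.copy()
    for j in coalition:
        z[j] = x[j]
    return toy_model(z)

phi = exact_shapley(value_fn, 3)
print(phi)                                       # one contribution per feature
print(phi.sum(), toy_model(x) - toy_model(baseline))
```

The additive feature keeps its full effect (2.0), while the interaction term (worth 6.0) is split equally between the two features involved, and the contributions sum to the difference between the prediction and the baseline output.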

Using DeepSHAP is straightforward with the `shap` Python library. Set it up as follows:



In [None]:
import shap

shap.initjs()

# Because the dataset is large, we take a subset of it as the background sample for the explainer
background = X_train.iloc[0:1000,:].values.astype('float')
explainer = shap.DeepExplainer(model, background)
# With ranked_outputs set, shap_values() returns a pair: (list of SHAP arrays, output indexes)
shap_values = explainer.shap_values(X=X_train.values[:500],
                                      ranked_outputs=1)

In [None]:
shap.force_plot(explainer.expected_value.numpy(),
                shap_values[0][0],
                feature_names=X_train.columns)

# Force plot for a single record: the SHAP values and the feature values must come from the same row
shap.force_plot(explainer.expected_value.numpy(),
                shap_values[0][0][1],
                X_train.values[1],
                feature_names=X_train.columns)



#### Feature Importance

In [None]:
# Pass the same 500 rows that were explained so the features line up with the SHAP values
shap.summary_plot(shap_values[0], X_train.iloc[:500], plot_type="bar")

In [None]:
gc.collect()

#### How to interpret the above?

The force plot shows a base value: the expected value computed by DeepSHAP, which is the average model output over the background sample, i.e. the value that would be predicted if you did not know any features. It also shows an output value (the sum of all feature contributions and the base value), which equals the prediction of the actual model. SHAP values, then, tell you how much each feature contributes to moving from the base value to the output value.

In [None]:
record = 1  # index of the record to explain (must be one of the 500 explained rows)
base_value = explainer.expected_value.numpy()
output = base_value + np.sum(shap_values[0][0][record])
print('base value: ', base_value)
print('output value: ', output)

# Sanity check: the output value should match the model's prediction for this record
prediction = model.predict(X_train.values[record:record + 1])[0]
print(np.round(output, decimals=1) == np.round(prediction, decimals=1))


# SHAP value of each feature for this record, sorted from most negative to most positive
shap_df = pd.DataFrame({'features': X_train.columns.values,
                        'shapvals': shap_values[0][0][record]}).sort_values(by='shapvals', ascending=True)
shap_df

The table above shows how the NN arrives at the output for this particular sample, starting from the base value. Some features contribute positively and others negatively. Adding all of these contributions to the base value gives the model output for the given example.

So the total output is 

$$\text{basevalue} + \sum_{n=1}^{\text{features}} \text{shapvalues}_n$$
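
The sum above can be checked by hand on a model simple enough to have closed-form SHAP values. The sketch below assumes a linear model with made-up weights and a random background sample. For such a model (treating features as independent), the SHAP value of feature n is its weight times the distance of the feature from its background mean, and the base value is the mean prediction over the background. None of these names or numbers come from the notebook's neural net.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear model: f(x) = w . x + b
w = np.array([0.5, -1.0, 2.0, 0.0])
b = 0.3

def linear_model(X):
    return X @ w + b

# Background sample, playing the role of the 1000 training rows passed to the explainer
background = rng.normal(size=(1000, 4))

# Base value: the average prediction over the background sample
base_value = linear_model(background).mean()

# Record to explain (made up)
x = np.array([1.0, 0.0, -1.0, 2.0])

# Closed-form SHAP values for a linear model: w_n * (x_n - mean of feature n)
shap_vals = w * (x - background.mean(axis=0))

# base value + sum of SHAP values should reproduce the prediction
output = base_value + shap_vals.sum()
print('base value:  ', base_value)
print('output value:', output)
print('prediction:  ', linear_model(x[None, :])[0])
```

A feature with zero weight (the last one here) gets a SHAP value of exactly zero, which matches the intuition that it contributes nothing to the move from the base value to the output.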

### Do upvote if you find the kernel useful