# Improving your Keras model

- Dataset: https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients

# Setup

In [1]:
# ! conda install keras -y
# ! conda install tensorflow -y
# ! conda install xlrd -y

In [2]:
# imports
import pandas as pd
import numpy as np

from sklearn.utils import class_weight
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn import metrics

from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping
from keras.utils import to_categorical

In [3]:
# get the dataset from UCI ML Repository
# ! curl -o default.xls https://archive.ics.uci.edu/ml/machine-learning-databases/00350/default%20of%20credit%20card%20clients.xls

In [4]:
# load the dataset (assumes the downloaded file was saved in a data/ folder)
df = pd.read_excel('data/default.xls', header=1)
df.shape

(30000, 25)

In [10]:
# Check for missing data
df.isnull().sum().sum()

0

Split into input (X) and output (y) variables

In [12]:
# predictors include all variables but ID and default
X = df.drop(['ID', 'default payment next month'], axis=1)
# convert target to categorical
y = to_categorical(df['default payment next month'])
# note that the y-variable is now one-hot encoded
print(y[:5])

[[0. 1.]
 [0. 1.]
 [1. 0.]
 [1. 0.]
 [1. 0.]]
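
To make the one-hot encoding concrete, here is a minimal numpy-only sketch of the same mapping. It is an illustration, not how `to_categorical` is implemented; the five sample labels are chosen to match the rows printed above.

```python
import numpy as np

# Integer labels matching the five rows printed above (1 = default, 0 = no default)
labels = np.array([1, 1, 0, 0, 0])

# One-hot encode: row i has a 1 in column labels[i]
one_hot = np.eye(2)[labels]
print(one_hot)

# argmax along each row recovers the original integer labels
recovered = one_hot.argmax(axis=1)
print(recovered)
```

Column 0 stands for "no default" and column 1 for "default", which is why the first two rows read `[0. 1.]`.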


In [13]:
# split into 67% for train and 33% for test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

In [14]:
# Standardize the predictors
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
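
Why `fit_transform` on the training set but only `transform` on the test set? A small numpy sketch of the arithmetic, using made-up toy data: the mean and standard deviation come from the training rows only and are then reused for the test rows.

```python
import numpy as np

# Hypothetical toy data: one feature, four training rows, two test rows
train = np.array([[1.0], [2.0], [3.0], [4.0]])
test = np.array([[2.5], [10.0]])

# Statistics are computed on the training set only
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0)

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuse the training statistics, never refit on test

print(train_scaled.ravel())
print(test_scaled.ravel())
```

Refitting the scaler on the test set would leak information about the test distribution into the evaluation.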

# Baseline Model

How many layers should the model contain?   
* There's a mountain of commentary on the question of hidden layer configuration in NNs (see the insanely thorough and insightful NN FAQ for an excellent summary of that commentary). One issue within this subject on which there is a consensus is the performance difference from adding additional hidden layers: the situations in which performance improves with a second (or third, etc.) hidden layer are very few. **One hidden layer is sufficient for the large majority of problems.**
* There are really two decisions that must be made regarding the hidden layers: how many hidden layers to actually have in the neural network and how many neurons will be in each of these layers. 
* In theory, a neural network with two hidden layers can represent functions of any shape, so the source below argues there is little theoretical reason to use more than two hidden layers in a simple feed-forward network like this one.
* [source](https://stats.stackexchange.com/questions/181/how-to-choose-the-number-of-hidden-layers-and-nodes-in-a-feedforward-neural-netw)


#### Define the model

The number of nodes in the input layer is determined by the number of predictors: it equals the number of features (columns) in your data. Note: Some NN configurations add one additional node for a bias term.

In [15]:
# number of nodes in the input layer 
nodes_input_layer = X_train.shape[1]
print(nodes_input_layer)

23


Like the input layer, every NN has exactly one output layer. Its size (number of neurons) is determined by the chosen model configuration.
* If the NN is a regressor, then the output layer has a single node.
* If the NN is a classifier, then it also has a single node, unless softmax is used, in which case the output layer has one node per class label in your model.

In [None]:
# number of nodes in output layer: one per class, since softmax is used with a one-hot target
nodes_output_layer = 2

The number of nodes in the hidden layers is not easy to determine. There is no universal answer for this question yet. Ultimately, the selection of an architecture for your neural network will come down to trial and error.
* Using too few neurons in the hidden layers will result in underfitting
* Too many neurons in the hidden layers may result in overfitting   

There are many rule-of-thumb methods for choosing a reasonable starting number of neurons in the hidden layers, such as the following:

*    The number of hidden neurons should be between the size of the input layer and the size of the output layer.
*    The number of hidden neurons should be 2/3 the size of the input layer, plus the size of the output layer.
*    The number of hidden neurons should be less than twice the size of the input layer.

* [source](https://stats.stackexchange.com/questions/181/how-to-choose-the-number-of-hidden-layers-and-nodes-in-a-feedforward-neural-netw)
* [further reading](https://machinelearningmastery.com/how-to-configure-the-number-of-layers-and-nodes-in-a-neural-network/)
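
Applied to this dataset (23 inputs, 2 outputs), the three rules of thumb work out as follows. This is only a quick arithmetic sketch; the variable names are made up for illustration, and rounding rule 2 to the nearest integer is an assumption.

```python
# Layer sizes used in this notebook
n_inputs = 23
n_outputs = 2

# Rule 1: somewhere between the output size and the input size
rule_1_range = (min(n_inputs, n_outputs), max(n_inputs, n_outputs))

# Rule 2: two thirds of the input size, plus the output size
rule_2 = round(2 * n_inputs / 3 + n_outputs)

# Rule 3: strictly less than twice the input size
rule_3_upper = 2 * n_inputs

print(rule_1_range, rule_2, rule_3_upper)
```

The rules do not agree on a single number, which is why they are best treated as starting points for trial and error.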

In [None]:
# number of nodes in first hidden layer
nodes_hidden_layer = 12

Rules for the activation function

Input and Output Layers:
* The input layer does not require an activation function.
* For regression problems, the output layer does not require an activation function.
* For binary classification problems with a single output variable, the activation function should be "sigmoid".
* For multi-class classification problems with one output node per class, the activation function should be "softmax".

Hidden Layers:
* The rectified linear activation function, or ReLU activation function, is perhaps the most common function used for hidden layers.
* Sigmoid and Tanh used to be popular, but they are more susceptible to vanishing gradients, which can prevent deep models from being trained.
* Recurrent networks still commonly use Tanh or sigmoid activation functions, or even both. 
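
For reference, the activation functions named above can be sketched in a few lines of numpy. These are textbook definitions written for illustration, not the Keras implementations, and the input vector `z` is made up.

```python
import numpy as np

def relu(x):
    # pass positive values through, clamp negatives to zero
    return np.maximum(0.0, x)

def sigmoid(x):
    # squash any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # subtract the max for numerical stability, then normalise to sum to 1
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))
print(sigmoid(z))
print(softmax(z))

# The sigmoid gradient s(x) * (1 - s(x)) never exceeds 0.25 and shrinks as |x| grows,
# which is one way to see the vanishing-gradient problem
sigmoid_grad = sigmoid(z) * (1.0 - sigmoid(z))
print(sigmoid_grad)
```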


Additional reading:
* [Jason Brownlee](https://machinelearningmastery.com/choose-an-activation-function-for-deep-learning/)
* [Keras documentation](https://keras.io/api/layers/activations/)

In [None]:
# activation function for the hidden layer
activation_function_hidden_layer = 'relu'

In [None]:
# activation function for the output layer
activation_function_output_layer = 'softmax'

In [None]:
# define the model
model = Sequential()

# add layers
model.add(Dense(nodes_hidden_layer, 
                activation=activation_function_hidden_layer, 
                input_shape=(nodes_input_layer,)  # note: the trailing comma makes this a one-element tuple
               )
         )
model.add(Dense(nodes_output_layer, 
                activation=activation_function_output_layer )
         )
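
As a sanity check, the size of this model can be worked out by hand. A quick sketch, assuming 23 inputs, 12 hidden nodes, 2 output nodes, and that each `Dense` layer keeps its default bias term:

```python
n_inputs, n_hidden, n_outputs = 23, 12, 2

# Each Dense layer has one weight per (input, unit) pair plus one bias per unit
hidden_params = n_inputs * n_hidden + n_hidden    # weights + biases of the hidden layer
output_params = n_hidden * n_outputs + n_outputs  # weights + biases of the output layer

total_params = hidden_params + output_params
print(hidden_params, output_params, total_params)
```

Calling `model.summary()` should report the same totals per layer.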

#### Compile the Model

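How should I choose a loss function?
* Regression: mean squared error (`'mean_squared_error'`) is the usual default.
* Binary Classification: binary cross-entropy (`'binary_crossentropy'`), paired with a single sigmoid output node.
* Multi-Class Classification: categorical cross-entropy (`'categorical_crossentropy'`) with one-hot targets and a softmax output, as in this notebook; `'sparse_categorical_crossentropy'` if the targets are integer class labels.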


Further reading: 
* [Jason Brownlee](https://machinelearningmastery.com/how-to-choose-loss-functions-when-training-deep-learning-neural-networks/)
* [Keras Documentation](https://keras.io/api/losses/)
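
To see what categorical cross-entropy measures, here is a minimal numpy sketch of the formula for one-hot targets. The function name, the clipping constant, and the sample values are assumptions made for illustration; this is not the Keras implementation.

```python
import numpy as np

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    # mean over samples of -sum_k y_true[k] * log(y_prob[k])
    y_prob = np.clip(y_prob, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))

# Hypothetical one-hot targets and predicted probability pairs
y_true = np.array([[0.0, 1.0], [1.0, 0.0]])
y_prob = np.array([[0.2, 0.8], [0.6, 0.4]])

loss = categorical_crossentropy(y_true, y_prob)
print(loss)
```

Only the probability assigned to the true class contributes to the loss, so confident correct predictions drive it toward zero.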

In [None]:
# loss function
loss_function='categorical_crossentropy'

In [None]:
# optimization algo
optimization_algorithm='adam'

In [None]:
# metrics for evaluation during training
list_of_metrics=['accuracy']

In [None]:
# compile the model
model.compile(loss=loss_function, 
              optimizer=optimization_algorithm, 
              metrics=list_of_metrics
             )

#### Fit the model

In [None]:
# how many epochs?
epochs=10

In [None]:
# batch size
batch_size=10

In [None]:
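# early stopping: stop once the monitored quantity (val_loss by default) has not
# improved for 2 epochs in a row; this needs validation_data to be passed to fit()
early_stopping_monitor = EarlyStopping(patience=2)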
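
The idea behind `patience` can be sketched in plain Python. This is a simplified illustration of the stopping rule, not the Keras callback itself; `stopping_epoch` and the loss values are made up.

```python
def stopping_epoch(val_losses, patience=2):
    """Return the (0-based) epoch at which training would stop."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # new best validation loss: reset the counter
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    # never ran out of patience: train for all epochs
    return len(val_losses) - 1

# Hypothetical validation losses: improvement stalls after epoch 2
losses = [0.50, 0.45, 0.44, 0.46, 0.45, 0.43]
print(stopping_epoch(losses))
```

Note that in this example training stops before the later improvement at the last epoch is ever seen, which is the trade-off of a small patience value.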

In [None]:
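# class weight: weight each class by the share of the other class in the
# training data, so the rarer class gets the larger weight
ratio = y_train[:, 1].mean()
class_weight = {0: ratio, 1: 1 - ratio}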
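
Another common way to set class weights is the "balanced" heuristic, `n_samples / (n_classes * samples_in_class)`. A numpy sketch on made-up labels follows; the name `class_weights` and the toy data are assumptions. To my knowledge, scikit-learn's `compute_class_weight` with `class_weight='balanced'` uses the same formula.

```python
import numpy as np

# Hypothetical integer labels with a 4:1 imbalance
labels = np.array([0, 0, 0, 0, 1, 0, 0, 0, 0, 1])

counts = np.bincount(labels)                    # samples per class
weights = len(labels) / (len(counts) * counts)  # "balanced" heuristic

# Keras expects a dict mapping class index to weight
class_weights = {cls: float(w) for cls, w in enumerate(weights)}
print(class_weights)
```

The minority class ends up with a weight four times larger than the majority class, mirroring the 4:1 imbalance.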

In [None]:
# fit the keras model on the dataset
model.fit(X_train, 
          y_train, 
          # validation_data=(X_test,y_test), 
          epochs=epochs, 
          # batch_size=batch_size,
          # class_weight=class_weight,
          # callbacks = [early_stopping_monitor]
         )

In [None]:
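# make probability predictions with the model (one pair of probabilities per row)
y_probs = model.predict(X_test)
# make class predictions: pick the more probable class in each pair
y_preds = np.argmax(y_probs, axis=1)
# convert the one-hot test labels back to class indices
y_true = np.argmax(y_test, axis=1)
# Evaluate the model
print(metrics.classification_report(y_true, y_preds))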
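
To make the numbers in a classification report concrete, here is a small sketch that computes precision and recall for the positive class (default = 1) by hand. The probability pairs and labels are made up, and the names `pred` and `true` are only for this illustration.

```python
import numpy as np

# Hypothetical probability pairs and one-hot ground truth
y_probs = np.array([[0.9, 0.1], [0.3, 0.7], [0.6, 0.4], [0.2, 0.8]])
y_true_onehot = np.array([[1, 0], [0, 1], [0, 1], [0, 1]])

pred = y_probs.argmax(axis=1)        # most probable class per row
true = y_true_onehot.argmax(axis=1)  # one-hot back to class index

tp = int(np.sum((pred == 1) & (true == 1)))  # predicted default, was default
fp = int(np.sum((pred == 1) & (true == 0)))  # predicted default, was not
fn = int(np.sum((pred == 0) & (true == 1)))  # missed default

precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(precision, recall)
```

On an imbalanced dataset like this one, recall on the default class is usually more informative than overall accuracy.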

In [None]:
y_probs 

In [None]:
# make class predictions with the model
y_preds

In [None]:
y_test