# Titanic Survival Classification - Neural Network Model and Ensembling (Part 4)

This notebook builds a neural network model, checks how it performs, and then puts together our final overall ensemble model.

I will be using the Keras framework (with a TensorFlow backend) to build a simple neural network.

Because this is a simple binary classification problem, a small fully connected neural network should suffice; no more complex architecture is necessary.

So firstly, let's import our packages.

In [1]:
#Import Tensorflow
import tensorflow as tf

#Import Keras
from keras import layers
from keras.layers import Input, Dense, Activation, BatchNormalization, Dropout
from keras.layers.advanced_activations import LeakyReLU, PReLU
from keras.models import Model
from keras import regularizers

#Import mathematical functions
from random import *
import math

#Import scikit-learn framework
import sklearn as sk
from sklearn import svm
from sklearn.ensemble import (RandomForestClassifier, AdaBoostClassifier, 
                              GradientBoostingClassifier, ExtraTreesClassifier)


Using TensorFlow backend.


We also import our pre-built functions and data.

In [2]:
#Import the functions built in Parts 1 and 2
from Titanic_Import import *

full_set = pd.read_csv('D:/Datasets/Titanic/train.csv')

#Cleanse the data with the function built last time and split it into training and CV sets
X_Train, X_CV, Y_Train, Y_CV = Cleanse_Training_Data(full_set)

Now let's build our first simple model. The function takes the architecture as a list of layer sizes, so we can use it to quickly iterate over a variety of architectures.

In [3]:
def NN_model(input_shape, layers, act_reg, ker_reg):
    #Having dynamic input shape as I may do feature engineering later.
    X_input = Input(input_shape)
    
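    #First hidden layer (the Input tensor above already defines the input shape)
    X = Dense(layers[0], activation='relu')(X_input)
    #X = LeakyReLU()(X)

    #Remaining hidden layers, with optional activity/kernel regularization
    for i in range(len(layers) - 1):
        X = Dense(layers[i + 1], activation='relu', activity_regularizer = act_reg, kernel_regularizer = ker_reg)(X)
        #X = LeakyReLU()(X)

    #Single sigmoid unit outputs the predicted probability of survival
    X = Dense(1, activation='sigmoid')(X)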

    # Create model. This creates your Keras model instance, you'll use this instance to train/test the model.
    model = Model(inputs = X_input, outputs = X, name='Simple_model')

    return model

Now that we have our function, let's test its performance. For simplicity, let's start with an intuitive architecture in which the layers shrink steadily towards the output.

In [6]:
nn_archit = [20, 14, 10, 7, 5]

first_model = NN_model((24, ), nn_archit, None, regularizers.l2(0.01))
first_model.compile(optimizer = "Adam", loss = "binary_crossentropy", metrics = ["accuracy"])
first_model.fit(x = X_Train, y = Y_Train, epochs = 64)

Epoch 1/64
...
Epoch 64/64


<keras.callbacks.History at 0x1d0d5748>
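
As a quick sanity check on how small this network is, we can count its trainable parameters by hand. The sketch below assumes 24 input features, the hidden layer sizes used above and a single output unit; `count_dense_params` is a hypothetical helper of my own, not a Keras function.

```python
def count_dense_params(n_inputs, hidden_sizes, n_outputs=1):
    """Count weights and biases in a stack of fully connected layers."""
    sizes = [n_inputs] + list(hidden_sizes) + [n_outputs]
    total = 0
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        # Each Dense layer has fan_in * fan_out weights plus fan_out biases.
        total += fan_in * fan_out + fan_out
    return total

n_params = count_dense_params(24, [20, 14, 10, 7, 5])
print(n_params)
```

The result should agree with the total reported by `first_model.summary()`, assuming every Dense layer keeps its bias term (the Keras default).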

Now let's evaluate how our model does, first on the training set and then on the cross-validation set.

In [7]:
preds = first_model.evaluate(x = X_Train, y = Y_Train)
print()
print ("Loss = " + str(preds[0]))
print ("Accuracy = " + str(preds[1]))


Loss = 0.442087286793628
Accuracy = 0.8394437427014376


In [8]:
preds2 = first_model.evaluate(x = X_CV, y = Y_CV)
print()
print ("Loss = " + str(preds2[0]))
print ("Test Accuracy = " + str(preds2[1]))


Loss = 0.42887641668319704
Test Accuracy = 0.84
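
To make the loss figure a little less abstract, here is a minimal pure-Python sketch of binary cross-entropy, the quantity we asked Keras to minimize. The function name, the clipping constant `eps` and the toy labels and probabilities are my own assumptions for illustration. As far as I know, the loss Keras reports also includes the L2 penalty from `kernel_regularizer`, so it will not match this formula exactly.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# Same labels, but the first set of predictions is more confident and correct.
confident = binary_crossentropy([1, 0, 1], [0.9, 0.1, 0.8])
hesitant = binary_crossentropy([1, 0, 1], [0.6, 0.4, 0.6])
print(confident, hesitant)
```

Confident, correct predictions give a lower loss than hesitant ones, even though both sets would score 100% accuracy after rounding. That is why loss and accuracy can move independently.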


So overall our first neural network performs okay, but not great: about 84% accuracy on both the training and cross-validation sets. After spending longer than I'd like iterating over architectures by hand, let's build a function to take care of that for us and output a neural network architecture.

The idea is to randomly sample a number of architectures and return the best one. In an ideal world some form of reinforcement learning algorithm would be used to do this; for now, let's use a brute-force random search.

In [10]:
def Find_Architecture(X_Train_2, Y_Train_2, X_CV_2, Y_CV_2, max_layers = 10, num_iters = 32): 
    best_perf = 0.0
    best_arch, best_train, best_cv = None, None, None
    #Iterate through num_iters iterations
    for i in range(num_iters):
        #Reset the layer list and randomly choose the network depth
        layers = []
        num_layers = randint(3, max_layers)
        prev_layer = X_Train_2.shape[1]
        
        for j in range(num_layers):
            #Randomly generate number of units per layer
            min_size = math.ceil(prev_layer / 2.0)
            lay_size = randint(min_size, prev_layer)
            layers.append(lay_size)
            prev_layer = lay_size
            
        #Build and test model
        test_model = NN_model((X_Train_2.shape[1], ), layers, None, regularizers.l2(0.01))
        test_model.compile(optimizer = "Adam", loss = "binary_crossentropy", metrics = ["accuracy"])
        test_model.fit(x = X_Train_2, y = Y_Train_2, epochs = 64, verbose = 0)
        train_pred = test_model.evaluate(x = X_Train_2, y = Y_Train_2)
        cv_pred = test_model.evaluate(x = X_CV_2, y = Y_CV_2)
        
        #Score each candidate by a weighted sum of accuracies, with training accuracy down-weighted to 0.6
        perform = train_pred[1]*0.6 + cv_pred[1]
        
        if perform > best_perf :
            best_perf = perform
            best_arch = layers
            best_train = train_pred
            best_cv = cv_pred
        
    return best_arch, best_train, best_cv
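
The layer-size sampling inside `Find_Architecture` can be looked at in isolation, without training anything. The sketch below pulls that loop out into a hypothetical helper, `sample_architecture`, and draws a few architectures for 24 input features. The seed is only there to make the sketch repeatable.

```python
import math
import random

def sample_architecture(n_features, max_layers=10, rng=random):
    """Draw one random list of hidden-layer sizes, mirroring the loop above."""
    layers = []
    prev_layer = n_features
    for _ in range(rng.randint(3, max_layers)):
        # Each layer has between half as many units as the previous one
        # (rounded up) and the same number, so widths never grow.
        lay_size = rng.randint(math.ceil(prev_layer / 2.0), prev_layer)
        layers.append(lay_size)
        prev_layer = lay_size
    return layers

rng = random.Random(0)  # seeded only so the sketch is repeatable
samples = [sample_architecture(24, rng=rng) for _ in range(5)]
for arch in samples:
    print(arch)
```

Every sampled network is a funnel: it has between 3 and `max_layers` hidden layers, and each layer is at most as wide as the one before it.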

In [11]:
nn_architecture, train_perf, cv_perf = Find_Architecture(X_Train, Y_Train, X_CV, Y_CV, 10, 32)



In [12]:
print(nn_architecture)
print()
print ("Train Loss = " + str(train_perf[0]))
print ("Train Accuracy = " + str(train_perf[1]))
print()
print ("CV Loss = " + str(cv_perf[0]))
print ("CV Accuracy = " + str(cv_perf[1]))


[19, 11, 11, 8, 7, 4, 4, 4]

Train Loss = 0.4919141821855239
Train Accuracy = 0.8305941837475966

CV Loss = 0.4609449076652527
CV Accuracy = 0.86
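
To see what the selection score inside `Find_Architecture` rewards, here is the same formula applied to the accuracies reported in this notebook. This is only an illustration: `selection_score` is a hypothetical helper, and the hand-picked first model came from a separate run, not from the search itself.

```python
def selection_score(train_acc, cv_acc, train_weight=0.6):
    """Score used to rank candidates: CV accuracy plus down-weighted train accuracy."""
    return train_weight * train_acc + cv_acc

# Accuracies reported earlier in this notebook.
hand_picked = selection_score(0.8394437427014376, 0.84)
searched = selection_score(0.8305941837475966, 0.86)

print(round(hand_picked, 4), round(searched, 4))
```

The searched architecture scores higher despite its slightly lower training accuracy, because cross-validation accuracy carries the larger weight.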


Even with our best neural network model, performance is still not great: 86% cross-validation accuracy, approximately on par with the previous models.

So far, of the individual models, the Gradient Boosting Classifier is the best performing.

## Building an Ensemble

The next step is to see if we can get better performance by making a whole that is greater than the sum of its parts: an ensemble of all of our previous models.

I will bind the models together with another, smaller neural network that takes their predictions as input and produces our final output.

The first step is to define the parameters for each of the models.

In [13]:
# Random Forest parameters
rf_params = {
    'n_jobs': -1,
    'n_estimators': 2000,
    'warm_start': False,
    #'max_features': 0.2,
    'max_depth': 300,
    'min_samples_leaf': 2,
    'max_features': 'sqrt',
    'verbose': 0
}

#Neural Network parameters
nn_params = {
    'layers' : [19, 11, 11, 8, 7, 4, 4, 4],
    'act_reg' : None,
    'ker_reg' : regularizers.l2(0.01),
    'input_shape' : (24,)
}

# Extra Trees Parameters
et_params = {
    'n_jobs': -1,
    'n_estimators': 1000,
    #'max_features': 0.5,
    'max_depth': 32,
    'min_samples_leaf': 2,
    'verbose': 0
}

# Gradient Boosting parameters
gb_params = {
    'n_estimators': 500,
    'learning_rate' : 0.05,
    'max_depth': None,
    'min_samples_leaf': 2,
    'verbose': 0
}

# AdaBoost parameters
ada_params = {
    'n_estimators': 5000,
    'learning_rate' : 0.75
}

#SVM parameters
svm_params = {
    'kernel' : 'rbf',
    'C' : 1.0  
}

Now let's define the classes that build each of the models, train them and generate predictions.

This is a two-step process. The first class is the first layer of the ensemble: it holds each of our 6 models and contains methods to initialize them, train them and generate their predictions.

The second class is the output layer: it contains methods to build the inner models, define and train a second-layer neural network on the output of the inner models, and generate our final output vector $\hat{y}$.

For now I will not include any hyperparameter tuning in this class outside of the second-layer neural network, although this may change later.

In [14]:
class Ensemble_inner_models(object):
    #Initialization method to take all of our model parameters
    def __init__(self, rf_params, ada_params, gb_params, svm_params, et_params, nn_params):
        #Initialize all parameter variables
        self.rf_params = rf_params
        self.ada_params = ada_params
        self.gb_params = gb_params
        self.svm_params = svm_params
        self.et_params = et_params
        self.nn_params = nn_params
        
        
    
    #Now to initialize each of our models
    def initialize_models(self):
        self.svm_mod = svm.SVC(**self.svm_params)
        self.rf_mod =  RandomForestClassifier(**self.rf_params)
        self.ada_mod = AdaBoostClassifier(**self.ada_params)
        self.gb_mod = GradientBoostingClassifier(**self.gb_params)
        self.et_mod = ExtraTreesClassifier(**self.et_params)
        self.nn_mod = NN_model(**self.nn_params)
        self.nn_mod.compile(optimizer = "Adam", loss = "binary_crossentropy", metrics = ["accuracy"])
    
    #Method to train each of the models
    def train_models(self, X_Train, Y_Train, num_epochs = 64):
        self.svm_mod.fit(X_Train, Y_Train)
        self.rf_mod.fit(X_Train, Y_Train)
        self.ada_mod.fit(X_Train, Y_Train)
        self.gb_mod.fit(X_Train, Y_Train)
        self.et_mod.fit(X_Train, Y_Train)
        self.nn_mod.fit(x = X_Train, y = Y_Train, epochs = num_epochs, verbose = 0)
    
    #Debugging function to get the outputs of each system individually
    def get_pred_matrix(self, X_Test):
        #Matrix output will be of shape (m, num_models) - for now hardcoded
        pred_mat = np.zeros((X_Test.shape[0], 6))
        
        #Build prediction matrix: columns 0-4 hold hard 0/1 labels, column 5 holds the network's sigmoid probability
        pred_mat[:, 0] = self.svm_mod.predict(X_Test)
        pred_mat[:, 1] = self.rf_mod.predict(X_Test)
        pred_mat[:, 2] = self.ada_mod.predict(X_Test)
        pred_mat[:, 3] = self.gb_mod.predict(X_Test)
        pred_mat[:, 4] = self.et_mod.predict(X_Test)
        pred_mat[:, 5] = self.nn_mod.predict(X_Test, verbose=0).reshape((X_Test.shape[0]))
        
        return pred_mat
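
The shape of the prediction matrix is easier to see on a toy example. The sketch below is pure Python with three stand-in "models" of my own invention (`rule_a`, `rule_b` and `soft_model` are hypothetical, as is the toy data). Two of them return hard labels and one returns a probability, which mirrors the mix of scikit-learn and Keras outputs in `get_pred_matrix`.

```python
import math

# Three stand-in "base models". The first two return hard 0/1 labels, like
# the scikit-learn classifiers' predict(); the third returns a probability,
# like the Keras network's predict().
def rule_a(x):
    return 1 if x[0] > 0.5 else 0

def rule_b(x):
    return 1 if x[1] > 0.5 else 0

def soft_model(x):
    return 1.0 / (1.0 + math.exp(-(x[0] + x[1] - 1.0)))

def build_pred_matrix(models, X):
    """Stack base-model outputs column-wise: shape (n_samples, n_models)."""
    return [[float(m(x)) for m in models] for x in X]

X_toy = [(0.9, 0.8), (0.2, 0.1), (0.7, 0.3)]
pred_mat = build_pred_matrix([rule_a, rule_b, soft_model], X_toy)
for row in pred_mat:
    print(row)
```

Each row is what the second-layer network sees for one passenger: one number per base model, rather than the original features.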


Now to put that into our final output layer and build the relevant functions.

In [15]:
class ensemble_model(object):
    def __init__(self, rf_params, ada_params, gb_params, svm_params, et_params, nn_params):
        #Initialize all parameter variables
        self.rf_params = rf_params
        self.ada_params = ada_params
        self.gb_params = gb_params
        self.svm_params = svm_params
        self.et_params = et_params
        self.nn_params = nn_params
    
    def build_inner_models(self, X_Train, Y_Train, num_epochs = 64):
        self.inner_models = Ensemble_inner_models(self.rf_params, self.ada_params, self.gb_params, self.svm_params, self.et_params, self.nn_params)
        self.inner_models.initialize_models()
        #Pass num_epochs through so the argument actually controls the inner network's training
        self.inner_models.train_models(X_Train, Y_Train, num_epochs)

    #Helper function to search for a good architecture for the second-layer NN
    def find_ensem_layer(self, X_Train, Y_Train, X_CV, Y_CV, max_depth = 10, num_iters = 20):
        Mod_train_preds = self.inner_models.get_pred_matrix(X_Train)
        Mod_CV_preds = self.inner_models.get_pred_matrix(X_CV)
        best_arch, best_train, best_cv = Find_Architecture(Mod_train_preds, Y_Train, Mod_CV_preds, Y_CV, max_depth, num_iters)
        return best_arch, best_train, best_cv
    
    #Function to build the ensemble layer
    def build_ensem_layer(self, arch, X_Train_2, Y_Train_2, num_epochs = 32, print_result = 0):
        #Compile Ensemble layer
        self.ensemble_mod = NN_model((6,), arch, None, regularizers.l2(0.01))
        self.ensemble_mod.compile(optimizer = "Adam", loss = "binary_crossentropy", metrics = ["accuracy"])
        
        #Perform inner calculations
        Train_preds = self.inner_models.get_pred_matrix(X_Train_2)
        
        #Fit ensemble layer
        self.ensemble_mod.fit(x = Train_preds, y = Y_Train_2, epochs = num_epochs, verbose = 1)
        
        #Output performance metrics
        train_perf = self.ensemble_mod.evaluate(x = Train_preds, y = Y_Train_2)   
        if print_result == 1:
            print ("Train Loss = " + str(train_perf[0]))
            print ("Train Accuracy = " + str(train_perf[1]))
            
        return train_perf
    
    def print_evaluation(self, X_eval, Y_eval):
        inner_preds = self.inner_models.get_pred_matrix(X_eval)
        perf = self.ensemble_mod.evaluate(x = inner_preds, y = Y_eval)
        print ("Loss = " + str(perf[0]))
        print ("Accuracy = " + str(perf[1]))
        
        
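    def get_predictions(self, X_Test):
        #The ensemble layer takes the inner models' predictions as input, not the raw features
        inner_preds = self.inner_models.get_pred_matrix(X_Test)
        final_preds_raw = self.ensemble_mod.predict(inner_preds, verbose=0).reshape((X_Test.shape[0]))
        
        #Round the probabilities to 0/1 class labels
        final_preds = np.around(final_preds_raw)
        return final_preds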

Now that we have our final model built, let's see how it does.

In [29]:
#Initialize model
fin_model = ensemble_model(rf_params, ada_params, gb_params, svm_params, et_params, nn_params)
fin_model.build_inner_models(X_Train, Y_Train)

#Architecture found by previously running the architecture search function
fin_model.build_ensem_layer([4, 4, 3, 2], X_Train, Y_Train, print_result = 1, num_epochs = 128)

Train Loss = 0.09697772002532871
Train Accuracy = 0.9873577749683944


[0.09697772002532871, 0.9873577749683944]

In [30]:
#Print Training performance
fin_model.print_evaluation(X_Train, Y_Train)

Loss = 0.09697772002532871
Accuracy = 0.9873577749683944


In [31]:
#Print CV Performance
fin_model.print_evaluation(X_CV, Y_CV)


Loss = 0.6704975652694702
Accuracy = 0.84


So our ensemble model is, alas, not greater than the sum of its parts. Training accuracy is about 98.7% while cross-validation accuracy is only 84%, so the second layer seems to rely heavily on the classifiers that overfit the training data.

Perhaps this could be fixed by better regularization of the decision-tree-based classifiers, or by adding dropout regularization to the second-layer network.
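
Another common remedy, which this notebook does not use, is to train the second layer on out-of-fold predictions, so that it never sees a base model's output on data that model was fitted on. The sketch below is a toy illustration only: `Memorizer` is a deliberately overfitting stand-in model, and the data and fold scheme are made up.

```python
class Memorizer:
    """Toy base model that overfits by memorizing its training set."""

    def fit(self, X, y):
        self.lookup = dict(zip(X, y))
        # Fall back to the majority class for unseen inputs.
        self.default = int(sum(y) * 2 >= len(y))
        return self

    def predict(self, X):
        return [self.lookup.get(x, self.default) for x in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def out_of_fold_predictions(make_model, X, y, n_folds=4):
    """Predict each sample with a model that never saw it during fitting."""
    preds = [None] * len(X)
    for fold in range(n_folds):
        held_out = [i for i in range(len(X)) if i % n_folds == fold]
        kept = [i for i in range(len(X)) if i % n_folds != fold]
        model = make_model().fit([X[i] for i in kept], [y[i] for i in kept])
        for i, p in zip(held_out, model.predict([X[i] for i in held_out])):
            preds[i] = p
    return preds


# Made-up data: 12 distinct inputs with labels that follow no simple rule.
X_toy = list(range(12))
y_toy = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1]

in_sample_acc = accuracy(y_toy, Memorizer().fit(X_toy, y_toy).predict(X_toy))
oof_preds = out_of_fold_predictions(Memorizer, X_toy, y_toy)
oof_acc = accuracy(y_toy, oof_preds)
print(in_sample_acc, oof_acc)
```

The memorizing model looks perfect on its own training data, which is exactly the signal that tempts a second layer into trusting it. Its out-of-fold accuracy is far lower and gives the second layer a more honest picture of how much each base model should count.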

So let us return to using a simple neural network classifier and try out different combinations of feature engineering to test performance and hopefully get a better model.

Part 5 to follow.