# Simple NN Using Keras API

Before diving into building an NN from first principles with TensorFlow, we will first implement a simple NN that makes heavy use of the Keras API.

This NN will have 1 input layer, 1 hidden layer, and 1 output layer. It will learn to predict whether or not a given passenger survived the sinking of the Titanic. Since there are only two possible outcomes, this is a *binary classification* problem.

```python
import math

import numpy as np
import pandas as pd
import tensorflow as tf

print('~*~*~* TensorFlow Version {} *~*~*~'.format(tf.__version__))
```

```
~*~*~* TensorFlow Version 2.1.0 *~*~*~
```


## Load data

Load the data that we have already pre-processed for ML training and testing.

```python
# load the pre-processed data
files_dir = '../../files/'
df = pd.read_csv(files_dir + 'processed_data.csv')

# split into train and test depending on whether the row has a 'Survived' label
labelled = pd.notnull(df['Survived'])
x_train = df[labelled].drop(['Survived'], axis=1)
y_train = df[labelled]['Survived']
x_test = df[~labelled].drop(['Survived'], axis=1)

# extract NumPy arrays from the DataFrames and reshape the labels into an
# (n, 1) column so that they match the shape of the model's output
y_train = y_train.to_numpy().reshape(-1, 1)
x_train = x_train.to_numpy()

print('x train: {}'.format(x_train.shape))
print('y train: {}'.format(y_train.shape))
```

```
x train: (891, 14)
y train: (891, 1)
```
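The label-based split above can be tried out on a tiny frame. The sketch below uses a hypothetical toy DataFrame (made-up columns and values, prefixed `toy_` so it does not overwrite the notebook's real variables) to show that labelled rows become training data, unlabelled rows become test data, and `reshape(-1, 1)` turns the labels into an `(n, 1)` column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for processed_data.csv:
# rows with a 'Survived' value are training rows, NaN rows are test rows.
toy_df = pd.DataFrame({
    'Age': [0.1, -0.5, 1.2, 0.3, -1.0],
    'Fare': [0.7, 0.2, -0.4, 1.1, 0.0],
    'Survived': [1.0, 0.0, 1.0, np.nan, np.nan],
})

toy_labelled = toy_df['Survived'].notnull()  # boolean mask of labelled rows
toy_x_train = toy_df[toy_labelled].drop(columns=['Survived']).to_numpy()
# labels as an (n, 1) column, matching a single-unit output layer
toy_y_train = toy_df.loc[toy_labelled, 'Survived'].to_numpy().reshape(-1, 1)
toy_x_test = toy_df[~toy_labelled].drop(columns=['Survived'])

print(toy_x_train.shape, toy_y_train.shape, toy_x_test.shape)
```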


## Build NN

The Keras API makes building NN models very simple: we just stack layers sequentially. For our simple model, we only need 3 stacked modules:

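1. **Input layer:** Declares the shape of each input sample via `input_shape` (here 14 features). `Flatten` reshapes each sample into a 1-D vector; since our data is already flat, it simply passes the 14 values through, one per input unit.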

2. **Hidden layer:** A hidden layer consists of 2 parts: (i) a linear operation and (ii) a non-linear activation function. The number of units is a free hyperparameter; here we use 20 units with a ReLU activation. Keras automatically initialises a weight matrix, multiplies the input data by these weights, and adds a bias vector. The result of this linear operation is then passed through the non-linear activation function (there are many different types of activation function).

3. **Output layer:** The output layer takes the outputs of the previous layer and performs another linear operation, multiplying them by a set of weights and adding biases. The elements of the resulting vector are known as 'logits': the raw (non-normalised) 'predictions' of the classification model. The larger a class's logit, the more 'confident' the model is that this class is the true class of the input, so in a multi-class model the class with the highest logit is the chosen classification. In our binary classification case there are only 2 categories (0 = died, 1 = survived), so the output layer needs only 1 unit: a positive logit favours 'survived' and a negative logit favours 'died'.
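The forward pass through the hidden and output layers described above can be sketched in plain NumPy. This is an illustration only: the weights are random normal values chosen for the example (Keras uses its own initialiser), and the input row is random rather than a real passenger. It shows the shapes involved: 14 features in, 20 ReLU hidden activations, 1 logit out.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features, n_hidden = 14, 20          # 14 input columns, 20 hidden units
x = rng.normal(size=(1, n_features))   # one illustrative (random) input row

# Hidden layer: linear map (weights + bias) followed by a ReLU non-linearity
W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
h = np.maximum(0.0, x @ W1 + b1)       # shape (1, 20)

# Output layer: a single linear unit produces one logit per input row
W2 = rng.normal(scale=0.1, size=(n_hidden, 1))
b2 = np.zeros(1)
logit = h @ W2 + b2                    # shape (1, 1)

print(h.shape, logit.shape)
```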

After the output layer, the logits are usually normalised into a probability distribution over all possible classes (non-negative values that sum to 1), e.g. using a softmax function. In our binary classification case there is only one output value, so we instead use the sigmoid function to map it to a probability between 0 and 1 that the passenger survived. Later, during testing, we round this probability to 0 or 1 to get the final answer of the NN.
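For a single logit $z$, the sigmoid $\sigma(z) = 1/(1+e^{-z})$ gives exactly the same 'survived' probability as a two-class softmax whose 'died' logit is fixed at 0, which is why one sigmoid unit is enough here. The NumPy sketch below checks this for an arbitrary, illustrative logit; `sigmoid` and `softmax` are local helper functions written for this example, not Keras calls.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

z = 0.8                        # an illustrative single logit
p_survived = sigmoid(z)

# the same probability from a two-class softmax with the "died" logit fixed at 0
p_two_class = softmax(np.array([0.0, z]))

# rounding the probability gives the final 0/1 prediction
prediction = int(round(p_survived))
print(p_survived, p_two_class, prediction)
```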

```python
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(x_train.shape[1],)),  # input: 14 features
    tf.keras.layers.Dense(units=20, activation='relu'),        # hidden layer
    tf.keras.layers.Dense(units=1, activation='sigmoid'),      # survival probability
])
```

#### Example

As an example, we can pass the first datum of our training data through our initialised (untrained) model to get the probability the model assigns to that passenger having survived. N.B. if we had not specified an activation in the output layer, the model's output would be the raw, non-normalised logit; since we included a sigmoid activation, the logit is converted into a probability.

```python
print('First datum:\n{}'.format(x_train[:1]))
predictions = model(x_train[:1]).numpy()
print('Initialised model output probability that passenger survived: {}'.format(predictions))
```

```
First datum:
[[-0.58162831 -0.4449995   0.84191642  1.          0.48128777  0.
   0.          1.          1.          0.          0.          0.
   0.          0.        ]]
```

```
Initialised model output probability that passenger survived: [[0.73537254]]
```


## Define Loss Function

Every NN needs a loss function for the optimiser to minimise via backpropagation.

So far, we have only initialised our model; we have not trained it. Its weights and biases have therefore not been optimised, and the untrained model's predicted probabilities (like the 0.735 above) are essentially arbitrary.

A common loss function for binary classification problems is the binary cross-entropy. For each datum, it is the negative log of the probability the model assigns to the true class: it is 0 only when the model is certain of the correct class, and it grows as the model assigns less probability to the true class.
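Averaged over $N$ data with labels $y_i \in \{0, 1\}$ and predicted survival probabilities $p_i$, the binary cross-entropy is

$$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right].$$

The NumPy sketch below (hypothetical helper names, illustrative values) computes it two ways: from probabilities, and directly from logits using the numerically stable form $\max(z, 0) - zy + \log(1 + e^{-|z|})$. Both agree only when each is fed the kind of input it expects, which is why a Keras loss's `from_logits` flag has to match whether the model's last layer applies a sigmoid.

```python
import numpy as np

def bce_from_probs(y, p, eps=1e-7):
    # clip probabilities away from 0 and 1 to avoid log(0)
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def bce_from_logits(y, z):
    # numerically stable form of the same loss, computed from raw logits
    return np.mean(np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z))))

y = np.array([1.0, 0.0, 1.0])    # true labels
z = np.array([2.0, -1.0, -0.5])  # illustrative logits
p = 1.0 / (1.0 + np.exp(-z))     # sigmoid probabilities

print(bce_from_probs(y, p), bce_from_logits(y, z))
```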

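```python
# the model's last layer already applies a sigmoid, so the loss receives
# probabilities rather than logits: from_logits must be False
loss_fn = tf.keras.losses.BinaryCrossentropy(from_logits=False)
```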

## Compile Model & Train

We are now ready to compile our model, setting the optimiser, the loss function, and the metric(s) to track, and then to train it.

```python
# compile nn
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])

# train nn (batch_size=1 means one weight update per training sample)
model.fit(x_train, y_train, epochs=10, batch_size=1)
```

```
Train on 891 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10

<tensorflow.python.keras.callbacks.History at 0x7efe443cbe10>
```
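Keras performs one optimiser update per mini-batch, so the batch size sets how many updates each epoch makes. The small sketch below (the `steps_per_epoch` helper is our own, not a Keras function) works this out for our 891 training rows: `batch_size=1` gives 891 updates per epoch, whereas Keras' default batch size of 32 would give 28, the last batch being partial.

```python
import math

def steps_per_epoch(n_samples, batch_size):
    # number of mini-batches per epoch; the final batch may be partial
    return math.ceil(n_samples / batch_size)

n_samples = 891  # rows in x_train
for batch_size in (1, 32):
    print(batch_size, steps_per_epoch(n_samples, batch_size))
```

Smaller batches give noisier but more frequent updates; with `epochs=10` and `batch_size=1`, training above makes 8910 updates in total.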

## Test Model

With our model trained, we can now test it on the unseen (and unlabelled) test data. To get firm 0/1 predictions from our binary classifier, we round the probabilities it outputs to 0 or 1. Since this is for a Kaggle competition, we save the predictions generated by our trained NN to a CSV file in the required format.

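```python
# predict survival probabilities for the unlabelled test passengers;
# model.predict returns shape (n, 1), so flatten to 1-D
probs = model.predict(x_test.to_numpy()).ravel()

# assumes the rows of x_test are in the same order as test.csv
test = pd.read_csv(files_dir + 'test.csv')
# round to firm 0/1 predictions (threshold at 0.5)
test['Survived'] = (probs > 0.5).astype(int)
solution = test[['PassengerId', 'Survived']]
```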
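The thresholding and CSV formatting can be checked in isolation. The sketch below uses made-up probabilities with the `(n, 1)` shape that `model.predict` returns and hypothetical `PassengerId` values (names prefixed `toy_` so the real `solution` is not overwritten), and writes the CSV to an in-memory buffer instead of a file to show the two-column layout.

```python
import io

import numpy as np
import pandas as pd

# illustrative sigmoid outputs, shaped (n, 1) like model.predict's output
toy_probs = np.array([[0.12], [0.5], [0.73], [0.98]], dtype=np.float32)

# flatten to 1-D and threshold at 0.5 to get firm 0/1 labels
toy_labels = (toy_probs.ravel() > 0.5).astype(int)

# hypothetical PassengerIds; the submission has exactly these two columns
toy_solution = pd.DataFrame({'PassengerId': [892, 893, 894, 895],
                             'Survived': toy_labels})

buf = io.StringIO()
toy_solution.to_csv(buf, index=False)
print(buf.getvalue())
```

Note that a probability of exactly 0.5 maps to 0 here, which matches rounding half to even.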

```python
# save trained nn solution for Kaggle entry
solution.to_csv(files_dir + 'nn_solution.csv', index=False)
```