# Assembling the Model

### Task
Build a machine learning model that predicts whether a customer will buy again. This is a classification problem with two classes, won't buy and will buy, represented by 0 and 1 respectively.

### Import the relevant libraries

In [1]:
import numpy as np
import tensorflow as tf

### Data

In [2]:
npz = np.load('Audiobook_train_data.npz')

train_inputs = npz['inputs'].astype(float)
train_targets = npz['targets'].astype(float)
# astype() creates a copy of the array, cast to the given type
# to make sure the model learns correctly, we expect all inputs to be floats
# the targets are 0 and 1, but we cannot be sure whether they will be extracted as integers, floats or booleans, so we cast them too

npz = np.load('Audiobook_validation_data.npz')

validation_inputs = npz['inputs'].astype(float)
validation_targets = npz['targets'].astype(float)

npz = np.load('Audiobook_test_data.npz')

test_inputs , test_targets = npz['inputs'].astype(float) , npz['targets'].astype(float)
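
The comments above note that astype() returns a cast copy rather than changing the array in place. A minimal sketch of that behaviour, using a made-up boolean array (`raw_targets` is a hypothetical stand-in, not the Audiobook data):

```python
import numpy as np

# Hypothetical targets that happened to be stored as booleans.
raw_targets = np.array([True, False, True])

# astype() returns a new array with the requested dtype (float64 here).
float_targets = raw_targets.astype(float)

# Changing the copy leaves the original array untouched.
float_targets[0] = 0.5

print(raw_targets.dtype, float_targets.dtype)
print(raw_targets[0], float_targets[0])
```

Casting every split the same way means the model sees one consistent dtype regardless of how the arrays were saved.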

### Model

Outline, Optimizers, Loss, Early Stopping and Training

In [3]:
input_size = 10
hidden_layer_size = 50
output_size = 2

model = tf.keras.Sequential([
                                tf.keras.layers.Dense(hidden_layer_size, activation = 'relu'),
                                tf.keras.layers.Dropout(0.2),
                                tf.keras.layers.Dense(hidden_layer_size, activation = 'relu'),
                                tf.keras.layers.Dropout(0.2),
                                tf.keras.layers.Dense(output_size, activation = 'softmax')
                            ])
model.compile(optimizer = 'adam', loss = 'sparse_categorical_crossentropy', metrics = ['accuracy'])

batch_size = 100
max_epochs = 100

# we can feed the training data either as a 2-tuple object or as two separate arrays
# when dealing with arrays, specifying batch_size will automatically batch the data during training
'''
# case 1: no early stopping
model.fit(train_inputs, 
          train_targets,
          batch_size= batch_size,
          epochs = max_epochs,
          validation_data = (validation_inputs, validation_targets),
          verbose = 2,
         )
'''
# case 2: with early stopping
early_stopping = tf.keras.callbacks.EarlyStopping(patience=1)
# by default, it stops the first time the validation loss starts increasing
model.fit(train_inputs, 
          train_targets,
          batch_size= batch_size,
          epochs = max_epochs,
          callbacks = [early_stopping],  # list of callbacks []
          validation_data = (validation_inputs, validation_targets),
          verbose = 2,
         )

Epoch 1/100
36/36 - 1s - loss: 0.6476 - accuracy: 0.6038 - val_loss: 0.5351 - val_accuracy: 0.7405 - 1s/epoch - 31ms/step
Epoch 2/100
36/36 - 0s - loss: 0.5263 - accuracy: 0.7346 - val_loss: 0.4575 - val_accuracy: 0.7606 - 116ms/epoch - 3ms/step
Epoch 3/100
36/36 - 0s - loss: 0.4717 - accuracy: 0.7558 - val_loss: 0.4270 - val_accuracy: 0.7673 - 109ms/epoch - 3ms/step
Epoch 4/100
36/36 - 0s - loss: 0.4584 - accuracy: 0.7538 - val_loss: 0.4100 - val_accuracy: 0.7740 - 125ms/epoch - 3ms/step
Epoch 5/100
36/36 - 0s - loss: 0.4393 - accuracy: 0.7614 - val_loss: 0.4007 - val_accuracy: 0.7875 - 119ms/epoch - 3ms/step
Epoch 6/100
36/36 - 0s - loss: 0.4306 - accuracy: 0.7717 - val_loss: 0.3932 - val_accuracy: 0.7875 - 113ms/epoch - 3ms/step
Epoch 7/100
36/36 - 0s - loss: 0.4248 - accuracy: 0.7723 - val_loss: 0.3891 - val_accuracy: 0.7875 - 112ms/epoch - 3ms/step
Epoch 8/100
36/36 - 0s - loss: 0.4155 - accuracy: 0.7804 - val_loss: 0.3856 - val_accuracy: 0.7875 - 118ms/epoch - 3ms/step
Epoch 9/10

<keras.callbacks.History at 0x24cc40da770>

1. Without early stopping: when we train the model for too many epochs, there is a risk of overfitting. In that run the training loss decreased consistently while the validation loss sometimes increased and sometimes decreased, which indicates that the model had overfitted.
2. With early stopping: fit() takes an argument called callbacks. These are functions called at certain steps during model training. We focus on EarlyStopping. Each time the validation loss is calculated, it is compared with the lowest validation loss seen so far. If it no longer improves, the model is likely starting to overfit.
3. When the increase in validation loss is very small, we may want to let one or two increases slide. The patience argument of EarlyStopping sets this tolerance.
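
To make the tolerance idea concrete, here is a simplified pure-Python sketch of a patience rule. It is an illustration only, not the Keras implementation: the function `stopping_epoch` and the loss values are made up for this example.

```python
def stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None.

    Simplified rule: stop once `patience` consecutive epochs pass
    without the validation loss beating the best value seen so far.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None  # never triggered: training runs for all epochs


# Made-up validation losses with one tiny bump at epoch 4.
losses = [0.54, 0.46, 0.43, 0.431, 0.41, 0.40, 0.42, 0.43]

print(stopping_epoch(losses, patience=1))  # 4
print(stopping_epoch(losses, patience=2))  # 8
```

With a tolerance of one epoch the tiny bump at epoch 4 ends training, while a tolerance of two lets the loss keep falling to 0.40 before stopping. In the cell above, the equivalent change would be `tf.keras.callbacks.EarlyStopping(patience=2)`.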

### Test the model

In [4]:
model.evaluate(test_inputs,test_targets)



[0.35102730989456177, 0.8013392686843872]
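
evaluate() returns the loss followed by the metrics passed to compile(), so the two numbers above are the test loss (about 0.35) and the test accuracy (about 80%). As a rough illustration of what these two quantities measure, here is a NumPy sketch on made-up softmax outputs. The `probabilities` and `targets` arrays are hypothetical, not taken from the test set.

```python
import numpy as np

# Made-up softmax outputs for four customers: columns are P(class 0), P(class 1).
probabilities = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.6, 0.4],
    [0.3, 0.7],
])
targets = np.array([0, 1, 1, 1])  # integer class labels, as in the notebook

# Sparse categorical cross-entropy: mean negative log-probability of the true class.
true_class_prob = probabilities[np.arange(len(targets)), targets]
loss = -np.mean(np.log(true_class_prob))

# Accuracy: share of rows where the most probable class matches the target.
predictions = probabilities.argmax(axis=1)
accuracy = np.mean(predictions == targets)

print(loss, accuracy)
```

The third customer is misclassified, which lowers the accuracy to 0.75 and also contributes the largest term to the loss.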