
### Problem
You are given data from an Audiobook app. Logically, it relates only to the audio versions of books. Each customer in the database has made at least one purchase, which is why they appear in the database. We want to create a machine learning algorithm based on our available data that can predict if a customer will buy again from the Audiobook company.

The main idea is that if a customer has a low probability of coming back, there is no reason to spend any money on advertising to them. If we can focus our efforts ONLY on customers that are likely to convert again, we can make great savings. Moreover, this model can identify the most important metrics for a customer to come back again. Identifying new customers creates value and growth opportunities.

You have a .csv summarizing the data. There are several variables: Customer ID, Book length in mins_avg (average of all purchases), Book length in minutes_sum (sum of all purchases), Price Paid_avg (average of all purchases), Price paid_sum (sum of all purchases), Review (a Boolean variable), Review (out of 10), Total minutes listened, Completion (from 0 to 1), Support requests (number), and Last visited minus purchase date (in days).

These are the inputs, excluding the customer ID: it is completely arbitrary and works more like a name than a number.

The targets are a Boolean variable (so 0, or 1). We are taking a period of 2 years in our inputs, and the next 6 months as targets. So, in fact, we are predicting if: based on the last 2 years of activity and engagement, a customer will convert in the next 6 months. 6 months sounds like a reasonable time. If they don't convert after 6 months, chances are they've gone to a competitor or didn't like the Audiobook way of digesting information.

The task is simple: create a machine learning algorithm, which is able to predict if a customer will buy again.

This is a classification problem with two classes: won't buy and will buy, represented by 0s and 1s.


# Import the relevant libraries

In [8]:
import numpy as np
import tensorflow as tf

# Data

In [9]:
# we declare a temporary variable named npz that stores each of the three datasets as we load them
npz = np.load('Audiobooks_data_train.npz')
# each npz file holds separate 'inputs' and 'targets' arrays, so we extract them separately
# note: the np.float and np.int aliases were removed in NumPy 1.24, so we use explicit dtypes
train_inputs = npz['inputs'].astype(np.float64) # make sure all inputs are floats

# targets must be integers because sparse_categorical_crossentropy expects integer class labels
train_targets = npz['targets'].astype(np.int64) # make sure all targets are ints

# use npz again to load the other datasets
npz = np.load('Audiobooks_data_validation.npz')
validation_inputs = npz['inputs'].astype(np.float64)
validation_targets = npz['targets'].astype(np.int64)

npz = np.load('Audiobooks_data_test.npz')
test_inputs = npz['inputs'].astype(np.float64)
test_targets = npz['targets'].astype(np.int64)

# Finally, note that unlike before, our train, validation and test data are plain arrays
# instead of the iterators we used for MNIST.

# Build the model

### 1- Outline the model

In [10]:
input_size = 10  # we have 10 features (predictors)
output_size = 2  # buy or not buy (0,1)

# Use same hidden layer size for both hidden layers. Not a necessity.
hidden_layer_size = 50
    
# define what the model will look like
model = tf.keras.Sequential([
    
    # as we have already preprocessed our data, there is no need to flatten it
    # removed: tf.keras.layers.Flatten(input_shape=(28, 28, 1)), # input layer from the MNIST example
    
    # tf.keras.layers.Dense basically implements: output = activation(dot(input, weight) + bias)
    # it takes several arguments, but the most important ones for us are the hidden_layer_size and the activation function
    # we use 'relu' as our activation function
    # to find out which activation function works better, we can run the model with different ones and compare the results
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 1st hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 2nd hidden layer
    
    # the final layer is no different, we just make sure to activate it with softmax since our model is a classifier
    # why softmax: in a classifier, the activation function of the output layer must transform values into probabilities
    tf.keras.layers.Dense(output_size, activation='softmax') # output layer
    ])
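
To make the formula `output = activation(dot(input, weight) + bias)` concrete, here is a minimal pure-Python sketch of one ReLU layer followed by a softmax output layer. The weights, biases, and the three-feature sample are made-up toy numbers chosen for illustration only; they are not taken from the trained model, and Keras performs the same arithmetic on whole batches with tensors.

```python
import math


def relu(values):
    """Replace negative values with zero."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(values)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases, activation):
    """Compute activation(dot(inputs, weights) + biases) for a single sample.

    weights has one row per input feature and one column per unit.
    """
    pre_activations = [
        sum(x * w for x, w in zip(inputs, column)) + b
        for column, b in zip(zip(*weights), biases)
    ]
    return activation(pre_activations)


# Hypothetical toy numbers: 3 input features, 2 hidden units, 2 output classes.
sample = [1.0, 2.0, 0.5]
hidden_weights = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.8]]
hidden_biases = [0.1, 0.0]
output_weights = [[1.0, -1.0], [0.5, 0.5]]
output_biases = [0.0, 0.0]

hidden = dense(sample, hidden_weights, hidden_biases, relu)
probabilities = dense(hidden, output_weights, output_biases, softmax)
print(hidden)
print(probabilities, sum(probabilities))
```

Whatever the raw scores are, the softmax output is non-negative and sums to 1, which is why it can be read as "probability of not buying" and "probability of buying".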

### 2- Choose the optimizer and the loss function

In [11]:
# we define the optimizer we'd like to use; Adam is a common default because it combines momentum with RMSprop-style adaptive learning rates
# for the loss function, cross-entropy is the natural choice for classification; TensorFlow 2 offers three built-in variants:
# binary cross-entropy: for binary (0/1) targets with a single output unit,
# categorical cross-entropy: for one-hot encoded targets,
# sparse categorical cross-entropy: for integer class labels.
# we use sparse_categorical_crossentropy because our targets are integers (0 or 1); the loss treats them as class indices, so we do not need to one-hot encode them ourselves
# finally, we list the metrics we want reported after each epoch
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy']) # metrics=['accuracy'] : because we want to calculate accuracy after each epoch
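
The difference between the categorical and sparse categorical variants is only in how the target is written, not in the value of the loss. The sketch below computes both by hand for a single sample; the predicted probabilities are invented for illustration, and the two helper functions are simplified stand-ins rather than the Keras implementations.

```python
import math


def sparse_categorical_crossentropy(target_index, probabilities):
    """Loss when the target is an integer class label."""
    return -math.log(probabilities[target_index])


def categorical_crossentropy(one_hot_target, probabilities):
    """Loss when the target is a one-hot vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probabilities))


# Hypothetical softmax output for one customer: 20% "won't buy", 80% "will buy".
probabilities = [0.2, 0.8]

integer_target = 1           # the format our targets are in
one_hot_target = [0.0, 1.0]  # the same target, one-hot encoded

sparse_loss = sparse_categorical_crossentropy(integer_target, probabilities)
one_hot_loss = categorical_crossentropy(one_hot_target, probabilities)
print(sparse_loss, one_hot_loss)
```

Both calls give the same number, so choosing the sparse variant simply saves us the one-hot encoding step.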



### 3- Training

In [5]:
# regarding batch size: we are not using iterable dataset objects here but plain arrays, so the batch size is specified when we fit the model
# setting HYPERPARAMETERS:
BATCH_SIZE = 100
MAX_EPOCHS = 100

# we fit the model, specifying the training data, the batch size, the maximum number of epochs
# and the validation data we loaded ourselves, in the format: (inputs, targets)
# note that the training inputs and targets are passed as two separate arrays
model.fit(train_inputs,
          train_targets,
          batch_size = BATCH_SIZE,
          epochs=MAX_EPOCHS, validation_data = (validation_inputs, validation_targets), verbose=2)

Train on 3579 samples, validate on 447 samples
Epoch 1/100
3579/3579 - 0s - loss: 0.5166 - accuracy: 0.8002 - val_loss: 0.3840 - val_accuracy: 0.8837
Epoch 2/100
3579/3579 - 0s - loss: 0.3525 - accuracy: 0.8785 - val_loss: 0.3067 - val_accuracy: 0.8814
Epoch 3/100
3579/3579 - 0s - loss: 0.3152 - accuracy: 0.8846 - val_loss: 0.2880 - val_accuracy: 0.8949
Epoch 4/100
3579/3579 - 0s - loss: 0.2957 - accuracy: 0.8908 - val_loss: 0.2668 - val_accuracy: 0.8971
Epoch 5/100
3579/3579 - 0s - loss: 0.2825 - accuracy: 0.8955 - val_loss: 0.2603 - val_accuracy: 0.8993
Epoch 6/100
3579/3579 - 0s - loss: 0.2739 - accuracy: 0.8994 - val_loss: 0.2462 - val_accuracy: 0.9083
Epoch 7/100
3579/3579 - 0s - loss: 0.2671 - accuracy: 0.9000 - val_loss: 0.2443 - val_accuracy: 0.9083
Epoch 8/100
3579/3579 - 0s - loss: 0.2624 - accuracy: 0.9011 - val_loss: 0.2405 - val_accuracy: 0.9128
Epoch 9/100
3579/3579 - 0s - loss: 0.2575 - accuracy: 0.9042 - val_loss: 0.2384 - val_accuracy: 0.9128
Epoch 10/100
3579/3579 - 0

Epoch 80/100
3579/3579 - 0s - loss: 0.2062 - accuracy: 0.9193 - val_loss: 0.2131 - val_accuracy: 0.9306
Epoch 81/100
3579/3579 - 0s - loss: 0.2084 - accuracy: 0.9209 - val_loss: 0.2203 - val_accuracy: 0.9284
Epoch 82/100
3579/3579 - 0s - loss: 0.2072 - accuracy: 0.9195 - val_loss: 0.2132 - val_accuracy: 0.9351
Epoch 83/100
3579/3579 - 0s - loss: 0.2090 - accuracy: 0.9204 - val_loss: 0.2116 - val_accuracy: 0.9351
Epoch 84/100
3579/3579 - 0s - loss: 0.2033 - accuracy: 0.9226 - val_loss: 0.2246 - val_accuracy: 0.9306
Epoch 85/100
3579/3579 - 0s - loss: 0.2036 - accuracy: 0.9212 - val_loss: 0.2090 - val_accuracy: 0.9351
Epoch 86/100
3579/3579 - 0s - loss: 0.2018 - accuracy: 0.9237 - val_loss: 0.2231 - val_accuracy: 0.9217
Epoch 87/100
3579/3579 - 0s - loss: 0.2050 - accuracy: 0.9201 - val_loss: 0.2271 - val_accuracy: 0.9306
Epoch 88/100
3579/3579 - 0s - loss: 0.2030 - accuracy: 0.9223 - val_loss: 0.2326 - val_accuracy: 0.9262
Epoch 89/100
3579/3579 - 0s - loss: 0.2032 - accuracy: 0.9209 - 

<tensorflow.python.keras.callbacks.History at 0x29e025c2b48>

In [12]:
# as you can see from the results above, the training loss keeps decreasing slowly while the validation loss fluctuates,
# which is an indicator of overfitting
# so 100 epochs is too many, and we should have used an early stopping procedure, which is missing here
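
One way to check this claim is to look for the epoch with the lowest validation loss. The sketch below uses the values for epochs 80 to 88 copied from the log above; in the notebook itself the same lists would come from the `History` object returned by `model.fit` (for example `history.history['val_loss']`), which this run did not store in a variable.

```python
# Losses for epochs 80-88, copied from the training log above.
train_loss = {80: 0.2062, 81: 0.2084, 82: 0.2072, 83: 0.2090, 84: 0.2033,
              85: 0.2036, 86: 0.2018, 87: 0.2050, 88: 0.2030}
val_loss = {80: 0.2131, 81: 0.2203, 82: 0.2132, 83: 0.2116, 84: 0.2246,
            85: 0.2090, 86: 0.2231, 87: 0.2271, 88: 0.2326}

# Epoch with the lowest validation loss in this window.
best_epoch = min(val_loss, key=val_loss.get)

# Count how often the validation loss went up compared with the previous epoch.
epochs = sorted(val_loss)
rises = sum(
    1 for previous, current in zip(epochs, epochs[1:])
    if val_loss[current] > val_loss[previous]
)

print("best epoch in window:", best_epoch)
print("epochs where val_loss rose:", rises, "of", len(epochs) - 1)
for epoch in epochs:
    gap = val_loss[epoch] - train_loss[epoch]
    print(epoch, "val - train gap: {:.4f}".format(gap))
```

In this window the validation loss rises more often than it falls, while the training loss stays roughly flat, which matches the overfitting reading above.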


#### Correction: set up an early stopping mechanism 

In [13]:
# the fit method has an argument called "callbacks"
# callbacks are functions that are called at certain points during model training
# many callbacks are available: https://www.tensorflow.org/api_docs/python/tf/keras/callbacks
# here we use EarlyStopping: it stops training when a monitored quantity has stopped improving
# when monitoring the validation loss, it compares each epoch's value with the best value seen so far;
# training stops once val_loss has gone `patience` epochs without improving (with patience=0, at the first epoch that fails to improve)
# so we need to define another HYPERPARAMETER
EARLY_STOPPING = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=2)

# pass it to fit through the callbacks argument
model.fit(train_inputs,
          train_targets,
          batch_size = BATCH_SIZE,
          epochs=MAX_EPOCHS,
          callbacks = [EARLY_STOPPING],
          validation_data = (validation_inputs, validation_targets), verbose=2)

Train on 3579 samples, validate on 447 samples
Epoch 1/100
3579/3579 - 0s - loss: 0.5117 - accuracy: 0.7972 - val_loss: 0.4000 - val_accuracy: 0.8613
Epoch 2/100
3579/3579 - 0s - loss: 0.3534 - accuracy: 0.8768 - val_loss: 0.3134 - val_accuracy: 0.8881
Epoch 3/100
3579/3579 - 0s - loss: 0.3115 - accuracy: 0.8849 - val_loss: 0.2870 - val_accuracy: 0.8926
Epoch 4/100
3579/3579 - 0s - loss: 0.2920 - accuracy: 0.8921 - val_loss: 0.2673 - val_accuracy: 0.8993
Epoch 5/100
3579/3579 - 0s - loss: 0.2780 - accuracy: 0.8961 - val_loss: 0.2541 - val_accuracy: 0.9060
Epoch 6/100
3579/3579 - 0s - loss: 0.2691 - accuracy: 0.8994 - val_loss: 0.2445 - val_accuracy: 0.9128
Epoch 7/100
3579/3579 - 0s - loss: 0.2613 - accuracy: 0.9003 - val_loss: 0.2392 - val_accuracy: 0.9128
Epoch 8/100
3579/3579 - 0s - loss: 0.2562 - accuracy: 0.9033 - val_loss: 0.2322 - val_accuracy: 0.9150
Epoch 9/100
3579/3579 - 0s - loss: 0.2516 - accuracy: 0.9036 - val_loss: 0.2401 - val_accuracy: 0.9217
Epoch 10/100
3579/3579 - 0

<tensorflow.python.keras.callbacks.History at 0x29e03d48088>

In [14]:
# you can see that training now stops after only a few epochs instead of running all 100, with validation accuracy around 91%; the extra epochs in the first run were mostly overfitting
# patience is the number of epochs without improvement in val_loss that we tolerate before stopping; its default is 0
# here we set patience=2 so that one small, insignificant rise in validation loss does not end training too early
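
The effect of `patience` can be illustrated with a small simulation. The function below is a simplified sketch of the stopping rule, not the Keras source: it ignores options such as `min_delta` and `restore_best_weights`. The first list holds the validation losses for epochs 1 to 9 from the log above (the log is cut off after that); the second list is a hypothetical series added only to show a case where `patience=2` triggers.

```python
def stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None.

    Simplified rule: stop once the loss has failed to beat the best value
    seen so far for `patience` epochs in a row (at least one such epoch).
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Validation losses for epochs 1-9 of the early-stopping run above.
logged = [0.4000, 0.3134, 0.2870, 0.2673, 0.2541, 0.2445, 0.2392, 0.2322, 0.2401]

# Hypothetical series: two epochs in a row without improvement.
made_up = [0.50, 0.40, 0.41, 0.42, 0.30]

print(stopping_epoch(logged, patience=0))
print(stopping_epoch(logged, patience=2))
print(stopping_epoch(made_up, patience=2))
```

With `patience=0` the small rise at epoch 9 would already end training, whereas `patience=2` lets the run continue past it, which is the behaviour the comments above argue for.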


### 4- Test

In [16]:
# when the test data is a single dataset object that yields (inputs, targets) pairs, evaluation looks like:
# test_loss, test_accuracy = model.evaluate(test_data)
# here the test set from Audiobooks_data_test.npz was loaded as two separate arrays,
# test_inputs (cast to float) and test_targets (cast to int),
# so we pass the inputs and targets to evaluate as separate arguments
test_loss, test_accuracy = model.evaluate(test_inputs, test_targets)



In [17]:
print('\nTest loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))


Test loss: 0.28. Test accuracy: 88.62%


In [None]:
# that is the final accuracy of the model. 
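
For reference, the accuracy reported above is the share of customers whose most probable predicted class equals the true target. A minimal sketch of that calculation follows; the four softmax outputs and targets are invented for illustration and are not the model's actual predictions.

```python
def accuracy(probabilities, targets):
    """Fraction of samples whose highest-probability class matches the target."""
    predictions = [
        max(range(len(row)), key=row.__getitem__)  # index of the largest probability
        for row in probabilities
    ]
    correct = sum(1 for p, t in zip(predictions, targets) if p == t)
    return correct / len(targets)


# Hypothetical softmax outputs for four customers: [P(won't buy), P(will buy)].
outputs = [[0.9, 0.1], [0.3, 0.7], [0.6, 0.4], [0.2, 0.8]]
true_targets = [0, 1, 1, 1]

toy_accuracy = accuracy(outputs, true_targets)
print("accuracy: {:.2f}%".format(toy_accuracy * 100))
```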