# Business Case: Audiobook App Buyer Behaviour
## Machine Learning Models

Below I build a machine learning model, based on our available data, that predicts whether a customer will buy again from an audiobook company. Every customer in the database has made at least one purchase, which is why they are in the database.

Spending money on marketing to customers with a low probability of making additional purchases is ineffective. By targeting marketing campaigns at customers with a higher probability of making additional purchases, the company can avoid wasting its marketing budget. Moreover, this model can identify the most important metrics that lead to returning customers, thereby identifying opportunities for growth.

The pre-processed data contains the following variables:

- Book length in mins_avg (average of all purchases)
- Book length in minutes_sum (sum of all purchases)
- Price Paid_avg (average of all purchases)
- Price paid_sum (sum of all purchases)
- Review (a Boolean variable)
- Review (out of 10)
- Total minutes listened
- Completion (from 0 to 1)
- Support requests (number)
- Last visited minus purchase date (in days)

The target is a Boolean variable (0 or 1). The inputs cover a period of 2 years, and the targets cover the following 6 months. The model therefore predicts whether a customer will convert in the next 6 months, based on the last 2 years of activity and engagement.


## Machine learning algorithm
### Import libraries


In [5]:
import numpy as np
import tensorflow as tf


### Data

In [None]:
# create a temporary variable npz to hold each of the three Audiobooks datasets in turn
# training data
npz = np.load('Audiobooks_data_train.npz')
# extract the inputs and ensure all are floats
# (the np.float and np.int aliases were removed in NumPy 1.24, so the builtin types are used)
train_inputs = npz['inputs'].astype(float)
# extract the targets and ensure all are integers (the loss used below expects integer class labels)
train_targets = npz['targets'].astype(int)

# validation data
npz = np.load('Audiobooks_data_validation.npz')
validation_inputs, validation_targets = npz['inputs'].astype(float), npz['targets'].astype(int)

# testing data
npz = np.load('Audiobooks_data_test.npz')
test_inputs, test_targets = npz['inputs'].astype(float), npz['targets'].astype(int)
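
After loading, it can be worth confirming that each split has the expected number of features and a similar share of positive targets. The following is a minimal sketch that uses small synthetic arrays in place of the `.npz` files; `summarize_split` is a hypothetical helper, not part of the original notebook.

```python
import numpy as np

def summarize_split(inputs, targets):
    """Return the sample count, feature count, and share of positive targets."""
    inputs = np.asarray(inputs, dtype=float)
    targets = np.asarray(targets, dtype=int)
    if inputs.shape[0] != targets.shape[0]:
        raise ValueError("inputs and targets must have the same number of rows")
    return {
        "samples": int(inputs.shape[0]),
        "features": int(inputs.shape[1]),
        "positive_share": float(targets.mean()),
    }

# Synthetic stand-in for one split: 4 customers with 10 features each (assumed values).
demo_inputs = np.zeros((4, 10))
demo_targets = np.array([0, 1, 1, 0])
summary = summarize_split(demo_inputs, demo_targets)
print(summary)
```

The same helper could be called on `train_inputs` and `train_targets` once the real files are loaded.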

### Model

- outline
- optimizers
- loss
- early stopping
- training


In [None]:
# Set the input and output sizes
input_size = 10
output_size = 2
# Use same hidden layer size for both hidden layers. Not a necessity.
hidden_layer_size = 50

# define what the model will look like
model = tf.keras.Sequential([
    # tf.keras.layers.Dense essentially implements: output = activation(dot(input, weight) + bias)
    # the most important arguments here are hidden_layer_size and the activation function
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 1st hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 2nd hidden layer
    # make sure to activate last layer with softmax
    tf.keras.layers.Dense(output_size, activation='softmax') # output layer
])


### Choose the optimizer, the loss function, and the metrics reported at each iteration
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
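
To make the formula `output = activation(dot(input, weight) + bias)` and the chosen loss more concrete, here is a minimal NumPy sketch of a forward pass. It is an illustration under stated assumptions, not how Keras is implemented: the weights are random, only one hidden layer is used instead of two, and the function names `dense`, `relu`, `softmax`, and `sparse_categorical_crossentropy` are defined here for the example only.

```python
import numpy as np

def relu(x):
    # Replace negative values with zero.
    return np.maximum(x, 0.0)

def softmax(x):
    # Subtract the row maximum first for numerical stability.
    shifted = x - x.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def dense(inputs, weights, bias, activation):
    # output = activation(dot(input, weight) + bias)
    return activation(inputs @ weights + bias)

def sparse_categorical_crossentropy(probabilities, targets):
    # Mean negative log-probability assigned to the true (integer) class.
    picked = probabilities[np.arange(len(targets)), targets]
    return float(-np.log(picked).mean())

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 10))                    # 3 synthetic customers, 10 inputs each
w1, b1 = rng.normal(size=(10, 50)) * 0.1, np.zeros(50)
w2, b2 = rng.normal(size=(50, 2)) * 0.1, np.zeros(2)

hidden = dense(x, w1, b1, relu)                 # hidden layer with 50 units
probs = dense(hidden, w2, b2, softmax)          # output layer: one probability per class
loss = sparse_categorical_crossentropy(probs, np.array([0, 1, 1]))
print(probs.shape, loss)
```

Because the targets are integer class labels rather than one-hot vectors, the loss simply indexes the predicted probability of the true class for each row.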

### Training the model

In [None]:
# set the batch size
batch_size = 100

# set a maximum number of training epochs
max_epochs = 100

# set an early stopping mechanism
# let's set patience=2, to be a bit tolerant against random validation loss increases
early_stopping = tf.keras.callbacks.EarlyStopping(patience=2)
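
The idea behind `patience` can be sketched in plain Python. This is a simplified illustration of patience-based stopping, not Keras's actual implementation; the function `epochs_until_stop` and the validation losses are made up for the example.

```python
def epochs_until_stop(val_losses, patience=2):
    """Return the epoch (1-based) at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss   # new best value: reset the counter
            wait = 0
        else:
            wait += 1     # no improvement this epoch
            if wait >= patience:
                return epoch
    return len(val_losses)  # never triggered: all epochs were run

# Made-up validation losses: the single increase at epoch 3 is tolerated,
# but two epochs in a row without improvement (5 and 6) stop training.
demo_losses = [0.50, 0.40, 0.41, 0.39, 0.40, 0.42, 0.38]
stop_epoch = epochs_until_stop(demo_losses, patience=2)
print(stop_epoch)
```

With `patience=1` the same sequence would already stop at epoch 3, which is why a slightly larger value is more tolerant of random fluctuations.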

### Fitting the model

In [None]:
model.fit(train_inputs, # train inputs
          train_targets, # train targets
          batch_size=batch_size, # batch size
          epochs=max_epochs, # epochs that we will train for (assuming early stopping doesn't kick in)
          # callbacks are functions that Keras calls at set points during training, such as the end of each epoch
          # here the callback checks whether val_loss has stopped improving
          callbacks=[early_stopping], # early stopping
          validation_data=(validation_inputs, validation_targets), # validation data
          verbose=2 # print one line of training information per epoch
          )  
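
As a rough guide to how `batch_size` relates to the work done per epoch: when NumPy arrays are passed to `model.fit`, each epoch is split into batches, with a smaller final batch if the sample count is not a multiple of the batch size. A minimal sketch of that arithmetic follows; the sample count is hypothetical and not taken from the Audiobooks data.

```python
import math

def batches_per_epoch(num_samples, batch_size):
    # The last batch may be smaller, so round up.
    return math.ceil(num_samples / batch_size)

# Hypothetical sample count, used only to illustrate the arithmetic.
demo_batches = batches_per_epoch(3579, 100)
print(demo_batches)
```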



## Test the model

Test the final predictive power of the model by running it on the test dataset, which the algorithm has not seen before. Only test once you have completely finished adjusting the model. Adjusting the model after testing leads to overfitting on the test data.


In [None]:
test_loss, test_accuracy = model.evaluate(test_inputs, test_targets)


In [None]:
print('\nTest loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))

The final test accuracy should be around 91%.
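
For a two-class softmax output, accuracy is the share of samples whose highest-probability class matches the target. The sketch below shows that calculation on made-up probabilities; `accuracy_from_probabilities` is a hypothetical helper, and the arrays stand in for what `model.predict(test_inputs)` and `test_targets` would provide.

```python
import numpy as np

def accuracy_from_probabilities(probabilities, targets):
    # Pick the most probable class per row and compare it with the target.
    predicted = np.argmax(probabilities, axis=1)
    return float((predicted == np.asarray(targets)).mean())

# Made-up class probabilities for 4 customers (columns: class 0, class 1).
demo_probs = np.array([[0.9, 0.1],
                       [0.2, 0.8],
                       [0.6, 0.4],
                       [0.3, 0.7]])
demo_targets = np.array([0, 1, 1, 1])

demo_accuracy = accuracy_from_probabilities(demo_probs, demo_targets)
print('Accuracy: {0:.2f}%'.format(demo_accuracy * 100.))
```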