# 3- Audiobooks business case - Data Modelling

## Problem

You are given data from an audiobook app, so it relates only to the audio versions of books. Every customer in the database has made at least one purchase, which is why they are in the database. We want to build a machine learning model on the available data that predicts whether a customer will buy again from the Audiobook company.

The main idea is that if a customer has a low probability of coming back, there is no reason to spend any money on advertising to them. If we can focus our efforts ONLY on customers who are likely to convert again, we can make great savings. Moreover, this model can identify the metrics that matter most for a customer to come back. Identifying new customers creates value and growth opportunities.

You have a .csv summarizing the data. There are several variables: Customer ID, Book length in mins_avg (average of all purchases), Book length in minutes_sum (sum of all purchases), Price Paid_avg (average of all purchases), Price paid_sum (sum of all purchases), Review (a Boolean variable), Review (out of 10), Total minutes listened, Completion (from 0 to 1), Support requests (number), and Last visited minus purchase date (in days).

These are the inputs, excluding Customer ID: it is completely arbitrary and acts more like a name than a number.

The targets are a Boolean variable (0 or 1). We take a period of 2 years for our inputs and the following 6 months for the targets. In other words, we predict whether, based on the last 2 years of activity and engagement, a customer will convert in the next 6 months. Six months sounds like a reasonable window: if they don't convert within 6 months, chances are they've gone to a competitor or didn't like the audiobook way of digesting information.

The task is simple: create a machine learning algorithm, which is able to predict if a customer will buy again. 

This is a classification problem with two classes: won't buy and will buy, represented by 0s and 1s. 


## Create the machine learning algorithm



### Import the relevant libraries

In [1]:
# we must import the libraries once again since we haven't imported them in this file
import numpy as np
import tensorflow as tf

### Load Data
- When we saved the train, validation and test sets, each was stored as a 2-tuple of inputs and targets.
- When loading them, we load the inputs and targets separately.
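As a minimal sketch of this save/load round trip, assume the files were written with `np.savez` under the keywords `inputs` and `targets` (the same keywords used when loading). The file name `demo_split.npz` and the random data are purely illustrative and are not part of the Audiobooks dataset.

```python
import os
import tempfile

import numpy as np

# Illustrative stand-in data: 5 samples with 10 features, and binary targets.
rng = np.random.default_rng(0)
demo_inputs = rng.normal(size=(5, 10))
demo_targets = rng.integers(0, 2, size=5)

# Save both arrays into one .npz archive, each under its own keyword.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo_split.npz')
np.savez(demo_path, inputs=demo_inputs, targets=demo_targets)

# Load the archive and pull the two arrays out separately by keyword.
demo_npz = np.load(demo_path)
loaded_inputs = demo_npz['inputs'].astype(float)
loaded_targets = demo_npz['targets'].astype(int)

print(loaded_inputs.shape, loaded_targets.shape)
```

The archive behaves like a dictionary, so the inputs and targets come back as two independent arrays rather than as one tuple.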

In [2]:
# let's create a temporary variable npz, where we will store each of the three Audiobooks datasets
npz = np.load('Audiobooks_data_train.npz')

# we extract the inputs using the keyword under which we saved them
# to ensure that they are all floats, let's also take care of that
# (the np.float and np.int aliases were removed in NumPy 1.24; the built-in float and int are equivalent)
train_inputs = npz['inputs'].astype(float)
# targets must be int because sparse_categorical_crossentropy expects integer class labels (it handles the one-hot comparison for us)
train_targets = npz['targets'].astype(int)

# we load the validation data in the temporary variable
npz = np.load('Audiobooks_data_validation.npz')
# we can load the inputs and the targets in the same line
validation_inputs, validation_targets = npz['inputs'].astype(float), npz['targets'].astype(int)

# we load the test data in the temporary variable
npz = np.load('Audiobooks_data_test.npz')
# we create 2 variables that will contain the test inputs and the test targets
test_inputs, test_targets = npz['inputs'].astype(float), npz['targets'].astype(int)


### Model
Outline, optimizers, loss, early stopping and training

In [3]:
# Set the input and output sizes
input_size = 10
output_size = 2
# Use same hidden layer size for both hidden layers. Not a necessity.
hidden_layer_size = 50
    
# define what the model will look like

model = tf.keras.Sequential([
    # tf.keras.layers.Dense is basically implementing: output = activation(dot(input, weight) + bias)
    # it takes several arguments, but the most important ones for us are the hidden_layer_size and the activation function
    # unlike MNIST, we don't need a first Flatten layer (matrix to vector): our inputs are already preprocessed
    
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 1st hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 2nd hidden layer
    # the final layer is no different, we just make sure to activate it with softmax - the output is a classifier
    tf.keras.layers.Dense(output_size, activation='softmax') # output layer
])
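
To make the comment about `Dense` concrete, here is a minimal NumPy sketch of the same forward pass, `output = activation(dot(input, weight) + bias)`, for a 10-50-50-2 network. The random weights, the zero biases and the batch of 4 hypothetical customers are assumptions for illustration only; they are not the values Keras would initialize or learn.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    # Subtract the row-wise maximum for numerical stability.
    shifted = x - x.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def dense(inputs, weights, bias, activation):
    # output = activation(dot(input, weight) + bias)
    return activation(inputs @ weights + bias)

rng = np.random.default_rng(42)
sizes = [10, 50, 50, 2]  # input, two hidden layers, output

# One (weights, bias) pair per layer; illustrative values only.
params = [(rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

batch = rng.normal(size=(4, 10))  # 4 hypothetical customers
hidden_1 = dense(batch, *params[0], relu)
hidden_2 = dense(hidden_1, *params[1], relu)
probabilities = dense(hidden_2, *params[2], softmax)

print(probabilities.shape)
```

Each row of `probabilities` holds two non-negative values that sum to 1, one per class (won't buy, will buy).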


### Choose the optimizer and the loss function

# we define the optimizer we'd like to use,
# the loss function (sparse_categorical_crossentropy takes integer targets and handles the one-hot comparison with the outputs for us),
# and the metrics we are interested in obtaining at each iteration
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
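
The link between integer targets and one-hot encoding can be checked by hand. The following NumPy sketch, using made-up softmax outputs for 3 hypothetical customers, computes the cross-entropy once from integer labels and once from one-hot labels, averaging over the batch in both cases. It illustrates the idea and is not a reproduction of Keras internals.

```python
import numpy as np

# Hypothetical softmax outputs for 3 customers (each row sums to 1).
probs = np.array([[0.9, 0.1],
                  [0.3, 0.7],
                  [0.6, 0.4]])
sparse_targets = np.array([0, 1, 1])  # integer class labels

# Sparse form: pick the predicted probability of the true class directly.
true_class_probs = probs[np.arange(len(sparse_targets)), sparse_targets]
sparse_loss = -np.log(true_class_probs).mean()

# One-hot form: encode the labels first, then sum over the classes.
one_hot_targets = np.eye(2)[sparse_targets]
one_hot_loss = -(one_hot_targets * np.log(probs)).sum(axis=1).mean()

print(np.isclose(sparse_loss, one_hot_loss))
```

Both forms give the same number, so keeping the targets as integers simply saves the explicit encoding step.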

### Training
# That's where we train the model we have built.

# set the batch size
batch_size = 100

# set a maximum number of training epochs
max_epochs = 100


# fit the model
# note that this time the train, validation and test data are NumPy arrays, not iterable tf.data datasets
# so the batch size is set directly when fitting, not with .batch
# unlike MNIST, where each dataset was a single iterable of 2-tuples, here the inputs and targets are passed separately

model.fit(train_inputs, # train inputs
          train_targets, # train targets
          batch_size=batch_size, # batch size
          epochs=max_epochs, # epochs that we will train for (assuming early stopping doesn't kick in)
          validation_data=(validation_inputs, validation_targets), # validation data
          verbose = 2 # making sure we get enough information about the training process
          )  

# We need a mechanism that prevents overfitting - notice the validation accuracy oscillating
# Until now we didn't worry about early stopping or overfitting since the data was well preprocessed - but here we need it
# Early stopping = stop when EITHER the updates become too small (the loss function is minimized) OR the validation loss diverges from the training loss,
# whichever comes first

Train on 3579 samples, validate on 447 samples
Epoch 1/100
3579/3579 - 0s - loss: 0.5656 - accuracy: 0.7664 - val_loss: 0.4250 - val_accuracy: 0.8770
Epoch 2/100
3579/3579 - 0s - loss: 0.3760 - accuracy: 0.8704 - val_loss: 0.3238 - val_accuracy: 0.8904
Epoch 3/100
3579/3579 - 0s - loss: 0.3222 - accuracy: 0.8807 - val_loss: 0.2965 - val_accuracy: 0.8993
Epoch 4/100
3579/3579 - 0s - loss: 0.3007 - accuracy: 0.8860 - val_loss: 0.2815 - val_accuracy: 0.9016
Epoch 5/100
3579/3579 - 0s - loss: 0.2872 - accuracy: 0.8910 - val_loss: 0.2664 - val_accuracy: 0.9016
Epoch 6/100
3579/3579 - 0s - loss: 0.2767 - accuracy: 0.8949 - val_loss: 0.2590 - val_accuracy: 0.9105
Epoch 7/100
3579/3579 - 0s - loss: 0.2721 - accuracy: 0.8975 - val_loss: 0.2506 - val_accuracy: 0.9083
Epoch 8/100
3579/3579 - 0s - loss: 0.2633 - accuracy: 0.8989 - val_loss: 0.2450 - val_accuracy: 0.9083
Epoch 9/100
3579/3579 - 0s - loss: 0.2605 - accuracy: 0.9028 - val_loss: 0.2374 - val_accuracy: 0.9128
Epoch 10/100
3579/3579 - 0

Epoch 80/100
3579/3579 - 0s - loss: 0.2151 - accuracy: 0.9198 - val_loss: 0.2160 - val_accuracy: 0.9262
Epoch 81/100
3579/3579 - 0s - loss: 0.2131 - accuracy: 0.9198 - val_loss: 0.2257 - val_accuracy: 0.9262
Epoch 82/100
3579/3579 - 0s - loss: 0.2172 - accuracy: 0.9195 - val_loss: 0.2205 - val_accuracy: 0.9284
Epoch 83/100
3579/3579 - 0s - loss: 0.2130 - accuracy: 0.9201 - val_loss: 0.2139 - val_accuracy: 0.9217
Epoch 84/100
3579/3579 - 0s - loss: 0.2139 - accuracy: 0.9181 - val_loss: 0.2206 - val_accuracy: 0.9284
Epoch 85/100
3579/3579 - 0s - loss: 0.2116 - accuracy: 0.9193 - val_loss: 0.2131 - val_accuracy: 0.9284
Epoch 86/100
3579/3579 - 0s - loss: 0.2107 - accuracy: 0.9206 - val_loss: 0.2368 - val_accuracy: 0.9239
Epoch 87/100
3579/3579 - 0s - loss: 0.2142 - accuracy: 0.9198 - val_loss: 0.2335 - val_accuracy: 0.9239
Epoch 88/100
3579/3579 - 0s - loss: 0.2140 - accuracy: 0.9206 - val_loss: 0.2248 - val_accuracy: 0.9262
Epoch 89/100
3579/3579 - 0s - loss: 0.2115 - accuracy: 0.9190 - 

<tensorflow.python.keras.callbacks.History at 0x18ab0e5e388>
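
The early-stopping rule mentioned in the training comments can be sketched in plain Python. The function name `early_stopping_epoch` and the `patience` values are assumptions for illustration; the rule here is simply "stop once the validation loss has failed to improve for `patience` consecutive epochs", applied to the validation losses of epochs 80 to 88 from the log.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index at which training would stop, or None if it never does."""
    best_loss = float('inf')
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best validation loss: reset the counter.
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Validation losses of epochs 80-88, copied from the training log.
late_val_losses = [0.2160, 0.2257, 0.2205, 0.2139, 0.2206,
                   0.2131, 0.2368, 0.2335, 0.2248]

stop_index = early_stopping_epoch(late_val_losses, patience=2)
print(stop_index)
```

In Keras the same idea is available as `tf.keras.callbacks.EarlyStopping` (with `monitor` and `patience` arguments), which is passed to `model.fit` through its `callbacks` argument.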

### *** Side Note - BATCHING
- Why didn't we batch as we did for MNIST, with the .batch(batch_size) method? What's the difference between that and batching directly when fitting the model?
- train_data.batch(BATCH_SIZE) is a method of tf.data datasets. It has the same effect as batching via model.fit().
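
As a rough illustration of what batching means for plain arrays, the sketch below slices stand-in arrays with the same number of rows as the training set (3579) into consecutive batches of 100. The helper `iterate_batches` is hypothetical and ignores shuffling; it only shows how the samples are split.

```python
import numpy as np

def iterate_batches(inputs, targets, batch_size):
    """Yield consecutive (inputs, targets) slices of at most batch_size rows."""
    for start in range(0, len(inputs), batch_size):
        yield inputs[start:start + batch_size], targets[start:start + batch_size]

# Stand-in arrays with the same number of rows as the training set.
demo_inputs = np.zeros((3579, 10))
demo_targets = np.zeros(3579, dtype=int)

batch_lengths = [len(x) for x, _ in iterate_batches(demo_inputs, demo_targets, 100)]
print(len(batch_lengths), batch_lengths[-1])
```

With 3579 samples and a batch size of 100, this gives 36 batches per epoch, the last one holding the remaining 79 samples.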

### *** Side Note - VALIDATION STEPS
- validation_steps is used when the validation data is a tf.data dataset. Here, the validation data is just a pair of NumPy arrays, which is why validation_steps is not required.