# Practical example. Audiobooks

## Problem

You are given data from an Audiobook app. Logically, it relates only to the audio versions of books. Each customer in the database has made at least one purchase, which is why they are in the database. We want to create a machine learning algorithm, based on our available data, that can predict whether a customer will buy again from the Audiobook company.

The main idea is that if a customer has a low probability of coming back, there is no reason to spend any money on advertising to them. If we focus our efforts only on customers who are likely to convert again, we can make great savings. Moreover, this model can identify the most important metrics for a customer to come back again. Identifying new customers creates value and growth opportunities.

You have a .csv summarizing the data. There are several variables: Customer ID, Book length in mins_avg (average of all purchases), Book length in minutes_sum (sum of all purchases), Price Paid_avg (average of all purchases), Price paid_sum (sum of all purchases), Review (a Boolean variable), Review (out of 10), Total minutes listened, Completion (from 0 to 1), Support requests (number), and Last visited minus purchase date (in days).

These are the inputs, excluding the customer ID, which is completely arbitrary: it is more like a name than a number.

The targets are a Boolean variable (0 or 1). We take a period of 2 years for our inputs and the next 6 months for the targets. In other words, we are predicting whether a customer will convert in the next 6 months, based on the last 2 years of activity and engagement. 6 months sounds like a reasonable time. If they don't convert after 6 months, chances are they've gone to a competitor or didn't like the Audiobook way of digesting information.
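
The split into inputs and targets described above can be sketched with NumPy. This is only an illustration: it assumes the customer ID sits in the first column and the target in the last one, and the three rows are invented values, not taken from the actual .csv.

```python
import numpy as np

# Hypothetical rows: [customer ID, 10 input features..., target].
# The values are invented purely to illustrate the slicing.
raw_data = np.array([
    [994, 1620, 1620, 19.73, 19.73, 1, 10.0, 1603.8, 0.99, 5, 92, 0],
    [1143, 2160, 2160, 5.33, 5.33, 0, 8.91, 0.0, 0.0, 0, 0, 0],
    [2059, 2160, 2160, 5.33, 5.33, 0, 8.91, 0.0, 0.0, 0, 388, 1],
])

# Drop the ID (first column) and the target (last column) to get the inputs.
unscaled_inputs = raw_data[:, 1:-1]

# Assumption: the last column holds the 0/1 target.
targets = raw_data[:, -1].astype(int)

print(unscaled_inputs.shape)  # (3, 10)
print(targets)                # [0 0 1]
```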

The task is simple: create a machine learning algorithm that is able to predict whether a customer will buy again.

This is a classification problem with two classes: won't buy and will buy, represented by 0s and 1s. 

Good luck!

## Create the machine learning algorithm



### Import the relevant libraries

In [1]:
import numpy as np
import tensorflow as tf

### Data

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [3]:
#Load the npz files we created
#np.float and np.int were removed in NumPy 1.24, so we use np.float64 and np.int64 instead
npz = np.load('/content/drive/My Drive/Customer Analytics/Audiobooks_data_train.npz')

train_inputs = npz['inputs'].astype(np.float64)
train_targets = npz['targets'].astype(np.int64)

npz = np.load('/content/drive/My Drive/Customer Analytics/Audiobooks_data_validation.npz')
validation_inputs, validation_targets = npz['inputs'].astype(np.float64), npz['targets'].astype(np.int64)

npz = np.load('/content/drive/My Drive/Customer Analytics/Audiobooks_data_test.npz')
test_inputs, test_targets = npz['inputs'].astype(np.float64), npz['targets'].astype(np.int64)

### Model
Outline, optimizers, loss, early stopping and training

In [4]:
#input_size = 10
#Set the output size
output_size = 2
hidden_layer_size = 50

#Define the model
model = tf.keras.Sequential([
                            tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
                            tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
                            tf.keras.layers.Dense(output_size, activation='softmax')    
                            ])
#Choose the optimizer and the loss function
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics = ['accuracy'])

#Train the model: set the batch size, the maximum number of epochs and early stopping
batch_size = 100

max_epochs = 100

#Stop training once the validation loss has not improved for 2 consecutive epochs
early_stopping = tf.keras.callbacks.EarlyStopping(patience=2)

#Fit the model
model.fit(train_inputs,
          train_targets,
          batch_size=batch_size,
          epochs=max_epochs,
          callbacks=[early_stopping],
          validation_data=(validation_inputs, validation_targets),
          verbose=2
          )

Epoch 1/100
36/36 - 1s - loss: 0.5394 - accuracy: 0.7910 - val_loss: 0.4171 - val_accuracy: 0.8904
Epoch 2/100
36/36 - 0s - loss: 0.3730 - accuracy: 0.8734 - val_loss: 0.3145 - val_accuracy: 0.8949
Epoch 3/100
36/36 - 0s - loss: 0.3215 - accuracy: 0.8829 - val_loss: 0.2829 - val_accuracy: 0.9083
Epoch 4/100
36/36 - 0s - loss: 0.3013 - accuracy: 0.8846 - val_loss: 0.2683 - val_accuracy: 0.9083
Epoch 5/100
36/36 - 0s - loss: 0.2884 - accuracy: 0.8908 - val_loss: 0.2560 - val_accuracy: 0.9150
Epoch 6/100
36/36 - 0s - loss: 0.2783 - accuracy: 0.8938 - val_loss: 0.2486 - val_accuracy: 0.9150
Epoch 7/100
36/36 - 0s - loss: 0.2710 - accuracy: 0.8986 - val_loss: 0.2353 - val_accuracy: 0.9239
Epoch 8/100
36/36 - 0s - loss: 0.2659 - accuracy: 0.8963 - val_loss: 0.2323 - val_accuracy: 0.9195
Epoch 9/100
36/36 - 0s - loss: 0.2606 - accuracy: 0.9022 - val_loss: 0.2275 - val_accuracy: 0.9217
Epoch 10/100
36/36 - 0s - loss: 0.2563 - accuracy: 0.9011 - val_loss: 0.2234 - val_accuracy: 0.9262
Epoch 11/

<tensorflow.python.keras.callbacks.History at 0x7f85a45d84d0>
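
To make the early stopping rule more concrete, here is a simplified sketch of the patience logic in plain Python. It is not Keras's actual implementation: `stopping_epoch` is a hypothetical helper, and the list of validation losses is invented for the example. By default the `EarlyStopping` callback monitors `val_loss`, which is what the sketch mimics.

```python
def stopping_epoch(val_losses, patience=2):
    """Return the 1-based epoch at which training would stop, or None."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # New best validation loss: reset the counter.
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Hypothetical validation losses, not taken from the run above.
example_losses = [0.42, 0.31, 0.28, 0.29, 0.30, 0.27]
print(stopping_epoch(example_losses))  # 5
```

With `patience=2`, the loop tolerates one epoch without improvement (epoch 4) and stops at the second one in a row (epoch 5), so the later improvement at epoch 6 is never reached.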

## Test the model

In [5]:
#Utilize the evaluate function to get the loss and accuracy.
test_loss, test_accuracy = model.evaluate(test_inputs, test_targets)



In [6]:
#Print formatted result
print('\nTest loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))


Test loss: 0.24. Test accuracy: 91.07%
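
For intuition about what these two numbers measure, the sketch below computes a sparse categorical cross-entropy and an accuracy by hand with NumPy. The probabilities and labels are made up for the example, and the calculation leaves out details of the Keras implementation (such as clipping probabilities away from 0), so treat it as an approximation of the idea and not a reproduction of `model.evaluate`.

```python
import numpy as np

# Hypothetical softmax outputs for four customers: [P(class 0), P(class 1)].
probabilities = np.array([
    [0.90, 0.10],
    [0.20, 0.80],
    [0.60, 0.40],
    [0.30, 0.70],
])
# Hypothetical true labels (0 = won't buy, 1 = will buy).
true_labels = np.array([0, 1, 1, 1])

# Sparse categorical cross-entropy: mean of -log(probability of the true class).
true_class_probabilities = probabilities[np.arange(len(true_labels)), true_labels]
loss = -np.mean(np.log(true_class_probabilities))

# Accuracy: share of rows where the most probable class equals the label.
predicted_labels = np.argmax(probabilities, axis=1)
accuracy = np.mean(predicted_labels == true_labels)

print(f"loss: {loss:.4f}, accuracy: {accuracy:.2f}")
```

The third customer is misclassified (the model gives class 1 only 0.40), which lowers the accuracy to 3 out of 4 and contributes the largest term to the loss.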


## Obtain the probability for a customer to convert

In [7]:
#Predict the class probabilities for the test dataset; each row is [P(won't buy), P(will buy)]
model.predict(test_inputs).round(2)

array([[0.  , 1.  ],
       [0.02, 0.98],
       [0.88, 0.12],
       [0.  , 1.  ],
       [0.02, 0.98],
       [0.91, 0.09],
       [0.96, 0.04],
       [0.17, 0.83],
       [0.22, 0.78],
       [0.  , 1.  ],
       [0.94, 0.06],
       [1.  , 0.  ],
       [1.  , 0.  ],
       [0.92, 0.08],
       [1.  , 0.  ],
       [0.83, 0.17],
       [1.  , 0.  ],
       [0.  , 1.  ],
       [0.  , 1.  ],
       [1.  , 0.  ],
       [0.03, 0.97],
       [0.91, 0.09],
       [0.71, 0.29],
       [0.91, 0.09],
       [0.85, 0.15],
       [0.97, 0.03],
       [0.5 , 0.5 ],
       [0.  , 1.  ],
       [0.94, 0.06],
       [0.  , 1.  ],
       [0.89, 0.11],
       [0.7 , 0.3 ],
       [0.47, 0.53],
       [0.54, 0.46],
       [0.01, 0.99],
       [0.69, 0.31],
       [0.29, 0.71],
       [0.17, 0.83],
       [0.3 , 0.7 ],
       [0.24, 0.76],
       [0.01, 0.99],
       [0.91, 0.09],
       [0.  , 1.  ],
       [0.  , 1.  ],
       [0.98, 0.02],
       [1.  , 0.  ],
       [0.88, 0.12],
       [0.99,

In [8]:
#Round the probability of class 1 (will buy) to get a 0/1 prediction
model.predict(test_inputs)[:,1].round(0)

array([1., 1., 0., 1., 1., 0., 0., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0.,
       1., 1., 0., 1., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 0., 1., 0.,
       1., 0., 1., 1., 1., 1., 1., 0., 1., 1., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 1., 0., 0., 1., 1., 0., 1., 1., 0., 1., 0., 0., 1., 1.,
       1., 0., 1., 1., 0., 1., 1., 1., 0., 0., 1., 0., 1., 0., 0., 0., 0.,
       1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 0., 1., 0., 0., 1., 0., 0., 0., 1., 0., 1., 0., 0., 1.,
       1., 0., 1., 0., 0., 0., 1., 0., 1., 1., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0.,
       0., 1., 0., 0., 0., 1., 1., 0., 1., 0., 1., 1., 0., 0., 1., 1., 1.,
       0., 0., 0., 1., 0., 0., 1., 1., 1., 0., 0., 0., 1., 1., 1., 1., 1.,
       1., 0., 1., 0., 0., 0., 0., 1., 1., 0., 0., 1., 1., 1., 1., 1., 0.,
       1., 1., 1., 1., 0., 0., 1., 0., 1., 1., 1., 0., 1., 0., 0., 1., 1.,
       1., 1., 0., 0., 1.

In [9]:
#Use argmax to get the index of the highest probability in each row, i.e. the predicted class
np.argmax(model.predict(test_inputs), axis=1)

array([1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0,
       0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0,
       1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1,
       1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1,
       0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0,
       1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0,
       1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0,
       1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1,
       1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0,
       1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1,
       1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0,
       0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1,
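
Since the goal stated in the problem is to advertise only to customers who are likely to convert, one possible use of these probabilities is to apply a stricter cut-off than 0.5. The sketch below assumes an arbitrary threshold of 0.8 and uses a handful of invented probability rows in the same layout as the output of `model.predict`; the right threshold would depend on advertising costs, which the data here does not describe.

```python
import numpy as np

# Hypothetical class probabilities in the same layout as model.predict's output.
predicted_probabilities = np.array([
    [0.00, 1.00],
    [0.88, 0.12],
    [0.47, 0.53],
    [0.17, 0.83],
    [0.50, 0.50],
])

# Probability of class 1 ("will buy again") for each customer.
conversion_probability = predicted_probabilities[:, 1]

# Assumed business threshold; 0.8 is an arbitrary illustrative value.
threshold = 0.8
customers_to_target = np.where(conversion_probability >= threshold)[0]

print(customers_to_target)  # [0 3]
```

Note that the customer with a probability of 0.53 would be labelled 1 by argmax, but is left out here because the model is not confident enough about them.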

## Save the model

In [10]:
model.save('audiobooks_model.h5')

In [11]:
%cp audiobooks_model.h5 ./drive/MyDrive/
