<h1>Practical example. Audiobooks</h1>

## Problem

You are given data from an audiobook app. The data relates only to the audio versions of books. Every customer in the database has made at least one purchase, which is why they are in the database. We want to build a machine learning algorithm from the available data that can predict whether a customer will buy again from the Audiobook company.

The main idea is that if a customer has a low probability of coming back, there is no reason to spend money on advertising to them. If we focus our efforts only on customers who are likely to convert again, we can save a significant part of the advertising budget. Moreover, the model can help identify which metrics matter most for a customer coming back. Identifying new customers creates value and growth opportunities.

You have a .csv file summarizing the data. It contains the following variables: Customer ID, Book length in mins_avg (average of all purchases), Book length in minutes_sum (sum of all purchases), Price paid_avg (average of all purchases), Price paid_sum (sum of all purchases), Review (a Boolean variable), Review (out of 10), Total minutes listened, Completion (from 0 to 1), Support requests (number), and Last visited minus purchase date (in days).

These are the inputs, with the exception of Customer ID: it is completely arbitrary and works more like a name than a number.

The targets are a Boolean variable (0 or 1). The inputs cover a period of 2 years, and the targets cover the following 6 months. In other words, we are predicting whether a customer will convert in the next 6 months, based on the last 2 years of activity and engagement. Six months is a reasonable window: if they don't convert within 6 months, chances are they've gone to a competitor or didn't like the audiobook way of digesting information.

The task is simple: create a machine learning algorithm that can predict whether a customer will buy again.
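As a rough illustration of the input/target split described above, here is a minimal sketch that drops the customer ID and separates the ten inputs from the Boolean target. The column order (ID first, target last), the helper name `split_row`, and the two sample rows are assumptions made up for this example; they are not taken from the actual .csv.

```python
import csv
import io

# Hypothetical sample rows. Assumed column order: customer ID,
# the 10 input variables, then the 0/1 target.
sample_csv = io.StringIO(
    "101,1620,1620,19.73,19.73,1,10.0,1603.8,0.99,5,92,0\n"
    "102,2160,2160,5.33,5.33,0,8.91,0.0,0.0,0,0,1\n"
)

def split_row(row):
    """Split one CSV row into (inputs, target), discarding the ID."""
    values = [float(v) for v in row]
    inputs = values[1:-1]      # skip the arbitrary customer ID
    target = int(values[-1])   # 0 = won't buy again, 1 = will buy again
    return inputs, target

inputs, targets = [], []
for row in csv.reader(sample_csv):
    x, y = split_row(row)
    inputs.append(x)
    targets.append(y)

print(len(inputs[0]), targets)
```

Each row yields ten input values, which matches the number of input variables listed above.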

This is a classification problem with two classes: won't buy and will buy, represented by 0s and 1s. 

Good luck!

<h2>Create the machine learning algorithm</h2>

<h2>Import the relevant libraries</h2>

In [14]:
import numpy as np
from datetime import datetime

import tensorflow as tf

In [16]:
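# Load the training data: inputs as floats, targets as integer class labels.
# np.float and np.int were removed in NumPy 1.24, so explicit dtypes are used.
npz = np.load('Audiobooks_data_train.npz')

train_inputs = npz['inputs'].astype(np.float64)
train_targets = npz['targets'].astype(np.int64)

# Load the validation data
npz = np.load('Audiobooks_data_validation.npz')
validation_inputs, validation_targets = npz['inputs'].astype(np.float64), npz['targets'].astype(np.int64)

# Load the test data
npz = np.load('Audiobooks_data_test.npz')
test_inputs, test_targets = npz['inputs'].astype(np.float64), npz['targets'].astype(np.int64)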

<h2>Model</h2>

Outline, optimizers, loss, early stopping and training

In [24]:
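# 10 input variables (not passed to the layers; Keras infers the input shape on the first call)
input_size = 10
# 2 output classes: 0 (won't buy) and 1 (will buy)
output_size = 2
hidden_layer_size = 50

# Two hidden layers followed by a softmax output layer
model = tf.keras.Sequential([
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(output_size, activation='softmax')
])

# sparse_categorical_crossentropy expects integer class labels as targets
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])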

batch_size = 100

max_epochs = 100

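# Stop training once the validation loss has not improved for 2 consecutive epochs
early_stopping = tf.keras.callbacks.EarlyStopping(patience=2)

# Write TensorBoard logs to a timestamped directory
logdir = "logs/fit/" + datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=logdir)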

model.fit(train_inputs,
          train_targets,
          batch_size = batch_size,
          epochs = max_epochs,
          validation_data=(validation_inputs, validation_targets),
          verbose=2,
          callbacks=[tensorboard_callback, early_stopping])

Epoch 1/100
36/36 - 0s - loss: 0.5751 - accuracy: 0.7298 - val_loss: 0.4628 - val_accuracy: 0.8568
Epoch 2/100
36/36 - 0s - loss: 0.3924 - accuracy: 0.8692 - val_loss: 0.3707 - val_accuracy: 0.8725
Epoch 3/100
36/36 - 0s - loss: 0.3335 - accuracy: 0.8799 - val_loss: 0.3411 - val_accuracy: 0.8770
Epoch 4/100
36/36 - 0s - loss: 0.3097 - accuracy: 0.8860 - val_loss: 0.3244 - val_accuracy: 0.8814
Epoch 5/100
36/36 - 0s - loss: 0.2955 - accuracy: 0.8908 - val_loss: 0.3098 - val_accuracy: 0.8814
Epoch 6/100
36/36 - 0s - loss: 0.2858 - accuracy: 0.8944 - val_loss: 0.3036 - val_accuracy: 0.8814
Epoch 7/100
36/36 - 0s - loss: 0.2795 - accuracy: 0.8975 - val_loss: 0.2905 - val_accuracy: 0.8814
Epoch 8/100
36/36 - 0s - loss: 0.2696 - accuracy: 0.8991 - val_loss: 0.2828 - val_accuracy: 0.8904
Epoch 9/100
36/36 - 0s - loss: 0.2631 - accuracy: 0.9005 - val_loss: 0.2773 - val_accuracy: 0.8881
Epoch 10/100
36/36 - 0s - loss: 0.2589 - accuracy: 0.9011 - val_loss: 0.2714 - val_accuracy: 0.8904
Epoch 11/

<keras.callbacks.History at 0x7f0adc781750>
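To make the effect of `patience=2` more concrete, the following is a simplified sketch of a patience-based stopping rule. It is not the Keras implementation: it assumes the monitored quantity is the validation loss (the Keras default for `EarlyStopping`) and that any strict decrease counts as an improvement. The function name `early_stopping_epoch` is hypothetical.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Validation losses from epochs 1-10 of the log above: each one improves
# on the previous best, so the rule does not trigger in these epochs.
logged = [0.4628, 0.3707, 0.3411, 0.3244, 0.3098,
          0.3036, 0.2905, 0.2828, 0.2773, 0.2714]
print(early_stopping_epoch(logged))

# A made-up sequence where the loss stops improving after epoch 3.
print(early_stopping_epoch([0.46, 0.37, 0.34, 0.35, 0.36, 0.30]))
```

In the second, made-up sequence the rule stops at epoch 5, before the lower loss at epoch 6 is ever seen. That is the trade-off of a small patience value.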

<h2>Test the model</h2>

In [29]:
test_loss, test_accuracy = model.evaluate(test_inputs, test_targets)

[[-2.28424667 -1.12491153 -0.38189654 ... -0.8635056   2.23179102
  -0.57015264]
 [ 0.64678203  2.22511345  0.20727535 ... -0.54895547 -0.20536617
  -0.68724869]
 [ 1.18956512  0.36398846  0.67728889 ... -0.8635056  -0.20536617
   0.728549  ]
 ...
 [-0.76445401 -0.75268653 -0.38189654 ... -0.8635056  -0.20536617
  -0.77240946]
 [ 1.18956512  0.36398846  0.29995408 ...  0.24017908 -0.20536617
  -0.77240946]
 [-1.8500202  -1.37306153 -0.38189654 ...  1.53700858 -0.20536617
   0.61145295]]


In [28]:
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))

Test loss: 0.25. Test accuracy: 91.07%
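For intuition about what these two numbers measure, here is a small sketch that computes a mean sparse categorical cross-entropy and an accuracy from softmax-style probabilities in plain Python. The probabilities and targets are made-up illustrative values, not model outputs, and the helper `evaluate` is hypothetical; Keras additionally clips probabilities for numerical stability, which this sketch omits.

```python
import math

def evaluate(probabilities, targets):
    """Return (mean cross-entropy loss, accuracy) for integer class targets."""
    total_loss = 0.0
    correct = 0
    for probs, target in zip(probabilities, targets):
        # Loss: negative log of the probability assigned to the true class
        total_loss += -math.log(probs[target])
        # Prediction: index of the most probable class
        predicted = max(range(len(probs)), key=probs.__getitem__)
        correct += int(predicted == target)
    n = len(targets)
    return total_loss / n, correct / n

# Made-up class probabilities for four customers (columns: class 0, class 1)
probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
targets = [0, 1, 1, 1]

loss, accuracy = evaluate(probs, targets)
print('Loss: {0:.2f}. Accuracy: {1:.2f}%'.format(loss, accuracy * 100.))
```

Three of the four made-up predictions match their targets, so the accuracy is 75%. The third customer, whose true class received a probability of only 0.4, contributes the largest share of the loss.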
