# Audiobooks business case

## Overview

This is a machine learning problem that consists of estimating the likelihood that an audiobook customer makes another purchase (a conversion), given past purchases and usage information. If likely repeat customers can be identified, marketing campaigns can be made more effective by focusing effort and resources on those clients.

We were given a raw .csv file that needs to be processed to extract the relevant data for building the model. The file represents 2 years of past user engagement data, plus a final column indicating whether the customer made any purchase in the 6 months following that period. In other words, the past data provides the inputs and the purchase information is the target of the model.
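As a minimal sketch of that input/target split, assume the CSV has already been loaded into a 2-D NumPy array whose last column is the 0/1 purchase flag. The array below is made-up illustration data, not rows from the real file, and the column layout is an assumption.

```python
import numpy as np

# Made-up stand-in for the loaded CSV: one row per customer,
# last column is the 0/1 "bought again" flag (assumed layout).
raw = np.array([
    [1620.0, 19.73, 1.0, 0.0],
    [2160.0,  5.33, 0.0, 1.0],
    [ 648.0,  8.00, 0.0, 1.0],
])

inputs = raw[:, :-1]               # every column except the last one
targets = raw[:, -1].astype(int)   # last column as integer class labels

print(inputs.shape, targets)
```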

The present notebook uses the files created in the previous preprocessing notebook to build, train and test the model. 

## Creating the algorithm

### Importing libraries

In [1]:
import numpy as np
import tensorflow as tf

### Import the data

In [2]:
# Using a temporary variable 'npz' to store the data.
# Inputs are cast to 64-bit floats and targets to 64-bit integer class labels.

npz = np.load('Audiobooks_data_train.npz')
train_inputs = npz['inputs'].astype(np.float64)
train_targets = npz['targets'].astype(np.int64)

npz = np.load('Audiobooks_data_valid.npz')
valid_inputs = npz['inputs'].astype(np.float64)
valid_targets = npz['targets'].astype(np.int64)

npz = np.load('Audiobooks_data_test.npz')
test_inputs = npz['inputs'].astype(np.float64)
test_targets = npz['targets'].astype(np.int64)
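
Before training, it can help to confirm that each split has matching row counts and a sensible share of converted customers. The sketch below is one way to do that. `summarize_split` is a hypothetical helper, and the arrays are synthetic stand-ins so the example runs without the .npz files.

```python
import numpy as np

def summarize_split(name, inputs, targets):
    """Return a one-line summary of a data split (hypothetical helper)."""
    assert inputs.shape[0] == targets.shape[0], "row counts must match"
    positive_share = float(np.mean(targets == 1))
    return (f"{name}: {inputs.shape[0]} rows, {inputs.shape[1]} features, "
            f"{positive_share:.0%} converted")

# Synthetic stand-in data, not the real audiobook splits.
rng = np.random.default_rng(0)
demo_inputs = rng.normal(size=(8, 3))
demo_targets = np.array([0, 1, 0, 1, 1, 0, 0, 1])

print(summarize_split("demo", demo_inputs, demo_targets))
```

In this notebook the same helper could be called as `summarize_split('train', train_inputs, train_targets)`, and likewise for the validation and test splits.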

### Building the model

Here the model is built and its hyperparameters are set, including the width (units per hidden layer) and depth (number of hidden layers) of the network. The values were tweaked to improve the accuracy observed during testing.
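
To get a feel for how width and depth drive model size, the sketch below counts the trainable parameters of a stack of fully connected layers. The number of input features is not stated in this notebook, so `input_size=10` is purely an assumption. `dense_param_count` is a hypothetical helper, not part of Keras.

```python
def dense_param_count(input_size, hidden_layer_size, n_hidden_layers, output_size):
    """Count weights and biases in a stack of fully connected layers."""
    sizes = [input_size] + [hidden_layer_size] * n_hidden_layers + [output_size]
    # Each dense layer has (inputs * units) weights plus one bias per unit.
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

# input_size=10 is an assumption; the other values match this notebook.
total = dense_param_count(input_size=10, hidden_layer_size=200,
                          n_hidden_layers=4, output_size=2)
print(total)  # 123202 under the assumed input size
```

Most of the parameters sit in the hidden-to-hidden layers, so the count grows roughly with the square of the width and only linearly with the depth.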

In [3]:
# Setting the hyperparameters
output_size = 2
hidden_layer_size = 200
batch_size = 50
epochs = 100

# Outlining the model layout
model = tf.keras.Sequential([
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(output_size, activation='softmax')
])

# Defining the optimizer, the loss function and the metric to report
model.compile(optimizer = 'adam', loss = 'sparse_categorical_crossentropy', metrics = ['accuracy'])

# Setting an early stopping mechanism
early_stop = tf.keras.callbacks.EarlyStopping(patience=3)

# Fitting the model
model.fit(
    train_inputs,
    train_targets,
    batch_size = batch_size,
    epochs = epochs,
    callbacks = [early_stop],
    validation_data = (valid_inputs,valid_targets),
    verbose = 2
)

Epoch 1/100
72/72 - 0s - loss: 0.4761 - accuracy: 0.7505 - val_loss: 0.4509 - val_accuracy: 0.7293
Epoch 2/100
72/72 - 0s - loss: 0.4148 - accuracy: 0.7793 - val_loss: 0.4242 - val_accuracy: 0.7405
Epoch 3/100
72/72 - 0s - loss: 0.3865 - accuracy: 0.7932 - val_loss: 0.4047 - val_accuracy: 0.7696
Epoch 4/100
72/72 - 0s - loss: 0.3724 - accuracy: 0.8005 - val_loss: 0.4243 - val_accuracy: 0.7517
Epoch 5/100
72/72 - 0s - loss: 0.3762 - accuracy: 0.7977 - val_loss: 0.4086 - val_accuracy: 0.7494
Epoch 6/100
72/72 - 0s - loss: 0.3718 - accuracy: 0.7991 - val_loss: 0.4021 - val_accuracy: 0.7494
Epoch 7/100
72/72 - 0s - loss: 0.3640 - accuracy: 0.8108 - val_loss: 0.4069 - val_accuracy: 0.7718
Epoch 8/100
72/72 - 0s - loss: 0.3594 - accuracy: 0.8103 - val_loss: 0.3890 - val_accuracy: 0.7629
Epoch 9/100
72/72 - 0s - loss: 0.3631 - accuracy: 0.7980 - val_loss: 0.4094 - val_accuracy: 0.7696
Epoch 10/100
72/72 - 0s - loss: 0.3612 - accuracy: 0.8044 - val_loss: 0.4008 - val_accuracy: 0.7539
Epoch 11/

<tensorflow.python.keras.callbacks.History at 0x2956fd64888>
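
The early stopping behaviour can be traced by hand. By default `EarlyStopping` monitors `val_loss`, and with `patience=3` it stops once three epochs in a row fail to beat the best value seen so far. The function below is a simplified re-implementation of that rule for illustration, not Keras's own code, applied to the ten complete epochs in the log above.

```python
def epochs_until_stop(val_losses, patience):
    """Return (stop_epoch or None, best_epoch, epochs_without_improvement).

    Epochs are 1-based. Simplified patience rule, not the Keras source.
    """
    best = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch, wait
    return None, best_epoch, wait

# val_loss values for epochs 1-10, copied from the training log.
val_losses = [0.4509, 0.4242, 0.4047, 0.4243, 0.4086,
              0.4021, 0.4069, 0.3890, 0.4094, 0.4008]

stop_epoch, best_epoch, wait = epochs_until_stop(val_losses, patience=3)
print(stop_epoch, best_epoch, wait)  # None 8 2
```

After epoch 10 the best validation loss is still the one from epoch 8 and two epochs have passed without improvement, so a third non-improving epoch would end training.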

## Testing the model

In [5]:
test_loss, test_accuracy = model.evaluate(test_inputs, test_targets, verbose=0)
print('\nTest loss: {0:.2f}. \nTest accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))


Test loss: 0.33. 
Test accuracy: 81.92%
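
To make the two reported numbers concrete, the sketch below computes the same kind of loss and accuracy by hand from softmax outputs. The probabilities and targets are made up for illustration, and `evaluate_by_hand` is a hypothetical helper that ignores the numerical clipping Keras applies internally.

```python
import numpy as np

def evaluate_by_hand(probabilities, targets):
    """Mean sparse categorical cross-entropy and accuracy from softmax outputs."""
    rows = np.arange(len(targets))
    # Loss: negative log of the probability assigned to the true class.
    loss = float(-np.mean(np.log(probabilities[rows, targets])))
    # Accuracy: share of rows whose most probable class equals the target.
    accuracy = float(np.mean(np.argmax(probabilities, axis=1) == targets))
    return loss, accuracy

# Made-up softmax outputs for four customers (columns: class 0, class 1).
probabilities = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]])
targets = np.array([0, 1, 1, 1])

loss, accuracy = evaluate_by_hand(probabilities, targets)
print(f"loss={loss:.2f}, accuracy={accuracy:.2%}")
```

The third customer is misclassified (class 0 gets probability 0.6 while the target is 1), which lowers the accuracy to 3 out of 4 and contributes the largest term to the loss.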


## Takeaway

The model achieved a test accuracy of roughly 82%. In other words, for about 8 out of every 10 customers it correctly predicted whether they would convert again, based on their past data. This is a strong result, which could translate into better cost allocation for marketing campaigns.