<p class="h1">Deep Neural Network for Churn Prediction for an Audiobook App's Users with Tensorflow 2</p>

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Introduction:-Audiobooks-App-Business-Case" data-toc-modified-id="Introduction:-Audiobooks-App-Business-Case-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Introduction: Audiobooks App Business Case</a></span></li><li><span><a href="#Preprocess-the-Data" data-toc-modified-id="Preprocess-the-Data-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Preprocess the Data</a></span><ul class="toc-item"><li><span><a href="#Extract-the-data-from-the-csv" data-toc-modified-id="Extract-the-data-from-the-csv-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Extract the data from the csv</a></span><ul class="toc-item"><li><span><a href="#Viewing-the-data" data-toc-modified-id="Viewing-the-data-2.1.1"><span class="toc-item-num">2.1.1&nbsp;&nbsp;</span>Viewing the data</a></span></li></ul></li><li><span><a href="#Balance-the-dataset" data-toc-modified-id="Balance-the-dataset-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Balance the dataset</a></span></li><li><span><a href="#Standardize-the-inputs" data-toc-modified-id="Standardize-the-inputs-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>Standardize the inputs</a></span></li><li><span><a href="#Shuffle-the-data" data-toc-modified-id="Shuffle-the-data-2.4"><span class="toc-item-num">2.4&nbsp;&nbsp;</span>Shuffle the data</a></span></li><li><span><a href="#Split-the-dataset-into-train,-validation,-and-test" data-toc-modified-id="Split-the-dataset-into-train,-validation,-and-test-2.5"><span class="toc-item-num">2.5&nbsp;&nbsp;</span>Split the dataset into train, validation, and test</a></span></li><li><span><a href="#Save-the-train,-test,-and-validation-datasets-to-external-.npz-files" data-toc-modified-id="Save-the-train,-test,-and-validation-datasets-to-external-.npz-files-2.6"><span class="toc-item-num">2.6&nbsp;&nbsp;</span>Save the train, test, and validation datasets to external <code>.npz</code> files</a></span></li></ul></li><li><span><a 
href="#Preparing-the-Machine-Learning-Model" data-toc-modified-id="Preparing-the-Machine-Learning-Model-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Preparing the Machine Learning Model</a></span><ul class="toc-item"><li><span><a href="#Model" data-toc-modified-id="Model-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Model</a></span></li></ul></li><li><span><a href="#Test-the-Model" data-toc-modified-id="Test-the-Model-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Test the Model</a></span></li><li><span><a href="#Conclusion" data-toc-modified-id="Conclusion-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Conclusion</a></span></li></ul></div>

## Introduction: Audiobooks App Business Case

This project uses a deep neural network to predict whether an audiobook app user will make another purchase on the platform, based on purchase and usage data collected over a two-year period.

The data comes from an audiobook app, so it relates only to the audio versions of books. Every customer in the database has made at least one purchase, which is why they appear in it.  
The objective is to build a machine learning model, based on the available data, that can predict whether a customer will buy again from the audiobook company.

The main idea is that if a customer has a low probability of coming back, there is little reason to spend advertising money on them. Focusing efforts on the customers who are likely to convert again can produce considerable savings. Moreover, the model can help identify which metrics matter most for a customer coming back. Identifying new customers creates value and growth opportunities.

###### Dataset Summary
You have a `.csv` summarizing the data. There are several variables:   
- Customer ID,  
- Book length overall (sum of the minute length of all purchases),  
- Book length avg (average length in minutes of all purchases),  
- Price paid_overall (sum of all purchases),  
- Price Paid avg (average of all purchases),  
- Review (a Boolean variable whether the customer left a review),   
- Review out of 10 (if the customer left a review, his/her review out of 10),   
- Completion (fraction of the purchased audio that was listened to, from 0 to 1),   
- Total minutes listened,   
- Support requests (number of support requests; everything from forgotten password to assistance for using the App), and   
- Last visited minus purchase date (in days).

These are the input features of the dataset, except for Customer ID, which is only an identifier and is dropped before training.

The target is a Boolean variable (0 or 1). The input data was collected over a period of 2 years. Whether a customer bought again (the target) is recorded at the end of the six (6) months following that initial 2-year period.
So, in effect, we are predicting whether a customer will convert in the next 6 months, based on their last 2 years of activity and engagement. If they don't convert within 6 months, chances are they've gone to a competitor or didn't like the audiobook way of digesting information.

###### Objective
The task is simple: create a machine learning algorithm that can predict whether a customer will buy again.

This is a binary classification problem with two classes: won't buy and will buy, represented by 0s and 1s.  
- 1: The customer converted   
- 0: The customer did not convert.

## Preprocess the Data

### Extract the data from the csv

In [1]:
import numpy as np
from sklearn import preprocessing

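# Note: the later cells save to and load from 'Dataset/' (capital D); on a case-sensitive filesystem the folder names must match.
raw_csv_data = np.loadtxt('dataset/Audiobooks_data.csv', delimiter=',')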

# The inputs are all columns in the csv, except the first one (Customer ID), and the last one (targets)
unscaled_inputs_all = raw_csv_data[:,1:-1]

# The targets are in the last column.
targets_all = raw_csv_data[:,-1]

#### Viewing the data

In [2]:
import pandas as pd


# "Completion" comes before "Total minutes listened" in the file:
# the minutes listened equal completion * book length overall (e.g. 0.99 * 1620 = 1603.8).
columns = ["Customer ID", "Book length overall", "Book length avg", "Price paid_overall", "Price Paid avg", 
           "Review", "Review out of 10", "Completion", "Total minutes listened", "Support requests", 
           "Last visited minus purchase date (in days)", "target"]

# select the first 10 rows of the dataset for viewing 
pd.DataFrame(raw_csv_data[:10,:], columns=columns)

|  | Customer ID | Book length overall | Book length avg | Price paid_overall | Price Paid avg | Review | Review out of 10 | Completion | Total minutes listened | Support requests | Last visited minus purchase date (in days) | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 994.0 | 1620.0 | 1620.0 | 19.73 | 19.73 | 1.0 | 10.0 | 0.99 | 1603.8 | 5.0 | 92.0 | 0.0 |
| 1 | 1143.0 | 2160.0 | 2160.0 | 5.33 | 5.33 | 0.0 | 8.91 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 2059.0 | 2160.0 | 2160.0 | 5.33 | 5.33 | 0.0 | 8.91 | 0.0 | 0.0 | 0.0 | 388.0 | 0.0 |
| 3 | 2882.0 | 1620.0 | 1620.0 | 5.96 | 5.96 | 0.0 | 8.91 | 0.42 | 680.4 | 1.0 | 129.0 | 0.0 |
| 4 | 3342.0 | 2160.0 | 2160.0 | 5.33 | 5.33 | 0.0 | 8.91 | 0.22 | 475.2 | 0.0 | 361.0 | 0.0 |
| 5 | 3416.0 | 2160.0 | 2160.0 | 4.61 | 4.61 | 0.0 | 8.91 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6 | 4949.0 | 2160.0 | 2160.0 | 5.33 | 5.33 | 0.0 | 8.91 | 0.04 | 86.4 | 0.0 | 366.0 | 0.0 |
| 7 | 9011.0 | 648.0 | 648.0 | 5.33 | 5.33 | 0.0 | 8.91 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 8 | 9282.0 | 2160.0 | 2160.0 | 5.33 | 5.33 | 0.0 | 8.91 | 0.26 | 561.6 | 0.0 | 33.0 | 0.0 |
| 9 | 10500.0 | 2160.0 | 2160.0 | 5.33 | 5.33 | 1.0 | 10.0 | 0.27 | 583.2 | 0.0 | 366.0 | 0.0 |


### Balance the dataset
To avoid biasing the model toward the more frequent class, the data is balanced before training.  
Balancing here means keeping an equal number of samples from each of the two target classes, i.e. as many samples with target=0 as there are with target=1, and dropping the excess samples (here, the surplus target=0 rows).  
As a result, the data used for the model has an approximately 50-50 split between targets 0 and 1.

In [3]:
# Count how many targets are 1 (the targets are 0/1, so their sum is the count)
num_one_targets = int(np.sum(targets_all))

# Counter for the targets that are 0 seen so far
zero_targets_counter = 0

# Row indices of the surplus target=0 samples that will be dropped
indices_to_remove = []

# Once as many 0-targets as 1-targets have been seen, mark every further 0-target row for removal.
for i in range(targets_all.shape[0]):
    if targets_all[i] == 0:
        zero_targets_counter += 1
        if zero_targets_counter > num_one_targets:
            indices_to_remove.append(i)

# Inputs and targets after the surplus rows have been deleted
unscaled_inputs_equal_priors = np.delete(unscaled_inputs_all, indices_to_remove, axis=0)
targets_equal_priors = np.delete(targets_all, indices_to_remove, axis=0)
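
The same balancing rule can also be written with array operations instead of a loop. The following is a minimal sketch on made-up toy targets; the `toy_targets` array and the helper name `balance_indices` are illustrative assumptions, not part of the notebook. It keeps every row with target 1 and the first equally many rows with target 0.

```python
import numpy as np

def balance_indices(targets):
    """Return the sorted row indices to keep: all 1s plus the first len(1s) 0s."""
    targets = np.asarray(targets)
    one_idx = np.flatnonzero(targets == 1)
    zero_idx = np.flatnonzero(targets == 0)
    # Keep only as many 0-target rows as there are 1-target rows
    kept_zero_idx = zero_idx[:len(one_idx)]
    return np.sort(np.concatenate([one_idx, kept_zero_idx]))

# Hypothetical targets: two 1s and five 0s
toy_targets = np.array([0, 0, 1, 0, 1, 0, 0])
keep = balance_indices(toy_targets)
balanced_targets = toy_targets[keep]

print(keep)                     # indices of the rows that survive
print(balanced_targets.mean())  # share of 1s after balancing
```

Indexing the inputs with the same `keep` array would keep inputs and targets aligned.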

### Standardize the inputs
Standardize each feature of the input data so that it has a mean of 0 and a standard deviation of 1.

In [4]:
scaled_inputs = preprocessing.scale(unscaled_inputs_equal_priors)
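
To make the transformation concrete, here is a small sketch of column-wise standardization written directly in NumPy. It assumes non-constant columns and uses the population standard deviation (`ddof=0`); the `toy_inputs` values are made up for illustration.

```python
import numpy as np

# Hypothetical inputs: two features on very different scales
toy_inputs = np.array([[1.0, 200.0],
                       [2.0, 400.0],
                       [3.0, 600.0]])

col_mean = toy_inputs.mean(axis=0)
col_std = toy_inputs.std(axis=0)  # population standard deviation (ddof=0)

# z-score: subtract each column's mean, divide by its standard deviation
standardized = (toy_inputs - col_mean) / col_std

print(standardized.mean(axis=0))  # approximately 0 for every column
print(standardized.std(axis=0))   # approximately 1 for every column
```

After the transformation both columns are on the same scale, so neither dominates the weight updates purely because of its units.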

### Shuffle the data
The dataset is sorted by date, which might lead the model to learn patterns that come from this ordering rather than from the features.  
Shuffling ensures the samples are spread as randomly as possible across the train, validation, and test sets.

In [5]:
shuffled_indices = np.arange(scaled_inputs.shape[0])
np.random.shuffle(shuffled_indices)

shuffled_inputs = scaled_inputs[shuffled_indices]
shuffled_targets = targets_equal_priors[shuffled_indices]
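
The key point of the cell above is that inputs and targets are indexed with the same permutation, so every row keeps its own label. The sketch below illustrates this on toy arrays; the seeded generator and the seed value 42 are assumptions added for reproducibility and are not used in the notebook.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, assumed for reproducibility

# Hypothetical data: row i is [2*i, 2*i + 1] and its label is i
toy_inputs = np.arange(10).reshape(5, 2)
toy_targets = np.arange(5)

# One permutation, applied to both arrays
perm = rng.permutation(len(toy_inputs))
toy_shuffled_inputs = toy_inputs[perm]
toy_shuffled_targets = toy_targets[perm]

# Each shuffled row still carries the label it started with
rows_still_aligned = np.array_equal(toy_shuffled_inputs[:, 0] // 2, toy_shuffled_targets)
print(rows_still_aligned)
```

Shuffling the two arrays separately would break this pairing and make the targets meaningless.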

### Split the dataset into train, validation, and test

In [6]:
# Count the total number of samples
samples_count = shuffled_inputs.shape[0]

# Count the samples in each subset, assuming an 80-10-10 distribution of training, validation, and test.
train_samples_count = int(0.8*samples_count)
validation_samples_count = int(0.1*samples_count)
test_samples_count = samples_count - train_samples_count - validation_samples_count

# Create variables that record the inputs and targets for training
# In our shuffled dataset, they are the first "train_samples_count" observations
train_inputs = shuffled_inputs[:train_samples_count]
train_targets = shuffled_targets[:train_samples_count]

# Create variables that record the inputs and targets for validation.
# They are the next "validation_samples_count" observations, right after the training ones.
validation_inputs = shuffled_inputs[train_samples_count:train_samples_count+validation_samples_count]
validation_targets = shuffled_targets[train_samples_count:train_samples_count+validation_samples_count]

# Create variables that record the inputs and targets for test.
# They are everything that is remaining.
test_inputs = shuffled_inputs[train_samples_count+validation_samples_count:]
test_targets = shuffled_targets[train_samples_count+validation_samples_count:]

# Check if the train, validation, and test data are balanced
# Print the number of targets that are 1s, the total number of samples, and the proportion for training, validation, and test.
print(np.sum(train_targets), train_samples_count, np.sum(train_targets) / train_samples_count)
print(np.sum(validation_targets), validation_samples_count, np.sum(validation_targets) / validation_samples_count)
print(np.sum(test_targets), test_samples_count, np.sum(test_targets) / test_samples_count)

1771.0 3579 0.4948309583682593
229.0 447 0.5123042505592841
237.0 448 0.5290178571428571
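
The subset sizes follow from simple integer arithmetic on the total sample count. As a sketch, the helper below (the name `split_counts` is hypothetical) reproduces the 80-10-10 rule used above, with the test set taking whatever remains after truncation. The total of 4474 is the sum of the three sizes printed above.

```python
def split_counts(samples_count, train_fraction=0.8, validation_fraction=0.1):
    """Return (train, validation, test) sizes; the test set takes the remainder."""
    train_count = int(train_fraction * samples_count)
    validation_count = int(validation_fraction * samples_count)
    test_count = samples_count - train_count - validation_count
    return train_count, validation_count, test_count

counts = split_counts(4474)
print(counts)
```

Because `int()` truncates, the remainder always lands in the test set, which is why it holds one more sample than the validation set here.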


### Save the train, test, and validation datasets to external `.npz` files

In [7]:
np.savez('Dataset/Audiobooks_data_train', inputs=train_inputs, targets=train_targets)
np.savez('Dataset/Audiobooks_data_validation', inputs=validation_inputs, targets=validation_targets)
np.savez('Dataset/Audiobooks_data_test', inputs=test_inputs, targets=test_targets)
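
Note that `np.savez` appends the `.npz` extension when the given file name does not already end with it, which is why the files are later loaded with the extension spelled out. The round trip can be sketched on toy arrays as follows; the temporary directory and the `toy_split` file name are assumptions made so the example does not touch the project folder.

```python
import os
import tempfile
import numpy as np

toy_inputs = np.array([[0.5, -1.2], [1.5, 0.3]])
toy_targets = np.array([0.0, 1.0])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'toy_split')  # no extension given
    np.savez(path, inputs=toy_inputs, targets=toy_targets)
    saved_files = os.listdir(tmp_dir)          # the file on disk ends in .npz

    # Arrays are retrieved by the keyword names used when saving
    with np.load(path + '.npz') as npz:
        keys = sorted(npz.files)
        loaded_inputs = npz['inputs']
        loaded_targets = npz['targets']

print(saved_files, keys)
```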

## Preparing the Machine Learning Model

In [8]:
import tensorflow as tf

In [9]:
# create a temporary variable npz, to store each of the three Audiobooks datasets
npz = np.load('Dataset/Audiobooks_data_train.npz')

# extract the inputs using the keyword under which they were saved. Convert the values to floats.
# (the np.float and np.int aliases were removed in NumPy 1.24, so explicit dtypes are used)
train_inputs = npz['inputs'].astype(np.float64)
# targets must be integer class indices (0 or 1), as sparse_categorical_crossentropy expects; no one-hot encoding is needed.
train_targets = npz['targets'].astype(np.int64)

# load the validation data in the temporary variable
npz = np.load('Dataset/Audiobooks_data_validation.npz')
# load the inputs and the targets.
validation_inputs, validation_targets = npz['inputs'].astype(np.float64), npz['targets'].astype(np.int64)

# load the test data in the temporary variable
npz = np.load('Dataset/Audiobooks_data_test.npz')
# create 2 variables that will contain the test inputs and the test targets
test_inputs, test_targets = npz['inputs'].astype(np.float64), npz['targets'].astype(np.int64)

In [10]:
# shape of the dataset
print("Train: ", train_inputs.shape, train_targets.shape, '\n')
print("Validation: ", validation_inputs.shape, validation_targets.shape, '\n')
print("Test: ", test_inputs.shape, test_targets.shape)

Train:  (3579, 10) (3579,) 

Validation:  (447, 10) (447,) 

Test:  (448, 10) (448,)


### Model
Create the deep neural network model. Its architecture comprises **five (5) hidden layers**, each with a **ReLU** activation function (read about activation functions [here](https://datasatrapy.herokuapp.com/post/1#Activation-Function)). The model is trained with a **Sparse Categorical Cross-Entropy** loss function and the **Adam** optimizer (read about loss functions and optimizers [here](https://datasatrapy.herokuapp.com/post/1#Choose-the-Optimization-Algorithm-and-the-Loss-Function)), and its performance is measured by the **Accuracy** metric.  
To help prevent overfitting, an early stopping callback is passed to the `.fit()` method of the model. `EarlyStopping` interrupts training once the monitored quantity (the validation loss, by default) has not improved for a set number of consecutive epochs, determined by the `patience` argument.
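
The patience rule can be illustrated without TensorFlow. The function below is a simplified sketch of the idea, not the Keras implementation: it assumes the validation loss is monitored, that any decrease below the best value so far counts as an improvement, and that training stops once `patience` consecutive epochs pass without one. The function name and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training would stop, or None if it never does."""
    best = float('inf')
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses, one per epoch
toy_val_losses = [0.50, 0.40, 0.41, 0.39, 0.40, 0.42, 0.41, 0.38]
stop_epoch = early_stopping_epoch(toy_val_losses, patience=3)
print(stop_epoch)
```

In this toy run the best loss is reached at epoch 4, the next three epochs bring no improvement, and training stops at epoch 7, so the lower loss at epoch 8 is never seen. A larger `patience` tolerates more noise at the cost of extra epochs.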

The output layer uses a **Softmax** activation function, a common choice for the output layer in classification problems. It transforms a vector of arbitrary numbers into a valid probability distribution. 
In this binary classification case, the layer outputs two values (one for each target class), each between 0 and 1, and the two sum to one. 
In other words, the two values are the predicted probabilities that a customer did not convert and that they converted. The class with the higher probability is the final prediction for that customer.
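
As a small worked example, the sketch below applies softmax to made-up pre-activation values for two customers and then computes the loss that sparse categorical cross-entropy assigns: the negative log of the probability at the index given by the integer target. The `toy_logits` and `toy_targets` values are assumptions for illustration only.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row maximum keeps the exponentials stable."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

# Hypothetical raw outputs of the last layer for two customers
toy_logits = np.array([[2.0, 0.0],
                       [-1.0, 1.0]])
toy_targets = np.array([0, 1])  # integer class indices, not one-hot vectors

probs = softmax(toy_logits)
predicted_class = np.argmax(probs, axis=1)

# Sparse categorical cross-entropy: pick the probability of the true class per row
true_class_probs = probs[np.arange(len(toy_targets)), toy_targets]
loss = -np.mean(np.log(true_class_probs))

print(probs)
print(predicted_class, loss)
```

Each row of `probs` sums to one, and the integer targets are used directly as column indices, which is why the targets do not need to be one-hot encoded.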

Set the optimizer, the loss, and early stopping, then train the model on the training data:

In [11]:
# Set the input and output sizes
input_size = 10
output_size = 2

# Use the same size for all five hidden layers.
hidden_layer_size = 200

# define what the model will look like
model = tf.keras.Sequential([
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 1st hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 2nd hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 3rd hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 4th hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 5th hidden layer
    tf.keras.layers.Dense(output_size, activation='softmax') # output layer
])


# Choose the optimizer and the loss function
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# set the batch size
batch_size = 100

# set a maximum number of training epochs
max_epochs = 100

# set an early stopping mechanism with patience=3, to be a bit tolerant of random validation loss increases
early_stopping = tf.keras.callbacks.EarlyStopping(patience=3)

# fit the model
model.fit(train_inputs, 
          train_targets, 
          batch_size=batch_size,
          epochs=max_epochs, 
          callbacks=[early_stopping],
          validation_data=(validation_inputs, validation_targets), 
          verbose = 2 
          )  

Epoch 1/100
36/36 - 1s - loss: 0.4701 - accuracy: 0.7460 - val_loss: 0.3726 - val_accuracy: 0.7964
Epoch 2/100
36/36 - 1s - loss: 0.3700 - accuracy: 0.7963 - val_loss: 0.3600 - val_accuracy: 0.8121
Epoch 3/100
36/36 - 0s - loss: 0.3597 - accuracy: 0.8039 - val_loss: 0.3446 - val_accuracy: 0.8143
Epoch 4/100
36/36 - 0s - loss: 0.3474 - accuracy: 0.8178 - val_loss: 0.3429 - val_accuracy: 0.8076
Epoch 5/100
36/36 - 0s - loss: 0.3501 - accuracy: 0.8075 - val_loss: 0.3544 - val_accuracy: 0.8009
Epoch 6/100
36/36 - 0s - loss: 0.3403 - accuracy: 0.8153 - val_loss: 0.3269 - val_accuracy: 0.8121
Epoch 7/100
36/36 - 0s - loss: 0.3363 - accuracy: 0.8161 - val_loss: 0.3367 - val_accuracy: 0.8255
Epoch 8/100
36/36 - 0s - loss: 0.3303 - accuracy: 0.8212 - val_loss: 0.3222 - val_accuracy: 0.8166
Epoch 9/100
36/36 - 0s - loss: 0.3308 - accuracy: 0.8125 - val_loss: 0.3299 - val_accuracy: 0.8098
Epoch 10/100
36/36 - 1s - loss: 0.3260 - accuracy: 0.8167 - val_loss: 0.3310 - val_accuracy: 0.8233
Epoch 11/

<keras.callbacks.History at 0x25e8faac4c0>

## Test the Model

The next step is to test the final predictive power of the model by running it on the test dataset, which the algorithm has NEVER seen before.

It is very important to realize that tuning the hyperparameters against the validation results gradually overfits the validation dataset. To measure the true performance of the model, it needs to be evaluated on data that was not used in any way during training.

In [12]:
test_loss, test_accuracy = model.evaluate(test_inputs, test_targets)



In [13]:
print(f'Test loss: {test_loss:.2f}. Test accuracy: {test_accuracy*100.0:.2f}%')

Test loss: 0.31. Test accuracy: 81.03%
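
The accuracy reported above is the share of test samples whose predicted class matches the target. The sketch below shows that computation on made-up probabilities; in the notebook, the equivalent of `toy_probs` would presumably come from `model.predict(test_inputs)`, and the toy values here are assumptions.

```python
import numpy as np

# Hypothetical softmax outputs for five customers: [P(no conversion), P(conversion)]
toy_probs = np.array([[0.90, 0.10],
                      [0.20, 0.80],
                      [0.60, 0.40],
                      [0.30, 0.70],
                      [0.55, 0.45]])
toy_targets = np.array([0, 1, 1, 1, 0])

# The predicted class is the index of the larger probability in each row
toy_predictions = np.argmax(toy_probs, axis=1)

# Accuracy is the fraction of predictions that match the targets
toy_accuracy = np.mean(toy_predictions == toy_targets)
print(toy_predictions, toy_accuracy)
```

Here four of the five predictions are correct, giving an accuracy of 0.8. On a roughly balanced test set such as this one, accuracy is a reasonable summary; on imbalanced data it would need to be read alongside per-class results.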


## Conclusion
The business case problem was solved by employing artificial intelligence.  
A deep neural network model was developed to predict whether a customer will convert, i.e., make another purchase on the platform, or not.

To prepare the dataset for the algorithm, it was first preprocessed. The dataset was balanced to obtain an approximate 50-50 split of the data between targets=0 and targets=1. Next, the input features were standardized using `scale()` from `sklearn.preprocessing`. Then, the data was shuffled and split into train, validation, and test sets. Finally, each set was saved to a separate external NumPy `.npz` file.

The model is a deep neural network with 5 hidden layers, trained on the data with a batch size of 100. It was compiled with the Adaptive Moment Estimation (Adam) optimization algorithm, a sparse categorical cross-entropy loss function, and accuracy as the performance metric.  
The model achieved an 81.03% accuracy on the test data; that is, it correctly predicted whether or not a customer converted for about 81% of the customers in the held-out, balanced test set.