# Deep Learning Case Study: Audiobooks 
## Part 2: Predicting on new data
#### by Sooyeon Won 

### Keywords 
- Deep Learning 
- TensorFlow2 - Keras
- Unbalanced Data
- Classification Problem


### Contents 

<ul>    
    ------------- Part1 -------------
<li><a href="#Introduction">1.  Introduction</a></li>
<li><a href="#Preprocessing">2.  Data Preprocessing</a></li>
<li><a href="#Analysis">3.  Data Analysis</a></li>
<li><a href="#Test">4.  Test the Model</a></li>
    ------------- Part2 -------------
<li><a href="#Prediction">5.  Predicting on new Data</a></li>
</ul>



In [1]:
# Import the relevant libraries
import numpy as np
import tensorflow as tf
import pickle

### 5. 1. Load the Scaler and the Model from Part 1

In [2]:
# Load the scaler from Part 1 with the pickle module (the file is closed automatically)
with open('scaler_deep_learning.pickle', 'rb') as scaler_file:
    scaler_deep_learning = pickle.load(scaler_file)

# Load the model from Part 1 with the Keras load_model function
model = tf.keras.models.load_model('audiobooks_model.h5')

> Note: loading the model raises a warning, because I did not specify the input shape when building the model in Part 1. For a feed-forward neural network this is not an issue.

### 5. 2. Load the new data

In [3]:
# Load the new data
raw_data = np.loadtxt('New_Audiobooks_Data.csv', delimiter=',')

# As before, use all variables as inputs except the first column (Customer ID)
new_data_inputs = raw_data[:, 1:]  # There is no target in the new data

### 5. 3. Predict the probability of a customer to convert

In [4]:
# Scale the new data in the same way I scaled the training data
new_data_inputs_scaled = scaler_deep_learning.transform(new_data_inputs)
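
The important detail in this step is that the new data is transformed with the statistics learned from the training data and is not fitted again. Below is a minimal NumPy sketch of that idea, assuming the scaler standardizes each column (subtract the training mean, divide by the training standard deviation). The arrays and the `standardize` helper are hypothetical illustrations, not the scaler saved in Part 1.

```python
import numpy as np

# Hypothetical training inputs: 4 customers, 2 features
train_inputs = np.array([[10.0, 1.0],
                         [20.0, 3.0],
                         [30.0, 5.0],
                         [40.0, 7.0]])

# Column statistics learned once, from the training data only
train_mean = train_inputs.mean(axis=0)
train_std = train_inputs.std(axis=0)


def standardize(inputs, mean, std):
    """Standardize inputs with previously stored column statistics."""
    return (inputs - mean) / std


# Hypothetical new inputs, scaled with the stored training statistics
new_inputs = np.array([[25.0, 4.0],
                       [50.0, 9.0]])
new_inputs_scaled = standardize(new_inputs, train_mean, train_std)
print(new_inputs_scaled)
```

Fitting a fresh scaler on the new data instead would shift and rescale the features differently from what the model saw during training, so its predictions would no longer be comparable.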

In [5]:
# Predict each customer's probability of converting (second output column, class 1)
model.predict(new_data_inputs_scaled)[:, 1].round(2)  # rounded to 2 decimal places

array([0.  , 0.  , 0.05, 1.  , 0.  , 0.04, 0.04, 0.1 , 0.03, 0.74, 0.  ,
       0.63, 0.79, 0.  , 0.08, 0.12, 0.79, 0.65, 0.75, 0.97, 1.  , 1.  ,
       1.  , 0.  , 0.  , 0.99, 0.27, 0.  , 1.  , 1.  ], dtype=float32)
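
Since the first column of the CSV holds the Customer ID, the probabilities can be matched back to individual customers. The sketch below shows one way to do that and to rank customers by their probability of converting. All arrays here are small hypothetical stand-ins for `raw_data` and the model output, not the actual values.

```python
import numpy as np

# Hypothetical stand-in for raw_data: the first column is the customer ID
raw_data_example = np.array([[101, 0.3, 1.2],
                             [102, 2.1, 0.4],
                             [103, 0.9, 0.9]])

# Hypothetical model output: one row per customer, one column per class
probabilities_example = np.array([[0.95, 0.05],
                                  [0.26, 0.74],
                                  [0.40, 0.60]])

customer_ids = raw_data_example[:, 0].astype(int)
conversion_probability = probabilities_example[:, 1]

# Rank customers from most to least likely to convert
order = np.argsort(-conversion_probability)
for customer_id, probability in zip(customer_ids[order],
                                    conversion_probability[order]):
    print(f"customer {customer_id}: {probability:.2f}")
```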

In [6]:
# A more general approach that works for any number of classes:
# take the index of the most probable class for each customer
np.argmax(model.predict(new_data_inputs_scaled), axis=1)

array([0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1,
       1, 0, 0, 1, 0, 0, 1, 1], dtype=int64)
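
To illustrate why `np.argmax` does not depend on the number of classes, the sketch below compares it with thresholding the class-1 probability at 0.5. The probability arrays are hypothetical, and the 0.5 threshold is an assumption made for the comparison, not something taken from Part 1.

```python
import numpy as np

# Hypothetical softmax outputs for a two-class model
two_class = np.array([[0.95, 0.05],
                      [0.26, 0.74],
                      [0.35, 0.65]])

# Two-class shortcut: threshold the probability of class 1
by_threshold = (two_class[:, 1] > 0.5).astype(int)

# General approach: index of the most probable class in each row
by_argmax = np.argmax(two_class, axis=1)
print(by_threshold, by_argmax)

# Hypothetical three-class outputs: thresholding a single column no longer
# identifies the predicted class, but argmax still does
three_class = np.array([[0.2, 0.3, 0.5],
                        [0.6, 0.3, 0.1]])
three_class_labels = np.argmax(three_class, axis=1)
print(three_class_labels)
```

With two classes both approaches give the same labels, while only `np.argmax` carries over unchanged to three or more classes.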