# **Apple Music Business Case**

## **1-What is the Problem?**

We have a dataset of Apple Music application users. Every user in the database has made at least one purchase, which is why they appear in it. We want to build a machine learning model on the available data that predicts whether a user will buy from Apple again.

The main idea is that if a user has a low probability of coming back, there is no reason to spend money advertising to them. If we focus our efforts SOLELY on users who are likely to convert again, we can make substantial savings. Moreover, the model can help identify the metrics that matter most for a customer coming back. Identifying new customers creates value and growth opportunities.

## **2-Getting Familiar with the Data**


We have a .csv file summarizing the data. It contains the following variables:
* Customer ID
* overall_music_length
* average_music_length
* overall_music_price
* average_music_price
* has_review
* review
* support_ticket
* tenure
* target

## **3-Data Preprocessing**

Since we are dealing with real-life data, we need to preprocess it a bit.

### **3.1-Extract the data from the csv**

Here we first load the data with the `NumPy` package.

In [66]:
import numpy as np

PATH = '/content/drive/MyDrive/TensorFlow Project/apple_music.csv'

raw_csv_data = np.loadtxt(PATH,
                          delimiter = ',',
                          skiprows = 1)
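
As a minimal sketch of what `delimiter` and `skiprows` do, the same call can be run on a tiny in-memory file. The column names and values below are made up for illustration and are not taken from the real dataset.

```python
import io
import numpy as np

# Hypothetical stand-in for the real file: one header row, then numeric rows.
fake_csv = io.StringIO(
    "customer_id,feature_a,feature_b,target\n"
    "101,1620,19.73,0\n"
    "102,2160,5.33,1\n"
)

# skiprows=1 drops the header line; delimiter=',' splits each row into columns.
demo_data = np.loadtxt(fake_csv, delimiter=",", skiprows=1)

print(demo_data.shape)  # (2, 4)
print(demo_data.dtype)  # float64
```

Note that `np.loadtxt` returns a single float array, so every column in the file has to be numeric for this approach to work.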

The inputs are all columns in the csv except the first, which is the customer ID, and the last, which holds the targets.

In [67]:
unscaled_inputs_all = raw_csv_data[: , 1:-1]
unscaled_inputs_all

array([[1620.  , 1620.  ,   19.73, ...,   10.  ,    5.  ,   92.  ],
       [2160.  , 2160.  ,    5.33, ...,    8.91,    0.  ,    0.  ],
       [2160.  , 2160.  ,    5.33, ...,    8.91,    0.  ,  388.  ],
       ...,
       [2160.  , 2160.  ,    6.14, ...,    8.91,    0.  ,    0.  ],
       [1620.  , 1620.  ,    5.33, ...,    8.  ,    0.  ,   90.  ],
       [1674.  , 3348.  ,    5.33, ...,    8.91,    0.  ,    0.  ]])

The targets are in the last column. That's how datasets are conventionally organized.

In [68]:
targets_all = raw_csv_data[: , -1]
targets_all

array([0., 0., 0., ..., 0., 0., 1.])
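
The two slices can be checked on a small array. This is only a sketch: the rows below are invented and use fewer feature columns than the real data.

```python
import numpy as np

# Made-up rows laid out as: id, feature_1, feature_2, target
demo = np.array([
    [101.0, 1620.0, 19.73, 0.0],
    [102.0, 2160.0,  5.33, 1.0],
])

demo_inputs = demo[:, 1:-1]   # drop the first (id) and last (target) columns
demo_targets = demo[:, -1]    # keep only the last column

print(demo_inputs.shape)   # (2, 2)
print(demo_targets)        # [0. 1.]
```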

### **3.2-Data Shuffling**

The data was collected in date order.
We shuffle the indices so that the data is not arranged in any particular order when we feed it to the model.
Since we will be batching, we want each batch to be as representative of the whole dataset as possible.

Then we use the shuffled indices to reorder the inputs and targets in the same way.

In [70]:
shuffled_indices = np.arange(unscaled_inputs_all.shape[0])
np.random.shuffle(shuffled_indices)

shuffled_inputs = unscaled_inputs_all[shuffled_indices]
shuffled_targets = targets_all[shuffled_indices]
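
The reason for shuffling indices rather than the arrays themselves is that one permutation keeps every input row paired with its target. The sketch below illustrates this on toy data; the seeded generator is an assumption added here for reproducibility and is not part of the original cell.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, chosen only for this sketch

# Toy data: row r is [2r, 2r+1] and its target is r, so pairs are easy to check.
demo_inputs = np.arange(10).reshape(5, 2)
demo_targets = demo_inputs[:, 0] // 2

# One permutation of the row indices, applied to both arrays.
perm = rng.permutation(demo_inputs.shape[0])
shuffled_x = demo_inputs[perm]
shuffled_y = demo_targets[perm]

# Every shuffled row still sits next to its own target.
pairs_intact = np.array_equal(shuffled_x[:, 0] // 2, shuffled_y)
print(pairs_intact)  # True
```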

### **3.3-Data Splitting**

In this step, we split the data into three parts. These are the steps we walk through:
1. Count the total number of samples.
2. Compute the sample counts for the training and validation subsets, assuming an 80-10-10 split between training, validation, and test.
3. Assign all remaining samples to the test subset.
4. Create variables that hold the inputs and targets for training.
5. Create variables that hold the inputs and targets for validation.
6. Create variables that hold the inputs and targets for test.


In [71]:
# Step 1
samples_count = shuffled_inputs.shape[0]

# Step 2
train_samples_count = int(0.8 * samples_count)
validation_samples_count = int(0.1 * samples_count)

# Step 3
test_samples_count = samples_count - train_samples_count - validation_samples_count

# Step 4
train_inputs = shuffled_inputs[:train_samples_count]
train_targets = shuffled_targets[:train_samples_count]

# Step 5
validation_inputs = shuffled_inputs[train_samples_count : train_samples_count + validation_samples_count]
validation_targets = shuffled_targets[train_samples_count : train_samples_count + validation_samples_count]

# Step 6
test_inputs = shuffled_inputs[train_samples_count + validation_samples_count:]
test_targets = shuffled_targets[train_samples_count + validation_samples_count:]

In [72]:
print(f"training target is: {np.sum(train_targets)}, num rows is: {train_samples_count}, and % is: {np.sum(train_targets) / train_samples_count}")
print(f"val target is: {np.sum(validation_targets)}, num rows is: {validation_samples_count}, and % is: {np.sum(validation_targets) / validation_samples_count}")
print(f"test target is: {np.sum(test_targets)}, num rows is: {test_samples_count}, and % is: {np.sum(test_targets) / test_samples_count}")

training target is: 1785.0, num rows is: 11267, and % is: 0.15842726546551877
val target is: 235.0, num rows is: 1408, and % is: 0.1669034090909091
test target is: 217.0, num rows is: 1409, and % is: 0.1540099361249113
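
The counting logic of steps 1 to 3 can be isolated in a small helper. `split_counts` is a hypothetical name used only for this sketch, and the total of 14084 is simply the sum of the three row counts printed above.

```python
def split_counts(n_samples, train_frac=0.8, val_frac=0.1):
    """Return (train, validation, test) sizes; test takes whatever is left."""
    n_train = int(train_frac * n_samples)
    n_val = int(val_frac * n_samples)
    n_test = n_samples - n_train - n_val
    return n_train, n_val, n_test

counts = split_counts(14084)
print(counts)  # (11267, 1408, 1409)
```

Because `int()` truncates, the test subset absorbs the rounding remainder, which is why it ends up one row larger than the validation subset.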


### **3.4-Standardizing the Inputs**

Here we take advantage of sklearn's preprocessing capabilities to standardize the inputs, as we explained in one of the lectures. The scaler is fitted on the training inputs only and then applied unchanged to the validation and test inputs, so no information from those subsets leaks into training.


In [73]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
train_inputs = scaler.fit_transform(train_inputs)
validation_inputs = scaler.transform(validation_inputs)
test_inputs = scaler.transform(test_inputs)
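
For intuition, standardization can be written out in plain NumPy. This is a sketch of the idea rather than sklearn's implementation, and the numbers are invented: the mean and standard deviation come from the training rows only and are reused for the other subsets.

```python
import numpy as np

# Made-up data: the second training column is constant on purpose.
train = np.array([[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]])
val = np.array([[3.0, 12.0]])

mean = train.mean(axis=0)
std = train.std(axis=0)   # population standard deviation (ddof=0)
std[std == 0.0] = 1.0     # constant columns: avoid dividing by zero

train_scaled = (train - mean) / std
val_scaled = (val - mean) / std   # reuse training statistics, do not refit

print(train_scaled.mean(axis=0))  # approximately [0. 0.]
print(val_scaled)                 # [[0. 2.]]
```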

## **4-Building Model**

### **4.1- Set Hyperparameters and Construct Model**

First, we import the TensorFlow library.

In [74]:
import tensorflow as tf

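Next, we set a few hyperparameters: the input size, the output size, the hidden layer size (the width of the model), and the dropout rate. The depth of the model is determined by how many hidden layers we stack in the next cell. Keras infers the input shape from the data on the first call, so `INPUT_SIZE` is kept for reference only.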

In [75]:
INPUT_SIZE = 10
OUTPUT_SIZE = 1
HIDDEN_LAYER_SIZE = 50
DROPOUT_RATE = 0.3

Then we define the model as a sequence of layers.

In [76]:
from tensorflow.keras import regularizers

model = tf.keras.Sequential([
    tf.keras.layers.Dense(HIDDEN_LAYER_SIZE, activation='relu', kernel_regularizer=regularizers.l2(1e-4)),

    tf.keras.layers.Dropout(DROPOUT_RATE),

    tf.keras.layers.Dense(HIDDEN_LAYER_SIZE, activation='relu', kernel_regularizer=regularizers.l2(1e-4)),

    tf.keras.layers.Dropout(DROPOUT_RATE),

    tf.keras.layers.Dense(OUTPUT_SIZE, activation='sigmoid')
])
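
To make the layer stack concrete, here is a rough NumPy sketch of the forward pass at prediction time, when dropout is inactive. The weights are random and the layer sizes are deliberately small, so this illustrates the computation only and does not reproduce the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, hidden = 4, 5   # small made-up sizes, not the notebook's values

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Random stand-ins for the weights and biases that training would learn.
W1, b1 = rng.normal(size=(n_features, hidden)), np.zeros(hidden)
W2, b2 = rng.normal(size=(hidden, hidden)), np.zeros(hidden)
W3, b3 = rng.normal(size=(hidden, 1)), np.zeros(1)

x = rng.normal(size=(3, n_features))   # a batch of 3 standardized rows
h1 = relu(x @ W1 + b1)                 # first hidden layer
h2 = relu(h1 @ W2 + b2)                # second hidden layer
probs = sigmoid(h2 @ W3 + b3)          # one probability per row

print(probs.shape)  # (3, 1)
```

The sigmoid output lies between 0 and 1, which is why a single output unit is enough for a binary target.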


In the next step, we set the optimizer, loss function and metric.

In [77]:
model.compile(optimizer = 'adam',
              loss = 'binary_crossentropy',
              metrics = ['accuracy'])
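
As a sketch of what the `binary_crossentropy` loss measures, the formula can be written directly in NumPy. The clipping constant `eps` is a choice made for this example, and the probabilities are invented.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # keep log() finite
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(losses))

y_true = np.array([0.0, 1.0, 1.0])
mostly_right = np.array([0.1, 0.9, 0.8])   # made-up predictions
mostly_wrong = np.array([0.9, 0.1, 0.2])

loss_right = binary_crossentropy(y_true, mostly_right)
loss_wrong = binary_crossentropy(y_true, mostly_wrong)
print(loss_right < loss_wrong)  # True
```

Confident predictions on the wrong side are penalized heavily, so the loss keeps carrying information even when accuracy does not change.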

In the next step, we set the batch size and the maximum number of epochs. Rather than updating the weights once per epoch on the whole training set, we use the `mini-batch` method so that the weights are updated many times within each epoch.

In [78]:
BATCH_SIZE = 100

EPOCH = 200
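
The batch size determines how many weight updates happen per epoch. The small calculation below uses the training row count from the split summary (11267) and should agree with the `113/113` step counter shown in the training log.

```python
import math

TRAIN_ROWS = 11267   # training row count reported in the split summary
BATCH = 100          # same value as BATCH_SIZE

steps_per_epoch = math.ceil(TRAIN_ROWS / BATCH)
last_batch_size = TRAIN_ROWS - (steps_per_epoch - 1) * BATCH

print(steps_per_epoch)   # 113
print(last_batch_size)   # 67
```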

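We also set up early stopping with a patience of 2 to prevent overfitting: training stops once the validation loss has failed to improve for 2 consecutive epochs, and the weights from the best epoch are restored.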

In [79]:
PATIENCE = 2

early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor = "val_loss",
    patience = PATIENCE,
    restore_best_weights = True
)

### **4.2- Train the Model**

Now everything is set and we are ready to train the model. Because the data is imbalanced (roughly 16% of the training targets are 1), we first compute a weight for each label in the training set so that the loss accounts for this imbalance.
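
A minimal sketch of the "balanced" weighting rule used in the next cell: each class gets `n_samples / (n_classes * class_count)`. The helper name `balanced_class_weights` is hypothetical, and the toy targets reproduce only the class counts from the split summary (1785 ones out of 11267 rows), not the real rows.

```python
import numpy as np

def balanced_class_weights(y):
    """Weight each class inversely to its frequency."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

# Same class counts as the training subset: 9482 zeros and 1785 ones.
demo_targets = np.concatenate([np.zeros(9482), np.ones(1785)])
demo_weights = balanced_class_weights(demo_targets)

print(demo_weights)  # roughly {0.0: 0.59, 1.0: 3.16}
```

With these weights, a misclassified returning customer costs the loss about five times as much as a misclassified non-returning one.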

In [80]:
from sklearn.utils.class_weight import compute_class_weight

classes = np.unique(train_targets)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=train_targets)
class_weight = dict(zip(classes, weights))

model.fit(
    train_inputs,
    train_targets,
    validation_data=(validation_inputs, validation_targets),
    class_weight=class_weight,
    epochs=EPOCH,            # maximum number of epochs, defined above
    batch_size=BATCH_SIZE,   # mini-batch size, defined above
    callbacks=[early_stopping],
    verbose=2)

Epoch 1/200
113/113 - 1s - 11ms/step - accuracy: 0.7041 - loss: 0.6484 - val_accuracy: 0.7528 - val_loss: 0.5762
Epoch 2/200
113/113 - 0s - 2ms/step - accuracy: 0.7258 - loss: 0.6102 - val_accuracy: 0.8111 - val_loss: 0.5384
Epoch 3/200
113/113 - 0s - 2ms/step - accuracy: 0.7482 - loss: 0.6011 - val_accuracy: 0.8224 - val_loss: 0.5383
Epoch 4/200
113/113 - 0s - 2ms/step - accuracy: 0.7661 - loss: 0.5930 - val_accuracy: 0.8260 - val_loss: 0.5433
Epoch 5/200
113/113 - 0s - 2ms/step - accuracy: 0.7792 - loss: 0.5912 - val_accuracy: 0.7884 - val_loss: 0.5730


<keras.src.callbacks.history.History at 0x7cb660264a70>
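
The log above ends after epoch 5 even though up to 200 epochs were allowed. The sketch below replays the patience rule on the `val_loss` values copied from that log. It is a simplified re-implementation for illustration, not Keras's own code, and it assumes that any decrease at all counts as an improvement.

```python
def early_stop_epoch(val_losses, patience):
    """Return (epoch at which training stops, epoch with the best loss)."""
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0   # improvement: reset
        else:
            wait += 1                                 # no improvement
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

logged_val_loss = [0.5762, 0.5384, 0.5383, 0.5433, 0.5730]
stopped_at, best_at = early_stop_epoch(logged_val_loss, patience=2)
print(stopped_at, best_at)  # 5 3
```

Since `restore_best_weights` is enabled, the model that is evaluated afterwards carries the weights from epoch 3, not from the last epoch.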

## **5-Testing Model**

After training on the training data and validating on the validation data, we measure the final predictive power of our model by running it on the test dataset, which the model has NEVER seen before.

This matters because repeatedly tuning the hyperparameters against the validation dataset gradually overfits it, so only the test dataset gives an unbiased estimate of performance.

In [81]:
test_loss, test_accuracy = model.evaluate(test_inputs, test_targets)

print('\nTest loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))

45/45 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.8479 - loss: 0.5337

Test loss: 0.54. Test accuracy: 83.39%
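
Because the classes are imbalanced, accuracy is easier to interpret next to a baseline. The sketch below computes the accuracy of always predicting 0 from the test counts in the split summary (217 ones out of 1409 rows), then shows how precision and recall could be computed from predicted probabilities. The helper `precision_recall`, the 0.5 threshold, and the small arrays are assumptions made for illustration and are not model outputs.

```python
import numpy as np

# Accuracy of always predicting "will not buy again" on the test subset.
baseline_accuracy = 1 - 217 / 1409
print(round(baseline_accuracy, 3))  # 0.846

def precision_recall(y_true, y_prob, threshold=0.5):
    """Precision and recall for the positive class at a given threshold."""
    y_pred = (y_prob >= threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up labels and probabilities, only to exercise the helper.
y_true = np.array([0, 0, 1, 1, 0, 1])
y_prob = np.array([0.2, 0.7, 0.9, 0.4, 0.1, 0.8])
precision, recall = precision_recall(y_true, y_prob)
print(precision, recall)
```

On the real data the probabilities would come from `model.predict(test_inputs)`. Since the baseline is already about 84.6%, recall on the positive class is likely a more informative measure of how well returning customers are identified than accuracy alone.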
