# Deep Learning

Profits can be heavily impacted by your campaign’s CTR. In this chapter, you’ll learn how deep learning can be used to reduce that risk. You’ll focus on multi-layer perceptron (MLP) and neural network models, and learn how these can be used to capture the complex relationship between variables to more accurately predict CTR. Lastly, you’ll explore how to apply the basics of hyperparameter tuning and regularization to classification models.

## MLPs for CTR
In this exercise, you will evaluate both the accuracy score and AUC of the ROC curve for a basic MLP model on the ad CTR dataset. Remember to standardize the features before splitting into training and testing!
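As a hedged aside on the scaling step: fitting `StandardScaler` on the full dataset before splitting lets test-set statistics influence the scaling. A minimal sketch of a common alternative is shown here, assuming synthetic data from `make_classification` in place of the course dataset (`X_demo`, `y_demo`, and `pipe` are illustrative names, not part of the original exercise). It wraps the scaler and the MLP in a pipeline so the scaler is fit on the training split only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the CTR features (assumption: not the course data)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=10, weights=[0.85, 0.15], random_state=0)

# Split first, so the test rows never reach the scaler during fitting
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0)

# The pipeline fits StandardScaler on X_tr only, then trains the MLP on the scaled data
pipe = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(8,), max_iter=100, random_state=0))
pipe.fit(X_tr, y_tr)

# At prediction time the stored training mean and variance are reused on X_te
proba = pipe.predict_proba(X_te)[:, 1]
print(proba[:5])
```

With a small number of iterations the MLP may emit a convergence warning; that does not affect the point of the sketch, which is where the scaler gets fit.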

In [1]:
from pandas import read_pickle

df = read_pickle('data/data_ch3.pkl')
# Get non-categorical columns, with a filter
num_df = df.select_dtypes(include=['int', 'float'])
filter_cols = ['banner_pos', 'hour_of_day']
new_df = num_df[num_df.columns[~num_df.columns.isin(filter_cols)]]

X = new_df.loc[:, ~new_df.columns.isin(['click'])]
y = new_df.click

In [2]:
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, auc, accuracy_score


# Scale features and split into training and testing
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
  X, y, test_size = .2, random_state = 0)

# Create classifier, fit it once, and produce predictions
clf = MLPClassifier(hidden_layer_sizes = (8, ), max_iter = 100)
clf.fit(X_train, y_train)
y_score = clf.predict_proba(X_test)
y_pred = clf.predict(X_test)

# Get accuracy and AUC of ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_score[:, 1])
roc_auc = auc(fpr, tpr)
print("Accuracy: %s" %(accuracy_score(y_test, y_pred)))
print("AUC of ROC curve: %s" %(roc_auc))

Accuracy: 0.89
AUC of ROC curve: 0.4177732379979571




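Notice that the accuracy of 0.89 is in the same ballpark as that of other classifiers, but the AUC of the ROC curve (about 0.42) is below 0.5, which means the predicted probabilities rank clicks worse than random guessing would. Because this classifier was created without a random_state, the exact values can change between runs.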
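To make the AUC value printed above easier to interpret, here is a minimal pure-Python sketch of AUC as the fraction of (click, non-click) pairs in which the click receives the higher score. The function name `pairwise_auc` and the toy labels and scores are hypothetical, chosen only for illustration. The baseline accuracy uses the 89 non-clicks and 11 clicks that the confusion matrices later in this chapter show for the test set.

```python
def pairwise_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical toy labels and scores
y_toy = [0, 0, 0, 1, 1]
good_scores = [0.1, 0.2, 0.3, 0.8, 0.9]      # clicks always ranked highest
constant_scores = [0.5] * 5                  # no ranking information at all
reversed_scores = [0.9, 0.8, 0.7, 0.2, 0.1]  # clicks always ranked lowest

print(pairwise_auc(y_toy, good_scores))      # 1.0
print(pairwise_auc(y_toy, constant_scores))  # 0.5
print(pairwise_auc(y_toy, reversed_scores))  # 0.0

# Accuracy of always predicting "no click" on a test set with 89 non-clicks and 11 clicks
majority_accuracy = 89 / (89 + 11)
print(majority_accuracy)                     # 0.89
```

Under these assumptions, an accuracy of 0.89 is what a model that never predicts a click would achieve, which is why AUC is the more informative number here.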

## Varying hyperparameters
The number of training iterations and the size of the hidden layers are two primary hyperparameters that can be varied when working with an MLP classifier. In this exercise, you will vary each separately and note how accuracy and the AUC of the ROC curve change.

In [3]:
from sklearn.metrics import roc_auc_score

# Loop over various max_iter configurations
max_iter_list = [10, 20, 30]
for max_iter in max_iter_list:
    clf = MLPClassifier(hidden_layer_sizes = (4, ), 
                        max_iter = max_iter, random_state = 0)
    # Fit once, then extract relevant predictions
    clf.fit(X_train, y_train)
    y_score = clf.predict_proba(X_test)
    y_pred = clf.predict(X_test)

    # Report accuracy and AUC of ROC curve
    print("Accuracy for max_iter = %s: %s" %(
        max_iter, accuracy_score(y_test, y_pred)))
    print("AUC for max_iter = %s: %s" %(
        max_iter, roc_auc_score(y_test, y_score[:, 1])))



Accuracy for max_iter = 10: 0.89
AUC for max_iter = 10: 0.4851889683350357
Accuracy for max_iter = 20: 0.89
AUC for max_iter = 20: 0.48416751787538304
Accuracy for max_iter = 30: 0.89
AUC for max_iter = 30: 0.49846782431052095




In [4]:
# Create and loop over various hidden_layer_sizes configurations
hidden_layer_sizes_list = [(4,), (8,), (16,)]
for hidden_layer_sizes in hidden_layer_sizes_list:
    clf = MLPClassifier(hidden_layer_sizes = hidden_layer_sizes, 
                        max_iter = 10, random_state = 0)
    # Fit once, then extract relevant predictions
    clf.fit(X_train, y_train)
    y_score = clf.predict_proba(X_test)
    y_pred = clf.predict(X_test)

    # Report accuracy and AUC of ROC curve
    print("Accuracy for hidden_layer_sizes = %s: %s" %(
        hidden_layer_sizes, accuracy_score(y_test, y_pred)))
    print("AUC for hidden_layer_sizes = %s: %s" %(
        hidden_layer_sizes, roc_auc_score(y_test, y_score[:, 1])))



Accuracy for hidden_layer_sizes = (4,): 0.89
AUC for hidden_layer_sizes = (4,): 0.4851889683350357
Accuracy for hidden_layer_sizes = (8,): 0.59
AUC for hidden_layer_sizes = (8,): 0.6414708886618998
Accuracy for hidden_layer_sizes = (16,): 0.77
AUC for hidden_layer_sizes = (16,): 0.5168539325842696




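Notice that each model here has a single hidden layer and only the number of units in it varies. Performance did not improve steadily with size: the AUC was highest with 8 units (about 0.64) and fell back to about 0.52 with 16 units, while accuracy was highest with 4 units (0.89).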
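Since `hidden_layer_sizes` is a tuple with one entry per hidden layer, `(8,)` means one layer of 8 units while `(8, 8)` would mean two layers. The sketch below counts the weights and biases implied by each setting. It assumes a hypothetical input of 10 features and a single output unit for binary classification; the helper name `count_mlp_parameters` is illustrative, not an sklearn function.

```python
def count_mlp_parameters(n_features, hidden_layer_sizes, n_outputs=1):
    """Count weights and biases of a fully connected network.

    Assumes every layer is dense and each non-input unit has one bias term.
    """
    layer_sizes = [n_features, *hidden_layer_sizes, n_outputs]
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total

# Hypothetical input width of 10 features
for sizes in [(4,), (8,), (16,), (8, 8)]:
    print("hidden_layer_sizes = %s: %s hidden layer(s), %s parameters" % (
        sizes, len(sizes), count_mlp_parameters(10, sizes)))
```

Under these assumptions the three settings used in the exercise give 49, 97, and 193 parameters, all with one hidden layer, and `(8, 8)` gives 169 parameters across two hidden layers.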

## MLP Grid Search
Hyperparameters can be tuned in sklearn by supplying candidate values for each parameter; the candidate lists can be written by hand or generated with functions from numpy. Grid search is one tuning method: it exhaustively evaluates every combination of the hyperparameter values specified via param_grid. In this exercise, you will use grid search over the hyperparameters of an MLP classifier.

In [5]:
from sklearn.model_selection import GridSearchCV

# Create list of hyperparameters 
max_iter = [10, 20]
hidden_layer_sizes = [(8, ), (16, )]
param_grid = {'max_iter': max_iter, 'hidden_layer_sizes': hidden_layer_sizes}

# Use Grid search CV to find best parameters using 4 jobs
mlp = MLPClassifier()
clf = GridSearchCV(estimator = mlp, param_grid = param_grid, 
           scoring = 'roc_auc', n_jobs = 4)
clf.fit(X_train, y_train)
print("Best Score: ")
print(clf.best_score_)
print("Best Estimator: ")
print(clf.best_estimator_)



Best Score: 
0.5386975883986754
Best Estimator: 
MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=False, epsilon=1e-08,
              hidden_layer_sizes=(8,), learning_rate='constant',
              learning_rate_init=0.001, max_iter=20, momentum=0.9,
              n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
              random_state=None, shuffle=True, solver='adam', tol=0.0001,
              validation_fraction=0.1, verbose=False, warm_start=False)




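As seen here, the best cross-validated AUC is around 0.54 and comes from the model with the maximum number of iterations (20) and 8 hidden layer units. Because the MLP was created without a random_state, the best score and best estimator can differ between runs.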
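To illustrate what "exhaustively looks at all combinations" means for the `param_grid` above, here is a small pure-Python sketch that enumerates the candidates with `itertools.product`. The helper `expand_grid` is a hypothetical name, and the number of cross-validation folds is an assumption, since the default depends on the sklearn version in use.

```python
from itertools import product

# Same grid as in the exercise above
param_grid = {'max_iter': [10, 20], 'hidden_layer_sizes': [(8,), (16,)]}

def expand_grid(grid):
    """Return one dict per combination of the listed hyperparameter values."""
    keys = sorted(grid)
    return [dict(zip(keys, values))
            for values in product(*(grid[k] for k in keys))]

candidates = expand_grid(param_grid)
for candidate in candidates:
    print(candidate)

n_folds = 5  # assumption: the cv default varies across sklearn versions
print("Candidates: %s, cross-validation fits: %s" % (
    len(candidates), len(candidates) * n_folds))
```

The cost grows multiplicatively: adding a third hyperparameter with three values would triple the number of fits.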

## F-beta score
The F-beta score is a weighted harmonic mean of precision and recall, used when the two should not count equally. It is likely that one would care more about precision than recall, which corresponds to a beta between 0 and 1 (a beta above 1 favors recall instead). In this exercise, you will calculate the precision and recall of an MLP classifier along with the F-beta score using beta = 0.5.

In [6]:
from sklearn.metrics import fbeta_score, precision_score, recall_score

# Set up MLP classifier, train and predict
X_train, X_test, y_train, y_test = train_test_split(
  X, y, test_size = .2, random_state = 0)
clf = MLPClassifier(hidden_layer_sizes = (16, ), 
                    max_iter = 10, random_state = 0)
y_pred = clf.fit(X_train, y_train).predict(X_test) 

# Evaluate precision and recall
prec = precision_score(y_test, y_pred, average = 'weighted')
recall = recall_score(y_test, y_pred, average = 'weighted')
fbeta = fbeta_score(y_test, y_pred, beta  = 0.5, average = 'weighted')
print("Precision: %s, Recall: %s, F-beta score: %s" %(prec, recall, fbeta))

Precision: 0.8215040650406503, Recall: 0.77, F-beta score: 0.8095677674727688




Notice how the F-beta score is in between the precision and recall, and closer to the precision.
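The pull toward precision can be seen directly in the standard single-class F-beta formula, sketched below with hypothetical precision and recall values. The function name `fbeta_from_pr` is illustrative. With `average = 'weighted'`, sklearn computes the score per class and then averages, so plugging the weighted precision and recall printed above into this formula will not necessarily reproduce the printed F-beta exactly.

```python
def fbeta_from_pr(precision, recall, beta):
    """F-beta = (1 + beta^2) * P * R / (beta^2 * P + R) for a single class."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical values: high precision, low recall
p, r = 0.8, 0.4
for beta in [0.5, 1.0, 2.0]:
    print("beta = %s: F-beta = %.4f" % (beta, fbeta_from_pr(p, r, beta)))
```

For these values the score moves from about 0.67 at beta = 0.5 (closer to precision) to about 0.44 at beta = 2 (closer to recall), with the usual F1 of about 0.53 in between.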

## Precision, ROI, and AUC
The return on investment (ROI) can be decomposed into the precision multiplied by the ratio of return to cost. As discussed, it is possible for the precision of a model to be low even while the AUC of the ROC curve is high. If the precision is low, then the ROI will also be low. In this exercise, you will use an MLP to compute a sample ROI, assuming a fixed r (the return on a click per number of impressions) and a fixed cost (the cost per number of impressions), and compare it with the precision and the AUC of the ROC curve to check how the three values vary.

In [7]:
# Get precision and total ROI
prec = precision_score(y_test, y_pred, average = 'weighted')
r = 0.2
cost = 0.05 
roi = prec * r / cost

# Get AUC from the predicted probabilities of the classifier fit above
y_score = clf.predict_proba(X_test)
roc_auc = roc_auc_score(y_test, y_score[:, 1])

print("Total ROI: %s, Precision: %s, AUC of ROC curve: %s" %(
  roi, prec, roc_auc))

Total ROI: 3.2860162601626013, Precision: 0.8215040650406503, AUC of ROC curve: 0.5168539325842696


Note that the ROI was > 1 and the weighted precision was about 0.82, while the AUC of the ROC curve was only about 0.52, so this is not a case of low precision and high AUC.
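The ROI arithmetic used above can be written as a small helper. The sketch below reuses the r, cost, and precision values from the exercise; the function names `roi_from_precision` and `break_even_precision` are hypothetical, and reading ROI = 1 as the break-even point is an assumption about how the ratio is interpreted.

```python
def roi_from_precision(precision, r, cost):
    """ROI as precision times the ratio of return to cost."""
    return precision * r / cost

def break_even_precision(r, cost):
    """Precision at which ROI equals 1 (assumed break-even point)."""
    return cost / r

r, cost = 0.2, 0.05
precision = 0.8215040650406503  # weighted precision printed above

print("ROI: %s" % roi_from_precision(precision, r, cost))
print("Break-even precision: %s" % break_even_precision(r, cost))
```

With these values any precision above 0.25 gives an ROI above 1, which is why the ROI is driven entirely by precision once r and cost are fixed.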

## Model comparison warmup
In this exercise, you will run a basic comparison of the four categories of outcomes between MLPs and Random Forests using a confusion matrix. This is in preparation for an analysis of all the models we have covered. Doing this warm-up exercise will allow you to compare and contrast the implementation of these models and their evaluation for CTR prediction.

In [8]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

# Create the list of models in the order below
names = ['Random Forest', 'Multi-Layer Perceptron']
classifiers = [RandomForestClassifier(), 
               MLPClassifier(hidden_layer_sizes = (10, ),
                             max_iter = 40)]

# Produce a confusion matrix for all classifiers
for name, classifier in zip(names, classifiers):
  print("Evaluating classifier: %s" %(name))
  classifier.fit(X_train, y_train)
  y_pred = classifier.predict(X_test)
  conf_matrix = confusion_matrix(y_test, y_pred)
  print(conf_matrix)

Evaluating classifier: Random Forest
[[82  7]
 [ 9  2]]
Evaluating classifier: Multi-Layer Perceptron
[[43 46]
 [ 5  6]]
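To read the four categories of outcomes off these matrices, the sketch below unpacks them in sklearn's layout for binary labels 0 and 1 (rows are true classes, columns are predicted classes) and derives a few metrics by hand. The helper name `binary_metrics` is hypothetical; the matrices are the two printed above.

```python
def binary_metrics(conf_matrix):
    """Derive metrics from a 2x2 matrix laid out as [[tn, fp], [fn, tp]]."""
    (tn, fp), (fn, tp) = conf_matrix
    n_neg, n_pos = tn + fp, fn + tp
    total = n_neg + n_pos
    prec_pos = tp / (tp + fp) if (tp + fp) else 0.0
    prec_neg = tn / (tn + fn) if (tn + fn) else 0.0
    return {
        'accuracy': (tn + tp) / total,
        'precision_click': prec_pos,
        'recall_click': tp / n_pos if n_pos else 0.0,
        # Weighted precision averages the per-class precisions by class frequency
        'precision_weighted': (n_neg * prec_neg + n_pos * prec_pos) / total,
    }

matrices = {
    'Random Forest': [[82, 7], [9, 2]],
    'Multi-Layer Perceptron': [[43, 46], [5, 6]],
}
for name, matrix in matrices.items():
    print(name)
    for metric, value in binary_metrics(matrix).items():
        print("  %s: %.4f" % (metric, value))
```

For these two matrices the weighted precision is above 0.8 in both cases, while the precision on the click class alone is about 0.22 and 0.12, because the weighted average is dominated by the 89 non-click rows.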




In [9]:
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Create list of classifiers
names = ['Logistic Regression',  'Decision Tree',
         'Random Forest', 'Multi-Layer Perceptron']
clfs = [LogisticRegression(), 
        DecisionTreeClassifier(), RandomForestClassifier(), 
        MLPClassifier(hidden_layer_sizes = (5, ), max_iter = 40)]

# Fit each classifier and evaluate weighted precision
for name, classifier in zip(names, clfs):
  classifier.fit(X_train, y_train)
  y_pred = classifier.predict(X_test) 
  prec = precision_score(y_test, y_pred, average = 'weighted')
  print("Precision for %s: %s " %(name, prec))

Precision for Logistic Regression: 0.7921 
Precision for Decision Tree: 0.8414539007092199 
Precision for Random Forest: 0.8414539007092199 
Precision for Multi-Layer Perceptron: 0.8282352941176471 




In [10]:
# Create list of classifiers
names = ['Logistic Regression',  'Decision Tree',
         'Random Forest', 'Multi-Layer Perceptron']
clfs = [LogisticRegression(), 
        DecisionTreeClassifier(), RandomForestClassifier(), 
        MLPClassifier(hidden_layer_sizes = (5, ), max_iter = 50)]
  
# Compute the ROI for all classifiers
for name, classifier in zip(names, clfs):
  classifier.fit(X_train, y_train)
  y_pred = classifier.predict(X_test) 
  prec = precision_score(y_test, y_pred, average = 'weighted')
  r, cost = 0.2, 0.05 
  roi = prec * r / cost
  print("ROI for %s: %s " %(name, roi))

ROI for Logistic Regression: 3.1684 
ROI for Decision Tree: 3.3987368421052633 
ROI for Random Forest: 3.3217391304347825 
ROI for Multi-Layer Perceptron: 3.608888888888889 




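Notice that, because the return and cost were held constant, the ROI is simply the precision scaled by r / cost = 4, so within a single run the ordering by ROI is the same as the ordering by precision. The ROI values here do not line up exactly with the precision values printed earlier because the classifiers were refit without a fixed random_state.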

## Total scoring
Remember that precision and recall might be weighted differently, and therefore the F-beta score is an important evaluation metric. Additionally, the AUC of the ROC curve is an important complementary metric to precision and recall, since you saw earlier that a model might have a high AUC but low precision. In this exercise, you will calculate the full set of evaluation metrics for each classifier.

In [11]:
# Create classifiers
clfs = [LogisticRegression(), DecisionTreeClassifier(), RandomForestClassifier(), 
        MLPClassifier(hidden_layer_sizes = (10, ), max_iter = 50)]

# Produce all evaluation metrics for each classifier
for name, clf in zip(names, clfs):
  print("Evaluating classifier: %s" %(name))
  # Fit once so that y_score and y_pred come from the same model
  clf.fit(X_train, y_train)
  y_score = clf.predict_proba(X_test)
  y_pred = clf.predict(X_test)
  prec = precision_score(y_test, y_pred, average = 'weighted')
  recall = recall_score(y_test, y_pred, average = 'weighted')
  fbeta = fbeta_score(y_test, y_pred, beta = 0.5, average = 'weighted')
  roc_auc = roc_auc_score(y_test, y_score[:, 1])
  print("Precision: %s: Recall: %s, F-beta score: %s, AUC of ROC curve: %s" 
        %(prec, recall, fbeta, roc_auc))

Evaluating classifier: Logistic Regression
Precision: 0.7921: Recall: 0.89, F-beta score: 0.8099182004089981, AUC of ROC curve: 0.5280898876404495
Evaluating classifier: Decision Tree
Precision: 0.8414539007092199: Recall: 0.87, F-beta score: 0.844869431643625, AUC of ROC curve: 0.5495403472931564
Evaluating classifier: Random Forest
Precision: 0.8414539007092199: Recall: 0.87, F-beta score: 0.844869431643625, AUC of ROC curve: 0.5367722165474974
Evaluating classifier: Multi-Layer Perceptron
Precision: 0.7880208333333333: Recall: 0.85, F-beta score: 0.7996828752642706, AUC of ROC curve: 0.5566905005107252


