# ML Modelling & Evaluation of Models

## Introduction

In this notebook, I train and evaluate a separate machine learning model for each of the three offer types:  
   1) model_bogo  
   2) model_discount  
   3) model_info

As the machine learning model, I use XGBoost because it performs well on tabular data and can handle imbalanced datasets, for example through its `scale_pos_weight` parameter.

### Load libraries, XGBoost model and datasets

In [1]:
import sys
sys.path.append("..")
from source.utils import load_data
from source.model import XGBoostModel


In [2]:
# Instantiate the XGBoostModel wrapper class
model = XGBoostModel()

In [3]:
bogo_data = load_data('../data/bogo_data.pkl')
discount_data = load_data('../data/discount_data.pkl')
info_data = load_data('../data/info_data.pkl')

### Machine Learning Modelling

### Model Evaluation

#### bogo data

We start with the bogo data. Let's see how well our XGBoost model performs.

In [4]:
# Train xgboost model
X_train, X_test, y_train, y_test = model.split_train_test(bogo_data, 'send_offer')
model.pipeline = model.build_pipeline(X_train)
model.fit_model(X_train, y_train)

Fitting 5 folds for each of 27 candidates, totalling 135 fits
[CV 1/5] END classifier__learning_rate=0.1, classifier__max_depth=3, classifier__n_estimators=100, classifier__objective=binary:logistic, classifier__scale_pos_weight=0.21;, score=0.699 total time=   0.1s
[CV 2/5] END classifier__learning_rate=0.1, classifier__max_depth=3, classifier__n_estimators=100, classifier__objective=binary:logistic, classifier__scale_pos_weight=0.21;, score=0.692 total time=   0.2s
[CV 3/5] END classifier__learning_rate=0.1, classifier__max_depth=3, classifier__n_estimators=100, classifier__objective=binary:logistic, classifier__scale_pos_weight=0.21;, score=0.708 total time=   0.0s
[CV 4/5] END classifier__learning_rate=0.1, classifier__max_depth=3, classifier__n_estimators=100, classifier__objective=binary:logistic, classifier__scale_pos_weight=0.21;, score=0.697 total time=   0.1s
[CV 5/5] END classifier__learning_rate=0.1, classifier__max_depth=3, classifier__n_estimators=100, classifier__objecti
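
The `scale_pos_weight` value in the log above is the XGBoost parameter that reweights the positive class. A common heuristic sets it to the ratio of negative to positive training examples. The sketch below computes that ratio in plain Python. The function name `suggested_scale_pos_weight` and the toy labels are illustrative assumptions and are not part of `source.model`.

```python
from collections import Counter


def suggested_scale_pos_weight(labels):
    """Return n_negative / n_positive for binary labels (0 = negative, 1 = positive)."""
    counts = Counter(labels)
    if counts[1] == 0:
        raise ValueError("no positive examples in labels")
    return counts[0] / counts[1]


# Toy labels (hypothetical, not the offer data): 2 negatives, 8 positives
toy_labels = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
print(suggested_scale_pos_weight(toy_labels))  # 0.25
```

When the positive class is the majority, the ratio falls below 1 and positives are down-weighted. The value 0.21 in the log is of that kind.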

In [5]:
# Evaluation
y_pred = model.predict(X_test)
model.evaluate(y_test, y_pred)

Baseline Accuracy (always predict majority class): 0.8408385093167702

Evaluation Metrics:
Accuracy: 0.7458592132505176
F1 Score: 0.8355108877721943
ROC AUC Score: 0.6992575576725296

Confusion Matrix:
[[ 388  227]
 [ 755 2494]]

Classification Report:
              precision    recall  f1-score   support

           0       0.34      0.63      0.44       615
           1       0.92      0.77      0.84      3249

    accuracy                           0.75      3864
   macro avg       0.63      0.70      0.64      3864
weighted avg       0.82      0.75      0.77      3864
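
The headline numbers can be re-derived from the confusion matrix alone, which makes them easier to interpret. The sketch below does this in plain Python (the variable names are my own). Two things stand out: accuracy is below the majority-class baseline, and the reported ROC AUC coincides with balanced accuracy, which is what ROC AUC reduces to when it is computed from hard class labels rather than predicted probabilities. That second point is an inference from the numbers, since the body of `model.evaluate` is not shown here.

```python
# Confusion matrix from the bogo evaluation: rows = true class, columns = predicted class
tn, fp = 388, 227
fn, tp = 755, 2494

total = tn + fp + fn + tp

accuracy = (tp + tn) / total
baseline = max(tn + fp, fn + tp) / total      # always predict the majority class
f1_positive = 2 * tp / (2 * tp + fp + fn)     # F1 for class 1

recall_negative = tn / (tn + fp)
recall_positive = tp / (tp + fn)
balanced_accuracy = (recall_negative + recall_positive) / 2

print(f"accuracy:          {accuracy:.4f}")           # 0.7459
print(f"baseline:          {baseline:.4f}")           # 0.8408
print(f"F1 (class 1):      {f1_positive:.4f}")        # 0.8355
print(f"balanced accuracy: {balanced_accuracy:.4f}")  # 0.6993
```

If `model.evaluate` does compute ROC AUC from `y_pred`, passing predicted probabilities for the positive class instead would give a threshold-independent score.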



In [6]:
model.plot_feature_importance()

AttributeError: 'GridSearchCV' object has no attribute 'get_booster'
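
The error indicates that `get_booster` is being called on the `GridSearchCV` wrapper rather than on the fitted classifier inside it. The internals of `plot_feature_importance` are not shown, so the following is only a sketch of the usual way to reach the tuned estimator: `best_estimator_` returns the refitted pipeline, and `named_steps["classifier"]` returns the classifier step (the step name `classifier` is taken from the `classifier__` prefixes in the log). To keep the example self-contained, I use scikit-learn's `GradientBoostingClassifier` on synthetic data as a stand-in. With an `XGBClassifier` step, `get_booster()` would be called on the extracted object in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (not the offer datasets)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", GradientBoostingClassifier(random_state=0)),
])
search = GridSearchCV(
    pipeline,
    param_grid={"classifier__max_depth": [2, 3]},
    cv=3,
)
search.fit(X, y)

# The search object wraps the pipeline; the fitted model lives one level down.
best_pipeline = search.best_estimator_
classifier = best_pipeline.named_steps["classifier"]

# Feature importances come from the classifier step, not from GridSearchCV.
importances = classifier.feature_importances_
print(importances)
```

Applying the same unwrapping inside `plot_feature_importance` before calling `get_booster()` should avoid the `AttributeError`, assuming the method currently receives the search object directly.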

#### discount data

Next, let's run the model for discount offers.

In [None]:
# Train xgboost model
X_train, X_test, y_train, y_test = model.split_train_test(discount_data, 'send_offer')
model.pipeline = model.build_pipeline(X_train)
model.fit_model(X_train, y_train)

In [None]:
# Evaluation
y_pred = model.predict(X_test)
model.evaluate(y_test, y_pred)

#### info data

Lastly, let's run the model for informational offers.

In [None]:
# Train xgboost model
X_train, X_test, y_train, y_test = model.split_train_test(info_data, 'send_offer')
model.pipeline = model.build_pipeline(X_train)
model.fit_model(X_train, y_train)

In [None]:
# Evaluation
y_pred = model.predict(X_test)
model.evaluate(y_test, y_pred)