## Tuning

In [11]:
library("ISLR")    # Caravan data set
library("rsample") # data splitting
library("dplyr")   # data manipulation
library("class")   # knn
library("caret")   # cross-validation and tuning
library("klaR")    # naive bayes
library("ggplot2") # plotting
options(repr.plot.width = 16, repr.plot.height = 6)

In [3]:
str(Caravan)

'data.frame':	5822 obs. of  86 variables:
 $ MOSTYPE : num  33 37 37 9 40 23 39 33 33 11 ...
 $ MAANTHUI: num  1 1 1 1 1 1 2 1 1 2 ...
 $ MGEMOMV : num  3 2 2 3 4 2 3 2 2 3 ...
 $ MGEMLEEF: num  2 2 2 3 2 1 2 3 4 3 ...
 $ MOSHOOFD: num  8 8 8 3 10 5 9 8 8 3 ...
 $ MGODRK  : num  0 1 0 2 1 0 2 0 0 3 ...
 $ MGODPR  : num  5 4 4 3 4 5 2 7 1 5 ...
 $ MGODOV  : num  1 1 2 2 1 0 0 0 3 0 ...
 $ MGODGE  : num  3 4 4 4 4 5 5 2 6 2 ...
 $ MRELGE  : num  7 6 3 5 7 0 7 7 6 7 ...
 $ MRELSA  : num  0 2 2 2 1 6 2 2 0 0 ...
 $ MRELOV  : num  2 2 4 2 2 3 0 0 3 2 ...
 $ MFALLEEN: num  1 0 4 2 2 3 0 0 3 2 ...
 $ MFGEKIND: num  2 4 4 3 4 5 3 5 3 2 ...
 $ MFWEKIND: num  6 5 2 4 4 2 6 4 3 6 ...
 $ MOPLHOOG: num  1 0 0 3 5 0 0 0 0 0 ...
 $ MOPLMIDD: num  2 5 5 4 4 5 4 3 1 4 ...
 $ MOPLLAAG: num  7 4 4 2 0 4 5 6 8 5 ...
 $ MBERHOOG: num  1 0 0 4 0 2 0 2 1 2 ...
 $ MBERZELF: num  0 0 0 0 5 0 0 0 1 0 ...
 $ MBERBOER: num  1 0 0 0 4 0 0 0 0 0 ...
 $ MBERMIDD: num  2 5 7 3 0 4 4 2 1 3 ...
 $ MBERARBG: num  5 0 0 

In [4]:
# check classes
summary(Caravan$Purchase)

In [5]:
# check NA values
any(is.na(Caravan))

In [6]:
set.seed(123)
# stratified split 70% for training, and the rest for testing
split <- initial_split(Caravan, prop = 0.7, strata = "Purchase")
train <- training(split)
test  <- testing(split)
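
As a rough illustration of what a stratified split aims for, the base-R sketch below samples 70% of the indices within each class separately, so that the rare class keeps roughly the same share in both parts. The toy factor `y` and the name `train_idx` are made up for this example; this is not Caravan and not necessarily how `initial_split()` is implemented internally.

```r
set.seed(123)

# Hypothetical imbalanced response: 940 "No" and 60 "Yes"
y <- factor(c(rep("No", 940), rep("Yes", 60)))

# Group row indices by class, then draw 70% of the indices inside each group
idx_by_class <- split(seq_along(y), y)
train_idx <- unlist(lapply(idx_by_class, function(idx) {
  n_take <- round(0.7 * length(idx))
  idx[sample.int(length(idx), size = n_take)]
}), use.names = FALSE)

# Class shares in the full data, the training part, and the held-out part
prop.table(table(y))
prop.table(table(y[train_idx]))
prop.table(table(y[-train_idx]))
```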

In [7]:
# class distribution of the training set
table(train$Purchase)


  No  Yes 
3825  250 

In [8]:
# class distribution of the test set
table(test$Purchase)


  No  Yes 
1649   98 
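
The raw counts are easier to compare as proportions. The sketch below re-enters the counts printed above by hand (the names `train_counts` and `test_counts` are introduced only for this example) and shows that "Yes" makes up roughly 6% of each set, so the split kept the imbalance similar on both sides.

```r
# Counts copied from the two table() outputs above
train_counts <- c(No = 3825, Yes = 250)
test_counts  <- c(No = 1649, Yes = 98)

# Share of each class within each set
train_share <- prop.table(train_counts)
test_share  <- prop.table(test_counts)

round(rbind(train = train_share, test = test_share), 4)
```

On the actual objects the same idea would be `prop.table(table(train$Purchase))`.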

In [9]:
# separate the predictors from the response in each set
features <- setdiff(names(train), "Purchase")
# training
x_train <- train[, features]
y_train <- train$Purchase
# testing
x_test <- test[,features]
y_test <- test$Purchase

In [12]:
# set up 10-fold cross validation procedure
train_control <- trainControl(method = "cv", number = 10)

In [13]:
# set up grid search: kernel density estimates, no Laplace correction,
# and three bandwidth adjustments
search_grid <- expand.grid(usekernel = TRUE,
                           fL = 0,
                           adjust = seq(1, 3, by = 1))
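
`expand.grid()` builds one row per combination of the supplied values, so the grid above has only three rows because `usekernel` and `fL` are fixed. A wider grid, sketched below under the assumption that one also wants to compare the Gaussian estimate (`usekernel = FALSE`) and a Laplace correction of 1, could look like this. The name `wider_grid` is hypothetical and this grid is not used in the cells that follow.

```r
# Hypothetical wider grid: 2 x 2 x 3 = 12 candidate settings
wider_grid <- expand.grid(usekernel = c(TRUE, FALSE),
                          fL = c(0, 1),
                          adjust = seq(1, 3, by = 1))

nrow(wider_grid)
head(wider_grid)
```

Each extra row costs one more round of 10-fold cross-validation, so the grid size is a trade-off between coverage and run time.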

In [15]:
# train model
naive.bayes2 <- train(x = x_train, 
                      y = y_train, 
                      method = "nb", 
                      trControl = train_control,
                      tuneGrid = search_grid,
                      preProc = c("scale"))

"Numerical 0 probability for all classes with observation 2"
"Numerical 0 probability for all classes with observation 35"
"Numerical 0 probability for all classes with observation 46"
"Numerical 0 probability for all classes with observation 77"
"Numerical 0 probability for all classes with observation 80"
"Numerical 0 probability for all classes with observation 83"
"Numerical 0 probability for all classes with observation 103"
"Numerical 0 probability for all classes with observation 110"
"Numerical 0 probability for all classes with observation 116"
"Numerical 0 probability for all classes with observation 119"
"Numerical 0 probability for all classes with observation 120"
"Numerical 0 probability for all classes with observation 128"
"Numerical 0 probability for all classes with observation 129"
"Numerical 0 probability for all classes with observation 136"
"Numerical 0 probability for all classes with observation 161"
"Numerical 0 probability for all classes with observation 162"
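
These warnings are consistent with numerical underflow: with 85 predictors, the product of per-feature densities can become too small to represent as a double for every class, so the posterior cannot be normalised. That reading is an assumption about the cause, not something the output confirms. The base-R sketch below uses made-up log-likelihood values to show the effect, together with the usual shift-by-the-maximum workaround.

```r
# Hypothetical summed log-likelihoods for one observation (not from the model)
log_lik <- c(No = -800, Yes = -805)

# Naive normalisation: exp(-800) underflows to 0, giving 0 / 0
naive_post <- exp(log_lik) / sum(exp(log_lik))

# Shifting by the maximum before exponentiating keeps the ratio intact
shifted <- log_lik - max(log_lik)
stable_post <- exp(shifted) / sum(exp(shifted))

naive_post
stable_post
```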

In [16]:
naive.bayes2

Naive Bayes 

4075 samples
  85 predictor
   2 classes: 'No', 'Yes' 

Pre-processing: scaled (85) 
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 3668, 3668, 3667, 3667, 3668, 3668, ... 
Resampling results across tuning parameters:

  adjust  Accuracy   Kappa        
  1       0.9384051  -0.0004735745
  2       0.9384051   0.0063080129
  3       0.9381600  -0.0009471491

Tuning parameter 'fL' was held constant at a value of 0
Tuning
 parameter 'usekernel' was held constant at a value of TRUE
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were fL = 0, usekernel = TRUE and adjust
 = 1.
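
Accuracy barely moves across the grid while Kappa stays near zero, which suggests that accuracy is not separating the candidates on this imbalanced response. One possible alternative, sketched below, is to select on the area under the ROC curve using caret's `twoClassSummary`. The names `train_control_roc` and `fit_nb_roc` are hypothetical, and the sketch was not run as part of this notebook.

```r
library(caret)

# Assumption: select the model by ROC AUC instead of accuracy.
# twoClassSummary() needs class probabilities, hence classProbs = TRUE.
train_control_roc <- trainControl(method = "cv",
                                  number = 10,
                                  classProbs = TRUE,
                                  summaryFunction = twoClassSummary)

# Hypothetical helper that mirrors the train() call above with metric = "ROC"
fit_nb_roc <- function(x, y, grid) {
  train(x = x,
        y = y,
        method = "nb",
        trControl = train_control_roc,
        tuneGrid = grid,
        metric = "ROC",
        preProc = c("scale"))
}
```

It would be called as `fit_nb_roc(x_train, y_train, search_grid)`. Note that `twoClassSummary` treats the first factor level (`No` here) as the event, so its `Sens` column refers to non-purchasers.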

In [17]:
# predict on the test set and compare with the true labels
y_pred <- predict(naive.bayes2, x_test, type="raw")
confusionMatrix(y_pred, y_test)

"Numerical 0 probability for all classes with observation 57"
"Numerical 0 probability for all classes with observation 64"
"Numerical 0 probability for all classes with observation 86"
"Numerical 0 probability for all classes with observation 101"
"Numerical 0 probability for all classes with observation 112"
"Numerical 0 probability for all classes with observation 123"
"Numerical 0 probability for all classes with observation 134"
"Numerical 0 probability for all classes with observation 165"
"Numerical 0 probability for all classes with observation 172"
"Numerical 0 probability for all classes with observation 183"
"Numerical 0 probability for all classes with observation 186"
"Numerical 0 probability for all classes with observation 187"
"Numerical 0 probability for all classes with observation 201"
"Numerical 0 probability for all classes with observation 202"
"Numerical 0 probability for all classes with observation 233"
"Numerical 0 probability for all classes with observation 

Confusion Matrix and Statistics

          Reference
Prediction   No  Yes
       No  1649   98
       Yes    0    0
                                          
               Accuracy : 0.9439          
                 95% CI : (0.9321, 0.9542)
    No Information Rate : 0.9439          
    P-Value [Acc > NIR] : 0.5268          
                                          
                  Kappa : 0               
                                          
 Mcnemar's Test P-Value : <2e-16          
                                          
            Sensitivity : 1.0000          
            Specificity : 0.0000          
         Pos Pred Value : 0.9439          
         Neg Pred Value :    NaN          
             Prevalence : 0.9439          
         Detection Rate : 0.9439          
   Detection Prevalence : 1.0000          
      Balanced Accuracy : 0.5000          
                                          
       'Positive' Class : No              
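
The matrix shows that the tuned model predicts `No` for every test observation, so its accuracy equals the no-information rate and it finds none of the 98 purchasers. The sketch below re-enters the printed counts by hand (all variable names are introduced only for this example) and recomputes the headline figures to make that explicit.

```r
# Counts copied from the confusion matrix above (rows = prediction, cols = reference)
cm <- matrix(c(1649, 0, 98, 0), nrow = 2,
             dimnames = list(Prediction = c("No", "Yes"),
                             Reference  = c("No", "Yes")))

accuracy   <- sum(diag(cm)) / sum(cm)             # share of correct predictions
nir        <- max(colSums(cm)) / sum(cm)          # always predicting the majority class
recall_no  <- cm["No", "No"] / sum(cm[, "No"])    # reported as Sensitivity above
recall_yes <- cm["Yes", "Yes"] / sum(cm[, "Yes"]) # reported as Specificity above
balanced   <- (recall_no + recall_yes) / 2

round(c(accuracy = accuracy, nir = nir, recall_yes = recall_yes,
        balanced = balanced), 4)
```

Because the class of interest is `Yes`, calling `confusionMatrix(y_pred, y_test, positive = "Yes")` would make Sensitivity refer to purchasers, which reads more naturally for this problem.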
                        