In [2]:
library(caret)

Loading required package: lattice
Loading required package: ggplot2


In [3]:
set.seed(1234)


#Create data set
data_set <- twoClassSim(2000,
                        intercept = -6,
                        linearVars = 8,
                        noiseVars = 4)


#Create train/test sets (90% / 10%, stratified by Class)
index <- createDataPartition(data_set$Class, p = .9, list=FALSE)

train_set <- data_set[index,]
test_set <- data_set[-index,]
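
A quick sanity check on the split can be useful at this point. The sketch below rebuilds the same partition and compares partition sizes and class proportions; createDataPartition samples within each class, so the two sets of proportions should be nearly identical. The names train_prop and test_prop are illustrative and not part of the original notebook.

```r
library(caret)

set.seed(1234)
# Same simulation settings as the cell above
data_set <- twoClassSim(2000, intercept = -6, linearVars = 8, noiseVars = 4)
index <- createDataPartition(data_set$Class, p = .9, list = FALSE)
train_set <- data_set[index, ]
test_set  <- data_set[-index, ]

# Rows in each partition
c(train = nrow(train_set), test = nrow(test_set))

# Class proportions in each partition (illustrative helper objects)
train_prop <- prop.table(table(train_set$Class))
test_prop  <- prop.table(table(test_set$Class))
round(rbind(train = train_prop, test = test_prop), 3)
```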

In [11]:
#Use rfe (recursive feature elimination) to select important variables

#Reference for the available helper function sets (rfFuncs, lmFuncs, ...)
#https://rdrr.io/cran/caret/man/caretFuncs.html
control <- rfeControl(functions = rfFuncs,
                      method = "repeatedcv",
                      repeats = 3,
                      verbose = FALSE)

#Dependent / independent variables
Y <- 'Class'
X_all <- names(train_set)[!names(train_set) %in% Y]

#Run rfe (default subset sizes; 10-fold CV repeated 3 times)
Pred_Profile <- rfe(train_set[, X_all], train_set[, Y],
                    rfeControl = control)

#Important variables: the full optimal subset chosen by rfe
X_imp <- Pred_Profile$optVariables

#Resampling profile and selected variables
Pred_Profile

X_imp


Recursive feature selection

Outer resampling method: Cross-Validated (10 fold, repeated 3 times) 

Resampling performance over subset size:

 Variables Accuracy  Kappa AccuracySD KappaSD Selected
         4   0.8203 0.6246    0.02289 0.04743         
         8   0.8584 0.7022    0.01694 0.03493        *
        16   0.8558 0.6953    0.01634 0.03497         
        17   0.8517 0.6865    0.01563 0.03313         

The top 5 variables (out of 8):
   TwoFactor1, TwoFactor2, Linear2, Linear3, Linear4
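
The printed profile lists only the top five of the eight selected variables, and the subset sizes tried (4, 8, 16, 17) are rfe defaults. As a rough sketch, the candidate sizes can be set explicitly with the sizes argument and the full results inspected afterwards. The smaller data set, the 3-fold resampling, and the names small_set, small_control, and small_profile are assumptions made here so the example runs quickly; they are not the notebook's settings.

```r
library(caret)

set.seed(1234)
# Small simulated data set so the sketch runs quickly (assumption: 200 rows)
small_set <- twoClassSim(200, intercept = -6, linearVars = 8, noiseVars = 4)
x_small <- small_set[, names(small_set) != "Class"]
y_small <- small_set$Class

# Lighter resampling than the notebook (3-fold CV) to keep the run short
small_control <- rfeControl(functions = rfFuncs, method = "cv", number = 3,
                            verbose = FALSE)

# Explicit candidate subset sizes; rfe also evaluates the full predictor set
small_profile <- rfe(x_small, y_small, sizes = c(4, 8, 12),
                     rfeControl = small_control)

# Resampled performance for every subset size that was tried
small_profile$results[, c("Variables", "Accuracy", "Kappa")]

# All selected variables, not only the top five shown by print()
predictors(small_profile)
```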


In [16]:
#Fit a single-hidden-layer neural network (nnet), possibly with skip-layer connections

#Tuning parameters available for nnet
modelLookup(model = 'nnet')

ctrl <- trainControl(
  method = "repeatedcv",
  number = 5,
  repeats = 5)

model_nnet <- train(train_set[, X_imp],
                    train_set[, Y],
                    method = 'nnet',
                    trControl = ctrl)

#Same fit, selecting columns by position rather than by name
#(a workaround in case name-based column selection causes problems);
#this refits the model and overwrites model_nnet
model_nnet <- train(train_set[, which(names(train_set) %in% X_imp)],
                    train_set[, which(names(train_set) %in% Y)],
                    method = 'nnet',
                    trControl = ctrl)

model_nnet

model  parameter  label          forReg  forClass  probModel
nnet   size       #Hidden Units  True    True      True
nnet   decay      Weight Decay   True    True      True


# weights:  11
initial  value 978.728650 
iter  10 value 628.989009
iter  20 value 610.865003
iter  30 value 604.179002
iter  40 value 602.689179
iter  50 value 602.198901
iter  60 value 602.189830
iter  70 value 602.119002
final  value 602.113347 
converged
# weights:  31
initial  value 969.189016 
iter  10 value 613.034716
iter  20 value 440.743434
iter  30 value 382.751473
iter  40 value 364.908615
iter  50 value 358.635515
iter  60 value 356.337604
iter  70 value 353.476877
iter  80 value 349.692731
iter  90 value 346.634273
iter 100 value 343.404006
final  value 343.404006 
stopped after 100 iterations
# weights:  51
initial  value 974.926813 
iter  10 value 535.370318
iter  20 value 421.744570
iter  30 value 370.726537
iter  40 value 354.397903
iter  50 value 347.055817
iter  60 value 342.377887
iter  70 value 336.676206
iter  80 value 333.249822
iter  90 value 329.996844
iter 100 value 326.594972
final  value 326.594972 
stopped after 100 iterations
# weights:  11
initial  value

# weights:  11
initial  value 1050.053557 
iter  10 value 635.875901
iter  20 value 587.274022
iter  30 value 581.008164
iter  40 value 578.661137
iter  50 value 577.633363
iter  60 value 577.591923
iter  70 value 577.395840
iter  80 value 577.355466
iter  80 value 577.355465
final  value 577.355230 
converged
# weights:  31
initial  value 1157.713224 
iter  10 value 699.932513
iter  20 value 449.057870
iter  30 value 385.422440
iter  40 value 378.981950
iter  50 value 377.336531
iter  60 value 375.811917
iter  70 value 373.286325
iter  80 value 348.538289
iter  90 value 337.362298
iter 100 value 333.787770
final  value 333.787770 
stopped after 100 iterations
# weights:  51
initial  value 979.732084 
iter  10 value 490.377685
iter  20 value 410.604694
iter  30 value 392.367943
iter  40 value 368.236801
iter  50 value 353.447221
iter  60 value 347.970962
iter  70 value 342.952752
iter  80 value 322.640438
iter  90 value 306.410181
iter 100 value 300.127923
final  value 300.127923 
stop

Neural Network 

1801 samples
   8 predictor
   2 classes: 'Class1', 'Class2' 

No pre-processing
Resampling: Cross-Validated (5 fold, repeated 5 times) 
Summary of sample sizes: 1441, 1441, 1441, 1440, 1441, 1441, ... 
Resampling results across tuning parameters:

  size  decay  Accuracy   Kappa    
  1     0e+00  0.8033299  0.5835924
  1     1e-04  0.8035522  0.5840619
  1     1e-01  0.8024429  0.5824595
  3     0e+00  0.8858443  0.7614129
  3     1e-04  0.8822896  0.7543842
  3     1e-01  0.8885131  0.7671174
  5     0e+00  0.8775162  0.7447600
  5     1e-04  0.8799584  0.7492188
  5     1e-01  0.8857350  0.7614604

Accuracy was used to select the optimal model using  the largest value.
The final values used for the model were size = 3 and decay = 0.1.
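
Several of the fits in the log above stop at the default limit of 100 iterations, and the iteration trace dominates the cell output. One possible variation, sketched below, passes trace = FALSE and a larger maxit through train to nnet, centers and scales the predictors, and supplies an explicit tuning grid. The small data set, the 3-fold resampling, the grid values, and the names small_ctrl, nnet_grid, and small_nnet are illustrative assumptions, not the notebook's configuration.

```r
library(caret)

set.seed(1234)
# Small simulated data set so the sketch runs quickly (assumption: 300 rows)
small_set <- twoClassSim(300, intercept = -6, linearVars = 8, noiseVars = 4)
x_small <- small_set[, names(small_set) != "Class"]
y_small <- small_set$Class

small_ctrl <- trainControl(method = "cv", number = 3)

# Explicit grid instead of caret's default grid (values are illustrative)
nnet_grid <- expand.grid(size = c(1, 3), decay = c(0, 0.1))

small_nnet <- train(x_small, y_small,
                    method = "nnet",
                    preProcess = c("center", "scale"),
                    tuneGrid = nnet_grid,
                    trControl = small_ctrl,
                    trace = FALSE,   # suppress the per-iteration log
                    maxit = 200)     # allow more iterations than the default 100

# Resampled performance per grid point and the selected combination
small_nnet$results[, c("size", "decay", "Accuracy", "Kappa")]
small_nnet$bestTune
```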

In [20]:
#Predict on the held-out test set using the selected predictors
NNPredictions <- predict(model_nnet, test_set[, X_imp])
#Create confusion matrix against the true classes
cmNN <- confusionMatrix(NNPredictions, test_set$Class)

cmNN

Confusion Matrix and Statistics

          Reference
Prediction Class1 Class2
    Class1    113     11
    Class2      6     69
                                          
               Accuracy : 0.9146          
                 95% CI : (0.8667, 0.9494)
    No Information Rate : 0.598           
    P-Value [Acc > NIR] : <2e-16          
                                          
                  Kappa : 0.8205          
 Mcnemar's Test P-Value : 0.332           
                                          
            Sensitivity : 0.9496          
            Specificity : 0.8625          
         Pos Pred Value : 0.9113          
         Neg Pred Value : 0.9200          
             Prevalence : 0.5980          
         Detection Rate : 0.5678          
   Detection Prevalence : 0.6231          
      Balanced Accuracy : 0.9060          
                                          
       'Positive' Class : Class1          
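
The headline statistics follow directly from the four cell counts. As a worked check in base R, with Class1 as the positive class, the sketch below recomputes them from the table above. The object names (cm, tp, fn, and so on) are illustrative.

```r
# Confusion matrix from the output above: rows = prediction, columns = reference
cm <- matrix(c(113, 6, 11, 69), nrow = 2,
             dimnames = list(Prediction = c("Class1", "Class2"),
                             Reference  = c("Class1", "Class2")))

tp <- cm["Class1", "Class1"]  # predicted Class1, truly Class1
fn <- cm["Class2", "Class1"]  # predicted Class2, truly Class1
fp <- cm["Class1", "Class2"]  # predicted Class1, truly Class2
tn <- cm["Class2", "Class2"]  # predicted Class2, truly Class2
n  <- sum(cm)

accuracy    <- (tp + tn) / n
sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
ppv         <- tp / (tp + fp)
npv         <- tn / (tn + fn)
balanced    <- (sensitivity + specificity) / 2

# Cohen's kappa: observed agreement corrected for chance agreement
expected <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa    <- (accuracy - expected) / (1 - expected)

round(c(Accuracy = accuracy, Kappa = kappa, Sensitivity = sensitivity,
        Specificity = specificity, PPV = ppv, NPV = npv,
        BalancedAccuracy = balanced), 4)
```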
                                          

In [18]:
#Fit a random forest on the same selected predictors
model_rf <- train(train_set[,X_imp],
                  train_set[,Y],
                  trControl = ctrl,
                  method='rf')

model_rf

Random Forest 

1801 samples
   8 predictor
   2 classes: 'Class1', 'Class2' 

No pre-processing
Resampling: Cross-Validated (5 fold, repeated 5 times) 
Summary of sample sizes: 1441, 1440, 1441, 1441, 1441, 1440, ... 
Resampling results across tuning parameters:

  mtry  Accuracy   Kappa    
  2     0.8535232  0.6912734
  5     0.8547458  0.6954192
  8     0.8540819  0.6945268

Accuracy was used to select the optimal model using  the largest value.
The final value used for the model was mtry = 5.
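
The two "Summary of sample sizes" lines above differ, which suggests the neural network and the random forest were resampled on different folds. One way to compare them on equal footing, sketched here under assumptions, is to build the folds once with createMultiFolds, pass them to trainControl through index, and collect both fits with resamples. The small data set, the 3-fold, 2-repeat setup, and the names shared_folds, shared_ctrl, fit_nnet, fit_rf, and comparison are illustrative and not taken from the notebook.

```r
library(caret)

set.seed(1234)
# Small simulated data set so the sketch runs quickly (assumption: 300 rows)
small_set <- twoClassSim(300, intercept = -6, linearVars = 8, noiseVars = 4)
x_small <- small_set[, names(small_set) != "Class"]
y_small <- small_set$Class

# Shared resampling indices so both models are evaluated on identical folds
shared_folds <- createMultiFolds(y_small, k = 3, times = 2)
shared_ctrl  <- trainControl(method = "repeatedcv", number = 3, repeats = 2,
                             index = shared_folds)

fit_nnet <- train(x_small, y_small, method = "nnet",
                  trControl = shared_ctrl, trace = FALSE)
fit_rf   <- train(x_small, y_small, method = "rf",
                  trControl = shared_ctrl, tuneLength = 2)

# Collect the matched resampling results and summarise them side by side
comparison <- resamples(list(nnet = fit_nnet, rf = fit_rf))
summary(comparison)
```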