# **k-Fold Cross Validation & Grid Search in R**

## **Importing the dataset**

In [1]:
ds = read.csv('/content/Social_Network_Ads.csv')
ds = ds[3:5]
head(ds)

| | Age | EstimatedSalary | Purchased |
|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<int>` |
| 1 | 19 | 19000 | 0 |
| 2 | 35 | 20000 | 0 |
| 3 | 26 | 43000 | 0 |
| 4 | 27 | 57000 | 0 |
| 5 | 19 | 76000 | 0 |
| 6 | 27 | 58000 | 0 |


## **Encoding the target feature as factor**

In [2]:
ds$Purchased = factor(ds$Purchased, levels = c(0, 1))

## **Splitting the dataset & Feature scaling**

In [3]:
# Splitting the dataset into the Training set and Test set
install.packages('caTools')
library(caTools)
set.seed(123)
split = sample.split(ds$Purchased, SplitRatio = 0.75)
train_set = subset(ds, split == TRUE)
test_set = subset(ds, split == FALSE)

# Feature Scaling
train_set[-3] = scale(train_set[-3])
test_set[-3] = scale(test_set[-3])
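
The cell above standardizes the test set with its own mean and standard deviation. A common alternative is to reuse the centering and scaling values computed on the training set, so both sets go through the same transformation. The following is a minimal sketch of that idea, assuming toy data frames (`train_toy` and `test_toy` are hypothetical stand-ins, not the real dataset):

```r
# Toy stand-ins for the feature columns of train_set / test_set (hypothetical values)
train_toy = data.frame(Age = c(19, 35, 26, 27, 45),
                       EstimatedSalary = c(19000, 20000, 43000, 57000, 76000))
test_toy = data.frame(Age = c(30, 50),
                      EstimatedSalary = c(30000, 90000))

# Scale the training features and keep the parameters that scale() computed
train_scaled = scale(train_toy)
train_center = attr(train_scaled, "scaled:center")
train_sd = attr(train_scaled, "scaled:scale")

# Apply the training mean and standard deviation to the test features
test_scaled = scale(test_toy, center = train_center, scale = train_sd)
test_scaled
```

Note that switching the notebook to this approach would change the predictions and therefore the numbers reported in the later cells.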


## **Fitting Kernel SVM to the Training set & Predicting the Test set**

In [4]:
# Fitting Kernel SVM to the Training set
install.packages('e1071')
library(e1071)
classifier = svm(formula = Purchased ~ .,
                 data = train_set,
                 type = 'C-classification',
                 kernel = 'radial')

# Predicting the Test set results
y_pred = predict(classifier, newdata = test_set[-3])

# Making the Confusion Matrix
cm = table(test_set[, 3], y_pred)
cat('\n The confusion matrix for Kernel SVM model is: \n \n')
cm

 The confusion matrix for Kernel SVM model is: 
 


   y_pred
     0  1
  0 58  6
  1  4 32

## **Evaluation Metrics**

In [5]:
n = sum(cm) # number of instances
nc = nrow(cm) # number of classes
diag = diag(cm) # number of correctly classified instances per class 
rowsums = apply(cm, 1, sum) # number of instances per class
colsums = apply(cm, 2, sum) # number of predictions per class
p = rowsums / n # distribution of instances over the actual classes
q = colsums / n # distribution of instances over the predicted classes
accuracy = sum(diag) / n 
cat("\n Accuracy of Kernel SVM  Model is:", accuracy)  
precision = diag / colsums 
recall = diag / rowsums 
f1 = 2 * precision * recall / (precision + recall)
cat("\n \nThe Evaluation Metrics of Kernel SVM  Model is: \n \n")
data.frame(precision, recall, f1)


 Accuracy of Kernel SVM  Model is: 0.9
 
The Evaluation Metrics of Kernel SVM  Model is: 
 


| | precision | recall | f1 |
|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<dbl>` |
| 0 | 0.9354839 | 0.90625 | 0.9206349 |
| 1 | 0.8421053 | 0.8888889 | 0.8648649 |
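
To make the arithmetic behind these numbers easy to check, here is a small sketch that rebuilds the confusion matrix reported earlier (58, 6, 4, 32) by hand and wraps the same formulas in a helper. The function name `class_metrics` is hypothetical, introduced only for this illustration:

```r
# Confusion matrix from the Kernel SVM output: rows = actual, columns = predicted
# matrix() fills column by column, so the values are given as (58, 4), (6, 32)
cm = matrix(c(58, 4, 6, 32), nrow = 2,
            dimnames = list(actual = c('0', '1'), predicted = c('0', '1')))

# Hypothetical helper: accuracy plus per-class precision, recall and F1
class_metrics = function(cm) {
  correct = diag(cm)                     # correctly classified instances per class
  precision = correct / colSums(cm)      # correct / number of predictions per class
  recall = correct / rowSums(cm)         # correct / number of actual instances per class
  f1 = 2 * precision * recall / (precision + recall)
  list(accuracy = sum(correct) / sum(cm),
       per_class = data.frame(precision, recall, f1))
}

metrics = class_metrics(cm)
metrics$accuracy
metrics$per_class
```

For class 0, for example, precision is 58 / (58 + 4) and recall is 58 / (58 + 6), which matches the table.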


## **Applying k-Fold Cross Validation**

In [6]:
install.packages('caret')
library(caret)
folds = createFolds(train_set$Purchased, k = 10)
cv = lapply(folds, function(x) {
  training_fold = train_set[-x, ]
  test_fold = train_set[x, ]
  classifier = svm(formula = Purchased ~ .,
                   data = training_fold,
                   type = 'C-classification',
                   kernel = 'radial')
  y_pred = predict(classifier, newdata = test_fold[-3])
  cm = table(test_fold[, 3], y_pred)
  accuracy = (cm[1,1] + cm[2,2]) / (cm[1,1] + cm[2,2] + cm[1,2] + cm[2,1])
  return(accuracy)
})
accuracy = mean(as.numeric(cv))
cat("\nAccuracy of Kernel SVM k-Fold Cross Validated  Model is:", accuracy)



Accuracy of Kernel SVM k-Fold Cross Validated  Model is: 0.9162848
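
The mean alone hides how much the accuracy varies from fold to fold. The sketch below shows one way to report the spread as well. It is written under loud assumptions: the data are synthetic (`toy` is not the real dataset), the folds are assigned with base R's `sample()` rather than `createFolds()`, and logistic regression via `glm()` stands in for `svm()` so the example runs without extra packages.

```r
set.seed(123)  # hypothetical seed, only to make the fold assignment reproducible

# Synthetic two-class data standing in for train_set (not the real dataset)
n = 200
toy = data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y = factor(ifelse(toy$x1 + toy$x2 + rnorm(n, sd = 0.5) > 0, 1, 0),
               levels = c(0, 1))

# Assign every row to one of k folds at random
k = 10
fold_id = sample(rep(1:k, length.out = n))

fold_accuracy = sapply(1:k, function(i) {
  training_fold = toy[fold_id != i, ]
  test_fold = toy[fold_id == i, ]
  # Stand-in classifier: base R logistic regression instead of svm()
  model = glm(y ~ x1 + x2, data = training_fold, family = binomial)
  prob = predict(model, newdata = test_fold, type = 'response')
  y_pred = factor(ifelse(prob > 0.5, 1, 0), levels = c(0, 1))
  # Share of correct predictions, i.e. the diagonal of the confusion matrix over its total
  mean(y_pred == test_fold$y)
})

cat('Mean accuracy:', mean(fold_accuracy), '\n')
cat('Standard deviation across folds:', sd(fold_accuracy), '\n')
```

A small standard deviation suggests the estimate is stable across folds; a large one suggests the single test-set accuracy could have come out noticeably different with another split.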

## **Applying Grid Search to find the best parameters**

In [7]:
# install.packages('caret')
install.packages('kernlab')
library(caret)
classifier = train(form = Purchased ~ ., data = train_set, method = 'svmRadial')
classifier
classifier$bestTune


Support Vector Machines with Radial Basis Function Kernel 

300 samples
  2 predictor
  2 classes: '0', '1' 

No pre-processing
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 300, 300, 300, 300, 300, 300, ... 
Resampling results across tuning parameters:

  C     Accuracy   Kappa    
  0.25  0.9145693  0.8130036
  0.50  0.9159184  0.8157252
  1.00  0.9186723  0.8215380

Tuning parameter 'sigma' was held constant at a value of 1.327355
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were sigma = 1.327355 and C = 1.

| | sigma | C |
|---|---|---|
| | `<dbl>` | `<dbl>` |
| 3 | 1.327355 | 1 |
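
With its defaults, `train()` tried only three values of `C`, held `sigma` constant, and resampled with the bootstrap, as the output above shows. The sketch below passes an explicit grid through `tuneGrid` and switches to 10-fold cross validation through `trainControl()`. The data are synthetic (`toy` is not the real dataset), the candidate values for `sigma` and `C` are illustrative assumptions rather than recommendations, and the `kernlab` package must be installed for `method = 'svmRadial'`.

```r
library(caret)  # method = 'svmRadial' also needs the kernlab package installed

set.seed(123)  # hypothetical seed for reproducible resampling

# Synthetic two-class data standing in for train_set (not the real dataset)
n = 200
toy = data.frame(Age = rnorm(n), EstimatedSalary = rnorm(n))
toy$Purchased = factor(
  ifelse(toy$Age + toy$EstimatedSalary + rnorm(n, sd = 0.5) > 0, 1, 0),
  levels = c(0, 1))

# 10-fold cross validation instead of the default bootstrap resampling
ctrl = trainControl(method = 'cv', number = 10)

# Every combination of these candidate values is evaluated (illustrative values)
grid = expand.grid(sigma = c(0.1, 0.5, 1, 2), C = c(0.25, 0.5, 1, 2, 4))

classifier = train(form = Purchased ~ ., data = toy, method = 'svmRadial',
                   trControl = ctrl, tuneGrid = grid)
classifier$bestTune
```

`classifier$results` then holds one row per `sigma` and `C` combination, which makes it possible to see how sensitive the accuracy is to each parameter.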
