In [1]:
# Set random seed to 42
set.seed(42)

# Install Package and Import Library
install.packages("kernlab")
library(kernlab)


In [2]:
# Load the Dataset
data <- read.table("credit_card_data-headers.txt", 
                   stringsAsFactors=FALSE, 
                   header=TRUE)
head(data)

A1,A2,A3,A8,A9,A10,A11,A12,A14,A15,R1
1,30.83,0.0,1.25,1,0,1,1,202,0,1
0,58.67,4.46,3.04,1,0,6,1,43,560,1
0,24.5,0.5,1.5,1,1,0,1,280,824,1
1,27.83,1.54,3.75,1,0,5,0,100,3,1
1,20.17,5.625,1.71,1,1,0,1,120,0,1
1,32.08,4.0,2.5,1,1,0,0,360,0,1


In [3]:
# Apply KSVM model, scaling the data and using C=100
model <- ksvm(R1~.,
              data=data,
              type="C-svc",
              kernel="vanilladot",
              C=100,
              scaled=TRUE)

# Use model to create predictions, find coefficients a and intercept a0
pred <- predict(model,data[,1:10])
a <- colSums(model@xmatrix[[1]] * model@coef[[1]])
a0 <- - model@b

 Setting default kernel parameters  


In [5]:
model
print(' C: 100')
print(' Coefficients: ')
a
print(paste0(' A0: ', a0))
print(paste0(' Equation: 0 = ', a[1], ' x1 + ', a[2], ' x2 + ', a[3], ' x3 + ', a[4], ' x8 + ',
             a[5], ' x9 + ', a[6], ' x10 + ', a[7], ' x11 + ', a[8], ' x12 + ', a[9], ' x14 + ', a[10], ' x15 + ', a0))
print(paste0(' Accuracy: ', sum(pred == data[,11]) / nrow(data)))

Support Vector Machine object of class "ksvm" 

SV type: C-svc  (classification) 
 parameter : cost C = 100 

Linear (vanilla) kernel function. 

Number of Support Vectors : 189 

Objective Function Value : -17887.92 
Training error : 0.136086 

[1] " C: 100"
[1] " Coefficients: "


[1] " A0: 0.081584921659538"
[1] " Equation: 0 = -0.00100653481057611 x1 + -0.00117290480611665 x2 + -0.00162619672236963 x3 + 0.0030064202649194 x8 + 1.00494056410556 x9 + -0.00282594323043472 x10 + 0.000260029507016313 x11 + -0.000534955143494997 x12 + -0.00122837582291523 x14 + 0.106363399527188 x15 + 0.081584921659538"
[1] " Accuracy: 0.863914373088685"


**Try the same model without scaling the data**

In [6]:
# Apply KSVM model, without scaling the data and using C=100
model <- ksvm(R1~.,
              data=data,
              type="C-svc",
              kernel="vanilladot",
              C=100,
              scaled=FALSE)

# Use model to create predictions, find coefficients a and intercept a0
pred <- predict(model,data[,1:10])
a <- colSums(model@xmatrix[[1]] * model@coef[[1]])
a0 <- - model@b

 Setting default kernel parameters  


In [8]:
model
print(' C: 100 ')
print(' Coefficients: ')
a
print(paste0(' A0: ', a0))
print(paste0(' Accuracy: ', sum(pred == data[,11]) / nrow(data)))

Support Vector Machine object of class "ksvm" 

SV type: C-svc  (classification) 
 parameter : cost C = 100 

Linear (vanilla) kernel function. 

Number of Support Vectors : 186 

Objective Function Value : -2213.731 
Training error : 0.278287 

[1] " C: 100 "
[1] " Coefficients: "


[1] " A0: 0.525539327910409"
[1] " Accuracy: 0.7217125382263"


**Discussion:**
Scaling the data produced higher training accuracy (scaled: 0.86, not scaled: 0.72). The coefficients _a_ and the intercept _a0_ also differ substantially between the two models, which is expected because the scaled model's coefficients apply to standardized features rather than to the raw values. In this scenario, scaling the data improves the model's ability to make accurate predictions.
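
To illustrate what scaling does to features on very different ranges, here is a minimal base-R sketch. The `toy` data frame is hypothetical and only loosely mimics a 0/1 column next to a column with values in the hundreds. It assumes that `scaled=TRUE` standardizes each column to zero mean and unit variance, which is how the kernlab documentation describes it.

```r
# Hypothetical toy data: two features on very different ranges
toy <- data.frame(small = c(0, 1, 0, 1, 1),
                  large = c(0, 560, 824, 3, 0))

# scale() centers each column to mean 0 and rescales it to standard deviation 1
toy_scaled <- scale(toy)

# Before scaling, the spread of the large-range column dwarfs the other one
print(sapply(toy, sd))

# After scaling, both columns have mean ~0 and standard deviation 1
print(round(colMeans(toy_scaled), 10))
print(apply(toy_scaled, 2, sd))
```

Once the columns are standardized, no single feature dominates the dot products that the linear kernel computes.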

**Try Different Values of C**

In [9]:
# Create a list of c values
c_vals <- c(1e-8, 1e-4, 10, 1e4, 1e8)
acc <- list()

In [10]:
# Train model and make predictions for each value of C
for (c_val in c_vals){
  model <- ksvm(R1~.,
                data=data,
                type="C-svc",
                kernel="vanilladot",
                C=c_val,
                scaled=TRUE)
    
  pred <- predict(model,data[,1:10])
  accuracy <- sum(pred == data[,11]) / nrow(data)
  acc <- append(acc, accuracy)
}

 Setting default kernel parameters  
 Setting default kernel parameters  
 Setting default kernel parameters  
 Setting default kernel parameters  
 Setting default kernel parameters  


In [11]:
for (i in 1:length(c_vals)) {
    print(paste0(' C: ', c_vals[i], '  Accuracy: ', acc[i]))
}

[1] " C: 1e-08  Accuracy: 0.547400611620795"
[1] " C: 1e-04  Accuracy: 0.547400611620795"
[1] " C: 10  Accuracy: 0.863914373088685"
[1] " C: 10000  Accuracy: 0.862385321100917"
[1] " C: 1e+08  Accuracy: 0.663608562691132"


**Discussion:** Extremely low values of C produced much lower accuracy (0.55 for both 1e-8 and 1e-4), with no change between the two. C values of 10 and 1e4 produced an accuracy of 0.86, matching the C=100 model from the previous section. Finally, at C=1e8 the accuracy was 0.66: higher than at the extremely low C values, but lower than at 10 and 1e4.

**Takeaway:** The model performed best with C values from 10 to 1e4. Choosing a C value at the lower end of this range is favorable because it gives a larger margin for later test predictions.
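
The grid tested above jumps from 1e-4 straight to 10, so the point where accuracy starts to improve is not visible. One possible refinement is a grid with one value per order of magnitude. This is only a sketch: the name `c_grid` and its bounds are assumptions, not part of the original analysis.

```r
# Powers of ten from 1e-4 to 1e4, one value per order of magnitude
c_grid <- 10^seq(-4, 4, by = 1)
print(c_grid)
```

`c_grid` could replace `c_vals` in the training loop to narrow down the lower end of the useful range.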

**Try Different Kernels**

In [12]:
# Create a list of kernels
kernels <- c('rbfdot', 'polydot', 'vanilladot', 'tanhdot', 'laplacedot', 'besseldot', 'anovadot', 'splinedot')
acc <- list()

In [13]:
# Train model and make predictions using each kernel
for (k in kernels){
  model <- ksvm(R1~.,
                data=data,
                type="C-svc",
                kernel=k,
                C=100,
                scaled=TRUE)
    
  pred <- predict(model,data[,1:10])
  accuracy <- sum(pred == data[,11]) / nrow(data)
  acc <- append(acc, accuracy)
}

 Setting default kernel parameters  
 Setting default kernel parameters  
 Setting default kernel parameters  
 Setting default kernel parameters  
 Setting default kernel parameters  
 Setting default kernel parameters  


In [14]:
for (i in 1:length(kernels)) {
    print(paste0(' Kernel: ', kernels[i], '  Accuracy: ', acc[i]))
}

[1] " Kernel: rbfdot  Accuracy: 0.957186544342508"
[1] " Kernel: polydot  Accuracy: 0.863914373088685"
[1] " Kernel: vanilladot  Accuracy: 0.863914373088685"
[1] " Kernel: tanhdot  Accuracy: 0.7217125382263"
[1] " Kernel: laplacedot  Accuracy: 1"
[1] " Kernel: besseldot  Accuracy: 0.925076452599388"
[1] " Kernel: anovadot  Accuracy: 0.906727828746177"
[1] " Kernel: splinedot  Accuracy: 0.978593272171254"


**Discussion:** Using a C value of 100, as in the earlier sections, different kernels were tested. The best-performing kernels, rbfdot (0.96), splinedot (0.98), and laplacedot (1.00), outperformed the linear vanilladot kernel (0.86) originally used. Accuracies this high are suspicious. They were measured on the same data the models were trained on, with no separate test set, so they likely reflect overfitting rather than true predictive performance.

**Takeaway:** The laplacedot kernel at C=100 gave the highest training accuracy, but the model is probably overfitted. Because the data was not split into training and test sets, these accuracies do not show how the models would perform on new data.
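
A hold-out split is one way to address the caveat above. The sketch below uses base R only. The `toy` data frame, the 70/30 ratio, and the `accuracy` helper are illustrative assumptions, and the commented lines show how the split might plug into the `ksvm` call used in this notebook.

```r
set.seed(42)

# Hypothetical stand-in for the credit data: one feature and a 0/1 label
toy <- data.frame(x = 1:10, R1 = rep(c(0, 1), 5))

# Randomly assign about 70% of the rows to training; the rest form the test set
n <- nrow(toy)
train_idx <- sample(n, size = floor(0.7 * n))
train <- toy[train_idx, ]
test  <- toy[-train_idx, ]

# Helper: fraction of predictions that match the true labels
accuracy <- function(pred, truth) mean(pred == truth)

# With the real data, the model would be fit on the training rows only
# and scored on the held-out rows, for example (not run here):
# model <- ksvm(R1 ~ ., data = train, type = "C-svc",
#               kernel = "laplacedot", C = 100, scaled = TRUE)
# accuracy(predict(model, test[, 1:10]), test$R1)
```

If a kernel scores far lower on the held-out rows than on the training rows, that gap would support the overfitting explanation.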

**Try k-nearest neighbors and check multiple values of k**

In [15]:
# Install Package and Import Library
install.packages("kknn")
library(kknn)



"package 'kknn' was built under R version 3.6.3"

In [17]:
# Create a list of potential k values
k_vals <- c(1:20)
print(k_vals)

 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20


In [18]:
# Model using rectangular kernel
for (k_val in k_vals){
    pred <- list()

    for (i in 1:nrow(data)){
        model <- kknn(R1~., 
                      data[-i,], 
                      data[i,], 
                      k=k_val,
                      distance=2, 
                      kernel="rectangular",
                      scale = TRUE)
        
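        # R1 is numeric, so fitted() returns a continuous (weighted) average of
        # the neighbours' labels; the == check below matches only if it is exactly 0 or 1
        pred[i] <- fitted(model)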
      }
    acc = sum(pred == data[,11]) / nrow(data)
    print(paste0(' K: ', k_val, '  Accuracy: ', acc)) 
}

[1] " K: 1  Accuracy: 0.814984709480122"
[1] " K: 2  Accuracy: 0.785932721712538"
[1] " K: 3  Accuracy: 0.767584097859327"
[1] " K: 4  Accuracy: 0.741590214067278"
[1] " K: 5  Accuracy: 0.717125382262997"
[1] " K: 6  Accuracy: 0.700305810397553"
[1] " K: 7  Accuracy: 0.685015290519878"
[1] " K: 8  Accuracy: 0.67737003058104"
[1] " K: 9  Accuracy: 0.665137614678899"
[1] " K: 10  Accuracy: 0.662079510703364"
[1] " K: 11  Accuracy: 0.654434250764526"
[1] " K: 12  Accuracy: 0.645259938837921"
[1] " K: 13  Accuracy: 0.636085626911315"
[1] " K: 14  Accuracy: 0.631498470948012"
[1] " K: 15  Accuracy: 0.623853211009174"
[1] " K: 16  Accuracy: 0.619266055045872"
[1] " K: 17  Accuracy: 0.603975535168196"
[1] " K: 18  Accuracy: 0.596330275229358"
[1] " K: 19  Accuracy: 0.58868501529052"
[1] " K: 20  Accuracy: 0.584097859327217"


**Try a different kernel**

In [19]:
# Model using optimal kernel
for (k_val in k_vals){
    pred <- list()

    for (i in 1:nrow(data)){
        model <- kknn(R1~., 
                      data[-i,], 
                      data[i,], 
                      k=k_val,
                      distance=2, 
                      kernel="optimal",
                      scale = TRUE)
        
        pred[i] <- fitted(model)
      }
    acc = sum(pred == data[,11]) / nrow(data)
    print(paste0(' K: ', k_val, '  Accuracy: ', acc)) 
}

[1] " K: 1  Accuracy: 0.814984709480122"
[1] " K: 2  Accuracy: 0.785932721712538"
[1] " K: 3  Accuracy: 0.767584097859327"
[1] " K: 4  Accuracy: 0.741590214067278"
[1] " K: 5  Accuracy: 0.717125382262997"
[1] " K: 6  Accuracy: 0.700305810397553"
[1] " K: 7  Accuracy: 0.685015290519878"
[1] " K: 8  Accuracy: 0.67737003058104"
[1] " K: 9  Accuracy: 0.665137614678899"
[1] " K: 10  Accuracy: 0.662079510703364"
[1] " K: 11  Accuracy: 0.654434250764526"
[1] " K: 12  Accuracy: 0.645259938837921"
[1] " K: 13  Accuracy: 0.636085626911315"
[1] " K: 14  Accuracy: 0.631498470948012"
[1] " K: 15  Accuracy: 0.623853211009174"
[1] " K: 16  Accuracy: 0.619266055045872"
[1] " K: 17  Accuracy: 0.603975535168196"
[1] " K: 18  Accuracy: 0.596330275229358"
[1] " K: 19  Accuracy: 0.58868501529052"
[1] " K: 20  Accuracy: 0.584097859327217"


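**Discussion:** As k (the number of nearest neighbors) increased, the reported accuracy decreased, and the highest value (0.81) came from k=1. This trend should be read with caution. Because R1 is numeric, `fitted(model)` most likely returns a continuous value (a weighted average of the neighbors' labels), and `pred == data[,11]` counts a prediction as correct only when that value is exactly 0 or 1, that is, when all k neighbors agree with the true label. For k > 1 the reported figures are therefore a stricter measure than classification accuracy; the fitted value would need to be rounded to 0 or 1 before comparing. Under the metric as computed, k values from 2 to 5 give the best results after k=1. A single neighbor is sensitive to noise in individual points, so k=1 is not a robust choice for future predictions.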

Two different kernels were tested, rectangular and optimal. The rectangular kernel gives each of the k neighbors an equal vote, while the optimal kernel gives more weight to neighbors closer to the query point. The accuracies were identical for the two kernels. This could indicate an even spread of data, but it is also what the exact-match comparison `pred == data[,11]` would produce: when all k neighbors share a label, any weighting yields the same average.
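
The effect of comparing an unrounded average against a 0/1 label can be seen in a small base-R sketch. The neighbor labels and true labels below are hypothetical, and the plain mean stands in for a rectangular-kernel fit on a numeric response.

```r
# Hypothetical labels of the k = 5 nearest neighbours for three query points
neighbour_labels <- list(c(1, 1, 1, 1, 1),   # unanimous
                         c(1, 1, 1, 0, 0),   # majority 1
                         c(0, 0, 1, 0, 0))   # majority 0
truth <- c(1, 1, 0)

# With a numeric response and equal weights, the fit is the mean of the labels
fitted_vals <- sapply(neighbour_labels, mean)

# Exact comparison counts only the unanimous case as correct
acc_exact <- mean(fitted_vals == truth)

# Rounding turns the average into a majority vote before comparing
acc_rounded <- mean(round(fitted_vals) == truth)

print(c(exact = acc_exact, rounded = acc_rounded))
```

In the notebook's loops, the analogous change would be `pred[i] <- round(fitted(model))`. With even k an average of exactly 0.5 is possible, and R's `round()` maps 0.5 to 0, so ties would need an explicit rule.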