# SVM LAB

The Support Vector Classifier lab discussed how support vector machines classify data using nonlinear boundaries. In the iris data, the species overlap in feature space, which makes it hard to draw a linear boundary that separates the observations. Fit an SVM model to classify the observations into the classes Setosa, Virginica and Versicolor using a polynomial kernel.

### Load the R Library and Inspect the Iris Data

In [None]:
library("e1071")

Look at the first few rows of the data.

In [None]:
head(iris,5)
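
Beyond `head`, a few base R calls give a quick sense of the data's shape and class balance. This is only a minimal sketch using base R and the built-in `iris` data frame; nothing here depends on e1071.

```r
dim(iris)            # number of rows and columns
str(iris)            # column types: four numeric measurements and one factor
table(iris$Species)  # number of observations per species
```

If the species counts are balanced, plain accuracy is a reasonable first metric for the models fitted later.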

Attach the iris data frame to the search path so you can reference its columns directly.

In [None]:
attach(iris)

Now create a model with the `svm` function, using a linear kernel first, and evaluate its performance.

**Reference:**

   [SVM tutorial 1](http://www.di.fc.ul.pt/~jpn/r/svm/svm.html#non-linearly-separable-data) 
   
   [SVM tutorial 2](https://rpubs.com/ryankelly/svm)

In [None]:
# Fit the model using Sepal.Length and Sepal.Width as the predictors. Use a linear kernel to fit the model.
svm.model <- svm(Species ~ Sepal.Length + Sepal.Width, data = iris, kernel = "linear")

# Plot the species and show the support vectors on the graph.
# The + signs are support vectors.
plot(iris$Sepal.Length, iris$Sepal.Width, col = as.integer(iris[, 5]), # color the points by species
     pch = c("o","+")[1:150 %in% svm.model$index + 1], # Mark the support vectors with a `+` sign and the rest with an `o` sign.
                                                       # "1:150 %in% svm.model$index" generates a logical vector of length 150
                                                       # that is TRUE where the observation is a support vector.
                                                       # Adding one converts FALSE (0) and TRUE (1) into the values 1 and 2.
                                                       # Every 1 in the vector is displayed as o and every 2 as +.
     cex = 2, 
     xlab = "Sepal length", ylab = "Sepal width")
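
The indexing trick used for `pch` can be easier to follow on a tiny vector. The sketch below uses base R only; `sv_index` is a made-up stand-in for `svm.model$index`, not output from a fitted model.

```r
sv_index <- c(2, 5)                # hypothetical support-vector row numbers
is_sv <- 1:6 %in% sv_index         # TRUE where the row number is in sv_index
symbols <- c("o", "+")[is_sv + 1]  # FALSE + 1 = 1 picks "o", TRUE + 1 = 2 picks "+"
symbols
# "o" "+" "o" "o" "+" "o"
```

Because `%in%` binds more tightly than `+`, the expression `1:6 %in% sv_index + 1` is evaluated as `(1:6 %in% sv_index) + 1`, which is why no extra parentheses are needed in the plot call.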

In [None]:
# Plot the species by splitting the feature space into three regions, one per predicted class
plot(svm.model, iris, Sepal.Width ~ Sepal.Length, # Plot the model predictions with Sepal.Width on the y-axis and Sepal.Length
                                                  # on the x-axis
     slice = list(sepal.width = 1, sepal.length = 2)) # a list of named numeric values for the dimensions held constant;
                                                      # slice is only needed if the model uses more than two variables.

In [None]:
# Make predictions of species using the svm model built
svm.pred  <- predict(svm.model, iris[,-5]) 

# Build a confusion matrix for the predictions made against the original classes of flowers
library(caret)
confusionMatrix(svm.pred, iris[,5])

The SVM model did not do a great job with a linear kernel. The accuracy of the model is 81.3%:
         
         (49+38+35)/150 = 0.813   (number of correct predictions / total observations)
         
**Reference:** [Confusion matrix function and its results](http://rpubs.com/prcuny/161764)
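
The accuracy arithmetic above can be reproduced in base R. In this sketch, `correct` holds the three diagonal counts quoted in the text; `accuracy_from_table` is a hypothetical helper (not part of caret or e1071), and the `truth`/`pred` vectors are toy values made up only to show the general pattern.

```r
# Correct predictions per class, as quoted in the text above
correct <- c(49, 38, 35)
total <- 150
accuracy <- sum(correct) / total
round(accuracy, 3)
# 0.813

# General form: correct predictions sit on the diagonal of a confusion table
accuracy_from_table <- function(tab) sum(diag(tab)) / sum(tab)

# Toy labels, for illustration only
truth <- factor(c("a", "a", "b", "b", "c", "c"))
pred  <- factor(c("a", "b", "b", "b", "c", "a"), levels = levels(truth))
toy_tab <- table(Predicted = pred, Actual = truth)
accuracy_from_table(toy_tab)
```

The same helper should work on `table(svm.pred, iris[, 5])`, assuming both factors share the same levels so that the table is square.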

In [None]:
# Fit the model using a polynomial kernel with Sepal.Length and Sepal.Width as predictor variables.
# degree: sets the polynomial order for polynomial kernels.
# gamma:  kernel coefficient (γ); for radial kernels this is the main parameter to adjust. Left at its default here.
# coef0:  independent term in the kernel function; only significant for the 'polynomial' and 'sigmoid' kernels.
svm.model <- svm(Species ~ Sepal.Length + Sepal.Width, data = iris, kernel = 'polynomial', degree=8, coef0=1)
                  
plot(svm.model, iris, Sepal.Width ~ Sepal.Length,      # Plot the predictions
     slice = list(Sepal.Width = 1, Sepal.Length = 2)) 
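
To make the roles of `degree` and `coef0` more concrete: the e1071 documentation gives the polynomial kernel as `(gamma * u'v + coef0)^degree`. The sketch below evaluates that formula in base R. `poly_kernel` is a hypothetical helper name, not an e1071 function, and the input vectors and parameter values are made up for illustration.

```r
# Polynomial kernel value for two feature vectors u and v
poly_kernel <- function(u, v, gamma, coef0, degree) {
  (gamma * sum(u * v) + coef0)^degree
}

u <- c(1, 2)
v <- c(3, 4)

# dot product = 11, so (0.5 * 11 + 1)^2 = 6.5^2
poly_kernel(u, v, gamma = 0.5, coef0 = 1, degree = 2)
# 42.25

# With coef0 = 0 only the highest-order term remains
poly_kernel(u, v, gamma = 0.5, coef0 = 0, degree = 2)
# 30.25
```

A nonzero `coef0` mixes lower-order terms into the kernel, while a high `degree` such as 8 allows very flexible boundaries that may fit noise in the training data.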

In [None]:
svm.pred  <- predict(svm.model, iris[,-5]) 
confusionMatrix(svm.pred, iris[,5]) # show the confusion matrix

In [None]:
# Accuracy did not improve even with a polynomial kernel of degree 8.
# So far we used only two attributes for making predictions. Use all independent variables to build the model.
svm.model <- svm(Species ~ ., data = iris, kernel = 'polynomial', degree=8, coef0=1)
plot(svm.model, iris, Sepal.Width ~ Sepal.Length, 
     slice = list(Petal.Width = 3, Petal.Length = 2.5)) # showing a 2D slice of the 4D space

In [None]:
svm.pred  <- predict(svm.model, iris[,-5]) 
confusionMatrix(svm.pred, iris[,5]) # show the confusion matrix

There it is. Using all variables as predictors, the model reaches 98% accuracy on the data it was trained on.
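
Every accuracy figure in this lab is computed on the same rows used to fit the model, so it may overstate how well the model generalizes. The sketch below is one possible way to check this with a held-out set. The 70/30 split, the seed, and the names `train`, `test`, `fit`, and `holdout_accuracy` are assumptions made for illustration and are not part of the original lab.

```r
library(e1071)  # same package the lab already uses

set.seed(42)                                    # arbitrary seed, for reproducibility only
n <- nrow(iris)
train_rows <- sample(n, size = round(0.7 * n))  # hypothetical 70/30 split
train <- iris[train_rows, ]
test  <- iris[-train_rows, ]

# Same model specification as above, fitted on the training rows only
fit <- svm(Species ~ ., data = train, kernel = "polynomial", degree = 8, coef0 = 1)
test_pred <- predict(fit, test[, -5])

# Share of held-out flowers classified correctly
holdout_accuracy <- mean(test_pred == test$Species)
holdout_accuracy
```

The held-out accuracy will vary with the seed and the split; a value noticeably below the training accuracy would suggest that the degree-8 kernel is overfitting.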