# MATH 3375 Examples Notebook #15

# Support Vector Machines (SVM)

We explore another algorithm for classification (predicting a categorical response), using the **mtcars** and **iris** data sets.

Again, our response variable is **am**, the transmission type, where 0=automatic, and 1=manual.


In [None]:
#install.packages("e1071")
library(e1071)

In [None]:
#Look at data set
head(mtcars)

## Another Way to Leverage 'Separation' of Data Points 

As before, we visualize 2 predictors of transmission type in a 2-dimensional plot, color-coding the points by transmission type. Our 2 predictors are displacement (**disp**) and rear axle ratio (**drat**).

In [None]:
shapes = c(16,17)
colors = c("red", "green")
plot(disp ~ drat, main="Displacement and Rear Axle Ratio by Transmission Type", xlab="Rear Axle Ratio", ylab="Displacement",
    col=colors[factor(mtcars$am)], pch=shapes[factor(mtcars$am)], data=mtcars)

legend("topright",
       legend = c("Automatic","Manual"),
       pch = shapes,
       col = colors)

## Finding a _Decision Boundary_

Below we create a simple **_Support Vector Machine (SVM)_** to create a _hyperplane_ between the two 2-dimensional regions. 

* The shape of the decision boundary is governed by the **_kernel_** used to create it. In this example, we use a linear kernel.
* Because we are in 2 dimensions, the boundary from a linear kernel will be a line; in 3 dimensions, it would be a plane. ('Hyperplane' is a general term that can apply in any number of dimensions.)
* The hyperplane should maximize the **_margin_**: the distance between itself and the closest points on either side.
* Points on the 'wrong' side of the boundary incur a penalty, so the fit balances a wide margin against misclassified points.

In [None]:
subset_2d = data.frame(mtcars[,c(3,5)], y = as.factor(mtcars$am))
model_svm_tran01 = svm(y ~ ., data = subset_2d, kernel = "linear")
summary(model_svm_tran01)
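How strongly points on the 'wrong' side are penalized is controlled by the `cost` argument of `svm` (default 1). The sketch below refits the same 2-predictor linear model at a few cost values and reports how many support vectors each fit uses. The cost values and the names `cars_2d`, `costs`, `fits`, and `n_sv` are illustrative choices for this sketch, not part of the original notebook.

```r
library(e1071)

# Same two predictors as above: displacement and rear axle ratio
cars_2d <- data.frame(mtcars[, c("disp", "drat")], y = as.factor(mtcars$am))

# Illustrative cost values (assumed for this sketch); svm() defaults to cost = 1
costs <- c(0.1, 1, 100)

# Fit one linear-kernel SVM per cost value
fits <- lapply(costs, function(cst) {
  svm(y ~ ., data = cars_2d, kernel = "linear", cost = cst)
})

# Total number of support vectors used by each fit
n_sv <- sapply(fits, function(fit) fit$tot.nSV)
data.frame(cost = costs, support_vectors = n_sv)
```

A smaller cost tolerates more points inside or across the margin, which generally leaves more points acting as support vectors; a larger cost penalizes such points more heavily.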

### Characteristics of the Model

* With a linear kernel, this is a **_parametric_** model (like regression); the model parameters define the boundary mathematically.
* The 'support vectors' are the set of points in the data set that precisely fix the shape and position of the boundary (they are supporting structures holding the boundary in its place). This is the origin of the term 'support vector' machine.
* The above model summary describes the following:
    * The model has a linear kernel
    * The model is predicting one of two classes
    * The model uses 14 of the data points as support vectors; these are typically the points closest to the boundary
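
For a linear kernel, the parameters that define the boundary can be recovered from the fitted object. The sketch below is one way to do this, assuming the `coefs`, `SV`, `rho`, and `x.scale` components of an e1071 `svm` fit; the names `cars_2d`, `fit_linear`, `w`, `b`, `manual_dv`, and `package_dv` are introduced here for illustration. It rebuilds the decision value w·x + b by hand and compares it with the value reported by `predict`.

```r
library(e1071)

# Refit the 2-predictor linear model so this block runs on its own
cars_2d <- data.frame(mtcars[, c("disp", "drat")], y = as.factor(mtcars$am))
fit_linear <- svm(y ~ ., data = cars_2d, kernel = "linear")

# Weight vector and intercept, expressed in the SCALED predictor space
w <- t(fit_linear$coefs) %*% fit_linear$SV   # 1 x 2 matrix
b <- -fit_linear$rho

# Apply the same centering and scaling that svm() used, then evaluate w.x + b
X_scaled <- scale(as.matrix(cars_2d[, c("disp", "drat")]),
                  center = fit_linear$x.scale[["scaled:center"]],
                  scale  = fit_linear$x.scale[["scaled:scale"]])
manual_dv <- as.vector(X_scaled %*% t(w) + b)

# Decision values reported by the package for the same points
pred <- predict(fit_linear, cars_2d, decision.values = TRUE)
package_dv <- as.vector(attr(pred, "decision.values"))

# Largest disagreement between the two; should be essentially zero
max(abs(manual_dv - package_dv))
```

Because `svm` standardizes the predictors by default, `w` and `b` describe the boundary in the scaled space rather than in the original units of **disp** and **drat**.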
    
The plot below shows the data with an approximation of the decision boundary. Called this way, with only the model and the data, this plot is available only when the model has 2 predictors.

In this plot, the COLOR of the points represents the **_actual_** classification (black=automatic, red=manual).

The points that make up the "support vectors" are indicated with an _x_, while all other points are indicated with an o.

In [None]:
plot(model_svm_tran01,subset_2d)

### Changing the Kernel

Below we create a second model with the same 2 dimensions (predictors) but a radial kernel.

In [None]:
model_svm_tran02 = svm(y ~ ., data = subset_2d, kernel = "radial")
summary(model_svm_tran02)

In [None]:
plot(model_svm_tran02,subset_2d)
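
For reference, the two kernels used so far can be written out directly. The sketch below follows the formulas given in the e1071 documentation (linear: u'v; radial: exp(-gamma * |u - v|^2)); the function names `linear_kernel` and `radial_kernel`, and the two cars chosen for comparison, are assumptions made for illustration.

```r
# Linear kernel: the ordinary dot product u'v
linear_kernel <- function(u, v) sum(u * v)

# Radial kernel: exp(-gamma * |u - v|^2), which shrinks toward 0 as points move apart
radial_kernel <- function(u, v, gamma) exp(-gamma * sum((u - v)^2))

# Standardized (disp, drat) values, since svm() scales predictors by default
X <- scale(as.matrix(mtcars[, c("disp", "drat")]))
u <- X["Mazda RX4", ]
v <- X["Fiat 128", ]

# e1071's default gamma is 1 / (number of predictors)
gamma_default <- 1 / ncol(X)

linear_kernel(u, v)
radial_kernel(u, v, gamma_default)
radial_kernel(u, u, gamma_default)   # a point compared with itself gives 1
```

The radial kernel measures similarity by distance rather than by direction, which is what allows its decision boundary to curve around groups of points.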

### Increasing the Dimensions

Below we create a model using _all_ available features as predictors. This model will have 10 dimensions, so its decision boundary cannot be visualized.

When no kernel is specified, the **svm** command uses a radial kernel by default.

In [None]:
model_svm_tran03 <- svm(as.factor(am)~.,data=mtcars)
summary(model_svm_tran03)

## Comparing Model Performance

Below we can see the predictions of all 3 models, alongside the actual classification. Recall that we are looking at **_in-sample_** performance here (these are only predictions on the training data).

In [None]:
data.frame(Linear_2D=fitted(model_svm_tran01),Radial_2D=fitted(model_svm_tran02),
           Radial_Full=fitted(model_svm_tran03),Actual=mtcars$am)

### Confusion Matrices

In [None]:
Actual <- mtcars$am
Pred_2D_Linear <- fitted(model_svm_tran01)
Pred_2D_Radial <- fitted(model_svm_tran02)
Pred_Full_Radial <- fitted(model_svm_tran03)

table(Pred_2D_Linear,Actual)
table(Pred_2D_Radial,Actual)
table(Pred_Full_Radial,Actual)
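
A confusion matrix can be condensed into a single in-sample accuracy: the correctly classified points lie on its diagonal. The sketch below shows this for the 2-predictor linear model; the names `actual`, `cars_2d`, `fit_linear`, `conf_mat`, and `accuracy` are introduced here for illustration.

```r
library(e1071)

# Refit the 2-predictor linear model so this block runs on its own
actual <- as.factor(mtcars$am)
cars_2d <- data.frame(mtcars[, c("disp", "drat")], y = actual)
fit_linear <- svm(y ~ ., data = cars_2d, kernel = "linear")

# Rows are predicted classes, columns are actual classes
conf_mat <- table(Predicted = fitted(fit_linear), Actual = actual)

# Correct predictions sit on the diagonal
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy
```

The same two lines work for the other models by swapping in their fitted values. Since these are predictions on the training data, the result is an in-sample accuracy.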

### Visualizing Model Performance

The plots below convey visually the actual correct class of each point AND how the point was classified by the SVM. This way we can see the same information given in the confusion matrix (True and False Positives and Negatives) with more context.  **_Note that the plot of the full model only represents the data points with TWO dimensions, even though the SVM and its predictions were made with 10._**

The COLOR of the point indicates its **_actual_** classification (red=automatic, green=manual).

The SHAPE of the point indicates its **_predicted_** class: o is automatic (negative), + is manual (positive).

In [None]:
colors=c("red","green")   # actual class: red = automatic, green = manual
symbols = as.integer(fitted(model_svm_tran01))   # predicted class: 1 = automatic, 2 = manual

plot(mtcars$drat,mtcars$disp, main = "SVM Linear Kernel - 2 Predictors",
     xlab="Rear Axle Ratio", ylab="Displacement",
     col=colors[factor(mtcars$am)], pch = c("o","+")[symbols], cex=c(1.2,1.5)[symbols])

legend("topright",
       legend = c("True Positive","True Negative","False Positive","False Negative"),
       pch = c("+","o","+","o"),
       col = colors[c(2,1,1,2)])

In [None]:
predicted = as.integer(fitted(model_svm_tran03))   # predicted class: 1 = automatic, 2 = manual
plot(mtcars$drat,mtcars$disp, main = "SVM Radial Kernel with ALL Predictors",
     xlab="Rear Axle Ratio", ylab="Displacement",
     col=colors[factor(mtcars$am)], pch = c("o","+")[predicted], cex=c(1.2,1.5)[predicted])

legend("topright",
       legend = c("True Positive","True Negative","False Positive","False Negative"),
       pch = c("+","o","+","o"),
       col = colors[c(2,1,1,2)])

## Support Vector Machine for More than Two Classes

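We will repeat the above process with the **iris** data set to predict Species, which has 3 possible classes. With more than two classes, **svm** fits a separate two-class SVM for each pair of classes (the 'one-against-one' approach) and combines their votes.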

In [None]:
head(iris)

### Two Dimensions with Linear Kernel

We will use the Petal Length and Sepal Width features as our two dimensions.

In [None]:
shapes = c(15,16,17)
colors = c("red", "blue", "mediumorchid")
plot(iris$Petal.Length,iris$Sepal.Width, main="Iris Dimensions by Species", xlab="Petal Length", ylab="Sepal Width",
    col=colors[factor(iris$Species)], pch=shapes[factor(iris$Species)])

legend("topright",
       legend = levels(iris$Species),
       pch = shapes,
       col = colors)

In [None]:
subset_2d = data.frame(iris[,c(2,3)], y = iris$Species)
model_svm_iris01 = svm(y ~ ., data = subset_2d, kernel = "linear")
summary(model_svm_iris01)

In [None]:
plot(model_svm_iris01,subset_2d)

### Two Dimensions with Radial Kernel

In [None]:
model_svm_iris02 = svm(y ~ ., data = subset_2d, kernel = "radial")
summary(model_svm_iris02)

In [None]:
plot(model_svm_iris02,subset_2d)

### Four Dimensions with Default Kernel

In [None]:
model_svm_iris03 = svm(Species ~ ., data = iris)
summary(model_svm_iris03)

### Model Comparison

In [None]:
data.frame(Linear_2D=fitted(model_svm_iris01),Radial_2D=fitted(model_svm_iris02),
           Radial_Full=fitted(model_svm_iris03),Actual=iris$Species)

#### Confusion Matrices with More Than 2 Classes 


In [None]:
Actual <- iris$Species
Pred_2D_Linear <- fitted(model_svm_iris01)
Pred_2D_Radial <- fitted(model_svm_iris02)
Pred_Full_Radial <- fitted(model_svm_iris03)

table(Pred_2D_Linear,Actual)
table(Pred_2D_Radial,Actual)
table(Pred_Full_Radial,Actual)

#### Differentiating Species by Each Predictor

As with k-Nearest Neighbors, some dimensions will make a stronger contribution to the decision boundary by more clearly separating the data points.
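
One way to see how well each predictor separates the species is to draw side-by-side boxplots, one panel per predictor. The sketch below is a minimal version using base R graphics; the names `predictors` and `old_par` are introduced here for illustration.

```r
# The four numeric predictors in iris
predictors <- names(iris)[1:4]

# 2 x 2 grid of boxplots; keep the old settings so they can be restored
old_par <- par(mfrow = c(2, 2))
for (v in predictors) {
  boxplot(iris[[v]] ~ iris$Species, main = v, xlab = "Species", ylab = v)
}
par(old_par)
```

Predictors whose boxes overlap little across the three species separate the classes more clearly.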

## Suggestion

Divide the **iris** data set into a training set and a testing set. Create k-Nearest Neighbors (kNN) and Support Vector Machine (SVM) models to predict Species. Fit the models on the training set and evaluate them on the testing set.

* Which variables are the most helpful to include in each model? 
* Which kernels give the best predictions? 
* Which model gives better predictions for this data set: kNN or SVM?
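
As a starting point, the sketch below shows one possible way to split **iris** and evaluate an SVM out of sample. The 70/30 split, the seed value, and the names `train_idx`, `train`, `test`, `fit`, and `pred` are assumptions made for illustration; the kNN model and the comparisons are left to you.

```r
library(e1071)

# Fix the random seed so the split is reproducible (seed value is arbitrary)
set.seed(3375)

# Randomly assign 70% of the rows to training, the rest to testing
n <- nrow(iris)
train_idx <- sample(n, size = round(0.7 * n))
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Fit on the training set only
fit <- svm(Species ~ ., data = train, kernel = "radial")

# Predict on rows the model has not seen
pred <- predict(fit, newdata = test)

# Out-of-sample confusion matrix and accuracy
table(Predicted = pred, Actual = test$Species)
mean(pred == test$Species)
```

Unlike `fitted`, which returns predictions for the training data, `predict` with `newdata` gives **_out-of-sample_** predictions.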