This chapter covers support vector machines (SVMs) in detail. You'll learn how to tune their hyperparameters and how to use kernels to fit non-linear decision boundaries.

# 1- Support vectors


video

# 2- Support vector definition


<p>Which of the following is a true statement about support vectors? To help you out, here&apos;s the picture of support vectors from the video (top),
as well as the hinge loss from Chapter 2 (bottom).</p>
<p><img src="https://s3.amazonaws.com/assets.datacamp.com/production/course_6199/datasets/slides_svm_sv.png" alt></p>
<p><img src="https://s3.amazonaws.com/assets.datacamp.com/production/course_6199/datasets/diagram_hinge_loss.png" alt></p>
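
One way to explore how the two pictures relate is to compute, for each training example, the raw model output multiplied by the true label (coded as -1 or +1), which is the quantity the hinge loss is computed from. The sketch below is only an illustration: it uses synthetic data from <code>make_blobs</code> rather than the course's data, and the names <code>margins</code> and <code>is_sv</code> are made up for this example.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Assumption: synthetic two-class data with some overlap (not the course's dataset)
X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.0, random_state=0)

svm = SVC(kernel="linear").fit(X, y)

# Raw model output times the true label in {-1, +1}
signed_y = 2 * y - 1
margins = signed_y * svm.decision_function(X)

# Hinge loss of each training example: max(0, 1 - margin)
hinge = np.maximum(0, 1 - margins)

# Boolean mask marking which training examples are support vectors
is_sv = np.zeros(len(X), dtype=bool)
is_sv[svm.support_] = True

print("Largest margin among support vectors:", margins[is_sv].max())
print("Smallest margin among other examples:", margins[~is_sv].min())
print("Total hinge loss of the non-support vectors:", hinge[~is_sv].sum())
```

Comparing the printed values with the point where the hinge loss diagram reaches zero shows which side of that point the support vectors fall on, up to the solver's numerical tolerance.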

# 3- Effect of removing examples

<p>Support vectors are defined as training examples that influence the decision boundary. In this exercise, you&apos;ll observe this behavior by removing the non-support vectors from the training set and checking that the decision boundary does not change.</p>
<p>The wine quality dataset is already loaded into <code>X</code> and <code>y</code> (first two features only). (Note: we specify <code>lims</code> in <code>plot_classifier()</code> so that the two plots are forced to use the same axis limits and can be compared directly.)</p>

<ul>
<li>Train a linear SVM on the whole data set.</li>
<li>Create a new data set containing only the support vectors.</li>
<li>Train a new linear SVM on the smaller data set.</li>
</ul>

In [None]:
# X, y, and plot_classifier() are provided by the exercise environment
from sklearn.svm import SVC

# Train a linear SVM
svm = SVC(kernel="linear")
svm.fit(X, y)
plot_classifier(X, y, svm, lims=(11,15,0,6))

# Make a new data set keeping only the support vectors
print("Number of original examples", len(X))
print("Number of support vectors", len(svm.support_))
X_small = X[svm.support_]
y_small = y[svm.support_]

# Train a new SVM using only the support vectors
svm_small = SVC(kernel="linear")
svm_small.fit(X_small, y_small)
plot_classifier(X_small, y_small, svm_small, lims=(11,15,0,6))
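
The two plots can also be compared numerically. The following is a minimal sketch, assuming scikit-learn's <code>load_wine</code> data (first two features) as a stand-in for the preloaded <code>X</code> and <code>y</code>; the exercise's own data may differ. The variable <code>agreement</code> is introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.svm import SVC

# Assumption: load_wine's first two features stand in for the course's X and y
X, y = load_wine(return_X_y=True)
X = X[:, :2]

# Linear SVM on the full data set
svm = SVC(kernel="linear").fit(X, y)

# Refit on the support vectors only
X_small, y_small = X[svm.support_], y[svm.support_]
svm_small = SVC(kernel="linear").fit(X_small, y_small)

# Fraction of the original examples on which both models predict the same class
agreement = np.mean(svm.predict(X) == svm_small.predict(X))
print("Support vectors:", len(svm.support_), "of", len(X))
print("Prediction agreement:", agreement)
```

If the non-support vectors really have no influence, the two models should agree on essentially every example, with any small differences coming from the solver's numerical tolerance.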





# 4- Kernel SVMs


video

# 5- GridSearchCV warm-up


<p>In the video we saw that increasing the RBF kernel hyperparameter <code>gamma</code> increases training accuracy. In this exercise we&apos;ll search for the <code>gamma</code> that maximizes cross-validation accuracy using scikit-learn&apos;s <code>GridSearchCV</code>. A binary version of the handwritten digits dataset, in which you&apos;re just trying to predict whether or not an image is a &quot;2&quot;, is already loaded into the variables <code>X</code> and <code>y</code>.</p>

<ul>
<li>Create a <code>GridSearchCV</code> object.</li>
<li>Call the <code>fit()</code> method to select the best value of <code>gamma</code> based on cross-validation accuracy.</li>
</ul>

In [None]:
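from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Instantiate an RBF SVM (SVC uses the RBF kernel by default)
svm = SVC()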

# Instantiate the GridSearchCV object and run the search
parameters = {'gamma':[0.00001, 0.0001, 0.001, 0.01, 0.1]}
searcher = GridSearchCV(svm, parameters)
searcher.fit(X,y)

# Report the best parameters
print("Best CV params", searcher.best_params_)
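
To see why the search uses cross-validation accuracy rather than training accuracy, it can help to print both for each candidate <code>gamma</code>. This is a rough sketch under an assumption: it builds the "2" vs. "not 2" labels from scikit-learn's <code>load_digits</code>, which may not match the exercise's preloaded data exactly. The <code>results</code> dictionary is a name chosen for this example.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Assumption: scikit-learn's digits data, relabeled as "2" vs. "not 2"
digits = load_digits()
X, y = digits.data, digits.target == 2

results = {}
for gamma in [0.00001, 0.0001, 0.001, 0.01, 0.1]:
    svm = SVC(gamma=gamma)
    # Accuracy on the same data the model was fit to
    train_acc = svm.fit(X, y).score(X, y)
    # Mean accuracy over 5 held-out folds
    cv_acc = cross_val_score(svm, X, y, cv=5).mean()
    results[gamma] = (train_acc, cv_acc)
    print(f"gamma={gamma:g}  train={train_acc:.3f}  cv={cv_acc:.3f}")
```

A gap between the two columns at large <code>gamma</code> would indicate overfitting, which is what <code>GridSearchCV</code> guards against by scoring on held-out folds.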

# 6- Jointly tuning gamma and C with GridSearchCV

<p>In the previous exercise the best value of <code>gamma</code> was 0.001 using the default value of <code>C</code>, which is 1. In this exercise you&apos;ll search for the best combination of <code>C</code> and <code>gamma</code> using <code>GridSearchCV</code>.</p>
<p>As in the previous exercise, the 2-vs-not-2 digits dataset is already loaded, but this time it&apos;s split into the variables <code>X_train</code>, <code>y_train</code>, <code>X_test</code>, and <code>y_test</code>. Even though cross-validation already splits the training set into parts, it&apos;s often a good idea to hold out a separate test set to make sure the cross-validation results are sensible.</p>

<ul>
<li>Run <code>GridSearchCV</code> to find the best hyperparameters using the training set.</li>
<li>Print the best values of the parameters.</li>
<li>Print out the accuracy on the test set, which was not used during the cross-validation procedure.</li>
</ul>

In [None]:
# Instantiate an RBF SVM
svm = SVC()

# Instantiate the GridSearchCV object and run the search
parameters = {'C':[0.1, 1, 10], 'gamma':[0.00001, 0.0001, 0.001, 0.01, 0.1]}
searcher = GridSearchCV(svm, parameters)
searcher.fit(X_train, y_train)

# Report the best parameters and the corresponding score
print("Best CV params", searcher.best_params_)
print("Best CV accuracy", searcher.best_score_)

# Report the test accuracy using these best parameters
print("Test accuracy of best grid search hypers:", searcher.score(X_test, y_test))

Note that the best value of <code>gamma</code>, 0.0001, differs from the value of 0.001 found in the previous exercise, where <code>C</code> was fixed at 1. Hyperparameters can interact: the best value of one can depend on the value of another.
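
The interaction can be inspected directly in the <code>cv_results_</code> attribute of a fitted <code>GridSearchCV</code>. The sketch below assumes scikit-learn's <code>load_digits</code> data and a split made with <code>train_test_split</code>, so its numbers need not match the exercise's; <code>best_gamma_for_C</code> is a hypothetical helper name.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Assumption: scikit-learn's digits data, relabeled as "2" vs. "not 2"
digits = load_digits()
X, y = digits.data, digits.target == 2
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y)

parameters = {'C': [0.1, 1, 10], 'gamma': [0.00001, 0.0001, 0.001, 0.01, 0.1]}
searcher = GridSearchCV(SVC(), parameters).fit(X_train, y_train)

# For each C, keep the gamma with the highest mean cross-validation accuracy
best_gamma_for_C = {}
for params, score in zip(searcher.cv_results_['params'],
                         searcher.cv_results_['mean_test_score']):
    C, gamma = params['C'], params['gamma']
    if C not in best_gamma_for_C or score > best_gamma_for_C[C][1]:
        best_gamma_for_C[C] = (gamma, score)

for C, (gamma, score) in best_gamma_for_C.items():
    print(f"C={C}: best gamma={gamma}, CV accuracy={score:.3f}")
```

If the printed <code>gamma</code> changes from one value of <code>C</code> to another, tuning the two hyperparameters one at a time could miss the best combination.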

# 7- Comparing logistic regression and SVM (and beyond)


video

# 8- An advantage of SVMs


Which of the following is an advantage of SVMs over logistic regression?



# 9- An advantage of logistic regression

Which of the following is an advantage of logistic regression over SVMs?



# 10- Using SGDClassifier

<p>In this final coding exercise, you&apos;ll do a hyperparameter search over the regularization type, regularization strength, and the loss (logistic regression vs. linear SVM) using <code>SGDClassifier()</code>.</p>

<ul>
<li>Instantiate an <code>SGDClassifier</code> instance with <code>random_state=0</code>.</li>
<li>Search over the regularization strength, the <code>hinge</code> vs. <code>log</code> losses (the latter is named <code>log_loss</code> in scikit-learn 1.1 and later), and L1 vs. L2 regularization.</li>
</ul>

In [None]:
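from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

# We set random_state=0 for reproducibility
linear_classifier = SGDClassifier(random_state=0)

# Instantiate the GridSearchCV object and run the search.
# 'hinge' gives a linear SVM; 'log_loss' gives logistic regression
# (scikit-learn versions before 1.1 call this loss 'log').
parameters = {'alpha': [0.00001, 0.0001, 0.001, 0.01, 0.1, 1],
              'loss': ['hinge', 'log_loss'], 'penalty': ['l1', 'l2']}
searcher = GridSearchCV(linear_classifier, parameters, cv=10)
searcher.fit(X_train, y_train)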

# Report the best parameters and the corresponding score
print("Best CV params", searcher.best_params_)
print("Best CV accuracy", searcher.best_score_)
print("Test accuracy of best grid search hypers:", searcher.score(X_test, y_test))