# Chapter 5 Exercises

## Exercise 1
What is the fundamental idea behind Support Vector Machines?

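Support Vector Machines try to fit the widest possible "street" (margin) between the classes: the decision boundary should separate the classes while staying as far as possible from the closest training instances. With soft margin classification, the SVM looks for a compromise between a wide margin and a limited number of margin violations.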

## Exercise 2
What is a support vector?

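A support vector is a training instance located on the edge of the margin or inside it (a margin violation). The decision boundary is determined entirely by the support vectors: adding or removing instances outside the margin does not change it, and predictions involve only the support vectors, not the whole training set.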
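As a rough illustration of the answer above, the sketch below fits a linear `SVC` on two Iris classes, then refits on the support vectors alone. The choice of dataset, the large `C` (used here to approximate a hard margin), and the variable names are assumptions made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()
X = iris["data"][:, 2:4]   # petal length, petal width
y = iris["target"]
mask = y < 2               # keep setosa and versicolor only
X, y = X[mask], y[mask]

# A very large C approximates hard margin classification (assumption for this sketch)
svm_clf = SVC(kernel="linear", C=1e6).fit(X, y)
support_idx = svm_clf.support_   # indices of the support vectors in X

# Refit using only the support vectors
svm_sv_only = SVC(kernel="linear", C=1e6).fit(X[support_idx], y[support_idx])

print(len(support_idx), "support vectors out of", len(X), "instances")
print("Full training set:   ", svm_clf.coef_, svm_clf.intercept_)
print("Support vectors only:", svm_sv_only.coef_, svm_sv_only.intercept_)
```

The two printed models should be close to each other, which is the sense in which the remaining instances do not influence the boundary.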

## Exercise 3
Why is it important to scale the inputs when using SVMs?

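The decision boundary found by an SVM is sensitive to feature scales. The SVM tries to fit the widest possible margin between the classes, so if the features have very different scales, it tends to neglect the small-scale features and we get a poor decision boundary.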
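One common way to handle this is to put the scaler and the classifier in a single pipeline, so the same scaling is applied at training and prediction time. This is a minimal sketch; the dataset, the hyperparameters, and the sample passed to `predict` are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = load_iris()
X = iris["data"][:, 2:4]                 # petal length, petal width
y = (iris["target"] == 2).astype(int)    # 1 if Iris virginica, else 0

# StandardScaler is fitted on the training data, then LinearSVC sees scaled features
scaled_svm = make_pipeline(
    StandardScaler(),
    LinearSVC(C=1, loss="hinge", random_state=42),
)
scaled_svm.fit(X, y)

train_accuracy = scaled_svm.score(X, y)
print("Training accuracy:", train_accuracy)
print("Prediction for [5.5, 1.7]:", scaled_svm.predict([[5.5, 1.7]]))
```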

## Exercise 4
Can an SVM classifier output a confidence score when it classifies an instance? What about a probability?

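The SVM can output the distance between the instance and the decision boundary, which can be used as a confidence score for the classification. This score cannot be converted directly into a class probability. However, in Scikit-Learn, setting probability=True when creating an SVC makes it calibrate probabilities after training (using logistic regression on the SVM's scores, trained with additional cross-validation), which adds the predict_proba() and predict_log_proba() methods.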
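The difference between a confidence score and a probability can be sketched with Scikit-Learn's `SVC`. The dataset and the sample below are assumptions for illustration; `decision_function()` returns the signed score, while `predict_proba()` is only available when the model is created with `probability=True`.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()
X = iris["data"][:, 2:4]                 # petal length, petal width
y = (iris["target"] == 2).astype(int)    # 1 if Iris virginica, else 0

# probability=True triggers an extra calibration step during fit()
svm_clf = SVC(kernel="linear", C=1, probability=True, random_state=42)
svm_clf.fit(X, y)

sample = [[5.5, 1.7]]
score = svm_clf.decision_function(sample)  # signed score: sign gives the class, magnitude the confidence
proba = svm_clf.predict_proba(sample)      # calibrated estimates, one column per class

print("Decision score:", score)
print("Estimated probabilities:", proba)
```

For a linear kernel, dividing the score by the norm of `coef_` gives the geometric distance to the boundary. The calibrated probabilities are estimates and may occasionally disagree with `predict()`.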

## Exercise 5
Should you use the primal or the dual form of the SVM problem to train a model on a training set with millions of instances and hundreds of features?

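This question applies only to linear SVMs, since kernelized SVMs can only use the dual form. The primal form should be used in this case: its computational complexity is proportional to the number of training instances m, while the complexity of the dual form is between m^2 and m^3. With millions of instances, the dual form would be far too slow. The dual form is what makes the kernel trick possible, and it is faster to solve than the primal when the number of instances is smaller than the number of features.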
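In Scikit-Learn, `LinearSVC` exposes this choice through its `dual` parameter. The sketch below uses a small synthetic dataset as a stand-in for a large one (the dataset size and parameters are assumptions for illustration only).

```python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Many more instances than features, so the primal problem is the natural choice
X, y = make_classification(n_samples=10_000, n_features=100, random_state=42)

# dual=False solves the primal problem; it requires the default squared hinge loss
primal_svm = LinearSVC(dual=False, C=1.0)
primal_svm.fit(X, y)

print("Coefficient matrix shape:", primal_svm.coef_.shape)
```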

## Exercise 6
Say you trained an SVM classifier with an RBF kernel. It seems to underfit the training set: should you increase or decrease gamma? What about C?

Since the classifier is suffering from high bias (underfitting), it is probably regularized too much. To decrease regularization, we should increase gamma, C, or both.
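A quick way to see this effect is to compare training accuracy for a heavily regularized RBF model and a much less regularized one. This is only a sketch: the moons dataset, the two hyperparameter settings, and the helper `train_accuracy` are assumptions made for the example.

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=42)

def train_accuracy(gamma, C):
    """Fit a scaled RBF SVC and return its accuracy on the training set."""
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma=gamma, C=C))
    model.fit(X, y)
    return model.score(X, y)

acc_rigid = train_accuracy(gamma=0.01, C=0.01)    # strong regularization
acc_flexible = train_accuracy(gamma=5, C=1000)    # weak regularization

print("Low gamma, low C:  ", acc_rigid)
print("High gamma, high C:", acc_flexible)
```

Pushing gamma and C too high leads to the opposite problem (overfitting), so in practice the values would be tuned with cross-validation.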

## Exercise 7
How should you set the QP parameters (H, f, A, and b) to solve the soft margin linear SVM classifier problem using an off-the-shelf QP solver?

:(
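One possible setup, sketched under stated assumptions: the solver minimizes $\tfrac{1}{2}\mathbf{p}^\top \mathbf{H}\mathbf{p} + \mathbf{f}^\top \mathbf{p}$ subject to $\mathbf{A}\mathbf{p} \le \mathbf{b}$, there are $m$ training instances with $n$ features, the labels are $t^{(i)} \in \{-1, +1\}$, and $w_0$ denotes the bias term. The symbols $\zeta^{(i)}$, $\bar{\mathbf{X}}$, and $\mathbf{T}$ are notation introduced here for illustration.

The soft margin linear SVM objective is

$$
\min_{\mathbf{w},\, w_0,\, \boldsymbol{\zeta}} \; \tfrac{1}{2}\mathbf{w}^\top\mathbf{w} + C\sum_{i=1}^{m}\zeta^{(i)}
\quad \text{subject to} \quad
t^{(i)}\left(\mathbf{w}^\top\mathbf{x}^{(i)} + w_0\right) \ge 1 - \zeta^{(i)}, \quad \zeta^{(i)} \ge 0 .
$$

Stacking the unknowns as $\mathbf{p} = (w_0, w_1, \dots, w_n, \zeta^{(1)}, \dots, \zeta^{(m)})^\top$ (so $n + 1 + m$ parameters and $2m$ constraints), and writing $\bar{\mathbf{X}}$ for the $m \times (n+1)$ matrix whose $i$-th row is $(1, \mathbf{x}^{(i)\top})$ and $\mathbf{T} = \operatorname{diag}(t^{(1)}, \dots, t^{(m)})$, the QP parameters would be

$$
\mathbf{H} =
\begin{pmatrix}
0 & \mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{I}_n & \mathbf{0} \\
\mathbf{0} & \mathbf{0} & \mathbf{0}_{m \times m}
\end{pmatrix},
\qquad
\mathbf{f} =
\begin{pmatrix}
\mathbf{0}_{n+1} \\
C\,\mathbf{1}_m
\end{pmatrix},
\qquad
\mathbf{A} =
\begin{pmatrix}
-\mathbf{T}\bar{\mathbf{X}} & -\mathbf{I}_m \\
\mathbf{0}_{m \times (n+1)} & -\mathbf{I}_m
\end{pmatrix},
\qquad
\mathbf{b} =
\begin{pmatrix}
-\mathbf{1}_m \\
\mathbf{0}_m
\end{pmatrix}.
$$

The first $m$ rows of $\mathbf{A}\mathbf{p} \le \mathbf{b}$ encode $-t^{(i)}(\mathbf{w}^\top\mathbf{x}^{(i)} + w_0) - \zeta^{(i)} \le -1$, and the last $m$ rows encode $-\zeta^{(i)} \le 0$. A solver with a different sign or inequality convention would need the corresponding adjustments.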

## Exercise 8
Train a LinearSVC on a linearly separable dataset. Then train an SVC and a SGDClassifier on the same dataset. See if you can get them to produce roughly the same model.

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = iris["data"][:, (2, 3)]  # petal length, petal width
y = iris["target"]

# Setosa and versicolor are linearly separable on these two features
setosa_or_versicolor = (y == 0) | (y == 1)
X = X[setosa_or_versicolor]
y = y[setosa_or_versicolor]

scaler = StandardScaler()
X = scaler.fit_transform(X)
```

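```python
C = 4
# SGDClassifier is regularized through alpha rather than C;
# alpha = 1 / (C * m), with m training instances, makes the two comparable
alpha = 1 / (C * len(X))
```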

```python
from sklearn.svm import LinearSVC, SVC
from sklearn.linear_model import SGDClassifier

classifiers = [
    LinearSVC(loss="hinge", C=C),
    SVC(kernel="linear", C=C),
    SGDClassifier(max_iter=1000, alpha=alpha)  # default loss is "hinge", i.e. a linear SVM
]

for clf in classifiers:
    clf.fit(X, y)
    print("Classifier: {0} - Intercept: {1} - Coefs: {2}".format(
            clf.__class__.__name__, clf.intercept_, clf.coef_))
```

```
Classifier: LinearSVC - Intercept: [0.28480924] - Coefs: [[1.05542607 1.09851927]]
Classifier: SVC - Intercept: [0.31933577] - Coefs: [[1.1223101  1.02531081]]
Classifier: SGDClassifier - Intercept: [0.3210888] - Coefs: [[1.12603939 1.02562733]]
```
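To judge whether the three models are "roughly the same", it can help to convert each one into the slope and offset of its decision boundary, the line where w0 * x0 + w1 * x1 + b = 0 in the scaled feature space. The sketch below reuses the intercepts and coefficients printed above; the dictionary and variable names are introduced here for illustration.

```python
# Intercepts and coefficients copied from the output above
models = {
    "LinearSVC": (0.28480924, (1.05542607, 1.09851927)),
    "SVC": (0.31933577, (1.1223101, 1.02531081)),
    "SGDClassifier": (0.3210888, (1.12603939, 1.02562733)),
}

slopes = {}
offsets = {}
for name, (b, (w0, w1)) in models.items():
    # Boundary: w0 * x0 + w1 * x1 + b = 0  =>  x1 = -(w0 / w1) * x0 - b / w1
    slopes[name] = -w0 / w1
    offsets[name] = -b / w1
    print("{0}: x1 = {1:.3f} * x0 + {2:.3f}".format(name, slopes[name], offsets[name]))
```

Similar slopes and offsets mean the three boundaries nearly coincide, even when the raw coefficients differ slightly.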
