# Chapter 5 Exercises

## Exercise 1
What is the fundamental idea behind Support Vector Machines?

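Support Vector Machines try to find the decision boundary that separates the classes with the widest possible margin, that is, the largest possible distance between the boundary and the closest training instances of each class. With soft margin classification, the SVM looks for a compromise between a wide margin and a limited number of margin violations.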
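
As a minimal sketch of the large margin idea, the snippet below fits a linear SVC on a tiny made-up dataset and computes the width of the margin as 2 / ||w||. The toy points and the very large `C` (used to approximate a hard margin) are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two vertical columns of points, 2 units apart.
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y_toy = np.array([0, 0, 1, 1])

# A very large C approximates a hard margin classifier.
svm_clf = SVC(kernel="linear", C=1e6)
svm_clf.fit(X_toy, y_toy)

w = svm_clf.coef_[0]
margin_width = 2 / np.linalg.norm(w)  # distance between the two margin edges
print("w =", w, "b =", svm_clf.intercept_[0])
print("Margin width:", margin_width)
```

The margin width should come out close to the gap between the two columns of points, since the widest possible street here runs vertically between them.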

## Exercise 2
What is a support vector?

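A support vector is a training instance located on the edge of the margin or inside it (a margin violation). The support vectors alone determine the decision boundary: adding or removing instances that lie outside the margin does not change it, and computing a prediction only involves the support vectors.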
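
One way to check this claim is to refit the model on the support vectors only and compare the coefficients. The sketch below does that on the Iris setosa vs. Iris versicolor subset; the hyperparameters (`C=1`, a tight `tol`) are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()
keep = iris["target"] < 2          # Iris setosa vs. Iris versicolor
X = iris["data"][keep][:, 2:]      # petal length, petal width
y = iris["target"][keep]

svm_clf = SVC(kernel="linear", C=1, tol=1e-5)
svm_clf.fit(X, y)
sv_idx = svm_clf.support_          # indices of the support vectors
print("Support vectors:", len(sv_idx), "of", len(X), "instances")

# Refit using only the support vectors: the boundary should be nearly unchanged.
sv_only_clf = SVC(kernel="linear", C=1, tol=1e-5)
sv_only_clf.fit(X[sv_idx], y[sv_idx])
print("All instances:       ", svm_clf.coef_[0])
print("Support vectors only:", sv_only_clf.coef_[0])
```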

## Exercise 3
Why is it important to scale the inputs when using SVMs?

SVMs try to fit the widest possible margin between the classes, so the decision boundary is sensitive to feature scales. If the features have very different scales, the SVM tends to neglect the small-scale features, and we get a poor decision boundary.
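
A small sketch of the effect, assuming a made-up dataset whose second feature is on a much larger scale than the first. Putting a `StandardScaler` in front of the SVM in a pipeline is one common way to handle this.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical toy data: the second feature has a much larger scale than the first.
X_toy = np.array([[1.0, 50.0], [5.0, 20.0], [3.0, 80.0], [5.0, 60.0]])
y_toy = np.array([0, 0, 1, 1])

# Without scaling, distances are dominated by the large-scale feature.
unscaled_clf = SVC(kernel="linear", C=100)
unscaled_clf.fit(X_toy, y_toy)

# With scaling, both features are put on comparable terms before fitting.
scaled_clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=100))
scaled_clf.fit(X_toy, y_toy)

X_scaled = scaled_clf.named_steps["standardscaler"].transform(X_toy)
print("Feature std before scaling:", X_toy.std(axis=0))
print("Feature std after scaling: ", X_scaled.std(axis=0))
print("Unscaled coefficients:", unscaled_clf.coef_[0])
print("Scaled coefficients:  ", scaled_clf.named_steps["svc"].coef_[0])
```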

## Exercise 4
Can an SVM classifier output a confidence score when it classifies an instance? What about a probability?

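The SVM can output the signed distance between the instance and the decision boundary, which can be used as a confidence score. This score cannot be converted directly into an estimate of the class probability. However, Scikit-Learn's SVC can estimate probabilities when it is created with probability=True: after training, it calibrates the scores with Platt scaling (a logistic regression fitted on the SVM's scores using cross-validation), which adds the predict_proba() and predict_log_proba() methods.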
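
The sketch below shows both kinds of output. Confidence scores come from `decision_function()`. For probabilities it uses `CalibratedClassifierCV` with sigmoid (Platt) calibration, which is one illustrative way to calibrate SVM scores; the dataset, class split, and hyperparameters are assumptions.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = load_iris()
X = iris["data"][:, 2:]                    # petal length, petal width
y = (iris["target"] == 2).astype(int)      # Iris virginica vs. the rest

# Confidence scores: signed distances to the decision boundary.
svm_clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1))
svm_clf.fit(X, y)
scores = svm_clf.decision_function(X[:3])
print("Scores:", np.round(scores, 3))

# Probability estimates: sigmoid calibration of the scores, fitted with 5-fold CV.
calibrated_clf = CalibratedClassifierCV(
    make_pipeline(StandardScaler(), SVC(kernel="linear", C=1)),
    method="sigmoid", cv=5)
calibrated_clf.fit(X, y)
probas = calibrated_clf.predict_proba(X[:3])
print("Probabilities:\n", np.round(probas, 3))
```

The first three instances are Iris setosa flowers, so their scores should be negative (on the "not virginica" side of the boundary).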

## Exercise 5
Should you use the primal or the dual form of the SVM problem to train a model on a training set with millions of instances and hundreds of features?

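The primal form should be used in this case. Note that this question only applies to linear SVMs, since kernelized SVMs can only use the dual form. The computational complexity of the primal form is proportional to the number of training instances m, while the complexity of the dual form is between m^2 and m^3, so with millions of instances the dual form would be far too slow. The dual form is what makes the kernel trick possible, and it is faster than the primal form when the number of instances is smaller than the number of features.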
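
In Scikit-Learn, `LinearSVC` exposes a `dual` hyperparameter, and setting `dual=False` asks it to solve the primal problem. The sketch below uses a synthetic dataset with many more instances than features; the dataset sizes are arbitrary assumptions, kept small enough to run quickly.

```python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic data with far more instances than features (sizes are illustrative).
X, y = make_classification(n_samples=20_000, n_features=50, random_state=42)

# dual=False selects the primal formulation, preferred when n_samples > n_features.
primal_clf = LinearSVC(dual=False, C=1)
primal_clf.fit(X, y)
train_accuracy = primal_clf.score(X, y)
print("Training accuracy:", train_accuracy)
```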

## Exercise 6
Say you trained an SVM classifier with an RBF kernel. It seems to underfit the training set: should you increase or decrease gamma? What about C?

Since the classifier underfits the training set (high bias), it is probably too regularized. To reduce the regularization, we should increase gamma or C (or both).
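
To see the effect, the sketch below trains RBF-kernel SVCs on the moons dataset with increasing values of gamma and C and prints the training accuracy. The dataset and the specific hyperparameter values are assumptions picked for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=42)

results = {}
# From heavily regularized (small gamma and C) to lightly regularized (large gamma and C).
for gamma, C in [(0.01, 0.01), (0.1, 1), (5, 1000)]:
    clf = SVC(kernel="rbf", gamma=gamma, C=C)
    clf.fit(X, y)
    results[(gamma, C)] = clf.score(X, y)
    print(f"gamma={gamma}, C={C}: training accuracy = {results[(gamma, C)]:.3f}")
```

A high training accuracy alone does not mean the model generalizes well: values of gamma and C that are too large lead to overfitting, so they should be tuned on a validation set.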

## Exercise 7
How should you set the QP parameters (H, f, A, and b) to solve the soft margin linear SVM classifier problem using an off-the-shelf QP solver?

:(
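
For reference, here is one possible setup. It assumes the solver minimizes (1/2) pᵀHp + fᵀp subject to Ap ≤ b, that the labels t are encoded as -1 and +1, and that the parameter vector is p = [bias, w_1, ..., w_n, ζ_1, ..., ζ_m] for m instances and n features. The soft margin objective (1/2) wᵀw + C Σ ζ_i then gives an H that is the identity on the w block and zero elsewhere, and an f that is zero except for C on the slack variables. The constraints t_i (wᵀx_i + bias) ≥ 1 - ζ_i and ζ_i ≥ 0 give the 2m rows of A and b. The function name `soft_margin_qp_params` is hypothetical, and other solvers may use different conventions.

```python
import numpy as np

def soft_margin_qp_params(X, t, C):
    """Build H, f, A, b for: minimize 0.5 * p^T H p + f^T p  subject to  A p <= b.

    Assumed layout: p = [bias, w_1..w_n, zeta_1..zeta_m], labels t in {-1, +1}.
    """
    m, n = X.shape
    n_p = 1 + n + m

    # Objective: 0.5 * w^T w (no penalty on the bias) + C * sum(zeta)
    H = np.zeros((n_p, n_p))
    H[1:n + 1, 1:n + 1] = np.eye(n)
    f = np.concatenate([np.zeros(n + 1), C * np.ones(m)])

    # Margin constraints: -t_i * (bias + w^T x_i) - zeta_i <= -1
    X_b = np.c_[np.ones(m), X]
    A_margin = np.hstack([-t[:, None] * X_b, -np.eye(m)])
    # Slack constraints: -zeta_i <= 0
    A_slack = np.hstack([np.zeros((m, n + 1)), -np.eye(m)])

    A = np.vstack([A_margin, A_slack])
    b = np.concatenate([-np.ones(m), np.zeros(m)])
    return H, f, A, b

# Tiny check on two hypothetical points: bias=-1, w=(1, 0), no slack is feasible.
X_toy = np.array([[0.0, 0.0], [2.0, 0.0]])
t_toy = np.array([-1.0, 1.0])
H, f, A, b = soft_margin_qp_params(X_toy, t_toy, C=1.0)
p = np.array([-1.0, 1.0, 0.0, 0.0, 0.0])
objective = 0.5 * p @ H @ p + f @ p
print("Feasible:", np.all(A @ p <= b + 1e-12), "- objective:", objective)
```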

## Exercise 8
Train a LinearSVC on a linearly separable dataset. Then train an SVC and a SGDClassifier on the same dataset. See if you can get them to produce roughly the same model.

In [1]:
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = iris["data"][:, (2, 3)]  # petal length, petal width
y = iris["target"]

setosa_or_versicolor = (y == 0) | (y == 1)
X = X[setosa_or_versicolor]
y = y[setosa_or_versicolor]

scaler = StandardScaler()
X = scaler.fit_transform(X)

In [2]:
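C = 4
# SGDClassifier has no C hyperparameter: its regularization strength alpha
# roughly corresponds to 1 / (C * m), where m is the number of instances.
alpha = 1 / (C * len(X))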

In [3]:
from sklearn.svm import LinearSVC, SVC
from sklearn.linear_model import SGDClassifier

classifiers = [
    LinearSVC(loss="hinge", C=C),
    SVC(kernel="linear", C=C),
    SGDClassifier(max_iter=1000, alpha=alpha)
]

for clf in classifiers:
    clf.fit(X, y)
    print("Classifier: {0} - Intercept: {1} - Coefs: {2}".format(
            clf.__class__.__name__, clf.intercept_, clf.coef_))

Classifier: LinearSVC - Intercept: [0.2848091] - Coefs: [[1.05542597 1.09851911]]
Classifier: SVC - Intercept: [0.31933577] - Coefs: [[1.1223101  1.02531081]]
Classifier: SGDClassifier - Intercept: [0.32089878] - Coefs: [[1.12693779 1.02328669]]
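
To judge whether these are "roughly the same model", it can help to compare the decision boundaries rather than the raw parameters. The sketch below takes the intercepts and coefficients printed above and rewrites each boundary as a line x2 = slope * x1 + offset in the scaled feature space. The dictionary layout is just an illustrative choice.

```python
import numpy as np

# Intercepts and coefficients copied from the output printed above.
models = {
    "LinearSVC": (0.2848091, [1.05542597, 1.09851911]),
    "SVC": (0.31933577, [1.1223101, 1.02531081]),
    "SGDClassifier": (0.32089878, [1.12693779, 1.02328669]),
}

boundaries = {}
for name, (intercept, coef) in models.items():
    w1, w2 = coef
    # Decision boundary: w1*x1 + w2*x2 + b = 0  =>  x2 = -(w1/w2)*x1 - b/w2
    boundaries[name] = (-w1 / w2, -intercept / w2)
    print(f"{name}: slope = {boundaries[name][0]:.3f}, offset = {boundaries[name][1]:.3f}")

slopes = np.array([slope for slope, _ in boundaries.values()])
print("Largest difference between slopes:", slopes.max() - slopes.min())
```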


## Exercise 9

Train an SVM classifier on the MNIST dataset. Since SVM classifiers are binary classifiers, you will need to use one-versus-all to classify all 10 digits. What accuracy can you reach?

In [4]:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

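# Note: load_digits() loads Scikit-Learn's small 8x8 digits dataset (1,797 images),
# not the full 70,000-image MNIST dataset that the exercise refers to.
digits = load_digits()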
X = digits['data']
y = digits['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)

In [5]:
from sklearn.model_selection import cross_val_predict, GridSearchCV
from sklearn.metrics import accuracy_score

params = {
    'loss': ('hinge', 'squared_hinge'),
    'C': [0.1, 0.33, 0.66, 1, 1.33, 2, 3, 5, 10],
    'multi_class': ('ovr', 'crammer_singer'),
}

clf = LinearSVC()
grid = GridSearchCV(clf, params)
grid.fit(X_train, y_train)

# Passing the GridSearchCV object reruns the whole search inside each fold (nested CV).
y_pred = cross_val_predict(grid, X_train, y_train)
print("Accuracy: ", accuracy_score(y_train, y_pred))

Accuracy:  0.9568545581071677


In [6]:
grid.best_params_

{'C': 0.1, 'loss': 'hinge', 'multi_class': 'crammer_singer'}
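
The accuracy above is measured with cross-validation on the training set. As a complementary sketch, the snippet below evaluates an explicit one-versus-rest SVM on the held-out test set. The choice of an RBF-kernel `SVC` wrapped in `OneVsRestClassifier`, the scaling step, and the hyperparameters are assumptions and have not been tuned.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits["data"], digits["target"], test_size=0.2, random_state=7)

# One binary SVM per digit, each trained to separate that digit from all others.
ovr_clf = make_pipeline(
    StandardScaler(),
    OneVsRestClassifier(SVC(kernel="rbf", gamma="scale", C=1)))
ovr_clf.fit(X_train, y_train)

test_accuracy = accuracy_score(y_test, ovr_clf.predict(X_test))
print("Test accuracy:", test_accuracy)
```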

## Exercise 10

Train an SVM regressor on the California housing dataset.

In [22]:
from sklearn.datasets import fetch_california_housing

housing_data = fetch_california_housing()
X = housing_data['data']
y = housing_data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)

In [23]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [20]:
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, cross_val_predict
from sklearn.svm import LinearSVR

params = {
    'C': [0.1, 0.33, 0.66, 1, 1.33, 1.66, 3, 10],
    'loss': ('epsilon_insensitive', 'squared_epsilon_insensitive'),
    'epsilon': (0, 0.01, 0.1, 0.33, 1),
}

# The grid search uses an unseeded LinearSVR, so results can vary between runs.
grid = GridSearchCV(LinearSVR(), params)
grid.fit(X_train_scaled, y_train)

y_pred = grid.best_estimator_.predict(X_train_scaled)
print("MSE: ", mean_squared_error(y_train, y_pred))

MSE:  3.2433864205869885


In [25]:
from sklearn.svm import SVR
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import reciprocal, uniform

param_distributions = {"gamma": reciprocal(0.001, 0.1), "C": uniform(1, 10)}
rnd_search_cv = RandomizedSearchCV(SVR(), param_distributions, n_iter=10, verbose=2, random_state=42)
rnd_search_cv.fit(X_train_scaled, y_train)

Fitting 3 folds for each of 10 candidates, totalling 30 fits
[CV] C=4.745401188473625, gamma=0.07969454818643928 ..................
[CV] ... C=4.745401188473625, gamma=0.07969454818643928, total=   5.7s
[CV] C=4.745401188473625, gamma=0.07969454818643928 ..................


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    7.7s remaining:    0.0s


[CV] ... C=4.745401188473625, gamma=0.07969454818643928, total=   5.6s
[CV] C=4.745401188473625, gamma=0.07969454818643928 ..................
[CV] ... C=4.745401188473625, gamma=0.07969454818643928, total=   5.6s
[CV] C=8.31993941811405, gamma=0.015751320499779724 ..................
[CV] ... C=8.31993941811405, gamma=0.015751320499779724, total=   5.2s
[CV] C=8.31993941811405, gamma=0.015751320499779724 ..................
[CV] ... C=8.31993941811405, gamma=0.015751320499779724, total=   5.3s
[CV] C=8.31993941811405, gamma=0.015751320499779724 ..................
[CV] ... C=8.31993941811405, gamma=0.015751320499779724, total=   5.2s
[CV] C=2.560186404424365, gamma=0.002051110418843397 .................
[CV] .. C=2.560186404424365, gamma=0.002051110418843397, total=   4.8s
[CV] C=2.560186404424365, gamma=0.002051110418843397 .................
[CV] .. C=2.560186404424365, gamma=0.002051110418843397, total=   4.8s
[CV] C=2.560186404424365, gamma=0.002051110418843397 .................
[CV] .

[Parallel(n_jobs=1)]: Done  30 out of  30 | elapsed:  3.5min finished


RandomizedSearchCV(cv=None, error_score='raise',
          estimator=SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma='auto',
  kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False),
          fit_params=None, iid=True, n_iter=10, n_jobs=1,
          param_distributions={'gamma': <scipy.stats._distn_infrastructure.rv_frozen object at 0x120d382e8>, 'C': <scipy.stats._distn_infrastructure.rv_frozen object at 0x120d38898>},
          pre_dispatch='2*n_jobs', random_state=42, refit=True,
          return_train_score='warn', scoring=None, verbose=2)

In [26]:
import numpy as np

# RMSE of the best model, measured on the training set
y_pred = rnd_search_cv.best_estimator_.predict(X_train_scaled)
mse = mean_squared_error(y_train, y_pred)
np.sqrt(mse)

0.5623681456069524