# Exercises

1. What is the fundamental idea behind support vector machines?
2. What is a support vector?
3. Why is it important to scale the inputs when using SVMs?
4. Can an SVM classifier output a confidence score when it classifies an instance? What about a probability?
5. Should you use the primal or dual form of the SVM problem to train a model on a training set with millions of instances & hundreds of features?
6. Say you trained an SVM classifier with an RBF kernel. It seems to underfit the training set: should you increase or decrease $\gamma$ (`gamma`)? What about `C`?
7. How should you set the QP parameters ($H$, $f$, $A$, & $b$) to solve the soft margin linear SVM classifier problem using an off-the-shelf QP solver?
8. Train a `LinearSVC` on a linearly separable dataset. Then train an `SVC` & an `SGDClassifier` on the same dataset. See if you can get them to produce roughly the same model.
9. Train an SVM classifier on the MNIST dataset. Since SVM classifiers are binary classifiers, you will need to use one-versus-all to classify all 10 digits. You may want to tune the hyperparameters using small validation sets to speed up the process. What accuracy can you reach?
10. Train an SVM regressor on the California housing dataset.

---

1. For SVM classification, the goal is to find the decision boundary (hyperplane) with the widest possible margin between the classes, that is, to maximise the distance between the hyperplane & the closest training instances. With soft margin classification, this is balanced against limiting margin violations (instances that end up inside the margin or on the wrong side of it). For SVM regression, it's the opposite: you want to fit as many instances as possible inside the margin & minimise the number of instances outside of it.
2. Support vectors are the training instances that lie on the edge of the margin or violate it. They alone define the decision boundary: instances that lie strictly outside the margin, on the correct side, do not affect it. Computing the predictions only involves the support vectors, not all instances.
3. Support vector machines are sensitive to feature scales: the model looks for the widest margin, so features with large scales dominate & features with small scales tend to be neglected. Scaling the inputs gives each feature a comparable influence on the margin, which improves model performance (larger margin, less margin violation) & usually speeds up training.
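4. An SVM classifier can output the signed distance between an instance & the decision boundary, which can be used as a confidence score. This score cannot be converted directly into a class probability. However, if you set `probability=True` when creating an `SVC` in Scikit-Learn, it calibrates the scores after training (by fitting a logistic regression on them, using cross-validation), which adds the `predict_proba()` & `predict_log_proba()` methods.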
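5. This question applies only to linear SVMs, since kernelised SVMs can only use the dual form. The dual form of the SVM problem is faster to solve than the primal only when the number of training instances is smaller than the number of features. With millions of instances & hundreds of features the opposite is true, so you should use the primal form.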
6. An RBF-kernel classifier that underfits the training set is probably regularised too much. To reduce the regularisation, increase $\gamma$ (`gamma`), increase `C`, or both.
7. For a hard margin linear SVM classifier, the QP parameters are defined as:
   * $n_p = n + 1$, where $n$ is the number of features (+1 for the bias term)
   * $n_c = m$, where $m$ is the number of training instances
   * $H$ is the $n_p \times n_p$ identity matrix, except with a zero in the top-left cell (to ignore the bias term)
   * $f = 0$, an $n_p$-dimensional vector full of 0s
   * $b = -1$, an $n_c$-dimensional vector full of -1s
   * $a^{(i)} = -t^{(i)}\dot{x}^{(i)}$ is the $i$-th row of $A$, where $\dot{x}^{(i)}$ is equal to $x^{(i)}$ with an extra bias feature $\dot{x}_0 = 1$
   
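   For a soft margin linear SVM classifier, the QP problem has $m$ additional parameters (the slack variables $\zeta^{(i)}$) & $m$ additional constraints ($\zeta^{(i)} \ge 0$), so:
   * $n_p = n + 1 + m$
   * $n_c = 2m$
   * $H$ is the hard margin $H$, plus $m$ columns of 0s on the right & $m$ rows of 0s at the bottom (giving an $n_p \times n_p$ matrix)
   * $f$ is the hard margin $f$ with $m$ additional elements, all equal to the value of hyperparameter `C`
   * $b$ is the hard margin $b$ with $m$ additional elements, all equal to 0
   * $A$ is the hard margin $A$ extended with two $m \times m$ blocks on the right, one above the other, while the rest is filled with 0s. With the constraints written as $Ap \le b$, the upper block comes from $-t^{(i)}\dot{x}^{(i)\top}\dot{w} - \zeta^{(i)} \le -1$ (where $\dot{w}$ holds the bias & weights) & the lower block from $-\zeta^{(i)} \le 0$, so both are $-I_m$, the negated identity matrix.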
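
The soft margin construction in answer 7 can be sketched in NumPy. This is only an illustration under stated assumptions: the helper name `soft_margin_qp_params` and the toy data are made up for this example, the parameter vector is assumed to be ordered as bias, weights, then slack variables, and the constraints are assumed to be written as $Ap \le b$ (which is why both slack blocks appear as a negated identity matrix). The block builds the matrices & prints their shapes; it does not call a QP solver.

```python
import numpy as np

def soft_margin_qp_params(X, t, C):
    """Build H, f, A, b for: minimise 0.5 * p^T H p + f^T p  subject to  A p <= b.

    Hypothetical helper written for illustration.
    p = [bias, w_1..w_n, zeta_1..zeta_m]; t holds labels in {-1, +1}.
    """
    m, n = X.shape
    n_p = n + 1 + m
    # H: identity on the weight entries only (zeros for the bias & the slack variables)
    H = np.zeros((n_p, n_p))
    H[1:n + 1, 1:n + 1] = np.eye(n)
    # f: zeros for the bias & weights, C for each slack variable
    f = np.concatenate([np.zeros(n + 1), C * np.ones(m)])
    # Margin constraints: -t_i * (x_dot_i . [bias, w]) - zeta_i <= -1
    X_dot = np.c_[np.ones(m), X]
    A_margin = np.hstack([-t[:, None] * X_dot, -np.eye(m)])
    # Slack constraints: -zeta_i <= 0
    A_slack = np.hstack([np.zeros((m, n + 1)), -np.eye(m)])
    A = np.vstack([A_margin, A_slack])
    b = np.concatenate([-np.ones(m), np.zeros(m)])
    return H, f, A, b

# Toy data: 4 instances, 2 features
X_toy = np.array([[1.0, 2.0], [2.0, 3.0], [-1.0, -1.5], [-2.0, -1.0]])
t_toy = np.array([1.0, 1.0, -1.0, -1.0])
H, f, A, b = soft_margin_qp_params(X_toy, t_toy, C=1.0)
print(H.shape, f.shape, A.shape, b.shape)  # (7, 7) (7,) (8, 7) (8,)
```

With $m = 4$ & $n = 2$, this gives $n_p = 7$ parameters & $n_c = 8$ constraints, matching the counts listed above.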

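The difference between a confidence score & a probability (answer 4) can be seen with a short sketch. It assumes synthetic data from `make_classification` & a linear kernel, both chosen only for illustration. `decision_function()` returns signed scores (proportional to the distance from the decision boundary), while `predict_proba()` is only available because `probability=True` asks Scikit-Learn to calibrate those scores.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic two-class data (an assumption made only for this illustration)
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=42)

# probability=True makes Scikit-Learn fit an extra calibration step on the SVM scores
clf = make_pipeline(StandardScaler(),
                    SVC(kernel="linear", probability=True, random_state=42))
clf.fit(X_demo, y_demo)

scores = clf.decision_function(X_demo[:5])  # signed scores; the sign gives the predicted side
probas = clf.predict_proba(X_demo[:5])      # calibrated estimates; each row sums to 1
print(scores)
print(probas)
```

Because the calibration is fitted separately with cross-validation, the estimated probabilities can occasionally disagree with the class returned by `predict()` for instances close to the boundary.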
# 8.

In [1]:
from sklearn import datasets
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]
y = (iris["target"] == 2).astype(np.float64)
sss = StratifiedShuffleSplit(n_splits = 1, test_size = 0.2, random_state = 32)

for train_indices, test_indices in sss.split(X, y):
    X_train, y_train = X[train_indices], y[train_indices]
    X_test, y_test = X[test_indices], y[test_indices]
X_train



array([[5.1, 1.9],
       [1.7, 0.2],
       [5.3, 1.9],
       [4.4, 1.4],
       [1.5, 0.2],
       [1.5, 0.2],
       [1.4, 0.2],
       [4.9, 1.5],
       [6.4, 2. ],
       [5.6, 2.4],
       [3. , 1.1],
       [4.3, 1.3],
       [1.6, 0.2],
       [5.1, 1.9],
       [1.3, 0.3],
       [3.8, 1.1],
       [4.9, 1.8],
       [4.5, 1.5],
       [1.6, 0.2],
       [4. , 1.2],
       [4.8, 1.8],
       [5. , 1.9],
       [1.4, 0.2],
       [1.4, 0.1],
       [1.4, 0.3],
       [1.1, 0.1],
       [1.6, 0.4],
       [1.4, 0.1],
       [1.7, 0.5],
       [5.1, 2. ],
       [5.1, 2.3],
       [4.9, 1.8],
       [4.3, 1.3],
       [4.6, 1.3],
       [1.4, 0.3],
       [1.4, 0.2],
       [1.6, 0.2],
       [4.5, 1.5],
       [4.8, 1.8],
       [5.1, 1.8],
       [4.7, 1.2],
       [1.4, 0.3],
       [3.9, 1.1],
       [4.7, 1.5],
       [3.3, 1. ],
       [4.7, 1.4],
       [6.3, 1.8],
       [5.8, 2.2],
       [1.5, 0.2],
       [5.8, 1.6],
       [1.3, 0.2],
       [4.2, 1.3],
       [4.7,

In [2]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

prep_steps = Pipeline([("scaler", StandardScaler())])
X_train_prepared = prep_steps.fit_transform(X_train)
X_test_prepared = prep_steps.transform(X_test)  # reuse the training-set statistics; do not refit on the test set
X_train_prepared

array([[ 0.76915544,  0.92993001],
       [-1.18608082, -1.34199481],
       [ 0.88416934,  0.92993001],
       [ 0.3666068 ,  0.26171683],
       [-1.30109472, -1.34199481],
       [-1.30109472, -1.34199481],
       [-1.35860167, -1.34199481],
       [ 0.65414154,  0.39535947],
       [ 1.51674578,  1.06357265],
       [ 1.05669019,  1.5981432 ],
       [-0.43849049, -0.13921108],
       [ 0.30909985,  0.12807419],
       [-1.24358777, -1.34199481],
       [ 0.76915544,  0.92993001],
       [-1.41610862, -1.20835217],
       [ 0.02156511, -0.13921108],
       [ 0.65414154,  0.79628738],
       [ 0.42411375,  0.39535947],
       [-1.24358777, -1.34199481],
       [ 0.136579  , -0.00556844],
       [ 0.5966346 ,  0.79628738],
       [ 0.71164849,  0.92993001],
       [-1.35860167, -1.34199481],
       [-1.35860167, -1.47563745],
       [-1.35860167, -1.20835217],
       [-1.53112252, -1.47563745],
       [-1.24358777, -1.07470954],
       [-1.35860167, -1.47563745],
       [-1.18608082,

In [3]:
from sklearn.svm import LinearSVC, SVC
from sklearn.model_selection import GridSearchCV

svc_class = SVC()
svc_param_space = [{"kernel":["linear"], "C":[1, 3, 5, 7, 9], "tol":[1e-5]}]
svc_grid_search = GridSearchCV(svc_class, svc_param_space, scoring = "accuracy", 
                               cv = 10, return_train_score = True)
svc_grid_search.fit(X_train_prepared, y_train)
svc_grid_search.best_params_

{'C': 1, 'kernel': 'linear', 'tol': 1e-05}

In [4]:
linear_svc_class = LinearSVC()
linear_svc_param_space = [{"penalty":["l2"], "loss":["hinge"], "C":[1, 3, 5, 7, 9], 
                           "max_iter":[2000], "tol":[1e-5]}]
linear_svc_grid_search = GridSearchCV(linear_svc_class, linear_svc_param_space, scoring = "accuracy", 
                                      cv = 10, return_train_score = True)
linear_svc_grid_search.fit(X_train_prepared, y_train)
linear_svc_grid_search.best_params_

{'C': 1, 'loss': 'hinge', 'max_iter': 2000, 'penalty': 'l2', 'tol': 1e-05}

In [5]:
from sklearn.linear_model import SGDClassifier

sgd_classifier = SGDClassifier()
sgd_classifier_param_space = [{"loss":["hinge"], "penalty":["l2"], "alpha":[1e-3, 2e-3, 3e-3, 4e-3, 5e-3], 
                               "tol":[1e-5], "max_iter":[2000]}]
sgd_classifier_grid_search = GridSearchCV(sgd_classifier, sgd_classifier_param_space,
                                          scoring = "accuracy", cv = 10, return_train_score = True)
sgd_classifier_grid_search.fit(X_train_prepared, y_train)
sgd_classifier_grid_search.best_params_

{'alpha': 0.001,
 'loss': 'hinge',
 'max_iter': 2000,
 'penalty': 'l2',
 'tol': 1e-05}

In [6]:
from sklearn.model_selection import cross_val_score

svc_scores = cross_val_score(SVC(**svc_grid_search.best_params_), X_train_prepared, y_train,
                             cv = 10, scoring = "accuracy", error_score = "raise")
print(f"Scores: {svc_scores}")
print(f"Mean: {svc_scores.mean()}")
print(f"Std Dev: {svc_scores.std()}")

Scores: [1.         1.         1.         1.         0.91666667 1.
 0.91666667 0.83333333 0.91666667 0.83333333]
Mean: 0.9416666666666667
Std Dev: 0.06508541396588878


In [10]:
linear_svc_scores = cross_val_score(LinearSVC(**linear_svc_grid_search.best_params_), X_train_prepared, y_train,
                                    cv = 10, scoring = "accuracy")
print(f"Scores: {linear_svc_scores}")
print(f"Mean: {linear_svc_scores.mean()}")
print(f"Std Dev: {linear_svc_scores.std()}")

Scores: [1.         1.         1.         1.         0.91666667 1.
 0.91666667 0.83333333 0.91666667 0.83333333]
Mean: 0.9416666666666667
Std Dev: 0.06508541396588878


In [11]:
sgd_classifier_scores = cross_val_score(SGDClassifier(**sgd_classifier_grid_search.best_params_),
                                        X_train_prepared, y_train, cv = 10, scoring = "accuracy")
print(f"Scores: {sgd_classifier_scores}")
print(f"Mean: {sgd_classifier_scores.mean()}")
print(f"Std Dev: {sgd_classifier_scores.std()}")

Scores: [1.         1.         1.         1.         0.91666667 1.
 1.         0.83333333 1.         0.83333333]
Mean: 0.9583333333333334
Std Dev: 0.06718548123582124


In [14]:
from sklearn.metrics import accuracy_score, precision_score, recall_score

svc = SVC(**svc_grid_search.best_params_)
svc.fit(X_train_prepared, y_train)
svc_pred = svc.predict(X_test_prepared)

print(f"Accuracy: {accuracy_score(y_test, svc_pred)}")
print(f"Precision: {precision_score(y_test, svc_pred)}")
print(f"Recall: {recall_score(y_test, svc_pred)}")

Accuracy: 1.0
Precision: 1.0
Recall: 1.0


In [15]:
linear_svc = LinearSVC(**linear_svc_grid_search.best_params_)
linear_svc.fit(X_train_prepared, y_train)
linear_svc_pred = linear_svc.predict(X_test_prepared)

print(f"Accuracy: {accuracy_score(y_test, linear_svc_pred)}")
print(f"Precision: {precision_score(y_test, linear_svc_pred)}")
print(f"Recall: {recall_score(y_test, linear_svc_pred)}")

Accuracy: 1.0
Precision: 1.0
Recall: 1.0


In [16]:
sgd_classifier = SGDClassifier(**sgd_classifier_grid_search.best_params_)
sgd_classifier.fit(X_train_prepared, y_train)
sgd_classifier_pred = sgd_classifier.predict(X_test_prepared)

print(f"Accuracy: {accuracy_score(y_test, sgd_classifier_pred)}")
print(f"Precision: {precision_score(y_test, sgd_classifier_pred)}")
print(f"Recall: {recall_score(y_test, sgd_classifier_pred)}")

Accuracy: 1.0
Precision: 1.0
Recall: 1.0
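
The cells above compare the three classifiers by accuracy only. A more direct way to check that they produce "roughly the same model" is to compare the learned `intercept_` & `coef_`. The following is a minimal sketch under several assumptions that are not in the cells above: it uses Iris setosa versus versicolor (a linearly separable pair), a fixed `C = 5`, a constant learning rate for `SGDClassifier`, & the approximate correspondence `alpha = 1 / (m * C)` between the regularisation strengths.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

iris = datasets.load_iris()
# Assumption: setosa vs. versicolor on petal length & width, which is linearly separable
mask = iris["target"] < 2
X_sep = StandardScaler().fit_transform(iris["data"][mask][:, [2, 3]])
y_sep = iris["target"][mask]

C = 5
alpha = 1 / (C * len(X_sep))  # SGDClassifier's alpha roughly plays the role of 1 / (m * C)

models = {
    "LinearSVC": LinearSVC(loss="hinge", C=C, dual=True, max_iter=100_000,
                           random_state=42),
    "SVC": SVC(kernel="linear", C=C),
    "SGDClassifier": SGDClassifier(loss="hinge", alpha=alpha, learning_rate="constant",
                                   eta0=0.001, max_iter=1000, tol=1e-3, random_state=42),
}
for name, model in models.items():
    model.fit(X_sep, y_sep)
    # Similar intercepts & coefficients mean similar decision boundaries
    print(name, model.intercept_, model.coef_)
```

The values are not expected to match exactly: `LinearSVC` also regularises the bias term, & `SGDClassifier` stops after an approximate optimisation, so only the overall direction & position of the boundary should be close.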


---

# 9.

In [2]:
from sklearn.datasets import fetch_openml
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

mnist = fetch_openml("mnist_784", version = 1, as_frame = False)
mnist.keys()
X, y = mnist["data"].astype(np.intc), mnist["target"].astype(np.intc)

strat_split = StratifiedShuffleSplit(n_splits = 1, test_size = 0.2, random_state = 32)
for train_index, test_index in strat_split.split(X, y):
    X_train, y_train = X[train_index], y[train_index]
    X_test, y_test = X[test_index], y[test_index]
X_train

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int32)

In [3]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

scaler = Pipeline([("scaler", StandardScaler())])
X_train_prepared = scaler.fit_transform(X_train)
X_test_prepared = scaler.transform(X_test)  # reuse the training-set statistics; do not refit on the test set
X_train_prepared

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

In [4]:
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

sgd_classifier = SGDClassifier()
param_search_space = [{"alpha":[1e-3, 2e-3, 3e-3], "max_iter":[5000], "tol":[1e-4], "n_jobs":[6]}]
grid_search = GridSearchCV(sgd_classifier, param_search_space, cv = 3, 
                           scoring = "accuracy", return_train_score = True)
grid_search.fit(X_train_prepared, y_train)
grid_search.best_params_

{'alpha': 0.001, 'max_iter': 5000, 'n_jobs': 6, 'tol': 0.0001}

In [5]:
from sklearn.model_selection import cross_val_score

cross_val_score(SGDClassifier(**grid_search.best_params_), X_train_prepared, y_train, 
                scoring = "accuracy", cv = 5, n_jobs = 6)



array([0.90946429, 0.90991071, 0.90767857, 0.90526786, 0.90607143])

In [8]:
from sklearn.metrics import accuracy_score

sgd_classifier = SGDClassifier(**grid_search.best_params_)
sgd_classifier.fit(X_train_prepared, y_train)
sgd_pred = sgd_classifier.predict(X_test_prepared)
print(f"Accuracy: {accuracy_score(y_test, sgd_pred)}")

Accuracy: 0.9085
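
The cells above train a linear SVM through `SGDClassifier` (its default hinge loss). The exercise also mentions one-versus-all, which can be made explicit by wrapping a binary SVM in `OneVsRestClassifier`. The sketch below is an illustration only: it assumes the small 8x8 digits dataset bundled with Scikit-Learn (`load_digits`) as a stand-in for MNIST so that it runs quickly, & the RBF kernel with `C=1.0` & `gamma="scale"` is an untuned starting point rather than a recommended setting.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data: 8x8 digit images bundled with Scikit-Learn (not MNIST itself)
X_digits, y_digits = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_digits, y_digits, test_size=0.2, stratify=y_digits, random_state=32)

# One binary RBF-kernel SVM per digit; the scaler is fitted on the training split only
ovr_svm = make_pipeline(
    StandardScaler(),
    OneVsRestClassifier(SVC(kernel="rbf", gamma="scale", C=1.0)))
ovr_svm.fit(X_tr, y_tr)

digits_accuracy = ovr_svm.score(X_te, y_te)
print(f"Accuracy: {digits_accuracy:.4f}")
```

On the full MNIST training set a kernelised `SVC` is much slower to train than `SGDClassifier`, so tuning `C` & `gamma` on a small subset first, as the exercise suggests, keeps the search manageable.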


---

# 10.

In [10]:
import pandas as pd

housing = pd.read_csv("housing.csv")
housing

Unnamed: 0,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value,ocean_proximity
0,-122.23,37.88,41.0,880.0,129.0,322.0,126.0,8.3252,452600.0,NEAR BAY
1,-122.22,37.86,21.0,7099.0,1106.0,2401.0,1138.0,8.3014,358500.0,NEAR BAY
2,-122.24,37.85,52.0,1467.0,190.0,496.0,177.0,7.2574,352100.0,NEAR BAY
3,-122.25,37.85,52.0,1274.0,235.0,558.0,219.0,5.6431,341300.0,NEAR BAY
4,-122.25,37.85,52.0,1627.0,280.0,565.0,259.0,3.8462,342200.0,NEAR BAY
...,...,...,...,...,...,...,...,...,...,...
20635,-121.09,39.48,25.0,1665.0,374.0,845.0,330.0,1.5603,78100.0,INLAND
20636,-121.21,39.49,18.0,697.0,150.0,356.0,114.0,2.5568,77100.0,INLAND
20637,-121.22,39.43,17.0,2254.0,485.0,1007.0,433.0,1.7000,92300.0,INLAND
20638,-121.32,39.43,18.0,1860.0,409.0,741.0,349.0,1.8672,84700.0,INLAND
