# Exercises

### Q1: Fundamental Idea Behind SVMs
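Support Vector Machines aim to find the separating hyperplane that maximizes the margin, i.e., the distance between the decision boundary and the closest training instances of each class (the support vectors). Fitting the widest possible "street" between the classes tends to generalize better. A soft margin additionally tolerates a limited number of margin violations, so the classifier can also handle data that is not perfectly separable.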
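
For the hard margin linear case, this idea is commonly written as the optimization problem below. Here $t^{(i)} \in \{-1, +1\}$ is the label of training instance $\mathbf{x}^{(i)}$. The constraints keep every instance off the street. The margin width is $2/\lVert\mathbf{w}\rVert$, so minimizing $\frac{1}{2}\mathbf{w}^\top\mathbf{w}$ maximizes the margin:

$$
\min_{\mathbf{w},\, b} \; \frac{1}{2}\mathbf{w}^\top\mathbf{w}
\quad \text{subject to} \quad t^{(i)}\left(\mathbf{w}^\top\mathbf{x}^{(i)} + b\right) \ge 1, \quad i = 1, \dots, m
$$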

### Q2: Support Vector Definition
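Support vectors are the training instances located on the edge of the street (or, with a soft margin, on or inside it). They fully determine the decision boundary. Instances that are not support vectors have no influence on it, so removing them, or adding new instances off the street, leaves the hyperplane unchanged.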
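
A quick way to check this with scikit-learn is sketched below on a hypothetical toy dataset. The sketch fits a linear `SVC`, then refits using only the instances listed in its `support_` attribute, and compares the two hyperplanes. The very large `C` is an assumption chosen to approximate a hard margin.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data; labels are +1 / -1
X = np.array([[1.0, 1.0], [2.0, 2.5], [2.0, 0.5],
              [-1.0, -1.0], [-2.0, -1.5], [-1.5, -2.5]])
y = np.array([1, 1, 1, -1, -1, -1])

# A very large C approximates a hard margin classifier
full = SVC(kernel="linear", C=1e6).fit(X, y)

# Refit using only the support vectors found by the first model
sv_idx = full.support_
reduced = SVC(kernel="linear", C=1e6).fit(X[sv_idx], y[sv_idx])

print("support vectors:\n", full.support_vectors_)
print("all instances: w =", full.coef_[0], "b =", full.intercept_[0])
print("SVs only:      w =", reduced.coef_[0], "b =", reduced.intercept_[0])
```

Both models should report essentially the same `w` and `b`, because the discarded instances never constrained the solution.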

### Q3: Importance of Scaling Inputs in SVMs
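Input scaling matters because SVMs try to fit the widest possible street between the classes, and both the margin and kernels such as RBF depend on distances between instances. If one feature has a much larger scale than the others, it dominates those distances and the small-scale features are largely ignored. The result can be a poor decision boundary that does not generalize well.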
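
To see the effect on distances, the sketch below uses four hypothetical instances whose second feature is in the thousands. The helper `feature_shares` is also hypothetical. It computes each feature's share of the squared Euclidean distance between the first two instances, before and after `StandardScaler`:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: feature 0 lies in [0, 1], feature 1 is in the thousands
X = np.array([[0.1, 1000.0],
              [0.9, 1200.0],
              [0.2, 3000.0],
              [0.8, 2500.0]])

def feature_shares(X, i, j):
    """Fraction of the squared Euclidean distance between rows i and j contributed by each feature."""
    sq_diff = (X[i] - X[j]) ** 2
    return sq_diff / sq_diff.sum()

raw_shares = feature_shares(X, 0, 1)
scaled_shares = feature_shares(StandardScaler().fit_transform(X), 0, 1)
print("unscaled:", raw_shares.round(4))
print("scaled:  ", scaled_shares.round(4))
```

Before scaling, feature 1 accounts for essentially the whole distance even though feature 0 changes across most of its range. After scaling, feature 0 contributes most of it.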

### Q4: SVM Confidence Scores and Probabilities
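An SVM classifier can output a confidence score for each instance: the value of its decision function. The sign of this score gives the predicted class, and its magnitude grows with the instance's distance from the decision boundary. These scores are not probabilities, but they can be calibrated into probability estimates with an additional method such as Platt scaling. In scikit-learn, setting `probability=True` on `SVC` does this using internal 5-fold cross-validation, which slows down training.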
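
A minimal sketch of both outputs in scikit-learn, on a synthetic dataset chosen only for illustration: `decision_function` returns the raw confidence scores, and `predict_proba` returns the Platt-scaled estimates.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic binary dataset (labels 0 and 1), used only for illustration
X, y = make_classification(n_samples=200, n_features=4, random_state=42)

# probability=True fits Platt scaling on top of the SVM using internal cross-validation
clf = SVC(kernel="rbf", probability=True, random_state=42).fit(X, y)

# Confidence scores: > 0 means class 1; larger magnitude means farther from the boundary
scores = clf.decision_function(X[:5])
# Calibrated probability estimates, one column per class
probas = clf.predict_proba(X[:5])
print("scores:", scores.round(2))
print("probabilities of class 1:", probas[:, 1].round(2))
```

Note that `predict` follows the sign of the scores, while `predict_proba` comes from a separately fitted calibration model, so the two can occasionally disagree near the boundary.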

### Q5: Primal vs. Dual SVM Problem for Large Datasets
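This question applies only to linear SVMs, since kernelized SVMs can only be solved in dual form. The computational complexity of the primal problem is proportional to the number of training instances $m$, while that of the dual is proportional to a number between $m^2$ and $m^3$. For a large training set (millions of instances), solving the primal problem is therefore much faster. The dual is preferable when there are fewer instances than features, or when you need the kernel trick.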
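
In scikit-learn, `LinearSVC` exposes this choice through its `dual` parameter, and its documentation recommends `dual=False` when `n_samples > n_features`. A minimal sketch on a synthetic dataset follows. The dataset sizes are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic data with many more instances than features (sizes chosen arbitrarily)
X, y = make_classification(n_samples=20_000, n_features=10, random_state=42)
n_samples, n_features = X.shape

# Solve the primal when instances outnumber features, the dual otherwise
use_dual = n_samples < n_features
clf = LinearSVC(dual=use_dual, random_state=42).fit(X, y)
print(f"dual={clf.dual}, training accuracy={clf.score(X, y):.3f}")
```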

### Q6: Adjusting $\gamma$ and $C$ in SVM with RBF Kernel for Underfitting
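If an SVM with an RBF kernel underfits the training set, it is probably too regularized, so try increasing $\gamma$, $C$, or both. A larger $\gamma$ narrows the bell-shaped RBF similarity function, so each instance has a smaller range of influence and the decision boundary becomes more irregular and flexible. A larger $C$ penalizes margin violations more heavily, which gives a narrower margin that fits the training data more closely.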
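
A minimal sketch of the effect on the moons dataset follows. The specific hyperparameter values are arbitrary choices for illustration. Training accuracy rises when both hyperparameters increase, which is what fixing underfitting looks like. In practice, check a validation set as well, so that the fix does not overshoot into overfitting.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=42)

# Strongly regularized: small gamma (smooth boundary) and small C (many tolerated violations)
underfit = SVC(kernel="rbf", gamma=0.01, C=0.01).fit(X, y)
# Larger gamma and C: more flexible boundary, fewer tolerated margin violations
flexible = SVC(kernel="rbf", gamma=5, C=100).fit(X, y)

acc_low = underfit.score(X, y)
acc_high = flexible.score(X, y)
print(f"gamma=0.01, C=0.01 -> training accuracy {acc_low:.2f}")
print(f"gamma=5,    C=100  -> training accuracy {acc_high:.2f}")
```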

### Q7: Soft Margin Linear SVM Classifier with QP Solver

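An off-the-shelf QP solver typically minimizes $\frac{1}{2}\mathbf{p}^\top\mathbf{H}\mathbf{p} + \mathbf{f}^\top\mathbf{p}$ subject to $\mathbf{A}\mathbf{p} \le \mathbf{b}$. One way to use it for the soft margin linear SVM classifier is to solve the dual problem. There, $\mathbf{p}$ holds the $m$ Lagrange multipliers $\alpha^{(i)}$ (one per training instance), and the labels are $t^{(i)} \in \{-1, +1\}$. Configure:

- `H`: an $m \times m$ matrix where each element `H[i][j]` is the dot product of the ith and jth training instances multiplied by both of their labels, $t^{(i)} t^{(j)} \mathbf{x}^{(i)\top}\mathbf{x}^{(j)}$. It defines the quadratic part of the objective.
- `f`: a vector of $m$ values all equal to $-1$. Together with `H`, it gives the objective $\frac{1}{2}\boldsymbol{\alpha}^\top\mathbf{H}\boldsymbol{\alpha} - \sum_i \alpha^{(i)}$. This is the negative of the dual objective, so minimizing it solves the dual.
- `A`: the constraint matrix for the box constraints $0 \le \alpha^{(i)} \le C$, i.e., $-\mathbf{I}_m$ stacked on top of $\mathbf{I}_m$. The dual also has the equality constraint $\sum_i \alpha^{(i)} t^{(i)} = 0$. Pass it separately if the solver supports equality constraints, or append the two rows $\mathbf{t}^\top$ and $-\mathbf{t}^\top$ to `A`.
- `b`: the matching right-hand side, consisting of $m$ zeros followed by $m$ copies of the hyperparameter $C$ (plus two zeros if the equality rows were appended to `A`).

Once the solver returns $\hat{\boldsymbol{\alpha}}$, the weights are $\hat{\mathbf{w}} = \sum_i \hat{\alpha}^{(i)} t^{(i)} \mathbf{x}^{(i)}$. The bias can then be computed from any support vector with $0 < \hat{\alpha}^{(i)} < C$. Alternatively, the primal problem can be solved directly by adding one slack variable per instance to the parameter vector.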
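
The dual formulation of the soft margin problem can be sketched in Python. This sketch swaps in SciPy's general-purpose `minimize` with the SLSQP method as a stand-in for a dedicated QP solver, an assumption made to keep the example dependency-light. It passes the box constraints as bounds and the equality constraint directly, then recovers the weights and the bias. The function name `soft_margin_svm_dual` and the toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def soft_margin_svm_dual(X, t, C=1.0):
    """Hypothetical helper: solve the soft margin linear SVM dual.

    X: (m, n) training instances; t: (m,) labels in {-1, +1}; C: regularization hyperparameter.
    SLSQP stands in for a dedicated QP solver here.
    """
    m = X.shape[0]
    tX = t[:, None] * X
    H = tX @ tX.T          # H[i, j] = t_i * t_j * (x_i . x_j)
    f = -np.ones(m)        # linear term: a vector of -1s

    result = minimize(
        fun=lambda a: 0.5 * a @ H @ a + f @ a,   # 1/2 a^T H a + f^T a
        x0=np.zeros(m),
        jac=lambda a: H @ a + f,
        method="SLSQP",
        bounds=[(0.0, C)] * m,                   # box constraints 0 <= alpha_i <= C
        constraints=[{"type": "eq", "fun": lambda a: a @ t, "jac": lambda a: t}],  # sum_i alpha_i t_i = 0
        options={"ftol": 1e-10, "maxiter": 500},
    )
    alpha = result.x
    w = tX.T @ alpha       # w = sum_i alpha_i t_i x_i

    # Bias from "free" support vectors (0 < alpha_i < C), averaged for numerical stability
    tol = 1e-6
    free = (alpha > tol) & (alpha < C - tol)
    if not free.any():
        free = alpha > tol
    bias = np.mean(t[free] - X[free] @ w)
    return w, bias, alpha

# Hypothetical, linearly separable toy data; labels are +1 / -1
X = np.array([[1.0, 1.0], [2.0, 2.5], [2.0, 0.5],
              [-1.0, -1.0], [-2.0, -1.5], [-1.5, -2.5]])
t = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
w, bias, alpha = soft_margin_svm_dual(X, t, C=1.0)
print("w =", w.round(3), "bias =", round(bias, 3))
print("predictions:", np.sign(X @ w + bias))
```

A dedicated QP solver would be faster and more robust for realistic sizes. The structure of `H`, `f`, and the constraints stays the same.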

### Q8: Comparing LinearSVC, SVC, and SGDClassifier

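- **LinearSVC**: Implements a linear SVM using the liblinear library. Its training time grows roughly linearly with the number of instances and features, so it scales well to large datasets. It does not support the kernel trick, so for nonlinear problems you must add features explicitly (e.g., with `PolynomialFeatures`).
- **SVC with a linear kernel**: Uses the libsvm library, which supports the kernel trick (irrelevant with a linear kernel). Training time grows between $O(m^2 \times n)$ and $O(m^3 \times n)$, so it gets very slow on large training sets. It works well for small or medium-sized datasets, including ones with many features relative to the number of samples.
- **SGDClassifier**: Implements linear classifiers trained with stochastic gradient descent; with `loss='hinge'` it trains a linear SVM. It is very flexible, handles large datasets efficiently, and supports online or out-of-core learning through `partial_fit`.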

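These classifiers can be compared on a linearly separable dataset: with comparable regularization, they should find roughly the same decision boundary. The code below compares their test accuracy on a synthetic dataset from `make_classification`. That dataset is not guaranteed to be linearly separable, and its default `flip_y=0.01` even flips a small fraction of the labels.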


```python
from sklearn.svm import LinearSVC, SVC
from sklearn.linear_model import SGDClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Generate a synthetic binary classification dataset (not guaranteed to be linearly separable)
X, y = make_classification(n_features=4, random_state=42, n_redundant=0, n_informative=4, n_clusters_per_class=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the data (fit the scaler on the training set only)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train the LinearSVC
linear_svc = LinearSVC(random_state=42)
linear_svc.fit(X_train, y_train)

# Train the SVC with a linear kernel
svc = SVC(kernel='linear', random_state=42)
svc.fit(X_train, y_train)

# Train the SGDClassifier; the hinge loss makes it a linear SVM
sgd_clf = SGDClassifier(loss='hinge', random_state=42)
sgd_clf.fit(X_train, y_train)

# Compare their test accuracy
models = {'LinearSVC': linear_svc, 'SVC': svc, 'SGDClassifier': sgd_clf}
for name, model in models.items():
    y_pred = model.predict(X_test)
    print(f'{name} accuracy: {accuracy_score(y_test, y_pred):.2f}')
```

```
LinearSVC accuracy: 0.80
SVC accuracy: 0.80
SGDClassifier accuracy: 0.75
```

With the default `n_samples=100`, the test set holds only 20 instances, so the gap between 0.80 and 0.75 is a single misclassified test instance.
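
To compare the decision boundaries themselves, the sketch below uses a dataset that is linearly separable by construction: two well-separated Gaussian blobs, an arbitrary choice. The hyperparameters are set so the three models optimize roughly the same objective. `LinearSVC` uses `loss="hinge"`, and `SGDClassifier` uses `alpha = 1 / (C * m)`, which matches the SVC objective up to intercept regularization and SGD's approximate convergence. The value `C = 5.0` is an assumption.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC, SVC

# Two well-separated blobs: linearly separable by construction
X, y = make_blobs(n_samples=100, centers=[[-3, -3], [3, 3]], cluster_std=0.5, random_state=42)
X = StandardScaler().fit_transform(X)

C = 5.0
m = len(X)
models = {
    "LinearSVC": LinearSVC(loss="hinge", C=C, dual=True, max_iter=10_000, random_state=42),
    "SVC": SVC(kernel="linear", C=C),
    "SGDClassifier": SGDClassifier(loss="hinge", alpha=1 / (C * m), random_state=42),
}

for name, model in models.items():
    model.fit(X, y)
    w, b = model.coef_[0], model.intercept_[0]
    # Decision boundary w0*x0 + w1*x1 + b = 0, rewritten as x1 = slope * x0 + offset
    slope, offset = -w[0] / w[1], -b / w[1]
    print(f"{name:>13}: slope={slope:.3f}, offset={offset:.3f}, train accuracy={model.score(X, y):.2f}")
```

If the three models agree, their slopes and offsets should be close, even though the underlying solvers differ.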




### Q9: Training an SVM Classifier on MNIST

```python
from sklearn.datasets import fetch_openml
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Load the MNIST dataset (70,000 grayscale 28x28 images, flattened to 784 features)
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist['data'], mnist['target']

# SVC handles multiclass classification natively: under the hood it trains one
# binary classifier per pair of classes (one-versus-one)

# Hold out 1/7 of the data (about 10,000 images) as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/7, random_state=42)

# Standardize the data (fit the scaler on the training set only)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Grid search is expensive, so tune on a 10,000-instance subset of the training set
# and keep the remaining training instances for the final fit
X_tune_scaled, y_tune = X_train_scaled[:10000], y_train[:10000]
X_rest_scaled, y_rest = X_train_scaled[10000:], y_train[10000:]

# Hyperparameter tuning using GridSearchCV (3-fold cross-validation on the tuning subset)
param_grid = [
    {'C': [1, 10], 'gamma': [0.001, 0.01]},
]
svm_clf = SVC()
grid_search = GridSearchCV(svm_clf, param_grid, cv=3)
grid_search.fit(X_tune_scaled, y_tune)

# Retrain the best configuration on the remaining training instances
# (best_estimator_ was only fit on the tuning subset)
svm_clf = grid_search.best_estimator_
svm_clf.fit(X_rest_scaled, y_rest)

# Predict and evaluate the model on the test set
y_pred = svm_clf.predict(X_test_scaled)
print(f'SVM accuracy on MNIST: {accuracy_score(y_test, y_pred):.2f}')
```

```
SVM accuracy on MNIST: 0.97
```
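
Because `SVC` uses one-versus-one internally, a 10-class problem trains $10 \times 9 / 2 = 45$ binary classifiers. A quick way to see this without downloading MNIST is the small digits dataset bundled with scikit-learn, used here as a stand-in. Setting `decision_function_shape="ovo"` exposes one score per pair of classes, while the default `"ovr"` aggregates them into one score per class:

```python
from sklearn.datasets import load_digits
from sklearn.svm import SVC

# Small bundled 8x8 digits dataset (10 classes), used instead of MNIST for speed
X, y = load_digits(return_X_y=True)

ovo_clf = SVC(decision_function_shape="ovo").fit(X, y)
ovr_clf = SVC(decision_function_shape="ovr").fit(X, y)  # the default shape

n_classes = len(ovo_clf.classes_)
print("OvO scores per instance:", ovo_clf.decision_function(X[:1]).shape[1])  # one per pair of classes
print("OvR-shaped scores per instance:", ovr_clf.decision_function(X[:1]).shape[1])  # one per class
print("pairs of classes:", n_classes * (n_classes - 1) // 2)
```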


### Q10: Training an SVM Regressor on California Housing

```python
from sklearn.datasets import fetch_california_housing
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error

# Load the California housing dataset (target: median house value in units of $100,000)
housing = fetch_california_housing()
X, y = housing.data, housing.target

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the data (fit the scaler on the training set only)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Hyperparameter tuning using GridSearchCV (gamma only applies to the RBF kernel)
param_grid = [
    {'kernel': ['linear'], 'C': [1, 10]},
    {'kernel': ['rbf'], 'C': [1, 10], 'gamma': [0.001, 0.01]},
]
svm_reg = SVR()
grid_search = GridSearchCV(svm_reg, param_grid, cv=3, scoring='neg_mean_squared_error')
grid_search.fit(X_train_scaled, y_train)

# With refit=True (the default), best_estimator_ is already retrained on the
# whole training set with the best parameters, so no extra fit is needed
svm_reg = grid_search.best_estimator_

# Predict and evaluate the model on the test set
y_pred = svm_reg.predict(X_test_scaled)
mse = mean_squared_error(y_test, y_pred)
print(f'SVM regressor MSE on California housing dataset: {mse:.2f}')
```

```
SVM regressor MSE on California housing dataset: 0.42
```
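
Since the target is expressed in hundreds of thousands of dollars, the reported MSE is easier to interpret as a root mean squared error. The figure below is computed from the rounded value shown above:

$$
\text{RMSE} = \sqrt{\text{MSE}} \approx \sqrt{0.42} \approx 0.65 \;\Rightarrow\; \text{a typical prediction error of roughly } \$65{,}000
$$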
