### 1. What is the fundamental idea behind Support Vector Machines?
The fundamental idea behind **Support Vector Machines (SVMs)** is to find the hyperplane that separates the classes with the largest possible **margin**, which is the distance between the hyperplane and the nearest training instances of each class (the support vectors). When the classes overlap, a soft-margin SVM trades off a wide margin against a limited number of margin violations. For data that is not linearly separable, **kernel functions** let the SVM behave as if the data had been mapped into a higher-dimensional space, where a linear separator may exist, without ever computing that mapping explicitly.
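
As a rough illustration of the margin idea, the sketch below fits a linear `SVC` on a tiny, made-up separable dataset and computes the width of the margin as `2 / ||w||`. The data values and the very large `C` (used here to approximate a hard margin) are assumptions chosen for the example.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (illustrative values, not from the text)
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard-margin classifier
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]
margin_width = 2 / np.linalg.norm(w)  # full width of the separating "street"

print("w =", w, "bias =", clf.intercept_[0])
print("margin width =", margin_width)
```

For this particular dataset the closest points are `(2, 1)`, `(1, 2)` and `(4, 4)`, so the width should come out close to `5 / sqrt(2)`.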



### 2. What is a support vector?
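A **support vector** is a training instance that lies on the edge of the margin or, in a soft-margin SVM, inside the margin or on the wrong side of it. These instances fully determine the position and orientation of the decision boundary: they "support" the hyperplane. Instances that lie outside the margin have no influence at all. You can remove them, add more of them, or move them around, and as long as they stay outside the margin the boundary does not change. Removing a support vector, on the other hand, generally does change it.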
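
The following sketch shows one way to inspect support vectors in Scikit-Learn and to check that dropping an instance that is not a support vector leaves the model essentially unchanged. The toy data and the large `C` are assumptions made for the example.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (illustrative values, not from the text)
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)
print("support vector indices:", clf.support_)
print("support vectors:\n", clf.support_vectors_)

# Drop one instance that is NOT a support vector and retrain
non_sv = [i for i in range(len(X)) if i not in clf.support_]
keep = np.ones(len(X), dtype=bool)
keep[non_sv[0]] = False
clf_reduced = SVC(kernel="linear", C=1e6).fit(X[keep], y[keep])

# The two hyperplanes should match up to the solver's tolerance
print("original coef:", clf.coef_, clf.intercept_)
print("reduced  coef:", clf_reduced.coef_, clf_reduced.intercept_)
```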



### 3. Why is it important to scale the inputs when using SVMs?
It is important to **scale the inputs** (normalize or standardize) when using SVMs because an SVM tries to fit the widest possible margin between the classes, and that margin is measured with distances in feature space. If one feature has a much larger range than the others, it dominates those distances and the features with small ranges are largely neglected, which distorts the hyperplane. Scaling the data puts all features on a comparable footing, so each one can contribute to the decision boundary.
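
A minimal sketch of how scaling is usually wired in: a `StandardScaler` placed in front of the classifier in a pipeline, compared with the same classifier on raw features. The four data points and `C=100` are made up for illustration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical data: feature 0 spans 1 to 5, feature 1 spans 20 to 80
X = np.array([[1.0, 50.0], [5.0, 20.0], [3.0, 80.0], [5.0, 60.0]])
y = np.array([0, 0, 1, 1])

# Same linear SVM, without and with standardization
unscaled = SVC(kernel="linear", C=100).fit(X, y)
scaled = make_pipeline(StandardScaler(), SVC(kernel="linear", C=100)).fit(X, y)

# Coefficients live in different feature spaces, so compare them qualitatively
print("unscaled coef:", unscaled.coef_)
print("scaled coef:  ", scaled.named_steps["svc"].coef_)
print("scaled predictions:", scaled.predict(X))
```

Putting the scaler inside the pipeline also means it is fitted on the training data only, which avoids leaking test-set statistics.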



### 4. Can an SVM classifier output a confidence score when it classifies an instance? What about a probability?
- **Confidence Score**: An SVM classifier can output a confidence score through its `decision_function()` method. The score is based on the signed distance between the instance and the hyperplane. It is unbounded and is not a probability.
- **Probability**: SVMs do not natively provide probabilities, but by using **Platt scaling** (enabled in Scikit-Learn by setting `probability=True`), you can calibrate the SVM’s outputs to provide probability estimates. This adds the `predict_proba()` method and slows down training, because the calibration is fitted with an internal 5-fold cross-validation.
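
The difference between the two outputs can be seen in a short sketch like the one below. The synthetic blobs dataset is an assumption made for the example.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic two-class data, used only for illustration
X, y = make_blobs(n_samples=200, centers=2, random_state=42)

# probability=True must be set before fit(); it enables predict_proba()
clf = SVC(kernel="linear", probability=True, random_state=42).fit(X, y)

scores = clf.decision_function(X[:3])  # unbounded signed scores, one per instance
probas = clf.predict_proba(X[:3])      # calibrated estimates, one column per class

print("decision scores:", scores)
print("probabilities:\n", probas)
```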



### 5. Should you use the primal or the dual form of the SVM problem to train a model on a training set with millions of instances and hundreds of features?
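This question only applies to linear SVMs, since kernelized SVMs can only use the dual form. For datasets with **millions of instances and hundreds of features**, you should use the **primal form** of the SVM problem. The computational complexity of the primal form grows roughly in proportion to the number of training instances `m`, while the complexity of the **dual form** grows somewhere between `m²` and `m³`, so the dual becomes far too slow at this scale. The dual is mainly attractive when the number of training instances is smaller than the number of features. In Scikit-Learn, **LinearSVC** can solve the primal problem if you set `dual=False`.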
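
A small sketch of selecting the primal formulation in Scikit-Learn, assuming a synthetic dataset that is much smaller than the one in the question so that it runs quickly:

```python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic stand-in: many more instances than features
X, y = make_classification(n_samples=5000, n_features=100, random_state=42)

# dual=False asks liblinear to solve the primal problem;
# it works with the default squared hinge loss and l2 penalty
clf = LinearSVC(dual=False, C=1.0)
clf.fit(X, y)

print("training accuracy:", clf.score(X, y))
print("coefficient shape:", clf.coef_.shape)
```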



### 6. Say you’ve trained an SVM classifier with an RBF kernel, but it seems to underfit the training set. Should you increase or decrease γ (gamma)? What about C?
- **Increase γ (gamma)**: Increasing gamma makes the model more sensitive to individual data points, meaning the decision boundary becomes more flexible, which could help in reducing underfitting.
- **Increase C**: Increasing C reduces the regularization. Margin violations are penalized more heavily, so the model accepts a narrower margin in order to fit the training data more closely. This too can help reduce underfitting.
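
To see the effect of both hyperparameters, the sketch below trains RBF-kernel classifiers with increasing `gamma` and `C` on a synthetic moons dataset and prints the training accuracy. The dataset and the specific values are assumptions picked for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Synthetic non-linear dataset, used only for illustration
X, y = make_moons(n_samples=300, noise=0.25, random_state=42)

train_scores = []
for gamma, C in [(0.01, 0.01), (1, 1), (10, 100)]:
    clf = SVC(kernel="rbf", gamma=gamma, C=C).fit(X, y)
    train_scores.append(clf.score(X, y))
    print(f"gamma={gamma}, C={C}: training accuracy = {train_scores[-1]:.3f}")
```

Higher training accuracy only shows that underfitting is reduced. Values that are too large tend to overfit, so the final choice should be made with a validation set or cross-validation.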



### 7. How should you set the QP parameters (H, f, A, and b) to solve the soft margin linear SVM classifier problem using an off-the-shelf QP solver?
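To solve the soft-margin linear SVM problem with a Quadratic Programming (QP) solver that minimizes `1/2 p^T H p + f^T p` subject to `A p <= b`, stack the unknowns into one vector `p = [b_0, w, ξ]`: the bias term (written `b_0` here to avoid a clash with the QP vector `b`), the `n` feature weights, and the `m` slack variables, one per training instance. That gives `n + 1 + m` parameters and `2m` constraints. With labels `y_i` in `{-1, +1}`, set the parameters as follows:
- **H**: A square matrix of size `n + 1 + m` that is zero everywhere except for an `n x n` identity block in the rows and columns of `w`, so that `1/2 p^T H p = 1/2 w^T w`. The bias and the slack variables do not appear in the quadratic term.
- **f**: A vector with zeros for `b_0` and `w`, followed by `m` entries equal to the hyperparameter `C`, so that `f^T p = C * Σ ξ_i`.
- **A**: A matrix with `2m` rows. Each of the first `m` rows encodes the margin constraint `y_i * (w^T x_i + b_0) >= 1 - ξ_i`, rewritten as `-y_i * (b_0 + w^T x_i) - ξ_i <= -1`. Each of the last `m` rows encodes `ξ_i >= 0`, rewritten as `-ξ_i <= 0`.
- **b**: The right-hand side of the inequality constraints: `m` entries equal to `-1`, followed by `m` entries equal to `0`.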
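
One way to make the QP setup concrete is to build the matrices with NumPy. The sketch below assumes the primal soft-margin problem, a solver convention of minimizing `1/2 p^T H p + f^T p` subject to `A p <= b`, a variable vector `p = [b_0, w, ξ]`, and labels in `{-1, +1}`. The function name `soft_margin_qp_params` is hypothetical, and no particular solver is called.

```python
import numpy as np

def soft_margin_qp_params(X, t, C):
    """Build H, f, A, b for: minimize 1/2 p^T H p + f^T p  s.t.  A p <= b,
    with p = [b_0, w_1..w_n, xi_1..xi_m] and labels t in {-1, +1}."""
    m, n = X.shape
    n_p = 1 + n + m

    # Quadratic term: only the weights w are penalized (1/2 w^T w)
    H = np.zeros((n_p, n_p))
    H[1:n + 1, 1:n + 1] = np.eye(n)

    # Linear term: C times the sum of the slack variables
    f = np.concatenate([np.zeros(1 + n), C * np.ones(m)])

    # Margin constraints: -t_i * (b_0 + w^T x_i) - xi_i <= -1
    X_b = np.hstack([np.ones((m, 1)), X])
    A_margin = np.hstack([-t[:, None] * X_b, -np.eye(m)])
    # Non-negativity of the slacks: -xi_i <= 0
    A_slack = np.hstack([np.zeros((m, 1 + n)), -np.eye(m)])

    A = np.vstack([A_margin, A_slack])
    b = np.concatenate([-np.ones(m), np.zeros(m)])
    return H, f, A, b

# Tiny made-up dataset: m = 4 instances, n = 2 features
X = np.array([[1.0, 2.0], [2.0, 1.0], [4.0, 5.0], [5.0, 4.0]])
t = np.array([-1.0, -1.0, 1.0, 1.0])
H, f, A, b = soft_margin_qp_params(X, t, C=1.0)
print(H.shape, f.shape, A.shape, b.shape)
```

The resulting arrays can then be handed to whichever QP solver is available, after converting them to that solver's expected types.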



### 8. Train a LinearSVC on a linearly separable dataset. Then train an SVC and an SGDClassifier on the same dataset. See if you can get them to produce roughly the same model.

In [1]:
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC, SVC
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score

# Create a dataset
X, y = make_blobs(n_samples=1000, centers=2, random_state=42)

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a LinearSVC (squared hinge loss by default)
linear_svc = LinearSVC()
linear_svc.fit(X_train, y_train)

# Train an SVC with linear kernel
svc = SVC(kernel='linear')
svc.fit(X_train, y_train)

# Train an SGDClassifier (hinge loss by default, i.e. a linear SVM trained with SGD)
sgd = SGDClassifier()
sgd.fit(X_train, y_train)

# Predict
y_pred_linear_svc = linear_svc.predict(X_test)
y_pred_svc = svc.predict(X_test)
y_pred_sgd = sgd.predict(X_test)

# Calculate accuracy
accuracy_linear_svc = accuracy_score(y_test, y_pred_linear_svc)
accuracy_svc = accuracy_score(y_test, y_pred_svc)
accuracy_sgd = accuracy_score(y_test, y_pred_sgd)

print(f'LinearSVC accuracy: {accuracy_linear_svc}')
print(f'SVC accuracy: {accuracy_svc}')
print(f'SGDClassifier accuracy: {accuracy_sgd}')


LinearSVC accuracy: 1.0
SVC accuracy: 1.0
SGDClassifier accuracy: 1.0
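
Equal accuracy on a separable dataset does not by itself show that the three models are the same. A possible follow-up, sketched below, is to align the loss and the regularization strength and then compare the learned coefficients. The choices here are assumptions: hinge loss everywhere, standardized inputs, an arbitrary `C = 5`, and `alpha = 1 / (m * C)` for `SGDClassifier`.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

X, y = make_blobs(n_samples=1000, centers=2, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

C = 5
m = len(X_scaled)

# Same loss (hinge) and comparable regularization for all three models
lin_clf = LinearSVC(loss="hinge", C=C, dual=True, random_state=42).fit(X_scaled, y)
svc_clf = SVC(kernel="linear", C=C).fit(X_scaled, y)
sgd_clf = SGDClassifier(loss="hinge", alpha=1 / (m * C), random_state=42).fit(X_scaled, y)

for name, model in [("LinearSVC", lin_clf), ("SVC", svc_clf), ("SGDClassifier", sgd_clf)]:
    print(f"{name:14s} intercept={model.intercept_} coef={model.coef_}")

# Fraction of instances on which all three models agree
agreement = ((lin_clf.predict(X_scaled) == svc_clf.predict(X_scaled))
             & (svc_clf.predict(X_scaled) == sgd_clf.predict(X_scaled))).mean()
print("agreement:", agreement)
```

The coefficients will usually not match exactly, since the three solvers use different optimization algorithms and `LinearSVC` also regularizes the bias term, but the decision boundaries should be close.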


### 9. Train an SVM classifier on the MNIST dataset.

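SVM classifiers are binary classifiers, so classifying all 10 digits (0 to 9) requires a multiclass strategy. With one-versus-all (OvA, also called one-versus-rest), you train 10 binary classifiers, one per digit, and select the class with the highest decision score. Scikit-Learn's `SVC`, which is used below, handles this automatically but applies a one-versus-one (OvO) strategy internally: it trains one binary classifier per pair of digits, 45 in total. To get OvA instead, you can use `LinearSVC` or wrap the classifier in `OneVsRestClassifier`.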
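
The two multiclass strategies can be compared on a small scale. The sketch below uses Scikit-Learn's built-in 8x8 digits dataset as a lightweight stand-in for MNIST (an assumption made to keep it fast) and looks at how many decision scores each strategy produces per instance.

```python
from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Small 10-class digits dataset bundled with Scikit-Learn (not MNIST)
X, y = load_digits(return_X_y=True)

# SVC trains one classifier per pair of classes (one-versus-one)
ovo_clf = SVC(kernel="rbf", gamma="scale", decision_function_shape="ovo").fit(X, y)
ovo_scores = ovo_clf.decision_function(X[:1])
print("OvO scores per instance:", ovo_scores.shape[1])  # 10 * 9 / 2 = 45

# Wrapping SVC forces one classifier per class (one-versus-rest)
ovr_clf = OneVsRestClassifier(SVC(kernel="rbf", gamma="scale")).fit(X, y)
ovr_scores = ovr_clf.decision_function(X[:1])
print("OvR classifiers:", len(ovr_clf.estimators_))     # 10
print("OvR scores per instance:", ovr_scores.shape[1])  # 10
```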

In [2]:
from sklearn import datasets
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the MNIST dataset as NumPy arrays (70,000 images of 784 pixels each)
mnist = datasets.fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist['data'], mnist['target']

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the dataset
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Create an SVC model
svc_clf = SVC(kernel='rbf', gamma='scale', C=1)
svc_clf.fit(X_train_scaled, y_train)

# Predict
y_pred = svc_clf.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy on MNIST dataset: {accuracy}')

Accuracy on MNIST dataset: 0.963


### 10. Train an SVM regressor on the California housing dataset.

In [6]:
from sklearn.datasets import fetch_california_housing
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


# Load the California housing dataset
housing = fetch_california_housing()
X, y = housing['data'], housing['target']

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the dataset
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Create an SVR model
svr = SVR(kernel='rbf', gamma='scale', C=1)
svr.fit(X_train_scaled, y_train)

# Predict
y_pred = svr.predict(X_test_scaled)
mse = mean_squared_error(y_test, y_pred)
print(f'Mean squared error on California housing dataset: {mse}')

Mean squared error on California housing dataset: 0.3570026426754465
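
The MSE is easier to interpret after taking its square root, since the RMSE is in the same units as the target. In Scikit-Learn's California housing dataset the target is the median house value expressed in units of $100,000. A quick conversion of the value printed above:

```python
import math

mse = 0.3570026426754465  # value printed above
rmse = math.sqrt(mse)     # same units as the target ($100,000)

print(f"RMSE: {rmse:.4f}")                          # about 0.5975
print(f"Typical error: ${rmse * 100_000:,.0f}")     # roughly $59,750
```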
