# Assignment 20 Solutions

##### 1. What is the underlying concept of Support Vector Machines?

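The underlying concept of Support Vector Machines is to find the hyperplane that separates the classes with the largest possible margin, where the margin is the distance between the decision boundary and the nearest training instances. With soft margin classification, the SVM balances a wide margin against a small number of margin violations. With kernels, the same idea extends to nonlinear datasets.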
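
As a minimal sketch of the large-margin idea, the snippet below fits a linear `SVC` on a tiny made-up dataset (the six points are an assumption chosen for illustration) and computes the margin width as `2 / ||w||`. A very large `C` is used here to approximate a hard margin.

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data, made up for illustration
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C leaves almost no room for margin violations
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]          # weight vector of the separating hyperplane
b = clf.intercept_[0]     # bias term
margin_width = 2.0 / np.linalg.norm(w)

print("w =", w, "b =", b)
print("margin width =", margin_width)
```

For this dataset the widest street runs between the line through (2, 1) and (1, 2) and the point (4, 4), so the width should come out close to 5 / sqrt(2).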

##### 2. What is the concept of a support vector?

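After training, a support vector is any training instance located on the edge of the margin or, with a soft margin, inside it. The decision boundary is determined entirely by the support vectors: instances that lie off the margin have no influence, so they can be moved (as long as they stay off the margin), removed, or added to without changing the boundary. Computing predictions involves only the support vectors, not the whole training set.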
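
To illustrate, here is a small sketch on made-up data (the points and the large `C` value are assumptions for the example). It lists the support vectors found by `SVC`, then refits on those instances alone to show that the boundary stays essentially the same.

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data, made up for illustration
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)
print("support vector indices:", clf.support_)
print("support vectors:\n", clf.support_vectors_)

# Refit using only the support vectors: the other instances are not needed
X_sv, y_sv = X[clf.support_], y[clf.support_]
clf_sv = SVC(kernel="linear", C=1e6).fit(X_sv, y_sv)

print("full data    :", clf.coef_, clf.intercept_)
print("support only :", clf_sv.coef_, clf_sv.intercept_)
```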

##### 3. When using SVMs, why is it necessary to scale the inputs?

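SVMs try to fit the widest possible margin between the classes, and that margin is measured with distances in feature space. If the features are not scaled, the features with large ranges dominate those distances, so the SVM tends to neglect features with small values, even informative ones.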
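
The effect can be sketched with synthetic data (everything below is an assumption built for the example): one small-scale feature decides the class, while a large-scale feature is pure noise. The same RBF `SVC` is trained with and without a `StandardScaler`.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 400
informative = rng.normal(size=n)            # small scale, determines the class
noise = rng.normal(scale=1000.0, size=n)    # large scale, carries no signal
X = np.column_stack([informative, noise])
y = (informative > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Same model, with and without scaling
unscaled = SVC(kernel="rbf").fit(X_train, y_train)
scaled = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X_train, y_train)

unscaled_acc = unscaled.score(X_test, y_test)
scaled_acc = scaled.score(X_test, y_test)
print("test accuracy without scaling:", unscaled_acc)
print("test accuracy with scaling   :", scaled_acc)
```

Putting the scaler inside the pipeline means it is fitted on the training data only and reused unchanged at prediction time.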

##### 4. When an SVM classifier classifies a case, can it output a confidence score? What about a probability?

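Yes, an SVM classifier can output a confidence score: the signed distance-like value of the decision function for the test instance, available in Scikit-Learn through `decision_function()`. This score cannot be read directly as a probability. If you set `probability=True` when creating an `SVC`, Scikit-Learn calibrates the scores after training (Platt scaling, fitted with internal cross-validation), which adds the `predict_proba()` and `predict_log_proba()` methods at the cost of slower training.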
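
A short sketch of both outputs, using a synthetic two-class dataset from `make_blobs` (the dataset and its parameters are assumptions for the example):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=0)

# probability=True enables predict_proba via calibration of the scores
clf = SVC(kernel="linear", probability=True, random_state=0).fit(X, y)

sample = X[:5]
scores = clf.decision_function(sample)   # confidence scores (sign gives the class)
probas = clf.predict_proba(sample)       # calibrated class probabilities

print("scores:", np.round(scores, 3))
print("probabilities:\n", np.round(probas, 3))
```

Because the probabilities come from a separate calibration step, they can occasionally disagree with `predict()` for instances very close to the boundary.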

##### 5. Should you train a model on a training set with millions of instances and hundreds of features using the primal or dual form of the SVM problem?

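This question only applies to linear SVMs, since kernelized SVMs can only use the dual form. With millions of instances and hundreds of features, use the primal form: its computational cost grows roughly linearly with the number of training instances m, whereas the cost of the dual form grows somewhere between m² and m³, which makes it far too slow at this scale. The dual form is the better choice when there are more features than instances.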
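
In Scikit-Learn, `LinearSVC` exposes this choice through its `dual` parameter. The sketch below solves the primal problem on a synthetic dataset with many more instances than features; the dataset sizes are scaled-down assumptions so the example runs quickly.

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Many more instances than features (sizes reduced for a quick demo)
X, y = make_classification(n_samples=5000, n_features=100, n_informative=20,
                           n_clusters_per_class=1, random_state=0)

# dual=False asks LinearSVC to solve the primal optimization problem
primal_clf = make_pipeline(StandardScaler(),
                           LinearSVC(dual=False, C=1.0, random_state=0))
primal_clf.fit(X, y)

train_acc = primal_clf.score(X, y)
print("training accuracy:", train_acc)
```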

##### 6. Let's say you've used an RBF kernel to train an SVM classifier, but it appears to underfit the training set. Should you increase or decrease gamma? What about C?

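If an SVM classifier with an RBF kernel underfits the training set, it is probably regularized too much. To reduce the regularization, increase gamma, increase C, or both. A larger gamma narrows each instance's range of influence, which makes the decision boundary more flexible, and a larger C penalizes margin violations more heavily. Decreasing either value would do the opposite and is the remedy for overfitting.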
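
A quick way to see the effect is to refit the same model with increasing `gamma` and `C` and watch the training accuracy. The moons dataset and the specific values below are assumptions picked for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

# From heavily regularized (low gamma, low C) to weakly regularized
results = {}
for gamma, C in [(0.01, 0.1), (1.0, 1.0), (10.0, 100.0)]:
    clf = SVC(kernel="rbf", gamma=gamma, C=C).fit(X, y)
    results[(gamma, C)] = clf.score(X, y)
    print(f"gamma={gamma}, C={C}: training accuracy = {results[(gamma, C)]:.3f}")
```

Training accuracy only shows that underfitting is reduced. A validation set is still needed to check that the larger values do not overfit.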

##### 7. To solve the soft margin linear SVM classifier problem with an off-the-shelf QP solver, how should the QP parameters (H, f, A, and b) be set?

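Assume the solver minimizes `(1/2) p^T H p + f^T p` subject to `A p <= b`, with m training instances, n features, and labels t_i in {-1, +1}. The soft margin problem has one slack variable per instance, so the parameter vector is p = (bias, w_1..w_n, zeta_1..zeta_m), giving n + 1 + m parameters and 2m constraints. The parameters are then:

- H is the (n + 1 + m) x (n + 1 + m) matrix with ones on the diagonal entries that correspond to the n feature weights and zeros everywhere else (the bias and the slack variables are not part of the quadratic term).
- f has n + 1 zeros followed by m elements equal to the hyperparameter C.
- A has 2m rows. Row i of the first m rows is -t_i times (1, x_i), followed by -1 in the column of zeta_i, which encodes t_i (w^T x_i + bias) >= 1 - zeta_i. The last m rows are zero except for -1 in the column of zeta_i, which encodes zeta_i >= 0.
- b has m elements equal to -1 followed by m zeros.

Setting aside the slack columns and the last m constraints recovers the hard margin problem.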
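
The sketch below builds these matrices with NumPy for one common formulation: minimize `(1/2) p^T H p + f^T p` subject to `A p <= b`, with `p = (bias, w, zeta)` and labels in {-1, +1}. The helper name `soft_margin_qp_params` and the toy data are hypothetical, and the result would still need to be passed to a QP solver of your choice.

```python
import numpy as np

def soft_margin_qp_params(X, t, C):
    """Build H, f, A, b for a soft margin linear SVM (hypothetical helper).

    X: (m, n) feature matrix, t: (m,) labels in {-1, +1}, C: penalty.
    Variable layout: p = [bias, w_1..w_n, zeta_1..zeta_m].
    """
    m, n = X.shape
    n_params = n + 1 + m

    # Quadratic term: only the feature weights are penalized
    H = np.zeros((n_params, n_params))
    H[1:n + 1, 1:n + 1] = np.eye(n)

    # Linear term: C times the sum of the slack variables
    f = np.concatenate([np.zeros(n + 1), np.full(m, C)])

    # Margin constraints: -t_i * (bias + w.x_i) - zeta_i <= -1
    X_tilde = np.hstack([np.ones((m, 1)), X])
    A_margin = np.hstack([-t[:, None] * X_tilde, -np.eye(m)])
    # Slack constraints: -zeta_i <= 0
    A_slack = np.hstack([np.zeros((m, n + 1)), -np.eye(m)])

    A = np.vstack([A_margin, A_slack])
    b = np.concatenate([-np.ones(m), np.zeros(m)])
    return H, f, A, b

# Toy data, made up for illustration
X = np.array([[1.0, 1.0], [2.0, 1.0], [4.0, 4.0], [5.0, 4.0]])
t = np.array([-1.0, -1.0, 1.0, 1.0])
H, f, A, b = soft_margin_qp_params(X, t, C=1.0)

# A hand-picked feasible point: bias=-2.2, w=(0.4, 0.4), all slacks zero
p = np.array([-2.2, 0.4, 0.4, 0.0, 0.0, 0.0, 0.0])
objective = 0.5 * p @ H @ p + f @ p
print("shapes:", H.shape, f.shape, A.shape, b.shape)
print("feasible:", bool(np.all(A @ p <= b + 1e-9)), "objective:", objective)
```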

##### 8. On a linearly separable dataset, train a LinearSVC. Then, using the same dataset, train an SVC and an SGDClassifier. See if you can get them to produce roughly the same model.

In [None]:
from sklearn.datasets import make_blobs
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

# Generate two well-separated blobs, which are linearly separable in practice
X, y = make_blobs(n_samples=1000, centers=[[-5, -5], [5, 5]], cluster_std=1.0, random_state=42)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the inputs (fit the scaler on the training set only)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Use matching settings so the three models optimize comparable objectives:
# hinge loss, a linear kernel, and alpha = 1 / (m * C) for SGD
C = 1.0
alpha = 1.0 / (len(X_train_scaled) * C)

models = {
    "LinearSVC": LinearSVC(loss="hinge", C=C, dual=True, random_state=42),
    "SVC": SVC(kernel="linear", C=C),
    "SGDClassifier": SGDClassifier(loss="hinge", alpha=alpha, max_iter=1000, tol=1e-3, random_state=42),
}

# Train each model, then compare test accuracy and the learned parameters.
# Similar coef_ and intercept_ values mean the models are roughly the same;
# small differences are expected (LinearSVC also regularizes the bias term,
# and SGD only approximates the optimum).
for name, model in models.items():
    model.fit(X_train_scaled, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test_scaled))
    print(name, "accuracy:", accuracy)
    print(name, "coef:", model.coef_, "intercept:", model.intercept_)


##### 9. On the MNIST dataset, train an SVM classifier. You'll need to use one-versus-the-rest to classify all 10 digits because SVM classifiers are binary classifiers. To speed up the process, you might want to tune the hyperparameters using small validation sets. What accuracy can you achieve?

In [None]:
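from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Load scikit-learn's bundled digits dataset (8x8 images), a small stand-in for MNIST.
# The full 70,000-image MNIST set is available via fetch_openml("mnist_784").
digits = datasets.load_digits()
X = digits.data
y = digits.target

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train one binary RBF SVM per digit (one-versus-the-rest) on scaled inputs.
# SVC on its own always trains one-versus-one internally; decision_function_shape="ovr"
# only changes the shape of its decision function, so OneVsRestClassifier is used instead.
svm_classifier = make_pipeline(StandardScaler(), OneVsRestClassifier(SVC(kernel="rbf")))
svm_classifier.fit(X_train, y_train)

# Make predictions on the test set
y_pred = svm_classifier.predict(X_test)

# Calculate accuracy on the test set
accuracy = accuracy_score(y_test, y_pred)

# Print accuracy
print("Accuracy:", accuracy)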


##### 10. On the California housing dataset, train an SVM regressor.

In [None]:
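import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Load the California housing dataset
# (the target is the median house value in units of $100,000)
data = fetch_california_housing(as_frame=True)
X = data.data
y = data.target

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train an SVM regressor on scaled inputs (SVR is sensitive to feature scales)
svm_regressor = make_pipeline(StandardScaler(), SVR())
svm_regressor.fit(X_train, y_train)

# Make predictions on the test set
y_pred = svm_regressor.predict(X_test)

# Calculate mean squared error (MSE) and its square root (RMSE) on the test set
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)

# Print MSE and RMSE (RMSE is in the same units as the target)
print("Mean Squared Error:", mse)
print("Root Mean Squared Error:", rmse)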
