### 1. What is the underlying concept of Support Vector Machines?

Support Vector Machines (SVMs) can be used for both classification and regression tasks. The primary idea is to find the hyperplane that separates the classes of data with the widest possible margin, that is, with the largest distance between the decision boundary and the closest training points of each class.

A wide margin tends to produce a robust classifier, and with kernels SVMs can also handle classes that are not linearly separable, which makes them particularly useful for high-dimensional data. SVMs have been widely used in fields such as image classification, text categorization, and bioinformatics.
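
As a rough illustration of the widest-margin idea, the sketch below fits a linear SVM on a small synthetic dataset and reports the margin width, which for a linear SVM equals 2 / ||w||. The `make_blobs` data and the large `C` value are assumptions chosen for illustration, not part of the original answer.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic, well-separated two-class data (illustrative assumption)
X, y = make_blobs(n_samples=50, centers=2, cluster_std=0.6, random_state=0)

# A large C approximates a hard margin on separable data
clf = SVC(kernel="linear", C=1000.0)
clf.fit(X, y)

w = clf.coef_[0]        # normal vector of the separating hyperplane
b = clf.intercept_[0]   # offset of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)

print("Hyperplane: w =", w, "b =", b)
print("Margin width:", margin_width)
print("Training accuracy:", clf.score(X, y))
```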

-------------

### 2. What is the concept of a support vector?

The support vector is a fundamental concept in SVM. Support vectors are the data points from the training dataset that are closest to the decision boundary (hyperplane). They lie on the edge of the margin or, in a soft margin SVM, within it, where the margin is the region between the hyperplane and the closest data points of each class. The support vectors alone determine the decision boundary: adding or moving training points outside the margin does not change it.
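
To make this concrete, the following sketch (on assumed synthetic data from `make_blobs`) checks which training points a fitted scikit-learn `SVC` reports as support vectors. Points with a margin value `y * f(x)` of at most about 1 lie on or within the margin; the others lie outside it.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic two-class data (illustrative assumption)
X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.5, random_state=42)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Margin value y_i * f(x_i), with labels mapped to -1/+1
y_signed = np.where(y == clf.classes_[1], 1, -1)
margins = y_signed * clf.decision_function(X)

# clf.support_ holds the indices of the support vectors
is_sv = np.zeros(len(X), dtype=bool)
is_sv[clf.support_] = True

print("Number of support vectors:", is_sv.sum(), "out of", len(X))
print("Largest margin value among support vectors:", margins[is_sv].max())
print("Smallest margin value among the other points:", margins[~is_sv].min())
```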

------------

### 3. When using SVMs, why is it necessary to scale the inputs?

SVM aims to find the hyperplane that best separates the data points of different classes. The position and orientation of the decision boundary are influenced by the scale of the input features.
If the features have different scales, some dimensions may dominate the decision-making process, while others may be largely ignored. Scaling ensures that all features contribute more equally to the decision boundary.
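
A small experiment can show the effect. The sketch below is only an illustration: the Wine dataset bundled with scikit-learn and the default RBF kernel are assumptions, picked because the features have very different ranges.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Wine data: feature values range from fractions to above 1000
X, y = load_wine(return_X_y=True)

unscaled_svm = SVC(kernel="rbf")
# The pipeline fits the scaler on each training fold only
scaled_svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

unscaled_acc = cross_val_score(unscaled_svm, X, y, cv=5).mean()
scaled_acc = cross_val_score(scaled_svm, X, y, cv=5).mean()

print(f"Accuracy without scaling: {unscaled_acc:.3f}")
print(f"Accuracy with scaling:    {scaled_acc:.3f}")
```

Putting the scaler inside the pipeline keeps the scaling statistics from leaking from the validation folds into training.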

--------

### 4. When an SVM classifier classifies a case, can it output a confidence score? What about a percentage chance?


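Yes, it can output a confidence score, but not directly a percentage chance.

The confidence score is the output of the decision function: its sign determines the predicted class, and its magnitude reflects how far the instance is from the decision boundary. However, SVMs are not direct probability estimators like logistic regression, so this score cannot be read as a probability. In scikit-learn, setting probability=True when creating the SVC calibrates the scores into probabilities (using Platt scaling with internal cross-validation), which enables the predict_proba() method at the cost of slower training.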
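
The sketch below shows both kinds of output with scikit-learn's `SVC`. The synthetic dataset is an assumption for illustration; `decision_function` and `predict_proba` are the library's own methods, and `predict_proba` is only available when the model is created with `probability=True`.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic binary classification data (illustrative assumption)
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# probability=True adds a calibration step during fit, which slows training
clf = SVC(kernel="rbf", probability=True, random_state=0).fit(X, y)

scores = clf.decision_function(X[:3])  # sign gives the class, magnitude the confidence
probas = clf.predict_proba(X[:3])      # calibrated class probabilities, one row per instance

print("Confidence scores:", scores)
print("Class probabilities:", probas)
```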

------------

### 5. Should you train a model on a training set with millions of instances and hundreds of features using the primal or dual form of the SVM problem?

This question only applies to linear SVMs, since kernelized SVMs can only use the dual form. With millions of instances and hundreds of features, the number of instances is far larger than the number of features, so the primal form is the better choice: the cost of solving the dual grows much faster with the number of instances than the cost of solving the primal. On the other hand, if the number of features is much larger than the number of instances, the dual form could be a better choice, and it is required if you plan to use kernel methods for non-linear classification. When the two are close in cost, it is good practice to measure both approaches on your specific dataset.
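
In scikit-learn, `LinearSVC` exposes this choice through its `dual` parameter. The sketch below is a minimal example, assuming a synthetic dataset with many more instances than features; the sizes are kept small so that it runs quickly.

```python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Many more instances than features (sizes are illustrative assumptions)
X, y = make_classification(n_samples=5000, n_features=50, random_state=0)

# dual=False asks LinearSVC to solve the primal problem
primal_svm = LinearSVC(dual=False, C=1.0)
primal_svm.fit(X, y)

print("Coefficient shape:", primal_svm.coef_.shape)
print("Training accuracy:", primal_svm.score(X, y))
```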

---------

### 6. Let's say you've used an RBF kernel to train an SVM classifier, but it appears to underfit the training set. Is it better to raise or lower gamma? What about C?



When an SVM classifier with an RBF kernel appears to underfit the training data, you need to adjust the hyperparameters gamma and C to improve its performance.

The gamma parameter determines how far the influence of a single training example reaches, and therefore controls the shape of the decision boundary.
- Higher gamma values make the decision boundary more complex and can lead to overfitting, where the model memorizes the training data but performs poorly on unseen data.
- Lower gamma values make the decision boundary smoother and can lead to underfitting, where the model is too simple to capture the underlying patterns in the data.

Therefore, to address underfitting, you need to increase the gamma value.

The C parameter is the regularization parameter in SVM, which controls the trade-off between maximizing the margin (finding the widest possible margin) and minimizing the classification error on the training data.

- A larger C value penalizes margin violations more heavily, so the model tolerates fewer of them in the training data, which can lead to overfitting.
- A smaller C value favors a larger margin and tolerates more margin violations, which can lead to underfitting.

To address underfitting, you need to increase the C value.
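
The effect of both hyperparameters can be checked on a toy problem. The sketch below is an illustration under assumptions: the `make_moons` data and the specific gamma and C values are arbitrary choices, and `train_accuracy` is a hypothetical helper defined here.

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Non-linearly separable toy data (illustrative assumption)
X, y = make_moons(n_samples=300, noise=0.15, random_state=42)

def train_accuracy(gamma, C):
    """Fit a scaled RBF SVM and return its accuracy on the training set."""
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma=gamma, C=C))
    return model.fit(X, y).score(X, y)

low_acc = train_accuracy(gamma=0.01, C=0.01)   # heavily regularized: tends to underfit
high_acc = train_accuracy(gamma=5.0, C=10.0)   # less regularized: fits more closely

print(f"Training accuracy with low gamma and C:    {low_acc:.3f}")
print(f"Training accuracy with higher gamma and C: {high_acc:.3f}")
```

Raising gamma and C too far moves the model toward overfitting, so in practice both are tuned against a validation set.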

---------

### 7. To solve the soft margin linear SVM classifier problem with an off-the-shelf QP solver, how should the QP parameters (H, f, A, and b) be set?



To solve the soft margin linear SVM classifier problem using an off-the-shelf Quadratic Programming (QP) solver, you need to formulate the problem in the solver's standard form and set the corresponding parameters (H, f, A, and b).

Using the formulation 
(1/2) * α^T * H * α - f^T * α
as the objective to minimize (subject to linear constraints on α given by A and b), we set the parameters accordingly and then use the QP solver to find the optimal values of α. From there, you can recover the optimal values of w and b to obtain the linear SVM classifier.

For example, quadprog is an off-the-shelf QP solver available as a Python library. 
Its quadprog.solve_qp function solves quadratic programming problems of this form. 
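
One way to fill in the parameters, reading the formulation above as the dual soft margin problem, is sketched below. The assumptions are: m training instances (x_i, y_i) with labels y_i in {-1, +1}, the hyperparameter C, inequality constraints written as Aα ≤ b, and the equality constraint passed to the solver separately. The bias term is written b_0 here only to avoid clashing with the constraint vector b.

```latex
\begin{aligned}
\min_{\alpha}\quad & \tfrac{1}{2}\,\alpha^{T} H \alpha - f^{T}\alpha \\
\text{subject to}\quad & A\alpha \le b, \qquad y^{T}\alpha = 0 \\[4pt]
H_{ij} &= y_i\, y_j\, x_i^{T} x_j, \qquad f = \mathbf{1}_m \\
A &= \begin{bmatrix} -I_m \\ I_m \end{bmatrix}, \qquad
b = \begin{bmatrix} \mathbf{0}_m \\ C\,\mathbf{1}_m \end{bmatrix} \\[4pt]
w &= \sum_{i=1}^{m} \alpha_i\, y_i\, x_i, \qquad
b_0 = y_k - w^{T} x_k \ \text{ for any } k \text{ with } 0 < \alpha_k < C
\end{aligned}
```

The two blocks of A and b encode the box constraint 0 ≤ α_i ≤ C, and the last line recovers the linear classifier from the optimal α.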

-------------

### 8. On a linearly separable dataset, train a LinearSVC. Then, using the same dataset, train an SVC and an SGDClassifier. See if you can get them to make a model that is similar to yours.

In [None]:
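# Load a linearly separable dataset. The original does not name one, so as an
# assumption this uses Iris setosa vs. versicolor with the two petal features.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = load_iris()
mask = iris.target < 2
X, y = iris.data[mask][:, 2:], iris.target[mask]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the features, since SVMs are sensitive to feature scales
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

C = 5  # use the same regularization strength for all three models

# Train LinearSVC. Use the hinge loss so it optimizes the same objective as SVC.
from sklearn.svm import LinearSVC
linear_svc_model = LinearSVC(loss='hinge', C=C, dual=True, random_state=42)
linear_svc_model.fit(X_train, y_train)

# Train SVC. Since data is linearly separable, set kernel to linear.
from sklearn.svm import SVC
svc_model = SVC(kernel='linear', C=C)
svc_model.fit(X_train, y_train)

# SGDClassifier trains with stochastic gradient descent. Set loss to hinge to make it
# behave like a linear SVM, and set alpha = 1 / (m * C) to match the regularization.
from sklearn.linear_model import SGDClassifier
sgd_model = SGDClassifier(loss='hinge', alpha=1 / (len(X_train) * C), random_state=42)
sgd_model.fit(X_train, y_train)

# Model comparison: similar models should have similar coefficients and intercepts,
# not just similar accuracy. Use the score method to get the accuracy on the test set.
for name, model in [("LinearSVC", linear_svc_model), ("SVC", svc_model), ("SGDClassifier", sgd_model)]:
    print(name, "Accuracy:", model.score(X_test, y_test))
    print(name, "coef:", model.coef_, "intercept:", model.intercept_)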

---------



### 9. On the MNIST dataset, train an SVM classifier. You'll need to use one-versus-the-rest to classify all 10 digits because SVM classifiers are binary classifiers. To speed up the process, you might want to tune the hyperparameters using small validation sets. What accuracy can you achieve?

### Note: the code below takes a long time to run, because the MNIST dataset is large to download and the SVM is slow to train on it.

In [None]:
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the MNIST dataset as NumPy arrays
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist.data, mnist.target.astype(int)

# Scale the features to a range between 0 and 1
X = X / 255.0

# Split the data into training, validation, and test sets
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

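# Define the SVM classifier with a linear kernel.
# Note: SVC always trains one-versus-one models internally; decision_function_shape='ovr'
# only changes the shape of the decision function output. For true one-versus-the-rest
# training, wrap the classifier in sklearn.multiclass.OneVsRestClassifier.
svm_classifier = SVC(kernel='linear', decision_function_shape='ovr', random_state=42)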

# Define a range of C values to tune the hyperparameter
C_values = [0.1, 1, 10]

best_accuracy = 0
best_C = None

# Hyperparameter tuning using the validation set. To speed up the search, train on a
# small subset of the training set (the subset size of 10,000 is an arbitrary choice).
n_tune = 10000
for C in C_values:
    svm_classifier.set_params(C=C)
    svm_classifier.fit(X_train[:n_tune], y_train[:n_tune])
    y_val_pred = svm_classifier.predict(X_val)
    accuracy = accuracy_score(y_val, y_val_pred)

    if accuracy > best_accuracy:
        best_accuracy = accuracy
        best_C = C

# Train the SVM classifier with the best hyperparameter on the full training set
svm_classifier.set_params(C=best_C)
svm_classifier.fit(X_train, y_train)

# Evaluate the model on the test set
y_test_pred = svm_classifier.predict(X_test)
test_accuracy = accuracy_score(y_test, y_test_pred)

print(f"Best C value: {best_C}")
print(f"Validation Accuracy: {best_accuracy:.4f}")
print(f"Test Accuracy: {test_accuracy:.4f}")
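
For a version that runs in seconds, the sketch below applies an explicit one-versus-the-rest strategy to the small 8x8 digits dataset bundled with scikit-learn. This dataset is a stand-in for MNIST, not MNIST itself, and the RBF kernel and `C=10.0` are untuned assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Small 8x8 digits dataset bundled with scikit-learn (a stand-in for MNIST)
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# OneVsRestClassifier trains one binary SVM per digit
ovr_svm = make_pipeline(
    StandardScaler(),
    OneVsRestClassifier(SVC(kernel="rbf", C=10.0, gamma="scale")),
)
ovr_svm.fit(X_train, y_train)

n_binary_models = len(ovr_svm[-1].estimators_)
test_accuracy = ovr_svm.score(X_test, y_test)

print("Binary classifiers trained:", n_binary_models)
print(f"Test accuracy: {test_accuracy:.4f}")
```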

----------

### 10. On the California housing dataset, train an SVM regressor.

In [1]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the California housing dataset (it has no missing values, so scaling is the only preprocessing needed)
data = fetch_california_housing()
X, y = data.data, data.target

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

from sklearn.svm import SVR

# Create an SVM regressor (you can adjust kernel and other hyperparameters)
svm_regressor = SVR(kernel='linear', C=1.0)

# Train the SVM regressor on the scaled training data
svm_regressor.fit(X_train_scaled, y_train)

from sklearn.metrics import mean_squared_error, r2_score

# Predict on the test set
y_pred = svm_regressor.predict(X_test_scaled)

# Calculate mean squared error and R-squared score
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("R-squared Score:", r2)


Mean Squared Error: 0.5792049946323379
R-squared Score: 0.5579967748619474
