In [None]:
# 1. What is the underlying concept of Support Vector Machines?

Underlying Concept of SVM:
Support Vector Machines (SVM) are supervised learning models used for classification and regression tasks. 
The fundamental concept is to find the hyperplane that best separates data points of different classes in the feature space. 
SVM aims to maximize the margin between the closest data points (support vectors) of different classes and the hyperplane, 
thus achieving the best generalization for the classifier.
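
As a minimal sketch of the maximum-margin idea, the snippet below fits a linear `SVC` to six hand-picked points and recovers the margin width as 2 / ||w||. The toy data and the very large `C` (used to approximate a hard margin) are assumptions made for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Six hand-picked 2D points forming two separable groups (illustrative data)
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C leaves almost no room for margin violations (near hard margin)
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]        # normal vector of the separating hyperplane
b = clf.intercept_[0]   # bias term
margin_width = 2.0 / np.linalg.norm(w)  # distance between the two margin boundaries

print(f"w = {w}, b = {b:.3f}, margin width = {margin_width:.3f}")
```

The hyperplane is the set of points where w . x + b = 0, and the two margin boundaries are where that expression equals -1 and +1.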

# 2. What is the concept of a support vector?

Concept of a Support Vector:
Support vectors are the training instances that lie on the margin boundary or, with a soft margin, inside the margin or on the wrong side of it. 
These points alone determine the position and orientation of the hyperplane. Instances that lie outside the margin do not 
affect the solution: removing them, or moving them while they stay outside the margin, leaves the decision boundary unchanged.
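
A short sketch of this property, assuming synthetic, well-separated clusters from `make_blobs`: it lists the support vectors through the fitted model's `support_` attribute, then refits on those points alone and checks that the boundary is essentially the same.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters (synthetic data, assumed for illustration)
X, y = make_blobs(n_samples=100, centers=[[-3, -3], [3, 3]],
                  cluster_std=0.8, random_state=0)

clf = SVC(kernel="linear", C=1000).fit(X, y)
sv_idx = clf.support_  # indices of the training points that are support vectors
print(f"{len(sv_idx)} of {len(X)} training points are support vectors")

# Refit on the support vectors only and compare the resulting hyperplane
refit = SVC(kernel="linear", C=1000).fit(X[sv_idx], y[sv_idx])
same_boundary = (np.allclose(clf.coef_, refit.coef_, atol=1e-2)
                 and np.allclose(clf.intercept_, refit.intercept_, atol=1e-2))
print("Same boundary after dropping the other points:", same_boundary)
```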

# 3. When using SVMs, why is it necessary to scale the inputs?

Necessity of Scaling Inputs in SVM:
Scaling the inputs is necessary because an SVM fits the widest possible margin between the classes, and that margin is measured 
in the units of the features. If the features are on different scales, features with larger values dominate the distance computations 
and small-scale features are largely neglected, leading to a suboptimal hyperplane. Standardizing each feature (for example with 
`StandardScaler`) lets all features contribute comparably and usually improves convergence and performance.
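
To illustrate, the sketch below compares cross-validated accuracy of an `SVC` with and without standardization. The wine dataset is an assumption chosen because its features span very different ranges; it is not part of the original exercise.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Wine features range from fractions to values in the hundreds or more
X, y = load_wine(return_X_y=True)

# Same classifier, once on raw features and once behind a StandardScaler
raw_acc = cross_val_score(SVC(), X, y, cv=5).mean()
scaled_acc = cross_val_score(make_pipeline(StandardScaler(), SVC()), X, y, cv=5).mean()

print(f"unscaled: {raw_acc:.3f}  scaled: {scaled_acc:.3f}")
```

Putting the scaler inside the pipeline means it is fitted on each training fold only, so no information leaks from the validation folds.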

# 4. When an SVM classifier classifies a case, can it output a confidence score? What about a percentage chance?

Confidence Score and Percentage Chance in SVM:
An SVM classifier can output a confidence score: a value proportional to the signed distance between the instance and the decision boundary, 
available in Scikit-Learn through `decision_function()`. This score is not a probability. 
To get a probability estimate, set `probability=True` when creating an `SVC`. Scikit-Learn then calibrates the scores with Platt scaling 
(a logistic regression fitted on the SVM's scores using internal cross-validation), which enables `predict_proba()` at the cost of slower training.
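
A minimal sketch of both outputs, assuming two overlapping synthetic clusters and three hand-picked query points:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping clusters (synthetic data, assumed for illustration)
X, y = make_blobs(n_samples=200, centers=[[-1, -1], [1, 1]],
                  cluster_std=1.0, random_state=0)

# probability=True adds a calibration step so predict_proba() becomes available
clf = SVC(kernel="rbf", probability=True, random_state=0).fit(X, y)

X_new = np.array([[-2.0, -2.0], [0.0, 0.0], [2.0, 2.0]])
scores = clf.decision_function(X_new)  # signed scores: negative -> class 0, positive -> class 1
proba = clf.predict_proba(X_new)       # calibrated estimates, one column per class

print("scores:", scores)
print("probabilities:", proba)
```

Because the probabilities come from a separate calibration step, they can occasionally disagree with `predict()` for points close to the boundary.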

# 5. Should you train a model on a training set with millions of instances and hundreds of features using the primal or dual form of the SVM problem?

Primal vs Dual Form for Large Datasets:
With millions of instances and hundreds of features, use the primal form. This choice only arises for linear SVMs, because 
kernelized SVMs must be solved in the dual form. The training cost of the primal form grows roughly linearly with the number of 
instances m, whereas the cost of the dual form grows between m^2 and m^3, which is impractical at this scale. The dual form is 
the better choice when the number of instances is smaller than the number of features.
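
In Scikit-Learn, `LinearSVC` exposes this choice through its `dual` parameter. The sketch below selects the primal formulation on a synthetic dataset with many more instances than features; the dataset sizes are assumptions, far smaller than the millions of instances in the question.

```python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Many more instances than features (synthetic data, sizes chosen for illustration)
X, y = make_classification(n_samples=5000, n_features=20, n_informative=10,
                           random_state=0)

# dual=False asks LinearSVC to solve the primal optimization problem
clf = LinearSVC(dual=False, C=1.0).fit(X, y)
train_acc = clf.score(X, y)

print(f"training accuracy: {train_acc:.3f}")
```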

# 6. Let's say you've used an RBF kernel to train an SVM classifier, but it appears to underfit the training set. Is it better to raise or lower gamma? What about C?

Adjusting gamma and C for Underfitting:
If your SVM with an RBF kernel is underfitting, you should try increasing the gamma parameter. A higher gamma value 
makes the decision boundary more complex, which can help in capturing the patterns in the data. Similarly, increasing 
the C parameter will reduce the regularization strength, allowing the model to fit the training data more closely.
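
A small sketch of the effect, assuming the synthetic two-moons dataset and two hand-picked settings (neither comes from the original exercise): a low `gamma` with a low `C`, and a higher `gamma` with a higher `C`.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-circles, which a near-linear boundary cannot separate
X, y = make_moons(n_samples=200, noise=0.15, random_state=42)

# Low gamma and low C: a smooth, heavily regularized boundary
underfit = SVC(kernel="rbf", gamma=0.01, C=0.1).fit(X, y)
# Higher gamma and higher C: a more flexible, less regularized boundary
flexible = SVC(kernel="rbf", gamma=5, C=100).fit(X, y)

underfit_acc = underfit.score(X, y)
flexible_acc = flexible.score(X, y)
print(f"training accuracy, low gamma/C: {underfit_acc:.3f}")
print(f"training accuracy, high gamma/C: {flexible_acc:.3f}")
```

Training accuracy alone cannot show whether the flexible model overfits, so in practice both settings should be compared on a validation set.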

# 7. To solve the soft margin linear SVM classifier problem with an off-the-shelf QP solver, how should the QP parameters (H, f, A, and b) be set?

Setting QP Parameters for Soft Margin SVM:
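For a soft margin linear SVM classifier with m training instances, n features, and hyperparameter C, the QP solver minimizes 
(1/2) p^T H p + f^T p subject to A p <= b. The parameter vector p stacks the n weights, the bias term, and the m slack variables zeta_i (n + 1 + m entries in total):
- H is the (n + 1 + m) x (n + 1 + m) matrix with the n x n identity in its top-left block and zeros elsewhere, so only the weights are penalized quadratically.
- f is a vector of n + 1 zeros followed by m entries equal to C, which adds the slack cost C * sum(zeta_i).
- A has 2m rows. The first m rows encode the margin constraints -y_i (w . x_i + bias) - zeta_i <= -1, and the last m rows encode -zeta_i <= 0.
- b is a vector of m entries equal to -1 followed by m zeros.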
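
One common way to write the soft margin problem as a QP stacks the weights, the bias, and the slack variables into a single vector p = (w, b0, zeta), then minimizes (1/2) p^T H p + f^T p subject to A p <= b. The sketch below builds the four parameters with NumPy under that convention. The helper name `soft_margin_qp`, the name `b0` for the bias, and the toy data are assumptions made for illustration, and labels are assumed to be -1 or +1.

```python
import numpy as np

def soft_margin_qp(X, y, C):
    """Build H, f, A, b for: minimize 0.5 p^T H p + f^T p subject to A p <= b,
    where p = (w, b0, zeta). Labels in y must be -1 or +1."""
    m, n = X.shape
    n_p = n + 1 + m                      # weights + bias + one slack per instance

    H = np.zeros((n_p, n_p))
    H[:n, :n] = np.eye(n)                # quadratic penalty on the weights only

    f = np.zeros(n_p)
    f[n + 1:] = C                        # linear cost C * sum(zeta)

    Y = y.reshape(-1, 1).astype(float)
    # Margin constraints: -y_i (w . x_i + b0) - zeta_i <= -1
    margin_rows = np.hstack([-Y * X, -Y, -np.eye(m)])
    # Non-negative slack: -zeta_i <= 0
    slack_rows = np.hstack([np.zeros((m, n + 1)), -np.eye(m)])

    A = np.vstack([margin_rows, slack_rows])
    b = np.concatenate([-np.ones(m), np.zeros(m)])
    return H, f, A, b

# Toy data: six points, two features, labels in {-1, +1}
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

H, f, A, b = soft_margin_qp(X, y, C=1.0)
print(H.shape, f.shape, A.shape, b.shape)  # (9, 9) (9,) (12, 9) (12,)
```

The resulting arrays can be passed to any QP solver that accepts this standard form; the solver itself is left out here because the choice of library is open.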

In [2]:
# 8. On a linearly separable dataset, train a LinearSVC. Then, using the same dataset, train an SVC and an SGDClassifier. See if you can get them to produce roughly the same model.

# Training Different Classifiers on a Linearly Separable Dataset:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC, SVC
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score

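# Create a two-class, two-feature dataset. make_classification does not guarantee
# linear separability; a larger class_sep would separate the classes more cleanly.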
X, y = make_classification(n_samples=1000, n_features=2, n_informative=2, n_redundant=0, n_clusters_per_class=1, random_state=42)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a LinearSVC
linear_svc = LinearSVC(random_state=42)
linear_svc.fit(X_train, y_train)
linear_svc_pred = linear_svc.predict(X_test)

# Train an SVC with a linear kernel
svc = SVC(kernel="linear", random_state=42)
svc.fit(X_train, y_train)
svc_pred = svc.predict(X_test)

# Train an SGDClassifier
sgd_clf = SGDClassifier(loss="hinge", random_state=42)
sgd_clf.fit(X_train, y_train)
sgd_pred = sgd_clf.predict(X_test)

# Evaluate the accuracy of each model
linear_svc_acc = accuracy_score(y_test, linear_svc_pred)
svc_acc = accuracy_score(y_test, svc_pred)
sgd_acc = accuracy_score(y_test, sgd_pred)

linear_svc_acc, svc_acc, sgd_acc  # Accuracies are close but not identical: the hyperparameters are not matched across models

(0.9, 0.925, 0.915)
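
The exercise asks for similar models, not only similar accuracy. As a hedged sketch of one way to get there, the snippet below gives all three classifiers the hinge loss and an equivalent regularization strength (for `SGDClassifier`, `alpha = 1 / (m * C)`), then compares the directions of their weight vectors. The synthetic blobs, the value of `C`, and the helper `unit` are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

# Two overlapping clusters, standardized before training (synthetic data)
X, y = make_blobs(n_samples=400, centers=[[-1, -1], [1, 1]],
                  cluster_std=1.0, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

C = 1.0
m = len(X_scaled)

# Same loss (hinge) and matching regularization for all three models
lin_clf = LinearSVC(loss="hinge", C=C, dual=True, max_iter=10000,
                    random_state=42).fit(X_scaled, y)
svc_clf = SVC(kernel="linear", C=C).fit(X_scaled, y)
sgd_clf = SGDClassifier(loss="hinge", alpha=1 / (m * C), max_iter=2000,
                        tol=1e-4, random_state=42).fit(X_scaled, y)

def unit(v):
    """Return v scaled to unit length (hypothetical helper)."""
    return v / np.linalg.norm(v)

# Cosine similarity between weight vectors: 1.0 means identical orientation
cos_lin_svc = float(unit(lin_clf.coef_[0]) @ unit(svc_clf.coef_[0]))
cos_sgd_svc = float(unit(sgd_clf.coef_[0]) @ unit(svc_clf.coef_[0]))
print(f"LinearSVC vs SVC: {cos_lin_svc:.3f}  SGDClassifier vs SVC: {cos_sgd_svc:.3f}")
```

`LinearSVC` also regularizes the bias term and `SGDClassifier` is stochastic, so the three hyperplanes are expected to be close rather than identical.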

In [None]:
# 9. On the MNIST dataset, train an SVM classifier. You'll need to use one-versus-the-rest to classify all 10 digits because SVM classifiers are binary classifiers. To speed up the process, you might want to tune the hyperparameters using small validation sets. What accuracy can you achieve?

# Training an SVM on the MNIST Dataset:

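from sklearn.datasets import fetch_openml
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Load the MNIST dataset as NumPy arrays
mnist = fetch_openml("mnist_784", version=1, as_frame=False)
# Scale pixels from 0-255 to [0, 1]. On raw pixels these gamma values drive the
# RBF kernel to nearly zero between distinct images, and every candidate scores
# about 0.109 (chance level), as in the unscaled run logged below.
X, y = mnist["data"] / 255.0, mnist["target"]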

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Use a small subset for hyperparameter tuning
small_train_X, small_train_y = X_train[:5000], y_train[:5000]

# Grid search for best hyperparameters
param_grid = {'C': [1, 10, 100], 'gamma': [0.001, 0.01, 0.1]}
grid_search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=3, verbose=3)
grid_search.fit(small_train_X, small_train_y)

# Train the final model on the entire training set
best_svm = grid_search.best_estimator_
best_svm.fit(X_train, y_train)

# Evaluate the model on the test set
mnist_accuracy = best_svm.score(X_test, y_test)
mnist_accuracy  # Not measured in this run; tuned RBF SVMs on scaled MNIST typically reach roughly 97-98%

Fitting 3 folds for each of 9 candidates, totalling 27 fits
[CV 1/3] END ..................C=1, gamma=0.001;, score=0.109 total time=  25.1s
[CV 2/3] END ..................C=1, gamma=0.001;, score=0.109 total time=  27.3s
[CV 3/3] END ..................C=1, gamma=0.001;, score=0.109 total time=  35.0s
[CV 1/3] END ...................C=1, gamma=0.01;, score=0.109 total time=  21.7s
[CV 2/3] END ...................C=1, gamma=0.01;, score=0.109 total time=  19.2s
[CV 3/3] END ...................C=1, gamma=0.01;, score=0.109 total time=  21.0s
[CV 1/3] END ....................C=1, gamma=0.1;, score=0.109 total time=  20.5s
[CV 2/3] END ....................C=1, gamma=0.1;, score=0.109 total time=  22.0s
[CV 3/3] END ....................C=1, gamma=0.1;, score=0.109 total time=  19.6s
[CV 1/3] END .................C=10, gamma=0.001;, score=0.109 total time=  18.5s
[CV 2/3] END .................C=10, gamma=0.001;, score=0.109 total time=  18.4s
[CV 3/3] END .................C=10, gamma=0.001;,

In [3]:
# 10. On the California housing dataset, train an SVM regressor.

# Training an SVM Regressor on the California Housing Dataset:

from sklearn.datasets import fetch_california_housing
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load the California Housing dataset
housing = fetch_california_housing()
X, y = housing.data, housing.target

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a pipeline to scale the data and train an SVM regressor
svm_reg = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1, gamma="scale"))
svm_reg.fit(X_train, y_train)

# Evaluate the model on the test set
housing_score = svm_reg.score(X_test, y_test)
housing_score  # R^2 on the test set (about 0.73 with these settings)

0.7275639524733043