# Support Vector Machines

### Boğaziçi AI

by Güney Işık Tombak

### 1) Interactive Demo

Go to the website below and, using the interactive SVM, answer the following questions:

[SVM Demo](https://jgreitemann.github.io/svm-demo)

1.   Sort the following kernels from simplest to most complex: Radial Basis Function, Linear, Quadratic.

     Bonus: Prove your ordering mathematically. *Hint: Use the limit definition of the exponential function.*
2.   Which is more stable, a higher or a lower $\nu$? Why? What is the relation between $\nu$ and the number of support vectors? What should the value of $\nu$ be for a hard-margin SVM?
3.   What do you observe when you change $\gamma$ for the Radial Basis Function kernel? Which is more stable, a higher or a lower $\gamma$? Why? Again, explain using your observations.
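The sketch below makes the kernel questions more concrete. It uses `sklearn.metrics.pairwise` to evaluate the three kernels on a few made-up points. It follows scikit-learn's parameterization: $\exp(-\gamma \|\mathbf{x} - \mathbf{x}'\|^2)$ for RBF and $(\gamma\, \mathbf{x} \cdot \mathbf{x}' + r)^d$ for polynomial kernels. The demo may define these differently, and the points and parameter values are arbitrary.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

# Three made-up 2D points at distances 0, 1, and 2 from the origin
X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])

# Linear kernel: k(x, x') = x . x'
K_lin = linear_kernel(X)

# Quadratic kernel: k(x, x') = (gamma * x . x' + r) ** 2, here gamma = 1, r = 1
K_quad = polynomial_kernel(X, degree=2, gamma=1.0, coef0=1.0)

# RBF kernel: k(x, x') = exp(-gamma * ||x - x'||^2)
# A larger gamma makes the similarity between distant points decay faster
for gamma in [0.1, 1.0, 10.0]:
    K_rbf = rbf_kernel(X, gamma=gamma)
    print(f"gamma={gamma}: k(x0, x1)={K_rbf[0, 1]:.4f}, k(x0, x2)={K_rbf[0, 2]:.4f}")
```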

Your answer here

### 2) C Value

The mathematical formulation of a Soft-Margin SVM is:

$$\min_{\mathbf{w},c,\boldsymbol{\epsilon}} \frac{1}{2}\|\mathbf{w}\|^2 + K \sum_{i=1}^{N} \epsilon^{(i)}$$

$$\text{subject to } y^{(i)}(\mathbf{w} \cdot \mathbf{x}^{(i)} - c) \geq 1 - \epsilon^{(i)} \text{ and } \epsilon^{(i)} \geq 0$$ 

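Note that the first term is the regularization term: minimizing $\|\mathbf{w}\|^2$ widens the margin, whose width is $2/\|\mathbf{w}\|$. Training errors (margin violations, i.e. samples with $\epsilon^{(i)} > 0$) are therefore penalized only by the second term, which is weighted by $K$.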
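The following standalone sketch is separate from the exercise cells. It fits scikit-learn's linear `SVC` and reads off the slack variables $\epsilon^{(i)} = \max(0, 1 - y^{(i)}(\mathbf{w} \cdot \mathbf{x}^{(i)} - c))$, which links the formulation to code. It assumes two things:

- labels are mapped to $\pm 1$;
- scikit-learn's `intercept_` plays the role of $-c$, because its decision function is $\mathbf{w} \cdot \mathbf{x} + b$.

The dataset and `C=0.1` are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Same kind of blob data as the cells below; labels mapped from {0, 1} to {-1, +1}
X, y01 = make_blobs(n_samples=100, centers=2, random_state=0, cluster_std=0.60)
y = np.where(y01 == 1, 1, -1)

model = SVC(kernel='linear', C=0.1).fit(X, y)
w = model.coef_[0]          # weight vector w
c = -model.intercept_[0]    # scikit-learn's decision function is w . x + b, so c = -b

# Slack of each sample: eps_i = max(0, 1 - y_i (w . x_i - c))
margins = y * (X @ w - c)
eps = np.maximum(0.0, 1.0 - margins)

tol = 1e-3  # SVC's default solver tolerance; smaller slacks are treated as zero
print("0.5 * ||w||^2    :", 0.5 * w @ w)
print("sum of slacks    :", eps.sum())
print("margin violations:", int((eps > tol).sum()))
print("misclassified    :", int((eps > 1).sum()))
```

Rerunning it with different `C` values shows how the two terms of the objective trade off.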

Run the code below. Based on the results, explain the relationship between $\nu$, $K$, and `C` (e.g., "when A goes up, B also goes up"). Then repeat the experiment on at least 5 different datasets and comment on the new results.

*Note: The decision function plotting code is adapted from the [Python Data Science Handbook](https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/05.07-Support-Vector-Machines.ipynb) (MIT license).*

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
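from sklearn.datasets import make_blobs  # the old sklearn.datasets.samples_generator path was removed

# Seed NumPy's global generator for reproducibility; scikit-learn functions
# take their own random_state argument instead (as make_blobs does below)
random_state = 42
np.random.seed(random_state)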

def plot_svc_decision_function(model, ax=None, plot_support=True):
    """Plot the decision function for a 2D SVC"""
    if ax is None:
        ax = plt.gca()
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()
    
    # create grid to evaluate model
    x = np.linspace(xlim[0], xlim[1], 30)
    y = np.linspace(ylim[0], ylim[1], 30)
    Y, X = np.meshgrid(y, x)
    xy = np.vstack([X.ravel(), Y.ravel()]).T
    P = model.decision_function(xy).reshape(X.shape)
    
    # plot decision boundary and margins
    ax.contour(X, Y, P, colors='k',
               levels=[-1, 0, 1], alpha=0.5,
               linestyles=['--', '-', '--'])
    
    # plot support vectors as hollow circles; an explicit edge color keeps them visible
    if plot_support:
        ax.scatter(model.support_vectors_[:, 0],
                   model.support_vectors_[:, 1],
                   s=300, linewidth=1, facecolors='none', edgecolors='k')
    ax.set_xlim(xlim)
    ax.set_ylim(ylim)

In [None]:
X, y = make_blobs(n_samples=100, centers=2,
                  random_state=0, cluster_std=0.60)

C_list = [1, 0.1, 0.01, 0.001]

for C in C_list:

    model = SVC(kernel='linear', C=C, probability=True)
    model.fit(X, y)

    plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='jet')
    plot_svc_decision_function(model);

    plt.title(f'C={C}')
    plt.show()
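The cell above only varies `C`. One way to observe $\nu$ directly is scikit-learn's `NuSVC`, which replaces `C` with a `nu` parameter. The sketch below uses the same blob data and an arbitrary set of `nu` values. For each `nu`, it records the fraction of training points that become support vectors. It does not plot anything, but each fitted model could be passed to `plot_svc_decision_function` in the same way.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import NuSVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0, cluster_std=0.60)

nu_list = [0.05, 0.1, 0.3, 0.5]  # arbitrary values in (0, 1]
sv_fraction = {}
for nu in nu_list:
    model = NuSVC(nu=nu, kernel='linear').fit(X, y)
    # model.support_ holds the indices of the training points that are support vectors
    sv_fraction[nu] = len(model.support_) / len(X)
    print(f"nu={nu}: {len(model.support_)} support vectors "
          f"({sv_fraction[nu]:.0%} of the training set)")
```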
    

Your answer here

### 3) Multiclass Classification with Dichotomous Classifiers

When there are more than two classes, SVMs can be applied in several ways. A basic one is to train one dichotomous (two-class) SVM for each class, which separates that class from all the others (one-vs-rest).
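scikit-learn also provides this one-vs-rest scheme as `OneVsRestClassifier`. It is not the custom estimator asked for below, but it may be a useful reference to compare against. Its `predict` may combine the binary classifiers differently from a probability-based rule, so results need not match exactly. Below is a minimal sketch on the full Iris data, with an arbitrary kernel and `C`:

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Fits one binary SVC per class, each trained as "this class vs. the rest"
ovr = OneVsRestClassifier(SVC(kernel='linear', C=1)).fit(X, y)
print(len(ovr.estimators_))  # number of fitted binary classifiers
print(ovr.predict(X[:5]))
```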

For this exercise, first look at how the scikit-learn team used SVM classifiers to classify the Iris dataset [[1]](https://scikit-learn.org/stable/auto_examples/svm/plot_iris_svc.html).

Then, rather than using `SVC` as a multiclass classifier, create a binary dataset for each class (that class vs. all others) and train one dichotomous SVM per class with probability estimates enabled. You can do this with:

```
from sklearn import svm

clf = svm.SVC(kernel='linear', C=C, probability=True)
clf.fit(X, y)
clf.predict_proba(X)
```
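The columns returned by `predict_proba` follow the order of the fitted classifier's `classes_` attribute, so it is worth checking which column holds the positive label. In the sketch below, "class 2 vs. the rest" is an arbitrary choice. Probability estimates are fitted with internal cross-validation, so `random_state` is set for repeatability.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
y_binary = (y == 2).astype(int)  # 1 for the chosen class (here class 2), 0 for the rest

clf = svm.SVC(kernel='linear', C=1, probability=True, random_state=0)
clf.fit(X, y_binary)
proba = clf.predict_proba(X)

# Columns of predict_proba follow clf.classes_ (sorted labels), here [0, 1]
pos_col = list(clf.classes_).index(1)
p_positive = proba[:, pos_col]
print(clf.classes_, proba.shape)
```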

Using the probability outputs of all three classifiers, determine the predicted class. Implement this as a custom scikit-learn estimator; details on writing custom estimators are given in [[2]](https://scikit-learn.org/stable/developers/develop.html).
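The sketch below illustrates the main conventions from [2] using a hypothetical toy classifier, `MajorityClassifier`, that is unrelated to SVMs:

- `__init__` only stores its arguments;
- `fit` returns `self`;
- learned attributes end with an underscore;
- the mixin classes are listed before `BaseEstimator`.

These conventions are what make `get_params`, `clone`, and the mixin's `score` work.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone

class MajorityClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical toy classifier: predicts `constant` if given, else the most frequent label."""

    def __init__(self, constant=None):
        # Only store constructor arguments, unchanged and under the same name
        self.constant = constant

    def fit(self, X, y):
        y = np.asarray(y)
        # Learned attributes get a trailing underscore
        self.classes_, counts = np.unique(y, return_counts=True)
        if self.constant is not None:
            self.prediction_ = self.constant
        else:
            self.prediction_ = self.classes_[np.argmax(counts)]
        return self  # fit returns the estimator itself

    def predict(self, X):
        return np.full(len(X), self.prediction_)

X_toy, y_toy = np.zeros((4, 2)), [0, 1, 1, 2]
est = MajorityClassifier().fit(X_toy, y_toy)
print(est.predict(X_toy[:2]))   # [1 1]
print(est.get_params())         # {'constant': None}
print(est.score(X_toy, y_toy))  # 0.5 (accuracy, provided by ClassifierMixin)
unfitted_copy = clone(est)      # same parameters, no learned attributes
```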

Use the kernel types linear (`kernel='linear'`), RBF (`kernel='rbf'`), quadratic, and cubic (i.e., `kernel='poly'` with `degree=2` and `degree=3`), each with the `C` values `[1, 0.1, 0.01, 0.001]`.
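The 16 configurations can be enumerated as plain parameter dicts, for example to pass as `SVC(**params)` or as the `svc_params_dict` argument of the starter class below. The `(name, params)` layout is an assumption chosen to make subplot titles easy to build.

```python
from itertools import product

# Kernel settings from the exercise: quadratic and cubic are 'poly' with degree 2 and 3
kernel_settings = [
    ('linear', {'kernel': 'linear'}),
    ('rbf', {'kernel': 'rbf'}),
    ('quadratic', {'kernel': 'poly', 'degree': 2}),
    ('cubic', {'kernel': 'poly', 'degree': 3}),
]
C_list = [1, 0.1, 0.01, 0.001]

# One SVC parameter dict per (kernel, C) pair: 4 x 4 = 16 configurations
configs = [
    (name, {**params, 'C': C})
    for (name, params), C in product(kernel_settings, C_list)
]
for name, params in configs[:3]:
    print(name, params)
```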

Plot each result in a 4-by-4 grid using the visualization given in [[1]](https://scikit-learn.org/stable/auto_examples/svm/plot_iris_svc.html), and put each model's F1 score (`sklearn.metrics.f1_score`) in its subplot title along with its `C` value and kernel type.
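`f1_score` defaults to `average='binary'`, which raises an error for three classes, so you must choose an averaging strategy. The sketch below assumes `'macro'` (the unweighted mean of the per-class F1 scores), which is one reasonable option. It computes the score on made-up labels and writes it into the title of one subplot of a 4-by-4 grid.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import f1_score

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]  # made-up predictions for illustration

# Multiclass targets need an averaging strategy; 'macro' averages per-class F1 equally
f1 = f1_score(y_true, y_pred, average='macro')

fig, axes = plt.subplots(4, 4, figsize=(16, 16))
# axes[row, col]: e.g. row = kernel type, column = C value
axes[0, 0].set_title(f"linear, C=1, F1={f1:.2f}")
```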

As in the scikit-learn example, use only sepal length and sepal width to make visualization easier. Comment on your results at the end.
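In scikit-learn's copy of the Iris dataset, the two sepal measurements are the first two feature columns. You can check this via `feature_names`:

```python
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.feature_names[:2])  # ['sepal length (cm)', 'sepal width (cm)']
X = iris.data[:, :2]           # keep only the two sepal features
y = iris.target                # three classes: 0, 1, 2
```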

In [None]:
# Your answer here
# The one below is an unfinished class
# It might have some errors so please use with caution!
# You can use it as a starting point or directly start from scratch

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.svm import SVC

class dichSVC(ClassifierMixin, BaseEstimator):

    def __init__(self, svc_params_dict=None, n_classes=2):
        # scikit-learn convention: only store the arguments, unchanged and
        # under the same names, so that get_params() and clone() work
        self.svc_params_dict = svc_params_dict
        self.n_classes = n_classes

    def fit(self, samples, targets):

        # Copy so the caller's dict is not modified, then enable probabilities
        params = dict(self.svc_params_dict or {})
        params['probability'] = True

        # One dichotomous SVC for two classes, one SVC per class otherwise
        self.n_classifiers_ = 1 if self.n_classes == 2 else self.n_classes
        self.base_estimator_list_ = [SVC(**params) for _ in range(self.n_classifiers_)]

        for class_id in range(self.n_classifiers_):
            dich_targets = self.dichotomize_targets(targets, class_id)
            self.base_estimator_list_[class_id].fit(samples, dich_targets)

        return self  # scikit-learn convention: fit returns the estimator

    def dichotomize_targets(self, targets, class_id):
        # TODO: return binary targets for "class_id vs. the rest"
        pass

    def predict_proba(self, samples):

        n_samples = samples.shape[0]
        # scikit-learn convention: one row per sample, one column per classifier
        probs = np.zeros((n_samples, self.n_classifiers_))

        for class_id in range(self.n_classifiers_):
            prob_i = self.base_estimator_list_[class_id].predict_proba(samples)
            # Columns of prob_i follow the fitted SVC's classes_ attribute;
            # check which column belongs to the positive label
            probs[:, class_id] = prob_i[:, 0]  # or might be 1, check!

        return probs

    def predict(self, samples):
        # You might want to use self.predict_proba here
        pass

    def fit_predict(self, samples, targets):

        self.fit(samples, targets)
        preds = self.predict(samples)

        return preds



In [None]:
# Your answer here

Your answer here