## Use case
 - Similar use cases to $k$-NN: classification (`SVC`) and regression (`SVR`)


## What is it?
 - **Analogy-Based Model**
    - i.e. assigns nearby points the same label
 - Similar to weighted $k$-NN; **however**, the decision boundary depends only on the *support vectors* (a subset of key examples in the training set)
 - The decision boundary is defined by a subset of positive and negative examples, their weights, and a similarity measure
   - A test example is classified as positive if it looks more like the positive examples than the negative ones
   - The similarity measure is called the *kernel*
     - Popular kernel: Radial Basis Function (RBF), $k(\mathbf{x}, \mathbf{x}') = \exp(-\gamma \lVert \mathbf{x} - \mathbf{x}' \rVert^2)$
   - The decision boundary $\sim$ a smoothed version of $k$-NN's decision boundary
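The "weighted vote of support vectors" idea can be checked directly. Below is a minimal sketch that assumes scikit-learn's binary `SVC` and a synthetic two-blob dataset from `make_blobs`, which stands in for the cities data. It recomputes the decision function by hand as $\sum_{i \in \text{SV}} \alpha_i y_i \, k(\mathbf{x}_i, \mathbf{x}) + b$, summing over the support vectors only. In scikit-learn, `dual_coef_` holds the weights $\alpha_i y_i$ and `intercept_` holds $b$. `rbf_similarity` is a hypothetical helper written for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC


def rbf_similarity(A, B, gamma):
    """Hypothetical helper: RBF kernel exp(-gamma * ||a - b||^2) for every row a of A and row b of B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)


# Synthetic two-class data standing in for the cities dataset
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

gamma = 0.5
svm = SVC(kernel="rbf", gamma=gamma).fit(X, y)

X_new = X[:5]  # a few query points

# Similarity of each query point to each support vector: shape (n_support_vectors, n_queries)
K = rbf_similarity(svm.support_vectors_, X_new, gamma)

# Weighted vote: dual_coef_ holds alpha_i * y_i per support vector, intercept_ holds b
manual = (svm.dual_coef_ @ K + svm.intercept_).ravel()

print(np.allclose(manual, svm.decision_function(X_new)))
print(len(svm.support_), "of", len(X), "training points are support vectors")
```

A positive value means the query looks more like the class `svm.classes_[1]`. Training points that are not support vectors never appear in the sum.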

 

## How?

### Classification

```python
#| code-summary: prepare X_train, X_test, y_train, y_test
#| code-fold: true
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("data/canada_usa_cities.csv")

y, X = df.pop("country"), df
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123
)

pd.concat([X_train, y_train], axis=1).head()
```

|     | longitude | latitude | country |
|----:|----------:|---------:|:--------|
| 160 | -76.4813  | 44.2307  | Canada  |
| 127 | -81.2496  | 42.9837  | Canada  |
| 169 | -66.058   | 45.2788  | Canada  |
| 188 | -73.2533  | 45.3057  | Canada  |
| 187 | -67.9245  | 47.1652  | Canada  |


```python
from sklearn.svm import SVC

svm = SVC(kernel='rbf', gamma=0.01)
svm.fit(X_train, y_train)
svm.predict(X_test)
```

```
array(['Canada', 'Canada', 'Canada', 'Canada', 'Canada', 'Canada',
       'Canada', 'Canada', 'Canada', 'USA', 'USA', 'Canada', 'Canada',
       'Canada', 'Canada', 'USA', 'Canada', 'USA', 'Canada', 'Canada',
       'Canada', 'Canada', 'Canada', 'Canada', 'Canada', 'Canada',
       'Canada', 'Canada', 'Canada', 'Canada', 'Canada', 'USA', 'Canada',
       'Canada', 'Canada', 'Canada', 'Canada', 'USA', 'USA', 'Canada',
       'Canada', 'Canada'], dtype=object)
```

```python
svm.score(X_test, y_test)  # accuracy
```

```
0.8333333333333334
```

```python
# FIXME: visualization: Plot_support_vectors
```
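One way to see which training points are support vectors is to plot them. This is a minimal sketch that assumes matplotlib and synthetic 2-D blobs standing in for (longitude, latitude). It is not the original notebook's plotting helper, which is not shown. The support vectors are circled in black.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic 2-D data standing in for (longitude, latitude)
X, y = make_blobs(n_samples=100, centers=2, random_state=0)
svm = SVC(kernel="rbf", gamma=0.5).fit(X, y)

fig, ax = plt.subplots()
# All training points, coloured by class
ax.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", s=20)
# Circle the support vectors: the only training points the decision boundary depends on
ax.scatter(
    svm.support_vectors_[:, 0],
    svm.support_vectors_[:, 1],
    s=120,
    facecolors="none",
    edgecolors="k",
    label="support vectors",
)
ax.set_xlabel("feature 1")
ax.set_ylabel("feature 2")
ax.set_title(f"{len(svm.support_)} of {len(X)} training points are support vectors")
ax.legend()
plt.show()
```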

### Regression

```python
#| code-summary: prepare X_train, X_test, y_train, y_test
#| code-fold: true
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("data/quiz2-grade-toy-regression.csv")
df = df[['lab1', 'lab2', 'lab3', 'lab4', 'quiz1', 'quiz2']]

y, X = df.pop("quiz2"), df
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123
)

pd.concat([X_train, y_train], axis=1).head()
```

|   | lab1 | lab2 | lab3 | lab4 | quiz1 | quiz2 |
|--:|-----:|-----:|-----:|-----:|------:|------:|
| 4 | 77   | 83   | 90   | 92   | 85    | 90    |
| 0 | 92   | 93   | 84   | 91   | 92    | 90    |
| 2 | 78   | 85   | 83   | 80   | 80    | 82    |
| 5 | 70   | 73   | 68   | 74   | 71    | 75    |
| 6 | 80   | 88   | 89   | 88   | 91    | 91    |


```python
from sklearn.svm import SVR

svm = SVR(kernel='rbf', gamma=0.05)
svm.fit(X_train, y_train)
svm.predict(X_test)
```

```
array([89.39814152, 89.40557499])
```

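```python
svm.score(X_test, y_test)  # R^2; negative means worse than always predicting the mean of y_test
```

```
-0.12096790646548117
```

The test set here has only two examples, so this score is a very noisy estimate.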
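$R^2$ can go negative because $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$. It drops below zero whenever the model's squared error exceeds that of always predicting the mean $\bar{y}$. Here is a toy check with made-up numbers, assuming `sklearn.metrics.r2_score`:

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up targets and deliberately bad predictions
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([3.0, 2.0, 1.0])

ss_res = ((y_true - y_pred) ** 2).sum()         # 4 + 0 + 4 = 8
ss_tot = ((y_true - y_true.mean()) ** 2).sum()  # 1 + 0 + 1 = 2
r2_manual = 1 - ss_res / ss_tot                 # 1 - 8/2 = -3

r2_lib = r2_score(y_true, y_pred)
# Predicting the mean of y_true for every example gives R^2 = 0
r2_mean = r2_score(y_true, np.full_like(y_true, y_true.mean()))

print(r2_manual, r2_lib, r2_mean)  # -3.0 -3.0 0.0
```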

## Hyperparameters
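 - `gamma`
   - Controls the width of the RBF kernel, and hence the model complexity
   - larger $\rightarrow$ each training example's influence is more local $\rightarrow$ more complex boundary (risk of overfitting)
   - smaller $\rightarrow$ smoother boundary $\rightarrow$ less complex (risk of underfitting)
 - `C`
   - Controls regularization (it is the inverse of the regularization strength)
   - larger $\rightarrow$ fits the training examples more closely $\rightarrow$ more complex
   - smaller $\rightarrow$ stronger regularization $\rightarrow$ less complex
 - Default: all features are treated as equally important
   - Which hyperparameter controls the weighting of features? Neither `gamma` nor `C`. The RBF kernel uses plain Euclidean distance, so each feature's influence is set by its scale, which is why feature scaling (e.g. `StandardScaler`) matters.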
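The effect of `gamma` shows up in the kernel itself. In the RBF kernel, similarity decays faster as `gamma` grows, so each support vector influences only a small neighbourhood and the boundary can bend more. Below is a small sketch using `sklearn.metrics.pairwise.rbf_kernel` on two made-up points at distance 2:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Two made-up points at distance 2, so ||a - b||^2 = 4
a = np.array([[0.0, 0.0]])
b = np.array([[2.0, 0.0]])

similarities = {}
for gamma in [0.01, 0.1, 1.0]:
    # rbf_kernel computes exp(-gamma * ||a - b||^2)
    similarities[gamma] = rbf_kernel(a, b, gamma=gamma)[0, 0]
    print(f"gamma={gamma}: similarity={similarities[gamma]:.4f}")

# gamma=0.01: similarity=0.9608
# gamma=0.1: similarity=0.6703
# gamma=1.0: similarity=0.0183
```

With `gamma=0.01` the two points still look very similar. With `gamma=1.0` they barely influence each other.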


## Pros 
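 - Prediction time and memory are usually better than $k$-NN
   - The fitted model only keeps the support vectors, while $k$-NN keeps the entire training set
   - No significant difference for small datasets
   - But a large speed and memory difference for large datasets
   - Training, however, is slower: `SVC` fit time scales at least quadratically with the number of samples
 - Often more accurate than $k$-NN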
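To see the memory difference, compare how many training examples each fitted model keeps. This is a minimal sketch that assumes synthetic data from `make_classification`. The actual number of support vectors depends on the data and on `C` and `gamma`, and noisy, overlapping classes can produce many of them.

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic data; the support-vector count depends on the data, C and gamma
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
svm = SVC(kernel="rbf").fit(X, y)

# k-NN keeps every training example to search through at prediction time
print("k-NN stores:", knn.n_samples_fit_, "examples")
# SVC only keeps its support vectors (plus dual coefficients and an intercept)
print("SVC stores: ", svm.support_vectors_.shape[0], "support vectors")
```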



## Cons


## Remarks

 - `svm.support_` gives the indices of support vectors
 - To tune the two hyperparameters `gamma` and `C`, use
   - `sklearn.model_selection.GridSearchCV`
   - `sklearn.model_selection.RandomizedSearchCV`
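A minimal sketch of tuning both hyperparameters with `GridSearchCV` follows. It assumes synthetic data from `make_classification` in place of the cities data, and the log-spaced grid values are illustrative assumptions. It also reads `support_` from the refit best model.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123
)

# Illustrative log-spaced grid over both hyperparameters
param_grid = {
    "gamma": [0.001, 0.01, 0.1, 1.0, 10.0],
    "C": [0.1, 1.0, 10.0, 100.0],
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_train, y_train)

print(search.best_params_)
print(search.best_score_)  # mean cross-validation accuracy of the best combination
# Indices (into X_train) of the best model's support vectors
print(len(search.best_estimator_.support_), "support vectors")
print(search.score(X_test, y_test))  # test accuracy of the model refit on all of X_train
```

`RandomizedSearchCV` is used the same way, but it takes `param_distributions` and `n_iter`. Instead of trying every combination, it samples values (for example from `scipy.stats.loguniform`).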


### Curse of dimensionality
 - If there are many irrelevant features, the model can get confused
   - because accidental similarity in the irrelevant features swamps out meaningful similarity in the relevant ones
   - Performance may degrade to random guessing $\rightarrow$ no better than a dummy classifier
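Here is a small illustration of accidental similarity swamping meaningful similarity, assuming the RBF kernel with `gamma=1.0` and made-up data. Two examples are identical on their two meaningful features. Once 100 random, irrelevant features are appended, they look almost completely dissimilar, because the irrelevant features dominate the Euclidean distance inside the kernel.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)

# Two made-up examples that are identical on their 2 meaningful features
a_rel = np.array([[1.0, 2.0]])
b_rel = a_rel.copy()

# Append 100 irrelevant features (uniform random noise in [0, 1)) to each example
a = np.hstack([a_rel, rng.random((1, 100))])
b = np.hstack([b_rel, rng.random((1, 100))])

gamma = 1.0
sim_clean = rbf_kernel(a_rel, b_rel, gamma=gamma)[0, 0]  # ~1: identical where it matters
sim_noisy = rbf_kernel(a, b, gamma=gamma)[0, 0]          # ~0: noise dominates the distance

print(f"meaningful features only: {sim_clean:.3f}")
print(f"with 100 noise features:  {sim_noisy:.2e}")
```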


### ?SVC

```python
?SVC
```

```
Init signature:
SVC(
    *,
    C=1.0,
    kernel='rbf',
    degree=3,
    gamma='scale',
    coef0=0.0,
    shrinking=True,
    probability=False,
    tol=0.001,
    cache_size=200,
    class_weight=None,
    verbose=False,
```

### ?SVR

```python
?SVR
```

```
Init signature:
SVR(
    *,
    kernel='rbf',
    degree=3,
    gamma='scale',
    coef0=0.0,
    tol=0.001,
    C=1.0,
    epsilon=0.1,
    shrinking=True,
    cache_size=200,
    verbose=False,
    max_iter=-1,
```