#### Support Vector Machines

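Hyperplane - In a $p$-dimensional space, a hyperplane is a flat affine subspace of dimension $p-1$. In 2 dimensions it is a line; in 3 dimensions it is a plane.

In 2 dimensions, the equation of a hyperplane is $ \beta_0 + \beta_1X_1 + \beta_2X_2 = 0 $

$X_1,X_2$ are the coordinates of a point $X = (X_1, X_2)^T$ in 2-dimensional space; any $X$ that satisfies the equation lies on the hyperplane. 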
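The sign of $\beta_0 + \beta_1X_1 + \beta_2X_2$ tells which side of the hyperplane a point falls on. A minimal sketch of that check, assuming made-up coefficients (`beta_0`, `beta` and the helper `hyperplane_value` are illustrative names, not from any library):

```python
import numpy as np

# Hypothetical coefficients for the line 1 + 2*X1 - 3*X2 = 0 (illustration only).
beta_0 = 1.0
beta = np.array([2.0, -3.0])

def hyperplane_value(point):
    """Return beta_0 + beta_1*X_1 + beta_2*X_2 for a 2-D point."""
    return beta_0 + beta @ np.asarray(point, dtype=float)

on_plane = hyperplane_value([1.0, 1.0])   # 1 + 2 - 3 = 0: the point lies on the line
positive = hyperplane_value([2.0, 0.0])   # 1 + 4 - 0 = 5: one side of the line
negative = hyperplane_value([0.0, 2.0])   # 1 + 0 - 6 = -5: the other side

print(on_plane, positive, negative)
```

A classifier built on a hyperplane assigns the class from this sign alone.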

##### Maximal Margin Classifier 
>is the separating hyperplane for which the margin is largest - that is, the hyperplane that has the farthest minimum distance to the training observations. 

It requires the classes to be separable by a linear boundary, and is also called the optimal separating hyperplane.

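The classifier maximizes the margin $M$ between the hyperplane and the closest training observations (the support vectors):

$$ \underset{\beta_0,\beta_1,\ldots,\beta_p}{\text{maximize}}\ M $$
$$ \text{subject to } \sum_{j=1}^{p} \beta_j^2 = 1 $$
$$ y_i(\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\cdots+\beta_px_{ip}) \ge M \quad \text{for all } i=1,\ldots,n $$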

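The distance of an observation from the maximal margin hyperplane can be seen as a measure of our confidence that the observation was correctly classified: the farther it lies from the hyperplane, the more confident the classification. 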
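To make the margin concrete, here is a small sketch that measures it for one candidate hyperplane. The toy data and the coefficients are invented for illustration; the code only evaluates the margin of the given hyperplane and does not search for the best one.

```python
import numpy as np

# Toy, linearly separable data (invented for illustration); labels are -1 / +1.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A candidate separating hyperplane X1 + X2 - 3.5 = 0 (hypothetical coefficients).
beta_0 = -3.5
beta = np.array([1.0, 1.0])

# Rescale so that sum(beta_j^2) = 1; then y_i * f(x_i) is the distance
# of observation i from the hyperplane, positive when it is on the correct side.
norm = np.linalg.norm(beta)
beta_0, beta = beta_0 / norm, beta / norm

signed_dist = y * (beta_0 + X @ beta)
margin = signed_dist.min()   # M for this hyperplane: the smallest distance

print(signed_dist.round(4), round(margin, 4))
```

The observations that attain the minimum distance are the ones that would act as support vectors for this hyperplane.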

**Negatives**
If the classes are not linearly separable, no maximal margin hyperplane exists. Even when they are separable, adding a single observation can change the hyperplane dramatically and shrink the margin, so the classifier is very sensitive to individual observations, may overfit the training data, and may not generalize well. 

##### Support Vector Classifier
also called the soft-margin classifier
>an extension of the maximal margin classifier that applies to a broader range of cases. 
    
    Greater robustness to individual observations
    Better classification of most of the training observations
    
Rather than seeking the largest possible margin such that every observation is not only on the correct side of the hyperplane but also on the correct side of the margin, we allow a few observations to be on the wrong side of the margin or even on the wrong side of the hyperplane. 

$$ \underset{\beta_0,\beta_1,\ldots,\beta_p}{\text{maximize}}\ M $$
$$ \text{subject to } \sum_{j=1}^{p} \beta_j^2 = 1$$
$$ y_i(\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\cdots+\beta_px_{ip}) \ge M(1-\epsilon_i)$$
$$ \epsilon_i \ge 0, \sum_{i=1}^n\epsilon_i \le C $$

$\epsilon_1,\ldots,\epsilon_n$ are slack variables that allow individual observations to be on the wrong side of the margin or the hyperplane. 
when $\epsilon_i = 0$ the ith observation is on the correct side of the margin. <br>
when $\epsilon_i > 0$ the ith observation is on the wrong side of the margin. <br>
when $\epsilon_i > 1$ the ith observation is on the wrong side of the hyperplane.<br>
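The three cases follow from the constraint $y_i f(x_i) \ge M(1-\epsilon_i)$: the smallest slack that satisfies it is $\epsilon_i = \max(0, 1 - y_i f(x_i)/M)$. A minimal sketch, assuming invented values of $f(x_i)$ and an assumed margin `M` (the helpers `slack` and `region` are hypothetical names):

```python
import numpy as np

def slack(y, f_values, M):
    """Smallest eps_i >= 0 satisfying y_i * f(x_i) >= M * (1 - eps_i)."""
    return np.maximum(0.0, 1.0 - y * f_values / M)

def region(eps_i):
    """Map a slack value to the cases listed above."""
    if eps_i == 0:
        return "correct side of margin"
    if eps_i <= 1:
        return "wrong side of margin"
    return "wrong side of hyperplane"

# Hypothetical values of f(x_i) for four observations, all labeled +1,
# with an assumed margin M = 2 (illustration only).
y = np.array([1, 1, 1, 1])
f_values = np.array([3.0, 2.0, 1.0, -1.0])
M = 2.0

eps = slack(y, f_values, M)
for value in eps:
    print(value, region(value))
```

Here the slacks are 0, 0, 0.5 and 1.5, so the budget $C$ would need to be at least 2 to allow this configuration.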

$C$ bounds the sum of the $\epsilon_i$, so it acts as a budget that determines the number and severity of the margin violations we tolerate. 

If $C = 0$ there is no budget for violations, so every $\epsilon_i = 0$ and the problem reduces to the maximal margin classifier. As $C$ grows, more violations are tolerated and the margin widens. 

Observations that lie directly on the margin or on the wrong side of the margin for their class are called support vectors; only these observations affect the classifier. 
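One caveat when moving to scikit-learn: its `C` parameter is a penalty on violations, so it runs opposite to the budget $C$ above (a small scikit-learn `C` tolerates more violations). The sketch below, assuming the same two petal features used later in these notes, counts support vectors at two penalty levels; `count_support_vectors` is a hypothetical helper.

```python
from sklearn import datasets
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]              # petal length, petal width
y = (iris["target"] == 2).astype(int)    # 1 for Iris virginica, else 0

def count_support_vectors(C):
    """Fit a linear soft-margin classifier and return its number of support vectors."""
    model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=C))
    model.fit(X, y)
    return int(model.named_steps["svc"].n_support_.sum())

n_loose = count_support_vectors(0.01)    # weak penalty: wide margin, many violations
n_strict = count_support_vectors(100.0)  # strong penalty: narrow margin, few violations

print(n_loose, n_strict)
```

A wider margin puts more observations on or inside it, which is why the weakly penalized fit should end up with more support vectors.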

**Negatives** 
Works well when there are 2 classes and the boundary between them is linear. If the boundary is non-linear, this approach performs poorly. 

##### Support Vector Machines

>an extension of the support vector classifier to accommodate non-linear class boundaries. 

The idea is to enlarge the feature space of the support vector classifier using kernels. 
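A rough sketch of what a kernel buys, assuming a synthetic two-moons dataset and hand-picked hyperparameters (`gamma=2.0`, `C=10.0` are illustrative choices, not tuned values): a linear kernel is compared with an RBF kernel on data whose class boundary is curved.

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two interleaving half circles: not separable by a straight line.
X, y = make_moons(n_samples=200, noise=0.15, random_state=42)

linear_clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
rbf_clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma=2.0, C=10.0))

# Training accuracy only; a held-out set would be needed to judge generalization.
linear_acc = linear_clf.fit(X, y).score(X, y)
rbf_acc = rbf_clf.fit(X, y).score(X, y)

print(linear_acc, rbf_acc)
```

The RBF kernel works in an enlarged feature space implicitly, so it can follow the curved boundary that the linear classifier cannot.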

In [1]:
import numpy as np 
from sklearn import datasets 
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

In [2]:
iris = datasets.load_iris()

In [4]:
iris.keys()

dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])

In [5]:
iris['data'].shape

(150, 4)

In [6]:
iris['feature_names']

['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']

In [19]:
x = iris['data'][:,(2,3)]  # petal length, petal width
y = (iris['target'] == 2).astype(np.float64)  # 1.0 for Iris virginica, else 0.0

In [20]:
svm_clsr = Pipeline([('scalar',StandardScaler()),
                     ('linear_svc',LinearSVC(C=1,loss='hinge'))
                    ])

In [21]:
svm_clsr.fit(x,y)

Pipeline(memory=None,
         steps=[('scalar',
                 StandardScaler(copy=True, with_mean=True, with_std=True)),
                ('linear_svc',
                 LinearSVC(C=1, class_weight=None, dual=True,
                           fit_intercept=True, intercept_scaling=1,
                           loss='hinge', max_iter=1000, multi_class='ovr',
                           penalty='l2', random_state=None, tol=0.0001,
                           verbose=0))],
         verbose=False)

In [43]:
svm_clsr.predict([[4.25,1]])

array([0.])

In [41]:
iris.data[:10,(2,3)]

array([[1.4, 0.2],
       [1.4, 0.2],
       [1.3, 0.2],
       [1.5, 0.2],
       [1.4, 0.2],
       [1.7, 0.4],
       [1.4, 0.3],
       [1.5, 0.2],
       [1.4, 0.2],
       [1.5, 0.1]])

`LinearSVC` regularizes the bias term, so the training set should always be centered first by subtracting its mean. `StandardScaler` does this automatically.
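A quick check of the centering claim, sketched on the same two petal features: after `StandardScaler`, each column should have mean 0 and standard deviation 1.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

X = datasets.load_iris()["data"][:, (2, 3)]   # petal length, petal width

X_scaled = StandardScaler().fit_transform(X)

col_means = X_scaled.mean(axis=0)   # should be numerically zero
col_stds = X_scaled.std(axis=0)     # should be one (population std, ddof=0)

print(col_means.round(6), col_stds.round(6))
```

Putting the scaler inside the `Pipeline` means the same centering is applied to any point passed to `predict`.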

In [46]:
from sklearn.datasets import make_moons
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures


In [47]:
poly_svm_clf = Pipeline([
    ('poly',PolynomialFeatures(degree=3)),
    ('scaler',StandardScaler()),
    ('svm_clf',LinearSVC(C=10,loss='hinge'))
])

In [48]:
poly_svm_clf.fit(x,y)

Pipeline(memory=None,
         steps=[('poly',
                 PolynomialFeatures(degree=3, include_bias=True,
                                    interaction_only=False, order='C')),
                ('scaler',
                 StandardScaler(copy=True, with_mean=True, with_std=True)),
                ('svm_clf',
                 LinearSVC(C=10, class_weight=None, dual=True,
                           fit_intercept=True, intercept_scaling=1,
                           loss='hinge', max_iter=1000, multi_class='ovr',
                           penalty='l2', random_state=None, tol=0.0001,
                           verbose=0))],
         verbose=False)

In [59]:
poly_svm_clf.predict([[2.9,3.5]])

array([1.])
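To see what the `poly` step feeds into the linear classifier, this sketch expands a single made-up point with `PolynomialFeatures(degree=3)`. With 2 input features the expansion has 10 columns: the constant term plus every monomial in the two features up to degree 3.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One illustrative point with features x0 = 2 and x1 = 3.
point = np.array([[2.0, 3.0]])

poly = PolynomialFeatures(degree=3)
expanded = poly.fit_transform(point)

# Columns cover 1, x0, x1, x0^2, x0*x1, x1^2, x0^3, x0^2*x1, x0*x1^2, x1^3.
print(expanded.shape)
print(expanded)
```

A linear boundary in these 10 dimensions corresponds to a cubic boundary in the original 2 features, which is how the pipeline captures non-linear structure without a kernel.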

In [62]:
a = [1,2,3,4,5,6]

In [63]:
b = a

In [64]:
b

[1, 2, 3, 4, 5, 6]

In [65]:
b[0] = 10

In [66]:
a

[10, 2, 3, 4, 5, 6]

In [67]:
b = a.copy()

In [69]:
hex(id(b))

'0x1b4acc912c8'

In [70]:
hex(id(a))

'0x1b4acab93c8'

In [71]:
c = [1,2,3,4]

In [72]:
d = c

In [74]:
hex(id(c)) == hex(id(d))

True