# Support Vector Machines

As a linear binary classifier, the perceptron cannot effectively classify data that is not linearly separable. We encountered a similar problem in our discussion of multiple linear regression: we examined a dataset in which the response variable was not linearly related to the explanatory variables. To improve the accuracy of the model, we introduced a special case of multiple linear regression called polynomial regression. We created synthetic combinations of features and were then able to model a linear relationship between the response variable and the features in the higher-dimensional feature space.

While increasing the dimensions of the feature space may seem like a promising way to approximate nonlinear functions with linear models, it suffers from two related problems. The first is computational: computing the mapped features and working with larger vectors requires more computing power. The second pertains to generalization: increasing the dimensions of the feature representation introduces the curse of dimensionality. Learning from high-dimensional feature representations requires exponentially more training data to avoid overfitting.
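
To give a sense of the computational problem, the sketch below counts how many features an explicit polynomial mapping would produce. It assumes a full polynomial expansion (every monomial up to the given degree, including the constant term) and an input with 784 features, the size of the MNIST images used later in this section. The helper name `n_polynomial_features` is made up for this illustration.

```python
from math import comb


def n_polynomial_features(n_features: int, degree: int) -> int:
    """Count the monomials of total degree <= `degree` in `n_features`
    variables, including the constant term."""
    return comb(n_features + degree, degree)


n_pixels = 784  # one feature per pixel of a 28 x 28 image

# Number of mapped features for a few polynomial degrees
counts = {d: n_polynomial_features(n_pixels, d) for d in (1, 2, 3)}
for d, c in counts.items():
    print(f"degree {d}: {c:,} features")
```

Even a degree-2 mapping turns 784 features into several hundred thousand, which is why avoiding the explicit mapping is attractive.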

## Kernel Trick to Map Data to a High-Dimensional Space

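We can think of a kernel as a similarity function that **implicitly** maps the data to a higher-dimensional (for some kernels, infinite-dimensional) feature space: it returns the dot product of two mapped points without ever computing the mapping itself. The SVM finds its support vectors and a separating hyperplane in that space; viewed from the original space, that hyperplane is a nonlinear decision boundary.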
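
A small numerical check can make the "implicit" part concrete. The sketch below uses the homogeneous degree-2 polynomial kernel on 2-D inputs as an assumed example (it is not the kernel used later in this section). The helper names `phi` and `poly_kernel` are made up for this illustration. It shows that the kernel value equals the dot product of the explicitly mapped vectors.

```python
import numpy as np


def phi(x):
    """Explicit degree-2 feature map for a 2-D input: R^2 -> R^3."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])


def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: K(x, z) = (x . z)^2."""
    return np.dot(x, z) ** 2


x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

explicit = np.dot(phi(x), phi(z))  # map both points first, then take the dot product
implicit = poly_kernel(x, z)       # stay in the original 2-D space

print(explicit, implicit)
```

Both numbers agree, but the kernel never builds the 3-D vectors. For mappings with thousands or infinitely many dimensions, that shortcut is what makes the computation feasible.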

The following image shows how adding a third dimension can make the data separable by a plane.

![example image](http://www.eric-kim.net/eric-kim-net/posts/1/imgs/data_2d_to_3d.png)
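
The same idea can be tried numerically. The sketch below is an illustration under assumptions, not the code behind the image: it generates two concentric rings with scikit-learn's `make_circles`, adds the squared distance from the origin as a third feature, and compares a linear SVM on the 2-D and 3-D representations. The variable names are made up for this example.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in 2-D
X2d, labels = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Add a third feature: the squared distance of each point from the origin
z = (X2d ** 2).sum(axis=1, keepdims=True)
X3d = np.hstack([X2d, z])

# Training accuracy of a linear SVM on each representation
acc_2d = SVC(kernel='linear').fit(X2d, labels).score(X2d, labels)
acc_3d = SVC(kernel='linear').fit(X3d, labels).score(X3d, labels)

print(f"2-D accuracy: {acc_2d:.2f}")
print(f"3-D accuracy: {acc_3d:.2f}")
```

In the 3-D representation the inner ring sits low on the new axis and the outer ring sits high, so a flat plane between them should separate the classes.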

The most commonly used kernel is the [_**Radial Basis Function**_](https://en.wikipedia.org/wiki/Radial_basis_function) (RBF) kernel.
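
The RBF kernel is usually written as K(x, z) = exp(-γ‖x − z‖²), where γ controls how quickly similarity decays with distance. The sketch below implements that formula directly and compares it with scikit-learn's `rbf_kernel`. The helper `rbf`, the sample points, and the value of `gamma` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel


def rbf(x, z, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))


a = np.array([0.0, 0.0])
b = np.array([1.0, 1.0])
gamma = 0.5

k_same = rbf(a, a, gamma)  # a point compared with itself
k_far = rbf(a, b, gamma)   # two points at squared distance 2

# scikit-learn expects 2-D arrays of shape (n_samples, n_features)
k_sklearn = rbf_kernel(a.reshape(1, -1), b.reshape(1, -1), gamma=gamma)[0, 0]

print(k_same, k_far, k_sklearn)
```

Identical points get a similarity of 1, and the value falls toward 0 as the points move apart. A larger `gamma` makes that fall-off steeper.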

For a complete study of SVMs, refer to the blogs.

### Classifying Handwritten digits


In [None]:
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import pandas as pd
import numpy as np

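# fetch_mldata was removed from scikit-learn (0.22); fetch_openml replaces it
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
# OpenML stores the labels as strings; convert them to integers
mnist.target = mnist.target.astype(np.uint8)

# These are the images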
# There are 70,000 images (28 by 28 images for a dimensionality of 784)
print(mnist.data.shape)

# These are the labels
print(mnist.target.shape)

In [None]:
ds = pd.DataFrame(mnist.data)
ds[784] = mnist.target
ds.head()

In [None]:
from sklearn.utils import shuffle
ds = shuffle(ds)
ds.head()

In [None]:
data = ds.values[6000:18000]
print(data.shape)

In [None]:
X = data[:,:-1]
y = data[:,-1]
print(X.shape,y.shape)

In [None]:
# Show the first sample as a 28 x 28 grayscale image
plt.imshow(X[0].reshape(28, 28), cmap=cm.gray)

In [None]:
# from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.preprocessing import scale
from sklearn.model_selection import train_test_split
# from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report

# Standardize each pixel to zero mean and unit variance
X = scale(X)
# Hold out 25% of the samples (the default) for testing
X_train, X_test, y_train, y_test = train_test_split(X, y)

# RBF-kernel SVM: gamma sets the kernel width, C the penalty for misclassification
clf = SVC(kernel='rbf', gamma=0.01, C=100)
clf.fit(X_train, y_train)

In [None]:
prediction = clf.predict(X_test)

print(classification_report(y_test, prediction))

In [None]:
clf.score(X_test,y_test)
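
The commented-out `Pipeline` and `GridSearchCV` imports above hint at a natural next step: searching for `gamma` and `C` instead of fixing them by hand. The sketch below is one possible way to do that, not the notebook's own method. To keep it fast it uses scikit-learn's small bundled `load_digits` dataset (8 x 8 images) rather than MNIST, and the parameter grid is an arbitrary assumption. Putting the scaler inside the pipeline also means it is fitted only on the training folds.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Small stand-in dataset: 1,797 images of 8 x 8 pixels
digits = load_digits()
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    digits.data, digits.target, random_state=0)

# Scaling happens inside the pipeline, so each CV fold is scaled separately
pipeline = Pipeline([
    ('scale', StandardScaler()),
    ('svc', SVC(kernel='rbf')),
])

# Candidate values are illustrative, not tuned recommendations
param_grid = {
    'svc__gamma': [0.001, 0.01],
    'svc__C': [1, 10, 100],
}

search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(Xd_train, yd_train)

print(search.best_params_)
test_accuracy = search.score(Xd_test, yd_test)
print(test_accuracy)
```

The same pipeline and grid could be pointed at the MNIST arrays, at the cost of a much longer search.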