# Module 6: Linear Support Vector Machine

A support vector machine (SVM) is a linear _binary_ classifier.

The goal of the SVM is to find a hyperplane that separates the training data correctly
into two half-spaces while maximizing the **margin** between the two classes.

![SVM](../resources/svm.png)

The separating hyperplane satisfies

$$ \vec{w} \cdot \vec{x} - b = 0 $$

where $ \vec{x} $ lies in feature space and $ \vec{w} $ is the normal vector of the hyperplane.
The two margin boundaries, which pass through the closest points of each class, satisfy $ \vec{w} \cdot \vec{x} - b = \pm 1 $.

 * Note that we again have a linear combination (dot product) of weights and a data vector, adjusted by a bias.

The constraint for every training sample $(\vec{x_i}, y_i)$ is formulated as:

$$ y_i \left(\vec{w} \cdot \vec{x_i} - b \right) \ge 1 $$

where $y_i$ is the class label: $y_i=1$ for all **positive samples** and $y_i=-1$ for **negative samples**.
The label is a mathematical convenience that folds both cases into a single inequality.
Intuitively, multiplying by $y_i$ reflects the negative samples across the hyperplane so that all data points
are bound by the same constraint.
Having only two label values is what makes the SVM a **binary classifier**.

The optimization minimizes $ \left\| \vec{w} \right\| $ subject to the constraint $ y_i \left(\vec{w} \cdot \vec{x_i} - b \right) \ge 1 $ for all $i$, because the margin width is $ 2 / \left\| \vec{w} \right\| $.
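
The constraint and the objective can be checked numerically. The sketch below is only an illustration: the four toy points and the two candidate pairs `(w_a, b_a)` and `(w_b, b_b)` are made up, and the helper names are hypothetical. Both candidates describe the same hyperplane and both satisfy the constraint, but the one with the smaller norm has the wider margin.

```python
import numpy as np

# Toy data (made up for illustration): two positive and two negative samples.
X_toy = np.array([[2.0, 0.0], [3.0, 1.0], [0.0, 0.0], [-1.0, 1.0]])
y_toy = np.array([1, 1, -1, -1])

def satisfies_constraint(w, b, X, y):
    """True if y_i * (w . x_i - b) >= 1 holds for every sample."""
    return bool(np.all(y * (X @ w - b) >= 1))

def margin_width(w):
    """Distance between the margin boundaries w . x - b = +1 and w . x - b = -1."""
    return 2.0 / np.linalg.norm(w)

# Two hypothetical candidates for the same hyperplane (first coordinate equal to 1).
w_a, b_a = np.array([1.0, 0.0]), 1.0
w_b, b_b = np.array([2.0, 0.0]), 2.0

for w, b in [(w_a, b_a), (w_b, b_b)]:
    print(w, satisfies_constraint(w, b, X_toy, y_toy), margin_width(w))
```

Scaling $\vec{w}$ and $b$ by the same factor leaves the hyperplane unchanged but shrinks the margin, which is why the optimizer prefers the smallest feasible norm.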

The hyperparameters of an SVM include the type of kernel and the regularization parameter $C$.  
For a linear SVM the kernel is linear, which amounts to using no kernel at all.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

import numpy as np
import tensorflow as tf

# This notebook uses the TensorFlow 1.x API (tf.logging and tf.contrib were removed in 2.x)
tf.logging.set_verbosity(tf.logging.ERROR)

## Generate Data

We will first generate 1000 random points.
The line separating "positive" cases from negative cases is $y = x$.
We compute the class of each (x, y) point as y > x.


In [None]:
X = np.random.uniform(-5, 5, (1000, 2))
y = np.greater(X[:, 1], X[:, 0]).astype(int)

# First look
plt.figure(figsize=(6,6))
plt.scatter(X[:, 0], X[:, 1], c = y)

Then we separate points by a corridor of width $2\sqrt{2}$ by adding vector $(1, -1)$ to points in class 0,
and adding $(-1, 1)$ to points in class 1.

In [None]:
X[y==0] += np.array([1, -1])
X[y==1] += np.array([-1, 1])
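
The corridor width can be verified directly. The sketch below regenerates the data under its own names (`pts`, `cls`) with an arbitrarily chosen seed, so it does not touch `X` and `y`. It also checks a hand-picked hyperplane, `w_ideal = (-0.5, 0.5)` with `b_ideal = 0`, which is an assumption made here for illustration rather than something the classifier is guaranteed to return.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for repeatability
pts = rng.uniform(-5, 5, (1000, 2))
cls = np.greater(pts[:, 1], pts[:, 0]).astype(int)
pts[cls == 0] += np.array([1, -1])
pts[cls == 1] += np.array([-1, 1])

# Perpendicular distance of every point to the line y = x.
dist = np.abs(pts[:, 1] - pts[:, 0]) / np.sqrt(2)
corridor_width = 2 * dist.min()  # cannot be smaller than 2 * sqrt(2)

# Hand-picked hyperplane: w . x - b = 0 is the line y = x,
# and w . x - b = +/-1 are the two edges of the corridor.
w_ideal, b_ideal = np.array([-0.5, 0.5]), 0.0
signed = np.where(cls == 1, 1, -1)  # labels in {-1, +1}
margins = signed * (pts @ w_ideal - b_ideal)

print("corridor width >=", corridor_width)
print("smallest y_i (w . x_i - b):", margins.min())
print("2 / ||w||:", 2 / np.linalg.norm(w_ideal))
```

Every point satisfies the margin constraint for this choice, and $2 / \|\vec{w}\|$ equals $2\sqrt{2}$, the corridor width stated above.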

Let us quickly visualize what we have created.

In [None]:
plt.figure(figsize=(6,6))
plt.scatter(X[:, 0], X[:, 1], c = y)

## Input Function

The job of the **input function** is to feed data into the SVM.
This pattern is common to all contributed Estimator classes in TensorFlow.
For the SVM we also need to provide a column that holds a unique ID for each training example.
In our trivial example we use points on a 2D plane, which makes them easy to visualize.
In a more complex application the data could have much higher dimensionality.

In [None]:
def my_input_fn():
    columns = dict(
      # example_id is the index for data records, as required by the SVM;
      #   anything that uniquely identifies each record is fine.
      example_id = tf.constant([str(i+1) for i in range(len(X))]),
    
      # use tf.constant to hold the dataset; features need to be rank-2 tensors (shaped like matrices)
      x = tf.constant(np.reshape(X[:, 0], [len(X), 1])),
      y = tf.constant(np.reshape(X[:, 1], [len(X), 1])))
    labels = tf.constant(y)
    return columns, labels
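
To see the shapes involved without building a TensorFlow graph, the sketch below mirrors the structure of the returned values with plain NumPy arrays. `build_columns` and the three demo points are hypothetical stand-ins, not part of the TensorFlow API.

```python
import numpy as np

def build_columns(points, labels):
    """NumPy stand-in for the feature dict and labels an input function returns."""
    n = len(points)
    columns = {
        'example_id': np.array([str(i + 1) for i in range(n)]),  # unique string IDs
        'x': points[:, 0].reshape(n, 1),  # rank 2: one row per example
        'y': points[:, 1].reshape(n, 1),
    }
    return columns, labels

demo_points = np.array([[0.5, 2.0], [3.0, -1.0], [-2.0, -4.0]])  # made-up points
demo_labels = (demo_points[:, 1] > demo_points[:, 0]).astype(int)

columns, labels = build_columns(demo_points, demo_labels)
for name, value in columns.items():
    print(name, value.shape)
print("labels", labels.shape)
```

Each feature column has shape `(N, 1)`, while the IDs and labels are one-dimensional with `N` entries.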

## Training SVM

Now we create two feature columns and define the SVM classifier.

The names of the two feature columns must match the keys of the dictionary
that `my_input_fn()` returns.

In [None]:

feature1 = tf.contrib.layers.real_valued_column('x')
feature2 = tf.contrib.layers.real_valued_column('y')

# Define a classifier object that is an SVM
svm_classifier = tf.contrib.learn.SVM(
    feature_columns=[feature1, feature2],  # specify the feature columns
    example_id_column='example_id')        # specify the example ID column

svm_classifier.fit(input_fn=my_input_fn, steps=30)

Now we make a quick evaluation.

In [None]:
metrics = svm_classifier.evaluate(input_fn=my_input_fn, steps=1)
print("Loss", metrics['loss'], "\nAccuracy", metrics['accuracy'])
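
For readers who want to see what fitting a linear SVM involves without the Estimator API, here is a minimal NumPy sketch. It is not how `tf.contrib.learn.SVM` works internally; it swaps in plain full-batch subgradient descent on the regularized hinge loss. The function name `train_linear_svm`, the values of `reg`, `lr`, and `epochs`, and the seed are all assumptions chosen for illustration, and the data is regenerated under its own names so `X` and `y` stay untouched.

```python
import numpy as np

def train_linear_svm(X, t, reg=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on reg/2 * ||w||^2 + mean hinge loss.

    t holds labels in {-1, +1}; the decision function is w . x - b.
    """
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        violated = t * (X @ w - b) < 1  # samples that break the margin constraint
        grad_w = reg * w - (t[violated] @ X[violated]) / len(X)
        grad_b = t[violated].sum() / len(X)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Same recipe as the notebook's data, regenerated with an arbitrary seed.
rng = np.random.default_rng(0)
pts = rng.uniform(-5, 5, (1000, 2))
cls = (pts[:, 1] > pts[:, 0]).astype(int)
pts[cls == 0] += np.array([1, -1])
pts[cls == 1] += np.array([-1, 1])

w_fit, b_fit = train_linear_svm(pts, np.where(cls == 1, 1, -1))
pred = (pts @ w_fit - b_fit > 0).astype(int)
accuracy = np.mean(pred == cls)
print("w =", w_fit, "b =", b_fit, "accuracy =", accuracy)
```

Only samples that violate the margin contribute to the gradient, so the points far from the corridor stop influencing the fit once they are classified with enough margin.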

### Predicting labels

Once the SVM classifier has been trained, we can use it to predict classes for n-dimensional (2 in our case) points.

We create another function, `predict_fn()`, that returns the data we want the trained SVM to classify.

In [None]:
X2 = np.random.uniform(-5, 5, (60, 2))
# Unlabeled points, before classification
plt.figure(figsize=(6,6))
plt.scatter(X2[:, 0], X2[:, 1])

In [None]:
def predict_fn():
    return dict(
        x = tf.constant(np.expand_dims(X2[:, 0], 1)),
        y = tf.constant(np.expand_dims(X2[:, 1], 1))
    )

# The following statement returns a generator that yields dicts with this structure:
#    {'logits': array([-1.58874238]), 'classes': 0}
#      ^^^ raw decision score           ^^^ class label
y_pred = svm_classifier.predict(input_fn=predict_fn)

# change of format only: for each item p, take p['classes']
y_pred = [p['classes'] for p in y_pred]
print(y_pred)

In [None]:
# Color by classification results
plt.figure(figsize=(6,6))
plt.scatter(X2[:, 0], X2[:, 1], c = y_pred)
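
One way to sanity-check predictions is to compare them with the rule that generated the training labels, $y > x$. The sketch below uses a hypothetical stand-in for the trained model (`w_ref`, `b_ref`, the hyperplane $y = x$) and its own random points, so the perfect agreement it reports says nothing about the actual classifier. With the real `y_pred`, the same comparison against `X2[:, 1] > X2[:, 0]` applies.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
new_points = rng.uniform(-5, 5, (60, 2))

# Hypothetical stand-in for a trained model: the hyperplane y = x.
w_ref, b_ref = np.array([-0.5, 0.5]), 0.0
scores = new_points @ w_ref - b_ref    # raw decision scores
predicted = (scores > 0).astype(int)   # class 1 on the positive side

# Ground truth under the rule used to label the training data.
expected = (new_points[:, 1] > new_points[:, 0]).astype(int)
agreement = np.mean(predicted == expected)
print("agreement with y > x:", agreement)
```

Unlike the training data, these points are not pushed out of the corridor, so the ones near the line $y = x$ are where a trained model is most likely to disagree with the rule.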

# Save your Notebook