# TensorFlow: Linear SVM (applied to credit screening data)
*Rachel Buttry*

*April 2018*

### Support Vector Machine (SVM)
An SVM is a method designed for binary classification. It calculates a "boundary line" (really a hyperplane) that ideally separates the provided data into the two possible classes. Image from [Wikipedia](https://en.wikipedia.org/wiki/Support_vector_machine)
<img src="./Svm_separating_hyperplanes_(SVG).svg" width="50%">
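
As a minimal sketch of what such a boundary does, the snippet below classifies a few 2D points by which side of a hyperplane they fall on. The values of `A`, `b` and the points are made up for illustration; they are not taken from the credit data.

```python
import numpy as np

# Hypothetical hyperplane: normal vector A and offset b (illustrative values)
A = np.array([1.0, -1.0])
b = 0.5

# Hypothetical points to classify
points = np.array([[2.0, 0.0],
                   [0.0, 2.0],
                   [1.0, 0.0]])

# Signed score: positive on one side of the hyperplane, negative on the other
scores = np.dot(points, A) - b
labels = np.sign(scores)
print(labels)
```

The sign of `A·x - b` is the predicted class, which is the same decision rule the TensorFlow model in this notebook uses.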

Below is a slightly modified version of the code from [nfmcclure's TensorFlow Cookbook](https://github.com/nfmcclure/tensorflow_cookbook/blob/master/04_Support_Vector_Machines/02_Working_with_Linear_SVMs/02_linear_svm.ipynb).

In [1]:
from sklearn import datasets
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
sess = tf.Session()

The data was originally taken from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Credit+Approval).

It was recoded by the script ```recode_crx.py```.

In [2]:
import pandas as pd
crx = pd.read_csv("./crx-recoded.csv",delimiter=",")

ytemp = crx.pop('A16')
# .values replaces the deprecated DataFrame.as_matrix()
y_vals = ytemp.values.astype(float)  # 0,1 for approved,rejected
x_vals = crx.values.astype(float)  # survey data (recoded)

In [3]:
# Load data
# iris.data = [(Sepal Length, Sepal Width, Petal Length, Petal Width)]
#iris = datasets.load_iris()
#x_vals = np.array([[x[0], x[3]] for x in iris.data])
#y_vals = np.array([1 if y == 0 else -1 for y in iris.target])

#split into train/test
np.random.seed(1234)
train_indices = np.random.choice(len(x_vals),
                                 int(round(len(x_vals)*0.8)),
                                 replace=False)
test_indices = np.array(list(set(range(len(x_vals))) - set(train_indices)))
x_vals_train = x_vals[train_indices]
x_vals_test = x_vals[test_indices]
y_vals_train = y_vals[train_indices]
y_vals_test = y_vals[test_indices]

We may also apply the [kernel trick](https://en.wikipedia.org/wiki/Kernel_method) to create a more appropriate boundary. For this example, though, we'll restrict ourselves to the linear kernel, meaning we'll try to separate the classes with a straight line (a "flat" hyperplane). The linear SVM has the loss function:
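<center>
$Loss = \frac{||A||^{2}}{2} + \alpha \cdot \frac{1}{N} \sum_{i=1}^{N} \max(0, 1 - y_i (A \cdot x_i - b))$

</center>

Where 
* $A$ is the normal vector of the hyperplane
* $b$ is the offset of the hyperplane
* $x_i$ and $y_i$ are the features and class label of the $i$-th of $N$ samples (the hinge term assumes labels of $\pm 1$)
* $\alpha$ is the soft margin parameter: it weights the hinge term, so a larger $\alpha$ penalizes margin violations and misclassifications more heavily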

So $A$ and $b$ describe the hyperplane that we're solving for. This loss function is the sum of a quadratic term in $A$ and the [hinge loss](https://en.wikipedia.org/wiki/Hinge_loss).
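
To make the two terms concrete, here is a small NumPy sketch that evaluates a soft-margin loss of this form on made-up data, with the label included in the hinge term. It assumes labels coded as -1/+1 (which is what the hinge loss expects) and uses the same `A·x - b` sign convention as the TensorFlow code in this notebook. The function name `svm_loss` and all values are hypothetical.

```python
import numpy as np

def svm_loss(A, b, X, y, alpha):
    """Soft-margin linear SVM loss: ||A||^2 / 2 + alpha * mean(hinge)."""
    scores = np.dot(X, A) - b                  # signed score per sample
    hinge = np.maximum(0.0, 1.0 - y * scores)  # zero once a sample clears the margin
    return np.sum(A ** 2) / 2.0 + alpha * np.mean(hinge)

# Made-up data: two samples whose labels are stored as 0/1
X = np.array([[2.0, 0.0],
              [0.0, 2.0]])
y01 = np.array([1.0, 0.0])

# Map 0/1 labels to -1/+1 before using them in the hinge term
y = 2.0 * y01 - 1.0

A = np.array([1.0, -1.0])
b = 0.0

loss = svm_loss(A, b, X, y, alpha=0.01)
print(loss)
```

Here both samples lie beyond the margin, so the hinge term vanishes and only the quadratic term contributes.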

In [15]:
# Create linear svm

# Declare hyperplane dimensions
numcols = np.shape(x_vals)[1]
Ashape = [numcols, 1]
bshape = [1, 1]

# Declare batch size
batch_size = len(x_vals_train)

# Initialize placeholders (feature count depends on the shape of the data)
x_data = tf.placeholder(shape=[None, numcols], dtype=tf.float32)
y_target = tf.placeholder(shape=[None, 1], dtype=tf.float32)

# Create variables for SVM (shapes depend on the shape of the data)
A = tf.Variable(tf.random_normal(shape=Ashape))
b = tf.Variable(tf.random_normal(shape=bshape))

# Declare model operations
model_output = tf.subtract(tf.matmul(x_data, A), b)

# Declare vector L2 'norm' function squared
l2_norm = tf.reduce_sum(tf.square(A))

# Declare loss function
# L2 regularization parameter, alpha
alpha = tf.constant([0.01])
# Margin term in loss
classification_term = tf.reduce_mean(tf.maximum(0., tf.subtract(1., \
                            tf.multiply(model_output, y_target))))
# Put terms together
loss = tf.add(l2_norm/2, tf.multiply(alpha, classification_term))

# Declare prediction function
prediction = tf.sign(model_output)
accuracy = tf.reduce_mean(tf.cast(tf.equal(prediction, y_target), tf.float32))

# Declare optimizer
my_opt = tf.train.GradientDescentOptimizer(0.01)
train_step = my_opt.minimize(loss)


In [19]:
# Initialize variables
init = tf.global_variables_initializer()
sess.run(init)

# Training loop
loss_vec = []
train_accuracy = []
test_accuracy = []
for i in range(500):
    rand_index = np.random.choice(len(x_vals_train), size=batch_size)
    rand_x = x_vals_train[rand_index]
    rand_y = np.transpose([y_vals_train[rand_index]])
    sess.run(train_step, feed_dict={x_data: rand_x, y_target: rand_y})

    temp_loss = sess.run(loss, feed_dict={x_data: rand_x, y_target: rand_y})
    loss_vec.append(temp_loss) #for plotting

    train_acc_temp = sess.run(accuracy, feed_dict={
        x_data: x_vals_train,
        y_target: np.transpose([y_vals_train])})
    train_accuracy.append(train_acc_temp)#for plotting

    test_acc_temp = sess.run(accuracy, feed_dict={
        x_data: x_vals_test,
        y_target: np.transpose([y_vals_test])})
    test_accuracy.append(test_acc_temp)#for plotting
    
    # for printing
    if (i + 1) % 100 == 0:
        print('Step #' + str(i+1))
        print('Loss = ' + str(temp_loss))

Step #100
Loss = [1.6352828]
Step #200
Loss = [0.19222452]
Step #300
Loss = [0.03237231]
Step #400
Loss = [0.01103274]
Step #500
Loss = [0.00874015]


In [25]:
Avalue = sess.run(A)
bvalue = sess.run(b)
print("A = ", np.transpose(Avalue))
print("b = ", bvalue)
print("Accuracy:", test_accuracy[-1])

A =  [[ 0.01046679  0.03551899  0.00340231  0.00846251  0.01328621 -0.00036537
  -0.00031592 -0.00247209  0.02757664  0.01122418 -0.01167194 -0.00010599
   0.00500354  0.00042625  0.00451804 -0.00352069  0.00781336 -0.00555783
  -0.00267743  0.00717988 -0.00147304  0.00316507 -0.00456573  0.00692532
  -0.00255281]]
b =  [[2.8120515]]
Accuracy: 0.35507247
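
One detail that may be worth checking when reading this accuracy: `tf.sign` returns values in {-1, 0, +1}, while the labels here are stored as 0/1 (see the comment where `y_vals` is created). Under that coding a prediction of -1 can never equal a label of 0, so the comparison can only ever count class-1 samples as correct. The NumPy sketch below illustrates the effect with made-up scores and labels. Whether this fully explains the low accuracy in this notebook is an assumption, not something verified here.

```python
import numpy as np

# Made-up model scores and 0/1 labels (illustrative only)
scores = np.array([1.3, -0.7, 2.1, -1.5])
y01 = np.array([1.0, 0.0, 1.0, 0.0])

pred = np.sign(scores)  # values in {-1, 0, +1}

# Comparing +/-1 predictions with 0/1 labels misses every class-0 sample
acc_mismatched = np.mean(pred == y01)

# Recoding the labels to -1/+1 first makes the comparison meaningful
y_pm = 2.0 * y01 - 1.0
acc_recoded = np.mean(pred == y_pm)

print(acc_mismatched, acc_recoded)
```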


### The scikit-learn way
Alright, the TensorFlow model's accuracy was really bad, but is the issue with the model or the dataset? Let's find out by comparing it to a scikit-learn model!

Code based on [PHYS T480: Classification2 Notebook](https://github.com/gtrichards/PHYS_T480/blob/master/Classification2.ipynb)

In [26]:
from sklearn.svm import SVC
svm = SVC(kernel='linear')
svm.fit(x_vals_train, y_vals_train)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [27]:
acc_sk = 0.0
for x,y in zip(x_vals_test, y_vals_test):
    guess = svm.predict(x)
    if guess == y: acc_sk += 1./len(y_vals_test)
print("Accuracy: ", acc_sk)

ValueError: Expected 2D array, got 1D array instead:
array=[  0.    58.67   4.46   3.04   1.     1.     6.     0.    43.   560.
   0.     0.     1.     0.     0.     1.     0.     0.     1.     0.
   1.     0.     0.     0.     1.  ].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
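
The error arises because `predict` expects a 2D array of shape `(n_samples, n_features)`, while the loop passes one 1D row at a time. A minimal sketch of two ways around it follows, using a small synthetic dataset in place of the credit data (the arrays `X_train`, `X_test` and so on are made up for illustration): reshape each row as the error message suggests, or predict on the whole test set in one call.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic, linearly separable stand-in data (not the credit data)
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y_train = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
X_test = np.array([[0.5, 0.5], [3.5, 3.5]])
y_test = np.array([0.0, 1.0])

svm = SVC(kernel='linear')
svm.fit(X_train, y_train)

# Option 1: reshape each sample to shape (1, n_features) before predicting
acc_loop = 0.0
for x, y in zip(X_test, y_test):
    guess = svm.predict(x.reshape(1, -1))[0]
    if guess == y:
        acc_loop += 1.0 / len(y_test)

# Option 2: predict on the whole test set in one call
acc_vec = np.mean(svm.predict(X_test) == y_test)

print("Accuracy: ", acc_loop, acc_vec)
```

Applied to this notebook, the same change would mean calling `svm.predict(x.reshape(1, -1))` inside the loop, or comparing `svm.predict(x_vals_test)` against `y_vals_test` directly.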