# Support Vector Machines for classification
## Theory

Support vector machines can be applied to both regression and classification, but here we will focus on the latter.

For the simplest example, take a two-dimensional plane spanned by two features of a concept you wish to study. In this plane we have a few sample points, and we now wish to find a straight line which divides the data populating this plane into two categories. If this is possible the set is linearly separable, and for the rest of the discussion let us assume our set is so. If it is not, a trick we will introduce at a later point uses a function called a kernel to transform the input features into more dimensions. If the set is linearly separable in this new space, the hyperplane identified there can be projected back onto the 2D plane and yield a boundary.

For the simplest case in 2D, the hyperplane manifests as a straight line in the feature plane. The normal of the hyperplane is given by the vector $\mathbf{w}$, and a point in the plane has a position $\mathbf{x}$ relative to the origin. Points in one class are labeled +1, and points in the other class are labeled -1.
We now insist that the dot product of these vectors plus some constant $b$ should be at least 1 for a point on the side of the hyperplane belonging to the +1 class, while the sum should be at most -1 for points in the -1 class. The following pair of inequalities describes the situation:

\begin{equation}
\mathbf{w} \cdot \mathbf{x} + b \geq 1,
\end{equation}

\begin{equation}
\mathbf{w} \cdot \mathbf{x} + b \leq -1.
\end{equation}

Since the label $y_i$ is $\pm 1$ depending on which side of the dividing line the point lies, we can combine both constraints into one, with equality holding for the points lying exactly on the margin (the support vectors):

\begin{equation}
y_i (\mathbf{x}_i \cdot \mathbf{w} + b) - 1 \geq 0.
\end{equation}
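
As a small numerical illustration of the combined constraint, the sketch below evaluates $y_i(\mathbf{w} \cdot \mathbf{x}_i + b)$ for a few points. The values of `w`, `b` and the sample points are made up for the example and are not taken from the Windkessel data.

```python
import numpy as np

# Hypothetical hyperplane and points, chosen only for illustration
w = np.array([1.0, 0.0])   # normal vector of the separating line
b = 0.0                    # offset
X = np.array([[2.0, 0.0],    # class +1, outside the margin
              [-1.5, 1.0],   # class -1, outside the margin
              [0.5, 0.0]])   # class +1, but inside the margin
y = np.array([1.0, -1.0, 1.0])

# y_i (w . x_i + b) must be at least 1 for the constraint to hold
functional_margin = y * (X @ w + b)
satisfies_constraint = functional_margin >= 1.0

print(functional_margin)
print(satisfies_constraint)
```

The third point lies on the correct side of the line but inside the margin, so it violates the constraint even though its predicted sign is right.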

We can then find the width of the margin by taking the difference between two support vectors, one on each edge of the margin, and projecting it onto the unit normal $\mathbf{w}/||\mathbf{w}||$:

\begin{equation}
width = (\mathbf{x}_+ - \mathbf{x}_-) \cdot \frac{\mathbf{w}}{||\mathbf{w}||},
\end{equation}

and inserting the margin conditions $\mathbf{w} \cdot \mathbf{x}_\pm + b = \pm 1$ gives a width of $2/||\mathbf{w}||$. Maximizing the margin is therefore equivalent to minimizing

\begin{equation}
\min_{\mathbf{w}, b} \frac{1}{2} ||\mathbf{w}||^2
\end{equation}

subject to the constraints above. We introduce the Lagrangian function, where all constraints are added to the objective with Lagrange multipliers $\alpha_i$, $n$ being the number of classified data points. The multipliers are zero for most of the constraints: points which are not support vectors on the margin tend to have an $\alpha_i$ close to zero.

\begin{equation}
L = \frac{1}{2} ||\mathbf{w}||^2 - \sum_{i=1}^n \alpha_i \left[ y_i (\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \right],
\end{equation}
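and by setting the derivatives of $L$ with respect to $\mathbf{w}$ and $b$ to zero, which gives $\mathbf{w} = \sum_i^n \alpha_i y_i \mathbf{x}_i$ and $\sum_i^n \alpha_i y_i = 0$, we can rewrite this as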
\begin{equation}
L = \sum_i^n \alpha_i - \frac{1}{2} \sum_i^n \sum_j^n \alpha_i \alpha_j y_i y_j \mathbf{x}_i \cdot \mathbf{x}_j.
\end{equation}
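
To make the dual form concrete, here is a minimal sketch for a toy set of two points, one per class. The points are invented for illustration. It relies on the standard relations $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$ and $\sum_i \alpha_i y_i = 0$, so both multipliers must be equal and a one-dimensional scan is enough to find the maximum.

```python
import numpy as np

# Toy data (an assumption for this example): one point per class
X = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([1.0, -1.0])

def dual_objective(alpha, X, y):
    """sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j x_i . x_j"""
    gram = (y[:, None] * y[None, :]) * (X @ X.T)
    return np.sum(alpha) - 0.5 * alpha @ gram @ alpha

# sum_i alpha_i y_i = 0 forces alpha_1 = alpha_2, so scan a single value
candidates = np.linspace(0.0, 1.0, 101)
values = [dual_objective(np.array([a, a]), X, y) for a in candidates]
best = candidates[int(np.argmax(values))]
alpha = np.array([best, best])

# Recover the primal quantities from the multipliers
w = (alpha * y) @ X                  # w = sum_i alpha_i y_i x_i
b = y[0] - w @ X[0]                  # from y_0 (w . x_0 + b) = 1
margin_width = 2.0 / np.linalg.norm(w)
primal_value = 0.5 * w @ w
dual_value = dual_objective(alpha, X, y)

print(best, w, b, margin_width, primal_value, dual_value)
```

For this toy set the margin width equals the distance between the two points, and the primal and dual objective values coincide at the optimum.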

The dual above, with its plain dot product, is restricted to a linear decision boundary and margins. For more complex boundaries one can introduce the kernel discussed above to replace the linear dot product; we will see an example of how this can be done later.

\begin{equation}
L = \sum_i^n \alpha_i - \frac{1}{2} \sum_i^n \sum_j^n \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j).
\end{equation}
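
Once the multipliers are known, a new point is commonly classified by the sign of $\sum_i \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b$. That decision function is the usual textbook form rather than something stated in the source, so treat the following as a sketch. The two training points, the multipliers of 0.25 (the hand-computed dual solution for these two points) and the zero bias are all assumptions for the example.

```python
import numpy as np

def linear_kernel(a, b):
    """Plain dot product; swap in another kernel to get curved boundaries."""
    return a @ b

def decision_value(x_new, X, y, alpha, bias, kernel):
    """sum_i alpha_i y_i K(x_i, x_new) + bias"""
    total = 0.0
    for a_i, y_i, x_i in zip(alpha, y, X):
        total += a_i * y_i * kernel(x_i, x_new)
    return total + bias

# Hypothetical training points with their multipliers
X = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.25, 0.25])
bias = 0.0

value_pos = decision_value(np.array([2.0, 0.0]), X, y, alpha, bias, linear_kernel)
value_neg = decision_value(np.array([-1.0, -3.0]), X, y, alpha, bias, linear_kernel)
print(np.sign(value_pos), np.sign(value_neg))
```

Only the training points with non-zero multipliers contribute to the sum, which is why the support vectors alone determine the boundary.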

Source: https://towardsdatascience.com/support-vector-machine-python-example-d67d9b63f1c8

The figure below illustrates the support vector machine for the linear two-dimensional case.

<img src="http://folk.ntnu.no/nikolalb/ML_ws/svm_slack.png" width=600>

The two-element Windkessel model incorporates the compliant and resistive properties of the systemic arteries to compute the blood pressure response to an imposed blood flow from the heart.

By varying the parameters for arterial compliance and resistance we created 8000 Windkessel configurations with computed blood pressures, which we can label as hypertensive or normotensive cases. The computed pressures are aortic pressures, but they were still classified using the European Society of Cardiology guidelines of systolic brachial pressure over 140 mmHg and diastolic pressure over 90 mmHg for hypertension. No transfer function between the brachial artery and the aorta was applied. The resulting domain, where the input parameters compliance and resistance ($C$ and $R$) are plotted with their resulting systolic pressures, is shown below. Orange dots indicate a hypertensive state, while cases outside this range are classified as normotensive and shown as blue dots, even if they have marked isolated systolic hypertension. This adds some complexity to the problem, since the boundary is no longer a single cut-off value.

<img src="http://folk.ntnu.no/nikolalb/ML_ws/Hypertensives_Psys.svg">
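
The labeling rule described above can be sketched as a small function. The script that actually generated the labels is not shown, so this is an interpretation: a case is assumed to be hypertensive only when both thresholds are exceeded, which matches the remark that isolated systolic hypertension ends up in the normotensive class. The function name and the sample pressures are hypothetical.

```python
import numpy as np

SYS_LIMIT = 140.0  # systolic threshold in mmHg
DIA_LIMIT = 90.0   # diastolic threshold in mmHg

def label_hypertension(p_sys, p_dia):
    """Return +1 for hypertensive cases and -1 for normotensive cases."""
    p_sys = np.asarray(p_sys, dtype=float)
    p_dia = np.asarray(p_dia, dtype=float)
    # Assumption: both pressures must exceed their thresholds
    hypertensive = (p_sys > SYS_LIMIT) & (p_dia > DIA_LIMIT)
    return np.where(hypertensive, 1, -1)

# Made-up pressures: hypertensive, isolated systolic, isolated diastolic
labels = label_hypertension([150.0, 150.0, 120.0], [95.0, 80.0, 95.0])
print(labels)
```

Because two thresholds are combined, the hypertensive region in the $C$-$R$ plane is bounded by two curves rather than one, which is what makes a single straight separator a poor fit.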

## Can we classify hypertensives directly from measured parameter sets? 

Let us try making this work using only TensorFlow and the assumption that the set is linearly separable. From the figure it clearly is not, but let us try anyway.

Code source: https://www.youtube.com/watch?v=zErT-VtYOHk
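
Before the TensorFlow cells, it may help to see the loss they minimize written out in plain NumPy. The sketch below uses the same sign convention as the cells that follow ($\mathbf{w} \cdot \mathbf{x} - b$) and the same form of loss, a regularization term plus the mean hinge term, but it runs full-batch subgradient descent on a made-up, clearly separable toy set instead of the Windkessel data. The step size and iteration count are arbitrary choices for the example.

```python
import numpy as np

def hinge_loss(w, b, X, y, alpha=0.01):
    """alpha * ||w||^2 + mean(max(0, 1 - y_i (w . x_i - b)))"""
    margins = y * (X @ w - b)
    return alpha * np.sum(w ** 2) + np.mean(np.maximum(0.0, 1.0 - margins))

def hinge_subgradient(w, b, X, y, alpha=0.01):
    """Subgradient of hinge_loss with respect to w and b."""
    margins = y * (X @ w - b)
    active = (margins < 1.0).astype(float)  # points violating the margin
    grad_w = 2.0 * alpha * w - (active * y) @ X / len(y)
    grad_b = np.sum(active * y) / len(y)
    return grad_w, grad_b

# Toy data, not the Windkessel set
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
              [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

w, b = np.zeros(2), 0.0
initial_loss = hinge_loss(w, b, X, y)
for _ in range(200):
    grad_w, grad_b = hinge_subgradient(w, b, X, y)
    w -= 0.1 * grad_w
    b -= 0.1 * grad_b

final_loss = hinge_loss(w, b, X, y)
accuracy = np.mean(np.sign(X @ w - b) == y)
print(initial_loss, final_loss, accuracy)
```

Only points with $y_i(\mathbf{w} \cdot \mathbf{x}_i - b) < 1$ contribute to the gradient of the hinge term, so well-classified points far from the boundary do not influence the update.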

In [None]:
import os
import numpy as np
import matplotlib.pyplot as plt
#import keras
import tensorflow as tf

from sklearn import preprocessing
from sklearn.model_selection import train_test_split
tf.logging.set_verbosity(tf.logging.ERROR) # only log errors, suppressing deprecation warnings

In [None]:
#===================================================
# load data and split training set into actual training and validation sets
#===================================================
dataNameTrain = "WK2DataLabeled_train" # for training the model
dataNameTest = "WK2DataLabeled_test" # for testing the model

testFracTest = 0.9
colNames = ["C", "R", "P_sys", "P_dia", "PP", "Hyp"]
featureCols = [0, 1] # C and R
labelCol = 5 # Label

data = np.genfromtxt(dataNameTrain, dtype=np.float32,skip_header=True)
x, y = data[:,featureCols].copy(), data[:,labelCol].copy()
train_X, val_X, train_Y, val_Y = train_test_split(x, y, test_size=testFracTest, random_state=35)
#train_Y = np.array(train_Y)

train_Y= train_Y[:, np.newaxis] # turn 1D array into 2D array of shape (N, 1)
val_Y= val_Y[:, np.newaxis] # turn 1D array into 2D array of shape (N, 1)

print(train_X.shape, train_Y.shape)
print(type(train_X), type(train_Y))

hyp_C = [d[0] for i,d in enumerate(train_X) if train_Y[i]==1]
hyp_R = [d[1] for i,d in enumerate(train_X) if train_Y[i]==1]
norm_C = [d[0] for i,d in enumerate(train_X) if train_Y[i]==-1]
norm_R = [d[1] for i,d in enumerate(train_X) if train_Y[i]==-1]
        
#======================================================
# Visualize initial points
#======================================================
plt.plot(hyp_C, hyp_R,'o',label='Hypertensive')
plt.plot(norm_C, norm_R,'x',label='Normotensive')
plt.show()

In [None]:
#===================================================
# scale inputs
#===================================================
#scaler = preprocessing.StandardScaler()
#train_X = scaler.fit_transform(train_X) # fit (find mu and std) scale and transform data
#val_X  = scaler.transform(val_X) # transform data based on mu and std from training/learning set
#===================================================
# set parameters for SVM
#===================================================
batch_size = 60
epochs = 5000
trainingRate=0.01

#==============================
# Create placeholders and linear parameter placeholders for defining a hyperplane line.
#==============================
sess = tf.Session()

x_data = tf.placeholder(shape=[None,2], dtype=tf.float32,name="RCinput")
y_target = tf.placeholder(shape=[None,1], dtype=tf.float32,name="Label")

A = tf.Variable(tf.random_normal(shape=[2,1])) #Containing vector w, components
b = tf.Variable(tf.random_normal(shape=[1,1])) #Boundary bias, b

model_output = tf.subtract(tf.matmul(x_data, A), b) # w . x - b 

#==============================
# Maximum margin loss function
#==============================
l2_norm = tf.reduce_sum(tf.square(A))
alpha = tf.constant([0.01]) # regularization weight: a larger alpha favours a wider, softer margin, a smaller alpha penalizes margin violations more strongly
classification_term = tf.reduce_mean(tf.maximum(0., tf.subtract(1.0,tf.multiply(model_output,y_target))))
# hinge term: max(0, 1 - y_i (w . x_i - b)), averaged over the batch
loss = tf.add(tf.multiply(alpha,l2_norm), classification_term)
# alpha * ||w||^2 + mean(max(0, 1 - y_i (w . x_i - b)))

#==============================
# Prediction and accuracy
#==============================

prediction = tf.sign(model_output)
accuracy = tf.reduce_mean(tf.cast(tf.equal(prediction, y_target),tf.float32))
residuals = prediction - y_target

#=========================================================
# Set optimization method and initialize model variables
#=========================================================
opt_ref = tf.train.GradientDescentOptimizer(trainingRate)
train_step = opt_ref.minimize(loss)
init = tf.global_variables_initializer()
sess.run(init)

In [None]:
#=========================================================
# Run session for number of epochs
#=========================================================

loss_vec = []
train_accuracy = []
test_accuracy = []

for i in range(epochs):
    rand_index = np.random.choice(len(train_X), size=batch_size)
    rand_x = train_X[rand_index,:]
    rand_y = train_Y[rand_index,:]
    
    _, train_loss = sess.run([train_step, loss],
                                      feed_dict={x_data: rand_x, y_target: rand_y})
    test_loss, test_resids = sess.run([loss,residuals],
                                     feed_dict={x_data: val_X, y_target: val_Y})
    loss_vec.append(train_loss)
    train_acc_temp = sess.run(accuracy, feed_dict={x_data: train_X, y_target: train_Y})
    train_accuracy.append(train_acc_temp)
    test_acc_temp = sess.run(accuracy, feed_dict={x_data: val_X, y_target: val_Y})
    test_accuracy.append(test_acc_temp)
    
    if (i+1)%500==0:
        print('Step #'+str(i+1)+' A = ' + str(sess.run(A)) + ' b = ' + str(sess.run(b)))
        print('Loss = ' + str(train_loss))


In [None]:
#======================================================
# Extract coefficients of hyperplane and classify
#======================================================
[[a1],[a2]] = sess.run(A)
[[b_out]] = sess.run(b)
print(a1,a2,b_out)
slope = -a1/a2
y_intercept = b_out/a2
x1_values = [d[0] for d in val_X]
best_fit = []
for i in x1_values:
    best_fit.append(slope*i+y_intercept)

hyp_C = [d[0] for i,d in enumerate(val_X) if val_Y[i]==1]
hyp_R = [d[1] for i,d in enumerate(val_X) if val_Y[i]==1]
norm_C = [d[0] for i,d in enumerate(val_X) if val_Y[i]==-1]
norm_R = [d[1] for i,d in enumerate(val_X) if val_Y[i]==-1]
        
#======================================================
# Visualize results
#======================================================
plt.plot(hyp_C, hyp_R,'o',label='Hypertensive')
plt.plot(norm_C, norm_R,'x',label='Normotensive')
plt.plot(x1_values, best_fit, 'r-', label='Linear Separator',linewidth=3)
#plt.plot(x1_values, 4.8 + 1./slope*np.array(x1_values),label='mirror')
#plt.ylim([-0.5,4.5])
plt.legend()
plt.title('C and R with hyperplane - validation data')
plt.xlabel('C')
plt.ylabel('R')
plt.show()

plt.plot(train_accuracy, 'k--', label='Training Accuracy')
plt.plot(test_accuracy, 'r--', label='Test Accuracy')
plt.legend()
plt.title('Train and test set accuracies')
plt.xlabel('Generation')
plt.ylabel('Accuracy')
plt.show()
plt.plot(loss_vec,'k-')
plt.title('Loss per generation')
plt.xlabel('Generation')
plt.ylabel('Loss')
plt.show()

## Quadratic kernel

Let us now try adding a kernel. Instead of a plain dot product between two-dimensional vectors, let us use the square of a dot product, which should allow more complex polynomial decision boundaries, determined by a different set of support vectors.

\begin{equation}
L = \sum_i^n \alpha_i - \frac{1}{2} \sum_i^n \sum_j^n \alpha_i \alpha_j y_i y_j \mathbf{x}_i \cdot \mathbf{x}_j
\end{equation}

We substitute the kernel $K(\mathbf{x},\mathbf{x'})$ for the dot product:

\begin{equation}
L = \sum_i^n \alpha_i - \frac{1}{2} \sum_i^n \sum_j^n \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)
\end{equation}

Here we will apply a second-order polynomial kernel

\begin{equation}
K(\mathbf{x},\mathbf{x'}) = (1 + \mathbf{x}^T \cdot \mathbf{x'})^2
\end{equation}
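
A quick NumPy check, offered as a sketch, shows why this kernel corresponds to working in more dimensions. For two-dimensional inputs, $(1 + \mathbf{x} \cdot \mathbf{x}')^2$ equals an ordinary dot product between six-dimensional feature vectors. The helper names and the random test points are hypothetical, and note that the square is taken element by element on the matrix of dot products.

```python
import numpy as np

def quadratic_kernel(X1, X2):
    """Kernel matrix with entries (1 + x_i . x_j)^2, squared elementwise."""
    return (1.0 + X1 @ X2.T) ** 2

def feature_map(X):
    """Explicit 6D features whose dot product reproduces the kernel in 2D."""
    x1, x2 = X[:, 0], X[:, 1]
    s = np.sqrt(2.0)
    return np.column_stack([np.ones(len(X)), s * x1, s * x2,
                            x1 ** 2, x2 ** 2, s * x1 * x2])

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))   # arbitrary test points

K = quadratic_kernel(X, X)
Phi = feature_map(X)
print(np.allclose(K, Phi @ Phi.T))
```

The kernel therefore gives access to quadratic terms such as $x_1^2$ and $x_1 x_2$ without ever constructing the lifted feature vectors.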

Problem 1:
- Unfortunately I was unable to implement this example correctly in time. Can you implement it? Alternatively, skip ahead to the Gaussian kernel problem to learn more and return to this problem later. The loss and prediction functions must be properly implemented using the new kernel.

For the original source, and other ideas/examples go to: https://github.com/nfmcclure/tensorflow_cookbook/tree/master/04_Support_Vector_Machines

In [None]:
#===================================================
# load data and split training set into actual training and validation sets
#===================================================
dataNameTrain = "WK2DataLabeled_train" # for training the model
dataNameTest = "WK2DataLabeled_test" # for testing the model

testFracTest = 0.97
colNames = ["C", "R", "P_sys", "P_dia", "PP", "Hyp"]
featureCols = [0, 1] # C and R
labelCol = 5 # Label

data = np.genfromtxt(dataNameTrain, dtype=np.float32,skip_header=True)
x, y = data[:,featureCols].copy(), data[:,labelCol].copy()
train_X, val_X, train_Y, val_Y = train_test_split(x, y, test_size=testFracTest, random_state=35)
#train_Y = np.array(train_Y)

train_Y= train_Y[:, np.newaxis] # turn 1D array into 2D array of shape (N, 1)
val_Y= val_Y[:, np.newaxis] # turn 1D array into 2D array of shape (N, 1)

print(train_X.shape, train_Y.shape)
print(type(train_X), type(train_Y))

hyp_C = [d[0] for i,d in enumerate(train_X) if train_Y[i]==1]
hyp_R = [d[1] for i,d in enumerate(train_X) if train_Y[i]==1]
norm_C = [d[0] for i,d in enumerate(train_X) if train_Y[i]==-1]
norm_R = [d[1] for i,d in enumerate(train_X) if train_Y[i]==-1]
        
#======================================================
# Visualize training points
#======================================================
plt.plot(hyp_C, hyp_R,'o',label='Hypertensive')
plt.plot(norm_C, norm_R,'x',label='Normotensive')
plt.show()

In [None]:
#===================================================
# scale inputs, optional
#===================================================
#scaler = preprocessing.StandardScaler()
#train_X = scaler.fit_transform(train_X) # fit (find mu and std) scale and transform data
#val_X  = scaler.transform(val_X) # transform data based on mu and std from training/learning set

#===================================================
# set parameters for SVM
#===================================================
batch_size = train_X.shape[0]
epochs = 10000
trainingRate=0.01

#==============================
# Create placeholders
#==============================
sess = tf.Session()

x_data = tf.placeholder(shape=[None,2], dtype=tf.float32,name="RCinput")
y_target = tf.placeholder(shape=[None,1], dtype=tf.float32,name="Label")
prediction_grid = tf.placeholder(shape=[None, 2], dtype=tf.float32)

b = tf.Variable(tf.random_normal(shape=[1,batch_size])) 
# b is here a vector containing the lagrangian multipliers (alphas as opposed to the bias) for the hyperplane

#==================================================
# Set kernel
#==================================================
# Quadratic
dotprod = tf.matmul(x_data,tf.transpose(x_data))
quad_kernel = tf.square(tf.add(1.,dotprod)) # elementwise (1 + x_i . x_j)^2, not a matrix product

#=================================# 
# Loss function for minimization  #
#=================================#
#...

#==============================
# Prediction computation
#==============================
#rA = tf.reshape(tf.reduce_sum(tf.square(x_data), 1),[-1,1])
#rB = tf.reshape(tf.reduce_sum(tf.square(prediction_grid), 1),[-1,1])
#pred_sq_dist = tf.add(tf.subtract(rA, tf.multiply(2., tf.matmul(x_data,tf.transpose(prediction_grid)))), tf.transpose(rB))
#pred_kernel = tf.square(tf.matmul(pred_sq_dist,tf.transpose(pred_sq_dist)))
#prediction_output = tf.matmul(tf.multiply(tf.transpose(y_target),b),pred_kernel)
#prediction = tf.sign(prediction_output-tf.reduce_mean(prediction_output))
#accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.squeeze(prediction),tf.squeeze(y_target)), tf.float32))

#print(type(prediction_output), type(prediction))
#print(prediction_output.shape, prediction.shape)

#=========================================================
# Set optimization method and initialize model variables
#=========================================================
opt_ref = tf.train.GradientDescentOptimizer(trainingRate)
train_step = opt_ref.minimize(loss)
init = tf.global_variables_initializer()
sess.run(init)

# Visualize initial condition
# Create a mesh to plot points in
x_min, x_max = train_X[:, 0].min() - 1, train_X[:, 0].max() + 1
y_min, y_max = train_X[:, 1].min() - 1, train_X[:, 1].max() + 1
x1, x2 = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))
grid_points = np.c_[x1.ravel(), x2.ravel()]

[pred_init] = sess.run(prediction, feed_dict={x_data: train_X,
                                                   y_target: train_Y,
                                                   prediction_grid: grid_points})
print(pred_init.shape, grid_points.shape)
grid_predictions = pred_init.reshape(x1.shape)

plt.figure()
plt.contourf(x1,x2,grid_predictions)
plt.plot(hyp_C, hyp_R,'o',label='Hypertensive')
plt.plot(norm_C, norm_R,'x',label='Normotensive')
plt.legend()
plt.show()

In [None]:
#=========================================================
# Run session for number of epochs
#=========================================================

loss_vec = []
train_accuracy = []

for i in range(epochs):
    rand_index = np.random.choice(len(train_X), size=batch_size)
    X = train_X[rand_index,:]
    Y = train_Y[rand_index,:]
    sess.run(train_step, feed_dict={x_data: X, y_target: Y})
    temp_loss = sess.run(loss, feed_dict={x_data: X, y_target: Y})
    loss_vec.append(temp_loss)
    acc_temp = sess.run(accuracy, feed_dict={x_data: X,y_target: Y,prediction_grid:X})
    train_accuracy.append(acc_temp)
    
    if (i+1)%1000==0:
        print('Epoch #'+str(i+1))
        print('Loss = ' + str(temp_loss))
        


In [None]:
# Create a mesh to plot points in
x_min, x_max = train_X[:, 0].min() - 1, train_X[:, 0].max() + 1
y_min, y_max = train_X[:, 1].min() - 1, train_X[:, 1].max() + 1
x1, x2 = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))
grid_points = np.c_[x1.ravel(), x2.ravel()]
#print(x1.ravel(), x1.ravel().shape)
#print(type(grid_points), grid_points.shape)
[grid_predictions] = sess.run(prediction, feed_dict={x_data: train_X,
                                                   y_target: train_Y,
                                                   prediction_grid: grid_points})
grid_predictions = grid_predictions.reshape(x1.shape)
#print(grid_predictions.shape)

In [None]:
#==================================
# Plot points and grid
#==================================
plt.contourf(x1, x2, grid_predictions, cmap=plt.cm.Paired, alpha=0.8)
plt.plot(hyp_C, hyp_R, 'bo', label='Hypertensive')
plt.plot(norm_C, norm_R, 'kx', label='Normotensive')
plt.title('Quadratic SVM Results for training points')
plt.xlabel('C')
plt.ylabel('R')
plt.legend(loc='lower right')
plt.ylim([-0.0, 3.5])
plt.xlim([-0.0, 7.5])
plt.show()

# Plot training accuracy
plt.plot(train_accuracy, 'k-', label='Accuracy')
plt.title('Training accuracy')
plt.xlabel('Generation')
plt.ylabel('Accuracy')
plt.legend(loc='lower right')
plt.show()

# Plot loss over time
plt.plot(loss_vec, 'k-')
plt.title('Loss per Generation')
plt.xlabel('Generation')
plt.ylabel('Loss')
plt.show()

#========================================
# Plot validation points
#========================================

hyp_C_val = [d[0] for i,d in enumerate(val_X) if val_Y[i]==1]
hyp_R_val = [d[1] for i,d in enumerate(val_X) if val_Y[i]==1]
norm_C_val = [d[0] for i,d in enumerate(val_X) if val_Y[i]==-1]
norm_R_val = [d[1] for i,d in enumerate(val_X) if val_Y[i]==-1]

plt.contourf(x1, x2, grid_predictions, cmap=plt.cm.Paired, alpha=0.8)
plt.plot(hyp_C_val, hyp_R_val, 'bo', label='Hypertensive')
plt.plot(norm_C_val, norm_R_val, 'kx', label='Normotensive')
plt.title('Quadratic SVM Results for validation points')
plt.xlabel('C')
plt.ylabel('R')
plt.legend(loc='lower right')
plt.ylim([-0.0, 3.5])
plt.xlim([-0.0, 7.5])
plt.show()

## Gaussian kernel

Now we try a kernel of the form

\begin{equation}
K(\mathbf{x},\mathbf{x}') = e^{\gamma |\mathbf{x} - \mathbf{x}'|^2}, \gamma < 0
\end{equation}

For the original source, and other ideas/examples go to: https://github.com/nfmcclure/tensorflow_cookbook/tree/master/04_Support_Vector_Machines
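
The kernel matrix can also be sketched in NumPy, using the same expansion $|\mathbf{x}_i|^2 - 2\,\mathbf{x}_i \cdot \mathbf{x}_j + |\mathbf{x}_j|^2$ for the squared distances that the TensorFlow cells below rely on. The three sample points and the value $\gamma = -1$ are arbitrary choices for illustration, with $\gamma$ negative as in the definition above.

```python
import numpy as np

def gaussian_kernel(X1, X2, gamma=-1.0):
    """Kernel matrix with entries exp(gamma * |x_i - x_j|^2), gamma < 0."""
    sq1 = np.sum(X1 ** 2, axis=1).reshape(-1, 1)   # column of squared norms
    sq2 = np.sum(X2 ** 2, axis=1).reshape(1, -1)   # row of squared norms
    sq_dists = sq1 - 2.0 * X1 @ X2.T + sq2         # pairwise squared distances
    # abs() guards against tiny negative values from round-off
    return np.exp(gamma * np.abs(sq_dists))

# Arbitrary sample points
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = gaussian_kernel(X, X, gamma=-1.0)
print(K)
```

Each point has kernel value 1 with itself, and the value decays with distance, so a strongly negative $\gamma$ makes every training point influence only its immediate neighbourhood.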

In [None]:
#===================================================
# load data and split training set into actual training and validation sets
#===================================================
dataNameTrain = "WK2DataLabeled_train" # for training the model
dataNameTest = "WK2DataLabeled_test" # for testing the model

testFracTest = 0.96
colNames = ["C", "R", "P_sys", "P_dia", "PP", "Hyp"]
featureCols = [0, 1] # C and R
labelCol = 5 # Label

data = np.genfromtxt(dataNameTrain, dtype=np.float32,skip_header=True)
x, y = data[:,featureCols].copy(), data[:,labelCol].copy()
train_X, val_X, train_Y, val_Y = train_test_split(x, y, test_size=testFracTest, random_state=35)
#train_Y = np.array(train_Y)

train_Y= 1.*train_Y[:, np.newaxis] # turn 1D array into 2D array of shape (N, 1)
val_Y= val_Y[:, np.newaxis] # turn 1D array into 2D array of shape (N, 1)

#print(train_X.shape, train_Y.shape)
#print(type(train_X), type(train_Y))

hyp_C = [d[0] for i,d in enumerate(train_X) if train_Y[i]==1]
hyp_R = [d[1] for i,d in enumerate(train_X) if train_Y[i]==1]
norm_C = [d[0] for i,d in enumerate(train_X) if train_Y[i]==-1]
norm_R = [d[1] for i,d in enumerate(train_X) if train_Y[i]==-1]
        
#======================================================
# Visualize training points
#======================================================
plt.plot(hyp_C, hyp_R,'o',label='Hypertensive')
plt.plot(norm_C, norm_R,'x',label='Normotensive')
plt.show()

In [None]:
#===================================================
# scale inputs
#===================================================
#scaler = preprocessing.StandardScaler()
#train_X = scaler.fit_transform(train_X) # fit (find mu and std) scale and transform data
#val_X  = scaler.transform(val_X) # transform data based on mu and std from training/learning set
#===================================================
# set parameters for SVM
#===================================================
batch_size = train_X.shape[0]
epochs = 1000
trainingRate=0.01
gamma_const = -100.0

In [None]:
#==============================
# Create placeholders and linear parameter placeholders for defining a hyperplane line.
#==============================
sess = tf.Session()

x_data = tf.placeholder(shape=[None,2], dtype=tf.float32,name="RCinput")
y_target = tf.placeholder(shape=[None,1], dtype=tf.float32,name="Label")
prediction_grid = tf.placeholder(shape=[None, 2], dtype=tf.float32)

b = tf.Variable(tf.random_normal(shape=[1,batch_size])) # Not linear bias b, but vector of lagrangian multipliers alpha
#==================================================
# Set kernel
#==================================================
# Gaussian
gamma = tf.constant(gamma_const)
dist = tf.reduce_sum(tf.square(x_data), 1) # squared norm of each sample, C^2 + R^2, shape (N,)
dist = tf.reshape(dist, [-1,1]) # reshape to a column vector of shape (N, 1)
sq_dists = tf.add(tf.subtract(dist, tf.multiply(2., tf.matmul(x_data,tf.transpose(x_data)))), tf.transpose(dist))
# |x_i|^2 - 2 x_i . x_j + |x_j|^2 = |x_i - x_j|^2, shape (N, N)
gaussian_kernel = tf.exp(tf.multiply(gamma, tf.abs(sq_dists))) # exp(gamma * |x_i - x_j|^2)

#==============================
# Specify loss function
#==============================
first_term = tf.reduce_sum(b) # sum alpha
b_vec_cross = tf.matmul(tf.transpose(b), b) # alpha^T x alpha (row vectors)
y_target_cross = tf.matmul(y_target, tf.transpose(y_target)) # y x y^T (column vectors)
second_term = tf.reduce_sum(tf.multiply(gaussian_kernel, tf.multiply(b_vec_cross,y_target_cross))) 
# sum (exp(...) * bxb * yxy)
loss = tf.negative(tf.subtract(first_term, second_term)) #-(sum b - sum (K bxb yxy))

#==============================
# Prediction function set up
#==============================
rA = tf.reshape(tf.reduce_sum(tf.square(x_data), 1),[-1,1]) # squared norms of the data points, column vector
rB = tf.reshape(tf.reduce_sum(tf.square(prediction_grid), 1),[-1,1]) # squared norms of the grid points, column vector
pred_sq_dist = tf.add(tf.subtract(rA, tf.multiply(2., tf.matmul(x_data,tf.transpose(prediction_grid)))), tf.transpose(rB))
# Squared distances between grid points and data points
pred_kernel = tf.exp(tf.multiply(gamma, tf.abs(pred_sq_dist))) # Evaluate kernel
prediction_output = tf.matmul(tf.multiply(tf.transpose(y_target),b),pred_kernel) # (y_vec^T * b) * K
prediction = tf.sign(prediction_output-tf.reduce_mean(prediction_output)) #sign of deviation from mean
accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.squeeze(prediction),tf.squeeze(y_target)), tf.float32)) 
# sum correct predictions

#print(type(prediction_output), type(prediction))
#print(prediction_output.shape, prediction.shape)

#=========================================================
# Set optimization method and initialize model variables
#=========================================================
opt_ref = tf.train.GradientDescentOptimizer(trainingRate)
train_step = opt_ref.minimize(loss)
init = tf.global_variables_initializer()
sess.run(init)

In [None]:
#=========================================================
# Run session for number of epochs
#=========================================================

loss_vec = []
train_accuracy = []

for i in range(epochs):
    rand_index = np.random.choice(len(train_X), size=batch_size)
    X = train_X[rand_index,:]
    Y = train_Y[rand_index,:]
    sess.run(train_step, feed_dict={x_data: X, y_target: Y})
    temp_loss = sess.run(loss, feed_dict={x_data: X, y_target: Y})
    loss_vec.append(temp_loss)
    acc_temp = sess.run(accuracy, feed_dict={x_data: X,y_target: Y,prediction_grid:X})
    train_accuracy.append(acc_temp)
    
    if (i+1)%1000==0:
        print('Epoch #'+str(i+1))
        print('Loss = ' + str(temp_loss))


In [None]:
#====================================
# Create a mesh to plot points in
#====================================
x_min, x_max = train_X[:, 0].min() - 1, train_X[:, 0].max() + 1
y_min, y_max = train_X[:, 1].min() - 1, train_X[:, 1].max() + 1
x1, x2 = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))
grid_points = np.c_[x1.ravel(), x2.ravel()] #Turn grid into lists and join pairwise to a [.., 2] matrix

[grid_predictions] = sess.run(prediction, feed_dict={x_data: train_X,
                                                   y_target: train_Y,
                                                   prediction_grid: grid_points}) # Evaluate training for gridpoints
grid_predictions = grid_predictions.reshape(x1.shape) # Reshape to grid for plotting.
#print(grid_predictions.shape)

In [None]:
#==================================
# Plot points and grid
#==================================
plt.contourf(x1, x2, grid_predictions, cmap=plt.cm.Paired, alpha=0.8)
plt.plot(hyp_C, hyp_R, 'bo', label='Hypertensive')
plt.plot(norm_C, norm_R, 'kx', label='Normotensive')
plt.title('Gaussian SVM Results for training points')
plt.xlabel('C')
plt.ylabel('R')
plt.legend(loc='lower right')
plt.ylim([-0.0, 3.5])
plt.xlim([-0.0, 7.5])
plt.show()

#============================
# Plot training accuracy
#============================
plt.plot(train_accuracy, 'k-', label='Accuracy')
plt.title('Training accuracy')
plt.xlabel('Generation')
plt.ylabel('Accuracy')
plt.legend(loc='lower right')
plt.show()

#======================
# Plot loss over time
#======================
plt.plot(loss_vec, 'k-')
plt.title('Loss per Generation')
plt.xlabel('Generation')
plt.ylabel('Loss')
plt.show()

#========================================
# Plot validation points
#========================================
x_min, x_max = val_X[:, 0].min() - 1, val_X[:, 0].max() + 1
y_min, y_max = val_X[:, 1].min() - 1, val_X[:, 1].max() + 1
x1, x2 = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))
grid_points = np.c_[x1.ravel(), x2.ravel()]
print(x1.ravel(), x1.ravel().shape)
print(type(grid_points), grid_points.shape)
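# The multipliers in b belong to the training points, so the kernel expansion
# must be evaluated with the training set even when plotting validation data
[grid_predictions] = sess.run(prediction, feed_dict={x_data: train_X,
                                                   y_target: train_Y,
                                                   prediction_grid: grid_points})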
grid_predictions = grid_predictions.reshape(x1.shape)

hyp_C_val = [d[0] for i,d in enumerate(val_X) if val_Y[i]==1]
hyp_R_val = [d[1] for i,d in enumerate(val_X) if val_Y[i]==1]
norm_C_val = [d[0] for i,d in enumerate(val_X) if val_Y[i]==-1]
norm_R_val = [d[1] for i,d in enumerate(val_X) if val_Y[i]==-1]

plt.contourf(x1, x2, grid_predictions, cmap=plt.cm.Paired, alpha=0.8)
#plt.plot(hyp_C_val, hyp_R_val, 'bo', label='Hypertensive')
#plt.plot(norm_C_val, norm_R_val, 'kx', label='Normotensive')
plt.title('Gaussian SVM Results for validation points')
plt.xlabel('C')
plt.ylabel('R')
plt.legend(loc='lower right')
plt.ylim([-0.0, 3.5])
plt.xlim([-0.0, 7.5])
plt.show()

In [None]:
#======================================================
# Extract coefficients of b-vector
#======================================================
b_out = sess.run(b)
#print(sess.run(prediction_output))
print(b_out)

#dist = np.sum(train_X**2,1)
#sq_dist = dist - 2*np.matmul(train_X,np.transpose(train_X))+ np.transpose(dist)
#gaussian = np.exp(gamma_const*sq_dist)
#gaussian = np.exp(gamma_const*(train_X)**2)
#output = np.matmul(b_out,gaussian)
#print('Output',output.shape)
plt.plot(b_out.flatten(), 'k--')
plt.show()
  
from mpl_toolkits.mplot3d import axes3d
    
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
       
ax.scatter(train_X[:,0], train_X[:,1], b_out.flatten())
ax.set_xlabel('C')
ax.set_ylabel('R')
ax.set_zlabel('b')
plt.show()

#pred_out = sess.run(prediction_grid, feed_dict={x_data: train_X})
#print(type(pred_out),pred_out)
#ouput1 = sess.run(prediction)
#print(output1)
#[[b_out]] = sess.run(b)
#print(a1,a2,b_out)
#slope = -a1/a2
#y_intercept = b_out/a2
#x1_values = [d[0] for d in val_X]
#best_fit = []
#for i in x1_values:
#    best_fit.append(slope*i+y_intercept)

#hyp_C = [d[0] for i,d in enumerate(val_X) if val_Y[i]==1]
#hyp_R = [d[1] for i,d in enumerate(val_X) if val_Y[i]==1]
#norm_C = [d[0] for i,d in enumerate(val_X) if val_Y[i]==-1]
#norm_R = [d[1] for i,d in enumerate(val_X) if val_Y[i]==-1]
        
#======================================================
# Visualize results
#======================================================
plt.plot(hyp_C, hyp_R,'o',label='Hypertensive')
plt.plot(norm_C, norm_R,'x',label='Normotensive')
#plt.plot(x1_values, 4.8 + 1./slope*np.array(x1_values),label='mirror')
plt.ylim([-0.5,4.5])
plt.legend()
plt.title('C and R - training data')
plt.xlabel('C')
plt.ylabel('R')
plt.show()

plt.plot(train_accuracy, 'k--', label='Training Accuracy')
plt.legend()
plt.title('Training batch accuracies')
plt.xlabel('Generation')
plt.ylabel('Accuracy')
plt.show()
plt.plot(loss_vec,'k-')
plt.title('Loss per generation')
plt.xlabel('Generation')
plt.ylabel('Loss')
plt.show()