<h1 align="center" style="background-color:#616161;color:white">Nonlinear SVM Example in TensorFlow</h1>

Adapted from: https://github.com/nfmcclure/tensorflow_cookbook/blob/master/04_Support_Vector_Machines/05_Implementing_Nonlinear_SVMs/05_nonlinear_svm.ipynb

This notebook will illustrate how to implement the Gaussian kernel on the time-series data loaded below from `tblTimeSeriesData` (the original recipe used the iris dataset).

Gaussian Kernel:

$$K(x_{1}, x_{2}) = \exp\left(-\gamma \|x_{1} - x_{2}\|^{2}\right)$$
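
As a minimal sketch of the formula above (not part of the original notebook; the function name `gaussian_kernel` is an assumption made for illustration), the kernel value for a single pair of points can be computed with NumPy:

```python
import numpy as np

def gaussian_kernel(x1, x2, gamma=1.0):
    """Gaussian (RBF) kernel between two points: exp(-gamma * ||x1 - x2||^2)."""
    diff = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))

# Identical points have zero distance, so the kernel value is exactly 1.
k_same = gaussian_kernel([1.0, 2.0], [1.0, 2.0])

# Points (0, 0) and (3, 4) have squared distance 25, so the value is exp(-25).
k_far = gaussian_kernel([0.0, 0.0], [3.0, 4.0])
```

The kernel is 1 for identical points and decays toward 0 as the points move apart.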

We start by loading the necessary libraries and resetting the computational graph.

<h3 style="background-color:#616161;color:white">0. Setup</h3>

<div style="background-color:white; color:#008000; font-family: 'Courier New, Monospace;font-weight: bold">Input Parameters</div>

In [1]:
PeriodGranularity = 30 # E.g. 15, 30, 60
# Train / Test split
newUsers = 10   # Num of randomly selected users to separate out of eval 2
rndPeriods = 3 # Num of random periods from each user to select
rndPeriodsLength = int(60/PeriodGranularity) * 24 * 7 * 4     # How long the random test period should cover

# Root path
#root = "C:/DS/Github/MusicRecommendation"  # BA, Windows
root = "/home/badrul/Documents/git/MusicRecommendation" # BA, Linux
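
The `rndPeriodsLength` expression above counts how many periods fit into four weeks at a given granularity. A small sketch of that arithmetic, using a hypothetical helper name `periods_per_four_weeks` that does not appear in the codebase:

```python
def periods_per_four_weeks(period_granularity):
    """Number of periods in 4 weeks for a granularity given in minutes."""
    periods_per_hour = int(60 / period_granularity)
    return periods_per_hour * 24 * 7 * 4   # hours/day * days/week * weeks

# The three granularities mentioned in the parameter comment.
counts = {g: periods_per_four_weeks(g) for g in (15, 30, 60)}
```

With the setting `PeriodGranularity = 30` used here, each held-out block covers 1344 rows.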

<div style="background-color:white; color:#008000; font-family: 'Courier New, Monospace;font-weight: bold">Common Libraries</div>

In [2]:
# Core
import numpy as np
import pandas as pd
from IPython.core.debugger import Tracer    # Used for debugging
import logging

# File and database management
import csv
import os
import sys
import json
import sqlite3
from pathlib import Path

# Date/Time
import datetime
import time
#from datetime import timedelta # Deprecated

# Visualization
import matplotlib.pyplot as plt             # Quick
%matplotlib inline

# Misc
import random

#-------------- Custom Libs -----------------#
os.chdir(root)

# Import the codebase module
fPath = root + "/1_codemodule"
if fPath not in sys.path: sys.path.append(fPath)

# Custom Libs
import coreCode as cc
import lastfmCode as fm

<div style="background-color:white; color:#008000; font-family: 'Courier New, Monospace;font-weight: bold">Page Specific Libraries</div>

In [3]:
# Data science (comment out if not needed)
#from sklearn.manifold import TSNE
import tensorflow as tf
from tensorflow.python.framework import ops
ops.reset_default_graph()

<div style="background-color:#white; color:#008000; font-family: 'Courier New, Monospace;font-weight: bold">Declare Functions</div>

<div style="background-color:#white; color:#008000; font-family: 'Courier New, Monospace;font-weight: bold">Load settings</div>

In [4]:
settingsDict =  cc.loadSettings()
dbPath = root + settingsDict['mainDbPath']
fmSimilarDbPath = root + settingsDict['fmSimilarDbPath']
fmTagsDbPath = root + settingsDict['fmTagsDbPath']
trackMetaDbPath = root + settingsDict['trackmetadata']

Create a graph session

<h3 style="background-color:#616161;color:white">1. Load data</h3>

In [5]:
def getTrainAndTestData():
    con = sqlite3.connect(dbPath)
    c = con.cursor()

    # Get list of UserIDs 
    trainUsers = pd.read_sql_query("Select UserID from tblUsers Where tblUsers.TestUser = 0",con)

    fieldList="t, UserID, HrsFrom6pm, isSun,isMon,isTue,isWed,isThu,isFri,isSat,t1,t2,t3,t4,t5,t10,t12hrs,t24hrs,t1wk,t2wks,t3wks,t4wks"
    trainDf=pd.DataFrame(columns=[fieldList])  # Create an empty df
    testDf=pd.DataFrame(columns=[fieldList])  # Create an empty df
    periodsInAMonth=int(60/PeriodGranularity)*24*7*4

    totalRows=0
    
    for user in trainUsers.itertuples():
        # Get training dataset
        SqlStr="SELECT {} from tblTimeSeriesData where UserID = {}".format(fieldList,user.userID)
        df = pd.read_sql_query(SqlStr, con)
        totalRows += len(df)
    
        # Cut-off 1
        k = random.randint(periodsInAMonth, len(df))
        #Tracer()()  -- for debugging purposes
        testDf = testDf.append(df.iloc[k:k+periodsInAMonth])[df.columns.tolist()]

        tmp = df.drop(df.index[k:k+periodsInAMonth])

        # Cut-off 2
        k = random.randint(periodsInAMonth, len(tmp))
        testDf = testDf.append(tmp.iloc[k:k+periodsInAMonth])[df.columns.tolist()]
        trainDf = trainDf.append(tmp.drop(tmp.index[k:k+periodsInAMonth]))[df.columns.tolist()]

    if len(trainDf)+len(testDf) == totalRows:
        print('Ok')
    else:
        print("Incorrect. Total Rows = {}. TestDf+TrainDf rows = {}+{}={}".format(totalRows,len(testDf),len(trainDf),len(testDf)+len(trainDf)))
        
    return trainDf, testDf

trainDf,testDf = getTrainAndTestData()

x_vals = trainDf.drop(['t','UserID'], axis=1).values
y_vals = trainDf['t'].values.astype(int) 
# Change the 0's to -1
y_vals = np.array([1 if y==1 else -1 for y in y_vals])

Ok
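
The SVM formulation used below expects labels in {-1, +1} rather than {0, 1}, which is why the 0's are remapped. As a sketch on made-up labels (the array `y_raw` is illustrative, not taken from the dataset), the list comprehension above is equivalent to a vectorized `np.where`:

```python
import numpy as np

y_raw = np.array([0, 1, 1, 0, 1])   # made-up 0/1 labels for illustration

# Same mapping as in the notebook cell: 1 stays 1, everything else becomes -1.
y_loop = np.array([1 if y == 1 else -1 for y in y_raw])

# Vectorized equivalent, which avoids a Python-level loop on large arrays.
y_vec = np.where(y_raw == 1, 1, -1)
```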


<div style="background-color:#white; color:#008000; font-family: 'Courier New, Monospace;font-weight: bold">Confirm dimensions</div>

In [6]:
numOfFeatures = np.shape(x_vals)[1]
np.shape(x_vals),np.shape(y_vals)

((937519, 20), (937519,))

<h3 style="background-color:#616161;color:white">2. Define Model</h3>

<div style="background-color:#white; color:#008000; font-family: 'Courier New, Monospace;font-weight: bold">Model Parameters</div>

We now declare our batch size, placeholders, and the `b` variable that is fitted for the SVM.  Note that we create a separate placeholder, `prediction_grid`, to feed in the points we want predictions for.

In [7]:
sess = tf.Session()

In [8]:
# Declare batch size
batch_size = 1000

# Initialize placeholders within the tf graph
x_data = tf.placeholder(shape=[None, numOfFeatures], dtype=tf.float32)
y_target = tf.placeholder(shape=[None, 1], dtype=tf.float32)
prediction_grid = tf.placeholder(shape=[None, numOfFeatures], dtype=tf.float32)

# Create variables for svm within the tf graph
b = tf.Variable(tf.random_normal(shape=[1,batch_size]))

<div style="background-color:#white; color:#008000; font-family: 'Courier New, Monospace;font-weight: bold">Define Kernel</div>

We create the gaussian (RBF) kernel that is used to transform the data points into a higher dimensional space.

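The kernel of two points, $x$ and $x'$, is given as

$$K(x, x')=\exp\left(-\gamma\| x-x' \|^{2}\right)$$

For very small $\gamma$ the kernel is very wide, and for large $\gamma$ it is very narrow.  A large $\gamma$ therefore lets the model fit individual training points closely, which leads to low-bias, high-variance models; a small $\gamma$ gives smoother, higher-bias models.

If we have a matrix of points $\textbf{x}$ of size (batch_size, numOfFeatures), the kernel is evaluated for every pair of rows $x_i$, $x_j$ using the expansion of the squared distance:

$$K_{ij}=\exp\left( -\gamma \left( \|x_i\|^{2} - 2\, x_i \cdot x_j + \|x_j\|^{2} \right) \right)$$

The $x_i \cdot x_j$ terms for all pairs are the entries of $\textbf{x}\textbf{x}^{T}$.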
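
For reference, here is a NumPy sketch of a full Gaussian kernel matrix over a batch, built from the pairwise squared distances $\|x_i - x_j\|^2$. It is an illustration under assumptions rather than the notebook's TensorFlow code: the function name `rbf_kernel_matrix` and the three sample points are made up.

```python
import numpy as np

def rbf_kernel_matrix(x, gamma):
    """Pairwise Gaussian kernel: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_norms = np.sum(x ** 2, axis=1).reshape(-1, 1)         # ||x_i||^2 as a column
    sq_dists = sq_norms - 2.0 * (x @ x.T) + sq_norms.T       # ||x_i||^2 - 2 x_i.x_j + ||x_j||^2
    sq_dists = np.maximum(sq_dists, 0.0)                     # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)

x = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 2.0]])

k_wide = rbf_kernel_matrix(x, gamma=0.1)     # small gamma: wide kernel
k_narrow = rbf_kernel_matrix(x, gamma=10.0)  # large gamma: narrow kernel
```

The matrix is symmetric with ones on the diagonal, and for the same pair of points the off-diagonal entry shrinks as $\gamma$ grows.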

In [9]:
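# Define the formula
gamma = tf.constant(-1.0) # Holds -gamma (here gamma = 1), so the exponent below is negative
# Note: this is 2 * (x . x^T) only; the ||x_i||^2 and ||x_j||^2 terms of the squared distance are not included here
sq_vec = tf.multiply(2., tf.matmul(x_data, tf.transpose(x_data)))  # 2 * (x . x^T)
my_kernel = tf.exp(tf.multiply(gamma, tf.abs(sq_vec)))  # exp(gamma * |sq_vec|)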

<div style="background-color:#white; color:#008000; font-family: 'Courier New, Monospace;font-weight: bold">Define Loss function</div>

Here, the SVM loss is given by two terms. The first term is the sum of the entries of $\textbf{b}$

$$\sum_{i} b_{i}$$

and the second term is 

$$\sum_{i}\sum_{j} K(x_{i}, x_{j})\, b_{i} b_{j}\, y_{i} y_{j}$$

We finally tell TensorFlow to maximize this objective by minimizing its negative:  (The following is a horribly abbreviated version of the dual problem)

$$-\left(\sum_{i} b_{i} - \sum_{i}\sum_{j} K(x_{i}, x_{j})\, b_{i} b_{j}\, y_{i} y_{j}\right)$$

In [10]:
# Compute SVM loss
first_term = tf.reduce_sum(b)   # without the axis param this is a straightforward sum over all entries
bsq = tf.matmul(tf.transpose(b), b)    # outer product: entry (i, j) is b_i * b_j
ysq = tf.matmul(y_target, tf.transpose(y_target))  # outer product: entry (i, j) is y_i * y_j
second_term = tf.reduce_sum(tf.multiply(my_kernel, tf.multiply(bsq, ysq)))
loss = tf.negative(tf.subtract(first_term, second_term))
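
To make the shapes in the loss concrete, the following NumPy sketch mirrors the cell above on random stand-in values (the arrays `b`, `y` and `K` here are made up) and checks that the outer-product form equals an explicit double sum over pairs:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4                                        # stand-in for batch_size
b = rng.normal(size=(1, n))                  # same layout as the TF variable b
y = rng.choice([-1.0, 1.0], size=(n, 1))     # same layout as y_target
K = np.exp(-rng.random((n, n)))              # any n x n matrix works for checking the algebra

first_term = b.sum()
second_term = np.sum(K * (b.T @ b) * (y @ y.T))   # elementwise product of three n x n matrices
loss = -(first_term - second_term)

# Explicit double sum over all pairs (i, j) for comparison.
double_sum = sum(K[i, j] * b[0, i] * b[0, j] * y[i, 0] * y[j, 0]
                 for i in range(n) for j in range(n))
```

Because `b` is stored as a row and `y_target` as a column, `b.T @ b` and `y @ y.T` are both `n x n`, matching the kernel matrix.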

<div style="background-color:#white; color:#008000; font-family: 'Courier New, Monospace;font-weight: bold">Define Prediction kernel</div>

Now we do the exact same thing as above for the prediction points. 

In [19]:
# Gaussian (RBF) prediction kernel
rA = tf.reshape(tf.reduce_sum(tf.square(x_data), 1),[-1,1])
rB = tf.reshape(tf.reduce_sum(tf.square(prediction_grid), 1),[-1,1])
pred_sq_dist = tf.add(tf.subtract(rA, tf.multiply(2., tf.matmul(x_data, tf.transpose(prediction_grid)))), tf.transpose(rB))
pred_kernel = tf.exp(tf.multiply(gamma, tf.abs(pred_sq_dist)))

prediction_output = tf.matmul(tf.multiply(tf.transpose(y_target),b), pred_kernel)
prediction = tf.sign(prediction_output-tf.reduce_mean(prediction_output))
accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.squeeze(prediction), tf.squeeze(y_target)), tf.float32))

<b>Badrul's notes</b>:<br>
This is a good reference: http://www.robots.ox.ac.uk/~az/lectures/ml/lect3.pdf
$$x = \text{test data}$$
$$x_i = \text{train data}$$
$$rA = \|x_i\|^2$$
$$rB = \|x\|^2$$
$$\gamma \left|\, \|x_i\|^2 - 2\, x_i \cdot x + \|x\|^2 \,\right|$$

Not following how this is 'the exact same thing as above' or how it relates to the RBF formulas I see in the pdf ref.
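
One way to see what `rA` and `rB` do is to check numerically that `rA - 2 * x_i . x + rB` equals the squared distance between each training row and each prediction row. The sketch below does this in NumPy on random stand-in data and then applies the same prediction rule as the cell above; the arrays, their sizes, and the random weights `b` are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x_train = rng.normal(size=(5, 3))    # plays the role of x_data
x_new = rng.normal(size=(4, 3))      # plays the role of prediction_grid
gamma = 1.0

rA = np.sum(x_train ** 2, axis=1).reshape(-1, 1)    # (5, 1): squared norm of each training row
rB = np.sum(x_new ** 2, axis=1).reshape(-1, 1)      # (4, 1): squared norm of each new row
sq_dist = rA - 2.0 * (x_train @ x_new.T) + rB.T     # (5, 4) via broadcasting

# Brute-force squared distances for comparison.
brute = np.array([[np.sum((xi - xj) ** 2) for xj in x_new] for xi in x_train])

pred_kernel = np.exp(-gamma * np.abs(sq_dist))

# Same prediction rule as above, with random stand-ins for b and the labels.
b = rng.normal(size=(1, 5))
y_train = rng.choice([-1.0, 1.0], size=(5, 1))
prediction_output = (y_train.T * b) @ pred_kernel                      # (1, 4)
prediction = np.sign(prediction_output - prediction_output.mean())
```

So the expression inside the exponent is the expanded form of $\|x_i - x\|^2$, which is the RBF kernel evaluated between every training point and every prediction point.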

### Optimizing Method

We declare our gradient descent optimizer and initialize our model variables (`b`)

In [12]:
# Declare optimizer
my_opt = tf.train.GradientDescentOptimizer(0.01)
train_step = my_opt.minimize(loss)

# Initialize variables
init = tf.global_variables_initializer()
sess.run(init)

### Run the Classification!

We run the training loop for 1000 iterations and print the loss every 300 iterations. One thing to remember about `sess.run` is that its first argument, in this case `loss`, simply refers to the graph node called `loss` defined earlier, which you can think of as a function that is evaluated on the values supplied in `feed_dict`.

In [13]:
# Training loop
loss_vec = []
batch_accuracy = []
for i in range(1000):
    rand_index = np.random.choice(len(x_vals), size=batch_size)
    rand_x = x_vals[rand_index]
    rand_y = np.transpose([y_vals[rand_index]])
    sess.run(train_step, feed_dict={x_data: rand_x, y_target: rand_y})
    
    temp_loss = sess.run(loss, feed_dict={x_data: rand_x, y_target: rand_y})
    loss_vec.append(temp_loss)
    
    acc_temp = sess.run(accuracy, feed_dict={x_data: rand_x,
                                             y_target: rand_y,
                                             prediction_grid:rand_x})
    batch_accuracy.append(acc_temp)
    
    if (i+1)%300==0:
        print('Step #' + str(i+1))
        print('Loss = ' + str(temp_loss))

Step #300
Loss = -1.3565e+11
Step #600
Loss = -1.85351e+20
Step #900
Loss = -3.57153e+28
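
The batch sampling inside the loop can be tried in isolation. This sketch uses small random stand-in arrays (not the real `x_vals` and `y_vals`) to show the shapes that end up in the placeholders:

```python
import numpy as np

rng = np.random.default_rng(42)
x_vals = rng.normal(size=(50, 20))       # stand-in features: 50 rows, 20 columns
y_vals = rng.choice([-1, 1], size=50)    # stand-in labels
batch_size = 8

# Sampling is with replacement by default, so a row can appear twice in a batch.
rand_index = rng.choice(len(x_vals), size=batch_size)
rand_x = x_vals[rand_index]                    # shape (batch_size, 20), fed to x_data
rand_y = np.transpose([y_vals[rand_index]])    # shape (batch_size, 1), fed to y_target
```

Each batch must contain exactly `batch_size` rows, because `b` was declared with shape `[1, batch_size]` and holds one weight per row of the batch.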


In [14]:
# Last 10 accuracy scores
batch_accuracy[990:]

[0.77399999,
 0.62699997,
 0.458,
 0.31099999,
 0.616,
 0.73199999,
 0.611,
 0.458,
 0.111,
 0.41800001]

### Evaluate Test data

In [15]:
# Test data
x_test= testDf.drop(['t','UserID'], axis=1).values
y_test = testDf['t'].values.astype(int)
y_test=y_test.reshape(len(y_test),1)
y_test = np.array([1 if y==1 else -1 for y in y_test])
np.shape(x_test), np.shape( y_test)

((55304, 20), (55304,))

Now we can evaluate the predictions on our test points:

In [16]:
[test_predictions] = sess.run(prediction, feed_dict={x_data: rand_x,
                                                     y_target: rand_y,
                                                     prediction_grid: x_test})
#test_predictions = test_predictions.reshape(x_test.shape)

In [17]:
test_predictions.ravel()

array([ 1.,  1.,  1., ..., -1., -1., -1.], dtype=float32)

Compare the predictions with the true test labels:

In [18]:
from sklearn import metrics
print(metrics.classification_report(y_test,test_predictions))
print(metrics.confusion_matrix(y_test,test_predictions))

             precision    recall  f1-score   support

         -1       0.86      0.44      0.58     50236
          1       0.05      0.30      0.09      5068

avg / total       0.79      0.43      0.54     55304

[[22089 28147]
 [ 3566  1502]]
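
The figures in the classification report can be recomputed directly from the confusion matrix printed above. The sketch below assumes scikit-learn's convention of true labels in rows and predicted labels in columns, with classes ordered (-1, 1):

```python
import numpy as np

# Confusion matrix copied from the output above.
cm = np.array([[22089, 28147],
               [3566, 1502]])

recall = np.diag(cm) / cm.sum(axis=1)      # correct predictions / true count per class
precision = np.diag(cm) / cm.sum(axis=0)   # correct predictions / predicted count per class
accuracy = np.trace(cm) / cm.sum()         # all correct predictions / all test rows
```

The row sums reproduce the support column (50236 and 5068), and the overall accuracy comes out at about 0.43, which is well below the 0.91 that always predicting -1 would achieve on this test set.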
