# One Class SVM
A one-class support vector machine (SVM) is an SVM modified to draw a boundary around a single class and classify points beyond that boundary as outliers. The algorithm is based on Schölkopf B, Williamson RC, Smola AJ, Shawe-Taylor J, Platt JC. <a href="https://www.researchgate.net/publication/221619107_Support_Vector_Method_for_Novelty_Detection">Support Vector Method for Novelty Detection</a>. Advances in Neural Information Processing Systems 12 (NIPS Conference, Denver, Colorado, USA, November 29 - December 4, 1999). 1999. pp. 582–588.

Essentially, a one class SVM algorithm treats the origin in feature space as the second class. It then maximizes the distance between the origin and the support vectors, with the decision hyperplane bisecting that distance. As with regular SVM, slack variables can be used to create a soft boundary.
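
For reference, the soft-margin problem described above can be written as follows. This is my transcription of the standard formulation from the paper cited above; the symbols $n$ (number of training points), $\Phi$ (feature map), $\xi_i$ (slack variables) and $\rho$ (offset of the hyperplane from the origin) are notation introduced here, not names used elsewhere in this notebook.

$$
\min_{w,\,\xi,\,\rho}\ \frac{1}{2}\lVert w\rVert^2 + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i - \rho
\quad\text{subject to}\quad w\cdot\Phi(x_i)\ \ge\ \rho-\xi_i,\qquad \xi_i\ \ge\ 0
$$

The decision function is $f(x)=\operatorname{sgn}\left(w\cdot\Phi(x)-\rho\right)$, which is $+1$ inside the boundary and $-1$ for outliers. The parameter $\nu\in(0,1]$ is an upper bound on the fraction of training points treated as outliers and a lower bound on the fraction of support vectors.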

According to Microsoft Azure's <a href="https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-algorithm-cheat-sheet">helpful cheatsheet</a>, one-class SVM is a good fit for data with more than 100 features. While the dataset I'm using only has 30 features, UnifyID uses hundreds of features in their implicit authentication algorithm, so I thought it would be useful to practice this algorithm for the future.

I will be using the scikit-learn implementation.

See ```Data Exploration and Setup.ipynb``` for more information on the data set used and how it was split.

## Initial Training

In [29]:
import pandas as pd
import numpy as np
from sklearn import svm
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import PredefinedSplit
from sklearn.metrics import roc_auc_score

In [2]:
#load training data
X_train = pd.read_csv('X_train.csv')

In [3]:
X_train.shape #should be (142403, 30)

(142403, 30)

Since I can't determine unique credit cards, I'm going to drop the 'Time' feature. I'm also going to mean normalize the data (subtract each feature's mean and divide by its standard deviation).

In [4]:
X_train = X_train.drop('Time', axis=1)

In [5]:
X_train.shape

(142403, 29)

In [6]:
X_train = (X_train - X_train.mean()) / (X_train.std())

In [7]:
#testing the model just on one parameter set first
#linear worked for the centroid model, so might as well start there
model_test = svm.OneClassSVM(nu=0.01, kernel='linear')

In [8]:
model_test.fit(X_train)

OneClassSVM(cache_size=200, coef0=0.0, degree=3, gamma='auto',
      kernel='linear', max_iter=-1, nu=0.01, random_state=None,
      shrinking=True, tol=0.001, verbose=False)

In [14]:
#load validation data
X_val = pd.read_csv('X_val.csv')

In [15]:
X_val.shape #should be (56961, 30)

(56961, 30)

In [16]:
#drop 'Time' and mean normalize
X_val = X_val.drop('Time', axis=1)
X_val = (X_val - X_val.mean()) / (X_val.std())

In [17]:
X_val.shape # should be (56961, 29)

(56961, 29)
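
The cells above normalize the validation set with its own mean and standard deviation. A common alternative is to reuse the training statistics so every split is scaled identically. Below is a minimal sketch of that alternative using `StandardScaler` on synthetic data; the `*_demo` names and the random data are assumptions for illustration, and this is not what produced the results in this notebook. Note that `StandardScaler` divides by the population standard deviation (ddof=0), while pandas' `.std()` defaults to the sample standard deviation (ddof=1), so the scaled values differ very slightly.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# synthetic stand-ins for X_train / X_val (hypothetical data)
rng = np.random.default_rng(0)
cols = ['a', 'b', 'c']
X_train_demo = pd.DataFrame(rng.normal(5.0, 2.0, size=(200, 3)), columns=cols)
X_val_demo = pd.DataFrame(rng.normal(5.0, 2.0, size=(50, 3)), columns=cols)

# learn the mean and standard deviation from the training split only
scaler_demo = StandardScaler().fit(X_train_demo)

# apply the same training statistics to both splits
X_train_scaled = scaler_demo.transform(X_train_demo)
X_val_scaled = scaler_demo.transform(X_val_demo)
```

With this approach the validation features are expressed in the same units as the features the model was trained on, even if the validation split has a slightly different mean.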

In [18]:
y_predict = model_test.predict(X_val)

In [22]:
y_val = pd.read_csv('y_val.csv', header=None)

In [23]:
y_val.shape #should be (56961, 1)

(56961, 1)

In [24]:
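# y_val uses 0 for normal points; relabel them as -1 so the labels take the same two values as predict()
# note: OneClassSVM.predict() returns +1 for inliers and -1 for outliers, the opposite sign convention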
y_val[y_val==0] = -1

In [28]:
y_val[0].unique() #check that there's only 1's and -1's

array([-1,  1])

I'll use the ```roc_auc_score()``` function from scikit learn to calculate the area under the curve.

In [32]:
auc = roc_auc_score(y_val, y_predict)
auc

0.72863286158293139
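
`roc_auc_score()` on hard +1/-1 predictions only evaluates a single operating point. It also accepts continuous scores, which lets it sweep the whole ROC curve. Here is a minimal sketch on synthetic data, assuming labels where 1 marks an outlier; the data, the `*_demo` names and the hyperparameters are made up for illustration. Since `decision_function()` is positive for inliers and negative for outliers, its negation is used as the outlier score.

```python
import numpy as np
from sklearn import svm
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# train on "normal" points only (hypothetical data)
X_fit_demo = rng.normal(0.0, 1.0, size=(300, 2))
model_demo = svm.OneClassSVM(nu=0.05, kernel='rbf', gamma=0.1).fit(X_fit_demo)

# evaluation set: 100 normal points plus 20 far-away outliers
X_eval_demo = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
                         rng.normal(6.0, 1.0, size=(20, 2))])
y_true_demo = np.r_[np.zeros(100), np.ones(20)]  # 1 = outlier (assumed convention)

# higher score = more anomalous
scores_demo = -model_demo.decision_function(X_eval_demo)
auc_demo = roc_auc_score(y_true_demo, scores_demo)
```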

Considering this is likely not the optimal model for this dataset, this result isn't bad. Now to test out a few more hyperparameters. Unfortunately, ```GridSearchCV()``` with scikit learn requires y_true to have more than one class, so the grid search will have to be done manually.
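
One way to organize a manual search like this is scikit-learn's `ParameterGrid`, which enumerates parameter combinations without running any cross-validation. The sketch below is only an illustration on synthetic data: the grid values, the `*_demo` names and the scoring via `decision_function()` are assumptions, not the search actually run in this notebook.

```python
import numpy as np
from sklearn import svm
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import ParameterGrid

rng = np.random.default_rng(0)
X_fit_demo = rng.normal(0.0, 1.0, size=(300, 2))            # normal points only
X_eval_demo = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
                         rng.normal(6.0, 1.0, size=(20, 2))])
y_true_demo = np.r_[np.zeros(100), np.ones(20)]             # 1 = outlier (assumed)

# a list of dicts lets gamma be swept for rbf only
grid_demo = ParameterGrid([
    {'kernel': ['rbf'], 'nu': [0.1, 0.01], 'gamma': [0.1, 0.01]},
    {'kernel': ['linear'], 'nu': [0.1, 0.01]},
])

results_demo = []
for params in grid_demo:
    fitted = svm.OneClassSVM(**params).fit(X_fit_demo)
    # negate so that higher scores mean "more anomalous"
    score = roc_auc_score(y_true_demo, -fitted.decision_function(X_eval_demo))
    results_demo.append({**params, 'AUC': score})
```

Each entry of `results_demo` is a dict of the parameters plus the AUC, so the list can be passed straight to `pd.DataFrame()` for inspection.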

In [54]:
# I'm going to try the radial basis function kernel and a linear kernel (no transformation)
kernels = ['rbf', 'linear']
nus = [0.1, 0.01, 0.001]
gammas = [0.1, 0.01, 0.001]

In [56]:
# collect one dict per parameter set, then build the DataFrame once at the end
rows = []

for kernel in kernels:
    for nu in nus:
        if kernel == 'rbf':
            # gamma is only swept for the rbf kernel
            for gamma in gammas:
                model = svm.OneClassSVM(nu=nu, kernel=kernel, gamma=gamma)
                model.fit(X_train)
                y_predict = model.predict(X_val)
                rows.append({'kernel': kernel, 'nu': nu, 'gamma': gamma,
                             'AUC': roc_auc_score(y_val, y_predict)})
        else:
            # the linear kernel does not use gamma
            model = svm.OneClassSVM(nu=nu, kernel=kernel)
            model.fit(X_train)
            y_predict = model.predict(X_val)
            rows.append({'kernel': kernel, 'nu': nu, 'gamma': 'NA',
                         'AUC': roc_auc_score(y_val, y_predict)})

auc = pd.DataFrame(rows, columns=['AUC', 'gamma', 'kernel', 'nu'])

In [57]:
auc

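| | AUC | gamma | kernel | nu |
|---|---|---|---|---|
| 0 | 0.085154 | 0.1 | rbf | 0.1 |
| 1 | 0.088529 | 0.01 | rbf | 0.1 |
| 2 | 0.088741 | 0.001 | rbf | 0.1 |
| 3 | 0.083976 | 0.1 | rbf | 0.01 |
| 4 | 0.098986 | 0.01 | rbf | 0.01 |
| 5 | 0.17805 | 0.001 | rbf | 0.01 |
| 6 | 0.083994 | 0.1 | rbf | 0.001 |
| 7 | 0.167384 | 0.01 | rbf | 0.001 |
| 8 | 0.400959 | 0.001 | rbf | 0.001 |
| 9 | 0.808935 | NA | linear | 0.1 |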


In [58]:
#training took a while, so writing this to csv for later
auc.to_csv('svm_grid_search.csv')

So the choice of parameters is linear kernel with nu=0.1. The AUC is still a bit low, but that is to be expected.

In [59]:
model_final = svm.OneClassSVM(nu=0.1, kernel='linear')

In [60]:
model_final.fit(X_train)

OneClassSVM(cache_size=200, coef0=0.0, degree=3, gamma='auto',
      kernel='linear', max_iter=-1, nu=0.1, random_state=None,
      shrinking=True, tol=0.001, verbose=False)

In [65]:
#load test data
X_test = pd.read_csv('X_test.csv')

In [66]:
X_test.shape #should be (56961, 30)

(56961, 30)

In [67]:
X_test = X_test.drop('Time', axis=1)

In [68]:
X_test.shape

(56961, 29)

In [69]:
X_test = (X_test - X_test.mean()) / (X_test.std())

In [71]:
y_predict = model_final.predict(X_test)

In [63]:
y_test = pd.read_csv('y_test.csv', header=None)

In [64]:
y_test.shape #should be (56961, 1)

(56961, 1)

In [72]:
auc = roc_auc_score(y_test, y_predict)
auc

0.80465502188694538

Alright, the model generalizes well.

## Updating the Model
Since the hyperplane is calculated using only the support vectors, all of the initial points that are not support vectors can be discarded. For updates, the previous support vectors plus the new data will be fed back into ```OneClassSVM()```. $\nu$ will be set to n_support_vectors / (n_support_vectors + n_new_data). Because $\nu$ puts a lower bound on the fraction of support vectors, this means the algorithm will use at least n_support_vectors points as the new support vectors. To avoid the possibility of an infinitely growing number of support vectors, n_support_vectors can stay constant at the number of support vectors from the initial training.

This is an idea and algorithm I developed based on my knowledge of SVMs. To my knowledge, this update algorithm has not been published by anyone else.
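
To make the role of $\nu$ concrete, here is a small sketch under stated assumptions. The first part fits a model on synthetic data and checks that the fraction of support vectors is at least $\nu$. The second part just evaluates the $\nu$ formula with the counts that appear in this notebook (14762 support vectors and 28482 update points), which gives roughly 0.34. The synthetic data and the `*_demo` names are made up for illustration.

```python
import numpy as np
from sklearn import svm

# part 1: nu is a lower bound on the fraction of support vectors (synthetic data)
rng = np.random.default_rng(0)
X_demo = rng.normal(0.0, 1.0, size=(500, 5))
nu_demo = 0.2
sv_model_demo = svm.OneClassSVM(nu=nu_demo, kernel='linear').fit(X_demo)
frac_sv_demo = sv_model_demo.support_vectors_.shape[0] / X_demo.shape[0]

# part 2: the nu used for an update, with the counts from this notebook
n_sv_demo = 14762    # support vectors kept from the initial fit
n_new_demo = 28482   # rows in the update set
nu_update_demo = n_sv_demo / (n_sv_demo + n_new_demo)
```

Because the update set here is about twice the size of the retained support vectors, `nu_update_demo` comes out at about a third, so at least that share of the combined data becomes support vectors in the updated model.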

In [74]:
support_vectors = model_final.support_vectors_

In [75]:
support_vectors.shape # should have 29 columns

(14762, 29)

In [85]:
n_support_vectors = support_vectors.shape[0]

In [76]:
# load update data
X_update = pd.read_csv('X_update.csv')

In [77]:
X_update.shape #should be (28482, 30)

(28482, 30)

In [78]:
X_update = X_update.drop('Time', axis=1)

In [79]:
X_update.shape

(28482, 29)

In [80]:
X_update = (X_update - X_update.mean()) / (X_update.std())

In [81]:
#see how many are outliers
y_predict = model_final.predict(X_update)

In [84]:
# note: OneClassSVM.predict() returns +1 for inliers and -1 for outliers,
# so after mapping -1 to 0 the sum below counts the +1 predictions
y_predict[y_predict==-1] = 0
frac_outliers = y_predict.sum() / y_predict.shape[0]
frac_outliers

0.31110877045151325

In [90]:
if frac_outliers < 0.5:
    n_new_data = X_update.shape[0]
    
    #combine the support vectors and the new data points
    X_new = np.vstack((support_vectors, X_update.values))
    
    #recalculate nu
    nu = n_support_vectors / (n_support_vectors + n_new_data)
    
    #fit the new data
    model_update = svm.OneClassSVM(nu=nu, kernel='linear')
    model_update.fit(X_new)

In [89]:
X_new.shape #should be (43244, 29)

(43244, 29)

In [91]:
model_update.support_vectors_.shape #should be ~14762

(14989, 29)

## Making Functions
I'm assuming that only the needed features are input into the functions. I am not assuming that the data is normalized. For now, I'm only making sure these functions work with DataFrames, since that is what I used to develop them.

In [9]:
def model_B_initialize(X_train):
    """
    This function implements the initial training for the OneClassSVM() algorithm from scikit learn. Hyperparameters
    were optimized previously to nu = 0.1 and kernel = 'linear'.
    
    Inputs:
        X_train: A DataFrame of features. All features are input into the model.
        
    Output:
        model: The OneClassSVM() model object with stored fit results.
        n_support_vectors: The number of support vectors used by the algorithm.
    """
    #imports
    import pandas as pd
    from sklearn import svm
    
    #make sure X_train is mean normalized
    X_train = (X_train - X_train.mean()) / (X_train.std())

    # create model
    model = svm.OneClassSVM(nu=0.1, kernel='linear') #hyperparameters found through validation

    # fit the model
    model.fit(X_train)
    
    #extract the number of support vectors
    support_vectors = model.support_vectors_
    n_support_vectors = support_vectors.shape[0]

    return model, n_support_vectors

In [2]:
def model_B_update(X_update, model, n_support_vectors):
    """
    This function updates the OneClassSVM() algorithm with new data. It uses the support vectors from the previous
    model and the new data to train a new model. This data set is much smaller than the original, so training is 
    faster. nu is also set to be n_support_vectors / (n_support_vectors + n_new_data) so that the number of support
    vectors stays roughly constant. n_support_vectors is held constant across updates, but the support vectors are
    updated every time.
    
    Inputs:
        X_update: a DataFrame of features. All features will be used in the algorithm.
        model: The OneClassSVM() model object from the previous training. The model must include fit results.
        n_support_vectors: The number of support vectors generated in the initial training
        
    Outputs:
        model_update: The updated OneClassSVM() model object with stored fit results.
        y_predict: The predictions for each data point, with scikit learn's -1 relabeled to 0
            (OneClassSVM.predict() returns +1 for inliers and -1 for outliers).
    """
    #imports
    import numpy as np
    import pandas as pd
    from sklearn import svm
    
    #extract the support vectors
    support_vectors = model.support_vectors_

    X_update = (X_update - X_update.mean()) / (X_update.std())

    #see how many are outliers
    y_predict = model.predict(X_update)

    # note: OneClassSVM.predict() returns +1 for inliers and -1 for outliers,
    # so after mapping -1 to 0 the sum below counts the +1 predictions
    y_predict[y_predict==-1] = 0
    frac_outliers = y_predict.sum() / y_predict.shape[0]

    # keep the previous model if the update is skipped, so the return value is always defined
    model_update = model

    if frac_outliers < 0.5:
        n_new_data = X_update.shape[0]

        #combine the support vectors and the new data points
        X_new = np.vstack((support_vectors, X_update.values))

        #recalculate nu
        nu = n_support_vectors / (n_support_vectors + n_new_data)

        #fit the new data
        model_update = svm.OneClassSVM(nu=nu, kernel='linear')
        model_update.fit(X_new)
        
    return model_update, y_predict

In [3]:
import pandas as pd
import numpy as np

In [4]:
#load training data
X_train = pd.read_csv('X_train.csv')

In [5]:
X_train = X_train.drop('Time', axis=1)

In [6]:
# load update data
X_update = pd.read_csv('X_update.csv')

In [7]:
X_update = X_update.drop('Time', axis=1)

In [10]:
model, n_support_vectors = model_B_initialize(X_train)

In [11]:
model.support_vectors_.shape #should be (14762, 29)

(14762, 29)

In [12]:
n_support_vectors #should be 14762

14762

In [14]:
model_update, y_predict = model_B_update(X_update, model, n_support_vectors)

In [15]:
model_update.support_vectors_.shape #should be (14989, 29)

(14989, 29)

In [16]:
y_predict.sum()

8861.0