## Concept

* An unsupervised algorithm that models normal examples in order to classify new examples as either normal or abnormal.
* A one-class classifier aims to capture the characteristics of the normal training instances so that it can distinguish them from outliers that may appear later.
* Before using a one-class SVM, try a standard SVM and a weighted SVM.
* Algorithm:
    * Unlike a standard SVM, which separates two classes from each other, this algorithm maximizes the margin between the normal data and the origin (in the kernel feature space).
    * It creates a boundary such that the distance between the origin and the normal class is maximized.
    * Train only on normal data.
    
* Hyperparameters
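    * $\nu \in (0, 1]$ controls how many training points may fall outside the boundary: it is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors. It plays a role similar to `contamination` in isolation forest.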

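The margin idea in the algorithm bullets can be written down explicitly. As a sketch, assuming these notes refer to the $\nu$-formulation of Schölkopf et al. (the one `sklearn.svm.OneClassSVM` is based on), with $\phi$ the kernel feature map, $n$ the number of training points, $\xi_i$ slack variables and $\rho$ the offset from the origin (none of these symbols are defined in the notes themselves):

$$
\min_{w,\,\xi,\,\rho}\ \frac{1}{2}\lVert w\rVert^2 + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i - \rho
\quad\text{s.t.}\quad w\cdot\phi(x_i) \ge \rho - \xi_i,\quad \xi_i \ge 0
$$

A new point $x$ is labelled normal when $w\cdot\phi(x) - \rho \ge 0$ and abnormal otherwise. The separating hyperplane lies at distance $\rho / \lVert w\rVert$ from the origin, which is the margin being maximized.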

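To make the role of $\nu$ concrete, here is a minimal sketch that fits the model on synthetic normal-only data for a few values of `nu` and reports the fraction of training points flagged as outliers and the fraction that become support vectors. The data, the `nu` grid, and the names `X_normal` and `results` are assumptions chosen for illustration; they are not part of the experiment in these notes.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Illustrative data (an assumption): 1000 points from a 2-D standard normal.
rng = np.random.default_rng(0)
X_normal = rng.normal(size=(1000, 2))

results = {}
for nu in (0.01, 0.05, 0.2):
    model = OneClassSVM(kernel="rbf", gamma="scale", nu=nu).fit(X_normal)
    # predict() returns +1 for inliers and -1 for outliers.
    outlier_frac = float(np.mean(model.predict(X_normal) == -1))
    # support_ holds the indices of the support vectors in the training set.
    sv_frac = len(model.support_) / len(X_normal)
    results[nu] = (outlier_frac, sv_frac)
    print(f"nu={nu:.2f}  flagged={outlier_frac:.3f}  support vectors={sv_frac:.3f}")
```

The flagged fraction should track `nu` roughly, and the support-vector fraction should never drop below it, so raising `nu` tightens the boundary around the normal data.
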
In [6]:
# one-class svm for imbalanced binary classification
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from sklearn.svm import OneClassSVM
# generate dataset
X, y = make_classification(n_samples=10000, n_features=2, n_redundant=0,
                           n_clusters_per_class=1, weights=[0.999], flip_y=0,
                           random_state=4)

# split into train/test sets
trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.5, random_state=2, stratify=y)

# define outlier detection model
model = OneClassSVM(gamma=0.001, nu=0.01)

# fit on majority class
trainX = trainX[trainy==0]
model.fit(trainX)

# detect outliers in the test set
yhat = model.predict(testX)

# mark inliers 1, outliers -1
testy[testy == 1] = -1
testy[testy == 0] = 1

# calculate score
score = f1_score(testy, yhat, pos_label=-1)
print('F1 Score: %.3f' % score)

F1 Score: 0.130
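
The hard labels returned by `predict` hide how far each point lies from the learned boundary. As a hedged sketch, `decision_function` exposes the signed score behind those labels (positive for inliers, negative for outliers), which can help when inspecting a low F1 score. The data below is synthetic and is not the data set used above; `X_train`, `X_inliers`, and `X_outliers` are hypothetical names.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Illustrative data (an assumption): normal points around the origin,
# plus a handful of anomalies placed far away at roughly (6, 6).
rng = np.random.default_rng(1)
X_train = rng.normal(size=(500, 2))                        # normal class only
X_inliers = rng.normal(size=(200, 2))                      # unseen normal points
X_outliers = rng.normal(loc=6.0, scale=0.5, size=(10, 2))  # anomalies
X_test = np.vstack([X_inliers, X_outliers])

model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)

scores = model.decision_function(X_test)  # signed score relative to the boundary
labels = model.predict(X_test)            # +1 inlier, -1 outlier

inlier_scores = scores[:200]
outlier_scores = scores[200:]
print(f"mean inlier score:  {inlier_scores.mean():.3f}")
print(f"mean outlier score: {outlier_scores.mean():.3f}")
print(f"anomalies flagged:  {(labels[200:] == -1).sum()} / {len(X_outliers)}")
```

Ranking test points by this score, rather than relying only on the sign, makes it possible to see whether the true outliers are at least scored lower than the inliers even when the default threshold gives poor precision.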
