# Introduction

The purpose of a one-class classifier is identical to the purpose of a supervised binary classifier. 
New data is classified to belong to one of two classes based on a classification model trained from labeled samples, for which the class membership is known. 
In contrast to the supervised classifier, the training data of the one-class classifier only contains labeled samples from the class of interest, i.e. the positive class. 
A binary classifier, by contrast, also requires training samples of the other class, i.e. the negative class.
Collecting a representative training set for the negative class can be very costly and time-consuming because the negative class is the aggregation of all classes other than the positive class.
Thus, a one-class classifier is particularly useful when only one or a few classes have to be mapped and when the acquisition of representative labeled data for the negative class is expensive or not possible at all.
The convenience of not requiring negative training data comes at a price. 
One-class classification is challenging due to the limited information contained in the training set. 
For some classification problems, unlabeled training data can be necessary in order to learn more accurate predictive models.
However, the process is still uncertain and the classification outcome has to be treated with caution.
The package **oneClass** is intended to serve two potential users, the analyst and the developer.
These are extreme characters, and in reality most users will be located somewhere in between.
The analyst is faced with a particular one-class classification problem, i.e. a set of positive training samples and the unlabeled data to be classified. 
It is assumed that no complete and representative test set is available for the purpose of validation and testing. 
In such a situation a careful evaluation of the classification outcome based on the available (positive and unlabeled) data is 
required in order to select the most promising final model and threshold [1]. 
The function `trainOcc()` is a wrapper for the `train()` function of **caret** (http://caret.r-forge.r-project.org/, [2]), which is called with one of the one-class classification methods implemented in **oneClass** (see Section 4).
`trainOcc()` returns an object of class *trainOcc* which inherits from class *train*.
Thus, the extensive infrastructure of **caret** is available, such as parallel processing and different methods for pre-processing, resampling, and model comparison.
Furthermore, the **oneClass** infrastructure comprises methods specific to one-class classification, such as performance metrics based on positive and unlabeled data (see Section 1.2) and diagnostic plots, which further support handling the one-class classification methods and, in particular, understanding their outcome in the absence of representative test data.
In Section 4 a one-class classification task is solved step by step in order to show which outcomes the analyst should screen in order to detect deficient settings, input data, or model outcomes and to improve the model if necessary.
Hopefully the package is helpful for solving one-class classification problems more effectively and conveniently.
The developer is interested in the development of new methods or the optimization of existing ones. The package **oneClass** builds upon the powerful package **caret** and tries to adopt its philosophy.
The package **caret** allows users to embed their own custom functions and performance metrics in its rich infrastructure.
Furthermore, convenient functions are available for testing the classifier outcome with positive/negative (PN) test sets (Section 5).

## One-class classifiers

The **oneClass** package is a user-oriented environment for analyzing one-class classification problems. 
It implements three commonly used classifiers, the one-class SVM (OCSVM) [3] and biased SVM (BSVM) [4, 5] via the package **kernlab** [6], and a one-class classifier based on calculating a density ratio with a maximum entropy approach (MAXENT) [7, 8] via the package **dismo** [9]. 
As mentioned before, these classifiers are implemented as custom functions for the `train()` function of the package **caret** [2].
The one-class SVM is a P-classifier, i.e. the classification model is trained with positive samples only. 
Nevertheless, unlabeled samples can be used to calculate PU-performance and support model selection. 
The biased SVM and MAXENT are PU-classifiers, i.e. they are trained on positive and unlabeled data.
P-classifiers are usually computationally less complex than PU-classifiers.
However, PU-classifiers often perform better in terms of classification accuracy because the information contained in the unlabeled training data allows building models which better fit the particular classification problem to be solved (Section 4.5 and Section 5).
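
The difference between a P-classifier and a PU-classifier shows up in the training data each one receives. The following is a minimal sketch that calls **kernlab** directly rather than going through **oneClass**. The simulated data, the parameter values (`sigma`, `nu`, `C`) and the class weights are assumptions chosen for illustration, and the weighted C-SVM is only one plausible way to set up a biased SVM, not necessarily how the package does it.

```r
library(kernlab)

set.seed(1)
# Simulated data (assumption): positives cluster around (2, 2);
# the unlabeled set is a mixture of positives and negatives.
x_pos <- matrix(rnorm(100, mean = 2), ncol = 2)
x_unl <- rbind(matrix(rnorm(60, mean = 2), ncol = 2),
               matrix(rnorm(240, mean = -1), ncol = 2))

# P-classifier: the one-class SVM sees the positive samples only.
ocsvm <- ksvm(x_pos, type = "one-svc", kernel = "rbfdot",
              kpar = list(sigma = 0.5), nu = 0.1)

# PU-classifier: positives and unlabeled samples are stacked, and the
# unlabeled samples are treated as noisy negatives with a lower weight.
x_pu <- rbind(x_pos, x_unl)
y_pu <- factor(c(rep("pos", nrow(x_pos)), rep("un", nrow(x_unl))),
               levels = c("un", "pos"))
bsvm <- ksvm(x_pu, y_pu, type = "C-svc", kernel = "rbfdot",
             kpar = list(sigma = 0.5), C = 1,
             class.weights = c(un = 1, pos = 5))

# Fraction of unlabeled samples each model predicts as positive.
ppp_ocsvm <- mean(predict(ocsvm, x_unl))
ppp_bsvm  <- mean(predict(bsvm, x_unl) == "pos")
```

Both fractions can be computed without any negative labels, which is what makes them usable for model selection in a PU setting.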

## PU-performance metrics

As with other pattern recognition and machine learning algorithms, it is crucial to parameterize the one-class classification methods carefully.
The parameterization or model selection is usually performed via a grid search. The grid points are combinations of discrete parameter values.
The performance of the model is evaluated for all grid points and the parameters which optimize the performance metric are chosen.
In the case of supervised classification the performance metric is typically the overall accuracy or the kappa coefficient.
Such metrics have to be derived from complete validation data comprising the positive and negative class.
They therefore cannot be estimated in a one-class classification situation.
Some performance metrics have been defined which can be derived from positive and unlabeled data (PU-performance metrics). From PU-data we can estimate two interesting probabilities: 
From the positive training samples we can estimate the probability of classifying a positive sample correctly, also known as the true positive rate (TPR). 
From the unlabeled samples we can estimate the probability of classifying a sample as positive, which we call the probability of positive prediction (PPP). 
Given two models with the same TPR but different PPP, it is valid to say that the model with the lower PPP is more accurate: the TPR is the same, so the lower PPP implies a lower false positive rate.
This conclusion is, however, only valid if the TPR can be estimated accurately.
Furthermore, it does not answer the question of which of a set of models is the best when both the TPR and PPP differ.
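
The argument above rests on the identity PPP = π · TPR + (1 − π) · FPR, where π is the (unknown) fraction of positives in the unlabeled data: for a fixed TPR, a lower PPP can only come from a lower false positive rate (FPR). The sketch below checks this numerically on simulated decision values. The data, the value of π and the threshold of zero are assumptions made for illustration, and the true labels of the unlabeled samples are used only to verify the identity.

```r
set.seed(42)
prior <- 0.2                       # assumed fraction of positives in the unlabeled set
n_unl <- 5000
is_pos <- runif(n_unl) < prior     # hidden labels, unknown in a real PU setting
# Simulated decision values: positives score higher on average.
score_unl <- ifelse(is_pos, rnorm(n_unl, mean = 1), rnorm(n_unl, mean = -1))
score_pos <- rnorm(500, mean = 1)  # held-out positive samples

threshold <- 0
tpr <- mean(score_pos > threshold)   # estimated from positive samples
ppp <- mean(score_unl > threshold)   # estimated from unlabeled samples

# Quantities that require the hidden labels (for verification only).
pi_hat  <- mean(is_pos)
tpr_unl <- mean(score_unl[is_pos] > threshold)
fpr_unl <- mean(score_unl[!is_pos] > threshold)

# PPP decomposes into the contributions of positives and negatives.
ppp_check <- pi_hat * tpr_unl + (1 - pi_hat) * fpr_unl
all.equal(ppp, ppp_check)
```
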
The PU-performance metrics puF, related to the F-score [10], and puAuc, related to the area under the receiver operating characteristic curve [8], try to give an answer.
Both have been shown in the cited references to be suitable for ranking models based on PU-data.
It is impossible to say which metric is better in a particular situation.
Note that puF is based on the TPR and PPP which are derived for a particular threshold, here zero. 
It is possible that the threshold at which TPR and PPP are estimated is not optimal, and thus puF can be low even though the model has high discriminative power.
In contrast, puAuc is calculated independently of a particular threshold. In other words, it measures the performance over the whole range of possible thresholds.
Thus it also considers thresholds which are definitely unsuitable, which might also lead to misleading results [11].
Based on these considerations and on experience, it is not recommended to trust these rankings blindly, particularly in challenging classification problems, e.g. with a small number of positive training samples or a possibly unsuitable set of unlabeled samples.
We can reasonably assume that the PU-performance metrics are positively correlated with PN-performance metrics, such as the overall accuracy or the kappa coefficient.
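
To make the two metrics concrete, here is a minimal sketch in base R. It assumes that puF takes the commonly used form TPR² / PPP and that puAuc is the ordinary AUC computed with the unlabeled samples standing in for the negatives. The helper names `pu_f()` and `pu_auc()` are hypothetical, and the package's own implementation may differ in detail.

```r
# Hypothetical helper: puF at a given threshold (assumed form TPR^2 / PPP).
pu_f <- function(score_pos, score_unl, threshold = 0) {
  tpr <- mean(score_pos > threshold)
  ppp <- mean(score_unl > threshold)
  if (ppp == 0) return(0)  # no positive predictions at all
  tpr^2 / ppp
}

# Hypothetical helper: AUC of positives versus unlabeled samples,
# computed from the rank-sum (Mann-Whitney) statistic.
pu_auc <- function(score_pos, score_unl) {
  n_pos <- length(score_pos)
  n_unl <- length(score_unl)
  r <- rank(c(score_pos, score_unl))
  (sum(r[seq_len(n_pos)]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_unl)
}

# Simulated decision values (assumption): 10 % positives in the unlabeled set.
set.seed(7)
score_pos <- rnorm(200, mean = 1)
score_unl <- c(rnorm(100, mean = 1), rnorm(900, mean = -1))

pu_f(score_pos, score_unl)                   # evaluated at the threshold zero
pu_f(score_pos, score_unl, threshold = 2.5)  # an overly strict threshold
pu_auc(score_pos, score_unl)                 # independent of any threshold
```

Comparing the first two calls shows how strongly puF can depend on the chosen threshold, while the puAuc value stays the same whichever threshold is later applied.
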
But the relationship can be noisy and in the worst case this could mean that the model with the highest PU metric has very poor discriminative power. 
The PU-performance metrics should rather be used as helpers for selecting a few candidate models, which are then examined more thoroughly.
Furthermore, because the PU-performance metrics do not provide information on the absolute accuracy, as the overall accuracy does, they do not reveal whether the model is poor even though it might be the best of all evaluated models, e.g. because none of the specified parameter settings is suitable.
Therefore, it can be useful to also investigate the true positive rate (TPR), and the probability of positive prediction (PPP). The quantities are implemented in the function puSummary() and calculated by default for all models evaluated during model selection.
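
The workflow described in this section (evaluate every grid point, rank by a PU-performance metric, then inspect TPR and PPP of the top candidates) can be sketched in base R as follows. The scorer `kernel_score()` is a toy stand-in invented for this example and is not one of the package's classifiers. The simulated data, the grid values, the single hold-out split and the puF form TPR² / PPP are likewise assumptions.

```r
set.seed(3)
# Simulated data (assumption): 100 positives, 250 unlabeled samples.
x_pos <- matrix(rnorm(200, mean = 2), ncol = 2)
x_unl <- rbind(matrix(rnorm(100, mean = 2), ncol = 2),
               matrix(rnorm(400, mean = -1), ncol = 2))

# Hold out part of the positives for estimating the TPR.
idx_val <- sample(nrow(x_pos), 30)
x_trn <- x_pos[-idx_val, , drop = FALSE]
x_val <- x_pos[idx_val, , drop = FALSE]

# Toy one-class scorer: mean Gaussian kernel similarity to the training
# positives, shifted by `offset` so that zero acts as the threshold.
kernel_score <- function(x_new, x_train, sigma, offset) {
  apply(x_new, 1, function(p) {
    d2 <- rowSums(sweep(x_train, 2, p)^2)
    mean(exp(-sigma * d2))
  }) - offset
}

# Grid points are combinations of discrete parameter values.
grid <- expand.grid(sigma = c(0.1, 1, 10), offset = c(0.05, 0.2, 0.5))
grid$tpr <- NA_real_
grid$ppp <- NA_real_
for (i in seq_len(nrow(grid))) {
  s_val <- kernel_score(x_val, x_trn, grid$sigma[i], grid$offset[i])
  s_unl <- kernel_score(x_unl, x_trn, grid$sigma[i], grid$offset[i])
  grid$tpr[i] <- mean(s_val > 0)
  grid$ppp[i] <- mean(s_unl > 0)
}
grid$puF <- ifelse(grid$ppp > 0, grid$tpr^2 / grid$ppp, 0)

# Rank by puF, but keep TPR and PPP visible when picking candidates.
head(grid[order(grid$puF, decreasing = TRUE), ], 3)
```

Listing the three best rows instead of only the top one reflects the advice above: a candidate with a high puF but a very low TPR deserves a closer look before it is accepted.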

# Installation and parallel processing

The package can be downloaded from GitHub (https://github.com/benmack/oneClass). 
It can be installed from within R using the package **devtools**:

```
require(devtools)
install_github('benmack/oneClass')
```

If a parallel backend is registered for the package **foreach** (http://topepo.github.io/caret/parallel.html), model selection and prediction of raster data can be performed in parallel.
For parallel prediction of raster data the package **spatial.tools** must also be available.
The following code registers a parallel backend for **foreach** via the package **doParallel**.

```
require(oneClass)
require(foreach)
require(parallel)
require(doParallel)
# start one worker per available core and register the cluster as the foreach backend
cl <- makeCluster(detectCores())
registerDoParallel(cl)
```

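
Once a backend is registered, it is worth confirming that **foreach** actually sees it and releasing the workers when the session is done. A minimal sketch, assuming **foreach** and **doParallel** are installed; the worker count of two is an arbitrary choice for illustration.

```r
library(foreach)
library(doParallel)

cl <- makeCluster(2)        # two workers keep the example lightweight
registerDoParallel(cl)

getDoParWorkers()           # number of workers foreach will use

# Iterations are distributed over the registered workers.
squares <- foreach(i = 1:4, .combine = c) %dopar% i^2

stopCluster(cl)             # release the worker processes
registerDoSEQ()             # fall back to sequential execution
```

Stopping the cluster explicitly avoids leaving idle R processes behind after model selection or prediction has finished.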
