Skip to content

david-dunson/SPAclassifier

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Classification via local manifold approximation

Authors: Didong Li and David B. Dunson (2019) https://arxiv.org/abs/1903.00985

Classifiers label data as belonging to one of a set of groups based on input features. It is challenging to obtain accurate classification performance when the feature distributions in the different classes are complex, with nonlinear, overlapping and intersecting supports. This is particularly true when training data are limited. To address this problem, this article proposes a new type of classifier based on obtaining a local approximation to the support of the data within each class in a neighborhood of the feature to be classified, and assigning the feature to the class having the closest support. This general algorithm is referred to as LOcal Manifold Approximation (LOMA) classification. As a simple and theoretically supported special case having excellent performance in a broad variety of examples, we use spheres for local approximation, obtaining a SPherical Approximation (SPA) classifier. We illustrate substantial gains for SPA over competitors on a variety of challenging simulated and real data examples.

SPA classifier: Code Documentation

As a realization of LOcal Manifold Approximation (LOMA) classifier, SPherical Approximation (SPA) classifier is designed for classification when the feature in different classes have complex, nonlinear, overlapping and intersecting support, especially when the training data are limited.

Here is a list of files:

SPCA.m: the function for fitting data by a sphere. cls_spherelets.m: the main function to classify data with given tuning parameters k and d SPA_tune.m: the function to tune parameters k and d given training samples FunkyCurves_noise.mat: An example data set consisting of 1500 points sampled from three entangled funky curves with Gaussian noise. FunkyCurves_demo.m: Apply SPA to this Funky curves data set, plot the classification accuracy when the training sample size ranges from 150 to 300.

To use the code: Input: X_tr, y_tr, X_te, y_te Step 1: run SPA_tune(X_tr,y_tr) to get the optimal k and d Step 2: run cls_spherelets(X_tr,y_tr,X_te,y_te,k,d) for classification

In SPA_tune, there are two default set of choices for k and d. User may change the default sets, depending on the specific dataset. For example, when the intrinsic dimension of the manifold is know, like the funky curves case (d=1), d does not need to be tuned. Similarly, if there is some information about d, the set can be changed accordingly. If the dimension of the ambient space is too large, user can replace the “PCA” step in SPCA function by sparse PCA or other efficient alternatives of PCA. If the total sample size is too large, user can randomly choose a smaller and computable subset and tune d only (not k) on that subset, since d does not depend on the sample size. Then return to the entire dataset, the only tuning parameter is k, making the tuning process much more efficient.

Some restrictions for tuning parameters: d must be integers between 1 and p-1; k must be integers between 3 and n_min/2, where n_min is the size of the smallest class; k is no smaller than d+1. Depending on the computation time, user can try more/fewer pairs of k and d.

About

Spherical Approximation Classifier

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages