GenSVM

This is the repository for the C implementation of GenSVM, a generalized multiclass support vector machine proposed in:

GenSVM: A Generalized Multiclass Support Vector Machine
G.J.J. van den Burg and P.J.F. Groenen
Journal of Machine Learning Research, 2016.

GenSVM is available in these languages:

Language  URL
Python    https://github.com/GjjvdBurg/PyGenSVM
R         https://github.com/GjjvdBurg/RGenSVM
C         https://github.com/GjjvdBurg/GenSVM

Introduction

GenSVM is a general multiclass support vector machine, which you can use for classification problems with multiple classes. Training GenSVM in cross-validation or grid search setups can be done efficiently due to the ability to use warm starts. See the paper for more information, and Usage below for how to use GenSVM.

The library has support for datasets in MSVMpack and LibSVM/SVMlight format, and can take advantage of sparse datasets. There is also preliminary support for nonlinear GenSVM through kernels.

For documentation on how the library is implemented, see the Doxygen documentation available here. There are also many unit tests, which you can use to further understand how the library works. For the latest version of the library you can view the test coverage report online.

This is the C library for GenSVM, which contains two executables for using the method. Python and R packages for GenSVM are also available; see the table above for their repositories.

Usage

First, download and compile the library. The minimal requirement for compilation is a working BLAS and LAPACK installation, which you can likely obtain from your package manager. It is, however, recommended to use the ATLAS versions of these libraries, since this gives a significant increase in speed. If you choose not to use ATLAS, remove -latlas from the LDFLAGS variable in the Makefile.

Then compile the library with:

$ make

If you would like to run the tests, use make test on the command line.

After successful compilation, you will have two executables: gensvm and gensvm_grid. Type:

$ ./gensvm

to get an overview of the command line options of the executable (and similarly for gensvm_grid).

The gensvm executable can be used to train a GenSVM model on a dataset with a single hyperparameter configuration, whereas the gensvm_grid executable can be used to run a grid search on a dataset.

Here's an example of using the gensvm executable on a single dataset, with some custom parameters:

$ ./gensvm -l 1e-5 -k 1.0 -p 1.5 data/iris.train

This fits the model with regularization parameter 1e-5, Huber hinge parameter 1.0 and lp norm parameter 1.5, and default settings otherwise. On my computer this yields a model with 18 support vectors in about 0.1 seconds. The gensvm executable can also be used to get predictions for a test dataset, if it is supplied as the final argument to the command. In this case, predictions are printed to stdout, unless an output file is specified with the -o option.

The gensvm_grid executable can be used to run a grid search on a dataset. The input to this executable is a file (called a grid file) that specifies the values of the parameters. See the training directory for examples and the documentation here for more information on the file format. Note that when the repeats field has a positive value, a so-called "consistency check" is performed after the grid search has finished. This is a robustness check on the best performing configurations, which aims to find the hyperparameter configuration that combines the best performance with the smallest training time. Warm starts are not used in this robustness check, so that the repeated runs give independent measurements of training time.

Here's an example of running gensvm_grid without repeats on the iris dataset:

$ ./gensvm_grid training/iris_norepeats.training

On my computer this runs in about 8 seconds with 342 hyperparameter configurations. Alternatively, if consistency checks are desired we can run:

$ ./gensvm_grid training/iris.training

which runs the same grid search but also performs 5 consistency repeats for each of the configurations in the top 5% of performance. Note that performance is measured by cross-validated accuracy. This example runs in about 13 seconds on my computer.

Reference

If you use GenSVM in any of your projects, please cite the GenSVM paper available at http://jmlr.org/papers/v17/14-526.html. You can use the following BibTeX code:

@article{JMLR:v17:14-526,
        author  = {Gerrit J.J. van den Burg and Patrick J.F. Groenen},
        title   = {{GenSVM}: A Generalized Multiclass Support Vector Machine},
        journal = {Journal of Machine Learning Research},
        year    = {2016},
        volume  = {17},
        number  = {225},
        pages   = {1-42},
        url     = {http://jmlr.org/papers/v17/14-526.html}
}

License

Copyright 2016, G.J.J. van den Burg.

GenSVM is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

GenSVM is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with GenSVM. If not, see <http://www.gnu.org/licenses/>.

For more information please contact:

G.J.J. van den Burg
email: gertjanvandenburg@gmail.com