Generalized Matrix Learning Vector Quantization


A Java implementation of generalized matrix learning vector quantization (GMLVQ), a prototype-based, supervised learning technique.

The project is a plug-in for the WEKA machine learning framework.

Quick start

Using the WEKA package manager:

  • download the latest WEKA version
  • download the GMLVQ plugin zip
  • install and run the WEKA GUI
  • choose Tools - Package manager
  • in the new window, click the File/URL button and locate the GMLVQ package zip downloaded before
  • restart WEKA

To run an analysis with GMLVQ, open the Explorer, load your data, and select the Classify tab; you can then choose GMLVQ from the functions folder.
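The same analysis can also be scripted against the WEKA Java API. The following is a minimal sketch; the fully qualified class name weka.classifiers.functions.GMLVQ is an assumption derived from the classifier's location in the Explorer's functions folder and may differ in the actual package.

```java
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

import java.util.Random;

public class GmlvqQuickStart {
    public static void main(String[] args) throws Exception {
        // load a dataset in ARFF format and declare the last attribute as class
        Instances data = DataSource.read("iris.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // instantiate the classifier reflectively; the class name
        // weka.classifiers.functions.GMLVQ is an assumption, not confirmed
        Classifier gmlvq = (Classifier) Class
                .forName("weka.classifiers.functions.GMLVQ")
                .getDeclaredConstructor()
                .newInstance();

        // run a 10-fold cross-validation and print the summary
        Evaluation evaluation = new Evaluation(data);
        evaluation.crossValidateModel(gmlvq, data, 10, new Random(1));
        System.out.println(evaluation.toSummaryString());
    }
}
```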

Implementation Details

Generalized Matrix Learning Vector Quantization

Conventional LVQ is enhanced by a linear mapping rule described by an OmegaMatrix, putting the M in GMLVQ. This matrix has dimensions dataDimension x omegaDimension, where the omega dimension can be set to any value from 2 to dataDimension. Depending on the chosen omega dimension, each data point and prototype is mapped (i.e. linearly transformed) into an embedded data space. Within this space, distances between data points and prototypes are computed, and this information is used to compose the update for each learning epoch. Setting the omega dimension to values significantly smaller than the data dimension drastically speeds up the learning process.

As mapping data points into the embedded space is still computationally expensive, these mappings are cached. Invoking DataPoint#getEmbeddedSpaceVector(OmegaMatrix) retrieves the EmbeddedSpaceVector of a data point according to the specified mapping rule (provided by the OmegaMatrix). Results are linked directly to the data points, so they are only calculated when absolutely necessary and previous results can be recalled at any point. Subsequently, calling EmbeddedSpaceVector#getWinningInformation(List) gives access to the WinningInformation linked to each embedded space vector: the distance to the closest prototype of the same class as the considered data point, and the distance to the closest prototype of a different class. This information is crucial for composing the update of each epoch as well as for the computation of CostFunctions.
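The caching pattern described above can be sketched as follows. Class and method names (DataPoint, OmegaMatrix, EmbeddedSpaceVector, getEmbeddedSpaceVector) follow the README; the field layout and matrix math are simplified assumptions, not the plugin's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

class OmegaMatrix {
    final double[][] values; // dataDimension x omegaDimension

    OmegaMatrix(double[][] values) { this.values = values; }

    // maps a data-space vector into the embedded space: x' = x * omega
    double[] map(double[] x) {
        int omegaDimension = values[0].length;
        double[] embedded = new double[omegaDimension];
        for (int j = 0; j < omegaDimension; j++)
            for (int i = 0; i < x.length; i++)
                embedded[j] += x[i] * values[i][j];
        return embedded;
    }
}

class EmbeddedSpaceVector {
    final double[] coordinates;

    EmbeddedSpaceVector(double[] coordinates) { this.coordinates = coordinates; }
}

class DataPoint {
    final double[] features;
    // cache: one embedded representation per mapping rule
    private final Map<OmegaMatrix, EmbeddedSpaceVector> cache = new HashMap<>();

    DataPoint(double[] features) { this.features = features; }

    // computes the embedding only on first request, then recalls the result
    EmbeddedSpaceVector getEmbeddedSpaceVector(OmegaMatrix omega) {
        return cache.computeIfAbsent(omega,
                o -> new EmbeddedSpaceVector(o.map(features)));
    }
}
```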

Generalized Matrix Learning Vector Quantization

GMLVQ is also capable of generalization: various CostFunction rules can be used to guide the learning process. Most notably, the success of each epoch can be evaluated via the F-measure or precision/recall values, which is especially important for problems with unbalanced class distributions or for use cases where certain misclassifications (e.g. false negatives) would be critical.
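A pluggable cost function could look like the sketch below. The interface name follows the README's CostFunction; the signature based on confusion counts is an assumption for illustration.

```java
interface CostFunction {
    // returns a score for the epoch given confusion-matrix counts
    double evaluate(int truePositives, int falsePositives, int falseNegatives);
}

// F-measure: harmonic mean of precision and recall, a robust choice
// for unbalanced class distributions
class FMeasure implements CostFunction {
    @Override
    public double evaluate(int tp, int fp, int fn) {
        if (tp == 0) {
            return 0.0; // no true positives: precision and recall are both zero
        }
        double precision = tp / (double) (tp + fp);
        double recall = tp / (double) (tp + fn);
        return 2.0 * precision * recall / (precision + recall);
    }
}
```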

Visualization

Another key feature is the possibility of tracking which individual features of the input data contribute the most to the training process. This is realized by a lambda matrix (defined as lambda = omega * omega'). This matrix can be visualized: its main diagonal contains the influence of each feature on the classification, while the off-diagonal elements describe the correlation between the corresponding features.
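The computation follows directly from the definition lambda = omega * omega'. Below is a minimal sketch, assuming omega is stored as a dataDimension x omegaDimension array; the class and method names are illustrative, not the plugin API.

```java
class LambdaMatrix {
    // computes lambda = omega * omega' (an n x n matrix for n features)
    static double[][] compute(double[][] omega) {
        int n = omega.length;    // dataDimension
        int m = omega[0].length; // omegaDimension
        double[][] lambda = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < m; k++)
                    lambda[i][j] += omega[i][k] * omega[j][k];
        return lambda;
    }

    // the diagonal entry lambda[i][i] quantifies the influence of feature i
    static double[] featureRelevances(double[][] lambda) {
        double[] relevances = new double[lambda.length];
        for (int i = 0; i < lambda.length; i++)
            relevances[i] = lambda[i][i];
        return relevances;
    }
}
```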

Literature & References

When using the GMLVQ plugin, please cite:

  • Bittrich, S., Kaden, M., Leberecht, C., Kaiser, F., Villmann, T., & Labudde, D. (2019). Application of an interpretable classification model on Early Folding Residues during protein folding. BioData Mining, 12(1), 1.

Additional references on GMLVQ:

  • Kaden, M., Lange, M., Nebel, D., Riedel, M., Geweniger, T., & Villmann, T. (2014). Aspects in classification learning - Review of recent developments in Learning Vector Quantization. Foundations of Computing and Decision Sciences, 39(2), 79-105.
  • Bunte, K., Schneider, P., Hammer, B., Schleif, F. M., Villmann, T., & Biehl, M. (2012). Limited rank matrix learning, discriminative dimension reduction and visualization. Neural Networks, 26, 159-173.
  • Kästner, M., Hammer, B., Biehl, M., & Villmann, T. (2012). Functional relevance learning in generalized learning vector quantization. Neurocomputing, 90, 85-95.
  • Hammer, B., & Villmann, T. (2002). Generalized relevance learning vector quantization. Neural Networks, 15(8-9), 1059-1068.