megaman is a scalable manifold learning package implemented in Python.
It has a front-end API designed to be familiar to scikit-learn users, but it harnesses
the C++ Fast Library for Approximate Nearest Neighbors (FLANN)
and the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) method,
an eigensolver for sparse symmetric positive definite (SSPD) problems,
to scale manifold learning algorithms to large data sets.
On a personal computer, megaman can embed 1 million data points
with hundreds of dimensions in 10 minutes.
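To show the sequence of steps that FLANN and LOBPCG accelerate, here is a minimal sketch using plain SciPy rather than megaman: a radius-neighbor search (an exact cKDTree standing in for FLANN's approximate search), Gaussian affinities, a normalized graph Laplacian, and scipy.sparse.linalg.lobpcg for its smallest eigenvectors. The function spectral_embedding_sketch and its parameters are hypothetical and are not megaman's API.

```python
import numpy as np
from scipy.sparse import coo_matrix, csgraph
from scipy.sparse.linalg import lobpcg
from scipy.spatial import cKDTree


def spectral_embedding_sketch(X, radius, n_components=2, seed=0):
    """Laplacian-eigenmap embedding of the rows of X (illustrative, not megaman's API)."""
    n = X.shape[0]
    # 1. Radius-neighbor search. cKDTree is exact; megaman uses FLANN (approximate).
    tree = cKDTree(X)
    pairs = tree.sparse_distance_matrix(tree, max_distance=radius, output_type="ndarray")
    pairs = pairs[pairs["i"] != pairs["j"]]  # drop self-pairs
    # 2. Gaussian affinities on the sparse neighbor graph.
    weights = np.exp(-(pairs["v"] / radius) ** 2)
    W = coo_matrix((weights, (pairs["i"], pairs["j"])), shape=(n, n)).tocsr()
    # 3. Symmetric normalized graph Laplacian.
    L = csgraph.laplacian(W, normed=True).tocsr()
    # 4. LOBPCG for the smallest eigenpairs; the first (trivial) one is dropped.
    rng = np.random.default_rng(seed)
    guess = rng.standard_normal((n, n_components + 1))
    vals, vecs = lobpcg(L, guess, largest=False, tol=1e-6, maxiter=500)
    order = np.argsort(vals)
    return vecs[:, order[1:]]
```

The result is an (n, n_components) array of embedding coordinates; only the sparse Laplacian and a thin block of vectors are ever held in memory, which is what makes this kind of pipeline scale.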
megaman is designed for researchers and therefore caches intermediate
steps and indices so that results can be recomputed quickly with new parameters.
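Caching helps because the neighbor search is often one of the most expensive steps. A minimal sketch of the idea, assuming a radius-based neighbor graph: run the search once at a maximum radius, store the pairwise distances, and derive the affinity matrix for any smaller radius by filtering the cache. CachedRadiusGraph is a hypothetical helper built on SciPy, not megaman's caching mechanism.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.spatial import cKDTree


class CachedRadiusGraph:
    """Hypothetical helper: one neighbor search, many affinity matrices."""

    def __init__(self, X, max_radius):
        self.n = X.shape[0]
        self.max_radius = max_radius
        # Expensive step, run once: every pair of points within max_radius.
        tree = cKDTree(X)
        pairs = tree.sparse_distance_matrix(tree, max_distance=max_radius, output_type="ndarray")
        pairs = pairs[pairs["i"] != pairs["j"]]  # drop self-pairs
        self._rows, self._cols, self._dists = pairs["i"], pairs["j"], pairs["v"]

    def affinity(self, radius):
        """Gaussian affinity matrix for any radius <= max_radius, with no new search."""
        if radius > self.max_radius:
            raise ValueError("radius exceeds the cached max_radius; rebuild the cache")
        keep = self._dists <= radius
        weights = np.exp(-(self._dists[keep] / radius) ** 2)
        return coo_matrix(
            (weights, (self._rows[keep], self._cols[keep])), shape=(self.n, self.n)
        ).tocsr()
```

Each call to affinity() reuses the stored distances, so trying several radii costs a single neighbor search; a radius above max_radius would need a new search, which the sketch signals with an error.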
Package documentation can be found at http://mmp2.github.io/megaman/
You can also find our arXiv paper at http://arxiv.org/abs/1603.02763
The easiest way to install megaman and its dependencies is with conda, the cross-platform package manager for the scientific Python ecosystem.
To install megaman and its dependencies, run
$ conda install --channel=jakevdp megaman
Builds are currently available for OSX-64 and Linux-64 on Python 2.7, 3.4, and 3.5. For other operating systems, see the full installation instructions below.
Installing megaman from source requires the following:
- Python (tested with versions 2.7, 3.4, and 3.5)
- numpy version 1.8 or higher
- scipy version 0.16.0 or higher
- scikit-learn
- FLANN
- cython
- a C++ compiler such as gcc/g++
Optional requirements include:
- pyamg, which allows for faster decompositions of large matrices
- pyflann, which offers another method of computing distance matrices (this is bundled with the FLANN source code)
- nose, for running the unit tests
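After installing, a short script can confirm that the minimum versions listed above are met and report which optional packages are importable. This is a hypothetical helper, not part of megaman; it checks Python modules only (note the import names sklearn and Cython), not the FLANN C++ library or the compiler.

```python
import importlib
import re

# Minimum versions from the requirement list; None means "any version".
# Import names differ from package names: scikit-learn -> sklearn, cython -> Cython.
REQUIRED = {"numpy": "1.8", "scipy": "0.16.0", "sklearn": None, "Cython": None}
OPTIONAL = ["pyamg", "pyflann", "nose"]


def version_tuple(version):
    """Leading numeric part of a version string: '0.16.0rc1' -> (0, 16, 0)."""
    match = re.match(r"\d+(\.\d+)*", str(version))
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def check(name, minimum=None):
    """Return (name, version or None, ok) for one importable module."""
    try:
        module = importlib.import_module(name)
    except Exception:  # a broken install may raise more than ImportError
        return name, None, False
    found = getattr(module, "__version__", "unknown")
    # An unknown version string parses to () and fails any minimum check.
    ok = minimum is None or version_tuple(found) >= version_tuple(minimum)
    return name, found, ok


if __name__ == "__main__":
    for name, minimum in REQUIRED.items():
        print("required:", check(name, minimum))
    for name in OPTIONAL:
        print("optional:", check(name))
```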
These requirements can be installed on Linux and Mac OS X with the following conda command:
$ conda install --channel=jakevdp pip nose coverage gcc cython numpy scipy scikit-learn pyflann pyamg
Finally, within the source repository, run this command to install the megaman
package itself:
$ python setup.py install
megaman uses nose for unit tests. With nose installed, type
$ make test
to run the unit tests. megaman is tested on Python versions 2.7, 3.4, and 3.5.
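nose collects test functions and modules whose names match its test pattern (for example, names starting with test_) and treats a failed assert as a test failure. The function below is a hypothetical example in that style, not one of megaman's own tests; it uses SciPy only to check that an unnormalized graph Laplacian has rows summing to zero.

```python
import numpy as np
from scipy.sparse import csgraph
from scipy.sparse import random as sparse_random


def test_laplacian_rows_sum_to_zero():
    # Random symmetric, nonnegative affinity matrix on 50 points.
    W = sparse_random(50, 50, density=0.1, random_state=0, format="csr")
    W = W + W.T
    # Unnormalized graph Laplacian L = D - W.
    L = csgraph.laplacian(W, normed=False)
    row_sums = np.asarray(L.sum(axis=1)).ravel()
    assert np.allclose(row_sums, 0.0)
```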
We have the following planned updates for upcoming releases:
- Native support for K-Nearest Neighbors distance (in progress)
- Lazy R-metric (calculated only at selected points)
- Make cover_plotter.py work more generally with rmetric.py
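A lazy R-metric is cheap in principle because the dual metric at a point p depends only on row p of the Laplacian: h(p)[k, l] = 1/2 [L(y_k y_l) - y_k L(y_l) - y_l L(y_k)](p), where y_k are the embedding coordinates. A minimal sketch, assuming L approximates the Laplace-Beltrami operator up to a positive constant (so a D - W graph Laplacian is passed with its sign flipped). dual_metric is a hypothetical function, not the interface of rmetric.py; the metric itself would then be obtained from each s x s slice by a (rank-truncated) pseudo-inverse.

```python
import numpy as np
from scipy.sparse import csr_matrix


def dual_metric(L, Y, rows=None):
    """Dual metric h(p)[k, l] = 1/2 [L(y_k y_l) - y_k L(y_l) - y_l L(y_k)](p).

    L    : (n, n) sparse matrix approximating the Laplace-Beltrami operator
    Y    : (n, s) embedding coordinates
    rows : indices of the points to evaluate (None means all points)
    Returns an array of shape (len(rows), s, s).
    """
    L = csr_matrix(L)
    if rows is None:
        rows = np.arange(Y.shape[0])
    rows = np.asarray(rows)
    L_rows = L[rows]      # lazy part: only the selected rows of L are used
    Y_rows = Y[rows]      # coordinates at the selected points, shape (m, s)
    LY = L_rows @ Y       # (L y_b)(p) for every selected p and coordinate b
    s = Y.shape[1]
    H = np.empty((len(rows), s, s))
    for a in range(s):
        for b in range(a, s):
            L_prod = L_rows @ (Y[:, a] * Y[:, b])  # L(y_a y_b) at the selected points
            H[:, a, b] = 0.5 * (L_prod - Y_rows[:, a] * LY[:, b] - Y_rows[:, b] * LY[:, a])
            H[:, b, a] = H[:, a, b]  # the dual metric is symmetric
    return H
```

Evaluating m selected points touches m rows of L instead of all n, and the result matches the corresponding slices of the full computation.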