Skip to content

LTLA/umappp

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

76 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

A C++ library for UMAP

Unit tests Documentation uwot comparison Codecov

Overview

umappp is a header-only C++ implementation of the Uniform Manifold Approximation and Projection (UMAP) algorithm (McInnes, Healy and Melville, 2018). UMAP is a non-linear dimensionality reduction technique that is most commonly used for visualization of complex datasets. This is achieved by placing each observation on a low-dimensional (usually 2D) embedding in a manner that preserves the neighborhood of each observation from the high-dimensional original space. The aim is to ensure that the local structure of the data is faithfully recapitulated in lower dimensions Further theoretical details can be found in the original UMAP documentation; the implementation here is derived from the C++ code in the uwot R package.

Quick start

Given a pointer to a column-major input array with ndim rows and nobs columns, we can compute the UMAP embedding easily:

#include "umappp/Umap.hpp"

std::vector<double> embedding(npts * 2);
umappp::Umap x;
x.run(ndim, nobs, data.data(), 2, embedding.data());

We can modify parameters by calling the relevant setters in the Umap class:

umappp::Umap x;
x.set_num_neighbors(20).set_num_epochs(200);

We can initialize and run the algorithm up to the specified number of epochs:

umappp::Umap x;

auto status = x.initialize(ndim, nobs, data.data(), 2, embedding.data());

for (int iter = 10; iter < 200; iter += 10) {
    status.run(2, embedding.data(), iter);
    // do something with the current embedding, e.g., create an animation
}

Advanced users can control the neighbor search by either providing the search results directly (as a vector of vectors of index-distance pairs) or by providing an appropriate knncolle subclass to the run() or initialize() functions:

umappp::Umap x;
knncolle::AnnoyEuclidean<> searcher(ndim, nobs, data.data());
x.run(&searcher, 2, embedding.data());

See the reference documentation for more details.

Building projects

CMake with FetchContent

If you're already using CMake, you can add something like this to your CMakeLists.txt:

include(FetchContent)

FetchContent_Declare(
  umappp 
  GIT_REPOSITORY https://github.com/LTLA/umappp
  GIT_TAG master # or any version of interest
)

FetchContent_MakeAvailable(umappp)

And then:

# For executables:
target_link_libraries(myexe ltla::umappp)

# For libaries
target_link_libraries(mylib INTERFACE ltla::umappp)

CMake with find_package()

find_package(ltla_umappp CONFIG REQUIRED)
target_link_libraries(mylib INTERFACE ltla::umappp)

To install the library, use:

mkdir build && cd build
cmake .. -DUMAPPP_TESTS=OFF
cmake --build . --target install

By default, this will use FetchContent to fetch all external dependencies. If you want to install them manually, use -DUMAPPP_FETCH_EXTERN=OFF. See the commit hashes in extern/CMakeLists.txt to find compatible versions of each dependency.

Manual

If you're not using CMake, the simple approach is to just copy the files in include/ - either directly or with Git submodules - and include their path during compilation with, e.g., GCC's -I. This requires the external dependencies listed in extern/CMakeLists.txt, which also need to be made available during compilation:

  • The irlba library for singular value decomposition (or in this case, an eigendecomposition). This also requires the Eigen library for matrix operations.
  • The knncolle library for nearest neighbor search. If you are instead supplying your own neighbor search, this dependency can be eliminated by defining the UMAPPP_CUSTOM_NEIGHBORS macro.
  • The aarand library for random distribution functions.

References

McInnes L, Healy J, Melville J (2020). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv, https://arxiv.org/abs/1802.03426

About

UMAP C++ implementation

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •