
UMAP-rs not efficient #2

@jianshu93


Dear Cell-ranger team,

It seems that scan-rs still uses the rather old vantage-point tree data structure for nearest neighbor search (NNS), a key step of both UMAP and t-SNE. Recent advances in nearest neighbor search, namely proximity-graph-based algorithms such as HNSW and NSG, can be much faster while remaining accurate in terms of recall. More importantly, they can be parallelized efficiently.

Beyond the NNS step, the other UMAP steps, including embedding initialization and cross-entropy optimization, are all single-threaded, and therefore slow on large datasets with millions or billions of samples (datasets of that size will soon be easy to obtain). I think the non-linear dimension reduction step can be further improved and accelerated.
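To make the suggestion concrete, here is a minimal Rust sketch of the two ingredients mentioned above. It is not scan-rs code and not a full HNSW or NSG implementation. `graph_search` is a best-first (beam) search over a single-layer proximity graph, the routine HNSW runs on each of its layers and NSG runs on its one graph. `par_map` shows that both graph construction and querying split across threads with `std::thread::scope`. All function names and the beam-width parameter `ef` (borrowed from HNSW terminology) are hypothetical. The graph here is an exact k-NN graph built by parallel brute force, O(n^2), only to keep the example self-contained. Real HNSW/NSG build their graphs incrementally with edge pruning, and HNSW picks the entry node by descending its upper layers rather than always starting at node 0 as this sketch does.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::{BinaryHeap, HashSet};
use std::thread;

/// Squared Euclidean distance (monotone in the true distance, so rankings match).
fn dist2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// A (distance, node id) candidate, totally ordered via `f32::total_cmp`.
#[derive(Clone, Copy)]
struct Cand(f32, usize);
impl Ord for Cand {
    fn cmp(&self, o: &Self) -> Ordering {
        self.0.total_cmp(&o.0).then(self.1.cmp(&o.1))
    }
}
impl PartialOrd for Cand {
    fn partial_cmp(&self, o: &Self) -> Option<Ordering> { Some(self.cmp(o)) }
}
impl PartialEq for Cand {
    fn eq(&self, o: &Self) -> bool { self.cmp(o) == Ordering::Equal }
}
impl Eq for Cand {}

/// Hypothetical helper: computes f(0), ..., f(n-1) on `threads` scoped threads, keeping order.
fn par_map<T, F>(n: usize, threads: usize, f: F) -> Vec<T>
where
    T: Default + Clone + Send,
    F: Fn(usize) -> T + Sync,
{
    let mut out = vec![T::default(); n];
    let threads = threads.max(1);
    let chunk = ((n + threads - 1) / threads).max(1);
    thread::scope(|s| {
        for (t, slot) in out.chunks_mut(chunk).enumerate() {
            let f = &f;
            // Each thread fills its own disjoint slice of `out`, so no locks are needed.
            s.spawn(move || {
                for (j, o) in slot.iter_mut().enumerate() {
                    *o = f(t * chunk + j);
                }
            });
        }
    });
    out
}

/// Exact k-NN graph by parallel brute force (stands in for HNSW/NSG construction).
fn knn_graph(data: &[Vec<f32>], k: usize, threads: usize) -> Vec<Vec<usize>> {
    par_map(data.len(), threads, |i| {
        let mut ds: Vec<(f32, usize)> = (0..data.len())
            .filter(|&o| o != i)
            .map(|o| (dist2(&data[i], &data[o]), o))
            .collect();
        ds.sort_by(|a, b| a.0.total_cmp(&b.0));
        ds.into_iter().take(k).map(|(_, o)| o).collect::<Vec<usize>>()
    })
}

/// Best-first (beam) search over one proximity-graph layer; `ef` (>= k) is the beam width.
fn graph_search(data: &[Vec<f32>], graph: &[Vec<usize>], entry: usize, q: &[f32], k: usize, ef: usize) -> Vec<usize> {
    let d0 = dist2(&data[entry], q);
    let mut visited = HashSet::from([entry]);
    // Min-heap of nodes still to expand.
    let mut frontier = BinaryHeap::from([Reverse(Cand(d0, entry))]);
    // Max-heap of the best `ef` nodes seen so far; the worst one sits on top.
    let mut best = BinaryHeap::from([Cand(d0, entry)]);
    while let Some(Reverse(c)) = frontier.pop() {
        // Stop once the nearest unexpanded node cannot improve the result set.
        if best.len() >= ef && c.0 > best.peek().unwrap().0 {
            break;
        }
        for &nb in &graph[c.1] {
            if !visited.insert(nb) {
                continue; // already scored
            }
            let d = dist2(&data[nb], q);
            if best.len() < ef || d < best.peek().unwrap().0 {
                frontier.push(Reverse(Cand(d, nb)));
                best.push(Cand(d, nb));
                if best.len() > ef {
                    best.pop(); // drop the current worst
                }
            }
        }
    }
    let mut out = best.into_vec();
    out.sort();
    out.into_iter().take(k).map(|c| c.1).collect()
}

/// Answers every query independently in parallel, all entering the graph at node 0.
fn search_batch(data: &[Vec<f32>], graph: &[Vec<usize>], queries: &[Vec<f32>], k: usize, ef: usize, threads: usize) -> Vec<Vec<usize>> {
    par_map(queries.len(), threads, |i| graph_search(data, graph, 0, &queries[i], k, ef))
}
```

For example, with the points `(i, 0)` for `i` in `0..100` and `k = 4`, `graph_search(&data, &g, 0, &[73.2, 0.0], 1, 8)` walks from node 0 toward the query and returns `[73]` after scoring only a fraction of the points. Because every graph row and every query is independent, both phases parallelize without locks, which is the property the vantage-point tree path in scan-rs does not currently exploit.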

Thanks,

Jianshu
