Feature Request: Custom distance matrix input #109

RichieHakim · 2022-01-16T19:56:20Z

FEATURE REQUEST:

In #8, the possibility of using a custom nearest-neighbor (NN) matrix is discussed and described as 'easy' to implement. Quoting DavidMChan:
"It would be easy to add the ability to pass in a sparse nearest neighbors matrix, however it becomes more complicated if you want to extract the nearest neighbors from a pre-computed distance matrix."

Implementing this would be a significant improvement and would open up a lot of use cases.
Specifically, it would be great to let the user pass in a custom distance matrix (i.e. a sparse knn_graph).
That alone would let users who already rely on this feature in sklearn's TSNE port their workflow directly to tsne-cuda.

https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html
metric : str or callable, default='euclidean': ... If metric is "precomputed", X is assumed to be a distance matrix. ...

Thanks!


DavidMChan · 2022-01-18T20:26:44Z

I'll look into adding this (though, TBH, I can't promise anything), but I'm also happy to accept a PR to address this.

For future reference (and for anyone who wants to give it a shot), the idea would be to shortcut the logic for nearest neighbors here:

tsne-cuda/src/fit_tsne.cu, line 118 in b740a7d:

    tsnecuda::util::KNearestNeighbors(gpu_opt, opt, knn_indices, knn_squared_distances, high_dim_points, high_dim, num_points, num_neighbors);

It's not that hard to do, since the rest of the TSNE algorithm only requires a float distance array of size (N x # neighbors) and a similarly shaped array of the nearest neighbor indices.
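To make the two arrays concrete, here is a minimal NumPy sketch of turning a dense precomputed distance matrix into an (N x # neighbors) index array and a matching array of squared distances. It is not part of tsne-cuda: the helper name `knn_from_distance_matrix` is hypothetical, and it assumes the input holds plain (unsquared) distances and that each point should be excluded from its own neighbor list. Whether the downstream code expects the self-match to be included has not been checked here.

```python
import numpy as np


def knn_from_distance_matrix(dist, num_neighbors):
    """Hypothetical helper (not a tsne-cuda API).

    dist: dense (N x N) array of pairwise, unsquared distances.
    Returns (knn_indices, knn_squared_distances), each of shape
    (N, num_neighbors), with neighbors sorted from nearest to farthest.
    """
    dist = np.asarray(dist, dtype=np.float32)
    if dist.ndim != 2 or dist.shape[0] != dist.shape[1]:
        raise ValueError("dist must be a square (N x N) matrix")
    n = dist.shape[0]
    if not 0 < num_neighbors < n:
        raise ValueError("num_neighbors must be in [1, N - 1]")

    # Assumption: a point is not its own neighbor, so mask the diagonal.
    masked = dist.copy()
    np.fill_diagonal(masked, np.inf)

    # Sort each row by distance and keep the first num_neighbors columns.
    order = np.argsort(masked, axis=1, kind="stable")
    knn_indices = order[:, :num_neighbors].astype(np.int64)

    # Gather the matching distances and square them, mirroring the
    # knn_squared_distances name used in the call above.
    knn_distances = np.take_along_axis(masked, knn_indices, axis=1)
    return knn_indices, knn_distances ** 2
```

A full sort is O(N log N) per row; for large N, `np.argpartition` followed by a sort of only the kept columns would be cheaper. A sparse knn graph would skip the selection step entirely, which is presumably why passing one in is the simpler of the two cases.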

The logic for passing arrays is already in place, since we handle pre-initialized T-SNE: see how preinit_data is handled in https://github.com/CannyLab/tsne-cuda/blob/b740a7d46a07ca9415f072001839fb66a582a3fa/src/python/tsnecuda/TSNE.py, and how it's parsed into the actual function call in https://github.com/CannyLab/tsne-cuda/blob/b740a7d46a07ca9415f072001839fb66a582a3fa/src/ext/pymodule_ext.cu

All that would have to be done is to create a new option in the options file (just like the pre-init data), https://github.com/CannyLab/tsne-cuda/blob/b740a7d46a07ca9415f072001839fb66a582a3fa/src/include/options.h, and reference it during the main tsne call.

RichieHakim · 2022-07-04T20:31:38Z

This is still dearly hoped for.

loganylchen · 2023-12-19T04:03:23Z

I have the same request.

DavidMChan added the "enhancement" and "good first issue" labels on Jan 18, 2022

RichieHakim mentioned this issue on Jul 4, 2022, in rapidsai/cuml#4799 (open): [FEA] TSNE and UMAP: Allow input to be precomputed distance matrix
