
Feature Request: Custom distance matrix input #109

Open
RichieHakim opened this issue Jan 16, 2022 · 3 comments
Labels
enhancement New feature or request good first issue Good for newcomers


RichieHakim commented Jan 16, 2022

FEATURE REQUEST:

In #8, the possibility of using a custom NN matrix was discussed and described as 'easy' to implement.
DavidMChan: "It would be easy to add the ability to pass in a sparse nearest neighbors matrix, however it becomes more complicated if you want to extract the nearest neighbors from a pre-computed distance matrix."

It would be a significant improvement that would open up a lot of use cases if this were implemented.
Specifically, allowing a user to pass in a custom distance matrix (i.e. a sparse knn_graph) would be amazing.
That alone would let users who already rely on this feature in sklearn's TSNE port their workflow directly to tsne-cuda.

https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html
metric: str or callable, default='euclidean': ... If metric is "precomputed", X is assumed to be a distance matrix. ...

Thanks!
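For reference, the scikit-learn workflow mentioned above looks roughly like the sketch below. The random data, the Manhattan metric, and the perplexity value are assumptions chosen only for illustration. `metric="precomputed"` is the documented scikit-learn option quoted in the request. Nothing here is tsne-cuda API.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics import pairwise_distances

# Toy data (an assumption for this sketch): 60 points in 8 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))

# Build the full (N x N) distance matrix with a user-chosen metric.
D = pairwise_distances(X, metric="manhattan")

# metric="precomputed" tells scikit-learn that D already holds distances.
# init="random" is used because PCA initialization needs the raw features,
# which are not available when only distances are passed in.
tsne = TSNE(
    n_components=2,
    metric="precomputed",
    init="random",
    perplexity=10.0,
    random_state=0,
)
embedding = tsne.fit_transform(D)
print(embedding.shape)
```

The request amounts to supporting an equivalent entry point in tsne-cuda, so that the distance computation can stay under the user's control.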

DavidMChan added the enhancement and good first issue labels on Jan 18, 2022

DavidMChan (Member) commented

I'll look into adding this (though, TBH, I can't promise anything), but I'm also happy to accept a PR to address this.

For future reference (and for anyone who wants to give it a shot), the idea would be to shortcut the logic for nearest neighbors here:

tsnecuda::util::KNearestNeighbors(gpu_opt, opt, knn_indices, knn_squared_distances, high_dim_points, high_dim, num_points, num_neighbors);

It's not that hard to do, since the rest of the TSNE algorithm only requires a float distance array of size (N x # neighbors) and a similarly shaped array of the nearest neighbor indices.
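As a rough illustration of the two arrays described above, here is a minimal NumPy sketch that derives them from a dense precomputed distance matrix. The function name `knn_from_distance_matrix`, the `include_self` flag, and the dtypes are hypothetical and not part of tsne-cuda. Squaring the distances is an assumption based on the `knn_squared_distances` argument name, and whether tsne-cuda expects each point to appear as its own neighbor is not confirmed here.

```python
import numpy as np

def knn_from_distance_matrix(dist, num_neighbors, include_self=False):
    """Hypothetical helper: return (indices, squared_distances),
    each of shape (N, num_neighbors), from an (N x N) distance matrix."""
    dist = np.asarray(dist, dtype=np.float64)
    if dist.ndim != 2 or dist.shape[0] != dist.shape[1]:
        raise ValueError("dist must be a square (N x N) matrix")
    n = dist.shape[0]
    max_k = n if include_self else n - 1
    if not 1 <= num_neighbors <= max_k:
        raise ValueError("num_neighbors must be between 1 and %d" % max_k)

    work = dist.copy()
    if not include_self:
        # Mask the diagonal so a point is never chosen as its own neighbor.
        np.fill_diagonal(work, np.inf)

    # Sort each row by distance and keep the closest num_neighbors columns.
    order = np.argsort(work, axis=1, kind="stable")[:, :num_neighbors]
    knn_dist = np.take_along_axis(dist, order, axis=1)

    # Assumption: the downstream code wants squared distances as float32.
    return order.astype(np.int64), (knn_dist ** 2).astype(np.float32)

# Usage: four points on a line at x = 0, 1, 3, 7.
x = np.array([0.0, 1.0, 3.0, 7.0])
dist = np.abs(x[:, None] - x[None, :])
knn_indices, knn_squared_distances = knn_from_distance_matrix(dist, num_neighbors=2)
print(knn_indices)
print(knn_squared_distances)
```

A full sort per row keeps the sketch short. For large N, a partial selection such as `np.argpartition` followed by sorting only the selected columns would likely be cheaper.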

The logic for passing arrays is already in place, since we handle pre-initialized T-SNE: see how preinit_data is handled in https://github.com/CannyLab/tsne-cuda/blob/b740a7d46a07ca9415f072001839fb66a582a3fa/src/python/tsnecuda/TSNE.py, and how it's parsed into the actual function call in https://github.com/CannyLab/tsne-cuda/blob/b740a7d46a07ca9415f072001839fb66a582a3fa/src/ext/pymodule_ext.cu

All that would have to be done is to create a new option in the options file (just like the pre-init data), https://github.com/CannyLab/tsne-cuda/blob/b740a7d46a07ca9415f072001839fb66a582a3fa/src/include/options.h, and reference it during the main tsne call.


RichieHakim commented Jul 4, 2022

This is still dearly hoped for.

loganylchen commented

I have the same request.
