[Feature Request] support n_components=3, happy to contribute :) #71

Open
shun-lin opened this issue Nov 6, 2019 · 9 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed), inactive

Comments

@shun-lin

shun-lin commented Nov 6, 2019

Hi!

I recently needed to use t-SNE / UMAP to visualize the embeddings generated by some TF models I am testing. I found this repository and it's super useful and super fast, thanks so much! I was wondering if I could help contribute support for n_components=3, since I would also like to visualize the embeddings in 3D, if that is feasible. If so, could you give me a few pointers on where to start? Thanks!

DavidMChan added the enhancement (New feature or request) label Nov 8, 2019
@DavidMChan
Member

Thanks for your interest in developing for TSNE-CUDA! I don't think there's a huge amount of complexity, at least mathematically, in extending it to 3D visualization. There is, however, a huge amount of generally annoying development work.

There are a few things that need to be handled. The main function is in src/fit_tsne.cu. First, all of the vectors that are designed to hold X-Y points need to be expanded so they can handle an arbitrary number of dimensions (or at least 2D and 3D); see, for example, the vector on line 164. Next, the CUDA kernels in src/kernels need to be rewritten so that they correctly index the arrays for 3D points. This is mostly systematic, but can be tricky to get right. Finally, the repulsive force calculation needs to be adapted to handle 3 dimensions. Our original FIt-SNE code is based on the repository at https://github.com/KlugerLab/FIt-SNE, which supports 3D, but we didn't pull in much of the 3D code.

In the end, it's not a crazy mathematical challenge, but there's a pretty large amount of code to be re-written. If you start a branch/fork we'd love to assist as needed!
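To make the indexing point concrete, here is a minimal Python sketch of how a flat coordinate buffer can be addressed for any number of dimensions. The dimension-major layout (all x values, then all y values, then all z values) and the helper names `flat_index`, `pack_points`, and `squared_distance` are assumptions made for illustration; they are not taken from src/fit_tsne.cu or the kernels in src/kernels.

```python
def flat_index(point: int, dim: int, num_points: int) -> int:
    """Offset of coordinate `dim` of point `point` in a dimension-major buffer.

    Assumed layout: [x_0 .. x_{N-1}, y_0 .. y_{N-1}, z_0 .. z_{N-1}].
    """
    return dim * num_points + point


def pack_points(points, n_dims: int):
    """Flatten a list of n_dims-tuples into one dimension-major list."""
    num_points = len(points)
    flat = [0.0] * (num_points * n_dims)
    for i, p in enumerate(points):
        if len(p) != n_dims:
            raise ValueError(f"point {i} has {len(p)} coordinates, expected {n_dims}")
        for d in range(n_dims):
            flat[flat_index(i, d, num_points)] = float(p[d])
    return flat


def squared_distance(flat, i: int, j: int, num_points: int, n_dims: int) -> float:
    """Squared Euclidean distance between points i and j, looping over dims
    instead of hard-coding separate x and y terms."""
    return sum(
        (flat[flat_index(i, d, num_points)] - flat[flat_index(j, d, num_points)]) ** 2
        for d in range(n_dims)
    )


# Two 3D points; the same code works unchanged with n_dims=2.
flat = pack_points([(0, 0, 0), (1, 2, 2)], n_dims=3)
print(squared_distance(flat, 0, 1, num_points=2, n_dims=3))  # 9.0
```

The idea is that any kernel written against a loop over `n_dims` (rather than explicit x and y offsets) would not need to change again if the embedding dimension changes.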

@shun-lin
Author

shun-lin commented Nov 8, 2019

Thanks for the pointers, @DavidMChan, I will look into it!

@LucaCappelletti94

Did you have any luck with this? I would love to be able to use it for a package I made for 3D visualizations.

@shun-lin
Author

shun-lin commented Apr 6, 2020

Hi @LucaCappelletti94, nothing yet; I haven't had much time to dig deeper.

@DavidMChan
Member

I'm actually just now starting to get back into active development on this code, and this is on my to-do list (it's one of the most requested features).

@shun-lin
Author

shun-lin commented Apr 6, 2020

yay thanks @DavidMChan :) :) Very excited!

@DavidMChan
Member

DavidMChan commented May 22, 2020

Approximately what size of datasets are you thinking of for 3D visualization? I've been going through our code and the potential implementation, and the FIt-SNE algorithm doesn't scale very well to 3D. You might actually be better off using a tool like https://github.com/tensorflow/tfjs-tsne, which is somewhat slower but will likely (fundamentally) scale better to higher dimensions.
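A rough way to see the scaling concern, under the assumption that a FIt-SNE-style method interpolates the repulsive forces onto a regular grid with the same number of nodes along each axis: the number of grid nodes grows as the per-axis count raised to the embedding dimension. The function name `grid_nodes` and the value of 128 nodes per axis are illustrative choices, not numbers taken from tsne-cuda.

```python
def grid_nodes(nodes_per_axis: int, n_dims: int) -> int:
    """Total nodes in a regular grid with `nodes_per_axis` nodes on each axis."""
    if nodes_per_axis < 1 or n_dims < 1:
        raise ValueError("nodes_per_axis and n_dims must be positive")
    return nodes_per_axis ** n_dims


nodes_2d = grid_nodes(128, 2)
nodes_3d = grid_nodes(128, 3)
print(nodes_2d)              # 16384
print(nodes_3d)              # 2097152
print(nodes_3d // nodes_2d)  # 128
```

Under this assumption, going from 2D to 3D at the same resolution multiplies the grid size (and the memory and FFT work tied to it) by the per-axis node count.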

@LucaCappelletti94

It could still be useful to run it on a significant subset of any given dataset, keeping the whole pipeline in Python.
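As a sketch of the subset idea, the snippet below picks a reproducible random subset of row indices using only the Python standard library. The helper name `subsample_indices` is hypothetical, and the embedding step is deliberately left as a comment, since which tool and which arguments to use are not settled in this thread.

```python
import random


def subsample_indices(n_points: int, n_keep: int, seed: int = 0):
    """Return `n_keep` distinct row indices in [0, n_points), sorted ascending."""
    if not 0 <= n_keep <= n_points:
        raise ValueError("n_keep must be between 0 and n_points")
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return sorted(rng.sample(range(n_points), n_keep))


# Example: keep 100,000 of 1,000,000 examples.
indices = subsample_indices(1_000_000, 100_000, seed=42)
print(len(indices))  # 100000

# The selected rows would then be passed to whichever embedding tool is used,
# e.g. subset = [embeddings[i] for i in indices]  (`embeddings` is a placeholder).
```

Sorting the indices keeps the subset in the original row order, which makes it easier to line the embedded points back up with their labels.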

@shun-lin
Author

@DavidMChan for my use case the data is roughly [128, 1M] (128 dimensions, ~1 million examples). Would tfjs-tsne perform better in this case? I agree with @LucaCappelletti94 that it would be nicer to keep everything in Python as well (one of my main use cases is showing visualizations on Google Colab). Thanks so much :)
