Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VpTree does not accept float in knncolle #18

Open
shrit opened this issue Dec 18, 2023 · 4 comments
Open

VpTree does not accept float in knncolle #18

shrit opened this issue Dec 18, 2023 · 4 comments

Comments

@shrit
Copy link

shrit commented Dec 18, 2023

Hi Aaron,

It seems that VpTree does not accept float value as the std::tuple is hard coded to double.

Here is the code:

https://github.com/LTLA/knncolle/blob/3ad6b8cdbd281d78c77390d5a6ded4513bdf3860/include/knncolle/VpTree/VpTree.hpp#L76

Here is the error when trying to use float for Umap:

/./knncolle/VpTree/VpTree.hpp:150:29: error: no matching function for call to ‘std::tuple<int, const double*, double>::tuple(int&, const float*, int)’
  150 |             items.push_back(DataPoint(i, vals + i * num_dim, 0));

It seems that const INTERNAL_t*, is not being propagated correctly from umap, and it is using the default value causing the above error.

I know this is belong to the knn repo, but it is here when the error is happening.

Any easy solution ?

Many thanks

@LTLA
Copy link
Collaborator

LTLA commented Dec 19, 2023

Hopefully fixed by ff42321. Note that VpTree is a decent default for small datasets but you'll probably want to use one of the approximate methods (Annoy or HNSW) for anything larger.

@shrit
Copy link
Author

shrit commented Dec 19, 2023

@LTLA Thank you very much for the quick fix, do you know how I can specify Annoy as a template parameter for Umap ? is there an example that shows how this is done ?
Many thanks

@LTLA
Copy link
Collaborator

LTLA commented Dec 20, 2023

The README has an example:

umappp::Umap x;
knncolle::AnnoyEuclidean<> searcher(ndim, nobs, data.data());
x.run(&searcher, 2, embedding.data());

You might need to add <float> in various places above.

TBH I don't usually use this method signature for run(), as I compute the neighbor list manually and provide it to Umap::run(). This is because I might end up re-using the neighbor list with different UMAP parameters so I just compute the same set of neighbors once and use it across different UMAP runs.

If you want some more inspiration, check out these bindings:

@shrit
Copy link
Author

shrit commented Jan 8, 2024

Perfect, thank you very much, I will keep this open for a while until I have the chance to run it again and let you know if I have any question

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants