UMAP fit/transform approach #9

amarrerod · 2022-10-25T14:21:01Z

I've been looking through the examples for your UMAP library and was wondering whether it can be used in a fit/transform way, similar to the UMAP library in Python.

I mean loading a dataset to compute the embedding, and then using a transform method to project new samples into the existing embedded space and get the transformed output.

Thank you so much!

LTLA · 2022-10-25T15:12:41Z

Probably not, I don't remember adding this. I would need to have a look at how uwot does it. Might be pretty simple if it's just a weighted average of neighbors; a PR would be welcome.

jlmelville · 2022-10-25T17:02:59Z

Transforming a new point involves:

1. Finding the nearest neighbors among the old points, so you need to store the index that was built during the initial construction.
2. Constructing the fuzzy set membership values with respect to those neighbors, i.e. the similarities. You also adjust the local connectivity constraint here (in practice this just means you don't shift the exponential in the similarity calculation: rho always equals zero).
3. Initializing the coordinates of the new point in the low-dimensional space by an average of the coordinates of the nearest neighbors (so the low-dimensional coordinates of the original points must also be stored). Or maybe a weighted average using the similarities? Uwot can do both: one of them is the Python UMAP way and one is something I added to see if it made a difference, but I can't remember which is which. I don't think it has turned out to be important.
4. Optimizing the coordinates. This is the same gradient descent as the usual layout optimization, but you DO NOT want the original coordinates to change. Therefore you need a scheme to keep track of which nodes are which as part of the edge list used in the optimization. The learning rate is also scaled down.
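The first three steps can be sketched roughly as follows. This is a minimal NumPy illustration, not the uwot or Python UMAP code: the function names `smooth_knn_weights` and `init_new_points` are hypothetical, the neighbor search is brute force instead of a stored index, and the calibration target of log2(k) for the summed weights is an assumption taken from the usual UMAP description. It assumes k is at least 2.

```python
import numpy as np

def smooth_knn_weights(dists, n_iter=64, tol=1e-5):
    """Membership weights for one new point, given its k neighbor distances.

    rho is fixed at zero, so each weight is exp(-d / sigma). sigma is found
    by bisection so that the weights sum to log2(k) (assumed target).
    """
    target = np.log2(len(dists))
    lo, hi, sigma = 0.0, np.inf, 1.0
    for _ in range(n_iter):
        total = np.exp(-dists / sigma).sum()
        if abs(total - target) < tol:
            break
        if total > target:
            # Weights are too large: shrink sigma.
            hi = sigma
            sigma = (lo + hi) / 2.0
        else:
            # Weights are too small: grow sigma (double until bracketed).
            lo = sigma
            sigma = sigma * 2.0 if np.isinf(hi) else (lo + hi) / 2.0
    return np.exp(-dists / sigma)

def init_new_points(train_X, train_Y, new_X, k=15, weighted=True):
    """Initial low-dimensional coordinates for each row of new_X.

    train_X: original high-dimensional points, train_Y: their embedding.
    weighted=False gives the plain average of the neighbors' coordinates;
    weighted=True uses the membership weights instead.
    """
    out = np.empty((len(new_X), train_Y.shape[1]))
    for i, x in enumerate(new_X):
        # Step 1: nearest neighbors among the old points (brute force here).
        d = np.linalg.norm(train_X - x, axis=1)
        nn = np.argsort(d)[:k]
        # Step 2: similarities with rho = 0.
        w = smooth_knn_weights(d[nn]) if weighted else np.ones(k)
        # Step 3: (weighted) average of the neighbors' embedded coordinates.
        out[i] = (w[:, None] * train_Y[nn]).sum(axis=0) / w.sum()
    return out
```

Both averaging variants are exposed through the `weighted` flag, since the thread does not settle which one matches Python UMAP.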

transform.R in uwot is a bit of a disaster, but mainly because it tries to maintain backwards compatibility and allow for an ever-increasing number of ways to provide input data. If you ignore all that, then apart from setting up the edge list appropriately there isn't a lot of special casing for transforming, and the structure of the smooth knn and optimization C++ code works as-is.
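To illustrate what "setting up the edge list appropriately" could look like, here is a simplified NumPy sketch of an optimization in which only the new points move. It is not the uwot or umappp code. The function name `optimize_new_points`, the `lr_scale` factor of 0.25, the default `a` and `b` curve parameters, and the per-epoch edge sampling by `weights / weights.max()` are all assumptions made for illustration. Each edge has its head in the new points and its tail in the original points, and updates are applied to the head only.

```python
import numpy as np

def optimize_new_points(new_Y, train_Y, head, tail, weights,
                        a=1.577, b=0.895, n_epochs=30,
                        lr=1.0, lr_scale=0.25, n_neg=5, seed=0):
    """Gradient descent on new_Y with train_Y held fixed.

    head[e] indexes a row of new_Y, tail[e] indexes a row of train_Y, and
    weights[e] is the membership weight of that edge. lr_scale is an
    assumed factor for scaling down the learning rate.
    """
    rng = np.random.default_rng(seed)
    new_Y = new_Y.copy()                # leave the caller's array untouched
    prob = weights / weights.max()      # simplified edge sampling schedule
    alpha0 = lr * lr_scale
    for epoch in range(n_epochs):
        alpha = alpha0 * (1.0 - epoch / n_epochs)   # linear decay
        for e in np.flatnonzero(rng.random(len(head)) < prob):
            i, j = head[e], tail[e]
            # Attractive update: pull the new point towards its neighbor.
            diff = new_Y[i] - train_Y[j]
            d2 = diff @ diff
            if d2 > 0.0:
                g = -2.0 * a * b * d2 ** (b - 1.0) / (1.0 + a * d2 ** b)
                new_Y[i] += alpha * np.clip(g * diff, -4.0, 4.0)
            # Repulsive updates: push away from random original points.
            for j_neg in rng.integers(len(train_Y), size=n_neg):
                diff = new_Y[i] - train_Y[j_neg]
                d2 = diff @ diff
                g = 2.0 * b / ((0.001 + d2) * (1.0 + a * d2 ** b))
                new_Y[i] += alpha * np.clip(g * diff, -4.0, 4.0)
            # train_Y is never written to, so the original layout is fixed.
    return new_Y
```

Because heads and tails index two separate arrays, there is no need for an extra flag marking which nodes are movable. An implementation that keeps all points in one array would need such bookkeeping instead.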

jlmelville · 2022-10-25T17:05:59Z

Oh, but also be aware of jlmelville/uwot#103, which I have been unable to reproduce but which may indicate a bug somewhere.
