Add a projectReducedDims function #168
Comments
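Seems like a bad idea, but also something people would certainly use. I like the weighted-average idea; something like:

```r
library(BiocNeighbors)

projectReducedDim <- function(old.points, new.points, old.embedding, k = 2) {
    # Find the k nearest old cells for each new cell.
    res <- queryKNN(X = old.points, query = new.points, k = k)

    # Inverse-distance weights, normalized to sum to 1 for each new cell.
    # Caution: a distance of exactly zero gives an infinite weight (NaN after normalizing).
    weight <- 1 / res$distance
    weight <- weight / rowSums(weight)

    # Weighted average of the neighbors' coordinates in each embedding dimension.
    # The weights already sum to 1, so this is a row *sum*;
    # rowMeans() would shrink every coordinate by a factor of k.
    new.embedding <- sapply(seq_len(ncol(old.embedding)), function(i) {
        rowSums(
            sapply(seq_len(ncol(res$index)), function(j) {
                old.embedding[res$index[, j], i] * weight[, j]
            })
        )
    })
    new.embedding
}
```

I realise there's surely a more elegant way to do the nested loops. I could also wrap the snifter stuff into scater without a terrible amount of effort.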
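One way to avoid the nested `sapply()` calls is to loop only over the k neighbor ranks. R's column-wise recycling then scales each row of the looked-up coordinates by that cell's weight. This is a sketch under assumptions. The helper name `weightedNeighborMean()` is hypothetical, and so is the `eps` floor on distances, which avoids the infinite weight that an exact zero distance would otherwise produce. `index` and `distance` are assumed to be the matrices returned by `queryKNN()`.

```r
# Hypothetical helper: inverse-distance-weighted average of neighbor coordinates.
# index, distance: n x k matrices (e.g. res$index and res$distance from queryKNN()).
# old.embedding:   m x d matrix of embedding coordinates for the old cells.
weightedNeighborMean <- function(old.embedding, index, distance, eps = 1e-8) {
    # Floor distances at eps so that an exact match does not yield an infinite weight.
    weight <- 1 / pmax(distance, eps)
    weight <- weight / rowSums(weight)

    out <- matrix(0, nrow(index), ncol(old.embedding))
    for (j in seq_len(ncol(index))) {
        # weight[, j] recycles down the columns, scaling row i by weight[i, j].
        out <- out + weight[, j] * old.embedding[index[, j], , drop = FALSE]
    }
    # Drop row names inherited from the old cells; keep the embedding's column names.
    dimnames(out) <- list(NULL, colnames(old.embedding))
    out
}
```

Usage would be along the lines of `res <- queryKNN(X = old.points, query = new.points, k = 5)` followed by `weightedNeighborMean(old.embedding, res$index, res$distance)`.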
Check out https://github.com/LTLA/batchelor/blob/master/R/utils_tricube.R for a tricube-weighted average based on the nearest neighbors. Watch out for problems with distances of zero if you're going to use inverse weights.
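The code below is a minimal base-R sketch of tricube weighting over the nearest neighbors. It follows the same idea as the linked `utils_tricube.R` but is written from scratch here and is not a copy of batchelor's code. The helper name `tricubeNeighborMean()` and the bandwidth rule are assumptions and may differ from what batchelor does. The assumed rule sets the bandwidth to `ndist` times the distance to the middle neighbor, floored at a tiny positive value. Unlike inverse weights, the tricube kernel is bounded, so a distance of zero simply gets the maximum weight of 1.

```r
# Hypothetical helper: tricube-weighted average of neighbor coordinates.
# index, distance: n x k matrices with columns sorted by increasing distance
#                  (as returned by BiocNeighbors::queryKNN()).
# old.embedding:   m x d matrix of embedding coordinates for the old cells.
tricubeNeighborMean <- function(old.embedding, index, distance, ndist = 3) {
    # Assumed bandwidth: ndist times the distance to the middle neighbor,
    # floored so that it is never zero.
    middle <- ceiling(ncol(distance) / 2)
    bandwidth <- pmax(distance[, middle] * ndist, 1e-8)

    # Tricube kernel (1 - u^3)^3 on relative distances u, capped at 1 (weight 0 beyond).
    rel <- pmin(distance / bandwidth, 1)
    weight <- (1 - rel^3)^3
    weight <- weight / rowSums(weight)

    out <- matrix(0, nrow(index), ncol(old.embedding))
    for (j in seq_len(ncol(index))) {
        # weight[, j] recycles down the columns, scaling row i by weight[i, j].
        out <- out + weight[, j] * old.embedding[index[, j], , drop = FALSE]
    }
    dimnames(out) <- list(NULL, colnames(old.embedding))
    out
}
```

Because the nearest neighbor always lies within the bandwidth, every row has at least one positive weight, so the normalization never divides by zero.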
Ooh, perfect. Might just
Probably best to just copy it over and avoid an explicit dependency on batchelor. It's not too large, and it should be easy to drag across the few unit tests just in case.
Resolved by the commit(s) above, but feel free to submit feedback/gripes here.
Recently came up in discussions with a user who wanted something like Seurat's `ProjectUMAP`. The idea is to map new data onto an existing embedding. Kind of like how snifter does it, but for any target embedding, without requiring special knowledge of how that embedding was generated.

For general use, this is probably not a great idea, mostly because the new data may contain populations that weren't present in the old data, and so they go... who knows where. It's also slightly tedious in that the user has to effectively maintain two analyses side-by-side, i.e., one using the old data only and one using the new data, rather than having a single analysis with both old and new datasets.
Nonetheless, a projection can be useful in specific cases where preserving the existing embedding is non-negotiable. By that, I mean embeddings that are already being used in publications, where one doesn't want a new fight with the reviewers.
To this end, a quick and dirty projection function might look like:
Basically, just plonk each new cell at the embedding location of its nearest neighbor in the old dataset, where neighbors are defined according to some low-dimensional space. Users can decide what space they want to use here; for a quick-and-dirty projection, a raw PCA might suffice, but for something more "correct", you could use the MNN-corrected PCs from batchelor.
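The projection described above can be sketched as follows. It assumes the BiocNeighbors package for the neighbor search, via the same `queryKNN()` call used earlier in the thread. The function name `projectNearest()` is hypothetical. `old.points` and `new.points` are whatever shared low-dimensional space the user picks, e.g. raw PCs or MNN-corrected PCs.

```r
library(BiocNeighbors)

# Hypothetical helper: place each new cell at the embedding coordinates of its
# single nearest old cell, with neighbors found in a shared low-dimensional space.
# old.points:    m x p matrix (e.g. PCs) for the old cells.
# new.points:    n x p matrix in the same space for the new cells.
# old.embedding: m x d matrix (e.g. UMAP coordinates) for the old cells.
projectNearest <- function(old.points, new.points, old.embedding) {
    stopifnot(nrow(old.points) == nrow(old.embedding),
        ncol(old.points) == ncol(new.points))
    res <- queryKNN(X = old.points, query = new.points, k = 1)
    # Copy the nearest old cell's coordinates for every new cell.
    out <- old.embedding[res$index[, 1], , drop = FALSE]
    rownames(out) <- rownames(new.points)
    out
}
```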
A more sophisticated approach might take some kind of (weighted) mean across multiple nearest neighbors, rather than just inserting the cell directly at the closest neighbor. This would probably add some jitter that makes it look more realistic. ¯\_(ツ)_/¯