Add a projectReducedDims function #168

Closed
LTLA opened this issue May 17, 2022 · 5 comments

@LTLA
Collaborator

LTLA commented May 17, 2022

Recently came up in discussions with a user, who wanted something like Seurat's ProjectUMAP. The idea is to map new data onto an existing embedding. Kind of like how snifter does it, but for any target embedding without requiring special knowledge.

For general use, this is probably not a great idea, mostly because the new data may contain populations that weren't present in the old data, and so they go... who knows where. It's also slightly tedious in that the user effectively has to maintain two analyses side by side, one using only the old data and one using the new data, rather than a single analysis with both old and new datasets.

Nonetheless, a projection can be useful in specific cases where the preservation of the existing embedding is non-negotiable. And by that, I mean embeddings that are being used in publications and one doesn't want a new fight with the reviewers.

To this end, a quick and dirty projection function might look like:

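# Completely untested
projectReducedDims <- function(old.points, new.points, old.embedding) {
    # Nearest old point for each new point (queryKNN is from BiocNeighbors).
    res <- BiocNeighbors::queryKNN(X = old.points, query = new.points, k = 1)
    # Place each new point at its nearest neighbour's embedding coordinates.
    new.embedding <- old.embedding[res$index[, 1], , drop = FALSE]
    new.embedding
}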

Basically, just plonk each new cell at the embedding location of its nearest neighbor in the old dataset, where neighbors are defined according to some low-dimensional space. Users can decide what space they want to use here; for a quick-and-dirty projection, a raw PCA might suffice, but for something more "correct", you could use the MNN-corrected PCs from batchelor.
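To make the nearest-neighbour placement concrete, here is a toy sketch in base R. The brute-force helper `nearest_index` is hypothetical and only stands in for `queryKNN(..., k = 1)` so that the example runs without any packages; the data are made up for illustration.

```r
# Brute-force 1-nearest-neighbour search in base R (hypothetical helper,
# a stand-in for BiocNeighbors::queryKNN with k = 1).
nearest_index <- function(old.points, new.points) {
    apply(new.points, 1, function(p) {
        # Squared Euclidean distance from p to every old point.
        which.min(colSums((t(old.points) - p)^2))
    })
}

# Toy data: three old cells in a 2-D low-dimensional space (e.g. two PCs),
# each with a known position in the existing embedding.
old.points <- rbind(c(0, 0), c(10, 0), c(0, 10))
old.embedding <- rbind(c(-1, -1), c(5, 5), c(9, -3))

# Two new cells close to old cells, plus one far away from everything.
new.points <- rbind(c(0.5, 0.2), c(9, 1), c(100, 50))

idx <- nearest_index(old.points, new.points)
new.embedding <- old.embedding[idx, , drop = FALSE]
print(idx)
print(new.embedding)
```

The third new cell is nowhere near any old cell, yet it still lands exactly on top of an existing point in the embedding, which is the caveat about novel populations raised above.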

A more sophisticated approach might take some kind of (weighted) mean across multiple nearest neighbors, rather than just inserting the cell directly at the closest neighbor. This probably will add some jitter that makes it look more realistic. ¯\_(ツ)_/¯

@alanocallaghan
Owner

Seems like a bad idea but also something people would certainly use. I like the weighted average idea, something like

projectReducedDim <- function(old.points, new.points, old.embedding, k = 2) {
    res <- BiocNeighbors::queryKNN(X = old.points, query = new.points, k = k)
    # Inverse-distance weights, normalised to sum to 1 for each new point.
    weight <- 1 / res$distance
    weight <- weight / rowSums(weight)
    new.embedding <- sapply(seq_len(ncol(old.embedding)), function(i) {
        # The weights already sum to 1, so the weighted mean is a plain sum.
        rowSums(
            sapply(seq_len(ncol(res$index)), function(j) {
                old.embedding[res$index[, j], i] * weight[, j]
            })
        )
    })
    new.embedding
}

I realise there's surely a more elegant way to do the nested loops.
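One possible way to flatten the nested loops is to accumulate over neighbours only and let R vectorise across cells and embedding dimensions. This is a sketch, not the code that ended up in the package; `weighted_neighbour_mean` is a hypothetical name, and `index` and `weight` are assumed to be cells-by-k matrices shaped like `res$index` and the normalised weights above.

```r
# Weighted mean of neighbour coordinates with a single loop over the k neighbours.
# Assumes each row of 'weight' sums to 1.
weighted_neighbour_mean <- function(old.embedding, index, weight) {
    out <- matrix(0, nrow(index), ncol(old.embedding))
    for (j in seq_len(ncol(index))) {
        # Row i of the j-th neighbour's coordinates is scaled by weight[i, j].
        out <- out + old.embedding[index[, j], , drop = FALSE] * weight[, j]
    }
    out
}

# Toy check: two new cells, two neighbours each.
old.embedding <- rbind(c(0, 0), c(4, 8), c(2, 2))
index <- rbind(c(1, 2), c(3, 3))
weight <- rbind(c(0.75, 0.25), c(0.5, 0.5))
projected <- weighted_neighbour_mean(old.embedding, index, weight)
print(projected)
```

Using `drop = FALSE` keeps the result a matrix even when only one new cell is projected, which the `sapply` version does not guarantee.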

I could also wrap the snifter stuff into scater without a terrible amount of effort.

@LTLA
Collaborator Author

LTLA commented May 18, 2022

Check out https://github.com/LTLA/batchelor/blob/master/R/utils_tricube.R for a tricube-weighted average based on the nearest neighbors. Watch out for problems with distances of zero if you're going to use inverse weights.
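The zero-distance problem and a tricube-style alternative can be sketched as follows. This is written in the spirit of the linked helper rather than copied from it: the function name `tricube_weights`, the `ndist = 3` default and the `1e-8` floor are assumptions made for illustration.

```r
# Tricube weights from a cells-by-k matrix of neighbour distances.
# Hypothetical helper; parameter choices are illustrative assumptions.
tricube_weights <- function(distance, ndist = 3) {
    # Per-cell bandwidth: a multiple of the distance to the middle neighbour,
    # floored so that zero distances never cause a division by zero.
    middle <- ceiling(ncol(distance) / 2)
    bandwidth <- pmax(1e-8, distance[, middle] * ndist)
    # Relative distances, capped at 1 so far-away neighbours get zero weight.
    rel <- pmin(distance / bandwidth, 1)
    w <- (1 - rel^3)^3
    w / rowSums(w)
}

# Row 1: the new cell coincides with both neighbours (distance zero).
# Row 2: two equidistant neighbours.
# Row 3: one exact match and one distant neighbour.
distance <- rbind(c(0, 0), c(1, 1), c(0, 2))

# Inverse-distance weights break on exact matches: 1/0 is Inf, and Inf/Inf is NaN.
inv <- 1 / distance
inv.w <- inv / rowSums(inv)
print(inv.w)

tri.w <- tricube_weights(distance)
print(tri.w)
```

With the floored bandwidth, every row of the tricube weights is finite and sums to 1, whereas the inverse-distance weights contain `NaN` wherever a new cell sits exactly on an old one.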

@alanocallaghan
Owner

Ooh, perfect. Might just ::: that unless that's deeply frowned on

@LTLA
Collaborator Author

LTLA commented May 19, 2022

Probably best to just copy it over and avoid an explicit dependency on batchelor. It's not too large, and it should be easy to drag across the few unit tests just in case.

alanocallaghan added a commit that referenced this issue Jul 18, 2022

Implement projection of reducedDims (#168), import more (#169), support "colour" and "color" args (#150) (#170)

@alanocallaghan
Owner

Resolved by commit(s) above but feel free to submit feedback/gripes here
