Add a `projectReducedDims` function #168

LTLA · 2022-05-17T04:30:36Z

Recently came up in discussions with a user, who wanted something like Seurat's ProjectUMAP. The idea is to map new data onto an existing embedding. Kind of like how snifter does it, but for any target embedding without requiring special knowledge.

For general use, this is probably not a great idea, mostly because the new data may contain populations that weren't present in the old data, and so they go... who knows where. It's also slightly tedious in that the user effectively has to maintain two analyses side-by-side, i.e., one using the old data only and one using the new data, rather than having a single analysis with both old and new datasets.

Nonetheless, a projection can be useful in specific cases where the preservation of the existing embedding is non-negotiable. And by that, I mean embeddings that are being used in publications and one doesn't want a new fight with the reviewers.

To this end, a quick and dirty projection function might look like:

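# Completely untested
library(BiocNeighbors)  # provides queryKNN()

projectReducedDims <- function(old.points, new.points, old.embedding) {
    # For each new point, find its single nearest neighbor among the old points.
    res <- queryKNN(X = old.points, query = new.points, k = 1)
    # Place each new point at the embedding coordinates of that neighbor.
    new.embedding <- old.embedding[res$index,,drop=FALSE]
    new.embedding
}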

Basically, just plonk each new cell at the embedding location of its nearest neighbor in the old dataset, where neighbors are defined according to some low-dimensional space. Users can decide what space they want to use here; for a quick-and-dirty projection, a raw PCA might suffice, but for something more "correct", you could use the MNN-corrected PCs from batchelor.

A more sophisticated approach might take some kind of (weighted) mean across multiple nearest neighbors, rather than just inserting the cell directly at the closest neighbor. This will probably add some jitter that makes it look more realistic. ¯\_(ツ)_/¯


alanocallaghan · 2022-05-17T09:57:13Z

Seems like a bad idea but also something people would certainly use. I like the weighted average idea, something like

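projectReducedDim <- function(old.points, new.points, old.embedding, k = 2) {
    res <- queryKNN(X = old.points, query = new.points, k = k)
    # Inverse-distance weights, normalised so that each row sums to 1.
    weight <- 1 / res$distance
    weight <- weight / rowSums(weight)
    new.embedding <- sapply(seq_len(ncol(old.embedding)), function(i) {
        # The weights already sum to 1, so sum (rather than average) the
        # weighted coordinates to get the weighted mean.
        rowSums(
            sapply(seq_len(ncol(res$index)), function(j) {
                old.embedding[res$index[, j], i] * weight[, j]
            })
        )
    })
    new.embedding
}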

I realise there's surely a more elegant way to do the nested loops.
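One possible way to flatten the nested loops, offered only as a sketch: stack the embedding rows of all neighbors, scale them by their weights, and sum per new point with base R's `rowsum()`. The function name `weightedNeighborMean` and its arguments are hypothetical; `index` is assumed to be the n-by-k matrix of neighbor indices (as in `res$index`) and `weight` an n-by-k matrix of weights whose rows already sum to 1.

```r
# Hypothetical helper, base R only: weighted mean of neighbor coordinates.
weightedNeighborMean <- function(old.embedding, index, weight) {
    n <- nrow(index)
    k <- ncol(index)
    # Stack the embedding rows of every neighbor (all first neighbors, then
    # all second neighbors, and so on) and scale each row by its weight.
    stacked <- old.embedding[as.vector(index), , drop = FALSE] * as.vector(weight)
    # Add up the k weighted rows that belong to each new point.
    rowsum(stacked, group = rep(seq_len(n), times = k))
}

# Toy example: two old points, two new points, k = 2.
old.embedding <- matrix(c(0, 10, 0, 20), ncol = 2)    # rows: (0, 0) and (10, 20)
index <- matrix(c(1L, 2L, 2L, 1L), nrow = 2)          # neighbors per new point
weight <- matrix(c(0.75, 0.5, 0.25, 0.5), nrow = 2)   # rows sum to 1
projected <- weightedNeighborMean(old.embedding, index, weight)
```

This keeps the same arithmetic as the loop version, so it does nothing about zero distances when the weights come from `1 / distance`.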

I could also wrap the snifter stuff into scater without a terrible amount of effort.

LTLA · 2022-05-18T22:39:38Z

Check out https://github.com/LTLA/batchelor/blob/master/R/utils_tricube.R for a tricube-weighted average based on the nearest neighbors. Watch out for problems with distances of zero if you're going to use inverse weights.
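As a rough illustration of the idea (not a copy of the batchelor code), a tricube-weighted average could be sketched as below. The function name `tricubeProject`, the `ndist` argument, and the choice of bandwidth (a multiple of the distance to the middle neighbor, floored at a small positive value) are assumptions made for this sketch. Because the tricube weight is 1 at distance zero rather than infinite, exact matches do not cause the problem that inverse weights have.

```r
# Hypothetical helper, base R only. 'index' and 'distance' are n-by-k matrices,
# e.g. the 'index' and 'distance' elements returned by queryKNN().
tricubeProject <- function(old.embedding, index, distance, ndist = 3) {
    # Assumed bandwidth: 'ndist' times the distance to the middle neighbor,
    # floored at a tiny positive value so that zero distances are safe.
    middle <- ceiling(ncol(index) / 2)
    bandwidth <- pmax(1e-8, distance[, middle] * ndist)

    # Tricube kernel: weight is 1 at distance zero and 0 at or beyond the bandwidth.
    rel.dist <- pmin(distance / bandwidth, 1)
    weight <- (1 - rel.dist^3)^3
    weight <- weight / rowSums(weight)

    # Accumulate the weighted coordinates one neighbor at a time.
    output <- matrix(0, nrow(index), ncol(old.embedding))
    for (j in seq_len(ncol(index))) {
        output <- output + old.embedding[index[, j], , drop = FALSE] * weight[, j]
    }
    output
}

# Toy example: the first new point sits exactly on old point 1 (distance 0),
# the second is equidistant from both old points.
old.embedding <- matrix(c(0, 10, 0, 10), ncol = 2)   # rows: (0, 0) and (10, 10)
index <- matrix(c(1L, 1L, 2L, 2L), nrow = 2)
distance <- matrix(c(0, 1, 1, 1), nrow = 2)
projected <- tricubeProject(old.embedding, index, distance)
```

In this toy case the exact match lands on its neighbor's coordinates and the equidistant point lands midway between the two old points, with no non-finite values.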

alanocallaghan · 2022-05-19T12:48:49Z

Ooh, perfect. Might just `:::` that, unless that's deeply frowned on.

LTLA · 2022-05-19T15:17:19Z

Probably best to just copy it over and avoid an explicit dependency on batchelor. It's not too large and it should be easy to drag across the few unit tests, just in case.


alanocallaghan · 2022-07-25T11:33:45Z

Resolved by commit(s) above but feel free to submit feedback/gripes here

alanocallaghan added a commit that referenced this issue May 23, 2022

Implement projection of reducedDims (#168), import more (#169)

be2798b

alanocallaghan added a commit that referenced this issue Jul 18, 2022

Implement projection of reducedDims (#168), import more (#169), colour (#150) (#170)

d5f429b

* Implement projection of reducedDims (#168), import more (#169), support "colour" and "color" args (#150)

alanocallaghan closed this as completed Jul 25, 2022
