precompute function #10

Closed
FelixTheStudent opened this issue Jul 6, 2021 · 2 comments

Comments

@FelixTheStudent
Owner

Problem:
For large data sets (the MS data from the Schirmer group, for example), the interactive commands (rule + plot_last) are prohibitively slow.

Solution:
Precomputing the total UMI counts S (totalUMI) would speed things up for the MS data set, since raw is stored in gene-wise HDF5Array format (so cell-wise computations take a long time).

I found it unnecessary to precompute K when using an NxN neighbor graph (SNN > .1), since that uses matrix multiplication, which is incredibly fast. I still have to test an Nxk neighbor graph (i.e. a matrix with the indices of the kNN for each cell), which might be slower.

Both cases could be solved with something like this:

obj <- list(raw, neighbors, embedding)
obj <- precompute(obj)  # computes S
obj <- precompute(obj, genes=c("PLP1", "AQP1"))  # precomputes genes
@FelixTheStudent
Owner Author

I will close this issue because a separate precompute function is overkill at this point, for the following reasons:

  • Computing totalUMI is the computational bottleneck, not NxN or Nxk pooling.
  • I will create a slot called totalUMI or totals in the object, which evaluate_rule then looks for. That is more intuitive to the user than a precompute function, and in the vignettes I can simply write that precomputing totals can speed up the interactive pipes (especially if your raw UMI data is in gene-wise HDF5Array format, which I recommend for large data sets).
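
The totals-slot idea from the list above could look roughly like the sketch below. It is only an illustration: `get_totals` is a hypothetical helper, not a function of the package, and the real lookup would live inside evaluate_rule. It assumes `obj` is a plain list and that `raw` has cells in rows and genes in columns, as the `ms_raw[, "PLP1"]` indexing in this thread suggests.

```r
# Hypothetical helper (not part of the package): return per-cell total UMI
# counts, reusing obj$totals when it exists and computing it otherwise.
# Assumes obj$raw is a cells x genes count matrix.
get_totals <- function(obj) {
  if (!is.null(obj$totals)) {
    return(obj$totals)  # precomputed slot found, nothing to recompute
  }
  rowSums(obj$raw)      # fallback: slow when raw is stored gene-wise on disk
}

# Toy data: 4 cells x 3 genes (made-up counts)
raw <- matrix(c(1, 0, 2,
                0, 3, 1,
                4, 0, 0,
                2, 2, 2),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("cell", 1:4), c("PLP1", "AQP1", "GENE3")))

obj <- list(raw = raw)
obj$totals <- rowSums(obj$raw)  # precompute once, before the interactive session
totals <- get_totals(obj)       # later calls reuse the stored vector
```

With this layout the user pays for the cell-wise pass over `raw` once, and every later interactive call only reads the stored vector.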

Here is some quick code (in part taken from evaluate_rule) which I used to show that pooling with Nxk neighbors is instant (on my laptop):

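library(Matrix)  # provides the dgCMatrix class and sparse %*%
library(dataMS)
x <- ms_raw[, "PLP1"]  # expression of one gene across all cells

# NxN neighbors: keep SNN edges above 0.1, then pool via sparse matrix product
NxN_neighbors <- as(ms_snn, "dgCMatrix") > .1
as.numeric( as(NxN_neighbors, "dgCMatrix") %*% x )

# Nxk neighbors: look up x at each cell's kNN indices, then sum per cell
Nxk_neighbors <- ms_nn$idx
rowSums(matrix(data=x[c(Nxk_neighbors)], ncol=ncol(Nxk_neighbors)))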
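
For readers without access to dataMS, here is a small self-contained sketch of the same two pooling strategies on made-up data. All values are invented for illustration. Unlike the snippet above, where the NxN graph comes from SNN and the Nxk graph from kNN, the toy NxN matrix here is built from the same indices as the Nxk matrix, so both approaches should give the same pooled counts.

```r
library(Matrix)

n_cells <- 6
k <- 3

# Toy expression vector for one gene (one count per cell)
x <- c(5, 0, 2, 1, 0, 3)

# N x k index matrix: row i holds the indices of cell i's k neighbors
# (made-up values standing in for something like ms_nn$idx)
Nxk_neighbors <- matrix(c(1, 2, 3,
                          2, 1, 3,
                          3, 2, 4,
                          4, 5, 6,
                          5, 4, 6,
                          6, 5, 4),
                        ncol = k, byrow = TRUE)

# Same graph as an N x N sparse adjacency matrix.
# c(Nxk_neighbors) flattens column by column, so the row index
# 1..n_cells repeats once per neighbor column.
NxN_neighbors <- sparseMatrix(i = rep(seq_len(n_cells), times = k),
                              j = c(Nxk_neighbors),
                              x = rep(1, n_cells * k),
                              dims = c(n_cells, n_cells))

# Pooling via sparse matrix-vector product
pooled_NxN <- as.numeric(NxN_neighbors %*% x)

# Pooling via index lookup, reshaped back to N x k and summed per cell
pooled_Nxk <- rowSums(matrix(x[c(Nxk_neighbors)], ncol = ncol(Nxk_neighbors)))
```

Both approaches touch only a single in-memory gene vector, which fits the observation that pooling is not the bottleneck.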

@FelixTheStudent
Owner Author

I am closing this issue for the above reasons. Cheers!
