All zero check #41
Merged
deto merged 4 commits into YosefLab:master on Feb 24, 2025
Member
Thanks - it makes sense that this check isn't expansive enough. However, the real issue is that the standard deviation of a gene that is all zeros is zero, which then leads to NaNs/infinities later on. Could you change the PR so that, instead of checking for all zeros, it checks whether the std of the gene == 0? I think that should cover all cases better.
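To illustrate the point above, here is a minimal sketch (not Hotspot's actual code) of how a zero standard deviation turns into NaNs once a gene is standardized. The matrix `counts`, its genes-as-rows orientation, and the values in it are assumptions made up for the example.

```python
import numpy as np

# Hypothetical genes x cells matrix (rows are genes); values are illustrative only.
counts = np.array([
    [0.0, 0.0, 0.0, 0.0],   # all zero
    [2.0, 2.0, 2.0, 2.0],   # constant but non-zero
    [-1.0, 0.0, 1.0, 0.0],  # standardized-like values that sum to zero
    [0.0, 3.0, 1.0, 0.0],   # ordinary count-like gene
])

# Per-gene standard deviation; constant genes have std == 0.
gene_std = counts.std(axis=1)
valid_genes = gene_std > 0

# Standardizing a constant gene divides 0 by 0 and yields NaN.
with np.errstate(divide="ignore", invalid="ignore"):
    z = (counts - counts.mean(axis=1, keepdims=True)) / gene_std[:, None]

print(valid_genes)
print(np.isnan(z).all(axis=1))
```

In this sketch the first two genes are flagged as invalid, and they are also exactly the rows that become NaN after standardization.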
Contributor
Author
Thanks @deto, I updated the PR to check whether a gene row is constant, which means it has zero std. In addition, there is a warning raised when a sparse matrix is not in CSR format, but the warning text said to use a CSC matrix. I updated the text to say CSR; is that correct?
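One way a "constant row" check could look is sketched below. It compares each row's minimum and maximum rather than computing the std directly; this is an illustrative choice, not necessarily what the PR does. The helper name `constant_rows` and the example matrix are hypothetical, and a dense array with genes as rows is assumed.

```python
import numpy as np

def constant_rows(matrix):
    """Return a boolean mask that is True for rows whose values are all equal."""
    matrix = np.asarray(matrix, dtype=float)
    # A row is constant exactly when its largest and smallest values coincide.
    return matrix.max(axis=1) == matrix.min(axis=1)

# Hypothetical genes x cells matrix (rows are genes).
expr = np.array([
    [0.0, 0.0, 0.0],   # all zero
    [0.1, 0.1, 0.1],   # constant, non-zero
    [0.0, 1.0, 0.0],   # varies across cells
])

valid_genes = ~constant_rows(expr)
print(valid_genes)
```

Comparing min and max avoids relying on a floating-point std being exactly zero, which may not hold for every constant row once rounding in the mean is involved.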
deto approved these changes on Feb 24, 2025
In the Hotspot init, there is a check to ensure that a gene is not 'all zero', i.e. never expressed in any cell.
It checks that the sum of the gene's expression over all cells is greater than 0, which makes sense for NB or Bernoulli data. With 'normal' or 'none' (meaning no assumption about the data distribution, with the data assumed to be already standardized), however, sums can be negative or zero. This is the case, for instance, when computing the autocorrelation of module scores, which are PC loadings and therefore real numbers.
Thus I propose changing the valid_genes computation to treat a column as 'all zero' iff all of its elements are zero.
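The difference between the two checks can be sketched as follows. This is an illustration under stated assumptions, not the code in the PR: the array `scores`, its values, and the convention that each row is one gene (or module score) are made up for the example.

```python
import numpy as np

# Hypothetical real-valued data: each row is one gene / module score across cells.
scores = np.array([
    [0.0, 0.0, 0.0, 0.0],    # truly all zero
    [-1.5, 0.5, 1.0, 0.0],   # non-zero values that sum to exactly zero
    [-2.0, 0.5, 0.5, 0.0],   # non-zero values with a negative sum
    [0.0, 1.0, 3.0, 0.0],    # count-like values with a positive sum
])

# Sum-based check: only safe when values are non-negative (e.g. counts).
sum_check = scores.sum(axis=1) > 0

# Element-wise check: a row is 'all zero' iff every element is zero.
nonzero_check = (scores != 0).any(axis=1)

print(sum_check)
print(nonzero_check)
```

With these values, the sum-based check wrongly rejects the second and third rows, while the element-wise check rejects only the row that really is all zero.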