[integration/peerlab] feature: adding a reference gene correlation #32
Merged
Tobiaspk merged 11 commits into integration/peerlab on May 7, 2026
Good PR. Here's what should be changed before we can merge:
Creating some tests now to double-check these changes.
… missing from the reference
f42aafb to 026d06d
Thanks @ananya-nandula and @nkalfus for your contributions. A custom gene correlation reference can now be provided. Still missing, to be added in the next PR: filling in genes that are missing from the reference.
Revised the gene correlation reference feature: we now check that the reference contains raw counts and then normalize the counts the same way segger does. Also added an assertion that every gene in the reference AnnData passes the gene min-counts threshold set by segger.

Example use case: to segment multiple samples with the same gene-gene correlation prior, pool the desired samples into one AnnData. Make sure that this AnnData contains raw counts, that all of its genes pass the min-counts threshold, and that it includes every gene present in the samples you want to segment.
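The three requirements on the reference can be sketched as a small validation routine. This is only an illustration, not segger's implementation: `check_reference` and its arguments are hypothetical names, the count matrix is a plain nested list (cells by genes) rather than an AnnData `.X`, the threshold is assumed to apply to each gene's total count, and errors are raised as `ValueError` where the PR uses an assertion.

```python
from typing import Iterable, Sequence


def check_reference(
    counts: Sequence[Sequence[float]],
    ref_genes: Sequence[str],
    sample_genes: Iterable[str],
    genes_min_counts: int,
) -> None:
    """Raise ValueError if the reference violates any of the three requirements.

    Hypothetical helper: `counts` is a cells-by-genes matrix whose columns
    follow the order of `ref_genes`.
    """
    # 1. Raw counts: every entry must be a non-negative whole number.
    for row in counts:
        for value in row:
            if value < 0 or float(value) != int(value):
                raise ValueError("reference does not look like raw counts")

    # 2. Every reference gene must reach the minimum total count
    #    (assumption: the threshold is applied to per-gene totals).
    totals = [sum(column) for column in zip(*counts)]
    low = [g for g, total in zip(ref_genes, totals) if total < genes_min_counts]
    if low:
        raise ValueError(f"genes below the min-counts threshold: {low}")

    # 3. The reference must cover every gene in the samples to segment.
    missing = sorted(set(sample_genes) - set(ref_genes))
    if missing:
        raise ValueError(f"genes missing from the reference: {missing}")


# A pooled reference with two cells and three genes passes all checks.
check_reference([[1, 0, 3], [2, 5, 0]], ["A", "B", "C"], ["A", "C"], 3)
```

Running a check like this on the pooled AnnData before segmentation would surface a normalized matrix or an incomplete gene set early, rather than partway through a run.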
New parameters:
- `gene_corr_reference_path` (`--gene-corr-reference-path`): Path to an `.h5ad` reference dataset. Used to compute gene-gene PCA embeddings. `.X` must be raw counts.
- `gene_missing_strategy` (`--gene-missing-strategy`): What to do when genes in the data are missing from the reference. Defaults to "error". Alternatively, use "remove" to remove these genes from the data. "fill", which will use correlations from the data for the missing genes only, is a work in progress.
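The behaviour of the missing-gene strategies described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `resolve_missing_genes` is a hypothetical helper that works on gene-name lists only, and "fill" is left unimplemented because the PR describes it as work in progress.

```python
from typing import List, Sequence


def resolve_missing_genes(
    data_genes: Sequence[str],
    ref_genes: Sequence[str],
    strategy: str = "error",
) -> List[str]:
    """Return the data genes to keep, given the genes present in the reference.

    Hypothetical helper; the strategy names mirror --gene-missing-strategy.
    """
    ref = set(ref_genes)
    missing = [g for g in data_genes if g not in ref]
    if not missing:
        # Nothing to resolve: every data gene is covered by the reference.
        return list(data_genes)
    if strategy == "error":
        raise ValueError(f"genes missing from the reference: {missing}")
    if strategy == "remove":
        # Drop the uncovered genes from the data, preserving order.
        return [g for g in data_genes if g in ref]
    if strategy == "fill":
        # Described in the PR as work in progress, so not sketched here.
        raise NotImplementedError("'fill' is not implemented in this sketch")
    raise ValueError(f"unknown strategy: {strategy!r}")


kept = resolve_missing_genes(["A", "B", "C"], ["A", "C"], strategy="remove")
print(kept)  # ['A', 'C']
```

Defaulting to "error" keeps a mismatch between data and reference visible; "remove" trades that strictness for convenience by silently shrinking the gene set.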