Why using too extensive gene sets (e.g. from bulk RNA-seq) can lead to artifacts #1

Open · halterc opened this issue on Feb 20, 2023 · 0 comments
Labels: documentation, good first issue, question

Related to question: https://support.bioconductor.org/p/9149618/
In summary:
A user saw that nFeatures and the number of cells per sample appeared to confound their UCell scores (see image).
The likely reason is that with very large gene sets (>1000 genes), the number of genes detected in each cell can shift the ranks of the signature genes purely by chance.
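
A minimal sketch of how a rank-based score reacts to the number of detected genes. It assumes the Mann-Whitney U formulation described for UCell: genes are ranked per cell by expression, ranks above a cutoff (here `max_rank=1500`) are set to `max_rank + 1`, and the score is `1 - U / (n * max_rank)`. This is an illustrative Python re-implementation, not the package's code, and `ucell_like_score` and `descending_ranks` are hypothetical names. In the toy example both cells express the genes in the same order. The cell that detects more genes still scores higher on a large, lowly expressed signature.

```python
# Illustrative UCell-style score (assumption: NOT the UCell package's code).
# Genes are ranked per cell, ranks above max_rank are capped at max_rank + 1,
# and a normalized Mann-Whitney U statistic is computed over the signature.

def descending_ranks(values):
    """1-based ranks by decreasing expression; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the tie block while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    return ranks


def ucell_like_score(values, genes, signature, max_rank=1500):
    """Score one cell: 1 - U / (n * max_rank), with ranks > max_rank set to max_rank + 1."""
    ranks = descending_ranks(values)
    index = {g: i for i, g in enumerate(genes)}
    sig = [g for g in signature if g in index]  # ignore genes absent from the data
    n = len(sig)
    if n == 0:
        raise ValueError("no signature genes found in the data")
    capped = [ranks[index[g]] if ranks[index[g]] <= max_rank else max_rank + 1 for g in sig]
    u_stat = sum(capped) - n * (n + 1) / 2
    return 1 - u_stat / (n * max_rank)


if __name__ == "__main__":
    genes = [f"g{i}" for i in range(3000)]
    signature = genes[1000:2000]  # a large set of 1,000 lowly expressed genes

    # Same gene ordering in both cells, but the "shallow" cell detects only
    # the top 1,000 genes; all other genes are zero (tied at the bottom).
    shallow = [1000 - i if i < 1000 else 0 for i in range(3000)]
    deep = [3000 - i for i in range(3000)]

    print("shallow cell:", round(ucell_like_score(shallow, genes, signature), 3))
    print("deep cell:   ", round(ucell_like_score(deep, genes, signature), 3))
```

In this sketch, a signature whose genes are all undetected does not score 0 but `(n - 1) / (2 * max_rank)`, which is about 0.33 for 1,000 genes with `max_rank=1500`.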

The solution:

  • Avoid very large gene sets (e.g. from gene-set enrichment analyses of bulk RNA-seq) and lowly or sparsely expressed genes; they can lead to artifacts when applied to sparse scRNA-seq data.
  • Use small sets of highly expressed genes that are detected across most cells and samples.
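
The advice above could be applied as a pre-filtering step before scoring. The following is a hypothetical helper, not part of UCell. It keeps signature genes detected in at least a given fraction of cells and then keeps the most highly expressed of those. The thresholds `min_detect_frac=0.5` and `max_genes=100` are arbitrary assumptions to tune per dataset.

```python
# Hypothetical pre-filtering helper (assumption: not part of UCell).
# counts is a cells x genes matrix given as a list of per-cell lists.

def filter_signature(counts, genes, signature, min_detect_frac=0.5, max_genes=100):
    """Keep well-detected signature genes, ordered by mean expression, capped at max_genes."""
    n_cells = len(counts)
    index = {g: i for i, g in enumerate(genes)}
    kept = []
    for g in signature:
        if g not in index:
            continue  # gene not measured in this dataset
        col = [cell[index[g]] for cell in counts]
        detect_frac = sum(1 for v in col if v > 0) / n_cells
        mean_expr = sum(col) / n_cells
        if detect_frac >= min_detect_frac:
            kept.append((mean_expr, g))
    # Most highly expressed first; the stable sort keeps input order for ties.
    kept.sort(key=lambda t: t[0], reverse=True)
    return [g for _, g in kept[:max_genes]]


if __name__ == "__main__":
    genes = ["A", "B", "C", "D"]
    counts = [
        [5, 0, 1, 0],
        [4, 0, 2, 1],
        [6, 1, 0, 0],
    ]
    print(filter_signature(counts, genes, ["A", "B", "C", "D", "E"],
                           min_detect_frac=0.6, max_genes=2))  # ['A', 'C']
```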

image
