Compute gene scores without GC content annotation #12
Comments
At this point, we do not yet provide a function for computing GC content, but we will look into adding this feature. GC content is the fraction of Gs and Cs in the peak sequence. It can be computed using Bioconda or Bioconductor packages. We have been using ArchR for ATAC preprocessing, which generates this information automatically. GC content is unfortunately a requirement for computing gene scores, since it is needed to identify background peaks for the expression-accessibility correlations. |
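For reference, the definition above (fraction of Gs or Cs in the peak sequence) can be sketched in plain Python. This is only an illustration, not a function from this package; the name `gc_fraction` and the choice to count ambiguous bases such as `N` in the sequence length are assumptions.

```python
def gc_fraction(sequence: str) -> float:
    """Return the fraction of G or C bases in a DNA sequence (case-insensitive)."""
    seq = sequence.upper()
    if not seq:
        # Avoid division by zero for empty sequences.
        return 0.0
    gc_count = seq.count("G") + seq.count("C")
    # Assumption: every character, including N, counts toward the length.
    return gc_count / len(seq)


# Example: 4 of the 6 bases are G or C.
example_gc = gc_fraction("ATGCGC")
print(example_gc)
```

Excluding `N` from the denominator is an equally reasonable convention; whichever is chosen should be applied to all peaks consistently.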
Thanks a lot for the suggestions. This feature would be very useful. I was able to calculate the gene scores after adding the GC annotation with genomepy and Biopython. I extracted the peak sequences with genomepy and calculated the GC content with the GC function from Bio.SeqUtils. It was actually pretty fast and easy :). |
Awesome! Do you mind sharing the code snippet? We will add it to the
functionality in the repo.
--
Manu
|
Hi Manu, sure, no problem. Thank you for adding this! Here is the code I used to download the right genome and to add the GC annotation.
To find the right genome name and provider, I used
Best regards, |
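The snippet from this comment is not preserved in the thread as shown here. As a stand-in, here is a minimal sketch of the approach described above (extract each peak's sequence, then compute its GC content). It is not the original code. The peak name formats (`chr1:100-200` or `chr1-100-200`), the helper names, and the `fetch_sequence` callable are all assumptions; in practice the callable would wrap whatever genome access is used, such as a genome downloaded with genomepy.

```python
import re
from typing import Callable, Dict, Iterable, Tuple

# Assumption: peaks are named "chrom:start-end" or "chrom-start-end".
PEAK_PATTERN = re.compile(r"^(?P<chrom>.+?)[:\-](?P<start>\d+)-(?P<end>\d+)$")


def parse_peak(name: str) -> Tuple[str, int, int]:
    """Split a peak name into chromosome, start, and end."""
    match = PEAK_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unrecognized peak name: {name!r}")
    return match["chrom"], int(match["start"]), int(match["end"])


def gc_per_peak(
    peak_names: Iterable[str],
    fetch_sequence: Callable[[str, int, int], str],
) -> Dict[str, float]:
    """Map each peak name to the G/C fraction of its sequence."""
    gc = {}
    for name in peak_names:
        chrom, start, end = parse_peak(name)
        seq = fetch_sequence(chrom, start, end).upper()
        gc[name] = (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
    return gc


# Toy in-memory "genome" standing in for a real reference (hypothetical data).
toy_genome = {"chr1": "ATGCGCGCATATATGGCC"}


def toy_fetch(chrom: str, start: int, end: int) -> str:
    # Treats start as 0-based and end as exclusive; real peak coordinates may differ.
    return toy_genome[chrom][start:end]


peak_gc = gc_per_peak(["chr1:0-6", "chr1-8-14"], toy_fetch)
print(peak_gc)
```

The resulting values could then be stored as a per-peak column in the ATAC AnnData object's `.var`, in the same order as the peaks. The exact column name expected by `prepare_multiome_anndata()` is not stated in this thread, so check it against the package before relying on it.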
Thank you! @sitarapersad, can you please add this to the utils? |
Hey,
thanks a lot for your package. It is very useful :).
I'm currently working with a multiome dataset for which I want to calculate gene scores for the ATAC data and compare them to the gene expression data. I'm using the dataset from the NeurIPS Competition Open Problems in Single Cell Analysis (https://openproblems.bio/neurips_docs/data/dataset/). However, my dataset has no GC annotation, so the method genescore.prepare_multiome_anndata() crashes and I cannot use the follow-up methods to compute the gene scores.
Do you provide a method to calculate this, or is there another easy way to add this annotation to the data? Alternatively, is there a way to calculate the gene scores without the GC annotation?
Thanks a lot for your help.
Best regards,
Marie Becker