How to interpret the probability and max score in the output files? #14
Dear Ankit,
No worries - this is fine!
Can you please elaborate on how I should interpret the probability and
score given by dREG to each peak? For example, I used it on GROseq data
from a mouse cell line, and there were many peaks with a probability of
0.0 or very low (~10E-15) but a dREG score such as 0.3 or 0.4. Does that
mean something?
dREG scores are the raw output of the SVR (support vector regression).
Values near 1 represent a region that looks very much like a TIR
(transcription initiation region); values near 0 represent a region that
looks very much like either a gene body or an intergenic region where there
is no evidence of transcription initiation. Values near 0.3 and 0.4 are
also very likely TIRs.
The p-values (what you call probability) represent the probability of
observing a dREG score >0 if the region is actually either a gene body or an
intergenic region. These can be used much as you would use the p-values
or (when corrected by FDR) the q-values provided by any other peak caller.
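As a rough illustration of using the p-values the way one would with any other peak caller, here is a minimal sketch that applies a Benjamini-Hochberg FDR correction and keeps peaks at q <= 0.05. The peak tuples, their column order, and the 0.05 cutoff are assumptions made up for the example; they are not dREG's actual output format or a recommended threshold.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg q-values in the same order as the input."""
    n = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical peaks: (chrom, start, end, dreg_score, p_value).
peaks = [
    ("chr1", 1000, 1400, 0.95, 0.0),
    ("chr1", 5000, 5300, 0.35, 1e-15),
    ("chr2", 200, 600, 0.10, 0.04),
    ("chr2", 9000, 9500, 0.05, 0.60),
]

qvals = benjamini_hochberg([p[4] for p in peaks])
# Assumed cutoff for illustration only.
significant = [p for p, q in zip(peaks, qvals) if q <= 0.05]

for peak, q in zip(peaks, qvals):
    print(peak[:3], "score:", peak[3], "q-value:", q)
```

Note that in this toy data the peak with a score of 0.35 passes the cutoff because its p-value is tiny, which mirrors the case raised in the question.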
Also is there a way to identify enhancers from all peaks? Or should I just
consider every detected peak outside a gene as an enhancer?
We usually define candidate enhancers as dREG peaks that are located
distally from an annotated transcription start site. How far from a TSS
they have to be is arbitrary - lots of papers have used values >10kb
(though this probably treats some proximal enhancers as promoters).
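The distance-based definition above can be sketched as follows. This is only an illustration: the peak and TSS coordinates are invented, the use of the peak center and the helper names are assumptions, and the 10 kb default simply echoes the arbitrary cutoff mentioned above.

```python
from bisect import bisect_left


def distance_to_nearest_tss(position, sorted_tss):
    """Distance from position to the closest TSS, or None if there is none."""
    idx = bisect_left(sorted_tss, position)
    candidates = []
    if idx < len(sorted_tss):
        candidates.append(sorted_tss[idx] - position)
    if idx > 0:
        candidates.append(position - sorted_tss[idx - 1])
    return min(candidates) if candidates else None


def split_by_tss_distance(peaks, tss_by_chrom, min_distance=10_000):
    """Split (chrom, start, end) peaks into distal and proximal lists."""
    distal, proximal = [], []
    for chrom, start, end in peaks:
        center = (start + end) // 2
        dist = distance_to_nearest_tss(center, tss_by_chrom.get(chrom, []))
        # Assumption: a chromosome with no annotated TSS counts as distal.
        if dist is None or dist > min_distance:
            distal.append((chrom, start, end))
        else:
            proximal.append((chrom, start, end))
    return distal, proximal


# Hypothetical annotation: sorted TSS positions per chromosome.
tss_by_chrom = {"chr1": [10_000, 250_000]}
# Hypothetical peak calls.
peaks = [
    ("chr1", 9_800, 10_200),
    ("chr1", 15_000, 15_400),
    ("chr1", 100_000, 100_400),
]

candidate_enhancers, promoter_proximal = split_by_tss_distance(peaks, tss_by_chrom)
print("candidate enhancers:", candidate_enhancers)
print("promoter-proximal:", promoter_proximal)
```

Lowering `min_distance` would reclassify some promoter-proximal peaks as candidate enhancers, which is the trade-off noted above for proximal enhancers.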
Best,
Charles
… Thanks
--
Robert N. Noyce Assistant Professor in Life Science & Technology
Baker Institute for Animal Health
College of Veterinary Medicine
Cornell University
235 Hungerford Hill Road
Ithaca, NY 14853
Phone: 315-395-4693
Website: http://www.dankolab.org
Dear Charles,
Thank you for the great explanation. So should I consider the score and the p-value to be roughly inversely related? That is how it looks in my data.
Also, while manually checking the peak positions against the GROseq data, a dREG score threshold of 0.8 seems to filter out the false or somewhat weak peaks. Does that sound reasonable?
On another note, when I plot the distribution of dREG scores for peaks with p-value/probability = 0.0, it is clearly bimodal: peaks fall either below or above approximately 0.75 (as in the attached image). This was consistent across 2 replicates of both of the samples I tried. But shouldn't peaks with p-value = 0 have a very high score?
Hi, sorry if I am not supposed to ask such a question here; please let me know if there is a designated email or group.
Can you please elaborate on how I should interpret the probability and score given by dREG to each peak? For example, I used it on GROseq data from a mouse cell line, and there were many peaks with a probability of 0.0 or very low (~10E-15) but a dREG score such as 0.3 or 0.4. Does that mean something?
Also is there a way to identify enhancers from all peaks? Or should I just consider every detected peak outside a gene as an enhancer?
Thanks