How to interpret the probability and max score in the output files? #14

MasterChief1O7 · 2021-08-09T12:04:33Z

Hi, sorry if I am not supposed to ask such a question here; please let me know if there is a designated email or group.

Can you please elaborate on how I should interpret the probability and score that dREG assigns to each peak? For example, I ran it on GROseq data from a mouse cell line, and many peaks have a probability of 0.0 or a very low value (~10E-15) but a dREG score such as 0.3 or 0.4. Does that mean something?

Also is there a way to identify enhancers from all peaks? Or should I just consider every detected peak outside a gene as an enhancer?

Thanks

dankoc · 2021-08-09T12:43:06Z

Dear Ankit,

No worries - this is fine!

> Can you please elaborate on how should I interpret the probability and score given by dREG to each peak? For example, I used it on GROseq data from mice cell line, and there were many peaks which have probability as 0.0 or very low (~10E-15) but dREG score such as 0.3 or 0.4, does that mean something?

dREG scores are the raw output of the SVR (support vector regression) model. Values near 1 represent a region that looks very much like a transcription initiation region (TIR); values near 0 represent a region that looks very much like either a gene body or an intergenic region where there is no evidence of transcription initiation. Values near 0.3 and 0.4 are also very likely TIRs. The p-values (what you call probability) represent the probability of observing a dREG score >0 if the region is actually either a gene body or an intergenic region. These can be used much as you would use the p-values or (when corrected by FDR) the q-values provided by any other peak caller.
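To make the FDR step concrete, here is a minimal sketch of a Benjamini-Hochberg correction applied to peak p-values. It is not taken from dREG itself: the peak tuples, their column order, and the 0.05 cutoff are assumptions chosen only for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical peaks: (chrom, start, end, dreg_score, p_value).
# The column layout is an assumption, not the documented dREG output format.
peaks = [
    ("chr1", 1000, 1400, 0.95, 0.0),
    ("chr1", 5200, 5600, 0.35, 1e-15),
    ("chr2", 800, 1100, 0.12, 0.04),
    ("chr2", 9000, 9300, 0.05, 0.60),
]

qvals = benjamini_hochberg([p[4] for p in peaks])

# Keep peaks whose q-value passes an (arbitrary) 5% FDR cutoff.
significant = [p for p, q in zip(peaks, qvals) if q <= 0.05]

for peak, q in zip(peaks, qvals):
    print(peak[:3], "score =", peak[3], "q =", q)
```

In this toy input the peak with a score of 0.35 passes the cutoff because its p-value is tiny, which mirrors the point above that a moderate score can still mark a very likely TIR.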

> Also is there a way to identify enhancers from all peaks? Or should I just consider every detected peak outside a gene as an enhancer?

We usually define candidate enhancers as dREG peaks that are located distally from an annotated transcription start site (TSS). How far from a TSS they have to be is arbitrary - lots of papers have used values >10 kb (though this probably treats some proximal enhancers as promoters).

Best,
Charles
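As a rough illustration of the distance rule, the sketch below labels a peak as a candidate enhancer when its center lies more than 10 kb from the nearest annotated TSS on the same chromosome. The function names, the toy coordinates, and the use of the peak midpoint are assumptions for this example, not part of dREG.

```python
import bisect
from collections import defaultdict


def build_tss_index(tss_list):
    """Group TSS positions by chromosome and sort them for binary search."""
    index = defaultdict(list)
    for chrom, pos in tss_list:
        index[chrom].append(pos)
    for positions in index.values():
        positions.sort()
    return index


def distance_to_nearest_tss(index, chrom, center):
    """Return the distance to the closest TSS, or None if the chromosome has none."""
    positions = index.get(chrom, [])
    if not positions:
        return None
    k = bisect.bisect_left(positions, center)
    # Only the neighbours on either side of the insertion point can be closest.
    neighbours = positions[max(k - 1, 0):k + 1]
    return min(abs(center - p) for p in neighbours)


def candidate_enhancers(peaks, index, min_distance=10_000):
    """Return peaks whose center is more than min_distance from any TSS."""
    distal = []
    for chrom, start, end in peaks:
        center = (start + end) // 2
        d = distance_to_nearest_tss(index, chrom, center)
        # Peaks on chromosomes without annotated TSSs are treated as distal here.
        if d is None or d > min_distance:
            distal.append((chrom, start, end))
    return distal


# Hypothetical annotation and peaks, for illustration only.
tss = [("chr1", 50_000), ("chr1", 120_000), ("chr2", 30_000)]
peaks = [
    ("chr1", 49_800, 50_200),   # centered on a TSS
    ("chr1", 84_800, 85_200),   # 35 kb from the nearest TSS
    ("chr2", 35_000, 35_400),   # about 5 kb from a TSS
    ("chr2", 90_000, 90_400),   # about 60 kb from a TSS
]

distal = candidate_enhancers(peaks, build_tss_index(tss))
print(distal)
```

Since the cutoff is arbitrary, `min_distance` is left as a parameter; lowering it would recover more proximal enhancers at the cost of including more promoters.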


-- Robert N. Noyce Assistant Professor in Life Science & Technology Baker Institute for Animal Health College of Veterinary Medicine Cornell University 235 Hungerford Hill Road Ithaca, NY 14853 Phone: 315-395-4693 Website: http://www.dankolab.org E-mail: ***@***.***

MasterChief1O7 · 2021-08-10T14:16:38Z

Dear Charles,

Thank you for the great explanation. So should I consider the score and p-value to be roughly inversely related? That is how it looks in my data. Also, while manually going through the peak positions with respect to the GROseq data, a dREG score threshold of 0.8 seems to be a good value to filter out the false or somewhat weak peaks. Does that sound reasonable?

On another note, when I plot the distribution of dREG scores for peaks with p-value/probability = 0.0, it is clearly bimodal: peaks fall either below or above approximately 0.75 (as in the attached image), and this was consistent across 2 replicates of both samples I tried. But shouldn't peaks with p-value = 0 have a very high score?

Regards
Ankit
