How to use the methylation data to make a boxplot? #324
Comments
Hi Xiangyi, I think that with coverage this low, a difference of 5% can safely be ignored. Your DMRs should show much larger differences to be significant. If you are visualizing single DMRs, averaging the methylation per CpG gives you one point per CpG and is the only way to get the boxplot. Using the mean per DMR gives you a coverage-weighted estimate of the methylation that is biased towards higher-coverage CpGs, which seems reasonable given the low coverage. However, it does not give you methylation information per CpG, so it cannot be used for the boxplot. When reporting average methylation per DMR, though, this is probably the value you should use. Best, . |
Dear Alex, Thank you so much for your response.
I understand what you mean. Yes, we used stricter criteria for DMR calling; what I showed here is just an example.
I agree that it is the only way to show the methylation levels of all CpG sites in a boxplot.
But I am a little confused. By the mean per DMR, you mean sum(freqC)/sum(Coverage), right? It cannot be used for the boxplot. I think it must be part of the more complicated DMR-calling process.
You mean the average methylation per DMR from DMR calling, or sum(freqC)/sum(Coverage)? If so, I understand. ------------------------ Last question: To see how strong the differences in these DMRs are, we loaded the data into IGV. To some degree we can see the difference: for example, one DMR shows low methylation in A compared to B, and also compared to the remaining groups. However, someone may ask how we know that this DMR is significant in A versus the remaining groups. So we want to make a boxplot for the same DMRs and use t.test to show significance for all comparisons. Could you give me some advice on this last question, if my description is clear? Thanks again. Best, |
Hi Xiangyi,
Correct, with "mean per DMR" I am talking about `sum(freqC)/sum(Coverage)`
per DMR, in contrast to "averaging the methylation per CpG" within the DMR
which would be `(numC1/Coverage1 + ... + numCN/CoverageN) / N`.
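Written out as equations, the two quantities look as follows. This is only a restatement of the expressions above, assuming `numC_i` (written `freqC` elsewhere in this thread) is the number of methylated reads at CpG `i` and `N` is the number of CpGs in the DMR; the weights `w_i` are notation introduced here for illustration.

```latex
\bar{m}_{\text{per CpG}} = \frac{1}{N}\sum_{i=1}^{N}\frac{\text{numC}_i}{\text{Coverage}_i},
\qquad
\bar{m}_{\text{per DMR}} = \frac{\sum_{i=1}^{N}\text{numC}_i}{\sum_{i=1}^{N}\text{Coverage}_i}
= \sum_{i=1}^{N} w_i\,\frac{\text{numC}_i}{\text{Coverage}_i},
\quad
w_i = \frac{\text{Coverage}_i}{\sum_{j=1}^{N}\text{Coverage}_j}
```

The second form shows why the mean per DMR leans towards higher-coverage CpGs: each per-CpG value is weighted by its share of the total coverage, whereas the per-CpG average gives every site the weight `1/N`.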
Concerning your question, what you suggest should be fine, since you get
the significance between the groups using the t.test.
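
As a rough illustration of such a comparison, here is a minimal Python sketch of Welch's two-sample t statistic, which, as far as I know, is what R's `t.test` computes by default. The function name `welch_t` and the methylation values for both groups are hypothetical and only stand in for the points that would go into the boxplot.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    # Squared standard error of each group mean (sample variance / n)
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical per-CpG methylation fractions for one DMR in two groups
group_a = [0.10, 0.05, 0.20, 0.15, 0.08, 0.12]
group_b = [0.85, 0.90, 0.70, 0.95, 0.80, 0.88]

t, df = welch_t(group_a, group_b)
print(f"t = {t:.2f}, df = {df:.1f}")
```

The p-value would then come from a t distribution with `df` degrees of freedom, which `t.test` reports directly.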
Best,
Alex
On Fri, 9 Aug 2024 at 10:58, Charite Learner <
***@***.***> wrote:
------------------------ Last question
For our data, we did pairwise DMR calling among 6 groups, so we
got many DMRs in each comparison.
To see how strong the differences in these DMRs are, we loaded the data
into IGV. To some degree we can see the difference: for example, one DMR
shows low methylation in A compared to B, and also compared to the
remaining groups.
However, someone may ask how we know that this DMR is significant in A
versus the remaining groups.
This DMR may come from just one pairwise comparison, for example A versus
B, and it may show really low methylation only in A but high or full
methylation in B and the other 4 groups.
So we want to make a boxplot for the same DMRs and use t.test to show
significance for all comparisons.
Could you give me some advice on this last question, if my description is
clear?
Thanks again.
Best,
Xiangyi
|
Many thanks! Have a nice weekend. Best regards, |
Hello Alex,
I have a small question about RRBS methylation data.
It follows up on a question I posted before.
This time, my question is how to use RRBS methylation data to make a boxplot based on the values at CpG sites.
To make the boxplot, I would need to calculate the mean methylation level of one DMR, which may contain many CpGs.
For example, below is a DMR that contains 6 CpG sites. Note that it shows the methylation percentage, not freqC.
chr1.836516 chr1 836516 F 4 100.00 0
chr1.836543 chr1 836543 F 4 0.00 100
chr1.836941 chr1 836941 F 19 68.42 31.58
chr1.836942 chr1 836942 R 1 100.00 0
chr1.836953 chr1 836953 F 19 78.95 21.05
chr1.836954 chr1 836954 R 1 100.00 0
If I make a boxplot for this DMR, then
Mean Methylation Level (Group 1) = (100.00+0.00+68.42+100.00+78.95+100.00)/6 = 74.56, or 0.7456
However, this value differs from sum(freqC)/sum(Coverage), right?
For sum(freqC)/sum(Coverage), it is:
(4*1 + 4*0 + ... + 1*1)/(4+4+19+1+19+1) = 0.70833333
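
The two calculations above can be sketched in Python using the six rows of the example DMR. The methylated read counts are not in the table, so this sketch assumes they can be recovered by rounding coverage times percentage; the function names are illustrative only.

```python
# Rows from the example DMR: (position, coverage, percent methylated C)
rows = [
    (836516, 4, 100.00),
    (836543, 4, 0.00),
    (836941, 19, 68.42),
    (836942, 1, 100.00),
    (836953, 19, 78.95),
    (836954, 1, 100.00),
]


def per_cpg_mean(rows):
    """Unweighted average of the per-CpG methylation fractions."""
    fractions = [pct / 100 for _, _, pct in rows]
    return sum(fractions) / len(fractions)


def pooled_mean(rows):
    """sum(methylated reads) / sum(coverage) over the whole DMR."""
    # Assumption: methylated read count = round(coverage * percent / 100)
    num_c = [round(cov * pct / 100) for _, cov, pct in rows]
    total_cov = sum(cov for _, cov, _ in rows)
    return sum(num_c) / total_cov


print(f"per-CpG mean: {per_cpg_mean(rows):.4f}")  # 0.7456
print(f"pooled mean:  {pooled_mean(rows):.4f}")   # 0.7083
```

The estimates differ because the pooled value weights each CpG by its coverage, so the two sites with 19 reads dominate it.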
Do you think I can make the boxplot this way if I want to show the methylation levels of all CpGs?
Can I ignore this difference to some degree?
I think it is the only way to show the methylation levels in a boxplot.
I am looking forward to your response.
Best regards,
Xiangyi