bismark.cov.gz file entries for concatenated CGCG or overlapping CGCGC patterns #91
Comments
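Hi Albert, Unless you have been playing with different genome browsers (e.g. Ensembl and then UCSC) there should be no counting artefacts from adjacent positions. I guess you could process the data further to CpG methylation reports using `coverage2cytosine` and then process this file to look for discrepancies at adjacent top and bottom strand Cs. `coverage2cytosine` already has a function `--merge_CpG` which merges the top and bottom strand evidence together to form a single CpG dinucleotide entity, so it should be very straightforward to add another check to see whether the two agree with each other and, if not, just boot the position altogether. Would that be useful to you?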
Yes, I agree it would be good to have a filter in `coverage2cytosine`, or
wherever you prefer, so that if a `--discordance_filter X%` flag is
present, it would:
(a) Try to `--merge_CpG` and, if it finds a discordance above
`--discordance_filter X%` at a given position, then
(b) Produce another file in the same output format, with the
non-merged discordant entries written out.
(Similar behaviour to `--ambig_bam` in the alignment step, but here for
discordant counts.)
Does that sound good?
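To make the proposal concrete, here is a minimal Python sketch of the routing logic in (a) and (b). It is not Bismark code: the `Call` record, the `split_cpgs` function and the output tuples are hypothetical names for illustration. It assumes discordance means the absolute difference in percent methylation between the two strands, that a position is discordant only when the difference is strictly above the cutoff, and that CpGs measured on only one strand are passed through without a check.

```python
from typing import List, NamedTuple, Tuple


class Call(NamedTuple):
    chrom: str
    pos: int      # 1-based cytosine position
    strand: str   # "+" for top strand, "-" for bottom strand
    meth: int     # methylated call count
    unmeth: int   # unmethylated call count


def pct(call: Call) -> float:
    """Percent methylation of one covered cytosine."""
    return 100.0 * call.meth / (call.meth + call.unmeth)


def split_cpgs(calls: List[Call], max_diff: float) -> Tuple[List[tuple], List[Call]]:
    """Merge concordant CpGs; return discordant ones unmerged."""
    covered = [c for c in calls if c.meth + c.unmeth > 0]
    bottoms = {(c.chrom, c.pos): c for c in covered if c.strand == "-"}
    merged, discordant = [], []
    for top in (c for c in covered if c.strand == "+"):
        # The bottom-strand C of the same CpG sits one base downstream.
        bottom = bottoms.pop((top.chrom, top.pos + 1), None)
        if bottom is None:
            # Only the top strand was measured: nothing to compare against.
            merged.append((top.chrom, top.pos, top.pos + 1, top.meth, top.unmeth))
        elif abs(pct(top) - pct(bottom)) > max_diff:
            discordant.extend([top, bottom])
        else:
            merged.append((top.chrom, top.pos, top.pos + 1,
                           top.meth + bottom.meth, top.unmeth + bottom.unmeth))
    # Bottom-strand calls whose top partner was never measured.
    for b in bottoms.values():
        merged.append((b.chrom, b.pos - 1, b.pos, b.meth, b.unmeth))
    return merged, discordant


# A made-up CGCG stretch (positions 100-103) plus a lone top-strand call.
calls = [
    Call("chr1", 100, "+", 4, 6), Call("chr1", 101, "-", 4, 6),
    Call("chr1", 102, "+", 2, 8), Call("chr1", 103, "-", 9, 1),
    Call("chr1", 200, "+", 3, 1),
]
merged, discordant = split_cpgs(calls, max_diff=20)
```

Note that in a CGCG stretch the sketch only ever pairs a top-strand C with the bottom-strand C one base downstream, so positions 101 and 102 are never compared with each other even though they are adjacent.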
…On Fri, Feb 10, 2017 at 9:25 AM, FelixKrueger ***@***.***> wrote:
Hi Albert,
The two cytosines on the top strand come from the CTOT strand while the Gs
were measured on the CTOB strand, and these events are completely
independent of each other. Not quite sure if I am interpreting the colours
wrong, but is there also a huge difference in coverage between the top and
bottom strands (the numbers in the little pop out windows seem to suggest
that the total calls are quite comparable)?
Unless you have been playing with different genome browsers (e.g. Ensembl
and then UCSC) there should be no counting artefacts from adjacent
positions. I guess you could process the data further to CpG methylation
reports using coverage2cytosine and then process this file to look for
discrepancies at adjacent top and bottom strand Cs. coverage2cytosine
already has a function --merge_CpG which merges the top and bottom strand
evidence together to form a single CpG dinucleotide entity, I guess it
should be very straight forward to add another check to see whether the two
agree with each other and if not just boot the position altogether. Would
that be useful to you?
That should be fairly straightforward to implement. I may take a look when I am back from skiing...
I am now back from skiing, is this still something you are interested in?
Yes please
…On Jul 13, 2017 11:57, "FelixKrueger" ***@***.***> wrote:
I am now back from skiing, is this still something you are interested in?
I have now added a new option --discordance <int> to allow filtering for discordance: When in '--merge_CpG' mode, apply a filter for the maximum allowed discordance between top and bottom strand methylation values, expressed as the absolute difference in percent methylation. Discordant CpGs are written to a file called 'discordant_CpG_evidence.cov' (not merged). As an example, consider:
top: gi|170079663|ref|NC_010473.1| 573 + 5 6 CG CGC
bottom: gi|170079663|ref|NC_010473.1| 574 - 13 4 CG CGG
with '--discordance 20'. The methylation % difference here is 31%, so this CpG would go into the 'discordant_CpG_evidence.cov' file. CpG positions for which either the top or bottom strand was not measured at all will not be assessed for discordance and hence appear in the regular 'merged_CpG_evidence.cov' file. Is this what you were after?
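The 31% figure can be checked by hand. The sketch below is plain arithmetic, not Bismark code. It assumes the two count columns in the example are methylated then unmethylated calls (top strand at position 573: 5 and 6; bottom strand at position 574: 13 and 4), and that "above the cutoff" means strictly greater than; `percent_methylation` is a helper name made up for illustration.

```python
def percent_methylation(meth: int, unmeth: int) -> float:
    """Percent of calls at one cytosine that were methylated."""
    return 100.0 * meth / (meth + unmeth)


top = percent_methylation(5, 6)       # top strand, position 573
bottom = percent_methylation(13, 4)   # bottom strand, position 574
difference = abs(top - bottom)
threshold = 20                        # the value given as '--discordance 20'

print(f"top {top:.1f}%  bottom {bottom:.1f}%  difference {difference:.1f}%")
print("discordant" if difference > threshold else "merged")
# prints:
# top 45.5%  bottom 76.5%  difference 31.0%
# discordant
```

Because the measure is an absolute difference, it comes out the same if the two count columns are read the other way round.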
Brilliant, thanks!
…On Thu, Jul 13, 2017 at 4:09 PM, FelixKrueger ***@***.***> wrote:
I have now added a new option --discordance <int> to allow filtering for
discordance:
When in '--merge_CpG' mode, apply a filter for the maximum allowed
discordance between top and bottom strand methylation values expressed as
the absolute difference in percent methylation. Discordant CpGs are written
to a file called 'discordant_CpG_evidence.cov' (not merged). As example
consider:
top: gi|170079663|ref|NC_010473.1| 573 + 5 6 CG CGC
bottom: gi|170079663|ref|NC_010473.1| 574 - 13 4 CG CGG
with '--discordance 20'. The methylation % difference here is 31%, so the
read would go into the discordant.cov file. CpG positions for which either
the top or bottom strand was not measured at all will not be assessed for
discordance and hence appear in the regular 'merged_CpG_evidence.cov' file.
Is this what you were after?
[image: image]
<https://user-images.githubusercontent.com/5870509/28173133-6ba3f560-67e5-11e7-8f55-6bbd17e78e20.png>
Hi,
I am looking at the numbers of methylated and unmethylated Cs reported in the bismark.cov.gz file after running bismark_methylation_extractor. My sample prep sequences 2 of the 4 possible strands (CTOT and CTOB), and I noticed that the methylation counts look a bit odd when there are 2 CGs next to each other, e.g. 'CGCG', or when there is a 'CGCGC' pattern.
These counts look odd in IGV (coloured blue/red in the screenshot), which reports a shift from 40/60% to 20/80% when reading the counts from left to right.
Is this a counting artifact of going through a CGCG pattern and counting the G from the minus strand (position 2) also as the C state of the next base (position 3)? Assuming the correct value is one of the two, e.g. either 40/60 or 20/80 in the example, is there a rule to filter out cases like these when post-processing the bismark.cov.gz file, or by using another of the files output by bismark_methylation_extractor?
Thanks in advance,
A.