bismark.cov.gz file entries for concatenated CGCG or overlapping CGCGC patterns #91

Closed
avilella opened this issue Feb 9, 2017 · 8 comments


@avilella

avilella commented Feb 9, 2017

Hi,

I am looking at the numbers of methylated and unmethylated Cs reported in the bismark.cov.gz file after running bismark_methylation_extractor. My sample prep sequences 2 of the 4 possible strands (CTOT and CTOB), and I noticed that the methylation counts look a bit odd when there are two CGs next to each other, e.g. 'CGCG', or when there is a 'CGCGC' pattern.

These counts look odd in IGV (coloured blue/red in screenshot), which reports a shift from 40/60% to 20/80% when looking at the counts from left to right.

Is this a counting artifact of going through a CGCG pattern and counting the G from the minus strand (position 2) also as the C state of the next base (position 3)? Assuming the correct value is one of the two, e.g. either 40/60 or 20/80 in the example, is there a rule to filter out cases like these when post-processing the bismark.cov.gz file? Or by using another of the files output by bismark_methylation_extractor?

Thx in advance,

A.

[Image: screen shot 2017-02-09 at 15 06 25]

@FelixKrueger
Owner

Hi Albert,
The two cytosines on the top strand come from the CTOT strand while the Gs were measured on the CTOB strand, and these events are completely independent of each other. Not quite sure if I am interpreting the colours wrong, but is there also a huge difference in coverage between the top and bottom strands (the numbers in the little pop out windows seem to suggest that the total calls are quite comparable)?

Unless you have been playing with different genome browsers (e.g. Ensembl and then UCSC) there should be no counting artefacts from adjacent positions. I guess you could process the data further to CpG methylation reports using coverage2cytosine and then process this file to look for discrepancies at adjacent top and bottom strand Cs. coverage2cytosine already has an option --merge_CpG which merges the top and bottom strand evidence together to form a single CpG dinucleotide entity. I guess it should be very straightforward to add another check to see whether the two agree with each other and, if not, just boot the position altogether. Would that be useful to you?
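As a rough illustration of the pairing described above, here is a minimal Python sketch, not part of Bismark. It assumes a tab-separated CpG report in the coverage2cytosine layout (chromosome, position, strand, count methylated, count unmethylated, context, trinucleotide context), sorted by position and restricted to CpG context. The names `Call`, `parse_report` and `pair_cpg_strands` are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple


class Call(NamedTuple):
    chrom: str
    pos: int
    strand: str
    meth: int
    unmeth: int


def parse_report(lines: Iterable[str]) -> Iterator[Call]:
    """Parse tab-separated CpG report lines (assumed layout, see lead-in)."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        yield Call(fields[0], int(fields[1]), fields[2],
                   int(fields[3]), int(fields[4]))


def pair_cpg_strands(calls: Iterable[Call]) -> Iterator[Tuple[Call, Call]]:
    """Yield (top, bottom) pairs: a '+' call at position p followed
    directly by a '-' call at p + 1 on the same chromosome."""
    prev: Optional[Call] = None
    for call in calls:
        if (prev is not None
                and prev.strand == "+" and call.strand == "-"
                and call.chrom == prev.chrom
                and call.pos == prev.pos + 1):
            yield prev, call
        prev = call


# A 'CGCG' stretch at positions 10-13 (made-up counts for illustration).
example = [
    "chr1\t10\t+\t4\t6\tCG\tCGC",
    "chr1\t11\t-\t2\t8\tCG\tCGC",
    "chr1\t12\t+\t5\t5\tCG\tCGA",
    "chr1\t13\t-\t6\t4\tCG\tCGC",
]
pairs = list(pair_cpg_strands(parse_report(example)))
```

In this sketch the G at position 11 is only ever paired with the C at position 10, never with the C at position 12, so each CpG dinucleotide keeps its own top and bottom strand evidence. The resulting pairs could then be compared for agreement.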

@avilella
Author

avilella commented Feb 10, 2017 via email

@FelixKrueger
Owner

That should be fairly straightforward to implement. I may take a look when I am back from skiing...

@avilella
Author

96539

@FelixKrueger
Owner

I am now back from skiing, is this still something you are interested in?

@avilella
Author

avilella commented Jul 13, 2017 via email

@FelixKrueger
Owner

I have now added a new option --discordance <int> to allow filtering for discordance:

When in '--merge_CpG' mode, apply a filter for the maximum allowed discordance between top and bottom strand methylation values, expressed as the absolute difference in percent methylation. Discordant CpGs are written to a file called 'discordant_CpG_evidence.cov' (not merged). As an example, consider:

 top:     gi|170079663|ref|NC_010473.1|   573     +       5       6       CG      CGC
 bottom:  gi|170079663|ref|NC_010473.1|   574     -       13      4       CG      CGG 

with '--discordance 20'. The top strand is 5/11 = 45% methylated and the bottom strand is 13/17 = 76% methylated, so the methylation % difference here is 31% and the CpG would go into the 'discordant_CpG_evidence.cov' file. CpG positions for which either the top or bottom strand was not measured at all will not be assessed for discordance and hence appear in the regular 'merged_CpG_evidence.cov' file.
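The arithmetic behind this example can be checked with a small Python sketch. This is an illustration only, not the coverage2cytosine code: the function names are hypothetical, and treating "difference strictly greater than the threshold" as discordant is an assumption, since the text above does not say how a difference exactly equal to the threshold is handled.

```python
from typing import Optional, Tuple

Counts = Tuple[int, int]  # (count methylated, count unmethylated)


def percent_methylation(meth: int, unmeth: int) -> Optional[float]:
    """Percent methylation, or None if the position was not measured."""
    total = meth + unmeth
    if total == 0:
        return None
    return 100.0 * meth / total


def is_discordant(top: Counts, bottom: Counts, max_diff: float) -> bool:
    """True if both strands were measured and their percent methylation
    differs by more than max_diff (assumed strict comparison)."""
    top_pct = percent_methylation(*top)
    bottom_pct = percent_methylation(*bottom)
    if top_pct is None or bottom_pct is None:
        return False  # unmeasured strand: not assessed for discordance
    return abs(top_pct - bottom_pct) > max_diff


# Counts from the example above: top 5/6, bottom 13/4.
top, bottom = (5, 6), (13, 4)
diff = abs(percent_methylation(*top) - percent_methylation(*bottom))
flagged = is_discordant(top, bottom, max_diff=20)
```

Here `diff` comes out at roughly 31 percentage points, so with a threshold of 20 the position is flagged, while a position with one unmeasured strand is never flagged.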

Is this what you were after?


@avilella
Author

avilella commented Jul 13, 2017 via email
