Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

possible bug for bismark unique pairs mapping #105

Closed
KeyingLu opened this issue May 3, 2017 · 6 comments
Closed

possible bug for bismark unique pairs mapping #105

KeyingLu opened this issue May 3, 2017 · 6 comments

Comments

@KeyingLu
Copy link

KeyingLu commented May 3, 2017

Hi,
There have two reads that are complementary.
fastq1:
@NB500998:30:HVJMKBGXY:4:12502:2281:6197 1:N:0:GTAGAGGA
ACGTAAATGTAGTCGTGATATTGGAAGCGGTAAGTAGGAATTTAGGGAGCGTTGTAAATGT
+
AAA/AEEEE/AEEAEAEEEEEEEEEEEEEEEEEAEAEEEEEEEEEE<EEE<E<EA/EEEEA
fastq2:
@NB500998:30:HVJMKBGXY:4:12502:2281:6197 2:N:0:GTAGAGGA
ACATTTACAACGCTCCCTAAATTCCTACTTACCGCTTCCAATATCACGACTACATTTACGT
+
AAAAAEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE

I use bismark to mapping the reads to reference and get the following results:
NB500998:30:HVJMKBGXY:4:12502:2281:6197_1:N:0:GTAGAGGA 83 chr2 356526 23 61M =356475 -112 ACATTTACAACGCTCCCTAAATTCCTACTTACCGCTTCCAATATCACGACTACATTTACGT AEEEE/AE<E<EEE<EEEEEEEEEEAEAEEEEEEEEEEEEEEEEEAEAEEA/EEEEA/AAA NM:i:8 MD:Z:0G7G0T20G8G0G1G8G9 XM:Z:x.......z..Z..................h..Z.....zx.h....Z...x.......Z. XR:Z:CT XG:Z:GA
NB500998:30:HVJMKBGXY:4:12502:2281:6197_1:N:0:GTAGAGGA 163 chr2 356475 23 61M =356526 112 ACATTTACAACGCTCCCTAAATTCCTACTTACCGCTTCCAATATCACGACTACATTTACGT AAAAAEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE NM:i:8 MD:Z:0G7G0T20G8G0G1G8G9 XM:Z:x.......z..Z..................h..Z.....zx.h....Z...x.......Z. XR:Z:GA XG:Z:GA
The two reads are supposed to be mapped to the same locus, but the results show that the two reads are mapped to different locus. That is to say, the reads are non-unique pairs but bismark reports unique mapping.

And I use bsmap to mapping the reads , I got the results of non-unique pairs.I don't have any idea why bismark occur the result.

Thank you in advance!

@FelixKrueger
Copy link
Owner

Thanks for reporting this, I'll look into it later this morning. Just as a quick question, which version of Bismark were you using?

@KeyingLu
Copy link
Author

KeyingLu commented May 3, 2017

bismark_v0.17.0

@FelixKrueger
Copy link
Owner

Another quick question: which genome are we talking about? Thanks.

@KeyingLu
Copy link
Author

KeyingLu commented May 3, 2017

reference genome is hg19

@FelixKrueger
Copy link
Owner

Thanks for this most exquisite corner case! The region you are looking at has a perfect repetition of the following sequence right next to each other:

2 dna:chromosome chromosome:GRCh38:2:356474:356587:1
TGCATTTACGTCGCTCCCTAAATTCCTACTTGCCGCTTCCGGTGTCACGACT GCATTTACGTCGCTCCCTAAATTCCTACTTGCCGCTTCCGGTGTCACGACTGCATTTACGTC

While single-end alignments were not reported at because the sequence has 2 identical alignments close to each other, paired-end alignments were reported because of a probably very rare corner case that can only happen when sequences are perfectly duplicated and still within a valid paired-end distance.

The short version:

I have now changed the way these situations are handled in this commit: e0f9bd2. Please clone the latest development version and try that.

A somewhat longer version:

In this case both R1 and R2 could align to 2 different places in the genome and were therefore not reported in single-end mode. In paired-end mode however, R1 placed at a location that would allow R2 to have two alignments, the first one getting a primary alignment score of AS:i:-6 and a secondary alignment of XS:i:-6. But because R1 was not reported by Bowtie2 to have another alignment location (i.e. no XS:i: tag at all) with this R2 (probably because the alignment would then have been discordant then; maybe a design choice of Bowtie2?), the alignment was not considered ambiguous within it's own alignment thread as it would have been the case for any other secondary alignment, e.g. on another chromosome.
To fix this behaviour we are now assigning the same alignment score of the best alignment to the second best (and so far undefined) alignment score if the partner read has a secondary alignment at all. And this seems to do the trick...

In a nutshell: This issue should now be fixed, but I would assume that we were looking at a very rare corner case here. Thanks for bringing it to my attention nevertheless! Best wishes, Felix

@KeyingLu
Copy link
Author

KeyingLu commented May 4, 2017

I try it again, and the problem has been solved.
Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants