Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Mapped mate should have mate reference name" problem #1

Closed
drchriscole opened this issue Jul 28, 2015 · 15 comments
Closed

"Mapped mate should have mate reference name" problem #1

drchriscole opened this issue Jul 28, 2015 · 15 comments

Comments

@drchriscole
Copy link

Hi,

Really liking your tool, it should make submitting my BAMs to ENA much simpler. However, I'm getting the error above and have exactly the same features 'id0' does in the SeqAnswers thread. e.g. offending read is only in unmapped.bam and the MAPQ is changed from 255 to 0 by your script.

I'm not using any special settings in TopHat (v2.0.9), only -i to adjust intron size.
Do you have any suggestions for getting around it, please?
Thanks,

Chris

@cbrueffer
Copy link
Owner

Hi Chris, could you post a bit more information about the problematic read, e.g. samtools view unmapped.bam | grep name-of-he-read
Is the mate of the problematic read mapped or unmapped? If the former, could you provide the same command as above for the accepted_hits.bam file?

@drchriscole
Copy link
Author

This is error I get with Picardtools SortSam command:
Exception in thread "main" net.sf.samtools.SAMFormatException: SAM validation error: ERROR: Record 40355112, Read name HWI-ST1398:69:C26EMACXX:8:1315:6828:52823, Mapped mate should have mate reference name

$ samtools view unmapped.bam | grep HWI-ST1398:69:C26EMACXX:8:1315:6828:52823
HWI-ST1398:69:C26EMACXX:8:1315:6828:52823 133 * 0 255 * * 0 0 AAAAAATCCAACCATACTAATACATTTCCTATAAAATAAACTTGTTGATTTATCATTTGTTAATAAATTATTTATTACATCTTTTTGCTCTTGTTCATCTA BC@FFFFFHHHHHJJJJJJJJJJIJJJJJJJJJJJJJJJIJJJJIJJJIJGHJJJJJJJJJIJJJJJJJJJJJJJJJJHHHHHEHFFFFEEEEEEEEEEDD
$ samtools view unmapped_fixup.bam | grep HWI-ST1398:69:C26EMACXX:8:1315:6828:52823HWI-ST1398:69:C26EMACXX:8:1315:6828:52823 133 * 0 0 * * 0 0 AAAAAATCCAACCATACTAATACATTTCCTATAAAATAAACTTGTTGATTTATCATTTGTTAATAAATTATTTATTACATCTTTTTGCTCTTGTTCATCTA BC@FFFFFHHHHHJJJJJJJJJJIJJJJJJJJJJJJJJJIJJJJIJJJIJGHJJJJJJJJJIJJJJJJJJJJJJJJJJHHHHHEHFFFFEEEEEEEEEEDD
$ samtools view accepted_hits.bam | grep HWI-ST1398:69:C26EMACXX:8:1315:6828:52823

@cbrueffer
Copy link
Owner

The problem appears to be that this is paired-end data (read paired / second in pair flags set), however the paired read is nowhere to be found. Picard looks for the paired read using the same read name, can't find it, and concludes it must have a different name, hence the error message. Are both reads present in the original FASTQ files? If yes, TopHat seems to throw the paired read away for some reason.

Basically there are two things one could try: 1. run Picard with VALIDATION_STRINGENCY=LENIENT, or 2. remove the read paired / second in pair flags so Picard doesn't think the read is paired-end and hence doesn't look for it's partner.

@drchriscole
Copy link
Author

Yes, I see them in the FASTQs.

So, this isn't something you've seen before and been able to deal with then? It's wierd that TopHat is discarding reads.

@cbrueffer
Copy link
Owner

I think I've seen reports of this in the Tuxedo-suite Google group, but I've never encountered it myself.

I could add some code to deal with this case the way suggested above (adjusting the flags), but I won't have time to do it until next week (and it's going to be fairly expensive, since there's no quick way to see which reads are affected). Could you test running Picard with VALIDATION_STRINGENCY=LENIENT, to see if that works first?

@drchriscole
Copy link
Author

With the LENIENT filter I see >1 million entries, but not for all runs: ~50%
I note this entry for TopHat v2.0.11 (I'm using 2.0.9):
"Improved support for adding unpaired reads to PE reads in the same TopHat2 run [..] This includes reporting separate counts for the additional unpaired reads and making sure that the SAM flags in the output files reflect the paired or unpaired origin of the reads."
So, I'll try a newer version and see.
Thanks for your help and suggestions.

@cbrueffer
Copy link
Owner

Turns out the approach outlined above was actually quick and easy to implement. Could you please try script from the branch missingreads?

Basically it records the names of the unmapped reads with mapped mate, crosses out the names it can find in the mapped file, and for the remaining ones (the unmapped reads that are marked as having a mapped mate, but the mate can't be found) adjusts the flags.

@drchriscole
Copy link
Author

Thanks very much for doing this.
OK, so I get different, maybe unrelated?, error from Picard: "ERROR: Record 61183083, Read name HWI-ST1398:69:C26EMACXX:8:1101:1135:2241, First of pair flag should not be set for unpaired read."
Grepping in my merged file I see:
HWI-ST1398:69:C26EMACXX:8:1101:1135:2241 68 * 0 0 * * 0 0 CAAACAGCAACAGATGCATCAATGACAGCAGAGGATGTATCAAAGATTGGAGAAGATTATGAGCTATTTTAATNNATACTTGAGGAACCAGGTGTCGATGA ;??DBDBD<D?:<@#########################################################################################
unmapped.bam:
HWI-ST1398:69:C26EMACXX:8:1101:1135:2241 69 * 0 255 * * 0 0 CAAACAGCAACAGATGCATCAATGACAGCAGAGGATGTATCAAAGATTGGAGAAGATTATGAGCTATTTTAATNNATACTTGAGGAACCAGGTGTCGATGA ;??DBDBD<D?:<@#########################################################################################
unmapped_fixup.bam:
HWI-ST1398:69:C26EMACXX:8:1101:1135:2241 68 * 0 0 * * 0 0 CAAACAGCAACAGATGCATCAATGACAGCAGAGGATGTATCAAAGATTGGAGAAGATTATGAGCTATTTTAATNNATACTTGAGGAACCAGGTGTCGATGA ;??DBDBD<D?:<@#########################################################################################

@drchriscole
Copy link
Author

Running with the LENIENT flag reveals lots of reads with this problem.

@cbrueffer
Copy link
Owner

Thanks for testing! Could you test again with the latest version from the branch? This one resets a few more fields; if you still get an error it should be a different one.

@drchriscole
Copy link
Author

Works! No Picard errors.
Excellent, many thanks.

@cbrueffer
Copy link
Owner

Fantastic, thanks for testing! I'll add this to the main script and file a bug report with the TopHat people.

@drchriscole
Copy link
Author

OK. I think this is less of a problem for newer versions of TopHat.

@cbrueffer
Copy link
Owner

I've filed the bug here: https://github.com/infphilo/tophat/issues/16 Could you add there which versions of TopHat you tested?

I'll release a new version of the script today, for your methods.

cbrueffer added a commit that referenced this issue Aug 13, 2015
…te is missing.

Fixes #1 (reported and tested by @drchriscole, thanks!)
@cbrueffer
Copy link
Owner

Hi Chris, sorry for coming back to this, but regarding running Picard with VALIDATION_STRINGENCY=LENIENT you wrote "With the LENIENT filter I see >1 million entries, but not for all runs: ~50%". Could you elaborate on what you saw here? What did these numbers refer to? Did Picard crash or finish?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants