Why does adamMarkDuplicates in AlignmentRecordRDDFunctions not mark an identical read as a duplicate? #1037

Closed
xubo245 opened this Issue May 21, 2016 · 2 comments

xubo245 commented May 21, 2016

The adamMarkDuplicates function in the AlignmentRecordRDDFunctions class does not mark an identical copy of a read as a duplicate.
I copied the first read in reads12.sam (from the adam-core_2.10-0.19.0 test resources), so the file now starts with:

simread:1:26472783:false    16  1   26472784    60  75M *   0   0   GTATAAGAGCAGCCTTATTCCTATTTATAATCAGGGTGAAACACCTGTGCCAATGCCAAGACAGGGGTGCCAAGA *   NM:i:0  AS:i:75 XS:i:0
simread:1:26472783:false    16  1   26472784    60  75M *   0   0   GTATAAGAGCAGCCTTATTCCTATTTATAATCAGGGTGAAACACCTGTGCCAATGCCAAGACAGGGGTGCCAAGA *   NM:i:0  AS:i:75 XS:i:0
simread:1:240997787:true    0   1   240997788   60  75M *   0   0   CTTTATTTTTATTTTTAAGGTTTTTTTTGTTTGTTTGTTTTGAGATGGAGTCTCGCTCCACCGCCCAGACTGGAG *   NM:i:0  AS:i:75 XS:i:39
simread:1:189606653:true    0   1   189606654   60  75M *   0   0   TGTATCTTCCTCCCCTGCTGTATGTTTCCTGCCCTCAAACATCACACTCCACGTTCTTCAGCTTTAGGACTTGGA *   NM:i:0  AS:i:75 XS:i:0
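
To make the link between these SAM lines and the JSON records printed further down easier to follow, here is a minimal Python sketch of the field mapping. It is not ADAM's converter: `parse_sam_line` and `REF_CONSUMING_OPS` are hypothetical names, the sketch splits on any whitespace because the tabs were lost when the lines were pasted, and it only covers the handful of fields relevant to this issue. It relies on three SAM conventions: POS is 1-based, flag bit 0x10 means reverse strand, and a QUAL of `*` means no qualities were stored.

```python
import re

# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING_OPS = set("MDN=X")


def parse_sam_line(line):
    """Map one SAM alignment line to a small dict (illustrative subset of fields)."""
    fields = line.split()
    flag = int(fields[1])
    pos = int(fields[3])  # 1-based leftmost position in SAM
    cigar = fields[5]
    ref_len = sum(
        int(n)
        for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in REF_CONSUMING_OPS
    )
    start = pos - 1  # 0-based start, as in the JSON output
    qual = fields[10]
    return {
        "readName": fields[0],
        "contig": fields[2],
        "start": start,
        "end": start + ref_len,  # exclusive end
        "mapq": int(fields[4]),
        "cigar": cigar,
        "sequence": fields[9],
        "qual": None if qual == "*" else qual,  # '*' means qualities are absent
        "readNegativeStrand": bool(flag & 0x10),
    }


line = (
    "simread:1:26472783:false 16 1 26472784 60 75M * 0 0 "
    "GTATAAGAGCAGCCTTATTCCTATTTATAATCAGGGTGAAACACCTGTGCCAATGCCAAGACAGGGGTGCCAAGA "
    "* NM:i:0 AS:i:75 XS:i:0"
)
record = parse_sam_line(line)
print(record["start"], record["end"], record["readNegativeStrand"], record["qual"])
```

Under these assumptions the copied read maps to start 26472783, end 26472858, reverse strand, and a missing `qual`.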

The file now contains 201 reads in total.
Result after marking duplicates:

marked:201
dups:0
nonDup:201
{"readInFragment": 0, "contig": {"contigName": "1", "contigLength": 249250621, "contigMD5": null, "referenceURL": null, "assembly": null, "species": null, "referenceIndex": 0}, "start": 26472783, "oldPosition": null, "end": 26472858, "mapq": 60, "readName": "simread:1:26472783:false", "sequence": "GTATAAGAGCAGCCTTATTCCTATTTATAATCAGGGTGAAACACCTGTGCCAATGCCAAGACAGGGGTGCCAAGA", "qual": null, "cigar": "75M", "oldCigar": null, "basesTrimmedFromStart": 0, "basesTrimmedFromEnd": 0, "readPaired": false, "properPair": false, "readMapped": true, "mateMapped": false, "failedVendorQualityChecks": false, "duplicateRead": false, "readNegativeStrand": true, "mateNegativeStrand": false, "primaryAlignment": true, "secondaryAlignment": false, "supplementaryAlignment": false, "mismatchingPositions": null, "origQual": null, "attributes": "XS:i:0\tAS:i:75\tNM:i:0", "recordGroupName": "machine foo", "recordGroupSample": null, "mateAlignmentStart": null, "mateAlignmentEnd": null, "mateContig": null, "inferredInsertSize": null}
{"readInFragment": 0, "contig": {"contigName": "1", "contigLength": 249250621, "contigMD5": null, "referenceURL": null, "assembly": null, "species": null, "referenceIndex": 0}, "start": 26472783, "oldPosition": null, "end": 26472858, "mapq": 60, "readName": "simread:1:26472783:false", "sequence": "GTATAAGAGCAGCCTTATTCCTATTTATAATCAGGGTGAAACACCTGTGCCAATGCCAAGACAGGGGTGCCAAGA", "qual": null, "cigar": "75M", "oldCigar": null, "basesTrimmedFromStart": 0, "basesTrimmedFromEnd": 0, "readPaired": false, "properPair": false, "readMapped": true, "mateMapped": false, "failedVendorQualityChecks": false, "duplicateRead": false, "readNegativeStrand": true, "mateNegativeStrand": false, "primaryAlignment": true, "secondaryAlignment": false, "supplementaryAlignment": false, "mismatchingPositions": null, "origQual": null, "attributes": "XS:i:0\tAS:i:75\tNM:i:0", "recordGroupName": "machine foo", "recordGroupSample": null, "mateAlignmentStart": null, "mateAlignmentEnd": null, "mateContig": null, "inferredInsertSize": null}
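
For reference, counts like `marked`, `dups`, and `nonDup` can be recomputed from records printed in this JSON form. The following is a small Python sketch, assuming one JSON object per line; `summarize_duplicates` is a hypothetical helper and the two sample records are abbreviated stand-ins for the full records above, not ADAM output.

```python
import json


def summarize_duplicates(json_lines):
    """Count the records and how many of them have duplicateRead set to true."""
    marked = 0
    dups = 0
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        marked += 1
        if record.get("duplicateRead"):
            dups += 1
    return {"marked": marked, "dups": dups, "nonDup": marked - dups}


# Abbreviated, hypothetical records in the same shape as the output above.
sample = [
    '{"readName": "simread:1:26472783:false", "start": 26472783, "duplicateRead": false}',
    '{"readName": "simread:1:26472783:false", "start": 26472783, "duplicateRead": false}',
]
summary = summarize_duplicates(sample)
print(summary)
```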

xubo245 commented May 21, 2016

I found the reason: the two reads have the same readName, so both are marked duplicateRead: false. Shouldn't they be distinguished?
I tried giving the two reads different names, but that raised a null pointer error because the reads have no qual values.
After adding qual values, the result is as expected:
···
marked:201
dups:1
nonDup:200
{"readInFragment": 0, "contig": {"contigName": "1", "contigLength": 249250621, "contigMD5": null, "referenceURL": null, "assembly": null, "species": null, "referenceIndex": 0}, "start": 26472783, "oldPosition": null, "end": 26472858, "mapq": 60, "readName": "simread:1:264727832:false", "sequence": "GTATAAGAGCAGCCTTATTCCTATTTATAATCAGGGTGAAACACCTGTGCCAATGCCAAGACAGGGGTGCCAAGA", "qual": "GTTTAAGAGCAGCCTTATTCCTATTTATAATCAGGGTGAAACACCTGTGCCAATGCCAAGACAGGGGTGCCAAGA", "cigar": "75M", "oldCigar": null, "basesTrimmedFromStart": 0, "basesTrimmedFromEnd": 0, "readPaired": false, "properPair": false, "readMapped": true, "mateMapped": false, "failedVendorQualityChecks": false, "duplicateRead": false, "readNegativeStrand": true, "mateNegativeStrand": false, "primaryAlignment": true, "secondaryAlignment": false, "supplementaryAlignment": false, "mismatchingPositions": null, "origQual": null, "attributes": "XS:i:0\tAS:i:75\tNM:i:0", "recordGroupName": "machine foo", "recordGroupSample": null, "mateAlignmentStart": null, "mateAlignmentEnd": null, "mateContig": null, "inferredInsertSize": null}
{"readInFragment": 0, "contig": {"contigName": "1", "contigLength": 249250621, "contigMD5": null, "referenceURL": null, "assembly": null, "species": null, "referenceIndex": 0}, "start": 26472783, "oldPosition": null, "end": 26472858, "mapq": 60, "readName": "simread:1:26472783:false", "sequence": "GTATAAGAGCAGCCTTATTCCTATTTATAATCAGGGTGAAACACCTGTGCCAATGCCAAGACAGGGGTGCCAAGA", "qual": "GTATAAGAGCAGCCTTATTCCTATTTATAATCAGGGTGAAACACCTGTGCCAATGCCAAGACAGGGGTGCCAAGA", "cigar": "75M", "oldCigar": null, "basesTrimmedFromStart": 0, "basesTrimmedFromEnd": 0, "readPaired": false, "properPair": false, "readMapped": true, "mateMapped": false, "failedVendorQualityChecks": false, "duplicateRead": true, "readNegativeStrand": true, "mateNegativeStrand": false, "primaryAlignment": true, "secondaryAlignment": false, "supplementaryAlignment": false, "mismatchingPositions": null, "origQual": null, "attributes": "XS:i:0\tAS:i:75\tNM:i:0", "recordGroupName": "machine foo", "recordGroupSample": null, "mateAlignmentStart": null, "mateAlignmentEnd": null, "mateContig": null, "inferredInsertSize": null}

···
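
The three behaviours seen in this thread (no duplicates for identical names, a failure when `qual` is missing, and one duplicate once `qual` is present) fit a simple model of duplicate marking. The Python sketch below is that model only, not ADAM's implementation. It assumes reads are first bucketed by `readName`, buckets are grouped by contig, 5' position, and strand, and the bucket with the highest sum of Phred+33 qualities at or above 15 is kept. The threshold, the function names, and the short `qual` strings are all illustrative assumptions, and clipping and record groups are ignored.

```python
from collections import defaultdict

MIN_QUAL = 15  # assumed threshold for qualities that count toward the score


def score(read):
    """Sum of Phred+33 base qualities at or above MIN_QUAL."""
    if read["qual"] is None:
        # Stand-in for the null pointer error: no qualities, no score.
        raise ValueError(f"read {read['readName']} has no qual string")
    return sum(q for q in (ord(c) - 33 for c in read["qual"]) if q >= MIN_QUAL)


def five_prime(read):
    """5' position, ignoring clipping: reverse-strand reads are keyed by their end."""
    return read["end"] if read["readNegativeStrand"] else read["start"]


def mark_duplicates(reads):
    # Step 1: reads sharing a name land in one bucket and are never compared.
    buckets = defaultdict(list)
    for read in reads:
        buckets[read["readName"]].append(read)
    # Step 2: group buckets that start at the same place on the same strand.
    groups = defaultdict(list)
    for members in buckets.values():
        first = members[0]
        key = (first["contig"], five_prime(first), first["readNegativeStrand"])
        groups[key].append(members)
    # Step 3: within a group, keep the best-scoring bucket and flag the rest.
    for candidates in groups.values():
        for members in candidates:
            for read in members:
                read["duplicateRead"] = False
        if len(candidates) < 2:
            continue  # nothing to compare, so scores are never computed
        ranked = sorted(candidates, key=lambda ms: sum(score(r) for r in ms), reverse=True)
        for members in ranked[1:]:
            for read in members:
                read["duplicateRead"] = True
    return reads


def make_read(name, qual):
    """Hypothetical read at the position used in this issue."""
    return {"readName": name, "contig": "1", "start": 26472783, "end": 26472858,
            "readNegativeStrand": True, "qual": qual, "duplicateRead": False}


same_name = mark_duplicates([make_read("a", None), make_read("a", None)])
with_qual = mark_duplicates([make_read("a", "GTTT"), make_read("b", "GTAT")])
print([r["duplicateRead"] for r in same_name])
print([r["duplicateRead"] for r in with_qual])
```

In this model, two reads with the same name are never scored against each other, which would explain `dups:0` without any error. Renaming one read forces a comparison, which fails without `qual` and succeeds with it.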

fnothaft (Member) commented Jul 6, 2016

Closing as resolved.

@fnothaft fnothaft closed this Jul 6, 2016
