Alignment Record sort is not consistent with samtools #1504

Closed
akmorrow13 opened this issue Apr 25, 2017 · 4 comments

@akmorrow13 (Contributor) commented Apr 25, 2017

I sorted the platinum trio files using ADAM and saved each as a single-file BAM. When generating an index with samtools (samtools index NA12891.bam), I get the following error for some of the files:
NO_COOR reads not in a single block at the end 0 -1
This means I have to re-sort those files with samtools before I can index them.
ADAM's sort order is consistent with samtools for NA12878, but not for NA12891 and NA12892.
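
The error says that reads without coordinates were found somewhere other than in one contiguous block at the end of the file. A minimal sketch of that invariant follows. It assumes each read has been reduced to a hypothetical `(ref_index, position)` pair, with `ref_index == -1` for reads that have no coordinate. It illustrates the ordering rule only and is not how samtools performs the check.

```python
from typing import Iterable, Tuple


def unplaced_reads_at_end(reads: Iterable[Tuple[int, int]]) -> bool:
    """Return True if every read with a negative ref_index follows all placed reads.

    `reads` is a hypothetical stream of (ref_index, position) pairs in file order.
    """
    seen_unplaced = False
    for ref_index, _position in reads:
        if ref_index < 0:
            # From here on, only unplaced reads are allowed.
            seen_unplaced = True
        elif seen_unplaced:
            # A placed read appears after an unplaced one, so the
            # unplaced reads are not in a single block at the end.
            return False
    return True
```

A file that fails this check would need to be re-sorted before indexing.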

@fnothaft (Member) commented Apr 25, 2017

That is very odd. What's the exact command you used to sort?

@akmorrow13 (Contributor, Author) commented Apr 26, 2017

./bin/adam-submit --master yarn-client --num-executors 12 --executor-cores 8 --executor-memory 20G -- transform -sort_reads -single /data/platinum/adam/NA12891_S1.bam.adam /data/platinum/alignments_sorted/NA12891_S1.sorted.bam

@fnothaft (Member) commented Jul 11, 2017

Checking this now.

@fnothaft fnothaft self-assigned this Jul 13, 2017
@fnothaft fnothaft added the bug label Jul 13, 2017
@fnothaft fnothaft added this to the 0.23.0 milestone Jul 13, 2017
@fnothaft (Member) commented Jul 13, 2017

Sigh. This is A Thing.

fnothaft added a commit to fnothaft/adam that referenced this issue Jul 13, 2017
Resolves bigdatagenomics#1504. To handle unmapped reads, which must go at the end of the file
after all aligned reads, we assign them a sequence index higher than the highest
index of any aligned read. However, so that not all unmapped reads land on the same
sequence index, we randomize using the hash code of the unmapped read's name.
With the prior logic, reads whose names yielded negative hash codes would
occasionally get indices that were valid contig indices.
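
The failure mode described in the commit message can be sketched as follows. The thread does not show the prior or the fixed ADAM logic, so `naive_unmapped_index` and `safe_unmapped_index` are hypothetical stand-ins that only show how a negative hash code can produce a valid contig index and one way to rule that out. `java_string_hash` emulates the JVM's `String.hashCode` for names made of basic (BMP) characters.

```python
def java_string_hash(s: str) -> int:
    """Emulate Java's String.hashCode as a signed 32-bit integer.

    Assumes every character is in the Basic Multilingual Plane, so one
    Python character corresponds to one UTF-16 code unit.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # wrap to 32 bits like a Java int
    return h - 0x100000000 if h >= 0x80000000 else h


def naive_unmapped_index(hash_code: int, num_contigs: int) -> int:
    """Hypothetical buggy scheme: offset the contig count by the raw hash.

    A negative hash code can pull the result back below num_contigs.
    """
    return num_contigs + hash_code


def safe_unmapped_index(hash_code: int, num_contigs: int) -> int:
    """Hypothetical fix: clear the sign bit so the offset is never negative.

    Python integers do not overflow; on the JVM the sum would also need
    guarding against Int overflow.
    """
    return num_contigs + (hash_code & 0x7FFFFFFF)
```

For example, with 25 contigs and a hash code of -3, the naive scheme returns 22, which is a valid contig index, while the masked scheme always returns a value of at least 25.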
fnothaft added a commit to fnothaft/adam that referenced this issue Jul 17, 2017
heuermh added a commit that referenced this issue Jul 17, 2017
@heuermh heuermh added this to Completed in Release 0.23.0 Jan 4, 2018