Alignment Record sort is not consistent with samtools #1504

Closed
akmorrow13 opened this Issue Apr 25, 2017 · 4 comments

2 participants
@akmorrow13
Contributor

akmorrow13 commented Apr 25, 2017

I sorted the platinum trio files using ADAM and saved each one as a single-file BAM. When I generate an index with samtools (samtools index NA12891.bam), some of the files fail with the following error:
NO_COOR reads not in a single block at the end 0 -1
This means I have to re-sort those files with samtools before I can index them.
The ADAM sort order is accepted by samtools for NA12878, but not for NA12891 and NA12892.
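
For readers unfamiliar with the error: the message says that reads without coordinates (unmapped, unplaced reads) must form one contiguous block at the end of a coordinate-sorted BAM. The sketch below checks that invariant over a list of reference indices in file order. It is illustrative only: the class and method names are hypothetical, -1 stands for "no reference" as in the BAM format, and reading the trailing `0 -1` as "current reference index 0, previous reference index -1" is an assumption, not something stated in this thread.

```java
final class NoCoorBlockCheck {
    /**
     * Returns the position of the first placed read that follows an unplaced
     * read, or -1 if all unplaced reads sit in a single block at the end.
     * A negative reference index marks a read with no coordinate.
     */
    static int firstViolation(int[] refIndices) {
        boolean seenUnplaced = false;
        for (int i = 0; i < refIndices.length; i++) {
            if (refIndices[i] < 0) {
                seenUnplaced = true;      // entered the no-coordinate block
            } else if (seenUnplaced) {
                return i;                 // a placed read after an unplaced one
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] ok = {0, 0, 1, 2, -1, -1};      // unplaced reads only at the end
        int[] broken = {0, -1, 0, 1, -1};     // contig 0 appears after a -1
        System.out.println(firstViolation(ok));      // prints -1
        System.out.println(firstViolation(broken));  // prints 2
    }
}
```

In the `broken` case the offending read has reference index 0 and its predecessor has -1, which would match the `0 -1` in the report under the assumption above.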

Member

fnothaft commented Apr 25, 2017

That is very odd. What's the exact command you used to sort?

Contributor

akmorrow13 commented Apr 26, 2017

./bin/adam-submit --master yarn-client --num-executors 12 --executor-cores 8 --executor-memory 20G -- transform -sort_reads -single /data/platinum/adam/NA12891_S1.bam.adam /data/platinum/alignments_sorted/NA12891_S1.sorted.bam

Member

fnothaft commented Jul 11, 2017

Checking this now.

@fnothaft fnothaft self-assigned this Jul 13, 2017

@fnothaft fnothaft added the bug label Jul 13, 2017

@fnothaft fnothaft added this to the 0.23.0 milestone Jul 13, 2017

Member

fnothaft commented Jul 13, 2017

Sigh. This is A Thing.

fnothaft added a commit to fnothaft/adam that referenced this issue Jul 13, 2017

[ADAM-1504] Fix transient sorting bug when sorting reads in index mode.
Resolves bigdatagenomics#1504. To handle unmapped reads---which must go at the end of the file,
after all aligned reads---we assign them to a sequence index higher than the
index of the highest read. However, to not put all reads at the same sequence
index, we randomize using the hash code of the read name of the unmapped read.
With the prior logic, reads whose names yielded negative hash codes would
occasionally get indices that were valid contig indices.
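
The failure mode described in this commit message can be reproduced in a few lines. The sketch below is not ADAM's code: the class, the method names, and the `spread` parameter are hypothetical, and the "before" formula is only one plausible way a negative hash could land on a valid contig index. It relies on the fact that Java's `%` keeps the sign of the dividend, while `Math.floorMod` returns a value in `[0, spread)` for a positive `spread`.

```java
final class UnmappedSortKey {
    /**
     * Hypothetical "before": offsets the key past the last contig using the
     * raw remainder. A negative hash gives a negative remainder, so the key
     * can fall back into the range of valid contig indices.
     */
    static int naiveKey(int maxContigIndex, int nameHash, int spread) {
        return maxContigIndex + 1 + (nameHash % spread);
    }

    /**
     * Hypothetical "after": floorMod keeps the offset in [0, spread), so the
     * key is always strictly greater than maxContigIndex.
     */
    static int safeKey(int maxContigIndex, int nameHash, int spread) {
        return maxContigIndex + 1 + Math.floorMod(nameHash, spread);
    }

    public static void main(String[] args) {
        int maxContigIndex = 24;  // assumed: contigs are indexed 0..24
        int spread = 1000;        // assumed: number of slots for unmapped reads
        int nameHash = -7;        // stands in for a read name with a negative hashCode()

        System.out.println(naiveKey(maxContigIndex, nameHash, spread)); // prints 18
        System.out.println(safeKey(maxContigIndex, nameHash, spread));  // prints 1018
    }
}
```

With the naive key, the unmapped read sorts as if it belonged to contig 18, ahead of reads on contigs 19 to 24. This is the kind of placement that leaves unmapped reads outside the trailing block.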

fnothaft added a commit to fnothaft/adam that referenced this issue Jul 17, 2017

[ADAM-1504] Fix transient sorting bug when sorting reads in index mode.

heuermh added a commit that referenced this issue Jul 17, 2017

[ADAM-1504] Fix transient sorting bug when sorting reads in index mode.

@heuermh heuermh added this to Completed in Release 0.23.0 Jan 4, 2018
