New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose sorting by reference index #952

Closed
fnothaft opened this Issue Feb 20, 2016 · 6 comments

Comments

Projects
None yet
3 participants
@fnothaft
Member

fnothaft commented Feb 20, 2016

See #823; we don't currently expose sorting by index on the CLI, though.

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Feb 24, 2016

Member

Can this be pushed off to 0.20.0?

Member

heuermh commented Feb 24, 2016

Can this be pushed off to 0.20.0?

@heuermh heuermh modified the milestones: 0.20.0, 0.19.0 Feb 24, 2016

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft May 18, 2016

Member

@heuermh or @jpdna, can one of you take this? It should be straightforward, let's discuss on the call today.

Member

fnothaft commented May 18, 2016

@heuermh or @jpdna, can one of you take this? It should be straightforward, let's discuss on the call today.

@jpdna

This comment has been minimized.

Show comment
Hide comment
@jpdna

jpdna May 18, 2016

Member

This does interest me givenmy general interest in sorting/index type stuff. I'm happy to take it unless someone else wants to.

Member

jpdna commented May 18, 2016

This does interest me givenmy general interest in sorting/index type stuff. I'm happy to take it unless someone else wants to.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft May 18, 2016

Member

Thanks @jpdna! I've assigned it to you.

Member

fnothaft commented May 18, 2016

Thanks @jpdna! I've assigned it to you.

@jpdna

This comment has been minimized.

Show comment
Hide comment
@jpdna

jpdna Jun 6, 2016

Member

As I interpret it, the goal of this issue is to add a CLI option flag --sort_orig_ref_order to be included when running ADAM2Vcf or when outputting a sam/bam file using transform

The result would be a vcf or sam/bam file with records sorted first by the referenceIndex of the contig, and then by genomic position. The header sequence dictionary rows will also match the referenceIndex ordering.

Is this correct?

Question: I am not clear as to whether indeed we do wish for this command to also sort by genomic position within the reference/contig groups. It is possible the original sam/bam or vcf was not sorted by position, however we have no way to recover the original order, and there is not a guarantee the order ADAM would output would be the same as the original vcf or bam.

Also - I need to look at code to understand more about the cases where
a) the contigs came in lex sorted, but don't appear in the original header
b) the contigs header metadata is present, defining a sort order but its not lexographic
c) contigs come in grouped by records, but not lexographically, and no header

As long a SequenceDictionary exists with contigs with referenceIndex, I guess the above a,b,c is upstream from concern in this issue, but just wanted to check.

Member

jpdna commented Jun 6, 2016

As I interpret it, the goal of this issue is to add a CLI option flag --sort_orig_ref_order to be included when running ADAM2Vcf or when outputting a sam/bam file using transform

The result would be a vcf or sam/bam file with records sorted first by the referenceIndex of the contig, and then by genomic position. The header sequence dictionary rows will also match the referenceIndex ordering.

Is this correct?

Question: I am not clear as to whether indeed we do wish for this command to also sort by genomic position within the reference/contig groups. It is possible the original sam/bam or vcf was not sorted by position, however we have no way to recover the original order, and there is not a guarantee the order ADAM would output would be the same as the original vcf or bam.

Also - I need to look at code to understand more about the cases where
a) the contigs came in lex sorted, but don't appear in the original header
b) the contigs header metadata is present, defining a sort order but its not lexographic
c) contigs come in grouped by records, but not lexographically, and no header

As long a SequenceDictionary exists with contigs with referenceIndex, I guess the above a,b,c is upstream from concern in this issue, but just wanted to check.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Jun 6, 2016

Member

I have this implemented, just working on unit tests, will PR tomorrow.

Member

fnothaft commented Jun 6, 2016

I have this implemented, just working on unit tests, will PR tomorrow.

fnothaft added a commit to fnothaft/adam that referenced this issue Jun 6, 2016

[ADAM-952] Expose sorting by reference index.
Resolves #952. Adds function `sortByReferenceIndexAndPosition` on RDDs of
`AlignmentRecord`. This sorts reads by their position on a contig, where
contigs are ordered by contig index. This conforms to the SAM/BAM sort order.

fnothaft added a commit to fnothaft/adam that referenced this issue Jun 7, 2016

[ADAM-952] Expose sorting by reference index.
Resolves #952. Adds function `sortByReferenceIndexAndPosition` on RDDs of
`AlignmentRecord`. This sorts reads by their position on a contig, where
contigs are ordered by contig index. This conforms to the SAM/BAM sort order.

fnothaft added a commit to fnothaft/adam that referenced this issue Jun 7, 2016

[ADAM-952] Expose sorting by reference index.
Resolves #952. Adds function `sortByReferenceIndexAndPosition` on RDDs of
`AlignmentRecord`. This sorts reads by their position on a contig, where
contigs are ordered by contig index. This conforms to the SAM/BAM sort order.

fnothaft added a commit to fnothaft/adam that referenced this issue Jun 7, 2016

[ADAM-952] Expose sorting by reference index.
Resolves #952. Adds function `sortByReferenceIndexAndPosition` on RDDs of
`AlignmentRecord`. This sorts reads by their position on a contig, where
contigs are ordered by contig index. This conforms to the SAM/BAM sort order.

fnothaft added a commit to fnothaft/adam that referenced this issue Jun 17, 2016

[ADAM-952] Expose sorting by reference index.
Resolves #952. Adds function `sortByReferenceIndexAndPosition` on RDDs of
`AlignmentRecord`. This sorts reads by their position on a contig, where
contigs are ordered by contig index. This conforms to the SAM/BAM sort order.

@jpdna jpdna closed this in e51bd90 Jun 30, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment