New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor some file writing and reading logic #450

Merged
merged 1 commit into from Nov 3, 2014

Conversation

Projects
None yet
4 participants
@ryan-williams
Member

ryan-williams commented Oct 31, 2014

  • Added adamParquetSave method that encapsulates the most common file-writing step at the end of ADAM commands, using the existing ParquetArgs options as well as an outputPath option.
  • Added a few methods for writing output inferred from outputPath file extensions
  • A few random cleanups.

Per usual, commits are in fairly logical chunks, but I'll squash if it looks good.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Oct 31, 2014

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/338/
Test PASSed.

AmplabJenkins commented Oct 31, 2014

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/338/
Test PASSed.

@@ -64,6 +64,36 @@ class AlignmentRecordRDDFunctions(rdd: RDD[AlignmentRecord])
rdd.filter(overlapsQuery)
}
def maybeSaveBam(args: ADAMSaveArgs): Boolean = {

This comment has been minimized.

@massie

massie Nov 1, 2014

Member

I like how clean this code is. Looks good!

@massie

massie Nov 1, 2014

Member

I like how clean this code is. Looks good!

Option(read.getRecordGroupPlatformUnit).map(_.toString)))
})
})
for {

This comment has been minimized.

@massie

massie Nov 1, 2014

Member

Nice.

@massie

massie Nov 1, 2014

Member

Nice.

@massie

This comment has been minimized.

Show comment
Hide comment
@massie

massie Nov 1, 2014

Member

+1 LGTM.

Member

massie commented Nov 1, 2014

+1 LGTM.

@massie

This comment has been minimized.

Show comment
Hide comment
@massie

massie Nov 2, 2014

Member

Squash and rebase on master please and I'll merge this.

Member

massie commented Nov 2, 2014

Squash and rebase on master please and I'll merge this.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Nov 2, 2014

Member

+1, LGTM as well.

Member

fnothaft commented Nov 2, 2014

+1, LGTM as well.

Refactor some file writing and reading logic:
- read “OQ” attr from structured SAMRecord field

  converting into and out of BAM using ADAM was dropping the “OQ”
  attr; HTSJDK puts it in its own field but ADAM was looking for it
  in the catch-all attributes field.

- style nits / long lines

- ADAMMain formatting

- apply predicate shorthand

- move adamBamLoad into ADAMContext

  I'm a little torn about this one, since I see the motivation for
  having AlignmentRecordContext own logic for things we know will be
  AlignmentRecords, but adamBamLoad also is called from and calls into
  (via implicit conversion from SparkContext back into ADAMContext)
  other things in ADAMContext that made it feel like a more natural
  fit inside of ADAMContext.

- factor out parquet-saving config

- factor multi-format saving out of Transform

- add helper for saving only aligned records

  a.k.a. BAM or ADAM records; no FASTQs
@ryan-williams

This comment has been minimized.

Show comment
Hide comment
@ryan-williams

ryan-williams Nov 3, 2014

Member

squashed and rebased!

Member

ryan-williams commented Nov 3, 2014

squashed and rebased!

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Nov 3, 2014

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/348/
Test PASSed.

AmplabJenkins commented Nov 3, 2014

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/348/
Test PASSed.

massie added a commit that referenced this pull request Nov 3, 2014

Merge pull request #450 from ryan-williams/saving
Refactor some file writing and reading logic

@massie massie merged commit a970e6f into bigdatagenomics:master Nov 3, 2014

1 check passed

default Merged build finished.
Details
@massie

This comment has been minimized.

Show comment
Hide comment
@massie

massie Nov 3, 2014

Member

Thanks, Ryan!

Member

massie commented Nov 3, 2014

Thanks, Ryan!

@ryan-williams ryan-williams deleted the ryan-williams:saving branch Nov 3, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment