Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor some file writing and reading logic #450

Merged
merged 1 commit into from
Nov 3, 2014

Conversation

ryan-williams
Copy link
Member

  • Added adamParquetSave method that encapsulates the most common file-writing step at the end of ADAM commands, using the existing ParquetArgs options as well as an outputPath option.
  • Added a few methods for writing output inferred from outputPath file extensions
  • A few random cleanups.

Per usual, commits are in fairly logical chunks, but I'll squash if it looks good.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/338/
Test PASSed.

@@ -64,6 +64,36 @@ class AlignmentRecordRDDFunctions(rdd: RDD[AlignmentRecord])
rdd.filter(overlapsQuery)
}

def maybeSaveBam(args: ADAMSaveArgs): Boolean = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how clean this code is. Looks good!

@massie
Copy link
Member

massie commented Nov 1, 2014

+1 LGTM.

@massie
Copy link
Member

massie commented Nov 2, 2014

Squash and rebase on master please and I'll merge this.

@fnothaft
Copy link
Member

fnothaft commented Nov 2, 2014

+1, LGTM as well.

- read “OQ” attr from structured SAMRecord field

  converting into and out of BAM using ADAM was dropping the “OQ”
  attr; HTSJDK puts it in its own field but ADAM was looking for it
  in the catch-all attributes field.

- style nits / long lines

- ADAMMain formatting

- apply predicate shorthand

- move adamBamLoad into ADAMContext

  I'm a little torn about this one, since I see the motivation for
  having AlignmentRecordContext own logic for things we know will be
  AlignmentRecords, but adamBamLoad also is called from and calls into
  (via implicit conversion from SparkContext back into ADAMContext)
  other things in ADAMContext that made it feel like a more natural
  fit inside of ADAMContext.

- factor out parquet-saving config

- factor multi-format saving out of Transform

- add helper for saving only aligned records

  a.k.a. BAM or ADAM records; no FASTQs
@ryan-williams
Copy link
Member Author

squashed and rebased!

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/348/
Test PASSed.

massie added a commit that referenced this pull request Nov 3, 2014
Refactor some file writing and reading logic
@massie massie merged commit a970e6f into bigdatagenomics:master Nov 3, 2014
@massie
Copy link
Member

massie commented Nov 3, 2014

Thanks, Ryan!

@ryan-williams ryan-williams deleted the saving branch November 3, 2014 17:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants