Refactor CLIs for merging sharded files #1167

Merged
merged 4 commits into from Sep 15, 2016

Conversation

Projects
None yet
3 participants
@fnothaft
Member

fnothaft commented Sep 12, 2016

Resolves #1160, #1161, #1165.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Sep 12, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1481/
Test PASSed.

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1481/
Test PASSed.

@@ -43,6 +43,8 @@ class Fragments2ReadsArgs extends Args4jBase with ADAMSaveAnyArgs with ParquetAr
var asSingleFile: Boolean = false
@Args4jOption(required = false, name = "-sort_reads", usage = "Sort the reads by referenceId and read position")
var sortReads: Boolean = false
+ @Args4jOption(required = false, name = "-deferMerging", usage = "Defers merging single file output")

This comment has been minimized.

@heuermh

heuermh Sep 13, 2016

Member

-camelCase or -snake_case?

@heuermh

heuermh Sep 13, 2016

Member

-camelCase or -snake_case?

+ outputPath,
+ tailPath,
+ optHeaderPath = Some(headPath),
+ writeEmptyGzipBlock = !asSam)

This comment has been minimized.

@heuermh

heuermh Sep 13, 2016

Member

will this boolean writeEmptyGzipBlock flag still work ok when CRAM is added?

@heuermh

heuermh Sep 13, 2016

Member

will this boolean writeEmptyGzipBlock flag still work ok when CRAM is added?

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Sep 14, 2016

Member

Just added some tests. This is good to go from my side. Can I get a review/merge? Also, let me know if people would like ba2c484 squashed down.

Member

fnothaft commented Sep 14, 2016

Just added some tests. This is good to go from my side. Can I get a review/merge? Also, let me know if people would like ba2c484 squashed down.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Sep 14, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1491/
Test PASSed.

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1491/
Test PASSed.

+ @Args4jOption(required = false, name = "-header_path", usage = "Optional path to a header")
+ var headerPath: String = null
+ @Args4jOption(required = false,
+ name = "-bufferSize",

This comment has been minimized.

@heuermh

heuermh Sep 14, 2016

Member

sorry, another -camelCase to -snake_case

@heuermh

heuermh Sep 14, 2016

Member

sorry, another -camelCase to -snake_case

+ // check for buffer size in option, if not in option, check hadoop conf,
+ // if not in hadoop conf, fall back on 4MB
+ val bufferSize = optBufferSize.getOrElse(conf.getInt(BUFFER_SIZE_CONF,
+ 4 * 1024 * 1024))

This comment has been minimized.

@heuermh

heuermh Sep 14, 2016

Member

can this default be mentioned at the command line above, say

usage = "Buffer size for merging single file output, default 4MB"
@heuermh

heuermh Sep 14, 2016

Member

can this default be mentioned at the command line above, say

usage = "Buffer size for merging single file output, default 4MB"

This comment has been minimized.

@fnothaft

fnothaft Sep 15, 2016

Member

SGTM! I've addressed this in MergeShardsArgs.

@fnothaft

fnothaft Sep 15, 2016

Member

SGTM! I've addressed this in MergeShardsArgs.

fnothaft added some commits Sep 11, 2016

[ADAM-1161] Add -deferMerging and MergeShards.
Resolves #1161. When used with `-single`, `-deferMerging` allows a user to
postpone merging files that were written as if they were to be merged. Then,
the user can run the `MergeShards` CLI module later to merge the shards.
Add tests for MergeShards CLI.
* For using with CRAM, artificial.fa relies on having it's FASTA index at a
  known location. This is hard to do with the `copyResource` function, so I just
  added a symlink.
* Moved `org.bdgenomics.adam.rdd.FileMergerSuite` to it's correct location in
  the source tree.
@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Sep 15, 2016

Member

Rebased and addressed review comments.

Member

fnothaft commented Sep 15, 2016

Rebased and addressed review comments.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Sep 15, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1493/
Test PASSed.

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1493/
Test PASSed.

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Sep 15, 2016

Member

Nice new Github review feature, don't have to leave LGTM comments any more. Well, except there's this one.

Member

heuermh commented Sep 15, 2016

Nice new Github review feature, don't have to leave LGTM comments any more. Well, except there's this one.

@heuermh heuermh merged commit ca4d964 into bigdatagenomics:master Sep 15, 2016

1 check passed

default Merged build finished.
Details
@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Sep 15, 2016

Member

Merged as multiple commits.

Thank you, @fnothaft!

Member

heuermh commented Sep 15, 2016

Merged as multiple commits.

Thank you, @fnothaft!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment