Users should be able to save files as `-single` without merging them #1161

fnothaft · 2016-09-10T15:43:03Z

For example, in `AlignmentRecordRDD` we allow saving to a single SAM/BAM file: we write a header and a set of headerless shards, and then merge them all. The disadvantage of merging the files within the same Spark context is that we hold all of the cluster's resources (every executor) while running a long single-node task. We should:

- Add a `-headerless` option that saves as if we were going to save with `-single`, but doesn't merge the files.
- Add a CLI module that merges all of the files, separate from the Spark job that wrote them.
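
To illustrate the second step, here is a minimal sketch of a merge that runs on its own, outside any Spark job. It is not ADAM's implementation: the function name `merge_shards`, the `part-*` shard naming, and the directory layout are assumptions made for illustration. It simply concatenates bytes, which matches the header-plus-headerless-shards layout for SAM text; BAM output may need extra handling that this sketch does not attempt.

```python
import shutil
from pathlib import Path


def merge_shards(header_path, shard_dir, output_path, shard_glob="part-*"):
    """Concatenate a header file and headerless shards into one file.

    Hypothetical helper: names and layout are assumptions, not ADAM's API.
    Shards are appended in lexicographic filename order, so zero-padded
    names (part-00000, part-00001, ...) keep their original ordering.
    Returns the number of shards that were appended.
    """
    shards = sorted(Path(shard_dir).glob(shard_glob))
    with open(output_path, "wb") as out:
        # The header goes first, exactly once.
        with open(header_path, "rb") as header:
            shutil.copyfileobj(header, out)
        # Then each headerless shard, streamed so memory use stays small.
        for shard in shards:
            with open(shard, "rb") as src:
                shutil.copyfileobj(src, out)
    return len(shards)
```

Because this needs only a single process with access to the shards, the Spark job can release its executors as soon as the shards are written.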

Resolves bigdatagenomics#1161. When used with `-single`, `-deferMerging` allows a user to postpone merging files that were written as if they were to be merged. Then, the user can run the `MergeShards` CLI module later to merge the shards.

fnothaft mentioned this issue Sep 11, 2016

Merging files should be multithreaded #1164

Closed

fnothaft mentioned this issue Sep 12, 2016

Refactor CLIs for merging sharded files #1167

Merged

heuermh modified the milestone: 0.20.0 Sep 13, 2016

heuermh mentioned this issue Sep 13, 2016

Release ADAM version 0.20.0 #1048

Closed

61 tasks

heuermh closed this as completed in a39dd39 Sep 15, 2016
