Users should be able to save files as `-single` without merging them #1161

Closed
fnothaft opened this Issue Sep 10, 2016 · 0 comments

fnothaft commented Sep 10, 2016

For example, in AlignmentRecordRDD we allow saving to a single SAM/BAM file: we write a header file and a set of headerless shards, then merge them all. The disadvantage of merging the files within the same Spark context is that we hold all of the cluster's resources (every executor) while running a long single-node task. We should:

  • Add a -headerless option that saves as if we were going to save -single, but doesn't merge the files.
  • Add a CLI module that merges all of the files separately from a given Spark job.
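
To make the second step concrete, here is a minimal sketch of what a standalone merge could look like on a local filesystem. It is not ADAM's implementation: the function name `merge_shards`, the `part-*` shard naming, and the file layout are all assumptions for illustration. It treats the merge as plain byte concatenation of a header file followed by the headerless shards, which suits text SAM output; a real BAM merge would likely need extra handling (for example, the end-of-file marker).

```python
import shutil
from pathlib import Path


def merge_shards(header_path, shard_dir, output_path, shard_glob="part-*"):
    """Concatenate a header file and headerless shards into one output file.

    Hypothetical helper: names and layout are assumptions, not ADAM's API.
    Returns the number of shards that were appended after the header.
    """
    # Sort by name so zero-padded shards (part-00000, part-00001, ...)
    # are appended in the order they were written.
    shards = sorted(Path(shard_dir).glob(shard_glob))
    with open(output_path, "wb") as out:
        # The header goes first, then each shard in order.
        for piece in [Path(header_path), *shards]:
            with open(piece, "rb") as src:
                shutil.copyfileobj(src, out)
    return len(shards)
```

Because this runs as an ordinary single-process job, it can be launched after the Spark job has exited and released its executors, which is the point of deferring the merge.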

fnothaft added a commit to fnothaft/adam that referenced this issue Sep 12, 2016

[ADAM-1161] Add -deferMerging and MergeShards.
Resolves #1161. When used with `-single`, `-deferMerging` allows a user to
postpone merging files that were written as if they were to be merged. Then,
the user can run the `MergeShards` CLI module later to merge the shards.

@heuermh heuermh modified the milestone: 0.20.0 Sep 13, 2016

@heuermh heuermh referenced this issue Sep 13, 2016

Closed

Release ADAM version 0.20.0 #1048

47 of 61 tasks complete

fnothaft added a commit to fnothaft/adam that referenced this issue Sep 14, 2016

[ADAM-1161] Add -deferMerging and MergeShards.
Resolves #1161. When used with `-single`, `-deferMerging` allows a user to
postpone merging files that were written as if they were to be merged. Then,
the user can run the `MergeShards` CLI module later to merge the shards.

@heuermh heuermh closed this in a39dd39 Sep 15, 2016
