For example, in `AlignmentRecordRDD` we allow saving to a single SAM/BAM file: we write a header and a set of headerless shards, and then merge them all. The disadvantage of merging the files within the same Spark context is that we hold all of the cluster's resources (all of the executors) while running a long single-node task. We should:
Add a `-headerless` option that saves as if we were going to save with `-single`, but doesn't merge the files.
Add a CLI module that merges all of the files separately from a given Spark job.
Resolves bigdatagenomics#1161. When used with `-single`, `-deferMerging` allows a user to
postpone merging files that were written as if they were to be merged. Then,
the user can run the `MergeShards` CLI module later to merge the shards.
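
To illustrate what a deferred merge step amounts to, here is a minimal sketch in Python, not the actual `MergeShards` implementation. It assumes the header and the headerless shards sit on a local filesystem, and the names `merge_shards`, `header_path`, `shard_dir`, and the `part-*` shard pattern are hypothetical, chosen only for this example. It simply concatenates the header and then each shard in name order, which is enough for text SAM; a real BAM merge would also have to deal with the BGZF end-of-file marker, which this sketch leaves out.

```python
import shutil
from pathlib import Path


def merge_shards(header_path, shard_dir, output_path, shard_glob="part-*"):
    """Concatenate a header file and headerless shards into a single file.

    All names and the shard naming pattern are assumptions for illustration.
    Returns the number of shards that were appended after the header.
    """
    # Sort by file name so shards are appended in the order they were written.
    shards = sorted(Path(shard_dir).glob(shard_glob))
    with open(output_path, "wb") as out:
        # The header goes first, followed by every headerless shard.
        for src in [Path(header_path), *shards]:
            with open(src, "rb") as f:
                shutil.copyfileobj(f, out)
    return len(shards)
```

Because this runs as an ordinary single process, it can be executed after the Spark job has finished and released its executors, which is the point of deferring the merge.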