Transform DAG causes stages to recompute #883

Closed
fnothaft opened this Issue Nov 16, 2015 · 0 comments

fnothaft commented Nov 16, 2015

We need to add caching before Indel Realignment and BQSR when running multiple stages in a pipeline.

@fnothaft fnothaft self-assigned this Nov 16, 2015

fnothaft added a commit to fnothaft/adam that referenced this issue Nov 18, 2015

[ADAM-883] Add caching to Transform pipeline.
The Transform pipeline in the CLI has several stages (e.g., sort, indel
realignment, BQSR) that trigger recomputation of upstream stages. If you are
running a single stage off of local storage/HDFS/Tachyon, this is OK. However,
if you're running multiple stages, or you are loading data from S3 or similar
remote storage, this can lead to serious performance degradation. To address
this, I've added the proper caching statements. Additionally, I've added a hook
so that the user can specify the storage level to use for caching. Resolves #883.
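
To illustrate why multi-pass stages recompute their input, here is a minimal toy model in plain Python. It is not ADAM or Spark code: `LazyDataset`, `two_pass_stage`, and `run_pipeline` are hypothetical names invented for this sketch, and the assumption that each stage makes one summarizing pass followed by one transforming pass is a simplification. The storage-level hook is not modeled.

```python
class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not Spark's API)."""

    def __init__(self, compute, parent=None):
        self._compute = compute   # function applied to the parent's rows
        self._parent = parent     # upstream dataset, or None for a source
        self._persist = False     # set by cache()
        self._cached = None       # materialized rows once computed

    def map(self, fn):
        # Lazy: nothing runs until collect() is called.
        return LazyDataset(lambda rows: [fn(r) for r in rows], parent=self)

    def cache(self):
        self._persist = True
        return self

    def collect(self):
        if self._cached is not None:
            return self._cached
        upstream = self._parent.collect() if self._parent is not None else None
        rows = self._compute(upstream)
        if self._persist:
            self._cached = rows
        return rows


def two_pass_stage(ds):
    # Pass 1: an action that summarizes the input.
    offset = min(ds.collect())
    # Pass 2: a lazy transformation that reads the same input again.
    return ds.map(lambda x: x - offset)


def run_pipeline(use_cache):
    loads = []

    def load(_):
        loads.append(1)           # count how often the source is read
        return [5, 3, 4]

    ds = LazyDataset(load)
    for _ in range(2):            # two consecutive multi-pass stages
        if use_cache:
            ds = ds.cache()       # cache the input before each stage
        ds = two_pass_stage(ds)
    return ds.collect(), len(loads)
```

In this toy, `run_pipeline(use_cache=False)` reads the source three times, while `run_pipeline(use_cache=True)` reads it once; both return the same rows. The extra reads grow with the number of stages, which is the kind of cost that matters most when the source is slow to load.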

fnothaft added a commit to fnothaft/adam that referenced this issue Nov 19, 2015

[ADAM-883] Add caching to Transform pipeline.

@heuermh heuermh closed this in #884 Nov 19, 2015