[ADAM-843] Aggressively project out metadata fields. #844

Merged
heuermh merged 1 commit into bigdatagenomics:master from fnothaft/project-fewer-fields on Oct 6, 2015

Member

fnothaft commented Oct 2, 2015

Resolves #843. I have tested this on our cluster and seen a 2.5x performance improvement for INDEL realignment and MarkDups, and a 1.5x performance improvement for BQSR. Additionally, we see approximately a 3-4x reduction in shuffle size.
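
For readers unfamiliar with projection, here is a minimal sketch of the idea in plain Scala. It is an illustration only, not the code in this PR: `FullRead`, `ProjectedRead`, and `ProjectionSketch` are hypothetical names, and only the field names are borrowed from the `AlignmentRecordField` list this PR touches. The sketch assumes a downstream stage that needs just the read name, start position, and sequence, and shows why dropping record group metadata before grouping leaves less data to move.

```scala
// Hypothetical, trimmed-down stand-in for an alignment record. The field
// names are borrowed from the AlignmentRecordField list touched by this PR;
// the class itself is not ADAM's.
final case class FullRead(
  readName: String,
  start: Long,
  sequence: String,
  recordGroupSequencingCenter: String,
  recordGroupDescription: String,
  recordGroupPlatform: String)

// The narrower view that a hypothetical downstream stage actually needs.
final case class ProjectedRead(readName: String, start: Long, sequence: String)

object ProjectionSketch {
  // Drop the record group metadata before the records are grouped.
  def project(read: FullRead): ProjectedRead =
    ProjectedRead(read.readName, read.start, read.sequence)

  // Group the slim records by start position; this stands in for a shuffle.
  def groupByStart(reads: Seq[FullRead]): Map[Long, Seq[ProjectedRead]] =
    reads.map(project).groupBy(_.start)
}
```

Every record that reaches `groupByStart`'s grouping step carries three fields instead of six, which is the same effect, in miniature, as the shuffle size reduction reported above.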

@fnothaft fnothaft added this to the 0.18.0 milestone Oct 2, 2015

AmplabJenkins commented Oct 2, 2015

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/954/
Test PASSed.

sc.loadParquetAlignments(args.inputPath)
} else if (args.forceLoadParquet ||
args.limitProjection ||
args.useAlignedReadPredicate) {

@heuermh

heuermh Oct 5, 2015

Member

if either of these are present along with -force_load_[something other than parquet], should there be a warning?

@ryan-williams

ryan-williams Oct 6, 2015

Member

is it weird to have args.limitProjection || args.useAlignedReadPredicate cause interpretation of a file as parquet? would we rather throw an error if someone tries to load a .bam file with one of these flags? I'm not sure, just wondering

@fnothaft

fnothaft Oct 6, 2015

Member

I changed this to throw an IAE.
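
A minimal sketch of what such a check could look like, for illustration only. This is not the code that was merged: `LoadArgs`, `LoadArgsCheck`, `isSamOrBam`, and the error message are hypothetical, and detecting SAM/BAM input by file extension is an assumption. Only the names `inputPath`, `forceLoadParquet`, `limitProjection`, and `useAlignedReadPredicate` come from the diff above.

```scala
// Hypothetical stand-in for the parsed command-line arguments. Only the
// field names come from the diff; the class itself is assumed.
final case class LoadArgs(
  inputPath: String,
  forceLoadParquet: Boolean = false,
  limitProjection: Boolean = false,
  useAlignedReadPredicate: Boolean = false)

object LoadArgsCheck {
  // Assumption: SAM/BAM inputs are recognized by file extension.
  def isSamOrBam(path: String): Boolean =
    path.endsWith(".sam") || path.endsWith(".bam")

  // Reject Parquet-only flags on a SAM/BAM input instead of silently
  // interpreting the file as Parquet.
  def validate(args: LoadArgs): Unit = {
    val parquetOnlyFlag = args.limitProjection || args.useAlignedReadPredicate
    if (parquetOnlyFlag && !args.forceLoadParquet && isSamOrBam(args.inputPath)) {
      throw new IllegalArgumentException(
        s"Projection and predicate flags require a Parquet input, but got ${args.inputPath}")
    }
  }
}
```

Failing fast here means a user who passes one of these flags with a `.bam` file sees a clear error up front rather than a Parquet read failure later.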

AmplabJenkins commented Oct 6, 2015

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/962/

Build result: FAILURE

GitHub pull request #844 of commit b878311 automatically merged.
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
[EnvInject] - Loading node environment variables.
Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/
 > git rev-parse origin/pr/844/merge^{commit} # timeout=10
 > git branch -a --contains 02db391edc474258ceb81bcab9913d4a5b6f9bfd # timeout=10
 > git rev-parse remotes/origin/pr/844/merge^{commit} # timeout=10
Checking out Revision 02db391edc474258ceb81bcab9913d4a5b6f9bfd (origin/pr/844/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 02db391edc474258ceb81bcab9913d4a5b6f9bfd
First time build. Skipping changelog.
Triggering ADAM-prb ? 2.6.0,2.11,1.4.1,centos
Triggering ADAM-prb ? 2.6.0,2.10,1.4.1,centos
Touchstone configurations resulted in FAILURE, so aborting...
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

AmplabJenkins commented Oct 6, 2015

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/963/
Test PASSed.

@@ -28,5 +28,5 @@ import org.bdgenomics.formats.avro.AlignmentRecord
  */
 object AlignmentRecordField extends FieldEnumeration(AlignmentRecord.SCHEMA$) {
-  val contig, start, end, mapq, readName, sequence, mateAlignmentStart, cigar, qual, recordGroupId, recordGroupName, readPaired, properPair, readMapped, mateMapped, readNegativeStrand, mateNegativeStrand, firstOfPair, secondOfPair, primaryAlignment, failedVendorQualityChecks, duplicateRead, mismatchingPositions, attributes, recordGroupSequencingCenter, recordGroupDescription, recordGroupRunDateEpoch, recordGroupFlowOrder, recordGroupKeySequence, recordGroupLibrary, recordGroupPredictedMedianInsertSize, recordGroupPlatform, recordGroupPlatformUnit, recordGroupSample, mateContig, origQual, supplmentaryAlignment = SchemaValue
+  val contig, start, end, mapq, readName, sequence, mateAlignmentStart, cigar, qual, recordGroupId, recordGroupName, readPaired, properPair, readMapped, mateMapped, readNegativeStrand, mateNegativeStrand, firstOfPair, secondOfPair, primaryAlignment, failedVendorQualityChecks, duplicateRead, mismatchingPositions, attributes, recordGroupSequencingCenter, recordGroupDescription, recordGroupRunDateEpoch, recordGroupFlowOrder, recordGroupKeySequence, recordGroupLibrary, recordGroupPredictedMedianInsertSize, recordGroupPlatform, recordGroupPlatformUnit, recordGroupSample, mateContig, origQual, supplementaryAlignment, secondaryAlignment = SchemaValue

@ryan-williams

ryan-williams Oct 6, 2015

Member

worth wrapping this line? hard to tell what changed

@fnothaft

fnothaft Oct 6, 2015

Member

If you wrap the line, the scripts/format-source script helpfully unwraps the line. Sigh! I agree though.
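
Since the unwrapped line makes the change hard to spot, here is a small sketch that compares the two field lists. The leading fields appear identical in both versions of the line, so only the tail of each list is copied from the diff above; `FieldDiff` is a hypothetical helper, not part of ADAM.

```scala
object FieldDiff {
  // Tail of the field list before the change, copied from the diff.
  val before: Seq[String] =
    Seq("mateContig", "origQual", "supplmentaryAlignment")

  // Tail of the field list after the change, copied from the diff.
  val after: Seq[String] =
    Seq("mateContig", "origQual", "supplementaryAlignment", "secondaryAlignment")

  // Names present only in the old list.
  val removed: Seq[String] = before.filterNot(after.contains)

  // Names present only in the new list.
  val added: Seq[String] = after.filterNot(before.contains)
}
```

Read this way, the change corrects the misspelled `supplmentaryAlignment` to `supplementaryAlignment` and adds `secondaryAlignment`.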

AmplabJenkins commented Oct 6, 2015

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/971/
Test PASSed.

Member

heuermh commented Oct 6, 2015

Thanks!

heuermh added a commit that referenced this pull request Oct 6, 2015

Merge pull request #844 from fnothaft/project-fewer-fields
[ADAM-843] Aggressively project out metadata fields.

@heuermh heuermh merged commit 9d78117 into bigdatagenomics:master Oct 6, 2015

1 check passed

default Merged build finished.