New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Vcf2ADAM and ADAM2Vcf into TransformGenotypes and TransformVariants #1532

Merged
merged 2 commits into from May 31, 2017

Conversation

Projects
4 participants
@heuermh
Member

heuermh commented May 17, 2017

Work in progress.

  • Update docs
  • Failing unit tests due to FT Number=. vs. FT Number=1
  • Test on cluster
@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins May 17, 2017

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2020/

Build result: FAILURE

[...truncated 16 lines...] > /home/jenkins/git2/bin/git rev-parse origin/pr/1532/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a --contains ca1492e # timeout=10 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1532/merge^{commit} # timeout=10Checking out Revision ca1492e (origin/pr/1532/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f ca1492eca4df1318f9b141d20b1764ce8e7047bcFirst time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.3.0,2.10,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.6.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.10,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.10,1.6.1,centosADAM-prb ? 2.6.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.11,1.6.1,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,1.6.1,centos completed with result FAILURENotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

AmplabJenkins commented May 17, 2017

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2020/

Build result: FAILURE

[...truncated 16 lines...] > /home/jenkins/git2/bin/git rev-parse origin/pr/1532/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a --contains ca1492e # timeout=10 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1532/merge^{commit} # timeout=10Checking out Revision ca1492e (origin/pr/1532/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f ca1492eca4df1318f9b141d20b1764ce8e7047bcFirst time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.3.0,2.10,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.6.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.10,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.10,1.6.1,centosADAM-prb ? 2.6.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.11,1.6.1,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,1.6.1,centos completed with result FAILURENotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins May 17, 2017

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2022/

Build result: FAILURE

[...truncated 16 lines...] > /home/jenkins/git2/bin/git rev-parse origin/pr/1532/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a --contains 88e8596f3faf13b75fb2f7517083f30d0ef4d246 # timeout=10 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1532/merge^{commit} # timeout=10Checking out Revision 88e8596f3faf13b75fb2f7517083f30d0ef4d246 (origin/pr/1532/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f 88e8596f3faf13b75fb2f7517083f30d0ef4d246First time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.3.0,2.10,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.6.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.10,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.10,1.6.1,centosADAM-prb ? 2.6.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.11,1.6.1,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,1.6.1,centos completed with result FAILURENotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

AmplabJenkins commented May 17, 2017

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2022/

Build result: FAILURE

[...truncated 16 lines...] > /home/jenkins/git2/bin/git rev-parse origin/pr/1532/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a --contains 88e8596f3faf13b75fb2f7517083f30d0ef4d246 # timeout=10 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1532/merge^{commit} # timeout=10Checking out Revision 88e8596f3faf13b75fb2f7517083f30d0ef4d246 (origin/pr/1532/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f 88e8596f3faf13b75fb2f7517083f30d0ef4d246First time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.3.0,2.10,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.6.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.10,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.10,1.6.1,centosADAM-prb ? 2.6.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.11,1.6.1,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,1.6.1,centos completed with result FAILURENotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@coveralls

This comment has been minimized.

Show comment
Hide comment
@coveralls

coveralls May 18, 2017

Coverage Status

Coverage increased (+0.2%) to 82.437% when pulling 1a60973 on heuermh:transform-variants into 37b971a on bigdatagenomics:master.

coveralls commented May 18, 2017

Coverage Status

Coverage increased (+0.2%) to 82.437% when pulling 1a60973 on heuermh:transform-variants into 37b971a on bigdatagenomics:master.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins May 18, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2025/
Test PASSed.

AmplabJenkins commented May 18, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2025/
Test PASSed.

@fnothaft

Good start, I have a variety of misc suggestions. Specifically, I'd like to remove the saveAsVcf functions and funnel those through VariantContextRDD.

Show outdated Hide outdated adam-cli/src/main/scala/org/bdgenomics/adam/cli/TransformGenotypes.scala Outdated
Show outdated Hide outdated adam-core/src/main/scala/org/bdgenomics/adam/rdd/variant/GenotypeRDD.scala Outdated
Show outdated Hide outdated adam-core/src/main/scala/org/bdgenomics/adam/rdd/variant/GenotypeRDD.scala Outdated
Show outdated Hide outdated adam-core/src/main/scala/org/bdgenomics/adam/rdd/variant/VariantRDD.scala Outdated
Show outdated Hide outdated adam-core/src/main/scala/org/bdgenomics/adam/rdd/variant/VariantRDD.scala Outdated
@@ -0,0 +1,74 @@
##fileformat=VCFv4.2

This comment has been minimized.

@fnothaft

fnothaft May 19, 2017

Member

We already have a adam-core/src/test/resources/sorted.vcf. Why is this file needed?

@fnothaft

fnothaft May 19, 2017

Member

We already have a adam-core/src/test/resources/sorted.vcf. Why is this file needed?

This comment has been minimized.

@heuermh

heuermh May 21, 2017

Member

sorted-variants.vcf is the site-only/-only_variants version for TransformVariants

@heuermh

heuermh May 21, 2017

Member

sorted-variants.vcf is the site-only/-only_variants version for TransformVariants

This comment has been minimized.

@fnothaft

fnothaft May 21, 2017

Member

Ah, I see! Nevermind.

@fnothaft

fnothaft May 21, 2017

Member

Ah, I see! Nevermind.

@coveralls

This comment has been minimized.

Show comment
Hide comment
@coveralls

coveralls May 22, 2017

Coverage Status

Coverage increased (+0.6%) to 82.845% when pulling bcb3a2f on heuermh:transform-variants into 37b971a on bigdatagenomics:master.

coveralls commented May 22, 2017

Coverage Status

Coverage increased (+0.6%) to 82.845% when pulling bcb3a2f on heuermh:transform-variants into 37b971a on bigdatagenomics:master.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins May 22, 2017

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2036/

Build result: FAILURE

[...truncated 15 lines...] > /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1532/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a -v --no-abbrev --contains 900d7dc0591c43a8bf89d87289b2dc7cea4dec47 # timeout=10Checking out Revision 900d7dc0591c43a8bf89d87289b2dc7cea4dec47 (origin/pr/1532/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f 900d7dc0591c43a8bf89d87289b2dc7cea4dec47First time build. Skipping changelog.Triggering ADAM-prb ? 2.3.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.3.0,2.10,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.6.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.3.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.10,2.0.0,centosTriggering ADAM-prb ? 2.6.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.11,2.0.0,centosADAM-prb ? 2.3.0,2.10,1.6.1,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,1.6.1,centos completed with result FAILUREADAM-prb ? 2.3.0,2.11,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,2.0.0,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,2.0.0,centos completed with result FAILURENotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

AmplabJenkins commented May 22, 2017

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2036/

Build result: FAILURE

[...truncated 15 lines...] > /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1532/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a -v --no-abbrev --contains 900d7dc0591c43a8bf89d87289b2dc7cea4dec47 # timeout=10Checking out Revision 900d7dc0591c43a8bf89d87289b2dc7cea4dec47 (origin/pr/1532/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f 900d7dc0591c43a8bf89d87289b2dc7cea4dec47First time build. Skipping changelog.Triggering ADAM-prb ? 2.3.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.3.0,2.10,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.6.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.3.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.10,2.0.0,centosTriggering ADAM-prb ? 2.6.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.11,2.0.0,centosADAM-prb ? 2.3.0,2.10,1.6.1,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,1.6.1,centos completed with result FAILUREADAM-prb ? 2.3.0,2.11,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,2.0.0,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,2.0.0,centos completed with result FAILURENotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@coveralls

This comment has been minimized.

Show comment
Hide comment
@coveralls

coveralls May 22, 2017

Coverage Status

Coverage increased (+0.6%) to 82.845% when pulling b14b94e on heuermh:transform-variants into 37b971a on bigdatagenomics:master.

coveralls commented May 22, 2017

Coverage Status

Coverage increased (+0.6%) to 82.845% when pulling b14b94e on heuermh:transform-variants into 37b971a on bigdatagenomics:master.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins May 22, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2038/
Test PASSed.

AmplabJenkins commented May 22, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2038/
Test PASSed.

@fnothaft fnothaft added this to the 0.23.0 milestone May 24, 2017

@coveralls

This comment has been minimized.

Show comment
Hide comment
@coveralls

coveralls May 26, 2017

Coverage Status

Coverage increased (+0.9%) to 82.796% when pulling baf789d on heuermh:transform-variants into fc3e5fd on bigdatagenomics:master.

coveralls commented May 26, 2017

Coverage Status

Coverage increased (+0.9%) to 82.796% when pulling baf789d on heuermh:transform-variants into fc3e5fd on bigdatagenomics:master.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins May 26, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2068/
Test PASSed.

AmplabJenkins commented May 26, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2068/
Test PASSed.

@coveralls

This comment has been minimized.

Show comment
Hide comment
@coveralls

coveralls May 30, 2017

Coverage Status

Coverage increased (+0.9%) to 82.807% when pulling 96af178 on heuermh:transform-variants into b0c5c5f on bigdatagenomics:master.

coveralls commented May 30, 2017

Coverage Status

Coverage increased (+0.9%) to 82.807% when pulling 96af178 on heuermh:transform-variants into b0c5c5f on bigdatagenomics:master.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins May 30, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2072/
Test PASSed.

AmplabJenkins commented May 30, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2072/
Test PASSed.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft May 30, 2017

Member

@heuermh Ping me when this is ready for review.

Member

fnothaft commented May 30, 2017

@heuermh Ping me when this is ready for review.

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh May 30, 2017

Member

The commit history looks a mess, otherwise I think it is ready to review.

Member

heuermh commented May 30, 2017

The commit history looks a mess, otherwise I think it is ready to review.

@fnothaft

Two small bits! Otherwise looks really good. Thanks @heuermh!

optProjection = None,
stringency = stringency)
if (args.outputPath.endsWith(".vcf")) {

This comment has been minimized.

@fnothaft

fnothaft May 31, 2017

Member

+1! I like this refactor. One thing here though is that we lost the check for compressed formats (e.g., ".vcf.gz").

@fnothaft

fnothaft May 31, 2017

Member

+1! I like this refactor. One thing here though is that we lost the check for compressed formats (e.g., ".vcf.gz").

This comment has been minimized.

@heuermh

heuermh May 31, 2017

Member

There wasn't a check for compressed formats before, and I don't think we support writing compressed VCF through HadoopBAM. With a code change to use FileExtensions.isVcfExt

$ ./bin/adam-submit transformVariants \
  -single \
  adam-core/src/test/resources/small.vcf \
  small.vcf.gz

$ gzcat small.vcf.gz 
gzcat: small.vcf.gz: not in gzip format

$ head small.vcf.gz 
##fileformat=VCFv4.2
##FILTER=<ID=IndelFS,Description="FS > 200.0">
##FILTER=<ID=IndelQD,Description="QD < 2.0">
...

$ ./bin/adam-submit transformVariants \
  -single \
  adam-core/src/test/resources/small.vcf \
  small.vcf.bgz

$ head small.vcf.bgz 
##fileformat=VCFv4.2
##FILTER=<ID=IndelFS,Description="FS > 200.0">
##FILTER=<ID=IndelQD,Description="QD < 2.0">
...

$ ./bin/adam-submit transformVariants \
  -single \
  adam-core/src/test/resources/small.vcf \
  small.vcf.bgzf

Command body threw exception:
java.lang.AssertionError: assertion failed: BCF not yet supported
Exception in thread "main" java.lang.AssertionError: assertion failed: BCF not yet supported
	at scala.Predef$.assert(Predef.scala:170)
	at org.bdgenomics.adam.rdd.variant.VariantContextRDD.saveAsVcf(VariantContextRDD.scala:164)
	at org.bdgenomics.adam.rdd.variant.VariantContextRDD.saveAsVcf(VariantContextRDD.scala:135)
	at org.bdgenomics.adam.cli.TransformVariants.run(TransformVariants.scala:128)
	at org.bdgenomics.utils.cli.BDGSparkCommand$class.run(BDGCommand.scala:55)
	at org.bdgenomics.adam.cli.TransformVariants.run(TransformVariants.scala:75)
...

Do we have to set a configuration parameter or otherwise kick HadoopBAM to write .gz or .bgz/.bgzf?

@heuermh

heuermh May 31, 2017

Member

There wasn't a check for compressed formats before, and I don't think we support writing compressed VCF through HadoopBAM. With a code change to use FileExtensions.isVcfExt

$ ./bin/adam-submit transformVariants \
  -single \
  adam-core/src/test/resources/small.vcf \
  small.vcf.gz

$ gzcat small.vcf.gz 
gzcat: small.vcf.gz: not in gzip format

$ head small.vcf.gz 
##fileformat=VCFv4.2
##FILTER=<ID=IndelFS,Description="FS > 200.0">
##FILTER=<ID=IndelQD,Description="QD < 2.0">
...

$ ./bin/adam-submit transformVariants \
  -single \
  adam-core/src/test/resources/small.vcf \
  small.vcf.bgz

$ head small.vcf.bgz 
##fileformat=VCFv4.2
##FILTER=<ID=IndelFS,Description="FS > 200.0">
##FILTER=<ID=IndelQD,Description="QD < 2.0">
...

$ ./bin/adam-submit transformVariants \
  -single \
  adam-core/src/test/resources/small.vcf \
  small.vcf.bgzf

Command body threw exception:
java.lang.AssertionError: assertion failed: BCF not yet supported
Exception in thread "main" java.lang.AssertionError: assertion failed: BCF not yet supported
	at scala.Predef$.assert(Predef.scala:170)
	at org.bdgenomics.adam.rdd.variant.VariantContextRDD.saveAsVcf(VariantContextRDD.scala:164)
	at org.bdgenomics.adam.rdd.variant.VariantContextRDD.saveAsVcf(VariantContextRDD.scala:135)
	at org.bdgenomics.adam.cli.TransformVariants.run(TransformVariants.scala:128)
	at org.bdgenomics.utils.cli.BDGSparkCommand$class.run(BDGCommand.scala:55)
	at org.bdgenomics.adam.cli.TransformVariants.run(TransformVariants.scala:75)
...

Do we have to set a configuration parameter or otherwise kick HadoopBAM to write .gz or .bgz/.bgzf?

This comment has been minimized.

@fnothaft

fnothaft May 31, 2017

Member

Ah, sorry about that, you are correct that we haven't been checking that for output.

That said, I read through the source and Hadoop-BAM writes BGZIP if you set the FileOutputFormat.COMPRESS configuration value to true.

@fnothaft

fnothaft May 31, 2017

Member

Ah, sorry about that, you are correct that we haven't been checking that for output.

That said, I read through the source and Hadoop-BAM writes BGZIP if you set the FileOutputFormat.COMPRESS configuration value to true.

This comment has been minimized.

@heuermh

heuermh May 31, 2017

Member

How about leaving this pr as-is then, squashed to two commits to keep commit c1b7375 separate, and then a new pr to add support for writing .gz (as BGZIP VCF) and .bgz/.bgzf (also as BGZIP VCF, I believe the exception error message above is incorrect)?

@heuermh

heuermh May 31, 2017

Member

How about leaving this pr as-is then, squashed to two commits to keep commit c1b7375 separate, and then a new pr to add support for writing .gz (as BGZIP VCF) and .bgz/.bgzf (also as BGZIP VCF, I believe the exception error message above is incorrect)?

optProjection = None,
stringency = stringency)
if (args.outputPath.endsWith(".vcf")) {

This comment has been minimized.

@fnothaft

fnothaft May 31, 2017

Member

Ditto here RE: compression.

@fnothaft

fnothaft May 31, 2017

Member

Ditto here RE: compression.

@@ -150,7 +150,7 @@ object DefaultHeaderLines {
VCFHeaderLineType.Float,
"Read-backed phasing quality")
lazy val genotypeFilter = new VCFFormatHeaderLine("FT",
1,
VCFHeaderLineCount.UNBOUNDED,

This comment has been minimized.

@fnothaft

fnothaft May 31, 2017

Member

Can you keep this split out into a different commit so that we have a single commit per issue? No sweat if you've already squashed.

@fnothaft

fnothaft May 31, 2017

Member

Can you keep this split out into a different commit so that we have a single commit per issue? No sweat if you've already squashed.

This comment has been minimized.

@heuermh

heuermh May 31, 2017

Member

Will do

@heuermh

heuermh May 31, 2017

Member

Will do

@fnothaft

Moving the compression stuff to a later issue works for me! Thanks @heuermh.

@coveralls

This comment has been minimized.

Show comment
Hide comment
@coveralls

coveralls May 31, 2017

Coverage Status

Coverage increased (+0.6%) to 82.796% when pulling 504e325 on heuermh:transform-variants into bbd695e on bigdatagenomics:master.

coveralls commented May 31, 2017

Coverage Status

Coverage increased (+0.6%) to 82.796% when pulling 504e325 on heuermh:transform-variants into bbd695e on bigdatagenomics:master.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins May 31, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2075/
Test PASSed.

AmplabJenkins commented May 31, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2075/
Test PASSed.

@fnothaft fnothaft merged commit b7762c2 into bigdatagenomics:master May 31, 2017

2 of 3 checks passed

codacy/pr Not so good... This pull request quality could be better.
Details
coverage/coveralls Coverage increased (+0.6%) to 82.796%
Details
default Merged build finished.
Details
@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft May 31, 2017

Member

Merged! Thanks @heuermh!

Member

fnothaft commented May 31, 2017

Merged! Thanks @heuermh!

@heuermh heuermh deleted the heuermh:transform-variants branch May 31, 2017

@heuermh heuermh added this to Completed in Release 0.23.0 Jan 4, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment