
Fix writing variants to GCS buckets #3485

Merged: tomwhite merged 1 commit into master from tw_write_variants_gcs on Sep 7, 2017


tomwhite
Contributor

Writing reads was fixed in 73f2a62, but unfortunately the same problem occurs with variants.

This commit (a861a23) fixes the problem for variants, when deployed with a Hadoop-BAM fix (HadoopGenomics/Hadoop-BAM#143).


codecov-io commented Aug 23, 2017

Codecov Report

Merging #3485 into master will increase coverage by <.001%.
The diff coverage is 71.429%.

@@              Coverage Diff              @@
##             master    #3485       +/-   ##
=============================================
+ Coverage     79.94%   79.94%   +<.001%     
  Complexity    17897    17897               
=============================================
  Files          1198     1199        +1     
  Lines         64980    64986        +6     
  Branches      10120    10120               
=============================================
+ Hits          51945    51950        +5     
+ Misses         9002     9001        -1     
- Partials       4033     4035        +2
| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| ...er/engine/spark/datasources/VariantsSparkSink.java | 83.019% <100%> (ø) | 11 <0> (ø) ⬇️ |
| ...nder/tools/spark/pipelines/PrintVariantsSpark.java | 66.667% <66.667%> (ø) | 2 <2> (?) |
| ...er/tools/spark/sv/discovery/AlignmentInterval.java | 87.963% <0%> (-0.926%) | 50% <0%> (-2%) |
| ...te/hellbender/engine/spark/VariantWalkerSpark.java | 74.468% <0%> (+2.128%) | 14% <0%> (ø) ⬇️ |
| ...e/hellbender/engine/spark/SparkContextFactory.java | 73.973% <0%> (+2.74%) | 11% <0%> (ø) ⬇️ |

@@ -118,7 +118,7 @@ private static void writeVariantsSingle(
}

final JavaRDD<VariantContext> sortedVariants = sortVariants(variants, header, numReducers);
final String outputPartsDirectory = outputFile + ".parts";
Review comment (Member):

Would you mind adding a test that writes variants to GCS so we don't break this again? There's a PrintReadsSparkIntegrationTest that does it for BAMs; a ReadSparkSinkUnit test would be ideal.
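
For readers following the hunk under review: it builds the temporary parts directory by appending `.parts` to the output path string. The sketch below illustrates only that naming step, plus a scheme check that a test might use to confirm it is targeting a bucket. The class and method names (`PartsDirectorySketch`, `partsDirectoryFor`, `isGcsPath`) and the bucket name are hypothetical and are not part of GATK or Hadoop-BAM; the actual fix in this PR is not reproduced here.

```java
import java.net.URI;

// Illustrative only: hypothetical helpers, not code from GATK or Hadoop-BAM.
public class PartsDirectorySketch {

    // Mirrors the string concatenation visible in the diff context:
    // the parts directory is the output path with ".parts" appended.
    static String partsDirectoryFor(String outputFile) {
        return outputFile + ".parts";
    }

    // Assumption: a "gs" URI scheme identifies a GCS bucket path.
    // Paths without a scheme (e.g. local absolute paths) return false.
    static boolean isGcsPath(String path) {
        return "gs".equals(URI.create(path).getScheme());
    }

    public static void main(String[] args) {
        String output = "gs://my-bucket/out.vcf"; // hypothetical bucket
        System.out.println(partsDirectoryFor(output));
        System.out.println(isGcsPath(output));
        System.out.println(isGcsPath("/tmp/out.vcf"));
    }
}
```

Keeping the path as a plain string until it is handed to a file-system layer that understands the scheme preserves the `gs://` prefix in the derived parts directory.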

lbergelson (Member) left a comment:

@tomwhite It would be good to add a test so we never break the VCF writing again without knowing. JP added the GCS connector to our test dependencies, so we can use it to write to GCS via Spark during local tests as long as permissions are configured correctly (which they are on Travis).

tomwhite force-pushed the tw_write_variants_gcs branch 2 times, most recently from f900d54 to 4217b1b, on August 30, 2017 15:30
* For this to work, the settings in src/main/resources/core-site.xml must be correct,
* and the project name and credential file it points to must be present.
*/
@Test(dataProvider = "gcsTestingData", groups = "bucket")
Review comment (Member):

thanks, this is great

tomwhite dismissed lbergelson’s stale review on September 7, 2017 09:52

Added requested changes

tomwhite merged commit 80d8662 into master on Sep 7, 2017
tomwhite deleted the tw_write_variants_gcs branch on September 7, 2017 09:53