Fix writing variants to GCS buckets #3485
Codecov Report
@@             Coverage Diff              @@
##             master    #3485      +/-   ##
=============================================
+ Coverage     79.94%   79.94%   +<.001%
  Complexity    17897    17897
=============================================
  Files          1198     1199       +1
  Lines         64980    64986       +6
  Branches      10120    10120
=============================================
+ Hits          51945    51950       +5
+ Misses         9002     9001       -1
- Partials       4033     4035       +2
@@ -118,7 +118,7 @@ private static void writeVariantsSingle(
         }

         final JavaRDD<VariantContext> sortedVariants = sortVariants(variants, header, numReducers);
         final String outputPartsDirectory = outputFile + ".parts";
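As a rough illustration of the line above, here is a minimal sketch of how a `.parts` directory name derived by plain string concatenation keeps the `gs://` scheme and bucket intact. The class name `PartsPathSketch`, the helper `partsDirectory`, and the bucket and file names are hypothetical and not part of this PR; the sketch does not reproduce the actual fix.

```java
import java.net.URI;

public class PartsPathSketch {
    // Hypothetical helper mirroring the string concatenation in the diff context.
    static String partsDirectory(String outputFile) {
        return outputFile + ".parts";
    }

    public static void main(String[] args) {
        // Hypothetical bucket and object names, for illustration only.
        String outputFile = "gs://example-bucket/out.vcf";
        URI parts = URI.create(partsDirectory(outputFile));

        // The scheme and bucket survive because the path is never
        // routed through a local-filesystem abstraction.
        System.out.println(parts.getScheme()); // gs
        System.out.println(parts.getHost());   // example-bucket
        System.out.println(parts.getPath());   // /out.vcf.parts
    }
}
```

Keeping the output location as a string or URI, rather than converting it to a local file path, is one way to avoid losing the scheme when the target is a bucket.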
Would you mind adding a test that writes variants to GCS so we don't break this again? There's a PrintReadsSparkIntegrationTest that does it for BAMs; a ReadSparkSinkUnit test would be ideal.
@tomwhite It would be good to add a test so we never break the VCF writing again without knowing. JP added the GCS connector to our test dependencies, so we can use it to write to GCS via Spark during local tests as long as permissions are configured correctly (which they are on Travis).
f900d54 to 4217b1b
     * For this to work, the settings in src/main/resources/core-site.xml must be correct,
     * and the project name and credential file it points to must be present.
     */
    @Test(dataProvider = "gcsTestingData", groups = "bucket")
thanks, this is great
4217b1b to f375452
Writing reads was fixed in 73f2a62, but unfortunately the same problem occurs with variants.
This commit (a861a23) fixes the problem for variants, when deployed with a Hadoop-BAM fix (HadoopGenomics/Hadoop-BAM#143).