
Evoquer: a BigQuery-based joint calling tool #6011

Open · wants to merge 3 commits into master

droazen (Collaborator) commented Jun 19, 2019

No description provided.

codecov bot commented Aug 9, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@41db9df).
The diff coverage is 0.714%.

@@            Coverage Diff            @@
##             master    #6011   +/-   ##
=========================================
  Coverage          ?   7.002%           
  Complexity        ?     2962           
=========================================
  Files             ?     1998           
  Lines             ?   150096           
  Branches          ?    16654           
=========================================
  Hits              ?    10509           
  Misses            ?   138811           
  Partials          ?      776
Impacted Files | Coverage Δ | Complexity Δ
...org/broadinstitute/hellbender/engine/GATKTool.java | 59.259% <ø> (ø) | 65 <0> (?)
...rgumentcollections/IntervalArgumentCollection.java | 47.917% <ø> (ø) | 11 <0> (?)
...roadinstitute/hellbender/utils/SimpleInterval.java | 40.909% <ø> (ø) | 17 <0> (?)
...oadinstitute/hellbender/utils/gcs/BucketUtils.java | 21.333% <ø> (ø) | 9 <0> (?)
...ute/hellbender/utils/variant/GATKVCFConstants.java | 0% <ø> (ø) | 0 <0> (?)
...ender/utils/nio/SeekableByteChannelPrefetcher.java | 0% <ø> (ø) | 0 <0> (?)
...notyper/GenotypeCalculationArgumentCollection.java | 0% <ø> (ø) | 0 <0> (?)
...annotator/allelespecific/AS_RMSMappingQuality.java | 0% <0%> (ø) | 0 <0> (?)
...bender/tools/walkers/annotator/StrandBiasTest.java | 0% <0%> (ø) | 0 <0> (?)
...titute/hellbender/utils/bigquery/ServiceUtils.java | 0% <0%> (ø) | 0 <0> (?)
... and 22 more

droazen (Collaborator, Author) commented Sep 13, 2019

I've rebased this branch and updated it to the latest GnarlyGenotyper. It looks like the new version of google-cloud-java resolves our dependency conflict with BigQuery, so we can work toward getting this merged.

droazen force-pushed the dr_jts_lb_evoquer branch 4 times, most recently from 0e19356 to a67919b, on October 17, 2019 20:17
droazen force-pushed the dr_jts_lb_evoquer branch 2 times, most recently from fe652b1 to cf7d02b, on November 7, 2019 13:53
droazen changed the title from Evoquer: the "group by" version (WIP -- DO NOT MERGE) to Evoquer: a BigQuery-based joint calling tool on Nov 7, 2019
Fix genotype allele remapping issue

Set samplesAreUniquified = false in ref confidence merger to avoid sample name mangling

Add a hack to skip sites with the wrong number of values in allele-specific annotations

This is necessary to work with our current test datasets

More robust hack to deal with malformed allele-specific annotations
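The two commits above describe skipping sites whose allele-specific annotations have the wrong number of values. The actual Evoquer change is not shown here. As a hedged illustration of that kind of check, the Java sketch below splits an annotation value and reports it as malformed when the number of per-allele values differs from what the caller expects, so the caller can skip the site. `AlleleSpecificCheck` and `parseIfWellFormed` are hypothetical names. The delimiter and the expected count are parameters because the correct values depend on the annotation. For reference, raw allele-specific annotations in GATK are typically `|`-separated.

```java
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper, not taken from the Evoquer/GATK sources: parse an
// allele-specific annotation value and reject it when the number of
// per-allele values does not match what the caller expects.
final class AlleleSpecificCheck {

    private AlleleSpecificCheck() {}

    /**
     * Splits rawValue on the literal delimiter and returns the parts only if
     * there are exactly expectedCount of them; otherwise returns an empty
     * Optional so the caller can skip the site.
     */
    static Optional<String[]> parseIfWellFormed(String rawValue, String delimiter, int expectedCount) {
        if (rawValue == null || rawValue.isEmpty()) {
            return Optional.empty();
        }
        // Pattern.quote makes "|" a literal rather than regex alternation;
        // limit -1 keeps trailing empty fields so they count toward the total.
        String[] parts = rawValue.split(Pattern.quote(delimiter), -1);
        return parts.length == expectedCount ? Optional.of(parts) : Optional.empty();
    }
}
```

Inside a site loop, a caller would then write something like `if (!parsed.isPresent()) continue;`. This skips malformed records instead of failing the whole run.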

evoquer.sh improvements

Handle missing values (".") in AS_QualByDepth

Add additional VCF header line declarations required by the GnarlyGenotyper

Avoid potential NPE in onShutdown()

Update test dataset name

Keep the NON_REF allele during merging when running the GnarlyGenotyper

Comment out empty annotations in GnarlyGenotyper

Updates to evoquer.wdl

Add clarification about removal of NON_REF allele

More evoquer.wdl updates

Support raw precomputed AVRO inputs

Add a --run-query-in-batch-mode argument

More WDL updates

Add WDLs

Set PIPELINE_MAX_ALT_COUNT to 50 in the GnarlyGenotyper to work around an IndexOutOfBoundsException

dalio run 2 WDL settings

Switch to 'evoquer_dalio40k_july_updated_dataset_map'

Replace USING with ON in the main query, and adjust WHERE clause for better query performance

GnarlyGenotyperEngine: create GenotypeLikelihoodCalculators on-the-fly when there are too many alleles

Add dalio WDL + supporting files to scripts/unsupported/evoquer

Update WDL to use clustered versions of the dalio tables

Remove old WDL directory to avoid potential confusion

Revert "Replace USING with ON in the main query, and adjust WHERE clause for better query performance"

This reverts commit 80e5140.

Switch to the latest gatk/master version of the GnarlyGenotyper

Switch back to shaded NIO classes

Add optimized query

Andrea bug fix: don't reuse Avro records in Storage API
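The commit title alone does not say what went wrong. One well-known hazard it likely refers to is the `reuse` argument of Avro's `DatumReader.read(reuse, decoder)`. That method may refill and return the `reuse` object instead of allocating a new record. Any code that keeps references to earlier records then sees them silently overwritten. The plain-Java sketch below reproduces the effect with a hypothetical `MutableRow` class rather than Avro itself. On the Avro side, passing `null` as the reuse argument corresponds to `readFresh`.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of the record-reuse pitfall. MutableRow is a
// hypothetical stand-in for a decoded Avro record, not an Avro class.
final class ReusePitfall {

    static final class MutableRow {
        long position;
    }

    // Buggy pattern: one object is refilled on every iteration, so every
    // element of the returned list is the same object holding the last value.
    static List<MutableRow> readReusing(long[] positions) {
        List<MutableRow> rows = new ArrayList<>();
        MutableRow row = new MutableRow();
        for (long p : positions) {
            row.position = p;   // overwrites the value seen by earlier list entries
            rows.add(row);
        }
        return rows;
    }

    // Fixed pattern: allocate a fresh object per record.
    static List<MutableRow> readFresh(long[] positions) {
        List<MutableRow> rows = new ArrayList<>();
        for (long p : positions) {
            MutableRow row = new MutableRow();
            row.position = p;
            rows.add(row);
        }
        return rows;
    }
}
```

Allocating per record costs a little more garbage-collection work. That is usually a small price compared with silently corrupted site data when records are buffered or sorted after reading.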

Evoquer integration tests

Remove temporary seenPositions hack from the EvoquerEngine

Fix some failing tests

Fix post-rebase breakage

Fix for Jackson dependency issue

Only update minicluster to 3+

Update to use Hadoop 3 everywhere
Add NO_GCE_CHECK=true

Make tests cloudy

Make errors more verbose

Replace 2.11 with scalaVersion

Update Evoquer integration test expected output

Update to google-cloud-java 0.117.0

Don't hardcode the project ID in executeQueryWithStorageAPI()

Update protobuf-java to 3.8.0

SortingCollection
* Fix bug in median calculation and add a unit test

* Parameterize the name of the table that holds the list of samples, and update the query to extract only those samples. This adds the ability to subset samples. It also fixes a bug where the unrestricted query extracted samples that were not in sample_list, so the inferred genotypes could be incorrect.

* Add a function to dry-run the query to get the estimated bytes processed

* Fix test case

* Revert ExcessHet changes to match master

* Disable CNNVariantPipelineTest.testTrainingReadModel until failures are resolved. (#6331)

Co-authored-by: Andrea Haessly <ahaessly@broadinstitute.org>
Co-authored-by: Chris Norman <cnorman@broadinstitute.org>
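The commit list mentions a fix to the median calculation but does not describe the original bug. As a hedged reference point, a median that handles both odd and even numbers of values can be sketched as follows. `MedianUtil` and `median` are hypothetical names, not the Evoquer code.

```java
import java.util.Arrays;

// Hypothetical helper, not the Evoquer implementation: median of a set of values.
final class MedianUtil {

    private MedianUtil() {}

    static double median(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("median of an empty array is undefined");
        }
        double[] sorted = values.clone();   // sort a copy so the caller's array is untouched
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        // Odd length: the middle element. Even length: the mean of the two
        // middle elements, computed in floating point to avoid truncation.
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```

Sorting a copy costs O(n log n). A selection algorithm would be faster for very large inputs, but sorting keeps the even-length case easy to verify in a unit test.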