lb merge gvs branch #8248

Closed
wants to merge 664 commits into from

lbergelson
Member

@lbergelson lbergelson commented Mar 17, 2023

rsasch and others added 30 commits September 29, 2021 14:49
* add param to just print final query
* update print message
* export VAT to TSV

* call export vat task

* adding array values

* pass the normalized vcf

* clean up date spacing

* loop through vms rather than in the task

* remove partitioning because we are not going to do it for now
* add filtering on the FT tag level

* remove duplicates by 4 cols, not only 2

* make annotations machine bigger

* add duplicate count info

* don't rename things---use the names that already exist!

* drop all the excessive AS_QUALapprox info since it just adds weight

* swap gzip

* check with Lee about VAT - markings

* template update

* add in line counts for metrics

* get the count of high alt allele sites

* AC_hemi for all

* add new docker image

* get sample counts

* clean up files for memory preservation

* re-order errors

* pass dropped variants

* comments cleanup
* removed streaming

* removed streaming

* support for indexes as input

* move to same directory

* move to same directory

* move to same directory part two

* comments

* comments

* comments
* First implementation of working Reference Ranges-based Extract!

* added integration test

* PR cleanup

* parameter cleanup

* fix for upstream deletions

* comments from PR

* updated post rebase

* cherry pick

* Set gcloud config directory in travis (#7525)

* Set gcloud config directory in travis

* This fixes an issue while installing gcloud on travis due to permissions in the root directories

Co-authored-by: Louis Bergelson <louisb@broadinstitute.org>
* fix bug copying files

* expose branch
* add reblocking wdl
* rename wdl
* make service account and site_id optional
* add requester pays option
* fix basename
* change all outputs to file
* change outputs to external bucket path
* use pending BQ writer
* finalize stream before commit
* update create tables for ref ranges
* fix flushing and closing
* use python to update data model
* return error if response is not 200
* add bq table creation to assign ids, cleanup
* update import with ref_ranges as default and 40 as drop state. update docs
* fix tables for duplicate check
* create separate readme for AoU Gvs workspace
* handle null vet and pet creator
* add initial test for write api
* added sample_load_status table creation
* log load status and verify sample is not loaded
* used _default stream instead (#7588)
* safety checks for restarts via preemptibles (#7590)
* add is_loaded check back to GvsCreateFilterSet
* 3 retries and bigger machine for writeAPI ingest (#7592)
gbggrant and others added 27 commits January 25, 2023 10:00
* Renaming everything to use 'dataset_name'
* Added a new suite of tools for variant filtering based on site-level annotations. (#7954)

* Adds wdl that tests joint VCF filtering tools (#7932)

* adding filtering wdl

* renaming pipeline

* addressing comments

* added bash

* renaming json

* adding glob to extract for extra files

* changing dollar signs

* small comments

* Added changes for specifying model backend and other tweaks to WDLs and environment.

* Added classes for representing a collection of labeled variant annotations.

* Added interfaces for modeling and scoring backends.

* Added a new suite of tools for variant filtering based on site-level annotations.

* Added integration tests.

* Added test resources and expected results.

* Miscellaneous changes.

* Removed non-ASCII characters.

* Added documentation for TrainVariantAnnotationsModel and addressed review comments.

Co-authored-by: meganshand <mshand@broadinstitute.org>

* Added toggle for selecting resource-matching strategies and miscellaneous minor fixes to new annotation-based filtering tools. (#8049)

* Adding use_allele_specific_annotation arg and fixing task with empty input in JointVcfFiltering WDL (#8027)

* Small changes to JointVCFFiltering WDL

* making default for use_allele_specific_annotations

* addressing comments

* first stab

* wire through WDL changes

* fixed typo

* set model_backend input value

* add gatk_override to JointVcfFiltering call

* typo in indel_annotations

* make model_backend optional

* tabs and spaces

* make all model_backends optional

* use gatk 4.3.0

* no point in changing the table names as this is a POC

* adding new branch to dockstore

* adding in branching logic for classic VQSR vs VQSR-Lite

* implementing the separate schemas for the VQSR vs VQSR-Lite branches, including Java changes necessary to produce the different tsv files

* passing classic flag to indel run of CreateFilteringFiles

* Update GvsCreateFilterSet.wdl

cleaning up verbiage

* Removed mapping error rate from estimate of denoised copy ratios output by gCNV and updated sklearn. (#7261)

* clean up sloppy comment

---------

Co-authored-by: samuelklee <samuelklee@users.noreply.github.com>
Co-authored-by: meganshand <mshand@broadinstitute.org>
Co-authored-by: Rebecca Asch <rasch@broadinstitute.org>
* update deliverables to include VDS notes

* add some notes about default behavior

* rip out VCF AoU instructions

* tidy up useability

* update comment on defaults

* Update scripts/variantstore/docs/aou/AOU_DELIVERABLES.md

Co-authored-by: Bec Asch <rsasch@users.noreply.github.com>

* Update scripts/variantstore/docs/aou/AOU_DELIVERABLES.md

Co-authored-by: Bec Asch <rsasch@users.noreply.github.com>

* Update scripts/variantstore/docs/aou/AOU_DELIVERABLES.md

Co-authored-by: Miguel Covarrubias <mcovarr@users.noreply.github.com>

* update comments on python code

* put correct script location in

* fix markdown

* add filter input to avro creation

* cost notes

* fix nits

* add extract for control samples

* add sts instructions

* Update scripts/variantstore/docs/aou/AOU_DELIVERABLES.md

Co-authored-by: Bec Asch <rsasch@users.noreply.github.com>

---------

Co-authored-by: Bec Asch <rsasch@users.noreply.github.com>
Co-authored-by: Miguel Covarrubias <mcovarr@users.noreply.github.com>
* limiting scatter width for beta customers in order to stay under google quotas

* Just moving the beta scatter width to an argument so we can experiment with different numbers (if quotas are increased for a customer) without having to modify our code

* renaming beta variable to something clearer then rearranging logic to unify the beta and broad use cases better.
…into beta workflow (#8200)

* limiting scatter width for beta customers in order to stay under google quotas

* Just moving the beta scatter width to an argument so we can experiment with different numbers (if quotas are increased for a customer) without having to modify our code

* Piping beta information down to GvsImportGenomes

* renaming beta variable to something clearer then rearranging logic to unify the beta and broad use cases better.

* renaming beta variable

* temporarily add it to dockstore

* updating to a newer gatk

* CromwellOnAzure + Azure SQL DB + AAD first steps doc [VS-805] (#8191)

* bits of cleanup

* Don't know why this got yanked in when I rebased

* haste makes waste!

---------

Co-authored-by: Miguel Covarrubias <mcovarr@users.noreply.github.com>
Update GvsExtractCallset to support VQSR Lite
* Create new ExtractCohortLite program for extraction of VQSR Lite data.
* Update dockers
* Make GvsExtractCallset.wdl run using VQSR Lite or VQSR Classic
* Name vqsr lite version of filter_set_info table consistently between wdls.
* Update GvsCreateFilterSet.wdl to handle error in JointVcfFiltering and to not load filtered sites into filter_set_info_vqsr_lite.
* updating docker images

* updating docker images... further

* updating docker images to one that's actually correct

* FOUND THE LAST ONE

* Testing filter creation with George's docker image

* dockstore

* cleaning up docker
* VS-815: Add Support for YNG to VQSR Lite
* Up the memory of a task in JointVcfFiltering.wdl.
* Use 'HDD' rather than 'LOCAL' in JointVcfFiltering.wdl
* Update GvsCalculatePrecisionAndSensitivity.wdl to allow for different scale of calibration_sensitivity vs. lod score.
Also retrieving score from JointVcfFiltering and storing that in BQ and in the VCF.
* deleted VDS

* only one left
…tion of Delta (#8205)

* Lees name

* add vds validation script written by Tim

* fix rd tim typo

* make sure temp dir is set and not default for validate()

* swap to consistent kebab case

Co-authored-by: Miguel Covarrubias <mcovarr@users.noreply.github.com>

* clean up validation

* put init in the right place

* add proper example to notes

* update code formatting

---------

Co-authored-by: Miguel Covarrubias <mcovarr@users.noreply.github.com>
* Lees name

* add vds validation script written by Tim

* fix rd tim typo

* make sure temp dir is set and not default for validate()

* swap to consistent kebab case

Co-authored-by: Miguel Covarrubias <mcovarr@users.noreply.github.com>

* clean up validation

* put init in the right place

* add proper example to notes

* update code formatting

* update review

---------

Co-authored-by: Miguel Covarrubias <mcovarr@users.noreply.github.com>
@droazen droazen closed this Mar 17, 2023
@lbergelson
Member Author

Testing how nasty a merge is.

@droazen
Contributor

droazen commented Mar 17, 2023

We'll use the work in this branch as the basis for standalone PRs that don't involve an actual git merge, starting with just the build.gradle changes.
