New pipeline runner image. #841
* Add mito local constraint
* Fix tests
* lint
@@ -1,11 +1,11 @@
 # Run locally with:
 #
-# gcloud builds submit --quiet --substitutions='_VEP_VERSION=110' --config .cloudbuild/vep-docker.cloudbuild.yaml v03_pipeline/deploy
+# gcloud builds submit --quiet --substitutions='_REFERENCE_GENOME=GRCh38' --config .cloudbuild/vep-docker.cloudbuild.yaml v03_pipeline/deploy
I've left automated builds to another ticket to avoid scope creep:
…b.com:broadinstitute/seqr-loading-pipelines into benb/pipeline-runner-docker-electric-boogaloo
@@ -44,5 +44,6 @@ jobs:
 - name: Copy files to release directory
 run: |-
 gcloud storage rm -r gs://seqr-luigi/releases/dev/latest/ || echo 'No latest release'
-gcloud storage cp v03_pipeline/bin/* gs://seqr-luigi/releases/dev/latest/
+gcloud storage cp v03_pipeline/bin gs://seqr-luigi/releases/dev/latest/bin/
+gcloud storage cp v03_pipeline/var/vep_config gs://seqr-luigi/releases/dev/latest/var/vep_config
Changed the structure of the deploy artifact a bit so the VEP configs are deployed and accessible to the Dataproc init script.
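For reference, a rough sketch of how a consumer such as the Dataproc init script could address files under the new layout. `release_uri` is a hypothetical helper for illustration, not code from this PR; the bucket prefix and the `bin/` and `var/` paths are taken from the workflow step above.

```python
# Hypothetical helper (not part of this PR): builds object URIs under the
# release prefix that the "Copy files to release directory" step writes to.
RELEASE_PREFIX = "gs://seqr-luigi/releases"


def release_uri(environment: str, relative_path: str) -> str:
    """Return the URI of a deployed file, e.g. under bin/ or var/."""
    return f"{RELEASE_PREFIX}/{environment}/latest/{relative_path.lstrip('/')}"


if __name__ == "__main__":
    # Scripts now live under bin/ and VEP configs under var/vep_config.
    print(release_uri("dev", "bin/download_vep_data.bash"))
    print(release_uri("dev", "var/vep_config"))
```

With `environment="dev"`, the first call yields the same path the init script fetches when `$ENVIRONMENT` is `dev`.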
@@ -146,7 +146,7 @@ grpcio==1.63.0
 # grpcio-status
 grpcio-status==1.48.2
 # via google-api-core
-hail==0.2.130
+hail==0.2.132
There was a Hail bug blocking the Docker container from working.
from v03_pipeline.lib.logger import get_logger


def run():
Just a shim to get rolling.
gcc -Wall -Werror -O2 /vep.c -o /vep
chmod u+s /vep

gcloud storage cp gs://seqr-luigi/releases/$ENVIRONMENT/latest/bin/download_vep_data.bash /download_vep_data.bash
I refactored this so that the code shared with the local deploy is modularized (in bash!! 😬). I'm testing it on Dataproc right now.
@@ -2,7 +2,7 @@
 "command": [
 "bash",
 "-c",
-"/vep --warning_file STDERR --format vcf -json --hgvs --biotype --canonical --mane --minimal --numbers --regulatory --allele_number --no_stats --cache --offline --assembly GRCh38 --fasta /opt/vep/.vep/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz --check_ref --dont_skip --plugin LoF,loftee_path:/plugins,gerp_bigwig:/opt/vep/.vep/gerp_conservation_scores.homo_sapiens.GRCh38.bw,human_ancestor_fa:/opt/vep/.vep/human_ancestor.fa.gz,conservation_file:/opt/vep/.vep/loftee.sql --plugin UTRAnnotator,file=/opt/vep/.vep/uORF_5UTR_GRCh38_PUBLIC.txt --plugin SpliceRegion,Extended --plugin AlphaMissense,file=/opt/vep/.vep/AlphaMissense_hg38.tsv.gz --dir_plugins /plugins -o STDOUT | sed s/5utr/fiveutr/g"
+"/vep GRCh38 --warning_file STDERR --format vcf -json --hgvs --biotype --canonical --mane --minimal --numbers --regulatory --allele_number --no_stats --cache --offline --assembly GRCh38 --fasta /opt/vep/.vep/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz --check_ref --dont_skip --plugin LoF,loftee_path:/plugins,gerp_bigwig:/opt/vep/.vep/gerp_conservation_scores.homo_sapiens.GRCh38.bw,human_ancestor_fa:/opt/vep/.vep/human_ancestor.fa.gz,conservation_file:/opt/vep/.vep/loftee.sql --plugin UTRAnnotator,file=/opt/vep/.vep/uORF_5UTR_GRCh38_PUBLIC.txt --plugin SpliceRegion,Extended --plugin AlphaMissense,file=/opt/vep/.vep/AlphaMissense_hg38.tsv.gz --dir_plugins /plugins -o STDOUT | sed s/5utr/fiveutr/g"
The vep binary was updated to accept a reference genome arg, which we then use to volume mount a subdirectory of /vep_data up into /opt/vep/.vep/.
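To illustrate the mount logic described here, a minimal Python sketch follows. The real wrapper is the C program compiled from /vep.c, so this is not its implementation: the function name, the image name, the allow-list of genomes, the `vep` command inside the image, and the exact `docker run` flags are all assumptions for illustration.

```python
from __future__ import annotations

import posixpath

# Paths taken from the comment above; everything else is hypothetical.
HOST_VEP_DATA = "/vep_data"
CONTAINER_VEP_DIR = "/opt/vep/.vep"
SUPPORTED_GENOMES = ("GRCh37", "GRCh38")  # assumed allow-list


def vep_docker_command(reference_genome: str, image: str, vep_args: list[str]) -> list[str]:
    """Build a docker invocation that mounts the per-genome data directory."""
    # Reject anything outside the allow-list before it is used in a path.
    if reference_genome not in SUPPORTED_GENOMES:
        raise ValueError(f"unsupported reference genome: {reference_genome!r}")
    host_dir = posixpath.join(HOST_VEP_DATA, reference_genome)
    return [
        "docker", "run", "-i",
        "-v", f"{host_dir}:{CONTAINER_VEP_DIR}",
        image,
        "vep", *vep_args,  # assumed entry command inside the image
    ]


if __name__ == "__main__":
    print(vep_docker_command("GRCh38", "example/vep-image", ["--offline", "--cache"]))
```

Checking the argument against a fixed allow-list before building the path seems prudent here, since the init script marks /vep setuid (`chmod u+s /vep`).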
"--minimal",
"--assembly", "GRCh37",
"--fasta", "/opt/vep/.vep/homo_sapiens/Homo_sapiens.GRCh37.dna.primary_assembly.fa.gz",
"--plugin", "LoF,human_ancestor_fa:/opt/vep/.vep/loftee_data/GRCh37/human_ancestor.fa.gz,filter_position:0.05,min_intron_size:15,conservation_file:/opt/vep/.vep/loftee_data/GRCh37/phylocsf_gerp.sql,gerp_file:/opt/vep/.vep/loftee_data/GRCh37/GERP_scores.final.sorted.txt.gz",
The loftee_data for GRCh37 is nested, but it isn't nested for GRCh38, so I think this is right as is.
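A small sketch of the asymmetry, in case it helps whoever touches these configs next. The two directories are copied from the `human_ancestor_fa` values in the GRCh37 and GRCh38 configs in this PR; the dict and function names are hypothetical and do not exist in the pipeline.

```python
# Directory that holds the LOFTEE human ancestor FASTA inside the container,
# per reference genome. GRCh37 is nested under loftee_data/, GRCh38 is not.
LOFTEE_DATA_DIR = {
    "GRCh37": "/opt/vep/.vep/loftee_data/GRCh37",
    "GRCh38": "/opt/vep/.vep",
}


def human_ancestor_fasta(reference_genome: str) -> str:
    """Return the human_ancestor_fa path used in the LoF plugin string."""
    try:
        data_dir = LOFTEE_DATA_DIR[reference_genome]
    except KeyError:
        raise ValueError(f"unsupported reference genome: {reference_genome!r}") from None
    return f"{data_dir}/human_ancestor.fa.gz"


if __name__ == "__main__":
    for genome in ("GRCh37", "GRCh38"):
        print(genome, human_ancestor_fasta(genome))
```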
"env": {
"PERL5LIB": ""
},
"vep_json_schema": "Struct{assembly_name:String,allele_string:String,ancestral:String,colocated_variants:Array[Struct{aa_allele:String,aa_maf:Float64,afr_allele:String,afr_maf:Float64,allele_string:String,amr_allele:String,amr_maf:Float64,clin_sig:Array[String],end:Int32,eas_allele:String,eas_maf:Float64,ea_allele:String,ea_maf:Float64,eur_allele:String,eur_maf:Float64,exac_adj_allele:String,exac_adj_maf:Float64,exac_allele:String,exac_afr_allele:String,exac_afr_maf:Float64,exac_amr_allele:String,exac_amr_maf:Float64,exac_eas_allele:String,exac_eas_maf:Float64,exac_fin_allele:String,exac_fin_maf:Float64,exac_maf:Float64,exac_nfe_allele:String,exac_nfe_maf:Float64,exac_oth_allele:String,exac_oth_maf:Float64,exac_sas_allele:String,exac_sas_maf:Float64,id:String,minor_allele:String,minor_allele_freq:Float64,phenotype_or_disease:Int32,pubmed:Array[Int32],sas_allele:String,sas_maf:Float64,somatic:Int32,start:Int32,strand:Int32}],context:String,end:Int32,id:String,input:String,intergenic_consequences:Array[Struct{allele_num:Int32,consequence_terms:Array[String],impact:String,minimised:Int32,variant_allele:String}],most_severe_consequence:String,motif_feature_consequences:Array[Struct{allele_num:Int32,consequence_terms:Array[String],high_inf_pos:String,impact:String,minimised:Int32,motif_feature_id:String,motif_name:String,motif_pos:Int32,motif_score_change:Float64,strand:Int32,variant_allele:String}],regulatory_feature_consequences:Array[Struct{allele_num:Int32,biotype:String,consequence_terms:Array[String],impact:String,minimised:Int32,regulatory_feature_id:String,variant_allele:String}],seq_region_name:String,start:Int32,strand:Int32,transcript_consequences:Array[Struct{allele_num:Int32,amino_acids:String,biotype:String,canonical:Int32,ccds:String,cdna_start:Int32,cdna_end:Int32,cds_end:Int32,cds_start:Int32,codons:String,consequence_terms:Array[String],distance:Int32,domains:Array[Struct{db:String,name:String}],exon:String,gene_id:String,gene_pheno:Int32,gene_symbol:String,gene_symbol_source:String,hgnc_id:String,hgvsc:String,hgvsp:String,hgvs_offset:Int32,impact:String,intron:String,lof:String,lof_flags:String,lof_filter:String,lof_info:String,minimised:Int32,polyphen_prediction:String,polyphen_score:Float64,protein_end:Int32,protein_start:Int32,protein_id:String,sift_prediction:String,sift_score:Float64,strand:Int32,swissprot:String,transcript_id:String,trembl:String,uniparc:String,variant_allele:String}],variant_class:String}"
Unaltered thus far.
@@ -0,0 +1,23 @@
+FROM docker:dind as BUILD
This is the actual new Dockerfile, replacing https://github.com/broadinstitute/seqr-loading-pipelines/blob/main/docker/Dockerfile
No description provided.