New pipeline runner image. #841

Open
wants to merge 31 commits into
base: main
Conversation

bpblanken
Collaborator

No description provided.

@bpblanken changed the title from "Benb/pipeline runner docker electric boogaloo" to "New pipeline runner image." on Jul 22, 2024
@@ -1,11 +1,11 @@
 # Run locally with:
 #
-# gcloud builds submit --quiet --substitutions='_VEP_VERSION=110' --config .cloudbuild/vep-docker.cloudbuild.yaml v03_pipeline/deploy
+# gcloud builds submit --quiet --substitutions='_REFERENCE_GENOME=GRCh38' --config .cloudbuild/vep-docker.cloudbuild.yaml v03_pipeline/deploy
Collaborator Author

I've deferred automated builds to a separate ticket (#848) to avoid scope creep.

@@ -44,5 +44,6 @@ jobs:
 - name: Copy files to release directory
 run: |-
 gcloud storage rm -r gs://seqr-luigi/releases/dev/latest/ || echo 'No latest release'
-gcloud storage cp v03_pipeline/bin/* gs://seqr-luigi/releases/dev/latest/
+gcloud storage cp v03_pipeline/bin gs://seqr-luigi/releases/dev/latest/bin/
+gcloud storage cp v03_pipeline/var/vep_config gs://seqr-luigi/releases/dev/latest/var/vep_config
Collaborator Author

Changed the structure of the deploy artifact a bit so the VEP configs are deployed and accessible to the Dataproc init script.
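
For illustration, the new artifact layout can be sketched as a mapping from repository paths to bucket destinations. This is only a sketch in Python: `release_layout` is a hypothetical helper that does not exist in the pipeline, and the `environment` parameter assumes the `releases/$ENVIRONMENT/latest/` convention that the init script in this PR reads from.

```python
# Hypothetical helper (not part of v03_pipeline): maps local deploy inputs
# to their destinations in the release bucket, mirroring the workflow step.
RELEASE_BUCKET = "gs://seqr-luigi/releases"


def release_layout(environment: str) -> dict[str, str]:
    """Return {local path: destination} for one environment's latest release."""
    root = f"{RELEASE_BUCKET}/{environment}/latest"
    return {
        # Scripts now live under bin/ instead of the release root.
        "v03_pipeline/bin": f"{root}/bin/",
        # VEP configs are published alongside them under var/.
        "v03_pipeline/var/vep_config": f"{root}/var/vep_config",
    }


for src, dst in release_layout("dev").items():
    print(f"{src} -> {dst}")
```

With this layout, a consumer can fetch a single script by its full path under `bin/` rather than relying on everything sitting at the release root.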

@@ -146,7 +146,7 @@ grpcio==1.63.0
 # grpcio-status
 grpcio-status==1.48.2
 # via google-api-core
-hail==0.2.130
+hail==0.2.132
Collaborator Author

There was a Hail bug blocking the Docker container from working.

from v03_pipeline.lib.logger import get_logger


def run():
Collaborator Author

Just a shim to get rolling.

gcc -Wall -Werror -O2 /vep.c -o /vep
chmod u+s /vep

gcloud storage cp gs://seqr-luigi/releases/$ENVIRONMENT/latest/bin/download_vep_data.bash /download_vep_data.bash
Collaborator Author

I refactored this so that the code shared with the local deploy is modularized (in bash!! 😬). I'm testing it on Dataproc right now.

@@ -2,7 +2,7 @@
 "command": [
 "bash",
 "-c",
-"/vep --warning_file STDERR --format vcf -json --hgvs --biotype --canonical --mane --minimal --numbers --regulatory --allele_number --no_stats --cache --offline --assembly GRCh38 --fasta /opt/vep/.vep/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz --check_ref --dont_skip --plugin LoF,loftee_path:/plugins,gerp_bigwig:/opt/vep/.vep/gerp_conservation_scores.homo_sapiens.GRCh38.bw,human_ancestor_fa:/opt/vep/.vep/human_ancestor.fa.gz,conservation_file:/opt/vep/.vep/loftee.sql --plugin UTRAnnotator,file=/opt/vep/.vep/uORF_5UTR_GRCh38_PUBLIC.txt --plugin SpliceRegion,Extended --plugin AlphaMissense,file=/opt/vep/.vep/AlphaMissense_hg38.tsv.gz --dir_plugins /plugins -o STDOUT | sed s/5utr/fiveutr/g"
+"/vep GRCh38 --warning_file STDERR --format vcf -json --hgvs --biotype --canonical --mane --minimal --numbers --regulatory --allele_number --no_stats --cache --offline --assembly GRCh38 --fasta /opt/vep/.vep/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz --check_ref --dont_skip --plugin LoF,loftee_path:/plugins,gerp_bigwig:/opt/vep/.vep/gerp_conservation_scores.homo_sapiens.GRCh38.bw,human_ancestor_fa:/opt/vep/.vep/human_ancestor.fa.gz,conservation_file:/opt/vep/.vep/loftee.sql --plugin UTRAnnotator,file=/opt/vep/.vep/uORF_5UTR_GRCh38_PUBLIC.txt --plugin SpliceRegion,Extended --plugin AlphaMissense,file=/opt/vep/.vep/AlphaMissense_hg38.tsv.gz --dir_plugins /plugins -o STDOUT | sed s/5utr/fiveutr/g"
Collaborator Author

The vep binary was updated to accept a reference genome arg, which we then use to volume-mount a subdirectory of /vep_data into /opt/vep/.vep/.
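
The idea can be sketched roughly as follows. This is a Python illustration only, not the actual `/vep` binary (which is compiled from `/vep.c` and is not shown here). The function name, the image name, the `docker run` invocation, and the assumption that the subdirectory is named after the reference genome are all hypothetical.

```python
import shlex

# Assumptions for illustration only: the per-genome subdirectory naming,
# the image name, and the docker invocation are NOT taken from the PR.
VEP_DATA_ROOT = "/vep_data"
CONTAINER_VEP_DIR = "/opt/vep/.vep/"
SUPPORTED_GENOMES = ("GRCh37", "GRCh38")


def build_vep_command(
    reference_genome: str,
    vep_args: list[str],
    image: str = "example/vep-image",  # hypothetical image name
) -> list[str]:
    """Build a docker command that mounts one genome's data at the VEP dir."""
    if reference_genome not in SUPPORTED_GENOMES:
        raise ValueError(f"Unsupported reference genome: {reference_genome}")
    host_dir = f"{VEP_DATA_ROOT}/{reference_genome}"
    return [
        "docker", "run", "-i",
        # Read-only bind mount of the genome-specific data directory.
        "-v", f"{host_dir}:{CONTAINER_VEP_DIR}:ro",
        image, "vep", *vep_args,
    ]


cmd = build_vep_command("GRCh38", ["--assembly", "GRCh38", "--offline"])
print(shlex.join(cmd))
```

Because the mount target is the same for both genomes, the VEP config can keep referring to paths under `/opt/vep/.vep/` regardless of which genome is selected.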

"--minimal",
"--assembly", "GRCh37",
"--fasta", "/opt/vep/.vep/homo_sapiens/Homo_sapiens.GRCh37.dna.primary_assembly.fa.gz",
"--plugin", "LoF,human_ancestor_fa:/opt/vep/.vep/loftee_data/GRCh37/human_ancestor.fa.gz,filter_position:0.05,min_intron_size:15,conservation_file:/opt/vep/.vep/loftee_data/GRCh37/phylocsf_gerp.sql,gerp_file:/opt/vep/.vep/loftee_data/GRCh37/GERP_scores.final.sorted.txt.gz",
Collaborator Author

The loftee_data for GRCh37 is nested, but it isn't for GRCh38, so I think this is right as is.
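
To make the asymmetry concrete, here is a small Python sketch. The two paths are copied from the GRCh37 and GRCh38 VEP configs in this PR; the helper `human_ancestor_fa` itself is hypothetical and exists only for illustration.

```python
# Paths are copied from the two VEP configs in this PR; the helper
# function is hypothetical and not part of the pipeline.
VEP_DIR = "/opt/vep/.vep"


def human_ancestor_fa(reference_genome: str) -> str:
    """Return the LOFTEE human ancestor FASTA path for a reference genome."""
    if reference_genome == "GRCh37":
        # GRCh37 keeps its LOFTEE data nested under loftee_data/GRCh37/.
        return f"{VEP_DIR}/loftee_data/GRCh37/human_ancestor.fa.gz"
    if reference_genome == "GRCh38":
        # GRCh38 keeps the file directly under the VEP directory.
        return f"{VEP_DIR}/human_ancestor.fa.gz"
    raise ValueError(f"Unsupported reference genome: {reference_genome}")


for genome in ("GRCh37", "GRCh38"):
    print(genome, human_ancestor_fa(genome))
```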

"env": {
"PERL5LIB": ""
},
"vep_json_schema": "Struct{assembly_name:String,allele_string:String,ancestral:String,colocated_variants:Array[Struct{aa_allele:String,aa_maf:Float64,afr_allele:String,afr_maf:Float64,allele_string:String,amr_allele:String,amr_maf:Float64,clin_sig:Array[String],end:Int32,eas_allele:String,eas_maf:Float64,ea_allele:String,ea_maf:Float64,eur_allele:String,eur_maf:Float64,exac_adj_allele:String,exac_adj_maf:Float64,exac_allele:String,exac_afr_allele:String,exac_afr_maf:Float64,exac_amr_allele:String,exac_amr_maf:Float64,exac_eas_allele:String,exac_eas_maf:Float64,exac_fin_allele:String,exac_fin_maf:Float64,exac_maf:Float64,exac_nfe_allele:String,exac_nfe_maf:Float64,exac_oth_allele:String,exac_oth_maf:Float64,exac_sas_allele:String,exac_sas_maf:Float64,id:String,minor_allele:String,minor_allele_freq:Float64,phenotype_or_disease:Int32,pubmed:Array[Int32],sas_allele:String,sas_maf:Float64,somatic:Int32,start:Int32,strand:Int32}],context:String,end:Int32,id:String,input:String,intergenic_consequences:Array[Struct{allele_num:Int32,consequence_terms:Array[String],impact:String,minimised:Int32,variant_allele:String}],most_severe_consequence:String,motif_feature_consequences:Array[Struct{allele_num:Int32,consequence_terms:Array[String],high_inf_pos:String,impact:String,minimised:Int32,motif_feature_id:String,motif_name:String,motif_pos:Int32,motif_score_change:Float64,strand:Int32,variant_allele:String}],regulatory_feature_consequences:Array[Struct{allele_num:Int32,biotype:String,consequence_terms:Array[String],impact:String,minimised:Int32,regulatory_feature_id:String,variant_allele:String}],seq_region_name:String,start:Int32,strand:Int32,transcript_consequences:Array[Struct{allele_num:Int32,amino_acids:String,biotype:String,canonical:Int32,ccds:String,cdna_start:Int32,cdna_end:Int32,cds_end:Int32,cds_start:Int32,codons:String,consequence_terms:Array[String],distance:Int32,domains:Array[Struct{db:String,name:String}],exon:String,gene_id:String,gene_pheno:Int32,gene_symbol:String,gene_symbol_source:String,hgnc_id:String,hgvsc:String,hgvsp:String,hgvs_offset:Int32,impact:String,intron:String,lof:String,lof_flags:String,lof_filter:String,lof_info:String,minimised:Int32,polyphen_prediction:String,polyphen_score:Float64,protein_end:Int32,protein_start:Int32,protein_id:String,sift_prediction:String,sift_score:Float64,strand:Int32,swissprot:String,transcript_id:String,trembl:String,uniparc:String,variant_allele:String}],variant_class:String}"
Collaborator Author

Unaltered thus far.

@bpblanken marked this pull request as ready for review on July 26, 2024 21:39
@bpblanken requested a review from a team as a code owner on July 26, 2024 21:39
@@ -0,0 +1,23 @@
FROM docker:dind as BUILD