Should the Downstream plugin predict stop_lost downstream sequences #286

Closed
susannasiebert opened this issue Feb 25, 2020 · 12 comments

@susannasiebert

Right now it appears that the Downstream plugin doesn't output a predicted downstream sequence for stop_lost variants (or outputs only a string of Xs). Is this the expected behavior? Would it be possible to add this feature?

@helensch
Contributor

Hi

I am looking into the downstream sequence for frameshift,stop_lost variants.

Could you please let me know which version of VEP you are using and the VEP command you are running with the Downstream plugin?

Thanks

@helensch helensch self-assigned this Feb 26, 2020
@susannasiebert
Author

It looks like the particular VCF I'm looking at is a bit older and was annotated with VEP 84. Here is the VEP header:
##VEP=v84 cache=/var/lib/cwl/stgcfd43713-2826-47bb-88e5-98975e7395ce/cache/homo_sapiens/84_GRCh38 db=. dbSNP=146 genebuild=2014-07 COSMIC=75 polyphen=2.2.2 regbuild=13.0 sift=sift5.2.2 ClinVar=201601 gencode=GENCODE 22 HGMD-PUBLIC=20154 ESP=20141103 assembly=GRCh38.p5

@helensch
Contributor

Hi

Are you running VEP in offline mode (using the flag --offline)?

If VEP is run in offline mode using the --offline flag, a FASTA file (supplied with --fasta) is required to retrieve the 3' UTR sequences.

The sequence may be incomplete without a FASTA file or a database connection.
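A minimal sketch of a pre-flight check for this requirement, assuming the VEP options are held in a Python argument list. The helper name `fasta_missing_for_offline` is hypothetical and not part of VEP; it only flags runs that use `--offline` without `--fasta`.

```python
from typing import Sequence


def fasta_missing_for_offline(args: Sequence[str]) -> bool:
    """Return True if the options request offline mode without a FASTA file.

    Hypothetical helper: it only inspects the argument list (it does not run
    VEP) and treats '-opt', '--opt' and '--opt=value' as the same option name.
    """
    flags = {a.lstrip("-").split("=", 1)[0] for a in args if a.startswith("-")}
    return "offline" in flags and "fasta" not in flags


cmd = ["--vcf", "--offline", "--cache", "--plugin", "Downstream"]
print(fasta_missing_for_offline(cmd))  # prints: True
```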

I have updated the documentation for the plugin for future releases.

Thank you for flagging this issue.

Helen

@huimingx

Hi Helen,

I've been testing the Downstream plugin on stop-lost variants with Susanna, and with the following command options I still don't get a downstream sequence:

 --vcf -term SO -transcript_version --offline --cache --symbol --dir ./VEP_cache --check_existing --flag_pick --fasta all_sequences.fa --plugin Downstream --plugin Wildtype --everything --assembly GRCh38 --cache_version 95 --species homo_sapiens

The example variant I'm looking at is chr1:212360768, REF TA, ALT T.

Thanks.

@helensch
Contributor

helensch commented Apr 24, 2020

Hi

Are you getting any information returned on the change in length relative to the reference protein?

The Downstream plugin returns two fields:
- DownstreamProtein: Predicted downstream translation for frameshift mutations
- ProteinLengthChange: Predicted change in protein product length

When I ran VEP with the Downstream plugin on your example variant, the following was returned:

Location chr1:212360769
Allele -
Consequence frameshift_variant,stop_lost
Amino_acids */X
Codons tAa/ta
DownstreamProtein
ProteinLengthChange 1

Are you getting a value returned for ProteinLengthChange?

There is no downstream protein because the downstream sequence starts with 'A': after the deletion, the remaining "ta" of the codon is completed by that 'A', giving TAA, which is again a stop codon.
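To illustrate that reasoning, here is a small Python sketch (not the plugin's own code) that applies the deletion shown in `Codons tAa/ta` and translates the result in frame. Only the reference codon and the leading 'A' of the downstream sequence come from the output above; the remaining downstream bases are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_until_stop(seq: str) -> str:
    """Translate seq in frame from its first base, stopping before the first stop codon."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


ref_codon = "TAA"       # Codons: tAa (reference stop codon)
downstream = "AGCTTGA"  # hypothetical; only the leading 'A' is taken from the thread
# Deleting the middle A leaves "ta", which is completed by the first downstream base.
mutant = ref_codon[0] + ref_codon[2] + downstream
print(mutant[:3], repr(translate_until_stop(mutant)))  # prints: TAA ''
```

Because the first codon of the mutated sequence is already a stop, the predicted downstream translation is empty, which matches the empty DownstreamProtein field.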

To check that the Downstream plugin is returning a sequence, you can use this example variant:
#CHROM POS ID REF ALT QUAL FILTER INFO
19 643600 test_2 CCT C . . .

Location 19:643601-643602
Allele -
Consequence frameshift_variant,stop_lost
Amino_acids S*/SX
Codons tcCTga/tcga
DownstreamProtein SRP
ProteinLengthChange 3

Regards
Helen

@susannasiebert
Author

Hi Helen,

Thank you for looking into this. We confirmed that the example variant results in the expected DownstreamProtein sequence. We also identified a few similar variants in our VCFs so we think we have our VEP commands working correctly now.

A more general question about VEP consequence annotation is whether variants that essentially "replace" the stop codon with a new one should be annotated as stop_retained_variant instead of stop_lost.

@aparton
Contributor

aparton commented Apr 29, 2020

Hi @susannasiebert, @huimingx

I'm glad to hear that you've got your VEP commands working correctly.

Regarding your more general question of when we assign stop_retained_variant and stop_lost, we take our consequence terms and descriptions from the Sequence Ontology database, and we use the following definition for stop_lost:

stop_lost: http://www.sequenceontology.org/browser/current_release/term/SO:0001578 - "A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript."

So the consequence we assign depends on where a theoretical new stop codon is positioned.
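As a simplified illustration covering single in-frame codon substitutions only, the distinction can be sketched as follows. It assumes the Sequence Ontology definitions (stop_lost as quoted above; stop_retained_variant, SO:0001567, where a base in the terminator codon changes but the terminator remains) and is not VEP's implementation. Frameshifts such as TA>T also need the downstream sequence to know where the next stop falls.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def stop_codon_consequence(ref_codon: str, alt_codon: str) -> str:
    """Simplified SO term for an in-frame substitution within a stop codon."""
    if CODON_TABLE[ref_codon.upper()] != "*":
        raise ValueError("reference codon is not a stop codon")
    if CODON_TABLE[alt_codon.upper()] == "*":
        return "stop_retained_variant"  # terminator changed but still a stop
    return "stop_lost"                  # terminator now codes for an amino acid


print(stop_codon_consequence("TAG", "TAT"))  # prints: stop_lost
print(stop_codon_consequence("TAA", "TAG"))  # prints: stop_retained_variant
```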

With the release of Ensembl 100 (officially released this afternoon), we have introduced the option --shift_3prime into VEP, where insertions and deletions within repeated regions will be shifted as far as possible in the 3' direction before consequence calculation. In the example provided by @huimingx above, this will now correctly provide a downstream consequence for your variant - see: http://rest.ensembl.org/vep/human/region/1:212360768-212360769/T?shift_3prime=1&content-type=application/json&minimal=1
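A rough sketch of the idea behind 3' shifting, assuming a pure deletion on a plain 0-based forward-strand sequence. The function names are hypothetical, and VEP's own implementation also covers insertions (as described above) and details not modelled here. A deletion can be moved one base 3' without changing the resulting sequence whenever the first deleted base equals the base just after the deleted block. In the chr1:212360768 TA>T case the deleted A sits in a run of As (tAa followed by A), so shifting moves it toward the 3' end of that run.

```python
def shift_deletion_3prime(seq: str, start: int, length: int) -> int:
    """Return the 3'-most 0-based start for deleting seq[start:start + length]
    that leaves the same resulting sequence."""
    while start + length < len(seq) and seq[start] == seq[start + length]:
        start += 1
    return start


def apply_deletion(seq: str, start: int, length: int) -> str:
    """Remove seq[start:start + length]."""
    return seq[:start] + seq[start + length:]


seq = "GTAAAC"  # hypothetical sequence with a run of As
shifted = shift_deletion_3prime(seq, 2, 1)
print(shifted, apply_deletion(seq, 2, 1) == apply_deletion(seq, shifted, 1))  # prints: 4 True
```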

If you have any other issues or if there's anything else we can do to help, please feel free to get in touch.

Kind Regards,
Andrew

@susannasiebert
Author

susannasiebert commented May 6, 2020

Hi @aparton,

Sorry to bother you about this again. We thought we had it fixed but we're still seeing some odd behavior. We are not seeing any XXXs in the DownstreamProtein field anymore, but we are also not seeing any downstream sequence predictions if the variant is stop_lost only (it works for frameshift_variant&stop_lost). For example:

1	158095120	1_158095120_G/T	G	T	.	.	CSQ=T|stop_lost|HIGH|KIRREL1|ENSG00000183853|Transcript|ENST00000359209.10|protein_coding|15/15||ENST00000359209.10:c.2274G>T|ENSP00000352138.6:p.Ter758TyrextTer85|2341|2274|758|*/Y|taG/taT|||1||1|SNV|HGNC|HGNC:15734|YES|1|P1|CCDS1172.2|ENSP00000352138|Q96J84||UPI0000443FBD|||||||||||||||||||||||||||||||||||||MLSLLVWILTLSDTFSQGTQTRFSQEPADQTVVAGQRAVLPCVLLNYSGIVQWTKDGLALGMGQGLKAWPRYRVVGSADAGQYNLEITDAELSDDASYECQATEAALRSRRAKLTVLIPPEDTRIDGGPVILLQAGTPHNLTCRAFNAKPAATIIWFRDGTQQEGAVASTELLKDGKRETTVSQLLINPTDLDIGRVFTCRSMNEAIPSGKETSIELDVHHPPTVTLSIEPQTVQEGERVVFTCQATANPEILGYRWAKGGFLIEDAHESRYETNVDYSFFTEPVSCEVHNKVGSTNVSTLVNVHFAPRIVVDPKPTTTDIGSDVTLTCVWVGNPPLTLTWTKKDSNMVLSNSNQLLLKSVTQADAGTYTCRAIVPRIGVAEREVPLYVNGPPIISSEAVQYAVRGDGGKVECFIGSTPPPDRIAWAWKENFLEVGTLERYTVERTNSGSGVLSTLTINNVMEADFQTHYNCTAWNSFGPGTAIIQLEEREVLPVGIIAGATIGASILLIFFFIALVFFLYRRRKGSRKDVTLRKLDIKVETVNREPLTMHSDREDDTASVSTATRVMKAIYSSFKDDVDLKQDLRCDTIDTREEYEMKDPTNGYYNVRAHEDRPSSRAVLYADYRAPGPARFDGRPSSRLSHSSGYAQLNTYSRGPASDYGPEPTPPGPAAPAGTDTTSQLSYENYEKFNSHPFPGAAGYPTYRLGYPQAPPSGLERTPYEAYDPIGKYATATRFSYTSQHSDYGQRFQQRMQTHV|||||||||||||||||||,T|stop_lost|HIGH|KIRREL1|ENSG00000183853|Transcript|ENST00000360089.8|protein_coding|13/13||ENST00000360089.8:c.1782G>T|ENSP00000353202.4:p.Ter594TyrextTer85|2373|1782|594|*/Y|taG/taT|||1|||SNV|HGNC|HGNC:15734||1|||ENSP00000353202||Q5W0F9|UPI00001AA15B|||||||||||||||||||||||||||||||||||||MGQGLKAWPRYRVVGSADAGQYNLEITDAELSDDASYECQATEAALRSRRAKLTVLNPPTVTLSIEPQTVQEGERVVFTCQATANPEILGYRWAKGGFLIEDAHESRYETNVDYSFFTEPVSCEVHNKVGSTNVSTLVNVHFAPRIVVDPKPTTTDIGSDVTLTCVWVGNPPLTLTWTKKDSNMVLSNSNQLLLKSVTQADAGTYTCRAIVPRIGVAEREVPLYVNGPPIISSEAVQYAVRGDGGKVECFIGSTPPPDRIAWAWKENFLEVGTLERYTVERTNSGSGVLSTLTINNVMEADFQTHYNCTAWNSFGPGTAIIQLEEREVLPVGIIAGATIGASILLIFFFIALVFFLYRRRKGSRKDVTLRKLDIKVETVNREPLTMHSDREDDTASVSTATRVMKAIYSSFKDDVDLKQDLRCDTIDTREEYEMKDPTNGYYNVRAHEDRPSSRAVLYADYRAPGPARFDGRPSSRLSHSSGYAQLNTYSRGPASDYGPEPTPPGPAAPAGTDTTSQLSYENYEKFNSHPFPGAAGYPTYRLGYPQAPPSGLERTPYEAYDPIGKYATATRFSYTSQHSDYGQRFQQ
RMQTHV|||||||||||||||||||,T|stop_lost|HIGH|KIRREL1|ENSG00000183853|Transcript|ENST00000368172.1|protein_coding|11/11||ENST00000368172.1:c.1716G>T|ENSP00000357154.1:p.Ter572TyrextTer85|1728|1716|572|*/Y|taG/taT|||1|||SNV|HGNC|HGNC:15734||2|||ENSP00000357154||Q5W0G0|UPI0000047A8F|||||||||||||||||||||||||||||||||||||MNEAIPSGKETSIELDVHHPPTVTLSIEPQTVQEGERVVFTCQATANPEILGYRWAKGGFLIEDAHESRYETNVDYSFFTEPVSCEVHNKVGSTNVSTLVNVHFAPRIVVDPKPTTTDIGSDVTLTCVWVGNPPLTLTWTKKDSNMGPRPPGSPPEAALSAQVLSNSNQLLLKSVTQADAGTYTCRAIVPRIGVAEREVPLYVNGPPIISSEAVQYAVRGDGGKVECFIGSTPPPDRIAWAWKENFLEVGTLERYTVERTNSGSGVLSTLTINNVMEADFQTHYNCTAWNSFGPGTAIIQLEEREVLPVGIIAGATIGASILLIFFFIALVFFLYRRRKGSRKDVTLRKLDIKVETVNREPLTMHSDREDDTASVSTATRVMKAIYSSFKDDVDLKQDLRCDTIDTREEYEMKDPTNGYYNVRAHEDRPSSRAVLYADYRAPGPARFDGRPSSRLSHSSGYAQLNTYSRGPASDYGPEPTPPGPAAPAGTDTTSQLSYENYEKFNSHPFPGAAGYPTYRLGYPQAPPSGLERTPYEAYDPIGKYATATRFSYTSQHSDYGQRFQQRMQTHV|||||||||||||||||||,T|stop_lost|HIGH|KIRREL1|ENSG00000183853|Transcript|ENST00000368173.7|protein_coding|13/13||ENST00000368173.7:c.1974G>T|ENSP00000357155.4:p.Ter658TyrextTer85|2378|1974|658|*/Y|taG/taT|||1|||SNV|HGNC|HGNC:15734||2||CCDS72952.1|ENSP00000357155||B4DN67|UPI00017A76F9|||||||||||||||||||||||||||||||||||||MLSLLVWILTLSDTFSQVPPEDTRIDGGPVILLQAGTPHNLTCRAFNAKPAATIIWFRDGTQQEGAVASTELLKDGKRETTVSQLLINPTDLDIGRVFTCRSMNEAIPSGKETSIELDVHHPPTVTLSIEPQTVQEGERVVFTCQATANPEILGYRWAKGGFLIEDAHESRYETNVDYSFFTEPVSCEVHNKVGSTNVSTLVNVHFAPRIVVDPKPTTTDIGSDVTLTCVWVGNPPLTLTWTKKDSNMVLSNSNQLLLKSVTQADAGTYTCRAIVPRIGVAEREVPLYVNGPPIISSEAVQYAVRGDGGKVECFIGSTPPPDRIAWAWKENFLEVGTLERYTVERTNSGSGVLSTLTINNVMEADFQTHYNCTAWNSFGPGTAIIQLEEREVLPVGIIAGATIGASILLIFFFIALVFFLYRRRKGSRKDVTLRKLDIKVETVNREPLTMHSDREDDTASVSTATRVMKAIYSSFKDDVDLKQDLRCDTIDTREEYEMKDPTNGYYNVRAHEDRPSSRAVLYADYRAPGPARFDGRPSSRLSHSSGYAQLNTYSRGPASDYGPEPTPPGPAAPAGTDTTSQLSYENYEKFNSHPFPGAAGYPTYRLGYPQAPPSGLERTPYEAYDPIGKYATATRFSYTSQHSDYGQRFQQRMQTHV|||||||||||||||||||,T|regulatory_region_variant|MODIFIER|||RegulatoryFeature|ENSR00000254728|CTCF_binding_site|||||||||||||||SNV|||||||||
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||

This is TAG mutated to TAT, which codes for tyrosine rather than a new stop codon, so I would expect a DownstreamProtein prediction. We also aren't seeing any ProteinLengthChange values for the stop_lost-only variants.
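As background, the prediction being asked for amounts to translating in frame from the altered stop codon into the 3' UTR until the next in-frame stop (VEP's HGVSp for ENST00000359209.10 already reports such a read-through as p.Ter758TyrextTer85). A hypothetical sketch, assuming the 3' UTR sequence is already available as a string; the function name and the UTR fragment in the example are made up.

```python
from typing import Optional

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def stop_lost_extension(alt_codon: str, utr3: str) -> Optional[str]:
    """Translate from the altered stop codon into the 3' UTR.

    Returns the extension peptide (starting with the residue that replaced the
    stop) up to, but not including, the next in-frame stop codon, or None if the
    supplied sequence has no in-frame stop. Assumes only A/C/G/T bases.
    """
    seq = (alt_codon + utr3).upper()
    peptide = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            return "".join(peptide)
        peptide.append(aa)
    return None


# TAG -> TAT (Tyr), followed by a made-up 3' UTR fragment.
print(stop_lost_extension("TAT", "GCAAAATGA"))  # prints: YAK
```

Returning None when no stop is found keeps "no in-frame stop in the available sequence" distinct from an empty extension.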

The CSQ header is

##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND|FLAGS|PICK|VARIANT_CLASS|SYMBOL_SOURCE|HGNC_ID|CANONICAL|TSL|APPRIS|CCDS|ENSP|SWISSPROT|TREMBL|UNIPARC|SOURCE|GENE_PHENO|SIFT|PolyPhen|DOMAINS|miRNA|HGVS_OFFSET|AF|AFR_AF|AMR_AF|EAS_AF|EUR_AF|SAS_AF|AA_AF|EA_AF|gnomAD_AF|gnomAD_AFR_AF|gnomAD_AMR_AF|gnomAD_ASJ_AF|gnomAD_EAS_AF|gnomAD_FIN_AF|gnomAD_NFE_AF|gnomAD_OTH_AF|gnomAD_SAS_AF|MAX_AF|MAX_AF_POPS|CLIN_SIG|SOMATIC|PHENO|PUBMED|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE|DownstreamProtein|ProteinLengthChange|WildtypeProtein|gnomADe|gnomADe_AF|gnomADe_AF_AFR|gnomADe_AF_AMR|gnomADe_AF_ASJ|gnomADe_AF_EAS|gnomADe_AF_FIN|gnomADe_AF_NFE|gnomADe_AF_OTH|gnomADe_AF_SAS|clinvar|clinvar_CLINSIGN|clinvar_PHENOTYPE|clinvar_SCORE|clinvar_RCVACC|clinvar_TESTEDINGTR|clinvar_PHENOTYPELIST|clinvar_NUMSUBMIT|clinvar_GUIDELINES">

and the VEP command we ran is

/usr/bin/perl -I /opt/lib/perl/VEP/Plugins /usr/bin/variant_effect_predictor.pl --vcf -term SO -transcript_version --offline --cache --symbol -o TCGA_300_stop_lost_vcf_input_format_v95_with_test.vcf -I TCGA_300_stop_lost_vcf_input_format.bed --synonyms /gscmnt/gc2560/core/model_data/2887491634/build50f99e75d14340ffb5b7d21b03887637/chromAlias.ensembl.txt --dir /gscmnt/gc2560/core/cwl/inputs/VEP_cache --check_existing --custom /gscmnt/gc2560/core/model_data/genome-db-ensembl-gnomad/2dd4b53431674786b760adad60a29273/fixed_b38_exome.vcf.gz,gnomADe,vcf,exact,1,AF,AF_AFR,AF_AMR,AF_ASJ,AF_EAS,AF_FIN,AF_NFE,AF_OTH,AF_SAS --custom /gscmnt/gc2560/core/custom_clinvar_vcf/v20181028/custom.vcf.gz,clinvar,vcf,exact,1,CLINSIGN,PHENOTYPE,SCORE,RCVACC,TESTEDINGTR,PHENOTYPELIST,NUMSUBMIT,GUIDELINES --flag_pick --fasta /gscmnt/gc2560/core/model_data/2887491634/build21f22873ebe0486c8e6f69c15435aa96/all_sequences.fa --plugin Downstream --plugin Wildtype --everything --assembly GRCh38 --cache_version 95 --species homo_sapiens
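To check which transcripts actually received DownstreamProtein or ProteinLengthChange values, the CSQ string can be split using the `Format:` list in the header above. A minimal sketch, with hypothetical function names, assuming (as in VEP's VCF output) that consequence blocks are comma-separated, fields are pipe-separated, and the value passed in is the text after `CSQ=`. The header and values in the example are shortened and partly made up (`ENST_EXAMPLE` is a placeholder).

```python
import re
from typing import Dict, List


def csq_fields(header_line: str) -> List[str]:
    """Extract the pipe-separated field names from the ##INFO=<ID=CSQ ...> header."""
    match = re.search(r'Format: ([^"]+)"', header_line)
    if match is None:
        raise ValueError("no 'Format:' list found in CSQ header")
    return match.group(1).split("|")


def parse_csq(csq_value: str, fields: List[str]) -> List[Dict[str, str]]:
    """Turn a CSQ INFO value into one dict per consequence block."""
    records = []
    for block in csq_value.split(","):
        values = block.split("|")
        if len(values) != len(fields):
            raise ValueError(f"expected {len(fields)} fields, got {len(values)}")
        records.append(dict(zip(fields, values)))
    return records


header = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
          'from Ensembl VEP. Format: Allele|Consequence|Feature|DownstreamProtein|ProteinLengthChange">')
csq = "T|stop_lost|ENST00000359209.10||,-|frameshift_variant&stop_lost|ENST_EXAMPLE|SRP|3"

for rec in parse_csq(csq, csq_fields(header)):
    print(rec["Feature"], rec["Consequence"],
          repr(rec["DownstreamProtein"]), repr(rec["ProteinLengthChange"]))
```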

@helensch
Contributor

helensch commented May 7, 2020

Hi @susannasiebert

The Downstream plugin predicts the downstream effects of a frameshift variant on the protein sequence of a transcript. It does not make predictions for 'stop_lost' variants.

https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html#downstream

Regards
Helen

@susannasiebert
Author

Is there a supported plugin for this use case?

@helensch
Contributor

helensch commented Aug 7, 2020

Hi @susannasiebert

There is not a supported plugin for this use case.

I will discuss with the team whether this functionality can be included in a plugin. However, it may only become available in the longer term.

Regards
Helen

@helensch
Contributor

helensch commented Feb 7, 2022

Hi @susannasiebert

Your request to include stop_lost has been added to our work list for investigation.

I will close this ticket, but we will contact you if we make this change.

Please feel free to reopen the ticket or open a new one if you have further questions.

Regards
Helen

@helensch helensch closed this as completed Feb 7, 2022

4 participants