issue with data from gff flag when upgrading to VEP110 #1570

Open
atimms opened this issue Dec 1, 2023 · 10 comments

@atimms

atimms commented Dec 1, 2023

Describe the issue

The --gff flag doesn't annotate variants when using a cache and VEP 110.

Additional information

The same files and command work with VEP107 and VEP109, and also work when not using a cache. The GFF is listed in the header, but no annotation from it appears on any of the expected variants. This seems to be a change in behaviour between 110 and previous versions. I wondered whether this is intentional and whether there is a workaround to use a GFF with the cache in VEP 110.

System

  • VEP version: 110
  • VEP Cache version: 110
    using singularity built from biocontainer docker image

Full VEP command line

vep \
--offline \
--cache \
--dir_cache ${vep_cache_dir} \
--refseq \
--use_given_ref \
--species homo_sapiens \
--assembly GRCh38 \
--fasta ${ref_fasta} \
--format vcf \
--vcf \
--variant_class \
--af_gnomadg \
--numbers \
--total_length \
--hgvs \
--symbol \
--canonical \
--exclude_predicted \
--distance 5000,250  \
--gff ${extra_transcripts_gff} \
--input_file ${vcf} \
--output_file ${vep110_vcf} 

Full error message

No error message

@nuno-agostinho
Contributor

Hi @atimms, thanks for opening this issue!

I am going to check what is going on and come back to you.

Kind regards,
Nuno

@nuno-agostinho
Contributor

Hi @atimms, sorry for the delay in replying.

I am not able to reproduce your issue using an Ensembl GFF file and a random input VCF. The variants are annotated as expected.

Could you please give me a set of example variants and the GFF file that you are having issues with? Thanks.

Best regards,
Nuno Agostinho

@atimms
Author

atimms commented Jan 8, 2024

Thanks for getting back to me.

I've attached the test VCF and GFF file I was using. As I said before, the annotation worked when using VEP107 but didn't when using VEP110.

Any help would be great.

Andrew

off_menu_transcripts.hg38.gff.gz
small.vcf.gz

@nuno-agostinho
Contributor

nuno-agostinho commented Jan 22, 2024

Hey @atimms,

Sorry for the delay in replying.

This is a bug related to a change we made in VEP 110 to compare the chromosome name with the annotation. Unfortunately, your annotation uses the chr prefix (such as chr16), so the region is not considered part of the same chromosome.

I will fix the comparison of chromosome names to consider synonyms as soon as possible. In the meantime, you can strip the chr prefix from your GFF annotation as a workaround.

Please tell me if the workaround works for you and if you found any other issues regarding custom annotation in VEP.

I am sorry for the inconvenience.

Best regards,
Nuno
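
For anyone applying the workaround above, a minimal sketch of stripping the chr prefix from the first column of a GFF could look like the following. The function names and file paths are hypothetical, and the script assumes a gzip-compressed, tab-delimited GFF; it is not part of VEP.

```python
import gzip


def strip_chr_prefix(line: str) -> str:
    """Return a GFF line with a leading 'chr' removed from the first column."""
    # Header, directive and blank lines are passed through unchanged.
    if line.startswith("#") or not line.strip():
        return line
    seqid, sep, rest = line.partition("\t")
    if seqid.startswith("chr"):
        seqid = seqid[3:]
    return seqid + sep + rest


def rewrite_gff(src_path: str, dst_path: str) -> None:
    """Read a gzip-compressed GFF and write an uncompressed, renamed copy."""
    with gzip.open(src_path, "rt") as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(strip_chr_prefix(line))


# Example call (paths are placeholders):
# rewrite_gff("off_menu_transcripts.hg38.gff.gz", "off_menu_transcripts.nochr.gff")
```

Note that chrM becomes M here, whereas Ensembl names the mitochondrial sequence MT, so that contig may need separate handling. The rewritten file would still need to be sorted, bgzip-compressed and tabix-indexed before being passed to --gff.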

@atimms
Author

atimms commented Jan 22, 2024

That's an easy resolution, thanks for letting me know what the issue was.

Andrew

@Sean-3B

Sean-3B commented Mar 6, 2024

Hello @nuno-agostinho,

I also had a similar issue: the --custom option for GFF annotation didn't work with VEP v111 when using an NCBI RefSeq GFF file.

Thanks to your answer in this issue, I have now solved the problem by converting the NCBI chromosome IDs (e.g. NC_000001.11) to chromosome numbers (e.g. 1).

However, I still need to preprocess the NCBI GFF file before I can use it for VEP annotation.

So, may I ask you to fix VEP v111 so that it can use the NCBI GFF file directly?

Best regards,
Sean
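
The conversion described above can be sketched roughly as follows. This is an illustration only: the function names are hypothetical, and it assumes human RefSeq accessions, where NC_000001 to NC_000022 are the autosomes, NC_000023 is X, NC_000024 is Y and NC_012920 is the mitochondrial genome. Any other sequence name (for example NT_ or NW_ scaffolds) is left unchanged.

```python
import re

# Matches accessions such as NC_000001.11 and captures the numeric part.
_NC_ACCESSION = re.compile(r"^NC_(\d+)\.\d+$")


def refseq_to_chrom(seqid: str) -> str:
    """Map a human RefSeq chromosome accession to an Ensembl-style name."""
    match = _NC_ACCESSION.match(seqid)
    if not match:
        return seqid  # not a chromosome accession: leave as-is
    number = int(match.group(1))
    if 1 <= number <= 22:
        return str(number)
    if number == 23:
        return "X"
    if number == 24:
        return "Y"
    if number == 12920:
        return "MT"
    return seqid  # NC_ accession outside the assumed human set


def rename_gff_line(line: str) -> str:
    """Apply refseq_to_chrom to the first column of a GFF data line."""
    if line.startswith("#") or not line.strip():
        return line
    seqid, sep, rest = line.partition("\t")
    return refseq_to_chrom(seqid) + sep + rest
```

Parsing the accession number rather than hard-coding full identifiers avoids depending on the version suffix, which differs between chromosomes and assemblies.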

@nuno-agostinho
Contributor

Hi @atimms and @Sean-3B,

The bug fix was merged into our existing codebase and will be available in VEP 112.

@Sean-3B: unfortunately, this fix is not trivial to add to VEP 111. The workaround for now is to replace the chromosome names in GTF/GFF files until VEP 112 is released. Sorry for the inconvenience. Let me know if you need help with this.

Best regards,
Nuno

@barbarian1803

Hi,

I also have the same issue. I use a GFF from RefSeq that I downloaded from here: https://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_releases/9606/GCF_000001405.40-RS_2023_03/

The GFF uses RefSeq chromosome naming. I also provided a synonyms file, but I still have the same issue. Can you confirm whether I have to change the chromosome names in the GFF file?
Thank you.
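
Going by the replies earlier in this thread, GFF sequence names with a chr prefix or RefSeq accessions are the ones that failed to match in VEP 110 and 111. A quick way to see which naming style a GFF uses before deciding whether to rename is sketched below. The function names and category labels are made up for illustration, and the check is only a heuristic on the first column.

```python
from collections import Counter


def classify_seqid(seqid: str) -> str:
    """Label a sequence name by the naming style it appears to use."""
    if seqid.startswith("chr"):
        return "chr_prefixed"
    if seqid.startswith("NC_"):
        return "refseq_accession"
    return "other"


def summarise_seqids(lines) -> Counter:
    """Count GFF data lines per naming style, skipping headers and blanks."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        seqid = line.split("\t", 1)[0].strip()
        counts[classify_seqid(seqid)] += 1
    return counts


# Example with a gzip-compressed GFF (path is a placeholder):
# import gzip
# with gzip.open("annotation.gff.gz", "rt") as handle:
#     print(summarise_seqids(handle))
```

If most lines fall under chr_prefixed or refseq_accession, renaming the first column as described by the maintainer is probably needed for the affected versions.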

@atimms
Author

atimms commented Mar 11, 2024 via email

@barbarian1803

Does this also affect the BAM file that is used? For example, the RefSeq BAM file uses RefSeq chromosome names. Do I need to modify the chromosome names in the BAM file too?
