Died at /work/Software/Download/Variant_Package/annovar/coding_change.pl line 553, <FASTA> line 149454. #34

Closed
haiwufan opened this issue Jun 8, 2018 · 6 comments

haiwufan commented Jun 8, 2018

Hi Developer!
I tested the latest ANNOVAR and got an error.

The log is below:

$ table_annovar.pl $Sample.combined.vcf $Anno_db --vcfinput -buildver hg38 -out $Sample --checkfile --otherinfo -remove -polish -protocol cytoBand,refGeneWithVer,ensGene,knownGene -operation r,g,g,g -nastring .

NOTICE: Running with system command <convert2annovar.pl -includeinfo -allsample -withfreq -format vcf4 N0202G2.combined.vcf > N0202G2.avinput>
NOTICE: Finished reading 365921 lines from VCF file
NOTICE: A total of 362523 locus in VCF file passed QC threshold, representing 346476 SNPs (239967 transitions and 106509 transversions) and 16547 indels/substitutions
NOTICE: Finished writing allele frequencies based on 346476 SNP genotypes (239967 transitions and 106509 transversions) and 16547 indels/substitutions for 1 samples

NOTICE: Running with system command </work/Software/Download/Variant_Package/annovar/table_annovar.pl N0202G2.avinput /work/Database/Annovar_db/hg38_20180130 -buildver hg38 -outfile N0202G2 --checkfile --otherinfo -remove -polish -protocol cytoBand,refGeneWithVer,ensGene,knownGene -operation r,g,g,g -nastring . -otherinfo>

NOTICE: Processing operation=r protocol=cytoBand

NOTICE: Running with system command <annotate_variation.pl -regionanno -dbtype cytoBand -buildver hg38 -outfile N0202G2 N0202G2.avinput /work/Database/Annovar_db/hg38_20180130>
NOTICE: Output file is written to N0202G2.hg38_cytoBand
NOTICE: Reading annotation database /work/Database/Annovar_db/hg38_20180130/hg38_cytoBand.txt ... Done with 1293 regions
NOTICE: Finished region-based annotation on 363002 genetic variants
NOTICE: Variants with invalid input format were written to N0202G2.invalid_input

NOTICE: Processing operation=g protocol=refGeneWithVer

NOTICE: Running with system command <annotate_variation.pl -geneanno -buildver hg38 -dbtype refGeneWithVer -outfile N0202G2.refGeneWithVer -exonsort N0202G2.avinput /work/Database/Annovar_db/hg38_20180130>
NOTICE: Output files were written to N0202G2.refGeneWithVer.variant_function, N0202G2.refGeneWithVer.exonic_variant_function
NOTICE: Reading gene annotation from /work/Database/Annovar_db/hg38_20180130/hg38_refGeneWithVer.txt ... Done with 74727 transcripts (including 18443 without coding sequence annotation) for 28059 unique genes
NOTICE: Processing next batch with 363002 unique variants in 363002 input lines
NOTICE: Reading FASTA sequences from /work/Database/Annovar_db/hg38_20180130/hg38_refGeneWithVerMrna.fa ... Done with 21138 sequences
WARNING: A total of 526 sequences will be ignored due to lack of correct ORF annotation
NOTICE: Variants with invalid input format were written to N0202G2.refGeneWithVer.invalid_input

NOTICE: Running with system command <coding_change.pl N0202G2.refGeneWithVer.exonic_variant_function.orig /work/Database/Annovar_db/hg38_20180130/hg38_refGeneWithVer.txt /work/Database/Annovar_db/hg38_20180130/hg38_refGeneWithVerMrna.fa -alltranscript -out N0202G2.refGeneWithVer.fa -newevf N0202G2.refGeneWithVer.exonic_variant_function>
Died at /work/Software/Download/Variant_Package/annovar/coding_change.pl line 553, <FASTA> line 149454.
Error running system command: <coding_change.pl N0202G2.refGeneWithVer.exonic_variant_function.orig /work/Database/Annovar_db/hg38_20180130/hg38_refGeneWithVer.txt /work/Database/Annovar_db/hg38_20180130/hg38_refGeneWithVerMrna.fa -alltranscript -out N0202G2.refGeneWithVer.fa -newevf N0202G2.refGeneWithVer.exonic_variant_function>
Error running system command: </work/Software/Download/Variant_Package/annovar/table_annovar.pl N0202G2.avinput /work/Database/Annovar_db/hg38_20180130 -buildver hg38 -outfile N0202G2 --checkfile --otherinfo -remove -polish -protocol cytoBand,refGeneWithVer,ensGene,knownGene -operation r,g,g,g -nastring . -otherinfo>

I then checked the temporary file $Sample.refGeneWithVer.fa and found:

$ tail $Sample.refGeneWithVer.fa
LNLGIFASRLYYHWCKPQQKGLRLLCGSQVPVEVMGFPEFADCWENFVDHEKPLSFNPYKMLEELDKNSRAIKRRLERIKQS*

line343937 NM_004711.4 WILDTYPE
MEGGAYGAGKAGGAFDPYTLVRQPHTILRVVSWLFSIVVFGSIVNEGYLNSASEGEEFCIYNRNPNACSYGVAVGVLAFLTCLLYLALDVYFPQISSVKD
RKKAVLSDIGVSAFWAFLWFVGFCYLANQWQVSKPKDNPLNEGTDAARAAIAFSFFSIFTWAGQAVLAFQRYQIGADSALFSQDYMDPSQDSSMPYAPYV
EPTGPDPAGMGGTYQQPANTFDTEPQGYQSQGY*
line343937 NM_004711.4 c.605_606insCAA p.P202_T203insN protein-altering (position 202-203 has insertion N)
MEGGAYGAGKAGGAFDPYTLVRQPHTILRVVSWLFSIVVFGSIVNEGYLNSASEGEEFCIYNRNPNACSYGVAVGVLAFLTCLLYLALDVYFPQISSVKD
RKKAVLSDIGVSAFWAFLWFVGFCYLANQWQVSKPKDNPLNEGTDAARAAIAFSFFSIFTWAGQAVLAFQRYQIGADSALFSQDYMDPSQDSSMPYAPYV
EPNTGPDPAGMGGTYQQPANTFDTEPQGYQSQGY*
WARNING: invalid triplets found in DNA sequence to be translated: in

Then I looked up line343937 in the $Sample.refGeneWithVer.exonic_variant_function.orig file, but did not find any problem.

$ grep "NM_004711" $Sample.refGeneWithVer.exonic_variant_function.orig
line343937 nonframeshift insertion SYNGR1:NM_004711.4:exon4:c.605_606insCAA:p.P202delinsPN chr22 39381817 39381817 - CAA 1 9966.73 223 chr22 39381817 rs149306472 C CCAA 9966.73 PASS AC=2;AF=1.00;AN=2;DB;DP=230;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.000;MQ=60.00;QD=44.69;SOR=1.179;set=variant2 GT:AD:DP:GQ:PL 1/1:0,223:223:99:10004,673,0

haiwufan commented Jun 8, 2018

$ tail $Sample.refGeneWithVer.fa

default

kaichop commented Jun 8, 2018 via email

haiwufan commented Jun 8, 2018

Hi Kaichop,
I annotated the "chr22 39381817 39381817 - CAA" variant on its own, and it worked.
Then I split the avinput by chromosome and annotated each part separately with ANNOVAR. I confirmed that the error occurs on chrX, but I still cannot determine which variant causes it.

I have uploaded my chrX avinput file, temporary files, and logs.

$ /work/Software/annovar_2018-04-16/table_annovar.pl ../chrX.avinput $Anno_db -buildver hg38 -out chrX --checkfile --otherinfo -polish -protocol cytoBand,refGeneWithVer,ensGene,knownGene -operation r,g,g,g -nastring -

Bug_Report.zip
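
The per-chromosome split described in this comment can be sketched in a few lines of Python. This is only an illustration, not the script that was actually used: the function name `split_avinput_by_chrom` and the `<chrom>.avinput` output naming are assumptions, and the first whitespace-delimited column of each line is taken to be the chromosome.

```python
from collections import defaultdict
from pathlib import Path


def split_avinput_by_chrom(avinput_path, out_dir):
    """Write one <chrom>.avinput file per chromosome; return {chrom: line_count}.

    Hypothetical helper: assumes column 1 of every line is the chromosome name.
    Lines are grouped in memory, which is fine for a few hundred thousand variants.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    by_chrom = defaultdict(list)
    with open(avinput_path) as src:
        for line in src:
            if line.strip():  # skip blank lines
                chrom = line.split(None, 1)[0]
                by_chrom[chrom].append(line)

    # One output file per chromosome, preserving the original line order.
    for chrom, lines in by_chrom.items():
        (out_dir / f"{chrom}.avinput").write_text("".join(lines))

    return {chrom: len(lines) for chrom, lines in by_chrom.items()}
```

Each resulting file can then be passed to table_annovar.pl on its own, as in the chrX command above, to see which chromosome reproduces the failure.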

haiwufan commented Jun 8, 2018

I found that the "chrX 1403271 1403271 A - " variant causes this error.

My hg38_refGeneWithVer.txt contains this position:
hg38_refGeneWithVer.xlsx
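
The thread does not say how this single variant was isolated. One possible approach, sketched here under the assumption that exactly one input line triggers the failure, is to bisect the chrX avinput lines. Both `find_failing_line` and the `fails` callback are hypothetical names; in practice `fails` would write the subset to a temporary file, run the annotation on it, and report whether the run died.

```python
def find_failing_line(lines, fails):
    """Return the index of the single line that makes `fails` return True.

    Assumptions (not from the thread): exactly one offending line exists, and
    fails(subset) is True whenever the subset contains that line.
    """
    lo, hi = 0, len(lines)  # invariant: the culprit is in lines[lo:hi]
    if not fails(lines[lo:hi]):
        raise ValueError("the full input does not reproduce the failure")

    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(lines[lo:mid]):
            hi = mid  # culprit is in the first half
        else:
            lo = mid  # otherwise it must be in the second half
    return lo
```

With roughly 360,000 input lines this needs about 19 annotation runs, which is far fewer than testing variants one at a time.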

kaichop commented Jun 9, 2018 via email

@haiwufan

Thanks kaichop,

I have found the cause. When I built hg38_refGeneMrna.fa from hg38_refGene.txt, I used the hg38 reference genome FASTA file from the GATK bundle. But that file masks some chromosome regions with 'N', so the hg38_refGeneMrna.fa I generated was wrong.
I then rebuilt hg38_refGeneMrna.fa using the reference genome FASTA from UCSC, and that solved the problem.

Thanks again!
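
For anyone hitting the same error, a quick way to check whether a self-built mRNA FASTA picked up masked bases is to scan it for records containing 'N'. The following is a minimal sketch, not part of ANNOVAR: `records_with_n` is a hypothetical helper, and it assumes the first whitespace-delimited token of each header line is the transcript ID.

```python
def records_with_n(fasta_path):
    """Yield (record_id, n_count) for each FASTA record that contains N/n bases.

    Hypothetical helper: the record ID is assumed to be the first token
    after '>' on the header line.
    """
    name, n_count = None, 0
    with open(fasta_path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                # Report the previous record before starting a new one.
                if name is not None and n_count:
                    yield name, n_count
                fields = line[1:].split()
                name = fields[0] if fields else ""
                n_count = 0
            elif name is not None:
                n_count += line.upper().count("N")
        # Report the final record in the file.
        if name is not None and n_count:
            yield name, n_count
```

Calling `list(records_with_n(path_to_mrna_fasta))` on the generated file would list any transcripts whose sequence includes masked bases; an empty list suggests the reference genome used was not N-masked in the transcribed regions.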

@kaichop kaichop closed this as completed Jun 15, 2018