Assert mutated_peptide in peptide_info failed #6

choosehappy · 2017-10-08T09:50:46Z

In on some samples (it works on others) the python script failed with this error:

File "MuPeXI.py", line 79, in main
peptide_info, peptide_counters, fasta_printout, pepmatch_file_names = peptide_extraction(peptide_length, vep_info, proteome_reference, genome_reference, reference_peptides, reference_peptide_file_names, input_.fasta_file_name, paths.peptide_match, tmp_dir, input_.webserver, input_.print_mismatch, input_.keep_temp, input_.prefix, input_.outdir, input_.num_mismatches)
File "MuPeXI.py", line 720, in peptide_extraction
peptide_info, pepmatch_file_names = normal_peptide_correction(mutated_peptides_missing_normal, mutation_info, p_length, reference_peptide_file_names, peptide_info, peptide_match, tmp_dir, pepmatch_file_names, webserver, print_mismatch, num_mismatches)
File "MuPeXI.py", line 825, in normal_peptide_correction
assert mutated_peptide in peptide_info_

I’ve narrowed it down to one indel:
chr14 22482290 . T TCCG

and use the following command line:
python MuPeXI.py -v /data/DO/p2/final/2017-09-21_p2/batch2-ensemble-annotated_indel.vcf.recode.vcf -a HLA-A02:06 -c ./config.ini -d /data/DO/p2/final/2017-09-21_p2/mupexi/ -l 9

The value of mutated_peptide is “XIRQGAQKL”

and the value of peptide_info is (which doesn't contain the peptide):

defaultdict(<type 'dict'>, {'RQGAQKLVF': {'IQGAQKLVF': [mutation_info(gene_id='ENSG00000211836', trans_id='ENST00000390484',
mutation_consequence='inframe_insertion', chr='14', pos='22482290-22482291', cdna_pos='4-5', prot_pos=2, prot_pos_to=None, aa_normal='I', aa_mut='IR',
codon_normal='att', codon_mut='atCCGt', alt_allele='CCG', symbol='TRAJ54'), peptide_sequence_info(chop_normal_sequence='-----------',
mutation_sequence='XIRQGAQKLVF', normal_sequence='XIQGAQKLVF', mutation_position='2:4', consequence='I'), '1:2', pep_match_info(normal_peptide='IQGAQKLVF',
mismatch=1, mismatch_peptide='---------')]}, 'IRQGAQKLV': {'IRQGKAKLV': [mutation_info(gene_id='ENSG00000211836', trans_id='ENST00000390484',
mutation_consequence='inframe_insertion', chr='14', pos='22482290-22482291', cdna_pos='4-5', prot_pos=2, prot_pos_to=None, aa_normal='I', aa_mut='IR',
codon_normal='att', codon_mut='atCCGt', alt_allele='CCG', symbol='TRAJ54'), peptide_sequence_info(chop_normal_sequence='-----------',
mutation_sequence='XIRQGAQKLVF', normal_sequence='XIQGAQKLVF', mutation_position='2:4', consequence='I'), '1:3', pep_match_info(normal_peptide='IRQGKAKLV',
mismatch=2, mismatch_peptide='---------')]}})

Any thoughts?

Cheers,
Andrew

choosehappy · 2017-10-09T05:40:08Z

By the way, the same issue occurs when submitting that VCF to your web-server:

with the same vcf file (attached, rename from txt to vcf):

batch2-ensemble-annotated_indel.vcf.recode.txt

Cheers,
Andrew

ambj · 2017-10-09T07:39:43Z

Hi Andrew

It looks like an insertion exception that I have not accounted for. I can also see that it must be an somewhat odd transcript since it includes X in the normal sequence.
I will look into this, and try to fix the bug. Would you be able to provide me with a test VCF file including the error prone insertion?

Best,
Anne-Mette

choosehappy · 2017-10-09T07:44:52Z

Thanks for responding so quickly

Just to confirm, do you see the vcf file attached in the second comment (as a txt file; it won't be let me upload .vcf files)? If not I can email or host it somewhere else.

That file contains only the variant which is causing the error, if you'd like I can include a larger VCF?

From my side, I don't think anything exceptional has been done to generate these variants. I used a straightforward normal vs tumor calling on WES NGS reads pipeline to produce the vcf

choosehappy · 2017-10-09T07:51:39Z

Actually, I also made a small change to the python script to support more types of variant callers (freebayes, varscan, vardict, mutect).

You can find the gist of the difference here:

https://gist.github.com/choosehappy/e54f23e41ad9246bdafd4d4ceb4e97d7

Feel free to incorporate those changes if you'd like (i can submit a pull request if you prefer)

ambj · 2017-10-09T08:00:33Z

Ah yes, thanks! and it is fine with the single mutation.
No, it dont think anything is out of the ordinary with the variant calling, but sometimes the variants are called in for example pseudo genes where the reference sequence is odd (including X's). Here some exceptions needs to be taken into account. The more VCF files are run trough MuPeXI the higher the likelihood of finding an exception to the ordinary which is not encountered for in the current version.
And thank you for the edits - that have been on my todo for some time.

choosehappy · 2017-10-09T08:04:17Z

Not a problem, glad to help.

In particular, those edits were necessary because we're using ensemble calls from the bcbio variant calling pipeline, and i discovered that the FORMAT fields for each mutation are not of the same format. whichever caller "first" called the mutation has its format put into the final file once confirmed by a second caller, so a robust approach which accepts "any" format on the fly was necessary

choosehappy · 2017-10-27T13:53:50Z

So I've stumbled across a set of peptides which also fail that assertion.
I looked into trying to figure out which mutations are causing them but they're more difficult to track down than the one mentioned above

Anyway, these are the ones which raise an error, is this sufficient information?

CISTSRSSGGVS
CAPCISTSRSSG
ISTSRSSGGVSI
PCISTSRSSGGV
SRSSGGVSILHC
STSRSSGGVSIL
SSGGVSILHCHN
TSRSSGGVSILH
RSSGGVSILHCH
APCISTSRSSGG
VCAPCISTSR*SS

ambj · 2017-10-27T14:56:59Z

Dear Andrew

I'm very sorry for not solving the course of this error sooner. I'm currently writing up my PhD thesis which is to be handed in, in the middle of december. Fore this reason im very sorry to inform you that i will not be able to solve this problem in the near future, but it is on my todo and will be solved as soon as possible.
I hope this does not course you, or any other user experiencing the same error, too much trouble.

Best,
Anne-Mette

choosehappy · 2017-10-27T15:08:40Z

Thank you for updating us on your situation; it is certainly understandable. Is it possible for you to provide a time estimate? say January 2018?

In the meantime, when that assertion is triggered, I've had Mupexi simply skip it. Besides not considering that particular neuropeptide, are there any other (unforeseen) consequences?

ambj · 2017-10-27T15:24:01Z

As a time estimate I hope to have MuPeXI updated by the beginning of February 2018.

I would not recommend avoiding the assertions in the code, as these check that the information you seek in the different sets or directories actually exists. I would rater identify the mutations in the VCF file causing the errors and exclude these from the analysis until the bug is fixed.
With that said I think skipping these assertions would only leave you with odd outputs with empty fields in the output file but, but I'm not completely confident in this statement. In the worst case a shift would occur which would make the entire output useless, instead of just missing out in a few peptides predicted.

choosehappy · 2017-10-27T17:18:24Z

I see, thank you for that insight. I'll have to be careful.

I was wondering if you knew of a quick way to identify which mutation is causing the error? The way the normal_peptide_correction function is written, which is at the peptide level, its not obvious to me which mutation from the vcf file created the problematic peptide?

For the first error i reported, it was easy to find since "pep_match" only contained a single mutation, but now that variable contains many mutations.

Is there a way to know which one should be removed from the vcf file?

ambj · 2017-10-30T08:48:08Z

It is not strait forward - but it would be a great idea to print the valuable information for the assertion errors. What i normally to when i trouble shoot is print information needed to find the troubling mutation.

From the mutation info generated from the VEP file you cen print the information needd from the VCF file.
build_vep_info (line 539):
Mutation_Info = namedtuple('mutation_info', ['gene_id', 'trans_id', 'mutation_consequence','chr', 'pos', 'cdna_pos', 'prot_pos', 'prot_pos_to', 'aa_normal', 'aa_mut', 'codon_normal', 'codon_mut', 'alt_allele', 'symbol'])

the mutation info is also used in the normal_peptide_correction function. therefor i would put in the following line, right before the error occurs. the last information printet is the the mutation causing the problem.
print( '{}/t{}/t{}/t{}/t{}'.format(mutation_info.chr, mutation_info.pos, mutation_info.codon_normal, mutation_info.codon_mut, mutation_info.alt_allele))

From this you can look it up in your VCF file. There might be some incompatibilities between VEP and VCF, how the exact genomic position is counted.

I know its not the smoothes solution, but it might help you along the way to begin with.

ambj · 2018-04-17T08:24:28Z

Hi Andrew
I have fixed this bug in the newest version of MuPeXI on the dev branch. This will be merged with the master branch later this week with the new release.
I tested the mutation you reported and could not reproduce the error in the newest version og MuPeXI.
I hope this have solved your problem

Best,
/AM

ambj closed this as completed Apr 17, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Assert mutated_peptide in peptide_info failed #6

Assert mutated_peptide in peptide_info failed #6

choosehappy commented Oct 8, 2017

choosehappy commented Oct 9, 2017

ambj commented Oct 9, 2017

choosehappy commented Oct 9, 2017

choosehappy commented Oct 9, 2017

ambj commented Oct 9, 2017

choosehappy commented Oct 9, 2017

choosehappy commented Oct 27, 2017

ambj commented Oct 27, 2017

choosehappy commented Oct 27, 2017

ambj commented Oct 27, 2017

choosehappy commented Oct 27, 2017

ambj commented Oct 30, 2017

ambj commented Apr 17, 2018

Assert mutated_peptide in peptide_info failed #6

Assert mutated_peptide in peptide_info failed #6

Comments

choosehappy commented Oct 8, 2017

choosehappy commented Oct 9, 2017

ambj commented Oct 9, 2017

choosehappy commented Oct 9, 2017

choosehappy commented Oct 9, 2017

ambj commented Oct 9, 2017

choosehappy commented Oct 9, 2017

choosehappy commented Oct 27, 2017

ambj commented Oct 27, 2017

choosehappy commented Oct 27, 2017

ambj commented Oct 27, 2017

choosehappy commented Oct 27, 2017

ambj commented Oct 30, 2017

ambj commented Apr 17, 2018