New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Assert mutated_peptide in peptide_info failed #6
Comments
By the way, the same issue occurs when submitting that VCF to your web-server: with the same vcf file (attached, rename from txt to vcf): batch2-ensemble-annotated_indel.vcf.recode.txt Cheers, |
Hi Andrew It looks like an insertion exception that I have not accounted for. I can also see that it must be an somewhat odd transcript since it includes X in the normal sequence. Best, |
Thanks for responding so quickly Just to confirm, do you see the vcf file attached in the second comment (as a txt file; it won't be let me upload .vcf files)? If not I can email or host it somewhere else. That file contains only the variant which is causing the error, if you'd like I can include a larger VCF? From my side, I don't think anything exceptional has been done to generate these variants. I used a straightforward normal vs tumor calling on WES NGS reads pipeline to produce the vcf |
Actually, I also made a small change to the python script to support more types of variant callers (freebayes, varscan, vardict, mutect). You can find the gist of the difference here: https://gist.github.com/choosehappy/e54f23e41ad9246bdafd4d4ceb4e97d7 Feel free to incorporate those changes if you'd like (i can submit a pull request if you prefer) |
Ah yes, thanks! and it is fine with the single mutation. |
Not a problem, glad to help. In particular, those edits were necessary because we're using ensemble calls from the bcbio variant calling pipeline, and i discovered that the FORMAT fields for each mutation are not of the same format. whichever caller "first" called the mutation has its format put into the final file once confirmed by a second caller, so a robust approach which accepts "any" format on the fly was necessary |
So I've stumbled across a set of peptides which also fail that assertion. Anyway, these are the ones which raise an error, is this sufficient information? CISTSRSSGGVS |
Dear Andrew I'm very sorry for not solving the course of this error sooner. I'm currently writing up my PhD thesis which is to be handed in, in the middle of december. Fore this reason im very sorry to inform you that i will not be able to solve this problem in the near future, but it is on my todo and will be solved as soon as possible. Best, |
Thank you for updating us on your situation; it is certainly understandable. Is it possible for you to provide a time estimate? say January 2018? In the meantime, when that assertion is triggered, I've had Mupexi simply skip it. Besides not considering that particular neuropeptide, are there any other (unforeseen) consequences? |
As a time estimate I hope to have MuPeXI updated by the beginning of February 2018. I would not recommend avoiding the assertions in the code, as these check that the information you seek in the different sets or directories actually exists. I would rater identify the mutations in the VCF file causing the errors and exclude these from the analysis until the bug is fixed. |
I see, thank you for that insight. I'll have to be careful. I was wondering if you knew of a quick way to identify which mutation is causing the error? The way the normal_peptide_correction function is written, which is at the peptide level, its not obvious to me which mutation from the vcf file created the problematic peptide? For the first error i reported, it was easy to find since "pep_match" only contained a single mutation, but now that variable contains many mutations. Is there a way to know which one should be removed from the vcf file? |
It is not strait forward - but it would be a great idea to print the valuable information for the assertion errors. What i normally to when i trouble shoot is print information needed to find the troubling mutation. From the mutation info generated from the VEP file you cen print the information needd from the VCF file. the mutation info is also used in the normal_peptide_correction function. therefor i would put in the following line, right before the error occurs. the last information printet is the the mutation causing the problem. From this you can look it up in your VCF file. There might be some incompatibilities between VEP and VCF, how the exact genomic position is counted. I know its not the smoothes solution, but it might help you along the way to begin with. |
Hi Andrew Best, |
In on some samples (it works on others) the python script failed with this error:
File "MuPeXI.py", line 79, in main
peptide_info, peptide_counters, fasta_printout, pepmatch_file_names = peptide_extraction(peptide_length, vep_info, proteome_reference, genome_reference, reference_peptides, reference_peptide_file_names, input_.fasta_file_name, paths.peptide_match, tmp_dir, input_.webserver, input_.print_mismatch, input_.keep_temp, input_.prefix, input_.outdir, input_.num_mismatches)
File "MuPeXI.py", line 720, in peptide_extraction
peptide_info, pepmatch_file_names = normal_peptide_correction(mutated_peptides_missing_normal, mutation_info, p_length, reference_peptide_file_names, peptide_info, peptide_match, tmp_dir, pepmatch_file_names, webserver, print_mismatch, num_mismatches)
File "MuPeXI.py", line 825, in normal_peptide_correction
assert mutated_peptide in peptide_info_
I’ve narrowed it down to one indel:
chr14 22482290 . T TCCG
and use the following command line:
python MuPeXI.py -v /data/DO/p2/final/2017-09-21_p2/batch2-ensemble-annotated_indel.vcf.recode.vcf -a HLA-A02:06 -c ./config.ini -d /data/DO/p2/final/2017-09-21_p2/mupexi/ -l 9
The value of mutated_peptide is “XIRQGAQKL”
and the value of peptide_info is (which doesn't contain the peptide):
defaultdict(<type 'dict'>, {'RQGAQKLVF': {'IQGAQKLVF': [mutation_info(gene_id='ENSG00000211836', trans_id='ENST00000390484',
mutation_consequence='inframe_insertion', chr='14', pos='22482290-22482291', cdna_pos='4-5', prot_pos=2, prot_pos_to=None, aa_normal='I', aa_mut='IR',
codon_normal='att', codon_mut='atCCGt', alt_allele='CCG', symbol='TRAJ54'), peptide_sequence_info(chop_normal_sequence='-----------',
mutation_sequence='XIRQGAQKLVF', normal_sequence='XIQGAQKLVF', mutation_position='2:4', consequence='I'), '1:2', pep_match_info(normal_peptide='IQGAQKLVF',
mismatch=1, mismatch_peptide='---------')]}, 'IRQGAQKLV': {'IRQGKAKLV': [mutation_info(gene_id='ENSG00000211836', trans_id='ENST00000390484',
mutation_consequence='inframe_insertion', chr='14', pos='22482290-22482291', cdna_pos='4-5', prot_pos=2, prot_pos_to=None, aa_normal='I', aa_mut='IR',
codon_normal='att', codon_mut='atCCGt', alt_allele='CCG', symbol='TRAJ54'), peptide_sequence_info(chop_normal_sequence='-----------',
mutation_sequence='XIRQGAQKLVF', normal_sequence='XIQGAQKLVF', mutation_position='2:4', consequence='I'), '1:3', pep_match_info(normal_peptide='IRQGKAKLV',
mismatch=2, mismatch_peptide='---------')]}})
Any thoughts?
Cheers,
Andrew
The text was updated successfully, but these errors were encountered: