Duplicated transition IDs in TargetedExperiment object from PQP source, caused by decoy peptides with same sequence but different gene #5653
Comments
Hi, I am seeing similar issues.
Hi @shubham1637, looking back at this issue, I think the main problem was that different genes were assigned to the same peptide sequence; the decoys in my first description were only one way to reach it. That is, if peptideA has GeneI in some rows and GeneII in other rows, this leads to duplicated transition IDs, because the join over the tables in the PQP file also uses the gene column: the different genes are kept, and all other values, which are identical, are repeated. If you get the same error as in the second code block when running OSW, and the error in the third code block when running TargetedFileConverter, I think you should have a look at the genes (or protein groups?) in your tsv or pqp file. Hope this helps. Best,
You're right.
I removed gene column altogether and it doesn't throw error anymore.
Thanks!
Best,
Shubham
For anyone who ends up here: the main problem in this issue is caused by the assay library file itself, and I think it should be fixed by us users rather than treated as an issue for the developers, so I would like to close this issue. If you hit a duplicated transition ID error, please check that exactly one gene and one protein are assigned to each peptide, and that two or more different ones do not appear in different rows.
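The check described above can be sketched in a few lines of Python. This is only a minimal sketch: it assumes a tab-separated assay library with columns named `PeptideSequence`, `GeneName` and `ProteinId` (adjust the names to whatever your file actually uses), and `find_ambiguous_peptides` is a hypothetical helper, not part of OpenMS.

```python
import csv
import io
from collections import defaultdict


def find_ambiguous_peptides(rows, peptide_col="PeptideSequence",
                            columns=("GeneName", "ProteinId")):
    """Return {peptide: {column: sorted values}} for every peptide that
    carries more than one distinct value in any of the given columns."""
    seen = defaultdict(lambda: defaultdict(set))
    for row in rows:
        for col in columns:
            seen[row[peptide_col]][col].add(row[col])

    ambiguous = {}
    for peptide, per_column in seen.items():
        conflicts = {col: sorted(values)
                     for col, values in per_column.items()
                     if len(values) > 1}
        if conflicts:
            ambiguous[peptide] = conflicts
    return ambiguous


# Toy library (column names are assumptions, rows are made up).
example_tsv = (
    "PeptideSequence\tProteinId\tGeneName\n"
    "PEPTIDEA\tP1\tGeneI\n"
    "PEPTIDEA\tP1\tGeneII\n"
    "PEPTIDEB\tP2\tGeneIII\n"
)
rows = csv.DictReader(io.StringIO(example_tsv), delimiter="\t")
ambiguous = find_ambiguous_peptides(rows)
print(ambiguous)
```

For a real file, the same function can be fed from `csv.DictReader(open(path, newline=""), delimiter="\t")`; any peptide it reports is a candidate for the duplicated transition ID error.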
It seems this issue still persists.
Part of the issue could come from the SQL select query here: https://github.com/OpenMS/OpenMS/blob/develop/src/openms/source/ANALYSIS/OPENSWATH/TransitionPQPFile.cpp which could lead to duplicated entries when there are 1:n mappings of peptides to proteins / genes. We should address this.
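To illustrate how a 1:n mapping can duplicate rows in a join, here is a small self-contained sketch using Python's `sqlite3` module. The schema (`transitions`, `peptide_gene`, `gene`) is a deliberately simplified, hypothetical one and not the actual PQP schema or the query in `TransitionPQPFile.cpp`; the grouped query is just one possible way to collapse the duplicates, not necessarily how it should be fixed in OpenMS.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical, simplified tables (not the real PQP schema).
    CREATE TABLE transitions (id INTEGER PRIMARY KEY, peptide_id INTEGER);
    CREATE TABLE peptide_gene (peptide_id INTEGER, gene_id INTEGER);
    CREATE TABLE gene (id INTEGER PRIMARY KEY, name TEXT);

    -- Peptide 10 is mapped to two genes, peptide 20 to one.
    INSERT INTO transitions VALUES (1, 10), (2, 10), (3, 20);
    INSERT INTO gene VALUES (1, 'GeneI'), (2, 'GeneII'), (3, 'GeneIII');
    INSERT INTO peptide_gene VALUES (10, 1), (10, 2), (20, 3);
""")

# Plain join: each transition of peptide 10 comes back once per gene,
# so transition IDs 1 and 2 each appear twice.
joined = con.execute("""
    SELECT t.id, g.name
    FROM transitions t
    JOIN peptide_gene pg ON pg.peptide_id = t.peptide_id
    JOIN gene g ON g.id = pg.gene_id
    ORDER BY t.id, g.name
""").fetchall()

# Grouping by transition ID keeps one row per transition and
# concatenates the gene names instead of repeating the row.
grouped = con.execute("""
    SELECT t.id, GROUP_CONCAT(g.name, ';')
    FROM transitions t
    JOIN peptide_gene pg ON pg.peptide_id = t.peptide_id
    JOIN gene g ON g.id = pg.gene_id
    GROUP BY t.id
    ORDER BY t.id
""").fetchall()

print(joined)
print(grouped)
```

The plain join returns five rows for three transitions, while the grouped query returns exactly one row per transition ID.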
Hi @hroest ,
I'm using OpenSwath to analyze DIA data acquired on a QEHF, and an error occurred when I ran OSW with the generated pqp file.
The error was caused by some decoy peptides that ended up with the same sequence although they were generated from different target peptide sequences (which also belonged to different genes). This led to duplicated transition IDs at the pqp reading step, when the gene table is aggregated.
If the input mzML was converted from the Thermo raw file by msconvert without peak picking, no exception is raised and the run just stops during searching, like this
When the input mzML was converted with peak picking, the error is invalid ID
If I use TargetedFileConverter again, from pqp to tsv, the error is raised correctly, in the checking step after reading the database and generating the TargetedExperiment
The file attached below (extracted from the pqp file) contains all transitions with duplicated IDs after running DecoyGenerator.
example_of_same_seq_in_diff_genes.txt
Two kinds of decoy peptides with the same sequence:
Peptide FVQDLSK belongs to Q91ZJ5;DECOY_P52196, in which DECOY_P52196 has the original protein ID P52196 with the peptide FQLVDSR; the gene names of these two are Ugp2 and Tst.
Peptide YLDLLQK belongs to the protein group DECOY_Q0KK55;DECOY_Q6PHN7; the original sequences are YLLDLLR and YLLQLLR, with one AA difference, belonging to Tmem164 and Kndc1 respectively (after shuffling, the proteins are combined but the genes are kept individually).
Currently I directly drop decoy peptides that have the same sequence as a target, as well as decoy peptides whose sequence belongs to different genes, while the assay file is still in tsv format before converting to pqp, and it works fine now.
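A rough sketch of this workaround, in case it is useful to others. It assumes the tsv rows have columns named `PeptideSequence`, `GeneName` and `Decoy` (with `Decoy` being `0` for targets and `1` for decoys); those names and the helper `drop_conflicting_decoys` are assumptions for illustration, so adapt them to your own file. The toy rows reuse the sequences and gene names mentioned above, plus one made-up clean decoy.

```python
import csv
import io
from collections import defaultdict


def drop_conflicting_decoys(rows, seq_col="PeptideSequence",
                            gene_col="GeneName", decoy_col="Decoy"):
    """Drop decoy rows whose sequence equals a target sequence or whose
    sequence is assigned to more than one gene. Targets are always kept."""
    rows = list(rows)
    target_seqs = {r[seq_col] for r in rows if r[decoy_col] == "0"}

    # Collect every gene seen for each decoy sequence.
    decoy_genes = defaultdict(set)
    for r in rows:
        if r[decoy_col] == "1":
            decoy_genes[r[seq_col]].add(r[gene_col])

    kept = []
    for r in rows:
        if r[decoy_col] == "1":
            seq = r[seq_col]
            if seq in target_seqs or len(decoy_genes[seq]) > 1:
                continue  # conflicting decoy, skip it
        kept.append(r)
    return kept


# Toy rows; the last decoy (KAAAR / GeneX) is made up.
example_tsv = (
    "PeptideSequence\tGeneName\tDecoy\n"
    "FVQDLSK\tUgp2\t0\n"
    "FVQDLSK\tTst\t1\n"
    "YLDLLQK\tTmem164\t1\n"
    "YLDLLQK\tKndc1\t1\n"
    "KAAAR\tGeneX\t1\n"
)
rows = csv.DictReader(io.StringIO(example_tsv), delimiter="\t")
kept = drop_conflicting_decoys(rows)
kept_keys = [(r["PeptideSequence"], r["GeneName"], r["Decoy"]) for r in kept]
print(kept_keys)
```

Note that this removes whole decoy peptides, so the target/decoy ratio of the library changes slightly.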
Maybe this case is rare, since it requires both that genes are assigned and that randomly generated decoy peptides end up with the same sequence.
I'd suggest an optional parameter to control whether decoys are allowed to be identical to target peptides, or to simply filter them out.
A checking step for the pqp file in OSW would also be great, like the one in TargetedFileConverter, to find invalid items before the next step.
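Until such a check exists, something like the following could be run on the pqp file before starting OSW. This is a sketch under assumptions: it treats the pqp file as an SQLite database and assumes tables named `PEPTIDE` (with `ID`, `MODIFIED_SEQUENCE`) and `PEPTIDE_GENE_MAPPING` (with `PEPTIDE_ID`, `GENE_ID`). Please verify these names against the schema of your own file; the demo below builds an in-memory database with that assumed layout rather than reading a real library.

```python
import sqlite3

# Assumed table and column names; check them against your pqp file.
MULTI_GENE_QUERY = """
    SELECT p.MODIFIED_SEQUENCE, COUNT(DISTINCT m.GENE_ID)
    FROM PEPTIDE p
    JOIN PEPTIDE_GENE_MAPPING m ON m.PEPTIDE_ID = p.ID
    GROUP BY p.ID, p.MODIFIED_SEQUENCE
    HAVING COUNT(DISTINCT m.GENE_ID) > 1
    ORDER BY p.MODIFIED_SEQUENCE
"""


def peptides_with_multiple_genes(con):
    """Return (sequence, number_of_genes) for peptides mapped to >1 gene."""
    return con.execute(MULTI_GENE_QUERY).fetchall()


# In-memory stand-in for a pqp file, using the assumed layout.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE PEPTIDE (ID INTEGER PRIMARY KEY, MODIFIED_SEQUENCE TEXT);
    CREATE TABLE PEPTIDE_GENE_MAPPING (PEPTIDE_ID INTEGER, GENE_ID INTEGER);
    INSERT INTO PEPTIDE VALUES (1, 'PEPTIDEA'), (2, 'PEPTIDEB');
    INSERT INTO PEPTIDE_GENE_MAPPING VALUES (1, 100), (1, 200), (2, 300);
""")
suspicious = peptides_with_multiple_genes(con)
print(suspicious)
```

For a real library one would open it with `sqlite3.connect("library.pqp")` (file name is a placeholder) and inspect whatever the function returns before running OSW.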
Best regards,
Ronghui