Not sure if I'm running this correctly. All QValue and PepQValue are 0 #98

Closed
ttessie2 opened this issue Apr 10, 2020 · 12 comments

@ttessie2

I've just started running MS-GF+, using some publicly available yeast data from MassIVE. I used Philosopher to create my target-decoy database. Things seem to run without any errors; however, when I look at the tsv file, all the QValues are 0. I double-checked the -decoy parameter to make sure it matched the prefix on my decoys, and that didn't solve anything. If I search through the tsv file, I find rev decoy hits in there. Where could this error be coming from?

@FarmGeek4Life
Collaborator

Because you used philosopher to create your target-decoy database, you are probably searching with -tda 0. If you use -tda 0, the QValues (and PepQValues) are not calculated; MS-GF+ does not automatically check for decoy prefixes in the .fasta file.

What should work for this is the following:

  • Make a copy of the .fasta file, but have it end with .revCat.fasta instead of .fasta
  • Run the search with -tda 1.

What this does is bypass the MS-GF+ decoy creation process, while still performing a full target-decoy search.
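
The copy step above can be sketched in Python. This is only an illustration, not part of MS-GF+: the helper name `make_revcat_copy` is hypothetical, and the sketch assumes the database file name ends in `.fasta` (for another extension such as `.fa`, the name MS-GF+ looks for is not confirmed here).

```python
import shutil
from pathlib import Path


def make_revcat_copy(fasta_path):
    """Copy e.g. database.fasta to database.revCat.fasta in the same directory.

    Hypothetical helper: assumes the file already contains both target and
    decoy sequences and that its name ends in ".fasta".
    """
    src = Path(fasta_path)
    if src.suffix != ".fasta":
        raise ValueError("this sketch assumes the database name ends in .fasta")
    if src.name.endswith(".revCat.fasta"):
        raise ValueError("input is already a .revCat.fasta file")
    # "database.fasta" -> "database.revCat.fasta", next to the original file
    dst = src.with_name(src.stem + ".revCat.fasta")
    shutil.copyfile(src, dst)
    return dst
```

After the copy exists, the search is run with -tda 1 as described above.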

@ttessie2
Author

Thank you for the reply!
I originally had it set to -tda 0 but the error you get says:

Error while indexing: 2020-04-08-decoys-contam-UP000002311.revCat.revCat.fasta (too many redundant proteins)
If the database contains forward and reverse proteins, run MS-GF+ (or BuildSA) again with "-tda 0"
If the decoy protein names do not start with XXX either rename them, or use the -decoy switch

After reading that, I switched to '-tda 1' and added '-decoy rev' because the database contains forward and reverse proteins and they do not start with XXX.
Looking at the decoys, they are prefixed with rev_, but on my command line I only put rev. Would this make a difference?

For clarity this is what I entered originally. This ran but the QValues were 0.
C:\MSGF+>java -Xmx3500M -jar MSGFPlus.jar -s C:\TPP\data\params\WT_Rep_1_Resp_Prot.mzML -d C:\FragPipe_Skyline\Philosopher\2020-04-08-decoys-contam-UP000002311.fa -inst 1 -t 20ppm -ti 1,2 -ntt 2 -tda 1 -decoy rev -o demo.mzid

@FarmGeek4Life
Collaborator

Do not enter the ".revCat.fasta" on the command line for MS-GF+; give it the ".fasta" file, and have the ".revCat.fasta" file in the same directory. MS-GF+ will automatically find the ".revCat.fasta" file and use it instead of generating it.
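
As a rough sketch of that advice, the command line could be assembled so that -d always receives the plain ".fasta" file. The function name `build_msgfplus_command` is hypothetical; the flags and the memory setting are taken from the command posted earlier in this thread, and nothing here checks what MS-GF+ itself does with the files.

```python
from pathlib import Path


def build_msgfplus_command(spectrum_file, fasta_path, decoy_prefix, output_file,
                           jar="MSGFPlus.jar"):
    """Return the MS-GF+ command as an argument list (hypothetical helper)."""
    db = Path(fasta_path)
    if db.name.endswith(".revCat.fasta"):
        # Per the advice above: give -d the .fasta file, not the revCat copy.
        raise ValueError("pass the plain .fasta file to -d, not the .revCat.fasta copy")
    return [
        "java", "-Xmx3500M", "-jar", jar,
        "-s", str(spectrum_file),
        "-d", str(db),
        "-tda", "1",
        "-decoy", decoy_prefix,
        "-o", str(output_file),
    ]
```

The returned list can be passed to `subprocess.run` once the ".revCat.fasta" file sits in the same directory as the ".fasta" file.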

@FarmGeek4Life
Collaborator

And, let me look at the code a little; it's possible that there is some automatic handling of some of this that I don't remember.

@ttessie2
Author

Okay, so should I have -d database.fasta -tda 1 -decoy rev?
So when the program runs it will look for the .revCat.fasta file within the directory?
And I shouldn't have the target database and reverse database within the same file?

@FarmGeek4Life
Collaborator

> Okay, so should I have -d database.fasta -tda 1 -decoy rev?

Yes.

> So when the program runs it will look for the .revCat.fasta file within the directory?

Yes.

> And I shouldn't have the target database and reverse database within the same file?

For the database.fasta file, it may not matter (but it definitely needs the target database). The database.revCat.fasta file needs to have both the target and reverse/decoy database, as one file. (The name "revCat" is just a shortened form of "reverse concatenated", meaning it has both target and decoy sequences.)
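
One way to confirm that a file really holds both kinds of entries is to count its FASTA headers. The sketch below is not an MS-GF+ feature: the function name is hypothetical, and it assumes decoy headers begin with the prefix directly after the ">" character, using the rev_ prefix mentioned earlier in this thread as the default.

```python
def count_targets_and_decoys(fasta_lines, decoy_prefix="rev_"):
    """Count target and decoy entries from an iterable of FASTA lines.

    Assumption: a decoy header looks like ">rev_..." (prefix right after ">").
    """
    targets = 0
    decoys = 0
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        if line[1:].startswith(decoy_prefix):
            decoys += 1
        else:
            targets += 1
    return targets, decoys
```

For example, `with open("database.revCat.fasta") as fh: print(count_targets_and_decoys(fh))` prints the two counts; a decoy count of zero would suggest the file or the prefix is not what was expected.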

@ttessie2
Author

Okay, I'm running this now so I'll see how it goes.
When you say the .revCat.fasta should be in the same directory, are you referring to the same directory as the fasta.db file?

@FarmGeek4Life
Collaborator

database.fasta and database.revCat.fasta need to be in the same directory, e.g. on Windows that might be:
C:\msgf\database.fasta
C:\msgf\database.revCat.fasta

@ttessie2
Author

Thanks for the help! I have it working now. Much appreciated! I am new to MS analysis so this has been great.
Last question if you don't mind my asking. What is your preferred next step for protein level analysis? I am more familiar with the TPP pipeline using iprophet -> proteinProphet. It looks like MSGF+ just gives PSM and peptide level analysis, is that correct?

@alchemistmatt
Collaborator

Correct: MS-GF+ only identifies peptides and reports the proteins that they're associated with. Some options for protein rollup are IDPicker and InfernoRDN. There are also several commercial tools that do a great job, including Scaffold.

I suggest IDPicker, since it supports protein parsimony and combining multiple datasets. In contrast, InfernoRDN only supports protein rollup on a single dataset at a time. IDPicker should be able to read the .mzid files created by MS-GF+.

If you want to perform quantitation (using Selected Ion Chromatograms of the MS1 parent ions), you'd have to analyze your data with MASIC, then merge the MASIC results with the MS-GF+ results using MASIC Results Merger.

@ttessie2
Author

Great, thank you!

@Jokendo-collab

Jokendo-collab commented Apr 11, 2020 via email
