The input file should be a tab-delimited PSM (peptide-spectrum match) output file from a database search engine such as Mascot or Andromeda. The minimum required columns are:
- The amino acid sequence of a PSM
- A column that describes the modifications of the sequence.
(Note for MaxQuant this information is already provided in
- (or Intensity) the AUC (or SPCs) for each PSM
- Mascot (or search engine equivalent) score for each PSM
- Percolator (or search engine equivalent) score for each PSM
Note that PEP can be used in place of q_value, using the rough empirical approximation PEP / 10 = q_value.
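The PEP-to-q_value approximation mentioned above can be applied to a tab-delimited PSM table before it is passed to the grouper. The following is a minimal sketch using only the Python standard library; it is not part of gpgrouper. The function name add_q_value_from_pep and the example header Sequence are assumptions made for illustration, while PEP and q_value are the names used in the note above.

```python
import csv
import io


def add_q_value_from_pep(rows, pep_col="PEP", q_col="q_value", factor=10.0):
    """Yield copies of rows with q_col filled in as pep_col / factor.

    Rows that already carry a non-empty q_col are passed through unchanged.
    The factor of 10 is the rough observational approximation, not an exact
    statistical conversion.
    """
    for row in rows:
        row = dict(row)  # copy so the caller's row is not modified
        if not row.get(q_col):
            pep = row.get(pep_col)
            row[q_col] = str(float(pep) / factor) if pep else ""
        yield row


# Small in-memory table standing in for a real tab-delimited PSM file.
# The "Sequence" header is a hypothetical column name used only here.
example = "Sequence\tPEP\nPEPTIDEK\t0.05\nANOTHERK\t0.2\n"
reader = csv.DictReader(io.StringIO(example), delimiter="\t")
converted = list(add_q_value_from_pep(reader))
```

Each dictionary in converted keeps its original columns and gains a q_value entry, so the rows can be written back out with csv.DictWriter using delimiter="\t".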
A base configuration file for renaming column headers is provided with
To generate it, run gpgrouper getconfig.
The configuration file can be specified with the --configfile flag in gpgrouper run. If the flag is not specified and the default gpgrouper_config.ini file exists, that file will be used.
This file can be edited to add new column aliases as needed, though it should already be set up to work with ProteomeDiscoverer+Mascot and MaxQuant.
Additional info for MaxQuant
For MaxQuant output, the evidence.txt PSMs file should be used as input. gpgrouper is designed to be run separately for each experiment. As of writing, it is not configured to work with multiple label-free experiments searched together, for example under MaxQuant. In other words, if MaxQuant is used to search multiple experiments at once (potentially with match between runs), the evidence.txt file needs to be split into one file per experiment.
A simple script is available for doing just this.
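For illustration, the splitting step can be sketched with the Python standard library as follows. This is not the script referred to above. It assumes that the evidence.txt file contains an Experiment column (which MaxQuant writes when experiments are defined in the experimental design); the function name split_by_experiment and the output naming scheme are hypothetical.

```python
import csv
from collections import defaultdict
from pathlib import Path


def split_by_experiment(evidence_path, out_dir, experiment_col="Experiment"):
    """Write one tab-delimited file per distinct value of experiment_col.

    Returns a dict mapping each experiment name to the path that was written.
    All rows are held in memory, which keeps the sketch short but may be
    unsuitable for very large files.
    """
    evidence_path = Path(evidence_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group rows by experiment while remembering the original column order.
    groups = defaultdict(list)
    with evidence_path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        fieldnames = reader.fieldnames
        for row in reader:
            groups[row[experiment_col]].append(row)

    # Write each group with the same header as the input file.
    written = {}
    for experiment, rows in groups.items():
        # Hypothetical naming scheme: <experiment>_evidence.txt
        out_path = out_dir / f"{experiment}_evidence.txt"
        with out_path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(
                handle, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
            )
            writer.writeheader()
            writer.writerows(rows)
        written[experiment] = out_path
    return written
```

Calling split_by_experiment("evidence.txt", "split") would write one file per experiment into the split directory, and each of those files could then be processed separately. Experiment names containing path separators would need to be sanitized first.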
FASTA Database File
The database file should be a pre-constructed tab-delimited file for matching PSMs to their respective GeneIDs.
The required columns are
Note that PyGrouper uses GeneID to group PSMs, so if a GeneID is lacking for a desired grouping, another identifier can be substituted in, such as
HomologeneID can be an empty column if this information is not available.