Skip to content
Go to file

Latest commit


Git stats


Failed to load latest commit information.
Latest commit message
Commit time


Input File

The input file should be a resulting PSM (peptide-spectrum match) output file from a database search such as Mascot or Andromeda in tab delimited format. The minimum required columns are :

The amino acid sequence of a PSM
A column that describes the modifications of the sequence.

(Note for MaxQuant this information is already provided in Modified Sequence)

(or Intensity) the AUC (or SPCs) for each PSM
Mascot (or search engine equivalent) score for each PSM
Percolator (or search engine equivalent) score for each PSM Note that PEP can be used in place of q_value with the rough observational approximation that PEP / 10 = q_value
  • Charge

Renaming Columns

A base configuration file for renaming column headers is provided with gpgrouper To generate it, simply run gpgrouper getconfig and a config file will be generated. The configuration file can be specified by the --configfile flag in gpgrouper run, but if not specified and the default gpgrouper_config.ini file exists, it will be used.

This file can be edited with addition of new column aliases as needed, though it should be set up to work with ProteomeDiscoverer+Mascot and MaxQuant already.

Additional info for MaxQuant

For MaxQuant output, the evidence.txt PSMs file should be used as input. gpgrouper is designed to be run separately for each experiment. As of writing, it is not configured to work with multiple label-free experiments searched together under say MaxQuant. Therefore, separate experiments need to be separated. In other words, if MaxQuant is used to search multiple experiments at once (potentially with match between runs) the evidence.txt file needs to be separated into separate “experiment” files. A simple script is available for doing just this.

FASTA Database File

The database file should be a pre-constructed tab delimited file for matching PSMs to their respective GeneIDs. The required columns are TaxonID, HomologeneID, GeneID, ProteinGI, FASTA. Note that PyGrouper uses GeneID to group PSMs, so if a GeneID is lacking for a desired grouping another identifier can be substituted in such as ProteinGI. HomologeneID can be an empty column if this information is not available or desired.


in house grouping program for proteomics




No releases published


No packages published