Simple data analysis
This tutorial will show you how to use Philosopher for a complete proteomics data analysis, starting from a raw LC-MS file and ending with protein reports. Philosopher can also be used on Windows, though the commands in this tutorial are formatted for GNU/Linux.
The commands in this tutorial should be executed in a particular order. Consider this the "default" order in which to perform an analysis:
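- Create a workspace
- Download a database
- Search with MSFragger
- Validate peptides with PeptideProphet
- Infer proteins with ProteinProphet
- Filter and estimate FDR
- Quantify (optional)
- Print reports
- Back up the workspace (optional)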
For this tutorial, we will use a publicly available LC-MS data file from a human cell line sample described in this publication. Download the file 20190202_QExHFX2_RSLC8_PST_HeLa_10ng_1ulLoop_muPAC_1hr_15k_7.raw from the dataset FTP location (the full listing is here).
You can choose to use the .raw spectral format, or you can convert it to the mzML format (which is needed for quantification). A tutorial on raw file conversion can be found here.
For additional help on any of the Philosopher commands, you can use the --help flag (e.g. philosopher workspace --help), which will provide a description of all available flags for each command.
(Note: use the full path to the Philosopher binary file in place of philosopher in the following steps.)
Place the 20190202_QExHFX2_RSLC8_PST_HeLa_10ng_1ulLoop_muPAC_1hr_15k_7.raw file in a new folder, which we will call the 'workspace'. We will create the workspace with the Philosopher workspace command, which will enable the program to store processed data in a special binary format for quick access. (If you have already initialized a Philosopher workspace in this same location, run philosopher workspace --clean to prepare it for a new analysis.)
Inside your workspace folder, open a new terminal window and run this command:
philosopher workspace --init
From now on, all steps should be executed inside this same directory.
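The workspace setup can be sketched as a small shell snippet. This is only an illustration: the folder name hela_workspace and the PHILOSOPHER and WORKSPACE variables are assumptions introduced here, and the init step is skipped when the binary cannot be found.

```shell
#!/usr/bin/env bash
# Hypothetical locations: adjust both to your own system.
PHILOSOPHER="${PHILOSOPHER:-philosopher}"          # full path to the Philosopher binary
WORKSPACE="${WORKSPACE:-$PWD/hela_workspace}"      # assumed folder name
RAW="20190202_QExHFX2_RSLC8_PST_HeLa_10ng_1ulLoop_muPAC_1hr_15k_7.raw"

# Create the workspace folder and move into it.
mkdir -p "$WORKSPACE"
cd "$WORKSPACE" || exit 1

# Warn early if the raw file has not been placed in the workspace yet.
if [ ! -f "$RAW" ]; then
    echo "warning: $RAW is not in $WORKSPACE yet" >&2
fi

# Only initialize when the binary can actually be found.
if command -v "$PHILOSOPHER" >/dev/null 2>&1; then
    "$PHILOSOPHER" workspace --init
else
    echo "Philosopher not found at '$PHILOSOPHER'; skipping workspace --init" >&2
fi
```

Keeping the binary path in one variable means the later commands do not have to repeat the full path.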
Next, we will download and format a database file using the database command, but first we need to find the Proteome ID (PID) for our organism. Searching the UniProt proteome list, we can see that the Homo sapiens proteome ID is UP000005640, so let's prepare the database file with the following command:
philosopher database --id UP000005640 --contam --reviewed
Philosopher will retrieve all reviewed protein sequences from this proteome, add common contaminants, and generate decoy sequences labeled with the tag rev_.
You should see that a new file was created in the workspace. (Note: Databases must be processed within the current Philosopher workspace for the analysis to finish properly. If you are not downloading the database with Philosopher in the current workspace, see building a custom database from existing sequences or annotating an existing database if you already have one formatted from a previous Philosopher analysis.)
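One quick sanity check on the new database is to count target and decoy entries. The sketch below assumes that the rev_ tag appears as a prefix of the FASTA header line; count_fasta_entries is a helper name made up for this example, and the two records in the demo file are toy stand-ins, not real UniProt content.

```shell
# count_fasta_entries FILE: print "<targets> <decoys>" for a FASTA file,
# treating headers that start with ">rev_" as decoys (an assumption).
count_fasta_entries() {
    local fasta="$1"
    local total decoys
    # grep -c exits non-zero when the count is 0, so guard with "|| true".
    total=$(grep -c '^>' "$fasta" || true)
    decoys=$(grep -c '^>rev_' "$fasta" || true)
    echo "$((total - decoys)) $decoys"
}

# Toy stand-in database (made-up records) so the snippet runs on its own.
demo_fasta=$(mktemp)
cat > "$demo_fasta" <<'EOF'
>sp|TOY001|TOY1_HUMAN toy target
MKTAYIAK
>rev_sp|TOY001|TOY1_HUMAN toy decoy
KAIYATKM
EOF

count_fasta_entries "$demo_fasta"   # prints "1 1"
```

On the real file you would call count_fasta_entries with the name of the downloaded .fas file; roughly equal target and decoy counts are what one would expect from a reversed-sequence decoy set.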
(Note: use the full path to the MSFragger.jar file in place of MSFragger.jar in the following steps.)
Run java -jar MSFragger.jar --config to print three MSFragger parameter files (closed, nonspecific, and open).
In the closed_fragger.params file (you can remove the other two .params files if desired), update the database_name parameter to the name of the database file we downloaded in the previous step (e.g. 2019-11-04-td-rev-UP000005640.fas). You can also set the calibrate_mass parameter near the top of the file to 0, which turns off mass calibration and speeds up the search further.
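If you prefer not to edit the parameter file by hand, the two changes can be scripted. The following is a rough sketch that assumes the file uses "key = value" lines with optional trailing "#" comments; set_param is a hypothetical helper, and the two-line file it runs on here is a stand-in, not a real closed_fragger.params.

```shell
# set_param FILE KEY VALUE: replace the value of "KEY = ..." in place,
# keeping any trailing "# comment". VALUE must not contain "|" or "&".
set_param() {
    local file="$1" key="$2" value="$3" tmp
    tmp=$(mktemp)
    sed -E "s|^(${key}[[:space:]]*=[[:space:]]*)[^[:space:]#]*|\\1${value}|" "$file" > "$tmp" \
        && mv "$tmp" "$file"
}

# Stand-in parameter file (placeholder values) so the snippet runs on its own.
params=$(mktemp)
printf '%s\n' \
    'database_name = test.fasta' \
    'calibrate_mass = 2   # stand-in comment' > "$params"

set_param "$params" database_name 2019-11-04-td-rev-UP000005640.fas
set_param "$params" calibrate_mass 0
cat "$params"
```

For the actual analysis, pass closed_fragger.params as the first argument and use the name of your own database file, which depends on the download date.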
Launch the search by running:
java -Xmx32g -jar MSFragger.jar closed_fragger.params 20190202_QExHFX2_RSLC8_PST_HeLa_10ng_1ulLoop_muPAC_1hr_15k_7.raw
(Adjust the -Xmx flag to the appropriate amount of RAM for your computer.)
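As a rough way to pick a value for -Xmx on GNU/Linux, you can read the installed memory from /proc/meminfo. The 75% fraction used below is an arbitrary assumption, not an MSFragger recommendation, and suggest_heap_gb is a helper name invented for this sketch.

```shell
# suggest_heap_gb TOTAL_KB: print a heap size in whole GB, about 75% of
# the given memory (in kB), never less than 1.
suggest_heap_gb() {
    local total_kb="$1"
    local gb=$(( total_kb * 3 / 4 / 1024 / 1024 ))
    if [ "$gb" -lt 1 ]; then
        gb=1
    fi
    echo "$gb"
}

# On Linux, print the suggested flag; elsewhere this block is skipped.
if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    echo "suggested flag: -Xmx$(suggest_heap_gb "$total_kb")g"
fi
```

The printed flag can then replace -Xmx32g in the search command.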
The search should be done in a few minutes or less. The search hits are now stored in a file called 20190202_QExHFX2_RSLC8_PST_HeLa_10ng_1ulLoop_muPAC_1hr_15k_7.pepXML.
The next step is to validate the peptide hits with PeptideProphet:
philosopher peptideprophet --database 2019-11-04-td-rev-UP000005640.fas --decoy rev_ --ppm --accmass --expectscore --decoyprobs --nonparam 20190202_QExHFX2_RSLC8_PST_HeLa_10ng_1ulLoop_muPAC_1hr_15k_7.pepXML
Note that the tag identifying decoy sequences ("rev_" in this case) has been specified with the --decoy flag.
This will generate a new file called interact-20190202_QExHFX2_RSLC8_PST_HeLa_10ng_1ulLoop_muPAC_1hr_15k_7.pep.xml.
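Because each step derives its output name from the input name, the long file names do not need to be retyped. A minimal sketch using shell parameter expansion follows; the variable names are chosen for this example only.

```shell
raw="20190202_QExHFX2_RSLC8_PST_HeLa_10ng_1ulLoop_muPAC_1hr_15k_7.raw"

base="${raw%.*}"                      # strip the extension (.raw or .mzML)
pepxml="${base}.pepXML"               # search results written by MSFragger
interact="interact-${base}.pep.xml"   # validated results written by PeptideProphet

echo "$pepxml"
echo "$interact"
```

The same variables can then be passed to the peptideprophet, proteinprophet, and filter commands.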
Next, perform protein inference and generate a protXML file:
philosopher proteinprophet interact-20190202_QExHFX2_RSLC8_PST_HeLa_10ng_1ulLoop_muPAC_1hr_15k_7.pep.xml
Now we have all necessary files to filter our data using the FDR approach:
philosopher filter --sequential --razor --picked --tag rev_ --pepxml interact-20190202_QExHFX2_RSLC8_PST_HeLa_10ng_1ulLoop_muPAC_1hr_15k_7.pep.xml --protxml interact.prot.xml
Note that the "rev_" tag is specified again at this step. The filter algorithm can be applied in many different ways; use the --help flag to choose the best method for your data. Scoring results will be shown in the console, and all processed data will be stored in your workspace for further analysis.
Filtered search results can be quantified using MS1 peak intensities at this point.
philosopher freequant --dir .
(Isobaric label-based quantification can also be performed at this point; see the labelquant command for more information.)
Now we can inspect the results by printing the PSM, peptide, and protein reports:
philosopher report
As an optional last step, back up your data in case you wish to print the reports again later.
philosopher workspace --backup
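The steps of this tutorial can be collected into one script. The version below is a sketch under several assumptions: it only prints the commands unless DRY_RUN=0 is set, it assumes closed_fragger.params has already been prepared, and it uses the example database name from above, which will differ with the download date. The run and pipeline helpers and all variable names are introduced here for illustration.

```shell
#!/usr/bin/env bash
# Dry run by default: set DRY_RUN=0 to actually execute the commands.
DRY_RUN="${DRY_RUN:-1}"
PHILOSOPHER="${PHILOSOPHER:-philosopher}"        # full path to the binary
MSFRAGGER_JAR="${MSFRAGGER_JAR:-MSFragger.jar}"  # full path to the jar
RAW="20190202_QExHFX2_RSLC8_PST_HeLa_10ng_1ulLoop_muPAC_1hr_15k_7.raw"
DB="2019-11-04-td-rev-UP000005640.fas"           # example name; yours will differ
BASE="${RAW%.*}"

# Print each command; execute it only when DRY_RUN=0, stopping on failure.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@" || { echo "step failed: $*" >&2; exit 1; }
    fi
}

pipeline() {
    run "$PHILOSOPHER" workspace --init
    run "$PHILOSOPHER" database --id UP000005640 --contam --reviewed
    run java -Xmx32g -jar "$MSFRAGGER_JAR" closed_fragger.params "$RAW"
    run "$PHILOSOPHER" peptideprophet --database "$DB" --decoy rev_ --ppm \
        --accmass --expectscore --decoyprobs --nonparam "${BASE}.pepXML"
    run "$PHILOSOPHER" proteinprophet "interact-${BASE}.pep.xml"
    run "$PHILOSOPHER" filter --sequential --razor --picked --tag rev_ \
        --pepxml "interact-${BASE}.pep.xml" --protxml interact.prot.xml
    run "$PHILOSOPHER" freequant --dir .
    run "$PHILOSOPHER" report
    run "$PHILOSOPHER" workspace --backup
}

pipeline
```

Reviewing the dry-run output before setting DRY_RUN=0 is a cheap way to catch a wrong file name before a long search starts.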
We've demonstrated how to run a complete proteomics analysis with label-free MS1 quantification using Philosopher. Because Philosopher provides easy access to advanced analysis software and custom processing algorithms, protein reports can be obtained from LC-MS files in just a few minutes.