TMT analysis

Sarah Haynes edited this page Nov 14, 2019 · 4 revisions

This tutorial shows how to use Philosopher for a complete proteomics data analysis, starting from a raw LC-MS file and ending with TMT-quantified protein reports. Philosopher can also be used on Windows, though the commands in this tutorial are formatted for GNU/Linux.

What are the basic steps?

The commands in this tutorial should be executed in a particular order. Consider this the "default" order in which to perform an analysis:

  1. Create a workspace
  2. Download a database
  3. Search with MSFragger
  4. PeptideProphet
  5. ProteinProphet
  6. Filter
  7. Quantify
  8. Report
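The order above can be sketched as a small driver script. This is only an illustration built from the commands used later in this tutorial: the variable names (PHILOSOPHER, MSFRAGGER_JAR, DB, SAMPLE, DRY_RUN) and the run helper are made up for this example, and the script assumes the MSFragger parameter file and annotation.txt have already been prepared as described in steps 3 and 7. By default it only prints the commands instead of executing them.

```shell
set -eu

# Illustrative settings (not part of Philosopher); override them from the environment.
PHILOSOPHER="${PHILOSOPHER:-philosopher}"          # full path to the Philosopher binary
MSFRAGGER_JAR="${MSFRAGGER_JAR:-MSFragger.jar}"    # full path to MSFragger.jar
DB="${DB:-2019-11-04-td-rev-UP000005640.fas}"      # the real name depends on the download date
SAMPLE="${SAMPLE:-06_CPTAC_TMTS1-NCI7_P_JHUZ_20170509_LUMOS}"
DRY_RUN="${DRY_RUN:-1}"                            # 1 = print commands only, 0 = execute them

# Print a command, then execute it unless this is a dry run.
run() {
  printf '+ %s\n' "$*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

run_pipeline() {
  run "$PHILOSOPHER" workspace --init
  run "$PHILOSOPHER" database --id UP000005640 --contam
  run java -Xmx32g -jar "$MSFRAGGER_JAR" closed_fragger.params "$SAMPLE.mzML"
  run "$PHILOSOPHER" peptideprophet --database "$DB" --ppm --accmass --expectscore --decoyprobs --nonparam "$SAMPLE.pepXML"
  run "$PHILOSOPHER" proteinprophet "interact-$SAMPLE.pep.xml"
  run "$PHILOSOPHER" filter --razor --pepxml "interact-$SAMPLE.pep.xml" --protxml interact.prot.xml
  run "$PHILOSOPHER" labelquant --plex 10 --dir .
  run "$PHILOSOPHER" report
}

run_pipeline
```

Because the database file name contains the download date, a real run (DRY_RUN=0) would need DB set to the file that the database step actually produced, so running the steps one at a time is the safer choice the first time through.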

Before we start

For this tutorial, we will use a publicly-available LC-MS data file from a TMT 10-plex phosphorylation-enriched human cell line sample described in this publication. Download the file 06_CPTAC_TMTS1-NCI7_P_JHUZ_20170509_LUMOS.raw from the dataset FTP location (the full listing is here).

Philosopher requires the mzML spectral format for quantification. A tutorial on raw file conversion can be found here. Once you've converted the .raw file to .mzML, you are ready to get started.

For additional help on any of the Philosopher commands, you can use the --help flag (e.g. philosopher workspace --help), which will provide a description of all available flags for each command.

1. Create a workspace

(Note: use the full path to the Philosopher binary file in place of philosopher in the following steps.) Place the 06_CPTAC_TMTS1-NCI7_P_JHUZ_20170509_LUMOS.raw file and the .mzML file converted from it in a new folder, which we will call the 'workspace'. We will initialize the workspace with the philosopher workspace command, which lets the program store processed data in a special binary format for quick access.

Inside your workspace folder, open a new terminal window and run this command: philosopher workspace --init

From now on, all steps should be executed inside this same directory.

2. Download a protein database

In this step we will download and format a database file using the database command, but first we need to find the Proteome ID (PID) for our organism. Searching the UniProt proteome list, we can see that the Homo sapiens proteome ID is UP000005640, so let's prepare the database file with the following command:

philosopher database --id UP000005640 --contam

Philosopher will retrieve all reviewed protein sequences from this proteome, add common contaminants, and generate decoy sequences labeled with the tag rev_.

You should see that a new file was created in the workspace. (Note: all databases must be reformatted through Philosopher; see building a custom database or annotating an existing database.)
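A quick way to confirm that the database contains both target and decoy entries is to count FASTA headers. The sketch below relies only on the rev_ decoy tag mentioned above; the summarize_fasta helper and the file-name pattern are illustrative and not part of Philosopher.

```shell
# Summarize a FASTA file: total entries, decoys (headers starting with ">rev_"), and targets.
summarize_fasta() {
  fasta="$1"
  total=$(grep -c '^>' "$fasta" || true)
  decoys=$(grep -c '^>rev_' "$fasta" || true)
  targets=$((total - decoys))
  printf 'total=%s decoys=%s targets=%s\n' "$total" "$decoys" "$targets"
}

# Summarize whichever dated database file is present in the current directory.
for f in ./*-td-rev-UP000005640.fas; do
  if [ -f "$f" ]; then
    summarize_fasta "$f"
  fi
done
```

The decoy count should be greater than zero. If it is zero, the file was probably not produced by the database command.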

3. Perform a database search with MSFragger

(Note: use the full path to the MSFragger.jar file in place of MSFragger.jar in the following steps.)

Run java -jar MSFragger.jar --config to print three MSFragger parameter files (closed, nonspecific, and open).

In the closed_fragger.params file, update the database_name parameter to the name of the database file we downloaded in the previous step (e.g. 2019-11-04-td-rev-UP000005640.fas). Below variable_mod_02, add a third variable modification for the TMT isobaric label on the peptide N-terminus: variable_mod_03 = 229.162932 n^, and a fourth to account for phosphorylation: variable_mod_04 = 79.966331 STY.

Towards the bottom of the parameter file, change the add_K_lysine value from 0.000000 to 229.162932 so that the TMT label is treated as a fixed modification on lysine. You can also change the calibrate_mass parameter near the top of the file from 2 to 0, which turns off mass calibration and speeds up the search.
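The parameter edits described in this step can also be scripted. The following is a minimal sketch using awk, assuming the parameter file uses the "name = value" layout shown above and has no active variable_mod_03 or variable_mod_04 lines yet. The patch_params helper and the output name closed_fragger_tmt.params are invented for this example, and trailing comments on the three rewritten lines are dropped.

```shell
# patch_params IN OUT DB: write a copy of IN with the TMT/phospho settings from this step.
patch_params() {
  awk -v db="$3" '
    $1 == "database_name"  { print "database_name = " db; next }        # point to our database
    $1 == "calibrate_mass" { print "calibrate_mass = 0"; next }         # optional: faster search
    $1 == "add_K_lysine"   { print "add_K_lysine = 229.162932"; next }  # TMT on lysine
    { print }                                                           # keep all other lines
    $1 == "variable_mod_02" {                                           # append the new variable mods
      print "variable_mod_03 = 229.162932 n^"
      print "variable_mod_04 = 79.966331 STY"
    }
  ' "$1" > "$2"
}

# Only patch when the generated parameter file is present; the original is left untouched.
if [ -f closed_fragger.params ]; then
  patch_params closed_fragger.params closed_fragger_tmt.params 2019-11-04-td-rev-UP000005640.fas
fi
```

If you use this approach, pass closed_fragger_tmt.params to MSFragger in place of closed_fragger.params, and open the file once to check the result before searching.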

Launch the search by running: java -Xmx32g -jar MSFragger.jar closed_fragger.params 06_CPTAC_TMTS1-NCI7_P_JHUZ_20170509_LUMOS.mzML. (Adjust the -Xmx flag to the appropriate amount of RAM for your computer.)

The search should be done in a few minutes or less. The search hits are now stored in a file called 06_CPTAC_TMTS1-NCI7_P_JHUZ_20170509_LUMOS.pepXML.
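Before moving on, it can be worth checking that the search actually wrote results. The sketch below counts the lines that open a spectrum_query element, the pepXML element that holds the hits for one spectrum. The count_spectrum_queries helper is illustrative only.

```shell
# Count the lines that open a <spectrum_query> element in a pepXML file.
count_spectrum_queries() {
  grep -c '<spectrum_query ' "$1" || true
}

pepxml=06_CPTAC_TMTS1-NCI7_P_JHUZ_20170509_LUMOS.pepXML
if [ -s "$pepxml" ]; then
  echo "spectrum queries: $(count_spectrum_queries "$pepxml")"
else
  echo "missing or empty: $pepxml" >&2
fi
```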

4. PeptideProphet

The next step is to validate the peptide hits with PeptideProphet:

philosopher peptideprophet --database 2019-11-04-td-rev-UP000005640.fas --ppm --accmass --expectscore --decoyprobs --nonparam 06_CPTAC_TMTS1-NCI7_P_JHUZ_20170509_LUMOS.pepXML

This will generate a new file called interact-06_CPTAC_TMTS1-NCI7_P_JHUZ_20170509_LUMOS.pep.xml.

5. ProteinProphet

Next, perform protein inference and generate a protXML file:

philosopher proteinprophet interact-06_CPTAC_TMTS1-NCI7_P_JHUZ_20170509_LUMOS.pep.xml

6. Filter and estimate FDR

Now we have all necessary files to filter our data using the FDR approach:

philosopher filter --razor --pepxml interact-06_CPTAC_TMTS1-NCI7_P_JHUZ_20170509_LUMOS.pep.xml --protxml interact.prot.xml

The filter algorithm can be applied in many different ways. Use the --help flag to see the available options and choose the best method for your data. Scoring results will be shown in the console, and all processed data will be stored in your workspace for further analysis.

7. Perform label-based quantification

Filtered search results can now be quantified. Before quantifying each TMT channel, make a new text file and fill it with the following to tell Philosopher what sample is in each TMT channel:

126 control_1
127N treated_1
127C control_2
128N treated_2
128C control_3
129N treated_3
129C control_4
130N treated_4
130C control_5
131N treated_5

Save this new file as annotation.txt, then run the quantification. For other types of TMT labeling (6, 11, or 16-plex), use the appropriate --plex value.
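If you prefer to create the annotation file from the terminal, a here-document is one way to do it. The channel labels and sample names below are the ones listed above; the final line count is just a sanity check added for this example.

```shell
# Write the channel-to-sample mapping into annotation.txt in the current directory.
cat > annotation.txt <<'EOF'
126 control_1
127N treated_1
127C control_2
128N treated_2
128C control_3
129N treated_3
129C control_4
130N treated_4
130C control_5
131N treated_5
EOF

# A 10-plex experiment should have exactly one line per channel.
echo "channels annotated: $(grep -c '' annotation.txt)"
```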

philosopher labelquant --plex 10 --dir . (the final . points --dir at the current workspace directory)

8. Report the results

Now we can inspect the results by printing the PSM, peptide, and protein reports: philosopher report
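To get a quick overview of what was written, you can count the rows in each report. This sketch assumes the reports are tab-separated files named psm.tsv, peptide.tsv, and protein.tsv in the workspace, each with a single header line; check the file names your Philosopher version produces. The report_summary helper is illustrative only.

```shell
# report_summary DIR: print the number of data rows (header excluded) in each report file.
report_summary() {
  for name in psm peptide protein; do
    f="$1/$name.tsv"
    if [ -f "$f" ]; then
      rows=$(($(grep -c '' "$f") - 1))
      printf '%s\t%s\n' "$name" "$rows"
    else
      printf '%s\tnot found\n' "$name"
    fi
  done
}

report_summary .
```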

Backup

As an optional last step, back up your data in case you wish to print the reports again later.

philosopher workspace --backup

Concluding remarks

We've demonstrated how to run a complete proteomics analysis with TMT quantification using Philosopher. By providing easy access to advanced analysis software and custom processing algorithms, Philosopher makes it possible to obtain protein reports from LC-MS files in just a few minutes.
