philosopher pipeline error at the step "Running TMT-Integrator" #386

Closed
JasonYJWei opened this issue Oct 24, 2022 · 19 comments

@JasonYJWei

The error output is as follows:

tail -n 20 philosopher.PipelineModel.log
time="08:25:44" level=info msg="Restoring peptide results"
time="08:27:42" level=info msg="collecting data from individual experiments"
time="08:27:45" level=info msg="summarizing the quantification"
time="08:34:00" level=info msg="Processing combined file"
time="08:35:35" level=info msg="Converged to 1.01 % FDR with 12262 Proteins" decoy=124 threshold=0.9974 total=12386
time="08:36:28" level=info msg="Restoring protein results"
time="08:42:22" level=info msg="Processing spectral counts"
time="08:43:31" level=info msg="Processing peptide counts"
time="08:45:03" level=info msg="Processing intensities"
time="08:45:14" level=info msg="Running TMT-Integrator"
TMT-Integrator v4.0.2
Exception in thread "main" java.io.FileNotFoundException: /philosopher_workspace/Quantification.PipelineModel/philosopher.yml (No such file or directory)
at java.base/java.io.FileInputStream.open0(Native Method)
at java.base/java.io.FileInputStream.open(FileInputStream.java:219)
at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
at java.base/java.io.FileInputStream.<init>(FileInputStream.java:112)
at java.base/java.io.FileReader.<init>(FileReader.java:60)
at TMTIntegrator.LoadParam(TMTIntegrator.java:174)
at TMTIntegrator.main(TMTIntegrator.java:29)
time="08:45:15" level=info msg=Done

Actually, my workspace was set up as follows:
├── 20CPTAC_HNSCC_Proteome_JHU_20190809
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f01.mzML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f01.pepXML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f02.mzML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f02.pepXML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f03.mzML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f03.pepXML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f04.mzML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f04.pepXML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f05.mzML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f05.pepXML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f06.mzML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f06.pepXML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f07.mzML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f07.pepXML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f08.mzML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f08.pepXML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f09.mzML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f09.pepXML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f10.mzML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f10.pepXML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f11.mzML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f11.pepXML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f12.mzML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f12.pepXML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f13.mzML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f13.pepXML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f14.mzML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f14.pepXML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f15.mzML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f15.pepXML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f16.mzML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f16.pepXML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f17.mzML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f17.pepXML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f18.mzML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f18.pepXML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f19.mzML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f19.pepXML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f20.mzML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f20.pepXML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f21.mzML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f21.pepXML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f22.mzML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f22.pepXML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f23.mzML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f23.pepXML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f24.mzML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_f24.pepXML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_fA.mzML
│   ├── 20CPTAC_HNSCC_W_JHU_20190809_LUMOS_fA.pepXML
│   ├── annotation.txt
│   ├── interact_2.png
│   ├── interact_3.png
│   ├── interact_4.png
│   ├── interact_5.png
│   ├── interact_6.png
│   ├── interact_7.png
│   ├── interact.pep.xml
│   ├── ion.tsv
│   ├── peptide.tsv
│   ├── protein.fas
│   ├── protein.tsv
│   └── psm.tsv
├── bin
│   ├── MSFragger-3.5.jar
│   ├── philosopher
│   └── TMTIntegrator_v4.0.2.jar
├── combined_peptide.tsv
├── combined.pep.xml
├── combined_protein.tsv
├── combined.prot.xml
├── database
│   ├── 2022-10-12-decoys-contam-uniprot.RMduplicate.RMasterisk.fa.fas
│   └── 2022-10-12-decoys-contam-uniprot.RMduplicate.RMasterisk.fa.fas.1.pepindex
├── params
│   └── philosopher.S054.yml

The TMT-Integrator parameter settings are as follows:
tail -n 29 params/philosopher.S054.yml

Integrated Isobaric Quantification: # TMT-Integrator v3.2.0
  path: bin/TMTIntegrator_v4.0.2.jar # path to TMT-Integrator jar
  memory: 256 # memory allocation, in Gb
  output: # the location of output files
  channel_num: 11 # number of channels in the multiplex (e.g. 10, 11)
  ref_tag: pooled sample # unique tag for identifying the reference channel (Bridge sample added to each multiplex)
  groupby: -1 # level of data summarization(0: PSM aggregation to the gene level; 1: protein; 2: peptide sequence; 3: PTM site; -1: generate reports at all levels)
  psm_norm: false # perform additional retention time-based normalization at the PSM level
  outlier_removal: true # perform outlier removal
  prot_norm: -1 # normalization (0: None; 1: MD (median centering); 2: GN (median centering + variance scaling); -1: generate reports with all normalization options)
  min_pep_prob: 0.9 # minimum PSM probability threshold (in addition to FDR-based filtering by Philosopher)
  min_purity: 0.5 # ion purity score threshold
  min_percent: 0.05 # remove low intensity PSMs (e.g. value of 0.05 indicates removal of PSMs with the summed TMT reporter ions intensity in the lowest 5% of all PSMs)
  unique_pep: false # allow PSMs with unique peptides only (if true) or unique plus razor peptides (if false), as classified by Philosopher and defined in PSM.tsv files
  unique_gene: 0 # additional, gene-level uniqueness filter (0: allow all PSMs; 1: remove PSMs mapping to more than one GENE with evidence of expression in the dataset; 2:remove all PSMs mapping to more than one GENE in the fasta file)
  best_psm: true # keep the best PSM only (highest summed TMT intensity) among all redundant PSMs within the same LC-MS run
  prot_exclude: none # exclude proteins with specified tags at the beginning of the accession number (e.g. none: no exclusion; sp|,tr| : exclude protein with sp| or tr|)
  allow_overlabel: true # allow PSMs with TMT on S (when overlabeling on S was allowed in the database search)
  allow_unlabeled: true # allow PSMs without TMT tag or acetylation on the peptide n-terminus
  mod_tag: none # PTM info for generation of PTM-specific reports (none: for Global data; S[167],T[181],Y[243]: for Phospho; K[170]: for K-Acetyl)
  min_site_prob: -1 # site localization confidence threshold (-1: for Global; 0: as determined by the search engine; above 0 (e.g. 0.75): PTMProphet probability, to be used with phosphorylation only)
  ms1_int: true # use MS1 precursor ion intensity (if true) or MS2 summed TMT reporter ion intensity (if false) as part of the reference sample abundance estimation
  top3_pep: true # use top 3 most intense peptide ions as part of the reference sample abundance estimation
  print_RefInt: true # print individual reference sample abundance estimates for each multiplex in the final reports (in addition to the combined reference sample abundance estimate)
  add_Ref: -1 # add an artificial reference channel if there is no reference channel (-1: don't add the reference; 0: use summation as the reference; 1: use average as the reference; 2: use median as the reference)
  max_pep_prob_thres: 0 # the threshold for maximum peptide probability
  min_ntt: 0 # minimum allowed number of enzymatic termini
  aggregation_method: 0 # the aggregation method from the PSM level to the specified level (0: median, 1: weighted-ratio)

Do you think I need to copy the "params/philosopher.S054.yml" file to the workspace root path? Thank you.

prvst self-assigned this Oct 24, 2022
prvst added the help wanted label Oct 24, 2022
@prvst
Collaborator

prvst commented Oct 24, 2022

I believe the error happens because TMT-I is trying to locate a file called philosopher.yml, and your file is called philosopher.S054.yml. TMT-I must have the name hardcoded.
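A minimal sketch of one way to act on this hypothesis, assuming the file only needs to exist under the name `philosopher.yml` in the directory named in the `FileNotFoundException` message. The helper name `ensure_param_file` is hypothetical and not part of philosopher or TMT-Integrator; it simply copies the custom-named parameter file to the expected name if nothing is there yet.

```python
import shutil
from pathlib import Path


def ensure_param_file(workspace: Path, custom_params: Path,
                      expected_name: str = "philosopher.yml") -> Path:
    """Copy custom_params to workspace/expected_name unless that file already exists.

    Assumption: the tool looks for `expected_name` directly inside `workspace`,
    as the path in the stack trace suggests. This is not confirmed by the maintainers.
    """
    expected = workspace / expected_name
    if not expected.exists():
        # copyfile copies contents only; the original custom-named file is left untouched
        shutil.copyfile(custom_params, expected)
    return expected
```

Run from the workspace root, this could be called as `ensure_param_file(Path("."), Path("params/philosopher.S054.yml"))`. Copying rather than renaming keeps the original file available for the other pipeline steps.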

@JasonYJWei
Author

> I believe the error happens because TMT-I is trying to locate a file called philosopher.yml, and your file is called philosopher.S054.yml. TMT-I must have the name hardcoded.

So you mean I need to keep the config file in the params directory, named "philosopher.yml" rather than a custom name, and there is no need to create a copy in the root path. Am I right?

If that's the case, let me try again and see whether it fixes the issue. Thanks a lot.

@JasonYJWei
Author

Since I have already run all the previous steps, can I simply test it with the following parameters?
Steps:
  Database Search: no
  Peptide Validation: no
  PTM Localization: no
  Protein Inference: no
  Label-Free Quantification: no
  Isobaric Quantification: no
  Bio Cluster Quantification: no
  FDR Filtering: no
  Individual Reports: no
  Integrated Reports: no
  Integrated Isobaric Quantification: yes

@JasonYJWei
Author

I just tested it, and it still reports the same error:
time="12:13:47" level=info msg="Running TMT-Integrator"
TMT-Integrator v4.0.2
Exception in thread "main" java.io.FileNotFoundException: /philosopher_workspace/Quantification.PipelineModel/philosopher.yml (No such file or directory)
at java.base/java.io.FileInputStream.open0(Native Method)
at java.base/java.io.FileInputStream.open(FileInputStream.java:219)
at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
at java.base/java.io.FileInputStream.<init>(FileInputStream.java:112)
at java.base/java.io.FileReader.<init>(FileReader.java:60)
at TMTIntegrator.LoadParam(TMTIntegrator.java:174)
at TMTIntegrator.main(TMTIntegrator.java:29)
time="12:13:48" level=info msg=Done

@JasonYJWei
Author

Additionally, if I copy the renamed config file "params/philosopher.yml" to the workspace root path and run the philosopher pipeline for only the last step, "Integrated Isobaric Quantification", it reports the following error:

time="12:17:10" level=info msg="Running TMT-Integrator"
TMT-Integrator v4.0.2
TMT-Integrator can't find the reference channel. Please check if the reference tag is correctly defined in the parameter file.
time="12:17:10" level=info msg=Done

It seems we can't run only the last step, or maybe I did something else wrong.
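One thing that can be checked independently of the pipeline is whether the configured `ref_tag` (here `pooled sample`) actually occurs in each experiment's `annotation.txt`. The sketch below is only a diagnostic aid under two assumptions that this thread does not confirm: that each annotation line holds a channel name followed by a sample label, separated by whitespace, and that the tag is matched as a substring of the label. The function name `find_reference_channels` is hypothetical.

```python
from pathlib import Path


def find_reference_channels(annotation_path: Path, ref_tag: str) -> list[tuple[str, str]]:
    """Return (channel, label) pairs whose label contains ref_tag.

    Assumed file format (not confirmed here): one "<channel> <label>" pair per line.
    """
    matches = []
    for line in annotation_path.read_text().splitlines():
        # Split off the channel; everything after the first whitespace run is the label
        parts = line.strip().split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or single-column lines
        channel, label = parts
        if ref_tag in label:
            matches.append((channel, label))
    return matches
```

An empty result for any experiment directory would be consistent with the "can't find the reference channel" message, although other causes are possible.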

@prvst
Collaborator

prvst commented Oct 24, 2022

Since TMT-I is your only missing step, you can just run it directly without philosopher.

@JasonYJWei
Author

> Since TMT-I is your only missing step, you can just run it directly without philosopher.

As you know, CPTAC S054 contains 20 experiments (25 fractions per experiment), and it's easy to run TMT-I for each of the fractions. So I would very much appreciate it if you could show me how to generate a merged quantification table from the 500 separate "protein.tsv" files. Thanks.

@prvst
Collaborator

prvst commented Oct 24, 2022

Sorry, I don't understand. TMT-I does not use protein tables.

@JasonYJWei
Author

> Sorry, I don't understand. TMT-I does not use protein tables.

Right now, I can run only one fraction at a time, and I have the quantification results for all 500 fractions. But I don't know how to do the quantification with all the S054 fractions together, or whether there is a method to combine the 500 fractions' quantification results.

@prvst
Collaborator

prvst commented Oct 24, 2022

@JasonYJWei if you're having difficulties following the tutorials from the Wiki, I suggest you try using FragPipe, with the graphical interface.

@JasonYJWei
Author

> @JasonYJWei if you're having difficulties following the tutorials from the Wiki, I suggest you try using Fragpipe, with the graphical interface

So, you don't think this bug can be fixed easily?

@prvst
Collaborator

prvst commented Oct 24, 2022

Can you be more specific? I asked you to run TMT-I directly; you said that you don't know how to do the quantification. These are two distinct steps. I suggest you check the tutorials and see the examples in there.

@JasonYJWei
Author

> Can you be more specific? I asked you to run TMT-I directly; you said that you dont know how to do the quantification, these are two distinct steps. I suggest you check the tutorials and see the examples in there.

Before you asked me to run TMT-I directly, I had already done the step-by-step TMT analysis, so I have the quantification for each of the fractions. What I want to know is whether I can get a result table that contains the quantification of all the fractions together. I'm sorry if my question was not clear. As you know, philosopher / FragPipe can run the pipeline analysis with all the fractions as input together and give the merged result. So I simply ran the pipeline analysis, but hit this bug at the last step. That's why I opened this issue.

@prvst
Collaborator

prvst commented Oct 24, 2022

If you have multiple fractions, you should already have processed them together as a single experiment, and you'll have report tables in TSV format inside the directory. These tables already contain the quantification of all fractions.

@JasonYJWei
Author

JasonYJWei commented Oct 24, 2022 via email

@prvst
Collaborator

prvst commented Oct 25, 2022

To merge different experiments, you need to run TMT-Integrator

@JasonYJWei
Author

> To merge different experiments, you need to run TMT-Integrator

That's why I'm here. The last step of the philosopher pipeline failed because it can't find the file "philosopher.yml", which I have placed in the "params" directory. Could you please help fix this issue?

@prvst
Collaborator

prvst commented Oct 25, 2022

The easiest option you have is to run TMT-I manually; please follow the tutorials: https://github.com/Nesvilab/TMT-Integrator

@JasonYJWei
Author

> The easiest option you have is to run TMT-I manually, please follow the tutorials: https://github.com/Nesvilab/TMT-Integrator

Got it. I'll try to run TMT-I manually with all the psm.tsv files output by philosopher. Thanks.
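A rough sketch of how the manual run could be assembled, assuming each experiment directory in the workspace contains one `psm.tsv` (as in the directory listing in the original report) and assuming TMT-Integrator is invoked as `java -Xmx<N>g -jar <jar> <parameter file> <psm.tsv> ...`. That argument order is an assumption here and should be checked against the TMT-Integrator README linked above. The helper name `build_tmti_command` is hypothetical; it only builds the argument list and does not launch anything.

```python
from pathlib import Path


def build_tmti_command(workspace: Path, jar: Path, params: Path, memory_gb: int) -> list[str]:
    """Collect every <experiment>/psm.tsv under workspace and build a java command line.

    The argument order (jar, parameter file, then PSM tables) is an assumption;
    verify it against the TMT-Integrator documentation before running.
    """
    # One psm.tsv per experiment directory, sorted for a reproducible order
    psm_tables = sorted(workspace.glob("*/psm.tsv"))
    if not psm_tables:
        raise FileNotFoundError(f"no psm.tsv found under {workspace}")
    return ["java", f"-Xmx{memory_gb}g", "-jar", str(jar), str(params),
            *(str(p) for p in psm_tables)]
```

With the values from this thread, the call might look like `build_tmti_command(Path("."), Path("bin/TMTIntegrator_v4.0.2.jar"), Path("philosopher.yml"), 256)`. The resulting list can be printed for inspection first and passed to `subprocess.run` once it looks right.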
