Refactor main ramclust.R function #29

Closed
3 of 5 tasks
hechth opened this issue Aug 1, 2022 · 33 comments · Fixed by #39

@hechth
Collaborator

hechth commented Aug 1, 2022

The ramclust.R file contains a function covering the whole workflow, but the rc.*.R files actually contain the same functionality in multiple steps, which is more convenient to test and maintain.

image

@arpita-007

Hi,
I am using the workflow you mentioned, but the function 'rc.ramclustr' shows the following error:

RC_F <- rc.ramclustr(ramclustObj = RC_E, st = NULL,
                     sr = NULL, maxt = NULL, deepSplit = FALSE, blocksize = 2000,
                     mult = 5, hmax = NULL, collapse = TRUE,
                     minModuleSize = 2, linkage = "average",
                     cor.method = "pearson", rt.only.low.n = TRUE, fftempdir = NULL)
calculating ramclustR similarity: nblocks = 3
1 2 3 RAMClust feature similarity matrix calculated and stored:
RAMClust distances converted to distance object
fastcluster based clustering complete
dynamicTreeCut based pruning complete
RAMClust has condensed 2652 features into 444 spectra
collapsing feature into spectral signal intensities
Error in rc.ramclustr(ramclustObj = RC_E, st = NULL, sr = NULL, maxt = NULL, :
  this appears to be an older format ramclustR object and does not have a "phenoData" slot with sample names

If I use the function 'ramclustr', it asks for an xcms object. If I give it an xcms object, it tells me to do the filtering before clustering.
Can you please help me out? I am struggling a lot, and any help would be much appreciated.

Thank you!

@cbroeckl
Owner

@arpita-007 I think this is an easy fix. It is asking you for phenotype data, which must be missing. You can add phenotype/experimental design data using the defineExperiment function, then pass the result to the rc.get.xcms.data() function through the ExpDes option.

pheno <- RAMClustR::defineExperiment()
RC <- RAMClustR::rc.get.xcms.data( ExpDes = pheno)
RC <- RAMClustR::rc.ramclustr(ramclustObj = RC)

@arpita-007

@cbroeckl Thank you so much for responding and for the guidance. Your suggestion worked. I could do the clustering after subtracting blank and normalization. But now I am getting an error in importing the msfinder.formulas.

import.msfinder.formulas(ramclustObj = RC_F, msp.dir = NULL)
Press 1 for .mat or 2 for .msp to continue2
Error in do[[i]] : subscript out of bounds
import.msfinder.formulas(ramclustObj = RC_F, mat.dir = NULL, msp.dir = NULL)
Press 1 for .mat or 2 for .msp to continue1
Error in do[[i]] : subscript out of bounds
import.msfinder.formulas(ramclustObj = RC_F)
Press 1 for .mat or 2 for .msp to continue2
Error in do[[i]] : subscript out of bounds
import.msfinder.formulas(ramclustObj = RC_F, mat.dir = NULL, msp.dir = "C:/Users/DR Pallavi Lab/Documents/spectra/ms/spectra/msp")
Press 1 for .mat or 2 for .msp to continue 2
Error in do[[i]] : subscript out of bounds

Also while exporting the data with exportDataset() function I am getting this-

exportDataset( ramclustObj = RC_G, which.data = "SpecAbund", label.by = "ann", appendFactors = TRUE)
Error in which(row.names(ramclustObj$ExpDes$design) == "fact1name"):(which(row.names(ramclustObj$ExpDes$design) == :
argument of length 0

Thank you in advance!!
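
The exportDataset() message above can be reproduced in base R without RAMClustR. The following is a minimal sketch, assuming the design table simply has no row named "fact1name" (for example, when no experimental factors were recorded). The `design` data frame and the `has_factor_rows()` helper are hypothetical illustrations, not part of the package.

```r
# Hypothetical stand-in for ramclustObj$ExpDes$design with no factor rows.
design <- data.frame(
  Value = c("GRP", "Homo sapiens", "Serum"),
  row.names = c("Experiment", "Species", "Sample"),
  stringsAsFactors = FALSE
)

# which() finds no matching row and returns a zero-length integer vector.
idx <- which(row.names(design) == "fact1name")
length(idx)  # 0

# Using a zero-length value as an endpoint of `:` raises
# "argument of length 0", the message seen in the error above.
msg <- tryCatch(
  {
    idx:idx
    NA_character_
  },
  error = function(e) conditionMessage(e)
)

# Hypothetical helper: check for the row before asking for factors.
has_factor_rows <- function(design) {
  "fact1name" %in% row.names(design)
}
has_factor_rows(design)
```

If the check returns FALSE, calling exportDataset() with appendFactors = FALSE may avoid the failing lookup. That is an assumption based on the error message only and is not confirmed in this thread.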

@cbroeckl
Owner

Did you run MSFinder? You need to run this program manually using the exported .mat files as input, then run import.msfinder.formulas. If MSFinder ran, it should have written a directory for each compound containing the formula results that ramclustR imports. At this time there are no R-based tools that perform a comparable set of steps, so we rely on external programs for the actual MS/MS spectrum annotation (MSFinder and Sirius are the ones I have used and currently have import functions for).

@arpita-007

Thank you @cbroeckl!! I will do as you suggested.

@arpita-007

Hi @cbroeckl

I was using the same flow again for a different experiment and the same error appeared. I did as you suggested but it is not working.

pheno <- RAMClustR::defineExperiment()
RC <- RAMClustR::rc.get.xcms.data(xcmsObj = fill_GRP,
                                  taglocation = "pathGRP",
                                  MStag = NULL,
                                  MSMStag = NULL,
                                  ExpDes = pheno,
                                  mzdec = 3,
                                  ensure.no.na = TRUE)

RC_B <- rc.feature.replace.na(
  ramclustObj = RC,
  replace.int = 0.1,
  replace.noise = 0.1,
  replace.zero = TRUE)
replaced 445885 of 1032504 total feature values ( 43 % )

RC_C <- rc.feature.filter.blanks(ramclustObj = RC_B,
                                 qc.tag = c("QC", "sample.names.sample_group"),
                                 blank.tag = c("Blank", "sample.names.sample_group"),
                                 sn = 3, remove.blanks = TRUE)

41.1% of features move forward
df phenoData
ma MSdata
Features which failed to demonstrate signal intensity of at least 3 fold greater in QC samples than in blanks were removed from the feature dataset. 25336 of 43021 features were removed.

RC_D <- rc.feature.normalize.tic(ramclustObj = RC_C)
RC_E <- rc.feature.filter.cv(ramclustObj = RC_D, qc.tag = c("QC", "sample.names.sample_group"),
                             max.cv = 0.3)

MSdata : 5477 passed the CV filter
Features were filtered based on their qc sample CV values. Only features with CV vaules less than or equal to 0.3 in MSdata set were retained. 12208 of 17685 features were removed.

RC_F <- RAMClustR::rc.ramclustr(ramclustObj = RC_E)
calculating ramclustR similarity: nblocks = 6
1 2 3 4 5 6 RAMClust feature similarity matrix calculated and stored:
RAMClust distances converted to distance object
fastcluster based clustering complete
dynamicTreeCut based pruning complete
RAMClust has condensed 5477 features into 851 spectra
collapsing feature into spectral signal intensities
Error in RAMClustR::rc.ramclustr(ramclustObj = RC_E) :
this appears to be an older format ramclustR object and does not have a "phenoData" slot with sample names

I created an experiment design. You mentioned phenotype data. If I am not mistaken, phenotype data and the phenoData shown in the error are different things.
I am not sure what to do in this case.

Thank you

@cbroeckl
Owner

@arpita-007 - what does this show:

RC_F$ExpDes

RC_F$phenoData

fill_GRP@phenoData

The @phenoData slot from the xcms object should be carried over to the RAMClustR object. This error suggests that this isn't happening, at least not in the way I anticipated.

@arpita-007

Then what can be done to bring the phenoData to the RAMClustR object?

@cbroeckl
Owner

show me the output of these:

head(RC_F$ExpDes)

head(RC_F$phenoData)

head(fill_GRP@phenoData)

@arpita-007

RC_F is not yet created because of the error. Here is the RC_E:

head(RC_E$ExpDes)
$design
            Value         Description
Experiment  GRP           experiment name, no spaces
Species     Homo sapiens  species name
Sample      Serum         sample type
Contributor Arpita        individual and/or organizational affiliation
platform    LC-MS         GC-MS or LC-MS

$instrument
            value
chrominst   Dionex 3000
msinst      Orbitrap fusion
column      Acquity HSS T3
solvA       Water
solvB       Methanol
CE1         30 V
CE2
mstype      Orbi
msmode      Positive
ionization  ESI
colgas      Helium
msscanrange 50-1500 Da
conevolt    30 V
MSlevs      2

head(RC_E$phenoData)
   sample.names.sample_name sample.names.sample_group   filenames
2                    A2_QC1                        QC A2_QC1.mzML
4                    A4_A_1                    Sample A4_A_1.mzML
5                    A5_A_2                    Sample A5_A_2.mzML
7                    A7_C_1                    Sample A7_C_1.mzML
8                    A8_C_2                    Sample A8_C_2.mzML
10                   B1_D_1                    Sample B1_D_1.mzML
   filepaths
2  C:\Users\Metabolomics\OneDrive\Desktop\Arpita_Mani\GR_raw data\GR_XCMS_pos\A2_QC1.mzML
4  C:\Users\Metabolomics\OneDrive\Desktop\Arpita_Mani\GR_raw data\GR_XCMS_pos\A4_A_1.mzML
5  C:\Users\Metabolomics\OneDrive\Desktop\Arpita_Mani\GR_raw data\GR_XCMS_pos\A5_A_2.mzML
7  C:\Users\Metabolomics\OneDrive\Desktop\Arpita_Mani\GR_raw data\GR_XCMS_pos\A7_C_1.mzML
8  C:\Users\Metabolomics\OneDrive\Desktop\Arpita_Mani\GR_raw data\GR_XCMS_pos\A8_C_2.mzML
10 C:\Users\Metabolomics\OneDrive\Desktop\Arpita_Mani\GR_raw data\GR_XCMS_pos\B1_D_1.mzML
head(fill_GRP@phenoData)
An object of class 'NAnnotatedDataFrame'
  rowNames: 1 2 ... 6 (6 total)
  varLabels: sample_name sample_group
  varMetadata: labelDescription
Multiplexing: 1 - Single run

@cbroeckl
Owner

cbroeckl commented Jan 27, 2023

what does this return?

is.null(RC_E$phenoData$sample.names)

@cbroeckl
Owner

and this:

names(RC_E$phenoData)

@arpita-007

is.null(RC_E:$phenoData$sample.names)
Error: unexpected '$' in "is.null(RC_E:$"

@arpita-007

names(RC_E$phenoData)
[1] "sample.names.sample_name" "sample.names.sample_group" "filenames"
[4] "filepaths"

@arpita-007

I tried this too:

is.null(RC_E:$phenoData$sample.names)
Error: unexpected '$' in "is.null(RC_E:$"
is.null(RC_E:$phenoData$sample.names.sample_name)
Error: unexpected '$' in "is.null(RC_E:$"

@cbroeckl
Owner

I think the issue is that the first column of your RC_E$phenoData data frame is supposed to be named 'sample.names', but for some reason it isn't. Try this:

names(RC_E$phenoData)[1] <- "sample.names"
RC_F <- RAMClustR::rc.ramclustr(ramclustObj = RC_E)
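
The rename above can be wrapped in a small guard so it only fires when needed. This is a minimal sketch, assuming rc.ramclustr() only requires a phenoData column named "sample.names", as the error message suggests. `ensure_sample_names()` is a hypothetical helper, not a RAMClustR function, and the data frame is a mock that mirrors the column names reported in this thread.

```r
# Hypothetical helper: rename the first column to "sample.names"
# only when no column with that name exists yet.
ensure_sample_names <- function(pheno) {
  if (!"sample.names" %in% names(pheno)) {
    names(pheno)[1] <- "sample.names"
  }
  pheno
}

# Mock phenoData using the column names seen in the thread.
pheno <- data.frame(
  sample.names.sample_name = c("A2_QC1", "A4_A_1"),
  sample.names.sample_group = c("QC", "Sample"),
  stringsAsFactors = FALSE
)

fixed <- ensure_sample_names(pheno)
names(fixed)
```

With a real object, the equivalent call would be RC_E$phenoData <- ensure_sample_names(RC_E$phenoData) before running rc.ramclustr(). Calling the helper twice is harmless, since the second call finds the column and leaves the data frame unchanged.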

@arpita-007

Resolved I guess!

names(RC_E$phenoData)[1] <- "sample.names"
RC_F <- RAMClustR::rc.ramclustr(ramclustObj = RC_E)
calculating ramclustR similarity: nblocks = 6
1 2 3 4 5 6 RAMClust feature similarity matrix calculated and stored:
RAMClust distances converted to distance object
fastcluster based clustering complete
dynamicTreeCut based pruning complete
RAMClust has condensed 5477 features into 854 spectra
collapsing feature into spectral signal intensities
RC_F

Call:
fastcluster::hclust(d = tmp.ramclustObj, method = linkage)

Cluster method : average
Distance : RAMClustR
Number of objects: 5477

@cbroeckl
Owner

I am not sure why this happened. I will have to do some more homework, but this gets you moving forward.

@arpita-007

@cbroeckl Thanks a lot again :)

@arpita-007

arpita-007 commented Mar 11, 2023

Sorry to bother you again, but can you please tell me, in

rc.get.xcms.data(xcmsObj = fill_GDMHCP,
                 taglocation = "phenoData[,1]",
                 MStag = NULL,
                 MSMStag = NULL,
                 ExpDes = pheno,
                 mzdec = 4,
                 ensure.no.na = FALSE)

what file should be given for MStag?

Thanks

@hechth
Collaborator Author

hechth commented Mar 13, 2023

@arpita-007 The MStag parameter is not a file. How do you indicate which files are MS1 and which are MS2? Or do you only use MS1 data?

@arpita-007

arpita-007 commented Mar 13, 2023

@hechth We do not have separate files for MS1 and MS2. We use a single file for both. We do have MS2 data written in .mgf format by XCMS, though. Can we use that?

@hechth
Collaborator Author

hechth commented Mar 13, 2023

@arpita-007 The idea behind RAMClustR is to extract the MS1 and MS2 information from the files separately, run XCMS on each, and then align the feature tables in the peak alignment step, representing MS1 and MS2 as different samples.

If you have MS2 data in mgf format from XCMS, can you check if the MS2 data is also contained in the XCMS object used in R?

@cbroeckl
Owner

@arpita-007 - if you have only MS1 data, then as far as I recall you can just leave it as NULL and the processing will proceed appropriately. RAMClustR doesn't currently handle DDA-like MS/MS data.

@arpita-007

@hechth I could not locate the XCMS object containing the MS2 data. But as @cbroeckl suggested, I proceeded with MS1 only.
Thanks to both of you for solving all my doubts and making it easier for me.
Thank you :)

@arpita-007

Hi,
Can you please help me understand this error? I am getting this for one particular file only. I ran the same code on three files from different modes (RP pos, RP neg, HILIC pos), but I am seeing this error for my fourth file.

library(RAMClustR)

pheno <- RAMClustR::defineExperiment()
path2 <- file.path("E:/Placenta_final files/RAMClustR_clustering/PHCN_input_clustering_after corr.csv")
path2
[1] "E:/Placenta_final files/RAMClustR_clustering/PHCN_input_clustering_after corr.csv"
RC_PHCN <- ramclustR(ms = path2,
                     featdelim = "_",
                     st = 5,
                     ExpDes = pheno,
                     sampNameCol = 1)
organizing dataset
normalizing dataset
Calculating ramclustR similarity using 3 nblocks.
1 2 3 Error in ramclustObj[startv:stopv] <- column :
  replacement has length zero

@cbroeckl
Owner

@arpita-007 - can you send me the file you are using as input? cbroeckl at colostate dot edu.

@arpita-007

The PHCN file gives the error, while the PHCP file was processed successfully with the same code.

PHCN_input_clustering_after corr.csv
PHCP_input_clustering_after corr.csv

@cbroeckl
Owner

@arpita-007 - I think this is a rare event coupled with imperfect code. The file that fails has exactly 2000 features, which happens to be the default blocksize setting. Try setting the option blocksize = 1200 in the ramclustR() call. I suspect it will run fine. Let me know if this fixes it, please!
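
A minimal sketch of that workaround, assuming the failure is triggered when the number of features is an exact multiple of blocksize. Only the 2000-feature, blocksize = 2000 case is reported in this thread, so the general rule is a guess. `pick_blocksize()` is a hypothetical helper, not part of RAMClustR.

```r
# Hypothetical helper: shrink blocksize until it no longer divides
# the feature count evenly (assumed trigger for the failure).
pick_blocksize <- function(n_features, blocksize = 2000) {
  stopifnot(n_features > 0, blocksize > 1)
  while (n_features %% blocksize == 0 && blocksize > 1) {
    blocksize <- blocksize - 1
  }
  blocksize
}

pick_blocksize(2000)  # 1999: the default would have matched exactly
pick_blocksize(5477)  # 2000: default kept, no exact multiple
```

The result could then be passed as the blocksize argument of the clustering call. Any value that avoids the exact match, such as the 1200 suggested above, should serve the same purpose.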

@arpita-007

@cbroeckl Yes, it fixed the issue. Thank you.

@hechth
Collaborator Author

hechth commented Apr 14, 2023

@cbroeckl thanks for the proposed solution - we will implement a bugfix for that!

@hechth
Collaborator Author

hechth commented Apr 14, 2023

@arpita-007 and @cbroeckl I think we can close this issue, as most things have been addressed and resolved.

I created issues for the things that still have to be taken care of.

Most other things are addressed in the open PR #39.

@arpita-007

@hechth Yes sure. Thank you!
