Refactor main ramclust.R function #29

hechth · 2022-08-01T06:57:23Z

The ramclust.R file contains a function covering the whole workflow, but the rc.*.R files actually contain the same functionality in multiple steps, which is more convenient to test and maintain.

Implement a test case using individual steps to build the whole ramclust.R workflow #30
Replace the sections in ramclust.R with the respective sub-steps of the workflow
Implement unit tests for all functions
Include a data-flow diagram and step-wise procedure in the documentation
Group lower-level functions into higher top-level functions


arpita-007 · 2023-01-17T14:27:28Z

Hi,
I am using the workflow you mentioned, but the function 'rc.ramclustr' is giving the following error:

RC_F <- rc.ramclustr(ramclustObj = RC_E, st = NULL,
                     sr = NULL, maxt = NULL, deepSplit = FALSE, blocksize = 2000,
                     mult = 5, hmax = NULL, collapse = TRUE,
                     minModuleSize = 2, linkage = "average",
                     cor.method = "pearson", rt.only.low.n = TRUE, fftempdir = NULL)
calculating ramclustR similarity: nblocks = 3
1 2 3 RAMClust feature similarity matrix calculated and stored:
RAMClust distances converted to distance object
fastcluster based clustering complete
dynamicTreeCut based pruning complete
RAMClust has condensed 2652 features into 444 spectra
collapsing feature into spectral signal intensities
Error in rc.ramclustr(ramclustObj = RC_E, st = NULL, sr = NULL, maxt = NULL, :
this appears to be an older format ramclustR object and does not have a "phenoData" slot with sample names

If I use the function 'ramclustr' instead, it asks for an xcms object. If I give it an xcms object, it tells me to do the filtering before clustering.
Can you please help me out? I am struggling a lot! Any help would be much appreciated.

Thank you!

cbroeckl · 2023-01-17T15:43:41Z

@arpita-007 I think this is an easy fix. It is asking you for phenotype data, which must be missing. You can add phenotype/experimental design data using the defineExperiment function, then feed that into the rc.get.xcms.data() function via the ExpDes option.

pheno <- RAMClustR::defineExperiment()
RC <- RAMClustR::rc.get.xcms.data( ExpDes = pheno)
RC <- RAMClustR::rc.ramclustr(ramclustObj = RC)

arpita-007 · 2023-01-18T13:32:45Z

@cbroeckl Thank you so much for responding and for the guidance. Your suggestion worked. I could do the clustering after blank subtraction and normalization. But now I am getting an error when importing the MSFinder formulas.

import.msfinder.formulas(ramclustObj = RC_F, msp.dir = NULL)
Press 1 for .mat or 2 for .msp to continue2
Error in do[[i]] : subscript out of bounds
import.msfinder.formulas(ramclustObj = RC_F, mat.dir = NULL, msp.dir = NULL)
Press 1 for .mat or 2 for .msp to continue1
Error in do[[i]] : subscript out of bounds
import.msfinder.formulas(ramclustObj = RC_F)
Press 1 for .mat or 2 for .msp to continue2
Error in do[[i]] : subscript out of bounds
import.msfinder.formulas(ramclustObj = RC_F, mat.dir = NULL, msp.dir = "C:/Users/DR Pallavi Lab/Documents/spectra/ms/spectra/msp")
Press 1 for .mat or 2 for .msp to continue 2
Error in do[[i]] : subscript out of bounds

Also, while exporting the data with the exportDataset() function, I am getting this:

exportDataset( ramclustObj = RC_G, which.data = "SpecAbund", label.by = "ann", appendFactors = TRUE)
Error in which(row.names(ramclustObj$ExpDes$design) == "fact1name"):(which(row.names(ramclustObj$ExpDes$design) == :
argument of length 0

Thank you in advance!!

cbroeckl · 2023-01-18T15:28:26Z

Did you run MSFinder? You need to run this program manually using the exported .mat files as input, then run import.msfinder.formulas. If MSFinder ran, it should have written directories for each compound which contain the formula results that ramclustR imports. At this time there are no R-based tools which perform a comparable set of steps, so we are reliant on running external programs (MSFinder or Sirius are the ones I have used and have import functions for, currently) for the actual MS/MS spectrum annotation.

arpita-007 · 2023-01-20T05:01:24Z

Thank you @cbroeckl!! I will do as you suggested.

arpita-007 · 2023-01-27T10:55:55Z

Hi @cbroeckl

I was using the same flow again for a different experiment and the same error appeared. I did as you suggested but it is not working.

pheno <- RAMClustR::defineExperiment()
RC <- RAMClustR::rc.get.xcms.data(xcmsObj = fill_GRP,
                                  taglocation = "pathGRP",
                                  MStag = NULL,
                                  MSMStag = NULL,
                                  ExpDes = pheno,
                                  mzdec = 3,
                                  ensure.no.na = TRUE)

RC_B <- rc.feature.replace.na(ramclustObj = RC,
                              replace.int = 0.1,
                              replace.noise = 0.1,
                              replace.zero = TRUE)
replaced 445885 of 1032504 total feature values ( 43 % )

RC_C <- rc.feature.filter.blanks(ramclustObj = RC_B,
                                 qc.tag = c("QC", "sample.names.sample_group"),
                                 blank.tag = c("Blank", "sample.names.sample_group"),
                                 sn = 3, remove.blanks = TRUE)

41.1% of features move forward
df phenoData
ma MSdata
Features which failed to demonstrate signal intensity of at least 3 fold greater in QC samples than in blanks were removed from the feature dataset. 25336 of 43021 features were removed.

RC_D <- rc.feature.normalize.tic(ramclustObj = RC_C)
RC_E <- rc.feature.filter.cv(ramclustObj = RC_D, qc.tag = c("QC", "sample.names.sample_group"),
                             max.cv = 0.3)

MSdata : 5477 passed the CV filter
Features were filtered based on their qc sample CV values. Only features with CV vaules less than or equal to 0.3 in MSdata set were retained. 12208 of 17685 features were removed.

RC_F <- RAMClustR::rc.ramclustr(ramclustObj = RC_E)
calculating ramclustR similarity: nblocks = 6
1 2 3 4 5 6 RAMClust feature similarity matrix calculated and stored:
RAMClust distances converted to distance object
fastcluster based clustering complete
dynamicTreeCut based pruning complete
RAMClust has condensed 5477 features into 851 spectra
collapsing feature into spectral signal intensities
Error in RAMClustR::rc.ramclustr(ramclustObj = RC_E) :
this appears to be an older format ramclustR object and does not have a "phenoData" slot with sample names

I created an experiment design. You mentioned phenotype data; if I am not mistaken, phenotype data and the phenoData named in the error are different things.
I am not sure what to do in this case.

Thank you

cbroeckl · 2023-01-27T15:00:59Z

@arpita-007 - what does this show:

RC_F$ExpDes

RC_F$phenoData

fill_GRP@phenoData

The @phenoData slot from the xcms object should be carried over to the RAMClustR object. This error suggests that this isn't happening, at least not in the way I anticipated.

arpita-007 · 2023-01-27T15:12:38Z

Then what can be done to bring the phenoData to the RAMClustR object?

cbroeckl · 2023-01-27T15:21:42Z

Show me the output of these:

head(RC_F$ExpDes)

head(RC_F$phenoData)

head(fill_GRP@phenoData)

arpita-007 · 2023-01-27T15:35:08Z

RC_F is not yet created because of the error. Here is the RC_E:

head(RC_E$ExpDes)
$design
Value Description
Experiment GRP experiment name, no spaces
Species Homo sapiens species name
Sample Serum sample type
Contributor Arpita individual and/or organizational affiliation
platform LC-MS GC-MS or LC-MS

$instrument
value
chrominst Dionex 3000
msinst Orbitrap fusion
column Acquity HSS T3
solvA Water
solvB Methanol
CE1 30 V
CE2
mstype Orbi
msmode Positive
ionization ESI
colgas Helium
msscanrange 50-1500 Da
conevolt 30 V
MSlevs 2

head(RC_E$phenoData)
sample.names.sample_name sample.names.sample_group filenames
2 A2_QC1 QC A2_QC1.mzML
4 A4_A_1 Sample A4_A_1.mzML
5 A5_A_2 Sample A5_A_2.mzML
7 A7_C_1 Sample A7_C_1.mzML
8 A8_C_2 Sample A8_C_2.mzML
10 B1_D_1 Sample B1_D_1.mzML
filepaths
2 C:\Users\Metabolomics\OneDrive\Desktop\Arpita_Mani\GR_raw data\GR_XCMS_pos\A2_QC1.mzML
4 C:\Users\Metabolomics\OneDrive\Desktop\Arpita_Mani\GR_raw data\GR_XCMS_pos\A4_A_1.mzML
5 C:\Users\Metabolomics\OneDrive\Desktop\Arpita_Mani\GR_raw data\GR_XCMS_pos\A5_A_2.mzML
7 C:\Users\Metabolomics\OneDrive\Desktop\Arpita_Mani\GR_raw data\GR_XCMS_pos\A7_C_1.mzML
8 C:\Users\Metabolomics\OneDrive\Desktop\Arpita_Mani\GR_raw data\GR_XCMS_pos\A8_C_2.mzML
10 C:\Users\Metabolomics\OneDrive\Desktop\Arpita_Mani\GR_raw data\GR_XCMS_pos\B1_D_1.mzML
head(fill_GRP@phenoData)
An object of class 'NAnnotatedDataFrame'
rowNames: 1 2 ... 6 (6 total)
varLabels: sample_name sample_group
varMetadata: labelDescription
Multiplexing: 1 - Single run

cbroeckl · 2023-01-27T15:41:11Z

what does this return?

is.null(RC_E$phenoData$sample.names)

cbroeckl · 2023-01-27T15:42:23Z

and this:

names(RC_E$phenoData)

arpita-007 · 2023-01-27T15:42:54Z

is.null(RC_E:$phenoData$sample.names)
Error: unexpected '$' in "is.null(RC_E:$"

arpita-007 · 2023-01-27T15:43:16Z

names(RC_E$phenoData)
[1] "sample.names.sample_name" "sample.names.sample_group" "filenames"
[4] "filepaths"

arpita-007 · 2023-01-27T15:43:50Z

I tried this too:

is.null(RC_E:$phenoData$sample.names)
Error: unexpected '$' in "is.null(RC_E:$"
is.null(RC_E:$phenoData$sample.names.sample_name)
Error: unexpected '$' in "is.null(RC_E:$"

cbroeckl · 2023-01-27T15:46:09Z

I think the issue is that the first column of your RC_E$phenoData data frame is supposed to be 'sample.names', but for some reason it isn't. Try this:

names(RC_E$phenoData)[1] <- "sample.names"
RC_F <- RAMClustR::rc.ramclustr(ramclustObj = RC_E)
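
The rename above can be wrapped in a small guard so it only runs when needed. This is a minimal base-R sketch on a toy list that stands in for the ramclustR object; `ensure_sample_names` is a hypothetical helper, not a RAMClustR function, and it assumes the first phenoData column holds the sample names, as it does in the output posted earlier.

```r
# Toy stand-in for a ramclustR object; only the phenoData element is modelled.
rc <- list(
  phenoData = data.frame(
    sample.names.sample_name  = c("A2_QC1", "A4_A_1"),
    sample.names.sample_group = c("QC", "Sample"),
    stringsAsFactors = FALSE
  )
)

# Hypothetical helper (not part of RAMClustR): rename the first phenoData
# column to "sample.names" unless a column with exactly that name exists.
# %in% is used instead of `$` because `$` does partial name matching.
ensure_sample_names <- function(obj) {
  if (!("sample.names" %in% names(obj$phenoData))) {
    names(obj$phenoData)[1] <- "sample.names"
  }
  obj
}

rc <- ensure_sample_names(rc)
names(rc$phenoData)
```

Calling the helper a second time leaves the object unchanged, so it should be safe to keep in a script that is re-run.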

arpita-007 · 2023-01-27T15:53:06Z

Resolved I guess!

names(RC_E$phenoData)[1] <- "sample.names"
RC_F <- RAMClustR::rc.ramclustr(ramclustObj = RC_E)
calculating ramclustR similarity: nblocks = 6
1 2 3 4 5 6 RAMClust feature similarity matrix calculated and stored:
RAMClust distances converted to distance object
fastcluster based clustering complete
dynamicTreeCut based pruning complete
RAMClust has condensed 5477 features into 854 spectra
collapsing feature into spectral signal intensities
RC_F

Call:
fastcluster::hclust(d = tmp.ramclustObj, method = linkage)

Cluster method : average
Distance : RAMClustR
Number of objects: 5477

cbroeckl · 2023-01-27T19:37:09Z

I am not sure why this happened. I will have to do some more homework, but this gets you moving forward.

arpita-007 · 2023-01-27T19:39:42Z

@cbroeckl Thanks a lot again :)

arpita-007 · 2023-03-11T12:54:39Z

Sorry to bother you again, but can you please tell me, in this call:
rc.get.xcms.data(xcmsObj = fill_GDMHCP,
taglocation = "phenoData[,1]",
MStag = NULL,
MSMStag = NULL,
ExpDes = pheno,
mzdec = 4,
ensure.no.na = FALSE)

what file should be given in MStag?

Thanks

hechth · 2023-03-13T07:34:19Z

@arpita-007 The MStag parameter is not a file. How do you indicate which files are MS1 and which are MS2? Or do you only use MS1 data?

arpita-007 · 2023-03-13T07:38:24Z

@hechth We do not have separate files for MS1 and MS2. We use single files for both. We do have MS2 data written in .mgf format by XCMS, though. Can we use that?

hechth · 2023-03-13T07:45:19Z

@arpita-007 The idea behind RAMClustR is to extract the MS1 and MS2 information from the files separately, run XCMS on each, and then align the feature tables in the peak alignment step, representing MS1 and MS2 as different samples.

If you have MS2 data in mgf format from XCMS, can you check if the MS2 data is also contained in the XCMS object used in R?

cbroeckl · 2023-03-20T14:42:25Z

@arpita-007 - If you have only MS1 data then, if I recall correctly, you can just leave it as NULL and the processing will proceed appropriately. RAMClustR doesn't currently deal with DDA-like MS/MS data.

arpita-007 · 2023-03-21T19:32:25Z

@hechth I could not locate the XCMS object containing the MS2 data. But as @cbroeckl suggested, I proceeded with MS1 only.
Thanks to both of you for solving all my doubts and making it easier for me.
Thank you :)

arpita-007 · 2023-03-29T06:04:49Z

Hi,
Can you please help me understand this error? I am getting it for one particular file only. I ran the same code on files from 3 different modes (RP pos, RP neg, HILIC pos) without problems, but I am seeing this error for my 4th file.

library(RAMClustR)

pheno <- RAMClustR::defineExperiment()
path2 <- file.path("E:/Placenta_final files/RAMClustR_clustering/PHCN_input_clustering_after corr.csv")
path2
[1] "E:/Placenta_final files/RAMClustR_clustering/PHCN_input_clustering_after corr.csv"
RC_PHCN <- ramclustR(ms = path2,
                     featdelim = "_",
                     st = 5,
                     ExpDes = pheno,
                     sampNameCol = 1)
organizing dataset
normalizing dataset
Calculating ramclustR similarity using 3 nblocks.
1 2 3 Error in ramclustObj[startv:stopv] <- column :
replacement has length zero

cbroeckl · 2023-03-29T14:25:32Z

@arpita-007 - can you send me the file you are using as input? cbroeckl at colostate dot edu.

arpita-007 · 2023-03-30T06:13:31Z

The PHCN file gives the error, while the PHCP file was processed successfully with the same code.

PHCN_input_clustering_after corr.csv
PHCP_input_clustering_after corr.csv

cbroeckl · 2023-03-30T14:37:55Z

@arpita-007 - I think this is a rare event coupled with imperfect code. The file that fails has exactly 2000 features, which happens to be the default blocksize setting. Try setting the option in the ramclustr function: blocksize = 1200. I suspect it will run fine. Let me know if this fixes it, please!
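
To illustrate why an exact multiple of the block size can be a problem, here is a minimal base-R sketch. It is not RAMClustR's actual code: `naive_blocks` and `safe_blocks` are hypothetical helpers, and the sketch assumes a loop whose block index runs from 0 to floor(n / blocksize), which produces an empty trailing block whenever n is an exact multiple of blocksize.

```r
# Hypothetical helper, not part of RAMClustR: list block boundaries when
# the block index runs from 0 to floor(n / blocksize).
naive_blocks <- function(n, blocksize) {
  k <- 0:floor(n / blocksize)
  data.frame(
    start = k * blocksize + 1,
    stop  = pmin((k + 1) * blocksize, n)
  )
}

# Guarded variant: ceiling() gives exactly as many blocks as are needed,
# so no block can start past the last feature.
safe_blocks <- function(n, blocksize) {
  k <- seq_len(ceiling(n / blocksize)) - 1
  data.frame(
    start = k * blocksize + 1,
    stop  = pmin((k + 1) * blocksize, n)
  )
}

naive_blocks(2000, 2000)  # second row has start > stop, i.e. an empty block
naive_blocks(2000, 1200)  # both blocks are non-empty
safe_blocks(2000, 2000)   # a single block covering features 1 to 2000
```

Under that assumption, changing blocksize to a value that does not divide the number of features evenly, as suggested above, avoids the empty block without touching the package code.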

arpita-007 · 2023-03-31T05:47:30Z

@cbroeckl Yes, it fixed the issue. Thank you.

hechth · 2023-04-14T11:48:32Z

@cbroeckl thanks for the proposed solution - we will implement a bugfix for that!

hechth · 2023-04-14T11:51:59Z

@arpita-007 and @cbroeckl I think we can maybe close this issue, as most things have been addressed and resolved?

I created issues for the things which still have to be taken care of.

Most other things are addressed in the open PR #39

arpita-007 · 2023-04-16T08:44:48Z

@hechth Yes sure. Thank you!

hechth self-assigned this Feb 23, 2023

zargham-ahmad mentioned this issue Mar 6, 2023

Added sample_names attribute and fixed test RECETOX/RAMClustR#33

Merged

hechth mentioned this issue Mar 10, 2023

Refactored ramclustR.R and added a sample_names attribute to the ramclustObj. #39

Merged

hechth mentioned this issue Apr 14, 2023

Fix edge case where number of features coincides with block size #40

Closed

cbroeckl closed this as completed in #39 May 5, 2023
