# **MetaboTandem**

# A. Data Pre-processing

# 1. Data import

This notebook imports LC-MS/MS spectra using the R package *xcms*.

## 1.1 Load required libraries

In [1]:
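# Load the HDF5 high-level shared library explicitly (path is specific to this system)
dyn.load("/opt/ohpc/pub/libs/gnu8/openmpi3/hdf5/1.10.5/lib/libhdf5_hl.so.100")
library(xcms)       # LC-MS/MS data import and processing
library(tidyverse)  # data wrangling utilities
# Helper functions used by this notebook
source('pre_processing_functions.R')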

Loading required package: BiocParallel

Loading required package: MSnbase

Loading required package: BiocGenerics


Attaching package: 'BiocGenerics'


The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs


The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, basename, cbind, colnames, dirname, do.call,
    duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
    lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
    pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
    tapply, union, unique, unsplit, which.max, which.min


Loading required package: Biobase

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.


Loading required package: mzR

Loading required package: Rcpp

Loading requir
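The `dyn.load()` call above uses an absolute path that exists only on the system where this notebook was run. A minimal sketch of a guard that skips the call when the library file is absent is shown below; `load_if_present()` is a hypothetical helper and not part of MetaboTandem or *xcms*.

```r
# Hypothetical helper: load a shared library only when the file is present.
load_if_present <- function(lib_path) {
  if (!file.exists(lib_path)) {
    message("Shared library not found, skipping: ", lib_path)
    return(invisible(FALSE))
  }
  dyn.load(lib_path)
  invisible(TRUE)
}

# Path copied from the cell above; it is specific to that system.
hdf5_hl <- "/opt/ohpc/pub/libs/gnu8/openmpi3/hdf5/1.10.5/lib/libhdf5_hl.so.100"
loaded <- load_if_present(hdf5_hl)
```

On systems where *xcms* and its dependencies load without this step, the guard simply reports that the file was skipped.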

## 1.2 User inputs required <====

Enter below the required data according to the description.
- **data_dir :** Path to the directory containing the data files in `.mzML` or `.mzXML` format. *Important:* Make sure that the spectra data is in **centroid** mode.
- **metadata_file :** Path to the metadata file containing the path to each spectra file and other sample information.

In [2]:
data_dir <- '/xdisk/tfaily/cayalaortiz/arid_ecosystem/data/RP/RP_lab_nat/'
metadata_file <- '/xdisk/tfaily/cayalaortiz/arid_ecosystem/data/metadata_rp.csv'
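Before importing, it can help to confirm that both inputs exist and that the metadata table points at real files. The sketch below assumes the metadata is a CSV file with a `Filename` column, as the import output in this notebook suggests; `check_inputs()` and the demo file names are hypothetical and only illustrate the idea with temporary files.

```r
# Hypothetical helper: confirm that the metadata table lists files that exist.
check_inputs <- function(data_dir, metadata_file) {
  stopifnot(dir.exists(data_dir), file.exists(metadata_file))
  metadata <- read.csv(metadata_file, stringsAsFactors = FALSE)

  if (!"Filename" %in% colnames(metadata)) {
    stop("Metadata must contain a 'Filename' column")
  }

  # Assumption: `Filename` holds a full path or a file name inside data_dir.
  found <- file.exists(metadata$Filename) |
    file.exists(file.path(data_dir, basename(metadata$Filename)))
  if (!all(found)) {
    warning("Files listed in metadata but not found: ",
            paste(metadata$Filename[!found], collapse = ", "))
  }
  invisible(metadata[found, , drop = FALSE])
}

# Toy demonstration with empty temporary files instead of real spectra.
demo_dir <- file.path(tempdir(), "metabotandem_demo")
dir.create(demo_dir, showWarnings = FALSE)
demo_files <- file.path(demo_dir, c("A.mzML", "B.mzML"))
file.create(demo_files)
demo_meta <- file.path(demo_dir, "metadata.csv")
write.csv(
  data.frame(Filename = demo_files,
             SampleID = c("S1", "S2"),
             treatment = c("ctrl", "ctrl"),
             time = c("T0", "T1")),
  demo_meta, row.names = FALSE
)
checked <- check_inputs(demo_dir, demo_meta)
```

With real data the call would be `check_inputs(data_dir, metadata_file)`, using the two variables defined in the cell above.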

## 1.3 Import data
Importing spectra data to be used in the MetaboTandem pipeline.

In [3]:
data <- load_spectra_data(data_dir, metadata_file, format = 'mzML')

Rows: 36 Columns: 4
-- Column specification --------------------------------------------------------
Delimiter: ","
chr (4): Filename, SampleID, treatment, time

i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Reading 15443 spectra from file RP_S37.mzML

Reading 15442 spectra from file RP_S38.mzML

Reading 15437 spectra from file RP_S39.mzML

Reading 15454 spectra from file RP_S40.mzML

Reading 15453 spectra from file RP_S41.mzML

Reading 15446 spectra from file RP_S42.mzML

Reading 15454 spectra from file RP_S43.mzML

Reading 15443 spectra from file RP_S44.mzML

Reading 15452 spectra from file RP_S45.mzML

Reading 15466 spectra from file RP_S46.mzML

Reading 15472 spectra from file RP_S47.mzML

Reading 15463 spectra from file RP_S48.mzML

Reading 15468 spectra from file RP_S49
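`load_spectra_data()` is defined in `pre_processing_functions.R`, which is not shown in this notebook. The following is a minimal sketch of what such a helper might look like, not the actual implementation. It assumes that spectra are read with `MSnbase::readMSData()` in on-disk mode (consistent with the `OnDiskMSnExp` object reported further down) and that metadata rows are matched to files by base file name. `list_spectra_files()` and `load_spectra_sketch()` are hypothetical names.

```r
# Hypothetical helper: list spectra files with the requested extension.
list_spectra_files <- function(data_dir, format = "mzML") {
  pattern <- paste0("\\.", format, "$")
  sort(list.files(data_dir, pattern = pattern, full.names = TRUE))
}

# Hypothetical sketch; the real load_spectra_data() may differ.
# Requires the MSnbase and Biobase packages when it is called.
load_spectra_sketch <- function(data_dir, metadata_file, format = "mzML") {
  files <- list_spectra_files(data_dir, format)
  metadata <- read.csv(metadata_file, stringsAsFactors = FALSE)

  # Assumption: keep only files that appear in the metadata table,
  # matching on base file name, and order the metadata like the files.
  keep <- basename(files) %in% basename(metadata$Filename)
  files <- files[keep]
  metadata <- metadata[match(basename(files), basename(metadata$Filename)), ,
                       drop = FALSE]

  # mode = "onDisk" keeps raw spectra on disk and loads them on demand.
  spectra <- MSnbase::readMSData(
    files,
    pdata = Biobase::AnnotatedDataFrame(metadata),
    mode = "onDisk"
  )
  list(spectra_data = spectra, metadata = metadata)
}
```

Under these assumptions, `load_spectra_sketch(data_dir, metadata_file, format = 'mzML')` would return a list with `spectra_data` and `metadata` elements, the same two elements shown by `show(data)` in this notebook.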

Check that the data was loaded correctly.

In [4]:
show(data)

$spectra_data
MSn experiment data ("OnDiskMSnExp")
Object size in memory: 155.92 Mb
- - - Spectra data - - -
 MS level(s): 1 2 
 Number of spectra: 556162 
 MSn retention times: 0:00 - 21:00 minutes
- - - Processing information - - -
Data loaded [Thu Apr 21 09:13:20 2022] 
 MSnbase version: 2.20.4 
- - - Meta data  - - -
phenoData
  rowNames: 1 2 ... 3 (36 total)
  varLabels: Filename SampleID treatment time
  varMetadata: labelDescription
Loaded from:
  [1] RP_S37.mzML...  [36] RP_S72.mzML
  Use 'fileNames(.)' to see all files.
protocolData: none
featureData
  featureNames: F01.S00001 F01.S00002 ... F36.S15430 (556162 total)
  fvarLabels: fileIdx spIdx ... spectrum (35 total)
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'

$metadata
# A tibble: 36 x 4
   Filename                                             SampleID treatment time 
   <chr>                                                <chr>    <ch

### Checking if data is centroided

If the data is not centroided, `centroid_check()` transforms it when called with `transform = TRUE`.

In [5]:
data$spectra_centroid <- centroid_check(data$spectra_data, transform = TRUE)

[1] "Data is not centroided"
[1] "Transforming data"
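`centroid_check()` also comes from `pre_processing_functions.R`, so its internals are not visible here. A rough sketch of one way such a check could work is given below, assuming it relies on `MSnbase::isCentroided()` to flag each spectrum and on `MSnbase::pickPeaks()` to centroid profile-mode data. Both function names in the sketch are hypothetical, and the actual helper may apply smoothing or other steps first.

```r
# Pure helper: decide from per-spectrum flags whether centroiding is needed.
# NA flags (spectra whose mode could not be determined) are ignored.
needs_centroiding <- function(flags) {
  !all(flags, na.rm = TRUE)
}

# Hypothetical sketch; the real centroid_check() may differ.
# Requires the MSnbase package when it is called.
centroid_check_sketch <- function(spectra, transform = TRUE) {
  flags <- MSnbase::isCentroided(spectra)  # one logical value per spectrum
  if (!needs_centroiding(flags)) {
    print("Data is centroided")
    return(spectra)
  }
  print("Data is not centroided")
  if (!transform) {
    return(spectra)
  }
  print("Transforming data")
  # pickPeaks() converts profile-mode spectra into centroided peaks.
  MSnbase::pickPeaks(spectra)
}
```

Separating the decision (`needs_centroiding()`) from the MSnbase calls keeps the logic easy to check on plain logical vectors without loading any spectra.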


## 1.4 Exporting data
Save the data as an R object (`.RData`) so that it can be used directly in the following analyses.

In [6]:
save(data, file = file.path(data_dir, 'imported_data.RData'))
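A later notebook can restore the saved object with `load()`, which recreates each object under the name it had when it was saved. The round trip below uses a toy list and a temporary file rather than the real `imported_data.RData`, so the file name and contents are placeholders.

```r
# Toy stand-in for the imported object; the real `data` holds spectra.
demo_file <- file.path(tempdir(), "imported_data_demo.RData")
data <- list(note = "toy stand-in for the imported object")
save(data, file = demo_file)
rm(data)

# load() restores objects under their original names and returns those names.
restored <- load(demo_file)
print(restored)
```

After `load()`, the object is available again as `data`, so downstream cells can refer to `data$spectra_data` and `data$metadata` as in this notebook.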