# **MetaboTandem**

# 1. Data import

This notebook imports LC-MS/MS spectra using the R package *xcms*.

## 1.1 Load required libraries

In [8]:
library(xcms)
library(tidyverse)

-- Attaching packages ------------------------------------------------------------------------------- tidyverse 1.3.1 --

v ggplot2 3.3.5     v purrr   0.3.4
v tibble  3.1.6     v dplyr   1.0.7
v tidyr   1.1.4     v stringr 1.4.0
v readr   2.1.0     v forcats 0.5.1

-- Conflicts ---------------------------------------------------------------------------------- tidyverse_conflicts() --
x dplyr::collect()    masks xcms::collect()
x dplyr::combine()    masks MSnbase::combine(), Biobase::combine(), BiocGenerics::combine()
x tidyr::expand()     masks S4Vectors::expand()
x dplyr::filter()     masks stats::filter()
x dplyr

## 1.2 Paths to the data (User inputs required) <====

Enter below the required data according to the description.

### Path to the directory containing the LC-MS/MS spectra
Please put all spectra to be analyzed in the `home/jovyan/work/data` directory. All spectra must be in **.mzXML** format. If the spectra are stored in a different directory, enter its path below.

***Important:*** Make sure that the spectra data is in **centroid** mode.

In [3]:
data_dir <- '../../data/'
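
Before importing, it can help to confirm that the directory exists and actually contains `.mzXML` files. The following is a minimal sketch in base R; the helper name `list_mzxml_files` is hypothetical and is not part of xcms or MetaboTandem. It relies on the fact that the `pattern` argument of `list.files()` is a regular expression, so the extension is matched with an escaped dot and an end-of-string anchor.

```r
# Hypothetical helper (not part of xcms or MetaboTandem):
# list the .mzXML files in a directory and fail early if none are found.
list_mzxml_files <- function(dir) {
  if (!dir.exists(dir)) {
    stop("Directory not found: ", dir)
  }
  # `pattern` is a regular expression, not a shell glob
  files <- list.files(dir, pattern = "\\.mzXML$", full.names = TRUE)
  if (length(files) == 0) {
    stop("No .mzXML files found in: ", dir)
  }
  files
}
```

It could be called as `list_mzxml_files(data_dir)` right after setting `data_dir`.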

### Metadata file
Please enter the path of the file with information about the samples.
The first column of the metadata must have the paths to the files for each of the samples.

In [7]:
metadata_file <- '../../data/metadata.csv'
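
Because the sample information is attached to the spectra by position, the rows of the metadata should be in the same order as the files passed to the import step. The sketch below shows one way to line them up, assuming (as described above) that the first metadata column holds the file paths or file names. The helper name `match_files_to_metadata` is hypothetical, and matching on base names only is an assumption of this sketch.

```r
# Hypothetical helper: return the spectra files in the order of the metadata rows.
# Assumes the first metadata column holds file paths or file names;
# files are matched on their base names only.
match_files_to_metadata <- function(metadata, files) {
  meta_names <- basename(as.character(metadata[[1]]))
  idx <- match(meta_names, basename(files))
  if (anyNA(idx)) {
    stop("Listed in metadata but not found: ",
         paste(meta_names[is.na(idx)], collapse = ", "))
  }
  # One file per metadata row, in metadata order
  files[idx]
}
```

Once `ms_files` has been listed in section 1.3, it could be reordered with `ms_files <- match_files_to_metadata(metadata, ms_files)` before calling `readMSData()`.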

## 1.3 Import data
Importing spectra data to be used in the MetaboTandem pipeline.

In [23]:
# Open metadata
metadata <- read_csv(metadata_file, show_col_types = FALSE)

# Get list of mass spectra files
# `pattern` is a regular expression, so escape the dot and anchor the extension
ms_files <- list.files(data_dir, full.names = TRUE, pattern = '\\.mzXML$')

# Read data as an `OnDiskMSnExp` object (MSnbase class, loaded with xcms)
data <- readMSData(ms_files, 
                   pdata = new('NAnnotatedDataFrame', metadata),
                   mode = 'onDisk', 
                   verbose = TRUE)

Reading 2522 spectra from file Control-T1.mzXML

Reading 2495 spectra from file Control-TF.mzXML

Reading 2557 spectra from file Inoculum-T1.mzXML

Reading 2538 spectra from file Inoculum-TF.mzXML



Checking that the data loaded correctly

In [25]:
show(data)

MSn experiment data ("OnDiskMSnExp")
Object size in memory: 3.05 Mb
- - - Spectra data - - -
 MS level(s): 1 2 
 Number of spectra: 10112 
 MSn retention times: 0:00 - 60:03 minutes
- - - Processing information - - -
Data loaded [Mon Feb 14 15:05:33 2022] 
 MSnbase version: 2.18.0 
- - - Meta data  - - -
phenoData
  rowNames: 1 2 3 4
  varLabels: FileName SampleID treatment time
  varMetadata: labelDescription
Loaded from:
  [1] Control-T1.mzXML...  [4] Inoculum-TF.mzXML
  Use 'fileNames(.)' to see all files.
protocolData: none
featureData
  featureNames: F1.S0001 F1.S0002 ... F4.S2538 (10112 total)
  fvarLabels: fileIdx spIdx ... spectrum (35 total)
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'


Checking whether the data are centroided

In [20]:
unique(fData(data)$centroided)
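
The result of `unique()` has to be read by eye and may contain `NA` when a file does not record its acquisition mode. As a possible complement, the sketch below turns the flags into a single label. The helper name `centroid_status` is hypothetical, and the rule of ignoring `NA` values unless every value is `NA` is an assumption of this sketch.

```r
# Hypothetical helper: summarise per-spectrum `centroided` flags as one label.
# `flags` is a logical vector such as fData(data)$centroided.
# NA values are ignored unless every value is NA.
centroid_status <- function(flags) {
  if (length(flags) == 0 || all(is.na(flags))) {
    return("unknown")
  }
  if (all(flags, na.rm = TRUE)) {
    return("centroided")
  }
  if (!any(flags, na.rm = TRUE)) {
    return("profile")
  }
  "mixed"
}
```

With the imported object, `centroid_status(fData(data)$centroided)` would be expected to give `"centroided"` for data that meet the requirement stated in section 1.2.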

## 1.4 Exporting data
Saving the data as an R object (`.RData`) so that it can be used directly in the following analyses

In [29]:
save(data, metadata, file = '../../data/imported_data.RData')
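
In a later notebook, the saved file can be read back with `load()`. The sketch below loads it into a separate environment and checks that the expected objects are present, so that nothing in the current workspace is overwritten silently. The helper name `load_imported` is hypothetical. Note that an object read in `onDisk` mode keeps references to the original spectra files, so those files are assumed to remain at the same paths.

```r
# Hypothetical helper: load an .RData file into its own environment and
# return the expected objects as a named list.
load_imported <- function(file, expected = c("data", "metadata")) {
  env <- new.env()
  loaded <- load(file, envir = env)  # returns the names of the loaded objects
  missing_objs <- setdiff(expected, loaded)
  if (length(missing_objs) > 0) {
    stop("Objects not found in ", file, ": ",
         paste(missing_objs, collapse = ", "))
  }
  mget(expected, envir = env)
}
```

For example, `imported <- load_imported('../../data/imported_data.RData')` would give access to `imported$data` and `imported$metadata`.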