---
title: "Creating new `ChromBackend` classes for Chromatograms"
output:
  BiocStyle::html_document:
    toc_float: true
vignette: >
  %\VignetteIndexEntry{Creating new `ChromBackend` class for Chromatograms}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
  %\VignettePackage{Chromatograms}
  %\VignetteDepends{Chromatograms,BiocStyle,S4Vectors,IRanges}
---
```{r style, echo = FALSE, results = 'asis', message=FALSE}
BiocStyle::markdown()
```
**Package**: `r Biocpkg("Chromatograms")`<br />
**Authors**: `r packageDescription("Chromatograms")[["Author"]] `<br />
**Compiled**: `r date()`
```{r, echo = FALSE, message = FALSE}
library(Chromatograms)
library(BiocStyle)
```

# Introduction

Similar to the `r Biocpkg("Spectra")` package, the `r Biocpkg("Chromatograms")`
package separates the user-facing functionality to process and analyze
chromatographic mass spectrometry (MS) data from the code for storage and
*representation* of the data. The latter functionality is provided by
implementations of the `ChromBackend` class, further on called *backends*. This
vignette describes the `ChromBackend` class and illustrates, using a simple
example, how a backend extending this class could be implemented.

Contributions to this vignette (content or correction of typos) or requests for
additional details and information are very welcome, ideally *via* pull
requests or *issues* on the package's [GitHub repository](https://github.com/RforMassSpectrometry/Chromatograms).

# What is a `ChromBackend`?

The purpose of a backend class extending the virtual `ChromBackend` is to
provide the chromatographic MS data to the `Chromatograms` object, which is used
by the user to interact with and analyze the data. The `ChromBackend` defines
the API that new backends need to provide so that they can be used with
`Chromatograms`. This API defines a set of methods to access the data. For many
functions default implementations exist, and a dedicated implementation for a
new backend is only needed if necessary (e.g. if the data is stored in a way
that would benefit from a different access strategy). In addition, a core set
of variables (data fields), the so-called *core* chromatogram variables, is
defined to describe the chromatographic data. Each backend needs to provide
these, but can also define additional data fields. Before implementing a new
backend it is strongly recommended to carefully read the following *Conventions
and definitions* section.

## Conventions and definitions

General conventions for chromatographic MS data of a `Chromatograms` are:

- One `Chromatograms` object is designed to contain multiple chromatographic
  data (not data from a single chromatogram).
- Retention time values within each chromatogram are expected to be sorted
  increasingly.
- Missing values (`NA`) for retention time values are not supported.
- Properties (data fields) of a chromatogram are called *chromatogram
  variables*. While backends can define their own properties, a minimum required
  set of chromatogram variables **must** be provided by each backend (even if
  their values are empty). These *core chromatogram variables* are listed (along
  with their expected data type) by the `coreChromVariables()` function.
- `dataStorage` and `dataOrigin` are two special variables that define
  for each chromatogram where the data is (currently) stored and from where the
  data derived, respectively. Both are expected to be of type `character`.
  Missing values for `dataStorage` are not allowed.
- `ChromBackend` implementations can also represent purely *read-only* data
  resources. In this case only data accessor methods need to be implemented but
  not data replacement methods (i.e. `<-` methods that would allow to add or set
  variables). For read-only backends, the `isReadOnly()` method should return
  `TRUE`. Note that backends for purely read-only resources could also
  implement a *caching* mechanism to (temporarily) store changes to the data
  locally within the object (and hence in memory). See information on the
  `MsBackendCached` in the `r Biocpkg("Spectra")` package for more details.

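
The retention time and `dataStorage` conventions above are simple to check
programmatically. The chunk below is a minimal base R sketch of such checks; the
helper `check_chrom_conventions()` is hypothetical and not part of the
*Chromatograms* package. It assumes the retention times are stored as a `list`
of `numeric` vectors, one per chromatogram, and accepts ties (non-decreasing
values) as sorted.

```{r}
## Hypothetical helper (not part of Chromatograms) checking the conventions
## for retention times and dataStorage described above.
check_chrom_conventions <- function(rtime, dataStorage) {
    if (anyNA(dataStorage))
        stop("missing values in 'dataStorage' are not allowed")
    if (any(vapply(rtime, anyNA, logical(1))))
        stop("missing retention time values are not supported")
    ## is.unsorted() with the default strictly = FALSE allows ties
    if (any(vapply(rtime, is.unsorted, logical(1))))
        stop("retention times have to be sorted increasingly")
    invisible(TRUE)
}

conv_rts <- list(c(12.4, 12.8, 13.2), c(45.1, 46.2))
check_chrom_conventions(conv_rts, dataStorage = c("<memory>", "<memory>"))
## Unsorted retention times in the second chromatogram fail the check
conv_res <- tryCatch(
    check_chrom_conventions(list(c(1, 2), c(5, 3)), c("a", "a")),
    error = conditionMessage)
conv_res
```
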
## Notes on parallel and chunk-wise processing

For parallel processing, `Chromatograms` splits the backend based on a defined
`factor` and processes each chunk in parallel (or *in serial* if a `SerialParam`
is used). The splitting `factor` can be defined for `Chromatograms` by setting
the parameter `processingChunkSize`. Alternatively, through the
`backendParallelFactor()` method the backend can also *suggest* a `factor` that
should/could be used for splitting and parallel processing. The default
implementation of `backendParallelFactor()` returns an empty `factor`
(`factor()`), hence not suggesting any preferred splitting.

Besides enabling parallel processing, this chunk-wise processing can also reduce
the memory demand of operations for on-disk backends (i.e., backends that don't
keep all of the data in memory), because only the peak data of the current chunk
needs to be realized in memory.

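
The split-process-combine pattern described above can be illustrated with base
R alone. The chunk below is a sketch of the idea, not the code used internally
by `Chromatograms`: the intensity values of three chromatograms are split by a
hypothetical factor `f` (e.g. one level per data file), the maximum intensity is
computed chunk by chunk, and `unsplit()` restores the original order of the
chromatograms. In a parallel setup `lapply()` would be replaced by a parallel
apply function such as `BiocParallel::bplapply()`.

```{r}
## Intensity values of three chromatograms.
chunk_ints <- list(c(123.3, 153.6, 2354.3, 243.4),
                   c(100, 80.1),
                   c(12.3, 135.2, 100))
## Hypothetical splitting factor, e.g. one level per original data file.
f <- factor(c("file_a", "file_a", "file_b"))
## Process each chunk separately: only the data of the current chunk is
## needed, here to calculate the maximum intensity per chromatogram.
chunk_res <- lapply(split(chunk_ints, f), function(chunk)
    vapply(chunk, max, numeric(1)))
## Combine the per-chunk results in the original order of chromatograms.
max_int <- unsplit(chunk_res, f)
max_int
```

The result is identical to processing all chromatograms at once; only the amount
of data handled per step changes.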
# API

The `ChromBackend` class defines core methods that have to be implemented by an
MS *backend*, as well as *optional* methods for which a default implementation
is already available. These functions are described in sections *Required
methods* and *Optional methods*, respectively.

To create a new backend, a class extending the virtual `ChromBackend` needs to
be implemented. In the example below we thus create a simple class with a
`data.frame` for general properties (*chromatogram variables*) and two slots for
the retention time and intensity values, representing the actual chromatographic
MS data. We store these values as `list`s, each list element representing the
values of one chromatogram, since the number of values (*peaks*) can differ
between chromatograms. We also define a simple constructor function that returns
an empty instance of our new class.

```{r, message = FALSE}
library(Chromatograms)

#' Definition of the backend class extending ChromBackend
setClass("ChromBackendTest",
         contains = "ChromBackend",
         slots = c(
             chromVars = "data.frame",
             rtime = "list",
             intensity = "list"
         ),
         ## The prototype has to use the slot names defined above
         prototype = prototype(
             chromVars = data.frame(),
             rtime = list(),
             intensity = list()
         ))

#' Simple constructor function
ChromBackendTest <- function() {
    new("ChromBackendTest")
}
```

The three slots `@chromVars`, `@rtime` and `@intensity` will be used to store
our MS data: each row in `@chromVars` will contain data for one chromatogram,
with the columns being the different *chromatogram variables* (i.e. additional
properties of a chromatogram such as its m/z value or MS level), and each
element in `@rtime` and `@intensity` a `numeric` vector with the retention times
and intensity values, thus representing the *peaks* data of the respective
chromatogram. This is only one of many possible ways to represent
chromatographic data.

We should ideally also add a basic validity function that ensures the data is
valid. The function below simply checks that the number of rows of the
`@chromVars` slot matches the length of the `@rtime` and `@intensity` slots.

```{r, message = FALSE}
#' Basic validation function
setValidity("ChromBackendTest", function(object) {
    if (length(object@rtime) != length(object@intensity) ||
        length(object@rtime) != nrow(object@chromVars))
        ## return() accepts a single value, hence the message is pasted
        return(paste0("length of 'rtime' and 'intensity' has to match the ",
                      "number of rows of 'chromVars'"))
    NULL
})
```

We can now create an instance of our new class with the `ChromBackendTest()`
function.

```{r}
#' Create an empty instance of ChromBackendTest
be <- ChromBackendTest()
be
```

A `show()` method provides a more convenient display of general information
about our object. Below we add an implementation of the `show()` method.

```{r}
#' implementation of show for ChromBackendTest
setMethod("show", "ChromBackendTest", function(object) {
    cd <- object@chromVars
    cat(class(object), "with", nrow(cd), "chromatograms\n")
})
be
```

## Required methods

Methods listed in this section **must** be implemented for a new class extending
`ChromBackend`. Methods should ideally also be implemented in the order they are
listed here. Also, it is strongly advised to write dedicated unit tests for each
newly implemented method or function already **during** the development.

### `dataStorage()`

The `dataStorage` chromatogram variable provides information on how or where the
data is stored. The `dataStorage()` method should therefore return a `character`
vector of length equal to the number of chromatograms that are represented by
the object. The values for `dataStorage` can be any character value, except
`NA`. For our example backend we define a simple `dataStorage()` method that
returns the column `"dataStorage"` from the `@chromVars` slot (as a
`character`).

```{r}
#' dataStorage method to provide information *where* data is stored
setMethod("dataStorage", "ChromBackendTest", function(object) {
    as.character(object@chromVars$dataStorage)
})
```

Calling `dataStorage()` on our example backend will thus return an empty
`character` (since the object created above does not contain any data).

```{r}
dataStorage(be)
```

### `length()`

`length()` is expected to return an `integer` of length 1 with the total number
of chromatograms that are represented by the backend. For our example backend we
simply return the number of rows of the `data.frame` stored in the `@chromVars`
slot.

```{r}
#' length to provide information on the number of chromatograms
setMethod("length", "ChromBackendTest", function(x) {
    nrow(x@chromVars)
})
length(be)
```

### `backendInitialize()`

The `backendInitialize()` method is expected to be called after creating an
instance of the backend class and should prepare (initialize) the backend with
data. This method can take any parameters needed by the backend to get
loaded/initialized with data (which can be file names from which to load the
data, a database connection or object(s) containing the data). During
`backendInitialize()` it is also suggested to set the special chromatogram
variables `dataStorage` and `dataOrigin`.

Below we define a `backendInitialize()` method that takes as arguments a
`data.frame` with chromatogram variables and two `list`s with the retention time
and intensity values for each chromatogram.

```{r}
#' backendInitialize method to fill the backend with data.
setMethod(
    "backendInitialize", "ChromBackendTest",
    function(object, chromVars, rtime, intensity) {
        if (!is.data.frame(chromVars))
            stop("'chromVars' needs to be a 'data.frame' with the general ",
                 "chromatogram variables")
        ## Defining dataStorage and dataOrigin, if not available
        if (is.null(chromVars$dataStorage))
            chromVars$dataStorage <- "<memory>"
        if (is.null(chromVars$dataOrigin))
            chromVars$dataOrigin <- "<user provided>"
        object@chromVars <- chromVars
        object@rtime <- rtime
        object@intensity <- intensity
        validObject(object)
        object
    })
```

In addition to adding the data to the object, the function also defines the
`dataStorage` and `dataOrigin` chromatogram variables. The purpose of these two
variables is to provide some information on where the data is currently stored
(*in memory* as in our example) and where the data originates from.

We can now create an instance of our backend class and fill it with data. We
thus first define our MS data and pass it to the `backendInitialize()` method.

```{r}
#' A data.frame with chromatogram variables.
cvars <- data.frame(msLevel = c(1L, 1L, 1L),
                    mz = c(112.2, 123.3, 134.4))
#' retention time values for each chromatogram.
rts <- list(c(12.4, 12.8, 13.2, 14.6),
            c(45.1, 46.2),
            c(64.4, 64.8, 65.2))
#' intensity values for each chromatogram.
ints <- list(c(123.3, 153.6, 2354.3, 243.4),
             c(100, 80.1),
             c(12.3, 135.2, 100))
#' Create and initialize the backend
be <- backendInitialize(ChromBackendTest(),
                        chromVars = cvars, rtime = rts, intensity = ints)
be
```

While this method works and is compliant with the `ChromBackend` API (because
there is no requirement on the input parameters for the `backendInitialize()`
method), it would be good practice for backends to support an additional
parameter `data` that would allow passing the *complete* MS data (including
retention time and intensity values) to the function as a `DataFrame`. This
would simplify the implementation of some replacement methods and would in
addition also allow to change the backend of a `Chromatograms` to our new
backend using the `setBackend()` function. Also, it is highly suggested to check
the validity of the input data within the initialize method. The advantage of
performing these validity checks in `backendInitialize()` over adding them with
`setValidity()` is that potentially computationally expensive operations/checks
would only be performed once, instead of each time values within the object are
changed (e.g. by subsetting or similar), which would be the case with validation
functionality registered with `setValidity()`.

We thus re-implement the `backendInitialize()` method, supporting also the
`data` parameter mentioned above, and add additional validity checks. These
checks verify that only numeric values are provided with `rtime` and
`intensity`, and that the number of retention time and intensity values matches
for each chromatogram. We also use the `validChromData()` function that checks
that provided core chromatogram variables have the correct data type.

```{r}
#' Reimplementation of backendInitialize with a `data` parameter and
#' additional input validation
setMethod(
    "backendInitialize", "ChromBackendTest",
    function(object, chromVars, rtime, intensity, data) {
        ## Extract relevant information from a parameter `data` if provided;
        ## drop = FALSE ensures a data frame is kept even for a single column
        if (!missing(data)) {
            chromVars <- as.data.frame(
                data[, !colnames(data) %in% c("rtime", "intensity"),
                     drop = FALSE])
            if (any(colnames(data) == "rtime"))
                rtime <- data$rtime
            if (any(colnames(data) == "intensity"))
                intensity <- data$intensity
        }
        ## Check that provided variables have the correct data type
        validChromData(chromVars)
        n <- nrow(chromVars)
        ## Validate rtime and intensity
        if (missing(rtime))
            rtime <- vector("list", n)
        if (missing(intensity))
            intensity <- vector("list", n)
        if (length(rtime) != length(intensity) || length(rtime) != n)
            stop("lengths of 'rtime' and 'intensity' need to match the ",
                 "number of chromatograms (i.e., nrow of 'chromVars')")
        if (any(lengths(rtime) != lengths(intensity)))
            stop("the number of data values in 'rtime' and 'intensity' have ",
                 "to match")
        if (!all(vapply(rtime, is.numeric, logical(1))))
            stop("'rtime' has to be a list of numeric values")
        if (!all(vapply(intensity, is.numeric, logical(1))))
            stop("'intensity' has to be a list of numeric values")
        ## If rtime or intensity is of type NumericList convert to list
        if (inherits(rtime, "NumericList"))
            rtime <- as.list(rtime)
        if (inherits(intensity, "NumericList"))
            intensity <- as.list(intensity)
        ## Setting dataStorage and dataOrigin
        chromVars$dataStorage <- rep("<memory>", n)
        if (is.null(chromVars$dataOrigin))
            chromVars$dataOrigin <- rep("<user provided>", n)
        ## Fill object with data
        object@chromVars <- as.data.frame(chromVars)
        object@rtime <- rtime
        object@intensity <- intensity
        validObject(object)
        object
    })
```

This extended `backendInitialize()` implementation now also ensures data
validity and integrity. Below we use this function again to create our backend
instance.

```{r}
#' Create and initialize the backend
be <- backendInitialize(ChromBackendTest(),
                        chromVars = cvars, rtime = rts,
                        intensity = ints)
be
```

The `backendInitialize()` method that we implemented for our backend class
expects the user to provide the full MS data. Alternatively, it would also be
possible to implement a method that takes data file names as input from which
the function can then import the data. The purpose of the `backendInitialize()`
method is to *initialize* and prepare the data in a way that it can be accessed
by a `Chromatograms` object. Whether the data is actually loaded into memory or
simply referenced and loaded upon request does not matter, as long as the
backend is able to provide the data through its accessor methods when requested
by the `Chromatograms` object.

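
As a sketch of such a file-based approach, the chunk below assumes a
hypothetical plain-text format with one row per data point and the columns
`chrom_id`, `rtime` and `intensity`. The hypothetical `read_chrom_file()`
function splits the values per chromatogram into the `list`s used by our example
backend and records the file as `dataOrigin`. A file-based `backendInitialize()`
method could call a function like this and fill the slots with its result; both
the format and the function are illustrations, not part of *Chromatograms*.

```{r}
## Hypothetical reader for a simple CSV format (one row per data point).
read_chrom_file <- function(file) {
    x <- read.csv(file)
    ## Keep the chromatograms in the order of their first appearance.
    id <- factor(x$chrom_id, levels = unique(x$chrom_id))
    list(chromVars = data.frame(chrom_id = levels(id),
                                dataOrigin = normalizePath(file)),
         rtime = unname(split(x$rtime, id)),
         intensity = unname(split(x$intensity, id)))
}

## Write a small example file to a temporary location and import it.
fl <- tempfile(fileext = ".csv")
write.csv(data.frame(chrom_id = c("c1", "c1", "c2"),
                     rtime = c(12.4, 12.8, 45.1),
                     intensity = c(123.3, 153.6, 100)),
          fl, row.names = FALSE)
imp <- read_chrom_file(fl)
unlink(fl)
imp$rtime
```
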
### `chromVariables()`

The `chromVariables()` method should return a `character` vector with the names
of all available chromatogram variables of the backend. While a backend class
should support defining and providing its own variables, each `ChromBackend`
class **must** also provide the *core chromatogram variables* (in the correct
data type). These can be listed with the `coreChromVariables()` function:

```{r}
#' List core chromatogram variables along with data types.
coreChromVariables()
```

A typical `chromVariables()` method for a `ChromBackend` class will thus be
implemented similarly to the one for our `ChromBackendTest` test backend: it
will return the union of the core chromatogram variables and the names of all
available chromatogram variables within the backend object.

```{r}
#' Accessor for available chromatogram variables
setMethod("chromVariables", "ChromBackendTest", function(object) {
    union(names(coreChromVariables()), colnames(object@chromVars))
})
chromVariables(be)
```

### `chromData()`

The `chromData()` method should return the **full** chromatogram data within a
backend as a `DataFrame` object (defined in the `r Biocpkg("S4Vectors")`
package). A parameter `columns` should allow to define the names of the
variables that should be returned. Each row in this data frame should represent
one chromatogram, each column a chromatogram variable. Columns `"rtime"` and
`"intensity"` (if requested) have to contain each a `NumericList` with the
retention time and intensity values of the chromatograms. The `DataFrame`
**must** provide values (even if they are `NA`) for **all** requested
chromatogram variables of the backend (**including** the core chromatogram
variables). The `fillCoreChromVariables()` function from the *Chromatograms*
package can be used to *complete* (fill) a provided `data.frame` with possibly
missing core chromatogram variables (columns):

```{r}
#' Get the data.frame with the available chrom variables
be@chromVars
#' Complete this data.frame with missing core variables
fillCoreChromVariables(be@chromVars)
```

We can thus use this function to add possibly missing core chromatogram
variables in the `chromData()` implementation for our backend:

```{r}
#' function to extract the full chrom data; we would need to import the
#' `DataFrame()` function from the S4Vectors package and the `NumericList`
#' from the IRanges package.
setMethod(
    "chromData", "ChromBackendTest",
    function(object, columns = chromVariables(object)) {
        if (!all(columns %in% chromVariables(object)))
            stop("Some of the requested variables are not available")
        res <- S4Vectors::DataFrame(object@chromVars)
        ## Add rtime and intensity values to the result; would need to
        ## import the `NumericList()` function from the IRanges package
        res$rtime <- IRanges::NumericList(object@rtime, compress = FALSE)
        res$intensity <- IRanges::NumericList(
            object@intensity, compress = FALSE)
        ## Fill with possibly missing core variables
        res <- fillCoreChromVariables(res)
        res[, columns, drop = FALSE]
    })
```

We can now use `chromData()` to either extract the full chromatogram data from
the backend, or only the data for selected variables.

```{r}
#' Extract the full data
chromData(be)
#' Selected variables
chromData(be, c("rtime", "mz", "msLevel"))
#' Only missing core chromatogram variables
chromData(be, c("collisionEnergy", "mzMin"))
```
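
Conceptually, completing a `data.frame` with missing core variables means adding
each absent column as a vector of missing values of the expected type. The base
R chunk below illustrates this idea with a hypothetical `fill_missing_columns()`
function and a small, hypothetical subset of core variables. It is a sketch for
illustration only and not the implementation of `fillCoreChromVariables()`.

```{r}
## Hypothetical subset of core variables with their expected data types.
core_types <- c(msLevel = "integer", mz = "numeric", dataOrigin = "character")

## Add each missing variable as a column of NA values of the expected type.
fill_missing_columns <- function(x, types) {
    for (nm in setdiff(names(types), colnames(x))) {
        ## Create a vector of the expected type, then set all values to NA
        v <- vector(types[[nm]], nrow(x))
        v[] <- NA
        x[[nm]] <- v
    }
    x
}

filled <- fill_missing_columns(data.frame(mz = c(112.2, 123.3)), core_types)
str(filled)
```
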

### `peaksData()`

The `peaksData()` method extracts the chromatographic data (*peaks*), i.e., the
chromatograms' retention time and intensity values. This data is returned as a
`list` of arrays, with one array per chromatogram with columns being the *peaks
variables* (retention time and intensity values) and rows the individual data
pairs. Each backend must provide retention times and intensity values with this
method, but additional peaks variables (columns) are also supported.

Below we implement the `peaksData()` method for our backend. Due to the way we
stored the retention time and intensity values within our object we need to loop
over the respective lists (in `@rtime` and `@intensity`) and combine the values
of each chromatogram into an array (`matrix`). Since our backend does not
support any other peaks variables, we allow `columns` to be only
`c("rtime", "intensity")`, and also only in that specific order.

```{r}
#' method to extract the full chromatographic data as list of arrays
setMethod(
    "peaksData", "ChromBackendTest",
    function(object, columns = c("rtime", "intensity")) {
        ## Only the two peaks variables, in this exact order, are supported
        if (!identical(columns, c("rtime", "intensity")))
            stop("'columns' supports only \"rtime\" and \"intensity\"")
        mapply(rtime = object@rtime, intensity = object@intensity,
               FUN = cbind, SIMPLIFY = FALSE, USE.NAMES = FALSE)
    })
```

And with this method we can now extract the peaks data from our backend.

```{r}
#' Extract the *peaks* data (i.e. intensity and retention times)
peaksData(be)
```

Since the `peaksData()` method is the main function used by a `Chromatograms`
object to retrieve data from the backend (and further process the values), this
method should be implemented in an efficient way. Due to the way we store the
data within our example backend we need to loop over the `@rtime` and
`@intensity` slots. A different implementation that stores the peaks data
already as a `list` of arrays would be more efficient for this operation (but
possibly slower for some other operations, such as extracting peaks variables
separately with the `rtime()` or `intensity()` functions).

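
To illustrate this trade-off, the base R chunk below stores the peaks data of
the example chromatograms directly as a `list` of two-column matrices, a
hypothetical alternative to the `@rtime` and `@intensity` slots. The helper
functions `peaks_from_matrices()` and `rtime_from_matrices()` are hypothetical
stand-ins for what `peaksData()` and an `rtime()` accessor would do with such a
storage: the first can return the stored list as-is, the second has to loop over
every matrix.

```{r}
## Hypothetical alternative storage: one two-column matrix per chromatogram.
pks <- list(
    cbind(rtime = c(12.4, 12.8, 13.2, 14.6),
          intensity = c(123.3, 153.6, 2354.3, 243.4)),
    cbind(rtime = c(45.1, 46.2), intensity = c(100, 80.1)),
    cbind(rtime = c(64.4, 64.8, 65.2), intensity = c(12.3, 135.2, 100)))

## peaksData-like access: the stored list can be returned without processing.
peaks_from_matrices <- function(pks) pks

## rtime-like access: one column has to be extracted from every matrix.
rtime_from_matrices <- function(pks) lapply(pks, function(z) z[, "rtime"])

rtime_from_matrices(pks)[[2]]
```
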
### `[`

The `[` method allows to subset `ChromBackend` objects. This operation is
expected to reduce a `ChromBackend` object to the selected chromatograms without
changing values for the subset chromatograms. The method should support
subsetting by indices or logical vectors and should also support duplicating
elements (i.e., when duplicated indices are used) as well as subsetting in
arbitrary order. An error should be thrown if indices are out of bounds, but the
method should also support returning an empty backend with `[integer()]`. The
`MsCoreUtils::i2index()` function can be used to check and convert the provided
parameter `i` (defining the subset) to an integer vector.

Below we implement a possible `[` for our test backend class. We ignore the
parameter `j` from the definition of the `[` generic, since we treat our data
as one-dimensional (with each chromatogram being one element).

```{r}
#' Main subset method.
setMethod("[", "ChromBackendTest", function(x, i, j, ..., drop = FALSE) {
    i <- MsCoreUtils::i2index(i, length = length(x))
    ## drop = FALSE keeps a data.frame even if it has a single column
    x@chromVars <- x@chromVars[i, , drop = FALSE]
    x@rtime <- x@rtime[i]
    x@intensity <- x@intensity[i]
    x
})
```

We can now subset our backend to the last two chromatograms.

```{r}
a <- be[2:3]
chromData(a)
```

Or extract the second chromatogram multiple times.

```{r}
a <- be[c(2, 2, 2)]
chromData(a)
```
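
The index semantics required above can also be sketched in base R. The
hypothetical helper `to_index()` below is a simplified stand-in, not the
implementation of `MsCoreUtils::i2index()`: it converts logical or numeric
indices into an integer vector and rejects out-of-bounds values (negative
indices are not supported in this sketch). Subsetting a `data.frame` or `list`
with the result then supports duplicated indices, arbitrary order and empty
selections.

```{r}
## Hypothetical, simplified index conversion (negative indices unsupported).
to_index <- function(i, n) {
    if (is.logical(i)) {
        if (length(i) != n)
            stop("length of 'i' has to match the length of the object")
        return(which(i))
    }
    if (is.numeric(i)) {
        if (any(is.na(i) | i < 1 | i > n))
            stop("index out of bounds")
        return(as.integer(i))
    }
    stop("'i' has to be a logical or numeric vector")
}

vars <- data.frame(msLevel = c(1L, 1L, 1L), mz = c(112.2, 123.3, 134.4))
## Duplicated indices in arbitrary order
vars[to_index(c(3, 1, 1), n = nrow(vars)), , drop = FALSE]
## Empty selection
nrow(vars[to_index(integer(), n = nrow(vars)), , drop = FALSE])
```
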

### `$`

The `$` method is expected to extract a single chromatogram variable from a
backend. Parameter `name` should allow to name the chromatogram variable to
return. Each `ChromBackend` **must** support extracting the core chromatogram
variables with this method (even if no data might be available for that
variable). In our example implementation below we make use of the `chromData()`
method, but more efficient implementations might be possible as well (that would
not require to first create a `DataFrame` with the full data and to then subset
that again to an individual column). Also, the `$` method should check if the
requested chromatogram variable is available and should throw an error
otherwise.

```{r}
#' Access a single chromatogram variable; chromData() throws an error if
#' the requested variable is not available
setMethod("$", "ChromBackendTest", function(x, name) {
    chromData(x, columns = name)[, 1L]
})
```

With this we can now extract the MS levels:

```{r}
be$msLevel
```

or a core chromatogram variable without values in our example backend:

```{r}
be$precursorMz
```

or also the intensity values:

```{r}
be$intensity
```
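
A more direct lookup is sketched below in base R, using a plain `list` that
mirrors the slots of our example backend (a hypothetical `store` object) and a
hypothetical subset of core variables. The hypothetical `get_variable()`
function returns a variable without building the full `DataFrame` first, fills
core variables without values with `NA` of the expected type and throws an error
for unknown variables. Unlike the `$` method above, which returns a
`NumericList` for `"rtime"` and `"intensity"`, this sketch returns plain
`list`s.

```{r}
## Hypothetical storage mirroring the slots of ChromBackendTest.
store <- list(
    chromVars = data.frame(msLevel = c(1L, 1L), mz = c(112.2, 123.3)),
    rtime = list(c(12.4, 12.8), c(45.1, 46.2)),
    intensity = list(c(123.3, 153.6), c(100, 80.1)))
## Hypothetical subset of core variables with their expected data types.
core_types <- c(msLevel = "integer", mz = "numeric", precursorMz = "numeric")

## Return a single variable without building the full data frame first.
get_variable <- function(store, name, core_types) {
    if (name %in% c("rtime", "intensity"))
        return(store[[name]])
    if (name %in% colnames(store$chromVars))
        return(store$chromVars[[name]])
    if (name %in% names(core_types)) {
        ## Core variable without values: NA of the expected type
        res <- vector(core_types[[name]], nrow(store$chromVars))
        res[] <- NA
        return(res)
    }
    stop("chromatogram variable \"", name, "\" is not available")
}

get_variable(store, "precursorMz", core_types)
```
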

### `backendMerge()`

The `backendMerge()` method merges (combines) `ChromBackend` objects (of the
same type!) into a single instance. For our test backend we thus need to combine
the values in the `@chromVars`, `@rtime` and `@intensity` slots. To also support
merging of `data.frame`s with different sets of columns we use the
`MsCoreUtils::rbindFill()` function instead of a simple `rbind()` (this function
joins data frames using the union of all available columns, filling possibly
missing columns with `NA`).

```{r}
#' Method allowing to join (concatenate) backends
setMethod("backendMerge", "ChromBackendTest", function(object, ...) {
    res <- object
    object <- unname(c(list(object), list(...)))
    res@rtime <- do.call(c, lapply(object, function(z) z@rtime))
    res@intensity <- do.call(c, lapply(object, function(z) z@intensity))
    res@chromVars <- do.call(MsCoreUtils::rbindFill,
                             lapply(object, function(z) z@chromVars))
    validObject(res)
    res
})
```

We test the function by merging the example backend instance with itself (and
with a subset of itself).

```{r}
a <- backendMerge(be, be[2], be)
a
```
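
The *fill* behavior described above can be illustrated with a small base R
sketch. The hypothetical `rbind_fill()` function below is not the implementation
of `MsCoreUtils::rbindFill()`; it adds each column missing from a data frame as
`NA` values and then row-binds all data frames using the union of their columns
(in order of first appearance).

```{r}
## Hypothetical helper: row-bind data frames using the union of all columns,
## filling columns missing in a data frame with NA.
rbind_fill <- function(...) {
    dfs <- list(...)
    cols <- unique(unlist(lapply(dfs, colnames)))
    dfs <- lapply(dfs, function(d) {
        for (cn in setdiff(cols, colnames(d)))
            d[[cn]] <- rep(NA, nrow(d))
        d[, cols, drop = FALSE]
    })
    do.call(rbind, dfs)
}

merged <- rbind_fill(data.frame(msLevel = 1L, mz = 112.2),
                     data.frame(mz = 123.3, name = "B"))
merged
```
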

## Data replacement methods

As stated in the general description, `ChromBackend` implementations can also be
purely *read-only* resources that only allow to access, but not to replace,
data. For these backends `isReadOnly()` should return `TRUE`, and the data
replacement methods listed in this section would not need to be implemented.
Our example backend stores the full data in memory, within the object, and hence
we can easily change and replace values.

Since we support replacing values, we also implement the `isReadOnly()` method
for our example implementation to return `FALSE` (instead of the default
`TRUE`).

```{r}
#' Default for backends:
isReadOnly(be)
```

```{r}
#' Implementation of isReadOnly for ChromBackendTest
setMethod("isReadOnly", "ChromBackendTest", function(object) FALSE)
isReadOnly(be)
```

All data replacement functions are expected to return an instance of the same
backend class that was used as input.

### `chromData<-`

The main replacement method is `chromData<-`, which should allow to replace the
content of a backend with new data. This data is expected to be provided as a
`DataFrame` (similar to the one returned by `chromData()`). Also, the method is
expected to replace the **full** data within the backend, i.e., all chromatogram
and peaks variables. While values can be replaced, the number of chromatograms
before and after a call to `chromData<-` has to be the same. For our example
implementation of `chromData<-` we can re-use the `backendInitialize()` method
defined before, with the `data` parameter.

```{r}
#' Replacement method for the full chromatogram data
setReplaceMethod("chromData", "ChromBackendTest", function(object, value) {
    if (!inherits(value, "DataFrame"))
        stop("'value' is expected to be a 'DataFrame'")
    if (length(object) && length(object) != nrow(value))
        stop("'value' has to be a 'DataFrame' with ", length(object), " rows")
    object <- backendInitialize(ChromBackendTest(), data = value)
    object
})
```

To test this new method we extract the full chromatogram data from our example
data set, add an additional column (chromatogram variable) and use `chromData<-`
to replace the data of the backend.

```{r}
d <- chromData(be)
d$new_col <- c("a", "b", "c")
chromData(be) <- d
```

We check that the new column is now available.

```{r}
be$new_col
```

### `$<-`
The `$<-` method should allow to replace values for an existing chromatogram
variable or to add an additional variable to the backend. As with all
replacement methods, the `length` of `value` has to match the number of
chromatograms represented by the backend. For replacement of retention time or
intensity values we need also to ensure that the data would be correct after the
operation, i.e., that the number of retention time and intensity values per
chromatogram are the identical and that all retention time and intensity values
are numeric. Finally, we use the `validChromData()` function to ensure that,
after replacement, all core chromatogram variables have the correct data type.
```{r}
#' Replace or add a single chromatogram variable.
setReplaceMethod("$", "ChromBackendTest", function(x, name, value) {
    ## Compare against the backend itself (x), not a global variable
    if (length(value) != length(x))
        stop("length of 'value' needs to match the number of chromatograms ",
             "in object.")
    if (name %in% c("rtime", "intensity")) {
        ## In case retention time or intensity values are provided as
        ## NumericList convert to a list.
        if (is(value, "NumericList"))
            value <- as.list(value)
        ## Ensure number of retention time and intensity values match
        if (!all(lengths(value) == lengths(x@intensity)))
            stop("Number of retention time values needs to match number of ",
                 "intensity values.")
        ## Ensure all values are numeric
        if (!all(vapply(value, is.numeric, logical(1))))
            stop("For replacement of retention time or intensity values, ",
                 "'value' is expected to be a list of numeric vectors.")
        if (name == "rtime")
            x@rtime <- value
        if (name == "intensity")
            x@intensity <- value
    } else
        x@chromVars[[name]] <- value
    ## Check that data types are correct after replacement
    validChromData(x@chromVars)
    x
})
```
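The consistency checks in the `$<-` method above do not depend on the backend
class itself. As a minimal sketch, assuming the retention time or intensity
values are given as plain R `list`s (one numeric vector per chromatogram), they
could be factored out into a standalone helper. `check_peaks_values()` is a
hypothetical name used only for illustration; it is not part of the
*Chromatograms* package.

```{r}
#' Hypothetical helper, not part of the Chromatograms package: check new
#' retention time or intensity values against the current ones
check_peaks_values <- function(value, reference) {
    ## one element per chromatogram
    if (length(value) != length(reference))
        stop("length of 'value' needs to match the number of chromatograms")
    ## only numeric vectors are allowed
    if (!all(vapply(value, is.numeric, logical(1))))
        stop("'value' is expected to be a list of numeric vectors")
    ## same number of values per chromatogram as before
    if (!all(lengths(value) == lengths(reference)))
        stop("number of values per chromatogram has to stay the same")
    invisible(TRUE)
}

#' Current intensity values of two chromatograms (made-up data)
int_ref <- list(c(10, 20, 30), c(5, 6))
#' Valid replacement: same number of values per chromatogram
check_peaks_values(list(c(1.1, 1.2, 1.3), c(2.1, 2.2)), int_ref)
#' Invalid replacement: the second chromatogram gets one value too many
res_ex <- tryCatch(
    check_peaks_values(list(c(1, 2, 3), c(1, 2, 3)), int_ref),
    error = function(e) conditionMessage(e))
res_ex
```

Keeping such checks in one place makes it easier to apply the same rules in
other replacement methods of a backend.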
We can thus replace an existing chromatogram variable, such as `msLevel`:
```{r}
#' Values before replacement
be$msLevel
#' Replace MS levels
be$msLevel <- c(3L, 2L, 1L)
#' Values after replacement
be$msLevel
```
We can also add a new chromatogram variable:
```{r}
#' Add a new chromatogram variable
be$name <- c("A", "B", "C")
be$name
```
We can also replace intensity values. Below we add 3 to each intensity
value.
```{r}
#' Replace intensity values
be$intensity <- be$intensity + 3
be$intensity
```
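The `$<-` method converts `NumericList` input to a plain `list`. Arithmetic
such as `+ 3` works directly on a `NumericList`, but not on a base R `list`. As
a minimal sketch with made-up values, the same shift applied to a plain `list`
of intensity vectors has to be done element-wise:

```{r}
#' Intensity values of two chromatograms as a plain list (made-up data)
ints_ex <- list(c(200, 312, 354.1), 13.4)
#' `ints_ex + 3` fails with "non-numeric argument to binary operator";
#' the shift has to be applied to each chromatogram's vector instead
ints_shifted <- lapply(ints_ex, function(z) z + 3)
ints_shifted
```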
### `selectChromVariables()`
The `selectChromVariables()` function should subset the content of a backend to
the selected chromatogram variables, which are specified with the parameter
`chromVariables`. The input backend should be returned, reduced to the selected
chromatogram variables. This function thus adds a subset operation that reduces
the data in a backend by *columns*, dropping all chromatogram variables other
than the ones specified with the `chromVariables` parameter. In the
implementation we need to take special care of the variables `"rtime"` and
`"intensity"`. If both are to be removed, we need to initialize the `@rtime`
and `@intensity` slots with lists of empty (`NULL`) elements, one per
chromatogram in our backend. If only the `"intensity"` values are to be removed,
we replace them with `NA_real_`, while removing only `"rtime"` is not supported
(also because retention time values of `NA` are not allowed).
```{r}
#' Method to *subset* a backend by chromatogram variables (columns)
setMethod(
    "selectChromVariables", "ChromBackendTest",
    function(object, chromVariables = chromVariables(object)) {
        keep <- colnames(object@chromVars) %in% chromVariables
        object@chromVars <- object@chromVars[, keep, drop = FALSE]
        ## If neither "rtime" nor "intensity" is in chromVariables: initialize
        ## with lists of empty (NULL) elements, one per chromatogram.
        if (!any(c("rtime", "intensity") %in% chromVariables)) {
            object@rtime <- vector("list", length(object))
            object@intensity <- vector("list", length(object))
        } else {
            ## intensity not in chromVariables: replace intensity values with NA
            if (!"intensity" %in% chromVariables)
                object@intensity <- lapply(object@intensity,
                                           function(z) rep(NA_real_, length(z)))
            ## removal of only rtime is not supported
            if (!"rtime" %in% chromVariables)
                stop("Exclusive removal of retention times is not supported. ",
                     "Retention times can only be removed if also intensity ",
                     "values are removed.")
        }
        validObject(object)
        object
    })
```
We can now restrict the data set to only selected chromatogram variables:
```{r}
#' keep only dataStorage and msLevel
be_2 <- selectChromVariables(be, c("dataStorage", "msLevel"))
chromData(be_2)
```
Removing only the intensity values, while keeping the retention times, is
possible:
```{r}
#' Keep dataStorage, msLevel, mz and rtime
be_2 <- selectChromVariables(be, c("dataStorage", "msLevel", "mz", "rtime"))
chromData(be_2)
#' Intensity values have been replaced with NA
be_2$intensity
```
All intensity values are thus `NA`. Removing only the retention times (i.e.,
keeping `"intensity"` but not `"rtime"`) would, in contrast, throw an error.
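The handling of the two peaks variables in `selectChromVariables()` boils down
to four cases. As a minimal sketch on plain R `list`s (the function
`select_peaks_vars()` is hypothetical and independent of any backend class),
the decision logic of the method above can be written as:

```{r}
#' Hypothetical helper mirroring the rules of selectChromVariables() for the
#' peaks variables. `rtime` and `intensity` are lists with one numeric vector
#' per chromatogram, `vars` the names of the variables to keep.
select_peaks_vars <- function(rtime, intensity, vars) {
    keep_rt <- "rtime" %in% vars
    keep_int <- "intensity" %in% vars
    if (!keep_rt && !keep_int) {
        ## drop both: one empty (NULL) element per chromatogram
        rtime <- vector("list", length(rtime))
        intensity <- vector("list", length(intensity))
    } else if (!keep_rt) {
        ## drop only retention times: not supported
        stop("Exclusive removal of retention times is not supported.")
    } else if (!keep_int) {
        ## drop only intensities: keep the number of peaks, use NA values
        intensity <- lapply(intensity, function(z) rep(NA_real_, length(z)))
    }
    ## keeping both leaves the values unchanged
    list(rtime = rtime, intensity = intensity)
}

rt_ex <- list(c(1.1, 1.2, 1.3), c(2.5, 2.6))
int_ex <- list(c(10, 20, 30), c(5, 6))
str(select_peaks_vars(rt_ex, int_ex, c("msLevel", "rtime")))
```

Calling it with `vars = c("msLevel", "intensity")`, i.e., keeping intensities
but not retention times, raises an error, in line with the method above.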
### `peaksData<-`
The `peaksData<-` method should allow replacing the full peaks data (retention
time and intensity value pairs) of all chromatograms in a backend. As `value`, a
`list` of arrays (e.g. two-column `numeric` matrices) with column names
`"rtime"` and `"intensity"` should be provided. Because the full peaks data is
provided at once, this method can (and should) also support changing the number
of peaks per chromatogram, which methods like `rtime<-` or `$<-` do not
allow. In our implementation we need to ensure that a) the provided `list`
is of length equal to the number of chromatograms and b) each element is a
`numeric` matrix with `"rtime"` and `"intensity"` columns from which we can
extract the values.
```{r}
#' replacement method for peaks data
setReplaceMethod("peaksData", "ChromBackendTest", function(object, value) {
    if (!(is.list(value) || inherits(value, "SimpleList")))
        stop("'value' has to be a list-like object")
    if (length(value) != length(object))
        stop("The length of the provided list has to match the number of ",
             "chromatograms in 'object'")
    ## First loop to check also for validity of the matrices, i.e. each element
    ## has to be a `numeric` `matrix` with columns named "rtime" and "intensity"
    object@rtime <- lapply(value, function(z) {
        if (!is.matrix(z) || !is.numeric(z))
            stop("'value' is expected to be a 'list' of numeric matrices")
        if (!all(c("rtime", "intensity") %in% colnames(z)))
            stop("All matrices in 'value' need to have columns named ",
                 "\"rtime\" and \"intensity\"")
        z[, "rtime"]
    })
    ## The matrices were validated above; extract the intensity column
    object@intensity <- lapply(value, function(z) z[, "intensity"])
    validObject(object)
    object
})
```
With this method we can now replace the peaks data of a backend:
```{r}
#' Create a list with peaks matrices; our backend has 3 chromatograms
#' thus our `list` has to be of length 3
tmp <- list(
    cbind(rtime = c(12.3, 14.4, 15.4, 16.4),
          intensity = c(200, 312, 354.1, 232)),
    cbind(rtime = c(14.4),
          intensity = c(13.4)),
    cbind(rtime = c(223.2, 223.8, 234.1, 234.5, 234.9),
          intensity = c(12.3, 45.3, 65.3, 51.1, 29.3))
)
#' Assign this peaks data to one of our test backends
peaksData(be_2) <- tmp
#' Check that the peaks data was properly replaced
peaksData(be_2)
```
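The core of `peaksData<-` is a conversion between the list-of-matrices
representation and the per-variable lists stored in the `@rtime` and
`@intensity` slots. As a minimal sketch in base R (the helpers `split_peaks()`
and `combine_peaks()` are hypothetical and not part of the package), this
conversion and its inverse could look like the following; note that the number
of peaks is allowed to differ between chromatograms:

```{r}
#' Hypothetical helper: split a list of peak matrices into separate lists of
#' retention times and intensities (one element per chromatogram)
split_peaks <- function(value) {
    ok <- vapply(value, function(z) {
        is.matrix(z) && is.numeric(z) &&
            all(c("rtime", "intensity") %in% colnames(z))
    }, logical(1))
    if (!all(ok))
        stop("each element needs to be a numeric matrix with columns ",
             "\"rtime\" and \"intensity\"")
    list(rtime = lapply(value, function(z) unname(z[, "rtime"])),
         intensity = lapply(value, function(z) unname(z[, "intensity"])))
}

#' Hypothetical inverse: combine the two lists back into peak matrices;
#' mapply() passes the argument names on to cbind() as column names
combine_peaks <- function(rtime, intensity) {
    mapply(cbind, rtime = rtime, intensity = intensity, SIMPLIFY = FALSE)
}

pks_ex <- list(cbind(rtime = c(12.3, 14.4), intensity = c(200, 312)),
               cbind(rtime = 14.4, intensity = 13.4))
sp_ex <- split_peaks(pks_ex)
#' Number of peaks per chromatogram
lengths(sp_ex$rtime)
```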
## Methods with available default implementations
Default implementations for the `ChromBackend` class are available for a large
number of methods. Thus, any backend extending this class will automatically
inherit these default implementations. Alternative, class-specific versions
can, but don't need to, be implemented. The default versions are defined in the
*R/ChromBackend.R* file and are also listed in this section. If alternative
versions are implemented, it should be ensured that the expected data type is
always used for core chromatogram variables. Use `coreChromVariables()` to list
these mandatory data types.
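The mechanism behind these defaults is plain S4 method inheritance: a method
defined for the virtual parent class is used for every subclass that does not
define its own version. The following self-contained sketch uses hypothetical
class and generic names (`ToyBackend`, `ToyBackendDF`, `toyData()`,
`toyVariables()`), not the actual *Chromatograms* classes, to show a default
built on top of a single backend-specific method, similar in spirit to the
`chromVariables()` default shown further below.

```{r}
library(methods)

#' Hypothetical virtual base class standing in for ChromBackend
setClass("ToyBackend", representation("VIRTUAL"))
setGeneric("toyData", function(object, ...) standardGeneric("toyData"))
setGeneric("toyVariables", function(object) standardGeneric("toyVariables"))

#' Default implementation, defined once for the base class and built on
#' top of toyData()
setMethod("toyVariables", "ToyBackend", function(object) {
    colnames(toyData(object))
})

#' A concrete (hypothetical) backend only implements toyData() ...
setClass("ToyBackendDF", contains = "ToyBackend",
         slots = c(data = "data.frame"))
setMethod("toyData", "ToyBackendDF", function(object, ...) object@data)

#' ... and inherits toyVariables() from the base class
be_toy <- new("ToyBackendDF",
              data = data.frame(msLevel = c(1L, 2L), mz = c(100, 200)))
toyVariables(be_toy)
```

Defining a `toyVariables()` method for `ToyBackendDF` itself would take
precedence over the inherited default, which is how class-specific versions
replace the defaults.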
### `backendParallelFactor()`
The `backendParallelFactor()` function allows a backend to suggest a preferred
way it could be split for parallel processing. The default implementation
returns `factor()` (i.e., a `factor` of length 0), hence not suggesting any
specific splitting setup.
```{r, eval = FALSE}
#' Is there a preferred way to split the object for
#' parallel processing?
setMethod("backendParallelFactor", "ChromBackend", function(object, ...) {
    factor()
})
```
```{r}
backendParallelFactor(be)
```
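A backend whose data is read on demand from files might instead suggest
splitting by file, so that each file is processed by only one worker. As a
minimal sketch with made-up `dataStorage` values (whether and how a particular
backend does this is an assumption here), such a factor and the resulting
grouping of chromatogram indices could look like:

```{r}
#' Default: a zero-length factor, i.e. no suggested way to split
f_default <- factor()
length(f_default)

#' Hypothetical alternative: split by the file each chromatogram is
#' stored in (made-up values)
storage_ex <- c("a.mzML", "a.mzML", "b.mzML")
f_ex <- factor(storage_ex, levels = unique(storage_ex))
#' Indices of the chromatograms that would be processed together
split(seq_along(f_ex), f_ex)
```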
### `chromVariables()`
The `chromVariables()` function is expected to return the names of all available
chromatogram variables (which should include the *core* chromatogram
variables). The default implementation is:
```{r, eval = FALSE}
#' get the available chromatogram variables.
setMethod("chromVariables", "ChromBackend", function(object) {
    colnames(chromData(object))
})
```
The result from calling the default implementation on our test backend:
```{r}
chromVariables(be)
```
### `chromIndex()`
The `chromIndex()` function should return the value for the `"chromIndex"`
chromatogram variable. As a result, an `integer` of length equal to the number
of chromatograms in `object` needs to be returned. The default implementation
is:
```{r, eval = FALSE}
#' get the values for the chromIndex chromatogram variable
setMethod("chromIndex", "ChromBackend",
          function(object, columns = chromVariables(object)) {
              chromData(object, columns = "chromIndex")[, 1L]
          })
```
The result of calling this method on our test backend:
```{r}
chromIndex(be)
```
### `collisionEnergy()`
The `collisionEnergy()` function should return the value for the
`"collisionEnergy"` chromatogram variable. As a result, a `numeric` of length
equal to the number of chromatograms has to be returned. The default
implementation is:
```{r, eval = FALSE}
#' get the values for the collisionEnergy chromatogram variable
setMethod("collisionEnergy", "ChromBackend", function(object) {
    chromData(object, columns = "collisionEnergy")[, 1L]
})
```
The result of calling this method on our test backend:
```{r}
collisionEnergy(be)
```
The default replacement method for the `collisionEnergy` chromatogram variable
is: