vignettes/workflow_2.Rmd

---
title: "Workflow 2: Extract Fluorescence Distributions"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Workflow 2: Extract Fluorescence Distributions}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
#
library(flowdex)
src <- "https://github.com/bpollner/data/raw/main/flowdex_tutorial/flowdex_tutorial.zip"
td <- tempdir()
# this is downloading and unzipping the tutorial data if not already present:
flowdex::check_download_data(where = td, data_source = src, dsname = "flowdex_tutorial")
#
exp_home <- paste0(td, "/tap_water_home") 
dir.create(exp_home, showWarnings = FALSE)
flowdex::genfs(exp_home) # create the required folder structure in 'tap_water_home'
#
knitr::opts_knit$set(root.dir = paste0(tempdir(), "/tap_water_home"))
```

There are two basic scenarios when using `flowdex` that further define the appropriate workflow:

1.  Gating strategy and polygon gates are **not yet defined**.\
    Here, the focus is on [establishing the gating strategy](https://bpollner.github.io/flowdex/articles/workflow_1.html) and manually drawing the polygon gates.

2.  Gating strategy and polygon gates are **already defined**.\
    Here, the focus is on using the previously established gating strategy to extract fluorescence distributions, visualise them and export them to file. All this (except visualisation) is conveniently done via `flowdexit()`. Look at [Quickstart](https://bpollner.github.io/flowdex/articles/flowdex.html#quickstart) for an immediate example.

------------------------------------------------------------------------

## Workflow 2

If not already done, set up the tutorial data as described in [Get Started](https://bpollner.github.io/flowdex/articles/flowdex.html).\
And, if not already done either, copy these files from the tutorial folder in the experiment-home folder:

```{r}
td <- tempdir()
fromFcs <- list.files(paste0(td, "/flowdex_tutorial/fcsFiles"), full.names=TRUE)
fromDict <- list.files(paste0(td, "/flowdex_tutorial/dictionary"), full.names=TRUE)
fromGating <- list.files(paste0(td, "/flowdex_tutorial/gating"), full.names=TRUE)
toFcs <- paste0(td, "/tap_water_home/fcsFiles")
toDict <- paste0(td, "/tap_water_home/dictionary")
toGating <- paste0(td, "/tap_water_home/gating")
#
all(file.copy(fromFcs, toFcs, overwrite=TRUE))
file.copy(fromDict, toDict, overwrite=TRUE)
file.copy(fromGating, toGating, overwrite=TRUE)
```
Set the working directory to 'tap_water_home':
```{r}
setwd(paste0(td, "/tap_water_home"))
```

## Flowdexit

Using `flowdexit()` is the most straight-forward way to extract fluorescence distributions when the gating strategy and all the gate definitions are checked and in place. (Read [Workflow 1](https://bpollner.github.io/flowdex/articles/workflow_1.html) for learning how to write a gating strategy and how to define polygon gates.)\
At this point we assume that we

-   Provided a structured ID character in the sample ID field of each sample in the GUI of our FCM-machine,

-   completed our data acquisition and have all our fcs files in the folder called 'fcsFiles',

-   have a dictionary file where all the abbreviations in the structured ID character are listed in the folder called 'dictionary',

-   have a gating strategy file along with its gate definitions in the folder called 'gating'.

If all that requirements are met, we can call:

```{r}
fdmat1 <- flowdexit() # this might take a few seconds
fdmat1@pData[1:5,] # to inspect volume and sample ID data
fdmat1@cyTags[1:5,] # to inspect class- and numerical variables assigned to each sample
fdist <- fdmat1[[1]] # to inspect fluorescence distribution
fdist[1:5, 130:134] # pick some random fluorescence intensities
```

This will use the default gating strategy called 'gateStrat' (where one (1) gate is defined), extract the fluorescence distributions along the 'FITC.A' channel (in that example), **re-calculate** them to **events per ml** and save the resulting data in an xlsx file (along with some metadata etc. in additional sheets).\
The resulting R-object is saved to disc as well. (See also `fd_save()` and `fd_load()` for how to manually save and load the resulting 'fdmat' object.)

When calling `flowdexit()`, the gating set built on the way to extract the fluorescence distributions gets assigned to an extra environment. It can be accessed via:

```{r}
gs1 <- gsenv$gatingSet
length(flowWorkspace::sampleNames(gs1))
flowWorkspace::plot(gs1) # inspect the gating hierarchy
```

Lets be a bit more ambitious and try the (slightly non-sensical) gating strategy where multiple nested gates are defined, called 'gateStrat_2':

```{r}
fdmat2 <- flowdexit(gateStrat = "gateStrat_2")
#
str(fdmat2, max.level = 2)
fdmat2@metadata
fdmat2@gateStrat
#
str(fdmat2[[1]], max.level = 2)
str(fdmat2[[2]], max.level = 2)
#
```

Again, the gating set that was produced on the way can be accessed via:

```{r}
gs2 <- gsenv$gatingSet
flowWorkspace::plot(gs2) # inspect the gating hierarchy
```

Please observe that every call to `flowdexit()` will assign the object `gatingSet`to the environment `gsenv`. So, only one gating set, the **last one** produced when calling `flowdexit()`, can be accessed in that environment.

## Visualize Gates

To visualize the gated data call

``` r
plotgates(gs1, toPdf = FALSE) # too much on one graphic
```

This would plot *all* gated data from *all* samples in one page, what is a bit much -- making the single plot too small.\
To improve this, it is possible to define a column name from the cyTags (as can be found in the 'fdmat' object) that then will be used to split the data, resulting in only those samples being plotted in the same graphic that share the same value in the specified cyTags-column:

``` {r}
colnames(fdmat1@cyTags)
plotgates(gs1, spl = "Y_Time.d", toPdf = TRUE)
```

(The resulting pdf can be found in the folder 'tap_water_home/plots'; ('toPdf' defaults to TRUE)).\
This way we have only samples from the same day on one graphic - a more useful output, but still rather small individual plots.

A solution could be to read in only fcs files from the same day, then use `plotgates()` split by 'C_treatment'. Lets try this using the function `makeAddGatingSet()`:
```{r include=FALSE}
gs_d4 <- makeAddGatingSet(patt = "T4", gateStrat = "gateStrat_2")
plotgates(gs_d4, spl = "C_treatment", fns = "_day4")
gs_d5 <- makeAddGatingSet(patt = "T5", gateStrat = "gateStrat_2")
plotgates(gs_d5, spl = "C_treatment", fns = "_day5")
gs_d6 <- makeAddGatingSet(patt = "T6", gateStrat = "gateStrat_2")
plotgates(gs_d6, spl = "C_treatment", fns = "_day6")
```
(The resulting PDFs can again be found in 'tap_water_home/plots'.)
```{r}
# with only one gate (use default gating strategy) again, just to show it here in the vignette
plotgates(makeAddGatingSet(patt = "T6"), spl = "C_treatment", toPdf = FALSE)
```
This should give nice graphics of the gated data in each sample. Please observe that gated data get plotted for every gate in the gating strategy where 'keepData' is set to TRUE. \
Set the argument 'plotAll' in the function `plotgates()` to TRUE to override this and to plot gated data from **all** gates defined in the gating strategy, what can be very helpful in the process of checking / verifying the gating strategy:

```{r include=FALSE}
plotgates(gs_d6, spl = "C_treatment", fns = "_day6_allGates", plotAll = TRUE)
# again split by 'C_treatment', and exported to pdf
```

## Visualize Fluorescence Distribution
As the purpose of `flowdex` is to extract fluorescence distributions, it would be borderline obscene not to have at least a basic way to visually inspect the result. \ 
(Note: The functionality to visualize the fluorescence distribution is merely intended to give a first view of the data. It is not intended for data analysis or presentation.)\
``` {r}
plotFlscDist(fdmat1)
```
(Again, the default behavior is to export the plot to a pdf located in the folder 'plots'.)\
Fluorescence intensities of all samples along the FITC channel (in this example) are displayed **re-calculated to events per volume on the y-axis** .\
Again, too much information.\

So, for the purpose of this demonstration, we will look at a *subset* of the data -  only samples from day 4 and the first third of beakers. We do this by providing a pattern to read in only those filenames that match this pattern:
``` {r}
fdmat_s <- flowdexit(patt = "T4_th1")
fdmat_s # has now only 12 samples
gs_s <- gsenv$gatingSet # not required, just to keep it
#
plotFlscDist(fdmat_s, toPdf = FALSE) # much nicer
```

Again, as in `plotgates()`, it is possible to split the data by providing a column name of the cyTags to have only these samples in the graphic that have the same value in this cyTag:

``` {r}
colnames(fdmat_s@cyTags) # we choose "C_treatment"
plotFlscDist(fdmat_s, spl = "C_treatment",  ti="Day 4, first third",  toPdf = FALSE)
plotFlscDist(fdmat_s, spl = "C_treatment", fns="_d4_th1", ti="Day 4, first third") # export also to pdf
```

Please refer to the manual at `plotFlscDist()` to see the arguments for custom colors and custom linetypes.

Go on to learn about the [Accessory Functions](https://bpollner.github.io/flowdex/articles/accessory.html), e.g. about using the function `applyBandpass()` to apply a bandpass-like filter to the fluorescence intensities.