# Tutorial Introduction


[SCRuB](https://korem-lab.github.io/SCRuB/) is a tool designed to help researchers address the common issue of contamination in microbial studies. This package provides an easy-to-use framework for applying SCRuB to your projects. All you need to get started is a feature table describing both your samples and negative controls, and a metadata file describing each sample.

In this tutorial we use _SCRuB_ to decontaminate a dataset comparing plasma samples from cancer and control subjects, published in [Poore et al.](https://www.nature.com/articles/s41586-020-2095-1). The data can be downloaded from the following links:

* **Table** (table.qza) | [download](https://github.com/korem-lab/q2-SCRuB/q2-SCRuB/tutorials/data/table.qza)
* **Sample Metadata** (metadata.tsv) | [download](https://github.com/korem-lab/q2-SCRuB/q2-SCRuB/tutorials/data/plasma_metadata.tsv)


**Note**: This tutorial assumes you have installed [QIIME2](https://qiime2.org/) using one of the procedures in the [install documents](https://docs.qiime2.org/2020.2/install/). It also assumes you have installed [SCRuB](https://korem-lab.github.io/SCRuB/).

First, make a `plasma-data` directory, then download the two files above and move them into it:

```bash
mkdir plasma-data
```
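Before moving on, it can help to confirm that both files ended up where the later commands expect them. The sketch below is only a convenience check, not part of SCRuB or QIIME2: the helper name `missing_inputs` is hypothetical, and the expected file names are taken from the paths used later in this tutorial (`plasma-data/table.qza` and `plasma-data/metadata.tsv`). Note that the metadata download link above points at a file named `plasma_metadata.tsv`, so a rename may be needed.

```python
from pathlib import Path

# File names the later commands in this tutorial read from plasma-data/.
EXPECTED_FILES = ("table.qza", "metadata.tsv")


def missing_inputs(data_dir="plasma-data"):
    """Return the expected tutorial files that are not present in data_dir.

    Hypothetical helper for this tutorial only; it just checks file existence.
    """
    data_dir = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (data_dir / name).is_file()]


missing = missing_inputs()
if missing:
    print("Missing from plasma-data:", ", ".join(missing))
else:
    print("All tutorial inputs are in place.")
```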

Running SCRuB takes a single command. The inputs are:

1. `table`
    - The table is of type `FeatureTable[Frequency]` which is a table where the rows are features (e.g. ASVs/microbes), the columns are samples, and the entries are the number of sequences for each sample-feature pair.
2. `metadata`
    - This is a QIIME2 formatted [metadata](https://docs.qiime2.org/2020.2/tutorials/metadata/) (e.g. tsv format) where the rows are samples matched to the (1) table and the columns are different sample data (e.g. time point).  
3. ( _Optional_ ) `control_idx_column`
    - This is the name of the column in the (2) metadata that indicates which samples should be treated as negative controls. If not specified, SCRuB identifies negative controls by searching for a metadata column named 'empo_2' or 'qiita_empo_2' and selecting the entries that contain the keyword 'negative'.
4. ( _Optional_ ) `sample_type_column`
    - This is the name of the column in the (2) metadata that indicates the sample type, which specifies the groupings of negative controls SCRuB should use for decontamination. Default is 'sample_type'.
5. ( _Optional_ ) `well_location_column`
    - This is the name of the column in the (2) metadata that indicates the well location of each sample, which SCRuB uses to account for well-to-well leakage. Default is 'well_id'.
6. ( _Optional_ ) `control_order`
    - Specifies the order in which the negative controls from `sample_type` should be processed. By default, the order in which the samples appear in the metadata table is used.

7. `output-dir`
    - The desired location of the output. We will cover each output independently below.  
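To make the negative-control lookup described under `control_idx_column` concrete, here is a minimal sketch of that rule in plain Python. It is an illustration under stated assumptions, not the plugin's actual implementation: the function name `find_negative_controls`, the toy metadata, the case-insensitive keyword match, and the True/False encoding of an explicit control column are all assumptions made for this example.

```python
import csv
import io

# Default columns searched when no control column is given (from the text above).
DEFAULT_CONTROL_COLUMNS = ("empo_2", "qiita_empo_2")

# Toy metadata; every ID and value here is made up for illustration.
TOY_METADATA = (
    "sample-id\tis_control\tempo_2\tsample_type\twell_id\n"
    "S1\tFalse\tAnimal\tplasma\tA1\n"
    "S2\tFalse\tAnimal\tplasma\tA2\n"
    "NC1\tTrue\tNegative\tcontrol blank\tA3\n"
)


def find_negative_controls(rows, control_idx_column=None):
    """Return the sample IDs treated as negative controls (hypothetical helper)."""
    if not rows:
        return []
    if control_idx_column is not None:
        if control_idx_column not in rows[0]:
            raise KeyError(f"metadata has no column {control_idx_column!r}")
        # Assumption: an explicit control column holds True/False-like values.
        truthy = {"true", "1", "yes"}
        return [r["sample-id"] for r in rows
                if r[control_idx_column].strip().lower() in truthy]
    # Fallback: first default column present, entries containing 'negative'.
    for column in DEFAULT_CONTROL_COLUMNS:
        if column in rows[0]:
            return [r["sample-id"] for r in rows
                    if "negative" in r[column].lower()]
    raise KeyError("no control column given and no default column found")


rows = list(csv.DictReader(io.StringIO(TOY_METADATA), delimiter="\t"))
print(find_negative_controls(rows))
print(find_negative_controls(rows, control_idx_column="is_control"))
```

Running a check like this on your own metadata before calling SCRuB is a quick way to confirm that the samples you expect are the ones flagged as controls.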

In this tutorial our `control_idx_column` is `is_control`, our `sample_type_column` is `sample_type`, and our `well_location_column` is `well_id`. Now we are ready to SCRuB away the contamination:

```bash
qiime SCRuB SCRuB \
    --i-table plasma-data/table.qza \
    --m-metadata-file plasma-data/metadata.tsv \
    --p-control-idx-column is_control \
    --p-sample-type-column sample_type \
    --p-well-location-column well_id \
    --p-control-order NA \
    --o-scrubbed results/scrubbed.qza
```

Now we can compare the raw table and SCRuB's output:

```python
import os
import warnings
import qiime2 as q2

# hide pandas Future/Deprecation warnings for the tutorial
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.simplefilter(action="ignore", category=FutureWarning)

# import the raw table
table = q2.Artifact.load('plasma-data/table.qza')
# import the metadata
metadata = q2.Metadata.load('plasma-data/metadata.tsv')
# import the SCRuB output
scrubbed = q2.Artifact.load('results/scrubbed.qza')
```
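A simple numeric comparison is the fraction of reads removed from each sample. The sketch below works on plain per-sample count dictionaries so that it runs without QIIME2; the function name `fraction_removed` and the toy counts are hypothetical. In practice the counts could be obtained from the loaded artifacts, for instance via `table.view(pandas.DataFrame)`, but that conversion step is left out here.

```python
def fraction_removed(raw_counts, scrubbed_counts):
    """Return, per sample, the fraction of reads absent after decontamination.

    Both arguments map sample ID -> {feature ID: read count}.
    Samples missing from scrubbed_counts are skipped.
    """
    summary = {}
    for sample, features in raw_counts.items():
        if sample not in scrubbed_counts:
            continue
        raw_total = sum(features.values())
        kept_total = sum(scrubbed_counts[sample].values())
        # Guard against empty samples to avoid dividing by zero.
        summary[sample] = 0.0 if raw_total == 0 else 1 - kept_total / raw_total
    return summary


# Toy counts, made up for illustration only.
raw_toy = {"S1": {"f1": 80, "f2": 20}, "S2": {"f1": 50, "f2": 50}}
scrubbed_toy = {"S1": {"f1": 80, "f2": 10}, "S2": {"f1": 50, "f2": 50}}

for sample, fraction in fraction_removed(raw_toy, scrubbed_toy).items():
    print(f"{sample}: {fraction:.1%} of reads removed")
```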


```python
from qiime2.plugins.deicode.actions import rpca
from qiime2.plugins.emperor.actions import (plot, biplot)
```

```python
# run RPCA and plot with emperor
rpca_biplot, rpca_distance = rpca(table)
rpca_biplot_emperor = biplot(rpca_biplot, metadata)

# make a directory to store the results (no error if it already exists)
output_path = 'results'
os.makedirs(output_path, exist_ok=True)

# now we can save the plot
rpca_biplot_emperor.visualization.save(os.path.join(output_path, 'Raw-RPCA-biplot.qzv'))
```




Now we can visualize the samples via RPCA:

![image.png](results/Raw-RPCA.png)




For comparison, we can observe the samples decontaminated by SCRuB:

```python
# run RPCA on the decontaminated table and plot with emperor
rpca_biplot, rpca_distance = rpca(scrubbed)
rpca_biplot_emperor = biplot(rpca_biplot, metadata)

# save the plot
rpca_biplot_emperor.visualization.save(os.path.join(output_path, 'SCRuBbed-RPCA-biplot.qzv'))
```


![image.png](results/SCRuBbed-RPCA.png)