## Step 1: SCRuB Decontamination
**Goal: Run [SCRuB](https://www.nature.com/articles/s41587-023-01696-w) to remove lab-associated contamination from the feature tables.**

Citation:
Austin, G.I., Park, H., Meydan, Y. et al. Contamination source modeling with SCRuB improves cancer phenotype prediction from microbiome data. Nat Biotechnol (2023). https://doi.org/10.1038/s41587-023-01696-w  

QIIME 2 plugin (q2-SCRuB) installation: https://forum.qiime2.org/t/q2-scrub-release/26609

### Imports

```python
import pandas as pd
```

```python
# Import the prep and sample information files for the pangenome-filtered data downloaded from Qiita
prep = pd.read_csv('qiita_downloads/qiita13756_prep16010_pangenome/13756_prep_16010_20231009-093827.txt', sep='\t')
meta = pd.read_csv('qiita_downloads/qiita13756_prep16010_pangenome/sample_information_from_prep_16010.tsv', sep='\t')
```
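Because the prep file and the sample information file are later joined on `sample_name` (to look up each sample's plate well), it can help to confirm that both describe the same samples; any sample missing from the prep file would end up with an empty well ID. The sketch below uses a hypothetical helper, `check_sample_overlap`, and assumes both tables carry a `sample_name` column, as the code in this notebook does. The toy tables are for illustration only.

```python
import pandas as pd

def check_sample_overlap(prep: pd.DataFrame, meta: pd.DataFrame) -> dict:
    """Hypothetical helper: list sample names that appear in only one of the two tables."""
    prep_ids = set(prep["sample_name"])
    meta_ids = set(meta["sample_name"])
    return {
        "only_in_prep": sorted(prep_ids - meta_ids),
        "only_in_meta": sorted(meta_ids - prep_ids),
    }

# Toy tables for illustration; in the notebook, pass the real `prep` and `meta` instead.
prep_demo = pd.DataFrame({"sample_name": ["s1", "s2", "blank1"],
                          "sample_well": ["A1", "A2", "H12"]})
meta_demo = pd.DataFrame({"sample_name": ["s1", "s2", "s3"]})

print(check_sample_overlap(prep_demo, meta_demo))
# {'only_in_prep': ['blank1'], 'only_in_meta': ['s3']}
```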

### Prep for SCRuB

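```python
import os

# Build the metadata table with the column names passed to q2-SCRuB below
scrub_meta = pd.DataFrame()
scrub_meta['sampleid'] = meta['sample_name']
# 1.0 marks negative controls, 0.0 marks host-associated (biological) samples
scrub_meta['is_control'] = meta['empo_1'].replace({'Host-associated': 0.0, 'Control': 1.0})
# Sample types of the controls must match the values passed to --p-control-order
scrub_meta['sample_type'] = meta['qiita_sample_type']
# Plate well of each sample, looked up from the prep file by sample name
scrub_meta['well_id'] = scrub_meta['sampleid'].map(prep.set_index('sample_name')['sample_well'])

os.makedirs('processed_data/SCRuB', exist_ok=True)  # make sure the output folder exists
scrub_meta.to_csv('processed_data/SCRuB/scrub_meta_pangenome.tsv', sep='\t', index=False)
# scrub_meta[100:150]  # optional: inspect a slice of the table
```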
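Before handing the metadata to SCRuB, a quick pre-flight check can catch problems that would otherwise surface as confusing plugin errors: `is_control` values other than 0/1 (for example an `empo_1` category not covered by the `replace` mapping), control samples whose `sample_type` is not listed in the control order, and missing or malformed well IDs (SCRuB uses well positions to model well-to-well leakage). The sketch below is a hypothetical helper, `validate_scrub_meta`; it assumes wells are written like `A1` to `P24` (no zero padding), so adjust the pattern if your plates use another format. The toy table, including the `control extraction` sample type, is illustrative only.

```python
import pandas as pd

# Assumption: wells look like "A1" ... "P24" (row letter + unpadded column number).
WELL_PATTERN = r"[A-P](?:[1-9]|1[0-9]|2[0-4])"

def validate_scrub_meta(scrub_meta: pd.DataFrame, control_order: list) -> list:
    """Hypothetical pre-flight check; returns a list of human-readable problems."""
    problems = []
    # is_control must be strictly 0.0 or 1.0
    bad_flags = ~scrub_meta["is_control"].isin([0.0, 1.0])
    if bad_flags.any():
        problems.append(f"is_control not 0/1 for: {list(scrub_meta.loc[bad_flags, 'sampleid'])}")
    # every control's sample type should appear in the control order
    controls = scrub_meta[scrub_meta["is_control"] == 1.0]
    unknown = sorted(set(controls["sample_type"]) - set(control_order))
    if unknown:
        problems.append(f"control sample types not in control_order: {unknown}")
    # missing wells become the strings "nan"/"None" and fail the pattern
    bad_wells = ~scrub_meta["well_id"].astype(str).str.fullmatch(WELL_PATTERN)
    if bad_wells.any():
        problems.append(f"missing or malformed well_id for: {list(scrub_meta.loc[bad_wells, 'sampleid'])}")
    return problems

demo = pd.DataFrame({
    "sampleid": ["s1", "s2", "blank1", "blank2"],
    "is_control": [0.0, 0.0, 1.0, 1.0],
    "sample_type": ["stool", "stool", "control blank", "control extraction"],
    "well_id": ["A1", "B2", "H12", None],
})
for problem in validate_scrub_meta(demo, ["control blank"]):
    print(problem)
```

An empty list means none of these checks fired; it does not guarantee the plugin will accept the file.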

### Decontaminate with SCRuB

```python
# WoLr2 feature table
! qiime SCRuB SCRuB \
--i-table qiita_downloads/qiita13756_prep16010_pangenome/183373_feature-table_WoLr2.qza \
--m-metadata-file processed_data/SCRuB/scrub_meta_pangenome.tsv \
--p-control-idx-column is_control \
--p-sample-type-column sample_type \
--p-well-location-column well_id \
--p-control-order "control blank" \
--o-scrubbed processed_data/SCRuB/183373_WoLr2_pangenome_scrubbed.qza
```

```
Saved FeatureTable[Frequency] to: processed_data/SCRuB/183373_WoLr2_pangenome_scrubbed.qza
```
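One way to gauge how much SCRuB removed is to compare per-sample read totals before and after decontamination. The sketch below works on plain pandas DataFrames with samples as rows and features as columns; assuming the QIIME 2 Python API is available in the same environment, such a DataFrame can be obtained from a `.qza` file with `qiime2.Artifact.load(path).view(pd.DataFrame)`. The helper name `read_retention` is hypothetical, and it compares only samples present in both tables in case the decontaminated table omits the controls. The toy counts are illustrative only.

```python
import pandas as pd

def read_retention(raw: pd.DataFrame, scrubbed: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: per-sample read totals before and after decontamination.

    Both tables are samples x features. Only samples present in both are compared.
    """
    shared = raw.index.intersection(scrubbed.index)
    before = raw.loc[shared].sum(axis=1)
    after = scrubbed.loc[shared].sum(axis=1)
    return pd.DataFrame({
        "reads_before": before,
        "reads_after": after,
        "fraction_retained": after / before,
    })

# Toy tables for illustration only
raw_demo = pd.DataFrame({"f1": [100, 50, 5], "f2": [100, 50, 5]},
                        index=["s1", "s2", "blank1"])
scrubbed_demo = pd.DataFrame({"f1": [90, 50], "f2": [70, 25]},
                             index=["s1", "s2"])
print(read_retention(raw_demo, scrubbed_demo))
```

Samples with a very low `fraction_retained` may be worth a closer look, since they were dominated by signal SCRuB attributed to contamination.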

```python
# RS210 feature table
! qiime SCRuB SCRuB \
--i-table qiita_downloads/qiita13756_prep16010_pangenome/183319_feature-table_RS210.qza \
--m-metadata-file processed_data/SCRuB/scrub_meta_pangenome.tsv \
--p-control-idx-column is_control \
--p-sample-type-column sample_type \
--p-well-location-column well_id \
--p-control-order "control blank" \
--o-scrubbed processed_data/SCRuB/183319_RS210_pangenome_scrubbed.qza
```

```
Saved FeatureTable[Frequency] to: processed_data/SCRuB/183319_RS210_pangenome_scrubbed.qza
```
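The WoLr2 and RS210 runs differ only in their input and output paths, so the same step could be driven from one loop instead of two copied shell cells. This is a sketch, not the notebook's own method: `scrub_command` and `run_all` are hypothetical names, the flags are copied verbatim from the cells above, and `dry_run=True` only prints the commands. Passing `"control blank"` as a single list element is equivalent to the quoted shell argument.

```python
import shlex
import subprocess

def scrub_command(table_qza: str, metadata_tsv: str, output_qza: str,
                  control_order: str = "control blank") -> list:
    """Hypothetical helper: argument list for the `qiime SCRuB SCRuB` call used above."""
    return [
        "qiime", "SCRuB", "SCRuB",
        "--i-table", table_qza,
        "--m-metadata-file", metadata_tsv,
        "--p-control-idx-column", "is_control",
        "--p-sample-type-column", "sample_type",
        "--p-well-location-column", "well_id",
        "--p-control-order", control_order,
        "--o-scrubbed", output_qza,
    ]

IN_DIR = "qiita_downloads/qiita13756_prep16010_pangenome"
OUT_DIR = "processed_data/SCRuB"
META = f"{OUT_DIR}/scrub_meta_pangenome.tsv"
# (input table, output table) pairs copied from the two cells above
RUNS = [
    (f"{IN_DIR}/183373_feature-table_WoLr2.qza", f"{OUT_DIR}/183373_WoLr2_pangenome_scrubbed.qza"),
    (f"{IN_DIR}/183319_feature-table_RS210.qza", f"{OUT_DIR}/183319_RS210_pangenome_scrubbed.qza"),
]

def run_all(dry_run: bool = True) -> None:
    for table, out in RUNS:
        cmd = scrub_command(table, META, out)
        print(shlex.join(cmd))  # shell-quoted form, for logging
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError if qiime fails

run_all(dry_run=True)
```

Calling `run_all(dry_run=False)` would execute both runs, which requires `qiime` with q2-SCRuB on the `PATH`.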