## AGP analysis complementing sociodemographic associations with childhood gut microbiomes

This is an analysis of American Gut Project (AGP) data from healthy individuals.

This analysis was run in QIIME2-2021.4.

### Importing data

The AGP data were imported separately from the childhood datasets because sequencing protocols and primer sets differed among studies. Code for importing the AGP data is below:

In [None]:
project=/Users/elizabethmallott/Dropbox/Projects/VMI/children/AGP_comp

cd ${project}

qiime tools import --type 'SampleData[SequencesWithQuality]' --input-path ${project}/Manifest.txt \
    --output-path agp-single-end-demux.qza --input-format SingleEndFastqManifestPhred33V2
    
cd

qiime demux summarize --i-data ${project}/agp-single-end-demux.qza \
    --o-visualization ${project}/agp-single-end-demux.qzv
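
The `SingleEndFastqManifestPhred33V2` format expects a tab-separated file whose header columns are `sample-id` and `absolute-filepath`. `Manifest.txt` itself is not shown here, so the following is only a minimal sketch of how such a file could be generated, assuming (hypothetically) one `<sample-id>.fastq.gz` file per sample in a single directory. The `make_manifest` helper and the demo directory are illustrative names, not part of the original analysis.

```shell
# Sketch: build a QIIME 2 single-end manifest (V2 format) from a directory.
# Assumption (not from the original analysis): one <sample-id>.fastq.gz per sample.
make_manifest() {
    fastq_dir=$1
    # Header required by SingleEndFastqManifestPhred33V2
    printf 'sample-id\tabsolute-filepath\n'
    for fq in "$fastq_dir"/*.fastq.gz; do
        [ -e "$fq" ] || continue                  # skip if the glob matched nothing
        sample_id=$(basename "$fq" .fastq.gz)
        abs_dir=$(cd "$(dirname "$fq")" && pwd)   # resolve to an absolute path
        printf '%s\t%s/%s\n' "$sample_id" "$abs_dir" "$(basename "$fq")"
    done
}

# Demo on empty placeholder files in a temporary directory
demo_dir=$(mktemp -d)
touch "$demo_dir/sampleA.fastq.gz" "$demo_dir/sampleB.fastq.gz"
make_manifest "$demo_dir" > "$demo_dir/Manifest.txt"
cat "$demo_dir/Manifest.txt"
```

In a real run, the directory passed to `make_manifest` would be the one holding the AGP FASTQ files, and the output would be written to `${project}/Manifest.txt`.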

### Running DADA2

AGP sequences were denoised with DADA2 separately from the childhood datasets, per the QIIME developers' suggestions.

In [None]:
project=/Users/elizabethmallott/Dropbox/Projects/VMI/children/AGP_comp

qiime dada2 denoise-single --i-demultiplexed-seqs ${project}/agp-single-end-demux.qza \
    --p-trunc-len 150 --p-trim-left 19 --p-n-threads 2 \
    --o-table ${project}/agp-table.qza \
    --o-representative-sequences ${project}/agp-rep-seqs.qza \
    --o-denoising-stats ${project}/agp-dada2-stats.qza

qiime metadata tabulate --m-input-file ${project}/agp-dada2-stats.qza \
    --o-visualization ${project}/agp-dada2-stats.qzv

qiime feature-table summarize --i-table ${project}/agp-table.qza \
    --o-visualization ${project}/agp-table.qzv \
    --m-sample-metadata-file ${project}/agp-metadata.txt
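
With DADA2, reads are truncated at `--p-trunc-len` and `--p-trim-left` bases are then removed from the 5′ end, so reads retained by the settings above should be 150 - 19 = 131 nt long (reads shorter than the truncation length are discarded). The toy example below mimics only that arithmetic on a synthetic read; it does not run DADA2.

```shell
trunc_len=150
trim_left=19

# Synthetic 200-nt read, for illustration only
read_seq=$(printf 'ACGT%.0s' $(seq 1 50))

# Truncate to the first 150 bases, then drop the first 19
truncated=$(printf '%s' "$read_seq" | cut -c1-"$trunc_len")
trimmed=$(printf '%s' "$truncated" | cut -c$((trim_left + 1))-)

final_len=${#trimmed}
echo "final read length: $final_len"
```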

### Taxonomic classification 

Taxonomy was assigned using a naive Bayes classifier trained on the Greengenes 13_8 99% OTU full-length 16S sequence database.

Pre-trained classifiers were obtained from the QIIME 2 website:

Bokulich, N.A., Robeson, M., Dillon, M.R. bokulich-lab/RESCRIPt. Zenodo. http://doi.org/10.5281/zenodo.3891931

Bokulich, N.A., Kaehler, B.D., Rideout, J.R. et al. Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2’s q2-feature-classifier plugin. Microbiome 6, 90 (2018). https://doi.org/10.1186/s40168-018-0470-z

Mitochondrial and chloroplast sequences were removed from the feature table, and the table was then collapsed to the genus level.

In [None]:
project=/Users/elizabethmallott/Dropbox/Projects/VMI/children/AGP_comp

qiime feature-classifier classify-sklearn --i-classifier gg-13-8-99-nb-classifier.qza \
    --i-reads ${project}/agp-rep-seqs.qza --o-classification ${project}/agp-taxonomy.qza
    
qiime taxa filter-table --i-table ${project}/agp-table.qza \
    --i-taxonomy ${project}/agp-taxonomy.qza --p-exclude mitochondria,chloroplast \
    --o-filtered-table ${project}/agp-table-nomito-nochloro.qza

qiime taxa collapse --i-table ${project}/agp-table-nomito-nochloro.qza \
    --i-taxonomy ${project}/agp-taxonomy.qza \
    --p-level 6 --o-collapsed-table ${project}/agp-table-nomito-nochloro-genus.qza

qiime tools export --input-path ${project}/agp-table-nomito-nochloro-genus.qza \
    --output-path ${project}/exported

biom convert -i ${project}/exported/feature-table.biom -o ${project}/feature-table-genus.tsv --to-tsv
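
In Greengenes taxonomy strings the ranks run from kingdom (`k__`) to species (`s__`), so `--p-level 6` corresponds to genus. As a rough illustration of what the filtering and collapsing steps do (this is not QIIME 2's implementation), the sketch below drops any lineage that mentions mitochondria or chloroplast, truncates the remaining taxonomy strings to six ranks, and sums counts for features that share the resulting label. The toy table and its counts are hypothetical.

```shell
# Toy two-column table: taxonomy string <TAB> count (made-up values)
toy=$(mktemp)
printf '%s\t%s\n' \
  'k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Lachnospiraceae; g__Blautia; s__obeum' 10 \
  'k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Lachnospiraceae; g__Blautia; s__producta' 5 \
  'k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Bacteroidaceae; g__Bacteroides; s__' 7 \
  'k__Bacteria; p__Cyanobacteria; c__Chloroplast; o__Streptophyta; f__; g__; s__' 3 \
  > "$toy"

# 1) exclude organelle lineages (case-insensitive substring match)
# 2) keep the first six ranks and sum counts per genus-level label
collapsed=$(grep -viE 'mitochondria|chloroplast' "$toy" |
  awk -F'\t' '{
      n = split($1, rank, "; ")
      label = rank[1]
      for (i = 2; i <= 6 && i <= n; i++) label = label ";" rank[i]
      total[label] += $2
  }
  END { for (l in total) print l "\t" total[l] }' | sort)

echo "$collapsed"
```

Here the two Blautia features end up under a single genus-level label with a combined count, and the chloroplast lineage is removed before collapsing.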

The genus-level adult (AGP) and childhood tables were also merged prior to random forest analysis in R.

In [None]:
project=/Users/elizabethmallott/Dropbox/Projects/VMI/children

qiime feature-table merge \
    --i-tables ${project}/AGP_comp/agp-table-nomito-nochloro-genus.qza \
    ${project}/merged-table-nomito-nochloro-new-genus.qza \
    --o-merged-table ${project}/AGP_comp/merged-adult-child-table-genus.qza

qiime tools export --input-path ${project}/AGP_comp/merged-adult-child-table-genus.qza \
    --output-path ${project}/AGP_comp/exported

biom convert -i ${project}/AGP_comp/exported/feature-table.biom \
    -o ${project}/AGP_comp/merged-adult-child-feature-table-genus.tsv \
    --to-tsv
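
`qiime feature-table merge` defaults to `--p-overlap-method error_on_overlapping_sample`, so the merge above will fail if any sample ID appears in both tables. A quick pre-check on exported TSV files could look like the following sketch. The two toy files stand in for `biom convert --to-tsv` output; their contents and the `sample_ids` helper are hypothetical.

```shell
# Print the sample IDs from the header line of a biom-derived TSV
# (the line beginning with "#OTU ID"), one per line, sorted.
sample_ids() {
    grep -m 1 '^#OTU ID' "$1" | cut -f2- | tr '\t' '\n' | sort
}

# Toy stand-ins for the exported adult and childhood genus tables
tmp=$(mktemp -d)
printf '# Constructed from biom file\n#OTU ID\tadult1\tadult2\ng__Blautia\t4\t9\n' > "$tmp/adult.tsv"
printf '# Constructed from biom file\n#OTU ID\tchild1\tadult2\ng__Blautia\t1\t2\n' > "$tmp/child.tsv"

sample_ids "$tmp/adult.tsv" > "$tmp/adult.ids"
sample_ids "$tmp/child.tsv" > "$tmp/child.ids"

# comm -12 prints only the lines common to both sorted lists
overlap=$(comm -12 "$tmp/adult.ids" "$tmp/child.ids")
if [ -n "$overlap" ]; then
    echo "Sample IDs present in both tables:"
    echo "$overlap"
else
    echo "No overlapping sample IDs"
fi
```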

### Random forest classification

We then applied the random forest models trained on the childhood datasets to the AGP data to test whether the same taxa that are associated with race in children are also associated with race in adults.

In [None]:
project=/Users/elizabethmallott/Dropbox/Projects/VMI/children

qiime sample-classifier predict-classification \
  --i-table ${project}/AGP_comp/agp-table.qza \
  --i-sample-estimator ${project}/merged-table-filtered-race-classifier/sample_estimator.qza \
  --o-predictions ${project}/merged-table-filtered-race-classifier/agp-predictions.qza \
  --o-probabilities ${project}/merged-table-filtered-race-classifier/agp-probabilities.qza

qiime sample-classifier confusion-matrix \
  --i-predictions ${project}/merged-table-filtered-race-classifier/agp-predictions.qza \
  --i-probabilities ${project}/merged-table-filtered-race-classifier/agp-probabilities.qza \
  --m-truth-file ${project}/AGP_comp/agp-metadata.txt \
  --m-truth-column race \
  --o-visualization ${project}/merged-table-filtered-race-classifier/agp-confusion-matrix.qzv
  
qiime sample-classifier predict-classification \
  --i-table ${project}/AGP_comp/agp-table.qza \
  --i-sample-estimator ${project}/merged-table-filtered-race-classifier-90/sample_estimator.qza \
  --o-predictions ${project}/merged-table-filtered-race-classifier-90/agp-predictions.qza \
  --o-probabilities ${project}/merged-table-filtered-race-classifier-90/agp-probabilities.qza

qiime sample-classifier confusion-matrix \
  --i-predictions ${project}/merged-table-filtered-race-classifier-90/agp-predictions.qza \
  --i-probabilities ${project}/merged-table-filtered-race-classifier-90/agp-probabilities.qza \
  --m-truth-file ${project}/AGP_comp/agp-metadata.txt \
  --m-truth-column race \
  --o-visualization ${project}/merged-table-filtered-race-classifier-90/agp-confusion-matrix.qzv
  
qiime sample-classifier predict-classification \
  --i-table ${project}/AGP_comp/agp-table.qza \
  --i-sample-estimator ${project}/merged-table-filtered-312month-race-classifier/sample_estimator.qza \
  --o-predictions ${project}/merged-table-filtered-312month-race-classifier/agp-predictions.qza \
  --o-probabilities ${project}/merged-table-filtered-312month-race-classifier/agp-probabilities.qza

qiime sample-classifier confusion-matrix \
  --i-predictions ${project}/merged-table-filtered-312month-race-classifier/agp-predictions.qza \
  --i-probabilities ${project}/merged-table-filtered-312month-race-classifier/agp-probabilities.qza \
  --m-truth-file ${project}/AGP_comp/agp-metadata.txt \
  --m-truth-column race \
  --o-visualization ${project}/merged-table-filtered-312month-race-classifier/agp-confusion-matrix.qzv
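
The three prediction and confusion-matrix pairs above differ only in the classifier directory, so they could equally be expressed as a loop. The sketch below is one way to do that. The `run` wrapper and the `DRY_RUN` switch are conveniences introduced here (not part of the original analysis) so that the commands can be printed for review before anything is executed.

```shell
project=/Users/elizabethmallott/Dropbox/Projects/VMI/children
DRY_RUN=${DRY_RUN:-1}   # 1 = only print commands; set to 0 to actually run them

# Hypothetical helper: print the command in dry-run mode, otherwise execute it
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s ' "$@"; printf '\n'
    else
        "$@"
    fi
}

n_classifiers=0
for clf in merged-table-filtered-race-classifier \
           merged-table-filtered-race-classifier-90 \
           merged-table-filtered-312month-race-classifier; do
    out=${project}/${clf}

    # Apply the child-trained estimator to the AGP table
    run qiime sample-classifier predict-classification \
        --i-table "${project}/AGP_comp/agp-table.qza" \
        --i-sample-estimator "${out}/sample_estimator.qza" \
        --o-predictions "${out}/agp-predictions.qza" \
        --o-probabilities "${out}/agp-probabilities.qza"

    # Compare predictions with the race column in the AGP metadata
    run qiime sample-classifier confusion-matrix \
        --i-predictions "${out}/agp-predictions.qza" \
        --i-probabilities "${out}/agp-probabilities.qza" \
        --m-truth-file "${project}/AGP_comp/agp-metadata.txt" \
        --m-truth-column race \
        --o-visualization "${out}/agp-confusion-matrix.qzv"

    n_classifiers=$((n_classifiers + 1))
done
```

Running it with `DRY_RUN=0` in an activated QIIME 2 environment should issue the same six commands as the explicit version.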

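In reverse, random forest classifiers were trained on the AGP (adult) data to predict race, once with the default test split and once with 90% of samples held out for testing (`--p-test-size 0.9`).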

In [None]:
project=/Users/elizabethmallott/Dropbox/Projects/VMI/children

qiime sample-classifier classify-samples \
  --i-table ${project}/AGP_comp/agp-table.qza --m-metadata-file ${project}/AGP_comp/agp-metadata.txt \
  --m-metadata-column race --p-missing-samples ignore --p-optimize-feature-selection \
  --p-parameter-tuning --p-estimator RandomForestClassifier --p-random-state 123 \
  --output-dir ${project}/AGP_comp/agp-table-race-classifier
  
qiime sample-classifier classify-samples \
  --i-table ${project}/AGP_comp/agp-table.qza --m-metadata-file ${project}/AGP_comp/agp-metadata.txt \
  --m-metadata-column race --p-missing-samples ignore --p-optimize-feature-selection \
  --p-parameter-tuning --p-estimator RandomForestClassifier --p-random-state 123 --p-test-size 0.9 \
  --output-dir ${project}/AGP_comp/agp-table-race-classifier-90