## AGP analysis complementing sociodemographic associations with childhood gut microbiomes

This is an analysis of American Gut Project (AGP) data from healthy individuals.

This analysis was run in QIIME2-2021.4.

### Importing data

The AGP data were imported separately from the childhood datasets because sequencing protocols and primer sets differed across studies. Code for importing the AGP data is below:

In [None]:
project=/Users/elizabethmallott/Dropbox/Projects/VMI/children/AGP_comp

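# Import the single-end FASTQ files listed in the manifest.
# Absolute paths are used throughout, so no change of directory is needed.
qiime tools import --type 'SampleData[SequencesWithQuality]' --input-path ${project}/Manifest.txt \
    --output-path ${project}/agp-single-end-demux.qza --input-format SingleEndFastqManifestPhred33V2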

qiime demux summarize --i-data ${project}/agp-single-end-demux.qza \
    --o-visualization ${project}/agp-single-end-demux.qzv
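
The import step above reads a tab-separated manifest in the `SingleEndFastqManifestPhred33V2` layout, which needs a `sample-id` column and an `absolute-filepath` column. The following is a minimal sketch of how such a file could be generated. The helper name `make_manifest` and the assumption that each file is named `<sample-id>.fastq.gz` are illustrative and not taken from the original analysis.

```shell
# Hypothetical helper: print a V2 single-end manifest for a directory of FASTQ files.
# Assumes (for illustration only) that files are named <sample-id>.fastq.gz.
make_manifest() {
    local fastq_dir="$1"
    local f abs
    # Header required by the V2 single-end manifest format
    printf 'sample-id\tabsolute-filepath\n'
    for f in "${fastq_dir}"/*.fastq.gz; do
        [ -e "$f" ] || continue   # skip if the glob matched nothing
        # Resolve the directory so the manifest holds an absolute path
        abs="$(cd "$(dirname "$f")" && pwd)/$(basename "$f")"
        printf '%s\t%s\n' "$(basename "$f" .fastq.gz)" "$abs"
    done
}

# Example usage (the FASTQ directory is a placeholder):
# make_manifest /path/to/agp_fastq > "${project}/Manifest.txt"
```

Sample IDs produced this way would need to match the IDs in `agp-metadata.txt` for the downstream steps to line up.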

### Running DADA2

AGP sequences were denoised with DADA2 separately from the childhood datasets, per the QIIME developers' suggestions, because DADA2 learns an error model specific to each sequencing run.

In [None]:
project=/Users/elizabethmallott/Dropbox/Projects/VMI/children/AGP_comp

qiime dada2 denoise-single --i-demultiplexed-seqs ${project}/agp-single-end-demux.qza \
    --p-trunc-len 150 --p-trim-left 19 --p-n-threads 2 \
    --o-table ${project}/agp-table.qza \
    --o-representative-sequences ${project}/agp-rep-seqs.qza \
    --o-denoising-stats ${project}/agp-dada2-stats.qza

qiime metadata tabulate --m-input-file ${project}/agp-dada2-stats.qza \
    --o-visualization ${project}/agp-dada2-stats.qzv

qiime feature-table summarize --i-table ${project}/agp-table.qza \
    --o-visualization ${project}/agp-table.qzv \
    --m-sample-metadata-file ${project}/agp-metadata.txt
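
As a quick check on the denoising step, the stats artifact can be exported and summarized on the command line. This is a rough sketch only: the helper name `summarize_dada2_stats` is hypothetical, and it assumes the exported `stats.tsv` has `input` and `non-chimeric` columns plus a `#q2:types` directive line, which should be confirmed against the actual export.

```shell
# Hypothetical helper: report how many reads survived denoising.
# Assumes a tab-separated stats file with "input" and "non-chimeric" columns.
summarize_dada2_stats() {
    awk -F'\t' '
        # Header row: remember the position of each named column
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        # Skip directive lines such as "#q2:types"
        /^#/ { next }
        { input += $(col["input"]); kept += $(col["non-chimeric"]) }
        END {
            if (input > 0)
                printf "%d of %d reads retained (%.1f%%)\n", kept, input, 100 * kept / input
        }
    ' "$1"
}

# Example usage (the export directory name is a placeholder):
# qiime tools export --input-path ${project}/agp-dada2-stats.qza \
#     --output-path ${project}/agp-dada2-stats
# summarize_dada2_stats ${project}/agp-dada2-stats/stats.tsv
```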

### Random forest classification

We then applied the random forest classifiers trained on the childhood datasets to the AGP data to test whether the taxa associated with race in children are also associated with race in adults.

In [None]:
project=/Users/elizabethmallott/Dropbox/Projects/VMI/children/

# The paths below are relative to the project directory
cd ${project}

qiime sample-classifier predict-classification \
  --i-table AGP_comp/agp-table.qza \
  --i-sample-estimator merged-table-filtered-race-classifier/sample_estimator.qza \
  --o-predictions merged-table-filtered-race-classifier/agp-predictions.qza \
  --o-probabilities merged-table-filtered-race-classifier/agp-probabilities.qza

qiime sample-classifier confusion-matrix \
  --i-predictions merged-table-filtered-race-classifier/agp-predictions.qza \
  --i-probabilities merged-table-filtered-race-classifier/agp-probabilities.qza \
  --m-truth-file AGP_comp/agp-metadata.txt \
  --m-truth-column Race \
  --o-visualization merged-table-filtered-race-classifier/agp-confusion-matrix.qzv
  
qiime sample-classifier predict-classification \
  --i-table AGP_comp/agp-table.qza \
  --i-sample-estimator merged-table-filtered-312month-race-classifier/sample_estimator.qza \
  --o-predictions merged-table-filtered-312month-race-classifier/agp-predictions.qza \
  --o-probabilities merged-table-filtered-312month-race-classifier/agp-probabilities.qza

qiime sample-classifier confusion-matrix \
  --i-predictions merged-table-filtered-312month-race-classifier/agp-predictions.qza \
  --i-probabilities merged-table-filtered-312month-race-classifier/agp-probabilities.qza \
  --m-truth-file AGP_comp/agp-metadata.txt \
  --m-truth-column Race \
  --o-visualization merged-table-filtered-312month-race-classifier/agp-confusion-matrix.qzv
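
The confusion matrix visualizations report accuracy interactively. For a scriptable check, the predictions can be exported and compared with the `Race` column of the metadata. The sketch below is illustrative: `prediction_accuracy` is a hypothetical helper, and it assumes the exported predictions file is tab-separated with the sample ID in the first column and the predicted class in the second.

```shell
# Hypothetical helper: fraction of samples whose predicted class matches the metadata.
# Arguments: predictions TSV, metadata TSV, name of the truth column.
prediction_accuracy() {
    local predictions="$1" metadata="$2" column="$3"
    awk -F'\t' -v column="$column" '
        # First file (metadata): find the truth column by name, then store values by sample ID
        NR == FNR {
            if (FNR == 1) { for (i = 1; i <= NF; i++) if ($i == column) c = i; next }
            if ($0 !~ /^#/ && c) truth[$1] = $c
            next
        }
        # Second file (predictions): skip the header, then compare with the stored truth
        FNR == 1 { next }
        ($1 in truth) && truth[$1] != "" { n++; if ($2 == truth[$1]) ok++ }
        END { if (n > 0) printf "%d/%d correct (%.1f%%)\n", ok, n, 100 * ok / n }
    ' "$metadata" "$predictions"
}

# Example usage (the export directory name is a placeholder):
# qiime tools export --input-path merged-table-filtered-race-classifier/agp-predictions.qza \
#     --output-path merged-table-filtered-race-classifier/agp-predictions
# prediction_accuracy merged-table-filtered-race-classifier/agp-predictions/predictions.tsv \
#     AGP_comp/agp-metadata.txt Race
```

Samples that are missing from the metadata or have an empty truth value are left out of the denominator in this sketch.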