# Tutorial Assignment 1: Atacama soil microbiome analysis

For this homework, you will analyze soil samples from the Atacama Desert in northern Chile. Despite the region's extreme aridity, a variety of microbes still live in its soil. The soil microbiomes you will analyze come from two east-west transects, Baquedano and Yungay, across which average soil relative humidity is positively correlated with elevation (higher elevations are less arid and thus have higher average soil relative humidity). Along these transects, pits were dug at each site and soil samples were collected from three depths in each pit.

Additional logistics:
- Please perform the analysis in bio-datahub: open a terminal and use the git clone command to copy the entire GitHub repository. The "TutorialHW1.ipynb" notebook is located in the week5 folder.
- When you open the TutorialHW1.ipynb notebook, remember to select the "Python [conda env:qiime2]" kernel before running any commands.
- Some questions below will require you to add your own cell blocks to this file and write a few commands.
- Please include this file as well as a PDF file in your submission on bCourses. You can obtain a PDF with File > Download as > PDF via LaTeX.


<br/>**This tutorial was adapted from the Atacama soil microbiome tutorial in the QIIME 2 documentation:** https://docs.qiime2.org/2021.8/tutorials/atacama-soils/

In [None]:
import qiime2 as q2

In [None]:
!mkdir qiime2-atacama-hw
!mkdir qiime2-atacama-hw/emp-paired-end-sequences
%cd qiime2-atacama-hw

### Import required data

In [None]:
!wget \
  -O "sample-metadata.tsv" \
  "https://data.qiime2.org/2021.8/tutorials/atacama-soils/sample_metadata.tsv"
!wget \
  -O "emp-paired-end-sequences/forward.fastq.gz" \
  "https://data.qiime2.org/2021.8/tutorials/atacama-soils/10p/forward.fastq.gz"
!wget \
  -O "emp-paired-end-sequences/reverse.fastq.gz" \
  "https://data.qiime2.org/2021.8/tutorials/atacama-soils/10p/reverse.fastq.gz"
!wget \
  -O "emp-paired-end-sequences/barcodes.fastq.gz" \
  "https://data.qiime2.org/2021.8/tutorials/atacama-soils/10p/barcodes.fastq.gz"
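Later cells refer to two metadata columns by name, `barcode-sequence` and `vegetation`. The following is a minimal sketch, not part of the original tutorial, of how one might check that those columns are present before running the longer steps. The helper names `load_q2_metadata` and `missing_columns` are hypothetical, and the small table is made up for illustration; the sketch assumes the first column holds sample IDs and that an optional `#q2:types` directive row may follow the header.

```python
import csv
import io


def load_q2_metadata(handle):
    """Read a QIIME 2 style metadata TSV; return (column names, sample rows)."""
    reader = csv.DictReader(handle, delimiter="\t")
    columns = list(reader.fieldnames)
    id_col = columns[0]
    # Skip the optional '#q2:types' directive row, which is not a sample.
    rows = [row for row in reader if not row[id_col].startswith("#q2:")]
    return columns, rows


def missing_columns(columns, required):
    """Return the required column names that are absent from the metadata."""
    return [name for name in required if name not in columns]


# Made-up metadata for illustration only (not the real Atacama values).
example = io.StringIO(
    "sample-id\tbarcode-sequence\tvegetation\n"
    "#q2:types\tcategorical\tcategorical\n"
    "S1\tAAAACCCCGGGG\tno\n"
    "S2\tTTTTGGGGCCCC\tyes\n"
)
columns, rows = load_q2_metadata(example)
print(len(rows), missing_columns(columns, ["barcode-sequence", "vegetation"]))
```

To run the same check on the downloaded file, one could pass `open("sample-metadata.tsv", newline="")` in place of the `io.StringIO` example.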


### Prepare the data: import as a QIIME 2 artifact and demultiplex

In [None]:
!qiime tools import \
   --type EMPPairedEndSequences \
   --input-path emp-paired-end-sequences \
   --output-path emp-paired-end-sequences.qza

In [None]:
!qiime demux emp-paired \
  --m-barcodes-file sample-metadata.tsv \
  --m-barcodes-column barcode-sequence \
  --p-rev-comp-mapping-barcodes \
  --i-seqs emp-paired-end-sequences.qza \
  --o-per-sample-sequences demux-full.qza \
  --o-error-correction-details demux-details.qza
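The `--p-rev-comp-mapping-barcodes` flag asks the demultiplexer to reverse-complement the barcodes listed in the metadata before matching them against the barcode reads. As a rough illustration of what that transformation does to a single barcode, here is a small sketch; `reverse_complement` is a hypothetical helper written for this example and the barcode is made up.

```python
# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Made-up 12-nt barcode, for illustration only.
print(reverse_complement("AACCGGTTACGT"))  # ACGTAACCGGTT
```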

Use what you've learned from the in-class tutorial to fill in appropriate values for the --p-trim-left-f, --p-trim-left-r, --p-trunc-len-f, and --p-trunc-len-r parameters.
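One common way to reason about truncation is to look at per-position quality scores and cut where quality starts to fall, while keeping enough bases for the forward and reverse reads to overlap when merged. The sketch below illustrates that reasoning only; it does not give the answer for this dataset. The helper names, the quality threshold of 25, and all numbers are assumptions chosen for the example.

```python
def suggest_trunc_len(median_quality, min_quality=25):
    """Number of leading positions to keep before the median quality
    first drops below min_quality (an arbitrary, assumed threshold)."""
    for position, quality in enumerate(median_quality):
        if quality < min_quality:
            return position
    return len(median_quality)


def overlap_after_truncation(trunc_len_f, trunc_len_r, amplicon_len):
    """Bases shared by the truncated forward and reverse reads.
    A value near or below zero means the pair cannot be merged."""
    return trunc_len_f + trunc_len_r - amplicon_len


# Made-up median quality scores for the first 8 positions of a read.
forward_medians = [38, 38, 37, 36, 34, 30, 24, 20]
print(suggest_trunc_len(forward_medians))  # 6

# Made-up lengths: two 150-nt truncated reads over a 250-nt amplicon.
print(overlap_after_truncation(150, 150, 250))  # 50
```

In practice the quality profile comes from the interactive quality plot covered in the in-class tutorial, and the threshold is a judgment call rather than a fixed rule.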

In [None]:
%%time
!qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-full.qza \
  --p-trim-left-f \
  --p-trim-left-r \
  --p-trunc-len-f \
  --p-trunc-len-r \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza \
  --o-denoising-stats stats-dada2.qza

In [None]:
!qiime phylogeny align-to-tree-mafft-fasttree \
  --i-sequences rep-seqs.qza \
  --o-alignment aligned-rep-seqs.qza \
  --o-masked-alignment masked-aligned-rep-seqs.qza \
  --o-tree unrooted-tree.qza \
  --o-rooted-tree rooted-tree.qza

Use what you've learned from the in-class tutorial to fill in an appropriate value for --p-sampling-depth.
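Choosing a sampling depth is a trade-off: samples with fewer sequences than the chosen depth are dropped, while every retained sample is subsampled down to exactly that depth. The following sketch shows the trade-off on made-up per-sample counts; `retention_summary` is a hypothetical helper, and the real counts should be read from the feature table summary used in class.

```python
def retention_summary(counts_per_sample, sampling_depth):
    """Summarize what rarefying to sampling_depth would keep and drop."""
    kept = [s for s, n in counts_per_sample.items() if n >= sampling_depth]
    return {
        "samples_kept": len(kept),
        "samples_dropped": len(counts_per_sample) - len(kept),
        # Each kept sample contributes exactly sampling_depth sequences.
        "sequences_kept": sampling_depth * len(kept),
    }


# Made-up sequence counts per sample, for illustration only.
counts = {"A": 1200, "B": 900, "C": 300, "D": 50}
for depth in (300, 900):
    print(depth, retention_summary(counts, depth))
```

With these invented numbers, a depth of 300 keeps three samples at 300 sequences each, while a depth of 900 keeps more sequences per sample but only two samples.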

In [None]:
# No mkdir is needed here: qiime creates --output-dir itself and refuses to write into a directory that already exists

In [None]:
!qiime diversity core-metrics-phylogenetic \
  --i-phylogeny rooted-tree.qza \
  --i-table table.qza \
  --p-sampling-depth \
  --m-metadata-file sample-metadata.tsv \
  --output-dir core-metrics-results-atacama

### ANCOM

In [None]:
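# Add a pseudocount to every count so that no zeros remain (ANCOM works on log-ratios, which are undefined for zero counts)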
!qiime composition add-pseudocount \
  --i-table table.qza \
  --o-composition-table comp-table.qza
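To see why zeros are a problem for a log-ratio method, consider the small sketch below. It is an illustration of the idea only, not the plugin's implementation; the helper names and the counts are made up, and the pseudocount of 1 is assumed to match the command's default.

```python
import math


def add_pseudocount(counts, pseudocount=1):
    """Add the same small constant to every cell of a samples x features table."""
    return [[value + pseudocount for value in row] for row in counts]


def log_ratio(row, i, j):
    """Natural log of the ratio of feature i to feature j in one sample."""
    return math.log(row[i] / row[j])


# Made-up counts: rows are samples, columns are two features.
counts = [[10, 0], [5, 5]]

# On the raw counts, log_ratio(counts[0], 0, 1) would divide by zero
# and raise ZeroDivisionError, so the comparison cannot be computed.
shifted = add_pseudocount(counts)
print(shifted)  # [[11, 1], [6, 6]]
print([log_ratio(row, 0, 1) for row in shifted])
```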

Note that the ANCOM step below takes ~30 min to run.

In [None]:
!qiime composition ancom \
  --i-table comp-table.qza \
  --m-metadata-file sample-metadata.tsv \
  --m-metadata-column vegetation \
  --o-visualization ancom-vegetation.qzv

### Taxonomic analysis
Download the pre-trained classifier and use it to assign taxonomy to the representative sequences from our samples.

In [None]:
!wget \
  -O "gg-13-8-99-515-806-nb-classifier.qza" \
  "https://data.qiime2.org/2021.8/common/gg-13-8-99-515-806-nb-classifier.qza"

In [None]:
!qiime feature-classifier classify-sklearn \
  --i-classifier gg-13-8-99-515-806-nb-classifier.qza \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy.qza

In [None]:
!qiime metadata tabulate \
  --m-input-file taxonomy.qza \
  --o-visualization taxonomy.qzv
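When reading the taxonomy table, it can help to split each assignment into named ranks. The sketch below assumes the assignments are Greengenes-style strings, with ranks separated by semicolons and prefixed by `k__`, `p__`, and so on; `parse_taxon` is a hypothetical helper and the example strings are made up rather than taken from this dataset.

```python
RANK_PREFIXES = {
    "k": "kingdom", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species",
}


def parse_taxon(taxon):
    """Split a Greengenes-style taxonomy string into {rank: name},
    skipping ranks whose name is empty (for example 'g__')."""
    ranks = {}
    for part in taxon.split(";"):
        prefix, sep, name = part.strip().partition("__")
        if sep and prefix in RANK_PREFIXES and name:
            ranks[RANK_PREFIXES[prefix]] = name
    return ranks


# Made-up assignment strings, for illustration only.
print(parse_taxon("k__Bacteria; p__Actinobacteria; c__; o__"))
print(parse_taxon("Unassigned"))  # {}
```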