# Tutorial Assignment 1: Atacama soil microbiome analysis

For this homework, we will be working with soil samples from the Atacama Desert in northern Chile. The Atacama Desert is one of the most arid locations on Earth, with some areas receiving less than a millimeter of rain per decade. Despite this extreme aridity, there are microbes living in the soil. The soil microbiomes profiled in this study follow two east-west transects, Baquedano and Yungay, across which average soil relative humidity is positively correlated with elevation (higher elevations are less arid and thus have higher average soil relative humidity). Along these transects, pits were dug at each site and soil samples were collected from three depths in each pit.

**Additional notes**:
- Continue to use the QIIME2 kernel at bio-datahub. As in the tutorial, use `git clone` to clone the tutorial 1 repo, then work in the "TutorialHW1.ipynb" notebook.
- Remember to select the "Python [conda env:qiime2]" kernel.
- Some questions below will require you to add your own cells to this notebook and write a few commands.
- For this homework, upload both a PDF of your answers and your completed .ipynb.

<br/>**This tutorial was adapted from the Atacama soil microbiome tutorial in the QIIME2 documentation:** https://docs.qiime2.org/2021.8/tutorials/atacama-soils/

## 1) Load the dataset into QIIME

**load the metadata**

In [None]:
from qiime2 import Metadata
from urllib import request
## download the metadata file from the online source
url = 'https://data.qiime2.org/2021.8/tutorials/atacama-soils/sample_metadata.tsv'
fn = 'sample-metadata.tsv'
request.urlretrieve(url, fn)
## load into qiime2
sample_metadata_md = Metadata.load(fn)

**load the sequences**

In [None]:
import os
from urllib import request
## again, use urllib's request module to download from the online source
## three URLs this time: forward reads, reverse reads, and barcodes
urls = ['https://data.qiime2.org/2021.8/tutorials/atacama-soils/10p/forward.fastq.gz',
        'https://data.qiime2.org/2021.8/tutorials/atacama-soils/10p/reverse.fastq.gz',
        'https://data.qiime2.org/2021.8/tutorials/atacama-soils/10p/barcodes.fastq.gz']
fns = ['emp-paired-end-sequences/forward.fastq.gz',
       'emp-paired-end-sequences/reverse.fastq.gz',
       'emp-paired-end-sequences/barcodes.fastq.gz']
os.makedirs('emp-paired-end-sequences', exist_ok=True)

for url, fn in zip(urls, fns):
    request.urlretrieve(url, fn)

## load into an artifact
from qiime2 import Artifact
emp_paired_end_sequences = Artifact.import_data(
    'EMPPairedEndSequences',
    'emp-paired-end-sequences',
)
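If `Artifact.import_data` complains about the `emp-paired-end-sequences` directory, a common cause is a failed or partial download. The helper below is a hypothetical pre-flight check (not part of QIIME2): it only confirms that the three file names written by the download loop exist, decompress as gzip, and start like FASTQ records.

```python
import gzip
import os

# File names written by the download loop into the EMP directory
EXPECTED_FILES = ("forward.fastq.gz", "reverse.fastq.gz", "barcodes.fastq.gz")


def check_emp_dir(path):
    """Hypothetical pre-flight check: return a list of problems (empty if OK)."""
    problems = []
    for name in EXPECTED_FILES:
        fp = os.path.join(path, name)
        if not os.path.isfile(fp):
            problems.append(f"missing: {name}")
            continue
        # A truncated or non-gzip download usually fails when decompressing
        try:
            with gzip.open(fp, "rt") as fh:
                first = fh.readline()
        except (OSError, EOFError):
            problems.append(f"not a readable gzip file: {name}")
            continue
        # Every FASTQ record starts with an '@' header line
        if not first.startswith("@"):
            problems.append(f"does not look like FASTQ: {name}")
    return problems
```

For example, `print(check_emp_dir('emp-paired-end-sequences') or 'all three files look OK')` before re-running the import.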

## 2) Demultiplex the sequences and process reads into ASVs

**demultiplex**

This step differs slightly from the tutorial because we have paired-end data.

In [None]:
import qiime2.plugins.demux.actions as demux_actions
# pull out the barcode-sequence column from the metadata
barcode_seqs = sample_metadata_md.get_column('barcode-sequence')
## now feed it to qiime's demultiplexing function
demux, demux_details = demux_actions.emp_paired(
    seqs=emp_paired_end_sequences,
    barcodes=barcode_seqs,
    rev_comp_mapping_barcodes=True  # metadata barcodes are reverse complements of the read barcodes
)
## summarize the demultiplexing results
demux_viz, = demux_actions.summarize(
    data=demux,
)
## save the visualization so it can be opened in QIIME 2 View (https://view.qiime2.org)
demux_viz.save('demux.qzv')
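`rev_comp_mapping_barcodes=True` tells `emp_paired` that the barcodes in the metadata are the reverse complement of the sequences in `barcodes.fastq.gz`. The pure-Python sketch below illustrates that lookup with exact string matching on toy data; it is not QIIME2's implementation, and the sample IDs and barcodes are made up.

```python
# Complement table for the four unambiguous DNA bases
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def assign_reads(read_barcodes, sample_barcodes, rev_comp_mapping=True):
    """Map each read's barcode to a sample ID (None if there is no exact match).

    sample_barcodes: {sample_id: barcode as written in the metadata}
    """
    lookup = {}
    for sample_id, bc in sample_barcodes.items():
        # Flip the metadata barcode so it matches the orientation of the reads
        key = reverse_complement(bc) if rev_comp_mapping else bc
        lookup[key] = sample_id
    return [lookup.get(bc) for bc in read_barcodes]


# Hypothetical sample IDs and 12-nt barcodes
samples = {"S1": "GATTACAGATTA", "S2": "CCCAAAGGGTTA"}
reads = ["TAATCTGTAATC", "TAACCCTTTGGG", "GATTACAGATTA"]
print(assign_reads(reads, samples))  # ['S1', 'S2', None]
```

The third read carries S1's barcode in the metadata's orientation, so it only matches when `rev_comp_mapping=False`; that mismatch is what the flag corrects for.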

**Use what you learned in the tutorial to select trimming parameters for DADA2**

Remember that because the reads are paired-end, the forward and reverse reads each get their own trimming and truncation parameters.

*Note: DADA2 can be a little slow. Go make a coffee. If this takes >>30 min, contact Mark*

In [None]:
import qiime2.plugins.dada2.actions as dada2_actions
# input: the demultiplexed sequences, trimming parameters
# output: a feature table (samples x ASVs), ASV sequences, and denoising stats
#         (per-sample read counts retained after each filtering and denoising step)
table, rep_seqs, stats = dada2_actions.denoise_paired(
    demultiplexed_seqs=demux,
    trim_left_f=13,
    trim_left_r=13,
    trunc_len_f=150,
    trunc_len_r=150,
)
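One rule of thumb when reading the interactive quality plot in `demux.qzv` is to truncate each read where the median quality score first drops below some threshold. The helper below is a hypothetical sketch of that rule: the per-position medians would have to be read off the plot by hand, and the threshold of 30 is an assumption, not a DADA2 requirement.

```python
def choose_trunc_len(median_quals, threshold=30, trim_left=0):
    """Hypothetical helper: return a truncation length for one read direction.

    median_quals: per-position median Phred scores, read position 1 first.
    Returns the number of positions kept (counted from the start of the read)
    before the first position at or after trim_left whose median is below
    threshold, or the full length if quality never drops that low.
    """
    for i in range(trim_left, len(median_quals)):
        if median_quals[i] < threshold:
            if i == trim_left:
                # Nothing would be left after trimming; try a lower threshold
                raise ValueError("median quality is below threshold right after trim_left")
            return i  # keeps 0-based positions 0..i-1, i.e. a length of i
    return len(median_quals)


# Hypothetical medians: high quality for 140 positions, then a drop
forward_medians = [36] * 140 + [27] * 10
print(choose_trunc_len(forward_medians, trim_left=13))
```

With paired-end reads, also check that the truncated forward and reverse reads still overlap enough for DADA2 to merge them; truncating too aggressively can cause most pairs to fail merging.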

## 3) Build a tree and taxonomize 

*From here down there are only placeholder cells. You should be able to reuse your code from the tutorial.*

**Step 1: tree building**

**Step 2: taxonomizing**

In [None]:
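## Download a pre-trained Greengenes 13_8 (99%) naive Bayes classifier for the 515F/806R region
## (pre-trained classifiers generally need to match the scikit-learn version of your QIIME2 release)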
url = 'https://moving-pictures-tutorial.readthedocs.io/en/latest/data/moving-pictures/gg-13-8-99-515-806-nb-classifier.qza'
fn = 'gg-13-8-99-515-806-nb-classifier.qza'
request.urlretrieve(url, fn)
gg_13_8_99_515_806_nb_classifier = Artifact.load(fn)
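The classifier assigns each ASV a Greengenes-style lineage string such as `k__Bacteria; p__Actinobacteria; c__...`, with ranks separated by semicolons. The plain-Python sketch below, using a made-up example lineage, shows how such a string maps onto the numbered levels that later steps refer to (level 3 = class, level 6 = genus).

```python
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def parse_lineage(lineage):
    """Split a Greengenes-style lineage into {rank: name}, dropping the
    'x__' prefixes; ranks left empty by the classifier (e.g. 'g__') map to ''."""
    parts = [p.strip() for p in lineage.split(";")]
    parsed = {}
    for rank, part in zip(RANKS, parts):
        # 'p__Actinobacteria' -> 'Actinobacteria'; 'g__' -> ''
        parsed[rank] = part.split("__", 1)[1] if "__" in part else part
    return parsed


def truncate_lineage(lineage, level):
    """Keep only the first `level` ranks (level 3 = class, level 6 = genus)."""
    parts = [p.strip() for p in lineage.split(";")]
    return ";".join(parts[:level])


# Hypothetical lineage for one ASV
lin = "k__Bacteria; p__Actinobacteria; c__Thermoleophilia; o__Solirubrobacterales; f__; g__; s__"
print(parse_lineage(lin)["class"], truncate_lineage(lin, 3))
```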

**Step 3: taxa barplot**
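A taxa barplot shows, for each sample, the relative frequency of each taxon at the chosen level: the taxon's count divided by the sample's total. The sketch below computes those fractions for one sample from a hypothetical `{taxon: count}` dictionary; the barplot visualization does this for every sample and lets you switch levels interactively.

```python
def relative_abundance(counts):
    """Convert {taxon: count} for one sample into {taxon: fraction},
    sorted from most to least abundant."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    fractions = {taxon: n / total for taxon, n in counts.items()}
    return dict(sorted(fractions.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical phylum-level counts for a single sample
sample = {"p__Actinobacteria": 600, "p__Chloroflexi": 250, "p__Proteobacteria": 150}
for taxon, frac in relative_abundance(sample).items():
    print(f"{taxon}: {frac:.1%}")
```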

## 4) Diversity analyses

**Step 1: create an alpha rarefaction plot**
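An alpha rarefaction plot repeatedly subsamples each sample's reads without replacement to a series of depths and records a diversity metric at each depth; where the curves level off suggests a sampling depth for the next step. The sketch below reproduces that idea for one sample and the observed-features metric in plain Python; the counts, depths, and number of iterations are made-up values.

```python
import random


def rarefy(counts, depth, rng):
    """Subsample a list of per-feature counts to `depth` reads without replacement."""
    # Expand counts into one entry per read, labelled by feature index
    pool = [i for i, n in enumerate(counts) for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the sample's total reads")
    sub = [0] * len(counts)
    for i in rng.sample(pool, depth):
        sub[i] += 1
    return sub


def observed_features(counts):
    """Number of features with at least one read."""
    return sum(1 for n in counts if n > 0)


def rarefaction_curve(counts, depths, iterations=10, seed=0):
    """Mean observed features at each depth, averaged over iterations."""
    rng = random.Random(seed)
    curve = []
    for d in depths:
        values = [observed_features(rarefy(counts, d, rng)) for _ in range(iterations)]
        curve.append((d, sum(values) / iterations))
    return curve


sample_counts = [500, 200, 100, 50, 20, 10, 5, 1]  # hypothetical ASV counts, 886 reads
print(rarefaction_curve(sample_counts, depths=[10, 100, 500, 886]))
```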

**Step 2: calculate all diversity metrics on the downsampled dataset**
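After rarefying every sample to one sampling depth, the core diversity step computes several alpha and beta diversity metrics. Two of the non-phylogenetic ones, Shannon entropy and Bray-Curtis dissimilarity, are simple enough to write out. The sketch below uses base-2 logarithms for Shannon, which is an assumption here; check which base your QIIME2 version reports.

```python
import math


def shannon(counts, base=2):
    """Shannon entropy H = -sum(p_i * log(p_i)) over features with p_i > 0."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total, base) for n in counts if n > 0)


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum(|u_i - v_i|) / sum(u_i + v_i)."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den


# Two hypothetical samples already rarefied to the same depth (20 reads)
a = [5, 5, 5, 5]
b = [10, 5, 5, 0]
print(shannon(a), shannon(b), bray_curtis(a, b))
```

Four equally abundant features give H = log2(4) = 2 bits, while the uneven sample `b` scores lower; Bray-Curtis is 0 for identical samples and 1 for samples sharing no features.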

**Step 3: extract the metrics that we're interested in**

**Step 4: test for associations with metadata**
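For a continuous metadata column such as elevation, one way to test for an association with an alpha diversity metric is a Spearman rank correlation, which is the default method of the diversity plugin's `alpha_correlation` action. The plain-Python sketch below computes Spearman's rho on hypothetical values; it omits the p-value that QIIME2 reports alongside rho.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(rx, ry))
    sx = sum((p - mx) ** 2 for p in rx) ** 0.5
    sy = sum((q - my) ** 2 for q in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical elevations (m) and Shannon values for five samples
elevation = [1000, 1500, 2000, 2500, 3000]
shannon_values = [2.1, 2.0, 3.4, 4.0, 4.2]
print(round(spearman(elevation, shannon_values), 3))
```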

## 5) Differential abundance testing

**Step 1: create genus and class-level tables**
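Collapsing a feature table to class or genus level sums the counts of all ASVs whose lineages agree down to that level (level 3 = class and level 6 = genus in Greengenes' seven-rank lineages). A plain-Python sketch of that aggregation for one sample, with hypothetical ASV IDs and lineages, follows. It keeps lineages shorter than the requested level as they are, which may differ from how QIIME2 labels ASVs left unclassified at that level.

```python
from collections import defaultdict


def collapse(counts, taxonomy, level):
    """Sum per-ASV counts by lineage truncated to `level` ranks.

    counts:   {asv_id: count} for one sample
    taxonomy: {asv_id: 'k__...; p__...; ...'} lineage strings
    """
    collapsed = defaultdict(int)
    for asv, n in counts.items():
        ranks = [r.strip() for r in taxonomy[asv].split(";")]
        collapsed[";".join(ranks[:level])] += n
    return dict(collapsed)


# Hypothetical ASVs: asv1 and asv3 share a class, asv2 does not
example_taxonomy = {
    "asv1": "k__Bacteria; p__Actinobacteria; c__Actinobacteria",
    "asv2": "k__Bacteria; p__Actinobacteria; c__Thermoleophilia",
    "asv3": "k__Bacteria; p__Actinobacteria; c__Actinobacteria",
}
example_counts = {"asv1": 5, "asv2": 3, "asv3": 2}
print(collapse(example_counts, example_taxonomy, level=3))
```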

**Step 2: run ANCOM**
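ANCOM works on log-ratios between features, so zero counts have to be replaced first; in QIIME2 this is usually done with the composition plugin's `add_pseudocount` action before calling `ancom`. The sketch below shows only that preprocessing idea, adding a pseudocount of 1 (assumed here) and computing the log-ratio of one feature pair per sample. It is not an implementation of ANCOM's W statistic, which counts how many of a feature's pairwise log-ratios differ significantly between groups.

```python
import math


def add_pseudocount(counts, pseudocount=1):
    """Add a constant to every count so log-ratios are defined for zeros."""
    return [n + pseudocount for n in counts]


def log_ratio(sample, i, j):
    """Natural-log ratio of feature i to feature j within one sample."""
    return math.log(sample[i] / sample[j])


# Hypothetical counts (features in columns) for two samples
samples = [[0, 10, 5], [3, 0, 12]]
shifted = [add_pseudocount(s) for s in samples]
print([round(log_ratio(s, 0, 2), 3) for s in shifted])
```

Because a ratio within a sample does not change when all of that sample's counts are scaled by the same factor, these log-ratios do not depend on sequencing depth.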