# QIIME 2 Workflow for OGT paper:
Adapted from Vanja's Qiime2_Workflow_MicrobialMatt_August2020 python notebook



# This workflow was generated from LangilleLab workflow and Qiime Tutorials

## Qiime 2 workflow

For this particular dataset (Argonne), I am using primers that differ from the EMP primers.

I used the QIIME 2 and Langille Lab tutorials to piece this together. There is definitely room for improvement.

QIIME 2 uses two types of files that contain the data and metadata for the analysis (.qza and .qzv files). To see what type of data a file contains, use the command qiime tools peek filename.qza, which reports basic information (name, universally unique identifier, data type and data format). The raw data in these files can be accessed using the command qiime tools export.
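Because .qza and .qzv files are ordinary zip archives, the information that qiime tools peek reports can also be read directly. The following is a minimal Python sketch, assuming the usual archive layout of a top-level UUID directory that contains a metadata.yaml with uuid, type and format entries. The helper name peek_artifact is hypothetical and not part of QIIME 2.

```python
import zipfile


def peek_artifact(path):
    """Return the uuid/type/format entries stored in a .qza or .qzv archive."""
    with zipfile.ZipFile(path) as archive:
        # Assumption: metadata.yaml sits one level down, inside the UUID directory
        name = next(
            n for n in archive.namelist()
            if n.count("/") == 1 and n.endswith("/metadata.yaml")
        )
        text = archive.read(name).decode("utf-8")
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info
```

For everyday use qiime tools peek remains the simpler route; a helper like this is mainly handy when many artifacts need to be inventoried from a script.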

Updated and run in June 2020 using qiime2-2020.2 to process Danielle's ECHO 16S samples (130 of them).


In [7]:
'''Activate qiime env'''
# conda activate qiime2-2020.6 prior to opening jupyter notebook

!qiime --version

q2cli version 2020.6.0
Run `qiime info` for more version details.


### “EMP protocol” multiplexed paired-end fastq

Format description

one forward.fastq.gz file that contains the forward sequence reads,

one reverse.fastq.gz file that contains the reverse sequence reads,

one barcodes.fastq.gz file that contains the associated barcode reads

In [13]:
'''Importing raw sequence files based on "EMP protocol" \
multiplexed paired-end fastq'''

# NileshMeta_Undetermined_S0_L001_R1_001.fastq.gz --> forward.fastq.gz
# NileshMeta_Undetermined_S0_L001_R2_001.fastq.gz --> reverse.fastq.gz
# NileshMehta_Undetermined_S0_L001_I1_001.fastq.gz --> barcodes.fastq.gz


!qiime tools import \
   --type EMPPairedEndSequences \
   --input-path CHBoral_PairedEndSeq_rawdata/ \
   --output-path CHBoral_PairedEndSeq.qza
   

Imported CHBoral_PairedEndSeq_rawdata/ as EMPPairedEndDirFmt to CHBoral_PairedEndSeq.qza


## Demultiplex:
#### We must demultiplex these reads to determine which sample each read came from.

Reads were demultiplexed with demux emp-paired, using the reverse primer sequence that has the barcode embedded.

In [4]:
'''Demultiplex paired end sequences'''
# metadata file was renamed and reformatted from SampleInfo_Nilesh Mehta 2Feb2015.xlsx

!qiime demux emp-paired \
  --m-barcodes-file mappingfile_NileshMehta_with_Metadata_062715_corrected.txt \
  --m-barcodes-column BarcodeSequence \
  --p-rev-comp-mapping-barcodes \
  --i-seqs CHBoral_PairedEndSeq.qza \
  --o-per-sample-sequences CHBoral_PairedEndSeq_demux-full.qza \
  --o-error-correction-details CHBoral_PairedEndSeq_demux-details.qza


#  --p-rev-comp-mapping-barcodes True \


# !qiime demux emp-paired \
#   --m-barcodes-file MicrobialMat.tsv \
#   --m-barcodes-column BarcodeSequence \
#   --p-rev-comp-mapping-barcodes \
#   --i-seqs emp-paired-end-sequences.qza \
#   --o-per-sample-sequences demux-full.qza \
#   --o-error-correction-details demux-details.qza

Saved SampleData[PairedEndSequencesWithQuality] to: CHBoral_PairedEndSeq_demux-full.qza
Saved ErrorCorrectionDetails to: CHBoral_PairedEndSeq_demux-details.qza


In [6]:
'''Summarize demultiplexed outputs'''

!qiime demux summarize \
--i-data CHBoral_PairedEndSeq_demux-full.qza \
--o-visualization CHBoral_PairedEndSeq_demux-full_qualities.qzv


# !qiime demux summarize \
# --i-data demux-full.qza \
# --o-visualization qualities.qzv

Saved Visualization to: CHBoral_PairedEndSeq_demux-full_qualities.qzv


In [18]:
'''View summary of demultiplexed qualities'''

!qiime tools view CHBoral_PairedEndSeq_demux-full_qualities.qzv


Press the 'q' key, Control-C, or Control-D to quit. This view may no longer be accessible or work correctly after quitting.

#### Based on the FASTQ stats, all sequences are >= 248 bp, which should allow the DADA2 run to succeed. This would provide ~31 bp of overlap between the 341F and 806R reads.

However, given how slowly DADA2 runs and that it requires an overlap region of at least 12 bp between the forward and reverse reads, we will first subsample to confirm that the run succeeds.
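The fraction passed to subsample-paired follows from the depth you are aiming for: fraction = target reads per sample / mean reads per sample. A small sketch of that arithmetic follows. The mean depth of 125,000 is a hypothetical value, chosen only because it makes a ~10,000-read target come out at the 0.08 used in the next cell; it is not a figure read from the demux summary.

```python
def subsample_fraction(target_reads, mean_reads_per_sample):
    """Fraction of reads to keep for roughly target_reads per sample."""
    if mean_reads_per_sample <= 0:
        raise ValueError("mean_reads_per_sample must be positive")
    # Never ask for more than the full dataset
    return min(1.0, target_reads / mean_reads_per_sample)


# Hypothetical mean depth of 125,000 reads per sample
print(subsample_fraction(10_000, 125_000))  # 0.08
```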

In [10]:
"Subsample to ensure DADA2 run is successful"

# will result in ~10,000 seqs per sample

!qiime demux subsample-paired \
  --i-sequences CHBoral_PairedEndSeq_demux-full.qza \
  --p-fraction 0.08 \
  --o-subsampled-sequences CHBoral_PairedEndSeq_demux-subsample.qza


Saved SampleData[PairedEndSequencesWithQuality] to: CHBoral_PairedEndSeq_demux-subsample.qza


In [11]:
'''Summarize subsampled demultiplexed qualities'''

!qiime demux summarize \
  --i-data CHBoral_PairedEndSeq_demux-subsample.qza \
  --o-visualization CHBoral_PairedEndSeq_demux-subsample.qzv

Saved Visualization to: CHBoral_PairedEndSeq_demux-subsample.qzv


In [63]:
'''View summary of subsampled demultiplexed qualities'''

!qiime tools view CHBoral_PairedEndSeq_demux-subsample.qzv

Press the 'q' key, Control-C, or Control-D to quit. This view may no longer be accessible or work correctly after quitting.

#### Given that all sequences are >= 248 bp, the DADA2 run should succeed. This would provide ~31 bp of overlap between the 341F and 806R reads.

### Running DADA2 workflow which does the following:
1. filter and trim the reads
2. find the most likely original sequences in the sample (ASVs)
3. remove chimeras:
--p-chimera-method 'consensus': Chimeras are detected in samples individually, and sequences found chimeric in a sufficient fraction of samples are removed.  
4. count the abundances

The full length of the reads (>= ~248 bp) gives ~31 bp of overlap.

For quality trimming in DADA2, a read length of >= ~240 bp must be maintained to get a 15 bp overlap. DADA2 requires at least 12 bp of overlap or the reads will not merge.

Primer 806R is 20 bp and primer 341F is 17 bp. DADA2 expects primer-free reads, so 20 bp would normally be trimmed from the start of both the forward and reverse reads; trimming slightly more than the primer length reduces inaccuracies in the ASVs/OTUs.
However, given that the quality of the first ~20 bp of the forward and reverse reads is very high, I will trim only the first 5 bp.
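The overlap figures above follow from simple arithmetic: overlap = forward length + reverse length - amplicon length. The sketch below reproduces them. The amplicon length of 465 bp is an assumption inferred from the notebook's own numbers (2 x 248 bp reads overlapping by ~31 bp), not a measured value. Bases removed with --p-trim-left-f/-r come from the outer ends of the amplicon, so they are left out of this calculation.

```python
def merged_overlap(trunc_len_f, trunc_len_r, amplicon_len):
    """Bases shared by the forward and reverse reads after truncation."""
    return trunc_len_f + trunc_len_r - amplicon_len


AMPLICON_LEN = 465  # assumption: implied by 2 x 248 bp reads overlapping by ~31 bp
MIN_OVERLAP = 12    # minimum overlap quoted in the text

for trunc in (248, 240, 230):
    overlap = merged_overlap(trunc, trunc, AMPLICON_LEN)
    status = "ok" if overlap >= MIN_OVERLAP else "too short to merge"
    print(f"trunc-len {trunc}: overlap {overlap} bp ({status})")
```

Under this assumption, truncating both reads at 240 bp leaves 15 bp of overlap, while truncating at 230 bp would leave none.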

In [21]:
#!qiime dada2 denoise-paired --i-demultiplexed-seqs reads_qza/reads_trimmed.qza \
 #                          --p-trunc-len-f 270 \
  #                         --p-trunc-len-r 210 \
   #                        --p-max-ee-f 2 \
    #                       --p-max-ee-r 3 \
     #                      --p-n-threads 4 \
      #                     --output-dir dada2_output --verbose


!qiime dada2 denoise-paired \
  --i-demultiplexed-seqs CHBoral_PairedEndSeq_demux-subsample.qza \
  --p-trim-left-f 5 \
  --p-trim-left-r 5 \
  --p-trunc-len-f 240 \
  --p-trunc-len-r 240 \
  --o-table DADA2_subsample/CHBoral_PairedEndSeq_demux-subsample-table.qza \
  --o-representative-sequences DADA2_subsample/CHBoral_PairedEndSeq_demux-subsample-rep-seqs.qza \
  --o-denoising-stats DADA2_subsample/CHBoral_PairedEndSeq_demux-subsample-denoising-stats.qza

Saved FeatureTable[Frequency] to: CHBoral_PairedEndSeq_demux-subsample-table.qza
Saved FeatureData[Sequence] to: CHBoral_PairedEndSeq_demux-subsample-rep-seqs.qza
Saved SampleData[DADA2Stats] to: CHBoral_PairedEndSeq_demux-subsample-denoising-stats.qza


In [39]:
'''Generate summaries for DADA2 output'''

!qiime feature-table summarize \
  --i-table DADA2_subsample/CHBoral_PairedEndSeq_demux-subsample-table.qza \
  --o-visualization DADA2_subsample/CHBoral_PairedEndSeq_demux-subsample-table.qzv \
  --m-sample-metadata-file mappingfile_NileshMehta_with_Metadata_062715_corrected.txt

!qiime feature-table tabulate-seqs \
  --i-data DADA2_subsample/CHBoral_PairedEndSeq_demux-subsample-rep-seqs.qza \
  --o-visualization DADA2_subsample/CHBoral_PairedEndSeq_demux-subsample-rep-seqs.qzv

!qiime metadata tabulate \
  --m-input-file DADA2_subsample/CHBoral_PairedEndSeq_demux-subsample-denoising-stats.qza \
  --o-visualization DADA2_subsample/CHBoral_PairedEndSeq_demux-subsample-denoising-stats.qzv


Saved Visualization to: DADA2_subsample/CHBoral_PairedEndSeq_demux-subsample-table.qzv
Saved Visualization to: DADA2_subsample/CHBoral_PairedEndSeq_demux-subsample-rep-seqs.qzv
Saved Visualization to: DADA2_subsample/CHBoral_PairedEndSeq_demux-subsample-denoising-stats.qzv


In [25]:
'''View dada2 stats for subsampled demultiplexed seqs'''

!qiime tools view CHBoral_PairedEndSeq_demux-subsample-denoising-stats.qzv



Press the 'q' key, Control-C, or Control-D to quit. This view may no longer be accessible or work correctly after quitting.

#### Given that DADA2 ran successfully on the subsampled demultiplexed sequences, we will now move forward with all the sequences.

In [33]:
'''Run DADA2 with same parameters as used for the subsampled seq'''

!qiime dada2 denoise-paired \
  --i-demultiplexed-seqs CHBoral_PairedEndSeq_demux-full.qza \
  --p-trim-left-f 5 \
  --p-trim-left-r 5 \
  --p-trunc-len-f 240 \
  --p-trunc-len-r 240 \
  --o-table DADA2_full/CHBoral_PairedEndSeq_demux-full-table.qza \
  --o-representative-sequences DADA2_full/CHBoral_PairedEndSeq_demux-full-rep-seqs.qza \
  --o-denoising-stats DADA2_full/CHBoral_PairedEndSeq_demux-full-denoising-stats.qza


Saved FeatureTable[Frequency] to: DADA2_full/CHBoral_PairedEndSeq_demux-full-table.qza
Saved FeatureData[Sequence] to: DADA2_full/CHBoral_PairedEndSeq_demux-full-rep-seqs.qza
Saved SampleData[DADA2Stats] to: DADA2_full/CHBoral_PairedEndSeq_demux-full-denoising-stats.qza


In [38]:
'''Generate summaries for DADA2 output'''

!qiime feature-table summarize \
  --i-table DADA2_full/CHBoral_PairedEndSeq_demux-full-table.qza \
  --o-visualization DADA2_full/CHBoral_PairedEndSeq_demux-full-table.qzv \
  --m-sample-metadata-file mappingfile_NileshMehta_with_Metadata_062715_corrected.txt

!qiime feature-table tabulate-seqs \
  --i-data DADA2_full/CHBoral_PairedEndSeq_demux-full-rep-seqs.qza \
  --o-visualization DADA2_full/CHBoral_PairedEndSeq_demux-full-rep-seqs.qzv

!qiime metadata tabulate \
  --m-input-file DADA2_full/CHBoral_PairedEndSeq_demux-full-denoising-stats.qza \
  --o-visualization DADA2_full/CHBoral_PairedEndSeq_demux-full-denoising-stats.qzv


Saved Visualization to: DADA2_full/CHBoral_PairedEndSeq_demux-full-table.qzv
Saved Visualization to: DADA2_full/CHBoral_PairedEndSeq_demux-full-rep-seqs.qzv
Saved Visualization to: DADA2_full/CHBoral_PairedEndSeq_demux-full-denoising-stats.qzv


In [67]:
'''View dada2 stats for demultiplexed seqs'''

!qiime tools view DADA2_full/CHBoral_PairedEndSeq_demux-full-rep-seqs.qzv



Press the 'q' key, Control-C, or Control-D to quit. This view may no longer be accessible or work correctly after quitting.

Overall, after DADA2 processing, an average of ~45% of the sequences were retained per sample. Compared with the DADA2 output shown in the tutorials, our data did pretty well. On average, the samples have 61,000 read pairs (min ~13,000 pairs, max ~171,000).
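The per-sample retention can be recomputed from the denoising stats once they are exported to TSV. The sketch below assumes the exported table has sample-id, input and non-chimeric columns plus a #q2:types directive row; check those names against your own export. The two-sample table is hypothetical and is not data from this run.

```python
import csv
import io


def percent_retained(stats_tsv):
    """Map sample id -> percent of input reads that survive as non-chimeric."""
    reader = csv.DictReader(io.StringIO(stats_tsv), delimiter="\t")
    result = {}
    for row in reader:
        if row["sample-id"].startswith("#q2:types"):
            continue  # skip the column-type directive row
        result[row["sample-id"]] = 100 * int(row["non-chimeric"]) / int(row["input"])
    return result


# Hypothetical two-sample table, not data from this run
demo = (
    "sample-id\tinput\tnon-chimeric\n"
    "#q2:types\tnumeric\tnumeric\n"
    "S1\t100000\t45000\n"
    "S2\t50000\t20000\n"
)
print(percent_retained(demo))  # {'S1': 45.0, 'S2': 40.0}
```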

### Filter features based on sequence length

Filter features based on short sequence length: short sequences were spot-checked with BLAST, which yielded irrelevant taxonomic assignments (i.e. human sequences). Filter out features with a sequence length of 390 bp or less.

To determine whether filtering based on feature frequency should be done, we will assign taxonomy first. Based on the taxonomic classification of the rare ASVs, we can determine whether, biologically speaking, these features should be excluded.

At this point, no filtering of features based on frequency will be performed. Rare ASVs (i.e. based on the Illumina MiSeq error rate of 0.1%) are features with a frequency less than 0.001 of the mean of all the feature frequencies.

In the case of this dataset, any feature with a frequency less than 2,480.3201483312732 * 0.001 (about 2.48) would be considered a rare ASV. This means that features that occur fewer than ~2.5 times (i.e. 53 out of 809 features) would be considered rare ASVs. This is about 6.6% of the features, which could have a huge impact on downstream data interpretation (particularly for diversity analyses).
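The rare-ASV arithmetic above can be checked in a few lines. All numbers below are the ones quoted in the text; nothing is read from the feature table itself.

```python
mean_feature_frequency = 2480.3201483312732  # mean frequency per feature (from the text)
error_rate = 0.001                            # MiSeq rate of 0.1% (from the text)

threshold = mean_feature_frequency * error_rate
rare, total = 53, 809                         # feature counts quoted in the text

print(f"rare-ASV threshold: {threshold:.2f} reads")
print(f"rare ASVs: {rare}/{total} = {rare / total:.1%} of features")
```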

In [64]:
'''Filter for features with sequence length >390'''

!qiime feature-table filter-seqs \
    --i-data DADA2_full/CHBoral_PairedEndSeq_demux-full-rep-seqs.qza \
    --m-metadata-file DADA2_full/CHBoral_PairedEndSeq_demux-full-rep-seqs.qza \
    --p-where 'length(sequence) > 390' \
    --o-filtered-data DADA2_full/CHBoral_PairedEndSeq_demux-full-L390-filtered-seqs.qza

'''View new feature table for filtered seqs'''

!qiime feature-table tabulate-seqs \
  --i-data DADA2_full/CHBoral_PairedEndSeq_demux-full-L390-filtered-seqs.qza \
  --o-visualization DADA2_full/CHBoral_PairedEndSeq_demux-full-L390-filtered-seqs.qzv



Saved FeatureData[Sequence] to: DADA2_full/CHBoral_PairedEndSeq_demux-full-L390-filtered-seqs.qza
Saved Visualization to: DADA2_full/CHBoral_PairedEndSeq_demux-full-L390-filtered-seqs.qzv


In [65]:
'''View length-filtered representative sequences'''

!qiime tools view DADA2_full/CHBoral_PairedEndSeq_demux-full-L390-filtered-seqs.qzv


Press the 'q' key, Control-C, or Control-D to quit. This view may no longer be accessible or work correctly after quitting.

### Assign taxonomy

In [78]:
"Assign taxonomy based on 16S V3/V4 region specific to 341F and 806R primers"

!qiime feature-classifier classify-sklearn \
  --i-reads DADA2_full/CHBoral_PairedEndSeq_demux-full-L390-filtered-seqs.qza \
  --i-classifier taxa_classifiers/classifier_silva_132_16S_v3v4.qza \
  --p-n-jobs 1 \
  --output-dir DADA2_full/taxa  



^C

Aborted!


### Generate a tree for phylogenetic diversity analyses

In [68]:
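'''Build phylogenetic tree using fasttree and mafft alignment \
for downstream analyses related to diversity'''

# NOTE: --i-sequences must be the .qza artifact. The recorded run passed the .qzv
# visualization, which is presumably why only the usage text appears below.
!qiime phylogeny align-to-tree-mafft-fasttree \
  --i-sequences DADA2_full/CHBoral_PairedEndSeq_demux-full-L390-filtered-seqs.qza \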
  --o-alignment DADA2_full/tree_outdir/CHBoral_PairedEndSeq_demux-full-L390-filtered-aligned-rep-seqs.qza \
  --o-masked-alignment DADA2_full/tree_outdir/CHBoral_PairedEndSeq_demux-full-L390-filtered-masked-aligned-rep-seqs.qza \
  --o-tree DADA2_full/tree_outdir/CHBoral_PairedEndSeq_demux-full-L390-filtered-unrooted-tree.qza \
  --o-rooted-tree DADA2_full/tree_outdir/CHBoral_PairedEndSeq_demux-full-L390-filtered-rooted-tree.qza



Usage: [34mqiime phylogeny align-to-tree-mafft-fasttree[0m [OPTIONS]

  This pipeline will start by creating a sequence alignment using MAFFT,
  after which any alignment columns that are phylogenetically uninformative
  or ambiguously aligned will be removed (masked). The resulting masked
  alignment will be used to infer a phylogenetic tree and then subsequently
  rooted at its midpoint. Output files from each step of the pipeline will
  be saved. This includes both the unmasked and masked MAFFT alignment from
  q2-alignment methods, and both the rooted and unrooted phylogenies from
  q2-phylogeny methods.

Inputs:
  --i-sequences ARTIFACT FeatureData[Sequence]
                          The sequences to be used for creating a fasttree
                          based rooted phylogenetic tree.           [required]
Parameters:
  --p-n-threads VALUE Int % Range(1, None) | Str % Choices('auto')
           

### Generate rarefaction curves

Export the rep-seqs summary (QZV) to TSV to check the sequence-length distribution and downselect for features that have a sequence length >390 bp.

In [60]:
!ls DADA2_full/CHBoral_PairedEndSeq_demux-full-rep-seqs


descriptive_stats.tsv    js                       sequences.fasta
index.html               q2templateassets         seven_number_summary.tsv


In [62]:
!less DADA2_full/CHBoral_PairedEndSeq_demux-full-rep-seqs/seven_number_summary.tsv


Quantile        Value
0.02    252
0.09    394
0.25    398
0.5     414
0.75    419
0.91    419
0.98    420
[K[7m(END)[m[K_PairedEndSeq_demux-full-rep-seqs/seven_number_summary.tsv (END)[m[K

Summarize the output table. Use this visualization to decide how to filter your table and whether to rarefy your data (which simplifies your analyses, but throws out data!); you can choose a cut-off based on this file.

In [30]:
!qiime feature-table summarize --i-table dada2_output/table.qza --o-visualization dada2_output/table_summary.qzv



Saved Visualization to: dada2_output/table_summary.qzv


Filtering out rare ASVs -- Based on the above summary visualization you can choose a cut-off for how frequent a variant needs to be (and optionally how many samples need to have the variant) for it to be retained. One possible choice would be to remove all ASVs that have a frequency of less than 0.1% of the mean sample depth. This cut-off excludes ASVs that are likely due to MiSeq bleed-through between runs (reported by Illumina to be 0.1% of reads).
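In rough terms, the filter keeps a feature only if its total frequency reaches the minimum frequency and it is observed in at least the minimum number of samples. The following is a simplified pure-Python sketch of that rule, not QIIME 2 code; the function name and the toy table are hypothetical.

```python
def filter_features(table, min_frequency=0, min_samples=0):
    """Keep features whose total count and sample prevalence meet the cut-offs.

    table maps feature id -> {sample id: count}.
    """
    kept = {}
    for feature, counts in table.items():
        total = sum(counts.values())
        n_samples = sum(1 for c in counts.values() if c > 0)
        if total >= min_frequency and n_samples >= min_samples:
            kept[feature] = counts
    return kept


# Hypothetical toy table, not data from this run
toy = {
    "asv1": {"A": 120, "B": 80},
    "asv2": {"A": 1, "B": 0},   # total frequency 1: dropped
    "asv3": {"A": 5, "B": 0},   # seen in one sample only: dropped
}
print(sorted(filter_features(toy, min_frequency=2, min_samples=2)))  # ['asv1']
```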


In [31]:
!qiime feature-table filter-features --i-table dada2_output/table.qza \
                                    --p-min-frequency 2 \
                                    --p-min-samples 2 \
                                    --o-filtered-table dada2_output/table_filt.qza

Saved FeatureTable[Frequency] to: dada2_output/table_filt.qza


Since the ASVs will be in a separate FASTA file you can exclude the low frequency ASVs from the sequence file with this command:

In [35]:
!qiime feature-table filter-seqs --i-data dada2_output/rep-seqs.qza \
                                --i-table dada2_output/table_filt.qza \
                                --o-filtered-data dada2_output/rep_seqs_filt.qza

Saved FeatureData[Sequence] to: dada2_output/rep_seqs_filt.qza


In [36]:
!qiime feature-table summarize --i-table dada2_output/table_filt.qza --o-visualization dada2_output/table_filt_summary.qzv




Saved Visualization to: dada2_output/table_filt_summary.qzv


## Assign taxonomy
You can assign taxonomy to your ASVs using a Naive Bayes approach implemented in the scikit-learn Python library and the SILVA database. Note that we have trained classifiers for a few different amplicon regions already (which are available in the /home/shared/taxa_classifiers folder), but you will need to generate your own if your region of interest isn't there. The classifier filename below is for the V4/V5 515F-926R region. The regions that we have trained classifiers for are:






In [37]:
!qiime feature-classifier classify-sklearn --i-reads dada2_output/rep_seqs_filt.qza \
                                          --i-classifier /Users/vklepacc/classifiers/classifier_silva_132_99_16S_V4.V5_515F_926R.qza \
                                          --p-n-jobs 4 \
                                          --output-dir taxa  

Saved FeatureData[Taxonomy] to: taxa/classification.qza


As with all QZA files, you can export the output file to take a look at the classifications and confidence scores:

In [38]:
!qiime tools export \
   --input-path taxa/classification.qza --output-path taxa

Exported taxa/classification.qza as TSVTaxonomyDirectoryFormat to directory taxa


In [42]:
!qiime feature-table tabulate-seqs --i-data dada2_output/rep-seqs.qza \
                                  --o-visualization dada2_output/representative-seqs.qzv

Saved Visualization to: dada2_output/representative-seqs.qzv


A more useful output is the interactive stacked bar-charts of the taxonomic abundances across samples, which can be output with this command:

In [43]:
!qiime taxa barplot --i-table dada2_output/table_filt.qza \
                   --i-taxonomy taxa/classification.qza \
                   --m-metadata-file MicrobialMat2.tsv \
                   --o-visualization taxa/taxa_barplot.qzv

Saved Visualization to: taxa/taxa_barplot.qzv


## Build quick phylogeny with FastTree
1. make multiple-sequence alignment using MAFFT

In [44]:
!mkdir tree_out

In [45]:
!qiime alignment mafft --i-sequences dada2_output/rep_seqs_filt.qza \
                      --p-n-threads 4 \
                      --o-alignment tree_out/rep_seqs_filt_aligned.qza

Saved FeatureData[AlignedSequence] to: tree_out/rep_seqs_filt_aligned.qza


2. filter the multiple-sequence alignment

In [47]:
!qiime alignment mask --i-alignment tree_out/rep_seqs_filt_aligned.qza \
  --o-masked-alignment tree_out/rep_seqs_filt_aligned_masked.qza

Saved FeatureData[AlignedSequence] to: tree_out/rep_seqs_filt_aligned_masked.qza


## Running FastTree
Finally FastTree can be run on this masked multiple-sequence alignment:

In [48]:
!qiime phylogeny fasttree --i-alignment tree_out/rep_seqs_filt_aligned_masked.qza \
                         --p-n-threads 4 \
                         --o-tree tree_out/rep_seqs_filt_aligned_masked_tree.qza

Saved Phylogeny[Unrooted] to: tree_out/rep_seqs_filt_aligned_masked_tree.qza


## Add root to tree
FastTree returns an unrooted tree. One basic way to add a root to a tree is to add it at the midpoint of the largest tip-to-tip distance in the tree, which is done with this command:

In [49]:
!qiime phylogeny midpoint-root --i-tree tree_out/rep_seqs_filt_aligned_masked_tree.qza \
                              --o-rooted-tree tree_out/rep_seqs_filt_aligned_masked_tree_rooted.qza

Saved Phylogeny[Rooted] to: tree_out/rep_seqs_filt_aligned_masked_tree_rooted.qza


## Generate rarefaction curves
A key quality control step is to plot rarefaction curves for all of your samples to determine if you performed sufficient sequencing. The command below will generate these plots (X is a placeholder for the maximum depth in your dataset, which you can determine by running the summarize command above).
Remember to change X (the value of --p-max-depth) for your dataset.

In [56]:
!qiime diversity alpha-rarefaction -h

Usage: [34mqiime diversity alpha-rarefaction[0m [OPTIONS]

  Generate interactive alpha rarefaction curves by computing rarefactions
  between `min_depth` and `max_depth`. The number of intermediate depths to
  compute is controlled by the `steps` parameter, with n `iterations` being
  computed at each rarefaction depth. If sample metadata is provided,
  samples may be grouped based on distinct values within a metadata column.

Inputs:
  --i-table ARTIFACT FeatureTable[Frequency]
                          Feature table to compute rarefaction curves from.
                                                                    [required]
  --i-phylogeny ARTIFACT  Optional phylogeny for phylogenetic metrics.
    Phylogeny[Rooted]                                               [optional]
Parameters:
  --p-max-depth INTEGER   The maximum rarefaction depth. Must be greater than
    Range(

In [59]:
!qiime diversity alpha-rarefaction --i-table dada2_output/table_filt.qza \
                                  --p-max-depth 2800 \
                                  --p-steps 10 \
                                  --i-phylogeny tree_out/rep_seqs_filt_aligned_masked_tree_rooted.qza --m-metadata-file MicrobialMat2.tsv \
                                  --o-visualization rarefaction_curves.qzv

Saved Visualization to: rarefaction_curves.qzv


## Calculating diversity metrics and generating ordination plots
Common alpha- and beta-diversity metrics can be calculated with a single command in QIIME 2. In addition, ordination plots (such as PCoA plots for weighted UniFrac distances) will be generated automatically. This command will also rarefy all samples to the same sequencing depth before calculating these metrics (X is a placeholder for the lowest reasonable sample depth; samples with depth below this cut-off will be excluded).
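Rarefying to a given depth amounts to drawing that many reads from each sample at random without replacement and dropping samples that have fewer reads. The following is a minimal pure-Python sketch of the idea, not the QIIME 2 implementation; the function name and the two samples are hypothetical.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample a {feature: count} sample to `depth` reads without replacement.

    Returns None when the sample has fewer than `depth` reads (it is excluded).
    """
    reads = [feature for feature, n in counts.items() for _ in range(n)]
    if len(reads) < depth:
        return None
    rng = random.Random(seed)
    rarefied = {}
    for feature in rng.sample(reads, depth):
        rarefied[feature] = rarefied.get(feature, 0) + 1
    return rarefied


# Hypothetical samples, not data from this run
deep = {"asv1": 3000, "asv2": 500}
shallow = {"asv1": 900}
print(sum(rarefy(deep, 2800).values()))  # 2800
print(rarefy(shallow, 2800))             # None
```

The trade-off is visible even in this toy case: the deep sample loses 700 reads and the shallow sample is lost entirely, so the sampling depth should be chosen with the per-sample depths from the table summary in hand.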

In [61]:
!qiime diversity core-metrics-phylogenetic --i-table dada2_output/table_filt.qza \
                                          --i-phylogeny tree_out/rep_seqs_filt_aligned_masked_tree_rooted.qza \
                                          --p-sampling-depth 2800 \
                                          --m-metadata-file MicrobialMat2.tsv \
                                          --p-n-jobs 4 \
                                          --output-dir diversity

Saved FeatureTable[Frequency] to: diversity/rarefied_table.qza
Saved SampleData[AlphaDiversity] % Properties('phylogenetic') to: diversity/faith_pd_vector.qza
Saved SampleData[AlphaDiversity] to: diversity/observed_otus_vector.qza
Saved SampleData[AlphaDiversity] to: diversity/shannon_vector.qza
Saved SampleData[AlphaDiversity] to: diversity/evenness_vector.qza
Saved DistanceMatrix % Properties('phylogenetic') to: diversity/unweighted_unifrac_distance_matrix.qza
Saved DistanceMatrix % Properties('phylogenetic') to: diversity/weighted_unifrac_distance_matrix.qza
Saved DistanceMatrix to: diversity/jaccard_distance_matrix.qza
Saved DistanceMatrix to: diversity/bray_curtis_distance_matrix.qza
Saved PCoAResults to: diversity/unweighted_unifrac_pcoa_results.qza
Saved PCoAResults to: diversity/weighted_unifrac_pcoa_results.qza
Saved PCoAResults to: diversity/jaccard_pcoa_results.qza

We can also use the diversity plugin to check if there are differences in alpha diversity between groups:

In [62]:
!qiime diversity alpha-group-significance \
    --i-alpha-diversity diversity/shannon_vector.qza \
    --m-metadata-file MicrobialMat2.tsv \
    --o-visualization diversity/alpha_groups.qzv

Saved Visualization to: diversity/alpha_groups.qzv


## Identifying differentially abundant features with ANCOM
ANCOM is one method to test for difference in the relative abundance of features between sample groupings. It is a compositional approach that makes no assumptions about feature distributions. However, it requires that all features have non-zero abundances so a pseudocount of 1 first needs to be added:
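The need for a pseudocount comes from the log-ratios that compositional methods rely on: the logarithm of zero is undefined. The sketch below illustrates this with a centered log-ratio (clr) transform. It does not reimplement ANCOM, and the counts are hypothetical.

```python
import math


def add_pseudocount(counts, pseudocount=1):
    """Add a constant to every count so that no feature is zero."""
    return [c + pseudocount for c in counts]


def clr(counts):
    """Centered log-ratio: log of each count minus the mean log count."""
    logs = [math.log(c) for c in counts]  # math.log(0) raises ValueError
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


sample = [0, 9, 99]  # hypothetical counts for three features
transformed = clr(add_pseudocount(sample))
print([round(v, 3) for v in transformed])
```

Calling clr(sample) directly would fail on the zero count, which is the situation the add-pseudocount step avoids.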

In [63]:
!qiime composition add-pseudocount --i-table dada2_output/table_filt.qza \
                                  --o-composition-table dada2_output/table_filt_pseudocount.qza

Saved FeatureTable[Composition] to: dada2_output/table_filt_pseudocount.qza


Then ANCOM can be run with this command:

In [64]:
!qiime composition ancom --i-table dada2_output/table_filt_pseudocount.qza \
                        --m-metadata-file MicrobialMat2.tsv \
                        --m-metadata-column Tank \
                        --output-dir ancom_output

Usage: [34mqiime composition ancom[0m [OPTIONS]

  Apply Analysis of Composition of Microbiomes (ANCOM) to identify features
  that are differentially abundant across groups.

Inputs:
  --i-table ARTIFACT FeatureTable[Composition]
                       The feature table to be used for ANCOM computation.
                                                                    [required]
Parameters:
  --m-metadata-file METADATA
  --m-metadata-column COLUMN  MetadataColumn[Categorical]
                       The categorical sample metadata column to test for
                       differential abundance across.               [required]
  --p-transform-function TEXT Choices('sqrt', 'log', 'clr')
                       The method applied to transform feature values before
                       generating volcano plots.              [default: 'clr']
  --p

## Exporting the final abundance and sequence files
Lastly, to get the BIOM file and FASTA file for your dataset to plug into other programs, you can use these commands:

In [None]:
!qiime tools export --input-path dada2_output/table_filt.qza --output-path dada2_output_exported
!qiime tools export --input-path dada2_output/rep_seqs_filt.qza --output-path dada2_output_exported