**Author**: Justine Debelius (justine.debelius@ki.se)<br>
**Date**: Summer/Fall 2021<br>
**Conda environment**: `micc-2021.11`<br>
**Python version**: 3.6.10<br>
**Python packages**: `pystan` (v. 2.19); `patsy` (v. 0.5.1)<br>
**QIIME 2 version**: 2020.6<br>
**QIIME 2 plugins**: `gemelli` (v. 0.0.7); `deicode` (v. 0.2.4); `empress` (v. 1.1.0.dev); `songbird` (v. 1.0.4)<br>

This notebook builds a phylogenetic tree of the features and calculates alpha and beta diversity.

We start by building an insertion tree so that we can calculate UniFrac distances.

In [1]:
mkdir -p data/tree/
qiime fragment-insertion sepp \
 --i-representative-sequences data/tables/rep_seqs.qza \
 --i-reference-database ../ipynb_clean-2020.6/data/reference/sepp-refs-silva-128.qza \
 --p-threads 4 \
 --o-tree data/tree/tree_silva128.qza \
 --o-placements data/tree/placements_silva128.qza

And then, we'll filter the table to drop any features that didn't insert into the tree.
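
As a rough illustration of what this filtering step means, the sketch below keeps only the features that appear among the tips of a tree. The nested-dict table layout, the `filter_to_tree` helper, and the feature names are hypothetical and chosen for the example; this is not how the QIIME 2 plugin is implemented.

```python
def filter_to_tree(table, tree_tips):
    """Keep only features that are present among the tree tips.

    `table` is a hypothetical {sample: {feature: count}} mapping and
    `tree_tips` is a set of feature IDs found in the tree.
    """
    return {
        sample: {f: n for f, n in features.items() if f in tree_tips}
        for sample, features in table.items()
    }


# Toy data: asv_c failed to insert, so it is absent from the tree tips.
toy_table = {
    "sample_1": {"asv_a": 10, "asv_b": 5, "asv_c": 2},
    "sample_2": {"asv_a": 3, "asv_c": 7},
}
toy_tips = {"asv_a", "asv_b"}
filtered = filter_to_tree(toy_table, toy_tips)
```

Every feature that remains is guaranteed to have a position in the tree, which is what the phylogenetic metrics later in the notebook need.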

In [2]:
qiime phylogeny filter-table \
 --i-table data/tables/phylum_defined_table.qza \
 --i-tree data/tree/tree_silva128.qza \
 --o-filtered-table data/tables/phylum_defined_table.qza

Saved FeatureTable[Frequency] to: data/tables/phylum_defined_table.qza

Then, we'll rarefy the data to 2500 sequences/sample, which matches the depth of the shallowest sample.
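
Conceptually, rarefying draws a fixed number of reads from each sample without replacement. The sketch below shows the idea for a single sample, assuming a simple `{feature: count}` dict. The `rarefy` helper and the example counts are hypothetical, and the subsampling in QIIME 2 will differ in implementation detail.

```python
import random


def rarefy(counts, depth, seed=None):
    """Subsample one sample's {feature: count} dict to `depth` reads.

    Returns None when the sample has fewer than `depth` reads, mirroring
    how a rarefied table drops samples that are too shallow.
    """
    if sum(counts.values()) < depth:
        return None
    rng = random.Random(seed)
    # Expand the counts into one entry per read, then draw without replacement.
    reads = [feature for feature, n in counts.items() for _ in range(n)]
    rarefied = {}
    for feature in rng.sample(reads, depth):
        rarefied[feature] = rarefied.get(feature, 0) + 1
    return rarefied


# Hypothetical sample with 4000 reads across three features.
example = {"asv_a": 3000, "asv_b": 900, "asv_c": 100}
subsampled = rarefy(example, depth=2500, seed=42)
```

Because the depth here equals the shallowest sample, no samples should be lost in this dataset.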

In [3]:
qiime feature-table rarefy \
 --i-table data/tables/phylum_defined_table.qza \
 --p-sampling-depth 2500 \
 --o-rarefied-table data/tables/dada2_2500.qza

Saved FeatureTable[Frequency] to: data/tables/dada2_2500.qza

For alpha diversity, we'll look at observed features, Shannon diversity, and Simpson's index (the `simpson` metric).
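
For reference, the three metrics can be written out directly from a vector of counts for one sample. This is a minimal sketch using the textbook definitions. The function names are illustrative, and the log base of 2 for Shannon is an assumption, so values may differ from the QIIME 2 output if a different base is used there.

```python
import math


def observed_features(counts):
    """Number of features with a non-zero count."""
    return sum(1 for c in counts if c > 0)


def shannon(counts, base=2):
    """Shannon entropy of the relative abundances (log base is an assumption)."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in props) / math.log(base)


def simpson(counts):
    """Simpson's index, defined here as 1 - sum(p_i ** 2)."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


# A perfectly even toy sample with four features.
even_sample = [10, 10, 10, 10]
richness = observed_features(even_sample)
entropy = shannon(even_sample)
simpson_index = simpson(even_sample)
```

Note that `simpson` here is a diversity index rather than an evenness measure: it grows with both the number of features and how evenly they are distributed.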

In [4]:
mkdir -p data/diversity/alpha

qiime diversity alpha \
 --i-table data/tables/dada2_2500.qza \
 --p-metric observed_features \
 --o-alpha-diversity data/diversity/alpha/observed_features.qza
 
qiime diversity alpha \
 --i-table data/tables/dada2_2500.qza \
 --p-metric shannon \
 --o-alpha-diversity data/diversity/alpha/shannon.qza
 
qiime diversity alpha \
 --i-table data/tables/dada2_2500.qza \
 --p-metric simpson \
 --o-alpha-diversity data/diversity/alpha/simpson.qza

Saved SampleData[AlphaDiversity] to: data/diversity/alpha/observed_features.qza
Saved SampleData[AlphaDiversity] to: data/diversity/alpha/shannon.qza
Saved SampleData[AlphaDiversity] to: data/diversity/alpha/simpson.qza

And then, we'll calculate distance matrices from the rarefied table. Bray-Curtis and Jaccard are non-phylogenetic; the UniFrac metrics account for shared evolutionary history.
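
To make the difference between the metrics concrete, here is a small sketch of Bray-Curtis, Jaccard, and unweighted UniFrac for a single pair of samples. The tree representation (a list of `(branch_length, tips_below)` pairs), the function names, and the toy data are assumptions made for the example; the QIIME 2 implementations work on full tables and real trees.

```python
def bray_curtis(a, b):
    """Abundance-based dissimilarity between two count vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / sum(x + y for x, y in zip(a, b))


def jaccard(a, b):
    """Presence/absence distance: 1 - shared features / all observed features."""
    present_a = {i for i, x in enumerate(a) if x > 0}
    present_b = {i for i, x in enumerate(b) if x > 0}
    return 1 - len(present_a & present_b) / len(present_a | present_b)


def unweighted_unifrac(features_a, features_b, branches):
    """Fraction of observed branch length that leads to only one sample."""
    unique = total = 0.0
    for length, tips in branches:
        in_a = bool(tips & features_a)
        in_b = bool(tips & features_b)
        if in_a or in_b:
            total += length
            if in_a != in_b:
                unique += length
    return unique / total


# Toy tree ((a:1,b:1):1,c:2); each branch is (length, tips below it).
toy_branches = [(1.0, {"a"}), (1.0, {"b"}), (1.0, {"a", "b"}), (2.0, {"c"})]
bc = bray_curtis([1, 2, 0], [1, 0, 3])
jc = jaccard([1, 2, 0], [1, 0, 3])
uu = unweighted_unifrac({"a", "b"}, {"a", "c"}, toy_branches)
```

In the UniFrac example, the internal branch above `a` and `b` counts as shared because both samples contain a descendant of it, which is how the metric credits related features.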

In [5]:
mkdir -p data/diversity/beta

qiime diversity beta \
 --i-table data/tables/dada2_2500.qza \
 --p-metric braycurtis \
 --o-distance-matrix data/diversity/beta/braycurtis.qza

qiime diversity beta \
 --i-table data/tables/dada2_2500.qza \
 --p-metric jaccard \
 --o-distance-matrix data/diversity/beta/jaccard.qza

qiime diversity beta-phylogenetic \
 --i-table data/tables/dada2_2500.qza \
 --i-phylogeny data/tree/tree_silva128.qza \
 --p-metric unweighted_unifrac \
 --o-distance-matrix data/diversity/beta/unweighted-unifrac.qza \
 --verbose
 
qiime diversity beta-phylogenetic \
 --i-table data/tables/dada2_2500.qza \
 --i-phylogeny data/tree/tree_silva128.qza \
 --p-metric weighted_unifrac \
 --o-distance-matrix data/diversity/beta/weighted-unifrac.qza

Saved DistanceMatrix to: data/diversity/beta/braycurtis.qza
Saved DistanceMatrix to: data/diversity/beta/jaccard.qza
Saved DistanceMatrix to: data/diversity/beta/unweighted-unifrac.qza
Saved DistanceMatrix to: data/diversity/beta/weighted-unifrac.qza

I'd also like to use Aitchison distance, a non-phylogenetic metric. It's calculated on the unrarefied table, with a pseudocount of 1 to handle zeros.
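
Aitchison distance is usually described as the Euclidean distance between centered log-ratio (CLR) transformed samples. The sketch below follows that description, assuming the pseudocount is added to every count before taking logs. The `clr` and `aitchison` helpers are illustrative names and not the plugin's API.

```python
import math


def clr(counts, pseudocount=1):
    """Centered log-ratio transform of one sample's counts."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison(a, b, pseudocount=1):
    """Euclidean distance between the CLR-transformed samples."""
    clr_a = clr(a, pseudocount)
    clr_b = clr(b, pseudocount)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(clr_a, clr_b)))


# Toy samples; the zero count is why a pseudocount is needed.
sample_a = [120, 30, 0, 50]
sample_b = [60, 45, 10, 5]
distance = aitchison(sample_a, sample_b)
```

Without a pseudocount the distance is unchanged when a sample is multiplied by a constant, which is why the metric does not depend on rarefying to an even depth.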

In [6]:
qiime diversity beta \
 --i-table data/tables/phylum_defined_table.qza \
 --p-metric aitchison \
 --p-pseudocount 1 \
 --o-distance-matrix data/diversity/beta/aitchison.qza

Saved DistanceMatrix to: data/diversity/beta/aitchison.qza

In [7]:
qiime feature-table filter-features \
 --i-table data/tables/phylum_defined_table.qza \
 --p-min-frequency 100 \
 --p-min-samples 20 \
 --o-filtered-table data/tables/abundant_table.qza

Saved FeatureTable[Frequency] to: data/tables/abundant_table.qza

In [8]:
# Clear any previous results; qiime will not write into an existing --output-dir
rm -rf data/diversity/ctf
qiime gemelli ctf \
 --i-table data/tables/abundant_table.qza \
 --m-sample-metadata-file data/metadata_paired.tsv \
 --p-individual-id-column 'host_subject_id' \
 --p-state-column 'tissue_num' \
 --p-min-feature-count 50 \
 --output-dir data/diversity/ctf \
 --verbose

Saved PCoAResults % Properties('biplot') to: data/diversity/ctf/subject_biplot.qza
Saved PCoAResults % Properties('biplot') to: data/diversity/ctf/state_biplot.qza
Saved DistanceMatrix to: data/diversity/ctf/distance_matrix.qza
Saved SampleData[SampleTrajectory] to: data/diversity/ctf/state_subject_ordination.qza
Saved FeatureData[FeatureTrajectory] to: data/diversity/ctf/state_feature_ordination.qza

In [9]:

qiime diversity beta-phylogenetic \
 --i-table data/tables/dada2_2500.qza \
 --i-phylogeny data/tree/tree_silva128.qza \
 --p-metric unweighted_unifrac \
 --o-distance-matrix data/diversity/beta/unweighted-unifrac.qza \
 --verbose

Saved DistanceMatrix to: data/diversity/beta/unweighted-unifrac.qza

In [10]:
qiime diversity beta-phylogenetic \
 --i-table data/tables/dada2_2500.qza \
 --i-phylogeny data/tree/tree_silva128.qza \
 --p-metric weighted_unifrac \
 --o-distance-matrix data/diversity/beta/weighted-unifrac.qza

Saved DistanceMatrix to: data/diversity/beta/weighted-unifrac.qza