# Week 7: Beta Diversity Analysis, Rarefaction and Significance Tests


[1. Download and import datasets](#sec1)                
[2. Beta diversity](#sec2)              
[3. Bonus: fun with pandas](#sec3)              

In [6]:
import os
import pandas as pd
import qiime2 as q2
from skbio import OrdinationResults
from qiime2 import Visualization
import matplotlib.pyplot as plt
from seaborn import scatterplot

%matplotlib inline

In [None]:
data_dir = 

<a id='sec1'></a>

# 1. Import metadata

Metadata of refers to students additional informations we collected into the file "cleaned_sample_meta_data.tsv".

In [None]:
df_meta = pd.read_csv('cleaned_sample_meta_data.tsv', sep='\t')

# 2.Compute diversity

#### **a)** PCoA plot inspection

Beta diversity measures the similarity between samples or groups of samples.        
To inspect groupings of beta diversity metrics across metadata categories, we will start by inspecting the principal coordinates (PCoA) plots created with the `qiime diversity core-metrics-phylogenetic` method before. 

In [None]:
# command already done in week 6
! qiime diversity core-metrics-phylogenetic \
  --i-table $data_dir/feature-table.qza \
  --i-phylogeny $data_dir/insertion-tree.qza \
  --m-metadata-file $data_dir/cleaned_sample_meta_data.tsv \
  --p-sampling-depth 1500 \
  --output-dir $data_dir/core-metrics-results

**Unweighted unifrac emperor**

In [None]:
Visualization.load(f'{data_dir}/core-metrics-results/unweighted_unifrac_emperor.qzv')

**Bray curtis emperor**

In [None]:
Visualization.load(f'{data_dir}/core-metrics-results/bray_curtis_emperor.qzv')

**Weighted unifrac emperor**

In [None]:
Visualization.load(f'{data_dir}/core-metrics-results/weighted_unifrac_emperor.qzv')

**Jaccard emperor**

In [None]:
Visualization.load(f'{data_dir}/core-metrics-results/jaccard_emperor.qzv')

#### **b)** Permanova testing of categorical variables associations

Associations between beta diversity and categorical variables can be statistically tested using a PERMANOVA test. This is a non-parametric statistical test that checks the null hypothesis that the distances between samples of one group are equivalent to distances to samples of another group. If this null hypothesis is rejected, we can infer that the distances between samples of one group differ significantly from the distances to samples in at least one other group. We can perform a PERMANOVA test checking whether the observed categories are significantly grouped in QIIME 2 with the `qiime diversity beta-group-significance` method: 

**Unweighted unifrac emperor**

In [None]:
! qiime diversity beta-group-significance \
    --i-distance-matrix $data_dir/core-metrics-results/unweighted_unifrac_distance_matrix.qza \
    --m-metadata-file $data_dir/cleaned_sample_meta_data.tsv \
    --m-metadata-column env \ ### change variable
    --p-pairwise \
    --o-visualization $data_dir/core-metrics-results/uw_unifrac-env-significance.qzv

In [None]:
Visualization.load(f'{data_dir}/core-metrics-results/uw_unifrac-env-significance.qzv')

**Bray curtis emperor**

In [None]:
! qiime diversity beta-group-significance \
    --i-distance-matrix $data_dir/core-metrics-results/bray_curtis_distance_matrix.qza \
    --m-metadata-file $data_dir/cleaned_sample_meta_data.tsv\
    --m-metadata-column env \ ### change variable
    --p-pairwise \
    --o-visualization $data_dir/core-metrics-results/bray_curtis_env-significance.qzv

In [None]:
Visualization.load(f'{data_dir}/core-metrics-results/bray_curtis_env-significance.qzv')

**Weighted unifrac emperor**

In [None]:
! qiime diversity beta-group-significance \
    --i-distance-matrix $data_dir/core-metrics-results/weighted_unifrac_distance_matrix.qza \
    --m-metadata-file $data_dir/cleaned_sample_meta_data.tsv \
    --m-metadata-column env \ ### change variable
    --p-pairwise \
    --o-visualization $data_dir/core-metrics-results/weighted_unifrac_env-significance.qzv

In [None]:
Visualization.load(f'{data_dir}/core-metrics-results/weighted_unifrac_env-significance.qzv')

**Jaccard emperor**

In [None]:
! qiime diversity beta-group-significance \
    --i-distance-matrix $data_dir/core-metrics-results/jaccard_distance_matrix.qza \
    --m-metadata-file $data_dir/cleaned_sample_meta_data.tsv\
    --m-metadata-column env \ ### change variable
    --p-pairwise \
    --o-visualization $data_dir/core-metrics-results/jaccard_env-significance.qzv

In [None]:
Visualization.load(f'{data_dir}/core-metrics-results/jaccard_env-significance.qzv')

**What does the performed PERMANOVA test tell us about the differences in beta diversity of "env" groupings? 
What specific pairs of environments are significantly different from each other?**   













#### **c)** Adonis implementation pf PERMANOVA tests 

The `adonis` implementation of PERMANOVA (part of the r-vegan package) accepts a formula as input, which can consist of one or more independent terms. This might be useful for testing which covariates explain the most variation in our datasets.

In [7]:
! qiime diversity adonis \
    --i-distance-matrix $data_dir/core-metrics-results/jaccard_distance_matrix.qza \ #change distance matrix
    --m-metadata-file $data_dir/cleaned_sample_meta_data.tsv \
    --p-formula "treatment*block" \ # change variables as well as + or *
    --o-visualization $data_dir/core-metrics-results/AD-jaccard-treatXbloc.qzv

Usage: [94mqiime diversity adonis[0m [OPTIONS]

  Determine whether groups of samples are significantly different from one
  another using the ADONIS permutation-based statistical test in vegan-R.
  The function partitions sums of squares of a multivariate data set, and is
  directly analogous to MANOVA (multivariate analysis of variance). This
  action differs from beta_group_significance in that it accepts R formulae
  to perform multi-way ADONIS tests; beta_group_signficance only performs
  one-way tests. For more details, consult the reference manual available on
  the CRAN vegan page: https://CRAN.R-project.org/package=vegan

[1mInputs[0m:
  [94m[4m--i-distance-matrix[0m ARTIFACT
    [32mDistanceMatrix[0m     Matrix of distances between pairs of samples.
                                                                    [35m[required][0m
[1mParameters[0m:
  [94m[4m--m-metadata-file[0m METADATA...
    (multiple          Sample metadata containing formula terms.
   

<a id='sec3'></a>