# Alpha Diversity Analysis

Measurement of within-sample diversity.

In [1]:
import os
import pandas as pd
import qiime2 as q2
from qiime2 import Visualization

%matplotlib inline

In [2]:
# location of the data and all the results produced by this notebook 

data_dir = 'seq_data_new'

# 1. Data import

### 1.1 Metadata

In [3]:
metadata_df = pd.read_csv('project_data/sample_meta_data.tsv', sep='\t', index_col=0)

In [11]:
metadata_df.shape

(523, 56)

### 1.2 Feature Table

Load the feature table visualisation produced in the Sequence_import script:

In [5]:
Visualization.load(f'{data_dir}/dada2_table.qzv')

### 1.3 Pre-created phylogenetic tree

Load the phylogenetic tree _____ created in workbook XXX for our dataset. A pre-created tree is needed because some diversity metrics in the subsequent analysis depend on the relatedness between sequences.

### here load the tree we make from HW 5!

In [6]:
! qiime empress tree-plot \
    --i-tree project_data/alpha-diversity-insertion-tree.qza \
    --o-visualization $data_dir/insertion-tree.qzv

Saved Visualization to: seq_data_new/insertion-tree.qzv

In [7]:
Visualization.load(f'{data_dir}/insertion-tree.qzv')

# 2. Alpha rarefaction

To decide on the rarefaction threshold, interactive alpha rarefaction curves are produced with the `alpha-rarefaction` action.

In [13]:
# Inputs: the feature table from the sequence import script, the tree created
# for our dataset, and the sample metadata. The max depth is set to a
# reasonable value so that not too much data gets lost.
! qiime diversity alpha-rarefaction \
    --i-table $data_dir/dada2_table.qza \
    --i-phylogeny project_data/alpha-diversity-insertion-tree.qza \
    --p-max-depth 20000 \
    --m-metadata-file project_data/sample_meta_data.tsv \
    --o-visualization $data_dir/alpha-rarefaction.qzv

In [9]:
Visualization.load(f'{data_dir}/alpha-rarefaction.qzv')

The top plot in the visualization shows alpha diversity at different sequencing depths in our data. Once the curve reaches a plateau, a higher sequencing depth would not change the estimated diversity of a sample.
The bottom plot shows the number of samples that remain when the feature table is rarefied to the sequencing depth shown on the x-axis.
The goal is to select a rarefaction depth at which sample loss is minimized while as much of the alpha diversity as possible is captured. This depth serves as the rarefying threshold in the following step.
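
The plateau described above can be illustrated with a minimal sketch of a rarefaction curve. It uses only the Python standard library and a made-up sample (the feature names and counts are assumptions, not taken from our dataset); it subsamples reads without replacement at increasing depths and averages the number of observed features. It is meant to show the idea, not to reproduce what QIIME 2 does internally.

```python
import random

def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from a {feature: count} dict."""
    reads = [feature for feature, n in counts.items() for _ in range(n)]
    if depth > len(reads):
        return None  # the sample has too few reads and would be dropped
    rarefied = {}
    for feature in rng.sample(reads, depth):
        rarefied[feature] = rarefied.get(feature, 0) + 1
    return rarefied

# Hypothetical sample: 3 abundant and 7 rare features, 1000 reads in total.
sample = {f"asv{i}": n for i, n in enumerate([400, 300, 230, 20, 15, 12, 10, 6, 4, 3])}

rng = random.Random(42)
depths = [10, 50, 100, 500, 1000]
curve = []
for depth in depths:
    # Average the observed feature count over 10 repeated subsamples.
    observed = [len(rarefy(sample, depth, rng)) for _ in range(10)]
    curve.append(sum(observed) / len(observed))

for depth, richness in zip(depths, curve):
    print(f"depth {depth:>5}: {richness:.1f} observed features")
```

At low depths the rare features are frequently missed, so the curve rises; once most features are detected it flattens, which is the plateau to look for in the top plot.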

Here, a sampling depth of ___ is chosen, because at this depth alpha diversity has reached its plateau while not too many samples are lost. Inspecting the feature table summary loaded above shows that ___ percent of the samples are lost at this sequencing depth. Most of the loss occurs in group ____.
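
The share of lost samples can also be computed directly once the per-sample read totals are known. The sketch below assumes hypothetical totals and group labels (`sample_depths`, `sample_groups` and the threshold of 1500 are placeholders, not values from our data) and counts every sample whose total falls below the threshold as lost.

```python
from collections import Counter

# Hypothetical per-sample read totals and group labels (not from the real dataset).
sample_depths = {"s1": 12000, "s2": 800, "s3": 4500, "s4": 1400, "s5": 9800, "s6": 300}
sample_groups = {"s1": "A", "s2": "B", "s3": "A", "s4": "B", "s5": "A", "s6": "B"}

def samples_lost(depths, groups, threshold):
    """Return the percentage of samples below `threshold` and the losses per group."""
    lost = [s for s, total in depths.items() if total < threshold]
    percent_lost = 100 * len(lost) / len(depths)
    lost_per_group = Counter(groups[s] for s in lost)
    return percent_lost, lost_per_group

percent_lost, lost_per_group = samples_lost(sample_depths, sample_groups, threshold=1500)
print(f"{percent_lost:.1f}% of samples lost; per group: {dict(lost_per_group)}")
```

Breaking the loss down per group helps to spot whether rarefying removes one metadata category disproportionately.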


# 3. Diversity Analysis

Now the various diversity metrics at the chosen rarefaction depth are computed with the `core-metrics-phylogenetic` action. This action rarefies the feature table and calculates the diversity metrics on the rarefied table in one step.

In [None]:
# Same feature table and tree as in the alpha-rarefaction step;
# --p-sampling-depth: insert the chosen sequencing depth.
! qiime diversity core-metrics-phylogenetic \
  --i-table $data_dir/dada2_table.qza \
  --i-phylogeny project_data/alpha-diversity-insertion-tree.qza \
  --m-metadata-file project_data/sample_meta_data.tsv \
  --p-sampling-depth 1500 \
  --output-dir $data_dir/core-metrics-results
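
To make clearer what kind of numbers end up in the alpha diversity vectors, here is a small sketch of two metrics computed on already rarefied counts: Shannon entropy and Faith's phylogenetic diversity. The tree, the branch lengths and the two samples are invented for illustration, and Faith's PD is taken here as the sum of all branch lengths linking the observed features to the root, which is one common convention and an assumption of this sketch.

```python
import math

# Hypothetical rooted tree: node -> (parent, branch length to parent).
tree = {
    "asv1": ("n1", 0.2),
    "asv2": ("n1", 0.3),
    "asv3": ("n2", 0.5),
    "asv4": ("n2", 0.1),
    "n1": ("root", 0.4),
    "n2": ("root", 0.6),
}

def shannon_entropy(counts):
    """Shannon entropy (log base 2) of a list of feature counts."""
    counts = [n for n in counts if n > 0]
    total = sum(counts)
    return -sum((n / total) * math.log2(n / total) for n in counts)

def faith_pd(counts, tree):
    """Sum the branch lengths on the paths from observed features to the root."""
    visited = set()
    total = 0.0
    for feature, n in counts.items():
        node = feature if n > 0 else None
        # Walk towards the root, counting each branch only once.
        while node in tree and node not in visited:
            visited.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total

# Two hypothetical rarefied samples with the same number of observed features.
sample_a = {"asv1": 5, "asv2": 3, "asv3": 0, "asv4": 0}  # two close relatives
sample_b = {"asv1": 5, "asv2": 0, "asv3": 3, "asv4": 0}  # two distant relatives

for name, sample in [("a", sample_a), ("b", sample_b)]:
    print(name, round(shannon_entropy(sample.values()), 3), round(faith_pd(sample, tree), 3))
```

Both samples have identical Shannon entropy, but sample b has the higher Faith's PD because its features sit on different branches of the tree. This is why the phylogenetic metrics need the tree as an input.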

## 3.1 Alpha diversity

### Association with categorical variables

To test for significant differences in alpha diversity, we run the Kruskal-Wallis test with the `qiime diversity alpha-group-significance` action to check which categorical variables from the metadata are strongly associated with within-sample diversity.

In [None]:
! qiime diversity alpha-group-significance \
  --i-alpha-diversity $data_dir/core-metrics-results/faith_pd_vector.qza \
  --m-metadata-file $data_dir/metadata_proc.tsv \
  --o-visualization $data_dir/core-metrics-results/faith-pd-group-significance.qzv

In [None]:
Visualization.load(f'{data_dir}/core-metrics-results/faith-pd-group-significance.qzv')

By definition, "the null hypothesis is that the medians of all groups are equal, and the alternative hypothesis is that at least one population median of one group is different from the population median of at least one other group. A significant Kruskal–Wallis test indicates that at least one sample stochastically dominates one other sample." (Wikipedia). As the columns ______ have significant p-values, they are all associated with differences in microbial community richness.
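
The test itself is run by QIIME 2; the following sketch only illustrates what the Kruskal-Wallis H statistic measures. It is written in plain Python with invented Faith's PD values for three hypothetical groups and omits the tie correction. The p-value shortcut `exp(-H/2)` is valid only for exactly three groups (chi-square distribution with 2 degrees of freedom); for real analyses a statistics library would normally be used.

```python
import math

def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks

def kruskal_wallis_h(groups):
    """H statistic (without tie correction) for a list of groups of values."""
    pooled = [value for group in groups for value in group]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted_rank_sums = 0.0
    index = 0
    for group in groups:
        rank_sum = sum(ranks[index:index + len(group)])
        weighted_rank_sums += rank_sum ** 2 / len(group)
        index += len(group)
    return 12 / (n_total * (n_total + 1)) * weighted_rank_sums - 3 * (n_total + 1)

# Hypothetical Faith's PD values for three levels of one categorical variable.
groups = [
    [4.1, 4.5, 3.9, 4.3],
    [6.2, 6.8, 5.9, 6.5],
    [8.4, 8.9, 9.1, 8.7],
]
h = kruskal_wallis_h(groups)
p_value = math.exp(-h / 2)  # chi-square survival function for 2 degrees of freedom
print(f"H = {h:.3f}, p = {p_value:.4f}")
```

Because the statistic is computed on ranks rather than on the raw values, it makes no normality assumption about the diversity values.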

### Association with numerical variables

As an additional test for significant associations with alpha diversity, we run the Spearman correlation test with the `qiime diversity alpha-correlation` action to check which numerical variables from the metadata are strongly associated with within-sample diversity.

In [None]:
! qiime diversity alpha-correlation \
  --i-alpha-diversity $data_dir/core-metrics-results/faith_pd_vector.qza \
  --m-metadata-file $data_dir/metadata_proc.tsv \
  --o-visualization $data_dir/core-metrics-results/faith-pd-group-significance-numeric.qzv

In [None]:
Visualization.load(f'{data_dir}/core-metrics-results/faith-pd-group-significance-numeric.qzv')
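
For intuition about the Spearman test used above, the sketch below computes the correlation coefficient by hand: both variables are converted to ranks, and the Pearson correlation of the ranks is taken. The `age` and `faith_pd` values are invented placeholders, not columns from our metadata, and the code is an illustration rather than the implementation QIIME 2 uses.

```python
import math

def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for position in range(start, end + 1):
            ranks[order[position]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance_x = sum((a - mean_x) ** 2 for a in x)
    variance_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(variance_x * variance_y)

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical numeric metadata column and alpha diversity values.
age = [21, 35, 48, 52, 67]
faith_pd = [5.1, 6.0, 7.4, 7.1, 9.3]

rho = spearman_rho(age, faith_pd)
print(f"Spearman rho = {rho:.2f}")
```

A coefficient close to 1 or -1 indicates a strong monotonic relationship, which does not have to be linear.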

### ANOVA test