# Differential Abundance Analysis
### Load modules

In [1]:
import os
import sys
import pandas as pd
import qiime2 as q2
from qiime2 import Visualization
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Define the data directory
data_dir = './data'

### Filtering and Collapsing Feature Table
First, we filter the feature table, keeping only features that occur at least 15 times overall and in at least 2 different samples. (The feature table was already filtered once after metadata preprocessing, hence the input name "filtered-feature-table.qza".)  
We start with the 977 features obtained from DADA2 denoising.
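
As a rough illustration of what the two thresholds do, here is a minimal pure-Python sketch on a made-up table. The feature IDs, sample IDs, counts, and the helper name `filter_features` are all hypothetical; this is not how QIIME 2 implements the step, only the same rule applied to a toy dictionary.

```python
# Toy feature table: feature ID -> {sample ID: read count}. All values are made up.
toy_table = {
    "feat_a": {"s1": 10, "s2": 9, "s3": 0},  # total 19, present in 2 samples
    "feat_b": {"s1": 40, "s2": 0, "s3": 0},  # total 40, present in 1 sample
    "feat_c": {"s1": 3, "s2": 4, "s3": 5},   # total 12, present in 3 samples
}


def filter_features(table, min_frequency, min_samples):
    """Keep features whose total count and sample prevalence meet both thresholds."""
    kept = {}
    for feature_id, counts in table.items():
        total = sum(counts.values())
        prevalence = sum(1 for c in counts.values() if c > 0)
        if total >= min_frequency and prevalence >= min_samples:
            kept[feature_id] = counts
    return kept


filtered = filter_features(toy_table, min_frequency=15, min_samples=2)
print(sorted(filtered))
```

Only `feat_a` passes both checks: `feat_b` is abundant but seen in a single sample, and `feat_c` is widespread but too rare overall.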

In [5]:
! qiime feature-table filter-features \
    --i-table ./data/filtered-feature-table.qza \
    --p-min-frequency 15 \
    --p-min-samples 2 \
    --o-filtered-table ./data/DA/table_abund.qza

Saved FeatureTable[Frequency] to: ./data/DA/table_abund.qza

#### Check filtered feature table

In [6]:
!qiime feature-table summarize \
    --i-table ./data/DA/table_abund.qza \
    --o-visualization ./data/DA/table_abund_summary.qzv

Saved Visualization to: ./data/DA/table_abund_summary.qzv

In [None]:
Visualization.load('./data/DA/table_abund_summary.qzv')

We now have 132 features left.

Next, we collapse the features at taxonomic level 7 (species). Collapsing at a higher rank (e.g. level 6, genus) leaves too few features.
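
To show why the level matters, the sketch below collapses a toy table by truncating each taxonomy string to its first `level` ranks and summing the counts of features that end up with the same label. The taxonomy strings, counts, and the function name `collapse` are invented for illustration, and the sketch ignores details such as how QIIME 2 pads missing ranks.

```python
from collections import defaultdict

# Hypothetical taxonomy strings with seven semicolon-separated ranks.
toy_taxonomy = {
    "asv1": "k__K;p__P;c__C;o__O;f__F;g__G1;s__S1",
    "asv2": "k__K;p__P;c__C;o__O;f__F;g__G1;s__S1",
    "asv3": "k__K;p__P;c__C;o__O;f__F;g__G1;s__S2",
}

# Hypothetical counts: feature ID -> {sample ID: read count}.
toy_counts = {
    "asv1": {"s1": 5, "s2": 0},
    "asv2": {"s1": 2, "s2": 3},
    "asv3": {"s1": 1, "s2": 4},
}


def collapse(counts, taxonomy, level):
    """Sum counts of all features that share the same first `level` ranks."""
    collapsed = defaultdict(lambda: defaultdict(int))
    for feature_id, sample_counts in counts.items():
        ranks = [rank.strip() for rank in taxonomy[feature_id].split(";")]
        label = ";".join(ranks[:level])
        for sample_id, count in sample_counts.items():
            collapsed[label][sample_id] += count
    return {label: dict(samples) for label, samples in collapsed.items()}


species_table = collapse(toy_counts, toy_taxonomy, level=7)
genus_table = collapse(toy_counts, toy_taxonomy, level=6)
print(len(species_table), "features at level 7")
print(len(genus_table), "features at level 6")
```

In this toy case the three features collapse to two species but only one genus, which mirrors the trade-off described above: each step up in rank merges more features.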

In [8]:
! qiime taxa collapse \
    --i-table ./data/DA/table_abund.qza \
    --i-taxonomy $data_dir/taxonomy_classification/taxonomy_unite_dynamic_s_all.qza \
    --p-level 7 \
    --o-collapsed-table ./data/DA/table_abund_l7.qza

Saved FeatureTable[Frequency] to: ./data/DA/table_abund_l7.qza

We then check the feature table again and see that 71 features remain.

In [13]:
!qiime feature-table summarize \
    --i-table ./data/DA/table_abund_l7.qza \
    --o-visualization ./data/DA/l7_table_abund_summary.qzv

Saved Visualization to: ./data/DA/l7_table_abund_summary.qzv

In [None]:
Visualization.load('./data/DA/l7_table_abund_summary.qzv')

# ANCOMBC
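The next step is the differential abundance analysis with ANCOM-BC. We run pairwise comparisons for both diseases (gluten and IBD symptoms), for rural versus urban living areas, and for country, diet type, age group, and BMI category. The significance threshold for the bar plots is set to 0.1, except for IBD, where the cell below uses 0.05.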
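
All comparisons follow the same three-step pattern (`ancombc`, `da-barplot`, `tabulate`) and differ only in the metadata column, the reference level, and the threshold. As an optional alternative to one cell per comparison, the sketch below builds the same commands from a small table. The helper `build_commands` and the `COMPARISONS` list are hypothetical conveniences; the flags, paths, columns, and reference levels are copied from the cells in this notebook.

```python
import shlex

TABLE = "./data/DA/table_abund_l7.qza"
METADATA = "./data/metadata/fungut_metadata_processed.tsv"
OUT_DIR = "./data/DA"

# (output label, metadata column, reference level, bar plot threshold)
COMPARISONS = [
    ("urban", "is_urban", "True", 0.1),
    ("ibd", "ibd_symptoms", "symptoms", 0.05),
    ("gluten", "gluten_symptoms", "symptoms", 0.1),
    ("country", "country_sample", "USA", 0.1),
    ("diet", "diet_type_sample", "Omnivore", 0.1),
    ("age", "age_group", "Adult", 0.1),
    ("bmi", "bmi_category", "Normal weight", 0.1),
]


def build_commands(label, column, reference, threshold):
    """Return the three QIIME 2 command lines (as argv lists) for one comparison."""
    differentials = f"{OUT_DIR}/ancombc_{label}_differentials.qza"
    return [
        ["qiime", "composition", "ancombc",
         "--i-table", TABLE,
         "--m-metadata-file", METADATA,
         "--p-formula", column,
         "--p-reference-levels", f"{column}::{reference}",
         "--o-differentials", differentials],
        ["qiime", "composition", "da-barplot",
         "--i-data", differentials,
         "--p-significance-threshold", str(threshold),
         "--o-visualization", f"{OUT_DIR}/ancombc_{label}_da_barplot.qzv"],
        ["qiime", "composition", "tabulate",
         "--i-data", differentials,
         "--o-visualization", f"{OUT_DIR}/ancombc_{label}_results.qzv"],
    ]


for comparison in COMPARISONS:
    for argv in build_commands(*comparison):
        # Printing only; pass argv to subprocess.run(argv, check=True) to execute.
        print(shlex.join(argv))
```

Keeping the threshold in the table makes deviations such as the 0.05 used for IBD visible at a glance.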


### Urban ANCOMBC

In [86]:
# Run ANCOM-BC
! qiime composition ancombc \
    --i-table ./data/DA/table_abund_l7.qza \
    --m-metadata-file $data_dir/metadata/fungut_metadata_processed.tsv \
    --p-formula is_urban \
    --p-reference-levels 'is_urban::True' \
    --o-differentials ./data/DA/ancombc_urban_differentials.qza

# Generate a barplot of differentially abundant taxa between environments
! qiime composition da-barplot \
    --i-data ./data/DA/ancombc_urban_differentials.qza \
    --p-significance-threshold 0.1 \
    --o-visualization ./data/DA/ancombc_urban_da_barplot.qzv

# Generate a table of these same values for all taxa
! qiime composition tabulate \
    --i-data ./data/DA/ancombc_urban_differentials.qza \
    --o-visualization ./data/DA/ancombc_urban_results.qzv

Saved FeatureData[DifferentialAbundance] to: ./data/DA/ancombc_urban_differentials.qza
Saved Visualization to: ./data/DA/ancombc_urban_da_barplot.qzv
Saved Visualization to: ./data/DA/ancombc_urban_results.qzv

In [None]:
Visualization.load("./data/DA/ancombc_urban_da_barplot.qzv")

### IBD ANCOMBC

In [77]:
# Run ANCOM-BC
! qiime composition ancombc \
    --i-table ./data/DA/table_abund_l7.qza \
    --m-metadata-file $data_dir/metadata/fungut_metadata_processed.tsv \
    --p-formula ibd_symptoms \
    --p-reference-levels ibd_symptoms::symptoms \
    --o-differentials ./data/DA/ancombc_ibd_differentials.qza

# Generate a barplot of differentially abundant taxa between IBD symptom groups
! qiime composition da-barplot \
    --i-data ./data/DA/ancombc_ibd_differentials.qza \
    --p-significance-threshold 0.05 \
    --o-visualization ./data/DA/ancombc_ibd_da_barplot.qzv

# Generate a table of these same values for all taxa
! qiime composition tabulate \
    --i-data ./data/DA/ancombc_ibd_differentials.qza \
    --o-visualization ./data/DA/ancombc_ibd_results.qzv

Saved FeatureData[DifferentialAbundance] to: ./data/DA/ancombc_ibd_differentials.qza
Saved Visualization to: ./data/DA/ancombc_ibd_da_barplot.qzv
Saved Visualization to: ./data/DA/ancombc_ibd_results.qzv

In [None]:
Visualization.load("./data/DA/ancombc_ibd_da_barplot.qzv")

### Gluten ANCOMBC

In [75]:
# Run ANCOM-BC
! qiime composition ancombc \
    --i-table ./data/DA/table_abund_l7.qza \
    --m-metadata-file $data_dir/metadata/fungut_metadata_processed.tsv \
    --p-formula gluten_symptoms \
    --p-reference-levels gluten_symptoms::symptoms \
    --o-differentials ./data/DA/ancombc_gluten_differentials.qza

# Generate a barplot of differentially abundant taxa between gluten symptom groups
! qiime composition da-barplot \
    --i-data ./data/DA/ancombc_gluten_differentials.qza \
    --p-significance-threshold 0.1 \
    --o-visualization ./data/DA/ancombc_gluten_da_barplot.qzv

# Generate a table of these same values for all taxa
! qiime composition tabulate \
    --i-data ./data/DA/ancombc_gluten_differentials.qza \
    --o-visualization ./data/DA/ancombc_gluten_results.qzv

Saved FeatureData[DifferentialAbundance] to: ./data/DA/ancombc_gluten_differentials.qza
Saved Visualization to: ./data/DA/ancombc_gluten_da_barplot.qzv
Saved Visualization to: ./data/DA/ancombc_gluten_results.qzv

In [None]:
Visualization.load("./data/DA/ancombc_gluten_da_barplot.qzv")

### Country (compare to USA)

In [9]:
# Run ANCOM-BC
! qiime composition ancombc \
    --i-table ./data/DA/table_abund_l7.qza \
    --m-metadata-file $data_dir/metadata/fungut_metadata_processed.tsv \
    --p-formula country_sample \
    --p-reference-levels country_sample::USA \
    --o-differentials ./data/DA/ancombc_country_differentials.qza

# Generate a barplot of differentially abundant taxa between countries (reference: USA)
! qiime composition da-barplot \
    --i-data ./data/DA/ancombc_country_differentials.qza \
    --p-significance-threshold 0.1 \
    --o-visualization ./data/DA/ancombc_country_da_barplot.qzv

# Generate a table of these same values for all taxa
! qiime composition tabulate \
    --i-data ./data/DA/ancombc_country_differentials.qza \
    --o-visualization ./data/DA/ancombc_country_results.qzv

Saved FeatureData[DifferentialAbundance] to: ./data/DA/ancombc_country_differentials.qza
Saved Visualization to: ./data/DA/ancombc_country_da_barplot.qzv
Saved Visualization to: ./data/DA/ancombc_country_results.qzv

In [None]:
Visualization.load("./data/DA/ancombc_country_da_barplot.qzv")

### Diet type

In [15]:
# Run ANCOM-BC
! qiime composition ancombc \
    --i-table ./data/DA/table_abund_l7.qza \
    --m-metadata-file $data_dir/metadata/fungut_metadata_processed.tsv \
    --p-formula diet_type_sample \
    --p-reference-levels diet_type_sample::Omnivore \
    --o-differentials ./data/DA/ancombc_diet_differentials.qza

# Generate a barplot of differentially abundant taxa between diet types
! qiime composition da-barplot \
    --i-data ./data/DA/ancombc_diet_differentials.qza \
    --p-significance-threshold 0.1 \
    --o-visualization ./data/DA/ancombc_diet_da_barplot.qzv

# Generate a table of these same values for all taxa
! qiime composition tabulate \
    --i-data ./data/DA/ancombc_diet_differentials.qza \
    --o-visualization ./data/DA/ancombc_diet_results.qzv

Saved FeatureData[DifferentialAbundance] to: ./data/DA/ancombc_diet_differentials.qza
Saved Visualization to: ./data/DA/ancombc_diet_da_barplot.qzv
Saved Visualization to: ./data/DA/ancombc_diet_results.qzv

In [None]:
Visualization.load("./data/DA/ancombc_diet_da_barplot.qzv")

### Age

In [17]:
# Run ANCOM-BC
! qiime composition ancombc \
    --i-table ./data/DA/table_abund_l7.qza \
    --m-metadata-file $data_dir/metadata/fungut_metadata_processed.tsv \
    --p-formula age_group \
    --p-reference-levels age_group::Adult \
    --o-differentials ./data/DA/ancombc_age_differentials.qza

# Generate a barplot of differentially abundant taxa between age groups
! qiime composition da-barplot \
    --i-data ./data/DA/ancombc_age_differentials.qza \
    --p-significance-threshold 0.1 \
    --o-visualization ./data/DA/ancombc_age_da_barplot.qzv

# Generate a table of these same values for all taxa
! qiime composition tabulate \
    --i-data ./data/DA/ancombc_age_differentials.qza \
    --o-visualization ./data/DA/ancombc_age_results.qzv

Saved FeatureData[DifferentialAbundance] to: ./data/DA/ancombc_age_differentials.qza
Saved Visualization to: ./data/DA/ancombc_age_da_barplot.qzv
Saved Visualization to: ./data/DA/ancombc_age_results.qzv

In [21]:
Visualization.load("./data/DA/ancombc_age_da_barplot.qzv")

### BMI

In [19]:
# Run ANCOM-BC
! qiime composition ancombc \
    --i-table ./data/DA/table_abund_l7.qza \
    --m-metadata-file $data_dir/metadata/fungut_metadata_processed.tsv \
    --p-formula bmi_category \
    --p-reference-levels 'bmi_category::Normal weight' \
    --o-differentials ./data/DA/ancombc_bmi_differentials.qza

# Generate a barplot of differentially abundant taxa between BMI categories
! qiime composition da-barplot \
    --i-data ./data/DA/ancombc_bmi_differentials.qza \
    --p-significance-threshold 0.1 \
    --o-visualization ./data/DA/ancombc_bmi_da_barplot.qzv

# Generate a table of these same values for all taxa
! qiime composition tabulate \
    --i-data ./data/DA/ancombc_bmi_differentials.qza \
    --o-visualization ./data/DA/ancombc_bmi_results.qzv

Saved FeatureData[DifferentialAbundance] to: ./data/DA/ancombc_bmi_differentials.qza
Saved Visualization to: ./data/DA/ancombc_bmi_da_barplot.qzv
Saved Visualization to: ./data/DA/ancombc_bmi_results.qzv

In [23]:
Visualization.load("./data/DA/ancombc_bmi_da_barplot.qzv")

# Results
We did not find any significant differences for the disease groups (IBD and gluten symptoms) or for the urban versus rural comparison. This could be due to the small number of features and to the very unequal sizes of the symptom and no-symptom groups.  
We found some significant differences between countries.
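
One way to check the group-size explanation is to count samples per level of a metadata column. The sketch below does this on made-up rows; the sample IDs, the `no_symptoms` label, and the helper `group_sizes` are assumptions for illustration and do not reflect the real contents of `fungut_metadata_processed.tsv`.

```python
from collections import Counter

# Made-up metadata rows; the real values live in the processed metadata file.
toy_metadata = [
    {"sample": "s1", "ibd_symptoms": "no_symptoms"},
    {"sample": "s2", "ibd_symptoms": "no_symptoms"},
    {"sample": "s3", "ibd_symptoms": "symptoms"},
    {"sample": "s4", "ibd_symptoms": "no_symptoms"},
    {"sample": "s5", "ibd_symptoms": "no_symptoms"},
    {"sample": "s6", "ibd_symptoms": "symptoms"},
    {"sample": "s7", "ibd_symptoms": "no_symptoms"},
    {"sample": "s8", "ibd_symptoms": "no_symptoms"},
]


def group_sizes(rows, column):
    """Count how many samples fall into each level of a metadata column."""
    return Counter(row[column] for row in rows)


sizes = group_sizes(toy_metadata, "ibd_symptoms")
# Ratio of the largest to the smallest group as a simple imbalance indicator.
imbalance = max(sizes.values()) / min(sizes.values())
print(dict(sizes), "imbalance ratio:", imbalance)
```

A large ratio means the smaller group contributes few samples, which limits the power to detect differences for that comparison.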

# PERMANOVA

In [None]:
!qiime diversity beta-group-significance --help

In [35]:
!qiime diversity beta-group-significance \
  --i-distance-matrix $data_dir/alpha_diversity/core-metrics-results/bray_curtis_distance_matrix.qza \
  --m-metadata-file $data_dir/metadata/fungut_metadata_processed.tsv \
  --m-metadata-column diet_type_sample \
  --p-method permanova \
  --p-pairwise \
  --o-visualization $data_dir/DA/permanova-results.qzv

Saved Visualization to: ./data/DA/permanova-results.qzv

In [36]:
Visualization.load("./data/DA/permanova-results.qzv")
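
For readers unfamiliar with the test, the following is a minimal pure-Python sketch of the idea behind PERMANOVA: a pseudo-F statistic computed from pairwise distances, compared against the same statistic under shuffled group labels. It is not the QIIME 2 implementation; the function names, the toy distance matrix, and the group labels are made up for illustration.

```python
import random
from itertools import combinations


def pseudo_f(dist, groups):
    """Pseudo-F from a square distance matrix and one group label per sample."""
    n = len(groups)
    labels = sorted(set(groups))
    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares: same quantity per group, divided by group size.
    ss_within = 0.0
    for label in labels:
        members = [i for i in range(n) if groups[i] == label]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (len(labels) - 1)) / (ss_within / (n - len(labels)))


def permanova(dist, groups, permutations=999, seed=0):
    """Return the observed pseudo-F and a permutation p-value."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Toy data: two groups of three samples, close within a group, far between groups.
toy_groups = ["a", "a", "a", "b", "b", "b"]
toy_dist = [
    [0.0 if i == j else (0.1 if toy_groups[i] == toy_groups[j] else 0.9)
     for j in range(6)]
    for i in range(6)
]

f_stat, p_value = permanova(toy_dist, toy_groups)
print("pseudo-F:", f_stat, "p-value:", p_value)
```

With only six samples there are few distinct ways to split the labels, so the p-value cannot get very small even for perfectly separated groups; small groups in a pairwise comparison are limited in the same way.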