# PoopPower: Differential abundance analysis

In the previous exercise we compared samples using alpha and beta diversity metrics. Now we use _differential abundance_ analysis to test whether individual ASVs/taxa differ in abundance between groups of samples.
We are going to use ANCOM (Analysis of Composition of Microbiomes), a compositionally aware method for testing which features are differentially abundant.
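To make the idea behind ANCOM's **W** statistic concrete, here is a simplified sketch written from scratch with NumPy and SciPy. For each feature it tests every log-ratio against the other features between groups and counts how many of those tests are significant. It assumes a one-way ANOVA per log-ratio and a Holm-Bonferroni correction within each feature's tests, which I believe match the defaults of scikit-bio's `ancom` (the implementation QIIME 2 wraps), but treat that as an assumption. The functions `holm` and `ancom_w` are hypothetical helpers, and the final step of the real method (comparing W against a cutoff to declare a feature significant) is left out.

```python
import numpy as np
from scipy.stats import f_oneway


def holm(pvalues):
    """Holm-Bonferroni step-down adjusted p-values."""
    p = np.asarray(pvalues, dtype=float)
    m = len(p)
    adjusted = np.empty(m)
    running_max = 0.0
    for rank, idx in enumerate(np.argsort(p)):
        running_max = max(running_max, min(1.0, (m - rank) * p[idx]))
        adjusted[idx] = running_max
    return adjusted


def ancom_w(counts, groups, alpha=0.05):
    """Simplified ANCOM W: for each feature i, count the other features j
    whose log-ratio log(x_i / x_j) differs significantly between groups.

    counts: samples x features array of strictly positive values
            (e.g. after adding a pseudocount); groups: one label per sample.
    """
    logc = np.log(np.asarray(counts, dtype=float))
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n_features = logc.shape[1]
    w = np.zeros(n_features, dtype=int)
    for i in range(n_features):
        pvals = []
        for j in range(n_features):
            if j == i:
                continue
            ratio = logc[:, i] - logc[:, j]
            pvals.append(f_oneway(*[ratio[groups == g] for g in labels]).pvalue)
        # correct within feature i's family of tests, then count rejections
        w[i] = int(np.sum(holm(pvals) < alpha))
    return w


# Hypothetical toy data: feature 0 is ~20x more abundant in group B
rng = np.random.default_rng(0)
counts = rng.poisson(50, size=(20, 5)) + 1
counts[10:, 0] *= 20
groups = np.array(["A"] * 10 + ["B"] * 10)
w = ancom_w(counts, groups)
print(w)
```

Because only ratios between features are tested, the result does not depend on each sample's total sequencing depth, which is what makes the approach compositionally aware. The shifted feature reaches the maximum possible W (number of features minus one), while every other feature still picks up at least one significant ratio: the one against the shifted feature.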

In [2]:
# importing all required packages & notebook extensions at the start of the notebook
import os
import matplotlib.pyplot as plt
import pandas as pd
import qiime2 as q2
from qiime2 import Visualization
import seaborn as sns
from scipy.stats import shapiro, kruskal, f_oneway

In [3]:
div_dir = 'poop_data/Diversity'
phy_dir = 'poop_data/Phylogeny'
tox_dir = 'poop_data/Taxonomy'
den_dir = 'poop_data/Denoising'
diff_abu = 'poop_data/Differential_abundance'
data_dir = 'poop_data'

%matplotlib inline

##  ANCOM

Before running ANCOM we account for the limitations of this dataset:
1. We start by filtering the feature table with the `filter-features` action of the `feature-table` plugin, retaining only features with a total frequency of at least 5000 across all samples (`--p-min-frequency 5000`). A prevalence requirement, such as presence in at least 4 samples, can be added with `--p-min-samples`. Removing rare features improves resolution and limits the FDR (false discovery rate) penalty paid for testing features that sit too far below the noise threshold to be meaningfully tested.
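The filtering rule can be sketched in pandas to show what the two thresholds do. This is a minimal illustration, assuming a samples x features `DataFrame` instead of QIIME 2's BIOM-backed artifact. `filter_features` is a hypothetical helper, and the table and thresholds below are toy values, not this dataset.

```python
import pandas as pd


def filter_features(table, min_frequency=0, min_samples=0):
    """Keep features (columns) whose total count across all samples is at
    least `min_frequency` and that are present (count > 0) in at least
    `min_samples` samples. `table` is samples x features."""
    total = table.sum(axis=0)
    prevalence = (table > 0).sum(axis=0)
    keep = (total >= min_frequency) & (prevalence >= min_samples)
    return table.loc[:, keep]


# Hypothetical toy table: rows are samples, columns are ASVs
table = pd.DataFrame(
    {"asv1": [100, 0, 3, 0, 0], "asv2": [20, 15, 9, 12, 30], "asv3": [1, 1, 1, 1, 1]},
    index=["s1", "s2", "s3", "s4", "s5"],
)
filtered = filter_features(table, min_frequency=30, min_samples=4)
print(list(filtered.columns))
```

Here `asv1` is abundant but seen in only 2 samples, and `asv3` is present everywhere but too rare in total, so only `asv2` survives both thresholds.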

In [3]:
! qiime feature-table filter-features \
    --i-table $den_dir/dada2_table.qza \
    --p-min-frequency 5000 \
    --o-filtered-table $diff_abu/table_abund.qza

[32mSaved FeatureTable[Frequency] to: poop_data/Differential_abundance/table_abund.qza[0m
[0m

##  ANCOM: GEN_sex

In [4]:
! qiime feature-table filter-samples \
    --i-table $diff_abu/table_abund.qza \
    --m-metadata-file $data_dir/metadata.tsv \
    --p-where "[GEN_sex]='male' or [GEN_sex]='female'" \
    --o-filtered-table $diff_abu/table_abund_female_male.qza

! qiime composition add-pseudocount \
    --i-table $diff_abu/table_abund_female_male.qza \
    --o-composition-table $diff_abu/table_abund_F_M_comp.qza
    
! qiime composition ancom \
    --i-table $diff_abu/table_abund_F_M_comp.qza \
    --m-metadata-file $data_dir/metadata.tsv \
    --m-metadata-column GEN_sex \
    --p-transform-function log \
    --o-visualization $diff_abu/ancom_SEX.qzv

[32mSaved FeatureTable[Frequency] to: poop_data/Differential_abundance/table_abund_female_male.qza[0m
[0m[32mSaved FeatureTable[Composition] to: poop_data/Differential_abundance/table_abund_F_M_comp.qza[0m
[0m[32mSaved Visualization to: poop_data/Differential_abundance/ancom_SEX.qzv[0m
[0m
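The three preprocessing steps in the cell above (sample filtering, pseudocount, log transform) can be mimicked in pandas to show why each is needed. This is a sketch on hypothetical toy data; it assumes `add-pseudocount` uses its default pseudocount of 1, and the `"not provided"` metadata value is invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table (samples x features) and metadata
table = pd.DataFrame(
    {"asv1": [0, 5, 2, 0], "asv2": [10, 0, 7, 3]},
    index=["s1", "s2", "s3", "s4"],
)
metadata = pd.DataFrame(
    {"GEN_sex": ["female", "male", "not provided", "male"]},
    index=["s1", "s2", "s3", "s4"],
)

# Like --p-where "[GEN_sex]='male' or [GEN_sex]='female'": keep matching samples only
mask = metadata.loc[table.index, "GEN_sex"].isin(["male", "female"])
subset = table[mask]

# Like add-pseudocount (default pseudocount assumed to be 1):
# shift every count so that zeros no longer make log() undefined
composition = subset + 1

# Like --p-transform-function log, applied before ANCOM forms log-ratios
log_table = np.log(composition)
print(log_table.round(3))
```

Without the pseudocount, the zero counts in `asv1` and `asv2` would turn into `-inf` after the log and break every log-ratio that involves them.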

In [5]:
Visualization.load(f'{diff_abu}/ancom_SEX.qzv')

The ANCOM volcano plot shows only one statistically significant ASV, in the right-hand corner of the plot: **77c2dc197e6b3dbebc4ee240c6a1c559**, with **W = 140**. W is the number of pairwise log-ratio tests (against every other feature) in which this feature differed significantly between the groups. Looking up the same feature in the table of percentile abundances by group:

**1)** in 75% of the samples in the female group, 18 or fewer sequences were assigned to this feature;
**2)** in 75% of the samples in the male group, 1 or fewer sequences were assigned to this feature.

This suggests that the feature is more abundant in the female samples. To follow up on this ASV-level result, we collapse the table to the species (level 7) and genus (level 6) levels and rerun ANCOM.
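The percentile table can be reproduced for a single feature with a pandas `groupby`, which shows how a statement like "in 75% of the samples, 18 or fewer sequences" is read. The counts below are hypothetical toy values chosen to mirror the two numbers above; they are not the real data, and QIIME 2's exact percentile interpolation method is assumed rather than confirmed.

```python
import pandas as pd

# Hypothetical counts of one feature in 8 samples (4 per group)
counts = pd.DataFrame(
    {"feature_x": [0, 3, 18, 18, 0, 1, 1, 0]},
    index=[f"s{i}" for i in range(1, 9)],
)
groups = pd.Series(["female"] * 4 + ["male"] * 4, index=counts.index, name="GEN_sex")

# 0th, 25th, 50th, 75th and 100th percentile of the counts within each group
percentiles = counts.groupby(groups).quantile([0.0, 0.25, 0.5, 0.75, 1.0])
print(percentiles)
```

The 75th-percentile row is the value that 75% of the samples in that group do not exceed: 18 for the female group and 1 for the male group in this toy example.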

In [6]:
! qiime taxa collapse \
    --i-table $diff_abu/table_abund.qza \
    --i-taxonomy $tox_dir/taxonomy.qza \
    --p-level 7 \
    --o-collapsed-table $diff_abu/table_abund_species.qza

! qiime feature-table filter-samples \
    --i-table $diff_abu/table_abund_species.qza \
    --m-metadata-file $data_dir/metadata.tsv \
    --p-where "[GEN_sex]='male' or [GEN_sex]='female'" \
    --o-filtered-table $diff_abu/table_abund_F_M_spec.qza

! qiime composition add-pseudocount \
    --i-table $diff_abu/table_abund_F_M_spec.qza \
    --o-composition-table $diff_abu/table_abund_F_M_spec_comp.qza

! qiime composition ancom \
    --i-table $diff_abu/table_abund_F_M_spec_comp.qza \
    --m-metadata-file $data_dir/metadata.tsv \
    --m-metadata-column GEN_sex \
    --o-visualization $diff_abu/ancom_GEN_sex_spec.qzv

[32mSaved FeatureTable[Frequency] to: poop_data/Differential_abundance/table_abund_species.qza[0m
[0m[32mSaved FeatureTable[Frequency] to: poop_data/Differential_abundance/table_abund_F_M_spec.qza[0m
[0m[32mSaved FeatureTable[Composition] to: poop_data/Differential_abundance/table_abund_F_M_spec_comp.qza[0m
[0m[32mSaved Visualization to: poop_data/Differential_abundance/ancom_GEN_sex_spec.qzv[0m
[0m
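What `qiime taxa collapse --p-level N` does can be sketched as "truncate each feature's taxonomy string to its first N ranks, then sum the counts of features that end up with the same label". This is a simplified illustration, assuming the usual 7-rank scheme (level 6 = genus, level 7 = species) and semicolon-separated taxonomy strings. The taxonomy strings and the `collapse` helper are hypothetical, and the real plugin may handle missing or empty ranks differently.

```python
import pandas as pd

# Hypothetical toy table (samples x ASVs) and taxonomy assignments
table = pd.DataFrame(
    {"asv1": [10, 0], "asv2": [5, 5], "asv3": [1, 2]},
    index=["s1", "s2"],
)
taxonomy = pd.Series({
    "asv1": "d__Bacteria; p__Firmicutes; c__Clostridia; o__Lachnospirales; f__Lachnospiraceae; g__Blautia; s__obeum",
    "asv2": "d__Bacteria; p__Firmicutes; c__Clostridia; o__Lachnospirales; f__Lachnospiraceae; g__Blautia; s__wexlerae",
    "asv3": "d__Bacteria; p__Bacteroidota; c__Bacteroidia; o__Bacteroidales; f__Bacteroidaceae; g__Bacteroides; s__",
})


def collapse(table, taxonomy, level):
    """Sum feature counts that share the same taxonomy truncated to `level` ranks."""
    labels = taxonomy.loc[table.columns].map(
        lambda s: ";".join(part.strip() for part in s.split(";")[:level])
    )
    # group the ASV columns by their truncated label and add them up
    return table.T.groupby(labels).sum().T


genus = collapse(table, taxonomy, 6)
species = collapse(table, taxonomy, 7)
print(genus)
```

At level 6 the two *Blautia* ASVs merge into one column, while at level 7 they stay separate because their species labels differ. This is why a signal spread over several ASVs of the same genus can become easier to detect after collapsing.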

In [7]:
Visualization.load(f'{diff_abu}/ancom_GEN_sex_spec.qzv')  # at the species level, no significant features were found

In [8]:
! qiime taxa collapse \
    --i-table $diff_abu/table_abund.qza \
    --i-taxonomy $tox_dir/taxonomy.qza \
    --p-level 6 \
    --o-collapsed-table $diff_abu/table_abund_genus.qza

! qiime feature-table filter-samples \
    --i-table $diff_abu/table_abund_genus.qza \
    --m-metadata-file $data_dir/metadata.tsv \
    --p-where "[GEN_sex]='male' or [GEN_sex]='female'" \
    --o-filtered-table $diff_abu/table_abund_F_M_gen.qza

! qiime composition add-pseudocount \
    --i-table $diff_abu/table_abund_F_M_gen.qza \
    --o-composition-table $diff_abu/table_abund_F_M_gen_comp.qza

! qiime composition ancom \
    --i-table $diff_abu/table_abund_F_M_gen_comp.qza \
    --m-metadata-file $data_dir/metadata.tsv \
    --m-metadata-column GEN_sex \
    --o-visualization $diff_abu/ancom_GEN_sex_gen.qzv

[32mSaved FeatureTable[Frequency] to: poop_data/Differential_abundance/table_abund_genus.qza[0m
[0m[32mSaved FeatureTable[Frequency] to: poop_data/Differential_abundance/table_abund_F_M_gen.qza[0m
[0m[32mSaved FeatureTable[Composition] to: poop_data/Differential_abundance/table_abund_F_M_gen_comp.qza[0m
[0m[32mSaved Visualization to: poop_data/Differential_abundance/ancom_GEN_sex_gen.qzv[0m
[0m

In [9]:
Visualization.load(f'{diff_abu}/ancom_GEN_sex_gen.qzv') 

At the genus level, the ANCOM volcano plot shows a stronger statistically significant difference between the two sexes, with significant features at both ends of the plot, i.e. genera that differ in either direction.
The percentile abundance table also supports the earlier observation that this feature is more abundant in the female samples. (Could this reflect dietary differences, such as higher vegetable intake among females?)
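The horizontal position in the volcano plot reflects the direction and size of the difference, which is why significant features can appear at both ends. A rough sketch of that axis, assuming it is the per-feature difference of group means after a centred log-ratio (clr) transform (the default transform for the cells that do not pass `--p-transform-function`, as far as I know), is shown below. The `clr` and `mean_difference` helpers and the toy genera are hypothetical, and I am not certain which group QIIME 2 subtracts from which, so only the relative signs are meaningful here.

```python
import numpy as np
import pandas as pd


def clr(counts):
    """Centred log-ratio transform of a samples x features table of positive values."""
    logc = np.log(counts)
    return logc.sub(logc.mean(axis=1), axis=0)


def mean_difference(transformed, groups, group_a, group_b):
    """Per-feature difference of group means (group_b minus group_a)."""
    return transformed[groups == group_b].mean() - transformed[groups == group_a].mean()


# Hypothetical genus-level counts: genus_a is higher in females, genus_c in males
counts = pd.DataFrame(
    {"genus_a": [40, 50, 5, 6], "genus_b": [10, 10, 10, 10], "genus_c": [5, 4, 45, 50]},
    index=["s1", "s2", "s3", "s4"],
)
groups = pd.Series(["female", "female", "male", "male"], index=counts.index)

diff = mean_difference(clr(counts + 1), groups, "male", "female")
print(diff.round(2))
```

`genus_a` lands on one side of zero and `genus_c` on the other, matching the two-sided pattern described above. Because each clr-transformed sample sums to zero, these differences also sum to zero across features.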

##  ANCOM: GEN_bmi_cat

In [10]:
! qiime feature-table filter-samples \
    --i-table $diff_abu/table_abund_genus.qza \
    --m-metadata-file $data_dir/metadata.tsv \
    --p-where "[GEN_bmi_cat]='Overweight' or [GEN_bmi_cat]='Normal'" \
    --o-filtered-table $diff_abu/table_abund_bmi_gen.qza

! qiime composition add-pseudocount \
    --i-table $diff_abu/table_abund_bmi_gen.qza \
    --o-composition-table $diff_abu/table_abund_bmi_gen_comp.qza

! qiime composition ancom \
    --i-table $diff_abu/table_abund_bmi_gen_comp.qza \
    --m-metadata-file $data_dir/metadata.tsv \
    --m-metadata-column GEN_bmi_cat \
    --o-visualization $diff_abu/ancom_GEN_bmi_gen.qzv

[32mSaved FeatureTable[Frequency] to: poop_data/Differential_abundance/table_abund_bmi_gen.qza[0m
[0m[32mSaved FeatureTable[Composition] to: poop_data/Differential_abundance/table_abund_bmi_gen_comp.qza[0m
[0m[32mSaved Visualization to: poop_data/Differential_abundance/ancom_GEN_bmi_gen.qzv[0m
[0m

In [11]:
Visualization.load(f'{diff_abu}/ancom_GEN_bmi_gen.qzv') 

No significant features were found.

##  ANCOM: HEA_...

In [22]:
df_metadata = pd.read_csv(f'{data_dir}/metadata.tsv', sep='\t')
df_metadata.columns

Index(['sampleid', 'GEN_age_cat', 'GEN_age_corrected', 'GEN_bmi_cat',
       'GEN_bmi_corrected', 'GEN_cat', 'GEN_collection_timestamp',
       'GEN_country', 'GEN_dog', 'GEN_elevation', 'GEN_geo_loc_name',
       'GEN_height_cm', 'GEN_host_common_name', 'GEN_last_move',
       'GEN_last_travel', 'GEN_latitude', 'GEN_level_of_education',
       'GEN_longitude', 'GEN_race', 'GEN_sample_type', 'GEN_sex',
       'GEN_weight_kg', 'HEA_acid_reflux', 'HEA_add_adhd',
       'HEA_allergic_to_peanuts', 'HEA_antibiotic_history',
       'HEA_appendix_removed', 'HEA_autoimmune',
       'HEA_bowel_movement_frequency', 'HEA_bowel_movement_quality',
       'HEA_cancer', 'HEA_cancer_treatment', 'HEA_cardiovascular_disease',
       'HEA_cdiff', 'HEA_chickenpox', 'HEA_contraceptive', 'HEA_csection',
       'HEA_diabetes', 'HEA_exercise_frequency', 'HEA_ibd', 'HEA_ibs',
       'HEA_liver_disease', 'HEA_lung_disease', 'HEA_mental_illness',
       'HEA_migraine', 'HEA_seasonal_allergies', 'HEA_sibo',
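Before writing a `--p-where` clause for one of these `HEA_` columns, it helps to list the health columns and check which values they actually contain, because the filter has to match the stored strings exactly. A minimal pandas sketch follows; the toy metadata (including the `"not provided"` value) and the `where_clause` helper are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for df_metadata
df_metadata = pd.DataFrame({
    "sampleid": ["s1", "s2", "s3", "s4"],
    "HEA_cdiff": ["True", "False", "not provided", "False"],
    "HEA_sibo": ["False", "False", "True", "False"],
    "GEN_sex": ["male", "female", "female", "male"],
})

# All health-related columns share the HEA_ prefix
hea_cols = [c for c in df_metadata.columns if c.startswith("HEA_")]

# Show the distinct values (and their counts) in each health column
for col in hea_cols:
    print(col, df_metadata[col].value_counts(dropna=False).to_dict())


def where_clause(column, values):
    """Build a QIIME 2 --p-where string such as "[HEA_cdiff]='True' or [HEA_cdiff]='False'"."""
    return " or ".join(f"[{column}]='{v}'" for v in values)


print(where_clause("HEA_cdiff", ["True", "False"]))
```

Restricting the clause to the informative values (here `True`/`False`) drops samples with missing or placeholder answers, so ANCOM compares only two well-defined groups.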
     

In [23]:
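# NOTE: level 7 is the species level, so despite its name this overwrites
# table_abund_genus.qza (created at level 6 above) with a species-level table;
# the HEA_cdiff ANCOM below therefore runs on species-level features
! qiime taxa collapse \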
    --i-table $diff_abu/table_abund.qza \
    --i-taxonomy $tox_dir/taxonomy.qza \
    --p-level 7 \
    --o-collapsed-table $diff_abu/table_abund_genus.qza

! qiime feature-table filter-samples \
    --i-table $diff_abu/table_abund_genus.qza \
    --m-metadata-file $data_dir/metadata.tsv \
    --p-where "[HEA_cdiff]='True' or [HEA_cdiff]='False'" \
    --o-filtered-table $diff_abu/table_abund_cdiff.qza

! qiime composition add-pseudocount \
    --i-table $diff_abu/table_abund_cdiff.qza \
    --o-composition-table $diff_abu/table_abund_cdiff_comp.qza

! qiime composition ancom \
    --i-table $diff_abu/table_abund_cdiff_comp.qza \
    --m-metadata-file $data_dir/metadata.tsv \
    --m-metadata-column HEA_cdiff \
    --o-visualization $diff_abu/ancom_cdiff.qzv

[32mSaved FeatureTable[Frequency] to: poop_data/Differential_abundance/table_abund_genus.qza[0m
[0m[32mSaved FeatureTable[Frequency] to: poop_data/Differential_abundance/table_abund_cdiff.qza[0m
[0m[32mSaved FeatureTable[Composition] to: poop_data/Differential_abundance/table_abund_cdiff_comp.qza[0m
[0m[32mSaved Visualization to: poop_data/Differential_abundance/ancom_cdiff.qzv[0m
[0m

In [1]:
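from qiime2 import Visualization

# the kernel was restarted (execution count reset to 1), so the import and
# the output directory must be defined again before loading the result
diff_abu = 'poop_data/Differential_abundance'
Visualization.load(f'{diff_abu}/ancom_cdiff.qzv')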

Columns where significant differences were found:


Columns where no significant differences were found:
HEA_migraine, HEA_mental_illness, HEA_sibo, HEA_cdiff