# Alpha Rarefaction & Diversity Analysis

In [None]:
import os # version: 3.9.19
import sys # version: 3.9.19
import pandas as pd # version: 2.2.2
import qiime2 as q2 # version: 2024.5.0
from qiime2 import Visualization
from skbio import OrdinationResults
import matplotlib.pyplot as plt # version: 3.8.4
import seaborn as sns # version: 0.12.2
%matplotlib inline

# Define the data directory (relative to the notebook, matching the paths used below)
data_dir = './data'

## Alpha rarefaction

In [None]:
!qiime diversity alpha-rarefaction \
  --i-table ./data/feature_tables_dada/filtered-feature-table.qza \
  --p-max-depth 30000 \
  --m-metadata-file ./data/metadata/fungut_metadata_processed.tsv \
  --o-visualization ./data/diversity/alpha-rarefaction.qzv


In [None]:
Visualization.load('./data/diversity/alpha-rarefaction.qzv')
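
To make the idea behind the rarefaction curve concrete, here is a minimal sketch of what happens to a single sample: reads are subsampled without replacement to a given depth, and an alpha metric (here simply the number of observed features) is recomputed. The function name and the toy counts are hypothetical and only serve as an illustration; this is not how QIIME 2 implements the step internally.

```python
import random

def observed_features_at_depth(counts, depth, seed=0):
    """Subsample `depth` reads without replacement and count distinct features."""
    # Expand the counts into one entry per read, e.g. {'f0': 2, 'f1': 1} -> ['f0', 'f0', 'f1'].
    reads = [feature for feature, n in counts.items() for _ in range(n)]
    if depth > len(reads):
        return None  # the sample has too few reads for this depth
    rng = random.Random(seed)
    return len(set(rng.sample(reads, depth)))

# Hypothetical sample: feature ID -> read count (10 reads in total).
toy_sample = {"f0": 5, "f1": 3, "f2": 1, "f3": 1}

for depth in (2, 5, 10, 20):
    print(depth, observed_features_at_depth(toy_sample, depth))
```

A curve that flattens with increasing depth suggests that additional reads add little new diversity.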

----------------------------------------------------------------------------------

## Diversity analysis
#### Computing core metrics
We chose a sampling depth of 4000 because the Shannon rarefaction curve plateaus shortly before that depth. Samples with fewer than 4000 reads are dropped at this step.
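
The trade-off behind the sampling depth can be checked with a quick count of how many samples would survive a given depth. The sketch below uses made-up per-sample read totals; in practice these would come from the feature table summary, and the helper name is hypothetical.

```python
def samples_retained(total_reads, depth):
    """Return the IDs of samples that have at least `depth` reads."""
    return [sample for sample, total in total_reads.items() if total >= depth]

# Hypothetical per-sample read totals (not taken from the real feature table).
toy_totals = {"S1": 12500, "S2": 3800, "S3": 4000, "S4": 27000}

kept = samples_retained(toy_totals, depth=4000)
print(f"{len(kept)}/{len(toy_totals)} samples retained: {kept}")
```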


In [None]:
! qiime diversity core-metrics \
  --i-table ./data/feature_tables_dada/filtered-feature-table.qza \
  --m-metadata-file ./data/metadata/fungut_metadata_processed.tsv \
  --p-sampling-depth 4000 \
  --output-dir ./data/diversity/core-metrics-results

-----------------------------------------------------------------------------------------------------------------------

## Alpha Diversity Analysis: Group and Correlation Significance

### Shannon Diversity: Group Significance

We use Kruskal-Wallis tests to compare Shannon diversity across the categorical groups in the metadata.

In [None]:
! qiime diversity alpha-group-significance \
  --i-alpha-diversity ./data/diversity/core-metrics-results/shannon_vector.qza \
  --m-metadata-file ./data/metadata/fungut_metadata_processed.tsv \
  --o-visualization ./data/diversity/core-metrics-results/shannon-group-significance.qzv

In [None]:
Visualization.load('./data/diversity/core-metrics-results/shannon-group-significance.qzv')
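
As a reminder of what the Kruskal-Wallis test measures, the sketch below computes the H statistic by hand: all values are ranked together, and H grows when the rank sums differ strongly between groups. This is a simplified version that omits the tie correction and the p-value, and the group values are invented for illustration.

```python
def kruskal_h(groups):
    """Kruskal-Wallis H statistic for groups without tied values."""
    pooled = sorted(value for group in groups for value in group)
    if len(set(pooled)) != len(pooled):
        raise ValueError("this sketch omits the tie correction")
    # Rank every value within the pooled data (1 = smallest).
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    n_total = len(pooled)
    rank_term = sum(sum(rank[v] for v in group) ** 2 / len(group) for group in groups)
    return 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)

# Hypothetical Shannon values for three metadata groups.
group_a = [2.1, 2.4, 2.2]
group_b = [3.0, 3.3, 3.1]
group_c = [2.6, 2.8, 2.7]

h_stat = kruskal_h([group_a, group_b, group_c])
print(round(h_stat, 3))
```

Under the null hypothesis, H approximately follows a chi-squared distribution with (number of groups - 1) degrees of freedom, which is where the p-value comes from.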

### Pielou's Evenness: Group Significance
We use Kruskal-Wallis tests to compare Pielou's evenness across the categorical groups in the metadata.

In [None]:
! qiime diversity alpha-group-significance \
  --i-alpha-diversity ./data/diversity/core-metrics-results/evenness_vector.qza \
  --m-metadata-file ./data/metadata/fungut_metadata_processed.tsv \
  --o-visualization ./data/diversity/core-metrics-results/evenness-group-significance.qzv

In [None]:
Visualization.load('./data/diversity/core-metrics-results/evenness-group-significance.qzv')
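
For reference, the two alpha metrics compared above can be written out in a few lines. This is a sketch using the textbook definitions; the logarithm base for Shannon entropy is a convention (base 2 is assumed here), and the count vectors are invented.

```python
import math

def shannon(counts, base=2):
    """Shannon entropy of a vector of feature counts (zeros are ignored)."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p, base) for p in props)

def pielou_evenness(counts):
    """Shannon entropy divided by its maximum, log(number of observed features)."""
    observed = sum(1 for c in counts if c > 0)
    if observed < 2:
        return float("nan")  # evenness is undefined for a single feature
    return shannon(counts, base=math.e) / math.log(observed)

# Two hypothetical samples with the same four features.
even = [25, 25, 25, 25]
skewed = [97, 1, 1, 1]

for name, counts in (("even", even), ("skewed", skewed)):
    print(name, round(shannon(counts), 3), round(pielou_evenness(counts), 3))
```

Both samples contain the same number of features, so the difference between them comes entirely from how evenly the reads are distributed.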

### Shannon Diversity: Correlation with Continuous Variables
We compute Spearman correlations between Shannon diversity and the numerical variables in the metadata.

In [None]:
! qiime diversity alpha-correlation \
  --i-alpha-diversity ./data/diversity/core-metrics-results/shannon_vector.qza \
  --m-metadata-file ./data/metadata/fungut_metadata_processed.tsv \
  --o-visualization ./data/diversity/core-metrics-results/shannon-group-significance-numeric.qzv

In [None]:
Visualization.load('./data/diversity/core-metrics-results/shannon-group-significance-numeric.qzv')
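
Spearman's correlation is a Pearson correlation computed on ranks, so it captures monotonic rather than strictly linear relationships. The sketch below spells this out in plain Python; the helper names and the example values (an `age` variable paired with Shannon values) are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical numeric metadata column and matching Shannon values.
age = [23, 35, 47, 52, 61]
shannon_values = [2.1, 2.5, 2.4, 3.0, 3.2]

rho = spearman_rho(age, shannon_values)
print(round(rho, 3))
```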

### Pielou's Evenness: Correlation with Continuous Variables
We compute Spearman correlations between Pielou's evenness and the numerical variables in the metadata.

In [None]:
! qiime diversity alpha-correlation \
  --i-alpha-diversity ./data/diversity/core-metrics-results/evenness_vector.qza \
  --m-metadata-file ./data/metadata/fungut_metadata_processed.tsv \
  --o-visualization ./data/diversity/core-metrics-results/evenness-group-significance-numeric.qzv

In [None]:
Visualization.load('./data/diversity/core-metrics-results/evenness-group-significance-numeric.qzv')

-----------------------------------------------------------------------------------------------------------------------

## Beta Diversity Analysis: PCoA & PERMANOVA

### Principal Coordinates Analysis (PCoA)

#### Bray-Curtis Dissimilarity

In [None]:
Visualization.load('./data/diversity/core-metrics-results/bray_curtis_emperor.qzv')

Visually, the samples show more apparent clustering here than in the alpha diversity results. Axis 1 explains 19.89% of the variation in the dataset.

#### Jaccard Index

In [None]:
Visualization.load('./data/diversity/core-metrics-results/jaccard_emperor.qzv')

##### Results from PCoA (Visual inspection):
No clear clustering patterns were found for any of the groupings in either the Bray-Curtis or the Jaccard ordination.
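
The two ordinations can differ because the underlying measures capture different things: Bray-Curtis is driven by abundances, while Jaccard only considers presence or absence. A minimal sketch with two invented samples illustrates the contrast (the function names are hypothetical, the formulas are the standard definitions):

```python
def bray_curtis(u, v):
    """Abundance-based dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    return sum(abs(a - b) for a, b in zip(u, v)) / sum(a + b for a, b in zip(u, v))

def jaccard(u, v):
    """Presence/absence distance: 1 - shared features / features in either sample."""
    present_u = {i for i, a in enumerate(u) if a > 0}
    present_v = {i for i, b in enumerate(v) if b > 0}
    return 1 - len(present_u & present_v) / len(present_u | present_v)

# Two hypothetical samples with the same features but different dominant taxa.
sample_a = [90, 5, 5, 0]
sample_b = [5, 90, 5, 0]

print("Bray-Curtis:", bray_curtis(sample_a, sample_b))
print("Jaccard:", jaccard(sample_a, sample_b))
```

The samples share exactly the same features, so their Jaccard distance is zero, yet their Bray-Curtis dissimilarity is high because the dominant feature differs.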

-----------------------------------------------------------------------------------------------------------------------

## PERMANOVA

### Bray-Curtis
##### gluten_symptoms

In [None]:
! qiime diversity beta-group-significance \
    --i-distance-matrix ./data/diversity/core-metrics-results/bray_curtis_distance_matrix.qza \
    --m-metadata-file ./data/metadata/fungut_metadata_processed.tsv \
    --m-metadata-column gluten_symptoms \
    --p-pairwise \
    --o-visualization ./data/diversity/core-metrics-results/bray-curtis-gluten-significance.qzv

In [None]:
Visualization.load('./data/diversity/core-metrics-results/bray-curtis-gluten-significance.qzv')
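
To illustrate what PERMANOVA tests, the sketch below computes the pseudo-F statistic from a distance matrix (between-group versus within-group sums of squared distances) and derives a p-value by shuffling the group labels. It is a simplified pure-Python illustration with an invented distance matrix and hypothetical function names, not the implementation QIIME 2 uses.

```python
import random
from itertools import combinations

def pseudo_f(dist, labels):
    """PERMANOVA pseudo-F from a square distance matrix and group labels."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in groups:
        members = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))

def permanova(dist, labels, permutations=999, seed=0):
    """Return the observed pseudo-F and a permutation p-value."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)

# Hypothetical data: 6 samples in two groups, small distances within groups, large between.
toy_labels = ["A", "A", "A", "B", "B", "B"]
toy_dist = [
    [0.0 if i == j else (0.2 if toy_labels[i] == toy_labels[j] else 0.8) for j in range(6)]
    for i in range(6)
]

f_stat, p_value = permanova(toy_dist, toy_labels)
print(round(f_stat, 2), p_value)
```

With only six samples there are few distinct label arrangements, so the toy p-value cannot become very small; real datasets with more samples do not have this limitation.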

##### ibd_symptoms

In [None]:
! qiime diversity beta-group-significance \
    --i-distance-matrix ./data/diversity/core-metrics-results/bray_curtis_distance_matrix.qza \
    --m-metadata-file ./data/metadata/fungut_metadata_processed.tsv \
    --m-metadata-column ibd_symptoms \
    --p-pairwise \
    --o-visualization ./data/diversity/core-metrics-results/bray-curtis-ibd-significance.qzv

In [None]:
Visualization.load('./data/diversity/core-metrics-results/bray-curtis-ibd-significance.qzv')

##### age_group

In [None]:
! qiime diversity beta-group-significance \
    --i-distance-matrix ./data/diversity/core-metrics-results/bray_curtis_distance_matrix.qza \
    --m-metadata-file ./data/metadata/fungut_metadata_processed.tsv \
    --m-metadata-column age_group \
    --p-pairwise \
    --o-visualization ./data/diversity/core-metrics-results/bray-curtis-age-group-significance.qzv

In [None]:
Visualization.load('./data/diversity/core-metrics-results/bray-curtis-age-group-significance.qzv')

##### diet_type_sample

In [None]:
! qiime diversity beta-group-significance \
    --i-distance-matrix ./data/diversity/core-metrics-results/bray_curtis_distance_matrix.qza \
    --m-metadata-file ./data/metadata/fungut_metadata_processed.tsv \
    --m-metadata-column diet_type_sample \
    --p-pairwise \
    --o-visualization ./data/diversity/core-metrics-results/bray-curtis-diet-significance.qzv

In [None]:
Visualization.load('./data/diversity/core-metrics-results/bray-curtis-diet-significance.qzv')

##### bmi_category

In [None]:
! qiime diversity beta-group-significance \
    --i-distance-matrix ./data/diversity/core-metrics-results/bray_curtis_distance_matrix.qza \
    --m-metadata-file ./data/metadata/fungut_metadata_processed.tsv \
    --m-metadata-column bmi_category \
    --p-pairwise \
    --o-visualization ./data/diversity/core-metrics-results/bray-curtis-bmi-significance.qzv

In [None]:
Visualization.load('./data/diversity/core-metrics-results/bray-curtis-bmi-significance.qzv')

-----------------------------------------------------------------------------------

### Jaccard Index
##### gluten_symptoms

In [None]:
! qiime diversity beta-group-significance \
    --i-distance-matrix ./data/diversity/core-metrics-results/jaccard_distance_matrix.qza \
    --m-metadata-file ./data/metadata/fungut_metadata_processed.tsv \
    --m-metadata-column gluten_symptoms \
    --p-pairwise \
    --o-visualization ./data/diversity/core-metrics-results/jaccard-gluten-significance.qzv

In [None]:
Visualization.load('./data/diversity/core-metrics-results/jaccard-gluten-significance.qzv')

##### ibd_symptoms

In [None]:
! qiime diversity beta-group-significance \
    --i-distance-matrix ./data/diversity/core-metrics-results/jaccard_distance_matrix.qza \
    --m-metadata-file ./data/metadata/fungut_metadata_processed.tsv \
    --m-metadata-column ibd_symptoms \
    --p-pairwise \
    --o-visualization ./data/diversity/core-metrics-results/jaccard-ibd-significance.qzv

In [None]:
Visualization.load('./data/diversity/core-metrics-results/jaccard-ibd-significance.qzv')

##### age_group

In [None]:
! qiime diversity beta-group-significance \
    --i-distance-matrix ./data/diversity/core-metrics-results/jaccard_distance_matrix.qza \
    --m-metadata-file ./data/metadata/fungut_metadata_processed.tsv \
    --m-metadata-column age_group \
    --p-pairwise \
    --o-visualization ./data/diversity/core-metrics-results/jaccard-age-group-significance.qzv

In [None]:
Visualization.load('./data/diversity/core-metrics-results/jaccard-age-group-significance.qzv')

Because the overall PERMANOVA for `age_group` was significant, we plot the Jaccard PCoA colored by age group using pandas and seaborn.

In [None]:
data_dir_pcoa = '/home/jovyan/FunGut2/full-pipeline/data/diversity/core-metrics-results/'
pcs = q2.Artifact.load(os.path.join(data_dir_pcoa, 'jaccard_pcoa_results.qza'))

# View as an OrdinationResults object
pcs = pcs.view(OrdinationResults)

# Take the first 3 columns (PCoA axes)
pcs_data = pcs.samples.iloc[:, :3]

# Rename the columns for clarity
pcs_data.columns = ['Axis 1', 'Axis 2', 'Axis 3']

In [None]:
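meta_data = pd.read_csv('./data/metadata/fungut_metadata_processed.tsv', sep='\t')  # load the metadata before merging
pcs_data_age = pd.merge(pcs_data, meta_data[['ID', 'age_group']], left_index=True, right_on='ID')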

In [None]:
# Create the scatter plot
plt.figure(figsize=(8, 6))  # Adjust figure size
sns.scatterplot(
    data=pcs_data_age,
    x='Axis 1',
    y='Axis 2',
    hue='age_group', palette="colorblind")

# Adjust the legend to not overlap the graph
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title='Age Group')

# Add labels and title
plt.xlabel('Axis 1')
plt.ylabel('Axis 2')
plt.title('PCoA Plot: Beta Diversity (Jaccard Distance) Age Group')

# Tighten the layout so the labels and legend fit
plt.tight_layout()

# Save the plot as a PNG file
plt.savefig("./data/diversity/pcoa_plot_beta_div_jaccard_age.png", dpi=300, bbox_inches='tight')
plt.show()

##### diet_type_sample

In [None]:
! qiime diversity beta-group-significance \
    --i-distance-matrix ./data/diversity/core-metrics-results/jaccard_distance_matrix.qza \
    --m-metadata-file ./data/metadata/fungut_metadata_processed.tsv \
    --m-metadata-column diet_type_sample \
    --p-pairwise \
    --o-visualization ./data/diversity/core-metrics-results/jaccard-diet-significance.qzv

In [None]:
Visualization.load('./data/diversity/core-metrics-results/jaccard-diet-significance.qzv')

Because some of the diet type comparisons were significant, we plot the Jaccard PCoA colored by diet type using pandas and seaborn.

In [None]:
meta_data = pd.read_csv('./data/metadata/fungut_metadata_processed.tsv', sep='\t')
pcs_data_diet = pd.merge(pcs_data, meta_data[['ID', 'diet_type_sample']], left_index=True, right_on='ID')

In [None]:
# Define a custom color palette for more distinct colors
custom_palette = {
    'Vegan': '#2ca02c',  # Green
    'Vegetarian': '#ff7f0e',  # Orange
    'Omnivore': '#1f77b4',  # Blue
    'Omnivore but do not eat red meat': '#d62728',  # Red
    'Vegetarian but eat seafood': '#9467bd',  # Purple
    'Not provided': '#8c564b'  # Brown
}

# Create the scatter plot
plt.figure(figsize=(8, 6))  # Adjust figure size
sns.scatterplot(
    data=pcs_data_diet,
    x='Axis 1',
    y='Axis 2',
    hue='diet_type_sample',
    palette=custom_palette
)

# Adjust the legend to not overlap the graph
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title='Diet Type')

# Add labels and title
plt.xlabel('Axis 1')
plt.ylabel('Axis 2')
plt.title('PCoA Plot: Beta Diversity (Jaccard Distance) Diet Type')

# Tighten the layout so the labels and legend fit
plt.tight_layout()

# Save the plot as a PNG file
plt.savefig("./data/diversity/pcoa_plot_beta_div_jaccard_diet.png", dpi=300, bbox_inches='tight')
plt.show()

##### bmi_category

In [None]:
! qiime diversity beta-group-significance \
    --i-distance-matrix ./data/diversity/core-metrics-results/jaccard_distance_matrix.qza \
    --m-metadata-file ./data/metadata/fungut_metadata_processed.tsv \
    --m-metadata-column bmi_category \
    --p-pairwise \
    --o-visualization ./data/diversity/core-metrics-results/jaccard-bmi-significance.qzv

In [None]:
Visualization.load('./data/diversity/core-metrics-results/jaccard-bmi-significance.qzv')