 
# Alpha and Beta Diversity

Different higher-level measures are often used to describe the microbiome in a sample. These do not provide information on changes in the abundance of specific taxa but allow us to assess broader changes or differences in the composition of microorganisms. Alpha and beta diversity are examples of such measures.

Different measures exist to estimate diversity within a single sample, jointly called alpha diversity. The different measures reflect the richness (number) or distribution (evenness) of a microbial sample or aim to reflect a combination of both properties.
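To make the distinction between richness and evenness concrete, the following is a minimal sketch using NumPy on two made-up count vectors. The function name `alpha_summary` and the example counts are assumptions for illustration only; the natural logarithm is used here, whereas a given tool may default to another base.

```python
import numpy as np

def alpha_summary(counts):
    """Return observed richness, Shannon entropy (natural log) and Pielou's evenness."""
    counts = np.asarray(counts, dtype=float)
    present = counts[counts > 0]           # taxa actually observed in the sample
    richness = int(present.size)           # richness: number of observed taxa
    p = present / present.sum()            # relative abundances
    shannon = float(-(p * np.log(p)).sum())
    # Pielou's evenness: Shannon entropy divided by its maximum, log(richness)
    evenness = shannon / float(np.log(richness)) if richness > 1 else 0.0
    return richness, shannon, evenness

# Hypothetical samples: same richness, very different evenness
even_sample = [25, 25, 25, 25]
skewed_sample = [97, 1, 1, 1]
print(alpha_summary(even_sample))
print(alpha_summary(skewed_sample))
```

Both samples contain four taxa, so a pure richness measure cannot tell them apart, while the Shannon and evenness values drop for the skewed sample.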

Rarefaction curves are often used when calculating alpha diversity indices because deeper sequencing (more reads per sample) allows increasingly accurate estimates of total population diversity. Rarefaction curves can therefore be used to estimate the full sample richness, as compared to the observed sample richness.
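The idea behind a rarefaction curve can be sketched by repeatedly subsampling reads from one sample at increasing depths and recording the mean number of taxa observed. This is an illustrative NumPy sketch, not the QIIME2 implementation; the function name `rarefaction_curve`, the counts, and the depths are all assumptions.

```python
import numpy as np

def rarefaction_curve(counts, depths, n_iter=20, seed=0):
    """Mean observed richness at each subsampling depth (without replacement)."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=int)
    # Expand the count vector into one entry per read, labelled by taxon index
    reads = np.repeat(np.arange(counts.size), counts)
    curve = []
    for depth in depths:
        observed = [
            np.unique(rng.choice(reads, size=depth, replace=False)).size
            for _ in range(n_iter)
        ]
        curve.append(float(np.mean(observed)))
    return curve

# Hypothetical sample with 1000 reads spread over 10 taxa
sample = [500, 300, 100, 50, 20, 10, 5, 5, 5, 5]
depths = [10, 50, 100, 500, 1000]
for depth, richness in zip(depths, rarefaction_curve(sample, depths)):
    print(depth, round(richness, 2))
```

When the curve flattens as depth grows, additional sequencing is recovering few new taxa, which suggests the observed richness is close to the full sample richness.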

While alpha diversity is a measure of microbiome diversity applicable to a single sample, beta diversity is a measure of the similarity or dissimilarity of two communities. As for alpha diversity, many indices exist, each reflecting different aspects of community heterogeneity. Key differences relate to how the indices weight variation in rare species, whether they consider presence/absence only or incorporate abundance, and how they interpret shared absence. Bray-Curtis dissimilarity is a popular measure that considers both size (overall abundance per sample) and shape (abundance of each taxon) of the communities (Bray and Curtis, 1957). Beta diversity is an essential input for many popular statistical methods in ecology, such as ordination-based methods, and is widely used for studying the association between environmental variables and microbial composition.
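The difference between an abundance-based and a presence/absence index can be shown with a small NumPy sketch. The two count vectors and the function names are hypothetical and chosen only so that both samples contain the same taxa at very different abundances.

```python
import numpy as np

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum of absolute differences over total abundance."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.abs(u - v).sum() / (u + v).sum())

def jaccard_distance(u, v):
    """Jaccard distance on presence/absence; taxa absent from both samples are ignored."""
    u, v = np.asarray(u) > 0, np.asarray(v) > 0
    shared = np.logical_and(u, v).sum()
    union = np.logical_or(u, v).sum()
    return float(1 - shared / union)

# Hypothetical samples: same taxa present, different abundances
a = [10, 0, 5, 0]
b = [1, 0, 50, 0]
print(bray_curtis(a, b))       # 54 / 66, driven by the abundance differences
print(jaccard_distance(a, b))  # 0.0, because the same taxa are present in both
```

Here Jaccard reports identical communities while Bray-Curtis reports a large dissimilarity, which is why the choice of index should follow the biological question.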

In summary, alpha diversity measures can be seen as a summary statistic of a single population (within-sample diversity), while beta diversity measures are estimates of similarity or dissimilarity between populations (between samples).

**Source**: (https://biomcare.com/info/key-terms-in-microbiome-projects/)

### STEP : Diversity Analysis

Using QIIME2 to create diversity analysis graphs and calculations.

- [QIIME2 Workflow Overview](https://docs.qiime2.org/2022.8/tutorials/overview/)


#### Methods
- [diversity](https://docs.qiime2.org/2022.8/plugins/available/diversity/)
- [diversity alpha](https://docs.qiime2.org/2022.8/plugins/available/diversity/alpha/)
- [diversity alpha_phylogenetic](https://docs.qiime2.org/2022.8/plugins/available/diversity/alpha-phylogenetic/)
- [diversity beta](https://docs.qiime2.org/2022.8/plugins/available/diversity/beta/)
- [diversity core_metrics](https://docs.qiime2.org/2022.8/plugins/available/diversity/core-metrics/)
- [diversity alpha_group_significance](https://docs.qiime2.org/2022.8/plugins/available/diversity/alpha-group-significance/)
- [diversity beta_group_significance](https://docs.qiime2.org/2022.8/plugins/available/diversity/beta-group-significance/)
- [feature_table core_features](https://docs.qiime2.org/2022.8/plugins/available/feature-table/core-features/)
- [feature_table summarize](https://docs.qiime2.org/2022.8/plugins/available/feature-table/summarize/)
- [taxa filter-table](https://docs.qiime2.org/2022.8/plugins/available/taxa/filter-table/)
- [taxa collapse](https://docs.qiime2.org/2022.8/plugins/available/taxa/collapse/)

## Setup and settings

In [1]:
# Importing packages
import os
import pandas as pd
from qiime2 import Artifact
from qiime2 import Visualization
from qiime2 import Metadata

from qiime2.plugins.phylogeny.pipelines import align_to_tree_mafft_fasttree

from qiime2.plugins.diversity.pipelines import alpha
from qiime2.plugins.diversity.pipelines import beta
from qiime2.plugins.diversity.pipelines import core_metrics
from qiime2.plugins.diversity.pipelines import alpha_phylogenetic

from qiime2.plugins.diversity.visualizers import alpha_group_significance
from qiime2.plugins.diversity.visualizers import beta_group_significance
from qiime2.plugins.diversity.visualizers import alpha_correlation
from qiime2.plugins.diversity.visualizers import beta_rarefaction

from qiime2.plugins.taxa.methods import filter_table
from qiime2.plugins.taxa.methods import collapse

from qiime2.plugins.feature_table.visualizers import tabulate_seqs
from qiime2.plugins.feature_table.visualizers import summarize
from qiime2.plugins.feature_table.visualizers import core_features
from qiime2.plugins.diversity.pipelines import core_metrics_phylogenetic

from qiime2.plugins.feature_table.methods import filter_samples
from qiime2.plugins.feature_table.methods import filter_seqs

from qiime2.plugins.alignment.methods import mafft


import matplotlib.pyplot as plt

%matplotlib inline

### Receiving the parameters

The following cell can receive parameters using the [papermill](https://papermill.readthedocs.io/en/latest/) tool.
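As a rough sketch of how such parameters might be supplied, papermill's `execute_notebook` accepts a `parameters` dictionary whose values are injected after the notebook cell tagged `parameters`. The helper names `build_parameters` and `run_step`, the experiment name, and the paths below are placeholders, not part of this project.

```python
def build_parameters(experiment_name, base_dir, class_col="group-id"):
    """Collect the values to inject into the notebook's parameters cell."""
    return {
        "experiment_name": experiment_name,
        "base_dir": base_dir,
        "class_col": class_col,
        "replace_files": False,
    }

def run_step(template_path, output_path, parameters):
    """Execute a notebook template with the given parameters."""
    # Imported lazily so build_parameters can be used without papermill installed
    import papermill as pm
    return pm.execute_notebook(template_path, output_path, parameters=parameters)

# Placeholder values for illustration only
params = build_parameters("my-experiment", "/path/to/project")
print(params)
# run_step("nb-templates/step-diversity-analysis.ipynb", "executed.ipynb", params)
```

Any variable not passed in `parameters` keeps the default assigned in the notebook's own parameters cell.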

In [2]:
base_dir = os.path.join('/', 'home')
metadata_file = os.path.abspath(os.path.join(base_dir, 'data', 'metadatada.tsv'))
experiment_name = ''
class_col = ''
replace_files = False

In [3]:
# Parameters
PAPERMILL_INPUT_PATH = "nb-templates/step-diversity-analysis.ipynb"
PAPERMILL_OUTPUT_PATH = "/home/lauro/nupeb/rede-micro/redemicro-ana-flavia-nutri/experiments/ana-flavia-STD-NCxSTD-NR-trim/nb-executed-steps/step-diversity-analysis-ana-flavia-STD-NCxSTD-NR-trim.ipynb"
experiment_name = "ana-flavia-STD-NCxSTD-NR-trim"
base_dir = "/home/lauro/nupeb/rede-micro/redemicro-ana-flavia-nutri"
manifest_file = "/home/lauro/nupeb/rede-micro/redemicro-ana-flavia-nutri/data/raw/manifest/manifest-ana-flavia-STD-NCxSTD-NR.csv"
metadata_file = "/home/lauro/nupeb/rede-micro/redemicro-ana-flavia-nutri/data/raw/metadata/metadata-ana-flavia-STD-NCxSTD-NR.tsv"
class_col = "group-id"
classifier_file = "/home/lauro/nupeb/rede-micro/models/silva-138-99-nb-classifier.qza"
top_n = 20
replace_files = False
phred = 20
trunc_f = 0
trunc_r = 0
overlap = 12
threads = 6
trim = {
    "overlap": 8,
    "forward_primer": "CCTACGGGRSGCAGCAG",
    "reverse_primer": "GGACTACHVGGGTWTCTAAT",
}


In [4]:
experiment_folder = os.path.abspath(os.path.join(base_dir, 'experiments', experiment_name))
img_folder = os.path.abspath(os.path.join(experiment_folder, 'imgs'))

### Defining names, paths and flags

In [5]:
# QIIME2 Artifacts folder
qiime_folder = os.path.join(experiment_folder, 'qiime-artifacts')

# Input - DADA2 Artifacts
dada2_tabs_path = os.path.join(qiime_folder, 'dada2-tabs.qza')
dada2_reps_path = os.path.join(qiime_folder, 'dada2-reps.qza')
dada2_stat_path = os.path.join(qiime_folder, 'dada2-stat.qza')

# Input - Taxonomic Artifacts
taxonomy_path = os.path.join(qiime_folder, 'metatax.qza')

# Create folder to store Alpha files
alpha_path = os.path.join(qiime_folder, 'alpha-analysis')
if not os.path.exists(alpha_path):
    os.makedirs(alpha_path)
    print(f'The new directory is created in {alpha_path}')
    
# Create folder to store Beta files
beta_path = os.path.join(qiime_folder, 'beta-analysis')
if not os.path.exists(beta_path):
    os.makedirs(beta_path)
    print(f'The new directory is created in {beta_path}')

# Output - Diversity Artifacts
alpha_diversity_path = os.path.join(alpha_path, 'alpha-diversity.qza')
alpha_diversity_view_path = os.path.join(alpha_path, 'alpha-diversity.qzv')
beta_diversity_path = os.path.join(beta_path, 'beta-diversity.qza')
beta_diversity_view_path = os.path.join(beta_path, 'beta-diversity.qzv')

In [6]:
def filter_and_collapse(tab, seqs, tax, meta, lvl, exclude=True, exclude_list='uncultured,unidentified,metagenome'):
    from qiime2.plugins.taxa.methods import collapse
    from qiime2.plugins.taxa.methods import filter_table
    from qiime2.plugins.feature_table.methods import filter_seqs
    from qiime2.plugins.feature_table.visualizers import summarize
    
    to_include = ('d', 'p', 'c', 'o', 'f', 'g', 's')[lvl-1]
    to_include += '__'
    to_exclude = exclude_list if exclude else None
    
    filtered_tabs = filter_table(
        table=tab, 
        taxonomy=tax,
        include=to_include,
        exclude=to_exclude,
        mode='contains').filtered_table
    
    filtered_seqs = filter_seqs(
        data = seqs,
        table = filtered_tabs,
    ).filtered_data
    
    collapsed_table = collapse(table=filtered_tabs, taxonomy=tax, level=lvl).collapsed_table
    collapsed_table_view = summarize(table=collapsed_table, sample_metadata=meta).visualization
    
    return collapsed_table, collapsed_table_view, filtered_seqs
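
The helper above keeps only features whose taxonomy string contains the rank prefix for the requested level and drops those matching any excluded term. The lookup can be sketched in plain Python, assuming SILVA-style labels such as `d__Bacteria; p__Firmicutes`. The names `rank_prefix` and `is_kept` and the example strings are hypothetical, and this is a simplified, case-sensitive approximation rather than the q2-taxa implementation.

```python
RANK_LETTERS = ('d', 'p', 'c', 'o', 'f', 'g', 's')

def rank_prefix(lvl):
    """Map a 1-based taxonomic level to its label prefix, e.g. 6 -> 'g__'."""
    if not 1 <= lvl <= len(RANK_LETTERS):
        raise ValueError(f'lvl must be between 1 and {len(RANK_LETTERS)}')
    return RANK_LETTERS[lvl - 1] + '__'

def is_kept(taxon, lvl, exclude_list='uncultured,unidentified,metagenome'):
    """True if the taxon is annotated down to lvl and matches no excluded term."""
    has_rank = rank_prefix(lvl) in taxon
    excluded = any(term in taxon for term in exclude_list.split(','))
    return has_rank and not excluded

print(rank_prefix(6))
print(is_kept('d__Bacteria; p__Firmicutes; g__Lactobacillus', 6))
print(is_kept('d__Bacteria; p__Firmicutes; g__uncultured', 6))
```

Unlike the indexing in `filter_and_collapse`, this sketch rejects `lvl=0`, which would otherwise silently select the last rank through negative indexing.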

## Step execution

### Load input files

This step imports the QIIME2 `FeatureTable[Frequency]` Artifact and the `Metadata` file.

In [7]:
#Load Metadata
metadata_qa = Metadata.load(metadata_file)

#Load FeatureTable[Frequency]
tabs = Artifact.load(dada2_tabs_path)
tabs_df = tabs.view(Metadata).to_dataframe().T

# FeatureData[Sequence]
reps = Artifact.load(dada2_reps_path)

# FeatureData[Taxonomy]
tax = Artifact.load(taxonomy_path)

In [8]:
# Filter FeatureTable[Frequency | RelativeFrequency | PresenceAbsence | Composition] based on Metadata sample ID values
tabs = filter_samples(
    table=tabs,
    metadata=metadata_qa,
).filtered_table
# Filter FeatureData[Sequence | AlignedSequence] to keep only the features that remain in the filtered table
reps = filter_seqs(
    data=reps,
    table=tabs,
).filtered_data

{'min_frequency': 0, 'max_frequency': None, 'min_features': 0, 'max_features': None, 'metadata': Metadata
--------
14 IDs x 3 columns
sample-name: ColumnProperties(type='categorical')
group-id:    ColumnProperties(type='categorical')
group-desc:  ColumnProperties(type='categorical')

Call to_dataframe() for a tabular representation., 'where': None, 'exclude_ids': False, 'filter_empty_features': True, 'table': 2732 x 74 <class 'biom.table.Table'> with 12512 nonzero entries (6% dense)}


{'metadata': None, 'where': None, 'exclude_ids': False, 'data': c688e3b1ada46ed57f5b7e7e0d56664f    (((T)), ((G)), ((A)), ((G)), ((G)), ((A)), ((A...
bc798de4a9acd3ff7ba51c244523be14    (((T)), ((G)), ((A)), ((G)), ((G)), ((A)), ((A...
fca5b1ccd94b107a3ddf3e99feaafb6f    (((T)), ((G)), ((A)), ((G)), ((G)), ((A)), ((A...
5ba9e679e692fcbf933c554317e03c5f    (((T)), ((A)), ((G)), ((G)), ((G)), ((A)), ((A...
04581eba9e6fd12787fb5948fcef030f    (((T)), ((G)), ((A)), ((G)), ((G)), ((A)), ((A...
                                                          ...                        
dbc83a2f59a61f2cc57971a85718406b    (((T)), ((G)), ((G)), ((G)), ((G)), ((A)), ((A...
4393f0be1354f7198d1969c44d34d44a    (((T)), ((G)), ((A)), ((G)), ((G)), ((A)), ((A...
a86e1aaa4715133f8fa2c0f52e03e6d7    (((T)), ((G)), ((G)), ((G)), ((G)), ((A)), ((A...
aa37170c6d59f9077a2644f8c25df05d    (((T)), ((G)), ((G)), ((G)), ((G)), ((A)), ((A...
9c3f20cb7568feda60cc7776400d8d64    (((G)), ((A)), ((A)), ((A)), ((T)), ((G)


## Alpha diversity analysis

#### Reference
- [The Use and Types of Alpha-Diversity Metrics in Microbial NGS](https://www.cd-genomics.com/microbioseq/the-use-and-types-of-alpha-diversity-metrics-in-microbial-ngs.html)
- [Alpha diversity metrics](http://scikit-bio.org/docs/0.2.0/generated/skbio.diversity.alpha.html)

#### Methods
- [diversity alpha](https://docs.qiime2.org/2022.8/plugins/available/diversity/alpha/): Computes a user-specified alpha diversity metric for all samples in a feature table.
- [diversity alpha_phylogenetic](https://docs.qiime2.org/2022.8/plugins/available/diversity/alpha-phylogenetic/): Computes a user-specified phylogenetic alpha diversity metric for all samples in a feature table.
- [diversity alpha_correlation](https://docs.qiime2.org/2022.8/plugins/available/diversity/alpha-correlation/): Determine whether numeric sample metadata columns are correlated with alpha diversity.
- [diversity alpha_group_significance](https://docs.qiime2.org/2022.8/plugins/available/diversity/alpha-group-significance/): Visually and statistically compare groups of alpha diversity values.

### Compute Alpha Diversity vectors
- [diversity alpha](https://docs.qiime2.org/2022.8/plugins/available/diversity/alpha/): Computes a user-specified alpha diversity metric for all samples in a feature table.
- [Alpha diversity metrics](http://scikit-bio.org/docs/0.2.0/generated/skbio.diversity.alpha.html)
 - Choices: ('ace', 'berger_parker_d', 'brillouin_d', 'chao1', 'chao1_ci', 'dominance', 'doubles', 'enspie', 'esty_ci', 'fisher_alpha', 'gini_index', 'goods_coverage', 'heip_e', 'kempton_taylor_q', 'lladser_pe', 'margalef', 'mcintosh_d', 'mcintosh_e', 'menhinick', 'michaelis_menten_fit', 'observed_features', 'osd', 'pielou_e', 'robbins', 'shannon', 'simpson', 'simpson_e', 'singles', 'strong')
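
To give a feel for what some of these metrics measure, the sketch below computes three of them by hand with NumPy on a made-up count vector. It follows the commonly used definitions (bias-corrected Chao1, Simpson as one minus the sum of squared proportions, Simpson evenness as the inverse Simpson concentration divided by richness); the function names and counts are illustrative assumptions, and results from a specific library may differ in defaults.

```python
import numpy as np

def chao1(counts):
    """Bias-corrected Chao1: observed richness plus a term based on singletons and doubletons."""
    counts = np.asarray(counts)
    s_obs = int((counts > 0).sum())
    f1 = int((counts == 1).sum())   # taxa seen exactly once
    f2 = int((counts == 2).sum())   # taxa seen exactly twice
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

def simpson(counts):
    """Simpson's index: 1 - sum(p_i^2)."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()
    return float(1 - (p ** 2).sum())

def simpson_e(counts):
    """Simpson's evenness: (1 / sum(p_i^2)) / observed richness."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()
    s_obs = int((counts > 0).sum())
    return float((1 / (p ** 2).sum()) / s_obs)

# Hypothetical sample: 7 observed taxa, 3 singletons, 2 doubletons
sample = [10, 5, 2, 2, 1, 1, 1, 0]
print(chao1(sample))      # 7 + 3*2 / (2*3) = 8.0
print(simpson(sample))
print(simpson_e(sample))
```

Chao1 exceeds the observed richness here because the singletons hint at taxa that were present but not sampled.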

In [9]:
# All metrics accepted by `diversity alpha` (kept for reference)
all_metrics = ('ace', 'berger_parker_d', 'brillouin_d', 'chao1', 'chao1_ci', 'dominance', 'doubles', 'enspie', 'esty_ci', 'fisher_alpha', 'gini_index', 'goods_coverage', 'heip_e', 'kempton_taylor_q', 'lladser_pe', 'margalef', 'mcintosh_d', 'mcintosh_e', 'menhinick', 'michaelis_menten_fit', 'observed_features', 'osd', 'pielou_e', 'robbins', 'shannon', 'simpson', 'simpson_e', 'singles', 'strong')

# Suggested metrics for alpha diversity:
# chao1 and observed_features (richness); shannon and simpson (diversity, accounting for both richness and evenness).
metrics = ('chao1', 'chao1_ci', 'observed_features', 'shannon', 'simpson', 'simpson_e')
alpha_diversities = dict()
for metric in metrics:
    print(f"Calculating alpha diversity: {metric}")
    try:
        alpha_diversity = alpha(table=tabs, metric=metric).alpha_diversity
        alpha_diversities[metric] = alpha_diversity
        # Save SampleData[AlphaDiversity] Artifact
        file_path = os.path.join(alpha_path, f'alpha-values-{metric}.qza')
        alpha_diversity.save(file_path)
        print(f"DONE: Calculating alpha diversity: {metric}")
    except Exception as e:
        print(f"ERROR: Calculating alpha diversity: {metric}")
        print(e)

Calculating alpha diversity: chao1
{'metric': 'chao1', 'table': <artifact: FeatureTable[Frequency] uuid: 58f5ac0d-b5f1-476a-b73a-80b954569b0c>}
{'metric': 'chao1', 'table': 1263 x 14 <class 'biom.table.Table'> with 2817 nonzero entries (15% dense)}
DONE: Calculating alpha diversity: chao1
Calculating alpha diversity: chao1_ci
{'metric': 'chao1_ci', 'table': <artifact: FeatureTable[Frequency] uuid: 58f5ac0d-b5f1-476a-b73a-80b954569b0c>}
{'metric': 'chao1_ci', 'table': 1263 x 14 <class 'biom.table.Table'> with 2817 nonzero entries (15% dense)}
DONE: Calculating alpha diversity: chao1_ci
Calculating alpha diversity: observed_features
{'metric': 'observed_features', 'table': <artifact: FeatureTable[Frequency] uuid: 58f5ac0d-b5f1-476a-b73a-80b954569b0c>}
{'table': 1263 x 14 <class 

DONE: Calculating alpha diversity: observed_features
Calculating alpha diversity: shannon
{'metric': 'shannon', 'table': <artifact: FeatureTable[Frequency] uuid: 58f5ac0d-b5f1-476a-b73a-80b954569b0c>}
{'drop_undefined_samples': False, 'table': 1263 x 14 <class 'biom.table.Table'> with 2817 nonzero entries (15% dense)}
DONE: Calculating alpha diversity: shannon
Calculating alpha diversity: simpson
{'metric': 'simpson', 'table': <artifact: FeatureTable[Frequency] uuid: 58f5ac0d-b5f1-476a-b73a-80b954569b0c>}
{'metric': 'simpson', 'table': 1263 x 14 <class 'biom.table.Table'> with 2817 nonzero entries (15% dense)}
DONE: Calculating alpha diversity: simpson
Calculating alpha diversity: simpson_e
{'metric': 'simpson_e', 'table': <artifact: FeatureTable[Frequency] uuid: 

DONE: Calculating alpha diversity: simpson_e


### Create Phylogenetic inference

- [phylogeny align_to_tree_mafft_fasttree](https://docs.qiime2.org/2022.8/plugins/available/phylogeny/align-to-tree-mafft-fasttree/): Builds a phylogenetic tree using a MAFFT alignment and FastTree.

This pipeline will start by creating a sequence alignment using MAFFT,
after which any alignment columns that are phylogenetically uninformative
or ambiguously aligned will be removed (masked). The resulting masked
alignment will be used to infer a phylogenetic tree and then subsequently
rooted at its midpoint. Output files from each step of the pipeline will be
saved. This includes both the unmasked and masked MAFFT alignment from
q2-alignment methods, and both the rooted and unrooted phylogenies from
q2-phylogeny methods.


Returns
- alignment : FeatureData[AlignedSequence] : The aligned sequences.
- masked_alignment : FeatureData[AlignedSequence] : The masked alignment.
- tree : Phylogeny[Unrooted] : The unrooted phylogenetic tree.
- rooted_tree : Phylogeny[Rooted] : The rooted phylogenetic tree.
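
The masking step described above can be illustrated with a small NumPy sketch that drops alignment columns by gap frequency and conservation. This is a simplified illustration, not the q2-alignment implementation: conservation is taken here as the fraction of sequences sharing the most common non-gap character, which is an assumption of this sketch, and `mask_alignment` and the toy sequences are hypothetical.

```python
import numpy as np

def mask_alignment(seqs, max_gap_frequency=1.0, min_conservation=0.4):
    """Keep columns with few enough gaps and a sufficiently common residue."""
    aln = np.array([list(s) for s in seqs])   # rows: sequences, columns: positions
    keep = []
    for col in aln.T:
        gap_freq = np.mean(col == '-')
        residues = col[col != '-']
        if residues.size == 0:
            conservation = 0.0                # all-gap column carries no signal
        else:
            _, freq = np.unique(residues, return_counts=True)
            conservation = freq.max() / col.size
        keep.append(bool(gap_freq <= max_gap_frequency and conservation >= min_conservation))
    keep = np.array(keep)
    return [''.join(row[keep]) for row in aln]

# Toy alignment: column 3 is highly variable, column 4 is all gaps
toy = ['ACG-T', 'ACT-T', 'AGA-T', 'A-C-A']
print(mask_alignment(toy))
```

With the thresholds shown, the variable column and the all-gap column are removed, leaving only positions that are likely to be informative for tree inference.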

In [10]:
mafft_alignment, mafft_masked_alignment, mafft_tree, mafft_rooted_tree = align_to_tree_mafft_fasttree(
    sequences=reps,
    n_threads=6,
)

{'n_threads': 6, 'mask_max_gap_frequency': 1.0, 'mask_min_conservation': 0.4, 'parttree': False, 'sequences': <artifact: FeatureData[Sequence] uuid: 01b41415-fd85-460b-9ebd-61f1f4fa7246>}
{'n_threads': 6, 'parttree': False, 'sequences': <q2_types.feature_data._format.DNAFASTAFormat object at 0x7fdc84164070>}
Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: mafft --preservecase --inputorder --thread 6 /tmp/qiime2-archive-he6qwmt9/01b41415-fd85-460b-9ebd-61f1f4fa7246/data/dna-sequences.fasta



inputfile = orig
1263 x 430 - 253 d
nthread = 6
nthreadpair = 6
nthreadtb = 6
ppenalty_ex = 0
stacksize: 8192 kb
generating a scoring matrix for nucleotide (dist=200) ... done
Gap Penalty = -1.53, +0.00, +0.00




disttbfast (nuc) Version 7.490
alg=A, model=DNA200 (2), 1.53 (4.59), -0.00 (-0.00), noshift, amax=0.0
6 thread(s)


Strategy:
 FFT-NS-2 (Fast but rough)
 Progressive method (guide trees were built 2 times.)

If unsure which option to use, try 'mafft --auto input > output'.
For more information, see 'mafft --help', 'mafft --man' and the mafft page.

The default gap scoring scheme has been changed in version 7.110 (2013 Oct).
It tends to insert more gaps into gap-rich regions than previous versions.
To disable this change, add the --leavegappyregion option.



{'max_gap_frequency': 1.0, 'min_conservation': 0.4, 'alignment': TabularMSA[DNA]
-----------------------------------------------------------------------
Stats:
    sequence count: 1263
    position count: 522
-----------------------------------------------------------------------
TGAGGAATATTGGTCAATGGAGGCAACTCTGAA ... GCTGAGGCTCGAAGGTGCGGGTATCGAACAGG-
TGAGGAATATTGGTCAATGGCCGGAAGGCTGAA ... GCTGAGGCACGAAAGTGCGGGGATCAAACAGG-
...
TGAGGAATATTGGTCAATGGGCGGAAGCCTGAA ... GCTGAGGCTCGAAAGCGTGGGGAGCAAACAGG-
TCGGGAATATTGCGCAATGGAGGAAACTCTGAC ... GTTGAGGCACGAAAGTGTGGGGAGCAAACAGG-}

{'n_threads': 6, 'alignment': <q2_types.feature_data._format.AlignedDNAFASTAFormat object at 0x7fdc83ff5fd0>}
Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: FastTreeMP -quote -nt /tmp/qiime2-archive-ji88t32h/80694e9d-fe44-4ef9-9611-b5d827b15b1b/data/aligned-dna-sequences.fasta



FastTree Version 2.1.10 Double precision (No SSE3), OpenMP (6 threads)
Alignment: /tmp/qiime2-archive-ji88t32h/80694e9d-fe44-4ef9-9611-b5d827b15b1b/data/aligned-dna-sequences.fasta
Nucleotide distances: Jukes-Cantor Joins: balanced Support: SH-like 1000
Search: Normal +NNI +SPR (2 rounds range 10) +ML-NNI opt-each=1
TopHits: 1.00*sqrtN close=default refresh=0.80
ML Model: Jukes-Cantor, CAT approximation with 20 rate categories
Initial topology in 1.04 seconds
Refining topology: 41 rounds ME-NNIs, 2 rounds ME-SPRs, 21 rounds ML-NNIs
Total branch-length 26.951 after 4.36 sec
ML-NNI round 1: LogLk = -70379.306 NNIs 199 max delta 13.85 Time 5.73
Switched to using 20 rate categories (CAT approximation)
Rate categories were divided by 1.163 so that average rate = 1.0
CAT-based log-likelihoods may not be comparable across runs
Use -gamma for approximate but comparable Gamma(20) log-likelihoods
ML-NNI round 2: LogLk = -58430.970 NNIs 108 max delta 6.66 Time 7.27
ML-NNI round 3: LogLk = -58383.141 NNIs 39 max delta 13.88 Time 8.02
ML-NNI round 4: LogLk = -58375.601 NNIs 11 max delta 2.11 Time 8.50
ML-NNI round 5: LogLk = -58361.931 NNIs 7 max delta 4.71 Time 8.73
ML-NNI round 6: LogLk = -58359.787 NNIs 1 max delta 0.00 Time 8.85
Turning off heuristics for final round of ML NNIs (converged)
ML-NNI round 7: LogLk = -58355.983 NNIs 6 max delta 0.67 Time 9.95 (final)
Optimize all lengths: LogLk = -58355.571 Time 10.24
Total time: 11.98 seconds Unique: 1232/1263 Bad splits: 0/1229


{'tree': <TreeNode, name: unnamed, internal node count: 1258, tips count: 1263>}


### Compute Alpha Diversity (Phylogeny)
- [diversity alpha_phylogenetic](https://docs.qiime2.org/2022.8/plugins/available/diversity/alpha-phylogenetic/): Computes a user-specified phylogenetic alpha diversity metric for all samples in a feature table.
- Metric choices: `faith_pd`
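
As a rough illustration of what `faith_pd` measures, the sketch below sums the branch lengths that connect a sample's observed tips to the root, counting each branch once. The toy tree, its branch lengths, and the helper name `faith_pd` are assumptions made for this example; this is not the implementation QIIME 2 uses.

```python
# Hypothetical rooted tree: node -> (parent, branch length to parent).
# The root has no entry of its own.
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "n1": ("root", 0.5),
    "C": ("root", 3.0),
}


def faith_pd(tree, observed_tips):
    """Sum the lengths of all branches on the paths from observed tips to the root."""
    visited = set()
    total = 0.0
    for tip in observed_tips:
        node = tip
        # Walk towards the root, stopping once a branch was already counted.
        while node in tree and node not in visited:
            visited.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


print(faith_pd(TREE, ["A"]))
print(faith_pd(TREE, ["A", "B"]))
print(faith_pd(TREE, ["A", "B", "C"]))
```

Adding tip `B` to a sample that already contains `A` only adds the branch leading to `B`, because the shared internal branch is counted once.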

In [11]:
metrics = ('faith_pd', )
alpha_diversities_phylogenetic = dict()
for metric in metrics:
    print(f"Calculating alpha diversity: {metric}")
    try:
        # Compute the phylogenetic metric for every sample in the table
        alpha_diversity = alpha_phylogenetic(table=tabs, phylogeny=mafft_rooted_tree, metric=metric).alpha_diversity
        alpha_diversities_phylogenetic[metric] = alpha_diversity
        # Save the SampleData[AlphaDiversity] artifact
        file_path = os.path.join(alpha_path, f'alpha-phylogeny-{metric}.qza')
        alpha_diversity.save(file_path)
        print(f"DONE: Calculating alpha phylogeny: {metric}")
    except Exception as e:
        # Report the reason for the failure instead of discarding it
        print(f"ERROR: Calculating alpha phylogeny: {metric}: {e}")

Calculating alpha diversity: faith_pd
{'metric': 'faith_pd', 'table': <artifact: FeatureTable[Frequency] uuid: 58f5ac0d-b5f1-476a-b73a-80b954569b0c>, 'phylogeny': <artifact: Phylogeny[Rooted] uuid: 78c63eca-a413-45c9-a27c-195b681547e0>}
{'table': <q2_types.feature_table._format.BIOMV210Format object at 0x7fdc8429aa30>, 'phylogeny': <q2_types.tree._format.NewickFormat object at 0x7fdc8363bf70>}
DONE: Calculating alpha phylogeny: faith_pd


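- [core-metrics-phylogenetic](https://docs.qiime2.org/2023.7/plugins/available/diversity/core-metrics-phylogenetic/): Applies a collection of diversity metrics (both phylogenetic and non-phylogenetic) to a feature table rarefied to `sampling_depth`.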
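
The `sampling_depth` parameter sets the number of reads each sample is subsampled to before the metrics are computed. The sketch below illustrates the idea on a hypothetical three-sample table, using the smallest per-sample total as the depth so that no sample is discarded. The `rarefy` helper and the toy counts are assumptions for illustration only, not the QIIME 2 implementation.

```python
import random


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from a feature -> count dict.

    Returns None when the sample has fewer reads than `depth`, mirroring the
    convention that such samples are dropped from the rarefied table.
    """
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        return None
    rarefied = {}
    for feature in rng.sample(pool, depth):
        rarefied[feature] = rarefied.get(feature, 0) + 1
    return rarefied


# Hypothetical feature counts per sample.
samples = {
    "S1": {"f1": 5, "f2": 3, "f3": 2},
    "S2": {"f1": 1, "f2": 4},
    "S3": {"f2": 6, "f3": 6},
}

# Smallest per-sample total: every sample has at least this many reads.
depth = min(sum(counts.values()) for counts in samples.values())
rng = random.Random(0)
rarefied = {sid: rarefy(counts, depth, rng) for sid, counts in samples.items()}

for sid, counts in rarefied.items():
    print(sid, counts)
```

Choosing the minimum keeps all samples but discards reads from deeper samples; a larger depth would retain more reads per sample at the cost of dropping the shallowest ones.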

In [12]:
# Use the smallest per-sample read total as the rarefaction depth so that no sample is dropped
s_depth = int(tabs.view(pd.DataFrame).sum(axis=1).min())
results = core_metrics_phylogenetic(
    table = tabs,
    phylogeny = mafft_rooted_tree,
    sampling_depth = s_depth,
    metadata = metadata_qa,
    n_jobs_or_threads = 6,
)

{'sampling_depth': 3381, 'metadata': Metadata
--------
14 IDs x 3 columns
sample-name: ColumnProperties(type='categorical')
group-id:    ColumnProperties(type='categorical')
group-desc:  ColumnProperties(type='categorical')

Call to_dataframe() for a tabular representation., 'n_jobs_or_threads': 6, 'table': <artifact: FeatureTable[Frequency] uuid: 58f5ac0d-b5f1-476a-b73a-80b954569b0c>, 'phylogeny': <artifact: Phylogeny[Rooted] uuid: 78c63eca-a413-45c9-a27c-195b681547e0>}
{'sampling_depth': 3381, 'metadata': Metadata
--------
14 IDs x 3 columns
sample-name: ColumnProperties(type='categorical')
group-id:    ColumnProperties(type='categorical')
group-desc:  ColumnProperties(type='categorical')

Call to_dataframe() for a tabular representation., 'with_replacement': False, 'n_jobs': 6, 'table': <artifact: FeatureTable[Frequency] uuid: 58f5ac0d-b5f1-476a-b73a-80b954569b0c>}
{'sampling_depth': 3381, 'with_replacement': False, 'table': 1263 x 14 <class 'biom.table.Table'> with 2817 nonzero ent



{'n_jobs': 6, 'table': 1029 x 14 <class 'biom.table.Table'> with 2269 nonzero entries (15% dense)}
{'number_of_dimensions': None, 'distance_matrix': <skbio.stats.distance._base.DistanceMatrix object at 0x7fdc81dbafa0>}
{'number_of_dimensions': None, 'distance_matrix': <skbio.stats.distance._base.DistanceMatrix object at 0x7fdc83668820>}
{'metadata': Metadata
--------
14 IDs x 3 columns
sample-name: ColumnProperties(type='categorical')
group-id:    ColumnProperties(type='categorical')
group-desc:  ColumnProperties(type='categorical')

Call to_dataframe() for a tabular representation., 'custom_axes': None, 'ignore_missing_samples': False, 'ignore_pcoa_fea

{'metadata': Metadata
--------
14 IDs x 3 columns
sample-name: ColumnProperties(type='categorical')
group-id:    ColumnProperties(type='categorical')
group-desc:  ColumnProperties(type='categorical')

Call to_dataframe() for a tabular representation., 'custom_axes': None, 'ignore_missing_samples': False, 'ignore_pcoa_features': False, 'pcoa': <skbio.stats.ordination._ordination_results.OrdinationResults object at 0x7fdc81dd6580>}


{'table': <q2_types.feature_table._format.BIOMV210Format object at 0x7fdc8366bd90>, 'phylogeny': <q2_types.tree._format.NewickFormat object at 0x7fdc8366bdc0>}
{'threads': 6, 'bypass_tips': False, 'table': <q2_types.feature_table._format.BIOMV210Format object at 0x7fdc836561f0>, 'phylogeny': <q2_types.tree._format.NewickFormat object at 0x7fdc8366b9d0>}
{'threads': 6, 'bypass_tips': False, 'table': <q2_types.feature_table._format.BIOMV210Format object at 0x7fdc81dba4f0>, 'phylogeny': <q2_types.tree._format.NewickFormat object at 0x7fdc81dbaeb0>}
{'threads': 6, 'bypass_tips': False, 'table': <q2_types.feature_table._format.BIOMV210Format

{'number_of_dimensions': None, 'distance_matrix': <skbio.stats.distance._base.DistanceMatrix object at 0x7fdc8366b670>}
{'number_of_dimensions': None, 'distance_matrix': <skbio.stats.distance._base.DistanceMatrix object at 0x7fdc81dd60a0>}
{'metadata': Metadata
--------
14 IDs x 3 columns
sample-name: ColumnProperties(type='categorical')
group-id:    ColumnProperties(type='categorical')
group-desc:  ColumnProperties(type='categorical')

Call to_dataframe() for a tabular representation., 'custom_axes': None, 'ignore_missing_samples': False, 'ignore_pcoa_features': False, 'pcoa': <skbio.stats.ordination._ordination_results.OrdinationResults object at 0x7fdc83619b80>}
{'metadata': Metadata
--------
14 IDs x 3 columns
sample-name: ColumnProperties(type='



In [13]:
results_info = [("rarefied_table", "FeatureTable[Frequency]", "The resulting rarefied feature table."),
("faith_pd_vector", "SampleData[AlphaDiversity]", "Vector of Faith PD values by sample."),
("observed_features_vector", "SampleData[AlphaDiversity]", "Vector of Observed Features values by sample."),
("shannon_vector", "SampleData[AlphaDiversity]", "Vector of Shannon diversity values by sample."),
("evenness_vector", "SampleData[AlphaDiversity]", "Vector of Pielou's evenness values by sample."),
("unweighted_unifrac_distance_matrix", "DistanceMatrix", "Matrix of unweighted UniFrac distances between pairs of samples."),
("weighted_unifrac_distance_matrix", "DistanceMatrix", "Matrix of weighted UniFrac distances between pairs of samples."),
("jaccard_distance_matrix", "DistanceMatrix", "Matrix of Jaccard distances between pairs of samples."),
("bray_curtis_distance_matrix", "DistanceMatrix", "Matrix of Bray-Curtis distances between pairs of samples."),
("unweighted_unifrac_pcoa_results", "PCoAResults", "PCoA matrix computed from unweighted UniFrac distances between samples."),
("weighted_unifrac_pcoa_results", "PCoAResults", "PCoA matrix computed from weighted UniFrac distances between samples."),
("jaccard_pcoa_results", "PCoAResults", "PCoA matrix computed from Jaccard distances between samples."),
("bray_curtis_pcoa_results", "PCoAResults", "PCoA matrix computed from Bray-Curtis distances between samples."),
("unweighted_unifrac_emperor", "Visualization", "Emperor plot of the PCoA matrix computed from unweighted UniFrac."),
("weighted_unifrac_emperor", "Visualization", "Emperor plot of the PCoA matrix computed from weighted UniFrac."),
("jaccard_emperor", "Visualization", "Emperor plot of the PCoA matrix computed from Jaccard."),
("bray_curtis_emperor", "Visualization", "Emperor plot of the PCoA matrix computed from Bray-Curtis.")]

In [14]:
distance_matrix = dict()
for i, (r_id, r_type, r_desc) in enumerate(results_info):
    if r_type == "DistanceMatrix":
        # Keep the distance matrices for the group significance tests
        distance_matrix[r_id] = results[i]
    elif r_id.endswith('emperor'):
        # Save each Emperor PCoA plot as a .qzv visualization
        print(i, r_id, r_type)
        print(f"--- {r_desc} ---")
        file_name = os.path.join(beta_path, f"{r_id}.qzv")
        print(f'Saving emperor file at: {file_name}\n')
        results[i].save(filepath=file_name)

13 unweighted_unifrac_emperor Visualization
--- Emperor plot of the PCoA matrix computed from unweighted UniFrac. ---
Saving emperor file at: /home/lauro/nupeb/rede-micro/redemicro-ana-flavia-nutri/experiments/ana-flavia-STD-NCxSTD-NR-trim/qiime-artifacts/beta-analysis/unweighted_unifrac_emperor.qzv

14 weighted_unifrac_emperor Visualization
--- Emperor plot of the PCoA matrix computed from weighted UniFrac. ---
Saving emperor file at: /home/lauro/nupeb/rede-micro/redemicro-ana-flavia-nutri/experiments/ana-flavia-STD-NCxSTD-NR-trim/qiime-artifacts/beta-analysis/weighted_unifrac_emperor.qzv



15 jaccard_emperor Visualization
--- Emperor plot of the PCoA matrix computed from Jaccard. ---
Saving emperor file at: /home/lauro/nupeb/rede-micro/redemicro-ana-flavia-nutri/experiments/ana-flavia-STD-NCxSTD-NR-trim/qiime-artifacts/beta-analysis/jaccard_emperor.qzv

16 bray_curtis_emperor Visualization
--- Emperor plot of the PCoA matrix computed from Bray-Curtis. ---
Saving emperor file at: /home/lauro/nupeb/rede-micro/redemicro-ana-flavia-nutri/experiments/ana-flavia-STD-NCxSTD-NR-trim/qiime-artifacts/beta-analysis/bray_curtis_emperor.qzv



### Alpha diversity correlation

This method only processes `numeric` metadata columns.
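
To make the difference between the two correlation methods concrete, the sketch below computes Pearson and Spearman coefficients in plain Python. The alpha diversity values and the numeric `age` column are hypothetical, and the helper functions are written for illustration rather than taken from QIIME 2.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-sample alpha diversity and a numeric metadata column.
alpha = [3.8, 5.0, 5.2, 5.4, 6.1]
age = [1.0, 2.0, 4.0, 8.0, 16.0]

print("pearson:", pearson(alpha, age))
print("spearman:", spearman(alpha, age))
```

Because the toy relationship is monotonic but not linear, the Spearman coefficient is higher than the Pearson coefficient.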


In [15]:
methods = ('spearman', 'pearson')
# Correlation is only computed against numeric metadata columns
numerics_cols = metadata_qa.filter_columns(column_type='numeric')
if numerics_cols.column_count > 0:
    for metric, alpha_values in alpha_diversities.items():
        for method in methods:
            try:
                corr_view = alpha_correlation(alpha_diversity=alpha_values, metadata=numerics_cols,
                                              method=method, intersect_ids=True).visualization
                view_path = os.path.join(alpha_path, f'alpha-correlation-{metric}-{method}.qzv')
                corr_view.save(view_path)
                print(f"DONE: Calculating alpha correlation: {metric} {method}")
            except Exception as e:
                # Report the reason for the failure instead of discarding it
                print(f"ERROR: Calculating alpha correlation: {metric} {method}: {e}")

## Alpha diversity comparisons

Visually and statistically compare groups of alpha diversity values.

[diversity alpha_group_significance](https://docs.qiime2.org/2022.8/plugins/available/diversity/alpha-group-significance/)
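
The statistical comparison is based on the Kruskal-Wallis test, which ranks all values together and asks whether the rank sums differ between groups more than expected by chance. The sketch below computes only the H statistic, assumes there are no tied values, and uses hypothetical group values; it is an illustration, not the code QIIME 2 runs.

```python
def kruskal_h(groups):
    """Kruskal-Wallis H statistic for a list of groups (no tie correction)."""
    pooled = sorted(value for group in groups for value in group)
    # Rank of each value in the pooled data; valid only when values are unique.
    rank_of = {value: position + 1 for position, value in enumerate(pooled)}
    n_total = len(pooled)
    weighted_rank_sums = sum(
        sum(rank_of[value] for value in group) ** 2 / len(group)
        for group in groups
    )
    return 12.0 / (n_total * (n_total + 1)) * weighted_rank_sums - 3 * (n_total + 1)


# Hypothetical alpha diversity values for two groups of samples.
separated = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
interleaved = [[1.0, 4.0, 5.0], [2.0, 3.0, 6.0]]

print("separated groups:  ", kruskal_h(separated))
print("interleaved groups:", kruskal_h(interleaved))
```

Well-separated groups yield a large H, while interleaved groups yield a value near zero. A p-value would then be obtained from a chi-squared distribution with one fewer degrees of freedom than the number of groups.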

In [16]:
for metric, alpha_values in alpha_diversities.items():
    print(f"Processing alpha_group_significance: {metric}")
    try:
        # Compare alpha diversity between the groups of each categorical metadata column
        significance_view = alpha_group_significance(alpha_diversity=alpha_values, metadata=metadata_qa).visualization
        view_path = os.path.join(alpha_path, f'alpha-group-significance-{metric}.qzv')
        significance_view.save(view_path)
        print(f"DONE: Calculating alpha group significance: {metric}")
    except Exception:
        # A metric that cannot be tested is reported and skipped
        print(f"ERROR: Calculating alpha group significance: {metric}")

Processing alpha_group_significance: chao1
{'metadata': Metadata
--------
14 IDs x 3 columns
sample-name: ColumnProperties(type='categorical')
group-id:    ColumnProperties(type='categorical')
group-desc:  ColumnProperties(type='categorical')

Call to_dataframe() for a tabular representation., 'alpha_diversity': S210421121682    291.0
S210421121683    242.0
S210421121684    207.0
S210421121685    314.0
S210421121686    121.0
S210421121687    137.0
S210421121694    149.0
S210421121695    320.0
S210421121696    187.0
S210421121697    133.0
S210421121703     97.0
S210421121704     34.0
S210421121705    211.0
S210421121706    374.0
Name: chao1, dtype: float64}
DONE: Calculating alpha group significance: chao1
Processing alpha_group_significance: chao1_ci
ERROR: Calculating alpha group significance: chao1_ci
Processing alpha_group_significance: observed_features
{'metadata': Metadata
--------
14 IDs x 3 columns
sample-name: ColumnProperties(type='categorical')
group-id:    ColumnProperties(

DONE: Calculating alpha group significance: observed_features
Processing alpha_group_significance: shannon
{'metadata': Metadata
--------
14 IDs x 3 columns
sample-name: ColumnProperties(type='categorical')
group-id:    ColumnProperties(type='categorical')
group-desc:  ColumnProperties(type='categorical')

Call to_dataframe() for a tabular representation., 'alpha_diversity': S210421121682    6.126624
S210421121683    6.103749
S210421121684    4.956948
S210421121685    6.145411
S210421121686    5.516588
S210421121687    5.179013
S210421121694    5.689236
S210421121695    6.177949
S210421121696    5.394384
S210421121697    5.432786
S210421121703    5.200061
S210421121704    3.786450
S210421121705    5.919258
S210421121706    6.381630
Name: shannon_entropy, dtype: float64}
DONE: Calculating alpha group significance: shannon
Processing alpha_group_significance: simpson
{'metadata': Metadata
--------
14 IDs x 3 columns
sample-name: ColumnProperties(type='categorical')
group-id:    ColumnPro

DONE: Calculating alpha group significance: simpson
Processing alpha_group_significance: simpson_e
{'metadata': Metadata
--------
14 IDs x 3 columns
sample-name: ColumnProperties(type='categorical')
group-id:    ColumnProperties(type='categorical')
group-desc:  ColumnProperties(type='categorical')

Call to_dataframe() for a tabular representation., 'alpha_diversity': S210421121682    0.116034
S210421121683    0.153302
S210421121684    0.062924
S210421121685    0.097092
S210421121686    0.209445
S210421121687    0.113325
S210421121694    0.152243
S210421121695    0.102757
S210421121696    0.106649
S210421121697    0.180282
S210421121703    0.190735
S210421121704    0.209658
S210421121705    0.146909
S210421121706    0.088207
Name: simpson_e, dtype: float64}
DONE: Calculating alpha group significance: simpson_e


## Beta diversity analysis

#### Reference
- [diversity beta](https://docs.qiime2.org/2022.8/plugins/available/diversity/beta/): Computes a user-specified beta diversity metric for all pairs of samples in a feature table.
- [Beta diversity metrics](http://scikit-bio.org/docs/0.2.0/generated/skbio.diversity.beta.html)

- Metric choices: 'aitchison', 'braycurtis', 'canberra', 'canberra_adkins', 'chebyshev', 'cityblock', 'correlation', 'cosine', 'dice', 'euclidean', 'hamming', 'jaccard', 'jensenshannon', 'kulsinski', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'
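
To illustrate how an abundance-based metric differs from a presence/absence metric, the sketch below computes Bray-Curtis and Jaccard dissimilarities for two hypothetical samples. The helper functions and the count vectors are assumptions made for this example.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: uses the abundance of every feature."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator


def jaccard(u, v):
    """Jaccard distance: uses presence/absence only."""
    present_u = {i for i, a in enumerate(u) if a > 0}
    present_v = {i for i, b in enumerate(v) if b > 0}
    shared = present_u & present_v
    return 1 - len(shared) / len(present_u | present_v)


# Hypothetical counts for four features in two samples.
sample_1 = [10, 0, 5, 0]
sample_2 = [5, 5, 0, 0]

print("Bray-Curtis:", bray_curtis(sample_1, sample_2))
print("Jaccard:    ", jaccard(sample_1, sample_2))
```

The fourth feature is absent from both samples and contributes to neither value, which is the shared-absence behaviour mentioned in the introduction.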

In [17]:
# All metrics accepted by `beta`, kept for reference
all_metrics = ('aitchison', 'braycurtis', 'canberra', 'canberra_adkins', 'chebyshev', 'cityblock', 'correlation', 'cosine', 'dice', 'euclidean', 'hamming', 'jaccard', 'jensenshannon', 'kulsinski', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule')
# Subset of metrics computed in this analysis
metrics = ('euclidean', 'dice', 'braycurtis', 'correlation', 'cosine', 'matching', 'jaccard')
beta_diversities = dict()
for metric in metrics:
    print(f"Calculating beta diversity: {metric}")
    try:
        beta_diversity = beta(table=tabs, metric=metric, n_jobs=6, pseudocount=1).distance_matrix
        beta_diversities[metric] = beta_diversity
        # Save the DistanceMatrix artifact
        file_path = os.path.join(beta_path, f'beta-values-{metric}.qza')
        beta_diversity.save(file_path)
        print(f"DONE: Calculating beta diversity: {metric}")
    except Exception as e:
        print(f"ERROR: Calculating beta diversity: {metric}")

Calculating beta diversity: euclidean
{'metric': 'euclidean', 'pseudocount': 1, 'n_jobs': 6, 'table': <artifact: FeatureTable[Frequency] uuid: 58f5ac0d-b5f1-476a-b73a-80b954569b0c>}
{'metric': 'euclidean', 'pseudocount': 1, 'n_jobs': 6, 'table': 1263 x 14 <class 'biom.table.Table'> with 2817 nonzero entries (15% dense)}
DONE: Calculating beta diversity: euclidean
Calculating beta diversity: dice
{'metric': 'dice', 'pseudocount': 1, 'n_jobs': 6, 'table': <artifact: FeatureTable[Frequency] uuid: 58f5ac0d-b5f1-476a-b73a-80b954569b0c>}
{'metric': 'dice', 'pseudocount': 1, 'n_jobs': 6, 'table': 1263 x 14 <class 'biom.table.Table'> with 2817 nonzero entries (15% dense)}
DONE: Calculating beta diversity: dice
Calculating b



DONE: Calculating beta diversity: braycurtis
Calculating beta diversity: correlation
{'metric': 'correlation', 'pseudocount': 1, 'n_jobs': 6, 'table': <artifact: FeatureTable[Frequency] uuid: 58f5ac0d-b5f1-476a-b73a-80b954569b0c>}
{'metric': 'correlation', 'pseudocount': 1, 'n_jobs': 6, 'table': 1263 x 14 <class 'biom.table.Table'> with 2817 nonzero entries (15% dense)}
ERROR: Calculating beta diversity: correlation
Calculating beta diversity: cosine
{'metric': 'cosine', 'pseudocount': 1, 'n_jobs': 6, 'table': <artifact: FeatureTable[Frequency] uuid: 58f5ac0d-b5f1-476a-b73a-80b954569b0c>}
{'metric': 'cosine', 'pseudocount': 1, 'n_jobs': 6, 'table': 1263 x 14 <class 'biom.table.Table'> with 2817 nonzero entries (15% dense)}
{'metric': 'cosine', 'pseudocount': 1, 'n_jobs': 6, 'table': 1263 x 14 <class 'biom.table.Table'> with 2817 nonzero entries (

DONE: Calculating beta diversity: jaccard




### Beta group significance

- [diversity beta_group_significance](https://docs.qiime2.org/2022.8/plugins/available/diversity/beta-group-significance/): Determine whether groups of samples are significantly different from one another using a permutation-based statistical test.
- Marti J. Anderson. A new method for non-parametric multivariate analysis of variance. Austral Ecology, 26(1):32–46, 2001. doi:10.1111/j.1442-9993.2001.01070.pp.x
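
The permutation logic behind PERMANOVA can be sketched in plain Python: compute a pseudo-F statistic from the distance matrix and the group labels, then shuffle the labels repeatedly to see how often a value at least as large arises by chance. The toy distance matrix, the labels, and the function names below are assumptions for illustration; this is not the scikit-bio implementation that QIIME 2 calls.

```python
import random
from itertools import combinations


def pseudo_f(dist, labels):
    """PERMANOVA pseudo-F from a square distance matrix and group labels."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for group in groups:
        members = [i for i, label in enumerate(labels) if label == group]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))


def permanova(dist, labels, permutations=999, seed=0):
    """Return the observed pseudo-F and its permutation p-value."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical samples placed on a line; distances are absolute differences.
positions = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
labels = ["A", "A", "A", "B", "B", "B"]
dist = [[abs(a - b) for b in positions] for a in positions]

f_obs, p_value = permanova(dist, labels)
print("pseudo-F:", f_obs)
print("p-value: ", p_value)
```

With only three samples per group there are few distinct label arrangements, so the p-value cannot become very small even for perfectly separated groups. The same limitation applies to pairwise tests on small groups.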

In [18]:
methods = ('permanova', 'anosim', 'permdisp')
for method in methods:
    for metric, beta_diversity in beta_diversities.items():
        print(f'Calculating beta group significance with method {method} and metric {metric}')
        try:
            beta_view = beta_group_significance(distance_matrix=beta_diversity, 
                                                metadata=metadata_qa.get_column(class_col), 
                                                pairwise=True, method=method).visualization
            view_name = os.path.join(beta_path, f'beta-group-significance-{metric}-{method}.qzv')
            beta_view.save(view_name)
            print(f"DONE: Calculating beta group significance: {method} {metric}")
        except Exception as e:
            print(f"ERROR: Calculating beta group significance: {method} {metric}")

Calculating beta group significance with method permanova and metric euclidean
{'metadata': <CategoricalMetadataColumn name='group-id' id_count=14>, 'method': 'permanova', 'pairwise': True, 'permutations': 999, 'distance_matrix': <skbio.stats.distance._base.DistanceMatrix object at 0x7fdc804026d0>}


DONE: Calculating beta group significance: permanova euclidean
Calculating beta group significance with method permanova and metric dice
{'metadata': <CategoricalMetadataColumn name='group-id' id_count=14>, 'method': 'permanova', 'pairwise': True, 'permutations': 999, 'distance_matrix': <skbio.stats.distance._base.DistanceMatrix object at 0x7fdc84098ac0>}


DONE: Calculating beta group significance: permanova dice
Calculating beta group significance with method permanova and metric braycurtis
{'metadata': <CategoricalMetadataColumn name='group-id' id_count=14>, 'method': 'permanova', 'pairwise': True, 'permutations': 999, 'distance_matrix': <skbio.stats.distance._base.DistanceMatrix object at 0x7fdc83a5a400>}


DONE: Calculating beta group significance: permanova braycurtis
Calculating beta group significance with method permanova and metric matching
{'metadata': <CategoricalMetadataColumn name='group-id' id_count=14>, 'method': 'permanova', 'pairwise': True, 'permutations': 999, 'distance_matrix': <skbio.stats.distance._base.DistanceMatrix object at 0x7fdc83a592e0>}


DONE: Calculating beta group significance: permanova matching
Calculating beta group significance with method permanova and metric jaccard
{'metadata': <CategoricalMetadataColumn name='group-id' id_count=14>, 'method': 'permanova', 'pairwise': True, 'permutations': 999, 'distance_matrix': <skbio.stats.distance._base.DistanceMatrix object at 0x7fdc84149d60>}


DONE: Calculating beta group significance: permanova jaccard
Calculating beta group significance with method anosim and metric euclidean
{'metadata': <CategoricalMetadataColumn name='group-id' id_count=14>, 'method': 'anosim', 'pairwise': True, 'permutations': 999, 'distance_matrix': <skbio.stats.distance._base.DistanceMatrix object at 0x7fdc84149820>}


DONE: Calculating beta group significance: anosim euclidean
Calculating beta group significance with method anosim and metric dice
{'metadata': <CategoricalMetadataColumn name='group-id' id_count=14>, 'method': 'anosim', 'pairwise': True, 'permutations': 999, 'distance_matrix': <skbio.stats.distance._base.DistanceMatrix object at 0x7fdc83921bb0>}


DONE: Calculating beta group significance: anosim dice
Calculating beta group significance with method anosim and metric braycurtis
{'metadata': <CategoricalMetadataColumn name='group-id' id_count=14>, 'method': 'anosim', 'pairwise': True, 'permutations': 999, 'distance_matrix': <skbio.stats.distance._base.DistanceMatrix object at 0x7fdc83fa0640>}


DONE: Calculating beta group significance: anosim braycurtis
Calculating beta group significance with method anosim and metric matching
{'metadata': <CategoricalMetadataColumn name='group-id' id_count=14>, 'method': 'anosim', 'pairwise': True, 'permutations': 999, 'distance_matrix': <skbio.stats.distance._base.DistanceMatrix object at 0x7fdc83776ee0>}


DONE: Calculating beta group significance: anosim matching
Calculating beta group significance with method anosim and metric jaccard
{'metadata': <CategoricalMetadataColumn name='group-id' id_count=14>, 'method': 'anosim', 'pairwise': True, 'permutations': 999, 'distance_matrix': <skbio.stats.distance._base.DistanceMatrix object at 0x7fdc849624f0>}


DONE: Calculating beta group significance: anosim jaccard
Calculating beta group significance with method permdisp and metric euclidean
{'metadata': <CategoricalMetadataColumn name='group-id' id_count=14>, 'method': 'permdisp', 'pairwise': True, 'permutations': 999, 'distance_matrix': <skbio.stats.distance._base.DistanceMatrix object at 0x7fdc83971c40>}




DONE: Calculating beta group significance: permdisp euclidean
Calculating beta group significance with method permdisp and metric dice
{'metadata': <CategoricalMetadataColumn name='group-id' id_count=14>, 'method': 'permdisp', 'pairwise': True, 'permutations': 999, 'distance_matrix': <skbio.stats.distance._base.DistanceMatrix object at 0x7fdc83ca84c0>}


DONE: Calculating beta group significance: permdisp dice
Calculating beta group significance with method permdisp and metric braycurtis
{'metadata': <CategoricalMetadataColumn name='group-id' id_count=14>, 'method': 'permdisp', 'pairwise': True, 'permutations': 999, 'distance_matrix': <skbio.stats.distance._base.DistanceMatrix object at 0x7fdc837a0d60>}


DONE: Calculating beta group significance: permdisp braycurtis
Calculating beta group significance with method permdisp and metric matching
{'metadata': <CategoricalMetadataColumn name='group-id' id_count=14>, 'method': 'permdisp', 'pairwise': True, 'permutations': 999, 'distance_matrix': <skbio.stats.distance._base.DistanceMatrix object at 0x7fdc83e260a0>}




DONE: Calculating beta group significance: permdisp matching
Calculating beta group significance with method permdisp and metric jaccard
{'metadata': <CategoricalMetadataColumn name='group-id' id_count=14>, 'method': 'permdisp', 'pairwise': True, 'permutations': 999, 'distance_matrix': <skbio.stats.distance._base.DistanceMatrix object at 0x7fdc83c5b910>}


DONE: Calculating beta group significance: permdisp jaccard



In [19]:
# Expand tests using UNIFRAC metrics
methods = ('permanova', 'anosim', 'permdisp')
for method in methods:
    for metric, beta_diversity in distance_matrix.items():
        print(f'Calculating beta group significance with method {method} and metric {metric}')
        try:
            beta_view = beta_group_significance(distance_matrix=beta_diversity, 
                                                metadata=metadata_qa.get_column(class_col), 
                                                pairwise=True, method=method).visualization
            view_name = os.path.join(beta_path, f'beta-group-significance-{metric}-{method}.qzv')
            beta_view.save(view_name)
            print(f"DONE: Calculating beta group significance: {method} {metric}")
        except Exception as e:
            print(f"ERROR: Calculating beta group significance: {method} {metric}")

Calculating beta group significance with method permanova and metric unweighted_unifrac_distance_matrix
{'metadata': <CategoricalMetadataColumn name='group-id' id_count=14>, 'method': 'permanova', 'pairwise': True, 'permutations': 999, 'distance_matrix': <skbio.stats.distance._base.DistanceMatrix object at 0x7fdc83b68d90>}


DONE: Calculating beta group significance: permanova unweighted_unifrac_distance_matrix
Calculating beta group significance with method permanova and metric weighted_unifrac_distance_matrix
{'metadata': <CategoricalMetadataColumn name='group-id' id_count=14>, 'method': 'permanova', 'pairwise': True, 'permutations': 999, 'distance_matrix': <skbio.stats.distance._base.DistanceMatrix object at 0x7fdc83ff5d60>}


DONE: Calculating beta group significance: permanova weighted_unifrac_distance_matrix
Calculating beta group significance with method permanova and metric jaccard_distance_matrix
{'metadata': <CategoricalMetadataColumn name='group-id' id_count=14>, 'method': 'permanova', 'pairwise': True, 'permutations': 999, 'distance_matrix': <skbio.stats.distance._base.DistanceMatrix object at 0x7fdc83b3dbe0>}


DONE: Calculating beta group significance: permanova jaccard_distance_matrix
Calculating beta group significance with method permanova and metric bray_curtis_distance_matrix
{'metadata': <CategoricalMetadataColumn name='group-id' id_count=14>, 'method': 'permanova', 'pairwise': True, 'permutations': 999, 'distance_matrix': <skbio.stats.distance._base.DistanceMatrix object at 0x7fdc83abcca0>}


DONE: Calculating beta group significance: permanova bray_curtis_distance_matrix
Calculating beta group significance with method anosim and metric unweighted_unifrac_distance_matrix
{'metadata': <CategoricalMetadataColumn name='group-id' id_count=14>, 'method': 'anosim', 'pairwise': True, 'permutations': 999, 'distance_matrix': <skbio.stats.distance._base.DistanceMatrix object at 0x7fdc8006b910>}


DONE: Calculating beta group significance: anosim unweighted_unifrac_distance_matrix
Calculating beta group significance with method anosim and metric weighted_unifrac_distance_matrix
{'metadata': <CategoricalMetadataColumn name='group-id' id_count=14>, 'method': 'anosim', 'pairwise': True, 'permutations': 999, 'distance_matrix': <skbio.stats.distance._base.DistanceMatrix object at 0x7fdc8420ae80>}


DONE: Calculating beta group significance: anosim weighted_unifrac_distance_matrix
Calculating beta group significance with method anosim and metric jaccard_distance_matrix
{'metadata': <CategoricalMetadataColumn name='group-id' id_count=14>, 'method': 'anosim', 'pairwise': True, 'permutations': 999, 'distance_matrix': <skbio.stats.distance._base.DistanceMatrix object at 0x7fdc8004cc40>}


DONE: Calculating beta group significance: anosim jaccard_distance_matrix
Calculating beta group significance with method anosim and metric bray_curtis_distance_matrix
{'metadata': <CategoricalMetadataColumn name='group-id' id_count=14>, 'method': 'anosim', 'pairwise': True, 'permutations': 999, 'distance_matrix': <skbio.stats.distance._base.DistanceMatrix object at 0x7fdc83ad3550>}


DONE: Calculating beta group significance: anosim bray_curtis_distance_matrix
Calculating beta group significance with method permdisp and metric unweighted_unifrac_distance_matrix
{'metadata': <CategoricalMetadataColumn name='group-id' id_count=14>, 'method': 'permdisp', 'pairwise': True, 'permutations': 999, 'distance_matrix': <skbio.stats.distance._base.DistanceMatrix object at 0x7fdc840cba30>}


DONE: Calculating beta group significance: permdisp unweighted_unifrac_distance_matrix
Calculating beta group significance with method permdisp and metric weighted_unifrac_distance_matrix
{'metadata': <CategoricalMetadataColumn name='group-id' id_count=14>, 'method': 'permdisp', 'pairwise': True, 'permutations': 999, 'distance_matrix': <skbio.stats.distance._base.DistanceMatrix object at 0x7fdc846e8fd0>}




DONE: Calculating beta group significance: permdisp weighted_unifrac_distance_matrix
Calculating beta group significance with method permdisp and metric jaccard_distance_matrix
{'metadata': <CategoricalMetadataColumn name='group-id' id_count=14>, 'method': 'permdisp', 'pairwise': True, 'permutations': 999, 'distance_matrix': <skbio.stats.distance._base.DistanceMatrix object at 0x7fdc83654b80>}


DONE: Calculating beta group significance: permdisp jaccard_distance_matrix
Calculating beta group significance with method permdisp and metric bray_curtis_distance_matrix
{'metadata': <CategoricalMetadataColumn name='group-id' id_count=14>, 'method': 'permdisp', 'pairwise': True, 'permutations': 999, 'distance_matrix': <skbio.stats.distance._base.DistanceMatrix object at 0x7fdc83b3d7f0>}


DONE: Calculating beta group significance: permdisp bray_curtis_distance_matrix



### Beta rarefaction

- [diversity beta_rarefaction](https://docs.qiime2.org/2022.8/plugins/available/diversity/beta-rarefaction/): Repeatedly rarefy a feature table to compare beta diversity results within a given rarefaction depth. For a given beta diversity metric, this visualizer provides an Emperor jackknifed PCoA plot, samples clustered by UPGMA or neighbor joining with support calculation, and a heatmap showing the correlation between rarefaction trials of that beta diversity metric.