### STEP: Diversity Analysis



#### Example

- [“Moving Pictures” tutorial](https://docs.qiime2.org/2022.8/tutorials/moving-pictures-usage/?highlight=ancom)
- [scikit-bio ANCOM](http://scikit-bio.org/docs/0.4.2/generated/generated/skbio.stats.composition.ancom.html)


#### Methods
- [composition](https://docs.qiime2.org/2022.8/plugins/available/composition/)
- [composition add_pseudocount](https://docs.qiime2.org/2022.8/plugins/available/composition/add-pseudocount/): Increment all counts in table by pseudocount.
- [composition ancom](https://docs.qiime2.org/2022.8/plugins/available/composition/ancom/): Apply ANCOM to identify features that differ in abundance.

## Setup and settings

```python
# Importing packages
import os
import pandas as pd
from qiime2 import Artifact
from qiime2 import Visualization
from qiime2 import Metadata

from qiime2.plugins.composition.visualizers import ancom
from qiime2.plugins.composition.methods import add_pseudocount

%matplotlib inline
```

### Receiving the parameters

The following cell can receive parameters using the [papermill](https://papermill.readthedocs.io/en/latest/) tool.

```python
metadata_file = '/home/lauro/nupeb/rede-micro/redemicro-miliane-nutri/data/raw/metadata/miliane-metadata-CxAC.tsv'
base_dir = os.path.join('/', 'home', 'lauro', 'nupeb', 'rede-micro', 'redemicro-miliane-nutri')
experiment_name = 'miliane-CxAC-trim'
class_col = 'group-id'
replace_files = False
```

```python
# Parameters
experiment_name = "ana-flavia-STD-NRxHSD-NR-trim"
base_dir = "/home/lauro/nupeb/rede-micro/redemicro-ana-flavia-nutri"
manifest_file = "/home/lauro/nupeb/rede-micro/redemicro-ana-flavia-nutri/data/raw/manifest/manifest-ana-flavia-STD-NRxHSD-NR.csv"
metadata_file = "/home/lauro/nupeb/rede-micro/redemicro-ana-flavia-nutri/data/raw/metadata/metadata-ana-flavia-STD-NRxHSD-NR.tsv"
class_col = "group-id"
classifier_file = "/home/lauro/nupeb/rede-micro/models/silva-138-99-nb-classifier.qza"
top_n = 20
replace_files = False
phred = 20
trunc_f = 0
trunc_r = 0
overlap = 12
threads = 6
trim = {
    "overlap": 8,
    "forward_primer": "CCTACGGGRSGCAGCAG",
    "reverse_primer": "GGACTACHVGGGTWTCTAAT",
}
```
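The `# Parameters` cell above is the one papermill injects after the cell tagged `parameters`, so its values override the defaults set in the previous cell. A minimal sketch of launching this step from Python with `papermill.execute_notebook`; the helper names `build_parameters` and `run_step` and the notebook filenames are hypothetical, and only the parameters this notebook actually reads are passed.

```python
def build_parameters(experiment_name, base_dir, metadata_file,
                     class_col="group-id", replace_files=False):
    """Hypothetical helper: collect the values papermill will inject."""
    return {
        "experiment_name": experiment_name,
        "base_dir": base_dir,
        "metadata_file": metadata_file,
        "class_col": class_col,
        "replace_files": replace_files,
    }


def run_step(input_nb, output_nb, parameters):
    """Execute a notebook with papermill (requires `pip install papermill`)."""
    import papermill as pm  # imported lazily so build_parameters works without it
    return pm.execute_notebook(input_nb, output_nb, parameters=parameters)


params = build_parameters(
    experiment_name="ana-flavia-STD-NRxHSD-NR-trim",
    base_dir="/home/lauro/nupeb/rede-micro/redemicro-ana-flavia-nutri",
    metadata_file="/home/lauro/nupeb/rede-micro/redemicro-ana-flavia-nutri/data/raw/metadata/metadata-ana-flavia-STD-NRxHSD-NR.tsv",
)
# Hypothetical filenames; uncomment to run:
# run_step("ancom.ipynb", "ancom-ana-flavia.ipynb", params)
```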


```python
experiment_folder = os.path.abspath(os.path.join(base_dir, 'experiments', experiment_name))
img_folder = os.path.abspath(os.path.join(experiment_folder, 'imgs'))
```

### Defining names, paths and flags

```python
# QIIME2 Artifacts folder
qiime_folder = os.path.join(experiment_folder, 'qiime-artifacts')

# Input - DADA2 Artifacts
dada2_tabs_path = os.path.join(qiime_folder, 'dada2-tabs.qza')
```

## Step execution

### Load input files

This step imports the QIIME 2 `FeatureTable[Frequency]` artifact and the `Metadata` file.

```python
# Load sample metadata
metadata_qa = Metadata.load(metadata_file)

# Load the FeatureTable[Frequency] artifact and keep a features x samples DataFrame
tabs = Artifact.load(dada2_tabs_path)
tabs_df = tabs.view(Metadata).to_dataframe().T
```
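ANCOM fails when the table holds sample IDs that are absent from the metadata, which is the error printed after the ANCOM loop in this notebook. A minimal pandas sketch for checking the overlap before running ANCOM, assuming a samples × features DataFrame (here that would be `tabs_df.T`) and a metadata DataFrame indexed by sample ID (`metadata_qa.to_dataframe()`). The helper names and toy data are hypothetical. Within QIIME 2 itself, `filter_samples` from `qiime2.plugins.feature_table.methods`, called with `metadata=metadata_qa`, is the usual way to keep only samples listed in the metadata.

```python
import pandas as pd


def check_sample_ids(table_df, metadata_df):
    """Return (IDs missing from metadata, IDs missing from table), sorted.

    table_df: samples x features counts, sample IDs in the index.
    metadata_df: sample metadata, sample IDs in the index.
    """
    table_ids = set(table_df.index)
    metadata_ids = set(metadata_df.index)
    return sorted(table_ids - metadata_ids), sorted(metadata_ids - table_ids)


def keep_metadata_samples(table_df, metadata_df):
    """Drop table rows whose sample ID does not appear in the metadata."""
    return table_df.loc[table_df.index.isin(metadata_df.index)]


# Toy data standing in for the real table and metadata.
table = pd.DataFrame({"ASV1": [5, 0, 3], "ASV2": [1, 4, 2]}, index=["S1", "S2", "S3"])
metadata = pd.DataFrame({"group-id": ["STD", "HSD"]}, index=["S1", "S3"])

missing_in_metadata, missing_in_table = check_sample_ids(table, metadata)
filtered = keep_metadata_samples(table, metadata)
```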

### ANCOM

Apply Analysis of Composition of Microbiomes (ANCOM) to identify features
that are differentially abundant across groups.

- [composition add_pseudocount](https://docs.qiime2.org/2022.8/plugins/available/composition/add-pseudocount/): Increment all counts in table by pseudocount.
- [composition ancom](https://docs.qiime2.org/2022.8/plugins/available/composition/ancom/): Apply ANCOM to identify features that differ in abundance.
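ANCOM works on log-ratios rather than raw counts: for each feature it tests the log-ratio against every other feature, and its W statistic is the number of those tests that are significant between groups. A simplified sketch of that idea, assuming strictly positive counts and using `scipy.stats.f_oneway` as the per-pair test; it omits the multiple-testing correction and the final W threshold that scikit-bio and q2-composition apply, and `ancom_w` is a hypothetical name, not either library's API.

```python
from itertools import combinations

import numpy as np
from scipy.stats import f_oneway


def ancom_w(counts, groups, alpha=0.05):
    """Simplified ANCOM W statistic (no multiple-testing correction).

    counts: (n_samples, n_features) array of strictly positive values.
    groups: one group label per sample.
    W[i] counts features j whose log(x_i / x_j) differs between groups
    by one-way ANOVA at p < alpha.
    """
    counts = np.asarray(counts, dtype=float)
    if np.any(counts <= 0):
        raise ValueError("counts must be > 0; add a pseudocount first")
    groups = np.asarray(groups)
    log_counts = np.log(counts)
    w = np.zeros(counts.shape[1], dtype=int)
    # log(x_i/x_j) = -log(x_j/x_i), so each unordered pair is tested once.
    for i, j in combinations(range(counts.shape[1]), 2):
        ratio = log_counts[:, i] - log_counts[:, j]
        samples = [ratio[groups == g] for g in np.unique(groups)]
        _, p_value = f_oneway(*samples)
        if p_value < alpha:
            w[i] += 1
            w[j] += 1
    return w


# Toy data: 3 "STD" and 3 "HSD" samples; feature 0 is ~20x higher in HSD.
counts = np.array([
    [10, 100, 50, 80],
    [12, 95, 55, 78],
    [9, 105, 48, 82],
    [200, 98, 52, 79],
    [220, 102, 47, 85],
    [190, 97, 53, 77],
])
groups = ["STD", "STD", "STD", "HSD", "HSD", "HSD"]
w = ancom_w(counts, groups)
```

Feature 0 is significant against all three others, while each of the remaining features is significant only against feature 0, so feature 0 stands out with the highest W.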

```python
# Select class column
column = metadata_qa.get_column(class_col)
```

```python
# Add a pseudocount (default: 1) to every count; log-ratios are undefined for zeros
composition_tab = add_pseudocount(table=tabs).composition_table
```
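The pseudocount matters because ANCOM and the `log`/`clr` transforms take logarithms, and `log(0)` is `-inf`. A minimal NumPy sketch of the same idea on a toy matrix; `with_pseudocount` and `clr` are hypothetical helpers named to avoid shadowing the imported `add_pseudocount`, and the default of 1 mirrors the documented default of `composition add_pseudocount`.

```python
import numpy as np


def with_pseudocount(counts, pseudocount=1):
    """Hypothetical helper: add `pseudocount` to every cell of the table."""
    return np.asarray(counts, dtype=float) + pseudocount


def clr(values):
    """Centred log-ratio per sample (row): log(x) minus the row mean of log(x)."""
    log_values = np.log(values)
    return log_values - log_values.mean(axis=1, keepdims=True)


counts = np.array([[0, 5, 10], [3, 0, 7]])
# np.log(counts) would give -inf for the zero cells; the shifted table is safe.
shifted = with_pseudocount(counts)
transformed = clr(shifted)
```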

```python
# Create one ANCOM visualization per combination of transform and difference function.
transform_functions = ('sqrt', 'log', 'clr')
difference_functions = ('mean_difference', 'f_statistic')
for t in transform_functions:
    for d in difference_functions:
        print(f"Calculating ANCOM with: {t} {d}")
        try:
            ancom_viz = ancom(
                table=composition_tab,
                metadata=column,
                transform_function=t,
                difference_function=d,
            ).visualization
            view_name = os.path.join(qiime_folder, f'ancom-{t}-{d}.qzv')
            ancom_viz.save(view_name)
        except Exception as e:
            # Report the failure and continue with the next combination.
            print(f"ERROR: Calculating ANCOM with: {t} {d}")
            print(e)
```

```
Calculating ANCOM with: sqrt mean_difference
ERROR: Calculating ANCOM with: sqrt mean_difference
The following IDs are not present in the metadata: 'S210421121673', 'S210421121674', 'S210421121675', 'S210421121676', 'S210421121677', 'S210421121678', 'S210421121679', 'S210421121680', 'S210421121681', 'S210421121682', 'S210421121683', 'S210421121684', 'S210421121688', 'S210421121689', 'S210421121690', 'S210421121694', 'S210421121695', 'S210421121696', 'S210421121697', 'S210421121698', 'S210421121699', 'S210421121700', 'S210421121701', 'S210421121702', 'S210421121711', 'S210421121712', 'S210421121713', 'S210421121714', 'S210421121715', 'S210421121716', 'S210421121717', 'S210421121718', 'S210421121719', 'S210421121720', 'S210421121721', 'S210421121722', 'S210421121723', 'S210421121724', 'S210421121725', 'S210421121726', 'S210421121727', 'S210421121728', 'S210421121729', 'S210421121730', 'S210421121731', 'S210421121732', 'S210421121733', 'S210421121734', 'S210707163906', 'S210707163907', 'S

ERROR: Calculating ANCOM with: clr f_statistic
The following IDs are not present in the metadata: 'S210421121673', 'S210421121674', 'S210421121675', 'S210421121676', 'S210421121677', 'S210421121678', 'S210421121679', 'S210421121680', 'S210421121681', 'S210421121682', 'S210421121683', 'S210421121684', 'S210421121688', 'S210421121689', 'S210421121690', 'S210421121694', 'S210421121695', 'S210421121696', 'S210421121697', 'S210421121698', 'S210421121699', 'S210421121700', 'S210421121701', 'S210421121702', 'S210421121711', 'S210421121712', 'S210421121713', 'S210421121714', 'S210421121715', 'S210421121716', 'S210421121717', 'S210421121718', 'S210421121719', 'S210421121720', 'S210421121721', 'S210421121722', 'S210421121723', 'S210421121724', 'S210421121725', 'S210421121726', 'S210421121727', 'S210421121728', 'S210421121729', 'S210421121730', 'S210421121731', 'S210421121732', 'S210421121733', 'S210421121734', 'S210707163906', 'S210707163907', 'S210707163908', 'S210707163909', 'S210707163910', '
```
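The loop above varies `transform_function` and `difference_function`. According to the `composition ancom` documentation linked above, these control how feature values are transformed and compared for the volcano plot. A rough NumPy illustration of the three transforms combined with a two-group mean difference, on a toy table that already includes a pseudocount; `TRANSFORMS` and `mean_difference` are hypothetical stand-ins, not q2-composition's code, and the `f_statistic` option is not reproduced here.

```python
import numpy as np

# Illustrative versions of the three transform_function choices.
TRANSFORMS = {
    "sqrt": np.sqrt,
    "log": np.log,
    "clr": lambda x: np.log(x) - np.log(x).mean(axis=1, keepdims=True),
}


def mean_difference(values, groups):
    """Per-feature difference of group means (second sorted label minus first)."""
    groups = np.asarray(groups)
    labels = np.unique(groups)  # sorted group labels
    if len(labels) != 2:
        raise ValueError("this sketch expects exactly two groups")
    first = values[groups == labels[0]].mean(axis=0)
    second = values[groups == labels[1]].mean(axis=0)
    return second - first


# Toy pseudocounted table: 2 "A" and 2 "B" samples, 3 features.
table = np.array([
    [2.0, 50.0, 30.0],
    [3.0, 45.0, 35.0],
    [40.0, 48.0, 32.0],
    [44.0, 52.0, 31.0],
])
groups = ["A", "A", "B", "B"]
differences = {
    name: mean_difference(transform(table), groups)
    for name, transform in TRANSFORMS.items()
}
```

Feature 0 shows the largest shift under every transform; only the scale of the x-axis changes, which is why the six visualizations differ in presentation rather than in which feature stands out.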